CS2-AA4X

Windows GUI Context Extraction

a Major Qualifying Project Report submitted to the Faculty of the

WORCESTER POLYTECHNIC INSTITUTE

in partial fulfillment of the requirements for the Degree of Bachelor of Science by

Austin T. Rose

March 21, 2017

Professor Craig A. Shue

1. ABSTRACT

In any computer system, an intelligent policy for allowing or disallowing low-level actions is critical to security. Such low-level actions may include opening new connections to the Internet, installing new drivers, or executing downloaded files. In determining whether to allow a given action, it is necessary to collect some context regarding how the action was triggered. Is this connection to an address we have never seen before? Where was this file downloaded from? An important part of that context is whether or not a human user actually requested the action in some way, through their interactions with the Graphical User Interface (GUI). That is an abstract question, which is not as straightforward to answer as the others. We seek to determine a user's high-level intentions by extracting and relating properties of the GUI as a user interacts with it.

We have created a system that automatically generates information about user activity, monitoring a Windows computer in real time with a low performance overhead. The information generated is well structured for consumption by security tools and for informing policy. Deployed across an organization, this system has the potential to effectively white-list broad categories of user work-flows, making it easy to raise alerts about anomalous behavior that warrants further investigation.

2. INTRODUCTION

Most low-level system action, especially network activity, is directly triggered by some user interaction with the GUI. A user's interaction with a program is a sort of endorsement for its actions. That interaction by no means guarantees the security of an application, but if an unrecognized application on some computer starts making connections to the Internet without any user interaction, it is ostensibly more suspicious than a program which only makes connections in response to the local user clicking buttons in its interface.

Certainly there are many exceptions to this heuristic. To give an obvious example, modern applications often have the ability to autonomously check for updates, regardless of whether a user explicitly requested it. Nonetheless, from the perspective of a security operator, any information providing context to a computer's actions is valuable. If we can whitelist low-level actions that result from legitimate user interaction, then actions not triggered by a user are easier to identify and scrutinize. Was it a periodic and expected update check or the beacon of some malware which otherwise stays hidden from the user?

In its ideal form, this security system would force malware to masquerade as a Trojan Horse with a functional GUI in order to be successful. This is a much more difficult attack to carry out than, for example, installing a hidden remote access tool by exploiting a PDF reader bug. If a piece of malware does not expose any sort of GUI to the user, then any activity it performs becomes suspicious. This is an advantage for security operators, given that advanced malware often stays under the radar by not interacting with the local user at all. By building up a white-list of work-flows that should result in network activity, we can very effectively block and detect entire categories of malicious network activity.

We are not exploring an anti-virus approach, since it offers no solutions to deal with a suspected compromised computer. This is a detection and prevention solution detecting suspicious activity, and flagging it as such to allow a higher-level controller to block it. This improves on many existing approaches, because it does not rely on any prior knowledge about the malware. Existing approaches rely on blacklisted executable signatures, domain names, IP addresses, or other indicators. By monitoring and interpreting the core libraries used to facilitate GUIs, we can automatically gain useful insight about any application. Whereas a higher-level monitoring tool could require special tailoring for each unique user application, this system can effectively monitor both present and future applications. In this way, we are able to extract important security context at a minimal cost.

The rest of this paper is organized as follows. Section 3 contrasts this project with previous work that is broadly similar, in terms of its role as a network security tool, as well as some work that is more specifically similar, in terms of its niche of leveraging information in the GUI for security context. Section 4 discusses some of the background concepts foundational to understanding our approach, and the technology involved. Section 5 describes the specifics of our implementation, and the rationale behind different design decisions. Section 6 presents the results of evaluating the tool in a few categories of effectiveness. Section 7 explores our after-the-fact reflections on the project, and describes some improvements that could be made in future work. Section 8 offers our concluding thoughts.

3. RELATED WORK

Here we discuss how this project fits in with existing research and projects in the field of computer security. We look at similar works and how our approach contrasts with them. We also consider dissimilar tools, which may be improved through our efforts.

3.1. Intrusion Detection Systems

The system we have built would be understood in the security community as one form of Intrusion Detection System (IDS): a system that detects actions that attempt to compromise the confidentiality, integrity, or availability of a resource. More specifically, we would describe our system as a host-based IDS, since it involves some sort of agent running on each host in the network.

Bro [Paxson 1999] is one particularly well known IDS, which works entirely at the network level. Bro does not secure networks magically; it requires instruction regarding the security policy of the site it is meant to protect. In fact, one of the key contributions of Bro was to define a precise and understandable language for unambiguously defining a network's security policy. Another widely used IDS, Snort [Roesch 1999], fills a niche as a free, lightweight, and cross-platform IDS. But, like Bro and like most IDS solutions, Snort relies on predefined network patterns to look for, which is essentially a site security policy. However, the question of what the specific security policy should be in each case is often quite difficult to answer. We seek to supplement existing systems which require clear and predefined policy. By operating below the network level and directly with users, our system can provide deep insights to policy decisions that may otherwise effectively have to be made in isolation.

3.2. Firewalls

Firewalls are one of the most basic lines of defense for a network. Their classic operation is to block unsolicited incoming network traffic so that only specifically requested data is allowed to reach the host. Firewalls range between host-based or centralized, and ‘dumb’ or application-aware. Typically, though, a firewall will have to be host-based to be application-aware. In a host-based and application-aware firewall, protection may be extended by denying any outgoing connections from applications, unless they have been specifically white-listed. Little Snitch¹ is a popular Mac OS X product which does exactly that. The built-in firewall for Windows supports similar features, but its default configuration is limited to blocking incoming requests. Comodo Firewall² is a highly-regarded host-based Windows firewall, for those that need more advanced and precise features than the built-in Windows Firewall offers. Centralized firewalls can be useful to network operators by offering similar protections to host-based firewalls, while additionally providing visibility into activity on the network as a whole.

Our tool, which must be installed on each protected host, is not directly involved with monitoring or preventing network activity. However, the information that it generates could be used to determine connection allowances like a host-based firewall needs to, while additionally supplying contextual information that would supplement the broader understanding gained from a centralized firewall.

¹ https://www.obdev.at/products/littlesnitch/index.html
² https://www.comodo.com/home/internet-security/firewall.php

3.3. GUI Context Systems

A number of approaches have been implemented to understand a user's high-level actions. BINDER [Cui et al. 2005] is one recent example that takes a similar approach to our system by correlating network connections with user activities (keystrokes and clicks). Our work proposes to categorize user intent more precisely than BINDER could, by not only detecting input events, but by further correlating those events with the text present in the GUI components being used. Furthermore, while the BINDER work focuses heavily on determining indicators of malicious activity, our work seeks only to produce high quality data for whatever policy may use it.

Another approach, coming from Virginia Tech [Zhang et al. 2012], is focused specifically on user actions in web browsers, which represent a disproportionately wide attack surface, as a casual user's main tool for interacting with the Internet. With web browsers, our approach has limited success. This is mainly due to the fact that modern web browsers do not use many, if any, standard Windows GUI components that are easy to detect and categorize. This is understandable, given the highly dynamic nature of rendering a web page. Nor do they use many of the same Windows message types and conventions found in other desktop applications. But our goals are more broad: we want to automatically support all GUI applications to some extent, rather than targeting any single one in depth.

One previous project [Luli and Muchene 2012] from our own institution had exactly the same goals in mind, but took a different approach for extracting the text. They used screen recording and optical character recognition (OCR) to deduce what text was present on the screen at any given time. Our approach improves on this effort by extracting relevant text without the significant real-time processing power required to both record and analyze frames, constantly. Given that this system would need to be installed on every host in a protected network, it is important to keep resource requirements to a minimum.

Another previous project [Kane and Li 2014] from our own institution has succeeded in exactly the same ways our project has. However, their system was built for Linux-based hosts. While the goals and benefits of such a system are identical, regardless of the host operating system, the implementations are not at all similar. Thus, our work expands on the space by functioning on modern Windows hosts.

3.4. Low-Level Detection

Blade [Lu et al. 2010] implements a kernel extension for preventing drive-by malware infections, in which an un-patched web browser exploit allows a maliciously-crafted website to surreptitiously force the host to download a file without the user's knowledge. They accomplish this by ensuring that any executable files coming from the browser receive explicit user consent. They parallel our driving philosophy that user interaction is an endorsement of a program, and have been very successful in blocking malware infections on that basis.

One approach [Hofmeyr et al. 1998] coming from the University of New Mexico has had success in identifying malicious processes based on patterns in short sequences of executed system calls. This is a very different approach from ours, but the two share a common goal of finding reliable indicators that a process is malicious. In their case, the data being considered (sequences of system calls) is not inherently very descriptive of the bigger picture. However, they have augmented its usefulness by identifying patterns correlated with undesirable activity. In our case, we make no attempt to suggest what patterns in our data should be concerning. Rather, we attempt to produce data for which a human might have a much easier time identifying such concerning patterns.

3.5. Modeling GUI State

We believe that one way to improve the potential usefulness of our tool could be to define a more formal model for the data it produces. Certainly our data cannot perfectly describe what the user is doing and seeing at any time; such a goal is not efficiently obtainable. However, it may be possible to agree upon a state-machine-like model that encompasses the actions and content which we do understand very well with our data. This would provide any entity consuming our data with an explicit interface for what we can tell them about the user’s actions. Coming up with a formal model for graphical interfaces is not simple, but research on the GOMS (Goals, Operators, Methods, and Selection rules) family of models [John and Kieras 1996] proposes some foundational ideas that could lead us in a successful direction.

Oftentimes, one atomic log from our tool will not be quite enough to accurately discern the user’s high-level intent. Rather, many simple user actions can be understood from the pattern of several sequential logs. While humans are not excellent at visually identifying such patterns in large amounts of data, there are many techniques for doing so reliably. One project [Liu et al. 2017] from University of California, Davis and Adobe Research has described success with various such techniques, applied to their own data. Markov chain models, for example, are a well-tested approach for making predictions based on patterns in previously-observed data.
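As a rough illustration of the Markov chain idea, the sketch below counts first-order transitions between consecutive log entries and predicts the most frequent follower of a given entry. The event labels (such as "menu:File") and the function names are hypothetical, invented for this example; they are not the log format produced by our tool, and this is not the method used in the cited work.

```python
from collections import Counter, defaultdict


def fit_transitions(events):
    """Count how often each event is immediately followed by each other event."""
    transitions = defaultdict(Counter)
    for current, following in zip(events, events[1:]):
        transitions[current][following] += 1
    return transitions


def most_likely_next(transitions, current):
    """Return the most frequently observed follower of `current`, or None."""
    followers = transitions.get(current)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical sequence of simplified log entries.
observed = [
    "menu:File", "menu:Print", "click:OK",
    "menu:File", "menu:Print", "click:OK",
    "menu:File", "menu:Save",
]
model = fit_transitions(observed)
prediction = most_likely_next(model, "menu:File")
```

A sequence that departs sharply from the learned transitions could then be surfaced for review, although choosing such a threshold is a policy decision outside the scope of this sketch.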

4. BACKGROUND

This section defines and describes some of the concepts foundational to understanding how our system works, and how GUIs fit into the larger overall picture of the Windows operating system.

4.1. Graphical User Interfaces (GUIs)

Programs on a computer do not necessarily need to have a GUI. Many services on a computer have no visual interface whatsoever. But unless a program was written specifically for use by developers, it often presents a GUI. If the user of a computer is not a developer of some sort, then odds are all they ever do with a computer is interact with different GUIs.

Malware is often not concerned with GUIs. At least, it is not interested in presenting a GUI of its own. More likely than not, malware wants to operate “under the radar,” which does not include interacting with the user through a GUI.

The point of a GUI is for a regular (non-technical) person to understand it. Quite a lot of processing power and memory can be required to render a GUI, and make it smoothly interactive. But, ultimately, information is conveyed to users with text. At some point, along the way, the text has to be conveyed to the application as variables in some code, often as simple string data. Finding ways to hook into that part of the GUI life-cycle is a major theme of this work.

4.2. Windows GUI Fundamentals

Unsurprisingly, GUI applications are made up of components that we call ‘windows.’ There is one important distinction to make about what the operating system formally considers to be a window. Contrary to how they are commonly referred to, the components making up a single application window, like buttons, text input fields, or static text, are themselves considered to be windows by the operating system. So GUI control components have the same core set of properties and interfaces as any other window. Controls for one application likely share the same parent window. There is a single tree structure relating window parents and children that covers every existing window on the computer.

A list of the most basic GUI controls (each of which is a window) can be found in Microsoft's official documentation³. These are controls like buttons, checkboxes, lists, and text fields that users likely interact with every day.

³ https://msdn.microsoft.com/en-us/library/windows/desktop/bb773173(v=vs.85).aspx

4.2.1. Windows Messages.

Windows each have a message queue, provided to them by the operating system. Components can communicate information to each other asynchronously by posting messages to each other's message queues. For example, when a button is clicked, it posts a WM_COMMAND type message to its parent window, to notify it. The PostMessage function is used to post a new message to some component's queue. Components also receive information from the operating system through their message queues. For example, when the user left-clicks the mouse inside of a particular component, the operating system posts a WM_LBUTTONDOWN message to that component's message queue. It is important that any GUI component be constantly popping messages off its queue and processing them. If messages are not being removed and processed quickly enough, the application is considered to be unresponsive, and the operating system warns the user that the window may no longer be functioning properly.

A single message consists of four pieces of data. It contains a message type parameter, an identifier for the window that should receive the message, and two fields which vary in purpose depending on the message type (known as, for historical reasons, the ‘word parameter’ and the ‘long parameter’). When an application calls PostMessage to deliver a message to another window, the function call returns immediately and with no interesting information, so as not to block the sender. The operating system then takes the content of the message and adds it to the appropriate queue. It is important to note that none of those four components necessarily identifies the sender of a message. Thus, when an application pops a message from its queue, it does not necessarily know what window sent the message. Sometimes this is important to know for properly handling a message. In those cases, the sender often uses the long parameter of the message to store its own window identifier, so that the recipient can use that parameter to look up the sender.

It is possible for components to interact synchronously by using the SendMessage function. This is much like PostMessage, but it instead directly calls the recipient's main window function, which would otherwise have processed the message from the queue. It skips the queue entirely and returns whatever value the main window function returns.
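The difference between posting and sending can be made concrete with a small toy model. The sketch below is not the Win32 API: ToyWindow, ToyMessageSystem, and pump_one are hypothetical names invented for illustration, and only the WM_COMMAND value is taken from the documented Windows headers. It shows the four fields of a message, a posted message waiting in a queue until the recipient pops it, and a sent message invoking the recipient's window function directly.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict

WM_COMMAND = 0x0111  # documented Win32 message identifier


@dataclass(frozen=True)
class Message:
    hwnd: int      # identifier of the window that should receive the message
    msg_type: int  # message type, e.g. WM_COMMAND
    wparam: int    # 'word parameter'; meaning depends on msg_type
    lparam: int    # 'long parameter'; meaning depends on msg_type


class ToyWindow:
    """Hypothetical stand-in for a window: a window function plus a queue."""

    def __init__(self, hwnd: int, window_proc: Callable[[Message], int]) -> None:
        self.hwnd = hwnd
        self.window_proc = window_proc
        self.queue: Deque[Message] = deque()


class ToyMessageSystem:
    """Hypothetical stand-in for the operating system's message routing."""

    def __init__(self) -> None:
        self.windows: Dict[int, ToyWindow] = {}

    def register(self, window: ToyWindow) -> None:
        self.windows[window.hwnd] = window

    def post_message(self, hwnd: int, msg_type: int, wparam: int, lparam: int) -> None:
        # Asynchronous: enqueue and return immediately with no result.
        self.windows[hwnd].queue.append(Message(hwnd, msg_type, wparam, lparam))

    def send_message(self, hwnd: int, msg_type: int, wparam: int, lparam: int) -> int:
        # Synchronous: bypass the queue and return the window function's result.
        return self.windows[hwnd].window_proc(Message(hwnd, msg_type, wparam, lparam))

    def pump_one(self, hwnd: int) -> bool:
        # Pop one queued message, if any, and hand it to the window function.
        window = self.windows[hwnd]
        if not window.queue:
            return False
        window.window_proc(window.queue.popleft())
        return True


PARENT_HWND, BUTTON_HWND = 100, 200
handled = []


def parent_proc(message: Message) -> int:
    handled.append(message)
    return 1


system = ToyMessageSystem()
system.register(ToyWindow(PARENT_HWND, parent_proc))

# The button identifies itself in the long parameter, since no field names the sender.
system.post_message(PARENT_HWND, WM_COMMAND, 0, BUTTON_HWND)
handled_before_pump = len(handled)  # nothing is processed until the queue is pumped
system.pump_one(PARENT_HWND)
send_result = system.send_message(PARENT_HWND, WM_COMMAND, 0, BUTTON_HWND)
```

In this model the posted message is only handled once the parent pumps its queue, whereas the sent message is handled, and its return value delivered, before send_message returns.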

Typically, posted messages include information about keyboard or mouse input relating to the window. Posted messages are also the typical choice for windows in different processes to communicate. Thus, the message queue is a very efficient option for inter-process communication. Sent messages, on the other hand, are rarely initiated by user applications, because making blocking calls in a GUI application is generally undesirable. Most often, sent messages come directly from the operating system to control GUI components.

While there are many types of messages, there is no exhaustive list. Some message types actually overlap ambiguously with others, and knowing how to interpret the message depends on context or how the application happened to be written. There is a core set of Windows messages which are well defined and unambiguous. Most of these are described in the official Microsoft documentation⁴, organized by what GUI components they relate to.

Applications can register their own messages, which the operating system will recognize and which other applications may use. When an application-defined message type is intercepted, there is no deterministic way to automatically understand its contents and purpose.

4.2.2. Relation to Project Goals.

For this project we are most interested in extracting text from the GUI. So, we look for Windows message types which are specifically concerned with string data. To give a few examples, the WM_CHAR message is posted when the component receives a simple keystroke. The WM_MENUSELECT message is sent to the parent of a window-menu (like a right-click context menu) when an item is selected. The WM_SETTEXT message is sent by the operating system to update the ‘main’ text for a component. In this case, the meaning of main text varies for different components. For a top level application window, WM_SETTEXT is likely to update the window title. For a button window, the text displayed on the button would change.

Additionally, although a particular Windows message type may not have any string data as a parameter, it is regularly useful to look at any string data associated with the recipient of a message. For example, the BM_CLICK message is sent to a button when the user clicks on it. A BM_CLICK message does not itself contain any string data. However, relevant text can be found by simply querying the clicked button for its main text.

⁴ https://msdn.microsoft.com/en-us/library/ms644927(v=VS.85).aspx#system_defined

Each GUI component can be queried with the GetWindowText function, which returns the main text of a window. As we mentioned, this is broadly defined. But GetWindowText returns whatever was set by a WM_SETTEXT message. Components are built in natural tree-like hierarchies, so it is possible to query for the parent, siblings, or children of any given window and, by extension, for the main text of all those related windows. This allows for the extraction of very useful context regarding the GUI structure. For example, knowing that a button with the text OK was clicked is not enough to accurately guess what a user is doing. But knowing that the parent of that button was a window with the title Print Document suddenly gives a much better picture of the user's intentions.
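The OK button example can be sketched with a toy window tree. ToyWindow and context_path below are hypothetical names for illustration only. On a real system, the text and parent of each window would come from Win32 queries such as GetWindowText and GetParent rather than from Python objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class ToyWindow:
    """Hypothetical stand-in for a window: its main text and its tree links."""

    text: str
    parent: Optional["ToyWindow"] = None
    children: List["ToyWindow"] = field(default_factory=list)

    def add_child(self, text: str) -> "ToyWindow":
        child = ToyWindow(text=text, parent=self)
        self.children.append(child)
        return child


def context_path(window: ToyWindow) -> List[str]:
    """Collect the non-empty main text of a window and all of its ancestors,
    ordered from the top-level window down to the window itself."""
    path = []
    current: Optional[ToyWindow] = window
    while current is not None:
        if current.text:  # many intermediate containers have no text at all
            path.append(current.text)
        current = current.parent
    path.reverse()
    return path


dialog = ToyWindow("Print Document")
button_panel = dialog.add_child("")  # an unnamed container window
ok_button = button_panel.add_child("OK")

click_context = context_path(ok_button)
```

Here the bare label OK is replaced by the path from the dialog title to the button, which is the kind of context the paragraph above argues is needed to guess the user's intent.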

4.2.3. Debugging Tools.

With so many messages to consider, and no prior knowledge regarding which ones may be more relevant for us, there was a lot of trial and error discovery. In this effort, it was necessary to find quality tools for investigating and debugging.

The most helpful debugging tool we found was Microsoft's own ‘Spy++’⁵, which is included alongside recent versions of Visual Studio.

Directly quoting the documentation [Microsoft 2015] for the tool, Spy++ can perform the following tasks:

(1) Display a graphical tree of relationships among system objects. These include processes, threads, and windows.
(2) Search for specified windows, threads, processes, or messages.
(3) View the properties of selected windows, threads, processes, or messages.
(4) Select a window, thread, process, or message directly in the view.
(5) Use the Finder Tool to select a window by mouse pointer positioning.
(6) Set message options by using complex message log selection parameters.

Fig. 1. Spy++ Window Tree View

⁵ https://msdn.microsoft.com/en-us/library/dd460756.aspx

Figure 1 contains an example of the tree diagram shown by Spy++ for a single File Explorer window. Looking at the tree diagrams for different windows helped give us a sense of just how many sub-components a seemingly simple window can have and what properties of windows we might be interested in.

Fig. 2. Spy++ Window Messages

Figure 2 contains an example of Spy++ intercepting messages for a particular window. It can be configured to log only sent and/or posted messages, and to log messages related to parents and/or children of the window as well. The ability to reduce background noise by focusing on exactly one window and its messages greatly sped up the discovery process. Additionally, Spy++ displays the two custom data fields attached to each message and, if possible, interprets them for ease of reading.

Figure 3 contains an example of the message filtering options offered by Spy++. This feature was by far the most useful to us. There are thousands of possible message types and windows consume thousands of messages every second. Without being able to filter out broad categories of messages which we were not interested in monitoring, it would have been very difficult to find potentially relevant messages in the logs.

4.3. Windows Function Hooking

It is possible to ‘hook’ a function, so that whenever some code attempts to call that function, some other manually specified function is surreptitiously called instead. There are a variety of ways to accomplish this. One common approach involves overwriting the first x86 instruction of the target function with a JMP (unconditional jump) pointed at the desired callback function. Figure 4 illustrates this process, known as ‘creating a trampoline’.

Fig. 3. Spy++ Window Message Filtering Options

Fig. 4. Trampoline Function Hook
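To make the jump patch concrete, the sketch below builds the five bytes of a near relative JMP on x86 (opcode E9 followed by a signed 32-bit little-endian displacement measured from the end of the jump instruction). It only computes the bytes and does not write to any process memory. The function name and the example addresses are hypothetical, and a real hooking library must additionally preserve the overwritten instructions so that the original function can still be called.

```python
import struct

JMP_REL32_OPCODE = 0xE9  # x86 near jump with a 32-bit relative displacement
JMP_REL32_LENGTH = 5     # one opcode byte plus four displacement bytes


def build_jmp_patch(target_address: int, detour_address: int) -> bytes:
    """Return the bytes that, written at target_address, jump to detour_address."""
    # The displacement is relative to the address of the next instruction,
    # that is, the address immediately after the 5-byte jump.
    displacement = detour_address - (target_address + JMP_REL32_LENGTH)
    if not -(2 ** 31) <= displacement < 2 ** 31:
        raise ValueError("detour is out of range for a rel32 jump")
    return bytes([JMP_REL32_OPCODE]) + struct.pack("<i", displacement)


# Hypothetical addresses, chosen only to make the arithmetic easy to follow.
forward_patch = build_jmp_patch(target_address=0x1000, detour_address=0x2000)
backward_patch = build_jmp_patch(target_address=0x2000, detour_address=0x1000)
```

The range check matters mostly in 64-bit processes, where the callback may lie more than 2 GiB away from the hooked function and a single relative jump cannot reach it.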

Microsoft has an official tool for hooking Windows functions called Detours⁶. There is a free version of the software, but it only supports 32 bit environments. For full 64 bit support, the professional version of Detours costs several thousand dollars. There are free and open-source alternatives offering hooking functionality similar to Detours. ‘MinHook’⁷ is one of those alternatives: a C/C++ library with full support for 32 and 64 bit environments.

⁶ https://www.microsoft.com/en-us/research/project/detours/
⁷ https://www.codeproject.com/Articles/44326/MinHook-The-Minimalistic-x-x-API-Hooking-Libra

Naturally, to install a hook over some function, one must know the address of that function. This is simple for a statically-defined function, but more complicated for a class member function. Each instance of a class seems to have its own ‘version’ or ‘copy’ of the class member function, and it would be impractical to hook each one of them separately. In reality, there is a single static ‘unbound’ version of the function, which takes a hidden extra parameter: a single instance of the class. Each instance, when told to call a member function, calls the unbound static version with itself as the extra argument. By finding the address of the unbound version, the function can be hooked in a way that affects all instances of the class. The addresses for the unbound versions of virtual member functions are contained in the ‘virtual table’ for the class. All instances point to the same virtual table in memory. In fact, in typical C++ implementations, a pointer to such an object is effectively a pointer to a virtual table pointer. Thus, if one can create an instance of some class, then one can access its virtual table, find the addresses of unbound member functions, and use a library like MinHook to install a hook for them.
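The effect of hooking the single shared function, rather than each instance, can be shown by analogy. Python is used here only as an analogy for the C++ mechanism: a Python method is likewise stored once on the class and receives the instance as an explicit first argument. The Button class and install_hook helper are hypothetical and do not involve virtual tables or MinHook.

```python
import functools


class Button:
    """Hypothetical GUI class with one member function worth monitoring."""

    def __init__(self, label: str) -> None:
        self.label = label

    def click(self) -> str:
        return f"{self.label} clicked"


def install_hook(cls, method_name: str, log: list):
    """Replace the class's single shared function with a logging wrapper.

    Returns the original function so the hook can be removed later."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def hooked(self, *args, **kwargs):
        log.append((method_name, self.label))     # record context first
        return original(self, *args, **kwargs)    # then run the original

    setattr(cls, method_name, hooked)
    return original


# Both instances exist before the hook is installed.
ok_button = Button("OK")
cancel_button = Button("Cancel")

call_log = []
original_click = install_hook(Button, "click", call_log)

ok_result = ok_button.click()
cancel_result = cancel_button.click()

# Restoring the original function removes the hook for every instance at once.
Button.click = original_click
```

Because both instances look up click on the same class, one replacement is observed by instances created before and after the hook, which mirrors patching one virtual table entry shared by all objects of a class.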

Function hooking is a potential approach for monitoring user interaction with GUI components. For example, in a GUI program one might install a hook for the CreateWindow function. A custom overriding function could then log information about the window being created before returning control to the original function. Plenty of useful context about the GUI could be extracted by hooking various relevant functions. However, installing function-overriding hooks is not enough. These hooks work by overwriting code at function memory addresses, but those memory addresses are only relevant to the single process containing them. In order to collect GUI context throughout the entire system, one would need a means to make other processes run the code.

4.4. Windows DLL Injection

The following subsections describe three methods that make it possible to run custom code in remote processes that are already running and that someone else created. More commonly this is known as DLL injection, since the code being injected must reside in a Dynamic-Link Library (DLL). Combined with the methods described in the previous section, DLL injection makes it possible to hook arbitrary API functions across all processes in a Windows machine.

DLL injection is frequently used for both malware and anti-malware solutions, due to its powerful nature. It is worth noting that the authors of the processes being manipulated often do not want any foreign code executing in them. Understandably, third parties interfering with a process do not understand how it works, and may cause strange problems. For the most part, it is out of the author's control to prevent such manipulation. Nevertheless, they do often try to implement safeguards, especially if their application is business-critical. This makes DLL injection a risky, and sometimes non-deterministic process. For example, the ‘Create Remote Thread’ approach described in a following section will cause current versions of the Google Chrome web browser to crash immediately, which is likely an intentional security safeguard put in place by Google.

4.4.1. “App Init DLLs” Method.

There is a combination of registry keys one can set to enable the APPINIT_DLLs feature. With this, one can specify the path to a DLL that one wants all programs to load. Programs do not all automatically load the DLL, but if the DLL is specified in the APPINIT_DLLs registry key, then it is automatically loaded if the process loads the user32.dll library. Nearly every process on Windows, especially those with a GUI, uses the user32.dll library, so one can effectively make all relevant programs load some manually specified library.

When a process loads a DLL for the first time, it automatically executes the DllMain function, which is meant for initialization of library components before they are needed. This is a simple way to automatically inject code into nearly every process, right as they are starting. Due to its simplicity, the APPINIT_DLLs approach has been used regularly by both malware and anti-malware systems. One must take special care when writing the DLL, since almost every process will load and run it, even as the computer is booting up for the first time. One mistake in the DLL is an easy way to stop the computer from being able to boot properly. Further, one is limited in what libraries can be used from within the DllMain function, because at that point the process is only guaranteed to have loaded the kernel32.dll and user32.dll libraries. Also, 32 bit programs cannot load 64 bit DLLs, or vice versa, so to cover both one would need to compile two DLLs.

This approach is very convenient, and easy to implement. For years, it has been the go-to method for DLL injection for both malware and anti-malware solutions. However, recently, due to its high potential for abuse, Microsoft has taken steps to prevent its success. In modern versions of Windows, if the ‘Secure Boot’ hardware verification feature is enabled, then the APPINIT_DLLs infrastructure is forcefully disabled. Many developers have found this to be strange, given that Secure Boot is intended to be a hardware verification mechanism, whereas APPINIT_DLLs is a purely software mechanism. Nevertheless, Secure Boot is an important security feature that we would recommend be enabled. Thus, to support these systems, we will need a different method for DLL injection.

4.4.2. “Create Remote Thread” Method.

The CreateRemoteThread function, exposed by the core Windows kernel library, allows one to create a thread in a remote process. This thread can run whatever function we like, as long as we can supply a pointer to that function. Here, one is likely to run into the complications of working with two different virtual address spaces. If we supply the new thread a function address as it is seen in our own address space, that pointer will be meaningless to the thread in the remote address space, and the process will likely encounter a segmentation fault.

Furthermore, any data needed by the thread function will need to exist in the target address space. If our goal is to make a remote process load a DLL, then data for the path of that DLL will have to exist somewhere in the target address space. All of the aforementioned hurdles can be overcome with a combination of the VirtualAllocEx and WriteProcessMemory functions, which are also exposed by the Windows kernel library. Together, they allow one to directly copy arbitrary data into a freshly allocated block of memory in a remote address space. With that, we have all of the pieces necessary to force a remote process to execute our code.

This method is fairly invasive, and this use of it is not explicitly documented or supported by Microsoft. Of all the methods we tested, this was the least reliable: it often caused target applications to crash for no obvious reason.

4.4.3. “Set Windows Hook” Method.

The SetWindowsHookEx function allows one to easily hook certain events in the message-handling process of programs. One is limited to a set of predefined hook procedure types⁸, but the variety in those types allows for full examination of the messages to and from any window. This can be accomplished by setting two of the hook types offered: WH_GETMESSAGE and WH_CALLWNDPROC. The former triggers one’s function any time the GetMessage (or PeekMessage) function is called to retrieve a message from a message queue. The GetMessage hook runs first, and may even modify the message before the calling application sees it. The latter triggers any time the SendMessage function is called. The CallWndProc hook runs before the destination window's main procedure handles the message; according to Microsoft's documentation, it may examine that message but not modify it.

When one calls SetWindowsHookEx, one must also specify the scope of the hook: it can be associated with a single thread, or set globally for every thread on the same desktop. If one is setting a hook that can be triggered by any process other than one’s own, then the hook procedure must exist in a DLL, not just in one’s own process’s address space. In this way, the SetWindowsHookEx function can initiate DLL injection. Any remote process which triggers the hook one has defined will automatically load one’s DLL, thereby executing whatever code is in its DllMain function.

8 https://msdn.microsoft.com/en-us/library/windows/desktop/ms644990(v=vs.85).aspx

With both WH_GETMESSAGE and WH_CALLWNDPROC hooks set, one can intercept every single message sent or posted to any window, and optionally modify the posted ones. Considering this ubiquity, efficiency in one’s hook procedures is critical. With a global hook, the hook function is likely being called thousands of times per second across all windows, so any slow-to-run code in it can quickly grind the computer’s responsiveness to a halt.

There is a key difference between this approach to DLL injection and, say, the APPINIT_DLLs method: the DLL is injected only after the remote application is initialized and running, at the point where it first triggers a message hook. In the APPINIT_DLLs case, by contrast, the DLL is loaded as the application is first initialized. If the DLL performs any significant initialization in its DllMain function, it may have noticeable ramifications for the performance of the remote process the first time it triggers the hook.

Additionally, this approach will not inject the DLL into any processes that do not trigger the hook. So, processes that do not have a window, and do not involve themselves with any message passing, will not be hooked.

4.4.4. Comparison of Methods.

Table I contains summarized pros and cons of the three aforementioned DLL-injection methods.

Table I. DLL Injection Methods

Method                  Pros                                           Cons
App Init DLLs           DLL loaded early, during process launch.       Blocked by Windows Secure Boot feature.
                        Well documented and supported by Microsoft.
Create Remote Thread    Low level, precise control.                    Invasive and unreliable.
                                                                       Difficult to debug.
Set Windows Hook        Well documented and supported by Microsoft.    DLL loaded late, after process triggers a message hook.
                                                                       Non-window processes not affected.

4.5. Virtual Address Space Limitations

In Section 4.4.2 we mentioned the issue of DLL-injected processes having different virtual address spaces than the ‘injector’ process. This is a reality that any DLL approach will have to deal with.

We would like a single dedicated ‘collector’ process to do as much of the processing of intercepted messages as possible, given the performance concerns inside of a hook function. However, if a SetWindowsHookEx hook procedure intercepts a message whose parameters are pointers to larger structs, then it cannot just forward the message as-is to another process. The hook procedure itself is executing in the remote process’s address space, so any pointers sent by value to a collector process will not point to the same data inside the collector’s virtual address space. Of course, the hook procedure can dereference the pointer itself and send the relevant contents of the struct by value to the collector, but it can only send a very limited amount of data with each message.

Many message types use one parameter to pass an HWND, which is a special Windows type for window identifiers. Fortunately, these can be shared by value between address spaces without issue. Many window messages use the long parameter, LPARAM, to communicate the HWND of the window that sent the message, since there is no other way for the recipient to know. Every existing window has a single, unique HWND. However, once a given window is destroyed, its HWND identifier may be reused for another window.

One other option for sharing data between DLL-injected processes and a collector, other than funneling data by value through multiple messages, is to use a shared data segment in the DLL. All injected processes, and the collector, would have access to this segment. Coordinating the data sharing requires some form of mutual exclusion implementation, because each process has concurrent access to the shared data segment. The need for mutual exclusion would potentially require the hook procedure to block other processes, and the performance impact of blocking in a hook procedure is severe.

In our implementation, as discussed in the following section, we were able to leverage the data parameters in window messages to share a limited amount of data between processes, thereby avoiding use of the shared data segment or any other IPC mechanism for per-message data. For our needs, this was sufficient. For higher volumes of data, however, we expect other methods would become more efficient.

5. IMPLEMENTATION

With due consideration for the options available for DLL injection, we decided upon the “Set Windows Hook” approach detailed in Section 4.4.3. While our goals could likely be satisfied by any of the aforementioned approaches, this one has the benefit of being built in to Windows. It is fairly well documented and supported by Microsoft, and requires no third-party tools of any kind.

We quickly realized that, although we could execute whatever code we pleased through the DLL injection, we ultimately would not need to. SetWindowsHookEx gives one visibility into every sent and posted message, without requiring one to manually hook functions by overwriting their addresses. For our purposes, we did not even desire the ability to modify messages in transit. Simply being able to monitor messages is enough, because we want to describe what a user is doing, not interfere. Every interaction a user has with GUIs is conveyed, in some form or another, by messages sent or posted. With visibility on every message achieved, our goal shifted to determining which of the more than 1000 message types were important to log, and how to properly interpret the data contained in those messages.

One caveat to this scheme is that, by not manually applying hooks, one is trusting that the operating system will faithfully call the hook procedures in every case it is supposed to. Furthermore, one does not know what other applications may call SetWindowsHookEx before or after. According to Microsoft's documentation, however, the most recently installed hooks are called first, which grants some control. We decided that, for the purposes of this project and paper, the necessary trust was a perfectly acceptable caveat.

5.1. Requirements of the Hook Procedure

When it came to actually writing the GetMessage and CallWndProc hook procedures, we found that the goal was to evenly balance two priorities:

(1) Exiting as quickly as possible, so as to avoid adding noticeable delay to the application we are hooking.
(2) Collecting enough relevant information about the intercepted message to form an accurate log of what actions the user made to cause it.

Doing the post-processing and logging inside the hook procedure itself was unaccept- able because that is too much work to force upon other applications every time a relevant message comes along. Additionally, the logistics of having many processes logging string data to the same source at indeterminate times are not simple. Instead, the hook procedure needs to quickly convey, to one central source, any relevant and already-in-memory data about the message, then promptly exit.

Somewhat amusingly, the quickest way to get that data from the hooked process over to our own collection process is by using Windows message queues. If our collector process creates some GUI window (whether or not that window is visible to the user), then it has a message queue assigned to it by the operating system. Then, the hook procedure can post messages to that queue. PostMessage is quick and non-blocking. Once the hook procedure posts messages to our queue, it can safely exit and return control to the intended application code, while in the background the operating system takes care of putting the data in the right place.

There is one concern about the viability of this approach: if we are hooking calls for retrieving posted messages, and calling PostMessage inside of the hook procedure, then there seems to be potential to accidentally recurse ad infinitum. Indeed, this is a mistake that will promptly crash the computer if executed (as we verified experimentally). However, it is not difficult to avoid. We register a new type of window message with the operating system, which returns to us a message type ID that is not associated with any other existing message type. Arbitrarily, we refer to the new message type as UWM_COLLECTOR. That custom message type is used to post data about intercepted messages from the hooked applications to the central collector. When the hook procedure is invoked because the central collector called GetMessage, it first checks whether the intercepted message is of type UWM_COLLECTOR, and exits immediately if so. This prevents a potential infinite loop.

The custom message type solves one problem, but comes with another: we can send only a very limited amount of data along with a message. When one of the hook procedures is entered, it has four pieces of information about an intercepted message. There is the message’s identifier, the identifier of the window to receive the message, and the two parameters whose purpose varies depending on the message type: the ‘word parameter’ and ‘long parameter’. All four pieces of information are important for the collector to accurately interpret the message. However, when we have the hook procedure send the collector a UWM_COLLECTOR type message to convey that information, we only have two spots to include data: the word and long parameters of our custom message. The four pieces of information we need to send will never fit into a single message. If we post two messages from the hook procedure to the collector, the collector may receive messages from other hooked applications in between the two. Since applications never explicitly know who posted a given message to them, the collector would have no automatic way to know which pair of posted messages came from the same source.

We resolve this conundrum with a novel scheme. One of the four pieces of data that the hook procedure needs to send is the identifier of the window which is supposed to receive the intercepted message. Window identifiers are globally unique, and are safe to send by value between virtual address spaces. So, we have the hook procedure send three UWM_COLLECTOR messages to the collector. Each message has two fields for us to fill in, and in the first field (the word parameter) we always put that globally unique recipient-window identifier. Then, in the three long parameters of the three custom messages, we put the other three pieces of data which the collector needs.

As the collector receives UWM_COLLECTOR messages, it maintains a mapping between the window identifier found in the word parameter, and a list of each long parameter value received. Once that list contains three entries for a single window identifier, the collector knows that it has received all of the information needed to describe a single intercepted message, and can clear that entry from the mapping before post-processing and logging about it. Now, even if the collector is receiving messages interleaved from several hooked applications at once, it can accurately reassemble the data.
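
The reassembly step can be modeled in a few lines. The following is a minimal, portable Python sketch rather than the Win32 code of the tool itself. The class name Reassembler, the method on_collector_message, the sample window identifiers, and the order in which the three values arrive (message identifier, then word parameter, then long parameter) are all assumptions made for illustration.

```python
from collections import defaultdict

PARTS_PER_MESSAGE = 3  # assumed order: message ID, word parameter, long parameter


class Reassembler:
    """Groups collector payloads by the recipient-window identifier."""

    def __init__(self):
        # Maps window identifier -> long-parameter values received so far.
        self.pending = defaultdict(list)

    def on_collector_message(self, hwnd, value):
        """Record one payload; return the completed record, or None if incomplete."""
        parts = self.pending[hwnd]
        parts.append(value)
        if len(parts) < PARTS_PER_MESSAGE:
            return None
        # Clear the entry before any post-processing or logging.
        del self.pending[hwnd]
        msg_id, wparam, lparam = parts
        return {"hwnd": hwnd, "msg": msg_id, "wparam": wparam, "lparam": lparam}


# Two hooked applications post their three payloads interleaved.
WM_LBUTTONDOWN, WM_CHAR = 0x0201, 0x0102
stream = [
    (0x203C4, WM_LBUTTONDOWN), (0x2030C, WM_CHAR),
    (0x203C4, 0x0005),         (0x2030C, ord("t")),
    (0x203C4, 0x00640032),     (0x2030C, 0x00140001),
]
collector = Reassembler()
completed = []
for hwnd, value in stream:
    record = collector.on_collector_message(hwnd, value)
    if record is not None:
        completed.append(record)
```

In this model, completed ends up holding one record per window even though the six payloads arrived interleaved. The sketch depends on the three posts for a given window arriving in the order they were posted.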

5.2. Process Summary

Fig. 5. Summarized Collector Implementation

Figure 5 illustrates each step of our tool's approach to intercepting and logging window messages, with brief descriptions as follows:

(1) The Collector loads our DLL into memory.
    (a) Calls a DLL-exported function to store its own HWND window identifier in the DLL shared memory segment.
    (b) Calls a DLL-exported function to store two arrays of UINT window message type identifiers, for the hook procedures to use as filters.
    (c) Calls a DLL-exported function to call SetWindowsHookEx, registering the hook procedures with the OS.
(2) A remote window process triggers either the GETMESSAGE or the CALLWNDPROC hook.
(3) The OS loads our DLL into the remote process's address space.
(4) The remote process executes the hook procedure in our DLL, providing it the message parameters before they are passed through.
(5) If the message type is found in the shared-memory filter array, then its parameters are split into three new messages and posted to the collector.
(6) Collector interprets the data, and queries the receiving window for data such as its process ID, executable path, and main window text.
(7) Collector logs event to local socket or file.
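
Steps (4) and (5) above can be sketched as follows, again as a portable Python model rather than the actual hook DLL. The numeric value given to UWM_COLLECTOR, the function name hook_procedure, and the post callback standing in for PostMessage are hypothetical; in the real tool the message ID is assigned by the operating system at run time.

```python
UWM_COLLECTOR = 0xC123   # stand-in value; the real ID is assigned by the OS
WM_MOUSEMOVE = 0x0200
WM_LBUTTONDOWN = 0x0201


def hook_procedure(hwnd, msg, wparam, lparam, message_filter, post):
    """Forward one intercepted message to the collector; return how many posts were made."""
    if msg == UWM_COLLECTOR:
        # Our own traffic: exit at once so the hook never reacts to itself.
        return 0
    if msg not in message_filter:
        # Not a message type we were asked to log.
        return 0
    # Every post carries the recipient window identifier in the word parameter.
    for value in (msg, wparam, lparam):
        post(UWM_COLLECTOR, hwnd, value)
    return 3


posted = []


def record_post(msg, wparam, lparam):
    """Stand-in for PostMessage to the collector window."""
    posted.append((msg, wparam, lparam))


message_filter = {WM_LBUTTONDOWN}
hook_procedure(0x203C4, WM_LBUTTONDOWN, 0x0005, 0x00640032, message_filter, record_post)
hook_procedure(0x203C4, WM_MOUSEMOVE, 0, 0, message_filter, record_post)
hook_procedure(0x10010, UWM_COLLECTOR, 0x203C4, WM_LBUTTONDOWN, message_filter, record_post)
```

Only the first call posts anything: the second is rejected by the filter, and the third is the guard against the hook re-triggering on its own UWM_COLLECTOR messages.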

6. RESULTS

We have evaluated the efficacy and performance of this tool in three ways. The first is a subjective measurement, attempting to qualify the high-level understandability of the logs we produce. Primarily, we would like to ensure that our output makes it obvious to a reader what the user was doing. The second measurement compares actual screenshots of application windows to the hierarchy inferred and recorded by the tool. This supplements the first evaluation by considering not only what actions the user was taking, but also the structure of the GUI they are using. The third is a numerical measurement of the performance overhead incurred by the computer in order to run the tool. This is important because performance oversights can easily bog the entire system down, given that the tool is intercepting critical messages from the operating system.

6.1. High-level Actions

For this part of the evaluation, we look at the different types of log the tool can produce. In most cases, the logs are clearly worded, and there is not much else to say about them. However, we faced ambiguities interpreting certain Windows messages, which are reflected in the tool's output. In these cases, we describe as best we can how we have seen the messages used. Each of the following subsections corresponds to a particular Windows message type we have chosen to intercept, describes how the message comes about, and what our logs look like.

6.1.1. WM_LBUTTONDOWN.

This message is posted to a window's message queue when the user left-clicks the mouse inside of that window. The message carries information about any modifier keys, or other mouse buttons, which were pressed during the left click.

Listing 1 contains an example of what the log looks like when a user left-clicks in a WordPad document, while also holding down the shift key and right mouse button.

Mouse click in window. Left mouse button down. Right mouse button down. Shift key down.
|Class Name: RICHEDIT50W
|Parent Text: Schedule.rtf - WordPad
|Parent Class Name: WordPadClass
|Exe Filename: wordpad.exe
|Exe Directory: C:\Program Files\Windows NT\Accessories

Listing 1. Log from an LBUTTONDOWN message
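
The first line of Listing 1 can be derived from the message's word parameter, which is a bit mask of the MK_* flags defined in the Windows headers. The sketch below is a hedged Python illustration, not the tool's code: the flag values are the documented ones, the function name describe_mouse_flags is hypothetical, and the wording for the Control key and middle button is assumed, since Listing 1 does not show those cases.

```python
# Modifier flags carried in the word parameter of mouse messages (winuser.h).
MK_FLAGS = [
    (0x0001, "Left mouse button down."),    # MK_LBUTTON
    (0x0002, "Right mouse button down."),   # MK_RBUTTON
    (0x0004, "Shift key down."),            # MK_SHIFT
    (0x0008, "Control key down."),          # MK_CONTROL (wording assumed)
    (0x0010, "Middle mouse button down."),  # MK_MBUTTON (wording assumed)
]


def describe_mouse_flags(wparam):
    """Return one sentence per flag bit that is set in wparam."""
    return " ".join(text for bit, text in MK_FLAGS if wparam & bit)


# Left click while holding the right button and Shift, as in Listing 1.
summary = describe_mouse_flags(0x0001 | 0x0002 | 0x0004)
```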

6.1.2. WM_CHAR.

This message is posted to a window's message queue when the user types a single character while that window has keyboard focus. The message carries information about which key was typed. These include all printable Unicode characters, including white-space, as well as keys like Backspace or Escape.

Listing 2 contains an example of what the log looks like when a user types “test” and presses enter in the search bar of the file explorer.

Character (t) typed in window.
|Class Name: DirectUIHWND
|Parent Class Name: SearchEditBoxWrapperClass
|Exe Filename: explorer.exe
|Exe Directory: C:\Windows
Character (e) typed in window.
...
Character (s) typed in window.
...
Character (t) typed in window.
...
Character (ENTER) typed in window.
...

Listing 2. Log from CHAR messages

6.1.3. WM_HOTKEY.

This message is posted to a window's message queue when the user issues a hot-key while that window has keyboard focus. The message carries information about what keys were involved. In this case, hot-key can mean any of the modifier keys (Shift, Control, Alt, Windows) held while another key is pressed. It can also refer to single-key “hotkeys” like the “PrintScreen” key.

Listing 3 contains an example of what the log looks like when a user presses Ctrl+P while in Microsoft Word.

Hotkey issued in window. CTRL held. Key: P.
|Text: Microsoft Word Document
|Class Name: _WwG
|Parent Text: Essay.docx
|Parent Class Name: _WwB
|Exe Filename: WINWORD.EXE
|Exe Directory: C:\Program Files (x86)\\root\Office16

Listing 3. Log from a HOTKEY message

6.1.4. BM_CLICK.

This message is sent to a window when that window is a button control (including radio/check boxes) to simulate a click.

From the official documentation⁹: “This message causes the button to receive the WM_LBUTTONDOWN and WM_LBUTTONUP messages, and the button’s parent window to receive a BN_CLICKED notification code [in a WM_COMMAND message].”

Listing 4 contains an example of what the log looks like when a user clicks a button labelled “Additional Settings” in a Control Panel settings page.

Button clicked.
|Text: A&dditional settings...
|Class Name: Button
|Parent Text: Formats
|Parent Class Name: #32770
|Exe Filename: rundll32.exe
|Exe Directory: C:\Windows\system32

Listing 4. Log from button CLICK message

6.1.5. BM_SETCHECK.

9 https://msdn.microsoft.com/en-us/library/windows/desktop/bb775985(v=vs.85).aspx

This message is sent to a window when that window is a button control (specifically, a radio/check box), in order to modify its state: checked or not.

Listing 5 contains an example of what the log looks like when a user unchecks a checkbox labelled “Use the desktop language bar when it's available” found in Control Panel language settings.

Radio button or checkbox unchecked.
|Text: Use the desktop language bar when it’s available
|Class Name: Button
|Parent Class Name: CtrlNotifySink
|Exe Filename: explorer.exe
|Exe Directory: C:\Windows

Listing 5. Log from button SETCHECK message

6.1.6. WM_CUT, WM_COPY, WM_PASTE.

It is worth grouping these three messages together, as they are all triggered by their familiar corresponding clipboard actions: cut, copy, and paste.

Listing 6 contains an example of what the logging looks like when a user uses cut, copy, and paste to modify some text in a WordPad document.

WM_CUT sent to window.
|Text: Hello, !
|Class Name: RICHEDIT50W
|Parent Text: Document - WordPad
|Parent Class Name: WordPadClass
|Exe Filename: wordpad.exe
|Exe Directory: C:\Program Files\Windows NT\Accessories
WM_COPY sent to window.
|Text: Hello, sir!
|Class Name: RICHEDIT50W
|Parent Text: Document - WordPad
|Parent Class Name: WordPadClass
|Exe Filename: wordpad.exe
|Exe Directory: C:\Program Files\Windows NT\Accessories
WM_PASTE sent to window.
|Text: Hello, sir! Hello, sir!
|Class Name: RICHEDIT50W
|Parent Text: Document - WordPad
|Parent Class Name: WordPadClass
|Exe Filename: wordpad.exe
|Exe Directory: C:\Program Files\Windows NT\Accessories

Listing 6. Log from CUT, COPY, and PASTE messages

6.1.7. WM_COMMAND.

This message is used in a variety of contexts. Directly from Microsoft's documentation¹⁰ this message is ‘sent when the user selects a command item from a menu, when a control sends a notification message to its parent window, or when an accelerator keystroke is translated.’

In more general terms: a window control, like a button, will notify its parent about events by sending a WM_COMMAND message. The message contains a control-specific notification code indicating the type of event. There are many possible event codes for each simple control. For example, the BN_CLICKED notification code indicates that a button was clicked, and the EN_CHANGE notification code indicates that the value of a text edit control has changed.
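
To make the decoding concrete: when a control sends WM_COMMAND, the high-order word of the word parameter holds the notification code and the low-order word holds the control identifier, while the long parameter holds the control's window handle. The Python sketch below illustrates this under stated assumptions. The function name decode_wm_command and the lookup table are hypothetical, and because notification codes are control-specific, the sketch assumes the collector has already looked up the class name of the sending control.

```python
BN_CLICKED = 0x0000  # button notification code
EN_CHANGE = 0x0300   # edit-control notification code

# (window class of the sending control, notification code) -> log line.
# Pairing BN_CLICKED with this particular log line is an assumption.
NOTIFICATIONS = {
    ("Button", BN_CLICKED): "Button clicked.",
    ("Edit", EN_CHANGE): "Edit control changed.",
}


def decode_wm_command(wparam, control_class):
    """Split the word parameter and look up a description for the event."""
    control_id = wparam & 0xFFFF     # low-order word
    code = (wparam >> 16) & 0xFFFF   # high-order word
    return control_id, code, NOTIFICATIONS.get((control_class, code))


# An edit control with (hypothetical) identifier 42 reports a text change.
result = decode_wm_command((EN_CHANGE << 16) | 42, "Edit")
```

The same numeric code can mean different things for different control classes, which is why the class name is part of the lookup key.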

Listing 7 contains an example of what the log looks like when a user changes the name of a file on their desktop.

10 https://msdn.microsoft.com/en-us/library/windows/desktop/ms647591(v=vs.85).aspx

Edit control changed.
|Text: .sxl
|Class Name: Edit
|Parent Text: FolderView
|Parent Class Name: SysListView32
|Exe Filename: explorer.exe
|Exe Directory: C:\Windows
Edit control changed.
|Text: t.sxl
...
Edit control changed.
|Text: te.sxl
...
Edit control changed.
|Text: tes.sxl
...
Edit control changed.
|Text: test.sxl
...

Listing 7. Log from COMMAND message with text edit notification

6.1.8. WM_INITMENUPOPUP and WM_UNINITMENUPOPUP.

The WM_INITMENUPOPUP message is sent to a window when a drop-down menu or submenu is about to become active. The high-order word of the long parameter contains a boolean indicating whether the menu being activated is the common window menu. If a window receives WM_INITMENUPOPUP, it will also receive WM_UNINITMENUPOPUP when that menu is destroyed.

Listing 8 contains an example of what the log looks like when the user opens up the main window menu in the Sublime Text 3 text editor, and then closes it.

System window menu activating.
|Text: untitled - Sublime Text (UNREGISTERED)
|Class Name: PX_WINDOW_CLASS
|Parent Class Name: #32769
|First Child Class Name: ScrollBar
|Exe Filename: sublime_text.exe
|Exe Directory: C:\Program Files\Sublime Text 3
Drop-down or submenu closed.
|Text: untitled - Sublime Text (UNREGISTERED)
...

Listing 8. Log from INITMENUPOPUP and UNINITMENUPOPUP messages

6.1.9. WM_INITDIALOG.

From the official documentation¹¹ this message is “sent to the dialog box procedure immediately before a dialog box is displayed.”

Listing 9 contains an example of what the log looks like when the user opens the ‘Properties’ dialog for an application on the task bar.

Dialog window initialized.
|Text: Visual Studio 2015 Properties
|Class Name: #32770
|Parent Class Name: #32769
|First Child Text: Shortcut
|First Child Class Name: #32770
|Exe Filename: dllhost.exe
|Exe Directory: C:\Windows\system32

Listing 9. Log from INITDIALOG message

6.1.10. WM_SYSCOMMAND.

From the official documentation¹² “A window receives this message when the user chooses a command from the Window menu (formerly known as the system or control menu) or when the user chooses the maximize button, minimize button, restore button, or close button.”

Listing 10 contains an example of what the log looks like when the user minimizes a file explorer window, and then closes it from the task bar. Note that no information is logged about the window after the close notification at the bottom. By the time our collector learns that the window's ‘X’ button was clicked to close it, the window has been destroyed.

11 https://msdn.microsoft.com/en-us/library/windows/desktop/ms645428(v=vs.85).aspx
12 https://msdn.microsoft.com/en-us/library/windows/desktop/ms646360(v=vs.85).aspx

Window minimized.
|Text: File Explorer
|Class Name: CabinetWClass
|Parent Class Name: #32769
|First Child Text: UIRibbonDockLeft
|First Child Class Name: UIRibbonCommandBarDock
|Exe Filename: explorer.exe
|Exe Directory: C:\Windows
Window closed with ’X’ button.

Listing 10. Log from SYSCOMMAND message
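
The kind of system command is encoded in the word parameter of WM_SYSCOMMAND. Microsoft's documentation notes that the four low-order bits are used internally by the system, so the value must be masked with 0xFFF0 before comparison. The Python sketch below illustrates that step. The SC_* values are the documented constants, but the function name describe_syscommand is hypothetical and the log lines for maximize and restore are assumed wording not shown in Listing 10.

```python
SC_DESCRIPTIONS = {
    0xF020: "Window minimized.",               # SC_MINIMIZE
    0xF030: "Window maximized.",               # SC_MAXIMIZE (wording assumed)
    0xF060: "Window closed with 'X' button.",  # SC_CLOSE
    0xF120: "Window restored.",                # SC_RESTORE (wording assumed)
}


def describe_syscommand(wparam):
    """Mask off the system-reserved low bits, then look up the command."""
    command = wparam & 0xFFF0
    return SC_DESCRIPTIONS.get(command)


# The low four bits vary in practice; masking makes the lookup reliable.
minimized = describe_syscommand(0xF022)
```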

6.1.11. WM_CONTEXTMENU.

This message is sent to a window to notify it that there has been a right-click in the window area. The idea is that a right-click is a request for some form of context menu. The receiving application is supposed to either display its own custom context menu, or pass the message along to the default window procedure to load a default context menu.

Listing 11 contains an example of what the log looks like when the user right clicks on their task bar.

Right click (requesting context menu) in window.
|Text: Running applications
|Class Name: MSTaskListWClass
|Parent Text: Running applications
|Parent Class Name: MSTaskSwWClass
|Exe Filename: explorer.exe
|Exe Directory: C:\Windows

Listing 11. Log from CONTEXTMENU message

6.1.12. WM_ENTERMENULOOP.

This message is sent to a window to notify it that a menu modal has been opened. A window that receives this should also receive a WM_EXITMENULOOP message.

Listing 12 contains an example of what the log looks like when the user right clicks on their desktop, opening a context menu.

Right click context menu opened in window.
|Class Name: SHELLDLL_DefView
|Parent Text: Program Manager
|Parent Class Name: Progman
|First Child Text: FolderView
|First Child Class Name: SysListView32
|Exe Filename: explorer.exe
|Exe Directory: C:\Windows

Listing 12. Log from ENTERMENULOOP message

6.1.13. WM_MENUSELECT.

This message is sent to a menu’s owner window when a menu item has been selected. Note that in this case ‘selected’ can mean moused-over, not necessarily clicked and activated. This makes it unclear in our logs what menu item is ultimately chosen. Our best interpretation thus far is to say that whichever item received the most recent WM_MENUSELECT before a WM_EXITMENULOOP was the one chosen. This has given accurate results, but can hopefully be improved.

Listing 13 contains an example of what the log looks like when the user moves their mouse across a menu modal and clicks “New Folder” (a continuation of the previous section's scenario).

Mouse is over menu item.
|Item Text: La&rge icons
|Flags: Item is highlighted. Item is selected with mouse.
Mouse is over menu item.
|Item Text: R&efresh
|Flags: Item is highlighted. Item is selected with mouse.
Mouse is over menu item.
|Item Text: &Paste
|Flags: Item is grayed. Item is highlighted. Item is selected with mouse.
Mouse is over menu item.
|Item Text: Paste &shortcut
|Flags: Item is grayed. Item is highlighted. Item is selected with mouse.
Mouse is over menu item.
|Item Text: &Folder
|Flags: Item is highlighted. Item is selected with mouse.

Listing 13. Log from MENUSELECT message
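
The interpretation described in this section, taking the most recent WM_MENUSELECT before WM_EXITMENULOOP as the chosen item, can be expressed as a small pass over the logged events. The Python sketch below is only an illustration of that heuristic. The function name infer_chosen_items and the event tuples are hypothetical, and skipping WM_MENUSELECT entries that carry no item text (such as a notification that the menu has closed) is an added assumption.

```python
def infer_chosen_items(events):
    """events: (message_name, item_text_or_None) pairs in arrival order."""
    chosen = []
    last_selected = None
    for name, item_text in events:
        if name == "WM_ENTERMENULOOP":
            last_selected = None
        elif name == "WM_MENUSELECT" and item_text is not None:
            # Remember the item most recently highlighted.
            last_selected = item_text
        elif name == "WM_EXITMENULOOP":
            # Best guess: the last highlighted item was the one chosen.
            chosen.append(last_selected)
            last_selected = None
    return chosen


# The sequence from Listing 13, bracketed by the menu-loop messages.
events = [
    ("WM_ENTERMENULOOP", None),
    ("WM_MENUSELECT", "La&rge icons"),
    ("WM_MENUSELECT", "R&efresh"),
    ("WM_MENUSELECT", "&Paste"),
    ("WM_MENUSELECT", "Paste &shortcut"),
    ("WM_MENUSELECT", "&Folder"),
    ("WM_EXITMENULOOP", None),
]
chosen = infer_chosen_items(events)
```

The heuristic cannot tell a real choice from a menu that was dismissed while an item happened to be highlighted; both produce the same sequence of messages in this model.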

6.1.14. WM_EXITMENULOOP.

This message is sent to a window to notify it that a menu modal has been closed.

Listing 14 contains an example of what the log looks like after the user selects a menu item (following on the scenario from the previous two sections).

Right click context menu closed in window.
|Class Name: SHELLDLL_DefView
|Parent Text: Program Manager
|Parent Class Name: Progman
|First Child Text: FolderView
|First Child Class Name: SysListView32
|Exe Filename: explorer.exe
|Exe Directory: C:\Windows

Listing 14. Log from EXITMENULOOP message

6.1.15. WM_SETTEXT.

This message is sent to a window to change its ‘main’ text. The effect of this message varies depending on what class of window is receiving it.

Listing 15 contains an example of what the log looks like after the user selects “New Folder” from a right-click menu on the desktop (following on the previous sections’ scenarios), when a new folder actually appears on the desktop.

Window text updated.
|Text: New folder
|Class Name: Edit
|Parent Text: FolderView
|Parent Class Name: SysListView32
|Exe Filename: explorer.exe
|Exe Directory: C:\Windows

Listing 15. Log from SETTEXT message

6.1.16. WM_ACTIVATEAPP.

From the official documentation¹³ this message is ‘sent when a window belonging to a different application than the active window is about to be activated. The message is sent to the application whose window is being activated and to the application whose window is being deactivated.’

13 https://msdn.microsoft.com/en-us/library/windows/desktop/ms632614(v=vs.85).aspx

Listing 16 contains an example of what the log looks like after the user switches from a Sublime Text window to the Desktop.

Window activated.
|Text: Program Manager
|Class Name: Progman
|Parent Class Name: #32769
|First Child Class Name: SHELLDLL_DefView
|Exe Filename: explorer.exe
|Exe Directory: C:\Windows
Window deactivated.
|Text: untitled - Sublime Text (UNREGISTERED)
|Class Name: PX_WINDOW_CLASS
|Parent Class Name: #32769
|First Child Class Name: ScrollBar
|Exe Filename: sublime_text.exe
|Exe Directory: C:\Program Files\Sublime Text 3

Listing 16. Log from ACTIVATEAPP message

6.2. GUI Structure Mapping

For this part of the evaluation, we compare screenshots of application windows to a textual representation of their structure, which was queried and logged by the tool.

6.2.1. Example 1.

Figure 6 contains an image of a dialog window opened from the Windows Control Panel, used to modify regional date/time settings. The following output, in Listing 17, is the tool's representation of the hierarchy of windows rendered.

[HWND: 203C4] [Classname: #32770] [Text: Region]
|-> [HWND: 2041C] [Classname: #32770] [Text: Formats]
|-> [HWND: 20416] [Classname: Static] [Text: &Format: English (United States)]
|-> [HWND: 20418] [Classname: ComboBox] [Text: Match Windows display language (recommended)]
|-> [HWND: 20414] [Classname: SysLink] [Text: Change sorting method]
|-> [HWND: 20408] [Classname: SysLink] [Text: Language preferences]
|-> [HWND: 2040A] [Classname: Button] [Text: Date and time formats]
|-> [HWND: 2040C] [Classname: Static] [Text: &Short date:]
|-> [HWND: 2040E] [Classname: ComboBox] [Text: M/d/yyyy]
|-> [HWND: 6005C] [Classname: Static] [Text: &Long date:]
|-> [HWND: 403E6] [Classname: ComboBox] [Text: dddd, MMMM d, yyyy]
|-> [HWND: 203D4] [Classname: Static] [Text: S&hort time:]
|-> [HWND: 403EC] [Classname: ComboBox] [Text: h:mm tt]
|-> [HWND: 503F0] [Classname: Static] [Text: L&ong time:]
|-> [HWND: 403EA] [Classname: ComboBox] [Text: h:mm:ss tt]
|-> [HWND: 203C8] [Classname: Static] [Text: First day of &week:]
|-> [HWND: 203CA] [Classname: ComboBox] [Text: Sunday]
|-> [HWND: 50058] [Classname: Button] [Text: Examples]
|-> [HWND: 50188] [Classname: Static] [Text: Short date:]
|-> [HWND: 8029A] [Classname: Static] [Text: 2 / 2 6 / 2 0 1 7 ]
|-> [HWND: 50208] [Classname: Static] [Text: Long date:]
|-> [HWND: 203F8] [Classname: Static] [Text: S u n d a y , F e b r u a r y 2 6 , 2 0 1 7 ]
|-> [HWND: 203FA] [Classname: Static] [Text: Short time:]
|-> [HWND: 203DE] [Classname: Static] [Text: 7:45 PM]
|-> [HWND: 203E0] [Classname: Static] [Text: Long time:]
|-> [HWND: 203E2] [Classname: Static] [Text: 7:45:53 PM]
|-> [HWND: 203E4] [Classname: Button] [Text: A&dditional settings...]
|-> [HWND: 303E8] [Classname: Button] [Text: OK]
|-> [HWND: 203FC] [Classname: Button] [Text: Cancel]
|-> [HWND: 20422] [Classname: Button] [Text: &Apply]
|-> [HWND: 20420] [Classname: Button] [Text: Help]
|-> [HWND: 2041E] [Classname: SysTabControl32] [Text: NONE]

Listing 17. Date/Time Control Panel Settings

Here we can see the root of the hierarchy is a window with the text “Region”, correctly lining up with the title bar of the window we are considering. Every other window element logged is shown as a direct child of that root element. Intuitively, this may seem incorrect. A natural guess might be to say that the top-level window, “Region”, has three children, one for each tab, and the rest of the components are children of those tab components.

However, in this case the tool's logging is correct. Tab control windows do not have their inner controls as children. They are all siblings, and when the selected tab is changed some windows switch from visible to hidden and vice versa.

Beyond that confusion, each component in the window is represented accurately. We can see windows with the class “Static” where there is unmodifiable text in the window, and “ComboBox” windows for each of the drop-down menus seen.
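As a rough illustration of how a consumer might read this output, the following Python sketch parses log lines in the bracketed format shown in Listing 17 and tallies the window classes. The regular expression and the names `parse_entry` and `count_classes` are assumptions made for this example; they are not part of the tool.

```python
import re
from collections import Counter

# Assumed line format, taken from the listing:
#   [|-> ][HWND: <hex>] [Classname: <name>] [Text: <caption>]
ENTRY = re.compile(
    r"^(?P<child>\|->\s*)?"
    r"\[HWND: (?P<hwnd>[0-9A-Fa-f]+)\] "
    r"\[Classname: (?P<cls>[^\]]+)\] "
    r"\[Text: (?P<text>.*)\]\s*$"
)

def parse_entry(line):
    """Return a dict describing one logged window, or None if the line does not match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    return {
        "is_child": match.group("child") is not None,
        "hwnd": int(match.group("hwnd"), 16),  # handles are logged in hexadecimal
        "classname": match.group("cls"),
        "text": match.group("text"),
    }

def count_classes(lines):
    """Count how many logged windows belong to each window class."""
    entries = (parse_entry(line) for line in lines)
    return Counter(e["classname"] for e in entries if e is not None)

sample = [
    "|-> [HWND: 2040C] [Classname: Static] [Text: &Short date:]",
    "|-> [HWND: 2040E] [Classname: ComboBox] [Text: M/d/yyyy]",
    "|-> [HWND: 6005C] [Classname: Static] [Text: &Long date:]",
    "|-> [HWND: 2041E] [Classname: SysTabControl32] [Text: NONE]",
]
class_counts = count_classes(sample)
```

On the four sample lines, the tally contains two “Static” windows, one “ComboBox”, and one “SysTabControl32”, and every entry is flagged as a child of the root.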

6.2.2. Example 2.

Fig. 6. Date/time settings window

Fig. 7. Chrome Browser Window

Figure 7 contains an image of the Google Chrome web browser, opened to Stack Overflow. The following output is the tool's representation of the hierarchy of windows rendered.

[HWND: 2030C] [Classname: Chrome_WidgetWin_1] [Text: Stack Overflow - Google Chrome]
|-> [HWND: 20352] [Classname: Chrome_RenderWidgetHostHWND] [Text: Chrome Legacy Window]
|-> [HWND: 2031C] [Classname: Intermediate D3D Window] [Text: NONE]

Listing 18. Chrome Browser Window

Here we see that there are, seemingly, very few windows composing the hierarchy, especially compared to the previous example. This is despite the fact that a web browser ought to contain plenty of text, buttons, checkboxes, and the like. In this case, the problem is that modern web browsers are unlikely to build their interfaces from standard Windows controls. This is quite reasonable, given the dynamic nature of the content a browser must render. But it presents a clear limitation of our approach: we cannot say much about the structure or content of the GUI if it is a web browser.

Nonetheless, we do at least see the title of the active webpage as the caption of the top-level window. Moreover, while we may not know much about the structure and content of the GUI in a web browser, we do know a few things about how the user is interacting with it. Like any GUI application, a web browser still receives window messages from the operating system to learn about keyboard and mouse inputs, resize events, and more. Thus, the message logging done by the tool is still able to provide useful insight.

6.3. Performance Overhead

To evaluate the performance overhead, we set up two different testing scenarios: one for evaluating the GetMessage hook procedure, and one for the CallWndProc hook procedure. Although both hook procedures take nearly identical steps, we found that sent messages are processed significantly faster than posted messages. Thus, it would be misleading to compare the two directly.

Additionally, we found that our time measurements were not precise enough to detect a change in time before and after a single GetMessage or SendMessage function call. So in either case, we measure the time before and after many function calls, not just one.

All of our tests were run inside a VirtualBox virtual machine. The machine had 8 GB of memory, and two cores of a 2.9 GHz Intel Core i5 processor. The machine was created from an official Microsoft Windows 10 development image14.

6.3.1. GetMessage Hook Performance.

In order to evaluate the GetMessage hook procedure, we set up a simple GUI application (the receiver), with a main loop that records a ‘start time’ when it receives its first message of type UWM_EVAL, which is a custom message type that we register just for these tests. The receiver counts how many UWM_EVAL messages it has received, and records an ‘end time’ after 100 have arrived. Then it logs the difference, in milliseconds, between the start and end times.

14 https://developer.microsoft.com/en-us/windows/downloads/virtual-machines
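The receiver's bookkeeping can be sketched as follows. This is a platform-independent Python model, not the test application itself: the `Receiver` class, the injected `clock` callable, and the string `"UWM_EVAL"` standing in for the registered message identifier are all assumptions made for illustration.

```python
class Receiver:
    """Times how long it takes for a fixed number of evaluation messages to arrive."""

    def __init__(self, clock, target=100):
        self.clock = clock        # callable returning the current time in milliseconds
        self.target = target      # messages per trial (100 in the GetMessage test)
        self.count = 0
        self.start_ms = None
        self.logs = []            # one elapsed-time entry per completed trial

    def on_message(self, msg_type):
        # Only the custom evaluation message takes part in the measurement.
        if msg_type != "UWM_EVAL":
            return
        if self.count == 0:
            self.start_ms = self.clock()   # 'start time' at the first message
        self.count += 1
        if self.count == self.target:
            # 'end time' once the whole batch has arrived; log the difference.
            self.logs.append(self.clock() - self.start_ms)
            self.count = 0                 # ready for the next batch

# Usage with a fake clock that reports 1000 ms, then 1557 ms.
fake_times = iter([1000, 1557])
receiver = Receiver(clock=lambda: next(fake_times))
receiver.on_message("WM_PAINT")            # ignored: not an evaluation message
for _ in range(100):
    receiver.on_message("UWM_EVAL")
```

Reading the clock only at the first and last message of a batch keeps the measurement itself from adding noticeable cost to each message.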

We create a second simple GUI application (the trigger) with a button we can click to trigger sending UWM_EVAL messages to the receiver window. The button causes a batch of 100 messages in a row to be sent, repeated 50 times. There is a small delay between each batch because the maximum default queue size is 10,000; if we overflow the queue, messages are lost, leading to incomplete measurements.

We collect the 50 time logs from the receiver once. The data are plotted as cumulative distribution functions in Figures 8 (32 bit) and 9 (64 bit). Note that, to make the differences more easily visible, the X axes begin at 500 rather than 0.

Table II contains averages of the data, including what is ultimately the most descriptive measurement: the average amount of time added by the hook for a single GetMessage call, which is 47.4 microseconds for 32 bit and 51.4 microseconds for 64 bit.
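To illustrate why the delay between batches matters, here is a small Python model of a bounded message queue that silently drops messages once it is full. The `MessageQueue` class and `run_batches` function are hypothetical, and the demonstration uses a deliberately small limit of 250 instead of the 10,000 mentioned above so that the effect is visible with only 50 batches.

```python
from collections import deque

class MessageQueue:
    """A bounded queue that drops new messages once the limit is reached."""

    def __init__(self, limit=10_000):
        self.limit = limit
        self.pending = deque()
        self.dropped = 0

    def post(self, msg):
        if len(self.pending) >= self.limit:
            self.dropped += 1      # the message is lost, as in an overflowing queue
            return False
        self.pending.append(msg)
        return True

    def drain(self):
        """Model the receiver processing everything currently queued."""
        processed = len(self.pending)
        self.pending.clear()
        return processed

def run_batches(queue, batches=50, batch_size=100, pause_between=True):
    """Post the batches; a pause gives the receiver time to drain the queue."""
    delivered = 0
    for _ in range(batches):
        for _ in range(batch_size):
            queue.post("UWM_EVAL")
        if pause_between:
            delivered += queue.drain()
    return delivered + queue.drain()

# Small illustrative limit (an assumption for the demo, not the Windows default).
paused = MessageQueue(limit=250)
delivered_with_pause = run_batches(paused, pause_between=True)

rushed = MessageQueue(limit=250)
delivered_without_pause = run_batches(rushed, pause_between=False)
```

With pauses, all 5,000 messages are delivered; without them, only the first 250 fit and the remaining 4,750 are dropped, which would leave most trials without a complete batch to time.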

Fig. 8. 32-bit GetMessage hook performance

Fig. 9. 64-bit GetMessage hook performance

Table II. GetMessage Performance

Architecture   Avg Duration          Avg Duration       Difference   Messages    Time Added
               without Hook (ms)     with Hook (ms)     (ms)         per Trial   per Message (ms)
32             556.9                 561.64             4.74         100         0.0474
64             556.84                561.98             5.14         100         0.0514

6.3.2. CallWndProc Hook Performance.

In order to evaluate the CallWndProc hook procedure, we set up a receiver/trigger pair of applications very similar to the pair used in the GetMessage evaluation. Since we are measuring the time taken for SendMessage function calls in this case, it is the trigger that must record before/after times, not the receiver. Additionally, in this case we measure the time taken to send 250 messages, rather than 100. This number was chosen somewhat arbitrarily, based on the fact that we saw GetMessage function calls taking about 2.5x longer than SendMessage calls.

The 50 data points each for the 32 and 64 bit trials are plotted as cumulative distribution functions in Figures 10 (32 bit) and 11 (64 bit). Note that, to make the differences more easily visible, the X axes begin at 300 rather than 0. This is different from the GetMessage plots, so the resolution of the plots differs.

Table III contains averages of the data, including what is ultimately the most descriptive measurement: the average amount of time added by the hook for a single SendMessage call, which is 68.4 microseconds for 32 bit and 196.9 microseconds for 64 bit.

Fig. 10. 32-bit CallWndProc hook performance

Fig. 11. 64-bit CallWndProc hook performance

Table III. CallWndProc Performance

Architecture   Avg Duration          Avg Duration       Difference   Messages    Time Added
               without Hook (ms)     with Hook (ms)     (ms)         per Trial   per Message (ms)
32             451.82                468.92             17.1         250         0.0684
64             469.46                518.56             49.1         250         0.1969
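The per-message figures in Tables II and III follow from dividing the difference in average trial duration by the number of messages per trial. The short Python sketch below reproduces that arithmetic from the tabulated averages; the function name `added_per_message_us` is ours.

```python
def added_per_message_us(without_hook_ms, with_hook_ms, messages_per_trial):
    """Average time added by the hook per message, in microseconds."""
    difference_ms = with_hook_ms - without_hook_ms
    return difference_ms / messages_per_trial * 1000.0

# (hook type, architecture) -> (avg without hook, avg with hook, messages per trial)
rows = {
    ("GetMessage", 32): (556.90, 561.64, 100),
    ("GetMessage", 64): (556.84, 561.98, 100),
    ("CallWndProc", 32): (451.82, 468.92, 250),
    ("CallWndProc", 64): (469.46, 518.56, 250),
}

overheads = {key: round(added_per_message_us(*values), 1) for key, values in rows.items()}
```

From the rounded averages this gives 47.4, 51.4, and 68.4 microseconds for the first three rows, matching the tables. The 64-bit CallWndProc row evaluates to about 196.4 microseconds, slightly below the 196.9 reported, a small gap that cannot be resolved from the published numbers alone.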

7. DISCUSSION

Here we reflect, after the fact, on our decisions in developing this system and on the avenues left open for future work.

7.1. Performance Gains

Our approach has a small performance impact on a computer running it. Certainly, we would like to see the microseconds of delay shortened to nanoseconds. However, in moderate usage, for example editing documents, sending emails, or browsing the web, the performance overhead is not noticeable to the user. Typical interactions with GUIs do not feel slower to a user.

While we were sensitive to the performance concerns of this tool, they were not our primary concern. For us, the most important requirement was that the tool produce good data, so the broad majority of our development hours were spent towards that end. There is room for much more thorough optimization. Given our lack of focused effort in that regard, there are many ‘low-hanging fruit’ changes that could substantially improve performance.

For one example, every time a message is intercepted the collector queries the destination window for several pieces of data. Some of these data, like the process ID or executable path, are unlikely to change over a short to medium period of time. If 100 messages are intercepted from one window in short succession, then 99 redundant calls are made to copy these data into new local buffers. Some form of hash map associating HWND window identifiers with such data would allow a marked reduction in unnecessary operations. It is even possible to know exactly when an entry should be added to or evicted from the map, since WM_CREATE messages will be seen for any new window, and WM_DESTROY messages will be seen for any window being destroyed.
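A minimal sketch of such a cache is given below in Python, as a model of the idea and not as code from the collector. The `WindowInfoCache` class and the `query` callable (standing in for the expensive per-window lookups) are hypothetical; the message constants use the standard Windows values.

```python
WM_CREATE = 0x0001
WM_DESTROY = 0x0002
WM_PAINT = 0x000F

class WindowInfoCache:
    """Caches slow-changing per-window data, keyed by HWND."""

    def __init__(self, query):
        self._query = query     # hypothetical expensive lookup: hwnd -> info
        self._entries = {}

    def __contains__(self, hwnd):
        return hwnd in self._entries

    def lookup(self, hwnd, msg):
        """Return the info for the destination window of an intercepted message."""
        if msg == WM_CREATE:
            # A new window: discard anything stale left under a reused handle.
            self._entries.pop(hwnd, None)
        info = self._entries.get(hwnd)
        if info is None:
            # Cache miss, e.g. a window that existed before the hook was installed.
            info = self._query(hwnd)
            self._entries[hwnd] = info
        if msg == WM_DESTROY:
            # The window is going away, so its entry can be evicted.
            del self._entries[hwnd]
        return info

# Usage: 100 messages to one window trigger a single expensive query.
query_log = []

def fake_query(hwnd):
    query_log.append(hwnd)
    return {"pid": 1234, "exe": "example.exe"}   # placeholder data

cache = WindowInfoCache(fake_query)
for _ in range(100):
    cache.lookup(0x2040A, WM_PAINT)
queries_for_100_messages = len(query_log)
```

The fallback query on a cache miss is a design choice for this sketch: it covers windows created before the hook was installed, for which no WM_CREATE message would have been observed.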

The obvious oversights and potential improvements in our tool’s performance mean that our evaluations of time-added represent an absolute worst-case scenario. In its current state, the overhead is not noticeable with moderate GUI usage. For a worst-case scenario, that is an encouraging place to start.

7.2. Trust Dependence

One open question was mentioned in Section 5 regarding other applications setting their own hooks after our own. Windows will call the most recently installed hook procedure first, and it is up to that hook procedure to call CallNextHookEx in order to pass the message to the next hook procedure. Each hook procedure may potentially modify the message contents before passing them to the next hook, or it may choose not to pass the message to the next hook at all. In either case, this stops us from being able to guarantee the integrity of the data we produce. Ideally, to avoid this we would ensure that our hook procedure is the most recently installed, and re-install it otherwise. Windows does offer a WH_DEBUG hook type for SetWindowsHookEx, specifically for getting information about other hook procedures. We have not explored it, but it does seem to allow one to know whether one has the most recent hook procedure or not. That being said, we do not have an obvious answer for what to do if another hook is installed afterwards. Are there user-facing applications which somehow require having the most recent hook? If so, we would not want to break them. Worse, what if we respond to new hooks by reinstalling our hook on top, and another application does the same? Then we have forced an infinite hooking loop, undoubtedly handicapping the computer’s performance if not crashing it entirely.
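The ordering problem can be made concrete with a small Python model of a hook chain. This is an abstraction for illustration only: the `HookChain` class and the hook functions are hypothetical, and the `call_next` argument stands in for the role that CallNextHookEx plays in a real hook procedure.

```python
class HookChain:
    """Models a chain in which the most recently installed hook runs first."""

    def __init__(self):
        self._hooks = []

    def install(self, hook):
        self._hooks.insert(0, hook)   # newest hook goes to the front of the chain

    def dispatch(self, message):
        return self._call(0, message)

    def _call(self, index, message):
        if index >= len(self._hooks):
            return message            # end of chain: message reaches its destination
        def call_next(next_message):
            return self._call(index + 1, next_message)
        return self._hooks[index](message, call_next)

def make_logger(seen):
    """Our hook: record the message, then pass it along unchanged."""
    def logger(message, call_next):
        seen.append(message)
        return call_next(message)
    return logger

def tampering_hook(message, call_next):
    # A later hook that rewrites the message before passing it on.
    return call_next(message.upper())

def swallowing_hook(message, call_next):
    # A later hook that never calls the next hook in the chain.
    return message

seen = []
chain = HookChain()
chain.install(make_logger(seen))
chain.dispatch("click ok")            # logged exactly as sent

chain.install(tampering_hook)
chain.dispatch("click cancel")        # logger now sees the modified message

chain.install(swallowing_hook)
chain.dispatch("click apply")         # logger never sees this message
```

After the three dispatches, the logger has recorded the first message intact, the second in modified form, and the third not at all, which is exactly the loss of integrity described above.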

7.3. Formalization

The data produced does not conform to any formal model. In its present state, although readability is high, our approach could use some rigor. We discussed some advantages of formality in Section 3.5. As a proof of concept, it was beyond the scope of the project for us to pursue such a model. That model would need to adapt as we discovered new messages worth intercepting, or new ways to interpret messages. This exploration would need to be performed before our system was deployed in a production security system. At that point, it would be critical to specify a level of certainty regarding the actions being reported.

7.4. Technical Features

Presently, one must launch a 32 bit instance of our collector to intercept messages for 32 bit processes, and a 64 bit instance for 64 bit processes. For full coverage on a 64 bit version of Windows, this means running two separate collectors simultaneously. This is a mild, yet avoidable inconvenience. From a technical standpoint, it is possible to have the 64 bit instance launch a 32 bit instance as a child process and use some form of inter-process communication to control it. It would then be possible to synchronize their efforts, and apply hooks to both 64 and 32 bit processes without needing to operate two collectors.

Our tool’s error handling leaves room for improvement. Unforeseen crashes (typically due to mistakes in custom message handler functions actively being developed) can fail to clean up, and leave zombie hooks in the system. The only way to unset the zombie hook is with a system reboot, and until then the hook procedure needlessly forwards messages to a nonexistent window.
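One possible direction, sketched here in Python purely as a model, is to tie hook removal to a cleanup path that runs even when a handler fails, and to isolate handler errors so they cannot bring down the collector. The names `installed_hook` and `guarded` are hypothetical, and the `install` and `uninstall` callables stand in for the real hook installation and removal calls. This only covers failures that the process can catch; a hard crash or forced termination would still bypass it.

```python
from contextlib import contextmanager

@contextmanager
def installed_hook(install, uninstall):
    """Install a hook and guarantee removal when the block exits, even on error."""
    handle = install()
    try:
        yield handle
    finally:
        uninstall(handle)

def guarded(handler, errors):
    """Wrap a custom message handler so that its exceptions are recorded, not fatal."""
    def wrapper(message):
        try:
            handler(message)
        except Exception as exc:
            errors.append((message, repr(exc)))
    return wrapper

# Usage with stand-in install/uninstall functions.
active_hooks = set()

def fake_install():
    active_hooks.add("hook-1")
    return "hook-1"

def fake_uninstall(handle):
    active_hooks.discard(handle)

def buggy_handler(message):
    raise ValueError("mistake in a handler under development")

errors = []
with installed_hook(fake_install, fake_uninstall):
    guarded(buggy_handler, errors)("WM_PAINT")
```

After the block exits, no hook remains registered and the handler's failure is available in `errors` for later inspection.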

8. CONCLUSION

Given the accuracy and understandability of the information produced, and the prototypical nature of this project, the performance overhead seen in our implementation is reasonable. Many high-level user actions can be easily discerned from our logging, and the hierarchical structure of windows being used can be perfectly conveyed. Any system consuming our data will gain valuable, and previously unobtainable, security context about the actions of that host.

Although there are certainly performance and quality of life improvements to be made throughout our proof of concept, the underlying scheme for intercepting and interpreting messages is demonstrably successful. The idea of relaying GUI activity for insight is not, to our knowledge, being applied in any existing security tools. Nor is our particular approach to efficiently monitoring and interpreting messages system-wide.

By no means is our logging complete. There are many more messages that would be worth intercepting, and many more potential steps for improving the accuracy of interpretation of the messages we do currently intercept. However, our implementation serves as a clear proof of concept that it is very possible to efficiently extract high-level details about a user's actions by monitoring window messages.

Our tool offers a straightforward means to specify new message types for global interception, and to supply handlers for those messages. As such, it should serve as a helpful platform and guide for further exploration into a unique security solution.

ACKNOWLEDGMENTS

The author would like to acknowledge contributions from the following people. Without their efforts, this project would not have been possible.

- Craig Shue, Associate Professor at Worcester Polytechnic Institute and the project advisor. This project is his brainchild. His guidance kept us away from rabbit-holes and focused on the bigger picture.
- Curtis Taylor, Computer Science PhD candidate at Worcester Polytechnic Institute. His ongoing work with a Linux-based yet similar system served as the starting point for this project. His insights have several times helped to guide our efforts to success.
- Kane, P. R., Luli, K., Muchene, D. and Li, B. Z. WPI alumni whose MQP projects were prequels to this one.
