A Highly Immersive Approach to Teaching Reverse Engineering Golden G
Total Page:16
File Type:pdf, Size:1020Kb
A Highly Immersive Approach to Teaching Reverse Engineering Golden G. Richard III Department of Computer Science University of New Orleans New Orleans, LA 70148 Email: [email protected] Abstract most operating systems courses, but for reverse engi- neering, the devil really is in the details. For example, While short training courses in reverse engineering are to understand an offensive technique like Shadow frequently offered at meetings like Blackhat and Walker [13], which relies on de-synchronization of the through training organizations such as SANS, there are data and instruction TLBs in Intel’s split-TLB design virtually no reverse engineering courses offered in aca- to hide code and data, it’s not enough to have seen a demia. This paper discusses possible reasons for this Powerpoint slide depicting the abstract functionality of situation, emphasizes the importance of teaching re- a TLB—that’s a good start, but both more details and verse engineering (and applied computer security edu- hands-on experience are needed. Furthermore, assem- cation in general), and presents the overall design of a bler language courses, if they exist at all as independ- semester-long course in reverse engineering malware, ent courses in a modern computing curriculum, tend to recently offered by the author at the University of New be much weaker than in the past, often emphasizing the Orleans. use of High Level Assembler (HLA) [4] and develop- 1 Introduction ment of toy applications. In many curricula, the as- sembler language course of old has been folded into the Reverse engineering of software involves detailed undergraduate architecture class. This is understand- analysis of the low-level structure and run-time effects able to some degree because of the limited use of as- of a software component. The component under inves- sembler in general-purpose computing, but is a chal- tigation might be an entire application, a kernel mod- lenging issue facing someone developing a reverse ule, a software patch, or a single function. Reverse engineering course. engineering is often applied directly to application bi- naries (or bytecode, in the case of applications written When the author began development of the reverse in interpreted languages), in the absence of available engineering course described in this paper, he per- source code. Reverse engineering has many uses, in- formed a routine search for other reverse engineering cluding facilitating software interoperability, evalua- courses—what are others doing? What tools are they tions of the effects of security patches, security audit- using? How are they managing to address all of the ing (e.g., determining if the effects of an application necessary skills in a single semester? The search are malicious or unwanted), and enhancing software yielded little. There appeared to be almost no reverse functionality or performance. It might also be per- engineering courses being taught at all in academia. formed out of simple curiosity, to understand how a Why? Student disinterest can’t be the reason—the software component works, or to “crack” software to author’s own students were wildly enthusiastic, despite defeat copy protection techniques. Reverse engineering dire warnings of severe mental pain, lost sleep, and also plays a critical role in the deep understanding and nightmares involving segmentation and instruction attempted mitigation of malicious software and this prefetch caches. Why are there almost no hardcore paper focuses on that role. reverse engineering courses in academia? One possi- bility is that there is a perception that necessary skills The skills needed for effective reverse engineering are can’t be developed in a single semester. While only diverse and some are not emphasized in current- additional experience beyond a single course can fully generation computing curricula. Reverse engineering develop a student’s reverse engineering skills, an im- requires not only tenacity (a primary skill), but also pressive set of skills can certainly be fostered within a strong assembler language skills, knowledge of operat- single semester. Another possibility is the fear (on the ing systems internals and both low-level and high-level part of either faculty members or administration) that APIs, and in many cases, significant knowledge of teaching reverse engineering isn’t a good idea. What hardware details. An abstract view of how paging impact will such a course have on the school’s comput- works or a brief description of what a translation look- ing infrastructure? Should we be teaching students to aside buffer (TLB) does is considered sufficient in 1 “crack” software? Precautions are necessary, but teaching the reverse engineering course. This labora- avoiding horror stories isn’t difficult, given a modest tory contains high-end workstations that boot Linux amount of dedicated computing resources. A third and run Windows XP (as well as a variety of other possibility is that a reverse engineering course might guest operating systems) under VMWare. The lab is not fit into curricula that increasingly emphasize “ab- served by a dedicated gigabit network switch that can stract”, hands-off computing. Finally, developing a be isolated from the main departmental network and a reverse engineering course, at least for the first time, dedicated file server. Linux boots from the drives in- requires a significant amount of effort for the faculty stalled in the workstations, but all user files, including member—it’s far easier for faculty members to flip the VMWare images for Windows XP on the work- Powerpoint slides than to prepare and then grind station, are stored on the fileserver. The reverse engi- through long assembly listings with their students. neering course concentrated primarily on reversing Whatever the reason, hands-on, applied computing is Windows binaries, so virtually all analysis was per- essential if we are to properly develop systems- formed under XP running as a guest inside VMWare. oriented computer scientists who are able to solve im- Students were instructed to turn networking off inside portant problems in computer security. VMWare during analysis of malware and to make The thesis of this paper is that hardcore reverse engi- heavy use of snapshots to maintain the integrity of their neering can be taught effectively, with a non-fatal de- analysis environments. While “jump of out VM” gree of effort on the part of a dedicated instructor, and proofs of concept now exist that allow arbitrary code with laboratory facilities that are within the reach of execution on the host OS from a guest OS, no software most departments. Furthermore, the effort is well worth of this type was analyzed in the reverse engineering it, and trains students not only to “do” reverse engi- course. Furthermore, while the students were taught neering, but also to be much better systems people, essential anti-VM techniques and various workarounds, providing them with better grounding in architecture, the majority of the malware that they analyzed did not assembler language, operating systems internals, and actively use anti-VM measures, to increase the useful- software optimization, all as a result of their hands-on ness of the VM-based analysis environment. experiences. Studying reverse engineering teaches The use of VMWare allowed us to push a single user students how to look hard at the details of systems, environment for reverse engineering, packaged in a rather than just noting whether systems do or do not 20GB VMWare image, into each student’s account. have some desired effect. This environment could then be easily refreshed if con- taminated during malware analysis. The environment The remainder of the paper describes the laboratory for malware analysis contained the following software: setup, topics covered in the reverse engineering course, teaching methods, and a discussion of the particular • The sysinternals tools [15], including procmon, malware samples analyzed by students in a reverse regmon, etc. for dynamic analysis. These tools are engineering course taught by the author. freely downloadable. 2 Course Details • Visual C++ Express Edition [7], which is available free to students. Each student activated her own 2.1 What is it, Really? copy of Visual C++ inside her private VMWare The course described in this paper isn’t a traditional image. While the university has a site license for computer security course, nor even a traditional aca- the standard Visual C++ distribution, the use of demic course, in the “listen to lectures and do a few Express Edition was ideal, because it allowed stu- laboratory exercises” sense. Neither is it a hacking dents to migrate the package to their personal ma- course, although the topics covered could be watered chines for work outside the laboratory. down sufficiently to fit within the scope of a single • The MASM32 SDK [6], which is freely down- semester course on hacking or offensive computing. loadable and contains necessary include files and Instead, the goal of the course is to teach students, libraries for writing assembler using Windows op- within a single semester, the necessary skills to exam- erating systems. ine and understand malicious software. • OllyDbg [9], a very popular Windows debugger, 2.2 Laboratory Setup which is freely downloadable. The Department of Computer Science at the University • IDA Pro [5], a disassembler with extensive support of New Orleans has a dedicated laboratory for digital for plugins, for static analysis. This was the only forensics research and instruction, which was used for essential commercial product. For most of the 2 analyses performed in an introductory reverse en- • Windows Portable Executable (PE) format. Deep gineering course, even the freeware 4.x version is understanding of executable file formats is crucial sufficient, if resources don’t permit purchase of for analyzing modern malware, since malware of- sufficient 5.x licenses. The most important plugin ten parses and modifies executable files to hide for IDA Pro in the context of the course was and to propagate.