Reverse Software Engineering As a Project-Based Learning Tool
Paper ID #33764

Reverse Software Engineering as a Project-Based Learning Tool

Ms. Cynthia C. Fry, Baylor University

CYNTHIA C. FRY is currently a Senior Lecturer of Computer Science at Baylor University. She worked at NASA’s Marshall Space Flight Center as a Senior Project Engineer, a Crew Training Manager, and the Science Operations Director for STS-46. She was an Engineering Duty Officer in the U.S. Navy (IRR), and worked with the Naval Maritime Intelligence Center as a Scientific/Technical Intelligence Analyst. She was the owner and chief systems engineer for Systems Engineering Services (SES), a computer systems design, development, and consultation firm. She joined the faculty of the School of Engineering and Computer Science at Baylor University in 1997, where she teaches a variety of engineering and computer science classes. She is the Faculty Advisor for the Women in Computer Science (WiCS), the Director of the Computer Science Fellows program, and a KEEN Fellow. She has authored and co-authored over fifty peer-reviewed papers.

Mr. Zachary Michael Steudel

Zachary Steudel is a 2021 graduate of Baylor University’s computer science department. In his time at Baylor, he worked as a Teaching Assistant under Ms. Cynthia C. Fry. As part of the Teaching Assistant role, Zachary designed and created the group project for the Computer Systems course. Zachary Steudel worked as a Software Developer Intern at Amazon in the Summer of 2019, a Software Engineer Intern at Microsoft in the Summer of 2020, and begins his full-time career with Amazon in the summer of 2021 as a software engineer.

Mr. Joshua Craig Hunter, Baylor University

Joshua Hunter is a Sophomore Computer Science student at Baylor University working as a Computer Science and Calculus tutor. Joshua worked alongside Zachary Steudel to design and create the group project for the Computer Systems course in the Fall of 2020.
Joshua is a member of the Theta Tau professional Engineering and Computer Science organization and will be working as a Software Engineering intern at L3 Harris this summer.

© American Society for Engineering Education, 2021

Reverse Engineering as a Project-Based Learning Tool

Abstract

Although the concept of reverse software engineering is used in many fields, in the context of software engineering and security it has come to include areas such as binary code patching, malware analysis, debugging, legacy compatibility, and network protocol analysis, to name a few.[1] Despite its broad use in software engineering, however, there is little work in computer science education that considers how reverse engineering can be taught effectively.[2] This may be a result of the compressed timetable of a four-year college education in computer science, where the courses in the core curriculum and the upper-level computer science electives constantly compete for the short time available to produce a qualified computer scientist. Additionally, the constant changes in the discipline demand a continually updated curriculum. It is therefore understandably difficult to find where and how the topic of reverse software engineering might be introduced within the curriculum; however, it has also become clear that it is a necessary inclusion.

This paper will document a long-term research effort on the effectiveness of using a very simple reverse software engineering project in a sophomore-level computer systems course. We will report on the development of a series of class exercises that are inserted incrementally into a course. The goal of these projects is to lead students to a deeper understanding of computer systems, the continuing need for low-level understanding of software, and the development of critical thinking and problem-solving skills in the discernment and analysis of an unknown binary file.
Introduction

Reverse software engineering is “the practice of analyzing a software system, either in whole or in part, to extract design and implementation information.”[3] For the purposes of this paper, the term will refer to the process of determining the behavior of an unknown executable or binary file.

In a sophomore-level course in computer systems, CSI 2334, “Introduction to Computer Systems,” at Baylor University, a group project is introduced with the following goals:
• to lead students to a deeper understanding of computer systems,
• to understand the need for low-level understanding of software,
• to learn how to value and work with a team, and
• to develop critical thinking and problem-solving skills.

These project goals support several of the class’ learning objectives, namely that students should be able to:
• work effectively as a member of a small team, and
• illustrate their understanding of code hardening.

This project is introduced roughly halfway through the semester, when the class is told that an unknown binary has been found on the server and that their team must determine and change its behavior. Students are placed in two-person teams. The topic of reverse software engineering, or “reversing,” is discussed in sufficient detail, along with instructions on initial steps to take in the project. These steps involve researching the variety of online tools that are available to help determine the behavior of binary files. Students are also encouraged to write a version of “Hello World” in C++ and run the resulting .exe file through some of these freely available online tools to develop an understanding of the “topography” of compiled code. Once the online tools are well understood and the students have worked with their unknown executable, they must identify any malicious code segments (if found) and modify the behavior of the code, depending on the functionality of the unknown executable.
The teams document their journey, submit a final report and scrubbed executable, and present their findings.

Leading up to the project introduction, there are several smaller, individual projects (mini projects, or MPs) that help students do preliminary research and understand the architecture and hardware limitations of a computer:
• MP0, “Click Here” – Students are notified that an unknown executable has just been downloaded to their machine. They must do some initial research to determine whether the file is safe to open, what might happen when the file is opened, and how to determine functionality before executing a file. A research paper is submitted.
• MP1, “The Bank Problem” – Students must research why a series of ten thousand charges, read in as single-precision floating-point numbers, produces different results when the sum is calculated in the original (unsorted) order, in ascending order, and in descending order. Students must do research to discover the nature of the errors involved, and which of the sums is the most accurate. A research paper is submitted.
• MP2, “Stack versus Heap” – Students must investigate the relationship between stack memory and the heap. They are asked to write a program that dynamically re-allocates an array in the heap, and to investigate where and how dynamically allocated memory is placed compared with statically allocated memory. To help in this, they are asked to trace through the disassembly of this code and report on their findings. A research paper is submitted.

These mini projects are designed to help students develop research, critical thinking, and communication skills, all in preparation for the group project.

Development of the Project

Before the semester begins, the group project is designed and developed. Typically, a small game or utility program is written in C/C++. After implementing the core functionality of the program, malicious elements are added to it.
Typical malicious elements added to a semester’s project program include:
• Fork bombs: Code that spawns new processes continuously, causing pop-ups and slowdown on the user’s computer. Below is C code used to create a fork bomb, pulled from the source code of the Fall 2019 group project. This code spawns 10,000 separate processes on the user’s computer, each printing the message “YOU_MADE_A_MISTAKE”. This is a controlled fork bomb; a truly malicious bomb would typically run indefinitely and spawn copies of itself so that the user cannot terminate it.
Figure 1
• Memory leaks: C/C++ instructions that allocate chunks of memory from the computer’s RAM and never deallocate them, causing slowdown if allowed to continue. Below is C code that allocates approximately 15 kilobytes of memory from the user’s RAM. Most modern computer systems have at least 4 gigabytes of RAM, so this will likely go undetected unless it is carefully looked for. This code was used to cause a memory leak in the Fall 2019 project.
Figure 2
• File spawning: Code that creates files on the user’s computer that clog the hard drive. A partial snippet of code used in the Spring 2020 group project to spawn 50 files on the user’s system can be seen below. These files each contained hundreds of words of text from Lord of the Rings.
Figure 3

Once the malicious portions of the program have been implemented, the final development step is to obfuscate the code. Code obfuscation is the purposeful muddling of a program’s source code to make it hard for humans to read and understand. The goal in obfuscating the group project executable in CSI 2334 is to motivate the students to use different tools and analysis techniques. If an un-obfuscated executable is handed to the students, they can easily disassemble it and glean its functionality in a short period of time.
Since the learning objectives for the project are to become more proficient at design and analysis in a team environment, obfuscation is necessary. Obfuscating the executable that is given to the students makes the process of dismantling and analyzing it more complex, requiring more thought, collaboration, and careful documentation.