Code Hardening: Development of a Reverse Software En- Gineering Project
Total Page:16
File Type:pdf, Size:1020Kb
Paper ID #30573 CODE HARDENING: DEVELOPMENT OF A REVERSE SOFTWARE EN- GINEERING PROJECT Mr. Zachary Michael Steudel, Baylor University Zachary Steudel is a Junior Computer Science student at Baylor University working as a Teaching Assis- tant under Ms. Cynthia C. Fry. As part of the Teaching Assistant role, Zachary designed and created the group project for the Computer Systems course. Zachary Steudel worked as a Software Developer Intern at Amazon in the Summer of 2019 and plans to join Microsoft as a Software Engineering Intern in the Summer of 2020. Ms. Cynthia C. Fry, Baylor University CYNTHIA C. FRY is currently a Senior Lecturer of Computer Science at Baylor University. She worked at NASA’s Marshall Space Flight Center as a Senior Project Engineer, a Crew Training Manager, and the Science Operations Director for STS-46. She was an Engineering Duty Officer in the U.S. Navy (IRR), and worked with the Naval Maritime Intelligence Center as a Scientific/Technical Intelligence Analyst. She was the owner and chief systems engineer for Systems Engineering Services (SES), a computer systems design, development, and consultation firm. She joined the faculty of the School of Engineering and Computer Science at Baylor University in 1997, where she teaches a variety of engineering and computer science classes, she is the Faculty Advisor for the Women in Computer Science (WiCS), the Director of the Computer Science Fellows program, and is a KEEN Fellow. She has authored and co- authored over fifty peer-reviewed papers. c American Society for Engineering Education, 2020 Code Hardening: Development of a Reverse Software Engineering Project Abstract In CSI 2334, “Introduction to Computer Systems” (CompSys), at Baylor University, we introduce a group project to the students whose purpose is to simulate a team project on the job. Group projects are used very frequently to provide a similar learning environment which capitalizes on the benefits of peer-to-peer instruction, or cooperative learning. In this group project, students are presented with a challenge. A piece of executable code has been found on an older server, and the student teams must determine what the code is designed to do; and, in particular, whether the code is benign or malicious in nature. In order to simulate this scenario a software system is developed and then hardened. The system developed is generally a game that has some malicious content, which is then obfuscated before the executable is presented to the student teams. The objective for the student teams is to research methods by which the binary file can be “read” without executing it, and to modify the behavior of the executable file, depending on the purpose of the code. This paper will introduce methods of software hardening and obfuscation; and will discuss the design of the project, the implementation of the design, code obfuscation techniques used, and which obfuscation techniques were used to produce the mystery executable presented to the class as their class project. Introduction Group projects in engineering and computer science coursework are a critical part of the education process. Not only do they enforce the concepts being taught, they also provide an environment in which essential professional skills (aka, soft skills) can be understood, culminating in a synergistic learning experience. The value of such group learning has been well documented in both engineering and computer science courses [1]-[5], and is a cornerstone to the education process according to ABET EAC and CAC [6], [7], the National Academies of Science, Engineering, and Medicine [8], and industry partners [9]. Before introducing the methods and challenges related to the obfuscation a software system for a classroom project, it is important to understand the depth of knowledge available in the field and the maturity of the discipline at this point. Although one of the most important applications of binary-level reverse engineering is dealing with malware [10], it is certainly not the only application. It is often the case with older software systems that code is not adequately documented, and the original developers no longer available for consultation [11], in which case the binary executable would be reverse engineered for behavior and interface information. Code obfuscation or hardening may also be employed intentionally. It might be that the digital rights of pieces of information [12] (e.g., cryptographic keys and protocols) must be protected. A company or governmental entity may need to protect intellectual property; perhaps they have developed an algorithm that is more efficient than others/competitors, and they need to secure it. Code-level hardening changes can be defined as “changes in the source code in a way that prevents vulnerabilities without altering the design” [13]. Code hardening is a common practice in major software development firms, with many teams carving out sprints1 to harden new code by refactoring and removing unnecessary dependencies [14]. Code-level hardening is also employed to make it difficult for attackers from easily discovering vulnerabilities in the code, as well as to protect the program owner’s intellectual property [15]. Hardening can also refer to practices that make the original source code of an application more difficult to understand. This is generally defined as code obfuscation. The practice of code obfuscation in modern software engineering dates back to the 1980’s, with small competitions held to transform simple C code into confusing, abstract puzzles difficult for humans to parse and understand [16]. Since the 1980’s, literature on code-level obfuscation has been consistent but generally sparse. This field of research is relatively small, with no more than three or four papers published each year since the early 1990’s. With the growth of cloud-computing services that require paid subscriptions and API keys embedded in the code, the necessity of obfuscation techniques to halt reverse software engineering attempts has grown. Without sufficient code obfuscation, reverse engineers can easily disassemble or decompile code to gain access to API keys and compromise an application’s access to cloud databases or other cloud compute resources [17]. Introduction to the Challenge Reversing software group projects have been a part of the CompSys course since the Fall 2012 semester. As a result, nearly 600 students have participated in one of the various group projects. The projects are different every semester, responding to changes in the available reversing software. CompSys is a sophomore-level course in which the relationship between hardware and software is investigated at a low level2. As part of the course requirements, students in CompSys are paired up for the class group project a little more than halfway through the semester. Each pair of students is presented with a Windows executable file with no information other than that this file has unknown origins and unknown functionality. Given this, students are tasked with learning the effects and purpose of the executable and to quarantine any malicious elements while also further developing the functionality of the program. 1 A sprint is generally defined as a short 2-4 week iteration in a team’s software development process. Most agile development processes utilize sprints to develop software. 2 All materials used in this course in the Fall 2019 semester can be found here: https://classnotes.ecs.baylor.edu/wiki/CSI_2334_Fry_Fall_2019 (username: CSI2334F19, password: FrySteudelASEE2020). This includes the CSI 2334 Course syllabus, the CSI 2334 Course calendar, and the CSI 2334 Fall 2019 Project (among other items). The CompSys course is primarily focused around low-level design, architecture, and code. This project plays into the goals of the course by mimicking the research and design process of small agile teams in the software engineering industry while also developing the low-level computing skills of the students. Given only an executable file, students must take advantage of open source software to gain access to the binary code and modify it. At the end of the semester, the student groups present their findings to the professor and the teaching assistant in a formal mock final review setting. The groups are determined through a rigorous questionnaire administered through the Comprehensive Assessment of Team Member Effectiveness system (CATME.org)3. Time is spent in class discussing the importance of working together and learning to value each team member's contributions. The team members must assess each other's effectiveness with both a formative assessment conducted midway through the group project, as well as a summative assessment after the final projects are submitted. Students are given two assignments regarding team building and team dynamics before they begin the formative assessment. These assessments are graded by evaluating the consideration used by each student in their assessment of their team members. This grade becomes an individual component in the student's overall course grade, and the individual adjustment factor is used as a multiplier in each student's group project grade. The objectives of this group project are not only technically related, but also related to educational pedagogy: 1. To begin consideration of the project in a project-based manner, from the beginning of the term. 2. To develop and build a team, where what each team member brings to the project is known and valued. 3. To further develop team building skills, including communication between team members, team dynamics, conflict resolution, and presentation skills to various audiences. 4. To help students understand the importance of how code hardening is used to protect intellectual property. 5. To help students understand how the skills learned in the course can be directly applied in the detection of unknown code. Determination of the Challenge Before developing the course project for the students, it was first necessary to analyze the learning objectives of the project and potential outcomes.