Peer Instruction for Digital Forensics

William Johnson, Irfan Ahmed,∗ Vassil Roussev Cynthia B. Lee Department of Computer Science Computer Science Department University of New Orleans Stanford University [email protected], {irfan,vassil}@cs.uno.edu [email protected]

Abstract are effective for students who have a good understanding of individual forensic concepts relating to the lab task, Digital forensics can be a difficult discipline to teach ef- and are able to integrate them into a broader view of a fectively because of its interdisciplinary nature, closely forensic investigative scenario. integrating law and computer science. Prior research in Unfortunately, the traditional lecture-centric approach Physics and Computer Science has shown that the tradi- is often inadequate in providing the conceptual prepara- tional lecture approach is inadequate for the task of pro- tion for hands-on labs and exercises. Commonly, lectures voking students’ thought-processes and systematically fail to motivate students to study before class, and there- engaging them in problem-solving during class. Peer in- fore the in-class learning experience is less effective than struction is an established pedagogy for addressing some it could be. The traditional lecture also lacks a reliable of the challenges of traditional lectures. For this paper, in-class mechanism to help an instructor ensure that stu- we developed 108 peer instruction questions for a dig- dents understand the concepts being taught. Although ital forensics curriculum, and evaluated a selection of the instructor may ask ad-hoc questions to stimulate dis- the questions by holding a condensed computer forensics cussion, this usually engages a few students and does not workshop for university students. The evaluation results provide feedback about every student in class. show that peer instruction helps students understand the targeted digital forensics concepts, and that 91% of stu- Peer instruction has emerged as a promising solution dents would recommend that other instructors use peer to enrich the in-class experience such that it yields bet- instruction. ter outcomes. It was first introduced by Eric Mazur in physics classrooms at Harvard University [2]. With peer 1 Introduction instruction, a lecture consists of a series of multiple- choice questions (a.k.a. ConcepTests) that are designed Digital forensics is defined as the application of scien- to provoke deep conceptual thinking in students and en- tific tools and methods to identify, collect, and analyze gage them in a meaningful discussion. Students are re- digital artifacts in support of legal proceedings [11]. It quired to study some reading material before coming to is a challenging field to teach effectively because of its class, which helps them to find correct answers to the positioning at the intersection of law and computer sci- peer instruction questions. Peer instruction has shown ence. For instance, the data acquired from a suspect com- to be effective in physics, computer science, and biology puter (a.k.a digital evidence) without a search warrant classrooms [7]. In particular in computer science, it has may prevent the evidence from being admitted in court. halved failure rates in four courses [7] and increased stu- Students need to achieve a clear understanding of im- dent retention in the computer science major [10]. portant principles of performing a forensic investigation Previously, peer instruction has not been explored as within the constraints put forth by state and federal law. a means for delivering a digital forensics curriculum. Often, forensic instructors combine traditional lectures We have developed 108 peer instruction questions for an with hands-on lab exercises to improve the student learn- introductory course on computer forensics. This paper ing outcomes. Typical exercises include solving cyber- discusses some of these questions as examples and ana- crime cases such as theft of credit card data and lyzes classifies with respect to two basic criteria. First, attacks. In our experience, forensic hands-on exercises what are the elements used in the questions to trigger ∗Corresponding author conceptual thinking processes of students such as “com- This work was supported by the NSF grant # 1500101 pare and contrast,” “deliberate ambiguity” and “trolling for misconception”? We use Beatty et al.’s concept trig- • The instructor then poses the same question to the gers [1] for the analysis. Second, how are the questions students again; the students reply to the question. presented, such as in a scenario, example, or diagram. • Depending on the answers, the instructor would The paper also presents the results of a digital foren- have the option to discuss the concept in further de- sic workshop utilizing peer instruction methodology and tail with the students to ensure a better understand- questions. The evaluation results show that most of the ing; or, if the students clearly understand the con- students find the use of clickers beneficial and the group cept at this point, the instructor may choose to con- discussion for a peer question helps the students under- tinue to the next question. stand a target concept. 91% students would recommend other instructors use peer instruction. The results also show a clear evidence of improvement in student learn- 2.2 Related Work ing that is measured through quizzes taken before and Peer instruction has not been widely adopted in the cy- after a topic is covered and peer instruction questions bersecurity classroom. Recently, Johnson et al. [4] de- replied to by the students before and after peer discus- veloped peer instruction questions for two courses: in- sions. troduction to computer security, and network penetration The rest of the discussion is organized as follows. Sec- testing. They developed a set of 172 peer instruction tion 2 discusses the peer instruction methodology and re- questions; however, these were not evaluated in a class- lated work. Section 3 presents the elements of a peer room setting. instruction question (i.e. concept triggers and question More broadly, there have been a number of prior ef- presentation) along with the steps to create a peer ques- forts to use peer instruction in the computer science tion. Section 4 discusses four examples of peer instruc- classroom. Porter et al. [8] perform a multi-institutional tion questions and identifies their concept triggers and study of the usage of peer instruction in seven instruc- question presentation types, followed by section 5 that tors’ introductory programming courses. Considering presents an analysis of 108 peer instruction questions re- instructors’ prior experience (or lack thereof) utilizing cently developed by the authors. Sections 6, 7, and 8 peer instruction, they primarily focus on student percep- present the peer instruction implementation in a digital tion of peer instruction using measurements such as per- forensic workshop, a list of peer instruction questions ceived question difficulty, question time allowed, discus- used in the workshop, and evaluation results, followed sion time allowed, and content difficulty. They obtain by conclusion in section 9. participants’ responses using surveys and note that at least 71% of students would recommend other instruc- 2 Background/Related Work tors to use peer instruction. 2.1 Peer Instruction Methodology Similarly, Porter et al. conduct a measurement of peer instruction across multiple small liberal arts colleges to Peer instruction pedagogy could be categorized under the measure the effectiveness of peer instruction in smaller general umbrella of a“flipped classroom” approach, but classes, using data from five instructors at three institu- the term “peer instruction” describes a specific protocol, tions [9]. They notice normalized gains in the same range and does not mean generally any use of peer discussion that of larger universities with students generally approv- or clickers. The protocol has three important elements. ing of the method and their performances. First, it divides a class lecture into a series of multiple- Esper [3] utilizes peer instruction in a software engi- choice conceptual questions. The questions focus on the neering course that has 189 students. She makes a modi- primary concepts an instructor wishes to teach. Second, fication in the standard peer instruction process in which it divides the students in class into a number of small a clicker question is initially shown without answers and groups for discussions. Third, it requires students to read both the students and instructor propose potential answer some material before coming to class. The material is choices with discussions of those answers. It is worth related to the topic covered in the class and enables the noting that the instructor does not mention whether an students to find answers of the questions. answer suggestion is correct or incorrect. The evaluation For each question, the following steps are involved: results show that 72% of the students would recommend the course instructor, with 28% not recommending due • A peer instruction question is posed to students. to the reasons such as correct answers are not given and The students use clickers to reply. clicker questions are unclear. • The instructor is able to view the results. If the Liao et al. [6] create models of in-class clicker ques- answers are mostly incorrect, the instructor then tions to predict low-performing students early in the encourages the students to discuss their answer term. They use a linear regression model to predict fi- choices with their group members. nal exam scores with approximately 70% accuracy in a twelve-week introductory computer science course. A peer instruction question on this concept can be cre- Lee et al. [5] examine the effectiveness of peer instruc- ated as follows: In order to access a file from a FAT tion in two upper-level computer science courses: The- file system, what information is absolutely necessary? a) ory of Computation and Computer Architecture and find Name and ending address of file content; b) Name, file the average normalized learning gains of 39%. size, and ending address of file content; c) Name and starting address of file content, d) File size and starting 3 Developing Peer Instruction Questions address of file content, e) None of the above (answer: c). Concept trigger. We use “none of the above” to A peer instruction question has two important elements: present a possibility that the provided choices may be in- concept trigger(s) and question presentation. A concept- correct. We also use “trap unjustified assumptions” by trigger is a hidden element in multiple choices of a ques- introducing file-size in choices–FAT file system’s data tion that is deliberately introduced in the question. The structures allow file clusters to be accessed similarly to hidden element triggers the students thought process and a linked-list, whereas the last cluster is marked as EOF helps them with useful peer discussion allowing them to thereby does not require file size to access a file. understand the target concept and further eliminate either Question presentation. This question is presented as incorrect choices or identify a correct choice. We utilize a feature question as it draws upon and asks about core concept-triggers from Beatty et al. [1]. Table 1 presents features of the FAT file system and its data structures. a list of concept-triggers used in our questions such as Use “none of the above”, Omit necessary information, 4.2 File Carving and Trap unjustified assumptions. File carving is an important concept in computer foren- A question presentation refers to how a question is sics. It is a method used to recover files when they are written or articulated. We use five presentation types: inaccessible through file system, such as deleted files, or scenario, example, definitional, diagram, and feature. A a corrupted file system. In many cases, the file content on concept may be comprised of multiple components or disk remains intact and can be recovered by file carving. features. If a question or its multiple choices target these A basic form of file carving assumes that a file blocks features, we classify the question as feature. The ques- are physically located in a sequence on disk and requires tion asks students to identify required features of a con- only the location of the first and last block of a file to cept in multiple choices, or a feature is provided in a recover the file content. It searches for file headers and question which asks students to identify its correspond- footers using predefined signatures to identify the first ing concept in the choices. and last blocks of files. This technique however, suffers To develop a question, an instructor should first in the presence file fragmentation that can cause a file’s thoughtfully select a target concept. Peer instruction blocks to be store non-contiguously. questions do take a significant amount of class time, and In addition, while defragmenting a drive places allo- so questions should focus on the most essential elements cated file blocks in contiguous pieces, the defragmenta- of the course. Second, the instructor should select a ques- tion process may overwrite blocks that have been marked tion presentation that is suitable to articulate the concept. by the file system as deallocated. Finally, the instructor should choose at least one concept A peer instruction question on basic file carving ad- trigger. A question can have multiple concept triggers dressing the above-stated challenges can be created as (examples are given in the next section). follows: In which of the following situations is file carv- ing most effective? a) The targeted drive is highly frag- 4 Examples of Peer Instruction Questions mented, b) The targeted drive has been recently defrag- 4.1 FAT32 Internals mented, c) The system being used to examine the drive has low free space, d) The system being used to exam- In the FAT32 file system, there are two primary data ine the drive has high free space, e) More than one of the structures used to maintain knowledge of data: the di- above (answer: d). rectory entry and the file allocation table. A directory Concept trigger. Deconstructing the question, we entry is present for every file and folder and contains the note that two triggers are used. First, as we provide the file name, file size, the address of the starting cluster of potential for multiple question choices in option (d), we a file’s content. The file allocation table is used to iden- use “identify a set or subset.” tify the next cluster in a file and the allocation status of To further assess a student’s understanding of the clusters. The final cluster of a file has a marking EOF in downsides of both fragmentation and defragmentation its FAT entry that indicates end of file. Therefore, in or- processes, we use “trolling for misconceptions”, as op- der to access a specific file from a FAT file system, the tion (b). The defragmentation presents an ideal layout of filename and starting cluster are necessary. contiguous data (and thus is an attractive option), how- Concept-trigger Name Brief Description Compare and contrast Compare multiple situations; draw conclusions from comparison Interpret representations Provide a situation that asks students to make inferences based upon the presented features Identify a set or subset Ask them to identify a set or subset fulfilling some criterion Strategize only Provide a problem; ask students to identify the best means of reaching a solution Omit necessary information Provide less information than is essential for answer; see if students realize this Use “none of the above” An option to learn alternative understandings; use it occasionally as a correct answer Qualitative questions Questions are about the concepts and relationships rather than numbers or equations Analysis and reasoning questions Questions require significant decision-making, hence promote significant discussion Trap unjustified assumptions Answer choices are facilitated by potential unjustified assumptions made by the students Deliberate ambiguity Use deliberate ambiguity in questions to facilitate discussion Trolling for misconceptions Attempt to trap students with answers that require common misconceptions to choose

Table 1: List of concept-triggers from Beatty et al. [1]

ever, it also raises a possibility that important data has set or subset” as option d). While each option is possible been lost. and may assist in locating references to USB drives, the Question presentation. This question does not use of the SYSTEM hive is the most efficient. SYSTEM specifically present a scenario, but provides examples of hive contains system-wide activities. If no reference to potential carving situations on both a target drive and an the drive is found in the hive, there is no need to search investigator’s machine. This question thus uses the “ex- through installed software or any user-specific locations. ample” presentation type. Question presentation. This is a scenario question. It provides a particular situation that a forensic investigator 4.3 Registry Forensics may face, and it asks for the best means to proceed. Forensics of the Microsoft provides significant details of activities on a computer. The reg- 4.4 Forensic Artifacts istry is used by the and applications Redaction of information is done generally through ob- as a database for storing configuration information. It scuration (with pens, opaque tape, etc.) or excision consists of hives (backed by files) and can hold infor- of the document through cutting the intended area out mation such as list of applications to run on user login, of the document. Improper redaction, like many other user password hashes, the last time particular USB de- processes (such as file deletion), can create unintended vices were plugged into the system, and the list of in- forensic artifacts. stalled software. In a registry hive, keys are similar to A peer instruction question on this concept can be cre- directory paths, values are analogous to file names, and ated as follows: Which of the following are examples of data is analogous to file content. The SYSTEM hive has a redaction? a) A company employee selectively covers key called USBSTOR that maintains a set of USB storage sentences from a press release with a black marker; b) A devices plugged into a system, holding their serial num- copy editor at a nonprofit adds information to a press re- bers with the last time the devices were inserted into the lease draft with a red pen; c) An inside attacker at a law system. firm cuts documents to physically remove appearances of A peer instruction question on registry can be created his name; d) Both a) and b); e) Both a) and c) (answer: as follows: A USB drive with an unknown owner is found e). in a corporate setting. How might a forensic investiga- The question checks whether students can readily tor typically determine whether that particular drive was identify examples that illustrate the core features of plugged into any given Windows machine? a) Examine redaction, i.e., actions that obscure, or remove, informa- all ntuser.dat files to determine if a user plugged it into tion. the machine; b) Check the SYSTEM registry hive to see Concept trigger. As the question is somewhat sim- if it was plugged into that machine; c) Check the SOFT- ple and looking for provided examples of the essential WARE registry hive to see if it has been used by any par- features of redaction, it uses “analysis and reasoning” as ticular piece of software; d) More than one of the above; students scan the answers for information that suggests e) None of the above (Answer: b). obfuscation or removal. Concept trigger. The question utilizes “use none of Question presentation. The question requires stu- the above” to point the students toward the possibility dents to demonstrate the understanding of the features of of each answer being potentially invalid. However, with the Redaction concept; it is therefore a “feature” ques- multiple potentially valid options, it also uses “identify a tion.

References Q-2: How is the filename “conf.ini” stored in a FAT file system that utilizes short filenames? [1] BEATTY,I.D.,GERACE,W.J.,LEONARD,W.J.,AND 1) conf.ini DUFRESNE, R. J. Designing effective questions for classroom 2) CONF.INI response system teaching. American Journal of Physics 74,1 3) CONFINI (2006), 31–39. 4) CONF INI Answer: CONF INI [2] CROUCH,C.H.,AND MAZUR, E. Peer Instruction: Ten years of experience and results. American Journal of Physics 69 (Sept. Q-3: FAT32 maintains a backup BIOS Parameter Block. 2001), 970–977. 1) True 2) False [3] ESPER, S. A discussion on adopting peer instruction in a course Answer: True focused on risk management. J. Comput. Sci. Coll. 29, 4 (Apr. 2014), 175–182. Q-4: How does a FAT file system denote the end of a file? 1) The number of the final cluster equaling the stored number of [4] JOHNSON,W.E.,LUZADER,A.,AHMED,I.,ROUSSEV,V., clusters (-1 for zero indexing) III, G. G. R., AND LEE, C. B. Development of peer instruction 2) A FAT entry marked EOF 2016 USENIX Work- questions for cybersecurity education. In 3) The final FAT entry for the file is marked ”NULL” shop on Advances in Security Education (ASE 16) (Austin, TX, 4) Each file is allocated the same initial space, and the first 2016), USENIX Association. ”NULL” entry in the file’s allocation table is the first cluster fol- [5] LEE,C.B.,GARCIA,S.,AND PORTER, L. Can peer instruction lowing the end of file be effective in upper-division computer science courses? Trans. Answer: A FAT entry marked EOF Comput. Educ. 13, 3 (Aug. 2013), 12:1–12:22.

[6] LIAO,S.N.,ZINGARO,D.,LAURENZANO,M.A.,GRIS- File Carving Quiz. WOLD,W.G.,AND PORTER, L. Lightweight, early identifica- tion of at-risk cs1 students. In Proceedings of the 2016 ACM Con- Q-5: Traditional carving uses these to find potential files: ference on International Computing Education Research (New 1) Known headers York, NY, USA, 2016), ICER ’16, ACM, pp. 123–131. 2) Known filenames 3) Allocated clusters following blocks of unallocated clusters [7] PORTER,L.,BAILEY LEE,C.,AND SIMON, B. Halving fail 4) Recovered file system metadata rates using peer instruction: A study of four computer science Answer: Known headers courses. In Proceeding of the 44th ACM Technical Symposium on Computer Science Education (New York, NY, USA, 2013), Q-6: Which of the following is a known issue with carving? SIGCSE ’13, ACM, pp. 177–182. 1) Fragmentation 2) Milestones [8] PORTER,L.,BOUVIER,D.,CUTTS,Q.,GRISSOM,S.,LEE, 3) Unprintable bytes in headers C., MCCARTNEY,R.,ZINGARO,D.,AND SIMON, B. A multi- Answer: Fragmentation institutional study of peer instruction in introductory computing. In Proceedings of the 47th ACM Technical Symposium on Com- Q-7: File carving could efficiently utilize distributed systems. puting Science Education (New York, NY, USA, 2016), SIGCSE 1) True ’16, ACM, pp. 358–363. 2) False Answer: True [9] PORTER,L.,GARCIA,S.,GLICK,J.,MATUSIEWICZ,A.,AND TAYLOR, C. Peer instruction in computer science at small liberal Windows Registry Quiz. arts colleges. In Proceedings of the 18th ACM Conference on Innovation and Technology in Computer Science Education (New Q-8: What are the primary registry files known as? York, NY, USA, 2013), ITiCSE ’13, ACM, pp. 129–134. 1) Hives 2) Keys [10] PORTER,L.,AND SIMON, B. Retaining nearly one-third more 3) Values majors with a trio of instructional best practices in cs1. In Pro- 4) Root files ceeding of the 44th ACM Technical Symposium on Computer Answer: Hives Science Education (New York, NY, USA, 2013), SIGCSE ’13, ACM, pp. 165–170. Q-9: What is the timestamp given to any registry key? 1) LastWriteTime [11] ROUSSEV,V. Digital Forensic Science: Issues, Methods, and 2) LastReadTime Challenges. Synthesis Lectures on Information Security, Privacy, 3) CreatedTime & Trust. Morgan & Claypool Publishers, 2016. Answer: LastWriteTime Q-10: Which of the following stores data in the registry? Appendices 1) Key 2) Value A Workshop Quizzes 3) Data 4) Hive Answer: Data

File Systems Quiz. Q-11: Where can user password hashes be found? 1) SYSTEM Q-1: What are the two primary data structures of FAT systems? 2) SECURITY 1) Directory entry, 3) SOFTWARE 2) Cluster entry, File allocation table 4) SAM 3) File entry, Quick allocation table 5) DEFAULT Answer: Directory entry, File allocation table Answer: SAM B Quiz for Pre-class Reading Material

1. Which of these is not a Windows registry root key? 1) HKEY LOCAL MACHINE 2) HKEY USERS 3) HKEY CURRENT CONFIG 4) HKEY ROOT USER 5) HKEY CLASSES ROOT Answer: HKEY ROOT USER 2. Which of the following utilizes a bitmap to denote cluster allo- cation? 1) FAT16 2) NTFS 3) FAT32 Answer: NTFS 3. Which of these carvers was built as a direct improvement to Fore- most? 1) Photorec 2) Scalpel 3) Binwalk 4) FTK carver 5) Magic Rescue Answer: Scalpel 4. Which of these features of the registry is most known for using ROT13 for encoding data? 1) Autorun 2) MRU 3) UserAssist 4) USBSTOR Answer: UserAssist 5. Which web browser utilizes the TypedURLs registry key? 1) Mozilla Firefox 2) 3) Google Chrome 4) Opera Browser Answer: Internet Explorer

C Survey on Workshop Opinion

Table 4: Workshop-specific opinions

From the point of helping me learn, the content of clicker questions was Much too hard Too hard OK Too easy Much too easy 0% 0% 100% 0% 0% In general, the instructor gave us enough time to read and understand the questions before the first vote. No, far too little time No, too little time OK amount of time Yes, too much time Yes, far too much time 0% 0% 89% 11% 0% Which of the following best describes your discussion practices in this group? I always discuss with I always discuss with I sometimes discuss, I rarely discuss, I I rarely discuss, I’m the group around me, the group around me, it depends don’t think I get a lot too shy it helps me learn I don’t really learn, out of it but I stay awake 78% 0% 22% 0% 0% The amount of time generally allowed for peer discussion was Much too short Too short About right Too long Much too long 0% 11% 89% 0% 0% In general, the time allowed for class-wide discussion (after the group vote) was Much too short Too short About right Too long Much too long 0% 11% 89% 0% 0% In general, it was helpful for the instructor to begin class-wide discussion by having students give an explana- tion. N/A - The instructor rarely did this It’s not helpful to hear other students’ It was helpful to hear other students’ explanations explanations 11% 0% 89% The professor explained the value of using clickers in this class. Not at all Somewhat, but I was still un- Yes, they explained it well Yes, they explained it too clear why we were doing it much 0% 11% 67% 22%