
IEEE 6th International Conference on Biometrics, BTAS 2013.

Behavioral Biometric Verification of Student Identity in Online Course Assessment and Authentication of Authors in Literary Works

John V. Monaco, John C. Stewart, Sung-Hyuk Cha, and Charles C. Tappert
Seidenberg School of CSIS, Pace University, White Plains, NY 10606

Abstract

Keystroke and stylometry behavioral biometrics were investigated with the objective of developing a robust system to authenticate students taking online examinations. This work responds to the 2008 U.S. Higher Education Opportunity Act that requires institutions of higher learning to undertake greater access control efforts, by adopting identification technologies as they become available, to assure that students of record are those actually accessing the systems and taking the exams in online courses. Performance statistics on keystroke, stylometry, and combined keystroke-stylometry systems were obtained on data from 30 students taking examinations in a university course. The performance of the keystroke system was 99.96% and 100.00%, while that of the stylometry system was considerably weaker at 74% and 78%, on test input of 500 and 1000 words, respectively. To further investigate the stylometry system, a separate study on 30 book authors achieved performance of 88.2% and 91.5% on samples of 5000 and 10000 words, respectively, and the varied performance over the population of authors was analyzed.

1. Introduction

The main application of interest in this study is verifying the identity of students in online examination environments, an application that is becoming more important with student enrollment in online classes increasing, and instructors and administrations becoming concerned about evaluation security and academic integrity. The 2008 federal Higher Education Opportunity Act (HEOA) requires institutions of higher learning to make greater access control efforts for the purposes of assuring that students of record are those actually accessing the systems and taking online exams by adopting identification technologies as they become more ubiquitous [9]. To meet the needs of this act, the keystroke biometric seems appropriate for the student authentication process. Stylometry appears to be a useful addition to the process because the correct student may be keying in the test answers while a coach provides the answers with the student merely typing the coach’s words without bothering to convert the linguistic style into his own.

Keystroke biometric systems measure typing characteristics believed to be unique to an individual and difficult to duplicate [4, 10]. The keystroke biometric is a behavioral biometric, and most of the systems developed previously have been experimental in nature. Nevertheless, there has been a long history of commercially unsuccessful implementations aimed at continuous recognition of a typist. While most previous work dealt with short input (passwords or short name strings) [1, 7, 14, 15, 16], some used long free (arbitrary) text input [2, 8, 11, 13, 19, 20]. Free-text input as the user continues typing allows for continuous authentication [5, 12, 13, 17], which can be important in online exam applications [6, 19].

Stylometry is the study of determining authorship from the authors’ linguistic styles. Traditionally, it has been used to attribute authorship to anonymous or disputed literary documents. More recently, computer-based communication and digital documents have been the focus of research, sometimes with the goal of identifying perpetrators or other malicious behavior. Recent computer studies have used stylometry to determine authorship of emails, tweets, and instant messaging, in an effort to authenticate users of the more commonly used digital media. A few studies have applied stylometry to the detection of intentional obfuscation or deceptive writing style, and others to the detection of the author’s demographics [3]. Appendix A summarizes the prior authorship attribution stylometry studies and lists the associated references.

There are several reasons keystroke and stylometry biometric applications are appealing. First, they are not intrusive to computer users. Second, they are inexpensive since the only hardware required is a computer with keyboard. Third, text continues to be entered for potential repeated checking after an initial authentication phase, and this continuing verification throughout a computer session is referred to as dynamic verification [11].

A number of measurements or features are generally used to characterize an individual. For the keystroke biometric these measurements are typically key press duration (dwell) times, transition (latency) times, and the identity of the keys pressed. Stylometry typically uses statistical linguistic features at the word and syntax level.

The current work addresses some of the limitations of prior work on free-text biometric systems [20]. The current system has several unique aspects. First, it can collect raw keystroke data over the Internet as well as from a key logger on an individual machine. Second, it focuses on free-text input where sufficient keystroke data are available to permit the use of powerful statistical feature measurements, and the number, variety, and strength of the measurements used in the system are much greater than those used by earlier systems reported in the literature. Third, it focuses on applications using arbitrary text input because copy texts are unacceptable for most applications of interest. And, fourth, because of the statistical nature of the features and the use of arbitrary text input, special statistical procedures are incorporated into the system to handle the paucity of data from infrequently used keyboard keys.

Using an open biometric system approach, an earlier student authentication study was conducted on data obtained from students taking actual tests in a university course [19]. In contrast, this paper presents a closed biometric system approach to classification that significantly increases the performance reported in the earlier study. Also, to further analyze the stylometry component of the system, a separate study on 30 book authors was undertaken to evaluate the stylometry performance on text lengths ranging from 250 to 10000 words. Additionally, because the mean population performance does not give the complete picture, the varied performance over the population of users was analyzed on the book-author study.

The paper organization is as follows: section 2 describes the system procedures, section 3 the student online testing studies, section 4 the stylometry study on short novels, and section 5 the conclusion and suggestions for future work.

2. Keystroke and Stylometry Systems

The keystroke and the stylometry systems consist of a data collector, a feature extractor, and a pattern classifier. The frontends of both systems, up through the feature extractor, were reused from earlier studies, the keystroke frontend from [20] and the stylometry frontend from [19], and these frontend systems are described only briefly below. A third combined keystroke-stylometry system simply concatenates the feature vectors from the first two systems.

A generic classification system operates on feature-vector input from the keystroke, stylometry, or the combined system. This classification system was improved significantly over those in the earlier mentioned studies and is one of the important contributions of this study.

The input system captures the keystroke timings and full input text in an XML file. The feature extractor parses each file, creating both keystroke and stylometry feature vectors for later processing.

2.1. Keystroke System

The 239 employed features include means and standard deviations of the timings of key press durations and transitions, and percent use of certain keys, grouped as follows [20]:

- 78 duration features (39 means and 39 standard deviations) of individual letter and non-letter keys, and of groups of letter and non-letter keys (Figure 1)
- 70 type-1 transition features (35 means and 35 standard deviations) of the transitions between letters or groups of letters, between letters and non-letters or groups thereof, between non-letters and letters or groups thereof, and between non-letters and non-letters or groups thereof (Figure 2)
- 70 type-2 transition features (35 means and 35 standard deviations) identical to the type-1 transition features except for the method of measurement (Figure 2)
- 19 percentage features that measure the percentage of use of the non-letter keys and mouse clicks
- 2 keystroke input rates: the unadjusted input rate (total time to enter the text / total number of keystrokes and mouse events) and the adjusted input rate (total time to enter the text minus pauses greater than ½ second / total number of keystrokes and mouse events)

Figure 1. Hierarchy tree for the 39 duration categories (each oval).

Figure 2. Hierarchy tree for the 35 transition
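
As a rough illustration of the keystroke feature groups listed in section 2.1, the sketch below computes duration statistics, the two transition measurements, a percentage feature, and the two input rates from a short list of key events. Several details are assumptions made only for this example and are not taken from the paper: the `(key, press_time, release_time)` event tuples (the actual system reads an XML file whose schema is not given here), the reading of type-1 as release-to-press and type-2 as press-to-press (the text says only that the two differ in the method of measurement), the use of the release-to-press gap as the "pause" in the adjusted rate, and the choice of population rather than sample standard deviation. The function names are hypothetical, and the procedure for keys with too little data is not reproduced.

```python
from statistics import mean, pstdev

# Feature-group sizes as stated in the text; they should total 239.
GROUP_SIZES = {"duration": 78, "type1": 70, "type2": 70, "percentage": 19, "rate": 2}
assert sum(GROUP_SIZES.values()) == 239


def dwell_stats(events, keys):
    """Mean and standard deviation of key press duration for one key category.

    events: list of (key, press_time, release_time) in seconds, ordered by
    press time. This tuple layout is an assumption for illustration.
    """
    dwell = [release - press for key, press, release in events if key in keys]
    if not dwell:
        return None  # sparse-data handling from the paper is not reproduced
    return mean(dwell), pstdev(dwell)


def transition_times(events):
    """Return (type1, type2) transition times for consecutive key pairs.

    ASSUMPTION: type-1 is release of the first key to press of the second,
    and type-2 is press of the first key to press of the second.
    """
    type1, type2 = [], []
    for (_, p1, r1), (_, p2, _) in zip(events, events[1:]):
        type1.append(p2 - r1)
        type2.append(p2 - p1)
    return type1, type2


def percent_use(events, keys):
    """Percentage of all events that fall in the given key category."""
    return 100 * sum(1 for key, _, _ in events if key in keys) / len(events)


def input_rates(events, pause_threshold=0.5):
    """Unadjusted and adjusted input rates (time per event, as in the text).

    ASSUMPTION: a pause is a release-to-press gap longer than the threshold.
    """
    total_time = events[-1][2] - events[0][1]
    gaps, _ = transition_times(events)
    long_pauses = sum(gap for gap in gaps if gap > pause_threshold)
    n = len(events)
    return total_time / n, (total_time - long_pauses) / n


# Made-up sample: typing "the" followed by a space after a long pause.
sample = [("t", 0.00, 0.10), ("h", 0.15, 0.25), ("e", 0.30, 0.38), (" ", 1.20, 1.30)]
print(dwell_stats(sample, {"t", "h", "e"}))
print(transition_times(sample))
print(percent_use(sample, {" "}))
print(input_rates(sample))
```

In a fuller version, `dwell_stats` would be called once per duration category and the transition lists would be grouped by the key-pair categories before taking means and standard deviations; the sketch stops short of that because the category definitions come from the figures.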