Online Deception Detection Using BDI Agents Richard Alan Merritts Nova Southeastern University, [email protected]
Nova Southeastern University
NSUWorks
CEC Theses and Dissertations, College of Engineering and Computing
2013

This document is a product of extensive research conducted at the Nova Southeastern University College of Engineering and Computing.

Part of the Computer Sciences Commons

NSUWorks Citation
Richard Alan Merritts. 2013. Online Deception Detection Using BDI Agents. Doctoral dissertation. Nova Southeastern University. Retrieved from NSUWorks, Graduate School of Computer and Information Sciences. (244) https://nsuworks.nova.edu/gscis_etd/244.

This Dissertation is brought to you by the College of Engineering and Computing at NSUWorks. It has been accepted for inclusion in CEC Theses and Dissertations by an authorized administrator of NSUWorks. For more information, please contact [email protected].

Online Deception Detection Using BDI Agents
by
Richard A. Merritts

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Information Systems

Graduate School of Computer and Information Sciences
Nova Southeastern University
2013

An Abstract of a Dissertation Submitted to Nova Southeastern University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Online Deception Detection Using BDI Agents
by
Richard A. Merritts
September 2013

This research has two facets within separate research areas. The research area of Belief, Desire and Intention (BDI) agent capability development was extended. Deception detection research has been advanced with the development of automation using BDI agents. BDI agents performed tasks automatically and autonomously.
This study used these characteristics to automate deception detection with limited intervention of human users. This was a useful research area resulting in a capability general enough to have practical application by private individuals, investigators, organizations and others. The need for this research is grounded in the fact that humans are not very effective at detecting deception, whether in written or spoken form. This research extends deception detection capability research in that typical deception detection tools are labor intensive and require extraction of the text in question followed by ingestion into a deception detection tool. A neural network capability module was incorporated to lend the resulting prototype machine learning attributes. The prototype developed as a result of this research was able to classify online data as either “deceptive” or “not deceptive” with 85% accuracy. The false discovery rate for “deceptive” online data entries was 20% while the false discovery rate for “not deceptive” was 10%. The system showed stability during test runs. No computer crashes or other anomalous system behavior were observed during the testing phase. The prototype successfully interacted with an online data communications server database and processed data using Neural Network input vector generation algorithms within seconds.

Acknowledgements

I would like to express my gratitude to Dr. James Cannady for providing excellent guidance as my Dissertation Chair. I would also like to extend my gratitude to Dr. Sumitra Mukherjee and Dr. George Thurmond, II for their support and encouragement as members of the Dissertation Committee. All worked as a team to influence my activities and guide the research process in the right direction from start to finish. I would also like to thank my family for their support and especially Karen.
The dissertation process can be very stressful, and strong family encouragement and enthusiasm to complete the program are key ingredients for success. Special thanks to Aynsley and Richard, Jr. for making me laugh.

Table of Contents

Abstract
Acknowledgements
List of Tables
List of Figures

Chapters
1. Introduction
    Background
    Problem Statement
    Dissertation Goal
    Relevance and Significance
    Barriers and Issues
    Limitations
    Delimitations
    Definitions of Terms
    Summary
2. Review of the Literature
3. Methodology
    Overview of the Research Methodology
    Specifics of the Research
    Format for Results
    Resource Requirements
    Summary
4. Results
    Data Analysis
    Findings
    Summary of Results
5. Conclusions, Implications, Recommendations and Summary
    Conclusions
    Strengths
    Weaknesses
    Limitations
    Implications
    Recommendations
    Summary

Appendices
A. Weblog Data Entry Instructions
B. Performance Test Results
C. Neural Network File Division Procedures
D. Neural Network Validation Results
E. Neural Network Test Plan, Test Procedures and Test Results
F. ARFF Training File Contents
G. ARFF Testing File Contents

References

List of Tables

Tables
1. ART Definitions
2. Classification Validation Results
3. Classification Test Results
4. Test Results “weblogentry” Database Table Backup
5. Test Results “vectors.csv” File Creation
6. Test Results backup “vectors.csv” File Creation
7. Test Results “vectors.csv” File Deletion

List of Figures

Figures
1. Prototype Architecture and Working Scenario
2. Depiction of Prototype Development and Test Process
3. Formulae for Deception Detection Accuracy and False Discovery
4. WEKA GUI
5. “Not Deceptive” False Discovery Rate Formula

Chapter 1
Introduction

Background

The Internet is often used by people involved in criminal, terror, fraud, harassment and other malicious conduct (Boongoen & Shen, 2009). Deception in online data communications is commonly used by such persons as a precursor to the commission of these acts. The word deception can be described as information that falsely represents a fact and is intended to mislead the person to whom it is presented. Because online data is persistent, deceptive information contained therein can deceive many persons over its lifetime. This quality remains until the deception is discovered and eliminated. If unchecked, the persistence of deceptive online data communications allows repeated deception of the many people and entities that encounter it. By contrast, verbal deception is only effective as long as the source of the untruth continues to tell the lies. If others continue the deception it becomes rumor and loses its effect. Deceptive data communications can lead individuals who rely on Internet-based information for decision making to make poor decisions based on faulty information or to become victims of Internet crimes (Jensen, Burgoon, & Nunamaker, 2010).

Research conducted in an effort to protect against deception in online data communications is gaining in importance as people and organizations increasingly fall victim to Internet-based malicious activities. The ability to detect deception rapidly may serve to slow or prevent the dissemination of deceptive information. Here, “rapidly” refers to the amount of time that an individual using automated tools needs to determine whether a deception has occurred.

The U.S. Government has a keen interest in identifying deception in online data communications as part of its Information Operations (IO) initiative. IO is defined by the U.S.
Government below:

    Information operations (IO) are described as the integrated employment of electronic warfare (EW), computer network operations (CNO), psychological operations (PSYOP), military deception (MILDEC), and operations security (OPSEC), in concert with specified supporting and related capabilities, to influence, disrupt, corrupt, or usurp adversarial human and automated decision making while protecting our own (U.S. Department of Defense, Joint Chiefs of Staff, 2006, p. ix).

Online data communications deception and its discovery fall within the CNO subcategory.

Problem Statement

Current developments in deception detection in data communications environments are ineffective because of their time-intensive data extraction processes and an inefficient approach to processing data, which make them unsuitable for deception detection in real-time data communications environments. Automated linguistic and other indicators of deception have not been developed and deployed in online communications environments (Zhou & Zhang, 2008).

Dissertation Goal

The research goal is to create capabilities that autonomously and automatically detect deception in online data communications systems. The capabilities developed will detect online data communications deception to realize this goal. To measure accuracy, evidence of a deception will be uncovered in text entered by the user.

Relevance and Significance

Deception in online data communications will continue to increase if automated deception detection systems are not developed to combat it. All people and organizations are susceptible to online data communications deception resulting in criminal activity, and more will be victimized as the Internet continues to grow. For example, nearly one billion pounds ($1.58 billion US) have been lost in the United Kingdom to theft by criminals using online deception as a means to commit crimes (Boongoen & Shen, 2009).
Exacerbating this problem is the fact that some online data communications are completely harmless white lies,