System Call Analysis and Visualization

SYSTEM CALL ANALYSIS AND VISUALIZATION

A Project Presented to the faculty of the Department of Computer Science, California State University, Sacramento

Submitted in partial satisfaction of the requirements for the degree of MASTER OF SCIENCE in Computer Science

by

Aditya Singh

FALL 2018

© 2018 Aditya Singh
ALL RIGHTS RESERVED

A Project by Aditya Singh

Approved by:

__________________________________, Committee Chair
Dr. Xiaoyan Sun

__________________________________, Second Reader
Dr. Jun Dai

____________________________ Date

Student: Aditya Singh

I certify that this student has met the requirements for format contained in the University format manual, and that this project is suitable for shelving in the Library and credit is to be awarded for the project.

__________________________, Graduate Coordinator
Dr. Jinsong Ouyang

___________________ Date

Department of Computer Science

Abstract of SYSTEM CALL ANALYSIS AND VISUALIZATION by Aditya Singh

Attacks on computer systems are now widespread. Attackers use automated tools and programs to try to gain access to users' data. However, it is hard for attackers to avoid making system calls. User-level processes use system calls to request services from the operating system kernel, and every basic interaction between a program and the operating system goes through them. Examples include allocating and deallocating memory; opening, reading, renaming, and closing files; and starting and stopping processes. The resulting system logs can be overwhelmingly large, which makes it hard for system administrators to extract useful information from them.
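A short sketch can make the log-size problem concrete. It shows how a raw system call log might be reduced to a compact summary.

The sketch assumes a hypothetical comma-separated log with one system call per row. The column names `pid`, `ppid`, `pcmd`, `syscall`, `start_time`, and `end_time` are illustrative assumptions, not the project's actual log format. The Python function `summarize_log` is likewise hypothetical. It skips malformed rows and returns the most frequent system calls, as a rough first data-exploration step.

```python
import csv
import io
from collections import Counter

# Hypothetical log layout (an assumption, not the project's actual format):
# one system call per row with the columns below.
FIELDS = ["pid", "ppid", "pcmd", "syscall", "start_time", "end_time"]


def summarize_log(stream, top_n=3):
    """Reduce a raw system call log to its most frequent system calls.

    Rows with the wrong number of fields or non-numeric timestamps are
    skipped, a simple stand-in for a data-cleaning step.
    Returns (list of (syscall, count) pairs, number of skipped rows).
    """
    counts = Counter()
    skipped = 0
    for row in csv.reader(stream):
        if len(row) != len(FIELDS):
            skipped += 1
            continue
        record = dict(zip(FIELDS, row))
        try:
            float(record["start_time"])
            float(record["end_time"])
        except ValueError:
            skipped += 1
            continue
        counts[record["syscall"]] += 1
    return counts.most_common(top_n), skipped


if __name__ == "__main__":
    sample = io.StringIO(
        "101,1,bash,open,0.10,0.11\n"
        "101,1,bash,read,0.12,0.13\n"
        "101,1,bash,read,0.14,0.15\n"
        "202,101,cat,close,0.20,0.21\n"
        "garbled line\n"
    )
    top, skipped = summarize_log(sample)
    print(top)      # [('read', 2), ('open', 1), ('close', 1)]
    print(skipped)  # 1
```

On a real log, the same idea scales to millions of rows, because only the per-call counters are kept in memory. Grouping by `pcmd` or `pid` instead of `syscall` would give per-process summaries in the same way.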
In this project, we propose to analyze and visualize system calls so that system administrators can easily extract information from the logs and identify suspicious activities and behavior. The steps in the project include data collection, data exploration, data cleaning, data transformation, data mining, and data visualization. This approach extracts important information from system calls by using data mining and machine learning algorithms. The statistics obtained through system call analysis and visualization provide valuable information about system activities and reveal important patterns. These patterns can help identify suspicious behavior that might be related to attacks.

_______________________, Committee Chair
Dr. Xiaoyan Sun

________________________ Date

DEDICATION

To My Parents

ACKNOWLEDGEMENTS

I want to thank Dr. Xiaoyan Sun for providing me with the opportunity to work on this project and for guiding me throughout it. She offered me great insight and believed in my abilities. I thank her for continually providing feedback and pushing my limits so that I could learn from my mistakes. I would also like to thank Dr. Jun Dai for his readiness to evaluate this report and provide helpful feedback. I also thank the Department of Computer Science at California State University, Sacramento. I am grateful to my parents, friends, and elders for supporting me throughout this journey to complete the Master's degree program. Finally, I would like to thank Preetham Dhondaley and Bhuvan Bhatia for their continuous support and feedback.

TABLE OF CONTENTS

Acknowledgements
List of Figures
List of Acronyms
Chapter
1. INTRODUCTION
   1.1 Research Motivation
   1.2 Related Work
   1.3 Our approach
2. BACKGROUND
   2.1 Used Technologies/Tools
   2.2 Machine Learning concepts
3. DESIGN
4. DATA COLLECTION
5. DATA PREPROCESSING
6. DATA ANALYSIS AND VISUALIZATION
   6.1 Start time and End time Distribution
   6.2 System call vs. Start time Distribution
   6.3 PCMD and Start time Distribution
   6.4 Analysis using Machine learning
      6.4.1 K-means Clustering
      6.4.2 K-means clustering with PCMD and Start time
      6.4.3 Clustering with all the essential attributes
7. CONCLUSION
8. FUTURE WORK
Bibliography

LIST OF FIGURES

1. ARFF format
2. K-means clustering
3. Project Design
4. Data Collection
5. Data preprocessing
6. Transformed data
7. Code Snippet to create a csv file
8. Start Time distribution
9. End time distribution
10. System call vs. start time distribution
11. System call Count
12. PCMD table
13. Clustering report
14. Visualization of cluster
15. Clustering report with PCMD and Start time
16. Clustering visualization with PCMD and Start time
17. Visualization of Start time and PPID
18. Result of Start time and PPID clustering

LIST OF ACRONYMS

WEKA: Waikato Environment for Knowledge Analysis
ARFF: Attribute-Relation File Format
SODG: System Object Dependency Graph
PCMD: Parallel Command
SSL: Secure Sockets Layer
SSH: Secure Shell
XSS: Cross-Site Scripting
CSV: Comma-Separated Values
PPID: Parent Process Identifier
PID: Process Identifier

1. INTRODUCTION

Attackers today use a variety of tools and technologies to gain access to the systems of an enterprise network. Analyzing system data has become one of the most commonly used techniques for detecting intrusions. Because system calls neutrally capture all system activity, both benign and malicious, analyzing them is a very effective way to detect attacks. However, systems can generate an overwhelming number of system calls, so extracting useful information from system call logs is very challenging. System call analysis and visualization are therefore important for detecting attacks efficiently and effectively.

1.1 Research Motivation

The system call is the interface between the kernel and user programs. The kernel provides services