Trustworthy Whole-System Provenance for the Linux Kernel Adam Bates, Dave (Jing) Tian, Thomas Moyer Kevin R.B. Butler University of Florida MIT Lincoln Laboratory {adammbates,daveti,butler}@ufl.edu
[email protected] Abstract is presently of enormous interest in a variety of dis- In a provenance-aware system, mechanisms gather parate communities including scientific data processing, and report metadata that describes the history of each ob- databases, software development, and storage [43, 53]. ject being processed on the system, allowing users to un- Provenance has also been demonstrated to be of great derstand how data objects came to exist in their present value to security by identifying malicious activity in data state. However, while past work has demonstrated the centers [5, 27, 56, 65, 66], improving Mandatory Access usefulness of provenance, less attention has been given Control (MAC) labels [45, 46, 47], and assuring regula- to securing provenance-aware systems. Provenance it- tory compliance [3]. self is a ripe attack vector, and its authenticity and in- Unfortunately, most provenance collection mecha- tegrity must be guaranteed before it can be put to use. nisms in the literature exist as fully-trusted user space We present Linux Provenance Modules (LPM), applications [28, 27, 41, 56]. Even kernel-based prove- the first general framework for the development of nance mechanisms [43, 48] and sketches for trusted provenance-aware systems. We demonstrate that LPM provenance architectures [40, 42] fall short of providing creates a trusted provenance-aware execution environ- a provenance-aware system for malicious environments. ment, collecting complete whole-system provenance The problem of whether or not to trust provenance is fur- while imposing as little as 2.7% performance overhead ther exacerbated in distributed environments, or in lay- on normal system operation.