Relational Access to Unix Kernel Data Structures

Marios Fragkoulis, Athens University of Economics and Business
Diomidis Spinellis, Athens University of Economics and Business
Panos Louridas, Athens University of Economics and Business
Angelos Bilas, University of Crete and FORTH-ICS

Abstract

State of the art kernel diagnostic tools like DTrace and SystemTap provide a procedural interface for expressing analysis tasks. We argue that a relational interface to kernel data structures can offer complementary benefits for kernel diagnostics.

This work contributes a method and an implementation for mapping a kernel's data structures to a relational interface. The Pico COllections Query Library (PiCO QL) Linux kernel module uses a domain specific language to define a relational representation of accessible Linux kernel data structures, a parser to analyze the definitions, and a compiler to implement an SQL interface to the data structures. It then evaluates queries written in SQL against the kernel's data structures. PiCO QL queries are interactive and type safe. Unlike SystemTap and DTrace, PiCO QL is less intrusive because it does not require kernel instrumentation; instead it hooks to existing kernel data structures through the module's source code. PiCO QL imposes no overhead when idle and needs only access to the kernel data structures that contain relevant information for answering the input queries.

We demonstrate PiCO QL's usefulness by presenting Linux kernel queries that provide meaningful custom views of system resources and pinpoint issues, such as security vulnerabilities and performance problems.

Categories and Subject Descriptors D.2.5 [Testing and Debugging]: Diagnostics

Keywords Unix, kernel, diagnostics, SQL

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
EuroSys '14, April 13–16, 2014, Amsterdam, The Netherlands.
Copyright © 2014 ACM 978-1-4503-2704-6/14/04. $15.00.
http://dx.doi.org/10.1145/2592798.2592802

1. Introduction

Kernel diagnostic tools like DTrace [7] and SystemTap [27] provide a procedural interface for diagnosing system issues. We argue that a relational interface can offer complementary advantages; this is our motivation behind PiCO QL. Various imperative programming languages offer relational interfaces; some have become very popular [21, 25]. Previous work proposes relational interfaces in place of imperative ones for performing validation tasks at the operating system level [14, 29]. Windows-based operating systems provide the Windows Management Instrumentation (WMI) infrastructure [33] and the WMI Query Language (WQL), which adopts a relational syntax, to provide access to management data and operations.

Extending operating system kernels with high level tools [4, 7] is a growing trend. PiCO QL is such a tool used for performing kernel diagnostic actions. Our work contributes:

• a method for creating an extensible relational model of Unix kernel data structures (Sections 2.1 and 2.2),
• a method to provide relational query evaluation on Linux kernel data structures (Section 2.3), and
• a diagnostic tool to extract relational views of the kernel's state at runtime (Section 3). Each view is a custom image of the kernel's state defined in the form of an SQL query.

A Unix kernel running PiCO QL provides dynamic analysis of accessible data structures with the following features.

SQL SELECT queries: Queries conform to the SQL92 standard [8].

Type safety: Queries perform checks to ensure data structure types are used in a safe manner. PiCO QL does not affect the execution of kernel code, because it does not require its modification.

Low performance impact: PiCO QL's presence in the Unix kernel does not affect system performance. In addition, query execution consumes only the necessary kernel resources, that is, data structures whose relational representations appear in the query. Thus, PiCO QL minimizes its impact on kernel operation. Section 4.2 presents a quantitative evaluation of this impact.

Relational views: To reuse queries efficiently and store important queries, standard relational non-materialized views can be defined in the PiCO QL Domain Specific Language (DSL).

Unix kernel data structures include C structs, linked lists, arrays, and unions to name but a few [2, 6]. In the context of this paper, and unless otherwise noted, the term data structure refers in general to Unix kernel data structures. Although PiCO QL's implementation targets the Linux kernel, the relational interface is equally applicable to other versions of Unix featuring loadable kernel modules and the /proc file system.

We evaluate our tool by presenting SQL queries, which diagnose interesting effects in two domains that concern a system's operation, namely security and performance. We find that our tool can identify security vulnerabilities and performance problems with less effort than other approaches. We present several use cases to describe our tool's contribution to the kernel diagnostics tool stack.

2. A relational interface to the Unix kernel data structures

Our method for mapping the kernel's data structures to a relational interface addresses two challenges: first, how to provide a relational representation for the kernel's data structures; second, how to evaluate SQL queries to these data structures. The key points of the design that address these challenges include:

• rules for providing a relational representation of the kernel's data structures,
• a domain specific language (DSL) for specifying relational representations and required information about data structures, and
• the leveraging of SQLite's virtual table hooks to evaluate SQL queries.

2.1 Relational representation of data structures

The problem of mapping a program's in-memory data structures to a relational representation, and vice versa, has been studied thoroughly in the literature [28]. We address this problem from a different angle: we provide a relational interface to data structures without storing them in a relational database management system; the issue at hand is not the transformation of data from procedural code data structures to relational structures, but the representation of the same data in different models. In other words, we do not transform the data model used for procedural programming; instead we want to provide a relational view on top of it. Issues that emerge in the Object-Relational mapping problem are not relevant to our work.

To provide a relational representation we must solve the representation mismatch between relations and procedural programming data structures. Relations consist of a set of columns that host scalar values, while data structures form graphs of arbitrary structure.

Our method defines three entities: data structures, data structure associations, and virtual relational tables. Specifically, it provides a relational representation of data structures and data structure associations in the form of virtual relational tables. Data structures can be unary instances or containers grouping multiple instances.

Our method is expressive; it can represent has-a associations, many-to-many associations, and object-oriented features like inheritance and polymorphism [10]. In this paper we exemplify the representation of has-a associations, since this is the widespread type of associations in the kernel.

2.1.1 Has-a associations

Has-a associations are of two types: has-one and has-many. To represent data structures as first normal form relations, we normalize has-a associations between data structures, that is we define a separate relational representation for them. For instance Figure 1(a) shows a simplified kernel data structure model for the Linux kernel's files, processes, and virtual memory. Figure 1(b) shows the respective virtual table schema. There, a process's associated virtual memory structure has been represented in an associated table. The same applies for a process's open files, a has-many association. Notably, this normalization process is only required for has-many associations. In the same schema, the associated files_struct has been included within the process's representation. In addition, the structure fdtable has been included in its associated files_struct and it is also part of the process representation; each member of fdtable and files_struct occupies a column in Process_VT. By allowing to represent a has-one association as a separate table or inside the containing instance's table, the relational representation flexibly expands or folds to meet representation objectives.

Combining virtual tables in queries is achieved through join operations. In table Process_VT foreign key column fs_fd_file_id identifies the set of files that a process retains open. A process's open file information can be retrieved by specifying in a query a join to the file table (EFile_VT). By specifying a join to the file table, an instantiation happens. The instantiation of the file table is process specific; it contains the open files of a specific process only. For another process another instantiation would be created. Thus, multiple potential instances of EFile_VT exist implicitly, as Figure 1(b) shows.

2.2 Domain Specific Language design

Our method's DSL includes struct view definitions that describe a virtual table's columns, virtual table definitions that link a struct view definition to a data structure type, lock directive definitions to leverage existing locking facilities when querying data structures, and standard relational view
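
To make the has-a mapping of Section 2.1.1 more concrete, the following is a minimal Python sketch, not PiCO QL itself. It assumes ordinary in-memory SQLite tables as stand-ins for the virtual tables, made-up rows, illustrative member columns (fs_next_fd, fs_fd_max_fds), and a hypothetical base column that identifies the process-specific instantiation of EFile_VT. It shows has-one members folded into Process_VT as columns, and the has-many association to open files resolved through a join on fs_fd_file_id.

```python
import sqlite3

# Ordinary in-memory tables stand in for virtual tables here; a real
# deployment would read live kernel structures. All rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Has-one associations folded into the owner's table: fs_* columns
    -- represent files_struct members, fs_fd_* columns fdtable members.
    CREATE TABLE Process_VT (
        pid           INTEGER PRIMARY KEY,
        name          TEXT,
        fs_next_fd    INTEGER,  -- illustrative files_struct member
        fs_fd_max_fds INTEGER,  -- illustrative fdtable member
        fs_fd_file_id INTEGER   -- foreign key to the process's open files
    );
    -- Has-many association normalized into its own table. The base
    -- column is an assumption of this sketch: it identifies which
    -- process-specific instantiation a row belongs to.
    CREATE TABLE EFile_VT (
        base INTEGER,
        fd   INTEGER,
        path TEXT
    );
""")
conn.executemany(
    "INSERT INTO Process_VT VALUES (?, ?, ?, ?, ?)",
    [(1, "init", 3, 64, 100), (42, "sshd", 5, 256, 200)],
)
conn.executemany(
    "INSERT INTO EFile_VT VALUES (?, ?, ?)",
    [(100, 0, "/dev/null"), (100, 1, "/dev/console"),
     (200, 3, "/var/log/auth.log")],
)


def open_files(connection, pid):
    """Return (process name, fd, path) rows for one process's open files."""
    query = """
        SELECT P.name, F.fd, F.path
        FROM Process_VT AS P
        JOIN EFile_VT AS F ON F.base = P.fs_fd_file_id
        WHERE P.pid = ?
        ORDER BY F.fd
    """
    return connection.execute(query, (pid,)).fetchall()


if __name__ == "__main__":
    for row in open_files(conn, 42):
        print(row)
```

In this sketch the join selects only the rows whose base matches the process's foreign key, which mirrors the idea that each process sees its own instantiation of the file table.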
