XML Access Control Using Static Analysis
Total Page:16
File Type:pdf, Size:1020Kb
XML Access Control Using Static Analysis Makoto Murata Akihiko Tozawa Michiharu Kudo IBM Tokyo Research Lab/IUJ IBM Tokyo Research Lab IBM Tokyo Research Lab Research Institute 1623-14, Shimotsuruma, 1623-14, Shimotsuruma, 1623-14, Shimotsuruma, Yamato-shi, Yamato-shi, Yamato-shi, Kanagawa-ken 242-8502, Kanagawa-ken 242-8502, Kanagawa-ken 242-8502, Japan Japan Japan [email protected] [email protected] [email protected] ABSTRACT 1. INTRODUCTION Access control policies for XML typically use regular path XML [5] has become an active area in database research. expressions such as XPath for specifying the objects for ac- XPath [6] and XQuery [4] from the W3C have come to be cess control policies. However such access control policies widely recognized as query languages for XML, and their are burdens to the engines for XML query languages. To implementations are actively in progress. In this paper, relieve this burden, we introduce static analysis for XML we are concerned with fine-grained (element- and attribute- access control. Given an access control policy, query expres- level) access control for XML database systems rather than sion, and an optional schema, static analysis determines if document-level or collection-level access control. We be- this query expression is guaranteed not to access elements lieve that access control plays an important role in XML or attributes that are permitted by the schema but hidden database systems, as it does in relational database systems. by the access control policy. Static analysis can be per- Some early experiences [21, 10, 3] with access control for formed without evaluating any query expression against an XML documents have been reported already. actual database. Run-time checking is required only when Access control for XML documents should ideally provide static analysis is unable to determine whether to grant or expressiveness as well as efficiency. That is, (1) it should be deny access requests. A nice side-effect of static analysis easy to write fine-grained access control policies, and (2) it is query optimization: access-denied expressions in queries should be possible to efficiently determine whether an ac- can be evaluated to empty lists at compile time. We have cess to an element or an attribute is granted or denied by built a prototype of static analysis for XQuery, and shown such fine-grained access control policies. It is difficult to ful- the effectiveness and scalability through experiments. fill both of these requirements, since XML documents have richer structures than relational databases. In particular, Categories and Subject Descriptors access control policies, query expressions, and schemas for XML documents are required to handle an infinite number H.2.7 [Database Management]: Database Administra- of paths, since there is no upper bound on the height of tion—Security, integrity, and protection; D.4.6 [Operating XML document trees. Systems]: Security and Protection—Access controls Existing languages (e.g. [21, 10]) for XML access con- trol achieve expressiveness by using XPath [6] as a simple General Terms and powerful mechanism for handling an infinite number of paths. For example, to deny accesses to name elements that Algorithms,Performance,Experimentation,Security,Theory are immediately or non-immediately subordinate to article elements, it suffices to specify a simple XPath expression Keywords //article//name as part of an access control policy. XML, XQuery, XPath, schema, automaton, access control, However, XPath-based access control policies are addi- query optimization, static analysis tional burdens for XML query engines. Whenever an el- ement or attribute in an XML database is accessed at run time, a query engine is required to determine whether or not this access is granted by the access control policies. Since such accesses are frequently repeated during query evalua- tion, naive implementations for checking access control poli- cies can lead to unacceptable performance. In this paper, we introduce static analysis as a new ap- Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are proach for XML access control. Static analysis examines ac- not made or distributed for profit or commercial advantage and that copies cesscontrolpoliciesandqueryexpressionsaswellasschemas, bear this notice and the full citation on the first page. To copy otherwise, to if present. Unlike the run-time checking described above, republish, to post on servers or to redistribute to lists, requires prior specific static analysis does not examine actual databases. Thus, permission and/or a fee. static analysis can be performed at compile time (when a CCS’03, October 27–31, 2003, Washington, DC, USA. query expression is created rather than each time it is eval- Copyright 2003 ACM 1-58113-738-9/03/0010 ...$5.00. 73 uated). Run-time checking is required only when static anal- searchers have used automata (string automata or tree au- ysis is unable to grant or deny access requests without ex- tomata) for handling queries, schemas, patterns, or integrity amining the actual databases. In addition, static analysis constraints. Furthermore, recent works apply boolean op- facilitates query optimization, since access-denied XPath ex- erations (typically the intersection operation) to such au- pressions in queries can be rewritten as empty lists at com- tomata. These works include type checking (e.g., [20, 12]), pile time. In Section 5, we will demonstrate effectiveness of query optimization using schemas (e.g., [14]), query opti- static analysis using examples. mization using views (e.g., [27, 11, 23, 25, 30]), consistency The key idea for our static analysis is to use automata for between integrity constraints and schemas (e.g., [13]). Our representing and comparing queries, access control policies, static analysis uses similar techniques. However, to the best and schemas. Our static analysis has two phases. In the of our knowledge, our static analysis is the first application first phase, we create automata from queries, access con- of automata for XML access control. trol policies, and (optionally) schemas: (1) automata cre- XPath containment [11, 23, 25, 30] is similar to our static ated from queries, called query automata,representpaths analysis, since we compare XPath expressions for queries to elements or attributes as accessed by these queries; (2) and those for access control policies. However, denial rules those created from access control policies, called access con- (shown in Section 3) in access control policies require that trol automata, represent paths to elements or attributes as our static analysis apply the negation operation to automata exposed by these access control policies; and (3) those cre- and use both over- and under-estimation of access control ated from schemas, called schema automata,representpaths automata. to elements or attributes as permitted by these schemas. In the second phase, we compare these automata while apply- 1.2 Outline ing the following rules: (1) accesses by queries are always- The rest of this paper is organized as follows. After re- granted if the intersection of query automata and schema viewing the fundamentals of XML, schemas, XPath, and automata is subsumed by the access control automata; (2) XQuery in Section 2, we introduce access control policies they are always-denied if the intersection of query automata, for XML documents in Section 3. We introduce static anal- schema automata, and access control automata is empty; ysis in Section 4 and further demonstrate the effectiveness and (3) they are statically indeterminate,otherwise. and scalability of static analysis in Section 5. We discuss fu- ture extensions for handling value-based access control and 1.1 Related Works the advanced features of XPath and XQuery, and conclude in Section 6. Fine-grained access control for XML documents has been studied by many researchers [2, 21, 3, 10, 15]. Their access control policies are similar to ours. They all provide run- 2. PRELIMINARIES time checking of access control policies, but do not consider In this section, we introduce the basics of XML, schema static analysis. Their algorithms for run-time checking as- languages, XPath, and XQuery. sume that XML documents are in the main memory and can be examined repeatedly. 2.1 XML Access control for an RDBMS is driven by views, which An XML document consists of elements, attributes, and hide some information (typically attributes in relations) in text nodes. These elements collectively form a tree. The the RDBMS. Queries written by users do not access actual content of each element is a sequence of elements or text databases, but rather access these views. View-driven ac- nodes. An element has a set of attributes, each of which has cess control is typically efficient, since view queries and user a name and a value. We hereafter use ΣE and ΣA as a set queries are optimized together and then executed. In other of tag names and that of attribute names, respectively. To words, access control is provided partly by optimization at distinguish between the symbols in these sets, we prepend compile-time and partly by checking at run-time. ’@’ to symbols in ΣA. Object-oriented database systems (OODBMS) provide richer An XML document representing a medical record is shown structures than RDBMSs or XML. In fact, an OODBMS in Figure 1. This XML document describes diagnosis and provides network structures and class hierarchies. Access chemotherapy information for a certain patient. Several control frameworks for the OODBMShaveappearedinthe comments are inserted in this document. For the rest of literature [1, 28]. Such frameworks typically rely on run- this paper, we use this document as a motivating example. time analysis and do not use static analysis. Our static analysis for XML access control is made possi- 2.2 Schema ble by the tree-structured nature of XML. First, the schemas A schema is a description of permissible XML documents.