On the Accuracy Evaluation of Access Control Policies in a Social Network

2020 International Conference on Computational Science and Computational Intelligence (CSCI) On the Accuracy Evaluation of Access Control Policies in a Social Network Jedidiah Yanez-Sierra Arturo Diaz-Perez Victor Sosa-Sosa CINVESTAV - Tamaulipas CINVESTAV - Guadalajara CINVESTAV - Tamaulipas Cd. Victoria, Mexico Guadalajara, Mexico Cd. Victoria, Mexico [email protected] [email protected] [email protected] Abstract—Access control policies are mandatory for organi- special role when potential disclosure restricts the availability zations whose operation involves sharing resources that must of the original data. Data collections have been used to produce be kept private. In this paper, we address the problem of synthetic versions of datasets when no real information is evaluating the accuracy of access control policies distributed in an interaction network modeled from a social network. Such available. network models granted access to documents owned by a large In 1993 Rubin proposed a multiple imputation framework set of users. Since denied accesses are not included in the input for synthetic data. Further contributions by Raghunathan et al. network, we discuss a method based on Network Science to [5], provided a detailed methodology for making inferences include complementary edges to have an approximate evaluation from synthetic data. The authors simulated multiple copies of of ACPs’ accuracy. The synthetic interactions allow the evaluation of ACPs by assessing the explicit and implicit intentions of the population and release a random sample from each of these the owners. We present an evaluation strategy to measure the synthetic populations. Each synthetic dataset depicts the target accuracy of the generated ACPs. The results will be of interest population based on the collected data. Similarly, Penny et al. to academics who want to synthesize information for similar [6] evaluate the use of hierarchical Bayes imputation models phenomena. for creating synthetic categorical data. Index Terms—Access Control Policies, Complex Networks, Synthesize Information Benedetto et al. [7], in partnership with the U.S. Census Bureau, report the creation of a partially synthetic Census Bureau data product called the SIPP Synthetic Beta (SSB). I. INTRODUCTION The SSB has been extensively tested, looking for analytic Two critical types of failures related to access control validity over the years as new versions have been released. decisions are accesses that should be denied but are not or Similarly, Snoke et al. [8] evaluate and recommend methods accesses that should be allowed but are not. Naturally, the to judge whether synthetic data have a distribution that is consequences of these failures have different impacts on the comparable to that of the original data. They also evaluate security and privacy of the information [1, 2]. the extension of existing global and specific measures of When users or organizations need to guarantee the security utility and perform comparisons for data generated by different and privacy of their information, it is essential to identify and methods of synthesis. reduce possible failures related to low restrictive or highly As can be observed, works in the literature focuses on over-adjusted access control policies (ACPs) [3, 4]. generating synthetic datasets that accurately model the original From a social network the set of interactions between a large data, but there is a gap when trying to complete a dataset set of users sharing information, complex network techniques modeling the opposite behavior to the real one. can be applied to model a network of interactions, where In this paper, we present a methodology based on Network vertices depict users (owners and consumers), and edges depict Science to comprehensively address the problem of evaluating the explicit permissions between owners and consumers to the accuracy of access control policies distributed in an interact with their content. In this scenario, the ACP assigned interaction network modeled from a social network, which to that interaction can: (a) preserve or (b) break the accesses. indicates explicit access to documents owned by a large set However, in an interaction-based graph, as well as in many of users. Since denied accesses are not included in the input other social network phenomena, only one type or part of the network, we discuss a method based on Network Science to information is modeled or available. In the above scenario, include complementary edges in order to have an approximate only explicit access permissions are reflected. Therefore, it evaluation of ACPs’ accuracy. Our method takes advantage is necessary to define a mechanism that allows enhancing of the underlying information discovered when modeling the the information, to evaluate whether the explicit permissions interactions between the documents and the users as a complex continue to be preserved and if the implicit permissions are network. kept. We have tested our approach with a real dataset from the There is a need for synthetic data generation methods to Instagram social network. Evaluating the accuracy of the ACPs perform proper inferences from them. Particularly, they play a associated with the users, the results showed that the proposed 978-1-7281-7624-6/20/$31.00 ©2020 IEEE 244 DOI 10.1109/CSCI51800.2020.00048 method allows us to adequately limit the number of synthetic Predicted/Classified Class edges to be added to the graph. Furthermore, it is shown that Negative Positive Negative True Negative (TN) False Positive (FP) Actual the synthetic edges adequately depict the opposite meaning to Positive False Negative (FN) True Positive (TP) the real edges since they had an impact of up to 4% when considering the synthetic edges. Fig. 1: Illustrative example of a binary confusion matrix. In Section II we provide the background of the proposed method. Section III describes the proposed methodology to Once defined the positive and negative possible outcomes, generate synthetic interactions. Section IV describes the pro- the accuracy can be defined as the ratio of the correctly posed evaluation methodology. Section V describes the ex- classified inputs to the whole dataset. Formally, the accuracy perimental evaluation and the achieved results. Finally, some is defined as follows. conclusions are discussed in Section VI. TP + TN Accuracy = (1) TP + TN + FP + FN II. BACKGROUND III. METHOD TO COMPLEMENT THE USER INTERACTIONS Complex networks show properties that only emerge when modeling real massive phenomena, and in some human-made Given the lack of real datasets having all the necessary systems such as the internet network. Complex networks have cases to evaluate social phenomena, here we describe the been studied due to the particular characteristics they present proposed methodology based on network science to synthesize in comparison to other types of networks. Characteristics as complementary interactions that model implicit denied access small-world effect, clustering, degree distribution, community in a network of interactions. structure, have been widely used to understand the modeled The proposed methodology consists of two steps, synthesis phenomenon [9]. of a large set of possible complementary interactions followed An interaction network is a graph G(VG,EG) that models by sampling a representative set of them. The set of ACPs a large set of interactions between documents and the users associated with the users must preserve the real accesses; who use them. The set of vertices depict users (owners and however, given the nature of the information, there is a consumers), and the edges depict the explicit permissions possibility that ACPs do not fulfill the original authorizations between owners and consumers to interact with their content. of the owners. The ACPs could: preserve or break the real The interaction network can be processed through complex interactions. network techniques to generate, based on the underlying Furthermore, in an interaction-based graph, only explicit information its partition from two axes, horizontal and vertical. access permissions are reflected. Therefore, it is necessary to Horizontal partitioning groups users into communities and define a mechanism that allows enhancing the information, not sub-communities, while a hierarchy of consumers and owners, only to evaluate whether the explicit permissions continue to depicted by k-shells, is created on the vertical axis. be preserved (TP, FN) but also if the implicit permissions (TN, Each vertex in the interaction graph has an associated ACP, FP) are kept. which defines the consumers who can access the content of As a result of evaluating the ACPs, there are only two an owner. By using the interaction graph, the accuracy of possible outputs, the policy allows or denies access. TP and the set of associated ACPs can be evaluated by analyzing FN are two cases implicit in the graph. To include the two whether a consumer fulfills the policy of a producer. However, remaining cases, TN and FP, it would be necessary to evaluate only explicit accesses are covered in the graph, limiting the the set ESG of all the possible combinations of edges not accuracy evaluation. explicitly present in the set EG, where ESG →{{v, w} ∈/ EG, | v, w ∈ VG}. A. Definition of the Accuracy Metric Since the interaction graph behaves like a complex network, In binary classification, data is divided into two different it is sparse. Only those complementary edges (also called classes, positives (P) and negatives

Load more