Compositional Evolution of Secure Services with Aspects

CESSA Compositional Evolution of Secure Services using Aspects

ANR project no. 09-SEGI-002-01

Demonstrator for ERP

Abstract. One of the CESSA project's main goals is to provide mechanisms that cleanly separate security functionality from other concerns in Service Oriented Architectures using the Aspect-Oriented Programming paradigm. In this document, we report on the application of the methodologies and techniques developed by CESSA to address diverse security problems in distributed enterprise service-oriented architectures. Among other contributions, we automate the mitigation of security vulnerabilities in web services, enforce privacy policies, and provide security for collaborations using the REST web service de facto standard.

Deliverable No.: I3.3
Task No.: 3
Type: Deliverable
Dissemination: Public
Status: Final
Version: 1.0
Date: 11 Jan. 2013
Authors: Julien Massiera, Jean-Christophe Pazzaglia, Anderson Santana de Oliveira, Theodoor Scholte, Jakub Sendor, Gabriel Serme (SAP); Yves Roudier, Muhammad Sabir Idrees (Eurecom)

Contents

1 Introduction

2 Adaptive Security
   2.1 Context and Motivation
   2.2 Services
   2.3 Security
   2.4 Aspect-Oriented Programming
   2.5 Architecture proposal
   2.6 Application Example
   2.7 Related work
   2.8 Summary

3 Aspects for the Correction of Security Vulnerabilities in Web Services and Applications
   3.1 Context and Motivation
   3.2 An agile approach
   3.3 Architecture
   3.4 Static analysis
       3.4.1 Static Analysis Process
       3.4.2 Multiple vulnerability analysis
   3.5 Assisted Remediation
       3.5.1 Methodology
       3.5.2 Constraints from Aspect-Oriented Programming
   3.6 Related work
   3.7 Summary

4 Automated Prevention of Input Validation Vulnerabilities in Web Applications
   4.1 Introduction
   4.2 Preventing input validation vulnerabilities
       4.2.1 Output sanitization
       4.2.2 Input validation
       4.2.3 Discussion
   4.3 Output Sanitization and Input Validation
   4.4 IPAAS
       4.4.1 Parameter Extraction
       4.4.2 Parameter Analysis
       4.4.3 Runtime Enforcement
       4.4.4 Prototype Implementation
       4.4.5 Discussion
   4.5 Evaluation
       4.5.1 Vulnerabilities
       4.5.2 Automated Parameter Analysis
       4.5.3 Static Analyzer
       4.5.4 Impact
   4.6 Related Work
       4.6.1 Input validation
       4.6.2 Attack detection and prevention
       4.6.3 Vulnerability analysis
   4.7 Summary

5 Enabling Message Security for RESTful Services
   5.1 Context
   5.2 REST Security Protocol
       5.2.1 Message Security Model
       5.2.2 PKI-based message exchange
       5.2.3 The REST Security principle
       5.2.4 Message Signature
       5.2.5 Message Encryption
       5.2.6 Signature and Encryption
       5.2.7 Multiparts
   5.3 Comparison to WS-Security
       5.3.1 Environment & Methodology
       5.3.2 Size comparison
       5.3.3 Processing performance comparison
   5.4 Related Work
   5.5 Summary

6 Automating Privacy Enforcement in Cloud Platforms
   6.1 Context and Motivation
   6.2 Privacy-Aware Applications in the Cloud
       6.2.1 Use case
       6.2.2 Background: Privacy Policy Language
   6.3 Privacy Enhanced Application Programming
       6.3.1 Programming Model
       6.3.2 Implementation
   6.4 Related Works
   6.5 Summary

7 Concluding Remarks
   7.1 Acknowledgments

Bibliography

Chapter 1

Introduction

One of the CESSA project's main goals is to provide mechanisms that cleanly separate security functionality from other concerns in Service Oriented Architectures using the Aspect-Oriented Programming paradigm. In this document, we report on several applications of the methodologies and techniques developed by CESSA to address diverse security problems around service-oriented architectures. Although ERP (Enterprise Resource Planning) remains the foundation of SAP's reputation and one of the company's major product lines, we focused on developing proofs of concept on the software platform the company provides today. The SAP NetWeaver Cloud1 allows developers to build web applications and services in a dedicated development environment. The resulting service-oriented artifacts can be deployed on the SAP cloud infrastructure, which characterizes the solution as a Platform as a Service. We have chosen to keep the title of the deliverable as suggested in the description of work, for ease of reference and for coherence with respect to the document. This deliverable therefore brings several contributions around securing SOAs with Aspect-Oriented techniques, either vertically or horizontally, mainly reporting the work executed in the context of Task 3.2 "Security-related aspects and aspect interfaces for use cases" and Task 3.3 "Design and development of a proof of concept implementation for enterprise information systems". We summarize these contributions and outline the remainder of the deliverable as follows:

• Chapter 2 proposes a framework for adapting security mechanisms across services by using Aspect-Oriented Programming (AOP) concepts, applicable to SCM applications. The novelty lies in expressing security policy at a global level and enforcing it at a local level, through a specific, distributed aspect model whose richer semantics capture events relevant to business usage and dedicated to security concerns. This work has been published at WSSCM 2011 [99].

• Chapter 3 introduces an integrated Eclipse plug-in to assist developers in the detection and mitigation of security vulnerabilities using Aspect-Oriented Programming early in the development life-cycle. The work combines static analysis with protection code generation during the development phase. We leverage the developer's interaction with the integrated tool to obtain more knowledge about the system, and to report back a better overview of the different security aspects already applied. This work appeared at SECURWARE 2012 [97], where it received the best paper award2.

• Chapter 4 brings a novel technique for preventing the exploitation of XSS and SQL injection vulnerabilities based on the automated data type detection of input parameters. IPAAS automatically and transparently augments otherwise insecure web application development environments with input validators that result in significant and tangible security improvements for real systems. Specifically, IPAAS automatically (i) extracts the parameters of a web application; (ii) learns types for each parameter by applying a combination of machine learning over training data and a simple static analysis of the application; and (iii) applies robust validators for each parameter to the web application with respect to the inferred types. These validators, which can be seen as message interceptors, act around web services and applications, making input validation an aspectualized concern. This work appeared in COMPSAC 2012 [94].

• Chapter 5 presents the REST security protocol, which provides secure service communication for RESTful web services, as mainstream service providers are nowadays shifting to REST-based services to the detriment of SOAP-based ones. REST proposes a lightweight approach to consuming resources with no specific encapsulation, thus lacking meta-data descriptions for security requirements. Currently, the security of RESTful services relies on ad-hoc security mechanisms (whose implementation is error-prone) or on transport layer security (which offers poor flexibility). The chapter also provides a performance analysis comparing the protocol to an equivalent WS-Security configuration. This work was published in the research track of ICWS 2012 [98].

• Chapter 6 experiments with vertical composition for the enforcement of privacy policies in SOA. The enforcement of privacy policies is facilitated in a Platform as a Service. Cloud service developers can use simple "aspect" annotations in the code to indicate where personally identifiable information is being handled, prior to the application's deployment in the cloud. The evaluation of user-defined preferences is performed by trustful components generated by the platform, releasing developers from the creation of ad hoc mechanisms for privacy enforcement. This work was published in the proceedings of the Data Privacy Management Workshop [114].

• Chapter 7 concludes the deliverable.

1 http://scn.sap.com/community/developer-center/cloud-platform
2 http://www.iaria.org/conferences2012/AwardsSECURWARE12.html

Chapter 2

Adaptive Security

2.1 Context and Motivation

An SCM application can be viewed as a long chain process along which goods have to pass through mandatory gates. It involves various devices, from embedded systems like sensors to large-scale servers in backend systems. Sensors are dedicated to data collection and signal triggering: they capture and measure real-world status. Backend systems allow for data processing but need to adapt to all devices communicating with them, as each can have a different communication protocol and data format. The heterogeneity of platforms and software used in devices makes it difficult to manage even simple security rules, especially across a supply chain. In order to deal with the multiple possibilities without interfering with the business part of the software, one might want to describe security behavior for one system that adapts to the security capabilities of the systems communicating with it. To do so, we propose an architecture that allows the correct modularization of security concerns, so that one can quickly intervene in applications and make them adapt to the conditions they face. The application uses the SOA architectural style to provide a loosely-coupled platform where entities can integrate with each other. In the following sections, we start by explaining the different concepts used in our proposed architecture: Web Services and SOA concepts, the security properties we aim to express in an adaptive manner, and the AOP (Aspect-Oriented Programming) paradigm. Then, we describe the proposed architecture and the process to handle service adaptation, with two examples highlighting the difficulties of adapting security for systems accordingly.

2.2 Services

Service Oriented Architectures (SOA) enable a world of loosely-coupled and interoperable software components oriented towards reusability. Nowadays, the main entity used to represent a software service is a Web Service. Web Services represent a paradigm defined by the W3C as "a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, conveyed using HTTP with an XML serialization in conjunction with other Web-related standards" [9]. Web Services can also be addressed through other transport mechanisms such as JMS or ESBs. The Web Service standards stack goes beyond the atomic service and proposes different approaches depending on the level of abstraction. Service behavior can be defined by linking different services together, e.g., with BPEL4WS or BPMN 2.0 [38], which allows the definition of service compositions that realize so-called business processes.

2.3 Security

As services advance fast and are being extensively deployed in applications spanning different organizations, it becomes crucial to ensure security and trust for these applications to hold their promise. It was only recognized in recent years that services are themselves susceptible to various attacks at different levels of system conceptualization [45]. Since SOA is a flexible set of design principles used during the phases of system development, integration, and evolution, one obvious and common challenge is to secure SOA. This often involves invasive modifications, in particular to enable new security functionalities that require modifications to applications. Furthermore, crosscutting security functionality in service-based systems is difficult to specify and enforce because the security of services and their compositions is not modular: modifications made to one part of an application may interact strongly with the security properties of other parts of the same application. Security properties generally pervade software systems; that is, security properties crosscut service-oriented architectures. Enforcing service-level security needs specialization based on the implementation. We propose in the following an aspect-based service model that offers an original method to introduce several security properties. These security properties are specified in a security policy language that is then interpreted to generate crosscutting concerns. They include integrity, which relates to communication, storage (resources), and execution (process and infrastructure) integrity; the integrity of the execution environment is an important security objective, together with data integrity measures. Confidentiality relates to message exchanges between entities such as sensors and services, or to need-to-know limitations applied to specific resources. Authentication and authorization crosscut applications to decide, at several points, whether a given subject is allowed to perform an action on a given resource. Whereas authorization decisions are taken mainly on the server side, the authentication mechanism requires all peers to adapt and agree on the scheme, including sensor authentication. Applying non-repudiation requires the implementation of an asymmetric encryption scheme in the execution environment supporting the computation. The aforementioned properties represent the security goals we want to apply to applications by adapting them with our framework.

2.4 Aspect-Oriented Programming

The term Aspect-Oriented Programming (AOP) [48] was coined around 1995 by a group led by Gregor Kiczales, with the goal of bringing proper separation of concerns for crosscutting functionalities. Its foundations can be traced back to adaptive programming or composition filters [62]. O. Selfridge introduced a notion that can be related to AOP: "demons that record events as they occur, recognize patterns in those events, and can trigger subsequent events according to patterns that they care about" [96]. The approach has since evolved into a discipline of its own. An aspect is composed of several advice/pointcut pairs. Pointcuts define where (points in the source code of an application) or when (events during the execution of an application) aspects should apply modifications. Pointcuts are expressed in pointcut languages and often contain a large number of aspect-specific constructs that match specific structures of the language in which base applications are expressed, such as a pattern language based on the language syntax. Advice defines the modifications an aspect may perform on the base application. Advice is often expressed in a general-purpose language with a small number of aspect-specific extensions, such as the proceed construct that allows the execution of the behavior of the base application that triggered the aspect application in the first place. The main advantage of using this technology is the ability to intervene in the execution without interfering with the base program code, thus facilitating maintainability.
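To make the pointcut/advice vocabulary concrete, the following minimal sketch uses AspectJ's annotation style (the same style as Listing 2.2 later in this chapter). The package name, method pattern and logging are illustrative assumptions, not part of the CESSA framework.

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class AuditAspect {
    // Pointcut: "where" the aspect applies - the execution of any public
    // method whose name starts with "process" in a hypothetical service package.
    @Around("execution(public * com.example.services..*.process*(..))")
    public Object audit(ProceedingJoinPoint jp) throws Throwable {
        System.out.println("before " + jp.getSignature());
        // Advice body: "what" the aspect does; proceed() runs the behavior
        // of the base application that triggered the aspect.
        Object result = jp.proceed();
        System.out.println("after " + jp.getSignature());
        return result;
    }
}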

2.5 Architecture proposal

In this work, we shape the solution we are currently implementing at the service layer. The root of the problem is to instrument several services at the same time, potentially not under the same execution environment, to realize a specific security property. Illustrative examples are given in the next section. The architecture is presented in Figure 2.1. It contains two parts: a design part and a runtime part. The design part involves business stakeholders, who define the aforementioned security policies (also denoted rules), and security experts, who provide concrete security mechanisms as pre-defined aspects. The runtime part of the architecture leverages the aspect model to modify the different execution environments and make them satisfy the security policies specified by business and security stakeholders. The main piece is the runtime engine, whose goal is to detect a certain state across platforms. The state is described by rules composed of predicates. Upon a match between rules and a state, platforms coordinate to realize a new behavior. Locally, systems implement mechanisms to realize a behavior specified in the knowledge base and make use of the context information available at execution time. Runtime monitor agents and the runtime engine work together to realize a distributed aspect model, introduced in [72]. The advantage of such an architecture is to intervene in one specific organization where we separate security concerns across different platforms. The security code is no longer tied to the business code. Rather, it is intentionally decoupled from the business code and bound with the distributed pointcut language. The runtime engine uses rules to gather the state of various services at the same time, and security aspects are used in advice to dispatch security behavior across several services. For example, in the integrity and origination scenario, the distributed aspect model tracks all places where messages come into the system to taint them. Then, the system tracks these tainted messages to weave security aspects when behavior is needed.

Figure 2.1: Proposed architecture

Security policies are specified by business users; security experts and architects then derive them into rules to realize a specific security property. To provide enriched semantics addressing all concerns of services and security at the same time, we are developing a policy language which contains a high-level description of the wanted behavior, through predicates. The predicates allow matching with a particular state. The language semantics relate to events and actions a platform can generate, i.e., services, messages and resources. Services can be atomic or composite, e.g., orchestrations of services with BPEL4WS or BPMN 2.0, or simply a service that consumes other services. Examples of predicates are "receive" or "reply" to match a service call. There are also predicates for messages and resources. Predicates have different levels of abstraction relating to the service stack discussed above. The policies also contain the behavior that stakeholders want to introduce across systems. In our framework, we address only security concerns such as the ones described in the previous sections: confidentiality, integrity, authorization, etc. In Listing 2.1, the policy describes the required presence of integrity and non-repudiation in messages when they are issued by sensors. The message origination is verified when we are able to verify the signature. The behavior is described in an abstract way to indicate which parties are concerned and what shall be executed. Platforms receive this behavior and are in charge of translating it according to the mechanisms available for the given platform. Listing 2.2 in the next section is an example of Java code to verify a given signature for a message.

The runtime monitor detects a certain application state through the aforementioned predicates. Upon a match, the wanted behavior is read from the policies and spread to the concerned systems to satisfy and realize the security properties.

message_in(issuer, msg):
    issuer in (s:sensors) => verify_integrity(s, msg), verify_origination(s, msg)
    msg.taint(UNSAFE)  # Default
    msg:integrity, msg:non_repudiation => msg.taint(SAFE)

verify_integrity(msg):
    msg.contains(integrity), integrity.check(msg) => msg:integrity

verify_origination(msg, issuer):
    msg.contains(sign), sign.issued(issuer) => verify_origination(msg, sign, issuer), msg:non_repudiation

Listing 2.1: Policy snippet for integrity and non-repudiation check

Our framework relies heavily on the aspect-oriented paradigm. The runtime monitor is able to detect system state using a distributed aspect pointcut language. On matching a system state, advice is executed with the system's context through context exposition mechanisms. For example, a service can expose information about inputs, outputs and service origination, as well as other security-related information. The proposed architecture allows the definition of security policies for service systems at a global level that are then enforced at a local level in a semi-automatic mode. We propose to decouple the definition of specific security properties from the base application, and to declare it through rules respecting application owners' needs.

2.6 Application Example

We describe two scenarios illustrating when our framework can be applied in an SCM application. The long-term scenario (cf. Fig. 2.2) is a military container that is shipped by boat to supply a camp. The shipment is not direct and the container has to pass through several intermediaries, to refuel, change boat, etc. Thus, the army has decided to frequently track and check containers when they stop in harbors. The communication between the containers and the army system is made through Web Services and uses the harbor system to certify the shipment's advancement.

The first scenario leveraging our framework concerns the adaptation and tracking of sensitive data. It highlights the integrity and non-repudiation scenario and how it impacts the existing architecture on the army systems. Over time, the army deployed containers with protection mechanisms, composed of sensors and nodes, to detect failure or intrusion. Different software solutions, with different security capabilities, are shipped with the containers. Maintaining applications, both in nodes and back-end systems, is costly. It requires business owners to specify each possible use case, at a time t. Upon the release of a new version, they need to extend existing specifications and activate their development team. The development team has to implement the solution correctly without breaking the previous solutions. The release cycle can be counted in weeks or even months and is error-prone.

Figure 2.2: Army shipment and control system

A suitable solution is to have a framework that knows what to do in a given situation. For example, we want to allow communication between all node versions and the back-end system while keeping track of sensitive nodes - those which do not implement security mechanisms. An example of a policy we might define to detect various versions of protocols is described in Listing 2.1. The policy language used is not yet fully developed and is used as an illustration.

1  @Aspect
2  class Verification {
3    // ...
4    @Covers(SPL.verify_origination) // Provided by framework
5    boolean verifyOrigination(Byte[] msg, Byte[] msgSig, Identity issuer) {
6      // get public key of issuer
7      X509EncodedKeySpec pubKeySpec = new X509EncodedKeySpec(Security.getPubKey(issuer));
8      KeyFactory keyFactory = KeyFactory.getInstance("DSA", "SUN");
9      PublicKey pubKey = keyFactory.generatePublic(pubKeySpec);
10
11     // get message signature
12     Signature sig = Signature.getInstance("SHA1withDSA", "SUN");
13     sig.initVerify(pubKey);
14     sig.update(msg);
15
16     // verify
17     boolean verifies = sig.verify(msgSig);
18     return verifies;
19   }
20 }

Listing 2.2: Java snippet for proof of origin as aspect

Figure 2.3 shows the sequence diagram of two containers notifying the back-end system.

Figure 2.3: Multi-platform adaptation

Containers one (C1) and two (C2) both send the same type of information to the army system, but C2 uses a newer protocol which includes a proof of origin to avoid the risk of tampering in transmission. The rectangles and their attached dotted lines in the figure are the points in the architecture where our framework intervenes and injects mechanisms. With our framework, the back-end system intercepts data coming into the system and verifies it, thanks to a runtime monitor agent. It detects the security protections from the containers and provides the back-end services with the data formatted accordingly. A taint mechanism marks data depending on its state and the policy in place. Detection is made through a platform implementation, such as the one described in Listing 2.2. The listing respects the policy declaration, as shown in Line 4, which binds the code - hence the behavior - with the policy. The method signature is extracted from the leaf policy verify_origination(msg, sign, issuer). It then allows the verification of the signature. The message is tainted depending on the result of the method execution. The piece of code is processed only upon a correct match and returns information understandable by the runtime engine. In our example, the origination of data1 cannot be verified. As the policy in Listing 2.1 expresses, the data is marked as UNSAFE. When headquarters request this data, the army system knows the data is unsafe and can propose notification mechanisms to warn the user about the data's uncertainty. The second use case shows authorization and confidentiality checks with our framework. The adaptation relies heavily on the context. We first explain the authorization part. The army system receives a document composed of parts with different authorization levels: L1 and L2. We are in the context of Mandatory Access Control - thus a strong hierarchy and a definition of who is allowed to perform which action on which resource. The resource is composed of two parts. One contains logistics information, such as the freight id, the container weight or the event history of harbors. The second part contains details about the freight: its composition, final destination and usage, etc. This second part contains strategic information that only high-ranking officers can consult. As shown in Figure 2.4, a lieutenant and the logistics officer try to access resources sent by the container. The former can access all parts while the latter can only access the L2 part. When the lieutenant accesses the sensitive part of the data, the runtime monitor detects the usage of sensitive data and adapts the platform to provide confidentiality between the involved peers: encryption for the lieutenant before data transmission and a decryption mechanism after transmission.

Our framework intervenes through security rules - upon detection of data from a specific container, the runtime monitor triggers code that marks different parts of the data. Then, it introduces a behavior when sensitive information is transmitted. The concrete implementation of mechanisms is made locally. For instance, when the lieutenant requests the sensitive part of the data, the encryption/decryption mechanisms are executed. Systems agree upon a behavior, then the implementation and execution of security mechanisms is made locally.

Figure 2.4: Authorization and confidentiality mechanisms

2.7 Related work

We divide the related work into two separate categories: security-related solutions with aspects, and aspects for services. The former often implies modelling security properties beforehand in order to later enforce them correctly on the system; the translation mechanisms are often hand-written. The second category focuses on AOP and how to introduce its underlying concepts in services. To the best of our knowledge, no concrete work has been done to address security concerns that pervade both applications and services while proposing decoupling from the business code. In [5], Baligand uses AOP with Web Services to introduce non-functional requirements, following a policy. The difference with our work is that they do not cover the simultaneous orchestration of different services to realize one capability. [35] has the same goal but proposes an XML-centric approach to specify pointcuts and advice, whereas we rely on automatic matching from policy rules. In [26], Ganesan et al. address an aspect model for composite services. They introduce a specification language to design non-functional requirements as distributed aspects, but they do not cover security per se. In [67], Mostéfaoui et al. also propose a framework to decouple security concerns with aspects on web services. They use the frames concept to obtain a configuration including both the composite and component levels. In [79], the authors provide an architecture with distributed aspects to modularize and adapt non-functional requirements, but only for composite services. Also, their approach requires the advice code to already be on the target platform for execution. In [44], Jakob et al. use AOP to secure distributed systems. Their approach is rather to specify security properties early, thanks to a pointcut language tied to an architecture diagram. [75] presents a concrete use case of applying authentication with AOP in a SOA-based sensor architecture. Whereas it provides concrete mechanisms as we aim to do, we go a step further by binding these mechanisms with policies, which makes policy analysis far more consistent over time. In [68], Mourad et al. use an AOP-based language for security hardening. The language introduces concepts close to pointcuts; however, it does not cover services.

2.8 Summary

Addressing cross-cutting concerns that pervade services, with a strong focus on security, led us to a new architecture proposal. We have seen through our example that this architecture can be applied to an SCM use case. It gives tools and methods from the early phase of application design to the implementation and maintenance of sensors that gather accurate context information. From the modelling information, one decides which specifications have to be enforced during the execution of the application. In other words, the proposed architecture allows the definition of security policies for service-based systems at a global level that are then enforced at a local level in a semi-automatic mode. We propose to decouple the definition of specific security properties from the base application, and to declare it through rules respecting application owners' needs. A prototype is under development to address the runtime part - the modification of different execution environments with aspects to introduce security features. Currently, we limit complexity to one platform at a time. In future work, we want to investigate modifications across platforms - platforms not located under the same administrative domain. This requires, among other things, trust, mechanisms to ensure synchronisation, and guarantees that security is effectively implemented.

Chapter 3

Aspects for the Correction of Security Vulnerabilities in Web Services and Applications

3.1 Context and Motivation

After a decade of existence, Cross-Site Scripting, SQL Injection and other types of security vulnerabilities associated with input validation can still cause severe damage once exploited. Scholte et al. [91] conducted an empirical study showing that the number of reported vulnerabilities is not decreasing. While computer security is primarily a matter of secure design and architecture, it is also known that even with the best-designed architectures, security bugs will still show up due to poor implementation. Thus, fixing security vulnerabilities before shipment can no longer be considered optional. Most of the reported security vulnerabilities are leftovers forgotten by developers, thought to be benign code. Such mistakes can survive unaudited for years until they end up exploited by hackers. The software development lifecycle introduces several steps to audit and test the code produced by developers in order to detect security bugs, ranging from code review tools for the early detection of security bugs to penetration testing. The tools are used to automate tasks normally handled manually or requiring complex processing and data manipulation. They are able to detect several kinds of errors and software defects, but developers have to face heterogeneous tools, each one with a different process to make it run correctly, and they have to analyze the results of all the tools, merge them and fix the source code accordingly. For instance, code scanner tools are usually designed to be independent from the developers' environment. They therefore gain in flexibility but lose comprehensiveness and the possibility to interact with the people who have experience with the application code. Thus, tools produce results that are not directly linked to application defects. This is the case, for example, for code scanner tools triggering several false positives that are not actual vulnerabilities.

The contributions of this work are twofold. First, we focus on static code analysis, an automated approach to perform code review integrated in the developer's environment. This technique analyzes the source code and/or binary code without executing it and identifies anti-patterns that lead to security bugs. We focus on security vulnerabilities caused by missing input validation, the process of validating all the inputs to an application before using them. Although our tool handles other kinds of vulnerabilities, here we discuss three main vulnerabilities caused by missing or incorrect validation of the input: Cross-Site Scripting (also called XSS), Directory Path Traversal and SQL Injection. Second, we provide an innovative assisted remediation process that employs Aspect-Oriented Programming for semi-automatic vulnerability correction. The combination of these mechanisms improves the quality of the software with respect to security requirements. The chapter is structured as follows: Section 3.2 presents the overall agile approach to conduct code scanning and correct vulnerabilities during the development phase. Then, Section 3.3 presents the architecture we adopt to combine the static analysis with the code correction component. Section 3.4 describes the static analysis process and its integration in the developers' environment. Then, we explain the techniques for assisted remediation, along with their pros and cons, in Section 3.5. Finally, we discuss the advantages of our approach compared to related work in Section 3.6 and conclude in Section 3.7.

3.2 An agile approach

Agile approaches to software development require the code to be refactored, reviewed and tested at each iteration of the development lifecycle. While unit testing can be used to check the fulfillment of functional requirements during iterations, checking emerging properties of software such as security or safety is more difficult. We aim to provide each developer with a simple way to run daily security static analysis on his code. This is best achieved by providing a security code scanner integrated in the development environment, i.e. Eclipse in this case, and a decentralized architecture that allows security experts to assist developers with any of the findings. Typically this includes verifying false positives and correspondingly adjusting the code scanner test cases, or assisting in reviewing the solutions for the fixes. This brings several advantages over the approach in which the static analysis phase happens only at the end. The expertise about the context in which the code was developed lies in the development groups. Therefore, the interaction between the development team and security experts is faster, with less effort in finding and applying corrections to the security functionalities. The experts provide support on a case-by-case basis, allowing better tuning of false positive detection across teams and reducing the final costs of maintenance: solving security issues in the development phase can reduce the number of issues that the security experts have to analyze at the end.

Maintaining the separation of roles between the security experts performing the code scanning and the team members developing the application raises a critical complication, typically from a time perspective, due to the human interaction between security experts and developers. If such an approach is to scale to what most agile approaches describe, the number of iterations between developers and experts needs to be reduced. This can be achieved by up-skilling the developers and reducing the interaction between them and the security experts for the analysis of the security scans of the project, which is simplified by the introduction of our tool. Our incentive is to harvest the advantages of using our approach in an agile and decentralized static analysis process early in the software development lifecycle. It raises security awareness among developers at development time and reduces maintenance costs. A tool covering the previous needs should fulfill several requirements:

• easy to use for users who are not security experts

• domain-specific, with integration into the developers' daily environment, to maximize adoption and avoid additional steps to run the tool

• adjustable, to maximize project knowledge and reduce false positives and negatives

• reflexive, to adjust the accuracy of the scan over time, e.g. with collaborative feedback

• supportive, to assist developers in correcting and understanding issues

• educative, to help developers understand errors, the steps to correct an existing error, and techniques to prevent future vulnerabilities

We have developed an Eclipse plugin made of components leveraging a decentralized approach to static analysis. It gives direct access to detected flaws and a global overview of system vulnerabilities. The developer analyzes his code and reviews vulnerabilities when necessary. Figure 3.1 presents the interaction between the two phases: the static analysis phase scans the code in order to identify and classify the different vulnerabilities found; it is described in detail in Section 3.4. The measurement is performed directly by developers, who decide to remediate by undertaking actions, with support from our second component. The full remediation process is given in Section 3.5.

Figure 3.1: Vulnerability remediation process. The red part corresponds to the static analysis component. The green one corresponds to the remediation component. The blue one corresponds to assisted processing.

3.3 Architecture

Figure 3.2 represents the architecture of our prototype. First of all, we consider two main stakeholders involved in the configuration and usage of the prototype. Security experts and developers regroup different profiles whose goal is to provide and configure the knowledge database in order to avoid false positives and negatives, and to provide better accuracy during the analysis phase. They have two main tasks. First, they update the knowledge base, adding to it classes or methods that can be considered trusted for one or more vulnerabilities. Second, the knowledge database receives feedback from analyses about objects possibly trusted for one or more security vulnerabilities; they must analyze these in more detail and, if the objects are really trusted, tag them as trusted in the knowledge base.

Figure 3.2: Architecture

The second role is the developer, who interacts directly with the static analysis engine to verify vulnerabilities in the application, code and libraries under his responsibility. The developer at this stage can be naive, i.e. with no focus on the complexity of the security flow. The knowledge base is shared among developers. It contains all the security knowledge about trust: objects that do not introduce security issues into the code. Security experts, and developers with an understanding of security patterns, maintain and keep under control the definitions used by all developers, in an easy way, using an admin web application or web services. In this way the code scanner testing rules are harmonized for the whole application or even on a per-project basis. The knowledge base allows developers to run a static analysis that is perfectly adapted to the context of their project. In industrial-scale projects, daily scans are recommended. In order to facilitate this task, we provide a plugin for Eclipse that uses the Abstract Syntax Tree generated by the JDT compiler to simplify the static analysis process. The plugin accesses the knowledge database via web services, making it possible for each developer to run the code scanner independently. We detail its components in the next section.

3.4 Static analysis

Static analysis can report security bugs even when scanning small pieces of code. Another family of code scanners is based on dynamic analysis techniques that acquire information at runtime. Unlike static analysis, dynamic analysis requires a running executable. Static analysis scans all the source code, while dynamic analysis can only verify the use cases being executed. The major drawback of static analysis is that it can report both false positives and false negatives. The former means detecting a security vulnerability that is not truly a security vulnerability, while the latter means failing to report certain security vulnerabilities. Having false negatives is highly dangerous, as it gives a false sense of protection while a vulnerability is present and can be exploited, whereas having false positives primarily slows down the static analysis process. Modern static analysis tools, similarly to compilers, build an Abstract Syntax Tree - a tree representation of the abstract syntactic structure of the code - from the source code and analyze it.

3.4.1 Static Analysis Process

In a nutshell, our process allows developers to run a check on their code to uncover potential vulnerabilities by checking for inputs that have not been validated. It finds information flows connecting an entry point and an exit point that do not use a trusted object for the considered vulnerabilities. The algorithm uses an abstract syntax tree of the software in conjunction with the knowledge base to identify the vulnerable points. Figure 3.3 presents the different analysis steps performed from the moment the developer presses the analysis button to the display of the results. The static analysis works on the Document Object Model generated by the Eclipse JDT component, which is capable of handling all constructs described in the Java Language Specification [28]. The static analysis process is described as follows:

• The engine contacts the knowledge database in order to retrieve the up-to-date and most accurate configuration from the shared platform. If the developer cannot retrieve the configuration, he can still work independently with the latest local configuration.

• The process identifies all entry points of interest in the accessible source code and libraries. The analysis is based on the previously mentioned AST. We gather the different variables and fields used, as well as the different methods. We apply a first filter with pattern matching on the potential entry points: a method call or a new object instantiation might be tagged as returning trusted inputs (a code sketch of this step is given after this list).

• For each entry point, the control flow is followed to create the connections between methods, variables and fields and to discover all the exit points. For instance, the engine visits assignments, method invocations and constructions of new objects involving the variables and fields detected during entry point gathering.

• Once the different exit points have been collected, we check for an absence of validation in the flow for the different kinds of vulnerabilities. For instance, if the flow from an entry point to an exit point passes through a method or a class which is known to validate SQL input, the flow is tagged as trusted for this specific vulnerability. Of course, the tag holds from the moment the method validates for the vulnerability until a new composition with potentially vulnerable code, or until an exit point.

Figure 3.3: Static Analysis Activity Diagram
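As an illustration of the entry-point gathering step, the minimal sketch below shows how such a scan can be written against the Eclipse JDT AST mentioned above. The KnowledgeBase interface and the way findings are reported are hypothetical stand-ins for the shared knowledge base and the result collector; the real plug-in is more elaborate.

import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.MethodInvocation;

// Hypothetical client to the shared knowledge base described above.
interface KnowledgeBase {
    boolean returnsTrustedInput(String methodName);
}

class EntryPointVisitor extends ASTVisitor {
    private final KnowledgeBase kb;

    EntryPointVisitor(KnowledgeBase kb) { this.kb = kb; }

    @Override
    public boolean visit(MethodInvocation node) {
        // First filter: a call tagged in the knowledge base as returning
        // trusted input is not a potential entry point.
        if (!kb.returnsTrustedInput(node.getName().getIdentifier())) {
            System.out.println("Potential entry point: " + node);
        }
        return true; // continue visiting child nodes
    }
}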

3.4.2 Multiple vulnerability analysis

In the previous section, we presented the global analysis process. In this section, we discuss in more depth the notion of a trusted object for the different vulnerabilities we address. We present the integration with the Eclipse workbench, with part of the source code being validated, in Figure 3.4. The problem of identifying security vulnerabilities caused by errors in input validation can be translated to finding an information flow connecting an entry point and an exit point that does not use a trusted object for the considered vulnerabilities.

Figure 3.4: Code Analysis phase

We define an input as a data flow from any external class, method or parameter into the code being programmed. We also define as an entry point any point in the source code where an untrusted input enters the program being scanned. Analogously, we define as an output any data flow that goes from the code being programmed into external objects or method invocations. Our approach relies on our definition of a trusted object, which impacts the detection accuracy. A trusted object is a class or a method that can sanitize all the information flow from an entry point to an exit point for one or more security vulnerabilities. We implemented the trust definitions in the centralized knowledge base presented in the previous section. The knowledge database represents the definitions using a trust hierarchy that follows the package hierarchy. Security experts can tag classes, packages or methods as trusted for one or more security vulnerabilities, according to their analysis, feedback from developers or static analysis results. Obviously, defining an element of the hierarchy as trusted also trusts all the elements below it (i.e. trusting a package trusts all the classes and methods in it, and trusting a class trusts all the fields and methods in it). A trusted object can sanitize one or more security vulnerabilities (e.g. sanitization for SQL Injection); thus users can tag an object as trusted for specific vulnerabilities. This approach lets users and security experts define strong trust policies. It is the major contribution that brings deep security knowledge into the process. A sketch of the hierarchical lookup is given below.

Defining a trusted object is a strong assertion, as it marks a given flow as valid and free of a given vulnerability. The definition process to trust a class, a package or a method is rigorous. The object must not introduce a specific vulnerability into the code. This is the reason why developers report feedback and security experts take the decision. The experts can also analyze, manage and update the base if the class, package or method is considered trusted. This phase allows system tuning related to a given organization and leads to fewer false positives while ensuring no false negatives.
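The following minimal sketch illustrates that hierarchical lookup, under the assumption that trust definitions are keyed by fully qualified names; the storage format of the actual knowledge base is not described here, so all names are illustrative.

import java.util.Map;
import java.util.Set;

class TrustHierarchy {
    // e.g. "org.example.util.SqlEscaper.escape" -> {"SQL_INJECTION"} (invented example)
    private final Map<String, Set<String>> trusted;

    TrustHierarchy(Map<String, Set<String>> trusted) {
        this.trusted = trusted;
    }

    // An element is trusted for a vulnerability if it, or any enclosing
    // class/package, has been tagged as trusted for that vulnerability.
    boolean isTrustedFor(String qualifiedName, String vulnerability) {
        for (String n = qualifiedName; !n.isEmpty(); n = parentOf(n)) {
            Set<String> vulns = trusted.get(n);
            if (vulns != null && vulns.contains(vulnerability)) {
                return true;
            }
        }
        return false;
    }

    private static String parentOf(String name) {
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(0, dot);
    }
}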

Figure 3.5: Code Analysis result

The detected vulnerabilities (Figure 3.5) are mainly caused by a lack of input validation, namely SQL Injection, Directory Path Traversal and Cross-Site Scripting. The engine also detects a more general Malformed Input vulnerability that represents any input that is not validated using a standard implementation. The engine can be easily extended to support new kinds of vulnerabilities caused by missing input validation. This simply requires adding the definition of the new vulnerability to the centralized knowledge base (and, if they exist, adding trusted objects that sanitize it), and creating a new class implementing an interface that defines the checks to be done on the result of the static analysis to detect the vulnerability. A possible shape for such an extension is sketched below.
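The exact plug-in interface is not spelled out in this chapter, so the following sketch only suggests one plausible shape for it, reusing the TrustHierarchy type from the previous sketch; all names are hypothetical.

import java.util.List;

// One node of an information flow found by the static analysis,
// identified by the fully qualified name it passes through (hypothetical).
class FlowNode {
    String qualifiedName;
}

// A new vulnerability type would be supported by implementing this interface
// and registering its definition in the centralized knowledge base.
interface VulnerabilityCheck {
    String vulnerabilityId(); // e.g. "DIRECTORY_PATH_TRAVERSAL"

    // Report a finding if the flow from entry point to exit point never
    // passes through an object trusted for this vulnerability.
    boolean isVulnerable(List<FlowNode> flow, TrustHierarchy knowledgeBase);
}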

3.5 Assisted Remediation

Performing static analysis is already integrated in the quality processes of several companies. But the actual identification of vulnerabilities does not mean they are correctly mitigated. Given this problem, several approaches are possible: (i) refactoring the code, (ii) applying a proxy to inbound and outbound connections, and finally - the solution we adopted - (iii) generating protection code linked to the application being analysed.

Software refactoring requires the developer to understand the design of his application and the potential threats, in order to manually rewrite part of the code. Refactoring improves the design, performance and manageability of the code, but is difficult to carry out. It costs time and is error-prone. Up to six distinct activities have been observed in [64], from the identification to the verification of a refactoring. The impacted code is generally scattered over the application, and some parts can easily be left unchecked. This can lead to an inconsistent state where the application does not reflect the intended goal. In terms of vulnerability remediation, software refactoring is one of the most powerful approaches due to its flexibility in terms of code rewriting and architecture evolution.

The proxy solution is equivalent to a gray-box approach, with no in-depth visibility of internal processes. It can be heavy to put in place, especially when the environment is under the control of a different entity than the development team. For instance, on cloud platforms, one can deploy an application but has limited control over other capabilities, making it impossible to apply a filter in front of the application. The lack of flexibility and the absence of small adjustments make it complicated to adopt in the development phase.

In this work we provide protection inlined with the application. This solution has several advantages, but also brings new limitations due to the technology we use: the Aspect-Oriented Programming (AOP) paradigm [48], a paradigm that eases the programming of concerns that crosscut and pervade applications. In the next section, we describe our methodology and provide a comprehensive list of advantages and drawbacks.

Table 3.1: List of detected vulnerabilities with potential origin and potential remediation.

Cross-Site Scripting
  Origin: the server does not validate input coming from an external source.
  Potential remediation: validate input and filter or encode the output properly depending on the usage; the encoding differs, for example, between HTML content and Javascript content.

SQL Injection
  Origin: the server does not validate input and uses it directly in the construction of a SQL query.
  Potential remediation: use a parameterized query or a safe API; escape special characters; validate the input used in the construction of the query.

Directory Path Traversal
  Origin: the application server is misconfigured, or the filesystem policy contains weaknesses.
  Potential remediation: enclose the application with strict policies that restrict access to the filesystem by default; filter and validate the input prior to direct file access.

Other malformed input
  Origin: mis-validation.
  Potential remediation: validate input; determine the origin and possible manipulation by externals.

3.5.1 Methodology

The approach comprises the automatic discovery of vulnerabilities and weaknesses in the code. In addition, we integrate a protection phase, tied to the analysis process, which guides developers through the correct, semi-automatic correction of the vulnerabilities previously detected. It uses information from the static analysis engine to know which vulnerabilities have to be corrected. It then requires input from the developer to extract knowledge about the context, as in Figure 3.6. These steps gather the places in the code where security corrections should be injected. The security correction uses AOP. The goal is to bring proper separation of concerns for crosscutting functionalities, such as security. Thus, the code related to a concern is maintained separately from the base application. The main advantage of using this technology is the ability to intervene in the control flow of a program without interfering with the base program code. The list of vulnerabilities we principally cover is in Table 3.1. The table highlights the potential origin of the vulnerabilities and some known remediation techniques. These vulnerabilities are well known and subject to high attention; for instance, they have appeared in the OWASP Top Ten [74] for several years now, as well as in the MITRE Top 25 Most Dangerous Software Errors [65]. Although several approaches exist to remediate the vulnerabilities, we mainly consider escaping and validation to consistently remediate the problems with the aspect-oriented technique.

Figure 3.6: Gathering context for vulnerability protection

By adopting this approach, we reduce the time needed to correct vulnerabilities by applying semi-automatic, pre-defined mechanisms to mitigate them. We use the component to apply protection code, which is mostly tangled and scattered over an application. Correcting a security vulnerability is not trivial. Different refactorings are possible depending on the issue. For instance, the guides for secure programming advise SQL prepared statements to prevent SQL Injection, but developers might be constrained by their frameworks to forge SQL queries themselves. Therefore, developers would try another approach, such as input validation and the escaping of special characters. A minimal illustration of the recommended parameterized-query remediation is given below.
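For reference, a minimal JDBC illustration of that advice; the table and column names are invented for the example.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class FreightDao {
    // Vulnerable variant (shown as a comment): attacker-controlled 'id' is
    // spliced into the SQL text, enabling SQL Injection:
    //   stmt.executeQuery("SELECT * FROM freight WHERE id = '" + id + "'");

    ResultSet findById(Connection conn, String id) throws SQLException {
        // Prepared statement: the driver transmits 'id' as data, never as SQL.
        PreparedStatement ps =
                conn.prepareStatement("SELECT * FROM freight WHERE id = ?");
        ps.setString(1, id);
        return ps.executeQuery();
    }
}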

Figure 3.7: Example of correction snippet generated for a malformed input

We assist developers by proposing them the best automated solution possible. For the previously mentioned correction, our integrated solution would propose to mitigate the vulnerability with an automatic detection of incoming, unsafe and unchecked variables. The developer does not need to be a security expert to correct vulnerabilities, as our approach provides interactive steps to generate AOP protection code, as in Figure 3.7. Although semi-automation simplifies the process of introducing protection code, the technique can introduce several side effects if the developers do not follow closely what is generated. The plug-in gives the developer an overview of all corrected vulnerabilities (Figure 3.8), allowing him to manage and re-arrange them if needed. Currently, the prototype does not analyze the interactions between the different pieces of generated protection code. By adopting this approach, we give the user a better understanding of the different vulnerabilities affecting the system, and we guide the developer towards more compliance in his application. The protection code can be deployed by security expert teams and changed without refactoring. A sketch of the kind of protection aspect generated this way follows.
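As a hedged illustration of the kind of protection code the plug-in generates (the actual generated snippet is the one shown in Figure 3.7 and may differ), the aspect below intercepts reads of HTTP request parameters; the pointcut and the sanitization routine are assumptions for the example.

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;

@Aspect
public class MalformedInputProtection {

    // Intercept every read of an HTTP request parameter in the woven code.
    @Around("call(String javax.servlet.ServletRequest.getParameter(String))")
    public Object validateParameter(ProceedingJoinPoint jp) throws Throwable {
        Object value = jp.proceed();
        if (value == null) {
            return null;
        }
        // Strip common meta-characters; a generated aspect would instead use
        // the escaping routine selected during the interactive steps.
        return ((String) value).replaceAll("[<>\"';]", "");
    }
}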

3.5.2 Constraints from Aspect-Oriented Programming

The usage of AOP in the remediation of vulnerabilities brings us more flexibility: one can evolve the techniques used to protect the application by switching the process used to resolve a problem, making the security solution independent from the application. But this approach also brings some limitations, which we discuss in this section.

Figure 3.8: Correction applied

Firstly, the language is designed to modify the application control flow. One of our limitations is related to the deep modification needed to partially replace a behavior. For example, suppose a SQL query is written manually in the application and we would like to validate it. We are able to weave validation and escaping code, but we can hardly modify the application to construct a parameterized query.

Secondly, the aspects cover the application as a whole. When more than one aspect is involved, the crosscutting concerns can intersect. Therefore, we need to analyze aspect interactions and prevent the annihilation of the behavior we intended to introduce.

Thirdly, the evolution of the program leads to a different repartition of vulnerabilities. The vulnerabilities are detected after the static analysis phase. We do not yet address this evolution problem of maintaining the relation between the aspects and the application. This differs from the fragile pointcut problem inherent to aspects using pointcut languages that refer to the syntax of the base language: the evolution affects the application as a whole, by introducing new entry points and exit points that need to be considered, or by introducing methods that validate a flow for a given vulnerability.

The fourth constraint is that aspects have no specific certification. The actual protection library is defined globally, but applied locally, with a late binding to the application. The protection code is the same everywhere, but we put strong trust in the protection library by assuming that the aspects behave properly with the actual modification of the flow to mitigate the vulnerability.

Finally, the fifth constraint is user acceptance. Since the developers rely on a crosscutting solution, the code itself does not reflect the exact state of the application. The points where the aspects interfere with the base application are not visible in the code. We address this limitation with the strong integration into the developer's environment: the Eclipse plugin provides a means to display the remediation code in place at a given time.

3.6 Related work

Interest in the field of static analysis has led to several approaches, ranging from simple techniques like pattern matching and string analysis [107, 29, 109, 14] to more complex techniques like data flow analysis [54, 59, 7]. Commercial tools such as Fortify¹ or CodeProfiler² propose better integration into developers' environments but lack a decentralized approach and assistance in security management. Several tools are based on the Eclipse platform and detect vulnerabilities in web applications [60], flaws [19], and bugs [105], or propose testing and audits to verify compliance with organizational guidelines [27]. Their main disadvantages are the lack of contextual support for correction and poor integration into the daily development lifecycle.

In [34], the authors use AOP to protect against web vulnerabilities: XSS and SQL injection. They use AspectJ, the mainstream AOP language, to intercept method calls in an application server and then perform validation on parameters. Viega et al. [106] present simple use cases for AOP in software security. In [63], the authors introduce an aspect primitive for dataflow, allowing the detection of vulnerabilities like XSS. Our approach reduces the overhead brought by detecting vulnerability patterns at runtime and allows a wider range of vulnerabilities to be detected. Also, the aforementioned approaches do not rely on external tools to gather security context, but rather on manual processing to understand the architecture and decide where to apply aspects. Our approach also brings more awareness to developers, who obtain a visual indicator of what is applied at which place in their application.

A combination of detection and protection is found in [18], which proposes an approach for detecting faults identified by pre-compiled patterns. Faults are corrected using a correction module. The difference with our approach lies in the detection of faults rather than security vulnerabilities. Also, the correction module fixes faults statically and prevents further modification of the introduced code. A recent work [113] uses static analysis to detect security points at which to deploy protection code with aspects, on distributed tuple space systems. These two approaches suffer from the same limitations as the ones presented in the previous paragraph: a lack of visual support from the tool and a loss of context.

¹Fortify 360 - https://www.fortify.com/
²CodeProfiler - http://www.codeprofilers.com/

3.7 Summary

We presented how to overcome several security vulnerabilities by combining a static analyzer that assists developers by reporting security vulnerabilities with a semi-automated correction of these findings using AOP. The usage of an integrated tool to support the detection and mitigation of security bugs has several advantages, as it benefits several stakeholders at the same time. First, security teams are able to distribute the maintenance of the code to the people writing it and let them mitigate security bugs whenever they are detected. They can interact closely to decide on the best solution for a given situation, and apply security consistently across development teams. Developers benefit from this approach by having an operational tool already configured for their development. They can focus on writing their functional code and, from time to time, verify the accuracy of their implementation. Security concerns often cut across the application, which tends to leave security checks spread around the code base. Using one central tool to get an overview is more efficient and productive, and makes it possible to track all applied protection code. The automation allows a broader and more consistent application of security across applications. The usage of AOP eases the deployment and change of security protection code, in a single environment and during the development phase.


The overall vision we would like to achieve in the future is the specification and maintenance of security concerns in one central place, with developers using these concerns by defining the places in the application where they should be active.

We have designed this plug-in to improve the developer's awareness of security concerns. The automated correction might not be the best choice every time, and we encourage developers to look further into the vulnerabilities' descriptions. Also, we do not want developers to believe our solution is bullet-proof: that would lead to a false sense of security, which is the opposite of our goal. Although we have listed several benefits of an integrated tool, we know that it suffers from limitations. For instance, by developing a tool such as an Eclipse plug-in, we target one platform and one language, thus voluntarily restricting the scope of application. We have implemented a working prototype that we have validated on projects internally at SAP and compared to commercial tools. In several cases, the agile approach leads to a reduction of false positives and an absence of false negatives. Also, the approach of providing support for correcting the vulnerability is novel, and we now focus on improving the accuracy of the protection code.

Chapter 4

Automated Prevention of Input Validation Vulnerabilities in Web Applications

4.1 Introduction

Recent data from MITRE, OWASP, and the SANS Institute shows that web vulnerabilities such as XSS and SQL injection have dominated the vulnerability reporting charts for years and are still very common. In [92] and [93], Scholte et al. analyzed a large number of input validation vulnerability reports to understand why these vulnerabilities are still so prevalent and how they can be prevented. Application developers often fail to implement any countermeasures against these vulnerabilities, and many of them could be prevented if web programming languages and frameworks enforced the validation of user-supplied input using common data types.

In this Chapter, we present IPAAS, a novel technique for preventing the exploitation of XSS and SQL injection vulnerabilities based on automated data type detection of input parameters. IPAAS automatically and transparently augments otherwise insecure web application development environments with input validators that result in significant and tangible security improvements for real systems. Specifically, IPAAS automatically (i) extracts the parameters of a web application; (ii) learns types for each parameter by applying a combination of machine learning over training data and a simple static analysis of the application; and (iii) applies robust validators for each parameter to the web application with respect to the inferred types. IPAAS is transparent to the developer and therefore helps developers who are unaware of web application security issues to write more secure applications than they otherwise would. Furthermore, our technique is not dependent on any specific programming language or framework. This allows IPAAS to improve the security of legacy applications and/or applications written in insecure languages. Unfortunately, due to the inherent drawbacks of input validation, IPAAS is not able to protect against all kinds of XSS and SQL injection attacks. However, our experiments show that IPAAS is a simple and effective solution that greatly improves the security of web applications.


Figure 4.1: HTML fragment output sanitization example (the template fragment interpolates the untrusted values ${msg.title} and ${msg.body} into the document).

4.2 Preventing input validation vulnerabilities

Input validation and sanitization are related techniques for helping to ensure correct web application behavior. While these techniques are related, they are nevertheless distinct concepts. Sanitization (in particular, output sanitization) is widely acknowledged as the preferred mechanism for preventing the exploitation of XSS and SQL injection vulnerabilities. In this section, we highlight the advantages of input validation, and thereby motivate the approach we present in the following sections.

4.2.1 Output sanitization

One particularly promising approach to preventing the exploitation of input validation vulnerabilities is robust, automated sanitization of untrusted input. In this approach, sanitizers are automatically applied to data computed from untrusted data immediately prior to its use in document or query construction [83, 87, 110].

As an example of output sanitization, consider the web template fragment shown in Figure 4.1. Here, untrusted input is interpolated both as child nodes of the h1 and p DOM elements and in the style attribute of the h1 element. At a minimum, a robust output sanitizer should ensure that dangerous characters such as '<' and '&' do not appear un-escaped in the interpolated values, though more complex element white-listing policies could also be applied. Additionally, the output sanitizer should be context-aware; for instance, it should automatically recognize that '"' characters should also be encoded prior to interpolating untrusted data into an element attribute. The output sanitizer described here would be able to prevent attacks that might bypass input validation. For instance, an input verified to be valid might nevertheless be concatenated with dangerous characters during processing before being interpolated into a document.

Output sanitization that is automated, context-aware, and robust with respect to real browsers and databases is an extremely attractive solution for preventing XSS and SQL injection attacks. This is because it provides a high degree of assurance that the protection system's view of untrusted data used to compute documents and queries is identical to the real system's view. That is, if an output sanitizer decides that a value computed from untrusted data is safe, then it is almost certainly the case that that data is actually safe to render to the user or submit to the database.

Unfortunately, output sanitization is not a panacea. In particular, in order to achieve correctness and complete coverage of all locations where untrusted data is used to build HTML documents and SQL queries, it is necessary to construct an abstract representation of these objects in order to track output contexts. This generally requires the direct specification of documents and queries in a domain-specific language [83, 87], or else the use of a language amenable to precise static analysis. While new web applications have the option of using a secure-by-construction development framework or templating language, legacy web applications do not have this luxury. Furthermore, many web developers continue to use insecure languages and frameworks for new applications.
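As a concrete illustration of context awareness, the following Java sketch (our own simplification, not a production sanitizer) applies different escaping rules depending on whether untrusted data lands in an element body or in an attribute value:

final class HtmlEscaper {
    // For text interpolated as element child nodes: neutralize '&' and '<'.
    static String forElementBody(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    // For text interpolated into an attribute value: additionally encode
    // quote characters, which would otherwise close the attribute.
    static String forAttributeValue(String s) {
        return forElementBody(s).replace("\"", "&quot;").replace("'", "&#39;");
    }
}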

4.2.2 Input validation

In contrast to output sanitization, another approach to preventing XSS and SQL injection vulnerabilities is the use of input validation.

Input validation is fundamentally the process of ensuring that program input respects a specification of legitimate values (e.g., a certain parameter should be an integer, an email address, or a URL). Any program that accepts untrusted input should incorporate some form of input validation procedures, or input validators, to ensure that the values it computes are sensible. The validation should be performed prior to executing the main logic of the program, and can vary greatly in complexity. At one end of the spectrum, programs can apply what we term implicit validation due to, for instance, typecasting of inputs from strings to integers in a statically-typed language. At the other end, programs can apply explicit validation procedures that check whether program input satisfies complex structural specifications, such as the Luhn check for credit card numbers. In the context of web applications, input validation should be applied to all untrusted input; this includes input vectors such as HTTP request query strings, POST bodies, database queries, XHR calls, and HTML5 postMessage invocations.

4.2.3 Discussion

Input validation is more general than output sanitization in the sense that input validation is concerned with the broader goal of program correctness, while sanitization has the specific goal of removing dangerous constructs from values computed using untrusted data. Sanitization procedures, or sanitizers, focus on enforcing a particular security policy, such as preventing the injection of malicious JavaScript code into an HTML document. While rigorous input validation can provide a security benefit as a side-effect, sanitizers should provide strong assurance of protection against particular classes of attacks. Input validation in isolation, on the other hand, cannot guarantee that an input it considers safe will not be transformed during subsequent processing into a dangerous value prior to being output into a document or query. Hence, input validation provides less assurance than output sanitization that vulnerabilities will be prevented.

We note, however, that despite these drawbacks, input validation has significant benefits as well. First, even though input validation is not necessarily focused on enforcing security constraints, rigorous application of robust input validators has been shown to be remarkably effective at preventing XSS and SQL injection attacks in real, vulnerable web applications. For instance, in the previous chapter, we demonstrated that robust input validation would have been able to prevent the majority of XSS and SQL injection attacks against a large corpus of known vulnerable web applications.

Second, it is comparatively simple to achieve complete coverage of untrusted input to web applications, as opposed to the case of output sanitization. Web application inputs can be enumerated given a priori knowledge of the language and development framework, whereas context-aware output sanitization imposes strict language requirements that often conflict with developer preferences. Consequently, input validation can be applied even when insecure legacy languages and frameworks are used.

POST /payment/submit HTTP/1.1
Host: shop.example.com
Cookie: SESSION=cbb8587c63971b8e
[...]

cc=1234567812345678&month=8&year=2012&save=false&token=006bf047a6c97356

Figure 4.2: HTTP POST request containing several examples of untrusted program input.

4.3 Output Sanitization and Input Validation

Input validation is fundamentally the process of ensuring that program input respects a specification of legitimate values. Any program that accepts untrusted input should incorporate some form of input validation procedures, or input validators, to ensure that the values it computes are sensible. The validation should be performed prior to executing the main logic of the program, and can vary greatly in complexity. At one end of the spectrum, programs can apply what we term implicit validation due to, for instance, typecasting of inputs from strings to integers in a statically-typed language. At the other end, programs can apply explicit validation procedures that check whether program input satisfies complex structural specifications, such as the Luhn check for credit card numbers.

As an example, consider the POST request shown in Figure 4.2. The request contains several parameters, including: cc, a credit card number; month, a numeric month; year, a numeric year; save, a flag indicating whether the payment information should be persisted for future use; token, a CSRF nonce; and SESSION, a session identifier. Each of these request parameters requires a different type of input validation. For example, the credit card number should contain certain characters and pass a Luhn check. The month should be an integer between 1 and 12. The year should be an integer value representing a year in the near future. Finally, the save parameter should contain a boolean value (e.g., "0", "1", "true", "false", or "yes", "no").

Input validation is concerned with the broader goal of program correctness, while sanitization focuses on the specific goal of removing dangerous constructs from values computed using untrusted data. Sanitization procedures, or sanitizers, focus on enforcing a particular security policy, such as preventing the injection of malicious JavaScript code into an HTML document. While rigorous input validation can provide a security benefit as a side-effect, sanitizers should provide a strong assurance of protection against particular classes of attacks.

Sanitizers have traditionally been applied throughout the web application processing life cycle, but automated output sanitization has come to be recognized as the most attractive form of the technique. Sanitizing untrusted data immediately prior to its use is highly desirable because it provides a high degree of assurance that the protection system's view of untrusted data is identical to the real system's view. Input validation in isolation, on the other hand, cannot guarantee that an input it considers safe will not be transformed during subsequent processing into a dangerous value.

Figure 4.3: The IPAAS architecture. A proxy server intercepts HTTP messages generated during application testing. Input parameters are classified during an analysis phase according to one of a set of possible types. After sufficient data has been observed, IPAAS derives an input validation policy based on the types learned for each application input parameter. This policy is automatically enforced at runtime by rewriting the application.
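A sketch of explicit validators for the Figure 4.2 parameters might look as follows in Java; the checks (Luhn, month range, near-future year, boolean literals) mirror the constraints described above, and the exact bounds are illustrative assumptions.

final class PaymentRequestValidators {
    // Luhn check over a digit string, e.g. for the cc parameter.
    static boolean luhn(String cc) {
        int sum = 0;
        boolean dbl = false;
        for (int i = cc.length() - 1; i >= 0; i--) {
            int d = cc.charAt(i) - '0';
            if (d < 0 || d > 9) return false; // reject non-digit characters
            if (dbl) { d *= 2; if (d > 9) d -= 9; }
            sum += d;
            dbl = !dbl;
        }
        return cc.length() > 0 && sum % 10 == 0;
    }

    // month: an integer between 1 and 12.
    static boolean validMonth(String m) {
        try { int v = Integer.parseInt(m); return v >= 1 && v <= 12; }
        catch (NumberFormatException e) { return false; }
    }

    // year: an integer in the near future (20-year horizon is an assumption).
    static boolean validYear(String y, int currentYear) {
        try { int v = Integer.parseInt(y); return v >= currentYear && v <= currentYear + 20; }
        catch (NumberFormatException e) { return false; }
    }

    // save: one of the boolean literals listed in the text.
    static boolean validBoolean(String s) {
        return s.matches("(0|1)|(true|false)|(yes|no)");
    }
}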

4.4 IPAAS

In this section, we present IPAAS (Input PArameter Analysis System), an approach to securing web applications against XSS and SQL injection attacks using input validation. The key insight behind IPAAS is to automatically and transparently augment otherwise insecure web application development environments with input validators that result in significant and tangible security improvements for real systems.

IPAAS can be decomposed into three phases: (i) parameter extraction, (ii) type learning, and (iii) runtime enforcement. An architectural overview of IPAAS is shown in Figure 4.3. In the remainder of this section, we describe each of these phases in detail.

Type        Validator
boolean     (0|1)|(true|false)|(yes|no)
integer     (+|-)?[0-9]+
float       (+|-)?[0-9]+(\.[0-9]+)?
URL         RFC 2396, RFC 2732
token       static set of string literals
word        [0-9a-zA-Z@ -]+
words       [0-9a-zA-Z@ -\r\n\t]+
free-text   none

Table 4.1: IPAAS types and their validators.

4.4.1 Parameter Extraction

The first phase is essentially a data collection step. Here, a proxy server intercepts HTTP messages exchanged between a web client and the application during testing. For each request, all observed parameters are parsed into key-value pairs, associated with the requested resource, and stored in a database. Each response containing an HTML document is processed by an HTML parser that extracts links and forms whose targets are associated with the application under test. For each link containing a query string, key-value pairs are extracted similarly to the case of requests. For each form, all input elements are extracted. In addition, those input elements that specify a set of possible values (e.g., select elements) are traversed to collect those values.
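For illustration, the parsing of a query string or POST body into key-value pairs could be sketched as follows in Java (the actual prototype performs this step inside a WebScarab proxy extension, as described in Section 4.4.4):

import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class QueryParser {
    // Split "a=1&b=2" into decoded (key, value) pairs to be stored with
    // the requested resource.
    static List<Map.Entry<String, String>> parse(String query) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String part : query.split("&")) {
            int eq = part.indexOf('=');
            String key = eq < 0 ? part : part.substring(0, eq);
            String val = eq < 0 ? "" : part.substring(eq + 1);
            pairs.add(new SimpleEntry<>(
                URLDecoder.decode(key, StandardCharsets.UTF_8),
                URLDecoder.decode(val, StandardCharsets.UTF_8)));
        }
        return pairs;
    }
}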

4.4.2 Parameter Analysis

The goal of the second phase is to label each parameter extracted during the first phase with a data type based on the values observed for that parameter. The labeling process is performed by applying a set of validators to the test inputs.

4.4.2.1 Validators

Validators are functions that check whether a value meets a particular set of constraints. In this phase, IPAAS applies a set of validators, each of which checks that an input belongs to one of a set of types. The set of types and the regular expressions describing legitimate values are shown in Table 4.1. In addition to the types enumerated in Table 4.1, IPAAS recognizes lists of each of these types.
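Translated into Java regular expressions, the regex-based validators of Table 4.1 could be expressed roughly as follows; the word/words character classes are transcribed from the table as printed, and the map preserves the table's ordering, which matters for tie-breaking as described below:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class TypeValidators {
    // Most restrictive types first, matching the ordering of Table 4.1.
    static final Map<String, Pattern> BY_TYPE = new LinkedHashMap<>();
    static {
        BY_TYPE.put("boolean", Pattern.compile("(0|1)|(true|false)|(yes|no)"));
        BY_TYPE.put("integer", Pattern.compile("(\\+|-)?[0-9]+"));
        BY_TYPE.put("float",   Pattern.compile("(\\+|-)?[0-9]+(\\.[0-9]+)?"));
        BY_TYPE.put("word",    Pattern.compile("[0-9a-zA-Z@ -]+"));
        BY_TYPE.put("words",   Pattern.compile("[0-9a-zA-Z@ \\r\\n\\t-]+"));
        // URL (RFC 2396/2732), token (a static set of literals), and
        // free-text need dedicated checks rather than a single regex.
    }

    static boolean accepts(String type, String value) {
        Pattern p = BY_TYPE.get(type);
        return p != null && p.matcher(value).matches();
    }
}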

4.4.2.2 Analysis Engine

IPAAS determines the type of a parameter in two sub-phases. In the first, types are learned based on the values that have been recorded for each parameter. In the second, the learned types are augmented using values extracted from HTML documents.

Learning In the first sub-phase, the analysis begins by retrieving all the resource paths that were visited during application testing. For each path, the algorithm retrieves the unique set of parameters and the complete set of values for each of those parameters observed during the extraction phase. Each parameter is assigned an integer score vector of length equal to the number of possible validators. The actual type learning begins by passing each value of a given parameter to every possible type validator. If a validator accepts a value, the corresponding entry in that parameter's score vector is incremented by one. If no validator accepts a value, the analysis engine assigns the free-text type to the parameter and stops processing its values. After all values for a parameter have been processed, the score vector is used to select a type and, therefore, a validator. Specifically, the type with the highest score in the vector is selected. If there is a tie, the most restrictive type is assigned; this corresponds to the ordering given in Table 4.1.

The second sub-phase uses the information extracted from HTML documents. First, a check is performed to determine whether the parameter is associated with an HTML textarea element. If so, the parameter is immediately assigned the free-text type. Otherwise, the algorithm checks whether the parameter corresponds to an input element that is one of a checkbox, radio button, or select list. In this case, the observed set of possible values is assigned to the parameter. Moreover, if the associated element is a checkbox, a multi-valued select, or the name of the parameter ends with the string [], the parameter is flagged as a list.

The analysis engine then derives input validation policies for each parameter. For each resource, the path is linked to the physical location of the corresponding application source file. Then, the resource parameters are grouped by input type (e.g., query string, request body, cookie) and serialized as part of an input validation policy. Finally, the policy is written to disk.

Static Analysis The learning sub-phases described above can be augmented by static analysis. In particular, IPAAS can use a simple static analysis to find parameters and application resources that were missed during the learning phase due to insufficient training data. This analysis is, of course, specific to a particular language and framework. We describe our prototype implementation of the static analysis component in Section 4.4.4.
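The scoring loop of the learning sub-phase can be sketched compactly in Java, reusing the TypeValidators sketch above (this is a simplification of the actual analysis engine):

import java.util.List;

final class TypeLearner {
    // Returns the learned type for one parameter given its observed values.
    static String learnType(List<String> observedValues, List<String> orderedTypes) {
        int[] scores = new int[orderedTypes.size()];
        for (String value : observedValues) {
            boolean accepted = false;
            for (int i = 0; i < orderedTypes.size(); i++) {
                if (TypeValidators.accepts(orderedTypes.get(i), value)) {
                    scores[i]++;
                    accepted = true;
                }
            }
            // A value no validator accepts forces the unconstrained type.
            if (!accepted) return "free-text";
        }
        // Highest score wins; with a strict '>' the earliest (most
        // restrictive) type is kept on ties, as Table 4.1 prescribes.
        int best = 0;
        for (int i = 1; i < orderedTypes.size(); i++) {
            if (scores[i] > scores[best]) best = i;
        }
        return orderedTypes.get(best);
    }
}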

4.4.3 Runtime Enforcement

The result of the first two phases is a set of input validation policies covering each input parameter of the web application under test. The third phase occurs during deployment. At runtime, IPAAS intercepts incoming requests and checks each request against the validation policy for that resource's parameters. If a parameter value contained in a request does not meet the constraints specified by the policy, then IPAAS drops the request. Otherwise, the application continues execution.

A request may contain parameters that were not observed during the previous phases, either in the learning sub-phases or in the static analysis. In this case, there are two possible options. First, the request can simply be dropped. This is a conservative approach that might, on the other hand, lead to program misbehavior. Alternatively, the request can be accepted and the new parameter marked as valid. This fact could be used in a subsequent learning phase to refresh the application's input validation policies.
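The actual prototype enforces policies through a PHP wrapper (Section 4.4.4). As a language-neutral illustration of the same check, the interception could be realized as a Java servlet filter; ValidationPolicy is a hypothetical lookup over the learned policies, and the conservative drop option is shown:

import java.io.IOException;
import java.util.Map;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class IpaasEnforcementFilter implements Filter {
    // Hypothetical policy store, loaded from the learned XML policies.
    interface ValidationPolicy {
        boolean isValid(String resource, String parameter, String value);
    }

    private final ValidationPolicy policy;

    public IpaasEnforcementFilter(ValidationPolicy policy) { this.policy = policy; }

    public void init(FilterConfig cfg) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest http = (HttpServletRequest) req;
        for (Map.Entry<String, String[]> p : http.getParameterMap().entrySet()) {
            for (String value : p.getValue()) {
                if (!policy.isValid(http.getRequestURI(), p.getKey(), value)) {
                    // Conservative option: drop the offending request.
                    ((HttpServletResponse) res).sendError(HttpServletResponse.SC_BAD_REQUEST);
                    return;
                }
            }
        }
        chain.doFilter(req, res);
    }
}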

4.4.4 Prototype Implementation

Parameter extraction We have implemented a prototype of the IPAAS approach for PHP. Parameter extraction is performed by a custom OWASP WebScarab extension, and HTML parsing is performed by jsoup. WebScarab is a client-side intercepting proxy, but this implementation choice is of course not a restriction of IPAAS. The extractor could just as easily have been implemented as a server-side component, for instance as an Apache filter.

Type learning The parameter analyzer was developed as a collection of plug-ins for Eclipse and makes use of standard APIs exposed by the platform, including JFace and SWT. The Java DOM API is used to read and write the XML-based input validation policy files.

Static analyzer We implemented a simple PHP static analyzer using the Eclipse PHP Development Tools (PDT). The analyzer scans PHP source code to extract the set of possible input parameters. There are many ways in which a PHP script can access input parameters. In simple PHP applications, the value of an input parameter is retrieved by accessing one of the following global arrays: $_GET, $_POST, $_COOKIE, or $_REQUEST. However, in more complex applications, these global arrays are wrapped by special library functions that are specific to each web application.

In order to collect input parameters for PHP, our static analyzer performs pattern matching against source code and records the names of input parameters. The location of the name of an input parameter can be specified in a pattern. A pattern can be specified as a piece of PHP code and is attached to one or more input vectors (e.g., $_GET). For example, the pattern optional_param('$', '*') specifies a pattern that we used to extract input parameters from the source code of the Moodle web application. The analyzer makes a best-effort attempt to find all occurrences of invocations of optional_param having two parameters. The value in the first argument is recorded, and the second argument is a "don't care" that is ignored. The analyzer can capture the names of input parameters in a similar way when the input parameter is accessed via an array.

To perform the pattern matching itself, the analyzer transforms the pattern and the PHP script to be analyzed into abstract syntax trees (ASTs). Then, the static analyzer tries to match the pattern AST against the AST of the PHP script. For each match found in the source code, the analyzer traverses the script's control flow graph (CFG) to check whether the match is reachable from the entry point of the script. For example, when an optional_param invocation is observed, the analyzer checks whether a potential call chain exists from the invocation site to the script entry point. CFG traversal is recursive, following inclusions of other PHP files via the require and include statements.

Runtime enforcement The runtime component is implemented as a PHP wrapper that is executed prior to invoking a PHP script, using PHP's auto_prepend_file mechanism. The PHP XMLReader library is used to parse the input validation policies. The validation script checks the contents of all possible input vectors using the validation routines corresponding to each parameter's learned type.

Application      PHP Files   Lines of Code
Joomla 1.5       450         128930
Moodle 1.6.1     1352        365357
Mybb 1.0         152         42989
PunBB 1.2.11     70          17374
Wordpress 1.5    125         29957

Table 4.2: PHP applications used in our experiments.

4.4.5 Discussion

The IPAAS approach has the desirable property that, as opposed to automated output sanitization, it can be applied to virtually any language or development framework. IPAAS can be deployed in an automated and transparent way, such that developers need not be aware that their application has been augmented with more rigorous input validation. While the potential for false positives does exist, our evaluation results in Section 4.5 suggest that this is not a major problem in practice.

However, our current implementation of IPAAS has a number of limitations. First, type learning can fail in the presence of custom query string formats. In this case, the IPAAS parameter extractor might not be able to reliably parse parameter key-value pairs.

Second, the prototype implementation of the static analyzer is fairly rudimentary. For instance, it cannot infer parameter names from variables or function invocations. Therefore, if an AST pattern is matched and the argument to be recorded is a non-terminal (e.g., a variable or function invocation), then the parameter name cannot be identified. In these cases, the location of the function invocation is stored along with a flag indicating that an input parameter was accessed in a dynamic way. This gives the developer the opportunity to identify the names of the input parameters manually after the analyzer has terminated, if desired.

4.5 Evaluation

To assess the effectiveness of our approach in preventing input validation vulnerabilities, we tested our IPAAS prototype on the five real-world web applications shown in Table 4.2. Each application is written in PHP, and the versions we tested contain many known, previously-reported XSS and SQL injection vulnerabilities. To run our prototype, we created a development environment by importing each application as a project into Eclipse version 3.7 (Indigo) with PHP Development Tools (PDT) version 3.0 installed.

Parameter      Joomla      Moodle      MyBB        PunBB       Wordpress   Total
Type           xss  sqli   xss  sqli   xss  sqli   xss  sqli   xss  sqli   xss        sqli
word           2    4      5    10     11   14     16   2      5    0      39 (36%)   30 (25%)
integer        1    7      0    28     6    23     6    3      4    2      17 (16%)   63 (53%)
free-text      3    2      4    0      5    1      4    0      13   0      29 (27%)   3 (3%)
boolean        1    0      0    1      1    4      5    0      0    0      7 (6%)     5 (4%)
enumeration    1    2      0    0      3    8      1    2      0    1      5 (5%)     13 (11%)
words          2    1      0    1      0    0      2    0      1    0      5 (5%)     2 (2%)
URL            0    0      0    0      1    0      1    0      3    0      5 (5%)     0 (0%)
list           0    0      0    1      1    2      1    1      0    0      2 (2%)     4 (5%)
Total          10   16     9    41     28   52     36   8      26   3      109        120

Table 4.3: Manually identified data types of vulnerable parameters in five large web applications.

4.5.1 Vulnerabilities

Before starting our evaluation, we extracted the list of vulnerable parameters for each application by analyzing the vulnerability reports stored in the Common Vulnerabilities and Exposures (CVE) database hosted by NIST [71]. For each extracted parameter, we manually verified the existence of the vulnerability in the corresponding application. In addition, we manually determined the data type of the vulnerable parameter. Table 4.3 summarizes the results of the manual analysis and shows, for each web application, the number of vulnerable parameters having a particular data type. The dataset resulting from this analysis contains 109 XSS and 120 SQL injection vulnerable parameters.

According to Table 4.3, more than half of the SQL injections are associated with integer parameters, while the majority of the XSS vulnerabilities are exploited through the use of parameters of type word. Interestingly, only a relatively small number of vulnerabilities are caused by free-text or similarly unconstrained parameters. This supports our hypothesis that IPAAS can be used in practice to automatically prevent the majority of input validation vulnerabilities.

4.5.2 Automated Parameter Analysis

Parameter             Joomla      Moodle      MyBB        PunBB       Wordpress   Total
Type                  xss  sqli   xss  sqli   xss  sqli   xss  sqli   xss  sqli   xss        sqli
word                  2    4      1    0      5    7      12   1      1    0      21 (19%)   12 (10%)
integer               1    6      0    2      2    8      5    1      3    0      11 (10%)   17 (14%)
free-text             3    2      1    0      4    0      2    0      10   0      20 (18%)   2 (2%)
boolean               1    0      0    0      1¹   3³     4    0      1¹   1¹     7 (6%)     4 (3%)
enumeration           1    2      0    0      1    3      1    2      0    1      3 (3%)     8 (6%)
words                 2    0      0    0      0    0      1    0      0    0      3 (3%)     0 (0%)
URL                   0    0      0    0      1    0      0    0      1    0      2 (2%)     0 (0%)
list                  0    0      0    0      0    1      0    0      0    0      0 (0%)     1 (1%)
unknown               0    0      2    0      2    1      1    0      1    0      6 (6%)     1 (1%)
Correctly Identified  10   14     4    2      15   20     26   4      16   1      71 (65%)   41 (34%)
Wrongly Identified    --   --     --   --     1    3      --   --     1    1      2 (1.8%)   4 (3.3%)

(*) Numbers reported as superscripts indicate parameters identified with an incorrect type.

Table 4.4: Typing of vulnerable parameters in five large web applications before static analysis.

In order to automatically label parameters with types, IPAAS requires a training set containing examples of benign requests submitted to the web application. We collected this input data by manually exercising the web application and providing valid data for each parameter.

The results of our automated analysis are summarized in Table 4.4. For each application, the table reports the number of vulnerable parameters having a particular type. The results show that less than half of the parameters could be identified automatically. For most, our system was able to assign the correct type. However, in a few cases, the parameter was part of a request or serialized in a response, but had no value assigned to it. Hence, the type could not be identified. These parameters are reported as having type unknown. Finally, IPAAS wrongly assigned the type boolean instead of integer to two XSS and four SQL injection vulnerable parameters. These misclassifications are caused by the overlap between the boolean and integer validators. In fact, parameters having values of "0" and "1" can be considered of type boolean as well as integer (i.e., if only the values "0" and "1" are observed during training, the analysis engine gives priority to the type boolean). Collecting more data for each parameter by exercising the same functionality of a web application multiple times can yield different values for the same parameter. Hence, collecting more training data would increase the probability that our algorithm makes the correct classification.

4.5.3 Static Analyzer

Type                         Joomla      Moodle      MyBB        PunBB       Wordpress   Total
                             xss  sqli   xss  sqli   xss  sqli   xss  sqli   xss  sqli   xss        sqli
Detected by static analysis  3    9      6    40     28   46     24   8      23   1      94 (86%)   104 (87%)
Missed during type analysis  0    2      2    37     10   18     10   4      6    0      28 (26%)   61 (51%)

Table 4.5: Results of analyzing the code.

Type           Joomla      Moodle      MyBB        PunBB       Wordpress   Total
               xss  sqli   xss  sqli   xss  sqli   xss  sqli   xss  sqli   xss        sqli
word           2    4      4    7      10   10     15   1      5    0      36 (33%)   22 (18%)
integer        1    7      0    25     6    21     5    3      4    2      16 (15%)   58 (48%)
free-text      3    2      3    0      4    1      2    0      10   0      22 (20%)   3 (3%)
boolean        1    0      0    1      1    3      4    0      0    0      6 (6%)     4 (3%)
enumeration    1    2      0    0      3    8      1    2      0    1      5 (5%)     13 (11%)
words          2    1      0    1      0    0      2    0      1    0      5 (5%)     2 (2%)
URL            0    0      0    0      1    0      0    0      2    0      3 (3%)     0 (0%)
list           0    0      0    0      0    1      0    0      0    0      0 (0%)     1 (1%)
unknown        0    0      0    0      1    0      1    0      0    0      2 (2%)     0 (0%)
Total          10   16     7    34     26   44     30   6      22   3      95 (87%)   103 (86%)

Table 4.6: Typing of vulnerable parameters in five large web applications after static analysis.

Application    Vulnerabilities    Prevented Vulnerabilities
               xss    sqli        xss        sqli
Joomla         10     16          7 (70%)    14 (88%)
Moodle         9      41          4 (44%)    34 (83%)
MyBB           28     52          21 (75%)   43 (83%)
PunBB          36     8           27 (75%)   6 (75%)
Wordpress      26     3           12 (46%)   3 (100%)
Total          109    120         71 (65%)   100 (83%)

Table 4.7: The number of prevented vulnerabilities in various large web applications.

To improve the detection ratio of vulnerable parameters, we ran our static analyzer on the source code of each application. Table 4.5 shows the number of vulnerable parameters that were identified with the help of the static analyzer. The tool was able to find 86% of the XSS-affected and 87% of the SQL-injection-affected parameters. By comparing these input parameters with the ones detected by the analysis engine, we see that 26% of the XSS-affected and 51% of the SQL-injection-affected parameters were missed by the analysis engine but found by the static analyzer. Hence, the static analyzer component can help achieve a larger coverage of the type analysis and, thus, help prevent a larger number of vulnerabilities.

Based on these results, we collected more input data by testing the functionality of each web application using the data from the static analyzer. Then, we ran IPAAS again to determine the data types of the newly discovered parameters, and we manually verified whether the types were correctly identified. The results are shown in Table 4.6. In this case, we obtained better coverage, with 87% of the XSS-affected and 86% of the SQL-injection-affected parameters being properly identified. In addition, none of the parameters were misclassified.

Although the static analyzer helps significantly in achieving higher coverage, a few parameters were still missed during the analysis. This could be improved by employing a more precise static analysis. Also, we believe that unit tests might serve as an additional source of test input data to help improve IPAAS's coverage.

4.5.4 Impact

To assess the extent to which IPAAS is effective in preventing input validation vulnerabilities in practice, we manually tested whether it was still possible to exploit the aforementioned vulnerabilities while IPAAS was enabled. During our tests, we explored different ways to perform the attacks and to evade possible sanitization and validation routines, as reported by the XSS and SQL injection cheat sheets available on the Internet.

Table 4.7 shows the number of XSS and SQL injection vulnerabilities that are prevented by IPAAS. We observe that most of the SQL injection vulnerabilities and a large fraction of the XSS vulnerabilities became impossible to exploit with the input validation policies automatically extracted in our last experiment in place.

The results of this analysis are consistent with our observation that the majority of input validation vulnerabilities on the web can be prevented by labeling each parameter with a data type that properly constrains the range of legitimate values. If a parameter is assigned an unknown or unrestricted type such as free-text, our system will still accept arbitrary input; in these cases, the vulnerability is not prevented by our system.

The difference in the number of prevented XSS and SQL injection vulnerabilities is mainly due to the relatively large number of integer parameters that are vulnerable to SQL injection, while many XSS vulnerabilities are due to injections into free-text parameters. We believe that the large number of SQL-injection-vulnerable parameters of type integer stems from the fact that web applications frequently use integers to identify records.

4.6 Related Work

In this section, we place IPAAS in the context of related work on web application security.

4.6.1 Input validation

Much work has been done that aims to mitigate the impact of malicious input data without changing the application's source code. Scott and Sharp [95] proposed an application-level firewall to prevent malicious input from reaching the web server. Their approach required a specification of constraints on different inputs and compiled those constraints into a policy validation program. In contrast, our approach automatically learns the specification of constraints.

Automating the generation of test vectors for exercising input validation mechanisms is also a topic explored in the literature. Sania [52] is a system to be used in the development and debugging phases. It automatically generates SQL injection attacks based on the syntactic structure of queries found in the source code and tests a web application using the generated attacks. Saxena et al. proposed Kudzu [89], which combines symbolic execution with constraint solving techniques to generate test cases with the goal of finding client-side code injection vulnerabilities in JavaScript code. Halfond et al. [32] use symbolic execution to infer web application interfaces to improve test coverage of web applications. Several works propose techniques based on symbolic execution and string constraint solving to automatically generate XSS and SQL injection attacks, and to generate input for the systematic testing of applications implemented in C [12, 50, 49]. We consider these mechanisms complementary to our approach, in that they could be used to automatically generate malicious input for free-text fields, or to create legitimate input for other fields during the type learning phase.

4.6.2 Attack detection and prevention

Different techniques have been proposed to detect the occurrence of XSS and SQL injection attacks in HTTP traffic [8, 53, 84, 85]. Intrusion detection systems such as Snort [85] are configured with a number of 'signatures' that support the detection of web-based attacks. These systems match patterns associated with known exploits against HTTP traffic obtained while monitoring web applications. Unfortunately, it is very difficult to keep the set of signatures up-to-date, as new signatures must be developed whenever new attacks or modifications of previously known attacks are discovered. Anomaly-based intrusion detection systems [8, 53, 84] establish models describing the normal behavior of the monitored system and rely on these models to identify anomalous activity that may be associated with intrusions. The main advantage of anomaly detection systems compared to signature-based intrusion detection is that they can identify unknown attacks. While anomaly-based detection systems have the potential to protect web applications effectively against XSS and SQL injection attacks, they suffer from a large number of false positives. In contrast to anomaly-based detection systems, our approach employs static analysis to achieve a larger coverage of the protected parameters of the web application.

Preventative techniques for mitigating XSS and SQL injection vulnerabilities focus either on client-side or on server-side mechanisms. Client-side or browser-based mechanisms such as Noxes [51], Noncespaces [30], or DSI [70] make changes to the browser infrastructure aiming to prevent the execution of injected scripts. Each of these approaches requires that end-users upgrade their browsers or install additional software; unfortunately, many users do not regularly upgrade their systems [108].

Many techniques focus on the prevention of injection attacks using runtime monitoring. For example, Wassermann and Su [100] propose a system that checks the syntactic structure of a query for tautologies at runtime. AMNESIA [33] checks the syntactic structure of queries at runtime against a model obtained through static analysis. XSSDS [46] is a system that aims to detect XSS attacks by comparing HTTP requests and responses. While these systems focus on preventing injection attacks by checking the integrity of queries or documents, we focus on input validation. Recent work has also focused on automatically discovering parameter injection [4] and parameter tampering vulnerabilities [80].

Among server-side approaches, leveraging language type systems has been proposed as an XSS and SQL injection defense mechanism by Robertson et al. [83]. In this approach, XSS attacks are prevented by generating HTTP responses from statically-typed data structures that represent web documents. During document rendering, context-aware sanitization routines are automatically applied to untrusted values. The approach requires that the web application construct HTML content using special algebraic data types. Recent work has also focused on the correct use of sanitization routines to prevent XSS attacks. Scriptgard [90] can automatically detect and repair mismatches between sanitization routines and contexts. In addition, it ensures the correct ordering of sanitization routines. Samuel et al. [87] propose a type-qualifier-based mechanism that can be used with existing templating languages to achieve context-sensitive auto-sanitization.
Both approaches focus only on preventing XSS vulnerabilities. As we focus on automatically identifying parameter data types, our approach can help identify other vulnerabilities such as SQL injection or, in principle, HTTP Parameter Pollution.


4.6.3 Vulnerability analysis

Static analysis as a tool for finding security-critical bugs in software has also received a great deal of attention. WebSSARI [37] was one of the first efforts to apply classical information flow techniques to web application security vulnerabilities, where the goal of the analysis is to check whether a sanitization routine is applied before data reaches a sensitive sink. Several static analysis approaches have been proposed for various languages [47, 61]. Unfortunately, due to the inherently dynamic nature of scripting languages, static analysis tools are often imprecise [111]. The IPAAS approach incorporates a static analysis component as well as a dynamic component to learn parameter types. While our prototype static analyzer is simple and imprecise, our evaluation results are nevertheless encouraging.

Runtime approaches to automatically harden web applications have been proposed for PHP [78] and Java [31]. Although these approaches can work at a finer-grained level than static analysis tools, they incur runtime overhead. Both approaches aim to detect missing sanitization functionality, while our focus is on the validation of untrusted user input.

The XSS cheat sheet [86] is a list of XSS vectors that can be used to bypass many sanitization routines. Balzarotti et al. [6] show that web applications do not always implement correct sanitization routines. The BEK project [36] proposes a system and languages for checking the correctness of sanitizers.

4.7 Summary

Web applications are popular targets on the Internet, and well-known vulnerabilities such as XSS and SQL injection are, unfortunately, still prevalent. Current mitigation techniques for XSS and SQL injection vulnerabilities mainly focus on some aspect of automated output sanitization. In many cases, these techniques come with a large runtime overhead, lack precision, or require invasive modifications to the client or server infrastructure.

In this Chapter, we identified automated input validation as an effective alternative to output sanitization for preventing XSS and SQL injection vulnerabilities in legacy applications, or where developers choose to use insecure legacy languages and frameworks. We presented the IPAAS approach, which improves the secure development of web applications by transparently learning types for web application parameters during testing and automatically applying robust validators for these parameters at runtime.

The evaluation of our implementation for PHP demonstrates that IPAAS can automatically protect real-world applications against the majority of XSS and SQL injection vulnerabilities with a low false positive rate. As IPAAS ensures the complete and correct validation of all input to the web application, it can in principle prevent other classes of input validation vulnerabilities as well, including Directory Traversal, HTTP Response Splitting, and HTTP Parameter Pollution.

Chapter 5

Enabling Message Security for RESTful Services

5.1 Context

With the growing interest in cloud computing, systems are getting inter-connected faster, as applications and cloud APIs make intensive use of RESTful services to expose resources to consumers. There has been a shift from SOAP-based services to more lightweight communication based on REST, which allowed a number of advancements in the way resources are used on the web. As REST web services are self-described, resources can be manipulated through a set of verbs already provided by the communication protocol, accelerating the adoption of the REST philosophy. On the other hand, REST suffers from the absence of meta-descriptions, especially concerning security requirements.

Different solutions have been developed to provide a common way to address service description and communication. For SOAP-based web services, the standard defines envelopes to transmit requests and responses. In contrast, the REST concepts coined by Roy Fielding in his Ph.D. dissertation [24] simplify access to web services by reusing existing and widespread standards instead of adding new layers to the communication stack. The reuse of the HTTP protocol contributed to the large industry adoption of RESTful services, supported by the simple CRUD set of operations (Create, Read, Update, Delete).

RESTful services suffer from the lack of a specific security model, unlike SOAP-based services, which rely on the message security model defined in the WS-Security [73] standard. In particular, the security of existing RESTful APIs relies on transport-layer security and on home-made message protection mechanisms. The former efficiently protects point-to-point communication channels, but becomes a burden for mobile systems, as the TLS channel needs to be frequently reset. The latter can be error-prone, as security protocols are known to be tricky.

In this chapter, we provide a security protocol that makes message security implementation as lightweight and efficient as possible while respecting the REST principles. We show how message signature and encryption can address communication security for RESTful services at a fine-grained level. We present the results of the benchmark we conducted on our implementation and compare it to the equivalent realization using SOAP and WS-Security.

The chapter is organized as follows: in Section 5.2, we present the REST security protocol and the threat model we aim to mitigate. In Section 5.3, we position our protocol with regard to WS-Security via a benchmark. Then we discuss related work in Section 5.4 and conclude in Section 5.5.

5.2 REST Security Protocol

In the following, we propose a protocol to secure communications for RESTful services. We provide encryption, signature, and their combination. We do not aim to provide an equivalent of Secure Conversation for RESTful services, as that amounts to transport-layer security for HTTP, which is already addressed by protocols such as TLS.

5.2.1 Message Security Model

We specify an abstract message security model based on confidentiality and digital signatures to protect RESTful messages. The associated threat model is exactly the same as the one described in the Web Services Security standard [73]: "The message could be modified or read by attacker or an antagonist could send messages to a service that, while well-formed, lack appropriate security claims to warrant processing". For instance, a malicious attacker can intercept messages on any intermediary between peers. We want messages to carry tokens for non-repudiation (via digital signatures), to provide data confidentiality by encrypting their content, and to offer replay attack protection.

5.2.2 PKI-based message exchange

We assume that a PKI landscape is in place and that certificates have been exchanged between clients and servers prior to the communication. In this way, we are able to transmit certificate identifiers within the messages instead of full certificates, which would bring unnecessary overhead. In order to identify a certificate on both the client and server sides, we rely on a unique identifier, called the Certificate ID, known to all entities. The Certificate ID is the aggregation of a serial number and an issuer name. RFC 5280 [40] specifies that serial numbers "MUST be unique for each certificate issued by a given CA, i.e., the issuer name and serial number identify a unique certificate". The issuer name in our case can be represented by the Distinguished Name of an X.509 certificate.
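In Java, such an identifier can be derived directly from an X509Certificate; the following minimal sketch uses the ';' separator visible in the traces shown later in Listing 5.2 (the exact formatting is our assumption):

import java.security.cert.X509Certificate;

final class CertificateIds {
    // Issuer distinguished name plus serial number: per RFC 5280, this
    // pair identifies a unique certificate.
    static String of(X509Certificate cert) {
        return cert.getIssuerX500Principal().getName()
                + ";" + cert.getSerialNumber();
    }
}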

5.2.3 The REST Security principle

The principle of our protocol is to propose secure communication at the message level with minimal overhead: we try to respect the philosophy of RESTful services and to reuse the HTTP protocol to its full advantage. For example, we take into account the specificity of HTTP verbs in the design of the protocol.

Header keys           Value
X-JAG-CertificateID   Unique identifier for a certificate
X-JAG-DigestAlg       Algorithm used to obtain the digest
X-JAG-DigestValue     Value of the digest(s)
X-JAG-SigAlg          Algorithm used to obtain the signature
X-JAG-SigValue        Value of the signature(s)
X-JAG-EncAlg          Algorithm used to encrypt headers and message parts
X-JAG-EncKeyAlg       Algorithm used to encrypt the symmetric key
X-JAG-EncKeyValue     Encrypted value of the symmetric key
X-JAG-MultiParts      Designation of protected headers and message parts

Table 5.1: REST security protocol headers.

The REST security protocol is closely related to the WS-Security standard: it proposes a fine-grained approach to provide authenticity, non-repudiation, and confidentiality for messages, but it targets another type of service. We claim that our approach is complementary, providing a consistent application of security policies regardless of the type of service being addressed. When comparing both approaches, we can highlight the reduced development effort and the lower computation cost at runtime. This is a consequence of the message-size optimization we have performed, while still respecting compatibility with the service's definition and implementation.

We propose a set of HTTP headers for transmitting meta-data, unlike WS-Security, which modifies messages to add its own container describing the security meta-data. The headers are described in Table 5.1. They start with the prefix "X-JAG" to distinguish them from other application headers. The main difference from the WS-Security approach is that we are agnostic about the information format. WS-* services use a strict approach to determine the transformations of XML-based messages to ensure correct handling by interpreters on both sides. In our approach, we consider the information as a set of multiparts and protocol headers. This allows us to gain flexibility in terms of fine-grained signature and encryption of attached documents, and/or to restrict the visibility of a number of headers.

In the following, we present the REST security protocol process. For illustration purposes, Listing 5.1 shows the interaction trace produced by a request to a RESTful service. A client requests customer information from the service and expects a JSON-encoded result. Note that the result could be in any format accepted by the server (e.g., XML, YAML, plain text, an audio file, binary content, etc.). The response produced by the application server starts at line 6.

1  GET /customer/123 HTTP/1.1
2  Accept: application/json
3  Host: 127.0.0.1:8080
4  Connection: keep-alive
5
6  HTTP/1.1 200 OK
7  Server: Apache-Coyote/1.1
8  Content-Type: application/json
9  Content-Length: 77
10
11 {"Customer":{"firstname":"Gabriel","id":123,"lastname":"Serme","title":"Mr"}}

Listing 5.1: RESTful request and response

5.2.4 Message Signature

Providing a digital signature along with requests gives confidence in the data being transmitted. A server might need information on the authenticity of a message to launch internal orders and to render the service correctly. A digital signature brings non-repudiation: a requester cannot deny having made the request. Likewise, the service cannot later repudiate the response if it includes a signed token linked to the initial request. Additionally, a digital signature protects against unintentional or malicious modification during transmission.

Algorithm 1 presents the steps to attach signature information to the message after a "digest then encrypt" processing. It starts with a message m, or part of it, along with: the digest algorithm, the signature algorithm, the Certificate ID of the sender, and the private key of the sender. The algorithms can be chosen by the sender itself or imposed by the server policy. In our implementation, we let the client decide which algorithms to use, but the server can deny access if its policy considers the protection insufficient. We have defined a "digest then encrypt" function over the message payload, security parameters, and header information. The algorithm varies slightly depending on the concrete signature algorithm. The values are then attached to the message along with the algorithm information.

Algorithm 1 Signature of REST messages
Require: m is a message, sig is a signature algorithm name, dig is a digest algorithm name, cid is a Certificate ID, pk is the sender's private key, urlpath is the requested path, hds are the header elements to protect
  dv ← digest(m.payload, dig)
  url ← ''
  if m is a request then
    url ← urlpath
  end if
  bytes ← concat(dv, url, sig, dig, cid, hds)
  digValue ← digest(bytes, dig)
  m.sigValue ← encrypt(digValue, sig, pk)
  m.{url, sig, dig, cid, hds} ← url, sig, dig, cid, hds
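A minimal Java sketch of this signing step follows, assuming SHA-1 digests and an RSA signature as in the trace of Listing 5.2. Note that SHA1withRSA internally performs the final digest-then-encrypt of Algorithm 1, and the concatenation of the protected metadata is simplified here:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.Signature;
import java.util.Base64;

final class RestSigner {
    // Returns the Base64 value to place in the X-JAG-SigValue header.
    static String sign(byte[] payload, String url, String cid, PrivateKey pk)
            throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        byte[] dv = sha1.digest(payload); // digest of the payload (X-JAG-DigestValue)
        // Concatenate digest value and protected metadata, then sign.
        byte[] bytes = (Base64.getEncoder().encodeToString(dv)
                + url + "rsa-sha1" + "sha1" + cid).getBytes(StandardCharsets.UTF_8);
        Signature sig = Signature.getInstance("SHA1withRSA");
        sig.initSign(pk);
        sig.update(bytes);
        return Base64.getEncoder().encodeToString(sig.sign());
    }
}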

In Algorithm 2, we present the signature verification function. It starts from a message m, or part of it, and with the public key of the sender. The steps are the reverse of the previous “digest then encrypt” algorithm. We first calculate the digest value of a set of headers and the payload. Then, we retrieve the digest value calculated by the sender. The encrypted value is transmitted

along with the message, in a specific header. When we decrypt this value, we are able to detect any corruption of the payload and headers, and also to guarantee the safety and authenticity of the message, as it has been digitally signed by the sender.

Algorithm 2 Verification of REST Signature
Require: m is a message, Pk is the sender's public key
  dv ← digest(m.payload, m.dig)
  bytes ← concat(dv, m.url, m.sig, m.dig, m.cid, m.hds)
  calculatedDigest ← digest(bytes, m.dig)
  retrievedDigest ← decrypt(m.sigValue, m.sig, Pk)
  if retrievedDigest ≡ calculatedDigest then
    return true
  end if
  return false
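The verification side can be sketched symmetrically under the same SHA-1/RSA assumptions; Signature.verify decrypts the transmitted value with the sender's public key and compares it against the locally recomputed digest, implementing the equality test of Algorithm 2. Again, names and the exact byte layout are our assumptions.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

public final class RestVerifierSketch {
    public static boolean verify(byte[] payload, String urlPath, String cid,
                                 String protectedHeaders, String sigValueBase64,
                                 PublicKey senderKey) throws Exception {
        byte[] dv = MessageDigest.getInstance("SHA-1").digest(payload);
        String meta = urlPath + ";rsa-sha1;sha1;" + cid + ";" + protectedHeaders;
        Signature rsa = Signature.getInstance("SHA1withRSA");
        rsa.initVerify(senderKey);
        rsa.update(dv);
        rsa.update(meta.getBytes(StandardCharsets.UTF_8));
        // true iff retrievedDigest equals calculatedDigest in Algorithm 2
        return rsa.verify(Base64.getDecoder().decode(sigValueBase64));
    }
}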

Listing 5.2 presents an HTTP trace with concrete header and payload values. The request starts at Line 1 and the response at Line 10. We can observe, for example, that the request is issued by a sender identified as the 4102nd certificate issued by the CESSA Authority. This sender protects the request for customer 123. The response is given by another peer, identified as the 4th certificate issued by the CESSA Authority, on Line 12. The request and response are here signed, which allows the party consuming the message to verify the identity of the producer and the validity of the security token, and to detect whether the message has been tampered with. A replay attack can be avoided by binding the messages to elements with unique characteristics: a MAC, a timestamp, a session-related nonce, etc.

5.2.5 Message Encryption

Message encryption provides confidentiality to sensitive assets, so that no eavesdropping or data modification can happen during message transmission. In requests, several assets are transmitted, such as the payload, session headers in cookies, etc. In our approach, we focus mainly on payload and header protection. We envisage extensions to address parameter encryption in GET requests in future versions of the protocol. Encryption has the property of modifying the payload and headers, unlike signature, which needs only read access to the message. The encryption mechanism is also process-intensive. Algorithm 3 processes the payload of a message, or part of it, for encryption. The PKI environment gives us mechanisms to share information between actors: the public and private keys. However, asymmetric algorithms are too costly to encrypt large amounts of data. Instead, we generate a symmetric key for encryption. This key is small enough to be encrypted with an asymmetric algorithm and sent with the message. Thus, the message contains an encrypted symmetric key for the receiver, the encrypted payload, and several headers naming the algorithms used for encryption. Algorithm 4 presents the reverse operation with respect to the above algorithm, to be executed on the receiver side. The procedure is performed on an encrypted message m, or part of it.

1  GET /sign/customer/123 HTTP/1.1
2  Accept: application/json
3  X-JAG-CertificateID: CN=CA CESSA, <...> O=SAP Labs France, C=FR;4102
4  X-JAG-DigestAlg: w3.org/2000/09/xmldsig#sha1
5  X-JAG-DigestValue: 2jmj7l5rSw0yVb/vlWAYkK/YBwk=
6  X-JAG-SigAlg: w3.org/2000/09/xmldsig#rsa-sha1
7  X-JAG-SigValue: CwgrRTaC0oGBMpLPF6m<...>+gjtCMnuC+2svEdI5zJvITbM=
8  Host: 127.0.0.1:8080
9
10 HTTP/1.1 200 OK
11 Server: Apache-Coyote/1.1
12 X-JAG-CertificateID: CN=CA CESSA, <...> O=SAP Labs France, C=FR;4
13 X-JAG-DigestAlg: w3.org/2000/09/xmldsig#sha1
14 X-JAG-DigestValue: RUAYhPTuXqwChvIGrclAyRtA22Y=
15 X-JAG-SigAlg: w3.org/2000/09/xmldsig#rsa-sha1
16 X-JAG-SigValue: pmpc347XG/8a9QIFWYaHHsbt79hCwF<...>G/buHnjsHQvZhaggilRuM=
17 Content-Type: application/json
18 Content-Length: 77
19
20 {"Customer":{"firstname":"Gabriel","id":123,"lastname":"Serme","title":"Mr"}}

Listing 5.2: Signed request and response

Algorithm 3 Encryption of a REST message
Require: m is a message, Pk is the receiver's public key, enc is a symmetric algorithm name, aenc is an asymmetric algorithm name, hds are the header elements to protect
  skey ← generateSymmetricKey(enc)
  m.payload ← encrypt(m.payload, skey)
  for all (name, value) ∈ hds do
    hds[name] ← encrypt(value, skey)
  end for
  m.keyValue ← encrypt(skey, aenc, Pk)
  m.{enc, aenc, hds} ← enc, aenc, hds
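A sketch of the hybrid scheme of Algorithm 3 follows, assuming AES-128 in CBC mode for the payload (matching the aes128-cbc identifier in the traces) and RSA key wrapping; header encryption is elided for brevity and the class layout is ours.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.security.PublicKey;

public final class RestEncryptorSketch {
    public static final class Encrypted {
        public final byte[] payload;    // replaces the HTTP body
        public final byte[] iv;         // CBC initialization vector
        public final byte[] wrappedKey; // Base64-encoded into X-JAG-EncKeyValue
        Encrypted(byte[] p, byte[] iv, byte[] k) { payload = p; this.iv = iv; wrappedKey = k; }
    }

    public static Encrypted encrypt(byte[] payload, PublicKey receiverKey) throws Exception {
        // skey <- generateSymmetricKey(enc)
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey skey = kg.generateKey();
        // m.payload <- encrypt(m.payload, skey)
        Cipher aes = Cipher.getInstance("AES/CBC/PKCS5Padding");
        aes.init(Cipher.ENCRYPT_MODE, skey); // a random IV is generated by the provider
        byte[] encPayload = aes.doFinal(payload);
        // m.keyValue <- encrypt(skey, aenc, Pk): wrap the symmetric key for the receiver
        Cipher rsa = Cipher.getInstance("RSA");
        rsa.init(Cipher.WRAP_MODE, receiverKey);
        return new Encrypted(encPayload, aes.getIV(), rsa.wrap(skey));
    }
}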

1  GET /encrypt/customer/123 HTTP/1.1
2  Accept: application/json
3  X-JAG-CertificateID: CN=CA CESSA, <...> O=SAP Labs France, C=FR;4102
4  Host: 127.0.0.1:8080
5
6  HTTP/1.1 200 OK
7  Server: Apache-Coyote/1.1
8  X-JAG-CertificateID: CN=CA CESSA, <...> O=SAP Labs France, C=FR;4
9  X-JAG-EncKeyValue: RHvEjpmkt2QF3ZPCtqFbflDzA48<...>/UYNCYPbB265W2ZjYhL5VQSyv1Xs3Skm0=
10 X-JAG-EncAlg: w3.org/2001/04/xmlenc#aes128-cbc
11 X-JAG-EncKeyAlg: w3.org/2000/09/xmldsig#rsa-sha1
12 Content-Type: application/json
13 Content-Length: 101
14
15 eIdV39/XV/IHgPNWB2Hpo2jWglsI9p<...>k5c4+vVs9d53o6OEoh7M0bybmtGwdZE=

Listing 5.3: Encrypted payload during a request

The message usually contains meta-information about the encrypted parts and the algorithms used for key encryption and data encryption. Otherwise, this information must result from a previous agreement between the sender and the receiver. To decrypt the data, the receiver retrieves the symmetric key and uses it to restore the headers and the payload.

Algorithm 4 Decryption of a REST message
Require: m is a message, pk is the receiver's private key
  skey ← decrypt(m.keyValue, m.aenc, pk)
  for all (name, value) ∈ m.hds do
    m.hds[name] ← decrypt(value, m.enc, skey)
  end for
  m.payload ← decrypt(m.payload, m.enc, skey)
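The receiver-side mirror of the previous sketch, implementing Algorithm 4 under the same algorithm assumptions: the symmetric key is unwrapped with the receiver's private key and then used to decrypt the payload.

import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.security.PrivateKey;

public final class RestDecryptorSketch {
    public static byte[] decrypt(byte[] encPayload, byte[] iv, byte[] wrappedKey,
                                 PrivateKey receiverKey) throws Exception {
        // skey <- decrypt(m.keyValue, m.aenc, pk)
        Cipher rsa = Cipher.getInstance("RSA");
        rsa.init(Cipher.UNWRAP_MODE, receiverKey);
        SecretKey skey = (SecretKey) rsa.unwrap(wrappedKey, "AES", Cipher.SECRET_KEY);
        // m.payload <- decrypt(m.payload, m.enc, skey)
        Cipher aes = Cipher.getInstance("AES/CBC/PKCS5Padding");
        aes.init(Cipher.DECRYPT_MODE, skey, new IvParameterSpec(iv));
        return aes.doFinal(encPayload);
    }
}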

Listing 5.3 presents an HTTP trace where the request does not contain custom information apart from the Certificate Id. The service has been configured to send back all messages encrypted. The service therefore processes and encrypts the message content for the requester. In the Listing, the payload is protected and no eavesdropping can be performed during transmission. The protection mechanisms against replay attacks described in the previous section also apply here.

5.2.6 Signature and Encryption

Signature combined with encryption is an important feature. Signature alone brings non-repudiation to the system, but an attacker can still read the content of messages and remain unnoticed. Encryption alone brings data confidentiality, but does not prevent data tampering: any intruder can replace the payload and security tokens with their own, as there is no binding with the proof of identity. For this purpose, the combination of encryption and signature at the message level provides confidence that data is kept confidential from intruders and that no modifications have been made to it. The signature testifies to the authenticity of the encrypted content, and only the receiver can retrieve the original data. In the current version of our work, we do not address the ordering of the two mechanisms; therefore it is not yet possible to encrypt a signature.

1  PUT /sign/customer/111/file HTTP/1.1
2  Content-Type: multipart/form-data; boundary="uuid:7d156074-35"; start="<root>";
3  X-JAG-CertificateID: CN=CA CESSA, <...> O=SAP Labs France, C=FR;4102
4  X-JAG-DigestAlg: w3.org/2000/09/xmldsig#sha1
5  X-JAG-DigestValue: 0;8X3Ci4M+bhWKMg+f83CXoXXjjns=
6  X-JAG-SigAlg: w3.org/2000/09/xmldsig#rsa-sha1
7  X-JAG-SigValue: 0;lcj7v4UAMxFOkhBoX+8<...>NKo393OQ=
8  X-JAG-Multiparts: 0;<root>
9  Host: 127.0.0.1:8080
10 Transfer-Encoding: chunked
11
12 --uuid:7d156074-35
13 Content-Type: application/octet-stream
14 Content-Transfer-Encoding: binary
15 Content-ID: <root>
16 Content-Disposition: attachment;filename=data.dat
17 <..binary content..>
18
19 <.. HTTP Response ..>

Listing 5.4: Multipart signature example

5.2.7 Multiparts

We consider the case where one request or response message contains several parts. This is the case, for example, when forms are submitted with several fields containing user data, or when several files are attached to the same request. In such cases, we might have both general-purpose information and sensitive information. To encrypt the sensitive information, we need a mechanism that specifies the format of the different parts. We have several choices: we can apply the security requirements to the entire request/response of the RESTful service, or just to some parts or elements. HTTP makes use of the Multipurpose Internet Mail Extensions (MIME) standard1 to separate the content into several parts. We can take advantage of this to distinguish parts of the data within requests. Therefore, if a request contains multiple parts, we can choose to sign and encrypt some of them without affecting the others. This approach differs from what is implemented in the WS-Security and S/MIME standards. In our approach, we are independent from the actual content type, and propose to gather all security meta-data in one place. WS-* standards deal with XML-based content, so they propose a fine-grained approach at the XML-data level. Our approach is more general, and provides resource-grained encryption and signature. Listing 5.4 highlights this principle.

1http://www.ietf.org/rfc/rfc2045

It represents the signature for the first multipart element, identified by <root>. In a multipart environment, the meta-information varies depending on the part subject to encryption or signature. The header X-JAG-Multiparts contains a set of multipart elements and some headers referenced by identifiers. These identifiers are used to reference the digest and signature values in the other security headers.

5.3 Comparison to WS-Security

The REST security protocol is close to the WS-Security standard. WS-Security [73] describes enhancements to SOAP messaging to provide protection through message integrity, confidentiality, and single message authentication. More precisely, it is an open format for signing and encrypting message parts leveraging the XML Digital Signature and XML Encryption protocols, for supplying credentials in the form of security tokens, and for securely passing those tokens in a message. As explained in the previous sections, the REST security protocol has been designed to be an equivalent alternative to WS-Security for RESTful services, with some differences in the way messages are secured.

5.3.1 Environment & Methodology

In order to position the protocol's performance with respect to the state of the art, we have run several performance tests comparing WS-* and RESTful services. To obtain a clear and reproducible methodology, the evaluation was made in a single environment, eliminating network side effects. We limited resource starvation on the server to obtain accurate data. Table 5.2 lists the server characteristics. In order to compare the different services, we evaluate them in a single framework covering both the JAX-RS and JAX-WS specifications. The CXF service framework2 allows us to compare the complexity of the two kinds of web services under the same conditions.

We have defined and implemented three scenarios corresponding to real use cases, in order to evaluate and compare performance, message size, etc. The three scenarios are the following:

5.3.1.0.1 Simple Get In the following, we identify this scenario with the acronym Get. The scenario retrieves information without further processing. It is materialized by the invocation of a method in WS-* to retrieve customer information from a customer identifier. In RESTful services, the client requests a customer through a GET action, and the service renders the customer in the requested format.

5.3.1.0.2 Modify Post In the following, we identify this scenario with the acronym Post. In this scenario, the data is transmitted in the request phase, and the response phase is just an indicator of success or failure. Some additional processing is done in the background to modify objects on the server. The modification of a remote resource is materialized by a method invocation with WS-* services, whereas it is a POST request in REST.

2http://cxf.apache.org/index.html

Processor              Intel Core i7-2600 @ 3.40 GHz
Installed Memory       16 GB RAM
Hard Drive             Seagate ST3500413AS Barracuda 7200, 500 GB
Application Server     Tomcat 7.0.21
Server JVM Memory      -Xmx 8000m
WS framework           CXF 2.4.2
Server certificate     RSA 1024
Clients' certificates  RSA 4096

Table 5.2: Benchmark environment

5.3.1.0.3 Large payload In the following, we identify this scenario with the acronym Large or Big. It corresponds to the transmission of a large amount of data between client and server. The size of the messages brings out the real impact of the protocol, and each operation allows accurate observation of the cost in terms of size and performance. It is materialized by a method invocation taking a customer document as input for WS-* services, and by a PUT request in the RESTful version. The reply indicates success or failure.

The different scenarios provide heterogeneous tests to verify several properties of the REST security protocol under different conditions. They cover the most problematic situations one can face in a real production environment, and are a good basis for protocol comparison. For each of the scenarios, we have configured and run several tests with different security capabilities: signature, encryption, signature & encryption, and no security acting as the baseline. The experiments were performed several times to ensure consistent and valid results for comparison. The REST security implementation uses the same cryptographic algorithms as the WS-Security configuration. For instance, both SOAP and REST services are set to use the "Basic128Rsa15" security algorithm suite: it determines the algorithms for digest, symmetric encryption, and asymmetric encryption, as well as the key derivation and key-wrap algorithms.

5.3.2 Size comparison

Table 5.3 compares the message sizes of REST and WS-* services in the different scenarios. It lists the incoming and outgoing message sizes, distinguishing header and payload sizes. The results correspond to the different scenarios, with an equivalence between the Get and Post scenarios in terms of total size.

The Large scenario sends a resource of around 3311 kB. In the Get scenario, a client sends a request to the server in order to retrieve a customer object. In SOAP messages, the request is embedded in a SOAP envelope. The envelope grows with the type of security used: for each type, the SOAP headers comprise security data indicating the type of algorithm, the encrypted or signed parts, and sometimes full certificates. In REST messages, the request is directly represented by the HTTP verb used to query the server. Therefore, no payload beyond the actual data, plus some meta-data headers, is necessary.

Figure 5.1: Overhead of SOAP messages compared to REST. For each scenario and security, the REST size represents the base 100

Figure 5.1 highlights the global overhead of using SOAP with any of the security mechanisms, for the different scenarios. The REST size represents 100 for each scenario and security setting. We then compare the message overhead of the different security mechanisms with their REST equivalents. For example, a SOAP signed message in the Get scenario represents around 460 when its REST counterpart is 100. In the figure, we distinguish a second dimension: the origin of the overhead, from the incoming or the outgoing message. For the previous scenario, the message increase is half due to the incoming message and half due to the outgoing message. In all tests, the usage of SOAP services instead of REST services is less efficient in terms of message size. The minimal overhead in all scenarios is 33%, in the case where the message payload is very large; this is explained by the small relative impact of the SOAP overhead compared to the actual data to transmit. This number results from our measurements, where the size of messages (including incoming and outgoing payloads and headers) is larger when WS-* services are used compared to REST services, for all security mechanisms. The experimental case where the REST security protocol is most efficient compared to WS-Security is the encryption of small sets of data. The Get and Post scenarios present a high SOAP overhead when the data to transmit is small: in such cases, SOAP adds too much meta-data compared to the actual information, which multiplies the message size by up to eight for a request and response in our measurements.

Scenario  Measure             SOAP                                    REST
                              Enc      Sign     Sign&Enc  Plain       Enc      Sign     Sign&Enc  Plain
Get       Payload In          4991     5407     6849      765         0        0        0         0
          Headers In          228      228      228       221         330      1200     1200      192
          Payload Out         3685     3190     5149      827         236      161      236       161
          Headers Out         2        2        2         2           1001     531      1359      38
          Processing Time     3.964    3.445    6.244     0.187       1.060    2.332    3.156     0.145
Post      Payload In          5052     5458     6912      820         236      167      236       163
          Headers In          228      228      228       221         662      1216     1532      208
          Payload Out         3581     3107     5040      746         0        0        0         0
          Headers Out         2        2        2         2           193      551      551       58
          Processing Time     4.343    3.821    6.574     0.218       1.610    2.352    3.766     0.128
Large     Payload In (in kB)  5891     4420     5893      4415        4415     3311     4415      3311
          Headers In          228      228      228       221         815      1369     2052      361
          Payload Out         3565     3089     5028      734         0        0        0         0
          Headers Out         2        2        2         2           193      551      551       58
          Processing Time     164.412  85.063   211.377   35.510      137.950  69.468   146.043   19.776

Table 5.3: Comparison of payload and header size (in bytes, unless noted) and average processing time (in ms) for SOAP and REST messages

5.3.3 Processing performance comparison

In this section, we present the processing performance comparison. The server has a certificate with an RSA 1024-bit key, and the different clients have RSA 4096-bit keys. The difference in key size between the clients and the server impacts the processing time, depending on the actions performed by the different actors. This behavior is directly linked to the performance of asymmetric algorithms, which differs between encryption and decryption [17]. For instance, the encryption algorithm is fast, as it uses a small value for the exponentiation (typically 0x10001). The decryption algorithm requires more computation, as the exponent is of the size of the private key (1024 or 4096 bits in our benchmarks). Thus, the server can decrypt faster than the clients, at the cost of less security. The calculated factor shows that server decryption is around 20 times faster than client decryption. In our benchmarks, this impacts the performance comparison between the scenarios we have defined. For instance, the server processes messages from the Get scenario with one encryption (a fast operation), whereas messages from the Post scenario need to be decrypted (a slow operation), which increases the processing time and lowers the throughput.

Figure 5.2: Average processing time comparison for the different scenarios

Table 5.3 also lists the average processing times measured under the same conditions. Each scenario was run for 60 seconds, with a single client emitting requests. The client sends messages sequentially, so as not to overload the server and to measure the optimal processing time. Figure 5.2 depicts the differences between the scenarios. The gap between REST and SOAP average processing times depends on the algorithm scheme and the scenario used. In the Get and Post scenarios, REST is twice as efficient as SOAP when cryptography is used. This can be explained by the share of the message devoted to the XML format and SOAP meta-information, which impacts message size: for thin SOAP messages, this overhead doubles the size compared to REST messages, and the time spent processing a message is directly impacted by its size. For large messages, the encryption scheme is shown to be slower than signature.

We can notice differences in performance between encryption and signature, depending on the size of the data to be processed. Although SOAP encryption is always more costly than SOAP signature, REST shows better performance with encryption when the amount of data remains low, as in the Get and Post scenarios. If the data size grows, signature becomes faster than encryption.

5.4 Related Work

In this section, we present the security models adopted by existing web services to expose their REST APIs. We then discuss alternative approaches addressing REST security and performance issues.

The security model adopted by Amazon S3 [2] supports authentication and custom data encryption over HTTP requests. Requests are issued with a token to prevent unauthorized users from accessing, modifying, or deleting the data. The token conveys a signature value calculated per request, which transmits a proof of identity ensuring the authenticity of the request, similar to our protocol. The data encryption can be performed by the client itself, or by the server prior to storage. The communication is assumed to be secured through SSL endpoints. Our approach brings more flexibility, as the actors decide which resources and headers to protect and transform. The server benefits from the PKI environment to render services to its clients without the need to generate and maintain a set of secret keys. The clients can also enable the REST security protocol with different service providers by simply uploading their public key.

The other models adopt a slightly different approach, making intensive use of the OAuth 2.0 protocol. Yahoo [112] uses the OAuth Authorization protocol (OAuth Core 1.0 [41]), a simple, secure protocol to publish and share protected data when several actors require access to the resource. Yahoo demands the usage of an API key to sign requests and provide end-user authentication. Twitter [104] leverages transport layer security by exposing REST APIs over SSL. Facebook [23] requires the OAuth 2.0 protocol [42] for authentication and authorization. They distribute SSL certificates to consumers so that they can create signed requests, and force users to use HTTPS. The Dropbox model [22] allows third-party applications to use their services on behalf of users. Their model forces requests through SSL and requires additional authenticity checks on messages. Like the previous approaches, they combine transport layer security and application security. In our approach, we simplify access to resources by unifying security at the message level. For instance, a request to retrieve a file with Dropbox transmits content metadata in a header. This content can be visible when the packet reaches the endpoint of an SSL tunnel, whereas our approach protects the header until its consumption.

The idea of RESTful security as an equivalent of WS-Security has been expressed in a blog entry [57], using a similar approach but with no implementation or concrete specification. An approach to sign and encrypt multiparts has been drafted in [25]. It does not refer to REST services, but rather proposes a model integrated into the multipart content separation to describe meta-information. Our approach benefits from multiparts to split the payload into several resources, but we prefer centralizing security meta-data in headers to avoid service disruption,

and to incorporate the protection of other fields: headers, parameters, etc. Our lightweight approach modifies content only when necessary. Pautasso et al. [76] describe the differences between REST services and "big" services, with a number of architectural decisions about which type of service is more appropriate. We have used this work to compare the security of both approaches and to provide an extension to REST services for more security. The work in [81] addresses attacks targeting SOAP-based services. Although the attacks are based on the XML message format, we advocate that the approach presented there can easily be introduced in our implementation, using particular header fields to describe the document structure. Optimizing service consumption in terms of performance has been addressed for a long time. The problem is rather to balance usability and composability while allowing cross-cutting concerns such as security to protect the messages with a variable level of granularity. We can mention the work on Fast Web Services [88], which defines binary-based messages to lower bandwidth and memory consumption. The price is the loss of self-description, so that intermediaries cannot process the messages. In [101], Suzumura et al. propose a different approach, based on SOAP messages: they boost performance by considering only the partial regions of messages that differ from previously processed ones. Although the approach gives interesting results, it cannot help with encrypted SOAP messages in the current state of the protocol.

5.5 Summary

In this chapter, we have presented a novel approach to provide security for RESTful services equivalent to WS-Security. Our solution respects the REST philosophy by minimizing the processing overhead for service consumers, without interfering with the service composition already in place. We are able to keep messages confidential and to sign them with a fine granularity. The custom, ad-hoc processing on a per-message basis is a valid alternative to the existing approaches, which mainly rely on transport layer security for securing all REST services. The advantage of our approach is to hide the complexity from the consumers, with no pollution of request parameters, while still carrying security tokens processable and verifiable by recipients. We also conducted a performance evaluation considering several use cases to analyze the impact of message protection on the performance of the web services. The analysis comprises heterogeneous scenarios to compare the different security mechanisms among themselves, but also the behavior of the application server when dealing with RESTful services versus SOAP-based web services. The results show that RESTful services are processed more efficiently from any point of view, which is inherent to the services' purpose: RESTful services are oriented towards handling resources, whereas SOAP-based services forge requests for operation invocation. The protocol is self-descriptive, so all information about the message verifications and transformations is specified, keeping the recipient informed about the message state. As future work, we will extend the protocol to handle other security constraints: for instance, carrying encrypted basic authentication tokens, signed P3P claims, or even authorization decisions. We would also like to investigate an automated way to configure services to enable security transformations when necessary, i.e., when the resource is sensitive.

Chapter 6

Automating Privacy Enforcement in Cloud Platforms

6.1 Context and Motivation

In order to speed up the deployment of business applications, and to reduce overall IT capital expenditure, many cloud providers nowadays offer Platform as a Service (PaaS) solutions as an alternative to leverage the advantages of cloud computing. We can mention for instance SAP NetWeaver Cloud, Google App Engine, or VMware Cloud Foundry, to cite a few. PaaS brings an additional level of abstraction to the cloud landscape, by emulating a virtual platform on top of the infrastructure, generally featuring a form of mediation to the underlying services akin to middleware in traditional communication stacks. As a consequence of this shift, we observe that more and more personally identifiable information (PII) is being collected and stored in cloud-based systems. This is becoming an extremely sensitive issue for citizens, governments, and companies, both using and offering cloud platforms. The existing regulations, which already established several data protection principles, are being extended to assign new responsibilities to cloud providers with respect to private data handling. The provision of privacy-preserving services and tools will be one of the arguments favoring the choice of one PaaS provider over another when a company is deciding where to deploy a new cloud application. The proposed reform of the European data protection regulation points out that privacy-aware applications must protect personal data by design and by default: "Article 22 takes account of the debate on a 'principle of accountability' and describes in detail the obligation of responsibility of the controller to comply with this Regulation and to demonstrate this compliance, including by way of adoption of internal policies and mechanisms for ensuring such compliance. Article 23 sets out the obligations of the controller arising from the principles of data protection by design and by default. Article 24 on joint controllers clarifies the responsibilities of joint controllers as regards their internal relationship and towards the data subject1."

1http://ec.europa.eu/justice/data-protection/document/review2012/com_2012_11_en.pdf

The correct enforcement of privacy and data usage control policies has recently been the subject of several reported incidents of faulty data handling, perhaps on purpose; see for instance the case of Facebook2. Therefore, addressing compliance requirements at the application level is a competitive advantage for cloud platform providers. In the specific cases where the cloud platform provider is also considered a joint controller, a privacy-aware architecture will address the accountability requirement for the PaaS provider with regard to the next generation of regulations. Such an architecture can also enable compliance for the Software as a Service delivery model, if we assume the software was built over a privacy-aware platform. On the other hand, this can hardly be achieved in the context of Infrastructure as a Service, since there would be no interoperability layer on which the privacy controls could rely.

In order to achieve this, the PaaS must implement some prominent, possibly standardized, privacy policy framework (such as EPAL [3] or P3P [16]), in which privacy preferences can be declared in a machine-readable form and later enforced automatically. In such a setting, the privacy enforcement controls could easily be incorporated into a new deployment landscape, accelerating the development process of compliant applications. Furthermore, the cloud platform can offer guarantees ensuring the correct implementation of the enforcement components. This could be offered either via a certification mechanism or via an audit of an existing cloud landscape executed by the governing entities.

In this chapter we present work towards the implementation of privacy-aware services in a PaaS. We aim to empower the cloud platform with capabilities to automatically enforce the privacy policy that results from the end-user's consent to the application provider's privacy policy. End-user policies and service provider terms of use are stated in a state-of-the-art privacy and usage control language [10]. In order to leverage the provided implementation of privacy-aware services, cloud application developers need to introduce simple annotations to the code prior to its deployment in the cloud. These indicate where PII is being handled, automating privacy enforcement and enabling compliance by design and by default. The idea is outlined in Figure 6.1, and consists of design-time steps (declaring policies, annotating the code, and deploying in the cloud) and run-time steps (including policy matching, privacy control, and obligation execution). The enforcement mechanisms are provided by the platform with the help of a new approach to aspect-oriented programming where aspects can be manipulated at the process and at the platform levels. This approach makes it possible to maintain a more flexible configuration of the enforcement mechanisms. The mechanisms interpret end-user preferences regarding the handling of PII, presented in the form of opt-in or opt-out choices among the available privacy policies of a cloud application, and later perform the required actions (filtering, blocking, deletion, etc.). We experimented on a Java-based Platform as a Service, SAP NetWeaver Cloud, to demonstrate how privacy preferences can be handled automatically thanks to the simple Java annotation library provided in our prototype. The platform provider can in this way achieve built-in compliance with personal data protection regulations in a transparent way, as we describe in the next sections.
2http://mashable.com/2011/10/21/facebook-deleted-data-fine/

Figure 6.1: Privacy aware PaaS components

The remainder of the chapter is organized as follows: in Section 6.2 we present our use case and give a brief overview of the privacy policy language we adopt in this work; in Section 6.3 we introduce the technical architecture that allows privacy to be enforced at multiple PaaS layers; Section 6.4 discusses related work; and Section 6.5 presents future perspectives and concludes this work.

6.2 Privacy-Aware Applications in the Cloud

In this section we present our use case, which involves multiple stakeholders accessing users' PII in the cloud, as well as some background on the privacy policy language that we used.

6.2.1 Use case

In our use case we consider a loyalty program offered by a supermarket chain, together with a mobile shopping application that communicates with a back-end application deployed on the PaaS cloud offering. The supermarket's goal is to collect information about its consumers' shopping practices, thus creating consumer profiles that can be used to provide more precise offers and bargains. The supermarket's business partners may also want to access this information in order to propose personalized offers to the shopping application users themselves. The back-end application for the supermarket loyalty program communicates with the cloud persistency service with the help of the Java Persistence API (JPA). The supermarket employees can access detailed results of database queries regarding the consumers' sales history, and can also create personalized offers, via a web-based portal.

Moreover, the cloud application exposes web services through which third parties interact with the back-end system to consume the collected data, both for their own business analysis and to contact the consumers directly for marketing purposes. In the consumer interface, it is possible to indicate preferences with respect to the categories of products (health care, food, drinks, etc.) about which one wants to share one's shopping habits. The consumer can also indicate whether he permits the supermarket to share personally identifiable information with its business partners, among other usages. The application then updates the privacy policies accordingly.

6.2.2 Background: Privacy Policy Language

The end users of the cloud application are asked to provide various kinds of personal information, from basic contact information (addresses, phone, email) to more complex data such as shopping history or lifestyle preferences. Service providers describe how users' data is handled in a privacy policy, which is explicitly presented to users during the data collection phase. In this work we adopt the PrimeLife3 Policy Language (PPL) [10], which extends XACML with privacy-related constraints for access and data usage. A PPL policy is used by an application to record its privacy policy: it states how the collected data will be used, by whom, and how it may be shared. The end user, in turn, selects among the possible choices as to the conditions of data usage, which are derived from the privacy policies specific to the application. This user opt-in/opt-out choice is managed by the application and as such is not part of the generic enforcement mechanism we developed. Before disclosing personal information, the user can match his preferences against the privacy policy of the service provider with the help of a policy matching engine. The result of the matching process is an agreed policy, which is then translated into a set of simple rules stored together with the users' data inside the cloud platform's database servers. In summary, a PPL policy defines the following structures [10]:

• Access Control Elements: inherited from the XACML attribute-based access control mechanism, to describe the shared resource (in our case PII) in general, as well as the entities that can obtain access to the data (subjects).

• Data Handling Preferences: expressing the purpose of data usage (for instance marketing, research, payment, delivery, etc.) but also downstream usage (understood here as sharing data with third parties, e.g., advertising companies), supporting a multi-level nested policy describing the data handling conditions applicable to any third party retrieving the data from a given service.

• Obligations: specifying the actions that should be carried out with respect to the collected data, e.g., notification to the user whenever his data is shared with a third party, or deletion of the credit card number after the payment transaction is finished. In this respect, obligations in PPL differ from standard XACML obligations, which refer only to the actual

3www.primelife.eu

Figure 6.2: Excerpt of a PPL policy rule

Figure 6.3: Excerpt of a PPL policy condition

moment of data access: PPL obligations can be executed at any moment throughout the whole lifetime of the collected data.

An excerpt of a policy is shown in Figure 6.2. It shows part of a policy rule stating the consent to use the collected data for three distinct purposes (described using the P3P purpose ontology), but forbidding downstream usage. The consumer opt-in/opt-out choice is linked to a PPL policy rule via XACML conditions that we adopted for this purpose. We have reused the syntax of EnvironmentAttributeDesignator elements to refer to the actual recorded consumer choice in the application data model, as shown in Figure 6.3. The location is provided as the AttributeId value and can be read as TABLE_NAME:COLUMN_NAME of the database table where this choice is stored (CONSUMER_CONSENT), together with a foreign key to the product category table (CATEGORY_ID) that is used to join the products table. This information is used when the enforcement mechanism is put in place, to take consumer consent into account whenever information about a consumer's shopping history (for certain product categories) is requested. This definition of how user consent is linked to the rest of the application data model is left to the application developer, as he is the one possessing full knowledge of the application domain.

Figure 6.4: JPA entity class annotation indicating persistency of private information

6.3 Privacy Enhanced Application Programming

We have designed a framework able to modify, at deployment time, the architectural elements (such as databases, web service frameworks, identity management, access control, etc.), enriching them with further components that enforce user privacy preferences. In this landscape, new applications deployed on the modified platform can benefit from privacy-aware data handling.

6.3.1 Programming Model

The privacy-aware components are integrated seamlessly with the cloud application at deployment time, so that the enforcement of privacy constraints is done automatically afterwards. They mediate access to the data sources, enforcing privacy constraints. Here we take full advantage of the uniform database access in the PaaS landscape, which is exposed via standard Java database interfaces such as JDBC (Java Database Connectivity) or JPA.

Usually, the application code handling privacy-related data is scattered and tangled across the application, making it difficult to handle and maintain when changes in the privacy policy are introduced. As we observed in existing applications, the operations performed on private user data to ensure that privacy policies are enforced are typically cross-cutting concerns in the sense of the aspect-oriented programming paradigm. Inspired by this, we designed a simplified process for the application developer to rapidly achieve data protection compliance. It consists of adding meta-information to the application code via the Java annotation mechanism in the JPA entity classes. We also provide a second type of annotation for the methods that make use of private data, to indicate the purpose of the data usage. The modifications to the code are non-intrusive, in the sense that the application functionality remains exactly the same as before, except for the data set received from the database, which is obtained by adhering to the privacy policy. The changes are transparent from the application's point of view, as the new platform components propose the same set of APIs as traditional platforms (in our case this API is JPA).

This approach adds value with respect to legacy applications while allowing privacy management when needed. Another advantage is that the cloud service provider can easily move to another cloud platform without being locked in to a particular vendor, apart from the fact that the guarantees given by the platform about private data handling may not be the same. The platform we used to develop our prototype offers the enterprise-level technologies available for Java in terms of web services and data persistency (JEE, JPA). In most of the examples we present throughout the chapter, we assume that the application developer uses a framework such as JPA to abstract the database access layer.

Figure 6.5: Annotating private data usage class with PII meta-information

In our approach, developers are required to add annotations to certain constructs, such as the @PII annotation in a JPA entity class (Figure 6.4). This annotation indicates that the class comprises one or more fields holding private data (usually represented in the database as columns), or that all fields are to be considered as PII (so that the whole database table row needs to be either filtered or kept during privacy enforcement, as a JPA entity is by default mapped to a database row). In the business code handling the private data, we propose two other annotations to indicate the class and methods that process PII sets. An example of annotated code is shown in Figure 6.5. In this figure, the method annotation holds the information that the shopping history list items will be processed for marketing purposes. In summary, our library provides three different annotations:

@PII: A flag to indicate personally identifiable information inside a JPA entity class definition. Such information is usually stored in the database as a table or a column. In Figure 6.4 this annotation covers the scope of the class declaration, see lines 2 and 3.

@PiiAccessClass: This annotation should be put on a class to indicate that it contains access methods to personal data (see line 5 in Figure 6.5). We assume that a PII access method performs queries to the database that request private user data.

@Info: This annotation is applied to a PII access method, to describe the purpose or set of purposes of the query performed in that method (see lines 9 and 10 in Figure 6.5).

We expect application developers to use these annotations to mark each usage of personal data, as well as to indicate the correct purposes. Ultimately they seek compliance with regulations; therefore we trust them to correctly indicate the intended usage of the data via the annotations. One can envisage automated code scanners and manual reviews taking place during an audit procedure to check whether the annotations are rightfully used.
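Since Figures 6.4 and 6.5 are not reproduced in this text, the following hedged sketch shows how the three annotations might be declared and used; the actual library may differ in package names, retention details, and element names.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface PII { }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface PiiAccessClass { }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
@interface Info {
    String[] purpose(); // e.g. {"marketing"}
}

// Hypothetical usage mirroring the figures. In the real application the PII
// entity would also carry JPA annotations such as @Entity, and the access
// method would issue a JPA query intercepted by the JDBC Wrapper described below.
@PiiAccessClass
class ShoppingHistoryDao {
    @Info(purpose = {"marketing"})
    java.util.List<String> shoppingHistoryOf(long consumerId) {
        return java.util.Collections.emptyList(); // placeholder for a JPA query
    }
}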

6.3.2 Implementation

In this section we detail the components of our prototype architecture. Technically, our code components are packaged as several OSGi (Open Services Gateway initiative framework4) bundles.

4http://www.osgi.org

Figure 6.6: Enforcement components

A bundle is a component which interacts with the applications running in the cloud platform. Some bundles are deployed directly inside the PaaS cloud landscape and managed by the cloud provider, while the others are part of the library to be used by cloud application developers. Cloud providers can easily install or uninstall bundles using the OSGi framework without causing side effects to the applications themselves (e.g., no application restart is required if some of the bundles are stopped). In the context of our scenario, we have three main bundles managed by the cloud provider, illustrated in Figure 6.6: the JDBC Wrapper, the Annotation Detector, and the SQL Filter, which are described below in more detail.

6.3.2.1 JDBC Wrapper

The wrapper intercepts all queries issued directly by the cloud application, or by the third parties that want to consume data about the shopping history of the loyalty program participants. This component is provided by the platform as a replacement for the usual JDBC driver in order to enforce consumers' privacy preferences. It implements the usual interfaces of the JDBC driver and overrides the specific methods, important to the Java Persistence API, that are necessary to track the itinerary of SQL queries.

As a matter of fact, it wraps all JDBC methods that query the database, intercepting SQL statements and enriching them with the proper conditions that adhere to the privacy policy (e.g., by stating in the WHERE clause conditions that refer to the consumer consent table). In order to identify the purpose of each query, its recipient, and the tables referred to, we retrieve the call stack within the current thread, relying on the annotations described in the previous section: we look for the PII access class, and then for the method that sent the request, to obtain the further parameters that help properly enforce privacy control.
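The wrapping idea can be sketched as follows; PurposeResolver and PrivacyFilter are hypothetical placeholders for the stack-walking and query-rewriting logic, and a real wrapper would implement the full JDBC Driver/Connection/Statement interface stack rather than a single method.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public final class PrivacyAwareConnectionSketch {
    private final Connection delegate;

    public PrivacyAwareConnectionSketch(Connection delegate) {
        this.delegate = delegate;
    }

    public PreparedStatement prepareStatement(String sql) throws SQLException {
        // Walk the current thread's stack to find the @PiiAccessClass/@Info
        // caller, then rewrite the statement before it reaches the database.
        String purpose = PurposeResolver.fromCallStack(Thread.currentThread().getStackTrace());
        return delegate.prepareStatement(PrivacyFilter.rewrite(sql, purpose));
    }
}

final class PurposeResolver {
    static String fromCallStack(StackTraceElement[] stack) {
        // Placeholder: a real implementation would reflectively inspect the
        // calling classes and methods for the annotations of Section 6.3.1.
        return "marketing";
    }
}

final class PrivacyFilter {
    static String rewrite(String sql, String purpose) {
        // Placeholder for the SQL Filter component of Section 6.3.2.3.
        return sql;
    }
}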

6.3.2.2 Annotation Detector

This component scans for entities with our custom annotations at deployment time and stores this information in a secure environment. Information about the entities considered as PII is used to determine which database calls need to be modified in order to preserve consumer privacy preferences. The annotation detector scans the application bytecode in order to gather information concerning the operations that the application intends to perform on the data. It is important to recall that the annotations are not a "programmatic" approach to indicating purpose, as they are independent from the code, which can evolve on its own. The assumption is that developers want to reach compliance, and thus that the purpose is correctly indicated, in contrast to [11], where it is assumed that end users themselves indicate the purposes of the queries they perform. The cloud platform provider can instrument the annotation detector with a configuration file where the required annotations are declared. The detector can recognize custom annotations, and stores information about the related entity classes at runtime for future use.

6.3.2.3 SQL Filter

This component allows us to rewrite general queries by replacing the affected tables with SQL conditions, implementing an adapted version of the algorithm for disclosure control described in [58], similar also to the approaches described in [1], [66], and [82]. The query transformation process considers the pre-processed decisions generated by the policy engine for each combination of the triple (purpose, recipient, PII table). If privacy policies apply, the related decisions are stored in a protected table. These decisions imply, for the access control of some of the data associated with specific fields, that consent has to be enforced. The transformation of the SQL query happens at runtime, by substituting the values of some fields with default values. Data owner privacy preferences are enforced with additional join conditions in the query, relating data subject consent, product category, and filtering rules. The output is a transformed SQL query that takes into account all stated privacy constraints and is still compatible with the originally issued SQL query (meaning that the result set contains exactly the same type of data, e.g., the same number of columns and the same types). From a business perspective, at least in this use case, it was always possible to visualize the relevant data, e.g., sales information, without disclosing personal data when the user did not give his consent. The process is illustrated in Figure 6.7.

Figure 6.7: SQL transformation example
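The table-replacement idea can be illustrated with a hedged sketch: each reference to a PII table is substituted by a subquery joining the consent table, so that only the rows the consumer agreed to disclose for the given purpose survive. Table and column names here are hypothetical (loosely inspired by the CONSUMER_CONSENT and CATEGORY_ID locations of Section 6.2.2), and the purpose and recipient values come from the policy engine, not from user input.

final class SqlFilterSketch {
    /** Returns a drop-in replacement for a SHOPPING_HISTORY table reference. */
    static String protectedShoppingHistory(String purpose, String recipient) {
        return "(SELECT h.* FROM SHOPPING_HISTORY h"
             + " JOIN CONSUMER_CONSENT c"
             + " ON c.CONSUMER_ID = h.CONSUMER_ID"
             + " AND c.CATEGORY_ID = h.CATEGORY_ID"
             + " WHERE c.PURPOSE = '" + purpose + "'"
             + " AND c.RECIPIENT = '" + recipient + "')";
    }
}
// Example: "SELECT * FROM SHOPPING_HISTORY" is rewritten on the fly to
// "SELECT * FROM " + protectedShoppingHistory("marketing", "third-party")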

The negotiated privacy policies are stored in the form of constraints, together with the data, in the database component provided by the cloud infrastructure. Whenever a query is launched by the application, we use the information collected by the annotation detector to modify the query on the fly, using the constraints to filter out the data that is not allowed to appear in the query results. This approach is interesting because the behavior of the application itself is not modified. The impact on the performance of the system is minor, as the policy enforcement is actually pushed into a database query, and the complexity of the query transformation algorithm is low, as shown in previous work [58]. The work in [1] provides a performance evaluation for the same kind of transformations. We advocate that the agility to achieve compliance is more important than these performance questions when dealing with private data in cloud computing.

6.4 Related Works

There are many similarities between our approach and the work described in [66], which proposes a holistic approach to systematic privacy enforcement for enterprises. First, we also build on state-of-the-art access control languages for privacy, but with an up-to-date approach adapted for the cloud. Second, we leverage the latest frameworks for web application and service development to provide automated privacy enforcement relying on their underlying identity management solutions. We also have similarities in the way privacy is enforced, controlling access at the database level, which is also done in [1]. Although the query transformation algorithm is not the main focus of our work, the previous art on the topic [15, 82, 11] presents advanced approaches for privacy-preserving database queries

over which we can build the next versions of our algorithm. Here we implemented an efficient approach for practical access control purposes, but we envisage enriching it with anonymization in the future.

On the other hand, we work in the context of the cloud, where a provider hosts applications developed by other parties, which can, in their turn, communicate with services hosted in other domains. This imposes constraints outside the control of a single service provider. We went further in the automation, by providing a reliable framework for the application developer to transfer the complexity of dealing with privacy preferences to the platform provider. Our annotation mechanism provides ease of adoption without creating dependencies with respect to the deployment platform; more precisely, no lock-in is introduced by our solution.

The work in [56] presents an approach based on privacy proxies to handle privacy-relevant interactions between data subjects and data collectors. Proxies are implemented as SOAP-based services, centralizing all PII. The solution is interesting, but it is not clear how to adapt the proxy in a straightforward way to the specific data models of particular applications. Our work is aligned with the principles defended in [77]; in particular, we facilitate many of the tasks the service designers must take into consideration when creating new cloud-based applications.

In [69], a user-centric approach is taken to manage private data in the cloud. Control is split between client and server, which requires cooperation by the server; otherwise, obfuscated data must be used by default. This is a different point of view from our work, where we embed the complexity of the privacy enforcement in the platform itself.

Automated security policy management for cloud platforms is discussed in [55]. Using a model-driven approach, cloud applications subscribe to a policy configuration service able to enforce policies at run-time, enabling compliance. The approach is sound, but lacks fine-grained management of privacy policies, as it is not clear how to deal with personal data collection and usage control. In [43], cryptographic co-processors are employed to ensure the confidentiality of private data. The solution is able to enforce rather low-level policies, using cryptography as an essential mechanism, without explicit support for designing new privacy-compliant applications.

Several works exist on privacy protection in Web 2.0 and peer-to-peer environments, such as [103], where access control is adopted in social networks. Some of these controls can be reused in the context of cloud applications, but our approach differs from this line of work in that we empower cloud consumers with easily usable controls from the cloud platform. In [13], the authors also use aspect-oriented programming to enforce privacy mechanisms when performing access control in applications. They adopt a similar approach, but limit privacy to a per-application basis. In our approach, we cover multiple applications by addressing platforms directly.

6.5 Summary

We presented an approach to personal data protection by design and by default in Platforms as a Service. We augment cloud applications with meta-data annotations and private data-handling policies which are transparently enforced by the platform.

The cloud consumer applications indicate how and where personally identifiable information is being handled. We adapt the platform components with privacy enforcement mechanisms able to correctly handle the data consumption, in accordance with the privacy policy agreed between the data subject and the cloud consumer. The advantages of our approach can be summarized as follows: the implementation details of the privacy controls are hidden from the cloud application developer; compatibility with legacy applications is preserved, since the annotations do not interfere with the existing code; and cloud applications can gracefully move to other platform providers that implement privacy-aware platforms in different ways. The next steps in this development will include the orchestration of other components such as event monitors, service buses, trusted platform modules, etc., in order to provide real-time information to users about the operations performed on their personal data. We plan to generalize our approach to enforce other kinds of policies, such as service level agreements, separation of duty, etc. An important improvement of this work will be the integration of an advanced k-anonymization [102] process at the database access level. Such a solution would be more suitable for business applications than plain access control, since the end users could obtain more meaningful information without fully disclosing their identities.

Chapter 7

Concluding Remarks

In this deliverable we presented the demonstrator for ERP built in Task 3 "Application to enterprise service-oriented architectures". We addressed several needs identified in deliverable [39], constructing solutions built upon the technical results from Task 2 "Synthesis and certification of secure service-oriented architectures". We have also provided a proof-of-concept implementation of the security policy language from Task 2 [20], detailed in the extended journal paper [21]. The various initiatives reported here are evidence of the successful execution of the task, consisting of several proof-of-concept implementations created and exhibited in relevant scientific and industrial dissemination events. Selected results from this task are under examination internally at SAP, with promising exploitation plans.

7.1 Acknowledgments

We are extremely grateful for the contribution of Marco Guarnieri (Università degli Studi di Bergamo), Peng Yu (Université de Technologie de Compiègne) and Yann Lehmann (École Polytechnique Fédérale de Lausanne), who were interns at SAP Labs France.

Bibliography

[1] Rakesh Agrawal, J. Kiernan, Ramakrishnan Srikant, and Y. Xu. Implementing p3p using database technology. In Data Engineering, 2003. Proceedings. 19th International Con- ference on, pages 595 – 606, march 2003.

[2] Amazon. Amazon Simple Storage Service REST Security Model. http://docs. amazonwebservices.com/AmazonS3/latest/dev/RESTAPI.html, 2006.

[3] P. Ashley, S. Hada, G. Karjoth, C. Powers, and M. Schunter. Enterprise privacy authoriza- tion language (epal). Research report, 3485, 2003.

[4] Marco Balduzzi, Carmen Torrano Gimenez, Davide Balzarotti, and Engin Kirda. Auto- mated discovery of parameter pollution vulnerabilities in web applications. In NDSS’11, 8th Annual Network and Distributed System Security Symposium, 6-9 February 2011, San Diego, California, USA, 02 2011.

[5] Fabien Baligand and Valerie´ Monfort. A concrete solution for web services adaptability using policies and aspects. In Proceedings of the 2nd international conference on Service oriented computing, ICSOC ’04, pages 134–142, New York, NY, USA, 2004. ACM.

[6] Davide Balzarotti, Marco Cova, Vika Felmetsger, Nenad Jovanovic, Engin Kirda, Christopher Krügel, and Giovanni Vigna. Saner: composing static and dynamic analysis to validate sanitization in web applications. In Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, USA, May 2008.

[7] Davide Balzarotti, Marco Cova, Viktoria Felmetsger, Nenad Jovanovic, Engin Kirda, Christopher Kruegel, and Giovanni Vigna. Saner: Composing static and dynamic analysis to validate sanitization in web applications. In IEEE Symposium on Security and Privacy, pages 387–401. IEEE Computer Society, 2008.

[8] Damiano Bolzoni and Sandro Etalle. Boosting web intrusion detection systems by inferring positive signatures. In Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008. Part II on On the Move to Meaningful Internet Systems, OTM ’08, pages 938–955, Berlin, Heidelberg, 2008. Springer-Verlag.

[9] David Booth, Hugo Haas, Francis McCabe, Eric Newcomer, Mike Champion, Christopher Ferris, and David Orchard. Web services architecture. http://www.w3.org/TR/ws-arch/, January 2004.

[10] Laurent Bussard, Gregory Neven, and Franz-Stefan Preiss. Matching privacy policies and preferences: Access control, obligations, authorisations, and downstream usage. In Jan Camenisch, Simone Fischer-Hübner, and Kai Rannenberg, editors, Privacy and Identity Management for Life, pages 117–134. Springer Berlin Heidelberg, 2011.

[11] Ji-Won Byun, Elisa Bertino, and Ninghui Li. Purpose based access control of complex data for privacy protection. In Proceedings of the tenth ACM symposium on Access control models and technologies, SACMAT ’05, pages 102–110, New York, NY, USA, 2005. ACM.

[12] Cristian Cadar, Daniel Dunbar, and Dawson Engler. Klee: unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX conference on Operating systems design and implementation, OSDI’08, pages 209–224, Berkeley, CA, USA, 2008. USENIX Association.

[13] Kung Chen and Da-Wei Wang. An aspect-oriented approach to privacy-aware access control. In Machine Learning and Cybernetics, 2007 International Conference on, volume 5, pages 3016–3021, August 2007.

[14] Aske Simon Christensen, Anders Møller, and Michael I. Schwartzbach. Precise analysis of string expressions. In Proc. 10th International Static Analysis Symposium, SAS’03, pages 1–18. Springer-Verlag, 2003.

[15] Sara Cohen, Werner Nutt, and Alexander Serebrenik. Rewriting aggregate queries using views. In Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, PODS ’99, pages 155–166, New York, NY, USA, 1999. ACM.

[16] L.F. Cranor. P3p: making privacy policies more useful. Security Privacy, IEEE, 1(6):50–55, November-December 2003.

[17] Wei Dai. Crypto++ 5.6.0 benchmarks. http://www.cryptopp.com/benchmarks.html, 2009.

[18] Prattana Deeprasertkul, Pattarasinee Bhattarakosol, and Fergus O’Brien. Automatic detection and correction of programming faults for software applications. Journal of Systems and Software, 78(2):101–110, 2005.

[19] Josh Dehlinger, Qian Feng, and Lan Hu. Ssvchecker: unifying static security vulnerability detection tools in an eclipse plug-in. In Proc. OOPSLA Workshop on eclipse technology eXchange, Eclipse’06, pages 30–34. ACM, 2006.

[20] Matteo Dell’Amico, Gabriel Serme, Muhammad Sabir Idrees, Anderson Santana de Oliveira, and Yves Roudier. Hipolds: A security policy language for distributed systems. In Ioannis G. Askoxylakis, Henrich Christopher Pöhls, and Joachim Posegga, editors, WISTP, volume 7322 of Lecture Notes in Computer Science, pages 97–112. Springer, 2012.

[21] Matteo Dell’Amico, Gabriel Serme, Muhammad Sabir Idrees, Anderson Santana de Oliveira, and Yves Roudier. Hipolds: A hierarchical security policy language for distributed systems. Information Security Technical Report, 2013. Accepted for publication.

[22] Dropbox. REST API. https://www.dropbox.com/developers/reference/api, 2012.

[23] Facebook. Facebook Authentication. http://developers.facebook.com/docs/authentication/, 2012.

[24] Roy Thomas Fielding. Architectural styles and the design of network-based software architectures. PhD thesis, University of California, Irvine, 2000.

[25] J. Galvin, S. Murphy, S. Crocker, and N. Freed. Security multiparts for mime: Multipart/signed and multipart/encrypted. Technical report, IETF, Network Working Group, October 1995. http://tools.ietf.org/html/rfc1847.

[26] Kumaravel Ganesan, Swarup Kumar Mohalik, and Cyril Raj. A distributed aspect model for composite service. In International Workshop on Service-Oriented Engineering and Optimization, 2008.

[27] Google. Codepro analytix. http://code.google.com/javadevtools/codepro/.

[28] J. Gosling, B. Joy, G. Steele, and G. Bracha. Java(TM) Language Specification. 2005.

[29] Carl Gould, Zhendong Su, and Premkumar T. Devanbu. Jdbc checker: A static analysis tool for sql/jdbc applications. In ICSE, pages 697–698. IEEE Computer Society, 2004.

[30] Matthew Van Gundy and Hao Chen. Noncespaces: Using randomization to enforce information flow tracking and thwart cross-site scripting attacks. In Proceedings of the Network and Distributed System Security Symposium, NDSS 2009, San Diego, California, USA, 8th February - 11th February 2009, 2009.

[31] Vivek Haldar, Deepak Chandra, and Michael Franz. Dynamic taint propagation for java. In Proceedings of the 21st Annual Computer Security Applications Conference, pages 303–311, Washington, DC, USA, 2005. IEEE Computer Society.

[32] W. Halfond, S. Anand, and A. Orso. Precise Interface Identification to Improve Testing and Analysis of Web Applications. In Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2009), Chicago, Illinois, USA, July 2009.

[33] William G.J. Halfond and Alessandro Orso. AMNESIA: Analysis and Monitoring for NEutralizing SQL-Injection Attacks. In Proceedings of the IEEE and ACM International Conference on Automated Software Engineering (ASE 2005), Long Beach, CA, USA, November 2005.

[34] Gabriel Hermosillo, Roberto Gomez, Lionel Seinturier, and Laurence Duchien. Aprosec: an aspect for programming secure web applications. In ARES, pages 1026–1033. IEEE Computer Society, 2007.

[35] M. Mehdi Ben Hmida, Ricardo Ferraz Tomaz, and Valérie Monfort. Applying aop concepts to increase web services flexibility. In Proceedings of the International Conference on Next Generation Web Services Practices, NWESP ’05, pages 169–, Washington, DC, USA, 2005. IEEE Computer Society.

[36] Pieter Hooimeijer, Benjamin Livshits, David Molnar, Prateek Saxena, and Margus Veanes. Fast and precise sanitizer analysis with bek. In Proceedings of the 20th USENIX conference on Security, SEC’11, pages 1–1, Berkeley, CA, USA, 2011. USENIX Association.

[37] Yao-Wen Huang, Fang Yu, Christian Hang, Chung-Hung Tsai, Der-Tsai Lee, and Sy-Yen Kuo. Securing web application code by static analysis and runtime protection. In Proceedings of the 13th international conference on World Wide Web, WWW ’04, pages 40–52, New York, NY, USA, 2004. ACM.

[38] Richard Hull and Jianwen Su. Tools for design of composite web services. In Proceedings of the 2004 ACM SIGMOD international conference on Management of data, SIGMOD ’04, pages 958–961, New York, NY, USA, 2004. ACM.

[39] Muhammad Sabir Idrees, Gabriel Serme, Yves Roudier, et al. State of the art and requirement analysis of security functionalities for soas. Deliverable D2.1, The CESSA project, July 2010. http://cessa.gforge.inria.fr/lib/exe/fetch.?media=publications:d2-1.pdf.

[40] IETF. Internet x.509 public key infrastructure certificate and certificate revocation list (crl) profile. http://tools.ietf.org/html/rfc5280#section-4.1.2.2, 2008.

[41] IETF. The OAuth 1.0 Protocol. http://tools.ietf.org/html/rfc5849, 2010.

[42] IETF. The OAuth 2.0 Authorization Protocol. http://tools.ietf.org/html/ draft-ietf-oauth-v2-23, 2012.

[43] Wassim Itani, Ayman I. Kayssi, and Ali Chehab. Privacy as a service: Privacy-aware data storage and processing in cloud computing architectures. In DASC, pages 711–716. IEEE, 2009.

[44] Henner Jakob, Nicolas Loriant, and Charles Consel. An aspect-oriented approach to securing distributed systems. In Proceedings of the 2009 international conference on Pervasive services, ICPS ’09, pages 21–30, New York, NY, USA, 2009. ACM.

[45] Meiko Jensen, Nils Gruschka, and Ralph Herkenhöner. A survey of attacks on web services. Informatik - Forschung und Entwicklung, 24:185–197, 2009.

[46] Martin Johns, Björn Engelmann, and Joachim Posegga. Xssds: Server-side detection of cross-site scripting attacks. In Proceedings of the 2008 Annual Computer Security Applications Conference, ACSAC ’08, pages 335–344, Washington, DC, USA, 2008. IEEE Computer Society.

[47] Nenad Jovanovic, Christopher Kruegel, and Engin Kirda. Pixy: A Static Analysis Tool for Detecting Web Application Vulnerabilities (Short Paper). In Proceedings of the 2006 IEEE Symposium on Security and Privacy, pages 258–263, Oakland, CA, USA, 2006. IEEE Computer Society.

[48] Gregor Kiczales, John Lamping, et al. Aspect-oriented programming. In Mehmet Akşit and Satoshi Matsuoka, editors, ECOOP, volume 1241 of Lecture Notes in Computer Science, pages 220–242. Springer Berlin / Heidelberg, 1997.

[49] Adam Kieżun, Vijay Ganesh, Philip J. Guo, Pieter Hooimeijer, and Michael D. Ernst. HAMPI: A solver for string constraints. In ISSTA 2009, Proceedings of the 2009 International Symposium on Software Testing and Analysis, pages 105–116, Chicago, IL, USA, July 21–23, 2009.

[50] Adam Kieżun, Philip J. Guo, Karthick Jayaraman, and Michael D. Ernst. Automatic creation of SQL injection and cross-site scripting attacks. In ICSE’09, Proceedings of the 31st International Conference on Software Engineering, Vancouver, BC, Canada, May 20–22, 2009.

[51] Engin Kirda, Christopher Kruegel, Giovanni Vigna, and Nenad Jovanovic. Noxes: a client-side solution for mitigating cross-site scripting attacks. In Proceedings of the 2006 ACM Symposium on Applied Computing, pages 330–337, Dijon, FR, 2006. ACM.

[52] Yuji Kosuga, Kenji Kono, Miyuki Hanaoka, Miho Hishiyama, and Yu Takahama. Sania: Syntactic and semantic analysis for automated testing against sql injection. In ACSAC, pages 107–117. IEEE Computer Society, 2007.

[53] C. Kruegel and G. Vigna. Anomaly Detection of Web-based Attacks. In Proceedings of the 10th ACM Conference on Computer and Communication Security (CCS ’03), pages 251–261, Washington, DC, October 2003. ACM Press.

[54] Monica S. Lam, John Whaley, V. Benjamin Livshits, et al. Context-sensitive program analysis as database queries. In Symposium on Principles of database systems, PODS’05, pages 1–12. ACM, 2005.

[55] Ulrich Lang. Openpmf scaas: Authorization as a service for cloud & soa applications. In CloudCom, pages 634–643. IEEE, 2010.

[56] Marc Langheinrich. A privacy awareness system for ubiquitous computing environments. In Gaetano Borriello and Lars Holmquist, editors, UbiComp 2002: Ubiquitous Comput- ing, volume 2498 of Lecture Notes in Computer Science, pages 315–320. Springer Berlin / Heidelberg, 2002.

[57] François Lascelles. RESTful Web services and signatures. http://flascelles.wordpress.com/2010/10/02/restful-web-services-and-signatures/, October 2010.

[58] Kristen LeFevre, Rakesh Agrawal, Vuk Ercegovac, Raghu Ramakrishnan, Yirong Xu, and David J. DeWitt. Limiting disclosure in hippocratic databases. In Mario A. Nascimento, M. Tamer Özsu, Donald Kossmann, Renée J. Miller, José A. Blakeley, and K. Bernhard Schiefer, editors, VLDB, pages 108–119. Morgan Kaufmann, 2004.

[59] Yin Liu and Ana Milanova. Static information flow analysis with handling of implicit flows. Software Maintenance and Reengineering (CSMR), 2010.

[60] V. Benjamin Livshits and Monica S. Lam. Finding security errors in Java programs with static analysis. In Proceedings of the 14th Usenix Security Symposium, pages 271–286, August 2005.

[61] V. Benjamin Livshits and Monica S. Lam. Finding Security Errors in Java Programs with Static Analysis. In Proceedings of the 14th USENIX Security Symposium, pages 271–286, Aug 2005.

[62] Cristina Videira Lopes. AOP: A historical perspective (What’s in a name?). In Robert E. Filman, Tzilla Elrad, Siobhán Clarke, and Mehmet Akşit, editors, Aspect-Oriented Software Development, pages 97–122. Addison-Wesley, Boston, 2005.

[63] Hidehiko Masuhara and Kazunori Kawauchi. Dataflow pointcut in aspect-oriented programming. In Atsushi Ohori, editor, APLAS, volume 2895 of Lecture Notes in Computer Science, pages 105–121. Springer, 2003.

[64] T. Mens and T. Tourwe. A survey of software refactoring. Software Engineering, IEEE Transactions on, 30(2):126–139, February 2004.

[65] MITRE. CWE/SANS Top 25 Most Dangerous Software Errors. http://cwe.mitre.org/top25.

[66] Marco Casassa Mont and Robert Thyne. A systemic approach to automate privacy policy enforcement in enterprises. In George Danezis and Philippe Golle, editors, Privacy Enhancing Technologies, volume 4258 of Lecture Notes in Computer Science, pages 118–134. Springer, 2006.

[67] G. Kouadri Mostéfaoui, Z. Maamar, N. C. Narendra, and S. Sattanathan. Decoupling security concerns in web services using aspects. Information Technology: New Generations, Third International Conference on, 0:20–27, 2006.

[68] Azzam Mourad, Marc-André Laverdière, and Mourad Debbabi. A high-level aspect-oriented based language for software security hardening. In Javier Hernando, Eduardo Fernández-Medina, and Manu Malek, editors, SECRYPT, pages 363–370. INSTICC Press, 2007.

[69] Miranda Mowbray and Siani Pearson. A client-based privacy manager for cloud computing. In Jan Bosch and Siobhán Clarke, editors, COMSWARE, page 5. ACM, 2009.

[70] Yacin Nadji, Prateek Saxena, and Dawn Song. Document structure integrity: A robust basis for cross-site scripting defense. In Proceedings of the Network and Distributed System Security Symposium, NDSS 2009, San Diego, California, USA, 8th February - 11th February 2009, 2009.

[71] National Institute of Standards and Technology. National Vulnerability Database Version 2.2. http://nvd.nist.gov/, 2010.

[72] Luis Daniel Benavides Navarro, Mario Südholt, Wim Vanderperren, Bruno De Fraine, and Davy Suvée. Explicitly distributed aop using awed. In Proceedings of the 5th international conference on Aspect-oriented software development, AOSD ’06, pages 51–62, New York, NY, USA, 2006. ACM.

[73] OASIS. Web Services Security: SOAP Message Security 1.1. http://www.oasis-open.org/committees/wss, February 2006.

[74] OWASP. OWASP Top Ten Project. http://www.owasp.org/index.php/OWASP_Top_Ten_Project, 2010.

[75] S. V. Patel and Kamlendu Pandey. Soa using aop for sensor web architecture. In Proceedings of the 2009 International Conference on Computer Engineering and Technology - Volume 02, pages 503–507, Washington, DC, USA, 2009. IEEE Computer Society.

[76] Cesare Pautasso, Olaf Zimmermann, and Frank Leymann. Restful web services vs. “big” web services: making the right architectural decision. In WWW, pages 805–814. ACM, 2008.

[77] Siani Pearson and Andrew Charlesworth. Accountability as a way forward for privacy protection in the cloud. In Martin Gilje Jaatun, Gansen Zhao, and Chunming Rong, editors, CloudCom, volume 5931 of Lecture Notes in Computer Science, pages 131–144. Springer, 2009.

[78] Tadeusz Pietraszek and Chris Vanden Berghe. Defending Against Injection Attacks Through Context-Sensitive String Evaluation. In Proceedings of the International Symposium on Recent Advances in Intrusion Detection, pages 124–145, 2005.

[79] K. Ponnalagu, N.C. Narendra, J. Krishnamurthy, and R. Ramkumar. Aspect-oriented approach for non-functional adaptation of composite web services. In Services, 2007 IEEE Congress on, pages 284–291, July 2007.

[80] Prithvi Bisht, Timothy Hinrichs, Nazari Skrupsky, Radoslaw Bobrowicz, and V.N. Venkatakrishnan. NoTamper: Automatic Blackbox Detection of Parameter Tampering Opportunities in Web Applications. In CCS’10: Proceedings of the 17th ACM conference on Computer and communications security, Chicago, Illinois, USA, 2010.

[81] Mohammad Ashiqur Rahaman and Andreas Schaad. Soap-based secure conversation and collaboration. In ICWS, pages 471–480. IEEE Computer Society, 2007.

[82] Shariq Rizvi, Alberto Mendelzon, S. Sudarshan, and Prasan Roy. Extending query rewriting techniques for fine-grained access control. In Proceedings of the 2004 ACM SIGMOD international conference on Management of data, SIGMOD ’04, pages 551–562, New York, NY, USA, 2004. ACM.

[83] W. Robertson and G. Vigna. Static enforcement of web application integrity through strong typing. In Proceedings of the 18th USENIX Security Symposium, pages 283–298. USENIX Association, 2009.

[84] W. Robertson, G. Vigna, C. Kruegel, and R. Kemmerer. Using Generalization and Characterization Techniques in the Anomaly-based Detection of Web Attacks. In Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, February 2006.

[85] Martin Roesch. Snort - lightweight intrusion detection for networks. In Proceedings of the 13th USENIX conference on System administration, LISA ’99, pages 229–238, Berkeley, CA, USA, 1999. USENIX Association.

[86] RSnake. Xss (cross site scripting) cheat sheet esp: for filter evasion. http://ha.ckers.org/xss.html, 2009.

[87] Mike Samuel, Prateek Saxena, and Dawn Song. Context-sensitive auto-sanitization in web templating languages using type qualifiers. In Proceedings of the 18th ACM conference on Computer and communications security, CCS ’11, pages 587–600, New York, NY, USA, 2011. ACM.

[88] P. Sandoz, S. Pericas-Geertsen, K. Kawaguchi, M. Hadley, and E. Pelegri-Llopart. Fast web services. Sun Developer Network, 2003.

[89] Prateek Saxena, Devdatta Akhawe, Steve Hanna, Feng Mao, Stephen McCamant, and Dawn Song. A symbolic execution framework for javascript. In Proceedings of the 2010 IEEE Symposium on Security and Privacy, SP ’10, pages 513–528, Washington, DC, USA, 2010. IEEE Computer Society.

[90] Prateek Saxena, David Molnar, and Benjamin Livshits. Scriptgard: Automatic context-sensitive sanitization for large-scale legacy web applications. In Proceedings of the Conference on Computer and Communications Security, October 2011.

[91] Theodoor Scholte, Davide Balzarotti, and Engin Kirda. Quo vadis? a study of the evolution of input validation vulnerabilities in web applications. In Proceedings of Financial Cryptography and Data Security 2011, Lecture Notes in Computer Science, February 2011.

[92] Theodoor Scholte, Davide Balzarotti, and Engin Kirda. Quo Vadis? A Study of the Evolution of Input Validation Vulnerabilities in Web Applications. In Proceedings of the International Conference on Financial Cryptography and Data Security, Bay Gardens Beach Resort, Saint Lucia, February 2011.

[93] Theodoor Scholte, Davide Balzarotti, William Robertson, and Engin Kirda. An Empirical Analysis of Input Validation Mechanisms in Web Applications and Languages. In Proceedings of the 27th ACM Symposium On Applied Computing (SAC 2012), Riva del Garda, Italy, March 2012.

[94] Theodoor Scholte, William Robertson, Davide Balzarotti, and Engin Kirda. Preventing input validation vulnerabilities in web applications through automated type analysis. In COMPSAC. IEEE Computer Society, 2012.

[95] David Scott and Richard Sharp. Abstracting application-level web security. In Proceedings of the 11th international conference on World Wide Web, WWW ’02, pages 396–407, New York, NY, USA, 2002. ACM.

[96] O. G. Selfridge. Pandemonium: a paradigm for learning. In Mechanisation of Thought Processes. In Proceedings of a Symposium Held at the National Physical Laboratory, pages 513–526, London, 1958. HMSO.

[97] Gabriel Serme, Anderson Santana de Oliveira, Marco Guarnieri, and Paul El-Khoury. Towards assisted remediation of security vulnerabilities. In Proceedings of the Sixth International Conference on Emerging Security Information, Systems and Technologies: SECURWARE 2012, 2012.

[98] Gabriel Serme, Anderson Santana de Oliveira, Julien Massiera, and Yves Roudier. Enabling message security for restful services. In 19th International Conference on Web Services, ICWS’12. IEEE, 2012.

[99] Gabriel Serme and Muhammad Sabir Idrees. Adaptive security on service-based scm control system. In XPS, editor, The First International Workshop on Sensor Networks for Supply Chain Management (WSNSCM), August 2011.

[100] Zhendong Su and Gary Wassermann. The essence of command injection attacks in web applications. In Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL ’06, pages 372–382, New York, NY, USA, 2006. ACM.

[101] Toyotaro Suzumura, Toshiro Takase, and Michiaki Tatsubori. Optimizing web services performance by differential deserialization. In ICWS, pages 185–192. IEEE Computer Society, 2005.

[102] L. Sweeney. k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):557–570, 2002.

[103] Amin Tootoonchian, Stefan Saroiu, Yashar Ganjali, and Alec Wolman. Lockr: better privacy for social networks. In Jörg Liebeherr, Giorgio Ventre, Ernst W. Biersack, and S. Keshav, editors, CoNEXT, pages 169–180. ACM, 2009.

[104] Twitter. Security Best Practices. https://dev.twitter.com/docs/security-best-practices, 2011.

[105] University of Maryland. Findbugs. http://findbugs.sourceforge.net.

[106] John Viega, J. T. Bloch, and Pravir Ch. Applying aspect-oriented programming to security. Cutter IT Journal, 14:31–39, 2001.

[107] John Viega, J. T. Bloch, Y. Kohno, and Gary McGraw. Its4: A static vulnerability scanner for c and c++ code. In ACSAC, pages 257–. IEEE Computer Society, 2000.

[108] W3Counter. Web browser market share trends. http://www.w3counter.com/trends, 2011.

[109] Gary Wassermann and Zhendong Su. An analysis framework for security in web applications. In Proc. FSE Workshop on Specification and Verification of Component-Based Systems, SAVCBS’04, pages 70–78, 2004.

[110] Joel Weinberger, Prateek Saxena, Devdatta Akhawe, Matthew Finifter, Richard Shin, and Dawn Song. An Empirical Analysis of XSS Sanitization in Web Application Frameworks. Technical report, UC Berkeley, 2011.

[111] Yichen Xie and Alex Aiken. Static detection of security vulnerabilities in scripting languages. In Proceedings of the 15th USENIX Security Symposium, Vancouver, B.C., Canada, 2006. USENIX Association.

[112] Yahoo. OAuth Authorization Model. http://developer.yahoo.com/oauth/.

[113] Fan Yang, Tomoyuki Aotani, Hidehiko Masuhara, Flemming Nielson, and Hanne Riis Nielson. Combining static analysis and runtime checking in security aspects for distributed tuple spaces. In Wolfgang De Meuter and Gruia-Catalin Roman, editors, COORDINATION, volume 6721 of Lecture Notes in Computer Science, pages 202–218. Springer, 2011.

[114] Peng Yu, Jakub Sendor, Gabriel Serme, and Anderson Santana de Oliveira. Automating privacy enforcement in cloud platforms. In Roberto Di Pietro and Javier Herranz, editors, 7th International Workshop on Data Privacy Management. Springer, 2012.
