Journal of Information Assurance and Security. ISSN 1554-1010, Volume 11 (2016), pp. 273-282. © MIR Labs, www.mirlabs.net/jias/index.html

Cloud Computing Security Modeling and Analysis based on a Self-Cleansing Intrusion Tolerance Technique

Iman EL MIR1, Dong Seong Kim2 and Abdelkrim HAQIQ3

1Computer, Networks, Mobility and Modeling laboratory FST, Hassan 1st University, Settat, Morocco

2Department of Computer Science and Software Engineering, University of Canterbury, New Zealand

3Computer, Networks, Mobility and Modeling laboratory FST, Hassan 1st University, Settat, Morocco; e-NGN Research Group, Africa and Middle East

Abstract: Cloud computing, a rapidly developing information technology, has attracted worldwide attention. Because it is Internet-based and allows on-demand network access to shared resources, software and information, security becomes an important issue. Consequently, we must have a clear understanding of the potential security benefits and risks associated with cloud computing. Since security is increasingly the principal concern in the design and implementation of software systems, security mechanisms must be designed to protect computer systems against cyber attacks. Intrusion Tolerance Systems play a crucial role in maintaining service continuity and enhancing security compared with traditional security mechanisms. In this paper, we propose to combine preventive maintenance with an existing intrusion tolerance system to improve system security. We use a Semi-Markov Process to model the system behavior. We quantitatively analyze the system security using measures such as system availability, Mean Time To Security Failure and cost. A numerical analysis is presented to show the feasibility of the proposed approach. An Intrusion-Tolerant System (ITS) aims to maintain a useful level of operational capability throughout ongoing cyber-attacks. The applications that are part of an ITS, especially those that provide critical services for the system's mission, must therefore survive the failures and unwanted changes in the system caused by malicious acts of intruders.

Keywords: Intrusion Tolerance, Preventive Maintenance, System Availability, Cloud Computing

I. Introduction

Cloud computing is an emerging technology that allows central remote servers to be connected to the Internet in order to run their applications and services. Users are able to provision cloud computing resources without requiring human interaction, mostly through a web-based self-service portal. Cloud computing provides enterprises and legitimate users with on-demand network access to applications and a shared pool of computing resources without installation. It delivers three fundamental service models [1]: Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS).

• IaaS : the cloud providers deliver computation processing, storage, networks and computing resources. Hence, the consumer can deploy and run arbitrary software, including operating systems and applications. For instance, VMware and HP are IaaS vendors.

• PaaS : a cloud computing model that delivers applications over the Internet on demand. It reduces the costs that customer organizations would otherwise invest in infrastructure when developing software applications. , Google AppEngine are some examples.

• SaaS : a software distribution model that gives users access to applications through a single interface. Users only need an Internet connection and a web browser. Examples of SaaS vendors are Amazon, Yahoo and Google.

Cloud computing attracts different users owing to its high resource elasticity and scalability, which offer important savings in terms of investment and manpower. The massive usage of cloud resources in different domains such as data storage, , cooling has many data security and

protection challenges. It involves potential cyber-security risks. For this reason, the security of network systems has received considerable attention. Traditional security mechanisms such as firewalls and Intrusion Detection Systems (IDS) monitor the events occurring in a computer network and analyze them to protect the network from malicious incidents. An IDS can detect intrusions spreading in the network. But these technologies are still ineffective against unknown and undetected attacks and cannot guarantee that a system will not be intruded. To overcome such limitations, intrusion tolerance techniques have been proposed. Intrusion tolerance was introduced by Fraga et al. [2]. Its main objective [3] is not how to defend against or detect the intrusion, but how to tolerate it. An intrusion tolerance system has to provide services to legitimate users of a network even if there are attacks in the network. Bloom et al. [4] suggested two classes of intrusion tolerance and mitigation techniques, namely fault tolerance and quality of service. Fault tolerance can be applied at three levels: hardware level, software level, or system level. Fault tolerance is a property that allows a system to continue operating properly in the presence of malicious attacks, and quality of service is an important issue for an intrusion tolerance system. To devise an effective intrusion tolerance system, one first has to consider how to provide services without degrading the quality of traffic. Hence, the tolerance mechanism should provide a good quality of service regardless of whether attacks are happening in the system.

A virtualization-based Intrusion Tolerant System (ITS) can provide uninterrupted services without being seriously affected by various system deficiencies and different attack intensities [5]. Self-Cleansing Intrusion Tolerance (SCIT) is a typical virtualization-based intrusion tolerance system built on a proactive recovery approach. SCIT maintains service availability through periodic recovery, which removes or minimizes the effects of malicious attacks. The effectiveness of SCIT in terms of security and performance has been verified in domains such as Web service systems [6], DNS systems [7], and firewall systems [8].

Previous intrusion tolerance techniques have focused on proactive recovery approaches. In this paper, we propose to use a preventive maintenance technique in addition to the proactive recovery approach. For instance, if Denial of Service vulnerabilities are identified by an attacker, the attacker sends a large amount of data in order to violate the availability of the target network. If preventive maintenance is performed before the attacker detects and exploits the vulnerabilities, we can reduce both cost and downtime, because a server going offline can lead to expensive repair costs.

In this paper, we propose to adopt preventive maintenance for the existing SCIT system. A semi-Markov model is constructed based on the system behavior of SCIT, and a numerical analysis via the model is performed in terms of security attributes such as system availability and MTTSF (Mean Time To Security Failure).

The rest of the paper is organized as follows: Section II summarizes the related work. Previous architectures of security are discussed in section III. Intrusion Tolerant Architectures are described and a detailed comparison is made in section IV. The proposed stochastic model is presented in section V. Section VI presents the numerical analysis and discusses the limitations of the proposed approach as well as suggesting further research. Finally, section VII is devoted to the conclusion.

II. Related Work

Most current information systems that provide useful services to their legitimate users are connected to the Internet, and it is not easy to protect such systems against all threats. In this context, various studies have been performed on intrusion tolerance, and multiple intrusion tolerant architectures have been proposed in order to guarantee a high quality of service and to enhance security. Some well-known intrusion tolerance systems are the Scalable Intrusion-Tolerant Architecture (SITAR), Malicious and Accidental Fault Tolerance for Internet Applications (MAFTIA) [9], and Self-Cleansing Intrusion Tolerance (SCIT).

SITAR is a framework for an intrusion tolerant architecture [10] that was developed as part of the DARPA funded program called OASIS (Organically Assured and Survivable Information Systems). It protects COTS (Commercial Off The Shelf) servers from external attacks by detecting intrusions and reconfiguring the compromised servers. MAFTIA [11] was developed in the OASIS program as a European project. Its architecture contains several layers based on conceptual models, mechanisms and protocols for achieving tolerance.

Madan et al. [12] modeled SITAR as a Semi-Markov Process (SMP) to evaluate different security attributes of the SITAR system such as availability, integrity, and confidentiality. They suggested that the attacker behavior can be described by identifying several general probability distribution functions. They used a steady-state analysis to obtain dependability measures such as availability, and a transient analysis with absorbing states to calculate security measures such as MTTSF.

Lim et al. [13] presented a new virtualization-based server cluster system using three schemes, namely a simplified rotation process to minimize system overhead, exposure time adjustment to prevent the degradation of system performance, and spare server insertion to deal with heavy incoming packets. These schemes are evaluated for ubiquitous computing systems in order to enhance security, to provide a better quality of service, and hence to prevent Distributed Denial of Service (DDoS) attacks. The authors of [14] described the system components of a Cloud-based SCIT (C-SCIT) and analyzed the challenges and issues of their approach compared with a traditional implementation. They validated the feasibility of controlling and adapting the design to satisfy different levels of intrusion tolerance by implementing the C-SCIT procedures using commercially available Cloud service APIs. Tanha et al. [15] proposed an ITS architecture for securing smart grid control centers. This architecture is composed of several modules in order to improve the resilience to intrusions and achieve a high availability of smart grid control centers. The obtained results showed improved availability in the event of DoS attacks, evaluated analytically and compared with the existing

ITS architectures. In [16], a quantitative analysis of Self-Cleansing Intrusion Tolerance (SCIT) is presented. It shows how to quantitatively tune the tolerance level based on the exposure window in order to improve the intrusion tolerance of a SCIT-based system.

Alex et al. [17] modeled the survivability of intrusion tolerant database systems against attacks as a Semi-Markov Process. The system is evaluated according to two quantitative criteria, integrity and availability. They validated the proposed semi-Markov models through empirical experiments and concluded that the semi-Markov model describes the system behavior with high accuracy. They then studied the impact of intrinsic system deficiencies and attack behaviors on survivability using quantitative measures. Quyen et al. [18] proposed an approach that relies on a recovery-based intrusion tolerance architecture (SCIT) combined with constructs of service-oriented programming. They quantitatively analyzed this approach using a Semi-Markov Process to improve the resilience of a service by compensating the attack surface with a smaller exposure window, minimizing the exposure window of Enabler Services in a service orchestration, and reducing the effective attack surface of a service by having multiple diverse configurations of that service. Nguyen et al. [19] briefly presented the SCIT approach from the perspectives of effectiveness, tunable parameters, performance impact, and integration into application systems. They proved mathematically how to tune the SCIT system based on its exposure window for evaluating the intrusion tolerance of SCIT. In our first work [20], we presented an analytic model for an Intrusion Tolerant Cloud Data Center. In particular, we proposed to adopt the SCIT approach for a single VM and modeled the lifecycle of the VM. The model was implemented in SHARPE and the numerical results on system availability metrics were analyzed. The results demonstrated that decreasing the exposure window improves the intrusion tolerance of a SCIT-based cloud data center. In [21], we quantitatively analyzed the system security using measures such as system availability, Mean Time To Security Failure and cost. In particular, we proposed to adopt the SCIT approach for a single VM and modeled the lifecycle of the VM. The results demonstrated the feasibility of the proposed approach.

III. Previous architectures of security

Firewalls : By screening the network's traffic, a firewall controls access to the network and monitors the ports that are open to the Internet [22]. It can accept or refuse incoming data packets. However, firewalls are ineffective at determining whether network traffic is legitimate or an attack. For this reason, IDS and IPS were proposed.

Intrusion Detection System : If firewalls are security guards, intrusion detection systems are security cameras. IDSs are designed for detecting, blocking and reporting unauthorized activity in computer networks [22]. IDSs monitor the traffic and, based on signature detection, compare the system information to the attacks registered in the IDS database. IDSs are used to identify intrusion threats and attacks in a computer network and to generate alarms to the administrator. Companies, however, want to take action and block all attacks rather than only logging them and alerting the administrator. To achieve this purpose, the IPS (Intrusion Preventive System) was introduced.

Intrusion Preventive System : Through their inline deployment, IPSs provide various advanced services such as blocking and stopping attacks, logging the traffic, and updating the block list by adding the detected source IP address [22]. Intrusion Detection Systems only detect an intrusion, log the attack and send an alert to the administrator. IDS systems do not slow networks down as IPSs do, because they are not inline.

IDS and IPS systems analyze the traffic for intrusions or malicious intentions that can damage data. There are also solutions combining the capabilities of IDS and IPS. For instance (Figure 1), we start with an IDS to monitor how the system behaves without triggering any blocking action; afterwards an IPS can be used and the system deployed inline to enhance its security. IDS/IPS systems are composed of sensors, analysers and GUIs that automate the process of intrusion detection and manage responsive actions.

Figure. 1: IDS/IPS concepts (blocks shown: Incoming traffic / logs, Data pre-processor, Activity Data, Detection Algorithm, Detection model, Alerts, Alert filter, Decision criteria, Action/Report)

Intrusion Tolerance System : Intrusions into a computing system can come not only from external attackers but also from legitimate users acting as internal intruders. An intrusion can be defined as a malicious operational fault resulting from a successful attack on a vulnerability. It can target the confidentiality and integrity of the information and the availability of the service [23]. Intruders are external in the sense that they are not registered as users of the computing system; their concern is how to detect and bypass the authentication and authorization mechanisms. Internal intruders are legitimate users who are already registered in the computing system and seek to violate the system integrity. For instance, they can modify or destroy sensitive information without any authorization or, through malicious actions, produce a denial of service that leads to malfunctioning of the system [24]. An intrusion tolerant system can alleviate malicious security failures that propagate in computing systems due to an intrusion and/or attack. The intrusion tolerance mechanism maintains uninterrupted services to legitimate users in a timely manner, even in the presence of attacks. For this purpose, the intrusion tolerance concept is becoming more and more required.

Table 1 summarizes the basic differences between these three generations of security architectures.

Table 1: IDS/IPS vs Intrusion Tolerance

Issue | IDS, IPS | Intrusion tolerance
Risk management. | Reactive. | Proactive.
A priori information required. | Attack models. Software vulnerabilities. | Exposure time. Length of longest transaction.
Protection approach. | Prevent all intrusions. | Limit losses.
System Administrator workload. | High. Manage reaction rules. Manage false alarms. | Less. No false alarms generated.
Design metric. | Unspecified. | Exposure time.
Packet/Data stream monitoring. | Required. | Not required.
Higher traffic volume requires. | More computations. | Computation volume unchanged.
Applying patches. | Must be applied immediately. | Can be planned.

Figure. 2: The Architecture of the SITAR system (components shown: Client, Proxy, Ballot Monitor, Acceptance Monitor, COTS Server)

IV. Intrusion Tolerant Architectures

The complexity and sophistication of information systems are increasingly challenging. As a result, more vulnerabilities appear and the number of successful attacks has increased. Intrusion tolerance should be part of overall in-depth security. Redundancy, diversity, and reconfiguration of services and servers are the basic mechanisms employed in ITS architectures. ITS are classified into several categories. In this paper, we are interested in three principal ones: detection triggered, algorithm driven, and recovery based [25].

Detection Triggered : These architectures build multiple levels of defense to increase system survivability. Most of them rely on intrusion detection that triggers recovery mechanisms. As an example, SITAR (Scalable Intrusion Tolerant Architecture) was developed at MCNC Inc. and Duke University [26]. The SITAR system is based on the principles of spatial redundancy, diversity and automatic reconfiguration for achieving the intrusion tolerance goals. SITAR, which is detection triggered, provides services to secure COTS servers against known and unknown external attacks. Its architecture is based on five middleware components whose purpose is to protect COTS servers and to increase system survivability. Basically, SITAR consists of three layers of defense depicted in Figure 2: acceptance testing and outgoing responses, majority balloting, and validation of incoming requests.

Algorithm Driven : These systems employ algorithms such as the voting algorithm, threshold cryptography, and fragmentation redundancy scattering (FRS) to harden their resilience. MAFTIA [27] explored two architectural approaches and developed a variety of intrusion tolerant capabilities such as secure group communication, transactional support, a distributed authorization service and an intrusion detection system. It brought together, for the first time, researchers from security and dependability to tackle this subject. To clarify the relationships between the different fields, MAFTIA created a new conceptual model. It designed, implemented and demonstrated the first coherent system architecture for intrusion tolerance. MAFTIA, depicted in Figure 3, focused on how to achieve intrusion tolerance by applying error compensation mechanisms.

Figure. 3: The Architecture of the MAFTIA system (layers shown: Applications; Application Support: Authorization Service, Transaction Service; Communications Support: Byzantine Agreement, Group Communications, Threshold Cryptography; Multipoint Network; Runtime Environment: OS, JVM, TTCB; Hardware: Untrusted Data Channel, Trusted Control Channel)

Recovery Based : Unlike the two previous types of architectures, recovery-based architectures assume that as soon as a system goes online, it is compromised. Periodic restoration to a known good state is necessary. For instance, SCIT [25] employs a group of servers with identical services, which may have some diversity. In this group of servers, round-robin cleansing restores the system to its pristine image.

In summary:

• Audit control in SITAR.

• One-way signal from controller to the servers in SCIT.

• Distributed trust throughout the system in MAFTIA.

A. SCIT architecture

In this section, we describe the architecture of SCIT in more detail, in particular its concept, its components and how it works. The SCIT architecture has many benefits: it reduces losses by controlling the time that a server is exposed to the Internet, and it gives the intruder fewer opportunities to do damage. It is based on two principal components, as shown in Figure 4 [16]:

• The Central controller is the core component of SCIT that manages and controls all the SCIT nodes.

• A group of servers providing identical functions and services but diversified in terms of operating system platform in order to reduce the likelihood of intrusions and to make malicious exploitation more difficult. For example, we can define a group of web servers, one running on Windows, another on , and a third one on Mac OS.

Figure. 4: The Architecture of the SCIT system

All nodes are managed by the Central Controller; each node is continuously routed through the following lifecycle states:

• Active: node is online and accepts/processes any incoming requests.

• Grace Period: node processes any existing requests, but does not accept any new requests.

• Cleansing: node is offline and undergoes the cleansing to get to a known good state.

• Live Spare: node has been restored and is ready to come online.

The SCIT architecture depicted in Figure 4 is composed of Node1, which is in the Active state, Node2 in the Live Spare state, and Noden in the Cleansing state. The central controller coordinates the communication between all nodes. The link from the controller to the online node should be unidirectional, and that to the offline nodes should be bidirectional. A one-way communication is established between the central controller and online nodes to isolate the system from any external harmful influence and cyber attacks.

V. Proposed Stochastic model

A. Model Description

In this section, we introduce the stochastic behavior of SCIT. Then, we formulate the SCIT system with a mathematical model by means of supplementary variable techniques and probability analysis. As a result, we present the state model of the system. Figure 5 shows the configuration of the SCIT behavior. Let G be the good state, in which the system is functioning normally and can protect itself from intrusions. If an attacker finds vulnerabilities in the system, the state transits from G to V. Further, if the vulnerabilities are exploited, the system is under attack, i.e. the state moves to the intruded state I. Otherwise, preventive maintenance is performed before the intrusion time, and the system transits from V to PM. When the preventive maintenance actions are completed, the system returns to the normal state G. In the intruded state, two transitions can occur. If the intrusion is masked by the proactive recovery, the system moves to state PR. When this restoration is completely finished without any service degradation, the transition from PR to G is made. If the intrusion affects a Virtual Machine (VM) that is in the active state and does not respond to controller signals, or if the controller itself fails due to a hardware fault, a transition to state F is made. This leads to a graceful degradation and then to a stop of the service. Through manual restoration and corrective measures, the system goes back to the good state G.

Figure. 5: Description of the SCIT behavior

B. Model Formulation

Not all the distributions for transitions between states are necessarily exponential. For instance, the attacker takes a lot of time to prepare in order to exploit system vulnerabilities and to launch a deliberate attack. It is therefore clear that the transition from state G to state I may not be exponentially distributed, so the stochastic process cannot be modeled as a continuous time Markov chain. For this reason we use a Semi-Markov Process (SMP), an extension of continuous time Markov chains that is more general and better adapted to applications, because sojourn times in any state can be arbitrarily distributed. To study an SMP we need to determine the embedded discrete time Markov chain, which is based on two sets of parameters [12], [28]:

1. The mean sojourn time $h_i$ in state $i \in X_s$, and

2. The transition probabilities $p_{ij}$ between different states $i, j \in X_s$.

Figure 6 shows the Discrete Time Semi-Markov Model. It has a discrete state space $X_s = \{G, V, I, F, PM, PR\}$, for which $h_i$ indicates the mean sojourn time in state $i \in X_s$ and $p_{ij}$ represents the transition probability between states $i$ and $j$ ($i, j \in X_s$). Numerical results and their analysis are presented in Section VI.
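The two parameter sets of the SMP can be made concrete with a small simulation. The following Python sketch walks the state space {G, V, PM, I, PR, F} using the transitions described in Section V-A and draws a sojourn time in each visited state from an arbitrary (not necessarily exponential) distribution. The helper names, the probability values and every sojourn distribution are hypothetical choices made only for illustration; the paper does not specify them.

```python
import random


def embedded_dtmc(p_i, p_r):
    """Transition probabilities p_ij of the embedded DTMC (row state -> next state).

    p_i: probability that a vulnerable system is intruded (V -> I)
    p_r: probability that an intrusion is masked by proactive recovery (I -> PR)
    """
    return {
        "G": {"V": 1.0},
        "V": {"PM": 1.0 - p_i, "I": p_i},
        "PM": {"G": 1.0},
        "I": {"PR": p_r, "F": 1.0 - p_r},
        "PR": {"G": 1.0},
        "F": {"G": 1.0},
    }


# Hypothetical sojourn-time distributions; an SMP allows any distribution here.
SOJOURN = {
    "G": lambda rng: rng.weibullvariate(5.0, 1.5),
    "V": lambda rng: rng.uniform(0.5, 1.5),
    "PM": lambda rng: 0.5,                      # fixed-length maintenance
    "I": lambda rng: rng.expovariate(2.0),
    "PR": lambda rng: 0.25,                     # fixed-length cleansing
}


def simulate_until_failure(p, sojourn, rng, start="G"):
    """Follow one sample path from `start` until state F is entered.

    Returns the list of visited states and the time elapsed before entering F.
    """
    state, elapsed, path = start, 0.0, [start]
    while state != "F":
        elapsed += sojourn[state](rng)
        targets = list(p[state])
        weights = [p[state][t] for t in targets]
        state = rng.choices(targets, weights=weights)[0]
        path.append(state)
    return path, elapsed


if __name__ == "__main__":
    rng = random.Random(2016)
    p = embedded_dtmc(p_i=0.3, p_r=0.5)
    times = [simulate_until_failure(p, SOJOURN, rng)[1] for _ in range(1000)]
    print("empirical mean time to security failure:", sum(times) / len(times))
```

Averaging the elapsed time over many sample paths gives an empirical estimate of the time to reach F, which is the quantity the analytical model captures through the mean sojourn times and the embedded chain.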

Figure. 6: The model using semi Markov process

VI. Numerical Evaluation

A. Security Attributes

1) System Availability

The steady-state availability $A$ is defined as the probability that the system is in one of the normal functioning states. To determine the availability, we first determine the unavailable states (i.e., state F). Hence, we formulate the availability as $A = 1 - \pi_F$, where $\pi_i$ denotes the steady-state probability that the SMP is in state $i$, computed as

$$\pi_i = \frac{v_i h_i}{\sum_{j \in X_s} v_j h_j}$$

where $h_i$ is the mean sojourn time in state $i$ and $v_i$ is the steady-state probability of state $i$ in the embedded Discrete Time Markov Chain (DTMC). We obtain the $v_i$ from the following equations:

$$v = vP, \qquad \sum_{i \in X_s} v_i = 1 \tag{1}$$

where $P$ is the transition probability matrix of the corresponding DTMC for our proposed model. With the states ordered as (G, V, PM, I, PR, F):

$$P = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & p_{PM} & p_I & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & p_R & p_F \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{2}$$

Finally, the availability of the proposed model is derived as follows:

$$v_G + v_V + v_I + v_{PM} + v_{PR} + v_F = 1 \tag{3}$$

$$\begin{cases} v_G = v_{PR} + v_{PM} + v_F \\ v_V = v_G \\ v_{PM} = p_{PM}\, v_V \\ v_I = p_I\, v_V \\ v_{PR} = p_R\, v_I \\ v_F = p_F\, v_I \end{cases} \tag{4}$$

$$\pi_F = \frac{p_I\, p_F\, h_F}{h_G + h_V + p_{PM} h_{PM} + p_I h_I + p_I p_R h_{PR} + p_I p_F h_F} \tag{5}$$

$$A = 1 - \pi_F \tag{6}$$

$$A = \frac{h_G + h_V + p_{PM} h_{PM} + p_I h_I + p_I p_R h_{PR}}{h_G + h_V + p_{PM} h_{PM} + p_I h_I + p_I p_R h_{PR} + p_I p_F h_F} \tag{7}$$

2) MTTSF

MTTF (Mean Time To Failure) is a measure used to quantify the reliability of a system. MTTF is the mean time required for the system to reach one of the designated failure states, given that the system starts in a good state. In reliability analysis, the failed states are defined as absorbing states. Similar to MTTF, the MTTSF is defined as the measure for quantifying the security of an ITS. Based on the state transition diagram depicted in Figure 6, the set of absorbing states is $X_a = \{F\}$. This state indicates the security failure state. The rest of the states are called transient states and are denoted by $X_t = \{G, V, PM, I, PR\}$. The resulting transition probability matrix $P$ has the general form:

$$P = \begin{pmatrix} Q & C \\ 0 & I \end{pmatrix} \tag{8}$$

where the submatrices $Q$ and $C$ contain the transition probabilities between transient states and from transient states to absorbing states, respectively. With the transient states ordered as (G, V, PM, I, PR), the matrices $Q$ and $C$ are given by:

$$Q = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & p_{PM} & p_I & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & p_R \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{9}$$

$$C = \begin{pmatrix} 0 \\ 0 \\ 0 \\ p_F \\ 0 \end{pmatrix} \tag{10}$$

We compute the MTTSF with the following formula:

$$MTTSF = \sum_{i \in X_t} V_i h_i \tag{11}$$

where $V_i$ denotes the average number of times the transient state $i$ is visited before the DTMC reaches one of the absorbing states and $h_i$ is the mean sojourn time in state $i$. The $V_i$ are computed by solving the system of equations:

$$V_i = q_i + \sum_{j \in X_t} V_j\, q_{ji} \tag{12}$$

where $q_i$ represents the probability of starting in state $i$ and $q_{ji}$ is the transition probability from transient state $j$ to transient state $i$ (the entries of $Q$). In our case, we assume that the DTMC starts in state G, so $\tilde{q} = [q_i] = [1\ 0\ 0\ 0\ 0]$. Solving (12) with $p_{PM} = 1 - p_I$, we find:

$$V_G = V_V = \frac{1}{p_I (1 - p_R)}, \quad V_{PM} = \frac{1 - p_I}{p_I (1 - p_R)}, \quad V_I = \frac{1}{1 - p_R}, \quad V_{PR} = \frac{p_R}{1 - p_R}$$
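The availability and MTTSF definitions above can be cross-checked numerically. The sketch below, written in plain Python with no external libraries, solves $v = vP$ from (1) and the visit-count system (12) by Gaussian elimination, using the transition structure of (2). The function names and the parameter values in the usage lines are illustrative assumptions, not values taken from the paper.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def transition_matrix(p_i, p_r):
    """Embedded DTMC of equation (2); state order is G, V, PM, I, PR, F."""
    p_pm, p_f = 1.0 - p_i, 1.0 - p_r
    return [
        [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],    # G
        [0.0, 0.0, p_pm, p_i, 0.0, 0.0],   # V
        [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],    # PM
        [0.0, 0.0, 0.0, 0.0, p_r, p_f],    # I
        [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],    # PR
        [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],    # F
    ]


def availability(p, h):
    """A = 1 - pi_F, with pi_i = v_i h_i / sum_j v_j h_j and v = v P, sum(v) = 1."""
    n = len(p)
    # Row i encodes sum_j v_j P[j][i] - v_i = 0; the last row is replaced by sum(v) = 1.
    a = [[p[j][i] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    a[-1] = [1.0] * n
    v = solve(a, [0.0] * (n - 1) + [1.0])
    total = sum(vi * hi for vi, hi in zip(v, h))
    return 1.0 - v[-1] * h[-1] / total


def mttsf(p, h):
    """MTTSF = sum_i V_i h_i over transient states, with V = q + V Q and start in G."""
    t = len(p) - 1  # transient states G, V, PM, I, PR (F is absorbing)
    a = [[(1.0 if i == j else 0.0) - p[j][i] for j in range(t)] for i in range(t)]
    visits = solve(a, [1.0] + [0.0] * (t - 1))
    return sum(vi * hi for vi, hi in zip(visits, h[:t]))


if __name__ == "__main__":
    P = transition_matrix(p_i=0.5, p_r=0.5)
    h = [1.0] * 6  # hypothetical mean sojourn times for G, V, PM, I, PR, F
    print("availability:", availability(P, h))
    print("MTTSF:", mttsf(P, h))
```

Solving the linear systems directly avoids hand-derived closed forms, which makes it easy to vary $p_I$, $p_R$ and the sojourn times when reproducing sensitivity curves.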

Finally, using equation (11), we compute the MTTSF as:

$$MTTSF = \frac{h_G + h_V + (1 - p_I) h_{PM} + p_I h_I + p_I p_R h_{PR}}{p_I (1 - p_R)} \tag{13}$$

3) Cost formulation and analysis

Recovery is an important approach in building ITSs; it removes or minimizes the effects of malicious attacks. An ITS is equipped with mechanisms that allow it to continue to operate in the presence of malicious intrusions, but the cost incurred by such mechanisms (i.e. redundancy and recovery) is still very high. We assume that the system can be down in three cases: first, when the system is in the Failure state; second, when the system is under preventive maintenance; and third, when the system is in the proactive recovery state. We formulate the total cost per unit time as follows:

$$C = C_F\, \pi_F + c_F \frac{\pi_F}{t_2} + C_R\, \pi_{PR} + c_R \frac{\pi_{PR}}{t_1} + C_{PM}\, \pi_{PM} + c_{PM} \frac{\pi_{PM}}{t_3} \tag{14}$$

where the different costs are assigned as follows:

• $C_F$: cost per unit time of downtime due to the Failure state.

• $c_F$: cost for each repair performed, such as manual restoration, which requires human intervention and corrective measures to return the system to the good state.

• $C_R$: cost per unit time of downtime due to the proactive recovery state.

• $c_R$: cost for each recovery performed (i.e. we apply the periodic cleansing in order to remove or minimize the effects of malicious attacks, and then the system goes back to the good state).

• $C_{PM}$: cost per unit time of downtime due to preventive maintenance.

• $c_{PM}$: cost for each maintenance performed. With this maintenance, the vulnerabilities are corrected before being detected by the attackers and the cost is reduced in both the short and long term.

• $\pi_F / t_2$ represents the average number of manual restorations per unit time, where $t_2$ is the time to manual restoration.

• $\pi_{PR} / t_1$ is the average number of periodic recoveries per unit time, where $t_1$ is the system recovery time.

• $\pi_{PM} / t_3$ is the average number of maintenances per unit time, where $t_3$ is the time to maintenance.

B. Numerical Results and Analysis

We compute the system availability and MTTSF with respect to four decision parameters: the probability of proactive recovery ($p_R$), the mean sojourn time in the preventive maintenance state ($h_{PM}$), the mean time to resist becoming vulnerable to intrusions ($h_G$), and the probability of intrusion ($p_I$).

Figure 7 shows that the system availability decreases as the probability of intrusion increases. This is due to SCIT's inability to detect the attacks occurring in the system. By adding preventive maintenance to the proactive recovery, we can alleviate the impact of attacks on the system and deal with unknown attacks. Indeed, the objective of preventive maintenance is to intervene before a failure occurs.

Figure. 7: Probability of Intrusion ($p_I$) vs. Availability (curves for $h_{PM}$ = 1/2, 1 and 3/2)

Figure 8 shows the impact on the system availability of varying the time that the system spends in the good state. We consider three values of the probability of recovery, 0.2, 0.5 and 0.7. We observe that a high level of system availability can be reached by maximizing the probability of recovery. This reflects the efficiency of the proactive recovery of SCIT in tolerating and eliminating unknown attacks against the system.

Figure 9 shows the sensitivity of the MTTSF with respect to the probability of intrusion for different values of the mean sojourn time in preventive maintenance. We observe that the MTTSF varies inversely with the probability of intrusion (i.e. as $p_I$ increases, MTTSF decreases), because a higher $p_I$ causes more intrusions in the system. By adopting preventive maintenance, we can improve the MTTSF when facing intrusions. This demonstrates that the system should trigger preventive maintenance frequently in order to correct the vulnerabilities in the system before they are detected by intruders, so as to maximize the MTTSF.

Figure 10 shows that the MTTSF increases linearly as the mean time to resist becoming vulnerable to intrusion increases. We observe that the MTTSF increases when we increase both $h_G$ and the probability of recovery. A secure and robust SCIT is expected to have a high MTTSF when facing intrusions. The proactive recovery in SCIT

The proactive recovery in SCIT seems to have a greater effect on the MTTSF as $h_G$ increases. The results show an improvement in the MTTSF, which is due to the periodic cleansing that removes or minimizes the effects of malicious attacks.

Figure. 8: Mean Sojourn time in state G ($h_G$) vs. Availability (curves for $p_R$ = 0.2, 0.5 and 0.7)

Figure. 9: Probability of Intrusion ($p_I$) vs. MTTSF (curves for $h_{PM}$ = 1/2, 1 and 3/2)

Figure 11 shows the cost as a function of the time to do maintenance. The result shows that the cost of downtime decreases as the time to do maintenance increases. It means that frequent maintenance can reduce the cost due to downtime. The costs of preventive maintenance and recovery are lower than the cost of system failure. When the system is in failure, the cost of repair becomes more significant. For this reason, preventive maintenance and the proactive recovery of SCIT should be applied in order to reduce the likelihood of system failure. A more detailed analysis is left for the near future.

Figure. 11: Time to do Maintenance ($t_3$) vs. Cost (curves for $t_1$ = 2, 4 and 6 hours)

C. Limitation

In this section, we discuss the limitations of this work and possible extensions as future work. We focused on intrusion tolerance for a single node, and we will construct a stochastic model incorporating multiple nodes. We use only the semi-Markov model with estimated parameter values, and we will validate this model by setting up a small testbed. We will also look for an optimal solution for the preventive maintenance and reactive actions.

The maintenance policy is an important factor in determining the overall system availability. We will optimize the cost of the preventive maintenance in order to ensure availability and continuity of services even in the presence of an intruder. Our main aim is to find a maintenance strategy that leads to the best possible performance, but also to the lowest possible costs.
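As a rough illustration of how candidate maintenance strategies might be compared on cost, the sketch below evaluates the expected cost of Eq. (14) for given steady-state probabilities. The names `CostRates` and `expected_cost` and all numeric values are hypothetical and do not come from the paper. The steady-state probabilities are passed in as inputs rather than derived from the semi-Markov model, so holding them fixed while $t_3$ varies is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostRates:
    """Cost coefficients of Eq. (14). Values used below are illustrative only."""
    C_F: float   # cost per unit time of downtime in the failure state
    c_F: float   # cost of each manual restoration
    C_R: float   # cost per unit time of downtime in proactive recovery
    c_R: float   # cost of each periodic recovery (cleansing)
    C_PM: float  # cost per unit time of downtime in preventive maintenance
    c_PM: float  # cost of each preventive maintenance


def expected_cost(pi_F, pi_PR, pi_PM, t1, t2, t3, rates):
    """Evaluate Eq. (14) for given steady-state probabilities and times.

    pi_F, pi_PR, pi_PM: steady-state probabilities of the failure,
    proactive recovery and preventive maintenance states.
    t1: system recovery time, t2: time to manual restoration,
    t3: time to maintenance.
    """
    if min(t1, t2, t3) <= 0:
        raise ValueError("t1, t2 and t3 must be positive")
    # Downtime terms: cost per unit time weighted by state probability.
    downtime = rates.C_F * pi_F + rates.C_R * pi_PR + rates.C_PM * pi_PM
    # Per-action terms: cost per action times actions per unit time.
    actions = (rates.c_F * pi_F / t2
               + rates.c_R * pi_PR / t1
               + rates.c_PM * pi_PM / t3)
    return downtime + actions


if __name__ == "__main__":
    # Hypothetical coefficients and probabilities, chosen only for the demo.
    rates = CostRates(C_F=10.0, c_F=5.0, C_R=2.0, c_R=1.0, C_PM=1.0, c_PM=0.5)
    for t3 in (1.0, 5.0, 10.0):
        cost = expected_cost(0.05, 0.10, 0.05, t1=2.0, t2=4.0, t3=t3, rates=rates)
        print(f"t3 = {t3:4.1f} h -> C = {cost:.4f}")
```

In a fuller analysis the probabilities would be recomputed from the semi-Markov model for every candidate set of sojourn times before the cost is evaluated.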

Figure. 10: Mean Sojourn time in state G ($h_G$) vs. MTTSF (curves for $p_R$ = 0.2, 0.5 and 0.7)

VII. Conclusion

This paper has reviewed existing security architectures and discussed their advantages and drawbacks. We are interested in the Intrusion Tolerance System (ITS) as an emerging security technology. Three intrusion-tolerant architectures were therefore described and compared, with a focus on SCIT as an ITS architecture. We have presented a semi-Markov process based model to evaluate the security of an ITS. Our aim is to use preventive maintenance on top of the existing intrusion tolerance mechanism. We have shown numerical results via the semi-Markov process.

We have used security attributes such as availability, MTTSF and expected cost. We have demonstrated that the proposed approach could be an effective solution for improving system availability and reducing downtime cost.

Acknowledgments

This research was sponsored by NSF grant #1528099, and also supported by the NATO Science for Peace & Security Multi-Year Project (MD.SFPP 984425).

Author Biographies

Iman EL MIR received an Engineering degree in Software Engineering in 2013 from Hassan 1st University of Settat, Morocco, and is currently working toward a PhD degree in the Computer, Networks, Mobility and Modeling laboratory, Mathematics and Computing Department, at Hassan 1st University, Morocco. Her research interests are security modelling and analysis of computers and networks, cloud systems, cloud data center security and security in Software Defined Networking. Since 2014, she has been a member of the NATO Project SPS-984425, entitled Cyber Security Analysis and Assurance using Cloud-Based Security Measurement system.

Dong Seong Kim is a Senior Lecturer (roughly equivalent to an associate professor in the US, but a permanent position) in Cyber Security in the Department of Computer Science and Software Engineering at the University of Canterbury, Christchurch, New Zealand. He received a Ph.D. degree in Computer Engineering from Korea Aerospace University, South Korea, in February 2008. He was a visiting scholar at the University of Maryland, College Park, Maryland, U.S.A. during 2007. From June 2008 to July 2011, he was a postdoc at Duke University, Durham, NC, USA. His research interests are in security and dependability for systems and networks; in particular, intrusion detection using data mining techniques, security and survivability for wireless ad hoc and sensor networks and the Internet of Things, availability and security modeling and analysis of cloud computing, and reliability and resilience modeling and analysis of the smart grid. More information is at http://cosc.canterbury.ac.nz/dongseong.kim.

Abdelkrim HAQIQ has a High Study Degree (DES) and a PhD (Doctorat d'Etat), both in the field of modeling and performance evaluation of computer communication networks, from the University of Mohamed V, Agdal, Faculty of Sciences, Rabat, Morocco. Since September 1995 he has been working as a Professor at the department of Mathematics and Computer at the Faculty of Sciences and Techniques, Settat, Morocco. He is the Director of the Computer, Networks, Mobility and Modeling laboratory and is responsible for engineering education in Computer Engineering at the same Faculty. He is also the General Secretary of the electronic Next Generation Networks (e-NGN) Research Group, Moroccan section. Dr. Abdelkrim HAQIQ is currently Co-Director of a NATO multi-year project and Co-Director of a Moroccan-Tunisian research project. His interests lie in the areas of modeling and performance evaluation of communication networks, cloud computing and security. He is the author and co-author of more than 100 papers (international journals and conferences/workshops). He was a General Co-Chair of the 11th International Conference on Information Assurance and Security (IAS2015), the 15th International Conference on Intelligent Systems Design and Applications (ISDA2015) and the 5th World Congress on Information and Communication Technologies (WICT2015), held jointly in Marrakesh, December 14-16, 2015. He was a publication co-chair of the fifth international conference on Next Generation Networks and Services, held in Casablanca, May 28-30, 2014. He was also an International Steering Committee Chair and TPC Chair of the international conference on Engineering Education and Research 2013 (iCEER2013), held in Marrakesh, July 1-5, 2013, and a TPC co-chair of the fourth international conference on Next Generation Networks and Services, held in Portugal, December 2-4, 2012. Dr. Abdelkrim HAQIQ was the Chair of the second international conference on Next Generation Networks and Services, held in Marrakech, July 8-10, 2010. He is also a TPC member and a reviewer for many international conferences. He was also a Guest Editor of a special issue on Next Generation Networks and Services of the International Journal of Mobile Computing and Multimedia Communications (IJMCMC), July-September 2012, Vol. 4, No. 3, and of a special issue of the Journal of Mobile Multimedia (JMM), Vol. 9, No. 3 & 4, 2014.