International Journal of Advanced Research in Computer Science and Software Engineering, Research Article, Volume 7, Issue 8, August 2017, ISSN: 2277-128X

Data Integrity Techniques in Cloud Computing: An Analysis

Neha Thakur, Research Scholar, Himachal Pradesh University, Shimla, Himachal Pradesh, India
Aman Kumar Sharma, Professor, Himachal Pradesh University, Shimla, Himachal Pradesh, India

DOI: 10.23956/ijarcsse/V7I8/0141

Abstract: Cloud computing has been envisioned as a promising solution to the rising storage costs of IT enterprises. There are many cloud computing initiatives from IT giants such as Google, Amazon, Microsoft and IBM. Integrity monitoring is essential in cloud storage for the same reasons that integrity is critical for any data centre. Data integrity is defined as the accuracy and consistency of stored data, implying the absence of any alteration to the data between two updates of a file or record. To ensure the integrity and availability of data in the cloud and to enforce the quality of the cloud storage service, efficient methods that enable on-demand data correctness verification on behalf of cloud users have to be designed. To address the data integrity problem, many techniques have been proposed under different systems and security models. This paper examines several integrity-proving techniques in detail, along with their advantages and disadvantages.

Keywords: PDP, MAC, CSP, POR

I. INTRODUCTION
Cloud computing has been envisioned as a promising solution to the rising storage costs of IT enterprises, and has been acknowledged as one of the prevailing models for providing IT capacities. Clouds have emerged as a computing infrastructure that enables rapid delivery of computing resources as a utility, in a dynamically scalable, virtualized manner. There are many cloud computing initiatives from IT giants such as Google, Amazon, Microsoft and IBM [1]. Data outsourcing [2] to cloud storage servers is a rising trend among many firms and users owing to its economic advantages. It essentially means that the owner (client) of the data moves its data to a third-party cloud storage server which, presumably for a fee, is expected to faithfully store the data and provide it back to the owner whenever required. As data generation far outpaces in-house storage capacity, it proves costly for small firms to update their hardware whenever additional data is created, and maintaining that storage is itself a difficult task. Outsourcing data to cloud storage helps such firms by reducing the costs of storage, maintenance and personnel. It can also provide reliable storage of important data by keeping multiple copies, thereby reducing the chance of losing data to hardware failures. Despite these advantages, storing user data in the cloud raises many security concerns which need to be investigated extensively before the cloud can become a reliable alternative to local storage; chief among them are problems of data authentication and integrity.

II. DATA INTEGRITY
In database terms, data integrity [3] refers to the process of ensuring that a database remains an accurate reflection of the universe of discourse it is modeling or representing; in other words, there is a close correspondence between the facts stored in the database and the real world it models. In terms of security, integrity is the guarantee that data can only be accessed or modified by those authorized to do so; in simple words, it is the process of verifying data. Data integrity is important among the other cloud challenges, as it gives the guarantee that data is of high quality, correct and unmodified. After storing data in the cloud, users depend on the cloud to provide reliable services and expect their data and applications to be kept secure. That expectation can fail: a user's data may be altered or deleted. At times, cloud service providers may be dishonest; they may discard data which has not been accessed, or is rarely accessed, to save storage space, or keep fewer replicas than promised [4]. Moreover, cloud service providers may choose to hide such data loss and claim that the data are still correctly stored in the cloud. As a result, data owners need to be convinced that their data are correctly stored. Thus one of the biggest concerns with cloud data storage is data integrity verification at untrusted servers. To solve the problem of data integrity checking, many researchers have proposed different systems and security models.

2.1 Data Integrity Issues

A. Data Loss or Manipulation
Users outsource huge numbers of files, so cloud providers offer storage as a service (SaaS). Those files may be accessed every day or only rarely, but either way there is a strong need to keep them correct. This need stems from the nature of cloud computing: the data is outsourced to a remote cloud that is, from the owner's perspective, unsecured and unreliable. Since the cloud is untrusted, the data might be lost or modified by unauthorized users, and in many cases data can be altered intentionally or accidentally. There are also administrative errors that can cause data loss, such as restoring from an incorrect backup. An attacker could exploit users' outsourced data precisely because the owners have lost direct control over it.

B. Untrusted Remote Server Performing Computation
Cloud computing is not just about storage; some intensive computations need cloud processing power, so users outsource their computations as well. Since the cloud provider is outside the owner's security boundary and is not transparent to the owner, no one can prove whether computation integrity is preserved. The cloud provider may behave in such a way that no one discovers a deviation of the computation from its normal execution. Because resources have a cost to the cloud provider, it may be tempted not to execute the task in a proper manner. Even if the cloud provider is considered secure, many issues remain, such as those arising from the provider's underlying systems, vulnerable code or misconfiguration.

III. DATA INTEGRITY AUTHENTICATION TECHNIQUES AND THEIR CHALLENGES
The issue of data integrity in cloud computing is still being investigated by many researchers, and a great deal of research is ongoing in this field to provide secure and efficient data integrity; researchers have proposed many solutions to these issues. This paper provides a survey of the different data integrity techniques. The basic schemes for data integrity in the cloud are Provable Data Possession (PDP) and Proof of Retrievability (PoR). The following sections describe these data integrity techniques [5].

3.1 Provable Data Possession (PDP)
Provable Data Possession (PDP) is a technique for assuring data integrity over remote servers. In PDP, a client that has stored data at an untrusted server can verify that the server possesses the original data without retrieving it. Ateniese et al. were the first to consider public auditability in their "provable data possession" model for ensuring possession of files on untrusted storage [6]. The working principle of PDP is shown in Fig. 1. It works in two stages: (i) pre-process and store, and (ii) verify file possession by the server.

Pre-process and store:
- The client generates a pair of matching public and secret keys using a probabilistic key-generation algorithm.
- The client sends the public key, along with the file, to the server for storage.

Verify file possession by server:
- The client challenges the server for a proof of possession of a subset of the blocks in the file.
- The client checks the response from the server.

A simplified sketch of this challenge-and-verify flow is given after Fig. 1 below.

Fig. 1: Protocol for provable data possession [8]
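
To make the two-stage flow of Fig. 1 concrete, here is a minimal Python sketch of a challenge-and-verify protocol. It is deliberately a stand-in, not the construction of [6]: real PDP uses homomorphic RSA-based verifiable tags so the client can check a constant-size proof with a public key, whereas this sketch uses a Merkle hash tree, and every name in it (build_tree, prove, verify) is hypothetical.

import hashlib, os

BLOCK = 4096

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def build_tree(blocks):
    # Merkle tree over block hashes; levels[0] holds the leaves.
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]   # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, idx):
    # Server side: authentication path for the challenged block idx.
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sib = idx ^ 1
        path.append((sib < idx, level[sib]))  # (sibling is on the left?, hash)
        idx //= 2
    return path

def verify(root, block, idx, path) -> bool:
    # Client side: fold the path back up to the root kept locally.
    node = h(block)
    for is_left, sib in path:
        node = h(sib + node) if is_left else h(node + sib)
    return node == root

blocks = [os.urandom(BLOCK) for _ in range(5)]   # the outsourced file
levels = build_tree(blocks)        # server stores the blocks and the tree
root = levels[-1][0]               # client keeps only this 32-byte digest
i = 3                              # challenged block index
assert verify(root, blocks[i], i, prove(levels, i))

The design point this sketch preserves is that the verifier keeps only a small fingerprint (here a 32-byte root) and can spot-check individual blocks without downloading the whole file.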

3.2 Basic PDP Scheme Based on MAC
In [9] the author proposed a Message Authentication Code (MAC) based PDP to ensure the integrity of a file F stored on cloud storage in a very simple way. The data owner computes MACs of the whole file with a set of secret keys before outsourcing it, keeps only the computed MACs in local storage, and sends the file to the Cloud Service Provider (CSP). Whenever a verifier needs to check the integrity of file F, he or she sends an audit request, reveals one secret key to the cloud server, asks it to recompute the MAC of the whole file, and compares the recomputed MAC with the previously stored value.
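
A minimal sketch of this MAC-based flow follows, assuming HMAC-SHA-256 as the MAC; the names (owner_mac, csp_recompute) are hypothetical, and this is an illustration of the idea in [9], not a reference implementation.

import hmac, hashlib, os

def owner_mac(key: bytes, data: bytes) -> bytes:
    # Owner computes a MAC of the whole file before outsourcing it;
    # only this digest (and the key) stay on local storage.
    return hmac.new(key, data, hashlib.sha256).digest()

def csp_recompute(stored_file: bytes, revealed_key: bytes) -> bytes:
    # On an audit request the CSP re-computes the MAC over the entire
    # stored file with the key the verifier just revealed.
    return hmac.new(revealed_key, stored_file, hashlib.sha256).digest()

key = os.urandom(32)
file_data = b"outsourced file contents" * 1000
local_mac = owner_mac(key, file_data)      # kept by the owner
# ... file_data is handed over to the CSP ...
assert hmac.compare_digest(csp_recompute(file_data, key), local_mac)
# Once the key is revealed to the CSP it cannot be reused, which is
# why the scheme needs a whole set of keys for repeated audits.

The final comment reflects the limitation noted in Table 1: the number of verifications is bounded by the number of secret keys prepared in advance.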

3.3 Scalable PDP
The author in [10] proposed Scalable PDP, an improved version of the original PDP. The main difference is that Scalable PDP uses symmetric-key encryption, whereas the original PDP uses public-key cryptography, which reduces computation overhead. Scalable PDP supports dynamic operations on remote data, but all challenges and answers are pre-computed, so only a limited number of updates and audits is possible (a sketch of this pre-computed token idea follows Section 3.4 below). Scalable PDP does not require bulk encryption and relies on symmetric keys, which are more efficient than public-key encryption; as a consequence, it does not offer public verifiability.

3.4 Dynamic PDP
The authors in [11] proposed Dynamic PDP (DPDP), a collection of seven polynomial-time algorithms (KeyGen, PrepareUpdate, PerformUpdate, VerifyUpdate, GenChallenge, Prove, Verify). It supports fully dynamic operations such as insert, update, modify and delete. The technique uses a rank-based authenticated dictionary built over a skip list to support insertion and deletion. Although DPDP has some computational complexity, it is still efficient: for example, to verify the proof for a 500 MB file, DPDP produces only 208 KB of proof data with 15 ms of computational overhead. Because it supports fully dynamic operations, it incurs relatively higher computational, communication and storage overhead than static schemes. All challenges and answers are generated dynamically.
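
As promised in Section 3.3, the following toy Python sketch shows Scalable PDP's pre-computed audit budget: a fixed set of (nonce, indices, answer) tokens created before outsourcing. In the real scheme the tokens are protected with the owner's symmetric key and may even be stored at the server; DPDP, by contrast, generates its challenges dynamically. All names here are hypothetical.

import hashlib, os, random

BLOCK = 4096

def precompute_tokens(blocks, n_tokens: int, per_challenge: int = 2):
    # Owner pre-computes a fixed budget of (challenge, answer) pairs
    # before outsourcing; once they are used up, no further audits are
    # possible without re-processing the file.
    tokens = []
    for _ in range(n_tokens):
        nonce = os.urandom(16)                                   # fresh per token
        idx = random.sample(range(len(blocks)), per_challenge)   # positions
        answer = hashlib.sha256(
            nonce + b"".join(blocks[i] for i in idx)).digest()
        tokens.append((nonce, idx, answer))
    return tokens

def csp_answer(blocks, nonce, idx) -> bytes:
    # Server recomputes the digest over the challenged blocks.
    return hashlib.sha256(nonce + b"".join(blocks[i] for i in idx)).digest()

blocks = [os.urandom(BLOCK) for _ in range(8)]   # file to outsource
tokens = precompute_tokens(blocks, n_tokens=4)   # the entire audit budget
nonce, idx, expected = tokens.pop()              # one audit spends one token
assert csp_answer(blocks, nonce, idx) == expected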

3.5 Proof of Retrievability (PoR)
Proof of Retrievability (POR) is a cryptographic method for remotely verifying the integrity of files stored in the cloud without keeping a copy of the user's original files in local storage. In a POR scheme, the user backs up a data file together with some authentication data to a potentially dishonest cloud storage server, and can later check the integrity of the data stored with the CSP using an authentication key, without retrieving the data file from the cloud [12]. A POR scheme works in two phases: a setup phase followed by a sequence of verification phases.

Setup phase: the user pre-processes the data file using a private key to generate some authentication code, sends the data file together with the authentication code to the cloud storage server, and removes them from the local disk. Consequently, at the end of the setup phase the user holds only the private key, while the CSP holds both the data file and the corresponding authentication code.

Verification phases: in each verification phase, the user generates a random challenge query, and the CSP is supposed to produce a short response, or proof, based on the user's data file and the corresponding authentication information. At the end of a verification phase, the user verifies the CSP's response using the private key and decides to accept or reject it.
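
A skeleton of the two phases is sketched below in Python, with per-block authentication codes derived from the user's private key. It is a simplification: here the server simply returns the challenged blocks with their codes, whereas real POR schemes compress the response and add error-correcting codes so that retrievability, not just possession, is guaranteed. All names (por_setup, csp_respond, user_verify) are hypothetical.

import hmac, hashlib, os, random

BLOCK = 4096

def por_setup(key: bytes, data: bytes):
    # Setup phase: derive per-block authentication codes from the
    # private key; file and codes both go to the CSP, and the user
    # keeps only the key.
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    codes = [hmac.new(key, str(i).encode() + b, hashlib.sha256).digest()
             for i, b in enumerate(blocks)]
    return blocks, codes

def csp_respond(blocks, codes, challenge):
    # Verification phase, server side: answer the random challenge
    # with the requested blocks and their authentication codes.
    return [(i, blocks[i], codes[i]) for i in challenge]

def user_verify(key: bytes, response) -> bool:
    # Verification phase, user side: accept only if every returned
    # block still matches its code.
    return all(hmac.compare_digest(
        hmac.new(key, str(i).encode() + b, hashlib.sha256).digest(), c)
        for i, b, c in response)

key = os.urandom(32)
blocks, codes = por_setup(key, os.urandom(6 * BLOCK))  # then delete locally
challenge = random.sample(range(len(blocks)), 2)       # random challenge query
assert user_verify(key, csp_respond(blocks, codes, challenge))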

3.6 PoR Based on Keyed Hash Function hk(F)
A keyed hash function is very simple and easily implementable, and it provides a strong proof of integrity. In this method the user pre-computes the cryptographic hash hk(F) of the data file F before outsourcing it to cloud storage, and stores the secret key K along with the computed hash. To check the integrity of file F, the user releases the secret key K to the CSP and asks it to compute and return the value of hk(F). If the user wants to check the integrity of F multiple times, multiple hash values under different keys must be stored [13].
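
A minimal sketch of this scheme, with HMAC-SHA-256 standing in for hk and hypothetical names; it highlights the stated limitation that one stored (key, hash) pair is consumed per check.

import hmac, hashlib, os

def h_k(key: bytes, f: bytes) -> bytes:
    # The keyed hash h_k(F); an HMAC is an assumed stand-in here.
    return hmac.new(key, f, hashlib.sha256).digest()

def setup(f: bytes, n_checks: int):
    # One (key, hash) pair must be pre-computed per future integrity
    # check, since a key is spent each time it is revealed to the CSP.
    return [(k, h_k(k, f)) for k in (os.urandom(32) for _ in range(n_checks))]

def check(stored_pairs, csp_file: bytes) -> bool:
    # Release the next unused key K to the CSP, get h_K(F) back, compare.
    if not stored_pairs:
        raise RuntimeError("no unused keys left; the file must be "
                           "re-processed to allow further checks")
    k, expected = stored_pairs.pop()
    returned = h_k(k, csp_file)          # computed on the CSP's side
    return hmac.compare_digest(returned, expected)

f = b"outsourced file F" * 4096
pairs = setup(f, n_checks=3)     # client-side state grows with audit count
assert check(pairs, f)           # two checks remain after this one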

3.7 Proof of Retrievability for Large Files
The authors of [14] describe a "Proof of Retrievability" technique for large files that uses "sentinels". In this method a single key can be used irrespective of the size of the file or the number of files, and the verifier needs to access only a small portion of the file F; this portion is, in fact, independent of the length of F. Special sentinel blocks are randomly embedded among the data blocks of the file F, hidden among the other blocks. To check the integrity of F, the user challenges the CSP during the verification phase by specifying the positions of a collection of sentinels and asking the CSP to return the associated sentinel values. If the CSP has modified or deleted some portion of F, then with high probability some sentinels are affected as well, so the CSP is unlikely to be able to respond correctly. To make the sentinels indistinguishable from the data blocks, the whole modified file is encrypted before being stored at the CSP.
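
A toy sketch of the sentinel idea with hypothetical names; the block-wise encryption step that actually hides the sentinels from the CSP is noted in a comment but omitted for brevity.

import os, random

BLOCK = 16   # tiny blocks keep the sketch readable

def embed_sentinels(data_blocks, n_sentinels: int, seed: int):
    # Scatter sentinel blocks among the data at positions derived from
    # the user's secret seed; the user records (position, value) pairs.
    rng = random.Random(seed)
    total = len(data_blocks) + n_sentinels
    sentinel_pos = sorted(rng.sample(range(total), n_sentinels))
    sentinels = {p: os.urandom(BLOCK) for p in sentinel_pos}
    stored, it = [], iter(data_blocks)
    for p in range(total):
        stored.append(sentinels[p] if p in sentinels else next(it))
    # In the real scheme `stored` is now encrypted block-wise so the
    # CSP cannot tell sentinels from data; omitted here for brevity.
    return stored, sentinels          # stored -> CSP, sentinels -> user

def challenge(stored, sentinels, sample: int = 2) -> bool:
    # Ask the CSP for the blocks at some sentinel positions and check
    # them against the values the user kept.
    for p in random.sample(sorted(sentinels), sample):
        if stored[p] != sentinels[p]:      # CSP's answer vs. expected
            return False
    return True

data = [os.urandom(BLOCK) for _ in range(10)]
stored, sentinels = embed_sentinels(data, n_sentinels=4, seed=1234)
assert challenge(stored, sentinels)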

3.8 HAIL
The authors in [15] proposed HAIL, a high-availability and integrity layer for cloud storage, which lets the user store data on multiple servers so that the data is redundant. The simple principle of this method is to ensure the integrity of a file via data redundancy. HAIL uses Message Authentication Codes (MACs), pseudorandom functions and universal hash functions in its integrity process. The proof generated by this method is independent of the size of the data and is compact.
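
As a much-simplified illustration of the redundancy-plus-MAC principle, the Python sketch below replicates each block to several servers, tags each copy with a per-server MAC, and audits one block position across all servers. Real HAIL instead disperses the file with an erasure code and aggregates server responses with a universal hash; every name here (disperse, audit_block) is hypothetical.

import hmac, hashlib, os

def disperse(key: bytes, data_blocks, n_servers: int):
    # Replicate every block to each server with a per-server MAC
    # (plain replication stands in for HAIL's erasure-coded dispersal).
    servers = []
    for s in range(n_servers):
        k_s = hmac.new(key, b"server" + bytes([s]), hashlib.sha256).digest()
        servers.append([(b, hmac.new(k_s, b, hashlib.sha256).digest())
                        for b in data_blocks])
    return servers

def audit_block(key: bytes, servers, idx: int):
    # Check one block position on every server; a corrupted replica is
    # detected by its MAC and can be repaired from another server.
    good = []
    for s, store in enumerate(servers):
        k_s = hmac.new(key, b"server" + bytes([s]), hashlib.sha256).digest()
        b, t = store[idx]
        if hmac.compare_digest(hmac.new(k_s, b, hashlib.sha256).digest(), t):
            good.append(s)
    return good          # servers whose copy of block idx is intact

key = os.urandom(32)
servers = disperse(key, [os.urandom(64) for _ in range(4)], n_servers=3)
servers[1][2] = (b"corrupted" + bytes(55), servers[1][2][1])   # tamper
assert audit_block(key, servers, 2) == [0, 2]

A corrupted replica fails its MAC check and can then be repaired from the servers whose copies are still intact, which is the availability benefit the method aims at.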

3.9 PoR Based on Selecting Random Bits in Data Blocks
In [16] the author proposed a technique that encrypts only a few bits of data per data block instead of encrypting the whole file F, thus reducing the computational burden on clients. It rests on the fact that a high probability of detecting tampering can be achieved by encrypting a few bits instead of the whole data. The client's storage and computational overhead are also minimized, since the client stores no data locally, and bandwidth requirements are reduced; hence this scheme suits thin clients well. The user needs to store only a single cryptographic key and two random sequence functions, and does not store any data on the local machine. Before storing the file at the CSP, the user pre-processes it and appends some metadata, which is stored at the CSP along with the file. At verification time, the verifier uses this metadata to verify the integrity of the data.
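
A hedged sketch of this idea, assuming an HMAC-SHA-256-based PRF plays the role of the two random sequence functions: one derived stream selects a few byte positions per block, the other forms the XOR pad that encrypts them into the appended metadata. All names are hypothetical.

import hashlib, hmac, os, random

BLOCK, PICK = 4096, 4    # block size; bytes sampled per block

def prf(key: bytes, label: bytes, n: int) -> bytes:
    # Hash-based byte stream standing in for the scheme's random
    # sequence functions (position selection and encryption pad).
    out, c = b"", 0
    while len(out) < n:
        out += hmac.new(key, label + c.to_bytes(4, "big"),
                        hashlib.sha256).digest()
        c += 1
    return out[:n]

def positions(key: bytes, i: int):
    # Derive which byte offsets of block i are sampled.
    rng = random.Random(prf(key, b"pos%d" % i, 16))
    return sorted(rng.sample(range(BLOCK), PICK))

def make_metadata(key: bytes, blocks):
    # Encrypt (XOR) the selected bytes of every block; this metadata is
    # appended to the file at the CSP, so the client keeps only the key.
    meta = []
    for i, b in enumerate(blocks):
        picked = bytes(b[p] for p in positions(key, i))
        pad = prf(key, b"pad%d" % i, PICK)
        meta.append(bytes(x ^ y for x, y in zip(picked, pad)))
    return meta

def verify_block(key: bytes, i: int, block_from_csp: bytes,
                 meta_from_csp: bytes) -> bool:
    # Decrypt the metadata and compare it against the bytes the CSP
    # returns for the derived positions of block i.
    pad = prf(key, b"pad%d" % i, PICK)
    expected = bytes(x ^ y for x, y in zip(meta_from_csp, pad))
    return bytes(block_from_csp[p] for p in positions(key, i)) == expected

key = os.urandom(32)
blocks = [os.urandom(BLOCK) for _ in range(3)]
meta = make_metadata(key, blocks)          # blocks + meta go to the CSP
assert verify_block(key, 1, blocks[1], meta[1])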

IV. COMPARATIVE STUDY
This comparative study provides a brief summary of all the techniques discussed above:

Table 1: Comparison of data integrity techniques

Provable Data Possession (PDP)
Method used: key generation algorithm.
Advantages: (i) gives a strong proof of data integrity; (ii) protection against small corruptions; (iii) allows public verifiability.
Limitations: (i) lack of error-correcting codes to address corruption concerns; (ii) lack of privacy preservation; (iii) no dynamic support; (iv) unbounded number of queries.

PDP scheme based on MAC
Method used: Message Authentication Code.
Advantages: (i) simple and secure technique; (ii) gives a strong proof of data integrity.
Limitations: (i) limited number of verifications with a limited number of secret keys; (ii) the data owner has to retrieve the entire file F from the server in order to compute new MACs, which is not feasible for large files; (iii) public auditability is not supported, as the private keys are required for verification.

Scalable PDP
Method used: cryptographic hash function and symmetric-key encryption.
Advantages: (i) provides secure PDP by encryption; (ii) supports dynamic operations on outsourced data blocks; (iii) lightweight PDP scheme, as it supports a homomorphic hash function.
Limitations: (i) limited number of updates and challenges; (ii) does not support block insertions at arbitrary positions; only append-type insertions are possible; (iii) problematic for large files, as each update requires re-creating all the remaining challenges.

Dynamic PDP
Method used: rank-based authenticated skip list.
Advantages: (i) offers fully dynamic operations; (ii) efficient integrity verification through querying and updating in the DPDP scenario.
Limitations: (i) the client needs to perform extra computation; (ii) not suitable for thin clients; (iii) DPDP does not include provisions for robustness.

Proof of Retrievability
Method used: encryption.
Advantages: (i) reduces the computational and storage overhead of the client as well as the CSP; (ii) minimizes the size of the proof of data integrity, reducing network bandwidth.
Limitations: (i) works only with static data sets; (ii) supports only a limited number of challenge queries, since it deals with a finite number of check blocks; (iii) a POR does not provide prevention for the file stored on the CSP.

POR based on keyed hash function hk(F)
Method used: keyed hash function.
Advantages: (i) simple and easily implementable.
Limitations: (i) a separate key is needed for each check; (ii) high computation cost; (iii) puts the computational burden on the client as well as the server.

POR for large files
Method used: sentinel-based scheme.
Advantages: (i) ensures both possession and retrievability of files on the CSP.
Limitations: (i) newly inserted sentinels and error-correcting codes add computational overhead; (ii) increases input/output and transmission cost across the network; (iii) works only with static data.

High-Availability and Integrity Layer (HAIL)
Method used: MAC, pseudorandom function, hash function.
Advantages: (i) allows the user to store data on multiple clouds.
Limitations: (i) applicable only to static data; (ii) not suitable for thin clients.

POR based on selecting random bits in data blocks
Method used: generation of metadata.
Advantages: (i) suitable for thin clients; (ii) puts minimum storage overhead on the client and the CSP.
Limitations: (i) applicable only to static data; (ii) no data-prevention mechanism is implemented in this technique.

V. CONCLUSION
In the world of cloud computing, data integrity is among the most challenging and pressing security issues. Considering its importance, this paper has examined different existing data integrity techniques together with their merits and demerits, and the comparative study briefly contrasts them. It is concluded that there is still a need to design an efficient, dynamic and secure data integrity technique; this remains a wide-open area of research.

REFERENCES
[1] Wei-Tek Tsai, Xin Sun and Janaka Balasooriya, "Service-Oriented Cloud Computing Architecture", 7th International Conference on Information Technology: New Generations (ITNG), 2010.
[2] Sravan Kumar R and A. Saxena, "Data Integrity Proofs in Cloud Storage", Third International Conference on Communication Systems and Networks (COMSNETS), IEEE, pp. 1-4, 2011.
[3] G. Ateniese, R. D. Pietro, L. V. Mancini and G. Tsudik, "Scalable and Efficient Provable Data Possession", 4th International Conference on Security and Privacy in Communication Networks (SecureComm), Istanbul, Turkey, 2008.
[4] Balachandra Reddy Kandukuri, Ramakrishna Paturi V and Atanu Rakshit, "Cloud Security Issues", Proceedings of the IEEE International Conference on Services Computing, September 2009.
[5] Mahesh S. Giri, Bhupesh Gaur and Deepak Tomar, "A Survey on Data Integrity Techniques in Cloud Computing", International Journal of Computer Applications, Volume 122, No. 2, July 2015.
[6] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson and D. Song, "Provable Data Possession at Untrusted Stores", Proceedings of the 14th ACM Conference on Computer and Communications Security, 2007.
[7] S. Ramgovind, M. M. Eloff and E. Smith, "The Management of Security in Cloud Computing", Information Security for South Africa (ISSA), IEEE, pp. 1-7, 2010.
[8] C. Erway, C. Papamanthou and R. Tamassia, "Dynamic Provable Data Possession", Proceedings of the 16th ACM Conference on Computer and Communications Security, 2009.
[9] R. Sravan Kumar and A. Saxena, "Data Integrity Proofs in Cloud Storage", Third International Conference on Communication Systems and Networks (COMSNETS), Bangalore, India, 2011.
[10] S. Chandran and M. Angepat, "Cloud Computing: Analyzing the Risks Involved in Cloud Computing Environments", Proceedings of Natural Sciences and Engineering, Sweden, 2010.
[11] Kevin D. Bowers, Ari Juels and Alina Oprea, "Proofs of Retrievability: Theory and Implementation", Proceedings of the ACM Workshop on Cloud Computing Security (CCSW '09), 2009.
[12] R. Curtmola, O. Khan, R. Burns and G. Ateniese, "MR-PDP: Multiple-Replica Provable Data Possession", Proceedings of the 28th IEEE ICDCS, 2008.
[13] K. D. Bowers, A. Juels and A. Oprea, "HAIL: A High-Availability and Integrity Layer for Cloud Storage", Proceedings of the 16th ACM Conference on Computer and Communications Security, 2009.
[14] E. Aguiar, Y. Zhang and M. Blanton, "An Overview of Issues and Recent Developments in Cloud Computing and Storage Security", in High Performance Cloud Auditing and Applications, Springer, pp. 3-33, 2014.
[15] I. Gul and M. Islam, "Cloud Computing Security Auditing", International Conference on Next Generation Information Technology (ICNIT), Gyeongju, Korea, pp. 143-148, 2011.
[16] M. A. Shah, M. Baker, J. C. Mogul and R. Swaminathan, "Auditing to Keep Online Storage Services Honest", Proceedings of the 11th USENIX Workshop on Hot Topics in Operating Systems, 2007.
