International Journal of Scientific Research and Review ISSN NO: 2279-543X

DESIGN AND IMPLEMENTATION OF HADOOP DISTRIBUTED FILE SYSTEM USING ELLIPTIC CURVE CRYPTOGRAPHY TO ENHANCE SECURITY: A STUDY

Regonda Nagaraju, Department of Computer Science and Engineering, Sreyas Institute of Engineering and Technology, Hyderabad, Telangana, India.

ABSTRACT:-

In recent years, a number of platforms for building Big Data applications, both open-source and proprietary, have been proposed. One of the most popular platforms is Apache Hadoop, an open-source software framework for distributed storage and processing of large data sets, used by leading companies such as Yahoo. Historically, earlier versions of Hadoop did not prioritize security, so Hadoop has continued to make security modifications. In particular, the Hadoop Distributed File System (HDFS), upon which Hadoop modules are built, did not provide robust security for user authentication. This paper proposes a token-based authentication scheme that protects sensitive data stored in HDFS against replay and impersonation attacks. The proposed scheme allows HDFS clients to be authenticated by the data node via the block access token. Unlike most HDFS authentication protocols, which adopt public key exchange approaches, the proposed scheme uses a hash chain of keys. The proposed scheme offers performance (communication power, computing power and area efficiency) as good as that of existing HDFS systems.

KEYWORDS: HDFS (Hadoop Distributed File System), Big Data, Hash Chain.

1. INTRODUCTION

The Hadoop Distributed File System (HDFS) has a master/slave architecture. An HDFS cluster consists of a single Name Node, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of Data Nodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files.
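To make the client-facing side of this architecture concrete, the following minimal Java sketch (not taken from the paper; the cluster address and file path are illustrative assumptions) writes and then reads a file through Hadoop's FileSystem API. The Name Node resolves the namespace operations, while the file content itself is stored as blocks on Data Nodes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        // Cluster address is an assumption; in practice it usually comes from core-site.xml.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/sample.txt");   // illustrative path

        // The Name Node handles the namespace operation; the data is streamed to Data Nodes in blocks.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("hello HDFS");
        }

        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
        fs.close();
    }
}
```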


Internally, a file is split into one or more blocks and these blocks are stored in a set of Data Nodes. The Name Node executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to Data Nodes.

The Data Nodes are responsible for serving read and write requests from the file system's clients. The Data Nodes also perform block creation, deletion, and replication upon instruction from the Name Node.

The Name Node and Data Node are pieces of software designed to run on commodity machines. These machines typically run a GNU/Linux operating system (OS). HDFS is built using the Java language; any machine that supports Java can run the Name Node or the Data Node software. A typical deployment has a dedicated machine that runs only the Name Node software. Each of the other machines in the cluster runs one instance of the Data Node software.

The existence of a single Name Node in a cluster greatly simplifies the architecture of the system. The Name Node is the arbitrator and repository for all HDFS metadata. The system is designed in such a way that user data never flows through the Name Node.

1.1 EXISTING SYSTEM

Data is currently the most crucial asset of any organization in every field, and many operations can be performed with it. The continuous growth in the importance and volume of data has created a new problem: it cannot be handled by conventional analysis techniques. This problem was therefore addressed through the advent of a new paradigm: Big Data. Big Data has introduced new problems associated with data security and privacy. This survey paper covers several security issues as well as privacy issues in Big Data. Big Data is often heterogeneous, noisy, dynamic and untrustworthy. Nevertheless, even noisy Big Data can be more valuable than tiny samples, because general statistics obtained from frequent patterns and correlation analysis usually overpower individual fluctuations and often disclose more reliable hidden patterns and knowledge. Further, interconnected Big Data forms large heterogeneous data networks, with which information redundancy can be explored to compensate for missing data, to crosscheck conflicting cases, to validate trustworthy relationships, to disclose inherent clusters, and to discover hidden relationships and models. Big Data analysis is the technique of applying advanced analytics and visualization techniques to large data sets to discover hidden patterns and unknown correlations for effective decision making. The analysis of Big Data involves multiple distinct phases, which consist of data acquisition and recording, data extraction and cleaning, data integration, aggregation and representation, query processing, statistical modeling, and analysis and interpretation. Each of these phases introduces challenges.

2. PROPOSED SYSTEM

Earlier versions of Hadoop did not prioritize security, so Hadoop has continued to make security modifications. In particular, the Hadoop Distributed File System (HDFS), upon which Hadoop modules are built, did not provide robust security for user authentication. This paper proposes a token-based authentication scheme that protects sensitive data stored in HDFS against replay and impersonation attacks. The proposed scheme allows HDFS clients to be authenticated by the data node via the block access token. Unlike most HDFS authentication protocols, which adopt public key exchange approaches, the proposed scheme uses a hash chain of keys. The proposed scheme offers performance (communication power, computing power and area efficiency) as good as that of existing HDFS systems.

A token-based authentication scheme is proposed that protects sensitive data stored in HDFS against replay and impersonation attacks. The clients of HDFS can be authenticated by the data node using the block access token.

Unlike most HDFS authentication protocols, which adopt public key exchange approaches, the proposed scheme uses a hash chain of keys. The proposed scheme offers performance (communication power, computing power and area efficiency) as good as that of existing HDFS systems. Apache Hadoop provides strong authentication for HDFS data.

Fig. i: HDFS Architecture

All HDFS accesses must be authenticated:

1. Access from users logged in on cluster gateways
2. Access from any other service or daemon (e.g. Catalog server)
3. Access from Map Reduce tasks

In this approach, ECC is an approach to public-key cryptography based on the algebraic structure of elliptic curves over finite fields. ECC requires smaller keys compared to non-ECC cryptography [4] to provide equivalent security.
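As a rough illustration of the proposed idea, the sketch below is illustrative only: the class names, the seed, and the chain length are assumptions, and it is not Hadoop's actual block token code. It derives a per-interval key from a hash chain of keys (the hash chain construction is explained in Section 2.1 below), uses that key to MAC a block access token for a client, and lets a data node that knows the published chain anchor check both the key and the token.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative sketch only: block access tokens keyed by a hash chain, not Hadoop's real token code. */
public class HashChainBlockTokenSketch {

    /** One step of the hash chain: k_i = SHA-256(k_{i+1}). */
    static byte[] h(byte[] in) throws Exception {
        return MessageDigest.getInstance("SHA-256").digest(in);
    }

    /** MAC of (blockId || user) under the current chain key acts as the block access token. */
    static byte[] issueToken(byte[] chainKey, String blockId, String user) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(chainKey, "HmacSHA256"));
        return mac.doFinal((blockId + "|" + user).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // Seed of the chain (an assumption; in the scheme it would be a shared secret of the name node).
        byte[] seed = "name-node-master-secret".getBytes(StandardCharsets.UTF_8);

        // The name node publishes the chain anchor h^3(seed); earlier keys are revealed one per interval.
        byte[] k3 = h(h(h(seed)));   // anchor known to data nodes
        byte[] k2 = h(h(seed));      // key for the current interval

        // Client obtains a token for block "blk_1001" from the name node.
        byte[] token = issueToken(k2, "blk_1001", "alice");

        // Data node checks that k2 really belongs to the chain (it hashes to the anchor),
        // then recomputes the MAC and compares it with the presented token.
        boolean keyValid = Arrays.equals(h(k2), k3);
        boolean tokenValid = Arrays.equals(token, issueToken(k2, "blk_1001", "alice"));
        System.out.println("key in chain: " + keyValid + ", token valid: " + tokenValid);
    }
}
```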


Elliptic curves are applicable to encryption, digital signatures, pseudo-random generators and other tasks.

They are also used in several integer factorization algorithms that have applications in cryptography.
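As a concrete example of an elliptic-curve digital signature (a generic JDK sketch rather than part of the proposed scheme; the curve and message are assumptions), the following generates an EC key pair and signs and verifies a message with ECDSA:

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import java.security.spec.ECGenParameterSpec;

public class EcdsaSketch {
    public static void main(String[] args) throws Exception {
        // Generate an elliptic-curve key pair (NIST P-256 is an illustrative choice).
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("EC");
        kpg.initialize(new ECGenParameterSpec("secp256r1"));
        KeyPair keys = kpg.generateKeyPair();

        byte[] msg = "block access request".getBytes(StandardCharsets.UTF_8);

        // Sign with the private key.
        Signature signer = Signature.getInstance("SHA256withECDSA");
        signer.initSign(keys.getPrivate());
        signer.update(msg);
        byte[] sig = signer.sign();

        // Verify with the public key.
        Signature verifier = Signature.getInstance("SHA256withECDSA");
        verifier.initVerify(keys.getPublic());
        verifier.update(msg);
        System.out.println("signature valid: " + verifier.verify(sig));
    }
}
```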

2.1 HASH CHAIN

A hash chain is the successive application of a cryptographic hash function to a piece of data. A hash chain is a method to produce many one-time keys from a single key or password.
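A minimal sketch of constructing such a chain, assuming SHA-256 as the hash function and an illustrative seed and length, is:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class HashChainSketch {
    /** Returns h^n(seed): the seed hashed n times with SHA-256. */
    static byte[] chain(byte[] seed, int n) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        byte[] value = seed;
        for (int i = 0; i < n; i++) {
            value = sha.digest(value);   // successive application of the hash function
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        byte[] password = "secret-password".getBytes(StandardCharsets.UTF_8);
        // h^1000(password) is stored by the verifier; h^999, h^998, ... are released one at a time.
        System.out.printf("h^1000 has %d bytes%n", chain(password, 1000).length);
    }
}
```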

A server which needs to provide authentication may store a hash chain rather than a plain-text password, and so prevent theft of the password in transmission or theft from the server.

A server begins by storing h^1000(password), which is provided by the user. When the user wishes to authenticate, he supplies h^999(password) to the server. The server computes h(h^999(password)) = h^1000(password) and verifies that this matches the hash chain value it has stored. It then stores h^999(password) for the next time the user wishes to authenticate [7].
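A minimal sketch of this verify-and-update round (again assuming SHA-256; the class name is illustrative) is:

```java
import java.security.MessageDigest;
import java.util.Arrays;

/** Sketch of one hash-chain authentication round, following the h^1000 / h^999 example above. */
public class HashChainServerSketch {
    private byte[] stored;          // currently stored chain value, initially h^1000(password)

    public HashChainServerSketch(byte[] initialTip) {
        this.stored = initialTip;
    }

    /** Accepts the next-lower chain element (e.g. h^999) if hashing it once yields the stored value. */
    public boolean authenticate(byte[] presented) throws Exception {
        byte[] hashed = MessageDigest.getInstance("SHA-256").digest(presented);
        if (Arrays.equals(hashed, stored)) {
            stored = presented;     // remember h^999 so the next login must present h^998
            return true;
        }
        return false;
    }
}
```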


An eavesdropper seeing h^999(password) communicated to the server will be unable to re-transmit the same hash chain value to the server for authentication, since the server now expects h^998(password). Due to the one-way property of cryptographically secure hash functions, it is infeasible for the eavesdropper to reverse the hash function and obtain an earlier piece of the hash chain.

In this example, the user could authenticate 1000 times before the hash chain is exhausted. Each time the hash value is different, and thus cannot be duplicated by an attacker.

Big Data describes huge data sets that have more varied and complex structures, such as weblogs, social media, e-mail, sensors, and images. This less structured data differs from what traditional databases store and is normally associated with greater difficulties in storing and analyzing it, in applying similar techniques, or in extracting results. Big Data analytics is the process of examining enormous quantities of complex data in order to find unseen patterns or to identify hidden correlations. Since conventional database systems cannot be used to process this massive data, it poses several challenges to the research community. Security and privacy [6] are the crucial problems with data. However, there exists a conflict between Big Data security and privacy and the widespread use of Big Data. This paper offers insights into the evaluation of Big Data, its related challenges, privacy and security issues, and the differentiation between privacy and security [5] requirements in Big Data. We also focused on several privacy models which can be extended to the Big Data domain, analyzing the advantages and disadvantages of data anonymity privacy models.

3. REFERENCES

[1] D. Kornack and P. Rakic, "Cell Proliferation without Neurogenesis in Adult Primate Neocortex," Science, vol. 294, pp. 2127–2130, Dec. 2001.

[2] L. Xu, C. Jiang, J. Wang, J. Yuan, and Y. Ren, "Information security in big data: Privacy and data mining," IEEE Access, vol. 2, pp. 1149–1176, Oct. 2014.

[3] H. Hu, Y. Wen, T.-S. Chua, and X. Li, "Toward scalable systems for big data analytics: A technology tutorial," IEEE Access, vol. 2, pp. 652–687, Jul. 2014.

[4] Dr. Regonda Nagaraju, Reddygari Aishwarya Reddy, S. Yamini, P. Mythry, "Hybrid Encryption with Verifiable Delegation using," International Journal of Research in Electronics and Computer Engineering, vol. 7, issue 1, no. 2, pp. 2448–2450, January–March 2019.

[5] C. Hongbing, R. Chunming, H. Kai, W. Weihong, and L. Yanyan, "Secure big data storage and sharing scheme for cloud tenants," China Commun., vol. 12, no. 6, pp. 106–115, Jun. 2015.

[6] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, "Privacy-preserving multi-keyword ranked search over encrypted cloud data," IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 1, pp. 222–233, Jan. 2014.

[7] O. M. Soundararajan, Y. Jenifer, S. Dhivya, and T. K. P. Rajagopal, "Data security and privacy in cloud using RC6 and SHA algorithms," Netw. Commun. Eng., vol. 6, no. 5, pp. 202–205, Jun. 2014.
