An Example of NFS, Ceph, Hadoop

International Journal of Advances in Electronics and Computer Science, ISSN: 2393-2835 Volume-4, Issue-2, Feb.-2017 http://iraj.in

A STUDY ON DISTRIBUTED FILE SYSTEMS: AN EXAMPLE OF NFS, CEPH, HADOOP

MAHMUT UNVER, ATILLA ERGUZEN
Computer Engineering, Kırıkkale University, Turkey

Abstract- Distributed File Systems (DFSs) are systems used in both local and wide area networks that bring disks, storage areas and resources together. Today, DFSs make it possible to operate on big data and to carry out large-scale computations and transactions. DFSs are classified according to their working logic and architecture. These classifications are based on fault tolerance, replication, naming, synchronization and design purpose. In this study, the general design of DFSs is examined first. The results are then discussed in terms of the advantages and disadvantages of Ceph, Hadoop and the Network File System (NFS), which are commonly used today.

Keywords- Distributed file system, Network file system (NFS), Hadoop, Ceph, fault tolerance, synchronization, replication, naming, operating system.

I. INTRODUCTION

Computer systems have undergone major evolutions. The first was the development of powerful microprocessors in the 1980s, from 8-bit to 64-bit processing. These computers were as powerful as mainframes, while their instruction processing costs were low. The second evolution is the widespread use of high-speed local networks with a large number of nodes, which made it possible to transfer 1 gigabit of data per second. As a result of these developments, distributed systems that use multiple computers connected by high-speed networks appeared as an alternative to a single powerful computer with one processor [1].

The first DFSs were developed in the 1970s. These were storage systems connected in an FTP-like structure, and they were not commonly used because of their limited storage space. L. Svoboda reported the first study on DFSs [2], and various DFSs such as LOCUS, ACORN, SWALLOW and XDFS were developed in those years. Studies on DFSs have continued ever since. Today's DFSs are generally designed analogously to classical time-sharing systems and are generally based on the UNIX file system. The purpose of such a system is to combine the files and storage systems of different computers [3].

DFSs process data generated in different ways on digital platforms, and they do so safely, efficiently and rapidly. The rapid growth of data and the need for rapid access to it have driven the growth of data storage resources. The large increase in data created a new concept, Big Data. Distributed file systems are used to process big data and to perform operations quickly. They are now used effectively by cloud systems. The files of a DFS are stored on one or more computers, each of which is a server, and computers called clients access those files as if they were on a single system [4].

DFSs have been designed for different goals. For example, the purpose of the Andrew File System (AFS) is to provide a distributed system that can support up to 5000 clients [5]. The Network File System (NFS) uses the Remote Procedure Call (RPC) communication model. RPC creates an intermediate layer between server and client, so the client performs operations without knowing the server's file system. This allows clients and servers with different file systems to work together smoothly [6]. The purpose of the Google File System (GFS) is to work with big data, which it achieves by using a large amount of low-cost equipment. Another DFS with a very different structure is XFS. It keeps very large files stable. XFS also has no generic server; the entire file system is distributed over the clients. Ceph separates the data from the metadata that holds information about the data. It replicates data, which increases the system's fault tolerance.

In this study, DFSs are compared using specific classifications. The introduction gives general information about DFSs. The second section describes the general architectural structures of DFSs and explains the basic concepts. The third section determines and explains the classification criteria used for the comparison. The fourth section describes currently active DFSs according to the criteria specified in the third section. The last section presents the results and comparisons.

II. GENERAL STRUCTURE OF DISTRIBUTED FILE SYSTEMS

The overall design goal of DFSs is to use fewer local hardware resources by sharing hardware resources. Besides the hardware advantages, a DFS also has advantages in managing files, which is equally important in the general design. For example, attention has been paid to the level of transparency of the DFS in order to overcome access problems caused by the network [7]. DFSs are designed to provide file services to file system clients. In this structure, clients use interfaces to create, delete, read and write files and to perform directory operations. The operating system used to perform these operations may be a distributed operating system, or there may be an intermediate layer between the operating system and the distributed file system [8].

The architecture of a DFS is generally based on one of three structures:
-Server-Client based structures
-Cluster based structures
-Symmetric structures

The Server-Client based architecture has been used extensively in DFSs. There are two models in this architecture.

Fig.1. The Remote access model.

The first is the remote access model. In this model, the client is provided with an interface containing various file operations. File operations are performed through this interface, and the server has to respond to each request.

Fig.2. Upload/download access model.

The second is the upload/download model. Unlike the remote access model, in this model the client downloads the file it will process and accesses it locally. The client-server model is used in NFS. Nowadays, NFS is becoming the most used DFS [1].

The cluster based architecture does not have a single server; there are multiple servers in the system. One of the servers is the master server, which keeps the metadata of the data. The other servers are chunk servers. With more than one chunk server, the system can handle multiple clients at the same time. With this architecture, very large data can be processed. An example of this architecture is the Google File System (GFS).

Fig.3. Clustered-based architecture.

The most important difference between DFSs with a symmetric architecture is whether they build a file system on top of a distributed storage layer or store all files on the nodes themselves. This architecture consists of three separate layers. The first layer provides basic decentralized lookup facilities. The middle layer is a fully distributed block-oriented storage layer. The top layer implements a file system [1].

III. CLASSIFICATION CRITERIA

Several criteria that affect server quality are used to classify DFSs. The most important of these are as follows:

A. Fault Tolerance: When any part of the distributed system fails, the failure is tolerated without being noticed by the client [1].

B. Transparency: The distributed system appears to the client as a single server. It is the most important criterion affecting system design.

C. Replication: More than one copy of each file used in the system is created and stored in the distributed system. This improves reliability. If one copy is not accessible, the system continues to work using another copy.

D. Synchronization: Copies of a file exist on different servers. A change that a client makes to one copy is also made to the other copies.

E. Naming: Names refer to all resources in the distributed system: computers, services, users and remote objects. The distributed system must name objects consistently; otherwise, the objects cannot be accessed.

IV. DISTRIBUTED FILE SYSTEMS

1.1. Network File System (NFS)

Development of NFS started in 1984. The project was developed by Sun Microsystems. It is the most used and most widely implemented DFS on UNIX systems. It uses the Remote Procedure Call (RPC) model for communication [9].

Fig.4. NFS architecture.

The latest version is NFS version 4. The basic design is a distributed implementation of the classic UNIX file system. A virtual file system is used, which works like an intermediate layer and allows clients to work easily with different file systems. It is an interface placed between operating system calls and file system calls. In the latest version, more than one command can be sent in a single RPC.

[...] at the same time. These are object-based storage, block-based storage and a file system. The most important features of Ceph are reliability and scalability.

Metadata is the data that holds information about the data. In distributed systems in general, the metadata is located on separate servers, and data cannot be accessed when the metadata is not available. Ceph does not need a metadata server to locate data. Instead of a metadata server, it uses an algorithm, called CRUSH, that determines the location of the data. Clients use this algorithm to determine the position of the data and read it. With this algorithm, the problem of unreachable metadata does not arise.

In Ceph, more than one copy of the data is kept, distributed across the servers. This is how it performs replication.

According to workload measurements, Ceph has very good input/output performance. It has scalable metadata management that allows up to
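The idea behind the placement approach described for Ceph, that a client computes where data lives instead of asking a metadata server, can be illustrated with a small sketch. The code below is not the CRUSH algorithm; it uses rendezvous (highest-random-weight) hashing as a simpler stand-in that shares the same property: any client that knows the object name and the list of nodes arrives at the same replica locations without a lookup. The function names, the node names and the default replica count of 3 are assumptions made only for this illustration.

```python
import hashlib


def _score(object_name: str, node: str) -> int:
    """Deterministic pseudo-random score for an (object, node) pair."""
    digest = hashlib.sha256(f"{object_name}|{node}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Return the nodes that should hold the copies of object_name.

    Rendezvous hashing (a stand-in for CRUSH, not CRUSH itself): every node
    is ranked by its score for this object and the top `replicas` are kept.
    """
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested number of replicas")
    ranked = sorted(nodes, key=lambda n: _score(object_name, n), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    cluster = [f"osd{i}" for i in range(6)]  # hypothetical node names
    print(place("dataset-0001", cluster))
```

Because the result depends only on the object name and the node list, every client obtains the same answer independently. If a node that holds a copy fails, the surviving copies keep their positions and only the lost copy is assigned to a new node, which is one way replication and fault tolerance can work together.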
