Improving Performance of the Distributed File System Using Speculative Read Algorithm and Support-Based Replacement Technique
International Journal of Advanced Research in Engineering and Technology (IJARET)
Volume 11, Issue 9, September 2020, pp. 602-611, Article ID: IJARET_11_09_061
Available online at http://iaeme.com/Home/issue/IJARET?Volume=11&Issue=9
ISSN Print: 0976-6480 and ISSN Online: 0976-6499
DOI: 10.34218/IJARET.11.9.2020.061
© IAEME Publication, Scopus Indexed

Rathnamma Gopisetty
Research Scholar, Jawaharlal Nehru Technological University, Anantapur, India

Thirumalaisamy Ragunathan
Department of Computer Science and Engineering, SRM University, Andhra Pradesh, India

C Shoba Bindu
Department of Computer Science and Engineering, Jawaharlal Nehru Technological University, Anantapur, India

ABSTRACT
Cloud computing systems use distributed file systems (DFSs) to store and process the large volumes of data generated by organizations. Users of web-based information systems perform read operations very frequently and write operations only infrequently on the data stored in a DFS. Various caching and prefetching techniques have been proposed in the literature to improve the performance of read operations carried out on a DFS. In this paper, we propose a novel speculative read algorithm that improves read performance by exploiting the client-side and global caches present in the DFS environment. The proposed algorithm launches speculative executions that read the data from the local client-side cache, the local collaborative cache, the global cache, and the global collaborative cache. One of the speculative executions is then selected based on a timestamp verification process. The proposed algorithm reduces the write/update overhead and hence performs better than the non-speculative algorithm proposed for a similar DFS environment.
Key words: Distributed file system, Client-side caching, Prefetching, Hierarchical caching, Replacement algorithms, Speculative execution

http://iaeme.com/Home/journal/IJARET 602 [email protected]

Cite this Article: Rathnamma Gopisetty, Thirumalaisamy Ragunathan and C Shoba Bindu, Improving Performance of the Distributed File System Using Speculative Read Algorithm and Support-Based Replacement Technique, International Journal of Advanced Research in Engineering and Technology, 11(9), 2020, pp. 602-611. http://iaeme.com/Home/issue/IJARET?Volume=11&Issue=9

1. INTRODUCTION
In the emerging big data scenario, web-based information systems (WISs) are being deployed on cloud computing systems to serve millions of online users efficiently. Cloud computing systems provide scalable storage and computing power, so deploying a WIS benefits both the owner organizations and the users of the WIS. A cloud computing system uses a distributed file system (DFS) at the back end to store large data in a distributed manner so that storage remains scalable.

The DFS environment consists of data nodes (DNs) and name nodes (NNs). The DNs store the data permanently on hard disks, while the NNs store the metadata. Note that the read and update requests submitted by users are carried out on the DNs. A WIS receives read requests very frequently and write/update requests much less frequently. Hence, improving the performance of read operations carried out on the DFS has become an important issue in the current context.

2. RELATED WORK
In this section, we discuss the speculative techniques that have been proposed in the literature for improving the performance of distributed file systems.
Modern processors use the speculative execution technique to alleviate the pipeline-stall problem by executing the next likely instructions in advance [1]–[5]. A large body of research has been dedicated to improving the performance of storage systems by reducing disk input/output operations. A speculation-based method discussed in [6] starts the speculative execution using the local cache contents first; the timestamps obtained from the server disk and the local caches are then compared to decide whether to commit the execution. An aggressive hint-driven prefetching system [7] speculatively pre-executes the application's code in order to discover and issue hints for its future read accesses, but applications must be modified manually to issue the hints. Big-data systems such as Google MapReduce, Apache Hadoop, and Apache Spark rely on speculative execution [8, 9] to mask slow tasks and thereby shorten job execution times.

3. PROPOSED SPECULATIVE READ ALGORITHM
Speculative execution is an important technique in computer architecture that enables parallel execution of serial programs: instructions are executed ahead of their normal schedule. This section describes how the proposed speculative read algorithm works.

To implement speculative execution, the system predicts the result of a particular operation and continues executing based on that predicted value. When the operation actually completes, the real result is compared with the predicted one. If the prediction was correct, the system commits the speculative state; otherwise, it rolls the system state back to a consistent prior point in time. Speculative execution has been used to improve performance in many systems, including distributed file systems [6].
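The predict/execute/compare/commit-or-rollback cycle described above can be sketched as follows. This is a minimal illustration with hypothetical names, not the paper's algorithm: `operation` runs ahead of time on a predicted input, and the result is kept only if the real input, obtained later via `get_actual`, matches the prediction.

```python
import copy

def speculative_execute(state, operation, predicted, get_actual):
    """Run `operation` on a predicted value ahead of time; commit the
    speculative state if the prediction matches the actual value,
    otherwise roll back to the checkpoint and re-execute."""
    checkpoint = copy.deepcopy(state)                  # snapshot for rollback
    spec_state = operation(copy.deepcopy(state), predicted)  # run ahead of time
    actual = get_actual()                              # real result, available later
    if actual == predicted:
        return spec_state                              # prediction correct: commit
    return operation(checkpoint, actual)               # mispredict: rollback and redo

# Example: append a value we predict a slow read will return.
op = lambda st, v: st + [v]
print(speculative_execute([1], op, predicted=2, get_actual=lambda: 2))  # [1, 2]
print(speculative_execute([1], op, predicted=2, get_actual=lambda: 3))  # [1, 3]
```

On a correct prediction the speculative work is simply kept, which is where the latency saving comes from; a misprediction costs one re-execution from the checkpoint.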
In the hierarchical global collaborative caching (HCGC) algorithm proposed in [12], a read operation searches for the requested file block sequentially, from the lowest-level cache to the highest-level cache of the multi-level caches available in the system. It would be beneficial to execute the read operation concurrently instead, because the hierarchical caches are pre-filled with the popular file blocks, so the requested block is likely to be found in one (or more) of the caches. The motivation behind the speculative read algorithm is therefore to reduce read latency.

The proposed speculative read algorithm allows the application to concurrently execute the read operation on the contents of the local cache (LC), the local collaborative cache (LCC), the global cache (GC), the global collaborative cache (GCC), and the disk. The algorithm works together with the support-based prefetching [11], hierarchical collaborative global caching, and support-based replacement algorithms [12] as follows. Whenever a client application program running in a DN requests a file block Fb, the local cache manager simultaneously checks the LC, the LCC, the GC, and the GCC for the file block by communicating with the appropriate cache managers and creating speculative executions (SPs). A speculative thread is created for each speculative execution; the threads are named SP1, SP2, SP3, SP4, SP5, SP6, and SP7. Thread SP1 represents the speculative execution initiated at the local cache (LCDN). SP2 represents the speculative execution initiated at one of the local caches on the same rack (the local collaborative cache, LCC).
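The concurrent probing of the cache tiers can be sketched as below. This is a simplified illustration under assumed names (the paper's cache managers and SP threads are modeled as plain dictionaries and Python threads): one lookup thread per tier runs in parallel, and every tier that holds a copy of the block reports it, tagged with its tier name.

```python
import threading
import queue

def speculative_read(block_id, caches):
    """Launch one speculative lookup per cache tier (LC, LCC, GC, GCC)
    concurrently and collect every copy found, tagged by tier."""
    results = queue.Queue()

    def probe(tier, cache):
        entry = cache.get(block_id)          # entry is (data, timestamp) or None
        if entry is not None:
            results.put((tier, entry))

    threads = [threading.Thread(target=probe, args=(tier, c))
               for tier, c in caches.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                             # wait for all probes to finish

    found = {}
    while not results.empty():
        tier, entry = results.get()
        found[tier] = entry
    return found

# Example tiers: a miss in LC/GCC, a stale copy in LCC, a fresh copy in GC.
caches = {
    "LC":  {},
    "LCC": {"Fb": (b"data", 41)},
    "GC":  {"Fb": (b"data", 42)},
    "GCC": {},
}
print(speculative_read("Fb", caches))
```

In the actual algorithm each hit would continue as its own speculative execution; here the hits are merely collected so that a later timestamp check can pick one.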
The speculative execution started at the global cache (GC) present on the same rack is carried out by thread SP3. SP4 is the thread created to represent the speculative execution started at the global collaborative cache (a global cache present on any rack, GCC). The speculative execution started at the nearest data node (DN) is represented by thread SP5.

Thread SP1 starts its execution only if the requested file block Fb is available in the local cache (LCDN); otherwise, it is terminated. SP2 starts only if Fb is available in any one of the local caches present on the same rack, i.e., the local collaborative cache (LCC). SP3 starts only if Fb is available in the global cache (GC) connected to the same rack. SP4 starts only if Fb is available in any one of the global caches present on any rack, i.e., the global collaborative cache (GCC). Simultaneously, the DFS client program running in that DN starts a speculative execution by invoking thread SP5 to communicate with the NN and read the metadata (the addresses of the data nodes where the file block is available and the timestamp of the file block).

Next, the timestamp value returned by the NN is checked against the timestamp values of the copies in the local cache, the local collaborative cache, the global cache, and the global collaborative cache. If the timestamp value of any one of the cached copies matches, the speculative execution initiated at that cache is allowed to continue, and the speculative executions initiated at the remaining caches are terminated. If the timestamp value does not match any of the cached copies, the speculative execution initiated at the disk is committed. Algorithm 1 describes the speculative read algorithm.
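The timestamp verification step above can be sketched as follows. This is a minimal sketch with assumed names, not the paper's Algorithm 1: the NN's timestamp is compared against each cached copy, the first matching copy (checked in nearness order LC, LCC, GC, GCC) is committed, and the disk read is committed only when every cached copy is stale.

```python
def commit_speculation(nn_timestamp, cache_copies, read_from_disk):
    """Pick one speculative execution to commit: the first cached copy
    whose timestamp matches the NameNode's, else fall back to disk.
    `cache_copies` maps tier name -> (data, timestamp)."""
    for tier in ("LC", "LCC", "GC", "GCC"):      # nearest tier first
        entry = cache_copies.get(tier)
        if entry is not None and entry[1] == nn_timestamp:
            return tier, entry[0]                # commit this speculation
    return "DISK", read_from_disk()              # all copies stale: commit disk read

# Example: the LCC copy is stale, the GC copy matches the NN timestamp.
copies = {"LCC": (b"old", 40), "GC": (b"fresh", 42)}
print(commit_speculation(42, copies, lambda: b"disk"))  # ('GC', b'fresh')
print(commit_speculation(43, copies, lambda: b"disk"))  # ('DISK', b'disk')
```

Committing the nearest fresh copy is what lets the algorithm answer from a cache while still guaranteeing the reader never observes a stale block.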