The Research on Cache Management of Cloud Storage

International Journal of Chaotic Computing (IJCC), Volume 1, Issue 2, December 2012 The Research on Cache Management of Cloud Storage Si Chengxiang National Computer Network Emergency Response Technical Team, Coordination Center of China (CNCERT/CC) Abstract performance, cache data usually is updated to ended storage asynchronously or periodically. Due to With rapid development of data intensive asynchronous update between cache and ended application technology, such as scientific computing, storage, if an error about hardware or software data analytical and data mining etc, cloud storage causes system crash, cache data will be lost. It may with high availability and scalability can satisfy the result in data inaccessible and interruption of application demand for the availability, scalability application service; therefore, the system availability and performance of shared storage. Cache located brings down. (2) Effect on system scalability. In the on data path of shared storage has an important environment of cloud storage, in order to improve influence on the performance of system. In cloud handing ability, it usually concurrently accesses the storage environment based on the opening storage ended storage through several IO paths. As a result, architecture, due to the inadaptability of the cache there are several copies in different paths; however, management, cache located on access paths provides every copy is managed independently, which will performance acceleration, but limits availability and result in cache inconsistency for multiply paths. One scalability of cloud storage system. This paper modified copy is known by local cache, but it is studies previous cache management technology of invisible to other copies in other paths, which results cluster or distributed system, and puts emphasis on in cache inconsistency. availability and consistency, which bring down From the above, with the development of cloud availability and scalability of cloud storage system. computing, it needs high performance, availability Moreover, this paper also analyzes the adaptability and scalability of storage service. In cloud storage of previous cache management technology for cloud environment based on the opening storage storage environment. Finally, we point out the future architecture, due to the inadaptability of the cache work which we will concern. management, cache located on access paths provides performance acceleration, but limits availability and 1. Introduction scalability of cloud storage system. This paper studies previous cache management technology of With rapid growth of the information and data cluster or distributed system, and puts emphasis on scale, computing ability of IT systems are facing availability and consistency, which bring down challenges. The emergence of cloud computing may availability and scalability of cloud storage system. solve the problem. Cloud Computing is a technology Moreover, this paper also analyzes the adaptability that uses the internet and central remote servers to of previous cache management technology for cloud maintain data and applications. It provides user storage environment. hardware, platform, software and storage services. As is known, data analysis includes data computing 2. Previous researches on cache and data storage, so storage system is very important. management of cluster With the explosive growth of data scale, cloud computing system provides high demand for In this section, we will study previous cache capacity, performance, reliability and scalability of management technology of cluster or distributed storage system. Cloud storage system based on the system, and pay attention to availability and opening storage architecture is widely noted. It consistency. In terms of different usage modes, we usually employs general hardware and open source will divide different cache management into the software to construct storage system. following types to illustrate. Cache is an effective method to improve IO performance of storage system. It employs data 2.1 cache management of cluster file system access locality to improve performance and brings down access latency of applications. In the According to the cooperative relation between environment of cloud storage, cache located on cache nodes, we divide into two types—cooperative access paths not only improves storage performance, cache and non-cooperative cache. but makes an important effect on availability and scalability of cloud storage system. To be specific, (1) effect on system availability. In order to improve Copyright © 2012, Infonomics Society 14 International Journal of Chaotic Computing (IJCC), Volume 1, Issue 2, December 2012 2.1.1 Cooperative cache SAN storage network. In order to improve system performance, clients of Storage Tank employs client zFS [1] [2] is developed by the IBM Haifa lab. memory as cache. To be noticed, meta-data and data The entire file system consists of clients, file were separated. The separated way enables tiered managers, cooperative cache managers, lease storage according to importance of data. The cache managers, transition managers and oriented-object consistency was guaranteed by the lease and lock storage devices. It employed all memory of nodes mechanisms. If one client needs to modify one local as cache of the entire file system to improve block copy, it must acquire the according lease from performance. As illustrated in figure 1, when client the meta-data cluster. In the valid time of a lease, it A accessed the data on the server, there is no the can modify the block copy. Once a client acquires corresponding cache copy, and then the request will the lock of a block object, it’s no need to be forwarded to the cooperative cache. The communicate with the meta-data server. When a cooperative cache found out the cache copy in the lease is invalid, the client submitted the modified client B through some retrieval operations, and then data to the corresponding meta-data cluster. The the request will be forwarded to client B. Client B meta-data server is responsible for copy updates will send the cache copy to client A; therefore, client between clients. The lease mechanism effectively A needn’t to acquire data from storage device. The reduced cache consistency overhead. cooperative cache in zFS can only buffer read data, and cache consistency is guaranteed by the lease and distributed transaction mechanisms. Figure 1. Cooperative cache of zFS Figure 2. The architecture of storage tank Reference [3] indicated that with the development NFS [5] is widely used in the industry, and has of networks, employing cache of other clients is been widely transplanted into all operating systems. feasible for distributed file system. In the reference, The property information of client files or directories remote client memory is regarded as one level of the is kept in client cache for a certain time. To be noted, storage architecture, and client memory, remote the client never keeps file data in its cache. NFS client memory, server memory and server storage adopts the timeout mechanism to distinguish whether construct the entire storage architecture. The special client cache is effective. To be specific, while data manager is responsible for the global cache staying in cache exceeds a certain time, NFS will management and consistency. For the cooperative refresh the cache data at the following access. The cache, due to unreliability of write operations, it cache reliability of NFS is weak, and the timeout adopted synchronous-update strategy and only kept mechanism also doesn’t guarantee cache consistency read data in cache. In the aspect of cache for multiple clients fundamentally. consistency, it employed the read-write token Reference [6] adopted NVRAM to improve write mechanism, and the central manager is responsible performance and reliability of the distributed file for token allocation and reclamation. When a cache system. It added NVRAM into client and server sides block is modified, it adopted the “write-invalidate” respectively as temporary cache for write data. Due strategy to notify the according clients to invalidate to the limit of cache consistency, this way is only the cache block. appropriate for the distributed file system whose write data buffered in clients, such as Sprite and 2.1.2 Non-cooperative cache AFS. NVRAM can guarantee performance and reliability of cache effectively, but it needs special Storage Tank [4] is a SAN file system designed hardware which results in weak generality. by IBM. It supports multiple operating systems and Moreover, NVRAM is very expensive and low cost its scalability is very high. As illustrated in figure 2, performance for a long time. In the aspect of cache it consists of meta-data clusters, SAN storage consistency, it followed cache consistency protocols network, management server, and client clusters. A of its host file system. For example, Sprite adopted client accesses data of storage devices through the the “concurrent write-sharing” protocol. When Copyright © 2012, Infonomics Society 15 International Journal of Chaotic Computing (IJCC), Volume 1, Issue 2, December 2012 multiple clients open a file concurrently, and one of applications. For hot data replicated among several them modifies the file, the server will notify every nodes, when one node failed, other members will client to invalidate its cache. take over. As a result, upper applications will access Clients of coda file system [7][8] adopt local the original data, which can

Load more