Analysis and Concept of the New Profile Cluster for the UCN Domain
MASARYK UNIVERSITY
FACULTY OF INFORMATICS

Analysis and Concept of the New Profile Cluster for the UCN Domain

BACHELOR'S THESIS

Martin Janek

Brno, Spring 2009

Declaration

Hereby I declare that this thesis is my original authorial work, which I have worked out on my own. All sources, references and literature used or excerpted during the elaboration of this work are properly cited and listed in complete reference to the due source.

Advisor: Mgr. Pavel Tuček

Acknowledgement

I would like to express my deepest gratitude to my advisor Mgr. Pavel Tuček and consultant Mgr. Ing. Lukáš Rychnovský for their time, guidance and constructive advice.

Abstract

Masaryk University relies on a Microsoft Windows network to enable its users to access their files from any workstation connected to the UCN (University Computer Network) domain. The solution currently in use is soon to be replaced with new hardware. The aim of this thesis is to analyse the clustering options currently available in Windows Server 2008 and suggest the best solution for this purpose.

Keywords

failover cluster, high availability, redundancy, Windows Server 2008, storage performance, Windows domain profile, UCN

Contents

1 Introduction
2 High Availability at the Hardware Level
  2.1 Hardware Redundancy
  2.2 Dynamic Hardware Partitioning
  2.3 RAID
    2.3.1 RAID 0 – Striping
    2.3.2 RAID 1 – Mirroring
    2.3.3 RAID 3 – Bit-Interleaved Parity
    2.3.4 RAID 4 – Block-Interleaved Parity
    2.3.5 RAID 5 – Distributed Block-Interleaved Parity
    2.3.6 RAID 6 – Distributed Block-Interleaved Dual Parity
    2.3.7 Nested RAID Levels
  2.4 Storage Area Network
    2.4.1 Comparison of SAN and NAS
    2.4.2 Fibre Channel
    2.4.3 SAN and High Availability
3 Testing Storage Array Performance
  3.1 Testing Methodology
  3.2 Run One Configuration and Results
  3.3 Run Two Configuration and Results
4 Windows Server 2008 Clustering Options
  4.1 Network Load Balancing
  4.2 Failover Clustering
    4.2.1 How Failover Clustering Works
    4.2.2 Quorum Models
    4.2.3 Multi-site Clustering
    4.2.4 Hyper-V and Failover Clustering
    4.2.5 Cluster Shared Volumes
5 Clustering Solution for Storing UCN Domain Profiles
  5.1 Current Solution
  5.2 New Solution
  5.3 Failure Scenarios
6 Conclusion
Bibliography

List of Tables

3.1 Run One Results – Operations per Second
3.2 Run One Results – Transfer Rate [MBps]
3.3 Run One Results – Average Response Time [ms]
3.4 Run One Results – Maximum Response Time [ms]
3.5 Size of UCN Profile Files
3.6 Run Two Results – Operations per Second
3.7 Run Two Results – Transfer Rate [MBps]
3.8 Run Two Results – Average Response Time [ms]
3.9 Run Two Results – Maximum Response Time [ms]

List of Figures

2.1 Dynamic Hardware Partitioning – a single physical server divided into three hardware partitions. Some components are not in use and are dedicated for use as spares in case of other components' failure.
2.2 RAID 0 – Striping
2.3 RAID 1 – Mirroring
2.4 RAID 3 – Bit-Interleaved Parity
2.5 RAID 4 – Block-Interleaved Parity
2.6 RAID 5 – Distributed Block-Interleaved Parity
2.7 RAID 6 – Distributed Block-Interleaved Dual Parity
2.8 Storage Area Network with redundant switches and independent fabrics
2.9 Fibre Channel Arbitrated Loop
2.10 Fibre Channel Switched Fabrics
4.1 Disk Only Quorum Model; one node and the disk can communicate – the cluster runs
4.2 Disk Only Quorum Model; the nodes can communicate but the disk is unavailable – the cluster is offline
4.3 Node Majority Quorum Model; a majority of nodes can communicate – the cluster is online
4.4 Node Majority Quorum Model; quorum cannot be achieved – the cluster is offline
4.5 Node and Disk Majority Quorum Model; a majority of devices can communicate – the cluster is online
4.6 Node and Disk Majority Quorum Model; a majority of devices can communicate – the cluster is online
4.7 Node and Disk Majority Quorum Model; some devices can communicate but a majority is not achieved – the cluster stops
5.1 Current Solution Schematics Diagram
5.2 New Solution Schematics Diagram; notice that each node of cluster one uses a single dual-port FC adapter instead of two independent adapters.
5.3 Storage Configuration: each array (16 disks) is divided into two virtual drives (VD). Each virtual drive contains two LUNs – one for the witness disk (W), another for the data file systems (1–8).

Chapter 1
Introduction

A vast number of businesses today rely on electronic data exchange. Many mission-critical applications and services of such companies reside on servers. A failure to keep the servers in continuous operation might result in core business services becoming unavailable and the business losing money and reputation. And with competition being only a few clicks away in the global market, the need for high availability has become greater than ever before.

However, ensuring continuous operation of services is problematic. Servers fail despite being made of high-quality components [1]. Therefore it is essential to implement countermeasures should a failure occur. Redundancy is a way of increasing hardware reliability, but it is also necessary to ensure that service operation will not be interrupted even in case of natural disasters such as fires or earthquakes. In order to meet these requirements and achieve high availability, businesses implement clustering. Failover clustering and network load balancing are two key concepts when continuous service operation is the goal. Clustered solutions can effectively deal with temporary hardware or software malfunction. Even though clustering helps to substantially improve service availability and eliminate single points of failure, it is not meant to replace dedicated fault-tolerant solutions [1]. Instead, clustering is a cost-effective approach that can take advantage of commodity components and leverage existing investments.

Masaryk University relies on a Microsoft Windows network for storing UCN (University Computer Network) domain user profiles [2]. Students and employees accessing Windows-based workstations use a UCN domain login and can access their files from any workstation connected to the UCN domain. The underlying infrastructure is soon to be replaced with new hardware in order to increase reliability and storage space. This Bachelor's thesis aims to achieve two objectives with respect to this fact.
Firstly, we analyse, test and compare clustering options using the Microsoft Windows Server 2008 operating system. The analysis of each method includes hardware and software requirements, reliability of the hardware and software used, equipment cost, and ease of administration and installation. Secondly, we test the performance of a new storage array in several RAID configurations. Based on the results obtained from the analysis and testing, a suggestion on the most appropriate clustering solution is made.

The characteristics of the disk array that we look at are read/write transfer rate, input/output operations per second (ops) and latency. We use the Iometer software [3] to measure these characteristics. We experiment with I/O request lengths according to statistics on the size of the files that are commonly used in UCN domain profiles. We assume that our file server stores a large number of small files (< 10 KB), so I/O ops and latency are crucial. We test two RAID configurations, namely RAID 10 and RAID 6, which are convenient because they both provide fault tolerance and do not have a single bottleneck such as a dedicated parity disk [4]. Assuming that the majority of I/O operations are read operations, the otherwise decreased write performance of RAID 6 should not be a problem.

The rest of this thesis is divided into four chapters. The first chapter explains the principles of the hardware mechanisms used in high availability solutions, namely redundant hardware components, RAID levels and their suitability for our purpose (a file server). In addition, it describes Storage Area Network (SAN) options that can be used with clustering [5]. The second chapter presents the RAID testing methodology and results. The third chapter deals with high availability ensured by the operating system. It compares clustering options in Windows Server 2008 and presents the outcome of testing these options. This chapter also provides an in-depth explanation of failover clustering. The fourth chapter builds on the previous three chapters; its aim is to suggest an optimal clustering solution for storing UCN domain profiles. This includes choosing appropriate hardware, suggesting the most efficient way of storing data on the disk array, proposing a clustering model and justifying its suitability for our purpose. The fourth chapter also contains a diagram of the network connections (Ethernet and SAN) necessary for high availability of our solution.

Chapter 2
High Availability at the Hardware Level

2.1 Hardware Redundancy

Servers are made of high-quality components that are often the reason for their higher price. Even though these components provide a prolonged lifetime, it is impossible to ensure 100 percent reliability. That is why many platforms today offer redundant components, which further increase reliability. Redundant hardware can detect a failing component and assign its function to another component. Redundant components mostly include power supplies, cooling fans, network interface cards (NICs), network switches, redundant storage, CPUs and memory (which is usually ECC-enabled to further increase error protection) [6]. Some high-end systems provide true fault tolerance by duplicating all their components, including motherboard components [1].
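The reliability gain from duplicated components follows directly from probability: assuming failures are independent, a redundant group is unavailable only when every member is down at the same time. The short Python sketch below makes this concrete; the 99 percent per-component availability is an assumed example figure, not a measurement of any UCN hardware.

```python
def redundant_availability(single: float, n: int) -> float:
    """Availability of a group of n independent redundant components.

    The group is unavailable only if every component is down at the
    same time, so the per-component unavailabilities multiply.
    """
    return 1.0 - (1.0 - single) ** n

# Assumed example figure: each power supply is available 99% of the time.
single = 0.99
for n in (1, 2, 3):
    print(f"{n} component(s): availability {redundant_availability(single, n):.6f}")
# 1 component(s): availability 0.990000
# 2 component(s): availability 0.999900
# 3 component(s): availability 0.999999
```

A single spare thus turns two nines of availability into four nines, which is why redundant power supplies and fans are so common in server hardware.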
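Redundant storage in particular involves a trade-off that recurs throughout this thesis: the RAID 10 and RAID 6 configurations chosen for testing differ in how much raw capacity they give up in exchange for fault tolerance. The following minimal sketch contrasts the two for a 16-disk array such as the one discussed later; the 300 GB disk size is an invented example figure, not the actual hardware.

```python
def raid10_usable(disks: int) -> int:
    # RAID 10 mirrors pairs of disks: half the raw capacity is usable.
    # It is guaranteed to survive one failed disk, and more only if
    # further failures hit different mirror pairs.
    return disks // 2

def raid6_usable(disks: int) -> int:
    # RAID 6 keeps two disks' worth of distributed parity, so it
    # survives any two concurrent disk failures.
    return disks - 2

disks, size_gb = 16, 300  # assumed example: 16 disks of 300 GB each
print(f"RAID 10: {raid10_usable(disks) * size_gb} GB usable, survives 1 failure (guaranteed)")
print(f"RAID 6:  {raid6_usable(disks) * size_gb} GB usable, survives any 2 failures")
# RAID 10: 2400 GB usable, survives 1 failure (guaranteed)
# RAID 6:  4200 GB usable, survives any 2 failures
```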