Pacemaker – the Open Source, High Availability Cluster

Pacemaker
The Open Source, High Availability Cluster
Research Institute for Software Technology, OpenSource Technical Team | Kim, Donghyun
Saturday, July 23, 2016
Korea Linux User Group
[email protected]
3rd Impregnable Open Source Infrastructure Seminar (제3회 난공불락 오픈소스 인프라세미나)

# Whoami
Systems and Infrastructure Geek
Enterprise Linux Infrastructure Engineer (Red Hat)
Work
- Technology Research
- Technical Support : Troubleshooting, Debugging, Performance Tuning
- Consulting : Linux (Red Hat, SUSE, OEL), Virtualization, High-Availability
Hobby
- Travelling
- Drawing (cartoon)
I love linux ♥
- Blog : http://rhlinux.tistory.com/
- Café : http://cafe.naver.com/iamstrong
- SNS : https://www.facebook.com/groups/korelnxuser

In this Session
- Pacemaker's Story - The Open Source, High Availability Cluster
- Overview of HA architectural components
- Use case examples
- Future endeavors

Pacemaker - The Open Source, High Availability Cluster

HA for OpenSource Technology

"Mission Critical Linux"

High-Availability Clustering in the Open Source Ecosystem
https://alteeve.ca/w/High-Availability_Clustering_in_the_Open_Source_Ecosystem
- 1990s: Two completely independent attempts to build an open source high-availability platform began in the late 1990s
  - SUSE's "Linux HA" project
  - Red Hat's "Cluster Services"
- 1998: A new protocol called "Heartbeat" started the Linux-HA project; heartbeat v1.0 was released later
- 1998-2007: Heartbeat, the old Linux-HA cluster manager
  - Alan Robertson
- 2000: "Mission Critical Linux"
- 2000: Sistina Software founded, "Global File System"
- 2002: Red Hat "Red Hat Cluster Manager" version 1
  - RHEL 2.1
- 2003: SUSE's Lars Marowsky-Brée conceived of a new project called the "crm"
- 2003: Red Hat purchased Sistina Software (GFS)
- 2004: SUSE and Red Hat developers attended the Cluster Summit together
- 2004: SUSE, in partnership with Oracle, released OCFS2
- 2005: "Heartbeat version 2" released
- 2005: Red Hat's cman + rgmanager (RHCS, Cluster Services version 2)
- 2007: The two projects remained entirely separate until 2007 when, out of the Linux HA project, Pacemaker was born as a cluster resource manager that could take membership from and communicate via Red Hat's OpenAIS or SUSE's Heartbeat
- 2007: Pacemaker (Heartbeat v2.1.3)
  - Heartbeat package called "Pacemaker"
- 2008: SUSE and Red Hat developers held an informal meeting to discuss reusing some of each other's code
  - SUSE's CRM/pacemaker
  - Red Hat's OpenAIS
- 2008: Pacemaker version 0.6.0 was released
  - support for OpenAIS
- 2009: New project "Corosync" announced
- 2010: Pacemaker version 1.1
  - CIB (Cluster Information Base, XML configuration)
  - Red Hat's "cman" support
  - SLES11 SP1 (OpenAIS to Corosync)
  - Hawk, a web-based UI
- 2010: Pacemaker added support for cman
- 2010: Heartbeat project reached version 3
- 2014 onward:
  - Pacemaker 1.1.10, released with RHEL 6.5
  - Red Hat plans to support the existing cman and rgmanager approach until the end of the RHEL 6 lifecycle (2020)
  - The scope of application keeps growing through technical agreements between global vendors
  - Today, ClusterLabs is rapidly integrating the components that originated in the Heartbeat project with other solutions and evolving them

OpenSource Project Progress
[Diagram: HA stack components grouped by project and vendor. Columns: Linux-HA / ClusterLabs (community developers), SLES HA (Novell developers), RHEL HA Add-on (Red Hat developers). Components shown: Hawk (GUI), pcs_gui, luci, Pacemaker-mgmt, booth, pcs, crmsh, pacemaker, rgmanager, resource-agents, fence-agents, cluster-glue, cman, Heartbeat, corosync.]

Architectural Software Components
Corosync:
- Messaging (framework) and membership service
Pacemaker:
- Cluster resource manager
Resource Agents (RAs):
- Configure, manage, and monitor the available services
Fencing Devices:
- In Pacemaker, fencing is called STONITH
User Interface:
- crmsh (Cluster Resource Manager Shell) CLI tools and Hawk web UI (SLES)
- pcs (Pacemaker Configuration System) CLI tools and pcs_gui (RHEL)

More …
LVS (typically managed with Keepalived):
- Kernel space, Layer 4, IP + port
HAProxy:
- User space, Layer 7, HTTP based
Shared filesystem:
- OCFS2 / GFS2
Block device replication:
- DRBD, cLVM mirroring, Cluster md raid1

Pacemaker : the resources manager
Pacemaker (managed through Python-based, unified, scriptable cluster shells such as crmsh and pcs)
- Provides a high-availability and load-balancing stack for the Linux platform
- Interaction with applications is configured through Resource Agents (RAs)
Users decide the cluster resource policy themselves
- Freedom to create, delete, and modify Resource Agent configurations
- Generally satisfies the HA requirements of applications in many industries (public sector, securities/finance, telecom, etc.)
- Fence agents are configured and managed as resources, which keeps them easy to handle
Monitor and Control Resource:
- systemd / LSB / OCF services
- Cloned services : N+1, N+M, N nodes
- Multi-state (Master/Slave, Primary/Secondary)
STONITH (Shoot The Other Node In The Head):
- Fencing with power management

Pacemaker - Architecture Component
[Diagram: three stacked blocks. Resource Agents (agent scripts, Open Cluster Framework); Pacemaker (LRMd, PEngine, Stonithd, CRMd, CIB; resource management); Cluster Abstraction Layer over Corosync (membership, messaging, quorum).]

Pacemaker - High level architecture
[Diagram: two cluster nodes (Node #1, Node #2), each with three layers. Resources Layer: Resource Agents (RAs) and services (Apache, PostgreSQL, etc.). Resource Allocation Layer: CRM (Cluster Resource Manager), Policy Engine, LRM (Local Resource Manager), and the CIB (Cluster Information Base, XML, replicated between nodes). Messaging / Infrastructure Layer: Corosync.]

Quick Overview of Components - CRMd
CRMd (Cluster Resource Management daemon)
- Acts as the main controlling process
- Daemon that routes all resource operations
- Handles every action performed within the Resource Allocation Layer
- Maintains the Cluster Information Base (CIB)
- Resources managed by CRMd can be delivered to client systems, queried, moved, instantiated, or changed as needed

Quick Overview of Components - CIB
CIB (Cluster Information Base)
- Configuration management daemon; the configuration is stored as XML (in-memory data)
- Synchronizes the per-node configuration and status information provided by the DC (Designated Controller, also called Designated Co-ordinator)
- The CIB can be changed with the cibadmin command, or through the crm shell or the pcs utility

Quick Overview of Components - PEngine
PEngine (PE or Policy Engine)
- The PE process runs on every node but is active only on the DC [1]
- Applies policies such as clones and domains according to the service environment and user requirements
- Checks dependencies when a resource is switched to another cluster node
[1] DC = Designated Controller (master node)

Quick Overview of Components - LRMd
LRMd (Local Resource Management Daemon)
- Acts as the interface between CRMd and each resource, passing CRMd's commands to the agent
- Calls the local RAs (Resource Agents) on behalf of the CRM
- Runs start / stop / monitor as directed by the CRM and reports the results back

Quick Overview of Components - Resource Agents (1/2)
RAs (Resource Agents) are a standardized interface defined for cluster resources
- Provide the scripts that start / stop / monitor a local resource
- RAs are called by the LRM
Pacemaker supports three types of RAs:
- LSB : Linux Standard Base "init scripts"
  • /etc/init.d/resource
- OCF : Open Cluster Framework (an extension of LSB resource agents)
  Resource types : standard:provider:name
  • /usr/lib/ocf/resource.d/heartbeat
  • /usr/lib/ocf/resource.d/pacemaker
- Stonith Resource Agents
Terms:
- Resource = Service
- clone = multiple instances of a resource
- ms = master-slave instances of a resource
Links:
- http://linux-ha.org/wiki/OCF_Resource_Agent
- http://linux-ha.org/wiki/LSB_Resource_Agents
- https://github.com/ClusterLabs/resource-agents
Many contributors publish agents on GitHub so that they can be applied to a wide range of application environments.

Quick Overview of Components - Resource Agents (2/2)
Functions every Resource Agent provides by default:
- start / stop / monitor
- validate-all : checks the resource configuration
- meta-data : returns information about the Resource Agent itself (for GUIs or other tools)
Additional functions provided by OCF Resource Agents:
- promote : master / primary
- demote : slave / secondary
- notify : tells the agent in advance, through the cluster, about events affecting the resource
- reload : refreshes the resource configuration
- migrate_from / migrate_to : perform live migration of the resource
Meaning of resource scores:
- Most resources have a score defined, but sometimes none is specified
- Scores are needed to decide on which cluster node a resource runs
- Highest score INF (1,000,000), lowest score -INF (-1,000,000)
- A positive value means "can run", a negative value means "cannot run"; with +/-INF, "can" becomes "must"

Quick Overview of Components - STONITHD (1/2)
STONITHD "Shoot The Other Node In The Head Daemon"
- Service daemon used to fence nodes
- Runs Pacemaker's fencing agents
- Fencing agents are monitored like ordinary cluster resources
- STONITH-NG is the next generation of the STONITH daemon, providing monitoring, notification, and other features

Quick Overview of Components - STONITHD (2/2)
Application-level fencing can be configured
- Pacemaker coordinates fencing directly
- Uses stonithd, not fenced
Fence devices most commonly used in practice:
- APC PDU (networked power switch)
- HP iLO, Dell DRAC, IBM IMM, IPMI appliances, etc.
- KVM, Xen, VMware (software library)
- Software-based SBD (the most widely used option in the SUSE camp)
Essential for data integrity
- The best way to switch resources to another node in the cluster
- For a Linux HA cluster that aims to be "enterprise" grade, fencing is mandatory, not optional

What is fencing?
A mechanism that protects data during planned or unplanned system downtime and prevents damage (I/O fencing)
- Kernel panic
- System freeze
- Live hang / recovery

Quick Overview of Components - Corosync
An open source group messaging system typically used in clusters, cloud computing, and other high availability environments.
The basic cluster infrastructure that Pacemaker needs in order to operate
Communication Layer : messaging and membership
- Totem single-ring ordering and membership protocol
- Basic constraint : multicast communication is preferred over broadcast
- Communication over UDP/IP and InfiniBand networks
- UDPU (on RHEL, supported from 6.2 onward)
- Corosync (OpenAIS); cman (RHEL 6 only)
Supports cluster filesystems (GFS2, OCFS2, cLVM2, etc.)

Corosync Cluster Engine Architecture
Handle Database Manager :
- Maps, in O(1) time, a unique 64-bit handle identifier to a memory address.
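
The handle database idea above (a constant-time map from a 64-bit handle to a stored object) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Corosync's implementation: the class name `HandleDatabase`, its methods, and the bit layout (slot index in the low 32 bits, a generation counter in the high 32 bits) are all hypothetical choices made for this example, and a Python object reference stands in for the memory address.

```python
_EMPTY = object()  # sentinel marking a released slot


class HandleDatabase:
    """Toy handle table (hypothetical, not Corosync's API).

    A handle is a 64-bit integer: low 32 bits = slot index,
    high 32 bits = generation counter used to reject stale handles.
    """

    def __init__(self):
        self._slots = []  # each slot is [generation, stored object]
        self._free = []   # indices of released slots, ready for reuse

    def create(self, obj):
        """Store obj and return a new 64-bit handle for it."""
        if self._free:
            index = self._free.pop()
            slot = self._slots[index]
            slot[0] = (slot[0] + 1) & 0xFFFFFFFF  # new generation
            slot[1] = obj
        else:
            index = len(self._slots)
            self._slots.append([1, obj])
        return (self._slots[index][0] << 32) | index

    def get(self, handle):
        """Return the object for handle in O(1); KeyError if invalid."""
        index = handle & 0xFFFFFFFF
        generation = handle >> 32
        if index >= len(self._slots):
            raise KeyError(handle)
        slot = self._slots[index]
        if slot[0] != generation or slot[1] is _EMPTY:
            raise KeyError(handle)
        return slot[1]

    def destroy(self, handle):
        """Release the slot behind handle so it can be reused."""
        self.get(handle)  # raises KeyError for stale or unknown handles
        index = handle & 0xFFFFFFFF
        self._slots[index][1] = _EMPTY
        self._free.append(index)
```

Lookup is a list index plus one integer comparison, so it stays constant time however many handles exist. The generation counter is one way to make a handle that was destroyed fail cleanly instead of silently resolving to whatever later reused the same slot.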
