High Availability with Oracle

Lucian Preoteasa, Sales Consultant, Oracle Linux and Oracle VM
June 2015

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Agenda

1. OCFS2
2. Clusterware
3. Ksplice

OCFS2

Essential Concepts
What Is a Clustered File System?

• A file system that is shared by being simultaneously mounted on multiple servers

• Oracle supports a shared disk architecture providing a basis for load balancing and failover solutions (e.g. OCFS2, ACFS)
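
As a concrete illustration (the device and mount point below are hypothetical, not taken from the slides), what makes the file system "clustered" is simply that every node mounts the same OCFS2 volume once the cluster stack is online:

    # Run on each node; /dev/sdb1 and /u01/shared are placeholder names.
    mount -t ocfs2 /dev/sdb1 /u01/shared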

OCFS2
Scalable Cluster File System at No Additional Cost

• Shared-disk cluster file system for Linux
• POSIX+ conformance
• Native journaling file system
• Originally developed by Oracle
  – 2003: developed as the successor to OCFS
  – January 2006: integration into mainline Linux (2.6.16)
• Architecture- and endian-neutral
  – Parallel mounts on x86, x86-64, IA64, or PPC possible
  – Big- and little-endian, 32-bit and 64-bit

Oracle Cluster File System Architecture

Oracle Cluster File System
Features and Benefits Overview

• Advanced security (POSIX ACLs and SELinux) and quotas
• REFLINK snapshots with copy-on-write
• Built-in cluster stack with a distributed lock manager
• File size scalability up to 16 TB
• Cluster scalability up to 32 nodes
• Used by the Oracle VM Tier 1 virtualization solution, database clusters (Oracle RAC), middleware clusters (Oracle E-Business Suite), appliances (SAP's Business Intelligence Accelerator), and many other Oracle products

OCFS2 Architecture
OCFS2 Heartbeat and Split-Brain Scenario Avoidance

• With the local heartbeat, the heartbeat threads write to and read from a heartbeat file on a per-mount basis
• With the global heartbeat, the heartbeat threads write to and read from the regions that were initialized together with the O2CB cluster stack

[Diagram: a three-node cluster in optimal status. Each node runs O2CB and reports "I see 1, 2, 3" via the shared heartbeat files or storage regions.]

OCFS2 Cluster Stack
Storage Heartbeat

• The storage heartbeat is used to check a node's health
  – Runs in a separate service, o2cb, as the storage heartbeat thread (o2hb-diskid)
  – The only way to confirm the liveness of a node in the cluster
  – Uses the global heartbeat region in the SPFS (server pool file system) of a clustered server pool
• Each node has a specific area (indexed by node number) in the global heartbeat region
• Every two seconds, each node:
  – Updates the timestamp in its assigned area
  – Reads the global heartbeat region to learn the liveness of the other nodes
• Failing to update the timestamp within the defined interval causes a heartbeat failure (see the sketch below)
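
As a rough sketch of the mechanism (an illustration only, not the actual o2hb implementation; the device path, block size, and slot layout are assumptions):

    #!/bin/bash
    # Hedged sketch of a disk-heartbeat loop.
    NODE=0                      # this node's slot index
    HBDEV=/dev/mapper/hbdisk    # hypothetical shared heartbeat device
    BS=512                      # one slot per 512-byte block
    while true; do
        # stamp this node's slot with the current time
        date +%s | dd of=$HBDEV bs=$BS seek=$NODE count=1 conv=notrunc 2>/dev/null
        # read a peer's slot and see how stale its timestamp is
        peer=$(dd if=$HBDEV bs=$BS skip=1 count=1 2>/dev/null | tr -d '\0')
        now=$(date +%s)
        [ -n "$peer" ] && [ $(( now - peer )) -gt 120 ] && echo "node 1 missed its heartbeat"
        sleep 2
    done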

OCFS2 Cluster Stack
Network Heartbeat

• Driven by the o2net process, to check connectivity to the other nodes in the cluster
• How it works:
  – A connection to the other nodes in the cluster is established:
    • When a node starts
    • When a node discovers a newly alive node through the storage heartbeat
  – A keep-alive message is sent periodically after the connection is established
• Failing to connect to some alive nodes within the defined interval:
  – Constitutes a network heartbeat failure
  – Can cause a split-brain problem, which is resolved by the quorum mechanism in OCFS2 (see the sketch below)
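
As a simplified illustration of the quorum idea (not OCFS2's exact algorithm, which also breaks ties, for example in a two-node cluster): the side of a partition that cannot claim a majority fences itself.

    #!/bin/bash
    # Hedged sketch of a majority-quorum decision after a network partition.
    TOTAL=6    # nodes configured in the cluster
    ALIVE=4    # nodes this side can still reach, including itself
    if [ "$ALIVE" -gt $(( TOTAL / 2 )) ]; then
        echo "this partition holds quorum: keep running"
    else
        echo "no quorum: self-fence to avoid split-brain"
    fi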

OCFS2 Cluster Configuration
/etc/ocfs2/cluster.conf – Cluster Layout

$ cat /etc/ocfs2/cluster.conf
cluster:
        name = b383b1a1e6fc003f
        heartbeat_mode = global
        node_count = 2

node:
        cluster = b383b1a1e6fc003f
        name = ovs1
        number = 0
        ip_address = 10.146.147.1
        ip_port = 7777

node:
        cluster = b383b1a1e6fc003f
        name = ovs2
        number = 1
        ip_address = 10.146.147.2
        ip_port = 7777

heartbeat:
        cluster = b383b1a1e6fc003f
        region = 0004FB000005000054AE95D21C6E22FB0
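
The same cluster.conf must be present on every node; after changing it, the O2CB stack can be restarted and verified. On Oracle Linux 6-era systems that looks roughly like this (a sketch, assuming the sysvinit service script):

    service o2cb restart    # reload the cluster stack with the new layout
    service o2cb status     # confirm the cluster and heartbeat are online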

OCFS2 Cluster Configuration
/etc/sysconfig/o2cb – Cluster Timeout Values

$ cat /etc/sysconfig/o2cb
O2CB_ENABLED=true
O2CB_STACK=o2cb
O2CB_BOOTCLUSTER=b383b1a1e6fc003f

# Storage heartbeat configuration (o2hb-diskid daemon)
O2CB_HEARTBEAT_THRESHOLD=61

# Network heartbeat configuration (o2net daemon)
O2CB_IDLE_TIMEOUT_MS=60000
O2CB_KEEPALIVE_DELAY_MS=2000
O2CB_RECONNECT_DELAY_MS=2000
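
To put these numbers in context: per the documented rule, O2CB_HEARTBEAT_THRESHOLD counts two-second heartbeat iterations, so the dead threshold is roughly (threshold - 1) * 2 seconds (verify the exact mapping against your release's documentation):

    # Disk heartbeat: O2CB_HEARTBEAT_THRESHOLD=61
    echo $(( (61 - 1) * 2 ))   # -> 120 seconds of missed disk heartbeats
    # Network heartbeat: O2CB_IDLE_TIMEOUT_MS=60000 -> 60 seconds of silence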

Clusterware

Oracle Clusterware
A History of Providing High Availability Solutions

• Oracle Clusterware was introduced as part of Oracle Database 10g, as the foundation for the Oracle Real Application Clusters (RAC) solution.
• Beginning with 10g Release 2, Oracle enhanced the features of Oracle Clusterware to provide high availability services for any workload.
  – This includes Oracle products and third-party products!
• The performance and reliability features used in these solutions can be leveraged for all high availability workload needs.

High Availability
General Concepts

• The clustering of systems is a standard business practice
• Enterprise solutions have existed in the market for over 15 years
• The basic concepts are all similar:
  – Group multiple systems together to appear as a single system
  – Provide redundancy to prevent a single point of failure
  – Isolate problem nodes to prevent data corruption or workload failure

Oracle Clusterware
The Basics

• Hardware components
  – Nodes
  – Shared storage (NAS or SAN)
  – Private interconnect (carrying the cluster heartbeat)
  – Public interconnect (LAN, serving application and web traffic)
  – Shared cluster file system: ACFS, OCFS2, or NFS
• Software components
  – Voting disk
  – Oracle Cluster Registry (OCR)

[Diagram: six nodes, each with local storage, joined by a private interconnect carrying the cluster heartbeat and a public interconnect carrying application and web services; the voting disk and OCR reside on the shared storage.]

Managing Resources
Heartbeats, Fencing, Failover, Dependencies

• Oracle Clusterware uses a cluster heartbeat to monitor the status of each node
• To prevent split-brain conditions, nodes are fenced if they become unresponsive
• Failover actions are defined by the application action profile
• Resource failover can be made dependent on other cluster resources (see the sketch below)

[Diagram: the same six-node cluster, with one fenced node cut off from the cluster heartbeat on the private interconnect.]
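
As a hedged illustration of an action-profile registration (the resource name, script path, and attribute values are invented for this example; see the Clusterware documentation for the full attribute list):

    # Register a hypothetical application as a Clusterware-managed resource.
    crsctl add resource myapp \
        -type cluster_resource \
        -attr "ACTION_SCRIPT=/opt/myapp/myapp.scr,CHECK_INTERVAL=30,RESTART_ATTEMPTS=2"

    # Start the resource and check where Clusterware placed it.
    crsctl start resource myapp
    crsctl status resource myapp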

Managing Non-Oracle Applications with Oracle Clusterware
Simple Integration with the Oracle Clusterware Application Framework

• Standalone and bundled agents are available beginning with 11.2.0.3 (see http://oracle.com/goto/clusterware)
• A generic agent for easy application plug-in is available beginning with 11.2.0.4
• A script agent for building your own agent is available as of 11.2 (a sketch follows below)

• Oracle Certified partners provide application agents
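
A minimal script-agent sketch (the application paths are hypothetical; a real agent would also track PIDs and log its actions). Clusterware invokes the script with start, stop, check, or clean, and an exit status of 0 signals success:

    #!/bin/bash
    # /opt/myapp/myapp.scr - hypothetical Clusterware action script
    case "$1" in
        start)  /opt/myapp/bin/myapp --daemon ;;             # launch the app
        stop)   pkill -f /opt/myapp/bin/myapp ;;             # stop it
        check)  pgrep -f /opt/myapp/bin/myapp >/dev/null ;;  # 0 = healthy
        clean)  pkill -9 -f /opt/myapp/bin/myapp ;;          # forced cleanup
        *)      echo "usage: $0 {start|stop|check|clean}"; exit 1 ;;
    esac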

Ksplice

Zero-Downtime Kernel Updates
Oracle Ksplice – providing zero-downtime kernel updates since 2007

• Oracle Ksplice's capabilities are extensive:
  – Supports multiple releases and kernel versions
  – Capable of patching a variety of kernel issues
  – Easy to apply and roll back updates
  – Simple, flexible tools and options for installing updates
  – Proven track record of providing stable updates for production systems

Avoid Expensive Disruptions

A Reboot Impacts More Than Servers

• Time is money
• Each reboot of a production system impacts the systems connected to or relying on it:
  – Middleware
  – Databases
  – Storage
  – Applications
• Not to mention the impact it has on groups throughout the organization

How It Works
Ksplice Technology

• Ksplice technology transforms Oracle Linux updates into zero-downtime updates
• Linux servers within the customer environment connect to the Unbreakable Linux Network (ULN) to download and apply updates while the system is running
• Customers can track the status of their servers via an intuitive web interface, and can integrate zero-downtime updates into existing management tools via an API

[Diagram: a kernel update flows through Ksplice technology to become a zero-downtime kernel update, which the client applies across customer systems.]

Three Ways to Consume Ksplice
1x Online, 2x Offline

• Online: individual servers can register with Oracle's Ksplice server directly
  – Each system must be connected to the Internet
  – Each system checks for new updates roughly every 4 hours
  – Oracle provides an interactive web portal to monitor users' systems
  – Updates can be auto-installed if desired (see the setup sketch below)
• Offline with a local yum Ksplice server
  – Utilises company certificates to secure the link between the local Ksplice server and Oracle's online Ksplice server
  – All internal servers use the local Ksplice server
• Completely offline
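
For the online case, client setup looks roughly like this (YOUR_ACCESS_KEY is a placeholder; check the Ksplice documentation for current package names and options):

    # Install the Ksplice online client on each server
    yum install -y uptrack

    # /etc/uptrack/uptrack.conf holds the account key; setting autoinstall
    # enables the auto-install behaviour mentioned above.
    #   accesskey = YOUR_ACCESS_KEY
    #   autoinstall = yes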

Ksplice Hosted Access to Online Updates

• Log into Ksplice as you would ULN to manage your Ksplice Systems

Ksplice Offline with an Intranet Connection
Disconnected from the Internet, but connected through the intranet

• Create a local yum mirror and register the Ksplice channel(s)
  – http://docs.oracle.com/cd/E37670_01/E37355/html/ol_offlncl_ksplice.html
• Subscribe each machine in yum.repos.d to your local yum Ksplice channel:

    [ol6_x86_64_ksplice]
    name=Ksplice for $releasever - $basearch
    baseurl=http://local_yum_server/yum/OracleLinux/OL6/ksplice/$basearch/
    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY
    gpgcheck=1
    enabled=1

• Install the Ksplice uptrack-offline RPM onto the target machines:
  # yum install uptrack-offline
• Perform kernel patching with your local yum Ksplice repository:
  # yum install uptrack-updates-`uname -r`
  – Can be integrated with Spacewalk and EM12c

Ksplice Demo!
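
A typical demo runs roughly the following sequence (a sketch; output omitted):

    uname -r            # the running kernel version stays the same throughout
    uptrack-show        # list updates currently applied (none yet)
    uptrack-upgrade -y  # apply all pending rebootless updates
    uptrack-show        # the effective kernel now includes the fixes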
