Oracle Database Service High Availability with Data Guard?

Robert Bialek, Senior Principal Consultant

@RobertPBialek doag2017

Who Am I

Senior Principal Consultant and Trainer at Trivadis GmbH in Munich.
– Master of Science in Engineering.
– At Trivadis since 2004.
– Trivadis Partner since 2012.

Focus:
– Data and Service High Availability, Disaster Recovery.
– Architecture Design, Optimization, Automation.
– New Technologies (Trivadis Technology Center).
– Open Source.
– Technical Project Leadership.
– Trainer: O-GRINF, O-RAC, O-DG.

23.11.2017 Trivadis DOAG17: Oracle Database Service High Availability with Data Guard?

Our company.

Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on and technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields:

OPERATION

Trivadis Services takes over the interacting operation of your IT systems.

With over 600 specialists and IT experts in your region.

– 14 Trivadis branches and more than 600 employees.
– 200 Service Level Agreements.
– Over 4,000 training participants.
– Research and development budget: CHF 5.0 million.
– Financially self-supporting and sustainably profitable.
– Experience from more than 1,900 projects per year at over 800 customers.

Locations on the map: Copenhagen, Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Vienna, Munich, Brugg, Basel, Zurich, Bern, Geneva, Lausanne.

Technology on its own won't help you. You need to know how to use it properly.

Database Service High Availability – Goal

Increase database service uptime by:
– eliminating any single point of failure to avoid unplanned outages.
– minimizing the effect of an unplanned outage on the end user (automatic failover).
– reducing downtimes during planned outages.

Consider the whole SW/HW stack. Find the best cost/risk ratio.

[Figure: cost/risk trade-off chart. Labels: Effort, Downtime Costs, Complexity; axis: Availability; stack components: Clients, Application Server(s), Database Server(s), Storage; marked point: Best cost/risk ratio.]

Database Service High Availability – Options

HA Cluster – primarily used option for service high availability:
– Real Application Clusters.
– RAC One Node.
– Cold Failover Cluster.

HA Data – used mostly for data, rather than service high availability:
– Data Guard (Fast-Start Failover/Global Data Services).
– GoldenGate (Global Data Services).
– Other replication technologies.

Agenda

1. Introduction
2. Configuration
3. Special Cases
4. Conclusions

Introduction

Database Service HA with Data Guard? – Introduction

Yes, Data Guard can also be used for service high availability:
– Planned downtimes – manual switchover.
– Unplanned downtimes – fast-start failover (FSFO) or manual failover.

Why might we consider Data Guard for service high availability?
– Less complex than a cluster installation.
– Lower infrastructure requirements (even local storage is sufficient).
– Not subject to additional license fees (EE license assumed).
– Additionally, many other advantages: data high availability, snapshot standby, potentially rolling upgrade capability, ...

But there are some restrictions we need to consider...

Database Service HA with Data Guard? – Big Picture

[Figure: FSFO configuration overview. Database clients connect to the RW service on the primary; transparency is required for failover/switchover. A master observer (required) and backup observers (optional, 12.2) ping the databases. Databases shown: primary, target failover standby, candidate target failover standbys (optional, 12.2).]

Database Service HA with Data Guard? – Monitoring

[Figure: observer monitoring loop between the observer (DGMGRL threads W000, B001, P001, S001), the primary and the target standby. Labels: Connect (~3 sec.), Enter PING State, PING, SLEEP (~3 sec.), Timeout, Connect re-tries, Failover condition detected, Failover, Reconnect interval expired (ObserverReconnect property set and reached), Logoff.]

Configuration

Data Guard Protection Modes with FSFO – Prerequisites

MaxAvailability (10.2+) – FSFO: guaranteed zero data loss.
▪ LogXptMode=SYNC or FASTSYNC (12.1+)
▪ FastStartFailoverTarget(*)
▪ Flashback Database

MaxPerformance (11.1+) – FSFO: data loss possible.
▪ LogXptMode=ASYNC
▪ FastStartFailoverTarget(*)
▪ FastStartFailoverLagLimit
▪ Flashback Database

MaxProtection (12.2) – FSFO: guaranteed zero data loss.
▪ LogXptMode=SYNC
▪ FastStartFailoverTarget(*)
▪ Flashback Database
▪ Recommended: at least 2 STDBY DBs (protection mode downgrade!)

Slide callout: Mostly used protection mode.

All protection modes:

DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = <value in seconds>;
DGMGRL> ENABLE FAST_START FAILOVER;

Fast-Start Failover – Observer (1)

The observer is the monitoring component; it initiates the failover procedure.

In 12.2, up to 3 observers (in background) can be started:
– One master and up to two backup (standby) observers.
– Oracle wallet required.

DGMGRL> START OBSERVER OBS1.TRIVADIS.COM IN BACKGROUND
  FILE IS '$ADMIN_SID/fsfo_$ORACLE_SID.dat'
  LOGFILE IS '$ADMIN_SID/fsfo_$ORACLE_SID.log'
  CONNECT IDENTIFIER IS .TRIVADIS.COM;

In older releases:
– Only one running observer (HA needs to be addressed).

nohup dgmgrl -logfile $ADMIN_SID/fsfo_$ORACLE_SID.log <

Fast-Start Failover – Observer (2)

Fast-start failover to the target standby database is initiated by the master observer if one of the following conditions is detected:
– The observer and the target standby database cannot reach the primary database (default: ObserverOverride='FALSE').
– A user-configurable condition is met.
– The DBMS_DG.INITIATE_FS_FAILOVER function has been executed.

Additionally, other preconditions enforced by the protection mode need to be fulfilled:
– MaxProtection/MaxAvailability: target failover standby is in SYNC.
– MaxPerformance: FastStartFailoverLagLimit not reached for the target failover standby.
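The decision logic listed above can be sketched as a small Python model. This is only an illustration of the rules on this slide, not the broker's implementation: the class `FsfoState`, its field names, and the 30-second lag limit are assumptions made for the example, and `failover_requested` merely stands in for a DBMS_DG.INITIATE_FS_FAILOVER call.

```python
from dataclasses import dataclass


@dataclass
class FsfoState:
    """Hypothetical snapshot of what the master observer knows."""
    protection_mode: str                # "MaxProtection", "MaxAvailability" or "MaxPerformance"
    observer_sees_primary: bool
    standby_sees_primary: bool
    observer_override: bool = False     # models ObserverOverride (default FALSE)
    user_condition_met: bool = False    # a user-configurable condition fired
    failover_requested: bool = False    # stands in for DBMS_DG.INITIATE_FS_FAILOVER
    standby_in_sync: bool = False
    standby_lag_seconds: float = 0.0
    lag_limit_seconds: float = 30.0     # illustrative FastStartFailoverLagLimit value


def failover_condition_detected(s: FsfoState) -> bool:
    """True if one of the three trigger conditions from the slide holds."""
    if s.observer_override:
        # Assumption: with ObserverOverride=TRUE the observer's view alone counts.
        primary_lost = not s.observer_sees_primary
    else:
        primary_lost = not s.observer_sees_primary and not s.standby_sees_primary
    return primary_lost or s.user_condition_met or s.failover_requested


def preconditions_met(s: FsfoState) -> bool:
    """Protection-mode preconditions for the target failover standby."""
    if s.protection_mode in ("MaxProtection", "MaxAvailability"):
        return s.standby_in_sync
    if s.protection_mode == "MaxPerformance":
        return s.standby_lag_seconds <= s.lag_limit_seconds
    raise ValueError(f"unknown protection mode: {s.protection_mode}")


def should_fail_over(s: FsfoState) -> bool:
    return failover_condition_detected(s) and preconditions_met(s)
```

For example, `should_fail_over(FsfoState("MaxAvailability", False, True, standby_in_sync=True))` is false in this model, because the standby still reaches the primary and ObserverOverride is left at its default.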

Data Guard: Role-Based Services

For a Data Guard system, we need role-based services, i.e. services that run only if the database has a specific role:
– Read-write service on a primary database.
– Optionally, a service on standby databases for reporting.
– Optionally, a service on snapshot standby databases.

To accomplish this task:
– Use Oracle Grid Infrastructure role-based services.
– Create your own AFTER STARTUP ON DATABASE trigger.

Data Guard: Example Role-Based Services

Example role-based services with Grid Infrastructure.

srvctl add service -db DB_SITE1 -service SRV_RW.trivadis.com -role PRIMARY
srvctl add service -db DB_SITE1 -service SRV_RO.trivadis.com -role PHYSICAL_STANDBY
srvctl add service -db DB_SITE1 -service SRV_SP.trivadis.com -role SNAPSHOT_STANDBY

Services are started only if database and service role match.

SvcAgent::start 680 query_db_role
SvcAgent::start 710 not starting service srv_rw
Role mismatch - Service role:PRIMARY, current DB role:PHYSICAL_STANDBY

Depending on the client HA features in use (TAF, FAN/FCF, AC), additional service properties need to be specified.
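The role-matching rule can be illustrated with a few lines of Python. This is a toy model of the behaviour described on this slide, not Grid Infrastructure code; the dictionary `SERVICES` and the function `services_to_start` are names invented for the sketch, and the service names are the ones from the srvctl example.

```python
# Service name -> role configured with "srvctl add service ... -role".
SERVICES = {
    "SRV_RW.trivadis.com": "PRIMARY",
    "SRV_RO.trivadis.com": "PHYSICAL_STANDBY",
    "SRV_SP.trivadis.com": "SNAPSHOT_STANDBY",
}


def services_to_start(db_role, services=SERVICES):
    """Return the services whose configured role matches the current database role."""
    return sorted(name for name, role in services.items() if role == db_role)


# After a switchover the same database is evaluated again with its new role.
for role in ("PRIMARY", "PHYSICAL_STANDBY", "SNAPSHOT_STANDBY"):
    print(role, "->", services_to_start(role))
```

In this model a database that is currently a physical standby starts only the read-only service, which corresponds to the "Role mismatch" message logged for srv_rw.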

Client-Side Configuration – Main Problems To Address

CASE 1: New network session (connect)
1. IP not reachable (server/network/… issue).
2. Connect attempts.
3. Wait for connect timeout.
4. Client failover.

CASE 2: Already established network session
1. Connected.
2. IP not reachable (server/network/… issue).
3. Re-connect attempts.
4. Wait for re-connect timeout.
5. Client failover.

New Oracle Net Session – Connect Timeout (1)

sqlnet.ora parameters (OCI, ODP.net):
– Applies to each IP that a host name resolves to!
– All Oracle client versions supported.

TCP.CONNECT_TIMEOUT=3 #default 60 sec.
SQLNET.OUTBOUND_CONNECT_TIMEOUT=5 #no default

For clients >=11.2:

OLTP.trivadis.com =
  (DESCRIPTION =
    (FAILOVER=ON)(LOAD_BALANCE=OFF)
    (CONNECT_TIMEOUT=5)(RETRY_COUNT=3)(RETRY_DELAY=1)(TRANSPORT_CONNECT_TIMEOUT=3)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = italy)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = sweden)(PORT = 1521)))
    (CONNECT_DATA = (SERVICE_NAME = OLTP_RW.trivadis.com)))

Slide callout on the descriptor parameters: Introduced in 12.1.0.2.

[Figure: client connecting through Oracle Net to two listeners (LSNR); TCP three-way handshake.]
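To get a feeling for how these descriptor parameters add up, the following Python sketch estimates a rough upper bound for the time a client spends before giving up when no address is reachable. It rests on assumptions that are not stated on the slide: every attempt against an unreachable IP is cut off after the smaller of TRANSPORT_CONNECT_TIMEOUT and CONNECT_TIMEOUT, the address list is traversed RETRY_COUNT + 1 times, and RETRY_DELAY is waited once between traversals. The function name and its parameters are made up for the example.

```python
def worst_case_connect_seconds(addresses, retry_count, retry_delay,
                               transport_connect_timeout, connect_timeout,
                               ips_per_host=1):
    """Rough upper bound (seconds) for a connect attempt that never succeeds.

    Simplified model: each unreachable IP costs one timeout, the whole
    address list is tried retry_count + 1 times, and retry_delay is
    waited between two traversals of the list.
    """
    per_attempt = min(transport_connect_timeout, connect_timeout)
    passes = retry_count + 1
    waiting = passes * addresses * ips_per_host * per_attempt
    return waiting + retry_count * retry_delay


# Values from the OLTP.trivadis.com descriptor: two addresses (italy, sweden).
print(worst_case_connect_seconds(addresses=2, retry_count=3, retry_delay=1,
                                 transport_connect_timeout=3, connect_timeout=5))
```

The `ips_per_host` parameter reflects the remark that a timeout applies to each IP a host name resolves to; a host name that resolves to several addresses multiplies the waiting time accordingly in this model.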

New Oracle Net Session – Connect Timeout (2)

JDBC Thin driver:
– TRANSPORT_CONNECT_TIMEOUT is available beginning with version 12.2.
– To use RETRY_COUNT with 12.1.0.2, a patch is required (BUG 19154304).

pds.setURL("jdbc:oracle:thin:@(DESCRIPTION =(FAILOVER=ON)(LOAD_BALANCE=OFF)"
    + "(CONNECT_TIMEOUT=3)(RETRY_COUNT=10)(RETRY_DELAY=1)"
    + "(ADDRESS_LIST = "
    + "(ADDRESS = (PROTOCOL = TCP )(HOST = blue.trivadis.com )(PORT = 1521)) "
    + "(ADDRESS = (PROTOCOL = TCP )(HOST = brown.trivadis.com )(PORT = 1521))) "
    + "(CONNECT_DATA = (SERVICE_NAME = sales_rw.trivadis.com)))");

JDBC Thin clients can alternatively use the following driver property (ms):
– Overrides CONNECT_TIMEOUT from the address description parameters.

Properties prop = new Properties();
prop.put(oracle.net.ns.SQLnetDef.TCP_CONNTIMEOUT_STR, "" + 3000);
ods.setConnectionProperties(prop);

Established Oracle Net Session – Re-Connect Timeout

Break an established network connection without waiting for long TCP timeouts (>15 min.):
– In most cases no VIPs in use!

Using the following parameters is not a good idea:

SQLNET.RECV_TIMEOUT=30 #no default value, OCI driver
SQLNET.SEND_TIMEOUT=30 #no default value, OCI driver

prop.put("oracle.jdbc.ReadTimeout", "5000"); //5000ms, JDBC Thin driver

Better solution:
– If possible, use Fast Application Notification/Fast Connection Failover.
– Tuning the OS kernel parameter tcp_retries2 might also be an alternative.
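The ">15 min." figure and the effect of tcp_retries2 can be approximated with a short calculation. The sketch below assumes Linux behaviour with a minimum retransmission timeout of 200 ms that doubles on every retransmission and is capped at 120 seconds; real connections start from a measured RTO, so actual timeouts are usually longer. The function name `tcp_retries2_timeout` is chosen for this example only.

```python
def tcp_retries2_timeout(retries2=15, rto_min=0.2, rto_max=120.0):
    """Approximate seconds until Linux gives up on an established TCP connection.

    Simplified model: the retransmission timeout starts at rto_min,
    doubles after each unanswered retransmission and is capped at rto_max;
    the connection is dropped after retries2 retransmissions have timed out.
    """
    return sum(min(rto_min * 2 ** k, rto_max) for k in range(retries2 + 1))


# Compare the Linux default (15) with a few smaller settings.
for retries in (15, 8, 5, 3):
    seconds = tcp_retries2_timeout(retries)
    print(f"tcp_retries2={retries:2d}: about {seconds:.1f} s ({seconds / 60:.1f} min)")
```

With the default of 15 the model yields roughly 924.6 seconds, the hypothetical lower bound quoted in the Linux kernel documentation, which is in line with the timeout of more than 15 minutes mentioned above. Note that tcp_retries2 is a system-wide setting and affects every TCP connection on the host, not only Oracle Net sessions.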

Client HA Features – Overview

Transparent Application Failover:
– Can be used with Data Guard the same way (advantages/disadvantages) as with a cluster.

Fast Application Notification/Fast Connection Failover:
– Oracle Grid Infrastructure is required to register with ONS.
– Compared to RAC, “only” rapid notification about up/down events, no workload balancing.

Application Continuity can be used with Data Guard the same way as with a cluster:
– But requires RAC or RAC One Node or ADG (GG) option.

More about this topic:
– DOAG 2016 presentation: “Oracle Client Failover - Under the Hood”

Special Cases

Special Cases with FSFO: Candidate Targets Standby

Starting with 12.2, many candidate fast-start failover target databases can be specified, but switchover or FSFO works only to the current target standby database. The current target depends on many conditions.

DGMGRL> EDIT DATABASE db_site1 SET PROPERTY FastStartFailoverTarget = 'db_site2,db_site3';

Threshold: 60 seconds
FSFO Target: db_site2
Candidate Targets: db_site2,db_site3
Observers: (*) obs1.trivadis.com
               obs2.trivadis.com

DGMGRL> SWITCHOVER TO db_site3;
Error: ORA-16655: specified standby database not the current fast-start failover target standby.

Special Cases: Switchover – DelayMins >0

Version 11.2.0.X: recovery re-started with NODELAY option.

Version 12.1.0.X: recovery waits until DelayMins reached!
– OPEN_MODE
  • Primary – CLOSED BY SWITCHOVER
  • Standby – MOUNTED
– Application RW service outage within DelayMins time-frame! Application connect attempts fail with ORA-16456: switchover to standby in progress or completed

Version 12.2.0.1: Switchover is not possible.

Error: ORA-16672: switchover not permitted to standby database with non-zero DelayMins
Failed.

Special Cases with FSFO: Master Observer Failure

After about 31 sec. the master observer is changed, i.e. an available backup observer is promoted to the master role.
– Note: to perform a master change, the primary database needs to be available!

Logged on the primary database:

Data Guard Broker initiated a master observer switch since the current master observer cannot reach the primary database

For maintenance, a master change can be performed manually.

DGMGRL> SET MASTEROBSERVER TO obs2.trivadis.com;
Sent the proposed master observer to the data guard broker configuration. Please run SHOW OBSERVER to see if master observer switch actually happens.

Special Cases with FSFO: Failover Target Failure

After about 10 sec. the target failover standby is changed, i.e. a candidate target failover standby is promoted to the current target role.

Permission granted to the primary database for target switch.
The primary database returned to SYNC/NOT LAGGING state with the standby database db_site3.

Note: to perform the target failover standby change, the primary database and the master observer need to be available!
– If the master observer fails at the same time:

LGWR: FSFO SetState("UNSYNC", 0x2) operation requires an ack
Primary database will shutdown within 30 seconds if permission is not granted from Observer or FSFO target standby to proceed

Special Cases with FSFO: Master Observer/Primary

If the primary and the master observer fail:
– No failover is initiated to a candidate standby.
– From a backup observer log file:

Ready to failover check on standby returned RFS_NON_MSTOB.
Command READY_TO_FSFO to thread S024 returned status=0
Fast-Start Failover is not possible because this observer is not the master.

If the master observer is started at a later time, it waits until the FastStartFailoverThreshold timeout is reached again and fails over to the current target standby.

Special Cases with FSFO: Private Redo Network

If the public network on the primary server fails:
– Broker configuration property: ObserverOverride=FALSE.
– No failover (HB over private network still works!).

Fast-Start Failover is not possible because primary last contacted the standby within FastStartFailoverThreshold seconds

In this network configuration consider using:

DGMGRL> EDIT CONFIGURATION SET PROPERTY ObserverOverride='TRUE';

[Figure: master observer on the public network; db_site1, db_site2 and db_site3 connected by a private network that carries the heartbeat (HB).]

Master Observer, Primary/Standby DB: Location? (1)

For DR HA service protection, do not place the primary and the master observer in the same data center!
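Why placement matters can be checked with a tiny Python model. It encodes only what the special cases in this talk state, namely that an automatic failover needs a surviving master observer and a surviving target standby; the function `automatic_failover_possible`, the dictionary keys and the data center names are hypothetical and exist only for this sketch.

```python
def automatic_failover_possible(failed_dc, placement):
    """Return True if FSFO can still happen after a whole data center is lost.

    placement maps "primary", "master_observer" and "target_standby"
    to the name of the data center that hosts the component.
    """
    primary_down = placement["primary"] == failed_dc
    master_observer_up = placement["master_observer"] != failed_dc
    target_standby_up = placement["target_standby"] != failed_dc
    return primary_down and master_observer_up and target_standby_up


# Master observer next to the primary: losing DC1 takes both away.
bad = {"primary": "DC1", "master_observer": "DC1", "target_standby": "DC2"}
# Master observer next to the target standby: it survives the loss of DC1.
good = {"primary": "DC1", "master_observer": "DC2", "target_standby": "DC2"}

print(automatic_failover_possible("DC1", bad))
print(automatic_failover_possible("DC1", good))
```

The model ignores backup observers on purpose: as described for the case where the primary and the master observer fail together, a backup observer does not initiate a failover.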

[Figure: Data Center 1 hosts the primary and the master observer; Data Center 2 hosts the target standby and a backup observer. Outcome labels: No automatic failover! No RW application service available!]

Master Observer, Primary/Standby DB: Location? (2)

For DR HA service protection, do not place the primary and the master observer in the same data center!

[Figure: Data Center 1 and Data Center 2 with a backup observer, the master observer, the primary, the target standby and a candidate target standby. Outcome labels: Automatic FSFO failover! RW application service available!]

Callouts on the slide:
– Master observer placement correction monitoring: DGMGRL> SET MASTEROBSERVER TO …
– To relocate: Disable & Enable.
– Potential placement problem!

Conclusions

Conclusions (1)

Can Data Guard be a good solution for database service high availability?
– Yes, with a fast-start failover configuration.
– However, it is not a replacement for a cluster but rather an alternative.
– Careful business requirements analysis is necessary.

Advantages:
– It offers good service high availability, in addition to excellent data high availability and some other features.
– Fairly simple solution (setup and operation).
– Not subject to additional license fees (EE license assumed).
– Infrastructure requirements are not as high as for a cluster.
– Most client HA features can be used the same way as with a cluster.

Conclusions (2)

Disadvantages:
– Component placement is critical and requires customized monitoring scripts.
– Some technical restrictions like network latencies (SYNC), flashback database or force logging might limit Data Guard in this area.
– Re-connect timeouts without FAN/FCF (no VIPs).
