Oracle Platinum Services – Fault Monitoring

What to Expect

Contents

Document Objective
Overview
  Remote Fault Monitoring
  Fault Monitoring Framework
  Key Components of the Gateway
  Managing the OASG
  Oracle Advanced Support Portal
Fault Monitoring Details
  Customer Requirement and Obligations
  Fault Monitoring Roles and Responsibilities
Activities
  Oracle Platinum Services Fault Monitoring Implementation Prerequisites
  Oracle Platinum Services Fault Monitoring Implementation
  Oracle Platinum Services Fault Monitoring Event for Oracle Exadata and Oracle Zero Data Loss Recovery Appliance
  Oracle Platinum Services Fault Monitoring Events for Oracle SuperCluster, Oracle Exalogic, and Oracle Exadata
Appendix I – Oracle Platinum Services Fault Monitoring Events
  ASR Fault Events
  OEM Fault Events
Appendix II – Description of Common For-Fee Monitoring Items
Appendix III – Access Requirements
  Oracle Access to Data
Appendix IV Process Flow Diagrams
  High-level Process Flow for Oracle SuperCluster, Oracle Exalogic, and Oracle Exadata

Updated: January 14, 2019 Page 2 of 35 Author: Oracle Copyright © 2019, Oracle and/or its affiliates. All rights reserved.

  High-level Process Flow for Oracle Exadata and Oracle Recovery Appliance (OEM Auto Generated SRs)
Appendix V Sample Service Request Details
  SR Example I – OEM Detected Fault
  SR Example II – ASR Detected Fault
Appendix VI Sample Fault Notification Email
Appendix VII Sample Notification of EM generated SR
Appendix VIII Fault Event Telemetry and Configuration Data
  Sample Fault Event Telemetry Data via OASP
  Sample Configuration Item via OASP
  Sample Configuration Item Drilldown
  Oracle Collection Manager Collections
  Sample Configuration Collection


Document Objective

The objective of this document is to provide an overview of the Oracle Platinum Services Remote Fault Monitoring framework and detail a sample list of activities that may be performed to monitor a Certified Platinum Configuration. The information included in this document is for informational purposes only and is subject to change. This document is not binding on either party, will not be deemed an agreement between the parties and does not amend and/or modify the terms of any order or agreement.

Overview

Remote Fault Monitoring

Remote Fault Monitoring, referred to as “Fault Monitoring” in this document, is a deliverable of Oracle Platinum Services. Oracle Platinum Services remotely monitors for faults in the hardware, database, operating system and networking components of Certified Platinum Configurations twenty-four (24) hours per day, seven (7) days per week and provides a mechanism to trigger the creation of a Service Request (SR) on behalf of the customer. Fault Monitoring is subject to the Oracle Platinum Services Technical Support Policy.

• Please review the Oracle Platinum Services Technical Support Policy at http://www.oracle.com/us/support/library/platinum-services-policies-1652886.pdf.
• A list of Certified Platinum Configurations is available at http://www.oracle.com/us/support/library/certified-platinum-configs-1652888.pdf.

Fault Monitoring focuses on helping you maintain system and component functionality. Oracle determines whether an event constitutes a fault. For a list of Oracle Platinum Services fault monitoring events, please see Appendix I.

You may purchase additional monitoring services for a fee. Examples of for-fee monitoring include but are not limited to performance, availability and capacity monitoring. For a description of each, please see Appendix II.

For assistance with for-fee monitoring, please contact [email protected].


Fault Monitoring Framework

At the heart of the Fault Monitoring Framework is the Oracle Advanced Support Gateway (OASG). The OASG is a multi-purpose platform designed to facilitate and enable a number of Oracle connected services including Oracle Platinum Services.

One gateway can monitor multiple Engineered Systems (for example, up to eight (8) Full Rack SuperCluster machines) as long as they are network accessible and the network connection between the OASG and the Engineered System is reliable with low latency. In conjunction with the Oracle Continuous Connection Network (OCCN) transport layer, the OASG establishes secure connectivity to Oracle via SSL. Learn more about gateway security by reviewing the Gateway Security Guide.

Key Components of the Gateway

The gateway has several key components that facilitate Fault Monitoring. These include:

• Oracle Enterprise Manager – Oracle Enterprise Manager (OEM) is the standard tool for monitoring and managing Oracle products. With Oracle Platinum Services Fault Monitoring, OEM is the primary tool for detecting software faults. OEM software included with the OASG also includes rule-based fault detection functionality that automatically creates a Service Request (SR) and uploads related diagnostics, when available, upon detection of critical OEM issues with Exadata and Recovery Appliance. A client-side OEM agent is installed on the Certified Platinum Configuration as a communication mechanism with OEM.
• Oracle Auto Service Request – Oracle Auto Service Request (ASR) is used to detect hardware faults and automatically create the associated SR. ASR detects faults in compute nodes, storage cells, and their Oracle Integrated Lights Out Managers (ILOM). For more information on ASR, see Auto Service Request (ASR) documentation.
• Oracle Configuration Manager – Oracle Configuration Manager (OCM) captures Engineered System configuration information and uploads the data to My Oracle Support. The configuration data is extracted and uploaded every twenty-four (24) hours and is analyzed by Oracle Support Engineers when working to resolve SRs. For more information on data collected, see OCM documentation and Appendix VIII for a sample configuration collection.
• Oracle Advanced Customer Support Services (ACS) Monitoring Framework (MFW) – The ACS MFW is a centralized framework for receiving, filtering, categorizing and enriching events from multiple monitoring sources. It qualifies events as faults and forwards consolidated information to Oracle Support for SR creation. See Appendix VIII for details on fault event telemetry data collected from the Certified Platinum Configuration.

Managing the OASG

The OASG will be monitored, managed, and maintained by Oracle remotely via the OCCN connection. Oracle monitors the entire event flow, from the OEM agent installed on the Certified Platinum Configuration to the OCM Collections housed at Oracle. This ensures Oracle is alerted to any breakdown in communication between components or any software failure, including issues with OEM itself. Oracle Platinum Services leverages OEM to monitor OASG system resources such as disk, memory, and CPU. If the OASG is running on Oracle-owned hardware, Oracle Platinum Services will leverage ASR to monitor the key components of the hardware and engage Oracle Support accordingly.

Oracle Advanced Support Portal

The Oracle Advanced Support Portal (OASP) is a fully integrated, ITIL-based operations management framework, including tools, processes and technology, which is hosted by Oracle and delivered via a Web interface. It enables users to monitor and manage their infrastructure elements.

The OASP provides a view of your configuration items, incident management, change management, user account management, and reporting.

See the OASP Quick Reference Guide or the Oracle Advanced Support Portal Demo for more information. Sample fault event telemetry and sample configuration item details visible to the customer can be found in Appendix VIII.


Fault Monitoring Details

Customer Requirement and Obligations

The table below identifies customer requirements and obligations for successful Fault Monitoring with Oracle Platinum Services.

Item: Network Connectivity
Requirement / Obligation:
• Provide continuous inbound VPN connection via OCCN.
• Open ports between OASG and Engineered System for agent communication and diagnostics.

Item: Deployment
Requirement / Obligation:
• Provide root or “sudo” (see footnote 1) access for agent deployment and monitoring configuration.
• Provide a dedicated user for deploying the OEM agent.
• Provide monitoring account credentials.

Item: Service Delivery
Requirement / Obligation:
• Provide root or “sudo” access for management of agents and SR troubleshooting.
• Provide notification of changes to Engineered System and associated targets, such as new databases to be monitored, databases that are removed, IP address changes, and password changes.
• Work with Oracle Support Services (OSS) to resolve any agent issues that cannot be corrected remotely.

For additional details on required firewall ports, please see the Oracle Advanced Support Gateway Security Guide. For additional details on access requirements, please see Appendix III.
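As a rough illustration of how open ports between the OASG and an Engineered System could be verified, the Python sketch below attempts a TCP connection to each host and port pair. The function names are hypothetical, and no host names or port numbers are taken from this document; the actual values come from the Oracle Advanced Support Gateway Security Guide.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(targets):
    """Map each (host, port) pair to its reachability as seen from this machine."""
    return {(host, port): port_is_open(host, port) for host, port in targets}
```

Calling `check_ports` with a list of `(host, port)` tuples from the machine that hosts the gateway returns a dictionary that can be reviewed before implementation begins. A successful TCP handshake only shows that the port is reachable; it does not confirm that the expected service is listening.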

Note: Without continuous inbound connection, Oracle will not be able to validate faults, which negates the 15-minute resolution / 30-minute joint debug Oracle Platinum Service target response times.

1 sudo allows a user to execute a command or process with the privileges of another user – typically superuser or root – without having to grant full access to those privileged accounts.

Fault Monitoring Roles and Responsibilities

Role: Oracle Platinum Driver
Responsibility: Oracle assigns each customer a Platinum Driver to provide key information during the customer’s consideration of Platinum Services. The goal is to verify that the customer is fully qualified, fully understands the requirements and responsibilities of the service, is committed to Platinum Services, and completes prerequisites before implementation begins. Once implementation of Platinum Services is underway, the Platinum Driver may engage the customer to help resolve delays and to see that customer expectations are being well managed and executed.

Role: Oracle Implementation Engineer (IE)
Responsibility: The Implementation Engineer is the primary point of contact and technical manager for customers during the Oracle Platinum Services implementation. From the point of receiving ownership of the Platinum Implementation SR (PISR) to the point of hand over to the delivery organization, the IE acts as the technical project lead during the implementation and remotely installs all technical aspects of the fault monitoring, Oracle Auto Service Request (ASR), and Oracle Configuration Manager (OCM) solution. The IE is also responsible for coordinating the resources and activities to deliver and install the Engineered System.

Role: Oracle Platinum Control Center
Responsibility: The Oracle Platinum Control Center is responsible for fault event management after a fault is detected, including managing faults in OASP, fault notification and SR creation.

Role: Customer Contact
Responsibility: Customer contact(s) are notified of verified fault events received by Oracle Platinum Services. Notification is made by email only and can be to individuals or an alias.

Role: Oracle Field Engineer
Responsibility: The Oracle Field Engineer (FE) is responsible for the Oracle Platinum hardware gateway installation (on Oracle hardware), OASG installation and Platinum connectivity to Oracle.

Role: Customer Platinum Manager
Responsibility: The customer assigns an employee or contractor to fill the Customer Platinum Manager role. The Customer Platinum Manager is the point of contact (POC) for Oracle and is responsible for the coordination of customer resources, installation-related activities (for example, opening firewall ports), and decisions needed for a smooth implementation. This POC is also responsible for the integration with customer processes and meeting the planned Go Live schedule. Additional responsibilities include managing customer stakeholder decisions and, when necessary, consulting within the company to acquire expertise for service integration: network expert(s), security expert(s), and the target system owner(s).


Activities

Oracle Platinum Services Fault Monitoring Implementation Prerequisites

The table below identifies a sample of the prerequisite activities associated with implementing Fault Monitoring. Any activity in this list not completed may result in implementation delays.

1. Service Implementation Worksheet (SIW) (for new gateway only)
   Your Oracle account team will guide you through the process of completing the Service Implementation Worksheet online. This worksheet is used to collect details for Oracle to initiate the Oracle Platinum implementation process. Some of the key details collected are:
   - Customer contact information for fault notification, change management, SRs and remote patch deployment.
   - Configuration information for the OASG.
   - External and internal firewall requirements.
   - Access requirements.

2. Open Firewall Ports
   Open necessary firewall ports. See Oracle Advanced Support Gateway Security Guide Network Protocol and Port Matrix for details.

3. Install and Configure the Certified Platinum Configuration
   The Oracle FE and Oracle IE will install and configure the Engineered System.

4. Provide suitable hardware or virtual environment for the OASG software
   You must provide a suitable environment for the OASG software. This can be an x86 machine that meets the specifications outlined in the Gateway Host System Requirements, or an Oracle Virtual Machine running on an Oracle Virtual Server (that is, using the Oracle VM Server software).
   Note: Oracle Database Appliance and Oracle Private Cloud Appliance with connected services implemented are not recommended.

5. Complete Network Connectivity Form (IPSEC VPN only)
   Oracle Global IT will assist the customer in completing the OCCN Network Connectivity Form if IPSEC VPN is required.

6. Deploy OCCN
   • For SSL VPN, Oracle will deploy OCCN after the customer has enabled the outbound connection from the gateway.
   • For IPSEC VPN, Oracle Global IT will assist the customer in deploying OCCN.


Oracle Platinum Services Fault Monitoring Implementation

The table below identifies a sample of the activities associated with implementing Fault Monitoring. Any activity in this list not completed may result in implementation delays.

1. Deploy the OASG
   Two options:
   a. The customer may download, deploy, and test the ISO image on the VM, and complete the registration process using the code from the SIW. This will configure the system to the point where the IE can remotely access the system from Oracle using the VPN, and proceed with the installation and configuration of OEM and components of the Oracle Platinum Services.
   b. The IE will deploy the OASG software and configure components of the Oracle Platinum Services on customer-provided hardware that meets the requirements outlined in the Gateway Host System Requirements.
   Note: If the customer opts for non-Oracle hardware to deploy the OASG, the customer will install and configure the OASG image software.

2. Install and deploy OEM agents to Certified Platinum Configuration
   The IE will install and deploy monitoring OEM agents to the target Engineered System and will, if required, upgrade the OEM agents with the latest patches.

3. Discover Monitoring Targets
   The IE will discover monitoring component targets and deploy monitoring templates.

4. Activate ASR
   The IE will enable ASR on the target Engineered System and the OASG.
   Note: For existing ASR installations, reconfiguration is required for Oracle Platinum Services.

5. Install and Configure OCM
   The IE will install and configure OCM to capture configuration information on the target Engineered System.

6. Configure OASP
   The IE will configure the OASP for use.

7. Validation
   The IE will validate the OASG and monitoring setup.


Oracle Platinum Services Fault Monitoring Event for Oracle Exadata and Oracle Zero Data Loss Recovery Appliance

The table below identifies the activities and owners associated with a Fault Monitoring event for Oracle Exadata and Oracle Zero Data Loss Recovery Appliance (Oracle Recovery Appliance) where the OEM software automatically opens SRs when it detects Oracle Platinum faults. For a list of Oracle Platinum Services fault monitoring events, please see Appendix I.

1. Receive Oracle Platinum Services Fault Event
   A fault event is detected via the OASG.

2. Validate Oracle Platinum Services Fault
   Oracle determines whether a fault is a valid Oracle Platinum Services fault (see Appendix I for a list of Oracle Platinum Services fault monitoring events).
   • If the fault is a valid Oracle Platinum Services fault, an SR will be automatically opened (see Step 3 below).
   • If the fault is not a valid Oracle Platinum Services fault, there is no further action required of Oracle.

3. Open Oracle Platinum Services SR & Notify Customer of Oracle Platinum Services Fault Event
   When: Within 5 minutes of receiving fault event.
   Fault notification will be sent to the distribution email list and the customer contact defined for the Certified Platinum Configuration via email after a fault is validated and the SR is opened. See sample fault notification in Appendix VI, and sample SR in Appendix V, SR Example I.

4. Diagnostic Upload
   • Diagnostics are collected automatically if the fault is an Automatic Diagnostic Repository (ADR) covered fault (see Exadata ORA Events (Oracle Platinum Services Only) for a list of covered faults, including ADR).
   • Otherwise, diagnostics will be collected by Oracle or with the assistance of the customer.

5. Resolve Oracle Platinum Services SR
   Oracle Support and the customer contact will work together to adjust severity levels as needed (see Technical Support Policy Severity Definitions) and resolve the SR.


Oracle Platinum Services Fault Monitoring Events for Oracle SuperCluster, Oracle Exalogic, Oracle Private Cloud Appliance, Oracle Zero Data Loss Recovery Appliance and Oracle Exadata

The table below identifies the activities and owners associated with a Fault Monitoring event for Oracle SuperCluster, Oracle Exalogic, and Oracle Exadata where the Oracle Platinum Control Center opens SRs when OEM detects Platinum faults. For a list of Oracle Platinum Services fault monitoring events, please see Appendix I.

1. Receive Oracle Platinum Services Fault Event
   A fault event is received in the Oracle Platinum Control Center.

2. Create Incident ticket in OASP
   The Oracle Platinum Control Center creates an incident ticket in OASP.

3. Notify Customer of Fault
   When: Within 5 minutes of receiving fault event.
   Based upon customer preference, the Oracle Platinum Control Center will provide fault notification to the customer’s identified customer contact by email. See sample fault notification in Appendix VI.

4. Validate Fault
   Oracle determines whether a fault is a valid Oracle Platinum Services fault (see Appendix I for a list of Oracle Platinum Services fault monitoring events).
   • If the fault is a valid Oracle Platinum Services fault, an SR will be opened (see step 5 below).
   • If the fault is not a valid Oracle Platinum Services fault, the incident ticket opened in step 2 will be closed and there is no further action required of Oracle.

5. Open SR
   When: Within 15 minutes of notification.
   • For a hardware fault event, the Oracle Platinum Control Center will validate that an ASR has been created. See sample SR in Appendix V, SR Example II.
   • For a software fault event, the Oracle Platinum Control Center will create an SR. See sample SR in Appendix V, SR Example I.

6. Resolve Oracle Platinum Services SR
   Oracle Support and the customer contact will work together to adjust severity levels as needed (see Technical Support Policy Severity Definitions) and resolve the SR.
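The two timing targets in the activities above (notification within 5 minutes of receiving the fault event, SR within 15 minutes of notification) can be expressed as a small check. The Python sketch below is illustrative only: the class and field names are hypothetical and do not describe any Oracle tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Targets taken from the "When" column of the activities above.
NOTIFY_TARGET = timedelta(minutes=5)   # measured from receipt of the fault event
SR_TARGET = timedelta(minutes=15)      # measured from customer notification


@dataclass
class FaultTimeline:
    """Hypothetical record of the timestamps for one fault event."""
    received: datetime
    notified: Optional[datetime] = None
    sr_opened: Optional[datetime] = None

    def notification_on_time(self) -> bool:
        """True if the customer was notified within the 5-minute target."""
        if self.notified is None:
            return False
        return self.notified - self.received <= NOTIFY_TARGET

    def sr_on_time(self) -> bool:
        """True if the SR was opened within 15 minutes of notification."""
        if self.notified is None or self.sr_opened is None:
            return False
        return self.sr_opened - self.notified <= SR_TARGET
```

Measuring the SR target from the notification time rather than from receipt follows the wording of step 5; a fault that is not validated never reaches that step, so `sr_opened` stays unset.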


Appendix I – Oracle Platinum Services Fault Monitoring Events

Fault Monitoring covers the events documented below and is a combination of all ASR fault events and an Oracle determined set of events generated by OEM. This list of fault events is subject to change and some gateways may not yet have the latest software update.

ASR Fault Events

ASR fault events are documented publicly. Please consult the following for details on ASR fault events:

• ASR Fault Event Coverage for Oracle Exadata:
  o Exadata Database Machine
  o Exadata Database Servers
  o Exadata Storage Servers
  o Exadata Servers
• ASR Fault Event Coverage for Oracle Exalogic
  o Exalogic Server
  o Exalogic Storage Appliance
• ASR Fault Event Coverage for Oracle SuperCluster
  o SuperCluster Products
  o See Exadata Storage Servers
  o See SuperCluster Storage Appliance
• ASR Fault Event Coverage for Oracle Zero Data Loss Recovery Appliance
  o See Exadata Database Machine X5-2
  o See Exadata Storage Servers
• ASR Fault Event Coverage for Oracle ZFS Storage Appliance Racked System
  o ZFS Products
• ASR Fault Event Coverage for Oracle Private Cloud Appliance
  o See PCA X5-2 Monitored Components
  o Note: PCA component fault coverage is defined via the component's SNMP fault coverage. For example, for PCA X7-2 compute nodes, see Oracle® Server X7-2 Service Manual: Monitoring Components and Identifying SNMP Messages

OEM Fault Events

OEM software included with the OASG includes rule-based fault detection functionality that automatically creates an SR and uploads related diagnostics, when available, upon detection of critical OEM faults for Oracle Exadata and Oracle Recovery Appliance. Oracle SuperCluster and Oracle Exalogic SRs are manually created by the Oracle Platinum Control Center.

The list of Oracle Platinum Services monitored faults below is determined by Oracle, is standard, and is not subject to customization.

Each entry gives the item number and name, the x marks from the system columns (Exadata, Exalogic, SuperCluster, Zero Data Loss Recovery Appliance, ZS Racked Systems, PCA), and the description.

1. Fan Failure on an Infiniband Switch (systems: x x x x x)
   A fan in the Infiniband switch has failed or dropped below a safe operating level
2. OS Kernel Panic and Errors (systems: x x x x x)
   The Kernel has encountered an error condition which may have led to a restart of the system
3. SCSI and PCI Errors (systems: x x x x x)
   SCSI Errors have been detected by the OS
4. Memory Errors (systems: x x x x x)
   Memory Errors have been detected by the OS
5. Disk Errors (systems: x x x x x)
   Disk Errors have been detected by the OS
6. I/O Errors (systems: x x x x x)
   I/O Errors have been detected by the OS
7. ZFS Cluster (systems: x x)
   Detection of standby node failure
8. ZFS Critical and Major Alerts (systems: x x)
   Major and Critical alerts reported in problem, alert and fault logs
9. ZFS # spare disks available (systems: x x)
   Detection of spare disks availability
8. Voting Disk Alert* (systems: x x x)
   CRS-160(4|5|6)
9. Control VM Database* (systems: x)
   ORA errors (see list of ORA errors below)
10. Node Configuration Alert* (systems: x x x)
   CRS-180(2|3|4|5), CRS-1607
11. OCR Alert* (systems: x x x)
   CRS-(1006|1008|1010|1011|1009)
12. Oracle High Availability Service Alert* (systems: x x x)
   CRS-(1202|1402|1602|1603)
13. CRS Resource Alert* (systems: x x x)
   CRS-120(3|5|6)
14. Generic Incident* (systems: x x x x)
   ORA-(227|239|240|255|445|494|3137|4036|24982|25319|29770|29771|32701|32703|32704|56729)
15. Cluster Error* (systems: x x x)
   ORA-29740
16. Data Block Corruption* (systems: x x x x)
   ORA-1578
17. Generic Internal Error (Exadata Storage Cell and DB)* (systems: x x x x)
   ORA-600
18. Access Violation* (systems: x x x x)
   ORA-7445 (Exadata Storage Cell and DB), ORA-3113, RS-7445 (Exadata Storage Cell)
19. Redo Log Corruption* (systems: x x x x)
   ORA-(353|355|356)
20. Out of Memory* (systems: x x x x)
   ORA-403(0|1)
21. File Access Error* (systems: x x x x)
   ORA-376
22. Deadlock (System)* (systems: x x x x)
   ORA-4020
23. Soft Internal Error* (systems: x x x x)
   ORA-700, RS-700 (Exadata Storage Cell)
24. Data file cannot be identified/locked* (systems: x x x x)
   ORA-1157
25. Media failure* (systems: x x x x)
   ORA-(1242|1243)
26. Invalid file header information* (systems: x x x x)
   ORA-27048
27. Recovery Appliance task failure (systems: x)
   ORA-(45168|45111)
28. Recovery Appliance timer failure (systems: x)
   ORA-45169
29. Recovery Appliance metadata corruption (systems: x)
   ORA-45109
30. Corruption in backup piece (systems: x)
   ORA-(45132|45167)
31. Corruption in backup data (systems: x)
   ORA-45165
32. Temperature, Value (Celsius) (systems: x x x x)
   Cisco Switch (> 56F)
33. Fan State (systems: x x x x)
   Cisco Switch
34. Power Supply State (systems: x x x x)
   Cisco Switch
35. Module1 Phase2 Threshold Evaluation (systems: x x x x)
   PDU
36. Module1 Phase1 Threshold Evaluation (systems: x x x x)
   PDU
37. Module1 Phase3 Threshold Evaluation (systems: x x x x)
   PDU
38. Fabric Interconnect Alarms (systems: x)
   Fabric Interconnect Alarms at Warning or Critical
39. Fabric Interconnect Response (systems: x)
   Fabric Interconnect Unresponsive
40. Active Management Node Service Status (systems: x)
   Service Status for MySQL, OVM Manager and OVM Manager CLI down
41. Management Node Unresponsive (systems: x)

*Related to Oracle Database component only
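The code groups in the table use a compact alternation notation, for example CRS-160(4|5|6) for CRS-1604, CRS-1605 and CRS-1606. The Python sketch below shows how a few of those groups could be matched against a log line. It is only an illustration: the dictionary covers a subset of the table, the `classify` function is hypothetical, and allowing leading zeros (so that ORA-00600 matches ORA-600) is an assumption rather than something this document states. It does not represent how OEM detects faults.

```python
import re

# Subset of the Appendix I code groups, keyed by the item name from the table.
# "0*" tolerates zero-padded codes such as ORA-00600 (an assumption).
FAULT_PATTERNS = {
    "Voting Disk Alert": r"CRS-160(?:4|5|6)",
    "Node Configuration Alert": r"CRS-(?:180(?:2|3|4|5)|1607)",
    "Data Block Corruption": r"ORA-0*1578",
    "Generic Internal Error": r"ORA-0*600",
    "Redo Log Corruption": r"ORA-0*(?:353|355|356)",
    "Out of Memory": r"ORA-0*403(?:0|1)",
}

# Word boundaries stop ORA-600 from matching inside a longer code such as ORA-06000.
_COMPILED = {name: re.compile(rf"\b{pat}\b") for name, pat in FAULT_PATTERNS.items()}


def classify(line: str):
    """Return the names of all listed fault items whose code appears in the line."""
    return [name for name, rx in _COMPILED.items() if rx.search(line)]
```

A line can match more than one item, so the function returns a list in the order the items are declared.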

Appendix II – Description of Common For-Fee Monitoring Items

The descriptions below are for common for-fee monitoring items. These are examples of items not covered by Fault Monitoring and do not represent a complete list.

• Performance Monitoring – Measures IT service components against agreed upon metrics and thresholds.
• Availability Monitoring – Measures the availability of key IT infrastructure components against a defined availability target.
• Capacity Monitoring – Measures resource utilization and performance against the defined capacity plan with the ability to adjust based on changing demand.

For assistance with for-fee monitoring, please contact [email protected].


Appendix III – Access Requirements

Oracle requires a continuous connection to the Certified Platinum Configuration during delivery of Oracle Platinum Services, as described in the Oracle Platinum Services Technical Support Policy. The following table describes the user account access required by Oracle during the implementation and ongoing delivery of Oracle Platinum Services.

Columns: System Component; Login Account; Service Activation; Patch and Restore; x marks for the systems where the account is required (Exadata, Exalogic, SuperCluster, Recovery Appliance, ZS Racked Systems, PCA); Justification.

Integrated Lights Out Manager
• root. Service Activation / Patch and Restore: Yes / Yes. Systems: x x x x x. Justification: To set SNMP parameters and create orarom monitoring account.
• orarom. Service Activation / Patch and Restore: Yes / Yes. Systems: x x x x x. Justification: Ongoing monitoring. This account is created during the setup by Oracle.

Compute/DB hosts
• root. Service Activation / Patch and Restore: Yes / Yes. Systems: x x x x. Justification: Required for implementing solution, creating orarom user and configuring monitoring.
• orarom. Service Activation / Patch and Restore: Yes / Yes. Systems: x x x x. Justification: Ongoing monitoring, primary owner of the OEM agent. This account is created during the setup by Oracle.

Storage cells
• root. Service Activation / Patch and Restore: Yes / Yes. Systems: x x x. Justification: SSH keys for agent login without password, define SNMP parameters.
• cellmonitor. Service Activation / Patch and Restore: Yes. Systems: x x x. Justification: Ongoing monitoring. This account is created during the setup by Oracle.

ASM
• asmsnmp. Service Activation / Patch and Restore: Yes / Yes. Systems: x x x. Justification: To configure ASM monitoring from OEM and ongoing monitoring.

DBMS
• dbsnmp. Service Activation / Patch and Restore: Yes / Yes. Systems: x x x. Justification: To configure DB monitoring for OEM, ongoing monitoring and configuration data collection.

IB Switches
• root. Service Activation / Patch and Restore: Yes / Yes. Systems: x x x x. Justification: SSH keys for agent login without password, define SNMP parameters.
• nm2user. Service Activation / Patch and Restore: Yes / Yes. Systems: x x x x. Justification: To monitor Infiniband Switches.

Cisco Switch
• Admin. Service Activation / Patch and Restore: Yes / Yes. Systems: x x x x. Justification: To define SNMP parameters; only required for initial configuration.
• enable. Service Activation / Patch and Restore: Yes. Systems: x x x x.

PDUs
• Admin. Service Activation / Patch and Restore: Yes / Yes. Systems: x x x x. Justification: To define SNMP parameters.

ZFS
• root. Service Activation / Patch and Restore: Yes / Yes. Systems: x x x. Justification: To create shares for agent installation (Exalogic only) and to run workflow to enable OEM monitoring.
• orarom. Service Activation / Patch and Restore: Yes / Yes. Systems: x x x. Justification: Created during installation and assigned to the agent role, which is used for ongoing monitoring.

Exalogic
• root. Service Activation / Patch and Restore: Yes / Yes. Systems: x. Justification: Control VMs - for release 2.0.6.x.x.
• root. Service Activation / Patch and Restore: Yes / Yes. Systems: x. Justification: Ops Center VM and OVMM VM for release 2.0.4.x.x.

Domains & Zones
• root. Service Activation / Patch and Restore: Yes / Yes. Systems: x.

Recovery Appliance (Admin)
• rasys. Service Activation / Patch and Restore: Yes / No. Systems: x. Justification: Ongoing monitoring. This account is created during the setup by Oracle.

Recovery Appliance
• root. Service Activation / Patch and Restore: Yes / Yes. Systems: x. Justification: Initial Activation and one time SSH communication between nodes.

Management Switches (ES1-24)
• root. Systems: x. Justification: To set up SNMP parameters for monitoring.

MySQL Database
• root. Systems: x. Justification: To create orarom monitoring account.
• orarom. Systems: x. Justification: Ongoing monitoring. This account is created during the setup by Oracle.

Fabric Interconnect switches
• root. Systems: x. Justification: To create orarom monitoring account.
• orarom. Systems: x. Justification: Ongoing monitoring. This account is created during the setup by Oracle.

Management nodes
• root. Systems: x. Justification: To create orarom monitoring account.
• orarom. Justification: Ongoing monitoring. This account is created during the setup by Oracle.

Collection of Logs and Diagnostic Data

• Oracle’s default position is that support engineers will retrieve logs and diagnostic data that can be collected without posing a risk to the system or negatively impacting it in any way.
• If an engineer is unable to collect data for a technical reason (for example, insufficient access or the intrusive nature of the identified collection procedure), the engineer will ask the customer to collect and upload the needed data to the service request.
• In a critical SEV1 situation, an engineer may ask the customer to upload diagnostic data if this will help expedite restoration and bring the system back to proper functioning faster.

Oracle Access to Data
• OEM agents are installed using a unique account created specifically for monitoring (orarom). This account can be read-only and does not need administrative access to the Operating System or Oracle Database.
• Within the Oracle Database, OEM agents use a generic DBSNMP account, which is enabled for monitoring, including configuration collection. This configuration data can be used as diagnostics for restoration planning and for patch planning.


• The generic DBSNMP account has restricted access to the database for monitoring purposes only. The users cannot run SQL commands, navigate Tablespaces, or maliciously query the Oracle Databases.
• As described in the table above, Oracle requires administrative-level privileged access to the Certified Platinum Configuration during Oracle Platinum Services implementation (including setup of fault monitoring), during remote patch deployment events, and to assist with diagnostics and fault restoration.
  o Privileged access (to root or oracle accounts, for example) does not need to be continuous. It can be provided on a temporary basis and then revoked upon completion of the task. For example, access can be provided for a remote patch deployment event, then revoked when the remote patch deployment event is complete.
  o During Oracle Platinum Services implementation, including setup of fault monitoring, direct access to the root and other privileged accounts is required as described in the table above.
  o During ongoing fault monitoring activities, including collection of diagnostic information to assist with fault restoration activities, access to root and other privileged accounts can be constrained and monitored with the use of tools such as sudo.
  o During a remote patch deployment event, access to root and other privileged accounts can be constrained and monitored with the use of tools such as sudo.
• Group read and write access must be set for each database node’s diagnostic directory, /u01/app/oracle/diag, so that relevant diagnostic files can be uploaded during the OEM SR automation process. Detailed information is available in How to setup diagnostic directory group permissions for Platinum Automated Diagnostic Upload (Doc ID 1633603.1).
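The group read and write requirement on the diagnostic directory can be spot-checked before activation. The following is a minimal sketch in Python, assuming a POSIX file system; the function names are hypothetical, and the authoritative procedure remains the one in Doc ID 1633603.1. It only inspects permission bits and does not change them.

```python
import os
import stat
import tempfile

# Group read + group write permission bits.
REQUIRED_BITS = stat.S_IRGRP | stat.S_IWGRP


def has_group_read_write(path):
    """Return True if the group bits on `path` include both read and write."""
    mode = os.stat(path).st_mode
    return (mode & REQUIRED_BITS) == REQUIRED_BITS


def find_missing_group_access(root):
    """Walk `root` and return the directories that lack group read/write."""
    missing = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        if not has_group_read_write(dirpath):
            missing.append(dirpath)
    return missing


if __name__ == "__main__":
    # Demonstration on a throwaway directory rather than a real diag tree.
    demo_root = tempfile.mkdtemp()
    os.chmod(demo_root, 0o770)
    print(find_missing_group_access(demo_root))
```

On a database node the same check would be pointed at /u01/app/oracle/diag; any directory it reports would need its group permissions adjusted as described in the referenced document.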


Appendix IV Process Flow Diagrams

High-level Process Flow for Oracle SuperCluster, Oracle Exalogic, Oracle Private Cloud Appliance and Oracle Exadata
The table below identifies the activities and owners associated with a Fault Monitoring event for Oracle SuperCluster, Oracle Exalogic, Oracle Private Cloud Appliance and Oracle Exadata where the Oracle Platinum Control Center opens SRs when OEM detects Platinum faults.

High-level Process Flow for Oracle Exadata and Oracle Recovery Appliance (OEM Auto Generated SRs)
The table below identifies the activities and owners associated with a Fault Monitoring event for Oracle Exadata and Oracle Recovery Appliance where the OEM software automatically opens SRs when it detects Platinum faults.


Appendix V Sample Service Request Details
A Platinum Service Request (SR) may be created manually by the Control Center after an OEM-detected fault for Oracle SuperCluster, Oracle Private Cloud Appliance, or Oracle Exalogic, or created automatically after an OEM-detected fault for Oracle Exadata or Oracle Recovery Appliance. An SR may also be created automatically via ASR for your Certified Platinum Configuration.

SR Example I – OEM Detected Fault
Source: OEM
Type: Automatic

Abstract: SASR:ORA-600 - This is an automated database error on an Exadata System

Description
Hostname: xyzdb02
Product Type: EM ASR PRODUCT
Summary: SASR:ORA-600 - This is an automated database error on an Exadata System

Message Payload Data:

problem_key = ORA 600 [1350]
target_name = db_db02
host_name = xyzdb02.oracle.com
target_type = oracle_database

Hardware Component:
Name: NA
Id: NA

SASR:ORA-600 - This is an automated database error on an Exadata System;

Alerts received in last 30 days (limit 10)
Date                  Summary                                 SR
21 Jul 2014 03:56:40  SASR:oracle_ibswitch:metric_alert:Aggr  [SR #]
21 Jul 2014 03:56:39  SASR:oracle_ibswitch:metric_alert:Aggr  [SR #]
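The Message Payload Data in the example above is a run of `name = value` pairs, so it can be turned into a lookup table when triaging an SR. The following is a minimal sketch in Python; the function name is hypothetical, and the parsing rule (a field starts wherever a word is followed by " = ") is an assumption based only on this sample, not a documented format.

```python
import re

# Assumption: every field begins with a word followed by " = ".
KEY_PATTERN = re.compile(r"(\w+) = ")


def parse_payload(text):
    """Split 'key = value key = value ...' text into a dict.

    Works whether the pairs are on one line or one per line. A value that
    itself contains ' = ' would be split incorrectly.
    """
    fields = {}
    matches = list(KEY_PATTERN.finditer(text))
    for index, match in enumerate(matches):
        # A value runs until the next key starts, or to the end of the text.
        if index + 1 < len(matches):
            value_end = matches[index + 1].start()
        else:
            value_end = len(text)
        fields[match.group(1)] = text[match.end():value_end].strip()
    return fields


sample = (
    "problem_key = ORA 600 [1350] target_name = db_db02 "
    "host_name = xyzdb02.oracle.com target_type = oracle_database"
)
payload = parse_payload(sample)
print(payload["problem_key"])  # ORA 600 [1350]
print(payload["host_name"])    # xyzdb02.oracle.com
```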


SR Example II – ASR Detected Fault
Source: Automated Service Request (ASR)
Type: Automatic

Abstract: ASR:Memory module correctable errors exceeding acceptable levels.

Description
Hostname: xyzdb01
Product Type: ORCL,SPARC-T4-4
Summary: ASR:Memory module correctable errors exceeding acceptable levels.

Fault event knowledge article: https://support.oracle.com/msg/SUN4V-8002-3R

The number of correctable errors associated with this memory module has exceeded
Message-ID: SUN4V-8002-3R
UUID: [UUID #]
Time: Jun 9, 2014 6:44 AM (UTC)
Severity: Major

FRU = hc://:chassis-mfg=unknown:chassis-name=ORCL,SPARC-T4-4:chassis-part=7020893:chassis-serial=[chassis serial #]:fru-serial=[fru-serial #]:fru-part=07020578,HMT42GR7BMR4A-G/chassis=0/cpuboard=0/dimm=8
Part number = 07020578,HMT42GR7BMR4A-G
Certainty = 95
Class = fault.memory.dimm-page-retires-excessive

Alerts received in last 30 days (limit 10)
Date                  Summary                                 SR
09 Jun 2014 12:25:09  ASR:Memory module correctable errors e  [SR #]
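The FRU value in the example above packs the chassis properties and the physical location of the faulted part into one string. The following is a minimal sketch in Python of how such a string could be split for reading; the function name is hypothetical, and the layout assumed here (colon-separated `name=value` properties, then slash-separated `component=instance` location steps) is inferred from this single sample.

```python
def parse_fru(fru):
    """Split an 'hc://...' FRU string into scheme, properties, and location.

    Assumed layout, inferred from the sample SR:
      scheme://:name=value:name=value/component=instance/component=instance
    """
    scheme, _, rest = fru.partition("://")
    # Everything before the first "/" holds the properties; the rest is the path.
    authority, _, path = rest.partition("/")
    properties = dict(
        item.split("=", 1) for item in authority.split(":") if "=" in item
    )
    location = [
        tuple(step.split("=", 1)) for step in path.split("/") if "=" in step
    ]
    return scheme, properties, location


sample_fru = (
    "hc://:chassis-mfg=unknown:chassis-name=ORCL,SPARC-T4-4"
    ":fru-part=07020578,HMT42GR7BMR4A-G/chassis=0/cpuboard=0/dimm=8"
)
scheme, properties, location = parse_fru(sample_fru)
print(properties["fru-part"])  # 07020578,HMT42GR7BMR4A-G
print(location)                # [('chassis', '0'), ('cpuboard', '0'), ('dimm', '8')]
```

Read this way, the sample points at DIMM 8 on CPU board 0, which matches the memory module fault named in the SR abstract.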


Appendix VI Sample Fault Notification Email
Below is a sample notification email that would be sent to the customer contact defined by the customer for the Certified Platinum Configuration.


Appendix VII Sample Notification of EM generated SR
Below is a sample notification sent to the customer contact of the Certified Platinum Configuration when an SR is automatically opened for an Oracle Platinum detected fault.


Appendix VIII Fault Event Telemetry and Configuration Data
The OASG collects fault and configuration data to aid in the delivery of Fault Monitoring, remote patch planning and installation, and support and restoration services. Below are samples of data collected via OASP and OEM under the Oracle collection manager (defined below).

For a demonstration of the OASP, see Oracle Platinum Services - Oracle Advanced Support Portal (OASP) Overview - [ Video ] (Doc ID 1607117.1)

Sample Fault Event Telemetry Data via OASP
The information below is a sample of fault event telemetry data that is collected by the OASG and used for incident management and resolution. The detail is visible to the customer via the OASP.

Agent: Platinum Connector
Alert Group:
Alert Key: TT:oracle_database|EN:UserAudit:username|MG:D84385697496BC960548 ON:sample host|EC:Metric Alert|CAT:Security,
Article Id:
CTA Receive Time: 2014-08-31 23:01:12 PDT
Cleared Timestamp: 2014-09-01 00:16:04 PDT
Correlation:
Customer Id: 55520521
Customer Name: Sample Customer
Debug Info1:
Debug Info2: V:556(70678)|C:a3d0060a-a779-4287-9ce1-7fbd7afb6efe|
Event Time Drift: 3
Grade: 3
Identifier: 88aa025e-ed45-4fa5-8e4d-18bf0cc5d58b|CM:0|Major|TT:oracle_database|EN:UserAudit:username|MG:D6448569B496BC9205481E8A70692F1E|ON:samplehost|EC:Metric Alert|CAT:Security,|Platinum Connector|WebEvent::OracleEnterpriseManager::V12c::Generic_user_audit
Manager: OracleEnterpriseManagerV12c
Managing Host: gateway name
Original Message: V:556(70678)
Original Severity: 3
Reporter Statu Flash:
Severity: 0
Summary: DOWNGRADE WARNINGS: UserAudit:username User SYS logged on from samplehost
Target uuid: 99aa055f-ed45-4fab-8e6d-16be0cc5d58c
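Several of the telemetry values above (for example the alert key and identifier) are pipe-delimited strings in which some segments carry a short uppercase tag such as TT, EN, ON, EC, or CAT. The following is a minimal sketch in Python that separates tagged from untagged segments. The function name is hypothetical, the document does not define what each tag means, and treating "all-uppercase letters before the first colon" as a tag is an assumption drawn from this sample only.

```python
def parse_tagged_fields(value):
    """Split a pipe-delimited telemetry string into tagged and untagged parts.

    A segment such as 'TT:oracle_database' is stored under its tag ('TT').
    Segments without an uppercase tag are returned in order, unchanged.
    """
    tagged = {}
    untagged = []
    for segment in value.split("|"):
        segment = segment.strip()
        if not segment:
            continue
        tag, separator, rest = segment.partition(":")
        if separator and tag.isalpha() and tag.isupper():
            tagged[tag] = rest
        else:
            untagged.append(segment)
    return tagged, untagged


sample_identifier = (
    "CM:0|Major|TT:oracle_database|EN:UserAudit:username|ON:samplehost"
    "|EC:Metric Alert|CAT:Security,|Platinum Connector"
    "|WebEvent::OracleEnterpriseManager::V12c::Generic_user_audit"
)
tagged, untagged = parse_tagged_fields(sample_identifier)
print(tagged["TT"])  # oracle_database
print(tagged["EN"])  # UserAudit:username
print(untagged[0])   # Major
```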

Sample Configuration Item via OASP
Below is a sample Exalogic configuration item, configured for Oracle Platinum Services, as shown in OASP. The detail is visible to the customer via the OASP.

Target Name               System Name      Type                      Model
/sample_Exalogic          sample_exalogic  Application Server        Exalogic System
ec1-vm-sample.sample.org  sample_exalogic  Server                    SunFire X4170 M2
pc1-vm-sample.sample.org  sample_exalogic  Server                    SunFire X4170 M2
pc2-vm-sample.sample.org  sample_exalogic  Server                    SunFire X4170 M2
sample-ovmm.sample.org    sample_exalogic  Server                    SunFire X4170 M2
samplegw01.sample.org     sample_exalogic  Oracle Infiniband Switch  QDR Infiniband Switch
samplegw02.sample.org     sample_exalogic  Oracle Infiniband Switch  QDR Infiniband Switch
samplesn01.sample.org     sample_exalogic  Storage                   ZFS Storage Appliance
samplesn02.sample.org     sample_exalogic  Storage                   ZFS Storage Appliance


Sample Configuration Item Drilldown
Oracle Platinum customers are able to drill down into each configuration item via the OASP. A detailed view may show the following information depending on the target type chosen:

Name: ec1-vm-sample.sample.org

Customer: SAMPLE CUSTOMER

Category: ComputerSystem

Type: Server

External Id:

Make: Sun

Model: SunFire X4170 M2

Description:

Status: Production

UUID:

Serial Number: N/A

Barcode: host

Architecture:

Firmware:

IP Address Type Primary Assigned CI
Management IP ec1-vm-sample.sample.org xx.yy.zzz.aa


Oracle Collection Manager Collections
Oracle collection manager is a component of OEM housed on the OASG. Collections are made every 24 hours. If a change is recognized between the current and the last collection, the change is uploaded to Oracle. This makes the most current configuration data available to Oracle for diagnostic purposes in the event of a fault. Collections are attached to an SR upon submission. Below is a sample of the data collected via Oracle collection manager for an Oracle SuperCluster machine.

The item count in the collections will be configuration dependent.
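The compare-then-upload behaviour described above can be illustrated with a small change-detection routine. The following is a minimal sketch in Python and not the collection manager's actual mechanism, which this document does not describe: it assumes each collection is a flat mapping of item names to values, and all function names and sample values are hypothetical.

```python
import hashlib
import json


def fingerprint(collection):
    """Return a stable SHA-256 digest of a collection (a dict of item -> value)."""
    canonical = json.dumps(collection, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_upload(previous, current):
    """True when the current collection differs from the previous one."""
    return fingerprint(previous) != fingerprint(current)


def changed_items(previous, current):
    """Map each changed, added, or removed item to its (old, new) values."""
    names = sorted(set(previous) | set(current))
    return {
        name: (previous.get(name), current.get(name))
        for name in names
        if previous.get(name) != current.get(name)
    }


# Hypothetical values for two consecutive daily collections.
yesterday = {"Memory Size (MB)": 262144, "Total CPU Sockets": 2}
today = {"Memory Size (MB)": 524288, "Total CPU Sockets": 2}

if needs_upload(yesterday, today):
    print(changed_items(yesterday, today))
```

Comparing digests first keeps the unchanged case cheap; the item-level comparison runs only when something actually differs.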

Sample Configuration Collection

System Configuration Header
Configuration [, May 15, 2014]
Name Type Release Last Collected Host Oracle Homes Support ID Level Lifecycle Source


Hardware Configuration Header
Hardware
NAME                             VALUE
Host Name
Domain
Vendor Name
Virtual
System Config
Machine Architecture
Clock Frequency(MHz)
Memory Size (MB)
Local Disk Space (GB)
Total CPU Sockets
Total CPU Cores
Total Enabled CPU Cores
Total CPU Threads
CPU Board Count
I/O Card Count
Host ID
System Serial Number
Fan Count
Power Supply Count
Boot Disk Volume Serial Number
System BIOS


Operating System Configuration Header
Operating System
NAME                                  VALUE
Name
Vendor Name
Base Version
Update Level
Distributor Version
Max Swap Space (MB)
Address Length (bits)
Platform ID
Current OS Run Level
Default OS Run Level
Platform Version ID
Is DB Machine Member
Is Exalogic Member
Maximum Process Virtual Memory (MB)
Timezone
Timezone Region
Timezone Delta

Hardware Configuration Components – Item count configuration dependent
Hardware Components
Name, Manufacturer, Type, Size In Bytes, Part Number, Serial Number, PCI Id, Model, Revision, Location


Operating System Configuration Registry
Operating System Registered Software

Architecture

Name Vendor Name Version Installation Date Installed Location ID Description Vendor Specific Information Virtual Machine Name/Identifier Software Parent Identifier Product Parent Name Product Media Type Registry Source

Installed Firmware Register
Installed Firmware
Description, Type, Version, Installation Date, Provider, Release Date

Installed Operating System Patch Register
Installed OS Patches
Id, Vendor, Applied Packages
