
RE-IMAGING AN EXADATA STORAGE USING AN INTERNAL OR EXTERNAL USB DRIVE

www.hexaware.com

Table of Contents

Oracle Exadata Storage Server Software Rescue Procedure - A Brief Introduction

A Fast Forward experience to backup/restore/recovery testing of your Future Exadata Environment

Cell Node Re-Imaging - Ideal Scenarios

Pragmatic Scenarios where cell node re-imaging is not required

Recovering a Cell Node using Internal USB

Recovering a Cell Node using External USB

Summary

Oracle Exadata Storage Server Software Rescue Procedure - A Brief Introduction

A Fast Forward experience to backup/restore/recovery testing of your Future Exadata Environment

• Have you experienced a disk failure, and are you deeply concerned about its impact?
• Has your system volume been corrupted due to the loss of disks?
• Does your operating system have a corrupted file system?

You can resolve these issues yourself with the Exadata cell rescue functionality that Oracle provides on the Exadata Storage Server Software CELLBOOT USB flash drive. The procedure itself is straightforward. However, you must do enough due diligence beforehand to ensure a smooth recovery.

Oracle automatically backs up the operating system and cell software on each Exadata Storage Server. No intervention is required from the Oracle Database Machine Administrator (DMA) or from any operational process. The critical files from the storage cells are backed up to the internal USB drive, called the CELLBOOT USB flash drive.

The internal USB drive (/dev/sdm) holds the most recent configuration and an up-to-date copy of the OS and storage cell software. It is also the default first boot device, so the Master Boot Record (MBR) and the GRand Unified Bootloader (GRUB) are loaded from the USB. The hard disk drives (HDD) act as a fallback and keep their own copy of the same data.

This paper describes how to recover a storage cell using the internal USB drive when both system disks are lost. It also outlines the cell node recovery procedure for the case where both system disks and the internal USB drive are lost and no cell boot image was created on an external USB drive as a backup.

Cell Node Re-Imaging - Ideal Scenarios

The recommended scenarios for re-imaging a cell node are:
• Corruption of the MD (Linux software RAID) metadata of a critical partition.
• Unrecoverable file system corruption on a critical partition.
• Inconsistent data on disk caused by accidental deletion (for example, rm -rf).

Pragmatic Scenarios where cell node re-imaging is not required
• When a non-critical partition is unrecoverable.
• When file system corruption is minimal and the file system consistency check (fsck) can repair it.
• When an inconsistent MD RAID state can be remedied manually.

CRITICAL PARTITIONS
• The active partitions are critical: /dev/md5 (mirrored to /dev/md6) on / (root FS) and /dev/md7 (mirrored to /dev/md8) on /opt/oracle.

NON-CRITICAL PARTITIONS
• /dev/md11 on /var/log/oracle stores cell logs and is therefore non-critical. It can be recovered by reformatting.

• The internal USB (/dev/sdm1) is not critical and can be rebuilt when the cell boots from disk. This normally happens automatically while the cell runs its boot scripts. If it does not, run ./make_cellboot_usb.sh -force -verbose manually.

• /dev/md4 on /boot: its contents can be copied from a healthy cell that runs an identical image version.
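The rules in the scenario lists and the partition classification above can be combined into a small lookup. The sketch below is a minimal Python illustration, not an Oracle tool. The PARTITIONS table and the needs_reimage helper are hypothetical names, and the device-to-mount mapping only restates this section, so confirm it on your own image version. Here, "repaired in place" means that fsck fixed the file system or that the MD RAID inconsistency was remedied manually. Data lost to an accidental rm -rf on a critical partition counts as not repaired.

```python
# Partition roles as listed in this paper: device -> (mount point, critical?).
# Hypothetical table restating the text; verify device names on your image version.
PARTITIONS: dict[str, tuple[str, bool]] = {
    "/dev/md5": ("/", True),                  # active root FS, mirrored to /dev/md6
    "/dev/md7": ("/opt/oracle", True),        # active cell software, mirrored to /dev/md8
    "/dev/md4": ("/boot", False),             # copy from a cell with an identical image
    "/dev/md11": ("/var/log/oracle", False),  # cell logs, recoverable by reformatting
    "/dev/sdm1": ("CELLBOOT USB", False),     # rebuilt by make_cellboot_usb
}


def needs_reimage(device: str, repaired_in_place: bool) -> bool:
    """Return True only when a critical partition could not be repaired in place.

    repaired_in_place: fsck fixed the file system, or the MD RAID
    inconsistency was remedied manually.
    """
    _mount_point, critical = PARTITIONS[device]  # KeyError for unknown devices
    return critical and not repaired_in_place


print(needs_reimage("/dev/md5", repaired_in_place=False))   # True
print(needs_reimage("/dev/md11", repaired_in_place=False))  # False
```

Non-critical partitions never trigger a re-image in this sketch. That matches the paper's position that they can be reformatted, rebuilt, or copied from a healthy cell instead.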

Recovering a Cell Node using Internal USB

The following steps invoke the storage cell rescue procedure when both system disks (volumes) are lost.

NOTE: It is strongly recommended to open a Service Request and engage Oracle Support before starting the cell rescue procedure.

1. Access the Exadata storage server using the ILOM console. After the cell boots, the Oracle Exadata splash screen is displayed.

2. From the boot options listed, the user should select: CELL_USB_BOOT_CELLBOOT_usb_in_rescue_mode.

3. After the node has booted, two choices are displayed:
(e)nter interactive diagnostic shell
(r)einstall or try to recover corrupted system

(r)einstall deletes, recreates, and restores all partitions and system volumes from the data backed up to the CELLBOOT USB image.

Choose (r) to reinstall and confirm with 'y' to continue.

4. The rescue process prompts for the root rescue password. This is not the server's root password. The default root rescue password is "sos1exadata".

5. The system asks whether to erase the data partitions and data disks. The right answer depends on the type of recovery the user intends to perform:
• If the user chooses to erase the data partitions and data disks, all cell disk metadata is removed from the non-system disks, and the storage server must be reconfigured.
• If the user chooses not to erase them, the cell disk metadata remains intact and the cell disks can be imported at a later stage.

6. At the end of the first phase of the rescue, the system prompts for a restart or offers to enter the shell. At that point:
a. Enter the shell. Do not restart the system.
b. Log in to the shell using the rescue root password ("sos1exadata" by default).

c. Run the reboot command from the shell. As the cell restarts, and before the Oracle Exadata splash screen appears, press F8 to open the boot device selection menu. Select the RAID controller as the boot device so that the cell boots from the hard disks instead of the CELLBOOT USB.

7. Once the rescue process completes, the cell reboot resumes. If the storage cell is powered off, power it on by logging in to the ILOM console.

8. After the cell boots, validate the image information of the rescued cell (for example, with the imageinfo command) against that of the other healthy cells.

9. After a successful recovery, the cell must be configured. Because the data was preserved (the data partitions were not erased), the cell disks can be imported.
a. Import the cell disks on all disks other than those that failed and were replaced during the rescue procedure. Then confirm the result with the LIST CELLDISK command:
    CellCLI> IMPORT CELLDISK ALL FORCE
    CellCLI> LIST CELLDISK
b. Re-create the Cell Disks and Grid Disks on the physical disks (system disks) that were replaced during the rescue procedure.
• Find the Cell Disk names from the output of the "list lun" and "list celldisk" commands. For example:
    CellCLI> list lun attributes name,deviceName,diskType where isSystemLun=TRUE
    0_0 /dev/sdq HardDisk
    0_1 /dev/sdr HardDisk
    CellCLI> list celldisk attributes name,deviceName,lun
    CD_00_celadm01 /dev/sdq 0_0
    CD_01_celadm01 /dev/sdr 0_1
    (... output omitted for brevity)

• Create the Cell Disks manually for the replaced physical disks:
    CellCLI> create celldisk CD_00_celadm01 lun=0_0
    CellCLI> create celldisk CD_01_celadm01 lun=0_1

• Check that the Cell Disks are created and that their status is normal:
    CellCLI> list celldisk
    CD_00_celadm01 normal
    CD_01_celadm01 normal
    (... output omitted for brevity)

Now it’s time to create Grid Disks on the newly created Cell Disks.

• Find the Grid Disk names, sizes, and offsets by running the "list griddisk" command against the system disks of one of the surviving cells. For example:
    CellCLI> list griddisk attributes name,size,offset where celldisk=CD_00_celadm02
    DATAC1_CD_00_celadm02 2.8837890625T 32M
    RECOC1_CD_00_celadm02 738.4375G 2.8838348388671875T
    CellCLI> list griddisk attributes name,size,offset where celldisk=CD_01_celadm02
    DATAC1_CD_01_celadm02 2.8837890625T 32M
    RECOC1_CD_01_celadm02 738.4375G 2.8838348388671875T
• Create the Grid Disks using the sizes from the example above. For optimal performance, create the Grid Disk with the lower offset first so that it receives the same lower offset it has on the surviving cells:
    CellCLI> CREATE GRIDDISK DATAC1_CD_00_celadm01 CELLDISK=CD_00_celadm01, -
    SIZE=2.8837890625T
    CellCLI> CREATE GRIDDISK RECOC1_CD_00_celadm01 CELLDISK=CD_00_celadm01, -
    SIZE=738.4375G
  Repeat these commands for CD_01_celadm01.
• Verify that the Grid Disks were created correctly using the "list griddisk" command:
    CellCLI> list griddisk attributes name,size,offset,status where celldisk=CD_00_celadm01
    DATAC1_CD_00_celadm01 2.8837890625T 32M active
    RECOC1_CD_00_celadm01 738.4375G 2.8838348388671875T active

c. Log in to the Oracle ASM instance and bring the disks ONLINE for each disk group:
    SQL> ALTER DISKGROUP disk_group_name ONLINE DISKS IN FAILGROUP cell_name;
If any system disks were replaced during the rescue procedure and Cell Disks and Grid Disks were re-created on them as described in step b, add the newly created Grid Disks to ASM:
    SQL> ALTER DISKGROUP DATAC1 ADD DISK 'o/*/DATAC1_CD_00_celadm01';
    SQL> ALTER DISKGROUP RECOC1 ADD DISK 'o/*/RECOC1_CD_00_celadm01';

d. Check whether the flash log and flash cache exist:
    CellCLI> list flashlog
    CellCLI> list flashcache
If they do not exist, create them:
    CellCLI> create flashlog all
    CellCLI> create flashcache all

e. Reconfigure the cell using the ALTER CELL command. The example below uses the most common parameters:
    CellCLI> ALTER CELL smtpServer='my_mail.example.com', -
    smtpFromAddr='sender_email_address', -
    smtpFromPwd=email_address_password, -
    smtpToAddr='recipient_email_address', -
    notificationPolicy='critical,warning,clear', -
    notificationMethod='mail,snmp'

f. Re-create the I/O Resource Management (IORM) plan and the metric thresholds.
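Step 8, which validates the rescued cell's image against the healthy cells, is easy to script once imageinfo output has been collected from every cell (for example, over SSH). The sketch below is a minimal Python illustration. It assumes that each output contains an "Active image version:" line. The function names, cell names, and version strings are hypothetical sample data.

```python
import re


def active_image_version(imageinfo_output: str) -> str:
    """Return the value on the 'Active image version:' line of imageinfo output."""
    match = re.search(
        r"^\s*Active image version:\s*(\S+)", imageinfo_output, re.MULTILINE
    )
    if match is None:
        raise ValueError("no 'Active image version' line found")
    return match.group(1)


def mismatched_cells(outputs: dict[str, str], rescued_cell: str) -> list[str]:
    """Return the healthy cells whose active image differs from the rescued cell's."""
    expected = active_image_version(outputs[rescued_cell])
    return sorted(
        cell
        for cell, text in outputs.items()
        if cell != rescued_cell and active_image_version(text) != expected
    )


# Hypothetical imageinfo excerpts collected from three cells.
outputs = {
    "celadm01": "Active image version: 19.2.1.0.0\n",
    "celadm02": "Active image version: 19.2.1.0.0\n",
    "celadm03": "Active image version: 19.2.0.0.0\n",
}
print(mismatched_cells(outputs, "celadm01"))  # ['celadm03']
```

An empty list means every healthy cell reports the same active image version as the rescued cell.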
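The cell disk commands in step 9b follow a fixed pattern: each system LUN reported by "list lun ... where isSystemLun=TRUE" gets a "create celldisk" command that uses the name shown by "list celldisk". The Python sketch below builds those commands from saved command output. It is a hypothetical helper, not part of the Exadata software. It assumes the attribute lists are exactly the ones in the example (name,deviceName,diskType and name,deviceName,lun). The CD_02 row in the sample is invented to show that non-system LUNs are skipped.

```python
def parse_rows(cellcli_output: str) -> list[list[str]]:
    """Split tabular CellCLI output into whitespace-separated fields, one list per row."""
    return [line.split() for line in cellcli_output.splitlines() if line.strip()]


def create_celldisk_commands(lun_output: str, celldisk_output: str) -> list[str]:
    """Build 'create celldisk' commands for the system LUNs only.

    lun_output:      list lun attributes name,deviceName,diskType where isSystemLun=TRUE
    celldisk_output: list celldisk attributes name,deviceName,lun
    """
    system_luns = {row[0] for row in parse_rows(lun_output)}
    commands = []
    for name, _device_name, lun in parse_rows(celldisk_output):
        if lun in system_luns:
            commands.append(f"create celldisk {name} lun={lun}")
    return commands


lun_output = """
0_0 /dev/sdq HardDisk
0_1 /dev/sdr HardDisk
"""
celldisk_output = """
CD_00_celadm01 /dev/sdq 0_0
CD_01_celadm01 /dev/sdr 0_1
CD_02_celadm01 /dev/sdc 0_2
"""
for command in create_celldisk_commands(lun_output, celldisk_output):
    print(command)
```

Review the generated commands before pasting them into CellCLI.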
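The grid disk steps in 9b and the ASM step in 9c can be derived from one listing taken on a surviving cell. Sort the grid disks by offset so that the lowest offset is created first, rename them for the rescued cell, and emit both the CellCLI and the ASM statements. The Python sketch below makes three assumptions. Sizes use binary K/M/G/T suffixes. Grid disk names follow the <DISKGROUP>_CD_<nn>_<cell> pattern from the example. The target cell disk name is supplied by the caller. The function names are hypothetical.

```python
# Binary unit multipliers; assumes CellCLI sizes carry K/M/G/T suffixes.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size: str) -> float:
    """Convert a CellCLI size such as '738.4375G' or '32M' to bytes."""
    suffix = size[-1].upper()
    if suffix in UNITS:
        return float(size[:-1]) * UNITS[suffix]
    return float(size)


def plan_griddisks(
    surviving_output: str, source_cell: str, target_cell: str, target_celldisk: str
) -> tuple[list[str], list[str]]:
    """Return (CellCLI commands, ASM statements) to rebuild one cell disk's grid disks.

    surviving_output: 'list griddisk attributes name,size,offset where celldisk=...'
    captured on a surviving cell.
    """
    rows = [line.split() for line in surviving_output.splitlines() if line.strip()]
    rows.sort(key=lambda row: to_bytes(row[2]))  # lowest offset first
    cellcli, asm = [], []
    for name, size, _offset in rows:
        new_name = name.replace(source_cell, target_cell)
        diskgroup = new_name.split("_CD_")[0]  # DATAC1_CD_00_celadm01 -> DATAC1
        cellcli.append(
            f"CREATE GRIDDISK {new_name} CELLDISK={target_celldisk}, SIZE={size}"
        )
        asm.append(f"ALTER DISKGROUP {diskgroup} ADD DISK 'o/*/{new_name}';")
    return cellcli, asm


# Sample from the text, deliberately listed in reverse offset order.
surviving = """
RECOC1_CD_00_celadm02 738.4375G 2.8838348388671875T
DATAC1_CD_00_celadm02 2.8837890625T 32M
"""
cellcli, asm = plan_griddisks(surviving, "celadm02", "celadm01", "CD_00_celadm01")
print("\n".join(cellcli + asm))
```

Sorting on the numeric offset, rather than on the listing order, ensures that DATAC1 is created before RECOC1 even when the input rows arrive in a different order.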
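Steps 9d and 9e can also be prepared ahead of time. Decide which flash objects need creating, then render the ALTER CELL command with CellCLI's trailing-hyphen line continuation. The Python sketch below assumes that empty output from "list flashlog" or "list flashcache" means the object does not exist. It also quotes every attribute value in single quotes. The helper names and sample values are hypothetical.

```python
def flash_commands(flashlog_output: str, flashcache_output: str) -> list[str]:
    """Return the create commands needed when 'list flashlog'/'list flashcache' print nothing."""
    commands = []
    if not flashlog_output.strip():
        commands.append("create flashlog all")
    if not flashcache_output.strip():
        commands.append("create flashcache all")
    return commands


def alter_cell_command(attributes: dict[str, str]) -> str:
    """Build a multi-line ALTER CELL command; every value is single-quoted."""
    assignments = [f"{key}='{value}'" for key, value in attributes.items()]
    # A trailing hyphen continues a CellCLI command on the next line.
    return "ALTER CELL " + ", -\n    ".join(assignments)


print(flash_commands("", ""))
print(alter_cell_command({
    "smtpServer": "my_mail.example.com",
    "notificationPolicy": "critical,warning,clear",
    "notificationMethod": "mail,snmp",
}))
```

The flash log is listed before the flash cache to match the order used in step d.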

Recovering a Cell Node using External USB

In some cases both the internal USB and the HDDs are damaged, and no cell boot image was created on an external USB drive. In that case, use a new external USB stick to recover the cell. Any USB stick of 4 GB or more can be used. The typical steps are:

1. Download the Vxxxxx.zip file from eDelivery and transfer it to one of the compute nodes in the Exadata rack. Insert a new USB stick into the same compute node.
2. Unzip Vxxxxx.zip, then unzip cellImageMaker.tar.zip. Extract the resulting tar file with tar -xvf, which creates the directory dl180.
3. From inside the dl180 directory, run makeImageMedia.sh to generate the new image on the external USB:
    # ./makeImageMedia.sh -preconf -nodisktests
4. Unplug the external USB from the compute node and plug it into the external USB port of the cell node being rescued.
5. Because the cell is configured to boot from the internal USB, you must boot from the external USB manually. In the first few seconds of the boot, press F8 to change the boot order and select the external USB as the first boot device. If the GRUB menu looks completely different from the standard menu of a cell, the cell has booted from the external USB.
6. Do not erase the data disks.
7. Because there is no backup of the configuration, ipconf runs during the recovery and asks for the IP addresses, hostname, and similar settings. The Exadata deployment PDF that Oracle ACS provided at deployment time can be used to answer the ipconf prompts. The /opt/oracle.cellos/cell.conf file on a healthy cell can also be used as a source.
8. After the recovery is complete, configure the cell using the same steps as in the Internal USB rescue procedure described in the previous section.
9. If the internal USB is not rebuilt within about 30 minutes of the cell being recovered (check with imageinfo), remove the external USB and run the following command:
    ./make_cellboot_usb.sh -force -verbose
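Step 2 (the nested unzip, unzip, and tar extraction) can be done with Python's standard zipfile and tarfile modules when unzip is not available on the compute node. This is a minimal sketch under stated assumptions. The function name is hypothetical, and the patch file name stays the placeholder Vxxxxx.zip. It assumes the inner archive unpacks to a file named cellImageMaker.tar and that this tar creates dl180. makeImageMedia.sh must still be run by hand afterwards.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_image_maker(patch_zip: Path, workdir: Path) -> Path:
    """Unpack the downloaded patch zip down to the dl180 directory and return its path."""
    workdir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(patch_zip) as zf:
        zf.extractall(workdir)

    # The inner archive may sit in a subdirectory of the patch zip.
    inner_zip = next(workdir.rglob("cellImageMaker.tar.zip"), None)
    if inner_zip is None:
        raise FileNotFoundError("cellImageMaker.tar.zip not found in the patch")
    with zipfile.ZipFile(inner_zip) as zf:
        zf.extractall(inner_zip.parent)

    tar_path = inner_zip.parent / "cellImageMaker.tar"  # assumed inner file name
    with tarfile.open(tar_path) as tf:
        # The tar (not the zips) preserves the executable bit of makeImageMedia.sh.
        if hasattr(tarfile, "data_filter"):
            tf.extractall(inner_zip.parent, filter="data")  # safer extraction when supported
        else:
            tf.extractall(inner_zip.parent)

    image_dir = inner_zip.parent / "dl180"
    if not image_dir.is_dir():
        raise FileNotFoundError("tar did not create the dl180 directory")
    return image_dir
```

Typical use is extract_image_maker(Path("Vxxxxx.zip"), Path("/tmp/cellimage")), followed by changing into the returned directory to run makeImageMedia.sh as in step 3. The "data" extraction filter rejects some unusual archive members, such as links that point outside the target directory. If it refuses the real archive, fall back to tar -xvf as described in step 2.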

Summary

It is unlikely that both system disks will fail and the boot volumes will be corrupted. Losing the internal USB as well would be an exceptional case, unless you hit a bug or make a human error. Exadata software maintains the internal CELLBOOT USB backup, but you should still mount the CELLBOOT USB and check its contents occasionally. Creating an external USB drive for backup purposes takes little effort, and ideally it should be kept current with the newest Exadata software release. If the internal CELLBOOT USB is corrupted along with the system disks, the cell can then be booted from that external USB. The recovery procedure is the same as the one followed for the internal USB rescue. Document any custom modifications to alerts and thresholds carefully so that they can be reapplied after the rescue.

About Hexaware

Hexaware is the fastest growing next-generation provider of IT, BPO and consulting services. Our focus lies in taking a leadership position in helping our clients attain customer intimacy as their competitive advantage. Our digital offerings have helped our clients achieve operational excellence and customer delight by ‘Powering Man Machine Collaboration.’ We are now on a journey of metamorphosing the experiences of our customer’s customers by leveraging our industry-leading delivery and execution model, built around the strategy ‘Automate Everything™, Cloudify Everything™, Transform Customer Experiences™.’

We serve customers in Banking, Financial Services, Capital Markets, Healthcare, Insurance, Manufacturing, Retail, Education, Telecom, Professional Services (Tax, Audit, Accounting and Legal), Travel, Transportation and Logistics. We deliver highly evolved services in Rapid Application prototyping, development and deployment; Build, Migrate and Run cloud solutions; Automation-based Application support; Enterprise Solutions for digitizing the back-office; Customer Experience Transformation; Business Intelligence & Analytics; Digital Assurance (Testing); Infrastructure Management Services; and Business Process Services.

Hexaware services customers in over two dozen languages, from every major time zone and every major regulatory zone. Our goal is to be the first IT services company in the world to have a 50% digital workforce.



www.hexaware.com | © 2019 Hexaware Technologies Limited. All rights reserved.