Combining Opensaf with Vmware Architecture, Measurement
Total Page:16
File Type:pdf, Size:1020Kb
OpenSAF and VMware from the Perspective of High Availability Ali Nikzad, Ferhat Khendek Maria Toeroe Concordia University Ericsson SVM’2013 – Zurich – October 2013 1 Friday, October 18, 13 Outline } Introduction & Objectives } Some Background } Testbed, Failures and Metrics } The baseline architectures } Measurements } Analysis } Architectures combining OpenSAF and virtualization } Conclusion 2 Friday, October 18, 13 Introduction & Objectives } Service availability and continuity have become important requirements in several domains } High availability (demanded in some domains) is defined as at least 99.999% availability which is maximum of 5.26 minutes of downtime in a year } Including scheduled downtime for upgrade for instance } The computing world is moving toward cloud services and cloud computing } Virtualization is an important aspect } Virtualization has many advantages 3 Friday, October 18, 13 Introduction & Objectives } Evaluate and position virtualization and the SAForum middleware with respect to each other } Using OpenSAF and VMware in the test-bed } Analyzing the measurements } Determine pros and cons of using virtualization in/for high availability (HA) } Propose a solution(s) that combines the benefits from each 4 Friday, October 18, 13 Background – SAForum and OpenSAF AMF Checkpointing Notification Log SAForum is a consortium of telecommunication Etc. Logically grouped and computing companies developing standards for components enabling highly available services and applications Applications Application Interface OpenSAF is an open source project focused on SAForum Middleware Platform Service Availability implementing the SAForum Interface service specifications, among others AMF, SMF and Operating System IMM. Hardware Platform 5 Friday, October 18, 13 Background-AMF } AMF is one of the most important services defined by SAForum } AMF configuration: AMF logical entities … } AMF Node Node 1 Node 2 Node 3 } AMF Cluster Service Unit 1 Service Unit 2 Service Unit 3 } Component SA-Aware Pre-instantiable Non-SA-Aware Non-pre-instantiable Component 1 Component 2 Component 3 Service Group } Component Service Instance (CSI) Applicaon Service Instance } Service Unit (SU) Component Service Instance } Service Instance (SI) } Service Group (SG) } Redundancy Models 2N N+M N-Way N-Way-Active No-Redundancy } Application 6 Friday, October 18, 13 Background-Virtualization and VMware } Virtualization is the separation of a resource or request for a service from the underlying physical delivery of that service } VMs are hosted on software called hypervisor } Native (bare metal) } Hosted (non-bare metal) } VMware is one of the leading companies in providing virtualization solutions 7 Friday, October 18, 13 Background - VMware Availability solutions } VMware HA } VMware FT 8 Friday, October 18, 13 Testbed, Failures and Metrics } Test-beds: } 5 nodes cluster } Case study application: } VLC for Video streaming } Failures } VLC component failure (application failure) } VM failure } Physical node failure } More than 45 sets of measurements for metrics in different architectures for different failures 9 Friday, October 18, 13 Testbed – Metrics and Failures } Metrics } Qualitative (criteria) } Complexity of using the solution } Supported Redundancy Models } Scope of failure } Service continuity } Supported platforms Outage Time Recovery Time } Quantitative Metrics Failure Resump.on or Restart } Reaction Time of service Time } Repair Time First reac.on Faulty unit repaired } Recovery Time Reac.on Time } Outage Time Repair Time 10 Friday, October 18, 13 Baseline architectures SA-Aware and Non-SA-Aware versions of VLC on physical and virtual nodes VM availability with VMware HA VMware vSphere cluster SI VLC-CSI IP-CSI ESXi ESXi Node 1 Node 2 SG SU1 SU2 2N Red. VLC VLC component component IP IP component component Shared storage OpenSAF VLC Ubuntu Ubuntu Ubuntu VM Physical node 1 Physical node 2 SAF based architecture VMware HA 11 Friday, October 18, 13 Baseline architectures } OpenSAF on VMware HA } VMs’ lifecycle depends on the hypervisor’s availability mechanisms } VM migration is not considered 12 Friday, October 18, 13 Baseline architectures VLC component Architectures VM failure Node Failure failure Not OpenSAF on physical nodes with SA-Aware VLC component √ √ applicable OpenSAF on physical nodes with Non-SA-Aware VLC Not √ √ component applicable OpenSAF on virtual nodes with SA-Aware VLC component √ √ √ (with/without VMware HA enabled) OpenSAF on virtual nodes with Non-SA-Aware VLC component √ √ √ (with/without VMware HA enabled) VMware HA Not detectable √ √ 13 Friday, October 18, 13 Baseline architectures: Measurements Outage due to VLC component failure (application failure) 14 Friday, October 18, 13 Baseline architectures: Measurements Outage due to Virtual Machine failure 15 Friday, October 18, 13 Baseline architectures: Measurements Outage due to Node Failure I 16 Friday, October 18, 13 Baseline architectures: Measurements Repair of the faulty unit Repair of Failed component Failed VM Failed Node OpenSAF on Standalone machine Yes - No OpenSAF in VM Yes No No Yes (restarting the VM Yes(restarting the VM on VMware HA No on another host) another host) Yes(restarting the VM on OpenSAF in VM + HA Yes Yes another host) 17 Friday, October 18, 13 Baseline architectures: Analysis } SAF based architectures ü Service continuity with SA-Aware components and better reaction time ü Support application failure detection and recovery ü Less overhead û No repair for the cluster node } VMware HA ü Repair of the node is supported û but it is very long (~100s) û Inevitable 15% to 30% overhead due to the VM compared to the physical host deployment û No support for application failure } Baseline combined architecture ü The advantages of both architectures û but, Very long repair time (~100s) due to no redundancy of the VM 18 Friday, October 18, 13 Baseline architectures: Analysis The reaction time to the failure in OpenSAF is much faster So, why not manage the VMs’ life cycle with OpenSAF 19 Friday, October 18, 13 Architectures combining OpenSAF and virtualization: Availability in non-bare-metal hypervisor Physical Physical Cluster 1 Node 1 Cluster 1 Node 2 SG SU2 SU2 2N Red. VLC VLC component component IP IP component component SG No Red. OpenSAF Cluster 2 Ubuntu Ubuntu OpenSAF cluster 2 OpenSAF cluster 2 Node 1 Node 2 VM component 1 SU 1 SU 2 VM component 2 Hypervisor OpenSAF Hypervisor VMware workstaon Cluster 1 VMware workstaon Ubuntu Ubuntu 20 Friday, October 18, 13 Architectures combining OpenSAF and virtualization: Availability in non-bare-metal hypervisor Comparison of different architectures for VM failure Reaction Repair Recovery Outage VMware HA 72.166 27.166 99.332 OpenSAF with ESXi (VMware HA 1.905 107.90 0.047 1.953 manages the VMs) The new availability management in 3.449 3.73 0.056 3.505 non-bare-metal hypervisor * Times are in seconds 21 Friday, October 18, 13 Architectures combining OpenSAF and virtualization: Availability in non-bare-metal hypervisor Failure of the SA-Aware VLC component in different architectures Reaction Repair Recovery Outage VMware HA Not covered OpenSAF with no virtualization 0.009 0.136 0.046 0.055 OpenSAF with ESXi (VMware HA 0.013 0.243 0.068 0.081 manages the VMs) The new availability management in non- 0.012 0.848 0.580 0.592 bare-metal hypervisor The non-bare-metal hypervisor imposes delays on some of the measured times like repair and recovery bare-metal hypervisor can potentially resolve these delays 22 Friday, October 18, 13 Architectures combining OpenSAF and virtualization: Availability in bare-metal hypervisor SI 1 SI 2 VM 1-CSI VM 2-CSI Manager VM1 Manager VM 2 OpenSAF cluster 1 node 1 OpenSAF cluster 1 node 2 SG 1 SU1 SU3 2N red. VM 1 Comp VM 1 Comp VLC-CSI SI SG 2 SU2 SU4 IP-CSI 2N red. VM 2 Comp VM 2 Comp SG SU1 SU2 OpenSAF 2N Red. VLC VLC component component IP IP Ubuntu Ubuntu component component OpenSAF ESXi hypervisor 2 ESXi hypervisor 1 Ubuntu Ubuntu OpenSAF cluster OpenSAF cluster 2 Node 1 2 Node 2 23 Friday, October 18, 13 Architectures combining OpenSAF and virtualization: Availability in bare-metal hypervisor… VM Failure 24 Friday, October 18, 13 Architectures combining OpenSAF and virtualization : Availability in bare-metal hypervisor … Physical Node Failure 25 Friday, October 18, 13 Conclusion } From baseline architectures to combinations } Use the powerful availability management of OpenSAF and benefiting from virtualization } Increased outage time in the new deployment is because of the non-bare-metal hypervisor } The bare-metal architecture can possibly fix it… } Improve the repair time of a failed VM through management of the VM life cycle } Cover the different types of hypervisors (bare-metal and non-bare-metal) } The VM’s life cycle management } is not limited to a specific solution } can support other hypervisors like XEN and Linux KVM because of using similar and standard interfaces like libvirt } 26 Friday, October 18, 13 Thank you for your attention! 27 Friday, October 18, 13.