Architecting for the cloud: lessons learned from 100 CloudStack deployments Sheng Liang CTO, Cloud Platforms, Citrix CloudStack History 2008 2009 2010 2011 2012 Sept 2008: Nov 2009: May 2010: July 2011: April 2012: VMOps CloudStack Cloud.com Citrix Apache Founded 1.0 GA Launch & Acquires CloudStack CloudStack Cloud.com 2.0 GA The inventor of IaaS cloud – Amazon EC2 Amazon eCommerce Platform EC2 API Amazon Proprietary Orchestration Software Open Source Xen Hypervisor Commodity Networking Storage Servers CloudStack is inspired by Amazon EC2 Amazon CloudPortaleCommerce Platform CloudEC2 APIAPIs Amazon ProprietaryCloudStack Orchestration Software ESX Hyper-VOpen SourceXenServer Xen Hypervisor KVM OVM Commodity Networking Storage Servers There will be 1000s of clouds SP Data center mgmt Desktop Owner | Operator Owner and automation Cloud IT Horizontal Vertical General Purpose Special Purpose Learning from 100s of CloudStack deployments Service Providers Web 2.0 Enterprise What is the biggest difference between traditional-style data center automation and Amazon-style cloud? How to handle failures • Server failure comes from: ᵒ 70% - hard disk ᵒ 6% - RAID controller ᵒ 5% - memory ᵒ 18% - other factors 8% • Application can still fail for Annual Failure Rate of servers other reasons: ᵒ Network failure ᵒ Software bugs Kashi Venkatesh Vishwanath and ᵒ Human admin error Nachiappan Nagappan, Characterizing Cloud Computing Hardware Reliability, SoCC’10 11 Internet Core Routers … Access Routers Aggregation Switches Load Balancers … Top of Rack Switches Servers •Bugs in failover mechanism •Incorrect configuration 40 % •Protocol issues such Effectiveness of network as TCP back-off, redundancy in reducing failures timeouts, and spanning tree reconfiguration Phillipa Gill, Navendu Jain & Nachiappan Nagappan, Understanding Network Failures in Data Centers: Measurement, Analysis and Implications , SIGCOMM 2011 13 A. Promise users VM, storage, and networking will never fail -- no strategy to handle failures B. Backup VM for users and restore for users when failure happens C.Tell users to expect failure. Users to backup VM and handle failure themselves zCloud zCloud West East Zone Zone AWS AWS West East Zone Zone zCloud zCloud West East Zone Zone AWS Design for AWS West East Zone Failure Zone Cloud workloads Traditional-Style Amazon-Style Reliable hardware, backup entire Tell users to expect failure. cloud, and restore for users when Users to build apps that can failure happens withstand infrastructure failure Link aggregation VM backup/snapshots Storage multi-pathing Ephemeral resources VM HA, fault tolerance Chaos monkey VM live migration Multi-site redundancy Strong consistency Eventual consistency Designing a zone for a traditional workload Hypervisor Traditional-Style Availability Zone vSphere or XenServer Enterprise vCenter/XenCenter Storage Enterprise Networking (e.g., VLAN) SAN Networking Hypervisor Hypervisor Hypervisor L2 VLANs Cluster Cluster Cluster Network Services Enterprise Storage (e.g., SAN) Load Balancing VPN Multi-tier Apps Ent App Mgmt Designing a zone for an Amazon-style workload Amazon-Style Availability Zone Software Defined Networks Hypervisor (e.g., Security Groups, EIP, ELB,...) XenServer or KVM Server Server Server Server Storage Racks Racks Racks Racks Local EBS Server Server Server Server Racks Racks Racks Racks Networking L3 SDN based L2 Elastic IP Server Server Server Server Racks Racks Racks Racks Network Services Security Groups ELB GSLB Elastic Block Storage Multi-tier Apps 3rd Party Tools (e.g., RightScale, enStratusenStratus)) Object store is critical for Amazon-style cloud Availability Zone 2 NetScaler ELB/ Storage Cloud ? GSLB NetScaler Users Availability Zone 2 NetScaler Availability Zone 1 Same Cloud can Support Both Styles Apache CloudStack Mgmt Server Traditional Traditional AWS-style AWS-style AWS-style Style Style Availability Availability Availability Availability Availability Zone Zone Zone Zone Zone Object Storage Replication/DR Tests for a “true” cloud app • Does it require SAN or VLAN? • Does it run in multiple data centers? • Does it involve a distributed object store? • Is there a single point of failure? Learning from 100s of CloudStack deployments Service Providers Web 2.0 Enterprise Traditional-style Mostly Amazon-style Mostly traditional style Standby CloudStack Mgmt Server Cluster CloudStack Admin Internet Availability Zone 2 Primary CloudStack Mgmt Server Cluster Primary Router MySQL Backup Load Balancer MySQL L3 Core Switch Top of Rack Switch Object Store Servers … … … … … Availability Zone 1 Pod 1 Pod 2 Pod 3 Pod N Layer 3 cloud networking (security groups) Web DB Web VM VM VM Web DB Security Security Web Group Web Group DB VM VM VM … …… Web Web VM VM Layer 2 VLAN networking User 1 User 1 User 1 User User 2 User 1 … 2 …… OVS networking GRE Key 1 GRE Key 2 OVS User 1 OVS User 1 User 1 User OVS User OVS OVS 2 User 1 … 2 …… Multi-tier virtual networking Internet Network Services Public VLAN IPSec VPN • IPAM Customer • DNS Virtual Router Premises LB [intra] NetScaler VPX • MPLS VLAN • S-2-S VPN GRE Key 2 • Static Routes GRE Key 1 App VM • ACLs Web VM 1 1 • NAT, PF App VM • FW [ingress & egress] Web VM 2 GRE Key 3 2 • BGP Web VM 3 DB VM 1 Web VM 4 Web subnet App subnet DB Subnet 10.1.1.0/24 10.1.2.0/24 10.1.3.0/24 Network flexibility Network Services Service Providers Network Isolation L2 connectivity Virtual appliances No isolation IPAM Hardware firewalls VLAN isolation DNS LB appliances SDN overlays SDN controllers Routing L3 isolation IDS /IPS ACL appliances Firewall VRF NAT Hypervisor VPN LB IDS IPS “The Apache Way” • Collaborative software development • Commercial-friendly standard license • Consistently high quality software • Respectful, honest, technical-based interaction • Faithful implementation of standards • Security as a mandatory feature Apache CloudStack Community Pre Apache June Move (Jan Actuals 2012) # of companies 1 68 endorsing project # of companies 10 140 participating # of developers 40 238 working on project Apache CloudStack community projects • SDN • Smart Storage ᵒ Nicira ᵒ Hadoop + S3 API for object store ᵒ Midokura ᵒ NetApp (FlexPod, object store) ᵒ Big Switch Networks ᵒ Basho RIAK CS ᵒ Stratosphere ᵒ Caringo object store • Backup/DR ᵒ Cloudian S3 ᵒ Sungard • PaaS ᵒ CloudFoundry implementation through • Networking IronFoundry and Stackato teams ᵒ Cisco ᵒ Engine Yard ᵒ Brocade (ADX) ᵒ Cumulogic ᵒ GigaSpaces Workload requirements drive cloud architecture There is real demand for SDN in cloud infrastructure Open source developers drive cloud adoption More info http://cloudstack.org.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages34 Page
-
File Size-