
MULTI-SITE OPENSTACK DEPLOYMENT OPTIONS & CHALLENGES FOR TELCOS

Azhar Sayeed, Chief Architect, [email protected]

DISCLAIMER
Important Information

The information described in this slide set does not provide any commitments to roadmaps or availability of products or features.

Its intention is purely to provide clarity in describing the problem and to drive a discussion that can then be used to guide the open source communities.

Red Hat Product Management owns the roadmap and supportability conversation for any product.

AGENDA

• Background: OpenStack Architecture
• Telco Deployment Use Case
• Distributed Deployment – Requirements
• Multi-Site Architecture
• Challenges
• Solution and Further Study
• Conclusions


OPENSTACK ARCHITECTURE

WHY MULTI-SITE FOR TELCO?
• Compute requirements – not just at the data center
• Multiple data centers – primary and backup
• Managed service offering
  – Managed branch office – thick vCPE
• Mobile edge compute
  – vRAN – vBBU locations
• Virtualized central offices – hundreds to thousands of locations
• IoT gateways – fog computing

MULTIPLE DCs OR CENTRAL OFFICES – INDEPENDENT OPENSTACK DEPLOYMENTS
• Centrally managed, with compute closer to the user
• Hierarchical connectivity model of central offices
• Remote sites with compute, under an E2E orchestrator

[Diagram: remote-site service requirements – security, Quality of Service (QoS), traffic shaping, device management – delivered over an overlay tunnel]
• Extend OpenStack to these sites

[Diagram: main and backup data centers] A typical service almost always spans multiple DCs.

MULTIPLE DCs – NFV DEPLOYMENT
Real customer requirements
[Diagram: Region 1 and Region 2, each a fully redundant system of controllers, storage nodes, and compute nodes, plus remote data centers; L2 or L3 extensions between DCs]
• 25 sites
• 2–5 VNFs required at each site
• Maximum of 2 compute nodes per site needed for these VNFs
• Storage requirements = image storage only, on redundant storage nodes
• Total number of control nodes = 25 × 3 = 75
• Total number of storage nodes = 25 × 3 = 75
• Total number of compute nodes = 25 × 2 = 50
• Configuration overhead: 150 of the 200 nodes (75%) are controllers and storage
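Spelled out, the overhead arithmetic from the bullets above (a trivial sketch using only the numbers on this slide):

    # Node counts for the 25-site NFV example above.
    sites = 25
    control_nodes = sites * 3   # 3 controllers per site   -> 75
    storage_nodes = sites * 3   # 3 storage nodes per site -> 75
    compute_nodes = sites * 2   # 2 compute nodes per site -> 50

    total = control_nodes + storage_nodes + compute_nodes
    overhead = (control_nodes + storage_nodes) / total
    print(f"{total} nodes, {overhead:.0%} control/storage overhead")  # 200 nodes, 75%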

VIRTUAL CENTRAL OFFICE
Real customer challenge
[Diagram: regions as fully redundant systems of controllers, storage nodes, and compute nodes; L2 or L3 extensions between DCs]
• 1000+ sites – central offices
• From a few 10s to 100s of VMs per site
• Fully redundant configurations
• Termination of residential, business, and mobile services
• Managing 1000 islands – a management challenge
• Tier 1 telcos already have >100 sites today

DEPLOYMENT OPTIONS

OPTIONS
• Multiple independent island model – seen this already
• Common authentication and management
  – External user policy management with LDAP integration
  – Common Keystone
• Stretched deployment model
  – Extend compute and storage nodes into other data centers
  – Keep central control of all remote resources
• Allow data centers to share workloads – Tricircle approach
• Proxy the APIs – master/slave or cascading model
• Agent-based model
• Something else?

MULTIPLE DCs OR CENTRAL OFFICES – INDEPENDENT OPENSTACK DEPLOYMENTS
• Each site's capacity is independent of the others
• User information kept separate or replicated offline
• A cloud management platform feeds the load balancer, which directs traffic where to go – good for load sharing
• DR is an external problem

[Diagram: Region 1 through Region 2…N, each a fully redundant system of controllers, storage nodes, and compute nodes, with L2 or L3 extensions between DCs]
Good for a few 10s of sites – but what about 100s or thousands of sites?

EXTENDED OPENSTACK MODEL – SHARED KEYSTONE DEPLOYMENT
Common or shared Keystone, under a cloud management platform
• Single Keystone for authentication
• User information in one location
• Independent resources per region
• Modify the Keystone endpoint table – endpoint, service, region, IP

[Diagram: the same multi-region layout, with identity (Keystone) as the single point of control across Region 1 through Region 2…N]
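From a client's point of view, a shared Keystone with per-region endpoints behaves as in the following minimal openstacksdk sketch; the auth URL, credentials, and region names are hypothetical placeholders:

    # One Keystone authenticates every region; the service catalog hands back
    # region-scoped endpoints for Nova, Neutron, Glance, and so on.
    import openstack

    def connect_region(region_name):
        return openstack.connect(
            auth_url="https://keystone.example.com:5000/v3",  # shared Keystone
            username="admin",
            password="secret",
            project_name="admin",
            user_domain_name="Default",
            project_domain_name="Default",
            region_name=region_name,  # selects that region's endpoints
        )

    for region in ("Region1", "Region2"):
        conn = connect_region(region)
        print(region, [server.name for server in conn.compute.servers()])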

EXTENDED OPENSTACK MODEL – CENTRAL CONTROLLER AND REMOTE COMPUTE & STORAGE (HCI) NODES
Central controller, under a cloud management platform
• Single authentication
• Distributed compute resources
• Single availability zone per region

[Diagram: fully redundant controllers and storage in Region 1; remote compute nodes in Region 2…N; L2 or L3 extensions between DCs; replicated state via a Galera cluster, with Cinder and Glance image storage]

• Recovery of a lost remote site is a manual restore

REVISITING THE BRANCH OFFICE – THICK vCPE
Can we deploy compute nodes at all the branch sites and centrally control them?

[Diagram: an E2E network orchestrator spanning the data center and enterprise sites; enterprise vCPE on x86 with VNFs for security and firewall, QoS, traffic shaping, and device management; connectivity over IPsec, MPLS, Internet, or another tunnel mechanism; nova-compute deployed on the NFVI at each site]

How do I scale it to thousands of sites? OpenStack, OpenShift/Kubernetes?

OSP 10 – SCALE COMPONENTS INDEPENDENTLY
Most OpenStack HA services and VIPs must be launched and managed by Pacemaker or HAProxy. However, some can be managed via systemctl, thanks to the simplification of Pacemaker constraints introduced in versions 9 and 10.

COMPOSABLE SERVICES AND CUSTOM ROLES

[Diagram: the hardcoded controller role (Keystone, Ceilometer, Neutron, RabbitMQ, Glance, …) decomposed into custom roles – a custom controller role, a custom Ceilometer role, a custom networker role]

• Leverage the composable services model – e.g., to define a central Keystone
• Place functionality where it is needed – i.e., disaggregate
• Services are deployable standalone on separate nodes or combined with other services into custom role(s)
• Distribute the functionality depending on the DC locations

RE-VISITING THE VIRTUAL CENTRAL OFFICE USE CASE
Real customer challenge
[Diagram: Regions 1–4 with L2 or L3 extensions between DCs; each region a fully redundant system of controllers, storage nodes, and compute nodes; Region 3 split further into Regions 3a and 3b]

Requires flexibility and some hierarchy.

CONSIDERATIONS
Scaling across a thousand sites? Some areas that we need to look at:
• Latency and outage times
  – Delays due to distance between DCs and link speeds – RTT
  – A remote site is lost – headless operation and subsequent recovery
  – Startup storms
• Scaling oslo messaging
  – RabbitMQ: scaling the number of nodes means scaling RabbitMQ/messaging
  – Ceilometer (Gnocchi & Aodh) – a heavy user of the MQ

LATENCY AND OUTAGE TIMES
Scaling across a thousand sites?
• Latency between sites – Nova API calls
  – 10, 50, or 100 ms round-trip time? RTT drives queue tuning
  – Bottleneck link/node speed
• Outage time – recovery time
  – 30 s or more? Nova compute services flapping
  – Confirmation – from provisioning to operation
  – Neutron timeouts – binding issues
  – Headless operation; restart causes storms
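A quick way to get a feel for the per-call cost is to time one API round trip from the central site to a remote region; a minimal sketch, assuming a pre-configured clouds.yaml entry named remote-site:

    # Time a single Nova API call against a remote region.
    import time
    import openstack

    conn = openstack.connect(cloud="remote-site")  # hypothetical clouds.yaml entry

    start = time.monotonic()
    list(conn.compute.servers(limit=1))            # one round trip to the Nova API
    elapsed_ms = (time.monotonic() - start) * 1000
    print(f"Nova API round trip: {elapsed_ms:.0f} ms")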

RABBITMQ TUNING
• Tune the buffers – increase buffer size
• Take into account messages in flight – rates and round-trip times
  – BDP = bottleneck speed × RTT
• Number of messages: servers × backends × requests/sec = messages/sec
• Split into multiple instances of message queues for a distributed deployment
  – Ceilometer into its own MQ – the heaviest user of the MQ
  – Nova into a single MQ
  – Neutron into its own MQ
[Diagram: Neutron MQ, Nova conductor MQ, compute MQ, and the Ceilometer agents/collector MQ as separate message queues]
• Refer to an interesting presentation on this topic – "Tuning RabbitMQ at Large Scale Cloud", OpenStack Summit, Austin 2016
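A minimal sketch of the sizing math above and of pointing each service at its own broker; the link speed, RTT, request rates, and broker hostnames are illustrative assumptions, and in a real deployment the split is expressed through each service's transport_url setting rather than in code:

    # Back-of-the-envelope buffer and message-rate sizing (assumed numbers).
    from oslo_config import cfg
    import oslo_messaging

    bottleneck_bps = 100e6                # assumed 100 Mb/s bottleneck link
    rtt = 0.050                           # assumed 50 ms round-trip time
    bdp_bytes = bottleneck_bps * rtt / 8  # BDP = bottleneck speed * RTT
    print(f"Keep ~{bdp_bytes / 1024:.0f} KiB in flight to fill the pipe")

    servers, backends, requests_per_sec = 25, 3, 10
    print(f"~{servers * backends * requests_per_sec} messages/sec")

    # Dedicated RabbitMQ instances per service via oslo.messaging transports
    # (hostnames are placeholders).
    conf = cfg.ConfigOpts()
    nova_mq = oslo_messaging.get_transport(
        conf, url="rabbit://guest:guest@mq-nova.example.com:5672/")
    neutron_mq = oslo_messaging.get_transport(
        conf, url="rabbit://guest:guest@mq-neutron.example.com:5672/")
    ceilometer_mq = oslo_messaging.get_transport(
        conf, url="rabbit://guest:guest@mq-ceilometer.example.com:5672/")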

RECENT AMQP ENHANCEMENTS
• Eliminates the broker-based model – AMQP 1.0 separates the messaging endpoint from the message routers
• Newton has an AMQP 1.0 driver for oslo messaging
• Ocata provides performance tuning and upstream support for TripleO
• If you must use RabbitMQ:
  – Use clustering and exchange configurations
  – Use the shovel plugin with exchange configurations and multiple instances
[Diagram: brokers arranged in a hierarchical tree vs. a routed mesh of message routers]
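At the oslo.messaging level, moving from RabbitMQ to the AMQP 1.0 driver is largely a change of transport URL scheme; a minimal sketch with a hypothetical router address, assuming the AMQP 1.0 driver dependencies are installed:

    # Point oslo.messaging at an AMQP 1.0 message router instead of a
    # RabbitMQ broker; the router address is a placeholder.
    from oslo_config import cfg
    import oslo_messaging

    conf = cfg.ConfigOpts()
    transport = oslo_messaging.get_transport(
        conf, url="amqp://guest:guest@amqp-router.example.com:5672/")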

OPENSTACK CASCADING PROJECT
[Diagram: a parent OpenStack instance cascading to multiple child instances (AZ1 … AZn)]
• The parent runs proxies for the Nova, Cinder, Ceilometer, and Neutron subsystems of each site
• At the parent – lots of proxies, one set per child
• The user communicates with the master (parent)

TRICIRCLE AND TRIO2O
The cascading solution, split into two projects:
• Tricircle – networking across OpenStack clouds; makes the Neutron instances work as a single cluster
  – Create networking extensions; expand workloads into other OpenStack instances
  – Isolation of east-west traffic; application HA
• Trio2o – a single API gateway for Nova and Cinder
[Diagram: User1 … UserN hitting the Tricircle/Trio2o API gateway in front of pods AZ1 … AZx … AZn]
• Single region with multiple sub-regions (pods); UID = TenantID + PodID
• Shared or federated Keystone; shared or distributed Glance

See also the OPNFV Multi-Site project – Euphrates release.

WHAT’S THE ALTERNATIVE?
Remote compute nodes
• Should we abandon the idea of remote Nova nodes?
  – Use Packstack/all-in-one – OSP in a box, à la the Verizon uCPE
  – High overhead if you want to run only 1–2 VNFs; perhaps some optimization is possible using the Kolla/container model
  – Initializing the remote nodes needs L3/L2 connectivity for PXE
• Make the remote node a Kubernetes node – use containers on that node
• Implement a new interface for remote nodes – a Nova agent on remote nodes?
• Abandon the idea of OpenStack – no!!! No OpenStack, really!?
  – Use a CMP to manage remote bare-metal nodes, with KVM as the hypervisor
  – Run containers on remote nodes – do we run into the same issues?

VIRTUAL CONTROLLER MODEL
Virtual controllers – to get around node restrictions
Kolla – containerizing the control plane
• Kolla-Kubernetes and Kolla-Ansible
• Containerizing the OSP control plane makes the previous options easier
• Can remote nodes be treated as pods in Kubernetes environments?
  – Interface between master and host node
  – The containers can be deployed on those nodes to manage apps or even OSP services
[Diagram: containerized control-plane services (Keystone, Glance, Nova, Neutron) alongside workload VMs VM1 and VM2]

SUMMARY
• Deploying OpenStack at multiple sites is a must for telcos
• Tricircle and Trio2o offer good promise
• Tune RabbitMQ or move to the MQ enhancements (AMQP 1.0)
  – Partition the MQ; scale MQ instances
• Carefully craft the availability-zone model
• A Nova agent proxy does not solve the problem of access – deploying bare metal at remote sites is still an issue; another option is a call-home approach
• Use Kubernetes as the master orchestrator => Kubernetes managing OSP managing container workloads – the K8s sandwich

THANK YOU


ABSTRACT

OpenStack provides a great Infrastructure-as-a-Service (IaaS) platform for deployment of applications in virtual machines and containers. For telcos specifically, OpenStack unifies the point of presence (PoP), central office, and datacenter infrastructure. However, many telcos need OpenStack deployed in many datacenters around the region or country. The question is: how should they deploy OpenStack for multi-site needs?

Should they consider a stretched deployment where different components sit in different locations? Or should they consider replicating the entire OpenStack environment in each location? What impact does this have on Keystone, messaging, disaster recovery, and, more importantly, unified management of all these sites?

This presentation will discuss architectural and deployment options for multi-site deployments of OpenStack.