White Paper | Cloud Computing | OpenStack*

Network learnings on 66 bare metal nodes

Manjeet Singh Bhatia, Ganesh Mahalingam, Nathaniel Potter, Malini Bhandaru, Isaku Yamahata, Yih Leong Sun
Open Source Technology Center, Intel Corporation

Executive summary
OSIC hosts the world's largest OpenStack* [1] developer cloud, comprising 2,000 nodes, to enable community-wide development and testing of OpenStack features and functionality at an unmatched scale. Cloud computing and Software Defined Networking (SDN) are front and center in today's enterprise and telecommunications landscape. OpenStack, a leading open source cloud operating system, has a following of thousands and over two hundred production deployments [2]. Intel® and Rackspace* founded the OpenStack Innovation Center* (OSIC) [3] to accelerate the development and innovation of OpenStack for enterprises around the globe.

In this white paper, we share our learnings from a grant of 66 nodes for a three-week period in the OSIC developer cloud. The goal of this experiment was to compare the performance of two Layer 2 software switching solutions: Linux bridge (LB) and Open vSwitch (OVS). We share how to deploy a cloud on the OSIC cluster, discuss the OpenStack neutron Modular Layer 2 (ML2) architecture, and present our experimental results. In the course of the work we identified and fixed some deployment bugs. Last, but not least, we also share how to submit an experiment proposal to OSIC.

Problem statement: OpenStack networking performance
Several events occur when a user launches a virtual machine in an OpenStack cloud. Multiple Application Programmer Interface (API) calls are made, including one to create a virtual network interface card (vNIC). Behind the scenes, there are database transactions, messages exchanged between various services over a message queue, plenty of logging, and much more. Each element in this diverse set of components can impact performance and might need to be tuned differently for different cloud usage scenarios.

In general, OpenStack is designed to support various vendor plugins. Its networking component, neutron, is no different: neutron allows users to choose their switching solution. The Modular Layer 2 (ML2) plugin was introduced in neutron in the Havana release to replace the monolithic switching implementations for LB and OVS. ML2 enables different switching solutions to coexist and interoperate in a single OpenStack cloud instance. The purpose of our experiment was to compare the performance of these two switching solutions in various scenarios. Figure 1 shows a high-level design of ML2 with a switch agent.

[Figure 1. Overview of ML2 design: controller nodes 1 through N each run the neutron server and its plugins, including ML2; network nodes run either the Linux bridge agent or the OVS agent.]
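As an illustration (not our exact configuration), the ML2 switching back end is selected through the mechanism_drivers option in neutron's ml2_conf.ini; a deployment chooses LB or OVS by setting this value and running the matching L2 agent on each compute and network node:

    # Illustrative excerpt of /etc/neutron/plugins/ml2/ml2_conf.ini (not our exact settings)
    [ml2]
    type_drivers = flat,vlan,vxlan
    tenant_network_types = vxlan
    # "linuxbridge" selects the LB back end; "openvswitch" selects OVS
    mechanism_drivers = linuxbridge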

Deploying on the OSIC developer cloud
We received 66 bare metal nodes on the OSIC cluster with no operating system installed. The physical connectivity cabling was preprovisioned, and no action was required on our end. We received a document that detailed the switch and network layout.

We used iLO* [4], the remote server management tool from HP*, to manage and build the servers. Each server comes with a unique IP address for iLO management, and we were given the iLO credentials needed to access the servers via a browser or the command line. Each server has multiple network interfaces, two of which can be used for PXE booting [5] in conjunction with your favorite provisioning tool.

Deployment was a two-phase process: first we installed a minimal operating system, and second we deployed the cloud itself on the hosts. Some of the hosts were configured as controller nodes and the rest as compute hosts. The first phase took around 40 minutes, and the second about another 21 minutes, to complete the deployment of OpenStack. Note that the details might change in the future, and the deployment process might become even simpler.

Phase I: Server provisioning
We first manually installed Ubuntu* 14.04 Server as our operating system on one of the hosts, and then used Cobbler* [6] to provision all the others. Other open source tools are available to provision servers, such as bifrost* [7] with ironic* [8].

Phase II: OpenStack Deployment
We used OpenStack kolla [9] to deploy OpenStack on the cluster. kolla was simple to use and, with its vibrant and responsive community, quick to provide bug fixes, which eased our deployment task.

We wrote some scripts and Ansible* [10] playbooks for predeployment work, such as configuring network interfaces on all the servers, installing software dependencies, and injecting ssh public keys. kolla uses Docker* containers and Ansible playbooks to install OpenStack services. For large deployments, we recommend running a Docker registry on the local network that contains all the container images for the services you anticipate running in your cloud.

Our configuration comprised three control nodes, three network nodes, and 58 compute nodes. We reserved one node to serve as the deployment and monitoring host. Please note that this setup is configurable in most deployment tools; it typically varies based on use case and anticipated workload. In a production system, TLS is used for security and REST API calls are made over HTTPS. However, we did not use TLS in our deployment, so all times reported here should only be used for their trends and not their absolute values.

Experiments
Most cloud deployments choose either LB or OVS, with the choice possibly depending on the administrators' prior experience with one or the other. Few, if any, deployments have tried to change switching technologies in their production clouds. We were curious to compare and contrast the two solutions in terms of their ease of deployment, control plane performance (network setup latency), and data path performance (speed of handling packets). We compared performance along these dimensions by varying the number of virtual machines being launched.

Test plan
We spawned 100, 200, 500, and 1000 virtual machines (VMs) for both LB and OVS deployments. All the VMs were on the same network, each VM was created with one virtual network interface, and a single switch port was hooked up per VM. Spinning up a VM involves some communication between nova and neutron to create virtual switch ports, which are then hooked up to the VMs. A small script (a sketch follows) facilitated creating these VMs and measuring the time it took them to become active. We did not mix switching technologies within our deployments.
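The following is a minimal sketch of that kind of launch-and-measure script, not the exact code used in this study. It assumes openstacksdk with a clouds.yaml entry named "osic"; the image, flavor, and network names are hypothetical placeholders, and pinging the VMs' fixed IPs assumes the measuring host can reach the tenant network.

    # vm_launch_timing.py -- a minimal sketch, not the exact script used in this study.
    # Assumes openstacksdk and a clouds.yaml entry named "osic"; the image, flavor,
    # and network names below are hypothetical placeholders.
    import subprocess
    import time

    import openstack

    N_VMS = 100

    conn = openstack.connect(cloud="osic")
    image = conn.compute.find_image("cirros")
    flavor = conn.compute.find_flavor("m1.tiny")
    network = conn.network.find_network("test-net")

    start = time.time()
    servers = [
        conn.compute.create_server(
            name="vm-%03d" % i,
            image_id=image.id,
            flavor_id=flavor.id,
            networks=[{"uuid": network.id}],
        )
        for i in range(N_VMS)
    ]

    # Launch time: the point at which nova reports every VM as ACTIVE.
    servers = [conn.compute.wait_for_server(s, wait=600) for s in servers]
    print("launch time for %d VMs: %.1f s" % (N_VMS, time.time() - start))

    # Ping time: the point at which every VM answers ICMP on its fixed IP.
    def first_ip(server):
        server = conn.compute.get_server(server.id)  # refresh to pick up addresses
        return next(iter(server.addresses.values()))[0]["addr"]

    pending = {first_ip(s) for s in servers}
    while pending:
        for ip in sorted(pending):
            if subprocess.call(["ping", "-c", "1", "-W", "1", ip],
                               stdout=subprocess.DEVNULL) == 0:
                pending.discard(ip)
    print("ping time for %d VMs: %.1f s" % (N_VMS, time.time() - start))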
VM launch results
In table 1, we share the results obtained from our VM spawn tests with LB and OVS. Launch time is measured as the period between the point at which nova's launch API is called and the point at which nova marks the task as accomplished in the database and returns all the VM IDs. However, many other activities happen behind the scenes before one can successfully ping the new VMs; we refer to the period during which these activities occur as the "ping time".

Note that the times indicated refer to the total number of VMs launched in a test. Interestingly, both launch and ping times appear to grow sublinearly, as seen in figure 2.

[Figure 2. VM launch and ping time growth: time in seconds versus the number of VMs launched (100 to 1000), for LB launch, OVS launch, LB ping, and OVS ping.]

The LB and OVS ping times track each other closely across the tests, with one large exception: the 500 VM run with OVS. Further investigation is required to determine the reason for this finding.

Our study of the logs indicates that further performance gains are possible by addressing issues in the use of RabbitMQ* for interprocess communication and by removing database connection bottlenecks with an active-active database.

                TOTAL LAUNCH TIME (SECONDS)        TOTAL PING TIME (SECONDS)
#VMS            LB            OVS                  LB            OVS
100             12            13                   35            43
200             21            21                   53            53
500             49            51                   210           115
1000            61            63                   268           263

Table 1. VM launch results, including both launch time and ping time.

Port create/destroy stress results
We launched and destroyed virtual machines repeatedly over ten runs. There was little difference between OVS and LB in port creation latency. The delete operation, however, exposed some differences: with LB, port cleanup took the same time across runs, while with OVS, performance degraded halfway through our ten runs. This might be a database effect, possibly caused by too many rows in the table.
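For illustration, here is a minimal sketch of a port create/delete stress loop. It is not the harness used in the study (which exercised ports indirectly by launching and deleting VMs); it creates and deletes ports directly against neutron, and it assumes openstacksdk, a clouds.yaml entry named "osic", and a hypothetical network named "test-net".

    # port_stress.py -- a minimal sketch of a create/delete port stress loop, not the
    # exact test harness used in this study.
    import time

    import openstack

    RUNS = 10
    PORTS_PER_RUN = 100

    conn = openstack.connect(cloud="osic")
    network = conn.network.find_network("test-net")

    for run in range(RUNS):
        t0 = time.time()
        ports = [
            conn.network.create_port(network_id=network.id,
                                     name="stress-%d-%d" % (run, i))
            for i in range(PORTS_PER_RUN)
        ]
        create_s = time.time() - t0

        t1 = time.time()
        for port in ports:
            conn.network.delete_port(port)
        delete_s = time.time() - t1

        print("run %d: create %.1f s, delete %.1f s" % (run, create_s, delete_s))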

Discussion
In the following section, we examine issues spanning ease of deployment and experimental results, and propose further work. A quick look at the VM launch times indicates that OpenStack handles them well: launch times grow more slowly than linearly in the number of VMs launched. However, to develop a deeper insight, we analyzed logs, CPU utilization, and the various running processes. See table 2 for a list of the uncovered issues.

Database bottleneck
The number of connections required to handle the volume of API calls was too large to be handled by the CPU, given that the database was configured in active-passive mode. If the database had been running in active-active mode, database connection handling would have been more successful. We had three control nodes running mySQL processes, but only one was actively serving all the database connection requests.

The Cloud Integrated Advanced Orchestrator (ciao) [15] is an open source scheduler for OpenStack that addresses the above issue by using a pull-mode scheduler, which reduces database locking dependencies.

Message queue woes
RabbitMQ was used for message queuing on all three control nodes. The RabbitMQ process consumed a lot of CPU cycles on all three control nodes, sometimes using 90% of the CPU by itself and leaving few resources for other processes. Some Erlang-level optimization of RabbitMQ would definitely save CPU cycles for other processes.

ciao [15] opted for a lightweight messaging protocol to reduce the messaging burden, both by reducing message size and by relaxing the need to ensure persistence.

Neutron-server
In our experiments we used a single network. For each VM, a virtual switch port is created and associated with the network as a tap device. At some points, the neutron-server process used over 90% of the CPU, which in turn led to delays in RabbitMQ message handling and other issues. Can the neutron server process be made more efficient and less chatty?

Conclusions
In the course of deploying the cloud with OVS and LB, we uncovered some issues and chased down their fixes, and in so doing we helped improve OpenStack for the community.

For a novice user, installing bare metal nodes and getting OpenStack up and running is non-trivial. Kolla and Cobbler are certainly good tools for this task, and our team had experience with them.

In the context of our original intent, both OVS and LB performed similarly with regard to port creation latency during a VM launch. However, the time needed to delete ports stays constant with LB, while OVS environments take increasingly longer to delete ports over repeated runs. Another interesting observation is that with LB, the ML2 plugin got out of sync as we continued launching VMs in the 1000 VM spawn test (see table 2).

ISSUE 1
Description: Missing Linux Bridge dependencies in the Kolla deployment.
Fix/patch/thoughts: https://review.openstack.org/#/c/304951/

ISSUE 2
Description: Kolla hardcoded a low limit on database connections; running at scale resulted in errors and timeouts.
Bug report: https://bugs.launchpad.net/kolla/mitaka/+bug/1563643
Fix/patch/thoughts: https://bugs.launchpad.net/kolla/mitaka/+bug/1563643/comments

ISSUE 3
Description: Scheduler retry attempts were capped at a maximum of 3, which resulted in a 40% failure rate under scale testing.
Bug report: https://bugs.launchpad.net/kolla/mitaka/+bug/1563664
Fix/patch/thoughts: Isaku Yamahata suggests exploring the use of queuing theory to identify a sweet spot.

ISSUE 4
Description: During scale testing, virtual machines were occasionally assigned multiple IP addresses despite the default request being for a single IP address.
Bug report: https://bugs.launchpad.net/kolla/mitaka/+bug/1565105
Fix/patch/thoughts: We speculate that the behavior stems from miscommunication between the scheduler and the message queuing service, or may be a bug in the deployment tool.

ISSUE 5
Description: ML2 gets out of sync with Linux Bridge as the number of launched VMs rises.
Bug report: https://bugs.launchpad.net/kolla/+bug/1568202
Fix/patch/thoughts: It could be a bug in the deployment tool not handling dependencies properly, or a bug in neutron.

Table 2. Uncovered issues.
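Two of these issues map to well-known upstream tuning knobs. Purely as an illustration (the values are placeholders, and how a deployment tool such as kolla exposes these settings varies), the nova scheduler retry cap from issue 3 and the database connection ceiling from issue 2 correspond to the following configuration options:

    # nova.conf -- scheduler retry cap (issue 3); the value shown is illustrative
    [DEFAULT]
    scheduler_max_attempts = 10

    # MySQL/MariaDB my.cnf -- connection ceiling (issue 2); the value shown is illustrative
    [mysqld]
    max_connections = 4096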

Future work
Our three-week experimentation period went by too fast, given all that we needed to learn to properly deploy, debug, configure, and support our cloud environment. Now that we are more knowledgeable, we hope to be able to dive right into our experiments when we are granted another allocation. We would like to explore the following items:
» Performance with DPDK (Data Plane Development Kit) [13] enabled OVS switches.
» Sustained load tests to sniff out leaks and other cruft.
» Determining the breaking-point load.
» Launching OpenStack on top of Kubernetes and exploring scaling of API services, particularly horizontal scaling.
» Effects of introducing a Software Defined Networking (SDN) [11] controller such as OpenDaylight* [14].
» Expanding the current study, focused on VMs, to include profiling of container networking solutions.
» Developing additional instrumentation to get deeper insight into port binding and virtual interface (VIF) driver performance.

Appendix: Requesting nodes on the OSIC cluster
The community is invited and encouraged to propose experiments and request an allocation within the OSIC developer cloud. Proposals that are deemed to return the highest value to the community are prioritized and granted allocations. Submit your request online at https://www.osic.org/clusters. There are three cluster sizes that can be allocated to the community based on usage plans and availability:
» Three cabinets, 66 bare metal nodes
» Six cabinets, 132 bare metal nodes
» Eleven cabinets, 242 bare metal nodes

Once you submit a request, you can expect a response in two to three weeks; it might take longer if the information provided in your request is incomplete. Access is typically granted for 21 days. A week prior to your allocation's expiration, you will be notified of the upcoming expiration so that you can back up important files, logs, results, and any code you have on those machines. Visit osic.org to get more details and become part of the OSIC community.

References
1. OpenStack: http://www.openstack.org/
2. User Survey: https://www.openstack.org/assets/survey/April-2016-User-Survey-Report.pdf
3. OSIC: https://osic.org/
4. iLO: https://en.wikipedia.org/wiki/HP_Integrated_Lights-Out
5. PXE: https://en.wikipedia.org/wiki/Preboot_Execution_Environment
6. Cobbler: http://cobbler.github.io/manuals/quickstart/
7. Bifrost: http://docs.openstack.org/developer/bifrost/
8. Ironic: https://wiki.openstack.org/wiki/Ironic
9. Kolla: https://wiki.openstack.org/wiki/Kolla
10. Ansible: https://en.wikipedia.org/wiki/Ansible_(software)
11. Software Defined Networking: https://www.opennetworking.org/sdn-resources/sdn-definition
12. OSIC Cluster Proposal: https://www.osic.org/clusters
13. DPDK: http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/packet-processing-is-enhanced-with-software-from-intel-dpdk.html
14. OpenDaylight: https://www.opendaylight.org/
15. ciao (Cloud Integrated Advanced Orchestrator): https://clearlinux.org/documentation/ciao-cluster-setup.html

* Other names and brands may be claimed as the property of others.