Dell XC Web-Scale Converged Appliance for Citrix® XenDesktop®

Dell Engineering September 2015

A Dell Appliance Architecture

Revisions

Date Description

March 2015 Initial release

Sep 2015 Document overhaul, XC730 + vGPU Addition

THIS DOCUMENT IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.

Copyright © 2015 Dell Inc. All rights reserved. Reproduction of this material in any manner whatsoever without the express written permission of Dell Inc. is strictly forbidden. For more information, contact Dell.

Dell, the Dell logo, and the Dell badge are trademarks of Dell Inc. Microsoft and Windows are registered trademarks of Microsoft Corporation in the United States and/or other countries. VMware is a registered trademark of VMware, Inc. NVIDIA GRID™ and vGPU™ are registered trademarks of NVIDIA Corporation. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell Inc. disclaims any proprietary interest in trademarks and trade names other than its own.


Contents

1 Introduction
  1.1 Purpose
  1.2 Scope
  1.3 What's New
2 Solution Architecture Overview
  2.1 Introduction
  2.2 Architecture
  2.3 Nutanix Web-scale Converged Infrastructure
  2.4 Dell XC Web Scale – Solution Pods
      Network Architecture
3 Hardware Components
  3.1 Network
      Dell Networking S60 (1Gb ToR Switch)
      Dell Networking S4810 (10Gb ToR Leaf Switch)
  3.2 Dell XC Converged Appliance
      Dell XC630 (A5)
      Dell XC630 (B5)
      Dell XC630 (B7)
      Dell XC730 (Graphics)
  3.3 GPUs
      NVIDIA GRID K1 and K2
  3.4 Dell Wyse Clients
      Wyse 3020 Thin Client
      Wyse 5010 Thin Client
      Wyse 7020 Thin Client with WES7
      Wyse 7020 Thin Client with WES8
      Wyse 7010 Thin Client with Linux
      Dell Wyse Zero – Xenith 2
      Dell Wyse Zero – Xenith Pro 2
      Dell Wyse Cloud Connect
      Dell Venue 11 Pro
      Dell Chromebook 11
4 Software Components
  4.1 Citrix
      Citrix XenDesktop
      Machine Creation Services (MCS)
      Citrix Profile Manager
      Citrix XenApp
      Citrix NetScaler
  4.2 Platforms
      VMware vSphere 6
      Microsoft Windows Server 2012 R2 Hyper-V
  4.3 NVIDIA GRID vGPU
      vGPU Profiles
5 Solution Architecture for XenDesktop 7
  5.1 Management Role Configuration
      vSphere Management Role Requirements
      Hyper-V Management Role Requirements
      XenApp on vSphere
      XenApp on Hyper-V
      SQL Databases
      DNS
  5.2 Storage Architecture Overview
      Nutanix Containers
  5.3 Virtual Networking
      vSphere
      Hyper-V
  5.4 Scaling Guidance
  5.5 Solution High Availability
  5.6 Dell Wyse Datacenter for XenDesktop Communication Flow
6 Solution Performance and Testing
  6.1 Load Generation and Monitoring
      Login VSI 4 – Login Consultants
      VMware vCenter
      Microsoft Perfmon
  6.2 Performance Analysis Methodology
      Resource Utilization
      Dell Wyse Datacenter Workloads and Profiles
      Dell Wyse Datacenter Workloads
  6.3 Testing and Validation
      Testing Process
  6.4 XC630-B5 Test Results for Citrix XenDesktop
      XenDesktop, 420 Pooled VMs, MCS, Light Workload – XC630-B5
      XenDesktop, 260 Pooled VMs, MCS, Medium Workload – XC630-B5
      XenDesktop, 200 Pooled VMs, MCS, Heavy Workload – XC630-B5
      XenApp, 290 Sessions, Heavy Workload – XC630-B5
  6.5 XC630-B7 Test Results for Citrix XenDesktop
      Test Results
      Results Summary
      Bensen B7 Configuration, ESXi/XenDesktop MCS, 200 User Test, Heavy Workload
      XenApp, 290 Sessions, User Test, Heavy Workload
Acknowledgements
About the Authors


1 Introduction

1.1 Purpose

This document addresses the architecture design, configuration and implementation considerations for the key components required to deliver virtual desktops or shared sessions via Citrix® XenDesktop® and XenApp® on Microsoft® Windows Server® 2012 R2 Hyper-V® or VMware® vSphere® 6.

1.2 Scope

Relative to delivering the virtual desktop environment, the objectives of this document are to:

 Define the detailed technical design for the solution.
 Define the hardware requirements to support the design.
 Define the constraints that are relevant to the design.
 Define relevant risks, issues, assumptions and concessions, referencing existing ones where possible.
 Provide a breakdown of the design into key elements such that the reader receives an incremental or modular explanation of the design.
 Provide solution scaling and component selection guidance.

1.3 What's New

 Introduce support for VMware vSphere 6
 Introduce support for Citrix XenDesktop 7.6
 Introduce support for the new Dell XC730 Graphics appliance
 Introduce support for NVIDIA GRID vGPU on vSphere 6


2 Solution Architecture Overview

2.1 Introduction

The Dell XC series delivers an out-of-the-box infrastructure solution for virtual desktops that eliminates the high cost, variable performance, and extensive risk of conventional solutions. The Nutanix™ web-scale converged infrastructure is a turnkey solution that comes ready to run your VDI solution of choice. The Nutanix platform's unique architecture allows enterprises to scale their virtual desktops from 50 to tens of thousands of desktops in a linear fashion, providing customers with a simple path to enterprise deployment with the agility of public cloud providers.

2.2 Nutanix Architecture

The Nutanix web-scale converged infrastructure is a scale-out cluster of high-performance nodes (or servers), each running a standard hypervisor and containing processors, memory, and local storage (consisting of SSD flash and high-capacity SATA disk drives). Each node runs virtual machines just like a standard host.

In addition, local storage from all nodes is virtualized into a unified pool by the Nutanix Distributed File System (NDFS). In effect, NDFS acts like an advanced NAS that uses local SSDs and disks from all nodes to store virtual machine data. Virtual machines running on the cluster write data to NDFS as if they were writing to shared storage.

NDFS understands the concept of a virtual machine and provides advanced data management features. It brings data closer to virtual machines by storing the data locally on the system, resulting in higher performance at a lower cost. Nutanix platforms can horizontally scale from as few as three nodes to a large number of nodes, enabling organizations to scale their infrastructure as their needs grow.

The Nutanix Elastic Deduplication Engine is a software-driven, massively scalable and intelligent data reduction technology. It increases the effective capacity in the disk tier, as well as the RAM and flash cache tiers of the system, by eliminating duplicate data. This substantially increases storage efficiency, while also improving performance due to larger effective cache capacity in RAM and flash. Deduplication is performed by each node individually in the cluster, allowing for efficient and uniform deduplication at scale. This technology is increasingly effective with full/persistent clones or P2V migrations.

[Figure: Elastic Deduplication Engine – each node runs a hypervisor, CVM, cache, and local storage contributing to NDFS]

 Sequential streams of data are fingerprinted at 4K granularity for efficient deduplication.
 Only a single instance of the shared VM data is pulled into the cache upon read.
 Each node participates in, and performs, its own fingerprinting and deduplication.
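To make the fingerprinting idea above concrete, the following is a minimal sketch of block-level deduplication accounting at 4K granularity. The hash choice, in-memory index and sample data are illustrative assumptions, not a representation of NDFS internals.

```python
import hashlib

BLOCK_SIZE = 4 * 1024  # 4K granularity, as described above


def dedup_stats(data: bytes, block_size: int = BLOCK_SIZE):
    """Fingerprint a byte stream in fixed-size blocks and report how much
    of it is duplicate data that a dedup engine could collapse."""
    seen = set()
    total_blocks = duplicate_blocks = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha1(block).digest()  # illustrative hash choice
        if fingerprint in seen:
            duplicate_blocks += 1
        else:
            seen.add(fingerprint)
        total_blocks += 1
    return total_blocks, duplicate_blocks


if __name__ == "__main__":
    # Two "VMs" cloned from the same master image share most of their blocks.
    master = b"base-image-" * 100_000
    vm1 = master + b"vm1-delta" * 1_000
    vm2 = master + b"vm2-delta" * 1_000
    total, dupes = dedup_stats(vm1 + vm2)
    print(f"{dupes}/{total} blocks deduplicated "
          f"({100 * dupes / total:.1f}% effective capacity saving)")
```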

Nutanix Shadow Clones deliver distributed localized caching of virtual disks, improving performance in multi-reader scenarios such as desktop deployments using Citrix XenDesktop or XenApp. With Shadow Clones, the CVM actively monitors virtual disk access trends. If there are requests originating from more than two remote CVMs, as well as the local CVM, and all of the requests are read I/O, the virtual disk is marked as immutable. Once the disk has been marked immutable, it is cached locally by each CVM, so read operations are then satisfied by local storage.

The benefits of the Nutanix platform are thereby exposed to scale-out vSphere or Hyper-V deployments.

2.3 Nutanix Web-scale Converged Infrastructure

The Nutanix web-scale converged infrastructure provides an ideal combination of high-performance compute with localized storage to meet any demand. True to this capability, this reference architecture contains zero reconfiguration of or customization to the Nutanix product to optimize for this use case.

The next figure shows a high-level example of the relationship between an XC node, storage pool, container, pod and relative scale out:


Dell XC Web Scale allows organizations to deliver virtualized or remote desktops and applications through a single platform and support end users with access to all of their desktops and applications in a single place.


2.4 Dell XC Web Scale – Solution Pods

The networking layer consists of the 10Gb Dell Networking S4810 utilized to build a world-class leaf/spine architecture, with robust 1Gb switching in the S60 for iDRAC connectivity.

The compute, management and storage layers are converged into a single XC Series appliance cluster, hosting either VMware vSphere or Microsoft Hyper-V. The recommended boundaries of an individual pod are based on the number of nodes supported within a given hypervisor cluster: 64 nodes for vSphere or Hyper-V.

Dell recommends that the VDI management infrastructure nodes be separated from the compute resources onto their own appliance cluster, with a common storage namespace shared between them based on NFS for vSphere or SMB for Hyper-V. A minimum of one node is required for VDI management, expanded based on the size of the pod. The designations ds_rdsh, ds_compute, and ds_mgmt as seen below are logical NDFS containers used to group VMs of a particular type. Using distinct containers allows features and attributes, such as compression and deduplication, to be applied to groups of VMs that share similar characteristics. Compute hosts can be used interchangeably for XenApp or RDSH as required. Distinct clusters should be built for management and compute hosts, respectively, to provide HA and to plan predictable failover, scale and load across the pod. The NFS or SMB namespace can be shared across multiple hypervisor clusters, adding disk capacity and performance for each distinct cluster.


High-performance graphics capabilities complement the solution and can easily be added at any time to any new or existing Dell XC vSphere deployment. Simply add the appropriate number of XC730 appliances to your NDFS cluster to provide a superior user experience with vSphere 6 and NVIDIA GRID vGPU technology.


Network Architecture

Designed for true linear scaling, the Dell XC series leverages a leaf-spine network architecture. A leaf-spine architecture consists of two network tiers: an L2 leaf and an L3 spine based on 40GbE and non-blocking switches. This architecture maintains consistent performance without any throughput reduction due to a static maximum of three hops from any node in the network.

The following figure shows a design of a scale-out Leaf-Spine network architecture that provides 20Gb active throughput from each node to its Leaf and scalable 80Gb active throughput from each Leaf to Spine switch providing scale from 3 XC nodes to thousands without any impact to available bandwidth:

[Figure: Scale-out leaf-spine network – XC nodes connect to S4810 L2 leaf switches via 10Gb/s links (LAG), with 40+Gb/s MLAG links from each L2 leaf up to the L3 spine]
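As a rough illustration of the bandwidth claim above, the sketch below computes a leaf-to-spine oversubscription ratio from the 20Gb-per-node and 80Gb-per-leaf active throughput figures. The node counts in the example are assumptions; an actual design should follow Dell Networking leaf-spine guidance.

```python
def leaf_oversubscription(nodes_per_leaf: int,
                          node_uplink_gb: float = 20.0,   # 2 x 10Gb active per node
                          leaf_to_spine_gb: float = 80.0  # 2 x 40Gb active per leaf
                          ) -> float:
    """Return the leaf-to-spine oversubscription ratio for a leaf-spine pod."""
    downstream = nodes_per_leaf * node_uplink_gb
    return downstream / leaf_to_spine_gb


if __name__ == "__main__":
    # Example node counts per S4810-class leaf switch (illustrative only).
    for nodes in (4, 8, 16, 24):
        ratio = leaf_oversubscription(nodes)
        print(f"{nodes:>2} XC nodes per leaf -> {ratio:.1f}:1 oversubscription")
```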


3 Hardware Components

3.1 Network

The following sections contain the core network components for the Dell Wyse Datacenter solutions. General uplink cabling guidance: TwinAx is very cost effective for short 10Gb runs; for longer runs, use fiber with SFPs.

Dell Networking S60 (1Gb ToR Switch)

The Dell Networking S-Series S60 is a high-performance 1/10Gb access switch optimized for lowering operational costs at the network edge and is recommended for iDRAC connectivity. The S60 answers the key challenges related to network congestion in ToR (Top-of-Rack) and service provider aggregation deployments. As the use of bursty applications and services continues to increase, huge spikes in network traffic that can cause congestion and packet loss become more common. The S60 is equipped with the industry's largest packet buffer (1.25 GB), enabling it to deliver lower application latency and maintain predictable network performance even when faced with significant spikes in network traffic. Providing 48 line-rate Gb ports and up to four optional 10Gb uplinks in just 1RU, the S60 conserves valuable rack space. Further, the S60 design delivers unmatched configuration flexibility, high reliability, and power and cooling efficiency to reduce costs.

Model: Dell Networking S60

Features:
 44 x BaseT (10/100/1000) + 4 x SFP
 4 x 1Gb SFP ports that support copper or fiber
 12Gb or 24Gb stacking (up to 12 switches)
 2 x modular slots for 10Gb uplinks or stacking modules

Options: Redundant PSUs

Uses: 1Gb connectivity for iDRAC; high performance; high scalability


Guidance:

 10Gb uplinks to a core or distribution switch are the preferred design choice, using the rear 10Gb uplink modules. If 10Gb to a core or distribution switch is unavailable, the front 4 x 1Gb SFP ports can be used.
 The front 4 SFP ports can support copper cabling and can be upgraded to optical if a longer run is needed.

For more information on the S60 switch and Dell Networking, please visit: LINK

3.1.1.1 S60 Stacking

The S60 switches are optionally stacked with 2 or more switches if greater port count or redundancy is desired. Each switch will need a stacking module plugged into a rear bay and connected with a stacking cable. The best practice for switch stacks greater than 2 is to cable in a ring configuration, with the last switch in the stack cabled back to the first. Uplinks need to be configured on all switches in the stack back to the core to provide redundancy and failure protection.

Dell Networking S4810 (10Gb ToR Leaf Switch)

The Dell Networking S-Series S4810 is an ultra-low-latency 10/40Gb Top-of-Rack (ToR) switch purpose-built for applications in high-performance data center and computing environments. Leveraging a non-blocking, cut-through switching architecture, the S4810 delivers line-rate L2 and L3 forwarding capacity with ultra-low latency to maximize network performance. The compact S4810 design provides industry-leading density of 48 dual-speed 1/10Gb (SFP+) ports as well as four 40Gb QSFP+ uplinks to conserve valuable rack space and simplify the migration to 40Gb in the data center core (each 40Gb QSFP+ uplink can support four 10Gb ports with a breakout cable). Priority-based Flow Control (PFC), Data Center Bridging Exchange (DCBX), and Enhanced Transmission Selection (ETS), coupled with ultra-low latency and line-rate throughput, make the S4810 ideally suited for converged leaf/spine environments.


Model: Dell Networking S4810

Features: 48 x SFP+ (1Gb/10Gb) + 4 x QSFP+ (40Gb)

Options:
 Single-mode/multimode optics, TwinAx, QSFP+ breakout cables
 Redundant power supplies
 Stack up to 6 switches, or 2 using VLT, via SFP or QSFP ports

Uses: ToR switch for 10Gb converged connectivity

Guidance:

 The 40Gb QSFP+ ports can be split into 4 x 10Gb ports using breakout cables for stand-alone units, if necessary. This is not supported in stacked configurations.
 10Gb or 40Gb uplinks to a core or distribution switch are the preferred design choice.
 The front SFP+ ports can support copper cabling and can be upgraded to optical if a longer run is needed.

For more information on the S4810 switch and Dell Networking, please visit: LINK

3.1.2.1 S4810 Stacking

The S4810 switches are optionally stacked up to 6 switches, or configured to use Virtual Link Trunking (VLT) with up to 2 switches. Stacking is supported on either SFP+ or QSFP+ ports as long as the port is configured for stacking. The best practice for switch stacks greater than 2 is to cable in a ring configuration, with the last switch in the stack cabled back to the first. Uplinks need to be configured on all switches in the stack back to the core to provide redundancy and failure protection. It is recommended that the S4810 be configured for use in the leaf layer, connecting up to another switch in the spine layer.


3.2 Dell XC Converged Appliance

Consolidate compute and storage into a single chassis with Dell XC Series web-scale converged appliances, powered by Nutanix software. XC Series appliances install quickly, integrate easily into any data center, and can be deployed for multiple virtualized workloads including virtual desktop infrastructure (VDI), test and development, and private cloud projects. For general-purpose virtual desktop and virtual application solutions, Dell recommends the XC630. For more information please visit: Link

While the XC630 is ultimately a flexible and highly configurable converged appliance, Dell offers three optimized platforms with which to build your desktop virtualization project. The A5 configuration is recommended for entry-level, small-scale or POC deployments where a balance of low cost and performance is crucial. The B5 boasts a balanced configuration best suited for small or large production deployments consisting of task or knowledge workers. Finally, the B7 configuration is positioned as the ultimate in performance and scalability, providing an abundance of CPU power and tiered disk capacity to suit a virtual desktop project of any scale or user workload.


Dell XC630 (A5)

The Dell XC630-A5 platform is perfect for POCs, lighter user workloads, shared sessions or application virtualization. Each appliance comes equipped with dual 8-core CPUs and 256GB of high-performance RAM. Six disks come in each host: 2 x 200GB SSDs for the hot tier (Tier 1) and 4 x 1TB NL SAS disks for the cold tier (Tier 2). The 64GB SATADOM boots the hypervisor and Nutanix Controller VM, while the H730 PERC is configured in pass-through mode connecting to the SSDs and HDDs. 64GB is consumed on each of the first two SSDs for the Nutanix "home". These six disks are presented to the Nutanix Controller VM running locally on each host. Each platform can be outfitted with SFP+ or BaseT NICs.

Dell XC630 – A5

CPU 2 x E5-2630v3 (8C, 2.4GHz)

Memory 16 x 16GB 2133MHz RDIMMs

Effective speed: 2133MHz @ 256GB

RAID Ctrls PERC H730 Mini – no RAID

Storage 64GB SATADOM (CVM/ Hypervisor)

2 x 200GB MLC SATA SSD 2.5” (T1)

4 x 1TB NL SAS 2.5” (T2)

Network 2 x 10Gb, 2 x 1Gb SFP+/ BT

iDRAC iDRAC8 Ent w/ vFlash, 8GB SD

Power 2 x 750W PSUs

Dell XC630 (B5)

The Dell XC630-B5 platform is perfect for larger POCs, medium user workloads, shared sessions or application virtualization. Each appliance comes equipped with dual 12-core CPUs and 384GB of high-performance RAM. Six disks come in each host: 2 x 400GB SSDs for the hot tier (Tier 1) and 4 x 1TB NL SAS disks for the cold tier (Tier 2). The 64GB SATADOM boots the hypervisor and Nutanix Controller VM, while the H730 PERC is configured in pass-through mode connecting to the SSDs and HDDs. 64GB is consumed on each of the first two SSDs for the Nutanix "home". These six disks are presented to the Nutanix Controller VM running locally on each host. Each platform can be outfitted with SFP+ or BaseT NICs.


Dell XC630 – B5

CPU 2 x E5-2680v3 (12C, 2.5GHz)

Memory 24 x 16GB 2133MHz RDIMMs

Effective speed: 1866 MHz @ 384GB

RAID Ctrls PERC H730 Mini – no RAID

Storage 64GB SATADOM (CVM/ Hypervisor)

2 x 400GB MLC SATA SSD 2.5” (T1)

4 x 1TB NL SAS 2.5” (T2)

Network 2 x 10Gb, 2 x 1Gb SFP+/ BT

iDRAC iDRAC8 Ent w/ vFlash, 8GB SD

Power 2 x 750W PSUs

Dell XC630 (B7)

The Dell XC630-B7 platform is ideal for high performance requirements, heavy user workloads, and dense shared sessions or application virtualization. Each appliance comes equipped with dual 16-core CPUs and 384GB of high-performance RAM. Eight disks come in each host: 2 x 400GB SSDs for the hot tier (Tier 1) and 6 x 1TB NL SAS disks for the cold tier (Tier 2). The 64GB SATADOM boots the hypervisor and Nutanix Controller VM, while the H730 PERC is configured in pass-through mode connecting to the SSDs and HDDs. 64GB is consumed on each of the first two SSDs for the Nutanix "home". These eight disks are presented to the Nutanix Controller VM running locally on each host. Each platform can be outfitted with SFP+ or BaseT NICs.


Dell XC630 – B7

CPU 2 x E5-2698v3 (16C, 2.3GHz)

Memory 24 x 16GB 2133MHz RDIMMs

Effective speed: 1866 MHz @ 384GB

RAID Ctrls PERC H730 Mini – no RAID

Storage 64GB SATADOM (CVM/ Hypervisor)

2 x 400GB MLC SATA SSD 2.5” (T1)

6 x 1TB NL SAS 2.5” (T2)

Network 2 x 10Gb, 2 x 1Gb SFP+/ BT

iDRAC iDRAC8 Ent w/ vFlash, 8GB SD

Power 2 x 750W PSUs

Dell XC730 (Graphics)

The Dell-optimized XC730 platform is ideal for high-performance graphics workloads. Each appliance comes equipped with dual 14-core 120W CPUs and 256GB of high-performance RAM, supporting up to 64 users per XC730. Six disks come standard in each host: 2 x 400GB SSDs for the hot tier (Tier 1) and 4 x 1TB NL SAS disks for the cold tier (Tier 2). The 64GB SATADOM boots the hypervisor and Nutanix Controller VM, while the H730 PERC is configured in pass-through mode connecting to the SSDs and HDDs. 64GB is consumed on each of the first two SSDs for the Nutanix "home". These six disks are presented to the Nutanix Controller VM running locally on each host. Each platform can be outfitted with SFP+ or BaseT NICs.


It is important to note that this platform must be configured using CPUs with a maximum TDP of 120W and requires 1100W PSUs.

Dell XC730 – Graphics

CPU 2 x E5-2695v3 (14C, 2.3GHz)

Memory 16 x 16GB 2133MHz RDIMMs

Effective speed: 2133 MHz @ 256GB

GPU 2 x NVIDIA Grid K2; vGPU Profile: K220Q

RAID Ctrls PERC H730 Mini – no RAID

Storage 64GB SATADOM (CVM/ Hypervisor)

2 x 400GB MLC SATA SSD 2.5” (T1)

4 x 1TB NL SAS 2.5” (T2)

Network 2 x 10Gb, 2 x 1Gb SFP+/ BT

iDRAC iDRAC8 Ent w/ vFlash, 8GB SD

Power 2 x 1100W PSUs


3.3 GPUs

NVIDIA GRID K1 and K2

NVIDIA GRID™ technology offers the ability to offload graphics processing from the CPU to the GPU in virtualized environments, allowing the data center manager to deliver true PC graphics-rich experiences to more users for the first time. NVIDIA's Kepler™-based GRID K1 and K2 boards are specifically designed to enable rich graphics in virtualized environments.

GPU Virtualization

GRID boards allow hardware virtualization of the GPU. This means multiple users can share a single GPU, improving user density while providing true PC performance and compatibility.

Low-Latency Remote Display

NVIDIA's patented low-latency remote display technology greatly improves the user experience by reducing the lag that users feel when interacting with their virtual machine. With this technology, the virtual desktop screen is pushed directly to the remoting protocol.

Maximum User Density

NVIDIA GRID boards have an optimized multi-GPU design that helps to maximize user density. GRID K1 boards, which include four Kepler-based GPUs and 16GB of memory, are designed to host the maximum number of concurrent users. GRID K2 boards, which include two higher end Kepler GPUs and 8GB of memory, deliver maximum density for users of graphics-intensive applications.


Specs Grid K1 Grid K2

Number of GPUs 4 x Kepler GPUs (GK107) 2 x high-end Kepler GPUs (GK104)

Total CUDA cores 768 (192 per GPU) 3072 (1536 per GPU)

Core Clock 850 MHz 745 MHz

Total memory size 16 GB DDR3 8 GB GDDR5

Max power 130 W 225 W

Form Factors Dual slot (4.4” x 10.5”) Dual slot (4.4” x 10.5”)

Aux power 6-pin connector 8-pin connector

PCIe x16 (Gen3) x16 (Gen3)

Cooling solution Passive Passive/ Active

For more information on NVIDIA Grid, please visit: Link

3.4 Dell Wyse Cloud Clients

The following Dell Wyse clients will deliver a superior Citrix user experience and are the recommended choices for this solution.


Wyse 3020 Thin Client

Build out a simple or vast virtual desktop infrastructure (VDI) and stay within your budget using Wyse 3000 series thin clients. This easy-to-use family of clients for Citrix®, Microsoft®, VMware® or Wyse vWorkspace environments includes models that are available with virus-immune Wyse ThinOS or Wyse-enhanced Linux®. All offer vibrant multimedia performance, System-on-Chip (SoC) processors and hands-off Wyse automated management capability. Using about 7 watts of power in full operation, the T10D creates very little heat for a greener, more comfortable working environment. Link

Wyse 5010 Thin Client

The Wyse 5000 series thin client is a highly efficient and powerful endpoint platform for Citrix®, Microsoft®, VMware®, and Wyse vWorkspace virtual desktop environments. It is available as a zero client, a diskless cloud desktop or a thin client with Windows Embedded Standard®, Wyse-enhanced enterprise Linux, or virus-immune Wyse ThinOS. With its extremely compact size and high performance, it's a versatile virtual desktop endpoint or cloud desktop when used with Wyse WSM software. For more information, please visit: Link

Wyse 7020 Thin Client with WES7

The Dell Wyse 7020 is a super high-performance Windows Embedded Standard 7 thin client for virtual desktop environments. Featuring a quad-core AMD processor and an integrated graphics engine that significantly boosts performance, the 7020 achieves exceptional speed and power for the most demanding VDI and embedded Windows applications, rotational 3D graphics, 3D simulation and modeling, unified communications, and multi-screen HD multimedia. Take a unit from box to productivity in minutes. Just select the desired configuration and the 7020 does the rest automatically, with no need to reboot. Scale to tens of thousands of endpoints with Dell Wyse WDM software or leverage your existing Microsoft System Center Configuration Manager platform. The 7020 is the thin client for power users who need workstation-class performance on their desktop or within a desktop virtualization environment (x86 or x64). For more information, please visit: Link

Wyse 7020 Thin Client with WES8

The Dell Wyse 7020 is a super high-performance Windows Embedded 8 Standard thin client for virtual desktop environments. Featuring a quad-core AMD processor, the 7020 offers a vibrant Windows 8 experience and achieves exceptional speed and power for the most demanding embedded Windows applications, rich 3D graphics and HD multimedia. You can scale to tens of thousands of 7020 endpoints with Dell Wyse Device Manager (WDM) software, or leverage your existing Microsoft System Center Configuration Manager platform. With single-touch or multi-touch capable displays, the 7020 adds the ease of an intuitive touch user experience. The 7020 is an ideal thin client for offering a high-performance Windows 8 experience with the most demanding mix of virtual desktop or cloud applications (x86 or x64). For more information please visit: Link

Wyse 7010 Thin Client with Linux

Designed for power users, the Dell Wyse 7010 is the highest-performing thin client on the market. Highly secure and ultra-powerful, the 7010 combines Dell Wyse-enhanced SUSE Linux Enterprise with a dual-core AMD 1.65 GHz processor and a revolutionary unified engine for an unprecedented user experience. The 7010 eliminates performance constraints for high-end, processing-intensive applications like computer-aided design, multimedia, HD video and 3D modeling. Scalable enterprise-wide management provides simple deployment, patching and updates. Take a unit from box to productivity in minutes with auto configuration. Delivering unmatched processing speed and power, security and display performance, it's no wonder no other thin client can compare. For more information, please visit: Link

Dell Wyse Zero – Xenith 2

Discover new levels of performance, simplicity and ease-of-use in the next generation of Wyse Xenith™ zero clients. They're built for Citrix® HDX™, so users receive an instant-on desktop and can enjoy a rich media experience that doesn't tax servers and networks. And it's easy, too. With simple operation and auto-configuration, Wyse 3000 and 5000 series Xenith zero clients for Citrix are always adapting to new network security and protocol advances. For more information, please visit: Link

Dell Wyse Zero – Xenith Pro 2

Dell Wyse Xenith Pro 2 is the next-generation zero client for Citrix HDX and Citrix XenDesktop, delivering ultimate performance, security and simplicity. With a powerful dual-core AMD G-Series CPU, Xenith Pro 2 is faster than competing devices. This additional computing horsepower allows dazzling HD multimedia delivery without overtaxing your server or network. Scalable enterprise-wide management provides simple deployment, patching and updates; your Citrix XenDesktop server configures it out-of-the-box to your preferences for plug-and-play speed and ease of use. Virus and malware immune, the Xenith Pro 2 draws under 9 watts of power in full operation, less than any PC on the planet. For more information please visit: Link


Dell Wyse Cloud Connect

Designed to promote bring-your-own-device (BYOD) environments, Dell Wyse Cloud Connect allows you to securely access and share work and personal files, presentations, applications and other content from your business or your home. Managed through Dell Wyse Cloud Client Manager software-as-a-service (SaaS), IT managers can ensure that each Cloud Connect device is used by the appropriate person with the right permissions and access to the appropriate apps and content based on role, department and location. Slightly larger than a USB memory stick, Cloud Connect is an ultra-compact multimedia-capable device. Simply plug it into any available Mobile High-Definition Link (MHL)/HDMI port on a TV or monitor, attach a Bluetooth keyboard and mouse, and you're off and running. Easy to slip into your pocket or bag, it enables an HD-quality window to the cloud, great for meetings and presentations while on business travel, or for cruising the internet and checking email after a day of work. For more information, please visit: Link

Dell Venue 11 Pro

Meet the ultimate in productivity, connectivity and collaboration. Enjoy full laptop performance in an ultra-portable tablet that has unmatched flexibility for a business in motion. This dual-purpose device works as a tablet when you're out in the field but also enables you to work on your desktop in the office thanks to an optional dock. For more information, please visit: Link

Dell Chromebook 11

The lightweight, easy-to-use Dell Chromebook 11 helps turn education into exploration, without the worries of safety or security. Priced to make 1:1 computing affordable today, Chromebook 11 is backed by Dell support services to make the most of your budget for years to come. The Chrome OS and Chrome browser get students online in an instant and load web pages in seconds. A high-density battery supported by a 4th Gen Intel® processor provides up to 10 hours of power. Encourage creativity with the Chromebook 11 and its multimedia features that include an 11.6" screen, stereo sound and webcam.


4 Software Components

4.1 Citrix

Citrix XenDesktop

The solution is based on Citrix XenDesktop 7.6, which provides a complete end-to-end solution delivering Microsoft Windows virtual desktops or server-based hosted shared sessions to users on a wide variety of endpoint devices. Virtual desktops are dynamically assembled on demand, providing users with pristine, yet personalized, desktops each time they log on.

Citrix XenDesktop provides a complete virtual desktop delivery system by integrating several distributed components with advanced configuration tools that simplify the creation and real-time management of the virtual desktop infrastructure.


The core XenDesktop components include:

 Studio Studio is the management console that enables you to configure and manage your deployment, eliminating the need for separate management consoles for managing delivery of applications and desktops. Studio provides various wizards to guide you through the process of setting up your environment, creating your workloads to host applications and desktops, and assigning applications and desktops to users.

 Director Director is a web-based tool that enables IT support teams to monitor an environment, troubleshoot issues before they become system-critical, and perform support tasks for end users. You can also view and interact with a user's sessions using Microsoft Remote Assistance.

 Receiver Installed on user devices, Citrix Receiver provides users with quick, secure, self-service access to documents, applications, and desktops from any of the user's devices including smartphones, tablets, and PCs. Receiver provides on-demand access to Windows, Web, and Software as a Service (SaaS) applications.

 Delivery Controller (DC) Installed on servers in the data center, the controller authenticates users, manages the assembly of users’ virtual desktop environments, and brokers connections between users and their virtual desktops.

 StoreFront StoreFront authenticates users to sites hosting resources and manages stores of desktops and applications that users access.

 License Server The Citrix License Server is an essential component of any Citrix-based solution. Every Citrix product environment must have at least one shared or dedicated license server. License servers are computers that are either partly or completely dedicated to storing and managing licenses. Citrix products request licenses from a license server when users attempt to connect.

 Machine Creation Services (MCS) A collection of services that work together to create virtual servers and desktops from a master image on demand; optimizing storage utilization and providing a pristine virtual machine to users every time they log on. Machine Creation Services is fully integrated and administrated in Citrix Studio.

 Provisioning Services (PVS) The Provisioning Services infrastructure is based on software-streaming technology. This technology allows computers to be provisioned and re-provisioned in real-time from a single shared-disk image.


 Virtual Delivery Agent (VDA) The Virtual Desktop Agent is a transparent plugin that is installed on every virtual desktop or XenApp host (RDSH) and enables the direct connection between the virtual desktop and users’ endpoint devices.

Machine Creation Services (MCS)

Citrix Machine Creation Services is the native provisioning mechanism within Citrix XenDesktop for virtual desktop image creation and management. Machine Creation Services uses the hypervisor APIs to create, start, stop, and delete virtual desktop images. Desktop images are organized in a Machine Catalog, and within that catalog there are a number of options available to create and deploy virtual desktops:

 Random: Virtual desktops are assigned randomly as users connect. When they log off, the desktop is reset to its original state and made free for another user to log in and use. Any changes made by the user are discarded at log off.
 Static: Virtual desktops are assigned to the same user every time, with three options for how to handle changes made to the desktop: store on the local vDisk, store on a Personal vDisk, or discard on user log off.

All the desktops in a random or static catalog are based off a master desktop template which is selected during the catalog creation process. MCS then takes snapshots of the master template and layers two additional virtual disks on top: an Identity vDisk and a Difference vDisk. The Identity vDisk includes all the specific desktop identity information such as host names and passwords. The Difference vDisk is where all the writes and changes to the desktop are stored. These Identity and Difference vDisks for each desktop are stored on the same data store as their related clone.


While traditionally used for small to medium sized XenDesktop deployments, MCS can bring substantial Tier 1 storage cost savings because of the snapshot/identity/difference disk methodology. The Tier 1 disk space requirements of the identity and difference disks, when layered on top of a master image snapshot, are far less than those of a dedicated desktop architecture.
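The Tier 1 saving can be illustrated with a simple back-of-the-envelope comparison between MCS linked clones and dedicated full clones. All disk sizes below (master image, identity disk and difference disk) are illustrative assumptions, not measured values.

```python
def mcs_tier1_gb(desktops: int,
                 master_image_gb: float = 40.0,
                 identity_disk_gb: float = 0.016,   # ~16MB identity vDisk (assumption)
                 difference_disk_gb: float = 3.0):  # steady-state delta per pooled VM (assumption)
    """Estimate Tier 1 space for an MCS catalog: one master snapshot per
    datastore plus small identity and difference disks per desktop."""
    return master_image_gb + desktops * (identity_disk_gb + difference_disk_gb)


def full_clone_tier1_gb(desktops: int, master_image_gb: float = 40.0):
    """Dedicated full-clone desktops each carry a complete copy of the image."""
    return desktops * master_image_gb


if __name__ == "__main__":
    for n in (200, 260, 420):
        print(f"{n} desktops: MCS ~{mcs_tier1_gb(n):,.0f} GB vs "
              f"full clones ~{full_clone_tier1_gb(n):,.0f} GB")
```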

Citrix Profile Manager

Citrix Profile Management is a component of the XenDesktop suite which is used to manage user profiles and minimize many of the issues associated with traditional Windows roaming profiles in an environment where users may have their user profile open on multiple devices at the same time. The profile management toolset has two components: the profile management agent, installed on any device where the user profiles are managed, and a Group Policy Administrative Template, which is imported to a group policy.

To further optimize the solution, the profile management folders within the user profile are redirected to the users' home drive. The folder redirection is managed via group policy objects within Active Directory. The following folders are redirected:

 Contacts
 Downloads
 Favorites
 Links
 My Documents
 Searches
 Start Menu
 Windows
 My Music
 My Pictures
 My Videos
 Desktop

Citrix XenApp

Citrix XenApp 7.6 includes enhancements in the areas of faster access to virtual apps with higher connection resiliency, improved graphics rendering, and new app-usage reporting and monitoring tools.

Citrix XenApp delivers Windows apps as secure mobile services. With XenApp, IT can mobilize the business, increasing user productivity while reducing costs by centralizing control and security of intellectual property. XenApp delivers high-performance apps to any PC, Mac, laptop, tablet or smartphone, enabling a native experience that is optimized for the type of device as well as the network. XenApp is built on the 3rd-generation FlexCast Management Architecture (FMA) and is the only hybrid cloud-ready platform that separates the management plane from the workload, enabling IT to securely deliver published apps on-premises and manage workers and mobile workspaces either on-premises or in the cloud.

Benefits of hosted desktop sessions and applications:

 Management of applications (single instance)

 Management of simple desktop images (no applications installed)

 PVS to stream XenApp servers as well as user desktops

 Scalability of XenDesktop compute hosts: CPU and IOPS reduction via application offload

 Shared storage scalability: less IOPS = more room to grow

Citrix XenDesktop with XenApp integration can effectively deliver a desktop/application hybrid solution as well. Specifically where a single or small number of shared VDI desktop images are deployed via XenDesktop, each with common shared applications installed within the golden image. A user-specific application set is then deployed and made accessible via the hosted application compute infrastructure, accessible from within the virtual desktop.


Alternatively, XenApp provides a platform for delivering Windows server-based sessions to users who may not need a full desktop VM. Hosted desktops increase infrastructure resource utilization while reducing complexity as all applications and sessions are centrally managed.

4.1.4.1 XenApp Integration into Dell Wyse Datacenter Architecture

The XenApp servers can exist as physical or virtualized instances of Windows Server 2012 R2. A minimum of one (1) and up to a maximum of seven (7) virtual servers are installed per physical compute host. Since XenApp instances are easily added to an existing XenDesktop stack, the only additional components required are:

 One or more Windows Server OS instances running the Citrix VDA, added to the XenDesktop site

The total number of required virtual XenApp servers is dependent on application type, quantity and user load. Deploying XenApp virtually and in a multi-server farm configuration increases overall farm performance and application load balancing, as well as farm redundancy and resiliency.

4.1.4.2 NUMA Architecture Considerations

Best practices and testing have shown that aligning the XenApp design to the physical Non-Uniform Memory Access (NUMA) architecture of the server CPUs results in increased and optimal performance. NUMA alignment ensures that a CPU can access its own directly-connected RAM banks faster than those of the adjacent processor, which are accessed via the Quick Path Interconnect (QPI). The same is true of VMs with large vCPU assignments: best performance will be achieved if your VMs receive their vCPU allotment from a single physical NUMA node. Ensuring that your virtual XenApp servers do not span physical NUMA nodes will ensure the greatest possible performance benefit.

The general guidance for XenApp NUMA-alignment on the Dell XC appliance is as follows:

4.1.4.3 A5 NUMA Alignment

8 physical cores per CPU in the A5 platform, or 16 logical cores with Hyper-Threading active, gives us a total of 32 consumable cores per appliance, in line with a 2x oversubscription rate. The Nutanix CVM will receive its vCPU allotment from the first physical CPU, so configuring the XenApp VMs as shown below will ensure that no NUMA spanning occurs, which could lower performance.


4.1.4.4 B5 NUMA Alignment

12 physical cores per CPU in the B5 platform, or 24 logical cores with Hyper-Threading active, gives us a total of 48 consumable cores per appliance, in line with a 2x oversubscription rate. The Nutanix CVM will receive its vCPU allotment from the first physical CPU, so configuring the XenApp VMs as shown below will ensure that no NUMA spanning occurs, which could lower performance.


4.1.4.5 B7 NUMA Alignment

16 physical cores per CPU in the B7 platform, or 32 logical cores with Hyper-Threading active, gives us a total of 64 consumable cores per appliance, in line with a 2x oversubscription rate. The Nutanix CVM will receive its vCPU allotment from the first physical CPU, so configuring the XenApp VMs as shown below will ensure that no NUMA spanning occurs, which could lower performance.
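The per-platform guidance above can be expressed as a small sizing sketch. It assumes the 8-vCPU XenApp VM and 8-vCPU CVM sizing used elsewhere in this document and treats consumable vCPUs per NUMA node as physical cores x 2 with Hyper-Threading; it is a planning aid, not a configuration tool.

```python
# Physical cores per socket for each XC630 platform described above.
PHYSICAL_CORES_PER_SOCKET = {"A5": 8, "B5": 12, "B7": 16}

XENAPP_VCPUS = 8   # per-VM XenApp sizing used later in this document
CVM_VCPUS = 8      # Nutanix CVM allotment, assumed to come from socket 0


def numa_aligned_xenapp_vms(platform: str) -> int:
    """Consumable vCPUs per NUMA node = physical cores x 2 (Hyper-Threading),
    matching the 2x oversubscription rate described above. The CVM is carved
    out of socket 0; remaining capacity is filled with 8-vCPU XenApp VMs that
    never span a NUMA node."""
    consumable_per_node = PHYSICAL_CORES_PER_SOCKET[platform] * 2
    socket0_vms = (consumable_per_node - CVM_VCPUS) // XENAPP_VCPUS
    socket1_vms = consumable_per_node // XENAPP_VCPUS
    return socket0_vms + socket1_vms


if __name__ == "__main__":
    for p in PHYSICAL_CORES_PER_SOCKET:
        print(f"{p}: {numa_aligned_xenapp_vms(p)} x {XENAPP_VCPUS}-vCPU XenApp VMs per host")
```

Running this reproduces the 3/5/7 VMs-per-host figures used in the XenApp sizing tables later in section 5.1.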

Citrix NetScaler

Citrix NetScaler is an all-in-one web application delivery controller that makes applications run five times better, reduces web application ownership costs, optimizes the user experience, and makes sure that applications are always available by using:

 Proven application acceleration such as HTTP compression and caching

 High application availability through advanced L4-7 load balancer

 Application security with an integrated Application Firewall

 Server offloading to significantly reduce costs and consolidate servers

A NetScaler appliance resides between the clients and the servers, so that client requests and server responses pass through it. In a typical installation, virtual servers (vservers) configured on the NetScaler provide connection points that clients use to access the applications behind the NetScaler. In this case, the NetScaler owns public IP addresses that are associated with its vservers, while the real servers are isolated in a private network. It is also possible to operate the NetScaler in a transparent mode as an L2 bridge or L3 router, or even to combine aspects of these and other modes. NetScaler can also be used to host the StoreFront function, eliminating complexity from the environment.

Global Server Load Balancing

GSLB is an industry-standard function. It is in widespread use to provide automatic distribution of user requests to an instance of an application hosted in the appropriate data center where multiple processing facilities exist. The intent is to seamlessly redistribute load on an as-required basis, transparent to the user community. These distributions are used on a localized or worldwide basis. Many companies use GSLB in its simplest form: they use the technology to automatically redirect traffic to disaster recovery (DR) sites on an exception basis. That is, GSLB is configured to simply route user load to the DR site on a temporary basis, only in the event of a catastrophic failure or only during extended planned data center maintenance. GSLB is also used to distribute load across data centers on a continuous load-balancing basis as part of normal processing.

XenDesktop HA with NetScaler White Paper: Link

Several of the management components of the XenDesktop stack are made highly-available using NetScaler to load balance traffic. The following management components require the use of a load balancer to function in a high availability mode:

 StoreFront Servers
 Licensing Server
 XenDesktop XML Service
 XenDesktop Desktop Director
 Provisioning Services TFTP Service


4.2 Hypervisor Platforms

VMware vSphere 6

The vSphere hypervisor, also known as ESXi, is a bare-metal hypervisor that installs directly on top of your physical server and partitions it into multiple virtual machines. Each virtual machine shares the same physical resources as the other virtual machines and they can all run at the same time. Unlike other hypervisors, all management functionality of vSphere is performed through remote management tools. There is no underlying operating system, reducing the install footprint to less than 150MB.

VMware vSphere 6 includes three major layers: Virtualization, Management and Interface. The Virtualization layer includes infrastructure and application services. The Management layer is central for configuring, provisioning and managing virtualized environments. The Interface layer includes the vSphere web client.

Throughout the Dell Wyse Datacenter solution, all VMware and Microsoft best practices and prerequisites for core services are adhered to (NTP, DNS, Active Directory, etc.). The vCenter 6 instance used in the solution is either a single Windows Server 2012 R2 VM or the vCenter 6 appliance, residing on a host in the management tier. SQL Server is a core component of the Windows version of vCenter and is hosted on another VM also residing in the management tier. It is recommended that all additional XenDesktop components be installed in a distributed architecture, one role per server VM.


Microsoft Windows Server 2012 R2 Hyper-V

Windows Server 2012 R2 Hyper-V™ is a powerful virtualization technology that enables businesses to leverage the benefits of virtualization. Hyper-V reduces costs, increases hardware utilization, optimizes business infrastructure, and improves server availability. Hyper-V works with virtualization-aware hardware to tightly control the resources available to each virtual machine. The latest generation of Dell servers includes virtualization-aware processors and network adapters.

From a network management standpoint, virtual machines are much easier to manage than physical computers. To this end, Hyper-V includes many management features designed to make managing virtual machines simple and familiar, while enabling easy access to powerful VM-specific management functions. The primary management platform within a Hyper-V based XenDesktop virtualization environment is Microsoft System Center Virtual Machine Manager (SCVMM).

SCVMM provides centralized and powerful management, monitoring, and self-service provisioning for virtual machines. SCVMM host groups are a way to apply policies and to check for problems across several VMs at once. Groups are organized by owner, operating system, or by custom names such as "Development" or "Production". The interface also incorporates Remote Desktop Protocol (RDP); double-click a VM to bring up the console for that VM, live and accessible from the management console.

4.3 NVIDIA GRID vGPU NVIDIA GRID™ vGPU™ brings the full benefit of NVIDIA hardware-accelerated graphics to virtualized solutions. This technology provides exceptional graphics performance for virtual desktops equivalent to local PCs when sharing a GPU among multiple users.

GRID vGPU is the industry's most advanced technology for sharing true GPU hardware acceleration between multiple virtual desktops—without compromising the graphics experience. Application features and compatibility are exactly the same as they would be at the user's desk.

With GRID vGPU technology, the graphics commands of each virtual machine are passed directly to the GPU, without translation by the hypervisor. This allows the GPU hardware to be time-sliced to deliver the ultimate in shared virtualized graphics performance.


vGPU Profiles

Virtual Graphics Processing Unit, or GRID™ vGPU™, is technology developed by NVIDIA® that enables hardware sharing of graphics processing for virtual desktops. This solution provides a hybrid shared mode allowing the GPU to be virtualized while the virtual machines run the native NVIDIA video drivers for better performance. Thanks to OpenGL support, VMs have access to more graphics applications. When utilizing vGPU, the graphics commands from virtual machines are passed directly to the GPU without any hypervisor translation. All this is done without sacrificing server performance and so is truly cutting edge.

The Dell graphics server platform is the award winning PowerEdge R730 which can accommodate two NVIDIA K1 or K2 graphics cards. The combination of Dell servers, NVIDIA vGPU technology and NVIDIA GRID cards enable high-end graphics users to experience high fidelity graphics quality and performance, for their favorite applications at a reasonable cost.


Card | Physical GPUs per Card | vGPU Profile | Graphics Memory | Virtual Display Heads | Maximum Resolution | Maximum vGPUs per GPU | Maximum vGPUs per Card | Intended User(s)

GRID K2 | 2 | K280Q | 4GB | 4 | 2560x1600 | 1 | 2 | Designer
GRID K2 | 2 | K260Q | 2GB | 4 | 2560x1600 | 2 | 4 | Designer
GRID K2 | 2 | K240Q | 1GB | 2 | 2560x1600 | 4 | 8 | Power User
GRID K2 | 2 | K220Q | 512MB | 2 | 2560x1600 | 8 | 16 | Power User
GRID K1 | 4 | K180Q | 4GB | 4 | 2560x1600 | 1 | 4 | Entry Designer
GRID K1 | 4 | K160Q | 2GB | 4 | 2560x1600 | 2 | 8 | Power User
GRID K1 | 4 | K140Q | 1GB | 2 | 2560x1600 | 4 | 16 | Power User
GRID K1 | 4 | K120Q | 512MB | 2 | 2560x1600 | 8 | 32 | Power User
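The table translates into per-host density as follows. The sketch below is a simple capacity calculation; the two-cards-per-host default mirrors the XC730 configuration described in section 3.2, and it counts vGPU instances only, without making any claim about user experience.

```python
# Max vGPUs per physical GPU and physical GPUs per card, from the table above.
VGPU_PROFILES = {
    "K280Q": {"card": "K2", "per_gpu": 1},
    "K260Q": {"card": "K2", "per_gpu": 2},
    "K240Q": {"card": "K2", "per_gpu": 4},
    "K220Q": {"card": "K2", "per_gpu": 8},
    "K180Q": {"card": "K1", "per_gpu": 1},
    "K160Q": {"card": "K1", "per_gpu": 2},
    "K140Q": {"card": "K1", "per_gpu": 4},
    "K120Q": {"card": "K1", "per_gpu": 8},
}
GPUS_PER_CARD = {"K1": 4, "K2": 2}


def vgpus_per_host(profile: str, cards_per_host: int = 2) -> int:
    """Maximum vGPU instances a host can carry for a given profile.
    All vGPUs on one physical GPU must use the same profile."""
    p = VGPU_PROFILES[profile]
    return cards_per_host * GPUS_PER_CARD[p["card"]] * p["per_gpu"]


if __name__ == "__main__":
    print("K220Q on 2 x K2:", vgpus_per_host("K220Q"), "vGPU instances per host")
    print("K140Q on 2 x K1:", vgpus_per_host("K140Q"), "vGPU instances per host")
```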

For more information about NVIDIA GRID vGPU, please visit: LINK


5 Solution Architecture for XenDesktop 7

5.1 Management Role Configuration

The Management role recommendations for the base solution are summarized below. Use data disks for role-specific application files such as data, logs and IIS web files in the Management volume.

vSphere Management Role Requirements

Role | vCPU | vRAM (GB) | NIC | OS vDisk Size (GB) | OS vDisk Location

Nutanix CVM | 8* | 16 | 2 | - | C:\ (SATADOM)
DDC + Lic | 4 | 8 | 1 | 40 | SDS: ds_mgmt
StoreFront | 2 | 4 | 1 | 40 | SDS: ds_mgmt
Primary SQL | 4 | 8 | 1 | 40 + 200 | SDS: ds_mgmt
vCenter Appliance | 2 | 8 | 1 | 125 | SDS: ds_mgmt
Total | 20 vCPUs | 44GB | 6 vNICs | 445GB | -


Hyper-V Management Role Requirements

Role | vCPU | Startup RAM (GB) | Dynamic Memory Min/Max | Buffer | Weight | NIC | OS vDisk Size (GB) | OS vDisk Location

Nutanix CVM | 8* | 16 | Dynamic Memory disabled | - | High | 2 | - | C:\ (SATADOM)
DDC + Lic | 4 | 8 | 384MB / 10GB | 20% | Med | 1 | 40 | SDS: ds_mgmt
Storefront | 2 | 4 | 384MB / 6GB | 20% | Med | 1 | 40 | SDS: ds_mgmt
Primary SQL | 4 | 8 | 384MB / 10GB | 20% | Med | 1 | 40 + 200 | SDS: ds_mgmt
SCVMM | 2 | 8 | 384MB / 10GB | 20% | Med | 1 | 40 + 50 | SDS: ds_mgmt
Total | 20 vCPUs | 44GB | 1.5GB / 36GB | - | - | 6 vNICs | 410GB | -
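A quick sanity check of the vSphere management-role sizing above, with the per-role values copied from the table; the SQL entry combines the 40GB OS disk and 200GB data/log disks.

```python
# (vCPU, vRAM GB, vNICs, OS vDisk GB) per management role, from the vSphere table above.
MGMT_ROLES_VSPHERE = {
    "Nutanix CVM": (8, 16, 2, 0),        # boots from SATADOM, no SDS vDisk
    "DDC + Lic": (4, 8, 1, 40),
    "StoreFront": (2, 4, 1, 40),
    "Primary SQL": (4, 8, 1, 240),       # 40 GB OS + 200 GB data/logs
    "vCenter Appliance": (2, 8, 1, 125),
}


def totals(roles):
    """Sum each resource column across all management roles."""
    vcpu = sum(r[0] for r in roles.values())
    vram = sum(r[1] for r in roles.values())
    nics = sum(r[2] for r in roles.values())
    disk = sum(r[3] for r in roles.values())
    return vcpu, vram, nics, disk


if __name__ == "__main__":
    vcpu, vram, nics, disk = totals(MGMT_ROLES_VSPHERE)
    print(f"Management tier: {vcpu} vCPUs, {vram} GB vRAM, {nics} vNICs, {disk} GB vDisk")
```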


XenApp on vSphere

The recommended number of XenApp VMs and their configurations on vSphere are summarized below based on the applicable hardware platform.

Role | HW Config | VMs per Host | vCPUs per VM | RAM (GB) | NIC | OS vDisk Size (GB) | OS vDisk Location

RDSH VM | A5 | 3 | 8 | 32 | 1 | 80 | SDS: ds_rdsh
RDSH VM | B5 | 5 | 8 | 32 | 1 | 80 | SDS: ds_rdsh
RDSH VM | B7 | 7 | 8 | 32 | 1 | 80 | SDS: ds_rdsh

XenApp on Hyper-V

The recommended number of XenApp VMs and their configurations on Hyper-V are summarized below based on the applicable hardware platform.

Role | HW Config | VMs per Host | vCPUs per VM | RAM (GB) | NIC | OS vDisk Size (GB) | OS vDisk Location

RDSH VM | A5 | 3 | 8 | 32 | 1 | 80 | SDS: ds_rdsh
RDSH VM | B5 | 5 | 8 | 32 | 1 | 80 | SDS: ds_rdsh
RDSH VM | B7 | 7 | 8 | 32 | 1 | 80 | SDS: ds_rdsh


SQL Databases

The Citrix, Microsoft and VMware databases are hosted by a single dedicated SQL Server VM (on Windows Server 2012 R2) in the Management layer. Use caution during database setup to ensure that SQL data, logs, and TempDB are properly separated onto their respective volumes. Create all databases that are required for:

 Citrix XenDesktop

 vCenter or SCVMM

Initial placement of all databases into a single SQL instance is fine unless performance becomes an issue, in which case the databases need to be separated into individual named instances. Enable auto-growth for each DB.

Best practices defined by Citrix, Microsoft and VMware are to be adhered to, to ensure optimal database performance.

Align all disks to be used by SQL Server with a 1024K offset and then format them with a 64K file allocation unit size (data, logs, and TempDB).
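The alignment guidance can be checked programmatically. The sketch below simply validates offset and allocation-unit values that would be gathered with the usual Windows tooling; the sample numbers are illustrative.

```python
KB = 1024


def sql_volume_aligned(partition_offset_bytes: int,
                       allocation_unit_bytes: int) -> bool:
    """Return True if a SQL Server volume matches the recommendation above:
    partition start aligned to 1024 KB and a 64 KB file allocation unit."""
    return (partition_offset_bytes % (1024 * KB) == 0
            and allocation_unit_bytes == 64 * KB)


if __name__ == "__main__":
    # Example values: a 1 MB-aligned data volume vs. a default 4 KB-formatted one.
    print(sql_volume_aligned(1_048_576, 65_536))   # True  - meets the guidance
    print(sql_volume_aligned(1_048_576, 4_096))    # False - reformat with 64 KB AUs
```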

DNS

DNS plays a crucial role in the environment, not only as the basis for Active Directory but also to control access to the various Citrix and Microsoft software components. All hosts, VMs, and consumable software components need to have a presence in DNS, preferably via a dynamic and AD-integrated namespace. Microsoft best practices and organizational requirements are to be adhered to.

Pay consideration to eventual scaling and to access to components that may live on one or more servers (SQL databases, Citrix services) during the initial deployment. Use CNAMEs and the round-robin DNS mechanism to provide a front-end "mask" to the back-end server actually hosting the service or data source.

5.1.6.1 DNS for SQL

To access the SQL data sources, either directly or via ODBC, a connection to the server name\instance name must be used. To simplify this process, as well as to protect for future scaling (HA), instead of connecting to server names directly, alias these connections in the form of DNS CNAMEs. So instead of connecting to SQLServer1\<instance> for every device that needs access to SQL, the preferred approach is to connect to <CNAME>\<instance>.

For example, the CNAME "VDISQL" is created to point to SQLServer1. If a failure scenario were to occur and SQLServer2 needed to start serving data, we would simply change the CNAME in DNS to point to SQLServer2. No infrastructure SQL client connections would need to be touched.
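A minimal sketch of the alias approach: clients always resolve the CNAME, and a failover only requires repointing the record in DNS. The FQDN used here is a hypothetical example.

```python
import socket


def resolve_sql_alias(alias: str) -> str:
    """Follow the DNS CNAME used by all SQL clients and return the canonical
    host currently serving it (e.g. SQLServer1 before a failover,
    SQLServer2 after the CNAME is repointed)."""
    canonical, _aliases, _addresses = socket.gethostbyname_ex(alias)
    return canonical


if __name__ == "__main__":
    # Hypothetical FQDN for the "VDISQL" alias described above.
    alias = "VDISQL.example.local"
    try:
        print(f"{alias} currently points to {resolve_sql_alias(alias)}")
    except socket.gaierror:
        print(f"{alias} does not resolve - check the CNAME record in DNS")
```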


5.2 Storage Architecture Overview

All Dell XC Web Scale appliances come with two tiers of storage by default: SSD for performance and HDD for capacity. A single common software-defined storage namespace is created across the Nutanix cluster and presented as either NFS or SMB to the hypervisor of each host. This constitutes a storage pool, and one should be sufficient per cluster. Within this common namespace, logical containers are created to group VM files as well as control the specific storage-related features that are desired to be enabled, such as deduplication and compression.

Nutanix Containers

The following table outlines the recommended containers, their purpose and settings given the use case. Best practice is to use as few features as possible, enabling only what is absolutely required. For example, if you are not experiencing disk capacity pressure then there is no need to enable capacity-tier deduplication. Enabling unnecessary services increases the resource demands of the Controller VMs. Capacity-tier deduplication requires that CVMs be configured with 32GB RAM.

Container | Purpose | Replication Factor | Compression | Perf Tier Deduplication | Capacity Tier Deduplication
Ds_compute | Desktop VMs | 2 | Enabled | Disabled | Disabled
Ds_mgmt | Mgmt Infra VMs | 2 | Enabled | Disabled | Disabled
Ds_rdsh | RDSH Server VMs | 2 | Enabled | Disabled | Disabled
Ds_vgpu | vGPU-enabled VMs | 2 | Enabled | Disabled | Disabled

5.3 Virtual Networking The network configuration for the Dell XC Web Scale appliances utilizes a 10Gb converged infrastructure model. All required VLANs will traverse 2 x 10Gb NICs configured in an active/active team. For larger scaling it is recommended to separate the infrastructure management VMs from the compute VMs to aid in predictable compute host scaling. The following outlines the VLAN requirements for the Compute and Management hosts in this solution model:

 Compute hosts
  o Management VLAN: Configured for hypervisor infrastructure traffic – L3 routed via spine layer
  o Live Migration VLAN: Configured for Live Migration traffic – L2 switched via leaf layer
  o VDI VLAN: Configured for VDI session traffic – L3 routed via spine layer

 Management hosts
  o Management VLAN: Configured for hypervisor Management traffic – L3 routed via spine layer
  o Live Migration VLAN: Configured for Live Migration traffic – L2 switched via leaf layer
  o VDI Management VLAN: Configured for VDI infrastructure traffic – L3 routed via spine layer

 An iDRAC VLAN is configured for all hardware management traffic – L3 routed via spine layer

vSphere The Management host network configuration consists of a standard vSwitch teamed with 2 x 10Gb physical adapters assigned. The CVM connects to a private internal vSwitch as well as the standard external vSwitch. All VMkernel service ports connect to the standard external vSwitch. All VDI infrastructure VMs connect through the primary port group on the external vSwitch.

[Figure: Management host virtual networking (vSphere) – broker roles, SQL and the CVM connect via vNICs to port groups on the teamed external standard vSwitch (pNIC 1 and pNIC 2, 10Gb) and to the internal standard vSwitch used by the CVM; VMkernel ports vmk0 (MGMT), vmk1 (vMotion) and vmk2 (Nutanix) are shown, with the external vSwitch uplinked via LAG to Dell Networking S4810 10Gb leaf switches and the spine.]

The Compute hosts are configured in the same basic manner with the desktop VMs connecting to the primary port group on the external vSwitch.

[Figure: Compute host virtual networking (vSphere) – desktop VMs and the CVM connect to the external teamed standard vSwitch and the internal vSwitch, with the same VMkernel port and uplink layout as the management host.]
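For reference, the short pyVmomi sketch below shows how an external standard vSwitch teamed across two 10Gb uplinks and a VDI port group of the kind described above could be created programmatically. It is illustrative only; the host name, credentials, vmnic names, switch and port group names, and VLAN ID are all assumptions.

```python
# Illustrative pyVmomi sketch (not from the source document): create the external
# standard vSwitch teamed with two 10Gb uplinks and a VDI port group on an ESXi host.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only: skips certificate validation
si = SmartConnect(host="esxi-host-01.example.com", user="root", pwd="password", sslContext=ctx)
content = si.RetrieveContent()

# Take the first ESXi host found in the inventory (illustration only)
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
host = view.view[0]
net_sys = host.configManager.networkSystem

# External standard vSwitch bound to both 10Gb physical adapters
vss_spec = vim.host.VirtualSwitch.Specification(
    numPorts=128,
    bridge=vim.host.VirtualSwitch.BondBridge(nicDevice=["vmnic0", "vmnic1"]),
)
net_sys.AddVirtualSwitch(vswitchName="vSwitchExternal", spec=vss_spec)

# Port group for the VDI VLAN on the external vSwitch
pg_spec = vim.host.PortGroup.Specification(
    name="VDI",
    vlanId=100,  # hypothetical VLAN ID
    vswitchName="vSwitchExternal",
    policy=vim.host.NetworkPolicy(),
)
net_sys.AddPortGroup(portgrp=pg_spec)

Disconnect(si)
```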

Hyper-V The Hyper-V configuration, while identical in core requirements and hardware, is executed differently due to how Hyper-V and Windows Server 2012 R2 implement networking and virtual switches. As shown in the diagram below, native Windows Server 2012 R2 NIC Teaming is utilized to load balance and provide resiliency for network connections. For the management host in this scenario, a single LBFO NIC team is configured to connect to a Hyper-V switch for external traffic, and one internal Hyper-V switch is used for the Nutanix CVM. All vNICs associated with the Management OS connect directly to the external Hyper-V switch.

[Figure: Management host virtual networking (Hyper-V) – broker roles, SQL and the CVM vNICs connect to the external Hyper-V switch backed by the LBFO team (pNIC 1 and pNIC 2, 10Gb) or to the internal Hyper-V switch; Management OS vNICs for MGMT, Failover Cluster and Live Migration also attach to the external switch, which uplinks via LAG to Dell Networking S4810 10Gb leaf switches and the spine.]

The NIC team for the Hyper-V switch is configured as switch independent with the Dynamic load balancing mode and all adapters set to active. This team is used exclusively by Hyper-V.
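For illustration, the sketch below drives the equivalent Windows PowerShell cmdlets from Python to create a switch-independent, Dynamic-mode LBFO team and bind an external Hyper-V switch to it. The team, switch and adapter names are assumptions, and the commands must be run elevated on the Hyper-V host.

```python
# Illustration only: the switch-independent / Dynamic LBFO team and external Hyper-V
# switch described above, expressed as the equivalent PowerShell cmdlets and invoked
# from Python. Team, switch and adapter names are hypothetical.
import subprocess

commands = [
    # Switch-independent teaming, Dynamic load balancing, both 10Gb adapters active
    "New-NetLbfoTeam -Name 'LBFOTeam' -TeamMembers 'NIC1','NIC2' "
    "-TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic -Confirm:$false",
    # External Hyper-V switch bound to the team; Management OS vNICs attach to this switch
    "New-VMSwitch -Name 'External' -NetAdapterName 'LBFOTeam' -AllowManagementOS $true",
]

for command in commands:
    subprocess.run(["powershell.exe", "-NoProfile", "-Command", command], check=True)
```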

The dedicated compute host configuration is shown in the diagram below and configured very similarly to the management host configuration.

[Figure: Compute host virtual networking (Hyper-V) – desktop VMs and the CVM connect to the external and internal Hyper-V switches, with the same Management OS vNIC, LBFO team and uplink layout as the management host.]


5.4 Scaling Guidance Each component of the solution architecture scales independently according to the desired number of supported users. Additional appliance nodes can be added at any time to expand the Nutanix SDS pool in a modular fashion. While there is no scaling limit of the Nutanix architecture itself, practicality might suggest scaling pods based on the limits of hypervisor clusters (64 nodes for vSphere or Hyper-V). Isolating mgmt and compute to their own HA clusters provides more flexibility with regard to scaling and functional layer protection.

Another option is to design a large single contiguous NDFS namespace with multiple hypervisor clusters within to provide single pane of glass management. For example, portrayed below is a 30,000 professional user environment segmented by vSphere HA cluster and broker farm. Each farm compute instance is segmented into an HA cluster with a hot standby node providing N+1, served by a dedicated pair of mgmt nodes in a separate HA cluster. This provides multiple broker farms with separated HA protection while maintaining a single NDFS cluster across all nodes.

 The components are scaled either horizontally (by adding additional physical and virtual servers to the server pools) or vertically (by adding virtual resources to the infrastructure)

 Eliminate bandwidth and performance bottlenecks as much as possible

 Allow future horizontal and vertical scaling with the objective of reducing the future cost of ownership of the infrastructure.

Component | Metric | Horizontal scalability | Vertical scalability
Virtual Desktop Host/Compute Servers | VMs per physical host | Additional hosts and clusters added as necessary | Additional RAM or CPU compute power
Provisioning Servers | Desktops per instance | Additional servers added to the Provisioning Server farm | Additional network and I/O capacity added to the servers
Desktop Delivery Servers | Desktops per instance (dependent on SQL performance as well) | Additional servers added to the XenDesktop Site | Additional virtual machine resources (RAM and CPU)
XenApp Servers | Desktops per instance | Additional virtual servers added to the XenDesktop Site | Additional physical servers to host virtual XenApp servers
Storefront Servers | Logons/minute | Additional servers added to the Storefront environment | Additional virtual machine resources (RAM and CPU)
Database Services | Concurrent connections, responsiveness of reads/writes | Migrate databases to a dedicated SQL server and increase the number of management nodes | Additional RAM and CPU for the management nodes
File Services | Concurrent connections, responsiveness of reads/writes | Split user profiles and home directories between multiple file servers in the cluster; file services can also be migrated to the optional NAS device to provide high availability | Additional RAM and CPU for the management nodes

5.5 Solution High Availability High availability (HA) is offered to protect each architecture solution layer, individually if desired. Following the N+1 model, additional ToR switches are added to the Network layer and stacked to provide redundancy as required, additional compute and management hosts are added to their respective layers, vSphere or Hyper-V clustering is introduced in both the management and compute layers, SQL is mirrored or clustered, and NetScaler is leveraged for load balancing. Storage protocol switch stacks and NAS selection will vary based on the chosen solution architecture.

The HA options provide redundancy for all critical components in the stack while improving the performance and efficiency of the solution as a whole.

 Additional switches are added to the existing stack, thereby equally spreading each host’s network connections across multiple switches.

 Additional ESXi or Hyper-V hosts added in the compute or mgmt layers to provide N+1 protection.

 Applicable Citrix infrastructure server roles are duplicated and spread amongst mgmt host instances where connections to each are load balanced via the addition of virtual NetScaler appliances.

 SQL Server databases are also protected through the addition and configuration of an "AlwaysOn" Failover Cluster Instance or Availability Group.


5.6 Dell Wyse Datacenter for XenDesktop Communication Flow

[Figure: Dell Wyse Datacenter for XenDesktop communication flow – users connect over SSL from the Internet/LAN through Citrix NetScaler and then via ICA/HDX to the virtual desktops; the management hosts (Hyper-V or vSphere) run StoreFront, the Delivery Controller, License Server (TCP 27000), a file server for user data (SMB), Active Directory (LDAP), SQL Server (TCP/1433), vCenter/SCVMM and PVS (TFTP), with XML and HTTPS/TCP 8100 traffic between the broker components; the compute hosts (Hyper-V or vSphere) run the MCS and PVS machine catalogs of static or random Windows virtual desktops plus XenApp instances providing shared hosted sessions or RemoteApps, all backed by the SDS layer.]

6 Solution Performance and Testing

6.1 Load Generation and Monitoring

Login VSI 4 – Login Consultants Login VSI is the de-facto industry standard tool for testing VDI environments and server-based computing or RDSH environments. It installs a standard collection of desktop application software (e.g. Microsoft Office, Adobe Acrobat Reader) on each VDI desktop; it then uses launcher systems to connect a specified number of users to available desktops within the environment. Once the user is connected, the workload is started via a logon script that launches the test script after the user environment has been configured. Each launcher system can launch connections to a number of ‘target’ machines (i.e. VDI desktops), with the launchers being managed by a centralized management console, which is used to configure and manage the Login VSI environment.

VMware vCenter VMware vCenter was used for VMware vSphere-based solutions to gather key data (CPU, Memory and Network usage) from each of the desktop hosts during each test run. This data was exported to .csv files for each host and then consolidated to show data from all hosts. While the report does not include specific performance metrics for the Management host servers, these servers were monitored during testing and were seen to be performing at an expected performance level.
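As a simple illustration of this consolidation step (not part of the validated toolset), the per-host CSV exports could be combined with pandas; the file naming and column headers shown are assumptions.

```python
# Simple illustration: consolidate the per-host vCenter CSV exports into one table.
# File naming and column headers are hypothetical.
import glob
import pandas as pd

frames = []
for path in glob.glob("exports/*_host_stats.csv"):     # one exported CSV per host
    df = pd.read_csv(path, parse_dates=["Time"])
    df["Host"] = path.split("/")[-1].replace("_host_stats.csv", "")
    frames.append(df)

consolidated = pd.concat(frames, ignore_index=True)

# Example roll-up: steady-state peak CPU % and average network Kbps per host
summary = consolidated.groupby("Host").agg({"CPU %": "max", "Network Kbps": "mean"})
print(summary)
```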

Microsoft Perfmon Microsoft Perfmon was utilized to collect performance data for tests performed on the Hyper-V platform.

6.2 Performance Analysis Methodology In order to ensure the optimal combination of end-user experience (EUE) and cost-per-user, performance analysis and characterization (PAAC) on Dell Wyse Datacenter solutions is carried out using a carefully designed, holistic methodology that monitors both hardware resource utilization parameters and EUE during load-testing. This methodology is based on the three pillars shown below. Login VSI is currently the load-testing tool used during PAAC of Dell Wyse Datacenter solutions; Login VSI is the de-facto industry standard for VDI and server-based computing (SBC) environments and is discussed in more detail below.


Resource Utilization Poor end-user experience is one of the main risk factors when implementing desktop virtualization but the root cause for poor end-user experience is resource contention – hardware resources at some point in the solution have been exhausted, thus causing the poor end-user experience. In order to ensure that this has not happened (and that it is not close to happening), PAAC on Dell Wyse Datacenter solutions monitors the relevant resource utilization parameters and applies relatively conservative thresholds as shown in the table below. As discussed above, these thresholds are carefully selected to deliver an optimal combination of good end-user experience and cost-per-user, while also providing burst capacity for seasonal / intermittent spikes in usage. These thresholds are used to decide the number of virtual desktops (density) that are hosted by a specific hardware environment (i.e. combination of server, storage and networking) that forms the basis for a Dell Wyse Datacenter RA.

Resource Utilization Thresholds

Parameter Pass / Fail Threshold

Physical Host CPU Utilization 85%

Physical Host Memory Utilization 85%

Network Throughput 85%

Storage IO Latency 20ms
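The short sketch below simply illustrates how these pass/fail thresholds are applied to measured steady-state values; the measured numbers are hypothetical.

```python
# Illustration of applying the PAAC pass/fail thresholds above to measured
# steady-state values. The measured figures below are hypothetical.
THRESHOLDS = {
    "Physical Host CPU Utilization (%)": 85,
    "Physical Host Memory Utilization (%)": 85,
    "Network Throughput (%)": 85,
    "Storage IO Latency (ms)": 20,
}

measured = {  # hypothetical steady-state peaks from a test run
    "Physical Host CPU Utilization (%)": 82,
    "Physical Host Memory Utilization (%)": 61,
    "Network Throughput (%)": 12,
    "Storage IO Latency (ms)": 4,
}

for parameter, limit in THRESHOLDS.items():
    verdict = "PASS" if measured[parameter] <= limit else "FAIL"
    print(f"{parameter}: {measured[parameter]} (threshold {limit}) -> {verdict}")
```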


Dell Wyse Datacenter Workloads and Profiles It’s important to understand user workloads and profiles when designing a desktop virtualization solution in order to understand the density numbers that the solution can support. At Dell, we use five workload / profile levels, each of which is bound by specific metrics and capabilities. In addition, we use workloads and profiles that are targeted at graphics-intensive use cases. We present more detailed information in relation to these workloads and profiles below but first it is useful to define the terms “workload” and “profile” as they are used in this document.

 Profile: This is the configuration of the virtual desktop - number of vCPUs and amount of RAM configured on the desktop (i.e. available to the user).

 Workload: This is the set of applications used by performance analysis and characterization (PAAC) of Dell Wyse Datacenter solutions e.g. Microsoft Office applications, PDF Reader, Internet Explorer etc.

User Profile | vCPUs | ESXi Memory Configured | Hyper-V Minimum Memory | Hyper-V Max Memory | ESXi Memory Reservation | Physical Memory | OS
Standard | 1 | 2GB | 1GB | 2GB | 1GB | 2GB | x86
Enhanced | 2 | 3GB | 1GB | 3GB | 1.5GB | 3GB | x86
Professional | 2 | 4GB | 1GB | 4GB | 2GB | 4GB | x64

Dell Wyse Datacenter Workloads Load-testing on each of the profiles described in the above table is carried out using an appropriate workload that is representative of the relevant use case. In the case of the non-graphics workloads, these workloads are Login VSI workloads and in the case of graphics workloads, these are specially designed workloads that stress the VDI environment to a level that is appropriate for the relevant use case. This information is summarized in the table below:


Profile Name | Workload | OS Image
Standard | Login VSI Light | Shared
Enhanced | Login VSI Medium | Shared
Professional | Login VSI Heavy | Shared + Profile Virtualization
Graphics | eFigures / AutoCAD - SPEC Viewperf | Persistent

Further information for each of the workloads is given below. It is noted that for Login VSI testing, the following login and boot paradigm is used:

 For single-server / single-host testing (typically carried out to determine the virtual desktop capacity of a specific physical server), users are logged in every 30 seconds.

 For multi-host / full solution testing, users are logged in over a period of 1-hour, to replicate the normal login storm in an enterprise environment.

 All desktops are pre-booted in advance of logins commencing.

 For all testing, all virtual desktops run an industry-standard anti-virus solution (currently McAfee VirusScan Enterprise) in order to fully replicate a customer environment.

6.2.3.1 Login VSI Light Workload Compared to the Login VSI medium workload described below, the light workload runs fewer applications (mainly Excel and Internet Explorer with some minimal Word activity) and starts/stops the applications less frequently. This results in lower CPU, memory and disk IO usage.

6.2.3.2 Login VSI Medium Workload

The Login VSI medium workload is designed to run on 2 vCPUs per desktop VM. This workload emulates a medium knowledge worker using Office, IE, PDF and Java/FreeMind. The Login VSI medium workload has the following characteristics:

 Once a session has been started the workload will repeat (loop) every 48 minutes.
 The loop is divided in four segments; each consecutive Login VSI user logon will start a different segment. This ensures that all elements in the workload are equally used throughout the test.
 The medium workload opens up to five applications simultaneously.
 The keyboard type rate is 160 ms for each character.
 Approximately two minutes of idle time is included to simulate real-world users.

Each loop opens and uses:


 Outlook, browse messages.
 Internet Explorer, browse different webpages and a YouTube style video (480p movie trailer) is opened three times in every loop.
 Word, one instance to measure response time, one instance to review and edit a document.
 Doro PDF Printer & Acrobat Reader, the Word document is printed and exported to PDF.
 Excel, a very large randomized sheet is opened.
 PowerPoint, a presentation is reviewed and edited.
 FreeMind, a Java based Mind Mapping application.

6.2.3.3 Login VSI Heavy Workload

The heavy workload is based on the medium workload except that the heavy workload:

 Begins by opening four instances of Internet Explorer. These instances stay open throughout the workload loop.
 Begins by opening two instances of Adobe Reader. These instances stay open throughout the workload loop.
 There are more PDF printer actions in the workload.
 Instead of 480p videos a 720p and a 1080p video are watched.
 Increased the time the workload plays a flash game.
 The idle time is reduced to two minutes.

6.2.3.4 Workloads Running on Shared Graphics Profile

Graphics hardware vendors (e.g. NVIDIA) typically market a number of graphics cards that are targeted at different markets. Consequently, it is necessary to provide two shared graphics workloads – one for mid-range cards and the other for high-end cards. These workloads are described in more detail below. It is noted that technologies such as the Citrix / NVIDIA GRID vGPU technology (where the NVIDIA drivers reside on the virtual desktop, giving shared-level density with near pass-through functionality) mean that in some cases, the higher-end workloads, traditionally used for pass-through GPU PAAC, may be more appropriate for shared GPU PAAC. Such scenarios will explicitly state the workload used.

6.3 Testing and Validation

Testing Process The purpose of the single server testing is to validate the architectural assumptions made around the server stack. Each user load is tested against four runs. A pilot run is conducted to validate that the infrastructure is functioning and that valid data is captured. Subsequently, three more runs are conducted, allowing for correlation of data. A summary of the test results is presented in tabular format below.


At different stages of the testing, the test team will complete some manual “User Experience” testing while the environment is under load. This will involve a team member logging into a session during the run and completing tasks similar to the User Workload description. While this experience is subjective, it will help provide a better understanding of the end user experience of the desktop sessions, particularly under high load, and ensure that the data gathered is reliable.

Login VSI has two modes for launching users’ sessions:

 Parallel Sessions are launched from multiple launcher hosts in a round robin fashion; this mode is recommended by Login Consultants when running tests against multiple host servers. In parallel mode the VSI console is configured to launch a number of sessions over a specified time period (specified in seconds)

 Sequential Sessions are launched from each launcher host in sequence; sessions are only started from a second host once all sessions have been launched on the first host, and this is repeated for each launcher host. Sequential launching is recommended by Login Consultants when testing a single desktop host server. The VSI console is configured to launch a specified number of sessions at a specified interval (in seconds).

All test runs which involved the six desktop hosts were conducted using the Login VSI “Parallel Launch” mode, and all sessions were launched over an hour to represent the typical 9am logon storm. Once the last user session has connected, the sessions are left to run for 15 minutes prior to the sessions being instructed to logout at the end of the current task sequence; this allows every user to complete a minimum of two task sequences within the run before logging out. The single server test runs were configured to launch user sessions every 60 seconds; as with the full bundle test runs, sessions were left to run for 15 minutes after the last user connected prior to the sessions being instructed to log out.

6.4 XC630-B5 Test Results for Citrix XenDesktop Validation for this project was completed for Citrix XenDesktop 7.6 and XenApp 7.6 on the following platforms.

ESXi 5.5 U2 / Citrix XenDesktop 7.6 / XC630

ESXi 5.5 U2 / XenApp brokered by XenDesktop 7.6 / XC630

Platform configurations are shown below and the Login VSI workloads used for load testing on each environment.


Platform | Config | CPU | RAM | RAID Ctlr | HD Config | Network
XC630 | B5 | E5-2690v3 (12 Core, 2.6GHz) | 512 GB | H730P | 1 x 64GB SATADOM (CVM/HV); 2 x 400GB Intel S3700 SSD 2.5” (T1); 4 x 1TB NL SAS 2.5” (T2) | 2 x 10Gb SFP+, 2 x 1Gb BT

1Gb networking was used for the deployment of the Dell XC appliances, while 10Gb networking was used for all PAAC testing.

Compute and Management resources were split out with the following configuration across a three node Nutanix cluster and all test runs were completed with this configuration.

 Node 1 – XC630– Dedicated Management (vCenter Appliance, SQL Server, XenDesktop Delivery Controller, XenDesktop Storefront.)

 Node 2 – XC630– Dedicated Compute

 Node 3 – XC630 – Dedicated Compute

Please refer to section 5.1 of this document for the configuration of each management virtual machine.

The virtual machines were non-persistent linked clone desktops running Windows 8.1, configured to align with the Login VSI 4.X virtual machine configuration. Office 2010 was used, and each virtual machine was sized at 32 GB. The User Workload configuration of the load generation virtual machines is shown in the table below.

User Workload | vCPUs | Memory | OS Bit Level | HD Size (GB)
Professional User | 2 | 4GB | x64 | 32

The XenApp environment was configured as shown in the table below for each host according to platform. The XenApp images were built using Windows Server 2012 R2 Standard edition and Microsoft Office 2010 plus the support tools required for Login VSI testing.


Role | Config | VMs per host | vCPUs | RAM (GB) | NIC | Disk Size (GB)
XenApp VM | B5 | 5 | 8 | 32 | 1 | 80

Test Results

Validation was performed using the CCC standard testing methodology, using the Login VSI 4 load generation tool for VDI benchmarking, which simulates production user workloads.

Each test run adhered to PAAC best practices with a 30 second session logon interval, 16 minutes of steady state after which sessions would begin logging off.

The following table summarizes the test results for the various workloads and configurations:

Hypervisor | Provisioning | Workload | Density Per Host | Avg CPU % | Avg Memory Consumed (GB) | Avg Memory Active (GB) | Avg IOPS/User | Avg Net Kbps/User
ESXi | XenDesktop MCS | Heavy | 100 | 99% | 236 | 48 | 5.2 | 771
ESXi | XenDesktop MCS | Medium | 130 | 96% | 263 | 52 | 4 | 499
ESXi | XenDesktop MCS | Light | 210 | 94% | 355 | 63 | 3.1 | 368
ESXi | XenApp | Heavy | 145 | 97% | 80 | 22 | 2.6 | 325

CPU Utilization* CPU % for ESX hosts was adjusted to account for the fact that on Intel E5-2690v3 series processors the ESX host CPU metrics will exceed the rated 100% for the host if Turbo Boost is enabled (the default). The adjusted CPU % usage is based on 100% usage and is not reflected in the charts. The figure shown in the table is the Compute host steady state peak CPU usage. One Nutanix CVM (Controller VM) is located on each node and is allocated a 10000MHz CPU reservation.


Memory Utilization The figure shown in the table above is the average memory consumed per Compute host over the recorded test period. Active is the average active memory per Compute host over the recorded test period.

The IOPS results are calculated from the average Nutanix Cluster Disk IOPS figure over the test period divided by the number of users.

Network Utilization The figure shown in the table is the average Kbps/User per Compute host over the recorded test period.
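To make the per-user arithmetic explicit, the illustrative sketch below back-calculates the per-user figures from example totals consistent with the heavy-workload summary above; the input totals are not separately published measurements.

```python
# Illustration: how the per-user summary figures are derived. The totals below are
# back-calculated from the heavy-workload results (200 users across two compute hosts,
# 5.2 IOPS/user and 771 Kbps/user) purely to show the arithmetic.
total_users = 200                    # two compute hosts x 100 heavy users
users_per_host = total_users // 2

avg_cluster_disk_iops = 1040         # average Nutanix cluster disk IOPS over the test period
avg_host_network_kbps = 77100        # average network Kbps per compute host

iops_per_user = avg_cluster_disk_iops / total_users      # cluster-wide metric / all users
kbps_per_user = avg_host_network_kbps / users_per_host   # per-host metric / users on that host

print(f"IOPS per user: {iops_per_user:.1f}")   # 5.2
print(f"Kbps per user: {kbps_per_user:.0f}")   # 771
```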

XenDesktop, 420 Pooled VMs, MCS, Light Workload – XC630-B5 The architecture for this test run includes a dedicated management host and two compute hosts. The two compute hosts were populated with 420 pooled virtual machines (210 per host) and one Nutanix CVM per host.

The Nutanix CVMs consumed less than 20% of the compute hosts’ CPU usage at the start of the test run.

The below graph shows the performance data for 210 user desktops per host on a pair of Compute hosts with the expected extra percentage provided by the Intel Turbo Boost feature. The CPU reaches a steady state peak of 96% during the test cycle when approximately 210 users are logged on to each compute host.

The Management host in the cluster runs the vSphere, XenDesktop management virtual machines and a Nutanix CVM. The mgmt node’s CPU utilization is significantly lower than the compute hosts and does not exceed 20% at any point in the test cycle.

[Chart: CPU utilization (%) over the test period for the Management, Compute A and Compute B hosts, with the 85% CPU threshold and the 19% Turbo performance increase indicated.]


Memory consumption was very reasonable with a max of 403GB consumed and only 102GB actually active at peak. A host configured with 384GB RAM would be more than sufficient given this use case.

[Charts: Consumed memory (GB) and active memory (GB) over the test period for the Management, Compute A and Compute B hosts.]

Network utilization is generally always very low and in this test reached a peak of 120Mbps on the compute hosts.

[Chart: Network utilization (Kbps) over the test period for the Management, Compute A and Compute B hosts.]

The total cluster disk performance peaked at 2621 IOPS, the majority generated as expected by the compute hosts. These results are not indicative of the total disk performance potential of the platform.


XenDesktop, 260 Pooled VMs, MCS, Medium Workload – XC630-B5 The architecture for this test run includes a dedicated management host and two compute hosts. Each of the compute hosts was populated with 130 pooled virtual machines and one Nutanix CVM.

The Nutanix CVMs consumed under 20% of the compute hosts’ CPU usage at the start of the test run.

The below graph shows the performance data for 130 user desktops per host on a pair of Compute hosts with the expected extra percentage provided by the Intel Turbo Boost feature. The CPU reaches a steady state peak of 96% during the test cycle when approximately 130 users are logged on to each compute host.

The Management host in the cluster runs the vSphere, XenDesktop management virtual machines and a Nutanix CVM. The mgmt node’s CPU utilization is significantly lower than the compute hosts and does not exceed 20% at any point in the test cycle.


[Chart: CPU utilization (%) over the test period for the Management, Compute A and Compute B hosts, with the 85% CPU threshold and the 19% Turbo performance increase indicated.]

Memory consumption was very reasonable with a max of 335GB consumed and only 84GB actually active at peak. A host configured with 384GB RAM would be more than sufficient given this use case.

[Charts: Consumed memory (GB) and active memory (GB) over the test period for the Management, Compute A and Compute B hosts.]

Network utilization is generally always very low and in this test reached a peak of 98Mbps on the compute hosts.


[Chart: Network utilization (Kbps) over the test period for the Management, Compute A and Compute B hosts.]

The total cluster disk performance peaked at 1951 IOPS, the majority generated as expected by the compute hosts. These results are not indicative of the total disk performance potential of the platform.


XenDesktop, 200 Pooled VMs, MCS, Heavy Workload – XC630-B5 The architecture for this test run includes a dedicated management host and two compute hosts. Each of the compute hosts was populated with 100 full virtual machines and one Nutanix CVM per host.

The Nutanix CVMs consumed under 20% of CPU usage on the compute hosts and less than 10% on the management host at the start of the test run.

The CPU utilization chart below does not include the additional 10-20% of CPU available from the Turbo Boost feature. Without Turbo Boost there is a total of 62380 MHz available for desktops; with Turbo Boost the total available MHz value would be ~68168 MHz.
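The following illustrative calculation restates that CPU budget, combining the per-host MHz figures quoted here with the 10000MHz CVM reservation noted earlier to give a rough per-desktop allocation; the per-desktop figure is an approximation, not a published result.

```python
# Rough per-host CPU budget for this configuration (illustrative arithmetic only).
sockets, cores_per_socket, base_clock_mhz = 2, 12, 2600

nominal_total_mhz = sockets * cores_per_socket * base_clock_mhz  # ~62,400 MHz (62,380 quoted above)
turbo_total_mhz = 68168                                          # figure quoted above
cvm_reservation_mhz = 10000                                      # Nutanix CVM CPU reservation per node
desktops_per_host = 100

for label, total in (("nominal", nominal_total_mhz), ("with Turbo Boost", turbo_total_mhz)):
    budget = total - cvm_reservation_mhz
    print(f"{label}: {budget} MHz for desktops (~{budget / desktops_per_host:.0f} MHz per desktop)")
```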

The below graph shows the performance data for 100 user sessions per host on a pair of Compute hosts. The CPU reaches a steady state peak of 99% during the test cycle when approximately 100 users are logged on to each compute host.

The Management host in the cluster runs the vSphere, XenDesktop management virtual machines and a Nutanix CVM. The mgmt node’s CPU utilization is significantly lower than the compute hosts and does not exceed 15% at any point in the test cycle.


[Chart: CPU utilization (%) over the test period for the Management, Compute A and Compute B hosts, with the 85% CPU threshold indicated.]

In regards to memory consumption for the cluster, out of a total of 512 GB available memory per node there were no constraints for any of the hosts. The Compute Hosts reached a max memory consumption of 313 GB with active memory usage reaching a max of 90GB. There was no memory ballooning or swapping.

[Charts: Consumed memory (GB) and active memory (GB) over the test period for the Management, Compute A and Compute B hosts.]


Network bandwidth is not an issue on this solution with a steady state peak of approximately 146,000 Kbps.

[Chart: Network utilization (Kbps) over the test period for the Management, Compute A and Compute B hosts.]

The Nutanix Cluster IOPS figures taken from the Nutanix Web Console showed that the cluster reached a maximum of 1,532 Disk IOPS during the test run with the Management node having the lowest disk IOPS of the three nodes. The activity spike in the graph after steady state peak shows the user logoffs taking place at the end of the test cycle. These results are not indicative of the total disk performance potential of the platform.


The Login VSI Max user experience score for this test indicates that the VSI Max score was not reached, so there should have been little deterioration of user experience during testing.

Notes:

 As indicated above, the CPU graphs do not take into account the extra 10+% of CPU resources available through the 2690v3’s turbo feature.

 Subjective user experience showed mouse movement and window response times when clicking within a running session during steady state were good. Video playback while somewhat slow was still acceptable.

 User login times were consistent up until the final few sessions were launching, when a few sessions took some extra time to login.

 512 GB of memory installed on each node is more than enough for this configuration and should run equally well with less memory installed.


XenApp, 290 Sessions, Heavy Workload – XC630-B5 The architecture for this test run includes a dedicated management host and two compute hosts. Each of the compute hosts was populated with 5 XenApp virtual machines running Windows Server 2012 R2 and one Nutanix CVM. Each session host VM was allocated 32GB of RAM and 8 vCPUs.

The Nutanix CVMs consumed approximately 14% of the compute hosts’ CPU usage at the start of the test run.

The CPU utilization chart below does not include the additional 10-20% of CPU available from the Turbo Boost feature. Without Turbo Boost there is a total of 62380 MHz available for desktops; with Turbo Boost the total available MHz value would be ~68168 MHz.

The below graph shows the performance data for 145 user sessions per host on a pair of Compute hosts. The CPU reaches a steady state peak of 97% during the test cycle when approximately 145 users are logged on to each compute host.

The Management host in the cluster runs the vSphere, XenDesktop management virtual machines and a Nutanix CVM. The mgmt node’s CPU utilization is significantly lower than the compute hosts and does not exceed 11% at any point in the test cycle.

[Chart: CPU utilization (%) over the test period for the Management, Compute A and Compute B hosts, with the 85% CPU threshold indicated.]

In regards to memory consumption for the cluster, out of a total of 512 GB available memory per node there were no constraints for any of the hosts. The Compute Hosts reached a max memory consumption of 109 GB with active memory usage reaching a max of 50GB. There was no memory ballooning or swapping.


[Charts: Consumed memory (GB) and active memory (GB) over the test period for the Management, Compute A and Compute B hosts.]

Network bandwidth is not an issue on this solution with a steady state peak of approximately 80Mbps.


[Chart: Network utilization (Kbps) over the test period for the Management, Compute A and Compute B hosts.]

Monitoring of a single sample XenApp session host showed that its processor reached a maximum of 87% during the test run.

[Chart: CPU utilization (%) of a single sample XenApp session host VM over the test period.]

The Nutanix Cluster IOPS figures taken from the Nutanix Web Console showed that the cluster reached a maximum of 1,297 Disk IOPS during the test run with the Management node having the lowest disk IOPS of the three nodes. These results are not indicative of the total disk performance potential of the platform.


The Login VSI Max user experience score for this test indicates that the VSI Max score was not reached, so there is little evidence of deterioration of user experience during testing.

Notes:

 As indicated above, the CPU graphs do not take into account the extra 10% of CPU resources available through the 2690v3’s turbo feature.

 Subjective user experience showed mouse movement and window response times when clicking within a running session during steady state were good. Video playback quality was good.

 User login times were consistent up until the final few sessions were launching, when a few sessions took some extra time to login.

 512 GB of memory was installed on each node in this test; the results demonstrate that the recommended 256-384GB is more than enough for this configuration.

6.5 XC630-B7 Test Results for Citrix XenDesktop Validation for this project was completed for Citrix XenDesktop and XenApp 7.6 on the following platforms.

ESXi 5.5 U2 / Citrix XenDesktop 7.6 / XC630

ESXi 5.5 U2 / RDSH brokered by XenDesktop 7.6 / XC630

Platform configurations are shown below and the Login VSI workloads used for load testing on each environment.


Platform | Config | CPU | Memory | RAID Ctlr | HD Config | Network
XC630 | B7 | E5-2690v3 (12 Core, 2.6GHz) | 512GB | H730P | 1 x 64GB SATADOM (CVM/HV); 2 x 400GB Intel S3700 SSD 2.5” (T1); 4 x 1TB NL SAS 2.5” (T2) | 2 x 10Gb SFP+, 2 x 1Gb BT

1Gb networking was used for the deployment of the Bensen XC appliances, while 10Gb networking was used for all PAAC testing.

Compute and Management resources were split out with the following configuration across a three node Nutanix cluster and all test runs were completed with this configuration.

 Node 1 – XC630 – Dedicated Management (vCenter Appliance, SQL Server, XenDesktop Delivery Controller, XenDesktop Storefront.)
 Node 2 – XC630 – Dedicated Compute
 Node 3 – XC630 – Dedicated Compute

The virtual machines were non-persistent linked clone desktops running Windows 8.1, configured to align with the Login VSI 4.X virtual machine configuration. Office 2010 was used, and each virtual machine was sized at 32 GB. The User Workload configuration of the load generation virtual machines is shown in the table below.

User Workload | vCPUs | Memory | OS Bit Level | HD Size (GB)
Professional User | 2 | 4GB | x64 | 32

The RDSH environment was configured as shown in the table below for each host according to platform. The RDSH images were built using Windows Server 2012 R2 Standard edition and Microsoft Office 2010 plus the support tools required for Login VSI testing.


Role | Config | VMs per host | vCPUs | RAM (GB) | NIC | Disk Size (GB)
RDSH VM | B7 | 5 | 8 | 32 | 1 | 80

Test Results Validation was performed using the CCC standard testing methodology, using the Login VSI 4 load generation tool for VDI benchmarking, which simulates production user workloads.

Each test run adhered to PAAC best practices with a 30 second session logon interval, 16 minutes of steady state after which sessions would begin logging off.

The following table summarizes the test results for the various workloads and configurations.

Hypervisor | Provisioning | Workload | Density Per Host | Avg CPU % | Avg Memory Consumed (GB) | Avg Memory Active (GB) | Avg IOPS/User | Avg Net Kbps/User
ESXi | XenDesktop MCS | Professional | 100 | 99% | 236 | 48 | 5.2 | 771
ESXi | XenDesktop MCS RDSH | Professional | 145 | 97% | 80 | 22 | 2.6 | 325

CPU Utilization* CPU % for ESX hosts was adjusted to account for the fact that on Intel E5-2690v3 series processors the ESX host CPU metrics will exceed the rated 100% for the host if Turbo Boost is enabled (the default). The adjusted CPU % usage is based on 100% usage and is not reflected in the charts. The figure shown in the table is the Compute host steady state peak CPU usage. One Nutanix CVM (Controller VM) is located on each node and is allocated a 10000MHz CPU reservation.

Memory Utilization The figure shown in the table above is the average memory consumed per Compute host over the recorded test period. Active is the average active memory per Compute host over the recorded test period.

The IOPS results are calculated from the average Nutanix Cluster Disk IOPS figure over the test period divided by the number of users.


Network Utilization The figure shown in the table is the average Kbps/User per Compute host over the recorded test period.

Summary of Tests Run

The ‘Number of Users’ column indicates the total number of user desktop test sessions run across two Compute hosts.

Platform Configuration | Number of Users | Desktop Type | Login VSI Workload
B7 High | 200 | Linked Clone | Heavy
B7 High | 290 | RDSH | Heavy

Results summary

Bensen B7 Configuration, ESXi/XenDesktop MCS, 200 User Test, Heavy Workload

The architecture for this solution includes a dedicated management host and two compute hosts. Each of the compute hosts was populated with 100 full virtual machines and one Nutanix CVM per host. The Nutanix CVMs took up approximately 14% of the compute hosts’ CPU usage at the start of the test run.

The CPU utilization chart below does not include the additional 10% of CPU available from the Turbo Boost feature. Without Turbo Boost there is a total of 62380 MHz available for desktops; with Turbo Boost the total available MHz value is 68168 MHz.

The below graph shows the performance data for 100 user sessions per host on a pair of Compute hosts. The CPU reaches a steady state peak of 99% during the test cycle when approximately 100 users are logged on to each compute host.

The Management host in the cluster runs the vSphere and XenDesktop management virtual machines and a Nutanix CVM virtual machine. Its CPU utilization is significantly lower than the compute hosts in the cluster and does not exceed 15% at any point in the test cycle.


[Chart: CPU utilization (%) over the test period for the Management, Compute A and Compute B hosts, with the 85% CPU threshold indicated.]

In regards to memory consumption for the cluster, out of a total of 512 GB available memory per node there were no constraints for any of the hosts. The Compute Hosts reached a max memory consumption of 313 GB with active memory usage reaching a max of 90GB. There was no memory ballooning or swapping.

[Charts: Consumed memory (GB) and active memory (GB) over the test period for the Management, Compute A and Compute B hosts.]

Network bandwidth is not an issue on this solution with a steady state peak of approximately 146,000 Kbps.

[Chart: Network utilization (Kbps) over the test period for the Management, Compute A and Compute B hosts.]

The Nutanix Cluster IOPS figures taken from the Nutanix Web Console showed that the cluster reached a maximum of 1,532 Disk IOPS during the test run, with the Management node having the lowest disk IOPS of the three nodes. The activity spike in the graph after the steady state peak shows the user logoffs taking place at the end of the test cycle.


The Login VSI Max user experience score for this test indicates that the VSI Max score was not reached, so there should have been little deterioration of user experience during testing.


Notes:

 As indicated above, the CPU graphs do not take into account the extra 10% of CPU resources available through the 2690v3’s turbo feature.
 Subjective user experience showed mouse movement and window response times when clicking within a running session during steady state were good. Video playback while somewhat slow was still acceptable.
 User login times were consistent up until the final few sessions were launching, when a few sessions took some extra time to login.
 512 GB of memory installed on each node is more than enough for this configuration and should run equally well with less memory installed.

XenApp, 290 Sessions, User Test, Heavy Workload

The architecture for this solution includes a dedicated management host and two compute hosts. Each of the compute hosts was populated with 5 RDS session host virtual machines running Windows Server 2012 R2 and one Nutanix CVM per host. Each session host was allocated 32GB of RAM and 8 vCPUs. The Nutanix CVMs took up approximately 4% of the compute hosts’ CPU usage at the start of the test run.

The CPU utilization chart below does not include the additional 10% of CPU available from the Turbo Boost feature. Without Turbo Boost there is a total of 62380 MHz available for desktops; with Turbo Boost the total available MHz value is 68168 MHz.

The below graph shows the performance data for 145 user sessions per host on a pair of Compute hosts. The CPU reaches a steady state peak of 97% during the test cycle when approximately 145 users are logged on to each compute host.


The Management host in the cluster runs the vSphere and XenDesktop management virtual machines and a Nutanix CVM virtual machine. Its CPU utilization is significantly lower than the compute hosts in the cluster. The CPU utilization for the management host does not exceed 11% at any point in the test cycle.

[Chart: CPU utilization (%) over the test period for the Management, Compute A and Compute B hosts, with the 85% CPU threshold indicated.]

In regards to memory consumption for the cluster, out of a total of 512 GB available memory per node there were no constraints for any of the hosts. The Compute Hosts reached a max memory consumption of 109 GB with active memory usage reaching a max of 50GB. There was no memory ballooning or swapping.

[Charts: Consumed memory (GB) and active memory (GB) over the test period for the Management, Compute A and Compute B hosts.]

Network bandwidth is not an issue on this solution with a steady state peak of approximately 79,000 Kbps.


[Chart: Network utilization (Kbps) over the test period for the Management, Compute A and Compute B hosts.]

Monitoring of a single, sample RDSH session host showed that its processor reached a maximum of 87% during the test run.

[Chart: CPU utilization (%) of a single sample RDSH session host VM over the test period.]

The Nutanix Cluster IOPS figures taken from the Nutanix Web Console showed that the cluster reached a maximum of 1,297 Disk IOPS during the test run with the Management node having the lowest disk IOPS of the three nodes.


The Login VSI Max user experience score for this test indicates that the VSI Max score was not reached, so there should have been little deterioration of user experience during testing.


Notes:

 As indicated above, the CPU graphs do not take into account the extra 10% of CPU resources available through the 2690v3’s turbo feature.
 Subjective user experience showed mouse movement and window response times when clicking within a running session during steady state were good. Video playback quality was good.
 User login times were consistent up until the final few sessions were launching, when a few sessions took some extra time to login.
 512 GB of memory installed on each node is more than enough for this configuration and should run equally well with less memory installed.


Acknowledgements

Thank you to the Nutanix Technical Marketing team for the detail presented in section 2.2 of this document.


About the Authors

Peter Fine is the Sr. Principal Solutions Architect for enterprise VDI solutions at Dell. Peter has extensive experience and expertise on the broader Microsoft, Citrix and VMware solutions software stacks as well as in enterprise virtualization, storage, networking and enterprise data center design.

Andrew Breedy is a Solution Engineer working in the Dell-Wyse Datacenter engineering group. Andrew has extensive experience in server hardware and in Microsoft and VMware virtualization solutions, as well as a background in engineering test processes and performance analysis and characterization.

Kristina Bako is a Sr. Systems Engineer who has been with Dell for just over two years and is part of the Dell Wyse Datacenter (DWD) Solution Engineering Group at Dell. Before joining Dell she worked for over 10 years as an Applications Engineer in video technology related areas. She holds a Bachelor’s degree in Audio/Video Engineering from the University of Salford and a Master’s degree in Telecommunications from Dublin City University.

Rick Biedler is the Solutions Development Manager for Datacenter appliances at Dell, managing the development and delivery of Enterprise class Desktop virtualization solutions based on Dell datacenter components and core virtualization platforms.

Manish Chacko is the Sr. Technical Marketing Advisor for Citrix-based solutions at Dell. Before writing about technology, Manish spent time designing, implementing and supporting technology in IT, systems engineering, and network performance and monitoring. Manish was a long-time Dell customer and advocate before becoming a Dell employee.
