Linux on IBM Z Networking: OSA-Express and Roce Express Side by Side — Stefan Raspl Linux on IBM Z Development
Total Page:16
File Type:pdf, Size:1020Kb
Linux on IBM Z Networking: OSA-Express and RoCE Express Side by Side — Stefan Raspl Linux on IBM Z Development IBM Z / © 2018 IBM Corporation Trademarks The following are trademarks of the International Business Machines Corporation in the United States and/or other countries. AIX* DB2* HiperSockets* MQSeries* PowerHA* RMF System z* zEnterprise* z/VM* BladeCenter* DFSMS HyperSwap NetView* PR/SM Smarter Planet* System z10* z10 z/VSE* CICS* EASY Tier IMS OMEGAMON* PureSystems Storwize* Tivoli* z10 EC Cognos* FICON* InfiniBand* Parallel Sysplex* Rational* System Storage* WebSphere* z/OS* DataPower* GDPS* Lotus* POWER7* RACF* System x* XIV* * Registered trademarks of IBM Corporation The following are trademarks or registered trademarks of other companies. Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. Java and all Java based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. OpenStack is a trademark of OpenStack LLC. The OpenStack trademark policy is available on the OpenStack website. TEALEAF is a registered trademark of Tealeaf, an IBM Company. Windows Server and the Windows logo are trademarks of the Microsoft group of countries. Worklight is a trademark or registered trademark of Worklight, an IBM Company. UNIX is a registered trademark of The Open Group in the United States and other countries. * Other product and service names might be trademarks of IBM or other companies. Notes: Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here. IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply. All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions. This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area. All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography. This information provides only general descriptions of the types and portions of workloads that are eligible for execution on Specialty Engines (e.g, zIIPs, zAAPs, and IFLs) ("SEs"). IBM authorizes customers to use IBM SE only to execute the processing of Eligible Workloads of specific Programs expressly authorized by IBM as specified in the “Authorized Use Table for IBM Machines” provided at www.ibm.com/systems/support/machine_warranties/machine_code/aut.html (“AUT”). No other workload processing is authorized for execution on an SE. IBM offers SE at a lower price than General Processors/Central Processors because customers are authorized to use SEs only to process certain types and/or amounts of workloads as specified by IBM in the AUT. IBM Z / © 2018 IBM Corporation 2 Agenda . The Cards – Models & Features – Virtualization Capabilities . Device Drivers, Features and Commands . Usage . Performance . Summary . References IBM Z / © 2018 IBM Corporation 3 The Cards OSA-Express RoCE Express RoCE Express2 10GbE OSA-Express6S 10GbE IBM Z / © 2018 IBM Corporation 4 Models OSA-Express RoCE Express . 1, 10 and 25GbE models with varying HW features: . Introduced with zEC12 for SMC-R – 1GbE: Base-T or fiber optics, 2 ports . 10 and 25GbE models, optical connectors only – 10 and 25GbE: Fiber only, 1 port . 25GbE model strictly requires 25GbE capable switch – no negotiation to 10GbE . 25GbE model strictly requires 25GbE capable switch – no negotiation to 10GbE . All models feature 2 ports . Considered platform’s native networking card . Fiber optics only . [1] . Supported by all operating systems on IBM Z TCP/IP or RoCE (RDMA over Converged Ethernet) . TCP/IP functionality exploited by Linux only . Supports TCP/IP[1] traffic only Feature z14 z13 zEC12 Feature z14 z13 zEC12 OSA-Express7S 25 GbE 25 GbE RoCE Express 2 10 GbE OSA-Express6S 10 GbE 1 GbE RoCE Express 10 GbE 10 GbE 10 GbE 1000Base-T OSA-Express5S 10 GbE 10 GbE 10 GbE 1 GbE 1 GbE 1 GbE 1000Base-T 1000Base-T 1000Base-T OSA-Express4S 10 GbE 10 GbE 10 GbE 1 GbE 1 GbE 1 GbE 1000Base-T 1000Base-T 1000Base-T 10 GbE OSA-Express3 1 GbE 1000Base-T IBM Z / © 2018 IBM Corporation 5 [1] Synonymous to any kind of “traditional” network traffic (UDP, SCTP, etc.) Models / Card Operating Modes OSA-Express RoCE Express . Multiple operating modes, configurable on a per-CHPID basis . Operating mode chosen by software – can be used . Covered in this presentation: in parallel – OSD: Queued Direct Input/Output (QDIO) – TCP/IP – OSE: Non-Queued Direct Input/Output – RDMA over Converged Ethernet (RoCE) . Other modes: – OSM: Required for DPM, connectivity to intra-node management network (INMN) – OSX: Connectivity to zEnterprise BladeCenter Extension . Supported up to z13 and OSA-Express5S only: – OSN: Open Systems Adapter for Network Control Program . Unsupported by Linux on Z: – OSC: OSA Integrated Console Controller IBM Z / © 2018 IBM Corporation 6 Card Features OSA-Express RoCE Express . Features (selection) . Features (selection) – HW offloads: Checksumming, TCP – HW offloads: Checksumming, TSO segmentation offload (TSO) – RDMA over Converged Ethernet (RoCE) – Layer 2 and layer 3 mode – Flow Control, Explicit Congestion Notification – VLAN, QoS, VIPA, ARP, et al – IPoIB, uDAPL, et al . RAS – VLAN, QoS, et al – Extended RAS . RAS – Concurrent firmware updates – Regular RAS 95+ percent completely concurrent – Changing optics of a single card disrupts entire PCHID – Firmware updates are disruptive IBM Z / © 2018 IBM Corporation 7 Virtualization OSA-Express RoCE Express . Up to 1,920 subchannels per card . Single Root I/O Virtualization (PCI SR-IOV) – 1,920 subchannels per card at one outbound queue . Virtual Functions (VFs) provide a limited subset of card – 480 subchannels per card at four outbound queues functionality . Physical function (PF) held by . 3 subchannels form a so-called CCW group device PCI Resource Group required per stack for OSD CHPIDs required for certain 160 to 640 stacks per card ⇒ functionalities, ⇒ including firmware . Each virtualized instance provides full functionality updates . RoCE Express: Up to 31 VFs per card(!), each VF handling both ports . RoCE Express2: Up to 63 VFs per port IBM Z / © 2018 IBM Corporation 8 Models / Ports and Plugging Rules OSA-Express RoCE Express Total Total #Ports Total #IP Stacks #Ports Total #IP Stacks Model #Cards #IP Stacks Model #Cards [2] #IP Stacks / Card #Ports / Card[1] / Card #Ports / Card / Machine / Machine z14 48 1-2 48-96 160-640 3,840-30,720 z14 8 2 16 31-126 328-1008 z14 ZR1 48 1-2 48-96 160-640 3,840-30,720 z14 ZR1 4 2 8 31-126 164-504 z13 48 1-2 48-96 160-640 3,840-30,720 z13 16 2 32 31 496 z13s 48 1-2 48-96 160-640 3,840-30,720 z13s 16 2 32 31 496 zEC12 48 1-2 48-96 160-640 3,840-30,720 zEC12 16 2 32 2 64 zBC12 48 1-2 48-96 160-640 3,840-30,720 zBC12 16 2 32 2 64 zBC12 w/ 16-32 1-2 16-64 160-640 1,280-20,480 OSA-Express3 [1] Depends on #outbound queues mode [2] Depends on card generation IBM Z / © 2018 IBM Corporation 9 Agenda . The Cards . Device Drivers, Features and Commands – Distro Support – Device Drivers – Tools . Usage . Performance . Summary . References IBM Z / © 2018 IBM Corporation 10 Distro Support OSA-Express RoCE Express . All OSA-Express models .