Linux on IBM Z and LinuxONE: How to troubleshoot

July 16, 2020 —

Sa Liu, Linux on IBM Z and LinuxONE Service & Support

Trademarks

The following are trademarks of the International Business Machines Corporation in the United States and/or other countries: CICS*, Cognos*, DataStage*, DB2*, GDPS, Global Business Services*, IBM*, IBM (logo)*, InfoSphere, Maximo*, MQ*, Parallel Sysplex*, QualityStage, Rational*, Smarter Cities, SPSS*, System Storage*, System x*, Tivoli*, WebSphere*, XIV*, zEnterprise*, z/OS*, z Systems*, z/VM*, z/VSE*. (* Registered trademarks of IBM Corporation)

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries. IT Infrastructure Library and ITIL are Registered Trade Marks of AXELOS Limited. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. UNIX is a registered trademark of The Open Group in the United States and other countries. VMware, the VMware logo, VMware Cloud Foundation, VMware Cloud Foundation Service, VMware vCenter Server, and VMware vSphere are registered trademarks or trademarks of VMware, Inc. or its subsidiaries in the United States and/or other jurisdictions. Other product and service names might be trademarks of IBM or other companies.

Notes: Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment.
The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here. IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply. All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions. This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the products or services available in your area. All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography. This information provides only general descriptions of the types and portions of workloads that are eligible for execution on Specialty Engines (e.g., zIIPs, zAAPs, and IFLs) ("SEs").
IBM authorizes customers to use IBM SE only to execute the processing of Eligible Workloads of specific Programs expressly authorized by IBM as specified in the "Authorized Use Table for IBM Machines" provided at www.ibm.com/systems/support/machine_warranties/machine_code/aut.html ("AUT"). No other workload processing is authorized for execution on an SE. IBM offers SE at a lower price than General Processors/Central Processors because customers are authorized to use SEs only to process certain types and/or amounts of workloads as specified by IBM in the AUT.

Agenda

§ Introduction
§ Data collection
§ Storage troubleshooting
§ Network troubleshooting
§ Performance troubleshooting
§ Customer cases

2 Linux on Z troubleshooting

§ Understand the problem
  – What are the symptoms of the problem?
  – Where / when / under which conditions does the problem occur?
  – Can the problem be reproduced?
§ What do you need to do?
  – Collect data before recovery
  – Collect data right after the problem occurs
  – Collect data from a healthy system
  – Keep track of the system setup and the latest changes


§ How to do it?
  – Get the system prepared for data collection
  – Install packages for Linux tools (s390-tools, sysstat, …)
  – Enable Linux / have disks ready for standalone dump
  – Enable regular system activity monitoring
  – Learn to use the tools for data analysis

Data Collection

§ System data
§ Performance data
§ System dump

5 Collect System Data

§ dbginfo.sh
§ sosreport – RHEL
§ supportconfig – SLES
§ sosreport – Ubuntu

6 Collect System Data

§ dbginfo.sh – a script that collects data for debugging Linux on Z (requires root authority). Run it before reboot!
§ It collects:
  – System information and generic configuration data
  – List of devices and their configurations
  – System logs / trace data
  – s390 debug buffer
  – z/VM or KVM basic data (if Linux runs under z/VM or KVM)

Distribution  Package name
RHEL          s390utils
SLES          s390-tools
Ubuntu        s390-tools

dbginfo.sh output

root@system # dbginfo.sh
dbginfo.sh: Debug information script version 2.11.0-7.27
Copyright IBM Corp. 2002, 2018

Hardware platform   = s390x
Kernel version      = 5.3.18 (5.3.18-22-default)
Runtime environment = LPAR

1 of 13: Collecting command output
2 of 13: Collecting z/VM command output
3 of 13: Collecting procfs
4 of 13: Collecting sysfs
5 of 13: Collecting log files
6 of 13: Collecting config files
7 of 13: Collecting osa oat output
8 of 13: Collecting ethtool output
9 of 13: Collecting OpenVSwitch output skipped
10 of 13: Collecting domain xml files skipped
11a of 13: Collecting docker container output skipped
11b of 13: Collecting docker network output skipped
12 of 13: Collecting nvme output
13 of 13: Postprocessing

Finalizing: Creating archive with collected data
Collected data was saved to:
>> /tmp/DBGINFO-2020-07-15-08-44-18-test-1CA1E7.tgz <<

The archive contains Linux command output and z/VM command output next to copies of system directories and log files:

root@system # cd /tmp/DBGINFO-2020-07-15-08-44-18-test-1CA1E7/
root@system # ls -t
dbginfo.log  osa_oat.out  osa_oat_eth0.raw  osa_oat_eth1.raw  osa_oat_eth2.raw
osa_oat_eth3.raw  zvm_runtime.out  runtime.out  journalctl.out  etc  usr  proc
lib  boot  run  var

§ Make sure you have enough disk space under /tmp
§ Use "dbginfo.sh -d" to specify another location for the tarball
§ Review the collected data before sending it to your service organization

Collect Performance Data

§ sadc – system activity data collector
§ perf – performance analysis tool
§ iostat – monitors I/O device load and CPU utilization
§ dasdstat – display DASD performance data
§ ziomon / ziorep – collect FCP performance data and generate reports
§ z/VM MONWRITE – collects CP *MONITOR data
§ hyptop – dynamic real-time view of the hypervisor environment

For more details refer to the book Troubleshooting Guide.

Collect Performance Data

§ sadc – system activity data collector (sysstat package)

root@system # /usr/lib64/sa/sadc -S XALL 10 sadc_output

  – Collects all counters at a 10-second interval into a binary output file

§ sar – system activity report

root@system # sar -A -f sadc_output > sar_output

  – Converts the binary output file to a plain-text report

§ Start sysstat as a permanent service (recommended)
  – sadc default configuration: 10-minute intervals (/var/log/sa/)
§ Under z/VM, collect MONWRITE data over the same time period and at the same interval as the sadc data!

Collect Performance Data

§ kSar – graphical sar analysis tool

11 Collect Performance Data

§ Recommended data collection process

§ 1) Run dbginfo.sh
# /sbin/dbginfo.sh

§ 2) Start sadc at a 5-second interval
# /usr/lib64/sa/sadc -S XALL 5 /tmp/server_sadc.out &

§ 3) Run the test with workload

§ 4) Stop sadc
# killall sadc

§ 5) Run dbginfo.sh again
# /sbin/dbginfo.sh

§ 6) Convert the sadc output file to a report
# sar -A -f /tmp/server_sadc.out > server_sar
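The six steps above can be wrapped in a small script. This is only a sketch: it assumes dbginfo.sh, sadc, and sar can be found on PATH (the sadc path varies by distribution, e.g. /usr/lib64/sa/sadc), and the collect_perf_data name is made up for illustration.

```shell
#!/bin/sh
# Sketch of the recommended collection process; collect_perf_data and
# its arguments are illustrative, not part of any distributed tool.
collect_perf_data() {
    workload="$1"                           # command that drives the test workload
    out="${2:-/tmp/server_sadc.out}"
    dbginfo.sh                              # 1) system data before the run
    sadc -S XALL 5 "$out" &                 # 2) all counters, 5-second interval
    sadc_pid=$!
    sh -c "$workload"                       # 3) run the test workload
    kill "$sadc_pid" 2>/dev/null            # 4) stop sadc
    dbginfo.sh                              # 5) system data after the run
    sar -A -f "$out" > "${out%.out}_sar"    # 6) convert binary data to a text report
}
```

Stopping sadc by PID rather than with killall avoids terminating an unrelated sadc started by the permanent sysstat service.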

12 Collect System Dump

For more details refer to the book Using the Dump Tools.

kdump – Kernel Crash Dump

§ Boots a new instance of the kernel in a pre-reserved memory section
§ Copies the existing memory untouched to storage or via the network
§ crashkernel=auto (minimum amount of memory required: 4 GB)

Configure kdump on RHEL

§ Install the kdump service
# yum install kexec-tools
§ Configure kdump memory usage: edit the kernel command-line parameters in /etc/zipl.conf
crashkernel=auto
§ Configure the kdump target and the core collector by editing /etc/kdump.conf
ext4 /dev/mapper/vg00-varcrashvol
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31
§ dracut issue with RHEL 8 kdump to SCSI disk
  – Workaround: add zfcp.allow_lun_scan=0 to the kernel parameters
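Putting both kernel parameters together, a /etc/zipl.conf entry could look like the following sketch (the image, ramdisk, and root-device values here are made-up examples, not taken from this system):

```
[linux]
    target = /boot
    image = /boot/vmlinuz
    ramdisk = /boot/initramfs.img
    parameters = "root=/dev/mapper/vg00-rootvol crashkernel=auto zfcp.allow_lun_scan=0"
```

Run zipl and reboot afterwards for the change to take effect.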

Configure kdump on SLES

§ Required packages
  – kexec-tools
  – makedumpfile
  – yast2-kdump
§ kdump configuration in /etc/sysconfig/kdump; add kernel parameters to KDUMP_CMDLINE_APPEND=
§ Ignore devices currently not in use with cio_ignore to lower the required amount of crashkernel memory
§ zfcp.allow_lun_scan=0 is the default in the kdump command line
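As a sketch (the device ID is an example, not from this system), cio_ignore can be combined with the zfcp parameter in /etc/sysconfig/kdump like this:

```
# keep only the console and the dump target device visible to the kdump kernel
KDUMP_CMDLINE_APPEND="cio_ignore=all,!condev,!0.0.eaae zfcp.allow_lun_scan=0"
```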

Configure kdump on Ubuntu

§ kdump enabled by default (16.04 and later)
§ Installation
# apt install linux-crashdump
§ Configuration in the /etc/default/kdump-tools file
§ Use the kdump-config command to configure kdump, check status, or save a vmcore file

Storage Troubleshooting

§ DASD (Direct Access Storage Device)
§ zFCP/SCSI (Small Computer System Interface)
§ LVM (Logical Volume Manager)

18 DASD

§ Check the status of DASDs
  – lscss – list channel subsystem devices
  – lsdasd – list channel-attached DASDs

# lscss
Device   Subchan.  DevType CU Type Use  PIM PAM POM  CHPIDs
----------------------------------------------------------------------
0.0.eaae 0.0.000f  3390/0c 3990/e9 yes  f0  f0  ff   34353233 00000000
0.0.eaaf 0.0.0021  3390/0c 3990/e9 yes  f0  f0  ff   34353233 00000000
0.0.eab0 0.0.0022  3390/0c 3990/e9 yes  f0  f0  ff   34353233 00000000

# lsdasd
Bus-ID     Status    Name   Device  Type  BlkSz  Size     Blocks
================================================================================
0.0.eaae   active    dasda  94:0    ECKD  4096   21129MB  5409180
0.0.eaaf   active    dasdb  94:4    ECKD  4096   21129MB  5409180
0.0.eab0   n/f       dasdc  94:8    ECKD

§ Device 0.0.eab0 is online but not formatted (status n/f) – format it with dasdfmt before use

19 DASD

§ dasdfmt – format ECKD type DASD
§ fdasd – partitioning tool
§ dasdview – display DASD and VTOC information

# dasdview -t info /dev/dasdc
--- VTOC info ---
The VTOC contains:
  1 format 1 label(s)
  1 format 4 label(s)
  1 format 5 label(s)
  0 format 7 label(s)
  0 format 8 label(s)
  0 format 9 label(s)
Other S/390 and zSeries operating systems would see the following data sets:
+-------------------------------------------+---------+-----------+
| data set                                  | start   | end       |
+-------------------------------------------+---------+-----------+
| LINUX.V0XEAB0.PART0001.NATIVE             | trk     | trk       |
| data set serial number : '0XEAB0'         | 2       | 450764    |
| system code : 'IBM LINUX '                | cyl/trk | cyl/trk   |
| creation date : year 2019, day 137        | 0/2     | 30050/14  |
+-------------------------------------------+---------+-----------+

# dasdview -s 16 /dev/dasdc1
+-----------------------------------------+------------------+------------------+
| HEXADECIMAL                             | EBCDIC           | ASCII            |
| 01....04 05....08 09....12 13....16     | 1.............16 | 1.............16 |
+-----------------------------------------+------------------+------------------+
| C9D7D3F1 000A0000 0000000F 03000000     | IPL1........     | ????........     |
+-----------------------------------------+------------------+------------------+

SCSI over zFCP

§ Check the status of zFCP and SCSI
  – lszfcp – list information about zfcp adapters, ports, and units
  – lsscsi – list SCSI devices
  – ziorep_config – configuration report of the ziomon framework
§ -A adapter report
# ziorep_config -A

Host:     host0                  Host:     host1
CHPID:    60                     CHPID:    61
Adapter:  0.0.191c               Adapter:  0.0.195c
Sub-Ch.:  0.0.001f               Sub-Ch.:  0.0.0020
Name:     0xc05076ffd68018c0     Name:     0xc05076ffd6801f30
P-Name:   0xc05076ffd6801981     P-Name:   0xc05076ffd6801991
Version:  0x0007                 Version:  0x0007
LIC:      0x00001716             LIC:      0x00001716
Type:     NPIV VPORT             Type:     NPIV VPORT
Speed:    16 Gbit                Speed:    16 Gbit
State:    Online                 State:    Online

SCSI over zFCP

§ ziorep_config
§ -D device report
# ziorep_config -D
0.0.191c 0x50050763070845e3 0x4082402a00000000 host0 /dev/sg0 /dev/sda 8:0   Disk 2107900 IBM 0:0:0:1076510850
0.0.191c 0x50050763070845e3 0x4083402a00000000 host0 /dev/sg1 /dev/sdb 8:16  Disk 2107900 IBM 0:0:0:1076510851
0.0.191c 0x50050763070845e3 0x4084402a00000000 host0 /dev/sg2 /dev/sdc 8:32  Disk 2107900 IBM 0:0:0:1076510852
0.0.191c 0x50050763070845e3 0x4085402a00000000 host0 /dev/sg3 /dev/sdd 8:48  Disk 2107900 IBM 0:0:0:1076510853
0.0.195c 0x50050763071845e3 0x4082402a00000000 host1 /dev/sg4 /dev/sde 8:64  Disk 2107900 IBM 1:0:0:1076510850
0.0.195c 0x50050763071845e3 0x4083402a00000000 host1 /dev/sg5 /dev/sdf 8:80  Disk 2107900 IBM 1:0:0:1076510851
0.0.195c 0x50050763071845e3 0x4084402a00000000 host1 /dev/sg6 /dev/sdg 8:96  Disk 2107900 IBM 1:0:0:1076510852
0.0.195c 0x50050763071845e3 0x4085402a00000000 host1 /dev/sg7 /dev/sdh 8:112 Disk 2107900 IBM 1:0:0:1076510853

§ -M mapper report
# ziorep_config -M
0.0.191c 0x50050763070845e3 /dev/sda /dev/mapper/mpatha
0.0.195c 0x50050763071845e3 /dev/sde /dev/mapper/mpatha
0.0.191c 0x50050763070845e3 /dev/sdc /dev/mapper/mpathb
0.0.195c 0x50050763071845e3 /dev/sdg /dev/mapper/mpathb
0.0.191c 0x50050763070845e3 /dev/sdb /dev/mapper/mpathc
0.0.195c 0x50050763071845e3 /dev/sdf /dev/mapper/mpathc
0.0.191c 0x50050763070845e3 /dev/sdd /dev/mapper/mpathd
0.0.195c 0x50050763071845e3 /dev/sdh /dev/mapper/mpathd

For more details refer to the presentation: FCP with Linux on IBM Z and LinuxONE: SCSI over Fibre Channel – Best Practices

LVM

§ Check the status of LVM
  – pvscan – scan all disks for physical volumes
  – lvscan – scan all disks for logical volumes
  – vgscan – scan all disks for volume groups
  – vgdisplay – display attributes of volume groups
§ dmsetup – low-level logical volume management
  – dmsetup ls --tree
  – dmsetup table
  – dmsetup status
§ /boot should NOT be on LVM (use a normal partition)

# dmsetup ls --tree
my_volgroup-LV2 (254:5)
 |-mpathc (254:2)
 |  |- (8:80)
 |  `- (8:16)
 `-mpathd (254:3)
    |- (8:112)
    `- (8:48)
my_volgroup-LV1 (254:4)
 |-mpathc (254:2)
 |  |- (8:80)
 |  `- (8:16)
 |-mpathb (254:1)
 |  |- (8:96)
 |  `- (8:32)
 `-mpatha (254:0)
    |- (8:64)
    `- (8:0)

Network Troubleshooting

§ Network options on Linux on Z
§ QETH
§ z/VM VSWITCH

24 Network options on Linux on Z

[Figure: network options on Linux on Z — Linux guests under KVM (virtio NICs via ovswitch and bonded OSAs), under z/VM (NICs on VSWITCH or GuestLAN), and in LPARs (including z/OS) with direct OSA Express adapters, plus HiperSockets (IQD), SMC-D, and SMC-R over RoCE Express adapters]

25 qeth device driver

§ Supports
  – OSA Express
  – HiperSockets
  – GuestLAN
  – VSWITCH
§ Primary network driver for Linux on Z

26 Useful tools

§ General tools
  – ping
  – ip -s link
  – ss
  – traceroute
  – tcpdump
  – ethtool
  – net-tools-deprecated (ifconfig, netstat, route, …) → replaced by iproute2 commands (SLES 15)

§ Linux on Z specific tools
  – lscss – list channel subsystem devices
  – lsqeth – list qeth-based network devices
  – qetharp – query and modify ARP data (layer 3 devices only)
  – qethqoat – query the OSA address table
  – znetconf – list and configure network devices (on the fly)
  – lszdev / chzdev – display or configure Z-specific devices (persistent)

Commands example

# lscss
Device   Subchan.  DevType CU Type Use  PIM PAM POM  CHPIDs
----------------------------------------------------------------------
0.0.b130 0.0.0019  1732/01 1731/01 yes  80  80  ff   8a000000 00000000
0.0.b131 0.0.001a  1732/01 1731/01 yes  80  80  ff   8a000000 00000000
0.0.b132 0.0.001b  1732/01 1731/01 yes  80  80  ff   8a000000 00000000
0.0.b0e0 0.0.001c  1732/01 1731/01 yes  80  80  ff   89000000 00000000
0.0.b0e1 0.0.001d  1732/01 1731/01 yes  80  80  ff   89000000 00000000
0.0.b0e2 0.0.001e  1732/01 1731/01 yes  80  80  ff   89000000 00000000

# lsqeth
Device name             : eth0
    card_type               : OSD_1000
    cdev0                   : 0.0.b130
    cdev1                   : 0.0.b131
    cdev2                   : 0.0.b132
    chpid                   : 8A
    online                  : 1
    portname                : no portname required
    portno                  : 0
    state                   : UP (LAN ONLINE)
    priority_queueing       : always queue 0
    buffer_count            : 64
    layer2                  : 1
    isolation               : none
    bridge_role             : none
    bridge_state            : inactive
    bridge_hostnotify       : 0
    bridge_reflect_promisc  : none
    switch_attrs            : unknown

# ip -s link show dev eth0
16: eth0: mtu 1500 qdisc pfifo_fast master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 02:00:00:d4:29:02 brd ff:ff:ff:ff:ff:ff
    RX: bytes    packets  errors  dropped  overrun  mcast
    11184566     40204    0       0        0        24
    TX: bytes    packets  errors  dropped  carrier  collsns
    ...

# lszdev
TYPE        ID                                              ON   PERS  NAMES
dasd-eckd   0.0.eaae                                        yes  no    dasda
dasd-eckd   0.0.eaaf                                        yes  no    dasdb
dasd-eckd   0.0.eab0                                        yes  no    dasdc
zfcp-host   0.0.191c                                        yes  no
zfcp-host   0.0.195c                                        yes  no
zfcp-lun    0.0.191c:0x50050763070845e3:0x4082402a00000000  yes  no    sda sg0
zfcp-lun    0.0.191c:0x50050763070845e3:0x4083402a00000000  yes  no    sdb sg1
zfcp-lun    0.0.191c:0x50050763070845e3:0x4084402a00000000  yes  no    sdc sg2
zfcp-lun    0.0.191c:0x50050763070845e3:0x4085402a00000000  yes  no    sdd sg3
zfcp-lun    0.0.195c:0x50050763071845e3:0x4082402a00000000  yes  no    sde sg4
zfcp-lun    0.0.195c:0x50050763071845e3:0x4083402a00000000  yes  no    sdf sg5
zfcp-lun    0.0.195c:0x50050763071845e3:0x4084402a00000000  yes  no    sdg sg6
zfcp-lun    0.0.195c:0x50050763071845e3:0x4085402a00000000  yes  no    sdh sg7
qeth        0.0.b0e0:0.0.b0e1:0.0.b0e2                      yes  no    eth4
qeth        0.0.b130:0.0.b131:0.0.b132                      yes  no    eth0
qeth        0.0.b230:0.0.b231:0.0.b232                      yes  no    eth5
qeth        0.0.b2a0:0.0.b2a1:0.0.b2a2                      yes  no    eth6
qeth        0.0.bdf0:0.0.bdf1:0.0.bdf2                      yes  no    encbdf0
qeth        0.0.e030:0.0.e031:0.0.e032                      yes  no    eth1
generic-ccw 0.0.0009                                        yes  no

# znetconf -c
Device IDs                  Type     Card Type      CHPID  Drv.  Name     State
-------------------------------------------------------------------------------
0.0.b130,0.0.b131,0.0.b132  1731/01  OSD_1000       8A     qeth  eth0     online
0.0.bdf0,0.0.bdf1,0.0.bdf2  1731/01  Virt.NIC QDIO  02     qeth  encbdf0  online

# znetconf -r b130
Remove network device 0.0.b130 (0.0.b130,0.0.b131,0.0.b132)?
Warning: this may affect network connectivity!
Do you want to continue (y/n)?y
Successfully removed device 0.0.b130 (eth0)

28 Configuration files and s390dbf

§ Network configuration files
  – /etc/sysconfig/network-scripts/ifcfg-* (RHEL)
  – /etc/sysconfig/network/ifcfg-* (SLES)
  – /etc/netplan/*.yaml (Ubuntu)
§ s390dbf
  – /sys/kernel/debug/s390dbf/qdio_/
  – /sys/kernel/debug/s390dbf/qeth_msg/
  – /sys/kernel/debug/s390dbf/qeth_setup/

29 z/VM VSWITCH

§ Useful commands
  – QUERY VIRTUAL NIC – query virtual NICs
  – QUERY VIRTUAL OSA – display the status of virtual OSAs
  – QUERY VMLAN – determine the status of guest LAN activity
  – QUERY VSWITCH DETAILS – show the details of a VSWITCH

z/VM VSWITCH

# vmcp query vswitch details
VSWITCH SYSTEM VSW15G Type: QDIO Connected: 5 Maxconn: INFINITE
  PERSISTENT RESTRICTED ETHERNET Accounting: OFF
  USERBASED LOCAL
  VLAN Unaware
  ETHERNET: Layer2 MAC address: 02-46-0F-00-00-01 MAC Protection: Unspecified
  NOROUTER: Layer3 IPTimeout: 5 QueueStorage: 8
  Isolation Status: OFF VEPA Status: OFF
Uplink Port: State: Ready
  PMTUD setting: EXTERNAL PMTUD value: 9000 Trace Pages: 8
  RDEV: BD03.P00 VDEV: 0600 Controller: DTCVSW2 ACTIVE
  Adapter ID: 3906000DA1E7.01B0
Uplink Port Connection:
  RX Packets: 30138781 Discarded: 0 Errors: 0
  TX Packets: 204692591 Discarded: 0 Errors: 0
  RX Bytes: 12276525748 TX Bytes: 306684926983
  Device: 0600 Unit: 000 Role: DATA Port: 2049
  Partner Switch Capabilities: No_Reflective_Relay
Adapter Connections: Connected: 5
  Adapter Owner: TEST0008 NIC: BDF0.P00 Name: HYD1G1 Type: QDIO
  RX Packets: 4585514 Discarded: 0 Errors: 0
  TX Packets: 4654 Discarded: 0 Errors: 0
  RX Bytes: 309679608 TX Bytes: 306677
  Device: BDF2 Unit: 002 Role: DATA Port: 2178
  Options: Ethernet Broadcast
  Unicast MAC Addresses:
    02-46-0F-00-00-08 IP: 172.18.73.8
  Multicast MAC Addresses:
    01-00-5E-00-00-01
    33-33-00-00-00-01
    33-33-FF-00-00-08

(The PMTUD value is the acceptable maximum transmission size (MTU); the adapter connection section shows the virtual NIC of the guest.)

Performance Troubleshooting

§ Tuning hints and tips
§ Network tuning

32 Tuning hints and tips

For more details refer to IBM Knowledge Center: Performance tuning hints and tips.

Network tuning

§ General tuning parameters
  – buffer_count = 128 (default 64) – use chzdev to configure it persistently
  – MTU size 8992 if the application is able to send chunks > 1460 bytes
  – Set a larger device transmission queue
# ip link set <interface> txqueuelen 3000
  – Enable RPS (Receive Packet Steering)
# echo ff > /sys/class/net/eth0/queues/rx-0/rps_cpus
§ OSA recommendations
  – TCP Segmentation Offload (TSO)
  – Outbound (TX) checksumming
  – Scatter-Gather (SG)
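The value written to rps_cpus is a hexadecimal CPU bit mask. The helper below is made up for illustration; it shows how the ff in the example corresponds to CPUs 0–7:

```shell
# rps_mask is a hypothetical helper: it builds the hex bit mask
# covering CPUs 0..n-1, in the format expected by .../rx-0/rps_cpus
rps_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

rps_mask 8    # CPUs 0-7 -> ff
rps_mask 4    # CPUs 0-3 -> f
```

Usage sketch (needs root): echo "$(rps_mask 8)" > /sys/class/net/eth0/queues/rx-0/rps_cpus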

# ethtool -K NIC_NAME tx on sg on tso on

Customer Cases

§ RHEL 8 installation
§ Channel bonding failover
§ EP11 card missing master key
§ Low network performance

Customer case: RHEL 8 installation

§ Problem: the network device cannot be recognized during installation
§ Analysis: network configuration in the parmfile

ip=100.125.81.112::100.125.81.254:255.255.255.0:lxabc001:enccw0.0.0600:none

§ Look for the kernel message in journalctl

kernel: qeth 0.0.0600: MAC address 02:14:00:00:00:23 successfully registered on device eth0
kernel: qeth 0.0.0600: Device is a Virtual NIC QDIO card (level: V642) with link type Virt.NIC QDIO.
kernel: qeth 0.0.0600 enc600: renamed from eth0
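When many devices are involved, the final name can be extracted from such messages mechanically. A sketch (the sample line mirrors the journalctl excerpt above):

```shell
# Pull the renamed interface out of a qeth "renamed from" kernel message
journal_excerpt='kernel: qeth 0.0.0600 enc600: renamed from eth0'
printf '%s\n' "$journal_excerpt" \
    | sed -n 's/.*qeth [^ ]* \([^ :]*\): renamed from.*/\1/p'   # -> enc600
```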

Customer case: RHEL 8 installation (cont'd)

§ Reason: predictable network device names – prefix + device type + bus-ID, and it omits leading 0s!
  – en = Ethernet, c = ccw, bus-ID 0.0.0600 → enc600

§ Solution: Adapt the parmfile configuration to the predictable network device name

ip=100.125.81.112::100.125.81.254:255.255.255.0:lxabc001:enc600:none
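The naming rule can be sketched in a few lines of shell (busid_to_ifname is a made-up helper name, not a distributed tool): take the device number from the bus-ID, strip its leading zeros, and prepend enc.

```shell
# busid_to_ifname is a hypothetical helper illustrating the rule:
# predictable name = "en" + "c" (ccw bus) + device number without leading zeros
busid_to_ifname() {
    devno="${1##*.}"                       # 0.0.0600 -> 0600
    devno="${devno#"${devno%%[!0]*}"}"     # strip leading zeros -> 600
    echo "enc${devno:-0}"
}

busid_to_ifname 0.0.0600    # -> enc600
busid_to_ifname 0.0.b130    # -> encb130
```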

37 Customer case: Channel bonding failover

§ Problem: the customer set up a channel bond in mode balance-xor on RHEL 8.1 and tested failover using a NetworkManager command
§ Analysis: the nmcli connection down <NIC> command de-enslaves the NIC from the bond, so failover to the other slave does not work

# nmcli connection down eth1
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 1000
Down Delay (ms): 1000

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 02:a2:0f:00:00:21
Slave queue ID: 0

Customer case: Channel bonding failover (cont'd)

§ Solution: use the ip link set dev <NIC> down command to simulate a link failure instead

# ip link set dev eth1 down
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth2
MII Status: up
MII Polling Interval (ms): 1000
Up Delay (ms): 1000
Down Delay (ms): 1000

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 02:a2:0f:00:00:21
Slave queue ID: 0

Slave Interface: eth1
MII Status: down
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 02:a2:0f:00:00:25
Slave queue ID: 0

Customer case: EP11 card missing master key

§ Problem: the customer set up two active EP11 domains on an Ubuntu system. When executing the pkcsconf command to check the token, the master key is recognized only every second time.

§ Analysis: the syslog shows the error message: no master key set.


§ Solution: set both EP11 domains with the same master key

Customer case: Low network performance

§ Problem: the customer had streaming data transfer between two Linux on Z servers (SLES12 SP3) located 500 km apart. The channel bandwidth is 650 Mbit/s, but a single connection loads it at only 250 Mbit/s on average.
§ Analysis: network performance measurement and analysis with iperf3
§ Solution: system upgrade to SLES12 SP4 with network tuning
  – TCP Segmentation Offload (TSO)

# ethtool -K NIC_NAME tx on sg on tso on
  – Improved throughput to 390 Mbit/s

§ TCP congestion control: BBR (not a general recommendation; depends on workload, better throughput with large queue sizes)
net.ipv4.tcp_congestion_control = bbr
  – Improved throughput to ~500 Mbit/s
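To keep the setting across reboots, a sysctl drop-in can be used. This is a sketch (the file name is arbitrary); pairing BBR with the fq qdisc is a common recommendation on older kernels, where BBR depends on fq for pacing:

```
# /etc/sysctl.d/90-bbr.conf — example only
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
```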

Sa Liu
Certified Technical Specialist
Linux on IBM Z and LinuxONE Service & Support
IBM Systems
Schoenaicher Strasse 220, D-71032 Boeblingen
Mail: Postfach 1380, D-71003 Boeblingen
Phone: (+49)-7031-16-3104
[email protected]

Questions?

References

§ Troubleshooting Guide
https://www.ibm.com/support/knowledgecenter/linuxonibm/liaaf/lnz_r_svcnt.html
§ Using the Dump Tools
https://www.ibm.com/support/knowledgecenter/linuxonibm/liaaf/dumptools_container.html
§ Virtual Server Management
https://www.ibm.com/support/knowledgecenter/en/linuxonibm/liaaf/lnz_r_va.html
§ How to use FC-attached SCSI devices with Linux on System z
https://www.ibm.com/support/knowledgecenter/linuxonibm/liaaf/lnz_r_ts.html
§ SCSI over Fibre Channel – Best Practices
http://public.dhe.ibm.com/software/dw/linux390/lvc/zFCP_Best_Practices-BB-Webcast_201805.pdf?cm_sp=dw-dwtv-_-linuxonz-_-presentation-PDF
