Application Centric Networking Troubleshooting 101 Install & Implementation of ACI
BRKACI-2333
Mike Frase

Agenda
• ACI
• Technical Tools
• Fabric Component Access
• Discovery and Boot-up
• Management & Tunnel Addressing
• Faults/Events
• Managed Objects
• Health Scores
• Upgrades
BRKACI-2333 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Public

ACI – Level Set for Troubleshooting: Next Generation Network Engineer Skillset
(skills word cloud) Programming: Bash, Python, JSON, XML, REST, Git, Chef, Puppet, northbound/southbound APIs, OpFlex, object model. Networking: routing protocols, DC switching, NXOS, N9K, VXLAN, NVGRE, multicast, VRF, L4-L7, AVS, Open vSwitch, NFV, atomic counters. Virtualization/OS/Applications: Linux, Windows, VMware, Hyper-V, KVM, containers, databases, big data, Sysinternals, performance. Orchestration/Automation: ACI, UCSD, OpenStack.
Tools (diagram): web browser GUI, CLI access to the APIC controllers and spine & leaf switches, REST API, and Python SDK, all reaching the APIC cluster either in-band or out-of-band (OOB).
• The APIC cluster is the distributed controller for managing all the policies and running state of the ACI fabric, and for interfacing with VM controllers and L4-L7 services devices
• It is a highly redundant cluster of Linux-based servers connected to leaf switches on the infra network
• It is not in the control plane or the datapath
• Application networking needs are expressed to the APIC as application-level policies through the REST interface
• The policies are automatically pushed and applied to the network infrastructure via embedded policy elements
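Since policies are expressed through the REST interface, the login and query mechanics can be sketched in a few lines. This is an illustrative sketch only: the host names in the tests are placeholders, and any HTTP client can be used to POST/GET the URLs it builds. The /api/aaaLogin.json endpoint and the aaaUser body are the standard APIC login form.

```python
import json

def login_url(apic_host):
    """URL an HTTP client would POST credentials to (standard APIC login endpoint)."""
    return "https://{}/api/aaaLogin.json".format(apic_host)

def login_body(user, pwd):
    """JSON body for the aaaUser login POST."""
    return json.dumps({"aaaUser": {"attributes": {"name": user, "pwd": pwd}}})

def class_query_url(apic_host, mo_class):
    """Read-side equivalent: query every instance of a managed-object class."""
    return "https://{}/api/class/{}.json".format(apic_host, mo_class)
```

A successful login returns a session token that subsequent GETs and POSTs carry as a cookie.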
Technical Tools
Typical tools for ACI
Depending on the level of troubleshooting and automation you wish to learn and engage in on ACI:
• PuTTY or a terminal application with the ability to SSH/Telnet
• Java Runtime Environment 1.6.0_45 for CIMC access
• SCP/SFTP/TFTP server running locally on your laptop (Windows: PumpKIN, FileZilla)
• SCP/SFTP/TFTP client utility (Windows: WinSCP)
• Google Chrome in Incognito mode
• POSTMAN app in Chrome
• Text editor with JSON/XML editing features (indentation, formatting, validation, etc.; examples: Sublime Text with the Indent XML/Pretty JSON plugins, Notepad++)
• Wireshark
• Python 2.7 (an IDE/editor is highly preferred: e.g., PyCharm, Eclipse, Sublime Text)
• ACI Cobra SDK (matching the version of APIC installed)
• vSphere Client for v5.0, 5.1, 5.5
• vSphere Web Plugin
Deployment options – Learning to use a REST client of choice
Postman (a Google Chrome app) is the most popular and easiest to navigate.
API Inspector
• Captures the API calls (GET, DELETE, POST) made from the GUI to the APIC
• To access it, click the “welcome, admin” link at the top right of the GUI
• Ability to monitor and track all communications from APIC GUI to APIC
• You will see regular “GET” messages, each time the GUI refreshes information
• All regular API calls will be filtered as DEBUG
• Use cases shown in all beta test plans
• Each action you perform in the GUI will result in a POST API call which can be captured with the API Inspector
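Because every GUI action surfaces as a POST in the API Inspector, captured calls can be replayed from a script. The sketch below parses an Inspector-style line into method, URL, and payload; the exact log format in SAMPLE is an assumption modeled on typical Inspector output, not a documented contract, and the tenant name is hypothetical.

```python
import json
import re

# Assumed Inspector line format: "method: <VERB> url: <URL> payload{<json>}"
SAMPLE = ('method: POST url: https://apic/api/node/mo/uni/tn-Demo.json '
          'payload{"fvTenant":{"attributes":{"name":"Demo"}}}')

def parse_inspector_line(line):
    """Split one captured line into its method, URL, and decoded JSON payload."""
    m = re.match(r'method: (\w+) url: (\S+) payload(\{.*\})?', line)
    if not m:
        return None
    method, url, payload = m.groups()
    return {"method": method, "url": url,
            "payload": json.loads(payload) if payload else None}
```

The resulting dict can be handed straight to an HTTP client to replay the call against the APIC.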
Uses with Visore
Visore is Italian for “viewer.”
• APIC Object Store Browser
• Tool for verifying XML
• Lets you browse the XML schema as if it were HTML
• Like a real-time electronic version of the XML API book
• Used extensively in troubleshooting and verifying operations
UI Tools
Health • Faults • Audits • Events • Statistics • Call Home • Syslogs • SNMP
System Component Access: CLI Available at the Switch
AAA via TACACS+, RADIUS, and LDAP is supported when logging into the switch CLI console. Configuration mode is not supported at the switch console. There are two scenarios where administrators would log into the switch console:
• From the APIC, an admin can remotely log in to the switch console
• Log in directly via the serial console port on the switch front panel, or SSH to the management IP via out-of-band or in-band
For the majority of use cases, admins should utilize the APIC.
Switch Access to CLI via vsh
Directory Structure on switch
fab2-spine2# ls aci bootflash data dev isan lib mit proc sys usb var bin controller debug etc lc logflash mnt sbin tmp usr volatile
Enter the NX-OS shell:

fab2-spine2# vsh
Cisco NX-OS Software
Copyright (c) 2002-2020, Cisco Systems, Inc. All rights reserved.
NX-OS/Titanium software ("NX-OS/Titanium Software") and related documentation, files or other reference materials ("Documentation") are the proprietary property and confidential information of Cisco Systems, Inc. ("Cisco") and are protected, without limitation, pursuant to United States and International copyright and trademark laws in the applicable jurisdiction which provide civil and criminal penalties for copying or distribution without Cisco's authorization.
It is also possible to execute VSH commands directly from iShell, using the syntax vsh -c "<command>":

fab2_spine1# vsh -c "show version module 21"
ModNo  Image Type  SW Version  SW Interim Version  BIOS Version
21     SLC         11.0(1b)    11.0(1b)
Switch CLI: Standard operations for the Cisco networking types
• show commands
• standard debugs
• attach mod commands
• Access to logs and supportability commands

Example:

leaf101# sh lldp neighbors
Capability codes: (R) Router, (B) Bridge, (T) Telephone, (C) DOCSIS Cable Device
                  (W) WLAN Access Point, (P) Repeater, (S) Station, (O) Other
Device ID        Local Intf  Hold-time  Capability  Port ID
apic1            Eth1/1      120                    90e2.ba4b.fad4
apic3            Eth1/3      120                    90e2.ba4d.0350
Services-UCS-A   Eth1/7      120        B           Eth1/2
spine201         Eth1/49     120        BR          Eth7/1
spine202         Eth1/50     120        BR          Eth8/1
Total entries displayed: 5
APIC CLI
• The APIC CLI is a set of tools that support an interactive and programmatic interface to the APIC
• Bash is the standard shell on the controller and provides the interactive command line interface
• The APIC information model (MIT) is presented to the user as a standard Unix file system
• Programmability is supported through standard scripting languages like Bash scripts, Perl, and Python against the file system
• Bash completion hooks for APIC commands support command completion and help
• Unix-based security is mapped to APIC RBAC
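Because the MIT is exposed as an ordinary file system, plain directory-walking code can script against it. The sketch below finds every `summary` file under a root directory; on a real APIC that root would be /mit (as in the /mit/sys/summary checks later in this deck), but the helper name and approach are ours, not an APIC utility.

```python
import os

def find_summaries(root):
    """Yield the path of every 'summary' file under `root` (e.g. /mit on an APIC)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if "summary" in filenames:
            yield os.path.join(dirpath, "summary")
```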
APIC SSH access (iShell) – 3 uses:
• Direct configuration
• Shell scripts
• Python API
You enter at the directory /home/admin.
SSH to the APIC (admin/password):
admin@tsi-apic1:~> ls
aci debug mit
admin@tsi-apic1:~> cd aci
admin@tsi-apic1:aci> ls
admin fabric l4-l7-services system tenants vm-networking
admin@tsi-apic1:~> cd debug
admin@tsi-apic1:debug> ls
leaf1 leaf2 spine2 tsi-apic1
admin@tsi-apic1:~> cd mit
admin@tsi-apic1:mit> ls
comp dbgs expcont fwrepo topology uni

You start to see the same structure as the APIC GUI.
APIC iShell CLI
For help, press ESC ESC.
admin@tsi-apic1:admin>
config       Show running configuration
controller   Controller configuration
diagnostics  Display diagnostics tests for equipment groups
dn           Display the current dn
faults       Display faults
firmware     Add/List/Upgrade firmware
health       Display health info
loglevel     Read/Write loglevels
man          Show man page help
moconfig     Configuration commands
mocreate     Create an Mo
modelete     Delete an Mo
mofind       MO find
moprint      MO display format
moset        Set mo properties
moshow       Show mo for the given rn path
mostats      Show statistics command
passwd       Change user password
records      Display records
reload       Reload a node
services     l4-l7 services
svcping      Ping a service device
techsupport  Tech Support collection
trafficmap   Display the traffic map between two fabric nodes
version      Display version info
Monitoring Interfaces
• Event Subscription – Mode: subscriber-based; Filtering: class, subtree (dynamic); Destination: implicit; Protocol: HTTP + WebSocket; Format: XML/JSON
• Syslog – Mode: policy-based; Filtering: class (fault/event instances); Destination: syslog server host name / IP address; Protocol: Syslog; Format: protocol-defined
• Call Home – Mode: policy-based; Filtering: class (fault instances); Destination: Call Home gateway hostname / IP address; Protocol: SMTP; Format: XML/JSON
• SNMP (switch only) – Mode: policy-based; Filtering: class (counters); Destination: SNMP collector host name / IP address; Protocol: SNMP; Format: protocol-defined
Discovery and Boot-up: ACI Fabric Initialization
• The ACI Fabric supports discovery, boot, inventory, and systems maintenance processes via the APIC
• Fabric discovery is done automatically through LLDP and progresses as the administrator registers switches to join the fabric. Once a switch is registered, its LLDP neighbors become visible for the admin to approve to join the fabric.
Discovery Flow
The steps are:
1) LLDP neighbor discovered
2) Tunnel End Point (TEP) IP address assigned to the node
3) Node software upgraded
4) Policy Element Intra-Fabric Messaging (IFM) setup
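The four-step flow above can be modeled as a tiny state sequence, which is handy when scripting checks of where a stuck node sits in discovery. This is purely an illustrative sketch, not APIC code; the step identifiers are ours.

```python
# Ordered discovery steps from the slide above, as machine-friendly names.
DISCOVERY_STEPS = [
    "lldp-neighbor-discovered",
    "tep-ip-assigned",
    "software-upgraded",
    "ifm-established",
]

def next_step(current):
    """Return the step that follows `current`, or None once discovery is complete."""
    i = DISCOVERY_STEPS.index(current)
    return DISCOVERY_STEPS[i + 1] if i + 1 < len(DISCOVERY_STEPS) else None
```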
Spine Bringup

(diagram: spines at 10.0.0.4 and 10.0.0.5, APIC at 10.0.0.1)

(1) Switch comes up in booting state
– Doesn't relay/switch packets
– Sends LLDP and DHCP response packets to the Sup
– All ports in L3 mode
– Initial factory config: LLDP enabled on all ports
– Only required features enabled
LLDP TLVs carry: Port type (Edge-port / Core-port), System-id, and Switch-type (e.g., T1 Spine, Leaf)

(2) Detects ports connected to other switches/APIC (via LLDP)
– Link local IP assignment
– Starts DHCP on LLDP-validated ports

(3) Sends DHCP request
– DHCP Discovery carries: System-id, Switch-type = T1 Spine, POD ID = 1
– DHCP Relay adds: GIADDR = 10.1.1.2, Relay Agent Info

(4) APIC responds with the TFTP server address and install script location
– DHCP Response: Client IP 10.1.1.4, TFTP Server IP 10.1.1.1, Gateway 10.1.1.2, Script Location: xxxxx, Relay Agent Info
– Spine installs a host route for the TFTP server pointing to the neighbor

(5) Switch downloads and executes the install script. The install script:
– Downloads and installs the switch image
– Downloads the Infrastructure Policy (controller IP, overlay VLAN, wiring plan)

(6) Switch reboots
LLDP
Verification: from iShell on the inactive node, issuing the command show lldp neighbors will verify the status of and information about the connected devices.
leaf101# show lldp neighbors
Capability codes: (R) Router, (B) Bridge, (T) Telephone, (C) DOCSIS Cable Device
                  (W) WLAN Access Point, (P) Repeater, (S) Station, (O) Other
Device ID   Local Intf  Hold-time  Capability  Port ID
apic1       Eth1/1      120                    90:e2:ba:4b:fa:d4
If the APIC is not present and no neighbors are shown, check to make sure that the LLDP process is running on the node.
leaf101# show processes |grep lldp 5619 S 41a497e7 1 - lldp
If the APIC is not present and LLDP is active, confirm the cable connection of the device to the fabric or to the APIC.
DHCP
The Tunnel End Point (TEP) addresses used by the nodes in the fabric are allocated from a pool by the APIC. The default range is 10.0.0.0/16; it can be changed, and is set during the initial APIC configuration script.

By default, the APICs in the cluster use addresses starting at 10.0.0.1: apic1 will be 10.0.0.1, while apic2 and apic3 will be 10.0.0.2 and 10.0.0.3.

The address assigned to a switch node will not change unless the node is decommissioned from the fabric, so when a switch or APIC is reloaded, the same address should be assigned.
DHCP Verification

From shell access on the APIC, verify the allocated TEP addresses with acidiag fnvread, and verify that the node can be pinged from the APIC.

admin@apic1:~> acidiag fnvread
 ID   Name      Serial Number  IP Address    Role   State     LastUpdMsgId
 101  leaf101   SAL17267Z9U    10.0.0.4/24   leaf   active    0
 102  leaf102   SAL1733B948    10.0.0.5/24   leaf   inactive
 201  spine201  FGE173400AK    10.0.0.10/24  spine  active    0
 202  spine202  FGE17420181    10.0.0.11/24  spine  active    0

Total 4 nodes

admin@apic1:~> ping 10.0.0.4
PING 10.0.0.4 (10.0.0.4) 56(84) bytes of data.
64 bytes from 10.0.0.4: icmp_seq=1 ttl=54 time=0.207 ms
64 bytes from 10.0.0.4: icmp_seq=2 ttl=54 time=0.168 ms
64 bytes from 10.0.0.4: icmp_seq=3 ttl=54 time=0.146 ms
--- 10.0.0.4 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2734ms
rtt min/avg/max/mdev = 0.146/0.173/0.207/0.029 ms
admin@apic1:~>
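The acidiag fnvread table is easy to scrape when you need to check TEP state across many nodes. A hypothetical parser, with the column layout taken from the sample output above (the parser itself is ours, not an APIC tool):

```python
# Sample modeled on the acidiag fnvread output shown above.
SAMPLE = """\
 ID   Name      Serial Number  IP Address    Role   State     LastUpdMsgId
 101  leaf101   SAL17267Z9U    10.0.0.4/24   leaf   active    0
 102  leaf102   SAL1733B948    10.0.0.5/24   leaf   inactive
 201  spine201  FGE173400AK    10.0.0.10/24  spine  active    0
 202  spine202  FGE17420181    10.0.0.11/24  spine  active    0
"""

def parse_fnvread(text):
    """Turn the fnvread table into a list of node dicts (TEP without the mask)."""
    nodes = []
    for line in text.splitlines()[1:]:          # skip the header row
        cols = line.split()
        if len(cols) >= 6:
            nodes.append({"id": cols[0], "name": cols[1],
                          "tep": cols[3].split("/")[0],
                          "role": cols[4], "state": cols[5]})
    return nodes

def inactive_nodes(nodes):
    """Names of nodes whose State is anything other than 'active'."""
    return [n["name"] for n in nodes if n["state"] != "active"]
```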
DHCP – show commands
On the switch node:
• show dhcp internal info client: verify the client information is present
• show ip route vrf overlay-1: verify the infra routing information is present
• Check the state in /mit/sys/summary:
  – in-service: node has an IP address and is running the configured firmware
  – out-of-service: node does not have an IP address
  – invalid-ver: switch has detected that it is not running the software specified in the node configuration file
On the APIC shell:
• ps -ef | grep dhcpd: verify that the dhcpd process is running
• Check /var/log/dme/log/svc_ifc_appliancedirector.bin.log: verify fabric node information in the log
Node Boot File and Image Download
• In the DHCP Offer, the APIC passes the boot file information for the node. This is in the form fwrepo/boot/node-<node-id>.
Through shell access on the switch node, verify the state in /mit/sys/summary:
• in-service: node has an IP address and is running the configured firmware
• out-of-service: node does not have an IP address
• invalid-ver: switch has detected that it is not running the software specified in the node configuration file

If the node is running a different version than what has been configured for that node on the APIC, the node will stay in the invalid-ver state until it has completed the upgrade process.

If the node stays in invalid-ver, verify the configuration through the APIC GUI under Admin > Firmware: check the Fabric Node Firmware and the available software images in the Firmware Repository.
Using the Logs on the APIC
On the node, the example output in /tmp/logs/svc_ifc_policyelem.log shows the node retrieving the boot file:

5134||14-08-03 13:37:57.879-08:00||firmware||DBG4||||downloadUrl – fetching http://10.0.0.3:7777/fwrepo/boot/node-101 ||../dme/svc/policyelem/src/gen/ifc/beh/imp/./firmware/FileDownloader.cc||134
And then downloading the configured file and comparing against the running version:
4045||14-08-03 13:37:58.878-08:00||dhcp||DBG4||co=doer:0:0:0x605:1||RespBI::handleVersionDirective ver=any, fwName=http://10.0.0.3:7777/fwrepo/, runningVer=n9000-11.0(0.824) ||../dme/svc/policyelem/src/gen/ifc/beh/imp/./dhcp/RespBI.cc||278
4045||14-08-03 13:37:58.878-08:00||dhcp||INFO||co=doer:0:0:0x605:1||Running verion passes desired version check - bringing node into service||../dme/svc/policyelem/src/gen/ifc/beh/imp/./dhcp/RespBI.cc||300
Intra-Fabric Messaging (IFM): Policy Element (policyelem) Session between APIC and Node
The last step in the process is for the Policy Element IFM session to be established. The encrypted TCP session is initiated from the APIC to the node, which listens on TCP port 12183.
Verification: from the node shell, run netstat -a | grep 12183:
• Check if there is an active TCP session between the node and the APICs
• Confirm the node is listening on port 12183
leaf101# netstat -a |grep 12183
tcp  0  0  leaf101:12183  *:*          LISTEN
tcp  0  0  leaf101:12183  apic2:43371  ESTABLISHED
tcp  0  0  leaf101:12183  apic1:49862  ESTABLISHED
tcp  0  0  leaf101:12183  apic3:42332  ESTABLISHED
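The same listening-port check can be scripted from any host with IP reachability to the node's TEP. A minimal sketch, assuming only that the IFM port is TCP 12183 as stated above:

```python
import socket

IFM_PORT = 12183  # per the slide: the node listens here for the APIC's session

def ifm_port_open(host, port=IFM_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, ifm_port_open("10.0.0.4") would probe leaf101's TEP from the APIC shell.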
Confirming in the logs
If the node is not listening on port 12183, confirm:
• whether the state in /mit/sys/summary is invalid-ver; if so, the node is still in the previous process step
• whether the policy element process has crashed

If the node is listening on port 12183 but there are no established sessions, and IP connectivity between the node and APIC has been confirmed (the previous ping test), check the session information in /tmp/logs/svc_ifc_policyelem.log on the node.
3952||14-08-02 21:06:53.875-08:00||ifm||DBG4||co=ifm||incoming connection established from 10.0.0.1:52038||../dme/common/src/ifm/./ServerEventHandler.cc||42
3952||14-08-02 21:06:53.931-08:00||ifm||DBG4||co=ifm||openssl error during SSL_accept()||../dme/common/src/ifm/./IFMSSL.cc||185
3952||14-08-02 21:06:53.931-08:00||ifm||DBG4||co=ifm||openssl: error:14094415:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate expired ||../dme/common/src/ifm/./IFMSSL.cc||198
3952||14-08-02 21:06:53.931-08:00||ifm||DBG3||co=ifm||incoming connection to peer terminated (protocol error)||../dme/common/src/ifm/./Peer.cc||227

In this example output, the session does not establish because the certificate has expired.
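Since the failure above is an expired certificate, a quick sanity check is to compare a certificate's notAfter date against the current time. The sketch below implements only the date comparison, using the fixed "Jun  1 12:00:00 2014 GMT" format that Python's ssl module reports for peer certificates; actually fetching the cert from the node is out of scope here.

```python
from datetime import datetime, timezone

def cert_expired(not_after, now=None):
    """True if a notAfter string (ssl-module format, e.g. 'Jun  1 12:00:00 2014 GMT')
    is in the past relative to `now` (defaults to the current UTC time)."""
    exp = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z").replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now > exp
```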
Management and Tunnel Addressing
Fabric Management Routing & overlay-1
The ACI platform provides the administrator with both in-band and out-of-band (OOB) management. OOB management is configured on the front-panel ports and is also used for the first configuration steps on the APIC controllers.

While we want to use OOB management for certain tasks, the fabric itself will always prefer an in-band destination over the OOB interface.

It is important to remember that the in-band management of the fabric is an overlay inside the fabric. By definition, since it is an overlay, it is also a tenant of the fabric, one of the three default tenants (common, infra, and mgmt).
Infrastructure: overlay-1
(diagram: spine/leaf fabric with vPC pairs, FEX, APIC, servers, a blade switch, DCI to a WAN or brownfield network, and AVS VTEPs)
• Loopback addresses
  – PTEP – Physical Tunnel End Point, assigned to all the nodes (loopback0 / acidiag fnvread output)
  – FTEP – Fabric Tunnel End Point, one for the entire fabric (pervasive, leafs – loopback1023)
  – Proxy-TEP – Proxy Tunnel End Point, assigned to a set of spines (loopbacks 1 & 3)
• North-bound infrastructure
  – Includes leaves and spines
  – L3 sub-interfaces are used between leaves and spines
  – IS-IS runs on the sub-interfaces to maintain infra reachability
  – Council of Oracles Protocol (COOP) runs on the PTEP loopback to sync the end-point (EP) database
  – MP-BGP runs on the PTEP loopback to sync WAN routes
  – VXLAN tunnels to PTEPs of other leaves and to spine proxy TEPs
• South-bound infrastructure
  – Includes leaves, Virtual TEPs (VTEPs), and APICs
  – Leaves and the VTEPs/APICs are assumed to be L2 adjacent
  – H/W learning or ARP is used to discover reachability to these nodes
  – VXLAN tunnels to VTEPs
How IS-IS is leveraged
• IS-IS is used within the infrastructure VRF (overlay-1) in order to:
  – Exchange infra routes among all spines and leafs
  – Dynamically exchange TEP information
  – Allow graceful introduction of a spine or a vPC peer upon reboot
• View IS-IS information with show isis
• View IS-IS dynamic TEPs with show isis dteps vrf overlay-1
• View the infra VRF routes with show ip route vrf overlay-1
vPC VIP TEP and Address Advertisement
• vPC peers inside a fabric advertise a virtual TEP IP address to which all end points behind vPC ports are attached.
• This way, any remote leaf can simply send a packet destined to a vPC end point to the outer virtual TEP IP address; the packet will be load-balanced to either of the vPC peer leafs and forwarded out to the end host.
• The VIP address is configured on a loopback interface (different from the physical TEP loopback interface, i.e., loopback0).
Leaf# show system internal epm vpc
Tunnel Types based on Destination Type

• Fabric tunnels: these run between the fabric elements (ToRs/vPCs and spines) and use an enhanced version of VXLAN tunnels. There are two sub-categories:
  – ToR/vPC–ToR/vPC tunnels: used during normal operation, where the ingress ToR tunnels traffic to the egress ToR/vPC.
  – ToR–spine tunnels: required when an ingress ToR cannot identify the egress ToR for a destination because destination-to-egress-ToR learning has not yet occurred. In this case the packet is tunneled to a spine, which in turn re-tunnels it to the appropriate egress ToR. The destination of these tunnels originating at the ToRs is the spines' anycast IP address. Each spine also creates tunnels to the ToRs/vPCs for re-tunneling.
• Tenant tunnels: these run between ToRs and their locally attached hypervisors. These tunnels can be standard VXLAN, enhanced VXLAN, or NVGRE tunnels.
Address Allocation
• The configured address range is sub-divided into multiple different DHCP pools for use within the fabric.
• When changing the default TEP address range, the following formula helps determine the number of pool addresses needed:

  N * (P * 32) + 96

  where N = the number of APICs in the cluster and P = the number of non-reserved DHCP pools per APIC.

• For example, an APIC cluster of 3 with all DHCP pools in use requires 3 * (4 * 32) + 96 = 480 addresses, which can be covered by a /23 subnet.
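The formula above can be sketched directly, including the rounding up to the smallest covering prefix (the /23 in the example):

```python
import math

def tep_addresses_needed(apics, pools_per_apic=4):
    """N * (P * 32) + 96, per the sizing formula above."""
    return apics * (pools_per_apic * 32) + 96

def smallest_prefix(addresses):
    """Smallest IPv4 prefix length whose block holds `addresses` addresses."""
    return 32 - math.ceil(math.log2(addresses))
```

With 3 APICs and all 4 pools in use this yields 480 addresses, which fits in a /23 block of 512.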
Understanding the Internal Network
Every ACI system ships with a set of default tenants; one of them is the mgmt tenant. The admin then creates a bridge domain that will be used for the in-band management domain.
(diagram: the TEP IP address range is given at install; overlay-1 is also created at install; Spine1 and Leaf1–Leaf4 shown)
Leaf1# show ip route vrf all
Internal In-band Network

An internal subnet is used between the different members of that group for in-band management, here using the address space 10.1.22.0/24, which is a valid routable subnet in the data center. In ACI you always have to create an internal network for in-band management; you can't just extend L2 into the fabric for this.

An L3 external network is needed to reach the internal switches. Without an L3 external configured, the fabric in-band addresses are not accessible from networks not directly connected to the fabric.
Key Points of the Management Interface
When both in-band and out-of-band management are configured, the APIC uses the following forwarding logic:
• Packets that come in an interface, go out that same interface
• Packets sourced from the APIC, destined to a directly connected network, go out the directly connected interface
• Packets sourced from the APIC, destined to a remote network, prefer in-band, followed by out-of-band
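The three rules can be expressed as a toy decision function, useful for reasoning about which interface a given APIC packet will take. The packet representation and interface names are our own simplifications, not APIC internals:

```python
def apic_egress(packet):
    """Apply the three forwarding rules above.
    packet keys: 'in_iface' (None if locally sourced),
                 'dest_directly_connected_via' (iface name or None),
                 optional 'inband_up' (assume up if absent)."""
    if packet["in_iface"] is not None:
        return packet["in_iface"]                     # rule 1: reply out the same interface
    if packet["dest_directly_connected_via"]:
        return packet["dest_directly_connected_via"]  # rule 2: directly connected network
    # rule 3: remote network -> prefer in-band, fall back to out-of-band
    return "inband" if packet.get("inband_up", True) else "oob"
```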
APIC OOB Management

(diagram: APIC OOB addresses 192.168.1.10, 192.168.1.20, 192.168.1.30; switch management addresses 192.168.1.4 through 192.168.1.11; customer routed network with host 172.1.1.10)
Out-of-Band Interface Verification
Out-Of-Band Management
Out-of-band (OOB) management is accomplished through a dedicated physical interface on the APIC and fabric nodes (switches). The initial APIC setup script allows you to configure the OOB IP address:
Out-of-band management configuration ... Enter the IP address for out-of-band management: 10.1.22.10/24 Enter the IP address of the default gateway [None]: 10.1.22.1 Enter the interface speed/duplex mode [auto]:
On the APIC, the OOB configuration creates an interface called oobmgmt:
admin@fab2_apic1:~> ip add show dev oobmgmt 115: oobmgmt:
Fabric Switch Management Interface
Once the fabric is initialized and discovered, you can configure the OOB addresses for the fabric nodes (switches) through the object model interfaces (GUI, API, CLI). The step-by-step configuration is available in the ACI Getting Started Guide on CCO.
On the fabric nodes (switches), the OOB configuration is applied to interface eth0 (aka mgmt0):
fab2_leaf1# ip add show dev eth0 2: eth0:
APIC In-Band Management
(diagram: in-band management VLAN/VXLAN; APIC in-band addresses 10.1.22.1, 10.1.22.2, 10.1.22.3; switch in-band addresses 10.1.22.4 through 10.1.22.11; customer network)
Outputs to Verify Operations
Regardless of whether you are using L2 or L3, the encapsulation VLAN used for the in-band EPG is used to create a sub-interface on the APIC, named in the format bond0.<vlan-id>.
On APIC
admin@fab2_apic1:~> ip add show bond0.10 116: bond0.10@bond0:
Faults
What is a fault?
• Faults, events, and audit logs are essential tools for monitoring the administrative and operational state of an ACI fabric, as well as for troubleshooting current and past issues
• Definition of a fault (ISO/CD 10303-226): "An abnormal condition or defect at the component, equipment, or sub-system level which may lead to a failure"
  – As it is a condition, it has a lifecycle: it occurs, persists for some time, and possibly disappears
  – As it is abnormal and may lead to failure, it requires user attention
  – It refers to a component, equipment, or subsystem
• Fault indications are meant for user consumption and as such should be understandable and ideally actionable
Examples
Condition → User Action
• Operational (example: link down) → Check link, SFP, etc.
• Physical resource unavailable (example: not enough fans) → Provide missing resource
• HW malfunction (example: memory errors, CRC errors, component failures, spurious resets, etc.) → Reset component, run diags, call Cisco TAC
• Environmental (example: board temperature too high) → Clear air flow or shut down the overheated component
• Inconsistent or incomplete configuration (example: ports in a port-channel have different config) → Correct the configuration
• Failed policy deployment, clustering issues → Debug connectivity issues
• Logical resource unavailable (example: ID, address, ...) → Provide missing resource
Faults in ACI

• A fault is a Managed Object (MO) contained in the M.I.T.
• It is a child of the affected MO (for example, chassis-1 > card-1 > port-1 > fault-F456)
• It has the following properties: code, severity, lifecycle, description, timestamps
• A fault's RN is "fault-<code>", for example fault-F123
• Faults can be queried by DN and by class (fault:Inst)
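Beyond the GUI and CLI, faults can also be pulled over the REST API with a class query. A minimal sketch of the URL construction (the APIC hostname is a placeholder; sending the request would additionally require an authenticated session):

```python
# Build ACI REST class-query URLs for faults. This only constructs the URL;
# query-target-filter uses the standard eq(<class>.<attr>,"<value>") grammar.

def fault_class_url(apic, cls="faultInst", code=None):
    """Return a class-query URL, optionally filtered on a fault code."""
    url = "https://%s/api/class/%s.json" % (apic, cls)
    if code:
        url += '?query-target-filter=eq(%s.code,"%s")' % (cls, code)
    return url

print(fault_class_url("apic1"))                # all faults in the fabric
print(fault_class_url("apic1", code="F0467"))  # configuration-failed faults
```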
Fault Lifecycle
Timer and severity values can be customized using monitoring policies
Faults in GUI
Look for the “faults” tab on the right
Faults in CLI
Using the faults command:
admin@apic1:~> faults --help
Usage: faults system [history]
       faults controller

Options:
  -h --help

admin@apic1:~> faults controller apic1
Severity  Code   Cause             Ack  Last Transition      Dn
--------  -----  ----------------  ---  -------------------  --
critical  F0104  port-down         no   2014-05-28 01:02:40  topology/pod-1/node-1/sys/caggr-[po1.1]/fault-F0104
major     F0101  equipment-failed  no   2014-05-28 01:02:40  topology/pod-1/node-1/sys/ch/p-[/dev/sda]-f-[/dev/sda]/fault-F0101
Getting all faults using moquery

Getting all faults in txt to analyze later:
leaf1# moquery -c faultInst > /tmp/fault-20141112.txt
leaf1# ls -l /tmp/fault-20141112.txt
-rw------- 1 admin admin 40113 Nov 13 13:37 /tmp/fault-20141112.txt

Want that in JSON?

leaf1# moquery -c faultInst -o json > /tmp/fault-20141112-2.txt
leaf1# ls -l /tmp/fault-20141112-2.txt
-rw------- 1 admin admin 46410 Nov 13 13:40 /tmp/fault-20141112-2.txt
leaf1# more /tmp/fault-20141112-2.txt
{
  "imdata": [
    {
      "faultInst": {
        "attributes": {
          "dn": "sys/phys-[eth1/11]/fault-F1186",
          "domain": "infra",
          "code": "F1186",
          "occur": "1",
          "subject": "failure-to-deploy",
          "severity": "warning",
          "descr": "Port configuration failure.

Want to get all configuration-failed faults?

pod2-leaf1# moquery -c faultInst -f 'fault.Inst.code == "F0467"' | egrep "cause|dn"
cause : configuration-failed
dn : uni/epp/fv-[uni/tn-testTenant2/ap-testAP/epg-testEPG]/nwissues/fault-F0467
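Once you have the JSON dump, it can be post-processed offline. A minimal sketch that walks the imdata envelope produced by moquery -o json (the sample record below is abbreviated from the output above):

```python
import json

# Abbreviated sample in the same shape as the moquery -o json output.
dump = json.loads("""
{ "imdata": [
    { "faultInst": { "attributes": {
        "dn": "sys/phys-[eth1/11]/fault-F1186",
        "code": "F1186",
        "severity": "warning",
        "subject": "failure-to-deploy" } } }
] }
""")

def summarize(payload):
    """Yield (code, severity, dn) for each faultInst record in an imdata envelope."""
    for rec in payload["imdata"]:
        attrs = rec["faultInst"]["attributes"]
        yield attrs["code"], attrs["severity"], attrs["dn"]

for code, sev, dn in summarize(dump):
    print(code, sev, dn)
```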
Investigate a fault
admin@apic1:~> moquery -c fvCtx | grep dn | grep tn-testTenant2
Events
• An event is a specific condition that occurs at a certain point in time (for example “link went from down to up”)
• Represented in the system as MOs of class event:Record
• As they are part of the normal system workflow, they do not necessarily require user attention
• Useful for monitoring and debugging issues
• Similar to an entry in a log file: once created, they are never modified
• Only deleted when the maximum number specified in a retention policy is hit
• Events are triggered by “event rules”, defined by developers
Events in GUI
• Accessed much like faults: navigate to the object, then HISTORY / EVENTS
Accounting - audit log
• A mechanism to track user-initiated configuration changes
• When a user creates/modifies/deletes an MO, we create an "audit record" containing the affected MO DN, user name, timestamp and change details
• The system also creates logs for log-in/log-out to controllers and nodes
• Similar to an entry in a log file: once created, they are never modified
• Configuration change logs are MOs of class aaaModLR
• Login/logout logs are MOs of class aaaSessionLR
• Accounting logs get deleted only when a maximum number specified in a retention policy is hit
Audit log
Who created that?
Core Files
How to identify that a process has crashed and collect the core

When a process crashes, a core file is generally created which provides key information to help development determine why that process may have crashed. The core files can be located in a couple of ways.
APIC GUI: The APIC GUI provides a central location to collect the core files for APICs and nodes in the fabric. This is the recommended path for this step. As shown below, decoding the core file will require shell access.
An export policy can be created from ADMIN -> IMPORT/EXPORT in Export Policies -> Core. However, there is a default core policy where files can be downloaded directly. As shown:
Taking Core Files off APIC
In the OPERATIONAL tab, the core files, their location, and URL links to download the files will be present.
The naming convention of the tarballed files at this stage is as shown here:
example: http://172.23.102.180/files/2/techsupport/dbgexp_coreexp-default_Leaf101_sysid-101_2014-06-06T17-14_1402074896_0x101_vmm_log.4260.tar.gz
- File location: in this example http://172.23.102.180/files/2/, which is the IP address and the node ID (2) of the APIC where the file is located
- Node name, which in this example is Leaf101
- Node ID, shown as sysid in the filename. In this example, 101.
- Date/timestamp
- Process which crashed. In this example, vmm.
The files can be downloaded by clicking on the link in the file name.
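The naming convention above can also be unpacked programmatically when sorting through many cores. A sketch that parses the core-file pattern shown above (the regex mirrors that example; it is an illustration of the convention, not an official format specification):

```python
import re

# Mirrors dbgexp_<policy>_<node>_sysid-<id>_<timestamp>_<epoch>_<slot>_<proc>_log.<pid>.tar.gz
CORE_RE = re.compile(
    r"dbgexp_(?P<policy>[^_]+)_(?P<node>[^_]+)_sysid-(?P<sysid>\d+)_"
    r"(?P<stamp>[\dT-]+)_(?P<epoch>\d+)_(?P<slot>0x[0-9a-fA-F]+)_"
    r"(?P<process>[A-Za-z0-9]+)_log\.(?P<pid>\d+)\.tar\.gz$"
)

def parse_core_name(name):
    """Return a dict of the filename fields, or None if it is not a core tarball."""
    m = CORE_RE.match(name)
    return m.groupdict() if m else None

info = parse_core_name(
    "dbgexp_coreexp-default_Leaf101_sysid-101_2014-06-06T17-14"
    "_1402074896_0x101_vmm_log.4260.tar.gz"
)
print(info["node"], info["sysid"], info["process"])  # Leaf101 101 vmm
```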
Core access via Shell CLI
The core files can also be found through the shell. SSH into the APIC with admin credentials and check the /data/techsupport directory. Here the core and a log file for the process that crashed on that APIC can be found:

admin@apic1:core> ls -l
total 192224
-rw-r--r-- 1 root root 98034811 Jun 6 08:29 core.svc_ifc_vmmmgr.bin.32078.gz
-rw-r--r-- 1 root root 451014 Jun 6 08:29 log.svc_ifc_vmmmgr.bin.32078

On the APIC, the tarballed core/log bundle can be located in /data/techsupport:

admin@apic1:techsupport> ls -l
total 257828
-rw------- 1 root root 88650430 Jun 6 08:29 dbgexp_coreexp-default_apic1_sysid-1_2014-06-06T08-29_svc_ifc_vmmmgr.bin_log.32078.tar.gz
For the switch core files, there are two methods that can be used:
- On the APIC, where the file is located at /data/techsupport. Note that if you have a cluster of APICs, where the files get stored does not appear to be deterministic.
- If the node is not part of the fabric, the core file can be found in /logflash/core.
If a switch reloaded but there is no core file, check the reason for the reload with the command show system reset-reason. If the reason is a kernel panic (as shown in the example below), gather the kernel dmesg oops file from /mnt/pstore.

spine201# show system reset-reason
----- reset reason for Supervisor-module 27 (from Supervisor in slot 27) ---
0) At 2014-10-29T18:52:49.864+00:00
   Reason: kernel-panic
   Service: system crash
   Version: 11.0(1d)

On the APIC or the switch nodes, the command show cores can also be used to see a summary of the core files that have been generated.
Health Scores
APIC Health Score
• Health Score provides a quick overview of the health of the system/module
• It is based on the faults generated in the fabric
• Range: 0 to 100 (100 is a perfect health score)
• Each fault reduces the health score based on the severity of the fault
• Health Score is propagated to container and related MOs
• Health Score policies can control the penalty values, propagation, and health records
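The exact deduction formula is internal to APIC (and tunable via health score policies), but the core idea that each fault subtracts a severity-weighted penalty from 100 can be sketched. The penalty values below are hypothetical, not APIC's defaults:

```python
# Illustrative only: hypothetical per-severity penalties, not the values
# APIC actually uses (those are configurable via health score policies).
PENALTY = {"critical": 30, "major": 15, "minor": 5, "warning": 2}

def health_score(fault_severities):
    """fault_severities: iterable of severity strings; score floors at 0."""
    score = 100 - sum(PENALTY.get(sev, 0) for sev in fault_severities)
    return max(0, score)

print(health_score([]))                     # perfect health: 100
print(health_score(["critical", "minor"]))  # 100 - 30 - 5 = 65
```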
Health Score Views
• System — aggregation of system-wide health, including pod health scores, tenant health scores, system fault counts by domain and type, and the APIC cluster health state.
• Pod — aggregation of health scores for a pod (a group of spine and leaf switches), and pod-wide fault counts by domain and type.
• Tenant — aggregation of health scores for a tenant, including performance data for objects such as applications and EPGs that are specific to a tenant, and tenant-wide fault counts by domain and type.
• Managed Object — health score policies for managed objects (MOs), which include their dependent and related MOs. These policies can be customized by an administrator.
Health score degraded - identification

Navigating to the System Health Dashboard will identify the switch that has a diminished health score.
• Double clicking on that leaf will allow navigation into the faults raised on that device. Here we click on rtp_leaf1
Drilling down
Double-click on the degraded Health Score, or highlight the Health tab
Getting to Object Fault
Interface 1/35 on this leaf is having issues
The interface has a fault: it is in use by an EPG, but the interface is missing an SFP transceiver
Tech Support
Tech Support Features
• One interface to collect tech-support from any subset of fabric components and features
• Save to fabric, or export to remote server
• On-demand or periodic
• Configurable data collection
• Downloadable via HTTP from the fabric
• Tech-supports are HUGE (multiple GB of tar data)
• They mostly contain logs useful for development. For a postmortem of an event, it is recommended to get tech-supports of the APICs and impacted leafs ASAP, as some logs roll over quickly.
Create Tech Support policy
In ADMIN -> IMPORT/EXPORT -> Export Policies -> Techsupport …
Tech-Support APIC CLI
admin@bdsol-aci2-apic2:~> techsupport switch 101 remote
admin@bdsol-aci2-apic2:~> techsupport switch 101
Triggering techsupport for Switch 101 using policy supNode101
Triggered on demand tech support successfully for node 101, will be available at: /data/techsupport on the controller.
Use 'status' option with your command to check techsupport status

… Wait a few minutes …

admin@bdsol-aci2-apic2:~> techsupport switch 101 status
Nodeid: 101
Collection Time: 2014-11-14T19:03:39.657+02:00
Status: success
Detailed status: Task completed
Location: /data/techsupport/dbgexp_tsod-supNode101_pod2-leaf1_sysid-101_2014-11-14T19-03CET.tar.gz on APIC 2
admin@bdsol-aci2-apic2:~> cd /data/techsupport/
admin@bdsol-aci2-apic2:techsupport> ls -l
total 996984
-rw-r--r-- 1 root root 100602052 Nov 14 18:06 dbgexp_tsod-supNode101_pod2-leaf1_sysid-101_2014-11-14T19-03CET.tar.gz
Upgrades
Upgrades
There are three types of software images that can be upgraded:
• The APIC software image.
• The switch software image — software running on the leafs and spines of the ACI fabric.
• The Catalog image — the catalog contains information about the capabilities of different models of hardware supported in the fabric, compatibility across different versions of software, and hardware and diagnostic utilities. The Catalog image is implicitly upgraded with the controller image. Occasionally, it may be required to upgrade the Catalog image only, to include newly qualified hardware components into the fabric or to add new diagnostic utilities.
You must upgrade the switch software image for all the spine and leaf switches in the fabric first. After that upgrade is successfully completed, upgrade the APIC controller software image.
Compatibility Checks
• Image level compatibility
• Card level compatibility
• Feature level compatibility
Each APIC controller reboots during the upgrade, leaving the cluster stable with the other two operational controllers. The controllers upgrade in a cascading manner, as each upgraded controller converges back into the cluster and becomes "Fully Fit".
Configuration export is always a good practice

In ADMIN -> IMPORT/EXPORT -> Export Policies -> Configuration:
• Create an export destination (SCP server, …) and specify the export format (JSON or XML)
• Choose start now and submit
• Check the progress of the export in the Operational tab
Common Upgrade issues
• APIC uses a standard Linux distribution, so standard Linux SCP commands are required in the firmware download tasks
• The APIC cluster must be in the "Fully Fit" state to start an upgrade. Verify cluster status and process failures
• Monitor faults for upgrade concerns, pauses and actions. Fault F1432 is the most common
Verifying upgrade status https://
More Detailed Topics for Session BRKACI-3344
FRIDAY 11:30AM Application Centric Networking Troubleshooting 201 – Day 2 Operations
Advanced subjects to be covered:
APIC Troubleshooting
DHCP Operations
Clock Syncing, NTP
Detailed look at Atomic Counters
Identifying Packet Flow Through the Fabric
EPG-EPG use-case
Support Technical Panel at end
Call to Action
• Visit the World of Solutions for
  – Cisco Campus
  – Walk in Labs
  – Technical Solution Clinics
• Meet the Engineer
• Lunch time Table Topics
• DevNet zone related labs and sessions
• Recommended Reading: for reading material and further resources for this session, please visit www.pearson-books.com/CLMilan 2015
Complete Your Online Session Evaluation
• Please complete your online session evaluations after each session. Complete 4 session evaluations & the Overall Conference Evaluation (available from Thursday) to receive your Cisco Live T-shirt.
• All surveys can be completed via the Cisco Live Mobile App or the Communication Stations
Q & A
Managed Objects & DME
Managed Objects

• Everything is an object
• Objects are hierarchically organized
Distributed Managed Information Tree (dMIT)

The dMIT contains comprehensive system information:
• discovered components
• system configuration
• operational status, including statistics and faults
A single logical dMIT is presented to the user through the REST interface on any APIC. Internally, the dMIT is split into various services and shards across the APICs.
Each MO carries a class, a DN, and its properties (prop1, prop2, …).
API MO Classes – What are they?
From the UI
Internals of ACI Logical Model

• The APIC (Policy Controller) holds the logical model: application-centric configurations, target groups, rules and configurations
• The policy update to switches is asynchronous, i.e. the REST call to APIC does not wait for the update to the switches
• APIC decides which subset of the logical model to push based on explicit or implicit registration of policies from the switch
• On the switch, the Policy Element renders the pushed subset of the logical model into the concrete model (ports, cards, interfaces, VLANs, ACLs) held in shared memory, and deploys it to the forwarding plane
• NXOS processes are notified of an MO update using an MTS message, and then read the MO from shared memory (the objectstore)
Data Management Engine (DME)

The process that controls access to the APIC, used to configure the logical model.
On the switch, the Policy Element process:
• Gets logical MOs from the PM and pushes concrete MOs into the objectstore (shared memory) to configure the switch
• Collects stats from NXOS and pushes them to the APIC
• Handles local faults, events, records, atomic counters, health scores and core handling
• Acts as an Opflex server for external Opflex elements
The NXOS processes read their configuration from the objectstore.