Network State Awareness and Troubleshooting


Aamer Akhter / [email protected]
BRKARC-2025

Agenda
• Troubleshooting Methodology
• Packet Forwarding Review
• Control Plane
• Topology
• Logging
• Routing Protocol Stability
• Data Plane
• Active Monitoring
• Passive Flow Monitoring
• QoS
• Getting Started

#CLUS BRKARC-2025 © 2018 Cisco and/or its affiliates. All rights reserved. Cisco Public

Cisco Webex Teams
Questions? Use Cisco Webex Teams (formerly Cisco Spark) to chat with the speaker after the session.
How
1 Find this session in the Cisco Events App
2 Click “Join the Discussion”
3 Install Webex Teams or go directly to the team space
4 Enter messages/questions in the team space
Webex Teams will be moderated by the speaker until June 18, 2018.
cs.co/ciscolivebot#BRKARC-2025

Keeping Focused: What This Session is About
• This session is about basic network troubleshooting, focusing on fault detection & isolation
  • Some non-Cisco specifics
• For context, we will cover some basic methodologies and functional elements of network behavior
• This session is NOT about
  • Architectures of specific platforms
  • Data Center technologies
• This is the 90 min tour. ;-)

Network Quality is a Complex, End-to-End Problem
[Figure: an office site (mobile clients, APs, local WLCs) connected over the WAN to DC network services (DHCP, CUCM, ISE, Cisco Prime™). Each factor is marked as affecting join/roam, quality/throughput, or both.]
• Factors shown: client firmware, WAN uplink usage, end-user services, client density, AP coverage, configuration, WLC capacity, WAN QoS, routing, authentication, RF noise/interference, addressing
• There are 100+ points of failure between user and app
• What is the problem? Where is the problem? How can I fix the problem fast?
Network state awareness?
• What is it:
  • View of network, what it is doing, and why
  • Monitoring of data network performance, in comparison with previous working states
• Quick detection of hard failures
• Early warning for
  • soft failures
  • performance issues
  • and tomorrow's problems
• Faster problem resolution
• Greater confidence in network by users and application operators

Think Like a Network Detective
• Find the Suspects
• Question Suspects
• Improve
• Be Prepared

Control Plane & Data Plane
• Control Plane
  • Processes a variety of information sources and policies, creates the forwarding information base (FIB)
  • Best known intention w/o actual packet in hand
  • Inputs: gossip from other routers (routing protocols), admin edict (APIs, statics, PfR)
  • Commands: show ip bgp, show ip ospf, show ip route, show ip policy
• Data Plane
  • The actual forwarding process (might be SW or HW based)
  • Granted some decision flexibility
  • Driven by arriving packet details, traffic conditions etc.
  • Commands: show ip cef, show mpls forwarding…, show mac address-table, show policy-map int…, show interface
• Passive Measurements
  • show flow monitor, ifmib, CbQoS, *Flow
[Figure: a packet arrives on Int A and is forwarded out Int B or Int C]

Data Plane Decision Flexibility
• Control plane: condenses options driven by policies and (relatively) slower-moving (ms to secs), aggregated information, e.g. prefix reachability, interface state
• Data plane responds to packet conditions
  • Destination prefix to egress interface matching
  • Multi-path (ECMP / LAG) member selection
  • Interface congestion
  • QoS class state
  • Access Lists
  • Packet processing fields (TTL expire, etc.)
  • IPv4 fragmentation, etc.
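The "Multi-path (ECMP / LAG) member selection" item can be illustrated with a minimal sketch. It assumes a 5-tuple flow key and uses SHA-256 purely as a stand-in hash; real platforms use their own hash inputs and functions, and the names FlowKey and select_member are hypothetical. The point it shows is that the data plane picks a member per flow, so two flows between the same pair of hosts may take different links.

```python
import hashlib
from typing import NamedTuple, Sequence


class FlowKey(NamedTuple):
    """Hypothetical 5-tuple identifying a flow (an assumption for this sketch)."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def select_member(flow: FlowKey, members: Sequence[str]) -> str:
    """Pick an egress member for a flow by hashing its 5-tuple.

    SHA-256 is only a stand-in; it is not what any particular platform uses.
    """
    if not members:
        raise ValueError("no members available")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes of the digest as an integer, then map to a member.
    index = int.from_bytes(digest[:4], "big") % len(members)
    return members[index]


if __name__ == "__main__":
    paths = ["Int B", "Int C"]
    flow = FlowKey("10.0.0.1", "192.168.2.2", 6, 49152, 443)
    # The same flow always maps to the same member while the member set is unchanged.
    print(select_member(flow, paths))
```

Because the choice depends on packet fields, a test probe with different ports than the affected application may follow a different member and miss the fault.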
Network as a System: Independent Decisions
• Each network device makes an independent forwarding decision
  • Explicit local / domain policies
  • Device perspective might not be symmetric
  • Data plane flexibility
• Asymmetric routing: forward and reverse paths are different
  • Caused by traffic engineering policies, popular at WAN-edge and admin boundaries
[Figure: hosts A and B connected through routers R1 to R6, partly in your network and partly in a network you don't control. Callouts: "Congested link" and "R5 is doing ECMP hash".]

Data Plane and Control Plane Changes
• Change is normal, but some changes are more interesting:
  • Single change that causes loss of reachability or suboptimal performance
  • Instability: high rate of change
• 3Ws: When, where, and what

Control Plane
3Ws: When, where, and what

What do I have?
• Establish inventory baseline
  • Device names, IPs, configuration
  • Modular HW configuration
  • Serial # (for support & replacement)
  • History (where has it been placed)
• Clearly label devices, ownership and contact info
• Establish standards for location, device/port names
• Check for changes periodically (tooling)
Example device label:
    <owner/dept>
    <device-name>
    <IP address>
    <Contact>
Example cable label:
    <current-location> to <destination-location>
    <circuit src/dst id>

How is it wired together?
• Establish network topology baseline
• Visual inspection
  • Be prepared to be surprised!
• CDP / LLDP for Layer-2 neighborships (show cdp neighbor, show lldp neighbor)
  • Traverses spanning-tree blocked ports, but not L3 hops
• Monitor for non-leaf changes
[Figure: R1 and R2 connected through SW1]

    R1#show cdp neighbors
    Capability Codes: R - Router, T - Trans Bridge, B - Source Route Bridge
                      S - Switch, H - Host, I - IGMP, r - Repeater
    Device ID        Local Intrfce     Holdtme    Capability  Platform  Port ID
    SW1              Eth 0             157          T S       WS-C3524-XFas 0/0/0

Tools for Topology & Inventory Management
• Most NMS tools have some element of inventory and topology awareness
  • DNA Center
  • APIC-EM
  • Cisco Prime Infrastructure
  • NetBrain
  • (open source) NetDisco http://www.netdisco.org
  • (open source) Netdot https://osl.uoregon.edu/redmine/projects/netdot
[Screenshots: DNA Center: Discovery; DNA Center: Physical Neighbor Topology]

Logging
• Centrally: for ease of analysis and search
  • Cisco Prime Infrastructure & Cisco EPNM: full-featured tool for inventory and monitoring
  • Moogsoft: automates early detection of service failures, collaboration & knowledge base
  • syslog-ng: preprocessing, relay and store (file/db)
  • Logstash (ELK), fluentd: multisource collection, storage and analysis

    service timestamps log datetime msec show-timezone
    !
    logging host <ipaddr>
    logging trap 6
    logging source-interface Loopback0

• Locally: in case logs can't get home

    logging buffered <size> 6
    logging persistent url disk0:/syslog size <TotalLogsSize> filesize <OneFileSize>

Alerting & Collaboration
• Routing of alerts / interesting events
  • Is this noise or signal?
  • Which team(s) to alert?
  • Who is on duty?
  • How to contact: SMS, IM, phone call…
  • Pagerduty, Openduty
• Coordinating response
  • IM tools (Spark, hipchat etc.)
  • Email
  • Ticketing tools (OTRS, Jira, ServiceNow, Moogsoft…)
[Screenshots: PagerDuty, OTRS]

State of the Routing Table
• Be familiar with normal behavior of important service prefixes
• Establish quickly if the problem is control plane or data plane
  • show ip route / ipRouteTable MIB / show ip traffic (drop stats)
  • Nagios: check_snmp_iproute.pl
  • Track objects and EEM (config)

    track 100 ip route 0.0.0.0 0.0.0.0 reachability
    event manager applet TrackRoute_0.0.0.0
     event track 100 state any
     action 1.0 syslog msg "route is $_track_state"
    #
    01:09:21: %HA_EM-6-LOG: TrackRoute_0.0.0.0: route is down
    blog.ipspace.net

    #show ip route 192.168.2.2
    Routing entry for 192.168.2.2/32
      Known via "ospf 1", distance 110, metric 11, type intra area
      Last update from 10.0.0.2 on FastEthernet0/0, 00:00:13 ago
      Routing Descriptor Blocks:
      * 10.0.0.2, from 2.2.2.2, 00:00:13 ago, via FastEthernet0/0
          Route metric is 11, traffic share count is 1
    blog.ipspace.net
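Being "familiar with normal behavior of important service prefixes" implies comparing the current routing entry against a known-good one. The following is a minimal sketch of that comparison, assuming the text of `show ip route <prefix>` has already been collected by some means not shown here. The names RouteEntry, parse_route and drift are hypothetical, and the regular expressions are fitted only to the sample output on the slide, so other IOS versions or route types may need adjustments.

```python
import re
from typing import List, NamedTuple, Optional


class RouteEntry(NamedTuple):
    prefix: str
    protocol: str
    distance: int
    metric: int
    next_hop: str
    interface: str


# Sample text copied from the slide's "show ip route 192.168.2.2" output.
SAMPLE = """Routing entry for 192.168.2.2/32
  Known via "ospf 1", distance 110, metric 11, type intra area
  Last update from 10.0.0.2 on FastEthernet0/0, 00:00:13 ago
  Routing Descriptor Blocks:
  * 10.0.0.2, from 2.2.2.2, 00:00:13 ago, via FastEthernet0/0
      Route metric is 11, traffic share count is 1
"""


def parse_route(text: str) -> Optional[RouteEntry]:
    """Extract the fields of interest; return None if the route is absent."""
    prefix = re.search(r"Routing entry for (\S+)", text)
    known = re.search(r'Known via "([^"]+)", distance (\d+), metric (\d+)', text)
    # Only the first descriptor block (first "*" line) is considered here.
    hop = re.search(r"^\s*\* (\S+?),.*via (\S+)\s*$", text, re.MULTILINE)
    if not (prefix and known and hop):
        return None
    return RouteEntry(
        prefix=prefix.group(1),
        protocol=known.group(1),
        distance=int(known.group(2)),
        metric=int(known.group(3)),
        next_hop=hop.group(1),
        interface=hop.group(2),
    )


def drift(baseline: RouteEntry, current: Optional[RouteEntry]) -> List[str]:
    """List the fields that differ from the baseline entry."""
    if current is None:
        return ["route missing"]
    return [
        f"{name}: {old} -> {new}"
        for name, old, new in zip(RouteEntry._fields, baseline, current)
        if old != new
    ]


if __name__ == "__main__":
    baseline = parse_route(SAMPLE)
    print(baseline)
    print(drift(baseline, parse_route(SAMPLE)))
```

An empty drift list suggests the control plane still holds the expected route, which points the investigation toward the data plane.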
OSPF Area / AS-Wide
• Remember that OSPF data in area should be consistent
• Understand ‘normal’ rate of changes
  • LSA refresh /30-min unless a change
• Track SPF runs over time

    # show ip ospf
     Routing Process "ospf 1" with ID 192.168.0.1
     Start time: 00:01:46.195, Time elapsed: 00:48:27.308
     Supports only single TOS(TOS0) routes
     Supports opaque LSA
     Supports Link-local Signaling (LLS)
     Supports area transit capability
     Supports NSSA (compatible with RFC 3101)
     Supports Database Exchange Summary List Optimization (RFC 5243)
     Event-log enabled, Maximum number of events: 1000, Mode: cyclic
     Router is not originating router-LSAs with maximum metric
     Initial SPF schedule delay 5000 msecs
     Minimum hold time between two consecutive SPFs 10000 msecs
     Maximum wait time between two consecutive SPFs 10000 msecs
     Incremental-SPF disabled
     Minimum LSA interval 5 secs
     Minimum LSA arrival 1000 msecs
     LSA group pacing timer 240 secs
     Interface flood pacing timer 33 msecs
     Retransmission pacing timer 66 msecs
     Number of external LSA 0. Checksum Sum 0x000000
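"Track SPF runs over time" can be sketched as a rate calculation over periodic samples of a cumulative SPF run counter. How the counter is collected (CLI scraping, SNMP, telemetry) is left open here; the sample format, the function names and the threshold value are all assumptions made for illustration, and a sensible threshold would come from the network's own baseline.

```python
from typing import List, Tuple

# (unix timestamp in seconds, cumulative SPF run count) -- assumed sample format.
Sample = Tuple[float, int]


def spf_rates_per_hour(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Convert consecutive counter samples into SPF runs per hour."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        delta = c1 - c0
        if delta < 0:
            # Counter went backwards, e.g. after a process restart:
            # count only the runs seen since the reset.
            delta = c1
        rates.append((t1, delta * 3600.0 / elapsed))
    return rates


def unstable_intervals(
    samples: List[Sample], max_runs_per_hour: float
) -> List[Tuple[float, float]]:
    """Return the intervals whose SPF rate exceeds the baseline threshold."""
    return [(t, r) for t, r in spf_rates_per_hour(samples) if r > max_runs_per_hour]


if __name__ == "__main__":
    samples = [(0, 10), (3600, 11), (7200, 41), (10800, 2)]
    # 5 runs/hour is a made-up threshold for this example.
    for when, rate in unstable_intervals(samples, max_runs_per_hour=5):
        print(f"t={when:.0f}s: {rate:.1f} SPF runs/hour")
```

Keeping the per-interval rates, not just the alerts, gives the "normal" rate of change that later samples can be judged against.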

