Beyond Fault Management Implementing a NOC to Maintain High Availability Jim Thompson Jay Thondakudi
Total Page:16
File Type:pdf, Size:1020Kb
PS-510 © 2001, Cisco Systems, Inc. All rights reserved. 1 Beyond Fault Management Implementing a NOC to Maintain High Availability Jim Thompson Jay Thondakudi © 2001, Cisco Systems, Inc. All rights reserved. 2 MAKING IT INTERACTIVE © 2001, Cisco Systems, Inc. All rights reserved. 3 Copyright © 2001, Cisco Systems, Inc. All rights reserved. Printed in USA. Presentation_ID.scr Agenda • Availability Measurement and your business • Overview of a NOC • Network Management Framework • Fault Management • Performance Management • Tool Issues • People, Processes and Procedures • Back to the Concept of the NOC © 2001, Cisco Systems, Inc. All rights reserved. 4 Method for Attaining a Highly Available Network or a road to five 9’s • Establish a Standard Measurement Method • Define Business Goals as Related to Metrics • Categorize Failures, Root Causes, and Improvements • Take Action for Root Cause Resolution and Improvement Implementation © 2001, Cisco Systems, Inc. All rights reserved. 5 Why should we care about network availability? Recent studies by Sage Research determined that US based Service Providers encountered: • Percent of downtime that is unscheduled: 44% • 18% of customers experience over 100 hours of unscheduled downtime or an availability of 98.5% • Average cost of network downtime per year: $21.6 million or $2,169 per minute! Downtime - Costs too much!!! SOURCE: Sage Research, IP Service Provider Downtime Study: Analysis of Downtime Causes,Costs and Containment Strategies, August 17, 2001, Prepared for Cisco SPLO B © 2001, Cisco Systems, Inc. All rights reserved. 6 Copyright © 2001, Cisco Systems, Inc. All rights reserved. Printed in USA. Presentation_ID.scr Upside has Downside Application Downtime • Number of employees • Direct Revenue Loss affected x hours out • Compensatory x burdened hourly rate payments • Lost future revenue • Billing losses • Investment revenue losses • Customers • Revenue Recognition • Suppliers • Cash Flow • Financial markets • Lost discounts • Business Partners (accounts payable) • Careers lost • Temporary employees • Payment guarantees • Equipment rental • Credit rating • Overtime costs • Stock price Source: Gartner Research AVVID Team;rmarwah © 2001, Cisco Systems, Inc. All rights •reserved.Travel expenses 7 What are your business goals? • Financial ROI Economic Value Added Revenue/Employee • Productivity • Time to Market • Organizational Mission • Customer Perspective Satisfaction Retention Market Share Define your ‘end-state’? What is your goal? © 2001, Cisco Systems, Inc. All rights reserved. 8 BusinessBusiness Objectives Objectives & & Availability Availability Definition of Network Availability for your company– generalized (i.e.): Network availability management is the optimization of the avail ability and reliability of networking services and of the supporting network infrastructure, with the objective of ensuring all business requirements are met. Availability management policywill set a definite course of action for availability measurement, reporting, and procedures. This strategy for availability management will be based on defined and agreed upon business requirements that will help drive policy to determine present and future resource decisions. The role of network management including availability and performance management functions, is in managing growth and achieving effici encies of scale as a critical strategic objective for your company © 2001, Cisco Systems, Inc. All rights reserved. 9 Copyright © 2001, Cisco Systems, Inc. All rights reserved. Printed in USA. Presentation_ID.scr Availability Business Requirements • Yield Control Yield per customer by volume & mix Maximizing the yield per bandwidth, per router • Activity-based accounting Cost of downtime Cost of waiting for a needed part or tool Cost of reworking or redesign © 2001, Cisco Systems, Inc. All rights reserved. 10 Availability Business Requirements Availability as a Basis for Productivity Data Measurement of Total-Factor Productivity Benchmarking the Organization Overall Organizational Performance Metric Availability as a Basis for Organizational Competency Availability as a Core competency Availability Improvement as an Innovation Metric Resource Allocation Information Identify defects Identify root cause Measure MTTR – tied to process © 2001, Cisco Systems, Inc. All rights reserved. 11 How does your network enable your business to reach its goals? • User Requirements Timeliness Interactivity Reliability Quality Adaptability Security Affordability • Application Requirements Mission Criticality Controlled-Rate Applications Real-time • Performance Requirements Delay Reliability Capacity • Network Requirements Scaling Services Interoperability Performance Monitoring Troubleshooting © 2001, Cisco Systems, Inc. All rights reserved. 12 Copyright © 2001, Cisco Systems, Inc. All rights reserved. Printed in USA. Presentation_ID.scr Agenda • Availability Measurement and your business • Overview of a NOC • Network Management Framework • Fault Management • Performance Management • Tool Issues • People, Processes and Procedures • Back to the Concept of the NOC © 2001, Cisco Systems, Inc. All rights reserved. 13 What is a NOC? A Helpdesk? A trouble-ticket logging center? A Break-fix team? All of the above? © 2001, Cisco Systems, Inc. All rights reserved. 14 What is a NOC? Maybe a worst case discussion on CHAOS Theory! © 2001, Cisco Systems, Inc. All rights reserved. 15 Copyright © 2001, Cisco Systems, Inc. All rights reserved. Printed in USA. Presentation_ID.scr What is a NOC? Are there references? • RFC1302 - Building a Network Information Services Infrastructure • Authored February ’92 by Merit contributors © 2001, Cisco Systems, Inc. All rights reserved. 16 What is a NOC? • From RFC1302 Definition of a NIC and a NOC A Network Information Center (NIC) is an organization whose goal is to provide informational, administrative, and procedural support, primarily to users of its network and, secondarily, to users of the greater Internet and to other service agencies. © 2001, Cisco Systems, Inc. All rights reserved. 17 What is a NOC? • Definition of a NIC and a NOC (cont) A Network Operations Center (NOC) is an organization whose goal is to oversee and maintain the daily operations of a network. … A NIC must work closely with its NOC to ensure users get the best service possible. © 2001, Cisco Systems, Inc. All rights reserved. 18 Copyright © 2001, Cisco Systems, Inc. All rights reserved. Printed in USA. Presentation_ID.scr Are there other resources? • North American Network Operator’s Group www.nanog.org www.merit.edu/internet/ Informative mailing lists Sean “Backhoe King” Donelan • puck.nether.net/netops/ NOC Telephone/Contact List! © 2001, Cisco Systems, Inc. All rights reserved. 19 Are there other resources? www.nocbuilder.com © 2001, Cisco Systems, Inc. All rights reserved. 20 Example of Network Management Goals • Increase/Maintain Network Availability • Provide meaningful analysis and correlation of events to assist in FAST fault resolution • Provide effective use of engineering resources by automating repetitive tasks and de-complexifing network operation © 2001, Cisco Systems, Inc. All rights reserved. 21 Copyright © 2001, Cisco Systems, Inc. All rights reserved. Printed in USA. Presentation_ID.scr Example of Network Management Goals • Ensure network configuration information is available for network/device restoration. • Monitor to ensure that no network outage is caused by a device which has reached its performance limitations. • Provide regular reports which summarize the network for the various management teams which require them. © 2001, Cisco Systems, Inc. All rights reserved. 22 What kind of NOC? Reactive Proactive Operational © 2001, Cisco Systems, Inc. All rights reserved. 23 Develop A Plan “First comes thought; then organization of that thought, into ideas and plans; then transformation of those plans into reality. The beginning, as you will observe, is in your imagination.” Napoleon Hill © 2001, Cisco Systems, Inc. All rights reserved. 24 Copyright © 2001, Cisco Systems, Inc. All rights reserved. Printed in USA. Presentation_ID.scr What are our goals? • What type of shop are we? Lights Out & Remote… … or BIC-EOT © 2001, Cisco Systems, Inc. All rights reserved. 25 Initial Considerations • What are our Availability/SLA requirements? Maintenance Windows? • Is the NOC simply identify and escalate… … or also fix? • How many managed devices are we responsible for? • Roughly how many events/day (hour, minute) do we currently get or expect? © 2001, Cisco Systems, Inc. All rights reserved. 26 Initial Considerations (cont’d) • How much $$ can we allocate? • What are the skills of the operators? • What types of devices are we monitoring? • What are the technologies in use? © 2001, Cisco Systems, Inc. All rights reserved. 27 Copyright © 2001, Cisco Systems, Inc. All rights reserved. Printed in USA. Presentation_ID.scr Develop A Plan! “Given two equally likely solutions to a problem… The simplest one is usually correct.” Occam’s Razor © 2001, Cisco Systems, Inc. All rights reserved. 28 Develop A Plan! “Given a choice between two methodologies, choose the simplest -- the method which requires the fewest resources. .” NOC Corollary to Occam’s Razor © 2001, Cisco Systems, Inc. All rights reserved. 29 Agenda • Availability Measurement and your business • Overview of a NOC • Network Management Framework • Fault Management • Performance Management • Tool Issues • People, Processes and Procedures • Back to the Concept of