Large Scale Monitoring of Home Routers
IEEE International Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications
21-23 September 2009, Rende (Cosenza), Italy

S. Costas-Rodríguez+, R. Martínez-Álvarez+, F.J. González-Castaño∗+, F. Gil-Castiñeira∗, R. Duro×
+Gradiant, ETSI Telecomunicación, Campus, 36310 Vigo, Spain
∗Departamento de Ingeniería Telemática, Universidad de Vigo, Spain
×Grupo Integrado de Ingeniería, Universidad de La Coruña, Spain
Tel: +34 986 813788, fax: +34 986 812116
E-mail: {scostas,rmartinez}@gradiant.org, {javier,xil}@det.uvigo.es, [email protected]

Abstract – This research is motivated by the fact that some telecommunications operators do not let end-users buy their routers in the consumer electronics market. Instead, they rent the routers and maintain them under long-term contracts. By monitoring line signal-to-noise ratio, transmission and reception power, memory usage or uptime, the operators can predict many types of router failures. In any case, this information may feed their data warehouses for future use. In our field tests we monitored the routers of the Spanish ISP and VoIP operator Comunitel (www.comunitel.es). With our approach, a full monitoring cycle of 22,300 such routers took less than five minutes.

Keywords: Monitoring, access networks, SNMP

I. INTRODUCTION

This paper describes our experience with concurrent asynchronous monitoring of large populations of end-user broadband-access routers.

Despite the wealth of research in large-scale monitoring, which assumes that individual nodes can be queried efficiently, end-user access routers usually have manual legacy interfaces, either HTTP- or telnet-oriented. They seldom offer a direct interface to other programs. Moreover, the uptime of end-user routers is unpredictable. For all these reasons, commercial large-scale monitoring tools such as SNMP collectors are useless.

Obviously, at the network core we will find complex nodes with adequate monitoring interfaces. However, end-user device monitoring is of paramount importance for operators that rent and maintain those devices instead of allowing end-users to purchase them. The operators that follow this model waste large sums of money handling service calls caused by router failures or service degradations that are often predictable from monitoring data. Typical examples are the uncontrolled growth of NAT tables or undeclared p2p file-sharing activities that block too many ports. Measures like line signal-to-noise ratio, transmission and reception power, memory usage or uptime are useful to detect and fix many potential problems.

In our real tests we focused on home/office ADSL routers, although our results are valid for any other access technology. We monitored the routers of the Spanish ISP and VoIP operator Comunitel (www.comunitel.es). With our approach, a full monitoring cycle of 22,300 such routers took less than five minutes.

The rest of this paper is organized as follows. In section II we review the background, comprising academic research and existing industrial solutions. In section III we describe the practical difficulties in end-user router monitoring. In section IV we present our solution, based on concurrent asynchronous connections. Finally, section V concludes.

II. BACKGROUND

A. Academic research

Scalable large-scale monitoring typically follows a distributed schema [1]. Instead of relying on a single collector, multiple monitoring devices work concurrently. This schema is highly reliable against network failures.

The research in [2] identifies three types of distributed monitoring: static decentralized, programmable decentralized and active distributed. In the latter, mobile agents control the monitoring nodes. They migrate through the network and identify the optimal nodes on which to activate monitoring functions. Thus, the active distributed monitoring architecture adapts itself to the state of a dynamic network. This strategy is also followed in [3].

Regardless of the number of monitoring nodes in a network and their evolution, each monitoring node must collect data from a large number of network nodes. A monitoring node must be as efficient as possible, because otherwise it could compromise scalability. This paper focuses on this practical problem, and thus our solution is valid for any monitoring system, either static or dynamic.

Regarding monitoring interfaces, previous work tried to overcome the limitations of SNMP. It has been proposed to adopt CORBA or Java RMI-based interfaces [4] to access data. The corresponding objects would retrieve data (via SNMP) and add extra information such as the physical location in the network or the relationship with other devices. In practice, as we will see in section III, end-user devices have extremely limited monitoring interfaces.

Although less related to this paper, there are many other research lines in network monitoring. Among them, we can cite the following:

• Passive monitoring [5], [6] analyzes data packets at different points of the network using packet sniffers. These sniffers may extract packet headers, transmission rates, number of retransmissions, packet sizes, etc. Passive monitoring is somewhat limited, but it may help to feed advanced network analysis tools with the data they need.
• AI techniques may assist monitoring systems in network failure detection [7]. Ideally, this will enable automatic problem fixing, reducing the number of warnings that human operators must handle.

B. Industrial systems

By default, many ISPs and operators employ the commercial program HP OpenView [8], [9]. It relies on SNMP for data acquisition, but it can be enhanced with external collectors. Specifically, it accepts plain-text data, XML files or inputs from SQL databases.

In the free software arena, RRDtools [10] is the preferred solution. RRD is a database format designed to store monitoring data efficiently. RRDtools allows users to create RRD databases, monitor devices and store their results periodically, as well as to extract data and create graphs from them. Data acquisition follows the SNMP protocol. In theory, RRDtools accepts external collectors, as HP OpenView does. Many monitoring programs, such as Cricket [11] (a system to generate statistics from RRDtools data) or BigSister [12], employ RRD.

Finally, OpenNMS is a tool of growing popularity [13]. It also relies on SNMP. It determines whether diverse protocols (HTTP, ping...) are available.

To sum up, the most popular tools and solutions rely on SNMP monitoring. As we will see in the next section, unlike high-performance routers, end-user devices do not have full SNMP support. They still depend on "manual" protocols to report their state.

III. MONITORING END-USER DEVICES

A. Typical difficulties

The programs in section II-B are highly advantageous for large-scale network monitoring. They employ SNMP to acquire data, or rely on complex software like virtual [...]

[...] costs to gain a significant market share. Among other consequences, this implies severely limited firmware. Because end-users usually manage their routers themselves, all interesting parameters (used memory, CPU load, SNR, attenuation...) are available via telnet or web interfaces. SNMP services are often limited to a single MIB that reports uptime. Unfortunately, manufacturers are reluctant to add new MIBs, as this would increase the cost of products with a low profit margin. Finally, the telnet or web monitoring interfaces are not standard, so each device is managed in a different way.

Another problem in this context is that users shut their routers down quite often. These frequent shutdowns, together with the high monitoring response times of end-user routers and the unpredictable load of the access network (which also affects the response time of the devices), make it useless to adopt a sequential monitoring algorithm. Such an algorithm monitors individual devices in a purely sequential manner, waiting for the acknowledgement of each device before monitoring the next one. Existing commercial software works this way.

Finally, ISP operators do not want to be constrained by the choice of a specific router model. They always have a number of alternatives, and they replace routers for technical or marketing reasons quite often. Therefore, large-scale monitoring software for end-user routers must be flexible and quickly adaptable to any kind of device.

All these problems render the high-end solutions in section II-B useless, since they only accept SNMP and perform sequential monitoring (and are thus unable to handle arbitrary device shutdowns).

Table I
END-USER COMUNITEL ROUTERS IN THE FIELD TESTS

Model                 Number
Telsey CPVA500         2,500
Telsey CPVA3          11,000
Telsey Gada              200
Zyxel Prestige 700         -
OneAccess 200          1,900
Cisco 800 series       6,700
Total:                22,300

B. End-user routers in this research

In this research¹ we monitored the 22,300 end-user routers that the Spanish operator Comunitel [14] had at the time this paper was written. There were six different