Monitoring System

Performance Analysis and Monitoring in Information Systems

J. P. Germano Alberto R. Silva Fernando M. Silva Vodafone Portugal, INESC-ID INESC-ID AV. D.Joao˜ II, LT.1.04.01 Parque Nac¸oes˜ R. Alves Redol, 9 R. Alves Redol, 9 1998-017, Lisboa, Portugal 1000-029, Lisboa, Portugal 1000-029, Lisboa, Portugal Email: [email protected] Email: [email protected] Email: [email protected]

Abstract— This work presents the BlackBird system, which is these applications are faced with the challenge of assuring an analysis and monitoring service for data-intensive enterprise the best possible quality of service and the attainment of the applications, without restrictions on the targeted architecture or negotiated Service Level Agrement (SLA). For this task it is employed technologies. Monitoring systems are an essential tool for the effective management of Enterprise Applications and the essential to have monitoring systems capable of providing a attainment of the demanding service level agreements imposed comprehensive view of the application status and the most crit- to these applications. However, due to the increasing complexity ical components, in order to anticipate performance problems and diversity of these applications, adequate monitoring systems and act before there is any impact on the quality of service. are rarely available. The BlackBird monitoring system is able The currently available monitoring systems can provide effi- to interact with these applications through different technologies employed by the Monitored Application, and able to produce cient monitoring on the network and device level, however, Metrics regarding the application service level goals. The Black- due to the complexity and diversity of the applications, these Bird architecture is composed by several Application Interface systems are unable to provide the desired monitoring on the Modules, and by a central component responsible for Metrics application level. Most of the available monitoring systems calculation and presentation. Application Interface Modules in- specifically target applications or technologies that have a large teract with the target monitored Application in order to get performance data in a common format. These data are stored user base, Database servers, WEB servers, Unix or Windows in a common repository and used for Metrics calculation and Servers. In these cases the monitoring systems is adapted to a presentation. The BlackBird system can be specified through a set fixed architecture, monitors pre-determined system parameters of pre-defined Configuration Objects, allowing it to be extensible and expected system components. This kind of monitoring and adaptable for applications with different architectures. ignores all the functionality that is developed over the base application. Even though, it is this added functionality that 1 - INTRODUCTION implements the business logic and produces the most relevant The telecommunications business, in which Vodafone is contribution to the delivered quality of service. The adaptation included, is one of the most competitive markets. As a provider capabilities of this kind of systems are usually limited to of high technology services there is a constant pressure to predefined components and working parameters, which may implement new technologies that will allow the diversification not be the most relevant to the specific use of the application. of the provided services and the improvement of existing ser- The need for adequate monitoring applications is even more vices. Like in most large scale and technology based business, serious for applications developed in house or when the the Information Technologies (IT) infrastructure of telecom- Monitored Application results from an extensive customization munications companies has become the main base of support of a base application. In this case the only solution is to to business processes, and many Enterprise Applications are develop, also in house, the necessary monitoring systems. This now considered mission-critical, reaching a point where the extra development effort will certainly increase project cost performance of these applications has a direct relation to the and complexity and cause development delays. performance goals of the entire company. The objective of the BlackBird monitoring system is assist- The complexity and diversity of the business rules and ing in the effective management of the extremely diversified provided services, together with the pressure for fast imple- set of applications from the Information & System Tech- mentation demand a vast portfolio of different applications in nologies department (DTSI) of Vodafone Portugal. Providing the organization. These applications can be extremely diverse, continuous monitoring of these applications and being actively in terms of complexity, architecture, base technologies and ap- used to optimize performance and diagnose problems. It has plication provider. And, as result of fierce competition environ- two main features: i) can monitor an extremely diversified ment, all these applications are required to constantly evolve range of applications such as the one found at Vodafone and adapt, in order to implement new business requirements Portugal; ii) provides complex Metrics that relate to the main and support new services. In organizations such as Vodafone application goals, complementing the monitoring systems cur- the teams responsible for the operation and management of rently deployed at Vodafone Portugal. The diagram in fig. 1 represents a general view of the monitoring system, it obtains 1.1 - Existing Systems data from a number of servers using different technologies, and generates visual representations of the system status and The currently available monitoring systems are as diverse performance that are presented to the application operators. as the application they target, and the differences between systems can range all aspects of the monitoring process: i) gen- eral architecture; ii) application instrumentation; ii) targeted applications. In terms of architecture, most monitoring systems em- ploy a Manager-Agent architecture, there is usually a central management system and several agents each one dedicated to a target resource or application. A complete monitoring system will have several types of agents specifically developed for the interfaces and architectures of the various types of monitored resources. The agent will usually reside on the same host as the monitored resource, it will communicate with the monitored resource and relay the obtained data to the central management system using a protocol such as SNMP [2]. This is the architecture that emerged from the first network management systems and is currently used by all monitoring systems based on the SNMP protocol. It is especially adequate for monitoring of vast numbers of distributed resources such as computer networks Openview [1], Nagios [3] or grid com- puting systems MonALISA [9]. As more applications evolved Fig. 1. Monitoring System Overview. from centralized to distributed and from raw processing to providing services, a new monitoring architecture became The need for monitoring systems was born from the need possible, Agentless Monitoring. In this case there is no need to assure high availability of the first enterprise level systems. to deploy an agent on the application host, all data gathering These first monitoring systems where developed by the system is accomplished by remote access using standard protocols providers as part of services packages. With the appearance and logins. There is only a central management system that of large scale enterprise networks it became also necessary to executes calls to the services provided by the Monitored develop monitoring systems capable of monitoring these vast Application, and by remotely accessing applications resources collections of computers. Thereby, it was network monitor- like databases, servers and file systems. Agentless Monitoring ing that originated the first vendor independent monitoring typically provides lightweight monitoring, but with limited systems, such as HP Openview [1]. As the systems and depth of data gathering or monitoring capabilities. However, networks evolved and matured so did the monitoring systems, it is less intrusive, easier to deploy and does not require Simple Network Management Protocol (SNMP) [2] became continuous development. Agentless Monitoring is especially the most widely used management protocol and is currently adequate for services based applications and is the only option the base to most network and device management systems, HP for proprietary and closed source applications. One example Openview [1] and Nagios [3]. of agentless monitoring system is Longitude [10] from Heroix. However, due to the numerous programming languages and In the first monitoring systems one key element of the the almost infinite number of architectures and purposes, appli- monitoring process was application instrumentation, which cation level monitoring remains an extremely diversified field consists in modifying the existing applications in order to col- with no predominant protocols or methodologies. For applica- lect additional data during run-time. The basic instrumentation tion level monitoring it remains the developer’s responsibility technique is to insert instrumentation code at key points of to provide the application with the interfaces that will facilitate interest in the program. During execution, the instrumentation its monitoring. Due to development costs, time constraints or code is then executed together with the original program technology constraints, the monitoring capabilities provided by code, producing events that that will give the main monitoring Enterprise Applications can vary greatly: i) application specific system information about the application’s current status. The monitoring systems provided by the developer, Microsoft importance of application instrumentation for the management Operations Manager [4], Oracle Enterprise Manager [5]; ii) of complex distributed application resulted in several technol- proprietary APIs to be used by general purpose monitoring ogy level standards like JMX for and Java2 Platform systems, Manage Engine [6]; iii) integration modules for Enterprise Edition (J2EE) and WMI for Microsoft Products standard monitoring protocols, SNMP [2]; iv) implementation and the Microsoft .NET framework. of technology specific monitoring protocols, Java Management Other major difference between the various available sys- Extensions (JMX) [7], Windows Management Instrumentation tems is the range of applications they target, monitoring (WMI) [8]. systems can be application specific or general purpose. Ap- plication specific systems are usually developed by companies application monitoring. Even applications that do not employ a that wish to provide a high level of monitoring to it’s product strict component based architecture can usually be modeled as range, the monitoring system will then be included in a a set of distinct programs. Whatever the purpose or technology support package or as an additional component. Naturally, used the complexity of Enterprise Applications dictates that application specific systems can provide the best monitoring the full task to be executed must be split between simpler tasks, of any application. However, it becomes impossible to com- that will be performed by different program modules. By con- bine the monitoring of applications that work together. Most sidering a definition of Component less restrictive than the one of all, developing dedicated monitoring systems is a costly usually associated with Component Based Software it should process that can only be supported by large companies with be possible to model any application as component based, a significant application portfolio. Examples of such systems where each component may have a number of parameters that are Microsoft Operations Manager [4] and Oracle Enterprise can be used as indicatiors of the general application health Manager [5]. Third party companies will only risk developing and performance. And, by modeling the application, it should application specific systems for applications that can guaranty be possible to capture a more abstract level of application a large user base. Quest Software provides versions of the functionality, which is closer to the business logic and to the Spotlight [11] monitoring system for BEA WebLogic Server, main application goals of quality of service. Oracle, etc. The option taken by most companies for adding monitoring capabilities to their applications is to implement 1.3 - Comparative Analysis a standard management technology such as SNMP, JMX or The Monitoring Systems referenced in the previous section WMI, the organization employing the application will then are some of the most widely used and present a representative use a general purpose monitoring system. General purpose sample of the existing monitoring solutions. The table I monitoring systems aim to provide monitoring services to a summarizes the main features of these systems. range of applications as wide as possible. For this they will The objectives for this work clearly requires a general implement support for standard monitoring technologies and purpose monitoring application, and considering the context of protocols, and for proprietary protocols used in applications Vodafone Portugal, the agentless architecture seems the most with a large user base. These systems will also include a wide adequate, as it the less intrusive on the monitored systems. variety of agentless monitoring capabilities. One example is Also, in order to minimize the impact on the Monitored Ap- ManageEngine [6] from AdventNet. Due to the development plications, adding instrumentation code to existing applications costs and the limited market, application specific systems tend is not an option. to be more expensive and limited to only a few applica- Most of these systems allow user defined data gathering, tions. Organizations will employ application specific systems however, the user is usually required to supply an extensive only for the most critical applications, or for widely used set scripts for obtaining data from the Monitored Application applications such as servers and databases, where general in a format accepted by the Monitoring System. The BlackBird purpose systems will be used for the remaining low user base System requires only a minimum of information for executing applications. the same command, handles all data validation and conver- sion. Also, the BlackBird System allows the simultaneous 1.2 - Future Trends execution of any commands regarded as necessary. Although Extensive research has been aimed at improving the mon- most of these system provide some form of support to user itoring support of Enterprise Applications, with the main defined Metrics, Metrics based on different data sources are focus on improving the monitoring extensions provided by usually limited to reporting purposes. Without combining data the monitored application. Most of this work is focused on sources, Metrics will always provide a fragmented view of the JMX [12] technology which is part of J2SE platform, the Monitored Application. The BlackBird System is able to JMX defines an architecture in which client applications provide Metrics based on any combination of data sources. can provide a remotely accessible management interface that From these systems, the ones that provide a Component exposes their internal structures and resources. Although, it is Based Monitoring, support only applications developed using only applicable to Java and the J2SE platform. It has a close the frameworks J2EE or .NET. Component based monitoring resemblance to the WMI for the .NET framework, and the provides the best view of the high level functions of an main concepts are applicable to other programming languages application, and allows Metrics related to the application and architectures, including legacy applications [13]. The use main goals. The BlackBird System introduces a simplified the standard management architectures and instrumentation Component definition, for extending the concept of component techniques on enterprise applications opens the way to auto- based monitoring to applications that were not developed as mated application management and self managed applications Component Base Applications. [14]. All this research presents a common characteristic, it targets 2 - REQUIREMENTS component based applications [15], and because of the compo- The BlackBird monitoring system must provide the five nent organization and the management interfaces provided by key features: i) monitor a wide range of applications, being the base technologies it is possible to obtain a good detail of adaptable to the architecture and technologies of the Monitored TABLE I FEATURE COMPARISON TABLE.

Targeted Supported Agent Interface Alerts and User Defined User Defined Component Systems Protocols Based Notifications Data Gathering Metrics Based HP Openview General SNMP Y Thin Y Y Y N Operations Purpose Client Nagios Network Network services Optional WEB Y Y Y N and Host Microsoft Microsoft WMI Y WEB Y Y Y Y Operations Manager Applications and Client Oracle General ODBC and Y WEB Y Y Y Y Enterprise Manager Purpose Network services Spotlight Dedicated JMX Y Thin N N N Y Client Manage Engine General Multiple Optional WEB Y Y N Y Purpose Longitude General Multiple N WEB Y Y Y Y Purpose BlackBird General Multiple N WEB Y Y Y Y Purpose

Application; ii) provide in depth application level monitoring, 2.2 - Use Cases component based; iii) easily adaptable to the evolution of The use of the Blackbird Monitoring System involves one the Monitored Application; iv) low impact on the moni- Application Owner which is the manager of an Enterprise tored system, agentless and without additional application Application that will be monitored using the Blackbird system, instrumentation; v) graphic interface for data visualization the Blackbird Administrator which is the responsible for and configuration. Currently, the BlackBird System is able the maintenance and administration of the Blackbird system, to gather data from the Monitored Application by remote and multiple Operators that accesses the monitoring pages execution of SQL and invocation of WEB Services. Databases generated by the Blackbird system and respond to alarms. are central components to every major application at Vodafone, The initial deployment of a Monitoring Requirement, fig 2, implement great part of the business logic and are probably involves the configuration of all Monitoring Objects necessary the best source of information about application status and to gather the required data, compute Metrics and alarms, performance. WEB Services are also widely used at Vodafone format graphics and pages. DTSI, they provide support for many business critical appli- cations and have enormous potential for usage expansion.

Start Data Update Graphics Alert Collection Pages

Page 2.1 - Monitoring Requirement «include» «include» «extend» Apply The Blackbird system aims to provide a monitoring service Changes «extend» Edit Configuration Graphic to an application without imposing any limitations on the «extend» Object target architecture, therefore, the operations to be performed «extend» «extend» Consult must be specified by a user with detailed knowledge of Metric Configuration Command the Monitored Application. The Application Owner has the Object List knowledge of the application goals and performance require- Consult ments and will be responsible for defining the Monitoring Application List Requirement detailing the Monitoring Objects necessary to specify the required monitoring service. A small set of Monitoring Objects is enough to specify the required monitoring service: Commands to be executed using Blackbird Administrator the interfaces provided by the monitored application where the result will be stored and used to calculate Metrics; Metrics Fig. 2. Deploy Monitoring Requirement use case. defined by a formula to be executed on the stored data to produce a result that is related to the application’s performance Once the Monitoring Requirement is deployed and the indicators; Alerts for evaluating thresholds on Metrics and BlackBird System is started, it will initiate data gathering send notifications; Graphics that use the Metric as a data and produce the graphics and pages that compose the final source and plot the data according to the type and format; result of the monitoring process. Now, application operators Pages for containing graphics relative to the same application will access these pages and visualize the graphics, fig. 3, in component or type of component. order to determine the status of the application. class Domain Model «include» Main Page Application Graphics Alarms + layout: char Screen Screen + title: char

+Contains 1

1..* Consult Follow Drill Application Down Link Graphics List + datasource: Metric + format: char + link: char + type: int

1

1 +DataSource

Application Operator Module Metric +DataSource + dataStore: DataStore + formula: char 1..* 1 Protocol

Fig. 3. Visualize Monitoring Output user case. 1 +Evaluates *

DtabaseScript WebServ iceRequest Alert + login: char + arguments: char + condition: char + SQL_statement: char + login: char + message: char + method: char 3 - ARCHITECTURE + state: int + WSDL_location: char As proposed in section 1.2, by modeling the Monitored Application as a set of interacting components it should be possible to obtain a more de detailed view of the application Fig. 5. Domain Model. status and performance, also, it should be easier to obtain Metrics that relate to the application performance goals. For monitoring purposes, which is the scope of this work, we and aggregation functionalities of the BlackBird System by consider a simplified definition of Application Component: i) computing the formula specified in the Monitoring Require- executes a well defined task within the application; ii) can ment. The Graphic class provides the visual presentation to be univocally referenced; iii) has a set of working parameters the Metric objects, it will use the output of the Metric as which can be obtained using the application interfaces. Based a data source and apply the type of graphic and the format on this definition of Application Component the model in fig. 4 request in the Monitoring Requirement. A Graphic may be represents the Monitored Application: a table of values or various chart formats. The Page class provides the base for generating the monitoring interface that will be accessed and navigated by the operators. By combining the information from the Page and Graphic objects, the BlackBird system generates the required interface as an WEB Applicaton containing the requested charts and tables. The Alert class provide automatic notification of performance problems. It is defined as boolean condition to be evaluated. Fig. 4. Application Model. 3.1 - BlackBird Components The Blackbird system uses an agentless architecture, it Figure 5 presents the high level Domain Model for the is composed of an Adaptation Layer itself composed by a BlackBird System. Each of the required Configuration Objects variable number local Interface Modules designed for specific defined in the requirements section is implemented by a protocols, an Aggregation Layer that handles data storage and dedicated class, except for Commands that is split between the Metrics calculation, and a Presentation Layer for generating Module class and subclasses dedicated to specific technolo- the monitoring pages and graphics. The Component Diagram gies, DatabaseScript and WebserviceRequest. The of the BlackBird System is represented in fig. 6. dataStore attribute of Module is a set of database objects that will store all data produced by the associated command 3.1.1 - Adaptation Layer and provide support to Metrics calculation. Provides one of the main features of the BlackBird System, The Module sub classes provide an adaptation layer that adaptation to the technologies of the Monitored Application. isolates the BlackBird architecture from any technology de- All technology and protocol specific processing is performed tails, they handle all technology specific logic like establish- by interface modules, where each type of interface module ing a connection, authentication, formatting the command, handles a specific technology or protocol, performs all the obtaining and validating the response. Finally, they convert tasks necessary for executing the requested command, converts the command result to a normalized Extensible Markup Lan- the command output to the normalized format and delivers guage (XML) document and deliver that document to the it to the dataStore. The Adaptation Layer also performs dataStore. For simplifying Metrics definition and calcu- the first step for providing a monitoring service adapted to lation, the dataStore is designed to be accessed as a rela- the architecture of the monitored system. By allowing mul- tional entity. The Metric class provides the data processing tiple Module to execute independently it becomes possible cmp Architecture

Presentation Agregation Layer Adaptation Layer 3.1.3 - Presentation Layer Layer

PageManager MetricManager ModuleManager Is responsible for the final output of the BlackBird Monitor- ing System, which are monitoring pages containing visualiza-

ApplicationServer Database tions of the status and performance of the Monitored Applica-

Database:: DataStore JDBC Monitored tion. The monitoring pages are generated from the Graphic ApplicationServer: DatabaseScript Application Page Module JDBC and Page objects and deployed on an Application Server as Database:: Metric a . However, the automated generation of ApplicationServer: JDBC Graphic: WEBSercices JDBC Database:: Module the WEB Application was considered to be out the scope WEBServiceCall Alert of this work, so the Webcockpit [16] application is used to generate the monitoring pages based on a configuration file created by the BlackBird System from the existing Page and Graphic objects. The PageManager component is Fig. 6. BlackBird Components. responsible for managing the generation and deployment of the WEB Application that provides the monitoring interface. to specify as many data sources as required for compiling 3.2 - Result Format a complete repository of performance data that will allow In order to isolate the BlackBird architecture from any the calculation of any relevant Metrics. The ModuleManager technology details it is necessary to define a normalized component controls execution of the Interface Modules and format for the results data, this format must be: i) capable manages the dataStore of those objects. For this work of containing any result; ii) easily convertible to a relational only two modules where developed, an SQL module that uses format. According to the application model defined at the JDBC to connect to the monitored Database for executing an beginning of this section (fig. 4), and to support component SQL statement and an WEB Services module that implements level monitoring, the output of a command executed on the the Apache Axis Framework for dynamically invoking Web monitored system may be defined as a tree of components services. These technologies are base to most of Vodafone each containing any number of working parameters, including applications, and allow the demonstration of the BlackBird a component identifier. According to this definition, any result capabilities in combining data obtained using different tech- may be formatted as a XML document with the schema from nologies. fig. 7. The BlackBird architecture is expandable to other types of protocols. Adding support for aditional protocols to the BlackBird system requires no changes to existing components only the development of: i) a new interface module to handle the required protocol; ii) two new configuration tables and associated views; iii) new stored procedures for the edit operations on the new type of module.

3.1.2 - Aggregation Layer Stores all performance data in an organized an easily accessible form, performs Metrics calculation and alerts ver- Fig. 7. XML Schema. ification. The Aggregation Layer contains the dataStore attributes from all existing Module objects implemented as database tables and views, the combination of all these objects 4 - IMPLEMENTATION AND DESIGN constitutes a complete repository of all performance data The BlackBird Database serves two purposes, storage and gathered from the Monitored Application. And, since all this processing of Performance Data from the Monitored Ap- data is accessible through relational queries, it should be plications, and support to the configuration and control of possible to implement any Metric required by the Application Monitoring Objects. The BlackBird System is intended for Owner. The Aggregation Layer is responsible for other of the an enterprise environment where security and role separation main features of the BlackBird System, Component Based is always a major concern, because of that for each Monitored Monitoring. Since all performance data gathered from the Application there is a dedicated database schema for contain application can be used as input for the Metric objects, it the implementations of dataStore Metric and Alert is possible aggregate the data collected by different Modules that exist for that application. The main BlackBird schema using the componet identifier to produce Metrics that provide a contains the definitions of all Monitoring Objects in use. complete view of all aspects for that Application Component. The MetricManager component is responsible for creating 4.0.1 - Main Schema and updating the implementation of Metric formulas, and The BlackBird main schema presented in fig. 8 is used actively evaluating alert conditions. for controlling the monitoring process, contains Monitoring Objects definitions, and additional entities for implementing then provides access to that same data through a relational application separation and access control. interface. The final implementation of dataStore is com- posed of a Results Table and a Translation View, result data class dbo is stored XML format in the Results Table, then based on SQL_command SQL_login WEBServ _login WEBServ _command the format of the XML result the BlackBird System creates a

0..1 0..1 0..1 0..1 view for selecting all XML elements as columns of result set. The complex implementation of the dataStore allows the

1 * implementation of the Metric object simply as a database module * 1 view created from the SQL formula defined in the Metric. 1..* metric

1..* * One of the most important functions of the BlackBird

1 1 1..* System is the management of the dataStore. The creation

graphic 0..1 alert 0..* of a new Module requires the creation of a new Results 1..* Table and the associated Decoding View. The Results Table 1..* 1..* 1 has a fixed definition so it’s creation posses no problems. page The Decoding View definition however, performs the mapping

1..* of XML elements in relational columns, thus it depends 1 1 application on the results format. For creating the Decoding View, the 1 1 1 ModuleManager requests the Module a XML Schema 1..* 1..* 1 Definition (XSD) which describes the XML document that 1 operator application_owner error_messages it will be used for delivering command results. To determine 1..* * the result format the Module execute the command on the Monitored Application for the first time and use the first Fig. 8. Main Schema. command response to produce the XSD schema that will describe the result XML document. The management process of Metric is much simpler, upon changes to the formula, 4.0.2 - Monitored Application Schema the existing view is dropped and recreated according the new For each application there is a separate schema that contains SQL select statement. the result data obtained from the Monitored Application. This 4.1 - schema contains the database objects that implements the The BlackBird interface supports two main functions, vi- dataStore and Metric. The creation and modification sualization of the final output of the monitoring process, of all database objects in the Client Schema is performed and configuration of the Monitoring Objects. The interface is automatically by the BlackBird System as a result of changes implemented as a WEB application running on an Apache to the Configuration Objects. Figure 9 presents an example Tomcat application server. Figure 10 represents the page schema containing the implementation of two dataStore and structures of the BlackBird interface. two Metric objects.

ui Interface

App Owner Page

Application List

+Select Application 1

*

Application Page

Graphics Pages

Configuration Page +Select Graphics Page

Alerts 1 1 Configuration Page Graphics Page

1 * Objects Lists Charts

Edit Actions Tables

+Select Operation 1

*

Edit PAge

Input Parameters

Fig. 10. BlackBird Interface.

Fig. 9. Monitored Application Schema. The starting page for both monitoring and management is the Application Owner Page, it presents a list of all The dataStore provides a vital function of the BlackBird applications currently being monitored and provides links to System, it receives data in the form of XML documents and the Application Pages and to the configuration page. The Application Page presents contacts for alerts and a list of Communication between these systems is provided by a active alerts each with a links to a related graphic page. middleware of WEB Services from WebMethods. The PPB Presents also a list of all Graphics Pages configured and for application was developed at Vodafone Portugal, it contains this application. The final output of the monitoring process is an Oracle database and was developed using C and PL/SQL. an WEB application containing charts and tables that provide There is already a dedicated monitoring system for PPB a visual representation of Metrics related to the performance which was developed together with the main application, goals of the monitored application. From the existing Page and provides performance information and alerts. Although and Graphic objects, the PageManager generates the it delivers extremely valuable information, it implements a Webcockpit configuration file that specifies the contents of the static set of Configuration Objects and is not applicable to required WEB application. Then, the Webcockpit application any other applications. This example will add monitoring of is used to generate the Java Server Pages (JSP) that implement recent functionalities of the PPB application that have not yet the required charts and tables. been included in the dedicated monitoring system, currently Configuration pages use the same application server and all only basic unavailability alerts are provided. active operations are also implemented using JSPs, however One of the tasks performed by the PPB application is these are static pages independent of any monitoring config- credits to the costumer balance in the IN platform. Recent uration. The main Configuration Page presents a list of all Business Requirements ”Vita Recarga” and ”SOS Extra” have Configuration Objects of the various types currently configured introduced new scenarios on which it is necessary to apply for this application and available edit actions. The contents credits to the customer balance. When the condition defined of the Edit Pages depends on the type of monitoring object in the Business Requirement is verified, the PPB system and the requested change. For each type of Monitoring Object generates a request for balance credit, this request is then there is a set of stored procedures for performing all valid added to a queue for processing. The processing queue exists changes, and a view for obtaining all relevant configuration as a database table containing requests to be processed, and data. Based on these objects, the Edit Page presets a form the processing of each request requires a call to a WebService for collecting these parameters. Since Edit Pages are prepared for confirming current balance. The main performance criteria to determine the necessary management stored procedures for this task are the average processing time and maximum from the monitoring object type, the addition of new Interface processing time for these requests, the current monitoring Modules does not require any changes to the Configuration includes alarms if thresholds on these parameters are exceeded. Pages or Edit Pages, it only requires the creation of the appropriate stored procedures and views. 5.0.2 - PPB Monitoring Example For monitoring this aspect of the PPB application, problems 5 - CASE STUDY diagnosis and performance tuning, it would be extremely The case stydy examples focus on two applications currently helpful to have a line chart presenting the average request under the responsibility of the Billing & Mediation team at execution time, the average Web Service response time, and Vodafone Portugal DTSI, the Pre Paid Billing System (PPB) the difference between the two times, which represents the and the Mediation Device (MD). Both are Mission Critical contribution of the PPB System to the total processing time. applications that require 24 hour support and monitoring. This is a relatively simple example, however, the presented The monitoring system currently used for monitoring the parameters have direct relation to the application performance applications is HP Openview Operations, this system provides goal. Furthermore, by splitting the application goal into sub complete network and device monitoring, and also application parameters, it becomes easier to identify the contribution of level monitoring based on application specific scripts and each application component to the final result. The output of programs. These examples aim to extend the detail level of this graphic is also helpful for performance tuning, since it monitoring, by providing real time graphics and alerts on the illustrates the impact of configuration changes on each of the performance goals of these applications, which is only possible parameters that contribute to the application performance goal, through the combination of multiple sources of performance it should be easier to establish a compromise that produces the data. The BlackBird System is currently integrated in the best overall results. production environment of these applications, and all the results presented here where obtained from the production 5.0.3 - Mediation Device environments of the applications. The second example applies to the MD system also from Vodafone Portugal. The Mediation Device processes billing 5.0.1 - Pre Paid Billing System records of all traffic types, GSM, 3G, GPRS, SMS, etc. Col- The PPB application is responsible for lifecycle manage- lects billing records from all Network Elements, validates and ment of all pre-paid clients, interacts with SIEBEL which selects billable records, pre-rates and sends billable records is the Customer Relationship Management (CRM) solution to the billing systems (ARBOR) and PPB. The Mediation employed by Vodafone Portugal, and with the Intelligent Device was also developed at Vodafone Portugal, it contains Network (IN) system from Alcatel-Lucent that handles real an Sybase database and was developed using C and Transact time rating and the management of the costumer balance. SQL. Currently all monitoring and alerts on the Mediation Device Application are provided by scripts and programs integrated in the Openview Operations System. This example will use some of the same scripts as input and use the data processing capabilities of the BlackBird System for providing Metrics directly related to the application performance goals. The Mediation Device processes various types of records refereed to as Data Stream, and for each Data Stream, there are various Network Elements. Thereby, the Mediation Device architecture can easily be modeled into components. The basic components are the Network Elements that have parameters such as current delay and records processing rate. These elements can be grouped into the main application components the Data Streams. The most critical SLA defined for the Mediation Device are all related to the delay between record generation on the network element and its delivery to the desti- nation billing systems. Given these, the fundamental indicator Fig. 11. PPB Monitoring Configuration Page. of performance for the Mediation Device is the number of records processed per unit of time. This example uses the Seconds existing scripts as data source for creating charts on the delay associated with network elements and records processing rate according to type of traffic.

5.1 - Configuration Process Once the required output is defined, the Application Owner must elaborate the Monitoring Requirement in order to specify the Configuration Objects necessary for producing the desired monitoring output. The Monitoring Requirement for the PPB example contains the following Configuration Objects: i) Lo- gin details for the PPB database; ii) Login details for WEB Fig. 12. PPB Queue. Service calls; iii) SQL command for obtaining the average processing time; iv) WEB command for obtaining the Web Service response time; v) Metric for combining data from both In this example the BlackBird System was used for fast commands; vi) A line chart that uses that Metric; vii) A page implementation of a detailed monitoring service, and to gen- for containing the the Graphic. erate Metrics that aggregate data obtained using different The configuration of the Monitoring Requirement can be technologies, Database access and WEB Services call. performed either by the BlackBird Manager or the Application The chart from figure 13 shows the evolution of the delay in Owner using the Configuration Interface described in the Call Records processing for the Network Element components previous section. Starting from the Configuration Page the user of the MD application, only the ones that present a delay above selects and action to be applied on an existing Configuration a defined threshold are shown. Since this is batch process Object or the creation of a new object. Then the user is pre- the delay tends to evolve in steps. The figure was captured sented with a form for providing or updating the configuration during a peak hour so there is a tendency for an increase in parameters of the object. Upon edit confirmation the user is processing delay. On the bottom there is a table that shows returned to the Configuration Page and the BlackBird Manager current processing status of Network Elements. proceeds to execute all the necessary actions such as creating Figure 14 shows a chart presenting the evolution of number and updating the objects in the Application Schema. of Call Records processed per hour for the Stream components Figure 11 shows the monitoring configuration page for the of the MD. This figure was captured during a maintenance PPB application listing the Monitoring Objects configured for operation that required stoping the processing of some types this example. of traffic. In this example the BlackBird System was used to improve the existing monitoring to detail the information 5.2 - Monitoring Output on the component level, and provide real time visualization of The final output of the PPB monitoring example is shown Metrics directly related to the SLA an performance indicators. in figure 12. This chart shows the evolution of the requests These examples are not enough explore the full capabilities processing time and the contributions of the WEB Service call of the BlackBird System, all aspects the architecture are aimed and PPB processing. This capture shows normal conditions, at flexibility, so that an Application Owner may implement any where the contribution of WEB Service call to final processing monitoring he regards as necessary. Even though these simple time is negligible. examples are enough to show that the BlackBird System can Minutes WMI for .NET application. Another possible improvement would be to usage of dynamically generated arguments for the commands executed on the Monitored Applications. Although the data repository provides easily accessible source of data, the definition of Metrics is still the most de- manding configuration task. This task can be greatly simplified by the addition of a query building interface, either as an applet on the Edit Page or as a separate application. For visualization purposes, it could be possible to have better drill down capabilities and root cause analysis by automatically generating a hierarchical view of the Monitored Application, such as the one provided by the Spotlight [11] monitoring system. This would require an improved defini- tion of application modeling that also includes the relations between Applications Components.

REFERENCES [1] Hewlett-Packard Development Company., “HP Operations Dashboard Fig. 13. Mediation Device Network Elements. software,” http://www.managementsoftware.hp.com/products/ovd/ds/ovd ds.pdf, last visited on September 2007.

Records [2] Engineering Task Force., “Simple network management proto- col (snmp) applications,” ftp://ftp.rfc-editor.org/in-notes/rfc3413.txt, last visited on September 2005. [3] Ethan Galstad., “Nagios 3.x documentation,” http://nagios.sourceforge.net/docs/nagios-3.pdf, last visited on Septem- ber 2007. [4] Microsoft Corporation., “System Center Operations Manager,” http://technet.microsoft.com/en-us/opsmgr/default.aspx, last visited on September 2007. [5] Oracle Corporation., “Oracle Enterprise Manager,” http://www.oracle.com/technology/products/oem/prod focus/db mgmt.html, last visited on September 2007. [6] AdventNet, Inc., “Applications Manager,” http://manageengine.adventnet.com/products/index.html, last visited on Fig. 14. Mediation Device Streams. September 2007. [7] Sun Microsystems., “SR-000003 JavaTM Management Extensions (JMXTM),” http://jcp.org/aboutJava/communityprocess/final/jsr003/index.html, last be effectively used for providing an advanced monitoring visited on September 2007. service to real world enterprise application. [8] Microsoft Corporation., “Windows Instrumentation: WMI and ACPI,” http://www.microsoft.com/whdc/system/pnppwr/wmi/wmi-acpi.mspx, 6 - CONCLUSIONSAND FUTURE WORK last visited on September 2007. [9] H. Newman, I.C.Legrand, P. Galvez, R. Voicu, and C. Cirstoiu, “Mon- The BlackBird System application examples on production alisa : A distributed monitoring service architecture,” CHEP03, 2003. Enterprise Applications have demonstrated that the BlackBird [10] Heroix Corporation., “Longitude,” System can be used for monitoring enterprise applications with http://www.heroix.com/agentless/agentless performance monitoring main.htm, last visited on September 2007. non standard complex architectures, and that it can provide [11] Quest Software, Inc., “Spotlight for BEA WebLogic Server,” valuable metrics close to the main application goals. More http://www.quest.com/spotlight-for-bea-weblogic-server/, last visited on specifically, these examples show that the BlackBird System September 2007. [12] H. Kreger, “Java management extensions for application management,” can be used for: IBM SYSTEMS JOURNAL, vol. 40, 2001. [13] N. K. Diakov, M. van Sinderen, and D. Quartel, “Monitoring extensions • fast implementation of a detailed monitoring service; for component-based distributed software,” Fifth International Confer- • generating Metrics that aggregate data obtained using ence on Protocols for Multimedia Systems, 2000. different technologies; [14] A. Diaconescu, A. Mos, and J. Murphy, “Automatic performance man- agement in component based software systems,” Proceedings of the • improving the existing monitoring to detail the informa- International Conference on Autonomic Computing, 2004. tion on the component level; [15] A. Mos and J. Murphy, “Compas: Adaptive performance monitoring of • providing real time visualization of Metrics directly re- component-based systems,” Proc. of the 2nd ICSE Workshop on Remote lated to the SLAs an performance indicators. Analysis and Measurement of Software Systems, 2004. [16] Peter Jonathan Klauser, “Webcockpit Open Source Project,” In terms of data collection and interface to the Monitored http://webcockpit.sourceforge.net/schema/schema.html, last visited on Applications, the natural improvement is the development of September 2007. additional Interface Modules for additional protocols: Remote Shell for access to unix systems, JMX for J2EE applications,