Maintaining Quality of Service Based on ITIL-Based IT Service Management

Koji Ishibashi (Manuscript received January 18, 2007)

Interest in the IT Infrastructure Library (ITIL) of system management best practices has increased in recent years, and ITIL is starting to be incorporated into IT systems. To help with this incorporation, Fujitsu provides the Systemwalker product group, which supports ITIL-based IT service management. ITIL contains many kinds of management processes. In this paper, we focus on the service delivery area, which includes capacity, availability, and service level management, and discuss the functions provided by Systemwalker Service Quality Coordinator (SSQC) and Systemwalker Availability View (SAView) from the ITIL perspective. An overview of the architecture used to implement these functions is also included.

FUJITSU Sci. Tech. J., 43, 3, p.334-344 (July 2007)

1. Introduction

The IT service management processes of the IT Infrastructure Library (ITIL)1) arise from the following two core areas:
1) Service support: Processes related to the daily operation and support of an IT service
2) Service delivery: Long-term planning and improvement processes related to IT service provision

In this paper, we mainly discuss the service delivery part of these two core areas, which contains the processes used to maintain the high quality of the services provided by an IT system.

In 2006, the Systemwalker products, which were launched in 1995 as Japan's first integrated IT service management products, were enhanced to the V13 versions to support all of ITIL. In particular, Systemwalker Service Quality Coordinator (SSQC) and Systemwalker Availability View (SAView) are related to the service delivery area. Figure 1 shows the Systemwalker architecture.

Figure 1 Systemwalker architecture. [Diagram. Labels: systems operations know-how; Service Quality Coordinator and Availability View (availability management/capacity management); IT Service Management (incident management); IT Process Master (process management); Centric Manager; Operation Manager (job scheduling/automatic operations); Desktop series (prevention of information leaks from PCs); Resource Coordinator (resource control); servers, network, and client PCs.]

SSQC was launched at the end of 2003. It provides capacity and [...] functions and has been widely accepted in the Japanese IT market.

SAView is a new product that was launched in 2006 and provides visualization of service availability.

SSQC and SAView can be positioned as products that play a supporting role in implementing the following management processes that fall under the ITIL service delivery core area:
1) Capacity management
2) Availability management
3) Service level management

The functions and architecture of SSQC and SAView are described below.

2. Capacity management

The aim of capacity management is the continued provision, now and in the future, of business services that are highly cost-effective in terms of capacity and performance. To achieve this end, capacity management clarifies the business service requirements, the business service capabilities that the current IT system can provide, and the IT infrastructure required to provide business services in the future.

Some examples of the use of capacity management are:
1) Expanding the IT infrastructure in preparation for future increases in transaction throughput
2) Performance tuning so IT system resources are used effectively
3) Predicting the requirements of business services in the future

The personnel of an IT infrastructure support organization must perform various processes to implement capacity management. For example, they must measure and monitor IT system performance, predict service deployment and demand, and perform capacity planning and tuning.

To assist in these capacity management processes, SSQC provides functions for collecting and analyzing capacity and performance information from all parts of an IT system, ranging from the infrastructure to the business application software.

The capacity and performance information collected by SSQC is about the following types of resource usage in the IT system infrastructure:
1) CPU usage rate, CPU queue length
2) Disk busy rate, number of disk queue requests
3) Available memory capacity, number of swap-in/swap-out operations
4) Disk usage rate

SSQC can also collect the following types of performance information concerning the middleware of an IT system:
1) Response time at each client PC
2) Number of Web server processing requests and the response times for those requests
3) Number of application (AP) server requests, wait time, and processing time
4) Execution multiplicity for batch processing
5) SQL execution time on the DB server
6) Amount of free table space area for the DB

In addition, SSQC can collect the throughput of business applications by establishing a data import interface with them.

The above information can be used to perform, for example, the following types of capacity management:
1) Establishment of the criteria for the resource capacity required for business based on correlation analysis of business application throughput, Web server throughput, and CPU usage rates
2) Prediction of future processing demands based on time series analysis of business application throughput
3) Prediction of CPU and disk resource capacities that will be required in the future based on predictions of future business throughput

Figure 2 shows some example SSQC reports. Figure 2 (a) shows the result of a correlation analysis, and Figure 2 (b) shows the result of a time series analysis.

Figure 2 Example SSQC reports: (a) result of correlation analysis; (b) result of time series analysis.

SSQC provides many kinds of analysis functions, and by using these functions, the IT infrastructure support organization can easily perform the capacity management processes defined in ITIL.

3. Availability management

The purpose of availability management is to maintain a high level of availability for the services provided by an IT infrastructure with a favorable cost-effectiveness in order to achieve business goals.

For example, availability management can be used to:
1) Monitor whether IT services are being processed as planned
2) Reduce the fault occurrence frequency in an IT infrastructure by performing preventive maintenance
3) Keep the mean time between failures (MTBF) at a high level by minimizing the downtime due to faults

The personnel of an IT infrastructure support organization must perform various processes to implement availability management. For example, they must design and implement the IT system availability and measure, monitor, report, and improve the IT system availability.

To assist in these availability management processes, SAView provides a function for monitoring business services according to their operation plans. SAView can also maintain activity logs of business services to enable the availability to be visualized. Figure 3 shows two examples of SAView screens.

Figure 3 Example SAView screens: (a) monitoring business services; (b) activity logs of business services.

SSQC also assists in availability visualization by polling to check the service availability and by providing service downtime reports.

In addition, SSQC provides the following troubleshooting functions for minimizing service interruptions caused by performance problems in the IT infrastructure:

1) Drill Down View screen
SSQC can display detailed IT infrastructure resource information and middleware performance information from the time a performance problem arises. Users can compare these values with the values obtained at times of normal operation to see at a glance the cause of the problem. Items showing large fluctuations in value can be considered related to the cause of the problem. Figure 4 shows an example of a Drill Down View screen.

Figure 4 Drill Down View screen.

2) Transaction breakdown analysis
When SSQC is used together with Fujitsu's Interstage2) Application Server and Symfoware3) Server, it can detect the location of performance bottlenecks in online transactions. Figure 5 shows an overview of transaction breakdown analysis.

Figure 5 Overview of transaction breakdown analysis. [Diagram: a request passes through the Web server, AP server, and DB server. Annotations: monitoring of transaction throughput and average/maximum processing times at each server; analysis of processing time breakdown, in transaction units, at each server; analysis of processing time breakdown, in transaction units, for Web/EJB applications at each server. IBAS: Interstage Business Application Server.]

3) Response time breakdown analysis
SSQC monitors the responses of Web applications. It also measures and displays the time taken for these responses and the time taken to download the elements of the displayed HTML screen.

SSQC, therefore, not only monitors availability but also provides functions for investigating the causes of problems.

To maintain IT system availability, periodic IT system reviews about failures and system weaknesses are important. Furthermore, Fujitsu regards these investigation functions as being important for maintaining availability from the viewpoint of reducing the mean time to repair (MTTR).

For example, in a certain data center, by using SSQC, the cause of a slowdown was detected and system operation was restarted in an hour. The problem was caused by exhaustion of the DB temporary area and had also occurred in the previous year. However, because SSQC was not in use at that time, the cause was not investigated and it took 10 hours to restart operation. In this case, SSQC reduced the IT system MTTR to just 10% of the previous value.

These investigation functions were incorporated into the first version of SSQC and distinguish it from other similar products.

4. Service level management

The purpose of service level management is to maintain and improve the quality of an IT service. Service level management obtains a consensus between a service provider and recipient concerning the quality of an IT service and monitors, reports, and reviews the quality for a specified period.

The following are some examples of service level management:
1) The IT service provision department guarantees the maximum response time and reports the monthly response status to the IT service users.
2) The IT service provision department guarantees the upper limit for the amount of service downtime in a month and provides continuous monitoring and improvement to uphold this guarantee.

The first requirement of service level management is for the service provider and recipient to reach an agreement and establish a service level agreement (SLA). Service level management must then incorporate the SLA into the IT infrastructure and continually monitor and report the service level. The Quality of Service (QoS) provided to the recipient must then be maintained and improved.

In practice, not all SLAs are explicitly agreed on and set between the provider and recipient. Instead, some SLAs are implicitly set, especially in in-house IT systems.

4.1 Management processes

Because Fujitsu has constructed and operated many mission-critical IT systems, we have abundant experience in service level management.

The ultimate aim of service level management is to maintain and improve the QoS. To achieve this, the following management processes related to the service level must be continuously performed:
1) Monitoring
2) Reporting
3) Reviewing
4) Predicting
5) Maintaining

The above processes suggest implementation of the capacity management and availability management described above.

We have previously postulated the following as service level management processes:4), 5)
1) Determine the SLA.
2) Determine configurations in accordance with the SLA.
3) Collect information required for automation of on-going processes, regular performance information, and other information.
4) For short-term problems:
• Detect problems
• Identify potential problems
• Predict problems
• Generate alerts indicating problem occurrences
These processes relate to availability management.
5) Write regular service level reports based on the SLA. Include predictions concerning the next reporting period.
6) Predict medium-term problems.
This process also relates to availability management.


7) If required, conduct capacity planning and tuning studies.
This process relates to capacity management.
8) Submit an SLA report.
9) Review SLA-related requirements.
10) Change tools and the environment.

Figure 6 shows the relationships between these processes.

Items to note are the regular implementation of information collection, problem detection, reports, and reviews. The processes to be performed only on demand are capacity planning and tuning.

4.2 Provided solutions

SSQC supports all of the above capacity and availability related processes, for example, information collection, analysis, and reports.6)

The SLAs that are subject to service level management are not limited to performance and availability related items. Information handled by the ITIL service support components is also used for indices. Some examples of this information are:
1) Average time required for the service desk managed by the IT service provision department to resolve incidents reported by IT service recipients
2) The number of proactive problem analyses performed by the IT service provision department

These types of indices must also be targeted as part of service level management.
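As a rough illustration of how indices of this kind might be computed, the sketch below derives an average incident resolution time and checks a month's downtime against an agreed upper limit. The record layout, the function names, and the sample figures are assumptions made for this example; they are not SSQC's data model or interface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    """Hypothetical service desk record (not an SSQC structure)."""
    reported: datetime
    resolved: datetime


def average_resolution_hours(incidents):
    """Mean time from report to resolution, in hours (0.0 if there are no incidents)."""
    if not incidents:
        return 0.0
    total = sum((i.resolved - i.reported for i in incidents), timedelta())
    return total.total_seconds() / 3600 / len(incidents)


def downtime_within_sla(outage_minutes, monthly_limit_minutes):
    """True if the month's total downtime stays within the agreed limit."""
    return sum(outage_minutes) <= monthly_limit_minutes


# Illustrative data only: two incidents resolved in 2 h and 4 h.
incidents = [
    Incident(datetime(2007, 1, 9, 9, 0), datetime(2007, 1, 9, 11, 0)),
    Incident(datetime(2007, 1, 15, 14, 0), datetime(2007, 1, 15, 18, 0)),
]
avg_hours = average_resolution_hours(incidents)
within_limit = downtime_within_sla([30, 45], monthly_limit_minutes=120)
```

Keeping each index as a small function over plain records makes it straightforward to add further service support indices, such as a count of proactive problem analyses, to the same monthly report.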

Figure 6 Service level management processes. [Flowchart. Boxes: START; determine SLA; configure; collect/store performance and other data; detect/predict short-term problems, generate alerts; generate service level report; publish report; review SLA/requirements; change tools/environment; predict medium/long-term problems; generate alert; conduct capacity planning study, generate report; conduct tuning study, generate report. Legend: once only, continuous, on-demand.]
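One box in Figure 6, predicting medium- and long-term problems, can be illustrated with a simple trend extrapolation. The sketch below fits a least-squares line to daily CPU usage samples and estimates how many days remain until the trend reaches a threshold. The linear model, the 80% threshold, and the sample data are assumptions for illustration; the paper does not state which time series method SSQC uses.

```python
def fit_line(values):
    """Least-squares slope and intercept for values sampled at x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(values, threshold):
    """Days after the last sample until the fitted trend reaches the threshold.

    Returns None when the trend is flat or falling, and 0.0 when the
    fitted line has already reached the threshold.
    """
    slope, intercept = fit_line(values)
    if slope <= 0:
        return None
    last_x = len(values) - 1
    crossing_x = (threshold - intercept) / slope
    return max(0.0, crossing_x - last_x)


# Illustrative data only: CPU usage in percent, one sample per day.
cpu_usage = [50.0, 52.0, 54.0, 56.0, 58.0]
days_left = days_until_threshold(cpu_usage, threshold=80.0)
```

A result such as this could feed the on-demand capacity planning study: a short remaining time suggests raising an alert, while None suggests no action is needed for that resource.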


SSQC can handle these types of index information as user information and supports the reporting of this information.

5. Architecture

SSQC enables capacity management, availability management, and service level management of an IT system, and SAView enables the visualization of availability. The architectures of SSQC and SAView are described below.

5.1 SSQC

5.1.1 Three-layer construction

As shown in Figure 7, SSQC is implemented using a three-layer architecture. The functions of these layers are as follows:

1) Agent layer
This layer performs data collection.
• Agent
An agent is an operation unit installed on a managed server. Agents collect resource information and performance information concerning applications, Web servers, AP servers, DB servers, and the platform operating system itself. Agents also store the collected information, without changing its format, during periods specified for problem analysis purposes.
• Browser agent

Figure 7 SSQC architecture. [Diagram. Labels: enterprise manager (monitoring, reporting) with report framework and PDB distributed DB holding summary data; manager and proxy manager with PDB distributed DB holding detailed data; agent with troubleshooting log; browser agent; data transport paths and an HTTP path between the layers. PDB: Performance database.]

A browser agent is an operation unit installed on an end-user PC that measures the end-user response time.

2) Manager layer
This layer collects and stores information.
• Manager
A manager is an operation unit installed on an admin server in the department. Managers gather the information collected by the agent layer and store detailed information. They also send summary information to the enterprise manager described below. Managers also perform polling to collect service activity status information.
• Proxy manager
A proxy manager is an operation unit that operates on behalf of a manager to collect service activity information and information from agents. Proxy managers are used for two reasons. One reason is to distribute the processing load by collecting information on behalf of overloaded managers. The other is to reduce the number of data transport paths. As a security policy in particular, it is generally recommended to reduce the number of internal-external paths that pass through a firewall server, such as data transport paths connecting external agents with internal managers protected by the firewall. For example, if a proxy manager is set outside the firewall, it can collect information from external agents and send it to internal managers through a single data transport path.

3) Enterprise manager layer
This layer shows information about the entire IT system.
• Enterprise manager
The enterprise manager is an operation unit on an enterprise admin server. The enterprise manager stores the information sent from the managers in each department, holds the report framework, and performs status monitoring and reporting.

Collected data is sent from the lower layers to the upper layer through data transport paths. Either of the following two information transfer modes can be selected for the data transport paths:
1) Push mode: This mode enables just-in-time information transfer when data is collected. A proprietary protocol is used to push data up from the lower layers to the upper layer.
2) Pull mode: HTTP requests are sent from the upper layer to the lower layers, and information is pulled up in response to these requests. This mode enables secure data transfer from agents or proxy managers outside the firewall to the internal manager.

In many cases, both the enterprise manager and the managers are installed together on a single server that runs as just one IT service management server.

As described above, the functions of each layer can be customized. For example, the enterprise manager and the managers can be arranged by installing the report frameworks of the enterprise manager in each department server. In this configuration, senior managers can access all systems data from the enterprise manager, and department managers can use the managers to access systems data only in their departments.

5.1.2 Distributed database and presentations

The information collected by SSQC is handled in a number of forms by agents and stored in a distributed database in the enterprise manager and the managers.

The collected data is classified by resolution. SSQC keeps data having a rather coarse resolution for long-term analysis and fine-resolution data for the trouble investigation function. This data is automatically deleted from the distributed database at the specified expiration time.

One of the distinguishing points of SSQC is that it collects several types of calculated data for different purposes in a distributed database.

Table 1 shows the data management scheme, that is, the server on which each type of data is stored and its retention period.

Table 1 Data management scheme.
Data form               Storage scheme
1-minute resolution     Stored at agent system
10-minute resolution    Stored on department server and kept for 7 days
1-hour resolution       Stored on department server and kept for 6 weeks
1-day resolution        Stored on department server and kept for 53 weeks
Summary                 Stored on enterprise server and kept for 3 days (renewed daily)

The report base is the main presentation function in the enterprise manager. It accesses the above distributed database to retrieve the contents to be displayed and analyzed. It extracts and analyzes the required information, implements presentations and monitoring, and provides reports for managers.

The utilization of these reports completes the series of service level management processes and enables service level reporting, short-term and long-term status analyses, and trouble investigation.

5.2 SAView architecture

Figure 8 shows the SAView architecture.

Figure 8 SAView architecture. [Diagram. Labels: IT operation management server with Systemwalker Availability View Manager (EJB interface, message interface, linkage to external applications and other systems) and Systemwalker Centric Manager; business server 1 with Systemwalker Availability View agent, Systemwalker Operation Manager, and Systemwalker Centric Manager; business server 2 with Systemwalker Centric Manager.]

Like other Systemwalker products, SAView comprises a manager and agents.

Agents are assigned to each business server and collect batch processing activity information from Systemwalker Operation Manager.7) The collected activity information is sent to the manager, where it is stored as activity logs. A comparison of this information with the business planning information defined in the manager enables batch processing availability to be visualized based on plans and actual results.

SAView also has the following interfaces for collecting activity information concerning other types of processing:
1) EJB interface
This interface is provided by the manager of SAView. It receives business system activity information directly from applications.
2) Message interface
This interface receives event messages from Systemwalker Centric Manager on business servers and admin servers.

SAView can collect start and stop information about any business activity by defining event messages.

6. Conclusion

In this paper, we described the functions provided by SSQC and SAView with reference to ITIL and described the management processes these functions support. We also described the product architectures required to implement these processes.

Currently, SSQC and SAView are accepted by customers as effective tools for performing ITIL service delivery processes.

However, to improve their flexibility and usability, we are planning to add the following SSQC and SAView functions in the future:

1) SOA-based architecture support
Support for Service Oriented Architecture (SOA) based architectures will enable flexible access to the information held by the ICT infrastructure management functions so this information can be used for service level management. This support will also make it easy to implement service level management that is linked to the information held by ITIL service support functions.

2) Flexibility by using a federated CMDB
By using system configuration information stored in a federated configuration management database (CMDB), our products will improve the availability of IT systems by providing capabilities such as troubleshooting of problems caused by resource faults or by changing the IT system configuration.
In addition, the information held by SSQC and SAView will be able to be used more flexibly when it can be provided through a federated CMDB.

3) Dashboard
A dashboard that supports all the components of service delivery (service level management, availability management, and capacity management) will provide a flexible and integrated visualization GUI.

In conjunction with the above product enhancements, we also plan to continue research into service management systems that conform to ITIL.

We hope that these activities will lead to even greater benefits for Fujitsu's customers.

References
1) The IT Service Management Forum (itSMF) International: The knowledge Network for IT Service Management. http://www.itsmf.org/
2) T. Kosuge and T. Ishikawa: Interstage: Fujitsu's Application Platform Suite. FUJITSU Sci. Tech. J., 43, 3, p.274-284 (2007).
3) T. Goto: Disaster Recovery Feature of Symfoware DBMS. FUJITSU Sci. Tech. J., 43, 3, p.301-314 (2007).
4) M. Tsykin et al.: Automated Monitoring and Reporting of Enterprise Quality of Service. Proceedings of the 7th World Multi-conference on Systemics, Cybernetics and Informatics (SCI2003), Orlando, July 2003.
5) M. Tsykin et al.: On Automated Monitoring of SLAs. CMG Journal of Capacity Management, Summer 2002, CMG, p.27-36.
6) K. Ishibashi and M. Tsykin: Management of Enterprise Quality of Service. FUJITSU Sci. Tech. J., 40, 1, p.133-140 (2004).
7) Fujitsu: Systemwalker Operation Manager. http://www.fujitsu.com/global/services/software/systemwalker/products/operationmgr/index.html

Koji Ishibashi, Fujitsu Ltd.
Mr. Ishibashi received the B.S. degree in Communication Engineering from Osaka University, Osaka, Japan in 1981. He joined Fujitsu Ltd., Kawasaki, Japan in 1981, where he has been engaged in research and development of system management software since 1990. He is currently responsible for developing Systemwalker Service Quality Coordinator and Systemwalker Availability View.