Analyzing Software Component Graphs of Grid Middleware: Hint to Performance Improvement*

Pingpeng Yuan, Hai Jin, Kang Deng, and Qingcha Chen

Service Computing Technology and System Lab
Cluster and Grid Computing Lab
Huazhong University of Science and Technology, Wuhan, 430074, China
[email protected]

Abstract. Grid middleware is an important kind of service management tool and is composed of different interacting units. The units of a grid middleware do not interact in random ways and are very well connected. As shown in this paper, irrespective of the specific features of each grid middleware analyzed, the final outcome is a small-world, hierarchical component diagram with well-defined statistical properties. These network measurements are largely independent of the particular application and indicate key execution paths or key elements. Therefore, analyzing the structure of grid middleware can provide a roadmap for tuning its performance. Based on analysis of those diagrams, the key components of grid middleware are outlined.

Keywords: Grid computing, Middleware, Performance, Small world.

1 Introduction

Grid middleware is a software stack designed to present disparate computing and data resources in a uniform manner, such that these resources can be accessed remotely by client software without knowing a priori the systems' configurations. Thus, development of grid middleware is a challenging activity that requires considerable expertise. Several grid middleware systems currently exist, such as Globus [1], UNICORE [2] and gLite [3]. However, for various reasons, their performance is not good enough, so adoption of grid middleware is restricted to a few domains.

Software, including grid middleware, is composed of components that interact with each other. One of the aims of software development is to address the interactions of components.
The action of software development leads to emergent software organizations whose structures lie outside the realm of explicit design. The performance of software is closely related to its structure. Understanding the structural features of software systems may provide models, metaphors, and tools to help us tune performance or reach better designs.

Recently, after the underlying structures of many natural and artificial systems were found to share scale-free and small-world qualities, software systems have been identified as another important class of complex networks. Some studies have reported that the internal structures of software programs exhibit scale-free properties. For example, Myers [4] studied six open source software projects and showed that the networks formed by their class collaboration or call graphs had approximately scale-free properties. Although the research mentioned above reported that software networks are scale-free, little research based on software network analysis has addressed how to improve performance.

* This paper is supported by the National Science Foundation of China under grant No. 90412010.
A. Bourgeois and S.Q. Zheng (Eds.): ICA3PP 2008, LNCS 5022, pp. 305–315, 2008. © Springer-Verlag Berlin Heidelberg 2008

The objective of this research is to improve understanding of the nature and the performance factors of grid middleware through quantitative measurement and structural analysis of existing grid middleware, and to offer some suggestions on optimizing the performance of grid middleware during development.

Section 2 introduces the related work. Section 3 presents software graphs and their topological measurements. In Section 4, three kinds of grid middleware are introduced briefly. Section 5 first describes how to obtain component graphs from existing grid middleware, then presents statistics of those component graphs.
At the same time, some discussion and comments about the implications for grid middleware development are presented. Finally, Section 6 concludes the paper.

2 Related Work

Software is built up out of many interacting units and subsystems at many levels of granularity: subroutines, classes, source files, libraries, etc. The performance of software systems is closely related to software measures. Early measures of software centered on intra-module aspects such as program length or number of lines of code (LOC). Those measures cannot reflect the complicated relationships between software units. Recently, there has been growing interest in analyzing software structure and in inter-module measurement of software architecture. As a result, graph theory, which studies the properties of graphs, has been widely accepted as a tool for analyzing software unit diagrams. The study of graph properties can be valuable in many ways for understanding the characteristics of the underlying software systems.

Current graph-theory-based research has mainly focused on identifying statistical characteristics of software unit diagrams. Many studies have found that software unit diagrams are scale-free networks. Nathan LaBelle and Eugene Wallingford [5] analyzed complex networks in open-source software at the inter-package level and showed that the coupling of modules at this granularity creates a small-world and scale-free network. Valverde et al. [6] presented evidence for the emergence of scaling in software architecture graphs from a well-defined local optimization process. Alex Potanin et al. [7] examined the graphs formed by object-oriented programs written in a variety of languages and showed that these turn out to be scale-free networks. Alexander Chatzigeorgiou et al. [8] presented four different applications of graph theory concerning the identification of God classes, clustering, the detection of design patterns, and the scale-freeness of OO systems.
Spiros Xanthos [9] used spectral graph partitioning to identify dense communities of classes (clusters) within an object-oriented software system.

To improve the underlying development processes, some research has paid attention to software evolution. Sergi Valverde and Ricard V. Solé [10, 11] analyzed some large software applications and found that software evolution exhibits small-world behavior. Luis Lopez-Fernandez et al. [12] proposed the use of social network analysis for characterizing libre software projects, their evolution over time, and their internal structure. Rajesh Vasa [13] presented recurring high-level structural and evolutionary patterns that had been observed in a number of public-domain object-oriented software systems and defined a simple predictive model that could aid developers in detecting structural changes. de Moura et al. [14] showed that, owing to software growth over time and especially as a result of performance optimization of the program, the software unit network has a small-world structure; they suggested that this finding could be used to optimize language runtime systems and improve the design of future OO languages.

Other research has investigated the community networks of developers. Patrick Adam Wagstrom et al. [15] analyzed several methods of communication (a social networking site, project mailing lists, and developer weblogs) to understand the social network structure behind Free and Open Source Software (F/OSS) projects. This social network data was used to create a model of F/OSS development. Yongqin Gao and Greg Madey [16] studied the community network of SourceForge.net to understand the open source software movement and to gain insight into the network's development and its influence on individual development.

3 Software Graphs

Software is composed of many interacting units and subsystems at many levels of granularity.
The interactions and collaborations of those units can be used to define graphs that form a description of a system. Here we use a broader definition of a unit that depends on the granularity level. At the package level, the unit is a package; at the class and method levels, the units are classes and methods respectively. A unit dependency represents a way of transferring information between components. Designing software involves the definition of an information flow traversing a chain of related components. High software performance means efficient information transfer between components. This requires understanding the interactions among software units.

The interactions between software units are multidimensional and multifaceted, and the representation of a software system typically involves a very complex space of interactions. In order to understand the interaction network, a graph $G = (V, E)$ is defined for the software under consideration. Let $V = \{v_i\}$ ($i = 1, \ldots, N$) be the set of nodes and $E = \{(v_i, v_j)\}$ the set of links. A node of a unit diagram maps to a function or procedure at the method level, a class at the class level, and a package at the package level. Topological measurements of a software graph include degree distribution, clustering, and betweenness. Here, we focus on the measurements related to performance, namely degree and betweenness. In the following, these measurements are presented briefly.

3.1 Degree

For each node $i$ in a unit graph, there is both an in-degree $k_i^{in}$ and an out-degree $k_i^{out}$. The performance of a unit is related to the performance of the units it depends on. The in-degree of a method is the number of methods that call it, and its out-degree is the number of methods it calls. The in-degree and out-degree of a class are the sums of the in-degrees and out-degrees of its methods respectively. Similarly, the in-degree and out-degree of a package are the sums of the in-degrees and out-degrees of its classes.
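
The degree definitions above can be illustrated with a minimal Python sketch. The call edges, the dotted `package.Class.method` naming scheme, and the helper names `method_degrees` and `aggregate` are all assumptions made for illustration; they are not taken from the paper or from any of the analyzed middleware. The sketch also assumes that each (caller, callee) pair appears once in the edge list.

```python
from collections import Counter

# Hypothetical call edges (caller, callee); names are illustrative only.
calls = [
    ("pkg.a.Job.submit", "pkg.a.Job.validate"),
    ("pkg.a.Job.submit", "pkg.b.Queue.push"),
    ("pkg.b.Queue.push", "pkg.b.Queue.lock"),
    ("pkg.a.Client.run", "pkg.a.Job.submit"),
]

def method_degrees(edges):
    """Return (in_degree, out_degree) counters keyed by method name."""
    in_deg, out_deg = Counter(), Counter()
    for caller, callee in edges:
        out_deg[caller] += 1   # the caller has one more outgoing link
        in_deg[callee] += 1    # the callee has one more incoming link
    return in_deg, out_deg

def aggregate(degrees, levels_to_drop):
    """Sum method degrees per enclosing unit by dropping trailing name parts.

    levels_to_drop=1 gives class-level sums, levels_to_drop=2 package-level sums.
    """
    total = Counter()
    for name, k in degrees.items():
        unit = ".".join(name.split(".")[:-levels_to_drop])
        total[unit] += k
    return total

in_deg, out_deg = method_degrees(calls)
class_in, class_out = aggregate(in_deg, 1), aggregate(out_deg, 1)
package_in, package_out = aggregate(in_deg, 2), aggregate(out_deg, 2)

print(dict(class_in), dict(class_out))
print(dict(package_in), dict(package_out))
```

Because the class and package degrees are plain sums of method degrees, calls between methods of the same class (such as `submit` calling `validate` in this toy example) are counted in that class's totals as well.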
The in- and out-degree distributions $P_{in}(k)$ and $P_{out}(k)$ indicate the probability of finding a node with a specified in-degree or out-degree $k$, respectively, in a given graph. Many complex networks have recently been found to have a power-law degree distribution, where the probability $P(k)$ that a node has $k$ connections decays as $P(k) \sim k^{-\gamma}$, with generally $2 < \gamma < 3$.

3.2 Betweenness

Betweenness plays a key role in the characterization of complex networks and can be used to quantify a node's importance. The betweenness of a vertex measures the control that the vertex has over interactions in the network and can be used to identify key actors in the network.
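
As a rough illustration of how betweenness can single out key units, the following Python sketch computes shortest-path betweenness on a small directed, unweighted graph. It assumes the common shortest-path definition (for each ordered pair of other nodes, the fraction of shortest paths passing through the vertex, summed over all pairs) and uses a Brandes-style accumulation; the paper's exact formulation may differ. The graph and the function name `betweenness` are hypothetical.

```python
from collections import deque, defaultdict

def betweenness(nodes, edges):
    """Unnormalized shortest-path betweenness for a directed, unweighted graph.

    Every endpoint that appears in `edges` must also be listed in `nodes`.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording shortest-path predecessors.
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        preds = {v: [] for v in nodes}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:              # w reached for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # edge v -> w lies on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, starting from the farthest nodes.
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical dependency graph: A -> B, then B fans out to C and D, which both reach E.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")]
scores = betweenness(nodes, edges)
print(scores)
```

In this toy graph every path out of `A` passes through `B`, so `B` receives the highest score, which matches the intuition that a unit sitting on many information-transfer paths is a candidate for performance tuning.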
