IJGHPC Editorial Board Editors-in-Chief: Emmanuel Udoh, Indiana U. and Purdue U., USA Frank Zhigang Wang, Cranfield U., UK

Associate Editors: Eric Aubanel, U. of New Brunswick, Canada Nik Bessis, U. of Bedfordshire, UK Wolfgang Gentzsch, EU Project DEISA and BoD at Open Grid Forum, Germany Hai Jin, Huazhong U. of Science and Technology, China Daniel Katz, U. of Chicago, USA Stan Kurkovsky, Central Connecticut State U., USA Maozhen Li, Brunel U., UK Karim Rezaul, Guildhall College, UK Ruth Shaw, U. of New Brunswick, Canada Jia Zhang, Northern Illinois U., USA

IGI Editorial: Heather A. Probst, Director of Journal Publications Jamie M. Wilson, Journal Development Editor Ron Blair, Journal Editorial Assistant Chris Hrobak, Journal Production Lead Gregory Snader, Production Assistant Brittany Metzel, Production Assistant

International Editorial Review Board:

Nabil Abdennadher, U. of Applied Sciences, Switzerland; Khalid Al-Begain, U. of Glamorgan, UK; Nick Antonopoulos, U. of Derby, UK; David Al-Dabass, Nottingham Trent U., UK; Giovanni Aloisio, U. of Salento, Italy; Jose Manuel Garcia Carrasco, Universidad de Murcia, Spain; Yeh-Ching Chung, National Tsinghua U., Taiwan; Yuhui Deng, Jinan U., China; Noria Foukia, U. of Otago, New Zealand; Na Helian, Hertfordshire U., UK; Robert Hsu, Chung Hua U., Taiwan; Yiming Hu, U. of Cincinnati, USA; Minglu Li, Shanghai Jiao Tong U., China; Mariusz Nowostawski, U. of Otago, New Zealand; Stephen L. Scott, Oak Ridge National Lab, USA; Athina Vakali, U. of Thessaloniki, Greece; Chao-Tung Yang, Tunghai U., Taichung, Taiwan; Yan Yu, SUNY at Stony Brook, USA; Antanas Zilinskas, Institute of Mathematics and Informatics, Lithuania; Zoltan Horvath, Eötvös Loránd Tudományegyetem, Hungary

IGI Publishing www.igi-global.com

CALL FOR ARTICLES

International Journal of Grid and High Performance Computing
An official publication of the Information Resources Management Association

The Editors-in-Chief of the International Journal of Grid and High Performance Computing (IJGHPC) would like to invite you to consider submitting a manuscript for inclusion in this scholarly journal. The following describes the mission, coverage, and guidelines for submission to IJGHPC.

MISSION: The primary mission of the International Journal of Grid and High Performance Computing (IJGHPC) is to provide an international forum for the dissemination and development of theory and practice in grid and cloud computing. IJGHPC publishes refereed and original research articles and welcomes contributions on current trends, new issues, tools, societal impact, and directions for future research in the areas of grid and cloud computing. This journal is targeted at both academic researchers and practicing IT professionals.

COVERAGE/MAJOR TOPICS: Topics can include (but are not limited to) the following:
• Advanced techniques and scaling issues
• Bio-inspired grid resource management
• Cloud architectures
• Cloud business process integration
• Cloud foundation concepts
• Cloud platforms and infrastructures
• Cloud reliability and security
• Cloud services
• Cloud standards
• Cloud types
• Combating global terrorism with the world wide grid
• Emerging standards and international projects
• Future of grid, trends, and challenges
• Grid and software engineering aspects
• Grid architecture, resources, and data management
• Grid education and applications - science, engineering, and business
• Grid evolution, characterization, and concepts
• Grid fundamentals, algorithms, and performance analysis
• Grid impact: scientific, industrial, and social implications
• Grid instrumentation, measurement, and visualization
• Grid portals and security
• Grid programming, models, tools, and API
• Grid services, concepts, specifications, and frameworks
• Grid uses and emerging technology
• New initiatives, SOA, autonomic computing, and semantic grid
• Simple API for grid applications (SAGA)
• Software and hardware support for HPC
• Test, evaluation, and certificate presentation
• Wireless and optical grid, characteristics, and applications
• Workflow management

ISSN 1938-0259 • eISSN 1938-0267 • Published quarterly
All submissions should be emailed to: [email protected]

Ideas for Special Theme Issues may be submitted to the Editor-in-Chief.

Please recommend this publication to your librarian. For a convenient easy-to-use recommendation form, please visit: http://www.igi-global.com/ijghpc and click on the "Library Recommendation Form" link along the right margin.

International Journal of Grid and High Performance Computing

October-December 2010, Vol. 2, No. 4

Table of Contents

Research Articles

1 Fuzzy Allocation of Fine-Grained Compute Resources for Grid Data Streaming Applications Wen Zhang, Tsinghua University, China Junwei Cao, Tsinghua University and Tsinghua National Laboratory for Information Science and Technology, China Yisheng Zhong, Tsinghua University and Tsinghua National Laboratory for Information Science and Technology, China Lianchen Liu, Tsinghua University and Tsinghua National Laboratory for Information Science and Technology, China Cheng Wu, Tsinghua University and Tsinghua National Laboratory for Information Science and Technology, China

12 A Semantic-Driven Adaptive Architecture for Large Scale P2P Networks Athena Eftychiou, University of Surrey, UK Bogdan Vrusias, University of Surrey, UK Nick Antonopoulos, University of Derby, UK

31 A Method of 3-D Microstructure Reconstruction in the Simulation Model of Cement Hydration Dongliang Zhang, Tongji University, China

40 Network Architectures and Data Management for Massively Multiplayer Online Games Minhua Ma, University of Derby, UK Andreas Oikonomou, University of Derby, UK

51 Managing Inconsistencies in Data Grid Environments: A Practical Approach Ejaz Ahmed, King Fahd University of Petroleum and Minerals, Saudi Arabia and University of Bedfordshire, UK Nik Bessis, University of Bedfordshire, UK Peter Norrington, University of Bedfordshire, UK Yong Yue, University of Bedfordshire, UK

65 Evaluating Heuristics for Scheduling Dependent Jobs in Grid Computing Environments Geoffrey Falzon, Brunel University, UK Maozhen Li, Brunel University, UK


Fuzzy Allocation of Fine-Grained Compute Resources for Grid Data Streaming Applications

Wen Zhang, Tsinghua University, China Junwei Cao, Tsinghua University and Tsinghua National Laboratory for Information Science and Technology, China Yisheng Zhong, Tsinghua University and Tsinghua National Laboratory for Information Science and Technology, China Lianchen Liu, Tsinghua University and Tsinghua National Laboratory for Information Science and Technology, China Cheng Wu, Tsinghua University and Tsinghua National Laboratory for Information Science and Technology, China

ABSTRACT

Fine-grained allocation of compute resources, in terms of configurable clock speed of virtual machines, is essential for the processing efficiency and resource utilization of data streaming applications. For a data streaming application, its processing speed is expected to approach the allocated bandwidth as closely as possible. Automatic control technology is a feasible solution, but the plant model is hard to derive. Given this model-free characteristic, a fuzzy logic controller is designed with several simple yet robust rules. The performance of this controller is verified to surpass that of classic controllers, with faster response and less oscillation. An empirical formula for tuning an essential parameter is obtained to achieve better performance.

Keywords: Data Streaming, Fine-Grained Allocation, Fuzzy Control, Grid, Resource Allocation

DOI: 10.4018/jghpc.2010100101

1. INTRODUCTION

Grid (Foster & Kesselman, 1998) is now playing a major role in providing on-demand resources to various scientific and engineering applications, among which those with data streaming characteristics are gaining popularity recently. Such applications, called grid data streaming applications, require the combination of bandwidth sufficiency, adequate storage and computing capacity to guarantee smooth and high-efficiency processing, making them different from other batch-oriented ones. A case in point is LIGO (Laser Interferometer Gravitational-wave Observatory) (Deelman & Kesselman, 2002), which is generating 1TB of scientific data

per day and trying to benefit from processing capabilities provided by the Open Science Grid (OSG) (Pordes, 2004). Since most OSG sites are CPU-rich but storage-limited, with no LIGO data available, data streaming support is required to utilize OSG CPU resources. Such applications are novel in that (1) they are continuous and long running in nature; (2) they require efficient data transfer from/to distributed sources/sinks in an end-user-pulling way; (3) it is often not feasible to store all the data in its entirety because of limited storage and the high volumes of data to be processed; and (4) they need to make efficient use of high performance computing (HPC) resources to carry out compute-intensive tasks in a timely manner. A great challenge is thus posed: to provide sufficient resources, including compute, storage and bandwidth, to such streaming applications so that they can meet their service level objectives (SLOs) while maintaining high resource utilization.

Just like other grid applications, resource allocation is essential to achieve high efficiency of data processing for streaming applications. But different from conventional batch-oriented applications, the processing efficiency of data streaming applications is co-determined by compute capacity, bandwidth to supply data in real time, and storage. As proven in our previous work (Zhang & Cao, 2008), compute, bandwidth and storage must be allocated in a cooperative and integrated way. At that time, however, emphasis was laid on allocation of bandwidth and storage. Compute resources were allocated only in a coarse-grained way, i.e., each application was assigned to a processor exclusively, which may waste compute capacity because of the limited data supply speed. In some cases, end users must pay for the compute resources they occupy even if they cannot make full use of them. So it is desirable to allocate fine-grained compute resources for each application, i.e., to allocate just enough compute resources to guarantee smooth processing. Compute resources should also be assigned on demand; unilateral redundancy makes no sense and only wastes users' budgets.

Owing to virtualization technology, it is possible to allocate fine-grained compute resources. The premise, however, is to determine the required compute resources according to the needed computing capacity. Unfortunately, this is not easy: the relationship between the amount of compute resources and the generated compute capacity for a given application is complex because of other influencing factors, and it is hard, if not impossible, to obtain. Put another way, a precise model is unavailable. It is natural to resort to classical control theory to solve such a tracking or regulation problem, as has been done in the computing field, but in the absence of precise models, classical controllers are baffled. Fortunately, fuzzy logic control theory provides an alternative which requires not precise models but only some experience of human beings. In this paper, a fuzzy logic controller (FLC) is designed with some simple but robust fuzzy rules to decide the amount of compute resources for the expected computing capacity, so as to realize fine-grained compute resource allocation for data streaming applications, which will guarantee service level agreements (SLAs) while maintaining high resource utilization.

The rest of this paper is organized as follows: Section 2 formulates an optimization problem and proposes the necessity of fine-grained compute resource allocation, which is resolved with the fuzzy controller described in Section 3. Some experimental results are provided in Section 4 to justify the fuzzy allocation. The next section overviews the related research in this field, and the paper is concluded in the last section.

2. PROBLEM FORMULATION

In a data streaming scenario, data in remote sources will be transferred to local storage, read by the processing program one tuple after another, and deleted. From a macroscopic viewpoint, data is processed in the form of tuple streams.


The amount of data in storage varies over time and can be described as follows:

$\dot{Q}_i(t) = I_i(t) - P_i(t), \quad \forall t \geq 0, \qquad Q_i(0) = 0$

where $Q_i(t)$, $I_i(t)$ and $P_i(t)$ stand for the data amount in storage, assigned bandwidth and processing speed for data type $i$ ($i = 1, \ldots, n$) at time $t$, and $n$ is the number of applications running simultaneously in a shared computing infrastructure. $\dot{Q}_i(t)$ is the derivative of $Q_i(t)$, which reflects the integrated effects of data supply and processing.

Data processing programs run constantly to process the available data. If no data is locally available, they will be idle, which wastes computational resources. So $P_i(t)$ can be described as

$P_i(t) = \begin{cases} 0, & Q_i(t) = 0 \\ > 0, & Q_i(t) > 0 \end{cases}$

Usually, the total data amount in storage is limited, i.e.,

$\sum_{i=1}^{n} Q_i(t) \leq S$

where $S$ is the total available storage. $I_i(t)$ represents the data transferring speed, with the following constraint:

$\sum_{i=1}^{n} I_i(t) \leq I(t)$

where $I(t)$ is the total available local bandwidth.

The optimization goal is to allocate appropriate computing capacity, i.e., $P_i(t)$, for each application to maximize the throughput of each data type:

$\max \int_0^{T_f} P_i(t)\, dt \quad (1)$

or the total throughput for all types of data:

$\sum_{i=1}^{n} \max \int_0^{T_f} P_i(t)\, dt = \max \int_0^{T_f} \sum_{i=1}^{n} P_i(t)\, dt$

or, if each type of data processing has different privileges or weights:

$\sum_{i=1}^{n} \max \int_0^{T_f} w_i P_i(t)\, dt = \max \int_0^{T_f} \sum_{i=1}^{n} w_i P_i(t)\, dt$

where $T_f$ is the evaluation time span and $w_i$ is the weight of data type $i$.

Another goal is to minimize the cost of computation, as described below:

$\min \int_0^{T_f} c_i(t)\, C_i(t)\, dt \quad (2)$

where $c_i(t)$ stands for the price of compute resource and $C_i(t)$ is the allocated compute resource. Or it can be expressed as minimizing the total cost of all the applications:

$\sum_{i=1}^{n} \min \int_0^{T_f} c_i(t)\, C_i(t)\, dt = \min \int_0^{T_f} \sum_{i=1}^{n} c_i(t)\, C_i(t)\, dt$

It is obvious that as long as $P_i(t)$ can be adjusted to follow $I_i(t)$ precisely, both goals can be achieved. Unfortunately, the map from compute resource to compute speed, denoted as $G$, cannot be established precisely because of the uncertainties and stochastics, where

$P_i(t) = G(C_i(t))$


As a solution, fuzzy control is introduced to determine the precise amount of compute resources in real time, so as to achieve the two goals above.

3. FUZZY ALLOCATION OF FINE-GRAINED COMPUTE RESOURCE

As can be inferred, the ultimate throughput of a data streaming application and the cost of computation are determined mainly by the allocated bandwidth and computing resources or, more exactly, by the data supply and processing speeds. This paper is mainly focused on allocation of compute resource; details of bandwidth and storage allocation have been elaborated in our previous work (Zhang, Cao, Zhong, Liu, & Wu, 2008). But the relationship between the allocated compute resource quota and the processing speed of a certain application is not so straightforward, for there are many factors influencing the processing capacity of the given compute resources. A precise mathematical model of this relationship is hard, if not impossible, to derive. On the other hand, such precise models are sometimes not indispensable, and some situations can be handled with human experience. It is natural to resort to fuzzy control theory, for it can work smartly according to pre-defined fuzzy rules rather than precise models.

3.1. Fine-Grained Compute Resource

Virtualization technology has been applied to the grid computing field (Figueiredo & Dinda, 2003; Keahey & Doering, 2004; Foster & Freeman, 2006). Owing to progress in virtualization technology, such as Xen (Barham & Dragovic, 2003), it is possible to allocate fine-grained compute resources to applications. Virtual machines (VMs) are able to instantiate multiple independently configured guest environments on a host resource at the same time, providing performance isolation. With the ability to be dynamically configured, VMs facilitate fine-grained compute resource allocation. Virtualization technology, namely Xen, will be applied to create a virtual machine with configurable clock frequency for each application. Cap, one of Xen's interfaces for CPU allocation, will be adjusted dynamically according to the allocated network bandwidth and the measured processing speed, based on the pre-defined fuzzy rules described in the following.

3.2. Fuzzy Controller Overview

A fuzzy logic controller can be depicted with the diagram in Figure 1. It has two inputs, one for the error between the reference output and the realistic output, the other for the error's derivative; through fuzzy inference based on fuzzy rules, via fuzzification and defuzzification, a control law is generated. Coefficients such as Ke and Kec are called quantization factors and Ku is the proportional factor; they are responsible for mapping the inputs and output to the given scope of discourse respectively. Fuzzification transforms precise input values into fuzzy sets with corresponding membership functions, which is indispensable for fuzzy inference. The outputs of fuzzy inference are fuzzy sets, which must be transformed into a clear value by defuzzification.

The fuzzy controller's action is guided by a set of fuzzy rules, which are stored in a rule base as a part of the controller. These rules are usually in IF-THEN formats defined in terms of linguistic variables, different from numerical controllers. Linguistic variables are a natural way, resembling human reasoning, to handle the uncertainties created by the stochastics present in most computer systems. Each linguistic variable is related to a linguistic value, such as NB, NM, NS, O, PS, PM and PB, where N, P, B, M, S and O are abbreviations of negative, positive, big, medium, small and zero respectively, and a combination of them takes on a degree of truth. The mapping from a numeric value to a degree of truth for a linguistic value is done by the membership function.
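As an illustration of the fuzzification step just described, the sketch below maps a raw error onto the universe of discourse with a quantization factor and computes degrees of truth with Gaussian membership functions. It is an invented example: the factor value, universe bound and membership parameters are assumptions, not values from the paper.

```python
import math

# Gaussian membership function: degree of truth of value x for a
# linguistic value centered at c with spread sigma.
def gaussian(x, c, sigma):
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def fuzzify(error, Ke=1.0, universe=3.0):
    """Scale the crisp error by the quantization factor Ke, clamp it to
    the universe of discourse, and return its fuzzy memberships."""
    x = max(-universe, min(universe, Ke * error))
    return {
        "negative": gaussian(x, -universe, 1.0),
        "zero":     gaussian(x, 0.0,       1.0),
        "positive": gaussian(x, universe,  1.0),
    }

print(fuzzify(-1.5))  # mostly "negative", partly "zero"
```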


3.3. Assignment of Linguistic Variables and Fuzzy Rules

In this paper, the linguistic variables of the inputs, i.e., e and ec, include negative, zero and positive, which mean lower than, equal to and higher than the given reference value respectively; their Gaussian membership functions are demonstrated in Figure 2. The linguistic variables of the output include dec-fast, dec-slow, no-change, inc-slow and inc-fast, where dec and inc are abbreviations of decrease and increase respectively. The triangular membership functions are demonstrated in Figure 3.

Fuzzy rules are defined as following:

① IF e is zero THEN u is no-change;
② IF e is negative AND ec is negative THEN u is inc-fast;
③ IF e is negative AND ec is positive THEN u is inc-slow;
④ IF e is positive AND ec is negative THEN u is dec-slow;

Figure 1. Diagram of fuzzy logic controller

Figure 2. Gaussian membership functions of inputs

Figure 3. Triangular membership functions of output


⑤ IF e is positive AND ec is positive THEN u is dec-fast.

3.4. Fuzzy Controller Design

Data streaming applications and the corresponding fuzzy control can be depicted as in Figure 4. As can be seen, data is streamed into the local storage with a given bandwidth, and processing programs fetch data tuples from storage. From a macroscopic view, the data amount (DA in Figure 4) in storage is the integral of the difference between data supply (in terms of bandwidth) and cleanup (i.e., data processing). As mentioned above, as long as the amount of data in storage stays above a certain value, processing will run constantly and compute resources will be utilized. So if and only if the amount of data in storage is kept above the given value and appropriate compute resources are allocated can both optimization goals listed in (1) and (2) be achieved. Then, as long as the data amount in storage can at any time be kept around the pre-defined Ref in Figure 4, the data processing speed can keep abreast with the data supply. So the data amount in storage is monitored, and its difference from the Ref is just the error e input to the FLC; the latter generates the processing speed, i.e., p in Figure 4.

4. EXPERIMENTAL RESULTS

To verify the fuzzy allocation algorithm of fine-grained compute resources for grid data streaming applications, some experiments are carried out in which a classical proportional, integral, and derivative (PID) controller and a fuzzy controller are applied respectively.

As mentioned above, the key to the optimization goals defined in (1) and (2) is to make the generated processing speed approach the bandwidth allocated to each application. This optimization is thus transformed into a tracking problem, which is very common in the automatic control field. As is well known, step functions, or compositions of them, are hard to track, and they are usually used as benchmarks to verify a control algorithm. Here, the allocated bandwidth for a given application is set as a composition of three step functions:

$B(t) = u(t) + u(t - 20) - 1.5\, u(t - 30), \quad 0 \leq t \leq 100$

where $u(t - t_0)$ stands for a step function jumping from 0 to 1 at time $t_0$ ($t_0$ = 0, 20 and 30 respectively), as shown in Figure 5. It means that the allocated bandwidth for an application jumps at certain moments and keeps constant in between.

A fuzzy controller with the rules defined in 3.3 is applied, and its performance is shown in Figure 6 and Figure 7 respectively. As can be observed, in the initial stage there are some oscillations in the computing capacity generated by the fuzzy controller, but after that the generated computing capacity follows the bandwidth precisely. As for the observed data amount in storage, it reaches the settled reference and stays fixed.
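For concreteness, the following self-contained sketch applies the five rules above to the step-function bandwidth B(t). It is one possible reading of the design, not the authors' implementation: the membership centers, the output singletons, the simplified weighted-average defuzzification, the gains and the plant update are all assumed.

```python
import math

def gaussian(x, c, sigma=1.0):
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def B(t):
    # Composition of three step functions (the benchmark of Figure 5).
    u = lambda x: 1.0 if x >= 0 else 0.0
    return u(t) + u(t - 20) - 1.5 * u(t - 30)

# Output singletons for dec-fast .. inc-fast (a simplification: weighted
# average of singleton centers instead of a full centroid defuzzifier).
OUT = {"dec-fast": -2.0, "dec-slow": -1.0, "no-change": 0.0,
       "inc-slow": 1.0, "inc-fast": 2.0}

def memberships(x):
    return {"negative": gaussian(x, -2), "zero": gaussian(x, 0),
            "positive": gaussian(x, 2)}

def flc(e, ec, Ke=1.0, Kec=1.0, Ku=1.0):
    me, mec = memberships(Ke * e), memberships(Kec * ec)
    # The five rules of Section 3.3; fuzzy AND is taken as min.
    w = {"no-change": me["zero"],
         "inc-fast": min(me["negative"], mec["negative"]),
         "inc-slow": min(me["negative"], mec["positive"]),
         "dec-slow": min(me["positive"], mec["negative"]),
         "dec-fast": min(me["positive"], mec["positive"])}
    num = sum(w[k] * OUT[k] for k in w)
    den = sum(w.values()) or 1.0
    return Ku * num / den  # change applied to the processing speed

# Closed loop: keep the stored data amount DA near the reference Ref.
dt, Ref, DA, p, prev_e = 0.1, 5.0, 0.0, 0.0, 0.0
for k in range(int(100 / dt)):
    t = k * dt
    e = Ref - DA                  # error: reference minus measured amount
    ec = (e - prev_e) / dt        # error derivative
    p = max(0.0, p + flc(e, ec, Ku=0.5) * dt)
    DA = max(0.0, DA + (B(t) - (p if DA > 0 else 0.0)) * dt)
    prev_e = e
print(round(p, 2), round(DA, 2))  # p should approach B(t) = 0.5 for t > 30
```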

Figure 4. Diagram of control system


Figure 5. Bandwidth of data supply

Figure 6. Generated compute capacity (FLC’s output)

Figure 7. Amount of data in storage with FLC

To verify the FLC's performance, a PID controller replaces the FLC in Figure 4; its transfer function can be expressed as

$C_C(s) = k_p + \frac{k_i}{s} + k_d s$

where $k_p$, $k_i$, and $k_d$ are the coefficients of the proportional, integral and derivative terms respectively. Physically, the derivative part will be replaced by an approximate implementation; e.g., the transfer function can be re-written as

$C_C(s) = k_p + \frac{k_i}{s} + k_d \frac{T_1 s}{T_0 s + 1}$

where $T_1 \gg T_0$. Here, let $T_1 = 1$, $T_0 = 0.1$. The other parameters are $k_p = 5$, $k_i = 1$, $k_d = 5$.

The computing capacity generated by the PID controller is shown in Figure 8, while the observed amount of data in storage is shown in Figure 9. As can be seen, both curves can track the given values; however, there are still some oscillations, which means that the performance of such a PID controller is not so ideal. Of course, careful adjustment of the parameters $k_p$, $k_i$, and $k_d$ may lead to better performance, but that is labor intensive.
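A discrete-time rendering of this PID law with the filtered derivative term is sketched below; it is a generic textbook implementation under an assumed sample time, not code from the paper:

```python
class FilteredPID:
    """PID controller with a first-order filter on the derivative term,
    matching C(s) = kp + ki/s + kd*T1*s/(T0*s + 1). Discretized with
    backward differences at sample time dt (parameter values from the
    paper: kp=5, ki=1, kd=5, T1=1, T0=0.1)."""
    def __init__(self, kp=5.0, ki=1.0, kd=5.0, T1=1.0, T0=0.1, dt=0.1):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.T1, self.T0, self.dt = T1, T0, dt
        self.integral = 0.0
        self.d_state = 0.0   # filtered derivative state
        self.prev_e = 0.0

    def step(self, e):
        self.integral += e * self.dt
        # Backward-difference discretization of T1*s/(T0*s + 1).
        alpha = self.T0 / (self.T0 + self.dt)
        self.d_state = (alpha * self.d_state
                        + (1 - alpha) * self.T1 * (e - self.prev_e) / self.dt)
        self.prev_e = e
        return self.kp * e + self.ki * self.integral + self.kd * self.d_state

pid = FilteredPID()
print(pid.step(1.0))  # response to a unit error at the first sample
```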


Figure 8. Generated compute capacity (PID controller’s output)

Figure 9. Amount of data in storage with PID controller

As shown in Figures 6 through 9, the fuzzy logic controller outperforms the classic PID controller. There is less oscillation in the compute capacity generated by the FLC than in that generated by the PID controller, which means the FLC can track the allocated bandwidth more precisely to attain the goals in (1) and (2). As for the data amount in storage, the FLC reaches the stable status in a shorter time and without oscillation, while the PID controller takes longer to converge to the set reference and oscillates. The experimental results justify the FLC and its fuzzy rules.

Actually, not all fuzzy controllers can achieve such good performance. The key point is to set the proportional factor, i.e., $k_u$, properly. An empirical formula is obtained:

$k_u(t) = 2 B(t)$

which means that this proportional factor must be adjusted in real time to double the bandwidth. In a physical implementation, this formula can be rebuilt as

$k_u(t) = 2\, \frac{\int_{t - n \Delta t}^{t - \Delta t} B(\tau)\, d\tau}{n \Delta t} \quad (3)$

which substitutes the current bandwidth with its average over the recent past, where $\Delta t$ is a small period of time. For piecewise continuous bandwidth, the $k_u$ determined by formula (3) will guarantee the tracking performance. This empirical formula is, however, tied to the parameter settings of this fuzzy controller; it will not hold true on all occasions, so it cannot be generalized arbitrarily.
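One possible discrete reading of formula (3), with an invented window length and sampling, is the following moving-average update (an illustrative sketch, not the authors' implementation):

```python
from collections import deque

class KuEstimator:
    """Approximates formula (3): k_u(t) = 2 * average of the bandwidth
    over the last n recorded samples."""
    def __init__(self, n=10):
        self.history = deque(maxlen=n)  # sliding window of B samples

    def update(self, b_sample):
        self.history.append(b_sample)
        # 2 * mean of the window stands in for the integral divided by
        # n * dt (the sample period dt cancels in the discrete average).
        return 2.0 * sum(self.history) / len(self.history)

est = KuEstimator(n=10)
for b in [1.0] * 10 + [2.0] * 10:   # bandwidth steps from 1 to 2
    ku = est.update(b)
print(round(ku, 2))  # converges to 2 * 2.0 = 4.0 after the window fills
```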

5. RELATED WORK

As demonstrated in our previous work, compute, bandwidth and storage should be allocated for data streaming applications in a cooperative way. Most resource allocation infrastructures available in the grid field, such as Legion (Chapin & Katramatos, 1999), Nimrod/G (Buyya & Abramson, 2000) and Condor (Litzkow & Livny, 1988), are largely geared to support batch-oriented applications rather than streaming ones; i.e., they allocate compute resources regardless of the cooperation of data supply and processing. Some schedulers have been developed to support data streaming applications, such as E-Condor, GATES (Chen & Agrawal, 2006), and Streamline (Agarwalla & Ahmed, 2006), but they are mainly concerned with computational resource allocation. What's more, they allocate compute resources only at a coarse-grained scale.

Control theory has been successfully applied to control performance or quality of service (QoS) for various computing systems. An extensive summary of related work is presented in the first chapter of (Hellerstein & Diao, 2004). Control types such as PID control (Abdelzaher & Shin, 2002; Parekh & Gandhi, 2002), pole placement (Diao & Gandhi, 2002), linear quadratic regulator (LQR) (Diao & Gandhi, 2002) and adaptive control (Kamra & Misra, 2004; Lu & Lu, 2002) have been proposed. Most of them require precise models of the controlled objects.

The first application of fuzzy control was introduced in industry (King & Mamdani, 1974). Fuzzy control is also a research topic in computing systems (Diao & Hellerstein, 2002; Li & Nahrstedt, 1999), but there it is mainly focused on admission control to obtain better QoS. Adaptive fuzzy control has been applied to utilization management of periodic tasks (Suzer & Kang, 2008), where utilization is defined as the ratio of the estimated execution time to the task period. Fuzzy inference is carried out on fuzzy rules to decide the threshold over which the QoS of tasks should be degraded or tasks should even be rejected. But the estimated execution time must be provided, which is a challenge, for it plays an important role in this control algorithm. What's more, in some cases QoS cannot be degraded because the tasks cannot be decomposed, which limits its wide use.

The latest relevant work (Park & Humphrey, 2008) is focused on providing predictable execution so as to meet the deadlines of tasks. Virtualization technology is applied to implement the so-called performance container and compute throttling framework, to realize the "controlled time-sharing" of high performance compute resources, i.e., fine-grained CPU allocation. System identification is carried out to establish the model of the controlled object, and a proportional and integral (PI) controller is applied. This work has a similar motivation to ours, but in the data streaming scenario it is hard to produce a precise model from the allocated bandwidth and compute resource to the generated utilization or throughput, as explained above, so classical controllers are not suitable and we resort instead to fuzzy control theory, which is model free. In essence, however, our approach is also a controller.

Cloud computing (Carr, 2008) is prospering now, and some entities allege that they can provide compute resources on demand. A case in point is Amazon elastic cloud computing. But actually they just provide compute resources with coarse granularity. It is left to the end users to decide the amount of compute resources, yet it is very difficult for users to estimate their requirements. So users may tend to apply for more resources than they really need, which causes resource under-utilization and budget waste. In such cases, it is desirable to allocate just enough rather than redundant compute resources to end users without their participation, i.e., automatically by some control mechanism, just as we do in this paper.

6. CONCLUSION

Virtualization technology makes resource allocation in grid computing more flexible, with better granularity. Fuzzy control theory can be applied to allocate just enough compute resource for data streaming applications. Because of the model-free characteristic of fuzzy logic controllers, intensive human labor in parameter tuning is eliminated. What's more, with simple yet robust linguistic rules, fuzzy logic controllers achieve better performance than classic controllers.


ACKNOWLEDGMENT

This work is supported by the National Science Foundation of China (grant No. 60803017), the Ministry of Science and Technology of China under the national 863 high-tech R&D program (grants No. 2006AA10Z237, No. 2007AA01Z179 and No. 2008AA01Z118), the Ministry of Education of China under the Program for New Century Excellent Talents in University and the Scientific Research Foundation for the Returned Overseas Chinese Scholars, and the FIT foundation of Tsinghua University.

REFERENCES

Abdelzaher, T. F., Shin, K. G., & Bhatti, N. (2002). Performance guarantees for web server end-systems: a control-theoretical approach. IEEE Trans. on Parallel and Distributed Systems, 13.

Agarwalla, B., Ahmed, N., Hilley, D., & Ramachandran, U. (2006). Streamline: a scheduling heuristic for streaming applications on the grid. In Proceedings of the 13th Annual Multimedia Computing and Networking Conf.

Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T. L., Ho, A., et al. (2003). Xen and the art of virtualization. In Proceedings of the ACM Symp. on Operating Systems Principles.

Buyya, R., Abramson, D., & Giddy, J. (2000). Nimrod/G: an architecture for a resource management and scheduling system in a global computational grid. In Proceedings of the High Performance Computing ASIA.

Carr, N. (2008). The big switch: rewiring the world, from Edison to Google. China: CITIC Press.

Chapin, S. J., Katramatos, D., Karpovich, J., & Grimshaw, A. S. (1999). The Legion resource management system. In Job Scheduling Strategies for Parallel Processing (pp. 162–178). New York: Springer Verlag. doi:10.1007/3-540-47954-6_9

Chen, L., & Agrawal, G. (2006). A static resource allocation framework for grid-based streaming applications. Concurrency and Computation, 18, 653–666. doi:10.1002/cpe.972

Deelman, E., Kesselman, C., Mehta, G., Meshkat, L., Pearlman, L., Blackburn, K., et al. (2002). GriPhyN and LIGO: building a virtual data grid for gravitational wave scientists. In Proceedings of the 11th IEEE Int. Symp. on High Performance Distributed Computing (pp. 225-234).

Diao, Y., Gandhi, N., Hellerstein, J. L., Parekh, S., & Tilbury, D. M. (2002). MIMO control of an Apache web server: modeling and controller design. In Proceedings of the American Control Conf.

Diao, Y., Hellerstein, J. L., & Parekh, S. (2002). Using fuzzy control to maximize profits in service level management. IBM Systems Journal, 41, 3. doi:10.1147/sj.413.0403

Figueiredo, R. J., Dinda, P., & Fortes, J. (2003). A case for grid computing on virtual machines. In Proceedings of the 23rd Int'l. Conf. on Distributed Computing Systems.

Foster, I., Freeman, T., Keahy, K., Scheftner, D., Sotomayer, B., & Zhang, X. (2006). Virtual clusters for grid communities. In Proceedings of the IEEE Int. Sym. on Cluster Computing and the Grid.

Foster, I., & Kesselman, C. (1998). The grid: blueprint for a new computing infrastructure. San Francisco, CA: Morgan Kaufmann.

Hellerstein, J. L., Diao, Y., Parekh, S., & Tilbury, D. (2004). Feedback Control of Computing Systems. New York: Wiley. doi:10.1002/047166880X

Kamra, A., Misra, V., & Nahum, E. M. (2004). Yaksha: a self-tuning controller for managing the performance of 3-tiered web sites. In Proceedings of the 12th IEEE Int'l. Workshop on Quality of Service.

Keahey, K., Doering, K., & Foster, I. (2004). From sandbox to playground: dynamic virtual environments in the grid. In Proceedings of the 5th Int. Workshop in Grid Computing.

King, P. J., & Mamdani, E. H. (1974). Application of fuzzy algorithms for control of simple dynamic plant. Proceedings of the IEE Control Theory App, 121(12), 1585–1588.

Li, B., & Nahrstedt, K. (1999). A control-based middleware framework for quality of service adaptations. IEEE Journal on Selected Areas in Communications, 17, 1632–1650.

Litzkow, M., Livny, M., & Mutka, M. (1988). Condor – a hunter of idle workstations. In Proceedings of the 8th Int. Conf. on Distributed Computing Systems (pp. 104-111).


Lu, Y., Lu, C., Abdelzaher, T., & Tao, G. (2002). An adaptive control framework for QoS guarantees and its application to differentiated caching services. In Proceedings of the 10th IEEE Int'l. Workshop on Quality of Service.

Parekh, S., Gandhi, N., Hellerstein, J. L., Tilbury, D., Jayram, T. S., & Bigus, J. (2002). Using control theory to achieve service level objectives in performance management. Real Time Systems Journal, 23, 1–2.

Park, S. M., & Humphrey, M. (2008). Feedback-controlled resource sharing for predictable eScience. In Proceedings of the ACM/IEEE Conf. on Supercomputing.

Pordes, R. (2004). The open science grid. In Proceedings of the Computing in High Energy and Nuclear Physics Conf., Interlaken, Switzerland.

Suzer, M. H., & Kang, K. D. (2008). Adaptive fuzzy control for utilization management. In Proceedings of the IEEE Int'l. Symp. on Object/Component/Service-oriented Real-time Distributed Computing.

Zhang, W., Cao, J., Zhong, Y. S., Liu, L. C., & Wu, C. (2008). An integrated resource management and scheduling system for grid data streaming applications. In Proceedings of the 9th IEEE/ACM Int. Conf. on Grid Computing (Grid 2008), Tsukuba, Japan (pp. 258-265).

Wen Zhang is currently a Ph.D. candidate at Tsinghua University. His research covers integrated resource scheduling and management for grid data streaming applications, which are increasingly popular in science and engineering. He is also engaged in cloud computing. Recently, he has been carrying out research on, and implementation of, fine-grained resource allocation for grid and cloud computing with the help of control theory, based on virtualization technology.

Junwei Cao is currently Professor and Assistant Dean, Research Institute of Information Technology, Tsinghua University, Beijing, China. He was a Research Scientist at the MIT LIGO Laboratory and NEC Laboratories Europe. He received his B.S. and M.S. from Tsinghua University in 1996 and 1998, respectively, and his Ph.D. in computer science from the University of Warwick, UK, in 2001. He is a Senior Member of the IEEE Computer Society and a Member of the ACM and CCF.

Yisheng Zhong received the B.E. degree from Harbin Institute of Technology in Control En- gineering, Harbin, P.R. China, M.E. degree from the University of Electro-Communications in Electronic Engineering, Tokyo, Japan, and Ph.D. degree from the Hokkaido University in Electrical Engineering, Sapporo, Japan, in 1982, 1985 and 1988, respectively. He worked as a Post-doctorate scholar in Tsinghua University from 1989–1990, and since 1991, he has been with the Department of Automation, Tsinghua University, where he is currently a professor. His research interests include robust control, nonlinear control and electromechanical system control.

Lianchen Liu is currently an Associate Professor of Department of Automation, Tsinghua Univer- sity, Beijing, China. He received the Ph.D. from NanKai University, Tianjin, China. His research interests include large scale scientific resource sharing, distributed computing, etc.

Cheng Wu is a Professor of Department of Automation, Tsinghua University, Beijing, China, Director of National CIMS Engineering Research Center, and Member of Chinese Academy of Engineering. He received his B.S. and M.S. from Department of Automation, Tsinghua University in 1962 and 1966, respectively. His research interests include complex manufacturing system scheduling, grid/cloud applications, etc.


A Semantic-Driven Adaptive Architecture for Large Scale P2P Networks

Athena Eftychiou, University of Surrey, UK Bogdan Vrusias, University of Surrey, UK Nick Antonopoulos, University of Derby, UK

ABSTRACT

The increasing amount of online information demands effective, scalable, and accurate mechanisms to manage and search this information. Distributed semantic-enabled architectures, which enforce semantic technologies for resource discovery, could satisfy these requirements. In this paper, a semantic-driven adaptive architecture is presented, which improves existing resource discovery processes. The P2P network is organised in a two-layered super-peer architecture. The network formation of super-peers is a conceptual representation of the network's knowledge, shaped from the information provided by the nodes using collective intelligence methods. The authors focus on the creation of a dynamic hierarchical semantic-driven P2P topology using the network's intelligence. The unmanageable amounts of data are transformed into a repository of semantic knowledge, transforming the network into an ontology of conceptually related entities of information collected from the resources located by peers. Appropriate experiments have been undertaken through a case study by simulating the proposed architecture and evaluating the results.

Keywords: Collective Intelligence, Distributed Information Retrieval, Domain Ontology, Peer-To-Peer (P2P) Networks, Semantic Web

DOI: 10.4018/jghpc.2010100102

INTRODUCTION

The Semantic Web idea (Berners-Lee, Hendler, & Lassila, 2001) in unstructured adaptive P2P networks is an approach to represent, manage, and retrieve distributed knowledge in an efficient manner. To pursue the Semantic Web vision, metadata is an essential add-on for information resources. This rich representation of data aims to improve knowledge discovery and data management. Traditionally, adding metadata to resources is a manual and expensive process, and probably the main cause of the slow growth and difficult implementation of the Semantic Web.

Various efforts have been made by the research community to create P2P systems that can discover resources efficiently and accurately. Standard P2P searching technologies like Gnutella 2 (Stokes, 2002) produce irrelevant

search results with low precision rates. Kazaa (Liang, Kumar, & Ross, 2004) is a very popular super-peer based application, which appears to offer better scaling; however, it inherits the limitations that come with flooding algorithms. Structured P2P architectures, on the other hand, like Chord (Stoica, Morris, Karger, Kaashoek, & Balakrishnan, 2001), CAN (Ratnasamy, Francis, Handley, Karp, & Schenker, 2001) and Pastry (Rowstron & Druschel, 2001), guarantee retrieval of existing network resources with the use of Distributed Hash Tables (DHTs). The maintenance of a distributed index, however, comes at high cost, as additional traffic is generated for maintaining the routing information in a highly dynamic environment.

Trying to overcome the problems mentioned above, a number of P2P research systems (Cudré-Mauroux, Agarwal, & Aberer, 2007; Ehrig et al., 2003; Nakauchi, Morikawa, & Aoyama, 2004; Nejdl et al., 2002; Tatarinov & Halevy, 2004) try to encompass the semantic technology notion, to represent meaning and knowledge, as well as to use reasoning for retrieving the knowledge. Knowledge needs to be encoded in a structured form to become widely accessible. An ontology structure (Jepsen, 2009) is an important part of the Semantic Web, as it represents knowledge at the level of concepts; an ontology provides a shared vocabulary through conceptualisation. In P2P systems, a domain specific ontology can be used for classifying network resources to ontology related concepts. Related research methods try to exploit the benefits provided by semantics through an ontology structure in order to enhance data management and to improve resource discovery. As a result, a query can also be categorised to a specific concept and be routed to the peer that supports that concept.

An ontology-based P2P topology for service discovery has been described by (Schlosser, Sintek, Decker, & Nejdl, 2003), where the network is organised in a HyperCup topology. Information or services that peers provide are categorised to general concepts; these concepts and their relationships form the network ontology. Peers with similar interests are organised in a concept cluster. Each cluster is assigned a combination of concepts which best describe the peers that belong to the cluster. This network organisation aims to efficiently route a query to peers that can satisfy it. The technique uses broadcasting for query forwarding; when the query reaches the most appropriate cluster, it is then broadcast to all peers. The network uses globally known ontologies for clustering its nodes. This approach, however, is an example of a structured P2P network. Yongxiang Dou and Xiaoxian Bei (2008) also presented a semantic information retrieval system based on a hybrid ontology integration approach. In hybrid approaches, each network source employs its own local ontology, but all ontologies are built upon a global shared vocabulary. The authors focus on the problems involved with information representation and integration in P2P semantic retrieval networks. This research and the system proposed by the authors are in preliminary stages, and no experiments have proven the validity of the approach yet.

Summarising, P2P networks suffer from low quality of results, large amounts of poorly managed data and increased network traffic. The main motivation of this work is to attempt to resolve these issues by incorporating the Semantic Web idea in unstructured adaptive P2P networks. The importance of this work is reflected in the challenge to pursue the Semantic Web vision by understanding the semantics of the data, to provide fault tolerance and scalability improvements. This can be achieved by adapting P2P architectures and combining Semantic Web technologies with P2P networks, enhancing in this way the P2P performance in terms of accuracy, speed and traffic.

Consequently, the current research focuses on creating a dynamically adaptive semantic-driven P2P topology for managing and distributing knowledge in an efficient manner. The main aim is to transform the resources of a large scale P2P system from unmanageable amounts of data into a structured knowledge based repository of indexed resources. The implementation of a Collective Intelligence (CI)


(Segaran, 2007) system was necessary for this work, where raw peer data was transformed to network knowledge. The CI method employs a number of algorithms for mining resources' metadata and transforming it into an ontology topology of related concepts. More specifically, the proposed system incorporates a two layer architecture where the upper layer provides a conceptual representation of the system. Through CI, the resources of the system in the lower level are harvested and the network knowledge is represented through an ontology, which is maintained by the super-peers of the network. The network employs an adaptive topology which varies based on the network requirements. The system initially starts as a mesh topology, but as the network size increases and the workload on each super-peer changes, the network converts into a semantic graph structure in order to meet the needs of the network. The network transformation provides a more flexible topology, where P2P technologies combined with the network ontology provide effective information management and resource retrieval.

The article is organised as follows: in the next section the related work is discussed. Then, the proposed semantic-driven architecture and the use of collective intelligence for building the network knowledge are explained, giving as a case study the creation of a music ontology. The final sections include the performance evaluation of the proposal, and the paper closes with the conclusion and future work.

TOWARDS A SEMANTIC P2P NETWORK

Query routing efficiency and information retrieval accuracy are considered important aspects when evaluating the performance of P2P distributed systems, because they can be considered the key to the success of the P2P system. Flooding algorithms and random selection algorithms perform poorly as far as accuracy and efficiency are concerned (Tsoumakos & Roussopoulos, 2003). Proposed peer clustering solutions attempt to reduce the network traffic and improve the resource discovery experience.

A peer-clustering approach was used both by (Crespo & Garcia-Molina, 2005) and (Löser, Naumann, Siberski, Nejdl, & Thaden, 2004). In the former, Crespo et al. presented the Semantic Overlay Network (SON) for P2P, which is formed by semantically related peers. The authors use a predefined fixed hierarchy when forming the overlay network into semantically related clusters. In the latter approach, the peers are organised in a super-peer network in order to improve routing efficiency. Additional work (Li, Lee, & Sivasubramaniam, 2004) on peer organisation using semantic overlays is based on the idea of "small world networks", where peers are clustered according to the semantics of their local data and self-organised as a small world overlay network.

Similarly, for improving the efficiency of information retrieval in P2P networks, (Kobayashi et al., 2009) proposed a query driven P2P system where dynamic peer clusters are created based on query characteristics. The dynamic clusters are created when particular network resources with similar features become more popular. Initially, the network consists of a number of static clusters, each controlled by a leader peer. Each leader is responsible for deciding when a new dynamic cluster needs to be created by evaluating the query features. Even though dynamic clusters can be very effective in efficiently retrieving resources, if the dynamic clusters created are ephemeral, then the clusters could end up being ineffective. The dynamic clusters need to be time persistent in order to be useful in query search; otherwise they just create extra overhead for the network. This drawback can be dealt with by evaluating time persistence, and not only query characteristics, before creating a dynamic cluster.

Furthermore, in this proposal the cluster leader nodes are connected with each other, forming a mesh topology; this type of connection organisation can affect the scalability and maintainability of the network. Even though a mesh topology has the advantages of directly

exchanging information between the peer leaders and eliminating traffic problems by creating shorter paths for query propagation, it also comes with disadvantages, like latency in information retrieval due to the bigger number of records in the index storage, and expensive maintenance due to the messages sent among the leader peers for updating their information. The semantic-driven proposal tackles the above problems by adopting a more flexible topology, where the network transforms to the least expensive and most effective topology based on the timely network needs.

HP's pSearch (Tang, Xu, & Dwarkadas, 2003) is a prototype P2P IR system. It uses a CAN (Content Addressable Network) topology to create a semantic overlay, where resources are organised based on their semantics. The semantic vector of a document is used as the key to store the document index in the CAN. Document semantics are generated using the latent semantic indexing (LSI) algorithm. This system, however, has a low dimensionality of only a few hundred semantic spaces, which means that all resources of the system can only be matched to those specific semantic spaces (concepts). In other words, the network topology is limited by the number of semantic spaces, since it is directly connected to it. Similar limitations are faced by the system proposed by (Yamato & Sunaga, 2006), where again a CAN topology is used for managing resource semantics.

Liu, Antonopoulos and Mackin (Liu, Antonopoulos, & Mackin, 2007) provide a P2P algorithm for resource discovery. The P2P network can be seen as mimicking the social behaviour of people, where people are represented by peers and relationships by their interconnections. This work can relate to social P2P; however, instead of having interconnected nodes where each node relates to a number of interests, there are concepts and relevant nodes connected to a related concept super-node. Thus, instead of searching the nodes to find a resource, the concepts are searched. The proposed architecture is described in more detail in the following sections.

A SEMANTIC-DRIVEN ARCHITECTURE

The idea underlying the semantic-driven architecture resides in the detail that, instead of searching the network peers directly to find a resource, the super-node(s) forming the ontology concepts are searched instead. This is made feasible by clustering the resources based on their concepts and matching them to the extracted concept of the user's query. Therefore, the semantic-driven architecture can be achieved by automatically assigning concepts to resources, within a two layer network topology, and by using a domain specific ontology for conceptually defining users' queries.

System Architecture

Using super-peers in a P2P system is like embedding the benefits of a centralised topology in a distributed system. Super-peer networks have the advantages of locating rare resources and being least expensive in terms of network traffic, in contrast with flat architectures that use blind search techniques which do not scale and produce a lot of traffic. Random flat algorithms, which come as a variation, considerably reduce the number of messages sent, but success still depends on the random choices made. Neither do structured approaches satisfy the requirements of the proposed system, since they come with high maintenance overheads (Tsoumakos & Roussopoulos, 2003).

The architecture of the proposed system follows a hierarchical two-layer approach consisting of peers and super-peers. The higher level of super-peers defines the network's concepts and the relationships among them. The ordinary peers, which are the owners of the network resources, are part of the lower level. Each resource in the network is assigned metadata: keywords conceptually describing the resource. Each peer is responsible for enriching its resources with metadata. The knowledge of the network is represented through a domain specific ontology.
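As an illustration of this two-layer organisation, the sketch below uses hypothetical class and field names (the paper prescribes no implementation): super-peers hold ontology concepts, and peers register under the concepts whose keywords overlap their resources' metadata.

```python
# Hypothetical sketch of the two-layer topology: super-peers own
# ontology concepts; ordinary peers attach to matching super-peers.
class SuperPeer:
    def __init__(self, peer_id):
        self.peer_id = peer_id
        self.concepts = {}          # concept name -> set of keywords
        self.members = {}           # concept name -> set of peer ids

    def add_concept(self, name, keywords):
        self.concepts[name] = set(keywords)
        self.members[name] = set()

    def register_peer(self, peer_id, metadata):
        """Attach a peer to every concept whose keywords overlap the
        peer's resource metadata."""
        for name, kws in self.concepts.items():
            if kws & set(metadata):
                self.members[name].add(peer_id)

sp = SuperPeer("sp-1")
sp.add_concept("jazz", {"jazz", "saxophone", "swing"})
sp.register_peer("peer-42", {"saxophone", "live"})
print(sp.members["jazz"])  # {'peer-42'}
```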


The ontology consists of concepts; a concept can be defined as a unit of knowledge which is described through keywords. Concepts are maintained by super-peers. A super-peer may be responsible for one or more concepts. Preferably, semantically related concepts should reside on the same super-peer. The peers are connected to the super-peers that support the concepts they relate to (Figure 1).

Adaptive Topology

An adaptive network topology is proposed in this work, where the size and connectivity of the super-peer layer vary according to the query needs of the network. Initially the network starts with only one super-peer. As the network load increases and the super-peer is no longer capable of handling all the network requests, some of the load is transferred to a newly created super-peer. Ordinary peers are appointed as super-peers by their parent super-peer based on their capacity and availability. The super-peer categorises incoming queries to a subject concept in order to hand the concept over to the appropriate super-peer when the workload reaches a critical point. The capacity threshold is calculated by measuring the query rate per time period. The load of the super-peer can thus be forecast and the network partitioned before the super-peer becomes unable to serve the network, avoiding a potential bottleneck. For reaching a correct load balance between the existing super-peer and the newly created super-peer, equally demanded concepts are handed to the new super-peer.

In particular, every time a super-peer receives a query, its queries-received counter is increased. Additionally, the counter for queries-received-per-concept is increased, based on the query concept. When queries-received-per-concept exceeds the predefined threshold, the network is partitioned and part of the network is handed over to a different super-peer. To decide which part of the network to assign to another super-peer, the current super-peer transfers the number of concepts necessary for staying under the load threshold.
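The counter-based partitioning rule can be illustrated as follows (assumed names and a made-up threshold; not the authors' code):

```python
# Illustrative partitioning rule: track query counts per concept and
# hand concepts to a new super-peer once a threshold is crossed.
from collections import Counter

class LoadMonitor:
    def __init__(self, threshold=100):
        self.threshold = threshold
        self.per_concept = Counter()   # queries received per concept

    def record_query(self, concept):
        self.per_concept[concept] += 1
        return self.per_concept[concept] > self.threshold

    def concepts_to_transfer(self):
        """Pick the most demanded concepts until the remaining load
        falls back under the threshold."""
        total = sum(self.per_concept.values())
        handover = []
        for concept, count in self.per_concept.most_common():
            if total <= self.threshold:
                break
            handover.append(concept)
            total -= count
        return handover

mon = LoadMonitor(threshold=5)
for q in ["rock"] * 4 + ["jazz"] * 3:
    mon.record_query(q)
print(mon.concepts_to_transfer())  # ['rock'] frees enough load
```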

Figure 1. Semantic-driven architecture

As the network increases in number of super-peers, a mesh topology is formulated, since all super-peers are interconnected to one another. When the network increases further in size, the topology needs to convert into a semantic graph, where super-peers with conceptual similarity are interconnected and each super-peer is associated with a number of conceptually relevant peers. The concepts are then monitored for their lifetime stability; when the network becomes stable in terms of newly inserted concepts, it can convert back to a mesh structure, where all super-peers are interconnected again.

By using an adaptive topology, the advantages of both mesh and semantic graph structures are employed. A mesh topology has the advantages of forwarding the query directly to the appropriate concept, processing and matching the query to a concept only once, and discovering effortlessly (with a negligible number of hops) whether a resource exists or not. However, employing a mesh topology in large networks with unsteady concepts comes with disadvantages: there is latency in finding the matching concept, since a big index is used for searching, and a lot of traffic is generated for keeping the network up to date. Therefore an adaptive topology can take advantage of both structures when it is applied at the right time.

Peers Joining Activity

Initially, when a peer joins the network, it is connected to a randomly selected super-peer. The super-peer is responsible for assigning the peer to a specific concept and consequently to the corresponding parent super-peer. In order to find the matching concept, the peer's files' metadata are compared against the keywords related to concepts (this is feasible if super-peers keep information for all concepts, i.e., a mesh topology; as the topology changes, the search algorithm adapts to the new topology characteristics). When a match is found, the peer connects to the corresponding super-peer (concept). However, if no exact match is found, that may indicate that either the concept does not exist or the peer relates to more than one concept. In the former case, a new super-peer is created to support the previously non-existing concept; that super-peer is responsible for finding relevant resources from the network. In the latter case, the peer connects to the super-peers of the top n most relevant concepts.

Query Expansion and Routing

For achieving a higher success rate, the users' queries are expanded through the use of semantically related terms. These terms must be similar to the whole concept of the query, rather than merely synonymous with the query keywords. As the focus of this work is not on query expansion, a standard query expansion technique is employed.

During the query expansion process the query is analysed using the domain specific ontology. The keywords of the query are compared against the keywords of the concepts in the ontology, and the most related concept(s) are extracted. If more than one concept is extracted, the correlation among the concepts must be determined, in order to keep only the concepts with minimal semantic distance. Additionally, in the case where more than one concept is derived and these concepts relate to each other, query specialisation takes place: the more specific of the concepts is kept.

During the query routing algorithm, the query is first submitted by the client (peer) to its parent super-peer (concept). The query is then analysed locally, where the query expansion process takes place, as described above. When the query needs to be forwarded to another super-peer, a heuristic technique is applied for finding the best matching concept (super-peer) for the query to be forwarded to. Depending on the network topology, mesh or semantic graph, a different approach is followed to find the query's matching concept.
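The paper does not specify the scoring used when comparing query keywords against concept keywords. The sketch below assumes a simple keyword-overlap score and a hypothetical `related` predicate for the specialisation step, so all names here are illustrative rather than the authors' code.

```python
# A minimal sketch of the concept-extraction and query-specialisation
# steps described above; scoring function and field names are assumptions.
def keyword_overlap(query_keywords, concept):
    """Fraction of query keywords that appear among a concept's keywords."""
    hits = sum(1 for k in query_keywords if k in concept.keywords)
    return hits / len(query_keywords) if query_keywords else 0.0

def extract_concepts(query_keywords, ontology, top_n=3):
    """Return the top-n concepts most related to the (expanded) query."""
    scored = sorted(ontology.concepts,
                    key=lambda c: keyword_overlap(query_keywords, c),
                    reverse=True)
    return [c for c in scored[:top_n]
            if keyword_overlap(query_keywords, c) > 0]

def specialise(concepts, related):
    """If all extracted concepts relate to each other, keep only the most
    specific one (e.g. song over album over artist)."""
    rank = {"genre": 0, "artist": 1, "album": 2, "song": 3}
    if len(concepts) > 1 and all(related(a, b)
                                 for a in concepts
                                 for b in concepts if a is not b):
        return [max(concepts, key=lambda c: rank.get(c.concept_type, -1))]
    return concepts
```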

In a mesh topology, the querying super-peer has information for all the other super-peers and the concepts of the network. For finding the matching concept, and consequently the related super-peer, its global knowledge index can be searched.

In a semantic graph topology, each super-peer stores information only for a number of its neighbouring super-peers. Thus, if the query does not match a concept based on the querying super-peer's information, the query must be forwarded to a neighbouring super-peer with close semantic similarity to the query's keywords. If no super-peer can be found with close semantic similarity, then the query is either sent to all the neighbours or to a random one. The same process continues until a matching concept is found or until a maximum number of hops has been reached (this limitation aims to reduce unnecessary traffic). Eventually, if a matching concept is found, a list of super-peers that relate to the matching concept is chosen. The query is then forwarded to the related super-peers, either to one super-peer at a time, stopping the forwarding as soon as the query is successful, or to all super-peers at once. The query is considered unsuccessful in the case where no matching concept can be found in the network (Figure 2).

There is also the possibility that some queries cannot be matched to a concept. Unclassified queries are sent randomly to a number of peers belonging to the querying super-peer; the query is then sent randomly to a super-peer. In this case the super-peer performs a cosine similarity measurement between the filename keywords and the query keywords of all the resources that are likewise unclassified. For complex queries, where a lot of unrelated concepts have been found, the user can choose between precision and recall. If precision is chosen, the user prefers accuracy in the search results. On the other hand, if recall is chosen, the user prefers more results over accuracy; in this case all the resources that match the query concepts are retrieved.
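A minimal sketch of the hop-limited forwarding loop for the semantic-graph case follows. The hop limit, the `similarity` function and the attribute names are assumptions, since the paper states the policy but not its parameters.

```python
MAX_HOPS = 5  # assumed hop limit; the paper caps hops but gives no value

def route(query_concept, super_peer, similarity, hops=0):
    """Forward a query through the semantic graph until a super-peer
    supporting the matching concept is found or the hop limit is hit."""
    if query_concept in super_peer.concepts:
        return super_peer                 # matching concept found locally
    if hops >= MAX_HOPS:
        return None                       # stop: avoids unnecessary traffic
    # pick the neighbour semantically closest to the query's concept
    best = max(super_peer.neighbours,
               key=lambda n: similarity(n, query_concept),
               default=None)
    if best is None:
        return None
    return route(query_concept, best, similarity, hops + 1)
```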

Figure 2. Conceptual diagram

The F-measure (or F-score) considers both precision and recall and produces a weighted average of both parameters (in its balanced form, F = 2 × precision × recall / (precision + recall)); this formula can therefore be used when retrieving results of complex queries.

COLLECTIVE INTELLIGENCE ONTOLOGY

The definition of collective intelligence (Segaran, 2007) perfectly describes a P2P collaboration platform where the knowledge within the network is shared efficiently among the peers. Collective intelligence in our system is built with the contribution of all the peers connected to the network. Each peer enriches the P2P network's pool of knowledge through its resources.

The cumulative knowledge of the network is represented through an ontology (Figure 3). A set of concepts linked with semantic relations enables the semantic categorisation of network resources and peers into concepts, and the routing of a query based on its semantic evaluation. A formal definition of ontology is presented by Pretorius (2004), where an ontology is defined as a set of concepts, relations and axioms, formally presented as:

O = {C, R, A_o}

where C represents the set of concepts, R denotes the set of relations and A_o represents the set of axioms (relationships among the concepts). Similarly, the proposed network ontology consists of a set of concepts, where each concept comprises an ordered set of keywords.

Ontology Creation

The ontology is initially created by the first peer that forms the P2P network. As more peers join the network, they contribute through their resources to the ontology expansion. The file resources exist at the peer level, but the network knowledge is managed at the super-peer level. Each resource in the network is described by a file descriptor and file-related knowledge (metadata). A parent super-peer is responsible for analysing those file resources and enriching the ontology with new concepts. A super-peer may be responsible for a set of concepts.
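One possible in-memory encoding of the ontology O = {C, R, A_o} and its weighted relationships is sketched below; the classes and field names are illustrative assumptions, not the authors' data model.

```python
# One possible encoding of the ontology O = {C, R, A_o}; the classes and
# field names are illustrative assumptions, not the authors' data model.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Concept:
    concept_id: int
    concept_type: str                # e.g. "artist", "album", "genre", "song"
    keywords: List[str] = field(default_factory=list)  # ordered keyword set

@dataclass
class Relationship:
    source: Concept
    target: Concept
    name: str                        # e.g. "album of", "genre of"
    weight: float = 1.0              # semantic closeness of the two concepts

@dataclass
class Ontology:
    concepts: List[Concept] = field(default_factory=list)        # C
    relations: List[Relationship] = field(default_factory=list)  # R / A_o
```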

Figure 3. Ontology

When a peer enters the network and connects to the conceptually appropriate super-peer, the parent super-peer is responsible for analysing the peer's resources and enriching the ontology with the retrieved concepts and relationships. The retrieved peer concepts may already exist in the ontology, in which case no action needs to be taken. Each resource of the peer is assigned to a concept, preferably to the most specific concept type (discussed analytically in the section "Case Study: Music Ontology"). Assigning resources to concepts facilitates more efficient resource retrieval during the searching process. This scenario is valid in the case where all super-peers own the same part of the ontology. In the case where the ontology is distributed among the super-peers and each super-peer holds only a part of it, a different process takes place. This process will be presented in more detail in future developments of this work.

Castano, Ferrara, Montanelli, and Varese (2009) presented a similar work where the notion of collective intelligence is introduced as well: the iCoord system. iCoord involves knowledge sharing, as opposed to our proposed system, which is more technically related to file sharing. Another difference from iCoord is the way the ontology is updated. In iCoord the information (named the community manifesto) circulates the network, enabling in this way the community members to store knowledge. This is opposed to our system, where the ontology is scattered in a distributed manner among the super-peers and is updated through synchronisation processes. These processes are responsible for avoiding duplication of information as far as the sharing of common parts of the ontology is concerned. The ontology information is updated as the queries propagate the network.

Case Study: Music Ontology

For evaluating the proposed semantic-driven architecture, a music file sharing system has been used as a case study scenario. Distributed music file sharing systems are of great interest; proof of this are the large-scale P2P application systems that have been deployed for that purpose. Music semantics are rich, which enables the creation of a widely accepted vocabulary for the music ontology; and of course there is a huge amount of music data available, which makes it feasible to perform pragmatic evaluation experiments. In general, multimedia (movies, music, pictures) file sharing is a domain where many metadata-rich resources can be found and utilised for building the corresponding ontology.

For creating the music ontology, a P2P file sharing crawler has been customised and used for collecting real user data. IR-Wire ("IR-Wire: Gnutella Data Crawler and Analyzer," 2006) is a Gnutella data crawler and analyser. It is a publicly available system that collects data that can be further used for creating realistic file-sharing environment models.

Ontology Design

For the construction of the music domain specific ontology, a controlled vocabulary is used as a backbone for the ontology. The vocabulary is essential for a globally shared ontology, for meaningfully communicating the specific domain knowledge within the network. The selected terms in the controlled vocabulary serve as music concept types. The predefined concept types are used only for the needs of this case study; future developments will allow the P2P network to intelligently generate the required vocabulary. At this stage, the ontology entails four music concept types: artist, album, genre and song. These types have been chosen among others because they are considered essential descriptors for identifying a music resource. Moreover, popular music file sharing systems use these descriptors as searching fields in their search engines.

As described in the section above, the ontology is formally defined as a set of concepts, relationships, descriptions and keywords. Each concept of the ontology is assigned a unique id. A concept is not defined by a single name but by a description, which is a combination of keywords.

It is possible for a particular keyword to be associated with more than one concept in the ontology. In other words, the association between keyword and concept is one-to-many, rather than one-to-one. For example, the keyword Madonna associates with the concept of type Artist as well as with the concept of type Album. Each keyword carries a weight indicating the frequency counter, i.e., how many times the keyword is used for describing various concepts in the ontology. Figure 4 represents how concepts, descriptions and keywords correlate and form the ontology.

Concepts are interrelated with relationships. Three types of relationships exist and can be used to relate different types of concepts, as illustrated in Table 1. Concepts of the same type cannot relate to each other. Each relationship between two concepts carries a weight which indicates how close the semantic similarity of the concepts is.

Thus the associated concepts' relationship is: Confessions on a Dance Floor -Album of- Madonna, meaning that the Album called Confessions on a Dance Floor belongs to the Artist called Madonna (Table 2). Each music file resource in the network represents a Song; thus each file relates to a concept of type Song, which is the most specific among the concept types. This model of the music ontology enables the storage of the file resources in a conceptual approach, solving in this way the known problem of unmanageable network resources. Furthermore, the user search is enriched with semantic capability for retrieving more accurate results in an efficient manner when reasoning is used for retrieving knowledge from the ontology.

Ontology Data Population

Music files are rich in metadata. For harnessing the metadata of the network resources, a metadata acquisition process takes place, and for each music file the most appropriate concepts are retrieved based on its attached metadata (e.g., artist, album). For the needs of this work, the resources of the network as well as the music ontology are represented and stored in a relational database. Each resource is represented as a table row where the corresponding columns give the metadata of the resource.
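As a hedged illustration of that storage layout, the following sketch lays out concepts, keywords and relationships in a relational database (SQLite via Python). The schema is our assumption, not the authors' actual tables.

```python
# Assumed relational layout for the ontology; not the authors' schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id   INTEGER PRIMARY KEY,
    concept_type TEXT NOT NULL           -- artist | album | genre | song
);
CREATE TABLE keyword (                   -- a keyword may describe many concepts
    keyword      TEXT NOT NULL,
    concept_id   INTEGER REFERENCES concept(concept_id),
    frequency    INTEGER DEFAULT 1       -- how often it describes concepts
);
CREATE TABLE relationship (              -- weighted links between concepts
    source_id    INTEGER REFERENCES concept(concept_id),
    target_id    INTEGER REFERENCES concept(concept_id),
    name         TEXT NOT NULL,          -- 'album of', 'genre of', 'track of'
    weight       REAL DEFAULT 1.0        -- semantic closeness
);
""")
```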

Figure 4. ORM diagram: concept-description-keyword


Table 1. Relationships among concepts

Concept type | Relationship name | Concept type
Track | track of | Artist
Track | track of | Album
Track | genre of | Genre
Artist | genre of | Genre
Album | genre of | Genre
Album | album of | Artist

Table 2. Two concepts presenting a type of an Album and an Artist. These two concepts can only relate with the relationship name -Album of-

Concept | Type | Description keywords
1 | Album | Confessions on a Dance Floor
2 | Artist | Madonna

For transforming the network resources into a collective intelligence knowledge base (the music ontology), a conceptualisation process took place. As mentioned in the previous section, there is a predefined vocabulary upon which the ontology concepts are built. Each concept of the ontology must be of a specific type, namely artist, album, genre or song. Thus, for every music resource, concepts are created which are of a specific type, based on the given metadata (Table 3). The associated concepts generated are those shown in Table 4. The resource is assigned to the most specific concept (of type song), which is ConceptId 4 in this case.

PERFORMANCE EVALUATION

The performance of the semantic-driven architecture was evaluated by simulating the proposed environment. For carrying out the experiments for the proposed system, a special-purpose simulator has been developed. The simulator was designed bearing in mind real network conditions, like running queries in parallel and continuously generating new queries. Nevertheless, it is worth mentioning that the simulator omits other issues, including the nodes' join-leave activity, the delays of messages caused by network delays, the actual load a node could have for processing and propagating messages, and so on. Despite these omissions, the simulator is able to provide the basic simulations for understanding the fundamental properties of the proposed system and of the other, benchmark algorithm that is simulated.

Table 3. A music resource which incorporates the following information

ResourceId | FileName | Genre | Album | Artist | Song
1 | Madonna-Jump.mp3 | Pop | Confessions on a Dance Floor | Madonna | Jump


Table 4. The concepts generated for the music resource in Table 3

ConceptId | ConceptType | Keyword(s)
1 | Artist | Madonna
2 | Album | Confessions on a Dance Floor
3 | Genre | Pop
4 | Song | Jump

The set of experiments undertaken serves as a proof of concept and a basis for future developments.

Simulation Environment

The music ontology comprises 450 concepts; 20 represent genre concepts, which structure the higher level of the super-peer topology, and the rest represent concepts of type artist, album and song. Concepts of different types are interconnected through relationships. In total, there are 330 resources in the network, each one described by a song concept and distributed to various nodes among the 200 that form the topology. There are 440 unique queries, 30 of which are chosen at each timeslot to propagate the network. It is important to note that 10% of the queries match no resources, i.e., when analysed, these queries have no match with the concepts of the ontology, and thus no resources are related to them. Time is represented as a sequence of timeslots during which queries are generated, forwarded and processed. Each experiment comprises 100 timeslots. In this environment two algorithms are tested for their performance: the proposed semantic-driven algorithm and Social P2P, which is used as a benchmark against the proposed algorithm.

Currently there is no commonly agreed evaluation benchmark for P2P systems; thus Social P2P was chosen as a comparable system. Social P2P is related to the proposed system through some analogous characteristics. In Social P2P, nodes are connected based on interest topics, and nodes covering the same topics are more likely to be interconnected. Queries are forwarded to nodes which are relevant to the topic or to the interest area of the topic. An interest area in Social P2P is a semantic area with a set of topics. Correspondingly, in the proposed system peers are connected to super-peers based on their resources' conceptual similarity. Queries are forwarded to conceptually related super-peers and peers. An interest area in the proposed system corresponds to a concept of type genre, consisting of a set of other concepts which correspond to a set of topics.

The authors of Social P2P in their experiments use interest areas from the open directory categories ("ODP - Open Directory Project," 1998-2010). To set a fair comparison between the two algorithms, the music ontology is used as a common knowledge base for both systems; thus the Social P2P algorithm had to be adapted to use the music ontology as well. An interest has as its equivalent a genre concept, which represents the most generic concept of the ontology. Topics relate to concepts of type artist, album and song. For simulating the Social P2P algorithm, additional parameters were defined. Each query had a TTL equal to 3 hops, for limiting the lifetime of messages. The number of peers to be contacted in each hop was set to 3. The knowledge index has an upper limit of 40 entries.
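For reference, the simulation parameters reported in this section can be collected into a single configuration block (the variable names are ours):

```python
# Simulation parameters as reported in the text; names are ours.
SIM_CONFIG = {
    "concepts": 450,             # 20 of which are genre concepts
    "resources": 330,            # each described by a song concept
    "peers": 200,
    "unique_queries": 440,       # 10% match no resources
    "queries_per_timeslot": 30,
    "timeslots": 100,
    "ttl_hops": 3,               # Social P2P message lifetime
    "peers_per_hop": 3,          # Social P2P fan-out
    "knowledge_index_limit": 40, # Social P2P index entries
}
```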


Data Collection

During the experiments, real user queries and real node information are used from an existing music file sharing P2P system. The data has been gathered after customising, to the proposed system's needs, a P2P system: the LimeWire Data Crawler ("IR-Wire: Gnutella Data Crawler and Analyzer," 2006). The LimeWire Data Crawler collects information from a Gnutella-like network and stores it in a relational database. More specifically, the data crawler is built on top of the LimeWire client, and thus it supports all the LimeWire functionalities. The data crawler collects incoming and outgoing queries, peer information, and music file resource metadata.

As described in the section "Ontology Data Population", for creating the music ontology the metadata of the collected network resources was analysed. Similarly, for creating the network topology, the collected peer information was analysed and the corresponding peers of the network were generated. Each peer represents a real host of a peer-to-peer system, and thus it holds a number of resources. The collected queries were used as such, without any modifications. However, since this report consists of initial experiments for the proposed system, queries that would generate results were mostly used. Even so, a number of queries that would not match any resources were employed as well, for proving that the system retrieves the correct data resources.

Simulation Experiments and Results

A set of experiments was carried out, and the performance of each algorithm has been evaluated using the following criteria:

• Query success rate: the success rate of a query is calculated by dividing the number of successful queries by the number of generated queries.
• Traffic: this is the total number of messages sent by the peers when forwarding queries and query results.

It is important to note that for each query the precision rate of the resources returned is always 1; the resources returned are always relevant to the query.

Query Success

For defining a successful query, the following heuristic process takes place: the query string is matched against the most specific concept retrieved (refer to the query expansion section). A query is considered successful when the cosine similarity of the matching concept-based string is greater than 60%.
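A self-contained sketch of the stated success test follows, assuming whitespace tokenisation of the two strings (a detail the paper leaves open):

```python
# Success test: cosine similarity above 0.6 between query string and the
# string built from the best-matching concept. Tokenisation is assumed.
import math
from collections import Counter

SUCCESS_THRESHOLD = 0.6  # the 60% criterion stated above

def cosine(a, b):
    """Cosine similarity between two token lists."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[k] * vb[k] for k in va)
    norm = (math.sqrt(sum(v * v for v in va.values())) *
            math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def is_successful(query_string, concept_string):
    return cosine(query_string.lower().split(),
                  concept_string.lower().split()) > SUCCESS_THRESHOLD
```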


Figure 5. Semantic-driven and Social P2P general performance comparison

Figure 5 demonstrates the performance of both the semantic-driven algorithm and the Social P2P algorithm in terms of query success. The semantic-driven algorithm produces a higher success rate when compared to Social P2P. This performance can be attributed to the semantic-driven architecture of the network. Since peers are assigned to specific concepts based on their resources' conceptualisation, and queries are routed approximately to those semantically similar peers, the possibility of a high success rate increases. In Social P2P, queries are forwarded to peers that have previously satisfied query requests, assuming that peers gradually connect to other peers with the same interests. However, this is not always the case, and that is why Social P2P produces smaller success than the semantic-driven architecture.

For making a fair comparison of the two systems, the success rate is calculated against the resource-rich queries. In this case only the queries that have related resources are counted and used for calculating the success rate. Additionally, the number of queries propagated without related resources is the same in both systems. This is demonstrated in Figure 5, in column 3, tagged as 'Resource-rich'.

Figure 6 gives a more detailed comparison between the two systems. It can be observed that the semantic-driven system is more stable (80% success on average) in the successful queries throughout the experiment lifecycle. On the other hand, Social P2P produces an increasing query-success trend in the beginning of the experiments, continuing with more stable behaviour towards the end. In the beginning of the experiment the peers have little knowledge of the network, but as queries propagate the network and the peers' knowledge index is enriched, the success rate increases and becomes more stable. The credit for the stable behaviour of the semantic-driven approach goes to the conceptual way peers and super-peers are interconnected. This design ensures that if a resource exists, there is a high probability of locating it. A resource might not be located in the situation where the resource is rare and the majority of the other resources that reside in the same peer belong to non-related concepts. It is also worth mentioning that semantically poor resources likewise have fewer chances of being located.

For proving that the semantic-driven approach returns only query-related resources, the success rate of resource-rich queries is compared against the success rate of resource-poor queries (Figure 7). The success rate is slightly lower in the case of propagating some resource-poor queries in the network, since there are no related resources to retrieve. The success rate is higher in the case where only resource-rich queries propagate the network, since there are resources that conceptually relate to the queries. This also sets a fair evaluation for the proposed system, testing it against queries that conceptually match resources. In the case where no related resources exist, the success rate falls due to the absence of related resources and not due to the heuristics employed by the proposed algorithm.

Traffic

This set of experiments calculates the traffic generated by the semantic-driven system and compares it against Social P2P. This evaluation also involves some mathematically calculated metrics, which are compared against the simulation result values. To calculate the number of messages the Social P2P network produces, the following formula is used:

M_SocialP2P = Q_N × (Q_P + Q_P² + … + Q_P^TTL)

where Q_N is the number of queries per timeslot and Q_P the number of peers contacted per query at each hop. For a network with 30 generated queries at a timeslot, 3 peers to be contacted at a time and TTL = 3, the calculation gives the maximum number of messages to be generated per timeslot: M_SocialP2P = 30 × (3 + 9 + 27) = 1170 messages.
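A quick worked check of the bound under the stated parameters (30 queries per timeslot, fan-out 3, TTL 3):

```python
# Worked check of the Social P2P message bound.
Q_N, Q_P, TTL = 30, 3, 3
m_social = Q_N * sum(Q_P ** i for i in range(1, TTL + 1))
print(m_social)  # 30 * (3 + 9 + 27) = 1170, matching the figure above
```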


Figure 6. Success rate semantic-driven and social P2P

The number of messages generated for returning the query results to the requesting peer also needs to be added. This is calculated by multiplying the messages by the number of peers each peer has forwarded the query to. Thus, the total number of messages produced is 1980 messages per timeslot. The simulation results give a total average of 1887 messages, which is an expected value.

For calculating the number of messages for the semantic-driven network, the following formula can be applied:

M_semantic-driven = Q_N + (N_SP × P) + Q_F

where Q_N is the number of generated queries per timeslot, N_SP is the number of super-peers the query is forwarded to, P is the number of peers specific to the concept, and Q_F the number of forwarded queries.

Due to the use of real data during the proposed system simulations, a number of assumptions need to be made. It is assumed that, on average, each super-peer is the parent of 10 peers (200 peers / 20 super-peers); therefore, at each time approximately one third of the peers are considered to be query-concept-specific peers. After applying the above assumptions to the formula, the following result is retrieved:

M_semantic-driven = 91 messages

Again, the above number of messages does not include the messages sent for returning the results to the requesting peer. For getting the total number of messages, the messages found above need to be multiplied by 2 (that is, all the peers that the query has been forwarded to send back a response). Thus, the total number of messages produced is 182 messages. The simulation results give a total average of 165 messages, which is an expected value. The graph in Figure 8 demonstrates these figures diagrammatically.

It is observed that the semantic-driven architecture generates far fewer messages than the Social P2P network. The reduced number of messages is credited to the hierarchical super-peer approach the system employs. This justifies the initial decision of using a dynamic hierarchical layer when building the proposed system. It is shown that even the controlled flooding used by Social P2P generates a lot of traffic in the network.

To sum up, the semantic-driven system outperformed Social P2P in success rate and message cost, fulfilling the purpose of its design. The specific architecture was initially chosen over random flat and structured approaches for its advantages of reduced traffic and the capability of locating resources easily (Figure 8).


Figure 7. Success rate semantic-driven and social P2P

Figure 8. Average messages sent

CONCLUSION

For improving the resource discovery processes of existing P2P systems, a semantic-driven hierarchical architecture has been proposed. The idea is based on conceptualising the network resources using semantics, and building the system architecture based on those concepts, using a dynamic hierarchical design. Based on this philosophy, the ontology concepts are searched first when requesting resources, instead of the network peers.

A set of experiments was carried out through simulations and proved that the above concept improves the searching process, by presenting a high success rate and cost efficiency in terms of messages sent when searching for network resources. The proposed system outperforms the Social P2P algorithm, producing higher and more stable query success results. Social P2P, on the other hand, starts with a low success rate which continuously increases, giving at the end a stable behaviour, though lower than that of the semantic-driven system.

When taking into consideration the proposed system's attributes, it can be concluded that this research also has the prospect of contributing towards Web 3.0 research targets, which focus on semantically enabled architectures, intelligent applications, distributed databases ("The World Wide Database", Nova Spivack) and distributed computing.

FUTURE WORK

The simulation experiments and presented results of the semantic-driven architecture identified the perspectives this work has to offer. The current evaluation acts as a proof of concept for the proposed system. Further evaluation should include a larger dataset and topology for measuring the scalability of the architecture. Additionally, the idea of network adaptivity proposed in this work needs to be implemented in the simulation environment for testing its advantages against the current network structures. Further work should include the implementation of a dynamic environment, by including the peers' join-leave activity and measuring the system's performance based on the network's churn rate. Finally, consistency management for replication and synchronisation processes needs to be employed for supporting the network's distributed ontology.

ACKNOWLEDGMENT

Special thanks go to Panos Alexandropoulos and Stefan Stafrace of the University of Surrey, for providing their technical knowledge and feedback for this work.

REFERENCES

Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web. Scientific American Magazine. Retrieved from http://www.scientificamerican.com/article.cfm?id=the-semantic-web

Castano, S., Ferrara, A., Montanelli, S., & Varese, G. (2009). Semantic coordination of P2P collective intelligence. In Proceedings of the International Conference on Management of Emergent Digital EcoSystems (pp. 99-106). New York: ACM. doi:10.1145/1643823.1643842

Crespo, A., & Garcia-Molina, H. (2005). Semantic overlay networks for P2P systems. In Agents and Peer-to-Peer Computing (pp. 1-13). Retrieved from http://dx.doi.org/10.1007/11574781_1

Cudré-Mauroux, P., Agarwal, S., & Aberer, K. (2007). GridVine: An infrastructure for peer information management. IEEE Internet Computing, 36–44. doi:10.1109/MIC.2007.108

Dou, Y., & Bei, X. (2008). Ontology-based semantic information retrieval systems in unstructured P2P networks. In Proceedings of the 4th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM '08) (pp. 1-4).

Ehrig, M., Haase, P., Siebes, R., Staab, S., Stuckenschmidt, H., Studer, R., et al. (2003). The SWAP data and metadata model for semantics-based peer-to-peer systems. In Multiagent System Technologies (pp. 1096-1097). Retrieved from http://www.springerlink.com/content/mtm7gu9t8bnuuu16

IR-Wire: Gnutella Data Crawler and Analyzer. (2006). Retrieved January 8, 2010, from http://ir.iit.edu/~waigen/proj/pirs/irwire/index.html

Jepsen, T. C. (2009). Just what is an ontology, anyway? IT Professional, 11(5), 22–27. doi:10.1109/MITP.2009.105

Kobayashi, Y., Watanabe, T., Kanzaki, A., Yoshihisa, T., Hara, T., & Nishio, S. (2009). A dynamic cluster construction method based on query characteristics in peer-to-peer networks. In Proceedings of the First International Conference on Advances in P2P Systems (AP2PS '09) (pp. 168-173).

Li, M., Lee, W. C., & Sivasubramaniam, A. (2004). Semantic small world: An overlay network for peer-to-peer search. In Proceedings of the 12th IEEE International Conference on Network Protocols (pp. 228-238).

Liang, J., Kumar, R., & Ross, K. (2004). The KaZaA overlay: A measurement study. In Proceedings of the 19th IEEE Annual Computer Communications Workshop.


Liu, L., Antonopoulos, N., & Mackin, S. (2007). Social peer-to-peer for resource discovery. In Proceedings of the 15th EUROMICRO International Conference on Parallel, Distributed and Network-Based Processing (PDP '07) (pp. 459-466).

Löser, A., Naumann, F., Siberski, W., Nejdl, W., & Thaden, U. (2004). Semantic overlay clusters within super-peer networks. In Databases, Information Systems, and Peer-to-Peer Computing (pp. 33-47). Retrieved from http://www.springerlink.com/content/cd6jv7alxv3eac1k

Nakauchi, K., Morikawa, H., & Aoyama, T. (2004). Design and implementation of a semantic peer-to-peer network. In High Speed Networks and Multimedia Communications (pp. 961-972). Retrieved from http://www.springerlink.com/content/8mvv93v3p8pnuut4

Nejdl, W., Wolf, B., Qu, C., Decker, S., Sintek, M., Naeve, A., et al. (2002). EDUTELLA: A P2P networking infrastructure based on RDF. In Proceedings of the 11th International Conference on World Wide Web (pp. 604-615).

ODP - Open Directory Project. (1998). Retrieved January 17, 2010, from http://www.dmoz.org/

Pretorius, A. J. (2004). Ontologies - Introduction and overview. Vrije Universiteit Brussel.

Ratnasamy, S., Francis, P., Handley, M., Karp, R., & Schenker, S. (2001). A scalable content-addressable network. In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (p. 172).

Rowstron, A., & Druschel, P. (2001). Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms (Middleware) (Vol. 11, pp. 329-350).

Schlosser, M., Sintek, M., Decker, S., & Nejdl, W. (2003). HyperCuP - hypercubes, ontologies, and efficient search on peer-to-peer networks (pp. 112–124). LNCS.

Segaran, T. (2007). Programming collective intelligence: Building smart Web 2.0 applications. New York: O'Reilly Media.

Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., & Balakrishnan, H. (2001). Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (p. 160).

Stokes, M. (2002). Gnutella2 specifications part one. Retrieved February 2, 2010, from http://g2.trillinux.org/index.php?title=G2_specs_part1

Tang, C., Xu, Z., & Dwarkadas, S. (2003). Peer-to-peer information retrieval using self-organizing semantic overlay networks (pp. 175-186). New York: ACM. doi:10.1145/863955.863976

Tatarinov, I., & Halevy, A. (2004). Efficient query reformulation in peer data management systems. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (pp. 539-550). New York: ACM. doi:10.1145/1007568.1007629

Tsoumakos, D., & Roussopoulos, N. (2003). A comparison of peer-to-peer search methods. In Proceedings of the Sixth International Workshop on the Web and Databases.

Yamato, Y., & Sunaga, H. (2006). P2P content searching method using semantic vector which is managed on CAN topology. Journal of Multimedia, 1(6), 1. doi:10.4304/jmm.1.6.1-9

Athena Eftychiou is a PhD student in semantic P2P networks at the University of Surrey. She graduated from the Higher Technical Institute, Cyprus, in 2000, after studying computer studies. In 2002 she started a part-time BSc while working in the IT industry as a web developer. She received her BSc in Computing and Information Systems from the external programme of the University of London in 2007. The same year she joined the MSc in Internet Computing at the University of Surrey, completing it in 2008 with distinction. Her MSc dissertation involved the evaluation of discovery mechanisms for non-replicable reusable resources in P2P networks. Her teaching experience includes teaching and lab support on various modules such as Enterprise System Development, Web Technologies, Information Modelling and Information Discovery. Her research interests cover the area of distributed knowledge and resource sharing technologies, as well as semantic web knowledge modelling and retrieval models.

Bogdan Vrusias graduated with a BSc (Honours) in Computing and IT from the University of Surrey, UK, where he also completed his PhD in the area of multimedia and neural computing. He worked for the University of Surrey as a technology transfer associate in data mining and neural network technologies from 1998 to 2001, then as a research officer for the EPSRC SoCIS project until 2004, followed by his current position as a lecturer in intelligent and distributed systems. He belongs to the Biologically Inspired Modelling and Applications research group, and his research interests include neural computing, multimedia information retrieval, distributed systems, business intelligence, image and video analysis, data mining, and knowledge representation and management. He has more than 20 international peer-reviewed publications, he has been the principal investigator of several EPSRC projects, and he is a reviewer for numerous high-quality international journals and conferences.


A Method of 3-D Microstructure Reconstruction in the Simulation Model of Cement Hydration

Dongliang Zhang, Tongji University, China

ABSTRACT

An accurate and reliable computer simulation system can greatly help practical experiments. In a cement hydration simulation system, the basic requirement is to reconstruct the 3-D microstructure of the cement particles in the initial state, when mixed with water. A 2-D SEM/X-ray image is certainly achievable; however, it is not easy to obtain parallel images due to the small scale of the cement particles. In this regard, a method is proposed to reconstruct the 3-D structure from a single microstructure image. In this method, micro-particles are regenerated in a growing-trees mode, in which, by modifying the generating probability of the leaves, the irregularity and the surface fraction of particles can be controlled. This method can fulfill the requirements for the parameters of the 3-D image while assuring full accord with the 2-D image.

Keywords: 3-D Reconstruction, Cement Hydration, Image, Microstructure, Trees Mode

INTRODUCTION

Today, rapidly developing computing technology provides incredible aids in various fields: CAD, CAM, computer simulation, etc. An accurate and reliable simulation system can help laboratory experiments greatly, even replacing practical experiments. The simulation of cement hydration is a novel and reliable approach to studying the properties of cement. Based on scanning electron microscopy (SEM), X-ray technology and powerful computing ability, a digital computer simulation of cement hydration is proposed to represent the procedure of cement hydration at the microscopic scale and predict the properties of the concrete to a rather accurate degree. Based on the simulation, a favorable proportion and maintaining method can be found. However, the 2-D images which SEM/X-ray technology mostly offers cannot satisfy the requirements of research on the relationship between the microstructure and properties of cement, or of research on the transformation mechanism of the physical and chemical procedure. So reconstructing the 3-D microstructure is inevitable. Based on SEM/X-ray technology and digital image processing, we begin with a 2-D SEM/X-ray image and, combining it with the characteristics of cement particles, label the pixels and then reconstruct the 3-D structure, ensuring all the parameters of the 3-D structure are in accord with those of the 2-D image.

DOI: 10.4018/jghpc.2010100103

In this paper, a method of 3-D reconstruction is introduced, and a tree structure is used to model the structure of cement particles. For each particle, the tree will generate from a root; by choosing the probability of the leaves' growing condition, different shapes of trees can be achieved. Thus, the shape of particles can be simulated with the proper surface fraction, which is most important in the hydration procedure. The probability of the leaves' growing is discussed and formulated through an experiment, and the relationship between shape and the probability parameter is found.

The rest of this paper is organized as follows: In the next section, the basic knowledge of our research is described, including the application of SEM and X-ray technology to cement microstructure research and the procedure for achieving the 2-D image. Following that, the method of 3-D reconstruction and its results are revealed. At last, experimental results are presented.

RELATED WORK AND TECHNOLOGY

Application of Scanning Electron Microscopy (SEM) and X-Ray Images

In recent years, the applications of SEM and X-ray images in describing the properties of cement materials have made great progress (Dale & Paul, 1994; Li, 2000; Bentz, 1993; Wittmann, Roelfstra, & Sadouki, 1984). SEM and X-ray microscope analysis technology has been used in recognizing the main phases in Portland cement (Dale & Paul, 1994; Bentz, 1997). In addition to a standard SEM, backscatter electron imaging (BSE) capabilities provide the technicians with a unique advantage in evaluating particle structures. BSE images are used to distinguish between particles based on atomic weight (the brighter the particle image, the heavier the atomic weight) (Dale & Paul, 1994; Li, 2000). X-ray is used to detect and recognize different chemical elements. Combining the BSE and X-ray images of the same sample, a digital image can be achieved in which different compositions are classified and colored.

2-D Digital Images of Cement Particles

Bentz and Stutzman prepared the specimen (Dale & Paul, 1994; Bentz, 1993), using SEM and X-ray technology as described above, mapping each pixel of a 2-D image onto a certain phase in the specimen. First, about 25 grams of the cement powder of interest are blended with an epoxy resin to form an almost dry paste. The paste is pressed into a sample mold and cured at 60 ºC for 24 hours. The cured specimen is cut using a low-speed diamond-wafering saw. Second, sawing marks are easily removed by sandpaper. Finally, polishing is done on a lap wheel for about 30 seconds each. The specimen is cleaned after each polishing stage by gently wiping it on a clean cloth. The specimen is then coated with carbon to provide a conductive surface for viewing in the SEM. In the BE images, brightness is proportional to the average atomic number of a phase. For the major phases present in portland cement, the phases from brightest to darkest are tetracalcium aluminoferrite (C4AF), tricalcium silicate (C3S), tricalcium aluminate (C3A) and dicalcium silicate (C2S), gypsum, and the resin-filled voids. To supplement the information content of the BE image, X-ray images are obtained for the elements calcium, iron, aluminum, and sulfur. Figure 1 illustrates a 2-D digital image of cement particles.

PROPOSED METHOD IN RECONSTRUCTING 3-D STRUCTURE

A 2-D digital image is achieved as described above, but this image is not adequate to describe the properties of the cement sample, and is not suitable for the simulation. Traditionally a 3-D image is often reconstructed from a series of parallel slices, but at such a small scale as the cement specimen mentioned above, this is very hard to do, and another slice may be totally different from this one.


Figure 1. A processed digital image of Portland cement (256 μm × 200 μm, 512 × 400 pixels): red - C3S, light green - C2S, green - C3A, orange - C4AF, cyan - gypsum, yellow - K2SO4, white - CaO

So a method is proposed to solve this problem. In the proposed solution the exact shape of each particle is discarded, but in fact what we need is not the shape but the degree of irregularity. In other words, the surface fraction of the particles is the most important factor that determines the speed and the completeness of the physical and chemical reactions.

The Extraction of Information from a 2-D Image

The 2-D image in Figure 1 represents a random selection in the sample and is classified in pixels. So it can be looked at as a 3-D sample whose thickness is one pixel, and we can use this sample to reconstruct the 3-D structure. From the 2-D image, parameters can be extracted: the number of particles, the distribution of the particle scale, the water to cement ratio in volume (w/c), the verge pixels of all particles, the verge pixels of each particle, the proportion of each phase in the cement, the verge pixels of each phase, etc. From Figure 1 we can easily get the values of those parameters: number of particles: 2503; verge pixels of all particles: 28510 (13.92%); water to cement ratio in volume (w/c): 0.65; C3S (red): 79034 (38.59%); C2S (green): 17990 (8.78%); C3A (light green): 10997 (5.37%); CaSO4 (grey): 10821 (5.3%); K2SO4 (yellow): 3355 (1.64%).

Additionally, we can get the distribution of the particle scale through statistics and store it into an array, Areas[NumofPart], where NumofPart is the number of particles. Then, in the same way, we get the array of each particle's verge pixels and put them into the array Edge[NumofPart], and, likewise for the main phases, we get arrays of pixels.

The Reconstruction of 3-D Structure

There are so many particles in the sample image that we can believe in the representativeness of the sample.


Chart 1. 2-D area array

2-D Area (in pixels): 47, 1, …

Chart 2. 2-D area array converted to 3-D volume array

3-D volume (in pixels): 243, 1, …

So the water to paste ratio in volume, 0.65, can be believed to be that of the target 3-D sample space. Based on this premise, some of the information of the 3-D structure can be deduced, as we assume the target structure occupies a space of 512 × 400 × 400.

Particles in the 3-D structure are calculated in this way: first convert the array Areas[NumofPart] into the volume array Volume[NumofPart]; for each particle we use the relationship between a ball and its biggest circle:

pixels of 3-D image = (4/3) × π × (pixels of 2-D image / π)^(3/2)

Thus we get a sample group of 3-D particles whose volume distribution is consistent with that of the 2-D sample. Then we fill the 3-D region with this sample. Of course, only one group cannot fulfill the requirement of w/c, so we repeat using the sample group until the w/c reaches the requested value. During the procedure we can get the total count of the particles.

Besides the requirements of w/c and particle size distribution, the proportion of verge pixels in the 3-D structure is also required to be consistent with that of the original 2-D image. But what is the surface fraction of real particles? We can think of it in this way: a 2-D image sample is a random 3-D sample whose thickness is only one pixel, and its fraction is just that of the 3-D structure. So the number of verge pixels in the 3-D space is 11404000 (28510 × 400). Thus we get the basic relationship between the 2-D and 3-D structures, and Table 1 illustrates the parameters and the relationship between the 2-D and 3-D structures.

In prior researchers' work the particles of cement are assumed to be spheres, which are easy to construct, but a sphere has the smallest surface fraction among the different shapes of equal volume, and a non-spherical object often has a much larger surface, which is quite important in the simulation model of cement, as mentioned before. So a new particle structure and a corresponding algorithm are proposed here to fulfill the requirement of surface fraction.

As each unit in a space grid has six adjacent units, the new structure is called a "Semi-Five Forked Tree", in which the root (the blue one in Figure 2) has six children, and the other nodes (the green ones and the yellow ones in Figure 2) have at most five children. The root represents a unit in the space grid and the children represent the adjacent units. A tree here stands for one particle. Figure 2 illustrates the structure of a "semi-five forked tree" and a 3-D model of a particle.

Using the proposed structure, the constructing algorithm can be described as follows:

1) For each particle in the 3-D volume array, select an unoccupied unit in the target space grid as the root, then generate leaves (all the remaining adjacent free spaces) with a certain probability P and store them into a queue.
2) For each leaf in the queue, in sequence, generate leaves (all the remaining adjacent free spaces) with the same probability P and mark that it is no longer a leaf.
3) If the particle has reached the target volume, jump to 1); else jump to 2).
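The following Python sketch illustrates both the area-to-volume conversion and the probabilistic growth of one particle on a cubic grid; the grid representation and all helper names are our assumptions.

```python
# Sketch of the 2-D area -> 3-D volume conversion and of growing one
# particle as a probabilistic tree on a cubic grid (names are ours).
import math
import random
from collections import deque

def area_to_volume(area_px):
    """Volume of the ball whose biggest circle has the given pixel area."""
    return (4.0 / 3.0) * math.pi * (area_px / math.pi) ** 1.5

def grow_particle(free, root, target_volume, p):
    """Claim up to target_volume units for one particle, starting at root.

    `free` is the set of unoccupied (x, y, z) grid units; every unit has
    six axis-aligned neighbours, and each free neighbour becomes a leaf
    with probability p, as in steps 1)-3) above.
    """
    particle = {root}
    free.discard(root)
    queue = deque([root])
    while queue and len(particle) < target_volume:
        x, y, z = queue.popleft()
        for nb in ((x + 1, y, z), (x - 1, y, z), (x, y + 1, z),
                   (x, y - 1, z), (x, y, z + 1), (x, y, z - 1)):
            if len(particle) >= target_volume:
                break
            if nb in free and random.random() < p:
                free.discard(nb)
                particle.add(nb)
                queue.append(nb)
    return particle
```

As a check, area_to_volume(47) evaluates to about 242.4, which rounds to the 243 pixels shown in Chart 2.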


Table 1. Relationship between 2-D and 3-D structure

Parameters | w/c | Particles | Verge pixels | Modal space (in pixels)
2-D | 0.65 | 2503 | 28510 | 512 × 400
3-D | 0.65 | 20024 | 11404000 | 512 × 400 × 400

Figure 2. Space structure (a) and logical structure (b) of the semi-five forked tree, and a 3-D model of a particle (c)

The problem now turns to why P is used and how P is determined. During the experiments on this algorithm it was found that the number of leaves of a tree constructed by the algorithm is determined by the probability P. So the irregularity of a particle is under control. An experiment on a tree with 10^4 nodes (leaves and non-leaves) was done, and the relationship between the generation probability P and the number of leaves is shown in Figure 3.

From Figure 3 we can see that from P = 0.3 to P = 1.0 the number of leaves decreases monotonously, and the trend is approximately like a curve of the form y = a/x + b; using an approximation method we can calculate a and b. We do not choose values of P from 0 to 0.3 because this kind of P will slow down the algorithm.

Let E_i = y_i - (a/x_i + b), so finding a and b now comes down to minimising

F(a, b) = Σ_{i=1}^{n} E_i² = Σ_{i=1}^{n} (y_i - a/x_i - b)²

Namely:

∂F/∂a = -2 Σ_{i=1}^{n} (y_i - a/x_i - b) / x_i = 0

∂F/∂b = -2 Σ_{i=1}^{n} (y_i - a/x_i - b) = 0

Solving the simultaneous equations:

a = [Σ_{i=1}^{n} y_i · Σ_{i=1}^{n} (1/x_i) - n Σ_{i=1}^{n} (y_i/x_i)] / [(Σ_{i=1}^{n} 1/x_i)² - n Σ_{i=1}^{n} (1/x_i²)]

b = (1/n) [Σ_{i=1}^{n} y_i - a Σ_{i=1}^{n} (1/x_i)]

Thus we can get a, b and P for particles of size 10^4; we can also get a P for each different particle size in the same way.
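The closed-form solution above translates directly into code; a small sketch (ours, not the authors'):

```python
# Closed-form least-squares fit of y = a/x + b, following the normal
# equations derived above.
def fit_reciprocal(xs, ys):
    n = len(xs)
    s_y    = sum(ys)
    s_inv  = sum(1.0 / x for x in xs)
    s_yx   = sum(y / x for x, y in zip(xs, ys))
    s_inv2 = sum(1.0 / (x * x) for x in xs)
    a = (s_y * s_inv - n * s_yx) / (s_inv ** 2 - n * s_inv2)
    b = (s_y - a * s_inv) / n
    return a, b
```

For example, fit_reciprocal([1, 2, 4], [3.0, 2.0, 1.5]) returns (2.0, 1.0), recovering y = 2/x + 1 exactly.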


Figure 3. The relationship between generation probability P and number of leaves

Figure 4. A result of 3-D reconstruction of particles


Figure 5. A result of 3-D reconstruction of particles with 3 classified phases (red - C3S, yellow - C2S, green - C3A)

Figure 6. Real experimental result and simulation result


Using different values of P and repeating the generating algorithm, the 3-D structure is obtained. Figure 4 illustrates a resulting 3-D structure of particles in the sample, and Figure 5 illustrates a result of 3-D reconstruction of particles with the 3 main phases classified.

EXPERIMENTAL RESULTS

Based on the reconstructed 3-D structure of the initial status, we simulated the procedure of cement hydration. In this simulation, we use cellular automata as the basic model to simulate the procedure of diffusion and decomposition. We set a series of check time points; at each point we calculate the surface fraction of each model. By measuring the surface fraction, we get the degree of the simulated hydration procedure.

We compared the difference in hydration speed between the spherical particle model and the non-spherical particle model that we have reconstructed, as illustrated in Figure 6. According to the activity of the different phases in the Portland cement, we simulate the diffusion and decomposition of both kinds of model. The results show that the non-spherical model works more accurately than the spherical model, because the sphere has a relatively smaller surface fraction and diffuses at a lower speed. Furthermore, the spherical model cannot reach the real hydration degree.

CONCLUSION

In this paper, we proposed a method to reconstruct the microstructure of cement particles from a single 2-D SEM/X-ray image. An irregular shape is used instead of the traditional sphere, which loses the accuracy of the surface fraction. Through experiments we found a method to control the irregularity of cement particles, using the procedure of trees' growing to generate the 3-D structure. Furthermore, this method is also suitable for the 3-D reconstruction of other kinds of particles whose shapes are irregular.

ACKNOWLEDGMENT

We thank Mr. Dale P. Bentz and Mr. Paul E. Stutzman for sharing 2-D SEM and X-ray images and prior research work on cement hydration.

This work is supported by the Program for Changjiang Scholars and Innovative Research Team in University, the National Defense Basic Research Program (A1420080182) and the National High Technology Research and Development Program ("863" Program) of China (2007AA01Z149).

REFERENCE

Bentz, D. P. (1997). Three-dimensional computer simulation of Portland cement hydration and microstructure development. Journal of the American Ceramic Society, 80(1), 3–21.

Bentz, D. P., & Garboczi, E. J. (1993, March). Digital-image-based computer modelling of cement-based materials. In Proceedings of Digital Image Processing: Techniques and Applications in Civil Engineering, Engineering Foundation Conference.


Bentz, D. P., & Stutzman, P. E. (1994). SEM analysis and computer modeling of hydration of Portland cement particles in petrography of cementitious materials. ASTM STP, 1215, 60–73.

Dongliang, Z. (2003). The Study on Microstructure Model of Cement Hydration Based on Reconstruction of Three-dimensional Image. Unpublished master dissertation, Jinan University, Jinan, China.

Li, W. (2000). The Study on 3-D Reconstruction of Cement Hydration Based on Image Processing. Unpublished master dissertation, Shandong University, Jinan, China.

Wittmann, F. H., Roelfstra, P. E., & Sadouki, H. (1984). Simulation and analysis of composite structures. Materials Science and Engineering, 68, 239–248. doi:10.1016/0025-5416(85)90413-6

Dongliang Zhang received the MS degree from Jinan University and the PhD degree from Tongji University in 2003 and 2009, respectively. He is a postdoctoral researcher at Tongji University, and his main interests are parallel computing and simulation. His recent research is focused on large-scale parallel simulation of traffic flow. Contact him at the Department of Computer Science and Technology.


Network Architectures and Data Management for Massively Multiplayer Online Games

Minhua Ma, University of Derby, UK Andreas Oikonomou, University of Derby, UK

ABSTRACT

Current-generation Massively Multiplayer Online Games (MMOG), such as World of Warcraft and Eve Online, are mainly built on distributed client-server architectures, with server allocation based on sharding, static geographical partitioning, a dynamic micro-cell scheme, or choosing the optimal server for placing a virtual region according to the geographical dispersion of players. This paper reviews various approaches to data replication and region partitioning. Management of areas of interest (field of vision) is discussed, which reduces processing load dramatically by updating players only with those events that occur within their area of interest. This can be managed either through static geographical partitioning, on the basis of the assumption that players in one region do not see/interact with players in other regions, or through behavioural modelling based on players' behaviours. The authors investigate data storage and synchronisation methods for MMOG databases, mainly relational databases. Several attempts at peer-to-peer (P2P) architectures and protocols for MMOGs are reviewed, and critical issues such as cheat prevention on P2P MMOGs are highlighted.

Keywords: Data Management, Distributed Systems, Massively Multiplayer Online Games, Multimedia, Networks, Peer-to-Peer, Region Partitioning

DOI: 10.4018/jghpc.2010100104

INTRODUCTION

MMOG, also known as MMORPG, is short for Massively Multiplayer Online (Role-Playing) Game, a genre of online games which has been growing quickly and steadily since the start of the 21st century. Typically, MMOGs are based on a client-server architecture, where events are computed at the server and sent to clients. Each client controls a single character in the world, known as the player, and communicates actions to the game server. Player characters can acquire and improve skills, collect items in their inventory, trade items with other players, etc. The server serializes the actions, updates the game world accordingly, and communicates any changes to all affected players. The new game state is then rendered by the client-side software. The condition of the game world is constantly evolving, and in some cases in an unpredictable, unscripted fashion, as players interact in the game. Some client-server architectures (Quax et al., 2008) have an intermediary layer of proxy servers which can reduce the number of connections to the servers and cache non-state data, such as mesh or inventory information, to improve response time.

Peer-to-peer (P2P) architectures have recently emerged as a potential alternative design for MMOGs. A few attempts at P2P MMOGs have been reported in the research community (Krause, 2008). Compared to peer-to-peer models, distributed client-server architectures perform well when servers are faster (which is usually the case) or better distributed than clients, but one of the drawbacks of client-server based MMOGs is that peer-to-peer messages must be relayed.

In this paper, we discuss key issues involved in networked computer game development, i.e., MMOG network architectures such as distributed client-server models and the various P2P protocols proposed in the literature, database design, data replication, region partitioning, areas of interest, and cheating, in the hope of shedding some light on the future network design of distributed virtual environments and MMOGs.

DATA REPLICATION

Engagement in MMOGs is meant to last for a long period of time, with users spending several months or even years playing and improving their character. As these games are designed to run continuously for years, it is expected that the hardware infrastructure will at some point inevitably fail, be forced to shut down or otherwise go offline. In such situations it is crucial that the game can be restored to the last correct state it was in. Data replication therefore becomes of paramount importance.

Replication is the process of sharing data to ensure consistency between redundant resources, in order to improve reliability, fault tolerance, and accessibility. It is data replication if the same data is stored on multiple storage devices, or computation replication if the same computing task is executed many times. Current MMOGs are in favour of computation replication due to the issues of bandwidth and network latency. For example, in Russell et al.'s (2008) solution a server sends both the initial state of a given object and deterministic code to simulate the object over time to all interested clients, so that clients can independently update the area of interest consistently with the updates computed by the server. Computation replication results in efficient use of network bandwidth and of the multi-core processors of game platforms.

Traditionally, if the game state is dynamically stored during game play, it can be restored upon restarting a crashed or failed server. This is usually a relatively simple task where a complete snapshot of the world, taken immediately before shutting down the server, can be re-issued. Difficulties however arise when the termination is unpredictable, such as in the case of a server crash. Ideally, there should be no difference between the stored states and the actual game states at the time of failure; however, this may not be realistic for reasons of scalability and efficiency. In this context, the major requirements for replication in MMOGs are (Zhang et al., 2008):

• Consistency. This is relatively easy in classic client-server architectures, whereas a P2P architecture distributes the game state among peers and has no central server, making it very hard to administrate the game state and keep it consistent. Currently, MMOGs cannot solely use a P2P architecture, and there is no working solution yet.
• Efficiency. MMOGs are real-time systems and players expect fast responses to their actions, as high latency can be detrimental to the user's experience, leading to player dissatisfaction which will eventually result in the player unsubscribing from the game. The persistence layer inevitably adds overhead to the server, so the objective in this case is to minimize the additional cost as experienced by a client when executing an action, e.g., moving, fighting or picking up items in the world.


• Scalability. MMOGs typically support thousands of simultaneous players; such capacity must be supported with consistency and efficiency.

All commercial MMOGs provide some form of replication for world recovery and maintenance purposes, but information about the technologies used is scarce.

VIRTUAL REGIONS AND DISTRIBUTED SERVERS

A common way of distributing load is splitting a huge game world into smaller virtual regions which are hosted on geographically distributed servers. Server allocation is usually based on a fixed (virtual) spatial subdivision scheme. For instance, Second Life (SL), one of the most well-known 3D virtual environments, is running on more than 5000 servers, each simulating a 256x256 meter region (called a grid). As the player moves through the world, it is handed off from one server to another. The server handles storing object state, land parcel state, and terrain height-map state (because land owners can change the terrain). Each server performs visibility calculations on objects and land, and transmits the data to the client. Image data is transmitted in a prioritised queue.

A widely identified problem with this static partitioning scheme is that it does not take into account the dynamic nature of MMOGs. As players are not evenly distributed over the virtual world, some servers are almost idle while others are overloaded. Even if the static partitioning is based on population density trends, it is still susceptible to imbalances due to unforeseen events. For example, if a large number of players decide to convene in a single location in the game world, it may cause unacceptable delay or system failure on the server which hosts the virtual region.

De Vleeschauwer et al. (2005) divide a virtual world into dynamic micro-cells which can be reassigned to other servers if the load on the server they are currently hosted on becomes too large, and apply various algorithms to manage load balancing.

On the other hand, some previous studies try to tackle this problem based on the locality of players. They investigate how server selection can be optimized for a single client, when given a set of available servers (Chambers et al., 2003), how to optimise server selection for a group of players (clans) who wish to play cooperatively on a server (Claypool, 2008), and how to find an optimal server for placing a virtual region based on the geographical dispersion of players in an MMOG, which depends heavily on the time of day, and to migrate game states to that server (Beskow et al., 2008). Since the number of players from a given part of the world depends on the time of day, there is a shift in the location of the majority of players. Based on this, Beskow et al.'s (2008) heuristic core selection approach is to dynamically find the centre of most players in a virtual region and migrate game objects to a server whose physical location is closer to the majority of the players in order to reduce latency.

Dynamic partitioning of game worlds on geographically distributed servers will become even more important as game services are moved from an internal cloud of the provider's own servers to the Cloud, i.e., rented server space from companies like Amazon. When the time comes, not only real-time load balance and dynamic shifts in the location of the majority of players but also other factors like latency sensitiveness (e.g., game genres) will be taken into consideration to partition game worlds and find the optimal server to host a virtual region. This is due to the difference in latency requirements for different game genres (Claypool, 2005), e.g., 500 ms Round Trip Time (RTT) for real-time strategy games.

Sharding

Apart from world partitioning, another approach which has been used in World of Warcraft (Blizzard Entertainment, 2004), the most popular MMOG to date, to support millions of simultaneous players is sharding. The game has many server realms (shards) to support the total 12 million subscribers and peak loads of 800,000 concurrent players in some regions, e.g., in China (The9, 2007). A shard (or a realm) is a complete and independent copy of the game world. The maximum number of concurrent players in a shard is bounded, typically at several thousand. Players in different shards cannot interact with each other. This means that at any given time, only the small subset of players who are in the same realm is able to interact with each other; e.g., if a player wants to play with his (real-world) friends in World of Warcraft, he might not be able to do so if their characters (avatars) are in different realms. A shard rapidly reaches the server capacity during peak time, resulting in long queues of players waiting to join the shard. When this happens, a new shard is started on a new server cluster. By adding more shards (on new servers) the game provider can accommodate more players.

However, people who wish to join a particular realm might not be able to do so due to the shard design. This might be one of the motivations for people setting up a private server for LAN parties with their friends.

Since a realm is an instance of the whole game world, this architecture natively supports instancing, through which a group of players (typically 3-20 players) can complete a quest without interference from other players.

AREA OF INTEREST (FIELD OF VISION)

In the MMO world, it is usually sufficient to update players only with those events that occur within their area of interest (or field of vision). Interest management of computation replication (Russell et al., 2008) allows the server to dynamically select a subset of game objects to be replicated to each client. This subset represents the area of interest of a client (player). An object in a given subset executes both on the server and the associated client. Subsets may overlap, i.e., the object executes on the server and multiple clients. Some objects may not belong to any client subset. These objects are tracked by the server only. If object A sends a message to object B, the server sends this message transparently across the network to any client who replicates B but not A. If a client replicates both A and B, then the message transmission between them can be handled locally.

Two approaches have been used to model players' visibility in MMOGs: static geographical partitioning and behavioural modelling (Ahmed & Shirmolammadi, 2008). Static geographical partitioning uses virtual regions for visibility management on the basis of the assumption that players in one region do not see/interact with players in other regions. One drawback of this approach is that a migrating player will not experience others' interactions at the boundaries between regions. To make up for the delay introduced by migration when a player travels across a boundary of two regions, game worlds are typically designed such that the boundaries between virtual regions are either natural boundaries, such as mountains, or uninteresting in nature, e.g., cities are separated by forests or wilderness with few NPCs, so that players do not tend to linger there. Second Life has adopted this approach. All places in Second Life are separated, and players travel to other places through teleporting.

Behavioural modelling manages visibility based on individual players'/characters' behaviours. For example, in a role-playing game, different players, e.g., a warrior who fights with a sword, a hunter who attacks with a ranged weapon, and a mage who throws spells, have different behaviours in terms of how fast they move, how far they can see, and the size of the area they can interact with. A player who uses ranged weapons or spells has a larger area of influence than a player who can only fight in close combat. Although behavioural modelling is the ultimate goal for managing player interest, geographic region partitioning is more common due to the ease of mapping processing resources (or servers) (Lu et al., 2006).
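To make the subset selection just described concrete, here is a minimal sketch; it is not Russell et al.'s (2008) actual scheme, and all names (such as World and aoi_radius) are our own illustrative assumptions. The server recomputes each client's area of interest and relays an object-to-object message only to clients that replicate the receiver but not the sender; clients that replicate both handle it locally.

```python
import math

# Hypothetical server-side interest management: each client replicates the
# subset of objects inside its area of interest; a message between two
# objects is relayed only to clients that replicate the receiver but not
# the sender (clients holding both handle it locally).
class World:
    def __init__(self, aoi_radius):
        self.aoi_radius = aoi_radius
        self.objects = {}    # object id -> (x, y) position
        self.subsets = {}    # client id -> set of replicated object ids

    def update_subset(self, client_id, avatar_pos):
        """Recompute one client's area of interest around its avatar."""
        self.subsets[client_id] = {
            oid for oid, pos in self.objects.items()
            if math.dist(avatar_pos, pos) <= self.aoi_radius
        }

    def clients_needing_relay(self, sender_oid, receiver_oid):
        """Clients that replicate the receiver but not the sender."""
        return [cid for cid, subset in self.subsets.items()
                if receiver_oid in subset and sender_oid not in subset]

world = World(aoi_radius=50.0)
world.objects = {"npc": (10.0, 0.0), "chest": (100.0, 0.0)}
world.update_subset("alice", avatar_pos=(0.0, 0.0))
print(world.clients_needing_relay("chest", "npc"))   # ['alice']
```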


DATABASE DESIGN

Choosing a data storage approach is very important for MMOGs. Many MMOGs adopt a relational database. Online worlds are themselves composed of different types of objects, making the relational model an appropriate choice (Wadley & Sobell, 2007). Primary keys and primary key indices allow for efficient object updates, and there are other benefits as well, such as fast querying for statistics and reports. There are alternatives to relational databases, such as storing the data in a flat text file; however, consistency requires constant updates on very specific parts, which cannot be accommodated efficiently by those types of storage. An alternative proposed by Zhang et al. (2008) is to only append changes to a log; this allows for fast writes but makes recovery more complicated, as an extra data processing step needs to be introduced.

There are several database design approaches when it comes to persistency of MMO world states. One option is for the granularity of persistence to be very coarse, i.e., the entire game state is simply stored in a BLOB attribute of a simple table (Zhang et al., 2008). Another option, for finer granularity, is to define the game world as a set of objects. In this case the database would have one relation table where rows represent objects. Each row would have an object identifier as an indexed primary key attribute and a second attribute storing the object state in binary form (e.g., serialized from the corresponding main memory object). Game-state-changing events would then have to serialize and write only the object(s) affected by those events (Riddoch & Turner, 2002). For even finer granularity, storage can be at the attribute level. In this case, the database would contain a relation for each object class, with each attribute of the class mapped to an attribute of the corresponding table. For example, the database could contain a "Player" table with attributes such as player id, name, level, etc. A change would then only need to be written to those attribute values that have been changed.

On the other hand, having a specialized database has some disadvantages too, as it would not only limit its support for other games, but even for other versions of the same game. This could be a significant issue, as MMOGs are expected to run for many years and have to undergo frequent changes, both to keep them attractive for players and to address design issues and player exploits. This would lead to a situation where every time the structure of the world is altered, the design of the database tables might need to be adjusted. This is expensive and may lead to problems, especially because not all changes and adjustments may be compatible with the previous database design. This can lead to long, expensive and complex redesigns. Storage at the game and object level does not have such problems, according to Zhang et al. (2008).

Two-Layered Database Structure

Zhang et al. (2008) propose splitting a game database into two parts: a generic, game-agnostic object layer and a game-specific attribute layer. Objects are stored in a single table in serialized form with a key attribute for fast search. Each attribute of an object class can be updated dynamically, but they propose that some attributes be kept in separate tables to allow for fast updates. Non-frequently updated attributes would require the entire object to be serialized and the corresponding rows in the object table to be updated.

In the case of the game needing to be redesigned, a new object class would be created or an existing class would be changed. The game designers in this case would have to indicate which of the attributes would be frequently updated, so that they could be automatically put in separate, frequently accessed tables by the persistence layer. When a new object is instantiated, it would be inserted into the Object table, and the initial values of its frequently updated attributes would be inserted into the corresponding tables if present. Usually, object changes only affect the frequently changing attributes, so that only the corresponding attribute tables need updating.
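As a rough illustration of this two-layered structure, the sketch below uses SQLite; the table layout and the choice of position as the frequently updated attribute are our own assumptions, not the schema of Zhang et al. (2008). Serialized objects live in a generic object table, routine updates touch only a small attribute table, and recovery (discussed next) overlays the current attribute values on the possibly outdated serialized objects.

```python
import pickle
import sqlite3

# Illustrative two-layer persistence: a game-agnostic object layer plus a
# separate table for one frequently updated attribute (position).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE object (oid INTEGER PRIMARY KEY, state BLOB);
    CREATE TABLE position (oid INTEGER PRIMARY KEY, x REAL, y REAL);
""")

def create_object(oid, state, x, y):
    # Instantiation: serialize into the object layer and seed the
    # frequently updated attribute table.
    db.execute("INSERT INTO object VALUES (?, ?)", (oid, pickle.dumps(state)))
    db.execute("INSERT INTO position VALUES (?, ?, ?)", (oid, x, y))

def move(oid, x, y):
    # A routine update touches only the small attribute table.
    db.execute("UPDATE position SET x = ?, y = ? WHERE oid = ?", (x, y, oid))

def recover(oid):
    # After a failure: load the possibly outdated serialized object, then
    # overlay the current values of the frequently updated attributes.
    state = pickle.loads(db.execute(
        "SELECT state FROM object WHERE oid = ?", (oid,)).fetchone()[0])
    state["pos"] = db.execute(
        "SELECT x, y FROM position WHERE oid = ?", (oid,)).fetchone()
    return state

create_object(1, {"name": "warrior", "level": 3}, 0.0, 0.0)
move(1, 12.5, 7.0)
print(recover(1))   # {'name': 'warrior', 'level': 3, 'pos': (12.5, 7.0)}
```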


The object layer ensures completeness of information about every object in the world, no matter how outdated it is. The attribute layer ensures currency of information about the specific attributes it has been assigned. Following a shutdown or failure, the first layer retrieves all objects from the Object table, and then reconstructs the latest persistent state by reading the values from the frequently updated attribute tables of the attribute layer.

This database structure thus retains both the flexibility of object-level design and the efficiency of attribute-level design. If the structure of an object were to change, this change would automatically be reflected in the object layer and the database would still be able to support the game. Most updates are made on frequently updated, separately stored attributes at the attribute layer level. This supports efficient updates for those specific attributes, but even if attributes are not tagged as frequently updated, their changes can still be reflected, at a higher latency cost.

In addition to the above, Tveit et al. (2003) present a platform that logs the activities of simulated players in MMOGs for data mining purposes. The platform utilises object-oriented application mapping to a relational database. Typically, the mapping software will automatically generate a relational schema for a given object-oriented model in these cases. In addition, systems such as J2EE provide a persistence engine that writes changes to database objects automatically at specific time points. Application developers only use the object models. A combination of such mapping software and persistence layer could also be used for persistence of game worlds, but current solutions do not allow for approximate solutions, which are particularly important for MMORPGs given their usually large update volume. A disadvantage of this approach is that changing the object model, and thus the database schema, would be potentially complex and costly. The two-layered solution presented above by Zhang et al. (2008) provides both game-state-aware approximation and dynamic adjustments.

PEER-TO-PEER BASED MMOGS

Concurrent players of MMOs are currently reported in the thousands. Naturally, such numbers of concurrent users present challenges to the scalability of current architectures, which are predominantly based on client-server models. In such architectures, the only way to cope with the ever-growing player population is to increase the number of dedicated servers. This solution however is subject to bottleneck issues, especially during peak loads, because of server capacity. Cost is also a consideration.

Peer-to-peer (P2P) architectures have emerged as a potential alternative design choice for building scalable MMO games. P2P architectures have proved successful in aggregating and sharing users' resources for file sharing purposes, achieving high scalability in a cost-effective manner. Typically, in P2P systems participating users/nodes organize themselves in an overlay network. The overall system load is then distributed among all nodes, where each node is equal to the next and acts as both client and server.

The overlay topology structure, which defines the way peers connect to other, usually "neighbouring", peers with whom they can exchange messages, is a key feature of P2P networks. The overlay topology configuration can be arbitrary or, alternatively, can be inspired by application-specific semantics, and it is usually dynamic, depending on current load as well as other predetermined parameters. In a typical Networked Virtual Environment, each user sees only a pre-defined portion of the virtual world, known as the area of interest (AOI), where they can perform actions such as moving around, collecting objects or communicating with other users. The AOI can be based on geographic proximity, where a user is only interested in the activities happening within their own AOI. Because of this, dynamic reorganisation of the overlay network in sync with the users' positions is essential. This is typically achieved by each user only connecting to neighbouring users within their vicinity.


P2P MMOG PROTOCOLS

Several P2P protocols currently exist for MMOGs, at least as research projects. Krause (2008) categorises current P2P protocols as: ALM based protocols, supernode based protocols and mutual notification based protocols.

ALM Based Protocols

Application Layer Multicast (ALM) protocols communicate game events using standard ALM techniques. This means that in most cases, the game world is divided into a number of smaller subspaces. Each of these subspaces is represented by a dedicated multicast group. An event taking place inside one subspace triggers a message that is subsequently sent via the subspace's multicast group to all players that have subscribed to receive updates of events in that particular subspace. Typically, each player only needs to know about events within their visual range, also known as the Area of Interest, or AOI. If a player's AOI intersects the border to another subspace, they also have to subscribe to that other subspace's multicast group for a seamless, coherent gameplay experience. To avoid excessive generation of subscription and unsubscription messages when a player moves close to the border between subspaces, a range parameter is introduced that can be adjusted to be slightly larger than the defined AOI. Players then only unsubscribe from a given subspace if that subspace is both outside their AOI as well as outside their unsubscription range.

In SimMUD, which is a typical example of an ALM based protocol, the game world is divided into fixed regions with unique IDs. The ID of the subspace is also unique and serves as the group ID of the associated multicast group. Any additional state-related information required for the region, such as the location of pick-up items, is stored at the root of the multicast tree.

SimMUD relies on Scribe for the dissemination of multicast messages, which is built on top of the Pastry P2P routing protocol. Pastry supports uniquely identified nodes that can send messages to arbitrary keys. These messages are then delivered to the closest node based on numerical IDs. Since it is not feasible for a node to know every other node of the network, a node-based routing table is used to store log(N) entries. When a message is sent to the network, it is recursively routed to the routing table entry with the closest numerical ID. In this way, Pastry guarantees that this will take a maximum of O(log(N)) steps.

When a node wants to join a multicast group, it sends a JOIN message to the multicast group's ID. Every node routing this message becomes a forwarder for the multicast group, generating in this way a forwarding tree with a depth of O(log(N)).

Failing nodes are detected by their children nodes in the tree structure, who rejoin the overlay network, creating in this way a self-repairing multicast tree. If the failed node is the root of a multicast tree, this can be detected by the new root when it receives the JOIN messages from the former children of the failed node. Since any new potential root is always the second numerically closest node to the group ID, a multicast root can use it to also back up additional information about the subspace in advance. This allows the new multicast root to take over the orphaned multicast group seamlessly.

Supernode Based Protocols

Similarly to ALM based protocols, supernode based protocols divide the game space into subspaces. The subspaces can be either of fixed size or of variable size, set dynamically based on player density. Again similarly to ALM based protocols, each subspace is assigned to a responsible supernode that will receive all game event messages for that subspace and disseminate them to all subspace member nodes. Nodes have to register at the supernodes of the subspaces of their interest, or otherwise with the nodes within the player's AOI. Supernode overload is prevented by limiting the number of nodes per subspace. The subspace will automatically divide if too many nodes subscribe to its supernode. Fixed subspaces, however, require different mechanisms to manage supernode load. An example of this class of protocols is PubSubMMOG (Krause, 2008).
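The numerically-closest routing that Pastry provides for the ALM protocols above can be illustrated with a deliberately simplified sketch: a linear scan over a toy identifier space stands in for Pastry's O(log(N)) routing tables, but it shows which node ends up responsible for a key such as a multicast group ID. All constants and names here are illustrative assumptions.

```python
# A deliberately simplified stand-in for Pastry-style routing: deliver a
# key to the node whose ID is numerically closest on a circular identifier
# space. Real Pastry reaches that node in O(log(N)) hops via per-node
# routing tables; the linear scan here only illustrates the end result.
ID_SPACE = 2 ** 16            # toy space; real deployments use 128 bits

def ring_distance(a, b):
    # Distance between two IDs on the circular identifier space.
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)

def route(node_ids, key):
    # The numerically closest node becomes responsible for the key,
    # e.g. the root of a multicast group's forwarding tree.
    return min(node_ids, key=lambda nid: ring_distance(nid, key))

nodes = [812, 20409, 33000, 47311, 60123]
print(route(nodes, 21000))    # 20409 roots the tree for this group ID
```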


Mutual Notification Based Protocols

Mutual notification based protocols, in contrast to ALM and supernode based protocols, do not divide the game world at all. Instead, player nodes send game event messages directly to all other nodes inside the player's AOI. This minimizes message propagation delay. However, a prerequisite for this is that all player nodes have to be aware of all the other nodes inside the player's AOI. Discovery of new players who have recently moved into a player's AOI relies on information received from neighbouring nodes.

An example of a mutual notification based protocol is Vast. In Vast, each player computes a Voronoi diagram (Fortune, 1986) of all their known neighbours. Nodes whose Voronoi region intersects the outer border of the player's AOI are then classified as boundary neighbours, while nodes adjacent to the player's own Voronoi region are classified as enclosing neighbours. Every time a player moves, all neighbours receive a notification about the movement, at which point they update their Voronoi diagrams. By calculating differences between the old and the new diagram, nodes can deduce whether one of their neighbours has to be informed about the newly discovered node. This, however, is generally the case only for boundary neighbours. In this situation, traffic reduction is achieved by having this checking done only by boundary neighbours, together with their enclosing neighbours.

Joining nodes can enter the overlay simply by sending a JOIN message to the closest player. Failed players can be detected from the absence of movement messages.

The two most important performance criteria for network protocols for MMOGs are event message propagation delay (the time an event message takes to reach players) and the bandwidth requirements. Small event message delays are tolerable, but high latencies have a significant negative impact on the players' experience (Krause, 2008). High latencies can be very frustrating for players, which can lead to disappointment and loss of interest in the game. Game event message dissemination in P2P based protocols for MMOGs is done by the players. Their nodes are usually bandwidth limited; therefore, for any protocol to be successful in maintaining an acceptable level of user experience, bandwidth demands must be kept as low as possible and certainly under certain limits. Krause (2008) measured the difference between the creation time of a movement event and the time of reception of that event by other players, as well as the bandwidth consumption, for the above approaches in simulations, and found that their values are dependent on the group size for different world and world cluster sizes.

In Krause's (2008) simulations, Vast performed best regarding message propagation delay. This can be generalised for all mutual notification protocols, as they share this one-hop event principle. Mutual notification also ensures that message delay is unaffected by player density (either global or local) as long as a node remains within its bandwidth capacity. As shown in the simulations, the bandwidth requirements for this type of protocol are mostly moderate. They increase only when extreme local and global player density is experienced. Nevertheless, even in these cases the bandwidth demand was shown to be smaller than that of the other protocols.

The ALM based protocols showed moderate delays and average bandwidth requirements. They are generally easy to implement and stable, but do not cope well with high global density, which subsequently leads to high bandwidth requirements and longer delays.

PubSubMMOG performed poorly, predominantly due to its poor timeslot design. An important advantage over the ALM based SimMUD was that the delay did not increase much with high player density. High local densities only increased latencies logarithmically, but this advantage was found by the paper to come at the cost of higher bandwidth consumption.

Krause (2008) concludes that mutual notification based protocols have been shown to be the best performers for low-delay P2P based multiplayer gaming; however, the problem of stability and the associated additional bandwidth requirements need to be evaluated.

CHEATING IN DISTRIBUTED GAMES

Kabus et al. (2005) explore cheating in MMOGs. In MMOGs, cheaters usually try to gain an unfair advantage over other players by shortcutting achievements (which would otherwise take long times to achieve) or by duplicating items (e.g., weapons, treasure items). This can significantly disrupt in-game economics and affect satisfaction for non-cheating players. Non-cheating players soon discover that keeping up with cheaters is impossible and decide either to cheat themselves (which accelerates the game's collapse) or to stop playing the game altogether. In client/server systems, the server is responsible for the overall state of the game, which makes it difficult to cheat via malicious game state alterations. However, design flaws and code bugs may still allow successful cheating. This, however, is easier to control in a client-server environment as opposed to a P2P environment, where centralised control either doesn't exist or is limited.

Another kind of cheating is that of players acquiring information not intended for them. A typical example would be the acquisition of enemy positions. Although the enemy should not be seen, in a P2P environment the node is aware of their position, making it very difficult to prevent access to this information through malicious code.

A potentially worse issue than cheaters is griefing (Ma et al., 2009). Griefers are players determined to disrupt other players' experience as much as possible. This can be done without breaking any game rules, for instance, prohibiting a player from acquiring targets by being obstructive to their field of view.

Protection against cheaters and griefers is a mission-critical task, especially for P2P MMOGs.

Approaches Against Cheating in P2P MMOGs

Kabus et al. (2005) discuss the following approaches to addressing cheating in P2P MMOGs:

Distributed State Dissemination. An approach that handles the processing of action requests in a similar manner to traditional client/server systems, with the exception that dissemination of state updates to the clients is done over a P2P delivery system in order to save bandwidth. The global game state is still computed on the server.

Mutual Checking. This scheme allows the global game state to be maintained in a distributed fashion on the clients, but the global state is replicated on multiple clients, which are tasked with detecting cheaters via regular comparisons of their own local versions.

Log Auditing. This approach tries to detect cheating by analyzing signed log files of state transitions caused by updates. This task can be done during off-peak periods of low activity, but does not prevent cheating before it happens. It does, however, allow the global game state to be computed in P2P fashion amongst clients.

Trusted Computing. This approach relies on the player node being protected against any user manipulation (e.g., installing malicious software for cheating purposes). Kabus et al. (2005) conclude that this is the ideal environment for distributed online games.

CONCLUSION

This paper provides an overview of the current state of the art of MMOGs' network architectures and data replication. We have discussed various distributed client-server architectures with server allocation based on sharding, geographical partitioning, and dynamic partitioning schemes based on micro-cells or on the geographical dispersion of players, which depends heavily on the time of day. We also reviewed several approaches to data replication and region partitioning. Management of areas of interest is discussed, which has been used in all MMOGs to reduce processing load by updating players only with those events that occur within their field of vision. We also investigated relational database design for MMOGs, focusing on persistent storage of game states. Recent attempts at P2P architectures and protocols for MMOGs are reviewed as well, and critical issues such as cheat prevention in P2P games are highlighted.

REFERENCES

Ahmed, D. T., & Shirmolammadi, S. (2008). A dynamic area of interest management and collaboration model for P2P MMOGs. In Proceedings of the 2008 12th IEEE/ACM International Symposium on Distributed Simulation and Real-Time Applications (pp. 27-34). Washington, DC: IEEE Computer Society.

Beskow, P. B., Vik, K., Halvorsen, P., & Griwodz, C. (2008). Latency reduction by dynamic core selection and partial migration of game state. In Proceedings of the 7th ACM SIGCOMM Workshop on Network and System Support for Games (pp. 79-84). New York: ACM.

Blizzard Entertainment. (2004). World of Warcraft. Retrieved March 4, 2010, from http://www.blizzard.com

Chambers, C., Feng, F., & Saha, D. (2003, November). A geographic redirection service for online games. In Proceedings of the 11th ACM International Conference on Multimedia, Berkeley, CA (pp. 227-230).

Claypool, M. (2005). The effect of latency on user performance in real-time strategy games. Computer Networks, 49(1), 52–70. doi:10.1016/j.comnet.2005.04.008

Claypool, M. (2008, January). Network characteristics for server selection in online games. In Proceedings of the Fifteenth Annual Multimedia Computing and Networking (MMCN'08), San Jose, CA.

Cronin, E., Kurc, A. R., Filstrup, B., & Jamin, S. (2004). An efficient synchronization mechanism for mirrored game architectures. Multimedia Tools and Applications, 23(1), 7–30. doi:10.1023/B:MTAP.0000026839.31028.9f

De Vleeschauwer, B., Van Den Bossche, B., Verdickt, T., Turck, F., Dhoedt, B., & Demeester, P. (2005). Dynamic microcell assignment for Massively Multiplayer Online Gaming. In Proceedings of the 4th ACM SIGCOMM Workshop on Network and System Support for Games (pp. 1-7). New York: ACM.

Fortune, S. (1986). A sweepline algorithm for Voronoi diagrams. In Proceedings of the Second Annual Symposium on Computational Geometry (pp. 313-322). New York: ACM.

Kabus, P., Terpstra, W. W., Cilia, M., & Buchmann, A. P. (2005). Addressing cheating in distributed MMOGs. In Proceedings of the 4th ACM SIGCOMM Workshop on Network and System Support for Games (pp. 1-6). New York: ACM Press.

Krause, S. (2008). A survey of P2P protocols for Massively Multiplayer Online Games. In Proceedings of the 7th ACM SIGCOMM Workshop on Network and System Support for Games (pp. 53-58). New York: ACM.

Lu, F., Parkin, S., & Morgan, G. (2006). Load balancing for massively multiplayer online games. In Proceedings of the 5th ACM SIGCOMM Workshop on Network and System Support for Games (Vol. 1). New York: ACM Press.

Ma, M., Oikonomou, A., & Zheng, H. (2009). Second Life as a learning and teaching environment for digital games education. In M. Lombard et al. (Eds.), Proceedings of the 12th Annual International Workshop on Presence (PRESENCE 2009), Los Angeles, CA.

Mauve, M., Vogel, J., Hilt, V., & Effelsberg, W. (2004). Local-lag and timewarp: Providing consistency for replicated continuous applications. IEEE Transactions on Multimedia, 6(1), 47–57. doi:10.1109/TMM.2003.819751

Quax, P., Dierckx, J., Cornelissen, B., Vansichem, G., & Lamotte, W. (2008). Dynamic server allocation in a real-life deployable communications architecture for networked games. In Proceedings of the 7th ACM SIGCOMM Workshop on Network and System Support for Games (pp. 66-71). New York: ACM.

Riddoch, A., & Turner, J. (2002, July). Technologies for building open-source massively multiplayer games. In Proceedings of the UKUUG Linux Developers Conference, Bristol, UK.


Russell, G., Donaldson, A., & Sheppard, P. (2008). Tackling online game development problems with a novel network scripting language. In Proceedings of the 7th ACM SIGCOMM Workshop on Network and System Support for Games (pp. 85-90). New York: ACM.

Second Life Wiki. (n.d.). Second Life server architecture. Retrieved March 10, 2010, from http://wiki.secondlife.com/wiki/Server_architecture

The9. (2007). World of Warcraft: The Burning Crusade surpasses 800,000 peak concurrent user milestone in mainland China on Oct 4, 2007. Retrieved March 10, 2010, from http://www.the9.com/en/about/about_2.htm

Tveit, A., Rein, O., Iversen, J. V., & Matskin, M. (2003). Scalable agent-based simulation of players in massively multiplayer online games. In Proceedings of the 8th Scandinavian Conference on Artificial Intelligence, Frontiers in Artificial Intelligence and Applications. IOS Press.

Wadley, G., & Sobell, J. (2007). Using a simple MMORPG to teach multi-user, client-server database development. In Proceedings of the MS Academic Days Conference on Game Development.

Zhang, K., Kemme, B., & Denault, A. (2008). Persistence in Massively Multiplayer Online Games. In Proceedings of the 7th ACM SIGCOMM Workshop on Network and System Support for Games (pp. 53-58). New York: ACM.

Minhua Ma is a Reader in Visualisation & Virtual and Programme Leader for MSc Computer Games Production at the University of Derby. She completed her Doctorate in Computer Science from the University of Ulster in 2005 and MSc in Computing Science from the University of Newcastle upon Tyne in 2001. Her research areas include Virtual Reality, 3D visualisation, serious games, and Natural Language Processing. Her principal lines of work have been published in 40 peer-reviewed scientific journals as well as conference proceedings. She has received grants from the NICHSA for her work on Virtual Reality in rehabilitation and a number of other grants for her research in visualisation and games. She has been supervising Ph.D. students in video games, digital watermarking and e-learning. Dr. Ma is an editor of the Journal of Computer Science and Information Technology, guest editor of a number of journal special issues, and has served on the Editorial Board of SJI and numerous conference committees.

Andreas Oikonomou is a lecturer and computer games subject coordinator at the University of Derby. Prior to his current role, Dr. Oikonomou managed Derby Games Studio, the University of Derby's commercial games studio, and has also worked for the University's Business Development Unit as Quality Assurance manager. Before joining Derby he was a game development lecturer and researcher at Coventry University and taught games development and interactive multimedia at Coventry City College. He holds a PhD in Educational Multimedia, a Master's degree in Information Technology for Management and a BSc (Hons) degree in Engineering. His current interests include distributed games, game design, game-based learning and assessment, real-time rendering, game engines, interactive multimedia, biomedical engineering and business management.


Managing Inconsistencies in Data Grid Environments: A Practical Approach

Ejaz Ahmed, King Fahd University of Petroleum and Minerals, Saudi Arabia and University of Bedfordshire, UK Nik Bessis, University of Bedfordshire, UK Peter Norrington, University of Bedfordshire, UK Yong Yue, University of Bedfordshire, UK

ABSTRACT

Much work has been done in the area of data access and integration using various data mapping, matching, and loading techniques. One of the main concerns when integrating data from heterogeneous data sources is data redundancy. The concern is mainly due to the different business contexts and purposes for which the data systems were originally built. A common process for accessing data from integrated databases involves the use of each data source's own catalogue or metadata schema. In this article, the authors take the view that there is a greater chance of data inconsistencies, such as data redundancies, when integrating data within a grid environment than in traditional distributed databases. The importance of improving the data search and matching process is briefly discussed, and a partial service-oriented generic strategy is adopted to consolidate the distinct catalogue schemas of federated databases in order to access information seamlessly. To this end, a proposed matching strategy between structure objects and data values across federated databases in a grid environment is presented.

Keywords: Data Integration, Grid, Metadata, Pattern Matching, Plug-In Relations, Staging DBMS

DOI: 10.4018/jghpc.2010100105

INTRODUCTION

The role of traditional data integration (Bessis et al., 2007; Austin et al., 2006) and loading techniques and methods requires more attention when functioning within the grid environment. One is the higher chance of conflict between metadata and structures when integrating data (multi-vendor relational, object-relational DBMSs) within a grid environment, since the data sources were originally produced for a purpose other than their integration (Bessis et al., 2009). This is our main motivation for undertaking this research.

The grid is an emerging infrastructure that supports the discovery, access and use of distributed computational resources (Alpdmir et al., 2003), including data integration in the de-facto OGSA-DAI (Open Grid Service Architecture – Data Access and Integration) specification framework. Significant effort has gone into defining requirements and protocols and implementing the OGSA-DAI specification framework as the means for users to develop relevant data grids to conveniently control the sharing, accessing and management of large amounts of distributed data in grid environments (Antonioletti et al., 2005; Atkinson et al., 2003). Ideally, OGSA-DAI as a data integration specification aims to allow users to specify 'what' information is needed without having to provide detailed instructions on 'how' or 'from where' to obtain the information (Reinoso Castillo et al., 2004). However, mapping multiple physical replicas to one single logical file increases data redundancy (Jacob et al., 2005). Yin et al. (2009) also explain that running join queries over a data grid environment requires appropriate strategies for decomposing and disseminating the query to as many sources as possible, as processing the user's query in parallel will also bring overhead in repetitive computing, redundant data transmission, and result merging.

In this article, our main contribution is the discussion and proposal of a schema matching exercise for identifying two objects in a way that semantically relates them, while we refer to mapping as the transformations between the objects concerned. That is to say, when data is accessed, the system will attempt to virtualize and/or transfer it within the target source. It is anticipated that if there is more than one accessing object in the same or dispersed data source(s), such as DBMSs, then it is necessary to define matching patterns to access the correct objects concerned, along with their properties. In our present discussion, a schema is treated as a set of elements connected by some structure. A target database schema requires access to certain data from many objects (or elements of a schema) of various distinct database schemas, with the help of the common sources' metadata.

With this in mind, the main intention of our article is multi-fold: firstly, to offer a brief review of data matching exercises; secondly, to present our proposed strategy by discussing our data matching and mapping framework, including a linguistic matching approach; and thirdly, to discuss in full our methodological approach for a grid based matching process using a number of developed metadata algorithms. We finally conclude by presenting some pilot experimental results and further work.

LITERATURE REVIEW

This section presents an overview of, and a comparison between, matching approaches. Rahm and Bernstein (2001) developed a taxonomy of schema matching approaches which for some remains the most significant contribution in this field. The taxonomy consists of two major branches (Figure 1): (a) individual matcher approaches and (b) combining matchers. Regarding combining matchers, hybrid matchers integrate matching criteria prior to mapping, whereas composite matchers integrate the results of individual matchers post-mapping. The description of individual matchers below is from Rahm and Bernstein (ibid).

• Instance vs. schema: matching approaches can consider instance data (i.e., data contents) or only schema-level information.
• Element vs. structure matching: matching can be performed for individual schema elements, such as attributes, or for combinations of elements, such as complex schema structures.
• Language vs. constraint: a matcher can use a linguistic-based approach (e.g., based on names and textual descriptions of schema elements) or a constraint-based approach (e.g., based on keys and relationships).
• Matching cardinality: the overall match result may relate one or more elements of one schema to one or more elements of the other, yielding four cases: 1:1, 1:n, n:1, n:m. In addition, each mapping element may interrelate one or more elements of the two schemas. Furthermore, there may be different match cardinalities at the instance level.


Figure 1. Individual and combining schema matching approaches (from Rahm & Bernstein, 2001)

• Auxiliary information: most matchers rely Other considerations may further enrich not only on the input schemas S1 and S2 understanding and development of the data but also on auxiliary information, such matching approaches. Calvanese et al. (1999) as dictionaries, global schemas, previous employs three types of interschema correspon- matching decisions, and user input. dences for data integration and reconciliation in data warehousing. We note that Rahm and Sellami et al. (2010, p. 18) consider Rahm Bernstein (2001) fulfil the role of the “matching and Bernstein’s (2001) and other approaches to correspondences” type: be limited to small- and medium-scale matching problems, but to be insufficient for large-scale • Conversion Correspondences are used problems over schemas or ontologies with to specify that data in one source can be hundreds or thousands of components. They converted into data of a different source present a classification of approaches to large- or of the data warehouse, and how this scale problems under pair-wise and holistic conversion is performed. They are used approaches, employing a range of strategies, to anticipate several types of data conflicts such as fragmentation, clustering and statisti- that may occur in loading data. cal; it should be noted that their terminology • Matching Correspondences are used to for approaches and strategies (inter alia) is not specify how data in different sources can consistent. Pair-wise approaches (ibid, p. 18) match. attempt to construct an integrated schema for • Reconciliation Correspondences are used two sources, using any of a variety of strate- to assert how we can reconcile data in differ- gies to divide large and complex sources into ent sources into data of the data warehouse. manageable sub-problems. Holistic approaches (ibid, p. 21) attempt to match multiple schemas Equally, Haas et al. (2002, p. 586) raise at the same time. some interesting questions which will have a

• Is transactional consistency important for the integration problem?
• Is there a sophisticated cost model associated with the federated operations?

The first of these is important where, for example, data may change between queries, even though these are close in time, leading to different queries returning time-dependent results, which may cascade effects. The second is a natural consequence of sharing distributed resources, physical and logical, all of which have associated costs, notwithstanding any altruistic data-sharing philosophy.

According to the authors' knowledge, a topic which has not been much discussed in the literature is the positioning of the matching process within the overall query-response cycle. In this article, for example, we approach matching as occurring prior to a query being accepted, with that matching resulting in an integrated source, rather than, say, a query being parsed and presented to different data stores in ways relevant to those particular stores.

To this end, a wider review of the literature is beyond the scope of this article, but it is clear that this is a domain rich in challenges for definition, description, classification and integration of approaches to provide useful and scalable solutions.

FRAMEWORK OF DATA MAPPING AND MATCHING

In this section, we illustrate the formalization of data matching and data mapping, which is based on three heterogeneous data models (relational, object-oriented and object-relational). We follow basic nomenclature and notations as described in the data integration context and more specifically as proposed by Cali et al. (2002). We assume that different constants between data sources denote different objects or elements. A schema often contains constraints to define data types and range values, uniqueness, optionally relationship type, cardinalities, etc. If two input schemas contain such information, it can be used by a matcher to determine the similarity of schema elements (Larson et al., 1989). For example, similarity can be based on the equivalence of data types and domains, including key characteristics (e.g., unique, primary, foreign) and relationship cardinality (e.g., 1:1 relationships) (Rahm & Bernstein, 2001). However, some matching is based on the object's name assigned by the user, like a storage table name or some other attribute name.

It is assumed that the generic implementation of a matching mechanism represents the schemas to be matched in a uniform internal representation. This uniform representation reduces the complexity of the heterogeneity of various schemas or databases. To reduce heterogeneity, various tools are available that are tightly integrated with the framework of uniform representation of schemas. Some other tools need import/export programs to translate data between their native schema representation (such as XML, SQL, or UML) and the internal representation (Rahm & Bernstein, 2001). Similarly, many algorithms depend mostly on the kind of information they exploit, but not on generic representations.

As defined by Cali et al. (2002), G represents the global or target schema and S represents a source schema. A data transformation system Γ(G,S) is a triplet Γ(G,S) = (G, S, M(G,S)), where M(G,S) is a mapping between the source schema (S) and the target schema (G). We propose an iterative integration-by-example by introducing another middle-tier temporary schema called the staging schema (Ahmed et al., 2008), or a staging DBMS as a database in the grid environment. To achieve this, we consider the following example.

Example-1: Consider an example of source and target schema by choosing the triplet Γ1 = (G1, S1, M1(G,S)). The source and target are constituted by the relation symbols, as shown in Figure 2.
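To make the formalization concrete, the following minimal Java sketch models a transformation system as a triplet of a target schema, a source schema and a set of mapping elements between their attributes. This is our illustration only, not code from the original system; all class, field and method names are hypothetical:

    import java.util.ArrayList;
    import java.util.List;

    // Minimal model of the triplet Γ(G,S) = (G, S, M(G,S)): a target schema G,
    // a source schema S, and a mapping M(G,S) as element-to-element correspondences.
    public final class TransformationSystem {
        final List<String> targetSchema;                   // elements of G, e.g. "Customer.CID"
        final List<String> sourceSchema;                   // elements of S, e.g. "Client.Client#"
        final List<String[]> mapping = new ArrayList<>();  // M(G,S) as {source, target} pairs

        TransformationSystem(List<String> g, List<String> s) {
            this.targetSchema = g;
            this.sourceSchema = s;
        }

        // Record that a source element is mapped to a target element.
        void map(String sourceElement, String targetElement) {
            mapping.add(new String[] { sourceElement, targetElement });
        }

        public static void main(String[] args) {
            // Example-1 style usage: one correspondence of the triplet Γ1.
            TransformationSystem t1 = new TransformationSystem(
                List.of("Customer.CID"), List.of("Client.Client#"));
            t1.map("Client.Client#", "Customer.CID");
            System.out.println(t1.mapping.size() + " mapping element(s) recorded");
        }
    }

For Example-1, one would instantiate the triplet with the relation symbols of S1 and G1 and record correspondences such as map("Client.Client#", "Customer.CID").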


Figure 2 depicts a schema matching by identifying how elements are semantically related. Underlined attributes represent primary key constraints, and primary key names appearing in other relations represent foreign keys. We define a mapping to be a set of mapping elements, each of which indicates that certain elements of source schema S1 are mapped to certain elements in target G1. Each mapping element could have a mapping expression relating the matched elements. In this mapping, some value correspondences are injective, indicating that the mappings are 1:1. Some other value correspondences are surjective, such as the product of two values (attributes) from the sources, or there could be a function applied to one or more attributes of the source schema. For example, if f: S → G then

f: SoldPrice – (SoldPrice * Discount) → Sold.SPrice (1)

where instance values of attribute Discount contain percentage values that are given to a product at the time of sale in the source schema. Such a discount is not explicitly indicated in the target schema; instead, the target attribute SPrice is computed as mentioned in equation (1).

Two schemas S and G represent a similar domain application but in a different business scenario or local requirement of the system, as given in equation (1). A mapping from source Client.Client# to target Customer.CID can be written as a mapping expression or value correspondence. That is,

Customer.CID = Client.Client#

Similarly,

Orders.OType = PurchaseOrder.InvoiceType

Also, in equation (1), a function formula will be

Sold.SPrice = OrdProd.SoldPrice – (OrdProd.SoldPrice * OrdProd.Discount)

and so on; other correspondences are shown in Figure 2.

This work provides a basis for a simple mapping that exists between two schemas whereby correspondences are created with attributes from schemas matching objects or relations.
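The correspondences above can be executed directly when rows are loaded. The short Java sketch below is ours; the row type, field names and the assumption that Discount is stored as a fraction (0.10 = 10%) are all illustrative:

    // Applies the value correspondences of Figure 2 / equation (1) to one source row.
    public final class SourceRow {
        final String clientNo;    // source Client.Client#
        final double soldPrice;   // source OrdProd.SoldPrice
        final double discount;    // source OrdProd.Discount, assumed stored as a fraction

        SourceRow(String clientNo, double soldPrice, double discount) {
            this.clientNo = clientNo;
            this.soldPrice = soldPrice;
            this.discount = discount;
        }

        // Injective (1:1) correspondence: Customer.CID = Client.Client#
        String targetCustomerId() {
            return clientNo;
        }

        // Function correspondence (1): Sold.SPrice = SoldPrice - (SoldPrice * Discount)
        double targetSPrice() {
            return soldPrice - (soldPrice * discount);
        }

        public static void main(String[] args) {
            SourceRow r = new SourceRow("C-17", 100.0, 0.10);
            System.out.println(r.targetCustomerId() + " " + r.targetSPrice()); // C-17 90.0
        }
    }

For instance, a source row with SoldPrice 100.0 and Discount 0.10 yields a target SPrice of 90.0.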

Figure 2. Scenario of source and target schemas’ elements mapping

In mapping, issues of data anomalies and redundancies are most likely to occur. Such issues will be explored in the following section.

LINGUISTIC MATCHING APPROACHES

Linguistic or language-based matching uses names and text in the form of words or sentences to find semantically similar schema elements. Similarity of names can be defined, recognized and measured in various ways; some examples are found in Rahm and Bernstein (2001) and Bell and Sethi (2001), and a small code sketch of these techniques follows Example-2 below:

• Equality of canonical name representations using special prefix/suffix symbols. For example, Client# → Client Number or ClientNo → Client Number, ProdID → Product Identifier.
• Equality of synonyms. For example, car ≅ vehicle or car ≅ automobile and make ≅ brand or model.
• Similarity of names based on common substrings, edit distance, soundex (an encoding of names based on how they sound rather than how they are spelled), etc. For example, CR_amount ≅ Credit, ShipTo ≅ Ship2, OrderType ≅ ShipmentType, representedBy ≅ representative.
• User-provided name matches. For example, reportTo ≅ supervisorId, reportTo ≅ manager, error ≅ bug.
• Equality of hyponyms. For example, book is-a publication and article is-a publication implies book ≅ publication, article ≅ publication, and report ≅ article.

Consequently, a linguistic analysis with comments can be produced to provide commentary about each schema element; for example, source schema S1 contains Cust#, which may mean customer number or identifier, and S2 contains CustAddress, which may mean customer address or customer's contact details.

An issue of data inconsistency in terms of anomalies will occur if some information, such as an attribute, has a certain specific meaning in one object. Based on its corresponding match, it may be interpreted differently in another schema or data source segment. For example, the size of data values of matched attributes can be different. Similarly, in conditional scenarios of 'Male' and 'Female', in one schema the data values are 'M' and 'F' whereas in another corresponding schema the data values could be '0' and '1'. The term redundancy may mean that a matching attribute may occur in more than one data source segment during the search. Database normalization is used to consolidate such kinds of discrepancy or information anomaly.

GRID BASED FRAMEWORK OF MATCHING

Now we move towards a kind of generic or uniform methodology that can help to reduce the conflict of heterogeneous representations of schemas/databases. Consider a set of heterogeneous databases (mainly DBMSs) that are integrated through a staging DBMS. At both levels of this DBMS software diversity – the tool and database levels – there exists the problem of communication between the software. The DBMSs usually do not understand or are unable to communicate with each other (Rezenda et al., 1999). To resolve the heterogeneity problem, federated databases will communicate only via a staging DBMS. All such databases are connected with a staging DBMS using connectivity drivers like ODBC and ODBC-JDBC. The staging DBMS will provide a service to make data sharing seamless. Such sharable data services include data loading, data transformation, data matching and temporary data storage services, which are further explained in the following examples.

Example-2: A user can access a public schema of one or more databases. A web interface is provided with a default connection to the staging DBMS. A user can write a query on the chosen database. A temporary storage buffer of the staging DBMS will be used to keep data fetched from any of the integrated databases.
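The name-similarity techniques listed in the previous section can be sketched compactly. The following Java fragment is our illustration: the synonym entries, the canonicalisation rules and the edit-distance threshold of 2 are arbitrary choices, and a production matcher would also implement soundex and common-substring measures:

    import java.util.Map;
    import java.util.Set;

    public final class NameMatcher {
        // A tiny user-provided synonym table (entries taken from the examples above).
        static final Map<String, Set<String>> SYNONYMS = Map.of(
            "car", Set.of("vehicle", "automobile"),
            "reportto", Set.of("supervisorid", "manager"));

        // Canonical form: lower-case, drop underscores, expand a few common suffixes.
        static String canonical(String s) {
            s = s.toLowerCase().replace("_", "");
            if (s.endsWith("#") || s.endsWith("no")) s = s.replaceAll("(#|no)$", "number");
            else if (s.endsWith("id")) s = s.replaceAll("id$", "identifier");
            return s;
        }

        // Standard Levenshtein edit distance (dynamic programming).
        static int editDistance(String a, String b) {
            int[][] d = new int[a.length() + 1][b.length() + 1];
            for (int i = 0; i <= a.length(); i++) d[i][0] = i;
            for (int j = 0; j <= b.length(); j++) d[0][j] = j;
            for (int i = 1; i <= a.length(); i++)
                for (int j = 1; j <= b.length(); j++) {
                    int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                    d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                       d[i - 1][j - 1] + cost);
                }
            return d[a.length()][b.length()];
        }

        static boolean similar(String n1, String n2) {
            String a = n1.toLowerCase(), b = n2.toLowerCase();
            if (SYNONYMS.getOrDefault(a, Set.of()).contains(b)
                    || SYNONYMS.getOrDefault(b, Set.of()).contains(a)) return true; // synonyms
            String c1 = canonical(a), c2 = canonical(b);
            if (c1.equals(c2)) return true;        // canonical name equality
            return editDistance(c1, c2) <= 2;      // crude stand-in for substring/soundex
        }

        public static void main(String[] args) {
            System.out.println(similar("Client#", "ClientNo"));  // true: canonical equality
            System.out.println(similar("ShipTo", "Ship2"));      // true: edit distance <= 2
            System.out.println(similar("reportTo", "manager"));  // true: synonym table
        }
    }

With these rules, similar("Client#", "ClientNo") holds through canonical equality, while similar("ShipTo", "Ship2") falls to the edit-distance test.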


We consider a function f that is directed from a staging DBMS to any federated database. Refer to Figure 3; if x is a string initiated at the staging database (X) that is searched from each database (Y), then f can be mapped as

fi: X → Y, i = 1, 2, 3, …, n

The range Y may have more than one image of x ∈ X, i.e., more than one domain match exists in range Y. The possible matched values will be mapped into the staging database X, defined by a function g as

gi: Y → X, i = 1, 2, 3, …, n

Also, x ≅ φ means no match is found. To clarify the granularity issue of matching as discussed above using functions f and g, we define a function ρDB to perform a search for an element string str from schema S of database DB with a certain matching criterion:

f(str) = ρDB(str, ψ, δ) (2)

where ψ = d for an instance or domain value and ψ = s for a structure element, and where δ represents an instance or domain value that needs to be searched in structure elements when ψ = d; it represents a set of structure elements, such as relations and attributes, when ψ = s.

Let g be a function on ρDB which returns possible similar matched value(s), that is

str ≅ gi(f(str), ηj) ∧ str ⊆ gi(f(str), ηj) (3)

where ηj represents a set of structure elements from schema Si of database DB through which a possible match of string str is found or returned as an output, and ηj = φ means no match is found.

Example-3: Consider a telephone number "8606015" that needs to be searched in a number of federated databases in a grid environment. Then function f can be written as

f(str) = ρDB("8606015", d, {telephone number, telno, tel#, fax#, mobile#})

The returned values of this function as defined in equation (2) would be

str ≅ g1(f(str), {S1.Client.telno}) such that, e.g., t[S1.Client.telno] = "8606015", and
str ≅ g1(f(str), {S2.Customer.fax#, S2.Employee.HomeTelNo})

where S1, S2, …, Sn represent schemas of some database DB.
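A direct, if naive, reading of equation (2) can be coded as follows. This Java sketch is our illustration: the map-based view of a federated schema and all names are assumptions. It returns the set ηj of structure elements through which the string is found, with an empty set playing the role of φ:

    import java.util.LinkedHashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public final class FederatedSearch {
        // Abstraction of one federated database: element name -> its instance values.
        final Map<String, List<String>> elements;

        FederatedSearch(Map<String, List<String>> elements) { this.elements = elements; }

        // f(str) = ρDB(str, ψ, δ): ψ = 'd' searches instance values, ψ = 's' searches
        // structure elements; δ restricts the elements considered, by name fragment here.
        Set<String> rho(String str, char psi, Set<String> delta) {
            Set<String> eta = new LinkedHashSet<>();
            for (Map.Entry<String, List<String>> e : elements.entrySet()) {
                boolean candidate = delta.stream()
                    .anyMatch(d -> e.getKey().toLowerCase().contains(d.toLowerCase()));
                if (!candidate) continue;
                if (psi == 's') eta.add(e.getKey());                // structure-level match
                else if (psi == 'd' && e.getValue().contains(str))
                    eta.add(e.getKey());                            // instance-level match
            }
            return eta;   // empty set corresponds to ηj = φ (no match)
        }

        public static void main(String[] args) {
            FederatedSearch db = new FederatedSearch(Map.of(
                "S1.Client.telno", List.of("8606015", "8606016"),
                "S2.Customer.fax#", List.of("8606015")));
            // Example-3 style search for an instance value:
            System.out.println(db.rho("8606015", 'd', Set.of("telno", "fax#")));
        }
    }

For Example-3, searching "8606015" with δ = {telno, tel#, fax#, mobile#} returns elements such as S1.Client.telno whose stored values contain the number.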

Figure 3. Searching patterns of string element name from federated DBMSs

It is possible for several attributes to have the same domain. For example, suppose that we have a customer entity having the three attributes customer-name, customer-street and customer-city, and similarly an employee entity having the employee-name attribute. It is possible that the attributes customer-name and employee-name will have the same domain: the set of all person names, which at the physical level is the set of all character strings. The domains of balance and balance-name, on the other hand, certainly ought to be distinct. It is perhaps less clear whether customer-name and branch-name have the same domain.

It is possible that the same credit card number may appear in more than one federated database. It can even appear in more than one relation of a schema(s). For example, in one relation it appears so as to keep master information, while the same credit card number appears in some other relation with defaulter status or fraud status. Such occurrences can be found with the help of the above matching methodology using relation (4).

For a search of a single match in grid databases, a number of grid databases (d) (say n items) are involved. Each of them contains a number of schema objects (s), say m items; each schema object may contain a number of relations or tables (t), say p items; and one or more attributes (a), say q items, of the relation(s) will be scanned. Thus, the total number of object items involved in the search match would be

Σ(d=1..n) Σ(s=1..m) Σ(t=1..p) Σ(a=1..q) (A_dsta) → (n × m × p × q items)

This indicates the granularity and complexity of searching a number of objects.

IMPLEMENTATION OF STAGING METADATA

Our assumptions are that all DBMSs involved during the data integration process do contain metadata and that the user has the privileges to access and use metadata to extract general information about the basic structures of database objects. Although it is considered beyond the scope of this work, we assume that DBMSs having no metadata could use XML to export data and schema structures, which could be used as metadata for integration and interoperability purposes.

Information about object structure can include element names, data types, constraints, etc. For example, in a relational DBMS, tables have their attributes with data types and constraints. The methods of accessing such metadata information are not generic across heterogeneous DBMSs. Implementation of mapping requires a generic solution for accessing metadata information, especially when DBMSs exist in a grid environment or act as federated databases. To handle such conflicts of mapping implementation, we introduce a new approach such that each federated DBMS contains a public schema segment whereby read privileges on other schema objects are granted. A schema for the metadata catalogue is created in a staging DBMS as a replica of all federated metadata catalogues, as shown in Figure 4. Such a metadata catalogue in the staging DBMS is termed the SCAT (Staging Catalogue).

On the global scale, it is expected that the amount of data flowing or searched into or around integrated grid DBMSs could be of the order of a terabyte. Any data information can be searched in the form of pattern matching, which is possible in two phases. In a quick search, a pattern will be searched initially through the SCAT of the staging database. This will generate a profile of possible searched data elements as an output to perform further granular searches.

The Pattern Matching Controller is the front-end service for pattern matching operations across the data held at each federated DBMS. It accesses all objects of each federated DBMS which were searched from the SCAT, as shown in Figure 4. In the staging database, the Pattern Matching Log service includes object details where patterns exist and in which form, as shown in Figure 4. A search pattern can be a data value; it can be the name of some relation or an attribute. Similarly, the log service maintains the list of patterns that are included in the search but not found in federated DBMSs. These services can be additional building blocks of OGSA-DAI.
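The n × m × p × q count derived above is easy to make concrete. The following Java fragment is ours, with invented example figures; it shows why an unassisted grid-wide scan is expensive and why the SCAT quick-search phase is performed first:

    // Illustration of the search-space size: with n databases, m schema objects each,
    // p relations per schema and q attributes per relation, a naive grid-wide scan
    // touches n * m * p * q items, which the SCAT quick-search phase avoids.
    public final class SearchSpace {
        static long naiveScanItems(int n, int m, int p, int q) {
            return (long) n * m * p * q;
        }

        public static void main(String[] args) {
            // e.g. 20 federated databases, 10 schemas, 50 tables, 12 attributes:
            System.out.println(naiveScanItems(20, 10, 50, 12)); // 120000 objects to scan
        }
    }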


Figure 4. Pattern match data management architecture

The implementation procedure involves programming in JSP (Java Server Pages) using service-based programming with multi-vendor databases.

Metadata Catalogue Algorithm

The SCAT, in the form of metadata, contains the structure of relations (4), which are called plug-in relations. These plug-in relations provide similar information to the standard metadata dictionaries of any DBMS, but some auxiliary information is included to improve the effectiveness of mappings. Note that users can customize plug-in relations. We will use the concept of granularity of match, i.e., element level and structure level (Rahm & Bernstein, 2001), with the following plug-in relations:

DataTable (TableName, SchemaName, Description, DateCreated)
TableDetails (TableName, SchemaName, Serial#, Attribute, DataType, Size, Constraints, ShortDesc, DetailDesc) (4)

These relations will act as a pool and a library encompassing an enterprise dictionary and/or schema taxonomies. Underlined attributes represent unique key constraints. It is noted that the attributes Description, ShortDesc and DetailDesc play an important role for the name or linguistic matching, and the description matching. It would be possible to indicate canonical name matching in TableDetails.ShortDesc. Also, more than one similarity can be used in attribute ShortDesc or in DetailDesc. These descriptive attributes of plug-in relations provide matching information when the mapping process is performed, as shown in relations (4) or (5). The staging DBMS SCAT will also contain a most recent replica of the public schema segments of each federated DBMS. We believe that such a service will significantly improve data integration efficiency, since initial information searching will be performed prior to the scanning of the element from all federated DBMSs.

Semantics of Metadata Catalogue

In order to define the semantics of a plug-in relation, we start by introducing the following two relations in a source DBMS. Any federated schema or SCAT establishes a mapping with the relations of (5) to search the correct schema objects, such as tables or relations γ, for required data access:

tDef(tn, sn, des, dc), tDet(tn, sn, sno, at, dt, sz, cs, sd, dd) (5)

where tDef.t1[tn, sn] ≠ tDef.t2[tn, sn], tDet.t1[tn, sno] ≠ tDet.t2[tn, sno], tDet[tn] ⊆ tDef[tn], and the description of each attribute of (5) is given in (4).
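As a sketch of how the plug-in relations of (4) might be materialised in the staging DBMS, the JDBC fragment below is our illustration: the connection URL, credentials and SQL column types are placeholders, and Serial#, Size and Constraints are rendered as SerialNo, DataSize and ConstraintInfo for portability. The primary keys follow the uniqueness constraints stated for (5):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Creates the SCAT plug-in relations of (4) in the staging DBMS via JDBC.
    public final class ScatSetup {
        public static void main(String[] args) throws SQLException {
            // Connection URL and credentials are placeholders, not the real deployment.
            try (Connection c = DriverManager.getConnection(
                     "jdbc:oracle:thin:@staging:1521:scat", "scat", "secret");
                 Statement s = c.createStatement()) {
                s.executeUpdate("CREATE TABLE DataTable (" +
                    "TableName VARCHAR(128) NOT NULL, SchemaName VARCHAR(128) NOT NULL, " +
                    "Description VARCHAR(512), DateCreated DATE, " +
                    "PRIMARY KEY (TableName, SchemaName))");      // tDef key: [tn, sn]
                s.executeUpdate("CREATE TABLE TableDetails (" +
                    "TableName VARCHAR(128) NOT NULL, SchemaName VARCHAR(128), " +
                    "SerialNo INTEGER NOT NULL, " +               // SerialNo stands for Serial#
                    "Attribute VARCHAR(128), DataType VARCHAR(64), DataSize INTEGER, " +
                    "ConstraintInfo VARCHAR(256), ShortDesc VARCHAR(256), " +
                    "DetailDesc VARCHAR(1024), " +
                    "PRIMARY KEY (TableName, SerialNo))");        // tDet key: [tn, sno]
            }
        }
    }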

For every attribute a in each relation γ of schema C, we associate the following functions: for t ∈ tDet, t[at] = a. On attribute a, we define the functions ∂ to find a table name such that t[tn] = ∂(γ, C); ξ to find a schema name such that t[sn] = ξ(sn, S); ρ to find an attribute name such that t[at] = ρ(γ, a); δ to find a data type such that t[dt] = δ(γ, a); μ to find a size such that t[sz] = μ(γ, a); and η to find constraints such that t[cs] = concat[η(γ, ai)] for those attributes ai containing constraints, i.e., η(γ, ai) ≠ φ.

The attributes sd and dd will be used to store multiple names or similar meanings, which are text details entered by the user with schema sn. Also, sno is a sequence number starting with 1 and incremented by 1 for the same tn, denoted by i. For t ∈ tDet, for relation γ and attribute a, we can write

t = {∂(γ, C), ξ(sn, S), i, ρ(γ, a), δ(γ, a), μ(γ, a), concat[η(γ, ai)], sd, dd}

Once possible schema objects or relations are found, instance-level matching will then help to further boost the confidence in matching results. At this level, linguistic and constraint-based characterization of instances is useful. For example, using linguistic techniques, it might be possible to look at Client, ClientName and Ename instances to conclude that ClientName is a better match candidate for Client than Ename.

EXPERIMENTS AND DISCUSSION

The experimental design has been chosen and carefully tested with an understanding of the data matching discussed in our framework and algorithm. Experimental development stages included task identifications based upon design and analysis review, a successful strategy that involved long hours of observations and user interviews. This helps us to understand how to utilize the framework in the form of an experiment. It has been validated as an approach to matching for legacy system integration using three real example applications, received as in-kind contributions from collaborators H1, H2 and H3. One of these products or hospitals (H1) is a leading physician information system (known as medcare) which provides clinical records, billing and workflow features. The second one (H2) is a clinical care system dedicated to patient diagnostic illness. The third system (H3) is a PatAziz patient treatment database.

To validate the plug-in relations metadata search algorithm, it is desired to populate the database integration systems of the hospitals with OGSA-DAI. The integration of this system is named ArabGrid. Information exchanged between the systems includes data search, examination reports, diagnosis and medical histories (non-disclosure agreements do not allow revealing details about these systems). The hospitals are integrated with a staging DBMS and the OGSA-DAI framework so that a medical doctor, as a user, forms an integrated view of a patient's medical records scattered across these hospitals. A patient's personal information and diagnosis details, such as medical history, surgical history and allergies, are recorded in different contexts in these hospitals. Before performing a medical diagnosis search for some patient, the user will search the patient name in ArabGrid. This search is performed initially from the plug-in relations of the staging DBMS to browse the structure of database objects such as table or column names, as shown in Figure 5. If the string name is redundant in more than one database, the user decides to choose an appropriate one to search the actual information. Once the user finds an appropriate database where the search item is located, a query can be performed on a chosen table name to access the personal data of that particular patient from a database. This can be done with patient name, date of birth, etc. The same patient with different identifiers can be searched from more than one database (or hospital). This experimental validation helps a user to explore the patient's medical history from federated databases. A history of matching searches is recorded for future data search, and data can be loaded or extracted into the user profile. It is noted that the volume of data in an integrated database is very large, in the order of a terabyte. The plug-in relations of ArabGrid help initially in searching the data quickly.
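The browse-then-query interaction just described can be sketched as a two-phase search. The Java fragment below is our illustration of that flow against the plug-in relations; the SQL text, table and column names are invented for the sketch and are not taken from the deployed hospital systems:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.util.ArrayList;
    import java.util.List;

    // Phase 1: quick search over the SCAT plug-in relations to find candidate
    // tables/columns whose names or descriptions mention the pattern.
    public final class TwoPhaseSearch {
        static List<String> quickSearch(Connection scat, String pattern) throws SQLException {
            String sql = "SELECT SchemaName, TableName, Attribute FROM TableDetails " +
                         "WHERE LOWER(ShortDesc) LIKE ? OR LOWER(Attribute) LIKE ?";
            try (PreparedStatement ps = scat.prepareStatement(sql)) {
                String like = "%" + pattern.toLowerCase() + "%";
                ps.setString(1, like);
                ps.setString(2, like);
                List<String> candidates = new ArrayList<>();
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next())
                        candidates.add(rs.getString(1) + "." + rs.getString(2)
                                       + "." + rs.getString(3));
                }
                return candidates;   // the profile used for the granular phase
            }
        }
        // Phase 2 would run a user-confirmed query against one federated DBMS, e.g.
        // SELECT * FROM <chosen table> WHERE PatientName = ? AND DateOfBirth = ?
    }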

An ArabGrid experimental infrastructure of the staging DBMS is based on an Oracle9i/10g Enterprise Edition database running on MS Windows 2003 Advanced Server on a dedicated Dell PowerEdge 4400 server. The server has two 1 GHz Xeon processors, 4 GB of main memory, and a PERC3DI RAID controller providing about 240 GB of RAID 5 storage over eight 36 GB U3 SCSI disks. The heterogeneous DBMSs used at grid nodes are MySQL, Oracle, MS SQL Server, etc. Database installation and configuration includes interaction with OGSA-DQP version 3.0 (Axis 1.4) for generating a realistic grid environment.

A web-based prototype system is particularly designed to depict how the various components of the framework are implemented. The OGSA-DAI approach offers the way the integration is managed, generated and presented according to user preference and expertise. A consolidated integration of the ArabGrid and OGSA-DAI systems is presented in the form of a federated grid browser. Based on the web interface middleware design, the user will have more guiding and navigational facilities.

As discussed, the string match on name may have several possible searches in federated databases. A data inconsistency in the form of redundancy occurs when name is defined as first name, full name or just a name, as shown in Figure 5. It is also noted that name is also defined for a generic medicine name. Our proposed work for the matching algorithm (plug-in relations) extracts such information from the federated database for decision purposes. This work also contains metadata information of various data elements, such as tables, attributes and data instances. The user has a choice to perform a quick and loosely granular search because of large data volumes. The user can clarify search patterns by using queries on the databases. Data in the plug-in relations is updated regularly using an automatic loading service. This loading does not affect the routine database transactional operations.

FUTURE MAPPINGS AND MATCHING WITH GRID SERVICES

It is expected that this service of SCAT can be implemented by extending the metadata held in the SCAT. Since the SCAT and data catalogue will no longer need to be harmonized, this will improve data integrity as described in Austin et al. (2006). As OGSA-DAI has a relatively simple operational model, it should be relatively straightforward to map it onto whatever emerges as the preferred way of building grid applications (Antonioletti et al., 2005). The SCAT service framework presented in this article complies with the OGSA-DAI framework. This is achieved with the development of a new service called GSCATS (Grid Staging Catalogue Service), using the common GDQS (Grid Distributed Query Service) instance for multiple database connections. GSCATS improves the GDS (Grid Data Services) interface of Alpdemir et al. (2003). When a GSCATS is set up, it interacts with the appropriate registries to obtain the metadata information from the staging DBMS, then passes the required registry information of the matching string on to GDQS instances. GDQS instances interact with GQESs (Grid Query Evaluation Services) over multiple execution nodes in the grid.

The low-level matching mechanism provides more fine-grained control over precise and targeted data sharing, compliant with OGSA-DAI. Performance issues using services can be further explored on the basis of the total number of accessed objects, such as databases, schemas, etc.


Figure 5. A sample of pattern match search from grid environment

CONCLUSION

Schema mapping and matching is a basic problem in many database application domains, such as heterogeneous database integration, grid databases, e-commerce, data warehousing, and semantic query processing. The key to progress in the coming years is to create an extensible and open infrastructure that can incorporate these advances as they become available (Foster & Kessleman, 2004); such an open infrastructure is introduced in this research. In the grid environment, the data is usually located in distributed heterogeneous data sources. Such data are managed, recognized and organized locally by data content providers, the multiple vendors of the sources, which have no ability to match or map data fully across federated DBMSs. We have presented a practical approach to overcome database heterogeneity via the use of a purposefully uniform strategy. This practical approach includes pilot experimental verification and validation. For a sophisticated grid environment, a large number of data nodes (such as hospitals) can be added in an experimental federation of systems (ArabGrid). Our strategy is based on the granularity feature of data matching and mapping that helps to find candidate information which has similarities in structure definition or at instance level. In this way, users can access redundant or multiple pattern occurrences of a match. These redundant patterns can be further clarified by using queries on federated databases.

ACKNOWLEDGMENT

The first author would like to thank King Fahd University of Petroleum & Minerals for their excellent research facilities.

REFERENCES

Ahmed, E., Bessis, N., Yue, Y., & Stephens, D. (2008). Data loading and mapping using staging DBMS in the grid. In Proceedings of the 21st IEEE Annual Canadian Conference on Electrical and Computer Engineering (CCECE), Ontario, Canada (pp. 1887-1893).

Alpdemir, M. N., Mukherjee, A., Foster, I., Paton, N. W., Watson, P., Fernandes, A. A. A., et al. (2003). Service-based distributed query processing on the grid. In Proceedings of the 1st Intl. Conference on Service-Oriented Computing (ICSOC) (pp. 467-482).

Antonioletti, M., Atkinson, M., Baxter, R., Borley, A., Hong, N. P., & Collins, B. (2005). The design and implementation of grid database services in OGSA-DAI. Concurrency and Computation, 7, 2–4.

Austin, J., Davis, R., Fletcher, M., Jackson, T., Jessop, M., Liang, N., & Pasley, A. (2005). DAME: Searching large data sets within a grid-enabled engineering application. Proceedings of the IEEE, 93, 496–509. doi:10.1109/JPROC.2004.842746

Austin, J., Turner, A., & Alwis, S. (2006). Grid enabling data de-duplication. In Proceedings of the 2nd IEEE International Conference on e-Science and Grid Computing (pp. 2-8).


Bell, G. S., & Sethi, A. (2001). Matching records in a national medical patient index. CACM, 44(9), 83–88.

Bessis, N. (Ed.). (2009). Model architecture for a user tailored data push service in data grids. In Grid technology for maximizing collaborative decision management and support: Advancing effective virtual organizations (pp. 235-255). Hershey, PA: IGI Global. ISBN: 978-1-60566-364-7

Bessis, N., French, T., Burakova-Lorgnier, M., & Huang, W. (2007). Using grid technology for data sharing to support intelligence in decision making. In Xu, M. (Ed.), Managing strategic intelligence: Techniques and technologies (pp. 179–201). Hershey, PA: Idea Group Publishing Inc.

Cali, A., Calvanese, D., Giacomo, G., & Lenzerini, M. (1999). Semistructured data schemas with expressive constraints. In Proceedings of CAiSE (LNCS, pp. 262-279). Berlin: Springer.

Cali, A., Calvanese, D., Giacomo, G., & Lenzerini, M. (2002). Data integration under integrity constraints. In Proceedings of CAiSE (LNCS 2348, pp. 262-279). Berlin: Springer.

Calvanese, D., De Giacomo, G., Lenzerini, M., Nardi, D., & Rosati, R. (1999). A principled approach to data integration and reconciliation in data warehousing. In S. Gatziu, M. Jeusfeld, M. Staudt, & Y. Vassiliou (Eds.), Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW'99), Heidelberg, Germany.

Foster, I., Kesselman, C., Nick, J., & Tueke, S. (2002). Grid services for distributed system integration. IEEE Computer, 35(6), 397–398.

Foster, I., & Kessleman, C. (2004). The Grid: Blueprint for a new computing infrastructure (pp. 283, 391–396). San Francisco, CA: Morgan Kaufmann.

Haas, L. M., Lin, E. T., & Roth, M. A. (2002). Data integration through database federation. IBM Systems Journal, 41(4). doi:10.1147/sj.414.0578

Jacob, B., Brown, M., Fukui, K., & Trivedi, N. (2005). Introduction to grid computing (p. 148). IBM.

Larson, J. A., Navathe, S. B., & El-Masri, R. (1989). A theory of attribute equivalence in databases with application to schema integration. IEEE Transactions on Software Engineering, 16(4), 449–463. doi:10.1109/32.16605

Rahm, E., & Bernstein, P. A. (2001). A survey of approaches to automatic schema matching. The VLDB Journal, 10, 334–350. doi:10.1007/s007780100057

Reinoso Castillo, J. A., Silvescu, A., Caragea, D., Pathak, J., & Honavar, V. G. (2004). Information extraction and integration from heterogeneous, distributed, autonomous information sources—A federated ontology-driven query-centric approach. Retrieved January 5, 2007, from http://www.cs.iastate.edu/~honavar/Papers/indusfinal.pdf

Rezenda, F. F., Georgian, U. H., & Rutschlin, J. (1999). A practical approach to access heterogeneous and distributed databases. In Proceedings of CAiSE 99 (pp. 317–332). Berlin: Springer.

Sellami, S., Benharkat, A.-N., & Amghar, Y. (2010). Towards a more scalable schema matching: A novel approach. International Journal of Distributed Systems and Technologies, 1(1), 17–39.

Yin, D., Chen, B., Huang, Z., Lin, X., & Fang, Y. (2007, August 16-18). Utility based query dissemination in spatial data grid. In Proceedings of the Sixth International Conference on Grid and Cooperative Computing (GCC 2007), Urumchi, Xinjiang, China (pp. 574-581).


Nik Bessis is currently a Principal Lecturer (Associate Professor) in the Department of Computer Science and Technology at University of Bedfordshire (UK). He obtained a BA from the TEI of Athens and completed his MA and PhD at De Montfort University (Leicester, UK). His research interest is the analysis, research, and delivery of user-led developments with regard to trust, data integration, annotation, and data push methods and services in distributed environments. These have a particular focus on the study and use of next generation and grid technologies methods for the benefit of various virtual organisational settings. He is involved in and leading a number of funded research and commercial projects in these areas. Dr. Bessis has published numerous papers and articles in international conferences and journals, and he is the editor of three books and the Editor-in-Chief of the International Journal of Distributed Systems and Technologies (IJDST). In addition, Dr. Bessis is a regular reviewer of several journals and conferences and has served as a keynote speaker, associate editor, conference chair, scientific program committee member, and session chair in numerous international conferences. More information is available from: http://www.beds.ac.uk/departments/computing/staff/nik-bessis

Peter Norrington received his PhD from the University of Bedfordshire in 2009 with a thesis in cognitive authentication techniques, where he currently works as e-PDP Development Officer. He has worked in the education, hospitality and journalism sectors. His research interests center around cooperative and collaborative systems and usability. More information is available from: http://www.beds.ac.uk/bridgescetl/about/people/team

Yong Yue is currently Director for Research in Applicable Computing (IRAC) at the University of Bedfordshire and a Professor of Computing Technology. Professor Yue obtained a BSc in Mechanical Engineering from Northeastern University, China and a PhD in CAD/CAM from Heriot-Watt University, Edinburgh. He is also a Chartered Engineer. Professor Yue has led and participated in a number of research and professional projects and collaborative links in 10 countries around the world. He has numerous publications and has supervised four PhD students to completion as Director of Studies. His research interest is in the area of computer graphics and virtual reality, CAD/CAM and operations research. Professor Yue is a regular reviewer for several journals and conferences and has served as a guest editor, an associate editor, a scientific program committee member, and a session chair in numerous international conferences. More information is available from: http://www.beds.ac.uk/departments/computing/staff/yong-yue


Evaluating Heuristics for Scheduling Dependent Jobs in Grid Computing Environments

Geoffrey Falzon, Brunel University, UK Maozhen Li, Brunel University, UK

ABSTRACT

Job scheduling plays a critical role in the utilisation of grid resources by mapping a number of jobs to grid resources. However, the heterogeneity of grid resources adds some challenges to the work of job scheduling, especially when jobs have dependencies which can be represented as Direct Acyclic Graphs (DAGs). It is widely recognised that scheduling m jobs to n resources with an objective to achieve a minimum makespan has been shown to be NP-complete, requiring the development of heuristics. Although a number of heuristics are available for job scheduling optimisation, selecting the best heuristic to use in a given grid environment remains a difficult problem, due to the fact that the performance of each original heuristic is usually evaluated under different assumptions. This paper evaluates 12 representative heuristics for dependent job scheduling under one set of common assumptions. The results are presented and analysed, which provides an even basis for comparing the performance of those heuristics. To facilitate performance evaluation, a DAG simulator is implemented which provides a set of tools for DAG job configuration, execution, and monitoring. The components of the DAG simulator are also presented in this paper.

Keywords: Dependent Job Scheduling, Direct Acyclic Graphs, Grid Computing, Heuristics, Job Scheduling Optimisation

INTRODUCTION

The past few years have witnessed a rapid development of grid computing systems and applications (Li & Baker, 2005). A grid is a heterogeneous computing environment in that resources may have various computing capacities. Job scheduling, which is a process of mapping jobs to resources, plays a crucial role in resource utilisation in grid environments. Jobs can be generally classified into two classes - independent jobs and dependent jobs. The problem of scheduling m jobs to n resources with an objective to minimise the total execution time (makespan) has been shown to be NP-complete, requiring the development of heuristics (Ibarra & Kim, 1977). Braun et al. (2001) evaluated eleven heuristics for mapping independent tasks to heterogeneous computing environments, which helps select the best heuristic to use in a given environment.

DOI: 10.4018/jghpc.2010100106

Mapping dependent jobs to heterogeneous computing environments such as grids poses more challenges in that the dependencies of jobs have to be taken into account in the scheduling process.

Jobs with dependent tasks can be represented by Directed Acyclic Graphs (DAGs) in which each node represents an executable task and each directed edge represents data transfers between two tasks. We assume that DAGs always have a single entry node (i.e., a node with no parents) and a single exit node (i.e., a node with no children). To compute a schedule in grid environments, scheduling algorithms require the following information:

• Information on computing resources available in a grid environment.
• An Expected Completion Time (ECT) matrix in which the expected time to execute a task on each machine is provided.
• The network bandwidth of the communication link connecting any two computers.

This paper evaluates 12 heuristics for scheduling dependent jobs in grid environments. To facilitate performance evaluation, a DAG simulator is implemented which provides a set of tools for DAG job configuration, execution and monitoring. The following assumptions are made when scheduling a job with a number of dependent tasks:

• A computer can execute only one task at a time.
• Task execution can only start after all data required by the task is available.
• Data transfer can only begin when a task is completed.
• Data transfer is scheduled according to the order of the tasks requiring the data. This is referred to as in-order scheduling (Wang, Siegel, Roychowdhury, & Maciejewski, 1997).
• A computer can transfer only one set of data at a time. Data transfer and task execution can be performed in parallel.
• Task execution and data transfer are non-pre-emptive.
• Task execution is computed according to the order defined by a scheduler.
• An ECT matrix is provided. This can be generated by dividing the workload of a job by the MIPS rating of the processor where the job is executed.
• For list scheduling heuristics (Sih & Lee, 1993; Topcuoglu, Hariri, & Wu, 2002), partial makespan (the time required to execute the first n jobs in the workflow) is calculated by ignoring data transfer to jobs that are not yet scheduled.

The remainder of the paper is organised as follows. It briefly describes the twelve heuristics used in the evaluation, and presents the components of the DAG simulator for DAG job configuration, execution and monitoring. Then it evaluates the 12 heuristics and analyses the performance results. Finally it concludes the paper.

HEURISTICS FOR SCHEDULING DEPENDENT JOBS

The 12 heuristics that were evaluated are Min-min (Maheswaran, Ali, Siegel, Hensgen, & Freund, 1999), Max-min (Maheswaran et al., 1999), Sufferage (Maheswaran et al., 1999), XSufferage (Casanova, Legrand, Zagorodnov, & Berman, 2000), Baseline (Maheswaran & Siegel, 1998), Levelised Min-Time (LMT) (Iverson, Ozguner, & Follen, 1995), Min-FINISH (Alhusaini, Prasanna, & Raghavendra, 1999), Max-FINISH (Alhusaini et al., 1999), Heterogeneous Earliest Finish Time (HEFT) (Topcuoglu et al., 2002), Critical-Path-on-a-Processor (CPOP) (Topcuoglu et al., 2002), Performance Effective Task Scheduling (PETS) (Ilavarasan & Thambidurai, 2007) and Genetic Algorithms (GAs) (Braun et al., 1999; Hou, Ansari, & Ren, 1994; Shroff, Watson, Flann, & Freund, 1996; Singh & Youssef, 1996; Spooner, Cao, Jarvis, He, & Nudd, 2005). For the purpose of performance comparison, a Random algorithm was also implemented and evaluated. This section briefly describes the implementation of these algorithms for scheduling jobs with dependent tasks.
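Before the individual heuristics are described, the DAG representation assumed above can be sketched in Java, the paper's implementation language. This is our minimal illustration; the class and field names are not taken from the simulator's source:

    import java.util.ArrayList;
    import java.util.List;

    // Minimal DAG model: each node is an executable task with a workload (weight);
    // each directed edge carries the size of the data transferred between two tasks.
    final class Task {
        final String name;
        final double workload;                    // divided by a MIPS rating to get an ECT entry
        final List<Edge> out = new ArrayList<>(); // edges to dependent (child) tasks
        final List<Edge> in = new ArrayList<>();  // edges from predecessor (parent) tasks
        Task(String name, double workload) { this.name = name; this.workload = workload; }
    }

    final class Edge {
        final Task from, to;
        final double dataSize;                    // divided by bandwidth to get transfer time
        Edge(Task from, Task to, double dataSize) {
            this.from = from; this.to = to; this.dataSize = dataSize;
            from.out.add(this); to.in.add(this);
        }
    }
    // A DAG has a single entry task (in.isEmpty()) and a single exit task (out.isEmpty()).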

Min-Min

Min-min begins with set U of unscheduled tasks and set S of schedulable tasks (i.e., unscheduled tasks whose predecessor tasks were executed). The set M of minimum completion times for each task in S on all machines is then calculated. The task with the overall minimum completion time from M is selected and assigned to the machine that provides the earliest task completion time. The newly scheduled task is removed from U, and S is updated by computing the list of schedulable tasks in U. The process is repeated until all tasks are scheduled (i.e., U is empty).

Max-Min

Similar to Min-min, Max-min also begins with set U of unscheduled tasks and set S of schedulable tasks. The set M of minimum completion times for each task in S on all machines is then calculated. Max-min selects the task with the maximum completion time from M and assigns it to the machine that minimises task completion time. The newly scheduled task is removed from U and S is computed. The process is repeated until all tasks are scheduled (i.e., U is empty).

Sufferage

The Sufferage value of a task is the difference between the task's earliest completion time and its second earliest completion time. The Sufferage algorithm operates in a similar way to the Min-min algorithm, except for the fact that in each iteration the task with maximum Sufferage is scheduled on the machine that results in minimum task completion time. The rationale behind Sufferage is to give priority to tasks that would "suffer" most in terms of task completion time if the task is not assigned to a particular machine.

XSufferage

XSufferage is an extension of the Sufferage algorithm on a cluster level. The XSufferage value of a task is the difference between the task's earliest completion time and its second earliest completion time in a different site. Similar to Sufferage, XSufferage chooses the schedulable task with maximum XSufferage for scheduling and assigns the task to the resource that minimises task completion time. The rationale behind XSufferage is that if a file is required by some task and is already present at a remote cluster, the task would "suffer" if not assigned to a host in that cluster.

Baseline (BL)

The BL algorithm first groups tasks according to levels such that each level contains a set of independent tasks. This is performed as follows:

• Root tasks are at level 0
• Each task in level n has parent tasks belonging to levels 0 to n-1 and at least 1 task in level n-1.

Tasks at the same level are then ordered in descending order of the number of data items each task produces (ties are broken arbitrarily). The schedule is defined using the above task prioritisation, and machine assignment is performed such that task completion time is minimised.

Levelised Min-Time (LMT)

The LMT algorithm first groups tasks according to level as in the BL algorithm. Average task execution time across all machines is then calculated, from which the level-average execution time is determined (the average execution time of all tasks in a given level). The task average execution time for tasks having their closest child task not in the next level is then deducted by the level-average execution time of the levels in between. Tasks are then scheduled in the order defined by level (lowest first) and updated average execution time (highest first). Machine assignment is performed to minimise task completion time. The rationale behind LMT is to give priority to tasks that would affect the execution of child tasks if delayed.
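The selection loop shared by Min-min and Max-min can be sketched as follows. This Java fragment is our simplified illustration, not the simulator's code: it treats every unscheduled task as schedulable and reads completion times from a fixed ECT matrix, whereas the full algorithms recompute completion times and the schedulable set after every assignment:

    // Sketch of the Min-min loop: repeatedly pick, among schedulable tasks, the task
    // whose best (earliest) completion time is smallest, and assign it to that machine.
    public final class MinMin {
        // ect[t][m] = expected completion time of task t on machine m (assumed to
        // already include machine ready time in this simplified sketch).
        static int[] schedule(double[][] ect) {
            int tasks = ect.length, machines = ect[0].length;
            int[] assignment = new int[tasks];
            boolean[] done = new boolean[tasks];
            for (int step = 0; step < tasks; step++) {
                int bestTask = -1, bestMachine = -1;
                double best = Double.MAX_VALUE;
                for (int t = 0; t < tasks; t++) {
                    if (done[t]) continue;   // here every unscheduled task is schedulable
                    for (int m = 0; m < machines; m++) {
                        if (ect[t][m] < best) { best = ect[t][m]; bestTask = t; bestMachine = m; }
                    }
                }
                assignment[bestTask] = bestMachine;
                done[bestTask] = true;       // Max-min differs only in selecting the task
            }                                // whose *minimum* completion time is largest
            return assignment;
        }

        public static void main(String[] args) {
            double[][] ect = { {3, 5}, {4, 2}, {6, 6} };
            System.out.println(java.util.Arrays.toString(schedule(ect))); // [0, 1, 0]
        }
    }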


Min-FINISH

The Min-FINISH algorithm is a level-by-level version of the Min-min algorithm. Tasks are first grouped by level, and then tasks in the same level are scheduled using the same technique as in the Min-min algorithm.

Max-FINISH

The Max-FINISH algorithm is a level-by-level version of the Max-min algorithm. Tasks are first grouped by level, and then tasks in the same level are scheduled using the same technique as in the Max-min algorithm.

Heterogeneous Earliest Finish Time (HEFT)

The HEFT algorithm ranks tasks based on the upward rank (ranku) of a task, which is recursively defined as

ranku(ni) = wi + max over nj ∈ succ(ni) of (cij + ranku(nj))

where wi is the execution cost of task ni averaged over all machines and cij is the average communication cost of the edge from ni to nj. ranku is equivalent to calculating the latest task finish time using expected completion time (ECT) averaged on all grid nodes and average bandwidth for data transfer time. ranku is calculated recursively starting from exit tasks and traversing the graph upwards. Tasks are then ordered in descending order of upward rank. This order provides a topologically sorted order of tasks (Topcuoglu et al., 2002). Tasks are then assigned to the machine that achieves the minimum task completion time. The HEFT algorithm has an insertion-based policy that allows tasks to be scheduled in an earliest idle time slot between 2 already scheduled tasks (provided that this does not break task precedence rules). The HEFT algorithm implemented does not utilise the insertion-based policy.

Critical-Path-on-a-Processor (CPOP)

The CPOP algorithm prioritises tasks by first calculating the upward rank (ranku) and the downward rank (rankd) for all tasks using average ECTs on all machines and average communication time (calculated using the mean of bandwidth):

rankd(ni) = max over nj ∈ pred(ni) of (rankd(nj) + wj + cji)

The rank of task ni is calculated as follows:

rank(ni) = ranku(ni) + rankd(ni)

The critical path is computed by first selecting the root task and then traversing the DAG by selecting the task with the highest rank (ties are broken by selecting the first immediate successor) until the exit task is reached. The critical-path processor is then identified by evaluating the machine that minimises the execution time of the critical path. During the processor selection phase, the CPOP algorithm maintains a list of schedulable tasks, and the task with the highest rank is scheduled. Critical tasks are assigned to the critical-path processor, while non-critical tasks are assigned to the machine that minimises task completion time.
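The rank definitions can be computed with a simple memoised recursion over the Task and Edge classes sketched earlier. The fragment below is our illustration, using average execution costs and a single average bandwidth as the text describes; the names are assumptions:

    import java.util.HashMap;
    import java.util.Map;

    // Upward rank: ranku(ni) = wi + max over successors nj of (cij + ranku(nj)),
    // with average execution costs (wi) and average communication costs (cij).
    public final class UpwardRank {
        final Map<Task, Double> memo = new HashMap<>();
        final Map<Task, Double> avgCost;   // wi: ECT averaged over all machines
        final double avgBandwidth;         // turns edge data sizes into cij

        UpwardRank(Map<Task, Double> avgCost, double avgBandwidth) {
            this.avgCost = avgCost;
            this.avgBandwidth = avgBandwidth;
        }

        double rankU(Task t) {
            Double cached = memo.get(t);
            if (cached != null) return cached;
            double best = 0.0;             // exit task: ranku = wi
            for (Edge e : t.out) {
                double c = e.dataSize / avgBandwidth;   // average communication cost cij
                best = Math.max(best, c + rankU(e.to));
            }
            double r = avgCost.get(t) + best;
            memo.put(t, r);
            return r;
        }
    }
    // HEFT orders tasks by decreasing ranku; CPOP additionally computes the downward
    // rank rankd(ni) by an analogous recursion over predecessors and uses their sum.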


Performance Effective Task Scheduling (PETS)

The PETS algorithm starts by determining task level. Task rank is then calculated as follows:

Average Computation Cost: ACC(vi) = Σ(j=1..m) wi,j / m, where wi,j = execution time of task ti on machine mj

Data Transfer Cost: DTC(vi) = Σ(j=1..n) ci,j, where ci,j = the jth data transfer from task ti

Rank of Predecessor Task: RPT(vi) = max{rank(vp)}, where vp ∈ pred(vi)

rank(vi) = round{ACC(vi) + DTC(vi) + RPT(vi)}

Tasks are then scheduled by levels (lowest first) and ranks (highest first). Machines are assigned in such a way that the completion time of tasks is minimised.

Genetic Algorithm (GA)

GAs are adaptive heuristic search algorithms which can be used to find approximate solutions to optimisation and search problems with large search spaces. The implementation of a GA usually involves problem representation (genetic representation and fitness function), GA operators (selection, crossover and mutation) and a termination function.

Genetic representation translates a candidate solution into a fixed-size array of bits. In biological terms, this is also known as a chromosome. A fitness function provides a value of goodness or fitness for a given chromosome. The fitness function used for scheduling is makespan (or total execution time).

Selection refers to the technique used to select the chromosomes that will participate in the next generation. In most cases, the probability of a chromosome participating in the next generation is dependent on the fitness value. Roulette Wheel Selection combined with Elitism was used for the Classical GA implementation.

Crossover consists of exchanging genetic material between 2 chromosomes to generate the next generation chromosome. The most common technique is the single-point crossover, where a crossover point is chosen and the data beyond the crossover point is exchanged between the 2 parents. Crossover is applied according to the crossover probability (crossover rate), which is usually between 0.6 and 1.0. If elitism is not used, a high value of crossover rate can result in losing good candidate solutions.

Mutation is analogous to biological mutation and is used to maintain genetic diversity from one generation of a population of chromosomes to the next. Mutation can be performed by either mutating the scheduling part or the matching part of the chromosome.

Termination is the criterion used to determine whether the solution evaluated is sufficient. Typical criteria used are:

• A solution is found that satisfies the minimum criteria
• A fixed number of generations is reached
• The GA search has converged to a solution
• No improvement in best fitness over a specified number of generations
• Guided operators (Han & Kendall, 2003)

The classical GA implemented in this work terminates when a fixed number of generations is reached or no improvement in best fitness over a specified number of generations can be made.
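The selection and crossover operators just described can be sketched compactly. The following Java fragment is our illustration: taking the wheel slice proportional to 1/makespan is one common choice for a minimisation objective (the paper does not state its exact fitness scaling), and a real implementation must repair crossover children so that task precedence and matching constraints still hold:

    import java.util.Random;

    public final class Roulette {
        // Roulette-wheel selection for makespan minimisation: smaller makespan is
        // better, so each chromosome's slice is proportional to 1/makespan (assumed).
        static int select(double[] makespan, Random rnd) {
            double total = 0.0;
            for (double m : makespan) total += 1.0 / m;
            double spin = rnd.nextDouble() * total, acc = 0.0;
            for (int i = 0; i < makespan.length; i++) {
                acc += 1.0 / makespan[i];
                if (spin <= acc) return i;
            }
            return makespan.length - 1;   // guard against floating-point round-off
        }

        // Single-point crossover of two parent chromosomes (applied with a crossover
        // rate of 0.6-1.0): genes beyond the crossover point come from the other parent.
        static int[] crossover(int[] p1, int[] p2, int point) {
            int[] child = new int[p1.length];
            System.arraycopy(p1, 0, child, 0, point);
            System.arraycopy(p2, point, child, point, p2.length - point);
            return child;   // must be repaired to respect precedence constraints
        }
    }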


Random

The Random algorithm is implemented in the following way:

• Maintaining two lists: TS (schedule - initialized to ø) and TL (job list - initialized with all jobs).
• Randomly selecting a job from TL and removing it from TL.
• Identifying the range in which the job can be inserted to maintain precedence constraints by locating the last predecessor and the first successor.
• Choosing the location at random within the range defined and updating TS accordingly.
• The process is repeated until TL is ø.

DAG SIMULATOR

In order to evaluate the 12 heuristics listed previously, a DAG simulator has been implemented using the Java programming language. The DAG simulator has a number of components, which are shown in Figure 1.

testDAG

To support the development of DAG related algorithms and to visually evaluate workflows, a Java based GUI was developed, as shown in Figure 2. DAG files are defined in XML which is compatible with GVF/Royère¹. Using the GUI, it is possible to create random DAGs and to modify DAGs as required. To support changes, functionality for performing topological sorting, cyclic testing and critical path analysis is available.

Random DAGs are created based on the technique described by Maheswaran and Siegel (1998). The algorithm creates a |G| x |S| dependency matrix (where S is the set of subtasks and G is the set of data transfers). Edges are created such that all source nodes are always to the left of destination nodes. This guarantees that the graph is acyclic and avoids computationally expensive cyclic testing.

CrDAGs

CrDAGS is a command line utility that uses the same objects as testDAG for creating DAGs at random. Parameters required are:

• Task Name Label
• Number of nodes
• Number of edges
• Maximum/minimum node weight
• Maximum/minimum edge weight

Figure 1. The components of the DAG simulator


Figure 2. testDAG GUI

CrDAGSched

CrDAGSched was designed to transform a randomly generated DAG into a schedule definition (to be used by DAGSched). Parameters required to define a schedule are:

• Input (DAG) and output XML file names
• Number of grid nodes
• Maximum/minimum MIPS rating
• Number of sites
• Bandwidth between grid nodes if within the same site (LAN)
• Bandwidth between grid nodes if in different sites (WAN)
• Communication to Computation Ratio (CCR)

Grid nodes are created and are assigned a site and MIPS rating at random. An ECT (estimated time to complete task ti on machine mj) matrix is generated by dividing the weight of vertices (workload) by the MIPS rating of grid nodes. A consistent ECT matrix is therefore generated, but DAGSched can be provided with any type of data that may be required. Using the bandwidth information provided, an n × n matrix is generated (n is the number of grid nodes) providing the data rate between any two grid nodes (where aij = aji). The concept implemented is similar to the crossbar switch described by Wang et al. (1997). Optionally, if a CCR value is provided, then the data transfers are automatically adjusted to fit the requested CCR value.

DAGSched

DAGSched takes a GridSched XML file as input and produces a Grid Events XML file as output. Figure 3 shows typical results displayed by DAGSched.

The grid events file has a copy of the grid schedule to be optimised and one GridSchedEvents entry for each algorithm used by DAGSched to optimise the grid schedule.
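The two matrices CrDAGSched derives can be generated in a few lines. This Java sketch is our illustration of the stated rules (ECT = workload / MIPS; LAN rate within a site, WAN rate across sites); it omits the optional CCR adjustment:

    // Sketch of the schedule-definition generation described for CrDAGSched.
    public final class GridModel {
        // Consistent ECT matrix: ect[i][j] = workload of task i / MIPS of node j.
        static double[][] ectMatrix(double[] workload, double[] mips) {
            double[][] ect = new double[workload.length][mips.length];
            for (int i = 0; i < workload.length; i++)
                for (int j = 0; j < mips.length; j++)
                    ect[i][j] = workload[i] / mips[j];
            return ect;
        }

        // Symmetric n x n bandwidth matrix: LAN rate within a site, WAN rate otherwise.
        static double[][] bandwidth(int[] siteOf, double lan, double wan) {
            int n = siteOf.length;
            double[][] b = new double[n][n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    b[i][j] = (siteOf[i] == siteOf[j]) ? lan : wan;  // aij = aji by construction
            return b;
        }
    }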


Figure 3. DAGSched output

GridSchedEvents contains details about task executions and data transfer events. DAGSched is also able to read an optimised schedule definition XML file which provides the list of tasks in a topologically sorted order and their respective grid node assignments. DAGSched would then determine the respective schedule events and write the schedule events to the output XML file. Grid schedule events can be analysed using showGSched.

EXPERIMENTAL RESULTS

The twelve heuristics were implemented in the Java programming language and were evaluated using Java 6.0 Update 3 on a PC with an Intel Core 2 Quad Q6600 2.4 GHz and 4GB RAM running Windows Vista.

The tools described previously were used to create 100 schedules as follows:

• The number of tasks varied between 50 and 500 (steps of 50)
• The number of data transfers and the number of grid processors were varied according to task count to achieve adequate parallelism.
• Communication to Computation Ratio (CCR) was set to 1.
• 600 additional schedules were created by updating the data transfer size in the original schedules to achieve CCR values of 0.01, 0.1, 0.5, 2, 10 and 100.


The parameters for the Classical GA (called GAClassic) algorithm implemented were determined from experimentation and are shown in Table 1. Various normalisation functions are found in the literature (Ilavarasan & Thambidurai, 2007; Topcuoglu et al., 2002; Zhang, Koelbel, & Kennedy, 2007). To be able to compare results across different schedules, results were normalised by dividing by the best result attained. Algorithm performance was evaluated based on:

• Normalised makespan - obtained by dividing makespan by the smallest makespan achieved for the given schedule
• Good count - the count of schedule optimisation results that have a makespan within 10% of the best schedule (i.e., normalised makespan less than 1.10)
• Best count - the count of schedule optimisation results that achieved the best makespan for the given schedule
• Algorithm execution time

Figure 4 shows the average normalised makespan and 95% range for all schedules (CCR ranging from 0.01 to 100), while Figure 5 shows the average normalised makespan and 95% range for schedules with CCR of 0.1.

There is a substantial increase in makespan variation for schedules with CCR higher than 0.1, and this explains the difference in the 95% ranges. The variations in average normalised makespan and standard deviation over the range of CCR tested (the Random algorithm was excluded to improve readability) are shown in Figure 6 and Figure 7 respectively.

It is noted that Min-min, BL, LMT and PETS produce consistent results when CCR is varied from 0.01 to 100. On the other hand, Max-min, Sufferage, XSufferage, Min-FINISH, Max-FINISH, HEFT, CPOP and GA do not perform well for scheduling DAG jobs with CCR greater than 1.

Figure 8 and Figure 9 show the Good Count and Best Count obtained from each heuristic in the range of CCR varied from 0.01 to 100. A poor performance can be observed for Max-min, Sufferage, XSufferage, Max-FINISH, HEFT and CPOP. Min-min, BL, LMT and PETS achieve good results, whereas both Min-FINISH and GA obtain inconsistent results.

It is noted that the performance of the PETS heuristic in terms of Best Count improves with an increase in CCR values. The performance of both BL and LMT does not change with an increase in CCR values.

Since the results obtained from the GA were not repeatable, the GA was executed 4 times per schedule. For this reason, the probability of achieving the best result is higher (invariant for Good Count) and superior results were observed in terms of Best Count (Figure 9).
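The three quality metrics defined above reduce to a few lines of code. The sketch below is our Java illustration, with the array layout as an assumption: makespan[a][s] holds the makespan heuristic a achieved on schedule s:

    // Metrics as defined above: normalisation divides each result by the best
    // (smallest) makespan obtained by any heuristic for that schedule.
    public final class Metrics {
        static double[][] normalised(double[][] makespan) {
            int algs = makespan.length, scheds = makespan[0].length;
            double[][] norm = new double[algs][scheds];
            for (int s = 0; s < scheds; s++) {
                double best = Double.MAX_VALUE;
                for (int a = 0; a < algs; a++) best = Math.min(best, makespan[a][s]);
                for (int a = 0; a < algs; a++) norm[a][s] = makespan[a][s] / best;
            }
            return norm;
        }

        // Good count: results within 10% of the best (normalised makespan < 1.10).
        static int goodCount(double[][] norm, int alg) {
            int count = 0;
            for (double v : norm[alg]) if (v < 1.10) count++;
            return count;
        }

        // Best count: results that achieved the best makespan (normalised value == 1).
        static int bestCount(double[][] norm, int alg) {
            int count = 0;
            for (double v : norm[alg]) if (v == 1.0) count++;
            return count;
        }
    }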

Table 1. GA parameters

Generation Size: 256
Elite Rate: 0.032
Crossover Rate: 0.80
Mutation Rate (Schedule): 0.20
Mutation Rate (Match): 0.20
Selection Criteria: Roulette Wheel
Maximum Iteration: 5000
Converging Criteria: 50
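Table 1 names roulette wheel as the selection criterion. As a concrete illustration of that standard operator (ours, not the authors' implementation), a roulette-wheel pick for a makespan-minimising GA could look like the sketch below; fitness = 1 / makespan is an assumption here, since the paper does not state its fitness function.

    import java.util.Random;

    // Standard roulette-wheel selection: each chromosome is selected with
    // probability proportional to its fitness.
    final class RouletteWheel {

        private final Random rng = new Random();

        // Returns the index of the selected chromosome.
        int select(double[] makespans) {
            double[] fitness = new double[makespans.length];
            double total = 0;
            for (int i = 0; i < makespans.length; i++) {
                fitness[i] = 1.0 / makespans[i];    // shorter makespan, larger slice of the wheel
                total += fitness[i];
            }
            double spin = rng.nextDouble() * total; // the spin lands somewhere on the wheel
            for (int i = 0; i < fitness.length; i++) {
                spin -= fitness[i];
                if (spin <= 0) {
                    return i;
                }
            }
            return fitness.length - 1;              // guard against floating-point rounding
        }
    }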


Figure 4. Average normalised makespan (95% range) for all schedules

Figure 5. Average normalised makespan (95% range) for schedules with CCR = 0.1

• Workflow-level scheduling heuristics (heuristics that take into account the whole workflow when performing scheduling): BL (0.3s), LMT (0.3s), HEFT (0.3s), CPOP (0.3s) and PETS (0.3s) have the best execution performance.
• Task-level scheduling heuristics (heuristics that make scheduling decisions based only on information about a task or a set of independent tasks): Min-min (5.7s), Max-min (2.2s), Sufferage (4.5s), XSufferage (4.1s), Min-FINISH (3.8s) and Max-FINISH (3.9s) cover the middle range of execution times.
• GA (11.8s) has the highest execution time due to its computational complexity.


Figure 6. Average normalised makespan variations with different CCRs

Figure 7. Standard deviation variations with different CCRs

An important point to highlight is the difference in average execution time between Min-min and Max-min (5.7s and 2.2s respectively).


Figure 8. Good count variations with different CCRs

Figure 9. Best count variations with different CCRs

The two algorithms have a complexity of O(vgm) (Yu, Buyya, & Ramamohanarao, 2008), where v is the number of tasks, m is the number of machines and g is the average number of schedulable tasks evaluated per iteration, and their implementations are similar. The difference in execution time is therefore due to a different value of g, amplified by the computational cost of evaluating the partial workflow makespan.
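To make the O(vgm) structure concrete, the following minimal Java sketch (hypothetical Task interface and naming, not the authors' code) shows the loop nest: the outer loop commits one task per iteration (v iterations overall), the middle loop scans the g currently schedulable tasks, and the inner loop evaluates the m machines. Min-min commits the pair with the globally smallest finish time, which equals the minimum of the per-task minima; Max-min would instead commit the task whose best finish time is largest.

    import java.util.List;

    // Hypothetical task abstraction: finishTimeOn evaluates the partial workflow
    // makespan if the task were placed on the given machine. This call is the
    // dominant cost that amplifies differences in g.
    interface Task {
        double finishTimeOn(int machine);
    }

    final class MinMinSketch {

        // Places every task, one per outer iteration: O(v * g * m) evaluations.
        static void schedule(List<Task> ready, int machines) {
            while (!ready.isEmpty()) {                       // v iterations in total
                Task bestTask = null;
                int bestMachine = -1;
                double bestFinish = Double.POSITIVE_INFINITY;
                for (Task t : ready) {                       // g schedulable tasks this iteration
                    for (int m = 0; m < machines; m++) {     // m machines
                        double finish = t.finishTimeOn(m);
                        if (finish < bestFinish) {           // Min-min: smallest finish time wins
                            bestFinish = finish;
                            bestTask = t;
                            bestMachine = m;
                        }
                    }
                }
                commit(bestTask, bestMachine);
                ready.remove(bestTask);                      // children released by this task would be added here
            }
        }

        private static void commit(Task t, int machine) {
            // record the (task, machine) assignment in the partial schedule
        }
    }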


Figure 10. The complexities of the heuristics

This is demonstrated by the example in Figure 11, which illustrates the scheduling differences between the Min-min and Max-min heuristics, in which g for Min-min and Max-min is 3.2 and 2.1 respectively. Max-min schedules the task with the maximum task finish time first and produces a schedule that tries to move along branches, thus resulting in a lower average number of schedulable tasks (g).

The early scheduling of tasks also explains the inefficiency of the Max-min heuristic relative to Min-min. This is confirmed by the improved results obtained by Max-FINISH (the levelled version of Max-min) when compared to Max-min, since task levelling prevents the branching effect of Max-min. The results obtained by Min-min and Max-min are in line with those published in Braun et al. (2001) and Casanova et al. (2000) for dependent and independent task scheduling. Max-min could produce an improved schedule if a long task is processed in parallel with a number of short tasks, and Max-min is expected to perform better than Min-min in cases where there are many more short tasks than long tasks.

Both pairs, Sufferage (4.5s) / XSufferage (4.1s) and Min-FINISH (3.8s) / Max-FINISH (3.9s), have comparable average execution times. The scheduling inefficiencies of Sufferage and XSufferage are attributed to the same factors as those of Max-min. In Casanova et al. (2000), it is claimed that the Sufferage heuristic could perform worst in the case of data-intensive applications in a multi-cluster environment. In the test cases analysed, there is a slight improvement in Good Count and Best Count for schedules with CCR of 10 and 100.

The performance of CPOP is comparable to that of the Max-min heuristic. Computing the critical path using the CPOP rank definition and assigning it to the critical-path processor is not sufficient, since the lack of data transfer on the critical path changes the workflow and hence the critical path itself. The rank of a task depends on the ranks of its child tasks, and therefore a task with a high rank has a child task with a high rank. This results in a branching effect similar to that observed for the Max-min heuristic and is considered another factor contributing to the low quality of the schedules produced by CPOP.
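To see why ranks propagate this branching behaviour, consider the standard upward-rank recursion used by HEFT and by the upward half of the CPOP priority (Topcuoglu et al., 2002). The sketch below is our illustration with a hypothetical Dag interface, not the authors' code; since a task's rank includes the maximum over its children, a high-rank task necessarily has a high-rank child, which is exactly the effect described above.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Upward rank: rank(t) = meanComputeCost(t) + max over children c of
    // (meanTransferCost(t, c) + rank(c)).
    final class UpwardRank {

        interface Dag {
            List<Integer> children(int task);
            double meanComputeCost(int task);           // execution time averaged over machines
            double meanTransferCost(int from, int to);  // transfer time averaged over machine pairs
        }

        private final Map<Integer, Double> memo = new HashMap<>();

        double rank(Dag dag, int task) {
            Double cached = memo.get(task);
            if (cached != null) return cached;
            double maxChild = 0;                        // exit tasks: rank is just the mean compute cost
            for (int child : dag.children(task)) {
                maxChild = Math.max(maxChild, dag.meanTransferCost(task, child) + rank(dag, child));
            }
            double result = dag.meanComputeCost(task) + maxChild;
            memo.put(task, result);
            return result;
        }
    }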


Figure 11. A comparison of Min-min and Max-min

HEFT does not perform well. This could be due to the fact that HEFT calculates ranks using the mean values of task execution time and communication time, and as explained by Zhao and Sakellariou (2003) this might not be the most efficient choice.

GAClassic produces schedules of poor quality. This could be caused by the fact that the GA parameters in Table 1 were determined from the experimental results of workflows with 50 jobs and a CCR of 0.1. In order to evaluate larger workflows and higher CCR values, the GA parameters, especially the generation size, should be reviewed. The experimentation required to determine GA parameters is a known limitation of GAs (Han & Kendall, 2003). Another factor affecting the quality of GAClassic schedules is the performance of the Random scheduling algorithm: in the evaluation performed, the Random algorithm produced results that were on average in the order of 3.5 times (CCR=0.01) to 160 times (CCR=100) larger than those generated by the scheduling heuristics, which makes it more difficult for the GA to find an optimal solution.

CONCLUSION

On the whole, among the 12 heuristics, the best ones are BL, LMT and PETS, providing good results with low execution times. PETS is slightly superior to BL and LMT in terms of Good Count and Best Count, while BL and LMT have a lower average normalised makespan and normalised makespan standard deviation. Min-min produces good results but is computationally expensive. The levelled version of Min-min (Min-FINISH) is computationally more efficient and performs slightly better than Min-min. It is also noted that the algorithms BL, LMT, HEFT, CPOP and PETS have good execution performance, since they minimise the number of evaluations of job execution on different machines by producing a topologically sorted list and evaluating each job in turn only once; this matters because the main computational cost in algorithm execution is the makespan calculation (a sketch of this single-pass evaluation follows the observations below).

The following are some observations drawn from the evaluation process:

• Level-by-level heuristics tend to schedule non-critical tasks early, delaying the execution of critical tasks, which leads to increased makespans.
• Since the rank definitions in PETS provide a topologically sorted list, a non-levelled version of PETS could be used to prioritise tasks by latest finish time.
• List scheduling heuristics operate by minimising task finish time. In some cases, this results in scheduling tasks to minimise task finish time at the expense of increasing the partial makespan.
• The evaluation of the partial makespan is penalised by the fact that data transfers to tasks that are not yet scheduled are ignored. Task scheduling decisions should take all data transfers into consideration.
• Implementing GA algorithms for grid job scheduling poses challenges in determining the proper parameters to be used.
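The single-pass makespan evaluation mentioned above can be sketched as follows (a minimal illustration with hypothetical data structures and our own naming, not the authors' implementation). Because tasks are visited in topological order, every parent's finish time is already known when a task is reached, so one sweep over the sorted list yields the makespan.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Single-pass makespan evaluation over a topologically sorted, already
    // placed task list.
    final class MakespanEvaluator {

        // One task after placement: its machine, its cost there, and incoming transfers.
        record Placed(int id, int machine, double computeCost,
                      List<Integer> parents, Map<Integer, Double> transferTimeFromParent) {}

        static double makespan(List<Placed> topologicalOrder, int machineCount) {
            double[] machineFree = new double[machineCount];  // when each machine next becomes idle
            Map<Integer, Placed> byId = new HashMap<>();
            Map<Integer, Double> finish = new HashMap<>();
            double makespan = 0;
            for (Placed task : topologicalOrder) {
                byId.put(task.id(), task);
                double ready = machineFree[task.machine()];
                for (int parent : task.parents()) {
                    double arrival = finish.get(parent);      // parents were evaluated earlier in the order
                    if (byId.get(parent).machine() != task.machine()) {
                        arrival += task.transferTimeFromParent().get(parent); // transfer cost only across machines
                    }
                    ready = Math.max(ready, arrival);
                }
                double end = ready + task.computeCost();
                machineFree[task.machine()] = end;
                finish.put(task.id(), end);
                makespan = Math.max(makespan, end);
            }
            return makespan;
        }
    }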


REFERENCES

Alhusaini, A. H., Prasanna, V. K., & Raghavendra, C. S. (1999). A unified resource scheduling framework for heterogeneous computing environments. Paper presented at the Proceedings of the Eighth Heterogeneous Computing Workshop.

Braun, T. D., et al. (1999). A comparison study of static mapping heuristics for a class of meta-tasks on heterogeneous computing systems. Paper presented at the Proceedings of the Eighth Heterogeneous Computing Workshop (HCW'99).

Braun, T. D. (2001). A comparison of eleven static heuristics for mapping a class of independent tasks onto heterogeneous distributed computing systems. Journal of Parallel and Distributed Computing, 61(6), 810–837. doi:10.1006/jpdc.2000.1714

Casanova, H., Legrand, A., Zagorodnov, D., & Berman, F. (2000). Heuristics for scheduling parameter sweep applications in grid environments. Paper presented at the Proceedings of the 9th Heterogeneous Computing Workshop (HCW 2000).

Han, L., & Kendall, G. (2003). Guided operators for a hyper-heuristic genetic algorithm. Paper presented at AI 2003: Advances in Artificial Intelligence.

Hou, E. S. H., Ansari, N., & Ren, H. (1994). A genetic algorithm for multiprocessor scheduling. IEEE Transactions on Parallel and Distributed Systems, 5(2), 113–120. doi:10.1109/71.265940

Ibarra, O. H., & Kim, C. E. (1977). Heuristic algorithms for scheduling independent tasks on nonidentical processors. Journal of the ACM, 24(2), 280–289. doi:10.1145/322003.322011

Ilavarasan, E., & Thambidurai, P. (2007). Low complexity performance effective task scheduling algorithm for heterogeneous computing environments. Journal of Computer Science, 3(2), 94–103. doi:10.3844/jcssp.2007.94.103

Iverson, M., Ozguner, F., & Follen, G. (1995). Parallelizing existing applications in distributed heterogeneous environments. Paper presented at the Proceedings of the Heterogeneous Computing Workshop.

Li, M., & Baker, M. (2005). The grid core technologies. London: Wiley. doi:10.1002/0470094192

Maheswaran, M. (1999). Dynamic mapping of a class of independent tasks onto heterogeneous computing systems. Journal of Parallel and Distributed Computing, 59(2), 107–131. doi:10.1006/jpdc.1999.1581

Maheswaran, M., & Siegel, H. J. (1998). A dynamic matching and scheduling algorithm for heterogeneous computing systems. Paper presented at the 7th Heterogeneous Computing Workshop.

Shroff, P., Watson, D. W., Flann, N. S., & Freund, R. (1996). Genetic simulated annealing for scheduling data-dependent tasks in heterogeneous environments. Paper presented at the Proceedings of Heterogeneous Computing.

Sih, G., & Lee, E. (1993). A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architecture. IEEE Transactions on Parallel and Distributed Systems, 4(2), 175–187. doi:10.1109/71.207593

Singh, H., & Youssef, A. (1996). Mapping and scheduling heterogeneous task graphs using genetic algorithms. Paper presented at the Proceedings of the Heterogeneous Computing Workshop.

Spooner, D. P. (2005). Performance-aware workflow management for grid computing. The Computer Journal, 48(3), 347–357. doi:10.1093/comjnl/bxh090

Topcuoglu, H., Hariri, S., & Wu, M. Y. (2002). Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems, 13(3), 260–274. doi:10.1109/71.993206

Wang, L., Siegel, H. J., Roychowdhury, V., & Maciejewski, A. (1997). Task matching and scheduling in heterogeneous computing environments using a genetic-algorithm-based approach. Journal of Parallel and Distributed Computing, 47(1), 8–22. doi:10.1006/jpdc.1997.1392

Yu, J., Buyya, R., & Ramamohanarao, K. (2008). Workflow scheduling algorithms for grid computing. In Xhafa, F., & Abraham, A. (Eds.), Metaheuristics for scheduling in distributed computing environments (Studies in Computational Intelligence) (pp. 173–214). Berlin: Springer. doi:10.1007/978-3-540-69277-5_7


Zhang, Y., Koelbel, C., & Kennedy, K. (2007). Relative performance of scheduling algorithms in grid environments. Paper presented at the Seventh IEEE International Symposium on Cluster Computing and the Grid.

Zhao, H., & Sakellariou, R. (2003). An experimental investigation into the rank function of the heterogeneous earliest finish time scheduling algorithm. Paper presented at the Proceedings of Euro-Par 2003.

ENDNOTE

1 http://gvf.sourceforge.net/

Ing. Geoffrey Falzon is an IT Manager at STMicroelectronics (Malta) Ltd. He received a PhD from Brunel University, UK in 2009. He received an MSc in Data Communications from Brunel University in 2000 and a B.Eng. (Hons) in Electrical Engineering from the University of Malta in 1993. His research interests are in the areas of grid workflow management, job scheduling, code optimisation, distributed computing and network systems. He is a member of IEEE, IET and BCS.

Maozhen Li is a Senior Lecturer in the School of Engineering and Design at Brunel University, UK. He received his PhD from the Institute of Software, Chinese Academy of Sciences, in 1997. His research interests are in the areas of grid computing, intelligent systems, P2P computing, the semantic web, information retrieval and content-based image retrieval. He has over 70 scientific publications in these areas. He authored "The Grid: Core Technologies", a well-recognised textbook on grid computing, published by Wiley in 2005. He has served on the committees of over 30 international conferences. He is currently on the editorial boards of three journals: the International Journal of Grid and High Performance Computing, the International Journal of Distributed Systems and Technologies, and the International Journal on Advances in Internet Technology. He is a member of IEEE, IET and BCS.
