PATTERNS 2014

The Sixth International Conferences on Pervasive Patterns and Applications

ISBN: 978-1-61208-343-8

May 25 - 29, 2014

Venice, Italy

PATTERNS 2014 Editors

Alexander Mirnig, University of Salzburg, Austria

Jaap Kabbedijk, Utrecht University, Netherlands

PATTERNS 2014 Foreword

The Sixth International Conferences on Pervasive Patterns and Applications (PATTERNS 2014), held between May 25-29, 2014 in Venice, Italy, targeted the application of advanced patterns at large. In addition to support for patterns and pattern processing, special categories of patterns covering ubiquity, software, security, communications, discovery, and decision were considered. As a special target, the domain-oriented patterns cover a variety of areas, from investing, dietary, and forecasting to forensics and emotions. It is believed that patterns play an important role in cognition, automation, and service computation and orchestration areas.

We take here the opportunity to warmly thank all the members of the PATTERNS 2014 Technical Program Committee, as well as all of the reviewers. The creation of such a high quality conference program would not have been possible without their involvement. We also kindly thank all the authors who dedicated much of their time and efforts to contribute to PATTERNS 2014. We truly believe that, thanks to all these efforts, the final conference program consisted of top quality contributions.

Also, this event could not have been a reality without the support of many individuals, organizations, and sponsors. We are grateful to the members of the PATTERNS 2014 organizing committee for their help in handling the logistics and for their work to make this professional meeting a success.

We hope that PATTERNS 2014 was a successful international forum for the exchange of ideas and results between academia and industry and for the promotion of progress in the field of pervasive patterns and applications. We are convinced that the participants found the event useful and communications very open. We hope that Venice, Italy, provided a pleasant environment during the conference and that everyone saved some time to enjoy the charm of the city.

PATTERNS 2014 Chairs:

Herwig Manaert, University of Antwerp, Belgium
Fritz Laux, Reutlingen University, Germany
Michal Zemlicka, Charles University - Prague, Czech Republic
Alfred Zimmermann, Reutlingen University, Germany
Richard Laing, Robert Gordon University, UK
Ricardo Sanz, UPM ASlab, Spain
Mauricio Hess-Flores, University of California, Davis, USA
Teemu Kanstren, VTT, Finland
Markus Goldstein, DFKI (German Research Center for Artificial Intelligence GmbH), Germany
Zhenzhen Ye, IBM, Essex Junction, USA
René Reiners, Fraunhofer FIT - Sankt Augustin, Germany
Nguyen Vu Hoàng, Vertigo, France
Alexander Mirnig, University of Salzburg, Austria
Juan C Pelaez, Defense Information Systems Agency, USA
Bastian Roth, University of Bayreuth, Germany
Steve Strauch, IAAS at University of Stuttgart, Germany
Jaap Kabbedijk, Utrecht University, Netherlands

PATTERNS 2014

Committee

PATTERNS Advisory Chairs

Herwig Manaert, University of Antwerp, Belgium
Fritz Laux, Reutlingen University, Germany
Michal Zemlicka, Charles University - Prague, Czech Republic
Alfred Zimmermann, Reutlingen University, Germany
Richard Laing, Robert Gordon University, UK
Ricardo Sanz, UPM ASlab, Spain
Mauricio Hess-Flores, University of California, Davis, USA

PATTERNS Research/Industry Chairs

Teemu Kanstren, VTT, Finland
Markus Goldstein, DFKI (German Research Center for Artificial Intelligence GmbH), Germany
Zhenzhen Ye, IBM, Essex Junction, USA
René Reiners, Fraunhofer FIT - Sankt Augustin, Germany
Nguyen Vu Hoàng, Vertigo, France
Alexander Mirnig, University of Salzburg, Austria
Juan C Pelaez, Defense Information Systems Agency, USA

PATTERNS Publicity Chairs

Bastian Roth, University of Bayreuth, Germany
Steve Strauch, IAAS at University of Stuttgart, Germany
Jaap Kabbedijk, Utrecht University, Netherlands

PATTERNS 2014 Technical Program Committee

Ina Suryani Ab Rahim, Pensyarah University, Malaysia Junia Anacleto, Federal University of Sao Carlos, Brazil Andreas S. Andreou, Cyprus University of Technology - Limassol, Cyprus Annalisa Appice, Università degli Studi di Bari, Italy Martin Atzmueller, University of Kassel, Germany Senén Barro, University of Santiago de Compostela, Spain Rémi Bastide, University Champollion / IHCS - IRIT, France Bernhard Bauer, University of Augsburg, Germany Noureddine Belkhatir , University of Grenoble, France Hatem Ben Sta, Université de Tunis - El Manar, Tunisia Silvia Biasotti, Consiglio Nazionale delle Ricerche, Italy Félix Biscarri, University of Seville, Spain Cristian Bonanomi, Universita' degli Studi di Milano, Italy Michaela Bunke, University of Bremen, Germany João Pascoal Faria, University of Porto, Portugal Michelangelo Ceci, University of Bari, Italy

3 / 107 M. Emre Celebi, Louisiana State University in Shreveport, USA Jian Chang, Bournemouth University, UK William Cheng-Chung Chu(朱正忠), Tunghai University, Taiwan Loglisci Corrado, University of Bari, Italy Bernard Coulette, Université de Toulouse 2, France Karl Cox, University of Brighton, UK Jean-Charles Créput, Université de Technologie de Belfort-Montbéliard, France Mohamed Dahchour, National Institute of Posts and Telecommunications - Rabat, Morocco Jacqueline Daykin, Royal Holloway University of London, UK Angelica de Antonio, Universidad Politecnica de Madrid, Spain Sara de Freitas, Coventry University, UK Vincenzo Deufemia, Università di Salerno - Fisciano, Italy Kamil Dimililer, Near East University, Cyprus Alberto Egon, Federal University of Rio Grande do Sul (UFRGS), Brazil Giovanni Maria Farinella, University of Catania, Italy Eduardo B. Fernandez, Florida Atlantic University - Boca Raton, USA Simon Fong, University of Macau, Macau SAR Francesco Fontanella, Università di Cassino e del Lazio Meridionale, Italy Dariusz Frejlichowski, West Pomeranian University of Technology, Poland Christos Gatzidis, Bournemouth University, UK Joseph Giampapa, Carnegie Mellon University, USA Markus Goldstein, German Research Center for Artificial Intelligence (DFKI), Germany Gustavo González, Mediapro Research - Barcelona, Spain Pascual Gonzalez Lopez, University of Castilla-La Mancha, Spain Carmine Gravino, University of Salerno, Italy Christos Grecos, University of the West of Scotland, UK Yann-Gaël Guéhéneuc, École Polytechnique - Montreal, Canada Pierre Hadaya, UQAM, Canada Brahim Hamid, IRIT-Toulouse, France Sven Hartmann, TU-Clausthal, Germany Kenneth Hawick, Massey University, New Zealand Mauricio Hess-Flores, University of California, USA Christina Hochleitner, CURE, Austria Władysław Homenda, Warsaw University of Technology, Poland Samuelson W. Hong, Zhejiang University of Finance & Economics, China Chih-Cheng Hung, Southern Polytechnic State University-Marietta, USA Shareeful Islam, University of East London, UK Slinger Jansen (Roijackers), Utrecht University, The Netherlands Jinyuan Jia, Tongji University, China Maria João Ferreira, Universidade Portucalense - Porto, Portugal Jaap Kabbedijk, Utrecht University, Netherlands Hermann Kaindl, TU-Wien, Austria Abraham Kandel, University South Florida - Tampa, USA Teemu Kanstren, VTT, Finland Alexander Knapp, Universität Augsburg, Germany Richard Laing, The Scott Sutherland School of Architecture and Built Environment/ Robert Gordon University - Aberdeen, UK Robert Laramee, Swansea University, UK

4 / 107 Fritz Laux, Reutlingen University, Germany Hervé Leblanc, IRIT-Toulouse, France Gyu Myoung Lee, Institut Telecom/Telecom SudParis, France Haim Levkowitz, University of Massachusetts Lowell, USA Fotis Liarokapis, Coventry University, UK Pericles Loucopoulos, Harokopio University of Athens, Greece / Loughborough University, UK Herwig Manaert, University of Antwerp, Belgium Elio Masciari, ICAR-CNR, Italy Constandinos Mavromoustakis, University of Nicosia, Cyprus Fuensanta Medina-Dominguez, Carlos III University of Madrid, Spain Alexander G. Mirnig, University of Salzburg, Austria Ivan Mistrík, Independent Consultant. Heidelberg, Germany Paula Morais, Universiadade Portucalense - Porto, Portugal Fernando Moreira, Universidade Portucalense, Portugal Haralambos Mouratidis, University of East London, UK Antonino Nocera, University Mediterranea of Reggio Calabria, Italy Jean-Marc Ogier, Université de la Rochelle, France Krzysztof Okarma, West Pomeranian University of Technology, Poland Hichem Omrani, CEPS/INSTEAD, Luxembourg Jerry Overton, Computer Sciences Corporation, USA Ana Paiva, University of Porto, Portugal Eric Paquet, National Research Council / University of Ottawa, Canada João Pascoal Faria, University of Porto, Portugal Galina Pasko, Uformia, Norway Rodrigo Paredes, Universidad de Talca, Chile Giuseppe Patane', CNR-IMATI, Italy Photis Patonis, Aristotle University of Thessaloniki, Greece Juan C Pelaez, Defense Information Systems Agency, USA Christian Percebois, IRIT/Université de Toulouse, France Gabriel Pereira Lopes, Universidade Nova de Lisboa, Portugal Luciano Pereira Soares, Insper, Brazil Nadia Pisanti, University of Pisa, Italy José R. Pires Manso, University of Beira Interior, Portugal Agostino Poggi, Università degli Studi di Parma, Italy Giovanni Puglisi, University of Catania, Italy Francisco A. Pujol, Universidad de Alicante, Spain Mar Pujol, Universidad de Alicante, Spain Claudia Raibulet, University of Milano-Bicocca, Italy Alessandro Rizzi, Università degli Studi di Milano, Italy José Raúl Romero, University of Córdoba, Spain Agostinho Rosa, Technical University of Lisbon, Portugal Gustavo Rossi, UNLP - La Plata, Argentina Ozgur Koray Sahingoz, Turkish Air Force Academy, Turkey Maria-Isabel Sanchez-Segura, Carlos III University of Madrid, Spain Kurt Sandkuhl, Jönköping University, Sweden Isabel Seruca, Universidade Portucalense - Porto, Portugal Caifeng Shan, Philips Research, The Netherlands Karolj Skala, Ruder Boškovic Institute Zagreb, Croatia

5 / 107 Carina Soledad, Universidad de La Laguna, Spain Marco Spruit, Utrecht University, The Netherlands Michael Stal, Siemens, Germany Martin Stanton, Manchester Metropolitan University, UK Janis Stirna, Stockholm University, Sweden Mu-Chun Su, National Central University, Taiwan Sam Supakkul, Sabre Inc., USA Stella Sylaiou, Hellenic Open University, Greece Ryszard Tadeusiewicz, AGH University of Science and Technology, Poland Dan Tamir, Texas State University - San Marcos, USA Shanyu Tang, China University of Geosciences - Wuhan City, P. R. China Horia-Nicolai Teodorescu, "Gheorghe Asachi" Technical University of Iasi / Romanian Academy, Romania Daniel Thalmann, Nanyang Technological University, Singapore Alain Toinon Léger, Orange - France Telecom R& / University St Etienne / ENS Mines - Betton, France Mati Tombak, University of Tartu / Tallinn Technical University, Estonia Alessandro Torrisi, Università di Catania, Italy Theodoros Tzouramanis, University of the Aegean, Greece Domenico Ursino, University Mediterranea of Reggio Calabria, Italy Michael Vassilakopoulos, University of Thessaly, Greece Krzysztof Walczak, Poznan University of Economics, Poland Stefan Wendler, Ilmenau University of Technology, Germany Laurent Wendling, University Descartes (Paris 5), France Wojciech Wiza, Poznan University of Economics, Poland Mudasser F. Wyne, National University- San Diego, USA Dongrong Xu, Columbia University & New York State Psychiatric Institute, USA Reuven Yagel, The Jerusalem College of Engineering, Israel Zhenzhen Ye, IBM, Essex Junction, USA Lihua You, Bournemouth University, UK Hongchuan Yu, Bournemouth University, UK Alfred Zimmermann, Reutlingen University, Germany Michal Žemlička, Charles University, Czech Republic

Copyright Information

For your reference, this is the text governing the copyright release for material published by IARIA.

The copyright release is a transfer of publication rights, which allows IARIA and its partners to drive the dissemination of the published material. This allows IARIA to give articles increased visibility via distribution, inclusion in libraries, and arrangements for submission to indexes.

I, the undersigned, declare that the article is original, and that I represent the authors of this article in the copyright release matters. If this work has been done as work-for-hire, I have obtained all necessary clearances to execute a copyright release. I hereby irrevocably transfer exclusive copyright for this material to IARIA. I give IARIA permission to reproduce the work in any media format such as, but not limited to, print, digital, or electronic. I give IARIA permission to distribute the materials without restriction to any institutions or individuals. I give IARIA permission to submit the work for inclusion in article repositories as IARIA sees fit.

I, the undersigned, declare that to the best of my knowledge, the article does not contain libelous or otherwise unlawful content, invade the right of privacy, or infringe on a proprietary right.

Following the copyright release, any circulated version of the article must bear the copyright notice and any header and footer information that IARIA applies to the published article.

IARIA grants royalty-free permission to the authors to disseminate the work, under the above provisions, for any academic, commercial, or industrial use. IARIA grants royalty-free permission to any individuals or institutions to make the article available electronically, online, or in print.

IARIA acknowledges that rights to any algorithm, process, procedure, apparatus, or articles of manufacture remain with the authors and their employers.

I, the undersigned, understand that IARIA will not be liable, in contract, tort (including, without limitation, negligence), pre-contract or other representations (other than fraudulent misrepresentations) or otherwise in connection with the publication of my work.

Exception to the above is made for work-for-hire performed while employed by the government. In that case, copyright to the material remains with the said government. The rightful owners (authors and government entity) grant unlimited and unrestricted permission to IARIA, IARIA's contractors, and IARIA's partners to further distribute the work.

Table of Contents

SQL Pattern Design and Development 1 Huda Al-Shuaily and Karen Renaud

Building a General Pattern Framework via Set Theory: Towards a universal pattern approach 8 Alexander G. Mirnig and Manfred Tscheligi

From Pattern Languages to Solution Implementations 12 Michael Falkenthal, Johanna Barzen, Uwe Breitenbucher, Christoph Fehling, and Frank Leymann

Reconfiguration Patterns for Goal-Oriented Monitoring Adaptation 22 Antoine Toueir, Julien Broisin, and Michelle Sibilla

Dynamic Incremental Fuzzy C-Means Clustering 28 Bryant Aaron, Dan Tamir, Naphtali Rishe, and Abraham Kandel

SPEM: A Software Pattern Evaluation Method 38 Jaap Kabbedijk, Rene van Donselaar, and Slinger Jansen

Generating EE 6 Application Layers and Components in JBoss Environment 44 Gabor Antal, Adam Zoltan Vegh, and Vilmos Bilicki

Refinement Patterns for an Incremental Construction of Class Diagrams 50 Boulbaba Ben Ammar and Mohamed Tahar Bhiri

Automating Cloud Application Management Using Management Idioms 60 Uwe Breitenbucher, Tobias Binz, Oliver Kopp, and Frank Leymann

A Method for Situational and Guided Information System Design 70 Dalibor Krleza and Kresimir Fertalj

An Analytic Evaluation of the SaCS Pattern Language – Including Explanations of Major Design Choices 79 Andre Alexandersen Hauge and Ketil Stolen

Privacy by Design Permission System for Mobile Applications 89 Karina Sokolova, Marc Lemercier, and Jean-Baptiste Boisseau

Effectively Updating Co-location Patterns in Evolving Spatial 96 Jin Soung Yoo and Hima Vasudevan


SQL Pattern Design and Development

Huda Al-Shuaily
Department of Information Technology
Higher College of Technology
Muscat, Oman
[email protected]

Karen Renaud
School of Computing Science
University of Glasgow
Glasgow, UK
[email protected]

Abstract—This paper presents the processes involved in the design and development of a set of Structured Query Language (SQL) patterns. The intention is to support novices during SQL acquisition. This process is grounded in the SQL learning model developed from learning theory literature and empirical investigations into SQL acquisition. One of the crucial cross-cutting factors identified during the development of this model was the quality of the instructional material provided to learners to support the acquisition process. Since patterns have been successfully deployed in other areas to support knowledge transfer, we set out to develop SQL patterns to meet this need for effective instructional material. We detail the process by which we identified the required components of SQL patterns. Our patterns were also informed by observations of how novices and experts solved SQL queries. We conclude by presenting our proposed SQL pattern content and structure.

Keywords-Pattern; SQL; Expert

I. INTRODUCTION

SQL is one of those languages that seem particularly challenging to master. We previously reported on an investigation which advanced explanations for this apparent difficulty [1] and developed a model of SQL learning, shown in Fig. 1. Our investigation uncovered four cross-cutting issues that impacted the learning process. One major factor was the quality and suitability of the instructional material to support the learning process. In providing instructional material, two specific aspects are important:

- The structuring of knowledge within the instructional material. Merrill [2] explains that the organization and representation of knowledge impacts learning. Mayer [3] makes the same argument, positing that the way in which a body of knowledge is structured determines how readily it will be grasped by learners.

- When, during the learning process, the afore-mentioned instructional material should be introduced. Learning happens in a predictable and mediated way, with subsequent knowledge and skills building on prior knowledge and understanding [4], [5]. Hence, the sequence in which we present knowledge is important. It should support learning rather than encouraging trial-and-error attempts to produce correct SQL queries without understanding the underlying principles [6].

Patterns are a widely used mechanism for supporting knowledge transfer. We set out to investigate whether patterns could meet the need for optimally-structured SQL instructional material in this context. Schlager and Ogden [7] found that incorporating a cognitive model in the form of expert user and product-independent knowledge into novice instruction enhances learning. They concluded that such a cognitive model framework could potentially help to support knowledge acquisition.

Patterns traditionally structure knowledge in such a way that they can transfer best practice from experts to novices. We cannot assume that SQL patterns can be structured in exactly the same way as other more well-established patterns, however, so we need to carefully align them with the SQL acquisition process.

We briefly present our previously developed model of SQL learning (Fig. 1), which is the logical place to start when identifying SQL patterns and positioning them within the learning process. This model is grounded in Bloom's taxonomy [4] and validated by studies of how novices learn to write SQL queries. The SQL learning model emerged from the analysis of the educational literature, and was augmented by the analysis of data gathered during qualitative and quantitative studies of SQL acquisition.

This model incorporates the notion of the development of mental models. We demonstrate how mental models are constructed during SQL acquisition. Learners start with the development of individual schemata, moving on towards meaningful structuring of schemata into hierarchies and constructed mental models. Existence of these models suggests that the learner will be able to solve a variety of problems of a similar nature, i.e., they have abstracted. An abstraction exists and they should be able to apply core concepts in many contexts: learning has resulted in a heuristic.

The SQL acquisition process is also modeled in the diagram, showing that learners need first to have a basic knowledge of SQL concepts, and an understanding of how to use them. They then have to practice applying these concepts to a variety of problems: analyzing, synthesizing and evaluating. They ought to emerge from this stage with an appreciation of the core principles, and with an ability to make judgments about strategies to be deployed.


Learners who have progressed to this upper level can be considered to have mastered SQL.

Figure 1. A Model of SQL Learning

The SQL patterns' identification process is explored in Section II. The processes that involve SQL pattern identification using text mining are explored in Section III. The SQL pattern identification phase using novices' observation is presented in Section IV, while expert observation is considered in Section V. Section VI discusses and Section VII concludes.

II. SQL PATTERN IDENTIFICATION

To identify our SQL patterns, we commenced by studying other pattern identification methods and procedures. Patterns are not an optimistic or ephemeral collection of ideas. They encapsulate specific tried-and-tested best practice techniques specific to a particular field [8]. Patterns do not state obvious solutions to trivial problems nor do they cover every possible eventuality, but they do capture important "big ideas" [9]. A pattern should explain how a problem should be solved and why the presented solution is appropriate in a particular context. Alexander [8] points out that patterns may be discovered by identifying a problem and later finding a solution, or by seeing a positive set of examples and abstracting a common solution. Coad and Mayfield [10] suggest that patterns are based on a designer's experience of the area.

The SQL pattern development process needed to focus on both the behavioral and the cognitive aspects of SQL acquisition. Understanding learner ability to perform different cognitive tasks such as query formulation, translation, and writing is essential to be able to design this new SQL instructional material.

The SQL patterns we derived emerged from an iterative research process, which involved a review of educational research, uncovering relevant human factors related to SQL usability, as well as psychology-related research. The process aimed to accommodate the nature of SQL acquisition. We explored this process by conducting a general overview of the literature about educational theory, cognitive psychology research [4], [5], [11] and instructional design research. The next step narrowed to cover Computer Science (CS) educational research [12] and focused on the fostering of problem solving skills. Having confirmed the possibility of using patterns to support SQL acquisition, we proceeded to identify and define the patterns using text mining, followed by observation of novices and experts.

III. PATTERN MINING

According to Bruner [15], new instructional methods need first to apply what educators know about how students learn, remember, and use related skills. A text mining procedure was therefore used to extract this information from texts, to discover patterns from existing knowledge repositories, solutions, or designs. This process captures practice that is both good and significant [14]. This identified common knowledge arguably represents the core concepts and practices of SQL query writing. There is a strong precedent for this approach [13].

The mining process was carried out manually based on the dimensions defined in the SQL Learning Model (Fig. 1), and is depicted in Fig. 2.

Figure 2. Mining Process

The following steps were followed during the mining process:

1. Identifying Data:
   a. Identify expected SQL knowledge from texts and categorize the knowledge into the SQL learning model categories.
   b. Identify the declarative or "remembering" SQL concepts. Here, we mined data such as SQL facts or concepts, for example joining, aggregation, and sub-queries (illustrated briefly after this list).


2. Identifying Information:
   a. Identify the procedural or "comprehension" SQL concepts, and how they are used in a particular context.
   b. Identify the "practice" skills by considering how concepts should be applied in solving problems. For example, show a context scenario and explain how the relevant syntax and rules are applied. Also illustrate the scenario with appropriate examples which show, step-by-step, how such a concept should be applied.

3. Identifying Knowledge:
   a. Investigate the "creating" activity. For example, find evidence of generic principles being applied in particular contexts.

4. Identify the SQL misconceptions that could potentially be corrected by the patterns.
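As a minimal illustration of the declarative concepts named in step 1b (joining, aggregation, and sub-queries), the following sketch uses a hypothetical library schema; the table and column names (BOOK_COPY, LOAN, and so on) are assumptions made for illustration and do not come from the paper.

```sql
-- Assumed illustrative schema:
--   BOOK_COPY(copy_id, title_id)
--   LOAN(copy_id, borrower_id, date_due, date_returned)

-- Joining: pair each loan with the title of the copy it refers to.
SELECT l.borrower_id, c.title_id
FROM   LOAN l
JOIN   BOOK_COPY c ON c.copy_id = l.copy_id;

-- Aggregation: count the number of loans per title.
SELECT c.title_id, COUNT(*) AS loan_count
FROM   LOAN l
JOIN   BOOK_COPY c ON c.copy_id = l.copy_id
GROUP  BY c.title_id;

-- Sub-query: copies that have never been loaned.
SELECT copy_id
FROM   BOOK_COPY
WHERE  copy_id NOT IN (SELECT copy_id FROM LOAN);
```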

The mining process was informed by knowledge management research that distinguishes between data, information and knowledge [16]. The process of knowledge extraction and categorization led to an initial set of patterns.

The mining process provided a starting point, delivering a static understanding of how SQL core knowledge is presented in textbooks and commonly-used texts. How such SQL concepts are applied in reality, by query writers, could not be gauged without a field investigation. The next step was to observe and analyze novice SQL problem-solving behavior.

IV. OBSERVING NOVICES

Researchers in the field of pattern identification agree on the value of direct observation in arriving at patterns. "In order to discover patterns which are alive we must always start with observation" [8] (p. 254). Furthermore, [17] points out that "Patterns are not created or invented; they are identified via an invariant principle".

Strategy identification, by means of learner observation, helps determine what gaps "best practice" SQL patterns should fill. Cognitive science suggests giving learners a problem and observing everything they do and say while attempting a solution. Unstructured observations were thus conducted over a period of two semesters.

The focus of the observation was on the following particular aspects:

- Remembering:
  o When they remembered the required concepts, were they correct?
- Searching (Not Remembering):
  o How were the required, but forgotten, concepts obtained? For example, did they refer to textbooks or teaching materials, or did they search the Web to find similar problems and related solutions?
- Problem Solving:
  o Were the required concepts identified correctly?
  o Were the gathered concepts correctly matched to the given problem context?
  o Did they search for examples on the Web?
  o Did they try different possible solutions? If so, why was a particular solution selected?
  o How did they react to errors?

Figure 3. Novice Observation

The observation data, based on observation of 63 students (see Table I), revealed that many students lacked problem solving skills. Students often started to write SQL queries without taking the time to consider different approaches. There was no attempt to choose an optimal approach from a number of candidate approaches. They behaved tactically and did not take time to analyze the problem description and to consider what they should do before attempting to write the query. This tendency confirms previous research findings [18]. Students spent the bulk of their time solving syntax and semantic errors and assessing the correctness of the generated results.

TABLE I. STUDENTS' DEMOGRAPHICS INFORMATION

Time    | Participants                              | Number of participants
2009/10 | Students registered for 2nd year course   | 17
2010/11 | Students registered for 2nd year course   | 21
2010/11 | Students registered for third year course | 15

Furthermore, novices lacked the ability to sub-divide the problems into sub-problems or to identify the specific knowledge required to solve individual sub-problems [19]. If they do divide and conquer, they then have to synthesize identified sub-solutions to design a complete solution to the problem. This, too, seems to be a skill that novices lack (Fig. 4).


Less searching behavior than anticipated was observed, and when it did take place it was often unproductive. Students searched for similar problems on the Web or spent some time looking at the lecture notes, trying to understand different concepts. This was often unproductive since they wasted time searching for irrelevant concepts.

Analogy | Sources of analogy
A1 Formulating the problem "number of loan …" | Schemata
D1 Dividing the problem into sub-problems or sub-goals | Advanced knowledge
A2 Analysis: Joining two tables "joining the copy with titles" | Basic knowledge
R1 Reasons why D1 "to get single table" | Basic knowledge
A3 Writing: Write SQL syntax | Basic knowledge
D2 Use subquery | Advanced knowledge
A4 Evaluation: Execute the A3 without applying D2 and checking the results | Basic knowledge
D5 Apply self-join for the table | Advanced knowledge
A12 Writing: Modifying the query | Basic knowledge
A13 Evaluation: Modifying the query with no clear decision | Schemata
A14 Writing: Applying D5 | Schemata
A15 Analysis of the problem | Schemata
Applying aggregation | Advanced knowledge
A16 Writing: Iteration of changing the query | Advanced knowledge

Figure 4. Problem Solving Stages, and Novice Strategies

Observation of novices was invaluable in understanding how to design supporting instructional material. However, we needed also to understand which particular strategies were deployed by SQL experts, since this was the behavior we wanted to guide the novices towards. What emerged from this analysis was the fact that an intervention was required to support students during problem solving, where they apply the basic SQL concepts and principles. The initial set of SQL patterns was refined based on the observation process.

V. EXPERT OBSERVATION

Professionals have acquired knowledge and skills through study and practice over the years and are termed "experts". Patterns are a means of codifying experts' knowledge and expertise to facilitate knowledge transfer. Pattern content must be informed by experts' actual practices. This section presents a description of problem solving strategies deployed by two individual expert SQL query writers. The experts were master's students at Glasgow University who had experience using SQL. The tasks they solved are shown in Fig. 5. The aim of this observation was to determine how experts solved these problems, as opposed to novices.

Q1: Give the titles of books that have more than one author.
Q2: Display the names of borrowers who have never returned a book late.

Figure 5. Expert Observation task

The cognitive activities performed during problem solving of the two tasks were recorded by employing a "talk-aloud" protocol [20].

Figure 6. Expert Observation process

The observation process recorded all cognitive activities (see Fig. 7), such as schemata retrieval (Remembering) and Searching (Not Remembered). The collected information was categorized as conceptual (basic building blocks from Fig. 1), schemata (knowledge of how concepts are used), or rule (abstract heuristic knowledge), as recommended by [7], [15], [21].

Experts, after reading the problem description, made an initial decision about the type of technique that had to be applied. They then looked at the provided data model and verbally listed the possible approaches to solving the problem that they could deploy. After mulling it over, they settled on one particular approach and provided reasons for discarding the other options.


Both experts used a divide-and-conquer approach and sub-divided the problem: they did not attempt to write the whole query at once. They wrote and tested the commands related to the sub-queries and then synthesized the sub-solutions to arrive at the final complete solution.

The most interesting part of this observation was the fact that the experts applied an implicit pattern matching approach to their assessment of the problem. They clearly tried to match a number of different learned heuristics to the problem before settling on the best approach. One can only assume that they had internalized a number of abstract heuristics which they tried to match to the given problem before settling on a "best-fit" approach.

1. Reading and understanding the problem.
2. Search for more information from the Internet ("Googling").
3. Problem Solving:
   a. Analysis
      - Consulting the ER model.
      - Identify the available table holding the required data.
      - Rereading the problem.
   b. Synthesis
      - Deciding which concepts to apply.
      - Searching for SQL syntax or relevant examples.
   c. Writing
      - Start writing the first SQL query in the tool.
   d. Checking
      - Evaluate the result of the first attempt.
      - Manipulate the query with some justification (fixing the errors). This is done iteratively until they are satisfied.
4. Repeat the sub-steps in number 3 until satisfied.

The participant broke the overall problem into a number of sub-problems. He first started by joining the Book-Copy and Book-Title tables. At this stage a few actions (A1-A3) and a decision (D1) were performed and other decisions were pending. The participant was happy with his performance at this stage. He then applied another decision, i.e., to use sub-queries. The participant then exercised the decision to apply the self-join technique. However, he failed to apply it correctly. The participant then deployed aggregation and was satisfied that he had solved the problem.

Figure 7. The cognitive activities experts deployed

The analysis acted to inform research into the type of approach that ought to be nurtured in novices. The next section discusses how the reported results contributed to the SQL pattern identification.

VI. DISCUSSION

The upper half of Fig. 8 shows how experts solve a task using an analogical approach. The model is based on the ideas of [23], which depict how scientists think and solve physical problems. The bottom half shows how this model reflects the SQL acquisition process. This model presents the different sources of knowledge and strategies experts deploy.

Figure 8. Expert Problem Solving [23] (top) and SQL Acquisition on the Expert Model (bottom)

Observation of expert activities showed that they divided the problem into sub-problems. Then, for each sub-problem, different relevant knowledge is applied to arrive at a sub-solution. When experts solved the first part they applied basic knowledge. Then, as the problem requirements required more understanding, they applied advanced knowledge, which was sometimes obtained by searching. They then applied problem solving strategies such as incremental development, division into sub-queries, consideration of a number of different ways of solving the problem, and choice of the optimal strategy.
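To make the incremental, divide-and-conquer strategy described above concrete, the sketch below shows one plausible way the Q1 task from Fig. 5 ("titles of books that have more than one author") could be built up, ending with the aggregation route the observed expert settled on. The schema and column names (BOOK_TITLE, BOOK_AUTHOR, and so on) are assumptions for illustration only; they are not given in the paper.

```sql
-- Assumed illustrative schema:
--   BOOK_TITLE(title_id, title)
--   BOOK_AUTHOR(title_id, author_id)

-- Sub-problem 1: join titles to their authors, so that each (title, author)
-- pair becomes a single row.
-- Sub-problem 2: aggregate the joined rows per title and keep only the
-- titles with more than one distinct author.
SELECT t.title
FROM   BOOK_TITLE  t
JOIN   BOOK_AUTHOR a ON a.title_id = t.title_id
GROUP  BY t.title_id, t.title
HAVING COUNT(DISTINCT a.author_id) > 1;
```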


Figure 9. Typical Expert Actions (left) and Novice Actions (right)

Observing experts and novices solving queries led us to visualize both the expert and novice actions in solving SQL queries (Fig. 9).

Considering how experts approach the task and relating it to novices' actions indicates the nature of the gap between expert and novice. There was no evidence that novices struggled with basic knowledge of SQL syntax. They also knew how the SQL constructs ought to be used. However, novices clearly lacked the knowledge and skills required to solve novel problems. This is related to the "PROBLEM SOLVING" stage as seen in Fig. 9. This, then, is where more effective instruction material needs to assist the process.

This analysis allowed us to determine what type of knowledge and skills are required to solve the SQL problems. We were also able to determine how the information should be presented to learners, i.e., what the optimal sequence of information is. The results suggest that:

- Experts start solving the problem by re-formulating the problem statement and determining its context using the data model.
- Expert knowledge is structured, connected and abstract. They have:
  1. Basic knowledge about SQL syntax and semantics ("SQL Syntax and Semantics");
  2. Advanced knowledge about the meaning of SQL concepts ("SQL Query comprehension") and about how to apply SQL concepts in the given context;
  3. Heuristics:
     - Knowledge about the wisdom of SQL applicability in a certain context ("problem-context-solution"). This is a high level of knowledge that novices lack, as was discussed.
     - Knowledge about the consequences of applying SQL concepts ("impact-of-solution"). This is a skill of evaluating SQL concepts, which is a high level of knowledge that novices lack.

Observation made it clear that instructional materials, such as notes, did not nudge students towards productive activities or support effective problem solving. To help novices to achieve a measure of SQL expertise, we propose that the SQL patterns should include the components shown in Table II.

TABLE II. PROPOSED SQL PATTERN CONTENTS

Provide students with data models to help them understand the context of the problem | Schema Formation
The impact of applying the pattern in such a problem context | Schema Formation
Support for matching a problem to a solution in a simple format such as a checklist | Schema Formation
A section which includes the basic knowledge required to solve the problem | Schemata
Step-by-step SQL visual examples of the pattern being applied | Schemata
This should be augmented with a step-by-step plan to train students to deploy effective problem solving strategies, as suggested by (Mayer, 2008) | Encourage Engagement with Analysis and Synthesis Phases during Problem Solving

VII. CONCLUSION AND FUTURE WORK

The successful implementation of instructional materials (SQL patterns) will depend on the pattern writer understanding all the different factors that influence SQL learnability, such as learner characteristics, SQL language specifications, human cognition, and instructional material. The pattern writer must align with established wisdom about human cognition. Our study has provided the guidance to inform SQL pattern content, which should ultimately serve as the link between the task requirement and the generic pattern.

REFERENCES

[1] Al-Shuaily, H., SQL pattern design, development & evaluation of its efficacy. PhD Thesis, University of Glasgow, 2013.
[2] Merrill, M.D., Knowledge objects and mental models. In Advanced Learning Technologies (IWALT 2000), Proceedings of the International Workshop, 2000, pp. 244-246.
[3] Mayer, R.E., Learning and Instruction. Prentice Hall, 2003.
[4] Bloom, Benjamin Samuel, and David R. Krathwohl, Taxonomy of Educational Objectives: The Classification of Educational Goals. Handbook I: Cognitive Domain. 1956.
[5] Gorman, M.E., Types of Knowledge and Their Roles in Technology Transfer. The Journal of Technology Transfer, Springer Netherlands, 2002, pp. 219-231.
[6] Al-Shuaily, H., Analyzing the Influence of SQL Teaching and Learning Methods and Approaches. In 10th International Workshop on the Teaching, Learning and Assessment of Databases, London, UK, 2012.


[7] Schlager, M.S. and Ogden, W.C., A cognitive model of database querying: a tool for novice instruction. SIGCHI Bull. 17(4), 1986, pp. 107-113.
[8] Alexander, C., The Timeless Way of Building. Oxford University Press, Oxford, UK, 1979.
[9] Winn, T. and Calder, P., Is This a Pattern? IEEE Software 19(1), 2002, pp. 59-66.
[10] Coad, P. and Mayfield, M., Object model patterns: workshop report. SIGPLAN OOPS Messenger 5(4), 1994, pp. 102-104.
[11] Anderson et al., A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of Educational Objectives (Complete edition). Longman, New York, 2001.
[12] Bower, M., A taxonomy of task types in computing. SIGCSE Bull. 40(3), 2008, pp. 281-285.
[13] van Welie, M., Mullet, K., and McInerney, P., Patterns in practice: a workshop for UI designers. In CHI '02 Extended Abstracts on Human Factors in Computing Systems, ACM, 2002.
[14] Fincher, S. and Utting, I., Pedagogical patterns: their place in the genre. In ACM SIGCSE Bulletin, ACM, 2002.
[15] Bruner, J.S., Toward a Theory of Instruction. Vol. 59. Belknap Press, 1966.
[16] Tian, J., Nakamori, Y., and Wierzbicki, A.P., Knowledge management and knowledge creation in academia: a study based on surveys in a Japanese research university. Journal of Knowledge Management 13(2), 2009, pp. 76-92.
[17] Fincher, S., Patterns for HCI and Cognitive Dimensions: two halves of the same story. In Kuljis, J., Baldwin, L., and Scoble, R. (eds.), Proceedings of the Fourteenth Annual Workshop of the Psychology of Programming Interest Group, 2002.
[18] Ramalingam, V., LaBelle, D., and Wiedenbeck, S., Self-efficacy and mental models in learning to program. SIGCSE Bull. 36(3), 2004, pp. 171-175.
[19] Lahtinen, E., Ala-Mutka, K., and Järvinen, H.M., A study of the difficulties of novice programmers. In ACM SIGCSE Bulletin, ACM, 2005.
[20] Dunbar, K., How scientists think: On-line creativity and conceptual change in science. In Creative Thought: An Investigation of Conceptual Structures and Processes, 1997, pp. 461-493.
[21] Ogden, W.C., Implications of a cognitive model of database query: comparison of a natural language, formal language and direct manipulation interface. ACM SIGCHI Bulletin 18(2), 1986, pp. 51-54.
[22] Reisner, P., Human Factors Studies of Database Query Languages: A Survey and Assessment. ACM Comput. Surv. 13(1), 1981, pp. 13-31.
[23] Nersessian, N.J., How do scientists think? Capturing the dynamics of conceptual change in science. Cognitive Models of Science 15, 1992, pp. 3-44.


Building a General Pattern Framework via Set Theory: Towards a Universal Pattern Approach

Alexander G. Mirnig, Manfred Tscheligi Christian Doppler Laboratory for “Contextual Interfaces” HCI & Usability Unit, ICT&S Center, University of Salzburg Salzburg, Austria Email: {firstname.lastname}@sbg.ac.at

Abstract—Patterns have been successfully employed for capturing knowledge about proven solutions to reoccurring problems in several domains. Despite that, there is still little literature available regarding pattern generation or common pattern quality standards across the various domains. We present an attempt at a universal (i.e., domain independent) pattern framework. Via basic set theory, it is possible to describe pattern sets that are composed of several subsets regarding pattern types, quantities, sequence, and other factors. We can thus describe patterns as sets of interrelated elements instead of isolated entities, thus corresponding with the scientific reality of complex problems with multiple relevant factors. The framework can be used to describe existing pattern languages and serve as a basis for new ones, regardless of the domain they are or were created for.

Keywords-patterns; pattern basics; pattern framework; set theory

I. INTRODUCTION AND MOTIVATION

Patterns have been used as a tool for capturing knowledge about proven solutions to reoccurring problems in many domains. Most prominent among these domains are architecture and software design [1][2][6]. Patterns allow documenting knowledge about methods and practices in a structured and systematic manner. Another major benefit of patterns is that they can serve to "make implicit knowledge explicit" [10], i.e., they can be used to explicitly capture what is normally only acquired via experience after having worked in a certain field or domain for an extended period of time. The information contained in such patterns can then be provided to others (researchers or other interested parties) in a relatively quick and efficient manner. Despite this, there is little general (i.e., domain independent) literature available on patterns and pattern creation.

Having access to a structured collection of implicit and explicit knowledge about research practices is useful when conducting research in any domain. There is no "How to Generate Patterns in 10 Easy Steps" or similar basic literature. This is not an entirely new idea [8], and there has already been a big push in that direction by, e.g., the work of Meszaros and Doble [11] and Winn and Calder [13], which we want to expand and build upon.

Two of the main benefits of patterns are that they facilitate re-application of proven solutions and that they serve to make implicit knowledge explicit. These benefits are of particular importance to researchers who do not already have this knowledge themselves, i.e., it is a way to draw from a vast pool of knowledge. If working with patterns has extensive domain experience as a prerequisite, then those that would need that knowledge the most would benefit the least from it. The final goal of this research is to arrive at a structured but still easy to understand framework that captures the essence of patterns and makes them understandable as well as usable for practitioners and researchers in any domain. The first step is to provide a basic set theoretic analysis that allows us to describe patterns and pattern languages in a general manner. This later on serves as a domain independent basis for reflections on how patterns can or should be created and structured.

We argue for a general strand of research on patterns as a means to capture knowledge about research practices. With such a theoretical basis available, practitioners from any domain could have a pool of knowledge to draw from, which would help them create patterns suitable for their needs. This should not mean that a variety in pattern languages and approaches is not desirable. It makes sense to assume that different domain requirements need different pattern approaches. However, the basics of patterns should ideally be similar for everyone and easily accessible, like with general mathematics. A statistician needs and employs different mathematical means than a fruit vendor, but both draw from the same pool of general mathematics as their basis. In our research, we take a step back, look at patterns from a general point of view, and describe them via basic set theory [5]. A general analysis of patterns allows us to treat them as separate phenomena, independent of the domains they are created and used in. Set theory is one of the most basic, but at the same time very powerful, mathematical tools available. By using set theory, we can ensure consistency of our framework, while still keeping things basic and relatively easy to understand. An additional benefit of our approach is that it permits the creation of pattern sets across different pattern languages that address a similar purpose. This can facilitate the consolidation of already existing knowledge within the various domains. In this paper, we begin with an overview of existing general literature on patterns in Section II, followed by an outline of the initial proposed set theoretic pattern framework in Section III. In Section IV, we present our planned next steps for further iteration and finalization of the framework, and conclude with a few paragraphs on the perceived advantages and possible future challenges of our approach.


II. RELATED WORK

Patterns have been employed in a multitude of application domains [1][6][12] and a good number of extensive pattern collections [3][4][7] have been created in the past. Literature on the pattern generation process itself, sometimes also referred to as pattern mining [4], is still scarce [9]. Existing literature on pattern generation is mostly focused on specific domains [3][6][8][12]. The work of Gamma et al. [4] can be considered important elementary literature, but it is still centered on software design. Although covering a wide spectrum of software design problems, it is arguably of limited applicability outside of the software engineering domain. The same can be said about other specialized pattern generation guidance [8], which would require adaptation to be employed in other domains (e.g., biology or linguistics).

Meszaros and Doble [11] developed a pattern language for pattern writing, which serves to capture techniques and approaches that have been observed to be particularly effective at addressing certain reoccurring problems. Their patterns for patterns were divided into the following five sections: Context-Setting Patterns, Pattern Structuring Patterns, Pattern Naming and Referencing Patterns, Patterns for Making Patterns Understandable, and Pattern Language Structuring Patterns. Another interesting approach, being quite similar in its aims to the one presented in this paper, is the Pattern Language for Pattern Language Structure by Winn and Calder [13]. They identified a common trait among pattern languages (i.e., they are symmetry breaking) and built a rough, nonformal general framework for pattern languages in multiple domains. These ideas are similar in concept to what we pursue in our research. The difference is that we want to provide a purely formal framework with no or only minimal statements regarding its content (such as types or traits). We want to focus on the basics behind patterns and structure these, so that they can be applied as widely as possible, although we intend to incorporate the work of Meszaros and Doble, and Winn and Calder at a later stage (see Section IV).

Another interesting aspect of patterns is that one single pattern is usually not enough to deal with a certain issue. Alexander et al. [2] already expressed this by stating the possibility of making buildings by "stringing together patterns". The pattern itself, however, does not include the information of which other pattern might be relevant in a particular case. This information is only available once the pattern is part of an actual pattern language. Borchers [3] introduced the notion of high level patterns, which reference lower level patterns to describe solutions to large scale design issues. This hierarchy is expressed via references in the patterns themselves, which is a good way of understanding and describing patterns as interconnected entities. A suitable framework for patterns and pattern languages should ideally be able to capture these relations between patterns.

III. THE GENERAL PATTERN FRAMEWORK

A. Patterns and Pattern Sets

Before starting to build the framework, we first need to take a look at patterns, pattern languages, and the concepts behind them. Alexander [2] characterized patterns in the following way: "Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice." So on a basic level, patterns can be understood as a structured assortment of statements. A pattern language is a complete hierarchy of patterns, ordered by their scope [12]. We will translate these concepts into a basic set theoretic structure by employing regular sets, ordered sets, and subsets, via the following example based on a Contextual User Experience (CUX) pattern structure by Krischkowsky et al. [8] (see Tab. 1). Please note that this analysis would work for any pattern language that is or can be structured in a similar way, such as, e.g., the template laid out by Gamma et al. [6], but we wanted to give a more current and not software-centered example to prove our point.

TABLE I. CUX PATTERN STRUCTURE [8] (Section # | Name | Instruction on Each Section)

1 | Name | The name of the pattern should shortly describe the suggestions for design by the pattern (2-3 words would be best).
2 | UX Factor | List the UX factor(s) addressed within your chosen key finding (potential UX factors listed in this section can be, e.g., workload, trust, fun/enjoyment, stress...). Please underpin your chosen UX factor(s) with a definition.
3 | Key Finding | As short as possible; the best would be to describe your key finding (either from an empirical study or findings that are reported in literature) in one sentence.
… | … | …
8 | Keywords | Describe main topics addressed by the pattern in order to enable structured search.
9 | Sources | Origin of the pattern (e.g., literature, other pattern, studies or results).

We now want to generate an actual pattern language set, let us call it CUX Language (and refer to it as CL for brevity's sake), based on the structure outlined in Tab. 1. We can do so by introducing nine subsets (i.e., sets of the set CL) CL1 to CL9, each subset corresponding to one of the nine categories (from Name to Sources, respectively) described above. To actually generate a pattern for CL, we need to assign statements to each of the nine subsets. We do that by assigning a yet undefined set of statements S to CL, making sure that none of the subsets remains empty. Note that 'statement' in this regard not only refers to full sentences, but also to single words or sequences of words which are not full sentences. We can generate an n-number of CL-patterns P1 to Pn this way.


sure that none of the subsets remains empty. Note that 'statement' in this regard not only refers to full sentences, but also to single words or sequences of words which are not full sentences. We can generate n-number of CL-patterns P1 to Pn this way.

Of course, simply arranging patterns into sets and subsets does not in itself guarantee that any of these patterns are actually useful or reasonable. What this analysis can tell us is (a) the pattern language (CL) the patterns are generated in, (b) how many statement categories a successful pattern generated in that language must contain, and (c) which statements can be found in which category, i.e., the patterns themselves. So, this elementary analysis has already yielded a powerful starting framework, via which we can express in a domain-independent manner how patterns and pattern languages stand in relation to each other, regardless of the domain they were generated in.

B. Descriptors

CL does not yet fully qualify as a pattern language in the actual sense of the word, and there are two things that an individual CL-pattern does not tell the reader at this stage. These are (a) which other patterns might be useful or even necessary for a given purpose, and (b) exactly at which point during a given task or activity, and in which order, they will be needed. Without knowing these, one can only guess what else one might need upon being presented with only a single pattern, or depend on prior experience. It would be undesirable, and arguably defeat the purpose of patterns, if extensive meta-knowledge were necessary to be able to use them successfully. This is why we enrich the basic set-theoretic framework with specialized descriptor sets, which serve to understand patterns in context with each other. We shall again illustrate this via a simple example: Assume that we have three patterns, P1 to P3, which would help us in conducting a user study in the car. P1 and P2 are CL-patterns to reduce user distraction, whereas P3 is a pattern about processing the data gained from the study. P3 was created in a different pattern language; let us call that one DL. We can now specify which of these patterns we want or need, and in which order, by introducing an ordered set D. Let us further assume that we want to express that we need only one CL-pattern as well as the DL-pattern, and that the DL-pattern will be needed after the CL-pattern. We can express all of this via the following example descriptor set D1. Please note that angle brackets ('<' and '>') denote an ordered set, as opposed to an ordinary set, which would be denoted by curly brackets ('{' and '}').

D1: <CL¹, DL¹> (1)

Instead, if we need both CL-patterns, we can express this via the following modification to D1:

D2: <CL², DL¹> (2)

Considering the fact that D1 in (1) does not tell us which of the two CL-patterns is needed, we could also specify a pattern directly, if not any of them would do:

D3: <P1, DL¹> (3)

But how do we now specify which of these descriptors (D1-D3) is the appropriate one for a given scenario? Patterns are created for a certain purpose, and in most cases that purpose is how to deal with a certain recurring problem. We can specify which descriptor fits a certain purpose better than another. To properly express this, we introduce the notion of targets T that contain the general purpose of a certain activity (e.g., car user experience). This is different from the problem-field of a pattern, since a given high-level pattern could very well reference a lower-level pattern that addresses a different problem, while both serve the same general purpose. We can now map descriptors to targets, depending on what is needed. In this example, a target that does not require both CL-patterns would be assigned either D1 or D3, whereas one that does would be assigned D2. So, in addition to being able to specify the relations between patterns in a single pattern language, we are not confined to that single pattern language. This means that we can also describe hierarchical pattern sets from different domains and pattern languages in the same framework. By adding one additional layer (targets and descriptors) to what was already available before, we have arrived at a highly modular and flexible pattern framework. Fig. 1 provides an overview of the interrelation of pattern languages, patterns, descriptors, and targets.

[Figure 1. The Pattern Framework – Overview (pattern language, patterns, descriptors, and targets; relations: create from, specify, assign)]

Currently, one problem of this proposed structure is that any set of statements can be made into a set and called a pattern language. This is hardly acceptable, of course, and needs to be rectified. We are currently still working on identifying requirements that any set of statements should fulfill in order to be called a pattern language, and what the respective descriptors should look like. Descriptors are ordered sets themselves, so each of their elements has a clearly defined spot in a concrete sequence. This rigid structure also means that it can be difficult to express that particular patterns could be relevant at any point in the sequence or at several fixed points. But it cannot be assumed that all patterns would only ever be relevant at one very specific point. Even if that were the case, it could similarly not be assumed that these specific points were always known. A refinement of the descriptor sets will be in order, to permit more numerous and less cumbersome expression possibilities with regard to sequences.
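For illustration only (this sketch is not part of the framework described above, and all identifiers are chosen freely), the descriptor sets D1-D3 and their assignment to a target could be modeled as ordered Python structures as follows:

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Pattern:
    name: str        # e.g., "P1"
    language: str    # e.g., "CL" or "DL"

@dataclass
class Descriptor:
    # An ordered set: each entry is either a concrete pattern or a
    # (language, count) requirement such as ("CL", 1) for "any one CL-pattern".
    entries: list

@dataclass
class Target:
    purpose: str                                     # e.g., "car user experience"
    descriptors: list = field(default_factory=list)  # descriptors assigned to this target

P1, P2, P3 = Pattern("P1", "CL"), Pattern("P2", "CL"), Pattern("P3", "DL")

D1 = Descriptor([("CL", 1), ("DL", 1)])  # any one CL-pattern, then the DL-pattern
D2 = Descriptor([("CL", 2), ("DL", 1)])  # both CL-patterns, then the DL-pattern
D3 = Descriptor([P1, ("DL", 1)])         # a specific pattern, then the DL-pattern

car_ux = Target("car user experience", [D1, D3])  # a target not requiring both CL-patterns

for d in car_ux.descriptors:
    print([e.name if isinstance(e, Pattern) else e for e in d.entries])

Because the descriptors are kept outside the patterns, the same Pattern objects can be referenced by any number of descriptors and targets without being modified.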


IV. NEXT STEPS

The framework outlined in this paper is still a work in progress. To arrive at a sufficiently detailed and refined framework, we are currently working on the following:

A. Refine the Framework: Pattern Languages, Descriptors, Targets

We intend to pursue a refinement of pattern language requirements similar to Winn and Calder's approach [13] and introduce symmetry breaking (or a similarly suitable property) as a necessary property of descriptors. We will then research the requirements a descriptor and its subsets need to fulfill in order to acquire that property. A more detailed analysis of pattern languages as hierarchical structures, and how this translates into concrete descriptor sets, is being worked on. Descriptors permit specifying patterns directly, via specifying the language they are part of, or with regard to other additional factors. The type of a pattern could be regarded as one such relevant factor. We will, therefore, incorporate the notion of pattern types into the framework, in particular the pattern types put forward by Meszaros and Doble [11]. We consider these as particularly important in this regard due to their general nature, but domain-specific pattern types (e.g., the three types of software design patterns put forward by Gamma et al. [6]) will also have to be taken into consideration. In addition, we will provide a more concrete structure for targets, with more detailed information on what a target is and the information it should contain.

B. Apply the Framework

Once the framework has been completed, we will demonstrate its suitability by actually generating and describing sample pattern sets from two very different domains (e.g., User Experience (UX) research and Neuroscience).

V. CONCLUSION

The great advantage of the approach described in this paper is that patterns are separate from descriptors, which are themselves separate from the targets. This means that patterns can be generated as usual, with descriptors generated and assigned on an as-needed basis. For the pattern user, this means that they do not have to scour vast databases of patterns for those they might need. All they need is to have a look at the descriptor(s) that is/are assigned to the target they have in mind. Thus, existing pattern databases can be expanded with descriptors, which help make them more usable and reduce the amount of domain experience and previous knowledge required in order to employ patterns successfully.

But even more importantly, descriptors can specify patterns with regard to certain properties, such as pattern language, context, etc. Descriptors function similarly to the references contained in the patterns themselves (as suggested by Borchers [3]), but enable additional or alternative references to other patterns at any time, since they are not actual parts of a pattern. This means that descriptors can be used to describe virtually any pattern set, regardless of which domain(s) its patterns came from or when the pattern was created. Not only is it possible to capture the hierarchical order of existing pattern languages via descriptors, but also to reference patterns from other languages that might fit a certain purpose. This means that the framework is not tied to a single pattern language or even a single domain and permits references to patterns from multiple pattern languages. We therefore consider it a suitable basis for domain-independent pattern research.

ACKNOWLEDGMENT

We gratefully acknowledge the financial support by the Austrian Federal Ministry of Economy, Family and Youth and the National Foundation for Research, Technology and Development (Christian Doppler Laboratory for "Contextual Interfaces").

REFERENCES

[1] C. Alexander, The Timeless Way of Building, New York: Oxford University Press, 1979.
[2] C. Alexander, S. Ishikawa, M. Silverstein, M. Jacobson, I. Fiksdahl-King, and S. Angel, A Pattern Language: Towns, Buildings, Construction, New York: Oxford University Press, 1977.
[3] J. Borchers, A Pattern Approach to Interaction Design, New York: John Wiley & Sons, 2001.
[4] A. Dearden and J. Finlay, "Pattern Languages in HCI: A Critical Review," HCI, vol. 21, January 2006, pp. 49-102.
[5] K. Devlin, The Joy of Sets: Fundamentals of Contemporary Set Theory, 2nd ed., Springer, 1993.
[6] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Boston: Addison-Wesley Professional, 1995.
[7] S. Günther and T. Cleenewerck, "Design principles for internal domain-specific languages: a pattern catalog illustrated by Ruby," Proc. 17th Conf. on Pattern Languages of Programs (PLoP '10), ACM, New York, NY, USA, Article 3, pp. 1-35, DOI=10.1145/2493288.2493291, retrieved: April, 2014.
[8] A. Krischkowsky, D. Wurhofer, N. Perterer, and M. Tscheligi, "Developing Patterns Step-by-Step: A Pattern Generation Guidance for HCI Researchers," Proc. PATTERNS 2013, The Fifth International Conferences on Pervasive Patterns and Applications, ThinkMind Digital Library, Valencia, Spain, May 2013, pp. 66-72.
[9] D. Martin, T. Rodden, M. Rouncefield, I. Sommerville, and S. Viller, "Finding Patterns in the Fieldwork," Proc. Seventh European Conf. on Computer-Supported Cooperative Work, Bonn, Germany, September 2001, pp. 39-58.
[10] D. May and P. Taylor, "Knowledge management with patterns," Commun. ACM, vol. 46, no. 7, July 2003, pp. 94-99, DOI=10.1145/792704.792705, retrieved: April, 2014.
[11] G. Meszaros and J. Doble, "A pattern language for pattern writing," in Pattern Languages of Program Design 3, R. C. Martin, D. Riehle, and F. Buschmann (Eds.), Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, November 1997, pp. 529-574.
[12] J. Tidwell, Designing Interfaces: Patterns for Effective Interaction Design, O'Reilly Media, Inc., 2005.
[13] T. Winn and P. Calder, "A pattern language for pattern language structure," Proc. 2002 Conf. on Pattern Languages of Programs - Volume 13 (CRPIT '02), J. Noble (Ed.), Vol. 13, Australian Computer Society, Inc., Darlinghurst, Australia, June 2003, pp. 45-58.


From Pattern Languages to Solution Implementations

Michael Falkenthal, Johanna Barzen, Uwe Breitenbücher, Christoph Fehling, Frank Leymann
Institute of Architecture of Application Systems, University of Stuttgart, Stuttgart, Germany
{falkenthal, barzen, breitenbuecher, fehling, leymann}@iaas.uni-stuttgart.de

Abstract—Patterns are a well-known and often used concept in the domain of computer science. They document proven solutions to recurring problems in a specific context and in a generic way, so patterns are applicable in a multiplicity of specific use cases. However, since the concept of patterns aims at generalization and abstraction of solution knowledge, it is difficult to apply solutions provided by patterns to specific use cases, as the required knowledge about refinement and the manual effort that has to be spent is immense. Therefore, we introduce the concept of Solution Implementations, which are directly associated to patterns to efficiently support the elaboration of concrete pattern implementations. We show how Solution Implementations can be aggregated to solve problems that require the application of multiple patterns at once. We validate the presented approach in the domains of cloud application architecture and cloud application management and show the feasibility of our approach with a prototype.

Keywords-pattern; pattern languages; pattern-based solution; pattern application; patterns

I. INTRODUCTION

Patterns and pattern languages are a well-established concept in different application areas in computer science and information technology (IT). Originally introduced in the domain of architecture [2], the concept of patterns recently became more and more popular in different domains such as education [18], design engineering [16], cloud application architecture [23], or costumes [17]. Patterns are used to document proven solutions to recurring problems in a specific context. However, since the concept of patterns aims at generalization and abstraction, it is often difficult to apply the captured abstracted knowledge to a concrete problem. This can require immense manual effort and domain-specific knowledge to refine the abstract, conceptual, and high-level solution description of a pattern to an individual use case. The following examples show that this problem occurs in several domains due to the abstraction of solution knowledge into patterns. For example, if a PHP: Hypertext Preprocessor (PHP) developer uses the Gang of Four patterns of Gamma et al. [19], he is faced with the problem that he has to translate the general solution concepts of the patterns to his concrete context, i.e., he has to implement solutions based on a given programming paradigm predefined by PHP. An enterprise architect who has to integrate complex legacy systems may use the enterprise application architecture patterns of Fowler [20] or the enterprise integration patterns of Hohpe and Wolf [12] to gain insight into proven solutions for his problems; but these are still generic solutions and he has to find proper implementations for the systems to integrate. This can lead to huge efforts since, besides the paradigms of the used programming languages, he also has to consider many constraints given by the running systems and technologies. A teacher who uses the learning patterns of Iba and Miyake [18] has to adapt them to match his prevailing school system with all its teaching methods. To give a final example, a costume designer could use the patterns of Schumm et al. [17] to find clothing conventions for a cowboy in a western film, but he still has to come up with a specific solution for his current film.

While patterns in general describe proven generic solutions at a conceptual level, the examples above show that it is still time-consuming to work out concrete solutions from those generic solutions.

To overcome this problem, we suggest that patterns should be linked (i) to the original concrete solutions from which they have been deduced (if available) and (ii) to individual new concrete implementations of the abstractly described solution. This enables users who want to apply a certain pattern to take already existing implementations for their use cases, which eases applying patterns and reduces the required manual effort significantly.

The remainder of this paper is structured as follows: we clarify the difference between the common concept of pattern solutions and concrete solutions in Section II. In Section III, we discuss related work and the lack of directly usable concrete solutions in state-of-the-art pattern research. We show how to keep patterns linked to concrete solution knowledge and how to select them to establish concrete solution building blocks, which can be aggregated, in Section IV. In Section V, we give an example of how to apply the introduced concepts in the domains of cloud application architecture and cloud application management, and we verify the feasibility of the presented approach by means of an implemented prototype in Section VI. We conclude this paper with an outline of future work in Section VII.

II. MOTIVATION

Patterns document proven solution knowledge mainly in natural text to support human readers of a pattern. Patterns are often organized into pattern languages, i.e., they may be connected to each other. Pattern languages provide a common template for documenting all contained patterns.


This template typically defines different items to be documented, such as "Problem", "Context", "Solution", and "Known Uses". The problem and context sections describe the problem to be solved in an abstract manner, while the solution describes the general characteristics of the solution – all only conceptually, in an abstract way. Thus, the general solution is refined for individual problem manifestations and use cases, resulting in different concrete solutions every time the pattern is applied. The known uses section is the only place where concrete solutions from which the pattern has been abstracted are described. But these are commonly not extended as the pattern is applied, nor do they guide pattern readers during the creation of their own solutions. Therefore, due to the abstract nature of patterns and their generalized issues, most pattern languages only contain, in the known uses section, some concrete solutions a pattern was derived from. This leads to the problem that the user of the pattern has to design and implement a specific solution based on his individual and concrete use case, i.e., a solution has to be implemented based on the user's circumstances considering the given pattern. However, many patterns are applied several times to similar use cases. Thus, the effort has to be spent every time for tasks that were already executed multiple times. For example, the Model-View-Controller (MVC) design pattern is an often used pattern in the domain of software design. This pattern was, therefore, implemented for many applications in many programming languages from scratch, as patterns typically provide no directly usable concrete solutions for use cases in a concrete context. Patterns are not linked with a growing list of solutions that could be used as a basis to apply them to individual use cases rapidly: each time a pattern should be applied, it has to be refined manually to the current use case. The provided sections such as "Known Uses" and "Examples", which are part of the pattern structure in most pattern languages, therefore, support the reader in creating new solutions only partially [10][12][13]: they provide only partial solution refinements or solution templates as written text, but not directly applicable implementations that can be used without additional effort. The major reasons for this problem are that the concrete solution is neither documented in a way that enables reusing it efficiently, nor is it obvious how to aggregate existing solutions if multiple patterns are applied together. Thus, the reader of a pattern is faced with the problem of creation and design to elaborate a proper solution based on a given pattern each time it has to be applied – which results in time-consuming efforts that decrease the efficiency of using patterns.

As of today, patterns are typically created by small groups of experts. By abstracting the problems and solutions into patterns relying on their expertise, these experts determine the content of the patterns. This traditional way of pattern identification created the two issues already seen: first, the patterns are not verifiable because the concrete solutions they have been abstracted from are not traceable ("pattern provenance"), and second, the patterns document abstracted knowledge, therefore manual effort and specific knowledge is needed to apply them to concrete problems.

Another problem occurs if multiple patterns have to be combined to create a concrete solution. Pattern languages tackle the problem of aggregating patterns to solve overall problems. As shown by Zdun [9], this can be supported by defining relationships between patterns within a pattern language, which assure that connected patterns match together semantically, i.e., that they are composable regarding their solutions. This means that patterns can be used as composable building blocks to create overall solutions. Once patterns are composed to create overall solutions, the problem arises that the concrete solutions have to be feasible in the context of concrete problem situations. Referring to the previously mentioned example of a PHP developer, the overall concrete solution, consisting of the concrete solutions of the composed patterns, has to be elaborated such that it complies with the constraints defined by the programming language PHP. So, the complexity of creating concrete solutions from composed patterns increases with the number of aggregated solutions, since integration efforts add to the efforts of elaborating each individual solution. Thus, to summarize the discussion above, we need a means to improve the required refinement from a pattern's abstract solution description to directly applicable concrete solutions and their composition.

III. RELATED WORK

Patterns are human-readable artifacts, which combine problem knowledge with generic solution knowledge. The template documenting a pattern contains solution sections presenting solution knowledge as ordinary text [2][19][13]. This kind of solution representation contains the general principle and core of a solution in an abstract way. Common solution sections of patterns do not reflect concrete solution instances of the pattern. They merely act like manuals that support a reader in implementing a solution proper for his issues.

Iterative pattern formulation approaches, as shown by Reiners et al. [6] and Falkenthal et al. [5], enable concrete solution knowledge to be used to formulate patterns. Patterns are not just final artifacts but are formulated based on initial ideas in an iterative process to finally reach the status of a pattern. Nevertheless, in these approaches concrete solution knowledge only supports the formulation process of patterns but is not stored explicitly to be reused when a pattern is applied.

Porter et al. [15] have shown that selecting patterns from a pattern language is a question of the temporal ordering of the selected patterns. They show that combinations and aggregations of patterns rely on the order in which the patterns have to be applied. This leads to so-called pattern sequences, which are partially ordered sets of patterns reflecting the temporal order of pattern application. This approach focuses on the combinability of patterns, but not on the combinability of concrete solutions.


Many pattern collections and pattern languages are stored in digital pattern repositories, such as those presented by Fehling [4], van Heesch [7], and Reiners [3]. Although these repositories support readers in navigating through the patterns, they do not link concrete solutions to the patterns. Therefore, readers have to manually recreate concrete solutions each time they want to apply a pattern.

Zdun [9] shows that pattern languages can be represented as graphs with weighted edges. Patterns are the nodes of the graph and edges are relationships between the patterns. The weights of the edges represent the semantics of the relationships as well as the effects of a pattern on the resulting context of a pattern. These effects are called goals and reflect the influence of a pattern on the quality attributes of software architectures. While this approach helps to select proper pattern sequences from a pattern language, it does not enable finding concrete solutions and connecting them together.

Demirköprü [8] shows that Hoare logic can be applied to patterns and pattern languages such that patterns are enriched by preconditions and postconditions. By considering these conditions, pattern sequences can be connected into aggregates, respectively compositions, of patterns, where the preconditions of the first pattern of the sequence are the preconditions of the aggregate and the postconditions of the last pattern in the sequence are accordingly the postconditions of the aggregate. This approach also only tackles the aggregation of patterns without considering concrete solutions.

Fehling et al. [31][33] show that their structure of cloud computing patterns can be extended to annotate patterns with additional implementation artifacts. Those artifacts can represent instantiations of a pattern on a concrete cloud platform. Considering those annotations, developers can be guided through configurations of runtime environments. Although patterns can be annotated with concrete implementation artifacts, this approach is only described in the domain of cloud computing and does not introduce a means to ease pattern usage and refinement in general.

IV. SOLUTION IMPLEMENTATIONS: BUILDING BLOCKS FOR APPLYING AND AGGREGATING CONCRETE SOLUTIONS FROM PATTERNS

In the section above, we summarized the state of the art and identified that (i) concrete solutions are not connected to patterns and that (ii) there are no approaches dealing with the aggregation of concrete solutions if multiple patterns have to be applied together. Even though there are approaches to derive patterns from concrete solution knowledge iteratively [5][6], concrete solutions are not stored together with the actual patterns, nor are they linked to them. Concrete solutions, thus, cannot be retrieved from patterns without the need to work them out manually over and over again for the same kind of use cases. Therefore, we propose an approach that (i) defines concrete, implemented solution knowledge as reusable building blocks, (ii) links these concrete solutions to patterns, and (iii) enables the composition of concrete solutions.

A. Solution Implementations

We argue that concrete solutions are lost during the pattern writing process, since patterns capture general core solution principles in a technology- and implementation-agnostic way. In addition, applications of patterns to form new concrete solutions are not documented in a way that enables reusing the knowledge of refinement. As a result, the details of the concrete solutions are abstracted away and must be worked out again when a pattern has to be applied to similar use cases. Thus, the benefits of patterns in the form of abstractions lead to effort when using them, due to the missing information about concrete realizations. We suggest keeping concrete solutions linked to patterns in order to ease pattern application and enable implementing new concrete solutions for similar use cases based on existing, already refined, knowledge. These linked solutions can be, for example, (i) the concrete solutions which were considered initially to abstract the knowledge into a pattern, (ii) later applications of the pattern to build new concrete solutions, or (iii) concrete solutions that were explicitly developed to ease applying the pattern.

Concrete solutions, which we call Solution Implementations (SI), are building blocks of concrete solution knowledge. Therefore, Solution Implementations describe concrete solution knowledge that can be reused directly. For example, in the domain of software development, Solution Implementations provide code, which can be used directly in the development of one's own application. For example, a PHP developer faced with the problem of implementing the Gang of Four pattern MVC [19] in an application can reuse a Solution Implementation of the MVC pattern written in PHP code. Especially, patterns may provide multiple different Solution Implementations – each optimized for a special context and requirements. So, there could be a specific MVC Solution Implementation for PHP4 and another for PHP5, each one considering the programming concepts of the specific PHP version. Another Solution Implementation could provide a concrete solution of the MVC pattern implemented in Java. So, in this case also a Java developer could reuse a concrete MVC solution to save implementation efforts.

By connecting Solution Implementations to patterns, users do not have to redesign and recreate each solution every time a pattern is applied. The introduced Solution Implementations provide a powerful means to capture existing fine-grained knowledge linked to the abstract knowledge provided by patterns. So, users can look at the connected Solution Implementations once a pattern is selected and reuse them directly. To distinguish between a pattern's abstract solutions and Solution Implementations, we point out that the solution section of patterns describes the core solution principles in text format, whereas the Solution Implementations represent the real solution objects – which may be in different formats (often depending on the problem domain), e.g., executable code in software development or real clothes in the domain of costumes.


Thus, while patterns are documented commonly in natural text, their Solution Implementations depend mainly on the domain of the pattern language and can occur in various forms. Since many specific Solution Implementations can be linked to a pattern, we need a means to select proper Solution Implementations of the pattern to be applied.

B. Selection of Solution Implementations from Patterns

Once a user selects a pattern, he is faced with the problem of deciding which Solution Implementation solves his problem in his context properly. To enable selecting proper Solution Implementations of a pattern, we introduce Selection Criteria (sc), which determine when to use a certain Solution Implementation. The concept of keeping Solution Implementations linked to the corresponding pattern and supporting the selection of a proper Solution Implementation is shown in Figure 1. Selection Criteria are added to relations between Solution Implementations and patterns. Selection Criteria may be human-readable or software-interpretable descriptions of when to select a Solution Implementation. They provide a means to guide the selection using additional meta-information not present in the Solution Implementation itself.

[Figure 1. Solution Implementations (SI) connected to a pattern (P) are selectable under consideration of defined Selection Criteria (sc).]

To exemplify the concept, we give an example of Solution Implementations from the domain of building architecture. In this domain, addressed by [1][2], a Solution Implementation would be, e.g., a real entrance of a building or a specific room layout of a real floor, which are described in detail and linked to the corresponding pattern [1][2]. To find the most appropriate Solution Implementation for a particular use case, Selection Criteria such as the cost of the architectural Solution Implementation or the choice of used material can be considered. For example, two Solution Implementations for the pattern mentioned above that deals with room layouts might differ in the historical style they are built in or by their functional purpose, like living, industrial, or office, etc. Thus, based on such criteria, the refinement of a pattern's abstract solution can be configured by specifying desired requirements and constraints.

To summarize the concept of Solution Implementations, it has to be pointed out that solutions in the domain of patterns are abstract descriptions that are agnostic to concrete implementations and written in ordinary text to support readers. In contrast to this abstract description, we grasp Solution Implementations as fine-grained artifacts, which provide concrete implementation information for particular use cases of a pattern. Solution Implementations are linked to patterns, where Selection Criteria are added to the relation between the pattern and the Solution Implementation to guide pattern users during the selection of Solution Implementations.

C. Aggregation of Solution Implementations

The concepts of Solution Implementations and Selection Criteria enable the reuse of concrete solutions, which are linked to patterns. But most often, problems have to be solved by combining multiple patterns. Therefore, we also need a means to combine the Solution Implementations of patterns to solve an overall problem altogether. For this purpose, Solution Implementations connected to patterns can have additional interrelations with other Solution Implementations of other patterns affecting their composability. For example, Solution Implementations in the domain of software development are possibly implemented in different programming languages. Therefore, there may exist various Solution Implementations for one pattern in different programming languages; recall the above example of the PHP and Java Solution Implementations of the MVC pattern. To be combined, both Solution Implementations often have to be implemented in the same programming language.

This leads to the research question "How to compose Solution Implementations selected from multiple patterns into a composed Solution Implementation?"

Patterns are often stored and organized in digital pattern repositories. These repositories, such as those presented by Fehling [4], van Heesch [7], and Reiners [3], support users in searching for relevant patterns and navigating through the whole collection of patterns, respectively a pattern language formed by the relations between patterns. To support navigation through pattern languages, these relations can be formulated at the level of patterns, indicating that some patterns can be "combined" into working composite solutions, some patterns are "alternatives", some patterns can only be "applied in the context of" other patterns, etc. Zdun [9] has shown that pattern languages can be formalized to enable automated navigation through pattern languages based upon semantic and quality goal constraints reflecting a pattern's effect once it is applied. This also enables combining multiple patterns based on the defined semantics. The approach supports the reader of a pattern language in selecting proper pattern sequences for solving complex problems that require the application of multiple patterns at once. But, once there are Solution Implementations linked to patterns, this leads to the requirement to not only compose patterns but also their concrete Solution Implementations into overall solutions.
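As a purely illustrative sketch (names and criteria are assumed; this is not the prototype described later in this paper), linking Solution Implementations to a pattern and selecting them via Selection Criteria could look as follows in Python:

from dataclasses import dataclass

@dataclass(frozen=True)
class SolutionImplementation:
    name: str      # e.g., "MVC in PHP5"
    artifact: str  # e.g., a path or URL to the concrete code

@dataclass(frozen=True)
class Link:
    criterion: str              # a Selection Criterion (sc)
    si: SolutionImplementation

@dataclass
class Pattern:
    name: str
    links: list                 # links to Solution Implementations

    def select(self, requirement: str):
        # Return all Solution Implementations whose Selection Criterion matches.
        return [link.si for link in self.links if link.criterion == requirement]

mvc = Pattern("Model-View-Controller", [
    Link("language = PHP4", SolutionImplementation("MVC in PHP4", "mvc_php4/")),
    Link("language = PHP5", SolutionImplementation("MVC in PHP5", "mvc_php5/")),
    Link("language = Java", SolutionImplementation("MVC in Java", "mvc_java/")),
])

print([si.name for si in mvc.select("language = Java")])  # ['MVC in Java']

Here a Selection Criterion is reduced to a simple string match; in practice it may be any human-readable or software-interpretable description, as stated above.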


Figure 2. Aggregating Solution Implementations (SI) along the sequence of selected patterns (P).

We extend the approach of Zdun from solving the problem of selecting appropriate patterns to also selecting and aggregating appropriate Solution Implementations along the selected sequence of patterns.

To assure that Solution Implementations are building blocks composable with each other, we introduce the concept of an Aggregation Operator, as depicted in Figure 2. The Aggregation Operator is the connector between several Solution Implementations. Solution Implementations can only be aggregated if a proper Aggregation Operator implements the necessary adaptations to get two Solution Implementations to work together. Adaptations may be necessary to assure that Solution Implementations match together based on their preconditions and postconditions. Preconditions and postconditions are functional and technical dependencies, which have to be fulfilled for Solution Implementations. In Figure 2, the three patterns P′, P′′, and P′′′ show a sequence of patterns, which can be selected through the approach of Zdun considering the semantics (s) of the relations, the goals (g) of the patterns, and further weights. Solution Implementations are linked with the patterns and can be selected according to the Selection Criteria introduced in the section above. Furthermore, there are two Solution Implementations associated with pattern P′, but only one of them can be aggregated with the Solution Implementation of the succeeding pattern P′′, due to the Aggregation Operator between those two Solution Implementations. There is no Aggregation Operator implemented for the other Solution Implementation of P′, so it cannot be aggregated with the Solution Implementation of P′′; nevertheless, it is a working concrete solution of P′. So, in the scenario depicted in Figure 2, an Aggregation Operator has to be available to aggregate the two Solution Implementations that shall work together.

In general, Aggregation Operators have to be available to compose Solution Implementations for complex problems requiring the application of multiple patterns. Solution Implementations aggregated with such an operator are concrete implementations of the aggregation of the selected patterns. Aggregated Solution Implementations are, therefore, concrete building blocks solving problems addressed by a pattern language.

Aggregation Operators depend on the connected Solution Implementations, i.e., they are context-dependent due to the context of the Solution Implementations. In contrast to the context section of a pattern, which is used together with the problem section to describe the circumstances when a pattern can be applied, the Solution Implementations' context is more specific in terms of the concrete solution. For example, if an Aggregation Operator shall connect two Solution Implementations consisting of concrete PHP code, the operator itself could also be concrete PHP code wrapping functionality from both Solution Implementations. If the Solution Implementations to aggregate are Java class files, e.g., an Aggregation Operator could resolve their dependencies on other class files or libraries and load all dependencies. Afterwards, it could configure the components to properly work together and execute them in a Java runtime. Thus, an Aggregation Operator composes and adapts multiple Solution Implementations considering their contexts. Another example of how Aggregation Operators can be used in very different domains comes from the domain of costumes in films. When dressing the characters of a western movie, usually the sheriff costume pattern and the outlaw costume pattern need to be applied. But there are numerous Solution Implementations of these patterns in terms of concrete sheriff and outlaw costumes, e.g., for different historical time periods. To make sure the costumes of the sheriff and the outlaw match together, an Aggregation Operator, for example, can ensure that certain Solution Implementations originate from the same time period or the same country and can be used together in one movie. Further, the Aggregation Operator adapts Solution Implementations to suit the settings of a scene in a film, e.g., by adapting the color of the costumes. Thus, the costumes' Solution Implementations are aggregated to solve a problem in combination. These examples show that Solution Implementations of patterns from different domains have to be aggregated using specific Aggregation Operators. Since different pattern languages deal with different contexts, they can contain different Aggregation Operators to compose Solution Implementations.
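The following sketch (illustrative only; the condition strings and the way an aggregate is formed are our own simplification) shows how an Aggregation Operator could check that the preconditions of a succeeding Solution Implementation are fulfilled by the postconditions of the preceding one before combining them:

from dataclasses import dataclass

@dataclass(frozen=True)
class SI:
    name: str
    preconditions: frozenset = frozenset()
    postconditions: frozenset = frozenset()

def compatible(first: SI, second: SI) -> bool:
    # The succeeding SI can only follow if its preconditions are fulfilled
    # by the postconditions of the preceding SI.
    return second.preconditions <= first.postconditions

class AggregationOperator:
    def aggregate(self, first: SI, second: SI) -> SI:
        if not compatible(first, second):
            raise ValueError(f"{second.name} cannot be aggregated with {first.name}")
        # A real operator would also adapt and wire the concrete artifacts;
        # here only the conditions of the aggregate are derived.
        return SI(f"{first.name} + {second.name}",
                  preconditions=first.preconditions,
                  postconditions=second.postconditions)

sheriff = SI("sheriff costume (1880s)", postconditions=frozenset({"period: 1880s"}))
outlaw = SI("outlaw costume (1880s)", preconditions=frozenset({"period: 1880s"}))
print(AggregationOperator().aggregate(sheriff, outlaw).name)

A domain-specific operator would additionally adapt the artifacts themselves, e.g., wrap PHP code or wire Java class files, as described above.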


V. VALIDATION

To validate the proposed concept of Solution Implementations, this section explains the application of Solution Implementations in the domains of cloud application architecture and cloud management.

A. Deriving Solution Implementations in the Domain of Cloud Application Architecture

To explain the concept of Solution Implementations in the domain of cloud computing patterns, the example depicted in Figure 3 shows the three patterns stateless component, stateful component, and elastic load balancer from the pattern language and pattern catalogue of Fehling et al. [10][31]. The stateless component and stateful component patterns describe how an application component can handle state information. They both differentiate between session state – the state of the user interaction within the application – and application state – the data handled by the application, for example, customer addresses, etc. While the stateful component pattern describes how this state can be handled by the component itself and possibly be replicated among multiple component instances, the stateless component pattern describes how state information is kept external to the component implementation, to be provided with each user request or to be handled in other data storage offerings. The elastic load balancer pattern describes how application components can be scaled out: their performance is increased or decreased through the addition or removal of component instances, respectively. Decisions on how many component instances are required are made by monitoring the number of synchronous requests to the managed application components. The elastic load balancer pattern is related to both of the other depicted patterns, as it conceptually describes how to scale out stateful components and stateless components: while stateless components can be added and removed rather easily, internal state may have to be extracted from stateful components upon removal or synchronized with new instances upon addition.

As depicted in Figure 3, the stateless component and stateful component patterns both provide Solution Implementations, which implement these patterns for Java web applications packaged in the web archive (WAR) format that are hosted on Amazon Elastic Beanstalk [21], which is part of Amazon Web Services (AWS) [30]. The elastic load balancer has three Solution Implementations implementing the described management functionality for stateful components and stateless components for WAR-based applications on Amazon Elastic Beanstalk and Microsoft Azure [22]. The Selection Criteria "WAR is deployed on Microsoft Azure" respectively "WAR is deployed on Elastic Beanstalk" support the user in choosing the proper Solution Implementation. For example, if SI1.2 is selected, the user knows that this results in a concrete load balancer in the form of a deployed WAR file on Elastic Beanstalk. Since a load balancer scales components, it needs concrete instances of either stateless component or stateful component to work with. Thus, the user can select a proper Solution Implementation for the components based on his concrete requirements, considering the Selection Criteria of the relations between the patterns stateless component and stateful component and their Solution Implementations. To assure that Solution Implementations are composable, i.e., that they properly work together, they refine and enrich the pattern relationships by formulating preconditions and postconditions on the Solution Implementation layer. The preconditions and postconditions of the elastic load balancer Solution Implementations, therefore, capture which related pattern – stateless component or stateful component – they expect to be implemented by managed components.

Figure 3. Solution Implementations in the domain of cloud application architecture linked to patterns and aggregated by Aggregation Operators.


Furthermore, they capture the supported deployment package – WAR in this example – and the runtime environment for which they have been developed: SI3.1 of stateless component has the postcondition "WAR on Elastic Beanstalk", while SI1.2 of elastic load balancer is enriched with the precondition "WAR on Elastic Beanstalk" and SI1.1 with "WAR on Azure". The previously introduced Aggregation Operator interprets these dependencies and, for example, composes SI3.1 and SI1.2. During this task, the configuration parameters of the solutions are adjusted by the operator, i.e., the elastic load balancer is configured with the address of the stateless component to be managed. As some of this information may only become known after the deployment of a component, the configuration may also be handled during the deployment.

In the following, this example is concretely demonstrated by an AWS Cloud Formation template [28] generated by the Aggregation Operator, shown in Listing 1. An AWS Cloud Formation template is a configuration file, readable and processable by the AWS Cloud, to automatically provision and configure cloud resources. For the sake of simplicity, the depicted template in Listing 1 shows only the relevant parts of the template, which are adapted by the Aggregation Operator. To run the example scenario on AWS, three parts are needed within the AWS Cloud Formation template to reflect the aggregation of SI3.1 and SI1.2: (i) an elastic load balancer (MyLB), which is able to scale components, (ii) a launch configuration (MyCfg), which provides configuration parameters about an Amazon Machine Image (AMI) containing the implementation of stateless component as well as a runtime to execute the component in the form of an AWS Elastic Compute Cloud (EC2) [32] instance, and (iii) an autoscaling group (MyAutoscalingGroup) to define scaling parameters used by the elastic load balancer and the wiring of the elastic load balancer and the launch configuration.

"MyLB" : {
  "Type" : "AWS::ElasticLoadBalancing::LoadBalancer",
  "Properties" : {
    "Listeners" : [ {
      "LoadBalancerPort" : "80",
      "InstancePort" : "80",
      "Protocol" : "HTTP"
    } ]
  }
},
"MyCfg" : {
  "Type" : "AWS::AutoScaling::LaunchConfiguration",
  "Properties" : {
    "ImageId" : "ami-statelessComponent",
    "InstanceType" : "m1.large"
  }
},
"MyAutoscalingGroup" : {
  "Type" : "AWS::AutoScaling::AutoScalingGroup",
  "Properties" : {
    …
    "LaunchConfigurationName" : { "Ref" : "MyCfg" },
    "LoadBalancerNames" : [ { "Ref" : "MyLB" } ]
    …
  }
}

Listing 1. Extract from an AWS Cloud Formation template produced by an Aggregation Operator to aggregate the configuration snippets of elastic load balancer and stateless component.

MyLB defines an AWS elastic load balancer for scaling Hypertext Transfer Protocol (HTTP) requests on port 80. Further, MyCfg defines the AMI ami-statelessComponent in the property ImageId, which is used for provisioning new instances by an elastic load balancer. The autoscaling group MyAutoscalingGroup wires the elastic load balancer and the stateless component instances by means of the references in the properties LoadBalancerNames and LaunchConfigurationName to MyLB and MyCfg, respectively. Since all the mentioned properties are in charge of enabling an elastic load balancer instance to automatically scale and load balance instances of components contained in an AMI, an Aggregation Operator can dynamically adapt those properties based on the selected Solution Implementations to be aggregated. So, presuming that ami-statelessComponent contains an implementation of SI3.1, an Aggregation Operator can aggregate SI3.1 and SI1.2 by adapting the mentioned properties and, therefore, provides an executable configuration template for AWS Cloud Formation. The same principles can be applied to aggregate SI1.3 and SI2.1 because of their matching preconditions and postconditions. By adapting the ImageId of the LaunchConfiguration to an AMI, which runs an AWS EC2 instance with a deployed stateful component, the Aggregation Operator can aggregate SI1.3 and SI2.1.

Further, SI1.1 has the precondition "WAR on Azure" and is, therefore, incompatible with SI2.1 and SI3.1, i.e., SI1.1 cannot be combined with these Solution Implementations due to their preconditions and postconditions. The selection of a Solution Implementation, therefore, may restrict the number of matching Solution Implementations of the succeeding pattern, since the postconditions of the first Solution Implementation have to match the preconditions of the second. This way, the space of concrete solutions is reduced based on the resulting constraints of a selected Solution Implementation. To elaborate a solution to an overall problem described by a sequence of patterns, exactly one Solution Implementation has to be selected for each pattern in the sequence, considering its Selection Criteria to match non-functional requirements, as well as the postconditions of the former Solution Implementation.
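As an illustration of how such an operator might perform the adaptation described above (a minimal Python sketch; the file names and the surrounding template structure are assumptions, only the resource and property names follow Listing 1):

import json

def aggregate(template_path: str, ami_id: str, out_path: str) -> None:
    # Load the partial template of the elastic load balancer Solution
    # Implementation (structure as in Listing 1, wrapped in "Resources").
    with open(template_path) as f:
        template = json.load(f)

    resources = template["Resources"]
    # Point the launch configuration at the AMI that packages the selected
    # stateless component Solution Implementation (SI3.1).
    resources["MyCfg"]["Properties"]["ImageId"] = ami_id
    # MyAutoscalingGroup already wires MyLB and MyCfg via "Ref" entries,
    # so no further adaptation is needed for this example.

    with open(out_path, "w") as f:
        json.dump(template, f, indent=2)

aggregate("elastic_load_balancer_si1_2.json",  # assumed input file
          "ami-statelessComponent",            # AMI delivered with SI3.1
          "aggregated_solution.json")

The operator only rewrites the ImageId of MyCfg; the wiring via the Ref entries of MyAutoscalingGroup can stay untouched because it refers to resource names rather than concrete instances.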


Figure 4. Management Planlets are Solution Implementations in the domain of cloud management linked to patterns and aggregated by an Aggregation Operator.

B. Deriving Solution Implementations in the Domain of Cloud Application Management

In this section, we show how the presented approach can be applied in the domain of cloud application management. Therefore, we describe how applying the management patterns introduced in [10][29] to cloud applications can be supported by reusing and aggregating predefined Solution Implementations in the form of executable management workflows.

In the domain of cloud application management, applying the concept of patterns is quite difficult, as the refinement of a pattern's abstract solution to an executable management workflow for a certain use case is a complex challenge: (i) mapping abstract conceptual solutions to concrete technologies, (ii) handling the technical complexity of integrating different heterogeneous management APIs of different providers and technologies, (iii) ensuring non-functional cloud properties, and (iv) the mainly remote execution of management tasks lead to immense technical complexity and effort when refining a pattern in this domain. The presented approach of Solution Implementations makes it possible to provide completely refined solutions in the form of executable management workflows that already consider all these aspects. Thus, if they are linked with the corresponding pattern, they can be selected and executed directly without further adaptations. This reduces the (i) required management knowledge and (ii) manual effort to apply a management pattern significantly. To apply the concept of Solution Implementations to this domain, two issues must be considered: (i) selection and (ii) aggregation of Solution Implementations in the form of management workflows.

To tackle these issues, we employ the concept of management planlets, which was introduced in our former research on cloud application management automation [24]. Management planlets are generic management building blocks in the form of workflows that implement management tasks such as installing a web server, updating an operating system, or creating a database backup. Each planlet exposes its functionality through a formal specification of its effects on components, i.e., its postconditions, and defines optional preconditions that must be fulfilled to execute the planlet. Therefore, each specific precondition of a planlet must be fulfilled by postconditions of other planlets. Thus, planlets can be combined to implement a more sophisticated management task, such as scaling an application. If two or more planlets are combined, the result is a composite management planlet (CMP), which can be recursively combined with other planlets again: the CMP inherits all postconditions of the orchestrated planlets and exposes all their preconditions which are not fulfilled already by the other employed planlets. Thus, management planlets provide a recursive aggregation model to implement management workflows. Based on these characteristics, management planlets are ideally suited to implement management patterns in the form of concrete Solution Implementations. We create Solution Implementations, which implement a pattern's refinement for a certain use case, by orchestrating several management planlets into an overall composite management planlet that implements the required functionality in a modular fashion, as depicted in Figure 4.
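To make the derivation of a composite management planlet's interface concrete, the following sketch (our own simplification, not the planlet framework of [24]) computes the inherited postconditions and the exposed preconditions of a CMP from its orchestrated planlets:

from dataclasses import dataclass

@dataclass(frozen=True)
class Planlet:
    name: str
    preconditions: frozenset
    postconditions: frozenset

def compose(planlets) -> Planlet:
    # The CMP inherits all postconditions of the orchestrated planlets and
    # exposes only those preconditions not already fulfilled internally.
    inherited = frozenset().union(*(p.postconditions for p in planlets))
    exposed = frozenset().union(*(p.preconditions for p in planlets)) - inherited
    return Planlet("CMP(" + ", ".join(p.name for p in planlets) + ")", exposed, inherited)

install_server = Planlet("install web server",
                         frozenset({"VM running"}),
                         frozenset({"web server installed"}))
deploy_war = Planlet("deploy WAR",
                     frozenset({"web server installed"}),
                     frozenset({"application deployed"}))

cmp = compose([install_server, deploy_war])
print(cmp.name, cmp.preconditions, cmp.postconditions)

In this simplified form, a precondition counts as fulfilled as soon as any other employed planlet provides it as a postcondition, mirroring the description above.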


As stated above, selection and aggregation of Solution Implementations must be considered, the latter if multiple patterns are applied together. For example, Figure 4 shows two management patterns: (i) forklift migration [29] – application functionality is migrated while allowing some downtime – and (ii) elasticity management process [10] – application functionality is scaled based on experienced workload. Both patterns are linked to two Solution Implementations each, in the form of composite management planlets. The forklift migration pattern provides two Solution Implementations: one migrates a Java-based application (packaged as a WAR file) to Microsoft Azure [22], another to Amazon Elastic Beanstalk [21]. Thus, if the user selects this pattern and chooses the Selection Criteria defining that a WAR application shall be migrated to Elastic Beanstalk, SI1.2 is selected. Whether this Solution Implementation is applicable at all depends on the context: if the application to be migrated is a WAR application, then the Solution Implementation is appropriate. Similarly to this pattern, the elasticity management process pattern shown in Figure 4 provides two Solution Implementations: one provides executable workflow logic for scaling a WAR application on Elastic Beanstalk (SI2.1). Thus, if these two patterns are applied together, the selection of SI1.2 restricts the possible Solution Implementations of the second pattern, as only SI2.1 is applicable (its preconditions match the postconditions of SI1.2). As a result, the selection of appropriate Solution Implementations can be reduced to the problem of (i) matching Selection Criteria to postconditions of Solution Implementations and (ii) matching preconditions and postconditions of different Solution Implementations to be combined.

After Solution Implementations of different patterns have been selected, the second issue of aggregation has to be tackled to combine multiple Solution Implementations in the form of workflows into an overall management workflow that incorporates all functionalities. Therefore, we implement a single Aggregation Operator for this pattern language as described in the following: to combine multiple Solution Implementations, the operator integrates the corresponding workflows as subworkflows [27]. The control flow, which defines the order of the Solution Implementations, i.e., the subworkflows, is determined based on the patterns' solution path depicted in Figure 2. So, in general, if a pattern is applied before another pattern, also their corresponding Solution Implementations are applied in this order.

VI. SOLUTION IMPLEMENTATIONS PROTOTYPE

To prove the approach's technical feasibility, we implemented a prototype. It consists of two integrated components: (i) a pattern repository and (ii) a workflow generator. The pattern repository aims to capture patterns and their cross-references in a domain-independent way to support working with patterns. Based on semantic wiki technology [11], it enables the capturing, management, and search of patterns. To adapt to different pattern domains, the pattern format is freely configurable. The pattern repository already contains various patterns from different domains, like cloud computing patterns, data patterns, and costume patterns, to demonstrate the genericity of our approach. The cross-references between the patterns enable an easy navigation through the pattern languages. Links like "apply after" or "combined with" support connecting the patterns so that they form a pattern language. The pattern repository does not only contain the patterns and their cross-references but can also be connected to a second repository containing the Solution Implementations of these patterns. Also based on semantic wiki technology, we implemented a Solution Implementation repository for the domain of costume patterns [14]. Here, for example, the concrete costumes of a sheriff occurring in a film can be understood as the Solution Implementation of a sheriff costume pattern. By connecting the pattern to a Solution Implementation as a concrete solution of the abstracted solution of the pattern, the application of the pattern in a certain context is facilitated.

The combination of several concrete Solution Implementations has been prototyped for the domain of cloud management patterns. A workflow generator has been built that is used to combine different management planlets into an overall workflow implementing a solution to a problem that requires the use of multiple patterns. The input for this generator is a partial order of (composite) management planlets, i.e., Solution Implementations that have to be orchestrated into an executable workflow. This partial order is determined by the relations of the combined patterns: if one pattern is applied after another pattern, also their Solution Implementations, i.e., management planlets, have to be executed in this order. The workflow generator creates BPEL workflows, while management planlets are also implemented using BPEL. As BPEL is a standardized workflow language, the resulting management plans are portable across different engines and cloud environments supporting BPEL as workflow language, which is in line with TOSCA [25][26].

VII. CONCLUSION AND FUTURE WORK

In this paper, we introduced the concept of Solution Implementations as concrete instances of a pattern's solution. We showed how patterns and pattern languages can be enriched by Solution Implementations and how this approach can be integrated into a pattern repository. To derive concrete solutions for problems that require the application of several patterns, we proposed a mechanism to compose these solutions from the concrete solutions of the required patterns by means of operators. We concretized the general concept of Solution Implementations in the domain of cloud management by introducing management planlets as examples of Solution Implementations. We verified the approach by means of a prototype of an integrated pattern repository and workflow generator.

Currently, we are extending the implemented repository for solution knowledge in the domain of costume design to capture Solution Implementations more efficiently. This repository integrates patterns and linked Solution Implementations in this domain, and we are going to enlarge the number of costume Solution Implementations. We are also going to tackle the limitation of the presented approach by working not only on sequences of Solution Implementations but also on aggregations of concrete solution instances that are not ordered temporally due to pattern sequences. Since Solution Implementations are composed by Aggregation Operators, we are going to enhance our pattern repositories to also store and manage the Aggregation Operators. Finally, we will investigate Aggregation Operators in domains besides the ones mentioned above, to formulate a general theory of Solution Implementations and Aggregation Operators.


REFERENCES

[1] C. Alexander, "The timeless way of building," Oxford University Press, 1979.
[2] C. Alexander, S. Ishikawa, M. Silverstein, M. Jacobson, I. Fiksdahl-King, and S. Angel, "A pattern language: towns, buildings, constructions," Oxford University Press, 1977.
[3] R. Reiners, Library, http://bridge-pattern-library.fit.fraunhofer.de/pattern-library/, last accessed on 2014.01.30.
[4] C. Fehling, F. Leymann, R. Mietzner, and W. Schupeck, "A collection of patterns for cloud types, cloud service models, and cloud-based application architectures," http://www.cloudcomputingpatterns.org, last accessed on 2014.01.30, University of Stuttgart, Report 2011/05, May 2011.
[5] M. Falkenthal, D. Jugel, A. Zimmermann, R. Reiners, W. Reimann, and M. Pretz, "Maturity assessments of service-oriented enterprise architectures with iterative pattern refinement," Lecture Notes in Informatics - Informatik 2012, September 2012, pp. 1095–1101.
[6] R. Reiners, "A pattern evolution process – from ideas to patterns," Lecture Notes in Informatics – Informatiktage 2012, March 2012, pp. 115–118.
[7] U. van Heesch, Open Pattern Repository, http://www.patternrepository.com, last accessed on 2014.01.30.
[8] M. Demirköprü, "A new cloud data pattern language to support the migration of the data layer to the cloud," in German: "Eine neue Cloud-Data-Pattern-Sprache zur Unterstützung der Migration der Datenschicht in die Cloud," University of Stuttgart, diploma thesis no. 3474, 2013.
[9] U. Zdun, "Systematic pattern selection using pattern language grammars and design space analysis," Software: Practice and Experience, vol. 37, 2007, pp. 983–1016.
[10] C. Fehling, F. Leymann, R. Retter, W. Schupeck, and P. Arbitter, "Cloud computing patterns," Springer, 2014.
[11] N. Fürst, "Semantic wiki for capturing design patterns," in German: "Semantisches Wiki zur Erfassung von Design-Patterns," University of Stuttgart, diploma thesis no. 3527, 2013.
[12] G. Hohpe and B. Woolf, "Enterprise integration patterns: designing, building, and deploying messaging solutions," Addison-Wesley, 2004.
[13] F. Buschmann, R. Meunier, H. Rohnert, P. Sommerlad, and M. Stal, "Pattern-oriented software architecture volume 1: a system of patterns," Wiley, 1996.
[14] D. Kaupp, "Application of semantic wikis for solution documentation and pattern identification," in German: "Verwendung von semantischen Wikis zur Lösungsdokumentation und Musteridentifikation," University of Stuttgart, diploma thesis no. 3406, 2013.
[15] R. Porter, J. O. Coplien, and T. Winn, "Sequences as a basis for pattern language composition," Science of Computer Programming, Special issue on new software composition concepts, vol. 56, April 2005, pp. 231–249.
[16] F. Salustri, "Using pattern languages in design engineering," Proceedings of the International Conference on Engineering Design, August 2005, pp. 248–362.
[17] D. Schumm, J. Barzen, F. Leymann, and L. Ellrich, "A pattern language for costumes in films," Proceedings of the 17th European Conference on Pattern Languages of Programs (EuroPLoP), July 2012, pp. C4-1–C4-30.
[18] T. Iba and T. Miyake, "Learning patterns: a pattern language for creative learners II," Proceedings of the 1st Asian Conference on Pattern Languages of Programs (AsianPLoP 2010), March 2010, pp. I-41–I-58.
[19] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, "Design patterns: elements of reusable object-oriented software," Addison-Wesley, 1995.
[20] M. Fowler, "Patterns of enterprise application architecture," Addison-Wesley, 2003.
[21] Amazon, Elastic Beanstalk, http://www.amazon.com/elasticbeanstalk, last accessed on 2014.01.30.
[22] Microsoft, Microsoft Azure, http://www.windowsazure.com, last accessed on 2014.01.30.
[23] C. Fehling, F. Leymann, R. Retter, D. Schumm, and W. Schupeck, "An architectural pattern language of cloud-based applications," Proceedings of the 18th Conference on Pattern Languages of Programs (PLoP), October 2011, pp. A-20–A-30.
[24] U. Breitenbücher, T. Binz, O. Kopp, and F. Leymann, "Pattern-based runtime management of composite cloud applications," Proceedings of the 3rd International Conference on Cloud Computing and Service Science (CLOSER), May 2013, pp. 475–482.
[25] OASIS, Topology and Orchestration Specification for Cloud Applications Version 1.0, http://docs.oasis-open.org/tosca/TOSCA/v1.0/os/TOSCA-v1.0-os.html, last accessed on 2014.01.30.
[26] T. Binz, U. Breitenbücher, O. Kopp, and F. Leymann, "TOSCA: portable automated deployment and management of cloud applications," in Advanced Web Services, A. Bouguettaya, Q. Z. Sheng, and F. Daniel, Eds., Springer, 2014, pp. 527–549.
[27] O. Kopp, H. Eberle, and F. Leymann, "The subprocess spectrum," Proceedings of the 3rd Business Process and Services Computing Conference (BPSC), September 2010, pp. 267–279.
[28] Amazon, AWS Cloud Formation, http://aws.amazon.com/cloudformation/, last accessed on 2014.01.30.
[29] C. Fehling, F. Leymann, S. T. Ruehl, M. Rudek, and S. Verclas, "Service migration patterns – decision support and best practices for the migration of existing service-based applications to cloud environments," Proceedings of the IEEE International Conference on Service Oriented Computing and Applications (SOCA), December 2013, in press.
[30] Amazon, Amazon Web Services, http://aws.amazon.com, last accessed on 2014.01.30.
[31] C. Fehling, F. Leymann, J. Rütschlin, and D. Schumm, "Pattern-based development and management of cloud applications," Future Internet, vol. 4, 2012, pp. 110–141.
[32] Amazon, AWS EC2, http://aws.amazon.com/de/ec2/, last accessed on 2014.04.10.
[33] C. Fehling, F. Leymann, R. Retter, D. Schumm, and W. Schupeck, "An architectural pattern language of cloud-based applications," Proceedings of the 18th Conference on Pattern Languages of Programs (PLoP), October 2011, pp. A-20–A-21.


Reconfiguration Patterns for Goal-Oriented Monitoring Adaptation

Antoine Toueir, Julien Broisin, Michelle Sibilla
IRIT, Université Paul Sabatier, Toulouse, France
Email: {toueir,broisin,sibilla}@irit.fr

Abstract—This paper argues that autonomic systems need to make their distributed monitoring adaptive in order to improve their "comprehensive" resulting quality; that means both the Quality of Service (QoS) and the Quality of Information (QoI). In a previous work, we proposed a methodology to design monitoring adaptation based on high level objectives related to the management of quality requirements. One of the advantages of adopting a methodological approach is that monitoring reconfiguration will be conducted through a consistent adaptation logic. However, eliciting the appropriate quality goals remains an area to be investigated. In this paper, we tackle this issue by proposing some monitoring adaptation patterns falling into reconfiguration dimensions. Those patterns aim at facilitating the adaptation design of the monitoring behavior of the whole set of distributed monitoring modules that are part of autonomic systems. The utility of those patterns is illustrated through a case-study dealing with monitoring adaptation based on high level quality objectives.

Keywords–Quality requirements; adaptive monitoring; autonomic systems; goal-oriented adaptation.

I. INTRODUCTION

Autonomic systems, which are implemented by virtue of their four characteristics self-configuration, self-optimization, self-healing and self-protection, serve the ultimate objective of making them self-managed to achieve high level objectives [1]. These objectives are strongly related to the quality level provided by autonomic systems. When large and complex systems are targeted, the self-management characteristic (self-*) is a key issue to deal with. Basically, self-management is thought of as the auto-adaptation capability that brings the system to reach an absolute or preferable state. Concretely, the four self-* characteristics are realized by implementing the Monitoring, Analyzing, Planning, Executing - Knowledge (MAPE-K) loop modules. This implementation is either embedded within a resource, or distributed over several resources.

In the MAPE-K loop, the monitoring module plays a crucial role, since otherwise wrong decisions might be taken by the analyzing & planning modules. Therefore, autonomic systems need to ensure the quality of information (e.g., correctness, freshness, timeliness, accuracy, etc.) exposed by the distributed monitoring modules. Moreover, within autonomic systems, monitoring is usually QoS-oriented. Thus, the services implemented by the functional system must respect the required QoS level that is determined through the Service Level Agreements (SLAs) that have been agreed with clients. Since the management system (managing the functional system) could provide the possibility to renegotiate or modify the QoS specification afterward, the monitoring system has to adapt its behavior according to these new requirements and constraints.

To summarize, the monitoring of autonomic systems has to be capable of configuring the underlying mechanisms carrying the monitoring functions (e.g., measuring, gathering, calculating, evaluating, filtering, etc.) starting from the QoS specification, as well as reconfiguring those mechanisms based on quality requirements.

Most of the time, reconfiguration is handled through ad-hoc logic (proposing solutions for particular scenarios dealing with specific issues). This approach, however, is not suitable for reuse in other scenarios, and it also does not satisfy high level objectives extended over the whole scale of the autonomic system. To overcome these issues, we adopted a Requirements Engineering methodology to design monitoring adaptation; it starts from high level goals, and ends up with the configuration of monitoring mechanisms [2].

Right now, the key question is: how to identify the goals representing the "starting point" for deriving monitoring (re)configuration? In other words, reconfiguration questions such as: why delay launching some monitoring mechanisms? Why substitute remote agents? How to aggregate alarms? What determines the monitoring of this set of metrics and not another one? Why exchange metrics among distributed management entities? This paper deals with these questions by proposing monitoring adaptation patterns that assist human administrators in designing meaningful adaptations and thus increase the overall quality of the autonomic systems. The work presented here relies on both a 3-layered adaptive monitoring framework [3][4][5] and our goal-oriented adaptation methodology [2]. We pursue this research by focusing on adaptation patterns dedicated to the identification of high level goals, together with their refinement. The paper is organized as follows: the next section gives an overview of the studied monitoring framework; the monitoring adaptation patterns are discussed in Section III, and then applied to a case-study in Section IV; before concluding, Section V enumerates other monitoring adaptation approaches and points out their weaknesses.

II. THE STUDIED ADAPTIVE MONITORING FRAMEWORK

Our approach is based on a 3-layered framework [3][4][5] illustrated in Figure 1, which defines three fundamental capabilities required to control monitoring: being configurable, adaptable and governable.


TABLE I: Patterns Refining Achieve Goals (P ⇒ ♦Q)

Pattern    | Subgoal 1       | Subgoal 2                  | Subgoal 3
Milestone  | P ⇒ ♦R          | R ⇒ ♦Q                     |
Case       | P ∧ P1 ⇒ ♦Q1    | P ∧ P2 ⇒ ♦Q2  (P1 ∨ P2)    | Q1 ∨ Q2 ⇒ Q
Guard      | P ∧ ¬R ⇒ ♦R     | P ∧ R ⇒ ♦Q                 | P ⇒ P W Q
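To illustrate how the rows of Table I are read (together with the description in Section III below), the following is a hedged instantiation of the milestone pattern for a simplified monitoring goal; the predicates OverloadDetected, ManagerDeployed and PerimeterDelegated are invented for the example and do not appear in the paper.

```latex
% Hypothetical instantiation of the milestone row of Table I.
% Parent Achieve goal: whenever an overload is detected, the monitoring
% perimeter is eventually delegated to another manager.
\begin{align*}
  \text{Goal:}      \quad & \mathit{OverloadDetected} \Rightarrow \Diamond\, \mathit{PerimeterDelegated} \\
  \text{Subgoal 1:} \quad & \mathit{OverloadDetected} \Rightarrow \Diamond\, \mathit{ManagerDeployed} \\
  \text{Subgoal 2:} \quad & \mathit{ManagerDeployed}  \Rightarrow \Diamond\, \mathit{PerimeterDelegated}
\end{align*}
```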

Figure 1: Adaptation Methodology & Monitoring Framework

The configurability layer relies on the Distributed Management Task Force (DMTF) Common Information Model (CIM) standard. In addition to the managed resources, this low level layer aims at representing the metrics [6] and the gathering mechanisms [3] that are required to monitor both the QoS provided by the functional system and the QoI of the monitoring system itself; this layer deals with both mechanisms. The adaptability layer provides an interface encapsulating operations to be applied on the lower layer models. Thus, the behavior of the underlying monitoring mechanisms will be reconfigured during runtime by invoking these operations. Finally, the governability layer is the top level layer representing the "intelligence" of the monitoring adaptation. To express the quality requirements, it uses Event/Condition/Action (ECA) policies to describe when and how adaptation should take place, that is, in which contexts those operations of the adaptability layer should be invoked.

We exploit Requirements Engineering (RE) to propose the monitoring adaptation methodology, and to build the configurability and adaptability models [2]. RE iterates activities of "eliciting, evaluating, documenting, consolidating and changing the objectives, functionalities, assumptions, qualities and constraints that the system-to-be should meet based on the opportunities and capabilities provided by new technologies" [7]. Keep All Objectives Satisfied (KAOS) is adopted as the RE goal-oriented method, due to its formal assertion layer that proves correctness and completeness of goals [8]. In KAOS, the system-to-be is divided into various models. The Goals Model determines the objectives to be realized through that system (e.g., minimizing monitoring cost). The Agents Model comprises the agents (e.g., human or automated components) responsible for realizing the refined elicited goals. Notice that the term Agent in networks and systems management represents entities responding to management requests coming from other management entities called Managers; therefore, the term Agent in RE has a different meaning. The Operations Model deals with the internal operations to be carried out by agents (e.g., updating a polling period). The Object Model identifies the system-to-be objects (e.g., entities, agents, relationships).

Therefore, based on KAOS, our methodology identifies the high level quality objectives the monitoring framework has to carry out. By iterating a refinement process, we finally identify what are called leaf goals or requirements (see Figure 1). Once the leaf goals are determined, both policies (to be inserted into the governability layer) and agents (invoking operations of the adaptability layer) will be recognized. Thus, monitoring system adaptation is automatically handled. However, human administrators have to manually identify the leaf goals according to the high level objectives they want to reach. To facilitate this task, we conducted an investigation about the monitoring aspects that could be subject to adaptation. As a result, we have identified various leaf goals belonging to four dimensions (i.e., Spatial, Metric, Temporal, Exchange) [2]. Hereafter, in the next section, we propose monitoring adaptation patterns falling into those dimensions.

III. MONITORING ADAPTATION PATTERNS

With regard to the refinement process, besides the basic AND/OR-decompositions, we rely on some predetermined correct and complete refinement patterns proved mathematically [9]. Those patterns refine Achieve goals of the form P ⇒ ♦Q (see Table I), and are written with the classical Linear Temporal Logic (LTL) operators, where ♦, □ and W mean "some time in the future", "always in the future", and "always in the future unless", respectively. Starting from a given goal (P), the milestone pattern identifies one or more intermediate goals (R, ...) that must be reached in order before reaching the ultimate one (Q). The case pattern identifies the set of different and complete cases (P1, P2) for reaching final goals (Q1, Q2) that OR-decompose the ultimate goal (Q). Finally, the guard pattern requires the recognition of a specific condition (R) before achieving the ultimate goal (Q).

In order to clarify the exploitation contexts, pattern goals and requirements, as well as some application situations, our pattern structure encompasses: context, pattern refinement, and examples. Notice that we are focusing on adaptation actions taken at the autonomic manager side only; investigating adaptations at the agent side is out of scope. In addition, the patterns are refined using the KAOS graphical language [7].

A. Exchange Dimension Pattern

Context. Relying on the IBM blueprint reference architecture [10], autonomic systems could distribute self-management (MAPE) loops over multiple collaborating autonomic managers. Each of them is responsible for managing a particular scope of managed resources. Patterns belonging to this dimension are useful to overcome metrics gathering/delivering problems. Those problems could manifest either on metrics values, on the reliability of communication between information sources & destinations, or even on their trustworthiness.
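Before detailing the individual dimensions, the sketch below illustrates how an Event/Condition/Action policy of the governability layer (Section II) could trigger an adaptability-layer operation; the Python structures and the update_polling_period operation are assumptions for illustration, not the framework's actual API.

```python
# Hypothetical Event/Condition/Action (ECA) policy, as used by the
# governability layer to decide when adaptability-layer operations are
# invoked. The operation and event names are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Dict, Any, List

@dataclass
class EcaPolicy:
    event: str                                   # e.g., "SLA_ENFORCED"
    condition: Callable[[Dict[str, Any]], bool]  # evaluated on the event context
    action: Callable[[Dict[str, Any]], None]     # adaptability-layer invocation

def handle_event(policies: List[EcaPolicy], event_name: str, context: Dict[str, Any]):
    for policy in policies:
        if policy.event == event_name and policy.condition(context):
            policy.action(context)

# Example: when a new SLA is enforced, start polling with the lowest
# freshness value of the SLA's freshness range.
policy = EcaPolicy(
    event="SLA_ENFORCED",
    condition=lambda ctx: ctx["freshness_range"][0] <= 3,
    action=lambda ctx: ctx["adaptability"].update_polling_period(
        metric=ctx["metric"], period=ctx["freshness_range"][0]),
)
```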


Figure 2: Exchange Dimension Pattern

Figure 3: Metric Dimension Pattern

Pattern Refinement. Communications inside an autonomic system could be classified according to the entities involved in the information exchange (i.e., managers, agents, shared databases). Therefore, we identify three communication classes: Manager-2-Agent, Manager-2-Manager, and Manager-2-Shared Database (see Figure 1). Besides identifying communication classes, we need to deal with the pull & push communication modes. In pull, the entity needing information solicits the one possessing it, which responds with the queried information; whereas in push, the entity possessing the information reports it to other entities. By taking into consideration push and pull modes, along with the previous communication classes, we use the case pattern for the first two refinement levels to cover all possible cases (see Figure 2).

Based on the triplet ⟨Information Source, Communication Protocol, Information Destination⟩, the Manager-2-Agent pull mode will be OR-decomposed into the Substitute Agent and Substitute Protocol leaf goals. In turn, Substitute Protocol and Substitute Destination OR-decompose both Manager-2-(Manager/Shared DB) push modes. Besides, Activate/Deactivate Polling & Exporting leaf goals are elicited to launch and stop polling & exporting.

Notice that in both Manager-2-(Manager/Shared DB) pull mode communications, the manager responding to requests is considered as an agent (because it is the information source); therefore, this case becomes identical to the Manager-2-Agent pull mode. Moreover, adaptation actions related to the Manager-2-Agent push mode are not treated because they need to be held at the agent side.

Examples. This pattern is suitable for the following cases: (1) Increasing accuracy or precision of pulled/pushed metrics values, by replacing the information source. (2) Querying more available agents, or blocking fake agents trying to integrate the distributed management system. (3) Securing the communication between information sources and destinations. (4) Modifying the information destination when changing the topology of collaborating autonomic managers.

B. Metric Dimension Pattern

Context. The main idea behind building autonomic systems is to delegate decisions, that human administrators are used to making, to the autonomic systems themselves. Thus, to be able to make "wise" decisions, the monitoring system needs to instrument specific metrics that could be activated/deactivated according to the management needs during runtime. Patterns belonging to this dimension are useful to control the trade-off between constructing more knowledge and monitoring the information that is necessary for management.

Pattern Refinement. Metric instrumentation must be considered at the whole management system level. In other words, a given autonomic manager could activate/deactivate the instrumentation of particular metrics, but deactivating metrics on that manager does not necessarily mean that those metrics are "abandoned", because they could be transferred to another collaborating autonomic manager on which they are activated. These two cases OR-decompose the first refinement level (see Figure 3).

Regarding metrics manipulation inside an autonomic manager, the second refinement level uses the case pattern to cover metric classes. Our research exploits both the CIM Metric Model, classifying metrics into Base, Discrete & Aggregation, as well as our mathematical extension [6], classifying base metrics into Resource, Measurable & Mathematical. Each of these classes is OR-decomposed using Add Aspects and Remove Aspects leaf goals. On the other hand, the transfer of metrics among autonomic managers could be refined through the milestone pattern, when metrics are first activated on the collaborating manager (Add Aspects in Figure 3, as Subgoal 1 in Table I) and then removed from the delegating one (Remove Aspects, as Subgoal 2).

It is worth noting that the previously mentioned aspects represent "metric definitions" rather than "metric values". The former encompasses attributes related to the nature of the metric (e.g., data type, unit), whereas the latter attributes describe the instrumented values and their relevant contexts. For further information, the reader is referred to the DMTF Base Metric Profile [11].

Examples. This pattern can be applied in the following cases: (1) Performing troubleshooting, or applying root cause analysis algorithms, because they require the instrumentation of additional metrics. (2) Modifying the hierarchical topology of the management system by instrumenting aggregated metrics to be exported to other managers or shared DBs. (3) "Engineering" the distribution of monitored metrics among autonomic managers.
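As a rough illustration of the Add Aspects / Remove Aspects leaf goals and their milestone ordering, consider the following sketch; the MetricDefinition structure and the manager interface are hypothetical and only mirror the vocabulary used above.

```python
# Hypothetical illustration of the Metric dimension leaf goals: a metric
# definition (not its values) is first activated on the collaborating
# manager (Add Aspects, Subgoal 1) and only then removed from the
# delegating one (Remove Aspects, Subgoal 2), following the milestone order.
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricDefinition:
    name: str          # e.g., "PhysicalServerTemperature"
    metric_class: str  # "Resource", "Measurable", "Mathematical", ...
    unit: str
    data_type: str

class Manager:
    def __init__(self, name: str):
        self.name = name
        self.active_metrics = set()

    def add_aspects(self, definition: MetricDefinition):
        self.active_metrics.add(definition)

    def remove_aspects(self, definition: MetricDefinition):
        self.active_metrics.discard(definition)

def transfer_metric(definition: MetricDefinition,
                    delegating: Manager, collaborating: Manager):
    collaborating.add_aspects(definition)   # Subgoal 1: Add Aspects
    delegating.remove_aspects(definition)   # Subgoal 2: Remove Aspects
```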


Figure 4: Spatial Dimension Pattern

C. Spatial Dimension Pattern

Context. As mentioned earlier, in an autonomic system each manager is responsible for managing a set of managed resources. In many cases, the number of users consuming the autonomic system services may oscillate rapidly, or even become quite important in terms of size. Thus, managed resources are subject to being joined/withdrawn during runtime. Patterns belonging to this dimension are useful to react to important changes concerning the scope of managed resources.

Pattern Refinement. Management of autonomic systems is orchestrated by the collaboration of multiple autonomic managers, each of which can act on its own perimeter, as well as on the perimeters of its collaborating peers. Thus, the first refinement level uses the case pattern to cover these two cases (see Figure 4).

In fact, acting on its own perimeter is OR-decomposed using the Expand and Shrink Monitoring Perimeter leaf goals. In turn, acting on other perimeters is refined using the case pattern into deploying a new manager, or soliciting an existing one. First, the case of deploying a new manager is refined using the milestone pattern into launching the manager (Launch Delegated Manager in Figure 4, as Subgoal 1 in Table I) and then delegating the perimeter (Delegation, as Subgoal 2). In turn, the delegation goal is also refined through the milestone pattern into joining the delegated perimeter on the delegated manager (Expand Perimeter, as Subgoal 1), and then deleting this perimeter from the delegating manager (Shrink Perimeter, as Subgoal 2).

In the second case, where acting is held on an existing manager, the refinement is done twice, first using the milestone pattern, into delegating the whole perimeter to the delegated manager (Delegation, as Subgoal 1), and then shutting down the delegating one (Shutdown Delegating Manager, as Subgoal 2).

Examples. This pattern is suitable for the following cases: (1) Load balancing of monitoring among autonomic managers. (2) Supporting scalability of the autonomic systems. (3) Minimizing the overall monitoring charge in terms of dedicated monitoring entities.

D. Temporal Dimension Pattern

Context. Temporal aspects are decisive factors in adapting monitoring behavior. Notice that the previous patterns are explained without time considerations, but in fact they imply some temporal aspects. Patterns belonging to this dimension are useful either to overcome temporal violations and problems, or to tune the analysis over the instrumented metrics.

Pattern Refinement. Regarding information exchange, once again we use the case pattern to represent the same cases identified in the Exchange dimension. Obviously, dealing with the temporal aspects of information exchange means that the exchange is done iteratively and not once. Thus, the Manager-2-Agent case is OR-decomposed into periodic poll, and both Manager-2-(Manager/Shared DB) cases are OR-decomposed into periodic export (see Figure 5). Note that the Manager-2-Agent push mode and the Manager-2-(Manager/Shared DB) pull mode are not mentioned for the reasons explained in Section III-A.

We distinguish two levels of temporal granularity: the fine-grained level deals with an individual polling (exporting), whereas the coarse-grained level addresses a collective polling (exporting). Based on this distinction, we identify six leaf goals OR-decomposing periodic poll (export), namely: Update Polling (Exporting) Period to update the frequency of a given polling (exporting), Align Polling (Exporting) to launch a set of synchronized parallel pollings (exported metrics) at the same time, and Misalign Polling (Exporting) to launch pollings (exported metrics) according to a given/adjustable offset.

Regarding metrics calculation, we identify the case of modifying the temporal interval covered by the metric value. However, the validity of a metric value that is not instantaneous (e.g., throughput) is equal to the temporal interval over which that value was measured. Therefore, the case pattern is used twice to cover all the metric classes previously mentioned. We refine only the measurable, mathematical & aggregation metrics, because time has a sense in their calculation, but not the other metrics. Thus, at the fourth refinement level, we OR-decompose measurable & mathematical metrics using Update Time Scope Interval, while Update Time Series Interval OR-decomposes aggregation metrics.

Examples. This pattern is suitable for the following cases: (1) Controlling (e.g., relaxing, stressing) the monitoring load on autonomic managers, on network paths among autonomic managers and shared DBs, as well as on remote agents. (2) Tuning temporal parameters of metrics analysis.

Notice that all the previous patterns are subject to being updated and enriched, in order to integrate new monitoring adaptation actions. For instance, we can address temporal aspects of alarm filtering by delaying the delivery of redundant alarms [12], as well as alarm correlation by modifying the stream interval time during which Complex Event Processing engines (e.g., Esper & Drools) perform correlations. OR-decomposition into Update Waiting Time & Update Window Time could be used for these two cases, respectively.
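A minimal sketch of the polling-related temporal leaf goals is given below, assuming a simple scheduler model in which each polling task has a period and an offset; this model is an assumption made for illustration only.

```python
# Hypothetical scheduler view of the temporal leaf goals: each polling
# (or exporting) task has a period and an offset; the leaf goals update
# the period, align offsets to a common instant, or misalign them by a
# given offset step.
class PollingTask:
    def __init__(self, metric: str, period: float, offset: float = 0.0):
        self.metric = metric
        self.period = period    # seconds between two pollings
        self.offset = offset    # start offset within the period

def update_polling_period(task: PollingTask, new_period: float):
    task.period = new_period                      # Update Polling Period

def align_polling(tasks):
    for task in tasks:                            # Align Polling: launch the
        task.offset = 0.0                         # pollings at the same time

def misalign_polling(tasks, step: float):
    for i, task in enumerate(tasks):              # Misalign Polling: spread the
        task.offset = (i * step) % task.period    # pollings by a fixed offset
```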


Figure 5: Temporal Dimension Pattern

IV. CASE-STUDY

Context. Our scenario takes place in a cloud data center hosting a large number of virtual machines (VMs) and providing its clients with a continuous monitoring of the enforced SLA metrics. Each VM integrates an agent providing predetermined metrics reflecting VM healthiness. In most large scale systems, distributed agents periodically push their metrics; in our case, agents push those metrics every 10 seconds to specific pre-configured autonomic managers. To facilitate the case-study, we assume that our studied SLA template encapsulates the same metrics pushed by the agents. Besides, this SLA template distinguishes two time-slots: metrics are to be refreshed at the client side with a freshness falling into the range of 3-6 seconds during the first time-slot, and a range of 30-40 seconds for the second one. The SLA metric values are instrumented and delivered automatically through polling and exporting, respectively. Once a new SLA is enforced, the autonomic managers use pull mode to collect VM metrics with the lowest freshness value (3 seconds).

Objectives. Human administrators identify two high level goals to be satisfied during the monitoring system runtime: Respect Metrics Freshness makes sure that SLAs are monitored appropriately, and Minimize Monitoring Cost aims at limiting the resources dedicated to monitoring as much as possible.

Patterns. We can exploit several patterns to deal with the first objective. During the first time-slot, we use the temporal pattern to relax polling & exporting by updating their periods (Update Polling & Exporting Period in Figure 6) with respect to the highest freshness of the range (6 seconds). If the delivered freshness violates the highest freshness, that would be a result of an overloaded manager [2]; thus we apply the spatial pattern as a second alternative, and consequently a new autonomic manager will be deployed to assist the overloaded one (Launch Delegated Manager, Expand Perimeter & Shrink Perimeter). As a third alternative, in case the overloaded autonomic manager monitors non-SLA metrics (e.g., physical server healthiness), the metric pattern could be applied to transfer them to another manager, in order to relax the first one (Add & Remove Aspects). Since the second time-slot freshness (30-40 seconds) is greater than the agents' push period (10 seconds), there is no need to poll metrics, nor to export all received metrics. Rather, we apply the temporal pattern to update the exporting period from 3-4 to 30-40 seconds (Update Exporting Period). This adaptation necessitates applying another one belonging to the exchange pattern to stop the pollings that are launched during the first time-slot (Deactivate Polling).

The second objective is refined using the spatial pattern in order to shutdown recently deployed managers during the first time-slot. Thus, an underloaded manager delegates its whole perimeter to another one, and shuts itself down (Expand Perimeter, Shrink Perimeter & Shutdown Delegating Manager). During the second time-slot, autonomic managers already deliver to clients around one-third of the metrics pushed by agents, thus no adaptation actions are to be taken with regard to minimizing monitoring resources.

Autonomic managers would be able to adapt their monitoring if they recognize adaptation stimuli. Therefore, we exploit the guard pattern to apply adaptation actions (Adaptation in Figure 6, as Subgoal 2 in Table I) as a response to a specific stimulus (Guard, as Subgoal 1), while maintaining the current monitoring behavior unless adaptation takes place (Unless, as Subgoal 3).

V. RELATED WORK

In this section, we try to align our approach of adapting monitoring using goal-oriented dimensional patterns with other existing trends focusing on monitoring of QoS in autonomic systems [13][14][15][16][17][18].

In order to manage QoS in autonomic systems, the latter apply adaptation actions. In many cases, for instance [13], this adaptation does not concern the monitoring system itself, but rather is applied on the managed system services and infrastructure (i.e., reconfiguring resource allocation). Certainly, this adaptation will result in increased quality, but this way, the knowledge of the management system won't exceed a "maximum ceiling" and management will be limited in terms of treating new situations.

Monitoring more metrics or managed resources is addressed in [14][16][18] either to deal with managed scope changes, or to operate a "minimal" monitoring that is able to be extended in case of SLA violations, or even to adapt monitoring to meet SLA modifications. Indeed, it is important to scale up/down monitored metrics and resources. But it isn't clear whether this capability could be applied in other scenarios for other objectives, and if so, how that could be feasible.

Runtime deployment of monitoring resources (i.e., managers, probes) is discussed in [14][15][17] either to integrate monitoring into the SLA management life-cycle of large scale systems, or to replace failed managers, or even to monitor some metrics concerning particular paths or segments. But here also, besides the undeniable gains of deploying monitoring resources during runtime, we don't see how the system administrators can orchestrate the monitoring adaptation (i.e., planning & executing) of the distributed monitoring among several collaborating managers.

Inspired from the autonomic computing reference architecture proposed in [10], patterns regarding the distribution of the MAPE loop modules were proposed in [19][20]. Those patterns are useful in terms of design reuse as well as clarifying



Figure 6: Respect Metrics Freshness & Minimize Monitoring Cost Refinement

the application contexts and benefits, but they mainly target the deployment of the monitoring modules rather than the monitoring behavior itself. In addition, they do not treat the monitoring adaptation with regard to quality requirements.

VI. CONCLUSION & PERSPECTIVES

We proposed a goal-oriented approach for designing self-managed monitoring in autonomic systems. This approach assists human administrators in adapting the monitoring system behavior regarding quality requirements. It means that monitoring is configured starting from a quality specification (e.g., SLA), and reconfigured based on adaptation patterns that are exploited to achieve high level quality objectives. We designed four monitoring adaptation patterns according to dimensions that represent various aspects to which adaptation actions can apply. Thus, each dimension represents a "starting point" for eliciting monitoring goals that are refined until reaching leaf goals.

About perspectives, we are currently investigating how monitoring adaptations could influence the stability of the autonomic system at its whole scale, in case many overlapping adaptation leaf goals are applied over several autonomic managers. In addition, the agent side adaptations need to be investigated, and orchestrated with those applied at the autonomic manager side.

REFERENCES

[1] J. O. Kephart and D. M. Chess, "The vision of autonomic computing," Computer, vol. 36, no. 1, Jan. 2003, pp. 41–50.
[2] A. Toueir, J. Broisin, and M. Sibilla, "A goal-oriented approach for adaptive SLA monitoring: a cloud provider case study," in LATINCLOUD 2013, Maceió, Brazil, December 2013.
[3] A. Moui, T. Desprats, E. Lavinal, and M. Sibilla, "A CIM-based framework to manage monitoring adaptability," in Network and Service Management (CNSM), 2012 8th International Conference and 2012 Workshop on Systems Virtualization Management (SVM), 2012, pp. 261–265.
[4] A. Moui, T. Desprats, E. Lavinal, and M. Sibilla, "Information models for managing monitoring adaptation enforcement," in International Conference on Adaptive and Self-adaptive Systems and Applications (ADAPTIVE), Nice, 22/07/2012-27/07/2012, 2012, pp. 44–50.
[5] A. Moui, T. Desprats, E. Lavinal, and M. Sibilla, "Managing polling adaptability in a CIM/WBEM infrastructure," in 2010 4th International DMTF Academic Alliance Workshop on Systems and Virtualization Management (SVM), 2010, pp. 1–6.
[6] A. Toueir, J. Broisin, and M. Sibilla, "Toward configurable performance monitoring: Introduction to mathematical support for metric representation and instrumentation of the CIM metric model," in 2011 5th International DMTF Academic Alliance Workshop on Systems and Virtualization Management (SVM), 2011, pp. 1–6.
[7] A. Van Lamsweerde, Requirements Engineering: From System Goals to UML Models to Software Specifications. Wiley, 2009.
[8] A. Van Lamsweerde, "Requirements engineering in the year 00: A research perspective," in Proceedings of the 22nd International Conference on Software Engineering, ser. ICSE '00, 2000, pp. 5–19.
[9] R. Darimont and A. Van Lamsweerde, "Formal refinement patterns for goal-driven requirements elaboration," in ACM SIGSOFT Software Engineering Notes, vol. 21, no. 6, ACM, 1996, pp. 179–190.
[10] IBM Corp., "An architectural blueprint for autonomic computing," IBM White Paper, June 2005.
[11] A. Merkin, "Base metrics profile," December 2009, Document Number: DSP1053.
[12] A. Clemm, Network Management Fundamentals. Cisco Press, 2006, ch. 5, pp. 138–141.
[13] G. Katsaros, G. Kousiouris, S. V. Gogouvitis, D. Kyriazis, A. Menychtas, and T. Varvarigou, "A self-adaptive hierarchical monitoring mechanism for clouds," Journal of Systems and Software, vol. 85, no. 5, 2012, pp. 1029–1041.
[14] D. Roxburgh, D. Spaven, and C. Gallen, "Monitoring as an SLA-oriented consumable service for SaaS assurance: A prototype," in 2011 IFIP/IEEE International Symposium on Integrated Network Management (IM), 2011, pp. 925–939.
[15] P. Thongtra and F. Aagesen, "An adaptable capability monitoring system," in 2010 Sixth International Conference on Networking and Services (ICNS), 2010, pp. 73–80.
[16] M. Munawar, T. Reidemeister, M. Jiang, A. George, and P. Ward, "Adaptive monitoring with dynamic differential tracing-based diagnosis," in Managing Large-Scale Service Deployment, ser. Lecture Notes in Computer Science, F. Turck, W. Kellerer, and G. Kormentzas, Eds., Springer Berlin Heidelberg, 2008, vol. 5273, pp. 162–175.
[17] J. Nobre, L. Granville, A. Clemm, and A. Prieto, "Decentralized detection of SLA violations using P2P technology," in Proceedings of the 8th International Conference on Network and Service Management, 2012, pp. 100–107.
[18] P. Grefen, K. Aberer, H. Ludwig, and Y. Hoffner, "Crossflow: Cross-organizational workflow management in dynamic virtual enterprises," International Journal of Computer Systems Science & Engineering, vol. 15, 2000, pp. 277–290.
[19] D. Weyns, B. Schmerl, V. Grassi, S. Malek, R. Mirandola, C. Prehofer, J. Wuttke, J. Andersson, H. Giese, and K. Göschka, "On patterns for decentralized control in self-adaptive systems," in Software Engineering for Self-Adaptive Systems II, R. Lemos, H. Giese, H. Müller, and M. Shaw, Eds., Springer Berlin Heidelberg, 2013, vol. 7475, pp. 76–107.
[20] A. J. Ramirez and B. H. C. Cheng, "Design patterns for developing dynamically adaptive systems," in Proceedings of the 2010 ICSE Workshop on Software Engineering for Adaptive and Self-Managing Systems, ser. SEAMS '10, 2010, pp. 49–58.


Dynamic Incremental Fuzzy C-Means Clustering

Bryant Aaron, Dan E. Tamir
Department of Computer Science, Texas State University, San Marcos, Texas, USA
{ba1127, dt19}@txstate.edu

Naphtali D. Rishe and Abraham Kandel
School of Computer Science, Florida International University, Miami, Florida, USA
[email protected], [email protected]

Abstract - Researchers have observed that multistage clustering can accelerate convergence and improve clustering quality. Two-stage and two-phase fuzzy C-means (FCM) algorithms have been reported. In this paper, we demonstrate that the FCM clustering algorithm can be improved by the use of static and dynamic single-pass incremental FCM procedures.

Keywords - Clustering; Fuzzy C-Means Clustering; Incremental Clustering; Dynamic Clustering; Data-mining.

I. INTRODUCTION

The FCM algorithm provides a soft (fuzzy) assignment of patterns to clusters [1]. The assignment is represented by a partition matrix. The algorithm starts with either seeding the FCM with an initial partition matrix or through initial cluster centers, and attempts to improve the partition matrix according to a given quality criterion.

Traditionally, in each of the FCM iterations, the algorithm is applied to the entire data set, represented as a vector residing in the processor memory and representing a multi-dimensional set of measurements. Recently, however, the FCM algorithm, as well as numerous other clustering and data-mining algorithms, such as K-means, ISODATA, Kohonen neural networks (KNN), expectation maximization, and simulated annealing [1]-[6], have been exposed to a relatively new challenge referred to as "big data." Often, the enormous amount of data available online cannot fit processors' physical memory. In fact, often the data does not even fit secondary memory. Given that input/output operations are generally the most taxing computer operations, working on the entire data set in every FCM iteration requires numerous consecutive reads of massive amounts of data. A scenario that might challenge the traditional approach to FCM clustering occurs when portions of the data are generated or become available dynamically and it is not practical to wait for the entire data set to be available.

This brings the need for incremental clustering into the forefront. Incremental clustering is also referred to as single-pass clustering, whereas the traditional clustering is referred to as multi-pass clustering [7]-[11]. The idea is to cluster a manageable portion of the data (a data block) and maintain results for the next manageable block until exhausting the data. Under this approach, each block is processed by the algorithm a limited number of times, potentially only once. Ideally, a block, along with the preliminary number of initial centers selected, should be as large as possible, occupying as much of the available internal processor memory as possible. One might question the validity of "visiting" every data element for a limited number of iterations in a specific order, as opposed to the traditional approach, which considers every piece of data in each iteration.

A related approach is multi-resolution or multistage clustering. Researchers observed that a multistage-based training procedure can accelerate the convergence and improve the quality of the training as well as the quality of the classification/decision phases of many of the clustering algorithms [5][12][13]. For example, our previous research reports show that the pyramid K-means clustering algorithm, the pyramid FCM, and multi-resolution KNN yield two-to-four times convergence speedup [13]. Both multistage clustering and incremental clustering apply an approach of sampling the data. In multistage clustering, data is sampled with replacement, whereas in incremental clustering, due to the cost of replacement, the data is sampled without replacement. In both cases the validity of the sampling has to be addressed.

This paper describes a new approach for incremental FCM clustering. In contrast to the pyramid FCM approach, the sampling is done without replacement. Furthermore, the sampling size is fixed. On the other hand, two measures are applied to the data in order to overcome the fact that each data block is processed only one time. First, the algorithm starts with a relatively large number of clusters and scales the number down in the last stage. This is referred to as a two-phase procedure. Hence, in the intermediate stages (first phase) each block might affect different cluster centers. A second and innovative version of the algorithm enables a dynamic number of clusters. Again, the algorithm starts with a relatively large number of clusters; however, each transaction on a block might change (increase or decrease) the number of clusters. In both cases the algorithms work on "chunks" of data referred to as blocks. This paper presents several experiments with multi-pass and static/dynamic single-pass versions of the FCM and empirically evaluates the validity of the static and dynamic incremental clustering approach.
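The single-pass organization described above can be summarized by the following schematic driver loop; it is a sketch under the stated assumptions, not the authors' implementation, and the helper names (read_next_block, cluster_block, reduce_centers) are placeholders.

```python
# Schematic single-pass (incremental) clustering driver: each block of data
# is read once, clustered starting from the centers carried over from the
# previous block, and, in the final step, the surviving centers are
# themselves clustered down to the requested number of clusters.
def incremental_fcm(read_next_block, cluster_block, reduce_centers,
                    initial_centers, final_num_clusters):
    centers = initial_centers
    while True:
        block = read_next_block()        # one manageable chunk of the data
        if block is None:                # data exhausted
            break
        centers = cluster_block(block, centers)   # block is visited only once
    # second phase: cluster the (relatively many) centers of the last block
    return reduce_centers(centers, final_num_clusters)
```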


The main contributions of this paper are: 1) a new approach, described in Section III.F, for incremental clustering, where the number of clusters is relatively high and is followed by clustering the resultant centers; this approach increases the validity of incremental clustering; and 2) a second approach, described in Section III.G, where initially the number of clusters is relatively high and the number of clusters dynamically changes throughout execution; this approach provides better incremental clustering quality/validity, and can be used to resolve the issue of identifying the right number of cluster centers.

A literature review shows numerous papers on incremental clustering. To the best of our knowledge, there are no reports on research that applies the operations listed in this paper to the FCM algorithm.

The rest of the paper is organized in the following way. Section II reviews related research. Section III provides details of several single-pass and multi-pass variants of FCM clustering and lists metrics used to assess the quality of clustering. Section IV describes a set of experiments conducted to assess the performance and validity of the incremental clustering algorithms described in Section III, and Section V concludes with findings and proposes further research.

II. REVIEW OF RELATED RESEARCH

Clustering is a widely-used data classification method applied in numerous research fields, including image segmentation, vector quantization (VQ), data mining, and data compression [14]-[20]. K-means is one of the most commonly used clustering algorithms; a variant of the K-means algorithm, the Linde, Buzo, and Gray (LBG) VQ algorithm for sources with unknown probability distribution, is utilized in many applications [1][16]. The LBG algorithm has been intensively researched. Some of these research results relevant to K-means and FCM are reviewed next.

Lloyd proposes an iterative optimization method for quantizer design; it assumes that the distribution of the data is unknown and attempts to identify the optimal quantizer [21]. This approach is equivalent to 1-means (that is, K-means with K = 1). While Lloyd's method yields an optimal minimum mean square error (MMSE) design for a one-dimensional quantizer, its extension to multi-dimensional data quantizers (i.e., VQ) with unknown distributions is not guaranteed to yield optimal results [21]. Consequently, K-means with K > 1 is not guaranteed to reach a global optimum.

The LBG method for VQ with unknown underlying distribution generalizes Lloyd's iterative method and sets a VQ design procedure that is based on K-means [16]. The LBG VQ procedure is currently the most commonly used and researched VQ approach. Garey has shown that the LBG VQ converges in a finite number of iterations, yet it is NP-complete [22]. Thus, finding the global minimum solution or proving that a given solution is optimal is an intractable problem. Another problem with K-means is that the number of clusters (K) is fixed and has to be set in advance of executing the algorithm. ISODATA is a generalization of K-means which allows splitting, merging, and eliminating clusters dynamically [23][24]. This might lead to better clustering (a better local optimum) and eliminate the need to set K in advance. ISODATA, however, is computationally expensive and is not guaranteed to converge [2].

Several clustering algorithms and combinatorial optimization techniques, such as genetic algorithms and simulated annealing, have been devised in order to force the clustering algorithm out of local minima [4][25][27]. These schemes, however, require long convergence time, especially for large clustering problems. FCM and fuzzy ISODATA generalize the crisp K-means and ISODATA. The FCM clustering algorithm is of special interest since it is more likely to converge to a global optimum than many other clustering algorithms, including K-means. This is due to the fact that the cluster assignment is "soft" [28][29]. On the other hand, the FCM attempt to "skip" local optima may bear the price of numerous soft iterations and can cause an increase in computation time. FCM is used in many applications of pattern recognition, clustering, classification, compression, and quantization, including signal and image processing applications such as speech coding, speech recognition, edge detection, image segmentation, and color-map generation [28]-[35]. Thus, improving the convergence time of the FCM is of special importance.

Multistage processing is a well-known procedure used for reducing the computational time of several applications, specifically image processing procedures. This method uses a sequence of reduced resolution versions of the data to execute an image processing task. Results of execution at a low resolution stage are used to initialize the next, higher resolution stage. For example, Coleman proposes an algorithm for image segmentation using K-means clustering [14]. Hsiao has applied Coleman's technique for texture segmentation [36]. He used a sample of the image to identify the parameters of the clustering. Huang and Zhu have applied the Coleman algorithm to DCT based segmentation and color separation, respectively [37][38]. Like Hsiao, they used a sample of the image pixels to set up the parameters of the final clustering algorithm, where the final clustering is performed on the entire image. They found that the final cluster centers obtained in the training stage are very close to the final cluster centers obtained from clustering the entire image. This lends itself to a two-stage K-means procedure that uses one low resolution sample to initiate the parameters of the actual clustering. Pyramid processing is a generalization of the two-stage approach, where the resolution of samples grows exponentially and each execution stage doubles the number of samples.

Additional applications of multistage architectures are reported in the literature [27][31][39]. Rosenfeld surveys the area and proposes methods for producing the multistage snapshots of an image [40]. Kasif shows that multistage linking is a special case of ISODATA [41], and Tilton uses multistage processing for clustering remote sensing data [42]. Tamir introduces a pyramid multistage method for non-supervised training in the context of K-means and neural networks. He has shown that the pyramid approach significantly accelerates the convergence of these procedures [12][13].

Several papers deal with accelerating the convergence of FCM [39][45][46]. Altman has implemented a two-stage FCM algorithm [47]. The first stage operates on a random sample of the data and the second stage uses the cluster centers obtained in the first stage to cluster the entire set. Cheng improves the method proposed by Altman and has investigated a two-phase approach [31]. The first phase


implements a linear multistage algorithm which operates on small random slices of the data. Each slice contains a fixed fraction of the data. The algorithm finds the cluster centers of the first slice, then uses these centers as initial centers for clustering a sample that contains the first slice and an additional slice obtained through random sampling. After running the multistage phase for a given number of stages, the final centers for the combination of slices, which contain a fraction of the entire data, are obtained. Next, in the second phase, these centers are used to cluster the entire data.

Other approaches for improving the convergence rate of clustering include data reduction techniques and data sampling using hypothesis testing [48][49].

A related research effort deals with clustering of very large data sets that are too big to fit into the available memory. One approach to this problem is to use incremental algorithms [10][11][27]. Several of these algorithms load a slice of the data, where the size of a slice is constrained by the available memory, and cluster this slice [7][9]. Results of clustering current slices (e.g., centers, partition matrices, dispersion, etc.) are used in the process of clustering upcoming slices. Hore has proposed a slice-based single-pass FCM algorithm for large data sets [39]. The proposed method lumps data that has been clustered in previous slices into a set of weighted points and uses the weighted points along with fresh slices to commence with the clustering of the entire set in one path [39]. Another approach for clustering large data sets is to sample, rather than slice, the data [49].

Instead, in this report, the incremental approach we use has two phases. In the first phase, a very large set of clusters is used. Practically, we are trying to fit as many elements in a block and as many clusters per block as possible in the memory. In the second phase, after processing all the blocks, a process of clustering the centers obtained from the last block is applied.

It is interesting to note that K-means, FCM, neural networks (e.g., KNN), and many other iterative optimization algorithms have two main modes of operation, the batch mode and the parallel-update mode. For example, in the batch mode execution of FCM, each iteration considers every pattern individually and the centers are updated with respect to every pattern considered. The parallel-update mode, which is less computationally expensive and is the predominantly used mode in most current applications, assigns all the patterns to the relevant clusters and then updates the centers. In this context, the slice approach which is used for large data sets can be considered as a hybrid of batch and parallel update.

This brings up the issue of parallel processing of clustering algorithms. Several ways to partition and distribute the clustering task have been considered [19][39][42][50]-[52]. One possible way is to assign a set of samples or a slice of data to each processor and eventually merge the cluster centers obtained from each processor into one set of centers. We plan to address this problem as a future research subject.

III. FUZZY C-MEANS CLUSTERING VARIANTS

In this section, we present several variants of FCM clustering.

A. The Classical Fuzzy C-Means Clustering Algorithm

The FCM algorithm is a generalization of crisp K-means clustering. Actually, the generalization is quite intuitive. In the K-means algorithm, set membership is crisp. Hence, each pattern belongs to exactly one cluster. In the FCM, set membership is fuzzy and each pattern belongs to each cluster with some degree of membership. The following formalizes this notation.

Let X = {x_1, ..., x_N}, where x_i ∈ R^d, be a set of N d-dimensional vectors representing the data to be clustered into C clusters with cluster centers V = {v_1, ..., v_C}. Under the FCM, each element x_i belongs to every cluster c_j with some degree of membership u_ij. Hence, the matrix U = [u_ij], referred to as the partition matrix, represents the fuzzy cluster assignment of each vector x_i to each cluster c_j. The goal of FCM is to identify a partition matrix U that optimizes a given objective function. A commonly used FCM objective function is defined to be:

    J_m(U, V) = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \| x_i - v_j \|^2    (1)

where m is the weighting exponent. In this research, m is set to the most commonly used and proposed value of 2 [28].

The most common measures for FCM clustering quality are: 1) the value of the objective function, 2) the partition coefficients, 3) the classification entropy, 4) measures of deviation of the partition matrix from a matrix obtained with uniformly distributed data, and 5) measures of induced fuzziness [24][28][44]. It should be noted that some of the quality criteria are derived from distortion measures. Hence, in this case, the goal is to minimize distortion, and high quality means low distortion. In other words, the quality can be considered as the inverse of distortion. Measures 1 through 5 assume that the end result of the clustering is soft. Nevertheless, in many cases, it is desirable to obtain a "hard clustering" assignment to be used for VQ, image segmentation, or other classification applications. In these cases, two additional quality criteria can be considered: 6) the rate distortion function, and 7) the dispersion matrix [2][44]. Of all these measures, 1, 6, and 7 are the most commonly used. Specifically, for metric 1, the functional J_m can be interpreted as a generalized distortion measure, which is the weighted sum of the squared distances from all the points in the cluster domain to their assigned cluster center. The weights are the fuzzy membership values [28][29]. Hence, this metric is proportional to the inverse of the quality of FCM. Lower distortion denotes higher quality. Metrics 6 and 7 are further elaborated in the next section.
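Equation (1) can be evaluated directly; the following NumPy sketch assumes the data points are the rows of x, the centers are the rows of v, and u is the N-by-C partition matrix. It is an illustrative sketch, not the authors' code.

```python
# Sketch of the FCM objective of equation (1): the weighted sum of squared
# distances between every data point and every cluster center, with the
# fuzzy memberships (raised to the weighting exponent m) as weights.
import numpy as np

def fcm_objective(x, v, u, m=2.0):
    """x: (N, d) data, v: (C, d) centers, u: (N, C) partition matrix."""
    sq_dist = ((x[:, None, :] - v[None, :, :]) ** 2).sum(axis=2)  # (N, C)
    return float(((u ** m) * sq_dist).sum())
```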


The compression rate of FCM is fixed by the selection of c. Hence, the rate distortion quality-measure boils down to the MMSE, given by:

\frac{1}{n} \sum_{i=1}^{c} \sum_{x_j \in X_i} \| x_j - c_i \|^2    (2)

where X_i denotes the set of vectors assigned to cluster i after defuzzification. Again, lower distortion denotes higher quality.
When the clustering is used for classification, a quality criterion that measures the density of clusters as well as the relative distance between clusters can be used to estimate the recognition accuracy. In this case, a dispersion measure can be used. To elaborate, let {X_1, ..., X_c} be the set of clusters obtained through "hard clustering," and let C = {c_1, ..., c_c} be the set of the corresponding cluster centers; then S_i, the within dispersion matrix of the cluster X_i, is defined to be the covariance matrix of the set of elements that belong to X_i. The within dispersion matrix of X, S_W, is a given function of the entire set of the within dispersion matrices of the individual clusters. For example, the elements of S_W can be the averages of the compatible elements of S_i for 1 <= i <= c. The between dispersion matrix of X, S_B, is the covariance matrix of C. The quality of the clustering can be expressed as a function of the within dispersion matrix S_W and the between dispersion matrix S_B. A commonly used dispersion function is [44]:

tr(S_W^{-1} S_B)    (3)

where tr(A) is the trace of the matrix A.
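To make the dispersion criterion concrete, the following is a minimal NumPy sketch, not taken from the paper, that computes the within and between dispersion matrices for a hard cluster assignment; the function name and the specific combining form tr(S_W^{-1} S_B) assumed for equation (3) are illustrative choices.

import numpy as np

def dispersion_quality(X, labels, centers):
    """Sketch of a dispersion measure in the spirit of equation (3): S_W averages
    the per-cluster covariance matrices, S_B is the covariance of the centers."""
    c = centers.shape[0]
    # Within dispersion: covariance of the members of each (hard) cluster.
    S_i = [np.cov(X[labels == i], rowvar=False) for i in range(c)]
    S_W = np.mean(S_i, axis=0)                # one possible combining function
    S_B = np.cov(centers, rowvar=False)       # between dispersion: covariance of C
    return np.trace(np.linalg.inv(S_W) @ S_B) # assumed form of equation (3)

Under this criterion, larger values correspond to compact, well-separated clusters, i.e., higher clustering quality.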

B. The Fuzzy C-Means Algorithm
The FCM consists of two main phases: setting/updating the membership of vectors in clusters and setting/updating cluster centers. Some variants of FCM start with a set of centers which induces a partition matrix [28][29]. In this case, seeding the algorithm relates to the initial selection of centers. Other variants initialize a partition matrix which induces initial centers [24]. Hence, seeding these FCM variants amounts to initializing the partition matrix. The two approaches are virtually equivalent, so that choosing one over the other is just a matter of convenience related to the format of data and the form of the application. We are using the second approach, where the seeding relates to selecting the initial partition matrix. Hence, in the seeding step, the membership matrix is initialized. In the next iterations, the cluster centers are calculated and the partition matrix is updated. Finally, the value of the objective function for the current classification is calculated. The algorithm terminates when a limit on the number of iterations is reached or a "short circuit condition" is met. A commonly used termination condition halts the algorithm when the derivative of the distortion function is small. Because the c-means algorithm is sensitive to the seeding method, a variety of procedures have been proposed for selecting seed points [26][52]. The following paragraphs include a formal definition of the algorithm and present a pseudo-code.
Given a set of n vectors X = {x_1, ..., x_n}, where x_j \in R^d, and an initial partition matrix U^{(0)}, the FCM is an iterative algorithm for partitioning the set of vectors into c clusters, with cluster centers C = {c_1, ..., c_c}. In iteration l the algorithm uses the cluster centers induced by the partition matrix U^{(l-1)} to re-partition the data set and obtain a new partition matrix U^{(l)}. Cluster centers at iteration l are computed according to:

c_i^{(l)} = \left( \sum_{j=1}^{n} (u_{ij}^{(l-1)})^m x_j \right) / \left( \sum_{j=1}^{n} (u_{ij}^{(l-1)})^m \right)    (4)

The matrix U^{(l)} = [u_{ij}^{(l)}] is calculated according to:

u_{ij}^{(l)} = 1 / \sum_{k=1}^{c} \left( \| x_j - c_i^{(l)} \| / \| x_j - c_k^{(l)} \| \right)^{2/(m-1)}    (5)

The process of center induction, data partition, and matrix update continues until a given termination condition, which relates to an optimization criterion or a limit on the number of iterations, is met. The following is a commonly used criterion [16]:

| J^{(l)} - J^{(l-1)} | < \epsilon    (6)

Fig. 1 is a pseudo code of the algorithm.

1. Parameters:
   a. X = {x_1, ..., x_n} - a set of vectors
   b. n - the number of vectors
   c. c - the number of partitions
   d. m - a weighting exponent
   e. U^{(l)} = [u_{ij}^{(l)}] - the partition matrix at iteration l
   f. C^{(l)} - the set of clustering centers at iteration l
   g. L - the maximum number of iterations
   h. J^{(l)} - the objective-function's value at iteration l
2. Set l = 0, choose an initial partition matrix U^{(0)}.
3. In iteration l, let C^{(l)} be the induced clustering centers computed by equation (4).
   a. Set U^{(l)} = [u_{ij}^{(l)}] according to equation (5).
   b. Compute J^{(l)} according to equation (1).
   c. Set l = l + 1.
4. Stop if l >= L; or if l >= 1 and equation (6) holds for a small \epsilon. Otherwise, go to (3).
Figure 1. Algorithm for baseline FCM
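For illustration only (this is not the authors' implementation), the baseline batch FCM of Fig. 1 can be written as a short, self-contained NumPy sketch; the random initialization of the partition matrix and the default tolerance are assumptions.

import numpy as np

def fcm(X, c, m=2.0, max_iter=100, eps=1e-3, seed=0):
    """Baseline (batch) FCM sketch: alternate equations (4) and (5) and stop
    when the change of the objective of equation (1) satisfies equation (6)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)                 # random initial partition matrix
    J_prev = np.inf
    for _ in range(max_iter):
        W = U ** m
        centers = (W @ X) / W.sum(axis=1, keepdims=True)                  # equation (4)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)   # c x n distances
        d = np.fmax(d, 1e-12)                          # avoid division by zero
        U = d ** (-2.0 / (m - 1.0))
        U /= U.sum(axis=0, keepdims=True)              # equation (5)
        J = np.sum((U ** m) * d ** 2)                  # objective, equation (1)
        if abs(J - J_prev) < eps:                      # termination, equation (6)
            break
        J_prev = J
    return centers, U

For example, a call such as centers, U = fcm(pixels, c=16) would correspond to the color quantization setting discussed in Section IV.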


The idea behind the multistage methods reported in the next section is that an estimate of the partition matrix and the location of the cluster centers can be obtained by clustering a sample of the data. There is a trade-off that relates to the sample size. A small sample is expected to produce a fast yet less reliable estimation of the cluster centers. This leads to a multistage approach, which involves several stages of sampling (with replacement) of the data and estimating the membership matrix for the next stage. The size of the first sample should be as small as possible. On the other hand, it should be statistically significant [44]. Each of the stages includes more objects from the data and sets the initial partition matrix of a stage according to the final partition matrix of the previous stage.

C. The LBG Termination Criterion
The main difference between the LBG variant of the FCM algorithm and the classical FCM algorithm is the termination condition. The LBG algorithm stops when an approximation for the derivative of the MSE, given by (D^{(m-1)} - D^{(m)}) / D^{(m-1)}, is smaller than a threshold; that is, (D^{(m-1)} - D^{(m)}) / D^{(m-1)} <= \epsilon.

D. The Sequential Fuzzy C-Means Algorithm
FCM can be executed in one of two basic modes: batch mode and online mode. The batch mode updates the partition matrix after considering the entire set of data. The online mode is also referred to as the sequential mode. In an epoch l of the sequential mode, for each pattern, the partition matrix and cluster centers are found. The cluster centers are used to find the distortion value and the weighted distortion value. The partition matrix is used in the next epoch of the sequential mode.

E. The Block Sequential Fuzzy C-Means Algorithm
The block sequential mode is a compromise between the stringent computational requirements of the sequential FCM and the need to operate on data online. In this case, the clustering occurs on accumulated blocks of data. Each block is going through epochs of FCM where the final centers of a block are used as the initial centers for the next block. In this sense, the algorithm resembles other multistage clustering such as the pyramid FCM. The block sequential algorithm might be utilized in an iterative fashion, where each of the iterations performs epochs of FCM on a single block of data elements at a time.
Note that all the clustering algorithms described so far assume that (at some point) the entire data set is available. Moreover, generally, due to the iterative fashion of execution, these algorithms access the same elements more than once (in different iterations). When the data is very large and cannot fit the memory of the processor, a different approach, referred to as single-pass, has to be adapted. Under the single-pass (incremental) approach, each block/data-element is accessed only one time and then removed from internal memory to provide space for new elements. This is described next.

F. The Incremental Fuzzy C-Means Algorithm
The incremental FCM algorithm presented in this paper is similar to the block sequential algorithm with the exception that each block is accessed only one time. Each block is going through a set of epochs of FCM where the final centers of a block are used as the initial centers for the next block.
The fact that each block is "touched" just one time might raise a question about the validity of the results. The results might be valid if the data elements of blocks share similar features. For example, the data elements are drawn from the same probability distribution function or the same fuzzy membership function. Alternatively, validity might be attained if the features of data elements vary "slowly" between blocks. We use two methods to improve the validity of results. First, we use a two-phase incremental algorithm. In the first phase, a very large set of clusters is used. Practically, we are trying to fit as many elements in a block and as many clusters per block as possible in the memory. In the next phase, after processing all the blocks, a process of clustering the centers obtained from the last block is applied. The second measure for increasing validity is using a relatively large number of clusters and at the same time allowing the number of clusters to change dynamically. This is described in the next section.

G. The Dynamic Incremental Fuzzy C-Means Algorithm
The dynamic incremental FCM algorithm presented in this paper is similar to the incremental FCM algorithm. The difference is that the number of clusters is allowed to change.
Several operations can change the number of clusters. First, following the ISODATA algorithm principles, clusters with too few elements might be eliminated, clusters that are too close to each other might be merged, and clusters with large dispersion might be split [13][25][27]. The criteria for merge and split might be related to the within and between dispersion of the clusters [1]-[3][43]. Other methods for changing the number of clusters might include incrementing/decrementing the number of clusters (without split/merge) based on a criterion such as a threshold on distortion. We have implemented the threshold approach. Each block is going through a set of epochs of FCM where the final centers of a block are used as the initial centers for the next block. Following the application of FCM on a block, a decision concerning the effective number of clusters is made and the number might be incremented or decremented based on a predetermined quality criteria threshold. We place an upper bound and a lower bound on the number of clusters, where the lower bound ensures that we still have enough clusters to maintain validity and enable the two-phase approach described above. Again, we are trying to fit as many elements in a block and a large number of clusters per block in the memory and apply a two-phase approach where the centers from the last iteration are clustered and provide the final set of clusters.
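The single-pass flow of Sections F and G can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the number of epochs per block, the seeding of added or removed clusters, the distortion threshold, the step size of 3, and the bounds are all assumed values.

import numpy as np

def fcm_epochs(block, centers, m=2.0, epochs=5):
    """A few parallel-update FCM epochs on one block, seeded with given centers
    (equations (4) and (5)); returns updated centers and the hard-assignment MSE."""
    for _ in range(epochs):
        d = np.fmax(np.linalg.norm(block[None, :, :] - centers[:, None, :], axis=2), 1e-12)
        U = d ** (-2.0 / (m - 1.0))
        U /= U.sum(axis=0, keepdims=True)                       # equation (5)
        W = U ** m
        centers = (W @ block) / W.sum(axis=1, keepdims=True)    # equation (4)
    d = np.linalg.norm(block[None, :, :] - centers[:, None, :], axis=2)
    return centers, float(np.mean(d.min(axis=0) ** 2))          # centers, distortion

def dynamic_incremental_fcm(blocks, c0=192, c_min=64, c_max=256,
                            step=3, dist_threshold=0.5, final_c=16, seed=0):
    """Two-phase single-pass sketch of Sections F and G: each block is touched once,
    the centers of one block seed the next, and the number of clusters may grow or
    shrink between bounds based on a distortion threshold (assumed rule)."""
    rng = np.random.default_rng(seed)
    centers, c = None, c0
    for block in blocks:                                        # phase one: one pass over the blocks
        if centers is None:
            centers = block[rng.choice(len(block), c, replace=False)]
        elif len(centers) < c:                                  # clusters added: seed extras from the block
            extra = block[rng.choice(len(block), c - len(centers), replace=False)]
            centers = np.vstack([centers, extra])
        elif len(centers) > c:                                  # clusters removed: drop surplus centers
            centers = centers[:c]
        centers, distortion = fcm_epochs(block, centers)
        if distortion > dist_threshold and c + step <= c_max:
            c += step                                           # grow when distortion is high
        elif distortion < dist_threshold and c - step >= c_min:
            c -= step                                           # shrink when distortion is low
    # Phase two: cluster the centers of the last block into the final set of clusters.
    init = centers[rng.choice(len(centers), final_c, replace=False)]
    final_centers, _ = fcm_epochs(centers, init)
    return final_centers

With step=0 the adjustment never fires and the sketch reduces to the static incremental algorithm of Section F.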


IV. EXPERIMENTS AND RESULTS

A. Experimental Setup
In order to compare and contrast the performance and validity of the FCM variants presented, we have implemented these algorithms numerous times on different data sets, using different parameters. Three sets of data are used for the experiments performed. The first set (1) consists of the red, green, and blue (RGB) components of relatively small color images. These images, e.g., Lena and Baboon, are used by many other researchers and the results of algorithms that use these images are published in numerous papers and books [2][6]. The images of the first set are subject to color quantization. The second set (2) is composed of the RGB components of relatively large aerial photography images (81 million pixels); this set is used for more aggressive color quantization. The third set (3) includes 20 million elements of six-dimensional synthetic data points with known centers and known distribution. Hence, data sets (2) and (3) are relatively large. It should be noted that the 81-million-pixel images are the largest images that fit the memory of our current hardware/software configuration. This is important since we use the entire image for running non-incremental clustering in order to assess the results of the incremental clustering, and this is the maximal size that can be used for non-incremental clustering. Three types of output data/results are collected: 1) records of convergence (distortion per iteration), 2) execution time, and 3) clustering quality (inverse of distortion at the final iteration).
The experiments are divided into two classes: multi-pass and single-pass. In the multi-pass experiments, we compared the performance of classical FCM, LBG based FCM, sequential FCM, and block sequential FCM. Given the constraints of these algorithms, they have been applied to a manageable data set (i.e., a data set with a medium number of elements). In the single-pass experiments, we tested the incremental and the dynamic incremental approaches with the same data used for the multi-pass algorithms and with a "huge" set of synthetic data that is not suitable for multi-pass processing. Nevertheless, to verify the results with the large data set, we ran the LBG variant of the FCM algorithm on that data using a "powerful" multicore computer. The computer worked on the data for several hours. For the dynamic incremental algorithm, we used an approach where the number of clusters is incremented/decremented by 3 based on a threshold on the distortion and the number of clusters during the current epoch.
1) Color Quantization
The problem of color quantization can be stated in the following way: given an image with N different colors, choose K colors such that the resulting K-color image is the least distorted version of the original image [1]. Color quantization can be implemented by applying the FCM clustering procedure to the image pixels, where each pixel represents a vector in some color representation system. For example, the clustering can be performed on the three-dimensional vectors formed by the RGB color components of each pixel in the image [1]. After clustering, each three-dimensional vector (pixel) is represented by the cluster-number to which the vector belongs and the cluster centers are stored in a color-map. The K-value image, along with the color-map, is a compressed representation of the N-colors original image. The compressed image can be used to reconstruct the original three-dimensional data set by replacing each cluster-number by the centroid associated with the cluster. In the case of 8-bit per color component and K = 16, the original 24-bit per pixel image is represented by a 4-bit per pixel image, along with a small color map. Hence, about six-fold compression is achieved.
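As a hedged illustration of the quantization step just described (not the authors' code), the following sketch hard-assigns each pixel to its nearest of K = 16 FCM centers, yielding 4-bit indices plus a small color map, and reconstructs the image for distortion measurement.

import numpy as np

def quantize_image(rgb_image, centers):
    """Color-map quantization sketch: defuzzified (nearest-center) assignment of
    each RGB pixel, returning the index image, the color map, and the reconstruction."""
    h, w, _ = rgb_image.shape
    pixels = rgb_image.reshape(-1, 3).astype(np.float64)
    d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
    indices = d.argmin(axis=1).astype(np.uint8)          # K = 16 -> 4 bits per pixel
    reconstructed = centers[indices].reshape(h, w, 3)    # replace each index by its centroid
    return indices.reshape(h, w), centers, reconstructed

For the large aerial images, the assignment would be applied one image row at a time, matching the one-row-per-block processing used in the experiments.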
In this set of experiments, a block processed by the single-pass algorithm consists of an image row. The sequential algorithm is applied to every pixel of a scaled down version of the images, while the rest of the multi-pass algorithms operate on the entire set of the pixels of the original images. The experimental results are scaled to represent the distortion for the entire image. The static incremental algorithm starts with 192 clusters per block. Following the processing of the last block, the 192 centers are clustered into 16 centers and the distortion for the entire image with these centers is measured. The dynamic incremental algorithm starts with 192 clusters per block and allows fluctuations in this number. Following the processing of the last block, the centers are clustered into 16 centers and the distortion for the entire image with these centers is measured.
2) Synthetic Data
A set of random cluster centers with a total of 20,000,000 six-dimensional vectors is generated. The vectors within a cluster are distributed according to a 2-D normal distribution with a standard deviation of 0.05 around the center. For the single-pass experiments, the data is divided into 500 blocks of 40,000 elements per block. Other parameters are identical to the ones used for the color map quantization experiments.

B. Experimental Results
Fig. 2 shows the distortion per iteration of the multi-pass algorithms executed on the image Lena, which is an RGB image. Fig. 2a shows the distortion results of the LBG variant; the final distortion is 23.6 db. The block sequential algorithm distortion per iteration is presented in Fig. 2b. The distortion results converge to the value of 23.8 db. This is equivalent to the results of running the other single-pass algorithms for several iterations. Visually, all the versions of clustering executed in this research produced about the same reconstructed image.
Fig. 3a and Fig. 3b show the results of running the single-pass incremental version on the image Lena with a single row per block. Both the static and dynamic incremental algorithms converge to a distortion value of about 25 db. This is slightly higher than the multi-pass algorithms, but it is expected, as the single-pass algorithm "visits" every block only one time. The dynamic incremental algorithm has slightly lower distortion than the incremental algorithm. We have repeated these experiments numerous times, with different seed values, using nine different images and obtained similar results.
Fig. 4a and Fig. 4b show the results of running the single-pass incremental version on the image "Neighborhood" with a single row per block, for both the static and the dynamic incremental algorithms. This is a much larger image, and it is subject to more aggressive quantization, which ends up in a monochromatic image.


In this case, the distortion obtained through the LBG variant on the entire image is 20.4 db. The incremental algorithm and the dynamic incremental algorithm produce a result of 23.2 db. and 24 db., respectively. The degradation is justified by the fact that this image is the largest image that fits our software/hardware configuration. Hence, a larger image cannot undergo a multi-pass procedure.
Fig. 5 shows the original and reconstructed versions of "Neighborhood." Figs. 6 and 7 show the same experiments with the image "Park." The results are similar, with a distortion of 17.5 db., 19.5 db., and 20 db. for the LBG, incremental, and dynamic incremental variants, respectively.
Fig. 8a and Fig. 8b show the results of incremental clustering and dynamic incremental clustering with a large amount of synthetic data points (20,000,000 points in a six-dimensional space). Fig. 8a shows the results of running the static incremental FCM on the synthetic data. Due to the fact that the data is drawn from a fixed distribution, the results of distortion per block are quite stable and the execution ends up at a distortion value of 0.38. The execution of the dynamic version of incremental FCM is depicted in Fig. 8b. The distortion per block is better than the results obtained for the static case and stabilizes at 0.11.

V. RESULT EVALUATION
The results of the experiments reported and additional experiments performed show the utility of using a two-phase single-pass incremental FCM algorithm, where the first phase uses a large number of centers and the second phase clusters the centers obtained in the first phase into a desired number of clusters. Moreover, the dynamic clustering approach allows the number of centers in the first phase to vary and, in the case of very large data sets, outperforms the static incremental approach.

VI. CONCLUSION AND FUTURE WORK
This paper has reviewed static and dynamic single-pass and multi-pass variants of the FCM. A novel two-phase static single-pass algorithm as well as a dynamic two-phase single-pass algorithm have been presented and are showing high utility. Future research will concentrate on additional methods for dynamic change in the number of clusters in both steps of the dynamic incremental FCM. In addition, we plan to initiate research on equivalent approaches in the KNN. We also plan to investigate parallel incremental algorithms. Finally, we have recently received access to an aerial photography data set where the resolution of each image is 1 tera pixels, and we plan to explore the algorithms with this data set. Additionally, we have access to a vector data set related to the aerial photography. This set contains 170 million records of varying size with an average size of 40 entries per record. Moreover, this set contains a mixture of categorical and numerical data. We plan to use this set as well for further exploration of our algorithms.


Figure 2. Distortion per iteration for the multi-pass algorithms with the Image Lena


Figure 3. Incremental and Dynamic Incremental Clustering of the Image Lena


Figure 4. Incremental and Dynamic Incremental Clustering of the Image “Neighborhood”

Figure 5. Original and Reconstructed Versions of the Image “Neighborhood”

Figure 6. Incremental and Dynamic Incremental Clustering of the Image “Park”


Figure 7. Original and Reconstructed Versions of the Image “Park”


Figure 8. Incremental and Dynamic Incremental Clustering of Synthetic Data

ACKNOWLEDGMENT
This material is based in part upon work supported by the National Science Foundation under Grant Nos. I/UCRC IIP-1338922, AIR IIP-1237818, SBIR IIP-1330943, III-Large IIS-1213026, MRI CNS-0821345, MRI CNS-1126619, CREST HRD-0833093, I/UCRC IIP-0829576, MRI CNS-0959985, and FRP IIP-1230661.

REFERENCES
[1] D. E. Tamir and A. Kandel, "The Pyramid Fuzzy C-means Algorithm," International Journal of Computational Intelligence in Control, vol. 2, iss. 2, Dec. 2010, pp. 270-302.
[2] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proc. 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, University of California Press, 1967, pp. 281-297.
[3] J. T. Tou and R. C. Gonzalez, Pattern Recognition Principles. London: Addison-Wesley, 1974.
[4] P. Berkhin, "Survey of Clustering Data Mining Techniques," Technical Report, Accrue Software, San Jose, CA, 2002.
[5] A. A. El-Gamal, "Using Simulated Annealing to Design Good Codes," IEEE Transactions on Information Theory, vol. 33, iss. 1, Jan. 1987, pp. 116-123.
[6] T. Kohonen, "Self-organizing maps: optimization approaches," in T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas (eds.), Artificial Neural Networks: Proceedings of ICANN'91, International Conference on Artificial Neural Networks, vol. II, North-Holland, Amsterdam, 1991, pp. 981-990.
[7] P. S. Bradley, U. M. Fayyad, and C. A. Reina, "Scaling Clustering Algorithms to Large Databases," in Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD98), R. Agrawal, P. Stolorz, and G. Piatetsky-Shapiro (eds.), AAAI Press, Menlo Park, CA, 1998, pp. 9-15.
[8] M. Charikar, C. Chekuri, T. Feder, and R. Motwani, "Incremental clustering and dynamic information retrieval," in STOC '97: Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, ACM, New York, NY, USA, May 1997, pp. 626-635.
[9] F. Farnstrom, J. Lewis, and C. Elkan, "Scalability for clustering algorithms revisited," ACM SIGKDD Explorations Newsletter, vol. 2, iss. 1, ACM, New York, NY, USA, June 2000, pp. 51-57.
[10] C. Gupta and R. Grossman, "GenIc: A Single Pass Generalized Incremental Algorithm for Clustering," in the Fourth SIAM International Conference on Data Mining (SDM04), 2004, pp. 147-153.
[11] J. Lin, M. Vlachos, and E. Keogh, "Iterative Incremental Clustering of Time Series," Advances in Database Technology - EDBT 2004, vol. 2992, March 2004, pp. 106-122.
[12] D. E. Tamir, C. Park, and W. Yoo, "The validity of pyramid K-means clustering," in Proc. SPIE 6700, Mathematics of Data/Image Pattern Recognition, Compression, Coding, and Encryption X, with Applications, 67000D, Sept. 2007, pp. 658-675, doi:10.1117/12.735436.
[13] D. E. Tamir, "Cluster Validity of Multi Resolution Competitive Neural Networks," International Conference on Artificial Intelligence, 2007, pp. 303-312.


[14] G. B. Coleman and H. C. Andrews, "Image segmentation by clustering," Proceedings of the IEEE, vol. 67, iss. 5, May 1979, pp. 773-785.
[15] W. H. Equitz, "A new vector quantization clustering algorithm," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, iss. 10, Oct. 1989, pp. 1568-1575.
[16] Y. Linde, A. Buzo, and R. Gray, "An Algorithm for Vector Quantizer Design," IEEE Transactions on Communications, vol. 28, iss. 1, Jan. 1980, pp. 84-95.
[17] H. V. Lung and J.-M. Kim, "A generalized spatial fuzzy C-means algorithm for medical image segmentation," in Proceedings of the 18th International Conference on Fuzzy Systems, Jeju Island, Korea, Aug. 2009, pp. 409-414, ISSN: 1098-7584, doi:10.1109/FUZZY.2009.5276878.
[18] S. Das and A. Konar, "Automatic image pixel clustering with an improved differential evolution," Applied Soft Computing, vol. 9, iss. 1, Jan. 2009, pp. 226-236, doi:10.1016/j.asoc.2007.12.008.
[19] X. Xu, J. Jager, and H.-P. Kriegel, "A Fast Parallel Clustering Algorithm for Large Spatial Databases," Data Mining and Knowledge Discovery, vol. 3, iss. 3, Sept. 1999, pp. 263-290.
[20] P. Heckbert, "Color image quantization for frame buffer display," ACM Transactions on Computer Graphics, vol. 16, iss. 3, July 1982, pp. 297-307.
[21] S. Lloyd, "Least squares quantization in PCM," IEEE Transactions on Information Theory, vol. 28, iss. 2, Mar. 1982, pp. 129-137.
[22] M. Garey, D. Johnson, and H. Witsenhausen, "The complexity of the generalized Lloyd-Max Problem (Corresp.)," IEEE Transactions on Information Theory, vol. 28, iss. 2, Mar. 1982, pp. 255-256.
[23] G. H. Ball and J. D. Hall, "ISODATA, A Novel Method of Data Analysis and Pattern Classification," Technical Report, SRI International, Stanford, 1965.
[24] J. C. Bezdek, "A Convergence Theorem for the Fuzzy ISODATA Clustering Algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-2, iss. 1, Jan. 1980, pp. 1-8.
[25] D. Cheng, R. Kannan, S. Vempala, and G. Wang, "A divide-and-merge methodology for clustering," ACM Transactions on Database Systems (TODS), vol. 31, iss. 4, Dec. 2006, pp. 1499-1525.
[26] C. Alippi and R. Cucchiara, "Cluster partitioning in image analysis classification: a genetic algorithm approach," in Proceedings of Computer Systems and Software Engineering, CompEuro '92, May 1992, pp. 139-144, doi:10.1109/CMPEUR.1992.218520.
[27] S. Young, I. Arel, T. P. Karnowski, and D. Rose, "A Fast and Stable Incremental Clustering Algorithm," 2010 Seventh International Conference on Information Technology: New Generations (ITNG), April 2010, pp. 204-209.
[28] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, New York, USA: Plenum Press, 1981.
[29] A. Kandel, Fuzzy Mathematical Techniques with Applications, New York: Addison Wesley, 1987.
[30] A. M. Bensaid, L. O. Hall, J. C. Bezdek, and L. P. Clarke, "Partially supervised clustering for image segmentation," Pattern Recognition, vol. 29, iss. 5, May 1996, pp. 859-871.
[31] T. W. Cheng, D. B. Goldgof, and L. O. Hall, "Fast fuzzy clustering," Fuzzy Sets and Systems, vol. 93, iss. 1, Jan. 1998, pp. 49-56.
[32] F. Höppner, F. Klawonn, R. Kruse, and T. Runkler, Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition, New York: Wiley, 1999.
[33] S. K. Pal and D. K. D. Majumder, Fuzzy Mathematical Approach to Pattern Recognition, New York: Wiley, 1986.
[34] L. Ma and R. C. Stauton, "A modified fuzzy C-means image segmentation algorithm for use with uneven illumination patterns," Pattern Recognition, vol. 40, iss. 11, Nov. 2007, pp. 3005-3011.
[35] R. R. Yager, S. Ovchinnikov, R. M. Tong, and H. T. Nguyen, Fuzzy Sets and Applications: Selected Papers by L. A. Zadeh, New York: Wiley, 1987.
[36] J. Y. Hsiao and A. Sawchuk, "Unsupervised textured image segmentation using feature smoothing and probabilistic relaxation techniques," Computer Vision, Graphics, and Image Processing, vol. 48, iss. 1, Oct. 1989, pp. 1-21.
[37] J. F. Huang, "Image Segmentation Using Discrete Cosine Transform and K-Means Clustering Algorithm," Computer Science, Florida Institute of Technology, Melbourne, Florida, 1991.
[38] Y. Zhu, "Image Segmentation for Color Separation," Computer Science, Florida Institute of Technology, Melbourne, Florida, 1991.
[39] P. Hore, L. O. Hall, and D. B. Goldgof, "Single Pass Fuzzy C Means," in IEEE International Conference on Fuzzy Systems, London, 2007.
[40] A. Rosenfeld, "Pyramids: Multi-resolution Image Analysis," 3rd Scandinavian Conference on Image Analysis, June 1983, pp. 23-28.
[41] S. Kasif and A. Rosenfeld, "Pyramid linking is a special case of ISODATA," IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-13, iss. 1, Jan.-Feb. 1983, pp. 84-85.
[42] J. C. Tilton, "Multi-resolution Spatially Constrained Clustering of Remotely Sensed Data on the Massively Parallel Processor," IGARSS Symposium, 1984, pp. 661-666.
[43] E. Lughofer, "Dynamic Evolving Cluster Models Using On-line Split-and-Merge Operations," in Proceedings of the 2011 10th International Conference on Machine Learning and Applications and Workshops (ICMLA), vol. 2, Dec. 2011, pp. 20-26.
[44] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, Englewood Cliffs: Prentice Hall, 1988.
[45] R. L. Cannon, J. V. Dave, and J. C. Bezdek, "Efficient Implementation of the Fuzzy C-Means Clustering Algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, iss. 2, March 1986, pp. 248-255.
[46] J. F. Kolen and T. Hutcheson, "Reducing the time complexity of the fuzzy C-means algorithm," IEEE Transactions on Fuzzy Systems, vol. 10, iss. 2, Apr. 2002, pp. 263-267.
[47] D. Altman, "Efficient fuzzy clustering of multi-spectral images," in IEEE 1999 International Geoscience and Remote Sensing Symposium (IGARSS '99) Proceedings, vol. 3, Jun.-Jul. 1999, pp. 1594-1596.
[48] S. Eschrich, J. Ke, L. O. Hall, and D. B. Goldgof, "Fast accurate fuzzy clustering through data reduction," IEEE Transactions on Fuzzy Systems, vol. 11, iss. 2, Apr. 2003, pp. 262-270.
[49] N. R. Pal and J. C. Bezdek, "Complexity reduction for "large image" processing," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 32, iss. 5, Oct. 2002, pp. 598-611.
[50] S. Rahimi, M. Zargham, A. Thakre, and D. Chhillar, "A parallel Fuzzy C-Mean algorithm for image segmentation," IEEE Annual Meeting of the Fuzzy Information Processing Society (NAFIPS '04), vol. 1, June 2004, pp. 234-237.
[51] A. Petrosino and M. Verde, "P-AFLC: a parallel scalable fuzzy clustering algorithm," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004), vol. 1, Aug. 2004, pp. 809-812, ISSN: 1051-4651, doi:10.1109/ICPR.2004.1334340.
[52] D. T. Anderson, R. H. Luke, and J. M. Keller, "Speedup of Fuzzy Clustering Through Stream Processing on Graphics Processing Units," IEEE Transactions on Fuzzy Systems, vol. 16, iss. 4, Aug. 2008, pp. 1101-1106.


SPEM: A Software Pattern Evaluation Method

J. Kabbedijk, R. van Donselaar, S. Jansen
Department of Information and Computing Sciences
Utrecht University, The Netherlands
{J.Kabbedijk, R.VanDonselaar, Slinger.Jansen}@uu.nl

Abstract—Software architecture makes extensive use of many software patterns. The decision on which pattern to select is complex and architects struggle to make well-advised choices. Decisions are often solely made based on the experience of one architect, lacking quantitative results to support the decision outcome. There is a need for a more structured evaluation of patterns, supporting adequate decision making. This paper proposes a Software Pattern Evaluation Method (SPEM) that enables the quantification of different pattern attributes by using structured focus groups. The method is formed using a design science approach in which an initial method was created using expert interviews, which was later refined using several evaluation sessions. SPEM helps software producing companies in structuring their decision making and selecting the most appropriate patterns. Also, SPEM helps in enriching pattern documentation by providing a way to add quantitative information to pattern descriptions.

Keywords—architectural patterns; quality attributes; software architecture; decision making; pattern evaluation.

I. INTRODUCTION
Modern software architecture heavily relies on the use of many different software patterns, often used complementary to each other in order to solve complex architectural problems. Software architecture provides guidelines and tools for high level system design in which architects select best fitting patterns to be used within the software product [1]. Many different patterns and tactics exist, leading to a complicated trade-off analysis between different solutions and causing the evaluation and selection of the appropriate software patterns to be a complex task [2]. This complexity means architects need to have in-depth understanding of the project characteristics and requirements combined with extensive experience in software development.
The information needed for appropriate pattern selection is seldom available to all architects in a centralized or standardized way. Architectural decisions are frequently made based on experience and personal assessment of one person, instead of using the knowledge of many [3]. Allowing software architects to use all information efficiently saves time when selecting fitting software patterns and leads to better and more adequate decision making. For this to be possible, a method has to be created enabling the evaluation and documentation of crucial attributes of a software pattern [4]. This structured evaluation will allow architects and decision makers to compare different solutions and select the best matching pattern. Patterns, however, are a high-level solution that can be used in different scenarios, making it impossible to use one specific implementation of the pattern to evaluate the entire pattern. Because specific implementations are unusable, the relevant pattern attributes cannot be directly measured in a quantitative way.
Pattern evaluation adds retrospect and the knowledge of many experts to existing pattern documentation. This study also relates to software architecture as it solves a problem found in the software pattern selection process. Software pattern evaluation helps when performing pattern oriented software architecture in cases where alternative patterns solve the same problem and only a single pattern can be selected. This is an important factor to take into account, because it means that rather than selecting individual patterns, an architect will want to select an architectural style, and thus select a large set of patterns that fit this style. This area of software architecture has developed, which resulted in a large amount of documented patterns and allows for comparing architectural styles [5].
Although it seems that comparing individual patterns is less relevant for software architecture, an architectural style is selected at the early stages of software design and cannot easily be changed after the development has started. This creates a problem because, while the software is being developed, the requirements for the project or the environment will change. Therefore it is necessary to extend the architecture or at times alter the existing architecture. At this point it becomes relevant to compare individual patterns in order to select the pattern that fits the project requirements. This is an ongoing process that happens throughout software development and relies on the experience of software architects and developers. Current documentation of software patterns is lacking a way to compare them with each other. But, if multiple patterns tackle the same problem, how does an architect decide which one to use? This is tacit knowledge of experienced architects and developers, leading to the following problem statement: "There is no formal way to express the quality of one pattern over another".
This paper presents the Software Pattern Evaluation Method (SPEM). Using SPEM, software producing companies are supported in pattern selection decision making and are able to quantitatively compare different patterns. SPEM enables them to get an overview of specific pattern characteristics in a timely manner. Also, SPEM can be used to generate a publicly available pattern related body of knowledge, helping researchers and practitioners in architectural research and decision making. This paper first gives an overview of research related to pattern comparison in Section II. The design science approach used in the research is described in Section III, after which SPEM is presented in Section IV. The method evolution, including the initial method creation (Section V-A) and method evaluation (Section V-B), showing the changes made during the method creation process, can be found in Section V. To conclude, the applications of SPEM are discussed (Section VI), followed by a conclusion (Section VII).


II. RELATED WORK
Software Patterns — As software development was maturing in the 1980s, the need arose to share common solutions to recurring problems. This process started out by developers communicating to their colleagues how they solved a recurring development issue. The communication was informal and there were no clear rules for documentation. In later years, software patterns have become an essential part of software development as a way to capture and communicate knowledge. Software patterns are solutions to a recurring problem in a particular context [6], [7]. When properly documented, these solutions are a valuable asset for communication with and among practitioners [8]. Usage of software patterns allows for time and cost reduction in software development projects, making them an important tool for software design and development. Although software patterns started out as a way to communicate solutions among developers, they have become a crucial part of software architecture [9] as well. A pattern selected by a developer, however, does not take into account the entire architecture and how it combines with existing patterns. This problem is solved by selecting patterns at the architectural level.
Architecture Evaluation — Evaluation is commonly used in software architecture in order to increase quality and decrease cost [10]. Many evaluation methods for software architecture have been developed and compared in recent years [11]. The evaluation should be performed as early as possible in order to prevent large scale changes in later stages of development. Software architecture evaluation is linked to the development requirements and desired quality attributes. Therefore, it is not a general evaluation of software architecture, nor an evaluation of a specific implementation. The evaluation should be an indication of whether the proposed architecture is a good fit for the project. Pattern comparison and evaluation has been done before [12] in a quantitative manner, but has focussed on the implementation of different patterns and lacks the evaluation of the idea the pattern describes.

III. RESEARCH APPROACH
This section presents the research questions answered in this paper and the design science approach used to construct SPEM. The main research question (MRQ) answered in this paper is:
MRQ: How can software patterns be evaluated in a manner that is objective and allows for comparison?
The aim of this study is to aid software architects in the decision making process of selecting software patterns. This can only be useful when the evaluation method yields objective results. Since a software pattern cannot be objectively measured in any way, the opinions of multiple software architects are used in the form of scores. A quantitative study also allows for easy comparison between alternative patterns. For the purpose of answering this research question, multiple sub-questions are constructed:
SQ1: Which attributes are relevant in pattern evaluation?
Rationale: Patterns can possess many attributes that give important information on usefulness and quality. For example, how the pattern affects performance or maintainability can both be attributes of a pattern.
Attributes are used in software architecture to evaluate the quality of certain aspects of the architecture. We apply the same principles for the evaluation of software patterns. The first step is to create a list of attributes by looking at related literature. This list is then reduced by performing expert interviews. This tells us which of the listed attributes are important to software architects when evaluating a pattern. A validation of the reduced list of attributes is performed by interviewing a second expert. The result of these interviews is a validated list of attributes relevant in the software pattern evaluation process.
SQ2: How can attributes relevant in pattern evaluation be quantified in a manner that allows for comparison?
Rationale: Typical documentation on software patterns is qualitative in nature. Although this might be suited for documentation on patterns, it does not allow for comparison. For this reason, the different attributes relevant for pattern evaluation need to be quantified. A structured method of quantification that is used for evaluations would allow for patterns to be compared on attribute level.

Fig. 1. Design Science Research Method

To answer this question we first look at comparable methods of quantification within the domain of software engineering. From these methods the specific characteristics are deduced. An example of these characteristics can be the ability to assign a negative value to an attribute.


A method for quantification is constructed based on the list of characteristics. The method is evaluated by using it in a focus group session, after which it can be incrementally improved.
A design science approach is used, which is depicted in Figure 1. An initial method is created based on an earlier exploratory study [13], extended by expert interviews. The method is evaluated in multiple cycles in which the method was put to practice in a real-world setting. Three subsequent sessions are organized in which both professional software architects and software architecture students used the method. Information systems master students can be used as test subjects instead of professional software developers [14]. All sessions were recorded and an evaluation form is filled in by all participants after each session. A revised method was constructed after each session, based on the feedback, which is used as input for the next session. After three sessions no significant changes were needed any more, leading to the creation of the final method (i.e., SPEM).

IV. SPEM - SOFTWARE PATTERN EVALUATION METHOD
SPEM has been constructed to evaluate software patterns in a manner which allows for comparison. There are two distinct roles:
Evaluator — Leads the evaluation process by introducing concepts and directing discussions. Is responsible for timekeeping, collecting all deliverables and noting scores.
Participant — A software architect or developer who uses his knowledge to assign scores to attributes, enters discussion, shares arguments and tries to reach consensus.
The evaluation data is gathered during a focus group session. These sessions vary in duration from one to two hours. Four to twelve participants can partake in the evaluation, excluding the evaluator. The basis of the evaluation are attributes, categorized in both quality attributes and pattern attributes. Quality attributes are used to measure the impact the pattern has on software quality and are based on ISO/IEC 25010 [15]. The following quality attributes (excluding sub-attributes) are used in SPEM:
• Performance efficiency — Degree to which the software product provides appropriate performance, relative to the amount of resources used, under stated conditions.
• Compatibility — The ability of multiple software components to exchange information or to perform their required functions while sharing the same environment.
• Usability — Degree to which the software product can be understood, learned, used and attractive to the user, when used under specified conditions.
• Reliability — Degree to which the software product can maintain a specified level of performance when used under specified conditions.
• Security — The protection of system items from accidental or malicious access, use, modification, destruction, or disclosure.
• Maintainability — Degree to which the software product can be modified. Modifications may include corrections, improvements or adaptation of the software to changes in environment, and in requirements and functional specifications.
• Portability — Degree to which the software product can be transferred from one environment to another.
Pattern attributes are characteristics of the pattern itself, used to measure its learnability or ease of implementation. The goal of the evaluation is to assign a score to each attribute by all participants. The score is a relative measure based on the experience of the participant, ranging from −3 to +3. The score is a generalization of the software pattern, not based on a specific implementation. Experience using the pattern in a variety of situations is expressed by the score. Therefore the difference in experience among all participants is a key factor in the evaluation, which is compensated in a group score. A group score is assigned to each attribute (excluding sub-attributes) and expresses a score after a round of discussion. The discussion of each attribute allows the participants to share their knowledge with each other. The goal of the discussion is to reach consensus, meaning that after knowledge has been shared between participants with different amounts of experience, one score is assigned on which all participants agree. The result is quantitative data in the form of scores based on personal experience and the knowledge of a group, visualized in an evaluation summary (see Figure 3).

Fig. 2. SPEM: Software Pattern Evaluation Method

SPEM consists of four activities and four deliverables, as shown in Figure 2. The first activity focuses on creating participant profiles [16]. These profiles are forms containing fields for the participant's name, job description and years of experience. Additionally, there are input fields for the pattern name and experience with the pattern. Provided with the participant profile is a personal score list containing a list of quality attributes, sub-attributes and pattern attributes. For each item on this list there is a possibility to give a personal score.
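Purely as an illustration of the deliverables named above (the field names are taken from the description; the representation itself is an assumption), a participant profile with its personal score list could be modeled as follows.

from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class ParticipantProfile:
    """Sketch of a SPEM participant profile plus personal score list."""
    name: str
    job_description: str
    years_of_experience: int
    pattern_name: str
    pattern_experience: str
    # One entry per quality attribute, sub-attribute, or pattern attribute;
    # None models the explicit 'no score' option added after the second session.
    personal_scores: Dict[str, Optional[int]] = field(default_factory=dict)

    def assign(self, attribute: str, score: Optional[int]) -> None:
        assert score is None or -3 <= score <= 3   # scores range from -3 to +3
        self.personal_scores[attribute] = score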


Fig. 3. SPEM evaluation summary

The evaluator introduces the method to the participants by explaining each deliverable and the focus group session protocol. In the protocol, all activities, and by whom they are performed, are listed and described. Thereafter, the evaluator asks the participants to fill out the participant profile.
In the second process, personal scores are assigned to an attribute. During the evaluation the scores are recorded in the personal score list. After the evaluation the personal scores are entered in the score table. The score table contains rows with all attributes used in the evaluation and columns containing all personal scores, average scores, standard deviations and group scores. The evaluator introduces an attribute by giving a short description. The participants are then asked to assign a score to the attribute and all corresponding sub-attributes.
In the next phase, a group score is assigned to an attribute and noted in the score table. The group score is a score which is produced by gaining consensus. This means all participants partake in a discussion. The focus of the discussion is to exchange arguments on the score of an attribute. If consensus is reached among all participants, the resulting group score is assigned and noted on the score table. If consensus is not reached, the group score is not assigned and no score will be noted in the score table. The evaluator initiates a discussion on the current attribute by asking a single participant's score and motivation for the score. Other participants are free to respond and exchange views, directed by the evaluator. If the discussion ends or if no time is left, the evaluator asks the participants if they have reached consensus. When consensus is reached, the group score is recorded in the score table.
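As an illustration only (not part of SPEM itself), the score table described above can be represented as a small structure that keeps the personal scores per attribute, derives averages and standard deviations, and stores the consensus group score; the class and method names are assumptions.

import statistics
from typing import Dict, Optional

class ScoreTable:
    """Sketch of the SPEM score table: rows are attributes, columns are personal
    scores (-3..+3, or None for 'no score'), plus average, standard deviation,
    and the group score reached by consensus."""
    def __init__(self, attributes, participants):
        self.rows: Dict[str, Dict[str, Optional[int]]] = {
            a: {p: None for p in participants} for a in attributes}
        self.group: Dict[str, Optional[int]] = {a: None for a in attributes}

    def set_personal(self, attribute, participant, score):
        assert score is None or -3 <= score <= 3
        self.rows[attribute][participant] = score

    def set_group(self, attribute, score):
        # Only recorded when the participants reach consensus during discussion.
        self.group[attribute] = score

    def summary(self, attribute):
        scores = [s for s in self.rows[attribute].values() if s is not None]
        return {
            "average": statistics.mean(scores) if scores else None,
            "stdev": statistics.stdev(scores) if len(scores) > 1 else None,
            "group": self.group[attribute],
        }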
When all attributes have been evaluated, an evaluation summary is created. The evaluation summary is a combination of all participant profiles and a filled out score table. Additionally, a new form is added containing the name of the evaluator, the date and threats to validity. This gives the evaluator the opportunity to note any occurrences that are not expressed in the main deliverables. This process is performed by the evaluator at the end of the focus group session and concludes the evaluation.

V. METHOD EVOLUTION
This section discusses how the initial method evolved and shows the explicit changes made to the method based on the expert evaluation sessions.

A. Initial Method Construction
Expert interviews formed the basis of the initial version of the SPEM method. Two software architects from different companies cooperated to share their views on software pattern evaluation. Understanding which attributes are relevant in pattern evaluation and how they could be quantified was the goal of the interviews. During the interviews a list of quality attributes derived from ISO/IEC 9126 [17] and ISO/IEC 25010 [15] was discussed, the latter being preferred by the interviewees. Although both interviews had different results on the importance of each individual attribute of the standard, none could be excluded. Ease of learning and ease of implementation are both attributes describing characteristics of software patterns. Both these attributes should be included in software pattern evaluation as they play an important part in software pattern selection.
Scenarios are often used in software architecture evaluation, but do not fit pattern evaluation. The fact that patterns are evaluated without a specific implementation in mind makes the use of scenarios irrelevant. A software architect should interpret the results of pattern evaluation by relating them to their own project. When attributes are quantified using a score, it should be possible to assign a negative value. Patterns can affect software quality in a negative way or have negative characteristics, which a score should be able to express. The range of the scores should be between a five and ten point scale. At larger ranges it would be difficult for an architect to assign an accurate score.
When multiple architects perform a pattern evaluation, they are likely to have varying degrees of experience. Experience is key in understanding software patterns and their effect on software quality. It is important to assign a score to an attribute which takes into account the varying degrees of experience software architects have. This should be done using discussion and consensus. In a discussion, those who have more experience can share their knowledge with those who have less experience. Working together towards consensus can improve the level of knowledge of the participants and consequently improve the score. Software pattern evaluation should be performed with at least one architect who has experience using the pattern which is being evaluated.


This restriction makes sure the evaluation yields a valuable result.
Based on these interview results a method was constructed incorporating the following:
• All attributes and sub-attributes from ISO/IEC 25010 [15].
• Two additional attributes: ease of implementation and ease of learning.
• Scoring ranging from −5 to +5.
• Discussion after each attribute.
• The goal of trying to reach consensus on each attribute.

B. Method Evaluation
Using a design science approach, the initial method was evaluated and improved over several iterations. A total of three focus group sessions were hosted to evaluate the method. In these sessions the method was carried out by evaluating a software pattern. Participants were asked to fill out an evaluation form at the end of the focus group session. The feedback gathered in the evaluation forms and the experiences from hosting the sessions were the basis for each new iteration of the SPEM method.
During the first focus group session, four software architects participated, each having over nine years of experience in software development. During this session the Observer pattern was evaluated using the initial version of SPEM. The first attribute, functional suitability, and its corresponding sub-attributes raised many questions. It was not possible to assign a score, as the attribute demanded a specific context. Not having a description for sub-attributes was confusing and diverted discussions to the definitions of certain sub-attributes. Based on the evaluation session, the following improvements were incorporated in the method:
• Removal of attribute 'Functional suitability' — This attribute, including its sub-attributes, turned out to be irrelevant based on the focus group session. Functional suitability can only be assessed by looking at specific implementations.
• Including a description for all sub-attributes — A description of each attribute, including all sub-attributes, was needed. This way different interpretations of attributes can be precluded.
The second focus group session was performed with twelve participating master students. The students have an information systems background and were all enrolled in the Software Architecture course, which prepared the students for the focus group session. The Access Point pattern was evaluated and participants were free to discuss attributes without any intervention from the evaluator. This resulted in lengthy discussions, making the session take longer than anticipated. Discussions should be halted by the evaluator after a certain period of time, based on the time that is available.
Assigning scores to sub-attributes and discussing them was time consuming. Sub-attributes needed a less prominent role in the method. It was not always possible for participants to assign a score to an attribute. Therefore it should be possible to have an explicit option stating that no score is assigned, instead of leaving it empty, which might imply a neutral score. The introduction of the pattern was unclear, leading to discussion and debate. How scores should be assigned and what they represent raised questions during the session. The evaluator should focus more on explaining the meaning of scores and the difference between quality attributes and pattern attributes.
Based on the second evaluation session, the following improvements were incorporated in the method:
• Sub-attributes removed — Because discussion on sub-attributes took too long, they were removed from the method.
• Added an option to give an attribute no score — An explicit way was added for participants to indicate they do not want to give a score to a certain attribute.
• More focus on pattern introduction — The pattern needs to be thoroughly explained to prevent discussions.
• More focus on explaining what the scores represent — Scores represent the impact the evaluated pattern has on software quality or characteristics of the pattern itself. This distinction needs to be clear in order to properly assign scores.
• Stronger role of the evaluator — The evaluator needs to direct the discussions. Apart from initiating discussions, they should also be halted. Time keeping is the responsibility of the evaluator.
The third focus group session was performed with different students from the same group as the second focus group session. Although sub-attributes did not receive a score, they were referred to in discussions to better understand an attribute. Therefore it is important to include the sub-attributes in the method. Discussions for each sub-attribute would increase the time to complete an evaluation substantially. A personal score should be assigned to a sub-attribute, while discussions, and consequently group scores, should be left out.
Based on the third evaluation session, the following improvements were incorporated in the method:
• Sub-attributes added — Can help the understanding of attributes and provide more detail to the data.
• Sub-attribute discussions removed — Gives the sub-attributes a less prominent role in the method and focusses more on attributes.
• Descriptions for each attribute / sub-attribute added to participant profile — Allows the participants to read descriptions of attributes independent of the evaluator.
After the three sessions, the final method (i.e., SPEM) was created.

VI. SPEM IMPLEMENTATION
SPEM is created to evaluate software patterns in general, without a specific implementation in mind. This enables the option for comparison of software patterns, because each pattern has been evaluated as an abstract solution. It prevents unbalanced comparison between patterns based on different implementations. There is a trade-off between easy to compare generic evaluation and implementation specific evaluation. An implementation specific evaluation provides more accurate data, but it can only be compared to evaluated patterns based on the same implementation.


SPEM can be used for implementation-specific evaluation with few adjustments. It requires the evaluator to explain that the scores should be assigned with an implementation in mind. There needs to be an input field describing the implementation on the score table. With these adjustments an evaluation session would be identical to SPEM and allows for use of all processes and deliverables used in SPEM.

This study provides knowledge on software pattern evaluation by introducing a method to evaluate software patterns. The data SPEM evaluations provide further expands the body of knowledge on patterns. It adds a retrospective view to the existing software pattern documentation and provides insight into the impact patterns have on software quality. A collection of SPEM evaluation results provides valuable knowledge on the understanding of software patterns and software quality. A knowledge base would enable the disclosure of SPEM evaluation results and would allow results to be combined and compared. From an industrial perspective, a SPEM knowledge base would enable software architects to share their knowledge on software patterns. It would make knowledge available to aid in software pattern selection, leading to better decision making and overall software quality. It is through sharing knowledge that software pattern selection can reach a higher level of maturity, allowing for a structured way of comparing software patterns.

SPEM uses discussion and consensus to obtain quantitative evaluation data. This method of quantification was introduced to cope with different experience levels among participants. It has imposed a constraint on the method of data gathering used in SPEM. As discussions require interaction between participants, all participants need to be able to communicate with each other at the same time. Therefore SPEM is used in focus group sessions, limiting the number of participants. A trade-off exists between a more accurate score based on consensus with a small number of participants and a less accurate, but more reliable, score with a large number of participants.

VII. CONCLUSION

SPEM is an objective software pattern evaluation method which can be used to compare patterns. It is used to evaluate relevant attributes of patterns based on the experience of software architects. SPEM provides quantitative data on attributes in the form of scores. The data can be interpreted and visualized to allow for software pattern comparison. This answers the main research question (MRQ).

The question “Which attributes are relevant in pattern evaluation?” (SQ1) is answered with a list of attributes, consisting of quality attributes and pattern attributes. The quality attributes are based on ISO/IEC 25010 and modified for pattern evaluation, resulting in the following set of attributes: a) Performance efficiency, b) Compatibility, c) Usability, d) Reliability, e) Security, f) Maintainability, and g) Portability. These attributes can be quantified in a manner that allows for comparison (SQ2) by rating the different attributes by experts in a focus group setting. It requires that personal scores ranging from -3 to +3 are assigned to all attributes and sub-attributes. A group score is assigned to all attributes after a discussion in which consensus is reached. All scores are noted in the score table.

A future step is to perform SPEM sessions to gather data and produce results allowing for software pattern comparisons. Gathering the output of evaluation sessions and adding them to the knowledge base can help to further validate the method and produce valuable knowledge for both academic and industrial purposes.

ACKNOWLEDGMENT

The authors would like to thank Leo van Houwelingen from Exact Software and Raimond Brookman from Info Support for their valuable input. We also want to thank the architect team from Info Support for their cooperation. This research is funded by the NWO ‘Product-as-a-Service’ project.

REFERENCES

[1] L. Bass, P. Clements, and R. Kazman, Software Architecture in Practice, 2/E. Pearson Education India, 1998.
[2] A. Jansen, J. Van Der Ven, P. Avgeriou, and D. K. Hammer, “Tool support for architectural decisions,” in The Working IEEE/IFIP Conference on Software Architecture (WICSA’07). IEEE, 2007, pp. 4–4.
[3] M. A. Babar and I. Gorton, “A tool for managing software architecture knowledge,” in Proceedings of ICSE Workshop on Sharing and Reusing Architectural Knowledge (SHARK). IEEE, 2007, pp. 11–17.
[4] J. Tyree and A. Akerman, “Architecture decisions: Demystifying architecture,” Software, IEEE, vol. 22, no. 2, pp. 19–27, 2005.
[5] G. Booch, “On creating a handbook of software architecture,” in Proceedings of the Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA), vol. 16, no. 20, 2005, pp. 8–8.
[6] F. Buschmann, Pattern oriented software architecture: a system of patterns. Ashish Raut, 1999.
[7] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional, 1995.
[8] K. Beck, R. Crocker, G. Meszaros, J. Vlissides, J. O. Coplien, L. Dominick, and F. Paulisch, “Industrial experience with design patterns,” in Proceedings of the 18th International Conference on Software Engineering. IEEE Computer Society, 1996, pp. 103–114.
[9] F. Buschmann, K. Henney, and D. C. Schmidt, “Past, present, and future trends in software patterns,” IEEE Software, vol. 24, no. 4, pp. 31–37, 2007.
[10] G. Abowd, L. Bass, P. Clements, R. Kazman, and L. Northrop, “Recommended best industrial practice for software architecture evaluation,” DTIC Document, Tech. Rep., 1997.
[11] M. A. Babar, L. Zhu, and R. Jeffery, “A framework for classifying and comparing software architecture evaluation methods,” in Proc. of the Australian Software Engineering Conference. IEEE, 2004, pp. 309–318.
[12] M. Hills, P. Klint, T. Van Der Storm, and J. Vinju, “A case of visitor versus interpreter pattern,” in Objects, Models, Components, Patterns. Springer, 2011, pp. 228–243.
[13] J. Kabbedijk, M. Galster, and S. Jansen, “Focus group report: Evaluating the consequences of applying architectural patterns,” in Proc. of the European Conference on Pattern Languages of Programs (EuroPLoP), 2012.
[14] M. Höst, B. Regnell, and C. Wohlin, “Using students as subjects - a comparative study of students and professionals in lead-time impact assessment,” Empirical Software Engineering, vol. 5, no. 3, pp. 201–214, 2000.
[15] ISO/IEC, ISO/IEC 25010. Systems and software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - System and software quality models, 2010.
[16] R. Donselaar and J. Kabbedijk, [Accessed: 2014-04-08]. [Online]. Available: http://www.staff.science.uu.nl/~kabbe101/PATTERNS2014
[17] ISO/IEC, ISO/IEC 9126. Software engineering – Product quality, 2001.


Generating Java EE 6 Application Layers and Components in JBoss Environment

Gábor Antal, Ádám Zoltán Végh, Vilmos Bilicki Department of Software Engineering University of Szeged Szeged, Hungary {antalg, azvegh, bilickiv}@inf.u-szeged.hu

Abstract—Nowadays, prototype-based development models are representations the model-driven development tools can very important in software development projects, because of automatically produce lower level system models, and in the the tight deadlines and the need for fast development. end of the process they can generate implementations in Prototypes can be very efficient in the analysis, validation and specific programming language. Transitions between the clarification of the requirements during the development different model levels and the generation of the iterations. Model-driven development and code generation are implementation from the models are performed by frequently used techniques to support the fast development of predefined automatic transformations. The model-driven application prototypes. There are many recurring tasks in the development process supports the regeneration of the iterations of a prototype development process, which can be implementation after modifying the system models. performed much faster using high level models and the A very important tool of model-driven development is automatic generation of application component implementations. The goal of this paper is to review model- code generation, which is used in the implementation phase driven development and code generation methods and tools, [3]. The goal of code generation is to automatically create a which can be used to create Java Enterprise Edition 6 runnable implementation of the system in a specific applications. Besides, this paper introduces a new code programming language based on the initial high level system generation tool, which uses the JBoss Forge framework for models. Using this technique the production of prototypes generating application layers and components based on the can be accelerated significantly. The usage of different entities of the application. design patterns can cause a lot of recurring manual implementation tasks, which can be supported with code Keywords- prototype development; model-driven engineering; generation effectively. The generated source code is code generation. supported by expert knowledge, so the maintainability of the code can be higher compared to the manual implementation. I. INTRODUCTION However, the existing code generation solutions have some main disadvantages and deficiencies, which make them There are many software development projects at present unable to use effectively for generating Java EE 6 [4] where it is very hard to collect all the requirements at the applications. Our goal is to introduce a new code generation beginning of the project. The customers can be uncertain or toolkit, which eliminates these drawbacks and generates they cannot visualize the requirements in detail and application prototypes based on Java EE design patterns. precisely. Therefore, the classical waterfall development This paper provides insights on some methods, which can model does not work in these projects. The requirements be used for effective development of Java EE (Enterprise specification and the system plan need more phases to Edition) 6 application prototypes. Section II examines some validate and clarify, which need active communication with related work in the area of code generation and shows the the customer. Therefore, prototype-based development disadvantages of the existing generator toolkits. Section III models can be useful in these cases [1]. 
Using these models presents the JBoss Forge [5] framework, which can generate it is possible to show a functional application prototype to Java EE 6 application prototypes with its Forge Scaffold [6] the customer to validate and clarify the requirements. But extension. Section IV describes some practical very fast prototype implementation is needed for effective methodologies for code generation using some tools from the prototype-based development processes. For this purpose, JBoss Forge framework. Section V introduces a Forge-based model-driven development and code generation methods are code generation toolkit, which eliminates the defects of the used to speed up the implementation. However, it is Forge Scaffold and provides more application components to important to examine and use software design patterns to generate. Section VI presents a case study about the usage of ensure the maintainability of the application. our generator toolkit in a telemedicine project. Section VII The goal of Model-Driven Development (MDD) [2] is to concludes the paper and discusses some interesting problems simplify the system design process and increase development that are targeted to be further studied and improved in the productivity by reusing standard software models and design future. patterns. The developer defines the components of the system with their relations, the structures and relations of the data to be managed, and the use cases and behaviors of the system using high level models. Based on these high level


II. RELATED WORK Faces (JSF), RichFaces, Primefaces. Unfortunately, the During model-driven development processes the high Forge Scaffold plug-in has some defects. level system models are often created using Unified  If the entity layer contains inheritance between Modeling Language (UML) [7], mostly based on class-, entities, the user interfaces of the child class do not sequence-, activity- and state diagrams. The programming contain the fields of the super class. language of the implementation is mostly Java, C# or C++.  If a field has a @ElementCollection annotation, the The GenCode toolkit [8] generates Java classes from UML edit page does not contain any component for class diagrams, which include fields, getter and setter selecting/adding elements to the field. methods, default constructors, but it can generate only stubs  In case of associations between entities, the edit page for non-trivial methods. Other UML-based model-driven contains a dropdown menu for selecting/adding development tools, e.g., Modelio [9], ObjectIF [10], IBM elements with the return value of the toString() Rational Rose [11] and Rhapsody [12], have the same method of the related entity instances. It is not drawback. Some toolkits introduced in [13][14][15] papers possible to select a field as label. can generate non-trivial methods based on sequence- and  List pages do not provide ordering and filtering by activity diagrams. However, these tools need these diagrams multiple criterions. for each non-trivial method and they cannot use generic  The generated code is difficult to maintain. The diagram templates for generating pattern-based applications. business logic is generated into one Bean class per Some MDD tools define their own modeling languages, entity. There is no separate e.g., Acceleo [16] and ActifSource [17], but they do not have (DAO) layer. any metamodels for Java EE code generation. A part of the source code can be also a model. A great IV. CODE GENERATION METHODOLOGIES example for this statement is the entity layer implementation The Forge Scaffold plug-in uses the combination of two of an application, which defines the data structures managed main methodologies for generating source code: template- by the system. The entity implementations can be used to based and procedural code generation. The tools for these generate Create-Read-Update-Delete (CRUD) user interfaces code generation methods are provided by the JBoss Forge for the application. Entity-based code generation tools are, framework, so new code generator plug-ins can be developed e.g., JBoss Forge, seam-gen [18], MyEclipse for Spring [19], using these tools. and OpenXava [20]. In the case of these toolkits, the The concept of template-based code generation generated user interfaces are in some cases incomplete, methodology is to produce source code based on predefined inaccurate, or difficult to use, e.g., inheritance, association templates, which will be rendered depending on the between entities. generation context. These templates consist of two types of parts: static parts, which will be generated into every source III. THE JBOSS FORGE FRAMEWORK code instances; and dynamic parts, which will be replaced One of the most important entity-based code generation depending on the different inputs of the generation process. toolkits is the JBoss Forge framework, which is developed For supporting template-based code generation, JBoss by the JBoss Community. 
The main goal of this tool is to Community provides the Seam Render tool, which is also manage the lifecycle and configuration of Maven projects used by Forge Scaffold plug-in [22]. The syntax of Seam [21] and to support the automatic generation of some Render code templates is quite simple. The static code parts application components with some simple commands. Forge should be specified the same way as they should appear in has a command line interface, which includes most of the the generated code instances. The dynamic code parts should important Linux file manager commands. These commands be indicated by expressions with named objects given in the are extended to read and modify the structure of Java source form @{indicatorExpr}. The template-based code generation files. Besides, there are commands implemented for creating process consists of three phases: and configuring projects, and creating Java Persistence 1. Compiling the template: the specified template is Application Programming Interface (JPA) entities. The processed and the dynamic template parts are Forge commands can be run batched in a script to make the collected with their object names and positions. use of the framework easier. 2. Defining the rendering context: every object names There are many available and downloadable plug-ins for in indicator expressions are mapped to a Java object Forge, extending its functionality. It is also possible to (mostly a String) for replacing every occurrence of develop new plug-ins to support new code generation the proper object name with the value of the object. capabilities. A very useful plug-in included by the 3. Rendering the template with the given context. framework is Forge Scaffold, which can be used to generate This method can be efficiently used in cases when the basic Java EE 6 prototypes with CRUD web user interfaces source code to generate contains small dynamic parts, which based on the JPA entities of the application. The Scaffold can be rendered by simple replacements, e.g., DAO classes, plug-in can be extended with several scaffold providers, Beans for business logic. But, there can also be cases when which can be used to specify the UI component library for the source code to generate contains “highly dynamic” parts, generating layer, e.g., standard Java Server which can be rendered using very complex replacements (e.g., user interfaces) or when a given existing source code


file must be modified in a later phase of the generation process. In these cases the procedural code generation methodology can be useful.

Procedural code generation aims at the production of code elements using libraries for parsing and representing the elements of the specified language. For generating Java source code procedurally, Forge provides the Forge Java Parser library, which can parse Java source code into its own high-level representation, create new code elements (classes, interfaces, fields, methods, etc.) or modify existing elements in the source code [23]. For generating XHTML (eXtensible Hypertext Markup Language) elements of the web user interface, the Metawidget framework can be useful; it contains high-level representations of standard XHTML tags and JSF components [24]. Using these tools, new generator components can be developed to produce highly dynamic code parts and extend existing source code with new elements.

V. FORGE-BASED CODE GENERATION TOOLKIT

For eliminating the defects of Forge Scaffold and extending its functionality with the generation of other application components, we aimed at the development of a new code generation toolkit based on the JBoss Forge framework. We analyzed some of our software development projects to find application layers and components, which contain a large amount of repetitive work that can be replaced with code generation methods to accelerate the development process. We found nine general areas that can be effectively supported with code generation. We examined these areas and their design patterns, collected the knowledge and experience we have in relation to the development of these components and implemented a code generation toolkit for supporting the effective production of them. In the following subsections we discuss these areas in detail.

A. Entity and Validation Management

The entity generator of the Forge framework is a very useful tool but it has some defects:

• It is not possible to generate inheritance between entities.
• It cannot generate enum types with specific values.
• It is not possible to generate custom constructors, hashCode, equals and toString methods based on specific fields.

We implemented these capabilities to extend the functionality of the entity generator. Besides, Forge cannot generate validation annotations for the entity fields, so we extended our toolkit with the possibility to add annotations of the Bean Validation API (Application Programming Interface) to the entities. The generated entities use the Composite Entity design pattern to implement associations between the entities.

B. DAO Layer

The DAO layer of the application is responsible for the operations, which can be made on the entities of the system: inserting, updating, deleting, querying. Using JPA-based EntityManagers it is easy to generalize the DAO layer based on the Data Access Object design pattern. We created a generic DAO superclass for providing simple EntityManager operations, and the entity-dependent DAO child classes extend this superclass to provide queries and entity-specific operations. Therefore, only the child classes should be generated depending on the entity, based on a DAO template (see the illustrative sketch below). We used Hibernate as JPA provider in our generator toolkit.

C. Business Logic Layer

The business logic layer of an application is responsible for establishing the connection between the user interface and the entities (using the DAO layer), and for managing complex business processes and transactions. There are two main components in general, which are used by CRUD user interfaces:

• Data Models: provide the listing of a specific type of entities with paging, ordering and filtering. They use the Session Façade and Value List Handler design patterns.
• Entity Action Beans: provide editing, viewing and deleting of a selected entity instance. They use the Session Façade design pattern.

These business logic components are also easy to generalize using generic superclasses. Only the entity-specific Data Models and Entity Action Beans should be generated depending on the entity, based on code templates. We used Contexts and Dependency Injection (CDI) for context management and dependency injection, and Enterprise JavaBeans (EJB) 3 for transaction management in our generator toolkit.

D. CRUD User Interfaces

The CRUD user interfaces provide simple listing, creating, editing, viewing and deleting capabilities for the application user. Simple UI components (text fields, labels, dropdown menus, etc.) can be inserted using JSF 2 components. Complex UI components (list items, fields with labels, complex selector components, etc.) can be efficiently generalized using JSF 2 composite components, which can simplify the generation of complex user interfaces. These components use the Composite View design pattern. We also eliminated the defects of Forge Scaffold in our user interface generator. The generation of user interface components is performed by procedural code generation and the result is inserted into predefined XHTML page templates.

An example list user interface generated by our toolkit is shown in Figure 1. The layout and content of list items are generated using the structure of the entity classes.

Figure 1. List user interface generated by our toolkit

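Sections V.A and V.B above describe generated entities carrying Bean Validation annotations and a generic DAO superclass offering plain EntityManager operations, with an entity-specific subclass generated per entity from a DAO template. The listing below is a minimal sketch of that structure under our own naming; GenericDao, Patient and PatientDao are illustrative assumptions, not the toolkit's actual generated output, and each class would normally live in its own file.

```java
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.PersistenceContext;
import javax.validation.constraints.NotNull;

// Illustrative JPA entity with a Bean Validation constraint (cf. Section V.A).
@Entity
class Patient {
    @Id @GeneratedValue
    private Long id;

    @NotNull
    private String name;

    public Long getId() { return id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

// Generic superclass with the common EntityManager operations shared by all DAOs.
abstract class GenericDao<T, ID> {
    @PersistenceContext
    protected EntityManager em;

    private final Class<T> entityClass;

    protected GenericDao(Class<T> entityClass) { this.entityClass = entityClass; }

    public T findById(ID id)     { return em.find(entityClass, id); }
    public void insert(T entity) { em.persist(entity); }
    public T update(T entity)    { return em.merge(entity); }
    public void delete(T entity) { em.remove(em.merge(entity)); }

    public List<T> findAll() {
        String jpql = "SELECT e FROM " + entityClass.getSimpleName() + " e";
        return em.createQuery(jpql, entityClass).getResultList();
    }
}

// The entity-dependent part: only this subclass would need to be generated per entity.
class PatientDao extends GenericDao<Patient, Long> {
    PatientDao() { super(Patient.class); }

    List<Patient> findByName(String name) {
        return em.createQuery("SELECT p FROM Patient p WHERE p.name = :name", Patient.class)
                 .setParameter("name", name)
                 .getResultList();
    }
}
```

Because all boilerplate lives in the superclass, a DAO template only has to fill in the entity type and its entity-specific queries, which keeps the generated portion small.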

E. Authentication, Authorization, Auditing The system needs authentication and authorization modules to provide appropriate data security. Users need to be authenticated before using the system and it is very important to check the permissions of the actual user before data access or modification. These modules can be the same in most of the applications. We use the PicketLink [25] security and identity management framework in our generated security module, which provides a general entity layer pattern and JPA-based solutions for identity and role management. In our security module, we also implemented PicketLink-based registration, password change and password reminder components, which can be automatically generated for the given application. The logging of the operations made by application users is also a very important data security aspect, which is called Figure 2. Chart user interface generated by our toolkit data auditing. Our code generation toolkit can produce an auditing module for the application, which is capable to log every data operations with the name of the performer user H. Demo Data Generators and the actual timestamp. The auditing module uses During the development the system needs a lot of demo Hibernate Envers [26] in a combination with our auditing data for testing. Demo data can also make presentation of the solutions. application to the customer easier. Therefore, demo data generator components are needed in the system, which can F. REST Interfaces be very time-consuming to implement in case of large Mobile client applications often connect with a server domain models. Our toolkit can generate demo data application for data querying or uploading. Therefore, both generators for the selected entities in the application. The the client and the server application need interfaces to constraints of entity fields are also taken into consideration communicate with each other. The most frequently used during the generation. It is also possible to configure the methodology of client-server communication is the number of entities to be generated before the production of Representational State Transfer (REST) architecture. For data generators. transferring data between the client and the server, the JavaScript Object Notation (JSON) and eXtensible Markup I. Dashboards Language (XML) formats are often used. Our toolkit can Many applications may need special, customizable user generate common source files for REST-based interfaces, which can show only the most important communication (both for client and server) and REST information for users, with simple and clear visualization service stubs for server application based on DAO methods. elements. Our code generator toolkit can produce dashboard Data objects transferred between server and clients are user interfaces with customizable layout (tiles, which can be implemented using the Data Transfer Object (DTO) design moved or hidden) and content (charts, gauges, switches, pattern. The generated service implementation includes DTO status indicators, etc.). A generated example dashboard is classes based on entities and converters between entities and shown on Figure 3. The dashboard is created using Gridster DTO classes. Our solution uses the RESTEasy library for jQuery-based library [29]. REST implementation and Jackson library for JSON implementation. G. 
Data Visualization on Charts In server applications there can be a lot of numerical data (e.g., measurement data, quantities), which should be visualized on charts to analyze their changes. Our toolkit is capable to generate chart visualization user interfaces for specified numerical data over the changes of a specified entity field (e.g., measurement date). A generated example chart user interface is shown on Figure 2. We use the Flot [27] JavaScript-based library for visualizing charts, FlotJF [28] library for representing Flot objects in Java, and RESTEasy for the REST services, which provides data for charts.

Figure 3. Dashboard user interface generated by our toolkit
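Section F above explains that REST communication relies on Data Transfer Objects generated from entities, converters between entities and DTOs, and REST service stubs built on DAO methods. The following JAX-RS sketch, reusing the hypothetical Patient and PatientDao classes from the earlier sketch, shows the intended shape; the resource path, DTO fields and converter are illustrative assumptions rather than the toolkit's generated code, and dependency injection is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

// Data Transfer Object: the flat representation serialized to JSON for mobile clients.
class PatientDto {
    public Long id;
    public String name;
}

// Converter between the JPA entity and its DTO (entity -> DTO direction shown).
class PatientConverter {
    static PatientDto toDto(Patient entity) {
        PatientDto dto = new PatientDto();
        dto.id = entity.getId();
        dto.name = entity.getName();
        return dto;
    }
}

// REST service stub exposing DAO-backed queries to client applications.
@Path("/patients")
@Produces(MediaType.APPLICATION_JSON)
public class PatientResource {
    private final PatientDao dao = new PatientDao(); // injection omitted for brevity

    @GET
    public List<PatientDto> list() {
        List<PatientDto> result = new ArrayList<PatientDto>();
        for (Patient p : dao.findAll()) {
            result.add(PatientConverter.toDto(p));
        }
        return result;
    }

    @GET
    @Path("{id}")
    public PatientDto get(@PathParam("id") Long id) {
        return PatientConverter.toDto(dao.findById(id));
    }
}
```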


VI. CASE STUDY: DEVELOPMENT OF A TELEMEDICINE APPLICATION

We used our code generation toolkit successfully in a telemedicine project where we had to develop a data collector application for supporting the long distance monitoring of patients after heart surgery. The task of the server application was the collection, storage, and visualization of the incoming data from different sensors, e.g., weight scales, ECG sensors, blood pressure monitors.

When planning the mentioned application we analysed the components to be developed and the results showed that a significant part of the development can be supported with code generating methods. Relying on this, the prototype development of the patient monitoring system with our code generating tool was started. The following components of the application were generated:

• DAO (Data Access Object) and business level layer.
• Listing, editing, visualizing user interfaces.
• Charts for the visualization of ECG measurements and for the representation of the temporal changes of blood pressure and weight.
• Data uploading REST service stubs on the server side and REST client stubs in the mobile client application.

Beyond this, only the refinement of user interfaces and the implementation of REST interfaces were required. Furthermore, for integrating sensors, Android mobile prototypes were developed, whose task was to receive and process the data sent by a single channel ECG sensor, a blood pressure monitor, and a weight scale.

We measured the time requirements of this application with our productivity measurement plug-in for Eclipse IDE, and compared them with the time needed to develop this application manually [30][31]. The measurement results can be examined in Table I. tm means the manual implementation time (in person days), and tcg means the implementation time with code generation support (in person days). The application layers needed about 80% less time on average to implement with code generation support, compared to the manual development.

TABLE I. COMPARISON OF DEVELOPMENT TIME IN MANUAL AND CODE GENERATION SUPPORTED DEVELOPMENT

Layer                                   | tm   | tcg  | tcg/tm (%) | Total time spent (%)
Domain model                            | 0.1  | 0.02 | 20         | 5
DAO layer                               | 3    | 1    | 33         | 10
CRUD user interfaces                    | 10   | 0.2  | 2          | 50
Authentication, authorization, auditing | 20   | 0.2  | 1          | 15
Data visualization                      | 3    | 1    | 33         | 10
System integration interfaces           | 3    | 1    | 33         | 10

VII. CONCLUSION AND FUTURE WORK

This paper reviewed model-driven development and code generation methods and tools for producing Java EE 6 applications, and introduced a code generation toolkit based on the JBoss Forge framework, which can generate many application components based on Java EE design patterns to accelerate the prototype development process. We also used this toolkit in the development of a telemedicine application and measured the efficiency of the code-generation-supported development compared to the manual development. The measurement results show that the development process with code generation needed significantly less time than the manual development.

In the future, we would like to continue the development of our generator toolkit and extend its functionality. We identified the following goals, which we would like to reach in the further development of the toolkit.

• Now, the generator works based on Forge 1.4.3. We would like to upgrade our toolkit to Forge 2.
• During the test phase of the development, unit tests provide the correctness of small, low-level system units. The development of correctly functioning system units can be made more effective by implementing unit tests for given units. We would like to extend our toolkit with the capability to generate unit tests for selected classes.
• An entity-based code generator toolkit can be easier to use if it can process UML class diagrams, which contain the entities of the application. These UML diagrams should be extended with some additional properties, such as JPA and Bean Validation annotations for entity classes and fields. Using these diagrams the Java implementation of entities can be automatically generated. We would like to develop an extension for our code generator toolkit to process extended UML class diagrams of entities as the input of application prototype generation.
• The main goal of our toolkit is to generate web-based server application components. In the future we would like to identify some areas in Android mobile application development, which can be supported with code generation, and implement a generator toolkit for Android applications.
• Many entity fields can have special meanings that imply special criteria during the demo data generation and special validation processes. For example, an entity field with String type can be a first name of a person, an address or a country name. This semantic aspect should be taken into consideration during the data generation and validation of values in entity fields. Our goal is to provide an opportunity to add semantic annotations to entity fields, which are related to ontology definitions of special data structures, and to generate special data generators and validators for the annotated fields (a minimal sketch of such an annotation follows this list).
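The last future-work item above suggests semantic annotations on entity fields to drive demo data generation and validation. A minimal sketch of what such an annotation could look like is given below; both the annotation and the SemanticType enumeration are hypothetical and do not exist in the toolkit.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical semantic categories that could be mapped to ontology definitions.
enum SemanticType { FIRST_NAME, LAST_NAME, ADDRESS, COUNTRY_NAME }

// Hypothetical field annotation; a demo data generator could inspect it via
// reflection and produce realistic first names instead of arbitrary strings.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Semantic {
    SemanticType value();
    String ontology() default ""; // optional reference to an ontology definition
}

// Example usage on an entity field.
class Person {
    @Semantic(SemanticType.FIRST_NAME)
    private String firstName;
}
```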


ACKNOWLEDGMENT

The publication is supported by the European Union and co-funded by the European Social Fund. Project title: “Telemedicine-focused research activities on the field of Mathematics, Informatics and Medical sciences”. Project number: TÁMOP-4.2.2.A-11/1/KONV-2012-0073.

REFERENCES

[1] M. F. Smith, Software prototyping: adoption, practice, and management. McGraw-Hill, 1991.
[2] B. Sami, Model-Driven Software Development. Springer, 2005.
[3] J. Herrington, Code Generation in Action. Manning Publications Co., 2003.
[4] http://docs.oracle.com/javaee/, [retrieved: 2014.04.11].
[5] http://forge.jboss.org/, [retrieved: 2014.04.11].
[6] http://forge.jboss.org/docs/important_plugins/ui-scaffolding.html, [retrieved: 2014.04.11].
[7] J. Rumbaugh, I. Jacobson, and G. Booch, The Unified Modeling Language Reference Guide, Second Edition. Addison-Wesley Professional, 2004.
[8] A. G. Parada, E. Siegert, and L. B. de Brisolara, “GenCode: A tool for generation of Java code from UML class models”, 26th South Symposium on Microelectronics (SIM 2011), Novo Hamburgo, Brazil, pp. 173-176.
[9] http://www.modeliosoft.com/, [retrieved: 2014.04.11].
[10] http://www.microtool.de/objectif/en/, [retrieved: 2014.04.11].
[11] http://www-03.ibm.com/software/products/us/en/ratirosefami/, [retrieved: 2014.04.11].
[12] http://www-03.ibm.com/software/products/us/en/ratirhapfami/, [retrieved: 2014.04.11].
[13] A. G. Parada, E. Siegert, and L. B. de Brisolara, “Generating Java code from UML Class and Sequence Diagrams”, Computing System Engineering (SBESC), 2011 Brazilian Symposium, Florianopolis, Brazil, pp. 99-101.
[14] M. Usman and A. Nadeem, “Automatic Generation of Java Code from UML Diagrams using UJECTOR”, in International Journal of Software Engineering and Its Applications, vol. 3, no. 2, 2009, pp. 21-38.
[15] E. B. Oma, B. Brahim, and G. Taoufiq, “Automatic code generation by model transformation from sequence diagram of system’s internal behavior”, in International Journal of Computer and Information Technology, vol. 1, issue 2, 2012, pp. 129-146.
[16] http://www.eclipse.org/acceleo/, [retrieved: 2014.04.11].
[17] http://www.actifsource.com/, [retrieved: 2014.04.11].
[18] http://docs.jboss.org/seam/2.3.1.Final/reference/html/gettingstarted.html, [retrieved: 2014.04.11].
[19] http://www.myeclipseide.com/me4s/, [retrieved: 2014.04.11].
[20] http://openxava.org/home, [retrieved: 2014.04.11].
[21] http://maven.apache.org/, [retrieved: 2014.04.11].
[22] https://github.com/seam/render, [retrieved: 2014.04.11].
[23] https://github.com/forge/java-parser, [retrieved: 2014.04.11].
[24] http://metawidget.org/, [retrieved: 2014.04.11].
[25] http://picketlink.org/, [retrieved: 2014.04.11].
[26] http://envers.jboss.org/, [retrieved: 2014.04.11].
[27] http://www.flotcharts.org/, [retrieved: 2014.04.11].
[28] https://github.com/dunse/FlotJF, [retrieved: 2014.04.11].
[29] http://gridster.net/, [retrieved: 2014.04.11].
[30] G. Kakuja-Toth, A. Z. Vegh, A. Beszedes, and T. Gyimothy, “Adding process metrics to enhance modification complexity prediction”, Proceedings of the 2011 IEEE 19th International Conference on Program Comprehension (ICPC 2011), Kingston (ON), pp. 201-204.
[31] G. Kakuja-Toth, A. Z. Vegh, A. Beszedes, L. Schrettner, T. Gergely, and T. Gyimothy, “Adjusting effort estimation using micro-productivity profiles”, 12th Symposium on Programming Languages and Software Tools (SPLST'11), Tallinn, Estonia, 5-7 October 2011, pp. 207-218.


Refinement Patterns for an Incremental Construction of Class Diagrams

Boulbaba Ben Ammar * and Mohamed Tahar Bhiri + Faculty of Sciences of Sfax, Sfax University, Sfax, Tunisia *[email protected] [email protected]

Abstract —Specifying complex systems is a difficult task, which application areas as reactive systems [17][26][27] and the cannot be done in one step. In the framework of formal vertical refinement under B. The horizontal refinement [17] methods, refinement is a key feature to incrementally develop consists in introducing new details in an existing model. more and more detailed models, preserving correctness in each Each introduction of details to a model leads to a new model, step. Our objective is an incremental development, using the which must be coherent with the previous one. If it is the technique of refinement with proof for UML specifications. case, we said that the second model refines the first one. Indeed, UML suffers from two major weaknesses, namely, it is Horizontal refinement aims at the progressive acquisition, by not based on a simple and rigorous mathematical foundation successive refinement, of a coherent model from abstract and it does not support the concept of refinement with proof of specifications, leading to the abstract formal specification of correction. To achieve this, we advocate a development future software or system. framework combining the semi-formal features of UML/OCL and the formal one from B method. We chose the B formal The vertical refinement [17] consists in going from an language in order to benefit from existing work done on abstract model to a more concrete one, for example by coupling between UML and B. In addition, we propose and reducing the non-determinism. The concrete model is a formalize in B the refinement patterns that promote realization of the abstract model. For example, the B incremental development with proof of UML/OCL class Automatic Refinement Tool (BART) [1] tool associated with diagrams. We illustrate our purpose by the description of some B offers refinement rules that can be used in the final stages development steps of an access control system. of vertical refinement phase of a formal process development. B and Event-B do not distinguish between Keywords-UML; OCL; refinement pattern; class diagram horizontal and vertical refinement. In fact, both refinements use two types of refinement allowed by B, i.e., data I. INTRODUCTION refinement and algorithms or control refinement [18]. In the following, we briefly present in Section 2 the Refinement is a process to transform an abstract existing approach for construction of class diagrams. The specification into a concrete one [17]. It aims to develop proposed approach is presented in Section 3. In Section 4, we systems incrementally, which are correct by construction present our catalog of refinement patterns used in an [18]. Refinement is defined in a rigorous way in various incremental specification development process. In Section 5, formal languages, such as B [18], Event-B [17], we illustrate the use of the proposed approach using an Communicating Sequential Processes (CSP) [8], Z and access control case study. Section 6 concludes this paper and Object-Z [14]. The Unified Modeling Language (UML) [19] proposes perspectives. is an object-oriented modeling language widely used. It is a de-facto standard, allowing graphical visualization of models II. RELATED WORK facilitating communication inter-actors. But, it does not UML is a graphical modeling language reference, support the concept of refinement. It has a dependency offering an important range of diagrams. 
Class diagram, relationship stereotyped «refine» to connect a client (or which can express the static aspects of a system, are one of refined concrete element) to a provider (abstract element). This relationship is subject to several interpretations [15] and the most used diagrams. At the "heart" of the object modeling, it shows the classes and interfaces of a system and does not provide methodological assistance related to how to the various relationships between them. Approaches to refine existing UML models. In addition, UML does not construction and verification of class diagrams have been allow the verification of a refinement relationship between highlighted in several studies [4][7][10][13][20][21]. two models. In a formal language, such as B, Event-B, CSP, The decomposition unit of object-oriented systems is the Z and Object-Z, although the refinement relationship is well- concept of class. A coarse characterization of classes makes defined and well-supported (generation of proof obligations, interactive prover, model checker and animator), an a distinction between the analysis classes belonging to the space of problems (external world in modeling course) and experienced designer finds more or less important difficulties design classes and implementation belonging to the space of in identifying the different levels of abstraction (an “optimal” solutions. refinement strategy) for carrying out the process of Methodological works supporting the identification of refinement. Solutions based on the concept of pattern --like the useful and relevant classes were completed. A method the design patterns in Object-Oriented (OO) applications-- to known as “Underline the names in the document of the guide the designer during the refinement process begin to appear covering both the horizontal refinement for requirements” is proposed in [13]. The results of the application of this method are very sensitive to the used


style. This can lead designers to omit useful classes while Example and See also. In addition, the proposed refinement introducing classes, which are not justified. patterns are formalized into B specifications, using Class, Responsibility, Collaboration (CRC) cards [20] are systematic rules of translation of UML into B [11][16]. This paper cards on which the designers evoke the potential helps to identify precisely the conditions of applicability, the classes according to their responsibilities and in the way in evolution of a UML class diagram and correctness of the which they communicate. This technique promotes refinement relationship. Such B formalization can be reused interaction within teams but its contribution to the with advantage when instantiating these patterns by the identification of quality classes is uncertain. designer. Thus, in a joint development UML/B, the designer Meyer [7] provides the general heuristics for classes selects and applies a refinement pattern on his abstract discovery, based on the theory of Abstract Data Types specification. Then, he obtains a new specification, which (ADT). He defines different uses of inheritance justifying includes new properties related to the application of the their uses. refinement pattern. Verification of the correctness of the The Business Object Notation (BON) method [21] refinement relation between two specifications is entrusted to introduces useful advices to identify the classes. It proposes Atelier B tool [30]. two structural concepts (cluster and class), two strong In the following, we detail a new approach used for conceptual relations (customer and inheritance) and a simple development of UML class diagrams guided by refinement language of assertions expressing the semantic properties patterns. Such development process allows establishing a attached to the modeled elements. UML class diagrams, which models the key concepts of the Many analysts and designers use only the class diagrams. application and has properties considered to be coherent Others use development process allowing scheduling of covering the constraints of the application resulting from its several types of UML diagrams. Some development process specifications. The process advocated has four steps: with UML adopts an approach based on use case diagrams in Rewriting of requirements, Refinement strategy, Abstract order to draw class diagrams [28]. specification, and Refinement steps. The design classes represent the architectural abstractions, which facilitate the production of elegant and A. Rewriting of requirements extensible software structures. Design patterns, including Currently, the specifications are often of poor quality. those of Gang of Four (GoF) [10] promote the identification Abrial [27] criticize these specifications to be directed too of these classes. towards a solution and to present mechanisms of realization Semi-formal graphical notations (such as UML) are to the detriment of the explicitness of the properties of the generally intuitive, but do not allow rigorous reasoning. On system to be conceived. We recommend to rewrite the contrary, formal notations (such as B) provide mathematical specifications in order to put forward the properties of the proofs, but are not easy to understand. Several studies future system and to facilitate the development of a suitable coupling between semi-formal and formal notations exist. strategy of refinement. 
For that purpose, we use the Among these works, we studied profitably those linking recommendations of Abrial [17][27] for distinguishing the UML to B [16]. Works of coupling between UML and B go functional safety and liveness properties. to the combination of UML and B in a new language named B. Refinement strategy UML-B [9]. By using the technique of refinement, the approach Rewriting of requirements facilitates the development of described by Ben Ammar et al. [4] allows of a UML/OCL an adequate strategy of refinement. But, this does not class diagram showing all the formal properties of the future guarantee obtaining an “optimal” strategy of refinement. system. The obtained class diagram, containing the analysis Work making it possible to compare alternative strategies of classes, represents a coherent abstract model of the future refinement for a given scope of application, in case of the system. Such a model can be concretized (identification of reactive systems, starts to appear [17][27]. design and implementation classes) by applying the C. Abstract specification technique of refinement. This stage aims to establish an abstract UML/OCL model III. APPROACH TO DESIGN A SYSTEM IN A STEP -WISE described by a class diagram based on the refinement MANNER strategy previously defined. The UML/OCL class diagram product is translated into B in order to formally verify its Our approach consists on the proposal of a catalogue of coherence. refinement patterns (see Section IV) for incremental development of UML class diagrams. These patterns are D. Refinement steps built to solve recurrent problems in the development of the The refinement process involves several steps. Each static part of an OO application, such as: introduction of an refinement step takes as inputs three parameters: the class intermediate class [3], reification of an attribute, an diagram of level i, the proposed catalog of refinement enrichment of an association, decomposition of an aggregate patterns and the properties resulting from the specifications and the introduction of a new entity. These patterns are to be taken into account and produce as output the class characterized by a precise framework composed of six parts diagram of level i+1 (see Figure 1). showing the fundamental aspects of a refinement pattern. These parts are Intention, Motivation, Solution, Verification,


relationships. This subsequently facilitates the introduction of details via intermediate classes to refine these abstract relations. 3) Solution (1)

(2) Figure 1. Step of refinement

The consideration of the properties, which guide the Figure 2. Class_Helper specification process of refinement, can be realized by applying 4) See also refinement patterns. The formal verification of the Class_Helper [3] the pattern depends on the nature of the correctness of the refinement step is entrusted to the AtelierB relationship between two important classes P1 and P2: tool through the translation of two class diagrams in two B generalization, association, aggregation, composition and levels. The gluing invariant in B models making a link dependency. In addition, the intermediate class introduced between these two levels (abstract and refined) can be Helper can be connected to P1 and P2 using the same kind of established by reusing the B formalization of proposed relationship or two relations of different nature. refinement patterns. The refinement process terminates when Class_Helper the pattern can be applied in reverse order of all the explicit properties in the specification are taken into the concrete to the abstract. This process of abstraction - as account in accordance with the adopted refinement strategy. opposed to refinement - can be profitably used in an activity Thus, ultimate UML/OCL class diagram obtained models the of reverse engineering. key concepts of the system to achieve. In addition, it contains B. Pattern 2: Class_Attribute the essential properties deemed formally consistent. 1) Intention IV. REFINEMENT PATTERNS When a class has an attribute modeling a concept Unlike architecture patterns [12], analysis patterns [22] considered interesting and with well-defined operations, this and design patterns [10] a refinement pattern, has a dynamic pattern allows to reify this attribute in a new class called character. Applied to a model of level i, a refinement pattern Class_Attribute. An aggregation relationship is introduced produces a model of level i+1. Recently, refinement patterns between the enclosing class --aggregate-- and the class begin to appear for formalisms, such as Event-B [26], KAOS reifying the concerned attribute --component--. [2] and B [1]. In [3], we offer refinement patterns to solve 2) Motivation recurring problems in incremental development of the static For reasons of simplification, at a high level of part of an OO application using UML/OCL. A refinement abstraction, a concept can be modeled as an attribute. Then, pattern has two parts: Specification (1) and Refinement (2). according to details from the specifications, the same A specification describes the UML/OCL class diagram of concept can be retained as a class. This is justified by the level i. A refinement describes the UML/OCL class diagram identification of well-defined operations applicable on this of level i+1 produced by applying the corresponding concept by analyzing the introduced details. The type of the refinement pattern on the model of level i. The proposed attribute is rather discrete: integer, enumerated or refinement patterns, presented later, are described in the alphanumeric. same framework including six parts: Intention, Motivation, 3) Solution Solution, Verification, Example and See Also. In the following, we detail the patterns only by the (1) Intention, Motivation, Solution and See also.

A. Pattern 1: Class_Helper 1) Intention It allows introducing a class Class_Helper between two classes considered important with respect to the refinement (2) step considered. The direct relationship between the two major classes is refined by a path connecting these two classes through the intermediate class introduced. 2) Motivation Figure 3. Class_Attribute specification UML class diagram consists of four types of inter-class relationships: generalization (or inheritance), association, 4) See also aggregation and dependence. In an incremental OO The idea of reification of an attribute may be used with modeling, it is advantageous to start with abstract inter-class advantage in an activity of restructuring (or refactoring) of an existing OO models. Moreover, in [5], we proposed a


refactoring schema based on the reification of an attribute: 3) Solution introduction of the concept of delegation. (1)

C. Pattern 3: Class_Decomposition 1) Intention It allows detailing the responsibilities of an original class (2) 1 1 by introducing new classes. The original class and the resulting classes are connected by relations of generalization Figure 5. Class_NewEntity specification (inheritance). The number of the resulting classes is at least 4) See also equal to one. This pattern favors a top-down modeling. The On the form, the two patterns Class_Helper and generalization covers mainly the following two cases [7]: Class_NewEntity produce similar effects. But onthe content, • Heritage subtype: You are an external model they differ. In fact, they have two different gluing invariants. system in which a class of objects (external) can be In addition, the pattern Class_NewEntity is oriented towards decomposed into disjoint sub categories. We urge that the horizontal refinement (specification stage) encouraging the parent, A, be deferred so that it describes a set of the construction step-by-step of a business model of the objects not fully specified. The heir B can be effective application, while the pattern Class_Helper is oriented or delayed. towards the vertical refinement (design stage) promoting the • Restriction inheritance: Inheritance of restriction gradual construction of a conceptual model of the applies if the instances of B are among the instances of application. A, those that satisfy a constraint expressed in the invariant B and absent from the invariant A. A and B E. Pattern 5: Refinement_Operation should both be deferred or both effective. 1) Intention 2) Motivation This pattern provides a passage from an abstract In a top-down modeling approach, it is advantageous to specification of operation into a more concrete one. It is start with a minimum number of classes. Sometimes we inspired by formal development practices used in B method. think that factoring operations can cause problems with 2) Motivation implementation. B method allows several types of refinement: data 3) Solution refinement, control refinement and algorithmic refinement. (1) In control refinement, the following facts are observed: • the operation to be refined retains the same signature,

• its precondition can be strengthened,

• the nondeterministic behavior, described by (2) substitutions, can be reduced. Figure 4. Class_Decomposition specification In UML framework, we can describe the control refinement using Object Constraint Language (OCL) 4) See also notations for presenting both abstract and concrete The pattern Class_Decomposition introduces the idea of specification of operation to be refined. a decomposition of a class via inheritance relationship. Both 3) Solution UML relationships: aggregation and composition can be used for the decomposition of an aggregate entity modeled by UML class. (1)

D. Pattern 4: Class_NewEntity

1) Intention It allows the introduction of UML class, which models a separate entity. The introduced class is related to other (2) classes from abstract level through association relationships.

2) Motivation In an incremental OO modeling, it is advantageous to Figure 6. Refinement_operation specification start with a minimum number of entities called very abstract 4) See also entities. This further promotes the introduction of details The pattern Refinement_Operation introduced the idea of through less abstract or concrete entities, called (e.g., control refinement. In the same way, we can define a pattern equipment) to go from the abstract world to the concrete of data refinement. This allows the introduction of concrete world. variables (data). In this case, a gluing invariant, which links abstract and concrete variables, should be explained. Both control and data refinement are not mutually exclusive; they can be operated in the same refinement step. It is obvious


F. Pattern 6: Class_Abstraction

1) Intention
The pattern Class_Abstraction introduces software qualities, such as efficiency, reusability and scalability, into a software development guided by successive refinements. It allows factoring the common properties - attributes, operations and relationships - of some classes within a founding class.

2) Motivation
In the development process, each entity is modeled by a class. But often, classes that are, in fact, variations of the same concept are encountered: several classes of a class diagram have common characteristics. It is said that these independent classes can be derived from a common ancestor. The idea is to improve the modeling, obtain a better representation, facilitate data storage and thus avoid redundant features. For that, we can factor these common features of the different classes into a new founding class.

3) Solution
Figure 7. Class_Abstraction specification

4) See also
The kind of change introduced by this pattern can be used with advantage in a refactoring process in order to improve the structure (or quality) of existing OO software. A refinement process with proof rather favors obtaining correct-by-construction software. The pattern Class_Abstraction advocates the inclusion of other software qualities, such as efficiency and scalability, from the initial phases of a development process guided by successive refinements. Besides, the risk of overlooking the efficiency quality in a refinement process with proof is mentioned in [24].

G. Pattern 7: Class_Association

1) Intention
This pattern can increase the power of an association by considering it both as an association relationship and as an association class. This can be justified by the emergence of details specific to the association relationship: such details cannot be attached to the extremities of the association. An association class can only exist if the association relationship exists.

2) Motivation
Sometimes, an association must own properties. These properties cannot be attached to the extremities of this association.

3) Solution
Figure 8. Class_Association specification

4) See also
The specification part of the Class_Helper pattern is identical to that of the Class_Association pattern. However, their refinement parts are different.

V. EXAMPLE

Our objective is to develop a system to control the access of persons to the various buildings of a workplace, inspired by [17]. In [17], this application is modeled in Event-B. In this work, we provide a joint development in UML/OCL and B of this application by using the proposed refinement patterns. Proof tools and animation associated with B are used to perform automated verification of the UML/OCL graphical models. Control is carried out from the authorizations assigned to the concerned persons. An authorization allows a person, under the control of the system, to enter some buildings and not others. The authorizations are permanent, i.e., they cannot be modified during the operation of the system. When a person is inside a building, his exit must also be controlled so that it is possible to know, at any moment, who is in a given building. A person can move from one building to another only if these two buildings are interconnected. The communication between the buildings is done through one-way doors. Each door has an origin building and a destination building. A person may enter a building by crossing a door only if that door is unlocked. The doors are physically locked; a door is unlocked for only one authorized person requiring entry to the building. A green LED associated with each door is lit when the requested access is authorized, a prerequisite for unlocking the door. Similarly, a red LED associated with each door is lit when the requested access is denied. Each person has a magnetic card. Card readers are installed at each door to read the information on a card. Near each reader, there is a turnstile that is normally blocked; no one can cross it without the control of the system. Each turnstile is equipped with a clock, which determines in part its behavior.

A. Rewriting of requirements
Rewriting the requirements of the case study aims to highlight the properties of this application. In order to classify these requirements, we used the following labels:
• EQU-Equipment to reference the description of equipment used by the application.


• FUN-Equipment/Actor to reference a functionality attached to a piece of equipment or an actor.
• MODELE-FUN-number to reference a functionality assured by the application.
• FUN-MODELE to reference the main function of the application.
In Table I, we list the different requirements of the building access control application. Each property is described by a relatively short text and a reference.

TABLE I. REWRITING OF REQUIREMENTS OF AN ACCESS CONTROL SYSTEM

FUN-MODELE: The system is responsible for controlling access of a number of people to several buildings.
MODELE-FUN-1: Each person is allowed to enter certain buildings (and not others). Buildings not recorded in this authorization are implicitly prohibited. This is a permanent assignment.
MODELE-FUN-2: Any person in a building is allowed to be there.
MODELE-FUN-3: The geometry of the buildings is used to define which buildings can communicate with each other and in what direction.
MODELE-FUN-4: A building does not communicate with itself.
MODELE-FUN-5: A person can move from the building where he is to the building where he wants to go only if these two buildings communicate with each other.
MODELE-FUN-6: Any person authorized to be in a building should be allowed to go to another building that communicates with the first.
EQU-DOOR: The buildings are connected together by means of doors, which are one-way. We can therefore speak of an origin building and a destination building for each door.
FUN-DOOR-1: A door cannot be crossed unless it is unlocked. A door can be unlocked for only one person at a time. Conversely, a person involved in the unlocking of a door cannot be involved in the unlocking of another one.
FUN-DOOR-2: When a door is unlocked for a certain person, that person is in the building behind the door in question. In addition, this person is allowed to go to the destination building of the same door.
FUN-PERSON: A person must appear before a door in order to move from one building to another.
EQU-GREENLIGHT: A green LED is associated with each door.
FUN-GREENLIGHT: The green LED of a door is lit when the requested access is allowed (prerequisite for unlocking the door).
EQU-REDLIGHT: A red LED is associated with each door.
FUN-REDLIGHT: The red LED of a door is lit when the requested access is denied.
FUN-LIGHT: The red and green LEDs of the same door cannot be turned on simultaneously.
EQU-CARD: Each person has a magnetic card that contains his permissions for the different buildings.
EQU-CARDREADER: Card readers are installed at each door to read the information on a card.

B. Refinement strategy
Table II specifies the order in which the properties and requirements of our case study, building access control, are taken into account. This defines our refinement strategy for the incremental development of this application. The initial model is limited to the basic abstract properties of the application. Each refinement step includes a small number of properties, going from the abstract to the concrete. The refinement process ends when all properties from the rewriting of requirements have indeed been taken into account. Equipment that the application uses, such as doors, cards or LEDs, is introduced during the final stages of the adopted refinement process.

C. Abstract specification
We begin by developing a simple and very abstract class diagram that takes into account only the properties (FUN-MODELE, MODELE-FUN-2) (see Figure 9).

Figure 9. Initial class diagram

TABLE II. REFINEMENT STRATEGY

Model    | Equipment and Function
Initial  | FUN-MODELE, MODELE-FUN-2
First    | MODELE-FUN-1
Second   | MODELE-FUN-3, MODELE-FUN-4, MODELE-FUN-5, MODELE-FUN-6
Third    | EQU-DOOR, FUN-DOOR-2
Fourth   | FUN-DOOR-1, FUN-PERSON
Fifth    | EQU-GREENLIGHT, FUN-GREENLIGHT, EQU-REDLIGHT, FUN-REDLIGHT
Sixth    | FUN-LIGHT
Seventh  | EQU-CARD, EQU-CARDREADER



D. Refinement steps

1) First refinement
In this step, we consider only the property (MODELE-FUN-1). This property indicates that the authorizations provided by the association "authorization" are permanent. For that, we applied data refinement based on the pattern Refinement_Operation. This refinement consists in the addition of a frozen constraint to the association "authorization" presented in Figure 9.

2) Second refinement
In this step, we inject into our system the properties (MODELE-FUN-3, MODELE-FUN-4, MODELE-FUN-5 and MODELE-FUN-6). These properties allow the introduction of the concept of communication between buildings: a person can move from one building to another only if the two buildings are interconnected. The association "communication" is introduced into the class diagram as a recursive association on the class Building (see Figure 10). Such a refinement requires a rewriting of the OCL expressions of the operation "pass". Thus, we reused the refinement pattern Refinement_Operation.

Figure 10. Second refinement

3) Third refinement
The third refinement consists in adding new equipment (EQU-DOOR and FUN-DOOR-2). A door makes the connection between two buildings. This leads us to define the concept of door: each door has an origin building and a destination building. Indeed, the communication between the buildings goes through a door. Thus, we must remove the association "communication", introduced in the previous refinement, and replace it with two associations, one between the origin Building and Door and one between Door and the destination Building. This change can be obtained by applying the refinement pattern Class_Helper with: communication as association; origin as association1; destination as association2; Door as Helper; and Building as both P1 and P2. Finally, the property (FUN-DOOR-2) states that a door is a component of a building. Thus, we introduce a composition relationship between Door and Building (see Figure 11).

Figure 11. Third refinement

4) Fourth refinement
The fourth refinement step consists in defining the functionality of the class Door introduced in the previous step (FUN-DOOR-1). Such a transformation requires the revision of the semantics of the operation "pass". Indeed, property (FUN-PERSON) states that a person must appear before a door to move from one building to another. This requires the introduction of an association "acceptance" between Person and Door. Thus, the operation "pass" should no longer take an instance of the class Building as formal parameter but rather an instance of the class Door. For this, we apply the refinement pattern Refinement_Operation to generate the class diagram presented in Figure 12.

Figure 12. Fourth refinement

5) Fifth refinement
In this step, we consider the properties (EQU-GREENLIGHT, FUN-GREENLIGHT, EQU-REDLIGHT and FUN-REDLIGHT). These properties define two new classes, with their characteristics, as components of the class Door. The application of the refinement pattern Class_Decomposition, with composition as the relationship between Door and its components RedLight and GreenLight, generates the class diagram shown in Figure 13.

Figure 13. Fifth refinement
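As a rough illustration only (this sketch is not part of the paper's UML/OCL models), the composition introduced by Class_Decomposition in this fifth refinement can be transposed to executable code as follows; the class names mirror the diagram, while the mutual-exclusion check anticipates requirement FUN-LIGHT:

    # Illustrative sketch (not from the paper): Door owns its two LED components,
    # as introduced by the Class_Decomposition refinement with composition.

    class GreenLight:
        def __init__(self):
            self.lit = False

    class RedLight:
        def __init__(self):
            self.lit = False

    class Door:
        def __init__(self, origin, destination):
            self.origin = origin            # origin Building (EQU-DOOR)
            self.destination = destination  # destination Building (EQU-DOOR)
            # Composition: the lights are created and destroyed with their door.
            self.green = GreenLight()
            self.red = RedLight()

        def invariant(self):
            # FUN-LIGHT: the red and green LEDs of one door are never lit together.
            return not (self.green.lit and self.red.lit)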

6) Sixth refinement
In this step, we note the similarity between the two classes RedLight and GreenLight (FUN-LIGHT). Thus, we decided to factor the properties common to these two classes. The application of the refinement pattern Class_Abstraction generates the class diagram shown in Figure 14: it introduces a new class named Light, which groups the properties common to GreenLight and RedLight.

Figure 14. Sixth refinement

7) Seventh refinement
The last properties (EQU-CARD and EQU-CARDREADER) are taken into account in this final refinement stage. Two intermediate classes are introduced:
• the class Card, associated with each person,
• the class CardReader, associated with each door of a building.
These classes are related as follows: Card is connected to Person, Card is connected to CardReader, and CardReader is connected to Door. The application of the refinement pattern Class_Helper on the association "acceptance" generates the class diagram of Figure 15.

Figure 15. Seventh refinement

E. Verification of the obtained system
Formal verification of such refinements can be exploited if the language is equipped with formal refinement machinery, allowing the proof of the correctness of the refined specification relative to the abstract one. We proposed to use B for this purpose, using systematic derivation rules from UML into B. Such a translation of UML into B profitably uses the B formalization of the proposed refinements [6]. Indeed, the properties described in the form of B invariants --including gluing invariants-- are retrieved and instantiated when translating UML into B. As an illustration, Figures 16 and 17 show the B formalization of the pattern Class_Helper introduced in Section IV.

    MACHINE
        B_Class_Helper_a
    SETS
        OBJECTS = {p11, p12, p13, p21, p22, p23, h1, h2, h3}
    ABSTRACT_CONSTANTS
        P1, P2
    PROPERTIES
        P1 ⊆ OBJECTS ∧ P2 ⊆ OBJECTS ∧
        P1 ∩ P2 = Ø ∧
        P1 = {p11, p12, p13} ∧
        P2 = {p21, p22, p23}
    VARIABLES
        p1, p2, association
    INVARIANT
        p1 ⊆ P1 ∧ p2 ⊆ P2 ∧
        association ∈ p1 ↔ p2
    INITIALISATION
        p1 := {p11, p12, p13} ||
        p2 := {p21, p22, p23} ||
        association := {p11↦p21, p11↦p22, p11↦p23,
                        p12↦p21, p12↦p22, p12↦p23,
                        p13↦p21, p13↦p22, p13↦p23}
    END

Figure 16. B formalization of pattern Class_Helper (abstract machine).

    REFINEMENT
        B_Class_Helper_r
    REFINES
        B_Class_Helper_a
    ABSTRACT_CONSTANTS
        HELPER
    PROPERTIES
        HELPER ⊆ OBJECTS ∧ HELPER ∩ P1 = Ø ∧
        HELPER ∩ P2 = Ø ∧ HELPER = {h1, h2, h3}
    ABSTRACT_VARIABLES
        p1, p2, helper, association1, association2
    INVARIANT
        helper ⊆ HELPER ∧
        association1 ∈ p1 ↔ helper ∧
        association2 ∈ helper ↔ p2 ∧
        ran(association1) = dom(association2) ∧
        /* Gluing Invariant */
        dom(association) = dom(association1) ∧
        ran(association) = ran(association2) ∧
        ran(association1) = dom(association2) ∧
        association = (association1 ; association2)
    INITIALISATION
        p1 := {p11, p12, p13} || p2 := {p21, p22, p23} ||
        helper := {h1, h2, h3} ||
        association1 := {p11↦h1, p11↦h2, p11↦h3,
                         p12↦h1, p12↦h2, p12↦h3,
                         p13↦h1, p13↦h2, p13↦h3} ||
        association2 := {h1↦p21, h1↦p22, h1↦p23,
                         h2↦p21, h2↦p22, h2↦p23,
                         h3↦p21, h3↦p22, h3↦p23}
    END

Figure 17. B formalization of pattern Class_Helper (refinement machine).

The abstract machine B_Class_Helper_a formalizes the abstract level of the Class_Helper pattern using the systematic translation rules from UML to B [11], while the B_Class_Helper_r machine formalizes the refined level of the same pattern. The link between these two levels is described by the REFINES clause. The gluing invariant introduced in the B_Class_Helper_r machine guarantees the correctness of the refinement relation between the two levels of the Class_Helper pattern. Formal verifications on the B models corresponding to the UML/OCL class diagrams are related to the coherence of the initial abstract model and the correctness of each refinement step. They call the proof obligation generator (conjectures to prove) and the provers of the B platform. The correctness of the B models with respect to the requirements is delegated to the ProB tool [29], which allows animation and model checking.

TABLE III. STATE OF THE B SPECIFICATIONS

                        nPO(1)   nPRi(2)   nPRa(3)   nUn(4)   %Pr
Initial model              9        1         8        0      100
Second refinement          4        0         4        0      100
Third refinement           7        0         7        0      100
Fourth refinement         12        3         9        0      100
Fifth refinement          12        2        10        0      100
Seventh refinement        26        0        26        0      100

(1) Number of Proof Obligations
(2) Number of Proof Obligations proved Interactively
(3) Number of Proof Obligations proved Automatically
(4) Number of Proof Obligations Unproved


Table III summarizes the proof obligations associated with our case study. The seven proposed refinements essentially promote the development of class diagrams that are correct by construction. Nevertheless, the designer can still improve the structure, without changing the semantic aspects of the class diagram obtained by refinement, by judiciously using the refactoring technique [23][25].

VI. CONCLUSION AND FUTURE WORK

The main idea of this work is to propose intuitive refinements as patterns, providing a basis for tools supporting the refinement-driven modeling process. In this paper, we have presented our catalogue of refinement patterns. Formal verification of such refinements can be exploited if the language is equipped with formal refinement machinery, allowing the proof of the correctness of the refined specification relative to the abstract one. We proposed to use B for this purpose, using systematic derivation rules from UML into B. The proposed refinement patterns promote the identification of the analysis classes that model the key concepts resulting from the requirements. Currently, we are exploring two tracks: a proposal of design-oriented refinement patterns, by retrieving and adapting ideas from the GoF patterns [10]; and a proposal of implementation-oriented refinement patterns, using the universal data structures of object-oriented modeling (Eiffel) [7]. The next step of this work consists in automating the detection and application of patterns in an appropriate framework. In addition, we proposed a new approach allowing to find a refinement strategy for the development of UML class diagrams guided by the refinement patterns. An interesting idea is to preserve the history of pattern application in development case studies in order to have traceability of the development process, allowing to backtrack on previous decisions.

REFERENCES

[1] A. Requet, "BART: A Tool for Automatic Refinement," ABZ, London, UK, September 2008, pp. 345-345.
[2] A. van Lamsweerde, Requirements Engineering - From System Goals to UML Models to Software Specifications, Wiley, 2009.
[3] B. Ben Ammar, M. T. Bhiri, and A. Benhamadou, "Refinement Pattern: Introduction of intermediate class Class_Helper," Conférence en IngénieriE du Logiciel, CIEL, Rennes, France, June 2012, pp. 1-6.
[4] B. Ben Ammar, M. T. Bhiri, and J. Souquières, "Event modeling for construction of class diagrams," RSTI - ISI, vol. 13, no. 3, 2008, pp. 131-155.
[5] B. Ben Ammar, M. T. Bhiri, and J. Souquières, "Refactoring pattern of class diagrams based on the notion of delegation," 7ème atelier sur l'Evolution, Réutilisation et Traçabilité des Systèmes d'Information, ERTSI, couplé avec le XXVIème congrès INFORSID, Fontainebleau, France, May 2008, pp. 1-12.
[6] B. Ben Ammar, Contribution to the Systems Engineering: Refinement and Refactoring of UML specifications, Editions universitaires européennes, 2012.
[7] B. Meyer, Object-Oriented Software Construction, Prentice Hall, 1997.
[8] C. A. R. Hoare, "Communicating sequential processes," Communications of the ACM, vol. 21, no. 8, 1978, pp. 666-677.
[9] C. Snook and M. Butler, "UML-B: Formal modeling and design aided by UML," ACM Trans. Softw. Eng. Methodol., vol. 15, no. 1, January 2006, pp. 92-122.
[10] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, Design Patterns, Addison-Wesley, 1995.
[11] E. Meyer and J. Souquières, "A Systematic Approach to Transform OMT Diagrams to a B Specification," Proceedings of the World Congress on Formal Methods in the Development of Computing Systems, Toulouse, France, September 1999, pp. 875-895.
[12] F. Buschmann, R. Meunier, H. Rohnert, P. Sommerlad, and M. Stal, Pattern-Oriented Software Architecture: A System of Patterns, John Wiley and Sons, 1996.
[13] G. Booch, "Object-Oriented Development," IEEE Trans. Software Eng., vol. 12, no. 2, 1986, pp. 211-221.
[14] G. Smith, The Object-Z Specification Language, Kluwer Academic Publishers, 2000.
[15] H. Habrias and C. Stoquer, "A formal semantics for UML refining," XII Colloque National de la Recherche en IUT, CNRIUT'06, Brest, France, June 2006.
[16] H. Ledang, "Automatic Translation from UML Specifications to B," Proceedings of the 16th IEEE International Conference on Automated Software Engineering, San Diego, USA, November 2004, pp. 436-440.
[17] J. R. Abrial, Modeling in Event-B - System and Software Engineering, Cambridge University Press, 2010.
[18] J. R. Abrial, The B-Book - Assigning Programs to Meanings, Cambridge University Press, 1996.
[19] J. Rumbaugh, I. Jacobson, and G. Booch, The Unified Modeling Language Reference Manual, 2nd ed., Boston, MA: Addison-Wesley, 2005.
[20] K. Beck and W. Cunningham, "A laboratory for teaching object-oriented thinking," ACM SIGPLAN Notices, vol. 24, no. 10, 1989, pp. 1-6.
[21] K. Waldén and J. M. Nerson, Seamless Object-Oriented Software Architecture: Analysis and Design of Reliable Systems, Prentice-Hall, Inc., 1995.
[22] M. Fowler, Analysis Patterns: Reusable Object Models, Addison-Wesley Professional, 1996.
[23] M. Fowler, Refactoring: Improving the Design of Existing Code, Boston, MA, USA: Addison-Wesley, 1999.
[24] M. Guyomard, "Specification and refinement using B: two pedagogical examples," ZB2002, 4th International B Conference, Education Session Proceedings, Grenoble, France, January 2002.
[25] R. Straeten, V. Jonckers, and T. Mens, "A formal approach to model refactoring and model refinement," Software and Systems Modeling (2), 2007, pp. 139-162.
[26] T. S. Hoang, A. Furst, and J. R. Abrial, "Event-B Patterns and Their Tool Support," International Conference on Software Engineering and Formal Methods, Hanoi, Vietnam, November 2009, pp. 210-219.
[27] W. Su, J. R. Abrial, R. Huang, and H. Zhu, "From Requirements to Development: Methodology and Example," The 13th International Conference on Formal Engineering Methods, ICFEM, Durham, United Kingdom, October 2011, pp. 437-455.
[28] X. Castellani, "Cards stages of study of UML diagrams, Payment orders of these studies," Technique et Science Informatiques, vol. 21, no. 8, 2002, pp. 1051-1072.
[29] The ProB Animator and Model Checker, User Manual, http://www.stups.uni-duesseldorf.de/ProB/index.php5/User_Manual, 2013.
[30] Clearsy System Engineering, Atelier B, User Manual, Version 4.0, http://www.tools.clearsy.com/resources/User_uk.pdf, 2010.


Automating Cloud Application Management Using Management Idioms

Uwe Breitenbücher, Tobias Binz, Oliver Kopp, Frank Leymann
Institute of Architecture of Application Systems, University of Stuttgart, Stuttgart, Germany
{breitenbuecher, lastname}@iaas.uni-stuttgart.de

Abstract—Patterns are a well-established concept to document generic solutions for recurring problems in an abstract manner. Especially in Information Technology (IT), many pattern languages exist that ease creating application architectures, designs, and management processes. Their generic nature provides a powerful means to describe knowledge in an abstract fashion that can be reused and refined for concrete use cases. However, the required manual refinement currently prevents applying the concept of patterns efficiently in the domain of Cloud Application Management, as automation is one of the most important requirements in Cloud Computing. This paper presents an approach that enables automating both (i) the refinement of management patterns for individual use cases and (ii) the execution of the refined solutions: we introduce Automated Management Idioms to refine patterns automatically and extend an existing management framework to generate executable management workflows based on these refinements. We validate the presented approach by a prototypical implementation to prove its technical feasibility and evaluate its extensibility, standards compliance, and complexity.

Keywords—Application Management; Automation; Patterns; Idioms; Cloud Computing

I. INTRODUCTION

Patterns are a well-established concept to document reusable solution expertise for frequently recurring problems. In many areas, they provide the basis for decision making processes, design evaluations, and architectural issues. In the domain of Cloud Computing, patterns are of vital importance to build, manage, and optimize IT on various levels: Cloud Computing Architecture and Management Patterns [1], Enterprise Integration Patterns [2], and Green IT Patterns [3] are a few examples that provide helpful guides for realizing complex Cloud applications, their management, and challenging non-functional requirements. The concept of patterns enables IT experts to document knowledge about proven solutions for problems in a certain context in an abstract, structured, and reusable fashion that supports systems architects, developers, administrators, and operators in solving concrete problems. The abstract nature of patterns enables generalizing the core of problem and solution to a level of abstraction that makes them applicable to various concrete instances of the general problem—independently from individual manifestations. Thus, many IT patterns are applicable to a big set of different settings, technologies, system architectures, and designs. Applying patterns to real problems therefore typically requires a manual refinement of the described abstract high-level solution for adapting it to the concrete use case. To guide this refinement and ease pattern application, most pattern languages document "implementations", "known uses", or "examples" for the described pattern solution—as already done implicitly by Alexander in his early publications on patterns [4].

However, in some areas, these benefits face difficulties that decrease the efficiency of using patterns immensely. In the domain of Cloud Application Management, the immediate, fast, and correct execution of management tasks is of vital importance to achieve Cloud properties such as on-demand self-service and elasticity [5][6]. Thus, management patterns, e. g., to scale a Cloud application, cannot be applied manually by human operators when a problem occurs because manual executions are too slow and error prone [1][7]. Therefore, to use patterns in Cloud Application Management, their application must be automated [8]. A common way to automate the execution of management patterns is creating executable processes, e. g., workflows [9] or scripts, that implement a refined pattern solution for a certain application [1]. To achieve Cloud properties, this must be done in advance, i. e., before the problem occurs, so that the processes are ready to run immediately when needed. However, these processes are tightly coupled to a single application as the refinement of a management pattern depends mainly on the technical details of the application, its structure, and the concrete desired solution [10]. For example, the pattern for scaling a Cloud application results in different processes depending on the Cloud provider hosting the application: due to the heterogeneous management APIs offered by different Cloud providers, different operations have to be executed to achieve the same conceptual effect [11]. Thus, individual pattern refinement influences the resulting management processes fundamentally. As a result, multiple individual processes have to be implemented in advance to execute one management pattern on different applications. However, if multiple patterns have to be implemented for hundreds of applications in advance, this is not efficient as the required effort is immense. In addition, as Cloud application structures typically evolve over time, e. g., caused by scaling, a pattern may have to be implemented multiple times for a single application. Ironically, if the implemented processes are not used during the application's lifetime, the spent effort was laborious, costly, but completely unnecessary.

The result of the discussion above is that the gap between a pattern's abstract solution and its refined executable implementation for a certain use case currently prevents applying the concept of patterns efficiently to the domain of Cloud Application Management—due to the mandatory requirement of automation and its difficult realization. Thus, we need a means to automate the refinement of abstract patterns to concrete executable solutions on demand. In this paper, we tackle these issues by presenting an approach that enables applying management patterns fully automatically to individual applications. We show how the required refinement of a management pattern towards a concrete use case can be automated by inserting an additional layer of Automated Management Idioms, which provide a fine grained refinement of a particular abstract management pattern. These Automated Management Idioms are able to create


formal declarative descriptions of the management tasks to be performed automatically for individual applications, which are used afterwards to generate the corresponding executable management processes using an existing management framework. This enables automating the application of abstract patterns to various concrete applications without human intervention, which increases the efficiency of using patterns in Cloud Application Management. Thereby, the presented approach (i) helps IT experts to capture their management knowledge in an executable fashion and (ii) enables automating existing management patterns—both independently from individual applications. To prove the relevance of our approach, we conduct a detailed case study that illustrates the high complexity of applying an abstract migration management pattern to a concrete use case. We validate the approach through a prototype that extends an existing management framework by the presented concept and the implementation of real world migration use cases to prove its technical feasibility. Furthermore, we evaluate the concept in terms of automation, technical complexity, standards compliance, separation of concerns, and extensibility.

The paper is structured as follows: in Section II, we describe background information and a case study. In Section III, we describe the management framework which is extended by our approach presented in Section IV. In Section V, we evaluate the approach and present related work in Section VI. We conclude the paper and give an outlook on future work in Section VII.

II. BACKGROUND, MOTIVATION, AND REQUIREMENTS

In this section, we provide background information about the domain of Cloud Application Management and analyze the requirements to apply the concept of patterns. Afterwards, we motivate the presented approach through a case study.

A. Automating Patterns for Cloud Application Management

In this section, we discuss why the concept of patterns and their automation are of vital importance in the domain of Cloud Application Management to optimize management and to reduce the number of errors and downtimes. Due to the steadily increasing use of IT in enterprises, accurate operation and management are of crucial importance to align business and IT. As a consequence, requirements such as high-availability, elasticity, and cheap operation of IT arise increasingly. Therefore, more and more enterprises outsource their IT to external providers to achieve these properties reliably and to automate IT management—both enabled by Cloud Computing [12]. However, the fast growing number of proprietary, mostly heterogeneous, Cloud services offered by different providers leads to a high complexity of creating composite Cloud applications and executing management tasks as the (i) offered services, (ii) non-functional capabilities, and (iii) application programming interfaces (APIs) of different Cloud providers differ significantly from each other. As a result, the actual structure of an application, the involved types of Cloud services (e. g., platform services), and the management technologies to be used mainly depend on the providers and are hardly interoperable with each other due to proprietary characteristics. This results in completely different management processes for different Cloud providers implementing the same conceptual management task. As a consequence, the conceptual solution implemented in such processes gets obfuscated by technical details and proprietary idiosyncrasies. It is nearly impossible to reuse and adapt the implemented knowledge efficiently for other applications and providers as the required effort to analyze and transfer the conceptual solution to other use cases is much too high. This results in continually reinventing the wheel for problems that were already solved multiple times—but not documented in a way that enables reusing the conceptual knowledge. A solution for these issues are management patterns, which provide a well-established means to document conceptual solutions for frequently recurring problems in a structured, reusable, and tailorable way [13]. Therefore, a lot of architectural and management knowledge for Cloud applications and their integration was captured in the past years using patterns, e. g., Enterprise Integration Patterns [2] and Cloud Computing Architecture and Management Patterns [1]. A management pattern typically documents the general characteristics of a certain (i) problem, (ii) its context, and (iii) the solution in an abstract way without getting lost in technical realization details. Thereby, such patterns help applying proven conceptual management expertise to individual applications, which typically requires a manual refinement of the pattern's abstract solution for the concrete use case. However, this manual refinement leads to two challenges that are tackled in this paper to enable applying the concept of management patterns efficiently in the domain of Cloud Application Management: (i) technical complexity and (ii) automation of refinement and solution execution.

1) Technical Complexity of Refinement: The technical layer of IT management and operation becomes a more and more difficult challenge as each new technology and its management issues increase the degree of complexity—especially if different heterogeneous technologies are integrated in complex systems [10]. Thus, the required refinement of a pattern's abstract solution to fine grained technical management operations as well as their correct parametrization and execution are complex tasks that require technical expert management knowledge.

2) Automation of Refinement and Solution Execution: In Cloud Computing, application management must be automated as the manual execution of management tasks is too slow and error-prone since human operator errors account for the largest fraction of failures in distributed systems [7][14]. Thus, the refinement of a pattern's abstract solution to a certain use case and the execution of the refined solution must be automated.

B. Case Study and Motivating Scenario

In this section, we present a case study that is used as motivating scenario throughout the paper to explain the approach and to illustrate the high complexity of applying abstract management patterns to concrete use cases. Due to the important issue of vendor lock-in in Cloud Computing, we choose a migration pattern to show the difficulties of refinement and the opportunities enabled by our approach. We explain the important details of the pattern and apply it to the use case of migrating a Java-based application to the Amazon Cloud. The selected pattern is called "Stateless Component Swapping Pattern" [15] and originates from the Cloud Computing pattern language developed by Fehling et al. [1][13][15]. The question answered by this pattern is "How can stateless application components that must not experience downtime be migrated?". Its intent is extracting stateless application components from one environment and deploying them into another while they are


Origin Stateless Component Swapping Process Target (Domain) Environment Environment (refersTo) Provider: uniteddomains (refersTo) Name: myservice.org Account: myUser Password: myPassword

stack config (StatelessWAR) (StatelessWAR) Extract Recreate Component Application Stack File: serviceimpl.war File: serviceimpl.war URL: 92.68.1.1:8080/service URL: 54.73.142.190:8080/service application files (hostedOn) (hostedOn) application files (Tomcat7) (Tomcat7) Provision Component HTTP-Port: 8080 HTTP-Port: 8080 Username: TomcatAdmin Username: TomcatAdmin Password: jfwf?jowwßj Password: jfwf?jowwßj

Configure (hostedOn) (hostedOn) Load Balancing (Ubuntu12.04VM) (Ubuntu12.04) IP-Address: 54.73.142.190 SSHCredentials: …. SSHCredentials: …

Decommission (hostedOn) (hostedOn) Component (Server) (AmazonEC2)

IP-Address: 92.68.1.1 Account: MyAccount … Password: fw9aa2fr

Figure 1. Abstract Stateless Component Swapping Process (adapted from [15]). Figure 2. Use case for applying the Stateless Component Swapping Pattern.

active in both environments during the migration. Afterwards, Figure 2 shows the technical details of this migration. On the the old components are decommissioned. The context observed left, the current Enterprise Topology Graph (ETG) [17] of the by this pattern is that, in many business cases, the downtime application is shown. An ETG is a formal model that captures of an application is unacceptable, e. g., for customer-facing the current state of an application in the form of a topology, services. A stateless application shall, therefore, be migrated which describes all components and relations including their transparently to the accessing human users or applications. Here, types, configuration, and runtime information. ETGs of running “stateless” means that the application does not handle internal applications can be discovered fully automatically using the session state: the state is provided with each request or kept in “ETG Discovery Framework” [18]. We use the visual notation provider-supplied storage. Figure 1 depicts the pattern’s solution Vino4TOSCA [19] to render ETGs graphically: components as Business Process Model and Notation (BPMN) [16] diagram: are depicted as rounded rectangles containing their runtime the component to be migrated is first extracted from the origin properties below, relationships as arrows. Element types are environment while the required application stack is provisioned enclosed by parentheses. On the right, the goal of the migration concurrently in the target environment. Then, the component is is shown: all components and relationships drawn with dotted deployed on this stack in the target environment while the old lines belong to the goal state, i. e., the refined pattern solution. component is still active. Finally, after the new component is provisioned, the reference to the migrated component is updated To refine the pattern’s abstract solution process shown in (load balancing) and the old component is decommissioned. Figure 1 to this use case, the following tasks have to be performed: (i) the WAR to be migrated must be extracted This pattern shall be applied to the following concrete use from the origin environment, (ii) a virtual machine (VM) must case. A stateless Webservice implemented in Java, packaged as be provisioned on EC2, (iii) a Tomcat 7 Servlet Container must Web Archive (WAR), is currently hosted on the local physical IT be installed on this VM, (iv) the WAR must be deployed on infrastructure of an enterprise. Therefore, a Tomcat 7 Servlet the Tomcat, (v) the domain must be updated, and (vi) the old Container is employed that runs this WAR. The Tomcat is WAR must be decommissioned. The technical complexity of deployed on an Ubuntu Linux operating system that is hosted this migration is quite high as four different heterogeneous on a physical server. The Webservice is publicly reachable under management APIs and technologies have to be combined and a domain, which is registered at the domain provider “United one workaround is required to achieve these goals: the extraction Domains”. This service shall be migrated to Amazon’s public of the WAR deployed on the local Tomcat is not supported by Cloud to relieve the enterprise’s physical servers. Therefore, Tomcat’s management API [20]. Thus, a workaround is needed Amazon’s public Cloud offering “Elastic Compute Cloud (EC2)” that extracts the WAR file directly from the underlying Ubuntu is selected as target environment. On EC2, a virtual machine operating system. 
However, this is technically not trivial: an with the same Ubuntu operating system and Tomcat version SSH connection to the operating system must be established, the shall be provisioned to run the Webservice in the Cloud. respective directory must be found in which Tomcat stores the



deployed WAR files, and the correct WAR must be transferred from the remote host to a local host using protocols such as Secure Copy (SCP). To provision a new virtual machine, EC2's HTTP-based management API [21] has to be invoked. However, an important security detail has to be considered here: per default, a virtual machine on EC2 is not accessible from the internet. Amazon employs so-called "Security Groups" to define firewall rules for accessing virtual machines. Thus, as the Webservice should be accessible from the internet, a Security Group must be defined to allow access. To install Tomcat 7, script-centric configuration management technologies such as Chef can be used. To deploy the WAR on Tomcat, Tomcat's HTTP-based management API has to be invoked. The domain can be updated using the management API of United Domains [22]. However, to avoid downtime, the old WAR must not be decommissioned until the Domain Name System (DNS) servers were updated with the new URL. Therefore, a DNS Propagation Checker must be employed. To summarize, the required technical knowledge to refine and implement the pattern's abstract solution for this concrete use case is immense. In addition, automating this process and orchestrating several management APIs that provide neither a uniform interface nor compatible data formats is complex, time-consuming, and costly as the process must be created by experts to avoid errors [10]. As a consequence, (i) handling the technical complexity and (ii) automating this process are difficult challenges.

III. EMPLOYED MANAGEMENT FRAMEWORK

The approach we present in this paper tackles these issues by extending the "Management Planlet Framework" [8][10][23]. This framework enables describing management tasks to be performed in an abstract and declarative manner using Desired Application State Models, which can be transformed fully automatically into executable workflows by a Plan Generator through orchestrating Management Planlets. In this section, we explain the framework briefly to provide all required information to understand the presented approach. The concept of the management framework is shown in Figure 3.

Figure 3. Management Planlet Framework Overview (adapted from [8]).

The Desired Application State Model (DASM) on the left declaratively describes management tasks that have to be performed on nodes and relations of an application. It consists of (i) the application's ETG, which describes the current structure and runtime information of the application in the form of properties, and (ii) Management Annotations, which are declared on nodes and relations to specify the management tasks to be executed on the associated element, e. g., to create, update, or destroy the corresponding node or relation. A Management Annotation (depicted as a coloured circle) defines only the abstract semantics of the task, e. g., that a node should be created, but not its technical realization. Thus, in contrast to executable imperative management descriptions such as management workflows that define all technical details, a DASM describes the management tasks to be performed only declaratively, i. e., only the what is described, but not the how. As a consequence, DASMs are not executable and are, therefore, transformed into executable Management Workflows by the framework's Plan Generator.

Figure 4 shows a DASM that realizes the motivating scenario of Section II-B.

Figure 4. DASM describing the management tasks of the refined pattern.

The DASM describes the required additional nodes and relations as well as Management Annotations that declare the tasks to be performed: the green Create-Annotations with the star inside declare that the corresponding node or relation shall be created, while the red circles with the "X" inside represent Destroy-Annotations, which declare that the associated element shall be destroyed. The magenta coloured ExtractApplication-Annotation is used to extract the application files of the WAR node; the blue coloured SetSecurityGroup-Annotation configures the Security Group to allow accessing the node from the internet. This DASM specifies the tasks described by the abstract solution of the Stateless Component Swapping Pattern refined to the concrete use case of migrating a Java Webservice to EC2. As Management Annotations describe tasks only declaratively, the model contains no technical details about management APIs, data formats, or the control flow.
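To make the declarative nature of a DASM more tangible, the following minimal sketch (ours, not part of the framework; the data layout is an assumption) represents such a model as plain data: nodes and relations carry types, properties, and abstract annotations, but no executable logic:

    # Illustrative sketch only: a DASM represented as plain data. Node types,
    # property names, and annotation names mirror Figure 4, but this structure
    # is an assumption, not the framework's actual serialization format.
    dasm = {
        "nodes": {
            "war-old": {"type": "StatelessWAR", "properties": {"File": "serviceimpl.war"},
                        "annotations": ["ExtractApplication", "Destroy"]},
            "vm-new":  {"type": "Ubuntu12.04VM", "properties": {},
                        "annotations": ["Create", "SetSecurityGroup"]},
            "ec2":     {"type": "AmazonEC2", "properties": {"Account": "MyAccount"},
                        "annotations": []},
        },
        "relations": [
            {"type": "hostedOn", "source": "vm-new", "target": "ec2",
             "annotations": ["Create"]},
        ],
    }
    # Only the "what" is captured; a Plan Generator later derives the "how".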


The framework's Plan Generator interprets DASMs and generates the corresponding executable workflows automatically. This is done by orchestrating so-called Management Planlets, which are workflows executing Management Annotations on nodes and relations, such as deploying a WAR on Tomcat or updating a domain with a new URL. Planlets are developed by technology experts and provide the low-level imperative management logic to execute the declarative Management Annotations used in DASMs. Thus, they are reusable management building blocks implementing the how of the declaratively described abstract management tasks in DASMs. Management Planlets express their functionality through an Annotated Topology Fragment, which describes (i) the Planlet's effects in the form of Management Annotations it executes on elements and (ii) preconditions that must be fulfilled to execute the Planlet. The Plan Generator orchestrates suitable Planlets to process all Management Annotations in the DASM. The order of Management Planlets is determined based on their preconditions and effects: all preconditions of a Planlet must be fulfilled by the DASM itself or by another Planlet that is executed before.

Figure 5. Management Planlet that creates an Ubuntu VM on Amazon EC2.

Figure 5 shows a Planlet that creates a new Ubuntu 12.04 virtual machine on Amazon EC2. The Annotated Topology Fragment exposes the Planlet's functionality by a Create-Annotation attached to the node of type "Ubuntu12.04VM", which has a "hostedOn" relation to a node of type "AmazonEC2". The Planlet's preconditions are expressed by all properties that have no Create-Annotation attached: the desired "SSHCredentials" of the VM node as well as "Account" and "Password" of the Amazon EC2 node must exist to execute the Planlet, which takes this information to create the new VM (we omit other properties to simplify). The Planlet's effects on elements are expressed by Create-Annotations on their properties, i. e., the Create-Annotation on the "IP-Address" of the VM node means that the Planlet sets this property. The existence of this property and the "SSHCredentials" are typical preconditions of Planlets that install software on virtual machines. Thus, Planlets can be ordered based on such properties. The strength of Management Planlets is hiding the technical complexity completely [8]: the Plan Generator orchestrates Planlets based only on their Topology Fragments and the abstract Management Annotations declared in the DASM, i. e., without considering technical details implemented by the Planlet's workflow. This provides an abstraction layer to integrate management technologies.

IV. AUTOMATED REFINEMENT OF MANAGEMENT PATTERNS TO EXECUTABLE MANAGEMENT WORKFLOWS

In the previous section, we explained how DASMs can be used to describe management tasks in an abstract manner and how they are transformed automatically into executable workflows by the Management Planlet Framework. Thereby, the framework provides a powerful basis for automating patterns and handling the technical complexity of pattern refinement.

Figure 6. Stateless Component Swapping SAMP.

In a former work [8], we presented an approach to generate DASMs automatically by applying management patterns to ETGs of running applications. However, as patterns should not capture use case-specific information, the approach provides only a semi-automated means and requires a manual refinement of the resulting DASM. Consequently, we call these patterns Semi-Automated Management Patterns (SAMP). In this paper, we extend the concept of SAMPs. Therefore, we introduce them briefly to provide required information. The input of a SAMP is the current Enterprise Topology Graph (ETG) of the application to which the pattern shall be applied; its output is a DASM that declaratively describes the pattern's solution in the form of Management Annotations to be performed. A SAMP consists of three parts, as shown in Figure 6: (i) Topology Fragment, (ii) Transformation, and (iii) textual description. The Topology Fragment is a small topology that defines the pattern's context, i. e., it is used to determine if a pattern is applicable to a certain ETG. Therefore, it describes the nodes and relations that must match elements in the ETG to apply the pattern to these matching elements. For example, the Stateless Component Swapping SAMP shown in Figure 6 is applicable to ETGs that contain a component of type "StatelessApplication" that is hosted on a component of type "RuntimeEnvironment". As the type "StatelessWAR" is a subtype of "StatelessApplication" and "Tomcat7" a subtype of "RuntimeEnvironment", the shown SAMP is applicable to the motivating scenario (cf. Figure 2). The second part is a Transformation that implements the pattern's solution logic, i. e., how to transform the input ETG to the output DASM that describes the tasks to be performed. However, SAMPs provide only a semi-automated means as the resulting DASM requires further manual refinement—similar to the required refinement of normal patterns for concrete use cases. Figure 7 shows the DASM resulting from applying the Stateless Component Swapping SAMP to the ETG described in the motivating scenario. As the pattern's transformation can not be aware of the desired refinement for this use case, it is only able to apply the abstract solution: (i) the stateless application is extracted,
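The ordering rule described above can be illustrated with a small, self-contained sketch (ours, not the framework's implementation; planlet names and property identifiers are hypothetical): a planlet becomes runnable once every property it requires is either present in the DASM or produced by an already scheduled planlet.

    # Illustrative sketch only: naive ordering of planlets by preconditions/effects.
    planlets = [
        {"name": "CreateVMOnEC2",  "requires": {"ec2.Account", "ec2.Password"},
                                   "provides": {"vm.IP-Address", "vm.SSHCredentials"}},
        {"name": "InstallTomcat7", "requires": {"vm.IP-Address", "vm.SSHCredentials"},
                                   "provides": {"tomcat.HTTP-Port"}},
        {"name": "DeployWAR",      "requires": {"tomcat.HTTP-Port"},
                                   "provides": {"war.URL"}},
    ]

    def order_planlets(planlets, available):
        """Repeatedly pick any planlet whose preconditions are already satisfied."""
        ordered, remaining = [], list(planlets)
        while remaining:
            runnable = next((p for p in remaining if p["requires"] <= available), None)
            if runnable is None:
                raise ValueError("unsatisfiable preconditions")  # no valid plan exists
            ordered.append(runnable["name"])
            available |= runnable["provides"]
            remaining.remove(runnable)
        return ordered

    # Properties already present in the DASM (e.g., the EC2 account data):
    print(order_planlets(planlets, {"ec2.Account", "ec2.Password"}))
    # -> ['CreateVMOnEC2', 'InstallTomcat7', 'DeployWAR']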


(Domain) Management Pattern (refersTo) (refersTo) Provider: uniteddomains Name: myservice.org Account: myUser refines refines Password: myPassword Management Idiom 1 … Management Idiom n

(StatelessWAR) (StatelessWAR) considers Refinement Layer considers File: serviceimpl.war File: $extractedWAR URL: 92.68.1.1:8080/service URL: Use Case 1 Use Case n (hostedOn) (hostedOn) …

Figure 7. DASM after applying the Stateless Component Swapping SAMP.

(ii) the stack is copied, and (iii) all incoming relations of the application to be migrated are switched to the new deployment. To refine this DASM for the motivating scenario, the "Server" node type must be replaced by "AmazonEC2", the operating system must be a virtual machine of type "Ubuntu12.04VM", and a Management Annotation must be added to configure the Security Group (cf. Figure 4). However, manual refinement violates the two requirements analyzed in Section II-A as (i) the technical complexity remains (e.g., the operator has to be aware of Security Groups) and (ii) human intervention is required, which breaks the required full automation. Of course, Semi-Automated Management Patterns could be created for concrete use cases, as also shown in Breitenbücher et al. [8]. However, this violates the abstract nature of patterns and causes confusing dependencies between patterns and use cases, which decreases usability. In addition, with the increasing number of such use case-specific management patterns, the conceptual part of the abstract solution gets obfuscated. Therefore, we add an explicit refinement layer to separate the different layers of abstraction.

A. Overview of the Approach

In this and the next section, we present the main contribution of this paper. The approach enables applying patterns fully automatically to concrete use cases and solves the two problems analyzed in Section II-A: (i) handling the technical complexity of refinement and (ii) automating refinement and solution execution. The goal is to automate the whole process of applying a pattern selected by a human operator to a concrete use case.

Figure 8. Additional refinement layer below management patterns.

The reason that Semi-Automated Management Patterns cannot be applied fully automatically results from the generative nature of patterns, since they describe only the general core of solutions. Consequently, they must be refined manually to individual use cases. Therefore, we need a more specific means below patterns to document concrete management problems and already refined solutions. According to Buschmann et al. [24], who consider design issues of software, so-called "idioms" represent low-level patterns that deal with the implementation of particular design issues in a certain programming language. They address aspects of both design and implementation and thus provide already refined pattern solutions. For example, an idiom may describe concretely how to implement the abstract Model-View-Controller (MVC) pattern in Java. Thus, this concept solves a similar kind of refinement problem in the domain of software architecture and design. Therefore, we adapt this refinement concept to the domain of Cloud Application Management. We insert an additional refinement layer below management patterns in the form of Application Management Idioms, which are tailored incarnations of an abstract management pattern refined for a concrete problem, context, and solution of a certain use case. Figure 8 shows the conceptual approach: a management pattern is refined by one or more Management Idioms that each consider a concrete use case of the pattern and provide an already refined solution. Thus, instead of refining an abstract management pattern manually to a certain use case, the pattern to be applied can be refined automatically by selecting the appropriate Management Idiom directly. Applying this concept to our motivating scenario, the refinement of the Stateless Component Swapping Pattern to our concrete use case of migrating a stateless Java Webservice packaged as a WAR to Amazon EC2 (see Section II-B) can be captured by a "Stateless WAR from Tomcat 7 to Amazon EC2 Swapping Idiom", which describes the refined problem, context, and solution. Thus, instead of providing only the generic solution, this idiom is able to describe the tasks in detail, tailored to this concrete context. Thereby, management tasks such as defining the Security Group can be described while the link to the conceptual solution captured by the actual pattern is preserved. The additional refinement layer enables separating concerns: on the management pattern layer, the generic abstract problem, context, and solution are described to capture the conceptual core. On the Management Idiom layer, a concrete refinement is described that possibly obfuscates the conceptual solution partially to tackle concrete issues of individual use cases. As a consequence, the presented approach enables both (i) capturing generic management knowledge and (ii) providing tailored refinements linking to the actual pattern.
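To make the relationship between the two layers more tangible, the following minimal Java sketch models a management pattern and an idiom that refines it. The class names and fields are illustrative assumptions for this discussion, not types of the presented framework.

    // A minimal sketch of the refinement layer, using illustrative class names
    // (ManagementPattern, ManagementIdiom); these are not types of the presented framework.
    public class RefinementLayerSketch {

        // A pattern captures only the abstract problem, context, and solution core.
        static class ManagementPattern {
            final String name;
            final String abstractContext;
            ManagementPattern(String name, String abstractContext) {
                this.name = name;
                this.abstractContext = abstractContext;
            }
        }

        // An idiom is a tailored incarnation of one pattern for a concrete use case.
        static class ManagementIdiom {
            final ManagementPattern refines;   // the link to the conceptual pattern is preserved
            final String concreteContext;      // e.g., a WAR on Tomcat 7 to be moved to Amazon EC2
            final String refinedSolution;      // the already refined, automatable solution
            ManagementIdiom(ManagementPattern refines, String concreteContext, String refinedSolution) {
                this.refines = refines;
                this.concreteContext = concreteContext;
                this.refinedSolution = refinedSolution;
            }
        }

        public static void main(String[] args) {
            ManagementPattern swapping = new ManagementPattern(
                "Stateless Component Swapping",
                "stateless application hosted on a runtime environment");
            ManagementIdiom warToEc2 = new ManagementIdiom(
                swapping,
                "stateless WAR hosted on Tomcat 7",
                "duplicate the stack on Amazon EC2, configure the Security Group, redirect relations");
            System.out.println(warToEc2.refines.name + " refined for: " + warToEc2.concreteContext);
        }
    }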


Figure 9. Process that describes how to apply a management pattern fully automatically to a running application using Automated Management Idioms. The process consists of five steps: (1) ETG Discovery, (2) Management Pattern and Idiom Selection, (3) Management Idiom Application, (4) Workflow Generation, and (5) Workflow Execution. The figure marks each step as manual or automatic and relates the steps to the two requirements of handling the technical complexity of refinement and automating refinement and solution execution.

In the next section, we extend the concept of Semi-Automated Management Patterns by Automated Management Idioms following this concept.

B. Automated Management Idioms

To automate the new refinement layer, we introduce Automated Management Idioms (AMIs) in this section. Automated Management Idioms refine Semi-Automated Management Patterns through (i) providing a refined Topology Fragment that formalizes the particular context to which the idiom is applicable and (ii) implementing a more specific transformation tailored to the refined context for transforming the input ETG directly into an already refined DASM. Each AMI additionally defines a Pattern Reference (PR) linking to its original pattern.

Figure 10. Stateless WAR from Tomcat 7 to Amazon EC2 Swapping AMI.

Figure 10 shows the AMI that refines the Stateless Component Swapping SAMP for our motivating scenario. The refinement of the context to which this idiom is applicable is expressed by the refined Topology Fragment: while the pattern is applicable to topologies that contain an abstract "StatelessApplication" that is hosted on an abstract "RuntimeEnvironment", the shown idiom refines this fragment and is only applicable to a node of type "StatelessWAR" that is hosted on a node of type "Tomcat7". The refinement of the Topology Fragment restricts the idiom's applicability and enables implementing a transformation that considers exclusively this concrete use case. Therefore, the idiom's transformation differs from the pattern's transformation by adding more details to the output DASM: it refines the "Server" node directly to "AmazonEC2" and adds the required Management Annotation that configures the Security Group accordingly in order to enable accessing the migrated WAR from the internet. The operating system is exchanged by "Ubuntu12.04VM" while the "Tomcat7" and "StatelessWAR" nodes are simply copied from the ETG of the origin environment. Similar to the transformation of the pattern, all incoming relations of the old WAR node are redirected to the new WAR. Thus, the resulting DASM corresponds exactly to the DASM shown in Figure 4, which would be created manually by an expert to refine the pattern to this use case. As a result, applying this Automated Management Idiom to the motivating scenario's ETG results in a completely refined DASM, which can be transformed directly into an executable workflow by the Management Planlet Framework. Thus, no further manual refinement is required to apply the pattern to this use case. Nevertheless, the Management Planlet Framework enables implementing this Automated Management Idiom on a high level of abstraction due to Management Annotations.

C. Fully Automated Pattern-based Management System

In this section, we explain the process of applying a management pattern in the form of an Automated Management Idiom automatically to a running application using the Management Planlet Framework. This process provides the basis for a Fully Automated Pattern-based Management System and consists of the five steps shown in Figure 9. First, the ETG of the application to be managed has to be discovered. This step is automated by using the ETG Discovery Framework [18], which gets an entry node of the application as input, e.g., the URL pointing to a deployed Webservice, and discovers the complete ETG of this application fully automatically including all runtime properties. In the second step, the user selects the pattern to be applied in the form of an Automated Management Idiom. This is the only manual step of the whole process. In the third step, the selected idiom is applied by the framework fully automatically to the discovered ETG. Therefore, the idiom's transformation is executed on the ETG. The output of this step is a Desired Application State Model that is based on the discovered ETG but additionally contains all management tasks to be executed in the form of Management Annotations. This DASM is used in Step 4 by the Management Planlet Framework to generate an executable management workflow, as described in Section III. In the last step, the generated workflow is executed using a workflow engine. The overall process fulfills the two requirements analyzed in Section II-A: (i) handling technical complexity and (ii) automating refinement and solution execution. The technical complexity is hidden by the framework's declarative nature: management tasks are described only as abstract Management Annotations. All implementation details are inferred automatically by Management Planlets. Thus, the user only selects the idiom to be applied; the technical realization is invisible and automated. The second requirement of automation is achieved through executable transformations of idioms and the employed Management Planlet Framework.
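The five steps of Figure 9 can be summarized as a single orchestration routine. The following sketch is only an illustration under assumed interface names (EtgDiscovery, IdiomSelection, AutomatedIdiom, PlanGenerator, WorkflowEngine); it does not reproduce the Management Planlet Framework's actual API.

    // Sketch of the five-step process of Figure 9 as one orchestration method. All interfaces
    // below are assumed placeholder names, not the framework's API.
    interface Etg {}
    interface Dasm {}
    interface Workflow {}

    interface EtgDiscovery { Etg discover(String entryPointUrl); }     // Step 1: ETG discovery
    interface AutomatedIdiom { Dasm apply(Etg etg); }                  // Step 3: idiom application
    interface IdiomSelection { AutomatedIdiom select(Etg etg); }       // Step 2: manual selection
    interface PlanGenerator { Workflow generate(Dasm dasm); }          // Step 4: workflow generation
    interface WorkflowEngine { void execute(Workflow workflow); }      // Step 5: workflow execution

    class PatternApplicationProcess {
        void run(String entryPointUrl, EtgDiscovery discovery, IdiomSelection selection,
                 PlanGenerator generator, WorkflowEngine engine) {
            Etg etg = discovery.discover(entryPointUrl);   // 1. discover the running application's ETG
            AutomatedIdiom idiom = selection.select(etg);  // 2. the only manual step of the process
            Dasm dasm = idiom.apply(etg);                  // 3. execute the idiom's transformation
            Workflow plan = generator.generate(dasm);      // 4. generate the management workflow
            engine.execute(plan);                          // 5. execute it on a workflow engine
        }
    }

Only the select call corresponds to the manual decision in Step 2; all other steps can run unattended.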


D. Automatic Refinement Filtering and Pattern Configuration

To apply a pattern automatically using this system, the user first triggers the automatic ETG discovery of the application to be managed, searches the Semi-Automated Management Pattern (SAMP) to be applied, and selects the desired refinement in the form of an Automated Management Idiom. Based on their Topology Fragments, non-applicable Automated Management Idioms are filtered automatically by the system. For example, if the Stateless Component Swapping SAMP is refined by two different Automated Management Idioms, one able to migrate a PHP application hosted on an Apache HTTP Server to Amazon EC2 and the other being the described WAR migration idiom shown in Figure 10, the system offers only the idioms whose Topology Fragments match the elements in the ETG. Thus, in case of our motivating scenario, only the idiom for migrating the WAR application is offered, as exclusively its Topology Fragment matches the ETG. Based on this matchmaking, the system is able to offer only applicable management patterns and refinements in the form of Automated Management Idioms. Therefore, we call this concept Automatic Refinement Filtering.

If multiple idioms of a selected SAMP match the ETG after filtering, e.g., one idiom migrates the WAR application to Amazon EC2, another to Microsoft's public Cloud "Azure", the user is able to configure the refinement of the pattern by a simple manual idiom selection while its actual application and execution are automated completely. Therefore, we call this concept Configurable Automated Pattern Refinement. The combination of these two concepts automates the refinement in two dimensions: first, Automatic Refinement Filtering preselects applicable refinements automatically, which relieves the user from finding applicable idioms. Second, the remaining filtered idioms can be applied directly without human intervention. Thus, the only manual step in this process is selecting the pattern and the desired configuration in the form of an idiom.
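A possible reading of Automatic Refinement Filtering in code is sketched below; Idiom and Etg stand in for the prototype's internal types and are not its real interfaces. Idioms whose Topology Fragment does not occur in the discovered ETG are dropped, and if several idioms of the selected pattern remain, the user picks one (Configurable Automated Pattern Refinement).

    // Illustrative sketch of Automatic Refinement Filtering; Idiom and Etg are placeholders.
    import java.util.List;
    import java.util.stream.Collectors;

    class AutomaticRefinementFiltering {
        interface Etg { boolean containsFragment(String topologyFragment); }
        interface Idiom { String topologyFragment(); }

        // Keep only the idioms of the selected pattern whose refined Topology Fragment
        // occurs in the discovered ETG; any remaining choice is left to the user.
        static List<Idiom> filterApplicable(List<Idiom> idiomsOfSelectedPattern, Etg etg) {
            return idiomsOfSelectedPattern.stream()
                    .filter(idiom -> etg.containsFragment(idiom.topologyFragment()))
                    .collect(Collectors.toList());
        }
    }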

V. VALIDATION AND EVALUATION

In this section, we validate the presented approach by a (i) prototypical implementation and evaluate the concept in terms of (ii) standards compliance, (iii) automation, (iv) technical complexity, (v) separation of concerns, and (vi) extensibility.

A. Prototype

To validate the technical feasibility of the approach, we extended the Java prototype presented in our former work [8] by Automated Management Idioms. The system's architecture is shown in Figure 11. The Web-based user interface is implemented in HTML5 using Java Servlets. Therefore, the prototype runs on a Tomcat 7 Servlet Container. The UI calls the Application Management API to apply a pattern to a running application. Therefore, it discovers the application's ETG by integrating the ETG Discovery Framework [18] and employs a Pattern and Idiom Manager to select the pattern / idiom to be applied. The manager is connected to a local library which stores deployable SAMPs and AMIs in the form of Java-based Webservices and implements a matchmaking facility to analyze which patterns / idioms are applicable to a certain ETG. To add new patterns or idioms to the system, the library provides an interface to register additional implementations. After selection, the discovered ETG and the SAMP / AMI to be applied are passed to the Pattern and Idiom Applier, which executes the pattern's / idiom's transformation on the ETG and passes the resulting DASM to the Plan Generator. The generator is connected to a local Management Planlet Library that contains all available Planlets, which are implemented using the standardized Business Process Execution Language (BPEL) [25]. It generates the corresponding management workflow also in BPEL, which is deployed and executed afterwards on the employed workflow engine "WSO2 Business Process Server 2.0". We implemented several Semi-Automated Management Patterns and corresponding refinements in the form of Automated Management Idioms, e.g., for migration scenarios similar to the one used in this paper.

Figure 11. Fully Automated Pattern-based Management System Architecture.

The Topology and Orchestration Specification for Cloud Applications (TOSCA) [26] is an OASIS standard to describe Cloud applications and their management in a portable way. Therefore, it provides a specification to model application topologies that can be linked with management workflows. As our approach is based on the same concepts, our prototype supports importing and deploying TOSCA-based applications, which can be managed afterwards using the presented approach.

B. Standards Compliance and Interoperability

Standards are a means to enable reusability, interoperability, and maintainability of software and hardware, which leads to higher productivity and helps aligning the enterprise's IT to its business. However, most available management approaches are based on non-standardized APIs or domain-specific languages (DSLs), which makes it difficult to provide and transfer the required knowledge. The presented approach tackles this issue by supporting two existing standards: (i) TOSCA is used to import standardized application descriptions that can be managed using SAMPs and AMIs, while (ii) the BPEL standard is used to implement and execute the generated Management Workflows. In addition, Management Planlets and the generated workflows are implemented in BPEL, which is a standard to describe executable workflows. Thus, they are portable across standard-compliant BPEL engines. In addition, Planlets allow integrating different kinds of management technologies seamlessly [8][10]. As a result, the presented approach is agnostic to individual management technologies and Cloud providers, which supports interoperability and eases integration.


C. Automation

In Cloud Application Management, automation is of vital importance. Workflow technology enables automating the execution of management tasks and provides powerful features such as automated recoverability and compensation [9]. However, if these processes that implement a refined solution of the respective management pattern have to be created manually, this is not efficient and, in addition, error-prone. The presented approach automates the creation of workflows that implement a pattern's refined solution through the introduced Automated Management Idiom layer and the employed framework. Thus, only the choice when to trigger a management pattern is left to the human operator. This automation decreases the risk of human errors, which account for the largest fraction of failures and system downtimes in distributed systems [7][14]. In addition, technical knowledge about the different management technologies is not required at all, as the whole process of determining the tasks to be performed and orchestrating the required technologies is fully automated.

D. Technical Complexity

Manually handling the technical complexity of pattern refinement for composite Cloud applications is a major issue due to heterogeneous and proprietary management technologies (cf. Section II). The presented approach tackles this issue by automating the whole refinement process from a pattern's abstract solution to the final executable workflow on two layers: (i) automating refinement by Automated Management Idioms and (ii) automated workflow generation using the Management Planlet Framework. The automated refinement removes the two manual tasks to specify concrete node and relationship types for abstract types and to add refinement-specific Management Annotations (cf. Figure 4 and Figure 7). Hence, the formerly required technical expertise on proprietary management idiosyncrasies such as Security Groups is no longer a mandatory prerequisite. Secondly, the orchestration of the different management technologies is completely handled by the Plan Generator, which transforms the refined DASM fully automatically into an executable Management Workflow. Thus, this removes all manual steps and the only task that is left to the human user is choosing the AMI to be applied. This eliminates potential sources of errors on the technical layer [1].

E. Separation of Concerns

The presented approach separates concerns through splitting the process of selecting, creating, and automating a pattern, its refinement, and execution. Our approach enables IT experts to capture generic management knowledge in patterns that can be refined through Automated Management Idioms developed by specialists of certain areas. Thus, pattern creators and AMI developers are separate roles that are responsible for different kinds of knowledge: SAMP creators capture generic management knowledge, AMI creators refine this through implementing concrete technology-specific management knowledge. In addition, experts of low-level API orchestration that understand the different management technologies are able to automate their knowledge through implementing Management Planlets. Thus, even AMI creators do not have to deal with complex technology-specific details such as API invocations or parametrization: they only implement their knowledge declaratively, i.e., without defining the final API calls etc. This enables separating different responsibilities and concerns as well as a seamless integration of different kinds of knowledge.

F. Extensibility

The presented approach is extensible on multiple layers as it provides an explicit integration framework for patterns, idioms, and Planlets through using libraries. New Semi-Automated Management Patterns and Automated Management Idioms can be created based on a uniform Java interface and integrated into the system seamlessly by a simple registration at the library. All patterns and idioms in the library are considered automatically when an application shall be managed by comparing their Topology Fragments with the application's ETG. To extend the system in terms of management technologies, Planlets can be implemented for new node types, relationship types, or Management Annotations and stored in the Planlet Library. The framework's Plan Generator integrates new Management Planlets without further manual effort when processing DASMs.

VI. RELATED WORK

Several works focus on automating application management in terms of application provisioning and deployment. Eilam et al. [27], Arnold et al. [28], and Lu et al. [29] consider pattern-based approaches to automate the deployment of applications. However, their model-based patterns are completely different from the abstract kind of patterns we consider in this paper. In their works, patterns are topology models which are used to associate or derive the corresponding logic required to deploy the combination of nodes and relations described by the topology, similarly to the Annotated Topology Fragments of Management Planlets. Mietzner [30] presents an approach for generating provisioning workflows by orchestrating "Component Flows", which implement a uniform interface to provision a certain component. However, these works focus only on provisioning and deployment and do not support management.

Fehling et al. [1][15] present Cloud Application Management Patterns that consider typical management problems, e.g., the "Stateless Component Swapping Pattern", which was automated in this paper. They propose to attach abstract processes to management patterns that describe the high-level steps of the solution. However, these abstract processes are not executable until they are refined manually for individual use cases. Their management patterns also define requirements in the form of architectural patterns that must be implemented by the application. These dependencies result in a uniform pattern catalog that interrelates abstract management patterns with architectural patterns. In addition, they propose to annotate reusable "Implementation Artifacts" to patterns that may assist applying the pattern, e.g., software artifacts or management processes. Their work is conceptually equal to our approach in terms of using processes to automate pattern application. However, most of the refinement must be done manually, which leads to the drawbacks discussed in Section I: management processes must be created in advance to be executable when needed. In addition, changing application structures caused by other pattern applications, for example migration patterns, lead to outdated processes that possibly result in reimplementation. In Falkenthal et al. [31], we show how reusable solution implementations, e.g., Management Workflows, can be linked with patterns. However, also this approach requires at least one manual implementation of the concrete solution, which is typically tightly coupled to a certain application structure.


Fehling et al. [32] also present a step-by-step pattern identification process supported by a pattern authoring toolkit. This authoring toolkit can be combined with our approach to create patterns and idioms that can be automated afterwards. Thus, the work of Fehling et al. provides the basis for creating AMIs out of patterns captured in natural text following this process.

Reiners et al. [33] present an iterative pattern evolution process for developing patterns. They aim for documenting and developing application design knowledge from the very beginning and continuously developing findings further. Non-validated ideas are documented already in an early stage and have to pass different phases until they become approved design patterns. This process can be transferred to the domain of Cloud Application Management Patterns. Our approach supports this iterative pattern evolution process as it helps to apply captured knowledge easily and quickly to new use cases. Thus, it provides a complementary framework to our approach that enables validating and capturing knowledge.

VII. CONCLUSION AND FUTURE WORK

In this paper, we presented the concept of Automated Management Idioms, which enables automating the refinement of a pattern's abstract solution to a certain use case and its execution by generating executable management workflows. We showed that the approach enables (i) applying the concept of patterns efficiently in the domain of Cloud Application Management through automation, (ii) abstracting the technical complexity of refinement, and (iii) reducing human intervention. The approach enables operators to apply various management patterns fully automatically to individual use cases without the need for detailed technical expertise. The prototypical validation, which extends the Management Planlet Framework, proves the concept's technical feasibility. In future work, we plan to investigate how Automated Management Idioms can be triggered automatically based on occurring events and how multiple patterns can be applied together.

ACKNOWLEDGMENT

This work was partially funded by the BMWi project CloudCycle (01MD11023).

REFERENCES

[1] C. Fehling, F. Leymann, J. Rütschlin, and D. Schumm, "Pattern-based development and management of cloud applications," Future Internet, vol. 4, no. 1, March 2012, pp. 110–141.
[2] G. Hohpe and B. Woolf, Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions. Addison-Wesley, 2003.
[3] A. Nowak et al., "Pattern-driven Green Adaptation of Process-based Applications and their Runtime Infrastructure," Computing, February 2012, pp. 463–487.
[4] C. Alexander, S. Ishikawa, and M. Silverstein, A Pattern Language: Towns, Buildings, Construction. Oxford University Press, 1977.
[5] P. Mell and T. Grance, "The NIST Definition of Cloud Computing," Tech. Rep., July 2009.
[6] M. Armbrust et al., "A view of cloud computing," Communications of the ACM, vol. 53, April 2010, pp. 50–58.
[7] A. B. Brown and D. A. Patterson, "To err is human," in EASY, July 2001, p. 5.
[8] U. Breitenbücher, T. Binz, O. Kopp, and F. Leymann, "Pattern-based runtime management of composite cloud applications," in CLOSER. SciTePress, May 2013, pp. 475–482.
[9] F. Leymann and D. Roller, Production Workflow: Concepts and Techniques. Prentice Hall PTR, 2000.
[10] U. Breitenbücher, T. Binz, O. Kopp, F. Leymann, and J. Wettinger, "Integrated cloud application provisioning: Interconnecting service-centric and script-centric management technologies," in CoopIS. Springer, September 2013, pp. 130–148.
[11] S. Leonhardt, "A generic artifact-driven approach for provisioning, configuring, and managing infrastructure resources in the cloud," Diploma thesis, University of Stuttgart, Germany, November 2013.
[12] F. Leymann, "Cloud Computing: The Next Revolution in IT," in Proc. 52nd Photogrammetric Week, September 2009, pp. 3–12.
[13] C. Fehling, F. Leymann, R. Retter, W. Schupeck, and P. Arbitter, Cloud Computing Patterns: Fundamentals to Design, Build, and Manage Cloud Applications. Springer, January 2014.
[14] D. Oppenheimer, A. Ganapathi, and D. A. Patterson, "Why do internet services fail, and what can be done about it?" in USITS. USENIX Association, June 2003, pp. 1–16.
[15] C. Fehling, F. Leymann, S. T. Ruehl, M. Rudek, and S. Verclas, "Service Migration Patterns - Decision Support and Best Practices for the Migration of Existing Service-based Applications to Cloud Environments," in SOCA. IEEE, December 2013, pp. 9–16.
[16] OMG, Business Process Model and Notation (BPMN), Version 2.0, Object Management Group Std., Rev. 2.0, January 2011.
[17] T. Binz, C. Fehling, F. Leymann, A. Nowak, and D. Schumm, "Formalizing the Cloud through Enterprise Topology Graphs," in CLOUD. IEEE, June 2012, pp. 742–749.
[18] T. Binz, U. Breitenbücher, O. Kopp, and F. Leymann, "Automated Discovery and Maintenance of Enterprise Topology Graphs," in SOCA. IEEE, December 2013, pp. 126–134.
[19] U. Breitenbücher, T. Binz, O. Kopp, F. Leymann, and D. Schumm, "Vino4TOSCA: A visual notation for application topologies based on TOSCA," in CoopIS. Springer, September 2012, pp. 416–424.
[20] Apache Software Foundation. Apache Tomcat 7 API. [Online]. Available: http://tomcat.apache.org/tomcat-7.0-doc/
[21] Amazon Web Services. Elastic Compute Cloud API Reference. [Online]. Available: http://docs.aws.amazon.com/AWSEC2/latest/APIReference
[22] United Domains. Reselling API. [Online]. Available: http://www.ud-reselling.com/api
[23] U. Breitenbücher, T. Binz, O. Kopp, F. Leymann, and M. Wieland, "Policy-Aware Provisioning of Cloud Applications," in SECURWARE. Xpert Publishing Services, August 2013, pp. 86–95.
[24] F. Buschmann, R. Meunier, H. Rohnert, P. Sommerlad, and M. Stal, Pattern-Oriented Software Architecture, Volume 1: A System of Patterns. Wiley, 1996.
[25] OASIS, Web Services Business Process Execution Language (WS-BPEL) Version 2.0, OASIS, April 2007.
[26] OASIS, Topology and Orchestration Specification for Cloud Applications Version 1.0, May 2013.
[27] T. Eilam, M. Elder, A. Konstantinou, and E. Snible, "Pattern-based composite application deployment," in IM. IEEE, May 2011, pp. 217–224.
[28] W. Arnold, T. Eilam, M. Kalantar, A. V. Konstantinou, and A. A. Totok, "Pattern based SOA deployment," in ICSOC. Springer, September 2007, pp. 1–12.
[29] H. Lu, M. Shtern, B. Simmons, M. Smit, and M. Litoiu, "Pattern-based deployment service for next generation clouds," in SERVICES. IEEE, June 2013, pp. 464–471.
[30] R. Mietzner, "A method and implementation to define and provision variable composite applications, and its usage in cloud computing," Dissertation, University of Stuttgart, Germany, August 2010.
[31] M. Falkenthal, J. Barzen, U. Breitenbücher, C. Fehling, and F. Leymann, "From Pattern Languages to Solution Implementations," in PATTERNS. Xpert Publishing Services, May 2014.
[32] C. Fehling, T. Ewald, F. Leymann, M. Pauly, J. Rütschlin, and D. Schumm, "Capturing cloud computing knowledge and experience in patterns," in CLOUD. IEEE, June 2012, pp. 726–733.
[33] R. Reiners, "A pattern evolution process - from ideas to patterns," in Informatiktage. GI, September 2012, pp. 115–118.


A Method for Situational and Guided Information System Design

Dalibor Krleža
Global Business Services
IBM
Miramarska 23, Zagreb, Croatia
[email protected]

Krešimir Fertalj
Department of Applied Computing
Faculty of Electrical Engineering and Computing, University of Zagreb
Unska 3, Zagreb, Croatia
[email protected]

Abstract—Model Driven Architecture is not highly used in current information system development practice. One of the reasons is that modeling languages are mostly used for documenting the information system development and enhancing communication within project teams. Without guidance, an information system design results in models of low quality, which cannot be used for anything more than documenting pieces of the information system. When it comes to model transformation, project team members usually reuse predefined transformations included in a modeling tool. Models of low quality that are not traceable and not complete are hard to transform. Model quality cannot be high if modeling activities are not guided and constrained. Guidance and constraints can be imposed through project activities led by a senior designer responsible for model quality. In this article, a method for situational modeling guidance is presented. This method is adaptable and situation dependent. Implemented within a modeling tool, the method should allow project team members responsible for model quality to give guidance and constraints, and to ensure model quality through the modeling tool.

Keywords: modeling; guidance; design; pattern; transformation.

I. INTRODUCTION

The Model Driven Architecture (MDA), standardized by the Object Management Group (OMG) [1], is an information system design approach based on models and model transformations. Using MDA, an information system is designed through several models of different abstraction levels, from business oriented models to technical and platform specific models. MDA defines three different types of models having different levels: the abstract and business oriented Computational Independent Model (CIM), the technically oriented Platform Independent Model (PIM), and the very detailed Platform Specific Model (PSM).

Model transformation is a key procedure in MDA. According to the specification [1], "model transformation is the process of converting one model to another model". Model transformation can be done manually or automatically. Manual model transformation is more common than we think. It is not unusual for a designer to start modeling from scratch by using models delivered earlier in the project. Such an approach is defined within various design and development methodologies. Chitforoush, Yazdandoost and Ramsin [2] give an overview of MDA-specific methodologies. Most of these methodologies were developed for specific projects. Some generic design and development methodologies, such as the Rational Unified Process (RUP) [3][12], also rely on model based design. However, the basic purpose of methodologies is to help organize projects, giving guidance for project activities and deliverables, while leaving execution to project team members.

When a modeling language is structured and formal enough, automatic transformation can be used. The Meta Object Facility (MOF), standardized by the OMG and described in [5], is a metalanguage for modeling languages that can be transformed automatically. Automatic transformation takes artifacts of a source model and converts them into artifacts of a target model by using a transformation mapping. Transformation can additionally be used to establish relationships between models, or to check consistency of artifacts between a source and a target model. Czarnecki and Helsen [4] elaborate a number of model transformation approaches and basic features of transformation rules. Most used are graph based transformations and transformation languages. Transformation languages can be declarative or imperative. The OMG standardized a group of MOF based transformation languages named Query/View/Transformation (QVT) [7]. The QVT Relational language (QVT-R) is a typical example of a declarative approach with a graphical notation. The QVT Operational language (QVT-O) is an example of an imperative approach.

The focus of this article is the delivery of models of high quality. In order to understand what this means, we can use one of the existing quality models. One example is the quality model given by Lange and Chaudron [8]. Relying only on methodology guidance will not necessarily produce a model of high quality, because it allows designers to focus on wrong aspects and details within the model. The result can be a model of poor quality, problems with traceability, and the inability to transform or analyze the created model. One way to solve these problems is by appointing a senior designer to the design lead role. The design lead responsibilities are to establish modeling guidance and constraints, oversee modeling work, check delivered models and to ensure model quality. According to the quality model [8], this means that all models are traceable, complete, consistent and correspondent to the information system. Establishing modeling constraints means imposing patterns that need to be used during the information system design.


In this article, a method for automated modeling guidance and imposing constraints is proposed. The proposed method utilizes existing specifications such as MDA, MOF, UML and QVT to achieve a guided process of information system design, supported by model transformation. The proposed method will be extensible in order to naturally fit existing design and development methodologies. The purpose of the method is to ease communication between a design lead and his team members and to enable management of an information system design work through usage of a modeling tool.

In Section II, a modeling space is defined. The modeling space is a way to combine all models of an information system together, giving them relationships and defining their purpose. In the same Section, a relationship between pattern instances and models is given. In Section III, current modeling practice in the context of methodologies is discussed, which helps understand how the pattern instances are created during the project. In Section IV, an overview of the pattern instance transformation is given. The pattern instance transformation is essential for the method proposed in this article. In Section V, the tracing and transformation language is defined. This language is used to bind pattern instances together and help to establish tracing between model artifacts. In Section VI, an overview of the method for situational and guided information system design is given.

II. MODELING SPACE

A modeling space can be represented as a three dimensional space containing all possible models of a designed information system. The modeling space must follow the MDA philosophy, support the different levels of abstraction given in the MDA specification, and support classification of the contained models.

Figure 1. Structure of a modeling space

The proposed modeling space presented in Figure 1 is in three dimensions because it contains different layers representing respective aspects or viewpoints of the designed information system. The modeling space contains four layers. The application layer is comprised of models with the business logic. The information layer is comprised of information and data models. Models containing architecture details and infrastructure nodes are placed in the infrastructure layer. And finally, there needs to be a specific layer for transformation and tracing models. Of course, the number of layers and their purpose depend on the set of models representing an information system design. One model can belong to multiple layers. For example, a model containing requirements can easily be considered application, information and infrastructure related. The proposed modeling space must also support a clear distinction between abstract and detailed models. Abstract and computing independent models are placed on top of each layer. Models with more details are closer to the bottom of the layer. Figure 1 shows the placement of different MDA model types in the proposed modeling space.

Figure 2. Models and pattern instances placed in the application layer of the modeling space

Each model is a set of artifacts. These artifacts originate from a modeling language, such as UML. A set of models together represents a design of an information system. However, there are building blocks between a single artifact and a whole model that are meaningful for designers. These building blocks are patterns. An example of repeating patterns within different models is given in Figure 2. CIM1 contains repeating sets of model artifacts that can be interpreted as requirements, CIM2 contains business processes, PIM1 use cases, PIM2 components, and PSM1 implementations of components defined in PIM2.

Which patterns will appear in the modeling space depends on the design of an information system. Computational independent patterns are usually created early in the project and they depend on the used architecture as well as on how business analysis is performed. These high level abstract patterns have the biggest impact on the design of an information system. Platform independent patterns are derived from architecture and computational independent patterns. They represent an elaboration of computational independent patterns within an architectural context. The most detailed are platform specific patterns that represent the implementation of platform independent patterns for a specific infrastructure yielded by the previously determined architecture.

In order to establish the method proposed in this article, a library of modeling patterns and transformations must be established. Modeling patterns can be determined in several different ways.
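The structure of the modeling space described above can be sketched as a simple data structure. The Java types below are illustrative only and follow the article's definitions: a model is a set of artifacts, models are grouped into layers, and one model may belong to several layers.

    // Illustrative sketch of the modeling space; these are not types of any modeling tool.
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    class ModelingSpaceSketch {
        enum Layer { APPLICATION, INFORMATION, INFRASTRUCTURE, TRANSFORMATION_AND_TRACING }

        static class Artifact { final String id; Artifact(String id) { this.id = id; } }

        static class Model {
            final String name;
            final Set<Artifact> artifacts = new HashSet<>();   // a model is a set of artifacts
            Model(String name) { this.name = name; }
        }

        static class ModelingSpace {
            final Set<Model> models = new HashSet<>();             // all models of the designed system
            final Map<Model, Set<Layer>> layers = new HashMap<>(); // one model can span several layers

            void add(Model model, Layer... memberOf) {
                models.add(model);
                layers.computeIfAbsent(model, m -> new HashSet<>());
                for (Layer layer : memberOf) layers.get(model).add(layer);
            }
        }
    }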


Gamma, Helm, Johnson and Vlissides [9] propose a list of basic object-oriented patterns visualized in the UML. Hohpe and Woolf [10] propose a list of enterprise integration patterns. Enterprise integration patterns are more abstract than object-oriented patterns.

Collecting modeling patterns from existing models of an already developed information system is another way. It can be done manually or automatically by detecting repetitions in existing models. Detection itself can be done by the graph matching method [11]. However, this is just a part of a collection process. Rahm et al. [15] propose a graph matching method for detection of cloned fragments in graph based models. According to their definition, repetitive fragments that are similar enough can be considered clones or patterns. A similar approach can be applied to UML models.

And finally, as already mentioned, modeling patterns can be a great way to give a sense of direction and cooperation to a team of designers. A pattern is a class, a blueprint that binds one or more modeling artifacts together. Application of a pattern means its instantiation within at least one model in the modeling space. Applying the modeling pattern does not mean that the modeling is completed. Adding details and further elaboration of the pattern instance is needed, in order to give it enough details to fit an information system design.

Figure 3. Example of a simple modeling pattern: component and interface

Figure 3 presents a pattern that is comprised of an empty interface and a component. After applying this pattern, a pattern instance is created. Further elaboration of the pattern instance must add interface details, operations and attributes, subcomponents and additional interfaces.

III. MAPPING BETWEEN METHODOLOGY AND PATTERN USAGE

CIMs are usually created very early in the project. In the RUP, business models are created in the Inception phase. It means that selecting and applying CIM related patterns and further elaboration can be done very early in the project. These patterns will be classified as functional requirements, non-functional requirements, business processes, or business use cases. The idea is to have these patterns and related transformations ready for use in the modeling library that is used for the project. Elaboration of newly created pattern instances in CIMs can be done in the Inception phase.

PIMs, part of the PSMs, architecture models and infrastructure models are created in the Elaboration phase. In this phase, we do most of the information system design and take the most important decisions. In the Elaboration phase, patterns used in CIMs are guidance for choosing patterns that will be used next. For example, usual patterns that could be used here contain use cases, components and nodes.

The PSM is usually the last step in the design of an information system. The ultimate goal is to get the source code and deployment units. Therefore, the PSM must contain pattern instances that define a sufficient level of details for transformation into the source code, in a way that leaves as little work as possible for programmers. Pattern instances in the PSM are mostly implementations of pattern instances in the PIM. For example, in the Component Based Modeling (CBM), the PSM contains platform specific implementations of components defined in the PIM.

As the design of an information system advances through the project, designers can create new pattern instances or elaborate existing ones, as presented in Figure 4. A new pattern instance can be created to document a business need, to reflect already existing functionality that will be reused, or by transforming from an already existing pattern instance in the modeling space. Transformation between pattern instances will probably be the most used option. Elaboration of the existing pattern instances is also very important. Once a new pattern instance has been created, it must be elaborated in subsequent project activities.

IV. PATTERN INSTANCE TRANSFORMATION

In the MDA specification [1], various different model-to-model transformation examples can be found. Transformation can be done within the same model, between two different models, for model aggregation, or for model separation. Grunske et al. [6] present the important notion of "horizontal" and "vertical" transformations. Horizontal transformation is done between models of the same abstraction level. A typical horizontal transformation is PIM to PIM, or PSM to PSM. Any transformation within the same model is also a horizontal transformation.

Figure 4. RUP and advancement through a design of an information system


Vertical transformation is done between models of different abstraction levels, or from a model to the source code. A transformation from PIM to PSM, or from PSM to the source code, is a vertical transformation.

Model transformation is the procedure for translating a source model into a target model. A modeling space can be defined as a finite set of models MS = {M1, M2, …, Mn}. Each model is a finite set of artifacts Mi = {a1, a2, …, am}. A transformation is a function tr: MS → MS that takes a set of artifacts arSo from a set of source models So ⊆ MS such that arSo ⊆ So, analyses this set of artifacts and translates them into another set of artifacts arTa in a set of target models Ta ⊆ MS, such that arTa ⊆ Ta. Transformation can be done within the same model (So = Ta = Mi), or between two disjunctive sets of models (So ≠ Ta). Since a transformation can have multiple models on the source and target side, these sets do not need to be disjunctive (So ∩ Ta ≠ ∅), meaning that the transformation can include the same model Mi on the source and target side, or Mi ∈ So ∧ Mi ∈ Ta. A transformation can use the same source and target artifacts, meaning that arSo ∩ arTa ≠ ∅ when So ∩ Ta ≠ ∅, or it can use two disjunctive sets of artifacts, arSo ∩ arTa = ∅.

From a pattern point of view, each pattern instance is a set of model artifacts. This definition is valid for cross-model pattern instances as well. All pattern instances in the modeling space MS form a finite set of pattern instances MP = {Pi : 0 < i ≤ m ∧ Pi ⊆ MS}. In this context, a transformation is a function tr: MP → MP. Such a transformation takes a set of source pattern instances pSo ⊆ MP, analyses all artifacts in these instances and translates them into artifacts that form a set of target pattern instances pTa ⊆ MP. Figure 5 shows an example of transformation application to a cross-model pattern instance.

Figure 5. Cross model pattern instance and transformation

Every transformation can be encapsulated in a black box implementation. Such an approach is used in [7] along with the QVT specification. However, taking a step back and observing the QVT specification as one of the transformation approaches, every transformation can be defined as a black box having an interface that depends on the context of transformation usage.

A. Transformation rules

Czarnecki and Helsen [4] give important features of transformation rules. Since a transformation can be implemented in many different ways, we must observe it in a more abstract and generic way. No matter whether we use a declarative or an imperative approach, each transformation is a set of rules that creates a relationship between a set of source artifacts and a set of target artifacts. Features and principles given in [4] can be applied to these transformation rules.

A transformation written in QVT-R [7] has two different modes: checking mode and enforcement mode. In the checking mode, transformation rules can be used to validate correctness and completeness of involved pattern instances. In the enforcement mode, transformation rules can be used for creating, updating, or deleting artifacts in target pattern instances, in order to reflect all the details found in source pattern instances.

1) Validation of pattern instances and imposing constraints

When the transformation is applied, execution of the transformation must perform several different tasks.

As the first step, the transformation must validate that the supplied source pattern instances match the expected source side of the transformation. A set of mandatory transformation rules must validate the source pattern instances. If all mandatory transformation rules are satisfied from the source side, then the transformation can be applied to the supplied source, i.e., the transformation can be applied to the source pattern instance that contains all artifacts needed by the mandatory transformation rules.

The second step is the creation of the target pattern instances. Transformation rules must create all target pattern instances and their artifacts. Every pattern is characterized by the mandatory artifacts that define the essence of the pattern, or what makes this pattern different from other patterns. Not all artifacts created by the transformation must be considered mandatory. Mandatory artifacts in the target pattern instances are created by the mandatory transformation rules. However, not all mandatory transformation rules must create mandatory artifacts in the target pattern instances.

The last step is to create a set of constraints that will disallow designers to change some of the artifacts in the involved pattern instances. Transformation binds involved pattern instances together by imposing constraints on their artifacts. Each pattern instance can be bound with other pattern instances through several different transformations. Constraints are imposed by the mandatory transformation rules.

Imposed constraints are used to limit designer changes in the modeling space to prevent:
1. Violating correctness and completeness of the pattern instances by changing their mandatory artifacts. Obviously, all mandatory artifacts must be constrained.
2. Breaking the transformation binding by changing artifacts that satisfy the source and target side of the mandatory transformation rules. In this case, constrained artifacts do not need to be mandatory.

If we observe a target pattern instance made of l artifacts, Pi = {a1, a2, …, al}, a subset MPi ⊆ Pi is considered the set of mandatory artifacts of the pattern instance Pi. If we have a finite set of applied transformations TPi = {tr1, tr2, …, trk} having Pi as an involved pattern instance, we can derive a mapping function C: TPi → X, where X ⊆ Pi is the set of artifacts in Pi constrained by a transformation trx ∈ TPi.
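The three tasks performed when a transformation is applied can be outlined as follows. The sketch uses illustrative types (PatternInstance, MandatoryRule, Transformation) and leaves the creation of target instances and constraints abstract; it is not a QVT implementation.

    // Sketch of applying a transformation: validate mandatory source rules, create targets,
    // impose constraints. All types are illustrative assumptions.
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;
    import java.util.function.Predicate;

    class TransformationSketch {
        static class PatternInstance {
            final Set<String> artifacts = new HashSet<>();             // artifacts forming the instance
            final Set<String> constrainedArtifacts = new HashSet<>();  // artifacts locked by transformations
        }

        interface MandatoryRule extends Predicate<Set<PatternInstance>> {}

        static class Transformation {
            final List<MandatoryRule> mandatorySourceRules = new ArrayList<>();

            // Step 1: the supplied source must satisfy every mandatory source rule.
            boolean isApplicable(Set<PatternInstance> sourceInstances) {
                return mandatorySourceRules.stream().allMatch(rule -> rule.test(sourceInstances));
            }

            // Steps 2 and 3: create the target pattern instances and impose constraints on them.
            Set<PatternInstance> apply(Set<PatternInstance> sourceInstances) {
                if (!isApplicable(sourceInstances)) {
                    throw new IllegalStateException("mandatory source rules are not satisfied");
                }
                Set<PatternInstance> targets = createTargets(sourceInstances);   // Step 2
                imposeConstraints(sourceInstances, targets);                     // Step 3
                return targets;
            }

            Set<PatternInstance> createTargets(Set<PatternInstance> sources) { return new HashSet<>(); }
            void imposeConstraints(Set<PatternInstance> sources, Set<PatternInstance> targets) { }
        }
    }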


In the context of the previous definition about the difference between mandatory and constrained artifacts, we can conclude that Pi can be in a situation where C(trj) ∩ C(trk) = ∅ ∧ j ≠ k, and C(trj) = MPi ∧ C(trk) ∩ MPi = ∅. Finally, MPi ⊆ C(tr1) ∪ C(tr2) ∪ … ∪ C(trk) means that not all constrained artifacts need to be mandatory, but all mandatory artifacts are constrained, since we want to preserve the pattern definition.

Each pattern instance can be a result of several different pattern instances done earlier in the same project, or it can be a reason for creating several new pattern instances later in the same project. Several good examples can be found in [9]: a facade associated with a web service client can be used as a mediator between two different subsystems. In this example, the mediator is the pattern whose instance is bound by two different transformations.

2) Pattern instance elaboration

A transformation can be used to perform changes on involved pattern instances. This approach is used when new pattern instances are created, or existing instances are updated or deleted. Even when two pattern instances are bound with a transformation, the source pattern instance can be elaborated by adding new details and artifacts. A transformation can be made so that these newly added details automatically update target pattern instances. Artifacts that are not constrained by one of the binding transformations are handled by optional transformation rules responsible for spreading elaboration details. Bidirectionality is a very important transformation aspect described in [7] and [13]. While a transformation might constrain changes of some artifacts in target pattern instances, changes of unconstrained artifacts in pattern instances across the modeling space are encouraged. Such changes must be propagated throughout the modeling space, wherever transformation between pattern instances allows it. This propagation must be automatic and seamless.

3) Top-level pattern instances

Top-level pattern instances do not have predecessors. These pattern instances can be modeled manually by a designer without using any transformation, or they can be created by using a transformation. Such a transformation does not need to have input source pattern instances. In order to give the transformation some instructions, input parameters can be used. Transformations that create only target pattern instances can be used both for validation and enforcement purposes. All transformation rules in this transformation are mandatory transformation rules that create an initial version of target pattern instances and impose constraints on them. However, these constraints must allow elaboration of newly created top-level pattern instances in order to allow adding needed details. Functional or non-functional requirements are typical examples of top-level patterns. An external service definition is another example of such a pattern.

In the example in Figure 6, pattern instance P1 is made of CodebookComponent and the related interface. All mandatory artifacts are marked with red color. Artifacts added in the elaboration of P1 are marked with brown color. Mandatory artifacts within P1 are all we need to declare a component. Transformation tr1 mandatory rules are responsible for translation of mandatory artifacts from P1 to P3.
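As a rough illustration of the division of labor between mandatory and optional rules in this example, the following sketch translates the defining artifacts of P1 and propagates artifacts added during elaboration; all artifact names on the target side are hypothetical and only hint at Figure 6.

    // Illustrative sketch of mandatory vs. optional rules; target-side names are hypothetical.
    import java.util.HashSet;
    import java.util.Set;

    class ElaborationPropagationSketch {
        // Mandatory rules translate the defining artifacts of P1 into the defining artifacts of P3.
        static Set<String> mandatory(Set<String> p1) {
            Set<String> out = new HashSet<>();
            if (p1.contains("CodebookComponent")) out.add("CodebookFacade");
            if (p1.contains("CodebookInterface")) out.add("CodebookInterfaceRealization");
            return out;
        }

        // Optional rules propagate artifacts added later during elaboration (e.g., subcomponents).
        static Set<String> optional(Set<String> p1) {
            Set<String> out = new HashSet<>();
            for (String artifact : p1) {
                if (artifact.endsWith("Subcomponent")) out.add(artifact + "Class");
            }
            return out;
        }

        public static void main(String[] args) {
            Set<String> p1 = new HashSet<>(Set.of("CodebookComponent", "CodebookInterface",
                                                  "ProductCodebookSubcomponent"));
            Set<String> p3 = new HashSet<>();
            p3.addAll(mandatory(p1));   // constrained, defining artifacts of the target instance
            p3.addAll(optional(p1));    // re-running after elaboration propagates the new details
            System.out.println(p3);
        }
    }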

Figure 6. Example of pattern instance transformation


The artifact created within P3 is an EJB3 facade as a realization of CodebookComponent. Transformation tr1 also created relationships between mandatory artifacts of the P3 and P1 pattern instances. An information system designer additionally elaborated P1 and added ProductCodebookSubcomponent and ClientCodebookSubcomponent together with related interfaces. Transformation tr1 optional rules translated these subcomponents from P1 into classes of P3 and created new relationships between optional artifacts of P3 and P1. P1 and P3 are bound with transformation tr1. It means that mandatory artifacts of P3 are constrained by tr1 and cannot be changed unless corresponding artifacts in P1 are changed. Introducing relationships between P3 and P1, such as interface realization, simplifies the propagation of interface changes between these two pattern instances. Propagation of interface changes is a matter of transformation tr1!

The information system designer's final decision was to reuse the existing service to read product data from a product catalog information system. P2 is the pattern instance that represents the product catalog service provider. This pattern instance can be created by another transformation from WSDL. Transformation tr2 is used to translate mandatory artifacts from P2, representing the service provider, into the set of artifacts for P3 representing the product catalog service consumer. P3 must be further elaborated in order to connect the CodebookComponent realization with the product catalog service consumer.

4) Transformation applicability

As already defined, a transformation takes a set of modeling space artifacts and translates them into another set of artifacts. The earlier definition shows that the transformation can include pattern instances as artifact containers. The size of a pattern instance can be one artifact, up to a whole model. A pattern instance can also be a set of artifacts coming from different models within the modeling space. In order to use a transformation, its source side must be satisfied. Precisely, the source side of the mandatory transformation rules must be satisfied in order for the transformation to be able to create a set of target artifacts and impose constraints on them. If the transformation is applied to a set of pattern instances and the set of source pattern instances satisfies the source side of the mandatory transformation rules, the transformation is applicable to this set of pattern instances.

Transformations and related transformation rules, especially if they are written in a declarative programming language such as QVT-R, are logic programs [14]. A transformation tr can be defined as a logic program P, comprised of mandatory and optional sets of rules on the source and target side. Applicability of a transformation can be derived only from the source mandatory rules. If we take a finite set of the mandatory source rules MSR = {msr1(X), msr2(X), …, msrn(X)}, where X = pSo is a set of terms, then the applicability of the transformation can be expressed as A(X) ← msr1(X) ∧ msr2(X) ∧ … ∧ msrn(X). Each mandatory source rule is comprised of atoms for checking artifacts within a source pattern instance set: msri(X) ← a1(y1, X) ∧ a2(y2, X) ∧ … ∧ am(ym, X), where yi ∈ X. The applicability defined this way can only determine whether a transformation can be applied to a set of source pattern instances or not. Another way is to define a measure of the applicability by expressing the percentage of mandatory transformation rules that are satisfied. A finite set of satisfied rules is MSRS = {r(X) ∈ MSR : r(X) = true} ⊆ MSR. The measure of the applicability can be defined as Am = |MSRS| / |MSR| * 100, i.e., the percentage of satisfied mandatory source transformation rules. This measure can help a designer to see which transformations in the modeling library are close to being applicable and what the differences are. Consulting the measure of the transformation applicability is one aspect of the design guidance.

Validation of involved pattern instances is a similar concept to the applicability of the transformation, but it must involve both source and target pattern instances.

V. TRANSFORMATION AND TRACING LANGUAGE

A relationship between model artifacts and a pattern instance is not established within the UML. Although there is the Package element defined within the UML, its purpose is not the same as that of the "pattern instance" defined earlier in this article. Also, application of a transformation and imposing constraints on target pattern instances must leave some trail. Creation of a Transformation and Tracing Model (TTM), automatically or manually, can help to resolve the before mentioned issues. Every time a new pattern instance is created, a new artifact is added into the TTM representing this pattern instance. All model artifacts belonging to this pattern instance are automatically bound to it. It can be the result of a transformation, or it can be done manually, meaning that a modeling tool must have capabilities for it. Also, each time a transformation is used, this transformation is added to the TTM, including all relationships between pattern instances and the used transformation. Each time a transformation is used, and this transformation is imposing constraints on involved pattern instances, these constraints are added to pattern instances in the TTM and bound to the transformation that created them, since these constraints are the result of the transformation. In order to do this modeling, a Transformation and Tracing Language (TTL) must be defined. The UML and the TTL must be compatible, meaning that they must have a common M0 ancestor [13]. Therefore, the TTL must be a MOF metamodel. An overview of the TTL is presented in Figure 7.
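The measure of applicability Am can be computed directly from the mandatory source rules, as in the following sketch, where a rule is modeled as a predicate over the set of source pattern instances (reduced here to sets of artifact names); the types are illustrative assumptions, not part of any modeling tool.

    // Sketch of Am = |MSRS| / |MSR| * 100 over illustrative rule and instance representations.
    import java.util.List;
    import java.util.Set;
    import java.util.function.Predicate;

    class ApplicabilityMeasure {
        static double measure(List<Predicate<Set<Set<String>>>> mandatorySourceRules,
                              Set<Set<String>> sourceInstances) {
            if (mandatorySourceRules.isEmpty()) return 100.0;
            long satisfied = mandatorySourceRules.stream()
                    .filter(rule -> rule.test(sourceInstances)).count();   // |MSRS|
            return 100.0 * satisfied / mandatorySourceRules.size();        // Am = |MSRS| / |MSR| * 100
        }
    }

For example, with five mandatory source rules of which four are satisfied, the sketch returns Am = 80, indicating a nearly applicable transformation and hinting at the remaining gap.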


Figure 7. TTL definition

The TTL has the following elements:

• Pattern - A pattern type. Allows classification of pattern instances.
• PatternInstance - An element similar to the UML Package element. Represents a container for model artifacts. This element is defined by its name and type. The pattern type (or class) can be very helpful when constructing transformation rules, and it can impact the transformation applicability, since transformations can be applied to pattern instances of specific types.
• Transformation - An element defined by its name and type, representing an applied transformation. It contains the transformation rules used in the transformation. The transformation must be connected to a set of source and target pattern instances, being connected to at least one target pattern instance. Connector direction is determined by the TransformationConnectorType enumeration.
• TransformationConnector, TransformationConnectorEnd, PatternConnectorEnd - A connector is a directed relationship between a pattern instance and a transformation. Connector direction must have a visual notation. If the connector is directed from the pattern instance to the transformation, it represents the source pattern instance in the context of the transformation. If the connector is directed from the transformation to the pattern instance, it represents the target pattern instance in the context of the transformation. Connector end elements represent the point of contact between the connector and the pattern instance, or the connector and the transformation.
• TransformationConstraint - An element defined by its name, representing a constraint on members of a pattern instance imposed by the used transformation. This element is contained by the pattern instance and connected to the transformation responsible for the creation of the constraint. This element is the result of the transformation and can be used to validate the pattern instance's correctness and completeness.
• TransformationConstraintConnector - A relationship between a resulting constraint and the transformation that created it, directed from the transformation to the constraint. Each constraint can be imposed by only one transformation, but one transformation can impose multiple constraints within multiple pattern instances.

Figure 8. TTM example

In the TTM example in Figure 8, brown artifacts were created before tr1 was applied. We can say that pattern instances p1 and p2 were designed manually. Green artifacts are produced by the transformation tr1. Actions taken during an information system design are automatically stored in a TTM for multiple purposes: preserving correctness and completeness of the modeling space, reconstruction of activities in the design process, and analysis of the resulting design work.
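
The TTL itself is defined as a MOF metamodel, not as program code. Purely as an informal illustration of the elements listed above and of how a TTM could record a transformation application, a sketch of one possible in-memory encoding follows; all names beyond the TTL element names are assumptions of this sketch.

from dataclasses import dataclass, field
from enum import Enum
from typing import List

class TransformationConnectorType(Enum):
    SOURCE = "source"   # connector directed from pattern instance to transformation
    TARGET = "target"   # connector directed from transformation to pattern instance

@dataclass
class TransformationConstraint:
    name: str
    imposed_by: "Transformation"   # plays the role of TransformationConstraintConnector

@dataclass
class PatternInstance:
    name: str
    pattern_type: str              # the Pattern element used for classification
    artifacts: List[str] = field(default_factory=list)
    constraints: List[TransformationConstraint] = field(default_factory=list)

@dataclass
class TransformationConnector:
    pattern_instance: PatternInstance
    direction: TransformationConnectorType

@dataclass
class Transformation:
    name: str
    type: str
    connectors: List[TransformationConnector] = field(default_factory=list)

@dataclass
class TTM:
    pattern_instances: List[PatternInstance] = field(default_factory=list)
    transformations: List[Transformation] = field(default_factory=list)

    def record_transformation(self, tr: Transformation,
                              constraints: List[TransformationConstraint]) -> None:
        # Store the applied transformation and attach the constraints it imposed
        # to its target pattern instances, bound to the transformation that created them.
        self.transformations.append(tr)
        for connector in tr.connectors:
            if connector.direction is TransformationConnectorType.TARGET:
                connector.pattern_instance.constraints.extend(constraints)
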


VI. GUIDANCE

So far, this article has given only insights into the elements needed to establish the method for situational information system design guidance. How do we explain to designers what the preferred design practice is and what an information system design should look like? Many companies have well-established design practices, from the methodology, project activities, and modeling points of view. The selection of architectures and technology, together with practical experience, gives a company a starting point. The method proposed in this article simply takes this experience and allows the company to document their design practices within a modeling tool.

A. Guidance given through a modeling library

We already mentioned that a modeling library is comprised of patterns and transformations. Since a transformation binds two pattern instances together (as described in Section III), the selection of a transformation imposes a selection of the involved patterns. Similarly, a selection of patterns imposes a selection of potentially applicable transformations.

Applicability and the measure of applicability are important transformation features that can be used to give guidance. A designer can elaborate a model or pattern instance and occasionally check for transformations that are applicable to the model or pattern instance he is working on. If there is no transformation currently applicable, the designer can check transformations that are nearly applicable and the gap that needs to be closed in the model or pattern instance in order for this nearly applicable transformation to become applicable. Of course, many designers have enough experience to know which transformation would need to be used next even before modeling of the pattern instance is finished. If there is a problem with the selected transformation and the rules in the transformation are not correct, meaning that the transformation will never become applicable, this particular transformation can be changed as part of the company's design practice evolution.

As already stated before, some transformations can be used exclusively to create new top-level pattern instances. Such transformations are used to create mandatory and optional artifacts in the target pattern instances. Optional artifacts initially created in the target pattern instance can fulfill requirements for the next transformation to become applicable. Further elaboration of this pattern instance can add the needed details. This means that applying a transformation can result in a chain of transformations and the creation of new pattern instances, if a modeling tool is permitted to execute applicable transformations automatically.

Another way is a selection of transformations from the modeling library used in the project. A design lead can manage a set of allowed transformations for his project, limiting the designer's choice of applicable or nearly applicable transformations. For example, the architectural decision to use JAX-WS web services will influence the choice of transformations for the project. Similarly, the design lead can manage a set of allowed patterns implicitly by imposing a set of allowed transformations.
B. Guidance given through a model

More specific guidance can be given through a specific model that predetermines the patterns and transformations used in an information system design process. Such a model is created a priori, before the start of the design activities. Creation of the guidance model is an ongoing activity through the whole project. The TTL can be used for this purpose. This model must represent a selection of allowed pattern types and related transformations. Such a model can be used by the designer to check guidance, or directly by a modeling tool for the selection of the allowed transformation list for a particular pattern type. It is the same approach as in the previous section, with additional visualization of the selected design practice for the ongoing project.
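
As a hypothetical sketch of the guidance described in this section (no existing modeling tool API is implied), a tool could combine the design lead's set of allowed transformations with the applicability measure Am to suggest what a designer may apply next. The function name and the threshold for "nearly applicable" are assumptions of this sketch.

from typing import Dict, List, Tuple

# Assumed inputs: applicability measures (Am, in percent) already computed per
# transformation for the pattern instance the designer is working on, and the
# set of transformation names the design lead allows for the project.
def guidance(applicability: Dict[str, float],
             allowed: List[str],
             nearly_applicable_threshold: float = 75.0
             ) -> Tuple[List[str], List[Tuple[str, float]]]:
    """Return (applicable, nearly_applicable) transformations, restricted to
    the allowed set and ordered by how close they are to being applicable."""
    applicable = [name for name in allowed
                  if applicability.get(name, 0.0) == 100.0]
    nearly = sorted(
        ((name, applicability.get(name, 0.0)) for name in allowed
         if nearly_applicable_threshold <= applicability.get(name, 0.0) < 100.0),
        key=lambda item: item[1], reverse=True)
    return applicable, nearly

# Example: tr2 is applicable, tr1 is nearly applicable, tr9 is not allowed for the project.
measures = {"tr1": 80.0, "tr2": 100.0, "tr9": 100.0}
print(guidance(measures, allowed=["tr1", "tr2"]))
# (['tr2'], [('tr1', 80.0)])
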


VII. CONCLUSION AND FUTURE WORK

MDA has two major practical problems: designers have too much freedom while creating information system design models, and the transformation scope can be very ambiguous. The usage of a pattern as the main building block for an information system design is a well-known approach. In the context of this article, the design of an information system is done block by block, allowing the design lead to choose the blocks to be used. Such an approach allows the design team to use past positive experience to select or define the best patterns for the information system they are designing. Another very important element of this method is the usage of transformations to create pattern instances. Transformations must be perceived as the behavioral part of the method. Applicability and the measure of applicability are very important features of the transformation given in this article. They enable controlled application of transformations, which represents guidance for the design team.

Of course, designers are still free to model according to their preferences, as long as they stay within the boundaries imposed by the proposed method, which is assured by the optional part of each transformation helping the team to keep artifacts of bound pattern instances synchronized. The bidirectionality feature of the transformation helps to reflect changes in both directions. A chain of pattern instances can be easily updated through the transformations used to form the chain. Since a pattern instance is supposed to have a significantly smaller scope than a model, keeping several pattern instances synchronized during elaboration should be much easier than with big models.

Current modeling tools introduce a high level of automation. This automation is mostly related to elements of the modeling languages supported by a modeling tool. Changing the modeling tool behavior to follow the model in a modeling space is a needed feature.

This article gives only the main idea, which can be significantly improved and extended. There are still opportunities for improvement of the mapping between the proposed method and methodologies. Also, the TTL defined in this article can be extended with elements for interaction with a modeling tool, model analysis capabilities, and model quality assessment. Interaction between a TTM and a modeling tool can be extended with modeling events, allowing a design lead to define modeling tool actions associated with patterns and transformations. For example, a TTM can include an event handler on a pattern that can be triggered by the modeling tool when a new subcomponent is added into a pattern instance. The event handler initiates execution of a specific transformation that automatically adds an interface and an interface realization relationship for this newly added subcomponent.
REFERENCES

[1] OMG, "MDA Guide, Version 1.0.1," Object Management Group, 2003.
[2] F. Chitforoush, M. Yazdandoost, and R. Ramsin, "Methodology support for the model driven architecture," Proceedings of the 14th Asia-Pacific Software Engineering Conference, IEEE, Dec. 2007, pp. 454-461, doi: 10.1109/ASPEC.2007.58.
[3] I. Jacobson, G. Booch, and J. E. Rumbaugh, "The unified software development process - the complete guide to the unified process from the original designers," Addison-Wesley, 1999.
[4] K. Czarnecki and S. Helsen, "Classification of model transformation approaches," Proceedings of the 2nd OOPSLA Workshop on Generative Techniques in the Context of the Model Driven Architecture, vol. 45, no. 3, Oct. 2003, pp. 1-17.
[5] OMG, "Core Specification, Version 2.4.1," Object Management Group, 2011.
[6] L. Grunske, L. Geiger, A. Zündorf, N. Van Eetvelde, P. Van Gorp, and D. Varro, "Using graph transformation for practical model-driven software engineering," Model-driven Software Development, Springer Berlin Heidelberg, 2005, pp. 91-117, doi: 10.1007/3-540-28554-7_5.
[7] OMG, "Meta Object Facility (MOF) 2.0 Query/View/Transformation (QVT), Version 1.1," Object Management Group, 2011.
[8] C. F. J. Lange and M. R. V. Chaudron, "Managing model quality in UML-based software development," 13th IEEE International Workshop on Software Technology and Engineering Practice, IEEE, Sep. 2005, pp. 7-16, doi: 10.1109/STEP.2005.16.
[9] E. Gamma, R. Helm, R. Johnson, and J. Vlissides, "Design patterns: Elements of reusable object-oriented software," Addison-Wesley, 28th edition, 2004.
[10] G. Hohpe and B. Woolf, "Enterprise Integration Patterns," Addison-Wesley, 2004.
[11] M. Gupta, R. Singh Rao, and A. Kumar Tripathi, "Design pattern detection using inexact graph matching," 2010 International Conference on Communication and Computational Intelligence, IEEE, Dec. 2010, pp. 211-217.
[12] P. Kroll and P. Kruchten, "The rational unified process made easy: a practitioner's guide to the RUP," Addison-Wesley, 2003.
[13] A. G. Kleppe, J. B. Warmer, and W. Bast, "MDA explained, the model driven architecture: Practice and promise," Addison-Wesley, 2003.
[14] A. Van Gelder, K. A. Ross, and J. S. Schlipf, "The well-founded semantics for general logic programs," Journal of the ACM (JACM), vol. 38, no. 3, Jul. 1991, pp. 619-649, doi: 10.1145/116825.116838.
[15] N. H. Pham, H. A. Nguyen, T. T. Nguyen, J. M. Al-Kofahi, and T. N. Nguyen, "Complete and accurate clone detection in graph-based models," Proceedings of the 31st International Conference on Software Engineering, IEEE, May 2009, pp. 276-286, doi: 10.1109/ICSE.2009.5070528.


An Analytic Evaluation of the SaCS Pattern Language – Including Explanations of Major Design Choices

André Alexandersen Hauge, Institute for energy technology, Halden, Norway / University of Oslo, Norway, [email protected]
Ketil Stølen, SINTEF ICT, Oslo, Norway / University of Oslo, Norway, [email protected]

Abstract—In this paper, we present an analytic evaluation of the Safe Control Systems (SaCS) pattern language for the development of conceptual safety designs. By a conceptual safety design we mean an early stage specification of system requirements, system design, and safety case for a safety critical system. The SaCS pattern language may express basic patterns on different aspects of relevance for conceptual safety designs. SaCS may also be used to combine basic patterns into composite patterns. A composite pattern may be instantiated into a conceptual safety design. A framework for evaluating modelling languages is used to conduct the evaluation. The quality of a language is within the framework expressed by six appropriateness factors. A set of requirements is associated with each appropriateness factor. The extent to which these requirements are fulfilled is used to judge the quality. We discuss the fulfilment of the requirements formulated for the SaCS language on the basis of the theoretical, technical, and practical considerations that were taken into account and shaped the SaCS language.

Keywords–pattern language, analytic evaluation, design conceptualisation, safety.

I. INTRODUCTION

A pattern describes a particular recurring problem that arises in a specific context and presents a well-proven generic scheme for its solution [1]. A pattern language is a language for specifying patterns making use of patterns from a vocabulary of existing patterns and defined rules for combining these [2]. A safety critical system [3] is a system "whose failure could result in loss of life, significant property damage, or damage to the environment". With a conceptual safety design we mean an early stage specification of system requirements, system design, and safety case for a safety critical system. The Safe Control Systems (SaCS) pattern language has been designed to facilitate the specification of patterns to support the development of conceptual safety designs. The intended users of SaCS are system engineers, safety engineers, hardware and software engineers.

This paper conducts an analytic evaluation of the suitability of the SaCS pattern language for its intended task. A framework for analysing languages known as the semiotic quality framework (SEQUAL) [4] is used as a basis for the evaluation. The appropriateness of a language for its intended task is in the framework characterised by six appropriateness factors [4]: domain, modeller, participant, comprehensibility, tool, and organisational. A set of requirements is presented for each appropriateness factor in order to characterise more precisely what is expected from our language in order to be appropriate. The requirements represent the criteria for judging what is appropriate of a language for conceptual safety design, independent of SaCS being appropriate or not. We motivate our choices and discuss to what extent the requirements are fulfilled.

The remainder of this article is structured as follows: Section II provides a short introduction to the SaCS pattern language. Section III discusses evaluation approaches and motivates the selection of SEQUAL. Section IV motivates the selection of requirements and conducts an evaluation of the SaCS language with respect to these requirements for each appropriateness factor. Section V presents related work on pattern-based development. Section VI draws the conclusions.
II. BACKGROUND ON THE SACS PATTERN LANGUAGE

Fig. 1 defines a composite pattern according to the syntax of SaCS [5]. The composite described in Fig. 1 is named Safety Requirements and consists of the basic patterns Hazard Analysis, Risk Analysis, and Establish System Safety Requirements. The basic patterns are specified separately in a structured manner comparable to what can be found in the literature [1][2][6][7][8][9][10][11] on patterns.

In Fig. 1, the horizontal line separates the declaration part of the composite pattern from its content. The icon placed below the identifier Safety Requirements signals that this is a composite pattern. Every pattern in SaCS is parametrised. An input parameter represents the information expected to be provided when applying a pattern in a context. An output parameter represents the expected outcome of applying a pattern in a given context. The inputs to Safety Requirements are listed inside square brackets to the left of the icon, i.e., ToA and Haz. The arrow pointing towards the brackets symbolises input. The output of the pattern is also listed inside square brackets, but on the right-hand side of the icon, i.e., Req. The arrow pointing away from the brackets symbolises output. An icon placed adjacent to a parameter identifier denotes its type. The parameters ToA, Haz, HzLg, and Risks in Fig. 1 are of type documentation, while Req is of type requirement. The inputs and outputs of a composite are always publicly accessible.


Figure 1. A composite pattern named Safety Requirements

A particular instantiation of a parameter is documented by a relation that connects a parameter with its associated development artefact. In Fig. 1, a grey icon placed adjacent to an identifier of a development artefact classifies what kind of artefact is referenced. A dotted line connecting a parameter with an artefact represents an instantiates relation. Instantiations of parameters expressed in Fig. 1 are:

• the document artefact System and Context Description instantiates ToA.
• the document artefact System Hazards Description instantiates Haz.
• the requirement artefact Safety Requirements Specification instantiates Req.
• the document artefact Hazard Log instantiates HzLg.
• the document artefact Risk Assessment instantiates Risks.

A one-to-many relationship exists between inputs in the declaration part of a composite and similarly named inputs with public accessibility (those pointed at by fat arrows) in the content part. The relationship is such that when ToA of Safety Requirements is instantiated (i.e., given its value by the defined relation to System and Context Description) then every correspondingly named input parameter contained in the composite is also similarly instantiated. A one-to-one relationship exists between an output parameter in the declaration part of a composite and a correspondingly named output parameter with public accessibility (those followed by a fat arrow) in the content part. The relationship is such that when Req of Establish System Safety Requirements is produced then Req of Safety Requirements is similarly produced.

The arrows (thin arrows) connecting basic patterns in the content part of Safety Requirements represent two instances of an operator known as the assigns relation. The assigns relations within Safety Requirements express that:

• The output HzLg of the pattern Hazard Analysis is assigned to the input Haz of the pattern Risk Analysis.
• The output Risks of the pattern Risk Analysis is assigned to the input Risks of the pattern Establish System Safety Requirements.

That the three basic patterns are process patterns follows from the icon below their respective identifiers. There are six different kinds of basic patterns in SaCS, each represented by a specific icon.
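
SaCS is a graphical language and the paper defines no programmatic representation of patterns; purely to illustrate the structure just described, the Safety Requirements composite, its basic patterns, and its instantiates and assigns relations could be encoded as data roughly as follows. All type and field names are assumptions of this sketch.

from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class BasicPattern:
    name: str
    kind: str                                   # e.g., a process assurance pattern
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)

@dataclass
class CompositePattern:
    name: str
    inputs: List[str]
    outputs: List[str]
    patterns: List[BasicPattern]
    # assigns: (source pattern, output parameter) -> (target pattern, input parameter)
    assigns: List[Tuple[Tuple[str, str], Tuple[str, str]]] = field(default_factory=list)
    # instantiates: parameter -> development artefact
    instantiates: Dict[str, str] = field(default_factory=dict)

safety_requirements = CompositePattern(
    name="Safety Requirements",
    inputs=["ToA", "Haz"],
    outputs=["Req"],
    patterns=[
        BasicPattern("Hazard Analysis", "process", ["ToA", "Haz"], ["HzLg"]),
        BasicPattern("Risk Analysis", "process", ["ToA", "Haz"], ["Risks"]),
        BasicPattern("Establish System Safety Requirements", "process",
                     ["ToA", "Risks"], ["Req"]),
    ],
    assigns=[
        (("Hazard Analysis", "HzLg"), ("Risk Analysis", "Haz")),
        (("Risk Analysis", "Risks"),
         ("Establish System Safety Requirements", "Risks")),
    ],
    instantiates={
        "ToA": "System and Context Description",
        "Haz": "System Hazards Description",
        "Req": "Safety Requirements Specification",
        "HzLg": "Hazard Log",
        "Risks": "Risk Assessment",
    },
)
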


III. EVALUATION FRAMEWORK

Mendling et al. [12] describe two dominant approaches in the literature for evaluating the quality of modelling approaches: (1) top-down quality frameworks; (2) bottom-up metrics that relate to quality aspects. The most prominent top-down quality framework according to [12] is SEQUAL [4][13][14]. The framework is based on semiotic theory (the theory of signs) and is developed for evaluating the quality of conceptual models and languages of all kinds. Moody et al. [15] report on an empiric study involving 194 participants on the use of SEQUAL and conclude that the study provides strong support for the validity of the framework. Becker et al. [16] present a guideline-based approach as an alternative to SEQUAL. It addresses the six factors: correctness, clarity, relevance, comparability, economic efficiency, and systematic design. Mendling et al. [12] also discuss a number of bottom-up metrics approaches. Several of these contributions are theoretic without empirical validation according to the authors.

We have chosen to apply the SEQUAL framework for our evaluation as it is a general framework applicable to different kinds of languages [4] whose usefulness has been confirmed in experiments [15]. Furthermore, an analytic evaluation is preferred over a metric-based approach due to project limitations. An analytic evaluation is also a suitable complement to the experience-based evaluations of SaCS presented in [17][18].

The appropriateness of a modelling language for a specific task is in SEQUAL related to the definition of the following sets: the set of goals G for the modelling task; its domain D in the form of the set of all statements that can be stated about the situation at hand; the relevant knowledge of the modeller Km and other participants Ks involved in the modelling task; what persons involved interpret the models to say, I; the language L in the form of the set of all statements that can be expressed in the language; relevant tool interpretation T of the models; and what is expressed in the models M.

Figure 2. The quality framework (adopted from [19])

Fig. 2 is adopted from [13] and illustrates the relationships between the different sets in SEQUAL. The quality of a language L is expressed by six appropriateness factors. The quality of a model M is expressed by nine quality aspects.

In the following, we will not address the different quality aspects of a model M but rather address the quality of the SaCS pattern language.

The appropriateness factors indicated in Fig. 2 are related to different properties of the language under evaluation. The appropriateness factors are [4]:

• Domain appropriateness: the language should be able to represent all concepts in the domain.
• Modeller appropriateness: there should be no statements in the explicit knowledge of the modeller that cannot be expressed in the language.
• Participant appropriateness: the conceptual basis should correspond as much as possible to the way individuals who partake in modelling perceive reality.
• Comprehensibility appropriateness: participants in the modelling should be able to understand all the possible statements of the language.
• Tool appropriateness: the language should have a syntax and semantics that a computerised tool can understand.
• Organisational appropriateness: the language should be usable within the organisation it targets such that it fits with the work processes and the modelling required to be performed.

A set of requirements is associated with each appropriateness factor. The extent to which the requirements are fulfilled is used to judge the quality of the SaCS pattern language for its intended task. The requirements are defined on the basis of requirements found in the literature on SEQUAL [4].

IV. THE EVALUATION

A necessary step in the application of SEQUAL [4][13] is to adapt the evaluation to account for the modelling needs. This amounts to expressing what the different appropriateness factors of the framework represent in the particular context of the evaluation in question. In particular, the modelling needs are detailed by the definition of a set of criteria for each of the appropriateness factors.

Table I introduces the criteria for evaluating the suitability of the SaCS pattern language for its intended task. In the first column of Table I, the two letters of each requirement identifier identify the appropriateness factor addressed by the requirement, e.g., DA for Domain Appropriateness.

TABLE I. OVERVIEW OF EVALUATION CRITERIA (ID: Requirement)
DA.1: The language must include the concepts representing best practices within conceptual safety design.
DA.2: The language must support the application of best practices within conceptual safety design.
MA.1: The language must facilitate tacit knowledge externalisation within conceptual safety design.
MA.2: The language must support the modelling needs within conceptual safety design.
PA.1: The terms used for concepts in the language must be the same terms used within safety engineering.
PA.2: The symbols used to illustrate the meaning of concepts in the language must reflect these meanings.
PA.3: The language must be understandable for people familiar with safety engineering without specific training.
CA.1: The concepts and symbols of the language should differ to the extent they are different.
CA.2: It must be possible to group related statements in the language in a natural manner.
CA.3: It must be possible to reduce model complexity with the language.
CA.4: The symbols of the language should be as simple as possible with appropriate use of colour and emphasis.
TA.1: The language must have a precise syntax.
TA.2: The language must have a precise semantics.
OA.1: The language must be able to express the desired conceptual safety design when applied in a safety context.
OA.2: The language must ease the comprehensibility of best practices within conceptual safety design for relevant target groups like system engineers, safety engineers, hardware and software engineers.
OA.3: The language must be usable without the need of costly tools.

The different appropriateness factors are addressed successively in Section IV-A to Section IV-F according to the order in Table I. Each requirement from Table I is discussed. A requirement identifier is presented in a bold font when first introduced in the text, followed by the associated requirement and an evaluation of the extent to which the requirement is fulfilled by SaCS.

A. Domain appropriateness

DA.1 The language must include the concepts representing best practices within conceptual safety design.

In the SaCS language, there are currently 26 basic patterns [17][18] on different concepts within conceptual safety design.


Each pattern may be referenced by its unique name. Three of Safety standards [23] may demand a number of activities the currently available basic patterns are referenced in Fig. 1 to be performed in which certain activities must be applied in a and are named Hazard Analysis, Risk Analysis and Establish specific sequence. Safety standards [23] may also describe the System Safety Requirements. expected inputs and outputs of different activities and in this sense state what is the expected content of deliverables that Fig. 3 presents the icons used for basic SaCS patterns and allows a transition from one activity to the next. According indicates a categorisation. The three icons to the left are used to Krogstie [4], the main phenomena in languages that ac- for categorising patterns providing development guidance with commodate a behavioural modelling perspective are states and a strong processual focus. The three icons to the right are used transitions between states. In this sense, the language should for categorising patterns providing development guidance with support the modelling of the application of best practices a strong product focus. Different kinds of patterns express dif- according to a behavioural modelling perspective. ferent concepts and best practices within development of safety critical systems. The combined use of patterns from different Fig. 4 presents the icons for the different kinds of pa- categories facilitates development of conceptual safety designs. rameters and artefact references in SaCS. The documentation parameter and the documentation artefact reference types Process'Assurance Product'Assurance (represented visually by the icons presented in Fig. 4) are Requirement'Pa1ern'Reference' Requirement'Pa1ern'Reference defined in order to allow a generic classification of parameters Process'Assurance Product'Assurance and artefacts that may not be classified as requirement, design, Solu5on'Pa1ern'Reference Solu5on'Pa1ern'Reference or safety case. An example may be the result of risk analysis Process'Assurance Product'Assurance that is an intermediate result in conceptual safety design and an Safety'Case'Pa1ern'Reference Safety'Case'Pa1ern'Reference input to an activity on the specification of safety requirements [23][24]. The process of deriving safety requirements on the Figure 3. Icons for the different kinds of basic pattern references basis of an assessment of hazards is expressed by a chain of patterns as presented in Fig. 1. The outcome of applying the Habli and Kelly [20] describe the two dominant approaches last pattern in the chain is a requirements specification. The in safety standards for providing assurance of safety objectives last pattern cannot be applied before the required inputs are being met. These are: (1) the process-based approach; (2) the produced. product-based approach. Within the process-based approach, safety assurance is achieved on the basis of evidence from the application of recommended or mandatory development Requirement+Parameter Requirement+Artefact+Reference practices in the development life cycle. Within the product- based approach, safety assurance is achieved on the basis Design+Parameter Design+Artefact+Reference of product specific evidences that meet safety requirements derived from hazard analysis. 
The practice within safety stan- Safety+Case+Parameter Safety+Case+Artefact+Reference dards as described above motivate our categorisation into the Documenta*on+Parameter Documenta*on+Artefact+Reference process assurance and the product assurance pattern groups. The safety property of a system is addressed on the basis Figure 4. Icons for the different kinds of parameters and artefact references of a demonstration of the fulfilment of safety objectives. Seven nuclear regulators [21] define a safety demonstration as “a set Fig. 5 presents the symbolic representation of the different of arguments and evidence elements that support a selected relations in SaCS. Relations define transitions between patterns set of dependability claims - in particular the safety - of the or dependencies between elements within a composite pattern operation of a system important to safety used in a given plant definition. The reports [17][18] define the concepts behind the environment”. Although it is the end system that is put into different relations and exemplify the practical use of all the operation, evidences supporting safety claims are produced concepts in different scenarios. Fig. 1 is explained in Section throughout the system life cycle and need to be systematically II and exemplify a composite pattern containing five instances gathered from the very beginning of a development project of the instantiates relation and two instances of the assigns [21]. The safety case approach represents a means for explicitly relation. presenting the structure of claims, arguments, and evidences in a manner that facilitates evaluation of the rationale and instan.ates details basis for claiming that safety objectives are met. The safety assigns sa.sfies case approach is supported by several authors [10][20][21][22]. What is described above motivates the need for patterns combines demonstrates supporting safety case specification in addition to patterns on requirements elicitation and system design specification. Figure 5. Symbols for the different kinds of relations As indicated above, in the design of the SaCS pattern language we have as much as possible selected keywords and The need for the different relations presented in Fig. 5 icons in the spirit of leading literature within the area. This is motivated by the practices described in different standards indicates that we at least are able to represent a significant and guidelines, e.g., IEC 61508 [23], where activities like part of the concepts of relevance for conceptual safety design. hazard identification and hazard analysis are required to be performed sequentially and where the output of one activity DA.2 The language must support the application of best is assigned as input to another activity. Thus, we need a practices within conceptual safety design. concept of assignment. In SaCS, this is defined by an assigns


relation between patterns. When performing an activity like safety engineering best practices as defined in international hazard analysis, the results from the application of a number standards and guidelines [21][23][24][25][26][27] and other of methods may be combined and used as input. Two widely sources on safety engineering. The limited number of basic known methods captured in two different basic SaCS patterns patterns currently available delimit what can be modelled in a are Failure Modes and Effects Analysis (FMEA) and Fault composite pattern. Defining more basic patterns will provide a Tree Analysis (FTA). A concept for combining results is better coverage of the tacit knowledge that can be externalised. needed in order to model that the results from applying A user may easily extend the language. A basic pattern, e.g., several patterns as FMEA and FTA are combined into a union the pattern Hazard Analysis [17] referenced in Fig. 1, is defined consisting of every individual result. In SaCS, this is defined in a simple structure of named sections containing text and by a combines relation between patterns. A details relation illustrations according to a common format. The format is is used to express that the result of applying one pattern thoroughly detailed in [5]. is further detailed by the application of a second pattern. Table II compares the overall format of basic SaCS patterns Functional safety is an important concept in IEC 61508 [23]. to pattern formats in the literature. We have chosen a format Functional safety is a part of the overall safety that depends that resembles that of Alexander et al. [2] with the addition on a system or equipment operating correctly in response to of the sections “Pattern signature”, “Intent”, “Applicability”, its inputs. Furthermore, functional safety is achieved when and “Instantiation rule”. The signature, intent, and applicability every specified safety function is carried out and the level sections of basic patterns are documented in such a manner of performance required of each safety function is met. A that the context section provided in [2] is not needed. The satisfies relation between a pattern for requirements elicitation format in [2] is a suitable basis as it is simple, well-known, and and a pattern for system design expresses that the derived generally applicable for specifying patterns of different kinds. system satisfies the derived requirements. Safety case patterns The format provided by Gamma et al. [8] is also simple and supports documenting the safety argument. A demonstrates well-known, but tailored specifically for capturing patterns for relation between a safety case pattern and a design pattern software design. expresses that the derived safety argument represents a safety demonstration for the derived system. All in all, we admit that there may be relevant tacit knowledge that is not easily externalised as the SaCS language Fig. 6 and Fig. 7 illustrate how the intended instantiation is today. However, the opportunity of increasing the number order of patterns may be visualised. The direction of the arrow of basic patterns makes it possible to at least reduce the gap. indicates the pattern instantiation order; patterns (or more precisely the patterns referred to graphically) placed closer to MA.2 The language must support the modelling needs the starting point of the arrow are instantiated prior to patterns within conceptual safety design. placed close to the tip of the arrow. 
Patterns may be instantiated IEC 61508 [23] is defined to be applicable across all in- in parallel and thus have no specific order; this is visualised dustrial domains developing safety-related systems. As already by placing pattern references on separate arrows. mentioned, a key concept within IEC 61508 is functional safety. Functional safety is achieved according to [23] by A B adopting a broad range of principles, techniques and measures. A key concept within SaCS is that principles, techniques, methods, activities, and technical solutions of different kinds Figure 6. Serial instantiation are defined within the format of basic patterns. A limited number of concerns are addressed by each basic pattern. A A TABLE II. PATTERN FORMATS IN THE LITERATURE COMPARED TO BASIC SACS PATTERNS [5] B [6] [2] [8] [9] [10] [28] [29] [30] [5] Name          Also known as   Pattern signature  Figure 7. Parallel instantiation Intent    Motivation   Applicability    As argued above, the SaCS language facilitates the ap- Purpose  plication of best practices within safety design and mirrors Context     leading international standards within the area; in particular Problem       Forces   IEC 61508. We therefore think it is fair to say that the language Solution        to a large extent fulfils DA.2. Structure   Participants   Collaborations   B. Modeller appropriateness Consequences   Implementation   MA.1 The language must facilitate tacit knowledge exter- Sample code  nalisation within conceptual safety design. Example  Compare  Instantiation rule  As already mentioned, the current version of the language Related patterns       contains 26 basic patterns. The basic patterns are documented Known uses      in [17] and [18]. The patterns are defined on the basis of


specific combination of patterns is defined within a compos- PA.3 The language must be understandable for people ite pattern. A composite pattern is intended to address the familiar with safety engineering without specific training. overall challenges that appear in a given development context. The SaCS language is simple in the sense that a small set Individual patterns within a composite only address a subset of icons and symbols are used for modelling the application of of the challenges that need to be solved in the context. A patterns, basically: pattern references as in Fig. 3, parameters composite may be defined prior to work initiation in order to and artefact references as in Fig. 4, relations as in Fig. 5, and define a plan for the application of patterns. Another use may instantiation order as in Fig. 6. Guidance to the understanding be to refine a composite throughout the work process. This is of the language is provided in [5], where the syntax and the exemplified in [17] and [18]. A composite may also be defined semantics of SaCS patterns are described in detail. The SaCS once patterns have been applied in order to document the language comes with a structured semantics [5] that offers work process. A composite representing a plan may be easily a schematic mapping from syntactical elements into text in reused for documentation purposes by adding information on English. Guidance to the application of SaCS is provided by the instantiation of parameters. the examples detailed in [17][18]. Although we have not tested SaCS on people unfamiliar with the language, we expect that C. Participants appropriateness users familiar with safety engineering may comprehend the PA.1 The terms used for concepts in the language must be concepts and the modelling on the basis of [5][17][18] within the same terms used within safety engineering. 2-3 working days. Activities such as hazard identification and hazard analysis D. Comprehensibility appropriateness [26], methods such as fault tree analysis [31] and failure mode effects analysis [32], system design solutions including CA.1 The concepts and symbols of the language should redundant modules and voting mechanisms [33], and prac- differ to the extent they are different. tices like arguing safety on the basis of arguing that safety The purpose of the graphical notation is to represent a requirements are satisfied [21], are all well known safety structure of patterns in a manner that is intuitive, compre- engineering practices that may be found in different standards hensible, and that allows efficient visual perception. The key and guidelines [23][24][27]. The different concepts mentioned activities performed by a reader in order to draw conclusion above are all reflected in basic SaCS patterns. Moreover, from a diagram are according to Larkin and Simon [36]: as already pointed out, keywords such as process assurance, searching and recognising relevant information. product assurance, requirement, solution, safety case, etc. have all been selected based on leading terminology within safety Lidwell et al. [35] present 125 patterns of good design engineering. based on theory and empirical research on visualisation. The patterns describe principles of designing visual information for PA.2 The symbols used to illustrate the meaning of con- effective human perception. The patterns are defined on the ba- cepts in the language must reflect these meanings. sis of extensive research on human cognitive processes. 
Some One commonly cited and influential article within psychol- of the patterns are commonly known as Gestalt principles of ogy is that of Miller [34], on the limit of human capacity to perception. Ellis [37] provides an extensive overview of the process information. The limit, according to Miller, is seven Gestalt principles of perception building upon classic work plus or minus two elements. When the number of elements from Wertheimer [38] and others. Gestalt principles capture increases past seven, the mind may be confused in correctly the tendency of the human mind to naturally perceive whole interpreting the information. Thus, the number of symbols objects on the basis of object groups and parts. should be kept low in order to facilitate effective human One of several Gestalt principles applied in the SaCS information processing. language is the principle of similarity. According to Lidwell et al. [35], the principle of similarity is such that similar Lidwell et al. [35] describe iconic representation as “the elements are perceived to be more related than elements that use of pictorial images to make actions, objects, and concepts are dissimilar. in a display easier to find, recognize, learn, and remember”. The authors describe four forms for representation of informa- The use of the similarity principle is illustrated by the tion with icons: similar, example, symbolic, and arbitrary. We composite pattern in Fig. 1. Although each referenced pattern have primarily applied the symbolic form to identify a concept has a unique name, their identical icons indicate relatedness. at a higher level of abstraction than what may be achieved Different kinds of patterns are symbolised by the icons in with the similar and example forms. We have also tried to Fig. 3. The icons are of the same size with some aspects of avoid the arbitrary form where there is little or no relationship similarity and some aspects of dissimilarity such that a degree between a concept and its associated icon. Fig. 3, Fig. 4, Fig. of relatedness may be perceived. An icon for pattern reference 5, and Fig. 6 present the main icons in SaCS. In order to is different in shape and shading compared to an icon used for allow a flexible use of icons and keep the number of icons artefact reference (see Fig. 3 and Fig. 4). Thus, an artefact and low, we have chosen to not define a dedicated icon for each a pattern should be perceived as representing quite different concept but rather define icons that categorises several related concepts. concepts. A relatively small number of icons was designed in a CA.2 It must be possible to group related statements in the uniform manner in order to capture intuitive representations of language in a natural manner. related concepts. As an example, the referenced basic patterns in Fig. 1 have the same icons linking them by category, but There are five ways to organise information according to unique identifiers separating them by name. Lidwell et al. [35]: category, time, location, alphabet, and


continuum. The category refers to the organisation of elements Requirements by similarity and relatedness. An example of the application of the principle of categorisation [35] in SaCS is seen in the [ToA,&Haz] [&ReqSpec&] possibility to reduce the number of relations drawn between patterns when these are similar. Patterns in SaCS may have multiple inputs and multiple outputs as indicated in Fig. 1. Relations between patterns operate on the parameters. The Safety brackets [] placed adjacent to a pattern reference denotes an Requirements ordered list of parameters. In order to avoid drawing multiple [ToA,&Haz] [Req] relations between two patterns, relations operate on the ordered parameter lists of the patterns by list-matching of parameters. Func-onal [ReqSpec] Requirements Fig. 8 exemplifies two different ways for expressing vi- [ToA] &&[Req] sually the same relationships between the composite patterns named A and B. The list-matching mechanism is used to reduce the number of relation symbols drawn between patterns to Figure 9. Composition of composites one, even though the phenomena modelled represents multiple similar relations. This reduces the visual complexity and As colour blindness is common the SaCS language applies preserves the semantics of the relationships modelled. different shades of grey in visualisations.

A B Fig. 10 illustrates how the SaCS language makes use of the three Gestalt principles of perception [35][39][38] known ["Out1,"Out2,"Out3"] ["In1,"In2,"In3"] as: Figure-Ground; Proximity; and Uniform Connectedness. The Gestalt principles express mechanisms for efficient human perception from groups of visual objects. A B ["Out1"] ["In1"] Figure'Ground:!Figures!are!the!objects!of!focus!(i.e.,!icons,! ["Out2"] ["In2"] arrows,!brackets,!and!iden8fiers),!ground!compose! ["Out3"] ["In3"] undifferen8ated!background!(i.e.!the!white!background) Establish4 Hazard4 Risk4 System4Safety4 Figure 8. Alternative ways for visualising multiple similar relations Analysis Analysis Requirements [ToA,!Haz] [HzLg] [Haz] [Risks] [Risks] [Req] CA.3 It must be possible to reduce model complexity with the language. Hierarchical organisation is the simplest structure for visu- Proximity:!elements!close!to!each! Uniform4Connectedness:!connected! other!are!perceived!as!forming!a! elements!are!perceived!as!more!related! alising and understanding complexity according to Lidwell et group than!elements!that!are!not!connected al. [35]. The SaCS language allows concepts to be organised hierarchically by specifying that one pattern is detailed by Figure 10. A fragment of Fig. 1 illustrating the use of Gestalt principles another or by defining composite patterns that reference other composite patterns in the content part. E. Tool appropriateness Fig. 9 presents a composite pattern named Requirements that reference other composites as part of its definition. The TA.1 The language must have a precise syntax. contained pattern Safety Requirements is defined in Fig. 1. The syntax of the SaCS language (see [5]) is defined in The contained pattern Functional Requirements is not defined the EBNF [40] notation. EBNF is a meta-syntax that is widely and is referenced within Fig. 9 for illustration purposes. used for describing context-free grammars. Requirements may be easily extended by defining composites supporting the elicitation of, e.g., performance requirements TA.2 The language must have a precise semantics. and security requirements, and later model the use of such A structured semantics for SaCS patterns is defined in [5] patterns in Fig. 9. In Fig. 9, the output of applying the in the form of a schematic mapping from pattern definitions, Requirements pattern is represented by the parameter ReqSpec. via its textual syntax in EBNF [40], to English. The non- The ReqSpec parameter represents the result of applying the formal representation of the semantics supports human inter- combines relation on the output Req of the composite Safety pretation rather than tools, although the translation procedure Requirements and the output Req of the composite Functional as described in [5] may be automated. The presentation of the Requirements. semantics of patterns as a text in English was chosen in order CA.4 The symbols of the language should be as simple as to aid communication between users, possibly with different possible with appropriate use of colour and emphasis. technical background, on how to interpret patterns.

A general principle within visualisation according to Lid- F. Organisational appropriateness well et al. [35] is to use colour with care as it may lead to misconceptions if used inappropriately. The authors points OA.1 The language must be able to express the desired out that there is no universal symbolism for different colours. conceptual safety design when applied in a safety context.

Copyright (c) IARIA, 2014. ISBN: 978-1-61208-343-8 85

93 / 107 PATTERNS 2014 : The Sixth International Conferences on Pervasive Patterns and Applications

The application of the SaCS pattern language produces frames approach is useful for detailing and analysing a problem composite patterns that are instantiated into conceptual safety and thereby detailing requirements, the problem classes pre- designs. A composite pattern expresses a combination of basic sented in [42] are defined on a very high level of abstraction. patterns. The basic patterns express safety engineering best practices and concepts inspired by international safety stan- The use of boilerplates [43][44] for requirement specifica- dards and guidelines, e.g., [23][24][27]. International safety tion is a form of requirement templates but nonetheless touches standards and guidelines describe concepts and practices for upon the concept of patterns. The boilerplate approach helps development of safety critical systems that may be perceived the user phrase requirements in a uniform manner and to detail as commonly accepted. The SaCS pattern language is tested these sufficiently. Although boilerplates may be useful for out in two cases. The first concerned the conceptualisation of a requirement specification, the focus in SaCS is more towards nuclear power plant control system, while the second addressed supporting requirement elicitation and the understanding of the the conceptualisation of a railway interlocking system, fully challenges that appear in a specific context. detailed in [17] and [18], respectively. In both cases it was Withall [45] describes 37 requirements patterns for as- possible to derive a conceptual safety design using the SaCS sisting the specification of different types of requirements. language as support as well as model how patterns were The patterns are defined at a low level; the level of a single applied as support. requirement. The patterns of Withall may be useful, but as OA.2 The language must ease the comprehensibility of best with the boilerplates approach, the patterns support more the practices within conceptual safety design for relevant target specification of requirements rather than requirements elicita- groups like system engineers, safety engineers, hardware and tion. software engineers. Patterns on design and architecture of software-based sys- We have already explained how basic patterns represent tems are presented in several pattern collections. One of the concepts and best practices inspired by safety standards and well-known pattern collections is the one of Gamma et al. guidelines. Each basic pattern addresses a limited number of [8] on recurring patterns in design of software based systems. phenomena. Basic patterns are combined into a composite Without doubt, the different pattern collections and languages pattern where the composite addresses all relevant challenges on system design and architecture represent deep insight into that occur in a specific context. A composite pattern as the effective solutions. However, design choices should be founded one presented in Fig. 1 ease the explanation of how several on requirements, and otherwise follow well established prin- concepts within conceptual safety design are combined and ciples of good design. The choice of applying one design applied. pattern over another should be based on a systematic process of establishing the need in order to avoid design choices being Wong et al. [41] reviewed several large development left unmotivated. 
projects and software safety standards from different domains with respect to cost effectiveness and concludes that although The motivations for a specific design choice are founded standards provide useful and effective guidance, safety and on the knowledge gained during the development activities cost effectiveness objectives are successfully met by effective applied prior to system design. Gnatz et al. [46] outline the planning and by applying safety engineering best practices ev- concept of process patterns as a means to address the recurring idenced in company best practices throughout the development problems and known solutions to challenges arising during life cycle. Compared to a standard or a guideline, a composite the development process. The patterns of Gnatz et al. are not pattern in the SaCS language may be used to capture such tailored for development of safety critical systems and thus do a company specific best practice. In order to accommodate not necessarily reflect relevant safety practices. Fowler presents different situations, different compositions of patterns may be [7] a catalogue of 63 analysis patterns. The patterns do not defined. follow a strict format but represent a body of knowledge on analysis described textually and by supplementary sketches. OA.3 The language must be usable without the need of costly tools. While process patterns and analysis patterns may be rel- evant for assuring that the development process applied is Every pattern used in the cases described in [17][18] was suitable and leads to well informed design choices, Kelly interpreted and applied in its context by a single researcher [10] defines patterns supporting safety demonstration in the with background from safety engineering. A conceptual safety form of reusable safety case patterns. The patterns expressed design was produced for each case. Every illustration in are representative for how we want to address the safety [5][17][18] and in this paper is created with a standard drawing demonstration concern. tool. A challenge is to effectively combine and apply the knowl- edge on diverse topics captured in different pattern collections V. RELATED WORK and languages. Henninger and Correaˆ [47] survey different In the literature, pattern approaches supporting develop- software pattern practices and states “software patterns and ment of safety critical systems are poorly represented. In the collections tend to be written to solve specific problems with following we shortly discuss some different pattern approaches little to no regard about how the pattern could or should be and their relevancy to the development of conceptual safety used with other patterns”. designs. Zimmer [48] identifies the need to define relationships Jackson [42] presents the problem frames approach for between system design patterns in order to efficiently combine requirements analysis and elicitation. Although the problem them. Noble [49] builds upon the ideas of Zimmer and defines


a number of relationships such as uses, refines, used by, design and mirror leading international standards; in combine, and sequence of as a means to define relationships particular IEC 61508. between system design patterns. A challenge with the relations defined by Noble is that they only specify relations on a very • Comprehensibility: The comprehension of individual high level. The relations do not have the expressiveness for patterns and pattern compositions is supported by the detailing what part of a pattern is used, refined, or combined. use of terms commonly applied within the relevant Thus, the approach does not facilitate a precise modelling of industrial domains as well as by the application of relationships. principles of good design in visualisations, such as the Gestalt principles of perception [35][38]. Bayley and Zhu [50] define a formal language for pattern composition. The authors argue that design patterns are almost • Tool: Tool support may be provided on the basis of always to be found composed with each other and that the the syntax and semantics of the SaCS language [5]. correct applications of patterns thus relies on precise definition • Organisational: Organisations developing safety crit- of the compositions. A set of six operators is defined for ical systems are assumed to follow a development the purpose of defining pattern compositions. The language is process in accordance to what is required by standards. exemplified on the formalisation of the relationships expressed Wong et al. [41] reviewed several large development between software design patterns described by Gamma et al. projects and software safety standards from differ- [8]. As we want the patterns expressed in the SaCS language ent domains with respect to cost effectiveness and to be understandable to a large community of potential users, concludes that although standards provide useful and we find this approach a bit too rigid. effective guidance, safety and cost effectiveness objec- Smith [51] presents a catalogue of elementary software tives are successfully met by effective planning and by design patterns in the tradition of Gamma et al. [8] and applying safety engineering best practices evidenced proposes the Pattern Instance Notation (PIN) for expressing in company best practices throughout the development compositions of patterns graphically. The notation uses simple life cycle. SaCS patterns may be defined, applied, and rounded rectangles for abstractly representing a pattern and its combined in a flexible manner to support company associated roles. Connectors define the relationships between best practices and domain specific best practices. patterns. The connectors operate on the defined roles of patterns. The notation is comparable to the UML collaboration ACKNOWLEDGMENT notation [52]. This work has been conducted and funded within the UML collaborations [52] are not directly instantiable. In- OECD Halden Reactor Project, Institute for energy technology stances of the roles defined in a collaboration that cooperates (IFE), Halden, Norway. The second author has partly been as defined creates the collaboration. The main purpose is to funded by the ARTEMIS project CONCERTO. express how a system of communicating entities collectively accomplishes a task. The notation is particularly suitable for REFERENCES expressing system design patterns. [1] F. Buschmann, K. Henney, and D. C. 
Schmidt, Pattern-Oriented Soft- Several notations [53][54][55] for expressing patterns ware Architecture: On Patterns and Pattern Languages. Wiley, 2007, graphically use UML [52] as its basis. The notations are vol. 5. simple, but target the specification of software. [2] C. Alexander, S. Ishikawa, and M. Silverstein, A Pattern Language: Towns, Buildings, Construction. Oxford University Press, 1977, vol. 2. VI.CONCLUSION [3] J. C. Knight, “Safety Critical Systems: Challenges and Directions,” in Proceedings of the 24th International Conference on Software Engi- We have presented an analytical evaluation of the SaCS neering (ICSE’02). ACM, 2002, pp. 547–550. pattern language with respect to six different appropriateness [4] J. Krogstie, Model-based Development and Evolution of Information factors. We arrived at the following conclusions: Systems: A Quality Approach. Springer, 2012. [5] A. A. Hauge and K. Stølen, “Syntax & Semantics of the SaCS Pattern • Domain: In the design of the SaCS language we have Language,” Institute for energy technology, OECD Halden Reactor as much as possible selected keywords and icons Project, Halden, Norway, Tech. Rep. HWR-1052, 2013. in the spirit of leading literature within the area. [6] A. Aguiar and G. David, “Patterns for Effectively Documenting Frame- works,” in Transactions on Pattern Languages of Programming II, ser. This indicates that we at least are able to represent LNCS, J. Noble, R. Johnson, P. Avgeriou, N. Harrison, and U. Zdun, a significant part of the concepts of relevance for Eds. Springer, 2011, vol. 6510, pp. 79–124. conceptual safety design. [7] M. Fowler, Analysis Patterns: Reusable Object Models. Addison- Wesley, 1996. • Modeller: There may be relevant tacit knowledge that [8] E. Gamma, R. Helm, R. E. Johnson, and J. Vlissides, Design Patterns: is not easily externalised as the SaCS language is Elements of Reusable Object-Oriented Software. Addison-Wesley, today. However, the opportunity of increasing the 1995. number of basic patterns makes it possible to at least [9] R. S. Hanmer, Patterns for Fault Tolerant Software. Wiley, 2007. reduce the gap. [10] T. P. Kelly, “Arguing Safety – A Systematic Approach to Managing Safety Cases,” Ph.D. dissertation, University of York, United Kingdom, • Participants: The terms used for concepts have been 1998. carefully selected based on leading terminology within [11] B. Rubel, “Patterns for Generating a Layered Architecture,” in Pattern safety engineering. The SaCS language facilitates rep- Languages of Program Design, J. Coplien and D. Schmidt, Eds. resenting the application of best practices within safety Addison-Wesley, 1995, pp. 119–128.


[12] J. Mendling, G. Neumann, and W. van der Aalst, “On the Correlation [31] IEC, “IEC 61025 Fault Tree Analysis (FTA), 2nd edition,” International between Process Model Metrics and Errors,” in Proceedings of 26th Electrotechnical Commission, 2006. International Conference on Conceptual Modeling, vol. 83, 2007, pp. [32] ——, “IEC 60812 Analysis techniques for system reliability – Pro- 173–178. cedure for failure mode and effects analysis (FMEA), 2nd edition,” [13] A. G. Nysetvold and J. Krogstie, “Assessing Business Process Modeling International Electrotechnical Commission, 2006. Languages Using a Generic Quality Framework,” in Proceedings of [33] N. Storey, Safety-critical Computer Systems. Prentice Hall, 1996. the 17th Conference on Advanced Information Systems Engineering [34] G. A. Miller, “The Magical Number Seven, Plus or Minus Two: Some (CAiSE’05) Workshops. Idea Group, 2005, pp. 545–556. Limits on Our Capacity for Processing Information,” Psychological [14] J. Krogstie and S. D. F. Arnesen, “Assessing Enterprise Modeling Lan- Review, vol. 63, no. 2, 1956, pp. 81–97. guages Using a Generic Quality Framework,” in Information Modeling [35] W. Lidwell, K. Holden, and J. Butler, Universal Principles of Design, Methods and Methodologies. Idea Group, 2005, pp. 63–79. 2nd ed. Rockport Publishers, 2010. [15] D. L. Moody, G. Sindre, T. Brasethvik, and A. Sølvberg, “Evaluating the [36] J. H. Larkin and H. A. Simon, “Why a Diagram is (Sometimes) Worth Quality of Process Models: Empirical Testing of a Quality Framework,” Ten Thousand Words,” Cognitive Science, vol. 11, no. 1, 1987, pp. in Proceedings of the 21st International Conference on Conceptual 65–100. Modeling, ser. LNCS. Springer, 2013, vol. 2503, pp. 380–396. [37] W. D. Ellis, A Source Book of Gestalt Psychology. The Gestalt Journal [16] J. Becker, M. Rosemann, and C. von Uthmann, “Guidelines of Business Press, 1997. Process Modeling,” in Business Process Management, ser. LNCS, vol. [38] M. Wertheimer, “Laws of Organization in Perceptual Forms,” in A 1806. Springer, 2000, pp. 30–49. sourcebook of Gestalt Psychology, W. D. Ellis, Ed. Routledge and [17] A. A. Hauge and K. Stølen, “A Pattern-based Method for Safe Control Kegan Paul, 1938, pp. 71–88. Conceptualisation – Exemplified Within Nuclear Power Production,” [39] J. Wagemans, J. H. Elder, M. Kubovy, S. E. Palmer, M. A. Peterson, Institute for energy technology, OECD Halden Reactor Project, Halden, M. Singh, and R. von der Heydt, “A Century of Gestalt Psychology in Norway, Tech. Rep. HWR-1029, 2013. Visual Perception: I. Perceptual Grouping and Figure-Ground Organi- [18] ——, “A Pattern-based Method for Safe Control Conceptualisation – zation,” Psychological Bulletin, vol. 138, no. 6, 2012, pp. 1172–1217. Exemplified Within Railway Signalling,” Institute for energy technol- [40] ISO/IEC, “14977:1996(E) Information technology - Syntactic metalan- ogy, OECD Halden Reactor Project, Halden, Norway, Tech. Rep. HWR- guage - Extended BNF,” International Organization for Standardization 1037, 2013. / International Electrotechnical Commission, 1996. [19] I. Hogganvik, “A Graphical Approach to Security Risk Analysis,” Ph.D. [41] W. E. Wong, A. Demel, V. Debroy, and M. F. Siok, “Safe Software: dissertation, Faculty of Mathematics and Natural Sciences, University Does It Cost More to Develop?” in Fifth International Conference on of Oslo, 2007. Secure Software Integration and Reliability Improvement (SSIRI’11), [20] I. Habli and T. Kelly, “Process and Product Certification Arguments – 2011, pp. 
198–207. Getting the Balance Right,” SIGBED Review, vol. 3, no. 4, 2006, pp. [42] M. Jackson, Problem Frames: Analysing and Structuring Software 1–8. Development Problems. Addison-Wesley, 2001. [21] The members of the Task Force on Safety Critical Software, “Licensing [43] E. Hull, K. Jackson, and J. Dick, Requirements Engineering, 3rd ed. of safety critical software for nuclear reactors: Common position of Springer, 2010. seven european nuclear regulators and authorised technical support [44] G. Sindre, “Boilerplates for application interoperability requirements,” organisations,” http://www.belv.be/, 2013, [accessed: 2014-04-10]. in Proceedings of 19th Norsk konferanse for organisasjoners bruk av [22] C. Haddon-Cave, “The Nimrod Review: An independent review into the IT (NOKOBIT’12). Tapir, 2012. broader issues surrounding the loss of the RAF Nimrod MR2 aircraft [45] S. Withall, Software Requirement Patterns (Best Practices), 1st ed. XV230 in Afghanistan in 2006,” The Stationery Office (TSO), Tech. Microsoft Press, 2007. Rep. 1025 2008-09, 2009. [46] M. Gnatz, F. Marschall, G. Popp, A. Rausch, and W. Schwerin, [23] IEC, “IEC 61508 Functional safety of electrical/electronic/programmble “Towards a Living Software Development Process based on Process electronic safety-related systems, 2nd edition,” International Elec- Patterns,” in Proceedings of the 8th European Workshop on Software trotechnical Commission, 2010. Process Technology (EWSPT’01), ser. LNCS, vol. 2077. Springer, [24] CENELEC, “EN 50129 Railway applications – Communications, sig- 2001, pp. 182–202. nalling and processing systems – Safety related electronic systems for [47] S. Henninger and V. Correa,ˆ “Software Pattern Communities: Current signalling,” European Committee for Electrotechnical Standardization, Practices and Challenges,” in Proceedings of the 14th Conference on 2003. Pattern Languages of Programs (PLOP’07). ACM, 2007, pp. 14:1– [25] European Commission, “Commission Regulation (EC) No 352/2009 14:19, article No. 14. on the Adoption of Common Safety Method on Risk Evaluation and [48] W. Zimmer, “Relationships between Design Patterns,” in Pattern Lan- Assessment,” Official Journal of the European Union, 2009. guages of Program Design. Addison-Wesley, 1994, pp. 345–364. [26] ERA, “Guide for the Application of the Commission Regulation on [49] J. Noble, “Classifying Relationships between Object-Oriented Design the Adoption of a Common Safety Method on Risk Evaluation and Patterns,” in Proceedings of Australian Software Engineering Confer- Assessment as Referred to in Article 6(3)(a) of the Railway Safety ence (ASWEC’98), 1998, pp. 98–107. Directive,” European Railway Agency, 2009. [50] I. Bayley and H. Zhu, “A Formal Language for the Expression of Pattern [27] IEC, “IEC 61513 Nuclear power plants – Instrumentation and control Compositions,” International Journal on Advances in Software, vol. 4, systems important to safety – General requirements for systems, 2nd no. 3, 2012, pp. 354–366. edition,” International Electrotechnical Commission, 2001. [51] J. M. Smith, Elemental Design Patterns. Addison-Wesley, 2012. [28] A. Ratzka, “User Interface Patterns for Multimodal Interaction,” in [52] OMG, “Unified Modeling Language Specification, Version 2.4.1,” Ob- Transactions on Pattern Languages of Programming III, ser. LNCS, ject Management Group, 2012. J. Noble, R. Johnson, U. Zdun, and E. Wallingford, Eds. Springer, [53] H. Byelas and A. Telea, “Visualization of Areas of Interest in Software 2013, vol. 7840, pp. 111–167. 
Architecture Diagrams,” in Proceedings of the 2006 ACM Symposium [29] D. Riehle and H. Zullinghoven,¨ “A Pattern Language for Tool Con- on Software Visualization (SoftVis’06), 2006, pp. 105–114. struction and Integration Based on the Tools and Materials Metaphor,” [54] J. Dong, S. Yang, and K. Zhang, “Visualizing Design Patterns in in Pattern Languages of Program Design, J. Coplien and D. Schmidt, Their Applications and Compositions,” IEEE Transactions on Software Eds. Addison-Wesley, 1995, pp. 9–42. Engineering, vol. 33, no. 7, 2007, pp. 433–453. [30] K. Wolf and C. Liu, “New Client with Old Servers: A Pattern Language [55] J. M. Vlissides, “Notation, Notation, Notation,” C++ Report, 1998, pp. for Client/Server Frameworks,” in Pattern Languages of Program De- 48–51. sign, J. Coplien and D. Schmidt, Eds. Addison-Wesley, 1995, pp. 51–64.


Privacy by Design Permission System for Mobile Applications

Karina Sokolova∗†, Marc Lemercier∗ Jean-Baptiste Boisseau†

∗University of Technology of Troyes, Troyes, France
{karina.sokolova, marc.lemercier}@utt.fr
†EUTECH SSII, La Chapelle-Saint-Luc, France
{k.sokolova, jb.boisseau}@eutech-ssii.com

Abstract—The Privacy by Design concept proposes to integrate concepts are already included in European data legislation; the respect of user privacy into systems managing user data from the notion is considered to be enforced in European Data the design stage. This concept has increased in popularity and Protection Regulation, therefore systems made with PbD are the European Union (EU) is enforcing it with a Data Protection compliant with the law. An application made with PbD notion Directive. Mobile applications have emerged onto the market is not only a benefit for users, along with the opportunity to and the current law and future directive is applicable to all provide a truly personalized service, but also a legal obligation mobile applications designed for EU users. By now it has been shown that mobile applications do not suit the Privacy by Design for developers. concept and lack for transparency, consent and security. The The PbD concept was firstly presented by Dr. Ann actual permission systems is judged as unclear for users. In this Cavoukian. She proposes seven key principles of PbD enabling paper, we introduce a novel permission model suitable for mobile application that respects Privacy by Design. We show that such the development of privacy-respective systems. The system adapted permission system can improve the transparency and should be proactive, not reactive, embed privacy feature from consent but also the security of mobile applications. Finally, the design, integrate Privacy by Default, respect user privacy, we propose an example of the use of our system on mobile data usage should be transparent to the end user and the application. user should have access to the mechanism of control of his data. Full functionality and the end-to-end security should be Keywords–permission, permission system, mobile, privacy by design, privacy, transparency, control, Android, iOS, application, preserved without any sacrifice. [2] development, software design, pattern, mobility, design, modelling, Mobile privacy was discussed in ’Opinion 02/2013 on apps trust on smart devices’ by the Article 29 Data Protection Working I.INTRODUCTION Party [3], where the opinion on mobile privacy and some general recommendations were given. The article states that Mobile devices gain in popularity. Thousands of services both the Data Protection Directive and the ePrivacy Directive and applications are proposed on mobile markets and down- are applicable to mobile systems and to all applications made loaded every day. Smart devices have a high data flow pro- for EU users. Data Protection Regulation is also applicable cessing and storing large amounts of data including private to mobile systems. The article defines four main problems of and sensitive data. Most applications propose personalized mobile privacy: lack of transparency, lack of consent, poor services but simultaneously collect user data even without security and disregard for purpose limitation. the user’s awareness or consent. More and more users feel concerned about their privacy and care about services they Many reports propose recommendations about mobile pri- use. The TRUSTe survey conducted in February 2011 shows vacy improvement repeating basic privacy notions (e.g., data that smartphone users are concerned about privacy even more minimization, clear notices) but the exact patterns or a techni- than about the security (the second in the survey results) [1]. cal solutions are missing [4]. 
Nowadays, people realize the lack of privacy especially The permission system is embedded in the mobile systems while using new technological devices where information is and is a crucial part of mobile security and privacy. Nowadays, massively collected, used and stored (Big Data notion). The permission systems do not follow Privacy by Design notions. privacy regulation aiming to control personal data use is set Many works are concentrated on analysing and modelling up in many countries. European Union privacy regulation the actual permission systems [5][6][7], on improving actual includes the European Data Protection Directive (Directive permission systems to give more control to the user [8][9][10] 95/46/EC) and the ePrivacy Directive. United States regulation or to add additional transfer permissions [11], on visual rep- includes Children’s Online Privacy Protection Act (COPPA) resentation of permissions [12], on user perceptions of current and The California Online Privacy Protection Act of 2003 permission systems [13][14][15], on the data flow analyses (OPPA). Canada is under the Personal Information Protection (possible data leakage detection) and the actual permissions en- and Electronic Documents Act (PIPEDA) concerning privacy. forcement and verifications [16][17][18][19][20][21][22][23]. To our knowledge no work has been conducted on redefining The Privacy by Design (PbD) notion proposes to integrate the permission system to fit the Privacy by Design notion. privacy from the system design stage [2] to build privacy- respecting systems. PbD proves systems can embed privacy The remainder of the paper is organized as follows: Section without sacrificing either security or functionality. Some PbD 2 describes current permission systems of iOS and Android


and points out problems regarding Privacy by Design. Section 3 introduces our proposal: the pattern of the privacy-respecting permission system. We show that it can cope with the transparency, consent and purpose disregard problems and also improve the security. Section 4 shows the application of our novel permission system to a real mobile application. The paper ends with a conclusion and future work.

TABLE I. ANDROID AND IOS PERMISSION SYSTEMS COMPARISON

            Full Functionality   Default Settings   Transparency   Control
Android     +                    -                  -              -
iOS         -                    +                  -/+            +

II.EXISTING MOBILE PERMISSION SYSTEMS to the user every time the data is going to be accessed, but it is rarely the case. Users see the list of dangerous permissions In this section, we present current iOS and Android per- on the screen before installing the application. mission systems and evaluate those systems regarding the PbD notion. We take into account the full functionality allowed by Android proposes more than 100 default permissions and the permission system, privacy by default, transparency and developers can add supplementary permissions. Multiple works the control notions. show that users do not understand many of default permissions and fail to judge the application privacy and security correctly • Full functionality: possibility to use all functionalities using the full permission list [13][15]. Permissions do not available on the platform. clearly show what data is used for and how. Moreover, some other studies show the abusive usage of android permissions • Privacy by Default: the default configurations of the by developers [24]. system are privacy protective. Some users do not check the Android permission list • Transparency: user should clearly understand what because they need a service and they know that all permis- data is used, how and for what purpose. sions should be accepted to obtain it. Android permission • Control: user should have a full control over his list looks like a license agreement on a desktop application personal data usage. that everybody accepts but very few actually read [25]. An Android user does not have any control over permissions once We consider the privacy policy to be very important for the the application is installed: permissions cannot be revoked. proactive and transparent system therefore we present the state Android does not include an iOS-like system permission of application privacy policy in both systems. manager (privacy settings) by default, therefore the user has to activate or deactivate the entire functionality to disable A. Permissions the access to related data (e.g., Wi-Fi or 3G for Internet connection; GPS for geolocation) or to use additional privacy iOS and Android have different strategies concerning the enhancing applications. access to the device data. The iOS platform gives to non- native applications access only to the functionalities listed in Both iOS and Android default permission systems mostly privacy settings: location services, contacts, calendar, reminder, inform about data access, but not about any other action that photos, microphone and Bluetooth (sensitive data, such as SMS can be completed with the data. For example, no permission and e-mails are not shared at all). Recently, the connection to is needed to transfer the data. Android and iOS include Facebook, Twitter, Flickr and Vimeo was added to the platform permissions for functionalities that can be related to personal (iOS7). Full functionality is given up for privacy reasons as data transfer, such as Bluetooth and Internet. Permissions can applications cannot use the full power of the platform but only be harmless to users, but there is no indication of whether a limited number of functionalities. personal data is involved in a transaction. This decreases the transparency of both platforms. An iOS application should have permission to access information listed above. By default, the installed application Android and iOS permissions do not include purpose ex- has no permission granted. The application displays a pop- planation. 
An iOS application helps to understand the purpose up explaining what sensitive data it needs before to access by asking permission while in use, but if an application has it. The user can accept or decline permission. If permission a granted permission once for one functionality it could use is declined, the corresponding action is not executed. If the it again for a different purpose without informing the user. permission is accepted, the application obtains access to the Android users can only guess what permission is used for and corresponding data. The user is asked to grant permission only whether the use is legitimate. once, but he/she can activate or deactivate such permission Table I shows the system differences regarding four main for each application in privacy settings integrated by default privacy notions: full functionality, transparency, control and into the iOS. Thereby iOS maintains transparency, control and privacy by default. One can see that the current Android privacy by default. permission system is missing in transparency, control and The Android system remains on the sharing principle. default privacy; iOS sacrifices the functionality and also lack Full functionality is preserved: applications have access to all of transparency. Permissions are often functionality-related and native applications’ data and can expose the data themselves. users fail to understand and to judge them. Personal data usage Applications need permission to access the data, but differently is unclear and the purpose is missing. from iOS, users should accept the full list of permissions B. Privacy Policy before installing an application. While all permissions are granted, an application has full access to the related data. Some Users should choose applications they can trust. Apple Android permissions tagged as ’dangerous’ can be prompted ensures that applications available on the market are potentially


harmless, although Android users should judge the application Privacy Policy should be short and clear. Users should have for themselves with the help of information available on a global vision of the data usage and functionalities before the market. The AppStore and Google Play provide similar they install an application. Users rarely read long involved information: name, description, screenshots, rating and user policies, especially when they want a service and feel they reviews. have no choice but to accept all permissions. Our permission system enables a simple policy to be generated with a list of The transparency and the proactivity of the system can be permissions. improved by including the privacy policy in the store. A user can be informed about the information collected and stored before he downloads the application. Without any privacy pol- A. Privacy by Design Permission System icy, the user can hardly evaluate the security and privacy of the The permission system is integrated into the mobile op- application, only the functionality and stability of the system. erating system; well designed, it makes a proactive privacy- In their feedback, users often evaluate the functionalities and respecting tool embedded into the system. user interfaces and report bugs, but they rarely indicate privacy and security problems. We model our permission system with an access control model. We choose discretionary access control where only data iOS does not require developers to include the privacy owner can grant access. The user should be able to control the policy in the application but only in applications directed at data, therefore we consider the user is a unique owner of all children younger than 13 years old. Apple encourages the use information related to him. of privacy policy in the App Store Review Guidelines and iOS Developer Program License Agreement. Apple specify that Rapp is a set of rules assigned to the application. We define developers should insure the application is compliant with all a rule as an assignment of the Right over an Object to the laws of the country the application is distributed in. On viewing Subject. the App Store Review Guidelines one can see that all Privacy by Design fundamental principles and data violation possibil- ∀rule ∈ Rapp, rule = (s, r, o) (1) ities are covered by Apple verification. However, the exact where s ∈ Subject, r ∈ Right, o ∈ Object evaluation process used by Apple remains secret and some privacy-intrusive applications may appear in the store. Until We define the mobile application as a Subject. Objects recently, Apple authorized the use of device identification. are the user-related data, such as e-mail, contact list, name and This identification number was not considered private. Many surname, phone number, address, social networks friend list, applications used this number to uniquely identify their users etc. therefore many applications were considered privacy intrusive Subject = Mobile Application (2) [26]. Object = {P hone#, Name, Contacts, ···} (3) Google Play Terms of Service do not require any pri- vacy policy to be added to the Android applications. Google To define Right we have to introduce Acation and provides an option to include the privacy policy but does Ppurpose. not verify or enforce it. 
Google Developers Documentation We define a set of actions denoted Action as all actions provides recommendations and warns that the developer has a can be carried out on user private data by the application: load, responsibility to ensure the application is compliant with the access, process, store and transfer of the data. We define the laws of the countries in which the application is distributed. Action as follows: Some developers include license agreement and privacy Action = {Load, Access, P rocess, Store, T ransfer} (4) policy. According to [27] only 48% of the top 25 Android paid Purpose is assigned by the application and depends on applications, 76% of the top 25 Android free applications, 64% the service. For example, purpose could be ’retrieve forgotten of iOS paid applications and 84% of iOS free applications have password’, ’display on the screen’, ’calculate the trust score’, included the privacy policy. Android includes the permission ’send news’, ’automatically retrieve nearest restaurant’ and list in the store and this can be considered a privacy policy, ’automatically attach location to the message’. but, as previously discussed, the list is unclear to the final user. Purpose = {Retrieve forgotten password, ···} (5)

III.PRIVACYRESPECTINGPERMISSIONSYSTEM We define the set of rights denoted Right for all actions except the Store action as follows. Mobile phones have significant data flow: information can be received, stored, accessed and sent by the application. Data ∀r ∈ Right, ∀a ∈ Action − {Store} , r = (a, p) (6) can be entered by the user, retrieved from the system sensors where p ∈ Purpose. or applications, come from another mobile application, arrive We define the set of rights denoted Right with the action from the server or from other devices. Data can be shared on equal to Store having an additional parameter T ime inform- the phone with another application, with the server or another ing about the time storage. We define the period [0,T ] as an device. application lifetime. We propose to focus permissions on data and the action that can be carried out on this data, rather than on the technology ∀r ∈ Right, if a = {Store} , r = (a, p, t) (7) used. The definition of purpose of the data usage is included in our permission system. where p ∈ Purpose, a ∈ Action, t ∈ [0,T ]
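For illustration, the rule structure of definitions (1)-(7) can be sketched in code as follows. This is a minimal, hypothetical Python sketch under the stated definitions, not part of the paper's formal model; all names (Action, Right, Rule, "ExampleApp") are illustrative.

from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Illustrative sketch only; names and types are assumptions, not the authors' implementation.

class Action(Enum):
    # Action = {Load, Access, Process, Store, Transfer}, as in definition (4).
    LOAD = "load"
    ACCESS = "access"
    PROCESS = "process"
    STORE = "store"
    TRANSFER = "transfer"

@dataclass(frozen=True)
class Right:
    # A right r = (a, p) as in (6); a Store right carries an extra storage time t as in (7).
    action: Action
    purpose: str                        # e.g., "retrieve forgotten password"
    storage_time: Optional[str] = None  # only for Store, e.g., "while the application is installed"

    def __post_init__(self):
        if self.action is Action.STORE and self.storage_time is None:
            raise ValueError("a Store right must state how long the data is kept")

@dataclass(frozen=True)
class Rule:
    # rule = (s, r, o) as in (1): the application (Subject) holds a Right over a user data item (Object).
    subject: str   # the mobile application
    right: Right
    obj: str       # e.g., "PhoneNumber", "Contacts", "FacebookFriendList"

# Example: the application may access the contact list in order to display it.
r = Rule("ExampleApp", Right(Action.ACCESS, "display the contact list on the screen"), "Contacts")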


Figure 1. Example of state modification diagram for a given permission
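The state modification that Figure 1 illustrates, and that definitions (8) and (9) below make precise, amounts to a small per-rule state machine: every rule starts Revoked at installation time, becomes Granted only after an explicit user decision, typically prompted at the first use of the rule (the sequence of Figure 3), and remains revocable from the settings afterwards. A rough, hypothetical sketch follows; the names are illustrative and not tied to any real platform API.

from enum import Enum

class State(Enum):
    GRANTED = "granted"
    REVOKED = "revoked"

class PermissionStore:
    # Illustrative sketch only. Holds State(rule, t) for every rule in R_app;
    # State(rule, 0) = Revoked, i.e., nothing is granted at installation time.
    def __init__(self, rules):
        self._state = {rule: State.REVOKED for rule in rules}
        self._asked = {rule: False for rule in rules}

    def grant(self, rule):      # the user accepts the rule
        self._state[rule] = State.GRANTED

    def revoke(self, rule):     # the user declines or later revokes the rule
        self._state[rule] = State.REVOKED

    def is_granted(self, rule):
        return self._state[rule] is State.GRANTED

    def check_before_use(self, rule, ask_user):
        # First use of a rule: display it and ask the user before acting on the data.
        # `ask_user` is a callback returning True if the user accepts.
        if not self._asked[rule]:
            self._asked[rule] = True
            if ask_user(rule):
                self.grant(rule)
        return self.is_granted(rule)

A settings screen then only needs grant() and revoke() on individual rules, which is the later-use control the pattern requires.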

The time storage can indicate the number of days, hours or months the data is stored, or a time relative to the application lifecycle: until the application is closed, until the application is stopped, or until the application is uninstalled. All personal data still available when the application is uninstalled is deleted regardless of the defined period, as storage cannot exceed the application lifetime.

Each rule should be explicitly asked of the user in order to be assigned. Thus, each rule has a State: granted or revoked. To respect the Privacy by Default notion, the default State of a permission in an installed application is Revoked. We propose to define the State as follows:

∀rule ∈ Rapp, ∀t ∈ ]0,T]:
State(rule, t) = Granted if the user accepts the rule, Revoked if the user declines the rule   (8)

∀rule ∈ Rapp, State(rule, 0) = Revoked   (9)

The State of a rule r1 ∈ Rapp changes over the application lifetime. The diagram in Figure 1 shows an example of the state modification.

A given user should be informed about the use of the permission: the rule should be defined and displayed for each o ∈ Object. Figure 2 shows the recapitulative schema of the rule definition.

Figure 3. Sequence diagram: first use of one permission

Users should be able to grant or revoke the displayed permission. Finally, the user should have the settings for all rule ∈ Rapp at their disposal, to be able to Grant or Revoke individual permissions in later use.

The sequence diagram in Figure 3 shows the pattern in action when the permission is used for the first time.

Thus, we obtain the pattern the developer can follow to design the permission system. The developer should define a permission for all personal data (Object) used in the application (Subject).

The permission (rule) is stored inside the application with its current State. The default State is Revoked. Developers should verify that the permission is displayed and requested at least once and that it is available in settings for modification. A simple privacy policy can be generated from the list of defined rules and added to the store.

IV. APPLICATION

In this section, we propose an example of a permission system made for an application for trust evaluation of friends on social networks, named Socializer 1.0 [28]. We chose this application because its service is based on private information and cannot be anonymous, so the PbD notion should be integrated into it. This application needs the user's friend lists from different social networks (Facebook, Twitter, LinkedIn) and the contact list in order to view friends and common friends, to calculate the overlap of friends across the different social networks and the contact list, and to evaluate the trust of Facebook friends. The contact list is found on the smartphone, therefore the application needs an Access right.

Figure 2. Activity diagram for the rule definition

r1 = (s, (Access, Pr1), ContactList)   (10)

where s is a Subject defined as the application Socializer 1.0; Pr1 is a purpose defined as the set of p1, p2 and p3: Pr1 = {p1, p2, p3}; p1 = view the list of contacts on the screen; p2 = calculate the overlap; p3 = calculate the trust.

Social networking friend lists should usually be retrieved from the server of the given social network, thereby the load and store actions should be defined. The Facebook friends list


with the contact list is essential to assure the overlap and trust anonymized statistics to the developer. Those actions should functionality. be taken with the user’s express consent.

r2 = (s, (Load, Pr2), F acebookF riendList) (11) r13 = (s, (T ransfer, p12), F acebookF riendT rustScore) (22) r3 = (s, (Store, Pr2, t1), F acebookF riendList) (12) where t is a storage time defined as: while the application where p12 is a purpose defined as ’share results on Face- 1 book’. is installed; Pr2 is a purpose defined as a set of p2, p3 and p4: Pr1 = {p2, p3, p4}; p4= view the Facebook friend list on the r14 = (s, (T ransfer, p13), screen. F acebookF riendT rustScore) (23)

For each Facebook friend, the list of common friends with where p13 is a purpose defined as ’share results on Twitter’. the user is necessary for trust calculation. r15 = (s, (T ransfer, p14), T rustAndBehavior) (24) r4 = (s, (Load, p3), F acebookCommonF riendLists) (13) where p14 is a purpose defined as ’contribute to the improvement of the methodology’. A list of friends from other social networks improves the scores of overlap and trust. The final application has 15 rules that should be accepted by the user. r5 = (s, (Load, Pr5), T witterF riendList) (14) Rapp = {r1, r2, r3, ··· , r14, r15} (25) r6 = (s, (Store, Pr5, t1), T witterF riendList) (15) The rules r1, r2, r3 and r4 have a common purpose, all where Pr5 is a purpose defined as a set of p5 and p6: rules should be accepted to achieve the functionality mentioned Pr5 = {p5, p6}; p5= view the Twitter friend list on the screen; in the purpose: ’calculate the trust’. Similarly, r5 should be p6= improve the overlap and trust score with Twitter friends; grouped with r6 and r7 with r8. The rules from r9 to r15 should be accepted one by one to achieve the aforementioned purpose r7 = (s, (Load, Pr7), LinkedInF riendList) (16) (to get the functionality). Finally, we obtain 10 permissions to r8 = (s, (Store, Pr7, t2), LinkedInF riendList) (17) be added to the application to propose control to the user. The Table II recapitulates permissions. where Pr7 is a purpose defined as a set of p6 and p7: Pr7 = {p60 , p7}; p60 = improve the overlap and trust score To compare with actual permission systems, (a) iOS with LinkedIn friends; p7= view the LinkedIn friend list on requires contact list, Facebook and Twitter access permis- the screen. sions. (b) Android requires ’internet’, ’read contacts’ and ’get accounts’ access permissions. Facebook and Twitter con- The second functionality of the application is to evaluate nections are managed with APIs that requires permissions to the behaviour of Facebook and Twitter friends to indicate be declared on the platform but the permission management potentially dangerous contacts. The behaviour evaluation is will not be available for users in the mobile application by calculated by analysing the messages published by the given default. iOS permissions give certain transparency to the user friend over time. The application needs a permission to load but Android permissions are vague. messages. r = (s, (Load, p ), T witterF riendMessages) (18) We obtained more fine-grained control of the application 9 8 and the data including permissions to all necessary personal where p8 is a purpose defined as ’calculate the Twitter data, actions carried out on this data and corresponding pur- friends behavior’. poses. The recapitulation table (Table II) clearly shows what data are used for what purpose. This kind of table can be added r10 = (s, (Load, p9), F acebookF riendMessages) (19) to the privacy policy to improve transparency. where p9 is a purpose defined as ’calculate the Facebook friends behaviour’. V. CONCLUSIONAND FUTURE WORK The third functionality proposes to view today Facebook We modelled a permission system for mobile application and Twitter messages on the screen for the user. regarding Privacy by Design. This permission system is data- r11 = (s, (Store, p10, t2), oriented, thus the final user can easily understand what per- T odayT witterF riendMessages) (20) sonal data is involved. 
We include the action that is missing from current iOS and Android permission systems, such as where p10 is a purpose defined as ’view today Twitter load and transfer, that improves transparency of the application. messages’; t2 is a storage time defined as: one day. The novelty is to include the purpose of the data usage r12 = (s, (Store, p11, t2), into the permission system. The clear purpose will help users T odayF acebookF riendMessages) (21) to understand better why the data is used and to judge whether this permission is needed. Purpose in permission also forces where p11 is a purpose defined as ’view today Facebook messages’. developers to apply the minimization principle: a developer cannot use the data if he cannot define the clear purpose of The user has the option to share the scores by posting usage. The compulsory purpose definition should help guard new messages on Facebook and Twitter. The user can also against the abusive permission declaration ’in case’. Finally, contribute to the research by sending the trust and behaviour purpose gives the user more fine-grained control, as the same


TABLE II TABLERECAPITULATINGPERMISSIONSNEEDEDFORTHEAPPLICATION on users and developers could be measured by conducting the (LASTCOLUMNISAPERMISSIONNUMBER) real-life experience. We aim to measure the impact of integra- tion of the new permission system on design and development Object Action Purpose # time, as well as particular situations and difficulties in applying the pattern. We also aim to evaluate user perceptions. We have Contacts list Load View; Calculate an additional hypothesis that the explicative application with Overlap and 1 Facebook friends Load; high transparency improves the user experience and leads to Trust list Store more positive perception of the same application, therefore the Facebook Load use of our permission system gives benefits to the application common friends owner. View; Improve Twitter friends Load; Overlap and 2 list This work can be continued by developing an enforcement Store Trust system automatically verifying whether all necessary permis- View; Improve LinkedIn friends sions are properly defined. Many works propose systems mon- Overlap and 3 list itoring mobile data flow, therefore the permission verification Trust system can be based on one of the already proposed systems. Tw. friends Twitter messages 4 Another important aspect for the developer is to be able to Load behaviour prove the application is compliant with the law. The system Facebook Fb. friends 5 generating on-demand reports on the data, including the private messages behaviour data usage, can be developed. Today Tw. View Tw. 6 messages Store messages Today Fb. View Fb. 7 REFERENCES messages messages Publish to [1] TRUSTe. Consumer Mobile Privacy Insights Report. [retrieved: Apr., 8 2011] Trust score Transfer Twitter Publish to [2] A. Cavoukian, “Privacy by design: The 7 foundational principles,” 2009. 9 Facebook [3] E. data protection regulators, “Opinion 02/2013 on apps on smart devices,” EU, Tech. Rep., Feb. 2013. Trust and Contribute to Transfer 10 [4] K. D. Harris, “Privacy on the go,” California Department of Justice, behaviour research Jan. 2013, pp. 1–27. [5] W. Shin, S. Kiyomoto, K. Fukushima, and T. Tanaka, “Towards For- data can be allowed to be used for one functionality but mal Analysis of the Permission-Based Security Model for Android,” not for another. It is important for our system to integrate in Wireless and Mobile Communications, 2009. ICWMC ’09. Fifth clear purpose and not a vague explanation (e.g., ’measure the International Conference on. IEEE Computer Society, 2009, pp. 87–92. frequency of application utilization’ instead of ’improve user [6] K. W. Y. Au, Y. F. Zhou, Z. Huang, P. Gill, and D. Lie, “Short paper: experience’). a look at smartphone permission models,” in SPSM ’11 Proceedings of the 1st ACM workshop on Security and privacy in smartphones and PbD states that the user should have a control over his data mobile devices. ACM Request Permissions, Oct. 2011, pp. 63–67. and be Privacy by Default, therefore permissions used in the [7] R. Stevens, J. Ganz, V. Filkov, P. T. Devanbu, and H. Chen, “Asking application are revoked by default. Users should be clearly for (and about) permissions used by Android apps.” MSR, 2013, pp. informed and asked to grant permission. Moreover, users 31–40. should keep control of permissions during all the application [8] A. R. Beresford, A. Rice, N. Skehin, and R. Sohan, “MockDroid: trad- ing privacy for application functionality on smartphones,” in HotMobile use-time, therefore the permission setting must be available. 
’11: Proceedings of the 12th Workshop on Mobile Computing Systems Our permission system helps developers to be compliant and Applications, ser. HotMobile ’11. ACM Request Permissions, Mar. 2011, pp. 49–54. with the law; it defines what permissions the developer should [9] P. Hornyack, S. Han, J. Jung, S. Schechter, and D. Wetherall, “These add to the application, but in the current state it cannot ensure aren’t the droids you’re looking for: retrofitting android to protect that all necessary permissions are really added. Our pattern data from imperious applications,” in Proceedings of the 18th ACM indicates to the developer what should be added to the appli- conference on Computer and communications security, ser. CCS ’11. cation to be more transparent, but if he decides to transfer data ACM Request Permissions, Oct. 2011, pp. 639–652. without asking permission, the pattern allows this (even if it [10] M. Nauman, S. Khan, and X. Zhang, “Apex: extending Android goes against European law). The generated privacy policy can permission model and enforcement with user-defined runtime con- straints,” in ASIACCS ’10: Proceedings of the 5th ACM Symposium on give the first indication permitting evaluation if the data usage Information, Computer and Communications Security. ACM Request is reasonable and the purpose is clear. Manual verification of Permissions, Apr. 2010, pp. 328–332. an application can show the anomaly in permission system [11] S. Holavanalli et al., “Flow Permissions for Android,” in Automated usage. Software Engineering (ASE), 2013 IEEE/ACM 28th International Con- ference on, 2013, pp. 652–657. We aim to build a framework for the automatic manage- [12] J. Tam, R. W. Reeder, and S. Schechter, “Disclosing the authority ment of a new permission system to simplify the developers applications demand of users as a condition of installation,” Microsoft work. We target the Android system first as it is more crucial Research, 2010. due to the more open communication and data sharing and the [13] S. Egelman, A. P. Felt, and D. Wagner, “Choice Architecture and vagueness of the current permission system. Smartphone Privacy: There’s a Price for That,” in Proceedings of the 11th Annual Workshop on the Economics of Information Security The impact of new privacy-respective permission systems (WEIS), 2012.


[14] M. Lane, “Does the android permission system provide adequate information privacy protection for end-users of mobile apps? ,” in 10th Australian Information Security Management Conference, Dec. 2012, pp. 65–73. [15] P. G. Kelley et al., “A conundrum of permissions: Installing applications on an android smartphone,” in Proceedings of the 16th International Conference on Financial Cryptography and Data Security, ser. FC’12. Berlin, Heidelberg: Springer-Verlag, 2012, pp. 68–79. [16] A. P. Felt, E. Chin, S. Hanna, D. Song, and D. Wagner, “Android permissions demystified,” in CCS ’11: Proceedings of the 18th ACM conference on Computer and communications security. ACM Request Permissions, Oct. 2011, pp. 627–638. [17] P. Berthome´ and J.-F. Lalande, “Comment ajouter de la privacy after design pour les applications Android? (How to add privacy after design to Android applications?),” Jun. 2012. [18] M. Egele, C. Kruegel, E. Kirda, and G. Vigna, “PiOS: Detecting Privacy Leaks in iOS Applications.” 2011. [19] C. Gibler, J. Crussell, J. Erickson, and H. Chen, “AndroidLeaks: Auto- matically Detecting Potential Privacy Leaks in Android Applications on a Large Scale,” in Trust and Trustworthy Computing. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 291–307. [20] T. Vidas, N. Christin, and L. Cranor, “Curbing android permission creep,” in In W2SP, 2011. [21] Y. Zhang et al., “Vetting undesirable behaviors in android apps with permission use analysis,” in CCS ’13: Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security. ACM, Nov. 2013, pp. 611–622. [22] W. Xu, F. Zhang, and S. Zhu, “Permlyzer: Analyzing permission usage in Android applications,” Software Reliability Engineering (ISSRE), 2013 IEEE 24th International Symposium on, 2013, pp. 400–410. [23] W. Luo, S. Xu, and X. Jiang, “Real-time detection and prevention of android SMS permission abuses,” in SESP ’13: Proceedings of the first international workshop on Security in embedded systems and smartphones. ACM Request Permissions, May 2013, pp. 11–18. [24] A. P. Felt, K. Greenwood, and D. Wagner, “The effectiveness of appli- cation permissions,” in WebApps’11: Proceedings of the 2nd USENIX conference on Web application development. USENIX Association, Jun. 2011, pp. 7–7. [25]R.B ohme¨ and S. Kopsell,¨ “Trained to accept?: a field experiment on consent dialogs,” in CHI ’10: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM Request Permissions, Apr. 2010, pp. 2403–2406. [26] E. Smith, “iPhone applications and privacy issues: An analysis of application transmission of iPhone unique device identifiers UDIDs,” October 2010. [27] “Mobile Apps Study,” Future of Privacy Forum (FPF), pp. 1–16, Jun. 2012. [28] C. Perez, B. Birregah, and M. Lemercier, “A smartphone-based online social network trust evaluation system,” Social Network Analysis and Mining, vol. 3, no. 4, 2013, pp. 1293–1310.


Effectively Updating Co-location Patterns in Evolving Spatial Databases

Jin Soung Yoo and Hima Vasudevan

Computer Science Department Indiana University-Purdue University Fort Wayne Indiana, USA 46805 Email: yooj,[email protected]

Abstract—Spatial co-location mining has been used for discovering spatial event sets which show frequent association relationships based on the spatial neighborhood. This paper presents the problem of finding co-location patterns in evolving spatial databases which are constantly updated with fresh data. Maintaining discovered spatial patterns is a complicated process when a large spatial database is changed, because new data points make spatial relationships with existing data points on the continuous space as well as among themselves. The change of neighbor relations can affect co-location mining results by invalidating existing patterns and introducing new patterns. This paper presents an algorithm for effectively updating co-location analysis results and its experimental evaluation.

Keywords–Spatial association mining; Co-location pattern; Incremental update

I. INTRODUCTION

As one of the spatial data mining tasks, spatial association mining is often used for discovering spatial dependencies among objects [1]–[4]. A spatial co-location represents a set of spatial features which are frequently observed together in a nearby area [3]. Examples of frequently co-located features/events include symbiotic species such as West Nile incidents and stagnant water sources in epidemiology, and interdependent events such as a car accident, traffic jam, policemen and ambulances in transportation. In business, co-location patterns can be used for finding relationships among services requested by mobile users in geographic proximity.

Most of the spatial association mining works [3]–[10] assume that all data is available at the start of the data analysis. However, many application domains, including location-based services, public safety, transportation and environmental monitoring, collect their data periodically or continuously. For example, a police department accumulates, on average, 10,000 crime incidents per month [11]. For Earth observation, daily climate measurement values are collected at every 0.5 degree grid of the globe [12]. To keep the analysis result coherent with respect to the most recent database status, discovered patterns should be updated.

The problem of updating spatial co-location patterns presents more challenges than updating frequent itemsets in a traditional transaction database. In classical association analysis, a database update means the simple addition of new transaction records, or the deletion of existing records. Newly added transaction records are handled separately from existing records because the database is a collection of disjoint transaction records. In contrast, when a spatial database is updated, a new data point can make neighbor relationships with existing data points as well as with other new data points on the continuous space. Thus, all neighbor relationships in the updated database should be examined for the maintenance of co-location patterns. The spatial pattern mining process is a computational and data intensive task, therefore simply re-executing a state-of-the-art co-location mining algorithm whenever the database is updated can result in an explosion of required computational and I/O resources. This paper proposes an algorithm for effectively updating discovered co-location patterns with the addition of spatial data points.

The remainder of this paper is organized as follows. Section II presents the basic concept of co-location pattern mining and the related work. Section III describes our algorithmic design concept for incremental co-location mining and the proposed algorithm. Its experimental evaluation is presented in Section IV. Section V concludes the paper.

II. BASIC CONCEPT AND RELATED WORK

The preliminary knowledge of spatial co-location pattern mining and the related work are presented in this section.

A. Basic concept of spatial co-location mining

Let E = {e1, ..., em} be a set of event types, and S = {o1, ..., on} be a set of their objects with geographic location. When the Euclidean metric is used for the neighbor relationship R, two objects oi and oj are neighbors of each other if the ordinary distance between them is not greater than a neighbor distance threshold d. A co-location X is a set of event types, {e1, ..., ek} ⊆ E, whose objects are frequently neighbors to each other in space. The co-location instance I of X is defined as a set of event objects, I ⊆ S, which includes all types in X and makes a clique under R.

The prevalence strength of a co-location is often measured by the participation index value [3]. The participation index PI(X) of X = {e1, ..., ek} is defined as

PI(X) = min_{ei∈X} {PR(X, ei)},   (1)

where 1 ≤ i ≤ k, and PR(X, ei) is the participation ratio of event type ei in X, which is the fraction of objects of event ei in the neighborhood of instances of X − {ei}, i.e.,

PR(X, ei) = |distinct objects of ei in instances of X| / |objects of ei|.

If PI(X) is greater than a given minimum prevalence threshold, we say X is a prevalent co-located event set, or a co-location.
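Read concretely, the participation ratio of an event type is the share of its objects that appear in at least one instance of the candidate co-location, and the participation index of the candidate is the minimum ratio over its event types. The sketch below illustrates this computation; it is a hypothetical Python illustration, and the function and variable names are not taken from the paper.

from collections import defaultdict

# Illustrative sketch only, not the authors' implementation.

def is_neighbor(p, q, d):
    # Euclidean neighbor relationship R: true if the distance between p and q is at most d.
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5 <= d

def participation_index(instances, objects_by_type, colocation):
    # instances: co-location instances of X, each an iterable of (event_type, object_id)
    #            pairs whose objects form a clique under R.
    # objects_by_type: event_type -> set of all object ids of that type in S.
    # colocation: the candidate event set X = {e1, ..., ek}.
    participating = defaultdict(set)
    for inst in instances:
        for event_type, object_id in inst:
            participating[event_type].add(object_id)
    # PR(X, ei) = |distinct objects of ei in instances of X| / |objects of ei|
    ratios = [len(participating[e]) / len(objects_by_type[e]) for e in colocation]
    # PI(X) = min over the event types of X; X is prevalent if PI(X) >= min_prev.
    return min(ratios)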


(b) Incremental data points and their relationships (c) Incremental B. Related work y . B.5 (changed and new*) C.2 neighbors The problem of mining association rules based on spa- B.2 B.4 tial relationships (e.g., proximity and adjacency) was first Items (Neighbor objects) A.2 A.1 B.1, B.7*, C.1, C.4* discussed by Koperski et al. [1]. Shekhar, et al. [3] defines B.8 New neighbors A.3 B.3, B.8*, C.1, C.5* . B.7 A.4 * {A.5*, B.6*} A.5* B.3, B.6*, B.9*, C.1 the co-location pattern and proposes a join-based co-location C.5 {A.5*, B.9*} B.1 C.4* A.3 mining algorithm. Morimoto [2] studies the same problem C.4 * * {B.7*, C.4*} B.6* A.1 B.7* C.4* B.3 {A.1, B.7*} to discover frequent neighboring service class sets. A space * C.1 {A.1, C.4*} B.8* {A.3, B.8*} B.9* C.1 partitioning and non-overlap grouping scheme is used for B.1 A.5 {A.3, C.5*} B.9 C.3 C.4* finding neighboring objects. Yoo et al. [4], [10] propose join * * {A.5*, B.3} C.5* {A.5*, C.1} less algorithms to reduce the number of expensive spatial join *B.6 {B.1, C.4*} :Old data point x {B.9*, C.1} operations in finding co-location instances. Celik et al. [8] :Old data point having a relationship with a new data point extends the notion of co-location to a local zone-scale pattern. *:New data points Eick et al. [13] proposes a framework for mining regional co- Neighbor objects location patterns and Mohan et al. [14] presents a graph based A.1 B.1 , B.7 , C.1 , C.4 Add A.5; B.6, B.7, B.8, B.9; C.4, C.5 A.2 B.4 , C.2 approach for regional co-location discovery. Recognizing the A.3 B.3 , B.8 , C.1 , C.5 y . B.5 A.4 C.1 dynamic nature of database, much effort has been devoted C.2 Neighbor objects A.5 B.3 , B.6 , B.9 , C.1 B.2 B.4 A.1 B.1 , C.1 B.1 to the problem of incrementally mining frequent itemsets in A.2 B.4 , C.2 B.2 classical association rule mining literature [15]–[19]. However, A.2 A.3 B.3 , C.1 A.4 C.1 B.3 C.1 , C3 to find the problem to update co-location patterns in spatial B.1 B.4 C.2 A.4 B.2 B.5 data mining literature is rare. The most similar work with A.3 B.3 C.1 , C3 B.6 B.4 C.2 B.7 ours is He et al. [20] which is compared in our experimental A.1 C.1 B.3 B.5 B.8 evaluation. C.1 B.9 C.2 C.1 B.1 C.3 C.3 C.2 E.i: instance i of event type E C.3 : identified neighbor relationship Old neighbors III.INCREMENTAL CO-LOCATION MINING C.4 x C.5 Let Sold = {o1,...,on} be a set of old data points in a (a) Old data points and their relationships (d) Updated neighbors spatial database and Sin = {on+1,...,on+h} be a set of new Figure 1. New, Changed and Updated Neighborhoods data points added in the database. Let S be all data points in the updated database, i.e., S=Sold ∪Sin. There are two types of co-location in the update. The retained co-location is an event Let Nnew = {nnew(o1),...,nnew(oh)} be a set of all new set prevalent in both Sold and S. The emerged co-location is an neighborhoods for Sn and Nchg = {nchg(o1),...,nchg(oq)} event set not prevalent in Sold but prevalent in S. We propose be a set of all changed neighborhoods where {o1,...,oq} ⊆ an algorithm of Effective Update of COLOCation patterns Sold. We call the union of Nnew and Nchg to incremental (EUCOLOC). The proposed algorithm has two update stages. neighborhood set (Ninc). The first update stage examines only neighbor relationships of new data points, and finds all retained co-locations and some When an increment Sin is added as shown in Figure 1 (b), emerged co-locations. 
A. Neighborhood Process

Directly finding all co-location instances forming clique neighbor relationships from the spatial data is computationally expensive. Instead, we process only the neighbor relationships related to the new data points Sin.

Definition 1: The new neighborhood of a new object o ∈ Sin, nnew(o), is defined as {o, o2, ..., op | oi ∈ S ∧ R(o, oi) = true ∧ o's event type < oi's event type}, where 2 ≤ i ≤ p.

We assume there is a total ordering among the event types (i.e., a lexicographic order). R is a neighbor relationship function. Next, if an existing data point has a neighbor relationship with at least one new data point, its neighborhood is changed.

Definition 2: The changed neighborhood of an old object o ∈ Sold, nchg(o), is defined as {o, o2, ..., op | oi ∈ S ∧ R(o, oi) = true ∧ (∃oj ∈ Sin : R(o, oj) = true) ∧ o's event type < oi's event type}, where 2 ≤ i ≤ p.

Let Nnew = {nnew(o1), ..., nnew(oh)} be the set of all new neighborhoods for Sin and Nchg = {nchg(o1), ..., nchg(oq)} be the set of all changed neighborhoods, where {o1, ..., oq} ⊆ Sold. We call the union of Nnew and Nchg the incremental neighborhood set (Ninc).

When an increment Sin is added as shown in Figure 1 (b), the EUCOLOC algorithm first finds all neighbor pairs (NP) of the new data points in S using a geometric method or a spatial query method (Algorithm Line 2). The incremental neighborhood set (Ninc) is prepared by building new neighborhoods from NP and detecting changed neighborhoods from the old neighborhoods Nold (Line 3 and Figure 1 (c)). Figure 1 (d) shows the entire neighborhood information (N) of the updated database (Line 4).

B. First update and detection

Let an event set be a border event set if all of its proper subsets are prevalent but the event set itself is not. The border sets are used for detecting an emerged co-location without the generation and testing of many unnecessary candidates. The candidate event sets for the first update are the previous co-located event sets (Pold) and the border event sets (Bold) (Line 7). The incremental co-location instances of the candidate event sets are searched from the incremental neighborhoods (Ninc) without examining the entire set of neighbor relationships (Line 8 and Figure 1 (c)). A filter-and-refine search strategy is used for finding co-location instances. Let SI = {o1, o2, ..., ok} be a set of objects of a candidate set c = {e1, e2, ..., ek}, where e1 < e2 < ... < ek. If the first object o1 has neighbor relationships with all other objects in the set, SI is called a star instance of c. The star instances of {e1, e2, ..., ek} are collected from the neighborhoods of e1 according to Definitions 1 and 2. A candidate instance SI = {o1, o2, ..., ok} of c = {e1, ..., ek} becomes a true co-location instance of c if its subinstance {o2, ..., ok} forms a clique. When the subinstance has at least one new point, its cliqueness can be checked by simply querying the co-location instances of c's sub event set {e2, ..., ek}, as shown in Figure 2.
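The filter-and-refine step can be pictured with the Python sketch below. It is an illustrative reading of the star-instance search and the lookup-based cliqueness check, not the paper's implementation: the data structures (neighborhoods grouped by event type, a set of known sub-pattern instances) and all names are assumptions, and the case of a subinstance with no new point is omitted.

    from itertools import product

    def star_instances(candidate, neighborhoods):
        # neighborhoods: {object of type e1: {event type: [neighbor objects]}},
        # built from the new and changed neighborhoods (Definitions 1 and 2).
        rest = candidate[1:]
        for o1, by_type in neighborhoods.items():
            pools = [by_type.get(e, []) for e in rest]
            if all(pools):
                for tail in product(*pools):       # o1 is a neighbor of every tail object
                    yield (o1,) + tail

    def incremental_clique_instances(candidate, neighborhoods, sub_instances, new_points):
        # Refine: a star instance is a true co-location instance when its tail
        # {o2, ..., ok} is itself a clique, checked here by looking the tail up
        # among the known instances of the sub event set {e2, ..., ek}.
        result = []
        for inst in star_instances(candidate, neighborhoods):
            tail = inst[1:]
            if any(o in new_points for o in tail) and tail in sub_instances:
                result.append(inst)
        return result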



Figure 2. Event subsets and their instance search space: A- and B-incremental neighborhood transactions, the star instances collected from them by instance lookup, and the resulting clique co-location instances.

     1: procedure PREPROCESS
     2:   NP ← search_neigh_pairs(Sin, Sold, R)
     3:   Ninc ← gen_incr_neigh_trans(NP, Nold)
     4:   N ← gen_upd_neigh_trans(Nold, Ninc)
     5: end procedure

     6: procedure FIRSTUPDATEDETECTION
     7:   C ← Pold ∪ Bold
     8:   SI ← scan_incr_star_inst(C, Ninc)
     9:   k ← 2
    10:   while Ck ≠ ∅ do
    11:     for all c ∈ Ck do
    12:       CIc ← find_incr_clique_inst(SIc, NP)
    13:       PI ← compute_incPI(old_PB_info, CIc)
    14:       if PI ≥ min_prev then
    15:         P ← P ∪ c
    16:         if c ∈ Bold then
    17:           ES ← ES ∪ c
    18:         end if
    19:       else
    20:         B ← B ∪ c
    21:       end if
    22:     end for
    23:     k ← k + 1
    24:   end while
    25: end procedure

    26: procedure SECONDUPDATE
    27:   if ES ≠ ∅ then
    28:     k ← 3
    29:     Ck ← gen_sizeK_candidates(Pk-1, ESk-1)
    30:     while Ck ≠ ∅ do
    31:       SI ← scan_star_instances(Ck, N)
    32:       for all c ∈ Ck do
    33:         CIc ← find_clique_instances(SIc)
    34:         if compute_PI(CIc) ≥ min_prev then
    35:           P ← P ∪ c; ES ← ES ∪ c
    36:         else B ← B ∪ c
    37:         end if
    38:       end for
    39:       k ← k + 1
    40:       Ck ← gen_sizeK_candidates(Pk-1, ESk-1)
    41:     end while
    42:   end if
    43:   Pold ← P; Bold ← B; Nold ← N; Sold ← Sold ∪ Sin
    44:   return P
    45: end procedure

Figure 3. EUCOLOC algorithm

The participation index of a candidate is computed from its incremental co-location instances (CIc) and previous instance metadata (old_PB_info), which holds the object information of the candidate's old co-location instances (Line 13). The prevalence of a candidate c = {e1, ..., ek} is updated with

    incPI(c) = min_{ei ∈ c} incPR(c, ei),                                  (2)

where 1 ≤ i ≤ k and incPR(c, ei) is the updated participation ratio of event type ei computed with the incremental co-location instances of c: incPR(c, ei) = |Oi ∪ Ii| / (|Sold,i| + |Sin,i|), where |Sold,i| is the total number of old objects of ei, |Sin,i| is the total number of new objects of ei, Oi is the set of distinct objects of ei in the old co-location instances of c, and Ii is the set of distinct objects of ei in the incremental co-location instances of c. If the participation index is no smaller than min_prev, the event set is a co-location (∈ P) (Lines 14-15). If this co-location comes from the border set Bold, it also becomes an emerged co-location (∈ ES) (Lines 16-17).
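As a concrete reading of Equation (2), the short Python sketch below computes the incremental participation index of one candidate from the distinct participating objects of its old and incremental instances. The dictionary layout and the small example values are illustrative assumptions, not data from the paper.

    def inc_participation_index(candidate, old_objs, inc_objs, n_old, n_new):
        # old_objs[e] / inc_objs[e]: sets of distinct objects of type e participating
        # in the old / incremental co-location instances of the candidate (O_i, I_i).
        # n_old[e] / n_new[e]: total numbers of old / new objects of type e.
        ratios = []
        for e in candidate:
            participating = old_objs.get(e, set()) | inc_objs.get(e, set())
            ratios.append(len(participating) / (n_old[e] + n_new[e]))
        return min(ratios)

    pi = inc_participation_index(
        ("A", "B"),
        old_objs={"A": {"A.1"}, "B": {"B.1"}},
        inc_objs={"A": {"A.1", "A.5"}, "B": {"B.6", "B.7"}},
        n_old={"A": 4, "B": 5},
        n_new={"A": 1, "B": 4},
    )
    # pi = min(2/5, 3/9) = 1/3; the candidate is a co-location if pi >= min_prev.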
C. Second update stage

If any emerged set is found in the first update stage, there is a possibility of finding other emerged event sets according to the following lemma.

Lemma 1: Let X be a co-located event set that is prevalent in the updated set S = Sold ∪ Sin but not prevalent in the old set Sold. Then there exists a subset Y ⊆ X such that Y is an emerged event set.

Proof: Let Y be a minimal-cardinality subset of X that is prevalent in S but not in Sold. Since Y is a prevalent event set in S, so are all of its proper subsets. However, by the minimality of Y, none of these proper subsets is newly prevalent in S, so all of them are also prevalent in Sold. Thus, Y is a border set in Sold, and Y ⊆ X as claimed.

In the second update, a candidate is an event set that has at least one emerged event set as its subset (Line 29). The star instances of the candidates are collected from the entirely updated neighborhood transactions (N) (Line 31), and the true co-location instances are filtered from these candidate instances. The prevalence value of a candidate is calculated using the original participation index (Equation (1)) because such a set is a new candidate with no previous instance metadata. If the candidate is prevalent, it becomes an emerged co-location; otherwise, the set is included in the border set for a future update. The second update is repeated with increasing pattern size until no more candidates are generated (Lines 30-41).
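The candidate generation at Line 29 can be illustrated with the Python sketch below, under our reading of the algorithm: size-k candidates are built from the prevalent size-(k-1) sets with a standard apriori-style subset prune, and only candidates containing at least one emerged subset (checked here against the emerged size-(k-1) sets only) are kept, as Lemma 1 suggests. The function name mirrors the pseudocode, but the body and these simplifications are our assumptions.

    from itertools import combinations

    def gen_sizek_candidates(prevalent_prev, emerged_prev, k):
        # prevalent_prev / emerged_prev: frozensets of size k-1.
        candidates = set()
        for a, b in combinations(prevalent_prev, 2):
            c = a | b
            if len(c) != k:
                continue
            # every (k-1)-subset must be prevalent (apriori pruning) ...
            if any(frozenset(s) not in prevalent_prev for s in combinations(c, k - 1)):
                continue
            # ... and at least one emerged subset must be contained (Lemma 1).
            if any(e <= c for e in emerged_prev):
                candidates.add(c)
        return candidates

    P2 = {frozenset("AB"), frozenset("AC"), frozenset("BC")}
    ES2 = {frozenset("AC")}
    print(gen_sizek_candidates(P2, ES2, 3))   # {frozenset({'A', 'B', 'C'})}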



IV. EXPERIMENTAL EVALUATION

We compared the performance of EUCOLOC with two other co-location mining algorithms. One, denoted IMCP in this paper, has an update function [20]; its implementation is based on our understanding of that work. The other, denoted GeneralColoc, does not have an update function [4]. All experiments were performed on a Linux system with 8.0 GB of memory and a 2.67 GHz CPU.

In the first experiment, we compared the performance of EUCOLOC and IMCP by varying the incremental size of synthetic data. The number of distinct event types was 50 and the number of old data points was 10,020. The first incremental set had 1,200 data points; the second was two times, the third three times the size of the first set, and so on. The ratio of old data points having relationships with new points was increased with the amount of new data (i.e., 5%, 10%, 15% and 20%). As shown in Figure 4 (a), the execution times of both EUCOLOC and IMCP increased with the incremental data size, and EUCOLOC showed better performance than IMCP. When the ratio of relationships with old data points was fixed at 5%, the execution time of EUCOLOC was stable or increased very slowly. The performance of EUCOLOC thus depends on the inter-neighbor relationship ratio.

We also evaluated EUCOLOC with real climate measurement data [12]. The total number of processed event types was 18; 7,728 event records were used as the old data and 7,787 new event records were added as the incremental data. We used 6 as the neighborhood distance, which means 6 cells on a latitude-longitude spherical grid whose cells are 1 degree × 1 degree. About half of the old event objects had neighbor relationships with the new ones. Figure 4 (b) shows the result: the execution time of EUCOLOC increased more slowly than that of the other algorithms as the prevalence threshold was decreased.

Figure 4. Experiment Result: (a) execution time by the number of incremental data points (synthetic data, min_prev_threshold = 0.05, distance = 10); (b) execution time by minimum prevalence threshold (real climate dataset, distance = 6).
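For the climate experiment, the neighbor relationship R can be pictured as a grid-cell distance test like the Python sketch below. The 6-cell threshold and the 1-degree cells come from the setup above, but the exact distance metric and the longitude wrap-around handling are our assumptions; the paper does not specify them.

    import math

    def grid_neighbors(p, q, max_cells=6):
        # p, q: (latitude, longitude) in degrees on a 1-degree x 1-degree grid.
        dlat = abs(math.floor(p[0]) - math.floor(q[0]))
        dlon = abs(math.floor(p[1]) - math.floor(q[1]))
        dlon = min(dlon, 360 - dlon)            # wrap longitude around the sphere
        return math.hypot(dlat, dlon) <= max_cells

    print(grid_neighbors((40.2, -75.8), (44.9, -71.3)))   # True: about 5.7 cells apart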

V. CONCLUSION

In this paper, we presented an algorithm for efficiently mining co-location patterns in evolving spatial databases. The proposed algorithm has two update stages. The first update stage is designed 1) to avoid the generation and testing of many unnecessary candidates by using the border concept, 2) to search only the incremental neighborhoods for the update, and 3) to update the prevalence values of current co-locations with their incremental instances and minimal previous co-located object information. The second update stage is used only for finding new co-located event sets (emerged ones). The initial experimental evaluation shows that our algorithmic design decisions are effective in updating discovered co-location patterns. The proposed algorithm can be easily extended to handle the case of deleted data points, and our approach can be adapted to changes of important parameters, such as the neighbor distance and the prevalence threshold. In the future, we plan to explore these problems.

REFERENCES

[1] K. Koperski and J. Han, "Discovery of Spatial Association Rules in Geographic Information Databases," in Proceedings of the International Symposium on Large Spatial Databases, 1995, pp. 47–66.
[2] Y. Morimoto, "Mining Frequent Neighboring Class Sets in Spatial Databases," in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001, pp. 353–358.
[3] Y. Huang, S. Shekhar, and H. Xiong, "Discovering Co-location Patterns from Spatial Datasets: A General Approach," IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 12, 2004, pp. 1472–1485.
[4] J. S. Yoo and S. Shekhar, "A Join-less Approach for Mining Spatial Co-location Patterns," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 10, 2006, pp. 1323–1337.
[5] J. S. Yoo and S. Shekhar, "A Join-less Approach for Spatial Co-location Mining: A Summary of Results," in Proceedings of the IEEE International Conference on Data Mining, 2005, pp. 813–816.
[6] H. Xiong, S. Shekhar, Y. Huang, V. Kumar, X. Ma, and J. S. Yoo, "A Framework for Discovering Co-location Patterns in Data Sets with Extended Spatial Objects," in Proceedings of the SIAM International Conference on Data Mining, 2004, pp. 78–89.
[7] J. S. Yoo and M. Bow, "Finding N-Most Prevalent Colocated Event Sets," in Proceedings of the International Conference on Data Warehousing and Knowledge Discovery, 2009, pp. 415–427.
[8] M. Celik, J. M. Kang, and S. Shekhar, "Zonal Co-location Pattern Discovery with Dynamic Parameters," in Proceedings of the IEEE International Conference on Data Mining, 2007, pp. 433–438.
[9] J. S. Yoo and M. Bow, "Mining Spatial Colocation Patterns: A Different Framework," Data Mining and Knowledge Discovery, vol. 24, no. 1, 2012, pp. 159–194.
[10] J. S. Yoo and S. Shekhar, "A Partial Join Approach for Mining Co-location Patterns," in Proceedings of the ACM International Symposium on Advances in Geographic Information Systems, 2004, pp. 241–249.
[11] "San Francisco Crime Incidents," https://data.sfgov.org/.
[12] "Earth Observation Data," http://data.giss.nasa.gov/.
[13] C. F. Eick, R. Parmar, W. Ding, T. F. Stepinski, and J. Nicot, "Finding Regional Co-location Patterns for Sets of Continuous Variables in Spatial Datasets," in Proceedings of the ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2008, pp. 1–10.
[14] P. Mohan et al., "A Neighborhood Graph based Approach to Regional Co-location Pattern Discovery: A Summary of Results," in Proceedings of the ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2011, pp. 122–132.
[15] N. Ayan, A. Tansel, and E. Arkyn, "An Efficient Algorithm to Update Large Itemsets with Early Pruning," in Proceedings of the International Conference on Knowledge Discovery and Data Mining, 1999, pp. 287–291.
[16] D. Cheung, J. Han, V. Ng, and C. Y. Wong, "Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique," in Proceedings of the IEEE International Conference on Data Engineering, 1996, pp. 106–114.
[17] S. Thomas and S. Chakravarthy, "Incremental Mining of Constrained Associations," High Performance Computing (HiPC), vol. 1970, 2000, pp. 547–558.
[18] D. Cheung, S. D. Lee, and D. Kao, "A General Incremental Technique for Maintaining Discovered Association Rules," in Proceedings of the International Conference on Database Systems for Advanced Applications, 1997, pp. 185–194.
[19] S. Thomas, S. Bodagala, K. Alsabti, and S. Ranka, "An Efficient Algorithm for the Incremental Updation of Association Rules in Large Databases," in Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, 1997, pp. 263–266.
[20] J. He, Q. He, F. Qian, and Q. Chen, "Incremental Maintenance of Discovered Spatial Colocation Patterns," in Proceedings of the Data Mining Workshop, 2008, pp. 399–407.

