21st International Configuration Workshop

Proceedings of the 21st International Configuration Workshop

Edited by Lothar Hotz, Michel Aldanondo, Thorsten Krebs

September 18 – 19, 2019

Hamburg, Germany

Organized by

University of Hamburg
Hamburger Informatik Technologie‐Center e.V.
Department of Computer Science
Vogt‐Kölln‐Str. 30, 22527 Hamburg
GERMANY

ISSN 1613‐0073

Lothar HOTZ, Michel ALDANONDO, Thorsten KREBS, Editors
Proceedings of the 21st International Configuration Workshop
September 18‐19, 2019, Hamburg, Germany

Chairs
Lothar Hotz, University of Hamburg, HITeC, Hamburg, Germany
Michel Aldanondo, Toulouse University, Mines Albi, France
Thorsten Krebs, encoway GmbH, Bremen, Germany

Program Committee
Michel Aldanondo, Toulouse University, Mines Albi, France
Tomas Axling, Tacton Systems, Denmark
Andrés Felipe Barco, Universidad Santiago de Cali, Colombia
David Benavides, University of Seville, Spain
Andreas Falkner, Siemens AG, Austria
Alexander Felfernig, Graz University of Technology, Austria
Cipriano Forza, University of Padova, Italy
Gerhard Friedrich, University of Klagenfurt, Austria
Paul Grünbacher, Johannes Kepler University Linz, Austria
Albert Haag, Product Management GmbH, Germany
Alois Haselböck, Siemens AG, Austria
Petri Helo, University of Vaasa, Finland
Lothar Hotz, University of Hamburg, HITeC, Germany
Dietmar Jannach, University of Klagenfurt, Austria
Thorsten Krebs, encoway GmbH, Bremen, Germany
Tomi Männistö, University of Helsinki, Finland
Mikko Raatikainen, Aalto University, Finland
Rick Rabiser, Johannes Kepler University Linz, Austria
Sara Shafiee, Technical University of Denmark, Denmark
Markus Stumptner, University of South Australia, Australia
Juha Tiihonen, University of Helsinki, Finland
Elise Vareilles, Toulouse University, Mines Albi, France
Yue Wang, Hang Seng Management College, Hong Kong
Linda Zhang, IESEG Business School of Management Paris, France

Local Arrangements
Lothar Hotz, University of Hamburg, HITeC, Germany
Evelyn Staske, HITeC, Germany

Preface

Configuration is the task of composing product models of complex systems from parameterisable components. This task demands powerful knowledge‐representation formalisms to capture the great variety and complexity of configurable product models. Furthermore, efficient reasoning and conflict resolution methods are required to provide intelligent interactive behavior in configurator software, such as solution search, satisfaction of user preferences, personalization, or optimization.

The main goal of the Configuration Workshop is to promote high‐quality research in all technical and application areas related to configuration. This year, besides typical contributions about knowledge representation and reasoning in configuration, the adaptation and re‐configuration of delivered products is one focus.

The workshop is of interest both for researchers working in the various fields of Artificial Intelligence (AI) technologies and for industry representatives interested in the relationship between configuration technology and the business problem behind configuration and mass customization. It provides a forum for the exchange of ideas, evaluations and experiences, especially in the use of AI techniques within these application and research areas.

The 2019 Workshop on Configuration continues the series of workshops started at the AAAI'96 Fall Symposium and continued at IJCAI, AAAI, and ECAI since 1999. In recent years, the workshop has been held independently of major conferences.

This year, special thanks go to the following Configuration Workshop sponsors: Siemens (Austria), Product Management Haag (Germany), Variantum (Finland), EventHelpr (Austria), encoway (Germany), IMT Mines‐Albi‐Carmaux (France), HITeC (Germany), University of Hamburg (Germany).

Lothar Hotz, Michel Aldanondo, and Thorsten Krebs

September 2019

Contents

Consistency Management

Coping with Inconsistent Models of Requirements 1 Juha Tiihonen, Mikko Raatikainen, Lalli Myllyaho, Clara Marie Lüders, and Tomi Männistö

Consistency‐based Merging of Variability Models 9 Alexander Felfernig, Mathias Uta, Gottfried Schenner, and Johannes Spöcklberger

Conversational Recommendations Utilizing Model‐based Reasoning 13 Oliver Tazl, Alexander Perko, and Franz Wotawa

Decision Biases in Preference Acquisition 20 Martin Stettinger, Alexander Felfernig, and Ralph Samer

Product and Service Configuration

Enrichment of Geometric CAD Models for Service Configuration 22 Daniel Schreiber, Lukas Domarkas, Paul Christoph Gembarski, and Roland Lachmayer

Applications and Benefits

smartfit: Using Knowledge‐based Configuration for Automatic Training Plan Generation 30 Florian Grigoleit, Peter Struss, and Florian Kreuzpointner

Prioritizing Products for Profitable Investments on Product Configuration Systems 38 Sara Shafiee, Lars Hvam, and Poorang Piroozfar

A Search Engine Optimization Recommender System 43 Juan Camilo Duque Delgado, Christian David Hoyos, Andrés Felipe Barco Santa, and Elise Vareilles

Comparing the Gained Benefits from Product Configuration Systems 48 Sara Shafiee, Lars Hvam, and Anders Haug

Configuration Requirements

Reusing Components across Multiple Configurators 53 Amartya Ghosh, Anna Myrodia, Lars Hvam, and Niels Henrik Mortensen

Adaptive Autonomous Machines – Requirements and Challenges 61 Lothar Hotz, Stephanie von Riegen, Matthias Riebisch, Markus Kiele‐Dunsche, and Rainer Herzog

Constraint Solver Requirements for Interactive Configuration 65 Andreas Falkner, Alois Haselböck, Gerfried Krames, Gottfried Schenner, and Richard Taupe

Configuration and Standards

Portfolio Management: How to Find Your Standard Variants 73 Frank Dylla, Daniel Jeuken, and Thorsten Krebs

Copyright © 2019 for the individual papers by the papers' authors. Copyright © 2019 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0).

Coping with Inconsistent Models of Requirements

Juha Tiihonen1 and Mikko Raatikainen1 and Lalli Myllyaho1 and Clara Marie Lüders2 and Tomi Männistö1

Abstract. Issue trackers are widely applied for requirements engineering and product management. They typically provide good support for the management of individual requirements. However, holistic support for managing the consistency of a set of requirements such as a release is largely missing. The quality of issue data may be insufficient for global analyses supporting decision making. We aim to develop tools that support product management and requirements engineering also in cases where the body of requirements, e.g. for a software release, is inconsistent. Software releases can be seen as configurations of compatible, connected requirements. Our approach described in this paper can identify inconsistent elements in bodies of requirements and perform diagnoses using techniques from Knowledge Based Configuration. The research methodology follows the principles of Design Science: we built a prototype implementation for the approach and tested it with relevant use cases. The Qt Company has large sets of real requirement data in their Jira issue tracker. We characterize that data and use it for empirical performance testing. The approach can support product management and requirements engineering in contexts where large, inconsistent bodies of requirements are typical. Empirical evaluation shows that the approach scales to usage in large projects, but future work for improving performance is still required. Value in real use is highly plausible, but demonstration requires tighter integration with a developed visualization tool, which would enable testing with real users.

1 Introduction

Over the years, issue trackers have become important tools to manage data related to products. The trackers are especially popular in large-scale, globally distributed open source projects [4, 5], such as Bugzilla for , Github tracker for Spring Boot, and Jira for Qt. A tracker can contain thousands of bugs and other issues reported by different stakeholders. These issues typically become requirements for a future release of a product. These requirements are often related to each other: it is not uncommon to have the same requirement more than once, thus being related as similar or duplicate, or to have one requirement require another requirement. However, trackers primarily provide support for individual requirements over their life cycle. Even though dependencies can sometimes be expressed for each individual requirement, more advanced understanding or analysis over all issues and their dependencies in a system is not well supported: developers do not conveniently see related requirements; a requirements engineer cannot deal with requirements as an interconnected entity; and a product manager does not see what requirements and issues are related to the requirements planned for the next releases. To aggravate the problem, the data in a tracker is heterogeneous and often inconsistent. Thus, trackers are not optimal for the concerns of product management or requirements engineering that need to deal with different requirement options, alternatives, and constraints, as well as their dependency consequences, when deciding what to do or not to do. This lack of support exists even though dependencies are found to be one of the key concerns that need to be taken into account in requirements prioritization [7, 1, 17] and release planning [16, 2].

Our objective is to help holistic management of requirements while issue trackers are utilized. The specific focus is on the application of technologies common in the field of Knowledge Based Configuration (KBC) to support the stakeholders who have to deal with dependent requirements and issues in a tracker in their daily work. We support decision making, such as the configuration of release plans, instead of automating it, as needs are not known well enough and criteria are hard to formalize. We describe the technical approach of a system that aims to provide such support. The system is based on generating a requirement model that closely resembles a traditional configuration model. We also provide data which shows in practice that the approach fits the context and scales even to large projects.

We aim to address the following research questions: What are the major requirements of the system? What are the characteristics of real requirements data? How does the performance of computation scale up? The applied research methodology follows Design Science in the sense that the aim is to innovate a novel approach and bring it into a specific new environment so that the results have value in that environment [10, 19]. The context of the research has been the Horizon 2020 project OpenReq3. Our primary case has been Qt (see Section 3) with a large database of issues.

Previous work: Dependencies in Requirements. In the field of requirements engineering research, both industrial studies [11, 18] and release planning [16] and requirements prioritization [17] methods emphasize the importance of dependencies but lack details for the semantics of dependencies. However, taxonomies have been proposed for requirements dependencies [13, 3, 6, 20]. These include structural dependencies, such as refines or similar; constraining dependencies, such as require or conflict; and value-based dependencies, such as increases value or increases costs. Although the taxonomies share similarities with each other, they vary in terms of size and clarity of dependency semantics. Only a few taxonomies have been studied empirically, so saturated evidence for an established or general taxonomy has not emerged.

This paper is structured as follows. Section 2 introduces our approach for holistically supporting requirement management and describes the system we developed. Section 3 describes the real industrial context applied for evaluation.

1 University of Helsinki, Finland, email: {juha.tiihonen, mikko.raatikainen, tomi.mannisto, lalli.myllyaho}@helsinki.fi
2 University of Hamburg, Germany, email: [email protected]
3 https://openreq.eu/

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Performance testing in Section 4 covers both the approach and results. These results are analysed in Section 5. Discussion forms Section 6. Finally, Section 7 concludes.

2 Approach & System

Software releases or sets of requirements can be seen as configurations of mutually compatible requirements. Consistency check and diagnosis techniques that are commonplace in KBC can be applied in the context of software release planning.

2.1 Context: Characteristics of issue tracker data

The characteristics of requirements in a tracker have a profound effect on a practical approach. The requirements are manually reported by different people: the granularity, level of detail, and quality differ. These differences would remain even if all issues had been manually reviewed, as e.g. in the case of Qt's triage process, which can even send an issue back for further information or clarification. Even the typology or purpose of issues, such as epics for feature requests and bugs for deficiencies, is not always adhered to. Duplicated or similar issues are not uncommon: a bug or feature request can be reported by several persons, each possibly providing some unique characteristics or details that need to be preserved. The semantics of dependencies between requirements is not always completely clear, and the dependencies are not applied consistently by different people. The relationships are not even necessarily marked at all. As a result, the data in a tracker is in practice doomed to be inconsistent and incomplete. Therefore, inference on the whole database is difficult or even meaningless. Correcting the whole database is practically hopeless or at least impractical. Therefore, we believe it is more fruitful to provide requirements engineering with tools that can help to cope with the less-than-perfect data.

2.2 Conceptualization of the problem

If the whole tracker database is likely to remain inconsistent, could we restrict the focus to some relevant subsets? Our approach is based on this idea. We support analyzing a requirement and its neighbourhood. A requirement is taken as the point of focus. We follow any relationships (described below) of that issue to neighbour issues. A transitive closure of issues within a desired depth is calculated as a graph. Depth is the minimal distance between two issues. The transitive closure is used as the context for analyses. Another natural context of analysis is a release. An issue can be assigned to a specific release such as 4.12.1. The combined neighbourhoods of the issues of a release can be taken as the context of analysis4. Consequently, for a given context of analysis, a requirement model is dynamically generated. The requirement model is then mapped (through several layers) into a formal model that supports inference. We combine inference with procedural analysis of inconsistencies, which readily enumerates local sources of inconsistencies even when the requirement model is inconsistent.

We follow (and extend) the OpenReq datamodel5 [15]. Hence, any issue is considered as a Requirement that is characterized, among others, by priority (integer, a smaller number is higher), effort (integer, e.g., in hours), status such as 'planned' or 'complete', as well as requirement text. A requirement can be assigned to a Release. A Release is characterized by startDate, releaseDate, capacity (e.g., in hours), and version string. The version strings conform to the common notation: e.g., '4', '4.1' and '4.1.12' are version strings. They can be amended with prefixes and suffixes, e.g. ABC-4.12.1RC1 represents Release Candidate 1 of version 4.12.1 of product ABC6.

Dependencies are binary relationships between two requirements. The types of dependencies with clear semantics are summarized in Table 1. In the table, rel_ra and prior_ra specify the assigned release and priority of requirement ra, respectively. Assignment to release 0 means that the requirement is not assigned to any release. Many of the dependencies are similar to those identified in [9]. The dependency duplicates(ra, rb) is managed in pre-processing by collecting all dependencies of rb to ra. The compositional structure of requirements is expressed as decomposition dependencies. This part-of hierarchy of requirements seems to be typically 4 levels deep at maximum. For instance, epics can have user stories, and user stories can have tasks.

2.3 Solution Functionality

The current main functionalities for requirements engineering are consistency checks and diagnosis services as well as computation of transitive closure. The user interface for, and visualization of, dependencies by OpenReq Issue Link Map7 [12] is vital for practical usage but not the focus here.

The transitive closure service computes a transitive closure of a Requirement in the focus of analysis by following all links in a breadth-first manner up to the specified depth in terms of the number of dependencies followed. By adjusting the desired depth, different contexts of analysis can be formed. For releases, the current implementation calls the service for each requirement of the release and combines the results.

Consistency check analyzes a defined context of analysis formed by a set of requirements and their dependencies, priorities, and releases. The following aspects are checked: each binary dependency must satisfy the semantics of the dependency as defined in Table 1. Here, the assigned release and the priority of each requirement are taken into account. If effort consumption is specified, the sum of efforts of requirements assigned to a release must be less than or equal to the capacity of the release. The analysis reports aspects such as inconsistent relationships and resource consumption per release. Both human-friendly messages and machine-friendly JSON data fields are included in the response.

Diagnosis can optionally be performed in conjunction with a consistency check. Diagnosis attempts to provide a 'repair' by removing requirements or dependencies. Requirement removal is justified especially when capacity consumption is excessive. It is also possible that assignments or dependencies have been performed in a faulty manner. Therefore, a diagnosis (1) can consider requirements as faulty; (2) can consider relationships as faulty; and (3) can consider both requirements and relationships as faulty. If all the elements proposed by a diagnosis (requirements, relationships) are removed (a requirement is unassigned, represented by assigning it to release 0), a consistent release plan is achieved. Diagnosis can also fail. For example, removing only relationships cannot fix excessive resource consumption. Diagnosis is based on the FASTDIAG algorithm [8].

4 release-based analysis is in early stages of development
5 https://github.com/OpenReqEU/openreq-ontology
6 We apply the Maven Comparable Versions: https://maven.apache.org/ref/3.6.0/maven-artifact/apidocs/org/apache/maven/artifact/versioning/ComparableVersion.html
7 https://api.openreq.eu/openreq-issue-link-map
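The transitive-closure computation described above can be sketched as a plain breadth-first search over the dependency links. The following is a minimal illustration assuming an adjacency-list graph with links stored in both directions; the class and method names are ours, not the actual KeljuCaaS code:

```java
import java.util.*;

// Breadth-first collection of all requirements within a given
// dependency depth of a focus requirement (illustrative sketch).
public class TransitiveClosure {
    public static Set<String> closure(Map<String, List<String>> deps,
                                      String focus, int maxDepth) {
        Set<String> visited = new LinkedHashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        visited.add(focus);
        frontier.add(focus);
        // Expand one depth level per iteration, up to maxDepth.
        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            Queue<String> next = new ArrayDeque<>();
            for (String req : frontier) {
                for (String neighbour : deps.getOrDefault(req, List.of())) {
                    if (visited.add(neighbour)) {
                        next.add(neighbour);
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }
}
```

Because each issue is visited at most once, the cost is linear in the number of issues and links inside the chosen depth, which matches the idea of restricting analysis to a neighbourhood instead of the whole database.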

Table 1. Semantics of Dependencies

excludes(ra, rb). Closest type in [9]: atmostone(rel_ra, rel_rb). Semantics: rel_ra = 0 ∨ rel_rb = 0. Description: at most one out of {ra, rb} has to be assigned to a release.

incompatible(ra, rb). Closest type in [9]: different(rel_ra, rel_rb). Semantics: rel_ra ≠ rel_rb ∨ rel_ra = 0 ∨ rel_rb = 0. Description: {ra, rb} have to be implemented in different releases.

requires(ra, rb). Closest type in [9]: weakprecedence(rel_rb, rel_ra). Semantics: rel_ra = 0 ∨ (rel_rb ≤ rel_ra ∧ rel_rb > 0). Description: rb must be implemented before ra or in the same release, or ra is not in any release.

implies(ra, rb). Closest type in [9]: strongprecedence(rel_rb, rel_ra). Semantics: rel_ra = 0 ∨ (rel_rb < rel_ra ∧ rel_rb > 0). Description: rb must be implemented before ra, or ra is not in any release.

decomposition(ra, rb). Closest type in [9]: (none). Semantics: rel_ra = 0 ∨ (rel_ra > 0 ∧ rel_rb > 0 ∧ (rel_rb ≤ rel_ra ∨ prior_rb > prior_ra)). Description: whole ra is not complete without part rb: rb must be implemented in the same release as or before ra, or rb has a lower priority so it can be assigned to a later release. A better name would be haspart(ra, rb).
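The dependency semantics of Table 1 read directly as boolean predicates over release assignments. A minimal sketch, assuming integer release numbers with 0 meaning "unassigned" and smaller priority numbers meaning higher priority; the method names are ours:

```java
// Illustrative encoding of Table 1 as predicates (not the paper's code).
public class DependencySemantics {
    // excludes(ra, rb): at most one of the two is assigned to a release
    static boolean excludes(int relA, int relB) {
        return relA == 0 || relB == 0;
    }
    // incompatible(ra, rb): different releases, or at least one unassigned
    static boolean incompatible(int relA, int relB) {
        return relA != relB || relA == 0 || relB == 0;
    }
    // requires(ra, rb): rb in the same or an earlier release, or ra unassigned
    static boolean requires(int relA, int relB) {
        return relA == 0 || (relB <= relA && relB > 0);
    }
    // implies(ra, rb): rb in a strictly earlier release, or ra unassigned
    static boolean implies(int relA, int relB) {
        return relA == 0 || (relB < relA && relB > 0);
    }
    // decomposition(ra, rb): part rb no later than whole ra, or rb has a
    // lower priority (larger number), or ra unassigned
    static boolean decomposition(int relA, int relB, int prioA, int prioB) {
        return relA == 0
            || (relA > 0 && relB > 0 && (relB <= relA || prioB > prioA));
    }
}
```

A procedural consistency check along the lines of Section 2.3 can then simply evaluate the matching predicate for every dependency in the context of analysis and report each violated one.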

For example, assume that Release 1 of capacity 3 (hours) has assigned requirements REQ1 (effort: 2h) and REQ2 (2h). Release 2 of capacity 4 has REQ3 (3h), and there is a dependency excludes(REQ1, REQ2). Analysis and diagnosis results would include, among others (white space modified):

{"response": [{ "AnalysisVersion": "analysis", "AnalysisVersion_msg": "Analysis and consistency check", "Consistent": false, "Consistent_msg": "Release plan contains errors", "RelationshipsInconsistent": [{ "From": "REQ1", "To": "REQ2", "Type": "excludes"}], "RelationshipsInconsistent_msg": "Relationships that are not respected (inconsistent): rel_REQ1_excludes_REQ2", "Releases": [{ ** Release 0 omitted||() ... "Release": 1, "Release_msg": "Release $1$", "RequirementsAssigned": [ "REQ2", "REQ1"], "RequirementsAssigned_msg": "Requirements of release: REQ2, REQ1", "AvailableCapacity": 3, "CapacityUsed": 4, "CapacityBalance": -1, "CapacityUsageCombined_msg": "Capacity: available 3h, used 4h, remaining -1h"}, ... Figure 1. The architecture of the system. {"AnalysisVersion": "reqdiag", "AnalysisVersion_msg": "Requirements diagnosis", "Consistent": true, "Consistent_msg": "Release plan is correct", "Diagnosis": { "DiagnosisRequirements": [ "REQ1" ], "DiagnosisRelationships": []}, Problem and uses the Choco Solver [14] and FastDiag [8]. The ad- "Diagnosis_msg": "Diagnosis: remove these requirements (REQ1) AND these relationships ((none) )", ditional functionality of KeljuCaas is to form and maintain a graph ... {"Release": 1, containing all received requirements for caching purpose for large "Release_msg": "Release $1$", "RequirementsAssigned": ["REQ2"], "RequirementsAssigned_msg": "Requirements of release: REQ2", data sets. The graph can then be searched for related requirements in "AvailableCapacity": 3, "CapacityUsed": 2, "CapacityBalance": 1, "CapacityUsageCombined_msg": "Capacity: available 3h, used 2h, remaining 1h"}, a transitive closure of a single requirement for the specified depth that ... is used for visualization and analysis. Mulperi service operates as a pipe-and-filter facade component to transform data and format data The Diagnosis of Requirements would suggest removing REQ1. to KeljuCaaS and provides its answers back to the caller. 
For exam- Updated capacity calculations and resulting release assignments are ple, in a case of a small data, Mulperi can directly send data to Kelju- reported. Diagnosis of only relationships cannot succeed, because of CaaS, whereas in a case of large data such as in Qt’s Jira, Mulperi the excess capacity. differentiates functionality to send data to KeljuCaaS to construct the graph and any requested consistency check first queries this graph for 2.4 Solution Architecture and Implementation a transitive closure for desired depth that is then sent for consistency check. The reason for separating the functionality of Mulperi from We have implemented the approach as a service-based system con- KeljuCaaS is to keep inference in a more generic service. sisting of independent services8, which in practice operate in a chore- The integration services provide integration with existing require- ographic manner combining the pipe-and-filter and layered architec- ments management systems, specifically with Qt’s Jira. The key fa- tural styles (Fig. 2). The services collaborate through message-based cade and orchestrator service is called Milla. Milla imports Qt’s Jira interfaces following REST principles. issues as JSON from the Jira’s REST interface and converts them into The basic services realize the concepts described above by two Java objects. These objects are sent from Milla to Mallikas database services: KeljuCaaS and Mulperi. KeljuCaaS is a Configurator-as-a- for caching storage as well as to Mulperi for processing. Milla is also Service, whose responsibility is to provide analyses for models that able to fetch new or modified issues from Jira to keep data up to date. it receives from Mulperi. Currently, KeljuCaas provides functionality Mallikas is a simple database for storing Qt’s Jira issues as objects. described in Section 2.3 based on information described in Section It uses the H2 database engine and Java Persistence API to cache the 2.2. 
For consistency check, KeljuCaas has a procedural component data. This improves performance and avoids constant access to Jira. that checks the model for inconsistencies and reports them. These in- The user interface is provided with OpenReq Issue Link Map9 consistencies may result from dependencies between requirements (Fig. 2). The user interface shows a 2D diagram of dependencies including their assignments to releases, priority violations, or re- from the desired issue by selected depths. An issue can be searched or quirement efforts exceeding the capacity of the release. For diagno- clicked on the diagram. On the right, tabs separate basic information, sis, KeljuCaas converts the release plan into a Constraint Satisfaction
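As a rough illustration of what a diagnosis computes, the following sketch finds a minimal set of requirements to unassign by a linear reduction loop. It is much simpler than the divide-and-conquer FastDiag used by KeljuCaaS, and it replaces the CSP with a caller-supplied consistency predicate; all names are ours, and removing every requirement is assumed to be consistent:

```java
import java.util.*;
import java.util.function.Predicate;

// Linear minimal-diagnosis sketch (a stand-in for FastDiag): start
// from the trivial diagnosis "remove everything" and put requirements
// back one by one while the remaining plan stays consistent.
public class SimpleDiagnosis {
    public static Set<String> diagnose(Set<String> all,
                                       Predicate<Set<String>> consistent) {
        if (consistent.test(all)) {
            return Set.of();                 // nothing to repair
        }
        Set<String> diagnosis = new LinkedHashSet<>(all);
        for (String r : all) {
            Set<String> smaller = new LinkedHashSet<>(diagnosis);
            smaller.remove(r);
            // Requirements kept in the plan = all minus the candidate diagnosis.
            Set<String> remaining = new LinkedHashSet<>(all);
            remaining.removeAll(smaller);
            if (consistent.test(remaining)) {
                diagnosis = smaller;         // r need not be removed
            }
        }
        return diagnosis;
    }
}
```

On the example above, with a predicate forbidding REQ1 and REQ2 in the plan together, the loop returns a single-requirement diagnosis, mirroring the "remove these requirements" part of the JSON response. The real FastDiag achieves the same minimality with fewer consistency checks by splitting the candidate set recursively.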

8 EPL licensed, https://github.com/OpenReqEU
9 The demo version is available through https://openreq.eu/tools-data/

Table 2. The number of different issue types in Qt's Jira.

Project          total    bug   epic  user story   task  suggestion  change request
QTPLAYGROUND        15     11      0           0      0           4               0
QTWB                23     16      0           1      3           3               0
QTSOLBUG           193    122      0           0      8          63               0
QTSYSADM           261     16      0           0    242           2               0
QTJIRA             280    162      0           2     39          77               0
QSR                399    123      6          34    229           7               0
QDS                558    265     12          26    195          60               0
QTVSADDINBUG       629    514      0          21     14          80               0
QTWEBSITE          676    519      5           0     21         121               0
AUTOSUITE          871    330     67         159    298          17               0
PYSIDE             890    754      0          39     41          56               0
QTCOMPONENTS      1144    617      9         186    293          39               0
QTIFW             1266    931      2          12    119         202               0
QBS               1397    955      6           4    226         206               0
QTMOBILITY        1926   1538      0           0     93         149             146
QTQAINFRA         2635    915     29         120   1444         127               0
QT3DS             3292   1685     52         165   1227         163               0
QTCREATORBUG     21217  16975      3          76   1163        2979              21
QTBUG            74287  58583    223         623   6182        8636              40
Total           111959  85031    414        1468  11837       12991             207

3 Evaluation Context: The Qt Company & Jira

We demonstrate practical application of our approach in realistic settings and evaluate performance using the Jira of the Qt Company. The Qt Company is a public company with around 300 employees and headquarters in Finland. Its product, Qt10, is a software development kit that contains a software framework and its supporting tools. The software framework is targeted especially at cross-platform mobile applications, graphical user interfaces, and embedded application development. A well-known application example using Qt is the Linux KDE desktop environment, but most of today's touch screen and embedded systems with a screen use Qt.

Figure 2. A screen capture of OpenReq Issue Link Map.

3.1 Jira's Data Model at the Qt Company

All requirements and bugs of Qt are managed in Qt's Jira issue tracker11, which has been in use for over 15 years. Jira12 is a widely used issue tracker that provides many issue types and a lot of functionality, especially for individual issue management. All product planning at Qt is performed using Jira, despite attempts to integrate with roadmapping tools. Qt has configured Jira for its needs. In the following, we describe the Jira data as applied at Qt.

Jira is organized into projects consisting of issues. The issues are divided into different issue types as shown in the top row of Table 2. A bug refers basically to any deficiency found in the existing software. However, the difference between a deficiency and a new feature is not always clear. A bug report can also request new features. Epic, user story, task and suggestion each refer to new development ideas or features. Change requests are used infrequently, without a clear purpose in most projects. A task actually differentiates between a task, sub-task, and technical task, but there are no clear guidelines of use and the usage is not consistent. Thus, we do not differentiate between different task types.

The issue types define common properties as name-value pairs, customizable by issue type. The property values can be text, such as for a title and description; an enumerated value from a closed set, such as for priority; an enumerated value from an editable and extending set, such as for release numbers or users; or a date. Each issue can have comments. The change history of the issue is logged. The relevant properties in this context are priority and fix version. Priority has predefined values from P0 to P6: P0 'blocker' is the highest priority and P6 is the lowest priority. A fix version refers to the release in which the issue has been or will be completed and adheres to the Maven convention described above.

Jira has six different directed dependency types known as links: duplicate, require, replace, results, tests, relates (cf. the top row of Table 3). Only 'require' and 'duplicate' have clear semantics. The other dependency types are used non-uniformly. In addition, Jira has a decomposition (parent-child) relationship. "Issues in Epic" is used to add any type of issue other than an epic as a child to an epic. Sub-task relations are used to add tasks as children to issues other than tasks. However, the semantics is the same for all decomposition relationships even though the name differs. As a result, issues can have an up to three-level compositional hierarchy. The resulting rules regarding the dependencies are the following: all child issues which have the same or higher priority must not be assigned to a later release; any required issue must not have a later release or lower priority; and all links from a duplicated issue are inherited by the duplicate issue.

3.2 Data Quantity and Characteristics

The data in Qt's Jira is divided into public and private parts. The private part includes a couple of thousand issues of confidential customer projects and Qt's strategic product management issues. We focus here only on the public part because it is significant enough, as it contains most (roughly 98%) of the issues, and describes most technical details.

Qt Jira is divided into 19 projects (Table 2). 'QTBUG' is the main project covering the Qt framework itself and 'QTCREATORBUG' is the IDE for the framework.

10 https://www.qt.io/
11 https://bugreports.qt.io
12 https://www.atlassian.com/software/jira

Table 3. The number of different dependency types, in total and internal (pointing to an issue in the same project).

                 --------------------------- Total ---------------------------   ------------------------------ Internal ------------------------------
Project          total   part  duplicate  require  replace  results  tests  relates   total        part  duplicate  require  replace  results  tests  relates
QTPLAYGROUND         0      0          0        0        0        0      0        0   0     (0%)      0          0        0        0        0      0        0
QTWB                 9      1          6        0        0        0      0        1   2    (22%)      0          2        0        0        0      0        0
QTSOLBUG            13      0          0        2        6        0      0        4   7    (53%)      0          0        1        6        0      0        0
QTSYSADM            11      0          0        0        2        6      0        2   9    (81%)      0          0        0        1        6      0        2
QTJIRA              17      0          1        2        8        0      0        6   12   (70%)      0          0        0        7        0      0        5
QSR                364    306          1       48        0        2      0        7   333  (91%)    284          1       41        0        1      0        6
QDS                265    191          2       45        0        5      0       22   205  (77%)    172          1       19        0        1      0       12
QTVSADDINBUG        77      2         26       21        7        4      2       15   73   (94%)      1         26       21        7        2      2       14
QTWEBSITE           21      8          0        2        0        0      0        8   16   (76%)      8          0        1        0        0      0        7
AUTOSUITE          326    255          4       36        2        9      0       20   259  (79%)    203          3       27        1        6      0       19
PYSIDE             147     14         28       26        2       12      0       65   127  (86%)     13         25       23        1       10      0       55
QTCOMPONENTS       311    169          0       66       10       35      0       31   265  (85%)    169          0       29       10       32      0       25
QTIFW              248     51         34       33       60        9      0       53   140  (56%)     41         30        6       29        4      0       30
QBS                290     57         17       67       36       13      0       96   237  (81%)     50         12       50       28       10      0       87
QTMOBILITY         299    169          0       29       26       33      0       42   268  (89%)    169          0       17       19       26      0       37
QTQAINFRA         1221    566         37      384       23       51      1      152   712  (58%)    421         23      152       19       19      0       78
QT3DS             1816   1170         11      231        6      173      0      225   1705 (93%)   1170          7      189        6      159      0      174
QTCREATORBUG      4056    592        530      343     1198      221      8     1141   2975 (73%)    366        478      172      956      131      4      868
QTBUG            15366   3567       1858     3371     1280     1063     11     4152   13767 (89%)  3390       1826     2880     1009      913      6     3743
Total            24857   7118       2555     4706     2666     1636     22     6042   21112 (84%)  6457       2434     3628     2099     1320     12     5162

'QTBUG' is the main project covering the Qt framework itself, and 'QTCREATORBUG' is the IDE for the framework. These two projects are the largest but also the most relevant ones. The other projects are much smaller and some of them are even inactive. The numbers of different issue types and dependencies are shown in Tables 2 and 3, respectively. It is also noteworthy that while most dependencies are internal to a project, it is not uncommon to have dependencies between projects.

The dependencies form a set of graphs between the issues through their relationships transitively. Figure 3 illustrates the sizes of such graphs at small depths. In the data, there is one graph that contains 6755 issues as its nodes, and the greatest depth in this graph is 52 dependencies (edges). The remaining graphs are significantly smaller, the next largest ones containing 376, 164, 118, 114, and 91 issues, with depths of 29, 21, 5 and 8 dependencies, respectively. As Figure 3 illustrates, the number of issues can grow relatively quickly when depth grows. There are also small graphs: 9431 and 5488 issues participate in graphs with only one and two other issues, respectively; these graphs can include private issues. 84497 (75%) of the issues are orphans, meaning that they do not have any explicit dependency to another issue.

QTBUG has the most rigorous release cycle, which we describe as follows. In total, there are 164 releases, out of which 26 are empty releases without any issues. We did not investigate the reasons for empty releases, but it is possible that issues were moved to some other release and the release was never made. The average and median number of issues in a non-empty release are 194 and 122, respectively. Three releases have a large number of issues: 5.0.0/2142, 4.8.0/882, and 4.7.0/1571. Since 5.0.0, released in December 2012, Qt5 has already had 111 releases. Qt 6.0.0 is planned for November 2020.

Figure 3. The number of dependent issues at different depths.

4 Performance Evaluation

4.1 Approach for performance testing

We tested the end-to-end performance of our system with consistency checks and diagnosis because they concern the main functionality and are potentially computationally heavy. Performance tests for an individual issue used each issue of a project in turn as a starting-point root issue. The transitive closure at different depths (1, 2, ...) was calculated for the root issue, forming a test set. The tests were carried out for all issues at all existing depths. As the depth increased, the number of existing graphs at that depth decreased, resulting in tests being carried out on different sub-graphs of a small number of large graphs. Issues without dependencies were filtered out. A consistency check and, depending on the test, diagnosis were performed for the test set with a timeout. To limit the execution time required by testing, testing of the root item at even greater depths was not performed after the first timeout was encountered. The tests were run as Unix-like shell scripts against the system running on localhost.

The system exhibits overhead caused by the service architecture. In order to estimate the effect of the architecture and testing overhead on the response times, we carried out a consistency check for a set of 1000 issues that do not have any dependencies. The time required for the consistency check itself should be minimal.
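The depth-bounded transitive closure used to build the test sets can be computed with a plain breadth-first traversal. A sketch under the assumptions that the dependency graph is given as an adjacency mapping and that links are followed in both directions (treated as undirected here):

```python
from collections import deque

def closure_at_depth(edges: dict, root: str, depth: int) -> set:
    """Issues reachable from `root` via at most `depth` dependency edges.
    `edges` maps an issue key to the keys it is directly linked to;
    the mapping is assumed to already contain both directions."""
    seen = {root}
    frontier = deque([(root, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue                      # do not expand past the depth limit
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen
```

On a chain a-b-c-d, the closure from a grows by one issue per depth level, and an orphan issue yields a closure containing only itself.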

The response times were: average 128ms, minimum 103ms, maximum 216ms, and standard deviation 9ms.

The tests were carried out using a 64-bit Windows 10 laptop with an Intel Core i5-7200 CPU @2.5GHz and 16GB RAM. The tests were typically executed over night, but the computer was also occasionally used at the same time for office tasks such as text editing, especially when execution had not yet completed. The tests used the data retrieved from Qt Jira in May 2019.

Figure 4. Time for consistency check using 3s timeout. Max at depth 1 is probably an error.

Figure 5. Consistency check results in percentages by depth using 3s timeout. The yellow line shows the count of the executed tests at each depth.

4.2 Consistency Check of an Individual Issue

The first performance test measures consistency checks without diagnosis as described in Section 4.1 for QTCREATORBUG using a 3s timeout. The largest graph in QTCREATORBUG has the maximum depth of 48[13] and contains 6755 issues, out of which 466 are in QTCREATORBUG. We carried out 18950 consistency check tests successfully, while 7789 tests either caused a timeout or would have been scheduled for the same issue at a greater depth than the first timeout. After level 36, which contained 300 tests, all tests caused a timeout, and tests at greater depths are omitted from below.

Figure 4 exhibits the time required for the consistency check and Figure 5 the respective results of the consistency check. The lines take timeouts into account: for example, the 75% percentile line ends at depth 19, where over 25% of the test results cause a timeout and the percentile cannot be calculated anymore. The first timeouts took place at depth 14 for two items that had 4350 and 4253 issues in their graphs. In fact, the smallest graph that contained a timeout was 3816 issues. Until a depth of 5, over 60% of the test sets are consistent, but adding depth quickly decreases the share of consistent test sets.

As a comparison, the same test script was run for all Jira data using another laptop running Cubbli Linux (the Ubuntu variant of the University of Helsinki) with an Intel Core i5-8250U CPU @1.60GHz and 16GB RAM. These tests were carried out during a weekend when the computer was otherwise idle. The tests took about 25 hours. These tests used another snapshot of all data in Jira, which was about half a year old, downloaded for development and testing purposes. For example, the largest graph of maximum depth 47 in this data consisted of only 3146 issues; that is, some of the graphs were apparently combined later by new dependencies. 171498 tests were executed. No timeout occurred and the longest execution time was 2652ms.

13 The form of the graph is such that using any QTCREATORBUG node as starting point does not create the maximal depth of 52, i.e., the nodes are not at the 'outer front' of the graph.

4.3 Diagnosis and Consistency Check of an Individual Issue

The second test performed consistency checks with diagnosis as described in Section 4.1 for QTCREATORBUG. All three diagnoses are invoked in the case of an inconsistent test set. As the diagnosis is carried out only for inconsistent issue graphs, we excluded consistent graphs. The results of the execution time with respect to the number of dependencies are much worse than without diagnoses. The 3000ms timeout is quite tight, because the system performs three separate diagnoses, leaving, on average, slightly less than one second for each. The results (Fig. 6) show that the timeouts start already from depth 3. While at depth 7 only 17% resulted in a timeout, at the following depths timeouts became frequent (depth 8/65%, 9/81%, and 10/89%), and at level 18 all tests resulted in a timeout. Inspecting the graph sizes, the two smallest graphs causing a timeout were only 26 and 75 issues; this may have been caused by the computer being loaded by other use. Starting from the third smallest graph resulting in a timeout, at the size of 129 issues, timeouts become frequent, and the 30th smallest graph causing a timeout has only 149 issues.

Figure 6. Consistency check and diagnosis of project QTCREATORBUG: number of inconsistent and time-out results per depth.

4.4 Consistency Check of a Release

Performance tests for a release follow the above scenario of individual tests except that a release consists of a set of issues rather than a single issue. Therefore, a root can consist of several issues. In the current implementation, a graph of each single issue of a release was fetched, and all graphs were sent to the consistency checker. For any larger release consisting of several issues, significant overhead was caused by generating all the graphs and repeating the same data over and over. Here, we applied 10s timeouts. Consistency checks were performed for all non-empty releases of QTBUG. The five largest releases, containing over 700 issues, caused an error at the REST interface of the service due to the number of parameters given as root. Fig. 7 illustrates the results. The first timeout occurred at depth 3 with a release containing 610 requirements; it is omitted from Fig. 7 for readability.

Figure 7. Consistency check results for QTBUG's releases using 10s timeout and the smallest number of issues in the timeouting release.

5 Analysis of performance results

The performance tests were carried out using the Jira data of the Qt Company, which we consider a large, empirically valid data set. Although more complex synthetic data could be constructed, the Jira data forms a more realistic and solid base for performance testing for fast enough response times.

The dependencies form graphs of issues as transitive closures that vary in their depths and sizes. We used the depth from a selected issue as a variable to vary the sizes of the graphs. A user cannot initially know the size of the graph in the context of analysis. Thus, means to limit the size are required, and limiting the depth is a natural approach. The majority of issues remain orphans. It is noteworthy that there is only one very large graph of over 6000 issues, a few graphs of hundreds of issues, and several graphs of tens of issues. The releases merge these graphs whenever they include issues from different graphs.

For all issues of Qt, the performance is adequate for performing consistency analysis of the neighborhood of an issue interactively, practically even with any depth. The timeouts started to appear for graphs of around 4000 issues and the depth of 14. Diagnosis is more computationally heavy, but it performs quite well until depth 7, which is adequate taking into account that a small (3s) timeout was applied, three diagnoses were performed, and the number of issues is on average close to 200 already at depth 5.

Consistency checks for releases work sufficiently well until depth 6 in most of the cases. When the context of analysis is a release, the performance results are probably significantly too pessimistic: this new feature has no direct support for calculating the transitive closure. Instead, individual transitive closures of the issues of the release are calculated and finally combined. This leaves significant potential for optimization.

We applied mainly a three-second timeout in performance testing to shorten test duration with large data sets. In our view, analysis and diagnosis of a whole release justifies, also from the user's point of view, a much longer timeout value in the worst-case scenarios.

6 Discussion

Validity of this work is exposed to some threats. First, the tests were performed with all project data available; therefore, transitive closures span several projects. This has a side effect on the reliability of the results: because versions are not comparable across projects, cross-project dependencies may be consistent or inconsistent in a faulty manner, and many dependencies would be considered as not satisfied although they are satisfied, and vice versa. Therefore, we decided against reporting the number of erroneous dependencies.

Second, it is noteworthy that the results for numbers of dependencies greater than 100 are obtained for different sub-graphs of the few large graphs. Some of these sub-graphs are very similar, as the tests are done for all possible sub-graphs. In particular, our test scripts analyzed numerous sub-graphs of the largest graph with 6755 dependencies, as a different issue of the graph was selected as the root issue. A preliminary inspection did not indicate that this large graph or its sub-graphs would otherwise differ from other graphs, but this would deserve a more thorough analysis.

Third, we did not control the test environment rigorously. Especially other software running at the same time probably affected the results. Even the computer was a normal office laptop rather than a proper server computer.

Despite the above-mentioned non-trivial threats to construct validity, our view is that the big picture of the results is still valid, although some details might be incorrect. In other words, our view is that the approach performs well enough for practical use at Qt. We believe that this can be generalized to other contexts too.

Future work is required to more realistically gain benefits from the approach and the system developed. The visualization tool should be extended so that it can highlight inconsistent dependencies and also show diagnosis results graphically. In our view, these extensions will make showing diagnosis results to stakeholders much more intuitive than the current textual descriptions. Empirical studies on the benefits of the approach are best performed with this support at hand.

In addition to individual issues and releases of different depths, other contexts of analysis can be relevant. For example, small projects can potentially be relevant contexts of analysis, such as QT3DS, which has around 3300 requirements and is under active development. Similarly, a specific component or domain, such as Bluetooth or all networking, could form a context of analysis or be used as a filtering factor similarly to depth.

The visualization tool can currently visualize the neighbourhood of a requirement up to 5 levels of depth. When all 5 levels exist, the graph has on average 170 dependencies. This would suggest that 5 levels is enough. However, the minimum is 5 issues and the 10% percentile has only 21 issues. The ability to constrain the scope of the graph is important because too large graphs may not be useful for stakeholders, and smaller contexts of analysis are easy for consistency checks and perform well also with diagnosis. Instead of a fixed depth limit, it might be practical to give as parameters any desired depth and an upper bound on the number of issues to retrieve for visualization and analysis.

As the issue tracker data is manually constructed by different stakeholders, not all dependencies are marked. We are studying the detection of dependencies via natural language processing. The challenge with Qt's Jira data in many approaches is that they can propose too many dependencies, they are computationally heavy, and only the semantics of the duplication dependency is easy to detect.

In consistency checking, proposed dependencies should probably not be treated as equal to existing ones unless manually accepted by a user.

We currently consider the whole database of issues but do not take many of the issue properties into account. For example, status, resolution, creation date, and modification date could be used as filters: inconsistent dependencies among completed, very old issues may be irrelevant even if they are broken. Besides Jira, a tight integration with other trackers could be added by developing similar integration services. However, it is already possible to communicate through a JSON-based REST interface.

We focused primarily on the technical approach and its performance. The user point of view was considered only in terms of the relevant size of issue graphs. While users should be studied in more depth, the technical proposals also deserve user studies. Currently the system calculates and provides all three different diagnoses. If the user is interested in only one of them, a significant increase in performance would be achieved simply by performing only the desired diagnosis. Besides repairing a release plan by removing inconsistent requirements or relationships, future work could consider reassigning requirements to other releases. However, such decisions are at the heart of product management, and there are often aspects in product management decision making that are not easy to formalize. It may often be more important to gain an understanding of the problem than to get less-than-solid repair proposals.

7 Conclusions

This work is a contribution in the area of KBC: we assist in producing consistent, connected configurations of requirements, apply techniques of KBC to a relatively new domain, and apply our approach to a large set of real industrial data, providing evidence that the approach is viable. We identified major requirements and developed an approach that can support product management, requirements engineering and developers in practically important use cases in contexts where large, inconsistent bodies of requirements are typical. Empirical evaluation shows that the approach scales to usage in large projects, but future work for improving performance in some use cases is still required.

The approach builds on considering a body of requirements as a configuration of requirements that should be consistent, but often is not. Via neighbourhoods of different depth from a requirement or a release, we support different sizes of contexts of analysis. Contexts of a reasonable size facilitate solving identified problems.

With the support developed, developers can conveniently visualize related requirements and their dependencies; a requirements engineer can identify problematic dependencies and attempt to remedy them; and a product manager can more easily manage the consistency of a release. The performance of the tool is adequate for these tasks, except that the diagnosis of a whole release needs further work, as do modifications to the REST interface, which cannot currently accommodate releases of 700 issues or more.

Value in real use is highly plausible, but demonstration requires tighter integration with a developed visualization tool, which would enable experiments with real users. Our work can be seen as (a continuation of) extending Knowledge Based Configuration to requirements engineering and product management.

ACKNOWLEDGEMENTS

This work is a part of the OpenReq project that is funded by the European Union's Horizon 2020 Research and Innovation programme under grant agreement No 732463. We thank Elina Kettunen, Miia Rämö and Tomi Laurinen for their contributions to the implementation.

REFERENCES

[1] Philip Achimugu, Ali Selamat, Roliana Ibrahim, and Mohd Naz'ri Mahrin, 'A systematic literature review of software requirements prioritization research', Information and Software Technology, 56(6), 568–585, (2014).
[2] David Ameller, Carles Farré, Xavier Franch, and Guillem Rufian, 'A survey on software release planning models', in 17th International Conference on Product-Focused Software Process Improvement (PROFES), pp. 48–65, (2016).
[3] P. Carlshamre, K. Sandahl, M. Lindvall, B. Regnell, and J. Natt och Dag, 'An industrial survey of requirements interdependencies in software product release planning', in Proceedings Fifth IEEE International Symposium on Requirements Engineering, pp. 84–91, (2001).
[4] John Wilmar Castro Llanos and Silvia Teresita Acuña Castillo, 'Differences between traditional and open source development activities', in Product-Focused Software Process Improvement, pp. 131–144, (2012).
[5] Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran, and Aditya Ghose, 'Predicting the delay of issues with due dates in software projects', Empirical Software Engineering, 22(3), 1223–1263, (Jun 2017).
[6] Åsa G. Dahlstedt and Anne Persson, Engineering and Managing Software Requirements, chapter Requirements Interdependencies: State of the Art and Future Challenges, 95–116, Springer, 2005.
[7] Maya Daneva and Andrea Herrmann, 'Requirements prioritization based on benefit and cost prediction: A method classification framework', in 34th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pp. 240–247, (2008).
[8] A. Felfernig, M. Schubert, and C. Zehentner, 'An efficient diagnosis algorithm for inconsistent constraint sets', Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 26(01), 53–62, (2011).
[9] Alexander Felfernig, Johannes Spöcklberger, Ralph Samer, Martin Stettinger, Müslüm Atas, Juha Tiihonen, and Mikko Raatikainen, 'Configuring release plans', in Proceedings of the 20th Configuration Workshop, Graz, Austria, September 27–28, 2018, pp. 9–14, (2018).
[10] S. Gregor, 'The nature of theory in information systems', MIS Quarterly, 30(3), 611–642, (2006).
[11] Laura Lehtola, Marjo Kauppinen, and Sari Kujala, 'Requirements prioritization challenges in practice', in 5th International Conference on Product Focused Software Process Improvement (PROFES), pp. 497–508, (2004).
[12] Clara Marie Lüders, Mikko Raatikainen, Joaquim Motger, and Walid Maalej, 'Openreq issue link map: A tool to visualize issue links in jira', in IEEE Requirements Engineering Conference, (2019 (submitted)).
[13] Klaus Pohl, Process-centered requirements engineering, John Wiley & Sons, Inc., 1996.
[14] Charles Prud'homme, Jean-Guillaume Fages, and Xavier Lorca, Choco Documentation, TASC, INRIA Rennes, LINA CNRS UMR 6241, COSLING S.A.S., www.choco-solver.org, 2016.
[15] Carme Quer, Xavier Franch, Cristina Palomares, Andreas Falkner, Alexander Felfernig, Davide Fucci, Walid Maalej, Jennifer Nerlich, Mikko Raatikainen, Gottfried Schenner, Martin Stettinger, and Juha Tiihonen, 'Reconciling practice and rigour in ontology-based heterogeneous information systems construction', in The Practice of Enterprise Modeling, pp. 205–220, (2018).
[16] Mikael Svahnberg, Tony Gorschek, Robert Feldt, Richard Torkar, Saad Bin Saleem, and Muhammad Usman Shafique, 'A systematic review on strategic release planning models', Information and Software Technology, 52(3), 237–248, (2010).
[17] R. Thakurta, 'Understanding requirement prioritization artifacts: a systematic mapping study', Requirements Engineering, 22(4), 491–526, (2017).
[18] A. Vogelsang and S. Fuhrmann, 'Why feature dependencies challenge the requirements engineering of automotive systems: An empirical study', in 21st IEEE International Requirements Engineering Conference (RE), pp. 267–272, (2013).
[19] Roel J. Wieringa, Design Science Methodology for Information Systems and Software Engineering, Springer, 2014.
[20] H. Zhang, J. Li, L. Zhu, R. Jeffery, Y. Liu, Q. Wang, and M. Li, 'Investigating dependencies in software requirements for change propagation analysis', Information and Software Technology, 56(1), 40–53, (2014).

Consistency-based Merging of Variability Models

Mathias Uta1 and Alexander Felfernig2 and Gottfried Schenner3 and Johannes Spöcklberger2

Abstract. Globally operating enterprises selling large and complex products and services often have to deal with situations where variability models are locally developed to take into account the requirements of local markets. For example, cars sold on the U.S. market are represented by variability models in some or many aspects different from European ones. In order to support global variability management processes, variability models and the underlying knowledge bases often need to be integrated. This is a challenging task, since an integrated knowledge base should not produce results which are different from those produced by the individual knowledge bases. In this paper, we introduce an approach to variability model integration that is based on the concepts of contextual modeling and conflict detection. We present the underlying concepts and the results of a corresponding performance analysis.

1 Introduction

Configuration [7, 14] is one of the most successful applications of Artificial Intelligence technologies, applied in domains such as telecommunication switches, financial services, furniture, and software components. In many cases, configuration knowledge bases are represented in terms of variability models such as feature models that provide an intuitive way of representing variability properties of complex systems [10, 4]. Starting with rule-based approaches, formalizations of variability models have been transformed into model-based knowledge representations which are more applicable for the handling of large and complex knowledge bases, for example, in terms of knowledge base maintainability and expressivity of complex constraints [2, 7]. Examples of model-based knowledge representations are constraint-based representations [15], description logic, and answer set programming (ASP) [7]. Besides variability reasoning for single users, latest research also shows how to deal with scenarios where groups of users are completing a configuration task [6]. In this paper, we focus on single-user scenarios where variability models are represented as a constraint satisfaction problem (CSP) [3, 15].

There exist a couple of approaches dealing with the issue of integrating knowledge bases. First, knowledge base alignment is the process of identifying relationships between concepts in different knowledge bases, for example, classes that describe the same concept but have different class names (and/or attribute names). Approaches supporting the alignment of knowledge bases are relevant in scenarios where numerous and large knowledge bases have to be integrated (see, for example, [9]). Ardissono et al. [1] introduce an approach to distributed configuration where individual knowledge bases are integrated into a distributed configuration process in which individual configurators are responsible for configuring individual parts of a complex product or service. The underlying assumption is that the individual knowledge bases are consistent and that there are no (or only a low number of) dependencies between the given knowledge bases. The merging of knowledge bases is related to the task of applying various merging operators to different belief sets [5, 11]. For example, Delgrande and Schaub [5] introduce a consistency-based merging approach where the result of a merging process is a maximum consistent set of logical formulas representing the union of the individual knowledge bases. In the line of existing consistency-based analysis approaches, the resulting knowledge bases represent a logical union of the original knowledge bases that omits minimal sets of logical sentences inducing an inconsistency [12]. Contextual modeling [8] is related to the task of decentralizing variability knowledge related development and maintenance tasks.

Approaches to merging feature models represented on a graphical level on the basis of merging rules have been introduced, for example, in [16, 13]. In this context, feature models including specific constraint types, such as requires and excludes, are merged in a semantics-preserving fashion. Compared to our approach, the merging of variability models introduced in [16, 13] is restricted to specific constraint types and does not take redundancy into account. Our approach provides a generalization of existing approaches, especially due to the generalization to arbitrary constraint types and to redundancy-free knowledge bases as a result of the merge operation. We propose an approach to the merging of variability models (represented as constraint satisfaction problems) which guarantees semantics preservation, i.e., the union of the solutions determined by the individual constraint solvers (configurators) is equivalent to the solution space of the integrated variability model (knowledge base). In this context, we assume that the knowledge bases to be integrated (1) are consistent and (2) use the same variable names for representing individual item properties (knowledge base alignment issues are beyond the scope of this paper).

The contributions of this paper are the following. (1) We provide a short analysis of existing approaches to knowledge base integration and point out specific properties of variability model integration scenarios that require alternative approaches. (2) We introduce a new approach to variability knowledge integration which is based on the concepts of contextualization and conflict detection. (3) We show the applicability of our approach on the basis of a performance analysis.

The remainder of this paper is organized as follows. First, we introduce a working example from the automotive domain (see Section 2). On the basis of this example, we introduce our approach to variability model integration (merging) in Section 3. In Section 4, we present a performance evaluation. Section 5 includes a discussion of

1 Siemens Erlangen, Germany, email: [email protected]
2 Graz University of Technology, Austria, email: [email protected], [email protected]
3 Siemens AG, Austria, email: [email protected]

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

9 threats to validity of the presented merging approach. The paper is on the basis of predefined contextualization variables. For example: concluded in Section 6 with a discussion of issues for future work. assuming a context variable country(US,GER), each constraint c[i]us of the US knowledge base is contextualized with (transformed into) country = US → (c ) c : fuel 6= hybrid 2 Example Variability Models [i]us . Constraint 1us would be translated into c1us0 : country = US → (fuel 6= In the following, we introduce a working example which will serve hybrid). CKBus and CKBger have been transformed into their 0 0 0 as a basis for the discussion of our approach to knowledge integration contextualized variants CKBus and CKBger where CKBus ∪ 0 0 (Section 3). Let us assume the existence of two different variability CKBger = CKB . models. For the purpose of our example, we introduce two car con- 0 figuration knowledge bases represented as a constraint satisfaction • CKBus: {country(US), type(combi, limo, city, suv), color(white, problem. One car configuration knowledge base is assumed to be black), engine(1l, 1.5l, 2l), couplingdev(yes,no), fuel(electro, 0 defined for the U.S. market and one for the German market. For sim- diesel, gas, hybrid), service(15k, 20k, 25k), c1us : country = 0 plicity, we assume that (1) both knowledge bases are represented as US → (fuel 6= hybrid), c2us : country = US → (fuel = a constraint satisfaction problem (CSP) [15] and (2) that both knowl- electro → couplingdev = no), c3us : country = US → edge bases operate on the same set of variables and corresponding (fuel = diesel → color = black)} 0 domain definitions.4 Our two knowledge bases consisting of variable • CKBger: {country(GER), type(combi, limo, city, suv), color(white, black), engine(1l, 1.5l, 2l), couplingdev(yes,no), definitions and corresponding constraints {CKBus, CKBger} are the following. 
fuel(electro, diesel, gas, hybrid), service(15k, 20k, 25k), 0 0 c1ger : country = GER → (fuel 6= gas), c2ger :

• CKBus: {country(US), type(combi, limo, city, suv), color(white, black), engine(1l, 1.5l, 2l), couplingdev(yes, no), fuel(electro, diesel, gas, hybrid), service(15k, 20k, 25k), c1us: fuel ≠ hybrid, c2us: fuel = electro → couplingdev = no, c3us: fuel = diesel → color = black}

• CKBger: {country(GER), type(combi, limo, city, suv), color(white, black), engine(1l, 1.5l, 2l), couplingdev(yes, no), fuel(electro, diesel, gas, hybrid), service(15k, 20k, 25k), c1ger: fuel ≠ gas, c2ger: fuel = electro → couplingdev = no, c3ger: fuel = diesel → type ≠ city}

In these knowledge bases, we denote the variable country as a contextual variable since it is used to specify the country a configuration belongs to but is not directly associated with a specific component of the car. Table 1 summarizes the solution spaces (in terms of the number of potential solutions) associated with the country-specific knowledge bases CKBus and CKBger. For simplicity, we kept the number of constraints the same in both knowledge bases; however, the integration concepts introduced in Section 3 are also applicable to knowledge bases with differing numbers of constraints.

Table 1. Solution spaces of individual knowledge bases.

Knowledge base   #constraints   #solutions
CKBus            3              288
CKBger           3              324

3 Merging Variability Models

In this section, we introduce our approach to merging variability models represented as constraint satisfaction problems (CSPs) [15]. Our approach is based on the assumption that the constraints of the two original knowledge bases CKB1 and CKB2 are contextualized, i.e., each constraint of knowledge base CKB1 gets contextualized.4 For our example, the contextualized knowledge bases CKB'us and CKB'ger result from prefixing each constraint of CKBus and CKBger with its country context, e.g., c2'ger: country = GER → (fuel = electro → couplingdev = no) and c3'ger: country = GER → (fuel = diesel → type ≠ city).

The solution spaces of the contextualized knowledge bases CKB'us and CKB'ger are shown in Table 2. They have the same solution spaces as CKBus and CKBger.

Table 2. Solution spaces when merging knowledge bases.

Knowledge base             #solutions
CKB'us                     288
CKB'ger                    324
CKB' = CKB'us ∪ CKB'ger    612
CKB'us ∩ CKB'ger           126

On the basis of such a contextualization, we are able to preserve the consistency and semantics of the two original knowledge bases in the sense that (1) the solution space of CKB1 is equivalent to the solution space of CKB'1, (2) the solution space of CKB2 is equivalent to the solution space of CKB'2, and (3) the union of the solution spaces of CKB1 and CKB2 is equivalent to the solution space of CKB'1 ∪ CKB'2 = CKB'.

Based on this representation, we are able to (1) get rid of contextualizations that are not needed in the integrated version of the two original configuration knowledge bases (see Line 7 of Algorithm 1) and (2) delete redundant constraints (see Line 15 of Algorithm 1). In Line 7 it is checked whether a contextualization is needed for the constraint c (c is the decontextualized version of c'). If the negation of c is consistent with the union of the contextualized knowledge bases, solutions exist that support ¬c; consequently, c must remain contextualized. Otherwise, the contextualization is not needed and c is added to the resulting knowledge base – with this, it replaces c', i.e., the corresponding contextualized constraint. Each constraint in the resulting knowledge base CKB (the decontextualized knowledge base) is thereafter checked with regard to redundancy (see Line 15). A constraint c is regarded as redundant if CKB − {c} is inconsistent with ¬c. In this case, c does not reduce the search space and can thus be deleted from CKB – it is redundant with regard to CKB.

4 We are aware of the fact that this assumption does not hold for real-world scenarios in general. However, we consider tasks of concept matching as an upstream task which we do not take into account when integrating knowledge bases on a formal level.

Algorithm 1 CKB-MERGE(CKB'1, CKB'2): CKB
1: {CKB'1,2: two contextualized and consistent configuration knowledge bases}
2: {c': a contextualized version of constraint c}
3: {CKB: knowledge base resulting from merge operation}
4: CKB ← ∅;
5: CKB' ← CKB'1 ∪ CKB'2;
6: for all c' ∈ CKB' do
7:   if inconsistent({¬c} ∪ CKB' ∪ CKB) then
8:     CKB ← CKB ∪ {c};
9:   else
10:    CKB ← CKB ∪ {c'};
11:  end if
12:  CKB' ← CKB' − {c'};
13: end for
14: for all c ∈ CKB do
15:   if inconsistent((CKB − {c}) ∪ {¬c}) then
16:     CKB ← CKB − {c};
17:   end if
18: end for
19: return CKB;

The knowledge base CKB resulting from applying Algorithm 1 to the individual knowledge bases CKB'us and CKB'ger looks as follows. In CKB, constraint c2us is represented in a decontextualized fashion since the context information is not needed. Furthermore, constraint c2'ger has been deleted since it is redundant.

• CKB: {country(US, GER), type(combi, limo, city, suv), color(white, black), engine(1l, 1.5l, 2l), couplingdev(yes, no), fuel(electro, diesel, gas, hybrid), service(15k, 20k, 25k), c1'us: country = US → (fuel ≠ hybrid), c2us: fuel = electro → couplingdev = no, c3'us: country = US → (fuel = diesel → color = black), c1'ger: country = GER → (fuel ≠ gas), c3'ger: country = GER → (fuel = diesel → type ≠ city)}

4 Performance Evaluation

In this section, we discuss the results of an initial analysis we have conducted to evaluate CKB-MERGE (Algorithm 1). For this analysis, we applied 10 different synthesized variability models CKB' (CKB' = CKB'1 ∪ CKB'2), represented as constraint satisfaction problems [15], that differ in the number of constraints (#constraints) and the degree of contextualization (expressed as percentages in Tables 3 and 4). In order to take into account deviations in time measurements, we repeated each experimental setting 10 times, where in each repetition cycle the constraints in the individual (contextualized) knowledge bases CKB' were ordered randomly.

The number of consistency checks needed for decontextualization is linear in the number of constraints in CKB'. A performance evaluation of CKB-MERGE with different knowledge base sizes and degrees of contextualized constraints in CKB is depicted in Table 3. In CKB-MERGE, the runtime (measured in terms of the milliseconds needed by the constraint solver to find a solution)5 increases with the number of constraints in CKB' and decreases with the number of contextualized constraints in CKB. The increase in efficiency can be explained by the fact that a higher degree of contextualization includes more situations where the inconsistency check in Line 7 (Algorithm 1) terminates earlier (a solution has been found) compared to situations where no solution could be found. In addition, Table 4 indicates that the performance of solution search does not differ depending on the degree of contextualization in the resulting knowledge base CKB.

Consequently, integrating individual variability models can trigger the following improvements. (1) De-contextualization in CKB can lead to less cognitive effort when adapting / extending knowledge bases (due to a potentially lower number of constraints and a lower degree of contextualization). (2) Reducing the overall number of constraints in CKB can also improve the runtime performance of the resulting integrated knowledge base.

Table 3. Avg. runtime (msec) of CKB-MERGE measured with different knowledge base sizes (CKB') and shares of contextualized constraints in CKB (10-50% contextualization).

#constraints   10%     20%     30%     40%     50%
10             749     219     195     118     97
20             559     653     666     679     487
30             1541    813     644     588     664
40             1888    1541    1345    1177    1182
50             3773    3324    3027    3171    2643
60             5376    4458    4425    3304    3056
70             7300    6912    7362    5619    4896
80             10795   8793    7580    6821    5909
90             13365   11770   10103   8916    7831
100            15992   14443   14679   12417   11066

Table 4. Avg. runtime (msec) of the merged configuration knowledge bases (CKB) measured with different knowledge base sizes (CKB') and shares of contextualized constraints in CKB (10-50% contextualization).

#constraints   10%    20%    30%    40%    50%
10             244    159    203    167    274
20             305    230    250    362    271
30             310    378    251    426    243
40             425    453    522    502    563
50             500    640    603    637    657
60             881    728    899    801    698
70             830    778    802    888    876
80             917    1054   1011   848    1030
90             1017   1117   1042   960    667
100            1387   1363   1297   1297   1308

5 Threats to Validity

The main threat to (external) validity is the overall representativeness of the knowledge bases used for evaluating the performance of CKB-MERGE. The current evaluation is based on a set of synthesized knowledge bases which do not directly reflect real-world variability models. We want to point out that the major focus of our work is to provide an algorithmic solution that allows semantics-preserving knowledge integration, which is a new approach and regarded as the major contribution of our work. The application of CKB-MERGE to real-world variability models, i.e., not synthesized ones, is the focus of our future work.

5 For the purposes of our evaluation we generated variability models represented as constraint satisfaction problems formulated using the CHOCO constraint solver – www.choco-solver.org.
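To make the merge step concrete, the following is a minimal Python sketch of Algorithm 1 (CKB-MERGE) on the running car example. A brute-force enumeration stands in for the constraint solver, and all helper names are ours, not the authors' implementation:

```python
from itertools import product

# Domains of the running car example; country is the contextual variable.
DOMAINS = {
    "country": ["US", "GER"], "type": ["combi", "limo", "city", "suv"],
    "color": ["white", "black"], "engine": ["1l", "1.5l", "2l"],
    "couplingdev": ["yes", "no"],
    "fuel": ["electro", "diesel", "gas", "hybrid"],
    "service": ["15k", "20k", "25k"],
}

def inconsistent(constraints):
    # Brute-force stand-in for the constraint solver: no assignment
    # satisfies all constraints.
    names = list(DOMAINS)
    return not any(
        all(c(dict(zip(names, vals))) for c in constraints)
        for vals in product(*DOMAINS.values())
    )

def negate(c):
    return lambda a: not c(a)

def ctx(country, core):
    # Contextualize a constraint: country = X -> core.
    return lambda a: a["country"] != country or core(a)

# (name, context, decontextualized core c); c' = ctx(context, c).
CORES = [
    ("c1us", "US", lambda a: a["fuel"] != "hybrid"),
    ("c2us", "US", lambda a: a["fuel"] != "electro" or a["couplingdev"] == "no"),
    ("c3us", "US", lambda a: a["fuel"] != "diesel" or a["color"] == "black"),
    ("c1ger", "GER", lambda a: a["fuel"] != "gas"),
    ("c2ger", "GER", lambda a: a["fuel"] != "electro" or a["couplingdev"] == "no"),
    ("c3ger", "GER", lambda a: a["fuel"] != "diesel" or a["type"] != "city"),
]

def ckb_merge(cores):
    ckb = []  # entries: (name, constraint, decontextualized?)
    ckb_prime = [(n, ctx(co, c)) for n, co, c in cores]   # CKB'1 ∪ CKB'2
    for name, country, core in cores:
        # Line 7: if ¬c contradicts CKB' ∪ CKB, the context is not needed.
        pool = [cp for _, cp in ckb_prime] + [c for _, c, _ in ckb]
        if inconsistent([negate(core)] + pool):
            ckb.append((name, core, True))                # decontextualized
        else:
            ckb.append((name, ctx(country, core), False)) # keep context
        ckb_prime = [(n, cp) for n, cp in ckb_prime if n != name]  # Line 12
    for name, c, _ in list(ckb):                          # Line 15: redundancy
        rest = [d for m, d, _ in ckb if m != name]
        if inconsistent(rest + [negate(c)]):
            ckb = [e for e in ckb if e[0] != name]
    return ckb

merged = ckb_merge(CORES)
print(len(merged), sum(1 for _, _, d in merged if d))  # 5 1
```

On this input, the sketch reproduces the result discussed above: five constraints remain, with exactly one decontextualized copy of the coupling-device constraint surviving the redundancy check.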

6 Conclusions and Future Work

In this paper, we have introduced an approach to the consistency-based merging of variability models represented as constraint satisfaction problems. The approach helps to build semantics-preserving knowledge bases in the sense that the solution space of the resulting knowledge base (the result of the merging process) corresponds to the union of the solution spaces of the original knowledge bases. Besides the preservation of the original semantics, our approach also helps to make the resulting knowledge base compact by deleting redundant constraints and unneeded contextual information. The performance of our approach is shown on the basis of a first performance analysis with synthesized configuration knowledge bases. Future work will include the evaluation of our concepts with more complex knowledge bases and the development of alternative merge algorithms with the goal of further improving runtime performance.

REFERENCES

[1] L. Ardissono, A. Felfernig, G. Friedrich, A. Goy, D. Jannach, G. Petrone, R. Schaefer, and M. Zanker, 'A framework for the development of personalized, distributed web-based configuration systems', AI Magazine, 24(3), 93–110, (2003).
[2] D. Benavides, S. Segura, and A. Ruiz-Cortes, 'Automated analysis of feature models 20 years later: A literature review', Information Systems, 35(6), 615–636, (2010).
[3] D. Benavides, P. Trinidad, and A. Ruiz-Cortes, 'Using constraint programming to reason on feature models', in 17th International Conference on Software Engineering and Knowledge Engineering (SEKE'2005), pp. 677–682, Taipei, Taiwan, (2005).
[4] K. Czarnecki, S. Helsen, and U. Eisenecker, 'Formalizing cardinality-based feature models and their specialization', Software Process: Improvement and Practice, 10(1), 7–29, (2005).
[5] J. Delgrande and T. Schaub, 'A consistency-based framework for merging knowledge bases', Journal of Applied Logic, 5(3), 459–477, (2007).
[6] A. Felfernig, L. Boratto, M. Stettinger, and M. Tkalcic, Group Recommender Systems – An Introduction, Springer, 2018.
[7] A. Felfernig, L. Hotz, C. Bagley, and J. Tiihonen, Knowledge-based Configuration: From Research to Business Cases, Morgan Kaufmann Publishers, 1st edn., 2014.
[8] A. Felfernig, D. Jannach, and M. Zanker, 'Contextual diagrams as structuring mechanisms for designing configuration knowledge bases in UML', in 3rd International Conference on the Unified Modeling Language (UML2000), volume 1939 of Lecture Notes in Computer Science, pp. 240–254, York, UK, (2000). Springer.
[9] L. Galarraga, N. Preda, and F. Suchanek, 'Mining rules to align knowledge bases', in Proceedings of the 2013 Workshop on Automated Knowledge Base Construction, pp. 43–48, San Francisco, CA, (2013).
[10] K. Kang, S. Cohen, J. Hess, W. Novak, and S. Peterson, 'Feature-Oriented Domain Analysis (FODA) feasibility study', Technical Report CMU/SEI-90-TR-021, (1990).
[11] P. Liberatore and M. Schaerf, 'Arbitration (or how to merge knowledge bases)', IEEE Transactions on Knowledge and Data Engineering, 10(1), 76–90, (1998).
[12] R. Reiter, 'A theory of diagnosis from first principles', AI Journal, 32(1), 57–95, (1987).
[13] S. Segura, D. Benavides, A. Ruiz-Cortes, and P. Trinidad, 'Automated merging of feature models using graph transformations', in Generative and Transformational Techniques in Software Engineering, number 5235 in Springer Lecture Notes in Computer Science, pp. 489–505, (2007).
[14] M. Stumptner, 'An overview of knowledge-based configuration', AI Communications, 10(2), 111–125, (1997).
[15] E. Tsang, Foundations of Constraint Satisfaction, Academic Press, London, 1993.
[16] P. van den Broek, I. Galvao, and J. Noppen, 'Merging feature models', in 15th International Software Product Line Conference, pp. 83–90, Jeju Island, South Korea, (2010).

Conversational Recommendations Using Model-based Reasoning

Oliver A. Tazl and Alexander Perko and Franz Wotawa1

Abstract. Chatbots as conversational recommenders have gained increasing importance over the years. The chatbot market offers a variety of applications for research and industry alike. In this paper, we discuss an implementation that supports the use of our recommendation algorithm during chatbot communication. The program eases communication and improves the underlying recommendation flow. In particular, the implementation makes use of our model-based reasoning approach for improving user experience during a chat, i.e., in cases where user configurations cause inconsistencies. The approach deals with such issues by removing inconsistencies in order to generate a valid recommendation. In addition to the underlying definitions, we demonstrate our implementation along use cases from the tourism domain.

1 INTRODUCTION

Recommender systems aim to lead users in a helpful and individualized way to interesting or useful items drawn from a large space of possible options. Recommender systems may utilize knowledge bases for guiding the users through the whole process of finding the right recommendation, i.e., a recommendation that satisfies the user's requirements, needs, or expectations. Most recently, conversational agents like chatbots have gained importance because they – in principle – offer a well-known and ideally more intuitive interface for human users, i.e., either textual or speech interaction.

In previous work [17] we introduced the basic foundations and principles behind a chatbot-based recommender system that interacts with users in a smart way, being capable of finding contradictions during observation and efficiently pruning the overall recommendation process by selecting the right questions to be asked to the user. The basic principles behind our approach rely on classical model-based reasoning. In case the conversation leads to an inconsistent state, e.g., caused by contradictions between user requirements and the recommendation knowledge base, the chatbot is able to react and to resolve this issue. For this purpose, the chatbot asks the user which requirements to retract in order to eliminate inconsistencies. In the case where the chatbot has far too many solutions to be presented effectively to the user, the system makes use of an entropy-based approach for selecting those requirements or attributes that have to be fixed in order to reduce the number of possible solutions. When using entropy, the number of steps necessary to reach a solution can be substantially reduced.

This paper is a direct successor of our previous work, in which we report on an implementation of our chatbot approach. In particular, we discuss the implementation details, present experiences gained, and finally introduce the results of an evaluation of the implementation. The evaluation is based on a case study from the tourism domain, i.e., a scenario where a user wants to book a hotel in a certain city. The obtained results show that the proposed chatbot approach is applicable and beneficial for the intended purpose. Furthermore, we gained experience regarding the limitations of the approach. For example, it seems that entropy is not always the best measure for selecting questions to be answered, and further research is needed.

The main contributions of this paper can be summarized as follows:

1. An implementation of an algorithm that is based on model-based diagnosis and Shannon's information entropy to solve recommendation problems, and
2. the evaluation of the system with synthetic and real-world data sets.

The remainder of this paper is organized as follows: In the next section we give an overview of our algorithmic approach. Afterwards, we present the implementation of the algorithms and show the evaluation results in greater detail. Finally, we discuss related research and conclude the paper.

2 FOUNDATIONS AND ALGORITHM

In our previous work [17], we introduced the algorithm EntRecom, which utilizes model-based diagnosis – in particular the ConDiag algorithm [18] – and a method that applies Shannon's information entropy [23]. To be self-contained, we briefly recapitulate the underlying definitions and EntRecom. We first formalize the inconsistent requirements problem by exploiting the concepts of Model-Based Diagnosis (MBD) [1, 20] and constraint solving [2].

The inconsistent requirements problem requires information on the item catalog (i.e., the knowledge base of the recommendation system) and the current customer's requirements. Note that the knowledge base of the recommender may be consistent with the customer's requirements (i.e., the customer's query), in which case an appropriate number of recommendations can be offered. The recommendation system then shows the recommendations to the customer and no further algorithms have to be applied. Otherwise, if no solutions to the recommendation problem are available, then the minimal set of requirements which determined the inconsistency with the knowledge base has to be identified and consequently offered to the user as an explanation for not finding any recommendation. The user can in this case adapt the requirement(s) (relax it/them). Here we borrow the idea from MBD and introduce abnormal modes for the given requirements, i.e., we use Ab predicates stating whether a requirement i should be assumed valid (¬Abi) or not (Abi) in a particular context.

1 Graz University of Technology, Austria, email: {oliver.tazl, perko, wotawa}@ist.tugraz.at
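As an illustration of the Ab-based formulation just described, consider a toy item catalog and three customer requirements; the Ab flags decide which requirements are enforced in the consistency check. All data and names here are invented for illustration:

```python
# Toy instance of the inconsistent requirements problem: an item
# catalog (the recommender knowledge base) plus customer requirements
# guarded by Ab flags. Data and names are illustrative only.
CATALOG = [
    {"name": "Hotel A", "stars": 4, "pool": True,  "price": 120},
    {"name": "Hotel B", "stars": 3, "pool": False, "price": 80},
    {"name": "Hotel C", "stars": 5, "pool": True,  "price": 200},
]

REQ = {
    "r1": lambda item: item["stars"] >= 4,
    "r2": lambda item: item["pool"],
    "r3": lambda item: item["price"] <= 100,
}

def consistent(ab):
    # The model is satisfiable iff some catalog item meets every
    # requirement assumed valid (Ab_R = False).
    return any(
        all(req(item) for name, req in REQ.items() if not ab[name])
        for item in CATALOG
    )

# Enforcing all three requirements is inconsistent with the catalog ...
print(consistent({"r1": False, "r2": False, "r3": False}))  # False
# ... while retracting r3 (Ab_r3 = True) restores consistency.
print(consistent({"r1": False, "r2": False, "r3": True}))   # True
```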

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

The Ab values for the requirements are set by the model-based diagnosis algorithm so that the assumptions together with the requirements and the knowledge base are consistent. In the following, we define the inconsistent requirements problem and its solutions. More formally, we state the inconsistent requirements problem as follows:

Definition 1 (Inconsistent Requirements Problem). Given a tuple (KB, REQ) where KB denotes the knowledge base of the recommender system, i.e., the item catalog, and REQ denotes the customer requirements. The Inconsistent Requirements Problem arises when KB together with REQ is inconsistent. In this case we are interested in identifying those requirements that are responsible for the inconsistency.

A solution or explanation to the inconsistent requirements problem can be easily formalized using the analogy with the definition of diagnosis from Reiter [20]. We first introduce a modified representation of (KB, REQ) comprising (KBD, REQ), where KBD comprises KB together with rules of the form AbR for each requirement R in REQ. The solution to the Inconsistent Requirements Problem can now be defined using the modified representation as follows:

Definition 2 (Inconsistent Requirements). Given a modified recommendation model (KBD, REQ). A subset Γ ⊆ REQ is a valid set of inconsistent requirements iff KBD ∪ {¬AbR | R ∈ REQ \ Γ} ∪ {AbR | R ∈ Γ} is satisfiable.

A set of inconsistent requirements Γ is minimal iff no other set of inconsistent requirements Γ' ⊂ Γ exists. A set of inconsistent requirements Γ is minimal with respect to cardinality iff no other set of inconsistent requirements Γ' with |Γ'| < |Γ| exists. From here on we assume minimal cardinality sets when using the term minimal sets.

The second problem occurring during a recommendation session is the availability of a too large number of recommendations, which has to be narrowed down to a reasonable number. The too-many-recommendations problem can again be solved using ideas borrowed from model-based diagnosis. In diagnosis, we have the similar problem of coming up with too many diagnoses because too few observations are known. The corresponding problem in the case of recommendation is that we have far too few requirements from the user. Hence, we have to ask the user to add more information in order to reduce the number of available solutions. In model-based diagnosis, Shannon's information entropy [23] is used to come up with observations – and in our case requirements – that should be known in order to reduce the recommendations as fast as possible.

Algorithm 1 provides recommendations in the context of chatbots, making use of diagnosis and Shannon's information entropy computation. EntRecom converts the available knowledge into a corresponding constraint model and checks its consistency. If the knowledge is inconsistent, the algorithm tries to find requirements that can be retracted by the user in order to get rid of the inconsistency. Afterwards, EntRecom searches for the best requirement to be set by the user in order to reduce the number of solutions if necessary. The algorithm stops when reaching a set of recommendations that has a cardinality of less than n.

With the provided algorithms, a chatbot for recommendations can be built that is able to deal with inconsistent requirements as well as missing requirements in a more or less straightforward way, making use of previously invented algorithms.

Algorithm 1 EntRecom(KBD, REQ, n)
Input: A modified knowledge base KBD, a set of customer requirements REQ and the maximum number of recommendations n
Output: All recommendations S
1: Generate the constraint model CM from KBD and REQ
2: Call CSolver(CM) to check consistency and store the result in S
3: if S = ∅ then
4:   Call MiREQ(CM, |REQ|) and store the inconsistent requirements in IncReqs
5:   Call askUser(IncReqs) and store the answer in AdaptedReqs
6:   CM = KB ∪ (REQ \ IncReqs ∪ AdaptedReqs)
7:   go to Step 2
8: end if
9: while |S| > n do
10:  Call GetBestEntrAttr(AS) and store the result in a
11:  AS = AS \ a
12:  Call askUser(a) and store the answer in va
13:  S = R(S, va)
14: end while
15: return S

3 IMPLEMENTATION AND EVALUATION

For the user, the interaction with the recommender system within the chatbot starts when he or she formulates a query for a search within the domain. The natural language processing framework Rasa2 parses this query and passes it on to the EntRecom algorithm. The recommender algorithm then searches a previously constructed and preprocessed knowledge base. Depending on the satisfiability of the generated constraint model, results are presented to the user. If necessary, the user is asked follow-up questions regarding his or her requirements until we have results to return to the user.

2 see https://rasa.com/

Figure 1: A Chatbot-Based Recommender as Bridge between User Input and Results from a Database

When the query can be answered, EntRecom exits and the user can continue interacting with the Rasa-based chatbot. As mentioned before, we chose a tourism domain – searching for a hotel, to be more specific. The process of searching for a hotel in an iterative conversation helps us to see the capabilities and shortcomings of the algorithm.
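The control flow of EntRecom (Algorithm 1) can be sketched with a list-of-dicts knowledge base, dictionary matching in place of the constraint solver, and scripted answers standing in for askUser; the data and helper names are ours:

```python
from math import log2

# Minimal sketch of the EntRecom control loop; data and names are ours.
KB = [
    {"city": "Graz", "stars": "4", "pool": "yes"},
    {"city": "Graz", "stars": "3", "pool": "no"},
    {"city": "Graz", "stars": "5", "pool": "no"},
    {"city": "Wien", "stars": "5", "pool": "yes"},
]

def matches(item, req):
    return all(item.get(k) == v for k, v in req.items())

def entropy(values):
    n = len(values)
    return -sum((values.count(v) / n) * log2(values.count(v) / n)
                for v in set(values))

def ent_recom(req, n, answers):
    s = [item for item in KB if matches(item, req)]
    while not s:                      # inconsistent: retract a requirement
        req.pop(answers.pop(0))       # scripted user drops one requirement
        s = [item for item in KB if matches(item, req)]
    while len(s) > n:                 # too many results: ask a question
        open_attrs = [a for a in KB[0] if a not in req]
        best = max(open_attrs, key=lambda a: entropy([i[a] for i in s]))
        req[best] = answers.pop(0)    # scripted user answers the question
        s = [i for i in s if i[best] == req[best]]
    return s

# The query {city: Graz, stars: 2} has no answer; the scripted user first
# drops "stars" and then answers the entropy-selected question with "4".
result = ent_recom({"city": "Graz", "stars": "2"}, 1, ["stars", "4"])
```

The scripted session first retracts the unsatisfiable stars requirement and then narrows the candidate set down via the highest-entropy attribute until at most n = 1 recommendation remains.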

3.1 Framework and Data Preprocessing

For our tests, we chose the community-curated data from Open Street Maps (OSM). We use the Python bindings for Microsoft's Z3 framework as a constraint solver. For language processing and basic interaction with the user, we utilize the Rasa framework. Rasa is used for natural language understanding by extracting the users' intent and requirements, but it can also respond immediately with an answer (without calling EntRecom) when it detects the user's intent to chitchat.

In a first step, we query the OSM API for hotels in a predefined region. The size of the region is a major factor for performance. Because of this, it is reasonable to let the user select a region on initialization.

Figure 2: Data from Open Street Maps

To construct our internal knowledge base, we process the exported data set in the following way: As some attributes do not add human-readable information or tend to mislead users, we have to filter them out. This is especially necessary because some classes of attributes have a severe impact on later steps in our recommender system. For example, categories like "fixme" (an annotation from an OSM user) would give the user no advantage in a real-world scenario and would, in this case, lead to unwanted recommendations. We use a whitelist filter for the attributes of every entry in the data set. Furthermore, not every entry in the OSM data has the same fields, which is why we maintain a list of all attributes in a data set. If an entry does not include a certain attribute, we add it with a value of False. This leaves us with a "normalized" data set.

Figure 3: Preprocessed Data Set

After our preprocessing step, every entry in the dictionary has the same number of attributes, and we add uniform clauses to our constraint model. An exemplary normalized clause looks like this: (amenity = "none" ∧ addr:city = "Graz" ∧ name = "ParkhotelGraz" ∧ cuisine = "none" ∧ ... ∧ smoking = "isolated" ∧ wheelchair = "limited" ∧ swimming-pool = "none" ∧ stars = "4" ∧ tourism = "hotel"). Regarding data types, we choose between two different approaches when creating the Z3 constraint model: the first one being data type selection for every attribute in the domain, and the second one being string translation for every value, regardless of the specific class of an attribute. In our implementation, every attribute and value is translated to a Z3 string. While this may result in slower runtimes, we ensure flexibility, as data types may vary across attributes. Usually, the user is asked to select a region of interest. For our tests, we use a data set of a specific size. This is a very critical and, depending on the size of the test set, time-consuming step. Therefore, we try to do it only once, when the user initially specifies an area. This Z3 constraint model, consisting of OR-connected uniform clauses of Z3 strings, is our main knowledge base within EntRecom and shall be referred to as kb, and subsets thereof as S, in the remainder of this paper.

The first interaction the user has with our chatbot implementation is handled via the trained natural language understanding (NLU) model within Rasa. If the user's intent is to search within the domain "hotels", our recommender is called via a webhook, and the parameters are passed over. This call to our internal API is the only interaction between EntRecom and the chatbot framework – Rasa, in our case – which leads to very low coupling and a high degree of flexibility. The parameters represent the NLU's interpretation of the user's query and give us our initial set of requirements.

3.2 Recommender Algorithm

In this section, we describe the implementation of the previously introduced algorithm EntRecom. Before the algorithm is ready to use, we have to prepare our data set, generate the constraint model and interpret the user's intent. Then we enter the EntRecom implementation. As described before, if the query is satisfiable for our knowledge base and the given maximum number of results, we return our recommendations and exit EntRecom. A maximum of n hotels retrieved from the knowledge base is presented to the user, with n representing a preselected maximum number of results.

Though, as stated in [17], we are confronted with two potential problems at this point. Given a knowledge base kb, a maximum number of results n and a user-defined set of requirements REQ:

• We could get too many results to present in a meaningful way. In this case, the function GetBestEntrAttr is called.
• We could get no results at all. In this scenario, we call the function MiREQ.

3.2.1 GetBestEntrAttr

This part of the algorithm is called when the query is satisfiable but the result does not lie within [n]. Therefore, we have to add further constraints to our model. Because we want to occupy as little of the user's time as possible, we have to select additional constraints efficiently. This is done by choosing the category out of the domain which best splits the current subset S of the data set. The criterion for our selection is Shannon's information entropy [22]:

H(X) = − Σi P(xi) log(P(xi))

To apply entropy as a measure, we have to restructure the data slightly. AS represents this restructured version of S, which maps every attribute in the domain to all values of its occurrences in S. This is realized with Python dictionaries of the form: {"attribute1": ["value1", "value2"], "attribute2": ["value1"], ...}. After computing the number of occurrences of every value for a specific attribute, we calculate the entropy for every attribute.
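The restructuring into AS and the entropy-based choice of the splitting attribute can be sketched as follows; the hotel data is illustrative, not from OSM, and the helper names are ours:

```python
from math import log2

# Sketch of the AS restructuring and entropy computation described
# above, on a small illustrative subset S.
S = [
    {"stars": "4", "cuisine": "none",    "wheelchair": "limited"},
    {"stars": "3", "cuisine": "none",    "wheelchair": "yes"},
    {"stars": "4", "cuisine": "italian", "wheelchair": "yes"},
    {"stars": "5", "cuisine": "none",    "wheelchair": "yes"},
]

# AS maps every attribute to all values of its occurrences in S.
AS = {attr: [entry[attr] for entry in S] for attr in S[0]}

def entropy(values):
    # H(X) = -sum_i P(x_i) * log(P(x_i)), estimated from value counts.
    n = len(values)
    return -sum((values.count(v) / n) * log2(values.count(v) / n)
                for v in set(values))

def get_best_entr_attr(as_dict):
    # The attribute with the highest entropy splits S most effectively.
    return max(as_dict, key=lambda attr: entropy(as_dict[attr]))

best = get_best_entr_attr(AS)  # "stars" (entropy 1.5 bits)
```

Here stars takes three distinct values across the four entries (entropy 1.5 bits), so it is selected over the more uniform cuisine and wheelchair attributes.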

The attribute with the highest entropy splits the data set S most effectively. As mentioned above, some attributes may lead to unwanted recommendations. One reason for this is the appearance of seemingly random values when they are looked at lacking context. These could be attributes internally used within OSM, for instance. Fields like "source" often have seemingly random values like "survey" or "Kleine Zeitung". The same pseudo-randomness can be observed with attributes like "housenumber", which can take an arbitrary integer value. Those values appear in many data points and, when observed in isolation, do not contribute any information with regard to splitting the data set. When computing the information entropy based on attributes lacking spatial ordering or clustering properties, we are not dividing the data set strategically but randomly. This is why we chose to exclude them from our knowledge base beforehand.

If several attributes occur with equal entropies in the data set, we can randomly select one of these categories, as all of them split the data equally well. In the last step, the user is asked to select one value for this category. The user's selection is added to the set of requirements and EntRecom gets called again.

Figure 4: User Interface – (a) SAT, (b) MiREQ, (c) GetBestEntrAttr

3.2.2 MiREQ

This part of the algorithm is called when the query is not satisfiable for our knowledge base KB with the user-defined set of requirements REQ. Following [17], we state the Inconsistent Requirements Problem. As a countermeasure to the Inconsistent Requirements Problem, we have to soften the query to get results. To achieve this, we have to find the inconsistencies in REQ, i.e., a subset Γ thereof. For all inconsistent subsets Γ, KB ∪ {¬AbR | R ∈ REQ \ Γ} ∪ {AbR | R ∈ Γ} is satisfiable, with Ab being a Boolean variable for selecting and deselecting a requirement R to be considered [18]. We implemented this by checking models with combinatoric variations of subsets of REQ. This means that we evaluate a constraint model consisting of our knowledge base and requirements with varying values for Ab. Given the cardinality of Γ, we iterate over all possible distributions of Ab with |REQ| − |Γ| considered requirements. Because we want to preserve as much of the user's initial query as possible, we want to find a minimal set of inconsistent constraints. As proposed in [17], to find such a minimal set of inconsistencies, we have to obtain a constraint model that is satisfiable for the smallest possible cardinality of Γ. For this, we repeat the process of checking combinatoric variations of subsets with increasing numbers of assumed inconsistencies, starting with one. We do this until we reach |REQ|, in which case there are no consistent requirements. When a satisfiable constraint model is found, we are able to retrieve all inconsistent requirements in the form of all unconsidered requirements. With this subset of REQ, we return from MiREQ.

After the MiREQ function returns, we ask the user for his or her preference for dropping one of his or her previously defined requirements within the minimal set of inconsistencies. While the current implementation assures us of finding minimal sets of inconsistent requirements, in future implementations we hope to be able to make use of the unsat-core functionality of Z3 for this task. This is expected to significantly improve performance.

3.3 Evaluation

We developed our tests concentrating on the algorithm itself and the data structures needed for execution. The data preprocessing was not in the focus of this test. For our tests with synthetic test data this is especially true, as they use an adapted version of EntRecom without user interaction, as depicted in Figure 5. The goal of our experiments is to show potential for optimizations of the implementation, as well as to prove the versatility of the algorithm proposed in [17].

In our tests, we used both synthetic and real-world data. For benchmarking and basic experiments, we mainly used synthesized data, while real-world data, specifically from Open Street Maps, is important to ensure the flexibility and robustness of the implementation in real-world scenarios.
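The subset enumeration of Section 3.2.2 can be sketched with itertools.combinations: growing the assumed number of inconsistencies corresponds to increasing |Γ|. The toy catalog and helper names below are ours:

```python
from itertools import combinations

# Sketch of the MiREQ subset search: deselect subsets Gamma of REQ
# with increasing cardinality until the remaining requirements are
# satisfiable. Toy data; dictionary matching stands in for Z3.
KB = [
    {"stars": "4", "pool": "no",  "city": "Graz"},
    {"stars": "3", "pool": "yes", "city": "Graz"},
    {"stars": "5", "pool": "yes", "city": "Wien"},
]

REQ = {"stars": "4", "pool": "yes", "city": "Graz"}

def satisfiable(req_items):
    # Stand-in for the satisfiability check on the constraint model.
    return any(all(item.get(k) == v for k, v in req_items) for item in KB)

def mi_req(req):
    names = list(req)
    for size in range(1, len(names) + 1):        # growing |Gamma|
        for gamma in combinations(names, size):  # combinatoric variations
            kept = [(k, req[k]) for k in names if k not in gamma]
            if satisfiable(kept):
                return set(gamma)                # minimal inconsistent set
    return set(names)                            # no consistent subset left

conflict = mi_req(REQ)  # {"stars"}: dropping stars = 4 restores results
```

Because the search starts with |Γ| = 1 and stops at the first satisfiable model, the returned set is minimal with respect to cardinality, mirroring the definition in Section 2.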

Figure 5: Focus of Tests with Synthetic Data

3.3.1 Real-World Test Data

As described before, we use data from Open Street Maps for our tests. First, we want to observe how the algorithm copes with real-world data and which degree of data preprocessing and filtering of the data is necessary. Furthermore, we developed the rules for generating our synthetic test data based on the exported data from OSM. For this part of our evaluation, we empirically tested the algorithm for usable results and also ran automated tests for exhaustive testing on the data.

3.3.2 Synthetic Test Data

Our modus operandi for generating and conducting tests with synthetic data can be described as follows:

1.) Definition of attribute classes and generation of data sets
2.) Definition of requirement sets and classification of them within a testing matrix
3.) Performing speed tests following the testing matrix

To evaluate our results for the performance test, we order tests within a three-dimensional testing matrix. The first axis of this matrix represents the size of the test set. The second axis represents the number of attributes, and the last axis represents the number of allowed results n. The value at every position in the matrix represents the called subfunction within our recommender algorithm.

We randomly generated data sets of different sizes and with different numbers of attributes for our performance tests. Because data within the domains "hotels" and "cars" have several distinct properties, we set up our test data following certain rules. To achieve this, we categorized the attributes depending on the data type of their corresponding values. Furthermore, we added complementary classes based on certain characteristics of the domain. This results in the following basic classes of attributes based on their data type:

a) Boolean values
b) Integer values
c) Float values
d) String values
…
h) Ranges (e.g. distance-to-the-center, price-category)

ad a) Boolean values appear often in real-world data and are easy to process. This class represents attributes like internet-access or payment-credit-card in the tourism domain and esp or abs in the automotive domain.

ad b) Arbitrary integer values often appear as counter variables like visitors, likes or 5-star-reviews.

ad c) Floating-point numbers occur in both domains, tourism and cars, in various forms. This class covers all attributes in the context of distance, like distance-to-the-center for a hotel or mileage for a car. It also stands for location attributes given in coordinates (longitude, latitude) and mean values like user-rating.

ad d) Strings are very versatile and are therefore used in many ways in different sources. In many cases, strings contain informational value beyond what a number may cover. But often strings are used in places where the informative content is not higher than that of number-valued attributes and could be represented with Boolean values or numbers instead. This is true for several classes of attributes, like stars with values such as "4-star".

ad e) The name attribute is always a string and does not have to, but is likely to, include a domain-specific term in its value. For hotels, this would be "Hotel" or a synonym thereof. Because of these recurring terms and their omnipresence over all domain-specific data sources, we treat this attribute separately from the more general string class. When generating data sets, we introduce these terms into our samples regularly.

ad f) Fields like city or manufacturer contain frequently reappearing values, which makes them special regarding information entropy and therefore interesting to us.

ad g) The stars of a hotel can be represented with 1 to 5, which makes a reappearance in this category very probable.

ad h) As we want to simplify the selection process, we reduce the date to the year. This results in an integer value often constrained by an upper boundary, being the current year, and a case-dependent lower boundary. Depending on the specific attribute, these values may recur frequently.

ad i) Ranges are special because they consist not of one but of two boundary values. For simplification reasons, we categorize ranges within the relative classes low, med and high. Of course, these range classes are also very likely to recur.

Our synthetic data is randomly generated with respect to those classes of attributes and with varying occurrences of the different types.

3.4 Results

The results are split in two parts. First, we discuss the results of the synthetic data, followed by the proof-of-concept results with real-world data. Using the synthetic data, we define sets of requirements to test the algorithm on. These sets belong to one of the following classes relative to the knowledge base (i.e., the test data) and the given n they are tested with:

Additionally, we added the following supplementary classes: i) Satisfiable with n or less results. This leads to a direct return from the EntRecom-algorithm, presenting us the results S the con- e) Often reappearing parts in string values (e.g. names) straint solver found. f) Frequently recurring string values (e.g. city, manufacturer) ii) Satisfiable with more than n results. In this case, we have to call g) Restricted numbers (e.g. star-rating, number of seats) GetBestEntrAttr, which calculates entropies for all attributes. h) Dates (e.g. registration date) Another interaction with the algorithm is needed, as we have to

select a value for the attribute with the highest informational content.
iii) Not satisfiable. In this case, we have to call MiREQ, which iteratively chooses subsets of REQ until it finds the largest satisfiable subset. Its complement is the set of inconsistent requirements δs. Again, another interaction through AskUser is necessary. Now we have to select a requirement out of δs that we want to keep in REQ; the other constraints are dropped.

To classify the requirement sets for every test set according to the three classes from above, we use an adapted version of EntRecom which does not return results or perform any recursive calls. The purpose of this classification is to be able to identify test results with regard to the sub-functions called within EntRecom. The class of REQ - either i, ii, or iii - for a certain test configuration represents the value of the three-dimensional testing matrix. After classification, we perform our tests on the generated data.

A typical test scenario starts with the user's first interaction with the chatbot environment, which, in turn, results in setting up the knowledge base. Regularly, the user is asked to select a region of interest. For our tests, we use a data set of a specific size. Then a set of requirements and a maximum for the expected results are chosen. We now perform a test for every class in the testing matrix and get results in the form of execution times.

For a fixed n of 5 and tests performed with one to five requirements in the sets, we plot the function calls of MiREQ and GetBestEntrAttr on increasingly sized test sets. In case we obtain a result for our query which lies within n, our constraint solver CSolver is the only function call made.

Figure 6: Classification Based on Function Calls within EntRecom

As can be seen, the function calls of MiREQ decrease, while the calls of GetBestEntrAttr increase for larger test sets. While this basic assumption holds for real data, the impact is not as drastic, as users tend to perform queries with fewer than five requirements initially.

The algorithm was also used with real-world data. For this, we inserted the data from OSM into the knowledge base. We interacted with the textual chat-like web interface in the intended way and got the correct results from the algorithm. The algorithm returned a result set within a few iterations, and these results fit the user specification.

These experiments also show that there is a problem of relevance of attributes. The attributes which have a high entropy to reduce the result set are not necessarily relevant for users. Our tests show that, e.g., the attribute wheelchair, which is part of accessibility, has a high entropy but will not be interesting for a large number of users. This issue has to be addressed in an upcoming version of the algorithm.

4 RELATED WORK

The application of model-based reasoning, and especially model-based diagnosis, in the field of recommender systems is not novel. For example, papers like [4, 11, 19] compute the minimal sets of faulty requirements. These requirements should be changed in order to find a solution. In these papers, the authors rely on the existence of minimal conflict sets when computing the diagnosis for inconsistent requirements. Felfernig et al. [4] present an algorithm that calculates personalized repairs for inconsistent requirements. The algorithm combines concepts of MBD with a collaborative problem solving approach to improve the quality of repairs in terms of prediction accuracy. In [19], the concept of representative explanations is introduced. This concept follows the idea of generating diversity in alternative diagnoses: informally, constraints that occur in conflicts should as well be included in diagnoses presented to the user. Jannach [11] proposes to determine preferred conflicts "on demand" and to use a general-purpose and fast conflict detection algorithm for this task, instead of computing all minimal conflicts within the user requirements in advance.

Papers that deal with the integration of diagnosis and constraint solving are [3] and [24, 25], which propose a diagnosis algorithm for tree-structured models. The approach is generally applicable due to the fact that all general constraint models can be converted into an equivalent tree-structured model using decomposition methods, e.g., hypertree decomposition [7, 8]. [26] provides more details regarding the coupling of decomposition methods and the diagnosis algorithms for tree-structured models. In addition to that, [21] generalized the algorithms of [3] and [24]. In [15], the authors also propose the use of constraints for diagnosis, where conflicts are used to drive the computation. In [6], which is maybe the earliest work that describes the use of constraints for diagnosis, the authors introduce the use of constraints for computing conflicts under the correctness assumptions. For this purpose they developed the concept of constraint propagation.

Despite the fact that all of these algorithms use constraints for modeling, they mainly focus on the integration of constraint solving for conflict generation, which is different to our approach. For presenting recommendation tasks as constraint satisfaction problems, we refer to [12].

Human-chatbot communication represents a broad domain. It covers technical aspects as well as psychological and human perspectives. Contributions like [9, 30] show several ways of implementing chatbots in different domains. Wallace [30] demonstrates an artificial intelligence robot based on a natural language interface (A.L.I.C.E.) that extends ELIZA [31], which is based on an experiment of Alan M. Turing in 1950 [29]. This work describes how to create a robot personality using AIML, an artificial intelligence modelling language, to pretend intelligence and self-awareness.

Sun et al. [27] introduced a conversational recommendation system based on unsupervised learning techniques. The bot was trained on successful order conversations between users and real human agents.

Papers like [5, 10, 13, 32] address the topics of user acceptance and experience. In [32], a pre-study shows that users infer the authenticity of a chat agent from two different categories of cues: agent-related cues and conversation-related cues. To get an optimal conversational result, the bot should provide a human-like interaction. Questions of conversational UX design raised by [5] and [16] demonstrate the need to rethink user interaction altogether.

The topic of recommender systems with conversational interfaces is shown in [14], where an adaptive recommendation strategy was

shown based on reinforcement learning methods. In the paper [28], the authors proposed a deep reinforcement learning framework to build personalized conversational recommendation agents. In this work, a recommendation model trained from conversational sessions and rankings is also presented.

5 CONCLUSION AND FUTURE WORK

In this paper, we showed an implementation and an evaluation of EntRecom, an algorithm using model-based diagnosis and Shannon's information entropy. In our tests, we used both synthetic and real-world data. For benchmarking and basic experiments, we mainly used synthesized data, while real-world data, specifically from Open Street Maps, is important to ensure the flexibility and robustness of the implementation in real-world scenarios. We also showed the performance for different data sets and revealed open issues, like the relevance of the chosen attributes, which is already a starting point for future work. Another important step will be a user study to evaluate the acceptance of the algorithm.

ACKNOWLEDGEMENTS

Research presented in this paper was carried out as part of the AS-IT-IC project that is co-financed by the Cooperation Programme Interreg V-A Slovenia-Austria 2014-2020, European Union, European Regional Development Fund.

REFERENCES

[1] Johan de Kleer and Brian C. Williams, 'Diagnosing multiple faults', Artificial Intelligence, 32(1), 97–130, (1987).
[2] Rina Dechter, Constraint Processing, Morgan Kaufmann, 2003.
[3] Yousri El Fattah and Rina Dechter, 'Diagnosing tree-decomposable circuits', in Proceedings 14th International Joint Conf. on Artificial Intelligence, pp. 1742–1748, (1995).
[4] Alexander Felfernig, Gerhard Friedrich, Monika Schubert, Monika Mandl, Markus Mairitsch, and Erich Teppan, 'Plausible repairs for inconsistent requirements', in IJCAI International Joint Conference on Artificial Intelligence, pp. 791–796, (01 2009).
[5] Asbjørn Følstad and Petter Bae Brandtzæg, 'Chatbots and the new world of hci', interactions, 24(4), 38–42, (June 2017).
[6] Hector Geffner and Judea Pearl, 'An Improved Constraint-Propagation Algorithm for Diagnosis', in Proceedings 10th International Joint Conf. on Artificial Intelligence, pp. 1105–1111, (1987).
[7] Georg Gottlob, Nicola Leone, and Francesco Scarcello, 'Hypertree Decomposition and Tractable Queries', in Proc. 18th ACM SIGACT SIGMOD SIGART Symposium on Principles of Database Systems (PODS-99), pp. 21–32, Philadelphia, PA, (1999).
[8] Georg Gottlob, Nicola Leone, and Francesco Scarcello, 'A comparison of structural CSP decomposition methods', Artificial Intelligence, 124(2), 243–282, (December 2000).
[9] B. Graf, M. Krüger, F. Müller, A. Ruhland, and A. Zech, 'Nombot - simplify food tracking', volume 30-November-2015, pp. 360–363, (2015).
[10] Jennifer Hill, W. Randolph Ford, and Ingrid G. Farreras, 'Real conversations with artificial intelligence: A comparison between human-human online conversations and human-chatbot conversations', Computers in Human Behavior, 49, 245–250, (2015).
[11] Dietmar Jannach, 'Finding preferred query relaxations in content-based recommenders', IEEE Intelligent Systems, 109, 81–97, (04 2008).
[12] Dietmar Jannach, Markus Zanker, and Matthias Fuchs, 'Constraint-based recommendation in tourism: A multiperspective case study', Journal of IT and Tourism, 11, 139–155, (2009).
[13] A. Khanna, M. Jain, T. Kumar, D. Singh, B. Pandey, and V. Jha, 'Anatomy and utilities of an artificial intelligence conversational entity', pp. 594–597, (2016).
[14] Tariq Mahmood, Francesco Ricci, and Adriano Venturini, 'Learning adaptive recommendation strategies for online travel planning', Information and Communication Technologies in Tourism 2009, 149–160, (2009).
[15] Jakob Mauss and Martin Sachenbacher, 'Conflict-driven diagnosis using relational aggregations', in Working Papers of the 10th International Workshop on Principles of Diagnosis (DX-99), Loch Awe, Scotland, (1999).
[16] R.J. Moore, R. Arar, G.-J. Ren, and M.H. Szymanski, 'Conversational ux design', volume Part F127655, pp. 492–497, (2017).
[17] Iulia Nica, Oliver A. Tazl, and Franz Wotawa, 'Chatbot-based tourist recommendations using model-based reasoning', in ConfWS, (2018).
[18] Iulia Nica and Franz Wotawa, 'ConDiag - computing minimal diagnoses using a constraint solver', (2012). International Workshop on Principles of Diagnosis; Conference date: 31-07-2012 through 03-08-2012.
[19] Barry O'Sullivan, Alexandre Papadopoulos, Boi Faltings, and Pearl Pu, 'Representative explanations for over-constrained problems', 1, (07 2007).
[20] Raymond Reiter, 'A theory of diagnosis from first principles', Artificial Intelligence, 32(1), 57–95, (1987).
[21] Martin Sachenbacher and Brian C. Williams, 'Diagnosis as semiring-based constraint optimization', in European Conference on Artificial Intelligence, pp. 873–877, (2004).
[22] C. E. Shannon, 'A Mathematical Theory of Communication', The Bell System Technical Journal, 27(3), 379–423, (1948).
[23] C. E. Shannon, 'A mathematical theory of communication', Bell System Technical Journal, 27, 379–623, (1948).
[24] Markus Stumptner and Franz Wotawa, 'Diagnosing Tree-Structured Systems', in Proceedings 15th International Joint Conf. on Artificial Intelligence, Nagoya, Japan, (1997).
[25] Markus Stumptner and Franz Wotawa, 'Diagnosing tree-structured systems', Artificial Intelligence, 127(1), 1–29, (2001).
[26] Markus Stumptner and Franz Wotawa, 'Coupling CSP decomposition methods and diagnosis algorithms for tree-structured systems', in Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI-03), pp. 388–393, Acapulco, Mexico, (2003).
[27] Y. Sun, Y. Zhang, Y. Chen, and R. Jin, 'Conversational recommendation system with unsupervised learning', pp. 397–398, (2016).
[28] Yueming Sun and Yi Zhang, 'Conversational recommender system', in The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR '18, pp. 235–244, New York, NY, USA, (2018). ACM.
[29] Alan M. Turing, Computing Machinery and Intelligence, 23–65, Springer Netherlands, Dordrecht, 2009.
[30] R.S. Wallace, The anatomy of A.L.I.C.E., 2009.
[31] J. Weizenbaum, 'ELIZA - a computer program for the study of natural language communication between man and machine', Communications of the ACM, 9(1), 36–45, (1966).
[32] N.V. Wünderlich and S. Paluch, 'A nice and friendly chat with a bot: User perceptions of AI-based service agents', (2018).
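To make the roles of the two subroutines discussed in Section 3 concrete, the following is a minimal, self-contained Python sketch of entropy-based attribute selection (GetBestEntrAttr) and a naive search for a largest satisfiable subset of REQ (MiREQ). Only the function names are taken from the paper; everything else is our simplifying assumption: the knowledge base is a plain list of records, requirements are predicates, and the constraint solver is replaced by list filtering.

```python
from itertools import combinations
from math import log2

# Hypothetical items standing in for the knowledge base KB;
# attribute names and values are illustrative, not the authors' data.
KB = [
    {"name": "Hotel A", "stars": 4, "internet_access": True},
    {"name": "Hotel B", "stars": 4, "internet_access": False},
    {"name": "Hotel C", "stars": 3, "internet_access": True},
    {"name": "Hotel D", "stars": 5, "internet_access": True},
]

def solve(kb, req):
    """CSolver stand-in: a requirement is a predicate over an item."""
    return [item for item in kb if all(r(item) for r in req)]

def entropy(kb, attr):
    """Shannon entropy of an attribute's value distribution over kb."""
    values = [item[attr] for item in kb]
    probs = [values.count(v) / len(values) for v in set(values)]
    return -sum(p * log2(p) for p in probs)

def best_entropy_attribute(kb, attrs):
    """GetBestEntrAttr stand-in: attribute with the highest entropy."""
    return max(attrs, key=lambda a: entropy(kb, a))

def largest_satisfiable_subset(kb, req):
    """MiREQ stand-in: keep as many requirements as possible (naive search)."""
    for size in range(len(req), -1, -1):   # prefer larger subsets
        for subset in combinations(req, size):
            if solve(kb, list(subset)):    # satisfiable -> done
                return list(subset)
    return []
```

Note that enumerating all subsets is exponential in |REQ|; for the small requirement sets considered in the tests (one to five requirements) this is unproblematic.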

Decision Biases in Preference Acquisition

Alexander Felfernig1 and Martin Stettinger1 and Ralph Samer1

Abstract. Decision support systems are in many cases based on user interfaces used to collect the preferences and requirements of users. For example, configurators in the automotive domain ask users to provide preference information regarding the car color and car engine. Stakeholders in release planning scenarios provide feedback on software requirements in terms of importance evaluations along different interest dimensions. In such scenarios, decision biases can trigger situations where users take suboptimal decisions. In this paper, we provide a short overview of example decision biases and report the results of an empirical study that shows the existence of such biases in the context of release planning (configuration) decision making.

1 Graz University of Technology, Austria, email: {alexander.felfernig, martin.stettinger, ralph.samer}@ist.tugraz.at

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

When interacting with decision support systems such as recommender systems [3] or configuration systems [5], users do not know their preferences beforehand but construct and frequently adapt these within the scope of a decision process [2]. In most of the cases, users also do not try to optimize decisions but apply heuristics to take a decision. For example, "elimination by aspects" (EBA) is based on the simple idea of an attribute-wise comparison of different decision alternatives where only those alternatives remain in a consideration set which satisfy a pre-defined set of preferences. This strategy can lead to suboptimal outcomes since alternatives that could become more preferable in the future have already been eliminated in the past.

Recently, decision support for groups became increasingly popular due to the fact that in many scenarios user groups are engaged in decision processes [3]. Examples thereof are (1) release planning, where a group of stakeholders is in charge of prioritizing a given set of requirements, and (2) open innovation scenarios, where customer groups contribute when deciding about the features of a new product. In many cases, the underlying decision scenario can be regarded as a group decision problem [3]. In this paper, we provide an overview of example decision biases that can occur in preference acquisition scenarios for individual users as well as groups. In this context, we discuss the results of an empirical study conducted in the context of preference acquisition for release planning.

The remainder of this paper is organized as follows. In Section 2, we exemplify decision biases on the basis of examples from the domain of software requirements engineering. In this context, we discuss the results of a related empirical study. In Section 3, we discuss issues for future work. We conclude this paper with Section 4.

2 Decision Biases in Preference Acquisition

The major goal of our study was to analyze biases in decision scenarios. The study focused on an analysis of the decision behavior of computer science students (N=222) working in groups of 6-8 persons in a software project. A structured questionnaire with A/B testing was used to analyze the decision behavior of the students. The average time needed to complete the questionnaire was 4 minutes. In order to simulate decision scenarios, scenario descriptions were integrated in questions where needed. In the following, we provide an overview of the study results.

Framing. The way a decision alternative is presented can influence a user's decision behavior. One example of framing is that users prefer meat that is characterized as "80 percent lean" over meat that is "20 percent fat". Another example is the framing of prices: when comparing the offers of companies x and y, the offer "pellets for 24.50 per 100kg with a discount of 2.50 if the customer pays with cash" from x appears to be the more attractive one compared to the offer "pellets for 22.00 per 100kg with a 2.50 surcharge if the customer pays with credit card" from company y. The increased attractiveness of x's offer can be explained by prospect theory [6], which points out that alternatives are evaluated with regard to both gains and losses, and that losses (in our example, fat meat and the surcharge) have a higher negative impact on a decision than equal gains.

Framing: Study Results. In our study, we described a scenario where stakeholders had to estimate the acceptability of a given probability of successful project completion. In one setting, the probability was specified as "probability of success"; in the other setting, the probability was expressed as "failure probability". In the first setting, study participants evaluated the acceptability on average with 86 out of 100 points (1: not acceptable, 100: definitely acceptable). In the second setting, study participants evaluated the acceptability on average with 77 points (out of 100).

Anchoring. It is known that preference visibility has various negative impacts on the quality of a group decision. An example thereof is anchoring, where indicated reference values (e.g., the item evaluations of a group member) can have an influence on the evaluation behavior of other group members. An example thereof is the visualization of the average rating given to an item by a community: increasing the shown average item rating results in an increased rating by individual community members [1].

Anchoring: Study Results. In our study, the participants were asked whether it is important for them to have knowledge as soon as possible about the preferences (which requirements should be implemented when?) of other stakeholders. Nearly 70 percent of the study participants agreed that the mentioned preference visibility is important. These persons are vulnerable to limited information exchange, which can result in suboptimal decisions [3].

Decoy Effects. Decisions are taken depending on the context in which the alternatives are presented. It can be the case that completely inferior decision alternatives added to a set of alternatives change the selection behavior of users. Such alternatives are denoted as decoy items since they manage to draw the attention of users towards specific alternatives. An example of a decoy effect is asymmetric dominance, which denotes a situation where a decoy alternative is dominated by an item T in all dimensions. Dominance is evaluated in terms of a pairwise comparison of the attribute values characterizing the alternatives. An example of asymmetric dominance is shown in Table 1. Alternative c can be regarded as a decoy item since it is outperformed by alternative a in both dimensions (higher project returns and lower project efforts) and thus makes alternative a even more attractive compared to alternative b.

release | project returns | project efforts
a       | 30.000          | 15.000
b       | 50.000          | 35.000
c       | 28.000          | 16.000

Table 1. An example of an asymmetric dominance effect.

Decoy Effects: Study Results. The study participants were asked to select one out of two alternative software release plans (characterized by the corresponding estimated returns and efforts). Release alternative c is completely dominated by release alternative a, which was selected in 86 percent of the cases (only 9 percent of the participants selected alternative b).

Table 2 includes a variant of the previous setting where alternative c is arranged near to alternative b. Compared to the setting shown in Table 1, the share of participants who selected alternative a was only 77 percent, whereas 22 percent of the participants selected alternative b. Consequently, the inclusion of inferior alternatives can trigger a shift in the selection behavior of stakeholders. One way to counteract such situations is to point out inferior alternatives or to simply delete them from the set of available options.

release | project returns | project efforts
a       | 30.000          | 15.000
b       | 50.000          | 35.000
c       | 52.000          | 40.000

Table 2. Another example of an asymmetric dominance effect.

Decision Strategies of Study Participants. In addition to the above-mentioned biases, the study participants were asked a couple of questions regarding their practices in group decision making. First, early knowledge about the preferences of other stakeholders was considered a positive element that helps to improve the quality of requirements prioritization (84% of the study participants supported this statement). However, as indicated in the literature, early knowledge about the preferences of other stakeholders can have a negative impact on decision quality, since focusing on preferences triggers less effort related to the exchange of decision-relevant information [7]. Second, participants regarded consensus as a positive aspect at the beginning of a decision process (80% support for this statement), i.e., consensus at the beginning is regarded as a precondition for high-quality prioritization. However, the contrary is the case: consensus at the very beginning contributes to the avoidance of knowledge interchange between stakeholders [8]. Third, study participants were asked about their opinion on the impact of preference visibility on the probability of decision manipulation. In this context, the majority of study participants (64% support) agreed that preference visibility increases the probability of manipulation. However, 36% still think that this is not the case.

Summarizing, biases in preference acquisition exist and can have a negative impact on the outcome of the decision process. As a result of our user study, it could be observed that study participants (in our case computer science students) were often not aware of this and thus vulnerable to such biases.

3 Future Work

There are a couple of issues that are within the scope of our future research. First, the majority of researchers still focuses on the identification of new biases and the analysis of biases in specific decision scenarios. A major goal of our ongoing and future work is to focus on approaches to automatically identify potential sources of suboptimal decisions and to adapt the underlying decision support. For example, decoy effects can be predicted on the basis of a formal model [4] - our focus for future work in this context is to figure out interactions between different decoy effects and to find ways to counteract such biases. Second, we will investigate how explanations can help to counteract biases and what kind of explanations are useful in which context. For example, in release planning, stakeholders could be informed about the fact that some of the candidate requirements should be analyzed in more detail. Third, we will extend the scope of our user studies to industrial settings.

4 Conclusions

In this paper, we discussed the results of a user study related to the existence of decision biases in preference acquisition. The results were discussed on the basis of an empirical study that was conducted with computer science students within the scope of a software engineering course. The outcomes of this study clearly indicate the existence of decision biases and suboptimal decision practices that can lead to suboptimal outcomes in group decisions. Our future work will include, a.o., an analysis of the extent to which explanations can help to counteract decision biases. Furthermore, we will extend the scope of our user studies to industrial scenarios.

ACKNOWLEDGEMENTS

The work presented in this paper has been conducted within the scope of the Horizon 2020 project OpenReq (openreq.eu).

REFERENCES

[1] G. Adomavicius, J. Bockstedt, S. Curley, and J. Zhang, 'Recommender systems, consumer preferences, and anchoring effects', in Decisions@RecSys11, pp. 35–42, Chicago, IL, USA, (2011).
[2] A. Felfernig, 'Biases in decision making', in Proceedings of the International Workshop on Decision Making and Recommender Systems 2014, number 1278 in CEUR Proceedings, pp. 32–37, Bolzano, Italy, (2014).
[3] A. Felfernig, L. Boratto, M. Stettinger, and M. Tkalcic, Group Recommender Systems – An Introduction, Springer, 2018.
[4] A. Felfernig, B. Gula, G. Leitner, M. Maier, R. Melcher, S. Schippel, and E. Teppan, 'A dominance model for the calculation of decoy products in recommendation environments', in AISB Symposium on Persuasive Technologies, pp. 43–50, Aberdeen, Scotland, (2008).
[5] A. Felfernig, L. Hotz, C. Bagley, and J. Tiihonen, Knowledge-based Configuration: From Research to Business Cases, Morgan Kaufmann Publishers, 1st edn., 2014.
[6] D. Kahneman and A. Tversky, 'Prospect theory: An analysis of decision under risk', Econometrica, 47(2), 263–291, (1979).
[7] A. Mojzisch and S. Schulz-Hardt, 'Knowing others' preferences degrades the quality of group decisions', Journal of Personality and Social Psychology, 98(5), 794–808, (2010).
[8] G. Ninaus, A. Felfernig, and F. Reinfrank, 'Anonymous preference elicitation for requirements prioritization', in ISMIS'12, volume 7661 of LNCS, pp. 349–356, Macau, China, (2012).
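The pairwise dominance test underlying the decoy discussion in Section 2 can be sketched in a few lines of Python. The data is taken from Table 1, while the function and field names are our own illustration, not the formal model of [4].

```python
# Alternatives from Table 1 (returns should be high, efforts low).
alternatives = {
    "a": {"returns": 30000, "efforts": 15000},
    "b": {"returns": 50000, "efforts": 35000},
    "c": {"returns": 28000, "efforts": 16000},  # the decoy
}

def dominates(x, y):
    """True if x is at least as good as y in every dimension
    (higher returns, lower efforts) and strictly better in one."""
    at_least_as_good = (x["returns"] >= y["returns"]
                        and x["efforts"] <= y["efforts"])
    strictly_better = (x["returns"] > y["returns"]
                       or x["efforts"] < y["efforts"])
    return at_least_as_good and strictly_better

def decoys(alts):
    """All alternatives dominated by some other alternative."""
    return {n for n, x in alts.items()
            if any(dominates(y, x) for m, y in alts.items() if m != n)}
```

Flagging such dominated alternatives automatically would support the countermeasure mentioned in the text: pointing out or deleting inferior options.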

Enrichment of Geometric CAD Models for Service Configuration

Daniel Kloock-Schreiber1 and Lukas Domarkas and Paul Christoph Gembarski and Roland Lachmayer

Abstract. The boundaries between products and services are vanishing, and offers such as hybrid product-service bundles are becoming increasingly important. These solutions are referred to as Product Service Systems (PSS), which address individual customer needs as problem-oriented solutions.
In order to enable the configuration of such systems as well as the planning and support of services on the basis of a holistic model, a data model that contains both product and service information as well as their dependencies is required. For this purpose, existing CAD models must be extended with further information, e.g. maintenance intervals of parts, costs of the parts, or index numbers for the complexity of installation. This paper shows how to enrich a CAD model by integrating information into it and creating an interface with Excel. Thereby, the model can be used by different actors in the PSS for configuration and development, service planning, and the support of service technicians. Finally, the approach for enriching a CAD model is implemented using the example of an engine test bench.

1 INTRODUCTION

In mechanical and plant engineering it is becoming increasingly difficult for a company to distinguish its offering from competitors only by technical product features [14]. A development from recent years is extending and strengthening the (after-sales) service activities. Since service should not be seen as an add-on, in order to leverage its full economic potential a joint development of product and service is beneficial. In the scientific literature, this is introduced and discussed under the term "Product-Service Systems" (PSS) [18, 33]. The literature describes PSS as solutions that meet individual customer needs, regardless of whether the value proposition and revenue are primarily achieved through the product or service components [30, 32]. PSS may be regarded as customer-specific problem solutions. As such, relations between product and service components must be taken into account during development. In order to reduce development and adaptation costs, the configuration of PSS is a possible way [1, 3, 15, 17].

1.1 Motivation and Aim

Due to the conceptual similarity of the enterprise types PSS and MC, PSS can be understood as an MC offer, and thus MC development processes and modeling tools can be applied to PSS [8]. One of the key principles of MC is solution space modeling. The development and configuration of PSS can benefit from MC techniques like choice navigation and solution space modelling [22]. In order to deal with the upcoming complexity and to allow co-creation between PSS supplier and consumer, the application of Knowledge-Based Engineering (KBE) and the implementation of reasoning mechanisms into product models is a promising approach [7, 12].

In the area of Mass Customization (MC), solution spaces and product configurators for physical products have already been described; furthermore, there are already approaches to service configuration. For example, there are papers dealing with the bidding process and configuration ([10]) and the effects in assemble/make-to-order up to engineer-to-order situations ([31]). In this article, however, the focus is on services that occur at a later point in time; the service is regarded as a component of the usage phase of products (e.g. maintenance and repair as well as documentation of existing product versions).
With the Service Explorer, Sakao provides a computer-aided service modelling tool based on a provider-consumer system. The main point of this approach is to change the state of the receiver. In the system, the requirements and condition of a buyer are first modeled, and transformation rules are designed based on these [21], but without effect on or direct dependence to the physical product model.
In the PSS literature, rule-based and case-based configurators can be found (e.g. in the work of Laurischkat [16]), but a model-based configuration for PSS is missing [28]. For such a configuration, a parametric model is needed that represents the product and service parts of a PSS and also documents all their dependencies. Using a rule-based or case-based configurator without a parametric model leads to a very high effort in the creation, or to the fact that the configurators only operate with a small data base and therefore cannot use their strengths, or only use them to a limited extent.
As mentioned by Wagner [35], it is an important prerequisite for the development of PSS to adequately combine product and service parts with all their dependencies [35]. In the area of MC and configuration, there are existing domain models which are suitable for the development of solution spaces for products.
Important factors for the design of PSS are the coequal development of product and service and the addressing of individual customers and their needs. To realize a coequal product and service development as well as the configuration of the system for service planning and support, an enriched CAD model is a promising approach. Such a CAD model can be the start for a constraint-based model which includes the data about the physical product as well as service data. Beside the CAD model, this service data is part of a modeling language and process model of the service. They map the service processes and are an important prerequisite for meeting the requirements of the generation, customizing, and configuration techniques [23].
Literature describes PSS as solutions that meet individual As mentioned by Wagner [35], it is an important prerequisite for the customer needs, regardless of whether the value proposition and rev- development of PSS to adequately combine product and service parts enue are primarily achieved through the product or service compo- with all their dependencies [35]. In the area of MC and configuration nents [30, 32]. PSS may be regarded as customer specific problem existing domain models which are suitable for the development of solution. As such, relations between product and service components solution spaces for products. must be taken into account during development. In order to reduce Important factors for the design of PSS is the coequal development of development and adaptation costs, the configuration of PSS is a pos- product and service and the addressing of individual customers and sible way [1, 3, 15, 17]. their needs. To realize a coequal product and service development as well as the configuration of the system for service planning and sup- port, an enriched CAD model is a promising approach. Such a CAD 1.1 Motivation and Aim model can be a start for a constraint-based model which includes the Due to the conceptual similarity of the enterprise types PSS and MC, data about the physical product as well as service data. Beside the PSS can be understood as a MC offer and thus MC development CAD-model this service data is part of a modeling language and pro- processes and modeling tools can be applied to PSS [8]. One of the cess model of the service. They map the service processes and are the key principles of MC is the solution space modeling. The devel- an important prerequisite for meeting the requirements of the gener- opment and configuration of PSS can benefit from MC techniques ation, customizing, and configuration techniques [23]. 
An approach how this can be build on CAD-model and extended with 1 Leibniz University of Hannover, Institute of Product Development, email: the event-driven process chain (EPC), will be shown in this paper. [email protected]

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
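The core idea argued for in the introduction, a single parametric model that carries both the product parameters and the service options that depend on them, can be sketched in a few lines of Python. This is purely illustrative (all names are invented); the implementation presented later in the paper uses Autodesk Inventor with iLogic and an Excel interface:

```python
# Toy sketch of a parametric model coupling product and service parts.
# All names are hypothetical; the paper's system uses Inventor/iLogic + Excel.

product = {"engine_size": "large", "brake": "electric"}

# Service availability expressed as constraints over product parameters,
# so product and service dependencies live in one model.
service_rules = {
    "brake_inspection": lambda p: p["brake"] in ("electric", "hydraulic"),
    "engine_overhaul":  lambda p: p["engine_size"] in ("medium", "large"),
}

def available_services(p):
    """Return the services whose constraints the configuration satisfies."""
    return sorted(name for name, ok in service_rules.items() if ok(p))

print(available_services(product))  # both constraints hold for this configuration
```

Changing a product parameter (e.g. a smaller engine) immediately changes which services the model offers, which is exactly the dependency documentation the paper asks for.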

The EPC is a modeling language that can be parameterized within limits and is therefore well suited to extend a (physical) CAD model, because CAD and EPC can be modeled similarly and knowledge can be integrated into both in a similar way (e.g. by formulas and rules). The EPC can be used with single extensible templates up to the parameterization of the displayed services. The discussion and outlook take a look at how this approach can be transferred into a domain approach with a constraint-based model.

1.2 Structure of the Paper

In the following section 2, the theoretical background to PSS from the literature is presented, as well as an overview of geometry-based solution space modeling in modern CAD systems, service modeling, and existing approaches for PSS configuration. Based on this state of the art, section 3 describes the enrichment of CAD data, how a data model can be built up, and how this model can be used for service modeling and for the support of service planning and execution. Afterwards, in section 4, an exemplary implementation for a HIL (hardware in the loop) test bench provider, which offers engine test benches, is described. The paper concludes with a discussion and an outlook on further research potentials in section 5.

2 THEORETICAL BACKGROUND

2.1 Product Service Systems

The literature agrees that the quality of a PSS is influenced by the structure of the PSS development process [32]. In order to respond to individual customer needs, and since a combined product and service development is necessary, a PSS-specific development is required. The literature agrees on this, but the existing approaches remain mostly vague and conceptual [4]. In addition, they are discussed using very simple or very concrete examples, which makes it difficult to transfer them to relevant applications [8]. The multidisciplinarity of PSS, which involves researchers from different fields of interest, is a challenge for its research and development. With respect to the evaluation of existing approaches, none of them can be regarded as a generally accepted and standardized approach to the development of PSS [9]. However, based on literature studies (documented in earlier papers [8, 24]) on the existing characterizations, theses and approaches in PSS design research, the following main implications for PSS development can be identified [26]:

• coequal development of product and service components
• integration and addressing of individual customers and their needs
• monitoring and addressing of the customers' requirements during the whole life-cycle of the PSS

2.2 CAD-based Solution Space Modeling

A parametric CAD system, in contrast to rigid (conventional) geometry modeling, is able to represent a solution space. To do this, knowledge must be explicitly translated into digital prototypes. This is made possible by parametric systems in particular by the fact that mathematical and logical constraints and boundary conditions can be defined between the parameters in a CAD system. For development, the designer must not only specify the product shape, but also the variant design and the associated control and configuration concept for the components. Thus a solution space is described by the developer [27, 11].

In addition to the above-mentioned CAD systems (conventional and parametric), VDI Guideline 2209 [34] includes two other types of CAD systems that provide additional functionality for creating variable geometry models and mapping design knowledge (see figure 1). Feature-based systems are an extension of parametric CAD systems.

Figure 1. Overview of the principles of 3D modeling [34]

A feature consists of several geometric elements with parametrics and behavior rules and can be understood as a semantic information object [11]. Features can (to a limited extent) adapt to their environment.

The fourth principle is knowledge-based engineering (KBE), with the ability to draw conclusions from the current design situation (geometrical as well as background information). It aims at the automation of routine design tasks. To realize this, two different knowledge categories have to be considered, which are shown in figure 2: domain knowledge and control knowledge.

The domain knowledge describes a solution space built up with constraints (e.g. dimensioning formulas that constrain parameters of the CAD model), templates (as reusable building blocks), parameter tables, features, design rules or grammars. In this solution space a suitable solution for a design problem may be found [20, 5].

The control knowledge determines the way a solution space is explored. In the literature it is referred to as inference and reasoning techniques used to adapt the system to new or changed requirements. Basically, three different techniques may be used [12, 20]:

• Rule-based reasoning: Rules are executed procedurally and can invoke subordinate rules or delete them from the working storage in order to realize more complex tasks. The knowledge representation is based on IF-THEN-ELSE statements.
• Model-based reasoning: The possible solution space is described as a constraint-based physical and/or logical model or by the representation of allocation and resource consumption.
• Case-based reasoning: The knowledge is not explicitly modeled as a constraint-based or rule-based model. The knowledge necessary for reasoning is stored in examples (formerly approved solutions). A simple case-based reasoning system can assort a set

of cases which represent the best fit, or retrieve single already existing cases. Highly developed systems can mix or alter existing cases and adapt them to new situations.

Figure 2. Knowledge Modeling in KBE and KBD [25]

2.3 Service Modeling

In service development there exists only little software support compared to product development. For displaying services, diagram-based methods are used (these can be data-flow based, object-oriented or control-flow oriented). For service modeling, the documentation and presentation of the processes is necessary, as well as further information like the data needed in the process and the involved organizations or people. Goals of the modeling are the targeted detection of weak spots, which can be media breaks within a process, or the analysis of certain properties of the processes (for example, throughput times or the costs of a process (activity costing)). Furthermore, the simulation of processes is possible with information about the included activities as well as further information (e.g. throughput or set-up times) and an exact process description [6, 19].

For this, modeling languages become more and more important. They are also seen as a relevant enabler to fulfill the requirements of generating, customizing and configuration techniques [23]. The event-driven process chain (EPC) is such a modeling language. It is based on approaches of stochastic network procedures and Petri nets and is the central modeling language of the architecture of integrated information systems (ARIS) [6].

Originally the ARIS approach provided a framework for the modeling of computer-aided information systems. It offers a generic methodological framework which allows a holistic view on process design, management, workflow and application processing. In figure 3 the ARIS-house is shown; it contains five different views on the modeling language with their parts and extensions. The organization view represents the resources required to execute a function. The data view contains the information objects that are required or arise during the transformation process. The functional view shows the processes that transform input into output performance, as well as the goals related to the single functions. The performance view includes the structural design of the tangible and intangible input and output performance required or created in the transformation process. The control view is the central view, which combines the elements of the four other views and their relationships [23].

The central part of the house is the EPC, a process model which presents the process-related relationship of functions. The functions are represented by function blocks; they are triggered by an event and result in another event (events are represented by event blocks). The functions and events are linked by control flows and the connectors AND, OR and XOR. Beside these fundamental parts, the EPC can be extended by further information, as already shown in the ARIS-house (see fig. 3) [13].

Figure 3. The five ARIS views (ARIS-house) based on Scheer and elements of the EPC [23]

Beside the EPC, there are other modeling languages, but they are not as accepted as the EPC or are limited in mapping information about the used infrastructure or resources. The EPC is promising for use in combination with CAD models; thus the characteristics of the service parts can be developed by means of the EPC [25].

2.4 PSS Modeling

Holistic development systems for PSS currently do not exist; predominantly the domains are processed side by side. Integration takes place through allocation mechanisms, e.g. simple combination matrices or simple rule-based configuration systems. The configuration of a PSS is an important part of the development in order to meet individual customer needs with a reasonable amount of work [8]. Existing approaches of PSS configuration discussed in the literature are presented in the following. An approach based on the idea of modularization, which uses combination matrices and focuses on the possible product and service architectures for PSS, is presented by Aurich et al. [2]. The configurability (of service components) of PSS, based on configuration rules (if-then rules) or decision tables, is part of the approach of Laurischkat [16]. She specifies that a generation (equivalent to a configuration) of PSS can be made out of five basic PSS

types. Bochnig et al. [3] introduced a CAE tool in which variants are generated by combining existing PSS modules; this is part of an integrated PSS development approach. An approach to develop an industrial PSS with predefined blocks (which are predominantly product components) is presented by Mannweiler [17].

The approaches documented in the PSS literature use only two of the three reasoning techniques for CAD-based solution space modeling: they use either rule-based or case-based techniques. For a model-based configuration of PSS, a constraint-based PSS model is necessary [25]. An approach which is a helpful starting point to develop a constraint-based model is the approach of Steinbach [26]. He adapts the definitions of characteristics and properties of Weber's Characteristics Properties Modeling / Properties Driven Development (CPM/PDD) approach to PSS and extends the model with internal relations of product and service parts [29]. With this approach a schematic documentation of the PSS is possible, which can be transferred into a CAD model. How such a model can be built up is shown in the following.

3 CAD MODEL ENRICHMENT

The requirements for the development of PSS have already been mentioned: the coequal development of product and service components, the integration and addressing of individual customers and their needs, and the monitoring and fulfillment of these during the entire life cycle. An important step for the coequal development of PSS is the integration of existing development tools. In the present work, parametric and knowledge-based CAD was linked with the EPC, and thus a tool for the development of PSS was set up. In this tool, services can be developed and planned depending on the physical product, and the effects of services on the physical product can be documented. To implement this, simple references, formulas, matrix operations and hierarchical decision structures are used. Essentially, no additional tools are required, and the approach can be implemented using Autodesk Inventor 2017 (as CAD environment) with an Excel integration. The Excel-Inventor combination is sufficient and is used to keep the creation effort within limits.

Additionally, the enriched CAD model is a tool which helps to reach the third requirement, because it can be used for the monitoring and fulfillment of customer needs during the PSS life: the data model provides a representation and documentation of the product and service interfaces, as well as the documentation of changes on parts and their impact on other parts. Furthermore, the model helps to ensure a smooth exchange of information between the individual departments in the development of PSS components (product components or services). The structure of this CAD model is described in the next section.

3.1 Model Structure

The data model allows the configuration of a PSS as well as the support of service planning and the assistance of service technicians. To realize this, a CAD model is built up and enriched with additional data in the CAD environment, together with an interface to a spreadsheet program in which data is stored and calculations are executed. The structure of the model and parts of the PSS is divided into four main areas (shown in figure 4) in which information can be stored, entered and retrieved. The different actors in the PSS have access to the model in different places. The areas which can be identified in the data model are the product configurator, the CAD model, the product database and the service register.

Figure 4. Structure of the data model

3.2 Configurator

The area which is the starting point for realizing customer-oriented solutions is the product configurator. It works like known configurators for physical products and helps to adjust the system in a first step to the customer needs. To create configurations, a master CAD model is created in Inventor which contains the master parameters. In order to control the model with these parameters and to create reasonable configurations in Autodesk Inventor, the Inventor modeling language iLogic is used. In figure 5 the iLogic code for activating and deactivating parts of the model is displayed.

Figure 5. Part of the iLogic code for the configurator

The master parameters are embedded in Excel. By using the provided interface between Inventor and Excel, a feasible realization of the configuration control can be achieved without additional coding or external software. By varying the master parameters in Excel, a 3D model of the entered configuration is instantaneously created by updating the master CAD model and saving it as a new configuration. Since the topic of the paper is located in the area of PSS and not

restricted to pure products, the configurator has been extended to implement the services as well. The parameters in the CAD model are not tied explicitly to physical properties. With the help of non-physical parameters and the tools of the programming languages VBA (Excel) and iLogic (Inventor), services are also incorporated in the configurator and the CAD. Like the physical parameters, non-physical parameters include constraints to ensure the compatibility of the system. For example, commands like "if component A is chosen, the following services are available" are used. Such constraints can be based on physical (component) as well as economic reasons. A maintenance of a more cheaply produced product might for example be possible, but not reasonable from an economic viewpoint, because over the whole lifetime the maintenance cost will be higher than the cost of a product with a lower need of maintenance. The implementation of services in a system thus depends on different factors which need to be considered.

3.3 CAD Model

The CAD model consists of the already mentioned master parameters and of slave parameters which adapt automatically depending on the user input, thus influencing the existence and geometry of components. The parameters need to be entered in a specific manner, including the parameter name, unit, value and other optional fields. The number of parameters for each part is not limited and not restricted to geometric parameters; parameters like neighborhood relations, number of parts, installation sequences or tightening torques can also be derived from CAD models (these can also be transferred to the EPC to elaborate services; more in [25]). Once the part parameters are activated, they can also create and modify other databases such as a parts list in Excel. To create and modify data through an interaction of product and service, a communication is needed.

To enable a communication between users (here service technicians) and Autodesk Inventor, the software provides forms. They can be used to extend the user interface and allow the user to view and enter information or perform actions. The forms can be created using an editor integrated into the CAD system and are usually associated with iLogic rules. The forms included in the system presented in this paper are "request job information", "identify spare parts" and "input retrieved information"; all of them realize the communication with a service technician and either provide information from the CAD model for the service or transfer information from the service to the product data.

The first form, request job information, allows the technicians to recall information relevant for their next job by entering their identification number (ID) in an input field of the form. The data is stored in the service register, and the iLogic code presents all of the relevant information from the cells of the newest job to which they have been assigned and which suits the technician (capabilities/requirements comparison). The direct connection to the product data is provided by the next form, which supplies the second support step for the technician. The form "identify spare parts" allows the user to see the part dependencies of any component of the product (e.g. complexity of installation, neighborhood relations). Additionally, due to the hierarchical structure of the part database, a simple combination of iLogic and VBA allows the identification of spare parts linked to the part whose ID has been entered in the form.

The first two forms are used to supply data to technicians, while the third form is used to return data collected during the service. In the form "input retrieved information" the entered information gets translated to parameters which automatically update specific cells in an Excel sheet. This allows the maintenance personnel to update the product database with information depending on what has been done. If a part has been replaced, the stock, the status, the installation date and more will update.

3.4 Product Database

The product database was already mentioned in the sections on the configurator and the CAD model due to the internal relations of the data model. The product database contains the product-related data for individual products, so it implements the digital twin of the existing PSS. In contrast to the CAD model, the data stored here is not order-neutral. The database is created parallel to the CAD model; the parts and information of the generated configuration are stored in another Excel sheet. All of the components get automatically listed in a structured hierarchy, splitting assemblies into sub-assemblies and single parts. In such a way the dependencies between the components are easy to identify. The database gets filled out with relevant information such as amount installed, stock left, provider information, order date, maintenance interval, required maintenance certifications and more.

3.5 Service Register

Beside the product database exists the fourth area, the service register, because for the PSS concrete services need to be implemented. For this purpose the service register, another database in the form of an Excel sheet, is created. The service job register includes information like the job description, job location, needed certification and the due date, listed in a structured manner. In addition, there is a cell to assign personnel to that specific job. To automate the assigning process, a calendar has been created in another Excel sheet. It includes the information about the availability and the certification of the personnel. Based on the input in the service register, a VBA code can easily identify suitable and available personnel at the push of a button.

3.6 Integration of the Views on the Model in a PSS

With the help of integrated iLogic commands, information from Excel sheets can be extracted and presented to the user in the form of a message box directly in the CAD software. Such a tool can, for example, improve the communication efficiency between departments that explicitly use a specific software and are dependent on it. In this case, the maintenance department could recall the relevant information for their job directly from the order stored in an Excel sheet (task date, problem description, task location). Built into iLogic, the function of a text box allows displaying any cell in an Excel sheet, which can be identified automatically if the sheet has a defined basic structure. Additionally, the manual identification of cells is possible, for example entering a part ID in a form and extracting the steps of the disassembly process. In this proposed system, the service personnel can retrieve relevant information entered by other departments. Concrete examples are provided in the next chapter as well.

The system proposed in this paper includes two roles representing some of the most typical branches of service: sales and maintenance. The advantages of the proposed model enrichment techniques can be applied to any branch; these two have been chosen as examples. The sales department is responsible for the service register, in which they fill out the cells based on the customer input. Here an access to the standardized Excel forms is necessary. The maintenance personnel is

in charge of the manual tasks in the company; the Inventor forms are intended for them as a simple but effective means of communication with the databases. In this way, the access of the maintenance personnel is automatically restricted to only allow modification of the data which is needed for or relevant to their job.
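A role restriction of this kind can be mimicked with a simple per-role field whitelist. The sketch below is a hedged illustration with invented role and field names; in the paper the restriction is realized implicitly, through Inventor forms that only expose the cells relevant to each role:

```python
# Sketch: each role may only write the fields relevant to its job.
# Role and field names are invented for illustration.

EDITABLE = {
    "sales":       {"job_description", "job_location", "due_date"},
    "maintenance": {"stock", "condition", "installation_date"},
}

def update(record, role, field, value):
    """Apply an update only if the role is allowed to touch the field."""
    if field not in EDITABLE.get(role, set()):
        raise PermissionError(f"{role} may not modify {field}")
    record[field] = value
    return record

part = {"stock": 4, "condition": "worn"}
update(part, "maintenance", "condition", "good")   # allowed
# update(part, "maintenance", "due_date", "...")   # would raise PermissionError
```

The same effect is achieved in the paper without code visible to the user: the form simply does not offer the foreign fields.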

4 APPLICATION EXAMPLE

To illustrate the system described above, the example of an industrial company producing test benches is used. It offers hardware-in-the-loop (HIL) test benches for load tests of engines. The following sections show concrete examples of the suggested CAD model enrichment techniques, their realization and their advantages.
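Before walking through the concrete configurator, the master/slave parameter mechanism of section 3 can be sketched as follows. This is an illustrative Python stand-in for what the iLogic rules do inside the master model; the parameter names and sizing values are invented, not taken from the paper:

```python
# Sketch of master parameters driving dependent (slave) parameters,
# as iLogic rules do in the Inventor master model. Values are invented.

def configure(engine_size, brake):
    """Derive dependent geometry from the master parameters."""
    if engine_size not in ("small", "medium", "large"):
        raise ValueError("unknown engine size")
    # Slave parameters adapt automatically, keeping the model compatible:
    slot_table_mm = {"small": 800, "medium": 1000, "large": 1200}[engine_size]
    screw_grade = "10.9" if engine_size == "large" else "8.8"  # heavier loads
    return {
        "slot_table_mm": slot_table_mm,
        "screw_grade": screw_grade,
        "brake": brake,
    }

print(configure("large", "electric"))
```

The user only ever sets the master parameters; everything else follows from the rules, which is what makes the configurations compatible by construction.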

4.1 Configurator

The Excel configurator contains four parameters embedded in the CAD model. The parameters describe the existence of the following engine test bench parts or their size:

• An engine replacement cart
• A conditioning equipment
• An electric or a hydraulic brake
• Three different engine sizes

The input form of the Excel configurator for these parts is shown in figure 6, as well as two models of the test bench (without displaying the engines).

Figure 6. Configurator for test bench parts

With the mentioned parameters, 16 different product configurations can be created initially. Since the CAD model is built parametrically, compatibility is automatically ensured. For example, if a larger engine size is chosen, heavier loads and dimensions have to be accommodated. For this, the support structures for the engine adapt their position and the slot table reduces or expands its size (the dependent parameters are programmed in iLogic). Also, standard parts like screws are replaced if the allowed loads are exceeded. The parameters and their dependencies in the configuration are shown schematically in figure 7. Newer versions of Inventor even have a function for automatic standard part replacement using the material library directly, thus reducing the programming effort.

Figure 7. Parameters of the configurator

In parallel with the CAD model, the database of the configuration is generated in an Excel sheet. This sheet contains all of the components, which get automatically listed hierarchically, splitting assemblies into single parts (for example, the engine transportation cart gets split into the profiles for the frame, wheels, screws and bolts). In this way the dependencies between the components are easy to identify. The database table gets filled with information such as amount used, stock, provider information, order date, maintenance interval and more.

4.2 Maintenance and Repair

Until now, the configurator does not differ much from other already widely used configurators. In the following, non-material factors have been implemented in the CAD model, thus expanding the functionality of the configurator described above. To upgrade the product to a PSS, the maintenance and repair of the products are integrated into the system. If a maintenance need is known, qualified personnel must be employed to perform that maintenance. To achieve this, the CAD data is extended by an Excel file in which a calendar has been created on an Excel sheet showing the availability and the qualification of the personnel. When a maintenance need arises and a customer contacts the service department, a number of items are identified: the description of the problem, the identification of the product, preferred maintenance dates and other information. A VBA code can be activated in the Excel file at the push of a button, identifying all available and qualified employees from the calendar for the defined date and duration. If no employee is found, the system will notify the user and suggest a different date or duration. With this system, the customer can be given a confirmation for a specific date during the call, which increases communication effectiveness.

Figure 8. Job management in the PSS model
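The behaviour of the VBA routine described above, matching the requested date against the personnel calendar and the required certification and suggesting an alternative date if nobody is free, might look as follows. This Python sketch uses an invented in-memory data layout; in the paper the calendar lives in an Excel sheet and is queried by VBA:

```python
# Sketch of matching qualified, available personnel to a maintenance job.
# The data layout is invented; the paper keeps it in Excel, queried via VBA.

employees = [
    {"name": "A", "certs": {"engine"}, "busy": {"2019-09-18"}},
    {"name": "B", "certs": {"engine", "hydraulics"}, "busy": set()},
]

def find_personnel(date, required_cert):
    """Return employees holding the certification and free on the date."""
    return [e["name"] for e in employees
            if required_cert in e["certs"] and date not in e["busy"]]

def suggest(date, required_cert, alternatives):
    """If nobody is free, fall back to the first alternative date with a match."""
    hits = find_personnel(date, required_cert)
    if hits:
        return date, hits
    for alt in alternatives:
        hits = find_personnel(alt, required_cert)
        if hits:
            return alt, hits
    return None, []

print(suggest("2019-09-18", "engine", ["2019-09-19"]))
```

The fallback search is what lets the service department propose a different date to the customer during the same call.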

Figure 8 shows an exemplary extract from the Excel spreadsheet, both from sheet 1 with the existing orders and from sheet 2 with the employees and the employee-specific information, and additionally a communication window. When the maintenance order is placed, the submitted data is stored in a separate Excel spreadsheet for the maintenance work. This information can now be retrieved by service personnel using a function integrated into the CAD model in Inventor. This is done using an iLogic code that displays a message box containing the information entered by Sales in the Maintenance Excel table.

In addition to planning service, an important part of the data model shown here is supporting the maintenance technician in the execution of his work. For this purpose, forms have been created in Inventor that realize the communication interface between the technician and the data model. The technician uses the first form to request an order, and Inventor provides the relevant information, including the order number, the date, information about the customer, the existing configuration of the hardware and the order description. Figure 9 shows the dialog windows of this form.

Figure 9. Job information providing for service technicians

The second form provides the technician with additional information about the maintenance task. With a relatively complex product like an engine test bench, it can be hard to identify parts that might also be defect or need a replacement before a failure analysis has been performed. To assist in such a task, another configurator has been created in the Inventor environment. First, a button with an iLogic code has been created in Inventor which retrieves all the relevant job information that the sales engineer entered during the call with the customer. If the customer was able to identify the broken or to-be-maintained component, it is then also included in the retrieved data. By entering that part or assembly number in another Inventor form, a second iLogic code gets activated, identifying that very same part/assembly in the data bank created at the beginning. Due to the hierarchical structure of the data bank, the subcomponents or subassemblies of that part can be identified and communicated to the user in the form of a message box. Dynamic machines often use parts that need to be replaced after every disassembly, like special anti-friction bolts that cannot be reused due to the glue layer on the thread. Taking the exact amount of the right type of bolts along to the maintenance could save an additional trip or delay trying to get the needed bolts. The effectiveness of this tool obviously depends on many factors like the experience level of the maintenance personnel, the product type and its complexity, the detail of the problem description, etc.

The third form used by maintenance technicians implements a data feedback into the system and thus realizes the essential function of information feedback from the service into the model of physical components. If the engine test bench has been repaired, serviced or parts have been replaced, the service personnel fill out a form in Inventor. Input information is the identification of parts, the activity performed, the date of the activity, the reason for the activity, the new condition of the product and the proposed future activities, including any other parts that have not been repaired or maintained but may require attention. An iLogic code sends this information to the parts database in Excel and updates the relevant cells. For example, if a part has been replaced, the inventory will be reduced, the implementation date will be updated, it will now indicate that the part has been replaced once, and its condition will be set to "good". If a part has been replaced too often, or if it has been replaced before the end of its life, it may be a reason for a more detailed investigation of why this is happening. In this way, important information is exchanged immediately and automatically across different hierarchical levels (from maintenance personnel to project managers). Figure 10 shows the data feedback parameters that are transferred to the documentation of the product (its digital twin).

Figure 10. User parameter of the PSS

5 DISCUSSION AND CONCLUSION

In the context of this article, an approach was presented that shows the implementation of a parametric PSS data model based on a CAD application. This was applied to the example of an engine test bench and the advantages of the model were worked out. Although CAD documents usually represent order-neutral data, this approach makes it possible to extend the CAD and create digital

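The feedback rules described above (reduce inventory, record the date, count replacements, reset the condition, flag suspiciously frequent replacements) can be sketched as a small update function. This is a hypothetical Python stand-in for the authors' Excel/iLogic setup; all field names and the replacement limit are invented.

```python
# Hypothetical sketch of the service-feedback update: replacing a part
# reduces inventory, records the date, increments the replacement counter,
# resets the condition to "good", and flags parts replaced too often.
from datetime import date

parts_db = {
    "anti-friction-bolt": {
        "inventory": 10, "replacements": 0,
        "condition": "worn", "last_service": None,
    },
}

def record_replacement(part_id: str, service_date: date,
                       replacement_limit: int = 3) -> bool:
    """Update the part record; return True if the part needs investigation."""
    rec = parts_db[part_id]
    rec["inventory"] -= 1
    rec["replacements"] += 1
    rec["condition"] = "good"
    rec["last_service"] = service_date
    # frequent replacement may indicate a deeper design or usage problem
    return rec["replacements"] >= replacement_limit

flag = record_replacement("anti-friction-bolt", date(2019, 9, 18))
print(parts_db["anti-friction-bolt"]["inventory"], flag)  # 9 False
```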
twins based on the stored data of individual PSS models, which support the documentation, adaptation and execution of product and service components of the PSS during the life cycle phase.

The basic structure of a configurator makes it possible to initially respond to individual customer needs. By linking products and services in a model, it is possible to document changes to components and services and their effects. In this way the information exchange can take place without friction losses (caused by the translation into different models). Service planning and development also benefits from the model because it can use the information available in CAD about neighborhood relationships, number of components (e.g. screws) and additional information such as tightening torques, and transfer it to tools such as the EPC. The common data model not only provides a common communication basis, but also guarantees consistency in the model and enables view management of the configuration in the PSS.

The problem with the model is that it is still a relatively rigid model that is limited to a specific application case. In further research, this model will be transformed into an approach that allows constraint-based creation of models. There, the individual parts of the PSS are to be built within the framework of a constraint network, so that an optimization of the system to different boundary conditions (e.g. maintenance interval, costs, installation duration, remaining service life) is also possible.

ACKNOWLEDGEMENTS

This research was conducted in the scope of the research project SmartHybrid – Product Engineering (ID: 85003608), which is partly funded by the European Regional Development Fund (ERDF) and the State of Lower Saxony (Investitions- und Förderbank Niedersachsen NBank). We would like to thank them for their support.

REFERENCES

[1] J.C. Aurich, C. Fuchs, and C. Wagenknecht, 'Life cycle oriented design of technical product-service systems', Journal of Cleaner Production, 17, 1480–1494.
[2] J.C. Aurich, N. Wolf, M. Siener, and E. Schweitzer, 'Configuration of product-service systems', Journal of Manufacturing Technology Management, 20, 591–605.
[3] H. Bochnig, E. Uhlmann, and A. Ziefle, 'Assistenzsystem IPSS-CAD als informationstechnische Unterstützung der integrierten Sach- und Dienstleistungsentwicklung in der IPSS-Entwurfsphase', in Industrielle Produkt-Service Systeme, eds., Horst Meier and Eckart Uhlmann, 95–115, Springer Berlin Heidelberg.
[4] M. Boehm and O. Thomas, 'Looking beyond the rim of one's teacup: a multidisciplinary literature review of product-service systems in information systems, business management, and engineering and design', Journal of Cleaner Production, 51, 246–260.
[5] J.J. Cox, 'Product templates - a parametric approach to mass customization', CAD Tools and Algorithms for Product Design, 3–15.
[6] A. Gadatsch, Grundkurs Geschäftsprozess-Management, Vieweg+Teubner.
[7] P.C. Gembarski and R. Lachmayer, 'Designing customer co-creation: Business models and co-design activities', International Journal of Industrial Engineering and Management (IJIEM), 8(3), 121–130.
[8] P.C. Gembarski and R. Lachmayer, 'Product-service-systems - what and why developers can learn from mass customization', Enterprise Modelling and Information Systems Architectures, 13(16), 1–16.
[9] M. Grässle, O. Thomas, M. Fellmann, and J. Krumeich, 'Vorgehensmodelle des Product-Service-Systems-Engineering: Überblick, Klassifikation und Vergleich', Integration von Produkt und Dienstleistung - Hybride Wertschöpfung, 51, 246–260.
[10] D. Guillon, A. Sylla, E. Vareilles, M. Aldanondo, E. Villeneuve, C. Merlo, T. Coudert, and L. Geneste, 'Configuration and response to calls for tenders: an open bid configuration model', (09 2017).
[11] M. Hirz, W. Dietrich, A. Gfrerrer, and J. Lang, Integrated computer-aided design in automotive development, Springer.
[12] L. Hvam, N.H. Mortensen, and J. Riis, Product customization, Springer Science + Business Media, 2008.
[13] G. Keller, M. Nüttgens, and A.-W. Scheer, 'Semantische Prozessmodellierung auf der Grundlage ereignisgesteuerter Prozessketten (EPK)', Veröffentlichungen des Instituts für Wirtschaftsinformatik.
[14] Y. Koren, The global manufacturing revolution: product-process-business integration and reconfigurable systems, Wiley series in systems engineering and management.
[15] K. Kuntzky, Systematische Entwicklung von Produkt-Service-Systemen, Schriftenreihe des Instituts für Werkzeugmaschinen und Fertigungstechnik der TU Braunschweig, Vulkan-Verl.
[16] K. Laurischkat, Product-Service Systems: IT-gestützte Generierung und Modellierung von PSS-Dienstleistungsanteilen, number 2012,3 in Schriftenreihe des Lehrstuhls für Produktionssysteme, Ruhr-Universität Bochum, Shaker.
[17] C. Mannweiler, Konfiguration investiver Produkt-Service-Systeme, number 2014,1 in Produktionstechnische Berichte aus dem FBK, Lehrstuhl für Fertigungstechnik und Betriebsorganisation, Techn. Univ.
[18] O.K. Mont, 'Clarifying the concept of product-service system', Journal of Cleaner Production, 10(3), 237–245.
[19] M. Nüttgens and F.J. Rump, 'Syntax und Semantik Ereignisgesteuerter Prozessketten (EPK)', Prozessorientierte Methoden und Werkzeuge für die Entwicklung von Informationssystemen - Promise.
[20] D. Sabin and R. Weigel, 'Product configuration frameworks - a survey', IEEE Intelligent Systems, 42–49.
[21] T. Sakao, Y. Shimomura, E. Sundin, and M. Comstock, 'Modeling design objects in CAD system for service/product engineering', 41(3), 197–213.
[22] F. Salvador, P.M. De Holan, and F. Piller, 'Cracking the code of mass customization', MIT Sloan Management Review, 50, 71–78.
[23] A.-W. Scheer, ARIS - Vom Geschäftsprozess zum Anwendungssystem, Springer.
[24] D. Schreiber, P.C. Gembarski, and R. Lachmayer, 'Data models for PSS development and configuration: Existing approaches and future research', World Conference on Mass Customization, Personalization and Co-Creation (MCPC 2017), 9.
[25] D. Schreiber, P.C. Gembarski, and R. Lachmayer, 'Developing a constraint-based solution space for product-service systems', International Conference on Mass Customization and Personalization - Community of Europe (MCP-CE 2018), 8.
[26] D. Schreiber, P.C. Gembarski, and R. Lachmayer, 'Modeling and configuration for product-service systems: State of the art and future research', International Configuration Workshop (CWS 2017), 19.
[27] J.J. Shah, 'Designing with parametric CAD: Classification and comparison of construction techniques', Geometric Modelling - Proceedings of the Sixth International Workshop on Geometric Modelling, 6, 53–68.
[28] D. Spath and L. Demuß, 'Entwicklung hybrider Produkte - Gestaltung materieller und immaterieller Leistungsbündel', in Service Engineering, eds., Hans-Jörg Bullinger and August-Wilhelm Scheer, 463–502, Springer-Verlag.
[29] M. Steinbach, 'Systematische Gestaltung von Product-Service-Systems: integrierte Entwicklung von Product-Service-Systems auf Basis der Lehre von Merkmalen und Eigenschaften'.
[30] F. Sturm, A. Bading, and M. Schubert, Investitionsgüterhersteller auf dem Weg zum Lösungsanbieter: eine empirische Studie; fit2solve, IAT, Stuttgart.
[31] A. Sylla, D. Guillon, E. Vareilles, M. Aldanondo, T. Coudert, and L. Geneste, 'Configuration knowledge modeling: How to extend configuration from assemble/make to order towards engineer to order for the bidding process', Computers in Industry, 99, 29–41, (08 2018).
[32] O. Thomas, P. Walter, and P. Loos, Konstruktion und Anwendung einer Entwicklungsmethodik für Product-Service Systems, Hybride Wertschöpfung, Springer, Berlin, Heidelberg, 2010.
[33] A. Tukker, 'Eight types of product-service system: eight ways to sustainability? Experiences from SusProNet', Business Strategy and the Environment, 13(4), 246–260.
[34] VDI, VDI Guideline 2209 - 3D Product Modelling, Beuth.
[35] L. Wagner, D. Baureis, and J. Warschat, Developing Product-Service Systems with InnoFuncs, volume 1, 2013.

smartfit: Using Knowledge-based Configuration for Automatic Training Plan Generation

Florian Grigoleit, Peter Struss, Florian Kreuzpointner
Technische Universität München
Boltzmannstr. 3, 85748 Garching b. München
{grigolei, struss}@in.tum.de, [email protected]

Abstract

The fitness industry has been booming for several decades, and there is an increasing awareness of the essential impact of physical exercise on health. Those who are interested in exercising usually lack detailed knowledge about how to do this in a way that is effective and appropriate. Existing apps mainly offer a set of standard training plans that do not take all relevant individual and contextual conditions into account. The resulting effect of following these apps may not only be ineffective, but even harmful to health. Properly designed training plans, as usually produced by an experienced trainer, must consider both individual goals and physical abilities of the trainees to avoid adverse effects. We developed smartfit as a knowledge-based system for generating training plans tailored to the individual trainee without requiring detailed knowledge. It has been developed as an application of our generic constraint-based configuration system GECKO, which generates optimal or optimized configurations that satisfy high-level user demands. We briefly introduce GECKO, present the application problem and the domain knowledge base, and discuss the evaluation of the current system and future work.

1 Introduction

Creating a training plan at home appears to be simple. A trainee chooses exercises and performs them. Usually, such an approach results in unsatisfactory training results. First, an average trainee lacks the necessary training knowledge. Second, background knowledge regarding health and training effects, implicitly included in a professional training plan, is either unavailable to the average trainee or too complex for him/her to include in a training plan. For these reasons, homemade training plans tend to be insufficient. The same applies to most training plans available on the internet, which consider only very few parameters like gender and training goal. This leads to an unsatisfactory training plan, which does not reflect the needs of the trainees.

To provide trainees with effective, customized training plans with a positive impact on health, we developed a knowledge-based solution based on GECKO (Generic, constraint-based Konfigurator) [9]: smartfit. smartfit is designed for trainees who want to create plans based on deep background knowledge and which cover the needs for individual personalized expectancies.

Creating a good training plan is a very complex task, consisting of selecting and parameterizing exercises based on user parameters and domain knowledge. This is analogous to configuring a system based on a repository of components (which are usually physical building blocks or software modules) [16], and, therefore, we base smartfit on GECKO. Together with researchers from sports and health sciences, we created a descriptive domain theory for fitness training. This domain theory is a specialization of generic GECKO concepts and a collection of constraints on their attributes.

In this paper, we focus on presenting the solution to configuring a plan for a single training session based on an initial version of the knowledge base. Section 2 introduces training science and motivates our work on generating training plans automatically. Next, we introduce our formalization of the configuration task and the key concepts of GECKO. The knowledge representation of fitness training is described in Section 4, while Section 5 evaluates the solution. Finally, we comment on our current work and some open issues.

2 Generation of Training Plans

Training science is a discipline of sport sciences focused on analyzing the effects of training stimuli on the human body. The effects of training can vary significantly. It can enhance aerobic capacity, increase flexibility, or improve strength abilities. Trainees can have several reasons for training, but all have one goal in common: they want to enhance their physical performance. One major insight of training science is that adaptation to training is highly individual regarding the trainee. The same training stimulus can have different effects on different trainees depending on the individual physiological capacity. Therefore, it is very important to train under optimal conditions with an appropriate training plan to have individual success. For this, a trainer has to select a set of exercises, the load (training weight) of the exercises, and the amount of rest between the sets and exercises. To create a plan for an individual trainee, he must consider parameters such as age or the individual fitness of the trainee, because an intensive exercise, e.g. burpees, is well-suited for young and fit trainees, but would overwhelm and potentially even harm beginners or elderly trainees. With this information, two important pillars are covered: What should I train, and how?

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Elite athletes perform highly individualized training, which makes them stronger and better. This is only possible because their trainers, scientific and medical advisors etc. possess the required specific knowledge. Common trainees do not have access to this knowledge. Therefore, with our work, we aim at collecting it in a knowledge base and making it exploitable in the generation of individual training plans without requiring the user to acquire the detailed domain knowledge him/herself. With this, we expect to contribute to making exercising more effective and satisfactory to sports amateurs and occasional practitioners. In this way, the trainees may avoid frustration and adverse results.

The starting point for creating a training plan is considering the aim of the trainee and breaking it down into desired improvements in various fitness categories, for example strength or endurance. In addition, further information about the trainee and the training conditions is necessary: age, height, weight, and gender as well as information about available equipment and training duration are required. Further questions include: are there any health or injury concerns? What is the desired or available training frequency per week? Which muscle groups have to or are desired to be trained?

Training planning includes arranging the exercises within a session appropriately (obviously, the warm-up should be before the main training). However, despite the term "planning", no elaborate planning in the sense of producing complex structural and temporal interdependencies of actions is required. The main task is selecting exercises from a repository and parameterizing them. Ordering them in an appropriate way is usually not a major problem and will not lead to the rejection of a set of exercises.

3 A Generic Knowledge-based Approach to Generating Optimized Configurations

The main objectives of our development of GECKO are
• a fairly domain-independent solution to configuration problems,
• based on a small set of generic concepts that support a clear structuring of the knowledge base,
• allowing its use without detailed domain knowledge,
• considering optimality criteria.
The creation of a knowledge base for a specific application system is done by providing
• domain-specific specializations of the generic concepts and specification of variables associated with them, and
• constraints of different types on these variables.
We first briefly introduce the concepts structuring the knowledge base, using smartfit to illustrate them, and then present the theoretical and algorithmic foundations (for more information, see [9]).

3.1 GECKO Concepts

The three key concepts underlying the system have a straightforward intuitive meaning.
• Component: the elements to be chosen and included in a configuration (in smartfit: exercises),
• Goal: achievements expected from a configuration; they can express high-level user expectations ("muscle gain") or detailed sub-goals the user is unaware of ("biceps hypertrophy"),
• Task: requirements and restrictions on the resulting configuration (including at least one goal specified by the user), such as available training equipment and physical properties of the trainee.

Figure 1 Constraint for configuration

Domain knowledge is expressed by constraints on (attributes of) subclasses or instances of these concepts. Figure 1 displays the different types of constraints. For instance, certain parameters characterizing the task may be incompatible with certain goals ("low body-mass-index excludes goal weight-loss") or exclude some components (an injury may prevent certain exercises). There are interdependencies among components (two particular exercises must not appear together in a session) or goals, which can be used to introduce a decomposition of goals ("muscle gain" requires "muscle gain of upper body" and "muscle gain of lower body"). GECKO offers the basic constraints "requires" (implication) and "excludes" (implication of negation) to represent this.

A key part of the configuration knowledge is related to the proper selection of components given the goals stated in or resulting from the task. GECKO uses the construct of Choice to represent this: it is a collection of components, each with an associated Contribution, a numerical or qualitative value. The contributions of the components included in a configuration are summarized by the choice to deliver a certain reached AchievementLevel. The goal which has this choice associated has an AchievementThreshold that needs to be reached by the AchievementLevel in order to be considered fulfilled.

Since a component may occur in several choices (an exercise affects several muscle groups), we obtain an m:n relationship between goals and components (and introduce the potential of a combinatorial problem). GECKO also uses choices of goals to express how sub-goals together achieve a higher-level goal.

Finally, goals may have an associated priority, which assures that the more important goals receive more contributions, and components have a cost. One important task parameter is a limit on the cost of the entire configuration, which typically (but not necessarily) is the sum of the components' costs. While cost will often really mean "money", in the training plan domain, time is the resource which is limited and consumed by the exercises.

Contributions and cost are the factors that allow characterizing the utility of selecting a component and, hence, for specifying optimal solutions and guiding a best-first search.

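The Choice construct and the basic constraints can be illustrated in a few lines of Python. This is a toy sketch, not GECKO code; the exercises, contribution values and threshold are invented.

```python
# Toy sketch of a GECKO Choice: components contribute to a goal's
# AchievementLevel, and the goal is fulfilled once its
# AchievementThreshold is reached. All values are invented.

# contributions of exercises (components) to the goal "biceps hypertrophy"
contributions = {"curl": 3, "chin-up": 2, "push-up": 0}

ACHIEVEMENT_THRESHOLD = 4

def achievement_level(active_components):
    """Combine the contributions of the active components (here: by summing)."""
    return sum(contributions[c] for c in active_components)

def achieved(active_components):
    return achievement_level(active_components) >= ACHIEVEMENT_THRESHOLD

# "requires" as a simple check over an activity assignment: x.active=T => y.active=T
def requires(x_active: bool, y_active: bool) -> bool:
    return (not x_active) or y_active

print(achieved({"curl", "chin-up"}))  # 3 + 2 = 5 >= 4 -> True
print(achieved({"push-up"}))          # 0 < 4 -> False
```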
3.2 Consistency-based Configuration

We formalize the configuration problem as identifying a subset of the components that satisfies the task specified by the user and is consistent with the configuration knowledge base ConfigKB. This can be seen as an assignment, AA, of activity to the components, which indicates the inclusion in or exclusion from the configuration.

Definition 1 (Activity Assignment):
An activity assignment for a set COMPS0 ⊆ COMPS is the conjunction

AA(COMPS0) = [ ⋀_{comp ∈ COMPS0} ACT(comp) ] ∧ [ ⋀_{comp ∈ COMPS \ COMPS0} ¬ACT(comp) ]

ACT(comp) is a literal which holds when a component comp ∈ COMPS is part of a configuration.

Definition 2 (Configuration Task):
A configuration task is a pair (ConfigKB, Task) where:
• ConfigKB is the knowledge base, containing the domain-specific objects and constraints,
• Task is a triple (TaskGoals, TaskParameters, TaskRestrictions) where:
  • TaskGoals is the assignment of goal.Achieved = T to the set of user-selected goals that a solution to a configuration problem has to satisfy,
  • TaskParameters are domain-specific value assignments to parameters (constants),
  • TaskRestrictions are user-selected constraints on the activity of components.

To establish a solution to a configuration task, a set of active components has to be consistent with the task and the knowledge base.

Definition 3 (Configuration for a Task):
A configuration for a Task is an activity assignment AA(Γ) such that ConfigKB ∪ Task ∪ { AA(Γ) } is satisfiable:

ConfigKB ∪ Task ∪ { AA(Γ) } ⊭ ⊥

A configuration is minimal iff for no proper subset Γ′ of Γ, AA(Γ′) is a configuration.

Consistency seems to be a weak condition. After all, we want the configuration to satisfy the goals, not just be consistent with them. But this is ensured by the definition, as stated by the following proposition. Intuitively, if an activity assignment yields an AchievementLevel lower than the AchievementThreshold of a goal, it would be inconsistent with goal.Achieved = T as required by the task.

Proposition 1:
If AA(Γ) is a solution to a configuration task (ConfigKB, Task), then

AA(Γ) ∪ ConfigKB ⊨ ∀ goal ∈ TaskGoals: goal.Achieved = T

This view on configuration was inspired by the formalization of consistency-based diagnosis [1], [7], where modes OK or ¬OK are assigned to the components of a system, and a diagnosis is defined as a mode assignment MA(Δ) that is consistent with the model library, the structural description of the system and a set of observations (which are all sets of constraints, just like ConfigKB and Task):

ModelLib ∪ Structure ∪ Obs ∪ { MA(Δ) } ⊭ ⊥

In consequence, solutions to consistency-based diagnosis can also be exploited for generating configurations. This includes the introduction of a utility function and the application of best-first search to generate solutions.

3.3 Search for Optimal Configurations

In consistency-based diagnosis, a utility function is often based on probabilities of component modes (assuming independent failures of components) ([7]) or, weaker, some order on the modes ([13]). In GECKO, we consider the contributions of components to the satisfaction of goals (possibly weighted by priorities of goals) and their cost.

Definition 4 (Utility Function):
A function h(AA(Γ), Task) is a utility function for a configuration problem iff it is admissible for A* search.

Definition 5 (Optimal Configuration):
A configuration AA(Γ) is optimal regarding a utility function h(AA(Γ), Task) iff for no configuration AA(Γ′), h(AA(Γ′), Task) is larger.

The utility of a configuration represents the fulfillment of the required goals and the cost of the configuration. The utility depends on its active components only. In the following, it is assumed that
• the contribution of a configuration is obtained solely as a combination of the contributions of the active components included in the configuration and is otherwise independent of the type or properties of the components,
• the cost of the configuration is given as the sum of the costs of the involved active components and will usually be numerical, and
• we can define a ratio "/" of contributions and cost.

The first defined function sums up the AchievementLevels (i.e. the combined contributions of all active components), multiplied with a weight dependent on the goal priority, over all active goals and divides this by the cost of all active components. (In the definition, we simplify the notation by writing Goalj.AchievementLevel instead of Goalj.Choicej.AchievementLevel etc.)

Definition 6 (GECKO Utility Function):

h(AA(Γ), ActGoals) := [ ∑_{Goalj ∈ ActGoals} weight(Goalj.Priority) * Goalj.AchievementLevel ] / [ ∑_{Compi ∈ Γ} Compi.Cost ]

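Definitions 1-3 can be illustrated by brute-force enumeration of activity assignments over a toy component set. This is a sketch, not the GECKO solver; the knowledge base and task below are invented predicates over the set of active components.

```python
# Sketch of Definitions 1-3: a configuration is a subset of COMPS (an
# activity assignment) that is consistent with ConfigKB and the Task.
# ConfigKB and Task are lists of predicates over the active set -- a toy
# stand-in for the constraint representation.
from itertools import combinations

COMPS = ["curl", "chin-up", "burpee"]

# toy knowledge base: excludes(curl, chin-up)
config_kb = [
    lambda active: not ({"curl", "chin-up"} <= active),
]

# toy task: the goal is achieved iff at least one arm exercise is active
task = [
    lambda active: len(active & {"curl", "chin-up"}) >= 1,
]

def configurations():
    """Enumerate all activity assignments consistent with ConfigKB and Task."""
    for k in range(len(COMPS) + 1):
        for subset in combinations(COMPS, k):
            active = set(subset)
            if all(c(active) for c in config_kb + task):
                yield active

sols = list(configurations())
print(sols)
```

The minimal configurations in this example are the singletons {curl} and {chin-up}; adding "burpee" stays consistent but is not minimal.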
This function ignores an important aspect: if the AchievementThreshold of some choice has already been reached, the utility of adding yet another component with a contribution to this choice is overestimated. The second utility function tries to capture this by disregarding any excesses above the AchievementThresholds.

Definition 7 (GECKO Utility Function with contribution limit):

hl(AA(Γ), ActGoals) := [ ∑_{Goalj ∈ ActGoals} weight(Goalj.Priority) * CurbedLevel(Goalj) ] / [ ∑_{Compi ∈ Γ} Compi.Cost ]

where CurbedLevel is defined by

CurbedLevel(Goalj) := min(Goalj.AchievementLevel, Goalj.AchievementThreshold)

Based on this, we can exploit best-first search and solutions that have been developed in the context of consistency-based diagnosis. This includes pruning the search space based on inconsistent partial mode assignments that have been previously detected during the search (called conflicts), e.g. exploiting a truth-maintenance system (TMS, such as the assumption-based TMS [6]) as SHERLOCK does ([8]). Classical A* search has been extended and improved; starting from the diagnostic solutions, this approach has later been generalized as conflict-directed A* search, see [9].

4 smartfit

In this section, we discuss the ConfigKB for the domain of training plan generation. Of course, we can present only the basic principles and some typical examples for illustration purposes. We will provide some details on the scope and size of the knowledge base. The conceptualization and structure of the domain knowledge is nothing that can be extracted from a textbook or obtained directly from interviewing experts. It is the result of major knowledge acquisition efforts requiring several person years and involving sports scientists, professionals from the fitness business, and AI researchers. With this application, we support our claim that GECKO provides a basis for creating specific application systems by specializing the generic classes and providing a structured set of constraints, see Table 1. The presentation is restricted
• to generating a plan for one training session,
• as a set of exercises, i.e. without ordering them, and
• without their parameterization.

Table 1: Overview of specialized GECKO concepts

GECKO Concept    | Fitness Concept     | Example
Goal             | TraineeGoal         | Muscle Gain
                 | TrainingGoal        | Strength
                 | TargetGoal          | Biceps
Component        | Exercise            | Push-up
Task             | Training Request    | -
Task Restriction | ExerciseRestriction | Exclude(push-up)
Task Parameter   | TrainingDuration    | 90 minutes
                 | TrainingProperty    | Equipment
                 | TraineeProperty     | FitnessTarget.Biceps
Configuration    | TrainingPlan        |

4.1 smartfit's Essential Concepts

Goals
Goals in smartfit represent certain aspects or requirements a Trainee must fulfill to improve his/her fitness. The smartfit domain theory contains three hierarchically ordered types of goals:
- TraineeGoals are high-level goals selected by the user. They are also the only goals that the user has to be aware of. They represent an abstract achievement the user wants to achieve, e.g. weight loss or muscle gain.
- TrainingGoals represent a specific aspect of fitness training, e.g. strength training, under consideration of a TraineeGoal, i.e. strength training to support weight loss. The corresponding TrainingGoal is WeightLoss.Strength.
- TargetGoals: A TargetGoal represents a single fitness target, i.e. a body region or a muscle, to be trained. TargetGoals are RegionGoals, MuscleGroupGoals, and MuscleGoals, see Figure 2. The intensity with which it is to be trained depends on the corresponding TrainingGoal.

Figure 2: Goal-Structure for smartfit

Goals are organized in a hierarchical structure via requires constraints (Goal-Goal Constraints). Requires(x, y) is defined by

x.active = T => y.active = T

(for configuration constraints, see [9]). For example, the TraineeGoal MuscleGain would require the TrainingGoals MuscleGain.Strength and MuscleGain.Endurance. The TrainingGoals in turn require subordinate TargetGoals, as shown in Figure 4.

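Definitions 6 and 7 can be computed directly once contributions and costs are fixed. A sketch with invented numbers (the goal weights, levels, thresholds and costs are not from the paper):

```python
# Sketch of the GECKO utility functions (Definitions 6 and 7): goal
# contributions weighted by priority, divided by total component cost;
# the curbed variant caps each goal's level at its AchievementThreshold.

goals = {
    # goal -> (priority weight, achievement level, achievement threshold)
    "strength": (2.0, 7, 5),
    "endurance": (1.0, 3, 4),
}
component_costs = {"squat": 3, "run": 2}  # cost = exercise duration

total_cost = sum(component_costs.values())

def h(curbed: bool) -> float:
    num = 0.0
    for weight, level, threshold in goals.values():
        if curbed:
            level = min(level, threshold)  # excess over the threshold is ignored
        num += weight * level
    return num / total_cost

print(h(curbed=False))  # (2*7 + 1*3) / 5 = 3.4
print(h(curbed=True))   # (2*5 + 1*3) / 5 = 2.6
```

Note how the curbed variant lowers the utility of the over-achieved goal "strength", so adding yet another strength exercise would no longer look attractive to the search.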
33 strength exercises cannot be used for endurance training. An Ex- ercise contributes to a set of TargetGoals with (potentially) differ- ent levels of contribution, see Figure 5. Figure 6 shows Exercise and its associations. The exercise catalog currently contains 603 exercises. For the test cases in section 5, we used different subsets ranging from 10 to 500 exercises. Task A Task in smartfit is the triple of TraineeGoal (TaskGoal), and Trainee- and TrainingProperties (TaskParameters) and TaskRe- strictions. Task goals: one of the TraineeGoals, selected by the user. Figure 7 shows the TraineeGoals in the knowledgebase

Figure 4: Expanding Goal-Structure for MuscleGain

Figure 5: Component Contributions to multiple goals Figure 3 Relation Goal-Choice-Component

The expanded Goal-Structure in Figure 4 shows the body regions UpperBody and Core as lowest level. The granularity of this struc- ture depends on the associated TrainingGoal and, in some cases, on the TraineeGoal. While it is necessary for strength training to break down the body region into muscle groups, such as upper leg or abdominal region, and specific muscles, e.g. biceps or triceps, this is not the case for endurance training, for which either the en- tire body or body regions, e.g. legs, are sufficiently precise. What is important to note is that only the last level or TargetGoals is con- nected to exercises (components) via requiresChoice constraints.

As explained in [9], Goals can have priorities. In smartfit, priorities (domain={1,2,3,4,5}) indicate not (only) the importance of a goal, but the focus of the training. If the priority of the TrainingGoal strength is higher than the priority of endurance, more exercises and more time are required to achieve strength than for endurance.

The existing knowledge base contains 8 TraineeGoals, 24 TrainingGoals (3 fitness categories * 8 TraineeGoals), and 72 RegionGoals (3 body regions * 24 TrainingGoals). The priority of each goal is defined in the knowledge base. If the priority is changed, e.g. by the training focus (see Task), the increase/decrease is propagated downwards through the goal structure.

Components

Components in smartfit are Exercises. An Exercise is an activity in fitness training designed to train a FitnessTarget, such as the upper body. A configuration, i.e. a training plan, consists of a set of Exercises selected to achieve the TraineeGoals and their subordinate Goals. Most Exercises require preconditions to be satisfied in order to be performed, for example Equipment, such as dumbbells, and a minimum FitnessLevel, e.g. trained. Also, Exercises are associated with one fitness category, such as strength.

Figure 6: Exercise in smartfit

Table 2: Training and TraineeProperties

Parameter            Values
Age                  18-40; 40-55; 55-65; 65-75; >75
Sex                  Male; Female
Body-Mass-Index      <18; 18-25; 25-30; >30
Available Equipment  Machines; free weights; …
Fitness Level        Untrained; somewhat trained; trained; very trained
Working Position     Sitting; standing; overhead
Training duration    1, 2, 3, … (given in exercises per session)
Training Focus       Body regions: upper body, core, legs

The trainee (user) is represented by a set of properties, including age, working position, body mass index (BMI) and fitness level. The parameters and their domains are given in Table 2. Each fitness category and each body region is associated with a fitness level, e.g. fitnesslevel(strength.upperbody). Initially, the trainee either states a single value after a self-assessment or performs a series of fitness tests, which in the application determine the fitness level of each category. Later, with feedback on the performed training, the specific fitness levels are refined, so that the training plans become continuously more individual. For the training, the Trainee can state a set of TrainingProperties, specifying the parameters of the training; these currently include the available equipment, the training duration (given in exercises) and the training focus. The focus allows the user to increase or decrease the priority of the active TrainingGoals and RegionGoals.

4.2 Constraints

GoalComponentConstraints

As discussed earlier, exercises are linked to the lowest level of goals via component choices. A choice comprises all exercises that contribute to the respective goal and combines the actual contributions of the active exercises during the configuration process. A component choice is achieved if the combined contributions of all active exercises exceed the AchievementThreshold. The threshold depends on the priority of the goal requiring the choice. Figure 3 shows the relation between Goal, Component and Choice. The domain of the contributions is currently given by DOM(comp_i.contribution_i) = {20, 40, 60, 80, 100}. The utility of a TrainingPlan in smartfit depends on the contributions of the active exercises to required Choices. The AchievementThreshold of the Choices depends on the priority of the associated goal, with DOM(Priority) = {1, 2, 3, 4}:

AchievementLevel = combine(Goal_i.Priority, normThreshold)

The combine function for smartfit is the sum of all contributions to the choice. For each lowest-level TargetGoal a choice is created. There are up to 85 TargetGoals for each TraineeGoal (42 muscle goals for strength, 42 muscle goals for flexibility, and 3 region goals for endurance).

TaskParameterGoalConstraints

There are TaskParameters, e.g. the working position, that are associated with Goals and their priorities. In the current knowledge base, some TaskParameters can limit the priority of certain goals or exclude specific goals, i.e. prohibit their achievement. The latter is applied for injuries or health problems, such as back pain or a broken leg. For example, for a high BMI, the priority of strength goals is reduced, to avoid unhealthy stress on joints or the back, and increased for endurance goals to assist weight loss. So far, there are about 20 such TaskParameterGoalConstraints in the knowledge base. As soon as various health issues are considered, we expect this number to rise into the lower hundreds.

TaskParameterComponentConstraints

Trainee and training properties have a strong impact on the exercises to be selected, mainly by excluding large sets of exercises from the component catalog. The most important examples here are fitness level and equipment. Most exercises in the catalog do not require equipment and only a low fitness level. Thus, for a beginner with basic equipment, the majority (more than 60%) of the catalog is available. At the opposite end, for advanced exercises and special equipment, only a small subset (<20%) is available. Other examples are that certain TraineeProperties prohibit certain exercises or exercise types; e.g. a high BMI prohibits body-weight exercises. The number of constraints necessary to encode this is n*k, where n is the number of applicable exercises and k is the number of relevant parameters. For the current knowledge base, this means that there are about 1,500 constraints. The utility for fitness training is given by Definition 6, where the cost of an exercise is given by its duration.

5 Evaluation and Case Study

To debug and assess the quality of the knowledge base and the influence of goals and parameters, we performed a set of tests with both hand-made and automatically generated instances of tasks. Besides identifying obvious bugs in the knowledge base (such as missing components in choices or improper values of contributions or priorities), the goal was to assess the adequacy of the generated training plans.

We emphasize that evaluation cannot mean checking whether smartfit generates the correct solution. There exists no single correct or best training plan for a task. Different human trainers will inevitably come up with different proposals. Therefore, evaluation means that experts must analyze and argue in detail whether a generated solution violates accepted principles, e.g. because it includes an inappropriate exercise or a prohibitive ordering (rather than comparing it to their own favorite plan).

This evaluation provides the feedback needed to tune parameters used in the knowledge base and to identify missing factors and constraints that influence a good training plan. The content of the domain knowledge, especially regarding the breaking down of goals, their interrelations with exercises and their quantification, is a formal model and nothing that can simply be extracted from a textbook or guideline or would be told by a trainer. Hence, a major task now is to adjust contribution values and the computation of priorities to better approximate what the experts judge to be a good training plan and, beyond this, to identify limitations of the chosen representation of the domain knowledge and the inferences used.

GECKO can exploit different search algorithms and constraint solvers to generate solutions. In this case study, we used HaifaCSP [14]. One focus of the evaluation was assessing whether the generated solutions properly reflected TaskParameters (Training- and TraineeProperties) and TraineeGoals, i.e. whether they would dedicate a reasonable amount of accumulated contributions to the various goals and sub-goals. In the following, we present the most important results and some examples.

5.1 Assessment of parameters and goal achievement

Figure 7 shows the impact of the TraineeGoals on the accumulated contributions of the configurations to the fitness categories. Figure 7 clearly shows that the TraineeGoals significantly influence the training plans. Strength-oriented goals, such as muscle gain or definition, have a significantly larger contribution in strength than in endurance, while more balanced goals, like general fitness, show a more even distribution. Finally, weight loss and cardio contain far more endurance training than strength training. To illustrate the impact of TraineeProperties by example, we picked the parameter working position.
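The choice-achievement rule from Section 4.2 can be sketched as follows. This is a simplified illustration under our own assumptions: the combine function is a plain sum of contributions, and the threshold is taken to scale linearly with the goal's priority via a hypothetical normThreshold of 100:

```python
NORM_THRESHOLD = 100  # hypothetical normThreshold; not taken from the paper

def achievement_threshold(priority, norm=NORM_THRESHOLD):
    # Assumption: a higher-priority goal demands proportionally more contribution.
    return priority * norm

def choice_achieved(active_contributions, goal_priority):
    """A component choice is achieved if the combined (summed) contributions
    of all active exercises reach the goal's AchievementThreshold.
    Individual contributions come from the domain {20, 40, 60, 80, 100}."""
    return sum(active_contributions) >= achievement_threshold(goal_priority)

print(choice_achieved([60, 40, 20], goal_priority=1))  # True  (120 >= 100)
print(choice_achieved([60, 40, 20], goal_priority=2))  # False (120 < 200)
```

The same three exercises that satisfy a priority-1 goal fall short once the priority (and hence the threshold) is raised, which is exactly how the training focus shifts contributions between goals.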

• Training plans often contained exercises which, seen in isolation, were correct, but in combination were too exhausting (see Table 5: UC1)
• Training plans contained multiple versions of the same exercise, e.g. pushup and pushup with narrow arms, which is technically correct, but usually considered faulty and inefficient by sport scientists
• Training plans with the goal cardio are incorrect for longer training sessions, because of the time scale (see Table 5: UC2)

Figure 7: Comparison of Training Goal Contributions

6 Discussion and Future Work

GECKO has proven to be an appropriate foundation for generating training plans. Most of the evaluated test cases were considered correct and fulfilled their purpose, but the assessment also showed some deficits.

Table 3: Case Studies

Task Variable                      UC1              UC2
TaskGoal: TraineeGoal              General Fitness  Cardio
TaskRestriction: Duration          8 exercises      12 exercises
TaskParameter: Age                 35               69
TaskParameter: BMI                 normal           overweight
TaskParameter: FitnessLevel        Little Trained   Trained
TaskParameter: AvailableEquipment  No Equipment     No Equipment
TaskParameter: Working position    sitting          sitting

Figure 8: Working position impact

The purpose of this parameter is to support body regions that are especially stressed in a particular working position, e.g. sitting at a desk. The results are given in Figure 8: the parameter working position changes the distribution of the exercise contributions according to the focused region. For example, for working overhead, as in construction, the focus is on the upper body, while for standing, the legs are emphasized.

Apart from technical and computational issues, it is crucial to develop a solution that users accept and that adheres to standards and practice from sport and training sciences. To evaluate the correctness and practicality of our solution, we created a set of test cases and used them to generate training plans. A series of 21 test cases was assessed by a sport scientist. The focus of the initial assessment was the usability of the training plans, with the criteria:
• Technical Correctness: are the plans correct?
• Intuitiveness: are the plans understandable for trainers?
• Usefulness: do the plans achieve the trainee goal?
• Intensity: are the training plans appropriate for the trainee's fitness level?
The majority (81%) of the training plans were correct and achieved their goals. But roughly half the training plans (54%) were too intensive for their respective trainees, and a majority (63%) appeared unintuitive to the expert. Section 5.2 offers a detailed look at the problems and potential solutions.

5.2 Case studies

To illustrate the assessment of the training plans, we present two case studies and the conducted expert evaluation. Table 3 shows these two cases. The results of the two exemplary case studies are shown in Table 4. The use cases were chosen for detailed discussion because one (UC1) fulfills its purpose and suffers only from minor issues, while the other (UC2) fails to fulfill its goal.

Table 4: Generated Training Plans

Fitness Category  Exercises UC1           Exercises UC2
Strength          Bridge_one_leg          Bridge_one_leg
                  Sumo-squat              Sumo-squat
                  TRX_Rollout_side        TRX_Rollout_side
Endurance         Plank with leg lifting  Push up - positive
                  Back lifting            Push up - single arm
                  Running                 Rowing
                  Burpees                 Bridge
                                          Bridge with theraband
                                          Side lifting with dumbbells
                                          Burpees
Flexibility       Stretching latissimus   Stretching lower back / gluteus
                  Stretching core

The most common faults or anomalies (ignoring bugs in the knowledge base) the expert found were:

Table 5: Expert evaluation of training plans

Use Case  Aspect (plan/exercise/combination of exercises)             (In-)appropriate due to                                                      Cause                                              Potential solution
UC1       Training Plan                                               OK: session achieves all fitness targets                                     -                                                  -
UC1       TRX_Rollout_side                                            Requires exercise                                                            Incorrect entry                                    Correction of ConfigKB
UC1       Endurance exercises                                         The combination of burpees, plank and running is too exhausting              Intensity of combinations is not considered        -
UC2       Training Plan                                               The training plan does not fulfill the trainee goal: too many exercises with endurance focus, but all too short   Exercise-based duration inappropriate for cardio   -
UC2       Push up positive/single arm; Bridge/bridge with theraband   Two variants of the same exercise in the same plan                           Both exercises have a high utility                 Grouping variants in a hierarchy
UC2       Bridge                                                      Not an endurance exercise                                                    Incorrect entry                                    s.a.

Currently GECKO does not offer a general mechanism for generating more than one instance of each component type. This is not a relevant restriction for a training plan, which should usually avoid repeating an exercise in the same session. In other applications, there may be stronger constraints on the structure of configurations that have to be reflected during solution generation rather than being applied a posteriori.

Prioritization of goals is the basis for another extension, which may even be relevant to smartfit: the configuration process could be iteratively related to goals with decreasing priority, thus guaranteeing that the most important goals are satisfied, even though the overall cost may not allow lower-priority goals to be fully accomplished. This also helps to break down the complexity of the task.

The current version of smartfit has an important limitation in being confined to the selection of appropriate exercises, without fixing how an exercise has to be executed. For instance, weight-based exercises can be performed with low weight and many repetitions or vice versa, with different impacts: strength, endurance or muscle gain. Integrating the assignment of such training methods to exercises is the most important extension of smartfit.

In its current version, smartfit is designed to deliver well-designed training plans to trainees without detailed domain knowledge. In perspective, we want to extend smartfit to become a tool that can even support fitness coaches in creating highly individual and complex training plans efficiently, by exploiting a knowledge base that incorporates all available state-of-the-art knowledge from training sciences.

A related potential application, with a modified knowledge base, would be the planning of physiotherapy, following the current demand for highly personalized medical treatment. While in smartfit injuries and diseases have a restrictive impact on choosing exercises, in this context curing them would define goals that are satisfied by exercises and treatment.

Acknowledgements

We would like to thank our project partners for providing their domain knowledge and their assistance, especially Florian Eibl from eGym GmbH. Special thanks to Oskar Dressler (OCC'M Software) for providing the constraint system (CS3). The project was funded by the German Federal Ministry of Economics and Technology under the ZIM program (KF2080209DB3).

References

[1] Kleer, J. de, Williams, B.C.: Diagnosing multiple faults. Artificial Intelligence, 32(1), pp. 97-130, (1987).
[2] Williams, B.C., Ragno, R.J.: Conflict-directed A* and its role in model-based embedded systems. Discrete Applied Mathematics, 155(12), pp. 1562-1595, (2007).
[3] Junker, U.: Configuration. In: Rossi, F., van Beek, P., Walsh, T. (eds.): Handbook of Constraint Programming. 1st ed. Amsterdam, Boston: Elsevier, (2006).
[4] Friedrich, G., Stumptner, M.: Consistency-Based Configuration. In: Faltings, B., Freuder, E.C., Friedrich, G., Felfernig, A. (eds.): Configuration. Papers from the AAAI Workshop. Menlo Park, California: AAAI Press (99-05), (1999).
[5] Reiter, R.: A theory of diagnosis from first principles. Artificial Intelligence, 32(1), pp. 57-95, (1987).
[6] Kleer, J. de: An assumption-based TMS. Artificial Intelligence, 28(2), pp. 127-162, (1986).
[7] Struss, P.: Model-based Problem Solving. In: van Harmelen, F., Lifschitz, V., Porter, B. (eds.): Handbook of Knowledge Representation. Amsterdam: Elsevier, pp. 395-465, (2008).
[8] Kleer, J. de, Williams, B.C.: Diagnosis with behavioral modes. IJCAI, (1993).
[9] Grigoleit, F., Struss, P.: Configuration as diagnosis: Generating configurations with conflict-directed A* - an application to training plan generation. In: DX@Safeprocess, International Workshop on Principles of Diagnosis, pp. 91-98, DX, (2015).
[11] MiniZinc, a free and open-source constraint modeling language. https://www.minizinc.org/
[12] SUNNY-CP: https://github.com/CP-Unibo/sunny-cp
[13] Dressler, O., Struss, P.: Model-based Diagnosis with the Default-based Diagnostic Engine: Effective Control Strategies that Work in Practice. In: 11th European Conference on Artificial Intelligence, ECAI-94, (1994).
[15] HaifaCSP: https://strichman.net.technion.ac.il/haifacsp/ Technion, Haifa.
[16] Felfernig, A., Hotz, L., Bagley, C., Tiihonen, J.: Knowledge-based Configuration: From Research to Business Cases. Amsterdam: Morgan Kaufmann, (2014).

Prioritizing Products for Profitable Investments on Product Configuration Systems

Sara Shafiee1, Lars Hvam and Poorang Piroozfar

Abstract.1 Product configuration systems are among the most popular expert systems for automating sales and manufacturing processes. Therefore, there are numerous studies on the qualitative benefits and quantitative profitability of configurators relative to the required investments. This paper uses real case-company data to identify the most cost-efficient and viable products for investment in configurators by calculating the profitability of the product types. An ABC analysis (A, B and C categorization) is conducted to calculate the net profit and gross margins in order to classify the products based on the available three years of data. We categorize the products into A-, B- and C-products based on ABC analysis and the Pareto principle, calculating both the net profits and the sales quantities of the different product types. The case study reveals that analyzing the products with an ABC analysis of sales quantity and net profit is a suitable way to prioritize and predict the most financially viable investments for future configuration projects.

1 INTRODUCTION

Configuration systems are expert systems developed by incorporating information about product features, product structure, production processes, costs and prices [1]. Configuration systems support decision-making processes in the engineering and sales phases of a product, which can determine the most important decisions regarding product features and cost [2], [3]. Configuration systems can bring substantial benefits to companies, such as shorter lead time for generating quotations, fewer errors, increased ability to meet customers' requirements regarding product functionality, use of fewer resources, optimized product designs, less routine work and improved on-time delivery [1], [4]-[6].

Although the advantages of configuration systems are evident, there are still difficulties associated with the required high investment [1], [7] and the chances of failure [8] in their implementation phase. Hence, researchers attempt to provide empirical data from case companies to illustrate the potential expectations and risks associated with configuration projects [3]. Besides, increasing complexity is considered a major cause of rising costs and deteriorating operational performance, leading, in particular, to decreased quality, long delivery times, delayed deliveries, and low process flexibility [9]. Therefore, companies need to control their levels of complexity, since reductions in this regard can positively affect their competitiveness in the market.

To be able to gain the benefits of configurators, great effort and investment must be accepted [3]. Several studies discuss the high investments in configuration projects [8], [10]. This research uses a case study to provide guidelines on how to prioritize and decide about the investment in configuration projects. Although the literature provides a variety of methods to support the decision about investments in configuration systems, there are not enough guidelines to determine the most profitable projects and obtain the highest benefits from configurator development. Hence, companies need to decide which types of products to prioritize for configurator development.

The aim of this paper is to evaluate the investment in configuration systems and predict their profitability using data from the product portfolio at the case company. More specifically, the objective of the paper is to perform an ABC analysis in order to categorize different groups of products based on net profit and sales quantity so as to prioritize them. This prioritization supports the profitability of the configuration project and the decision to invest in configuration system development. The paper investigates the following question:

RQ. How can industrial companies increase their benefits through profitable investment in configuration systems by prioritizing products?

In this paper, we chose a case study with highly engineered products and evaluate one whole product family to determine the most profitable products. Through the ABC analysis, the company can decide to invest in configuration systems for the most profitable product types. Firstly, we carried out the ABC analysis on a specific product portfolio from 2011 to March 2013. Secondly, we classified the products as A-, B- or C-products by calculating their sales quantities and net profits. In this research, we query the real data from the selected case company to compare different products and suggest where the company should invest in configuration systems based on this analysis.

2 LITERATURE STUDY

In this section, the relevant literature for analyzing the complexity of products and processes in enterprises is reviewed, which is then utilized to support the choice of ABC analysis. Then, ABC analysis is introduced. The ABC analysis will subsequently be used to determine the most suitable investment for future configuration projects.

1 Mechanical Engineering Department, Technical University of Denmark, Denmark, email: [email protected]

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


2.1 Product and process complexity

Product architecture is widely recognized as the main factor of product complexity [11], and product architecture management enables the efficient design of new products that are targeted at individual market requirements [12]. Besides, product architecture helps control the structure of the product and the number of product variants, both of which affect the performance of sales, engineering, the production/supply chain, distribution, and after-sales service [13].

One of the main reasons for increasing product complexity is the vast product variety to be offered to the customer [14]. However, researchers have offered various approaches and techniques to both recognize and solve the complexity challenges in the product range [15]. Blecker et al. [16] described how to apply mass customization to eliminate the process complexity caused by the increasing variation in the product architecture, inventory, and order-taking process. On the one hand, applying a pure customization strategy results in increasing product variety, and the customer-order decoupling point moves towards the front end [16].

2.2 ABC Analysis of the product range

The analysis of the product range is another fundamental step towards developing a configuration system [2]. It should help provide an overview of the company's product range and describe the necessary product knowledge to be incorporated into the configuration system. One approach is to start a modularization and standardization project before starting the configuration project, so that basically a 'clean up' is performed in the product program and the associated IT systems [17]. Another approach, for instance in a sales configuration system, is to consider which variants are to be offered to the customers [2]. After this, it is 'market mechanisms' that decide which variants of the products are needed.

In order to clarify which variants should be offered to the customers, a project team should clarify some important facts about the company's product line, such as the product range's readiness to be dealt with in a configuration system, the most profitable products, the variants to be offered to the customers, etc. [7], [10]. One way to create an overview of the product range, as well as to define what should be entered into the configuration system, is to set up an ABC analysis. The purpose of applying this type of analysis is to identify (and, later, possibly eliminate) product variants that contribute only minimally to revenue but add significantly to complexity. The ABC analysis is a categorization method for dividing items into three categories: A, B and C. A-items are the most valuable economically, while C-items are the least valuable [12]. This method aims at drawing attention to the critical few A-items and away from the many trivial C-items.

The ABC analysis is based on the Pareto principle, which states that 80% of the overall revenue comes from only 20% of the items. In other words, demand and profit are not evenly distributed between items: top sellers vastly outperform the rest. The ABC approach states that, when reviewing the product range, a company should rate items from A to C based on the following rules [17]:

A-items are the goods with the highest economic value. The top 70-80% of the total annual revenue of the company typically comes from only 10-20% of the items.

B-items are the interclass items, with a medium dollar value. Around 15-25% of the total annual revenue typically comes from 20-30% of the items.

C-items are, on the other hand, the items with the lowest dollar value. Around 5% of the total annual revenue typically comes from 50-60% of the items.

The ABC analysis therefore gives the company the possibility to focus its energy on a few critical items. A similar analysis can also be undertaken for the customers of the product range to determine which ones are the most profitable. In that case, for each customer (or group of customers), the contribution margin and the revenues are plotted in a diagram in the same way as described for products [18].

3 RESEARCH METHODOLOGY

The relevant literature was reviewed to clarify the present study's position in relation to existing research. In this respect, the literature related to product and process complexity has been studied. Moreover, the literature demonstrates how to identify product/process complexities, while the ABC analysis determines the most profitable product types. The complexity is identified by calculating the net profit and gross margin.

In this article, we use a single case study to evaluate the propositions in one ETO (Engineer-To-Order) company. The single case study can be described as having a holistic, representative design with a single unit of analysis (the case company) [19]. The case is representative because the company is typical of many major manufacturers that have had problems managing product and process complexity. As this type of case study methodology pertains to a single case, it is possible to generate only an analytical generalization, as opposed to a statistical one [19]. We analyzed the results from the product portfolio over three years at the case company. Case-based research seeks to find logical connections among observed events, relying on knowledge of how systems, organizations, and individuals work [20], [21].

The entire project was followed by three researchers. The initiative for the research was the case company's decision to invest in configuration systems and its challenges regarding product prioritization. Hence, the research idea was to explain the product portfolio's complexities and profitability. However, the main goal was to illustrate the most profitable products and help improve the ROI (return on investment) for a successful implementation of configuration systems.

4 THE CASE STUDY

The company is an international Engineer-To-Order enterprise which provides specialized solutions within the field of marine tank management for the marine and offshore industries. Within the areas of valve remote control, ballast and service tank gauging, as well as cargo monitoring, the company strives to open up new possibilities for more uptime, higher productivity and safer, more reliable conditions for all types of ships and offshore units. This project focuses exclusively on the product ranges in the valve remote control systems at the case company and their after-sales department. The reasons for selecting the case company are: (1) it has highly engineered and complex products; (2) there is an urgent need for developing configuration systems and reducing the time and resources spent on sales and after-sales processes; (3) the company has a huge range of product types with different net profits and sales quantities; (4) it offers a unique level of access to project data.

The whole product range in the remote valve control department has been investigated, and all the relevant data related to the net profits, gross margins and sales quantities has been extracted and analyzed. If the case company used configuration systems instead of the current approach, it could save up to 1,162,505 DKK per year by using a web-based configuration system. In order to invest in configuration systems, the first step is to categorize and determine the business cases by reviewing the product ranges and determining the most profitable products (among all product types) in the valve remote control system to invest in. One approach is to start a modularization and standardization project before starting a configuration project, so that basically a 'clean up' is performed in the product program and the associated IT systems. Another approach, for example in a sales configuration system, is to consider which variants are to be offered to the customers more often and with higher profitability. After this, it is 'market mechanisms' that decide which variants of the products are needed, or which ones are the customers' most popular and the company's most profitable products.

Figure 1: ABC classification of the product ranges based on gross margin and net profit
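The A/B/C rating rules described in Section 2.2 can be sketched in a few lines. This is our own minimal implementation: items are sorted by revenue and cut at cumulative shares of 80% and 95%, following the Pareto-style thresholds above; the revenue figures are invented for illustration:

```python
def abc_classify(revenues, a_cut=0.80, b_cut=0.95):
    """Assign 'A'/'B'/'C' to each item based on its position in the
    cumulative revenue distribution (items sorted by descending revenue)."""
    total = sum(revenues.values())
    running, classes = 0.0, {}
    for item, rev in sorted(revenues.items(), key=lambda kv: -kv[1]):
        running += rev
        share = running / total
        classes[item] = "A" if share <= a_cut else ("B" if share <= b_cut else "C")
    return classes

# Invented example data: six product types with annual revenues
revenues = {"P1": 500, "P2": 250, "P3": 120, "P4": 80, "P5": 30, "P6": 20}
print(abc_classify(revenues))
```

A real analysis, as in this paper, would run the same procedure on the case company's net-profit and sales-quantity data rather than on revenue alone.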
In order to clarify which variants should be offered to the customers, a project team should clarify some important facts about the company's product line, such as: Is the product range ready to be dealt with in a configuration system? Which products are the most profitable? Which variants are to be offered to the customers? To make this clear, it is necessary to carry out a process in the company where all the different stakeholders (sales staff, product developers, production staff, purchasers etc.) come together to form a team, to create an overview of the overall product range and determine which variants can be offered via the configuration system.

Figure 2 also confirms the 80/20 rule of the Pareto principle (see Section 2.2), and illustrates that a small part of the case company's products returns the vast majority of the earnings. Hence, the case company should be especially attentive to its class A-products. In terms of selecting products for the configuration system, we suggest that inserting the A-products into the configuration system should have first priority.

5 RESULTS

In order to find out which products are the most profitable within the case company, an ABC classification was made based on what percentage of the total net profit the different products return. The idea of an ABC analysis is to categorize the products into three categories: A-, B- and C-products. This is done in order to estimate the importance of the products sold at the after-sales department: A-products are the most important, while C-products are the least important. In accordance with the Pareto principle, this analysis has categorized the products that return 80% of the total net profit as A-products, the products that return the next 15% as B-products, and the products that return the remaining 5% as C-products. Figure 1 demonstrates an ABC classification of the 4345 types of products that were sold at the case company in the period from the beginning of 2011 to March 2013.

The ABC analysis shows that only 389 (9%) of the 4345 products are A-products and 744 (17%) are B-products, while a staggering 3212 (74%) are C-products. This classification provides a general overview of the products: which are the big sellers that should be kept under very tight control, but also which products are not so profitable and may take up too much inventory space, thereby tying up too much capital investment. The classification of products can also be helpful in the process of selecting those products which should be entered into the configuration system.

Figure 2. Relationship between the net profit of the categories and the amount of products in the categories

Figure 2 also shows that the B-products (17%) return 15% of the total net profit, which becomes relevant if or when the configuration system is extended beyond the A-products. Finally, 74% of the products return only 5% of the total net profit, which means that the order-sales process spends an excessive amount of time and resources on handling the sales of small and unprofitable products.

Table 1 shows a selection of product types that were classified as A-, B- or C-products. The products belonging to the A-, B- or C-categories are grouped into types, ordered from the highest to the lowest net profit. "Type" is used instead of product names for reasons of confidentiality. This means that, for
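The ABC split described in this section can be sketched in a few lines of code. The following is an illustrative reconstruction (the function name and input format are ours, not the authors'), assuming the 80/15/5 cumulative net-profit split stated above:

```python
def abc_classify(net_profits):
    """Split products into A/B/C classes by cumulative net-profit share.

    net_profits: dict mapping product id -> net profit.
    Products covering the first 80% of total net profit become 'A',
    the next 15% become 'B', and the remaining 5% become 'C'
    (the 80/15/5 split used in the paper).
    """
    total = sum(net_profits.values())
    classes = {}
    cumulative = 0.0
    # Rank products from highest to lowest net profit.
    for product, profit in sorted(net_profits.items(), key=lambda kv: -kv[1]):
        cumulative += profit
        share = cumulative / total
        if share <= 0.80:
            classes[product] = "A"
        elif share <= 0.95:
            classes[product] = "B"
        else:
            classes[product] = "C"
    return classes
```

Note that how products straddling a class boundary are assigned is a design choice; the sketch assigns them to the lower (less important) class.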

instance, the Type 1 cell under "A-products" shows the total sales numbers of all the variants of Type 1 that were classified as A-products, while the Type 1 cell under "B-products" shows the total sales numbers of all the variants of Type 1 that were classified as B-products. When selecting product types to appear in Table 1, the A- and B-products were selected by the highest net profit, while the C-products were selected by the highest quantity. The reason is that C-products do not have significant net profits, while the company might sell them in high quantities.

Table 1 illustrates that it is advantageous to insert the Type 1 variants from class A into the configuration system, since they alone return almost 21% of the total net profit. If or when B-products are to be inserted into the configuration system, it would be advantageous to first insert the product variants that are the most profitable. The table also shows that C-products, for example Type 1 and many other small products, are sold in big quantities but do not contribute much to the total net profit. In order to save the time and resources spent on selling these unprofitable products individually, the case company should stick to selling them only in package solutions (sets of seals, common parts, etc.). In a configuration system, rules could be made to ensure that these small products can only be sold in packages, which would help the salespersons, because the configuration system would automatically reduce the time and resources spent in the order-sales process. It is also evident that a configuration system can save a significant amount of time for the products with high quantities, in case the case company desires to continue with the same scenario.

In addition to the ABC analysis, the inventory turnover was investigated in order to see whether any items were lying still and thereby tying up too much capital. Furthermore, it was investigated whether the after-sales department creates orders that do not return any profit for the company, for example when the resources spent on handling an order exceed the profit of the order. It was found that 2% (122) of the orders were not returning any profit. This was not investigated further, since the number was not considered critical. However, it should be mentioned that a configuration system would have eliminated unprofitable orders altogether.

The analysis led to the conclusion that the investment in configuration systems can be based on the product prioritization. The first reason is that the A-products are the most profitable products at the company, so the benefits are remarkable. The second reason is the high quantity of sales, which means that the amount of time and resources needed to produce and sell these product types is significant. Hence, developing a configuration system for these product types will save a considerable amount of man-hours and yield a striking market benefit.

This research first uses the ABC analysis method to prioritize the product types; secondly, we did additional analysis to categorize the products for a profitable investment in configuration projects. This study considers only one case company and one case product and is regarded as exploratory research. Therefore, further research and additional cases are required to use ABC or other methods to prioritize the products for which configuration systems are developed.
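The selection rule for the product types shown in Table 1 (A- and B-products picked by highest net profit, C-products by highest sales quantity) can be sketched as follows; the data layout and function name are hypothetical illustrations of the rule stated in the text:

```python
def select_for_table(products, n=10):
    """Pick the product types to display per class: A- and B-products
    ranked by highest net profit, C-products ranked by highest sales
    quantity (C-products have low net profit but may sell in volume).

    products: list of dicts with keys 'type', 'cls', 'net_profit', 'qty'.
    Returns a dict mapping class -> top-n rows for that class.
    """
    sort_keys = {"A": "net_profit", "B": "net_profit", "C": "qty"}
    table = {}
    for cls, key in sort_keys.items():
        rows = [p for p in products if p["cls"] == cls]
        table[cls] = sorted(rows, key=lambda p: -p[key])[:n]
    return table
```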
Also, verification of the results would require a longitudinal study, carried out years after the configurators' implementation at the company, together with a comparative case study.

Table 1. Selection of product types that were classified as A-, B- or C-products

        A-products            B-products            C-products
Type    Net profit    QTY     Net profit    QTY     Net profit    QTY
1       42.219.776    4.711   3.345.487     354     42.910        27.609
2       15.722.870    6.215   2.135.903     924     600.151       6.470
3       5.859.767     9.236   1.442.432     5.444   39.787        5.914
4       5.194.738     3.218   1.297.569     87      105.109       3.180
5       4.217.715     260     1.004.666     231     242.776       3.064
6       3.858.531     23      886.308       1.021   35.222        2.510
7       3.584.171     1.028   776.489       978     85.984        1.876
8       3.427.903     2.075   686.272       188     30.194        1.339
9       3.356.950     692     601.362       311     27.881        1.202
10      3.153.570     1.140   589.302       3.509   54.186        1.180

6 CONCLUSION

The aim of this study was to prioritize the products in one product portfolio in order to make the most profitable investment in product configuration systems. The empirical data was gathered from an ETO company and covers 3 years' worth of data. In detail, the gross margin and net profit calculations verify the Pareto principle (which states that 80% of the overall revenue comes from only 20% of the items); for this specific example, 80% of the net profits comes from 9% of the products. Then, further calculation was done to determine the sales quantities, and the categorization of the products was carried out and tabulated for cross-examination.

REFERENCES

[1] C. Forza and F. Salvador, Product information management for mass customization: connecting customer, front-office and back-office for fast and efficient customization. New York: Palgrave Macmillan, 2007.
[2] L. Hvam, N. H. Mortensen, and J. Riis, Product customization. Springer Science & Business Media, 2008.
[3] S. Shafiee, L. Hvam, and M. Bonev, "Scoping a product configuration project for engineer-to-order companies," International Journal of Industrial Engineering and Management, vol. 5, no. 4, pp. 207–220, 2014.
[4] A. Felfernig, L. Hotz, C. Bagley, and J. Tiihonen, Knowledge-Based Configuration: From Research to Business Cases. Newnes: Morgan Kaufmann, 2014.
[5] C. Forza and F. Salvador, "Managing for variety in the order acquisition and fulfilment process: The contribution of product configuration systems," International Journal of Production Economics, vol. 76, no. 1, pp. 87–98, Mar. 2002.
[6] S. Shafiee, "Conceptual Modelling for Product Configuration Systems," Technical University of Denmark, 2017.
[7] A. Haug, S. Shafiee, and L. Hvam, "The costs and benefits of product configuration projects in engineer-to-order companies," Computers in Industry, vol. 105, pp. 133–142, 2019.
[8] A. Haug, S. Shafiee, and L. Hvam, "The causes of product configuration project failure," Computers in Industry, 2019.
[9] J. L. Mariotti, The Complexity Crisis: Why too many products, markets, and customers are crippling your company--and what to do


about it. Simon and Schuster, 2007.
[10] S. Shafiee, A. Felfernig, L. Hvam, P. Piroozfar, and C. Forza, "Cost benefit analysis in product configuration systems," CEUR Workshop Proceedings, vol. 2220, pp. 37–40, 2018.
[11] H. ElMaraghy et al., "Product variety management," CIRP Annals - Manufacturing Technology, vol. 62, no. 2, pp. 629–652, Jan. 2013.
[12] L. Hvam, C. Hansen, C. Forza, N. H. Mortensen, and A. Haug, "The reduction of product and process complexity based on the quantification of product complexity costs," International Journal of Production Research, pp. 1–17, 2019.
[13] J. Olivares Aguila and W. ElMaraghy, "Structural complexity and robustness of supply chain networks based on product architecture," International Journal of Production Research, vol. 56, no. 20, pp. 6701–6718, 2018.
[14] S. Shafiee, K. Kristjansdottir, and L. Hvam, "Business cases for product configuration systems," in 7th International Conference on Mass Customization and Personalization in Central Europe, 2016.
[15] U. Lindemann, M. Maurer, and T. Braun, Structural Complexity Management: An Approach for the Field of Product Design. Berlin, Heidelberg: Springer, 2009.
[16] T. Blecker, N. Abdelkafi, G. Kreutler, and G. Friedrich, "Product configuration systems: state of the art, conceptualization and extensions," in Proceedings of the Eighth Maghrebian Conference on Software Engineering (MCSEAI 2004), 2004, pp. 25–36.
[17] R. S. Russell and B. W. Taylor III, Operations management along the supply chain. John Wiley & Sons, 2008.
[18] M. L. George and S. A. Wilson, Conquering Complexity in Your Business: How Wal-Mart, Toyota, and Other Top Companies Are Breaking Through the Ceiling on Profits and Growth. McGraw Hill Professional, 2004.
[19] R. K. Yin, Case study research: Design and methods (applied social research methods). Thousand Oaks, CA; London; Singapore: Sage, 2009.
[20] D. M. McCutcheon and J. R. Meredith, "Conducting case study research in operations management," Journal of Operations Management, vol. 11, no. 3, pp. 239–256, 1993.
[21] B. Kaplan and D. Duchon, "Combining Qualitative and Quantitative Methods in Information Systems Research: A Case Study," MIS Quarterly, vol. 12, no. 4, pp. 571–586, 1988.

A Search Engine Optimization Recommender System

Christian D. Hoyos1 and Juan C. Duque1 and Andrés F. Barco2 and Élise Vareilles3

Abstract. Search Engine Optimization refers to the process of improving the position of a given website in a web search engine's results. This is typically done by adding a set of parameters and metadata to the hypertext files of the website. As nowadays the majority of web-content creators are non-experts, automation of the search engine optimization process becomes a necessity. In this regard, this paper presents a recommender system to improve search engine optimization based on the site's content and the creator's preferences. It exploits text analysis for labels and tags, artificial intelligence for deducing content intention and topics, and case-based reasoning for generating recommendations of parameters and metadata. Recommendations are given in natural language using a predefined set of sentences.

1 Introduction

Normally, web content creators require their websites to be easily found by content consumers through search engines [6]. They do so by setting parameters and adding metadata to the hypertext source files of the websites. These parameters and metadata allow the algorithms of the search engines to index and retrieve data of millions of websites in an efficient way [7]. For instance, parameters about the intention of the website allow content to be classified, and metadata stating the location is useful to customize content or restrict access. Further, this information makes it possible for the search engine to rank the results of a query by priority. As reported by Chitika [5], configuring websites for correct indexing is a key element of their success. This configuration of values is called Search Engine Optimization (SEO).

Now, although every website is implemented following a standard, namely HTML, there is no standard for web page ranking, as each search engine (Google, Yahoo, Bing, etc.) implements its own ranking system. This implies that improving the indexing position of a website requires an expert on both the content as well as on the search engine ranking system.

In this regard, this paper proposes an expert recommendation system in charge of performing SEO for a given web page4 targeting the Google search engine. It uses artificial intelligence to deduce the intention and content topic of the web page, text analysis over labels and tags for classification and comparison, and case-based reasoning to provide recommendations for improving SEO on the web page.

The document is structured as follows. The overall behavior of the system, and its architecture, are presented in Section 2. Each of the modules of the system is described in Section 3. An experimental test and its results are shown in Section 4. Conclusions are presented in Section 5.

1 Universidad de San Buenaventura Cali. Santiago de Cali, Colombia. email: {christian, juan.duque}@gmail.com
2 Universidad Santiago de Cali. Santiago de Cali, Colombia. email: [email protected]
3 Université de Toulouse, Mines Albi. Albi, France. email: [email protected]
4 This means it analyzes each web page individually.

2 Overview

To provide recommendations for the indexation of a web page, aspects such as content topic, keywords, intention of the (authors') web page, metadata, related web pages and the specific ranking system of the search engine should be taken into account. These aspects allow the expert system to understand the website's communication goals and to create recommendations that respect the search engine implementation. The expert system proposed here tries to unveil the previous aspects using three modules in charge of analysis and one module in charge of recommendation generation (see Figure 1).

The system receives three inputs, two of which are optional. The first input is either an HTML source file or a hyperlink (URL to an HTML). If the HTML contains scripts or CSS definitions, they are ignored, as they do not provide useful information for the indexation. Hyperlinks should be accessible from the web. The second input is the topic of the web page, which is optional. The last input is the intention of the web page, and it is optional as well. It is worth noticing that an explicitly defined topic and intention will help the system's accuracy and performance (no topic and intention processing is needed). Given the inputs, the system executes the following steps and produces as output a web page score and its recommendations.

First, the web page is analyzed using text analysis over the HTML source code. The analysis produces a score depending on the presence or absence of 22 of the most important factors for indexation according to Google [2, 5]. These factors add positive values to the score when present and negative values when not. This is the first source of knowledge to build a recommendation for a web page.

Once the text analysis is done, a topic and intention analysis is performed using the IBM Watson system (a state-of-the-art artificial intelligence API) [9]. The topic and intention are useful in two ways. On the one hand, they allow the content of the web page to be classified. On the other hand, they are the basis of a case-based reasoning recommendation executed in the last step.

Next, using the obtained topic and intention as keywords, the system performs a search query in the Google search engine and retrieves the first 10 pages from the result. It then proceeds by analyzing each web page with the aim of extracting key values, such as keywords and metadata, that made those pages the 10 first-ranked pages of Google. This is an implementation of case-based reasoning [8] and is the second source of knowledge to build a recommendation for a web page.

Finally, the system builds a recommendation using HTML code and
natural language [4] using predefined sentences. They are based on

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

These aspects are encoded in specific labels and include keywords definition, char-set codification, description of the web page, copyright, content duplication and broken links, among others. Each factor has an associated positive value if it is included in the source file and a negative value if not. Table 1 presents some of the key aspects and their respective values.

Label  Description                                                      Benefit  Penalty
F1     Use of keywords in tag title.                                    13,5     -16,8
F2     Connection among keywords (interrelated).                        13,5     -16,8
F3     Low density of keywords (not too many).                          10,5     -16,8
F4     Description in tag meta with a maximum of 200 words.             10,5     -16,8
F5     Excessive use of meta and alt tags.                              10,5     -16,8
F6     Definition of codification in tag char-set.                      13,5     -12,6
F7     Avoid the use of tag refresh.                                    7,5      -16,8
F8     Use of tag alt in images.                                        12       -12,6
F9     No broken URLs in source file.                                   13,5     -10,5
F10    Use of tag H (h1, h2, h3).                                       10,5     -12,6
F11    Exceeding maximum number of characters in tag title.             6        -14,7
F12    Use of tag keyword with maximum of 200 characters.               12       -8,4
F13    Percentage (between 5 and 20) of keywords in text.               10,5     -8,4
F14    Hyperlinks to pages of the same website.                         13,5     -4,2
F15    Content strongly connected to the web page topic and keywords.   10,6     -6,3
F16    Duplicated content.                                              10,5     -6,3
F17    Use of strong, bold and italic for fonts.                        12       -4,2
F18    Use of cache-control tag.                                        9        0
F19    Keywords in URL.                                                 6        -6,3
F20    Use of keywords in numbered lists.                               7,5      -4,2
F21    Use of tag author.                                               3        -2,1
F22    Definition of tag copyright.                                     3        -2,1

Table 1. Evaluated factors and scoring.
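The additive scoring scheme behind Table 1 can be illustrated with a small sketch. The benefit/penalty values for F1, F9 and F18 are taken from the table; the boolean inputs are simplified stand-ins for the paper's actual HTML analysis, and the function name is ours:

```python
# Illustrative subset of the factors in Table 1: each factor adds its
# benefit to the score when the page satisfies the check, and its penalty
# when it does not (F18 has no penalty).
FACTORS = {
    "F1": {"desc": "keywords in tag title", "benefit": 13.5, "penalty": -16.8},
    "F9": {"desc": "no broken URLs in source file", "benefit": 13.5, "penalty": -10.5},
    "F18": {"desc": "cache-control tag used", "benefit": 9.0, "penalty": 0.0},
}

def score_page(satisfied):
    """satisfied: dict mapping factor id -> bool (did the page pass?)."""
    return sum(
        f["benefit"] if satisfied.get(fid, False) else f["penalty"]
        for fid, f in FACTORS.items()
    )
```

For example, a page passing F1 and F18 but failing F9 would score 13.5 - 10.5 + 9.0 = 12.0 on this subset.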

the identified negative evaluated factors (e.g., missing tags) and the extracted data from the first 10 pages (e.g., new keywords).

Note: It is important to know that one of the most important factors in the Google search engine is the value determined by the PageRank algorithm [1]. This algorithm takes into account the number and quality of other web pages pointing to the web page in question. Simply put, the more pages on the web point at the referenced page, the better, and more points are given if the pointing page is itself highly ranked. This works as a kind of endorsement. PageRank is not included in the recommendation system's analysis, as it is not based on HTML tags and metadata.

Figure 1. Recommendation System Architecture.

3 System's Core

The recommendation system is divided in four modules.

3.1 Module 1: HTML analysis

This module focuses on labels and metadata of the web page's HTML source files. In particular, it looks for specific information that is related to the Google ranking system and 22 key aspects in specific labels (see Table 1).

3.2 Module 2: Intention and topic deduction

The intention and topic are deduced from the content, meaning that only the text enclosed by the page's content labels is analyzed. Both intention and topic are deduced using the IBM Watson system through its public API, and only if no user input is given. Watson is, in essence, an on-line system that exploits several techniques from artificial intelligence to provide services such as speech-to-text, natural language understanding, query answering, emotion and sentiment analysis, translation and visual recognition [3, 9].

The topic and intention are deduced by Watson using Natural Language Understanding/Classification for the analysis of text. In the case of the topic, classification is done through a set of categories, concepts and keywords. In the case of the intention, the system classifies according to how positive or negative the web page is. The analysis then assigns one of the following labels to the page: Very Positive, Positive, Neutral, Negative or Very Negative. Each of these labels is connected to the numerical values produced by Watson, as presented in Table 2.

4 Test

Two types of tests have been made: tests using public web pages and tests using an authors' web page.

4.1 Public websites tests

In these tests, five topics have been chosen and the following five queries have been designed.

1. Football soccer critic.
2. Mediterranean food.
3. Vaccines for cats.
4. Contamination of the Oceans.
5. Renewable energies.

Label          Min    Max
Very Positive  0.6    1
Positive       0.2    0.6
Neutral        -0.2   0.2
Negative       -0.6   -0.2
Very Negative  -1     -0.6

Table 2. Sentiment labels and their value ranges.

The first three results of each query have been fed to the system with automatic execution. Table 3 shows the number of recommendations for each found page.
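The mapping from Watson's numerical sentiment values to the five labels of Table 2 can be sketched as a simple if-chain. The paper does not specify to which side a boundary value belongs, so the choice below is an assumption:

```python
def sentiment_label(score):
    """Map a sentiment score in [-1, 1] to the labels of Table 2.

    Boundary values are assigned to the more positive interval here;
    the paper leaves this choice unspecified.
    """
    if score > 0.6:
        return "Very Positive"
    if score > 0.2:
        return "Positive"
    if score >= -0.2:
        return "Neutral"
    if score >= -0.6:
        return "Negative"
    return "Very Negative"
```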

3.3 Module 3: Case-based reasoning

The set of categories, concepts, keywords and the intention are used to construct a search query with the aim of obtaining similar web pages. The main idea is to extract the parameters used by top-ranked web pages (the first 10 pages in Google's search engine) that address the same topic and have the same intention. Potentially, those 10 pages include data in their HTML files that made them the first ranked by the search engine. Arguably, using the same or similar parameters (as new keywords or tags) will help to improve the indexation of other pages; for instance, adding keywords that were not previously included in the web page but that are common to most of the 10 retrieved pages.

Query                        Index   # Reco   Score
Football critic              1       43       191
                             2       18       215
                             3       63       292
Mediterranean food           1       32       494
                             2       14       111
                             3       33       114
Vaccinations for cats        1       25       306
                             2       16       439
                             3       23       171
Contamination of the Oceans  1       36       222
                             2       17       67
                             3       48       50
Renewable energies           1       66       600
                             2       65       223
                             3       10       90

Table 3. Test results with five queries.
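The keyword-extraction idea of Module 3 can be sketched as follows. The "common for most of the retrieved pages" threshold is not quantified in the paper, so the 50% default below is an assumption, and the function name is ours:

```python
from collections import Counter

def suggest_keywords(page_keywords, top_pages_keywords, min_fraction=0.5):
    """Recommend keywords common among top-ranked pages but missing
    from the analyzed page.

    page_keywords: set of keywords found on the analyzed page.
    top_pages_keywords: list of keyword sets, one per retrieved top page.
    min_fraction: a keyword counts as 'common' if it appears on at least
    this fraction of the top pages (threshold assumed, not from the paper).
    """
    # Count on how many top pages each keyword occurs.
    counts = Counter(kw for kws in top_pages_keywords for kw in set(kws))
    needed = min_fraction * len(top_pages_keywords)
    common = {kw for kw, c in counts.items() if c >= needed}
    # Recommend only what the analyzed page does not already have.
    return sorted(common - set(page_keywords))
```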

Note: The system only retrieves the first 10 pages for two reasons. On the one hand, according to the literature, the probability of user access to a web page ranked after the 10th position is around 1% [5]; thus, the system obtains only those web pages that are likely to have a high user access rate. On the other hand, the analysis of more pages may reduce efficiency: each of the 10 pages is analyzed using the same techniques, so the process, plus comparison, must be executed 11 times, which is time consuming.

3.4 Module 4: Natural language recommendation

Recommendations are built with structured predefined sentences of the form: target factor + recommendation over factor + explanation of recommendation + example in HTML. Each recommendation is classified into four categories according to its importance:

• Black: Critical recommendation to be applied for basic indexation in the Google search engine.
• Red: Not following the recommendation may significantly affect the position of the web page in the results.
• Yellow: Not following the recommendation may moderately affect the position of the web page in the results.
• Blue: Not following the recommendation may minimally affect the position of the web page in the results.

From the results we conclude the following.

• There is no direct relation between the number of recommendations and the score of the web page. Indeed, it depends on which factor is being recommended and its impact on the score. For instance, a web page may have few recommendations on factors, but one of the factors may be repeated within the page, thereby reducing the score significantly.
• Of the five queries, three show a tendency of decreasing score, which is expected. The first pages of the other two queries do not have high scores but may be affected by factors not taken into account, mainly traffic and the results of the PageRank algorithm.
• The best-ranked pages are part of sites like Wikipedia. In fact, two of the found web pages are from Wikipedia and are the top-ranked pages. This is mostly due to the many external and self-reference hyperlinks of the site.

4.2 Authors' designed web page

For these tests, a web page created by the authors is fed to the system in three different rounds. Recommendations from rounds one and two are implemented before the next rounds (two and three). The designed page is a basic HTML file, without styles or scripts,

used to show the improvement of a given web page through the system's recommendations. The title of the web page is "The fall of JQuery", and it addresses the decline of developers using JQuery. Figure 2 shows the recommendations of the first round (in Spanish5) with different colors for their importance. As an example of the result, the first line of recommendation states "You should use labels h1, h2, h3...h6 more often, as they help defining the importance of content within the page". Table 4 presents the results of the three rounds of execution.

Round   # Recommendations   Score
1       13                  -116
2       5                   100
3       0                   167

Table 4. Follow-up of authors' web page.

[2] Pablo Fernández, 'Google's pagerank and beyond: The science of search engine rankings', The Mathematical Intelligencer, 30(1), 68–69, (Mar 2008).
[3] D. A. Ferrucci, 'Introduction to "This is Watson"', IBM Journal of Research and Development, 56(3.4), 1:1–1:15, (May 2012).
[4] G. Chowdhury, 'Natural language processing', Annual Review of Information Science and Technology, 37(1), 51–89, (2003).
[5] Chitika Insights. The value of google result positioning. http://info.chitika.com/uploads/4/9/2/1/49215843/chitikainsights-valueofgoogleresultspositioning.pdf, cited June 2019.
[6] J. B. Killoran, 'How to use search engine optimization techniques to increase website visibility', IEEE Transactions on Professional Communication, 56(1), 50–66, (March 2013).
[7] Atanas Kiryakov, Borislav Popov, Damyan Ognyanoff, Dimitar Manov, Angel Kirilov, and Miroslav Goranov, 'Semantic annotation, indexing, and retrieval', in The Semantic Web - ISWC 2003, eds., Dieter Fensel, Katia Sycara, and John Mylopoulos, pp. 484–499, Berlin, Heidelberg, (2003). Springer Berlin Heidelberg.
[8] J. Kolodner, Case-Based Reasoning, Elsevier Science, 2014.
[9] Punkaj Vohra, 'The new era of Watson computing: this article introduces cognitive computing using IBM Watson and how to leverage cognitive computing with ECM centric solutions', IBM Developer Works, (02 2014).

The values produced by the system in the three rounds show an evolution of the web page through the recommendations. As expected, for a web page with no external links referencing it, the system assigns a low score and several recommendations in the first run, with the score increasing and the number of recommendations decreasing in later runs. Bear in mind that the number of recommendations is lower than in the tests of the previous section, given that the content of the designed web page is not as big and does not have as many links as the other pages.

5 Conclusions

Although the internals of web search engines are very similar, each of them implements a different ranking system for indexing web pages. In consequence, the identification of the factors that are included in the ranking systems, and their tuning by means of hypertext (metadata), is critical for the success of a given web page. In this context, tags, topic and intention are relevant for recommending changes with the aim of improving result position.

This paper proposed a recommender system for improving the search optimization of a web page in Google's search engine. The system evaluates 22 main factors used by the Google search engine to classify (rank) web pages. The system represents a positive contribution because:

• Basic and fundamental factors are handled so that the search engine can identify the content and structure of the web page.
• Each recommendation explains with details and examples, and in natural language, how the improvement of a factor in the website can be made.
• A user without much experience in SEO can make use of the recommendation system, as it is intuitive.
• Recommendations are different for each factor and each web page (customized recommendations).
• The analysis and recommendations are based on the top 10 best-indexed sites in Google that deal with the same topic and intention (an instance of case-based reasoning).

REFERENCES [1] Monica Bianchini, Marco Gori, and Franco Scarselli, ‘Inside pagerank’, ACM Trans. Internet Technol., 5(1), 92–128, (February 2005).

5 The system interface is in Spanish as it is being used in a multimedia engi- neering program in Colombia.

Figure 2. Recommendations for the first round for "The fall of JQuery" web page.

Comparing the Gained Benefits from Product Configuration Systems Based on Maintenance Efforts

Sara Shafiee1, Lars Hvam and Anders Haug

Abstract. Product Configuration Systems (PCSs) are automated solutions to support and facilitate sales and engineering processes. PCSs are among the most successful applications of expert system technology and one of the drivers of the digitalization era. Accordingly, there are several studies on the benefits of PCSs. Such studies are, however, often relatively undetailed or unspecific about the costs and benefits of such projects. To address this issue, this paper presents studies of four PCS projects, which quantify the benefits in terms of reduced working hours, and the costs in terms of development, implementation, and maintenance costs. The studies of the PCS projects each concern a 3-year utilization period. Our results show that the benefits gained from a PCS have a growing trend over the years in the case of proper maintenance. We also demonstrate that the opposite is the case when the PCS is not properly maintained. Furthermore, the study reveals that PCSs with constant maintenance grow increasingly popular (i.e., in use frequency) over time, while PCSs with poor maintenance decrease in popularity.

1 Mechanical Engineering Department, Technical University of Denmark, Denmark, email: [email protected]

1 INTRODUCTION

Customers have become accustomed to having products customized to their personal needs while retaining the price associated with mass production [1]. Here, Product Configuration Systems (PCSs) can facilitate sales and production processes for customized products at prices comparable to mass-produced ones [2]. PCSs affect the company's ability to increase the accuracy of cost calculations in the sales phase, consequently increasing the efficiency of the sales and engineering processes [3]. PCSs are developed by describing information about product features, product structure, production processes, costs and prices in their knowledge bases [3]. PCSs support decision-making processes in the product engineering and sales phases by determining important decisions regarding product features and offering users information about product designs and costs [4], [5].

PCSs can bring substantial benefits to companies, such as shorter lead times for generating quotations, fewer errors, increased ability to meet customers' requirements regarding product functionality, use of fewer resources, optimized product designs, less routine work and improved on-time delivery [3], [6]–[8]. Although the advantages of PCSs are evident, there are still some difficulties associated with the required high investment [3], [9] and the high chances of failure [10]. Hence, researchers have provided empirical data from case studies to better understand the advantages, challenges, failure causes, expectations and risks associated with PCS projects [5], [10]. Such studies are, however, often relatively undetailed or unspecific about the costs and benefits of such projects. Thus, although the literature provides a variety of methods to support the development and implementation of product configurators, it remains unclear how to estimate the costs and benefits for different scenarios [9]–[11].

To harvest the benefits of a PCS, great efforts and investments must be undertaken [5]. In this context, some research points to a lack of PCS maintenance as the main reason for project failure [10], [12]. The maintenance process in a PCS involves constantly updating the PCS knowledge base and being responsive to including alternative requirements [12].

The research presented in this paper uses a case company to investigate the financial consequences of not maintaining a PCS properly by comparing four of its projects. The aim is to evaluate the trend of the benefits gained from a PCS in the years after development. Here, the assumption is that PCS projects benefit from proper and constant maintenance and updating in the years after implementation [12]. This study also sets out to generalize from these findings concerning how the profitability of PCS projects in the years after development can be forecasted. To investigate these effects, the following propositions are developed:

Proposition 1. A PCS project will increase in popularity and produce greater benefits in the years after development in the case of continuous maintenance and updating.

Proposition 2. A PCS project will lose popularity and benefits in the years after development in the case of not employing continuous maintenance and updating.

To achieve this, we calculate the costs and benefits of four different projects during their last 3 years. In this context, we focus on the saved man-hours in the calculation of benefits in the four PCS projects. Then, we compare the yearly benefits from each project to illustrate any trend of change during the 3 years. Finally, based on the knowledge in the literature and our research propositions, we demonstrate the results using graphs and discuss the findings.

2 RELATED WORKS

The relevant literature was reviewed to clarify the present study's position in relation to existing research. This allowed us not only to ascertain whether this research has the potential to add to the existing knowledge but also to identify which parts of the available knowledge are relevant to define the study's scope.
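The two propositions above amount to a claim about the sign of the year-over-year change in a project's benefits. As a rough sketch, checking a proposition against a series of yearly benefit figures could look like the following Python snippet (the function name and the strict-comparison rule are our assumptions, not from the paper; the demo values are the case 1 and case 3 yearly benefits reported later in Tables 3–5):

```python
def trend(yearly_benefits):
    """Classify a sequence of yearly benefit figures as growing,
    declining, or mixed (strict comparison of consecutive years)."""
    pairs = list(zip(yearly_benefits, yearly_benefits[1:]))
    if all(b > a for a, b in pairs):
        return "growing"      # consistent with Proposition 1
    if all(b < a for a, b in pairs):
        return "declining"    # consistent with Proposition 2
    return "mixed"

print(trend([987_840, 1_560_000, 1_736_000]))  # case 1 -> growing
print(trend([160_000, 64_000, 16_000]))        # case 3 -> declining
```

A real test of the propositions would of course need more than three data points per project; this only formalizes what "trend" means in the comparison.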

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

In this section, the relevant literature on calculating PCS cost-benefits and PCS complexity is reviewed and subsequently utilized for calculating the ROI (return on investment) and PCS complexity in the four cases of this study. The importance of constant maintenance of PCS projects and its influence on PCS projects' profitability is also investigated. Finally, PCS complexity and complexity measurements are discussed.

2.1 Cost-benefit analysis for PCSs

Several studies have addressed the cost factors of PCSs. Forza and Salvador [7] mentioned that a large investment in terms of man-hours may be needed to implement a PCS. Hvam [13] reported that in one case, the cost of developing and implementing a PCS in a large ETO company was approximately USD 1 million, with operating costs of USD 100,000 per year. These costs are compared to the usage of the system, which is estimated in order to generate a budget and detailed quotations, according to which the total sales price is USD 200 million [11]. Haug et al. [14] elaborate on how man-hours in the configuration process can be reduced by up to 78.4%. Moreover, Hvam et al.'s [15] study indicates that after utilization of the PCS at the case company, the lead time required to generate an offer was reduced by 94–99%. The reduction can be traced to the automation of routine tasks and the elimination of iterative loops between domain experts, as a PCS makes all product knowledge available [16]. Three main types of benefits are mentioned in the literature [9]: (1) time reduction (man-hours and lead time), (2) product specification quality improvement and (3) sales increase.

The costs of configurators have also been discussed [7], [13]; they include software licenses (the cost of buying the software and annual licenses) as well as internal and external man-hours for modelling, programming, and implementing the configurator. The costs consist partly of the initial costs of making the configurator and partly of the annual costs of maintaining and operating it [17]. However, there are still some hidden costs, such as the time needed for people to learn and use the system – costs that, however, can be measured as man-hours [18].

2.2 Documentation and maintenance of PCSs

One of the main challenges when using PCSs concerns a lack of documentation, which can lead to incomplete and outdated systems that are difficult to understand [19], [20]. For a company using a PCS, it is therefore crucial to have an efficient system for documenting the structure, attributes, and constraints modelled within the system, as well as to facilitate communication between PCS developers and domain experts [12]. Documentation is a vital part of all IT projects, as it is used for sharing knowledge between people and reducing knowledge loss when team members become inaccessible [10], [21], [22]. The documentation of a PCS includes modelling, maintaining and updating the product model, and storing all information related to the products' attributes, constraints and rules in the PCS [4].

Studies of companies using configurators have revealed that, without a planned systematic approach, companies are unable to develop and maintain their configurators [20]. Modelling techniques are used as documentation tools alongside the tasks of communication and validation of product information [16], [23]. Research supports the modelling process by adding software support and integrating different modelling techniques, such as PVM (product variant master) and CRC (class-responsibility-collaboration) cards [8], [24]. Other studies report that one of the main reasons for PCS project failure is the lack of proper documentation and maintenance [10]. In other words, not performing maintenance and updating tasks can have significant negative consequences [25], [4], [26], [27].

The economic implications of the maintenance of data repositories can be examined in terms of costs and benefits [28]. The results of such analyses indicate that although the cost of maintaining a PCS can be relatively high, it cannot be ignored, as neglecting it risks wasting the investment. Hence, economic and business benefits can justify the maintenance costs, as continuous updates of the PCS are needed to ensure the accuracy and timeliness of the configuration data.

2.3 Complexity analysis for PCSs

Complexity is one of the most discussed challenges in software development and maintenance [16], [29]. In PCS projects, the complexity of the PCS is associated directly with the complexity of the targeted products. As PCS complexity increases, the task of maintaining the PCS becomes more challenging and costly [30]. To measure the complexity of PCSs, Brown et al. [31] define three major complexity dimensions: 1) execution complexity, 2) parameter complexity, and 3) memory complexity. Execution complexity covers the complexity involved in performing the configuration actions that make up the configuration procedure, while memory complexity refers to the number of parameters that the system manager must remember. In this paper, we measure the complexity involved in the knowledge that the domain expert provides during the creation of the configuration model [31], which can also be an indicator of the product complexity leading to the PCS complexity. Therefore, we assess the parameter complexity in terms of two major PCS knowledge base characteristics: attributes and constraints (Table 1).

Table 1. Complexity assessment in terms of parameters in a PCS [12]

                     No. attributes    No. constraints
Low complexity       500–1,300         200–800
Medium complexity    1,300–2,000       800–1,200
High complexity      >2,000            >1,200

In this paper, we measure the complexity of each PCS project as the sum of its attributes and constraints. The complexity is highlighted as part of the background information on the PCS projects, as higher complexity dictates higher effort in the development and maintenance tasks of PCS projects.
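Read as a lookup rule, the bands in Table 1 are threshold checks on the two parameter counts. A minimal sketch in Python (the function names, the treatment of counts below the stated "low" ranges, and the rule of taking the higher band when the two dimensions disagree are our assumptions, not from the paper):

```python
def complexity_band(n_attributes: int, n_constraints: int) -> str:
    """Classify a PCS knowledge base into the complexity bands of Table 1."""
    def band(count, medium_from, high_from):
        # 0 = low, 1 = medium, 2 = high
        if count > high_from:
            return 2
        if count >= medium_from:
            return 1
        return 0

    levels = ["Low", "Medium", "High"]
    # Assumption: take the higher of the two bands when they disagree.
    return levels[max(band(n_attributes, 1300, 2000),
                      band(n_constraints, 800, 1200))]

print(complexity_band(2500, 1300))  # -> High
print(complexity_band(600, 300))    # -> Low
```

Note that the paper's Table 2 reports only the sum of attributes and constraints per case, so the individual counts behind its High/Medium/Low labels are not published.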

3 RESEARCH METHODOLOGY

Different research has employed cost-benefit analysis for PCSs using different cost factors, such as saved man-hours, increased sales, improved quality, and reductions in errors and defects. To date, there is little research that investigates the trends of cost-benefits in PCS projects over time in order to estimate the changes in profitability considering different variants. The number of research papers providing detailed data from real case projects is also limited. Thus, an explorative approach was employed in the form of a case study. The company selected for the case studies produces highly engineered products and technology. More specifically, studies of four configurator projects were carried out at a large Danish ETO company, which produces chemical processing systems. The case company was chosen because it:

- offers highly engineered and complex products;
- had recently implemented PCS projects – including projects with frequent and with more sporadic maintenance efforts;
- had measured/estimated the costs and benefits of PCS projects over the last three years; and
- offered a unique level of access to project data.

The reason for choosing one case company for the four studies of PCS projects was to provide in-depth data analysis and to be able to observe the changes in benefits over time, while keeping many external factors (such as organizational culture, IT department and PCS software shell) as fixed as possible.

ETO companies normally engineer products with high complexity based on customer requests. Hence, we chose the case projects with the most complex products. The initial criteria for choosing the four projects were:

- maximum similarity between the four PCS project contexts, to keep external factors constant;
- differences in costs and benefits;
- similar users (engineers);
- different PCS use frequencies (number of generated quotes);
- two projects with continuous maintenance and two with limited maintenance during the last three years;
- the same IT team and the involvement of similar tasks during development and maintenance; and
- a similar software platform and integrations.

All the PCS projects focused on sales process automation in situations where the generated quotations were not significantly affected by market fluctuations but depended on the requests from the customer. Here, the company experts generate the quotes from the sales PCS in order to offer the price and all the specifications to the customer based on his/her requirements. Cases 1 and 2 received continuous, proper maintenance during the last three years, while cases 3 and 4 have rarely been updated. Table 2 shows information related to the four selected PCS projects. The complexity of the different projects indicates the effort of the task of maintaining the PCS, as more complex projects require more challenging and costly maintenance tasks.

Table 2. Background information on case studies

Case study   Complexity of the configurator (sum of attributes and constraints)
Case 1       High = 3,400
Case 2       Medium = 2,100
Case 3       Medium = 1,850
Case 4       Low = 790

In this paper, we focus only on saved man-hours as our benefits measure when comparing the four projects. The reason for choosing man-hours is the ease of access to these data, as well as the uncertainty related to other factors – such as increased sales and improved product quality, which could be the results of factors other than the PCS. The number of saved man-hours before and after using the configurator, and the benefits gained based on the saved man-hours, are calculated for the last 3 years. The total costs of each project are calculated based on the development, implementation and yearly running costs (such as licenses and maintenance activities) for the last 3 years.

Analyzing the costs and benefits from the last 3 years at the case company allows us to benefit from the comparative multiple case study method [32], [33]. Case-based research seeks to find logical connections among observed events, relying on knowledge of how systems, organizations, and individuals work [33], [34]. Furthermore, case studies provide researchers with a deeper understanding of the relations among variables and phenomena that are not fully examined or understood [35] – for instance, the impact of proper maintenance of PCS projects on a constant increase in gained benefits over the years.

4 CASE STUDIES

Table 3 illustrates the figures related to the benefits gained, based on saved man-hours, for each project during the first year, including development. All costs are in Danish Kroner (DKK).

Table 3. Calculation of the total benefits in DKK based on the saved man-hours in the first year of development (total costs: development + maintenance + licenses)

Case study   Quotes/year   Saved hours/quote   Total benefit/year (DKK)   Total costs (DKK)
Case 1       240           10.3                987,840                    527,000
Case 2       295           1                   118,000                    157,000
Case 3       200           2                   160,000                    565,000
Case 4       270           0.5                 54,000                     437,000

Table 4 illustrates the figures related to the benefits gained, based on saved man-hours, for each project during the second year. For the second year, we only calculated the maintenance costs and license costs (for users only), as there was no development, only maintenance.

Table 4. Calculation of the total benefits in DKK based on the saved man-hours in the second year after development (total costs: maintenance + licenses)

Case study   Quotes/year   Saved hours/quote   Total benefit/year (DKK)   Total costs (DKK)
Case 1       380           10.3                1,560,000                  136,000
Case 2       310           1                   124,000                    70,000
Case 3       80            2                   64,000                     40,000
Case 4       150           0.5                 30,000                     70,000

Table 5 illustrates the figures related to the benefits gained, based on saved man-hours, for each project during the third year. Again, for the third year, we only calculated the maintenance costs and license costs (for users only), as there was no development – only maintenance for cases 1 and 2, and none for cases 3 and 4.
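The benefit columns in Tables 3–5 are the product of quotes per year, saved hours per quote, and an hourly rate; the reported figures imply a rate of about 400 DKK per man-hour (our inference – the paper does not state the rate, and the case 1 figures deviate slightly from 400, suggesting the per-quote hours were rounded for publication). A sketch of the calculation, with hypothetical function names:

```python
def yearly_benefit_dkk(quotes_per_year, hours_saved_per_quote,
                       rate_dkk_per_hour=400):
    """Total yearly benefit from saved man-hours (assumed ~400 DKK/hour)."""
    return quotes_per_year * hours_saved_per_quote * rate_dkk_per_hour

def net_result_dkk(benefit_dkk, costs_dkk):
    """Yearly benefit minus development/maintenance/license costs."""
    return benefit_dkk - costs_dkk

# Case 2, year 1 (Table 3): 295 quotes saving 1 hour each -> 118,000 DKK,
# against first-year costs of 157,000 DKK -> a net of -39,000 DKK.
benefit = yearly_benefit_dkk(295, 1)
print(benefit, net_result_dkk(benefit, 157_000))  # prints: 118000 -39000
```

On the published figures this reproduces cases 2–4 exactly; it is only a reading aid for the tables, not the authors' calculation method.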

The numbers in Tables 3–5 show a positive trend towards a higher number of generated quotations and higher benefits for the case PCS projects with continuous maintenance efforts. They also demonstrate a negative trend in both costs and benefits for the PCS projects with no maintenance efforts. The results demonstrate that the rate of use of the PCSs increases over time if the system is maintained and updated.

Table 5. Calculation of the total benefits in DKK based on the saved man-hours in the third year after development (total costs: maintenance + licenses)

Case study   Quotes/year   Saved hours/quote   Total benefit/year (DKK)   Total costs (DKK)
Case 1       422           10.3                1,736,000                  115,000
Case 2       320           1                   128,000                    52,000
Case 3       20            2                   16,000                     22,000
Case 4       60            0.5                 12,000                     22,000

5 DISCUSSION

The case studies demonstrate a positive trend in the popularity and use of the PCSs with continuous updates, which leads to an increase in PCS profitability. Analyzing the cost-benefits across three years for cases 1 and 2 clarifies that if a PCS is maintained and updated continuously, it will be used more over time and turn into an important tool among the experts. In other words, experts give up using other tools (such as Excel sheets) and all use the configuration system. As the number of quotations generated increases, more man-hours are saved due to the automation, and the benefits of using the PCS increase. Figure 1 demonstrates the yearly benefits for each of the case studies. Apart from the complexity or size of the PCS and the number of saved hours, cases 1 and 2 presented PCS projects with a tendency towards increasing benefits over the years. However, as illustrated in Figure 1, cases 3 and 4 show a huge decrease in use and benefits during the last three years due to the lack of maintenance.

[Figure 1. The total benefits based on the saved man-hours for each of the PCSs during the last three years – a bar chart of the first-, second- and third-year benefits per case, with the values given in Tables 3–5.]

Moreover, as the benefits of the projects increase, there is a trend of decreasing total costs of the projects per year. As illustrated in Figure 2, for cases 1 and 2, the cost in year 1 includes the development cost based on the man-hours spent on development, while in years 2 and 3, there are just minor updates and maintenance. For cases 3 and 4, the maintenance efforts are just the costs paid for licenses, after which the costs decrease significantly.

[Figure 2. The total yearly costs for each of the PCSs over three years – first year: development + maintenance + licenses; second and third years: maintenance + licenses.]

It can be understood from the numbers and graphs that the cost of maintenance is very low compared to the investment in the initial development of a PCS. However, the maintenance efforts and costs can have a dramatic influence on the popularity and benefits of PCS projects.

6 CONCLUSION

The aim of this study was to understand the influence of the maintenance of PCSs on the gained benefits in terms of saved man-hours. Empirical data were gathered from an ETO company based on the previous 3-year results, which confirmed the propositions made. Specifically, the numbers for the costs and benefits for the last 3 years were available from the case company, and the complexity was estimated based on the number of attributes and the number of constraints in the PCS. The analysis of these data led to the conclusion that there is a positive correlation between continuous maintenance in PCS projects and the level of gained benefits in the selected case projects. Without maintenance efforts, the money in the case projects is wasted on licenses and other fixed costs for the PCSs. As the studied projects showed, projects that receive no maintenance can result in financial loss, since users do not use PCSs that are not up to date because of the lack of maintenance efforts.

This research is a first step in exploring the impact of maintenance on the saved man-hours in PCS projects. However, there are generalization limitations for the paper due to the limited number of cases and data. For future research, this study identified a number of factors that can influence PCS projects' costs and benefits and need to be further studied. These factors include employee experience, user expertise, the level of detail included in PCSs, and organizational culture.

In this study, we compared the four projects using only one variable, namely 'saved man-hours'. Thus, there is a need for further research to analyze the different factors that may contribute to the benefits of PCS projects.

Future research needs to cover both a variety of companies beyond ETO companies and a wider range of case studies. For practice, the results of the paper may motivate companies to give maintenance efforts a higher priority, as opposed to focusing solely on the development and implementation tasks.

REFERENCES

[1] A. Trentin, E. Perin, and C. Forza, "Increasing the consumer-perceived benefits of a mass-customization experience through sales-configurator capabilities," Computers in Industry, vol. 65, no. 4, pp. 693–705, May 2014.
[2] A. Felfernig, S. Reiterer, F. Reinfrank, G. Ninaus, and M. Jeran, "Conflict Detection and Diagnosis in Configuration," in Knowledge-Based Configuration: From Research to Business Cases, A. Felfernig, L. Hotz, C. Bagley, and J. Tiihonen, Eds. Morgan Kaufmann, 2014, pp. 73–87.
[3] C. Forza and F. Salvador, Product Information Management for Mass Customization: Connecting Customer, Front-Office and Back-Office for Fast and Efficient Customization. New York: Palgrave Macmillan, 2007.
[4] L. Hvam, N. H. Mortensen, and J. Riis, Product Customization. Springer Science & Business Media, 2008.
[5] S. Shafiee, L. Hvam, and M. Bonev, "Scoping a product configuration project for engineer-to-order companies," International Journal of Industrial Engineering and Management, vol. 5, no. 4, pp. 207–220, 2014.
[6] A. Felfernig, L. Hotz, C. Bagley, and J. Tiihonen, Knowledge-Based Configuration: From Research to Business Cases. Newnes: Morgan Kaufmann, 2014.
[7] C. Forza and F. Salvador, "Managing for variety in the order acquisition and fulfilment process: The contribution of product configuration systems," International Journal of Production Economics, vol. 76, no. 1, pp. 87–98, Mar. 2002.
[8] S. Shafiee, "Conceptual Modelling for Product Configuration Systems," Technical University of Denmark, 2017.
[9] A. Haug, S. Shafiee, and L. Hvam, "The costs and benefits of product configuration projects in engineer-to-order companies," Computers in Industry, vol. 105, pp. 133–142, 2019.
[10] A. Haug, S. Shafiee, and L. Hvam, "The causes of product configuration project failure," Computers in Industry, vol. 108, pp. 121–131, 2019.
[11] K. Kristjansdottir, S. Shafiee, L. Hvam, M. Bonev, and A. Myrodia, "Return on investment from the use of product configuration systems – A case study," Computers in Industry, vol. 100, pp. 57–69, 2018.
[12] S. Shafiee, L. Hvam, A. Haug, M. Dam, and K. Kristjansdottir, "The documentation of product configuration systems: A framework and an IT solution," Advanced Engineering Informatics, vol. 32, pp. 163–175, 2017.
[13] L. Hvam, "Mass customisation of process plants," International Journal of Mass Customisation, vol. 1, no. 4, pp. 445–462, 2006.
[14] A. Haug, L. Hvam, and N. H. Mortensen, "The impact of product configurators on lead times in engineering-oriented companies," Artificial Intelligence for Engineering Design, Analysis and Manufacturing, vol. 25, no. 2, pp. 197–206, Apr. 2011.
[15] L. Hvam, A. Haug, N. H. Mortensen, and C. Thuesen, "Observed benefits from product configuration systems," International Journal of Industrial Engineering: Theory, Applications and Practice, vol. 20, no. 5–6, pp. 329–338, 2013.
[16] S. Shafiee, K. Kristjansdottir, L. Hvam, and C. Forza, "How to scope configuration projects and manage the knowledge they require," Journal of Knowledge Management, vol. 22, no. 5, pp. 982–1014, 2018.
[17] T. Pisello, "IT Value Chain Management – Maximizing the ROI from IT Investments," Standish Report, 2003.
[18] P. R. Massingham and R. K. Massingham, "Does knowledge management produce practical outcomes?," Journal of Knowledge Management, vol. 18, no. 2, pp. 221–254, 2014.
[19] J. Tiihonen, M. Heiskala, A. Anderson, and T. Soininen, "WeCoTin – A practical logic-based sales configurator," AI Communications, vol. 26, no. 1, pp. 99–131, 2013.
[20] A. Haug, "A software system to support the development and maintenance of complex product configurators," The International Journal of Advanced Manufacturing Technology, vol. 49, no. 1–4, pp. 393–406, Nov. 2009.
[21] M. Alqudah and R. Razali, "A comparison of Scrum and Kanban for identifying their selection factors," in Proceedings of the 2017 6th International Conference on Electrical Engineering and Informatics (ICEEI 2017), pp. 1–6, 2018.
[22] S. Shafiee, "An agile documentation system for highly engineered, complex product configuration systems," in Proceedings of the 22nd EurOMA Conference, 2015.
[23] S. Shafiee, L. Hvam, A. Haug, and Y. Wautelet, "Behavior-Driven Development in Product Configuration Systems," in 20th Configuration Workshop, 2018.
[24] A. Haug and L. Hvam, "The modelling techniques of a documentation system that supports the development and maintenance of product configuration systems," International Journal of Mass Customisation, vol. 2, no. 1–2, pp. 1–18, 2007.
[25] B. P. Lientz, E. B. Swanson, and G. E. Tompkins, "Characteristics of application software maintenance," Communications of the ACM, vol. 21, no. 6, pp. 466–471.
[26] T. Blecker, N. Abdelkafi, B. Kaluza, and G. Friedrich, "Controlling variety-induced complexity in mass customisation: a key metrics-based approach," International Journal of Mass Customisation, vol. 1, no. 2, pp. 272–298, 2006.
[27] A. Felfernig, G. Friedrich, D. Jannach, and M. Stumptner, "Consistency-based diagnosis of configuration knowledge bases," Artificial Intelligence, vol. 152, no. 2, pp. 213–234, Feb. 2004.
[28] A. Even and G. Shankaranarayanan, "Utility-driven configuration of data quality in data repositories," International Journal of Information Quality, vol. 1, no. 1, pp. 22–40, 2007.
[29] B. Renzl, "Trust in management and knowledge sharing: The mediating effects of fear and knowledge documentation," Omega, vol. 36, no. 2, pp. 206–220, Apr. 2008.
[30] J. Tiihonen, T. Soininen, T. Männistö, and R. Sulonen, "State-of-the-Practice in Product Configuration – A Survey of 10 Cases in the Finnish Industry," in Knowledge Intensive CAD, 1996, vol. 1, pp. 95–114.
[31] A. B. Brown, A. Keller, and J. L. Hellerstein, "A Model of Configuration Complexity and its Application to a Change Management System," IEEE Transactions on Network and Service Management, vol. 4, no. 1, pp. 13–27, Jun. 2007.
[32] A. H. Van de Ven, "Nothing is quite so practical as a good theory," Academy of Management Review, vol. 14, no. 4, pp. 486–489, 1989.
[33] D. M. McCutcheon and J. R. Meredith, "Conducting case study research in operations management," Journal of Operations Management, vol. 11, no. 3, pp. 239–256, 1993.
[34] B. Kaplan and D. Duchon, "Combining Qualitative and Quantitative Methods in Information Systems Research: A Case Study," MIS Quarterly, vol. 12, no. 4, pp. 571–586, 1988.
[35] J. Meredith, "Building operations management theory through case and field research," Journal of Operations Management, vol. 16, no. 4, pp. 441–454, 1998.

Reusing Components across Multiple Configurators

Amartya Ghosh1 and Anna Myrodia2 and Niels Henrik Mortensen3 and Lars Hvam4

1 Department of Mechanical Engineering, Technical University of Denmark, Denmark, email: [email protected]
2 Department of Management Engineering, Technical University of Denmark, Denmark, email: [email protected]
3 Department of Mechanical Engineering, Technical University of Denmark, Denmark, email: [email protected]
4 Department of Management Engineering, Technical University of Denmark, Denmark, email: [email protected]

Abstract. The purpose of this paper is to examine the way in which an engineering company reuses components of existing configurators across multiple configurators. As the use of configurators has been extended across all lifecycle phases of products, product families, and services, companies tend to develop multiple configurators to support their business processes. Often, companies develop new configurators from scratch even though some existing configurators comprise components that serve a similar purpose. While the concept of reusability is discussed extensively in the software and expert systems development literature, it has not been addressed in the existing literature on product configurators. In this study, the research team primarily focuses on the approach of reusing and sharing components from existing configurators to develop new configurators in a multi-configurator portfolio. We also examine the benefits and challenges of this reusability approach. The research is supplemented with empirical evidence based on an exploratory case study. The results demonstrate the way in which an engineering company uses and structures multiple configurators, its experiences with the concepts of reusing and sharing configurator components, and the lessons learned.

1 INTRODUCTION

The increased demand for highly customized product and service offerings has led to companies adopting mass customization strategies to reduce delivery times, lower costs, and combat the challenges of product variant proliferation. This increase in product variation is accompanied by an increased amount of product information. This information is traded among the customers, the sales and production departments at the company, and the suppliers to generate valid customized product variants and the requisite product documentation [1]. Companies use information technology (IT) tools such as product and service configurators to automate the handling of this product information [1].

Configurators are knowledge-based IT systems which fulfill a configuration task. A configuration task is a special type of design activity [2] facilitated by a number of components, their corresponding properties and ports, and constraints which restrict the number of feasible combinations associated with the components [3]. Similarly, for service configurators, the configuration models for configurable services comprise types, their constituent attributes, and constraints organized in generalization and aggregation hierarchies [4].

The use of product configurators is associated with several benefits that have both a direct and an indirect impact on the lead time, quality, and cost of the customizable products [1,5,6]. The literature reports these benefits in relation to the different lifecycle phases of complex configurable products [7], the impact on human resources and sales performance [8], the return on investment of a product configurator project [9] and the level of maturity of the company [10]. Examples from case studies demonstrate the quantitative value of these benefits [11,12].

On the other hand, several studies report a number of challenges that companies face in realizing the benefits from their projects. These challenges are categorized in relation to IT systems, product modeling, organizational issues, resource constraints, type of products, and knowledge acquisition [13]. Versioning control [14], ensuring data quality [15,16] and data maintenance [15] are some additional challenges that companies face while implementing a product configurator. A number of the previously mentioned challenges arise because of the high number of product variants, the complexity associated with the difficulty of modeling and maintaining configurable products within a configurator, and the number of resources needed [17]. In the case of software and expert systems, the use of the concepts of modularisation and reusability has led to a reduction in development effort, risk and maintenance effort [18]. A number of studies on product configurators have addressed the issue of modeling a product family within a configurator by incorporating the principles of product modularisation and product platform strategies into the underlying product model and developing system-level structures [19,20]. However, these studies do not address the implementation of the concept of reusability across multiple configurators or the benefits and challenges associated with the implementation of such an approach in an industrial setting.

Therefore, this study aims to address this gap in the existing literature by exploring the experiences of companies reusing or sharing components across multiple configurators. The practical implications of this study are examined via a case study of an engineering company using multiple configurators. The research investigates how the configuration team at the company reuses and shares components across multiple configurator projects and the benefits and challenges associated with the use of the concepts of reusability and sharing in the development of multiple configurators.

The structure of the article is as follows. Section 2 presents a literature review that provides a theoretical background on the reuse of parts or modules across multiple product architectures, general software systems, expert systems, and configurators. Section 3 presents the research method. Section 4 introduces the

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

case company and the landscape of its configurator portfolio, and in section 5, the results from the case study are presented. Section 6 discusses the results from the case study in relation to the research question and the existing literature base. Section 7 presents the conclusions to the study and the areas of future research.

2 THEORETICAL BACKGROUND

This section provides an overview of the existing literature on related topics such as the reuse and sharing of modules in software, product platform design and expert systems, and identifies the gap in terms of reusability in configurators.

2.1 Modularity and reusability in products and product families

The concept of modularity has been discussed in depth in the recent literature in relation to the concepts of mass customization, product design, and complexity management [21]. The reason for the development and the extended advancement of the modularity concept, in both academia and industry, is the various benefits that its implementation brings.

A module is defined as an "essential and self-contained functional unit relative to the product of which it is part" [22]. The standardized interfaces and interactions of each module enable the creation of product variants by developing and producing a combination of different modules. By pursuing a modularization strategy, companies can achieve economies of scale while offering greater product variety to their customers, increase strategic flexibility by reusing modules across various product models and model generations, and concurrently develop the modules and product components [23]. In order to successfully implement a modularization strategy, companies need to choose the suitable degree of modularity in their products, effectively prioritize the requirements of the company functions while designing the modules, and coordinate the modular product development process across all the concerned organizational units [23].

The modularity of a product is an important characteristic of its architecture [24]. Based on the sharing of product architectures and standardized interfaces [25], the product platform approach enables companies to handle the proliferation of product variety. This approach entails the sharing of components, processes, knowledge, people, and relationships across a set of products [26].

The use of a platform strategy leads to a reduction in the development cost and time for new product variants. Moreover, the consequent reduction in the volume of parts and the number of associated processes leads to a reduction in material costs, logistics costs, procurement costs, inventory costs, and sales and services costs [26].

However, companies face several challenges while adopting a product platform approach. Product planning and marketing managers have to decide on the product variants that will meet the demands of various market segments while saving development and production costs. Designers face challenges in deciding what product architectures to use in deriving product variants from product platforms. As more departments with differing goals and objectives get involved in the decision-making process, companies also face difficulties in maintaining a balance between the commonality and the distinctiveness of their products [26].

2.2 Modularity and reusability in software

The reusability of software, in terms of approaches and methods, has been examined in depth by academia during the last few decades [27,28]. With regard to engineering, three main concepts of software reusability have been identified: application system reuse, component reuse, and object and function reuse [29]. An entire application system can be integrated into other systems without significant adaptations and changes to the reused system. An example of this reusability concept is a commercial ERP system that is used by different companies with entirely different product portfolios. Component reusability refers to cases of reusing only a component of one system or sub-system in another. For instance, software libraries can be reused by several systems containing information about different products. The last concept describes the reusability of software components that implement a single and well-defined object or function in a system. An example of this is the reuse of the price calculation logic described in a software component across several systems, such as quotation solutions, ERP systems, and order placing systems.

As expected, there are several benefits associated with the reusability of software [18]. One of the main benefits is the ensured dependability of the software, since the software has already been tested in another environment. Moreover, the reuse of existing software leads to a reduction in development effort and in the risk of implementation. For commercial software, the familiarity of the user with the system increases efficiency and productivity and reduces the risk of errors. In particular, reusing components or objects/functions of a system supports faster software development in terms of time, costs and resources, and it allows specialists, who carry experience, knowledge and best practices regarding the reused modules, to be involved.

On the other hand, there are a number of challenges associated with the concept of software reusability in any of the three forms described before. The integration of a new piece of software into the existing IT landscape, or the integration of an add-on modular component into an existing system, usually requires customizations. These customizations can lead to compatibility issues. Furthermore, companies incur a high cost of hiring experts to make these customizations to the existing software.

Companies may also utilize a software product line engineering (SPLE) approach to develop software systems by reusing assets created throughout the software product development lifecycle. This approach is characterized by two lifecycle processes: domain engineering and application engineering [30]. Domain engineering deals with the development of the reusable assets, which constitute the product line infrastructure [30]. Application engineering involves combining these reusable assets with product-specific assets to create the final software product [30]. Companies may combine several inter-dependent SPLs into a multi-product line (MPL) to develop large or ultra-large software systems [31]. The benefits associated with the implementation of the SPLE approach include increased developer productivity, improved quality, reduced maintenance efforts, reduced code size, reduced consumption of resources, and reduced time-to-market of the products [32]. However, companies utilizing the MPL approach also face challenges in structuring the MPL models during the domain engineering process [33] and in handling the technical and organizational dependencies between the constituent product lines during the derivation of a software product by multiple users [34].
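The object/function-level reuse described above, where a single price-calculation component is shared by systems such as quotation and order-placing solutions, can be illustrated with a minimal sketch. All function names and pricing rules here are invented for illustration and are not taken from any of the systems discussed:

```python
# Minimal sketch of object/function-level reuse: one pricing component
# shared by two otherwise independent systems. Names and rules are
# invented for illustration.

def calculate_price(base_price: float, quantity: int, discount_rate: float = 0.0) -> float:
    """Reusable pricing logic: quantity-based total with an optional discount."""
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    return round(base_price * quantity * (1.0 - discount_rate), 2)

def quotation_total(lines):
    """A quotation system prices lines that may carry negotiated discounts."""
    return sum(calculate_price(price, qty, disc) for price, qty, disc in lines)

def order_total(lines):
    """An order-placing system reuses the identical, already-tested logic,
    so both systems apply the same pricing rules by construction."""
    return sum(calculate_price(price, qty) for price, qty in lines)

total = quotation_total([(100.0, 3, 0.10), (50.0, 2, 0.0)])  # 270.0 + 100.0 = 370.0
```

Because both systems call the same tested component, a change to the pricing rules is made once, which reflects the dependability and reduced maintenance effort that the literature attributes to this form of reuse.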

2.3 Reusability in expert systems and product configurators

As with software systems, several studies have explored the use of the concept of reusability in developing expert systems [35]. An expert system refers to "a computer program that represents and reasons with knowledge of some specialist subject with a view to solving problems or giving advice" [36]. It comprises a knowledge base, an inference engine, a knowledge acquisition system, and a user interface system [35]. At the highest level of abstraction, an expert system can be decomposed into two components: the knowledge base and the problem-solving methods, both of which can be reused to build other expert systems [35].

The creation of the knowledge base is a very resource-intensive task and is often a bottleneck in the development process of an expert system. Therefore, developers can achieve significant savings in time and costs by reusing the knowledge base across different problem-solving methods, even though the knowledge base may require adaptation to suit the problem scenarios. Developers can also reduce maintenance efforts by reusing previously tested problem-solving methods across multiple knowledge domains [35]. In certain studies, researchers have also addressed the decomposition of the problem-solving methods into constituent lower-level components in different environments, using architectures such as INDEX [37] and PROTÉGÉ [38]. Moreover, certain architectures and specifications such as CORBA [39] allow for the reuse of expert system components across different platforms and different development environments.

As configurators can generate valid configurations based on the underlying configuration model, they are considered to be typical examples of expert systems [40]. A couple of studies have proposed approaches for making system-level configurations. An example of this approach is the SAP2 configurator that integrates sales, product and production configuration using specific sub-modules [19]. The configurator uses an underlying configuration model for a product family, called the GBoFMO (Generic Bill of Functions, Materials and Operations), that unifies the functional view, the product component view and the corresponding production operations and resources view.

Another approach for system-level configuration identified in the literature is a configurator prototype that manages system-level platforms and incomplete product configurations for engineer-to-order (ETO) companies and project-based businesses [20]. These projects consist of system-level configurations and multiple product configurations. Each product configuration is decomposed further into its constituent subsystems and parts. The system-level configurations are based on high-level templates containing the system-level parameters. The system configuration provides inputs into the product configurations, while the product configurations feed back any changes to the parameters to the system-level configuration. The system-level configuration instantiates the product configurations through the use of a common template and domain-specific vocabulary.

2.4 Benefits and challenges of reusability

In the previously mentioned studies on the challenges of configurator implementation and usage, a number of companies were unable to realize the benefits from the implementation of configurators. The reasons for this inability include difficulties in acquiring, formalizing and managing product knowledge, handling rapid product changes, and the inability of the configurators to cover the entire product portfolio [41]. Companies producing highly complex products also faced challenges in clearly defining the product families that would be represented in the configurators [13]. Moreover, a number of companies faced challenges due to a lack of resources for developing and maintaining the configurators and developing integrations to other IT systems [42]. Companies can reduce the development and maintenance efforts for their portfolio of configurators by adopting the concepts of reusability and sharing, as utilized in the design of product platforms, general software systems, and expert systems, in the development of new configurators. However, the aforementioned studies on the SAP2 configurator and the system-level configurator do not explicitly address how companies can adopt the concepts of reusability and sharing in structuring multiple configurators, or the benefits and challenges they face in the implementation of such an approach.

Thus, there is a lack of empirical evidence on how companies which have successfully implemented multiple configurators structure their configurators, and on what components are reused or shared across different configurators. This study aims to address this gap by conducting a case study at a company using multiple configurators to find out how companies utilize the concept of reusability in structuring their configurators and what benefits and challenges they face by adopting this particular approach. To achieve this goal, the following research question is formulated:

RQ: How do engineering companies utilize the concept of reusability and sharing to structure their product configurators?

3 RESEARCH METHOD

To answer the RQ, the research team conducted an exploratory case study at a Danish engineering company utilizing the concept of reusability and sharing of components across multiple configurators to support the sales processes. The specific company was selected as it is considered to be representative in terms of using multiple product and service configurators to support the sales processes pertaining to its ETO products. The company is sufficiently mature, as it has been using configurators for sixteen years. The reason for applying case study research is to study still unknown variables and not entirely understood phenomena in their natural settings [43]. The unit of analysis in this study is the configurator portfolio which has been built up by the company.

Data collection was conducted in the form of semi-structured interviews with members of the configuration team at the company. The interviewees were selected based on their years of experience, knowledge, and level of involvement in designing the configurator set-up. The research team interviewed the IT project manager and the business owner from the configuration team, as both these team members had extensive experience in the development, implementation and maintenance of the configurators. The business owner is responsible for coordinating with the stakeholders from the different business areas and for the prioritization of new configurator projects. She has been working on configurators at the company for the last seven years. The IT project manager was responsible for the project management tasks related to the configurators and supported the team in handling some maintenance tasks on the configurators. She has been working with the configuration team for nearly four years.
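The decomposition discussed in Section 2.3, a reusable knowledge base separated from reusable problem-solving methods, can be illustrated with a toy sketch. The rules below are invented examples, and the simple forward-chaining loop stands in for the far richer problem-solving methods of real expert-system architectures:

```python
# Toy separation of an expert system into a knowledge base (rules as data)
# and a problem-solving method (a generic forward-chaining loop). Either
# part can be reused independently: the loop with another rule set, or the
# rules with another reasoning method. The rules are invented examples.

knowledge_base = [
    # (set of premises, conclusion)
    ({"high_pressure"}, "open_valve"),
    ({"open_valve", "pump_on"}, "flow_established"),
]

def forward_chain(facts, rules):
    """Generic problem-solving method: derive every fact reachable from
    the initial facts under the given rules."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

derived = forward_chain({"high_pressure", "pump_on"}, knowledge_base)
# Both "open_valve" and "flow_established" are derived.
```

Keeping the rules as plain data means the same `forward_chain` method can be reused unchanged with a different knowledge base, mirroring the reuse of problem-solving methods across knowledge domains described above.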

The research team opted to conduct semi-structured interviews to impose an overall structure on the ensuing discussion and to provide a direction to guide the interviewees, while allowing for some flexibility for the interviewees. Based on the reviewed literature on the concepts of modularity and reusability in physical products, software systems and expert systems discussed in the theoretical background section, the research team prepared a list of questions, which are presented in the Appendix. The duration of each interview was one hour.

The questions address the topic of reusing and sharing components across multiple configurators and the benefits and challenges that the company has faced while implementing such an approach. The questions categorized as "Company-specific questions" aim at providing an overview of the case company's experience in developing and implementing configurators. The interviewee-specific questions address how the configuration team has set up multiple configurators and how they have utilized the concept of reusing and sharing configurator components across multiple configurators.

As the research team was also investigating the aforementioned approach from a business perspective, the last two interviewee-specific questions address the impact of implementing this approach. In particular, the interviewees were asked about the benefits and the challenges of implementing a modular concept on the product configurator portfolio. First, they were asked to answer this question based on their experience, and then they were presented with a list of the benefits and challenges identified in the literature, to ensure that all the relevant topics were taken into account.

The collected data used for the analysis were qualitative in nature, primarily based on the interviews. Supplementary material provided included schematics of the IT landscape and a demonstration of the configurators, to allow the research team to develop a better understanding of the configurator portfolio.

4 CASE DESCRIPTION

The selected case company is an engineering company providing solutions ranging from specific equipment to complete plant solutions. The company operates globally, serving the food, dairy, chemical, and pharmaceutical industries, and has a yearly turnover of approximately 305 million €. The complete solutions provided include some standardized customize-to-order (CTO) products, but they also require customized ETO products for specific customers.

As mentioned earlier, the company has been using configurators for sixteen years, with the current portfolio of configurators covering approximately 50% of the entire product portfolio of the company. In the case of ETO products, the concerned configurators are capable of generating full or partial configurations.

The configurators support the sales and service phases of the products and solutions offered to the customers, particularly the tendering and procurement processes. The configuration team consists of 5 people in total: the business owner, an IT project manager, one configuration engineer focusing on product and service modeling activities, and two software developers. The software developers focus primarily on developing add-ons and plug-ins to enhance the functionality of the configurators. They are also responsible for the development and maintenance of the integrations with the other IT systems being used by the company. The team is supported by a super-user from each business area, who is responsible for collating change requests from the end-users of the configurators and documenting these requests in a dedicated documentation system. Moreover, the configuration team is also supported in the development and maintenance of a number of configurators in a specific business area by another configuration team. Only the configuration teams are allowed to make updates and perform maintenance tasks on the configurators.

4.1 IT landscape: Configurators and integrations

The overall set-up of the configurators at the company can be described in four levels. Figure 1 illustrates the distinction between the different levels. The plant configurator describes the complete plant, including all the specific equipment and the services related to them. The plant configurator also contains some constraints and knowledge about the specific plant type covered by the configurator. Each plant is first decomposed into several systems, specific to each plant. Each system-level configurator contains knowledge about the variants of each system. On the equipment level, each configurator contains product knowledge and constraints pertaining to specific equipment. The company also uses a global service configurator, which is used across the equipment configurators.

When configuring a plant, the plant configurator calls the specific system configurators, which, in turn, call the requisite equipment-level configurators. The service configurator provides service information for different plants and equipment. All the configurators are developed using the same commercial configuration system software.

Figure 1. Multi-level configurator set-up at the case company

At the plant level, the plant configurators have integrations to the internal software system for making engineering calculations, the pricing and the quotation databases, the product data management system for the generation of piping and instrumentation drawings, the document generator system, and the calculation portal that is used for manual cost calculations and for importing the values to the ERP system for project budgeting purposes. At the equipment level, the configurators are only integrated to the document generator system.

Apart from the integrations to external systems, the commercial configuration system software that the company uses enables each configurator to call other configurators at a lower level. From a practical point of view, to configure a complete solution (plant), the configuration process starts from the plant level. At this level,

the user decides on the plant systems, and based on this selection, the required equipment is then individually configured. However, in certain scenarios, the end-users can also use the equipment configurators independently, without having to configure any overarching plants or systems. For example, end-users in the procurement department might require the prices for a specific configuration of particular equipment and would therefore only need to use an equipment configurator, instead of first creating a plant-level configuration and a plant system-level configuration.

A super-user from each business unit is responsible for the collation of change requests or bugs from the end-users of the configurators under the purview of the business unit. The super-users document their requests in a dedicated documentation system, which the configuration team can use while updating the concerned configurator models. After making a change to a particular configurator model, the configuration team follows a standard procedure to document the changes. Then, the IT project manager approves the changes before they are released for use by the business units. The configuration team also uses a version control system for the configurator models, allowing them to revert to an older version of a configurator model in case any issues arise from the changes made.

5 RESULTS

The following section presents and discusses the results of the case study. It focuses on the implementation of the concept of reuse and sharing of configurator components at the case company and the benefits and the challenges the company has experienced using this approach.

5.1 Reusability and sharing

The configurators have a high degree of sharing and reuse of parts of the system, both within the same level and across different levels, as illustrated in Fig. 1. When reusing components from existing configurators to develop new configurators, the configuration team uses the product model from an existing configurator, either entirely or partially, and adapts it to suit the purposes of the new configurator project. For instance, two pieces of equipment in two separate business application environments might have similar product models, but one piece of equipment may be more complex than the other. In this case, the configuration engineer may reuse the existing product model of one piece of equipment, either partially (by making some changes) or fully, while developing the configurator for the other.

Additional areas of component reuse across different configurators at the same level relate to the reuse of logic components, e.g. ways of calculating values and generating documents. The components of the configurator model that are reused are usually the ones that do not require frequent changes to the structure of the product model.

At the plant-system level, the concept of sharing becomes more apparent. As mentioned earlier, each plant is decomposed into a number of systems, which, in turn, are composed of certain equipment. These plant-systems are unique to each plant. However, the same type of equipment may be used to meet the needs of different plant-systems and plants. In such cases, the equipment-level configurators are shared across multiple plant-system configurators. For example, if a family of blowers can be used in two different plants, then the company only uses one configurator to store the product knowledge, and the two different plant configurators call that equipment configurator when required.

As the plant-level configuration is highly dependent on the business application environment, the responsible business units are primarily responsible for deciding what can be shared, for example, based on the material used for the different equipment and the prices. The business units are responsible for the overall setup of the plants and the constituent plant-systems and equipment. When a particular business unit requests the development of a configurator for new equipment, the configuration team starts building a configurator model based on the input from that business unit. However, based on the tacit knowledge of the configuration team and the documentation available on the existing configurator models, the team may decide to reuse an existing model from another application environment and adapt it to meet the needs of the current context. Furthermore, the business units may also coordinate with the procurement department to find any existing configurators which may already cover their user requirements.

5.2 Benefits

The primary benefit reported during the interviews was the standardization achieved across the business units in the company. Standardization is a benefit that is usually associated with configurators. In this case, the standardization refers to both the products and the processes. The standardization covers not only the product models that were modeled into the configurators but also the sales processes supported by them, the roles and responsibilities of the stakeholders, and the maintenance and change tasks associated with the configurators. Having these standard procedures in place allows the configuration team and the business units to improve their efficiency, improve communication, and keep track of the change requests and changes to the configurator models.

Furthermore, the multi-configurator set-up allows for a more modular representation of the product portfolio, including both individual products and complete solutions. In particular, the introduction of the plant-system configurators supported the decomposition of the plant into its constituent systems, which rendered the development, maintenance, and testing tasks much easier and quicker.

The interviewees assigned more importance to the benefits of standardization of the associated processes than to the reduction in the development, maintenance, and testing efforts. Nevertheless, there are variations regarding the time spent on these tasks. In particular, the development time for an equipment configurator is approximately six months, with an additional one or two months allocated for initial testing, whereas the plant configurators take around one year to develop. During the development phase of a configurator project, the effort spent on knowledge acquisition is more significant than the actual effort dedicated to the modeling of the configurator.

With reference to handling changes, the interviewees again highlighted the improvements in efficiency. These improvements arise due to the differences in frequencies of change requests across different levels. For instance, the plant configurator models were quite stable and required one or two changes annually. Since

the frequent changes are generally limited to the equipment configurators, the maintenance and testing effort is lower than the effort required for maintaining and testing the plant configurators. By having a clear overview of the tasks required and of the level of the configurator portfolio they are assigned to, the team can predict the maintenance efforts more accurately and improve the efficiency of executing these tasks.

Another benefit highlighted is related to improved communication and knowledge sharing in the company. The particular setup of the configurator portfolio allows for better communication among various stakeholders, especially cross-organizational communication, e.g. when a business unit raises a request for new configurators with the configuration team. In such a scenario, the configuration team can use existing configurators for similar equipment or plants within the same or a different application environment to show the stakeholders from the business unit the scope and the functionalities which the configurators offer. In that sense, the reusability refers both to the configuration models and to the knowledge and experience regarding the processes supporting the configurator.

5.3 Challenges

On the other hand, the interviews revealed several challenges that the case company is facing while implementing the concepts of sharing and reusability across the different configurators. The main challenge concerned the coordination of the teams. As mentioned earlier, two configuration teams are responsible for developing, implementing and maintaining the configurators and their integrations to other IT systems. If an equipment configurator model is updated without informing the rest of the team, this might adversely affect the functioning of the overarching plant models containing that equipment configurator. The interviewees emphasized the coordination between the teams to ensure that such issues would not arise while making changes or updates to existing models, particularly because the case company operates globally and the team members are based in different geographical locations. The coordination-related challenges also concern roles and responsibilities. Even though the set-up of the configurators ended up supporting the transparency of the roles and tasks, in the beginning it was not clear how the distinction was made and how the changes to the configurator models were communicated to the stakeholders.

Version control is another important challenge faced by the company. While the configuration team utilizes a version control system for the entire set of configurators, the team still faces a challenge in deciding which version of the product model to save. When the changes to any of the configurators are limited (e.g. an update of prices), the configuration team does not save the previous version first before making the changes. However, when the change affects several levels of the configurator set-up, the team requires more time for implementation and testing. In this case, the previous version is saved, as it is significantly different from the updated version. Another challenge faced by the configuration team is the maintenance of compatibility across the plant configurators, the plant-system configurators and the equipment configurators. In certain situations, the changes made to the plant configurators result in errors if the lower-level models are not updated accordingly.

The company also faces challenges in deciding on the role of the business units in the scoping and decision-making phases of the configurator development projects. The business units are responsible for making decisions regarding the overall structure of the configurators. They also play a crucial role in defining the parts of the configurators that are shared, based on their alignment with the strategic goals of the company and the market needs. The initial user requirements set by the business units always need to be adjusted, requiring several iterations, and quite often the final result is very different from the initial one.

6 DISCUSSION

This study addresses the issue of how engineering companies using multiple configurators utilize the concept of reusability and sharing of configurator components in developing their configurators, and the benefits and challenges that the companies face in adopting this approach. The study presents the case of a Danish engineering company which has been using configurators to configure ETO plants and the constituent plant-systems and equipment.

The findings from the literature study indicate that the concepts of reusability and sharing have been extensively examined in the fields of product development, software systems, and expert systems. In the case of products, the use of reusable modules and product platform strategies benefits companies by leading to a reduction in development time, cost of new product variants, manufacturing costs, and inventory costs. In the case of software systems and expert systems, the reuse of components, such as libraries, problem-solving methods, and knowledge bases, leads to a reduction in the development, testing and maintenance effort associated with these systems. These findings from the literature study, along with the results from the case study, provide an answer to the RQ. The empirical evidence demonstrates how an ETO company develops multiple configurators to support its business processes by reusing and sharing different configurators and configurator components. It also explains how these concepts are utilized when developing new configurators for different business units of the company. Furthermore, the study also addresses the benefits and challenges associated with the implementation of such an approach.

The way in which the case company structures its configurators into different levels is similar to the system-level configuration approach proposed in [20]. In both approaches, the system-level (plant-level, in the case of the case company) configurator is modeled first, followed by the modeling of the constituent configurations. Both approaches also allow for partial configurations for ETO products. However, the system-level configurator prototype described in [20] does not address the issue of using the concept of reusability and sharing of configurator components across multiple configurators covering different projects or ETO products, nor its benefits and challenges. The benefits that the case company has generated from the use of the concept of the reusability and sharing of configurator components are similar to the benefits of the reuse of knowledge bases and problem-solving methods in expert systems [1,5]. However, the interviewees noted that the development time of the model itself was insignificant compared to the time required for product knowledge acquisition from the business units.

With reference to the challenges of implementing a modular approach to the set-up of the configurator portfolio, the findings

58 from the case study are aligned with the findings from the literature configuration systems. Int J Prod Econ 2002;76:87–98. [13,14]. Change management, knowledge acquisition and doi:10.1016/S0925-5273(01)00157-8. maintenance of the models are the main issues addressed in the [2] Mittal S, Frayman F. Towards a generic model of configuration findings of this research and can be primarily associated with the tasks. Ijcai-89 Proc Elev Int Jt Conf Artif Intell 1989:1395–401 concepts of sharing and reusing parts of the configurators. vol.2. These findings provide strong empirical evidence to support [3] Felfernig A, Friedrich GE, Jannach D. UML as domain specific managerial decisions in terms of designing and structuring a configurator portfolio. The insights from the case study can be used language for the construction of knowledge-based configuration as guidance when defining the scope of a multi-level configurator systems. Int J Softw Eng Knowl Eng 2000;10:449–69. set up and the establishment of the cross-organizational [4] Heiskala M, Tiihonen J, Soininen T. A conceptual model for collaboration among the teams involved in the development configurable services. Pap. from Config. Work. IJCAI’05, 2005, process. p. 19. [5] Hvam L, Haug A, Mortensen NH, Thuesen C. Observed benefits 7 CONCLUSION AND FUTURE from product configuration systems. Int J Ind Eng Appl Pract RESEARCH 2013;20:329–38. [6] Felfernig A, Hotz L, Bagley C, Tiihonen J. Knowledge-based The focus of this study is on examining the concepts of reusability configuration: From research to business cases. Newnes.; 2014. and sharing across multiple configurators. While these concepts are [7] Myrodia A, Kristjansdottir K, Shafiee S, Hvam L. Product well established in the fields of product modeling and software configuration system and its impact on product’s life cycle development, they have not been addressed in depth in relation to complexity. IEEE Int. Conf. Ind. Eng. Eng. Manag., vol. 
2016- product configurators. The research team conducts a case study to Decem, 2016. doi:10.1109/IEEM.2016.7797960. investigate how these concepts are implemented in an industrial [8] Haug A, Shafiee S, Hvam L. The costs and benefits of product setting. The results from the literature review and the case study provide an answer to the developed RQ on how the concept of configuration projects in engineer-to-order companies. Comput reusability in configurators is being used by engineering Ind 2019;105:133–42. doi:10.1016/j.compind.2018.11.005. companies. [9] Kristjansdottir K, Shafiee S, Hvam L, Bonev M, Myrodia A. This study contributes to the existing knowledge on the Return on investment from the use of product configuration modeling and scoping of configurators by looking at how an systems – A case study. Comput Ind 2018;100:57–69. engineering company structures multiple configurators for doi:10.1016/j.compind.2018.04.003. configuring ETO products using the concepts of reusability and [10] Myrodia A, Randrup T, Hvam L. Configuration lifecycle sharing of different configurator components. Practitioners in the management maturity model. Comput Ind 2019;106:30–47. industry can also gain some insights into adopting these concepts doi:10.1016/j.compind.2018.12.006. while modeling and scoping product configurator projects, as the [11] Hvam L. Mass Customization in the electronics industry. Int J case company is representative of engineering companies Mass Cust 2006;1:410–26. producing complex ETO products and utilizing multiple configurators to support their business processes. [12] Forza C, Salvador F. Product configuration and inter-firm co- This study focuses only on one case company and how they use ordination: an innovative solution from a small manufacturing the concept of reusability and sharing of configurator components enterprise. Comput Ind 2002;49:37–41. to structure multiple configurators. 
Consequently, the discussion of [13] Kristjansdottir K, Shafiee S, Hvam L, Forza C, Mortensen NH. the benefits and the challenges, which arise out of the usage of this The main challenges for manufacturing companies in concept, pertain only to the specific case company and the scope implementing and utilizing configurators. Comput Ind 2018:196– and structure of their configurators. Therefore, future work will 211. doi:10.1016/j.compind.2018.05.001. focus on increasing the number of case companies, thereby making [14] Heiskala M, Tihonen J, Paloheimo K-S, Soininen T. Mass the results of the study more generalizable to the industry at large. Customization with Configurable Products and Configurators. Another limitation of the study relates to the role of the Mass Cust. Pers. Commun. Environ., IGI Global; 2011, p. 75– interviewees. For this study, we have interviewed only the business 106. doi:10.4018/978-1-60566-260-2.ch006. manager and IT project manager on the company’s configuration team. However, the business and technical units are responsible for [15] Rasmussen JB, Myrodia A, Mortensen NH. Cost of Not the underlying product architectures that are modeled in the Maintaining a Product Configuration System. Int J Ind Eng configurators. Therefore, in future work, we aim to interview Manag 2018;9:205–14. doi:10.24867/IJIEM-2018-4-205. business unit stakeholders at the case companies to incorporate [16] Haug A, Zachariassen F, Liempd D van. The costs of poor data their insights into the way in which the concept of reusability and quality. J Ind Eng Manag 2011;4:168–93. sharing is implemented and the benefits and challenges that they doi:10.3926/jiem.2011.v4n2.p168-193. perceive arising out of the implementation of this approach in [17] Asadi M, Soltani S, Gašević D, Hatala M, Bagheri E, Benavides developing configurators. D, et al. The effects of visualization and interaction techniques on feature model configuration. Empir Softw Eng 2016;21:1706–43. 
REFERENCES doi:10.1007/s10664-014-9353-5. [18] Williams LG, Smith CU. PASASM: A Method for the [1] Forza C, Salvador F. Managing for variety in the order Performance Assessment of Software Architectures. In: Balsamo acquisition and fulfilment process: The contribution of product

59 S, Inverardi P, Selic B, editors. WOSP ’02 Proc. 3rd Int. Work. [36] Jackson P. Introduction to Expert Systems. 3rd ed. Boston, MA, Softw. Perform., Rome, Italy: ACM New York, NY, USA; 2002, USA: Addison-Wesley Longman Publishing Co., Inc.; 1998. p. 179–89. doi:10.1145/584369.584397. [37] Dai W. Reusable expert system components development. Appl [19] Zhang LL, Vareilles E, Aldanondo M. Generic bill of functions, Artif Intell 1996;10:225–38. doi:10.1080/088395196118560. materials, and operations for SAP2 configuration. Int J Prod Res [38] Gennari JH, Musen MA, Fergerson RW, Grosso WE, Crubézy M, 2013;51:465–78. doi:10.1080/00207543.2011.652745. Eriksson H, et al. The evolution of Protégé: an environment for [20] Kristianto Y, Helo P, Jiao RJ. A system level product knowledge-based systems development. Int J Hum Comput Stud configurator for engineer-to-order supply chains. Comput Ind 2003;58:89–123. 2015;72:82–91. doi:10.1016/j.compind.2015.04.004. [39] Gennari JH, Cheng H, Altman RB, Musen MA. Reuse, CORBA, [21] Suzić N, Forza C, Trentin A, Anišić Z, Suzi N, Ani si Z. and knowledge-based systems. Int J Hum Comput Stud Implementation guidelines for mass customization: current 1998;49:523–46. characteristics and suggestions for improvement. Prod Plan [40] Hvam L, Riis J, Mortensen NH. Product customization. Berlin Control Manag Oper 2018;29:856–71. Heidelberg: Springer; 2008. doi:10.1080/09537287.2018.1485983. [41] He W, Mingz XG, Nix QF, Luy WF, Lee BH. A unified product [22] Miller TD, Elgard P. Defining modules, modularity and structure management for enterprise business process integration modularization. Proc. 13th IPS Res. Semin. Fuglsoe, 1998. throughout the product lifecycle. Int J Prod Res 2006;44:1757– [23] Persson M, Åhlström P. Managerial issues in modularising 76. doi:10.1080/00207540500445453. complex products. Technovation 2006;26:1201–9. [42] Colombo G, Furini F, Rossoni M. The Role of Knowledge Based doi:10.1016/j.technovation.2005.09.020. 
Engineering in Product Configuration. In: Eynard B, Nigrelli V, [24] Ulrich K, Eppinger SD. Product Design and Development. New Oliveri S, Peris-Fajarnes G, Rizzuti S, editors. Lect. Notes Mech. York: McGraw-Hill; 1995. Eng., Cham: Springer; 2017, p. 1141–8. doi:10.1007/978-3-319- [25] Harlou U. Developing product families based on architectures: 45781-9_114. Contribution to a theory of product families. Technical University [43] Voss C, Tsikriktsis N, Frohlich M. Case research in operations of Denmark (DTU), 2006. management. Int J Oper Prod Manag 2002;22:198–219. [26] Robertson D, Ulrich K. Planning for Product Platforms. Sloan doi:10.1108/01443570210414329. Manage Rev 1998;39:19–31. [27] Raatikainen M, Tiihonen J, Männistö T. Software product lines Appendix and variability modeling: A tertiary study. J Syst Softw Company specific questions 2018;149:485–510. doi:10.1016/j.jss.2018.12.027. [28] Mohabbati B, Hatala M, Gašević D, Asadi M, Bošković M. 1. How long has the company been using configurators? Development and Configuration of Service-Oriented Systems 2. What is the scope of the configurators? Families. Proc. 2011 ACM Symp. Appl. Comput. (SAC ’11), 3. Which areas of the product lifecycle do the configurators support? TaiChung, Taiwan: ACM; 2011, p. 1606–13. 4. To what extent do the configurators cover the product and doi:10.1145/1982185.1982522. services portfolio? [29] Singh S, Singh S, Singh G. Reusability of the Software. Int J 5. What is the nature of the products that are covered by the Comput Appl 2010;7:38–41. configurators? 6. What is the setup of the configuration team at the company? [30] Pohl K, Böckle G, Van Der Linden F. Software product line engineering: Foundations, principles, and techniques. Springer Interviewee specific questions Berlin Heidelberg; 2005. doi:10.1007/3-540-28901-1. [31] Holl G, Grünbacher P, Rabiser R. A systematic review and an 1. How is the configurator portfolio set up? 
expert survey on capabilities supporting multi product lines. Inf 2. What components do configurators share/reuse within/across the Softw Technol 2012;54:828–52. configurator portfolio? How do the configurators share/reuse [32] Van Der Linden F, Schmid K, Rommes E. Software product lines these components? in action: The best industrial practice in product line engineering. 3. How does the configuration team decide on what components to Springer Berlin Heidelberg; 2007. doi:10.1007/978-3-540-71437- share? 8. 4. How do you visualize/represent your product configurator models [33] Rabiser R, Grünbacher P, Holl G. Improving awareness during and how do you introduce new configurators into the existing portfolio/architecture? product derivation in multi-user multi product line environments. 5. What are the benefts and challenges of implementing this CEUR Workshop Proc 2010;688:1–5. approach? [34] Holl G, Vierhauser M, Heider W, Grünbacher P, Rabiser R. 6. How has the planning, development and implementation of Product line bundles for tool support in multi product lines. Proc. configurators changed over the years at the company? What are 5th Work. Var. Model. Software-Intensive Syst., 2011, p. 21–7. the reasons behind this change? [35] Gennari JH, Stein AR, Musen MA. Reuse for knowledge-based systems and CORBA components. Proc 10th Banff Knowl Acquis Knowledge-Based Syst Work 1996:461–4616.
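The multi-level set-up examined in the case study, and the coordination risk it creates, can be made concrete with a small sketch. The following Python fragment is purely illustrative and is our own construction, not the case company's commercial configurator platform: the class, the model names and the version counter are invented. It shows how a plant model can reuse plant-system models, which in turn reuse a shared equipment model, so that an uncoordinated update to the shared model concerns every higher-level model referencing it.

```python
# Hypothetical sketch of a multi-level configurator portfolio in which
# higher-level (plant) models reuse lower-level (equipment) models by
# reference. All names are illustrative.

class ConfiguratorModel:
    def __init__(self, name, version=1, children=None):
        self.name = name
        self.version = version
        self.children = children or []   # shared sub-configurators

    def bump(self):
        """Simulate an update to this model (e.g. a price change)."""
        self.version += 1

    def all_children(self):
        """All models this model reuses, directly or transitively."""
        out = set(self.children)
        for c in self.children:
            out |= c.all_children()
        return out

    def affected_parents(self, portfolio):
        """Names of higher-level models that reuse this model and
        therefore need re-testing after an update."""
        return {m.name for m in portfolio if self in m.all_children()}

# One equipment model shared by two plant-system models of one plant.
pump = ConfiguratorModel("pump_equipment")
cooling = ConfiguratorModel("cooling_system", children=[pump])
feed = ConfiguratorModel("feed_system", children=[pump])
plant = ConfiguratorModel("plant", children=[cooling, feed])
portfolio = [pump, cooling, feed, plant]

pump.bump()  # an uncoordinated equipment update ...
# ... concerns every configurator that reuses the model:
print(sorted(pump.affected_parents(portfolio)))
```

Such a dependency query is one way to support the coordination and version-control practices the interviewees describe: before saving a new version of a shared model, the team can enumerate which higher-level models must be re-verified.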

Adaptive Autonomous Machines - Requirements and Challenges

Lothar Hotz1 and Stephanie von Riegen2 and Rainer Herzog3 and Matthias Riebisch4 and Markus Kiele-Dunsche5

1 University of Hamburg, Germany, email: [email protected]
2 HITeC, University of Hamburg, Germany, email: [email protected]
3 HITeC, University of Hamburg, Germany, email: [email protected]
4 University of Hamburg, Germany, email: [email protected]
5 Lenze GmbH, Germany, email: [email protected]

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract. In mechanical and plant engineering, the general challenge is to achieve flexibility in order to process changes in the requirements or operating conditions of a machine at the site of the plant operator. Changes to the machine and its configuration require the operator to work together with the machine builder (or, for several machines, the plant constructor) and, if necessary, with his suppliers, which requires time and effort due to communication and delivery routes. Hence, an autonomously acting machine or component that deals with needed changes through automatically triggered adaptations would facilitate this process. In this paper, subtasks for constructing autonomous adaptive machines are identified and discussed. The underlying assumption is that changes of machines and components can be supported through configuration technologies, because these handle variability and supply automatic derivation methods for computing needed changes in terms of machine and component updates.

1 INTRODUCTION

In recent years, the demand for the industrial production of small quantities has increased steadily. Whereas in the past larger industrial plants were designed for the production of large quantities of exactly one product whose parameters did not change, today the possibility of fast, flexible adaptation to changes in product lines is becoming increasingly important. While an adjustment of the machine settings is often sufficient for minor changes, larger adjustments require a modification of a machine by the machine manufacturer, or even changes to a complete production plant. For this purpose, the dependencies of individual plant components must be taken into account; e.g., the use of a stronger motor at one point would possibly also require the use of a drive shaft that can withstand higher torques. If individual plant modules can be configured to give a higher or lower speed, other modules could be enabled to achieve a higher accuracy instead of a higher throughput.

In this paper, we present first considerations for enabling machines or components to initiate adaptations of their configurations themselves. Firstly, we discuss the current process in plant engineering for adapting machines (Section 2). In Section 3, we provide an illustrating example; in Section 4, we present our concept for autonomous adapting machines; and in Section 5, we discuss the main challenges for realizing such machines, especially with respect to configuration tasks such as reconfiguration.

2 CURRENT SITUATION IN PLANT ENGINEERING

The component manufacturer has developed products whose product features can be configured in a variety of ways to cover a wide range of missions [7]. For a power unit, such product characteristics are: torque, rotary speed, type of sensors, type of actuator, mechanical interfaces, electrical interfaces, control characteristics, functional characteristics, but also something as basic as color. These have to be considered as drive systems, which in some cases have to be understood as poly systems, since there are central components, such as the power supply or a controller, which specify a coordinated movement, as in robot kinematics.

The interactions between the power unit components of a drive axle or a drive system, from the connection to the control system up to the driven components or the mechanics, can be very extensive, so that the machine builder depends on the knowledge of the component or solution supplier. The development of suitable drive solutions is therefore usually carried out in close cooperation these days. On the other hand, machine builders want to reuse their developed results wherever possible, which forces them to modularize their machine solutions. Machine modules are then defined which combine one or more drive axes or drive systems.

A general overview of the current plant engineering process is given in Figure 1. After the mission and the classification of the client's requirements, a concept is created that may include existing solutions. Requirements for such solutions can be functional (movements, production steps) and constructive characteristics (dimensions, interfaces such as connecting elements, etc.). When deciding on a solution, various aspects have to be taken into account. Some lead to severe restrictions, others are free, and still others have consequences for other solutions or components. If automation or partial solutions are available (selection of partial or automation solutions), these can be integrated into a machine solution. Integration refers to function, design, parameterization and wiring, but also to organizational issues such as spare parts inventory and documentation. The planned mechanical design can be verified by means of a simulation (simulation). Once a decision has been made on an overall solution, the electrical, pneumatic and hydraulic design (construction) is carried out. This is incorporated into the development of the machine control and operation and can be put into operation virtually (virtual commissioning). After the customer has placed his order (order and logistics) for the designed solution, the assembly (montage) and initial commissioning of the commissioned construction and machine control and operation take place. The process concludes with customer acceptance.

[Figure 1 (diagram not reproduced): a flow from mission via concept creation and the decision about the overall solution (with selection of partial solutions, selection of the automation solution, simulation and mechanical design), through construction and the development of the machine control and operation system, virtual commissioning, and order & logistics, to montage & commissioning and acceptance.]

Figure 1. Current plant engineering process

3 EXAMPLE FOR AN AUTONOMOUS AGENT

Our goal is to deliver not only a machine (consisting of several components) or a component to a customer, but also so-called autonomous agents. These have the task of monitoring the machine/component and adapting it to requirement changes. The machine/component together with the autonomous agent forms the autonomous adapting machine. If changes are made in the machine, the autonomous agent also changes the description of the machine. The agent can be regarded as an IT system for changing the machine, whereby the agent, i.e. the IT system, also adapts itself dynamically.

Since configuring a plant is a common example that illustrates configuration processes, the following are examples of the individual elements in Figure 2 based on a plant engineering configuration. In addition to the main components (drive unit, fan, running gear, rack feeder, sensor, picker arm, etc.), a classic plant can have fans for cooling the internal case temperature or individual components. We assume that a plant is already configured in a certain way (B) to solve a certain task (A). Using sensors, an autonomous agent could now determine that the system load is slightly higher than 1, which means that the plant is slightly overloaded (C1). An internal list of suitable components (C2) could show that it is possible to replace the drive unit with a more powerful model. However, we assume that the drive unit delivers weaker performance than possible due to overheating. Since this information is also made available via sensors (C1), the agent will decide (C3) to optimize the cooling instead of changing the drive unit. Since the installed fans already deliver full performance, a solution could be to install another fan, as far as the case offers this possibility, which would also be ensured by checking (C2). Alternatively, the manufacturer could also offer an improved fan with higher cooling capacity (E), which would replace an installed one. While monitoring (D) ensures that changes to a plant do not adversely affect plants placed nearby, virtual verification (C4) will check whether the improved cooling capacity will be sufficient even under summer factory workshop temperatures and whether proper cooling is sufficient to lower the system load below 1. Adaptation planning (C5) means determining that shutting down the plant is necessary for the given changes and could also determine the optimal time to do so. Both virtual verification (C4) and adaptation planning (C5) are monitored and accompanied by human experts. Finally, the change must be made (C6), which then changes the system properties (B).

4 GENERAL CONCEPT FOR AUTONOMOUS ADAPTING MACHINES

Figure 2 shows the interaction of the autonomous agent with the machine and its environment in a scenario with independent adaptation to new or changed requirements, where only one agent and one machine are shown.

We expect that the machine is accompanied by a complete description of the currently installed (probably parameterized) components, the configuration. The configuration is an instance of a configuration model. The configuration model is given in machine-readable, semantically interpretable form [4]. It represents the variants of system components as well as mappings between external parameters describing requirements and the components realizing those. In addition to this configuration model, an action model, as we call it here, describes how the agent can acquire new requirements in the productive environment. This happens on the one hand through sensors, which sense the environment (e.g., pressure, temperature), and on the other hand through accesses to other systems in the productive environment (such as other machines and the development system of the machine builder). Adaptation is then defined as changing the configuration and the actual system, the machine. Changes in turn can be parameter changes, additional components, or component replacements.

The requirements are continuously determined and compared with the external system parameters of the current machine. This is used to determine which properties, parameters or functions are not fulfilled by the current configuration or solution (C1 in Figure 2). If the boundary conditions of the environment or process parameters change during system operation so that requirements are no longer fulfilled, the autonomous agent becomes active in the same way as when requirements are changed, in order to achieve fulfillment of the requirements by changing parameters, the configuration or solution elements.
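As a rough illustration, the C1–C6 cycle described above can be read as a monitor-propose-verify-apply loop. The following Python sketch is our illustration only, with invented function names and toy data; it is not the authors' system, and real virtual verification would involve simulation rather than a table of predicted values.

```python
# Minimal sketch of the adaptation cycle: detect an unfulfilled
# requirement (C1), pick a change from a list of suitable components
# (C2/C3), virtually verify it (C4), and only then apply it (C5/C6).
# All names and numbers are illustrative assumptions.

def detect_violation(sensors, requirements):            # C1
    return [k for k, limit in requirements.items() if sensors[k] > limit]

def propose_change(violated, options):                  # C2 + C3
    # first catalogue option that claims to fix every violated requirement
    for change in options:
        if set(violated) <= set(change["fixes"]):
            return change
    return None

def virtually_verify(change, sensors, requirements):    # C4
    predicted = dict(sensors, **change["predicted"])
    return detect_violation(predicted, requirements) == []

def adapt(config, sensors, requirements, options):      # C5 + C6
    violated = detect_violation(sensors, requirements)
    if not violated:
        return config
    change = propose_change(violated, options)
    if change and virtually_verify(change, sensors, requirements):
        return config + [change["name"]]                # apply the change
    return config                                       # escalate to an engineer

# Example in the spirit of Section 3: load slightly above 1 caused by an
# overheating drive unit; an extra fan is preferred over a bigger drive.
sensors = {"load": 1.05, "case_temp": 68.0}
requirements = {"load": 1.0, "case_temp": 60.0}
options = [
    {"name": "extra_fan", "fixes": ["load", "case_temp"],
     "predicted": {"load": 0.95, "case_temp": 52.0}},
]
print(adapt(["drive_unit", "fan"], sensors, requirements, options))
```

The point of the sketch is the ordering: a change is only applied after the verification step predicts that all requirements would again be fulfilled; otherwise the agent leaves the configuration unchanged and hands the case to a human expert.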

[Figure 2 (diagram not reproduced): the autonomous agent for a machine, comprising requirements acquisition and monitoring (D), the detection of unfulfilled requirements (C1), an engineering knowledge model (C2), solution finding (C3), virtual verification (C4), adaptation planning (C5) and adaptation (C6: parameter setting, upgrade, component change), connected to the machine with its requirements (A), system properties and configuration (B), to the component manufacturer's solution cloud with product knowledge and lifecycle management (E), and to an engineer.]

Figure 2. Scenario: independent adaptation to new requirements

The models will be delivered with the machine in order to achieve autonomy. On the other hand, it should be possible to extend the models by using remote knowledge structures at the component manufacturer (E). This solution cloud contains a lot of solutions that the agent cannot calculate on its own. This can be the case with complex calculations, innovations regarding other components or integration tasks. These descriptions are available as configuration models in a form that can also be evaluated by the autonomous agent.

In addition to accessing this set of solutions, the autonomous agent also has an engineering knowledge model (C2) that can be used to determine solutions. This knowledge model contains different reusable means such as component, machine and context models, procedures (e.g. planning and design methods), different libraries of solution ideas (descriptions of earlier solutions together with their properties and evaluations in the form of models) and solution knowledge (such as heuristic elements or regular information).

In some cases, the autonomous agent can find a solution that meets the requirements (C3). This is tested and realized, e.g., by parameterizing or changing the configuration of the system. In other cases, the autonomous agent proposes solution variants that require additional development activities by a machine builder or an exchange of solution parts. In these cases, the autonomous agent provides the necessary information, such as requirements and boundary conditions, to the solution provider (e.g. a special machine builder or drive supplier); these are checked and evaluated by a development engineer and then implemented.

Due to its engineering knowledge model, the autonomous agent can participate in the verification by the developer by, for example, checking the consistency of the solution, simulating processes and evaluating predictions of the behavior after the changes (C4).

If necessary, the autonomous agent is accompanied by an engineer while it applies the steps involved in implementing the solution, such as selecting elements to be replaced, changing the configuration and defining new parameters. By using the engineering configuration model with the means of finding solutions, risks during implementation are reduced by monitoring (D) the consistency of parameters and configurations at each intermediate step and by applying successful procedures at implementation steps. In addition to simulation and verification, undesired emergence, which could arise from autonomous decisions, is recognized and ultimately prevented by monitoring. Knowledge-based monitoring monitors the activities of the autonomous agent. For this purpose, knowledge modeling about the possible activities of the agent as well as of the machine and its environment is used. This makes it possible to analyze and reflect on actions while the agent is performing them and thus to recognize unsafe actions and interactions. The simulation, on the other hand, shows the system's adapted behavior by means of a simulation model, which processes the changes of the system behavior.

The autonomous agent involves various solution component vendors in order to obtain additional or new solution elements and new configurations. These new solution elements can also be deployed after the component vendor has analyzed the new requirements and then developed a new component, which of course takes some time.

Several solution agents based on our concepts carry out self-organization in the sense that solution knowledge is brought together both at the (special) machine manufacturer and at the supplier of solution parts, such as drive solutions, in the search for suitable solutions. There is also interaction with domain experts who contribute further engineering knowledge or make decisions.

Some examples of scenarios that an autonomous agent can support are briefly listed below:

Scenario 1: Increasing the load on a machine.
• Examples: Higher number of cycles than previously planned, larger masses than previously planned.
• Possible solutions of the agent: A different machine model is proposed, a larger engine is proposed.

Scenario 2: Changing the requirements of a machine.
• Examples: Different environmental conditions, higher accuracy.
• Possible solutions of the agent: Other devices are proposed, a reduced machine speed is proposed, another position sensor is proposed.

Scenario 3: Change of the solution offered by the supplier.
• Examples: A new low-maintenance gearbox or a new, more powerful conveyor module is available.
• Possible solutions of the agent: A redesign of the machine or the line is proposed and argued (costs, benefits).
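The three scenario types can be thought of as trigger-to-proposal rules that an agent evaluates against observed events. The following sketch is a hypothetical rendering of that idea; the trigger names and the rule table are our invention, and a real agent would derive proposals from its knowledge models rather than from a fixed list.

```python
# Hypothetical sketch: the scenario knowledge above expressed as simple
# trigger -> proposal rules. Names and rules are illustrative only.

SCENARIO_RULES = [
    # Scenario 1: increased load on the machine
    {"trigger": "load_increase",
     "proposals": ["different machine model", "larger engine"]},
    # Scenario 2: changed requirements (environment, accuracy)
    {"trigger": "requirement_change",
     "proposals": ["other devices", "reduced machine speed",
                   "another position sensor"]},
    # Scenario 3: supplier offers a changed solution
    {"trigger": "supplier_update",
     "proposals": ["redesign of machine or line, argued by cost/benefit"]},
]

def proposals_for(trigger):
    """Return every proposal whose rule matches the observed trigger."""
    return [p for rule in SCENARIO_RULES
            if rule["trigger"] == trigger
            for p in rule["proposals"]]

print(proposals_for("load_increase"))
```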

5 CHALLENGES FOR CREATING AUTONOMOUS ADAPTING MACHINES

We identify the following technologies for realizing autonomous adapting machines. Figure 3 depicts a summary of the proposed knowledge separation. The configuration model of a machine represents all variants of the machine and its components [2]. The configuration model (depicted as CM-C) is distributed, i.e., the autonomous agent contains one part (CM_A-C) of the configuration model and the cloud of the component manufacturer another part (CM_C-C). CM_A-C contains the variants that were used and included at the time the machine was manufactured. It is updated if the machine is adapted. Considering that only some components supplied by a component manufacturer constitute a machine, CM_A-C has to be extracted from the configuration model that represents all components of a manufacturer. CM_C-C changes over time as the component manufacturer develops new components. Besides the configuration model, the autonomous agent contains the actual configuration of the machine (the currently running hardware and software of the plant), i.e., an instance CM-I of the configuration model. Besides the configuration model CM-C, a requirement model RM will describe all possible requirements the components of CM-C shall supply [6]. In addition to the requirements and the configuration model, we consider here a sensor model as a further artifact for structuring the knowledge of an autonomous agent. The sensor model SM represents all sensors that can acquire values about states in the environment [3, 5]. This model also entails knowledge about thresholds for deriving qualitative values about the world external to the machine. Those are mapped to the RM for deriving possible requirements R the machine has to fulfill [5, 6].

[Figure 3 (diagram not reproduced): in the upper row, the sensor model SM is mapped to the requirements model RM, which relates to the configuration model CM-C; in the lower row, for one machine, sensor values yield requirements R, which relate to the configuration CM-I; instantiation links CM-C to CM-I.]

Figure 3. Separation of models for sensor, requirements, and component knowledge in general (upper row) and for one machine (lower row)

By representing all those models and mappings, as well as the identified new sensor values, in a reasoning tool, a new configuration can be inferred with commonly known technologies [4]. Monitoring and verification of intended adaptations are further tasks, which will apply simulation technologies and high-level monitoring of (here, intended) activities [1]. The needed adaptations (e.g., component changes or updates) have to be identified, e.g., by comparing the original configuration with the adapted configuration. Furthermore, necessary planning actions have to be derived from a planning domain and finally executed [8]. All those technologies have to be combined in an architecture for autonomous adapting machines which includes decisions about local and remote computations [8].

In a later step of our research, a further challenge comes into play when interactions with other machines, which become part of a collaborative system within a changed manufacturing process, are considered. Collaborative systems in manufacturing processes are also considered in the area of Industrie 4.0. However, in this field the focus is on automatically setting up collaborative cyber-physical systems for production; for adaptive systems, as considered here, the adaptation itself comes into consideration as a further challenge. In the field of the Internet of Things (IoT), similarly, the processing of sensor data is considered. Its combination with configuration tasks was also discussed by others, e.g. [3, 9].

6 SUMMARY

In this paper, we propose the use of configuration technologies not only at the beginning of a product lifecycle, but also during the runtime of machines in production. Knowledge about variants and dependencies, as well as reasoning methods known from the area of knowledge-based configuration, can support the adaptation of machines. However, additional technologies, such as sensor evaluation, as well as adaptation planning, monitoring and simulation, have to be considered. During our further research we will identify concrete application scenarios for guiding the research in the direction of autonomous adaptive machines.

ACKNOWLEDGEMENTS

We would like to thank the referees for their comments, which helped improve this paper considerably. The paper is supported through the project ADAM granted by the Federal Ministry of Education and Research of Germany.

REFERENCES

[1] Wilfried Bohlken, Patrick Koopmann, Lothar Hotz, and Bernd Neumann, 'Towards ontology-based realtime behaviour interpretation', in Human Behavior Recognition Technologies: Intelligent Applications for Monitoring and Security, eds., H.W. Guesgen and S. Marsland, IGI Global, pp. 33–64, (2013).
[2] A. Felfernig, L. Hotz, C. Bagley, and J. Tiihonen, Knowledge-Based Configuration: From Research to Business Cases, Morgan Kaufmann Publishers, Massachusetts, US, 2014.
[3] Alexander Felfernig, Andreas Falkner, Muslum Atas, Seda Polat Erdeniz, Christoph Uran, and Paolo Azzoni, 'ASP-based Knowledge Representations for IoT Configuration Scenarios', in Proc. of the 19th Configuration Workshop, Paris, France, (September 2017).
[4] L. Hotz, A. Felfernig, M. Stumptner, A. Ryabokon, C. Bagley, and K. Wolter, 'Configuration Knowledge Representation & Reasoning', in Knowledge-based Configuration – From Research to Business Cases, eds., A. Felfernig, L. Hotz, C. Bagley, and J. Tiihonen, chapter 6, 59–96, Morgan Kaufmann Publishers, (2013).
[5] L. Hotz and K. Wolter, 'Smarthome Configuration Model', in Knowledge-based Configuration – From Research to Business Cases, eds., A. Felfernig, L. Hotz, C. Bagley, and J. Tiihonen, chapter 10, 157–174, Morgan Kaufmann Publishers, (2013).
[6] L. Hotz, K. Wolter, T. Krebs, S. Deelstra, M. Sinnema, J. Nijhuis, and J. MacGregor, Configuration in Industrial Product Families – The ConIPF Methodology, IOS Press, Berlin, 2006.
[7] K.C. Ranze, T. Scholz, T. Wagner, A. Günter, O. Herzog, O. Hollmann, C. Schlieder, and V. Arlt, 'A Structure-Based Configuration Tool: Drive Solution Designer DSD', 14th Conf. Innovative Applications of AI, (2002).
[8] S. Rockel, B. Neumann, J. Zhang, K. S. R. Dubba, A. G. Cohn, Š. Konečný, M. Mansouri, F. Pecora, A. Saffiotti, M. Günther, S. Stock, J. Hertzberg, A. M. Tomé, A. J. Pinho, L. S. Lopes, S. von Riegen, and L. Hotz, 'An ontology-based multi-level robot architecture for learning from experiences', in Designing Intelligent Robots: Reintegrating AI II, AAAI Spring Symposium, Stanford (USA), (March 2013).
[9] D. Schreiber, P.C. Gembarski, and R. Lachmayer, 'Modeling and configuration for Product-Service Systems: State of the art and future research', in Proc. of the 19th Configuration Workshop, Paris, France, (September 2017).

Constraint Solver Requirements for Interactive Configuration

Andreas Falkner and Alois Haselböck and Gerfried Krames and Gottfried Schenner and Richard Taupe¹

Abstract. Interactive configuration includes the user as an essential factor in the configuration process. The two main components of an interactive configurator are a user interface on the front-end and a knowledge representation and reasoning (KRR) framework on the back-end. In this paper we discuss important requirements for the underlying KRR system to support an interactive configuration process, focusing on classical constraint satisfaction as one of the most prominent KRR technologies for configuration problems. We evaluate several freely available constraint systems with respect to the identified requirements for interactive configuration and observe that many of those requirements are not well supported.

1 INTRODUCTION

Constraint satisfaction [18] is often used as an underlying reasoning system for product configuration problems [12]. Product configuration is usually an interactive task: iteratively, the user makes a decision and the configurator computes the consequences of this decision. In his PhD thesis [6], David Ferrucci summarized the inherent interactivity of configurators as follows:

"Interactive configuration is a view of the configuration task which includes the user as an essential component of a dynamic process. The interactive configurator is designed to assist the user in an interactive and incremental exploration of the configuration space. It may guide or advise the user's decision making but it must communicate requirements or inconsistencies effected by the constraints in response to the user's choices. This feedback helps the user to refine the configuration space toward a satisfying solution. The key component of an interactive configurator is the constraint manager which is responsible for incrementally maintaining the relationship between choices and constraint violations. [. . . ] The requirement to deliver meaningful and timely feedback imposes significant demands on the flexibility, efficiency and explainability of constraint management."

Pieter van Hertum et al. studied in [9] how the knowledge base paradigm – the separation of concerns between information and problem solving – could hold in the context of interactive configuration. They identified a set of subtasks that overlaps well with the set of requirements we propose in this paper in Section 4: acquiring information from the user, generating consistent values for a parameter, propagation of information, checking the consistency of a value, checking a configuration, autocompletion, explanation, and backtracking.

Matthieu Queva et al. describe in [17] requirements on interactive configurators, mainly from the modelling perspective, and call for high-level, expressive languages like UML or SysML. They also mention constraint modelling as an important aspect of configuration. From the viewpoint of constraint reasoning, Jeppe Madsen identified three fundamental interactivity operations in his master's thesis [14]: add constraint, remove constraint, and restoration.

In this paper, we concretize those ideas and focus on applications where a tailor-made user interface (UI) or legacy system is enhanced with solving capabilities, i.e., by calling a general constraint solver as a component. An alternative would be to implement a special UI for an integrated configuration system such as a commercial CPQ tool or sales configurator suite, but that bears the risk of vendor lock-in.

Figure 1 gives an overview of the addressed scenario: users interact with the configurator to achieve their goals, thus posing challenges for user interface functionality. Those challenges pose reasoning interaction requirements on the API of the used constraint solver. The solver is an off-the-shelf product (open-source or commercial) and has a general API, partly dedicated to configuration problems. Together with a domain-specific product model (or knowledge base, KB), it forms the KRR component. This back-end is called by the control component, which first hands over the product model and user-set values for decision variables from the UI to the API and then sends the solver results back to the UI.

[Figure 1: the user interacts with the User Interface of the Configurator; the UI communicates with a Control component (reasoner interaction), which calls the Configuration API of the Constraint Solver, combined with the Product Model.]

Figure 1. Components of an interactive configurator

The configurator shall help a user to configure a product according to his/her needs and in full compliance with the product model. The user expects the configurator to show all necessary decisions (variations) in a clear and well-arranged way, to highlight or preset the "best" alternative (value), to filter or grey-out infeasible values, to recommend alternatives in case of conflicts, and to respond quickly (preferably instantaneously) to user inputs.

¹ Siemens AG Österreich, Corporate Technology, Vienna, Austria. [email protected], [email protected], [email protected], [email protected], [email protected]. Author names are ordered alphabetically.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
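The round trip just described – the control component handing user-set values to the solver behind the configuration API and passing the solver's consequences back to the UI – can be sketched in a few lines of Python. All names here (`ConfiguratorControl`, `SolverResult`, `toy_solve`) are illustrative assumptions for this sketch, not part of any real solver API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SolverResult:
    values: dict            # variable -> value, including solver-derived values
    consistent: bool = True

class ConfiguratorControl:
    """Minimal control component: keeps user decisions and calls the solver."""
    def __init__(self, solve: Callable[[dict], SolverResult]):
        self.solve = solve        # the constraint solver behind the API
        self.user_set: dict = {}  # user-set values, never overwritten by the solver

    def set_value(self, var, value) -> SolverResult:
        self.user_set[var] = value
        return self.solve(dict(self.user_set))

    def unset_value(self, var) -> SolverResult:
        self.user_set.pop(var, None)
        return self.solve(dict(self.user_set))

def toy_solve(user_set: dict) -> SolverResult:
    """Toy solver: derives standing_room from the load constraint
    nr_seats + standing_room = nr_passengers of the running example."""
    derived = dict(user_set)
    if "nr_passengers" in derived and "nr_seats" in derived:
        derived["standing_room"] = derived["nr_passengers"] - derived["nr_seats"]
    return SolverResult(derived)

control = ConfiguratorControl(toy_solve)
control.set_value("nr_passengers", 160)
result = control.set_value("nr_seats", 30)
print(result.values["standing_room"])  # 130
```

The derived value 130 matches the solver change after user action 2 in the configuration dialog discussed later (Figure 3).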

The remainder of this paper is organized as follows: in Section 2, an example for interactive configuration is introduced. In Section 3, we give a problem definition of interactive configuration. We define the requirements for an interactive configuration API in Section 4 and investigate in Section 5 how some typical constraint solvers satisfy these requirements. We conclude the paper in Section 6 with a summary and future work.

2 EXAMPLE

We show the challenges of interactive configuration in a small example for a configurable product with components that can occur multiple times (similar to generative constraint satisfaction [7] or cardinality-based feature modelling [3]).

A Metro train wagon has as configurable attributes the size (length in millimetres: 10000..20000) and the expected load (number of passengers: 50..200), which can be realized as seats or standing room. As components we consider only seats (max. 4 per metre of length) and handrails, and their number is configurable.

There is at most one handrail in a wagon (mandatory if there is standing room) and it has a configurable type: "standard" or "premium".

A single seat consumes standing room for 3 persons and has as configurable attributes the type ("standard", "premium", "special") and the color ("blue", "red", "white"). The type is constrained such that standard is not allowed to be mixed with premium (for seats and handrails). The color of all seats must be the same, except for special seats, which have to be "red".

Users expect the following (static) default values: type = standard, color = blue. Furthermore, they prefer to use all available space (as defined by the length) for passengers (i.e., maximize the load factor).

[Figure 2: UML class diagram with class Wagon (length_mm: 10000..20000, nr_passengers: 50..200, nr_seats: 0..200, standing_room: 0..200; constraints nr_seats + standing_room = nr_passengers, nr_seats + standing_room/3 ≤ 4*length_mm/1000, nr_seats = count(Seat), standing_room>0 → count(Handrail)=1, all-equal-type(), all-equal-color(), maximize nr_passengers/length_mm), associated with 0..1 Handrail (type: {standard, premium}) and 0..80 Seat (type: {standard, premium, special}, color: {blue, red, white}, type=special → color=red).]

Figure 2. Class diagram of the Wagon example. Default values are underlined. Wagon.all-equal-type() stands for a constraint that all sub-parts must have the same type except for special. Wagon.all-equal-color() stands for a constraint that all associated seats (except if type=special) must have the same color.

Figure 2 shows a UML class diagram for this sample specification, including pseudo code for all constraints. A modelling in a standard constraint solver is more verbose because it requires the mapping of the UML classes to arrays of variables and some other implementation decisions, e.g., the special handling of the "dynamic" parts, i.e., the number of seats dependent on the expected load and length – see the MiniZinc program in Listing 1.

Listing 1. MiniZinc program for the Wagon example

% Constants, Domains
int: min_length = 10000;
int: max_length = 20000;
int: max_seats = max_length * 4 div 1000;
int: min_load = 50;
int: max_load = 200;
enum Color = { blue, red, white, noColor };
enum Type = { standard, premium, special, noType };
% Wagon
var min_length..max_length: length_mm;
var min_load..max_load: nr_passengers;
var 0..max_seats: nr_seats;
var 0..max_load: standing_room;
var 0..1: nr_handrails;
% Seats
array [1..max_seats] of var Color: seat_color;
array [1..max_seats] of var Type: seat_type;
% Handrail
var Type: handrail_type;
% Constrain numbers
constraint nr_seats + standing_room = nr_passengers;
constraint nr_seats + standing_room/3 <= length_mm * 4 / 1000;
% Mandatory handrail for standing room with proper type
constraint standing_room > 0 -> nr_handrails = 1;
constraint handrail_type != special;
constraint nr_handrails = 0 <-> handrail_type = noType;
constraint nr_handrails > 0 -> forall (i in 1..nr_seats where seat_type[i] != special) (handrail_type = seat_type[i]);
% Same color and type for all seats but special
constraint forall (i in nr_seats+1..max_seats) (seat_color[i] = noColor);
constraint forall (i in nr_seats+1..max_seats) (seat_type[i] = noType);
constraint forall (i, j in 1..nr_seats where i < j) (seat_type[i] != special /\ seat_type[j] != special -> seat_type[i] = seat_type[j]);
constraint forall (i, j in 1..nr_seats where i < j) (seat_type[i] != special /\ seat_type[j] != special -> seat_color[i] = seat_color[j]);
constraint forall (i in 1..nr_seats) (seat_type[i] = special -> seat_color[i] = red);
% Use full length for passengers (avoid dead space)
solve maximize nr_passengers / length_mm; % load factor

3 PROBLEM DEFINITION

The two main parties in interactive configuration are the user and the configurator. The user's goal is to configure a product such that it is a valid and complete product variant that meets all his/her individual requirements. The configurator is a digital companion that supports the configuration process by deriving consequences of the user's choices and by assisting to avoid and resolve conflicts.

The simplest type of user interaction is to set the value of a configuration parameter. This can be seen as answering a question that the configuration tool asked. Example: for how many passengers should the wagon provide seats? Of course, the user should also be able to withdraw his/her decision, which corresponds to unsetting a configuration parameter. The value of the parameter will then be either undefined, or the default value, or set by solving.

In most non-trivial configuration problems, a dynamic number of configuration objects plays an important role. Some authors of this paper have developed configurators for large industrial products for more than 25 years and always faced problems that could not be represented as simple, static lists of configuration parameters (see [4]). Therefore, another important type of user interaction is the creation and deletion of configuration objects.
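As a cross-check of the numeric constraints from Section 2, a few lines of Python validate a concrete wagon state. This is an illustrative sketch only; the type and color constraints of Listing 1 are omitted for brevity:

```python
def check_wagon(length_mm, nr_passengers, nr_seats, standing_room, nr_handrails):
    # Load must be realized exactly as seats plus standing room.
    ok_load = nr_seats + standing_room == nr_passengers
    # Capacity: max. 4 seat-equivalents per metre of length;
    # one seat consumes standing room for 3 persons (cf. Listing 1).
    ok_space = nr_seats + standing_room / 3 <= 4 * length_mm / 1000
    # A handrail is mandatory whenever there is standing room.
    ok_handrail = standing_room == 0 or nr_handrails == 1
    return ok_load and ok_space and ok_handrail

print(check_wagon(20000, 160, 30, 130, 1))  # True
print(check_wagon(20000, 160, 30, 120, 1))  # False: 30 + 120 != 160
```

The first call corresponds to the state after user action 2 in the configuration dialog of Figure 3 (160 passengers, 30 seats, standing room 130, one handrail).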

In the example problem in Section 2, seats are configuration objects whose number is not known beforehand, and where each seat can be configured individually (seat color and type). Typically, there are two different ways how the user manipulates the number of individual configuration objects in the user interface: either by creating them one by one, or by specifying the number of objects and letting the configuration tool create the individual objects. But in both cases, the result is a set of configuration objects whose number was not known beforehand and whose properties can be configured further. This is not the case for the representation of standing_room in our example. The user can set only the capacity as a number but no further individual properties. Thus, standing room need not be modelled as configuration objects in our example.

Definition (User Interaction). The main types of user interactions in interactive configuration are: (i) create configuration object, (ii) delete configuration object, (iii) set/unset configuration object attribute, (iv) set/unset association between configuration objects.

We are aware that various approaches to interactive configuration may define the list of main types of user interactions differently. For example, structure-based configuration considers the following types of user interaction: parametrization, decomposition, integration, and specialization [10]. Due to the limited scope of this paper, we focus on the types of user interaction that are most relevant in our experience.

User interactions change the state of the configuration by making decisions. Solver interactions make implicit knowledge explicit to assist the user. For instance, the consequence of a user setting the attribute type of a seat to special is that the solver will automatically set its attribute color to red because of the according constraint – see, e.g., user action 5 in Figure 3. Another typical solver interaction is domain filtering. If, e.g., nr_passengers was set to 160 as by user action 1 in Figure 3, then the lower bound of length_mm would change to 13334 because of the constraint nr_seats + standing_room/3 ≤ 4*length_mm/1000 (for the case that nr_seats is 0) and, analogously, for the case that length_mm is 20000, the upper bound of nr_seats would be 40 (not more, because the lower bound of standing_room must be 120 to achieve the 160 passengers). This knowledge was already implicitly contained in the configuration model of the problem, as described above, but the solver made its consequences explicit, thus creating a distinct benefit to the user.

[Figure 3:]
User action 1: Set nr_passengers = 160. Solver changes: domain(length_mm) = [13334,20000]; domain(nr_seats) = [0,40]; domain(standing_room) = [120,160]; create handrail with type = standard.
User action 2: Set nr_seats = 30. Solver changes: domain(length_mm) = [18334,20000]; standing_room = 130; create 30 seats with type = standard, color = blue.
User action 3: Set standing_room = 140. Solver proposes alternative conflict resolutions: 1. nr_passengers = 170; 2. nr_seats = 20.
User action 4: Unset nr_seats (accept proposal 2). Solver changes: domain(length_mm) = [16667,20000]; nr_seats = 20; delete 10 seats.
User action 5: Set first seat's type = special. Solver changes: first seat's color = red.
User action 6: Autocomplete (with optimization). Solver changes: length_mm = 16667 (maximize load factor).
User action 7: Set handrail's type = premium. Solver changes: for all seats except first, type = premium.

Figure 3. Example of a configuration dialog for the configuration problem in Section 2

Definition (Solver Interaction). A solver interaction is a set of consequences following a user interaction. Typical solver interactions are:
• Set or change the value of a variable not yet set by the user
• Remove or add a value to a variable domain
• Create or delete a configuration object because of resource demands by the user (such as the number of seats)
• Explain a conflict in variable values
• Propose alternative solutions to a conflict
• Automatically complete a partial configuration (autocompletion)

Many of the solver interactions mentioned above must distinguish whether a configuration parameter has no value or a default value, and whether it has been set by the user or by the solver, because user-set values are typically not allowed to be overwritten by the solver.

With the current rise of data analytics and machine learning techniques and tools, additional types of solver interactions will become state of the art in the future, like the recommendation of input values learned from previous configuration sessions or the support of group decisions.

Definition (Interactive Configuration). An interactive configuration is an alternating sequence of user and solver interactions.

An example of a typical configuration dialog between user and configurator is shown in Figure 3.

Having set up the definition of interactive configuration, we can sharpen our research question: to what extent do existing solver frameworks support the different types of solver interactions in interactive configuration? This question is relevant both for companies which sell configurable products/solutions and for solver tool providers (which often originate from academia). Product vendors want to use solvers which cover as many requirements as possible to avoid cumbersome workarounds or proprietary implementations. Tool providers are interested in this question to better focus on those features of their tools that customers really need. Presently, solvers seem to concentrate mainly on optimizing the performance of search for a solution. An indicator for this is the high number of performance challenges and competitions, such as the MiniZinc Challenge (https://www.minizinc.org/challenge.html) or the international SAT competitions (https://www.satcompetition.org/).

Our contribution in this paper is the definition of the main requirements on an interactive configurator and a survey of the extent to which existing, non-commercial solvers fulfill those requirements. However, we do not claim to be exhaustive, neither in requirements nor in evaluated tools.

4 REQUIREMENTS

In this section we summarize the most important requirements on a configuration API for constraint solvers with the aim to facilitate interactive product configuration.

4.1 Basic interactions

The most basic interaction between the front-end (UI) and the configuration API is to set or unset a configuration parameter. Usually, the configuration domain is modelled in an object-oriented way, where configuration parameters correspond to attributes of configuration objects. Creation and deletion of configuration objects and relationships between them are also fundamental user actions.

Requirement (SetAttributeOrAssociation). The user can set or change an attribute of a configuration object (e.g., the length of a Wagon) or an association link between two configuration objects. The constraint solver model must be updated to be in sync with the front-end model.

Requirement (CreateOrDeleteConfigurationObject). The user can create or delete configuration objects (e.g., create an instance of Handrail). The constraint solver model must be updated to be in sync with the front-end model.

Our assumption to use a 3rd-party constraint solver as the underlying reasoning system makes a transformation from the domain model to the solver model necessary. Most constraint solvers are optimized for dealing with flat (i.e., not object-oriented) integer-valued variable domains. This mapping from the object model to the constraint model is often cumbersome. Especially the encoding of objects and links between objects is not trivial (for instance, see [20]).

Requirement (ModelTransformation). The configuration API should support mapping from a high-level, object-oriented domain model to the constraint model.

Requirement (ExpressivenessOfConstraintLanguage). Constraints are the central tool for expressing dependencies between configuration objects. The language of the constraint solver must be rich enough to allow the formulation of all necessary integrity/consistency/resource constraints of the configuration domain.

4.2 Preferred and default solutions

Default values are an essential means for enhanced usability, because the user is not forced to type in each parameter. In practice, default values are also a means of recommendation, guiding the user to the most common configuration. We distinguish static and dynamic (or computed) default values.

Requirement (StaticDefaults). Static default values are the most common form, with built-in support by database systems and programming languages. See, e.g., the solver changes after user actions 1 and 2 in Figure 3.

Requirement (DynamicDefaults). Dynamic default values can be computed by almost arbitrary functions, which may use as input other variables of the same configuration (similar to the constraint of Seat in Figure 2) as well as variable values from historic configurations (e.g., the most popular value).

Requirement (UnsetVariable). Support Undo and Unset (i.e., set to UNDEF) of an earlier user decision, as, for example, in user action 4 in Figure 3. The user interface shall support overriding the default value as well as reverting to the default value. A "revert to default" capability is essential for good usability, guiding users back to the paved path from which they got lost.

Requirement (SoftConstraints). An essential property of default values is that they can be overridden by solvers without loss of data. If user-provided input values are regarded as constraints, then default values are soft constraints.

For example, given that the default color of a seat is blue, but special seats are only available in red, the solver overrides the default color selection with red for all special seats – see user action 5 in Figure 3. Usually this would not be perceived as data loss. On the other hand, if the color blue has been chosen by a deliberate user action, and the seat type is changed to "special", then the configuration is inconsistent. In this case, the user must be notified and prompted for a corrective action as discussed in Section 4.5 – the color must not be changed without prior confirmation by the user.

4.3 Filtering

In an interactive configuration session, a user may be faced with many possible choices. For example, the number of configuration parameters for which the user can choose a value can be very high, and the number of possible values that can be chosen for a given parameter can also be very high. A user might consecutively set several parameters to values which in the end, possibly after several further steps of interaction between solver and user, do not allow a valid solution compatible with all those choices. The goal of filtering is to offer the user as few alternatives as possible which do not lead to a valid solution.

Filtering can be used to grey-out values in the UI that are inconsistent with already user-set values (or to communicate to the user in another proper way that they are infeasible) so that the user cannot choose them and end up in an invalid solution, e.g., the domains after user actions 1 and 2 in Figure 3.

Hiding those inconsistent values completely from the user, e.g., not showing values < 13334 for length_mm after user action 2 in Figure 3, is often not a good option, as it reduces the information and flexibility (i.e., the possibility to change decisions) of the user too much, because the user would not know that he/she can set length_mm to 12000, for example (see Section 4.5 for more details).

Requirement (Filtering). The solver shall be able to filter invalid values from domains (up to a certain degree of consistency) and to return the current domain of a given variable on request.

The challenge is to efficiently find all inconsistent values and show those for the variables in the window that the user currently sees, even for complicated constraints.
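The filtered bounds after user action 1 in Figure 3 can be reproduced by simple interval reasoning over the capacity constraint nr_seats + standing_room/3 ≤ 4*length_mm/1000. The following Python sketch is illustrative only (not a solver API) and assumes the domain bounds of the example:

```python
import math

MAX_LENGTH_MM = 20000  # upper bound of length_mm in the example

def length_lower_bound(nr_passengers):
    # Best case for a short wagon: all passengers standing (nr_seats = 0),
    # so nr_passengers/3 <= 4*length_mm/1000 must hold.
    return math.ceil(nr_passengers * 1000 / (3 * 4))

def seats_upper_bound(nr_passengers):
    # With length at its maximum, nr_seats + (nr_passengers - nr_seats)/3
    # <= 4*MAX_LENGTH_MM/1000 bounds the number of seats from above.
    capacity = 4 * MAX_LENGTH_MM / 1000
    return math.floor((3 * capacity - nr_passengers) / 2)

print(length_lower_bound(160))  # 13334
print(seats_upper_bound(160))   # 40
```

Both results match the solver changes after user action 1 in Figure 3: domain(length_mm) = [13334,20000] and domain(nr_seats) = [0,40].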

68 for complicated constraints. Besides the computational performance 4.5 Explanation (the problem is NP-hard in general), it may also be difficult to present them to the user in a clear and understandable way – e.g., for an Unless the solver maintains global consistency after each user inter- integer number variable show all even values greater than 100 except action (which is too expensive especially when fast response times 2000. are required, cf. Section 4.3), users can reach a dead end, i.e., a state Filtering, which is also found as propagation or constraint infer- where they have to revise a decision to be able to find a solution. It ence in the literature, is one of the techniques which constraint solv- may also happen that a conflict is produced because the user delib- ing uses to tackle its underlying computational complexity. While erately sets a value that has already been filtered away by the solver, search explores a solution space opened up by variables for which like in user action 3 in Figure 3. several possible value assignments exist, propagation deals with de- The goal of explanation is to assist the user in this situation by terministic assignments that are forced by constraints. It may be in- suggesting previously made choices to be undone. For a good user terleaved with search or done as a preprocessing step [2, 19]. experience it is important that these suggestions are understandable, Constraint propagation algorithms usually assume the domains of i.e., violated constraints should be explained by descriptive prose. the variables to be finite. They also assume constraints to be binary, Requirement (Explanation). The solver shall be able to explain in i.e., to involve exactly two variables. However, all n-ary constraints understandable terms why a current state cannot be extended to a can be transformed to binary ones [2, 19]. If a CSP (in which some valid solution. 
domains may already have been tightened down) can be extended to a solution, it is called globally consistent. Since global consistency is very expensive to maintain, various forms of local consistency – mainly arc consistency – are used in practice [2, 19]. Another approach is to compile a CSP into a data structure that can maintain global consistency, such as an automaton [1] or a binary decision diagram (BDD) [15].

4.4 Solving

After having decided values for "important" variables, a user expects the system to automatically complete the partial solution (i.e., the user-set values) to a valid solution. As an alternative, the commercial configurator Tacton CPQ4 offers a complete solution to the user all the time. In general, the following user actions shall be supported by the API of a constraint solver.

Requirement (ValidSolution). Computation of a valid solution for the current state of the UI, i.e., user-set values are to be treated as fixed. The implementation of this requirement can also be used for checking whether there is a valid solution or not.

Requirement (OptimalSolution). Computation of an optimal solution, preferably with multiple optimization criteria.

User action 6 in Figure 3 is an example of a simple objective function: maximizing the load factor minimizes length_mm if nr_passengers is already set.

Requirement (NextSolution). Computation of the next solution. In the case of Requirement (ValidSolution), this should be noticeably distinct from previous solutions so that the user gets the full bandwidth of potential solutions. In the case of Requirement (OptimalSolution), it will be equally good as the previous solutions or – for multiple optimization criteria – another instance on the Pareto front.

ClaferMOO Visualizer5 is an example of an optimizer which supports handling the Pareto front.

Requirement (IncrementalSolving). In an interactive configuration scenario, the user sets one variable after the other – without a predefined order. Performance might be increased if the solver supported incremental solving, i.e., continuing from the latest state instead of starting from scratch after each user input.

4.5 Explanations

Usually we distinguish between background constraints and user constraints. Background constraints originate from the problem specification, cannot be changed and are assumed to be correct. Only user constraints, i.e., constraints originating from user interactions, can be part of explanations. A subset of user constraints is a conflict if the problem obtained by combining this subset with background constraints has no solution. There might be an exponential number of conflicts that explain the inconsistency of an over-constrained problem. For this reason, QuickXPlain [11] can be used to compute preferred explanations. Conflict-detection algorithms like QuickXPlain can be combined with a hitting-set algorithm to compute minimal diagnoses, i.e., minimal sets of faulty constraints. A diagnosis is a subset of user constraints so that the problem becomes consistent when this subset is removed from it [5].

While methods like QuickXPlain focus on conflicts, i.e., on what has gone wrong, corrective explanations have been proposed to focus on how to proceed in interactive solving towards a valid solution. A corrective explanation is more than a diagnosis in the sense that it does not just point out constraints (i.e., assignments of values to variables) that have to be retracted, but also proposes alternative assignments that are guaranteed to yield a solution [16].

Requirement (CorrectiveExplanation). The solver shall be able to suggest user actions suitable to correct a current state that cannot be extended to a valid solution.

4.6 Integrability

Today's typical enterprise IT landscape is a system of systems with many internal and external interfaces. It must be possible to integrate a newly developed configurator into the existing infrastructure in order for it to be accepted and used. While the configurator will use other components such as a generic solver or a persistence service to fulfil its tasks, it will also itself be invoked by other components. In order to get widely adopted, a constraint solver should help companies to meet the challenges of integration, or at least it should not impose new ones.

Requirement (LanguageAPI). The constraint solver shall provide a well-documented API to a programming language that is widely accepted and used by companies, such as Java.

Command-line interfaces to the solver are appreciated for testing, but using them for integration is a potential source of problems for integration and long-term maintenance. For this reason, a command-line interface alone is not deemed sufficient.

4 http://www.tacton.com/tacton-cpq/
5 http://www.clafer.org/p/software.html
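The preferred-conflict computation of QuickXPlain [11] referenced above can be sketched in a few lines. This is a simplified Python rendering of the recursive divide-and-conquer scheme; the `consistent` callback is a hypothetical stand-in for a real solver's satisfiability check, and the representation of constraints is left entirely to the caller.

```python
def quickxplain(background, constraints, consistent):
    """Return a minimal conflict among the user `constraints` (with
    respect to the fixed `background` constraints), or None if the
    combined problem is consistent."""
    if consistent(background + constraints):
        return None
    return qx(background, bool(background), constraints, consistent)

def qx(background, has_delta, constraints, consistent):
    # If constraints added by the previous split already make the
    # background inconsistent, no element of `constraints` is needed.
    if has_delta and not consistent(background):
        return []
    if len(constraints) == 1:
        return list(constraints)
    half = len(constraints) // 2
    c1, c2 = constraints[:half], constraints[half:]
    d2 = qx(background + c1, bool(c1), c2, consistent)
    d1 = qx(background + d2, bool(d2), c1, consistent)
    return d1 + d2
```

As a toy instance, one can model each constraint as a set of still-allowed values for a single variable and define consistency as a non-empty intersection; `quickxplain([set(range(10))], [{1, 2}, {3, 4}, {2, 3}, {2}], lambda cs: not cs or bool(set.intersection(*cs)))` then returns the minimal conflict `[{1, 2}, {3, 4}]`.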

Requirement (TestSupport). The constraint solver shall support debugging and automated testing.

This can be implemented by compatibility with existing debuggers and testing tools of the host language as well as by tool-specific facilities.

Requirement (MaintenanceSupport). The constraint solver shall support long-term maintenance of the configurator.

While the exact requirements cannot be anticipated, contributing factors could be:

• a type-safe API language
• a portable, standardized language for implementation and API
• use of standard and well-established tools

Requirement (NotificationAPI). The constraint solver API shall support a notification concept, such as listeners or callbacks. Given the change of one input variable, which output variables change their values? Which previously violated constraints are now fulfilled, and vice versa?

Requirement (ConstraintLanguage). The constraint solver shall support a well-established language for constraint definition, such as MiniZinc or XCSP.

This requirement is not a replacement for but complements Requirement (LanguageAPI).

Requirement (LicenseCompatibility). The license model shall be friendly in order to encourage industrial usage.

Companies are reluctant to adopt open-source components with "sticky" licenses such as the GPL, because they fear the legal risks imposed on their own intellectual property. A commercial license at a fair price is more likely to be accepted than a sticky open-source license. In general, industry-friendly licenses such as BSD- and MIT-style licenses will be appreciated.

Note that this also applies to the components the solver depends on: a dependency on a third-party component with restrictive license conditions is very likely to inhibit the adoption of the solver. As a rule of thumb, all dependencies shall have the same or a more liberal license than the constraint solver component itself.

Requirement (MinimumDependencies). The list of dependencies (libraries but also other resources) of the solver shall be as short as possible, because each added dependency is also regarded as an additional burden in terms of version management, license and security assessment, and long-term maintenance.

In this section we concentrated on the integration of a constraint solver into a product configurator application via a configuration API. There are many other aspects of integrating configurators, e.g., interfaces to customer relationship management (CRM), product data management (PDM), enterprise resource planning (ERP) systems and the like, or the generation of quotations and other documents. Despite being considered more important by product managers and consuming much more effort in configurator solutions than product modeling itself, such aspects are out of scope of this work.

5 SURVEY OF EXISTING SOLVERS

In this section we investigate how some constraint systems satisfy our proposed requirements for an interactive configuration API. To make our findings easily reproducible we have chosen freely available systems, each representing a different constraint solving paradigm: one offers a standardized constraint modelling language, one is propagation-based, one is a representative of constraint logic programming, and one is SAT-based. All systems allow the definition of classical static constraint problems with variables and constraints and provide basic solving capabilities (solving, optimization), but they differ in their support for interactive solving. We use a simplified version of the running example to illustrate the different ways to realize user inputs and domain filtering.

We are aware that there are integrated configuration systems on the market that fulfil the requirements posed in this paper at least partially. However, as already mentioned in Section 1, we focus on pure constraint systems that can be integrated into other applications without bearing the risk of vendor lock-in.

5.1 MiniZinc

MiniZinc is a solver-independent constraint modelling language. The MiniZinc system is free and open-source. It includes various command-line tools and the MiniZinc IDE for editing and solving MiniZinc models. For solving, the high-level MiniZinc models are compiled into FlatZinc, a low-level standard for defining constraint problems that is supported by most current constraint solvers. One reason for this is the annual MiniZinc challenge, which is the most popular solving competition for constraint solvers. As MiniZinc is a de-facto standard for representing constraint problems, it is a possible solution for a system satisfying Requirement (ConstraintLanguage).

Currently there is no way to directly access the domain filtering capabilities of a solver from MiniZinc. However, users can use solving to emulate domain filtering. For example, to compute the bounds of a variable domain, the optimization statement of the original Listing 1 can be replaced with the following solve statements to determine the lower and upper bounds of a variable.

% determine lower bound
solve minimize standing_room;
% determine upper bound
solve maximize standing_room;

MiniZinc does not provide default reasoning out of the box, but it can be implemented with optimization or special heuristics, all of which can be expressed in MiniZinc. Another option would be to use a system like MiniBrass6, which adds soft constraints and preferences to the MiniZinc system.

To integrate MiniZinc into a software system, various approaches are possible. One approach consists of manipulating the MiniZinc files programmatically and calling the various tools (MiniZinc-to-FlatZinc compiler, solver executable) via system calls and standard I/O. A more efficient way of manipulating MiniZinc models is to use libraries like JMiniZinc7 or PyMzn8.

Another integration option would be to compile the MiniZinc file to FlatZinc and use the FlatZinc parser of the chosen solver. This has the benefit that solving can be controlled by the solver API, but the downside is that it is not always straightforward to map the low-level FlatZinc constraints and variables to those of the high-level MiniZinc model. This mapping is essential for interactive solving, as the user interface must provide feedback in terms of the high-level constraints and variables.

6 https://isse.uni-augsburg.de/software/minibrass
7 https://github.com/siemens/JMiniZinc
8 http://paolodragone.com/pymzn/
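The system-call integration route for MiniZinc can be sketched briefly. The model below encodes the simplified running example (user input nr_seats = 40 posted as a constraint, "solve minimize" emulating a lower-bound computation), and the helper shells out to the `minizinc` command-line tool; file name and function are our own, and the exact output text depends on the model's output item and the installed MiniZinc version.

```python
import pathlib
import shutil
import subprocess
import tempfile

# Simplified running example as a MiniZinc model string.
MODEL = """
var 0..80: nr_seats;
var 0..200: standing_room;
var 50..200: nr_passengers;
constraint nr_seats + standing_room = nr_passengers;
constraint nr_seats = 40;  % user input
solve minimize standing_room;
"""

def run_minizinc(model_text):
    """Write the model to a temporary file and run the `minizinc`
    command-line tool on it via standard I/O. Returns the tool's
    stdout, or None if minizinc is not installed."""
    if shutil.which("minizinc") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        model_path = pathlib.Path(tmp) / "bus.mzn"
        model_path.write_text(model_text)
        result = subprocess.run(["minizinc", str(model_path)],
                                capture_output=True, text=True, check=True)
        return result.stdout
```

Libraries like JMiniZinc or PyMzn wrap essentially this pattern, adding programmatic model manipulation on top of the raw process call.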

5.2 Choco

Choco is a well-established constraint library written in Java. It supports integer, boolean, set and real variables as well as basic constraint expressions and global constraints. Constraint problems are defined using Choco's Java API.

The following example shows how interactive configuration can be realized with Choco's API. A user input is simulated by posting a constraint containing the assignment selected by the user. The Choco constraint solver is based on constraint propagation [13]. Constraint propagation can not only be utilized during solving, but also to compute the current domains of the constraint variables. The example shows how variable domains are narrowed down by propagation after calling Solver.propagate(), i.e., Choco satisfies Requirement (Filtering).

// Initial domains:
// nr_seats      = {0 .. 80}
// standing_room = {0 .. 200}
// nr_passengers = {50 .. 200}
// Post constraint
m.arithm(nr_seats, "+", standing_room, "=", nr_passengers).post();
// User input
m.arithm(nr_seats, "=", 40).post();
// Domain filtering
m.getSolver().propagate();
// Updated domains:
// nr_seats      = 40
// standing_room = {10 .. 160}
// nr_passengers = {50 .. 200}

If the propagation engine encounters an inconsistency, a ContradictionException is raised. Choco provides an explanation engine based on [22] for generating explanations for contradictions found during solving. Unfortunately, the explanation engine is by default not active during propagation. As an alternative, a solver-independent algorithm like QuickXPlain can be adapted for Choco.

Choco does not support default values out of the box. One lightweight implementation of default values is to use search strategies similar to the approach in [8].

Choco is implemented in Java and therefore easy to integrate into an enterprise IT landscape. The source code of Choco is available on GitHub9 and pre-built libraries are available via Maven.10 As Choco is open source, missing features can easily be added to the current code base. On the downside, the implementation of such features often requires detailed knowledge of the API and must be maintained when the API changes considerably (which has occurred in the past).

5.3 Picat

Picat is a logic-based multi-paradigm language well-suited for solving constraint problems. Most of what will be said about Picat and interactive constraint solving also applies to other constraint logic programming systems.

In Picat, constraint programming support is added through the cp module. After importing the cp module, constraint variables can be declared with VAR :: DOMAIN. Constraints can be formulated with variable expressions (using special operators preceded with #), predicates and global constraints like all_different etc. The following shows the Picat implementation of the aforementioned example. In Picat, every additional constraint expression triggers constraint propagation and the variable domains are adapted accordingly.

Picat> NR_SEATS :: 0..80,
       STANDING_ROOM :: 0..200,
       NR_PASSENGERS :: 50..200,
       NR_SEATS + STANDING_ROOM #= NR_PASSENGERS,
       NR_SEATS #= 40.
// output:
// NR_SEATS = 40
// STANDING_ROOM = DV_015c90_10..160
// NR_PASSENGERS = DV_015d28_50..200

Using the concept of action rules, it is possible to get notified of domain changes of constraint variables. Action rules have the form Head, Cond, {Event} => Body. Examples of events are:

• ins(X), when a variable gets instantiated
• bound(X), when the bounds of a variable change
• dom(X,T), when a value T gets excluded from the domain of X

Therefore it is possible to trace all variable domains that have been affected by a user interaction, cf. Requirement (NotificationAPI).

Picat programs can be executed as shell scripts. Picat allows defining predicates by C functions. Unfortunately, it is currently not possible to call Picat from C, so integration must be done via standard I/O.

5.4 CP-SAT Solver

As an example of a constraint solver based on a paradigm other than constraint propagation we have chosen the SAT-based Google CP-SAT solver, which is part of Google OR-Tools.11

As the CP-SAT solver is SAT-based, it does not provide an API for accessing the current domains of variables. Therefore it is best treated as a black-box solver, i.e., by defining the constraint problem and solving it. Domain filtering can be simulated by calling the solver for a specific domain value or, as in the example below for bounded domains, calling minimize/maximize for a variable to compute its lower and upper bound. In the example, the lower bound of the variable standing_room is computed. Of course this strategy only works for constraint problems where efficient solving is possible; otherwise the response time of the interactive system would deteriorate.

from ortools.sat.python import cp_model

model = cp_model.CpModel()
nr_seats = model.NewIntVar(0, 80, "")
standing_room = model.NewIntVar(0, 200, "")
nr_passengers = model.NewIntVar(50, 200, "")
model.Add(sum([nr_seats, standing_room]) == nr_passengers)
ui = model.NewIntVar(40, 40, "")
model.Add(nr_seats == ui)
# Simulate domain filtering:
# calling Minimize for a variable
# finds its lower bound
model.Minimize(standing_room)
solver = cp_model.CpSolver()
status = solver.Solve(model)
print(solver.Value(standing_room))  # Output: 10

The CP-SAT solver is written in C++, but via SWIG12 an API in Java, C# and Python is also provided, which makes it one of the few constraint solvers available in Python. Regarding Requirement (LanguageAPI) this provides very good integrability for the basic functionality of the solver, although some special features can only be accessed through the C++ API.

9 https://github.com/chocoteam/choco-solver
10 https://search.maven.org/artifact/org.choco-solver/choco-solver/
11 https://github.com/google/or-tools/
12 http://www.swig.org/
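The domain narrowing shown for Choco and Picat, and emulated with CP-SAT above, amounts to bounds propagation over the constraint nr_seats + standing_room = nr_passengers. The following stand-alone sketch is our own illustration of such a bounds-consistency filter, not any solver's actual implementation:

```python
def propagate_sum(x, y, z):
    """Narrow interval bounds so that x + y = z stays satisfiable.
    Each variable is a (lo, hi) pair; returns the narrowed triple or
    raises ValueError if some domain becomes empty."""
    while True:
        (xl, xh), (yl, yh), (zl, zh) = x, y, z
        nx = (max(xl, zl - yh), min(xh, zh - yl))
        ny = (max(yl, zl - xh), min(yh, zh - xl))
        nz = (max(zl, xl + yl), min(zh, xh + yh))
        for lo, hi in (nx, ny, nz):
            if lo > hi:
                raise ValueError("inconsistent domains")
        if (nx, ny, nz) == (x, y, z):   # fixpoint reached
            return x, y, z
        x, y, z = nx, ny, nz

# User input nr_seats = 40 narrows standing_room to 10..160, matching
# the propagation results shown in the Choco and Picat listings.
nr_seats, standing_room, nr_passengers = propagate_sum(
    (40, 40), (0, 200), (50, 200))
```

Running this yields standing_room = (10, 160) while nr_passengers stays at (50, 200), i.e. exactly the filtered domains that Choco reports after Solver.propagate().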

5.5 Preliminary findings

This brief survey of currently available constraint solvers is by no means complete. Its main purpose was to illustrate the diversity of systems available for constraint solving and how each has particular strengths and weaknesses when it comes to satisfying the proposed requirements for interactive solving.

The focus of most constraint systems is on modelling and efficient solving. Only a few solvers contain special features for interactive solving. Even in the case of solvers based on constraint propagation, the main purpose of constraint propagation is to assist the solver during search. Another indicator that features like domain filtering are sometimes less important than performance is the fact that Google recommends using the new SAT-based constraint solver instead of the legacy constraint-propagation-based solver contained in Google's OR-Tools due to performance.13 Only one system (Picat) provides a documented mechanism for getting notified about domain changes of constraint variables. Some solvers like Choco include an explanation engine. None of the solvers we are aware of supports default reasoning out of the box. All in all this supports our initial suspicion that interactive solving is a neglected aspect of constraint systems.

Of course, the decision which solver to use is complex and will not be driven by interactive features alone. Most of the interactive features can be implemented (although sometimes less efficiently) by treating the solver as a black box.

As can be seen even in our simple example, it is not trivial to translate a constraint problem from one solver to another. To avoid the risk of getting locked in to a specific solver API, it might be better to use a solver-independent constraint modelling language like MiniZinc.

6 CONCLUSION

Real-world configuration problems are often interactive in nature, i.e., they include the user as an essential factor in the configuration process. To solve configuration problems, constraint satisfaction is often used as an underlying reasoning system. In this paper, we have made two contributions: First, we have proposed a set of requirements that a constraint solver should fulfil to support interactive configuration. Second, we have presented the results of a small survey covering a selected set of non-commercial off-the-shelf constraint systems.

Our findings show that interactive aspects of configuration are neglected by most constraint systems. They also show that the landscape of available systems is highly diverse and that each solver has its own strengths and weaknesses when it comes to satisfying the requirements proposed in this work.

This is preliminary work. Neither our list of user and solver interactions nor our list of requirements nor our survey is exhaustive. We focused here on those requirements and tools that we experienced as most important in our daily work on industrial product configuration. Future work should extend this study to cover more requirements (like aspects of guided selling, prediction of default values, recommender features) and more constraint systems. Assessment of each constraint system shall be done in a more systematic and complete way and summarized in tables.

We invite the configuration and constraints communities to propose implementations of constraint systems that are suitable to support interactive configuration.

13 https://developers.google.com/optimization/cp/cp_solver

REFERENCES

[1] Jérôme Amilhastre, Hélène Fargier, and Pierre Marquis, 'Consistency restoration and explanations in dynamic CSPs – application to configuration', Artif. Intell., 135(1-2), 199–234, (2002).
[2] Christian Bessiere, 'Constraint propagation', in Rossi et al. [18], 29–83.
[3] Krzysztof Czarnecki, Simon Helsen, and Ulrich W. Eisenecker, 'Formalizing cardinality-based feature models and their specialization', Software Process: Improvement and Practice, 10(1), 7–29, (2005).
[4] Andreas A. Falkner, Gerhard Friedrich, Alois Haselböck, Gottfried Schenner, and Herwig Schreiner, 'Twenty-five years of successful application of constraint technologies at Siemens', AI Magazine, 37(4), 67–80, (2016).
[5] Alexander Felfernig, Monika Schubert, and Christoph Zehentner, 'An efficient diagnosis algorithm for inconsistent constraint sets', AI EDAM, 26(1), 53–62, (2012).
[6] David Angelo Ferrucci, Interactive configuration: a logic programming-based approach, Ph.D. dissertation, Rensselaer Polytechnic Institute, Troy, NY, USA, 1994.
[7] Gerhard Fleischanderl, Gerhard Friedrich, Alois Haselböck, Herwig Schreiner, and Markus Stumptner, 'Configuring large systems using generative constraint satisfaction', IEEE Intelligent Systems, 13(4), 59–68, (1998).
[8] Alois Haselböck and Gottfried Schenner, 'A heuristic, replay-based approach for reconfiguration', in Proceedings of the 17th International Configuration Workshop, Vienna, Austria, September 10-11, 2015, eds., Juha Tiihonen, Andreas A. Falkner, and Tomas Axling, volume 1453 of CEUR Workshop Proceedings, pp. 73–80, CEUR-WS.org, (2015).
[9] Pieter Van Hertum, Ingmar Dasseville, Gerda Janssens, and Marc Denecker, 'The KB paradigm and its application to interactive configuration', TPLP, 17(1), 91–117, (2017).
[10] Lothar Hotz, Thorsten Krebs, and Katharina Wolter, 'Combining software product lines and structure-based configuration – methods and experiences', in Workshop on Software Variability Management for Product Derivation – Towards Tool Support, (2014).
[11] Ulrich Junker, 'QUICKXPLAIN: preferred explanations and relaxations for over-constrained problems', in Proceedings of the Nineteenth National Conference on Artificial Intelligence, Sixteenth Conference on Innovative Applications of Artificial Intelligence, July 25-29, 2004, San Jose, California, USA, eds., Deborah L. McGuinness and George Ferguson, pp. 167–172, AAAI Press / The MIT Press, (2004).
[12] Ulrich Junker, 'Configuration', in Rossi et al. [18], 837–873.
[13] Narendra Jussien and Olivier Lhomme, 'Unifying search algorithms for CSP', Rapport technique, École des Mines de Nantes, (2002).
[14] Jeppe Nejsum Madsen, Methods for Interactive Constraint Satisfaction, Master's thesis, Department of Computer Science, University of Copenhagen, 2003.
[15] Andreas Hau Nørgaard, Morten Riiskjær Boysen, Rune Møller Jensen, and Peter Tiedemann, 'Combining binary decision diagrams and backtracking search for scalable backtrack-free interactive product configuration', in Stumptner and Albert [21], pp. 31–38.
[16] Barry O'Callaghan, Barry O'Sullivan, and Eugene C. Freuder, 'Generating corrective explanations for interactive constraint satisfaction', in Principles and Practice of Constraint Programming – CP 2005, 11th International Conference, CP 2005, Sitges, Spain, October 1-5, 2005, Proceedings, ed., Peter van Beek, volume 3709 of Lecture Notes in Computer Science, pp. 445–459, Springer, (2005).
[17] Matthieu Stéphane Benoit Queva, Christian W. Probst, and Per Vikkelsøe, 'Industrial requirements for interactive product configurators', in Stumptner and Albert [21], pp. 39–46.
[18] Handbook of Constraint Programming, eds., Francesca Rossi, Peter van Beek, and Toby Walsh, volume 2 of Foundations of Artificial Intelligence, Elsevier, 2006.
[19] Stuart J. Russell and Peter Norvig, Artificial Intelligence – A Modern Approach (3rd internat. ed.), Pearson Education, 2010.
[20] Gottfried Schenner and Richard Taupe, 'Encoding object-oriented models in MiniZinc', in Fifteenth International Workshop on Constraint Modelling and Reformulation, (2016).
[21] Markus Stumptner and Patrick Albert, eds., Proceedings of the IJCAI-09 Workshop on Configuration (ConfWS-09), 2009.
[22] Michael Veksler and Ofer Strichman, 'A proof-producing CSP solver', in Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, Atlanta, Georgia, USA, July 11-15, 2010, eds., Maria Fox and David Poole, AAAI Press, (2010).

Portfolio Management: How to Find Your Standard Variants

Frank Dylla and Daniel Jeuken and Thorsten Krebs 1

Abstract. Product portfolio management is one of the most important tasks for companies to secure their future competitiveness. A crucial aspect for portfolio management decisions is the volume of products sold and the development of sales numbers over time – one could say: What are your current or upcoming best-selling products, often used as "standard products" in sales? Especially for these products it is worthwhile to take actions in reducing costs and improving revenue. Regarding discrete products the task is, simply said, looking for products with the highest quantities or profit sold, or significant changes in these quantities over a certain period of time (business intelligence). In contrast, this approach does not work satisfactorily with complex multi-variant products. An aggregated view on products, i.e. ignoring the sales numbers of the variants with their individual features, does not give sufficient insights or may even lead to wrong decisions in portfolio management. The recurring combination of features across multiple types of products might be more important than the type of the product itself. In this paper we investigate differences in identifying potential standard products in comparison to identifying potential standard variants of products. Thereon we derive a high-level framework how standard variants may be deduced from a given set of variants described by characteristics, provide an algorithmic sketch, and discuss resulting challenges from a pragmatic perspective.

1 Introduction

Portfolio management is a dynamic decision process evaluating, prioritizing, reorganizing, cancelling, etc. products throughout their lifecycle [6]. As managers have to deal with uncertain and changing information, portfolio management is a complex task. One of the major difficulties in product portfolio management is predicting what the customers are willing to pay for. This includes knowing the market, i.e. knowing the current customer demand and knowing how it will most likely change in future. Thus, product portfolio management is complex already when considering simple products, but gets more complex when considering configurable and thus multi-variant products.

But what exactly is the challenging part of this task? Forecasts are created in order to plan the supply chain and production capacities. For simple products this is a rather straightforward task: one can assign a sales forecast to the product identifiers, e.g. material numbers, and use the bill of materials in order to get a list of components that are required. For variant-rich products such as skateboards, however, this is not that easy. In general, the necessary components of a skateboard2 are the deck, i.e. a plank (in general wooden, but not necessarily), two trucks, i.e. spring-mounted axles, and four wheels with bearings. Optional components may be sliptape, paintings, risers, shock pads, nose/tail guards, etc. Consider that not all truck types fit each deck and that not all wheel/truck combinations fit. An individual composition of these components is sought by the customer – leading to very few skateboards that are sold with the exact same composition of deck, axles, wheels, and so on.

From our experience, for new multi-variant products it is common that product managers guess which variants will be the top-selling ones in future, i.e. the decision is based on their gut feeling. Evaluation of the quality of their initial decision is barely feasible as only standard BI techniques are available. These techniques are not sufficient for portfolio planning of multi-variant products as they ignore the structural information of the variants themselves. Standard techniques typically analyze the list of sales over a certain period of time and use product identifiers as the key to identify which one is sold the most and predict how this will change in future. But for variant-rich products that are sold in lot size 1 the product identifier cannot be used as a key criterion. It is rather important to compare characteristics and their values. For example, comparing the product ID, which identifies an individual composition, does not reveal that a lot of skateboards use the same wheels. Thus we consider it important to use the configuration model – containing product data and rule sets – as an input for a new kind of algorithm that does not compare on the level of product identifiers but on the level of a set of product characteristics, which supports better predictions of top-selling variants, i.e. what the market really is willing to pay for.

In order to support the step of evaluating past sales in comparison to original plannings on the level of characteristics and their values, we introduce the notion of central representative and propose a potential calculation thereof. We deliberately set this notion apart from the term "standard product" ("standard variant", respectively), as that term describes products which were actually built many times. As you will see later, a central representative does not need to have been built even once. We are convinced that central representatives will help to recognize changes in client behavior – or the market in general – over time and whether adaptions are reasonable in order to meet the goals of portfolio management.

We start with introducing our understanding of product configuration, which is constraint-based, and introduce diverse variant spaces for later use (Sec. 2.1). We consider definitions of discrete standard and basic products (Sec. 2.2) and elaborate how this relates to standards for multi-variant products (Sec. 2.3). We sketch our approach in Section 3. To avoid misunderstandings with varying definitions of 'standard' we introduce the term central representative of a given variant space described by characteristics (Sec. 3.1). In order to find such a central representative a measure of dissimilarity needs to be

1 encoway GmbH, Germany, email: {dylla,jeuken,krebs}@encoway.de
2 see en.wikipedia.org/wiki/Skateboard (retrieved 8.5.2019)

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

73 defined (Sec. 3.2). In Section 3.3 we exemplify how a representative In more detail, if an assignment contains multiple values for one can be computed and how a deviation can be derived thereon. We or more characteristics ki it contains at least two different atomic summarize our considerations in an algorithm sketch (Sec. 3.4). We solutions. For example, given an assignment where a characteristic discuss our approach from various pragmatic perspectives (Section contains a value, e.g. for deck A and deck B, this means that the 4). First, we revisit the choice of the set of product vectors P for customer at some configuration front-end can still decide for either which the central representative should be computed (Sec. 4.1). Fur- deck A or B resulting in a valid assignment definitely. In the re- thermore, in general data is not available in a well defined form in re- mainder of this paper we will use ki synonymously for referring to ality, i.e. not all characteristics and values are defined in a consistent the characteristic itself as well as for its evaluation, i.e. value assign- manner (Sec. 4.2). Additionally, multi-variant products are subject to ment, as the meaning becomes clear from the context in most cases. change such that older products may falsify the results that should In case of ambiguities we clarify the meaning. reflect the current state (Sec. 4.3). Finally, we consider derivation of In order to discuss notions of standard variant, we need to define parameters and further prerequisites necessary in order to apply the several solution spaces based on the CSP definition. algorithm presented to real data. (sec. 4.4). ∅ Definition 3. Theoretical Configuration Space (S ): This space contains all combinations of characteristics which are possible from 2 Theoretical background a mereological perspective, i.e. from all minimalist configurations to all maximum configurations containing all optional components, 2.1 What is product configuration? 
but ignoring further constraints. In terms of CSP this is reflected by ⟨K, D, ∅⟩.

Felfernig et al. [8] base their understanding of configuration on a definition in [18]: configuration is a special case of design activity where the artifact being configured is assembled from instances of a fixed set of well-defined component types which can be composed conforming to a set of constraints. A configuration task is the selection of the components and their properties to get a valid combination of the product components; the outcome is also called product variant [4]. As a result, the component types span a space of potential configurations which are further restricted by constraints, which limit the possibilities of how components can be combined. Practice shows that the restrictions may arise from technical feasibility, legal requirements, product design, or marketing purposes. In general, components or properties of a product are described by characteristics in formal product representations. There are additional notions to describe properties of components, like attributes or features. For reasons of simplicity we will restrict ourselves to the term characteristics throughout this paper. Based on this we can define a product characteristics vector, product vector for short.

Definition 1. Given a set of characteristics ki ∈ K with i ∈ {0, ..., N−1}, each with values from domain Di ∈ D, we define [k0, k1, ..., kN−1] as the product (characteristics) vector p⃗.

We note that N denotes the maximum number of possible characteristics. In particular, if a characteristic is optional, a specific domain value must be available defining that this characteristic is not chosen, or not evaluated respectively. As combinations of domain values are not restricted, the product vector may reflect a product which is technically not feasible.

Naturally, a configuration task can be considered as a constraint satisfaction problem (CSP), see e.g. [8].

Definition 2. Constraint Satisfaction Problem (CSP) ⟨K, D, C⟩: A CSP is defined as a set of variables ki ∈ K with i ∈ {0, ..., N−1}, with values from domain Di ∈ D, together with a set of constraints cj ∈ C with j ∈ {0, ..., M−1} defining which combinations of values are allowed or not. A solution of a CSP is a consistent evaluation of all variables (a value assignment to all ki), i.e. no constraint is violated. Otherwise the assignment is called inconsistent. Furthermore, within an assignment the values of ki do not need to be unique, i.e. ki may contain multiple valid values which can be considered as alternatives. A solution with a unique value per ki is called an atomic solution or, in terms of variant management, a variant.

Consider the skateboard example. The minimalist configurations consist of a deck, two trucks, and four wheels, as these components are necessary to obtain a functional skateboard from a mereological perspective. A configuration with two decks is not part of S^∅, whereas configurations with different truck or wheel sizes are part of S^∅, although this may make the skateboard unusable. Maximum configurations consist of the above components plus all optional components which can be installed in parallel. As risers and shock pads are installed in the same place^3, there is no maximum configuration containing both components. As we are interested in valid configurations in the end, we need to define a second configuration space.

Definition 4. Valid Configuration Space or variant space (S): This space contains only valid configurations, i.e. configurations that satisfy all constraints of the underlying configuration model. Therefore S ⊆ S^∅ holds. As valid configurations are also called variants, we will talk of 'variant space' in the remainder of this paper. The variant space directly relates to the space of atomic solutions of a CSP.

Taking the skateboard example again, configurations with different wheel sizes are not part of S, whereas configurations with different wheel colors may be, depending on whether such configurations are permissible with reference to the configuration model.

Other variant spaces may be defined on the 'trading status' of each p⃗ contained, for example:

Definition 5. Offered variant space (S^O) and sold variant space (S^$): S^O is defined as the space of all variants which have been quoted to customers. S^$ contains only those variants which have been sold.

As in mass customization the variant space is rather large, in general it can be assumed that not all variants were sold or offered. Nevertheless, in the very extreme case all possible variants have been offered and sold, and thus S^$ ⊆ S^O ⊆ S. Based on the presented definition of variant space, an arbitrary number of variant spaces based on relevant criteria can be defined for investigation and comparison.

^3 between trucks and deck
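The configuration spaces defined above can be illustrated executably. The following Python sketch uses an invented miniature skateboard model (the component names, domains, and the single constraint are assumptions for illustration, not taken from the paper's configuration model): it enumerates the unconstrained space ⟨K, D, ∅⟩ and filters it down to the variant space S.

```python
from itertools import product

# Invented miniature skateboard model: characteristics k_i with domains D_i.
domains = {
    "deck":      ["street", "cruiser"],
    "wheel_mm":  [50, 54, 60],
    "accessory": ["none", "riser", "shock_pad"],
}

# Constraints c_j: each maps a full assignment to True (allowed) or False.
constraints = [
    # hypothetical rule: 60 mm wheels may not be mounted without any accessory
    lambda v: not (v["wheel_mm"] == 60 and v["accessory"] == "none"),
]

# s_full corresponds to <K, D, {}>: all domain combinations, constraints ignored.
keys = list(domains)
s_full = [dict(zip(keys, values)) for values in product(*domains.values())]

# s_valid corresponds to S, the variant space: only constraint-satisfying assignments.
s_valid = [v for v in s_full if all(c(v) for c in constraints)]

print(len(s_full), len(s_valid))  # 18 16
```

Sold or offered subsets (S^$, S^O) would then simply be further sublists of `s_valid`.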

2.2 Discrete standard and basic products

In the context of discrete products a central term for entrepreneurial considerations and decisions is standard product.

According to the Lexico dictionary (Oxford)^4, on a general level a standard is (a) a certain quality or attainment level reached or (b) something considered exemplary or as a measure or model against which others are assessed (cf. benchmark, scale, guideline). Following information given by Wikipedia, a technical standard is an established norm or requirement in regard to technical systems. It is usually a formal document that establishes uniform engineering or technical criteria, methods, processes, and practices. In contrast, a custom, convention, company product, corporate standard, and so forth that becomes generally accepted and dominant is often called a de facto standard.^5

Specifically considering discrete products, a wide variety of definitions is available which take different aspects into account. For example, in the Gabler Wirtschaftslexikon standard product is defined with a focus on quality: Products that have a generally agreed (standardized) minimum quality. Product changes focus on quantities, prices and times. Standard products can be traded on the stock exchange.^6 Other definitions are based on the criterion whether they are ready for batch production.^7

From our experience the term standard product is mainly used in two different ways in manufacturing industry:

1) either as a label of a product which should be presented as a standard (defined before the product is sold at all)
2) or as a product which is established on the basis of different criteria, e.g. it is sold the most within a given context, e.g. a region or a specific type of customer.

In order to dissolve this ambiguity we speak of a predefined standard in case of 1) and a derived standard in case of 2).

Furthermore, a basic product – also called generic product – is defined to realize the core benefit of the product. This implies that a basic product cannot be further reduced without losing the possibility of intended product usage.^8 In case of a skateboard this is the ability to ride on such a board while pushing oneself forward by foot. A basic product may not be saleable, e.g. due to legal restrictions. An extended product is one which offers additional benefit to customers. In the context of manufacturing companies ... a basic product might be a rather simple good that experiences relatively consistent consumer demand ....^9 Sometimes a core product is differentiated from the product: The core product of a book is information. It is not the book itself.^10 The book itself is then the basic product.

2.3 Multi-variant products and standard variants

The term mass customization defines the challenge of anticipating individualized products to be manufactured simultaneously with the efficiency of mass production or, as stated in [8]: ... is based on the idea of the customer-individual production of highly variant products under near mass production pricing conditions. In general, in this context products are multi-variant, i.e. there is more than one option available. One important question for variant management is how the variants can be compared in a reasonable manner. Buchholz states that all variants need to be considered with respect to their product type and that relevant characteristics need to be selected for a reasonable comparison [2]. Buchholz also discusses the relationship between variants and standard. It is critically scrutinised whether a standard variant is the one with maximum quantity, some sort of average, or a yardstick for other variants. Nevertheless, it is specifically emphasized that a standard variant is something special compared to other variants. For comparison a measure of discrimination between variants is necessary, but not all characteristics are important, such that relevant characteristics need to be selected. In our notation this means that the product vector K = [k0, k1, ..., kN−1] is abstracted to a reduced product vector K′ ⊂ K with N′ < N.

Buchholz also presents different views from the literature on whether such a standard variant needs to be part of the variant space itself or not. For example, according to Boysen a basic or standard product may be a theoretical construct that has never been physically manufactured [1]. Whether it needs to be manufacturable at all remains unclear. For further details we refer to [2].

On the one hand, to define a standard variant based on aggregated sales numbers over all variants of a variant space is unreasonable from our perspective, as it exactly ignores the possible differences of the available variants. Such an approach could rather be considered as a 'standard variant space'. On the other hand, to only consider the sales numbers of each variant individually bears problems as well; it may even lead to wrong interpretations. In general, the exact same variant is not sold more than 'a few times'. For example, consider 100 skateboards of 96 different variants sold. This means that most variants were sold once and two may have been sold three times each. This also means that the standard variants may change within a few new sales. Therefore, from our perspective it would not be useful to define these "top selling" variants as standard variants.

From the perspective of product management and with the aim of efficient portfolio handling, it is also useful for multi-variant products on the one hand to offer and place a standard variant in the market and on the other hand to analyze which product variant is sold most or is never sold at all.

From our point of view the notion of a basic product can be directly transferred to a basic variant: to cover the basic functionality, necessary characteristics must be set with corresponding values reflecting a "basic" quality. In case of a skateboard: a deck, two trucks, and four wheels, each of rather low quality. In case only one component (characteristic) is missing, it is no variant of a skateboard anymore as it is non-functional. In addition, top-level variants can be given: variants with a maximum number of characteristics evaluated with corresponding values reflecting a high level of quality, i.e. based on the configuration model no further feature can be selected without deselecting at least one other feature. In some cases, depending on the context, it might appear that no more options are chosen in case of a professional board compared to a basic one, but components of better quality, e.g. the material types of the deck or the wheels. In the end this must be reflected in the underlying metrics.

In Figure 1 we depict relations between basic (bi), top-level (ti), and 'regular' (pi) product variants. Furthermore, each variant may also be computed or defined as a standard variant (marked with ☆). The level, i.e. the number of selected characteristics and 'rank' of

^4 www.lexico.com/en/definition/standard (retrieved 2.8.2019)
^5 en.wikipedia.org/wiki/Technical_standard (retrieved 2.5.2019)
^6 wirtschaftslexikon.gabler.de/definition/standardprodukte-42877 (retrieved 6.5.2019, in German)
^7 e.g. www.lawinsider.com/dictionary/standard-products (retrieved 2.5.2019)
^8 wirtschaftslexikon.gabler.de/definition/produkt-42902 (retrieved 2.5.2019, in German)
^9 www.businessdictionary.com/definition/basic-product.html (retrieved 8.5.2019)
^10 www.marketing91.com/five-product-levels/ (retrieved 8.5.2019)
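The abstraction of a product vector to a reduced product vector K′ ⊂ K can be sketched in a few lines of Python (the characteristic names and the choice of relevant characteristics are invented for illustration):

```python
# Projecting product vectors onto relevant characteristics (K' subset of K).
characteristics = ["deck", "wheel_mm", "wheel_color", "grip_tape"]  # K
relevant = {"deck", "wheel_mm"}                                     # K'

def reduce_vector(p):
    """Keep only the values of the relevant characteristics, in order."""
    return [value for name, value in zip(characteristics, p) if name in relevant]

a = ["street", 54, "red",  "black"]
b = ["street", 54, "blue", "black"]   # differs from a only in wheel_color

print(reduce_vector(a) == reduce_vector(b))  # True: equal with respect to K'
```

Two variants that differ only in irrelevant characteristics thus coincide after the reduction, which is exactly what a comparison restricted to K′ requires.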

corresponding values is reflected by height. The edges depict that the variants differ in a single characteristic.^11 Naturally, basic variants are rather at the bottom and top-level variants at the top of the figure. Nevertheless, it is possible to have feature combinations that are not separable, and so basic as well as top-level variants can exist on different levels. But irrevocably, basic variants must not have another connected variant 'below' them, and top-level variants none 'above', respectively. All other variants in between have 'smaller' predecessors and 'larger' successors. Standard variants can be defined on any of these levels. Consider our skateboard example. We define a basic variant as standard skateboard for beginners, a mid-range skateboard as a standard for trained half-pipe skaters, and a top-level variant as a standard for skate competitions.

Figure 1. Schematic structure of basic (bi), top-level (ti), and 'regular' (pi) variants in between. Variants of any of these levels can be designated as a standard variant (☆).

3 Approach

We believe that the availability of a standard variant in the sense of an average product of the most selling variants is very helpful in portfolio management. In order to prevent misunderstandings with other definitions (see Sec. 2.2 and 2.3) we will talk of a central representative of a variant space instead. One possibility to exploit the central representative in portfolio management is to compare it with predefined standards and adapt them accordingly. In order to discuss the challenges in defining such a central representative in the context of multi-variant products, we need to give some formal definitions regarding configuration spaces (Section 3.1). We define a measure M (Sec. 3.2) for the computation of a central representative (Sec. 3.3). We close this section with an algorithmic sketch, integrating definitions from the preceding subsections (Sec. 3.4).

3.1 Definition of a central representative of a variant space

In Section 2.1 we introduced the notion of a product (configuration) vector p⃗, which holds all characteristics which define a certain product. Let P = {p⃗0, p⃗1, ..., p⃗P−1} be a set of P product vectors. With S^∅, S, S^O and S^$ (see Sec. 2.1) we already defined specific P, i.e. sets where all p⃗j fulfill certain properties. As we are interested in the "best representative" of P, we define a central representative of P based on a measure of similarity or dissimilarity.

Definition 6. Central representative r⃗P and deviation ν⃗P: r⃗P is the product vector of a product space P which has the overall minimal dissimilarity to all p⃗j ∈ P considering a measure M. Furthermore, we define the deviation ν⃗P to be the vector of the individual deviations νi of assigned values per characteristic ki (see Figure 2).

Simplified, one could say r⃗P is the average product of P regarding the measure M or, more specifically, the one that minimizes the dissimilarity to all p⃗i ∈ P. The deviations νi can be defined in multiple ways. We detail this in Section 3.3. We note that, based on this definition, it is not necessary that r⃗P ∈ P. Furthermore, as several solutions may have the same aggregated distance regarding the p⃗i ∈ P based on M, there may be no unique central representative r⃗P. We sketch how a measure M can be defined below.

Figure 2. Schematic depiction of a variant space P with its central representative r⃗P and its deviation ν⃗ = [ν0, ⋯, νn−1] with n = 2.

3.2 Measure M: Dissimilarity of variants

M could be either a measure of similarity or dissimilarity. Although M can be defined arbitrarily, e.g. based on ∑, ∏, min, max or some complex aggregation function, we stick to a specific distance-based measure, and thus dissimilarity, for reasons of simplicity. For future research a promising link is given by case-based reasoning (CBR), as the notion of similarity is central to this approach [9, 16, e.g.]. Nevertheless, although CBR has been applied to product configuration, to our knowledge specific product similarities have not been extensively investigated in the literature; exceptions are [12, 21, 20]. Aspects of similarity have been studied in the context of CSP [7, 5, e.g.], resulting in the need for Euclidean distance measures from a practical perspective. In the remainder of this section we summarize aspects of similarity measures relevant to our approach.

A (Euclidean) distance measure δ for some entities o, p and q is reflexive: δ(p, p) = 0, symmetric: δ(p, q) = δ(q, p), and satisfies the triangle inequality: δ(o, q) ≤ δ(o, p) + δ(p, q). For reasons of simplicity, we will talk of distance in the remainder of this paper.

In order to define a distance measure M, consider a variant space, e.g. S, and a subset thereof, e.g. S^$ (S^$ ⊆ S). This implies that p⃗ ∈ S and q⃗ ∈ S^$ contain the same characteristics kx with x ∈ {0, ⋯, N−1} in the same order. First, we need a distance between values from the same characteristic, δx for all x ∈ {0, ⋯, N−1}, for example:

δx(kx^p, kx^q) = |kx^p − kx^q|    (1)

with kx^p denoting the value of the x-th characteristic of product vector p⃗, and kx^q of q⃗ respectively. Depending on the type of scale of

^11 For reasons of simplicity we neglect that connected variants may differ in more than one characteristic as they are inseparable due to the rule set.

the characteristic (i.e. nominal, ordinal, interval or ratio scale) certain calculations may not be possible, e.g. subtraction or addition on a nominal scale is not reasonable. On a nominal scale only the equality between values can be determined, i.e. whether two values are the same or not. If a level of similarity is required, at least an ordinal scale for the values must be available, i.e. a linear order of the values for the definition of a median. For interval or ratio scales a mean can be defined.

This results in a distance vector of distances per characteristic:

δ⃗(p⃗, q⃗) = [δ0(k0^p, k0^q), ..., δN−1(kN−1^p, kN−1^q)]ᵀ = [d0, ..., dN−1]ᵀ    (2)

The next step is to aggregate these individual distances into a single distance value describing the distance between two product vectors. It needs to be reflected that not all characteristics are equally important. Therefore a weighting factor wi needs to be integrated for each characteristic. If characteristic ki should not be considered, the corresponding wi needs to be set to zero. Furthermore, not all distances for individual characteristics may have the same range and thus one characteristic may dominate others; therefore a normalizing factor vi is necessary. For example, consider a distance vector with N = 3 where d0 represents a binary distance (d0 ∈ {0, 1}), d1 represents a distance between zero and five (d1 ∈ [0, 5]), and d2 represents a distance between zero and a thousand (d2 ∈ [0, 1000]). In most cases d2 would dominate or overrule d1, which in turn also dominates d0. Therefore, it is important that all value ranges of the ki are normalized, e.g. to values between zero and one. This results in a distance between two product vectors p⃗ and q⃗:

∆(p⃗, q⃗) = (1/N) ∑_{i=0}^{N−1} wi vi di    (3)

We give a schematic impression of a distance ∆ between two product vectors r⃗S and r⃗S$ in Figure 3. Nevertheless, it still remains open how central representatives like r⃗S and r⃗S$ can be determined based on ∆.

Figure 3. Schematic depiction of a general variant space (S, dark blue) with its central representative (r⃗S) and a sales variant space (S^$, light blue), also with its central representative (r⃗S$). The ∆ depicts the difference between r⃗S and r⃗S$.

3.3 Calculation of central representatives

We defined the central representative r⃗P as a variant which minimizes the overall dissimilarity (cf. Definition 6). Furthermore, it is not a requirement that r⃗P is itself an element of P. Consider these two definitions of central representatives of S^$:

r⃗S$ = argmin_{r⃗ ∈ S^$} ∑_i ∆(r⃗, p⃗i) with p⃗i ∈ S^$    (4)

r⃗S$ = argmin_{r⃗ ∈ S} ∑_i ∆(r⃗, p⃗i) with p⃗i ∈ S^$    (5)

In the first case (Eq. 4) r⃗ has been sold itself, as r⃗ ∈ S^$, whereas in the second case (Eq. 5) r⃗ is a general technically feasible variant (r⃗ ∈ S). One could even relax this such that the representative does not even need to be technically feasible, and thus select r⃗ ∈ S^∅ (cf. 2.3).

In conjunction with the central representative it is also of interest 'how large' or 'how widespread' the set is which it represents. For this we need a notion of deviation, diameter, or variance. For now, we stick with the notion of average deviation per characteristic (νi) for all p⃗j ∈ P, as it suffices for our needs:

νi = (1/P) ∑_{j=0}^{P−1} δi(ri, ki^j)    (6)

Then ν⃗ = [ν0, ..., νN−1] denotes a vector of all deviations per characteristic.

It is not beneficial if a central representative covers a 'too wide range' of variants, i.e. one or several νi are rather high for some characteristics ki, as it would not give much help for portfolio optimization, especially if the members of the set of product vectors are not distributed uniformly. Consider the case depicted in Figure 4. Products were sold in two rather distant regions of the variant space. Considering them as one set would lead to a representative which does not reflect the situation at hand (orange space). We need to look for separate subsets, i.e. clusters, instead, to come to a result depicted by the two separate regions S0^$ and S1^$ (light blue). As we have defined a central representative and a deviation thereof, various cluster analysis methods are applicable, e.g. centroid-based or density-based clustering. For an overview of existing clustering methods we refer to [13, 23, 17, e.g.]. The adequate selection of a clustering method will be a crucial task for the successful application of the proposed approach.

For pragmatic reasons we restrict our clustering parameters (assuming a clustering method given) to a maximum deviation per characteristic and a minimum number of members per cluster. Therefore, a vector of thresholds θ⃗ = [θ0, ..., θN−1] for the corresponding characteristics ki and θ# for the minimum number needs to be given.

Figure 4. Schematic depiction of cluster splitting due to high variance in single cluster consideration.

3.4 An algorithm sketch

We summarize the steps of how to find adequate representatives for a given set of product vectors P (e.g. sold variants S^$) out of another given set of product vectors Q (e.g. the overall variant space S) in Algorithm 1. In the beginning only the single cluster P exists, for which the central representative r⃗P and deviation ν⃗P are calculated.
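To make Eqs. 2–6 concrete, the following Python sketch works on purely numeric toy product vectors (the data, weights, and normalizing factors are invented; the absolute difference of Eq. 1 serves as per-characteristic distance). It computes the weighted, normalized distance ∆, selects a central representative in the spirit of Eq. 4, and derives the deviation vector ν⃗:

```python
def delta(p, q, w, v):
    """Eq. 3: normalized, weighted aggregation of per-characteristic distances."""
    n = len(p)
    return sum(w[i] * v[i] * abs(p[i] - q[i]) for i in range(n)) / n

def central_representative(candidates, sold, w, v):
    """Eq. 4/5: the candidate with minimal summed distance to all sold vectors."""
    return min(candidates, key=lambda r: sum(delta(r, p, w, v) for p in sold))

def deviations(r, sold):
    """Eq. 6: average per-characteristic distance of the sold vectors to r."""
    return [sum(abs(r[i] - p[i]) for p in sold) / len(sold)
            for i in range(len(r))]

sold = [[50, 1], [54, 1], [54, 2]]   # S^$: invented sold product vectors
w = [1.0, 1.0]                       # weighting factors w_i
v = [1.0, 1.0]                       # normalizing factors v_i

r = central_representative(sold, sold, w, v)  # Eq. 4: candidates drawn from S^$
print(r, deviations(r, sold))                 # [54, 1] and deviations ≈ [1.33, 0.33]
```

Passing a larger candidate set (e.g. the whole variant space) instead of `sold` turns this into Eq. 5; nominal or ordinal characteristics would need scale-aware replacements for the absolute difference, as discussed above.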

If there is any deviation νi which is above its defined threshold θi, P needs to be split into two clusters.^12 In the next iteration at least two clusters need to be considered. At some point clusters with only few members are computed (< θ#). We ignore these clusters from further consideration in this iteration. We continue with increasing the number of clusters until we obtain a set of clusters, each containing a central representative with each deviation per characteristic below the given threshold (∀i: νi ≤ θi). We note that we increase the number of clusters iteratively and start the cluster splitting from the original set P on purpose. If we did not do so, the order of considering the p⃗i ∈ P might have an effect and thus would lead to different results if the p⃗i were presented in a different order.

Input: P, Q, θ⃗, M, θ#
Result: S := set of central representatives for P out of Q
no_of_clusters := 1; S := {P};
R := calculate list of representatives from Q for all sj ∈ S based on M;
Θ := calculate list of all deviations for corresponding rj and sj based on M;
while ∃ i, j with νi^j ∈ Θ > θi for any sj ∈ S do
    no_of_clusters := no_of_clusters + 1;
    S := clusterSplitting(P, M, no_of_clusters);
    delete all sj from S where |sj| < θ#;
    R := calculate list of representatives from Q for all sj ∈ S based on M;
    Θ := calculate list of all deviations for corresponding rj and sj based on M;
end

Algorithm 1: Algorithmic sketch for deducing central representatives out of the variant space Q based on the variants given by the variant space P.

4 Pragmatic considerations

Not all characteristics of product vectors must be considered, as relevant information might be covered by other characteristics (Sec. 4.1). In general, data provided by companies needs some preparation, as this data is often not consistent concerning characteristics' and values' denomination (Sec. 4.2). We consider temporal restriction of data and how observations over time can be derived (Sec. 4.3). Before Algorithm 1 can be applied, value ordering and weighting factors for each characteristic must be available (Sec. 4.4).

4.1 Contentual evaluation

In order to support a business question a contentual focus on data is necessary. Simplified, two levels of contentual constraints can be differentiated. First, the context of each variant (p⃗i) can be considered. Context can be defined from different perspectives, e.g. in which shop or region the variant has been generated, by whom, whether it has been sold, only offered, or never even offered (cf. S, S^O, S^$ in Sec. 2.1), or for which application, or domain respectively, it was bought, if this information is available. Second, the relevance of each characteristic should be checked, as consideration of all characteristics may block the view on relevant information, for example, the color of the trucks or some non-visible strings on some component. Chizi and Maimon state that a focus on relevant characteristics has several advantages [3]. For example, removal of irrelevant characteristics improves efficiency; moreover, results are more conclusive and easier to interpret due to the focus on key features. Nevertheless, a too limited choice of characteristics leads to information loss and reduces the quality of the results. For further information on feature selection methods we refer to [22]. If a characteristic is considered irrelevant for an evaluation at hand, wi (cf. Eq. 3) should be set to zero in the calculations. For all characteristics with wi > 0 the relative relevance needs to be considered very carefully, as slight changes may lead to significant changes in the classification of the data. For example, if the results are designed for adapting standard products, a slight change in the parameters might lead to a different variant.

4.2 Data preparation

Practice shows that within companies master data is often not coordinated. In general, this leads to multiple characteristics containing the same information, potentially represented differently, e.g. using different text strings, numbers, or different units. As products are subject to permanent change, the inconsistency of data increases over time. In order to ease and automate analysis in the long run, data synchronization is inevitable. Nevertheless, considering given data, data cleansing is essential to prevent bad decisions based on bad analysis results [24]. Maletic described data preparation as a multistep procedure comprising (1) definition of error types, (2) finding instances of these errors, and (3) correction of them [11]. He emphasizes that each of these steps is a complex task in itself.

To give an idea of the effort that needs to be taken, we present a non-exhaustive list of different error types in (master) data below. A common error type is conditioned by different notions or representations, i.e. characteristics and values holding the same information, but represented with different spellings. These errors often arise from inconsistent usage of blanks, hyphens, prefixes, suffixes or abbreviations. Different units may also be used, e.g. due to different intended usage. Characteristics holding complex information, i.e. connected information, are problematic as well, as further processing might be limited. A common example is a combined string representation of length, width, and height (sometimes without a given unit) instead of having individual numerical characteristics for each of them. A tricky type of error comprises misleading value specifications, e.g. frame sizes termed with numerical values which have to be interpreted in a specific manner, so that naive calculation is not possible. Consider frame sizes 5, 8, and 12, which reflect three consecutive frame sizes. The physical difference in size cannot be calculated from these values; instead other data like length, width, and height of certain components need to be considered. Furthermore, the conceptual distance cannot be calculated from these 'values': as the categories are consecutive, the distance is 1 and not 3 and 4. In order to prevent trimming of leading zeros, such terms may even be stored as strings. Elimination of errors of this type requires very specific semantic knowledge, which makes it not only hard to spot these errors, but also to correct them. For further information on data cleansing and data quality we refer to [14, 15].

As a result of data preparation we get a set P of product vectors p⃗i with consistent [k0, k1, ..., kN−1], i.e. with comparable information stored in the same characteristic with the same value for every product variant.

^12 How this is actually done depends on the clustering algorithm chosen.
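Coming back to Algorithm 1 (Sec. 3.4): its control loop can be sketched executably for one-dimensional numeric 'product vectors'. The data, the single scalar threshold standing in for θ⃗, and the gap-based splitting standing in for the unspecified clusterSplitting step are all invented for illustration:

```python
# Executable toy version of Algorithm 1's loop for 1-D numeric "product vectors".

def representative(cluster):
    """Central representative: member minimizing the summed distance (cf. Eq. 4)."""
    return min(cluster, key=lambda r: sum(abs(r - p) for p in cluster))

def deviation(cluster):
    """Average distance of the members to the representative (cf. Eq. 6)."""
    r = representative(cluster)
    return sum(abs(r - p) for p in cluster) / len(cluster)

def cluster_splitting(points, k):
    """Stand-in for clusterSplitting: cut sorted points at the k-1 largest gaps."""
    pts = sorted(points)
    gaps = sorted(range(1, len(pts)), key=lambda i: pts[i] - pts[i - 1])
    cuts = sorted(gaps[-(k - 1):]) if k > 1 else []
    return [pts[a:b] for a, b in zip([0] + cuts, cuts + [len(pts)])]

def find_representatives(P, theta, theta_min):
    k = 1
    clusters = [sorted(P)]
    while any(deviation(c) > theta for c in clusters):
        k += 1
        clusters = cluster_splitting(P, k)   # restart from the full P on purpose
        clusters = [c for c in clusters if len(c) >= theta_min]
    return [representative(c) for c in clusters]

sales = [50, 51, 52, 90, 91, 93]   # two distant regions, cf. Figure 4
print(find_representatives(sales, theta=2.0, theta_min=2))  # [51, 91]
```

As in the paper's sketch, each iteration restarts the splitting from the full set P, so the result does not depend on the order in which the product vectors are presented; a real implementation would plug in a proper clustering method and per-characteristic thresholds θi.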

4.3 Temporal evaluation

Products are subject to permanent change. They are designed, developed, sold, and refined, potentially several times. Such refinements and changes in the expectations of the market may result in changes of central representatives. Therefore, regardless of whether from a technical or a sales perspective, it is not reasonable to consider outdated data, which leads to the application of methods from time series analysis. Furthermore, as sales numbers for the products of interest may vary significantly over time, consideration of single time points (or rather small time intervals only) may show varying results for each of these time points.

One applicable method in order to generate smoothed results is the sliding window approach (SWA), see for example [10]. The basic idea is to evaluate overlapping intervals, so-called windows, to get smoother and more consistent results. We depict relevant parameters for the SWA in Figure 5. Let d denote the overall period under review (one year in the given example). The window size is denoted by w (three months) with w ≪ d, and the corresponding step size by s (1 month) with s ≤ w. Analysis is then performed for the data in each window separately.

The choice of specific values for d, w and s is very crucial and must be considered carefully, especially if conclusions on future developments are drawn. For example, if d is chosen too small, the corresponding data set may be too small to generate significant results. Statistical or learning methods support a reasonable choice [19, e.g.].

Algorithm 1 can be extended in such a way that not only a single time point is considered (P), but subsequent sets, i.e. subsequent windows. On this basis, developments of the central representatives and their corresponding deviations can be observed: how they 'wander around' and how the number of clusters increases or decreases.

Figure 5. Sliding window approach with d denoting the overall period considered, w denoting the window size and s the step size.

4.4 Weighting factors and value ordering

The approach is significantly based on the definition of the measure M containing the distances δ and ∆, which in turn contain the weighting factors wi for each characteristic. First experiments have shown that distance measures on nominal data influence the results significantly, as the distance can only be either zero or one. A rather low weighting factor for these characteristics compared to the other ones may be a solution, but this must be evaluated further in future. For now we tend to ignore these characteristics, as dissimilarity is in most cases reflected in other characteristics as well. Our gut feeling, but without proof, tells us that similar effects may occur when integrating ordinal scale data with interval and ratio scale data. For interval and ratio scale data a distance is naturally given – assuming the characteristic is not misinterpreted as such and is 'only' on an ordinal scale (cf. Sec. 4.2). For ordinal data this is not the case; a linear ordering has to be defined manually. Although an ordering of terms like "basic", "advanced", "expert", and "professional" might be considered trivial in the first place, it is a tricky, currently manual and time-consuming task and thus also error-prone. Looking at the terms "expert" and "professional", the question is whether "expert" comes before or after "professional", or is equal in the end, as they relate to completely different aspects of the product. It may be possible that a reasonable distance between terms like "basic" and "advanced" is definable, i.e. how far is "basic" from "advanced", "advanced" from "expert", and so forth. We refrain from this as the resulting costs would not be in a reasonable cost-benefit relation for an industrial company. For a start, an equidistant conceptual distance measure should suffice, i.e. all preceding and succeeding terms in a linear order have the same distance.

In business intelligence it is common not only to consider the number of sold units; profit or the number of sold units per quote may also be part of the analysis, for example. On the one hand, a pragmatic way without changing the algorithm is to modify the original set by reducing or multiplying the number of equal product vectors in P. On the other hand, an additional weighting factor per p⃗i could be introduced, which would be much more efficient regarding the run-time of the algorithm.

5 Summary and Outlook

To support portfolio management for multi-variant products we examined definitions of 'standard' for discrete and multi-variant products. To differentiate from these definitions we introduced the term central representative of a variant space. We derived an algorithmic sketch based on a measure M to calculate representatives for clusters of reasonable size. Finally, we discussed tasks necessary before the algorithm can be applied to real data.

As the work on central representatives for a variant space is at an early stage, many tasks and questions remain open. The straightforward next step is to experiment with large-scale real data instead of a few small toy examples. Furthermore, the determination of the weighting factors wi is a challenging task. We need to investigate to what extent learning methods, either supervised or unsupervised, may ease the task. Once real data is available it will be a worthwhile task to reconsider alternative definitions of distance functions, e.g. investigating the impacts of choosing ∏, min, max or some other function as aggregation operators. In theory it is possible that multiple central representatives are available. If this case also appears with real data, we need to investigate how to deal with it.

Acknowledgement

We thank the anonymous reviewers for critically reading the manuscript and providing helpful comments for its clarification and improvement.

REFERENCES

[1] Nils Boysen, Variantenfließfertigung, volume 49, Deutscher Universitätsverlag, 2005.
[2] M. Buchholz, Theorie der Variantenvielfalt: Ein produktions- und absatzwirtschaftliches Erklärungsmodell, SpringerLink: Bücher, Gabler Verlag, 2012.

[3] Barak Chizi and Oded Maimon, 'Dimension reduction and feature selection', in Data Mining and Knowledge Discovery Handbook, 2nd ed., eds., Oded Maimon and Lior Rokach, 83–100, Springer, (2010).
[4] Bjørn Christensen and Thomas D. Brunoe, 'Product configuration in the ETO and capital goods industry: A literature review and challenges', in Customization 4.0, eds., Stephan Hankammer, Kjeld Nielsen, Frank T. Piller, Günther Schuh, and Ning Wang, pp. 423–438, Cham, (2018). Springer International Publishing.
[5] Jean-François Condotta, Souhila Kaci, Pierre Marquis, and Nicolas Schwind, 'A syntactical approach to qualitative constraint networks merging', in Logic for Programming, Artificial Intelligence, and Reasoning - 17th International Conference, LPAR-17, Yogyakarta, Indonesia, October 10-15, 2010. Proceedings, eds., Christian G. Fermüller and Andrei Voronkov, volume 6397 of Lecture Notes in Computer Science, pp. 233–247. Springer, (2010).
[6] Robert Cooper, Scott Edgett, and Elko Kleinschmidt, 'Portfolio management - fundamental to new product success', The PDMA Toolbook for New Product Development, (01 2002).
[7] Frank Dylla, Jan Oliver Wallgrün, and Jasper van de Ven, 'Merging qualitative information: Rationality and complexity', in QUAC2015: Workshop on Qualitative Spatial and Temporal Reasoning: Computational Complexity and Algorithms, (September 2015).
[8] Alexander Felfernig, Lothar Hotz, Claire Bagley, and Juha Tiihonen, Knowledge-based Configuration: From Research to Business Cases, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1 edn., 2014.
[9] Gavin Finnie and Zhaohao Sun, 'Similarity and metrics in case-based reasoning', Information Technology papers, 17, (03 2002).
[10] Yupeng Hu, Cun Ji, Ming Jing, Yiming Ding, Shuo Kuai, and Xueqing Li, 'A continuous segmentation algorithm for streaming time series', in Collaborate Computing: Networking, Applications and Worksharing - 12th International Conference, CollaborateCom 2016, Beijing, China, November 10-11, 2016, Proceedings, eds., Shangguang Wang and Ao Zhou, volume 201 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pp. 140–151. Springer, (2016).
[11] Jonathan I. Maletic and Andrian Marcus, Data Cleansing: A Prelude to Knowledge Discovery, 19–32, Springer US, 07 2010.
[12] Hiroya Inakoshi, Seishi Okamoto, Yuiko Ohta, and Nobuhiro Yugami, 'Effective decision support for product configuration by using CBR', in International Conference on Case-Based Reasoning, (01 2001).
[13] Leonard Kaufman and Peter J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley & Sons, 1990.
[14] Lukasz A. Kurgan and Petr Musilek, 'A survey of knowledge discovery and data mining process models', Knowl. Eng. Rev., 21(1), 1–24, (March 2006).
[15] Ohbyung Kwon, Namyeon Lee, and Bongsik Shin, 'Data quality management, data usage experience and acquisition intention of big data analytics', International Journal of Information Management, 34(3), 387–394, (2014).
[16] Michael M. Richter and Rosina O. Weber, Case-Based Reasoning - A Textbook, Springer, 2013.
[17] Lior Rokach, 'A survey of clustering algorithms', in Data Mining and Knowledge Discovery Handbook, 2nd ed., eds., Oded Maimon and Lior Rokach, 269–298, Springer, (2010).
[18] D. Sabin and R. Weigel, 'Product configuration frameworks - a survey', Intelligent Systems and their Applications, IEEE, 13, 42–49, (08 1998).
[19] Hela Sfar and Amel Bouzeghoub, 'Dynamic streaming sensor data segmentation for smart environment applications', in Neural Information Processing - 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, December 13-16, 2018, Proceedings, Part VI, eds., Long Cheng, Andrew Chi-Sing Leung, and Seiichi Ozawa, volume 11306 of Lecture Notes in Computer Science, pp. 67–77. Springer, (2018).
[20] Sara Shafiee, Katrin Kristjansdottir, and Lars Hvam, 'Automatic identification of similarities across products to improve the configuration process in ETO companies', International Journal of Industrial Engineering and Management, 8(3), 167–176, (2017).
[21] Hwai-En Tseng, Chien-Chen Chang, and Shu-Hsuan Chang, 'Applying case-based reasoning for product configuration in mass customization environments', Expert Syst. Appl., 29(4), 913–925, (2005).
[22] Cen Wan, Hierarchical Feature Selection for Knowledge Discovery, Advanced Information and Knowledge Processing, Springer International Publishing, 2019.
[23] Rui Xu and Donald C. Wunsch II, 'Survey of clustering algorithms', IEEE Trans. Neural Networks, 16(3), 645–678, (2005).
[24] Marcus Zwirner, 'Datenbereinigung zielgerichtet eingesetzt zur permanenten Datenqualitätssteigerung', in Daten- und Informationsqualität: Auf dem Weg zur Information Excellence, chapter 6, 101–120, Springer Fachmedien Wiesbaden, (06 2018). (in German).
