Using Binary Strings for Comparing Products from Software-Intensive Systems Product Lines

Using Binary Strings for Comparing Products from Software-intensive Systems Product Lines Mike Mannion1a and Hermann Kaindl2b 1Department of Computing, Glasgow Caledonian University, 70 Cowcaddens Road, Glasgow, G4 0BA, U.K. 2Institute for Computer Technology, TU Wien, Gußhausstr. 27-29, A-1040, Vienna, Austria Keywords: Product Line, String Similarity, Metrics. Abstract: The volume, variety and velocity of products in software-intensive systems product lines is increasing. One challenge is to understand the range of similarity between products to evaluate its impact on product line management. This paper contributes to product line management by presenting a product similarity evaluation process in which (i) a product configured from a product line feature model is represented as a weighted binary string (ii) the overall similarity between products is compared using the Jaccard Coefficient similarity metric (iii) the significance of individual features and feature combinations to product similarity is explored by modifying the weights. We propose a method for automatically allocating weights to features depending on their position in a product line feature model, although we do not claim that this allocation method nor the use of the Jaccard Coefficient is optimal. We illustrate our ideas with mobile phone worked examples. 1 INTRODUCTION The research question is: to what extent can product similarity within a product line be determined by Many systems today are software-intensive and it is a product similarity evaluation process that uses a often the software features that help make a product weighted binary string to represent product features distinctive. A product line comprises products that and a binary string metric to evaluate similarity share sets of features. A feature is often but not al- (where 1 represents a feature’s presence, 0 its ab- ways a visible characteristic of a product. Some fea- sence, and the weight represents its relative im- tures may be shared across all products; some may be portance to the product)? Our interest is in the merits shared across many but not all products; some may be and difficulties of the overall process. We do not unique to a single product. Whilst product lines can claim our choices of binary string encoding, weight generate significant cost-efficiencies, the ongoing allocation method or binary string metric are optimal. management of their scope and scale is getting harder We illustrate the process using a worked example of as customers increasingly demand personalised prod- a mobile phone product line. ucts (Deloitte, 2015, Zaggi, 2019). The range of prod- Binary strings offer a straightforward flexible ucts in a product line and the features in each product means to represent product feature configurations. evolve for many reasons including supplier sales and They map easily to feature selection processes, are profit motives, customer demand, customer confu- low on storage requirements and enable fast compar- sion (Mitchell, 1999), personnel changes, market ison computations with existing similarity metrics competition, brand positioning (Punyatoya, 2014), and measuring tools e.g. (Rieck, 2016). The metrics mergers and takeovers, or changing legislation. One are based on feature counts and assume each feature challenge of the management task is understanding has equal contextual value. However, one outcome is which products in the product line are similar to each that two products can be similar to the same degree other and the extent. This paper addresses that chal- even if one product is missing what might be regarded lenge. as essential features, but makes up the feature count a https://orcid.org/0000-0003-2589-3901 b https://orcid.org/0000-0002-1133-0529 257 Mannion, M. and Kaindl, H. Using Binary Strings for Comparing Products from Software-intensive Systems Product Lines. DOI: 10.5220/0010441002570266 In Proceedings of the 23rd International Conference on Enterprise Information Systems (ICEIS 2021) - Volume 2, pages 257-266 ISBN: 978-989-758-509-8 Copyright c 2021 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved ICEIS 2021 - 23rd International Conference on Enterprise Information Systems deficit with less important features. One way to ad- same retail product on different e-shops but in differ- dress this issue is to attach a weight to each feature ent formats (Ristoski, 2018). signalling its relative importance. For large feature Similarity matching can be seen in the context of trees, this is only feasible if the initial allocation is search-based software engineering (Jiang, 2017, done automatically. We devised a method for auto- Lopez-Herrejon, 2015). The value of similarity repre- matic allocation based on a feature’s position in the sentations, algorithms, and metrics was explored in product line feature model. (Cesare, 2012) for cybersecurity applications, such as There are many binary similarity (and dissimilar- malware classification, software theft detection, pla- ity) metrics. We selected the Jaccard Coefficient (JC) giarism detection and code clone detection. Product similarity metric (Jaccard, 1901) because it compares line similarity models and metrics were proposed for only the total number of common features in both mining variability from software repositories (Kaindl, products. It was also used in other product line activ- 2014, Mannion, 2015). In (Vale, 2015) metric thresh- ities e.g. to identify similar groups of products that olds were identified to support the evaluation of soft- can be manufactured on the same assembly lines ware product line component quality. A review of (Yuzgec, 2012, Kashkoush, 2014), for reverse engi- metrics for analyzing software product line variability neering a requirements repository (Li, 2019), for re- is set out in (El-Sharkawy, 2019). verse engineering software components (Naseem, There are different product comparison strategies. 2017), to detect adverse interactions when a new fea- One strategy is to compare only the total number of ture is introduced into an ageing software product line common features in both products, though a compli- (Khoshmanesh, 2018). cating factor is whether to include or exclude negative Section 2 discusses some challenges of making matches i.e. does the absence of features advance the product comparisons. Section 3 presents a similarity case for similarity or not? Another strategy is to com- evaluation process using a mobile phone product line pare only the total number of features that are differ- worked example. Section 4 shows a small real-world ent between the two products. Another variation is to iPhone example. Section 5 discusses these ideas. compare only selected sets of features e.g. the most interesting or distinguishing. The choice of feature representation mechanism is 2 BACKGROUND influenced by different factors including its purpose, the ease with which it is understood and the complex- Product comparison can have different purposes. One ity of encoding the features into that representation. is to determine how products of the same type differ- For example, if there are a large number of comparisons to be made, then there is often a trade-off be- ent for strategic product positioning and branding reasons. This might be competitors’ products or the or- tween a clear understanding of the representation ganization’s own products. A second purpose might structure and the complexity of any processing using that structure. be to gauge if a product line needs to be reorganised. A third purpose might be to understand if a product After choosing a product comparison strategy, the falls within the legislative and regulatory boundaries next step is to choose a comparison metric. Some metrics measure similarity, others dissimilarity of the countries where the product is being sold. Com- parison and decision-making criteria vary. They will (Chen, 2009). Similarity metrics often have as a nu- often include product similarity but other criteria are merator a function of the total number of features that are the same in each product. Dissimilarity metrics also used e.g. perceived customer value of specific features, sales, costs of maintenance, prices and prof- often have as a numerator a function of the total number of features that are different in each product. The its. denominator is a function of the total number of fea- Within product management, different ap- proaches to product overlap have been developed. tures but varies depending upon the inclusion or ex- clusion of features absent in both products. Examples include a product similarity scale across different product categories to aid marketing manag- Similarity metrics continue to emerge from differ- ers, consumer policymakers, and market researchers ent fields and applications e.g. industrial product design (Shih, 2011), linguistics (Coban. 2015), ecology (Walsh, 2010); a model showing how the similarity between two products can monotonically increase, (Dice. 1945, Niesterowicz. 2016), population analy- monotonically decrease, or have a non-monotonic ef- sis (Jaro, 1989). However, the combination of the application context, the range of information and data fect on cross-price elasticity (Kolay, 2018); and a neural network to collate product descriptions of the types, the choice of information representation mech- 258 Using Binary Strings for Comparing Products from Software-intensive Systems Product Lines anisms, and the intended use, have influenced the de- adaptation of the

Using Binary Strings for Comparing Products from Software-Intensive Systems Product Lines

Headprint: Detecting Anomalous Communications Through Header-Based Application Fingerprinting

Author Guidelines for 8

Comparison of Levenshtein Distance Algorithm and Needleman-Wunsch Distance Algorithm for String Matching

Automating Error Frequency Analysis Via the Phonemic Edit Distance Ratio

APPROXIMATE STRING MATCHING METHODS for DUPLICATE DETECTION and CLUSTERING TASKS by Oleksandr Rudniy

Finite State Recognizer and String Similarity Based Spelling

Adaptive Name Matching in Information Integration

Form Similarity Via Levenshtein Distance Between Ortho-Filtered Logarithmic Ruling-Gap Ratios, Procs

Using String Metrics to Improve the Design of Virtual Conversational Characters: Behavior Simulator Development Study

MS Word Template for Final Thesis Report

Ontology Alignment Based on Word Embedding and Random Forest Classification

Dedupe Documentation Release 2.0.0