Using Data Analytics to Extract Knowledge from Middle-Of-Life Product Data


Using Data Analytics To Extract Knowledge From Middle-Of-Life Product Data

International Journal of Advanced Research and Publications, ISSN: 2456-9992, Volume 4, Issue 1, January 2020, www.ijarp.org

Fatima-Zahra Abou Eddahab, Imre Horváth
Delft University of Technology, Industrial Design Engineering, Landbergstraat 15, 2628 CE Delft, Netherlands, [email protected], [email protected]

Abstract: Data analytics needs dedicated tools, which are becoming increasingly complex. In this paper, we summarize the results of our literature research, conducted with special attention to existing and potential future tools. In the literature, attention is paid mainly to processing big data rather than to effective semantic processing of middle-of-life data (MoLD). The issue of efficiently handling big MoLD with its specific characteristics has not yet been addressed by commercialized tools and methods. Another major limitation of current tools is that they are typically not capable of adapting themselves to designers' needs or of supporting the reuse of knowledge and experience across multiple design tasks. MoLD require quasi-real-time handling due to their nature and their direct feedback relationships to the operation process and the environment of products. Nowadays, products are equipped with smart capabilities, which offers new opportunities for exploiting MoLD. The knowledge aggregated in this study will be used in the development of a toolbox that (i) integrates various tools under a unified interface, (ii) implements various smart and semantics-oriented functions, and (iii) facilitates data transformations in context by the practicing designers themselves.

Keywords: Data analytics, middle-of-life data, analytics tools, application practices, smart analytics toolbox.

1. Introduction

1.1. Setting the stage
The rapid rise of emerging economies and the deployment of information technologies have led to remarkable changes in the lifecycle of products and services. Because of the rapid development of hardware and software, a focus on the exploration of data has become ubiquitous [1]. Companies need to adjust their operations to the influences of rapidly evolving markets and to better manage the lifecycle of their products. Towards these goals, they have to combine (i) static process information with dynamic process information, (ii) product information with process and resource information, and (iii) human-aspect information with business information throughout the entire product lifecycle [2]. However, as reflected by the related literature, most past efforts were dedicated to the methodological and computational support of begin-of-life and end-of-life models and activities. Fewer efforts were made to exploit middle-of-life data (MoLD) and to create value and knowledge based on this kind of data. Thanks to new information technologies such as sensors, RFID and smart tags, the chunks of information conveyed by the middle-of-life (MoL) phase of product usage can now be identified, tracked and collected [3]. Our background research was stimulated by the observation that there seems to be a lack of tools to support decision-making in design and servicing using MoLD. This observation was placed in the position of a concrete research phenomenon to be studied [4]. The decision was underpinned by the following argumentation. Effective statistical and semantic processing of MoLD is not only an academic challenge, but also a useful asset for industry [5], since logistics, operation and maintenance activities are located in the MoL phase [6]. It is important for product developers and production companies to see how their products are used by different customers, in different application contexts and under different circumstances. This may provide insights concerning how to transform use patterns into specific product enhancements, and how to avoid deficiencies that may occur under circumstances that were not completely known or specified in the development phase of their products.

MoLD can be obtained in multiple ways. They can be aggregated through field observations and interrogations of users, by studying failure log files and maintenance reports, or from relevant web resources such as social media and user forums. Alternatively, they can be elicited directly from products by sensors or self-registration. The last-mentioned approaches are becoming more popular as products advance from traditional self-standing products, to network-linked advanced products, to awareness- and reasoning-enabled smart products [7]. However, due to the dynamic change of sensor data, the large volumes of data aggregated over time, and the unknown nature of data patterns, it is unfortunately not straightforward to perform effective data analysis using the existing traditional techniques [8]. Feeding structured MoLD and use patterns back to product designers is an insufficiently addressed issue [9]. The key challenge is to find ways to effectively use data analytics techniques in purposeful combinations, depending on the application contexts and the specific objectives of product designers [10].

1.2. Reasoning model and contents
We completed a comprehensive literature study in two phases. The first phase, referred to as 'shallow exploration', was conducted to identify the most relevant domains of knowledge for the study. Based on a wide range of keywords, we tried to develop a topographic landscape of the related publications. This topographic landscape was supposed to show not only the distribution (the clusters of the keyword-related publications), but also the peaks and plains of the clusters. Figure 1 shows the clustering of the outcome of the keyword-based mapping of the related literature. The graphical image was built with VOSviewer based on the concerned articles. Figure 1 shows not only the neighboring (semantically related) keywords, but also the distances between them as they appear in the literature. The different colors indicate the frequency of occurrence of keywords (i.e., the formation of peaks): the most frequently occurring keywords are shown in red (and dark orange) and the less frequent ones in green (and light blue). The visual representation generated by the software let us see that four major clusters of papers could be identified. In a kind of transitive ordering, these are: (i) changes in the nature of data, (ii) approaches to the transformation of data, (iii) tools and packages for data analytics, and (iv) design applications of data analytics.

Figure 1: Citation of the chosen family of keywords in the literature

These clusters were used as descriptors of the main domains of interest in the detailed literature study, in addition to the key phrase 'evolution of data analytics support of product design'. In the second phase of our study, called 'deep exploration', various sources such as subscription-based and open-access journals, conference proceedings, web repositories, and professional publications were used for collecting several hundred relevant publications. The findings made it possible to define further relevant key terms on a third level (not shown in Figure 1).

The second phase was also used to provide a quantitative characterization of the interrelationships among the key terms belonging to the same cluster. Figure 2 shows the interrelationships found. If two terms are used in the same document, there is a line between them, and the thickness of the line indicates how frequently they co-occur. In other words, thick lines refer to combinations of terms that appear in multiple papers, whereas thin ones refer to combinations that rarely appear in the studied publications. It can be observed in the connectivity diagram shown in Figure 2 that the thickest lines are between the above four key terms, a fact that underlines their significance and relatedness.

Figure 2: Connectivity graph of the chosen family of keywords

The above pieces of information obtained from the quantitative part of the literature study were used to develop a reasoning model for the qualitative part of the study. This part focused on the interpretation of the findings and the disclosed semantic relationships. The reasoning model, which was also used in structuring the contents of this paper, is shown in Figure 3. Only the first- and second-level key terms are indicated, while, as mentioned above, the study was actually done with key terms of the third decomposition level.

Figure 3: Reasoning model of the literature research

The considered papers were published in different periods of time, ranging from the mid-fifties until today. An important observation was that the concepts identified by the first-level key terms are in an implicative relationship with each other: if the nature of data changes, this entails a change in the approaches to data transformation, which in turn implies the need for different data transformation methods and tools. These enablers can provide support for a broader range of existing applications and can facilitate new data analytics applications with regard to enabling product enhancement and innovation by design. The investigation concerning the changes in the nature of data focused mainly on product-related use, maintenance and service data, and on data describing the conditions and behavior of products.

1.3. Organization of the paper
In the
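The co-occurrence counting that underlies a connectivity graph such as Figure 2 is simple to illustrate. The sketch below is a minimal example, not the authors' VOSviewer workflow, and the per-document keyword sets are invented; it only shows how the pairwise counts that such maps render as line thickness can be derived:

```python
# Minimal sketch: count keyword co-occurrence across documents.
# The keyword sets below are hypothetical examples.
from itertools import combinations
from collections import Counter

documents = [
    {"data analytics", "middle-of-life data", "product design"},
    {"data analytics", "tools", "big data"},
    {"middle-of-life data", "product design", "tools"},
]

cooccurrence = Counter()
for keywords in documents:
    # every unordered pair of keywords appearing in the same document
    for pair in combinations(sorted(keywords), 2):
        cooccurrence[pair] += 1

# pairs with the highest counts correspond to the thickest lines in the map
for pair, count in cooccurrence.most_common(5):
    print(pair, count)
```

Tools like VOSviewer additionally normalize such raw counts and compute a two-dimensional layout before rendering the map.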
Recommended publications
  • Towards a Fully Automated Extraction and Interpretation of Tabular Data Using Machine Learning
    UPTEC F 19050, Examensarbete 30 hp, August 2019. Towards a fully automated extraction and interpretation of tabular data using machine learning. Per Hedbrant. Master Thesis in Engineering Physics, Department of Engineering Sciences, Uppsala University, Sweden.

    Abstract. Motivation: A challenge for researchers at CBCS is the ability to efficiently manage the different data formats that are frequently changed. A significant amount of time is spent on manual pre-processing, converting from one format to another. There are currently no solutions that use pattern recognition to locate and automatically recognise data structures in a spreadsheet.

    Problem definition: The desired solution is a self-learning Software-as-a-Service (SaaS) for automated recognition and loading of data stored in arbitrary formats. The aim of this study is three-fold: A) investigate whether unsupervised machine learning methods can be used to label different types of cells in spreadsheets; B) investigate whether a hypothesis-generating algorithm can be used to label different types of cells in spreadsheets; C) advise on choices of architecture and technologies for the SaaS solution.

    Method: A pre-processing framework is built that can read and pre-process any type of spreadsheet into a feature matrix. Different datasets are read and clustered. An investigation into the usefulness of reducing the dimensionality is also done.
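    As a rough sketch of the cell-labeling idea described in the abstract (the file name, the feature choices, and the cluster count are illustrative assumptions, not the thesis's actual design):

    ```python
    # Sketch: turn each spreadsheet cell into a small feature vector and
    # cluster the cells without labels. "example.xlsx" is a hypothetical file.
    import numpy as np
    from openpyxl import load_workbook
    from sklearn.cluster import KMeans

    wb = load_workbook("example.xlsx")
    ws = wb.active

    cells, features = [], []
    for row in ws.iter_rows():
        for cell in row:
            if cell.value is None:
                continue
            text = str(cell.value)
            features.append([
                float(isinstance(cell.value, (int, float))),  # numeric cell?
                float(text.isupper()),                        # header-like text?
                len(text),                                    # text length
            ])
            cells.append(cell.coordinate)

    X = np.asarray(features)
    # three clusters as a stand-in for e.g. header / data / other cells
    labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)
    for coord, lab in zip(cells[:10], labels[:10]):
        print(coord, lab)
    ```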
  • Advanced Data Mining with Weka (Class 5)
    Advanced Data Mining with Weka, Class 5, Lesson 1: Invoking Python from Weka. Peter Reutemann, Department of Computer Science, University of Waikato, New Zealand. weka.waikato.ac.nz

    Course outline: Class 1, Time series forecasting; Class 2, Data stream mining in Weka and MOA; Class 3, Interfacing to R and other data mining packages; Class 4, Distributed processing with Apache Spark; Class 5, Scripting Weka in Python. Lessons of Class 5: 5.1 Invoking Python from Weka; 5.2 Building models; 5.3 Visualization; 5.4 Invoking Weka from Python; 5.5 A challenge, and some Groovy; 5.6 Course summary.

    Lesson 5.1: Invoking Python from Weka. Pros of scripting: a script captures preprocessing, modeling, evaluation, etc.; you write the script once and run it multiple times; it is easy to create variants to test theories; and no compilation is involved, unlike with Java. Cons: programming is involved; you need to familiarize yourself with the APIs of libraries; and writing code is slower than clicking in the GUI.

    Scripting languages. Jython (https://docs.python.org/2/tutorial/): a pure-Java implementation of Python 2.7; runs in the JVM; has access to all Java libraries on the CLASSPATH; only pure-Python libraries can be used. Python: invoking Weka from Python 2.7; access to the full Python library ecosystem. Groovy, briefly (http://www.groovy-lang.org/documentation.html): Java-like syntax; runs in the JVM; access to all Java libraries on the CLASSPATH.

    Java vs Python. Java example (output: "1: Hello WekaMOOC!"): public class Blah { public static void main(String[]
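    The Java listing above is truncated in this extract. As a hedged sketch of the lesson's Jython route (the dataset path and the choice of the J48 learner are illustrative assumptions, not the slides' example), a few lines suffice to drive Weka's Java API in Python syntax:

    ```python
    # Jython sketch: Weka's Java classes are imported directly from the CLASSPATH.
    from weka.core.converters import ConverterUtils
    from weka.classifiers.trees import J48

    source = ConverterUtils.DataSource("data/iris.arff")  # assumed dataset path
    data = source.getDataSet()
    data.setClassIndex(data.numAttributes() - 1)  # last attribute is the class

    cls = J48()
    cls.buildClassifier(data)
    print(cls)  # textual form of the learned decision tree
    ```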
  • Introduction to Weka
    Introduction to Weka

    Overview: What is Weka? Where to find Weka? Command line vs GUI. Datasets in Weka. ARFF files. Classifiers in Weka. Filters.

    What is Weka? Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes.

    Where to find Weka: the Weka website (latest version 3.6) at http://www.cs.waikato.ac.nz/ml/weka/ and the Weka manual at http://transact.dl.sourceforge.net/sourceforge/weka/WekaManual-3.6.0.pdf

    CLI vs GUI: the command line is recommended for in-depth usage and offers some functionality not available via the GUI; the GUI comprises the Explorer, the Experimenter and the Knowledge Flow.

    Datasets in Weka: each entry in a dataset is an instance of the Java class weka.core.Instance, and each instance consists of a number of attributes. Attributes can be nominal (one of a predefined list of values, e.g. red, green, blue), numeric (a real or integer number), string (enclosed in "double quotes"), date, or relational.

    ARFF files: the external representation of an Instances class. An ARFF file consists of a header, which describes the attribute types (including the target/class variable), and a data section, which is a comma-separated list of data values. Example ARFF files (credit-g, heart-c, hepatitis, vowel, zoo) are available at http://www.cs.auckland.ac.nz/~pat/weka/. Basic statistics and validation can be obtained by running: java weka.core.Instances data/soybean.arff

    Classifiers in Weka: learning algorithms in Weka are derived from the abstract class weka.classifiers.Classifier. A simple classifier is ZeroR, which just predicts the most common class (or the median, in the case of numeric values); it tests how well the class can be predicted without considering other attributes and can be used as a lower bound on performance.
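    A minimal, hypothetical ARFF file illustrating the layout described above (the relation and attributes are invented for this example):

    ```
    % A minimal, hypothetical ARFF file.
    % Header: dataset name and attribute declarations; the last attribute
    % is conventionally the target / class variable.
    @relation weather

    @attribute outlook {sunny, overcast, rainy}
    @attribute temperature numeric
    @attribute play {yes, no}

    % Data section: one comma-separated instance per line.
    @data
    sunny, 85, no
    overcast, 83, yes
    rainy, 65, yes
    ```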
  • Mathematica Document
    Mathematica Project: Exploratory Data Analysis on 'Data Scientists'. A big-picture view of the state of data scientists and machine learning engineers.

    In this Mathematica project, we will explore the capabilities of Mathematica to better understand the state of data science enthusiasts. The dataset, consisting of more than 10,000 rows, is obtained from Kaggle and is the result of the 'Kaggle Survey 2017'. We will explore various capabilities of Mathematica in data analysis and data visualization. Further, we will utilize machine learning techniques to train models and classify features with several algorithms, such as Nearest Neighbors and Random Forest. Dataset: https://www.kaggle.com/kaggle/kaggle
  • WEKA: the Waikato Environment for Knowledge Analysis
    WEKA: The Waikato Environment for Knowledge Analysis. Stephen R. Garner, Department of Computer Science, University of Waikato, Hamilton.

    ABSTRACT: WEKA is a workbench designed to aid in the application of machine learning technology to real-world data sets, in particular data sets from New Zealand's agricultural sector. In order to do this, a range of machine learning techniques are presented to the user in such a way as to hide the idiosyncrasies of input and output formats, as well as to allow an exploratory approach in applying the technology. The system presented is component based and also has applications in machine learning research and education.

    1. Introduction. The WEKA machine learning workbench has grown out of the need to apply machine learning to real-world data sets in a way that promotes a "what if? ..." or exploratory approach. Each machine learning algorithm implementation requires the data to be present in its own format, and has its own way of specifying parameters and output. The WEKA system was designed to bring a range of machine learning techniques or schemes under a common interface so that they can be easily applied to data in a consistent manner. This interface should be flexible enough to encourage the addition of new schemes, and simple enough that users need only concern themselves with selecting the features in the data for analysis and with what the output means, rather than with how to use a machine learning scheme. The WEKA system has been used over the past year to work with a variety of agricultural data sets in order to try to answer a range of questions.
  • Artificial Intelligence for the Prediction of Exhaust Back Pressure Effect on the Performance of Diesel Engines
    Applied Sciences, Article: Artificial Intelligence for the Prediction of Exhaust Back Pressure Effect on the Performance of Diesel Engines. Vlad Fernoaga (1), Venetia Sandu (2,*) and Titus Balan (1). (1) Faculty of Electrical Engineering and Computer Science, Transilvania University, Brasov 500036, Romania; [email protected] (V.F.); [email protected] (T.B.). (2) Faculty of Mechanical Engineering, Transilvania University, Brasov 500036, Romania. * Correspondence: [email protected]; Tel.: +40-268-474761. Received: 11 September 2020; Accepted: 15 October 2020; Published: 21 October 2020.

    Featured application: This paper presents solutions for smart devices with embedded artificial intelligence dedicated to vehicular applications. Virtual sensors based on neural networks or regressors provide real-time predictions of diesel engine power loss caused by increased exhaust back pressure.

    Abstract: The trade-off between engine emissions and performance requires detailed investigations into exhaust system configurations. Correlations among engine data acquired by sensors are amenable to artificial intelligence (AI)-driven performance assessment. The influence of exhaust back pressure (EBP) on engine performance, mainly on effective power, was investigated on a turbocharged diesel engine tested on an instrumented dynamometric test bench. The EBP was externally applied at steady-state operation modes defined by speed and load. A complete dataset was collected to supply the statistical analysis and machine learning phases, namely the training and testing of all the AI solutions developed to predict the effective power. By extending the cloud/edge computing model with the cloud-AI/edge-AI paradigm, comprehensive research was conducted on the algorithms and software frameworks best suited to vehicular smart devices.
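    As a hedged sketch of the virtual-sensor idea (synthetic data and an arbitrary network size, not the paper's actual models or measurements), one can regress effective power on speed, load and exhaust back pressure:

    ```python
    # Sketch of a "virtual sensor": predict effective power from operating
    # variables including exhaust back pressure (EBP). All data is synthetic.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    n = 500
    speed = rng.uniform(1000, 4000, n)   # engine speed, rpm
    load = rng.uniform(10, 100, n)       # load, percent
    ebp = rng.uniform(5, 60, n)          # exhaust back pressure, kPa
    # invented ground truth: power rises with speed and load, falls with EBP
    power = 0.02 * speed + 0.5 * load - 0.3 * ebp + rng.normal(0.0, 2.0, n)

    X = np.column_stack([speed, load, ebp])
    X_tr, X_te, y_tr, y_te = train_test_split(X, power, random_state=0)

    # scale the inputs, then fit a small neural-network regressor
    model = make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(32, 32),
                                       max_iter=2000, random_state=0))
    model.fit(X_tr, y_tr)
    print("held-out R^2:", model.score(X_te, y_te))
    ```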
  • Statistical Software
    Statistical Software. A. Grant Schissler, Hung Nguyen, Tin Nguyen, Juli Petereit, Vincent Gardeux.

    Keywords: statistical software, open source, Big Data, visualization, parallel computing, machine learning, Bayesian modeling.

    Abstract: This article discusses selected statistical software, aiming to help readers find the right tool for their needs (not to provide an exhaustive list); the authors acknowledge that their experience biases the discussion towards software employed in scholarly work. Software is categorized into three classes: (1) Statistical Packages, (2) Analysis Packages with Statistical Libraries, and (3) General Programming Languages with Statistical Libraries. Statistical and analysis packages often provide interactive, easy-to-use workflows, while general programming languages are built for speed and optimization. The article emphasizes each software's defining characteristics and trends in popularity, and throughout stresses the software's capacity to analyze large, complex data sets ("Big Data"). The concluding sections muse on the future of statistical software, including the impact of Big Data and the advantages of open-source languages.
  • MLJ: a Julia Package for Composable Machine Learning
    MLJ: A Julia package for composable machine learning. Anthony D. Blaom (1,2,3), Franz Kiraly (3,4), Thibaut Lienart (3), Yiannis Simillides (7), Diego Arenas (6), and Sebastian J. Vollmer (3,5). (1) University of Auckland, New Zealand; (2) New Zealand eScience Infrastructure, New Zealand; (3) Alan Turing Institute, London, United Kingdom; (4) University College London, United Kingdom; (5) University of Warwick, United Kingdom; (6) University of St Andrews, St Andrews, United Kingdom; (7) Imperial College London, United Kingdom. DOI: 10.21105/joss.02704. Submitted: 21 July 2020; published: 07 November 2020.

    Introduction. Statistical modeling, and the building of complex modeling pipelines, is a cornerstone of modern data science. Most experienced data scientists rely on high-level open-source modeling toolboxes, such as scikit-learn (Buitinck et al., 2013; Pedregosa et al., 2011) (Python), Weka (Holmes et al., 1994) (Java), and mlr (Bischl et al., 2016) and caret (Kuhn, 2008) (R), for quick blueprinting, testing, and creation of deployment-ready models. They do this by providing a common interface to atomic components from an ever-growing model zoo, and by providing the means to incorporate these into complex workflows. Practitioners want to build increasingly sophisticated composite models, as exemplified in the strategies of top contestants in machine learning competitions such as Kaggle. MLJ (Machine Learning in Julia) (A. Blaom, 2020b) is a toolbox written in Julia that provides a common interface and meta-algorithms for selecting, tuning, evaluating, composing and comparing machine learning model implementations written in Julia and other languages.
  • A Formal Methodology for the Comparison of Results from Different Software Packages
    A formal methodology for the comparison of results from different software packages. Alexander Zlotnik, Department of Electronic Engineering, Technical University of Madrid, and Ramón y Cajal University Hospital.

    Structure of the presentation: Why? Is the output really different? Is the algorithm documented? Are the same numerical methods used? Closed source vs open source. An example. Summary. Multi-core performance.

    Why? Multidisciplinary and multi-location research is becoming the norm for high-impact scientific publications. Sometimes one part of your team uses one software package (R, Python, MATLAB, Mathematica, etc.) and the other part uses a different one (Stata, SAS, SPSS, etc.). After a lot of frustrating testing, you realize that you are getting different results with the same operations and the same datasets. In other situations, you simply want to develop your own functionality for an existing software package, or a fully independent software package with some statistical functionality. Since you know Stata well, you decide to compare the output of your program with the one from Stata, and guess what ...? This seems doable, but there are likely to be many hidden surprises when comparing results for both implementations.

    Is the output really different? Are you using the same dataset? Sometimes someone changes the dataset but doesn't rename it. Is the data converted in the same way? Are there any issues with numerical precision?
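    A minimal sketch of the numerical-precision point (the two arrays below are stand-ins for the outputs of two packages): compare results with an explicit tolerance rather than exact equality:

    ```python
    # Sketch: tolerance-based comparison of results from two implementations.
    import numpy as np

    result_package_a = np.array([1.00000000, 2.49999999, 3.14159265])
    result_package_b = np.array([1.00000001, 2.50000000, 3.14159266])

    # exact comparison fails on floating-point noise
    print(np.array_equal(result_package_a, result_package_b))  # False

    # tolerance-based comparison reflects numerical-precision reality
    print(np.allclose(result_package_a, result_package_b,
                      rtol=1e-7, atol=1e-9))                   # True
    ```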
  • Statistický Software (Statistical Software)
    Statistical software

    Software: AcaStat, ADaMSoft, Analyse-it, ASReml, Auguri, BioStat, BrightStat, Dataplot, EasyReg, EpiInfo, EViews, Excel, GAUSS, GenStat, GoldenHelix, gretl, JMP, MacAnova, Mathematica, Matlab, MedCalc, Minitab, MRDCL, NCSS, OpenEpi, Origin, the Ox programming language, OxMetrics, Partek, Primer, PSPP, R, R Commander, RATS, RKWard, SalStat, SAS, SOCR, SPlus, SPSS, Stata, Statgraphics, STATISTICA, StatIt, StatPlus, StatsDirect, Statistix, SYSTAT, The Unscrambler, UNISTAT, VisualStat, Winpepi, WinSPC, XLStat, XploRe.

    Data mining software: roughly 20 to 30 vendors. The main players on the market are Clementine (IBM SPSS Modeler, formerly PASW Modeler), IBM's Intelligent Miner, SGI's MineSet, and SAS's Enterprise Miner. There is also a range of embedded products, e.g. for fraud detection, electronic commerce applications, health care, and customer relationship management.

    SAS (www.sas.com): the SAS Institute was founded in 1976 in a university environment. Today it is the largest privately held software company in the world (more than 11,000 employees), with over 45,000 installations and roughly 9 million users in 118 countries; in the USA it has around 1,000 academic customers (SAS is used by most colleges, universities and research institutions).

    SAS statistical analysis: descriptive statistics; analysis of contingency (frequency) tables; regression, correlation and covariance analysis; logistic regression; analysis of variance; hypothesis testing; discriminant analysis; cluster analysis; survival analysis; and more.

    SAS time series analysis: regression models; models with seasonal factors; autoregressive models; and more.
  • The R Project for Statistical Computing: A Free Software Environment for Statistical Computing and Graphics
    The R Project for Statistical Computing: a free software environment for statistical computing and graphics that runs on a wide variety of UNIX platforms, Windows and MacOS.

    OpenStat: a general-purpose statistics package that you can download and install for free. It was originally written as an aid in the teaching of statistics to students enrolled in a social science program, and has been expanded to provide procedures useful in a wide variety of disciplines. It has an interface similar to SPSS.

    SOFA: a basic, user-friendly, open-source statistics, analysis, and reporting package.

    PSPP: a program for statistical analysis of sampled data. It is a free replacement for the proprietary program SPSS, and appears very similar to it, with a few exceptions.

    TANAGRA: a free, open-source, easy-to-use data-mining package.

    PAST: a package created with the palaeontologist in mind but adopted by users in other disciplines. It is easy to use and includes a large selection of common statistical, plotting and modelling functions.

    AnSWR: a software system for coordinating and conducting large-scale, team-based analysis projects that integrate qualitative and quantitative techniques.

    MIX: an Excel-based tool for meta-analysis.

    Free Statistical Software (StatPages.org): a page linking to free software packages that you can download and install on your computer.

    Free Statistical Software (freestatistics.info): a page linking to free software packages that you can download and install on your computer.

    Free Software: information and links from the Resources for Methods in Evaluation and Social Research site.
  • Cumulation of Poverty Measures: the Theory Beyond It, Possible Applications and Software Developed
    Cumulation of poverty measures: the theory beyond it, possible applications and software developed (Francesca Gagliardi and Giulio Tarditi). Siena, October 6th, 2010.

    1. Context and scope. Reliable indicators of poverty and social exclusion are an essential monitoring tool. In the EU-wide context, these indicators are most useful when they are comparable across countries and over time. Furthermore, policy research and application require statistics disaggregated to increasingly lower levels and smaller subpopulations. Direct, one-time estimates from surveys designed primarily to meet national needs tend to be insufficiently precise for meeting these new policy needs. This is particularly true in the domain of poverty and social exclusion, the monitoring of which requires complex distributional statistics, necessarily based on intensive and relatively small-scale surveys of households and persons. This work addresses some statistical aspects relating to improving the sampling precision of such indicators in EU countries, in particular through the cumulation of data over rounds of regularly repeated national surveys.

    2. EU-SILC. The reference data for this purpose are the EU Statistics on Income and Living Conditions, the major source of comparative statistics on income and living conditions in Europe. A standard integrated design has been adopted by nearly all EU countries. It involves a rotational panel, with a new sample of households and persons introduced each year to replace one-fourth of the existing sample. Persons enumerated in each new sample are followed up in the survey for four years. The design yields each year a cross-sectional sample, as well as longitudinal samples of 2, 3 and 4 year duration.
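    As a hedged illustration of why cumulation improves precision (the numbers are invented, and this is not the actual EU-SILC estimation procedure, which must also account for rotational-panel overlap between rounds), an inverse-variance weighted pooling of per-round estimates reduces the variance of the combined estimate:

    ```python
    # Sketch: pool per-round poverty-rate estimates by inverse-variance
    # weighting. Validity of the pooled variance assumes independent rounds.
    rates = [0.172, 0.168, 0.175]         # estimated poverty rate per round
    variances = [1.6e-5, 1.5e-5, 1.7e-5]  # estimated sampling variance per round

    weights = [1.0 / v for v in variances]
    pooled_rate = sum(w * r for w, r in zip(weights, rates)) / sum(weights)
    pooled_variance = 1.0 / sum(weights)  # smaller than any single round's variance

    print(f"pooled rate: {pooled_rate:.4f}")
    print(f"pooled variance: {pooled_variance:.2e}")
    ```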