2020 International Conference on Computational Science and Computational Intelligence (CSCI)

Intuitive Time-Series-Analysis-Toolbox for Inexperienced Data Scientists

Felix Pistorius∗, Daniel Baumann∗, Luca Seidel‡ and Eric Sax∗
Institute for Information Processing Technologies (ITIV), Karlsruhe Institute of Technology (KIT)
76131 Karlsruhe, Germany
∗Email: {felix.pistorius, d.baumann, eric.sax}@kit.edu
‡Email: [email protected]

Abstract—There are many different procedures to carry out a data mining project. Depending on the application, various methods have to be chosen, which is normally done by experts. For inexperienced users without machine learning experience, it is difficult to analyze their data without help. To simplify the access to data mining, particularly for multivariate time series analysis, we propose an intuitive toolbox that guides the user step by step through the data mining process. Furthermore, it supports inexperienced users with pre-suggested methods in every data mining process step. To this end, specifications for such a toolbox are defined and a prototype for the realization of such a toolbox is presented. The work steps, which are based on the knowledge discovery process, can be adapted and changed to suit different application scenarios.

Index Terms—data mining application, intuitive software, KDD, multivariate time series

I. INTRODUCTION AND MOTIVATION

The advancing change to Industry 4.0 and the increasing connectivity of technical systems coming with it provide access to a large amount of data. Many sectors, such as insurance or the automotive industry, hope to gain advantages by saving costs and making business-critical decisions faster and more efficiently. Moreover, new markets can be opened up [1]. Finding patterns, trends and associations in the collected data plays a central role in achieving such goals. Useful information only emerges when the knowledge gets extracted from the collected data. Data mining, the core of knowledge discovery processes, provides access to this knowledge. To study the data, algorithms are derived and models are developed to uncover previously unknown patterns. The model is used to understand phenomena from the data resulting from the analysis. Predictions can also be derived from it [2].

Due to the increasing development in the research field of "Big Data", the number of different methods and algorithms rises. In the area of classification analysis, for example, more than 150 different algorithms are available [3]. Every method or algorithm has its advantages and disadvantages depending on the purpose of use. Furthermore, steps like preprocessing, transformation, evaluation and visualization are needed to perform a proper analysis. Therefore, users have to choose methods and algorithms separately for every application. Since the introduction of the Knowledge Discovery in Databases process of Fayyad [4], many different Knowledge Discovery and Data Mining process models (KDDM) have been developed based on it [5]. However, these models offer only a roughly granular structure of how a process can be implemented and do not specify an explicit sequence of methods [6]. Therefore, standard process models for data mining and knowledge discovery require experience and educated guesses (e.g. CRISP-DM [7]).

The realization of such process models is suitable for data science experts. These experts are highly sought after; therefore, not every company has the possibility to hire them for data mining projects [8]. Nevertheless, companies urge the use of data mining in their processes. Thus, users who work for the first time on data mining projects are often inexperienced. They are neither familiar with data mining and KDDM processes, nor have they carried out whole data mining projects. Without in-depth experience, it is difficult to know which methods are the most appropriate for each KDDM process step. A survey of the "Fraunhofer Institute for Manufacturing Engineering and Automation" (Fraunhofer IPA) has shown that more than 60% of the interviewed participants (employees from technical professions) desire simplified software solutions to perform data mining without specific knowledge [9]. There is a lack of an intuitive and easily understandable KDDM tool, particularly for multivariate time series as they occur in the monitoring of machines using a large number of sensors. This is because common data mining applications do not consider and compare time series as a whole, but interpret each data point independently. Such a tool should guide the user step by step through all tasks of a data mining process, such as preprocessing, transformation, data mining and visualization. In doing so, users should be able to apply their expertise in a targeted way, because such users only have a limited understanding of data mining, but precise insight into and a deep understanding of the problem. We focus on a software solution that guides the user through a defined data mining workflow. Based on user input, appropriate algorithms and methods for the whole KDDM process are applied.

In the first part of section II the already established software solutions are introduced. Then, in the second part, the specifications for an intuitive Time-Series-Analysis-Toolbox are defined. Based on these specifications, the developed concept for such a toolbox is presented in sections III and IV. In section V the toolbox itself and its achieved results are discussed.

II. RELATED TOOLBOXES AND THEIR LIMITATIONS FOR INEXPERIENCED KDDM USERS

A. The most used related KDDM toolboxes

Gartner's "Magic Quadrant for Data Science and Machine Learning Platforms" is a visualization of the annual results of market research conducted by the company Gartner Inc. and is divided into four "quadrants" [10]. The further right a vendor of a KDDM framework is positioned, the more complete is its vision to support the user in all aspects of KDDM. The higher up a framework is positioned, the more suitable it is for the intended analysis during execution. Gartner includes several subcategories for this overarching criterion, such as market understanding, bundling of expertise, or global marketing. The leading companies include SAS, Alteryx, and MathWorks [10].

SAS and Alteryx offer software solutions for the analysis of business data from financial services and retail. Technical time series such as sensor data cannot be interpreted as such. With the help of a code-free environment, data mining processes can be created via a flow chart. The programs are designed to serve the demands of analysts and machine learning experts and only offer limited help for inexperienced users [11], [12].

With Matlab, MathWorks offers a widely used platform for analytical calculations [13]. Matlab can be extended through toolboxes. Concerning data mining, the Neural Network Toolbox, the Optimization Toolbox, the Statistics Toolbox, and the Curve Fitting Toolbox are the most popular. With these extensions, Matlab can be used to carry out data mining processes. To perform data mining tasks, machine learning experience and programming knowledge are required.

A software solution that makes data mining accessible to domain experts without machine learning experience is required. Furthermore, such a solution should offer comprehensive analysis methods for technical time series with physical reference.

B. Focus and needs of a tool for KDDM-inexperienced users

As shown in section II-A, there are potential software solutions, especially for business management applications. However, toolboxes for users in the development and research area without machine learning experience have special requirements.

1) Focus: The existing software solutions are designed for use by analysts, in other words, a group of users who are already familiar with automated data analysis. These users have a high level of machine learning experience and are familiar with data mining and knowledge discovery process models. Significantly fewer applications allow technical experts to evaluate data. Domain experts are, for example, developers who are familiar with their domains and generate specific data such as sensor time series. Especially in data mining of time series from technical measurements, there is a lack of easily understandable software tools. Therefore, an intuitive multivariate time series toolbox should also enable domain experts to extract new, useful, and accessible knowledge from data [4], especially time series.

2) Workflow: An optimal workflow is essential for the success of a knowledge discovery process. Knowledge discovery process models only provide a framework for which tasks should be performed in which order, but not the methods themselves, which are used for the analysis. An optimal knowledge discovery process is realized when the right methods and parameters have been chosen considering the underlying data and goals of the project. The supporting application should suggest suitable methods and parameters to the user in the context of the existing data and user input. First-time users can be supported even more effectively by an automatic selection of parameters and methods without user input. An optimal workflow is thus composed of a manual or automated run, in which parameters and methods are suggested based on the data.

3) Visual representation: Found patterns and associations in data are only useful if they can be understood [4]. Therefore, a proper visual presentation of results is important. Especially with multivariate data mining, the representation of patterns is often difficult. Accordingly, patterns and results have to be visualized comprehensibly using appropriate methods. When users can understand and track the results, it will be easier for them to become familiar with machine learning. A modern and intuitive design of the user interface is also essential for a successful workflow, as it contributes to an intuitive and therefore more effective operation.
4) Scope: A large number of methods is required to view data from many different perspectives. As already mentioned, there is a large number of different methods and algorithms. An intuitive toolbox should include a sufficient spectrum of analysis tools, but not overload the user with an oversupply; an excessive number inhibits an intuitive workflow. A compromise between complexity and applicability has to be found in order to support the user in the best possible way.

5) Structure: It can be assumed that the data mining industry will continue to grow enormously in the future [10]. Such dynamics constantly bring new processes and methods with them. To take full advantage of such a development, a toolbox must be modular. A modular structure allows methods to be replaced or novel approaches to be integrated into the toolbox. A cross-platform toolbox or a cloud application enables it to operate independently of the local systems. To ensure a future-proof toolbox, a modular and platform-independent structure is needed.

III. CONCEPT FOR AN INTUITIVE TOOLBOX

A data mining project can be designed in many ways [6]. The challenge for a toolbox is to find a suitable sequence of operations in the context of the problem at hand and to provide

access to data mining for users without machine learning experience.

Since the introduction of the methodology of Knowledge Discovery in Databases (KDD) [4], further process models have been developed. Among the most popular are the CRISP-DM and SEMMA processes [15]. However, all processes require data mining expertise: before individual methods are selected in each KDDM step, the data set is first roughly analyzed and methods are selected through an educated guess. To overcome this challenge, we use a fixed sequence of basic steps, each of which suggests one method. The resulting process is shown in Table I. As with the KDD process, the steps selection, preprocessing, transformation, data mining and evaluation are performed one after the other. Each process step is divided into further sub-steps, which are also performed step by step. In the following, the processing steps are explained and the methods used are briefly described.

TABLE I
PROCESS STEPS FOR MINING TIME SERIES

Selection:      problems and targets, domain knowledge, analysis method, data sets, structure and separation
Preprocessing:  unify, quality enhancement, time framing, filter the data
Transformation: standardization, feature reduction, feature generation, feature selection
Data Mining:    classification or clustering or regression or anomaly detection
Interpretation: visualization, explanation, evaluation

A. Selection

The data basis is created in the selection. For this purpose, the problem definition and problem environment must be determined to select the correct data.

In the run-up to the actual data mining project, the problems and targets must be defined and, if necessary, divided into sub-problems. Also, it should be determined which indicators can be used to consider the data mining project as successful and completed. Engineers are very familiar with their domain, and this domain knowledge can be used. Therefore, knowledge from the environment of the problem should be gathered in order to be incorporated into the later process. Depending on the problem and the goals, the appropriate analysis method (clustering for unlabeled data, classification for discretely labeled data, regression for continuously labeled data or anomaly detection for filtering the data set) is chosen. Accordingly, appropriate data sets, in which usable information on the problem is presumably contained, will be provided afterwards.

In the last step of the selection, the structure and format of the data set are checked. If the data format cannot be handled, a conversion must be performed. If the time series are not available as separate signals, a separation is performed additionally.

B. Preprocessing

In preprocessing, the quality of the entire data set is considered and, if possible, improved. When using several data sets, it must be ensured that the same features and a uniform unit are available between the data sets. Otherwise, the data sets must be modified to unify them.

Outliers are often caused by faulty measurement methods and can lead to incorrect results. To enhance the quality of a data set, these can be removed. Furthermore, signals without variance can be neglected for further considerations.

For data mining with time series, time framing is a necessary step. This is illustrated in Fig. 1. A time series consists of a continuous measurement, like the motor current curve shown there. During the measurement, events occur, in this example five times, which are compared in the analysis. Therefore, the data set has to be divided into instances. The instances can contain any number of data points, depending on how long the event lasts. A minimal code sketch of this step follows below.

Fig. 1. Five measuring instances of a motor current measurement (motor current in A plotted over data points).

Depending on the question, experts can narrow down the data. Problem knowledge enables engineers to determine that the solution they are looking for occurs with a certain behavior of the measuring systems. This knowledge can be used to filter the data set and to limit its size.
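To make the time-framing sub-step concrete, the following minimal sketch in R (the language the prototype is later implemented in, see section IV) splits a continuous measurement into event instances. It is not the toolbox's actual code; the simple threshold criterion and all names are illustrative assumptions.

frame_instances <- function(signal, threshold) {
  active <- signal > threshold          # TRUE while an event is running
  runs   <- rle(active)                 # consecutive runs of TRUE/FALSE
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  idx    <- which(runs$values)          # keep only the active runs
  lapply(idx, function(i) signal[starts[i]:ends[i]])
}

# usage: with 'motor_current' as the raw measurement vector,
# frame_instances(motor_current, threshold = 0.5) returns one list
# element per event, e.g. five opening processes as in Fig. 1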
C. Transformation

While in preprocessing the data set is considered as a whole, during the transformation the characteristics of the features are examined more closely. To be able to compare time series in a meaningful way, a standardization must first be performed. For this purpose, they can, e.g., be mapped to a fixed range or z-normalized.

A crucial task of transformation in data mining is the reduction of redundant features. With the help of domain knowledge, irrelevant features can be identified and removed, even without analysis methods. Sensor values that are not causally related to the desired application can lead to invalid results. To identify the features responsible for the differences, clustering can be performed based on the correlation of the features. If the clusters of the supposedly identical instances are then compared with each other, the features responsible for the differences in the signal characteristics will be identified. To compare the cluster results of the instances, Goodman and Kruskal's gamma coefficient is a suitable method [14].

After the feature reduction has mainly removed features that contribute no or incorrect information, the features with the highest information content are selected in the feature selection. On the one hand, the information content, measured by the variance, can be determined based on a principal component analysis. On the other hand, a force-directed graph can be generated based on the correlation, which divides the features into groups with similar properties. The most promising features can then be selected from these groups.

In the context of time series, the feature generation plays a decisive role. For time series to be processed by common data mining algorithms, they must be converted into single-element data. Various approaches are conceivable for this task. The mean value of an instance can serve as a suitable data point [2]. A more advanced method is to use Dynamic Time Warping distances related to a suitable reference signal as a single-element value [18]. Reference signals can either be specified by experienced users or synthesized using the Dynamic Time Warping Barycenter Averaging algorithm [19].
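The standardization and feature generation described above can likewise be sketched in R. The snippet z-normalizes each instance and reduces it to a single feature, its Dynamic Time Warping distance to a reference signal. Using the CRAN package dtw is an assumption for illustration; the paper does not name its DTW implementation.

library(dtw)

# z-normalize an instance so instances with different scale and offset
# become comparable
z_normalize <- function(x) (x - mean(x)) / sd(x)

# feature generation: map each variable-length instance to its DTW
# distance to a reference signal (expert-provided or synthesized,
# e.g. via DTW Barycenter Averaging)
dtw_features <- function(instances, reference) {
  ref <- z_normalize(reference)
  sapply(instances, function(inst) dtw(z_normalize(inst), ref)$distance)
}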

D. Data Mining

There are four major data mining application classes: classification, clustering, regression and anomaly detection. Within these classes there is a large number of different data mining algorithms with different advantages and disadvantages. However, based on user input, suitable and unsuitable algorithms can be determined. For this purpose, a ranking concept based on weighted efficiency criteria, such as training time or memory consumption, as described in [20], is used to propose a selection of suitable algorithms. This ranking concept can be seen as an evaluation matrix in which the efficiency criteria are contained as columns, while each row represents one algorithm. Each algorithm is evaluated in relation to the efficiency criteria considered. To determine a ranking for a specific use case, the user weights the criteria from 0 (irrelevant) to 2 (very relevant). The summation of the weighted criteria results in an overall evaluation; the algorithm with the lowest total sum is considered the most promising and is given first place in the ranking. All other algorithms follow according to their overall rating. This weighting means that the individual goals associated with a particular application case are included in the algorithm selection without the user needing extensive knowledge of the advantages and disadvantages of the algorithms. For example, if a problem requires a very understandable model, this can be expressed by the appropriate weighting. The properties of the algorithms are inherently included in the weighting matrix and do not have to be considered individually. This concept enables an objective, reproducible selection procedure in an easily understandable way while maintaining the differentiated suitability considerations that the knowledge and experience of an expert can provide.
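The ranking concept can be illustrated in a few lines of R. Only the weighting scheme (0 = irrelevant to 2 = very relevant, lowest weighted sum wins) follows the description above and [20]; the criterion scores in the matrix are invented placeholders.

# evaluation matrix: rows are algorithms, columns are efficiency criteria
scores <- rbind(
  RF     = c(training_time = 2, memory = 2, interpretability = 1),
  wKNN   = c(training_time = 1, memory = 3, interpretability = 2),
  avNNet = c(training_time = 3, memory = 2, interpretability = 3)
)

# the user weights each criterion from 0 (irrelevant) to 2 (very relevant)
weights <- c(training_time = 1, memory = 0, interpretability = 2)

total   <- drop(scores %*% weights)  # weighted sum per algorithm
ranking <- sort(total)               # lowest sum takes first place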
Another problem are the hyperparameters of the algorithms, which are responsible for the performance of most algorithms [21]. These can be estimated from the properties of the data set. Alternatively, the algorithms and the hyperparameters used can be adjusted manually.

E. Interpretation

The results are visualized in suitable 2D plots. For high-dimensional data, a 2D representation is achieved by selecting suitable features or by a feature reduction technique. A principal component analysis or the t-distributed stochastic neighbor embedding method can be used as a reduction method. To simplify the explanation of the data mining models, Local Interpretable Model-Agnostic Explanations (LIME) can be used to generate a diagram for each instance, which breaks down the features responsible for the decision and their contributions [22]. The user can then evaluate the results, modify the settings if necessary, and redo parts of the process.

IV. CONCEPT FOR EMBEDDING THE PROCESS AS A GUI

The goal is to create a code-free environment in which inexperienced users can perform data mining projects. To enable users to follow the steps of the presented analysis process, it is embedded in a graphical user interface. The user interface is based on a wizard assistant; therefore, it consists of individual interaction dialogs. Each process step is implemented as a dialog page. Thus, users are guided step by step through the entire process. Within each dialog page, the user only has to set a limited number of parameters and choices. This prevents inexperienced users from being overwhelmed. The dialog pages communicate via a standard interface. This makes the realization of various program flows possible: depending on the user input, certain dialog pages can be skipped. Furthermore, such a structure simplifies maintenance.

In addition to simple operation, assistance is also important to implement an intuitive toolbox. With information texts on each dialog page, the essential tasks are explained and additional information can be provided. Furthermore, the required parameters are already pre-set. Users can, therefore, also continue with default parameters. To offer inexperienced users optimal support, the transformation steps can be carried out completely automatically. The toolbox uses pre-set parameters and selects suitable features based on the data. To enable experienced users to apply their knowledge in the best possible way, the transformation can also be carried out only partially automated. This means that individual process steps, such as the feature selection, are carried out manually while the rest is automated. To enable cross-person processing of projects, individual sessions can be exported and loaded.

One requirement for an intuitive toolbox is to be accessible across platforms. Therefore, the toolbox is created with the R "Shiny" extension [23]. This allows the toolbox to be hosted as a webpage on a server and used by any standard Internet-capable device. Thus, there is no binding to a local system or local resources.
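A minimal Shiny skeleton illustrates the wizard concept: one dialog page per KDD step, navigated with back and next buttons. This is a sketch of the idea, not the published toolbox code; all identifiers are illustrative.

library(shiny)

pages <- c("Selection", "Preprocessing", "Transformation",
           "Data Mining", "Interpretation")

ui <- fluidPage(
  titlePanel("Time-Series-Analysis-Toolbox (sketch)"),
  uiOutput("page"),                # renders the current dialog page
  actionButton("back", "back"),
  actionButton("nxt", "next")      # 'next' is a reserved word in R
)

server <- function(input, output, session) {
  step <- reactiveVal(1)
  observeEvent(input$back, step(max(1, step() - 1)))
  observeEvent(input$nxt,  step(min(length(pages), step() + 1)))
  output$page <- renderUI({
    # a real page would expose its pre-set parameters and popovers here
    h3(paste("Step", step(), ":", pages[step()]))
  })
}

shinyApp(ui, server)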

V. RESULTS AND EVALUATION

The toolbox resulting from the concept, implemented as a prototype, is presented below and is published as a free webpage application (see [24]). First, the basic structure of the toolbox is explained based on a dialog page. Afterward, an evaluation is made according to the specific criteria defined in section II-B. Finally, the results achieved with a completely automated process are presented.

A. Presentation of the Toolbox

For the screen recording of the toolbox dialog page "Time Windowing & Filtering" shown in Fig. 2, a real data set from the context of the automotive industry is used, which contains measurements of 515 opening and closing processes of a double-sided bus door with approx. 400 samples each. The remaining dialog pages are based on the structure shown in Fig. 2; only the input and output objects differ. Each dialog page has a superordinate display in which the progress already achieved can be tracked (red frame). Underneath it, there is a navigation bar (yellow frame) that allows navigating to any process step. This ensures the iterative behavior of knowledge discovery processes despite the step-by-step procedure. The shown dialog page is divided into two halves. On the right is the optional data set filtering, which can be activated by the user. The time framing is displayed on the left side. There are various options available for dividing the data set into instances. To inform the user about the respective methods, popovers can be opened, which contain additional information and instructions. In this example, instantiation using a reference signal is selected. Using a suitable reference signal, selectable with the cursor, the data set is divided into instances (graph in the dark blue frame). It shows the motor current curve of an opening process of the bus door. Consequently, each new opening process in the data set is interpreted as a new instance. After pressing the "next" button, the set parameters are processed. To give the user feedback about the progress of the calculations, a progress bar opens.

Fig. 2. Dialog page for time framing and feature filtering (annotated elements: display of the current dialog page, navigation bar, popovers, inputs and outputs for time framing, back/next buttons, progress bar).

B. Evaluation of the toolbox

The evaluation of the toolbox is done according to the criteria defined in section II-B.

1) Workflow: The workflow is based on the steps of the KDD process. Users are navigated step by step through the dialog pages; with the buttons "back" and "next", they are guided through the entire process. Especially inexperienced users can be supported by the automated execution of the dialog pages. The manual execution also offers the possibility to bring in specific knowledge. Information about the methods is provided by the popovers.
2) Visual representation: The R package Shiny enables the creation of a modern user interface, which is intuitively designed through its input elements. To present results and data, a visual representation is used extensively, due to its better comprehensibility. At the same time, graphics offer the possibility to document results. The navigation bar allows the user to switch between KDD steps via a superordinate structure, for example, to make changes in past steps. The deliberately reduced number of different options on each dialog page prevents the user from being overloaded.

3) Scope: A detailed analysis of technical time series with physical reference is possible by using classification, clustering, regression or anomaly detection analysis. A wide range of implementations is covered by thirteen different classification algorithms, like Random Forest (RF), Averaged One-Dependence Estimators (AODE) or Averaging Neural Networks (avNNet). Five different clustering algorithms, like Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and five different anomaly detection algorithms, like autoencoders, are used. By selecting the most promising methods and a partially automated selection of algorithms, the user is not overwhelmed by an oversupply of methods.

4) Structure: The Shiny environment offers the possibility to host the application from a server. Therefore, it can be used independently from the local system. However, in server operation via the free servers provided by RStudio, the publisher of Shiny, there may be limitations when exporting classification models, because the serialization of Java-based models cannot proceed; this restriction can be overcome by using one's own servers or local application versions. The toolbox has a modular structure. All methods are implemented as functions and can be modified as desired. Dialog pages are structured as blocks and communicate via a standard interface, which makes it easy to exchange and modify the program flow.

C. Results of automated classification analysis

To evaluate the automated analysis of the intuitive Time-Series-Analysis-Toolbox, a classification analysis is performed. The achieved accuracy of the classification models of the thirteen different implemented algorithms serves as the evaluation criterion. As a data set, we used the Condition Monitoring of a Hydraulic System data set from the UCI repository [17]. The data set contains 17 sensor time series as well as categories which allow inferring the behavior of the hydraulic cooling circuit [16]. We used 1000 instances and a label with four different categories.
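A comparison run of this kind could be sketched in R with the caret package. This is an assumption made for illustration, since the paper does not state which machine learning backend the toolbox uses, and all names here are hypothetical.

library(caret)

# train several classifiers on the same feature table and report the
# best cross-validated accuracy of each; 'features' is a data.frame of
# generated single-element features, 'label' a factor with the four
# condition categories
compare_models <- function(features, label,
                           methods = c("rf", "knn", "svmRadial")) {
  ctrl <- trainControl(method = "cv", number = 10)
  sapply(methods, function(m) {
    fit <- train(x = features, y = label, method = m, trControl = ctrl)
    max(fit$results$Accuracy)
  })
}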

The classification models are created using the Time-Series-Analysis-Toolbox. The transformation as well as the data mining are performed completely automated. This simulates the use case in which the user has no machine learning experience. The results can therefore be reproduced by any user. Fig. 3 shows the achieved accuracy of the individual algorithms. Some algorithms, such as wKNN, SVM-pol, CART, C4.5, or AODE, achieved results above 80%, or, as in the case of RF, even above 90%. The results are comparable to, or even better than, those shown by Helwig et al. [16]. Considering that any user can reproduce these results, since no domain knowledge or machine learning experience is required, the shown results demonstrate how easily good results can be achieved with the toolbox.

Fig. 3. Results of an automated toolbox execution: classification accuracy in percent (approx. 60-90%) for wKNN, SVM-rad, SVM-pol, SVM-lin, RF, KNN, CNN, CART, C4.5, avNNet and AODE.

VI. CONCLUSION AND FUTURE WORK

To provide inexperienced data mining users access to machine learning, this paper presents a software solution with an intuitive interface for the implementation of multivariate time series data mining projects. First, specifications were defined for a user-friendly software for inexperienced users. Based on these characteristics, a concept is presented which supports users in the application by a stepwise execution of defined process steps and an automated parameter adjustment. With the help of real data sets consisting of technical time series, the automated app run was tested. The test shows how convincing results can be achieved with the toolbox even without machine learning experience. The partially automated execution also allows the user's expertise to be integrated into certain process steps.

To evaluate how inexperienced users deal with the toolbox and the concept, a study should be performed with test persons. One possibility that could be explored is the extent to which the information texts and instructions on the methods and graphics used can convey knowledge to users. Gained experience from this study should be integrated into future toolbox versions. To further improve the results obtained in an automated toolbox run, techniques for parameter estimation of algorithms and automated transformation methods can be further developed and refined.

REFERENCES

[1] Freiknecht, J. and Papp, S. (2018). Big Data in der Praxis. Carl Hanser Verlag, ISBN 978-3-446-45396-8.
[2] Maimon, O. and Rokach, L. (2010). Data Mining and Knowledge Discovery Handbook. Springer, ISBN 978-0-387-09822-7.
[3] Fernandez-Delgado, M. et al. (2014). Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? Journal of Machine Learning Research, 15, 3133-3181.
[4] Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine.
[5] Alnoukari, M. and El Sheikh, A. (2012). Knowledge Discovery Process Models: From Traditional to Agile Modeling. ISBN 978-1-61350-050-7.
[6] Chapman, P. et al. (2000). CRISP-DM 1.0: Step-by-step data mining guide. SPSS Inc.
[7] Dorschel, J. (2015). Praxishandbuch Big Data. Springer Fachmedien Wiesbaden, ISBN 978-3-658-07289-6.
[8] Economic Graph Team. (2018). LinkedIn Workforce Report, United States, August 2018. Retrieved December 1, 2018.
[9] Weskamp, M. and Schatz, A. (2014). Studie Einsatz und Nutzenpotenziale von Data Mining in Produktionsunternehmen. Fraunhofer IPA.
[10] Krensky, P. et al. (2020). Gartner Magic Quadrant for Data Science and Machine Learning Platforms. Published 11 February 2020, ID G00385005. https://www.gartner.com/en/documents/3980855
[11] Alteryx: Industry Solutions. https://www.alteryx.com/solutions/industry. Version: 2020.
[12] Sarma, K. S. (2017). Predictive Modeling with SAS Enterprise Miner: Practical Solutions for Business Applications. SAS Institute.
[13] The MathWorks, Inc. (2020). Matlab. https://www.mathworks.com/products/matlab.html
[14] Morris, R. N. (1970). Multiple Correlation and Ordinally Scaled Data. doi: 10.2307/2574649.
[15] Piatetsky, G. (2014). CRISP-DM, still the top methodology for analytics, data mining, or data science projects. KDnuggets. https://www.kdnuggets.com/2014/10/crisp-dm-top-methodology-analytics-data-mining-data-science-projects.html
[16] Helwig, N. et al. (2015). Condition Monitoring of a Complex Hydraulic System Using Multivariate Statistics. In Proc. 2015 IEEE International Instrumentation and Measurement Technology Conference.
[17] Dua, D. and Graff, C. (2017). UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
[18] Pistorius, F., Grimm, D., Auer, M., and Sax, E. (2019). Time Series Classification of Automotive Test Drives Using an Interval Based Elastic Ensemble. In Proceedings of the International Conference on Time Series and Forecasting (ITISE 2019), Granada, vol. 2, pp. 927-939.
[19] Petitjean, F., Ketterlin, A., and Gançarski, P. (2011). A global averaging method for dynamic time warping, with applications to clustering. doi: 10.1016/j.patcog.2010.09.013.
[20] Pistorius, F., Grimm, D., Erdösi, F., and Sax, E. (2020). Evaluation Matrix for Smart Machine-Learning Algorithm Choice. In Proceedings of the International Conference on Big Data Analytics and Practices (IBDAP).
[21] Stang, M., Meier, C., Rau, V., and Sax, E. (2019). An Evolutionary Approach to Hyper-Parameter Optimization of Neural Networks. In International Conference on Human Interaction and Emerging Technologies, pp. 713-718. Springer, Cham.
[22] Ribeiro, M. et al. (2016). "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[23] RStudio (2020). Shiny: A web application framework for R. https://shiny.rstudio.com/
[24] Free prototype version of the Intuitive Time-Series-Analysis-Toolbox: https://seidellu.shinyapps.io/KDDApp/
