Tutorials SIGMOD ’20, June 14–19, 2020, Portland, OR, USA Automating Exploratory Data Analysis via Machine Learning: An Overview Tova Milo Amit Somech Tel Aviv University Tel Aviv University
[email protected] [email protected] ABSTRACT up-close, better understand its nature and characteristics, Exploratory Data Analysis (EDA) is an important initial step and extract preliminary insights from it. Typically, EDA is for any knowledge discovery process, in which data scien- done by interactively applying various analysis operations tists interactively explore unfamiliar datasets by issuing a (such as filtering, aggregation, and visualization) – the user sequence of analysis operations (e.g. filter, aggregation, and examines the results of each operation, and decides if and visualization). Since EDA is long known as a difficult task, which operation to employ next. requiring profound analytical skills, experience, and domain EDA is long known to be a complex, time consuming pro- knowledge, a plethora of systems have been devised over cess, especially for non-expert users. Therefore, numerous the last decade in order to facilitate EDA. lines of work were devised to facilitate the process, in multi- In particular, advancements in machine learning research ple dimensions [10, 24, 25, 31, 42, 46]: First, simplified EDA 1 have created exciting opportunities, not only for better fa- interfaces (e.g., [25], Tableau ) allow non-programmers to cilitating EDA, but to fully automate the process. In this effectively explore datasets without knowing scripting lan- tutorial, we review recent lines of work for automating EDA. guages or SQL. Second, numerous solutions (e.g., [6, 14, 22]) Starting from recommender systems for suggesting a single were devised to improve the interactivity of EDA, by re- exploratory action, going through kNN-based classifiers and ducing the running times of exploratory operations, and active-learning methods for predicting users’ interestingness displaying preliminary sketches of their results.