Towards an Analytics Query Engine ∗ Nantia Makrynioti Vasilis Vassalos Athens University of Economics and Business Athens University of Economics and Business Athens, Greece Athens, Greece
[email protected] [email protected] ABSTRACT with black box libraries, evaluating various algorithms for a This vision paper presents new challenges and opportuni- task and tuning their parameters, in order to produce an ef- ties in the area of distributed data analytics, at the core of fective model, is a time-consuming process. Things become which are data mining and machine learning. At first, we even more complicated when we want to leverage paralleliza- provide an overview of the current state of the art in the area tion on clusters of independent computers for processing big and then analyse two aspects of data analytics systems, se- data. Details concerning load balancing, scheduling or fault mantics and optimization. We argue that these aspects will tolerance can be quite overwhelming even for an experienced emerge as important issues for the data management com- software engineer. munity in the next years and propose promising research Research in the data management domain recently started directions for solving them. tackling the above issues by developing systems for large- scale analytics that aim at providing higher-level primitives for building data mining and machine learning algorithms, Keywords as well as hiding low-level details of distributed execution. Data analytics, Declarative machine learning, Distributed MapReduce [12] and Dryad [16] were the first frameworks processing that paved the way. However, these initial efforts suffered from low usability, as they offered expressive but at the same 1.