Hillview: A trillion-cell spreadsheet for big data

Mihai Budiu, VMware Research
Parikshit Gopalan, VMware Research
Lalith Suresh, VMware Research
Udi Wieder, VMware Research
Han Kruiger, University of Utrecht
Marcos K. Aguilera, VMware Research

ABSTRACT

Hillview is a distributed spreadsheet for browsing very large datasets that cannot be handled by a single machine. As a spreadsheet, Hillview provides a high degree of interactivity that permits data analysts to explore information quickly along many dimensions while switching visualizations on a whim. To provide the required responsiveness, Hillview introduces visualization sketches, or vizketches, as a simple idea to produce compact data visualizations. Vizketches combine algorithmic techniques for data summarization with computer graphics principles for efficient rendering. While simple, vizketches are effective at scaling the spreadsheet by parallelizing computation, reducing communication, providing

Unfortunately, enterprise data is growing dramatically, and current spreadsheets do not work with big data, because they are limited in capacity or interactivity. Centralized spreadsheets such as Excel can handle only millions of rows. More advanced tools such as Tableau can scale to larger data sets by connecting a visualization front-end to a general-purpose analytics engine in the back-end. Because the engine is general-purpose, this approach is either slow for a big data spreadsheet or complex to use as it requires users to carefully choose queries that the system is able to execute quickly. For example, Tableau can use Amazon Redshift as the analytics back-end but users must understand lengthy documentation to navigate around bad query types that are too slow to execute [17].
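The abstract describes vizketches only at a high level. To illustrate why a summary whose size is fixed by the display, rather than by the data, parallelizes well and keeps communication small, here is a minimal Python sketch of a histogram-style summary. The function names `summarize` and `merge`, the use of one bucket per horizontal pixel, and the in-memory partitions are hypothetical and chosen for illustration; this is not Hillview's actual API or implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Sequence


def summarize(values: Sequence[float], lo: float, hi: float, buckets: int) -> List[int]:
    """Summarize one partition into `buckets` equal-width counts over [lo, hi).

    Hypothetical: `buckets` stands in for the chart's width in pixels, so the
    result's size depends on the display, not on len(values).
    Values outside [lo, hi) are skipped.
    """
    if buckets <= 0 or hi <= lo:
        raise ValueError("need buckets > 0 and hi > lo")
    counts = [0] * buckets
    width = (hi - lo) / buckets
    for v in values:
        if lo <= v < hi:
            # min() guards against float rounding pushing a value just below hi past the last bucket
            counts[min(int((v - lo) / width), buckets - 1)] += 1
    return counts


def merge(a: List[int], b: List[int]) -> List[int]:
    """Combine two partial summaries by adding counts; associative and commutative."""
    if len(a) != len(b):
        raise ValueError("summaries must have the same number of buckets")
    return [x + y for x, y in zip(a, b)]


# Toy data: three partitions standing in for data held on different machines.
partitions = [[0.5, 1.5, 2.5], [3.5, 9.9], [5.0, 5.1, 10.0]]

# Summarize each partition in parallel (threads here; separate servers in a real system).
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(lambda p: summarize(p, 0.0, 10.0, 5), partitions))

total = reduce(merge, partials)
print(total)  # [2, 2, 2, 0, 1]  (10.0 falls outside [0, 10) and is skipped)
```

Because `merge` is associative and commutative, partial results can be combined in any order, for example up a tree of servers, and each message carries only `buckets` integers no matter how many rows a partition holds.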