DIMACS Series in Discrete Mathematics
and Theoretical Computer Science
Synopsis Data Structures for Massive Data Sets
Phillip B. Gibbons and Yossi Matias
Abstract. Massive data sets with terabytes of data are becoming commonplace.
There is an increasing demand for algorithms and data structures that
provide fast response times to queries on such data sets. In this paper, we
describe a context for algorithmic work relevant to massive data sets and a
framework for evaluating such work. We consider the use of synopsis data
structures, which use very little space and provide fast (typically
approximated) answers to queries. The design and analysis of effective synopsis
data structures offer many algorithmic challenges. We discuss a number of
concrete examples of synopsis data structures, and describe fast algorithms for
keeping them up to date in the presence of online updates to the data sets.