Streaming Data Algorithms For High Quality Clustering
Liadan O Callaghan Nina Mishra Adam Meyerson Sudipto Guha
Ra jeev Motwani
Octob er
Abstract
As data gathering grows easier and as researchers discover new ways to interpret data streaming
data algorithms have b ecome essential in many elds Data stream computation precludes algo
rithms that require random access or large memory In this pap er we consider the problem of
clustering data streams which is imp ortant in the analysis a variety of sources of data streams
such as routing data telephone records web do cuments and clickstreams We provide a new clus
tering algorithms with theoretical guarantees on its p erformance We give empirical evidence of
its sup eriority over the commonly used k Means algorithm