Document Clustering using Word Clusters via the Information Bottleneck Method
Noam Slonim and Naftali Tishby
School of Computer Science and Engineering and The Interdisciplinary Center for Neural Computation
The Hebrew University, Jerusalem 91904, Israel email: noamm,tishby @cs.huji.ac.il
Abstract have been studied in the context of information retrieval system- s: clustering the documents on the basis of the distributions of We present a novel implementation of the recently introduced in- words that co-occur in the documents, and clustering the words formation bottleneck method for unsupervised document cluster- using the distributions of the documents in which they occur (see
ing. Given a joint empirical distribution of words and documents, [28] for in-depth review). In this paper we propose a new method
´ µ