Arxiv:2003.04726V1 [Cs.LG] 7 Mar 2020 Ces Indicating Some Regularities in a Data Matrix
New advances in enumerative biclustering algorithms with online partitioning Rosana Veronezea,∗, Fernando J. Von Zubena aUniversity of Campinas (DCA/FEEC), 400 Albert Einstein Street, Campinas, SP, Brazil Abstract This paper further extends RIn-Close CVC, a biclustering algorithm capable of performing an effi- cient, complete, correct and non-redundant enumeration of maximal biclusters with constant values on columns in numerical datasets. By avoiding a priori partitioning and itemization of the dataset, RIn-Close CVC implements an online partitioning, which is demonstrated here to guide to more infor- mative biclustering results. The improved algorithm is called RIn-Close CVC3, keeps those attractive properties of RIn-Close CVC, as formally proved here, and is characterized by: a drastic reduction in memory usage; a consistent gain in runtime; additional ability to handle datasets with missing values; and additional ability to operate with attributes characterized by distinct distributions or even mixed data types. The experimental results include synthetic and real-world datasets used to per- form scalability and sensitivity analyses. As a practical case study, a parsimonious set of relevant and interpretable mixed-attribute-type rules is obtained in the context of supervised descriptive pattern mining. Keywords: Enumerative biclustering, Online partitioning of numerical datasets, Efficient enumeration, Quantitative class association rules, Supervised descriptive pattern mining 1. Introduction Biclustering is a powerful data analysis technique that, essentially, consists in discovering submatri- arXiv:2003.04726v1 [cs.LG] 7 Mar 2020 ces indicating some regularities in a data matrix. It has been successfully applied in various domains, such as analysis of gene expression data, collaborative filtering, text mining, treatment of missing data, and dimensionality reduction.
[Show full text]