ELKI: ELKI: ELKI: a Software System for Evaluation a Software System
Total Page:16
File Type:pdf, Size:1020Kb
Institute for Informatics ELKI: Ludwig- Maximilians- A Software System for Evaluation Universityy Munich of Subspace Clustering Algorithms KDDTKDDTask k Environment for coordinates data source,,pp application,li i , manages data readingreading, DeveLoping and result. result DtbDatabase CtiConnection provides database to KDD-Applications main class Supported by Indendex-StrStructures ct res Index Database Structure Motivation: DtDataata miningiig researchesea ch ldleadseads tto manyayalgorithmsagolith t s fforo similarsaiil tktasks. A fifair andd usefulflcomparisonp i off ththese algorithmslithg iis KDDTask difficult due to several reasons: applies algorithm on database • Implementations of comparison partners are not at hand. Database • If implementations of different authors are providedprovided, an prints result evaluation in terms of efficiency is biased to evaluate the to desired effortseosff t ofof diffdifferentdee t authorsauth o s iin efficienteceffi i t programmingpogai gitdinsteads ead location RltResult off evaluatingltig algorithmiclithig meritsit . On the other handhand, efficient data management tools like collects result index-structures can show considerable impact on data flithfrom algorithmg mining tasks and are therefore useful for a broad variety of Algorithmg algorithms. Result-File IdIndex Central class KDDTask gets assignedidd a data File USAGEUSAGE: StStructure t source, an algorithm, javaj de.lmu.ifi.dbs.elki.KDDTask -algorithm : <class> Classname of an algorithm (implementing and a target address for de.lmu.ifi.dbs.elki.algorithm.Algorithm -- available classes: thehl result. -->d>de.lmu.ifi.dbs.el l ifi db lkiki.algorithm.APRIORI l ith APRIORI -->de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN -->de.lmu.ifi.dbs.elki.algorithm.clustering.DeLiClu -->de.lmu.ifi.dbs.elki.algorithm.clustering.EM Database Parser -->de.lmu.ifi.dbs.elki.algorithm.clustering.KMeans -->de>de.lmu.ifi.dbs.elki.a lmu ifi dbs elki algorithmlgorithm.clustering.OPTICS clustering OPTICS -->de.lmu.ifi.dbs.elki.ade.lmu.ifi.dbs.elki.algorithm.clustering.SLINK -->de.lmu.ifi.dbs.elki.algorithm.clustering.SNNClustering -->de.lmu.ifi.dbs.elki.algorithm.clustering.biclustering.ChengAndChurch -->de.lmu.ifi.dbs.elki.algorid l ifi db lki l iththm.clustering.biclustering.FLOC l t i bi l t i FLOC -->de>de.lmu.ifi.dbs.elki.algori lmu ifi dbs elki algorithmthm.clustering.biclustering.MaPle clustering biclustering MaPle SSome parameters -->de.lmu.ifi.dbs.elki.algorithm.clustggggering.biclustering.PClustering define classes -->de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.CASH DtbDatabase -->de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.COPAC expecting own -->de>de.lmu.ifi.dbs.elki.algori lm ifi dbs elki algorithmthm.clustering.correlation.ERiC cl stering correlation ERiC CiConnection -->de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.FourC parameters iin turn, -->de.lmu.ifi.dbs.elki.algorithm.clustering.correlation.ORCLUS handed down from the -->de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.CLIQUE KDDTKDDTask k -->de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.DiSH main-classclass. -->de>de.lmu.ifi.dbs.elki.algori lmu ifi dbs elki algorithmthm.clustering.subspace.PreDeConclustering subspace PreDeCon -->de.lmu.ifi.dbs.elki.algoride. u. .dbs.e .a go tthm.clustering.subspace.PROCLUS .c uste g.subspace. OC US -->de.lmu.ifi.dbs.elki.algorithm.clustering.subspace.SUBCLU ). Either full name to identify classpath or only classname, if its package is dde.lmu.ifi.dbs.elki.algorithm. l ifi db lki l ith -dbc : <class> Classname of a database connection (implementing de.lmu.ifi.dbs.elki.database.connection.DatabaseConnection -- Distance available classes: -->de.lmu.ifi.dbs.elki.database.connection.FileBasedDatabaseConnection -->de>de.lmu.ifi.dbs.elki.database.connec lmu ifi dbs elki database connectiontion.InputStreamDatabaseConnection InputStreamDatabaseConnection Function -->de.lmu.ifi.dbs.elki.database.connection.MultipleFileBasedDatabaseConnection Algorithmg ). Either full name to identify classpath or only classname, if its package is de.lmu.ifi.dbs.elki.database.connection. Default: de.lmu.ifi.dbs.elki.database.connection.FileBasedDatabaseConnection. -description : <class> Name of a class to obtain a description - for classes that implementp AlAlgorithm ith - de.lmu.ifi.dbs.elki.utilities.optionhandling.Parameterizable) -- no further processing will be performed. -hFltbtihlh : Flag to obtain help-message, eitherith for f the th main i -routineti or PtParameters for any specified algorithm. algorithm Causes immediate stop of the program.pg Result-File -help : Flag to obtain help-message, either for the main-routine or ⋅⋅⋅ for any specified algorithm. Causes immediate stop of the programprogram. -norm : <class> Classname of a normalization in order to use a database with normalized values (implementing de.lmu.ifi.dbs.elki.normalization.Normalization -- available classes: IdIdea: … IIn ELKIELKI,, dtdata miningiig algorithmslithg andd dtdata managementg t tktasks FFor supportedpp tdalgorithms,lithg,a shorthtditidescriptionp iis providedp id d on are separatedp and allow for an independentp evaluation. This request: separation makes ELKI unique among data mining jjava –jjar elki.jar lki j -ditidescription dde.lmu.ifi.dbs.elki.algorithm.clu l ifi db lki l ith lstering.correlation.ERiC t i l ti ERiC frameworks like Weka or YALE and frameworks for index ERiC: Exploring Relationships among Correlation Clusters Performs the DBSCAN algorithmggp on the data using a special distance function taking into account correlations among attributes and builds a hierarchy structures like GiST. At the same timetime, ELKI is open to that allows multiple inheritance from the correlation clustering result. arbitrarybi ddata types, didistance or similarityiili measures, or filfile Reference: E. E Achtert,C, C. Böhm,H, H.-PP. Kriegel,P, P. Kröger, and AA. Zimek:On: On Exploring Complex Relationships of Correlation Clusters. In Proc. 19th ftformats. ThThe fdfundamental tlapproachpp h iis ththe idindependencep d off International Conference on Scientific and Statistical Database Management (SSDBM 2007), Banff, Canada, 2007 file parsersp or database connectionsconnections, data typestypes,yp distancesdistances, … distance functionsfunctions, and data mining algorithms. Helper This work is continued – find the source-code and binariesbinaries, classesclasses, e.g. for algebraic or analytic computations are documentationdocumentation, and bug-reports via: available for all algorithms on equal terms. http://wwwhttp://www.dbs.ifi.lmu.de/research/KDD/ELKI/ dbs ifi lmu de/research/KDD/ELKI/ Elke AchtertAchtert, Hans-Peter KriegelKriegel, Arthur Zimek Database Group.