
WEKA A Data Mining Tool By Susan L. Miertschin 1 Data Mining Task Types Numerous Algorithms Classification C4.5 Decision Tree Clustering K-Means Clustering Discovering Association Rules Discovering Sequential Patterns – Sequence Analysis RiRegression Detecting Deviations from Normal 2 http://www. cs. waikato. ac. nz/ml/weka/ WEKA can be freelyyyg downloaded by visiting the Web site 3 WEKA – Data Mining Software Developed by the Machine Learning Group, University of Waikato , New Zealand Vision: Build state-of-the-art software for developing machine learning (ML) techniques and apply them to real- world data-mining problems DeveloppJed in Java 4 WELA’s Collection of Machine Learning Algorith ms Algorithms for data mining tasks WEKA is open source software issued under the GNU General Public License TlTools for: Data pre-processing Classification Regression Clustering Association rules Visualization 5 After Installing - Start WEKA 6 WEKA Main Interface 7 WEKA Sample Files C:\Program Files\weka\data WEKA formatted files (.arff) Open the contact-lenses file 8 Example – Contact Lens Data How many data instances are in the file? How many attributes? Numerical attributes? Categorical attributes? 9 Example – Contact Lens Data Can you think of problems that might be solved with this data? 10 Example – Contact Lens Data If supervised learning were to be done, which would be the output attribute, do you think? 11 Example – Contact Lens Data 12 Example – Contact Lens Data 13 Example – Contact Lens Data 14 ElExample – Classif y - CttContact Lens DDtata 15 ElExample – Classif y - CttContact Lens DDtata 16 ElExample – Classif y - CttContact Lens DDtata Select the rule generator named PART from the list that shows up after you select Choose 17 ElExample – Classif y - CttContact Lens DDtata 18 ElExample – Classif y - CttContact Lens DDtata 19 10-Fold Cross-Validation Data is partitioned into 10 equally (or nearly equally) sized segments or folds 10 iterations of training and validation are completed In each iteration a ddffifferent fldfold of the data is hhldeld out for validation, with the remaining 9 folds used for learning 20 http://www.public.asu.edu/~ltang9/papers/ency-cross-validation.pdf ElExample – Classif y - CttContact Lens DDtata 21 ElExample – Classif y - CttContact Lens DDtata IF tear-prod- rate = reduced THEN contact- lenses = none IF astigma tism = no THEN con tact-leesesnses = soft 22 ElExample – Classif y - CttContact Lens DDtata Coverage = 12 23 ElExample – Classif y - CttContact Lens DDtata IF tear-prod- rate = reduced THEN contact- lenses = none IF astigma tism = no THEN con tact-leesesnses = soft 24 ElExample – Classif y - CttContact Lens DDtata Coverage = 6 Misclassification = 1 Accuracy = 5/6 = 83.3% 25 ElExample – Classif y - CttContact Lens DDtata 26 WEKA A Data Mining Tool By Susan L. Miertschin 27.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages27 Page
-
File Size-