WEKA: A Data Mining Tool

By Susan L. Miertschin

1 Data Mining

Task Types, Numerous Algorithms
- Classification
  - C4.5 Decision Tree
- Clustering
  - K-Means Clustering
- Discovering Association Rules
- Discovering Sequential Patterns – Sequence Analysis
- Regression
- Detecting Deviations from Normal

2 http://www.cs.waikato.ac.nz/ml/weka/

WEKA can be freely downloaded by visiting the Web site

3 WEKA – Data Mining Software
- Developed by the Machine Learning Group, University of Waikato, New Zealand
- Vision: Build state-of-the-art software for developing machine learning (ML) techniques and apply them to real-world data-mining problems
- Developed in Java

4 WEKA's Collection of Machine Learning Algorithms
- Algorithms for data mining tasks
- WEKA is open source software issued under the GNU General Public License
- Tools for:
  - Data pre-processing
  - Classification
  - Regression
  - Clustering
  - Association rules
  - Visualization

5 After Installing - Start WEKA

6 WEKA Main Interface

7 WEKA Sample Files
- :\Program Files\weka\data
- WEKA formatted files (.arff)
- Open the contact-lenses file

8 Example – Contact Lens Data

How many data instances are in the file? How many attributes? Numerical attributes? Categorical attributes?
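One way to check answers to these questions outside the WEKA interface is to read the .arff text directly. The sketch below is a minimal illustration, not WEKA's own parser: the function name `summarize_arff` is hypothetical, the sample text is a shortened excerpt in the style of the contact-lenses file (header plus only the first four data rows), and it assumes simple attribute names without quotes or spaces.

```python
def summarize_arff(text):
    """Count data instances and group attribute names by type in simple ARFF text."""
    nominal, numeric, other = [], [], []
    instances = 0
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):      # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            instances += 1                        # every line after @data is one instance
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)   # "@attribute", name, type spec
            if spec.startswith("{"):
                nominal.append(name)              # {a, b, c} means categorical
            elif spec.lower().split()[0] in ("numeric", "real", "integer"):
                numeric.append(name)
            else:
                other.append(name)                # e.g. string or date
        elif lower.startswith("@data"):
            in_data = True
    return {"instances": instances, "nominal": nominal,
            "numeric": numeric, "other": other}


# Shortened excerpt for illustration only; the real file has more data rows.
SAMPLE = """\
@relation contact-lenses
@attribute age {young, pre-presbyopic, presbyopic}
@attribute spectacle-prescrip {myope, hypermetrope}
@attribute astigmatism {no, yes}
@attribute tear-prod-rate {reduced, normal}
@attribute contact-lenses {soft, hard, none}
@data
young,myope,no,reduced,none
young,myope,no,normal,soft
young,myope,yes,reduced,none
young,myope,yes,normal,hard
"""

summary = summarize_arff(SAMPLE)
print("instances:", summary["instances"])
print("categorical attributes:", summary["nominal"])
print("numerical attributes:", summary["numeric"])
```

To run it on the full sample file, pass the file's contents instead of `SAMPLE`, for example `summarize_arff(open(path).read())` with `path` pointing at your own copy of the file.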

9 Example – Contact Lens Data

Can you think of problems that might be solved with this data?

10 Example – Contact Lens Data

If supervised learning were to be done, which attribute do you think would be the output attribute?

11 Example – Contact Lens Data

12 Example – Contact Lens Data

13 Example – Contact Lens Data

14 Example – Classify - Contact Lens Data

15 Example – Classify - Contact Lens Data

16 Example – Classify - Contact Lens Data

Select the rule generator named PART from the list that shows up after you select Choose

17 Example – Classify - Contact Lens Data

18 Example – Classify - Contact Lens Data

19 10-Fold Cross-Validation
- Data is partitioned into 10 equally (or nearly equally) sized segments or folds
- 10 iterations of training and validation are completed
- In each iteration a different fold of the data is held out for validation, with the remaining 9 folds used for learning
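The partitioning described on this slide can be sketched in a few lines. This is a plain, unstratified illustration under stated assumptions and is not WEKA's implementation: the function name `k_fold_splits` and the `seed` parameter are hypothetical, and the 24 used below is simply the number of rows in the contact lens example.

```python
import random


def k_fold_splits(n_instances, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]                     # the fold held out this iteration
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_splits(24, k=10))
for number, (training, validation) in enumerate(splits, start=1):
    print(f"iteration {number}: train on {len(training)}, validate on {len(validation)}")
```

With 24 instances and 10 folds the folds cannot be exactly equal, so some hold out 3 instances and the rest hold out 2, which is the "nearly equally sized" case the slide mentions.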

20 http://www.public.asu.edu/~ltang9/papers/ency-cross-validation.pdf

Example – Classify - Contact Lens Data

21 Example – Classify - Contact Lens Data

IF tear-prod-rate = reduced THEN contact-lenses = none

IF astigmatism = no THEN contact-lenses = soft

22 Example – Classify - Contact Lens Data

Coverage = 12

23 Example – Classify - Contact Lens Data

IF tear-prod-rate = reduced THEN contact-lenses = none

IF astigmatism = no THEN contact-lenses = soft

24 Example – Classify - Contact Lens Data
- Coverage = 6
- Misclassification = 1
- Accuracy = 5/6 = 83.3%
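The coverage and accuracy figures for the two rules can be reproduced with a short script. This is a sketch under assumptions, not WEKA output: the rows are the 24 contact lens instances as commonly distributed in the sample file (verify against your own copy), the name `rule_stats` is hypothetical, and the rules are assumed to be applied in order, so the second rule only sees instances the first rule did not cover.

```python
ATTRS = ("age", "spectacle-prescrip", "astigmatism", "tear-prod-rate", "contact-lenses")

# Assumed copy of the sample data; check it against your contact-lenses file.
DATA = """\
young,myope,no,reduced,none
young,myope,no,normal,soft
young,myope,yes,reduced,none
young,myope,yes,normal,hard
young,hypermetrope,no,reduced,none
young,hypermetrope,no,normal,soft
young,hypermetrope,yes,reduced,none
young,hypermetrope,yes,normal,hard
pre-presbyopic,myope,no,reduced,none
pre-presbyopic,myope,no,normal,soft
pre-presbyopic,myope,yes,reduced,none
pre-presbyopic,myope,yes,normal,hard
pre-presbyopic,hypermetrope,no,reduced,none
pre-presbyopic,hypermetrope,no,normal,soft
pre-presbyopic,hypermetrope,yes,reduced,none
pre-presbyopic,hypermetrope,yes,normal,none
presbyopic,myope,no,reduced,none
presbyopic,myope,no,normal,none
presbyopic,myope,yes,reduced,none
presbyopic,myope,yes,normal,hard
presbyopic,hypermetrope,no,reduced,none
presbyopic,hypermetrope,no,normal,soft
presbyopic,hypermetrope,yes,reduced,none
presbyopic,hypermetrope,yes,normal,none
"""

# (attribute, value, predicted class) for the two rules shown on the slides.
RULES = [
    ("tear-prod-rate", "reduced", "none"),
    ("astigmatism", "no", "soft"),
]


def rule_stats(rows, rules):
    """Apply rules in order; return (coverage, misclassified) for each rule."""
    remaining = [dict(zip(ATTRS, row.split(","))) for row in rows]
    stats = []
    for attr, value, predicted in rules:
        covered = [r for r in remaining if r[attr] == value]
        wrong = sum(1 for r in covered if r["contact-lenses"] != predicted)
        stats.append((len(covered), wrong))
        # Instances covered by this rule are not seen by later rules.
        remaining = [r for r in remaining if r[attr] != value]
    return stats


stats = rule_stats(DATA.splitlines(), RULES)
for (attr, value, predicted), (coverage, wrong) in zip(RULES, stats):
    accuracy = (coverage - wrong) / coverage
    print(f"IF {attr} = {value} THEN contact-lenses = {predicted}: "
          f"coverage={coverage}, misclassified={wrong}, accuracy={accuracy:.1%}")
```

Under these assumptions the first rule covers 12 instances and the second covers 6 with 1 misclassified, giving 5/6 = 83.3%, which matches the figures on the slides.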

25 Example – Classify - Contact Lens Data
