WEKA a Data Mining Tool

WEKA a Data Mining Tool

WEKA A Data Mining Tool By Susan L. Miertschin 1 Data Mining Task Types Numerous Algorithms Classification C4.5 Decision Tree Clustering K-Means Clustering Discovering Association Rules Discovering Sequential Patterns – Sequence Analysis RiRegression Detecting Deviations from Normal 2 http://www. cs. waikato. ac. nz/ml/weka/ WEKA can be freelyyyg downloaded by visiting the Web site 3 WEKA – Data Mining Software Developed by the Machine Learning Group, University of Waikato , New Zealand Vision: Build state-of-the-art software for developing machine learning (ML) techniques and apply them to real- world data-mining problems DeveloppJed in Java 4 WELA’s Collection of Machine Learning Algorith ms Algorithms for data mining tasks WEKA is open source software issued under the GNU General Public License TlTools for: Data pre-processing Classification Regression Clustering Association rules Visualization 5 After Installing - Start WEKA 6 WEKA Main Interface 7 WEKA Sample Files C:\Program Files\weka\data WEKA formatted files (.arff) Open the contact-lenses file 8 Example – Contact Lens Data How many data instances are in the file? How many attributes? Numerical attributes? Categorical attributes? 9 Example – Contact Lens Data Can you think of problems that might be solved with this data? 10 Example – Contact Lens Data If supervised learning were to be done, which would be the output attribute, do you think? 11 Example – Contact Lens Data 12 Example – Contact Lens Data 13 Example – Contact Lens Data 14 ElExample – Classif y - CttContact Lens DDtata 15 ElExample – Classif y - CttContact Lens DDtata 16 ElExample – Classif y - CttContact Lens DDtata Select the rule generator named PART from the list that shows up after you select Choose 17 ElExample – Classif y - CttContact Lens DDtata 18 ElExample – Classif y - CttContact Lens DDtata 19 10-Fold Cross-Validation Data is partitioned into 10 equally (or nearly equally) sized segments or folds 10 iterations of training and validation are completed In each iteration a ddffifferent fldfold of the data is hhldeld out for validation, with the remaining 9 folds used for learning 20 http://www.public.asu.edu/~ltang9/papers/ency-cross-validation.pdf ElExample – Classif y - CttContact Lens DDtata 21 ElExample – Classif y - CttContact Lens DDtata IF tear-prod- rate = reduced THEN contact- lenses = none IF astigma tism = no THEN con tact-leesesnses = soft 22 ElExample – Classif y - CttContact Lens DDtata Coverage = 12 23 ElExample – Classif y - CttContact Lens DDtata IF tear-prod- rate = reduced THEN contact- lenses = none IF astigma tism = no THEN con tact-leesesnses = soft 24 ElExample – Classif y - CttContact Lens DDtata Coverage = 6 Misclassification = 1 Accuracy = 5/6 = 83.3% 25 ElExample – Classif y - CttContact Lens DDtata 26 WEKA A Data Mining Tool By Susan L. Miertschin 27.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    27 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us