Web Usage Mining: Application to an Online Educational Digital Library Service
Total Page:16
File Type:pdf, Size:1020Kb
Utah State University DigitalCommons@USU All Graduate Theses and Dissertations Graduate Studies 5-2012 Web Usage Mining: Application To An Online Educational Digital Library Service Bart C. Palmer Utah State University Follow this and additional works at: https://digitalcommons.usu.edu/etd Part of the Education Commons, and the Philosophy Commons Recommended Citation Palmer, Bart C., "Web Usage Mining: Application To An Online Educational Digital Library Service" (2012). All Graduate Theses and Dissertations. 1215. https://digitalcommons.usu.edu/etd/1215 This Dissertation is brought to you for free and open access by the Graduate Studies at DigitalCommons@USU. It has been accepted for inclusion in All Graduate Theses and Dissertations by an authorized administrator of DigitalCommons@USU. For more information, please contact [email protected]. WEB USAGE MINING: APPLICATION TO AN ONLINE EDUCATIONAL DIGITAL LIBRARY SERVICE by Bart C. Palmer A dissertation submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY in Instructional Technology and Learning Sciences Approved: Dr. Mimi Recker Dr. James Dorward Major Professor Committee Member Dr. Anne R. Diekema Dr. Jamison Fargo Committee Member Committee Member Dr. Andrew Walker Dr. Mark R. McLellan Committee Member Vice President for Research and Dean of the School of Graduate Studies UTAH STATE UNIVERSITY Logan, Utah 2012 ii Copyright c Bart C. Palmer 2012 All Rights Reserved iii ABSTRACT Web Usage Mining: Application to an Online Educational Digital Library Service by Bart C. Palmer, Doctor of Philosophy Utah State University, 2012 Major Professor: Dr. Mimi Recker Department: Instructional Technology & Learning Sciences This dissertation was situated in the crossroads of educational data mining (EDM), educational digital libraries (such as the National Science Digital Library; http://nsdl.org), and examination of teacher behaviors while creating online learning resources in an end-user authoring system, the Instructional Architect (IA; http://ia.usu.edu). The knowledge from data/database (KDD) framework for preparing data and finding patterns in large amounts of data served as the process framework in which a latent class analysis (LCA) was applied to IA user data. Details of preprocessing challenges for web usage data are included. A meaningful IA activity framework provided four general areas of user behavior features that assisted in the interpretation of the LCA results: registration and usage, resource collection, project authoring, and project usage. Four clusters were produced on two samples (users with 5–90 logins and those with 10–90 logins) from 22 months of data collection. The analyses iv produced nearly identical models with both samples. The clusters were named according to their usage behaviors: one-hit wonders who came, did, and left and we are left to wonder where they went; focused functionaries who appeared to produce some content, but in only small numbers and they did not share many of their projects; popular producers who produced small but very public projects that received a lot of visitors; and prolific producers who were very verbose, created many projects, and published a lot to their students with many hits, but they did not publish much for the public. Information about EDM within the context of digital libraries is discussed and implications for the IA, its professional development workshop, and the larger context of educational digital libraries are presented. (262 pages) v PUBLIC ABSTRACT Web Usage Mining: Application to an Online Educational Digital Library Service by Bart C. Palmer, Doctor of Philosophy This dissertation examined how users of the Instructional Architect (IA; http://ia.usu.edu) utilized the system in order to find online learning resources, place them in a new online instructional activities, and share and use them with students. The online learning resources can be found in educational digital libraries such as the National Science Digital Library (NSLD; http://nsdl.org) or the wider Web. Usage data from 22 months of IA use were processed to form usage features that were analyzed in order to find clusters of user behavior using latent class analysis (LCA). The users were segmented into two samples with 5–90 and 10–90 logins each. Four clusters were found in each sample to be nearly identical in meaning and were named according to their usage characteristics: one-hit wonders who came, did, and left and we are left to wonder where they went; focused functionaries who appeared to produce some content, but in only a small numbers and they did not share many of their projects; popular producers who produced small but very public projects that received a lot of visitors; and prolific producers who were very verbose, created many projects, and published a lot to their students with many hits, but they did not publish much for the public. Because this work was in the field of educational digital libraries and educational data mining, the process and results are discussed in that context with an eye to assist novice data miners become familiar with the process and caveats. Implications of user behavior clusters for the IA developers and professional development planners are discussed. The importance of using clustering for the users of educational digital library and end-user authoring tools lies in at least two different areas. First, the clusters can be a very useful framework in which to conduct future studies about how users use the system, from which the tool may be redesigned or simply enhanced to better support different kinds of use. Second, the ability to develop different instructional or promotional outreach for each class can help users accomplish their purposes more effectively, share ideas with other users, and help those struggling with the use of technology. vi ACKNOWLEDGMENTS First, thank you to a loving God in Heaven who has watched over my family while on this long and demanding journey. To my loving and capable wife, Jennifer, and my seven amazing children, thank you for tolerating long hours and short times together. To my parents, in-laws, and friends, thank you for not ever giving up on me through constant questions of how much longer this would take. Thank you to my advisor, Mimi Recker, for her tireless patience and gentle, nagging encouragement—this would not be here without you. To my committee members, Andy Walker, Anne Diekema, Jim Dorward, and Jamison Fargo, thank you for your willingness to take this project on and your help in making this a great report. I owe all of the requirement tracking help to the Instructional Technology and Learning Sciences staff, Melany Bodily and Launa Julander. Thank you to my colleagues at USU and particularly the Instructional Architect team over the years for good times and many discussions: Ye Liu, Jayang Park, Deonne Johnson, Xin Mao, Beijie Xu, M. Brooke Robertshaw, and many others who came and went. Thank you again to Jamison Fargo and the Center for Methodological & Data Sciences (OMDS) at USU for advice and direction with statistical and technical matters. Thank you Mary Ellen Heiner for reviewing this document. My appreciation goes to Ric Ott and the rest of the folks in the Research and Evaluation Department at the Missionary Training Center in Provo, Utah for their patience and concern that this project finish. vii Finally, I express appreciation to many who will never know of this work. To those who develop R, PostgreSQL, PHP, and the NSDL teams. Thanks to developers of LATEX, particularly for the APA6 package—your warnings not use this style for a thesis or dissertation went unheeded. Thank you, BYU Physics Department, for the BYUPhys class that provided a basis for this document. This material is based upon work supported by the National Science Foundation under Grants No. 840745 & 0434892, and Utah State University. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation. Bart C. Palmer viii CONTENTS Page ABSTRACT . iii ACKNOWLEDGMENTS . vi LIST OF TABLES . x LIST OF FIGURES . xi CHAPTER I. INTRODUCTION . 1 Statement of the Problem . 1 Purpose of the Study . 5 Definition of Terms . 7 Significance of the Study . 7 II. LITERATURE REVIEW . 9 Knowledge Discovery from Data/Databases . 11 Machine Learning . 16 Web Usage Mining . 23 Educational Data Mining . 26 Educational Digital Libraries and Data Mining . 37 Progressive EDM and the Instructional Architect . 42 Synthesis of Literature . 63 III. METHODS . 66 Purpose . 66 Research Questions . 67 Pattern Mining Methods . 68 Characterization Methods . 87 Methods for Reporting Data, Methods, and Tools . 88 Methods for KDD Documentation and Generalization . 88 IV. RESULTS . 90 Pattern Mining (KDD) Results . 90 Presentation Revisited: Characterization of Results . 151 Data, Methods, and Tool Report . 172 ix KDD Process Documentation . 179 V. CONCLUSIONS . 185 Research Summary . 185 Research Questions—Revisited . 188 Limitations . 207 Future Work . 209 REFERENCES . 212 APPENDICES . 222 Appendix A: Glossary . 223 Appendix B: Data Sources and Variables . 230 Appendix C: Detailed Feature Box Plots . 237 CURRICULUM VITÆ . 245 x LIST OF TABLES Table Page 1 Example Goals, Questions, and Data for Educational Web Usage Mining . 27 2 A Comparison of Metrics Available Within Each Data Source . 49 3 Study Overview . 68 4 Initial LCA Feature Space . 111 5 Included Direct Effects . 130 6 Summary of Final Models for Two Samples . 131 7 Cluster Size Summary . 133 8 Cluster Change Summary Between Models . 135 9 Cluster Means for Users with 5–90 Logins . 137 10 Cluster Means for Users with 10–90 Logins . 138 11 Binned and Shaded Normalized Cluster Means for 5–90 and 10–90 Logins . 152 12 Cluster Characterizations . 173 13 Comparison with a Similar Study . 187 14 Comparison of Cluster Description and Sizes with a Similar Study . 203 B1 Initial Data Transformations from the IARD Tracking Entries .