Leveraging Content from Open Corpus Sources for Technology Enhanced Learning

Leveraging Content from Open Corpus Sources for Technology Enhanced Learning A Thesis submitted to the University of Dublin, Trinity College for the degree of Doctor in Philosophy Séamus Lawless Knowledge and Data Engineering Group, School of Computer Science and Statistics, Trinity College Dublin Submitted October 2009 Declaration I, the undersigned, declare that this work has not been previously submitted as an exercise for a degree at this or any other University, and that, unless otherwise stated, it is entirely my own work. ________________________ Séamus Lawless October 2009 ii Permission to lend or copy I, the undersigned, agree that the Trinity College Library may lend or copy this thesis upon request. ________________________ Séamus Lawless October 2009 iii ACKNOWLEGDEMENTS Many people have influenced the writing of this thesis and provided both guidance and support on what has been a challenging, yet enjoyable and rewarding journey. I would like to acknowledge these individuals and thank them for the contributions they have made over the past number years which have made this thesis and the work described herein possible. Firstly, I would like to thank my supervisor, Prof. Vincent Wade, whose vast knowledge and experience, patience and encouragement has made this work possible. Special thanks are also reserved for Dr. Lucy Hederman, my co-supervisor during much of this research, whose insightful input was invaluable. As importantly, I would like to thank my family. My parents, Betty and Jimmy, for their unconditional and unfailing love, encouragement, belief and guidance throughout my life. They have set an example which I will always aspire to live up to. My fiancée, Pam, whose love, kindness, encouragement and patience helped me persevere on what has been a long and sometimes difficult road. My brother and sisters, for always being there, whenever and wherever I have needed them. My family’s support and boundless humour have been the foundation for everything I have achieved. I would also like to extend my gratitude to the members of the Knowledge and Data Engineering Group (KDEG) in Trinity College Dublin. Their friendship and insightful contributions have had a significant impact on this research. A number of us have undertaken this journey together and have always provided a mutual willing ear during the difficult periods. Thanks are also due to my colleagues in the School of Computer Science and Statistics. Finally, I would like to express my gratitude to the Irish Research Council for Science, Engineering and Technology for funding the research detailed in this thesis. I sincerely thank you all. Although nature commences with reason and ends in experience it is necessary for us to do the opposite, that is to commence with experience and from this to proceed to investigate the reason. – Leonardo Da Vinci (1452 – 1519) iv ABSTRACT As educators attempt to incorporate the use of educational technologies in course curricula, the lack of appropriate and accessible digital content resources acts as a barrier to adoption. Quality educational digital resources can prove expensive to develop and have traditionally been restricted to use in the environment in which they were authored. As a result, educators who wish to adopt these approaches are compelled to produce large quantities of high quality educational content. This can lead to excessive workloads being placed upon the educator, whose efforts are better exerted on the pedagogical aspects of eLearning design. The accessibility, portability, repurposing and reuse of digital resources thus became, and remain, major challenges. The key motivation of this research is to enable the utilisation of the vast amounts of accumulated knowledge and educational content accessible via the World Wide Web in Technology Enhanced Learning (TEL). This thesis proposes an innovative approach to the targeted sourcing of open corpus content from the WWW and the resource-level reuse of such content in pedagogically beneficial TEL offerings. The thesis describes the requirements, both educational and technical, for a tool-chain that enables the discovery, classification, harvesting and delivery of content from the WWW, and a novel TEL application which demonstrates the resource-level reuse of open corpus content in the execution of a pedagogically meaningful educational offering. Presented in this work are the theoretical foundations, design and implementation of two applications: the Open Corpus Content Service (OCCS); and the User-driven Content Retrieval, Exploration and Assembly Toolkit for eLearning (U-CREATe). To evaluate and validate this research, a detailed analysis of the different aspects of the research is presented, outlining and addressing the discovery, classification and harvesting of open corpus content from the WWW and open corpus content utilisation in TEL. This analysis provides confidence in the ability of the OCCS to generate collections of highly relevant open corpus content in defined subject areas. The analysis also provides confidence that the resource-level reuse of such content in educational offerings is possible, and that these educational offerings can be pedagogically beneficial to the learner. A novel approach to the sourcing of open corpus educational content for integration and reuse in TEL is the primary contribution to the State of the Art made by this thesis and the research described therein. This approach differs significantly from those used by current TEL systems in the creation of learning offerings and provides a service which is considerably different to that offered by general purpose web search engines. v TABLE OF CONTENTS ACKNOWLEGDEMENTS ..........................................................................................iv ABSTRACT ................................................................................................................ v TABLE OF CONTENTS .............................................................................................vi TABLE OF FIGURES ................................................................................................ xii ABBREVIATIONS .................................................................................................... xvi 1 Introduction ......................................................................................................... 1 1.1 Motivation ...................................................................................................... 1 1.2 Research Question ....................................................................................... 5 1.3 Research Goals and Objectives .................................................................... 6 1.4 Research Contribution .................................................................................. 8 1.5 Research Approach ...................................................................................... 9 2 Technology Enhanced Learning and Educational Content ............................... 11 2.1 Introduction to Technology Enhanced Learning .......................................... 11 2.2 Educational Theories in Technology Enhanced Learning ........................... 12 2.2.1 Theoretical Categorisations of Learning ............................................... 13 2.2.1.1 Associationist/Empiricist Perspective ............................................. 13 2.2.1.2 Cognitive Perspective .................................................................... 14 2.2.1.3 Situative Perspective ...................................................................... 16 2.2.2 Mapping Educational Theory to the Pedagogical Design of Learning Environments .................................................................................................... 17 2.2.3 Analysis ................................................................................................ 20 2.2.4 Summary .............................................................................................. 21 2.3 Educational Content Creation, Dissemination and Utilisation...................... 21 2.3.1 Learning Objects and Content Modelling Standards ............................ 22 2.3.1.1 Analysis .......................................................................................... 25 2.3.2 Digital Content Repositories ................................................................. 27 2.3.2.1 Analysis .......................................................................................... 30 2.3.3 Content Publication, Aggregation and Social Applications ................... 34 2.3.3.1 Analysis .......................................................................................... 38 2.3.4 Content Utilisation in Personalised TEL................................................ 39 2.3.4.1 Analysis .......................................................................................... 42 vi 2.3.5 Digital Rights Management and Intellectual Property ........................... 42 2.3.6 Summary .............................................................................................. 43 2.4 Summary ..................................................................................................... 44 3 State of the Art of Information Retrieval on the WWW ...................................... 45 3.1 Introduction ................................................................................................. 45 3.2 The Evolution of Information Retrieval .......................................................

Leveraging Content from Open Corpus Sources for Technology Enhanced Learning

Crawling Frontier Controls

The Google Search Engine

Web Crawling, Analysis and Archiving

Detecting Malicious Websites with Low-Interaction Honeyclients

Web Manifestations of Knowledge-Based Innovation Systems in the U.K

Data-Intensive Text Processing with Mapreduce

Bachelor of Library and Information Science (Blisc.)

A Focused Web Crawler Using Link and Content Analysis for Relevence Prediction

Enhancing Web Search User Experience: from Document Retrieval to Task Recommendation

Department of Library and Information Science Svu College of Arts Tirupati

Data-Intensive Text Processing with Mapreduce

Spiders, Crawlers, Harvesters, Bots