Using Online Data Sources to Make Recommendations on Reading Materials for K-12 and Advanced Readers

Maria Soledad Pera

A dissertation submitted to the faculty of Brigham Young University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Yiu-Kai Ng, Chair
David Embley
Christophe Giraud-Carrier
Eric Ringger
Sean Warnick

Department of Computer Science

Brigham Young University
February 2014

Copyright © 2014 Maria Soledad Pera
All Rights Reserved

ABSTRACT

Using Online Data Sources to Make Recommendations on Reading Materials for K-12 and Advanced Readers

Maria Soledad Pera
Department of Computer Science, BYU
Doctor of Philosophy

Reading is a fundamental skill that each person needs to develop during early childhood and continue to enhance into adulthood. While children/teenagers depend on this skill to advance academically and become educated individuals, adults are expected to acquire a certain level of proficiency in reading so that they can engage in social/civic activities and successfully participate in the workforce. A step towards assisting individuals to become lifelong readers is to provide them with adequate reading selections which can cultivate their intellectual and emotional growth. Turning to (web) search engines for such reading choices can be overwhelming, given the huge volume of reading materials offered as a result of a search. An alternative is to rely on reading materials suggested by existing recommendation systems, which unfortunately are not capable of simultaneously matching the information needs, preferences, and reading abilities of individual readers. In this dissertation, we present novel recommendation strategies which identify appealing reading materials that readers can comprehend, which in turn can motivate them to read. In accomplishing this task, we have examined user-defined data, in addition to information retrieved/inferred from reputable and freely-accessible online sources. We have incorporated the concept of "social trust" when making recommendations for advanced readers and suggested fiction books that match the reading ability of individual K-12 readers using our readability-analysis tool for books. Furthermore, we have emulated the readers' advisory service offered at school/public libraries in making recommendations for K-12 readers, which can be applied to advanced readers as well.

A major contribution of our work is the development of unsupervised recommendation strategies for advanced readers which suggest reading materials for both entertainment and learning-acquisition purposes. Unlike their counterparts, these recommendation strategies are unaffected by the cold-start or long-tail problems, since they exploit user-defined data (if available) while taking advantage of alternative publicly-available metadata. Our readability-analysis tool is innovative in that it can predict the readability levels of books on the fly, even in the absence of excerpts from books, a task that cannot be accomplished by any of the well-known readability tools/strategies. Moreover, our multi-dimensional recommendation strategy is novel, since it simultaneously analyzes the reading abilities of K-12 readers, which books readers enjoy, why the books are appealing to them, and what subject matters the readers favor. Besides assisting K-12 readers, our recommender can be used by parents/teachers/librarians in locating reading materials to be suggested to their (K-12) children/students/patrons.

We have validated the performance of each methodology presented in this dissertation using existing benchmark datasets or datasets we created for evaluation purposes (which is another contribution we make to the research community). We have also compared the performance of our proposed methodologies with their corresponding baselines and state-of-the-art counterparts, which further verifies the correctness of the proposed methodologies.

Keywords: Recommendation Systems, Readability, K-12, Readers' Advisory

ACKNOWLEDGMENTS

I first want to thank my PhD committee, who were there every step of this journey known as graduate school. I am very thankful to Dr. Embley and Dr. Giraud-Carrier; their advice and suggestions were invaluable in completing my work. I also appreciate Dr. Ringger's and Dr. Warnick's willingness to serve as my committee members and their feedback to improve my work. Thanks go also to Jen, Jenny, and Gordon, who let me hang out at the CS Office on the good days and the not so good ones. My family and my friends, in the US and in Argentina, were my greatest support system, so to all of you all I can say is thank you... you are awesome!

And of course, I want to thank my advisor, Dr. Yiu-Kai Ng, without whom I could not have produced this research. Dr. Ng has been an incredible mentor, who has helped me grow and improve both academically and personally. His support, encouragement, patience, and help in completing this work and throughout my time at BYU are deeply appreciated.

To my Yaya.

Table of Contents

List of Figures

List of Tables

1 Introduction

2 With a Little Help From My Friends: Generating Personalized Book Recommendations Using Data Extracted from a Social Website
  2.1 Introduction
  2.2 Related Work
  2.3 Our Proposed Book Recommender
    2.3.1 Word Correlation Factors
    2.3.2 Selecting Candidate Books
    2.3.3 Ranking LibraryThing Books
  2.4 Experimental Results
    2.4.1 Experimental Data
    2.4.2 Evaluation Metrics
    2.4.3 Performance Evaluation and Comparisons
  2.5 Conclusions

3 Exploiting the Wisdom of Social Connections to Make Personalized Recommendations on Scholarly Articles
  3.1 Introduction
  3.2 Related Work
  3.3 Our Proposed Recommender
    3.3.1 CiteULike
    3.3.2 Word Correlation Factors
    3.3.3 Selecting Candidate Publications
    3.3.4 Ranking of Scholarly Publications
    3.3.5 Observations
  3.4 Experimental Results
    3.4.1 Dataset
    3.4.2 Evaluation Protocol
    3.4.3 Metrics
    3.4.4 Performance Evaluation
    3.4.5 Applying PReSA to Other Domains
  3.5 Conclusions

4 A Readability Level Prediction Tool for K-12 Books
  4.1 Introduction
  4.2 Related Work
  4.3 Our Readability Analysis Tool
    4.3.1 Multiple Regression Analysis
    4.3.2 Analyzing Book Content
    4.3.3 Analyzing Topical Information Metadata
    4.3.4 Analyzing Targeted Audience Metadata
    4.3.5 The Predicted Readability Level of a Book
  4.4 Experimental Results
    4.4.1 The Dataset
    4.4.2 Metrics
    4.4.3 Performance Evaluation
    4.4.4 Trollie, an Online Prototype of TRoLL
  4.5 Conclusions

5 What to Read Next?: Making Personalized Book Recommendations for K-12 Users
  5.1 Introduction
  5.2 Related Work
    5.2.1 Readability Formulas/Analysis Tools
    5.2.2 Book Recommenders
  5.3 A Grade Level Prediction Tool
  5.4 The Book Recommender
    5.4.1 Identifying Candidate Books
    5.4.2 Content Similarity Measure
    5.4.3 Readership Similarity Measure
    5.4.4 Rank Aggregation
  5.5 Experimental Results
    5.5.1 The Datasets
    5.5.2 Metrics and Evaluation Protocol
    5.5.3 Performance Evaluation of ReLAT
    5.5.4 Performance Evaluation of BReK12
  5.6 Conclusions and Future Work

6 Automating Readers' Advisory to Make Book Recommendations for K-12 Readers
  6.1 Introduction
  6.2 Related Work
    6.2.1 Extracting Information from Reviews
    6.2.2 Book Recommenders
  6.3 Readers' Advisory (RA)
  6.4 Appeal-Term Descriptions
  6.5 Our Proposed Recommender
    6.5.1 Candidate Books
    6.5.2 Analysis of Multiple Perspectives
    6.5.3 Ranking Candidate Books
  6.6 Experimental Results
    6.6.1 Assessing the Performance of ABET
    6.6.2 Assessing the Performance of Rabbit
  6.7 Conclusions

7 Conclusions and Directions for Future Work

A A Group Recommender for Movies Based on Content Similarity and Popularity
  A.1 Introduction
  A.2 Related Work
    A.2.1 Non-Movie Group Recommenders
    A.2.2 Group Recommenders on Movies
  A.3 Our Proposed Group Recommender
    A.3.1 Creating a Group Profile
    A.3.2 Word-Correlation Factors
    A.3.3 Identifying Candidate Movies to be Recommended
    A.3.4 Generate Group Recommendations
  A.4 Experimental Results
    A.4.1 Dataset
    A.4.2 Evaluation Protocol
    A.4.3 Group Formation
    A.4.4 Metrics
    A.4.5 The Effectiveness and Efficiency of GroupReM
    A.4.6 Limitations of the Current Implementation of GroupReM
  A.5 Conclusions and Future Work

References

List of Figures

2.1 Processing steps of the proposed book recommender, PReF
2.2 Precision@K (K = 1, 5, and 10) scores on the (top-10) recommendations achieved by PReF, Amazon, and LibraryThing for the books in Test_Books
2.3 MRR and Imp scores based on (un)related books recommended by PReF, Amazon, and LibraryThing for the books in Test_Books
3.1 The architecture of the proposed personalized recommender on scholarly articles, PReSA
3.2 A publication, SPA, bookmarked by CusrA, a CiteULike user, along with the tags assigned to SPA by CusrA
3.3 A subset of the publications bookmarked by CusrA's connections, along with their corresponding sets of personal tags. Tags exactly matching (highly similar to, respectively) the ones shown in Figure 3.2 are underlined (italicized, respectively)
3.4 Some publications in CandidateSPA
3.5 A snapshot of the (inferred) tag clouds of SPA, SP1, and SP2 in Figures 3.2 and 3.3, respectively, such that the tags of SP1 and SP2 that exactly-match (are highly similar to, respectively) the tags describing SPA are underlined (italicized, respectively)
3.6 nDCG10 scores achieved by alternative implementations of PReSA using UP_pairs, where an implementation marked with "*" achieves a statistically significant difference in nDCG10 with respect to the implementation shown to its immediate left
3.7 Processing time of PReSA required to generate recommendations relative to the number of candidate articles considered by PReSA for the corresponding user-publication pair in UP_pairs
3.8 P@1, P@10, MRR, and nDCG10 scores of SR, TVS, Cos, Fusion, PubRec, and PReSA, respectively
4.1 An overview of the readability level prediction process of TRoLL
4.2 A sample of the text in "A Wrinkle in Time" in which long words are in bold, unfamiliar words (determined by traditional readability formulas) are italicized, and words with three or more syllables are underlined
4.3 Determining the subject area and grade level of a book Bk using its content
4.4 The number of subject headings assigned to books versus their readability levels determined by AR
4.5 A portion of the OCLC record for the book "Arthur and the Cootie Catchers," available at http://www.worldcat.org/title/arthur-and-the-cootie-catcher/oclc/40444058/
4.6 Subject headings assigned to books written by Stephen Krensky, available at http://www.worldcat.org/wcidentities/lccn-n79-109188
4.7 The OCLC mapping between the targeted readers and their corresponding audience levels, available at http://www.oclc.org/research/activities/audience.html
4.8 The audience level defined by OCLC for the author Stephen Krensky, available at http://www.worldcat.org/wcidentities/lccn-n79-109188
4.9 An analysis of the performance of (group of) predictors considered by TRoLL based on each book Bk and its author ABk in BookRL-Test
4.10 Performance evaluation of TRoLL using distinct sets of content/metadata predictors on books in BookRL-Test
4.11 Overall performance evaluation of TRoLL using up to 45 predictors applicable to each book in BookRL-Test
4.12 Performance evaluation on TRoLL and other readability formulas based on the 127 books with excerpts in BookRL-Test
4.13 Performance evaluation on TRoLL, AR, and Lexile based on books in BookRL-Test
4.14 Distribution of Mechanical Turk appraisers' responses in choosing the reading levels of 10 books computed by various readability-level prediction formulas/tools
4.15 A screenshot of the online version of our readability prediction tool, TRoLL, which shows the readability level of a book, given its ISBN number
4.16 A screenshot of Trollie, which shows the readability levels of books, given (a portion of) a title provided by a user
5.1 The grade-level prediction process of ReLAT, our proposed readability level analysis tool
5.2 Performance evaluation of ReLAT
5.3 Performance evaluation of TVS, L-Cos, IICF, and BReK12 using the BiblioNasium dataset
6.1 ABET-generated appeal-term description for "The Hunger Games" where the number indicates the frequency in which a term was used to describe its corresponding appeal factor in the reviews
6.2 A screenshot of the Mechanical Turk survey conducted to evaluate the performance of ABET
6.3 A screenshot of the survey conducted to evaluate the performance of ABET against NoveList
6.4 Performance evaluation of ABET conducted using Amazon Mechanical Turk
6.5 Performance evaluation of ABET and NoveList Plus using Amazon Mechanical Turk
6.6 Performance evaluation of Rabbit using the Eset dataset
6.7 A snapshot of the survey conducted on Mechanical Turk for the book "The Lightning Thief"
6.8 Accuracy achieved by Amazon, , NoveList, and Rabbit based on the survey conducted using Mechanical Turk
A.1 Processing steps of the proposed group recommender on movies, GroupReM
A.2 The group profile created by GroupReM for a given group G based on the tags in the profiles of three MovieLens users, who are members of G
A.3 (Portions of the) Tag clouds of potential candidate movies in which tags exactly-matched or highly similar to the personal tags assigned to a movie shown in Figure A.2 are underlined
A.4 User-to-user similarity distribution in the MovieLens dataset
A.5 nDCG scores computed for (the alternative implementations of) GroupReM and alternative implementations of the collaborative filtering approach based on average and least misery aggregation strategies on 3,150 groups of various sizes. All the differences in nDCG are statistically significant with respect to GroupReM (Wilcoxon, p < 0.05)
A.6 (Average) Time for (alternative implementations of) GroupReM, CF_AVG, CF_LM, and CF_AVG_AM to generate recommendations for 1,050 groups (regardless of their sizes), such that group members share the same degree of cohesiveness among each other, using the 5-fold evaluation strategy detailed in Section A.4.2
A.7 (Average) Processing time of GroupReM for generating movie recommendations for groups including a certain number of group members, which is computed using groups of a pre-defined size, i.e., 2 to 8, as defined in Section A.4.3
A.8 Average time for GroupReM to generate group recommendations for groups with different numbers of bookmarked movies among group members, which is computed using 3,150 groups created in Section A.4.3

List of Tables

2.1 Recommendations generated by PReF, Amazon, and LibraryThing, respectively, in response to the book "Emma" by Jane Austen
3.1 Normalized scores computed by PReSA between each candidate publication explicitly listed in Figure 3.4 and SPA shown in Figure 3.2, along with the ranking position of each candidate publication
3.2 Statistical data on the dataset used for evaluating the performance of PReSA
4.1 List of predictors used by TRoLL
4.2 A sample of the fifty-five subject areas considered by TRoLL, along with their corresponding grades
4.3 Sources of books used for creating BookRL
4.4 Popular readability formulas employed in our empirical study
4.5 List of books and their TRoLL readability levels employed in the user study conducted using Mechanical Turk
5.1 A number of BiblioNasium books
5.2 TF-IDF weighting scheme used in the enhanced cosine similarity measure in Equation 5.3
5.3 The BookGL dataset
6.1 Sample appeal terms for each of the appeal factors considered by Rabbit
6.2 Rules considered by ABET to identify appeal factor-appeal term pairs in book reviews
A.1 Normalized scores for the candidate movies shown in Figure A.3 with respect to the group profile of G shown in Figure A.2, as computed by GroupReM
A.2 Statistical information of the MovieLens dataset

Chapter 1

Introduction

Reading is an activity performed on a daily basis: from reading news articles and books to cereal boxes and street signs. According to the National Institute of Child Health and Human Development, "reading is the single most important skill necessary for a happy, productive, and successful life".1 Unfortunately, a significant number of children/teenagers are underachieving at school, especially in reading. The 2013 National Assessment of Educational Progress study2 shows that only 32% of American 4th graders are proficient in reading. Even more troublesome is the claim made by the US National Center for Education Statistics which states that "children who have not developed some basic literacy skills by the time they enter school are 3 to 4 times more likely to drop out in later years" [1]. This disturbing finding is echoed by the 2013 report made by the UNESCO (United Nations Educational, Scientific, and Cultural Organization) Institute for Statistics, which indicates that the global illiterate youth population amounts to 123.5 million [54].

Promoting good reading habits among K-12 students is essential, given the enormous influence of reading on students' development as learners and members of society [5]. In fact, reading is a skill that is important not only for K-12 readers, but for "just about anyone" [84]. A federal finding published by USA Today3 reveals that approximately 32 million adults in the US have difficulty in reading "anything more challenging than a children's picture book or to understand a medication's side effects listed on a pill bottle". UNESCO also reports that, as of 2013, there are 773.5 million adults with rudimentary or no reading skills worldwide [54].

1 http://www.ksl.com/?sid=15431484
2 http://nationsreportcard.gov/reading_math_2013/#/student-groups
3 http://usatoday30.usatoday.com/news/education/2009-01-08-adult-literacy_N.htm

Adults who cannot read to their children can have a negative impact on their children/teenagers' early development, as "learning to read begins long before a child enters school".4 Lack of reading proficiency among adults also negatively affects their participation in the workforce. According to an Educational Testing Service assessment,5 if the current literacy levels in the US are not improved, "by 2030 American workforce will be unequipped and unskilled to work in the demanding global market." As stated in [9], "reading fiction actually increases people's emotional intelligence ...their accurate awareness of themselves and others, and their ability to create positive relationships with others based on managing their own reactions". Indeed, there is a clear correlation between the academic performance of students and their reading ability [35]. Given that reading affects both the intellectual and emotional growth of individuals, it is indispensable to provide adequate reading selections and encourage good reading habits among individuals, especially at an early age. A step towards achieving this goal is to identify "the right material to the right audience" [152]; however, this is not a trivial task, since the meaning of "right material" can have multiple connotations, depending on each individual and the task the reader is trying to accomplish. For example, conducting research within the academic setting often requires the researcher to identify articles that match the research context, whereas if reading is solely for entertainment, then finding books appealing to a reader becomes a must.

Finding relevant items, such as (non-)fiction books or scholarly articles, can be challenging. It is a common practice for readers to turn to trusted advisors, i.e., teachers/mentors/colleagues within the academic environment or parents/peers/librarians within a reader's private surroundings, in the quest for needed reading materials. It is unrealistic, however, to expect these "advisors" to keep up with the huge amount of materials on diverse topics which are being published on a regular basis. A reader may also turn to tools available at (scientific) digital libraries, such as ACM Portal, or popular book-related websites, such as Amazon.com, to search for reading materials in various domains.

4 http://www.literacymidsouth.org/resources/literacy-statistics/
5 http://www.ets.org/Media/Education_Topics/pdf/AmericasPerfectStorm.pdf

These search tools, however, not only are inadequate for conducting personalized/contextual searches [106] based on (im)properly formatted (complex sentence-based) queries which express the users' information needs [137], but they also present users with an overwhelming number of items to choose from [106]. Recommendation systems, on the other hand, which are tailored towards offering reading materials in different domains and for various purposes, are the solution to the problem. These recommenders can assist readers in coping with the information-overload problem and minimize the time and effort imposed on discovering unknown items that address their information needs [133]. Moreover, recommenders are available for readers anytime and anywhere, can be accessed simultaneously by multiple users, and examine hundreds of available resources in making suggestions. Compared with web search tools, recommenders often manage to identify items of interest for their users, which decreases the number of unproductive searches [137] that often frustrate users with failed search experiences.

A number of recommenders in the reading domain have been developed over the past decades [50, 57], which are employed by well-known commercial websites, such as Amazon.com, and social bookmarking sites, such as CiteULike.org and GoodReads.com. We have detected, however, many deficiencies in the design methodologies of these recommenders. Regarding the data required to make suggestions, current state-of-the-art methodologies (i) have not eradicated the cold start6 problem [149] and (ii) are constrained by the requirement of large amounts of historical data on their users, which include, but are not limited to, ratings, personal tags, and purchasing/accessing patterns, that might not be easily assembled. In terms of their applicability, the majority of these recommenders are restricted to operating within a social (bookmarking/networking) setting, since they rely on users' bookmarks/ratings/established connections collected on the respective site for which they are developed. Moreover, the design of a number of existing recommenders ignores the fact that users prefer, whenever possible, suggestions made by people they trust [93, 133].

6 The cold start problem refers to identifying potential suggestions for "new" users for whom no preference on items is available, or suggesting a "new" item for which no review/rating information exists.

In addition, these recommenders disregard the reading abilities of their users and may not consider their individual preferences, which is a major concern, especially for K-12 readers, since suggesting books that are either too easy/difficult to read or involve topics unappealing to the readers could diminish their interest in reading [7].

We have proposed unsupervised, simple, and yet effective methodologies that address the deficiencies of existing recommendation strategies for identifying reading materials to be suggested to readers that satisfy their specific needs, even in the absence of user-defined data. The major design goal of our recommendation strategies is to provide reading choices to users to motivate them to read, which could enrich their learning [69] and recreational [36] experience, and subsequently sharpen their critical thinking and analytical skills [53]. Our recommendation strategies are innovative in that they facilitate the task of finding fiction and non-fiction reading materials of interest for users of all ages. The recommendations are specifically important for K-12 readers, who are offered reading materials to choose from, which support their lifelong reading habits [36] and could have a significant impact on their future academic and career development [69].

In developing our recommendation strategies, we have explored user-defined data. We realized that, whether for pleasure reading or learning acquisition, advanced readers favor recommendations made by people they know. We have learned that even though tags capture readers' intents in depicting the content of a particular reading material, they are not always publicly and freely accessible. We have also learned that bookmarking sites do not always archive personal ratings and that the interpretations of rating scales differ from site to site. For this reason, we have developed recommendation strategies that do not rely on ratings/tags to make recommendations. Unlike the traditional approaches based on exact matching, we have also validated the correctness of relying on similarity-based approaches for content matching. Moreover, we have observed that design methodologies proven successful for making recommendations for general audiences cannot be applied directly to K-12 readers, partially because of the limited data on minors shared/archived by bookmarking sites under privacy restrictions, such as the Children's Online Privacy Protection Act ("COPPA").7

7 http://www.coppa.org/coppa.htm

Instead of relying on user-defined or provided information, we have identified metadata (inferred from data available at public, reliable online sources) that can be exploited to represent the interests of readers, the contents/general traits of reading materials appealing to them, and their reading ability to make recommendations.

We organized the discussion of the methodologies developed for addressing the deficiencies of existing recommenders on reading materials in five chapters: Chapters 2 and 3 pertain to recommendation systems developed with advanced, i.e., general, readers in mind, and Chapters 4 to 6 focus on K-12 readers. In Chapter 2 (published8 in Proceedings of the 2011 IEEE/WIC/ACM Joint Conference on Web Intelligence [120]) we introduce PReF, a recommender that exploits user-defined data, i.e., personal tags, ratings, and connections, to suggest relevant reading materials for entertainment purposes. At the time of developing PReF, trust-aware recommenders either required a user U to explicitly indicate a level of trust in other users of a social site or inferred a trust network for U based on U's similarity with other users [97]. PReF is a social, instead of a trust-aware, recommender that analyzes U's connections already established on a social site without requiring user-defined "degrees of trust", which are seldom archived on social sites and impose a burden on the users to provide explicit feedback. Unlike the recommender in [93] that predicts the rating of books, PReF is a top-N recommender which, to the best of our knowledge, was the first to incorporate the concept of trust inferred from social connections in the book domain.

Besides pleasure reading, another type of reading involves academic publications relevant to the research work of a reader. In Chapter 3 (published in the Journal of Intelligent Information Systems [124]), we present PReSA, which enhances the recommendation strategy introduced in Chapter 2 for suggesting scholarly articles by analyzing content descriptors on articles that are readily available on academic social sites, besides considering users' connections. The latter captures the social-trust aspect of PReSA, which we have validated to be important in the academic domain as well. Using content descriptions, PReSA can make suggestions even in the absence of tag representations, which, along with the exclusion of personal ratings, solves the "cold start" problem.

8 The version of the paper included in this dissertation differs from the originally published one, since it includes additional discussions.

PReSA, which considers the immediate information needs of a user instead of making suggestions based on the user's profile, discovers literature pertinent to the current research interest of a reader, a highly-desirable service offered to researchers. Unlike its counterparts [24, 109], PReSA adopts a content-similarity matching strategy to increase the number of relevant publications to be suggested.

As previously stated, strategies proven successful for making recommendations for advanced readers cannot always be applied directly to K-12 readers. This is partially due to privacy issues that limit the type of information that can be shared/archived by bookmarking sites for children and teenagers. For example, BiblioNasium.com, which is a safe and secure social (networking) site on books that targets children and teenagers, neither archives (personal) ratings/tags nor shares friendships/connections established on the site. Furthermore, the aforementioned strategies target solely advanced readers who are assumed to have acquired the same or similar reading abilities; however, this assumption does not hold for recommenders that make suggestions for K-12 audiences, whose readability levels can be diverse. For this reason, we have explored alternative data sources and other strategies to suggest reading materials for K-12 readers.

In making recommendations for K-12 audiences, we realized that the readability levels of K-12 readers and books should be determined to ensure that the readers can comprehend the reading materials. We have explored well-known readability formulas/tools [19], such as Flesch-Kincaid, Lexile, and AR, and discovered that they (i) require at least a sample text of a book for readability-level analysis, which is often not available due to copyright law, and (ii) offer readability measures for only a small fraction of published books [19]. To handle these deficiencies, we present TRoLL, a new readability-analysis tool, in Chapter 4 (in submission [48]), which predicts the reading level of a book B without human involvement. TRoLL analyzes features inferred from (i) an excerpt of B (if it is available), (ii) the audience targeted by B, (iii) subject areas (defined by the US Curriculum) that are covered in B, (iv) Library of Congress Subject Headings assigned to B, and (v) books written by the author of B. Analyzing text-based features from book excerpts is preferable, given their direct impact on the degree of difficulty in understanding the content of a book.

Unfortunately, only 7.7% of books in the OCLC database, which is a widely-used worldwide library cooperative, are linked to their partial/full content [32]. Making readability-level predictions on the fly, even in the absence of text, is the main contribution of TRoLL.

Taking advantage of TRoLL, we developed a book recommender tailored to K-12 readers, called BReK12, which is introduced in Chapter 5 (published in Proceedings of the 2013 ACM Conference on Recommender Systems [122]). BReK12 is unique, since it suggests books that simultaneously match the preferences and reading level of a user. Unlike current state-of-the-art recommenders, which either rely on the availability of users' ratings [133], purchasing patterns [89], or millions of data points to perform the recommendation task [58], BReK12 generates relevant book suggestions without requiring historical data in the form of ratings or connections established on a social site, which are often lacking among K-12 readers. Furthermore, unlike the "one-size-fits-all" strategy employed by recommenders at well-known book-affiliated sites, such as NoveList and Amazon, which make the same suggestions to users without considering their individual preferences [89], BReK12 is personalized. This is of special significance, given that not all readers in the same grade or age group have the same reading skill and preference [104].

In the latest development of our recommendation systems, we recognized that libraries, which have been established to champion and encourage reading [138], offer the Readers' Advisory (RA) service. RA identifies reading materials of potential interest to individual readers with "the help of knowledgeable and non-judgmental library staff" [138]. Given the correlation between the ultimate goal of RA and the major design goal of our work in suggesting the "right" reading material for the "right" audience, it seemed natural to extend BReK12 in Chapter 5 by emulating the RA process. In order to fully automate the RA process, we have designed Rabbit, the book recommender discussed in Chapter 6 (in submission [125]), which can simultaneously offer the RA service to any number of readers anywhere and anytime, a task that cannot be achieved by traditional RA [159]. Rabbit is novel, since it considers the reading ability of a reader and the subjects that matter the most to the reader, besides examining the types of books that are appealing to the reader and the reasons behind the interest of the reader in these books. The latter requires the analysis of appeal factors, i.e., literary elements, of a book B that stimulate a reader's subconscious, emotional reaction to B, which impacts the perceptions of the reader on B. Descriptions of the literary elements of books, however, are not easily accessible; they are either determined by professionals on the fly or accessed through a paid subscription to RA databases. Rabbit automatically generates descriptions of appeal factors that apply to books using a comprehensive taxonomy of literary elements, in addition to applying simple natural language processing techniques that examine the semantic connections between words in book reviews.

Last, we give concluding remarks and address directions for future work in Chapter 7. We introduce one of our earlier research works on group recommendations in the Appendix of this dissertation. We chose not to include GroupReM (published in the Journal of Information Processing and Management [123]) in the body of this dissertation, since GroupReM was designed to recommend movies instead of reading materials. GroupReM adopts a content-based strategy, which is similar to the ones employed by the aforementioned recommenders. However, GroupReM focuses on satisfying the information needs of a group of users (regardless of the group's size and the degree of cohesiveness among group members), which is unlike the recommenders discussed in Chapters 2 to 6 that are tailored to individual users. Given the unsupervised and domain-independent nature of GroupReM, its design methodology can be adopted to make recommendations on multimedia items other than movies.

Chapter 2

With a Little Help From My Friends: Generating Personalized Book Recommendations Using Data Extracted from a Social Website

Abstract:

With the large number of books available nowadays, users are overwhelmed with choices when they attempt to find books of interest. While existing book recommendation systems, which are based on either collaborative filtering, content-based, or hybrid methods, suggest books (among the millions available) that might be appealing to the users, their recommendations are not personalized enough to meet users' expectations, due to their collective assumptions on group preferences and/or their reliance on exact content matching. To address this problem, we have developed PReF, a Personalized Recommender that relies on Friendships established by users on a social website, such as LibraryThing, to make book recommendations tailored to individual users. In selecting books to be recommended to a user U, who is interested in a book B, PReF (i) considers books belonging to U's friends, (ii) applies word-correlation factors to disclose books similar in content to B, (iii) depends on the ratings given to books by U's friends to identify highly-regarded books, and (iv) determines how reliable individual friends of U are in providing books from their own catalogs (that are similar in content to B) to be recommended. We have conducted an empirical study and verified that (i) relying on data extracted from social websites improves the effectiveness of book recommenders and (ii) PReF outperforms the recommenders employed by Amazon and LibraryThing.

2.1 Introduction

In recent years, social websites, such as Facebook(.com), Twitter(.com), YouTube(.com), and Delicious(.com), have become increasingly popular [64]. These sites introduce new user-generated data and metadata, such as ratings, social connections, and tags,1 which provide a rich source of information to infer users' interests and preferences. These kinds of information are unique and valuable for making recommendations on books, movies, news articles, etc., which have been examined in [10, 30, 64]. Newly-developed recommenders, such as [64, 90, 154], incorporate data extracted from social websites to increase the quality of tag, news article, and book recommendations. Book recommenders have been adopted by online shopping companies, social websites, and digital libraries, to name a few, to further facilitate their users' knowledge acquisition process by offering alternative choices (among the millions available) of books they are likely interested in. While suggestions provided by existing book recommenders can introduce users to books that they are not aware of, these recommenders are not personalized enough to achieve their design goals [74]. It is imperative to develop personalized recommenders that provide finer suggestions pertinent to individual users' interests or preferences. To the best of our knowledge, there are no recommendation systems that simultaneously consider users' relationships, along with user-generated data extracted from a social website, to recommend books.

In this paper, we introduce PReF, a personalized book recommendation system that depends on friendships established among users on a social website, which is LibraryThing2 in our case, to generate valuable book recommendations tailored to individual users' interests. PReF locates, among the books bookmarked by U's friends on a social website, the ones that are similar in content to a given book B that U is interested in.

1 Tags are user-defined keywords that describe the content of an item.
2 LibraryThing(.com) was founded in 2006 for aiding users in cataloging and referencing books. LibraryThing users can rate and review books, add tags to books to describe their contents, and establish friendships, i.e., bi-directional relationships, with other LibraryThing users.

Hereafter, PReF ranks the candidate books to be recommended by considering not only the content similarity between each candidate book CB and B, but also the ratings assigned to CB by U's friends and the reliability of each of U's friends. PReF is an elegant and unique system that relies on (i) relationships established between a user and other members of a social website, since, as stated in [10], the quality of recommendations given to a user U is improved by considering opinions of other users whom U trusts, (ii) ratings provided by users of a social site, which aid in identifying highly-regarded books a user might be interested in, and (iii) word-correlation factors [78], which detect books similar in content even if they are described using analogous, but not the same, tags, to generate personalized book recommendations. In addition, PReF can perform the recommendation task with data extracted from any social website, provided that users' relationships, book tags, and book ratings can be obtained from the site. We have conducted an empirical study using data extracted from LibraryThing to validate the effectiveness of personalized book recommendations made by PReF. The study has verified that PReF is significantly more effective than (the recommenders used at) Amazon and LibraryThing in recommending books that individual users are interested in.

The remainder of this paper is organized as follows. In Section 2.2, we discuss existing (book) recommendation systems. In Section 2.3, we detail the design of PReF. In Section 2.4, we present the results of the empirical study conducted for assessing the performance of PReF. In Section 2.5, we give a conclusion.

2.2 Related Work

Machine learning, information retrieval, natural language processing, and probabilistic models have been adopted for developing systems that recommend (web) documents [61], songs [31], and movies [81], to name a few.

Content-based and collaborative filtering are two well-known approaches for making recommendations [117]. The former creates a profile to capture items of interest to a user U using words, phrases, or features, whereas the latter identifies the group of people who have similar preferences as U's and recommends items to U that the group is interested in. Recent publications [46, 117] present various hybrid approaches that exploit the benefits of using both content-based and collaborative filtering methods to improve the quality of recommendations. An in-depth discussion of various content-based, collaborative filtering, and hybrid recommendation systems can be found in [4].

There exist a number of book recommendation systems [89, 117, 156]. Amazon's recommender [89] suggests to a user, who is interested in an item I, items that match the purchase patterns of other users who have purchased I. Yang et al. [156] rely on a collaborative filtering approach with ranking, which considers users' preferences on library resources extracted from their access logs to recommend library materials. This approach overcomes the problem that arises due to the lack of initial information to perform the recommendation task. Park and Chang [117] create a user profile P based on individual and group behavior, such as clicks and shopping habits, compute the Euclidean distance between P and each product profile, and recommend products for which their Euclidean distances are closest to P. For additional references on book recommenders, see [88].

While (to the best of our knowledge) none of the existing book recommenders considers data extracted from social websites to make personalized recommendations, which PReF does, the recommenders in [64, 90, 143, 154] employ data extracted from social sites to suggest items other than books. Wang et al. [154] consider a news posting, along with the comments made by users on the posting, to generate a list of recommended news articles for a particular news thread, whereas Guy et al. [64] develop a personalized recommendation system on social items (such as blog posts and bookmarks), which relies on the relationships between people, items, and tags. Liu et al. [90] and Shepitsen et al. [143] develop different approaches for generating personalized tag recommendations. In accomplishing the task, the authors in [90] combine collaborative information extracted from social tagging systems, such as Delicious, and the users' personalized tag preferences, whereas the authors in [143] apply a hierarchical agglomerative clustering algorithm to identify users' individual interests. Unlike PReF, none of the approaches in [90, 143, 154] rely on friendships established on social websites or tag similarity matching in performing the recommendation task.

Figure 2.1: Processing steps of the proposed book recommender, PReF

2.3 Our Proposed Book Recommender

In this section, we present our book recommender, PReF, which employs data extracted from LibraryThing to generate personalized book recommendations. LibraryThing is an innovative, well-designed, and highly popular social application that was set up solely for cataloging books [146]. As of March 7, 2011, LibraryThing archives 5,943,819 unique records (on books), and approximately 1,296,535 users have added more than 73.6 million tags to different book records at LibraryThing, according to the Zeitgeist Overview (librarything.com/zeitgeist), which provides official statistical data on LibraryThing. Each LibraryThing user U has a personal catalog that includes books (s)he owns or is interested in. In addition, U can assign tags to books included in his/her catalog, which serve as personalized identifiers of the contents of the books. To indicate how highly regarded a book B in the catalog is, U assigns a rating to B, which is a numerical value between '1' and '5', such that '5' is the highest and '1' is the lowest. Moreover, U has a profile which includes a list of other LibraryThing users who were explicitly chosen by U to be his/her friends. In LibraryThing, each book B is associated with (i) a tag cloud, which is a global visual representation of the tags (and their frequencies) assigned to B by LibraryThing users who include B in their catalogs, and (ii) a global rating, which averages the ratings assigned to B by LibraryThing users.

Given a LibraryThing user, denoted LT_User, and a book, denoted Source_Bk, which has been added by LT_User to his/her personal catalog or browsed by LT_User on LibraryThing, PReF identifies LT_User's friends and determines the set of books, denoted Candidate_Set, among those included in the personal catalogs of LT_User's friends, that are similar to Source_Bk (as detailed in Section 2.3.2). Hereafter, PReF computes the ranking score of each book CB in Candidate_Set (as defined in Section 2.3.3), and the top-N (≥ 1) ranked books are recommended to LT_User. The overall process of PReF is illustrated in Figure 2.1.

2.3.1 Word Correlation Factors

PReF relies on the pre-computed word-correlation factors in the word-correlation matrix [78] to determine the similarity among (the content of) books using their corresponding sets of tags. Word-correlation factors were generated using a set of approximately 880,000 Wikipedia documents (wikipedia.org), and each correlation factor indicates the degree of similarity of the two corresponding words3 based on their (i) frequency of co-occurrence and (ii) relative distances in each Wikipedia document. Wikipedia documents were chosen for constructing the word-correlation matrix, since they were written by more than 89,000 authors (i) with different writing styles, (ii) using various terminologies that cover a wide range of topics, and (iii) with diverse word usage and content. Compared with synonyms/related words compiled by WordNet (wordnet.princeton.edu), in which pairs of words are not assigned similarity weights, word-correlation factors provide a more sophisticated measure of word similarity.
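To make the role of the word-correlation matrix concrete, the following minimal Python sketch shows one way a correlation-factor lookup could be represented; the dictionary-based storage, the function name, and the zero fallback for unlisted word pairs are assumptions of this illustration, not part of PReF's published implementation.

```python
from typing import Dict, Tuple

# Illustrative representation of the word-correlation matrix: a symmetric
# mapping from a pair of stemmed, non-stop words to their correlation factor.
WcfMatrix = Dict[Tuple[str, str], float]

def wcf(word_a: str, word_b: str, matrix: WcfMatrix) -> float:
    """Return the correlation factor of two stemmed, non-stop words.

    An exact match is treated as 1.0; pairs absent from the matrix are
    treated as uncorrelated (0.0), which is an assumption of this sketch.
    """
    if word_a == word_b:
        return 1.0
    # The matrix is symmetric, so look the pair up in either order.
    return matrix.get((word_a, word_b), matrix.get((word_b, word_a), 0.0))
```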

2.3.2 Selecting Candidate Books

As the number of books in the personal catalog of each of LT_User's friends can be large, possibly in the thousands, it is not practical to compare each book with Source_Bk to identify the ones to be recommended to LT_User, since the comparisons significantly prolong the processing time.

3 Words in the Wikipedia documents were stemmed (i.e., reduced to their grammatical roots) after all the stopwords, such as articles, conjunctions, and prepositions, which do not play a significant role in representing the content of a document, were removed. From now on, unless stated otherwise, (key)words/tags refer to non-stop, stemmed words.
4 A blocking strategy [75] is a filtering technique which reduces the potentially very large number of comparisons to be made among records [33].

To minimize the number of comparisons, PReF applies a blocking strategy4 on the books posted under the personal catalogs of LT_User's friends to yield the subset of books (which is relatively small in size), denoted Candidate_Set, considered for recommendation. At least one of the tags of each book in Candidate_Set matches exactly, or is highly similar to, one of the tags of Source_Bk assigned by LT_User. As books in Candidate_Set and Source_Bk share the same (or analogous) tags, PReF expects books in Candidate_Set to be similar in content (to a certain degree) to Source_Bk. In case LT_User does not assign any personal tags to Source_Bk, PReF relies on the top-3 tags, i.e., the tags with the highest frequency of occurrence, in the tag cloud of Source_Bk to perform the blocking task. The top-3 tags are chosen, since we have observed that LibraryThing users assign, on average, three tags to each book in their personal catalogs. To identify highly similar tags, PReF employs a reduced version of the word-correlation matrix (introduced in Section 2.3.1) which contains 13% of the most frequently-occurring words (based on their frequencies of occurrence in the Wikipedia documents); for the remaining 87% of the less-frequently-occurring words, only the exact-matched correlation factor, i.e., 1.0, is used. By adopting the reduced word-correlation matrix, instead of the full word-correlation matrix, in determining similar books, the overall processing time can be significantly reduced without affecting the accuracy [126].
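As an illustration of the blocking step described above, the sketch below selects candidate books whose tags exactly match, or are highly similar to, the tags of Source_Bk. The data structures (a per-friend catalog mapping book identifiers to tag sets) and the use of a nonzero entry in the reduced matrix as the "highly similar" test are assumptions made for this sketch.

```python
from typing import Dict, Set, Tuple

WcfMatrix = Dict[Tuple[str, str], float]

def select_candidates(source_tags: Set[str],
                      friend_catalogs: Dict[str, Dict[str, Set[str]]],
                      reduced_matrix: WcfMatrix) -> Set[str]:
    """Return IDs of books in the friends' catalogs that share at least one
    tag exactly matching, or highly similar to, a tag of Source_Bk.

    `source_tags` holds LT_User's tags on Source_Bk (or the top-3 tag-cloud
    tags when no personal tags exist).  Two distinct tags are deemed highly
    similar when the reduced matrix holds a nonzero factor for the pair.
    """
    candidates: Set[str] = set()
    for catalog in friend_catalogs.values():
        for book_id, book_tags in catalog.items():
            if any(s == t or
                   reduced_matrix.get((s, t), reduced_matrix.get((t, s), 0.0)) > 0.0
                   for s in source_tags for t in book_tags):
                candidates.add(book_id)
    return candidates
```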

2.3.3 Ranking LibraryThing Books

PReF ranks each book CB in Candidate_Set to prioritize them for recommendation using (i) the degree of resemblance of CB and Source_Bk, (ii) the rating score assigned to CB by each friend of LT_User who includes CB in his/her personal catalog, and (iii) the relative degree of reliability of each of LT_User's friends, each of which is discussed in the remainder of Section 2.3.3.

Similarity Among Books

To determine the (content) similarity between Source_Bk and CB, PReF computes their degree of resemblance by adding the word-correlation factors between each tag in the tag cloud (provided by LibraryThing) of Source_Bk and CB, respectively, using the word-correlation matrix introduced in Section 2.3.1, instead of the reduced word-correlation matrix employed in Section 2.3.2, since the former provides a more accurate similarity measure between (tags representing) Source_Bk and CB than the reduced matrix. The degree of resemblance, denoted Resem, between Source_Bk and CB is defined as

    Resem(Source\_Bk, CB) = \frac{\sum_{i=1}^{n} Min\{\sum_{j=1}^{m} wcf(Source\_Bk_i, CB_j), 1\} \times freq_i}{\sum_{i=1}^{n} freq_i}    (2.1)

where n (m, respectively) is the number of distinct tags in (the tag cloud of) Source_Bk (CB, respectively), Source_Bk_i (CB_j, respectively) is a tag in the tag cloud of Source_Bk (CB, respectively), wcf(Source_Bk_i, CB_j) is the correlation factor of Source_Bk_i and CB_j in the word-correlation matrix, and freq_i denotes the number of times Source_Bk_i is assigned to Source_Bk as specified in the tag cloud of Source_Bk. We normalize Resem(Source_Bk, CB), so that the computed degree of resemblance is in the [0, 1] range, by dividing the accumulated correlation factors by the sum of the frequencies of occurrence of each tag assigned to Source_Bk.

The Min function in Equation 2.1 imposes a constraint on summing up the word-correlation factors of tags representing Source_Bk and CB. Even if a tag in the tag cloud of CB (i) matches exactly one of the tags in the tag cloud of Source_Bk and (ii) is similar to some of the remaining tags describing Source_Bk, which yields a value greater than 1.0, i.e., the word-correlation factor of an exact match, PReF limits the sum of their similarity measure to 1.0. This constraint ensures that if CB contains a dominant tag T in its tag cloud, i.e., T is highly similar to a few tags in the tag cloud of Source_Bk, T alone cannot significantly impact the resemblance value of Source_Bk and CB, i.e., "one" does not represent "all". Tags assigned to CB that are similar to most of the tags of Source_Bk should yield a higher degree of resemblance of Source_Bk and CB than tags assigned to CB that are similar to only one dominant tag representing Source_Bk.
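The computation in Equation 2.1 can be sketched as follows; this is a non-authoritative Python illustration, assuming the tag cloud of Source_Bk is available as a mapping from each distinct tag to its frequency, CB's tag cloud as a set of distinct tags, and the `wcf` lookup sketched in Section 2.3.1.

```python
from typing import Dict, Set, Tuple

WcfMatrix = Dict[Tuple[str, str], float]

def resem(source_cloud: Dict[str, int], cb_tags: Set[str],
          matrix: WcfMatrix) -> float:
    """Degree of resemblance between Source_Bk and CB (Equation 2.1), in [0, 1]."""
    numerator = 0.0
    total_freq = 0
    for tag_i, freq_i in source_cloud.items():
        # Cap the accumulated correlation at 1.0 so that a single dominant
        # tag of CB cannot dominate the overall resemblance score.
        capped = min(sum(wcf(tag_i, tag_j, matrix) for tag_j in cb_tags), 1.0)
        numerator += capped * freq_i
        total_freq += freq_i
    return numerator / total_freq if total_freq else 0.0
```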

16 Book Ratings

Among the books in the personal catalog of a LibraryThing user U, U might like some books more than others, which is natural. In recommending books, PReF considers the rating assigned to a book CB in Candidate_Set by a friend of LT_User, denoted LT_Pal,5 which should reflect the degree to which LT_Pal is interested in CB. PReF suggests to LT_User books given high rating scores by his/her friends, since these books are treated as more appealing to LT_User than books which are given lower ratings. PReF normalizes the rating given by LT_Pal to CB so that its value is in the range [0, 1] as follows:

    Rate(CB, LT\_Pal) = \frac{Rating\_CB}{5}    (2.2)

where Rating_CB is the rating score given to CB by LT_Pal, and '5' is the normalization factor, i.e., the highest possible rating score that can be assigned to CB. Note that not every LibraryThing user assigns a rating to each book in his/her personal catalog. Should LT_Pal not provide a rating for CB, PReF considers the collective opinion of LibraryThing users and computes Rate(CB, LT_Pal) using the average, i.e., global, rating assigned to CB by LibraryThing users as Rating_CB.
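A minimal sketch of Equation 2.2, including the fallback to the global LibraryThing rating described above; the function signature and the use of None to flag a missing personal rating are assumptions of this illustration.

```python
from typing import Optional

def rate(personal_rating: Optional[float], global_rating: float) -> float:
    """Normalized rating of CB in [0, 1]; LibraryThing ratings range from 1 to 5.

    When the friend (LT_Pal) did not rate CB, the global (average) rating
    assigned by LibraryThing users is used instead.
    """
    rating_cb = personal_rating if personal_rating is not None else global_rating
    return rating_cb / 5.0
```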

Reliability of Friends in Book Recommendations

LibraryThing friends of LT_User might include in their catalogs books on various categories, such as religion, politics, fiction, or science, and it is expected that books in certain categories might be more predominant than others in the personal catalogs of LT_User's friends. Thus, not all (the books included in the catalogs) of LT_User's friends should be given the same "weight" for book recommendation, since recommendations provided by friends who include in their catalogs a significant number of books in the same category as, i.e., similar in content to, Source_Bk are more reliable than recommendations provided by friends less familiar with the category of Source_Bk.

5 From now on, LT_Pal refers to a friend of LT_User who includes a given book (in Candidate_Set) in his/her catalog.

PReF measures the degree of reliability of a friend of LT_User, i.e., LT_Pal, in recommending books that are similar (in content) to Source_Bk as follows:

    Rel(Source\_Bk, LT\_Pal) = \frac{\sum_{i=1}^{m} Min\{\sum_{j=1}^{n} wcf(Source\_Bk_i, LT\_Pal_j), 1\}}{m}    (2.3)

where m (n, respectively) is the number of distinct tags assigned to Source_Bk by LT_User (by LT_Pal to books in his/her personal catalog, respectively), Source_Bk_i (LT_Pal_j, respectively) is a tag assigned by LT_User to Source_Bk (by LT_Pal in describing books in his/her personal catalog, respectively), and wcf(Source_Bk_i, LT_Pal_j) is the correlation factor in the word-correlation matrix between Source_Bk_i and LT_Pal_j. In Equation 2.3, m is the normalization factor that scales the corresponding degree of reliability to the [0, 1] range.
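Equation 2.3 mirrors the structure of Equation 2.1 without the frequency weighting; the minimal sketch below reuses the `wcf` lookup from Section 2.3.1 and assumes set-based tag collections.

```python
from typing import Dict, Set, Tuple

WcfMatrix = Dict[Tuple[str, str], float]

def rel(source_tags: Set[str], pal_tags: Set[str], matrix: WcfMatrix) -> float:
    """Degree of reliability of LT_Pal with respect to Source_Bk (Equation 2.3),
    normalized to [0, 1] by the number of tags LT_User assigned to Source_Bk."""
    if not source_tags:
        return 0.0
    total = sum(min(sum(wcf(s, p, matrix) for p in pal_tags), 1.0)
                for s in source_tags)
    return total / len(source_tags)
```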

Recommendations

Having determined (i) the degree of resemblance between Source_Bk and each book CB in Candidate_Set, (ii) the rate score assigned to CB by each of LT_User's friends, and (iii) the degree of reliability of each friend of LT_User, PReF computes the ranking score of CB, denoted Rank(CB), as follows:

\[
Rank(CB) = \max_{LT\_Pal_i \in Pal_{CB}} \left\{ Rel(Source\_Bk, LT\_Pal_i) \times \frac{Resem(Source\_Bk, CB) + Rate(CB, LT\_Pal_i)}{1 - \min\{Resem(Source\_Bk, CB),\ Rate(CB, LT\_Pal_i)\}} \right\} \tag{2.4}
\]

where Pal_CB is the group of LT User's friends who include CB in their personal catalogs, and LT P al_i is the i-th LT P al in Pal_CB. The Max function in Equation 2.4 ensures that the highest ranking score of CB, among the ones computed for each of LT User's friends, is considered during the recommendation process, which guarantees that no duplicate books are recommended to LT User.

By combining the resemblance and rate scores (as defined in Equations 2.1 and 2.2, respectively) using the Stanford Certainty Factor (SCF) [91], P ReF measures the relative appealing value of CB (in Candidate Set), which is high only when both the resemblance and rate scores are high, since SCF is a monotonically increasing (decreasing) function of its arguments. Furthermore, by employing the Joint Product [91] in Equation 2.4, P ReF adjusts the computed appealing value of CB based on the reliability of a friend of LT User in recommending books for Source Bk. The Top-10 ranked books are recommended to LT User, which follows the number of recommendations presented by LibraryThing to its users.
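To make the combination step concrete, here is a minimal Python sketch of Equation 2.4 under the stated definitions; the guard against a zero denominator is an added assumption, since the SCF is undefined when both scores equal 1.

```python
def rank_candidate(resem, friend_scores):
    """Sketch of Equation 2.4 for one candidate book CB.

    resem:         Resem(Source_Bk, CB) in [0, 1]
    friend_scores: list of (reliability, rate) pairs, one per friend who owns CB
    """
    best = 0.0
    for rel, rate in friend_scores:
        denom = max(1.0 - min(resem, rate), 1e-9)   # guard (assumption, not part of Eq. 2.4)
        scf = (resem + rate) / denom                # Stanford Certainty Factor of the two scores
        best = max(best, rel * scf)                 # Max over LT_Pal_i in Pal_CB
    return best
```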

Example 1 Consider the book “Emma” by Jane Austen and a LibraryThing user, Soleenusa, who

is one of the independent appraisers of P ReF interested in “Emma”. Based on the books included in the personal catalogs of Soleenusa’s LibraryThing friends, P ReF suggests 10 books that might

also be of interest to Soleenusa. As shown in Table 2.1, except for the 10th recommended book, Soleenusa marks all the books as closely related to “Emma” (in bold). Note that Books 1 to 9

are also written by Jane Austen and are in the same subject area as "Emma", which is a classical novel. Furthermore, Books 7 and 8 include two popular Jane Austen novels along with contextual and source materials, a wide range of interpretations, and bibliographical information. Compared with the books recommended by Amazon and LibraryThing for "Emma", only 2 and 5 of the recommendations generated by Amazon and LibraryThing, respectively, are regarded as closely

related by Soleenusa (as shown in Table 2.1). The remaining recommended books, such as “The Odyssey”, are considered non-relevant recommendations for “Emma” by Soleenusa. ✷

2.4 Experimental Results

In this section, we first introduce the data and metrics in Sections 2.4.1 and 2.4.2, respectively, which were used for assessing the performance of P ReF . Thereafter, we detail the results of the empirical study conducted for evaluating P ReF , in addition to comparing its performance with other existing book recommenders, in Section 2.4.3.

Rank | PReF | Amazon | LibraryThing
1 | Mansfield Park | Sense and Sensibility | Northanger Abbey
2 | Sense and Sensibility | Little Women | Lady Susan / Sanditon / The Watsons
3 | Persuasion | Oliver Twist | Mansfield Park
4 | Northanger Abbey | Pride and Prejudice | Villette
5 | Pride and Prejudice | Tess of the D'Urbervilles | Jane Eyre
6 | The Oxford Illustrated Jane Austen (Six Volume Set) | The Sonnets and A Lover's Complaint | Wuthering Heights
7 | Emma (Norton Critical Edition) | The Odyssey | The Tenant of Wildfell Hall
8 | Pride and Prejudice (Norton Critical Edition) | Alice's Adventures in Wonderland | Vanity Fair
9 | Minor Works of Jane Austen | A Christmas Carol | Tess of the D'Urbervilles
10 | A Town Like Alice | Jane Eyre | Middlemarch

Table 2.1: Recommendations generated by P ReF , Amazon, and LibraryThing, respectively, in response to the book "Emma" by Jane Austen

2.4.1 Experimental Data

To analyze the performance of P ReF , we rely on data extracted from LibraryThing that contain personal information of a group of independent appraisers6 who are LibraryThing users, which include (i) (tags and ratings of) books in their personal catalogs, (ii) lists of their friends, and (iii) (tags and ratings of) books posted under their friends’ personal catalogs. In addition, the extracted data include the tag cloud and the global rating score of each book listed in (i) and (iii) above. To the best of our knowledge, there is no existing dataset for assessing the performance of personalized book recommenders, and thus we rely on independent appraisers who manually examined the relatedness of each one of the top-10 recommendations generated by P ReF with respect to each book in their personal catalogs, yielding a set of 100 books, denoted T est Books,

used in our empirical study.

6To conduct the initial empirical evaluation on the performance of PReF , we relied on 25 LibraryThing users, randomly selected from the site, that had at least one explicitly established connection with other site members.

2.4.2 Evaluation Metrics

To evaluate the effectiveness of P ReF in generating personalized book recommendations, we apply two well-known information retrieval metrics, the (overall) Precision at K and Mean Reciprocal

Rank [42].

The P @K value quantifies the relatedness of the top-K ranked recommended books for a particular book in T est Books, which measures the overall user's satisfaction with the top-K recommendations (generated by P ReF ).

\[
P@K = \frac{\sum_{i=1}^{N} \frac{\text{Number of Related Recommendations}_i}{K}}{N} \tag{2.5}
\]

where K is the (pre-defined) number of book recommendations to be considered, N is the number of books in T est Books, i is a book in T est Books, and Number of Related Recommendations_i is the number of recommendations out of K that are evaluated as related to book i by a particular appraiser who owns i. Note that in our study, we set K = 1, 5, and 10, to evaluate the relatedness of the recommendations positioned at the top, middle, and overall in the ranking, respectively.

Since, as stated in Section 2.3.3, we only evaluated the top-10 recommendations generated by a

book recommendation system, its P @10 score is the same as its accuracy score, a well-known metric in information retrieval [42]. The Mean Reciprocal Rank (MRR) of the ranked book recommendations generated by

P ReF is the averaged sum of the ranking values for the recommendations computed for each book in T est Books such that each ranking value is either the reciprocal of the ranking position of the first related recommendation among the top-10 recommendations, if there is any, or 0, otherwise.

\[
MRR = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{r_i} \tag{2.6}
\]

where r_i is the (position in the) rank of the first related recommendation with respect to book i in T est Books, if it exists, and N and i are as defined in Equation 2.5.

While P @10 measures the overall user's satisfaction with the recommendations created by P ReF , P @K and MRR evaluate the ranking strategy of P ReF , since the higher related recommendations are ranked, the higher their corresponding P @K and MRR scores should be. To further assess the efficiency of our personalized book recommender, we employ the imp metric [143], which is a widely-used evaluation method that measures the level of improvement of a personalized approach when compared to a baseline, i.e., non-personalized, approach in ranking relevant recommended resources, i.e., books in our case. The overall ranking improvement of a personalized recommender is calculated by averaging the improvement for all the books in T est Books as follows:

\[
imp = \frac{\sum_{i=1}^{N} \left( \frac{1}{rp_i} - \frac{1}{rb_i} \right)}{N} \tag{2.7}
\]

where N and i are as defined in Equation 2.5, and rp_i (rb_i, respectively) is the position in the ranking of the first relevant recommendation as determined by a personalized (baseline, respectively) recommender. The higher the imp score is, the better the ranking strategy adopted by a recommender is, i.e., imp shows the effectiveness of a personalization technique at moving "good" (book) recommendations to the top of the list [143].
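For reference, the three metrics can be computed along the following lines (an illustrative Python sketch, not the evaluation code used in the study); each test book is assumed to be represented by the 1-based ranks of its related recommendations within the top-10.

```python
def precision_at_k(related_ranks_per_book, k):
    """Eq. 2.5: average fraction of the top-k recommendations judged related."""
    per_book = [sum(1 for r in ranks if r <= k) / k for ranks in related_ranks_per_book]
    return sum(per_book) / len(per_book)

def mean_reciprocal_rank(related_ranks_per_book):
    """Eq. 2.6: reciprocal rank of the first related recommendation (0 if none)."""
    values = [1.0 / min(ranks) if ranks else 0.0 for ranks in related_ranks_per_book]
    return sum(values) / len(values)

def imp(personalized_first_ranks, baseline_first_ranks):
    """Eq. 2.7: average gain in reciprocal rank of a personalized recommender over a baseline."""
    gains = [1.0 / rp - 1.0 / rb
             for rp, rb in zip(personalized_first_ranks, baseline_first_ranks)]
    return sum(gains) / len(gains)
```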

2.4.3 Performance Evaluation and Comparisons

In this section, we present the experimental results achieved by P ReF and compare its performance with the recommendation systems of Amazon and LibraryThing,7 which are two well-known, commercial book recommenders. While the recommender of Amazon has been introduced in Section 2.2, the recommendation system of LibraryThing (i) compares books in a user's personal catalog with thousands of books in other users' catalogs, (ii) considers common tags assigned to (the tag clouds of) books, and (iii) identifies books with similar Library of Congress

7From now on, unless stated otherwise, whenever we mention Amazon (LibraryThing, respectively), we mean Amazon’s (LibraryThing’s, respectively) book recommender.

Subject Heading and/or Classification to provide a list of books a user might be interested in. (See http://www.librarything.com/wiki/index.php/Automatic recommendations.) In comparing P ReF with Amazon and LibraryThing, we rely on the same group of independent appraisers (as discussed in Section 2.4.1) who determine which of the top-10 books recommended by P ReF , Amazon, and LibraryThing, respectively, for each book B in T est Books are related to B. Note that since P ReF is based on the premise that a user U tends to trust recommendations made by his/her friends, books recommended by P ReF to U are books in the personal catalogs of U's friends, whereas books recommended by Amazon (LibraryThing, respectively) are extracted from the entire collection of books available at Amazon (LibraryThing, respectively).

Assessment

To assess the overall performance of P ReF (Amazon and LibraryThing, respectively), we have computed the P @K scores on the top-10 book recommendations generated by P ReF , Amazon, and LibraryThing, respectively, for each book B in T est Books, based on the books labeled as (not) related to B by each independent appraiser. As shown in Figure 2.2, the P @1 score of P ReF , which is 0.90, indicates that for 90 out of the 100 books in T est Books, the first recommended book generated by P ReF , i.e., the book with the highest ranking score, was treated as related. A high P @1 score implies that the ranking strategy of P ReF is highly effective in presenting first the books that users are interested in. On the other hand, the P @1 scores achieved by Amazon and LibraryThing on the top-10 recommendations generated for books in T est Books

are 0.63 and 0.77, respectively, which are at least 13% lower compared with P ReF ’s P @1 score. As previously stated, P @5 measures the overall user satisfaction with respect to the top- 5 recommended books. Figure 2.2 shows that the P @5 score of P ReF is at least 21% higher

than the P @5 scores of Amazon and LibraryThing. The outcome demonstrates that P ReF , in general, positions books that are relevant to a particular user higher in the list of recommendations than Amazon and LibraryThing do. The P @10 scores of P ReF , Amazon, and LibraryThing are 0.78, 0.53, and 0.48, respectively, as shown in Figure 2.2. Based on the P @10

values, on average, close to 8 out of the 10 books recommended by P ReF are perceived as related recommendations, as opposed to roughly five of the books recommended by Amazon and LibraryThing.

Figure 2.2: Precision@K (K = 1, 5, and 10) scores on the (top-10) recommendations achieved by P ReF , Amazon, and LibraryThing for the books in T est Books

Besides the P @K scores, we have also assessed the performance of P ReF (Amazon and LibraryThing, respectively) based on the MRR metric. As shown in Figure 2.3, the MRR scores computed for P ReF , Amazon, and LibraryThing are 0.93, 0.74, and 0.80, respectively, which reflect that, while on average users of P ReF are required to browse through roughly the top (≈ 1/0.93 = 1.07) generated recommendation before locating one that is related to a book that they own or are examining, Amazon's and LibraryThing's users, on the other hand, scan through at least one (≈ 1/0.74 = 1.35 and ≈ 1/0.80 = 1.25, respectively) recommended book before identifying one that is appealing to them. Lastly, we have computed the imp score of P ReF over Amazon and LibraryThing. As shown in Figure 2.3, P ReF achieves a 25% (16%, respectively) improvement over Amazon (LibraryThing, respectively) in generating book recommendations relevant, i.e., appealing, to users.

Observations

Figure 2.3: MRR and imp scores based on (un)related books recommended by P ReF , Amazon, and LibraryThing for the books in T est Books

It is worth mentioning that P ReF always presents to users ten recommendations for each given book, as opposed to Amazon and LibraryThing, which occasionally generate fewer than ten recommendations, the expected number. Furthermore, at least one of the top-10 recommendations generated by P ReF for each book in T est Books is treated as related to the corresponding book by the appraisers. However, Amazon (LibraryThing, respectively) generated

either (i) no recommendations at all or (ii) no related recommendations for 8 (23, respectively) books in T est Books. As shown in Figures 2.2 and 2.3, P ReF is more effective than Amazon or LibraryThing in recommending books that satisfy the personal interest of a user, which illustrates that considering

(i) data extracted from a social website along with (ii) personal interactions of a user in a social environment enriches the effectiveness of book recommendations.

2.5 Conclusions

It is an unpleasant experience for book enthusiasts to acquire books and later discover that the

books do not appeal to their "tastes". In addition, it is difficult for book enthusiasts to keep track of the new books published on a regular basis, given their sheer number. Existing book recommenders, such as the one employed by LibraryThing, aid users in identifying books of interest. These recommenders, however, present the same recommendations to users who share the same profile

information or common interests and hence are inadequate, since the suggestions often do not meet

individual users' preferences. To address this problem, we have developed a personalized book recommender, called P ReF . P ReF relies on (i) online connections, i.e., friendships, established among users at a social website, (ii) the existence of user-generated tags and ratings, and (iii) word-correlation factors, i.e., word-similarity measures, to generate book recommendations tailored to

the interests of an individual user. Unlike recommenders that rely on the "wisdom of crowds" to make recommendations, P ReF considers only interests shared between a user U and members of U's "inner circle", which yields valuable recommendations for U. In addition, P ReF is not limited by an exact-match constraint and thus identifies books similar in content, even if they do not share any common tags, which enriches the set of candidate books to be recommended.

We have conducted an empirical study using data extracted from LibraryThing to assess the effectiveness of book recommendations generated by P ReF and compare (the performance of) P ReF with two well-known recommenders, i.e., the ones employed by Amazon and LibraryThing. The study has verified that P ReF outperforms the recommenders adopted by Amazon and LibraryThing in generating personalized book recommendations. While P ReF is currently designed for recommending books, we intend to extend P ReF so that it can recommend items in various domains, such as songs and movies, provided that data describing items of interest and friendships among users are available on one or more social websites.

Chapter 3

Exploiting the Wisdom of Social Connections to Make Personalized Recommendations on Scholarly Articles

Abstract:

Existing scholarly publication recommenders were designed to aid researchers, as well as ordinary users, in discovering pertinent literature in diverse academic fields. These recommenders, however,

often (i) depend on the availability of users' historical data in the form of ratings or access patterns, (ii) generate recommendations pertaining to users' (articles included in their) profiles, as opposed to their current research interests, or (iii) fail to analyze valuable user-generated data at social sites that can enhance their performance. To address these design issues, we propose P ReSA, a personalized recommender on scholarly articles. P ReSA recommends articles bookmarked by the connections of a user U on a social bookmarking site that are not only similar in content to a target publication P currently of interest to U but are also popular among U's connections. P ReSA (i) relies on the content-similarity measure to identify potential academic publications to be recommended and (ii) uses only information readily available on popular social bookmarking sites to make recommendations. Empirical studies conducted using data from CiteULike have verified the efficiency and effectiveness of (the recommendation and ranking strategies of) P ReSA, which outperforms a number of existing (scholarly publication) recommenders.

3.1 Introduction

Web search tools employed by scientific digital libraries, such as ACM Portal and IEEE Xplore, are designed to retrieve archived publications in diverse technological fields using keyword queries.

These search tools, however, not only are inadequate for performing personalized/contextual searches, but also present users with an overwhelming number of items to choose from [106]. To assist users in coping with the information overload problem and minimize the time and effort imposed on the users in discovering articles that are appealing, but unknown, to them, a number of publication recommenders have been developed [23, 50].

Non-personalized recommenders on scholarly articles archived at well-known digital libraries adopt a "one-size-fits-all" strategy which suggests the same publications to users without considering any external knowledge about the users [111]. Existing personalized recommenders, on the other hand, either (i) require historical data in the form of ratings or citations [108], which may not be publicly available, or (ii) rely on the exact string-matching approach to compare the content of potential publications to be recommended with their counterparts in users' profiles [109], which excludes articles of interest to their users that are represented using analogous, but different, keywords. Furthermore, most of these recommenders fail to consider their users' current information needs and solely focus on the general interests of the users instead [103], which demonstrates that personalization is not yet fully exploited.

To suggest academic citations relevant to a target publication P of interest to a user U, a common inquiry conducted within the academic setting these days, we introduce P ReSA, a personalized recommender on scholarly publications. Based on the premise that a person often turns to other people whom (s)he trusts when seeking advice [93, 133], P ReSA considers the connections explicitly established by U on a social bookmarking site and identifies (candidate) articles to be recommended among the ones bookmarked by U's connections that are similar in content to P , but are not necessarily described using the same keywords/tags as the ones assigned to P . To determine which candidate publication CP should be recommended, P ReSA applies a ranking strategy based on the weighted linear combination of multiple content descriptors of CP , which include its title, abstract, and tags, in addition to the popularity of CP among U's connections.

Unlike collaborative-filtering-based recommendation systems, P ReSA does not require user feedback in the form of ratings. Rather, it takes advantage of data, which include bookmarked items, content descriptors on items, and users' connections, readily available on popular sites that archive scholarly publications, such as CiteULike.org. As opposed to employing an exact-matching strategy, P ReSA relies on a similarity-matching approach based on word-correlation factors [78] applied to the content descriptors of publications to make recommendations. Moreover, P ReSA, which requires neither domain-specific information nor training (other than to determine the weight of content descriptors and popularity measures), can easily be adapted to suggest items other than publications, provided that the required data, i.e., tags, title, a short description of

each item and social interactions, are available. We have conducted an empirical study using data from CiteULike which validates the efficiency and effectiveness of P ReSA. The conducted study also demonstrates that P ReSA outperforms a number of traditional (social) recommendation systems (on scholarly publications).

The remainder of this chapter is organized as follows. In Section 3.2, we discuss existing publication recommenders. In Section 3.3, we detail the design of P ReSA. In Section 3.4, we present the empirical study conducted to assess the performance of P ReSA. In Section 3.5, we give our conclusions.

3.2 Related Work

During the past decade, the demand for recommendation/filtering systems to solve the web information overload problem has become apparent [23]. While a number of recommenders have been developed to suggest a variety of items, such as news articles/blog posts, songs, and movies, only a few recommenders are designed for suggesting scholarly publications [23]. The recommender proposed in [16] applies a content filtering approach based on the Vector Space Model and TF-IDF weighting scheme to assign conference paper submissions to program committee members for review. Sugiyama and Kan [148] consider scholarly articles published by an author A, in addition to reference papers cited in work published by A, to recommend publications that appeal to A. Ekstrand et al. [50] measure the influence of a scholarly publication among a web of citations, using algorithms such as HITS and PageRank, to enhance existing content-based and collaborative filtering recommendation approaches. Parra and Brusilovsky [118] introduce two variants of a user-based collaborative filtering method which analyze the information available on

existing tagging systems, such as CiteULike, for suggesting scientific articles. Pudhiyaveetil et al. [128] employ a concept-based approach that relies on the ACM classification tree and documents accessed by users on CiteSeer to create user profiles and suggest items to the users. The recommenders in [16, 50, 118, 128, 148] suggest publications that appeal to a user's general interests, as opposed to P ReSA, which recommends publications relevant to a target research paper currently of interest to the user. Furthermore, unlike P ReSA, these recommenders rely either on information inferred from a network of citations, the actual content of publications, or user feedback in the form of ratings, which are not always publicly available [82]. The authors in [24] compare user- and item-based collaborative filtering algorithms designed for recommending scholarly publications. They analyze metadata, such as tags or author(s), available at social bookmarking sites and use well-known fusion strategies, such as CombSUM and CombMNZ, to combine the output lists of recommendations generated by the algorithms. Nascimento et al. [109], on the other hand, represent each publication P as a vector V such that each component in V indicates the weighted frequency of a term in the abstract or title of P and calculate the similarity among publications using the well-known cosine metric. Unlike P ReSA, the recommenders in [24, 109] do not employ a similarity measure and thus cannot identify publications similar in content that are represented using different keywords/tags which are analogous in meaning.

To make recommendations, the approach in [72] employs an LDA-based model, which determines the research problem and the proposed solution presented in the abstract of a target paper, along with a citation graph of academic publications. The recommender in [72], like P ReSA, makes recommendations based on a target paper P . However, while the former suggests publications matching either the problem or the solution presented in P , the latter suggests publications matching the diverse subject areas addressed in P . Bellogin et al. [18] compare the performance of content-based/collaborative filtering recommenders with their proposed social recommender using data from Last.FM. The latter applies the traditional collaborative filtering strategy but replaces the set of nearest neighbors of a user U by users who are explicitly connected to U. P ubRec [121] also considers articles bookmarked by connections of a user U on CiteULike to make recommendations relevant to a scholarly article P of interest to U. Unlike P ubRec, P ReSA does not require the ratings of publications to generate recommendations and takes full advantage of a variety of content descriptors, other than CiteU-

Like tags, to more precisely compute the degree of resemblance between P and a publication to be recommended. Besides the recommenders in [18, 121], recommenders such as the ones proposed in [27, 77, 79] also incorporate explicit relationships established among social site users to enhance the recommendation process. However, to the best of our knowledge, none of the existing recommenders on scholarly publications (with the exception of P ubRec) is based on the premise that people "tend to rely more on recommendations from people they trust than online systems which generate recommendations based on anonymous similar people" [133]. Indeed, P ReSA considers explicit relationships among users on a social site as part of its recommendation process.

3.3 Our Proposed Recommender

In this section, we detail the design of P ReSA which relies on data extracted from CiteULike to make personalized recommendations of scholarly articles. Given a CiteULike user Cusr, P ReSA

first identifies Cusr's connections (as discussed in Section 3.3.1). Using word-correlation factors (introduced in Section 3.3.2), P ReSA determines the set of publications, denoted CandidateP , among the ones in the personal libraries of Cusr's connections that are similar in content (to a certain degree) to a target publication P in which Cusr is interested (as detailed in Section 3.3.3). Thereafter, P ReSA recommends to Cusr the 10 publications in CandidateP with the highest ranking scores (computed in Section 3.3.4). The recommendation process of P ReSA is illustrated in Figure 3.1.

Figure 3.1: The architecture of the proposed personalized recommender on scholarly articles, P ReSA

3.3.1 CiteULike

CiteULike, which is one of the leading social sites established for managing and sharing bibliographic references, includes 6,462,237 indexed articles (as of November 24, 2012) and allows its users to organize, post, and search publication information. A CiteULike user's personal library includes a number of bibliographic references bookmarked by the user. Besides bookmarking publications and maintaining their metadata, CiteULike users can add personal comments, ratings, and tags to publications in their personal libraries [23]. Each publication P indexed in CiteULike can be assigned a list of tags by the CiteULike users who have bookmarked P . The list is used by

P ReSA to infer the tag cloud of P , which is a global visual representation of the tags assigned to P , including their frequencies. CiteULike offers its users an infrastructure to establish explicit communication channels with other CiteULike users. Explicitly-connected users, called connections in CiteULike, can exchange private messages and share bibliographic references of interest with one another.1

1See wiki.citeulike.org/index.php/Social Features for all the social features offered by CiteULike.

3.3.2 Word Correlation Factors

P ReSA relies on the pre-computed word-correlation factors (bounded between 0 and 1) in the word-correlation matrix [78] to determine the similarity between any two tags/keywords. The word-correlation factors are used for identifying candidate publications to be considered for recommendation (as detailed in Section 3.3.3) and determining the degrees of resemblance of candidate publications to a given publication based on their tag clouds/titles/abstracts (as discussed in Section 3.3.4). Word-correlation factors were created using a set of approximately 880,000 Wikipedia documents (wikipedia.org). Each correlation factor indicates the degree of similarity of the two corresponding words2 based on their (i) frequency of co-occurrence and (ii) relative distances in each Wikipedia document. Wikipedia documents were chosen for constructing the word-correlation matrix, since they were written by more than 89,000 authors with different writing styles, and the

documents cover a wide range of topics with diverse word usage and contents. Compared with the sets of synonyms/related words compiled by the well-known WordNet (wordnet.princeton.edu) in which pairs of words are not assigned similarity weights, word-correlation factors provide a more sophisticated measure of word similarity.

3.3.3 Selecting Candidate Publications

As previously stated, since P ReSA is based on the premise that a user tends to trust recommendations made by other users with whom (s)he has an explicit connection, scholarly publications recommended by P ReSA to each user Cusr are publications chosen from the personal libraries of Cusr's connections on CiteULike. As the number of publications bookmarked by each of Cusr's connections can be large, i.e., in the thousands, it is inefficient to compare each publication with P , a publication of interest to Cusr, to identify the ones similar to P to be recommended to Cusr. This computation can significantly prolong the recommendation process of P ReSA. To minimize

2Words in the Wikipedia documents were stemmed after all the stopwords were removed. From now on, unless stated otherwise, (key)words/tags refer to non-stop, stemmed (key)words/tags.

the number of comparisons and thus reduce the processing time required in generating recommendations, P ReSA applies a blocking strategy3 on articles included in the personal libraries of Cusr's connections to generate a subset of articles (excluding P ), denoted CandidateP , to be considered for recommendation. Each publication in CandidateP contains at least one tag exactly matching or highly similar to one of the personal tags of P assigned by Cusr. As publications in CandidateP and P share the same (or analogous) tags, it is anticipated that they are similar in content to a certain degree and address the same or a similar topic. If Cusr has not assigned any personal tags to P , P ReSA relies on the tags in the tag cloud of P to perform the blocking task. To identify highly similar tags, P ReSA employs a reduced version of the word-correlation matrix (introduced in Section 3.3.2) which contains 13% of the most frequently-occurring words (based on their frequencies of occurrence in the Wikipedia documents), and for the remaining 87% of the less-frequently-occurring words, only the exact-matched correlation factor, i.e., 1.0, is used. By adopting a reduced version of the word-correlation matrix to determine potentially similar publications, the overall processing time of P ReSA is significantly reduced without affecting its accuracy [121].
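The blocking step can be pictured with the following Python sketch; the reduced-matrix lookup reduced_wcf and the similarity threshold are hypothetical, since the text only states that a tag must exactly match or be highly similar to one of the personal tags of P.

```python
def block_candidates(target_tags, connection_pubs, reduced_wcf, threshold=0.9):
    """Keep only publications sharing an exact or highly similar tag with P.

    target_tags:     personal tags of P (or tags in P's tag cloud if none exist)
    connection_pubs: iterable of (pub_id, tags) bookmarked by Cusr's connections
    reduced_wcf:     lookup into the reduced word-correlation matrix (assumed helper)
    threshold:       cutoff for "highly similar" (illustrative value only)
    """
    candidates = []
    for pub_id, tags in connection_pubs:
        if any(t == p or reduced_wcf(t, p) >= threshold
               for t in tags for p in target_tags):
            candidates.append(pub_id)
    return candidates
```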

Example 2 Consider the scholarly publication SPA entitled “Knowing me, Knowing you: Using Profiles and Social Networking to Improve Recommender Systems” that has been bookmarked by

CusrA, a real CiteULike user. (The CiteULike user name of CusrA is not used due to the privacy policy established by CiteULike.) The title of SPA and the tags assigned to SPA by CusrA (as appeared in CiteULike) are shown in Figure 3.2.

P ReSA first identifies CandidateSPA, i.e., the set of candidate publications considered to be recommended to CusrA based on his/her interest in SPA. Applying the blocking strategy, P ReSA decreases the number of publications to be considered from 1,221, which is the total number of distinct publications bookmarked by CusrA's connections, to 199, which yields

CandidateSPA. Figure 3.3 shows (a few of) the publications bookmarked by CusrA's connections, along with the personal tags assigned by the connections who include the corresponding publications in their personal libraries. A subset of the publications in CandidateSPA is shown in Figure 3.4. ✷

3A blocking strategy is a filtering technique that reduces the potentially very large number of comparisons to be made among records, i.e., publications in CiteULike in our case.

Figure 3.2: A publication, SPA, bookmarked by CusrA, a CiteULike user, along with the tags assigned to SPA by CusrA

3.3.4 Ranking of Scholarly Publications

P ReSA ranks each publication CP in CandidateP to prioritize it for recommendation using the (i) degree of similarity between P and CP according to the tags in their corresponding tag clouds (calculated in Section 3.3.4), (ii) title similarity between P and CP (determined in Section 3.3.4), (iii) abstract similarity between P and CP (computed in Section 3.3.4), and (iv) number of Cusr's connections who include CP in their personal libraries (discussed in Section 3.3.4). Note that (i), (ii), and (iii) capture the content similarity between P and CP , whereas (iv) reflects the popularity of CP .

Tag Similarity of P and CP

To determine the similarity between the tags assigned to P and CP , denoted Sim(P , CP ),

P ReSA adds the word-correlation factors between each tag in the tag cloud of P and CP in CiteULike, respectively. Tags in the tag cloud of P (CP , respectively) are considered, since these tags provide a more comprehensive description of (the content of) P (CP , respectively), as opposed to the personal tags assigned to P (CP , respectively), which only reflect the personal opinion of, and vocabulary used by, Cusr (one of Cusr's connections, respectively) in describing (the content of) P (CP , respectively). The word-correlation matrix introduced in Section 3.3.2 is used, instead of the reduced word-correlation matrix used in Section 3.3.3 for blocking, since the former provides

a more accurate overall similarity measure between (all the tags representing) P and CP than the latter.

Figure 3.3: A subset of the publications bookmarked by CusrA's connections, along with their corresponding sets of personal tags. Tags exactly matching (highly similar to, respectively) the ones shown in Figure 3.2 are underlined (italicized, respectively)

Figure 3.4: Some publications in CandidateSPA

\[
Sim(P, CP) = \sum_{i=1}^{n} \min\left\{\sum_{j=1}^{m} wcf(P_i, CP_j),\ 1\right\} \times freq_{P_i} \tag{3.1}
\]

where n (m, respectively) is the number of distinct tags in the tag cloud of P (CP , respectively), P_i (CP_j, respectively) is a tag in the tag cloud of P (CP , respectively), wcf(P_i, CP_j) is the correlation factor of P_i and CP_j in the word-correlation matrix, and freq_{P_i} denotes the number of times P_i is assigned to P as specified in the tag cloud of P .

Freq_{P_i} in Equation 3.1, which indicates the number of CiteULike users who have chosen P_i to tag P , reflects the relative degree of significance of P_i in representing the content of P . The larger freq_{P_i} is, the more significant P_i is in representing the content of P , which is based on the assumption that a content-indicator tag is more likely chosen by users to identify the content/topic of its corresponding publication. The Min function in Equation 3.1 imposes a constraint on adding the word-correlation

factors of tags assigned to P and CP . Even if a tag in the tag cloud of P (i) matches exactly one of the tags and (ii) is similar to some of the remaining tags in the tag cloud of CP , which yields a value greater than 1.0, i.e., the word-correlation factor of an exact match, P ReSA limits the sum of their similarity measures to 1.0. This constraint ensures that if P contains a dominant tag T in its tag cloud, i.e., T is highly similar to a few tags in the tag cloud of CP , T alone cannot significantly

impact the similarity value between P and CP , i.e., “one does not represent all.” Tags in the tag cloud of P that are similar to most of the tags in the tag cloud of CP should yield a higher degree of similarity of P and CP than the existence of only one dominant tag in P .
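A small Python sketch of Equation 3.1, assuming the tag cloud of P is given as tag-to-frequency pairs and wcf is a lookup into the word-correlation matrix:

```python
def tag_similarity(p_tag_cloud, cp_tags, wcf):
    """Sketch of Equation 3.1: correlation mass of P's tag cloud against CP's, capped per tag.

    p_tag_cloud: dict {tag: frequency} inferred for P
    cp_tags:     distinct tags in the tag cloud of CP
    wcf:         lookup, wcf(p_tag, cp_tag) -> correlation factor in [0, 1]
    """
    sim = 0.0
    for p_tag, freq in p_tag_cloud.items():                     # i = 1..n
        summed = sum(wcf(p_tag, cp_tag) for cp_tag in cp_tags)  # inner sum over j = 1..m
        sim += min(summed, 1.0) * freq                          # cap at 1.0, then weight by freq_Pi
    return sim
```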

Example 3 Consider a snapshot of the tags in the tag clouds of SPA, SP1, and SP2 as shown

in Figure 3.5. The tag clouds of both SP1 and SP2 include two tags that exactly match their

corresponding tags in the tag cloud of SPA, i.e., “recommender” and “personalization,” and thus

should be treated as equally similar to SPA using the exact-matching approach. By employing the

word-correlation factors, however, P ReSA is able to determine a more accurate degree of similarity among the publications. The degree of similarity (computed using Equation 3.1) between SPA and SP1 is 3.8, whereas the similarity between SPA and SP2 is only 2.9, which accurately reflects that SP1, a publication that discusses "the use of social networking data to enhance recommenders," is more similar (in content) to SPA than SP2, which presents several strategies for "improving the recommendation task." ✷

Figure 3.5: A snapshot of the (inferred) tag clouds of SPA, SP1, and SP2 in Figures 3.2 and 3.3, respectively, such that the tags of SP1 and SP2 that exactly match (are highly similar to, respectively) the tags describing SPA are underlined (italicized, respectively)

Title Similarity of P and CP

As stated in [161], the title of a publication captures its content, i.e., its subject matter. It is thus deemed appropriate to consider the degree of resemblance between P and CP partially based on (the similarity of) their respective titles.

P ReSA computes Sim(TP , TCP ), which is the similarity between the titles TP of P and

TCP of CP , using Equation 3.1, where n (m, respectively) is the number of distinct keywords in the title of P (CP , respectively), Pi (CPj, respectively) is a keyword in the title of P (CP , respectively), wcf(Pi, CPj) is the correlation factor of Pi and CPj in the word-correlation matrix,

and freqPi denotes the number of times Pi appears in the title of P .

Abstract Similarity of P and CP

Besides the titles of publications, abstracts are publicly available as metadata on publications [109]. Unlike the tag cloud of P which characterizes the corresponding users’ personal preferences, in

terms of using chosen keywords, to describe P , the abstract of P is a brief summary of the content of P created by its author(s), which provides its readers a quick overview of P [161]. Along with the tag and title similarity between P and CP , P ReSA considers the abstract similarity of P and CP to partially determine the degree of relevance of CP with respect to P . As defined for Sim(TP , TCP ), the similarity between the abstracts AP of P and ACP of CP , Sim(AP ,

ACP ), is computed using Equation 3.1.

Popularity of Scholarly Publications

In addition to computing the content similarity between P and CP , P ReSA further considers the popularity of CP , which is determined by the number of Cusr's connections who include CP in their personal libraries. Publications that attract the attention of Cusr's connections are more likely to be bookmarked in their personal libraries. P ReSA considers the fact that publications frequently bookmarked by Cusr's connections may also be of interest to Cusr, since Cusr and his/her connections share common interests to a certain degree. While solely relying on the popularity of an item in making recommendations leads to less personalized recommendations (which does not apply to P ReSA), Adomavicius and Kwon [3] claim that the accuracy of recommendations can be enhanced by considering the popularity of the item, along with other measures, during the recommendation process. P ReSA computes the popularity score of CP , which ranges between 1 and the total number of connections of Cusr on CiteULike, as another factor to be considered for its recommendation to Cusr.

\[
Popularity(CP) = \sum_{Ccon \,\in\, Connections_{Cusr}} Bookmarked(Ccon, CP) \tag{3.2}
\]

where Connections_Cusr is the set of Cusr's connections and Bookmarked(Ccon, CP ) is "1" if Ccon includes CP in his/her personal library, and is "0", otherwise.

39 Ranking Score

After computing the (i) degree of content similarity between P and each CP in CandidateP based on their respective tag clouds, titles, and abstracts and (ii) the popularity score of CP , P ReSA

calculates the ranking score of CP , denoted Rank(CP ), by employing the well-known weighted linear combination strategy [109].

\[
Rank(CP) = \sum_{c=1}^{N} w_c \times nScore_c(CP) \tag{3.3}
\]

where N is set to 4, which is the number of distinct measures considered in positioning CP in the ranking of recommended articles, i.e., the four scores computed for CP in Sections 3.3.4-3.3.4, namely Sim(P , CP ), Sim(TP , TCP ), Sim(AP , ACP ), and Popularity(CP ), nScore_c(CP ) is the normalized score calculated by the c-th (1 ≤ c ≤ 4) measure for CP , and w_c is the weight of the c-th measure. In computing Rank(CP ), it is necessary to scale the original scores determined by each of the measures in Sections 3.3.4-3.3.4, respectively, into a common range, which can be achieved by applying Equation 3.4 so that each normalized score is within the range [0, 1], a common

range [24].

\[
nScore_c(CP) = \frac{Score_c(CP) - Score_c^{min}}{Score_c^{max} - Score_c^{min}} \tag{3.4}
\]

where Score_c(CP ) is the score of CP computed using the c-th measure, and Score_c^{max} (Score_c^{min}, respectively) is the maximum (minimum, respectively) value computed using measure c on publications in CandidateP . Rather than treating each measure computed for CP as equally significant in determining

the overall ranking score of CP , P ReSA considers the degree of importance, i.e., weight, of each individual measure defined in Sections 3.3.4-3.3.4, respectively (as shown in Equation 3.3). To define these weights, P ReSA relies on SV M rank, which is the implementation of RankSVM4 [73]

4The implementation of SVM rank is available at cs.cornell.edu/people/tj/svm_light/svm_rank.html.

based on the Support Vector Machine (SVM) classification method. As detailed in [17], SV M rank solves a maximum-margin optimization problem by finding a hyperplane, defined as W = <w1, w2, w3, w4>, which is the vector of weights associated with each of the four measures computed by P ReSA. The hyperplane provides an ideal separation of candidate publications into relevant and non-relevant. Each training instance I, which represents an article A (not) to be recommended to a CiteULike user U based on U's interest in a target publication and is used as an input to the RankSVM learning process, is a vector of the form I = <m1, m2, m3, m4, b>, where m1, m2, m3, and m4 are the measures computed for A in Sections 3.3.4-3.3.4, respectively, and b is either "1" or "0", which indicates whether A is a relevant or non-relevant recommendation for U.

We rely on RankSVM to establish the weight of each of the four measures (i.e., wc in Equation 3.3), since RankSVM (i) is publicly available, as opposed to other proprietary methods, (ii) requires a very short, one-time training processing step,5 and (iii) is highly accurate [17]. Having calculated the ranking score of each scholarly article in CandidateP using Equation 3.3, P ReSA recommends the top-10 ranked publications to Cusr.
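The normalization and weighting steps (Equations 3.4 and 3.3) can be sketched in Python as follows; the weights are placeholders standing in for the values learned offline with RankSVM, and the zero-range guard is an added assumption.

```python
def rank_candidates(raw_scores, weights, top_k=10):
    """raw_scores: dict {cp_id: [tag_sim, title_sim, abstract_sim, popularity]}
    weights:    the four learned weights w_c (placeholders here)"""
    ids = list(raw_scores)
    normalized = {i: [] for i in ids}
    for c in range(len(weights)):                      # Equation 3.4: min-max scaling per measure
        column = [raw_scores[i][c] for i in ids]
        lo, hi = min(column), max(column)
        for i in ids:
            value = 0.0 if hi == lo else (raw_scores[i][c] - lo) / (hi - lo)
            normalized[i].append(value)
    ranked = sorted(ids,                               # Equation 3.3: weighted linear combination
                    key=lambda i: sum(w * s for w, s in zip(weights, normalized[i])),
                    reverse=True)
    return ranked[:top_k]                              # PReSA keeps the top-10 ranked publications
```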

Example 4 P ReSA computes (i) the tag, title, and abstract similarity between each one of the

199 candidate publications CPA and SPA (as shown in Figure 3.2 and mentioned in Example 2) and (ii) the popularity of CPA among CusrA's connections. Table 3.1 displays the normalized scores, i.e., Tag Sim(ilarity), Title Sim(ilarity), and Abs(tract) Sim(ilarity), of the publications explicitly shown in Figure 3.4 with respect to SPA, along with their normalized Popularity scores. Using the weighted linear combination strategy (as defined in Equation 3.3), P ReSA iden-

tifies the most relevant publications (based on their respective Rank scores) to SPA. The ranking positions of the articles suggested by P ReSA (i.e., publications explicitly shown in Figure 3.4) for

CusrA based on CusrA's interest in SPA are displayed in Table 3.1. The publications suggested by P ReSA are not only (to a certain degree) similar (in content) to SPA (as determined by both manual examination and their relatively high tag, title, and abstract similarity with respect to SPA), but they are also popular (based on the frequency with which they were bookmarked) to a certain degree among CusrA's connections.

5We have empirically established that, on the average, it takes 4 seconds to train the RankSVM using 11,000 training instances to determine the weight of each measure employed by PReSA for ranking candidate publications. The training instances do not overlap with the dataset described in Section 3.4.1, which is used to assess the overall performance of PReSA.

Publication | Tag Sim | Title Sim | Abs Sim | Popularity | Rank Score | Rank Order
SP1 | 0.54 | 0.48 | 0.39 | 0.60 | 4.21 | 4
SP2 | 0.62 | 0.49 | 0.52 | 0.33 | 4.07 | 5
SP3 | 0.38 | 0.36 | 0.41 | 0.33 | 2.95 | 6
SP5 | 0.86 | 0.93 | 0.72 | 0.66 | 6.13 | 1
SP6 | 0.81 | 0.71 | 0.65 | 0.66 | 5.82 | 2
SP7 | 0.23 | 0.14 | 0.19 | 0.33 | 2.00 | 8
SP8 | 0.08 | 0.32 | 0.10 | 0.25 | 1.08 | 9
SP9 | 0.13 | 0.09 | 0.22 | 0.66 | 2.46 | 7
SP11 | 0.57 | 0.00 | 0.41 | 0.80 | 4.80 | 3

Table 3.1: Normalized scores computed by P ReSA between each candidate publication explicitly listed in Figure 3.4 and SPA shown in Figure 3.2, along with the ranking position of each candidate publication

As shown in Table 3.1, publications SP5 and SP6 are positioned first and second, respectively, in the list of publications ordered by P ReSA with respect to SPA. SP5 and SP6, which detail different strategies that consider user profiles and data extracted from social websites to make recommendations, are closely related to the topic covered in SPA, which introduces an approach that "relies on data extracted from social networking environments to enhance the performance of recommenders." SP11 is also related (to a certain degree) to SPA, since SP11 considers "information included in user profiles and user activities to perform the recommendation task". The remaining publications, as shown in Table 3.1, are also related to SPA, but to a lesser degree. While SP1,

SP2, and SP3 describe “alternative approaches adopted for generating recommendations”, SP7,

SP8, and SP9 consider “data extracted from social environments to enhance web searches”. ✷

3.3.5 Observations

P ReSA recommends the top-10 ranked publications to its users, since users often view up to the first 10 generated results [68]. Besides using data extracted from CiteULike to make recommendations, as mentioned earlier, P ReSA can perform the recommendation task on publications using collaborative data and user connections extracted from any social website developed for managing academic references. Instead of relying on a single descriptor in performing the recommendation task, P ReSA considers in tandem multiple measures to determine the (i) publications that are to be recommended to a user based on a particular scholarly article of interest to the user and (ii) order in which the selected publications should be positioned in the ranking. P ReSA takes into account the strength of each measure defined in Sections 3.3.4-3.3.4, respectively, regardless of the actual (high or low) value of a particular measure, to position a publication in the ranking. Note that the popularity and tag similarity scores computed by P ReSA are always non-zero, since publications in CandidateP (i) must be bookmarked by at least one of Cusr's connections and (ii) include at least one tag (assigned by at least one of Cusr's connections) in their respective tag clouds exactly matching, or highly similar to, a tag assigned to P .

Example 5 Consider publications SP1 and SP11 for which their respective tag, title, abstract, and popularity scores are shown in Table 3.1. The (normalized) tag/abstract similarity and popularity

scores of SP1 are lower than the respective ones computed for SP11. The scores indicate that

SP11 is more similar in content to SPA (based on their respective tag clouds and abstracts) and

more popular among CusrA’s connections than SP1. The title similarity score of SP1, however, is

larger than its counterpart for SP11, which is zero. Since P ReSA considers multiple descriptors

to determine the ranking of publications to be recommended to CusrA, even though the keywords

in the title of SP11 are less similar to the ones in the title of SPA than the keywords in the title of

SP1, P ReSA positions SP11 higher in the ranking than SP1. ✷

3.4 Experimental Results

In this section, we first introduce the dataset, evaluation protocol, and metrics used for assessing the

performance of P ReSA (in Sections 3.4.1, 3.4.2, and 3.4.3, respectively). Thereafter, we detail the empirical study conducted for evaluating the effectiveness and efficiency of P ReSA and compare its performance with existing (scholarly publication) recommenders (in Section 3.4.4).

Number of Distinct CiteULike Users | 261
Number of Distinct Publications | 107,161
Number of Publications Bookmarked by Active Users | 34,569
Number of Distinct Tags Assigned to Publications | 39,858
Number of Distinct Personal Tags Created by CiteULike Users | 22,926
Average Number of Connections Per Active User | 4.2
Average Number of Personal Tags Per Publication | 3.8
Average Number of Distinct Tags (in the Inferred Tag Cloud) Per Publication | 6.8

Table 3.2: Statistical data on the dataset used for evaluating the performance of P ReSA

3.4.1 Dataset

To the best of our knowledge, there is no publicly available dataset for evaluating the performance of a recommender on scholarly publications that includes explicit connections established among users on a social bookmarking site. For this reason, we constructed a dataset using data extracted from CiteULike. We randomly selected a subset of users, who have explicit connections and recently bookmarked articles on CiteULike, and call them “active users.” For each active user U, we determined the connections of U and retrieved the personal tags of each article posted in U’s personal library (under the personal libraries of U’s connections, respectively). The dataset also includes the set of tags (along with their frequencies of occurrence) in the (inferred) tag cloud of each scholarly publication bookmarked by either an active user or his/her connections. The resultant dataset includes 261 CiteULike users, who are either active users or their connections, and 107,161 distinct scholarly publications in the personal libraries of the active users and their connections. (Detailed information on the dataset is shown in Table 3.2.) Since, as previously stated, P ReSA generates personalized recommendations for a user based on his/her interest on a particular publication, we evaluate P ReSA based on its recommendations generated for each one of the 34,569 user-publication pairs, denoted UP pairs, in the constructed dataset.

3.4.2 Evaluation Protocol

We assess the overall performance of P ReSA, using the metrics to be introduced in Section 3.4.3, on the recommendations generated by P ReSA for each user(U)-publication(P ) pair in UP pairs.

As a ground truth, i.e., to determine whether a recommendation R generated by P ReSA for a U-P pair is relevant, we depend on the publications bookmarked by U on CiteULike. R is relevant if it is included in U’s personal library (excluding P ) and is non-relevant, otherwise, which is a commonly-employed protocol for evaluating a recommender system [18, 23, 133]. Since only publications in a user’s personal library are considered relevant, it is not possible to account for potentially relevant publications the user has not bookmarked. Thus, the results of the conducted empirical study (as presented in Section 3.4.4) are underestimated, which is a well-known limitation of the evaluation protocol applied to assess recommender systems [18, 133]. As this limitation affects all the evaluated recommenders, i.e., P ReSA and a number of recommendation strategies employed to suggest scholarly publications as discussed in Section 3.4.4, the measures computed in the empirical study are consistent for comparison purposes [18].

3.4.3 Metrics

We treat P ReSA as a content-retrieval system which, instead of identifying relevant ranked items in response to a query, recommends to its users a list of top-10 ranked publications relevant to a publication, which is a popular evaluation strategy [18, 23]. We apply Precision@K (P @K), Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (nDCG) to evaluate the performance of P ReSA.

P @K measures the relevance of the top-K ranked recommendations generated by P ReSA for each user-publication pair in UP pairs. We consider P @1 and P @10 to evaluate the degree of relevance of each first recommendation made by P ReSA and the overall effectiveness of the recommendations generated by P ReSA, respectively. MRR is the averaged sum of the ranking values of recommendations generated by P ReSA for each user-publication pair in UP pairs such that each ranking value is either the reciprocal of

45 the ranking position of the first relevant recommendation among each set of top-10 recommenda- tions, if it exists, or 0, otherwise.

\[
MRR = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{r_i} \tag{3.5}
\]

where r_i is the ranking position of the first relevant recommendation among the top-10 with respect to the i-th (1 ≤ i ≤ N) user-publication pair in UP pairs, if it exists, and N is the total number of user-publication pairs in UP pairs.

P @K does not evaluate the relative ranking of all the relevant publications, whereas MRR solely focuses on the (average) ranking position of the first relevant publication recommended by P ReSA. We compute the nDCG to determine the overall ranking performance of P ReSA.

nDCG10 (defined in Equation 3.6 for assessing the top-10 publications recommended by P ReSA) penalizes relevant publications ranked lower in the list of recommendations. The penalization is

based on a relevance reduction, which is logarithmically proportional to the position of each rele- vant publication in a ranked list (as shown in Equation 3.7). The higher the nDCG10 score is, the better the ranking strategy adopted by a recommender system RS is, since relevant recommenda- tions generated by RS are positioned high.

\[
nDCG_{10} = \frac{1}{N} \sum_{i=1}^{N} \frac{DCG_{10,i}}{IDCG_{10,i}} \tag{3.6}
\]

where N and i are as defined in Equation 3.5,

\[
DCG_{10,i} = \sum_{j=1}^{10} \frac{2^{rel_j} - 1}{\log_2(1 + j)} \tag{3.7}
\]

where rel_j is the binary relevance judgment of the publication at the j-th (1 ≤ j ≤ 10) ranking position and is assigned a value of "1" if the publication is relevant and "0", otherwise. Furthermore,

IDCG_{10,i} (in Equation 3.6) is the best possible DCG_{10,i} value for the recommendations generated by P ReSA for the i-th user-publication pair.6
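For completeness, a Python sketch of Equations 3.6 and 3.7 with binary relevance judgments (an illustration under the definitions above, not the code used in the experiments):

```python
import math

def dcg_at_10(relevances):
    """Equation 3.7: DCG over the top-10 positions (relevances are 0/1 judgments)."""
    return sum((2 ** rel - 1) / math.log2(1 + j)
               for j, rel in enumerate(relevances[:10], start=1))

def ndcg_at_10(relevances):
    """Per-pair term of Equation 3.6: DCG normalized by the ideal DCG."""
    ideal = dcg_at_10(sorted(relevances[:10], reverse=True))   # IDCG_{10,i}
    return dcg_at_10(relevances) / ideal if ideal > 0 else 0.0
```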

3.4.4 Performance Evaluation

In this section, we first verify the correctness of P ReSA using multiple evidences for identifying scholarly publications to be recommended (in Section 3.4.4) and demonstrate the efficiency of P ReSA (in Section 3.4.4) in suggesting publications. Thereafter, we compare the performance of P ReSA with a number of existing recommenders (in Section 3.4.4) and offer our insights on the performance of each evaluated recommender (in Section 3.4.4).

Effectiveness of P ReSA

As stated in Section 3.3.4, P ReSA relies on a number of content similarity scores (computed using word-correlation factors) and the popularity score of a given publication to generate recommenda- tions. To verify the correctness of the strategy employed by P ReSA to suggest publications, we first compared alternative implementations of P ReSA using each user(U)-publication(P ) pair in

UP pairs. The first alternative implementation of P ReSA, denoted Exact, considers the tag descrip- tion of P and a candidate publication CP and uses the Dice coefficient [42] on exactly-matched tags in the tag cloud of CP and P to determine their degree of resemblance. Unlike Exact, W CF , the second alternative implementation of P ReSA, relies on the word-correlation factors and con-

siders analogous, besides exactly-matched, tags. W CF computes the Sim score of each candidate publication with respect to P as defined in Equation 3.1. We consider further alternative implemen- tations which, besides using the tag similarity among publications, incorporate in-tandem the title similarity, i.e., T T , the title and abstract similarity, i.e., T T A, and the title and abstract similarity

along with the popularity, i.e., TTAP , respectively. The aforementioned alternatives employ an

6 IDCG10,i is computed as DCG10,i using an ideal ranking such that the recommendations are arranged in de- scending order of their relevant judgment scores in the ranking.

unweighted linear combination strategy (as the one shown in Equation 3.3 without the weights) on the computed normalized scores.

As shown in Figure 3.6, based on the results of the experiments conducted using UP pairs, WCF improves the accuracy of recommendations generated by Exact, as indicated by the increase in (average) nDCG10 achieved by WCF over Exact; that is, relaxing the exact-matching constraint by adopting word-correlation factors enhances the effectiveness of the recommendations.

Moreover, the consistent improvement in (average) nDCG10 achieved by considering the tag, title, and abstract similarity scores demonstrates the importance of relying on various content descriptors of publications to suggest scholarly articles of interest to a user. Furthermore, the increase in (average) nDCG10 achieved by TTAP shows that using the popularity measure, along with the similarity of diverse content descriptors, further enhances the quality of the generated recommendations. Figure 3.6 also illustrates the benefit of considering the weight (presented in Section 3.3.4) of each individual score. PReSA achieves a more than 2% (statistically significant) increase in nDCG10 over TTAP, since the latter treats each similarity/popularity score as equally important in generating recommendations and thus may impose a bias towards recommending publications for which only one of the computed scores is considerably larger than the remaining scores.
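The combination step itself is straightforward; the following sketch illustrates the difference between the unweighted variant (used by the TT, TTA, and TTAP alternatives) and a weighted one. The score names and weight values are purely illustrative and are not the weights learned for PReSA; Equation 3.3 is not reproduced here.

```python
def combine(scores, weights=None):
    """Linearly combine normalized component scores.

    scores : dict mapping a score name (e.g., tag, title, abstract,
             popularity) to a value already normalized to [0, 1].
    weights: optional dict of per-score weights; if omitted, every score
             is treated as equally important (the unweighted variant).
    """
    if weights is None:
        weights = {name: 1.0 for name in scores}
    total = sum(weights.values())
    return sum(weights[name] * value for name, value in scores.items()) / total

# Illustrative values only.
candidate = {"tag": 0.62, "title": 0.40, "abstract": 0.55, "popularity": 0.90}
print(combine(candidate))                                        # unweighted
print(combine(candidate, {"tag": 0.4, "title": 0.2,
                          "abstract": 0.2, "popularity": 0.2}))  # weighted
```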

We assess the correctness of applying the blocking strategy introduced in Section 3.3.3 as part of the recommendation process of PReSA, which minimizes the number of comparisons required to make recommendations. As shown in Figure 3.6, PReSA NB, which does not apply the blocking strategy, increases the (average) nDCG10 achieved by PReSA by 1%. The average time required for PReSA to generate each set of top-10 recommendations, i.e., 567 milliseconds, is more than 3 times faster than the average time required, i.e., 2,109 milliseconds, when no blocking strategy is applied. As the cost (in time) outweighs the benefit, in terms of improvements in accuracy, we consider the 1% increase in nDCG10 relatively unimportant.

Figure 3.6: nDCG10 scores achieved by alternative implementations of PReSA using UP pairs, where an implementation marked with "*" achieves a statistically significant difference in nDCG10 with respect to the implementation shown to its immediate left

An alternative implementation of PReSA that achieves a statistically significant difference in nDCG10, as determined using a Wilcoxon Rank Sum Test (p < 0.05), with respect to the implementation shown immediately to its left in Figure 3.6, is marked with "*".

Efficiency of PReSA

To assess the efficiency of PReSA, we have examined the processing time required by PReSA to recommend scholarly publications for each user-publication pair in UP pairs. The average time it takes PReSA to make recommendations for a pair in UP pairs is 567 milliseconds, as mentioned in Section 3.4.4. Figure 3.7 shows the time it takes to recommend publications for each pair in UP pairs with respect to the number of candidate publications to be considered by PReSA for the corresponding pair. Even when the number of candidate publications is in the thousands, PReSA makes recommendations in between 2 and 5 seconds, which demonstrates the scalability of PReSA.

Comparing PReSA with Other Recommenders

To further demonstrate and verify the effectiveness of PReSA, we compare its performance with two well-known, widely-adopted recommender systems, SocialRecommender (SR) [18] and TagVectorSimilarity (TVS) [61], in addition to three state-of-the-art recommenders, L-Cosine (Cos) [109], Fusion [24], and PubRec [121]. TVS, which represents each publication as a

Figure 3.7: Processing time of PReSA required to generate recommendations relative to the number of candidate articles considered by PReSA for the corresponding user-publication pair in UP pairs

Figure 3.8: P@1, P@10, MRR, and nDCG10 scores of SR, TVS, Cos, Fusion, PubRec, and PReSA, respectively

TF-IDF tag profile vector, computes the cosine similarity among tag vector representations to determine the publications to be recommended to a user. SR, Cos, Fusion, and PubRec have been briefly introduced in Section 3.2.
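For reference, the sketch below shows the kind of TF-IDF tag-vector cosine similarity computation that underlies TVS; it is a simplified illustration using scikit-learn, not the original implementation, and the tag clouds are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical tag clouds: the target publication first, then candidates.
tag_clouds = [
    "recommender systems collaborative filtering tags",  # target P
    "recommender systems evaluation metrics",            # candidate 1
    "image segmentation deep learning",                  # candidate 2
]
vectors = TfidfVectorizer().fit_transform(tag_clouds)
# Cosine similarity of each candidate's TF-IDF tag vector to the target's.
sims = cosine_similarity(vectors[0], vectors[1:]).ravel()
ranked = sorted(enumerate(sims, start=1), key=lambda x: x[1], reverse=True)
print(ranked)  # candidates ordered by tag-vector similarity to P
```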

Given that PReSA, SR, and PubRec recommend publications included in the personal libraries of active users' connections, we restricted the publications to be considered for recommendation by TVS, Cos, and Fusion to the ones bookmarked by active users' connections. We determined the relevance of each recommendation made by either SR, TVS, Cos, Fusion, or PubRec, for each user-publication pair in UP pairs, according to the evaluation protocol detailed in Section 3.4.2.

As shown in Figure 3.8 (see footnote 7), the average P@1 score achieved by SR, TVS, Cos, Fusion, or PubRec is lower than the average P@1 score achieved by PReSA. The latter indicates that for 65% of the user-publication pairs in UP pairs, the first publication recommended by PReSA is relevant, as opposed to at most 47% of the pairs for which the first recommended publication is relevant when considering SR, TVS, Cos, Fusion, or PubRec. In addition, the P@10 value, as shown in Figure 3.8, reflects that more than half of the publications recommended by PReSA are relevant, which doubles the average P@10 achieved by SR, TVS, and Cos, respectively.

Figure 3.8 also includes the MRR scores of SR, TVS, Cos, Fusion, PubRec, and PReSA, which demonstrate that, while on average users of PReSA and PubRec are required to browse through fewer than two (1/0.74 ≈ 1.35 and 1/0.57 ≈ 1.75, respectively) recommended publications before locating one that is relevant to a scholarly publication they are interested in, users relying on SR, TVS, Cos, or Fusion, respectively, are required to scan through close to three (1/0.38 ≈ 2.63) and at least two (1/0.42 ≈ 2.38, 1/0.46 ≈ 2.17, and 1/0.47 ≈ 2.12, respectively) recommended publications before locating one that is of interest. In addition, the nDCG10 score of PReSA is at least 14% higher than the nDCG10 scores computed for SR, TVS, Cos, Fusion, and PubRec. A higher nDCG10 value indicates that PReSA is more effective than SR, TVS, Cos, Fusion, and PubRec in ranking relevant scholarly articles higher in the list of recommendations.

Discussion

TVS, Cos, Fusion, PubRec, and PReSA outperform SR, regardless of the performance metric considered, which is anticipated, since the Pearson Coefficient [42] employed for identifying "similar-minded" connections of a particular user U is "0" when U and the corresponding connection C do not include matching publications in their libraries, which in turn causes none of the publications bookmarked by C to be eligible for recommendation. Furthermore, SR relies on explicit user feedback, i.e., ratings, which may not always be available [82]. PReSA outperforms

7 Unless stated otherwise, the improvements in P@1, P@10, MRR, and nDCG10 achieved by PReSA with respect to the other recommenders considered for comparison purposes are statistically significant.

TVS and Cos in terms of P@1, P@10, MRR, and nDCG10, since PReSA (i) determines the degree of content similarity among scholarly articles using word-correlation factors, instead of the exact-matching constraint imposed on publications that is required by TVS and Cos, and (ii) relies on both the content (captured by tags and keywords in the titles and abstracts of publications) and the popularity of scholarly publications to make recommendations, which enhances the quality and ranking of the recommended publications, as opposed to TVS (Cos, respectively), which solely considers tags (keywords in titles and abstracts, respectively) to represent the content of publications to generate recommendations.

Even though Fusion and PReSA consider tags to represent the contents of publications and the keywords in their respective abstracts and titles to perform the recommendation task, PReSA relaxes the exact-matching constraint imposed by Fusion and treats publications represented with analogous, but not the same, tags/keywords as similar, which explains why PReSA outperforms Fusion in making recommendations. Fusion also considers the similarity between any two publications based on the overlap in users that have bookmarked both items on a social site. This strategy depends on detailed information on the publications that have been bookmarked by each individual user of a social bookmarking site, which is not always publicly available. Furthermore, due to the underlying design of different social bookmarking sites, this information can be difficult to compile.

Unlike PubRec, which simply analyzes the content of publications to be recommended based on the tags available on CiteULike, PReSA relies on content descriptors of publications defined by their corresponding author(s) and by users of a social bookmarking site. Consequently, PReSA is able to more effectively capture the content of publications and better estimate their resemblance with respect to a given publication of interest to a user, which in turn enhances the overall quality of the recommendations.

3.4.5 Applying PReSA to Other Domains

In developing PReSA, we have relied on easily-accessible online data sources. PReSA, which is not domain-specific, can easily be extended to suggest items other than scholarly publications.

PReSA partially depends on the existence of users' connections to make recommendations. Similar to CiteULike, the majority of popular social bookmarking sites offer their users an option to establish online relationships with other users/members of the corresponding website. For example, Netflix.com allows its users to connect with their Facebook friends to access movies they have watched in the past, whereas Delicious.com offers the functionality for users to network with other users and access their bookmarked URLs. Furthermore, LibraryThing.com users can explicitly befriend other users of the site and explore the books they have read/bookmarked. Given the widely-available connection information on social bookmarking sites, this information can be used to identify candidate items (other than academic publications) to be recommended, in addition to computing their popularity scores, using the approaches described in Sections 3.3.3 and 3.3.4, respectively. Moreover, most of the well-known social bookmarking sites, such as MovieLens.org, LibraryThing.com, and Delicious.com, to name a few, archive tag-based descriptions of items. With the existence of these resources, one can apply the tag-based filtering and similarity strategies employed by PReSA (as defined in Sections 3.3.3 and 3.3.4) to recommend items such as movies, books, and URLs, respectively. Furthermore, PReSA considers the title and a brief description (i.e., abstract) of each publication to analyze the similarity of publications based on information offered by professionals (i.e., authors), as opposed to personal user descriptions captured by tags. This type of metadata is easily accessible from online sources, such as Amazon.com, RottenTomatoes.com, and Last.FM, making it possible to apply the content matching strategies defined in Sections 3.3.4 and 3.3.4 to suggest items such as books, movies, and songs, respectively.

3.5 Conclusions

Researchers, as well as ordinary users, often turn to scholarly publication recommenders to locate pertinent literature in diverse academic fields. These recommenders, however, adopt a "one-size-fits-all" strategy or suggest articles matching the general interests of their users, which fails to consider users' specific needs for citations relevant to a target publication, a research task performed in the academic setting on a regular basis. To address these issues, we have introduced PReSA, a personalized recommender on scholarly articles. PReSA, which focuses on the interest of a user U in a target publication P, is based on the premise that U treasures recommendations made by people with whom U has an explicit connection, a design methodology that differs from existing recommenders, which suggest items inferred from users unknown to U. With that in mind, PReSA suggests to U publications which have been bookmarked by his/her connections, are popular (among U's connections), and are similar (in content) to P.

PReSA requires neither supervision (other than to determine the weights of the content descriptors and popularity measures) nor domain-specific information to make recommendations. Moreover, PReSA does not rely on the availability of ratings on publications provided by its users. Instead, PReSA depends on data, such as bookmarked items or content descriptors on items, readily available on social sites. Furthermore, PReSA relaxes the exact-matching constraint imposed by existing recommenders to determine suggested items by using pre-computed word-correlation factors on the tags/keywords of archived scholarly articles and the ones that capture the content of a target publication.

To assess the performance of PReSA, we have conducted an empirical study using data from CiteULike. Experimental results have verified (i) the efficiency and effectiveness of (the recommendation and ranking strategies of) PReSA and (ii) the superiority of PReSA, in terms of performance, over a number of recommenders (on scholarly articles).

The current design of PReSA analyzes users' personal opinions, in terms of user-defined keywords (i.e., tags), in describing the subject matter of a publication, in addition to considering brief descriptions and titles provided by the corresponding authors of the publication, to make recommendations. Keywords that correctly capture the topic(s) of a publication, however, may not always be available in the tag cloud or in the title/abstract of the publication. As part of our future research work, we plan to consider existing models, such as LDA, which can depict the topic of a publication using its entire content, and incorporate the inferred information in PReSA's recommendation process, which could further improve the correctness of its generated suggestions.

The quality and quantity of relevant suggestions made by PReSA could also be enhanced by analyzing publications beyond the ones that have been bookmarked by users' respective connections. We intend to extend the current design methodology of PReSA by including references in candidate publications, besides the candidate publications themselves, which have been bookmarked by users' connections.

Chapter 4

A Readability Level Prediction Tool for K-12 Books

Abstract:

The readability level of a book is a useful measure for children and teenagers (teachers, parents, and librarians, respectively) to identify reading materials suitable for themselves (their K-12 readers, respectively). Unfortunately, the majority of published books are assigned by professionals a readability level range, such as K-3, instead of a single readability level for their intended readers, which is not useful to end-users who look for books at a particular grade level. This has led to the development of readability formulas/analysis tools. These formulas/tools, however, require at least an excerpt of a book to estimate its readability level, which is a severe constraint due to copyright laws that often prohibit book content from being made publicly accessible. To alleviate the text constraint imposed on readability analysis of books, we have developed TRoLL, which relies heavily on metadata of books that is publicly and readily accessible from reputable book-affiliated online sources, besides using snippets of books, if they are available, to predict the readability level of books. Based on a multi-dimensional regression analysis, TRoLL determines the grade level of any book instantly, even without a sample text of the book, which is its uniqueness. Furthermore, TRoLL is a significant contribution to the educational community, since its computed book readability levels can (i) enrich K-12 readers' book selections and thus enhance their reading-for-learning experience, and (ii) aid parents, teachers, and librarians in locating reading materials suitable for their K-12 readers, which can be a time-consuming and frustrating task that does not always yield a quality outcome. Empirical studies conducted using a large set of K-12 books have verified the prediction accuracy of TRoLL and demonstrated its superiority over existing well-known readability formulas/analysis tools.

4.1 Introduction

As reading is an essential skill [135], which can have a significant impact on a youth's educational and future career development, it is imperative to encourage children to read and learn starting from an early age. Reading for learning, however, cannot take place unless readers can accurately and efficiently decode, i.e., comprehend, the words in a text [112]. During the last century, educators and researchers have dedicated resources to develop readability assessment tools/formulas which quantify the degree of difficulty in understanding a text [19, 52].

Traditional readability formulas, such as Flesch-Kincaid (Reading Ease) [76], simply perform a one-dimensional analysis on a text based on shallow features, such as the average number of syllables per word (words per sentence, respectively), the average sentence length, and vocabulary lists, which might not precisely capture the complexity of a text.1 More recently-developed readability formulas have gone beyond shallow features and rely on natural language processing tools to examine complex linguistic features of a text [52]. All of these formulas, however, require a given (snippet of a) text in order to determine its readability level (i.e., grade level), which is a constraint if applied to books, since even an excerpt of a book is not always freely accessible due to copyright laws. The same constraint affects the Lexile Framework [145] and the Advantage-TASA Open Standard for Readability (ATOS) [139], two widely-used readability analysis tools specifically developed for analyzing the readability level of books.

To address the deficiencies of the designs of existing readability formulas/analysis algorithms, we propose a tool for regression analysis of literacy levels, denoted TRoLL, which considers metadata of books publicly accessible from reputable online sources, in addition to snippets of books, if they are available, to predict the grade level of any book. To determine the grade

1Davison and Kantor [45] claim that “nonsense text” can be classified as easy-to-read by traditional readability formulas if it contains frequently-used, short words organized into brief sentences.

level of a book Bk, TRoLL extracts from well-known sources, such as WorldCat.org, (i) an excerpt from Bk (if it is available online) to analyze various shallow features, determine its subject area established by the US Curriculum, and examine different grammatical concepts in Bk; (ii) the subject headings assigned to Bk; and (iii) the targeted audience level of Bk and its (first) author,2 besides the subject headings and audience levels of books written by the author.

TRoLL predicts the grade level of K-12 books. The grade levels can serve as a guideline for young readers to select books by themselves, which is a valuable, and often overlooked, tool, since "when students choose books that match their interests and level of reading achievement, they gain a sense of independence and commitment and they are more likely to complete, understand, and enjoy the book they are reading" [92]. The grade levels predicted by TRoLL on (non-)fictional books and textbooks can also be used as guidance for parents, teachers, and librarians in locating reading materials suitable for their K-12 readers.

TRoLL is unique, since it can predict the grade level of a book instantly, even if its sample

text is unavailable online. TRoLL performs a multi-dimensional analysis on the metadata/content of books and their authors to accurately predict the readability level of books. Unlike other readability formulas/tools, such as Lexile, which predict the difficulty of a text based on their own readability-level scales, TRoLL predicts the grade level of a book, a measure preferred by teachers/librarians, given that grade levels are easy to understand and use when communicating with students/patrons [132].

The main contribution of TRoLL is in its development as a tool that can determine the grade level of books on-the-fly, relying solely on publicly available information on books and without involving human experts. This task cannot be accomplished by existing text-based readability formulas, nor by the popular Lexile or ATOS, which offer readability measures for only a small fraction of published books and require direct involvement from their developers in order to generate the readability level of books that have yet to be analyzed [19]. As a by-product of our work, we have created a dataset consisting of more than 18,000 books with their respective grade level ranges

2We have empirically verified that by considering only the first author of a K-12 book, the processing time of TRoLL is minimized without affecting its accuracy in predicting the grade level of the book.

defined by their corresponding publishers. Given the difficulty in obtaining large-scale datasets on books for training/testing a grade-level prediction tool on books [151], the constructed dataset is an asset to the research community.

The remainder of this paper is organized as follows. In Section 4.2, we discuss existing readability formulas/analysis tools. In Section 4.3, we detail the design methodology of TRoLL. In Section 4.4, we present the results of the empirical studies conducted to assess the performance of TRoLL and compare its performance with existing popular readability formulas/analysis tools. In Section 4.5, we give concluding remarks and directions for future work.

4.2 Related Work

For almost a century, readability formulas/analysis tools have been developed to determine the readability level or degree of difficulty of a text, resulting in hundreds of them [153]. Traditional formulas, including Flesch-Kincaid [76] and Gunning Fog (Index) [62], are based on shallow features. These formulas, however, only provide a rough estimation of the difficulty of a text and thus are not always reliable [19, 52]. Lexile [145] and ATOS [139], two well-known readability analysis tools, are based upon traditional readability features. While the former compares words in a text with 600 million words in the Lexile corpus to establish the semantic difficulty (i.e., word frequency) and syntactic complexity (i.e., sentence length) of the text, the latter considers word length, sentence length, and grade level of words, in addition to book length, i.e., word count, when it is applied to books.

Besides the formulas/tools listed above, new readability analysis approaches based on linguistic features have been developed [38, 59, 65, 140]. Coh-Metrix [59] uses lexicons, part-of-speech classifiers, latent semantic analysis, and syntactic parsers, to name a few, to determine the difficulty of a text, which is influenced by cohesion relations and language/discourse characteristics. Collins-Thompson and Callan [38] combine multiple statistical language models, which capture patterns of word usage in different grade levels, using a Naïve Bayes classifier to estimate the most probable grade level of a text. Schwarm and Ostendorf [140] apply support vector machines on various features extracted from statistical language models, along with shallow features and features derived from analyzing the syntactic structure of texts, to determine the readability level of a text T. Heilman et al. [65] consider lexical and grammatical features derived from syntactic structures to analyze the difficulty of T. (For a detailed discussion on commonly-used features for assessing the readability of a text, see [52].)

Qumsiyeh and Ng [129] and Ma et al. [94] have recently developed their own readability assessment tools. ReadAid [129] performs an in-depth analysis beyond exploring the lexicographical and syntactical structures of an excerpt of a book by considering the authors of the book along with the topic(s) covered in the book. Besides examining text-based features, SVM-Ranker [94] considers visually-oriented features (such as the average font size and the ratio of annotated image rectangle area to page area) and adopts a rank-based strategy, as opposed to the commonly-employed classification/regression approaches, to determine the grade level of a book.

ReadAid [129], ATOS [139], and SVM-Ranker [94], along with the aforementioned readability formulas, either partially or fully depend on the availability of at least a sample of a text to compute its grade level, which is a severe constraint, since the text of a book is not always freely accessible, either online or in hard copy, due to copyright laws. TRoLL bypasses this constraint by using publicly available metadata on books to accomplish its task.

4.3 Our Readability Analysis Tool

To alleviate the reliance of existing readability formulas/analysis tools on the text of a book, and to improve upon one-dimensional approaches to determining readability levels of books, we introduce TRoLL, a sophisticated readability analysis tool that can operate without book content, i.e., sample text. Given a unique identifier of a book Bk, which is either its ISBN or its title and (first) author, TRoLL either retrieves the pre-computed readability level of Bk, if it has already been determined by TRoLL, or calculates its readability level on the fly using a multiple linear regression model which analyzes publicly accessible information on Bk that is offered by professional providers, who are either government or educational agents, and can be extracted online. Examples of such providers include the Library of Congress,3 the Online Computer Library Center (OCLC),4 and Open Library.5 These freely accessible information sources often include metadata, such as the subject headings assigned to Bk, and occasionally include the target audience and/or the partial/full text of Bk. The overall readability prediction process of TRoLL is depicted in Figure 4.1.

Figure 4.1: An overview of the readability level prediction process of TRoLL

4.3.1 Multiple Regression Analysis

To predict the readability level of a book Bk, TRoLL employs multiple linear regression analysis [155], which is a classical statistical technique for building estimation models [150]. The analysis accounts for the influence of multiple contributing factors, which are derived from metadata and/or content of Bk, to estimate the readability level of Bk using the following equation:

y = β0 + β1X1 + β2X2 + ... + βnXn (4.1)

where y is the dependent variable, which is the predicted readability level of Bk, β0 is the intercept parameter, β1, ..., βn are the coefficients of regression, Xi (1 ≤ i ≤ n) is an independent variable (predictor), and n is the number of predictors in the regression analysis [155].

3 http://www.loc.gov
4 http://www.worldcat.org
5 http://www.openlibrary.org

In Equation 4.1, each unknown parameter, i.e., the intercept and the coefficients of regression, which is required to predict the readability level of a book by TRoLL, is estimated through a one-time training process using the Ordinary Least Squares method [155] and the BookRL-RA training dataset (introduced in Section 4.4.1). Each book b in BookRL-RA is represented as a vector of the form <b1, ..., b54, r>, where bi is the (value of the) i-th predictor (1 ≤ i ≤ 54) computed for b, and r is the target, i.e., the known readability level for b in our case. (The fifty-four predictors included in the regression model of TRoLL are defined in Sections 4.3.2, 4.3.3, and 4.3.4, whereas the target readability level of a book is determined by its publisher and included in BookRL-RA.) Since publishers usually suggest a range of readability levels for each of their published books, such as grades 3-6, TRoLL considers the average grade level of the range as the target grade level of a book, to avoid any bias from assigning books to the lowest or highest grade levels of their ranges during the regression training.

The Ordinary Least Squares method calculates the residual of each book b in BookRL-RA, which is the difference between the target readability level of b and the readability level of b predicted using the (values of the) predictors in the vector representation of b and Equation 4.1. The unknown parameters are estimated by minimizing the sum of the squared residuals over the books in BookRL-RA.
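The following minimal sketch illustrates this training step under simplifying assumptions: a handful of invented books with three predictors each stand in for the 54-predictor vectors of BookRL-RA, and ordinary least squares is solved with NumPy.

```python
import numpy as np

# Invented stand-in for BookRL-RA: each row holds a book's predictor values
# (TRoLL uses 54 predictors; 3 are shown here for brevity) and y holds the
# publisher-derived targets (the midpoint of each grade-level range).
X = np.array([[0.031, 0.015, 5.0],
              [0.045, 0.022, 3.0],
              [0.020, 0.010, 8.0],
              [0.038, 0.018, 4.0]])
y = np.array([4.5, 2.0, 7.0, 3.5])

# Ordinary Least Squares: prepend a column of ones so that beta[0] becomes
# the intercept parameter of Equation 4.1.
X1 = np.hstack([np.ones((X.shape[0], 1)), X])
beta, _, _, _ = np.linalg.lstsq(X1, y, rcond=None)

def predict_grade(predictors):
    """Apply Equation 4.1: y = beta0 + beta1*X1 + ... + betan*Xn."""
    return float(beta[0] + np.dot(beta[1:], predictors))

print(predict_grade([0.033, 0.016, 5.0]))
```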

4.3.2 Analyzing Book Content

According to [32], only 7.7% of books in the OCLC database, which is a worldwide library cooperative that offers services to improve access to the world's information, are linked to their partial or full content. We found similar results among the 7,142 books in the BookRL-RA dataset: only 5% of them include their partial or full content. Despite the low percentage of books with available content online, TRoLL utilizes the content of a book, if it is available, in predicting the readability level of the book.

Available online content of a book is either a snippet of less than five pages of the book, a preview of one or more of its chapters, or its full text [32]. The analysis of book content is the basis for a number of TRoLL predictors, which rely on (i) textual features considered by traditional readability formulas, (ii) the grammar of its content, or (iii) subject areas addressed in the book. When calculating the values of these predictors, we only consider the text from the first sentence up to and including the sentence that contains the 2,500th character6 of the book's content, in order to improve the efficiency of TRoLL. We detail the analysis of the content of a book below.

Predictors Based on Features Used by Traditional Readability Formulas

Existing widely-accepted readability formulas, such as Flesch-Kincaid [76], Coleman-Liau (Index) [37], Spache (Readability Index) [147], Gunning Fog [62], and SMOG (Index) [102], seek to combine, through a mathematical formula, several textual features to compute the readability level of a text. We do not use any of these readability formulas as a TRoLL predictor, since there is no consensus on which readability formula is the most accurate. Instead, we consider the features based on vocabulary and the count of syllables that are commonly used by traditional readability formulas as predictors, so that TRoLL is not biased towards any particular readability formula.

TRoLL considers seven traditional textual features used in readability formulas: the count of (i) long words (with more than six letters), (ii) sentences, (iii) total words, (iv) letters, (v) syllables, (vi) words with three or more syllables, and (vii) unique unfamiliar words [147]. Since the length of the text, i.e., the total number of characters, available online is different for each book, we normalize these counts by the length of the text.
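A minimal sketch of this feature extraction is given below; the syllable counter and the familiar-word list are simplified stand-ins for the ones actually used by readability formulas, and the sample text is invented.

```python
import re

def count_syllables(word):
    """Crude vowel-group heuristic; a stand-in for a proper syllable counter."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def traditional_features(text, familiar_words):
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = len(text)  # normalize every count by the length of the available text
    counts = {
        "long_words": sum(1 for w in words if len(w) > 6),
        "sentences": len(sentences),
        "total_words": len(words),
        "letters": sum(1 for c in text if c.isalpha()),
        "syllables": sum(count_syllables(w) for w in words),
        "three_plus_syllable_words": sum(1 for w in words if count_syllables(w) >= 3),
        "unique_unfamiliar_words": len({w.lower() for w in words} - familiar_words),
    }
    return {name: value / n for name, value in counts.items()}

sample = "It was a dark and stormy night. Meg shivered in the attic."
print(traditional_features(sample, familiar_words={"it", "was", "a", "and", "in", "the"}))
```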

Example 6 Consider the book "A Wrinkle in Time," denoted Bk1, written by Madeleine L'Engle, which tells the story of a 14-year-old, Meg Murry, who lives a normal life until she enters a science fiction/fantasy world in which she goes on adventures. Its publisher suggests the target readers for the book to be in grades 5-7. Based on the first twenty pages of Bk1's text that are publicly available (a sample of which is shown in Figure 4.2), TRoLL analyzes a snippet with the first 2,639 characters (including the last sentence containing the 2,500th character) and calculates (the values

6The number of characters examined by TRoLL corresponds to the average number of words, i.e., 300 words, often examined by well-known readability formulas, which include Flesch-Kincaid, Fry, and Lexile, to determine the readability level of a text [55].

Figure 4.2: A sample of the text in "A Wrinkle in Time" in which long words are in bold, unfamiliar words (determined by traditional readability formulas) are italicized, and words with three or more syllables are underlined

of) the following predictors: Count of long words = 81/2,639 = 0.031, Count of sentences = 39/2,639 = 0.015, Count of total words = 475/2,639 = 0.18, Count of letters = 2,018/2,639 = 0.765, Count of syllables = 635/2,639 = 0.241, Count of words with three or more syllables = 29/2,639 = 0.011, and Count of unique unfamiliar words = 86/2,639 = 0.033. ✷

Grammar Predictors on Book Content

TRoLL examines grammatical constructions, as defined by the US curriculum and shown in Table 4.1, to compute the values of grammar predictors. These predictors reflect the complexity of the (i) writing style, (ii) organization of the sentences, and (iii) grammatical constructs found in a text. Owing to advances in natural language processing, such as the Stanford NLP Parser [47], the analysis of the grammar of the textual content in a book Bk is somewhat more profound than the analysis used in Flesch-Kincaid [76], Coleman-Liau [37], Spache [147], Gunning Fog [62], SMOG [102], and other readability formulas.

There are two types of predictors created using grammatical constructions: simple and parse-tree based. For simple grammatical concepts (listed in Table 4.1), which are easily measured, TRoLL simply counts their occurrences per sentence in the text of a book Bk. When a grammatical concept is more difficult to find and count, TRoLL employs the Stanford Parser [47] to parse the text into parse trees. Hereafter, TRoLL counts the occurrences of a grammatical structure per parse tree and normalizes the frequencies of occurrence of the grammatical structures so that they are comparable regardless of the length of the text.
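The sketch below illustrates the per-sentence counting of a few of the simple grammatical concepts using regular expressions; the word lists are abbreviated examples rather than TRoLL's full inventory, and the parse-tree based predictors (which require a constituency parser such as the Stanford Parser) are omitted.

```python
import re

# Per-sentence counts for a few of the "simple" grammatical concepts;
# the patterns below are abbreviated examples, not TRoLL's full inventory.
SIMPLE_CONCEPTS = {
    "conjunctions": r"\b(and|but|or)\b",
    "contractions": r"\b\w+'(t|s|re|ve|ll|d)\b",
    "common_prefixes": r"\b(un|re|pre|de|dis)\w+",
    "personal_pronouns": r"\b(him|her|it)\b",
}

def simple_grammar_predictors(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        name: sum(len(re.findall(pattern, s, flags=re.IGNORECASE))
                  for s in sentences) / len(sentences)
        for name, pattern in SIMPLE_CONCEPTS.items()
    }

print(simple_grammar_predictors("She didn't like it, but she reread the unhappy letter."))
```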

Table 4.1: List of predictors used by TRoLL

Predictors Based on Content (37)

  Predictors Based on Traditional Text Features (7): Count of long words; Count of sentences; Count of total words; Count of letters; Count of syllables; Count of words with three or more syllables; Count of unique unfamiliar words

  Predictors Based on Grammatical Constructions (29)
    Simple: Common prefixes (un-, re-, pre-, in-, de-, dis-); Conjunctions (and, but, or); Conjunctive adverbs (however, therefore, on the other hand); Contractions; Determiners; Irregular vowel combinations, spelling, phonetics (boot, soil, trout); Personal pronouns (him, her, it); Plural words; Possessive nouns; Prepositions; Suffixes (-er, -ment, -able, -ness, -ly, -ful, -less, -tion, -ight, -ite, -ate); Syncategorematic words (like, as, to, if, all); Future tense; Present perfect tense; Quantifiers
    Parse-tree based: Adverbial phrases; Adverbs; Comparatives and superlatives; Consecutive verbs; Dependent clauses; First conditional form; Independent clauses; Interrogative sentences; Modal verbs of deduction; Participles; Past progressive tense; Past tense; Prepositional phrases; Present progressive tense

  Content-based subject area predictor (1)

Predictors Based on Topical Information (13)
  Total count of subject headings; Frequency distribution predictors: mean, median, lowerBound, upperBound (4); Frequency distribution predictors within one standard deviation: SD_mean, SD_median, SD_lowerBound, SD_upperBound (4); Number of previously encountered subject headings; Ratio of previously encountered subject headings; Ratio of previously encountered subject headings assigned to books written by an author; Median of readability levels paired with subject headings assigned to books written by an author

Predictors Based on Targeted Audience (4)
  Book audience level; Average author's audience level; Minimum author's audience level; Maximum author's audience level

The grammatical predictors offer an in-depth analysis of the grammar of the textual content of Bk, which is valuable to the regression analysis conducted by TRoLL.

The Subject Area Predictor on Book Content

TRoLL takes advantage of the mapping established by the US curriculum between subject areas and grade levels and exploits the subject area covered in a book to predict its readability level. A subject area is a specific topic specified in the US curriculum that is taught to students at a particular grade in the US public school system. For example, multiplication is taught in the 3rd grade, whereas geometry is taught in the 10th. TRoLL pre-defines fifty-five distinct subject areas to be considered. These subject areas (and their corresponding grade levels) were inferred from the K-12 curriculum posted on the Elkhart Community Schools website,7 and each book is assigned a subject area by TRoLL using Equation 4.3 defined below.

To determine the subject area of a book Bk, TRoLL first analyzes (an excerpt of) its content by using a Latent Dirichlet Allocation (LDA) model [22], which is a generative probabilistic model that represents documents as random mixtures over (latent) topics such that each topic is characterized by a distribution over words. To train an LDA model, we pre-defined the number of latent topics to be fifty-five, which matches the number of subject areas considered by TRoLL, and applied JGibbLDA,8 a Java implementation of LDA, to 5,500 training documents randomly chosen from Wikipedia.org.9 Note that stopwords in the documents were removed and the remaining words were reduced to their grammatical roots using the well-known Porter stemmer. During the training process, the LDA model estimates the probability distribution of words in latent topics (topics in documents, respectively). To accomplish this task, we adopted Gibbs sampling [60], a general method applied for probabilistic inference when direct sampling is difficult, which iteratively analyzes the set of training documents to estimate the probability of a word w given a (latent) topic

7 www.elkhart.k12.in.us/3 staff/curric/pdf/1eng.pdf
8 http://jgibblda.sourceforge.net/
9 The training documents are uniformly distributed among the 55 pre-defined subject areas, i.e., 100 documents per subject area, and were retrieved by issuing a keyword query on each subject area SA on Wikipedia so that the top-ranked retrieved Wikipedia page PSA, along with the pages linked from PSA, are treated as documents related to SA.

t (t given a document, respectively). The sampling method is efficient and has been successfully used for obtaining good approximations for LDA [71].

Figure 4.3: Determining the subject area and grade level of a book Bk using its content

As shown in Figure 4.3, given an excerpt of Bk, denoted Bke, TRoLL uses the trained LDA model and Equation 4.2 to identify the (latent) topic covered in Bke. Each potential latent topic of Bke is associated with a probability value which indicates its likelihood of describing Bke. Thereafter, the topic T with the highest probability with respect to Bke is treated as the latent topic of Bk.

Topic(Bke) = argmax_{T ∈ LT} P(T | Bke) = argmax_{T ∈ LT} Σ_{i=1}^{|Bke|} P(w_i | T)    (4.2)

where LT is the set of fifty-five pre-defined latent topics considered by the trained LDA model, P(T | Bke) is the probability of T given Bke, |Bke| is the number of distinct non-stop, stemmed words in Bke, w_i is the i-th word in Bke, and P(w_i | T) is the probability of w_i given T as determined by the trained LDA model.

Table 4.2: A sample of the fifty-five subject areas considered by TRoLL, along with their corresponding grades

Subject Area     Grade        Subject Area         Grade
Shapes           K            Geography            8
Addition         2            Mental disorders     9
Cultures         5            European history     11
Agriculture      6            Statistics           12

Having identified the topic T covered in Bke using the trained LDA model, TRoLL applies Equation 4.3 to compute the subject area score (SAS) between T and each one of the fifty-five subject areas considered by TRoLL, which captures the degree of resemblance between (words in) T and (words in) the corresponding subject area. The subject area SA with the highest computed SAS is treated as the subject area of Bk based on its similarity with T and is assigned to Bk. Hereafter, the grade level associated with SA, which is determined by the mapping between US curriculum subject areas and grade levels employed by TRoLL (a portion of which is shown in Table 4.2), becomes the value of the content-based subject area predictor of Bk.

Subject(T) = argmax_{SA ∈ S} SAS(T, SA) = argmax_{SA ∈ S} (1/|T|) Σ_{i=1}^{|T|} [ P(w_i | T) × (1/SA_n) Σ_{j=1}^{|SA|} wcf(k_j, w_i) ]    (4.3)

where S is the set of fifty-five subject areas, |T| (|SA|, respectively) is the number of keywords in T (SA, respectively), T, w_i, and P(w_i | T) are as defined in Equation 4.2, k_j is the j-th word in SA, wcf(k_j, w_i) is the word-correlation factor of k_j and w_i specified in the pre-defined word-correlation matrix [78], and SA_n is the number of words in SA that have a non-zero wcf score with respect to the words that define T.

Word-correlation factors in the correlation matrix, which is introduced in [78], reflect the degree of similarity between any two non-stop, stemmed words based on their (i) frequencies of co-occurrence and (ii) relative distances in a set of approximately 880,000 Wikipedia.org documents written by more than 89,000 authors that cover a wide variety of topics. Compared with the synonyms/related words compiled by WordNet,10 in which pairs of words are not assigned similarity weights, word-correlation factors offer a more sophisticated measure of word similarity.
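The following sketch illustrates Equations 4.2 and 4.3 under the assumption that a trained topic model and a word-correlation matrix are already available; the topics, subject areas, probabilities, and wcf values are toy placeholders that loosely mirror Example 7, not data used by TRoLL.

```python
# Toy stand-ins for a trained LDA model, two curriculum subject areas, and a
# word-correlation matrix; none of these values come from TRoLL's actual data.
TOPIC_WORD_PROB = {                       # P(word | latent topic)
    "T_culture": {"desert": 0.04, "custom": 0.05, "city": 0.03},
    "T_algebra": {"equation": 0.06, "solve": 0.04},
}
SUBJECT_AREAS = {"Cultures": ["culture", "custom", "tradition"],
                 "Addition": ["add", "sum", "number"]}
SUBJECT_GRADE = {"Cultures": 5, "Addition": 2}
WCF = {("culture", "custom"): 0.8, ("custom", "custom"): 1.0,
       ("tradition", "custom"): 0.6, ("culture", "city"): 0.2}

def most_likely_topic(excerpt_words):
    """Equation 4.2: the latent topic T maximizing the sum of P(w_i | T)."""
    return max(TOPIC_WORD_PROB, key=lambda T: sum(
        TOPIC_WORD_PROB[T].get(w, 0.0) for w in excerpt_words))

def subject_area_score(T, SA):
    """Equation 4.3: average over the topic's words of P(w|T) times the
    wcf mass between w and the SA keywords, normalized by SA_n (the number
    of SA keywords correlated with at least one word of the topic)."""
    topic_words, keywords = TOPIC_WORD_PROB[T], SUBJECT_AREAS[SA]
    sa_n = sum(1 for k in keywords
               if any(WCF.get((k, w), 0.0) > 0 for w in topic_words))
    if sa_n == 0:
        return 0.0
    total = sum(p * sum(WCF.get((k, w), 0.0) for k in keywords) / sa_n
                for w, p in topic_words.items())
    return total / len(topic_words)

excerpt = ["desert", "custom", "city"]          # stemmed, non-stop words of Bk_e
topic = most_likely_topic(excerpt)
subject = max(SUBJECT_AREAS, key=lambda SA: subject_area_score(topic, SA))
print(topic, subject, SUBJECT_GRADE[subject])   # the grade becomes the predictor
```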

Example 7 Consider the book "The Scorpions of Zahir," denoted Bk2, written by Chris Brodien-Jones, which tells the story of a young girl who travels to the Moroccan desert with her family on a quest to save the ancient city of Zahir. Using Equations 4.2 and 4.3, TRoLL identifies "Cultures" as the subject area of Bk2, which is taught in the 5th grade (see Table 4.2). Consequently, "5" is the value of the subject area predictor of Bk2, which correlates with the publisher's grade-level range for Bk2, which is 5 and up. ✷

4.3.3 Analyzing Topical Information Metadata

In this section, we discuss the analysis of the metadata of a book Bk based on its topical information, which consists of the subject headings assigned to Bk by professional catalogers who are certified by the Library of Congress or other book cataloging organizations. A subject heading is a set of keywords used by librarians to categorize and index books according to their themes. Subject headings take on several forms [105], which include the inverted form, e.g., "Trolls, Green," the natural language form, e.g., "Green Trolls," and the subdivision form, e.g., "Fantasy—Mythical Creatures—Trolls—Green." Each component in a subdivision form is treated as a subject heading, whereas subject headings in inverted and natural language forms are each treated as a single subject heading. We discuss the predictors derived from the subject headings of Bk and the ones derived using the subject headings of books written by the author of Bk below.

Book Subject Heading Predictors

To compute the predictors derived from the subject headings of a book Bk, TRoLL examines (i) their total count, (ii) their associated grade levels, and (iii) their rate of occurrence.

• Total Count of Subject Headings. TRoLL uses the count of subject headings assigned to Bk as a predictor in Equation 4.1, since books that are more difficult to comprehend are

10Wordnet.princeton.edu

Figure 4.4: The number of subject headings assigned to books versus their readability levels determined by AR

often assigned more subject headings. We have empirically verified this claim by counting the number of subject headings assigned to each one of 5,718 randomly chosen books (available at ARbookfind.com) with their readability levels determined by Accelerated Reader (AR). The mappings between the number of subject headings and grade levels are depicted in Figure 4.4. The trend line in Figure 4.4 has a positive slope of about 2/9, which demonstrates that books of high readability levels are assigned, on average, more subject headings than books of lower reading levels.

Example 8 Consider "Arthur and the Cootie Catcher," denoted Bk3, which is a book written by Stephen Krensky and included in the Arthur the Aardvark children's series. Bk3 was assigned "aardvark," "cootie catchers," "fiction," "fortune telling," and "juvenile fiction" as its subject headings. (A portion of the OCLC record for Bk3, which includes its subject headings, is shown in Figure 4.5.) Five (the number of subject headings) is the value of the count of subject headings predictor of Bk3, one of the predictors used in Equation 4.1 for predicting the readability level of Bk3. ✷

• Subject Headings and Grade Levels. Besides using the count of subject headings, TRoLL considers the subject headings of Bk that were previously encountered in books with a known readability level (range) recommended by their respective publishers. A previously encountered subject heading is a heading observed during the one-time mapping process of TRoLL,

Figure 4.5: A portion of the OCLC record for the book "Arthur and the Cootie Catchers," available at http://www.worldcat.org/title/arthur-and-the-cootie-catcher/oclc/40444058/

which paired the subject headings assigned to each of the 8,737 books in the BookRL-SH dataset (introduced in Section 4.4.1) with the readability level range of the corresponding book determined by its publisher. To account for the possibility that a subject heading, SH, is paired with many books and therefore many readability levels, TRoLL considers all readability levels paired with SH as a frequency distribution, D. An analysis of the mean, median, lower bound, and upper bound readability levels in D yields four predictors, which are called frequency distribution predictors (FDP). Additionally, in order to reduce the effect of outlier readability levels in D, TRoLL further considers the mean, median, lower bound, and upper bound of the readability levels within one standard deviation of the mean of D, which generates another four predictors based on the mapping between subject headings and grade levels. The value of each of these eight predictors is calculated as

FDP_mi(Bk) = ( Σ_{j=1}^{|V|} m_i(D_j) ) / |V|    (4.4)

where m_i is either mean, median, lowerBound, upperBound, SD_mean, SD_median, SD_lowerBound, or SD_upperBound, V is the set of all the subject headings assigned to Bk that have been previously encountered, D_j is the frequency distribution corresponding to a subject heading SH_j ∈ V, and m_i(D_j) is the application of m_i to D_j. (A small computational sketch of the subject-heading predictors is given after this list, following Example 10.)

Example 9 To illustrate how the mean frequency distribution predictor is calculated, let us consider the three subject headings, i.e., {aardvark, fiction, juvenile fiction} = V (out of the five total), assigned to Bk3 (in Example 8) which have been previously encountered. According to Equation 4.4,

FDP_mean = [mean(D_aardvark) + mean(D_fiction) + mean(D_juvenile fiction)] / 3

We observe that nine of the books used in the one-time mapping process described above were assigned the subject heading "aardvark". The readability levels of the corresponding nine books, which are established by their publishers, are D_aardvark = <0, 0, 0, 1.5, 1.6, 1.6, 2.2, 2.3, 2.5>. Based on this distribution, mean(D_aardvark) = 1.3. In the same manner, TRoLL examines the readability level distributions for "fiction" and "juvenile fiction" to compute mean(D_fiction) = 3.81 and mean(D_juvenile fiction) = 3.10. Subsequently, FDP_mean = (1.3 + 3.81 + 3.10) / 3 = 2.74 is the value of one of the eight frequency distribution predictors, based on the mean metric. ✷

• Common Subject Headings. Besides considering the mapping of subject headings to their grade levels, TRoLL also counts the commonly occurring subject headings of Bk. If a subject heading was previously encountered during the mapping process, in which 38,315 subject headings (assigned to the books in BookRL-SH) were examined, it is considered a commonly occurring subject heading. We conjecture that commonly occurring subject headings are assigned to books with lower readability levels, since books for lower readability levels cover less advanced, specific topics. The predictors created by using commonly occurring subject headings are (i) the number of previously encountered subject headings assigned to Bk and (ii) the ratio of previously encountered subject headings to the total number of subject headings assigned to Bk.

Example 10 Consider Bk3 in Example 8. Since the subject headings "aardvark," "fiction," and "juvenile fiction" have been previously encountered, whereas the others have not, 3 and 3/5 are the values of the number of previously encountered subject headings and the ratio of previously encountered subject headings predictors, respectively. ✷
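The sketch below, referenced earlier, illustrates the frequency distribution predictors of Equation 4.4 and the two commonly occurring subject-heading predictors; the "aardvark" distribution is taken from Example 9, while the other two distributions are invented so that their means match the values stated there.

```python
import statistics

def fdp(distributions, metric):
    """Equation 4.4: average a metric over the readability-level
    distributions of the previously encountered subject headings."""
    return sum(metric(d) for d in distributions) / len(distributions)

def within_one_sd(d):
    """Keep only the levels within one standard deviation of the mean of d."""
    mu, sd = statistics.mean(d), statistics.pstdev(d)
    return [x for x in d if mu - sd <= x <= mu + sd] or d

# "aardvark" is the distribution listed in Example 9; the other two are
# invented so that their means equal the values stated there (3.81 and 3.10).
D = {"aardvark": [0, 0, 0, 1.5, 1.6, 1.6, 2.2, 2.3, 2.5],
     "fiction": [2.0, 3.5, 4.0, 5.74],
     "juvenile fiction": [1.8, 3.1, 4.4]}
encountered = list(D.values())

print(round(fdp(encountered, statistics.mean), 2))     # FDP_mean -> 2.74
print(round(fdp(encountered, statistics.median), 2))   # FDP_median
print(round(fdp(encountered, lambda d: statistics.mean(within_one_sd(d))), 2))

# Commonly occurring subject-heading predictors for Bk3 (Example 10):
assigned, previously_seen = 5, 3
print(previously_seen, previously_seen / assigned)      # 3 and 3/5 = 0.6
```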

Figure 4.6: Subject headings assigned to books written by Stephen Krensky, available at http://www.worldcat.org/wcidentities/lccn-n79-109188

Author’s Subject Headings Predictors

The subject headings assigned to books (including Bk) written by ABk, who is the author of Bk, are analyzed in the same manner as the subject headings assigned to Bk (as discussed in Section 4.3.3). The analysis of commonly occurring subject headings assigned to books written by ABk is captured in one predictor, which is the ratio of the number of previously encountered subject headings to the total number of subject headings assigned to books written by ABk. FDP_median, which is based on the subject headings assigned to all the books written by ABk, is established as another predictor. The median readability level was employed, since medians are less influenced by outliers, which often decrease the accuracy of a frequency distribution predictor. Note that only the median, instead of all eight of the frequency distribution predictors defined in Section 4.3.3, is considered for ABk, since subject headings assigned to books written by ABk are not always directly related to Bk, even though ABk often writes books at a particular readability level.

Example 11 Consider Stephen Krensky, the author of Bk3 in Example 8. The books written by the author have been assigned fifty subject headings, which are shown in Figure 4.6. The ratio of previously encountered subject headings of books written by Krensky, 30/50, is the value of the predictor for the author based on the previously encountered subject headings, whereas FDP_median for Krensky, another TRoLL predictor, is calculated to be 2.6. ✷

Figure 4.7: The OCLC mapping between the targeted readers and their corresponding audience levels is available at http://www.oclc.org/research/activities/audience.html

4.3.4 Analyzing Targeted Audience Metadata

TRoLL also considers the audiences targeted by books and their corresponding authors in predicting the readability level of books.

The Book Audience Level Predictor

For each book in its database, OCLC provides an audience level, which is a numerical value between 0 and 1 that indicates “the type of reader believed to be interested in a particular book” and is publicly available at OCLC.11 We have observed that there is a correlation, which is not a direct relationship, between the audience level of a book Bk and its readability level, which is expected, since authors often write at the reading comprehension level of their respective audiences [44]. The audience level of Bk is the value of the book audience level predictor used by TRoLL.

Example 12 Consider Bk3 in Example 8 again. As depicted in the OCLC record for Bk3 and shown in Figure 4.5, Bk3 is aimed towards primary school readers with its audience level score being 0.1, which is the value of the corresponding book audience level predictor as specified in the mapping between targeted audiences and audience levels provided by OCLC and as shown in Figure 4.7. ✷

11http://www.oclc.org/research/activities/audience.html

Figure 4.8: The audience level defined by OCLC for the author Stephen Krensky, available at http://www.worldcat.org/wcidentities/lccn-n79-109188

The Author’s Audience Level Predictor

Besides the audience level of Bk, OCLC also defines the audience level of its author ABk as the average of the audience levels of the books written by ABk, including Bk. In addition, OCLC provides the minimum (maximum, respectively) audience level of the books ABk has written. Based on these three audience level scores determined by OCLC, we define three other audience level predictors: the average, lowest (minimum), and highest (maximum) audience levels of ABk, which refer to the comprehension levels of the audience targeted by books written by ABk.

Example 13 Consider Stephen Krensky who is the author of Bk3 in Example 8. As depicted in the audience level record in OCLC and shown in Figure 4.8, the average, lowest, and highest audience levels for Stephen Krensky are 0.11, 0.10, and 0.15, respectively, which are the values of the corresponding three audience level predictors. ✷

As illustrated in Figure 4.8, the audience level does not directly match the grade level of an author. Instead, the audience level simply reflects the groups of readers targeted by an author at various levels (from values of 0 for kids to 1 for advanced readers).

4.3.5 The Predicted Readability Level of a Book

It is possible that some of the fifty-four predictors defined in Equation 4.1 for predicting the readability level of a book Bk cannot be calculated, since their corresponding metadata or content may be missing. Hence, TRoLL defines a number of regression models, which are variants of the one shown in Equation 4.1, that analyze diverse combinations of available predictors. Based on the distinct subsets of predictors that can be applied to books in the BookRL-RA dataset, there are 107 trained regression models used by TRoLL for predicting the readability levels of books. Given the calculated value of each of the predictors pertinent to Bk, TRoLL selects, among the trained regression models, the one that includes the most (values of) predictors available for Bk and that excludes any predictor not applicable to Bk to compute the readability level of Bk.
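A minimal sketch of this model-selection step is shown below, under the assumption that each trained model is stored together with the exact set of predictors it was trained on; the model store, coefficients, and predictor values are hypothetical.

```python
# Hypothetical store of trained regression models: each entry records the
# predictor names the model was trained on and its fitted coefficients.
MODELS = [
    {"features": {"audience_level", "fdp_mean"}, "intercept": 0.4,
     "coef": {"audience_level": 9.0, "fdp_mean": 0.3}},
    {"features": {"audience_level", "fdp_mean", "long_words"}, "intercept": 0.2,
     "coef": {"audience_level": 8.5, "fdp_mean": 0.3, "long_words": 12.0}},
]

def predict(book_predictors):
    """Pick the trained model that uses the most of the book's available
    predictors and none that are missing, then apply Equation 4.1."""
    available = set(book_predictors)
    usable = [m for m in MODELS if m["features"] <= available]
    model = max(usable, key=lambda m: len(m["features"]))
    return model["intercept"] + sum(
        model["coef"][f] * book_predictors[f] for f in model["features"])

print(predict({"audience_level": 0.1, "fdp_mean": 2.74}))
```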

Example 14 Based on the information available online for Bk1 as presented in Example 6, 52 predictors are applicable to Bk1. Using the corresponding regression model for Bk1, the grade level of Bk1 predicted by TRoLL is 6.8, which falls within the grade-level range, i.e., 5 to 7, defined for the book by its publisher. TRoLL also examines the 23 predictors applicable to Bk3 as presented in Example 8 and predicts 0.98 as the readability level for Bk3 using the corresponding regression model for Bk3, which correlates with the readability level, i.e., 1.0, defined by the publisher of Bk3. ✷

4.4 Experimental Results

In this section, we first introduce the dataset and metrics used for assessing the performance of TRoLL (in Sections 4.4.1 and 4.4.2, respectively). Thereafter, we present the results of the empirical studies conducted to evaluate the effectiveness of TRoLL in grade-level prediction and compare its prediction accuracy with existing widely-used readability formulas/analysis tools (in Section 4.4.3).

4.4.1 The Dataset

To the best of our knowledge, there is no existing benchmark dataset that can be used for assessing the performance of readability-level prediction tools on books. For this reason, we constructed our own dataset, BookRL, using data extracted from CLCD.com, a website established to assist teachers, parents, and librarians in choosing books for K-12 readers, Young Adults Book Central (Yabookscentral.com), Young Adults Library Service Association (ala.org/yalsa), ARbookfind.com,

Table 4.3: Sources of books used for creating BookRL

Online Sources     Number of Books     Online Sources       Number of Books
ARbookfind         4,037               Penguin              600
Bookadventure      1,017               Simon & Schuster     388
CLCD               6,667               YABC                 3,038
Lexile             2,154               Yalsa                226
Total              18,127

Lexile.com, and reputable publishers' websites. (See Table 4.3 for the source websites and their numbers of books included in BookRL.) BookRL consists of 18,127 books distributed among the K-12 grade levels, with their ranges determined by their publishers. Due to the lack of common consensus among researchers on the most accurate existing readability prediction tool [19], we consider publisher-provided grade levels as the "gold standard," since they are defined by human experts. It is an easier task for a publisher to provide a range of grade levels for a book than a single readability level, since the latter requires precision, whereas the former requires only an intelligent estimate. These human-assessed ranges of readability levels of books are adopted as the gold standard, which is applied to assess the performance of TRoLL and the readability formulas/analysis tools considered in our empirical study.

Among the 18,127 books in BookRL, 40% of the books, denoted BookRL-RA, were utilized to train the regression analysis model of TRoLL (as introduced in Section 4.3.1). Another 47% of the books, denoted BookRL-SH, were employed by TRoLL to perform a one-time mapping between subject headings and readability levels (as discussed in Section 4.3.3), and the remaining 13% of books in BookRL, denoted BookRL-Test, were used for assessing the performance of TRoLL and a number of well-known readability formulas/analysis tools. All the subsets of BookRL are disjoint.

4.4.2 Metrics

To assess the performance of TRoLL and other widely-used readability formulas/analysis tools, we compute their (absolute) error rates (ER) [42], each of which is the averaged absolute difference between the expected and the predicted grade levels of the books in BookRL-Test determined by the corresponding formula/tool.

$$ER = \frac{1}{|BookRL\text{-}Test|} \sum_{B \in BookRL\text{-}Test} |PR(B) - GL(B)| \qquad (4.5)$$

where |BookRL-Test| is the number of books in BookRL-Test, GL(B) is the grade level of a book B in BookRL-Test predicted by a readability formula/analysis tool, and PR(B) is either the lower or upper bound of the grade-level range of B determined by its publisher, whichever is closest to GL(B), which reflects the closeness of the predicted grade level to the grade-level range of B.

We have also applied the Wilcoxon signed-rank test, which is a non-parametric test based on the differences between pairwise samples [42], to determine the statistical significance of the error rates on grade-level prediction obtained by TRoLL with respect to their counterparts obtained by various readability formulas/analysis tools.
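To make the metric and the significance test concrete, the sketch below computes the error rate of Equation 4.5, taking PR(B) to be the publisher bound closest to the prediction, and applies SciPy's Wilcoxon signed-rank test to the per-book absolute errors of two tools. The book identifiers, predicted levels, and ranges are toy values, not BookRL-Test data.

```python
# Illustrative computation of the error rate in Equation 4.5 and of the Wilcoxon
# signed-rank test applied to per-book absolute errors (toy data only).
from scipy.stats import wilcoxon

def per_book_errors(predictions, publisher_ranges):
    """For each book, |PR(B) - GL(B)|, where PR(B) is the publisher bound
    (lower or upper) closest to the predicted grade level GL(B)."""
    errors = []
    for book, gl in predictions.items():
        lo, hi = publisher_ranges[book]
        pr = lo if abs(gl - lo) <= abs(gl - hi) else hi
        errors.append(abs(pr - gl))
    return errors

def error_rate(predictions, publisher_ranges):
    errs = per_book_errors(predictions, publisher_ranges)
    return sum(errs) / len(errs)

ranges = {"b1": (5, 7), "b2": (1, 2), "b3": (3, 4)}
troll  = {"b1": 6.8, "b2": 1.2, "b3": 4.0}      # hypothetical TRoLL predictions
other  = {"b1": 8.3, "b2": 3.0, "b3": 4.9}      # hypothetical competing tool

print(error_rate(troll, ranges), error_rate(other, ranges))
# Paired significance test on the per-book errors of the two tools:
stat, p_value = wilcoxon(per_book_errors(troll, ranges),
                         per_book_errors(other, ranges))
```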

4.4.3 Performance Evaluation

In this section, we (i) analyze the prediction accuracy of (various groups of) TRoLL’s predictors,

(ii) verify the correctness of using content and/or metadata for readability-level prediction, and (iii) compare the performance of TRoLL with other readability analysis formulas/tools.

Analyzing TRoLL’s Predictors

TRoLL uses up to fifty-four predictors to determine the readability level of a book. As shown in

Figure 4.1, these predictors can be grouped into seven categories according to the type of information on books and authors considered, which include traditional readability features, grammar concepts, subject areas, subject headings, and audience levels.

Figure 4.9: An analysis of the performance of (groups of) predictors considered by TRoLL based on each book Bk and its author ABk in BookRL-Test

Figure 4.9 shows the error rates obtained by each of the groups of predictors, along with the fraction of books in BookRL-Test to which the corresponding group of predictors is applicable. Based on the compiled results, we draw the following observations:

• The predictor on the audience level of a book provided by OCLC achieves the lowest error rate in readability-level prediction. This is anticipated, since there is a high correlation between the readability level of a book and its targeted audience, even though there is no direct mapping between an audience level and a readability level. Unfortunately, OCLC's audience level for a book is not always available. For example, only 53% of the books in BookRL-Test are assigned an audience level. The same applies to the audience level of an author provided by OCLC, for which only 66% of book authors in BookRL-Test are assigned an audience level.

• The subject area predictor receives the highest error rate, since books for emergent readers tend to include more pictures than text and this non-textual content is not utilized by TRoLL to identify US curriculum subject areas covered in books. However, this predictor is a suitable indicator of the readability levels of books targeting more advanced readers. Using BookRL-Test, we have empirically verified that this predictor yields at most a 0.14 error rate in analyzing the readability levels of books in the 5th to 8th grade levels.

• The most reliable predictors, which not only achieve relatively low error rates but are also widely applicable, are the two groups that analyze subject headings. These groups of predictors rely on information frequently available for books and thus are applicable to the majority of books examined by TRoLL. As shown in Figure 4.9, predictors based on subject headings are applicable to at least 97% of the books in BookRL-Test.

• The groups of predictors based on traditional readability features and grammar concepts are effective; however, these predictors are computed on excerpts of books, which are seldom available. For example, only 6% of the books in BookRL-Test come with their corresponding excerpts.

Validating the Accuracy of Using Content, Topical Information, and Targeted Audiences in Predicting Readability Levels

As discussed in Section 4.3, TRoLL examines two major types of information to determine the readability levels of books: content (if it is available) and metadata of books. We have validated the prediction accuracy of TRoLL when distinct sets of predictors based on content and/or metadata are considered using BookRL-Test.

• Using content-based information. The low error rate, which is 0.53, achieved by considering only the content-based predictors (as shown in Figure 4.10) is anticipated, since book content is a reliable source of information which has a direct impact on the degree of difficulty in understanding the content of a book, even if only an excerpt of the book is available for analysis. The error rate obtained by using content-based predictors is based on the 127 books with content in BookRL-Test.

• Relying on information other than content. We have further observed that in estimating the readability levels of books for emergent (i.e., K-2) readers, relying solely on content can generate readability levels that do not correlate with the ones recommended by publishers of the corresponding books. For example, the average error rate generated by using content-based predictors for 1st grade books in BookRL-Test with available sample text is 2.10, which is three times the average error rate (i.e., 0.70) generated using up to fifty-four predictors of TRoLL on books with sample content, as shown in Figure 4.11. Realizing that considering only content information can lead to imprecisely-predicted readability levels of books for emergent readers, a fact that correlates with the study discussed in [85],12 we have designed TRoLL so that it analyzes metadata on books with or without excerpts available online. In doing so, the error rate obtained using content- and metadata-based predictors on 1st grade books in BookRL-Test with available sample text decreases from the 2.1 (obtained solely based on content predictors) to 0.73.

Figure 4.10: Performance evaluation of TRoLL using distinct sets of content/metadata predictors on books in BookRL-Test

• Using metadata. As discussed in Sections 4.3.3 and 4.3.4, TRoLL considers two types of metadata predictors: topical information, i.e., subject headings, and targeted audience. The error rate obtained by using topical information predictors, which is 0.83 (as shown in Figure 4.10), is higher than the 0.75 overall error rate of TRoLL (as shown in Figure 4.11) but slightly lower than the error rate achieved by using only audience level predictors, which is 0.85 (as shown in Figure 4.10). This is expected, since subject headings are often available for books and are a consistent contributing factor in predicting the readability level of books, whereas audience levels are less widely available than other metadata/content predictors.

12The study verifies that using contents of books for young readers to predict their readability levels tends to yield overstated readability levels for the books.

Figure 4.11: Overall performance evaluation of TRoLL using up to 45 predictors applicable to each book in BookRL-Test

• The overall performance of TRoLL. Based on the results of the empirical study we conducted, we conclude that the readability prediction accuracy of TRoLL is consistent, regardless of the presence or absence of sample text of books. On the 127 books with available sample content in BookRL-Test, TRoLL achieves a 0.70 error rate (as shown in Figure 4.11), whereas among the 2,121 (= 2,248 - 127) books in BookRL-Test without sample text, the 0.76 error rate generated by TRoLL is within one grade level of the ranges specified by the publishers of the examined books. Moreover, the overall error rate of TRoLL on BookRL-Test, in which 94% of the (2,248) books are without text, is 0.75, which is only 3/4 of a grade level from the targeted grade level. This low error rate is not only an accomplishment of TRoLL, but also one that cannot be achieved by any of the existing readability formulas/analysis tools, since none of them can predict the grade level of books without excerpts.

Comparing TRoLL with Others

Using the 127 (out of 2,248) books in BookRL-Test with excerpts, we compared the grade-level prediction accuracy of TRoLL with a number of well-known readability formulas based on text content: Coleman-Liau [37], Flesch-Kincaid [76], Rix (Index) [11], and Spache [147], which we have implemented based on their formulas shown in Table 4.4. (See discussion on these readability formulas/tools in Section 4.2 and [19].)

Table 4.4: Popular readability formulas employed in our empirical study

  Measure          Formula
  Coleman-Liau     (0.0588 × Average number of letters per 100 words) − (0.296 × Average number of sentences per 100 words) − 15.8
  Flesch-Kincaid   (0.39 × Number of words / Number of sentences) + (11.8 × Number of syllables / Number of words) − 15.59
  Rix              Number of words with more than 6 characters / Number of sentences
  Spache           (0.121 × Average sentence length) + (0.082 × Number of unique unfamiliar words) + 0.659
                   (unfamiliar words can be found at http://goo.gl/nJhGMU)
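For reference, the sketch below implements the four formulas as they appear in Table 4.4. The tokenizer, the syllable counter, and the familiar-word set are crude stand-ins for the resources the original formulas rely on (e.g., the published Spache word list linked in the table), so the outputs are only approximations.

```python
# Sketch of the readability formulas in Table 4.4; helper functions are
# simplified approximations, not the formulas' official implementations.
import re

def words(text):      return re.findall(r"[A-Za-z']+", text)
def sentences(text):  return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def syllables(word):  # rough vowel-group count, floor of 1
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def coleman_liau(text):
    w, s = words(text), sentences(text)
    letters_per_100 = 100.0 * sum(len(x) for x in w) / len(w)
    sents_per_100 = 100.0 * len(s) / len(w)
    return 0.0588 * letters_per_100 - 0.296 * sents_per_100 - 15.8

def flesch_kincaid(text):
    w, s = words(text), sentences(text)
    return 0.39 * len(w) / len(s) + 11.8 * sum(syllables(x) for x in w) / len(w) - 15.59

def rix(text):
    w, s = words(text), sentences(text)
    return sum(1 for x in w if len(x) > 6) / len(s)

def spache(text, familiar_words):
    w, s = words(text), sentences(text)
    unfamiliar = {x.lower() for x in w if x.lower() not in familiar_words}
    return 0.121 * (len(w) / len(s)) + 0.082 * len(unfamiliar) + 0.659

sample = "The cat sat on the mat. It was a warm day."
print(flesch_kincaid(sample), rix(sample))
```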

Figure 4.12 shows that (i) the average error rate of the grade level predicted by TRoLL for a book with text, which is 0.7 and is the same error rate shown in Figure 4.11, is slightly more than half of a grade from the grade (range) determined by its publisher and (ii) the error rate of TRoLL is at least 26% lower than the error rates of its counterparts. The difference in error rate achieved by TRoLL over each of its counterparts is statistically significant, as determined using a Wilcoxon signed-rank test with p < 0.001.

We have further compared the performance of TRoLL with two other popular readability analysis tools widely used by grade schools and reading programs in the USA, the Accelerated Reader (AR) and Lexile. Even though the algorithms of AR and Lexile are not publicly accessible, we were able to find 897 books with AR scores and 314 books with Lexile scores among the books in BookRL-Test from ARbookfind.com and Lexile.com, respectively. As shown in Figure 4.13, TRoLL outperforms AR and is more accurate than Lexile in predicting the grade level of the analyzed books. The improvement in error rate achieved by TRoLL over either Lexile or AR is statistically significant as determined using a Wilcoxon signed-rank test with p < 0.001.

Human Assessment on TRoLL

We further evaluated TRoLL to determine whether its predicted readability levels are perceived as accurate by ordinary users, which offers another perspective on the performance of TRoLL.

The additional evaluation is based on real users' assessments of TRoLL, which goes beyond the performance analysis conducted and presented in previous subsections.

Figure 4.12: Performance evaluation on TRoLL and other readability formulas based on the 127 books with excerpts in BookRL-Test

Figure 4.13: Performance evaluation on TRoLL, AR, and Lexile based on books in BookRL-Test

To accomplish this task, we conducted a user study using Amazon's Mechanical Turk,13 a "marketplace for work that requires human intelligence", which allows individuals or businesses to programmatically access thousands of diverse, on-demand workers and has been used in the past to collect user feedback for multiple information retrieval tasks [80].

In the user study, we considered a set of 10 sample books with diverse readability levels. (The list of books used in the study, along with their corresponding readability levels predicted by TRoLL, is shown in Table 4.5.) We created a HIT (Human Intelligence Task) on Mechanical Turk so that for each sample book SB, each appraiser was presented six different readability levels for SB and asked to select the one that "best" captures the readability level of SB. The six readability levels were generated by AR, Coleman-Liau, Flesch-Kincaid, Rix, Spache, and TRoLL, respectively, which were presented in Section 4.4.3.

13https://www.mturk.com/mturk/welcome

Table 4.5: List of books and their TRoLL's readability levels employed in the user study conducted using Mechanical Turk

  Book                             Level    Book                     Level
  Arthur and the Cootie Catcher    1.5      Macbeth                  10.7
  Ender's Game                     4.1      Mansfield Park           10
  Five Little Kittens              0.9      Matilda                  3.9
  Good Night Moon                  0.0      Pride and Prejudice      6.2
  Love You Forever                 1.7      The Scarlet Letter       9.3

Figure 4.14: Distribution of Mechanical Turk appraisers’ responses in choosing the reading levels of 10 books computed by various readability-level prediction formulas/tools

The user study was conducted between October 25 and October 30, 2013 on Mechanical

Turk. Altogether, there were 127 responses among the HITs used in the study. Based on the corresponding set of responses provided by Mechanical Turk appraisers, we have verified that users tend to favor TRoLL's predicted readability level for a given book. (The distribution of the 127 collected responses among the different readability-level prediction formulas/tools is shown in Figure 4.14.) Note that the larger number of users who favor TRoLL over the remaining readability formulas/tools is statistically significant, as determined using the Wilcoxon signed-rank test with p < 0.05 for Flesch-Kincaid, Coleman-Liau, Rix, and Spache, and p < 0.1 for AR.

Figure 4.15: A screenshot of the online version of our readability prediction tool, TRoLL, which shows the readability level of a book, given its ISBN number

4.4.4 Trollie, an Online Prototype of TRoLL

We have implemented TRoLL and made it available as an online application, called Trollie. Through its user interface, a user can either enter (a portion of) the title and/or author or the ISBN of a book, which is a unique identifier of the book. In the latter case, Trollie computes and presents the readability level of the corresponding book (through TRoLL, the back-end readability analysis tool) to the user. (See Figure 4.15 for an example.) In the former case, Trollie first conducts a search14 of books that match the keywords captured in the (portion of the) title and/or author provided by the user. Thereafter, if the title and/or author is not unique, i.e., if multiple books partially match the user-provided keywords, the user is required to select,15 among the retrieved books, the desired one so that Trollie can generate its corresponding readability level. (See the screenshot of

Trollie shown in Figure 4.16 for an example.) By developing Trollie, we facilitate the task of automatically determining the readability levels of books, which assists children and teenagers (parents and teachers, respectively) in locating books that they (their K-12 readers, respectively) can comprehend.

14The search is currently powered by OpenLibrary.org.
15To speed up its processing time, Trollie archives the readability levels of books that have been computed over time through its online interface. Thus, the previously-computed readability level of a book is instantly displayed; otherwise, Trollie computes the readability level of a book on-the-fly whenever the calculate button is hit by the user. (See Figure 4.16 for an example.)
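The archiving behavior described in footnote 15 amounts to a simple cache in front of the TRoLL back end. A minimal sketch is given below; the compute_level callable and the ISBN shown are hypothetical stand-ins for the actual TRoLL call and a real request.

```python
# Illustrative cache-or-compute flow for Trollie (names are hypothetical):
# previously computed readability levels are returned instantly, otherwise the
# level is computed on the fly and archived for future lookups.
readability_cache = {}            # isbn -> readability level computed earlier

def trollie_lookup(isbn, compute_level):
    """compute_level stands in for the TRoLL back-end prediction call."""
    if isbn not in readability_cache:
        readability_cache[isbn] = compute_level(isbn)
    return readability_cache[isbn]

# Usage with a stub back end:
level = trollie_lookup("9780000000000", lambda _: 6.8)
```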

Figure 4.16: A screenshot of Trollie, which shows the readability levels of books, given (a portion of) a title provided by a user

4.5 Conclusions

Statistical data compiled over the last few years has shown that the reading ability of school-age children in America is falling behind that of children in most of the developed countries in the world. It is essential to encourage children/teenagers to develop good reading habits, which are crucial for them to succeed at school and in living a good life; this is the motivation behind TRoLL, the tool for the regression analysis of literacy levels that we have developed.

TRoLL is unique compared with existing readability formulas/analysis tools, since it can predict the grade level of a book even without a sample text of the book by simply analyzing metadata on the book that is publicly accessible from popular online sources. TRoLL is reliable, since it applies regression analysis on a number of predictors established by using textual features on books (if they are available), Library of Congress Subject Headings of books, US Curriculum subject areas identified in books, and information about book authors to predict the grade level of K-12 books. The development of TRoLL is a significant contribution to the educational community, since grade levels predicted by TRoLL can be used by (i) teachers, parents, and school librarians to identify reading materials suitable for their K-12 readers and (ii) K-12 students as a guide in making their own reading selections, which, in turn, can enrich their reading-for-learning experiences. The empirical study conducted on TRoLL has verified not only its prediction accuracy, but also its superiority over existing readability formulas/analysis tools.

For future work, we plan to extend TRoLL so that it can be used for predicting the grade levels of reading materials other than books, such as articles posted on various websites, which should facilitate the process of locating different (educational) materials, besides books, that are suitable for K-12 readers.

Chapter 5

What to Read Next?: Making Personalized Book Recommendations for K-12 Users

Abstract:

Finding books that children/teenagers are interested in these days is a non-trivial task due to the diversity of topics covered in huge volumes of books with varied readability levels. Even though K-12 readers can turn to book recommenders to look for books, the recommended books may not

satisfy their personal needs, since they could be beyond/below their readability levels or fail to match their topics of interest. To address these problems, we introduce BReK12, a book recommender that makes personalized suggestions tailored to each K-12 user U based on books available on a social bookmarking site that (i) are similar in content to the ones that are known to be of interest to U, (ii) have been bookmarked by users with reading patterns similar to U's, and (iii) can be comprehended by U. BReK12 is an asset to its users, since it suggests books that are appealing to its users and at grade levels that they can cope with, which can increase their reading selection choices and motivate them to read. We have also developed ReLAT, the readability analysis tool employed by BReK12 to determine the grade level of books. ReLAT is novel, compared with existing readability formulas, since it can predict the grade level of a book even if an excerpt of the book is not available. We have conducted empirical studies which have verified the accuracy of ReLAT in predicting the grade level of a book and the effectiveness of BReK12 over existing baseline recommendation systems.

5.1 Introduction

Reading, an essential skill required for acquiring knowledge, is an integral part of the educational system. It is imperative to encourage K-12 students to read, since research studies have confirmed the enormous influence of reading on students’ development as learners and members of society,

especially at an early age. Finding books that K-12 readers are interested in, however, is a difficult task due to the diversity of topics covered in the huge volume of books that target readers of different backgrounds and age groups. Recommender systems, which have been developed to suggest items of interest to their users, are supposed to alleviate the book-finding problem. However, existing recommenders employed at well-known book-affiliated websites, such as Novelist (www.ebscohost.com/novelist) and Amazon.com, adopt a "one-size-fits-all" strategy, which makes the same suggestions to different users on a given book without considering their individual preferences [89]. On the contrary, recommenders that offer personalized book suggestions overlook

the readability levels of their users, since they are conceived with a general audience in mind. As a result, even if a book recommended to a user U matches U’s interests, the book might include complex (simple, respectively) content that is beyond (below, respectively) the readability level of U, which fails to sustain the mission of matching users’ reading abilities with the suggested literature [6]. Moreover, these recommenders rely heavily on the existence of personal ratings

[156] assigned to books by users, which are rarely provided by K-12 users of the existing social bookmarking sites established for them. To address the aforementioned design problems of book recommendation systems, we have developed BReK12, a book recommender that makes personalized suggestions for K-12 users. To

locate books for a user U based on U’s reading ability and interests, BReK12, which is designed to be integrated into a social bookmarking site on books, analyzes U’s profile, i.e., books that have been bookmarked by U on the site. If U is a new user, BReK12 treats a book provided by U as U’s profile. In doing so, BReK12 bypasses the cold-start problem often encountered by

recommenders [133]. BReK12 first infers U’s readability level by analyzing the grade levels of books in his/her profile, which are determined using ReLAT, a robust readability level analysis tool

that we have developed. Hereafter, BReK12 identifies a set of candidate books, among the ones archived at the site, with grade levels compatible with the inferred readability level of U. For each candidate book, BReK12 determines its content similarity and readership similarity with books in the profile of U based on the brief descriptions of books that are publicly available online and users' bookmarking patterns on the site, respectively. An aggregation strategy [12] is adopted so that the top-10 candidate books, with grade levels appropriate for U and with the highest combined content- and readership-similarity scores, are recommended to U. BReK12 is unique, since its design goal is to suggest books to K-12 users that simultaneously match their interests and reading abilities, which in turn can motivate them to read. Unlike state-of-the-art recommenders [133], BReK12 simply employs readily available data, i.e., user bookmarks and brief descriptions of books, accessible from the social bookmarking sites where BReK12 is installed and book-affiliated websites, respectively, to make recommendations. Moreover, BReK12 applies similarity, besides exact, matching on words to recommend books that are similar in content to users' bookmarks, which otherwise could be ignored. BReK12 relies on ReLAT to determine the grade level of a book B based on the subject areas addressed in B, the readability level of the intended audience of books written by the author of B, subject headings assigned to B, and the grammatical/sentence structures in (an excerpt of)

B, if any is available. Unlike existing readability formulas/tools, such as Lexile Analyzer and Flesch-Kincaid, ReLAT can predict the readability level of a book even if (a sample of) the text of the book is unavailable, which is its novelty. Besides serving social bookmarking site users, BReK12 can also recommend books for each K-12 patron of a school/public library, assuming that the list of books of interest provided by the library patron and the book catalog used by the library are given. In addition, BReK12 can be adapted to make recommendations for users of any book-affiliated website based on books searched by the users on the site.

The remainder of this chapter is organized as follows. In Section 5.2, we discuss existing readability formulas and book recommenders. In Sections 5.3 and 5.4, we introduce ReLAT and BReK12, respectively. In Section 5.5, we present the results of the empirical studies conducted to (i) assess the performance of ReLAT and BReK12 and (ii) compare their performances with existing approaches. In Section 5.6, we give concluding remarks and directions for future work.

5.2 Related Work

In this section, we present a number of widely-used readability formulas/analysis tools and book recommenders and compare them with ReLAT and BReK12, respectively.

5.2.1 Readability Formulas/Analysis Tools

For almost a century, hundreds of readability formulas have been developed for predicting the readability level of a text [19, 38]. Traditional readability formulas, such as Flesch-Kincaid and

Coleman-Liau, rely on shallow features, which include word frequency and sentence length, to compute the grade level of a text. These formulas, however, often provide a rough estimation of text difficulty, which "(may) judge a nonsense passage as quite readable if the text's jumbled words are frequent, short, and organized into brief sentences" [19]. Recently-developed formulas are based on (i) linguistic features, such as the ones introduced by Feng et al. [51] and Collins-Thompson and Callan [38], (ii) pre-defined word lists, such as Lexile and Revised Dale-Chall formula, (iii) enhanced shallow features, such as the Advantage-TASA Open Standard for Readability (ATOS) formula, and (iv) non-textual features, such as SVM-Ranker [94], which examines images on books to predict their grade levels, and ReadAid [129], which considers information about the authors of a book and US Curriculum topics addressed in the book in addition to exploring the lexicographical and syntactical structures of the book. While most of these formulas are widely used and popular, none of them can predict the readability level of a book if (a sample of) its text is not available, which can be achieved by ReLAT. (See [19] for an in-depth discussion of existing readability formulas.)

5.2.2 Book Recommenders

A number of book recommenders [89, 117, 156] have been proposed in the past. Amazon's recommender [89] suggests books based on the purchase patterns of its users. Yang et al. [156] analyze users' access logs to infer their preferences and apply the traditional collaborative-filtering (CF) strategy, along with a ranking method, to make book suggestions. Givon and Lavrenko [57] combine the CF strategy and social tags to capture the content of books. Similar to the recommenders in [57, 156], the one in [144] adopts the standard user-based CF framework and uses a domain ontology to determine the users' topics of interest. The recommenders in [57, 144, 156] overcome the problem that arises due to the lack of initial information to perform the recommendation task, i.e., the cold-start problem. However, unlike BReK12, they are constrained by requiring historical data in the form of ratings to make recommendations, which may not always be available. Moreover, the recommender in [144] relies on the existence of a book ontology, which can be labor-intensive and time-consuming to construct. In making recommendations, Park and Chang [117] analyze individual/group behaviors, such as clicks and shopping habits, and features describing books, such as their library classification, whereas PReF [120] suggests books bookmarked by connections of a LibraryThing user. Similar to BReK12, PReF adopts a similarity-matching strategy, which differs from the exact-matching constraint imposed in [117] and a number of content-based recommenders [61, 109]. However, neither PReF nor any of the aforementioned recommenders considers the readability level of their users as part of their recommendation strategies. While BReK12 is not a recommender system for learning [133] per se, its design goal is to aid children/teenagers in developing good reading habits so that they can succeed at school and in living a good life. With that in mind, BReK12 is designed as an educational enrichment tool. (An in-depth discussion of existing recommender systems in the educational domain can be found in [96].)

5.3 A Grade Level Prediction Tool

As previously stated, existing readability formulas/analysis tools rely on at least a sample of a text to compute its readability level, which is a severe constraint, since text in a book is not always freely accessible due to copyright laws.

Figure 5.1: The grade-level prediction process of ReLAT, our proposed readability level analysis tool

To alleviate the text constraint, we have developed ReLAT1, which determines the grade level of any book using metadata on books publicly accessible from reputable online sources, in addition to sample texts of books only if they are available. Analyzing a book from multiple perspectives, ReLAT can predict its grade level even if an excerpt of the

book is missing. Furthermore, ReLAT instantly produces the grade level of a book, which is not always possible using commercial readability analysis tools. For example, Lexile offers scores for only approximately 150,000 out of the millions of books published worldwide in English, and requires direct involvement of its developers to generate the readability level of any book which has not yet been analyzed [19].

Figure 5.1 depicts the grade level prediction process of ReLAT. To determine the grade level of a book B, ReLAT takes as an input a unique identifier of B, which can be its ISBN number or its title and author. Thereafter, ReLAT extracts from reputable online sources, such as WorldCat.org and OpenLibrary.org, (i) an excerpt of B, (ii) topical information of B, and (iii)

information about the author2 of B, whatever is available. Based on the extracted data on B, ReLAT examines up to fifty-nine predictors to determine the grade level of B. The predictors are

1ReLAT is an earlier version of the readability-level prediction tool presented in Chapter 4. 2We have empirically verified that by considering only the first author of a book B, the processing time of ReLAT is minimized without affecting its accuracy in predicting the grade level of B. This is expected, since less than 10% of the hundreds of thousands of K-12 books we examined at ARbookfind.com and Scholastic.com are written by co-authors.

treated as contributing factors, i.e., evidence, which are used in analyzing the grade level of B. Due to the space constraint, we do not define each predictor.3 Instead, we present the nature of predictors in each category (as shown in Figure 5.1) below. (I) Predictors based on an excerpt of B examine

• Grammar Concepts. ReLAT analyzes the complexity of the grammar usage in B by counting the occurrences of various grammatical concepts in its sentences, which are present perfect,

modal verbs, past progressive tense, parts of speech, phrases, suffixes, prefixes, and key vocabulary words.

• Shallow Features. ReLAT considers a number of well-established textual features commonly used by traditional readability formulas: the average number of syllables per word, average sentence length, percentage of words with at least three syllables, average number of characters per word, and absolute number of words.

• Subject Areas. The US Curriculum dictates which subject areas should be taught at each K-12 grade level. For example, multiplication is taught at the 3rd grade while trigonometry at

the 12th grade. ReLAT relies on a Latent Dirichlet Allocation (LDA) model [22] to identify a set of representative keywords that best describes the content of B. Thereafter, ReLAT calculates the resemblance (using word-correlation factors [78]) between these keywords and each of the pre-defined subject areas established by the US Curriculum. The subject

area SA with the highest degree of resemblance is treated as the subject area of B and the grade level associated with SA is used for predicting the grade level of B.

(II) Predictors based on topical information of B analyze its Subject Headings, e.g., "Biography", which are short phrases that capture the topics covered in books and are used by the US Library of Congress to categorize books. Based on a mapping between Subject Headings and grade levels (that we have already determined using Subject Headings assigned to books with a known grade level), ReLAT identifies the grade levels that correspond to each of the Subject Headings of B. These grade levels are taken into account by ReLAT to determine the grade level of B.
3The list of all the predictors and the categories to which they belong can be found in tinyurl.com/dyp6ysl.

(III) Predictors that consider author information of B are based on the fact that, in general, K-12 authors write for a particular group of readers at a certain grade level. For this reason, ReLAT treats the audience level metric of an author defined by WorldCat, which captures the "intellectual level of the audience for which the item is intended," in addition to the topical information and subject areas of the author's other books (as introduced in I and II, respectively), as additional predictors that determine the overall grade level of B. Since the information required to compute the value of a predictor can be missing, it may not be possible to use all the predictors for predicting the grade level of B. ReLAT considers various combinations of the 59 predictors and applies multiple linear regression analysis

[107] (given below) on predictors applicable to B to predict its grade level.

$$c_0 + c_1x_1 + c_2x_2 + \cdots + c_nx_n = y \qquad (5.1)$$

where c_0 is the intercept term, c_i (1 ≤ i ≤ n) is the regression coefficient of predictor x_i, and y is the predicted grade level. Prior to applying Equation 5.1 to predict the grade level of a book, the intercept and coefficients associated with each applicable predictor are computed through a one-time training process using the ordinary least squares estimation method [107] on a training set of 8,737 K-12 books written by different authors that cover diverse topics and were extracted from various publishers' websites and the Children's Literature Comprehensive Database (CLCD.com). Each training

instance consists of the (values of) predictors that apply to a book B and the grade level of B determined by its publisher. Since publishers usually suggest a range of readability levels for each of their published books, ReLAT considers the average grade level of the range defined for B as its grade level during the training process. In doing so, ReLAT avoids any bias that might occur by assigning books their lower/upper grade bound during the regression training.
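A minimal sketch of this training and prediction step, using NumPy's least-squares solver, is shown below. The predictor values and grade-range midpoints are toy numbers for illustration only and do not come from the actual training set.

```python
# Sketch of the one-time OLS training behind Equation 5.1: fit an intercept and
# per-predictor coefficients, then predict a grade level from predictor values.
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares: returns intercept c0 and coefficients c1..cn."""
    X = np.asarray(X, dtype=float)
    A = np.hstack([np.ones((X.shape[0], 1)), X])    # prepend intercept column
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coeffs[0], coeffs[1:]

def predict(c0, c, x):
    return c0 + float(np.dot(c, x))                 # Equation 5.1

# Toy training set: rows are books, columns are predictor values; the targets
# are midpoints of hypothetical publisher-provided grade ranges.
X_train = [[11.2, 6.5], [5.0, 1.5], [8.3, 4.0], [14.1, 9.0]]
y_train = [6.0, 1.0, 3.5, 9.5]
c0, c = fit_ols(X_train, y_train)
print(predict(c0, c, [10.0, 5.5]))
```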

Example 15 Consider the book "Five Little Kittens" by Nancy Geller Jewell, which is a 32-page picture book. As stated in [85], unlike existing readability formulas that often overstate the difficulty of books for emergent readers, Accelerated Reader (AR) is a decent measure of the readability levels of books. Even though "Five Little Kittens" is appropriate for grades K-3 (i.e., readers age 5 to 8), as suggested by its publisher, its Flesch-Kincaid grade level is 4.6 and its Lexile score is 970 (which corresponds to grades 6-7). The AR grade level for the book is 2.6, which matches the target audience for the book. The AR score, however, suggests that children should be at least in the 2nd grade to read "Five Little Kittens," whereas the grade level predicted by ReLAT, which is 0.9, indicates that the book is suitable for Kindergartners, providing a grade level more compatible (than AR's) with the book publisher's. (See Section 5.5.3 for the performance analysis of ReLAT.) ✷

5.4 The Book Recommender

In this section, we present the design of BReK12. Given the profile4 P of a user U, BReK12 selects a set of candidate books, which are compatible with the readability level of U (determined in Section 5.4.1). Hereafter, BReK12 assigns a ranking score to each candidate book B, which is computed using an aggregation strategy (introduced in Section 5.4.4) on the content and readership similarity of B with respect to the books in P (defined in Sections 5.4.2 and 5.4.3, respectively).

5.4.1 Identifying Candidate Books

BReK12 recognizes that “reading for understanding cannot take place unless the words in the text are accurately and efficiently decoded” [112] and only recommends books with readability levels appropriate to its users.

BReK12 applies Equation 5.2 to estimate the readability level of a user U, denoted RL(U), based on the grade level of each book PB in U’s profile predicted by ReLAT, denoted ReLAT(PB). Note that only books bookmarked in a user’s profile during the most recent academic year are considered, since it is anticipated that the grade levels of books bookmarked by users gradually

increase as the users enhance their reading comprehension skills over time.

4BReK12 requires the existence of at least one book in the profile of a user to make the corresponding suggestions.

Table 5.1: A number of BiblioNasium books

  ID    Book Title                                             Grade Level
  Bk1   Mummies in the Morning                                 2.9
  Bk2   Captain Underpants and the Big, Bad Battle of the
        Bionic Booger Boy                                      4.7
  Bk3   The Hidden Boy                                         5.6
  Bk4   Dragon's Halloween                                     3.1
  Bk5   Junie B. Jones Smells Something Fishy                  3.0
  ...   ...                                                    ...

$$RL(U) = \frac{\sum_{PB \in P} ReLAT(PB)}{|P|} \qquad (5.2)$$

where |P| denotes the number of books in U's profile and the average is employed to capture the central tendency of the grade levels of books bookmarked by U. Having established U's readability level, BReK12 creates CandBks, the subset of books archived at the bookmarking site that are within a one-grade-level range from U's. By considering

books within one grade level above/below U's mean readability level, BReK12 recommends books with an appropriate level of complexity for U and grade levels close to those of books that have been read by U (as of the most recent academic year).

Example 16 Consider a BiblioNasium.com user U who has bookmarked a number of books from

Dav Pilkey’s “Captain Underpants” series. Based on the grade levels predicted by ReLAT for the books archived at BiblioNasium (see a sample of BiblioNasium books in Table 5.1) and U’s

readability level, which is 4, BReK12 does not include Bk1 or Bk3 in CandBks, since their grade levels are below/beyond the range deemed appropriate for U. ✷
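As a concrete illustration of Equation 5.2 and the candidate-book filter, the short sketch below reproduces the outcome of Example 16 using the grade levels from Table 5.1. The profile grade levels and the catalog dictionary are hypothetical stand-ins for a user's bookmarks and the books archived at the bookmarking site.

```python
# Sketch of Equation 5.2 and the candidate-book filter in Section 5.4.1.
def reader_level(profile_levels):
    return sum(profile_levels) / len(profile_levels)      # Equation 5.2

def candidate_books(catalog, rl, margin=1.0):
    """Keep books whose grade level is within one grade of RL(U)."""
    return {b for b, gl in catalog.items() if abs(gl - rl) <= margin}

profile = [3.8, 4.2, 4.0]            # hypothetical ReLAT levels of U's bookmarks
catalog = {"Bk1": 2.9, "Bk2": 4.7, "Bk3": 5.6, "Bk4": 3.1, "Bk5": 3.0}

rl = reader_level(profile)           # 4.0, as in Example 16
print(candidate_books(catalog, rl))  # Bk1 and Bk3 are filtered out
```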

5.4.2 Content Similarity Measure

BReK12 depends on the profile P of U to infer U’s interests/preferences. To determine the degree to which the content of a book B in CandBks appeals to U, BReK12 computes the content

similarity between B and each book PB in P, denoted CSim(B, P) as defined in Equation 5.3, using a "bag-of-words" representation on the brief descriptions of B and PB obtained from book-affiliated websites, such as Amazon. To compute CSim(B, P), BReK12 employs an enhanced version of the cosine similarity measure based on word-correlation factors [78], which relaxes the exact-matching constraint imposed by the cosine measure and explores words in the description of

B that are analogous to, besides the same as, words in the description of PB. Word-correlation factors in the correlation matrix introduced in [78] reflect the degree of similarity between any two words according to their (i) frequencies of co-occurrence and (ii) relative distances in Wikipedia(.org) documents. Approximately 880,000 documents covering a wide range of topics and written by more than 89,000 authors with varied writing styles were used to construct the matrix. Compared with synonyms/related words compiled by WordNet (wordnet.princeton.edu), in which pairs of words are not assigned similarity weights, word-correlation factors offer a more sophisticated measure of word similarity. In addition, word-correlation factors have been successfully employed to solve a number of content-similarity problems [120, 129].

$$CSim(B, P) = \max_{PB \in P} \frac{\sum_{i=1}^{n} VB_i \times VPB_i}{\sqrt{\sum_{i=1}^{n} VB_i^2} \times \sqrt{\sum_{i=1}^{n} VPB_i^2}} \qquad (5.3)$$

where B and PB are represented as n-dimensional vectors VB = <VB_1, ..., VB_n> and VPB = <VPB_1, ..., VPB_n>, respectively, n is the number of distinct words in the descriptions of B and PB, and VB_i (VPB_i, respectively), which is the weight assigned to word B_i (PB_i, respectively), is calculated as shown in the equations in Table 5.2.

HS_w5 in Table 5.2 is the set of words that are highly similar to, but not the same as, a given word w in the description of a book Bk, which is either B or PB, and |HS_w| is the size of HS_w. $tf_{w,Bk} = \frac{f_{w,Bk}}{\sum_{w \in Bk} f_{w,Bk}}$ is the normalized term frequency of w in Bk, and $idf_w = \log \frac{N}{n_w}$ is the inverse document frequency for w in the collection of books N archived at a social bookmarking site, where n_w is the number of books in N that include w in their descriptions.

5Two words are highly similar if their correlation factor is included in a reduced version of the aforementioned word-correlation matrix which contains 13% of the most frequently-occurring words in the Wikipedia documents.

Table 5.2: TF-IDF weighting scheme used in the enhanced cosine similarity measure in Equation 5.3

  Condition                       Weight Assignment
  B_i ∈ B and PB_i ∈ PB           VB_i = tf_{B_i,B} × idf_{B_i}  and  VPB_i = tf_{PB_i,PB} × idf_{PB_i}
  B_i ∈ B and PB_i ∉ PB           VB_i = tf_{B_i,B} × idf_{B_i}  and  VPB_i = (Σ_{c ∈ HS_{B_i}} tf_{c,PB} × idf_c) / |HS_{B_i}|
  B_i ∉ B and PB_i ∈ PB           VB_i = (Σ_{c ∈ HS_{PB_i}} tf_{c,B} × idf_c) / |HS_{PB_i}|  and  VPB_i = tf_{PB_i,PB} × idf_{PB_i}

Relying on the tf-idf weighting scheme, BReK12 prioritizes discriminating words that capture the content of its respective book.

The max function in Equation 5.3 emulates the “most pleasure” strategy (commonly applied in game theory and group profiling [133]). Applying this strategy, BReK12 selects the highest

possible score among the ones computed for each PB in P and B. The larger the number of

exact-matched or highly-similar words in the descriptions of both B and PB is, the more likely B is a relevant recommendation for U. Moreover, BReK12 adopts the widely-used cosine measure, which has been effectively applied to determine the degree of resemblance between any two items in content-based recommenders [109]. While content-similarity is computed by BReK12 using book descriptions, other textual information on books, such as their tag representations, can be used. Tags, however, are not always publicly available.

Example 17 To illustrate the merit of using the enhanced cosine similarity measure in Equation 5.3 to compute CSim(B, P ), consider the profile P of user U in Example 16 and two of the books,

Bk2 and Bk4 as shown in Table 5.1. Using the traditional cosine measure, Bk2 and Bk4 yield the same content similarity score with respect to P . However, employing the enhanced cosine

similarity measure, BReK12 obtains a more accurate content-similarity score for each book, since

CSim(Bk2, P ) = 0.57 and CSim(Bk4, P ) = 0.39. These scores reflect that U is likely more interested in books similar to the ones in the “Captain Underpants” series (by Dav Pilkey) than books about “dragons,” which we have verified by manually examining the profile of U. ✷
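A compact sketch of the enhanced cosine measure (Equation 5.3 with the weighting scheme of Table 5.2) is given below. The highly_similar lookup is a toy substitute for the word-correlation matrix of [78], and the idf values are assumed to have been precomputed over the book collection; descriptions are assumed to be already tokenized into word lists.

```python
# Hedged sketch of CSim (Equation 5.3) with the Table 5.2 weighting scheme.
import math
from collections import Counter

def term_freq(description):
    counts = Counter(description)                     # description: list of words
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def weight(word, tf_d, idf, highly_similar):
    """Table 5.2: exact tf-idf if the word occurs in the description; otherwise
    the averaged tf-idf, within the same description, of its highly similar words."""
    if word in tf_d:
        return tf_d[word] * idf.get(word, 0.0)
    hs = highly_similar.get(word, set())
    if not hs:
        return 0.0
    return sum(tf_d.get(c, 0.0) * idf.get(c, 0.0) for c in hs) / len(hs)

def enhanced_cosine(desc_b, desc_pb, idf, highly_similar):
    tf_b, tf_pb = term_freq(desc_b), term_freq(desc_pb)
    vocab = set(tf_b) | set(tf_pb)
    vb  = [weight(w, tf_b,  idf, highly_similar) for w in vocab]
    vpb = [weight(w, tf_pb, idf, highly_similar) for w in vocab]
    dot = sum(a * b for a, b in zip(vb, vpb))
    norm = math.sqrt(sum(a * a for a in vb)) * math.sqrt(sum(b * b for b in vpb))
    return dot / norm if norm else 0.0

def csim(desc_b, profile_descriptions, idf, highly_similar):
    # Equation 5.3: the "most pleasure" max over the books in the profile P.
    return max(enhanced_cosine(desc_b, d, idf, highly_similar)
               for d in profile_descriptions)
```

In a full implementation, the similarity lookup would presumably be restricted to the reduced word-correlation matrix mentioned in footnote 5.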

5.4.3 Readership Similarity Measure

CSim locates books that are similar in content, to a certain degree, to the ones users have read in the past. This measure, however, does not consider books that are dissimilar in content but match users' specific preferences/interests. Hence, BReK12 explores another dimension of resemblance between each book B in CandBks and books in U's profile P by using the Lennon similarity measure [87] to perform co-readership analysis on users' bookmarks on a social bookmarking site. This readership similarity measure (as defined in Equation 5.4) is based on the popular item-item similarity approach employed by collaborative-filtering recommenders to examine patterns of co-occurrence of items bookmarked by users to make recommendations [24].

$$RSim(B, P) = \max_{PB \in P} \left( 1 - \frac{\min(|S_B - S_\cap|, |S_{PB} - S_\cap|)}{\min(|S_B - S_\cap|, |S_{PB} - S_\cap|) + |S_\cap|} \right) \qquad (5.4)$$

where S_B (S_PB, respectively) is the set of users who bookmarked B (PB, respectively), S_∩ = S_B ∩ S_PB, |S_∩| is the number of users who bookmarked both B and PB, |S_B − S_∩| (|S_PB − S_∩|, respectively) is the number of users who bookmarked B, but not PB (PB, but not B, respectively) at a social bookmarking site, and the use of the max function was discussed in Section 5.4.2.

In Equation 5.4, the min(imum) of the two differences, |S_B − S_∩| and |S_PB − S_∩|, is chosen, since by using the smaller of the two differences, we can more accurately capture the similarity between B and PB. As a difference reflects the number of users who bookmark B, but not PB (or vice versa), a smaller difference signifies that proportionally a larger number of users who bookmark one book also prefer the other book, which is a better indication of the degree of readership similarity between the two books.

Example 18 To illustrate the usefulness of the readership similarity measure for making recommendations, consider Bk5 in Table 5.1 and the book "The Adventures of Captain Underpants" by Dav Pilkey, denoted PB, which is a book in the profile of user U introduced in Example 16. The contents of the books differ, since PB is about the adventures of two fourth graders and a superhero, whereas Bk5 details the events that occur when Junie, the main character in the book, takes her pet to school. The books, however, share some common thread of interest to a group of BiblioNasium users who have bookmarked both, since the books include characters of similar age, are written in similar literary styles, and share the same genre. Relying partially on the readership similarity between Bk5 and P, which is 0.67, BReK12 retains Bk5 as one of the top-10 recommendations for U, which otherwise would have been ignored due to the lack of related content between the two books. ✷
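The readership-similarity computation in Equation 5.4 only requires the sets of users who bookmarked each book. A short sketch follows; the user sets shown are toy data, not BiblioNasium bookmarks.

```python
# Sketch of RSim (Equation 5.4), the Lennon-style co-readership measure.
def rsim(book_users, profile_users_by_book):
    """book_users: set of users who bookmarked candidate book B;
    profile_users_by_book: list of user sets, one per book in the profile P."""
    scores = []
    for pb_users in profile_users_by_book:
        common = book_users & pb_users
        only_b  = len(book_users - common)
        only_pb = len(pb_users - common)
        smaller = min(only_b, only_pb)
        denom = smaller + len(common)
        scores.append(1 - smaller / denom if denom else 0.0)
    return max(scores)

# Toy example: the candidate book shares three bookmarkers with a profile book.
print(rsim({"u1", "u2", "u3", "u4"}, [{"u2", "u3", "u4", "u5", "u6"}]))  # 0.75
```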

5.4.4 Rank Aggregation

Using the computed content- and readership-similarity scores of each book B (with a readability level appropriate for U) in CandBks, BReK12 applies the Borda Count voting scheme [12] to determine the ranking score for B. The Borda Count voting scheme is a positional-scoring procedure such that given k (≥ 1) candidates, each voter casts a vote for each candidate according to his/her preference. A candidate that is given a first-place vote receives k-1 points, a second-ranked candidate k-2 points, and so on up to the last candidate, who is awarded no points. Hereafter, the points assigned to each candidate across all the voters are added up and the candidate with the most points wins. The Borda Count strategy, which has been successfully applied to different information retrieval tasks [12], is employed by BReK12 to generate a single ranking score for B, denoted

Borda(B) as defined in Equation 5.5, that regards the content- and readership-similarity scores as equally important in determining the degree to which a user is interested in B. Using Equation 5.5,

BReK12 assigns (i) k = |CandBks|, which is the number of candidate books selected for a user U, and (ii) C = 2, which is the number of voters, i.e., the two ranked lists of similarity scores on books computed in Sections 5.4.2 and 5.4.3, respectively. Candidate books with the top-10 Borda scores are recommended to U.

$$Borda(B) = \sum_{c=1}^{C} (k - S_c^B) \qquad (5.5)$$

where S_c^B is the position of B in the c-th ranked list to be fused. BReK12 adopts Borda as an aggregation strategy, since (i) its combination algorithm is simple and efficient, requiring neither training nor compatible relevance scores, which may not be available, and (ii) its performance is competitive with other existing combination strategies [12].
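The Borda aggregation of Equation 5.5 reduces to a straightforward positional count over the two ranked lists; a sketch with toy rankings follows (the candidate ids are illustrative).

```python
# Sketch of the Borda Count aggregation (Equation 5.5): each ranked list awards
# k - position points to every candidate, and the totals give the final order.
def borda(ranked_lists):
    """ranked_lists: one list per voter, candidates ordered from best to worst."""
    k = len(ranked_lists[0])
    points = {}
    for ranking in ranked_lists:
        for position, book in enumerate(ranking, start=1):
            points[book] = points.get(book, 0) + (k - position)
    return sorted(points, key=points.get, reverse=True)

content_rank    = ["Bk2", "Bk5", "Bk4"]        # ordered by CSim (toy)
readership_rank = ["Bk5", "Bk2", "Bk4"]        # ordered by RSim (toy)
print(borda([content_rank, readership_rank])[:10])   # top-10 recommendations
```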

5.5 Experimental Results

In this section, we first introduce the datasets, metrics, and evaluation protocol used for assessing the performance of ReLAT and BReK12 (in Sections 5.5.1 and 5.5.2, respectively). Hereafter, we present the results of the empirical studies conducted for evaluating the effectiveness of ReLAT and BReK12 (in Sections 5.5.3 and 5.5.4, respectively).

5.5.1 The Datasets

To the best of our knowledge, there is no benchmark dataset that can be used for evaluating readability-level prediction tools on books. Thus, to evaluate ReLAT we constructed BookGL,6 using data extracted from CLCD.com, a website established to assist teachers, parents, and librarians in choosing appropriate books for K-12 readers, the Young Adults Book Central (YAbookscentral.com), and reputable publishers' websites. BookGL consists of 2,248 books distributed among the K-12 grade levels with the grade level (range) of each book assigned by its publisher. (See Table 5.3 for the source websites and their number of books in BookGL.) Due to the lack of consensus among researchers on the accuracy of existing readability prediction tools [19], we consider publisher-provided grade levels as the "gold-standard," since they are defined by human experts.

Even though the BookCrossing dataset (informatik.uni-freiburg.de/∼cziegler/BX) has been employed to evaluate book recommenders tailored to a general audience, it is not specifically designed for assessing the performance of book recommenders for K-12 users. Hence, we used data provided by BiblioNasium, which is a safe and secure social networking site on books developed

6BookGL and the set of books used to train ReLAT’s multiple regression predictor (discussed in Section 5.3) are disjoint.

Table 5.3: The BookGL dataset

  Online Sources      Number of Books
  CLCD.com            663
  Penguin Group       498
  Simon & Schuster    388
  YABC                699

exclusively for children and teenagers, to evaluate BReK12. The dataset consists of the profile of, i.e., books that have been bookmarked by, each of the 297 BiblioNasium users who joined the site within its first month of being launched. As the design methodology of BReK12 relies on brief descriptions and predicted grade levels of books, we extracted the former from Amazon.com and predicted the latter using ReLAT.

5.5.2 Metrics and Evaluation Protocol

To assess the performance of ReLAT, we apply the (absolute) error rate (ER) [42], which averages the absolute difference between the expected and the predicted grade level over the books in BookGL.

$$ER = \frac{1}{|BookGL|} \sum_{B \in BookGL} |PR(B) - GL(B)| \qquad (5.6)$$

where |BookGL| is the total number of books in BookGL, GL(B) is the predicted grade level of B by a readability formula/analysis tool, and PR(B) is either the lower or upper bound of the grade-level range of B determined by its publisher, whichever is closest to GL(B). (Recall that publishers often assign a grade range, not a level, to a book.)

We evaluate BReK12 using Precision@10, Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (nDCG) [42]. Precision@10 measures the fraction of the top-10 ranked recommendations that are relevant, whereas MRR computes the average ranking position

of the first relevant recommended book. nDCG determines the overall ranking performance of BReK12 and penalizes relevant books ranked lower in the recommendation list. The penalization

is based on a reduction, which is logarithmically proportional to the position of each relevant book in a ranked list. We adopt the popular five-fold cross validation strategy to evaluate BReK12 (and recommender systems considered for the comparison purpose). In each of the five repetitions, 80% of

the books bookmarked by a user U in the BiblioNasium dataset are used to create U's profile and the remaining 20% are reserved for the testing purpose. A recommended book B is treated as relevant to U if it is included in the 20% of the books withheld for the testing purpose, and is non-relevant otherwise, which is a commonly-employed protocol for assessing the performance of recommendation systems [61]. Since only withheld books are considered relevant, it is not possible to account for potentially relevant books a user has not bookmarked, which is a well-known limitation of this evaluation protocol. As the limitation affects all the recommenders evaluated in the conducted empirical studies, the results are consistent for the comparative purpose.
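For completeness, the sketch below shows how the three metrics could be computed for a single user under this protocol, with the withheld books treated as the only relevant items; MRR and the reported averages would then be taken over all users. The recommendation list and withheld set are toy data.

```python
# Illustrative per-user computation of Precision@10, reciprocal rank, and nDCG
# with binary relevance (withheld books are the only relevant items).
import math

def precision_at_10(recommended, relevant):
    top = recommended[:10]
    return sum(1 for b in top if b in relevant) / 10.0

def reciprocal_rank(recommended, relevant):
    for i, b in enumerate(recommended, start=1):
        if b in relevant:
            return 1.0 / i
    return 0.0

def ndcg(recommended, relevant, k=10):
    dcg = sum(1.0 / math.log2(i + 1)
              for i, b in enumerate(recommended[:k], start=1) if b in relevant)
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

recs = ["Bk2", "Bk7", "Bk5", "Bk9"]          # toy ranked recommendations
withheld = {"Bk5", "Bk9"}                    # toy withheld (relevant) books
print(precision_at_10(recs, withheld), reciprocal_rank(recs, withheld), ndcg(recs, withheld))
```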

5.5.3 Performance Evaluation of ReLAT

Using the 127 books in BookGL with excerpts, we compared the grade-level prediction accuracy of ReLAT with a number of well-known readability formulas: Coleman-Liau, Flesch-Kincaid, Rix, and Spache, which we have implemented. (See detailed discussion on these readability formulas in [19].) Figure 5.2(a) shows that (i) on average the grade level predicted by ReLAT for a book with text (in BookGL) is about half a grade from the grade (range) determined by its publisher and (ii) ReLAT's error rate is at least 33% lower than the error rates of its counterparts. We also evaluated the performance of ReLAT in predicting the grade level of books whose excerpts cannot be obtained online. Among the 2,121 books in BookGL without sample text, the 0.82 error rate generated by ReLAT shows that ReLAT's predictions are less than one grade level above/below the ranges specified by the publishers of the books. This low error rate is not only an accomplishment of ReLAT, but also one that cannot be achieved by any of the existing readability formulas/analysis tools, since none of them can predict the grade level of books without

excerpts. The overall error rate of ReLAT on BookGL, in which 94% of the books are without text, is 0.81, which is within one grade level of the targeted grade level.

Figure 5.2: Performance evaluation of ReLAT: (a) Readability Formulas, (b) Analysis Tools

We further compare the performance of ReLAT with two popular readability analysis tools widely accepted by grade schools and reading programs in the USA: Accelerated Reader (AR) and Lexile. Recall that the algorithms developed to compute AR and Lexile scores are not publicly accessible, but we were able to find 897 books with AR scores and 314 books with Lexile scores among the books in BookGL at ARbookfind.com and Lexile.com, respectively. As shown in Figure

5.2(b), ReLAT outperforms AR and is significantly more accurate than Lexile in predicting the grade level of the analyzed books (in BookGL).

5.5.4 Performance Evaluation of BReK12

In this section, we verify the correctness of the design methodology of BReK12 and compare its performance with a number of existing recommendation strategies.

Effectiveness of BReK12

The results of the study conducted to validate the methodologies applied by BReK12 for selecting and ranking books to be recommended for K-12 users are presented as follows:

• The Enhanced Cosine (EC) measure employed by BReK12 to perform content-similarity matching outperforms the Traditional Cosine measure (as shown in Figure 5.3), which has been verified based on the improvements in Precision@10, MRR, and nDCG that are statistically significant as determined by using the Wilcoxon signed-rank test [42] (with p < 0.05).

• We have observed statistically significant improvement (p < 0.05) when books with unsuitable readability levels for the respective BiblioNasium users are excluded prior to applying the content-similarity matching on potential book recommendations. (See EC versus EC + ReLAT Filtering in Figure 5.3.)

• The statistically significant improvements (p < 0.01) on various performance metrics achieved by EC + Readership over EC indicate that examining both the content and readership similarity of candidate books with respect to books in the profile of each BiblioNasium user increases the accuracy of the recommendations.

• We have empirically verified that recommending books that match users’ reading abilities without considering their individual preferences is not beneficial. This is anticipated, since

users might not find particular topics addressed in books appealing even if the books are suitable to their readability levels.

• When books beyond/below users' readability levels are not chosen as candidate books by BReK12, its overall effectiveness increases, according to the statistically significant improvements (p < 0.01) achieved by BReK12 over EC + Readership. This comparison validates the correctness of BReK12's design methodology, i.e., to recommend books of interest to

individual users that they can read and understand.

Figure 5.3: Performance evaluation of TVS, L-Cos, IICF, and BReK12 using the BiblioNasium dataset

Comparing BReK12 with Other Recommenders

As previously stated, no other personalized book recommender explicitly considers the reading abilities of its users. Thus, we have compared the performance of BReK12 with a number of recommenders developed for a general audience. Tag Vector Similarity (TVS) [61], L-Cosine (L-Cos) [109], and Item-Based Collaborative Filtering (IICF) [24] were employed for comparison purposes, as opposed to other state-of-the-art book recommenders introduced in Section 5.2, since the latter require personal ratings on (K-12) books provided by individual users or social connections established by social bookmarking site (K-12) users, neither of which is archived by BiblioNasium. To determine which books should be recommended to a user, TVS applies the cosine similarity measure on TF-IDF tag vector representations of books,7 whereas L-Cos considers the weighted frequency of each keyword in the description or title of a book. IICF, on the other hand, calculates the degree of similarity between any two books based on the number of users who have bookmarked both books on a social bookmarking site, which is a variation of the popular collaborative-filtering strategy commonly adopted for making recommendations.

As shown in Figure 5.3, BReK12 outperforms its counterparts based on the evaluation metrics introduced in Section 5.5.2. The improvements achieved by BReK12 over the others are statistically significant (with p < 0.01). According to the computed MRR, on average BReK12

7Tag descriptions on books can be extracted from the INEX 2012 Social Book Search Track dataset (inex.mmci.uni-saarland.de/tracks/books/).

users are required to browse through at least one (≈ 1/0.60 ≈ 1.67) recommended book before locating a relevant one, whereas users of TVS, L-Cos, and IICF are required to scan through at least 4 (≈ 1/0.23 ≈ 4.3 and ≈ 1/0.21 ≈ 4.8) or 2 (≈ 1/0.45 ≈ 2.2) recommended books, respectively. The Precision@10 values reflect that, in general, close to 8 (out of 10) books suggested by BReK12 are relevant, as opposed to close to 4, 2, and 6 relevant books recommended by TVS, L-Cos, and IICF, respectively. The nDCG scores indicate the superiority of BReK12 over TVS, L-Cos, and IICF in ranking relevant books to be recommended higher in the list of suggested books. While TVS, L-Cos, and IICF consider only textual descriptions of books or bookmarking patterns of users on a social site, BReK12 examines multiple contributing factors to identify potential recommendations, which increases the number of relevant reading selections for the users.
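For reference, the sketch below shows how the three reported metrics can be computed from a ranked recommendation list and the set of relevant books for a user; it follows the standard definitions rather than reproducing the exact evaluation scripts used in this study. Note that an MRR of 0.60 corresponds to the first relevant book appearing, on average, at rank 1/0.60 ≈ 1.67.

```python
import math

def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k recommended books that are relevant."""
    return sum(1 for b in ranked[:k] if b in relevant) / k

def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant book (0 if none appears); MRR averages this over users."""
    for i, b in enumerate(ranked, start=1):
        if b in relevant:
            return 1.0 / i
    return 0.0

def ndcg_at_k(ranked, relevant, k=10):
    """Normalized Discounted Cumulative Gain with binary relevance."""
    dcg = sum(1.0 / math.log2(i + 1) for i, b in enumerate(ranked[:k], start=1) if b in relevant)
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal > 0 else 0.0

ranked = ["b3", "b7", "b1", "b9"]
relevant = {"b7", "b9"}
print(precision_at_k(ranked, relevant, k=4), reciprocal_rank(ranked, relevant))  # 0.5 0.5
```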

5.6 Conclusions and Future Work

We have introduced BReK12, a unique recommender tailored to K-12 readers, which makes personalized suggestions on books that satisfy both the preferences and reading abilities of its users.

Unlike current state-of-the-art recommenders that rely on the existence of users' historical data in the form of ratings, which are missing among the K-12 users, BReK12 simply considers readily available brief descriptions on books, patterns of co-occurrence among books bookmarked on a social bookmarking site on which BReK12 is installed, and grade levels of books computed using our newly-developed ReLAT. ReLAT is novel, since it can determine the grade level of any book (even if a sample of the text of a book is unavailable) by analyzing the Subject Headings of books, US Curriculum subject areas identified in books, and information about the authors of books. Since children read more books when they can choose what to read [7], a significant contribution of BReK12 is to provide K-12 readers a selection of suitable books to choose from that are not only appealing to them, but can also be comprehended by them. The conducted experiments demonstrate (i) the accuracy of ReLAT and its superiority over existing readability formulas/analysis tools, and (ii) the effectiveness of BReK12, which outperforms baseline recommenders in suggesting books for K-12 users.

As part of our future work, we plan to extend BReK12 so that it can suggest reading materials for struggling readers, i.e., readers with learning disabilities and those who learn English as a second language, for whom the grade level of a recommended book is an important factor to be considered.

Chapter 6

Automating Readers’ Advisory to Make Book Recommendations for K-12 Readers

Abstract:

The academic performance of students is affected by their reading ability, which explains why reading is one of the most important aspects of English curriculums. Promoting good reading habits among K-12 students is essential, since research studies have confirmed the enormous influence of reading on students' development as learners and members of society. In accomplishing this task, it is indispensable to provide readers with engaging reading selections that can motivate them to read. Unfortunately, existing book recommenders have failed to offer adequate reading choices for K-12 readers, since they either ignore the reading abilities of their users or cannot acquire the much-needed information to make recommendations due to privacy issues. To address these problems, we have developed Rabbit, a book recommender that emulates the readers' advisory service offered at school/public libraries. Rabbit considers the readability levels of its readers and determines the facets, i.e., appeal factors, of books that evoke subconscious, emotional reactions in a reader. The design of Rabbit is unique, since it adopts a multi-dimensional approach to capture the reading abilities, preferences, and interests of its readers, which goes beyond traditional book content/topical analysis. The conducted empirical studies have verified that Rabbit outperforms the recommenders used at GoodReads, NoveList, and other readability-based book recommendation systems.

6.1 Introduction

Besides watching TV, text messaging, and playing computer games, children and teenagers these days often spend spare time browsing through the Web, looking for something fun to do. YouTube, Last.fm, and Facebook are examples of popular social media sites among young audiences which offer free entertainment on demand anytime and anywhere throughout the day. Statistics reported by the New York Times1 have shown that children/teenagers spend an average of seven hours a day on (smart) devices. This is alarming, since a significant number of children/teenagers are underachieving at school, especially in reading. For example, according to the 2013 National Assessment of Educational Progress, only 32% of American 4th graders are proficient in reading [110]. Kids should allocate some of their free time to reading to enhance their educational experience. In order to turn the tide, educators, parents, government agencies, and private organizations must join forces to encourage kids to read. Unfortunately, very few existing (social) websites/(non-)government agencies are equipped with the resources/technologies to cope with the problem.

To motivate K-12 students to read, it is imperative to avoid presenting these readers with books that are either too easy/difficult to read or involve topics unappealing to them, which could diminish their interest in reading [7]. In fact, finding the right books for the right audience is not easy [152]. Even though existing recommenders can assist readers in finding books, they rely on

either large historical data in the form of personal tags/ratings (which might not be available) or readers’ connections/interactions on a social site (which may not be accessible for K-12 readers due to privacy issues). Furthermore, these systems ignore the reading abilities of the respective readers in recommending books. To address the shortcomings of these design methodologies, we have

developed Rabbit (Readers’ advisory based book recommendation tool) that makes personalized book suggestions for K-12 readers. Rabbit is unique, since it simulates readers’ advisory (RA), a service offered at school/public libraries which are established to champion and encourage reading. RA involves knowledgeable

professionals in finding reading materials of interest for their patrons [138]. During the search

1nytimes.com/2010/01/20/education/20wired.html?_r=0

process, librarians identify the topics, contents, and appeal factors, i.e., literary elements, appealing to individual readers and suggest books accordingly. By offering the RA service, which is in high demand even in the era of Web 2.0 [159], libraries provide "a vital link between library materials and readers" [138]. Unfortunately, as stated in [67], (young) readers may not approach readers' advisors to ask for suggestions, may feel that their interests are obscure or low-brow, or may not even visit libraries in person. By automating the RA process using a multi-dimensional recommendation strategy, we emulate RA in finding books appealing to K-12 readers, which eliminates the need to interact with professionals and, at the same time, serves any number of readers anywhere and anytime simultaneously, which cannot be achieved by traditional RA.

Rabbit is novel, since besides analyzing the reading ability of a reader R, Rabbit examines appeal factors to capture the reasons why a book Bk is appealing to R. Rabbit determines the facets of Bk that instigate a subconscious, emotional reaction to Bk, which in turn impacts R's perception of Bk. Rabbit explores the literary elements of books that identify the rate at which the stories unfold (pace), their overall structure (storyline), the feelings that these stories evoke in a reader (tone), and subject matters that some readers might find unpleasant or offensive (special topics), in addition to the qualities of the characters (characterization) and the language and level of details (writing style) of the stories. Rabbit is not affected by the cold-start problem, nor does it require feedback from its users, and its design surpasses the content or reading patterns explored by current state-of-the-art content/collaborative-filtering/hybrid recommenders.

While topics and content descriptions of books are freely and publicly available from reputable online sources, such as the Library of Congress,2 appeal-factor/-term descriptions, which are fundamental for our recommendation strategy, are either determined by professionals on-the-fly or accessed through a paid subscription to RA databases. To automate the process of extracting appeal-term descriptions of books, we have developed ABET (Appeal-based extraction tool), a component of Rabbit, which relies on book reviews retrieved from well-known book-related websites, such as Amazon(.com) and Powells.com. ABET is based on effective rules that simply

2catalog.loc.gov/

examine typed dependencies and part-of-speech tags of words in book reviews to identify appeal factors and their terms. ABET relies on book reviews for extracting appeal-term descriptions of books, since reviews are readily available online and capture readers' varied opinions describing the literary elements of a book.

The remainder of this paper is organized as follows. In Section 6.2, we discuss existing approaches on information extraction and book recommendation. In Section 6.3, we provide a brief overview of RA. In Sections 6.4 and 6.5, we detail the design methodology of ABET and Rabbit, respectively. In Section 6.6, we present the results of the empirical studies conducted to verify the performance of ABET and Rabbit. In Section 6.7, we give concluding remarks and address directions for future work.

6.2 Related Work

Given that Rabbit is based on an extraction module, i.e., ABET, to create appeal-term descriptions of books, we discuss existing approaches that extract information from product reviews and recommend books, respectively.

6.2.1 Extracting Information from Reviews

Numerous approaches have been developed to identify and extract either features, (the polarity of) opinions, or feature-opinion pairs from reviews based on bootstrapping, natural language processing, machine learning, extraction rules, latent semantic analysis, statistical analysis, and information retrieval. (An in-depth review of state-of-the-art approaches adopted for opinion mining and extraction can be found in [115].) A product review describes products' actual features, such as the "zoom" of a camera, which is unlike a book review that evaluates "organization and writing style, possible market appeal, and cultural, political, or literary significance" of a book [141]. A book review is a form of literary criticism in which the work is analyzed based on its content, style, and merit. We have observed that existing information extraction approaches on product reviews are ill-equipped for book reviews, since sentences expressed in book reviews tend to be more elaborated than the ones used to describe products. For this reason, ABET is not designed using any particular extraction approach. Instead, it relies on simple rules to perform linguistic and semantic analysis of the content and writing style of book reviews.

6.2.2 Book Recommenders

A number of book recommendation systems have been developed in the past. Amazon's recommender suggests books based on the purchase patterns of its users [89], whereas Yang et al. [156] analyze users' access logs to infer their preferences and apply the traditional collaborative-filtering (CF) strategy to make book recommendations. Givon and Lavrenko [57] combine CF and social tags to capture the content of books for recommendation. Sieg et al. [144] rely on the standard user-based CF framework and incorporate semantic knowledge in the form of a domain ontology to capture the topics of interest to a user. The hybrid-based recommenders in [57, 144, 156], in addition to Rabbit, overcome the cold-start problem. Unlike Rabbit, however, the others require (i) historical data on the users in the form of ratings, which may not always be available, or (ii) an ontology, which can be labor-intensive and time-consuming to construct. Unlike Rabbit, PReF [120] examines users' connections as part of the recommendation process, which may not be accessible for K-12 readers due to privacy restrictions imposed on children. BReK12 [122], which is the book recommender closest to Rabbit, is based on content and readability analysis. Its content analysis, however, relies on the reading patterns of users, which depends on the availability of bookmarking information offered by social bookmarking site users. Furthermore, with the exception of BReK12, none of the aforementioned recommenders considers the readability level of their users as part of their recommendation strategies. Even though Rabbit is not a recommendation system for learning, its design goal is to enhance reading selections for K-12 users. (An in-depth description of existing recommenders in the educational domain can be found in [96].)

6.3 Readers' Advisory (RA)

Rabbit emulates the readers' advisory (RA) service, which has been available at public libraries since the late 1800s [41, 138]. RA offers fiction and non-fiction readers materials of potential interest with "the help of knowledgeable and non-judgmental library staff" [138]. While the traditional RA model involves face-to-face discussions between patrons and librarians, a number of public libraries, such as the Williamsburg Regional Library,3 take advantage of existing technologies and replace human interactions with online forms filled out by patrons to capture the users' interests in books [67].

Either through face-to-face conversations or filled-out online forms, an RA librarian's task is to identify the type of books preferred by readers based on the reasons behind their preferences. Besides analyzing the topical areas and content descriptions of books favored by a reader R, during the RA process, librarians examine the appeal factors of books that are appealing to R [138].

Appeal factors, such as the pacing or description of characters in books, are "the elements of a book—whether definable or just understood—that make readers enjoy the book" [138]. These factors capture general traits of a book that attract the attention of a reader and are considered in answering one of the most important RA questions, "Why is the reader interested in a given book?" [2]. For example, some readers might enjoy the Harry Potter books (by J. K. Rowling) because of the established friendships among students and the boarding school setting, whereas others like the fantasy aspect of the story. To the best of our knowledge, there is no consensus on the set of appeal factors that must be considered during the RA process. The most prominent appeal factors, as articulated in RA-related literature [41, 67, 138], include: (i) characterization, (ii) frame, (iii) pacing, (iv) storyline, (v) language and writing style, (vi) tone, and (vii) special topics. The first six appeal factors are well-known literary elements of fiction/non-fiction books [40], whereas the last identifies subjects addressed in a book that can cause emotional stress to some readers but are tolerated/enjoyed by others [41]. Each appeal factor is associated with a vocabulary, which is a set of keywords,

3wrl.org/books-and-reading/adults/looking-good-book

Table 6.1: Sample appeal terms for each of the appeal factors considered by Rabbit

  Appeal Factor                  Appeal Terms
  Characterization               Believable, distant, dramatic
  Frame                          Bittersweet, contemporary, descriptive
  Language and Writing Style     Candid, complex, conversational, extravagant, poetic, prosaic
  Pacing                         Easy, fast, slow
  Special Topics                 Addiction, bullying, violence
  Storyline                      Action-oriented, character-centered
  Tone                           Dark, happy, surreal

called appeal terms, employed to describe the factor, which we defined based on well-known RA literature [41, 138]. The appeal factors considered by Rabbit and a sample of their respective appeal terms are shown in Table 6.1.

Based on the contents, topics, and appeal terms that describe the appeal factors of books preferred by a reader R, librarians suggest other books matching (to a certain degree) the interests/preferences of R. However, due to the number of books being published on a regular basis these days, it is an impossible task for a librarian to be familiar with every existing book in order to determine whether it could be a potentially relevant recommendation for R. For this reason, librarians turn to RA databases, which are available at NoveList Plus, Fiction Connection, Which Book, and Readers' Advisory Online, to conduct fact-based, appeal factor-oriented, and read-alike searches in locating books to suggest to a reader [41].

6.4 Appeal-Term Descriptions

While Rabbit conducts topical and content analysis of books during the RA process using data retrieved from book-related websites, appeal-term descriptions are only available through RA databases or determined by professionals. Unfortunately, accessing reputable RA databases, such as NoveList Plus, comes with a price tag, i.e., paid subscription, whereas professionals might not have read a particular book and thus it would not be possible for them to infer the corresponding appeal-term description for the book on-the-fly. To address these constraints, we have developed

ABET, a tool that automatically extracts appeal-term descriptions of books from book reviews available at well-known book-related websites, such as Amazon, Bertrams, Bookfinder4u, Bookmooch, Dogobooks, Fishpond, Powells, and Thriftbooks. As reading is a personal experience, it is anticipated that a book is (not) appealing to a reader, who is familiar with the content of the book, for various reasons. By analyzing reviews, we extract diverse readers' opinions on a book based on appeal terms that describe the corresponding appeal factors of the book, which in turn facilitates the task of identifying why books are appealing based on their literary elements.

To generate appeal-term descriptions for a given book, ABET relies on the taxonomy defined in Section 6.3, which, despite being comprehensive, cannot account for a given appeal factor/term being specified differently in readers' reviews. For example, a reviewer may refer to the "Storyline" appeal factor of a book as "story" or "narrative", and (s)he may also use either "quick" or "fast" as the appeal term to describe the appeal factor "Pace". For this reason, we extend the appeal factors/terms by including the (stemmed) synonyms of each term/factor, which are identified using WordNet4, a popular lexical database for the English language. (The complete list of appeal terms for each appeal factor can be found at goo.gl/BSwuPw.)

While the taxonomy can serve as an aid to identify potential appeal factors/terms in reviews, it is imperative to properly associate these appeal terms and appeal factors in the reviews so that appeal factor-appeal term pairs can be correctly extracted to generate an accurate appeal-term description for a given book. To accomplish this task, we have defined a number of extraction rules5 (as given in Table 6.2) for ABET based on typed dependency relations between word pairs in sentences extracted from reviews. It is natural for ABET to turn to typed dependencies, since they capture the semantic connection, i.e., association, between words in sentences. (A detailed discussion on typed dependencies is available in the Stanford typed dependencies manual.6) Rules used by ABET are simple and yet effective; they are based on written patterns identified in book reviews and capture the semantic link between appeal terms and their corresponding

4wordnet.princeton.edu
5ABET performs linguistic and semantic analysis on sentences in reviews using the Stanford Part-of-Speech Tagger and Dependency Parser (nlp.stanford.edu/software/lex-parser.shtml).
6nlp.stanford.edu/downloads/dependencies_manual.pdf

Table 6.2: Rules considered by ABET to identify appeal factor-appeal term pairs in book reviews

Notation:
- rel(A, B) is a grammatical relation between a dominant, i.e., governor or head, word (A) and a subordinate, i.e., dependent or modifier, word (B).
- LF, LT, ELF, and ELT are the list of appeal factors, list of appeal terms, extended list of appeal factors, and extended list of appeal terms, respectively.
- wf is an appeal factor in LF, and wt is an appeal term in LT.
- w ❀ wf (w ❀ wt, respectively) denotes that w is a synonym of wf (wt, respectively).
- POS(w) is the part-of-speech tag of w, which is a verb (adverb, respectively) if POS(w) = "VB" ("RB", respectively).
- Abbreviations: adv(erbial)mod(ifier), a(djectival)mod(ifier), c(lausal)comp(lement), d(irect)obj(ect), neg(ation modifier), nn (noun compound modifier), n(ominal)subj(ect), nsubjpass (passive nominal subject), prep_* (prepositional modifier).
- ABET only extracts a pair if wt is in the corresponding vocabulary defined for wf.

Rules 1-2 (to capture the written patterns based on a keyword, i.e., appeal term, that immediately precedes/follows the subject or object of a sentence S, i.e., appeal factor):
  Rule 1. Conditions: A ∈ ELT, B ∈ ELF, rel ∈ {nn, nsubj}; if A is a synonym of a term that applies to "Characterization" or "Storyline", then POS(A) ∉ {"VB", "RB"}. Identified: B ❀ wf, A ❀ wt.
  Rule 2. Conditions: A ∈ ELF, B ∈ ELT, rel ∈ {advmod, amod, prep_in, prep_about}. Identified: A ❀ wf, B ❀ wt.

Rule 3 (to identify an appeal term that qualifies its indirectly related appeal factor in S). Conditions: rel ∈ {nn, nsubj}, B ∈ ELF, and there exists rel2(C, D) ∈ {amod, dep, ccomp} with A = C and D ∈ ELT. Identified: B ❀ wf, D ❀ wt.

Rule 4 (to explicitly consider negated appeal terms in S). Conditions: B ∈ ELF, rel ∈ {nn, nsubj}, there exists neg(C, D) such that A (= C) is an antonym of Ā ∈ ELT and D is a negation term. Identified: B ❀ wf, Ā ❀ wt.

Rules 5-7 (to account for the multiple ways in which a reviewer can describe the setting of books or peeves/favored subject matters in books, i.e., to handle special cases of the "Special Topics" and "Frame" factors in S):
  Rule 5. Conditions: A ∈ ELT, rel ∈ {prep_about}, A is a synonym of a term that describes "Special Topics". Identified: wf = "Special Topics", A ❀ wt.
  Rule 6. Conditions: B ∈ ELT, rel ∈ {dobj, nsubj, nsubjpass}, POS(A) = "VB", B is a synonym of an appeal term that describes "Special Topics". Identified: wf = "Special Topics", B ❀ wt.
  Rule 7. Conditions: A ❀ "Frame" ∈ ELF, B (a synonym of an appeal term that describes "Frame") ∈ ELT, rel ∈ {prep_in}. Identified: wf = "Frame", B ❀ wt.

appeal factors. Consider SA, "The narrative of the book is dramatic", and SB, "She creates vivid, believable characters". In SA the subject of the sentence, i.e., "narrative," is characterized as being "dramatic", whereas in SB its object, i.e., "characters", is described as "believable". Based on the aforementioned examples, it is clear that if the subject/object of a sentence is an appeal factor, then a word in the sentence that semantically describes, i.e., is directly linked to, the mentioned object/subject is often an appeal term. ABET captures these connection patterns using Rules 1 and 2 as defined in Table 6.2.

Appeal terms can also be (semantically) indirectly connected with an appeal factor in a sentence. Consider sentence SC, "The descriptions included are extravagant". "Extravagant" is indirectly related to the subject of SC, i.e., "descriptions", through the word "included". Using Rule 3, ABET examines pairs of grammatical relations that involve indirect connections among words.

Now consider SD, "The characters are not simple". Based on Rule 1, ABET would mistakenly describe the factor "Characterization" with the appeal term "simple". This example reveals the need to examine pairs of grammatical relations in the presence of negated terms. ABET applies Rule 4, which identifies a negated term as a modifier of an appeal term t and then extracts, as the appeal term for the corresponding factor, the antonym of t (if it is included in the vocabulary defined in ABET's taxonomy for the factor).

Together, Rules 1 to 4 account for the most common written patterns for appeal terms/factors observed in reviews. These rules simply look for words in sentences that (directly or indirectly) describe the appeal factors of a book (considered by ABET), which are often the subjects or objects of sentences. In fact, Rules 3 and 4 take precedence over Rules 1 and 2, since once a typed dependency in a sentence is used by either of the former rules, it cannot be considered by the latter ones.

While the majority of other relations (beyond the ones captured by Rules 1 to 4) seldom appeared in reviews, we observed three special cases that facilitate the extraction of appeal terms for "Special Topics" and "Frame", respectively, which we defined in Rules 5 to 7. Consider SE, "It is about violence at schools", SF, "Bullying is depicted in the book", and SG, "The action is set in a school", which include special written patterns pertaining to the "Frame" and "Special Topics" appeal factors that are based on prepositions, subjects, and objects identified in sentences in reviews. The preposition "about" in SE captures an appeal term employed to describe the factor "Special Topics", i.e., "Violence", whereas "in" in SG is connected with an appeal term, i.e., "School", that describes the factor "Frame". Moreover, "bullying" in SF, which is assigned a "VB" part-of-speech tag, is an appeal term describing the factor "Special Topics".

ABET creates the appeal-term description for a book Bk by applying the rules defined in Table 6.2 on up to 500 distinct reviews of Bk, if they are available. In generating the appeal-term description of Bk, ABET considers not only the appeal terms extracted from reviews on Bk, but also their frequency of occurrence. The latter captures the relative degree of significance of an appeal term in describing its corresponding factor based on reviewers' varied opinions on appeal factors that apply to Bk. (A sample of the appeal terms and their frequencies of occurrence for each of the corresponding appeal factors identified using ABET on the book reviews for "The Hunger Games" by Suzanne Collins is shown in Figure 6.1.)

Figure 6.1: ABET-generated appeal-term description for "The Hunger Games", where each number indicates the frequency with which a term was used to describe its corresponding appeal factor in the reviews
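Assuming the (appeal factor, appeal term) pairs extracted from each review are available, the aggregation into a frequency-weighted appeal-term description such as the one in Figure 6.1 can be sketched as follows (the data layout is an assumption for illustration).

```python
from collections import Counter, defaultdict

def build_description(pairs_per_review, max_reviews=500):
    """Aggregate (appeal_factor, appeal_term) pairs from up to max_reviews reviews
    into a per-factor frequency distribution."""
    description = defaultdict(Counter)
    for pairs in pairs_per_review[:max_reviews]:
        for factor, term in pairs:
            description[factor][term] += 1
    return {factor: dict(counts) for factor, counts in description.items()}

reviews = [
    [("Pacing", "fast"), ("Tone", "dark")],
    [("Pacing", "fast"), ("Special Topics", "violence")],
]
print(build_description(reviews))
# {'Pacing': {'fast': 2}, 'Tone': {'dark': 1}, 'Special Topics': {'violence': 1}}
```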

6.5 Our Proposed Recommender

In this section, we present the recommendation strategy of Rabbit for K-12 readers. Rabbit first analyzes the profile of a reader R, which consists of a set of N (≥ 1) books either provided by R or bookmarked7 by R on a social bookmarking site of which R is a member. Based on the profile, Rabbit identifies books (as detailed in Section 6.5.1) that are compatible with the readability level of R, which are treated as candidate books to be considered for recommendation. Candidate books are selected among the books available at one of the (online) book repositories, which include, but are not limited to, (i) reputable websites, such as OpenLibrary.org or WorldCat.org, which are two of the largest online library catalogs, (ii) school/public libraries, and (iii) book-related bookmarking sites, such as Biblionasium.com, which is a website that encourages reading among children/teenagers. Rabbit computes a ranking score, which quantifies the degree of relevance, of each candidate book with respect to the profile of R using a regression model (introduced in Section 6.5.3) applied to the analytical results of the book using diverse publicly available information (as discussed in Section 6.5.2).

7Only books bookmarked by a reader during the most recent academic year are considered, since it is anticipated that the grade levels of books bookmarked by readers gradually increase as the readers enhance their reading comprehension skills over time.

6.5.1 Candidate Books

One of the design goals of Rabbit is to suggest books that its users/readers can comprehend. It is imperative for Rabbit to locate books with grade levels adequate for each individual reader, since we realize that "reading for understanding cannot take place unless the words in the text are accurately and efficiently decoded" [112]. In order to accomplish this task, Rabbit first determines the readability level of a reader R by analyzing the grade levels of books in R's profile. The readability level of R is determined by averaging the grade level of each book (denoted PB) in R's profile, computed using TRoLL [48], a tool for regression analysis of literacy levels, which accurately captures the central tendency of the grade levels of the books that have been read by R. Unlike existing popular prediction formulas/tools,8 such as Flesch-Kincaid, the Lexile Framework, and ATOS (discussed in detail in [19]), TRoLL computes the grade level of any book using metadata on books publicly accessible from reputable online sources, even in the absence of book excerpts.

Having determined the readability level of R using TRoLL, Rabbit applies Equation 6.1 to determine the set of candidate books considered for recommendation.

$$SCB(R) = \left\{\, CB \;\middle|\; CB \in Rep \,\wedge\, TRoLL(CB) \in \left[\frac{\sum_{P_B \in P} TRoLL(P_B)}{|P|} \pm 0.5\right] \right\} \qquad (6.1)$$

where CB is a candidate book available at a book repository Rep, |P| denotes the number of books in R's profile, and TRoLL(CB) (TRoLL(PB), respectively) is the grade level of CB (PB, respectively) determined by TRoLL. By selecting books within half a grade level above/below the mean readability level of R, Rabbit considers books for recommendation within an appropriate level of (text) complexity for R based on the grade levels of books in R's profile.
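A minimal sketch of the candidate-selection step of Equation 6.1 is given below; the grade_level function stands in for TRoLL, whose implementation is not reproduced here, and the repository is a simple list of titles for illustration.

```python
def candidate_books(profile_grades, repository, grade_level, band=0.5):
    """Return repository books whose grade level lies within +/- band of the
    mean grade level of the books in the reader's profile (Equation 6.1)."""
    mean_level = sum(profile_grades) / len(profile_grades)
    return [book for book in repository
            if mean_level - band <= grade_level(book) <= mean_level + band]

# Illustrative use with a stand-in grade-level predictor.
repo = ["book_a", "book_b", "book_c"]
levels = {"book_a": 4.2, "book_b": 6.0, "book_c": 4.6}
print(candidate_books([4.0, 4.5, 5.0], repo, grade_level=levels.get))
# mean = 4.5, band [4.0, 5.0] -> ['book_a', 'book_c']
```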

8These formulas/tools rely on sample text of a book to compute its readability level, which is a severe constraint, since sample text is not always freely accessible due to copyright laws.

6.5.2 Analysis of Multiple Perspectives

Rabbit suggests books that readers not only can comprehend, but also are interested in, by simultaneously examining books in the profile of R (and each candidate book CB) based on diverse publicly accessible metadata to determine (i) the topics of interest for R (in Section 6.5.2), (ii) book contents appealing to R (in Section 6.5.2), and (iii) the general traits that describe books favored by R (in Section 6.5.2).

Exploring Topical Information

Rabbit examines the topical description (i.e., topic) of a book, which is based on the Library of Congress Subject Headings (LCSH) assigned to the book by professional cataloguers. LCSH, which is a de facto universal controlled vocabulary, constitutes the largest general indexing vocabulary in the English language [157]. Subject headings, which are terms or phrases that denote concepts, events, or names [134], are used by librarians to categorize and index books according to their themes. Examples of LCSH include "Computers and college students" and "Archaeology–History–18th century".

Rabbit, which explores the topical resemblance between a candidate book CB and the books in R's profile P, examines the degree to which the distribution of topics in CB matches the distribution of topics of books in P. This topical similarity measure, which is defined using the well-known vector space model (VSM) and computed in Equation 6.2, prioritizes candidate books that have been assigned LCSH which match the LCSH favored by R, i.e., the LCSH most frequently assigned to books in P.

$$TSim(CB, P) = \frac{\vec{CB} \cdot \vec{P}}{\|\vec{CB}\| \times \|\vec{P}\|} \qquad (6.2)$$

where CB and P are represented as n-dimensional vectors $\vec{CB} = \langle W_{CB_1}, \ldots, W_{CB_n} \rangle$ and $\vec{P} = \langle W_{P_1}, \ldots, W_{P_n} \rangle$, respectively, n is the number of distinct subject headings assigned to CB and the books in P, $W_{CB_i}$, the weight of $CB_i$ ($1 \leq i \leq n$), is "1" if $CB_i$ is a subject heading of CB and is "0" otherwise, and $W_{P_i}$, the weight of $P_i$ ($1 \leq i \leq n$), is computed as the proportion between the number of books in P that have been assigned $P_i$ and the total number of books in P, i.e., |P|.

In computing the topical similarity measure, in addition to the other similarity measures presented below, we rely on VSM, since it handles frequency distributions, which is essential in comparing candidate books with the profile of a reader based on multiple perspectives. We have empirically verified that, given the short length of the descriptions for each book based on its topics, content, and appeal factors (which include very few LCSH, words, and terms, respectively), VSM is a more reliable distance measure compared with its probabilistic counterparts, such as KL-divergence.
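A compact sketch of the topical similarity measure of Equation 6.2 follows, assuming each book is represented by its set of LCSH; the function and variable names are illustrative.

```python
import math
from collections import Counter

def topical_similarity(candidate_lcsh, profile_lcsh_per_book):
    """Cosine similarity between a binary LCSH vector for the candidate book and a
    proportion-weighted LCSH vector for the reader's profile (Equation 6.2)."""
    num_books = len(profile_lcsh_per_book)
    profile_counts = Counter(h for book in profile_lcsh_per_book for h in set(book))
    headings = set(candidate_lcsh) | set(profile_counts)
    cb = {h: 1.0 if h in candidate_lcsh else 0.0 for h in headings}
    pr = {h: profile_counts.get(h, 0) / num_books for h in headings}
    dot = sum(cb[h] * pr[h] for h in headings)
    norm = math.sqrt(sum(v * v for v in cb.values())) * math.sqrt(sum(v * v for v in pr.values()))
    return dot / norm if norm else 0.0

profile = [{"Dystopias--Fiction", "Survival--Fiction"}, {"Dystopias--Fiction"}]
print(topical_similarity({"Dystopias--Fiction", "Magic--Fiction"}, profile))  # ~0.63
```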

The Book Content Analysis

Besides determining the topics that are of interest to a reader R, Rabbit also identifies written matters covered in books that are generally appealing to R by analyzing the content description of each book in P (and each candidate book CB), which is extracted from reputable book-related websites, such as Amazon and the Library of Congress. Rabbit computes the content similarity of each CB with respect to P based on the "bag-of-words" representation of the brief descriptions of CB and (books in) P. Rabbit favors candidate books with contents compatible with the contents that are most commonly addressed in books included in R's profile. To compute CSim(CB, P), the content similarity score defined in Equation 6.3, Rabbit employs an enhanced version of the cosine similarity measure based on word-correlation factors (WCF) [48], which relaxes the exact matching constraint imposed by the cosine measure by exploring words in the content description of CB that are analogous to, besides the same as, words in the content description of P.

WCF in the word-correlation matrix reflect the degree of similarity between any two words according to their (i) frequencies of co-occurrence and (ii) relative distances in a collection of Wikipedia(.org) documents. Rabbit relies on WCF, as opposed to other popular similarity metrics applied to WordNet, since we have empirically verified that word-similarity scores predicted using WCF correlate with human assessments on word similarity. Using WordSim353,9 which is a test collection for measuring word relatedness, and STS,10 which provides human assessments on sentence similarity for 750 pairs of sentences, we compared the performance of WCF with FaITH11 [127], which is a feature- and information theoretic-based similarity measure. (Only FaITH was considered, since, as reported in [127], it outperforms other well-known information-theoretic, ontology, and hybrid-based approaches that exploit word-related information available at WordNet to estimate the degree of similarity between pairs of words.) The results of the experiments verified that the performance of WCF is comparable to that of FaITH, based on a Wilcoxon signed-rank test of significance (with p < 0.05) [42].

9cs.technion.ac.il/~gabr/resources/data/wordsim353/
10goo.gl/KzMHgn
11Available at grid.deid.unical.it/similarity

$$CSim(CB, P) = \frac{\vec{CB} \cdot \vec{P}}{\|\vec{CB}\| \times \|\vec{P}\|} \qquad (6.3)$$

where the content of CB and P is represented as vectors $\vec{CB} = \langle W_{CB_1}, \ldots, W_{CB_n} \rangle$ and $\vec{P} = \langle W_{P_1}, \ldots, W_{P_n} \rangle$, respectively, n is the number of distinct non-stop, stemmed keywords in the brief descriptions of CB and each book in P, and $W_{P_i}$ and $W_{CB_i}$ are the weights of keywords $P_i$ and $CB_i$, which are calculated using Equations 6.4 and 6.5 such that the frequency distribution of words in $\vec{CB}$ and $\vec{P}$ is determined based on the frequency distribution of non-stop, stemmed words among the brief descriptions of CB and all the books in P, respectively.

$$W_{CB_i} = \begin{cases} \dfrac{f_{CB_i,CB}}{\max_{CB_i \in CB}(f_{CB_i,CB})} & \text{if } CB_i \in CB \\[1.5ex] \dfrac{\sum_{c \in HS_{P_i}} f_{c,CB}}{|HS_{P_i}|} & \text{otherwise} \end{cases} \qquad (6.4)$$

$$W_{P_i} = \begin{cases} \dfrac{f_{P_i,P}}{\max_{P_i \in P}(f_{P_i,P})} & \text{if } P_i \in P \\[1.5ex] \dfrac{\sum_{c \in HS_{CB_i}} f_{c,P}}{|HS_{CB_i}|} & \text{otherwise} \end{cases} \qquad (6.5)$$

where

• $HS_w$ is the set of words that are highly similar to, but not the same as, a given word w in the brief description D of either CB or P. (A word is considered highly similar if it is included in a reduced version of the WCF matrix, which contains 13% of the most frequently-occurring words in the Wikipedia collection.)

• $|HS_w|$ is the size of $HS_w$

• $f_{w,D}$ is the frequency of occurrence of w in D

Equation 6.4 (6.5, respectively) explicitly accounts for words in CB (P , respectively) that might be similar to, but do not exactly match, words in P (CB, respectively).
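The following sketch illustrates the relaxation performed by Equations 6.3-6.5: a word present in a description receives a normalized-frequency weight, while a missing word is weighted by the average frequency of its highly similar words in the other description. The toy HIGHLY_SIMILAR table stands in for the Wikipedia-based word-correlation matrix, which is not reproduced here.

```python
import math
from collections import Counter

# Toy stand-in for the word-correlation matrix: word -> set of highly similar words.
HIGHLY_SIMILAR = {"wizard": {"sorcerer", "magician"}, "school": {"academy"}}

def weights(own_words, other_words):
    """Weight vector over the union vocabulary (Equations 6.4/6.5): normalized frequency
    for words present in this description, averaged similar-word frequency otherwise."""
    own, other = Counter(own_words), Counter(other_words)
    vocab = set(own) | set(other)
    max_f = max(own.values()) if own else 1
    w = {}
    for word in vocab:
        if word in own:
            w[word] = own[word] / max_f
        else:
            similar = HIGHLY_SIMILAR.get(word, set())
            w[word] = sum(own[s] for s in similar) / len(similar) if similar else 0.0
    return w

def content_similarity(desc_cb, desc_p):
    """Cosine similarity over the relaxed weight vectors (Equation 6.3)."""
    w_cb, w_p = weights(desc_cb, desc_p), weights(desc_p, desc_cb)
    dot = sum(w_cb[t] * w_p[t] for t in w_cb)
    norm = math.sqrt(sum(v * v for v in w_cb.values())) * math.sqrt(sum(v * v for v in w_p.values()))
    return dot / norm if norm else 0.0

print(content_similarity(["young", "sorcerer", "academy"], ["young", "wizard", "school"]))  # ~0.70
```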

Examining Appeal-Term Descriptions

Besides analyzing the topics and content of interest to a reader R, Rabbit examines the appeal elements of books preferred by R and determines the appeal similarity of each candidate book CB based on the appeal-term description of CB and (each book in) the profile P of R, which are generated using ABET.

In calculating ATSim(CB, P), the appeal-term similarity score of CB with respect to P, Rabbit adopts the cosine measure as shown in Equation 6.6. ATSim(CB, P) captures the overall degree of resemblance between CB and the profile P of R based on the respective appeal-term distributions associated with each appeal factor, i.e., the respective appeal-term descriptions generated using ABET on the reviews of CB and each book in P.

$$ATSim(CB, P) = \frac{\sum_{af \in AF} \dfrac{\vec{CB}_{af} \cdot \vec{P}_{af}}{\|\vec{CB}_{af}\| \times \|\vec{P}_{af}\|}}{|AF|} \qquad (6.6)$$

where AF is the set of appeal factors in the appeal-term descriptions for CB and P, |AF| is the size of AF, $\vec{CB}_{af}$ and $\vec{P}_{af}$ are the n-dimensional vector representations of the appeal-term distribution of an appeal factor af for CB and P, respectively, n is the number of distinct appeal terms in the distributions of the corresponding appeal factor for CB and P, $W_{CB_{af_i}} = \frac{freq_{i,CB_{af}}}{\max_{i \in CB_{af}}(freq_{i,CB_{af}})}$ is the weight of the $i$-th term in $\vec{CB}_{af}$, and $W_{P_{af_i}} = \frac{\sum_{P_B \in P} freq_{i,P_{B_{af}}}}{\max_{i \in P_{af}} \sum_{P_B \in P} freq_{i,P_{B_{af}}}}$ is the weight of the $i$-th term in $\vec{P}_{af}$.

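A sketch of Equation 6.6, which averages per-factor cosine similarities between appeal-term frequency distributions; the dictionary layout mirrors the aggregation sketch given earlier for ABET output and is assumed for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two term-frequency dictionaries."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def appeal_similarity(cb_desc, profile_desc):
    """Average per-factor cosine similarity between appeal-term distributions (Equation 6.6)."""
    factors = set(cb_desc) | set(profile_desc)
    if not factors:
        return 0.0
    return sum(cosine(cb_desc.get(f, {}), profile_desc.get(f, {})) for f in factors) / len(factors)

cb = {"Pacing": {"fast": 3}, "Tone": {"dark": 2, "surreal": 1}}
profile = {"Pacing": {"fast": 5, "easy": 1}, "Tone": {"dark": 4}}
print(appeal_similarity(cb, profile))  # ~0.94
```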
6.5.3 Ranking Candidate Books

Rabbit examines multiple information sources, which include topics, contents, and general traits of books preferred by a reader R, to identify books to be recommended for R. To predict the ranking score of each candidate book CB, Rabbit employs multiple linear regression analysis [155], a classical statistical technique for building estimation models [150], as defined in Equation 6.7. The analysis accounts for the influence of multiple contributing factors, which are derived from the similarity scores of the topical, content, and appeal-term descriptions of CB. The top-10 ranked books with the highest combined degrees of similarity are recommended to R.

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_n X_n \qquad (6.7)$$

where Y is the dependent variable, i.e., the ranking score of CB, $\beta_0$ is the intercept parameter, $\beta_1, \ldots, \beta_n$ are the coefficients of regression, $X_1, \ldots, X_n$ are the independent variables (predictors), i.e., the scores defined in Section 6.5.2 for CB, and n is the number of predictors in the regression analysis [155].

In Equation 6.7, each unknown parameter, i.e., the intercept and the coefficients of regression, which is required to determine the ranking score of CB, is estimated through a one-time training process using the Ordinary Least Squares method [155] and the Tset training dataset, which is introduced in Section 6.6.2. Tset consists of 1,663 instances, each of which is a book b that is either a relevant or non-relevant recommendation for a given reader R. Each instance is represented as a vector of the form $\langle b_1, b_2, b_3, rel_R \rangle$, where $b_i$ is the (value of the) $i$-th predictor ($1 \leq i \leq 3$) computed for b, and $rel_R$ is the target, which for practical reasons is "1" if b is a relevant recommendation for R, and is "0" otherwise.

The Ordinary Least Squares method calculates the residual of each training instance in Tset, which is the difference between the binary relevance value of b for R and the ranking score of b estimated using the (values of the) predictors in the vector representation of the corresponding training instance and Equation 6.7. The unknown parameters are estimated by minimizing the sum of the squared residuals over the training instances in Tset.
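Assuming the three similarity scores for each book have already been computed, the training and scoring steps can be sketched with scikit-learn, whose LinearRegression estimator fits the intercept and coefficients by ordinary least squares in the spirit of Equation 6.7; the toy data and names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Each training row: [TSim, CSim, ATSim]; target: 1 if the book was a relevant
# recommendation for the reader, 0 otherwise.
X_train = np.array([[0.8, 0.6, 0.7], [0.1, 0.2, 0.3], [0.7, 0.9, 0.4], [0.2, 0.1, 0.2]])
y_train = np.array([1, 0, 1, 0])

model = LinearRegression()  # ordinary least squares fit of the intercept and coefficients
model.fit(X_train, y_train)

# Score candidate books and recommend the top-ranked ones.
candidates = {"book_x": [0.75, 0.70, 0.60], "book_y": [0.15, 0.20, 0.10]}
scores = {b: model.predict(np.array([f]))[0] for b, f in candidates.items()}
top = sorted(scores, key=scores.get, reverse=True)[:10]
print(top)
```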

6.6 Experimental Results

In this section, we present the results of the empirical studies conducted to assess the performance of ABET and Rabbit. To conduct these studies, we relied on a number of sample sets of books, i.e., SB1 and SB2, which due to space constraints are available at http://goo.gl/PWE9u2. Furthermore, the statistical significance of the results presented in this section was determined using the Wilcoxon signed-rank test.

6.6.1 Assessing the Performance of ABET

Due to the lack of existing benchmark datasets for validating the performance of tools that automatically extract appeal factor-appeal term pairs, we have assessed the performance of ABET by (i) computing the precision and recall of appeal factor-appeal term pairs extracted from book reviews, (ii) analyzing the correctness of appeal-term descriptions created by ABET, and (iii) comparing appeal-term descriptions generated by ABET with the ones extracted from NoveList (Plus) on the same set of books.

Figure 6.2: A screenshot of the Mechanical Turk survey conducted to evaluate the performance of ABET

We randomly selected a set of 100 books written for K-12 readers, and for each book we randomly examined a review. We manually annotated the appeal factor-appeal term pairs in each of the 100 examined reviews and compared the annotated pairs in each review with the ones extracted by ABET. The precision, recall, and F-measure achieved by ABET, which are 0.85, 0.82, and 0.83, respectively, verify the high accuracy of the rules defined for ABET in identifying appeal factors and their corresponding appeal terms in reviews. We have observed that the majority of the pairs excluded by ABET were due to keywords used by reviewers to describe a given factor that are not included in the pre-defined vocabulary of the corresponding factor defined for ABET (as described in Section 6.4). We have also observed that poor phrasing in reviews, which in turn yields nonsensical grammatical relations between pairs of words, and the lack of proper anaphora resolution caused the majority of the extraction errors.
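For clarity, the reported precision, recall, and F-measure compare the set of pairs ABET extracts from a review against the manually annotated pairs; a minimal sketch of that computation, with made-up pairs, follows.

```python
def prf(extracted, annotated):
    """Precision, recall, and F1 of extracted (factor, term) pairs against annotations."""
    extracted, annotated = set(extracted), set(annotated)
    tp = len(extracted & annotated)
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(annotated) if annotated else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [("Pacing", "fast"), ("Tone", "dark"), ("Frame", "descriptive")]
found = [("Pacing", "fast"), ("Tone", "dark"), ("Storyline", "dramatic")]
print(prf(found, gold))  # (0.666..., 0.666..., 0.666...)
```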

To further evaluate the appeal-term descriptions created by ABET, we relied on SB1, a sample set of eight books, and conducted two surveys on Amazon Mechanical Turk12. (See a sample of the surveys in Figures 6.2 and 6.3, respectively.) For each of the Mechanical Turk surveys, we collected 25 responses per book during the month of September 2013. Based on the average of the 200 responses collected for each survey, we compiled the results on assessing ABET and comparing its performance with NoveList, which are shown in Figures 6.4 and 6.5, respectively.

In both surveys, we asked Mechanical Turk appraisers to select the keywords, i.e., appeal terms, that best describe each appeal factor for one of the books in SB1. While the first survey includes the vocabulary created by ABET for each appeal factor as the corresponding possible keyword choices, the second survey contains appeal terms defined by either ABET or NoveList. In the two surveys, we treated the appeal terms selected for each factor by Mechanical Turk appraisers as the "gold standard" for the factor and computed the accuracy of ABET (NoveList, respectively) based on the proportion of terms in the gold standard of a given factor defined by Mechanical Turk appraisers that match its counterpart identified by ABET (NoveList Plus, respectively) for the factor.

12Note that we asked Mechanical Turk appraisers to only participate in a HIT, i.e., survey, if they had read the books included in the corresponding survey.

Figure 6.3: A screenshot of the survey conducted to evaluate the performance of ABET against NoveList

We relied on Mechanical Turk appraisers to perform the evaluation, since Mechanical Turk is a "marketplace for work that requires human intelligence", allows individuals or businesses to programmatically access thousands of diverse, on-demand workers, and has been used in the past to collect user feedback for multiple information retrieval tasks [80]. Furthermore, we considered NoveList for comparison purposes, since NoveList is a premier database for readers' advisory [41] and, to the best of our knowledge, is the only RA database that includes appeal-term descriptions for books.

Figure 6.4: Performance evaluation of ABET conducted using Amazon Mechanical Turk

As shown in Figure 6.4, ABET achieves an overall 94% accuracy in identifying appeal terms (in reviews) that describe a book. More importantly, the accuracy on the identified appeal terms for each appeal factor considered by ABET is in the upper eighty percentile or higher. Figure 6.5, on the other hand, shows the accuracy ratios of ABET and NoveList calculated using the 200 responses prepared by Mechanical Turk appraisers for the second survey. NoveList only describes books using four appeal factors, as opposed to the seven considered by ABET. For this reason, we have compared the performance of ABET and NoveList based on their common appeal factors (as shown in Figure 6.5). Based on the appraisers' assessments, we claim that the appeal-term descriptions provided by ABET are favored over the ones defined by professionals in the RA database at NoveList. Note that the improvements in overall accuracy ratio achieved by ABET over NoveList, as well as the improvements on the "Language and Writing Style", "Pace", and "Storyline" factors, are statistically significant (p < 0.01).

Figure 6.5: Performance evaluation of ABET and NoveList Plus using Amazon Mechanical Turk

6.6.2 Assessing the Performance of Rabbit

In this section, we discuss the empirical studies conducted to validate the design methodology and performance of Rabbit. The results of the experiments conducted to evaluate Rabbit (and compare its performance with BReK12) are based on the dataset and methodology presented in Section "Dataset and Evaluation Strategy", whereas the results of the experiments conducted to compare the performance of Rabbit with well-known recommendation modules are based on human assessments.

Dataset and Evaluation Strategy

Even though the BookCrossing dataset13 has been employed to evaluate book recommenders tailored to a general audience, it is not specifically designed for assessing the performance of book recommenders for K-12 readers. We used data provided by BiblioNasium, which is a safe and secure social networking site on books that targets children and teenagers, to evaluate Rabbit instead. The dataset consists of the profile of books that have been bookmarked by each one of the 5,580 BiblioNasium users who joined the site within the first month of its establishment. A portion of the dataset, called Tset, which consists of 10% of the 5,580 BiblioNasium users and their profiles, was employed for training Rabbit's regression model, whereas the remaining users and their profiles, called Eset, were used for evaluation purposes. As the design methodology of Rabbit relies on topical, brief content, and appeal-term descriptions, in addition to the predicted grade levels of books, we retrieved the brief book descriptions and LCSH from reputable book-related websites, the appeal-term descriptions from book reviews using ABET, and the book readability levels using TRoLL. We adopt the popular five-fold cross-validation strategy to evaluate recommender systems.

In each of the five repetitions, 80% of the books bookmarked by a reader R in Eset yield R’s profile and the remaining 20% are reserved for the testing purpose. A recommended book is treated as relevant to R if it is included in the 20% of the books withheld for the testing purpose, and is non-relevant otherwise, which is a commonly-employed evaluation protocol. Since only withheld books are considered relevant, it is not possible to account for potentially relevant books a user has not bookmarked, which is a well-known limitation of this evaluation protocol. As the limitation applies to all the recommenders evaluated in the conducted empirical studies, the results are consistent for the comparison purpose.
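The per-reader protocol can be sketched as follows: in each fold, the withheld books act as the relevant set against which the ranked recommendations produced from the remaining 80% are scored; the recommend and metric arguments are placeholders for any recommender and ranking metric.

```python
def five_fold_evaluate(user_books, recommend, metric, folds=5):
    """Average a ranking metric over five folds: each fold is withheld as the relevant
    test set while the remaining ~80% of the reader's books form the profile."""
    books = list(user_books)
    scores = []
    for k in range(folds):
        withheld = set(books[k::folds])      # every 5th book, starting at offset k
        profile = [b for b in books if b not in withheld]
        ranked = recommend(profile)          # placeholder: returns a ranked list of books
        scores.append(metric(ranked, withheld))
    return sum(scores) / folds

# Illustrative use with trivial stand-ins for the recommender and the metric.
dummy_recommend = lambda profile: ["b1", "b2", "b3"]
hit_rate = lambda ranked, relevant: float(any(b in relevant for b in ranked))
print(five_fold_evaluate(["b1", "b2", "b3", "b4", "b5"], dummy_recommend, hit_rate))  # 0.6
```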

13informatik.uni-freiburg.de/~cziegler/BX

Figure 6.6: Performance evaluation of Rabbit using the Eset dataset

The Evaluation of Rabbit

Conducting various empirical studies using Eset, we assessed the performance of Rabbit in terms of Normalized Discounted Cumulative Gain (nDCG), which determines the overall (ranking) performance of a recommender and penalizes relevant recommendations positioned lower in the recommendation list. We observed that ranking candidate books solely based on appeal factors or topical information yields the lowest nDCG scores (see Figure 6.6). This is anticipated, since Library of Congress Subject Headings and appeal-term descriptions mainly identify the types of books preferred by a reader R from a general perspective, as opposed to the brief descriptions of books that explicitly capture the subjects of interest to R. Even though the content-based approach yields a relatively high nDCG score, the experimental results show that the multi-dimensional strategy of Rabbit, of which content-based analysis is a component, locates more relevant books, which is justified by the statistically significant (p < 0.001) difference between "Content Similarity" and "Rabbit" in Figure 6.6.

As depicted in Figure 6.6, applying a linear combination (i.e., "Uniform Aggregation") of the different similarity scores considered by Rabbit yields a lower nDCG value than the one obtained by using the regression model (i.e., "Rabbit"), and the difference is statistically significant (p < 0.001). The results validate the necessity of accounting for the impact, i.e., weight, of each individual similarity score.

In comparing the nDCG scores of "Rabbit" and "Rabbit No Candidate Selection" (whose difference is statistically significant with p < 0.001), we have verified that bypassing the candidate selection step would have a negative impact on the overall performance of Rabbit, since recommending books that only match the interests or general traits of books preferred by a reader R, without considering R's readability level, increases the number of non-relevant suggestions.

Rabbit versus BReK12

Besides validating the correctness of the recommendation strategy of Rabbit, we compare its performance with BReK12 [122] (as introduced in Section 6.2). We compare Rabbit with BReK12, since to the best of our knowledge BReK12 is the only available recommender that explicitly considers the readability level of its users in making personalized book recommendations. Furthermore, other state-of-the-art approaches for (book) recommendations are excluded from the comparison using Eset since (as stated in Section 6.2) they require either personal ratings on books provided by individual users or social connections established by social bookmarking site users, neither of which is available on social websites for K-12 readers or in the Eset dataset. We compared the performances of Rabbit and BReK12 using nDCG. Rabbit achieves a statistically significant improvement (p < 0.001) over BReK12 in terms of nDCG, with scores of 0.32 and 0.18, respectively.

Rabbit versus Other Recommendation Modules

To further validate the performance of Rabbit, we conducted a survey using Mechanical Turk appraisers on 10 sample books in SB2, which evaluated the degree to which books recommended by Rabbit are preferred over those suggested by recommendation modules at well-known book-related websites. We have selected recommenders that adopt diverse strategies in making book suggestions: (i) Amazon, which considers purchasing patterns of its users [89], (ii) GoodReads,14 which "combines multiple proprietary algorithms which analyze 20 billion data points to better

14goo.gl/99me5f

Figure 6.7: A snapshot of the survey conducted on Mechanical Turk for the book "The Lightning Thief"

predict which books people want to read next", and (iii) NoveList,15 which examines a number of book-related attributes, including title, publication date, and appeal factors, for recommending books. Each survey (see Figure 6.7 for a sample) included the top-2 recommendations (such that some of them can be identical) made by Rabbit, Amazon, GoodReads, and NoveList for a given sample book Bk, respectively. Appraisers were asked to select, to the best of their knowledge, the top-two books most closely related to Bk, which were treated as the gold standard for Bk.

Based on the 500 responses collected during November 2013, we computed the accuracy of the top-2 recommendations made by Rabbit and each of the recommenders considered for comparison purposes. As shown in Figure 6.8, recommendations made by Rabbit and Amazon are preferred over the suggestions made by GoodReads and NoveList. Furthermore, the improvement, in terms of accuracy ratios, achieved by Rabbit over GoodReads and NoveList is statistically significant (p < 0.01). In terms of overall accuracy, Amazon outperforms Rabbit; however, their differences in nDCG are not statistically significant (p < 0.01).

Note that for comparison purposes, we have evaluated the performances of Rabbit, Amazon, NoveList, and GoodReads in making recommendations in response to a book provided by a user/reader. Rabbit, however, can make suggestions given any number of books that are of interest to a reader, which it uses to identify patterns in the reader's preferences. Rabbit's recommendation strategy differs from the strategies employed at Amazon and NoveList, since the latter can only examine books of interest to a user one at a time. The recommendation

15support.epnet.com/knowledge_base/detail.php?id=4772

Figure 6.8: Accuracy achieved by Amazon, GoodReads, NoveList, and Rabbit based on the survey conducted using Mechanical Turk

strategies of Rabbit and GoodReads are also different, since GoodReads processes either a given book or the entire profile of a user. Furthermore, Rabbit can treat a book as a candidate suggestion immediately after the book is published, whereas Amazon requires the existence of a number of purchasing transactions involving the new book in order to suggest it to a user. In addition, in making recommendations for children and teenagers, Rabbit considers books provided directly by K-12 readers to generate personalized suggestions. Recommendations generated by Amazon that target children and teenagers, on the other hand, are the result of extensive analysis of the purchasing patterns of adults.

6.7 Conclusions

We have introduced Rabbit, a recommender which makes personalized suggestions on books that match the interests and reading abilities of its K-12 users. Rabbit emulates the readers’ advisory process offered at public/school libraries to recommend books that are similar in contents, topics, and literary elements of other books appealing to a reader, with the latter based on extracted appeal- term descriptions. The generated appeal-term descriptions can be accessed by librarians as well in lieu of subscription-based RA databases. Rabbit can be a stand-alone tool used by readers (educa-

or it can be adopted by (K-12) social bookmarking sites for providing suitable reading selections. We have developed Rabbit with K-12 readers in mind; however, the readers’ advisory-based methodology of Rabbit can also be used to suggest books for adults (with reading levels below/above the 12th grade level) as well. The results of the conducted experiments have (i) validated the design methodology of Rabbit, (ii) demonstrated the superiority of Rabbit over other recommenders that either explicitly consider or ignore the reading ability of their users, using data from BiblioNasium, and (iii) verified that Rabbit performs as well as Amazon in suggesting relevant books, which is an achievement on its own, since Amazon analyzes millions of purchasing patterns made available through its website, as opposed to Rabbit, which simply examines publicly accessible data on books. As part of our future work, we plan to extend Rabbit so that it can make suggestions for K-12 audiences on items other than books, which include movies, music, websites, and other learning materials. Furthermore, Rabbit, which considers the readability levels of its readers, instead of their grade levels or ages, can be enhanced so that it can locate books that can be enjoyed and comprehended by readers with learning disabilities or readers who learn English as a second language.

Chapter 7

Conclusions and Directions for Future Work

In this dissertation, we have designed and assessed the correctness of a number of recommendation strategies which suggest reading materials of interest to readers for pleasure as well as for knowledge acquisition. With the development of the recommenders, we have made the following contributions:

• We have created new algorithms to identify reading materials that match the specific needs of a reader whose reading ability is at either advanced or K-12 levels.

• We have taken advantage of connections explicitly established among users of a social book- marking site to incorporate the concept of “social trust” as part of our recommendation strategies (whenever possible), which is based on the premise that readers often turn to peo- ple they know when seeking recommendations on products, services, and activities.

• We have developed Top-N recommendation strategies that do not rely on the availability of large historical data in the form of ratings, which may not be archived by bookmarking sites (for K-12 audiences) or may refer to various scales that differ in their interpretations from

site to site. These developed strategies, which are affected by neither the cold-start nor the long-tail1 problem, can make suggestions even in the absence of user-defined data, since they rely on book-related metadata that can be either retrieved or inferred from publicly and freely accessible online data sources.

1 The long-tail problem in recommender systems refers to the existence of large quantities of items with very few ratings assigned to them.

• We have determined the readability level of a book, even when an excerpt of the book is not available. The readability-analysis tool is a component of our book recommenders for K-12 readers which suggest books of interest to readers by simultaneously matching their reading abilities and preferences, a task that cannot be accomplished by existing, popular

commercial book recommenders.

• We have designed a fully-automated approach which extracts from book reviews appeal-term descriptions of books without relying on either professionals or subscription-based Readers’

Advisory (RA) databases.

• We have created an unsupervised recommender based on RA that performs a multi-dimensional analysis on each K-12 reader to determine what types of book contents are favored by the reader, which topics the reader prefers, and why a book appeals to the reader. The RA-based recommender can be used as a stand-alone tool that, besides guiding youth/adults in their quest for reading materials, can assist educators/parents/librarians in finding appealing

books for children/teenagers to read.

The work in this dissertation establishes a solid foundation for making personalized reading-material recommendations for advanced as well as K-12 readers. We would like to consider future

extensions of our work as follows. At present, TRoLL considers as many predictors as applicable to a book in estimating its grade level. We plan to examine diverse feature-selection models to identify, among the number of possible predictors applicable to a book, the optimal subset that yields the most accurate grade-level prediction for the book. Furthermore, TRoLL is currently designed solely for books. However, given the importance of identifying suitable reading materials in general for K-12 readers, we would like to further explore other predictors, such as HTML tags of web pages, that would allow us to adequately predict the grade levels of texts in general. We have also recognized a number of limitations on our recommendation strategies used on a social site. First, a number of the aforementioned strategies examine the entire profile of a reader in making the corresponding suggestions. However, the topical preferences of readers can change over time. For this reason, we plan to further extend our recommendation strategies so that

they can explicitly account for changes in the preferences of the readers, i.e., “concept drift” [133], and make recommendations accordingly. This can be accomplished by analyzing the timestamps of bookmarks in the respective user profiles and identifying either an ideal window of time that captures the “current” interests of the individual users or changes in topical distributions over time

based on items that have been bookmarked by the users, to name a few possibilities. Besides concept drift, our recommendation strategies currently require at least one item in a user’s profile to make suggestions for the user. Compared to existing collaborative-filtering and hybrid-based recommenders, which are restricted by the availability of a number of items in a user’s profile to make recommendations, our one-item requirement seems minimal. However, none of our

recommendation strategies can make any suggestions to a user who has not bookmarked any items to date. To overcome this limitation, we plan to extend our proposed recommendation strategies so that, even without a profile, a user U can still receive recommendations likely to be of interest as long as themes, contents, or literary elements are explicitly provided by U.

Regarding our recommendation strategies for advanced readers, we have shown that our proposed trust-based approaches, which are based on the premise that users tend to favor recommendations made by people they know, are effective in making suggestions. These strategies, however, cannot account for the novelty/serendipity and diversity of recommended items [133].

With that in mind, we plan to extend the proposed strategies to include unexpected and dissimilar items among the recommendations. This can be achieved by considering reading materials that are excluded from the profiles of users’ connections, but are pertinent to the information needs of the user, as possible suggestions. The concept of providing unexpected, but relevant, suggestions also applies to K-12 readers. Consequently, we propose to further examine the performance of our

K-12 recommenders using novelty-related metrics published in the literature [133]. As previously stated, the goal of our K-12 recommenders is to locate the “right material” for the “right audience”. In doing so, we aim not only to promote good reading habits among K-12 readers, but also to facilitate and encourage their learning. While we have empirically verified the correctness of our K-12 recommendation strategies, we would like to further extend their evaluations

by conducting in-depth user-studies on K-12 readers. Such studies, which would require K-12 readers to directly interact with our K-12 recommenders, would allow us to quantify the degree of influence of BReK12 and Rabbit towards encouraging reading/learning among K-12 readers. Furthermore, the results of the aforementioned studies should highlight aspects of our current design

methodologies, if any, that need to be enhanced/incorporated so that they can achieve their ultimate goal, i.e., aid parents/teachers/young readers in locating suitable and appealing reading materials. Unlike a number of book recommenders, such as the one used by Scholastic (scholastic.com/parents/book-search), the recommendation strategies presented in this dissertation, which were developed with K-12 readers in mind, explicitly consider the reading ability of a reader regardless of the school grade level or age of the reader. These strategies can be further enhanced to suggest reading materials to readers with learning disabilities or people who speak English as a second language by taking into consideration the particular challenges faced by, and the skills yet to be developed by, these special groups of readers, as determined by psychologists and educators, respectively.

Regardless of their targeted audience, i.e., advanced or K-12 readers, our current recommendation strategies are applied to individual readers. We would like to extend our current recommendation strategies so that they can serve groups of readers. Such recommenders could assist (i) book club members in finding reading materials of interest to the members of the club, (ii) research groups in locating academic publications relevant to the research interests of the groups, and (iii) teachers in retrieving reading materials suitable for (various projects to be pursued by) students in their classes. Moreover, we would like to further explore the enhancements necessary to create a single multimedia recommender that suggests learning materials, online videos, movies, (educational) games, web pages, and songs, in addition to books and scholarly articles.

Appendix A

A Group Recommender for Movies Based on Content Similarity and Popularity

Abstract:

People are gregarious by nature, which explains why group activities, from colleagues sharing a meal to friends attending a book club event together, are the social norm. Online group recommenders identify items of interest, such as restaurants, movies, and books, that satisfy the collective

needs of a group (rather than the interests of individual group members). With a number of new movies being released every week, online recommenders play a significant role in suggesting movies for family members or groups of friends/people to watch, either at home or at movie theaters. Making group recommendations relevant to the joint interests of a group, however, is not a

trivial task due to the diversity in preferences among group members. To address this issue, we introduce GroupReM which makes movie recommendations appealing (to a certain degree) to members of a group by (i) employing a merging strategy to explore individual group members’ interests in movies and create a profile that reflects the preferences of the group on movies, (ii) using word-correlation factors to find movies similar in content, and (iii) considering the popularity of movies at a movie website. Unlike existing group recommenders based on collaborative filtering (CF) which consider ratings of movies to perform the recommendation task, GroupReM primarily employs (personal) tags for capturing the contents of movies considered for recommendation and group members’ interests. The design of GroupReM, which is simple and domain-independent, can easily be extended to make group recommendations on items other than movies. Empirical studies conducted using more than 3,000 groups of different users in the MovieLens dataset, which vary in terms of size and movie preferences, show that GroupReM is highly effective

and efficient in recommending movies appealing to a group. Experimental results also verify that GroupReM outperforms popular CF-based recommenders in making group recommendations.

A.1 Introduction

During the past decades, a number of recommender systems have been developed to aid individual users in finding items of interest among the millions available, which include songs [142], books [120], and websites [29], to name a few. These recommenders, however, are tailored only to the needs of individual users. As people are gregarious by nature, a variety of activities involve groups of people who participate either online or in an old-fashioned manner, i.e., in person. To meet the demands of groups of users, group recommenders [8, 56, 131] have been proposed to identify items, such as vacation packages [101], restaurants [116], TV shows [98], songs [34, 100], or movies [114, 136], that appeal to a group as a whole (rather than individual users). As claimed by Gartrell et al. [56], effective group recommendations can have a positive impact on people’s social activities. Suggesting items that satisfy (to a certain degree) the needs of members of a group, however, is a challenging task due to the diverse interests of group members, even more so when dealing with groups consisting of dissimilar members in terms of their preferences [8, 14]. One of the in-demand recommendation tasks is to suggest movies to a group. Movies offer a popular group activity for friends, families, and colleagues who gather to either see a movie at the cinema or watch a DVD at home. These people often turn to experts’ reviews to find movies that match their interests and/or reach a consensus on their own regarding the movies to watch. Group recommenders on movies can streamline this process by directly suggesting movies appealing to a group. While the majority of the recommenders that have recently been introduced to make group recommendations on movies are based on the popular collaborative filtering (CF) strategy [56, 136], to the best of our knowledge, none of them adopts the content-based strategy to exploit descriptive information on movies to perform the recommendation task. In this paper, we introduce GroupReM, a group recommender system on movies. Our proposed top-N recommender, which is based on a content-based strategy to identify a ranked set of N movies that best match the (content

of movies of) interest to a group, differs from existing CF-based group recommenders that adopt various strategies for predicting (individual/group) ratings on movies and suggest the movies with the highest overall rating to a group [158]. These recommenders are restricted, since “similar-minded” individuals at a movie website and the existence of large historical data to guarantee rating

overlap among users [113] are required to make recommendations. GroupReM, on the other hand, simply relies on data readily available on social websites, which are tags and their frequencies of occurrence, along with bookmarked movies, to suggest movies to a group. GroupReM considers semantic information of movies, i.e., (personal) tags at a movie website, to capture both (i) the content of movies and (ii) the preferences of members of a given group

G on movies archived at the website. GroupReM applies a rank aggregation model on two different measures, group appealing and global popularity, computed for each candidate movie M to be considered for recommendation. The former captures the content similarity between M and the group profile of G, whereas the latter reflects the popularity of M at the movie website.

GroupReM anticipates that popular movies, i.e., frequently bookmarked movies, that are similar in content (based on tags) to the group profile characterizing G are of interest to the members of G. In matching (the tags in) M and the profile of G, GroupReM does not impose an exact-match constraint. Instead, GroupReM relies on pre-computed word-correlation factors [78] to determine

inexact, but analogous, tags in M and G, in addition to exact-matched tags, to more accurately capture the degree of appeal of M to G. GroupReM is (i) simple, since it solely employs a standard measure to combine the aforementioned content-similarity and popularity scores, (ii) fast, since it takes on average less than a second to make recommendations for a group (of up to eight members), and (iii) scalable, since

GroupReM can identify movies that capture the common interests of a group regardless of its size and the degree of cohesiveness among group members. Moreover, GroupReM requires neither training nor domain-specific knowledge to select movies to be recommended and thus can directly be adopted to make recommendations on items other than movies. We have conducted an empirical study using more than 3,000 groups of users from the MovieLens dataset [13] and verified that

GroupReM (i) generates relevant recommendations on movies tailored to the needs of a group and (ii) significantly outperforms CF-based group recommenders. The remainder of this paper is organized as follows. In Section A.2, we discuss existing group recommenders and compare their recommendation strategies with GroupReM. In Section A.3, we detail the design of GroupReM. In Section A.4, we present the empirical study conducted to assess (compare, respectively) the performance of GroupReM (GroupReM with existing CF-based group recommenders, respectively) and illustrate the effectiveness and efficiency of GroupReM. In Section A.5, we give concluding remarks and directions for future work.

A.2 Related Work

In this section, we present a number of existing group recommenders that suggest different types of items, including vacations [101], recipes [20], TV programming [26], and music [43], and compare their recommendation approaches with GroupReM. Thereafter, we introduce representative work on recently-developed group recommenders on movies [56, 136] which differ from GroupReM in their design methodologies. An in-depth discussion on group recommenders can be found in [25, 70].

A.2.1 Non-Movie Group Recommenders

As defined in [20], there are two strategies commonly adopted for generating group recommendations: the aggregated models and aggregated predictions. The former combines individual user models, i.e., individual user profiles that capture the preferences of a group member, into a group model from which items to be recommended for the group are identified, whereas the latter generates predictions for individual group members and then aggregates the predictions to suggest items for the group. Empirical studies conducted and presented in [20] suggest that the aggregated models strategy (which is employed by GroupReM) generally outperforms the aggregated predictions strategy.

Flytrap [43], which identifies musical tracks for a group, learns the music preferences of a user U based on the songs U has listened to and the numerical votes cast by U for the songs. Flytrap considers (i) relationships among musical genres, (ii) the influence artists have on one another, and (iii) the transitions between songs that people tend to make to perform the recommendation task.

The recommender relies on domain-specific information and thus cannot be extended to suggest items other than songs, contrary to GroupReM, which can directly be employed for recommending non-movie items. CATS (Collaborative Advisory Travel System) [101] assists groups of friends in planning skiing vacations. CATS relies on an incremental method which analyzes individual users’ critiques

on the proposed recommendations to refine the recommendations generated for the group. Unlike GroupReM, CATS depends on user feedback to narrow the search space, i.e., identify items that satisfy the needs of an individual, as well as a group, which is a burden on the users. Moreover, CATS has been designed to recommend items to a group of at most four users, which is a limitation,

as opposed to GroupReM which does not impose a constraint on the number of group members in performing its recommendation task. Berkovsky and Freyne [20] recommend recipes to families through an eHealth portal. The proposed CF-based group recommender considers (i) the ratings assigned to recipes on the eHealth

portal and (ii) the weight, i.e., influence, of each individual group member, computed according to his/her activities on the portal, in making recommendations. Unlike the recommender introduced in [20], GroupReM relies on the semantic content and popularity of movies to accurately perform the recommendation task. Cantador and Castells [28] introduce an ontology-based group recommendation strategy.

The proposed approach identifies users that share similar tastes/preferences, i.e., “communities of interest”, according to individuals’ ontology-based profiles. These clusters of related users are then exploited to generate group profiles and perform the recommendation task. Similar to GroupReM, the strategy in [28] is primarily content-based. However, the approach presented in [28] is employed to suggest photos (instead of movies), relies on ontology concepts to determine the similarity among users/items (unlike GroupReM, which depends on user-defined keywords, i.e., tags, to capture users’ preferences and items’ descriptions), and, according to the authors, “more sophisticated and statistically significative experiments need to be performed in order to properly evaluate” the correctness of applying the clustering techniques presented in [28] for group modeling and

content-based collaborative filtering recommendation. Masthoff [98] describes a number of recommendation strategies that merge individual user models in order to suggest TV shows that appeal to a group of users. Unlike GroupReM, (some of) the group recommendation strategies discussed in [98] are inspired by Social Choice Theory. Boratto et al. [26] recommend TV programming to a group by first employing a hierarchical clustering algorithm using the cosine similarity metric, which determines the similarity among pairs of users, to identify a natural community G, i.e., a group. Thereafter, a profile for G is created, which reflects the overall preference of the members in G based on the average ratings given by members of G to programs. Based on the group profile, recommendations are generated. Unlike GroupReM, which generates recommendations for groups regardless of the cohesiveness among group members, the group recommender in [26] makes recommendations for groups of similar-minded individuals only, which is a restriction, since in real life groups tend to include members that may not share similar interests in various TV programming.

A.2.2 Group Recommenders on Movies

A number of group recommenders that identify movies of interest to a group have been developed in the last few years, which include the systems introduced in [14, 56, 114, 136]. Gartrell et al. [56] claim that some members of a group are more capable than others of influencing the remaining group members in making decisions (i.e., relevant or non-relevant) on items suggested to the group. The

authors consider several group factors, which include social interactions among group members, degrees of expertise of the members in the group, and dissimilarity among group members, to identify movies of interest to the group. While empirical studies conducted using ten groups have

verified the effectiveness of the proposed recommender, it relies heavily on the interaction activities among group members that may not always exist or become available. Baltrunas et al. [14] conduct an empirical study to assess the effectiveness of alternative rank aggregation strategies, such as Spearman Footrule, Borda Count, Least Misery, and Average,

for combining individual ranking predictions using a CF-based algorithm to make group recommendations. Similar to the approach in [14], the group recommender developed by O’Connor et al. [114] adopts a Least Misery strategy to combine individual ratings predicted by a CF-based algorithm. Basu Roy et al. [136] prune and merge rating lists predicted for individual members of a group G using the popular Average and Least Misery aggregation strategies, in addition to

considering pairwise disagreement lists of movies, to recommend movies of interest to G. Unlike GroupReM, the group recommenders in [14, 114, 136] are based on the aggregated prediction strategy. According to the research work conducted in [20], this strategy has been empirically determined to be less effective than the aggregated model, which GroupReM adopts.

A.3 Our Proposed Group Recommender

In this section, we present our proposed recommender, GroupReM, which suggests movies appealing (to a certain degree) to members of a group who are users of a movie website, such as Netflix (netflix.com) and MovieLens (movielens.umn.edu). GroupReM relies on tags assigned to

(represent the content of) movies and the popularity of each movie to make recommendations. As group members of a movie website often have diverse preferences in movies, GroupReM first assesses the interest of each individual member U of a given group G based on the tags assigned by U to movies bookmarked in his/her profile. Tags and their frequencies of occurrence in group members’ profiles are combined to create the group profile of G which reflects the common interests of the group members (as detailed in Section A.3.1). Thereafter, using word-correlation factors (introduced in Section A.3.2), GroupReM determines the movies, among the ones available at the website which are not included in the profile of any member of G, that are similar (based on tags) to the ones bookmarked by the members of G to a certain degree to generate the set of candidate movies that the group is likely interested in (as described in Section A.3.3). Using a rank aggregation function (as presented in Section A.3.4), GroupReM computes the overall ranking score of each candidate movie M. The ranking score of M is based on (i) the group appealing score of M for G (as defined in Section A.3.4) and (ii) the popularity score of M (as computed in Section A.3.4). The former is calculated according to the number of tags assigned to (represent the content of) M that exactly match or are analogous to the ones which characterize the profile of G, whereas the latter reflects the overall interest of the website users on M. The top-10 ranked candidate movies are recommended to G. The overall process of GroupReM is illustrated in Figure A.1.

Figure A.1: Processing steps of the proposed group recommender on movies, GroupReM

A.3.1 Creating a Group Profile

As the goal of group recommenders is to suggest movies of interest to a group, GroupReM analyzes the preference of each group member in movies and creates a group profile which reflects the types of movies preferred (to a certain degree) by the group as a whole. To construct the profile,

GroupReM employs an aggregated model [20] that merges individual user models, i.e., the movies each group member is interested in, into a group model, which indicates the collective interest of the group members in movies. GroupReM identifies the preference in movies of each individual member U of a group G

by considering the movies bookmarked by U and tags assigned by U to the movies. Personal tags, i.e., tags defined by an individual user, are employed to represent (the content of) a movie M of

interest to a user, as opposed to tags in the tag cloud1 of M at a movie website, since GroupReM aims to capture U’s description of M. Thereafter, GroupReM proceeds to create the group profile for G, which includes all the personal tags (and their combined frequencies) assigned by the members of G to movies in their individual profiles. The higher the frequency of a tag T in G is, the more adequately T reflects the joint interest of the group members on movies, since the high frequency of T indicates that T is more often used by members of G to describe movies they are interested in than other tags with lower frequencies.

Example 19 Consider a group G with three different MovieLens members. Figure A.2 shows (a portion of) the profile of each member U in G, which includes the personal tags assigned by U to movies bookmarked in his/her profile. By combining the tags (and accumulating the corresponding frequencies) in the individual group member profiles, GroupReM creates a group profile for G. As shown in Figure A.2, tags such as “drama” and “animation” reflect the types of movies that are of interest to each member of G, since the tags are included in the personal profile of each group member, as opposed to tags such as “mermaid” and “war”, which are preferred by one out of three group members. ✷
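To make the aggregated-model step above concrete, the following Python sketch (which is illustrative and not part of GroupReM’s actual implementation) merges per-member tag profiles into a group profile by accumulating tag frequencies; the member profiles shown are hypothetical and only loosely mirror Example 19.

from collections import Counter

def build_group_profile(member_profiles):
    # member_profiles: one dict per group member, mapping a personal tag to its
    # frequency in that member's profile. The group profile is the tag-by-tag
    # sum of the individual frequencies.
    group_profile = Counter()
    for profile in member_profiles:
        group_profile.update(profile)
    return dict(group_profile)

# Hypothetical member profiles (illustrative tags and frequencies).
u1 = {"drama": 2, "animation": 1, "mermaid": 1}
u2 = {"drama": 1, "animation": 2, "war": 1}
u3 = {"drama": 1, "animation": 1}
print(build_group_profile([u1, u2, u3]))
# {'drama': 4, 'animation': 4, 'mermaid': 1, 'war': 1}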

A.3.2 Word-Correlation Factors

GroupReM relies on the pre-computed word-correlation factors in the word-correlation matrix [78] to determine the similarity between any two tags, which facilitates the task of identifying candidate movies to be considered for recommendation (as detailed in Section A.3.3). Moreover, GroupReM takes advantage of the word-correlation factors in calculating the group appealing score of a candidate movie with respect to a group profile (as discussed in Section A.3.4). Word-correlation factors were calculated using a set of approximately 880,000 Wikipedia documents (wikipedia.org). Each correlation factor indicates the degree of similarity of the two corresponding

1The tag cloud of a movie M, which provides the collective description on the content of M bookmarked by users at a movie website W , can be inferred by collecting each tag assigned to M by users at W , in addition to their frequencies.

Figure A.2: The group profile created by GroupReM for a given group G based on the tags in the profiles of three MovieLens users, who are members of G

words2 based on their (i) frequency of co-occurrence and (ii) relative distances in each Wikipedia document. Wikipedia documents were chosen for constructing the word-correlation matrix, since they were written by more than 89,000 authors with different writing styles, and the

documents cover a wide range of topics with diverse word usage and contents. Compared with synonyms/related words compiled by the well-known WordNet (wordnet.princeton.edu) in which pairs of words are not assigned similarity weights, word-correlation factors provide a more sophisticated measure of word similarity. Despite the existence of a number of measures that rely on

WordNet to determine the semantic similarity between pairs of words, such as Lesk [15] and LCH [83], GroupReM depends on word-correlation factors, which have been successfully adopted to determine the similarity between words in a number of applications, such as document classification [119] and text retrieval [130].

2Words in the Wikipedia documents were stemmed (i.e., reduced to their grammatical roots) after all the stopwords, such as articles, conjunctions, and prepositions, which do not play a significant role in representing the content of a document, were removed. From now on, unless stated otherwise, (key)words/tags refer to non-stop, stemmed (key)words/tags.

A.3.3 Identifying Candidate Movies to be Recommended

As the number of movies available at a movie website W can be large, i.e., in the hundreds of thousands, it is inefficient to analyze each movie of W to identify those of interest to the mem-

bers of a group G at W, since the comparisons would significantly prolong the processing time of GroupReM to make recommendations. To minimize the number of comparisons and thus reduce the processing time required in generating recommendations for G, GroupReM applies a blocking strategy3 on movies archived at W to obtain the subset of movies that are potentially of interest to the members of G (to various degrees), denoted Candidate Movies, to be considered for recommendation. The blocking strategy adopted by GroupReM first considers the personal tags assigned by a group member U of G for each of his/her bookmarked movies, uM. A movie M archived at W4 is included in Candidate Movies if each of the personal tags assigned by U to uM exactly matches or is highly similar to at least one tag in the tag cloud of M. As tags are concise and valid content descriptors of an item [64], it is anticipated that movies in Candidate Movies are of interest to (at least one of the members of) G, since each movie in Candidate Movies shares a number of identical (or analogous) tags (to a certain degree) with the ones in the group profile for G.

To select movies to be included in Candidate Movies, GroupReM relies on a reduced version of the word-correlation matrix (introduced in Section A.3.2) which contains the 13% most frequently occurring words (based on their frequencies of occurrence in the Wikipedia documents), and for the remaining 87% of the less-frequently-occurring words, only the exact-matched correlation factor, i.e., 1.0, is used [63]. By adopting a reduced version of the word-correlation matrix to determine potentially similar movies, the overall processing time of GroupReM is significantly reduced without affecting its accuracy [126].

3A blocking strategy is a filtering technique that reduces the potentially very large number of comparisons to be made among records [33], i.e., movies available at a movie website in our case.
4A movie archived at a movie website is included in the set of candidate movies if it has not been bookmarked by any member of a given group.

Figure A.3: (Portions of the) Tag clouds of potential candidate movies in which tags exactly matched or highly similar to the personal tags assigned to a movie shown in Figure A.2 are underlined

Example 20 Consider the five movies archived at MovieLens, i.e., ML1, ML2, ML3, ML4, and

ML5, as shown in Figure A.3, which are not bookmarked by any member of the group shown in Figure A.2. To determine which one of the five movies should be treated as candidate movies for the group G introduced in Example 19, GroupReM compares personal tags assigned to each movie shown in Figure A.2 with the tags (in the tag cloud) of each movie shown in Figure A.3. Given that each of the personal tags assigned to M1 is highly similar to at least a tag in the tag cloud of ML1, i.e., the word-correlation factors of “family” and “Disney” (“animation” and “cartoon”, respectively) can be found in the reduced version of the word-correlation matrix, ML1 is selected as a candidate movie. Furthermore, each of the personal tags assigned to describe M3 (M6, re- spectively) exactly matches its counterpart in ML4 (ML3, respectively). Therefore, ML4 (ML3, respectively) is a candidate movie. In addition, since the “drama” tag in M7 exactly matches its counterpart in (the tag cloud of) ML5 and the remaining personal tag of M7, i.e., “family”, is highly similar to another tag in (the tag cloud of) ML5, i.e., “poverty”, ML5 is also selected as a candidate movie. Although M2, M3, M6, and M7 include a tag, i.e., “drama”, which is also a tag in the tag cloud of ML2, none of the remaining personal tags assigned to describe the content of either M2, M3, M6, or M7 exactly matches or is similar to a tag in ML2, and thus ML2 is not treated as a candidate movie. ✷
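The blocking step can be sketched in Python as follows; this is an illustrative approximation rather than GroupReM’s actual code, the reduced word-correlation matrix is modeled as a small lookup table with made-up correlation values, and the tag sets only loosely mirror Example 20.

REDUCED_WCF = {("family", "disney"): 0.8, ("animation", "cartoon"): 0.7}  # hypothetical entries

def wcf(t1, t2):
    # Exact matches have a correlation factor of 1.0; unlisted pairs default to 0.
    if t1 == t2:
        return 1.0
    return REDUCED_WCF.get((t1, t2), REDUCED_WCF.get((t2, t1), 0.0))

def is_candidate(personal_tag_sets, tag_cloud):
    # personal_tag_sets: for one group member, the set of personal tags assigned
    # to each of his/her bookmarked movies. A movie with tag cloud `tag_cloud`
    # becomes a candidate if, for at least one bookmarked movie, every personal
    # tag exactly matches or is highly similar to some tag in the cloud.
    return any(all(any(wcf(t, c) > 0 for c in tag_cloud) for t in tags)
               for tags in personal_tag_sets)

print(is_candidate([{"family", "animation"}], {"disney", "cartoon", "kids"}))  # True
print(is_candidate([{"family", "war"}], {"disney", "cartoon", "kids"}))        # False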

A.3.4 Generate Group Recommendations

Having identified the set of candidate movies to be considered for recommendation to a group, GroupReM proceeds to rank each of the candidate movies by relying on two different scores, the

group appealing and popularity scores, presented in Sections A.3.4 and A.3.4, respectively. The two scores are combined using an aggregation function, as defined in Section A.3.4, and the top-10 candidate movies with the highest combined scores are recommended to the group.

Appealing Scores of Movies

To determine the degree of interest of the members of a group G in a candidate movie M, GroupReM computes the group appealing score of M for G, denoted GrpApp(M, G), by accumulating the word-correlation factors among the tags that capture the types of movies members of G are interested in, i.e., tags in the group profile of G, and tags in the tag cloud of M. In computing the GrpApp score of M for G, GroupReM relies on the word-correlation matrix introduced in Section A.3.2, instead of the reduced word-correlation matrix employed in Section A.3.3, since the former provides a more accurate similarity measure between (tags representing) M and G than the reduced matrix. The GrpApp score of M for G is defined as

GrpApp(M, G) = \sum_{g \in GP} \sum_{m \in M} wcf(g, m) \times \frac{freq_g}{Max(freq_{GP})} \times \frac{freq_m}{Max(freq_M)}    (A.1)

where g (m, respectively) is a tag in the group profile GP of G (the tag cloud of M, respectively), wcf(g, m) is the word-correlation factor of g and m in the word-correlation matrix, freq_g (freq_m, respectively) is the frequency of occurrence of tag g (m, respectively) in GP (the tag cloud of M, respectively), Max(freq_{GP}) (Max(freq_M), respectively) is the highest frequency of any tag in GP (the tag cloud of M, respectively), and freq_g/Max(freq_{GP}) (freq_m/Max(freq_M), respectively) denotes the weight of g (m, respectively).

freq_g (freq_m, respectively) in Equation A.1 is an indicator of the relative degree of significance of tag g (m, respectively) in representing (the content of) GP (M, respectively), since it reflects the frequency with which group members (the number of users at a movie website, respectively) have chosen g (m, respectively) to represent movies of interest for G (the content of M, respectively). The larger freq_g (freq_m, respectively) is, the more significant g (m, respectively) is in characterizing GP (describing M, respectively). In addition, by relying on the weight of each tag in GP (the tag cloud of M, respectively) in Equation A.1, GroupReM ensures that exactly-matched (or highly-similar) tags between GP and M do not inflate the group appealing score of M if they are not significant/representative tags to G (M, respectively).

Example 21 To illustrate the merit of using word-correlation factors in computing the GrpApp score of a candidate movie, consider the group profile shown in Figure A.2 and the candidate movies ML1 and ML5 as shown in Figure A.3. Both movies include a tag, i.e., “Disney” and “drama”, respectively in their corresponding tag clouds that exactly matches its counterpart in the group profile of G (as shown in Figure A.2), which implies that the GrpApp score of ML1 and

ML5 should be similar. Taking into account the remaining, i.e., non-exact-matched but analogous, tags in the tag clouds of the aforementioned movies in calculating their respective GrpApp scores, GroupReM computes a more accurate group appealing score for each candidate movie.

GrpApp(ML1, G), computed using Equation A.1, is 3.5, whereas GrpApp(ML5, G) is 1.0, which correctly reflects that G, as a whole, is more interested in family, animated movies than dramatic movies, as captured in the group profile of G. ✷
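A minimal Python sketch of Equation A.1 is given below; the tag frequencies and the word-correlation lookup table are illustrative (the actual group profile and tag clouds of Figures A.2 and A.3 are not reproduced here), so the printed score does not correspond to the values reported in Example 21.

WCF = {("family", "disney"): 0.8, ("animation", "cartoon"): 0.7}  # hypothetical factors

def wcf(t1, t2):
    if t1 == t2:
        return 1.0
    return WCF.get((t1, t2), WCF.get((t2, t1), 0.0))

def grp_app(group_profile, tag_cloud):
    # group_profile, tag_cloud: dicts mapping a (stemmed) tag to its frequency.
    # Each word-correlation factor is weighted by the normalized frequencies of
    # the two tags, as in Equation A.1.
    max_gp = max(group_profile.values())
    max_m = max(tag_cloud.values())
    return sum(wcf(g, m) * (fg / max_gp) * (fm / max_m)
               for g, fg in group_profile.items()
               for m, fm in tag_cloud.items())

gp = {"drama": 4, "animation": 4, "family": 2}       # hypothetical group profile
cloud = {"disney": 3, "cartoon": 2, "kids": 1}       # hypothetical tag cloud
print(round(grp_app(gp, cloud), 2))                  # ~0.87 with this toy data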

Popularity Scores of Movies

In addition to computing the GrpApp score of a candidate movie M for G, GroupReM also considers the global popularity score of M, denoted GlbPop(M), which exploits the “wisdom of the crowd” [21], i.e., the collective interest in M expressed by users at the movie website of which members of G are users, and provides a higher ranking on M if it is more frequently bookmarked at the website than other candidate movies. Popular movies which attract the attention of users at a movie website are more likely to be bookmarked by the users. GroupReM takes into account the fact that frequently-bookmarked movies may also be of interest to members of G. While solely relying on the popularity of an item in performing the recommendation task (which does not apply to GroupReM) can lead to less diverse and

useless recommendations [160], Adomavicius and Kwon [3] claim that the accuracy of the recommendations can be enhanced by considering the popularity of an item during the recommendation process. GlbPop, which is considered by GroupReM as an additional decision factor besides GrpApp to rank M to make recommendations, is computed as the total number of users at W who have bookmarked M.

Rank Aggregation

Having determined the group appealing and global popularity scores of each movie M in Candidate Movies, GroupReM computes the ranking score of M by applying a popular linear combination measure, called CombMNZ [86], which is frequently used in fusion experiments [39]. CombMNZ considers multiple existing lists of rankings on an item I to determine a joint ranking of I, a task known as rank aggregation or data fusion.

CombMNZ_I = \sum_{c=1}^{N} I^c \times |I^c > 0|    (A.2)

where N is the number of ranked lists to be fused, i.e., the number of input ranked lists, I^c is the normalized score of I in the ranked list c, and |I^c > 0| is the number of non-zero, normalized scores of I in the lists to be fused. Prior to computing the ranking score of M, it is necessary to transform the original scores in each individual ranked list into a common range, which can be accomplished by applying Equation A.3 to each score in each ranked list so that it is within the range [0, 1], a common range [86].

I^c = \frac{S^I - I^c_{min}}{I^c_{max} - I^c_{min}}    (A.3)

where S^I is the score of item I in the ranked list c prior to being normalized, I^c_{max} (I^c_{min}, respectively) is the maximum (minimum, respectively) score available in c, and I^c is the normalized score for I in c. GroupReM normalizes the group appealing and global popularity scores of M computed in Sections A.3.4 and A.3.4, respectively, using Equation A.3. Thereafter, using CombMNZ, GroupReM (i) sets N = 2 (in Equation A.2), which is the number of (input) ranked lists of normalized scores corresponding to the scores computed in Sections A.3.4 and A.3.4, respectively, (ii) determines the overall ranking score of each movie M in Candidate Movies using Equation A.2, and (iii) recommends the top-10 ranked movies to (the members of) G.

By adopting this fusion strategy, GroupReM considers the strength of each piece of evidence, i.e., the GrpApp and GlbPop scores, as opposed to simply positioning higher in the ranking movies with a high GrpApp or GlbPop score.

Example 22 Consider the candidate movies ML1, ML3, ML4, and ML5 as shown in Figure A.3, along with their respective (normalized) group appealing and global popularity scores as shown in Table A.1. Using CombMNZ as a rank aggregation measure, GroupReM identifies the most relevant movies, i.e., movies of interest, for G. Even though the (normalized) global popularity score of ML1 is slightly lower than the global popularity score of ML3, GroupReM positions

ML1 higher than ML3 in the ranking of movies to be recommended. This is because ML1 is more appealing for (members of) G based on the tags in the tag cloud of ML1 and the tags in the group profile of G that depict the movie preferences of (the members of) G.

As shown in Table A.1, the global popularity score of ML5 is relatively high; however, its group appealing score is significantly lower in comparison with the group appealing scores of the remaining candidate movies. As a result, GroupReM positions ML5 lower in the ranking of movies to be recommended than the remaining candidate movies in Figure A.3. ✷

Candidate Movie    Group Appealing Score    Global Popularity Score    Ranking
ML1                0.92                     0.65                       3.14
ML3                0.71                     0.80                       3.02
ML4                0.53                     0.56                       2.18
ML5                0.27                     0.75                       2.04

Table A.1: Normalized scores for the candidate movies shown in Figure A.3 with respect to the group profile of G shown in Figure A.2 as computed by GroupReM
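The normalization and fusion steps (Equations A.3 and A.2) can be sketched in Python as follows; this is an illustrative rendering rather than GroupReM’s actual code. Since the scores in Table A.1 are already normalized, feeding them to the CombMNZ function below (with N = 2 and all scores non-zero) reproduces the ranking scores in the table’s last column; the min-max helper illustrating Equation A.3 is included for completeness.

def min_max_normalize(scores):
    # Equation A.3: rescale a ranked list's scores into [0, 1].
    lo, hi = min(scores.values()), max(scores.values())
    return {item: (s - lo) / (hi - lo) if hi > lo else 0.0 for item, s in scores.items()}

def comb_mnz(*ranked_lists):
    # Equation A.2: sum an item's normalized scores across the lists and multiply
    # by the number of lists in which the item has a non-zero score.
    items = set().union(*ranked_lists)
    fused = {}
    for item in items:
        scores = [lst.get(item, 0.0) for lst in ranked_lists]
        fused[item] = sum(scores) * sum(1 for s in scores if s > 0)
    return fused

grp_app = {"ML1": 0.92, "ML3": 0.71, "ML4": 0.53, "ML5": 0.27}   # normalized GrpApp scores
glb_pop = {"ML1": 0.65, "ML3": 0.80, "ML4": 0.56, "ML5": 0.75}   # normalized GlbPop scores
ranking = comb_mnz(grp_app, glb_pop)
print(sorted(((m, round(s, 2)) for m, s in ranking.items()), key=lambda x: -x[1]))
# [('ML1', 3.14), ('ML3', 3.02), ('ML4', 2.18), ('ML5', 2.04)]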

A.4 Experimental Results

In this section, we first introduce the dataset (in Section A.4.1) employed for assessing the performance of GroupReM. Thereafter, we present the evaluation protocol and group formation strategy adopted for creating the groups used for evaluation purposes (in Sections A.4.2 and A.4.3, respectively). We define the metric which quantifies the accuracy and ranking approach of GroupReM (in Section A.4.4). We detail the empirical study conducted for verifying the effectiveness and efficiency of GroupReM and compare its performance with existing group recommenders on movies (in Section A.4.5).

A.4.1 Dataset

To evaluate GroupReM in recommending movies appealing (to a certain degree) to the members of a group, we consider the MovieLens dataset [13], a dataset released by the ACM HetRec Conference in 2011. Statistical information on MovieLens is shown in Table A.2. (See detailed information on the dataset at grouplens.org/system/files/hetrec2011-movielens-readme.txt.) Note that the MovieLens dataset was not developed for assessing the performance of group recommenders, since pre-defined groups of users are not provided in the dataset. For this reason, we create our own groups of users for evaluation purposes (see details in Section A.4.3).

MovieLens Dataset
# of Distinct Users                                        2,113
# of Distinct Movies                                      10,197
# of Distinct Tags                                        13,222
# of Distinct Tag-Movie Assignments                       47,957
Average # of Movies Bookmarked per User                       13
Average # of Distinct Tags Assigned to Movies per User        23
Average # of Distinct Tags Assigned to a Movie                 8
Average # of Ratings Assigned to Movies per User             405
Average # of Ratings Assigned to a Movie                      85

Table A.2: Statistical information of the MovieLens dataset

A.4.2 Evaluation Protocol

To assess the relevancy of group recommendations suggested by GroupReM, we have adapted a standard approach to partition the movies bookmarked by each user in the MovieLens dataset into two subsets and employed the five-fold cross validation approach [95]. In evaluating the recommendations made by GroupReM for a given group G, in each of the five repetitions, 80% of the movies bookmarked in MovieLens by each member U of G were treated by GroupReM as included in the individual profile of U and the remaining 20% were reserved for the testing purpose, i.e., to assess the relevance of the recommendations generated for (U in) G. A recommendation made by GroupReM is treated as relevant for (U in) G, if the recommended movie is included in

the 20% of the movies (bookmarked by U) withheld for the testing purpose, a commonly-employed protocol for assessing recommendation systems [18, 61].
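The per-user splitting procedure can be sketched in Python as follows; this is an illustrative rendering of the protocol under the stated 80/20 partition, not the exact code used in our evaluation, and the movie identifiers are hypothetical.

import random

def five_fold_splits(bookmarked_movies, seed=0):
    # For one user: in each of the five folds, withhold a different 20% of the
    # user's bookmarked movies for testing and treat the remaining 80% as the
    # user's profile.
    movies = list(bookmarked_movies)
    random.Random(seed).shuffle(movies)
    fold_size = max(1, len(movies) // 5)
    for f in range(5):
        withheld = movies[f * fold_size:(f + 1) * fold_size]
        profile = [m for m in movies if m not in withheld]
        yield profile, withheld

for profile, withheld in five_fold_splits(["movie_%d" % i for i in range(10)]):
    # A recommendation for this user counts as relevant if it appears in `withheld`.
    print(len(profile), len(withheld))   # 8 2, in each of the five folds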

A.4.3 Group Formation

To the best of our knowledge, there are no benchmark datasets available for assessing the performance of group recommenders, let alone group recommenders on movies. For this reason, we employ a popular strategy for generating groups (of users in the MovieLens dataset introduced in Section A.4.1) for evaluation purposes.

In creating groups for evaluating the recommendations generated by GroupReM, we consider two important factors: the size and cohesiveness of a group [8, 14]. By varying the group sizes, we can assess the difficulty in reaching consensus among members of small versus large groups. We consider groups with 2 to 8 members, which are comparable to the group sizes defined

in [8, 14], to demonstrate the effectiveness of GroupReM in recommending movies for small, as well as large, groups. Besides group size, group cohesiveness is another important criterion [8] in evaluating group recommenders. By using groups that include members with various degrees of cohesiveness, i.e., different degrees of user-to-user similarity, we can verify the correctness of GroupReM in

generating recommendations for groups of users that may or may not share common preferences in movies, since the latter is more challenging than the former in terms of satisfying their mutual interest. Altogether, three different types of groups, i.e., highly similar, dissimilar, and random, are considered. Random groups are formed by randomly selecting users from MovieLens, regardless

of their preferences on movies. Highly-similar groups include members with common interests in the same types of movies, whereas dissimilar groups reflect groups of people that are different in terms of their preferences in movies. To determine the users who should be included in highly-similar and dissimilar groups, we adapted the strategy employed in [14], which calculates the user-

to-user similarity, denoted User Sim, on each pair of users in MovieLens. The User Sim metric is introduced in [8] and computed as

User\_Sim(u, u') = \frac{|\{i \mid i \in I_u \wedge i \in I_{u'} \wedge |rating(u, i) - rating(u', i)| \leq 2\}|}{|\{i \mid i \in I_u \vee i \in I_{u'}\}|}    (A.4)

where I_u (I_{u'}, respectively) denotes the set of items, i.e., movies in our case, rated by user u (u', respectively), i is an item, rating(u, i) (rating(u', i), respectively) denotes the rating assigned to i by u (u', respectively), and the condition |rating(u, i) - rating(u', i)| ≤ 2 constrains a movie M to be treated as “shared” between u and u' only if they both rated M within 2 units of each other on the scale of 0 to 5, which indicates that u and u' assigned a similarly high (low, respectively) rating to M. The similarly high (low, respectively) ratings provided by u and u' on i indicate that u and u' share the same preference on i. In computing the User Sim score between any two users (as defined in Equation A.4), only

pairs of users who have rated at least 5 common items are considered, a common practice among CF-based recommenders which ensures that the correlation between two users, i.e., User Sim score, is not high (low, respectively) solely based on the same ratings assigned to a small set of items, i.e., less than 5 movies in our case, by the two users [14]. We follow the strategy proposed by the authors in [14], who consider the distribution of user pairs in a given dataset (based on their user-to-user similarity) and treat the 33% of user-pairs with the highest user-to-user similarity score as highly-similar users. Based on the distribution of user-to-user similarity scores for each pair of users in MovieLens (as shown in Figure A.4), we observe that pairs of users with a 0.11 User Sim score or higher fall within the range of 33%

user-pairs who achieve the highest user-to-user similarity. Hence, a group of MovieLens users whose user-to-user similarity among each other is higher than or equal to 0.11 is treated as a highly-similar group. Applying the same strategy used to determine highly-similar users, we treat the 33% of user-pairs with the lowest User Sim scores as dissimilar users. As it turns out, user-pairs with a

User Sim score less than or equal to 0.06 constitute the 33% of user-pairs in MovieLens with the lowest user-to-user similarity (as shown in Figure A.4), and these users are treated as members of dissimilar groups. Based on the group formation protocol defined above, we created 3,150 distinct groups, which are uniformly distributed among highly-similar, dissimilar, and random groups. In addition,

each set of the 1,050 groups that share the same degree of cohesiveness is uniformly distributed based on the pre-defined group sizes, i.e., 2 to 8 members. Thus, for each distinct group size there are 150 groups in which group members share the same (pre-determined) degree of cohesiveness.
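A minimal Python sketch of the group formation criteria is shown below; it assumes the agreement condition of Equation A.4 (ratings within 2 units of each other on the 0-5 scale), the five-common-items requirement, and the 0.11/0.06 thresholds reported above, while the rating data itself is hypothetical.

def user_sim(ratings_u, ratings_v, min_common=5):
    # ratings_u, ratings_v: dicts mapping a movie id to the user's rating.
    # Returns None when the pair shares fewer than `min_common` rated movies.
    common = set(ratings_u) & set(ratings_v)
    if len(common) < min_common:
        return None
    shared = sum(1 for m in common if abs(ratings_u[m] - ratings_v[m]) <= 2)
    return shared / len(set(ratings_u) | set(ratings_v))

def pair_label(sim, hi=0.11, lo=0.06):
    # 33%-quantile thresholds observed on MovieLens (Figure A.4).
    if sim is None:
        return "undetermined"
    return "highly similar" if sim >= hi else ("dissimilar" if sim <= lo else "neither")

u = dict(zip(range(10), [5, 4, 4, 3, 5, 2, 1, 4, 3, 5]))   # hypothetical ratings
v = dict(zip(range(8),  [4, 5, 3, 3, 4, 5, 2, 1]))
s = user_sim(u, v)
print(round(s, 2), pair_label(s))   # 0.6 highly similar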

Figure A.4: User-to-user similarity distribution in the MovieLens dataset

A.4.4 Metrics

To assess the overall performance and ranking strategy of GroupReM, we employ the Normalized Discounted Cumulative Gain (nDCG) [42] measure, which is a standard IR metric often used for

evaluating group recommenders [8, 14]. Due to the lack of “ground truth” required to assess the recommendations generated by GroupReM for a given group G of a particular size whose members may or may not share the same degree of cohesiveness, we calculate the nDCG for G as the average of the nDCG value computed for each of the group members in G, following the

experimental setting adopted by Amer-Yahia et al. [8].

nDCG10, as defined in Equation A.5 for evaluating the relevance of each batch of top-10 recommendations generated by GroupReM, penalizes relevant movies ranked lower. The penalization is based on a relevance reduction, which is logarithmically proportional to the relative position of each relevant movie in a ranked list of recommended movies (as shown in Equation A.6). The

higher the nDCG10 score is, the better the ranking strategy adopted by the corresponding recommender system RS is, since a high nDCG10 score on a list of recommendations L indicates that relevant recommendations generated by RS are positioned high in L.

nDCG_{10} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{M} \sum_{k=1}^{M} \frac{DCG_{10,k}}{IDCG_{10,k}}    (A.5)

where N (which is 150 in our case) is the number of groups with a pre-defined number of group members such that the members share the same pre-determined degree of cohesiveness (as detailed in Section A.4.3), i is the i-th group for which GroupReM generates movie recommendations, M is the number of group members in i, k is the k-th group member in i, IDCG_{10,k} (in Equation A.5)5 is the best possible DCG_{10,k} value for the recommendations generated by GroupReM for k, and

DCG_{10,k} = \sum_{j=1}^{10} \frac{2^{rel_j} - 1}{\log_2(1 + j)}    (A.6)

where rel_j is the binary relevance judgment of the recommended movie at the j-th ranking position, which is assigned a value of “1” if the movie is a relevant recommendation for k (as defined in Section A.4.2) and a “0” otherwise.
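The metric can be sketched in Python for a single group member as follows; the relevance judgments in the example are illustrative, and the group-level score of Equation A.5 is obtained by averaging this per-member value over the members of each group.

import math

def dcg_at_10(relevances):
    # Equation A.6: relevances holds the binary relevance (1/0) of the top-10
    # recommendations, ordered by rank.
    return sum((2 ** rel - 1) / math.log2(1 + j)
               for j, rel in enumerate(relevances[:10], start=1))

def ndcg_at_10(relevances):
    # Normalize by the ideal DCG, i.e., the DCG of the same judgments sorted in
    # descending order of relevance.
    ideal = dcg_at_10(sorted(relevances[:10], reverse=True))
    return dcg_at_10(relevances) / ideal if ideal > 0 else 0.0

# Two relevant movies, ranked 1st and 4th, among a member's top-10 recommendations.
print(round(ndcg_at_10([1, 0, 0, 1, 0, 0, 0, 0, 0, 0]), 3))   # ~0.877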

A.4.5 The Effectiveness and Efficiency of GroupReM

In this section, we first verify the correctness of relying on word-correlation factors and the popularity of movies to generate group recommendations (as presented in Section A.4.5). Thereafter, we compare the performance of GroupReM with existing CF-based group recommenders (in Section A.4.5) and assess the efficiency of GroupReM and CF-based group recommenders in performing the recommendation task (in Section A.4.5).

The Correctness of GroupReM

As stated in Section A.3.4, GroupReM depends on the group appealing (based on word-correlation factors) and global popularity scores to generate recommendations of interest to a group. To verify

5 IDCG10,k is computed as DCG10,k using an ideal ranking such that the ten recommendations are arranged in descending order based on their relevance judgment scores in the ranked list.

the effectiveness of GroupReM in making group recommendations on movies, we conducted an empirical study in which we compared two alternative implementations of GroupReM. The first alternative, denoted GroupReM Exact, relies solely on the group appealing score computed on exactly-matched tags for generating movie recommendations for a group G. In this case, the group appealing score of a candidate movie M for G is calculated using the Dice coefficient [42] on the tags in (the tag cloud of) M and the tags characterizing (the group profile of) G. The second alternative, denoted GroupReM WCF, relies on the word-correlation factors and considers analogous, besides exactly-matching, tags. GroupReM WCF computes the group appealing score of each candidate movie using Equation A.1.

As illustrated in Figure A.5, regardless of the degree of cohesiveness among group members in groups of any size, GroupReM WCF consistently improves the accuracy of the recommendations generated by GroupReM Exact. The 3% overall improvement on the (average) nDCG achieved by GroupReM WCF over GroupReM Exact, using the MovieLens dataset and

the groups introduced in Section A.4.3, indicates that relaxing the exact-matching constraint by adopting word-correlation factors enhances the accuracy of movies recommended to a group by GroupReM WCF. In addition, the at least 8% overall improvement on the (average) nDCG scores achieved by GroupReM over GroupReM WCF, using the aforementioned dataset, validates the

fact that the global popularity score (as defined in Section A.3.4) further increases the accuracy of group recommendations compared with simply using the group appealing scores of movies to perform the group recommendation task (as illustrated in Figures A.5(a)-A.5(c)). Note that the differences between GroupReM WCF and GroupReM Exact with respect to GroupReM, in terms of nDCG, are statistically significant, as determined using a Wilcoxon Rank Sum Test (p < 0.05).

Comparing the Performance of GroupReM with Existing Group Recommenders

To further verify and demonstrate the effectiveness of GroupReM, we compare its performance with two well-known CF recommenders on movies, which are based on Average (CF AVG) and Least Misery (CF LM) aggregation strategies [8, 14], respectively. Given that GroupReM adopts

Figure A.5: nDCG scores computed for (the alternative implementations of) GroupReM and alternative implementations of the collaborative filtering approach based on average and least misery aggregation strategies on 3,150 groups of various sizes: (a) groups of highly-similar users, (b) groups of dissimilar users, and (c) groups of random users. All the differences in nDCG are statistically significant with respect to GroupReM (Wilcoxon, p < 0.05).

Given that GroupReM adopts an aggregated model approach to make recommendations, we also compare its performance with a CF recommender that employs an average aggregated model strategy (CF AVG AM). We have chosen CF-based recommenders for comparisons, since to the best of our knowledge there is no group recommender on movies that depends primarily on content descriptions to make recommendations. Given a group G, both CF AVG and CF LM first generate movie recommendations for individual members of G by employing the well-known CF strategy. Thereafter, the recommenders merge the recommendations generated for individual group members to create the list of movies to be recommended to G. While CF AVG computes the score of a movie M for G by averaging the ratings of M predicted for each individual group member in G, CF LM defines the score of M for G as the smallest rating of M predicted for any individual member of G. The top-10 movies with the highest predicted ratings are recommended to G. (A more in-depth discussion of CF AVG and CF LM can be found in [8, 14].)

The CF AVG AM approach, on the other hand, generates a single group profile by averaging the ratings assigned to each movie bookmarked by the individual members of G. Thereafter, the well-known CF approach is employed to generate a list of the top-10 highest ranked movies for (the profile of) G.
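Since the three CF-based baselines differ only in how ratings are aggregated, the sketch below illustrates the average, least-misery, and averaged-group-profile aggregation steps on predicted (or assigned) ratings; the rating-prediction step itself is abstracted away, and the dictionaries of ratings are hypothetical inputs rather than the output of the CF implementations described in [8, 14].

    # predicted[member][movie] = rating predicted for that member by some CF model (assumed given).
    # For simplicity, every member is assumed to have a prediction for every candidate movie.
    predicted = {
        "u1": {"m1": 5.0, "m2": 3.5, "m3": 4.0},
        "u2": {"m1": 5.0, "m2": 3.5, "m3": 4.0},
        "u3": {"m1": 1.0, "m2": 3.5, "m3": 1.5},
    }

    def cf_avg(predicted, n=10):
        """CF_AVG: a movie's group score is the average of its predicted ratings."""
        movies = next(iter(predicted.values())).keys()
        scores = {m: sum(p[m] for p in predicted.values()) / len(predicted) for m in movies}
        return sorted(scores, key=scores.get, reverse=True)[:n]

    def cf_lm(predicted, n=10):
        """CF_LM: least misery -- a movie's group score is its smallest predicted rating."""
        movies = next(iter(predicted.values())).keys()
        scores = {m: min(p[m] for p in predicted.values()) for m in movies}
        return sorted(scores, key=scores.get, reverse=True)[:n]

    def cf_avg_am(group_ratings):
        """CF_AVG_AM: build one pseudo-user profile by averaging the ratings the
        members assigned to each bookmarked movie; standard CF (not shown) is then
        run on this single profile."""
        profile = {}
        for member_ratings in group_ratings.values():
            for movie, rating in member_ratings.items():
                profile.setdefault(movie, []).append(rating)
        return {movie: sum(r) / len(r) for movie, r in profile.items()}

    print(cf_avg(predicted, n=2))   # ['m1', 'm2']: favours high average appeal
    print(cf_lm(predicted, n=2))    # ['m2', 'm3']: penalizes the movie one member dislikes
    print(cf_avg_am(predicted))     # pseudo-user profile used by the aggregated-model baseline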

Prior to comparing the performance of the aforementioned recommenders with GroupReM, we determined the relevance of each movie recommended by CF AVG, CF LM, and CF AVG AM for each of the groups constructed in Section A.4.3 using the MovieLens dataset, evaluation protocol, and metric detailed in Sections A.4.1, A.4.2, and A.4.4, respectively. Figures A.5(a), A.5(b), and A.5(c) show the nDCG scores achieved by GroupReM, CF LM, CF AVG, and CF AVG AM for highly-similar, dissimilar, and random groups of different sizes, respectively. The average nDCG score of GroupReM computed for groups with highly-similar users is 0.28, which is at least 12% higher than the average nDCG scores achieved by CF LM, CF AVG, and CF AVG AM, which are 0.07, 0.16, and 0.14, respectively. The average nDCG score achieved by GroupReM for groups with dissimilar (random, respectively) users is 0.24 (0.27, respectively), which also outperforms the average nDCG scores achieved by CF LM, CF AVG, and CF AVG AM on the same groups, which are 0.07, 0.12, and 0.11 (0.08, 0.15, and 0.14, respectively). All of these improvements in nDCG achieved by GroupReM over CF LM, CF AVG, and CF AVG AM are statistically significant (as verified using a Wilcoxon Rank Sum Test with p < 0.05). A higher nDCG value indicates that GroupReM is more effective than CF LM, CF AVG, and CF AVG AM in detecting the relevant movies, i.e., the ones of interest to a group, and ranking them higher in the list of recommended movies, regardless of the number of members in the group or the similarity among group members in terms of their preferences in movies.
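The significance checks reported above can be reproduced, for example, by applying a Wilcoxon rank-sum test to the per-group nDCG scores of two recommenders, as in the sketch below (using SciPy's ranksums); the score arrays are illustrative placeholders, not the nDCG values measured in our study.

    # Sketch of the significance check: per-group nDCG scores of two recommenders
    # are compared with a Wilcoxon rank-sum test.
    from scipy.stats import ranksums

    ndcg_grouprem = [0.31, 0.27, 0.25, 0.30, 0.29, 0.26, 0.28, 0.33]   # illustrative values
    ndcg_cf_avg   = [0.18, 0.15, 0.17, 0.14, 0.16, 0.12, 0.19, 0.15]   # illustrative values

    statistic, p_value = ranksums(ndcg_grouprem, ndcg_cf_avg)
    print(p_value < 0.05)   # True here: the difference would be deemed statistically significant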

Observations

Since only movies reserved for testing purposes (as detailed in Section A.4.2) are considered relevant, it is not possible to account for potentially relevant movies that the users have not bookmarked. As a result, the nDCG scores in our empirical study are underestimated, which is a well-known limitation of the evaluation protocol (introduced in Section A.4.2) applied to recommender systems [66]. As this limitation affects all the evaluated recommenders, i.e., (alternative implementations of) GroupReM, CF AVG, CF LM, and CF AVG AM, the nDCG values remain consistent for the comparative evaluations [18].

Regardless of the degree of cohesiveness among group members, the nDCG scores computed for GroupReM (CF AVG, CF LM, and CF AVG AM, respectively) consistently decrease when the group size increases. This decrease in nDCG scores is expected: the more users are involved in a group, the harder it is to reach consensus among members in terms of choosing movies that represent the collective interests of the group. Moreover, regardless of the size of the groups under evaluation, the nDCG scores computed for GroupReM (CF AVG, CF LM, and CF AVG AM, respectively) are slightly higher when considering groups with highly-similar users. This is anticipated, since the more similar the group members are to one another in terms of their preferences in movies, the more likely they are to treat each recommendation the same, i.e., as (non-)relevant.

The results of the analysis of the performance of GroupReM (and the other recommenders used for comparison purposes), in terms of the degree of cohesiveness among group members, are consistent with the empirical studies conducted in [8, 14]. Note that the fact that CF AVG AM and CF AVG outperform CF LM is anticipated, since the latter adopts a least misery strategy which favors the “least happy” group member in making recommendations. Furthermore, CF AVG, CF LM, and CF AVG AM rely on identifying “similar-minded” users within a movie community, i.e., a movie website, to generate movie recommendations. The search is applied to each member of a given group G. In doing so, CF AVG, CF LM, and CF AVG AM solely consider users of a movie website who rate the same movies as the ones that have been bookmarked and rated by members of G. Hence, the less “similar-minded” the users are (with respect to a member U of G), the less reliable are the ratings predicted for movies to be recommended to U (and G). GroupReM, on the other hand, does not require locating “similar-minded” users to perform the recommendation task. Instead, GroupReM relies on content similarity computed on tags and on the popularity scores of the candidate movies.

Efficiency of GroupReM

Besides assessing the effectiveness of GroupReM, CF AVG, CF LM, and CF AVG AM in making movie recommendations to a group (in Section A.4.5), we have also validated the overall efficiency of (the variations of) GroupReM, CF AVG, CF LM, and CF AVG AM in suggesting movies of interest to a group. Figure A.6 shows the average time (in seconds) required for (the alternative implementations of) GroupReM, CF AVG, CF LM, and CF AVG AM to generate recommendations for the 1,050 groups of various sizes, such that group members share the same degree of cohesiveness among one another, using the 5-fold evaluation strategy detailed in Section A.4.2. While GroupReM Exact achieves the shortest processing time, which is 68 seconds, the additional processing time required by GroupReM, which is 66 (= 134 - 68) seconds, is relatively insignificant compared with the degree of accuracy achieved by GroupReM in generating recommendations of interest to a group, as shown in Section A.4.5.

Figure A.6: (Average) time for (alternative implementations of) GroupReM, CF AVG, CF LM, and CF AVG AM to generate recommendations for 1,050 groups (regardless of their sizes), such that group members share the same degree of cohesiveness among one another, using the 5-fold evaluation strategy detailed in Section A.4.2

GroupReM and CF AVG AM require similar processing times to generate recommendations. When compared with CF AVG and CF LM, however, GroupReM requires significantly less time; as illustrated in Figure A.6, the processing time of CF AVG and CF LM increases by at least 8 minutes in comparison with the processing time of GroupReM.

To further assess the efficiency of GroupReM, we consider 450 (= 3 × 150) groups (regardless of the degree of cohesiveness among group members) of MovieLens users of each pre-defined size, i.e., 2 to 8, for evaluation purposes (as detailed in Section A.4.3). We computed the average processing time of GroupReM in generating recommendations for each of the 450 groups of each pre-defined size. As illustrated in Figure A.7, the (average) time (in milliseconds) required by GroupReM to generate group recommendations does not increase exponentially when the number of group members increases. Instead, as determined by the curve created using the Excel Trend/Regression tool (also shown in Figure A.7), the increase in processing time of GroupReM when the number of group members increases follows a linear trend, which demonstrates the scalability of GroupReM.

Figure A.7: (Average) processing time of GroupReM for generating movie recommendations for groups with a given number of group members, computed using groups of each pre-defined size, i.e., 2 to 8, as defined in Section A.4.3

We have also evaluated whether the total number of movies bookmarked by the members of a group can significantly affect the group recommendation processing time of GroupReM. To draw a conclusion, we considered the 3,150 groups defined in Section A.4.3 and calculated the processing time of GroupReM in generating recommendations for each of the groups, regardless of the size of the groups or the degree of cohesiveness among group members. As anticipated, the processing time (in milliseconds) required for GroupReM to generate recommendations increases as the total number of movies bookmarked by group members increases, as illustrated in Figure A.8. However, even when the total number of movies bookmarked by group members is in the thousands, the processing time of GroupReM in suggesting movies of interest to a group is at most 2.5 seconds, which is a relatively short period of time. Furthermore, the increase in processing time follows a polynomial trend, as determined by the curve created using the Microsoft Excel Trend/Regression tool and as shown in Figure A.8.
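The trend curves mentioned above were obtained with the Excel Trend/Regression tool; an equivalent check can be scripted by fitting low-degree polynomials to the measured timings, e.g., with numpy.polyfit as sketched below. The timing values are illustrative placeholders rather than the measurements plotted in Figures A.7 and A.8.

    import numpy as np

    # Hypothetical average processing times (ms) per group size; real values come from Figure A.7.
    group_sizes = np.array([2, 3, 4, 5, 6, 7, 8])
    avg_time_ms = np.array([210.0, 265.0, 330.0, 380.0, 450.0, 505.0, 560.0])

    slope, intercept = np.polyfit(group_sizes, avg_time_ms, deg=1)   # linear trend
    quad_coeffs = np.polyfit(group_sizes, avg_time_ms, deg=2)        # degree-2 polynomial trend

    print(f"linear fit: time ~ {slope:.1f} * size + {intercept:.1f}")
    residuals = avg_time_ms - np.polyval(quad_coeffs, group_sizes)
    print("degree-2 residual range (ms):", residuals.min(), residuals.max())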

Figure A.8: Average time for GroupReM to generate group recommendations for groups with different numbers of bookmarked movies among group members, computed using the 3,150 groups created in Section A.4.3

Note that, independent of the 3,150 groups introduced in Section A.4.3, we have empirically evaluated GroupReM on generating recommendations for groups of up to 100 members. Based on the conducted experiments, we have observed that (i) the total number of movies bookmarked by group members remains in the thousands and (ii) the processing time of GroupReM is at most 5 seconds, even when considering groups of approximately 100 members with thousands of movies bookmarked among them.

A.4.6 Limitations of the Current Implementation of GroupReM

GroupReM, as currently developed, adopts a Top-N strategy and suggests a list of N movies to a group of users at a given time [49]. The current design of GroupReM does not consider the dynamic preferences of group members, which may evolve over time. Moreover, the satisfaction of a group member U with the recommended items, i.e., movies in our case, may depend on other group members. As stated in [14, 99], U can be influenced by other group members through emotional contagion and conformity. The former claims that U’s satisfaction may increase if other group members are satisfied with the recommendations, whereas the latter states that the opinions of other users may influence U’s opinions. The recommendation strategy adopted by GroupReM, however, does not consider that some members of a group are more capable than others of influencing the remaining group members in making decisions on the (non-)relevance of movies suggested to the group, an issue to be addressed as future work.

A.5 Conclusions and Future Work

With the popularity of social activities in which groups of people are involved, either online or in person, group recommenders designed to identify items of interest to a group play a significant role in social networking. One of the item domains that predominates among group recommenders is movies. Groups of friends, family members, and acquaintances, who gather to watch a movie at home or at the cinema, can use the service of a group recommender to find movies pertaining to their interests. Identifying movies to be recommended that appeal to a group, however, is a non-trivial task due to the personal (and often diverse) preferences of group members in movies. We have introduced GroupReM, a group recommender on movies, which advances the current technology in solving the problem. To suggest movies for members of a given group G at a movie website W, GroupReM first constructs a group profile for G, which captures the collective interests of members of G in movies. Thereafter, GroupReM relies on a simple aggregation model to determine the ranking score of each candidate movie M archived at W which has not been bookmarked by members of G and is potentially of interest to G, based on (i) the content similarity between M and the group profile of G and (ii) the popularity of M at W, so that the top-10 ranked movies are recommended to G. Unlike existing group recommenders on movies, which are based on the collaborative-filtering (CF) strategy and rely solely on the ratings assigned to movies to perform the recommendation task, GroupReM takes advantage of the richness of semantic information, i.e., (personal) tags, which are available at any movie website. By considering the content similarity between movies and a group profile, GroupReM is not constrained to finding users at a movie website who are “similar-minded” based on ratings assigned to the same movies in order to suggest movies to a group, as CF-based group recommenders are. In addition, GroupReM employs word-correlation factors and considers non-exactly-matched, but analogous, tags to more adequately determine the degree of appeal of a movie to a group, which in turn enhances the accuracy of the recommendations.
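To summarize the ranking step in code form, the sketch below scores each candidate movie not yet bookmarked by the group by combining a content-based appeal score with a popularity score and returns the top-ranked candidates. The equal weighting, the tag_overlap stand-in for the group appealing score, and the sample data are assumptions made for illustration; they do not reproduce the exact scoring equations of Section A.3.4.

    # Sketch of the final ranking step: candidates not yet bookmarked by the group are
    # scored by combining content similarity to the group profile with site-wide popularity.
    # The 0.5/0.5 weights, sample candidates, and popularity values are illustrative only.

    def rank_for_group(candidates, group_profile_tags, bookmarked, appeal, top_n=10):
        scored = []
        for movie_id, tags, popularity in candidates:
            if movie_id in bookmarked:          # skip movies the group already knows
                continue
            score = 0.5 * appeal(tags, group_profile_tags) + 0.5 * popularity
            scored.append((score, movie_id))
        scored.sort(reverse=True)
        return [movie_id for _, movie_id in scored[:top_n]]

    def tag_overlap(tags, profile_tags):
        """Simple Jaccard overlap, standing in for the group appealing score."""
        tags, profile_tags = set(tags), set(profile_tags)
        union = tags | profile_tags
        return len(tags & profile_tags) / len(union) if union else 0.0

    candidates = [
        ("m10", ["comedy", "heist"], 0.9),      # (movie id, tag cloud, normalized popularity)
        ("m11", ["drama", "war"], 0.4),
        ("m12", ["comedy", "romance"], 0.6),
    ]
    print(rank_for_group(candidates, ["comedy", "heist", "caper"],
                         bookmarked={"m12"}, appeal=tag_overlap))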

We have conducted an empirical study using more than 3,000 groups of various sizes and degrees of cohesiveness among group members, who are users in the MovieLens dataset, to verify the effectiveness and efficiency of GroupReM. The experimental results indicate that GroupReM is highly accurate in suggesting movies appealing (to a certain degree) to the members of a group. We have compared the performance of GroupReM with three well-known CF-based recommenders and verified that GroupReM outperforms the aforementioned recommenders by a large margin, while the average processing time of GroupReM is significantly shorter than that of its counterparts. GroupReM relies on personal tags assigned to movies that have been bookmarked by group members to create group profiles and identify movies to be recommended. Occasionally, personal tags may not be available, or they may be too broad in describing (the content of) a movie. We plan to investigate strategies that can be applied to infer tags that adequately represent the content of movies when personal tags are missing or too general, which can further enhance the accuracy of the recommendations made by GroupReM. We also intend to enhance the recommendation strategy of GroupReM by considering the fact that some members of a group may influence the remaining group members in making decisions on the (non-)relevance of items suggested to the group.

References

[1] The National Illiteracy Action Project: 2007-2012. Technical report, 2012. http://www.talkingpage.org/NIAP2007.pdf.

[2] H. Acerro. Unpeeling Appeal Factors: The Gooey Center of Readers’ Advisory. Technical report, The Official Blog of the Association for Library Service to Children, 2012. alsc.ala.org/blog/2012/05/unpeeling-appeal-factors-the-gooey-center-of-readers-advisory/.

[3] G. Adomavicius and Y. Kwon. Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques. IEEE Transactions on Knowledge and Data Engineering (TKDE), 24:896–911, 2012.

[4] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE TKDE, 17(6):734–749, 2005.

[5] D. Alexander. Director’s Opening Statement on the FY 2003 President’s Budget Request for the House Subcommittee on Labor-HHS-Education Appropriations. Technical report, National Institute of Child Health and Human Development, 2002. http://www.nichd.nih.gov/about/budget/testimony/Pages/dir FY2003.aspx.

[6] R. Allington. What Really Matters for Struggling Readers: Designing Research-Based Programs, 3rd Edition. Pearsons, 2011.

[7] R. Allington and E. Gabriel. Every Child, Every Day. Reading: The Core Skill, 69(6): 10–15, 2012.

[8] S. Amer-Yahia, S. Basu Roy, A. Chawlat, G. Das, and C. Yu. Group Recommendation: Semantics and Efficiency. VLDB Endowment, 2(1):754–765, 2009.

[9] E. Andersen. If You Want to Succeed in Business, Read More Novels. Technical report, Forbes, 2002. http://www.forbes.com/sites/erikaandersen/2012/05/31/if-you-want-to-succeed-in-business-read-more-novels/.

[10] R. Andersen, C. Borgs, J. Chayes, U. Feige, A. Flaxman, A. Kalai, V. Mirrokni, and M. Tennenholtz. Trust-based Recommendation Systems: an Axiomatic Approach. In International Conference on World Wide Web (WWW), pages 199–208, 2008.

[11] J. Anderson. Lix and Rix: Variations on a Little-known Readability Index. Journal of Reading, 26:993–1022, 1983.

[12] J. Aslam and M. Montague. Models for Metasearch. In International ACM Conference on Research and Development in Information Retrieval (SIGIR), pages 267–276, 1997.

[13] GroupLens Research at University of Minnesota. Available at http://www.grouplens.org/system/files/hetrec2011-movielens-2k.zip.

[14] L. Baltrunas, T. Makcinskas, and F. Ricci. Group Recommendations with Rank Aggregation and Collaborative Filtering. In ACM Conference on Recommender Systems (RecSys), pages 119–126, 2010.

[15] S. Banerjee and T. Pedersen. Extended Gloss Overlaps as a Measure of Semantic Relatedness. In International Joint Conference on Artificial Intelligence (IJCAI), pages 205–810, 2003.

[16] C. Basu, H. Hirsh, W. Cohen, and C. Nevill-Manning. Technical Paper Recommendation: A Study in Combining Multiple Information Sources. Journal of Artificial Intelligence Research (JAIR), 1:231–252, 2001.

[17] F. Belem, E. Martings, T. Pontes, J. Almeida, and M. Goncalves. Associative Tag Recommendation Exploiting Multiple Textual Features. In International ACM Conference on Research and Development in Information Retrieval (SIGIR), pages 1033–1042, 2011.

[18] A. Bellogin, I. Cantador, and P. Castells. A Study of Heterogeneity in Recommendations for a Social Music Service. In International Workshop on Information Heterogeneity and Fusion in Recommender Systems, pages 1–8, 2010.

[19] R. Benjamin. Reconstructing Readability: Recent Developments and Recommendations in the Analysis of Text Difficulty. Educational Psychology Review, 24:63–88, 2012.

[20] S. Berkovsky and J. Freyne. Group-Based Recipe Recommendations: Analysis of Data Aggregation Strategies. In ACM Conference on Recommender Systems (RecSys), pages 111–118, 2010.

[21] S. Berkovsky, J. Freyne, and M. Coombe. Aggregation Trade Offs in Family Based Recommendations. In Australasian Joint Conference on Advances in Artificial Intelligence (AI), pages 646–655, 2009.

[22] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research (JMLR), 3:993–1022, 2003.

[23] T. Bogers and A. van den Bosch. Recommending Scientific Articles Using CiteULike. In ACM Conference on Recommender Systems (RecSys), pages 287–290, 2008.

[24] T. Bogers and A. van den Bosch. Fusing Recommendations for Social Bookmarking Websites. International Journal of Electronic Commerce (IJEC), 15(3):33–75, 2011.

[25] L. Boratto and S. Carta. State-of-the-Art in Group Recommendation and New Approaches for Automatic Identification of Groups. Studies in Computational Intelligence, 324:1–20, 2011.

[26] L. Boratto, S. Carta, A. Chessa, M. Agelli, and M. Clemente. Group Recommendation with Automatic Identification of Users Communities. In IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), pages 547–550, 2009.

[27] S. Bourke, K. McCarthy, and B. Smyth. Power to the People: Exploring Neighbourhood Formations in Social Recommender System. In ACM Conference on Recommender Systems (RecSys), pages 337–340, 2011.

[28] I. Cantador and P. Castells. Extracting Multilayered Communities of Interest from Semantic User Profiles: Application to Group Modeling and Hybrid Recommendations. Computers in Human Behavior, 27(4):1321–1336, 2011.

[29] I. Cantador, A. Bellogin, and D. Vallet. Content-based Recommendation in Social Tagging Systems. In ACM Conference on Recommender Systems (RecSys), pages 237–240, 2010.

[30] D. Carmel, N. Zwerdling, I. Guy, S. Ofek-Koifman, N. Har’el, I. Ronen, E. Uziel, S. Yogev, and S. Chernov. Personalized Social Search Based on the User’s Social Network. In ACM Conference on Information and Knowledge Management (CIKM), pages 1227–1236, 2009.

[31] H. Chen and A. Chen. A Music Recommendation System Based on Music Data Grouping and User Interests. In ACM Conference on Information and Knowledge Management (CIKM), pages 231–238, 2001.

[32] X. Chen. Books and WorldCat: A Comparison of Their Content. Online Information Review, 36(4):507–516, 2012.

[33] P. Christen. Automatic Record Linkage Using Seeded Nearest Neighbour and Support Vector Machine Classification. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 151–159, 2008.

[34] I. Christensen and S. Schiaffino. Entertainment Recommender Systems for Group of Users. Expert Systems with Applications, 38:14127–14135, 2011.

[35] S. Chu, S. Tse, E. Loh, and K. Chow. Collaborative Inquiry Project-based Learning: Effects on Reading Ability and Interests. Library and Information Science Research, 33:236–243, 2011.

[36] C. Clark and K. Rumbold. Reading for Pleasure: a Research Overview. Technical report, National Literacy Trust, 2006.

[37] M. Coleman. A Computer Readability Formula Designed for Machine Scoring. Applied Psychology, 60(2):283–284, 1975.

[38] K. Collins-Thompson and J. Callan. A Language Modeling Approach to Predicting Reading Difficulty. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (HLT-NAACL), pages 193–200, 2004.

[39] G. Cormack, C. Clarke, and S. Buettcher. Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods. In International ACM Conference on Research and Development in Information Retrieval (SIGIR), pages 758–759, 2009.

[40] C. Coulter and M. Smith. The Construction Zone: Literary elements in Narrative Research. Educational Researcher, 38(8):577–590, 2009.

[41] A. Cox and K. Horne. Fast-Paced, Romantic, Set in Savannah: A Comparison of Results from Readers’ Advisory Databases in the Public Library. Public Library Quarterly, 31(4): 285–302, 2012.

[42] W. Croft, D. Metzler, and T. Strohman. Search Engines: Information Retrieval in Practice. Addison Wesley, 2010.

[43] A. Crossen, J. Budzik, and K. Hammond. Flytrap: Intelligent Group Music Recommendation. In ACM International Conference on Intelligent User Interfaces (IUI), pages 184–185, 2002.

[44] M. Crowhurst and G. Piche. Audience and Mode of Discourse Effects on Syntactic Complexity in Writing at Two Grade Levels. Research in the Teaching of English, 13(2):101–109, 1979.

[45] A. Davison and R. Kantor. On the Failure of Readability Formulas to Define Readable Texts: A Case Study from Adaptations. Reading Research Quarterly, 17(2):187–209, 1982.

[46] L. de Campos, J. Fernandez-Luna, J. Huete, and M. Rueda-Morales. Combining Content-Based and Collaborative Recommendations: A Hybrid Approach based on Bayesian Networks. Approximate Reasoning, 51(7):785–799, 2010.

[47] M. De Marneffe. Generating Typed Dependency Parses from Phrase Structure Parses. In Language Resources and Evaluation Conference (LREC), pages 449–454, 2006.

[48] J. Denning, M.S. Pera, and Y.-K. Ng. A Readability Level Prediction Tool for K-12 Books. In Submission, January 2014.

[49] M. Deshpande and G. Karypis. Item-based Top-N Recommendation Algorithms. ACM Transactions on Information Systems (TOIS), 22(1):143–177, 2004.

[50] M. Ekstrand, P. Kannan, J. Stemper, J. Butler, J. Konstan, and J. Riedl. Automatically Building Research Reading Lists. In ACM Conference on Recommender Systems (RecSys), pages 159–166, 2010.

[51] L. Feng, N. Elhadad, and M. Huenerfauth. Cognitively Motivated Features for Readability Assessment. In Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 229–237, 2009.

[52] L. Feng, M. Jansche, M. Huenerfauth, and N. Elhadad. A Comparison of Features for Automatic Readability Assessment. In International Conference on Computational Linguistics (COLING), pages 276–284, 2010.

[53] R. Fisher. Philosophy in Primary Schools: Fostering Thinking Skills and Literacy. Reading, 35:67–73, 2003.

[54] UNESCO Institute for Statistics. Adult and Youth Literacy. Technical report, 2013. http://www.uis.unesco.org/literacy/Documents/fs26-2013-literacy-en.pdf.

[55] D. Friedman and L. Hoffman-Goetz. A Systematic Review of Readability and Comprehension Instruments Used for Print and Web-Based Cancer Information. Health Education & Behavior, 33(3):352–373, 2006.

[56] M. Gartrell, X. Xing, Q. Lv, A. Beach, R. Han, S. Mishra, and K. Seada. Enhancing Group Recommendation by Incorporating Social Relationship Interactions. In ACM International Conference on Supporting Group Work (GROUP), pages 97–106, 2010.

[57] S. Givon and V. Lavrenko. Predicting Social-Tags for Cold Start Book Recommendations. In ACM Conference on Recommender Systems (RecSys), pages 333–336, 2009.

[58] GoodReads’ Official Blog. Announcing GoodReads Personalized Recommendations. goodreads.com/blog/show/303-announcing-goodreads-personalized-recommendations, 2011.

[59] A. Graesser, D. McNamara, M. Louwerse, and Z. Cai. Coh-Metrix: Analysis of Text on Cohesion and Language. Behavior Research Methods. Instruments and Computers, 36(2): 193–202, 2004.

[60] T. Griffiths and M. Steyvers. Finding Scientific Topics. Proceedings of the National Academy of Sciences of the United States of America (PNAS), 101:5228–5235, 2004.

[61] Z. Guan, C. Wang, J. Bu, C. Chen, K. Yang, D. Cai, and X. He. Document Recommendation in Social Tagging Services. In International Conference on World Wide Web (WWW), pages 391–400, 2010.

[62] R. Gunning. The Technique of Clear Writing. McGraw-Hill, 1952.

[63] N. Gustafson and Y.-K. Ng. Augmenting Data Retrieval with Information Retrieval Techniques by Using Word Similarity. In Natural Language and Information Systems (NLDB), pages 163–174, 2008.

[64] I. Guy, N. Zwerdling, I. Ronen, D. Carmel, and E. Uziel. Social Media Recommendation Based on People and Tags. In International ACM Conference on Research and Development in Information Retrieval (SIGIR), pages 194–201, 2010.

[65] M. Heilman, K. Collins-Thompson, and M. Eskenazi. An Analysis of Statistical Models and Features for Reading Difficulty Prediction. In Workshop on Innovative Use of NLP for Building Educational Applications (NAACL-HLT BEA), pages 71–79, 2008.

[66] J. Herlocker, J. Konstan, A. Borchers, and J. Riedl. An Algorithmic Framework for Performing Collaborative Filtering. In International ACM Conference on Research and Development in Information Retrieval (SIGIR), pages 230–237, 1999.

[67] N. Hollands. Improving the Model for Interactive Readers’ Advisory Service. Reference & User Services Quarterly, 45(3):205–212, 2006.

[68] C. Hoscher and G. Strube. Web Search Behavior of Internet Experts and Newbies. Computer Networks, 33:337–346, 2000.

[69] L. Hunt. The Effect of Self-Selection, Interest, and Motivation upon Independent, Instructional, and Frustrational Levels. The Reading Teacher, 50:278–282, 1997.

[70] A. Jameson and B. Smyth. Recommendation to Groups. In P. Brusilovsky, A. Kobsa, and W. Nejdl (Eds.), The Adaptive Web: Methods and Strategies of Web Personalization, pages 596–627, 2007.

[71] Q. Jiang, J. Zhu, M. Sun, and E. Xing. Monte Carlo Methods for Maximum Margin Supervised Topic Models. In Neural Information Processing Systems Foundation (NIPS), pages 1601–1609, 2012.

[72] Y. Jiang, A. Jia, Y. Feng, and D. Zhao. Recommending Academic Papers via Users’ Reading Purposes. In ACM Conference on Recommender Systems (RecSys), pages 241–244, 2012.

[73] T. Joachims. Optimizing Search Engines Using Clickthrough Data. In ACM SIGKDD Conference on Knowledge, Discovery, and Data Mining, pages 133–142, 2002.

[74] J. Jung, K. Kim, H. Lee, and S. Park. Are You Satisfied with Your Recommendation Service?: Discovering Social Networks for Personalized Mobile Services. In Agent and Multi-Agent Systems: Technologies and Applications (KES-AMSTA), pages 567–573, 2008.

[75] R. Kelley. Blocking Considerations for Record Linkage Under Conditions of Uncertainty. In Social Statistics Section, pages 602–605, 1984.

[76] J. Kincaid, R. Fishburne, R. Rogers, and B. Chissom. Derivation of New Readability Formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease Formula) for Navy Enlisted Personnel. Technical Report 8-75, Chief of Naval Technical Training, 1975.

[77] B. Knijnenburg, S. Bostandjiev, J. O’Donovan, and A. Kobsa. Inspectability and Control in Social Recommenders. In ACM Conference on Recommender Systems (RecSys), pages 43–50, 2012.

[78] J. Koberstein and Y.-K. Ng. Using Word Clusters to Detect Similar Web Documents. In Knowledge Science, Engineering and Management (KSEM), pages 215–228, 2006.

[79] I. Konstas, V. Stathopoulos, and J. Jose. On Social Networks and Collaborative Recommendation. In International ACM Conference on Research and Development in Information Retrieval (SIGIR), pages 195–202, 2009.

[80] M. Koolen, J. Kamps, and G. Kazai. Social Book Search: Comparing Topical Relevance Judgments and Book Suggestions for Evaluation. In ACM Conference on Information and Knowledge Management (CIKM), pages 185–194, 2012.

[81] Y. Koren. Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 426–434, 2008.

[82] Y. Koren. Factor in the Neighbors: Scalable and Accurate Collaborative Filtering. ACM Transactions on Knowledge Discovery from Data (TKDD), 4(1):Article 1, 24 pages, 2010.

[83] C. Leacock and M. Chodorow. Combining Local Context and WordNet Similarity for Word Sense Identification. The MIT Press, 1998.

[84] A. Leahy. The Importance of Reading for All of Us. Technical report, 2012. http://www.huffingtonpost.com/anna-leahy/the-importance-of-reading b 1623078.html.

[85] Renaissance Learning. ATOS vs. Lexile: Which Readability Formula is Best? Available at kmnet.renlearn.com/Library/R003520002GE7114.pdf, 2006.

[86] J. Lee. Analyses of Multiple Evidence Combination. In International ACM Conference on Research and Development in Information Retrieval (SIGIR), pages 267–276, 1997.

[87] J. Lennon, P. Koleff, J. Greenwood, and K. Gaston. The Geographical Structure of British Bird Distributions: Diversity, Spatial Turnover and Scale. Journal of Animal Ecology, 70: 966–979, 2001.

[88] H. Li, Y. Gu, and S. Koul. Review of Digital Library Book Recommendation Models. SSRN (dx.doi.org/10.2139/ssrn.1513415), 2009.

[89] G. Linden, B. Smith, and J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1):76–80, 2003.

[90] L. Liu, B. Fang, and W. Zhang. Speak the Same Language with Your Friends: Augmenting Tag Recommenders with Social Relations. In ACM Conference on Hypertext and Hypermedia (HT), pages 45–50, 2010.

[91] G. Luger. Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 5th Ed. 2005.

[92] M. Milone. The Development of ATOS: The Renaissance Readability Formula. http://doc.renlearn.com/KMNet/R004250827GJ11C4.pdf, 2012.

[93] H. Ma, D. Zhou, C. Liu, M. Lyu, and I. King. Recommender Systems with Social Regularization. In ACM International Conference on Web Search and Data Mining (WSDM), pages 287–296, 2011.

[94] Y. Ma, E. Fosler-Lussier, and R. Lofthus. Ranking-based Readability Assessment for Early Primary Children’s Literature. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 548–552, 2012.

[95] C. Manning and H. Schutze. Foundations of Statistical Natural Language Processing. The MIT Press, 2003.

[96] N. Manouselis, H. Drachsler, K. Verbert, and E. Duval. Recommender Systems for Learning. Springer Briefs in Electrical and Computer Engineering, 2013.

[97] P. Massa and P. Avesani. Trust-aware Recommender Systems. In ACM Conference on Recommender Systems (RecSys), pages 17–24, 2007.

[98] J. Masthoff. Group Modeling: Selecting a Sequence of Television Items to Suit a Group of Viewers. User Modeling and User-Adapted Interaction (UMUAI), 14:37–85, 2004.

[99] J. Masthoff and A. Gatt. In Pursuit of Satisfaction and the Prevention of Embarrassment: Affective State in Group Recommender Systems. User Modeling User-Adapted Interaction (UMUAI), 16(3-4):281–319, 2006.

[100] J. McCarthy and T. Anagnost. MusicFX: An Arbiter of Group Preferences for Computer Supported Collaborative Workouts. In ACM Conference on Computer Supported Cooperative Work (CSCW), pages 363–372, 1998.

[101] K. McCarthy, M. Salamo, L. Coyle, L. McGinty, B. Smyth, and P. Nixon. Group Recommender Systems: A Critiquing Based Approach. In ACM International Conference on Intelligent User Interfaces (IUI), pages 267–269, 2006.

[102] G. McLaughlin. SMOG Grading–A New Readability Formula. Reading, 12(8):639–646, 1969.

[103] S. McNee, N. Kapoor, and J. Konstan. Don’t Look Stupid: Avoiding Pitfalls when Recommending Research Papers. In ACM Conference on Computer Supported Cooperative Work (CSCW), pages 171–180, 2006.

[104] Inc. Metametrics. Lexile: A System for Measuring Reader Ability and Text Difficulty: A Guide for Educators. Technical report, Scholastic, 2008. http://teacher.scholastic.com/products/sri reading assessment/pdfs/SRI ProfPaper Lexiles.pdf.

[105] J. Miller. Cataloging Correctly for Kids: An Introduction to the Tools, 4th Ed., chapter Sears list of subject Headings, pages 75–79. American Library Association, 2006.

[106] G. De Francisci Morales, A. Gionis, and C. Lucchese. From Chatter to Headlines: Harnessing the Real-Time Web for Personalized News Recommendation. In ACM International Conference on Web Search and Data Mining (WSDM), pages 153–162, 2012.

[107] R. Myers. Classical and Modern Regression with Applications. Duxbury Press Belmont, 1990.

[108] A. Naak, H. Hage, and E. Ameur. A Multi-criteria Collaborative Filtering Approach for Research Paper Recommendation in Papyres. In E-Technologies: Innovation in an Open World, pages 25–39. Springer Berlin Heidelberg, 2009.

[109] C. Nascimento, A. Laender, A. da Silva, and M. Goncalves. A Source Independent Framework for Research Paper Recommendation. In ACM/IEEE Joint Conference on Digital Libraries (JCDL), pages 297–306, 2011.

[110] National Assessment of Educational Progress. The Nation’s Report Card. http://nationsreportcard.gov/ reading math 2013//student-groups. 2013.

[111] A. Neumann. A survey of recommender systems at major sti providers. In Recommender Systems for Information Providers, Contributions to Management Science, pages 1–12. Physica-Verlag HD, 2009. ISBN 978-3-7908-2133-8.

[112] J. Oakhill and K. Cain. The Precursors of Reading Ability in Young Readers: Evidence from a Four-Year Longitudinal Study. School Science Review, 16(2):91–121, 2012.

[113] P. Ochi, S. Rao, L. Takayama, and C. Nass. Predictors of user perceptions of web recommender systems: How the basis for generating experience and search product recommendations affects user responses. International Journal of Human-Computer Studies, 68:472–482, 2010.

[114] M. O’Connor, D. Cosley, J. Konstan, and J. Riedl. PolyLens: A Recommender System for Groups of Users. In European Conference on Computer-Supported Cooperative Work (ECSCW), pages 199–218, 2001.

[115] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2):1–135, 2008.

[116] M. Park, H. Park, and S. Cho. Restaurant Recommendation for Group of People in Mobile Environments Using Probabilistic Multi-criteria Decision Making. In Asia-Pacific Conference on Computer-Human Interaction (APCHI), pages 114–122, 2008.

[117] Y. Park and K. Chang. Individual and Group Behavior-based Customer Profile Model for Personalized Product Recommendation. Expert Systems with Applications, 36(2):1932–1939, 2009.

[118] D. Parra and P. Brusilovsky. Collaborative Filtering for Social Tagging Systems: An Experiment with CiteULike. In ACM Conference on Recommender Systems (RecSys), pages 237–240, 2009.

[119] M.S. Pera and Y.-K. Ng. A Naive Bayes Classifier for Web Document Summaries Created by Using Word Similarity and Significant Factors. International Journal on Artificial Intelligence Tools (IJAIT), 19(4):465–486, 2010.

[120] M.S. Pera and Y.-K. Ng. With a Little Help From My Friends: Generating Personalized Book Recommendations Using Data Extracted from a Social Website. In IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), pages 96–99, 2011.

[121] M.S. Pera and Y.-K. Ng. A Personalized Recommendation System on Scholarly Publications. In ACM Conference on Information and Knowledge Management (CIKM), pages 2133–2136, 2011.

[122] M.S. Pera and Y.-K. Ng. What to Read Next?: Making Personalized Book Recommendations for K-12 Users. In ACM Conference on Recommender Systems (RecSys), pages 113–120, 2013.

[123] M.S. Pera and Y.-K. Ng. A group recommender for movies based on content similarity and popularity. Journal of Information Processing & Management (IPM), 49(3):673–687, 2013.

[124] M.S. Pera and Y.-K. Ng. Exploiting the Wisdom of Social Connections to Make Personalized Recommendations on Scholarly Articles. To Appear in Journal of Intelligent Information Systems (JIIS), Springer, 2014.

[125] M.S. Pera and Y.-K. Ng. Automating Readers’ Advisory to Make Book Recommendations for K-12 Readers. In Submission, January 2014.

[126] M.S. Pera, W. Lund, and Y.-K. Ng. A Sophisticated Library Search Strategy Using Folksonomies and Similarity Matches. Journal of the American Society for Information Science and Technology (JASIST), 60(7):1392–1406, 2009.

[127] G. Pirro and J. Euzenat. A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness. In International Semantic Web Conference (ISWC), pages 615–630, 2010.

[128] A. Pudhiyaveetil, S. Gauch, H. Luong, and J. Eno. Conceptual Recommender System for CiteSeerX. In ACM Conference on Recommender Systems (RecSys), pages 241–244, 2009.

[129] R. Qumsiyeh and Y.-K. Ng. ReadAid: A Robust and Fully-Automated Readability Assessment Tool. In IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pages 539–546, 2011.

[130] R. Qumsiyeh, M.S. Pera, and Y.-K. Ng. Generating Exact- and Ranked Partially-Matched Answers to Questions in Advertisements. In International Conference on Very Large Databases (VLDB), pages 217–228, 2012.

[131] J. Recio-Garcia, G. Jimenez-Diaz, A. Sanchez-Ruiz, and B. Diaz-Agudo. Personality Aware Recommendations to Groups. In ACM Conference on Recommender Systems (RecSys), pages 325–328, 2009.

[132] Renaissance Learning. Matching Books to Students: How to Use Readability Formulas and Continuous Monitoring to Ensure Reading Success. http://doc.renlearn.com/KMNet/R003544312GE0BA6.pdf, 2011.

[133] F. Ricci, L. Rokach, B. Shapira, and P. Kantor. Recommender Systems Handbook. Springer, 2011.

[134] L. Robare. Basic Subject Cataloging Using LCSH. Washington, DC: Library of Congress, Cataloging Distribution Service. Available at http://loc.gov/catworkshop/courses/basicsubject/pdf/LCSH Instructor 2011.pdf, 2007.

[135] R. Robinson, M. McKenna, and K. Conradi. Issues & Trends in Literacy Education. Pearson, 2011.

[136] S. Basu Roy, S. Amer-Yahia, A. Chawla, G. Das, and C. Yu. Space Efficiency in Group Recommendation. VLDB Journal, 19(6):877–900, 2010.

[137] R. Salmon, C. Ribeiro, and S. Amarala. Fact based search engine: News fact finder utilizing naive Bayes classification. Quality Issues in the Management of Web Information, 50:121–143, 2013.

[138] J. Saricks. Readers’ Advisory Service in the Public Library, 3rd Ed. ALA Store, 2005.

[139] School Renaissance Inst. Inc. The ATOS Readability Formula for Books and How it Compares to Other Formulas. Technical Report ED449468, ERIC Document Reproduction Service, 2000.

[140] S. Schwarm and M. Ostendorf. Reading Level Assessment Using Support Vector Machines and Statistical Language Models. In Annual Meeting of the Association for Computational Linguistics (ACL), pages 523–530, 2005.

[141] R. Shaban. A guide to writing book reviews. jephc.com/full article.cfm?content id=388, 2006. Journal of Emergency Primary Health Care.

[142] Y. Shavitt, E. Weinsberg, and U. Weinsberg. Building Recommendation Systems Using Peer-to-peer Shared Content. In ACM Conference on Information and Knowledge Management (CIKM), pages 1457–1460, 2010.

[143] A. Shepitsen, J. Gemmel, B. Mobasher, and R. Burke. Personalized Recommendations in Social Tagging Systems Using Hierarchical Clustering. In ACM Conference on Recommender Systems (RecSys), pages 259–266, 2008.

[144] A. Sieg, B. Mobasher, and R. Burke. Improving the Effectiveness of Collaborative Recommendation with Ontology-based User Profiles. In International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec), pages 39–46, 2010.

[145] D. Smith, A. Stenner, I. Horabin, and M. Smith. The Lexile Scale in Theory and Practice: Final Report. Technical Report ED307577, ERIC Document Reproduction Service, 1989.

[146] F. Smith. LibraryThing. Reference Reviews, 21(8):5–6, 2007.

[147] G. Spache. A New Readability Formula for Primary-Grade Reading Materials. Elementary School, 53(7):410–413, 1953.

[148] K. Sugiyama and M. Kan. Scholarly Paper Recommendation via User’s Recent Research Interests. In ACM/IEEE Joint Conference on Digital Libraries (JCDL), pages 29–38, 2010.

[149] M. Sun, F. Li, J. Lee, K. Zhou, G. Lebanon, and H. Zha. Learning Multiple-question Decision Trees for Cold-start Recommendation. In ACM International Conference on Web Search and Data Mining (WSDM), pages 445–454, 2013.

[150] H. Tan, Y. Zhao, and H. Zhang. Conceptual Data Model-based Software Size Estimation for Information Systems. ACM Transactions on Software Engineering and Methodology (TOSEM), 19(2):Article 4, 2009.

[151] K. Tanaka-Ishii, S. Tezuka, and H. Terada. Sorting Texts by Readability. Computational Linguistics, 36(2):203–227, 2010.

[152] S. Vanneman. Keep them reading. School Library Monthly, 27(3):21–22, 2010.

[153] W. DuBay. The Principles of Readability. www.nald.ca/library/research/readab/readab.pdf, 2004.

[154] J. Wang, Q. Li, and Y. Chen. User Comments for News Recommendation in Social Media. In International ACM Conference on Research and Development in Information Retrieval (SIGIR), pages 881–882, 2010.

[155] J. Wooldridge. Introductory Econometrics: A Modern Approach. South-Western Pub, 2009.

[156] C. Yang, B. Wei, J. Wu, Y. Zhang, and L. Zhang. CARES: A Ranking-oriented CADAL Recommender system. In ACM/IEEE Joint Conference on Digital Libraries (JCDL), pages 203–212, 2009.

[157] K. Yi and L. Chan. Revisiting the syntactical and structural analysis of library of congress subject headings for the digital environment. Journal of the American Society for Informa- tion Science and Technology (JASIST), 61(4):1532–2890, 2010.

[158] C. Yu, L. Lakshmanan, and S. Amer-Yahia. It Takes Variety to Make a World: Diversifica- tion in Recommender Systems. In EDBT, pages 368–378, 2009.

[159] D. Zabel. Reference Reborn: Breathing New Life into Public Services Librarianship. Libraries Unlimited, 2010.

[160] H. Zheng, D. Wang, Q. Zhang, H. Li, and T. Yang. Do Clicks Measure Recommendation Relevancy?: An Empirical User Study. In ACM Conference on Recommender Systems (RecSys), pages 249–252, 2010.

[161] J. Zobel. Writing for Computer Science. Springer, 2004.
