Context Driven Retrieval Algorithms for Semi-Structured Personal Lifelogs

Context Driven Retrieval Algorithms for Semi-Structured Personal Lifelogs by Liadh Kelly, B.Sc. (Hons), M.Sc. (Research) A dissertation submitted in fulfilment of the requirements for the award of Doctor of Philosophy (Ph.D.) to the Dublin City University School of Computing and Centre for Digital Video Processing Supervisor: Dr. Gareth J. F. Jones September 2011 I hereby certify that this material, which I now submit for assessment on the programme of study leading to the award of Doctor of Philosophy is entirely my own work, that I have exercised reasonable care to ensure that the work is original, and does not to the best of my knowledge breach any law of copyright, and has not been taken from the work of others save and to the extent that such work has been cited and acknowledged within the text of my work. Signed: ____________ (Candidate) ID No.: ___________ Date: _______ CONTENTS Abstract x Acknowledgements xii I Introduction 1 1 Introduction 2 1.1 Motivation . 4 1.1.1 Memory and Context . 5 1.1.2 Implicit Indicators of Item Importance . 6 1.2 Hypothesis . 7 1.3 Thesis Aims and Contributions . 8 1.4 Thesis Outline . 10 II Background, Setup and Analysis 12 2 Towards Context Data for Personal Lifelog Retrieval 13 2.1 Introduction . 14 2.2 Memory and Context . 15 2.2.1 Personal Information Systems . 16 2.2.2 Recalled Context in Retrieval . 19 2.2.3 Beyond the Desktop . 21 2.3 Query Independent Context Data . 23 2.3.1 Biometric Response . 25 2.3.2 Biometric Response and the Digital Environment . 26 2.4 Evaluation in the Lifelogging Domain . 28 2.5 Conclusions . 32 3 Lifelog Test Sets 33 3.1 Introduction . 34 3.2 Test Set Structure . 37 3.3 Test Set Creation . 42 3.3.1 Computer Activity Monitoring . 43 3.3.2 Mobile Phone Activity Monitoring . 51 3.3.3 Microsoft SenseCam Images . 52 3.3.4 Context Data Generation . 54 i 3.4 Test Set Contents Analysis . 62 3.4.1 20 Month Test Set . 63 3.4.2 Biometric One Month Test Set . 66 3.5 Conclusions . 69 4 Towards Information Retrieval in the Lifelog Domain 70 4.1 Introduction . 71 4.2 Query Driven Retrieval Approaches . 73 4.2.1 Content Only Based Retrieval . 73 4.2.2 Multi-field Based Retrieval . 78 4.2.3 Integrating Query Independent Evidence . 83 4.3 Created Test Sets and Test Cases for Ranked Retrieval Technique De- velopment . 84 4.3.1 Textual Test Set Indexing . 85 4.3.2 Textual Test Case Creation . 87 4.4 Conclusions . 95 III Experimentation 96 5 Queried Content-and-Context-Based Retrieval Algorithms for the Lifelog- ging Domain 97 5.1 Introduction . 98 5.2 Experiment Procedure . 99 5.3 Investigation 1: Structured BM25 Content+Context-Based Retrieval - The Utility of Recalled Context in Lifelog Retrieval . 104 5.3.1 Results and Analysis . 104 5.3.2 Concluding Remarks . 112 5.4 Investigation 2: Flat BM25 and BM25F Content+Context-Based Retrieval114 5.4.1 Results and Analysis . 115 5.4.2 Concluding Remarks . 119 5.5 Investigation 3: Flat BM25F mod Content+Context-Based Retrieval - Moving Beyond the State of the Art . 119 5.5.1 BM25F mod1 . 121 5.5.2 BM25F mod2 . 124 5.5.3 BM25F mod3 . 126 5.5.4 Concluding Remarks . 129 5.6 Conclusions . 129 6 Extracting Important Items using Past Biometric Response 132 6.1 Introduction . 133 6.2 Experiment Setup . 134 6.2.1 Extracting Important Events . 134 6.2.2 Experiment Procedure . 142 6.3 Experiment Results and Analysis . 145 6.3.1 Detecting Important SenseCam Events . 146 6.3.2 Detecting Important Computer Items . 154 6.3.3 Concluding Remarks . 160 6.4 Discussion . 160 6.5 Conclusions . 162 ii 7 Static Scores: Boosting Relevant Items in Result Lists using Past Biometric Response 177 7.1 Introduction . 178 7.2 Experiment Procedure . 179 7.2.1 Static Relevance Scores . 180 7.3 Results and Analysis . 184 7.3.1 Overall Static Score Performance . 184 7.3.2 Performance Across Individual Subjects . 186 7.3.3 Performance of Biometric Tagged Result Set . 190 7.3.4 Further Analysis . 194 7.4 Discussion . 196 7.5 Conclusions . 198 IV Conclusions 199 8 Conclusions and Future Work 200 8.1 Conclusions . 200 8.1.1 Summary of Presented Work . 201 8.1.2 Achievement of Thesis Objectives . 206 8.2 Future Work . 207 8.2.1 Evaluation Techniques . 207 8.2.2 Context Data in Retrieval . 209 8.2.3 Biometrics in Retrieval . 211 8.3 Summary . 212 V Appendix 213 A List of Publications 214 A.1 Towards Context Data in Lifelog Retrieval . 214 A.2 Context Data in Lifelog Retrieval . 215 A.3 Biometrics in Lifelog Item Importance Detection . 216 A.4 Supporting Evaluation in the Lifelogging Domain . 218 A.5 SIGIR 2010 Desktop Search Workshop . 219 A.6 ECIR 2011 Evaluating Personal Search Workshop . 219 Bibliography 220 iii LIST OF TABLES 2.1 Complete set of content and context. 20 3.1 Complete set of content and context. 36 3.2 Items table fields and sample data. This table holds all unique items accessed by a subject. 39 3.3 Item access table fields and sample data. This table holds all accesses to items by a subject. 39 3.4 Campaignr table fields and sample data. This table holds all geo- locations and people present experienced by a subject. 40 3.5 Weather table fields and sample data. A weather table was created for each geo-location visited by a subject. Weather tables hold the weather conditions experienced by a subject. 40 3.6 Light status table fields and sample data. A light status table was created for each geo-location visited a subject. Light status tables hold the ambient light status experienced by a subject. 41 3.7 Biometrics table fields and sample data. This table holds all raw biometric data captured for a subject. 41 3.8 Source of information (for computer items) for title, application, contents, to, from, file path and URL fields in items table and beginDate, beginTime, endDate and endTime fields in item access table. 43 3.9 Number of distinct items in each subject’s collection. Code file = java, c, h, etc; Excel files = CSV, XLS and XLSX files; Presentation = Keynotes and PowerPoint files; Text file = txt, dat, tex, etc, files; and Other = class files, bib’s, logs, etc. 63 3.10 Number of PC item accesses in each subject’s collection. Code file = java, c, h, etc; Excel files = CSV, XLS and XLSX files; Presentation = Keynotes and PowerPoint files; Text file = txt, dat, tex, etc, files; and Other = class files, bib’s, logs, etc. 63 3.11 Number of laptop item accesses in each subject’s collection. Code file = java, c, h, etc; Excel files = CSV, XLS and XLSX files; Presentation = Keynotes and PowerPoint files; Text file = txt, dat, tex, etc, files; and Other = class files, bib’s, logs, etc. 64 iv 3.12 Total number of item accesses with geo-location tags in each subject’s collection. The percentages in brackets provide the percentage of the total number of item accesses on each device that these figures correspond to. The number of item accesses on the mobile phone with geo-location tags corresponds to the number of SMS messages with geo-location tags. 64 3.13 Number of distinct items in each subject’s ‘biometric month’ collection. Code file = java, c, h, etc; Excel files = CSV, XLS and XLSX files; Presentation = Keynotes and PowerPoint files; Text file = txt, dat, tex, etc, files; and Other = class files, bib’s, logs, etc. 67 3.14 Number of PC item accesses in each subject’s ‘biometric month’ collection. Code file = java, c, h, etc; Excel files = CSV, XLS and XLSX files; Presentation = Keynotes and PowerPoint files; Text file = txt, dat, tex, etc, files; and Other = class files, bib’s, logs, etc. 67 3.15 Number of laptop item accesses in each subject’s ‘biometric month’ collection. Code file = java, c, h, etc; Excel files = CSV, XLS and XLSX files; Presentation = Keynotes and PowerPoint files; Text file = txt, dat, tex, etc, files; and Other.

Context Driven Retrieval Algorithms for Semi-Structured Personal Lifelogs

Connected Policing Framework White Paper Transforming Policing Through Technology

The Fourth Paradigm in Practice

COMMUNICATIONS Cacm.Acm.Org of the ACM 05/2010 Vol.53 No.05

PROGRAM EARTH Electronic Mediations Series Editors: N

Futures Microsoft’S European Innovation Magazine Issue N°4 I June 2009

MICROSOFT RESEARCH Научно-Исследовательское Подразделение Microsoft Работа Для Будущего Информационных Технологий 4

Photo Sharing for Intergenerational Connections

Copyrighted Material