Context Driven Retrieval Algorithms for Semi-Structured Personal Lifelogs

by Liadh Kelly, B.Sc. (Hons), M.Sc. (Research)

A dissertation submitted in fulfilment of the requirements for the award of Doctor of Philosophy (Ph.D.) to the Dublin City University School of Computing and Centre for Digital Video Processing

Supervisor: Dr. Gareth J. F. Jones

September 2011

I hereby certify that this material, which I now submit for assessment on the programme of study leading to the award of Doctor of Philosophy is entirely my own work, that I have exercised reasonable care to ensure that the work is original, and does not to the best of my knowledge breach any law of copyright, and has not been taken from the work of others save and to the extent that such work has been cited and acknowledged within the text of my work.

Signed: ____________ (Candidate) ID No.: ___________ Date: _______

CONTENTS

Abstract . . . x
Acknowledgements . . . xii

I Introduction . . . 1

1 Introduction . . . 2
  1.1 Motivation . . . 4
    1.1.1 Memory and Context . . . 5
    1.1.2 Implicit Indicators of Item Importance . . . 6
  1.2 Hypothesis . . . 7
  1.3 Thesis Aims and Contributions . . . 8
  1.4 Thesis Outline . . . 10

II Background, Setup and Analysis . . . 12

2 Towards Context Data for Personal Lifelog Retrieval . . . 13
  2.1 Introduction . . . 14
  2.2 Memory and Context . . . 15
    2.2.1 Personal Information Systems . . . 16
    2.2.2 Recalled Context in Retrieval . . . 19
    2.2.3 Beyond the Desktop . . . 21
  2.3 Query Independent Context Data . . . 23
    2.3.1 Biometric Response . . . 25
    2.3.2 Biometric Response and the Digital Environment . . . 26
  2.4 Evaluation in the Lifelogging Domain . . . 28
  2.5 Conclusions . . . 32

3 Lifelog Test Sets . . . 33
  3.1 Introduction . . . 34
  3.2 Test Set Structure . . . 37
  3.3 Test Set Creation . . . 42
    3.3.1 Computer Activity Monitoring . . . 43
    3.3.2 Mobile Phone Activity Monitoring . . . 51
    3.3.3 Microsoft SenseCam Images . . . 52
    3.3.4 Context Data Generation . . . 54
  3.4 Test Set Contents Analysis . . . 62
    3.4.1 20 Month Test Set . . . 63
    3.4.2 Biometric One Month Test Set . . . 66
  3.5 Conclusions . . . 69

4 Towards Information Retrieval in the Lifelog Domain . . . 70
  4.1 Introduction . . . 71
  4.2 Query Driven Retrieval Approaches . . . 73
    4.2.1 Content Only Based Retrieval . . . 73
    4.2.2 Multi-field Based Retrieval . . . 78
    4.2.3 Integrating Query Independent Evidence . . . 83
  4.3 Created Test Sets and Test Cases for Ranked Retrieval Technique Development . . . 84
    4.3.1 Textual Test Set Indexing . . . 85
    4.3.2 Textual Test Case Creation . . . 87
  4.4 Conclusions . . . 95

III Experimentation . . . 96

5 Queried Content-and-Context-Based Retrieval Algorithms for the Lifelogging Domain . . . 97
  5.1 Introduction . . . 98
  5.2 Experiment Procedure . . . 99
  5.3 Investigation 1: Structured BM25 Content+Context-Based Retrieval - The Utility of Recalled Context in Lifelog Retrieval . . . 104
    5.3.1 Results and Analysis . . . 104
    5.3.2 Concluding Remarks . . . 112
  5.4 Investigation 2: Flat BM25 and BM25F Content+Context-Based Retrieval . . . 114
    5.4.1 Results and Analysis . . . 115
    5.4.2 Concluding Remarks . . . 119
  5.5 Investigation 3: Flat BM25F mod Content+Context-Based Retrieval - Moving Beyond the State of the Art . . . 119
    5.5.1 BM25F mod1 . . . 121
    5.5.2 BM25F mod2 . . . 124
    5.5.3 BM25F mod3 . . . 126
    5.5.4 Concluding Remarks . . . 129
  5.6 Conclusions . . . 129

6 Extracting Important Items using Past Biometric Response . . . 132
  6.1 Introduction . . . 133
  6.2 Experiment Setup . . . 134
    6.2.1 Extracting Important Events . . . 134
    6.2.2 Experiment Procedure . . . 142
  6.3 Experiment Results and Analysis . . . 145
    6.3.1 Detecting Important SenseCam Events . . . 146
    6.3.2 Detecting Important Computer Items . . . 154
    6.3.3 Concluding Remarks . . . 160
  6.4 Discussion . . . 160
  6.5 Conclusions . . . 162

7 Static Scores: Boosting Relevant Items in Result Lists using Past Biometric Response . . . 177
  7.1 Introduction . . . 178
  7.2 Experiment Procedure . . . 179
    7.2.1 Static Relevance Scores . . . 180
  7.3 Results and Analysis . . . 184
    7.3.1 Overall Static Score Performance . . . 184
    7.3.2 Performance Across Individual Subjects . . . 186
    7.3.3 Performance of Biometric Tagged Result Set . . . 190
    7.3.4 Further Analysis . . . 194
  7.4 Discussion . . . 196
  7.5 Conclusions . . . 198

IV Conclusions . . . 199

8 Conclusions and Future Work . . . 200
  8.1 Conclusions . . . 200
    8.1.1 Summary of Presented Work . . . 201
    8.1.2 Achievement of Thesis Objectives . . . 206
  8.2 Future Work . . . 207
    8.2.1 Evaluation Techniques . . . 207
    8.2.2 Context Data in Retrieval . . . 209
    8.2.3 Biometrics in Retrieval . . . 211
  8.3 Summary . . . 212

V Appendix . . . 213

A List of Publications . . . 214
  A.1 Towards Context Data in Lifelog Retrieval . . . 214
  A.2 Context Data in Lifelog Retrieval . . . 215
  A.3 Biometrics in Lifelog Item Importance Detection . . . 216
  A.4 Supporting Evaluation in the Lifelogging Domain . . . 218
  A.5 SIGIR 2010 Desktop Search Workshop . . . 219
  A.6 ECIR 2011 Evaluating Personal Search Workshop . . . 219

Bibliography . . . 220

LIST OF TABLES

2.1 Complete set of content and context. . . . 20
3.1 Complete set of content and context. . . . 36
3.2 Items table fields and sample data. This table holds all unique items accessed by a subject. . . . 39
3.3 Item access table fields and sample data. This table holds all accesses to items by a subject. . . . 39
3.4 Campaignr table fields and sample data. This table holds all geo-locations and people present experienced by a subject. . . . 40
3.5 Weather table fields and sample data. A weather table was created for each geo-location visited by a subject. Weather tables hold the weather conditions experienced by a subject. . . . 40
3.6 Light status table fields and sample data. A light status table was created for each geo-location visited by a subject. Light status tables hold the ambient light status experienced by a subject. . . . 41
3.7 Biometrics table fields and sample data. This table holds all raw biometric data captured for a subject. . . . 41
3.8 Source of information (for computer items) for title, application, contents, to, from, file path and URL fields in items table and beginDate, beginTime, endDate and endTime fields in item access table. . . . 43
3.9 Number of distinct items in each subject’s collection. Code file = java, c, h, etc; Excel files = CSV, XLS and XLSX files; Presentation = Keynotes and PowerPoint files; Text file = txt, dat, tex, etc, files; and Other = class files, bib’s, logs, etc. . . . 63
3.10 Number of PC item accesses in each subject’s collection. Code file = java, c, h, etc; Excel files = CSV, XLS and XLSX files; Presentation = Keynotes and PowerPoint files; Text file = txt, dat, tex, etc, files; and Other = class files, bib’s, logs, etc. . . . 63
3.11 Number of laptop item accesses in each subject’s collection. Code file = java, c, h, etc; Excel files = CSV, XLS and XLSX files; Presentation = Keynotes and PowerPoint files; Text file = txt, dat, tex, etc, files; and Other = class files, bib’s, logs, etc. . . . 64
3.12 Total number of item accesses with geo-location tags in each subject’s collection. The percentages in brackets provide the percentage of the total number of item accesses on each device that these figures correspond to. The number of item accesses on the mobile phone with geo-location tags corresponds to the number of SMS messages with geo-location tags. . . . 64
3.13 Number of distinct items in each subject’s ‘biometric month’ collection. Code file = java, c, h, etc; Excel files = CSV, XLS and XLSX files; Presentation = Keynotes and PowerPoint files; Text file = txt, dat, tex, etc, files; and Other = class files, bib’s, logs, etc. . . . 67
3.14 Number of PC item accesses in each subject’s ‘biometric month’ collection. Code file = java, c, h, etc; Excel files = CSV, XLS and XLSX files; Presentation = Keynotes and PowerPoint files; Text file = txt, dat, tex, etc, files; and Other = class files, bib’s, logs, etc. . . . 67
3.15 Number of laptop item accesses in each subject’s ‘biometric month’ collection. Code file = java, c, h, etc; Excel files = CSV, XLS and XLSX files; Presentation = Keynotes and PowerPoint files; Text file = txt, dat, tex, etc, files; and Other.