MANAGING TAIL LATENCY IN LARGE SCALE INFORMATION RETRIEVAL SYSTEMS

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy.

Joel M. Mackenzie Bachelor of Computer Science, Honours – RMIT University

Supervised by
Associate Professor J. Shane Culpepper
Associate Professor Falk Scholer

School of Science
College of Science, Engineering, and Health
RMIT University
Melbourne, Australia

October, 2019

DECLARATION

I certify that except where due acknowledgement has been made, the work is that of the author alone; the work has not been submitted previously, in whole or in part, to qualify for any other academic award; the content of the thesis is the result of work which has been carried out since the official commencement date of the approved research program; any editorial work, paid or unpaid, carried out by a third party is acknowledged; and, ethics procedures and guidelines have been followed. I acknowledge the support I have received for my research through the provision of an Australian Government Research Training Program Scholarship.

Joel M. Mackenzie
School of Science
College of Science, Engineering, and Health
RMIT University, Melbourne, Australia
October 15, 2019


ABSTRACT

As both the availability of internet access and the prominence of smart devices continue to increase, data is being generated at a rate faster than ever before. This massive increase in data production comes with many challenges, including efficiency concerns for the storage and retrieval of such large-scale data. However, users have grown to expect the sub-second response times that are common in most modern search engines, creating a problem — how can such large amounts of data continue to be served efficiently enough to satisfy end users?

This dissertation investigates several issues regarding tail latency in large-scale information retrieval systems. Tail latency corresponds to the high percentile latency that is observed from a system — in the case of search, this latency typically corresponds to how long it takes for a query to be processed. In particular, keeping tail latency as low as possible translates to a good experience for all users, as tail latency is directly related to the worst-case latency and hence, the worst possible user experience. The key idea in targeting tail latency is to move from questions such as “what is the median latency of our search engine?” to questions which more accurately capture user experience such as “how many queries take more than 200ms to return answers?” or “what is the worst case latency that a user may be subject to, and how often might it occur?”

While various strategies exist for efficiently processing queries over large textual corpora, prior research has focused almost entirely on improvements to the average processing time or cost of search systems. As a first contribution, we examine some state-of-the-art retrieval algorithms for two popular index organizations, and discuss the trade-offs between them, paying special attention to the notion of tail latency. This research uncovers a number of observations that are subsequently leveraged for improved search efficiency and effectiveness.

We then propose and solve a new problem, which involves processing a number of related queries together, known as multi-queries, to yield higher quality search results. We experiment with a number of algorithmic approaches to efficiently process these multi-queries, and report on the cost, efficiency, and effectiveness trade-offs present with each. Ultimately, we find that some solutions yield a low tail latency, and are hence suitable for use in real-time search environments.

Finally, we examine how predictive models can be used to improve the tail latency and end-to-end cost of a commonly used multi-stage retrieval architecture without impacting result effectiveness. By combining ideas from numerous areas of information retrieval, we propose a prediction framework which can be used for training and evaluating several efficiency/effectiveness trade-off parameters, resulting in improved trade-offs between cost, result quality, and tail latency.

Not everything that can be counted counts, and not everything that counts can be counted.

W. B. Cameron.

ACKNOWLEDGEMENTS

During my candidature, I have had the pleasure of working with and among a number of incredible individuals who have helped shape my academic and personal growth.

Firstly, I would like to thank my advisors, Shane and Falk. I first met Shane when I was an undergraduate student — he was lecturing both the Operating Systems Principles and the Algorithms and Analysis courses. This was when I became fascinated with the elegance of clever algorithms, system design, and efficiency. Falk introduced me to the fundamentals of search during the Information Retrieval course, while I was undertaking my honours project on efficient location-aware web search. Since then, Shane and Falk have provided me with numerous opportunities that have developed my skills as a computer scientist, an academic, and as a person. I will be forever grateful to them.

Luke Gallagher has been an immense support over the last few years, assisting me in both professional and personal capacities — he even proofread this thesis! I am extremely grateful to him. I am also thankful to Rodger Benham, Sheng Wang, Matt Crane, and Matthias Petri, who always had time for useful discussions, ideas, and repartee, whether over coffee, beer, or, occasionally, email. There are many other colleagues, friends, and mentors from RMIT, the University of Melbourne, and other institutions around the world who have supported and encouraged me to become a better academic. I am indebted to you all.

Milad Shokouhi and Ahmed Hassan Awadallah hosted me for a three-month internship during 2018. It was a pleasure working on interesting problems outside of the efficiency space, and I am extremely grateful to them for taking me on. Craig Macdonald (University of Glasgow) and Simon Puglisi (University of Helsinki) allowed me to visit and present my research to their groups. Each was a wonderful experience, and I look forward to visiting them again in the future.

To all of my friends — thank you for your support during my candidature. The many coffees, beers, barbecues, bike adventures, disc golf outings, camping trips, weightlifting sessions, and hikes helped reinvigorate me and kept me motivated.

Finally, I dedicate this thesis to my parents, Sandra and Jock, my sister, Chloe, and my partner, Mikaela. Without your endless love, support, patience, and encouragement, none of this would have been possible. I will look back on the years of my Ph.D. as some of the best years of my life.


LIST OF PUBLICATIONS

PUBLICATIONS INCLUDED IN THIS DISSERTATION

R. Benham, J. Mackenzie, A. Moffat, and J. S. Culpepper: Boosting Search Performance Using Query Variations. In ACM Transactions on Information Systems (TOIS), October, 2019.

J. Mackenzie, J. S. Culpepper, R. Blanco, M. Crane, C. L. A. Clarke, and J. Lin: Query Driven Algorithm Selection in Early Stage Retrieval. In Proceedings of the 11th International ACM Conference on Web Search and Data Mining (WSDM 2018), February, 2018.

J. Mackenzie, F. Scholer, and J. S. Culpepper: Early Termination Heuristics for Score-at-a-Time Index Traversal. In Proceedings of the 22nd Annual Australasian Document Computing Symposium (ADCS 2017), December, 2017.

M. Crane, J. S. Culpepper, J. Lin, J. Mackenzie, and A. Trotman: A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation. In Proceedings of the 10th International ACM Conference on Web Search and Data Mining (WSDM 2017), February, 2017.

ADDITIONAL PUBLICATIONS

A. Mallia, M. Siedlaczek, J. Mackenzie, and T. Suel: PISA: Performant Indexes and Search for Academia. In Proceedings of the Open-Source IR Replicability Challenge, co-located with SIGIR (OSIRRC 2019), July, 2019.

M. Petri, A. Moffat, J. Mackenzie, J. S. Culpepper, and D. Beck: Accelerated Query Processing Via Similarity Score Prediction. In Proceedings of the 42nd International ACM Conference on Research and Development in Information Retrieval (SIGIR 2019), July, 2019.

J. Mackenzie, K. Gupta, F. Qiao, A. H. Awadallah, and M. Shokouhi: Exploring User Behavior in Email Re-Finding Tasks. In Proceedings of the 2019 World Wide Web Conference (WWW 2019), May, 2019.


J. Mackenzie, A. Mallia, M. Petri, J. S. Culpepper, and T. Suel: Compressing Inverted Indexes with Recursive Graph Bisection: A Reproducibility Study. In Proceedings of the 41st European Conference on Information Retrieval (ECIR 2019), April, 2019.

L. Gallagher, J. Mackenzie, and J. S. Culpepper: Revisiting Spam Filtering in Web Search. In Proceedings of the 23rd Annual Australasian Document Computing Symposium (ADCS 2018), December, 2018.

R. Benham, J. S. Culpepper, L. Gallagher, X. Lu, and J. Mackenzie: Towards Efficient and Effective Query Variant Generation. In Proceedings of the First Biennial Conference on Design of Experimental Search and Information Retrieval Systems (DESIRES 2018), August, 2018.

J. Mackenzie, C. Macdonald, F. Scholer, and J. S. Culpepper: On the Cost of Negation for Dynamic Pruning. In Proceedings of the 40th European Conference on Information Retrieval (ECIR 2018), March, 2018.

J. Mackenzie: Managing Tail Latencies in Large Scale IR Systems (Doctoral Consortium Extended Abstract). In Proceedings of the 40th International ACM Conference on Research and Development in Information Retrieval (SIGIR 2017), August, 2017.

LIST OF FIGURES

1.1 An example of how tail latency is measured

2.1 Growth over the first decade of the web
2.2 An example of a search engine results page
2.3 Processing architecture of a large-scale index
2.4 Multi-stage retrieval matching and ranking

3.1 Basic architecture of a web search engine
3.2 Organization of a document-ordered inverted index
3.3 Organization of a frequency-ordered inverted index
3.4 State-of-the-art compression, adapted from [198]
3.5 Example of Score-at-a-Time traversal
3.6 Example of partial scoring in Document-at-a-Time MaxScore
3.7 Example of full scoring in Document-at-a-Time MaxScore
3.8 Example of WAND pivot selection and skipping

4.1 Efficiency for rank safe algorithms across all collections for k = 1,000
4.2 Efficiency for score-safe algorithms when varying k and |q|
4.3 Efficiency/effectiveness trade-off for aggressive processing
4.4 Performance of top-k retrieval on various collections and values of k
4.5 Efficiency for aggressive algorithms when varying k and |q|
4.6 Percentage of total postings processed for fixed values of ρ
4.7 A pictorial example of JASS processing with a fixed value of ρ
4.8 A pictorial example of JASS processing with percentage-based ρ
4.9 A pictorial example of efficient accumulator initialization
4.10 A pictorial example of parallel JASS processing
4.11 Density of wins, ties, and losses for heuristic values of ρ
4.12 Baseline efficiency of SAAT heuristics
4.13 Efficiency comparison for proposed parallel SAAT approaches

5.1 Different types of rank fusion
5.2 Distribution of query terms across query variations


5.3 An example of the Parallel-Fusion technique
5.4 An example of the Exhaustive Single-Pass technique
5.5 An example of the Single-Pass CombSUM technique
5.6 The effect of the input query order on efficiency
5.7 CPU cycles, postings scored, and time, for various rank fusion approaches
5.8 Efficiency/effectiveness trade-off for various multi-query processing algorithms
6.1 A toy example of the cost and effectiveness of three different systems
6.2 An example of decision tree traversal
6.3 Building training examples with reference lists
6.4 Timing standard systems across fixed values of k
6.5 The hybrid architecture for reducing tail latency
6.6 Distribution of predicted and actual values of k and ρ
6.7 Mean and median prediction performance for predicting k
6.8 Mean and median prediction performance for predicting ρ
6.9 Response time for various bands of MED-RBP

LIST OF TABLES

2.1 Summary of the topics and collections used

3.1 Summary of some state-of-the-art compression mechanisms
3.2 Summary of postings list layouts
3.3 Cost of computing bag-of-words document scores

4.1 Baseline efficiency and effectiveness on all collections
4.2 Postings processed across ClueWeb12B and the UQV100 topics
4.3 Summary of the filtered UQV100 query log
4.4 Average speedup when increasing parallelism for parallel SAAT processing
4.5 Time and effectiveness trade-offs for parallel SAAT approaches

5.1 Example of query variability
5.2 Summary of processed collection and corpus
5.3 Example of CombSUM ranking, with and without normalization
5.4 Fusion effectiveness with and without score normalization
5.5 Latency for multi-query processing algorithms
5.6 Fusion effectiveness with respect to |Q|
6.1 Performance of the gold standard ranker
6.2 Features used for prediction tasks
6.3 Overlap in 95th percentile tail latency between BMW and JASS
6.4 Efficiency characteristics of the hybrid query processing systems
6.5 Effectiveness across validation query set


LIST OF SYMBOLS

Notation     Description
D            A document corpus.
N            The number of documents in an index or corpus.
d            A document identifier.
l_d          The length of document d.
t            A query term.
f_t          The number of documents in the index containing term t.
f_{d,t}      The number of times term t appears in document d.
W(d, t)      The score contribution of term t to document d.
i_{d,t}      The quantized contribution of term t to document d.
S_{d,q}      The score of document d with respect to query q.
θ            The k-th highest observed score thus far.
θ̂            The document score threshold.
q            A bag-of-words query of length n containing terms {t_1, t_2, ..., t_n}.
Q            A multi-query; a set of n unique queries, {q_1, q_2, ..., q_n}.
r            A ranked results list.
R            A set of n ranked lists, {r_1, r_2, ..., r_n}.
k            The depth of a ranked list.
F            Dynamic pruning aggression parameter.
ρ            Early termination parameter for SAAT retrieval.
L            A postings list.
P            A set of postings lists.
P            A query processing thread.
M            An evaluation metric.
A            An accumulator table.
x⃗            A feature vector.
M_y          A model for predicting variable y.
T            A decision tree.
T            An ensemble of decision trees.
w_x          The weight of a corresponding variable indexed by x.
C            The cost of generating and re-ranking a results list.


CONTENTS

1 INTRODUCTION
1.1 Contributions
1.1.1 Comparing Efficient Index Traversal Strategies
1.1.2 Efficient Online Rank Fusion
1.1.3 Reducing Tail Latency via Algorithm Selection
1.2 Organization

2 BACKGROUND
2.1 Information Retrieval
2.1.1 Information Retrieval Process
2.2 Document Ranking
2.2.1 Boolean Matching
2.2.2 Bag-of-Words Models
2.2.3 Improving Effectiveness
2.3 Evaluating Search Effectiveness
2.3.1 Cranfield Paradigm
2.3.2 Relevance Judgments
2.3.3 Utility Based Metrics
2.3.4 Recall Based Metrics
2.3.5 Best Practices for Evaluation
2.3.6 Collections and Queries
2.4 Search in the Real World
2.4.1 Search Engine Infrastructure
2.4.2 Efficiency, Effectiveness, and Revenue
2.4.3 Multi-Stage Retrieval
2.4.4 Containing Latency
2.5 Summary

3 EFFICIENT SEARCH SYSTEMS
3.1 Text Indexing
3.1.1 Document Crawling and Parsing
3.1.2 Inverted Indexes


3.1.3 Alternative Index Organizations
3.1.4 Compressed Representations
3.1.5 Comparing Index Layouts
3.2 Query Processing
3.2.1 Rank Safe Processing
3.2.2 Term-at-a-Time
3.2.3 Score-at-a-Time
3.2.4 Document-at-a-Time
3.2.5 Comparing Index Traversal Approaches
3.3 Tail Latency
3.3.1 Causes of High Percentile Latency
3.3.2 Reducing Tail Latency
3.4 Summary and Open Directions

4 INDEX ORGANIZATION AND TAIL LATENCY
4.1 Related Work
4.1.1 Candidate Generation
4.2 Comparing Index Traversal Strategies
4.2.1 Systems
4.2.2 Common Framework
4.2.3 Experimental Setup
4.2.4 Experimental Analysis
4.2.5 Discussion
4.3 Improved Heuristics for Score-at-a-Time Index Traversal
4.3.1 Heuristics for Score-at-a-Time Processing
4.3.2 Parallel Extensions to Score-at-a-Time Traversal
4.3.3 Experimental Setup
4.3.4 Experimental Analysis
4.3.5 Discussion
4.4 Discussion and Conclusion

5 EFFICIENTLY PROCESSING QUERY VARIATIONS
5.1 Related Work
5.1.1 Rank Fusion
5.1.2 Query Variations
5.2 Efficient Multi-Query Processing
5.2.1 Parallel Fusion
5.2.2 Single Pass DaaT
5.2.3 Single Pass CombSUM
5.3 Experimental Setup
5.4 Preliminary Experiments

5.4.1 Normalization and Effectiveness
5.4.2 Query Log Caching Effects
5.5 Experiments
5.5.1 Real-Time Fusion Performance
5.5.2 Tail Latency in Real-Time Fusion
5.5.3 Efficiency and Effectiveness Trade-offs
5.6 Discussion and Conclusion

6 REDUCING TAIL LATENCY VIA QUERY DRIVEN ALGORITHM SELECTION
6.1 Related Work
6.1.1 Query Performance Prediction
6.1.2 Query Efficiency Prediction
6.1.3 Optimizing Ranking Cascades
6.1.4 Training Models with Reference Lists
6.2 Training Effective Prediction Models
6.2.1 Generating Labels
6.2.2 Training and Prediction
6.3 Experimental Setup
6.4 Preliminary Experiments
6.5 Hybrid Index Architecture
6.5.1 Index Traversal Trade-offs
6.5.2 Index Replication
6.5.3 Hybrid Pipeline
6.6 Experimental Analysis
6.6.1 Predictive Models
6.6.2 Hybrid Pipeline
6.6.3 Validation
6.7 Discussion and Future Work
6.8 Conclusion

7 CONCLUSION
7.1 Summary
7.2 Future Directions
7.2.1 Impact-ordered Indexing and Score-at-a-Time Traversal
7.2.2 Cascade Ranking
7.2.3 Tail Latency Beyond Web Search
7.2.4 Cost Sensitive Information Retrieval

BIBLIOGRAPHY

1 INTRODUCTION

Finding relevant information from massive collections of data is a fundamental requirement in many modern computing applications. In particular, web search is now a multi-billion dollar industry, and is the primary access point to the world wide web. However, the extreme scale of the web poses a number of problems for search practitioners. Firstly, users expect results to be effective in satisfying their information needs with respect to often short and potentially under-specified queries. Secondly, users expect these results to be served efficiently, with sub-second response times being the norm for most user-facing search systems. Clearly, these goals are in tension — providing high quality search results is a non-trivial task, yet it is expected to be completed almost instantaneously.

To make matters worse, data is being generated faster than ever before. For example, Seagate and the International Data Corporation estimated that over 50% of all digital data was generated within the last two years, with around 175 zettabytes of data predicted to be generated per year by 2025 [209]. As a large-scale example, consider the Google search engine, which at the time of writing has around 90% of the global market share in web search. This search engine is estimated to process upwards of 3.5 billion queries per day across trillions of indexed pages. However, less popular web search engines still handle an enormous amount of traffic and data. DuckDuckGo, a privacy-conscious search engine, has less than 0.5% of the global market share in web search, yet it currently handles over 35 million queries per day.1 Thus, maintaining efficient, effective, and scalable search will continue to challenge search practitioners into the future.

The delicate balance between providing effective results and providing them within a reasonable amount of time has been documented in the past. For example, Dean and Barroso [82] state that: “In large IR systems, speed is more than a performance metric; it is a key quality metric, as returning good results quickly is better than returning the best results slowly.”

Indeed, the impact of latency on user behavior is far more substantial than might be expected. For example, a set of studies from Microsoft (Bing) and Google showed that added latency of as little as 200ms can have negative impacts on users, with significant reductions observed across multiple user satisfaction metrics [224].

1. https://duckduckgo.com/traffic



Figure 1.1: An example of how tail latency is measured for three systems. Each cross represents the 95th percentile latency of the given system. System A, B, and C each have a 95th percentile latency of 60ms, 100ms, and 160ms, respectively.

Worse still, some users never returned to the search system, presumably because they had switched to a competitor. Another view on the importance of efficiency was provided by Kohavi et al. [134], who claimed that every 100ms improvement in efficiency leads to around a 0.6% increase in revenue, which translates to tens of millions of dollars per annum for major search providers. They further illustrate this point as follows:

“An engineer that improves server performance by 10ms (that’s 1/30 of the speed that our eyes blink) more than pays for their fully-loaded annual costs.”

Thus, latency is an incredibly important and often overlooked aspect of providing a high quality search experience and maximizing company revenue. Large-scale search systems are immensely complex, often containing thousands or even millions of servers dedicated to processing portions of the search index [20]. As such, hundreds or thousands of individual servers may be involved in processing each query. Unfortunately, containing the cost of query processing at this scale is difficult, and even a slight delay in any given stage of processing may translate to an unacceptable response latency. Furthermore, it is not sufficient to keep the average latency low — the occasional slow response may be enough to deter users from continuing to use the search engine. Hence, it is important to quantify the tail latency, which corresponds to the high percentile response times of the system.
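To make the notion of a high percentile response time concrete, the following sketch (not taken from the thesis) computes percentile-based tail latency over a set of synthetic per-query response times using the simple nearest-rank method; both the latency values and the `percentile` helper are illustrative assumptions.

```python
import math
import random

# Hypothetical per-query latencies (ms): mostly fast, with a small slow tail.
random.seed(17)
latencies_ms = [random.gauss(45, 10) for _ in range(9_900)] + \
               [random.gauss(160, 30) for _ in range(100)]

def percentile(values, p):
    """Return the p-th percentile of `values` using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

print(f"median latency : {percentile(latencies_ms, 50):6.1f} ms")
print(f"95th percentile: {percentile(latencies_ms, 95):6.1f} ms")
print(f"99th percentile: {percentile(latencies_ms, 99):6.1f} ms")
```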

Tail latency is characterized by the response times exceeding the nth percentile response latency. Typically, n is set to 95 or 99, but other high percentiles can also be considered. Therefore, tail latency gives useful information on the worst-case response times. For example, consider Figure 1.1, which plots the cumulative distribution of the latency observed for three systems. If only the median time was considered (50th percentile), then both systems A and B may seem like attractive options. However, system A exhibits a lower tail latency than system B, making it a more robust choice, as it is less likely to respond slowly. This toy example motivates the case for measuring and minimizing tail latency — it is crucial for ensuring positive user experiences. Unfortunately, keeping the tail latency low is much harder than it may seem. Dean and Barroso [82] provide a good example of why this is the case:

“Consider a system where each server typically responds in 10ms but with a 99th percentile latency of one second. If each user request is handled on just one server, one user request in 100 will be slow (one second). If a user request must collect responses from 100 such servers in parallel, then 63% of user requests will take more than one second.”
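The arithmetic behind this observation is easy to verify. Under the (assumed) model that each of n servers independently exceeds the latency budget with probability p, a fanned-out request is slow whenever at least one server is slow; the sketch below reproduces the figures in the quote.

```python
def prob_slow_request(n_servers, p_slow=0.01):
    """Probability that at least one of n independent servers is slow."""
    return 1.0 - (1.0 - p_slow) ** n_servers

print(prob_slow_request(1))    # 0.01  -> one request in 100 is slow
print(prob_slow_request(100))  # ~0.63 -> 63% of fanned-out requests are slow
```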

Recent innovations in machine learning have allowed a new breed of learning-to-rank models to be deployed, providing enhanced search effectiveness. To accommodate these models, the once single-stage search process, which involved identifying and ranking documents, has been recast as a multi-stage retrieval problem, where a sequence of processing stages aim to identify, rank, and prune the results set [192]. This novel approach has opened up a wide range of questions relating to the best practices for efficient and effective end-to-end retrieval. In particular, little attention has been paid to the causes of tail latency during query processing, leaving many open questions regarding the best practices for implementing tail-tolerant search systems. For example, most prior works that focus on efficient indexing and retrieval report efficiency using only basic summary statistics such as the mean or median latency. As demonstrated above, this is not sufficient for ensuring a good user experience. In addition, a variety of solutions currently exist for efficiently processing queries across inverted indexes, but there are few best practices in place for choosing the underlying index organization and traversal algorithm in the context of modern retrieval architectures. Similarly, the notion of balancing efficiency and effectiveness is currently under-explored. The importance of both efficient and effective retrieval implies that a deeper understanding of these inherent trade-offs is required.

1.1 CONTRIBUTIONS

We investigate several related threads pertaining to efficient and effective multi-stage retrieval, with a focus on reducing tail latency. In doing so, we aim to increase the understanding of the current trade-offs between the various index traversal approaches in a number of contexts.

1.1.1 COMPARING EFFICIENT INDEX TRAVERSAL STRATEGIES

Chapter 4 presents a large empirical study of state-of-the-art index traversal algorithms, including Document-at-a-Time (DAAT) algorithms such as WAND [36] and BMW [88], and Score-at-a-Time (SAAT) algorithms such as JASS [148]. In particular, we focus on the associated latency of these algorithms with respect to many important parameters, such as the length of the incoming query, the number of results that are required, and the collection size, paying special attention to the variability of latency, and tail latency. Furthermore, we examine some commonly used heuristics which allow retrieval efficiency to be improved at the cost of some effectiveness, and whether these heuristics are suitable in all processing situations. Finally, we apply selective parallelization [119] to the JASS algorithm to improve latency without negatively impacting search effectiveness. This chapter focuses on the following primary research question:

RQ1: What are the primary causes of high-percentile latency during top-k retrieval?

The research presented in this chapter resulted in the following publications:

N Matt Crane, J. Shane Culpepper, Jimmy Lin, Joel Mackenzie, and Andrew Trotman: “A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation,” In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (WSDM), 2017, pages 201–210.

N Joel Mackenzie, Falk Scholer, and J. Shane Culpepper: “Early Termination Heuristics for Score- at-a-Time Index Traversal,” In Proceedings of the 22nd Australasian Document Computing Symposium (ADCS), 2017, pages 8.1–8.8.

1.1.2 EFFICIENT ONLINE RANK FUSION

Our second investigation, outlined in Chapter 5, poses a novel retrieval problem concerned with batch processing a set of related queries known as multi-queries, and leveraging rank fusion to improve the effectiveness and robustness of retrieval. We formally define the problem, and propose a few approaches for solving it, leveraging the outcomes from Chapter 4 to guide the algorithm design process. In particular, we examine solutions based on parallel processing, dynamic pruning, and early termination to achieve a range of cost/latency trade-offs, using both DAAT and SAAT retrieval mechanisms. We also discuss the implications of our approaches, and whether they are suitable for use in real-time search situations. We answer the following research question:

RQ2: What is the most efficient way to process multi-queries over inverted indexes?

The findings from this chapter make up Section 4 of the following journal article:

N Rodger Benham, Joel Mackenzie, Alistair Moffat, and J. Shane Culpepper: “Boosting Search Performance Using Query Variations,” ACM Transactions on Information Systems (TOIS), 37(4):41.1–41.25, 2019. Note: This article is joint work, and the findings in Section 5 of this article are attributed to Rodger Benham (RMIT University) [27].

1.1.3 REDUCING TAIL LATENCY VIA ALGORITHM SELECTION

In Chapter 6, we expand our analysis to consider the costs of all stages of the multi-stage retrieval architecture. Building on work from Culpepper et al. [75], we propose a regression scheme that can predict the best parameter settings for balancing the efficiency and effectiveness of the candidate generation stage in a multi-stage retrieval system, while implicitly reducing the downstream cost of the learning-to-rank stage. Using this framework, we propose a hybrid query processing system that can adaptively select which query processing scheme to use. This research also makes important contributions on how predictive models can be trained and evaluated in the absence of manually labeled data, which is currently a major problem in academic settings, where budgets are typically insufficient to conduct large-scale annotation campaigns. This study answers the following research question:

RQ3: How can we predict performance sensitive parameters on a query-by-query basis to minimize tail latency during candidate generation?

The research from this chapter appears in the following manuscript:

N Joel Mackenzie, J. Shane Culpepper, Roi Blanco, Matt Crane, Charles L. A. Clarke, and Jimmy Lin: “Query Driven Algorithm Selection in Early Stage Retrieval,” In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM), 2018, pages 396–404.

1.2 ORGANIZATION

The remainder of this thesis is structured as follows: Chapter 2 presents an overview of the field of information retrieval, including some historical information, approaches for improving ranking, tools for evaluating search engine performance, and a sketch of the key problems relating to real large-scale search engines. Chapter 3 introduces prior art in efficient indexing and query processing for large-scale IR systems, providing a view of the current state-of-the-art for efficient search over large collections. Chapter 4 presents an empirical study of tail latency in state-of-the-art search systems, and explores simple methods for improving tail latency, including an analysis of the shortcomings of current retrieval approaches. Chapter 5 describes the new problem of efficiently processing multi-queries for real-time rank fusion, some possible solutions, and the trade-offs therein. Chapter 6 examines how prediction frameworks can be leveraged to improve tail latency for early stage retrieval while reducing the cost of downstream retrieval components. Finally, in Chapter 7, the thesis is concluded, and some interesting future directions are outlined.

2 BACKGROUND

2.1 INFORMATION RETRIEVAL

The field of Information Retrieval (IR) is concerned with problems related to the storage and retrieval of information such as text, music, images, and videos. Information retrieval has been studied since the 1950s, but the origins of IR begin well before computers were available for the automation of search. In the late 1800s, the well known Dewey Decimal System was proposed, allowing libraries to better organize their books via a category index. However, organizing information for efficient access has been reported to exist for over 4,000 years, with evidence of inscribed clay tablets being stored in special areas for principled access [142, 229]. The first automated systems for the efficient retrieval of information were developed in the late 1800s and early 1900s, and were mainly mechanical devices that allowed rapid scanning of physical records. Later, during the 1940s and early 1950s, the first computerized information retrieval systems were described, and a large amount of work was conducted throughout the following decades to improve the search process [219, 229].

In 1989, the World Wide Web was invented, and by 1992, hosted around ten websites. By 1995, the web had expanded to around 25,000 sites, and had broken the 1 million mark by 1997. Figure 2.1 visualizes this incredible growth across the first decade of the web.1 Thirty years on, the world wide web has over a billion websites, and close to 50% of the world's population has access to the internet. This rapid expansion meant that finding relevant pages and documents on the web became difficult, prompting a huge amount of research on improving web search. Today, web search is the most popular and wide-reaching IR service, with web search engines collectively processing billions of queries per day on behalf of billions of users. Although most commonly used for web search, IR systems are extremely important in a number of other paradigms, including social media [39, 236], e-commerce [231, 245], and career matching [176, 259], among many others. With the ever increasing volume of data being generated, search engines will continue to be the primary tool for finding the needle in the haystack.

1. Data collected from https://www.internetlivestats.com/



Figure 2.1: Website growth over the first decade of the web, from 1991 to 2001, and the number of internet users, from 1993 to 2001. The rate of growth of websites was no less than 30% between any of these successive years, with a maximum growth > 2000% between 1993 (130 sites) and 1994 (2,738 sites). Note the log scale in both figures.

2.1.1 INFORMATION RETRIEVAL PROCESS

The information retrieval process begins with a user, who has an internal sense of requiring some unknown information, which we refer to as an information need. In order to resolve this need, the user can generate a query that represents the gap in their knowledge, and this query can be issued to a search engine. The search engine will find and present a list of documents that it deems relevant to the input query. To decrease the cognitive load on users, search engines try to rank the returned documents by their estimated relevance to the input query. These documents are then presented on a search engine results page (SERP), with each document represented by a link to the document itself, and a small snippet which summarizes the content of each document (Figure 2.2). Then, the user will work through the ranked list, observing each of the snippets, and possibly clicking into some documents, until their information need is met.

An unfortunate paradox that arises is that users are often not aware of what information is required to answer their information need [63]. Hence, the generated query often does not map to the required information need, resulting in non-relevant results or insufficient information for the user. Consequently, users may enter the second or third page of results, or may issue a reformulated query, in pursuit of relevant information. This entire process may be repeated many times, until the user is satisfied. In some cases, the user may abandon the search engine before they can address their information need.

While users are often unable to generate queries that accurately capture their underlying information need, search engines are still expected to provide accurate rankings. Hence, a lot of prior work has focused on improving the IR process from the user perspective, including methods for improving the rankings provided by search engines, and better ways of evaluating the performance of different search engines.

While users are often unable to generate queries that accurately capture their underlying information need, search engines are still expected to provide accurate rankings. Hence, a lot of prior work has focused on improving the IR process from the user perspective, including methods for improving the rankings provided by search engines, and better ways of evaluating the performance of different search engines. 2.2. Document Ranking ! 9


Figure 2.2: An example of a search engine results page (SERP). The top two documents are shown with respect to the query, “best coffee in melbourne.” Each document is presented with its title, URL, and a small snippet which provides a summary of the document. The snippet may also contain cues as to why the search engine has retrieved it, such as the emboldened search terms.

2.2 DOCUMENT RANKING

Contemporary search engines aim to maximize their utility while minimizing the cognitive cost for users. Hence, accurately ranking documents such that users can rapidly satisfy their information needs is of the highest importance. In this section, we provide an overview of how search engines typically rank documents, examining some of the key ideas used in modern search engines, including the current state-of-the-art in ranking using machine learning.

2.2.1 BOOLEAN MATCHING

Early search systems supported only Boolean matching operations without ranking [229]. These types of systems would deal with queries that are structured around Boolean operators such as conjunctions (AND), disjunctions (OR), and negations (NOT). For example, if a user wished to find documents discussing coffee shops in Melbourne, Australia, they may issue the query “best AND coffee AND melbourne”. Upon receiving results discussing coffee venues in Melbourne, Florida, they may refine this query to “best AND coffee AND melbourne NOT florida”, thereby filtering out results which contain the term florida. While Boolean queries are still commonly used in certain applications of IR such as technology assisted reviews [222] and medical search [128], these queries are generally constructed by domain experts, and are not easily understood by the average search user. A more natural method for non-expert users is to formulate queries using natural language. For example, a user may enter the query “best coffee in melbourne” and expect to receive a ranked list of matching documents. This is the standard for modern search engines in many user-facing domains.
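As a toy illustration of Boolean matching (not drawn from the thesis), each term can be mapped to the set of documents containing it, so that AND, OR, and NOT become set intersection, union, and difference; the miniature index below is invented to evaluate the example queries above.

```python
# A made-up four-document index over a handful of terms.
index = {
    "best":      {1, 2, 4},
    "coffee":    {1, 2, 3, 4},
    "melbourne": {1, 3, 4},
    "florida":   {4},
}

# "best AND coffee AND melbourne"
conjunction = index["best"] & index["coffee"] & index["melbourne"]   # {1, 4}

# "best AND coffee AND melbourne NOT florida"
refined = conjunction - index["florida"]                              # {1}

print(sorted(conjunction), sorted(refined))
```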

2.2.2 BAG-OF-WORDS MODELS

A simple, yet effective, method for ranking documents with respect to natural language queries is to consider both the query, q, and each document, d, as a bag-of-words (BOW). This means that the order or structure of the query and document corpus does not have any influence on the underlying ranking model. Instead, these models use simple statistics to build evidence for the likely relevance of a document, and rank documents according to this evidence. Although many bag-of-words models exist, most of them leverage the key ideas of term frequency, inverse document frequency, and inverse document length in their underlying computations. Term frequency corresponds to the number of times that a given term t appears within a document d, and can be leveraged based on the assumption that the more often a term occurs within a document, the more likely it is that the document is related to that term, and vice versa. Inverse document frequency (IDF) is another idea based on counting the frequency of terms, and the main assumption of IDF is that the relative importance of a term can be roughly determined by the inverse of the number of documents it appears in. To this end, IDF gives more weight to rare terms, as they are more descriptive in identifying relevant documents. The final core idea, inverse document length (IDL), is based on the observation that longer documents are more likely to be ranked higher than shorter documents, as they naturally contain more terms. To remedy this issue, longer documents can be penalized, and shorter documents can be rewarded, with the aim of generating rankings that are not biased towards either short or long documents.

In this thesis, we consider additive BOW models. Suppose that we want to find the score, denoted S_{d,q}, of document d with respect to query q. Under an additive BOW model, S_{d,q} is computed as the sum of the contributions of a ranking function, W, across each term in the query, as

    S_{d,q} = \sum_{t \in q} W(d, t).    (2.1)

Indeed, various bag-of-words rankers have been proposed that leverage this method, including vector space models, probabilistic models, and language models. We now discuss how these models approach the task of calculating the similarity between a query and document.
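Equation 2.1 can be read as a simple loop over the query terms. The following sketch is a hypothetical illustration of an additive BOW scorer; the `weight` callback stands in for any concrete W(d, t), and the toy choice below simply returns the raw term frequency.

```python
from collections import Counter

def additive_bow_score(doc_terms, query_terms, weight):
    """S_{d,q} = sum over t in q of W(d, t), per Equation 2.1."""
    f_dt = Counter(doc_terms)                     # term frequencies within d
    return sum(weight(f_dt[t], t) for t in query_terms)

# Toy W(d, t): the raw frequency of t in d.
raw_tf = lambda tf, term: float(tf)

doc = "best coffee in melbourne best".split()
print(additive_bow_score(doc, ["best", "coffee"], raw_tf))   # 2 + 1 = 3.0
```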

Vector Space Models. Many of the earliest rankers projected documents and queries into vector space and measured the similarity between the subsequent vectors [216, 217]. In particular, each document can be represented as a V -dimensional vector, where V is the size of the collection vocabulary. When a query is issued, the query is cast into a V -dimensional vector, and then each document is scored by computing the distance between the document and query vectors. Terms within each vector can be weighted using the notions of TF and IDF as discussed above, and the distance can be computed by taking the cosine between the query and document vector, for example. One drawback of this approach is that it directly optimizes the similarity between query and document terms, rather than optimizing the ranking with respect to the perceived relevance of the document to the query. Nevertheless, the core ideas behind vector space models paved the way for more effective ranking models. 2.2. Document Ranking ! 11

Probabilistic Models. More recently, rankers based on the probabilistic ranking principle have seen a lot of focus, especially the BM25 model [211, 212]. Instead of considering documents and queries as vectors and computing their similarity, these methods aim to estimate the probability of each document being relevant to the query, and then rank the documents by this estimated probability. A commonly used version of BM25 is defined as

    \mathrm{BM25}(d, q) = \sum_{t \in q} \mathrm{IDF}(t) \cdot \mathrm{TF}(d, t),    (2.2)

    \mathrm{IDF}(t) = \log \left( \frac{N - f_t + 0.5}{f_t + 0.5} \right),    (2.3)

    \mathrm{TF}(d, t) = \frac{f_{d,t} \cdot (k_1 + 1)}{f_{d,t} + k_1 \cdot \left( 1 - b + b \cdot \frac{l_d}{l_{avg}} \right)},    (2.4)

where N is the number of documents in the collection, f_t is the document frequency of term t, f_{d,t} is the frequency of t in d, l_d is the length of document d, and l_{avg} is the average document length [174]. In this particular version of BM25, k_1 and b are parameters that must be set. Trotman et al. [246] recommend using k_1 = 0.9 and b = 0.4, and showed these settings to be effective on many collections. It is important to note that many alternative functions for both the TF and IDF components have been proposed and examined. However, there is empirically very little difference between these models if they are tuned correctly [247]. For a more exhaustive examination of the many variants of BM25, we refer the reader to the survey from Robertson and Zaragoza [211].

Divergence from Randomness (DFR) models provide an alternative approach for probabilistic ranking [2, 4]. The main idea behind the DFR models is that the more the f_{d,t} value diverges from its frequency within the collection, the more informative term t is at distinguishing document d. One of the simplest DFR models is the DPH model, which is parameter free:

    \mathrm{DPH}(d, q) = \sum_{t \in q} \frac{\left( 1 - \frac{f_{d,t}}{l_d} \right)^2}{f_{d,t} + 1} \cdot \left( f_{d,t} \cdot \log_2 \left( f_{d,t} \cdot \frac{l_{avg}}{l_d} \cdot \frac{N}{F_t} \right) + \frac{1}{2} \log_2 \left( 2 \pi \cdot f_{d,t} \cdot \left( 1 - \frac{f_{d,t}}{l_d} \right) \right) \right),    (2.5)

where F_t is the number of times t occurs in the collection, π is the mathematical constant, and the remainder of the variables are the same as described above. DPH was recently shown to perform as well as the BM25 model used in the Terrier search engine [150],2 making it an attractive ranker for situations where no data exists for tuning.

2. terrier.org
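As a concrete, hedged reading of the BM25 formulation in Equations 2.2-2.4 above, the sketch below scores a document for a query given precomputed collection statistics, using the k_1 = 0.9 and b = 0.4 settings recommended earlier; the dictionary layout of the statistics, and the toy numbers, are assumptions made for the example.

```python
import math

def bm25_term(f_dt, f_t, l_d, l_avg, n_docs, k1=0.9, b=0.4):
    """Contribution of one query term t to document d (Equations 2.3 and 2.4)."""
    idf = math.log((n_docs - f_t + 0.5) / (f_t + 0.5))
    tf = (f_dt * (k1 + 1)) / (f_dt + k1 * (1 - b + b * l_d / l_avg))
    return idf * tf

def bm25(query_terms, doc, coll):
    """Sum the per-term contributions over the query (Equation 2.2)."""
    return sum(
        bm25_term(doc["f_dt"].get(t, 0), coll["f_t"].get(t, 0),
                  doc["l_d"], coll["l_avg"], coll["n_docs"])
        for t in query_terms)

coll = {"n_docs": 1_000, "l_avg": 150.0, "f_t": {"coffee": 40, "melbourne": 25}}
doc = {"l_d": 120.0, "f_dt": {"coffee": 3, "melbourne": 1}}
print(round(bm25(["coffee", "melbourne"], doc, coll), 3))
```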

Language Models. Language models provide yet another angle on estimating the relevance between queries and documents, and aim to estimate how likely each document would be to generate the given query [204]. Language models use similar statistics to the aforementioned probabilistic rankers, but rely heavily on smoothing to correct for errors made in the estimation process. For example, language models typically assign a probability of 0 to an unseen term, which can have negative implications on the relevance estimations. To avoid this issue, language models typically assign very small (yet non-zero) probabilities to unseen or otherwise rare terms. One instantiation of the language model approach which combines relevance values by summing them together is the Language Model with Dirichlet Smoothing (LMDS) method, as described by Zobel and Moffat [273]:

    \mathrm{LMDS}(d, q) = l_q \cdot \log \left( \frac{\mu}{l_d + \mu} \right) + \sum_{t \in q} f_{q,t} \cdot \log \left( \frac{f_{d,t}}{\mu} \cdot \frac{|D|}{F_t} + 1 \right),    (2.6)

where l_q is the length of the input query, f_{q,t} is the number of times that t appears in q, μ is the smoothing parameter (usually set to 2500 [273]), and |D| is the number of terms in the entire corpus. One key difference between the LMDS model and the aforementioned models is that LMDS computes a per-document weight separately from the per-term sum of impacts.

2.2.3 IMPROVING EFFECTIVENESS

While BOW models are simple and perform relatively well, there are several models that are more effective in practice. However, achieving these effectiveness gains usually comes at a higher cost. We briefly discuss some of these ranking models, arriving at the current state-of-the-art which leverages machine learning to optimize document ranking.

Query Expansion. A common problem that exists for BOW models is the term mismatch problem, where documents use different terms than the user (who is generating the query) to describe the same ideas. For example, a document might discuss “espresso bars” — a BOW ranker would not consider this document to be relevant to a query such as “coffee shop”, even though they are clearly related. Hence, relevant documents might be missed by a simple BOW ranker, because the ranking model only retrieves documents containing the terms present in the query. To alleviate this problem, many modes of query expansion have been explored in the literature [48]. Query expansion involves expanding the original query to capture related synonyms or concepts, and issuing the expanded query in lieu of the original query. In some cases, the input query may be rewritten entirely [72, 76, 124]. One useful source of data for query rewriting is the input query log, which can be mined to find associated queries [124]. However, log data may not always be available. An alternative approach may leverage anchor text as a source of data for generating rewritten queries [72, 76]. Query expansion usually results in more expensive query processing operations, as expanded queries are usually more complex than the often short input query. 2.2. Document Ranking ! 13

Proximity Models. One disadvantage of the BOW models is that they treat each term independently, and do not consider where terms appear. However, the proximity of query terms can provide a strong signal of relevance, as there is often a dependency between terms. For example, the occurrence of a set of query terms within the same sentence is more likely to indicate relevance than if the query terms appeared in vastly different locations within a document. This is what proximity models aim to capture. Proximity models can capture exact phrases, as well as less rigid proximity such as a number of query terms appearing in any order within some fixed length window [154, 178]. One of the earliest works that explored term proximity was the seminal work of Clarke et al. [60], which allowed soft-phrase matching using a followed-by operator. More recently, the sequential dependency model (SDM) and the full dependency model (FDM) [178] leverage term proximity to improve the effectiveness of document ranking. SDM uses both independent term scores (like a BOW ranker), as well as dependencies between sequential terms in the input query, such as observing when sequential terms appear as bigrams, or within an ordered or unordered window of terms. The FDM model extends SDM to compute dependencies between all query terms, making it more expensive, yet slightly more effective [178]. In order to compute n-grams and window-based features, proximity models usually require positional or forward indexes to generate global statistics for ranking, making them much more expensive than BOW models in practice [155]. One reason for this increased cost is that, in order to generate the necessary features, the candidate positional indexes must be iterated across a number of times for multiple sub-queries. Recently, Lu et al. [156] proposed a novel method for integrating proximity into BOW ranking efficiently, which does not require global statistics, thus avoiding expensive feature generation. Their approaches were shown to outperform SDM and FDM with respect to both efficiency and effectiveness.

Field-Based Models. Another simple extension to BOW (and proximity) models is to consider the underlying structure of the documents which are being ranked. For example, the text within a document title is likely to be highly relevant to the subject of the document. Hence, various document fields can be used to rank documents more effectively. To support advanced rankers across multiple fields, each field may be represented by field-based postings lists [210]. Each field is typically given a weight, and the document score can be computed as a weighted sum across the given fields. Robertson et al. [213] found that extending the BM25 model to fields (BM25F) was effective in practice [211, 267]. Both proximity models and relevance modeling have also been extended to work over fields, and show good effectiveness improvements over non field-based models [101, 151]. The downside to field-based approaches is that indexing fields requires more complex index formats, resulting in higher space consumption. Further, multiple postings lists may need to be accessed for each term in the query, making field-based approaches less efficient than simple BOW methods on term postings.

Learning to Rank. One downside to the previous models is that they primarily exploit a single or very few features for document ranking. For example, BOW models base their output rankings entirely on the notion of term matching, and proximity models may use both unordered BOW

matches as well as term proximity to form a ranking. However, there are potentially hundreds of features that could be useful for ranking documents with respect to a query. In order to capture a much larger number of features, machine learning algorithms can be applied to document ranking tasks in what is known as learning-to-rank (LtR) [152]. LtR systems typically learn a ranking model by consuming a number of annotated training examples. Each example is composed of a set of documents, represented by feature vectors, and a relevance annotation for each document. The feature vectors encode the many signals for relevance, including query-document features (BOW scores, proximity scores), static document features (PageRank, clickthroughs, spam rankings, inlinks), and query features (query length, IDF of terms) [164, 165]. Once the model is learnt, it can be deployed for ranking, where it will estimate the score for a new document based on its feature vector. To answer a query, a feature vector is generated for every document, and is applied to the LtR model. Then, the documents can be ranked by the scores output from the model. LtR models represent the current state-of-the-art for many ranking scenarios, including web search [270]. While LtR models can handle many features, resulting in highly effective rankings, computing both the features and the score for each document is very expensive. For now, we defer the discussion on how to deploy LtR models efficiently to Section 2.4.
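As a rough sketch of how such a model is used at query time (with a generic gradient-boosted regression ensemble standing in for a production learning-to-rank model, and invented feature values), candidate documents are represented by feature vectors, the trained model predicts a score for each, and the candidates are re-ranked by that score.

```python
from sklearn.ensemble import GradientBoostingRegressor

# Toy training data: [bm25, proximity, pagerank, query_len] -> graded relevance.
X_train = [[12.1, 0.8, 0.30, 3], [3.2, 0.1, 0.05, 3],
           [9.7, 0.5, 0.60, 2], [1.1, 0.0, 0.02, 2],
           [7.4, 0.3, 0.10, 4], [0.6, 0.0, 0.01, 4]]
y_train = [3, 0, 2, 0, 1, 0]

model = GradientBoostingRegressor(n_estimators=50).fit(X_train, y_train)

def rerank(candidates):
    """candidates: list of (doc_id, feature_vector) -> doc_ids, best first."""
    doc_ids = [doc_id for doc_id, _ in candidates]
    scores = model.predict([features for _, features in candidates])
    return [doc_id for _, doc_id in
            sorted(zip(scores, doc_ids), key=lambda pair: -pair[0])]

print(rerank([("d1", [10.3, 0.7, 0.2, 3]), ("d2", [2.0, 0.1, 0.4, 3])]))
```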

2.3 EVALUATING SEARCH EFFECTIVENESS

High quality search results help users to satisfy their information needs. Therefore, a critical task for building and maintaining competitive search systems is evaluating the effectiveness of the search ranking. Consequently, a major research branch of information retrieval involves establishing models, metrics, and experimental frameworks for evaluating the effectiveness of search systems. In this section, we focus on offline evaluation using reusable test collections, which is the de facto standard for evaluating search systems. In particular, Section 2.3.1 outlines how reusable test collections are built via the Cranfield paradigm [62]. Section 2.3.2 discusses how documents are judged, and the various relevance grades that may be assigned to documents. Next, Section 2.3.3 and 2.3.4 outline a number of commonly used metrics which support the evaluation of search systems. Finally, Section 2.3.5 discusses some best practices for the evaluation of search systems, and Section 2.3.6 outlines the test collections used further in this dissertation.

2.3.1 CRANFIELD PARADIGM

In order to measure performance differences between candidate IR systems, a standardized experimental framework is required, where confounding factors can be minimized. In the 1960s, Cleverdon [62] proposed one such framework, which is now widely known as the Cranfield paradigm. The Cranfield approach utilizes a fixed document corpus, a set of topics, and a set of relevance judgments that correspond to whether each of the given documents is relevant with respect to each information need. To measure the search quality of a new IR system, the system is deployed on the given corpus across each of the topics to yield a ranked list. Then, the relevance judgments can be used to calculate an evaluation metric which reflects the quality of the system.

One issue with the Cranfield paradigm is the cost of relevance annotation. For example, modern web-scale collections contain millions of documents, so it is intractable for relevance assessors to annotate every document in the collection with respect to every topic. Instead, a process known as pooling is used. Pooling involves judging the top ranked documents from a set of unique IR systems which each supply their own ranking for each topic. These runs usually come from a number of different IR research groups, who may use different techniques to establish the rankings. Pooling aims to find the most relevant documents by assuming that each of the contributing systems is a high-quality system, and the quality of the judgments directly depends on how good the provided systems were at surfacing relevant documents. Although many improvements have been explored and implemented, the Cranfield paradigm is still the primary approach for creating reusable test collections [108], and is used by many evaluation campaigns from the National Institute of Standards and Technology (NIST) Text REtrieval Conference (TREC), among others.

2.3.2 RELEVANCE JUDGMENTS

When document assessors judge a document for a given topic, they are judging the relevance of the document to the supposed information need of the topic. To this end, judgments may be binary, where a document is considered either relevant or not relevant, and is assigned a weight of 1 or 0, respectively, during metric calculations. However, there are cases where more nuanced levels of judgment are desired. For example, graded relevance judgments may be provided, where annotators can assign a label of not relevant (0), marginally relevant (1), relevant (2), or highly relevant (3). Retrieval metrics may make use of these grades by providing different weightings based on a pre-defined gain function, which maps the label to a value in the range [0,1]. While many gain functions exist, the most commonly used are the binary and the exponential gain

functions. Without loss of generality, we denote as gi the resultant gain derived from calling some gain function on the ith document of the ranked list under evaluation. Another important aspect of relevance judgments is that they may be incomplete, due to the pooling process described above. If a set of judgments is incomplete, then documents that have not been annotated are usually assumed to be non relevant. However, an alternative is to compute a residual, which calculates the maximum increase of the effectiveness metric assuming that the non-annotated documents are relevant.

2.3.3 UTILITY BASED METRICS

Utility-based metrics aim to model the utility that is accrued by a user as they scan down a ranked results list. Hence, finding a relevant document at a higher rank generally gives more utility to a user than finding the same document (or possibly an even more relevant document) further down in the ranked list. While a vast number of utility-based metrics exist, we focus on two main instances in our experiments, namely, rank-biased precision (RBP), and expected reciprocal rank (ERR).

RBP models a user who works down a ranked list with a given persistence, which is modeled using a geometric distribution, governed by a parameter φ [183]. The value of φ corresponds to the probability that the user will continue to the next document down the ranked list. Hence, values of φ close to 1 represent an extremely patient user who will continue a long way down the ranked list. On the other hand, smaller values of φ represent impatient users who assign a high weight to very few documents appearing in the top of the ranked list. RBP is formulated as follows:
\[
\text{RBP} = (1 - \phi) \cdot \sum_{i=1}^{\infty} \phi^{i-1} \cdot g_i . \qquad (2.7)
\]

ERR captures a similar notion to RBP. However, ERR assumes that as a user accrues utility, they will be more likely to stop viewing further documents [55]. Thus, the stopping criterion is modeled explicitly:
\[
\text{ERR} = \sum_{i=1}^{k} \frac{g_i}{i} \cdot \prod_{j=1}^{i-1} (1 - g_j) . \qquad (2.8)
\]
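To make these two formulas concrete, the following is a minimal sketch (not part of the original text) that computes RBP and ERR over a list of gain values in [0, 1]; the example labels, the maximum grade of 3, and the use of the exponential gain function are illustrative assumptions.

```python
def rbp(gains, phi=0.8):
    """Rank-biased precision (Eq. 2.7), truncated at the end of the supplied ranking."""
    return (1.0 - phi) * sum(phi ** i * g for i, g in enumerate(gains))

def err(gains):
    """Expected reciprocal rank (Eq. 2.8); each gain acts as a stopping probability."""
    total, p_continue = 0.0, 1.0
    for i, g in enumerate(gains, start=1):
        total += p_continue * g / i      # user reaches rank i and stops there
        p_continue *= 1.0 - g
    return total

# Graded labels mapped to gains via the exponential gain function (2^label - 1) / 2^max_label.
labels = [3, 0, 2, 1, 0]
gains = [(2 ** l - 1) / 2 ** 3 for l in labels]
print(round(rbp(gains), 4), round(err(gains), 4))
```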

2.3.4 RECALL BASED METRICS

Recall-based metrics normalize their results based on the best possible outcome for each topic in question. As such, they assume that the relevance annotations that exist for each topic are complete when creating the ideal ranking for the topic.

One of the oldest and most widely used recall-based metrics is average precision (AP). AP computes the precision at each relevant document in the ranked list, and divides by the total number of relevant documents for the topic:

\[
\text{AP} = \frac{\sum_{i=1}^{k} g_i \cdot \text{Precision@}i}{|\text{Rel}|} , \qquad (2.9)
\]
where Precision@i is the fraction of relevant documents observed up to rank i, and |Rel| is the total number of relevant documents for the topic. Usually, AP is calculated to extremely deep ranks (k = 1,000), and is less appropriate for use where judgments are incomplete.

An alternative to AP which is suited for computing early precision across incomplete judgments is the normalized discounted cumulative gain (NDCG) metric [118]. NDCG is based on the unnormalized version, discounted cumulative gain (DCG), which awards higher ranking documents with more utility:
\[
\text{DCG} = \sum_{i=1}^{k} \frac{g_i}{\log_2(i + 1)} . \qquad (2.10)
\]

One problem with DCG is that values cannot be compared across different queries. NDCG aims to remedy this issue, by normalizing the gain value across each query:

\[
\text{NDCG} = \frac{\text{DCG}}{\text{IDCG}} , \qquad (2.11)
\]

where IDCG corresponds to the DCG computed on the perfect ranked list for the topic in question — if the candidate ranked list contains the documents ordered by highest to lowest relevance, the value of NDCG will be 1.0.
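As a companion to the formulas above, here is a minimal sketch (not part of the original text) computing AP over binary judgments and NDCG over graded labels; the example judgment vectors and the cutoff k are illustrative assumptions, and NDCG is computed directly on the raw labels as gains.

```python
import math

def average_precision(rels, num_relevant):
    """AP (Eq. 2.9) with binary gains; num_relevant is |Rel| for the topic."""
    hits, total = 0, 0.0
    for i, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / i          # Precision@i, accumulated only at relevant ranks
    return total / num_relevant if num_relevant else 0.0

def dcg(gains, k):
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))

def ndcg(gains, k=10):
    """NDCG (Eq. 2.11): DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0

print(average_precision([1, 0, 1, 0, 0], num_relevant=3))
print(ndcg([3, 0, 2, 1, 0], k=5))
```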

2.3.5 BEST PRACTICES FOR EVALUATION

Recently, recall-based metrics have come under scrutiny, as they assume to have complete knowledge of the universe of relevant documents for any given topic, which is unlikely to be true in practice. This makes computing residuals on recall-based metrics very difficult, and may result in confounding results in the presence of uncertainty. Lu et al. [157] recently conducted an analysis on the effect of pooling depth on commonly used evaluation metrics, including commonly used recall-based metrics. They found that these recall-based metrics can become unstable if evaluation beyond the pooling depth is carried out. In addition, they indicate that, if recall-based metrics are being reported, utility-based metrics should also be featured, presumably to show any ill-effects that may be created by evaluating with recall-based metrics inappropriately. Finally, they recommend reporting residuals where possible, especially if the system under evaluation did not contribute to the pool of judged documents. We follow these recommendations in the evaluation campaigns presented in this thesis. For a deeper discussion on IR metrics, pooling, and evaluation, we refer the interested reader to the textbook from Büttcher et al. [42], the survey from Sanderson [218], and the Ph.D. dissertation of Lu [153].

2.3.6 COLLECTIONS AND QUERIES

For the experiments conducted henceforth, we use three different collections. These collections are publicly available, are used in many TREC tracks, and are described as follows:

! Gov2 — A crawl of around 25 million documents from the .gov domain, conducted in 2004. Each document is truncated to 256KB. This collection was used for the TREC Terabyte track from 2004–2006, and there are 150 corresponding topics with relevance judgments.

! ClueWeb09B — A sample of the web, crawled in January and February, 2009. We use the roughly 50 million document category B set. This collection was used for the TREC Web track in 2009–2012, and contains 200 corresponding topics with relevance judgments.

! ClueWeb12B — An updated sample of the web, crawled between February and May, 2012. We use the 52 million documents from the category B-13 collection, which is a uniform sample of the English portion of the crawl. This collection was used for the TREC Web track in 2013 and 2014, and contains 100 corresponding topics with relevance judgments.

Name          No. Docs      Topics
Gov2          25,205,179    TREC 701–850 ('04–'06)
ClueWeb09B    50,220,423    TREC 1–200 ('09–'12), MQT
ClueWeb12B    52,343,021    TREC 201–300 ('13–'14), UQV100

Table 2.1: Summary of the collections and topics that are used for the experiments in this dissertation.

In addition to these collections, we also employ two other sets of topics. Firstly, the UQV100 topics contain a number of query variations corresponding to the same 100 topics listed above for the ClueWeb12B collection [18]. As such, the UQV100 topics are suitable for use on ClueWeb12B. This collection also contains shallow relevance judgments. Secondly, we employ the 2009 Million Query Track (MQT) queries [49] over the ClueWeb09B corpus, which consist of 40,000 topics in total, of which 684 have shallow partial judgments (50 of these overlap with the 2009 Web Track topics discussed above). A summary of this data is shown in Table 2.1.

2.4 SEARCHINTHE REAL WORLD

In the previous sections, we have looked at the search process mainly from the perspective of a user, where we are concerned with improving search utility and measuring such improvements. However, there is another extremely important aspect to search which is often ignored, and that is how to practically implement efficient and effective search solutions that are capable of handling massive amounts of traffic and data, as is the case for web search. In this section, we discuss the architecture of large-scale search engines, the competing goals of efficiency, effectiveness, revenue, and cost, and some of the techniques used to balance them.

2.4.1 SEARCH ENGINE INFRASTRUCTURE

Most large-scale IR systems are backed by thousands of commodity servers, which allows the processing workload to be distributed among many peers in a fault-tolerant manner, as evidenced by some recent outlines of large-scale search architectures from companies such as Google, Microsoft, eBay, and Twitter [20, 39, 210, 245]. To achieve rapid response times, large document corpora are divided into a set of smaller partitions, which is known as index sharding. Different methods of sharding exist, and result in vastly different processing architectures. For example, documents can be assigned to shards based on their topic, which can then be leveraged to search only a subset of shards at query time in what is known as selective search [131, 137]. However, this adds complexity to the search system, and may result in load-balancing issues. A more common approach is to randomly allocate a document to a shard, such that each shard is kept to a uniform size. For each shard, an index is built, and the index for one or more shards can be stored in an index server node (ISN). An ISN is typically a standard commodity server, and ISNs are responsible for the

majority of the processing undertaken to serve query results. For example, researchers from Microsoft noted that at least 90% of the hardware resources in the IR system they were working with were dedicated to ISNs [119]. Clearly, the role of the ISN is crucial to the performance of web search. When an incoming query is received, it is routed to one or more ISNs, where the query is processed, and the results returned. Usually, a broker or aggregator will wait for results from each ISN, and blend them together to form the final SERP. Since web search engines are typically used for a wide range of query types, there may be a range of separate search pipelines which are each optimized to their own search task, such as ad retrieval, question answering, or ad-hoc web search. Figure 2.3 shows an example web search system, which combines results from six different search pipelines, and the underlying architecture of one of the search pipelines.

2.4.2 EFFICIENCY, EFFECTIVENESS, AND REVENUE

First and foremost, a real search system must be both efficient, and effective. In Section 2.2.3, we discussed some recent innovations in ranking, which enable highly effective retrieval. However, increased effectiveness generally comes at a cost — the more advanced a ranker, the more expensive it is to deploy. Thus, there is a tension between providing high quality results, and providing results within a reasonable time. To make this problem even more difficult, users expect sub-second response times in common search scenarios, and delays can negatively impact user experience [16, 17, 177, 224]. For example, Schurman and Brutlag [224] conducted a large-scale experiment across both the Bing and Google search engines to examine the effects of latency on user behavior, and in turn, revenue. These experiments involved injecting artificial delays into the search process, such that the user perceived latency of the search system was increased. They found that delays as low as 200ms resulted in significant reductions to satisfaction metrics, user interaction, and average revenue per user. Furthermore, the magnitude of the effect was found to be almost linearly associated with the amount of delay added. Another user study conducted by Google showed that users preferred viewing SERPs with 30 results rather than 10 results [177]. However, the added delay in retrieving and serving 20 additional results led to users searching around 25% less.

In a similar line of work, Bai et al. [17] conducted a large analysis of both users (in a lab setting) as well as query logs, to determine the effects of latency on user behavior. Their findings generally aligned with those of Schurman and Brutlag [224], but provide more nuance. For example, they found that users of a fast search system are more likely to notice delays when compared with users of a slow search system. Furthermore, the degree in which added latency impacts users may depend on user attributes. Another interesting finding was that users generally clicked more often on a result page which was served with a lower latency. A follow up study examined the effects of latency on sponsored search, where paid advertisements appear within the ranked results list. This study concluded that users tend to become less engaged with the search engine if delays are consistently observed, and that revenue is higher when pages are served faster, thus confirming the previous observations from other studies [16] — balancing both efficiency and effectiveness is key in providing a positive user experience.

Figure 2.3: An example of some of the complexities of large-scale search. The top figure shows some of the components that may contribute to the final results set. Each component may have its own index and query pipeline. The bottom figure shows an example of the architecture of a single query processing pipeline, based off of the Maguro system [210] which powers the Bing search engine. At the top level, the corpus broker manages the load balancing of query processing across various partitions of the overall index. The second level brokers handle load balancing of systems that share a set of replicas of the given index. Each ISN is responsible for the matching and ranking of that particular index segment, and the results are then passed back up to the top-level broker for merging.

Figure 2.4: An example of a multi-stage retrieval system, as described by Pedersen [192]. This particular system has an initial candidate generation phase, followed by three re-ranking phases. Each subsequent re-ranker uses more expensive features on fewer documents, balancing efficiency and effectiveness.

2.4.3 MULTI-STAGE RETRIEVAL

Clearly, there is a delicate balance between providing effective results and providing timely responses, and this is a major challenge that must be met by competitive search companies. In Section 2.2.3, we outlined how state-of-the-art learning-to-rank (LtR) models can use hundreds of important features to accurately rank documents. However, deploying these models on production systems is extremely expensive, as the costs of computing or retrieving these feature scores for each document is prohibitive. To improve the practicality of LtR, a sub-field focused on learning-to-efficiently-rank has emerged, which focuses directly on efficiency problems in LtR systems [254]. One obvious way to make LtR practical for real-time search is to cast it as a re-ranking problem, where a simple ranking algorithm (like a BOW ranker) is used to generate a top-k set of candidates which can then be re-ranked using the LtR system [38, 192, 252]. This idea can be taken even further, with any number of arbitrary re-ranking stages, in what is known as multi-stage or cascade ranking. In order to efficiently deploy the expensive machine-learnt models as described in Section 2.2.3, the ranking process can be divided into a number of select, rank, and prune stages [192]. In particular, each stage of ranking will take a list of documents from the previous stage as input, compute some ranking on these documents, and possibly pass only a subset of documents to the following stage. This idea was first discussed in detail by Wang et al. [252], who aimed to achieve effective rankings with reasonable cost. For this reason, each consecutive stage of the multi-stage process usually involves using a more expensive ranker on a smaller set of documents [57]. Figure 2.4 shows the different stages of a four stage ranking cascade, and the number of matched or re-ranked documents at each stage. Configuring multi-stage retrieval systems to optimize efficiency/effectiveness trade-offs is an ongoing line of research, and we examine this problem further in Chapter 6.

The two-stage retrieval system is the most simple multi-stage retrieval architecture, and the one we focus on in this dissertation. The two primary ranking stages in a two-stage retrieval system are the candidate generation phase, and the re-ranking phase. The primary goal of candidate generation is to rapidly obtain a top-k set of candidate documents. As such, candidate generation algorithms use very simple features, such as Boolean matching or basic BOW rankers, which enable fast retrieval. Next, for each candidate, the feature vector is either retrieved (from a data store) or computed, and these vectors are then passed to the LtR model for re-ranking.
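The following is a schematic sketch of this two-stage idea, not the system used in this dissertation; the index accessor, feature store, and LtR model interfaces (`index.postings`, `feature_store`, `ltr_model.predict`) are hypothetical stand-ins.

```python
import heapq

def two_stage_search(query_terms, index, ltr_model, feature_store,
                     k_candidates=1000, k_final=10):
    """Sketch of two-stage retrieval: cheap BOW candidate generation, then LtR re-ranking.
    `index.postings(term)` and `ltr_model.predict(features)` are assumed interfaces."""
    # Stage 1: candidate generation with a simple bag-of-words ranker.
    scores = {}
    for term in query_terms:
        for doc, contribution in index.postings(term):
            scores[doc] = scores.get(doc, 0.0) + contribution
    candidates = heapq.nlargest(k_candidates, scores, key=scores.get)

    # Stage 2: fetch (or compute) feature vectors and re-rank with the learned model.
    rescored = [(ltr_model.predict(feature_store[doc]), doc) for doc in candidates]
    return sorted(rescored, reverse=True)[:k_final]
```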

2.4.4 CONTAINING LATENCY

Since large-scale IR systems are highly complex systems with many components, it can be difficult to contain the end-to-end latency. One way in which costs can be contained is by imposing service level agreements (SLAs) on the components of the search engine. A service level agreement is an agreed performance budget that must be adhered to [117]. In the context of search engines, SLAs are usually set for each stage of the search process, which ensures that the entire end-to-end latency does not negatively impact users [17]. Typically, SLAs will enforce a guarantee on the tail latency, as this best reflects the worst case observed latency for the end user. For example, consider a three stage retrieval system, where each stage begins processing once it receives input from the prior stage. If each stage had an SLA of 100ms at the 99th percentile, then, in expectation, at least 97% (0.99³ ≈ 0.97) of queries would finish within 300ms.

2.5 SUMMARY

In this chapter, we introduced some of the salient aspects of information retrieval, including a brief history of information retrieval, methods of ranking documents, how rankings can be evaluated, and an overview of the challenges that must be met by real search providers. Section 2.1 briefly explored some of the history of IR, including the rise of the world wide web, and outlined the lifetime of an information need from the perspective of a user. In Section 2.2, we explained how documents can be matched and ranked with respect to an input query, with a focus on simple bag-of-words models. We also explored how effectiveness can be improved via more expensive models that use additional signals for ranking, arriving at the current state-of-the-art learning-to-rank models. Section 2.3 focused on how retrieval models can be evaluated to ensure their quality. In particular, we explored the notion of evaluation over test collections, and detailed a number of important metrics which capture the effectiveness of rankings. Finally, in Section 2.4, we discussed some of the added complexities of end-to-end retrieval in practice, including efficiency and cost considerations. The next chapter continues to explore this thread of efficient retrieval by examining the core data structures and algorithms that are employed by large-scale IR systems for scalable retrieval over extremely large collections.

3 EFFICIENT SEARCH SYSTEMS

Up to this point, we have discussed the search process with a focus on the user, including the search process, effective approaches for ranking documents, and approaches for evaluating the effectiveness of search systems. We also outlined the need to support efficient retrieval, and discussed some considerations for supporting large-scale search. In this chapter, we focus on the underlying data structures and algorithms that are used for scalable query processing in IR systems. In particular, Section 3.1 focuses on how documents are indexed, and various ways in which indexes can be organized. Section 3.2 outlines how queries can be efficiently processed on a variety of index layouts, and the optimizations that make efficient retrieval possible. Finally, Section 3.3 examines the current state-of-the-art for reducing tail latency in large-scale search systems.

3.1 TEXT INDEXING

Before a search engine can provide search functionality over a text corpus, the corpus must be indexed. Indexing involves identifying, parsing, and representing the desired text collection appropriately, such that the underlying documents can be retrieved efficiently and effectively. Figure 3.1 highlights some of the key components of a large-scale search system and how they interact. In particular, a crawler traverses the web graph, finding new and updated pages to index, and downloads them to a local document store. Then, the indexer converts these raw documents into a representation which is suitable for efficiently matching and ranking documents. As such, search engine indexes generally store a number of statistics relating to the collection and each document, which are required by the ranking model. Upon receiving a query from the user, the query processing algorithm will access these statistics, and use them to generate a ranked results list, which can be returned to the user.

3.1.1 DOCUMENT CRAWLING AND PARSING

Crawling. The first step for a search engine is to identify documents which should be indexed. While some document collections are easily controlled, such as those that are mostly static, others evolve rapidly. The world wide web is the largest and most rapidly changing collection of data in the world, and consequently, web search providers must be able to handle this change. To this end, efficient crawlers are constantly enumerating the web graph, traversing across links inside documents to find new or modified documents [50]. The crawler must then download such documents into a local data store, so they can be indexed.

Figure 3.1: The basic organization of a web search engine. The crawler finds and downloads documents from the web, which are loaded into a document store. The indexer is notified, and can then index the document into the relevant index structures. These index structures are then accessed by the query engine to serve incoming user queries, which are received via the user interface.

Normalization and Indexing. The indexer, which translates the raw document text and metadata into a clean representation within the index, plays a crucial role in providing high quality search. Since, especially for web search, a wide range of document formats exist, it is up to the indexing module to parse the document to extract the natural language, and then normalize the text into a uniform format [67]. Document normalization may include multiple operations such as case folding, stopping, and stemming. Case folding corresponds to normalizing the case of the terms (from upper case to lower case, for example). The motivation for case folding is that the case of the term should not impact its relevance to a user. For example, the term “Coffee” would be normalized to “coffee.” Stopping involves removing terms which appear extremely frequently, as common terms are not likely to help in identifying relevant documents. Stopping also decreases the space occupancy of the index and improves the efficiency of traversal, as stopwords are not stored or looked up. However, queries composed entirely from stopwords such as “take that” cannot be answered from a stopped index. A more flexible option is to conduct stopping selectively at query time, which allows the system to decide whether to keep the stop words or not. Another important normalization technique is stemming, which involves converting each term to its root form [115]. The idea is similar to case folding, in that variations arising in the tense of a term should not change the relevance of the term to a user. For example, the terms “listening”, “listened”, and “listener” may all be converted to the same term, “listen.” Though many stemmers have been proposed, it is generally not clear which stemmer is the best choice. For example, Hull [115] conducted a large analysis of stemming approaches, concluding that “some form of stemming is almost always beneficial.”

After a document has been parsed and the text normalized, it is ready to be added to the search engine index. Since the index is represented by integer values, as discussed shortly, each document is associated with a unique, numerical identifier, known as the document identifier or docID. As discussed above, the indexer may also compute a number of statistics which will be stored within the index to assist with ranking operations. Finally, the documents and associated statistics are propagated into the index structure to allow them to be searched efficiently.
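A minimal sketch of such a normalization pipeline is shown below; the stopword list and the crude suffix-stripping stemmer are illustrative stand-ins rather than the components a production indexer (or this dissertation) would use.

```python
STOPWORDS = {"the", "a", "an", "of", "and", "to", "is", "in"}   # illustrative subset only

def crude_stem(term):
    """Toy suffix-stripping stand-in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "er", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term

def normalize(text, stop=True, stem=True):
    terms = text.lower().split()          # case folding plus naive whitespace tokenization
    if stop:
        terms = [t for t in terms if t not in STOPWORDS]
    if stem:
        terms = [crude_stem(t) for t in terms]
    return terms

print(normalize("The listener listened to an efficient search"))
```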

3.1.2 INVERTED INDEXES

Inverted indexes are the most common type of index used for representing large document collections, as they allow fast and scalable access to a list of documents that contain a given term [273].

Core Components. Inverted indexes contain a few components. Firstly, the lexicon (or dictionary) contains an entry for each unique term, t. Each entry in the lexicon holds two values:

! ft , the number of documents containing t, and

! A pointer to the start of the corresponding postings list.

Note that this pointer may refer to a location on a hard disk, or it may refer to a location in memory. The second core component of the inverted index is the postings file, sometimes called the inverted file. The postings file stores postings lists, where there exists one postings list per unique term, t. Each list consists of:

! A list of docIDs, d, for each document containing t, and

! A list of corresponding frequencies, fd,t , which count the number of occurrences of t in d.

Usually, postings lists are conceptually imagined as a list of ⟨d, fd,t⟩ pairs, but most underlying implementations store the document identifiers and frequencies as separate lists. Another important component that is required for inverted indexes is the document map, which maps the numerical docID back to metadata about the document. The information stored for each document may vary depending on the search engine, but attributes such as the document URL, the document length, and perhaps a pointer to the document itself may be stored here. We do not discuss this structure further.

Document-Ordered Indexes. The most common inverted index organization is the document-ordered inverted index. The document-ordered index organizes its inverted lists in a monotonically increasing manner, based on the document identifiers. This layout makes it easy to implement

Doc ID   Parsed Text
1        algorithm data structure important efficient search
2        best data structure depend application data search data
3        efficient search important user experience
4        user search data
5        efficient algorithm depend efficient data structure

Term         Doc Freq   Postings List
algorithm    2          ⟨1, 1⟩ ⟨5, 1⟩
application  1          ⟨2, 1⟩
best         1          ⟨2, 1⟩
data         4          ⟨1, 1⟩ ⟨2, 3⟩ ⟨4, 1⟩ ⟨5, 1⟩
depend       2          ⟨2, 1⟩ ⟨5, 1⟩
efficient    3          ⟨1, 1⟩ ⟨3, 1⟩ ⟨5, 2⟩
experience   1          ⟨3, 1⟩
important    2          ⟨1, 1⟩ ⟨3, 1⟩
search       4          ⟨1, 1⟩ ⟨2, 1⟩ ⟨3, 1⟩ ⟨4, 1⟩
structure    3          ⟨1, 1⟩ ⟨2, 1⟩ ⟨5, 1⟩
user         2          ⟨3, 1⟩ ⟨4, 1⟩

Figure 3.2: An example text corpus and the resultant document-ordered inverted index. In this format, each postings list stores pairs of document identifier and term frequency (⟨d, fd,t⟩) in ascending order of the value of d.

efficient operators that can quickly skip across the postings lists, as the value of the document identifier being examined always increases. We explore these ideas further in Section 3.2. Another advantage of this layout is that it enables efficient index update operations, as new documents will always be appended to the end of the inverted index. Figure 3.2 shows an example document collection, and the resultant document-ordered index.
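The following sketch (not part of the original text) builds a small document-ordered index, a lexicon plus ⟨d, fd,t⟩ postings, from the toy corpus of Figure 3.2; it is a simplified in-memory illustration that omits compression and any on-disk layout.

```python
from collections import Counter, defaultdict

corpus = {
    1: "algorithm data structure important efficient search",
    2: "best data structure depend application data search data",
    3: "efficient search important user experience",
    4: "user search data",
    5: "efficient algorithm depend efficient data structure",
}

postings = defaultdict(list)          # term -> list of (docID, f_dt) pairs, docIDs ascending
for doc_id in sorted(corpus):
    for term, freq in sorted(Counter(corpus[doc_id].split()).items()):
        postings[term].append((doc_id, freq))

lexicon = {term: len(plist) for term, plist in postings.items()}   # term -> f_t

print(lexicon["data"])        # 4 documents contain "data"
print(postings["data"])       # [(1, 1), (2, 3), (4, 1), (5, 1)]
```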

Frequency-Ordered Indexes. An alternative approach for organizing the postings lists is to order them by the value of the term frequencies. In a frequency-ordered index, each postings list is organized such that documents with the most occurrences of the given term appear at the front of the list [194, 195]. That is, each postings list is sorted in decreasing order of term frequency. The main motivation of this organization is that documents with higher fd,t values are likely to yield higher scores from the ranking function, and so putting them at the front of the postings list ensures they are scored earlier during query evaluation. Since each postings list is organized around term frequencies, the frequency lists will naturally contain a number of repeated values, where the corresponding documents contain the same number of term occurrences. This redundancy can be exploited to reduce the space occupancy of the inverted index. Instead of storing a list of document identifiers and a list of frequencies, these lists can be combined to generate postings segments. A postings segment is simply a monotonically increasing list of document identifiers, and each segment has an associated metadata structure which contains:

! A single fd,t value for the documents within the segment,

! The number of documents contained in the segment, and

! A pointer to the start of the segment.

Thus, each postings list will contain one segment per unique fd,t value, the fd,t value is only recorded once, and the postings list is made up of a number of such segments. Within the postings list itself, the segments are still ordered monotonically decreasing on the frequency, as this allows the more important documents to be scored earlier. Figure 3.3 shows a simplified version of this layout, where each segment is represented as an fd,t value followed by a monotonic list of document identifiers. In this example, the terms “data” and “efficient” contain two segments, and the remaining terms only contain a single segment.
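To illustrate the segment layout, the following sketch (not part of the original text) converts a document-ordered postings list into frequency-ordered segments; it assumes the in-memory postings format from the previous sketch and ignores compression.

```python
from collections import defaultdict

def to_frequency_ordered(postings_list):
    """Group a document-ordered list of (docID, f_dt) pairs into segments:
    one (f_dt, [ascending docIDs]) segment per distinct frequency,
    with segments sorted by decreasing f_dt."""
    segments = defaultdict(list)
    for doc_id, f_dt in postings_list:
        segments[f_dt].append(doc_id)
    return [(f_dt, sorted(docs)) for f_dt, docs in sorted(segments.items(), reverse=True)]

print(to_frequency_ordered([(1, 1), (2, 3), (4, 1), (5, 1)]))
# [(3, [2]), (1, [1, 4, 5])] -- the "data" postings list from Figure 3.3
```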

Impact-Ordered Indexes. The key observation in frequency-ordered indexes is that, by sorting documents in descending order of their term frequencies, the documents that are likely to incur the largest score contributions from the ranking function, W, will be examined earlier in the retrieval process. Anh et al. [9] noted that this idea can be taken even further by pre-computing the impact score, id,t = W(d, t), for each posting in the index, and then storing this impact value in lieu of the term frequency. The advantage of this method is two-fold. Firstly, the postings can now be sorted by their true score contributions, not just an estimation based on term frequency. Secondly, at query time, the impact function W does not need to be computed — instead, the impact is simply retrieved from the index. However, a clear drawback of this method is that most useful instantiations of W compute floating point values which cannot be stored efficiently. To remedy this problem, quantization can be applied to the impact values [9, 184]. Score quantization approximates the value of each impact via a b-bit integer, allowing each impact to be represented by a score in the range [1, 2^b − 1]. For example, consider the uniform quantization approach, which aims to map the index-wide impacts into fixed buckets. Given some impact i, it will be represented as a bucket by computing
\[
\left\lfloor \frac{2^{b} \cdot (i - i_{\mathrm{Min}})}{i_{\mathrm{Max}} - i_{\mathrm{Min}} + \varepsilon} \right\rfloor ,
\]

Doc ID   Parsed Text
1        algorithm data structure important efficient search
2        best data structure depend application data search data
3        efficient search important user experience
4        user search data
5        efficient algorithm depend efficient data structure

Term         Seg Cnt   Doc Freq   Postings List
algorithm    1         2          [1: 1 5]
application  1         1          [1: 2]
best         1         1          [1: 2]
data         2         4          [3: 2] [1: 1 4 5]
depend       1         2          [1: 2 5]
efficient    2         3          [2: 5] [1: 1 3]
experience   1         1          [1: 3]
important    1         2          [1: 1 3]
search       1         4          [1: 1 2 3 4]
structure    1         3          [1: 1 2 5]
user         1         2          [1: 3 4]

Figure 3.3: An example text corpus and the resultant frequency-ordered inverted index. In this format, each postings list stores a list of segments. Each segment contains a frequency, fd,t, and is followed by the list of documents that have the corresponding frequency value (written here as [fd,t: documents]). Within each segment, documents are stored monotonically increasing by their identifiers, and segments are stored monotonically decreasing on the fd,t values. Note that an impact-ordered index follows the same conventions, but instead of storing fd,t values, quantized impact scores are stored instead.

where iMin and iMax are the smallest and largest observed values of id,t across the entire index. While other quantization methods have been explored, the most important detail is to set b sufficiently large, which minimizes the precision lost during quantization [66]. Assuming we have scored and quantized each posting in the inverted index, the impact-ordered index layout is organized the same way as the frequency-ordered index. That is, each postings list contains a number of segments, and the segments within each list are sorted in descending order of the quantized scores (in lieu of the term frequencies).
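A minimal sketch of the uniform quantization rule above follows; the choice of b = 8 and the clamping of the lowest bucket to 1 (so that scores stay within the stated [1, 2^b − 1] range) are assumptions about edge handling.

```python
def quantize(impact, i_min, i_max, b=8, eps=1e-9):
    """Map a floating-point impact into one of 2^b - 1 integer buckets (uniform quantization)."""
    bucket = int((2 ** b) * (impact - i_min) / (i_max - i_min + eps))
    return max(1, min(bucket, 2 ** b - 1))      # keep quantized scores in [1, 2^b - 1]

impacts = [0.3, 1.7, 4.2, 9.9]
i_min, i_max = min(impacts), max(impacts)
print([quantize(i, i_min, i_max) for i in impacts])
```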

3.1.3 ALTERNATIVE INDEX ORGANIZATIONS

While we have covered the basics of inverted indexes, there are many extensions or alternatives that have been examined in prior research. We briefly outline some of the main alternatives and their trade-offs with traditional inverted indexes. However, we do not consider these indexes further in this dissertation.

Positional Indexes. The positional index is an extension to the inverted index described previously. Instead of recording a document identifier and the number of occurrences of a given term in that document, positional indexes also record where the terms appeared in the document. This allows partial or exact phrases to be matched by the search engine [260], and is crucial to the proximity-based ranking models described in Section 2.2.3. The clear disadvantage of positional indexes is that they are several times larger than their non-positional counterparts.

Forward Indexes. Where the inverted index stores a list of documents that contain a given term, the forward index opts to store a list of terms found within each document. These indexes are useful for conducting operations on a list of documents, but are not scalable for document retrieval, as there is no efficient way to find out which documents contain a given term. Like inverted indexes, forward indexes may or may not preserve term position information [12].

Signature Files. Signature files are probabilistic, and are radically different to the other types of indexes that have been described thus far. The idea with signature files is to represent each document with an n-bit signature [91]. In order to calculate a document signature, each term in the document is hashed (multiple times), which sets a series of bits within the signature. To run a query, the query terms are hashed and a new signature is created. Then, the query signature can be matched against each document signature using basic Boolean operations. While Zobel et al. [274] showed signature files to be larger, slower, and less flexible than inverted indexes, recent improvements have seen a new breed of index files emerge [104, 105]. Geva and de Vries [104] proposed TopSig, which generates document signatures from a pre-computed vector-space matrix representation. The main idea was to use dimensionality reduction to effectively represent the content of each document from its vector representation. Since TopSig is derived from a vector-space representation, it can be used for document ranking, unlike traditional signature files, and was shown to perform as well as standard BOW ranking models (like BM25 and LMDS) if signatures of sufficient size were employed. Goodwin et al. [105] showed how signature files can be used effectively in modern, large-scale IR systems, by devising a range of performance improvements that make them amenable to modern architectures. For example, the number of hash functions used for generating signatures is dynamically set on a term-by-term basis, and documents are sharded by length to improve space consumption. Their proposed system, BitFunnel, is used to process the freshly-crawled tier of the Bing web search engine by providing fast and approximate Boolean conjunctions on incoming queries.

Boolean Indexes and Key-Value Stores. Beyond the classic inverted index as described above, inverted indexes can be optimized for efficient query processing over Boolean queries. In these indexes, only the document identifiers need to be stored, and postings lists may be stored as bitvectors for rapid Boolean query processing [126]. These types of search systems are suitable for processing large and complex Boolean expressions [93], which may arise in certain scenarios [32]. Once a set of candidate documents is found, the document text and metadata can be accessed via forward indexes or efficient key-value stores.

3.1.4 COMPRESSED REPRESENTATIONS

To reduce the storage consumption of the inverted index, compression algorithms are generally applied to the postings lists during indexing. Compression is an extremely well researched area, with many decades of innovation leading to extremely fast and small compression schemes. Here, we provide an overview of how space occupancy can be minimized for text indexes.

Delta Encoding. Given a monotonically increasing sequence of integers, the storage cost can be decreased by storing the delta gaps (d-gaps) between the integers rather than the integers themselves. This is because each gap, by definition, must be smaller than the original integer it represented. Since the sequence of document identifiers in a document-ordered postings list is guaranteed to be monotonic, it can be delta encoded prior to compression. The same idea applies to each segment of a frequency- or impact-ordered index. Formally, given a monotonically increasing sequence of integers ⟨x1, x2, ..., xn⟩, the delta encoded sequence is defined as ⟨x1, δ2, ..., δn⟩, where δk = xk − xk−1. To decode element xj, the prefix sum must be taken up to element xj as
\[
x_j = x_1 + \sum_{i=2}^{j} \delta_i .
\]

Since accessing element xj involves computing the prefix sum, a block of delta encoded integers is typically batch decoded and stored in its original form until the block has been processed. Furthermore, recovering the original sequence from the delta encoding can be accelerated through special CPU instructions, known as single instruction, multiple data (SIMD), which can perform operations on multiple data points using a single CPU instruction [144].
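The following sketch (not part of the original text) shows d-gap encoding and prefix-sum decoding exactly as defined above, without the SIMD acceleration mentioned in the text.

```python
def delta_encode(ids):
    """<x1, x2, ..., xn>  ->  <x1, d2, ..., dn> with d_k = x_k - x_{k-1}."""
    return [ids[0]] + [ids[k] - ids[k - 1] for k in range(1, len(ids))]

def delta_decode(gaps):
    """Recover the original identifiers via a running prefix sum."""
    out, running = [], 0
    for gap in gaps:
        running += gap
        out.append(running)
    return out

docs = [1, 3, 4, 5, 9]
print(delta_encode(docs))                          # [1, 2, 1, 1, 4]
print(delta_decode(delta_encode(docs)) == docs)    # True
```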

Compression Methods. A wide range of compression methods exist with many different time-space trade-offs, and researchers continue to invent new schemes that push the frontiers of time and space. While the problem of compression is orthogonal to the work presented in this dissertation, we briefly outline a few of the main compression families, the general ideas they use for efficiently representing integer sequences, and an outline of the current state-of-the-art in inverted index compression. Table 3.1 shows a number of compression codecs which are compared in Figure 3.4, organized into families. These families are briefly described now.

Codec                Description
VByte [237]          First byte-aligned code, still commonly used based on its simplicity.
Varint-GB [81]       Extends VByte by organizing data differently to speed up decoding.
Varint-G8IU [233]    Extends Varint-GB to use SIMD instructions for rapid decoding.
Masked-VByte [203]   Extends VByte to use SIMD instructions for rapid decoding.
Stream-VByte [145]   Uses a different organization of data, optimized with SIMD, for rapid decoding.
Opt-VByte [201]      Hybrid encoder selects between using VByte or bitvectors.
Simple16 [269]       Packs a variable number of integers into fixed 32-bit words.
QMX [242, 244]       Packs a variable number of integers into fixed 128-bit words.
Opt-PFOR [263]       Space optimal version of PFOR that uses exceptions to store outlier values to avoid redundancy in the representation of the sequence.
BIC [181]            Recursively encodes monotonic sequences; very high compression rate, slow to decode.
PEF [191]            Two-level Elias-Fano encoding that partitions and exploits locality of the underlying sequence.
CPEF [199]           Improved version of PEF based on the clustering of postings lists.
ANS [89, 180]        Entropy coder based on Asymmetric Numeral Systems.
DINT [202]           Dictionary coder based on a Dictionary of INTeger sequences.

Table 3.1: A summary of some state-of-the-art compression mechanisms for inverted indexes. The table is broadly categorized into different families of compression schemes, with byte-aligned codes, word-aligned codes, patched codes, monotonic codes, entropy codes, and dictionary codes represented.

Byte-aligned codes aim to represent each integer by the minimum number of bytes required, and these schemes are generally referred to as variable byte (VByte) schemes. These approaches are simple and portable, and many variations have been described. The main idea is to encode each byte with a single continuation bit, and the remaining 7 bits used to store the payload. The continuation bit is set if the next byte makes up part of the current integer value. While the idea is simple, many improvements have been proposed, which mainly focus on improving decoding speed by either re-organizing the storage method of the VByte mechanism, introducing SIMD intrinsics, or both. An alternative to byte-aligned codes is word-aligned codes. Word-aligned codes represent a variable-length sequence of integers in a fixed-length block. These methods store information in a selector so they know how to handle the subsequent payload. The most well known word-aligned

variants are the Simple coders. QMX is another approach which has been extensively optimized for efficient decoding. Word-aligned codes are usually slower, but more space efficient, than byte-aligned codes.

One issue with packing integers into fixed-length blocks is that, if a single outlier value is present among the sequence to be compressed, the encoding scheme will need to use a suitable configuration to represent this outlier value. As such, this leads to suboptimal compression rates, as values other than the outlier will be represented by much larger integers than required. To account for this, Patched Frame-Of-Reference (PFOR) coding allows some values in a fixed-length sequence to be represented as exceptions using a secondary encoding mechanism. This reduces the impact that large outlier values can have on the effectiveness of packing the sequence, and results in better compression effectiveness.

Monotonic coders operate directly on monotonic sequences, so are not amenable to compressing delta-gaps as described previously. A wide range of these approaches exist, including binary interpolative coding (BIC), which provides extremely good compression but is very slow to decode. BIC recursively encodes a given sequence by splitting it into two parts, and encoding the middle element. Another approach is to use Elias-Fano codes to represent inverted lists [249]. The main idea behind Elias-Fano coding is to represent the sequence as a binary sequence, encoding each element with ⌈log U⌉ bits (where U is the largest integer in the sequence), and then splitting this sequence into two parts: the high bits, consisting of the first ⌈log n⌉ most significant bits, and the remaining low bits. These two parts are then encoded separately, and concatenated together. For a further description of the Elias-Fano coding mechanism, we refer the interested reader to the preceding references. The Partitioned Elias-Fano (PEF) method improves on the Elias-Fano code by partitioning the underlying sequence into variable-length blocks, and using a two-level encoding process. Further improvements to PEF can be achieved by clustering the postings lists prior to partitioning, resulting in the CPEF encoder. Both PEF and CPEF improve on the standard EF encoder with respect to space occupancy, sacrificing some decoding efficiency in the process.

Entropy coders aim to represent frequently occurring patterns with a small number of bits, and rare patterns with a large number of bits. This then reduces the overall storage requirement of the underlying data. Asymmetric Numeral Systems are a recent development in entropy coding, and have been shown to give extremely effective compression rates [180]. However, ANS methods are slow to decode.

Dictionary based coders use a separate dictionary to store a pattern, and then replace the pattern in the sequence with a pointer to the dictionary entry. These have only been explored very recently for encoding postings lists, but seem to be a promising alternative to the other families described above [202].
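As a concrete example of the byte-aligned family, here is a minimal VByte sketch following the description above (a 7-bit payload per byte plus a continuation bit); the exact bit convention varies between implementations and is an assumption here.

```python
def vbyte_encode(values):
    out = bytearray()
    for v in values:
        while v >= 128:
            out.append((v & 0x7F) | 0x80)    # continuation bit set: more bytes follow
            v >>= 7
        out.append(v)                        # final byte of this integer: continuation bit clear
    return bytes(out)

def vbyte_decode(data):
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:                      # continuation bit set: keep accumulating
            shift += 7
        else:                                # last byte of this integer
            values.append(current)
            current, shift = 0, 0
    return values

gaps = [1, 2, 1, 300, 70000]
print(vbyte_decode(vbyte_encode(gaps)) == gaps)   # True
```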

Selecting a Compression Codec. Figure 3.4 shows the current state-of-the-art in postings list compression, with respect to conjunctive query processing efficiency and index space consumption.1 The experiment shown involves computing the conjunction of each query across the Gov2

1We thank Giulio Pibiri for providing the raw data and allowing us to reproduce this figure. 3.1. Text Indexing ! 33

Figure 3.4: The current state-of-the-art in compression codecs, adapted from Pibiri [198]. This figure shows the time and space trade-off for processing conjunctive queries on the Gov2 collection using a random sample of non-selective and selective TREC 2005 and 2006 efficiency track queries.

collection for both non-selective and selective queries. Non-selective queries refer to those in which the ratio of matching documents between the Boolean conjunction and disjunction is large (10% in this experiment). On the other hand, selective queries have a low ratio between conjunctive and disjunctive matches (0.5%). As such, processing non-selective queries involves more sequential access across the underlying postings, whereas processing selective queries induces more skipping across the postings lists. Selecting the best codec therefore depends greatly on the specific type of processing that is required. If decompressing many postings is required, as is the case with non-selective queries, then fast sequential decoding schemes will result in better efficiency, but generally with a higher space occupancy. If the underlying query type will generate a lot of skipping, the time/space trade-off changes considerably. A recent study from Mallia et al. [175] had similar conclusions, though a different set of compression schemes was explored. In any case, it is important to highlight that the best compression codec ultimately depends on the requirements of the given search system. We refer the reader to the Ph.D. dissertation of Pibiri [198] and related survey [200] for a deeper analysis of the current state of compression for inverted indexes.

Document Identifier Assignment. Another way of reducing the space consumption of an inverted index is to re-assign the underlying document identifiers. Since the document identifiers are represented as d-gaps, a reduction in the overall size of these d-gaps should allow compression codecs to achieve more compact representations. For example, consider the following collection, containing five documents and three postings lists.

L1 = ⟨1, 3, 4, 5⟩
L2 = ⟨1, 5⟩
L3 = ⟨2, 3, 5⟩

The average d-gap computed across this collection is 3.67. Now, suppose we had a re-ordering algorithm which returned a permutation

π = (1 ↦ 1, 2 ↦ 4, 3 ↦ 5, 4 ↦ 3, 5 ↦ 2),

the document identifiers can be re-assigned (and re-ordered) corresponding to π as:

L1 = ⟨1, 2, 3, 5⟩
L2 = ⟨1, 2⟩
L3 = ⟨2, 4, 5⟩

This newly re-ordered collection has an average d-gap of 2.67, hence saving around 1 bit per entry compared to the original ordering. This is the motivation behind document identifier re-assignment, and a large number of methods have been proposed [35, 58, 83, 171, 228]. Although different approaches provide different trade-offs between their computational cost and the resultant savings, they all share a common goal, which is to cluster similar documents together — by assigning documents with similar content to similar integer identifiers, the resultant d-gaps in the related postings will be small.
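The following sketch (not part of the original text) applies the permutation from the example above to the toy postings lists and prints the resulting d-gap sequences, making the reduction in gap sizes visible; the exact averaging convention behind the 3.67 and 2.67 figures is not reproduced here.

```python
def remap(postings, perm):
    """Apply a docID permutation (old -> new) and re-sort each postings list."""
    return [sorted(perm[d] for d in plist) for plist in postings]

def dgaps(plist):
    """d-gap sequence for one postings list (first identifier kept verbatim)."""
    return [plist[0]] + [plist[k] - plist[k - 1] for k in range(1, len(plist))]

original = [[1, 3, 4, 5], [1, 5], [2, 3, 5]]
perm = {1: 1, 2: 4, 3: 5, 4: 3, 5: 2}
reordered = remap(original, perm)

print(reordered)                        # [[1, 2, 3, 5], [1, 2], [2, 4, 5]]
print([dgaps(p) for p in original])     # [[1, 2, 1, 1], [1, 4], [2, 1, 2]]
print([dgaps(p) for p in reordered])    # [[1, 1, 1, 2], [1, 1], [2, 2, 1]]
```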

One of the most simple and effective methods for re-assigning document identifiers is to lexicographically sort the document URLs, and assign document identifiers according to this ordering [228]. The reason that URL ordering works well is that it clusters websites with similar URLs together, which is a reasonable proxy for document similarity. One drawback to this approach is that it is only suitable where URLs are available.

An alternative method which uses document content to cluster similar documents together is the MinHash ordering [35]. MinHash generates a document signature via hashing, and then sorts the documents by their signatures. The hash function which generates each document signature operates on the terms present in the document, and hence, sorting the documents based on these signatures is actually clustering documents together based on an approximation of their content. Improved versions of MinHash use multiple hash functions, generating better signatures, which improves clustering [58].

Property       Document-ordered    Frequency-ordered    Impact-ordered
Query Types    Boolean, ranked     Ranked               Ranked
Updates        Yes, append         Non trivial          Non trivial
Compression    Block-based         Variable-length      Variable-length
Payload        Term frequencies    Term frequencies     Impact scores
Traversal      DAAT or TAAT        TAAT                 SAAT

Table 3.2: A brief summary of some key aspects of various postings list layouts. Document-ordered postings lists are more flexible than either frequency- or impact-ordered postings lists, as they allow efficient Boolean matching and are easy to update.

Recently, Dhulipala et al. [83] proposed an approach called BP for re-ordering graphs using recursive graph bisection. Instead of trying to cluster similar documents together using heuristic methods, the BP algorithm directly optimizes the global space occupancy of the d-gaps within the collection. BP involves recursively dividing the collection into two halves, and moving documents between each partition if the overall storage cost is reduced. Experimental analysis showed impressive performance gains against a number of competitors across a number of collections, but focused primarily on the compression of large graph structures. Mackenzie et al. [171] revisited the work of Dhulipala et al. [83] with a specific focus on inverted index compression. They showed that BP was able to outperform MinHash and URL ordering across a number of text collections and compression codecs, confirming that BP is the current state-of-the-art for compression via identifier re-assignment. However, BP is much more expensive than both URL and MinHash in practice, in terms of both computational and memory requirements, and should only be considered if a suitable amount of computational resources are available. Otherwise, URL ordering is the next-best choice, as it outperformed MinHash ordering for larger collections. An interesting side effect of document re-ordering is that it has been shown to improve the efficiency of index traversal [88, 207]. For example, Ramaswamy et al. [207] showed improvements of up to 20% when re-ordering the identifiers in the eBay search system. Ding and Suel [88] showed that document re-ordering can result in latency improvements of over 50% for ranked conjunctions and disjunctions. Thus, re-ordering of document identifiers is an important pre-processing step when building inverted indexes, for both storage consumption and query latency.

3.1.5 COMPARING INDEX LAYOUTS

Table 3.2 summarizes some of the key properties of the different index layouts, and what operations they can efficiently support. Document-ordered indexes are favorable when both Boolean and ranked queries are required, as the ordering of documents within the postings lists facilitates efficient Document-at-a-Time (DAAT) traversal, outlined in the following section. In addition, these indexes allow efficient updates by appending new documents to the tail of the relevant postings lists. Document-ordered indexes are better suited to block-based compression, as this

allows different parts of each postings list to be decompressed on-demand, which is important to facilitate efficient traversal of such postings. While document-ordered indexes generally store term frequency data, they can be modified to store impact scores. On the other hand, frequency- and impact-ordered postings are primarily equipped to support ranked retrieval only, as there is no efficient way of finding a document with a given identifier. As such, these layouts are suited only for Term-at-a-Time (TAAT) or Score-at-a-Time (SAAT) traversal, respectively. Updating these indexes is not trivial, as an update may require a value to be added to the middle of a postings segment. Since document segments are of arbitrary lengths, the compression schemes used to compress them should be able to handle any variable length partition. The key difference between frequency- and impact-ordered postings is the payload — frequency-ordered indexes store the term frequencies, while impact-ordered indexes store pre-computed impact scores. Hence, impact-ordered indexes cannot change the underlying ranking function (or parameters of the function) without re-computing all impact scores. Thus, from a flexibility standpoint, document-ordered indexes are likely to provide a greater advantage than frequency-ordered indexes, which are in turn more flexible than impact-ordered indexes. While there is no modern comparison on the resultant sizes of the indexes generated by document-ordered and frequency- or impact-ordered postings, they would be expected to use around the same amount of space, as the statistics stored within either index format are relatively similar. We explore this further in Chapter 4.

3.2 QUERY PROCESSING

With potentially tens-of-thousands of incoming queries per second, query processing systems must be efficient and scalable. In this section, we elaborate on a number of strategies that can be used for efficiently processing disjunctive top-k queries on both the document- and impact-ordered index layouts described previously.

3.2.1 RANK SAFE PROCESSING

Rank safety is an important concept when discussing efficient ranked retrieval, as some traversal techniques may sacrifice accuracy for improvements in efficiency. Rank safety refers to the degree of similarity of the results generated by some candidate algorithm as compared to the results list achieved under an exhaustive evaluation. The rank safety of a given ranked retrieval process can be broadly categorized into three main outcomes, namely rank safe, set safe, and unsafe. Given any arbitrary query and the top-k ranked list that arises from exhaustively processing it, rA, a candidate top-k algorithm that generates a corresponding ranked list rB can be characterized as:

! Rank safe: if rA is identical to rB ,

! Set safe: if the set of documents in rA is identical to the set of documents in rB , or

! Unsafe: if there exists some document d′ in rA but not rB, or vice versa.
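A minimal sketch of this three-way classification, with illustrative document identifiers:

```python
def safety(exhaustive, candidate):
    """Classify a candidate top-k list against the exhaustive top-k list r_A."""
    if exhaustive == candidate:
        return "rank safe"
    if set(exhaustive) == set(candidate):
        return "set safe"
    return "unsafe"

r_a = [17, 4, 92, 8]
print(safety(r_a, [17, 4, 92, 8]))    # rank safe
print(safety(r_a, [4, 17, 92, 8]))    # set safe
print(safety(r_a, [17, 4, 92, 55]))   # unsafe
```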

3.2.2 TERM-AT-A-TIME

Preliminaries and Naïve Algorithm. Term-at-a-Time (TAAT) index traversal involves processing the postings list for a single term before considering any other query terms. As such, TAAT algorithms can be deployed on any type of postings list, since they do not require the postings to be in sorted order. Since document scores are not entirely computed at once, an additional data structure known as an accumulator table must be used to maintain the partial document scores during processing. The accumulator table must efficiently support accessing and updating the score for any arbitrary document identifier, and is usually implemented using a hashtable or array-based structure [7, 120]. In addition, a min-heap containing k elements can be used to maintain the set of highest scoring documents at any time during processing. TAAT processing begins by taking a candidate postings list. For each document in the postings list, the impact score is computed, and the value of the corresponding accumulator updated. This process is repeated for all terms in the query.
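The following sketch (not part of the original text) shows exhaustive TAAT traversal with an accumulator table and a final top-k selection; the in-memory postings format and the scoring function standing in for W are illustrative assumptions.

```python
import heapq
from collections import defaultdict

def taat(query_terms, postings, k=10, score=lambda f_dt: 1.0 + f_dt):
    """Exhaustive TAAT: process one postings list at a time, accumulating partial scores.
    `postings` maps a term to a list of (docID, f_dt) pairs; `score` stands in for W."""
    accumulators = defaultdict(float)
    for term in query_terms:
        for doc_id, f_dt in postings.get(term, []):
            accumulators[doc_id] += score(f_dt)
    # Select the k highest-scoring documents from the accumulator table.
    return heapq.nlargest(k, accumulators.items(), key=lambda item: item[1])

postings = {"efficient": [(1, 1), (3, 1), (5, 2)],
            "search": [(1, 1), (2, 1), (3, 1), (4, 1)]}
print(taat(["efficient", "search"], postings, k=3))
```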

Optimizations. Improving the efficiency of TAAT traversal usually involves what is known as early termination, where a candidate postings list will not be processed in its entirety to speed up query processing. To facilitate effective early termination, most TAAT algorithms opt to process postings in increasing order of their document frequencies, meaning that the highest impact (and generally shorter) postings lists are processed first. We now briefly outline a few fundamental optimizations that have been proposed for TAAT search. In seminal work, Turtle and Flood [248] noted that evaluating all postings exhaustively for a given query may result in poor efficiency, and examined a range of optimizations that could be applied to top-k query processing over inverted indexes. In this work, they proposed an early version of the TAAT MaxScore algorithm, which relies on using termwise upper-bound scores to avoid scoring documents which are unable to enter the top-k heap. In order to facilitate this operation, a few important values must be pre-computed or tracked during query processing: firstly, the current heap entry threshold, denoted θ, must be maintained during query processing. This value represents the kth highest document score that has been observed, and can be used to govern whether a document will enter the top-k heap or not. Secondly, the largest value of the ranking function, W, must be computed and stored for each postings list in the index. We denote

this upper-bound score for a given term t as Ut. The TAAT MaxScore approach works as follows. During processing, the kth highest scoring accumulator is tracked (θ), and at the end of processing each term, this value is compared against the sum of the upper-bounds of the terms that are yet to be scored. That is, assuming the set of terms that are yet to be processed is denoted q̂, TAAT MaxScore checks whether

    ∑_{t ∈ q̂} Ut < θ.

If this relation is true, then it means that any document with a zero score cannot make it into the top-k results list. Instead of terminating the processing at this time, TAAT MaxScore will continue

to fully evaluate each document with a non-zero accumulator value, hence returning the rank safe top-k list. This optimization was actually first proposed by Smeaton and van Rijsbergen [230] and further refined by Perry and Willett [193], but their approaches terminated processing once this relation held, rather than continuing processing, resulting in unsafe results lists. A secondary optimization is also possible, where accumulators can be pruned. For example, when scoring a document d with a partial score of S′_d, MaxScore can check whether

    S′_d + ∑_{t ∈ q̂} Ut < θ.

If this relation is true, then document d is unable to make the top-k results, and the corresponding accumulator can be removed from the accumulator table. Recent work from Fontoura et al. [94] indicated that this accumulator pruning optimization incurs a greater cost than simply leaving the accumulators in place when considering TAAT MaxScore over memory-resident indexes. Buckley and Lewit [37] explored a similar idea, where the scores for the top k + 1 documents were tracked. After a postings list was traversed, the (k+1)-th partial document score is summed with the remaining upper-bound scores, and if this sum is lower than the kth document score alone, then no new documents could enter the top-k heap, and the results can be returned. While this optimization is set safe, it does not often result in efficiency improvements, as the kth and (k+1)-th document scores are usually quite close together. Other optimizations for TAAT traversal focused on limiting or pruning the number of available accumulators, pruning entire postings lists, or partially processing frequency-ordered postings lists, which results in efficient index traversal with a loss of rank safety. Despite these optimizations, TAAT traversal has fallen out of favor in recent years as collection sizes continue to grow, and so we do not further consider TAAT traversal in this dissertation. For a more detailed overview of efficient TAAT traversal, we refer the interested reader to the survey from Tonellotto et al. [241].
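
The two checks at the heart of TAAT MaxScore can be written compactly; the following C++ fragment is a sketch of the stopping and pruning tests only, assuming the upper bounds of the remaining (unprocessed) terms are available.

#include <numeric>
#include <vector>

// Stopping test: once the upper bounds of the unprocessed terms cannot lift a
// zero-scored document past the heap threshold theta, no new documents can
// enter the top-k; only documents with non-zero accumulators need finishing.
bool no_new_candidates(const std::vector<float>& remaining_upper_bounds, float theta) {
    float remaining = std::accumulate(remaining_upper_bounds.begin(),
                                      remaining_upper_bounds.end(), 0.0f);
    return remaining < theta;
}

// Accumulator pruning test for a document with partial score partial_score.
bool cannot_make_topk(float partial_score,
                      const std::vector<float>& remaining_upper_bounds, float theta) {
    float remaining = std::accumulate(remaining_upper_bounds.begin(),
                                      remaining_upper_bounds.end(), 0.0f);
    return partial_score + remaining < theta;
}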

3.2.3 SCORE-AT-A-TIME

Score-at-a-Time (SAAT) traversal, like TAAT traversal, involves accumulating the partial scores of documents in an accumulator table during processing. However, unlike TAAT traversal, which iterates a single postings list at a time, SAAT uses an impact-ordered index to traverse a single postings segment at a time. Recall that in impact-ordered indexes, each postings list is organized around segments of document identifiers which all share the same quantized impact score. The main idea with SAAT traversal is to simply process these segments in decreasing order of their respective impacts. By doing so, the algorithm aims to quickly find the highest impact documents, and then gradually refine the document scores until all postings have been processed. SAAT traversal was first proposed by Anh et al. [9], who showed that it is a logical extension to the idea of early-termination TAAT over frequency-ordered indexes [194, 195], and to the notion of boosting the efficiency of document evaluation by pre-computing term weights [184, 194].

SAAT algorithms can return rank safe results by processing all segments for the given query. However, very good approximations can be found by processing only a subset of high impact segments, and this is what makes SAAT traversal appealing for large-scale search problems. Note that early termination of SAAT traversal is an unsafe optimization, as it does not impose any guarantees on the items in the top-k results list.

Anytime Processing. The most obvious advantage of SAAT traversal is that it is an anytime algorithm, which means result quality improves as the computation time increases [272]. Recently, Lin and Trotman [148] proposed a new SAAT search system, called JASS, which was shown to provide competitive effectiveness under various time budgets. To adhere to such time budgets, a cost model can be built offline, which is then used to determine how long it takes to process a given number of postings. This model can also determine how many postings can be processed in a given time budget. Internally, JASS keeps track of the number of postings it has processed at any given point. A trade-off parameter, ρ, is used to determine how many postings should be processed. JASS will only process the following segment if the sum of postings scored plus the number of postings contained in the following segment is less than ρ. Hence, JASS checks this stopping criterion after processing each segment, which allows it to effectively bound the time of processing. For the following description of SAAT traversal, it is convenient to view processing a postings segment in terms of an iterator or cursor over the segment. As such, we now define a few operators that are supported by the cursor.

• cursor.length() returns the number of documents in the segment.

• cursor.impact() returns i_{d,t}, the impact score for the documents in the segment.

• cursor.getdocuments() decompresses and returns the list of document identifiers which are contained in the segment.

Algorithm 3.1 describes the SAAT retrieval process. Firstly, the segments corresponding to each query term are collected, and the cursors stored in an array P. In addition, an accumulator table must be initialized, with each accumulator set to 0. Although different data structures can be used for storing accumulators, we assume there exists a single accumulator for each unique document. A min-heap of size k can be built which points into the accumulator table to track the current top-k results. Before query processing begins, the SAAT algorithm sorts the array of segments by their impact in decreasing order by calling the SortByImpact(P) procedure (line 3), which breaks ties by putting shorter segments first. The algorithm begins by looping over each segment in decreasing impact order. First, the PostingsProcessed counter is checked to decide whether processing should be terminated (lines 5–7). If the algorithm is not terminated, processing continues on line 8, where the documents in the segment are decompressed and returned in an array, ready for processing. The Impact value of the segment is retrieved on line 9.

Algorithm 3.1: SAAT processing, based on JASS.
Input: An array P of n postings metadata cursors, a pre-initialized accumulator table, Accumulators, and the number of postings to process, ρ.
Output: The top-k documents.
 1  Heap ← {}
 2  PostingsProcessed ← 0
 3  SortByImpact(P)
 4  for i ← 0 to n − 1 do
 5      if PostingsProcessed + P[i].length() ≥ ρ then
 6          break
 7      end
 8      DocumentList ← P[i].getdocuments()
 9      Impact ← P[i].impact()
10      for DocID in DocumentList do
11          Accumulators.addscore(DocID, Impact)
12          Heap.update(Accumulators, DocID)
13      end
14      PostingsProcessed ← PostingsProcessed + P[i].length()
15  end
16  return Heap

The loop on lines 10–13 iterates over each document in the segment, adds the impact score to the relevant accumulator, and updates the heap if required. Once the segment has been processed, the PostingsProcessed counter is updated (line 14), and the algorithm continues to the next segment. Once a sufficient number of postings have been processed, the outer loop will be terminated, and the heap containing the top-k results can be returned (line 16). Figure 3.5 shows a conceptual example of SAAT processing over a toy collection, which visualizes these operations.
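
For concreteness, the following C++ sketch mirrors Algorithm 3.1; the SegmentCursor type is a simplified stand-in for the cursor interface above (segments are assumed to be already decompressed), not the JASS implementation itself.

#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical segment cursor: one postings segment of an impact-ordered list,
// holding its (quantized) impact and the document identifiers it contains.
struct SegmentCursor {
    float impact;
    std::vector<uint32_t> docids;   // Assumed already decompressed for brevity.
    size_t length() const { return docids.size(); }
};

// SAAT traversal in the style of Algorithm 3.1: process whole segments in
// decreasing impact order until the postings budget rho would be exceeded.
void saat_process(std::vector<SegmentCursor>& segments,
                  std::vector<float>& accumulators, size_t rho) {
    std::sort(segments.begin(), segments.end(),
              [](const SegmentCursor& a, const SegmentCursor& b) {
                  if (a.impact != b.impact) return a.impact > b.impact;  // High impact first.
                  return a.length() < b.length();                        // Ties: shorter first.
              });
    size_t postings_processed = 0;
    for (const auto& segment : segments) {
        if (postings_processed + segment.length() >= rho) break;  // Budget reached.
        for (uint32_t docid : segment.docids) {
            accumulators[docid] += segment.impact;                // Partial score update.
        }
        postings_processed += segment.length();
    }
    // A top-k heap pointing into the accumulator table (omitted here) would be
    // maintained during, or built after, this loop.
}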

Additional Optimizations. While simple and efficient, SAAT traversal is a relatively under-explored approach for top-k search. One potential reason is that impact-ordered indexes are quite inflexible, due to the way they structure the documents within. Nevertheless, there has been some interest in SAAT search in recent years. Lin and Trotman [149] explored the effect that different compression schemes had on SAAT traversal, finding that leaving the segments uncompressed resulted in the fastest retrieval efficiency, but is impractical for real situations due to high space consumption. Elbagoury et al. [90] proposed a new retrieval paradigm called rank-at-a-time which is based on impact-ordered indexes and SAAT search. They show that score computations for SAAT retrieval can be conducted by computing Boolean intersections across impact segments in descending order. While an interesting idea, the combinatorial nature of the problem makes it impractical for large collections or long queries.

[Figure 3.5 appears here; its panes show (1) retrieving the postings and initializing the accumulators, (2) computing the processing order, and (3)–(5) processing successive segments and adding their impacts to the accumulators.]

Figure 3.5: An example of Score-at-a-Time processing. This example shows the initialization of the SAAT traversal, and the scoring of three postings segments. Note that a top-k heap that points into the accumulator table may also be maintained, but is not shown for simplicity.

Trotman and Crane [243] examined various engineering considerations that must be accounted for when creating SAAT retrieval systems on modern architectures, showing large improvements over the baseline JASS system through careful design choices.

3.2.4 DOCUMENT-AT-A-TIME

Naïve Algorithm. Document-at-a-Time (DAAT) index traversal utilizes a document-ordered index to process a single document at a time before advancing to the next document. Since the underlying postings lists are ordered by their document identifiers, DAAT retrieval also supports efficient Boolean matching. In the following discussion on document-at-a-time processing algorithms, we view a postings list as an iterator or cursor over the underlying postings list, which can provide a set of efficient operations to facilitate efficient traversal. When a cursor is opened for the first time, it is assumed to point to the first posting in the underlying postings list. We use similar notation to that of Tonellotto et al. [241]. Given a postings list cursor, the operators that are supported are described as follows.

• cursor.docid() returns the document identifier, d, of the current posting. If the postings list is exhausted, it returns MaxDocID + 1, where MaxDocID is the largest document identifier in the index.

• cursor.score() returns the similarity score for the current document (W(d, t)).

• cursor.upperbound() returns the highest score observed for the given postings list (Ut). This value is computed during indexing.

• cursor.next() moves the iterator forward to the next posting in the list.

• cursor.nextGEQ(d) moves the iterator forward to the next document that is greater than or equal to d.

To find the top-k documents for a given input query, each postings list is located. Then, a cursor is opened on each list. The smallest document identifier currently being pointed to by a cursor is considered the pivot document, which will be scored. Then, for each postings list which contains the pivot, the score for the document is computed. Next, the similarity score is checked against a min-heap of size k: if the score of the pivot document exceeds θ, then the lowest scoring document is removed, the pivot document is added, and the heap updated to maintain the min-heap property. Otherwise, the heap is not modified. Finally, each cursor that was on the pivot document is forwarded to the next document, and the process repeated until there are no candidate postings remaining.
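
The cursor abstraction and the naïve DAAT loop can be rendered as a small C++ sketch; the PostingsCursor interface below is an assumption that mirrors the operators listed above, and the loop is illustrative rather than the implementation used in this thesis.

#include <algorithm>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Assumed cursor interface over a document-ordered postings list.
class PostingsCursor {
public:
    virtual uint32_t docid() const = 0;        // Current document, or MaxDocID+1 if exhausted.
    virtual float score() const = 0;           // W(d, t) for the current document.
    virtual float upperbound() const = 0;      // U_t, computed at indexing time.
    virtual void next() = 0;                   // Advance to the next posting.
    virtual void nextGEQ(uint32_t d) = 0;      // Advance to the first docid >= d.
    virtual ~PostingsCursor() = default;
};

// Naive DAAT: fully score every document that appears in any query term's list.
std::priority_queue<std::pair<float, uint32_t>,
                    std::vector<std::pair<float, uint32_t>>, std::greater<>>
daat_naive(std::vector<PostingsCursor*>& cursors, uint32_t max_docid, size_t k) {
    std::priority_queue<std::pair<float, uint32_t>,
                        std::vector<std::pair<float, uint32_t>>, std::greater<>> heap;
    while (true) {
        uint32_t pivot = max_docid + 1;
        for (auto* c : cursors) pivot = std::min(pivot, c->docid());   // Smallest current docid.
        if (pivot > max_docid) break;                                  // All lists exhausted.
        float score = 0.0f;
        for (auto* c : cursors) {
            if (c->docid() == pivot) { score += c->score(); c->next(); }
        }
        if (heap.size() < k) heap.emplace(score, pivot);
        else if (score > heap.top().first) { heap.pop(); heap.emplace(score, pivot); }
    }
    return heap;
}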

MaxScore. Previously, we discussed a TAAT version of the MaxScore algorithm, described by Turtle and Flood [248]. In this same work, the authors also described a DAAT implementation of MaxScore, which we discuss now. As was the case in the TAAT version, DAAT MaxScore requires the current heap threshold to be tracked (θ), and relies on pre-computed termwise upper-bounds, denoted Ut. The intuition behind DAAT MaxScore is that, given a sufficiently high heap threshold, documents appearing only in low impact postings lists will not score highly enough to enter the heap. Thus, the algorithm tries to skip over these documents, resulting in fewer scoring operations, and faster traversal. Algorithm 3.2 provides the pseudo-code for MaxScore processing, and we now walk through this code. Before processing begins, the candidate postings cursors are sorted in ascending order of their Ut values,² and held in array P. The order of the postings lists in P is fixed for the remainder of processing, such that the lists are always sorted from lowest to highest impact, based on Ut. During processing, P is logically partitioned into two parts, indexed by a variable called PivotList. All lists on the left of this variable represent the non-essential lists, and lists on the right (inclusive) represent the essential lists. Any document that occurs in only the non-essential lists cannot mathematically make it into the top-k heap, so these lists are ignored when finding a candidate document for scoring.

First, the cumulative upper-bound, Ct, is computed for each successive term, which involves computing the sum of the Ut values up to the current list, inclusive, and storing it (lines 4–7). To process documents, MaxScore will only select documents from the essential lists as pivots, allowing some documents to be skipped. Once a pivot document is found, it is scored for each term in the essential lists, at which point a partial score, S′, arises (lines 13–21). Next, MaxScore continues by processing the non-essential lists. For each non-essential list, from the highest to lowest impact (line 22), MaxScore checks whether

    S′ + Ct > θ,

where Ct corresponds to the current list's cumulative upper-bound (line 23). If this relation does not hold, then the current document is not able to make it into the top-k results, and processing continues with the next document (line 24). Otherwise, the pivot document is sought in the current list (line 26), and if found, its score is computed (lines 27–29). This loop continues until either the document cannot make the heap, or the document is fully scored. In either case, the document is added to the heap if its score exceeds θ (line 32). When a new document enters the top-k heap, and the value of θ increases, the essential and non-essential lists are re-computed (lines 35–38). To re-compute the essential and non-essential lists, MaxScore finds the first list in which Ct > θ. Processing continues with the next candidate, until there are no suitable candidates remaining. Figure 3.6 shows an example of MaxScore processing, where document 8 is the next document to be scored, as it is the smallest document in the essential lists (“coffee” and “melbourne”). In the first pane, document 8 is scored across the essential lists, yielding a partial score of 2. This corresponds to lines 13–21 in Algorithm 3.2. The second pane shows the cursor being moved forward onto the next document. In the third pane, document 8 is scored against the non-essential lists. Since the partial score summed with the cumulative upper-bound of the non-essential lists (2 + 3 = 5) is less than θ, document 8 cannot make the top-k heap, and scoring is stopped (line 23 in Algorithm 3.2).

² Some implementations may sort by the length of the postings list, or the IDF value.

Algorithm 3.2: DAAT MaxScore processing.
Input: An array P of n postings cursors which are sorted increasing on their upper-bound values, the largest document identifier, MaxDocID, and the number of desired results, k.
Output: The top-k documents.
 1  Heap ← {}
 2  θ ← 0
 3  CumulativeBounds ← {}
 4  CumulativeBounds[0] ← P[0].upperbound()
 5  for i ← 1 to n − 1 do
 6      CumulativeBounds[i] ← CumulativeBounds[i − 1] + P[i].upperbound()
 7  end
 8  PivotList ← 0
 9  PivotID ← MinimumDocID(P)
10  while PivotID ≤ MaxDocID and PivotList < n do
11      Score ← 0
12      NextCandidate ← MaxDocID
13      for i ← PivotList to n − 1 do                          // Score essential lists.
14          if P[i].docid() = PivotID then
15              Score ← Score + P[i].score()
16              P[i].next()
17          end
18          if P[i].docid() < NextCandidate then
19              NextCandidate ← P[i].docid()
20          end
21      end
22      for i ← PivotList − 1 to 0 do                          // Complete scoring on non-essential lists.
23          if Score + CumulativeBounds[i] ≤ θ then
24              break
25          end
26          P[i].nextGEQ(PivotID)
27          if P[i].docid() = PivotID then
28              Score ← Score + P[i].score()
29          end
30      end
31      CurrentBound ← θ
32      Heap.push(⟨PivotID, Score⟩)
33      θ ← Heap.min()
34      if CurrentBound < θ then                               // The heap threshold increased.
35          while PivotList < n and CumulativeBounds[PivotList] ≤ θ do
36              PivotList ← PivotList + 1
37          end
38      end
39      PivotID ← NextCandidate
40  end
41  return Heap

[Figure 3.6 appears here; its four panes show document 8 being scored against the essential lists, the cursors advancing, the partial score failing the non-essential check, and the state prior to the next iteration.]

Figure 3.6: An example of MaxScore processing for retrieving the top k = 2 documents, where a given pivot document, document 8, is partially scored, but does not make it into the top-k results list. For convenience, each posting is shown as a ⟨d, i_{d,t}⟩ pair, and the non-essential lists are separated from the essential lists by a red dashed line.

The fourth pane shows the state of processing prior to the next iteration of the MaxScore algorithm. Figure 3.7 shows the continuation of this example, where document 10 is identified as the new pivot document, as it is the smallest document in the essential lists (“coffee” and “melbourne”). In the first pane, document 10 is scored across the essential lists, yielding a partial score of 4 + 2 = 6. This corresponds to lines 13–21 in Algorithm 3.2. In the second pane, the corresponding cursors are moved forward. In the third pane, document 10 is scored against the non-essential lists. Since the partial score summed with the cumulative upper-bound of the non-essential lists (6 + 3 = 9) is greater than θ, document 10 must be scored in the non-essential lists (lines 22–30). The fourth pane shows document 10 being found and scored in the “best” list, resulting in a total score of 9 (lines 27–29). The fifth pane shows the heap being updated (line 32), and the sixth pane shows the essential and non-essential lists being re-computed (lines 34–38).

While slight variations in the DAAT MaxScore approach have been described in the literature, our description of MaxScore is best aligned with the in-memory versions, as discussed by Fontoura et al. [94] and Jonassen and Bratsberg [123]. These particular implementations of MaxScore are tailored for fast in-memory index traversal, whereas the original implementation was focused on the reduction of expensive disk seeks [248].

[Figure 3.7 appears here; its six panes show document 10 being scored against the essential lists, the cursors advancing, the non-essential check succeeding, document 10 being found and scored in the “best” list, the heap being updated, and the essential lists being re-computed.]

Figure 3.7: A continuation of the example in Figure 3.6, where document 10 is identified as the pivot document. Document 10 is then fully scored, and added to the heap. Finally, the essential lists are re-computed. For convenience, each posting is shown as a ⟨d, i_{d,t}⟩ pair, and the non-essential lists are separated from the essential lists by a red dashed line.

Weighted AND. An alternative to MaxScore, based on similar principles, was first proposed by Broder et al. [36] and is known as the weak or weighted AND (WAND) algorithm. WAND was originally described as an operator which can be applied to a single document at a time as follows. Given a list of Boolean variables X1, X2, ..., Xn with associated positive weights w1, w2, ..., wn and a threshold θ̂:

    WAND(X1, w1, X2, w2, ..., Xn, wn, θ̂) = true  ⇐⇒  ∑_{i=1}^{n} xi · wi ≥ θ̂,        (3.1)

where xi is the indicator variable for Xi such that xi = 1 if Xi is true, and xi = 0 otherwise.

In practice, the Boolean variables X1,X2,...,Xn correspond to the n terms of a given query, and the indicator variables x1, x2,..., xn correspond to whether each term is contained in the given document. This operator provides the basis for the name WAND, as it can be used to implement both conjunction (AND) and disjunction (OR):

    AND(X1, X2, ..., Xn) ≡ WAND(X1, 1, X2, 1, ..., Xn, 1, n),

    OR(X1, X2, ..., Xn) ≡ WAND(X1, 1, X2, 1, ..., Xn, 1, 1).

As a further example of the ‘weak’ nature of the operator, consider the following instance, which matches any document containing at least n − 1 query terms:

    WAND(X1, 1, X2, 1, ..., Xn, 1, n − 1).
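
Viewed as code, the operator amounts to a simple weighted threshold test; the following C++ sketch is illustrative only and is not tied to any particular implementation.

#include <cstddef>
#include <vector>

// The WAND operator: true iff the weights of the Boolean variables that are
// true sum to at least theta_hat. With all weights set to 1, a threshold of n
// gives AND, a threshold of 1 gives OR, and n - 1 gives the "weak" matching
// example above.
bool wand_predicate(const std::vector<bool>& x, const std::vector<float>& w,
                    float theta_hat) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i]) sum += w[i];
    }
    return sum >= theta_hat;
}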

Generalizing the WAND operator further to ranked queries, the weight variables w1, w2, ..., wn can be substituted with the pre-computed list upper-bound scores (Ut), and the threshold θ̂ set to the score of the kth highest scoring document encountered during processing, θ. Now, prior to scoring any arbitrary document d, we first ensure that WAND(X1, U1, X2, U2, ..., Xn, Un, θ) evaluates to true. The intuition behind this check is that only documents that can potentially score higher than the heap threshold will be scored, and all other documents will be ignored. To use

WAND for DAAT traversal, the Ut value for each list must be pre-computed and stored, and the heap threshold, θ, maintained during query processing. Algorithm 3.3 shows the pseudo-code for WAND processing, as is referred to in the following walk-through.

Unlike MaxScore, which sorts the postings in ascending order of the Ut values (and keeps them sorted in this way for the remainder of processing), WAND sorts the postings such that they are in ascending order of the current document identifier under each cursor (line 3). Next, WAND starts working down through the candidate postings lists, computing the sum of the term upper-bound scores until this sum exceeds the value of θ (lines 8–18).

[Figure 3.8 appears here; its four panes show the pivot list being found by summing upper-bounds, document 9 being selected as the pivot, the first list being forwarded to the pivot with nextGEQ(), and the cursor order being restored by BubbleDown().]

Figure 3.8: An example of WAND processing for retrieving the top k = 3 documents, where document 9 is identified as the pivot. Before scoring document 9, WAND tries to seek to it in the first postings list, but finds document 10 instead. For convenience, each posting is shown as a ⟨d, i_{d,t}⟩ pair.

At this point, the document identifier under this list is set as the pivot document, and the PivotID recorded (line 11). If no pivot document is found, the algorithm is terminated (lines 19–21). Otherwise, WAND checks to ensure that all lists prior to the PivotList point to the PivotID (line 22). If so, the document will be scored, added to the heap, and the heap updated (lines 22–31). Finally, the cursors are sorted by their current identifiers (line 30). If the check at line 22 fails, it means that the current identifier of the first cursor is not pointing to the pivot, so the cursors prior to the PivotList are forwarded to the PivotID (line 36), and the order of the cursors is restored by bubbling each list downwards to its correct position.

Figure 3.8 shows an example of WAND processing. The top pane shows how the pivot list is found by summing the upper-bounds until θ is exceeded. The second pane shows the pivot document being selected, which is document 9. Since the document in the first list is 6, WAND forwards this list to the pivot using the nextGEQ( ) function, as shown in the third pane. In the final pane, the result of the BubbleDown( ) operation is shown. Note that document 6 was not evaluated, showing how WAND can skip redundant computation for faster index traversal.

Algorithm 3.3: WAND processing.
Input: An array P of n postings cursors, the largest document identifier, MaxDocID, and the number of desired results, k.
Output: The top-k documents.
 1  Heap ← {}
 2  θ ← 0
 3  SortByDocID(P)
 4  while true do
 5      UpperBound ← 0
 6      PivotList ← 0
 7      PivotID ← MaxDocID
 8      while PivotList < n do                                 // Find the pivot list and pivot document.
 9          UpperBound ← UpperBound + P[PivotList].upperbound()
10          if UpperBound > θ then
11              PivotID ← P[PivotList].docid()
12              while PivotList + 1 < n and P[PivotList + 1].docid() = PivotID do
13                  PivotList ← PivotList + 1
14              end
15              break
16          end
17          PivotList ← PivotList + 1
18      end
19      if UpperBound ≤ θ or PivotID = MaxDocID then           // No pivot. Exit.
20          break
21      end
22      if P[0].docid() = PivotID then                         // Can evaluate the pivot document.
23          Score ← 0
24          for List ← 0 to PivotList do
25              Score ← Score + P[List].score()
26              P[List].next()
27          end
28          Heap.push(⟨PivotID, Score⟩)
29          θ ← Heap.min()
30          SortByDocID(P)
31      end
32      else                                                   // Need to align lists with the pivot.
33          while P[PivotList].docid() = PivotID do
34              PivotList ← PivotList − 1
35          end
36          P[PivotList].nextGEQ(PivotID)
37          BubbleDown(P, PivotList)
38      end
39  end
40  return Heap

Block Max Variants. More recently, a line of work has explored how block-based postings can be exploited to improve the WAND and MaxScore algorithms. In particular, document-ordered postings lists are generally compressed in fixed-sized blocks, as accessing a given element will then require just the corresponding block to be decompressed, rather than requiring the entire postings list. At roughly the same time, Ding and Suel [88] and Chakrabarti et al. [53] observed that these blocks could be augmented with blockwise upper-bound information, which would allow faster top-k processing. In particular, the largest value of the scoring function, W, is stored for each block in each postings list, denoted as Ub,t, as well as the termwise upper-bound, Ut.

Ding and Suel [88] showed how these block-wise upper bounds can be used to improve the efficiency of WAND in what they call block max WAND (BMW). BMW processing begins in the same way as WAND: a pivot document is found by summing the listwise upper-bounds, Ut, until the value of θ is exceeded. However, instead of forwarding the cursors of the prior lists up to the pivot document, BMW instead conducts a shallow move, which involves forwarding the cursors to the block containing the pivot. Before decompressing the relevant blocks, BMW checks that the sum of the blockwise upper-bounds, Ub,t, also exceeds the value of θ. Since this refined check is much more accurate than the global bound estimate, it results in fewer index decompressions, and fewer scoring operations. A second optimization is possible if the blockwise upper-bound scores are not high enough to exceed θ: this means that no document in the current block configuration is able to enter the heap, and so the next pivot document must be outside of this configuration, resulting in larger skips. Chakrabarti et al. [53] used these same ideas to optimize MaxScore traversal, arriving at block max MaxScore.

Mallia et al. [173] recently revisited the BMW algorithm, and showed that variable-sized blocks can further improve both the time and space trade-offs present in the BMW algorithm, resulting in what is now the state-of-the-art for efficient top-k retrieval, the VBMW algorithm. In the case of fixed-sized blocks, the Ub,t score can be inaccurate, especially if only one single document within the entire block has a much higher score than any other document. Using variable-sized blocks allows for more accurate upper-bound information to be stored, as the effect of such outliers can be minimized. However, computing the block partitions involves offline pre-processing, making the variable-block indexes more expensive to build.

The trade-off with block max methods is that additional storage space is required to store each Ub,t value, and the subsequent processing algorithms are more complicated. However, these changes have been shown to lead to significant efficiency improvements [53, 88, 173].
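
The blockwise refinement itself is a small check; the following C++ sketch is a hypothetical illustration in which the blockwise bounds for the candidate block configuration have already been gathered via shallow moves of the cursors.

#include <vector>

// Block max refinement: having chosen a pivot using the listwise bounds U_t,
// only decompress and score the candidate blocks if their blockwise bounds
// U_{b,t} could still lift a document past the heap threshold theta. The
// block_upper_bounds values are assumed to come from per-block metadata.
bool pivot_block_may_enter_heap(const std::vector<float>& block_upper_bounds,
                                float theta) {
    float sum = 0.0f;
    for (float ub : block_upper_bounds) sum += ub;
    return sum > theta;   // False permits skipping past the current block configuration.
}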

Aggressive Traversal. So far, the DAAT dynamic pruning algorithms that have been described all return the rank safe top-k document lists. However, some situations may arise where faster retrieval is required. One way to increase the efficiency of these DAAT dynamic pruning algorithms is to artificially increase the value of θ to induce more pruning. Hence, aggressive versions of the prior algorithms can be instantiated by setting θ̂ = F · θ, where F ≥ 1.0 [36]. However, doing so results in unsafe processing, where there are no guarantees on the correctness of the documents in the top-k results. Macdonald et al. [163] examined how the value of F impacts the

efficiency and effectiveness of using WAND as a candidate generation stage in an LtR system, and demonstrated that as F increases, the traversal is biased to returning documents with smaller identifiers, as skipping increases rapidly with the heap threshold.

Threshold Estimation. Another line of work for improving dynamic pruning strategies involves estimating the value of θ before processing begins. As demonstrated by Petri et al. [196], many documents are scored in the early stages of DAAT dynamic pruning algorithms, as the low heap threshold invokes full scoring operations on most documents. As the value of θ tends towards its final value, much more skipping is observed. Hence, if an accurate estimation of the final value of θ can be predicted before processing, skipping will be maximized. Deterministic approaches, such as computing and storing the kth highest impact for each term, have been explored. In this situation, the maximum of these scores across the candidate query terms can be used to safely prime the heap threshold [80, 127], reducing latency by up to 17%. Other work explores how predictive models can be used to estimate the value of θ, and the consequences of over-estimation [197]. For example, an over-prediction of θ results in unsafe results lists, so a second-stage patching phase may be required to find results that were missed in the first pass. This prediction mechanism was shown to give improvements of up to 12% when compared to the best deterministic baseline.
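
A sketch of the deterministic primer, under the assumption that the kth highest impact of each candidate term has been computed and stored at indexing time, is shown below (C++, illustrative only).

#include <algorithm>
#include <vector>

// Deterministic threshold priming: for any query term, at least k documents
// already score at least that term's k-th highest impact, so the final heap
// threshold cannot fall below it. Taking the maximum across the query terms
// therefore gives a safe initial value for theta.
float prime_heap_threshold(const std::vector<float>& kth_highest_impact_per_term) {
    float theta = 0.0f;
    for (float value : kth_highest_impact_per_term) theta = std::max(theta, value);
    return theta;
}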

3.2.5 COMPARING INDEX TRAVERSAL APPROACHES

Despite the large range of index organizations and retrieval algorithms that have been enumerated in the previous discussion, very few studies have compared them directly. Here, we outline some of the main comparisons that have been reported, and comment on some of the gaps in knowledge that are yet to be filled from these comparisons.

On-Disk Retrieval. Carmel and Amitay [44] conducted a small study comparing TAAT and DAAT retrieval strategies in their 2006 TREC Terabyte track submission, showing that WAND [36] traversal was more efficient, effective, and scalable than early termination TAAT methods [194, 195] on the Gov2 collection. Lacour et al. [141] reviewed a number of retrieval algorithms with respect to efficiency and effectiveness on inverted indexes, paying close attention to both DAAT and TAAT methods [30, 182, 248]. They found that heuristic retrieval approaches generally performed worse than exhaustive traversal with respect to deep metrics such as AP, but were much faster. However, these studies focused on postings located on disk. With the reduced price of hardware and increases in RAM size, inverted indexes can now be stored entirely in main memory, resulting in a range of improvements for index traversal algorithms.

In-Memory Retrieval. Büttcher and Clarke [41] presented the first study in which an inverted index was hosted in main memory. They employed static index pruning to reduce the size of the index such that it could fit within main memory, and examined the efficiency improvements therein. Motivated by this work, Strohman and Croft [234] presented a deeper analysis of in-memory index traversal. They found that augmenting inverted indexes with skip information

allowed extremely efficient search while also maintaining the property of rank safety. Fontoura et al. [94] conducted a large in-memory comparison of contemporary index traversal methods, examining the naïve DAAT, WAND [36], naïve TAAT, Buckley and Lewit [37], TAAT and DAAT implementations of MaxScore [248], and a range of hybrid index traversal approaches, all of which were deployed on document-ordered inverted indexes. Perhaps the most important contribution of this work is how the authors optimize the above implementations for in-memory traversal, showing improvements of up to 60% compared to the original approaches which were all optimized to minimize disk latency. Ding and Suel [88] were the first to show the benefits possible from the block max index, with BMW outperforming WAND and other competitors by a large margin with a space overhead of around 5%. Most recently, Mallia et al. [175] examined state-of-the-art DAAT retrieval methods, including MaxScore [248], WAND [36], BMW [88], and VBMW [173]. In particular, these methods were compared for in-memory retrieval on a number of different collections, compression schemes, and document identifier assignment methods. While the VBMW algorithm proved to be extremely efficient for short queries and small values of k, the MaxScore algorithm proved to be the most efficient processing algorithm for long queries, or where the number of required candidates is large. Thus, the choice of traversal algorithm is not entirely clear cut, as it may depend on the use-case of the search system. Furthermore, there has been no comparison between SAAT systems in modern processing architectures. We examine these ideas further in this thesis.

Selecting a Ranking Model. Previously, we outlined a number of different ways in which documents can be scored in an additive unigram bag-of-words framework. In particular, we focused on vector space, probabilistic, and language models. While each method uses a different formulation to score a document, they all rely on summing per-term score contributions, computed using similar statistics. Thus, a natural question that arises is which of these formulations should be used for ranking documents? Concerning the effectiveness of these BOW models, some studies have been undertaken. For example, Bennett et al. [28] studied the differences between language models and probabilistic models, indicating that the most effective model depended on the query task. A similar study from Trotman et al. [247] concluded that there is relatively little difference between these models if the parameters are selected appropriately. Recently, Lin [147] examined the use of various rankers as baselines in IR experimentation, finding that BM25 was slightly better than their language model implementation following an exhaustive parameter sweep, further justifying the need for good parameter selection. Another important consideration is how expensive the ranking function is, and the impact it has on the efficiency of index traversal. We ran a small benchmark which evaluates the efficiency of the three candidate rankers described previously, namely BM25, DPH, and LMDS, to determine if there is any notable cost difference. As a comparison point, we also examined the cost of simply accessing and returning the document term frequency multiplied by the query term frequency,

f_{d,t} · f_{q,t}, which we denote as Basic. The benchmark was implemented in C++, and ran on a modern server with an Intel Xeon E5-2690 v3 CPU.

                        Number of Query Terms
Ranker     1      2      3      4      5      6      7      8      9     10
Basic   14.8   15.6   18.1   18.8   20.8   22.8   24.9   26.6   28.7   31.0
BM25    18.7   18.5   20.8   23.4   25.5   28.7   32.9   35.8   37.9   40.5
LMDS    69.6   83.6  108.4  126.9  144.6  180.5  187.4  206.6  227.5  253.7
DPH     45.8   71.0   98.6  122.7  154.9  182.0  214.3  245.5  266.7  295.6

Table 3.3: The time required, in nanoseconds, for a single document to be evaluated using the given ranker. BM25 is more scalable as query length increases, as it only needs to compute the IDF value once for each query term.

Data was randomly generated and stored in memory, and the average time taken across 1 million scoring operations is presented for each ranker. To test the scalability of the rankers, we conduct the experiment for queries of length 1 to 10. Table 3.3 shows the results. Interestingly, the BM25 ranker can be implemented much more efficiently in practice than either the DPH or LMDS models. The main optimization that is available to BM25 (but not the other models) can be observed in Equation 2.3. In particular, the entire IDF component for BM25 can be computed just once and cached for all subsequent scoring operations, reducing the number of expensive log and floating point operations.
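
To illustrate, the following C++ sketch caches the IDF component for a query term so that each per-posting evaluation involves only cheap arithmetic; the exact formulation and parameter values here are illustrative assumptions rather than the precise form of Equation 2.3.

#include <cmath>
#include <cstdint>

// Per-term BM25 scorer with the IDF component computed once at query time and
// cached, so each per-posting evaluation avoids the expensive log() call.
struct BM25TermScorer {
    double idf;           // Cached once: log((N - f_t + 0.5) / (f_t + 0.5)).
    double k1 = 0.9;      // Illustrative parameter values.
    double b  = 0.4;
    double avg_doclen;

    BM25TermScorer(double num_docs, double doc_freq, double avg_dl)
        : idf(std::log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5))),
          avg_doclen(avg_dl) {}

    // Called once per posting: only additions, multiplications, and a divide.
    double score(uint32_t f_dt, uint32_t doclen) const {
        double tf = (f_dt * (k1 + 1.0)) /
                    (f_dt + k1 * (1.0 - b + b * (doclen / avg_doclen)));
        return idf * tf;
    }
};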

Petri et al. [196] explored the impact that the underlying ranking function has on the pruning effectiveness of both WAND and BMW, noting that most prior studies assumed the BM25 ranker. They found that the performance of the WAND algorithm, in terms of the number of computations that were skipped, was greatly reduced when using the LMDS ranker (as opposed to BM25). Interestingly, these findings did not carry over to BMW — using block max scores resulted in a substantial reduction of the number of documents that were required to be scored, thus confirming the utility of the block max index organization. The main reason for this observation appeared to stem from the underlying score distributions generated by each ranker. That is, BM25 generated score distributions which were more favorable for facilitating dynamic pruning.

Hence, for the remainder of this thesis, we use the BM25 model as a representative BOW model, as shown in Equations 2.2–2.4. It has been shown to be effective, is cheap to compute, and has good characteristics for dynamic pruning.

3.3 TAIL LATENCY

Bounding tail latency is important for maintaining acceptable response times. Despite the im- portance of reducing and controlling tail latency in production IR environments, few academic groups have focused on this problem. We now discuss some of the key ideas relating to tail latency, and current literature on reducing tail latency.

3.3.1 CAUSES OF HIGH PERCENTILE LATENCY

In large and complex IR systems, there can be many sources of latency. For example, network latency, browser latency, and search engine processing latency can all contribute to high response times. Bai et al. [17] showed that at least 40% (but usually closer to 60%) of latency is due to the search engine itself, indicating that search engine query processing is the main factor in user-perceived response times. Dean and Barroso [82] point out many contributors to variability in processing times for distributed web search systems, including queueing, resource contention, maintenance tasks, load and power management, daemons and other background tasks, among others. These aspects all contribute to increased latency, and may result in high percentile response times. However, to the best of our knowledge, no studies have directly explored the causes of high percentile latency within the query processing components of web search engines.

3.3.2 REDUCING TAIL LATENCY

While many prior studies focused on reducing the latency of query processing, very few focused directly on tail latency. However, one key idea for reducing tail latency is to enforce a guarantee on high percentile latency by adhering to a service level agreement (SLA), as discussed in Section 2.4. Here, we explore some approaches to reducing tail latency, categorizing them by their key ideas.

Predictive Parallelization. One line of work for reducing tail latency in large-scale retrieval systems involves using parallel processing to accelerate computation. Jeon et al. [119] examined this approach in depth, arguing that simply parallelizing all queries would result in wasted resources since most queries have a low latency. On the other hand, the few long running queries that do occur can be rapidly accelerated using multi-core processing. This study shows how the long running queries can be predicted and accelerated effectively using basic features which are available prior to query execution. A follow-on study from Kim et al. [129] revisits this problem for extreme tail latency (at the 99.99th percentile). The authors propose using delayed, dynamic, selective prediction to parallelize the processing of long running queries. In particular, the prediction process is improved by firstly processing the query for a short, fixed period of time. During this time, dynamic features are collected, which help improve the prediction. The advantage of this method is that most of the short running queries will complete their processing during the initial processing period, which improves the precision of the predictor. Finally, predicted long running queries are accelerated using parallel processing.

Reducing Latency by Approximation. Another line of work has involved reducing tail latency by trading off some result effectiveness. These methods therefore exploit different regions of the trade-off space with the aim of providing the best possible experience for the user. Tonellotto et al. [240] explored the use of selective pruning to improve latency without significantly reducing search quality. The key idea is to build predictors which can identify inefficient queries, and accelerate them by increasing the aggression of the WAND algorithm. Selective pruning was also

shown to work well in situations where the processing load is high, and can be modified to allow for queueing time as well as processing latency [34]. Lin and Trotman [148] explored a similar idea in the context of SAAT retrieval systems, showing how SAAT search can effectively bound the cost of processing with some loss in effectiveness.

Scheduling and Distributed Processing. One source of added latency in large-scale systems is the idle time accrued while a query waits in the processing queue. Queue time is directly proportional to the system load: a system under extreme load will have a longer queue than a mostly idle system. Some early work examined a few different scheduling approaches and how they impact both queueing and processing time over distributed IR systems [162]. However, this work did not examine tail latency directly. Yun et al. [265] examined the problem of selecting an aggregation policy for distributed IR engines. In particular, they assume an input query will be broadcast to a number of ISNs, which will return their candidate results for aggregation. Hence, the processing latency will be bounded by the slowest ISN, which can result in a high tail latency. Using production logs from the Bing search engine, they showed that tail latency can be reduced by up to 40% without negatively impacting the utility of the search engine. A similar line of work focused on selecting reissue policies, where an input query is forwarded to a set of replicas to reduce the likelihood of a high percentile response [125].

3.4 SUMMARY AND OPEN DIRECTIONS

A large amount of work has focused on improving the efficiency and scalability of search engines, and this chapter examined some important contributions to this area of research, including efficient index representations, efficient query processing, and reducing tail latency. In Section 3.1, we introduced the process of indexing a text collection to enable efficient retrieval. A few common approaches for representing documents within inverted indexes were examined in Section 3.1.2, including document-ordered, frequency-ordered, and impact-ordered postings lists. We discussed some of the trade-offs between these index formats in Section 3.1.5, with a focus on flexibility. Document-ordered indexes were shown to offer the most flexibility, as they allow different modes of querying, and are easy to update.

Next, we examined different query processing approaches, and their affinities to the various index organizations (Section 3.2). We first introduced TAAT and SAAT traversal, and subsequent early termination optimizations. We then discussed DAAT traversal, and showed how dynamic pruning can accelerate query processing. Section 3.2.5 outlined a number of prior studies that compare the vast number of index traversal strategies, and commented on the current state-of-the-art for top-k retrieval. Finally, we examined the current knowledge on tail latency (Section 3.3) with a focus on different techniques for reducing tail latency.

There remain a number of open directions when considering the efficiency of search systems. Firstly, there has been a large focus on both TAAT and DAAT search algorithms in the past. TAAT algorithms were once considered superior, but have since been superseded by DAAT methods. However, very little attention has been paid to SAAT retrieval. In the modern multi-stage retrieval framework, where fast but approximate results are suitable for candidate generation stages of

the end-to-end search pipeline, SAAT retrieval approaches may be extremely competitive. We explore the benefits and problems associated with SAAT retrieval in Chapter 4, and leverage our findings further in Chapters 5 and 6. Secondly, despite the large focus on DAAT methods in prior studies, there is still very little understanding of their performance profiles. This problem stems from most prior efficiency works examining mean latency, and reporting results using summary statistics. However, the notion of tail latency is increasingly important for production search systems, and thus, more focus should be placed on understanding and mitigating tail latency. We examine DAAT systems further in Chapter 4, and provide a more nuanced view of their strengths and weaknesses. Thirdly, most prior works examine idealistic cases, where input queries are short, and only top-k BOW retrieval is considered. On the other hand, in real systems, queries may be rewritten into more advanced representations, possibly containing many terms. Chapter 5 proposes one such scenario, where an input query is associated with a number of related queries, which are then leveraged to improve the effectiveness of retrieval. We design and evaluate a number of processing algorithms that are suitable for processing this novel query type. A final open direction, which is currently under-explored, is the notion of trade-offs. Indeed, improving efficiency may require shortcuts which reduce the underlying effectiveness of search. As such, it is important to have tools and processes that allow such trade-offs to be quantified. Chapter 6 tackles this problem, aiming to reduce both tail latency and subsequent cost in an end-to-end multi-stage retrieval system, while also providing a framework for evaluating the trade-offs within such systems.

4 INDEX ORGANIZATION AND TAIL LATENCY

One of the most fundamental problems concerning large-scale search engines is how to efficiently find the top-k documents for a given input query. Much of the prior literature on building efficient information retrieval systems has focused on the various aspects of this problem, including indexing [273], compression [143, 191], and retrieval [36, 88, 248]. In the past, search engines employed simple matching and ranking functionality. For example, some of the first search systems supported only Boolean matching operations without ranking [229]. However, this made it difficult for users to find documents that were relevant to their information need in the increasingly large number of indexed documents. To alleviate this problem, the IR community focused on building systems that could rank documents. Many different bag-of-words approaches for estimating document-query similarity were proposed, such as the vector-space model [216, 217], probabilistic models [212], and language models [204], which all use the notions of term frequency, inverse document frequency, and inverse document length to rank documents with respect to a given query. More recently, innovations in machine learning have allowed advanced ranking models to be used within IR systems in what is known as Learning-to-Rank (LtR). LtR models use a number of features to accurately rank documents, including static document features, static query features, and more expensive query-document dependent features [164, 165]. The advantage of LtR systems is that they are able to capture relevance signals that simple term-matching approaches cannot. While shown to be much more effective than the simple bag-of-words models discussed previously, these LtR models are not efficient enough to be deployed as a standalone ranker for large-scale search systems, as extracting features for all matched documents is prohibitively expensive. Instead, LtR systems are deployed as a late-stage re-ranker in the multi-stage retrieval paradigm [57, 252], which involves breaking the end-to-end query processing into a number of distinct stages. First, the candidate generation stage retrieves a top-k list of candidate documents that match the input query [13]. Second, the features for each of the returned documents are extracted [12]. Third, the LtR model is applied to these documents, generating a new ranking. After sorting the documents based on this new ranking, the ranked list can be returned to the user.


While document ranking has matured since the days of simple bag-of-words models, the fundamental top-k retrieval problem is still at the core of modern query processing architectures, with efficient candidate generation a key requirement for large-scale IR systems. Indeed, candidate generation is often conducted via top-k bag-of-words query processing, making it as important as ever to ensure efficient top-k search. While a multitude of prior work has focused on efficient top-k retrieval, there are still very few established best practices for search practitioners. For example, Document-at-a-Time (DAAT) and Score-at-a-Time (SAAT) query evaluation represent two fundamentally different approaches to top-k retrieval. These approaches have strong affinities to different inverted index organizations, and thus, have vastly different mechanics. While both methods of processing queries have been studied extensively in the literature, there has been little work comparing the two methods. Thus, it is unclear what the trade-offs between the two approaches are. Furthermore, there is little prior work that focuses on these methods with respect to tail latency, which is considered to be as important as mean or median latency in practice [82]. In this chapter, we conduct an empirical comparison between document- and impact-ordered index layouts, and the query processing algorithms therein. We focus primarily on tail latency, but also evaluate the efficiency/effectiveness trade-offs that exist between different query processing algorithms. Furthermore, we examine SAAT retrieval methods more carefully, paying close attention to the trade-offs between the various early termination heuristics that can be applied to SAAT search systems. We also explore how parallel processing can be used within SAAT retrieval to ensure a competitive tail latency without negatively impacting search effectiveness.

Problem Outline. We study the problem of efficient and effective disjunctive top-k retrieval. Given a bag-of-words query, q, we wish to process this query across the given index structure to retrieve the results list r containing the top-k ranked documents with respect to q. The time taken to process q should be minimized while maximizing the effectiveness of r .

Our Contributions. In this chapter, we focus on the following research question:

RQ1: What are the primary causes of high-percentile latency during top-k retrieval?

In order to answer this broad research question, we explore the following sub-questions:

RQ1.1: What are the trade-offs between DAAT and SAAT retrieval methods?

RQ1.2: Which early-termination heuristics are suitable for balancing efficiency and effectiveness for SAAT retrieval?

RQ1.3: Can selective parallelization accelerate SAAT retrieval without reducing effectiveness?

To answer these questions, we first conduct an empirical analysis on the top-k candidate generation problem. In particular, we explore both the efficiency and effectiveness trade-offs that are present in generating top-k sets of candidates for two different index organizations: document-ordered, and impact-ordered. We examine the many factors that influence tail latency in candidate generation, including the size of the candidate set, the size of the collection, the length of the query, the index traversal algorithm, and whether the traversal guarantees rank safety or not (RQ1.1). To make our comparison as fair as possible, we control for a range of external confounders such as the implementation language, the underlying index and how it was normalized, as well as the compression codec. Our major findings from this empirical study can be summarized as follows:

! When considering rank safe query evaluation, DAAT approaches are more efficient than SAAT approaches, as they can leverage rank safe dynamic pruning to speed up query processing. However, the gap in performance is smaller than one might expect, considering that exhaustive SAAT processing examines more than an order of magnitude more postings than the DAAT methods.

! In both rank safe and approximate query evaluation, SAAT methods exhibit much less variance with respect to latency, and thus have more control over tail latency, than the DAAT methods.

! SAAT processing is much less sensitive to the value of k, since its processing mechanics do not rely on the top-k heap to provide pruning information. The converse is true for the DAAT dynamic pruning methods, which leverage the top-k heap to provide information that allows some query evaluations to be skipped.

! In DAAT systems, the overhead of computing which document evaluations to skip can be slower than simply doing the evaluations, especially when the ranking function is extremely cheap (as is the case for quantized indexes).

! Simply reporting mean running times is not adequate for making recommendations about the efficiency of search systems.

Next, we extend this research to focus on the trade-offs inherent with SAAT index traversal. The empirical study shows that SAAT systems can control tail latency extremely well: SAAT systems achieve this through setting a parameter, ρ, which is used to bound the number of postings processed for each query. We show that this parameter setting can have a negative impact on the effectiveness of the returned results, and explore some alternative heuristics for setting ρ which result in different efficiency/effectiveness trade-offs (RQ1.2). Furthermore, we show how selective parallelization [119] can be incorporated within SAAT search engines to reduce tail latency while keeping the system effectiveness high (RQ1.3). Our findings are summarized as follows:

! Using a fixed value of ρ for all queries can result in reduced effectiveness. This problem becomes more pronounced as the quantity of candidate postings increases, as a larger fraction of postings are not processed. This situation can occur for long queries, or queries with low IDF terms.

! We explore how dynamically setting ρ on a per-query basis impacts the efficiency and effectiveness of the search system. In particular, we set ρ by computing a fixed percentage of the candidate postings to process, which can be computed at query time. We show that the percentage-based settings result in improved effectiveness at the cost of more variance in latency.

! Intra-query parallelization can be exploited to speed up SAAT query processing. We examine a number of hybrid algorithms that can exploit parallel processing selectively, based on the predicted run time of the input query. These methods give a better efficiency/effectiveness trade-off than any of the fixed parameter systems.

We conclude our analysis with a discussion on the findings from the studies presented in this chapter, and future directions in efficient candidate generation.

4.1 RELATED WORK

The majority of the related work on efficient indexing and fast index traversal can be found in Chapter 3. For a detailed description of the mechanics of both DAAT and SAAT retrieval algorithms, we refer the reader to Section 3.2. Section 3.2.5 comments on some empirical comparisons between these algorithms in the literature. Here, we review some recent studies that focus specifically on the problem of candidate generation for large-scale search.

4.1.1 CANDIDATE GENERATION

Overview. The idea of generating a set of candidate documents as input to further re-ranking stages was initially discussed in seminal work on learning-to-rank from Burges et al. [38]. In this work, the authors describe using a real-world data set that, for each query, contains “up to 1,000 returned documents, namely the top documents returned by another, simple ranker.” While work on improving learning-to-rank frameworks for IR problems continued to gain traction, candidate generation itself was largely ignored for another five years, until Jan Pedersen’s invited talk on Query Understanding at Bing at SIGIR 2010 [192]. During this presentation, Pedersen outlined the matching and ranking pipeline used within Bing’s production search system, explaining that Bing used “a sequence of select, rank, and prune steps” during retrieval.¹ A deeper overview of this architecture was presented in a later paper from Risvik et al. [210], who outline Maguro, a core retrieval system used within the Bing search engine. Wang et al. [252] provided the first formal overview of the cascade ranking model (otherwise known as multi-stage retrieval), and showed that cascade rankers can achieve a similar effectiveness to monolithic rankers at a fraction of the cost.

¹ It is likely that other research groups were experimenting with multi-stage retrieval during the same time period, but Bing were the first to publicly outline this approach.

Efficient Candidate Generation. More recently, a large amount of work has focused on the efficiency considerations for the candidate generation stage of the cascade ranking pipeline. Asadi and Lin [10] augment postings lists with Bloom Filters [29], allowing efficient, approximate candidate generation via Boolean conjunctions. A similar idea was explored by Goodwin et al. [105], who revisit the use of signature files [274] for modern computer architectures, outlining a number of innovations that result in bit-sliced signatures that support extremely efficient Boolean conjunctions. The resulting system, known as BitFunnel, is reported to be used in the first-phase matching component of the ‘fresh’ index of the Bing search engine, which indexes and serves documents that have been recently crawled. Asadi and Lin [13] examine efficiency and effectiveness trade-offs for generating candidates, exploring plain conjunctive, ranked conjunctive, and ranked disjunctive candidate generation stages. They found that plain conjunctive candidate generation was extremely fast, but was less effective for generating candidates. On the other hand, ranked conjunctions were shown to be more efficient than ranked disjunctions while providing similar effectiveness. A more recent analysis from Clarke et al. [61] outlined how trade-offs in multi-stage architectures could be evaluated without requiring explicit relevance judgments, and confirmed the findings of Asadi and Lin [13]. This work also showed that aggressive instances of WAND can provide useful efficiency/effectiveness trade-offs for multi-stage retrieval. Other work has explored alternative first-stage rankers such as those that use supervised learning [77], using additional index structures and hybrid processing algorithms to improve the efficiency and effectiveness of candidate generation [255], and using candidate generation within question-answering retrieval systems [169].

4.2 COMPARING INDEX TRAVERSAL STRATEGIES

While comparisons between DAAT and TAAT systems have been made in prior work, no studies to date have examined SAAT traversal for efficient candidate generation. We now present our empirical comparison between DAAT and SAAT search methods.

4.2.1 SYSTEMS

For our comparison, we employ two DAAT search algorithms that represent the state-of-the-art in efficient top-k retrieval across document-ordered indexes. The first algorithm is the WAND algorithm of Broder et al. [36]. The second is the BMW algorithm of Ding and Suel [88]. For SAAT retrieval, we use the recently proposed JASS system from Lin and Trotman [148], which facilitates both exhaustive and early-termination SAAT search.

4.2.2 COMMON FRAMEWORK

In order to conduct a fair comparison between the DAAT and SAAT retrieval methods, we build a common framework that factors out many of the issues that are orthogonal to the efficiency characteristics of top-k retrieval. The following section outlines the different issues that were factored out, and justifies why it is important to do so.

Document Parsing. During document indexing, a range of decisions are made to determine how the document text is handled. For example, tokenization, stemming, and stopping all affect the resulting index, which may have a significant impact on both efficiency and effectiveness. To factor out the differences in the way in which the underlying systems parse their documents, we used the ATIRE system [246] to generate a master index for each collection. Next, the DAAT and SAAT systems each consume the master index and re-organize the data into their respective formats. Therefore, the term statistics, lexicons, postings lists, and document maps all contain the same information irrespective of the query processing system.

Score Quantization. A key difference between DAAT and SAAT strategies is that the latter is only practical with pre-computed and quantized impact scores. In contrast, DAAT strategies most commonly store term frequency information inside the postings lists, and compute document scores with respect to a given similarity function during query evaluation. Thus, a substantial difference exists in the cost of scoring a document between DAAT and SAAT systems: computing a document score within a SAAT system involves only integer additions for summing the impact scores, whereas computing the similarity function (as is done within DAAT systems) may involve executing some floating point operations. In order to factor out this difference, we modify both WAND and BMW to work directly with impact scores. During creation of the aforementioned master index, ATIRE computes the BM25 score for each posting in the index, and then quantizes all of the scores into the range [1, 2^b − 1] such that they can be represented by b bits. Following the recommendation from Crane et al. [66], we let b = 9 in our experiments. Then, instead of storing the term frequency in the postings list, the quantized impact score is stored instead. Since the WAND and BMW strategies must also store upper-bound scores to facilitate dynamic pruning, the quantized U_t and U_{b,t} values are stored instead. Now, scoring a term/document pair in the DAAT framework involves just retrieving the impact score from the frequency list.
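To make the idea of uniform score quantization concrete, the following is a minimal sketch of mapping floating-point scores into b-bit impacts in the range [1, 2^b − 1]. The function and variable names are illustrative only and do not correspond to the actual ATIRE implementation; they simply follow the uniform quantization scheme described above.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Minimal sketch of uniform score quantization: map each floating-point score
// into the integer range [1, 2^b - 1], relative to the largest score observed
// in the collection. Names are illustrative, not ATIRE's actual API.
std::vector<uint16_t> quantize_scores(const std::vector<double>& scores,
                                      double max_score, int b = 9) {
    const uint32_t levels = (1u << b) - 1;  // 2^b - 1 quantization levels
    std::vector<uint16_t> impacts;
    impacts.reserve(scores.size());
    for (double s : scores) {
        // Scale into (0, 1], then map onto [1, levels].
        uint32_t q = static_cast<uint32_t>(std::ceil((s / max_score) * levels));
        if (q == 0) q = 1;              // every posting keeps a non-zero impact
        if (q > levels) q = levels;
        impacts.push_back(static_cast<uint16_t>(q));
    }
    return impacts;
}
```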

Document Ordering. Within an inverted index, document identifiers are generally stored as a monotonic sequence of integers. Therefore, they can be easily encoded as delta-gaps to make them more compressible. Prior work has shown that reassigning the identifiers of documents within the inverted index can result in substantial gains in both storage cost and retrieval efficiency [83, 88, 171, 175, 228]. To control for this, the master index was built using a single thread, which means the document identifiers are assigned naturally, that is, according to the order in which they are encountered in the raw document corpus. For our collections, this means that the document identifiers are crawl ordered.

Compression. A huge amount of prior work has focused on index compression, resulting in a number of space- and time-efficient compression codecs. Generally speaking, compression methods that result in smaller indexes are usually slower to encode/decode [143, 175, 202], and so the selection of the index compressor has a huge impact on the latency of traversal and the space occupancy of the index. Hence, we control the compression codec for both DAAT and SAAT systems. In particular, we experimented with two different codecs. Firstly, we employed the Fast Patched Frame-Of-Reference (FastPFOR) implementation from Lemire and Boytsov [143] and corresponding optimizations [144]. While many variations of PFOR encoders exist, FastPFOR aims to balance the space occupancy of the NewPFOR method [263] while maintaining the speed of PFOR [276], and was shown to have a good decoding-time/space trade-off in practice [143]. Secondly, we experimented with the QMX codec [242, 244]. QMX is based on the simple family [5, 8] and aims to pack integers into 128-bit words efficiently via SIMD extensions. QMX has been shown to outperform the state-of-the-art SIMD-BP128 method [143] for decoding, and is competitive in terms of space occupancy with other SIMD based encoders (see Section 3.1.4). For document-ordered indexes, postings are organized around fixed-length blocks of 128 integers, both for compression and for skipping points in the WAND and BMW algorithms. Document identifiers are delta encoded prior to compression. For JASS, impact scores are stored uncompressed, since only one impact is stored for each segment. On the other hand, the impacts were compressed in the DAAT index. Note that delta encoding is not possible when compressing the impacts, as they are not guaranteed to be monotonically increasing. We ran experiments with both the FastPFOR and QMX compression schemes, and found that the compression codec did not influence our findings (other than influencing the overall time/space trade-off). Hence, we report results using only the QMX codec, and leave further analysis on this orthogonal issue for future work.
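As an illustration of the delta-gap step that precedes block compression, the following minimal sketch converts one block of ascending document identifiers into gaps. The function name is ours and the codec call is deliberately omitted; in practice the resulting gaps would be handed to whichever library (for example, FastPFOR or QMX) is in use, 128 integers at a time.

```cpp
#include <cstdint>
#include <vector>

// Sketch of delta-gap encoding for one fixed-length block of document
// identifiers prior to compression. The first identifier is stored as a gap
// from zero (or from the last identifier of the previous block); subsequent
// entries store the difference from their predecessor.
std::vector<uint32_t> delta_encode_block(const std::vector<uint32_t>& docids) {
    std::vector<uint32_t> gaps(docids.size());
    uint32_t previous = 0;
    for (size_t i = 0; i < docids.size(); ++i) {
        gaps[i] = docids[i] - previous;  // docids are strictly increasing
        previous = docids[i];
    }
    return gaps;  // pass to the block codec
}
```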

4.2.3 EXPERIMENTAL SETUP

Hardware. Experiments were conducted in memory on a Red Hat Enterprise Linux Server with two Intel Xeon E5-2630 CPUs and 256 GiB of RAM. All algorithms are single threaded and thus executed on one core of an otherwise idle machine. Indexes are stored entirely in main memory, and are warmed up prior to experimentation, which involves accessing each candidate postings list prior to query processing.

Software. We used the ATIRE system [246] for indexing, and then dumped this index into a format readable by both JASS (https://github.com/lintool/JASS) and the WAND/BMW system (https://github.com/JMMackenzie/Quant-BM-WAND). The codebase used for WAND and BMW was derived from the work of Mackenzie et al. [168] and is heavily inspired by the implementation from Dimopoulos et al. [85]. All implementations are written in C++, and compiled with GCC 6.1.

Document Collections and Indexing. We utilize three collections for our experiments: Gov2, ClueWeb09B, and ClueWeb12B (Table 2.1). All experiments used an identical underlying index generated by the ATIRE system, one for each collection. For simplicity, we kept collection processing to a minimum: for each document, all malformed UTF-8 sequences were converted into spaces, alphabetic characters were separated from numeric characters, and s-stripping stemming was applied; no additional document cleaning was performed except for XML comment and tag removal. We did not remove stopwords as these are often used in higher order term proximity feature models [155]. As previously described, we post-processed the ATIRE index into a representation appropriate to each query evaluation strategy, with the same compression techniques to the extent possible. All document scores were quantized to 9-bit values based on BM25 using the uniform quantization method [9, 66]. Although ATIRE performs multi-threaded indexing by default, we indexed the collection on a single thread in order to preserve the original document order, which can have an impact on latency. As previously discussed, we report results on indexes that are compressed via the QMX codec [242], but the choice of compression does not alter our findings.

Queries and Relevance. We employ standard TREC topics when evaluating the effectiveness of our search systems (see Table 2.1). We also use the UQV100 query variations for measuring efficiency over the ClueWeb12B collection. In terms of effectiveness, we follow best practices as outlined by Lu et al. [157]. In particular, we intentionally use shallow metrics for both of the ClueWeb collections, as computing deep recall-based metrics is not reliable due to the shallow pooling depth [157]. For Gov2, we compute RBP and AP. For ClueWeb09B, we use RBP and NDCG@20. For ClueWeb12B, we use RBP and NDCG@10. For all collections, we use a persistence value of φ = 0.8 when computing RBP.

Measuring Efficiency. We measured query latency in milliseconds, taking the average across 3 runs. To capture query variance, we primarily report results using a ‘Tukey’ boxplot, in which the box bounds the 25th and 75th percentiles and the bar inside the box denotes the median. The whiskers correspond to data within 1.5× IQR, and outliers are plotted as points. This format is used throughout the remainder of this dissertation. Space usage is measured in GB and includes only postings.

4.2.4 EXPERIMENTAL ANALYSIS

Overview: Efficiency, Effectiveness, Space Consumption. For the first experiment, we compare the different systems with respect to mean and median latency, space consumption, and effectiveness. We retrieved the top 1,000 documents using the rank safe instance of each algorithm, and the results are presented in Table 4.1. Based on just the mean and median latency, we observe that BMW generally outperforms the alternative approaches across all collections, supporting observations from prior literature [88]. We also observe that the DAAT methods have a much lower median latency than JASS on the larger ClueWeb collections, while the mean latency is much closer. This indicates that there are some queries with a high latency that bias the mean for the DAAT approaches. With regard to space consumption, it is clear that the indexes are all roughly the same size, with the impact-ordered index being slightly smaller. The BMW index is slightly larger than the WAND index, as the per-block bounds must also be stored. Finally, it is clear that the different approaches result in the same effectiveness irrespective of the collection or evaluation metric used. This validates that the underlying indexing and retrieval approaches are processing the same postings lists, supporting the fairness of the given comparison.

Gov2
System    Mean   Median  Space  AP     RBP
WAND-E    41.6   23.3    17     0.290  0.586 (0.003)
BMW-E     30.8   17.4    17     0.290  0.586 (0.003)
JASS-E    34.9   20.2    15     0.290  0.586 (0.003)

ClueWeb09B
System    Mean   Median  Space  NDCG   RBP
WAND-E    190.8  73.5    51     0.138  0.259 (0.126)
BMW-E     166.3  61.4    52     0.138  0.259 (0.126)
JASS-E    170.5  117.8   51     0.138  0.259 (0.126)

ClueWeb12B
System    Mean   Median  Space  NDCG   RBP
WAND-E    210.5  114.3   65     0.124  0.264 (0.355)
BMW-E     210.3  107.9   65     0.124  0.264 (0.355)
JASS-E    221.9  148.7   63     0.124  0.264 (0.355)

Table 4.1: Effectiveness, efficiency, and space consumption for exact query evaluation for TREC topics with k = 1,000.

Figure 4.1: Efficiency profile for the rank safe algorithms across each collection where k = 1,000. The BMW and WAND algorithms outperform the JASS algorithm on average.

Next, we plot the per-query latency in the form of a boxplot. The results are presented in Figure 4.1. As suspected, the DAAT approaches are generally more efficient than JASS, but are susceptible to higher tail latency.

Rank Safe Performance. Next, we explore the variance in latency for the rank safe systems across the ClueWeb12B collection and the UQV100 queries. To gain further insight as to how the different approaches scale with the size of the candidate set, k, and the length of the query, |q|, we break our analysis into these facets. Results are shown in Figure 4.2. Firstly, we notice that the BMW algorithm is superior when the length of the input query is short. However, for longer queries, BMW becomes slower than the WAND algorithm. The main cause for this issue is that BMW seems to be more susceptible to high tail latency. In general, the DAAT algorithms become slower as k increases. This makes sense, as the effectiveness of DAAT dynamic pruning approaches depends on the entry threshold to the top-k heap. With a larger value of k, this entry threshold is lower, which induces more scoring operations (and in turn, fewer skipping operations). Similarly, we observe all algorithms becoming slower as the length of the query increases, as the number of candidate postings generally increases with query length. Turning our attention to JASS, we observe that it is slower on average than both WAND and BMW for all lengths of query and all values of k, except for single-term queries where k = 1,000. However, it has a much tighter bound on the processing latency, as can be observed from the box-and-whisker plot. This observation teaches an important lesson in the methodology of efficiency experimentation — observing only summary statistics such as mean or median latency is insufficient for understanding the underlying performance characteristics of a system. As such, we recommend that researchers use box-and-whisker plots when visualizing latency data, and report tail latency in tables containing summary statistics.

Approximate Processing. In the prior experiments, we observed the efficiency of query processing systems that returned the rank safe results lists, and hence, achieved the maximal effectiveness for the ranking model. However, sometimes it is advantageous to trade off effectiveness for an increase in efficiency. In a multi-stage retrieval system, this idea is not new. In fact, the candidate generation phase only requires a “good enough” set of documents to be identified, such that the LtR model can push the relevant documents to the top of the ranking [13, 163]. This allows for aggressive or approximate index traversal methods to be employed, which may result in a much lower latency without impacting the precision of the results list. Thus, we are interested in comparing the performance of the different candidate generation approaches when we relax the effectiveness constraints of the candidate generation stage. To facilitate the aggressive traversal methods, we deploy the so-called aggressive or approximate methods of the WAND, BMW, and JASS algorithms, which trade effectiveness for efficiency. DAAT dynamic pruning algorithms such as WAND and BMW can be accelerated by artificially increasing the value of the score threshold, θ. This can be achieved by setting θ̂ = F · θ, where F > 1.0 and θ is the current heap threshold. By increasing the score threshold, more skipping is induced, which makes query processing faster at the cost of rank safety. SAAT systems such as JASS can be accelerated via early termination by setting a parameter, ρ, which governs the total number of postings to process before terminating the search.

Figure 4.2: Efficiency profile of the WAND-E, BMW-E, and JASS-E algorithms across ClueWeb12B and the UQV100 topics as both k and |q| are varied.

In order to determine the efficiency/effectiveness trade-off for each algorithm, we sweep the aggression parameters (F for the DAAT methods, and ρ for the SAAT method) to build a trade-off curve across the three TREC collections and topics (Table 2.1). Figure 4.3 shows the results, plotting both the median and mean latency when retrieving the top 1,000 documents. On Gov2 and ClueWeb09B, the latency of the candidate generation stage can be almost halved with respect to the exhaustive instances of each algorithm without losing effectiveness. However, the results over ClueWeb12B are less clear, with the aggressive algorithms providing much more variance than on the other collections.
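To make the two aggression controls concrete, the following minimal sketch shows the checks they drive during traversal. The function names are ours, and the surrounding query processing loops (pivot selection for the DAAT methods, segment iteration for the SAAT method) are omitted; this is an illustration of the mechanism, not the WAND/BMW or JASS source.

```cpp
#include <cstdint>

// DAAT: a candidate document can be skipped whenever the sum of the term
// upper bounds cannot beat an (artificially inflated) heap threshold.
// F > 1.0 trades rank safety for speed; F = 1.0 recovers the safe behavior.
inline bool daat_can_skip(double upper_bound_sum, double heap_threshold, double F) {
    return upper_bound_sum < F * heap_threshold;
}

// SAAT: traversal stops once processing the next segment would exceed the
// posting budget rho.
inline bool saat_should_stop(uint64_t postings_processed,
                             uint64_t next_segment_length, uint64_t rho) {
    return postings_processed + next_segment_length > rho;
}
```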

Figure 4.3: Efficiency/effectiveness trade-off for different levels of approximation across Gov2 (top left), ClueWeb09B (top right), and ClueWeb12B (bottom). Note that both mean (solid line) and median (dashed line) trade-offs are shown, and the effectiveness measure is not the same across all collections. Crosses represent the trade-off when setting ρ to 10% of |D|.

In any case, we observe that aggressive traversal can reduce latency without reducing effectiveness, making it a favorable strategy for improving the efficiency of candidate generation. Black crosses on the JASS lines represent the efficiency/effectiveness point that corresponds to the recommended heuristic value of ρ [148]. We call this parameterization JASS-A henceforth. To yield a comparable instance of WAND and BMW, we selected settings of F for both algorithms that result in the same mean effectiveness as the corresponding JASS-A algorithm, and denote these systems as WAND-A and BMW-A respectively. To better understand the efficiency profiles of the aggressive traversal techniques, we plot the corresponding boxplots for each collection. We use the same experimental setup as above, retrieving the top 1,000 documents across each collection for the corresponding TREC topics. For convenience, we also include the rank safe query processing latency values from Figure 4.1 in our analysis. The results are shown in Figure 4.4. While the aggressive DAAT methods improve in latency with respect to the exhaustive methods, they are still unable to control tail latency, with some long running queries still present.

Figure 4.4: Performance of retrieving the top 1,000 documents across three different collections with their respective TREC topics.

A failure analysis revealed that the high impact documents for these queries had large document identifiers, meaning that they were stored towards the end of their respective postings lists. Hence, the heap threshold remained low for the majority of the index traversal, leading to less dynamic pruning, more document evaluations, and more block decompressions. In contrast, JASS-A reduces both the mean and median latency, while also reducing the variance of the latency, with respect to JASS-E. This indicates that the ρ setting for JASS can be used to finely control the maximum latency of index traversal, making it very good at controlling tail latency. Since our analysis only examined the small TREC topic sets, we again turn our attention to the UQV100 topics and the ClueWeb12B collection for a more robust analysis. We run each of the aggressive algorithms across three values of k (10, 100, and 1,000), and plot the results by query length. Results are shown in Figure 4.5. Similar to the rank safe algorithms, we observe that longer queries result in a larger processing time. However, this observation only holds for JASS-A while increasing query length from 1 to 4 terms. That is, for any query longer than 4 terms, the efficiency profile of JASS-A remains mostly constant, with no added overhead. Indeed, this is by design; by fixing the value of ρ, the processing latency for JASS is kept under control. In fact, as query length increases, the variance becomes smaller for JASS, as there is a decreasing number of queries with fewer than ρ postings, making the processing time converge to a very tight range. Thus, for longer queries, JASS-A outperforms both of the DAAT algorithms. On the other hand, the DAAT methods are still superior when the input queries are short.

Figure 4.5: Efficiency profile of the WAND-A, BMW-A, and JASS-A algorithms across ClueWeb12B and the UQV100 topics as both k and |q| are varied.

Evaluating Postings. The final experiment in our empirical study explores the number of postings processed, to gain intuition on the workload of the different processing methods. Table 4.2 reports the mean and median number of postings scored when retrieving the top 1,000 documents from the UQV100 queries across ClueWeb12B. Clearly, aggressive processing allows for fewer postings to be retrieved on average, with the largest gains seen from the JASS-A approach. Another interesting aspect of this analysis is how BMW processes fewer postings than WAND, but is not more efficient. This is due to BMW spending extra time computing which documents it should skip, whereas WAND simply processes more documents and spends less time computing skips. Most interestingly, JASS processes more than an order of magnitude more postings than the equivalent DAAT methods, yet is only moderately less efficient.

Algorithm   Mean      Median
WAND-E      191,269   133,264
BMW-E       126,858   98,260
JASS-E      60.1M     47.4M
WAND-A      108,661   82,515
BMW-A       101,544   78,308
JASS-A      4.3M      4.8M

Table 4.2: The number of postings processed for the UQV100 queries on ClueWeb12B, for exact and approximate query evaluation techniques, k = 1,000.

4.2.5 DISCUSSION

We now discuss the findings from our empirical evaluation, and put them into context with more recent literature on top-k retrieval.

Modern CPU Architectures. The insight behind the performance optimizations in DAAT dynamic pruning algorithms is that documents that cannot make it into the top-k results list can be skipped. Skipping the evaluation of a document allows the computation of the underlying ranking model to be bypassed, which saves CPU cycles and results in faster traversal. However, in order to skip documents, a non-trivial amount of computation must be undertaken, especially in the block-based algorithms such as BMW. This skipping logic uses branches to control the flow of the algorithm, which can result in branch mispredictions creating pipeline stalls. Furthermore, skipping document evaluations results in non-uniform memory access patterns that may result in cache misses, adding to latency. On the other hand, the SAAT methods conduct no optimization whatsoever, relying on a tight inner processing loop to provide predictable computation, which allows the CPU to optimize instruction throughput without stalls and cache misses [31].

Score Quantization. One important distinction to make with other works is that we are operating on quantized indexes. This means that the scoring operation is simply a sum of impact scores, rather than the computation of a non-trivial relevance function such as BM25, which involves multiple floating point operations. Hence, when index quantization is employed, the cost of scoring documents is much lower than in traditional indexes, making the benefits of skipping much lower. This was directly observed in our experiments, with the BMW algorithm performing worse in some circumstances than the WAND algorithm.

Stopping. For this study, we opted to keep stopwords, as we argued that they are necessary for higher-order models that may be used in later ranking stages. However, the impact of stopping on the effectiveness of candidate generation is small, since the contribution of stopwords does not largely impact the rankings. A better approach is to simply stop the queries, which allows the postings to be kept for use in later stages while improving the efficiency of the first-stage candidate generation. Since SAAT methods process the highest impact postings first, early termination is likely to skip the scoring of stopwords. However, the equivalent DAAT approaches may suffer from processing stopwords, as more documents may be scored than necessary. This phenomenon should be examined more closely in future work.

Indexing and Ranking Considerations. While the SAAT algorithms have proved to be very good for efficient candidate generation, they have some disadvantages that may make them difficult to use in practice. Firstly, since impact-ordered indexes require all documents to be scored and quantized during index building, they are much more difficult to update. While some work has examined the effects of quantization and index layout on dynamic collections [39, 40, 42, 188], there has not been a comparison between document- and impact-ordered indexes. Secondly, by quantizing the index, alternative scoring methods cannot be applied on-the-fly. With regular document-ordered indexes, only term-frequencies are stored within the postings lists, which means that the ranker (or parameters of the ranker) can be easily changed without recomputing the entire index. Even with dynamic pruning methods, such as WAND and BMW, score upper-bounds can be estimated independently from the ranker, allowing the ranker to be changed efficiently [159, 160]. Future work should investigate these operational trade-offs that are important for real IR systems in practice. Thirdly, impact-ordered indexes are currently unable to support efficient Boolean operations. For example, both plain and ranked conjunctive queries cannot be efficiently processed over impact-ordered indexes, since each postings segment would need to be accessed in order to determine whether a match is made or not. Indeed, the accumulator table could be augmented with a secondary counter which counts the number of terms that were matched for each document, but this is likely to be slower than processing the same query across a document-ordered index. The above observations add even more nuance to our empirical analysis, but addressing them is left for future work.

Further Improvements. Since this research was conducted, a range of improvements have been made for DAAT traversal. For example, using variable-sized blocks for index traversal can result in even faster top-k retrieval performance [172, 173]. Other recent work has shown that the implementation of the underlying algorithms can have a notable impact on efficiency, with certain optimizations allowing considerable improvements to both DAAT and SAAT algorithms [127, 243]. A more recent comparison of DAAT methods [175] showed that the MaxScore dynamic pruning approach can actually outperform other approaches for long queries and when the value of k is large – a problem identified for the WAND and BMW methods in our experimentation. It would be interesting to compare MaxScore to JASS for processing long queries and for large results sets. A final aspect which is worth considering further is how document identifier reassignment changes the observations of these experiments. New methods for document identifier reassignment have shown very large improvements for both the space consumption of document-ordered indexes [83, 171], and the retrieval efficiency of DAAT algorithms [175]. However, it is unclear whether these improvements can be carried over to impact-ordered indexes.

4.3 IMPROVED HEURISTICS FOR SCORE-AT-A-TIME INDEX TRAVERSAL

In the previous experimental analysis, we examined how both DAAT and SAAT systems perform for the top-k candidate generation task. Aggressive SAAT traversal was shown to provide a tight bound on tail latency by limiting the number of postings that could be scored for each query based on a pre-defined and constant value. As such, the efficiency of SAAT retrieval was not substantially impacted by the length of the query, or by the number of results required, but was correlated instead with the number of postings that are processed. However, one issue with using a constant value is that the system will process different fractions of postings for different queries. This can make it difficult to guarantee the effectiveness of the ranking, especially as the number of candidate postings increases. Consider the following examples, based on the ClueWeb12B collection, where we set ρ to 10 million postings. Now, assume the input query is “melbourne coffee”:

! melbourne appears in 401,213 documents.

! coffee appears in 1,373,924 documents.

! The total number of candidate postings is 1,775,137.

Clearly, since ρ exceeds the total number of postings for this query, every single posting will be processed, meaning that this query was evaluated in a rank safe manner. Now consider the situation where the input query is “melbourne city view”:

! melbourne appears in 401,213 documents.

! city appears in 7,408,632 documents.

! view appears in 18,489,503 documents.

! The total number of candidate postings is 26,299,348.

In this case, less than 50% of the candidate postings will be processed. As further motivation, consider Figure 4.6, which plots the percentage of candidate postings processed for different fixed settings of ρ across varying query lengths. As query length increases, the total proportion of processed postings decreases; although SAAT traversal processes the highest impact segments first, there may be negative implications on the effectiveness of queries with a high number of postings, and there is currently little understanding of the impact that the value of ρ has on the effectiveness of top-k retrieval. We aim to address this issue by proposing new methods of setting ρ, and exploring how they impact both query processing efficiency and result effectiveness.

4.3.1 HEURISTICS FOR SCORE-AT-A-TIME PROCESSING

We now discuss some heuristic methods for setting ρ for SAAT processing, and the inherent trade-offs that arise from each of these methods. 74 ! Chapter 4. Index Organization and Tail Latency

Figure 4.6: The percentage of total candidate postings processed for varying query lengths across the UQV100 queries for four different fixed values of ρ.

Rank Safe Processing. For SAAT traversal, rank safety can be guaranteed by exhaustively processing all candidate postings. For a given query q made up of a set of n unique terms {t_1, t_2, ..., t_n} with corresponding postings lists {L_1, L_2, ..., L_n}, exhaustive processing requires setting ρ as:

\[ \rho_{\text{exhaustive}} = \sum_{i=1}^{n} |L_i|. \]

In practice, this setting will cause processing efficiency to be sensitive to the number of query terms provided, and more directly, the length of the corresponding postings lists. While there are early termination methods that can terminate processing when a new document cannot enter the top-k results list, the documents within the top-k set are not guaranteed to be ranked correctly [37]. This optimization compares the k-th score to the (k+1)-th score; if the (k+1)-th document score, summed with the remaining term upper bounds, does not exceed the k-th document score, then no new documents can enter the top-k heap. Unfortunately, this is rarely the case, and this optimization has been shown to perform poorly in practice [94].

Fixed Cost Heuristics. One way to enforce aggressive early termination is to fix the value of ρ to some constant. This heuristic setting for ρ can be expressed as:

\[ \rho_{\text{fixed}} = c, \quad \text{where } c > 0. \]

Figure 4.7: An example of JASS processing with a fixed setting of ρ. The darker postings segments represent those that have been processed, and the dashed line represents the selected value of ρ.

Figure 4.7 shows a pictorial example of this heuristic across a set of four different queries. Based on empirical observations from a set of web topics, Lin and Trotman [148] found that setting c = 0.1 · |D| yielded an efficient configuration without sacrificing much effectiveness. This setting was shown to provide efficient and effective results across many different collections [68, 148, 150]. With this heuristic, the processing time becomes largely independent of query length or the term statistics of query terms, as the algorithm will simply back out once it has processed ρ postings. This phenomenon was observed in Section 4.2.4.

Variable Cost Heuristics. Another possible approach is to set ρ based on a percentage of the candidate postings on a per-query basis:

\[ \rho_{\text{percentage}} = \frac{z}{100} \cdot \sum_{i=1}^{n} |L_i|, \quad \text{where } z > 0. \]

Using this configuration, JASS will process a variable number of postings depending on the value of z, the number of query terms, and the length of the corresponding postings lists. Figure 4.8 shows a pictorial example of this heuristic across a set of four different queries. Intuitively, as z increases, so too would the effectiveness of the traversal, whereas the efficiency would decrease. This setting has not yet been explored in the literature, and is explored in this work.
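The following minimal sketch computes ρ under the three heuristics above, directly following the formulas; the function names are ours and are not part of the JASS codebase.

```cpp
#include <cstdint>
#include <vector>

// Each entry of `list_lengths` is |L_i| for one unique query term.
uint64_t rho_exhaustive(const std::vector<uint64_t>& list_lengths) {
    uint64_t total = 0;
    for (uint64_t len : list_lengths) total += len;
    return total;                                  // process every candidate posting
}

uint64_t rho_fixed(uint64_t c) {
    return c;                                      // e.g., c = 0.1 * |D|
}

uint64_t rho_percentage(const std::vector<uint64_t>& list_lengths, double z) {
    return static_cast<uint64_t>((z / 100.0) * rho_exhaustive(list_lengths));
}
```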

4.3.2 PARALLEL EXTENSIONS TO SCORE-AT-A-TIME TRAVERSAL

In order to efficiently process queries with a large number of candidate postings, we propose a simple extension to the JASS system which enables concurrent postings traversal. Similar extensions have been proposed for DAAT dynamic pruning algorithms [215], but are more difficult to implement, as dynamic pruning effectiveness depends on tracking dynamic thresholds which are then used to avoid processing documents that can not make the top-k results. Hence, advanced threshold sharing synchronization schemes must be implemented to achieve maximal efficiency in parallel DAAT algorithms. On the other hand, SAAT traversal is much more amenable to parallelization, as the various threads need not communicate during processing.

Figure 4.8: An example of JASS processing with a percentage-based ρ, with ρ set to 50% of the candidate postings for each query. The darker postings segments represent those that have been processed, and the dashed lines represent the value of ρ for each query.

Single-Threaded Flow Control. Before outlining the parallel extensions, we reiterate how JASS processes queries in a single-threaded manner. Given a set of candidate postings lists to process, each consisting of a metadata structure followed by an ascending list of document identifiers, the first step is to extract the segment metadata and store it as a vector. This vector contains an ⟨impact, length, offset⟩ tuple for each segment in the provided set of postings lists, which stores the segment impact value, the number of elements in the segment, and the byte offset at which the segment begins in the postings file. This organization was previously discussed in the background section, and so we refer the reader to Section 3.1.2 for a reminder of this index organization. The metadata vector is then sorted in descending order of the impact values, with ties broken by length (shorter segments before longer segments). Postings processing is initiated by iterating through the sorted vector, and processing the next segment if and only if the sum of the processed postings plus the length of the candidate segment does not exceed the upper-bound number of postings to process, ρ. After each segment is processed, the processed postings counter is updated by adding the length of the segment that was just processed. Once the stopping rule is applied, a top-k heap which is maintained during processing can be iterated to efficiently return the top-k highest scoring documents.
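The following is a minimal sketch of this single-threaded flow control, assuming an illustrative SegmentMeta structure and a process_segment callback that stand in for the real index structures of the JASS codebase.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct SegmentMeta {
    uint16_t impact;   // quantized impact value shared by every posting in the segment
    uint64_t length;   // number of postings in the segment
    uint64_t offset;   // byte offset of the segment within the postings file
};

template <typename ProcessFn>
void saat_traverse(std::vector<SegmentMeta> segments, uint64_t rho,
                   ProcessFn process_segment) {
    // Highest impact first; ties broken by placing shorter segments first.
    std::sort(segments.begin(), segments.end(),
              [](const SegmentMeta& a, const SegmentMeta& b) {
                  return a.impact != b.impact ? a.impact > b.impact
                                              : a.length < b.length;
              });
    uint64_t processed = 0;
    for (const auto& seg : segments) {
        // Early termination: "back out" rather than exceed the posting budget.
        if (processed + seg.length > rho) break;
        process_segment(seg);   // decompress the segment and update accumulators
        processed += seg.length;
    }
    // The top-k heap maintained inside process_segment is then drained to
    // produce the final results list.
}
```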

Multi-Threaded Flow Control. In the multi-threaded extension of JASS, the main processing loop is modified to be thread-safe. Firstly, the metadata vector is obtained as in the single-threaded version. Next, we divide the processing load by allowing each thread P_n to process every n-th segment in the segment vector, each updating a local processed postings counter. To this end, the early termination heuristic is applied on a per-thread level rather than a global level as in the single-threaded implementation, which allows threads to avoid synchronizing a global counter.
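A minimal sketch of this strided work division follows: thread tid of num_threads handles segments tid, tid + num_threads, tid + 2·num_threads, and so on, applying the early-termination budget with a thread-local counter so that no shared counter needs to be synchronized. The per-thread budget is shown here as an even share of the global ρ (as used later for the aggressive selectively parallelized systems); the Segment type and process_segment callback are the illustrative ones from the previous sketch.

```cpp
#include <cstdint>
#include <vector>

template <typename Segment, typename ProcessFn>
void saat_worker(const std::vector<Segment>& segments, size_t tid, size_t num_threads,
                 uint64_t rho_global, ProcessFn process_segment) {
    const uint64_t local_budget = rho_global / num_threads;  // thread-local share of rho
    uint64_t processed = 0;
    for (size_t i = tid; i < segments.size(); i += num_threads) {
        if (processed + segments[i].length > local_budget) break;
        process_segment(segments[i]);
        processed += segments[i].length;
    }
}
```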

Multi-Threaded Segment Processing. Segment processing is similar to the single-threaded version, with a few additional caveats (Algorithm 4.1). Since multiple segments can be processed in parallel, there is no guarantee that an accumulator can be safely updated — other threads may be trying to update the accumulator at the same time. To rectify this issue, we associate an atomic flag with every docID.

Algorithm 4.1: Multi-threaded Segment Processing
Input: A segment metadata tuple, MetaData, and the accumulator table, Accumulators.
Output: None
1  Documents ← MetaData.decompress_segment()
2  for CurrentDoc in Documents do
3      while Accumulators.get_lock(CurrentDoc) do
4          continue
5      end
6      Accumulators.add_score(CurrentDoc, MetaData.impact())
7      Accumulators.release_lock(CurrentDoc)
8  end
9  return

When a thread wishes to update the accumulator of a particular document, it must first obtain the flag (lines 3–5), at which point the thread is safe to update the accumulator (line 6), then release the flag (line 7). The atomic flag is implemented in user space to avoid expensive kernel calls, and allows each active thread to remain on its respective processor without an expensive context switch occurring. This synchronization technique can be thought of as a lightweight spinlock. The process of ‘locking’ and ‘unlocking’ the atomic flag is indeed an atomic operation, and is therefore thread-safe.
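A C++ sketch of the guarded accumulator update, mirroring lines 3–7 of Algorithm 4.1, is shown below. The parallel-array layout and function name are illustrative; the mechanism is the std::atomic_flag spinlock described above.

```cpp
#include <atomic>
#include <cstdint>

// `accumulators` and `locks` are parallel arrays with one entry per document.
void add_impact(uint16_t* accumulators, std::atomic_flag* locks,
                uint32_t docid, uint16_t impact) {
    // Spin until the flag is acquired; test_and_set returns the previous value,
    // so the loop exits once this thread observes the flag as previously clear.
    while (locks[docid].test_and_set(std::memory_order_acquire)) {
        // Busy-wait: the critical section is a single integer addition, so a
        // lightweight user-space spinlock avoids kernel-level locking.
    }
    accumulators[docid] += impact;                      // safe: we hold the lock
    locks[docid].clear(std::memory_order_release);      // release the flag
}
```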

Managing the Accumulator Table. By default, JASS manages its accumulators using the efficient accumulator initialization method of Jia et al. [120]. The main idea of this approach is to break the global accumulator table into a number of logical subsets, each associated with a single bit within a bitvector. When an accumulator needs to be updated, the corresponding bit within the bitvector is checked. If the bit is not set, it means that the current subset of accumulators has not been cleared (and contains values from the previous query). In this case, the associated subset of accumulators are reset to 0, and the associated bit is set. At the end of query processing, the entire bitvector is cleared. Figure 4.9 shows an example of this organization, where two logical subsets have been cleared during the current processing, and four subsets are yet to be accessed or cleared. The key problem with this approach is that adding a contribution to an accumulator involves checking (and potentially setting) the corresponding bit, which is not thread safe. Indeed, synchronization methods could be applied to the reading/writing of the bit, but this will result in a performance hit (as it is much more likely to be a point of contention between multiple worker threads). For this reason, we revert to using a simple, global accumulator table, with the overhead of initializing all accumulators before processing. Using uint16_t accumulators, and the vectorized STL std::fill operation, this took 13 ± 2 ms across 100 individual benchmarks.
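For reference, the following is a minimal single-threaded sketch of the lazy accumulator initialization scheme just described; the struct, member names, and subset size are illustrative rather than taken from the JASS implementation.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// The accumulator table is split into fixed-size subsets, each guarded by one
// "clean" flag; a subset is zeroed only the first time it is touched for the
// current query.
struct LazyAccumulators {
    std::vector<uint16_t> scores;   // one accumulator per document
    std::vector<bool> clean;        // one flag per subset of accumulators
    size_t subset_size;

    void add_score(uint32_t docid, uint16_t impact) {
        const size_t subset = docid / subset_size;
        if (!clean[subset]) {
            // First touch for this query: zero the whole subset, then mark it clean.
            const size_t begin = subset * subset_size;
            const size_t end = std::min(begin + subset_size, scores.size());
            std::fill(scores.begin() + begin, scores.begin() + end, 0);
            clean[subset] = true;
        }
        scores[docid] += impact;
    }

    void finish_query() {
        // At the end of query processing only the bitvector is cleared.
        std::fill(clean.begin(), clean.end(), false);
    }
};
```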

Figure 4.9: An example of the efficient accumulator initialization method of Jia et al. [120]. The accumulator table is broken into subsets of 5 elements, each associated with a bit inside a bitvector. In this example, the regions containing documents [11..15] and [21..25] are not required to be cleared before updating the accumulators within that range.

Managing the Heap. Another major issue with concurrent accumulator access is that the top-k heap cannot be efficiently maintained. At any time, there may be an arbitrary number of threads updating unique accumulator values. If we wish to maintain a top-k heap, then the heap updates must be made thread-safe, which results in a large efficiency decay due to contention. To remedy this, we do away with the score heap entirely, and opt for iterating the accumulator table once the postings have been traversed, collecting the top-k documents, and returning them once the process is complete. This can be done in O(n log k) comparisons, where n = |D|. Prior work showed that increasing k does not greatly impact the performance of JASS, since the only practical difference in processing is the number of heap operations that will occur [68]. By removing the score heap, this is taken one step further — the value of k has no notable effect on processing efficiency since a constant overhead is added, irrespective of the value of k. Across three sets of 100 individual benchmarks with k = {10, 100, 1,000}, finding the top-k documents from the accumulator table took 71 ± 3 ms, with very little variance between the different values of k.
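A minimal sketch of this final pass — collecting the top-k documents from the full accumulator table with a size-bounded min-heap, in O(|D| log k) comparisons — is given below; the function name and return type are ours.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

std::vector<std::pair<uint16_t, uint32_t>>   // (score, docid), best first
top_k_from_accumulators(const std::vector<uint16_t>& accumulators, size_t k) {
    using Entry = std::pair<uint16_t, uint32_t>;
    // Min-heap keyed on score: the weakest of the current top-k sits on top.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (uint32_t docid = 0; docid < accumulators.size(); ++docid) {
        const uint16_t score = accumulators[docid];
        if (heap.size() < k) {
            heap.push({score, docid});
        } else if (score > heap.top().first) {
            heap.pop();
            heap.push({score, docid});
        }
    }
    std::vector<Entry> results;
    while (!heap.empty()) {
        results.push_back(heap.top());
        heap.pop();
    }
    std::reverse(results.begin(), results.end());  // highest scores first
    return results;
}
```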

Putting it Together. The general design of the concurrent JASS traversal is based on minimizing contention as much as possible. As discussed previously, JASS is efficient due to its predictable processing structure, which allows a high throughput of CPU instructions. Thus, keeping pipeline stalls to a minimum is of high importance. Figure 4.10 shows an example of how contention is mostly avoided in parallel SAAT processing: the only instance of locking occurs when adding an impact to the accumulator table. Since impacts are added to the accumulator table millions of times for a query, lightweight atomic spinlocks are used for efficiency.

4.3.3 EXPERIMENTAL SETUP

Hardware. Timings were performed in memory on an idle Red Hat Enterprise Linux Server (v7.2) with two Intel Xeon E5-2690 v3 CPUs and 256GB of RAM. Each CPU has 12 physical cores with hyper-threading enabled, resulting in 48 available processing cores. Algorithms were compiled using GCC 6.2.1. All timings are reported in milliseconds unless otherwise stated, and are the average of 3 runs.

Figure 4.10: An example of JASS processing with multiple threads. In this figure, there are two threads in operation, denoted in red or blue shading. The darker postings segments represent those that have been processed, and the processed values can be seen in the accumulator table. At the current point of processing, P1 is about to add impact 10 to the accumulator for document 11, and P2 is about to add the impact 8 to the accumulator for document 18. Note that both threads have obtained the required lock.

Software. We use the original implementation of JASS (http://github.com/lintool/JASS) to run all single threaded experiments. Our multi-threaded implementation of JASS was derived from this codebase, and is made available: http://github.com/JMMackenzie/Multi-JASS. Threads were spawned using C++ STL threads, and the spinlocks are based on the STL std::atomic_flag.

Document Collections and Indexing. Experiments were conducted on the standard TREC ClueWeb12B collection, which contains 52,343,021 web documents. We used the same methodology as earlier for building the index: ATIRE indexes the raw corpus and builds the postings with 9 bit quantization [66] after applying the BM25 similarity model, and the JASS index was then compressed using the QMX compression scheme [242, 244]. Although recent work has shown that uncompressed indexes can outperform compressed indexes when using SAAT retrieval strategies, the additional space overhead for such gains can be prohibitive [149]. In any case, the findings presented here are orthogonal to the compression scheme, as the same compression scheme is used for all instances of the search system.

Queries and Relevance. In order to test our hypothesis that a fixed ρ setting reduces system effectiveness as query length increases, we use the UQV100 collection of Bailey et al. [18]. After normalization, which included s-stemming, stopping, and removal of duplicate terms within a query, 5,682 queries of varying lengths remained (Table 4.3). Note that in our analyses, we group all queries with 7 or more terms together.

Query Length   2     3      4      5    6    7+   Total
Count          620   1,744  1,881  891  363  183  5,682

Table 4.3: The number of queries for each length across the UQV100 collection. Stopwords and duplicate terms were removed from queries before processing, and 83 single term queries were dropped.

4.3.4 EXPERIMENTAL ANALYSIS

Early Termination Trade-offs. Our first experiment explores the relationship between the early termination heuristic and the effectiveness of the processed query. In order to measure the effectiveness loss, we first ran all 5,682 queries exhaustively using ρ_exhaustive. Next, we ran JASS using the percentage based heuristic ρ_percentage for z = {5, 10, ..., 95}. We also ran JASS using the fixed heuristic ρ_fixed for various values of c between 250 thousand and 75 million. Then, for each heuristic configuration, we retrieved the top-10 documents for each query, and computed the difference in NDCG@10 with respect to the exhaustive result. Next, we calculated the proportion of wins, ties, and losses that the heuristic traversal had with respect to the exhaustive traversal. Since some improvements or losses may have been small, we consider deltas ≤ 10% of the exhaustive value to be a tie. Figure 4.11 shows the density of wins, losses, and ties when comparing both fixed and percentage-based heuristics with the exhaustive run, respectively, for various query lengths. Generally speaking, as the length of the query increases, so too does the magnitude of postings that must be processed to achieve a close-to-exhaustive performance. This indicates that setting a fixed value of ρ may result in reduced effectiveness as the query length (and, more importantly, the number of candidate postings) increases. Additionally, the percentage setting appears to give a more predictable effectiveness trade-off than the fixed setting. Note that approximate processing can occasionally improve the effectiveness of the search, since the highest impact segments are processed first.

Our next experiment explores the behavior of the different heuristic settings with respect to execution time. From each heuristic, we select four configurations, and each retrieves the top-10 documents from the index. For the fixed setting, we set c as c = {5, 10, 15, 25} million postings, and for the percentage setting, we set z = {20, 40, 60, 80}. We also run an exhaustive instance of JASS. The results are shown in Figure 4.12. As expected, fixing ρ to be a constant value allows for strict control of the upper-bound processing time, whereas using percentage-based settings may result in more variance, confirming our expectations. To briefly summarize this experiment, we observe that the added control on the effectiveness trade-off provided by using percentage-based heuristics results in a loss of control on the tail latency. Conversely, fixing ρ, while reducing effectiveness, allows very strict control of the tail latency.

Impact of Threading on Efficiency. For the next experiment, we test the efficiency of our proposed parallel extension to the JASS system.

Figure 4.11: Density of wins, ties, and losses, when comparing the approximate retrieval to the exhaustive retrieval across all test queries for the fixed-postings heuristic (left) and the percentage-postings heuristic (right).

In particular, we explore how the number of threads impacts the efficiency of processing for varying query lengths. For simplicity, we assume that when processing in parallel, we will exhaustively process all candidate postings lists. We run the parallel version using between 4 and 40 threads inclusive, increasing the number of threads between each run. We then calculate the speedup as the average percentage improvement of the threaded run with respect to the exhaustive, single-threaded run, across each query length. The results are reported in Table 4.4.

Somewhat unsurprisingly, the speedup for a given number of threads generally increases as the length of the query increases. This is because the processing cost for longer queries is dominated by the postings traversal, whereas the cost for shorter queries is more often a combination of the index initialization and the accumulator traversal.

More surprising, however, is the very low rate at which the speedup increases when adding more threads. Since the workload is divided by allocating segments to threads, there may be cases where all threads are not utilized (for example, when there are 24 worker threads, but only 8 segments). This is due to the inter-segment method by which the processing is divided – a single segment will only ever be processed by a single thread. This implies that processing will be at least as slow as the slowest running segment, a potential bottleneck which may be alleviated with intra-segment processing, whereby a segment can be processed by multiple threads. We leave further exploration into this approach as future work.

Figure 4.12: Efficiency of the different heuristics for ρ across all queries. Fixed values of ρ tend to exhibit very little variance as the query length increases, whereas the percentage based heuristics are indeed impacted by query length.

Meeting Service Level Agreements. From the lessons learned in the prior experiments, we now attempt to add selective parallelization [119] to the JASS system in order to improve efficiency and effectiveness while meeting two pre-specified SLAs. We select these SLAs to be realistic for a

candidate generation stage, and hence set strict time budgets on both the 95th (P95) and 99th (P99) percentile response latency:

! P95 ≤ 200ms, and
! P99 ≤ 250ms.

As an additional quality assurance, we aim to minimize effectiveness degradation as much as possible while still meeting the target SLAs.

                    Query Length
Threads      2      3      4      5      6     7+
   4      0.58   0.91   0.64   1.11   1.13   1.24
   8      0.78   1.25   1.03   1.74   1.81   1.92
  12      0.91   1.11   1.28   2.09   1.49   2.32
  16      1.03   1.60   1.57   2.44   2.59   2.75
  20      1.08   1.69   1.73   1.96   2.68   3.09
  24      1.13   1.50   1.81   2.70   2.22   3.04
  32      1.18   1.75   1.98   2.73   2.98   3.27
  40      1.20   1.76   2.08   2.51   3.04   3.37

Table 4.4: Average speedup when adding threads to the processing load. As the length of the query increases, so too does the speedup, as the parallel portion of processing becomes larger for longer queries.

Efficiency Modeling. To efficiently predict the approximate running time for a query with ρ postings, we follow Lin and Trotman [148] in building a simple linear model. We sampled 1,000 web queries from an MSN query log [71], and ran them across our ClueWeb12B index using various fixed settings of ρ, collecting both the number of postings processed and the corresponding efficiency in milliseconds. We then derive our model by conducting a linear regression on the time taken and the postings processed for each query. The linear model is a good fit for the sample queries, with a coefficient of determination R² = 0.926. Our model can be used to estimate the processing time t (in milliseconds) given a value of ρ for the single-threaded instantiation of JASS:

Mt(ρ) = 35.541 + ρ · 2.28 × 10⁻⁵

Conversely, our model can be used to predict the largest value of ρ that is acceptable for a given time bound t:

Mρ(t) = (t − 35.541) / (2.28 × 10⁻⁵)

We note that many more advanced prediction models have been explored in the literature, especially for quantifying the processing time of DAAT or TAAT traversal [119, 129, 162, 166]. These models are almost certainly overly complicated for SAAT traversal prediction, as SAAT traversal is sensitive only to the number of postings to be processed [68].
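Because the model is a single linear function, it is trivial to embed in a scheduler. The following is a minimal sketch (the constants come from the regression above; the function names are illustrative rather than taken from our implementation) of how the model converts between a postings budget ρ and a latency estimate:

```cpp
#include <cstdint>

// Fitted constants from the linear regression described above. The intercept
// (in ms) captures fixed per-query overheads, and the slope is the estimated
// cost, in ms, of processing a single posting.
constexpr double kInterceptMs = 35.541;
constexpr double kMsPerPosting = 2.28e-5;

// Estimated single-threaded latency (ms) when processing rho postings.
double predict_latency_ms(uint64_t rho) {
    return kInterceptMs + static_cast<double>(rho) * kMsPerPosting;
}

// Largest postings budget rho expected to fit within a latency budget of t ms.
uint64_t max_postings_for_budget(double t_ms) {
    if (t_ms <= kInterceptMs) return 0;  // budget is below the fixed overhead
    return static_cast<uint64_t>((t_ms - kInterceptMs) / kMsPerPosting);
}
```

For example, max_postings_for_budget(200.0) returns roughly 7.2 million postings, consistent with the upper bound of approximately 7 million postings used to meet the P95 SLA in the experiments that follow.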

Baseline Systems. The most obvious baseline is the exhaustive, single-threaded system. This system exhaustively processes all queries in a single-threaded manner, and provides the desired effectiveness that other systems should aim to achieve. Given the various time/effectiveness profiles that can be achieved using the different aggressiveness heuristics, we employ several instances of each as baselines. For the ρfixed baseline, we set c = 5 million as suggested by Lin

and Trotman [148], as well as c = 7 million (which corresponds to the calculated upper-bound ρ to meet the P95 SLA). These systems are denoted Fixed-c, where c corresponds to the fixed constant used. For ρpercentage, we let z ∈ {20, 40, 60, 80}, and denote these systems as Percentage-z (or Perc-z).

Selective Parallelization. Next, we propose some hybrid approaches which use selective parallelism to accelerate queries which are predicted to run slowly. The key idea behind selective parallelization is to only parallelize queries which are predicted to exceed the given time budget, as parallelizing short queries is often a waste of resources [119]. Based on our model, we know that queries with more than 7 million postings are likely to exceed the 200ms time budget, so we parallelize any query with over 7 million candidate postings. We exhaustively process all queries (and thus, lose no effectiveness), and call this system Selective Parallelization (SLP-E). Another approach is to use aggressive processing on top of the selective parallel system, where the parallelized queries will be used in conjunction with the fixed heuristic. We use 3 large values of c, namely c ∈ {20, 50, 100} million, as this value is divided equally among the processing threads (that is, each of the n threads will have a local ρ of ρ/n). This approach is called Aggressive Selective Parallelization (SLP-A-c), where c denotes the global ρ value selected. We run all aforementioned systems across all 5,682 queries, with each system retrieving the top-10 documents. Table 4.5 shows both the efficiency and effectiveness results for this experiment for different numbers of worker threads.
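To make the scheduling policy concrete, the following sketch shows the per-query decision behind SLP-E and SLP-A. The names and structure are illustrative simplifications (assumed, not taken from our JASS extension): a query is parallelized only when its candidate postings count exceeds the threshold implied by the latency model, and under SLP-A the global budget c is divided evenly among the worker threads.

```cpp
#include <cstdint>

struct ExecutionPlan {
    bool parallel;            // should the segments be processed across threads?
    uint64_t rho_per_thread;  // local postings budget for each worker thread
};

// threshold: postings count above which the linear model predicts an SLA
//            violation (roughly 7 million for a 200ms budget).
// global_rho: 0 for SLP-E (no early termination); otherwise the SLP-A value c.
ExecutionPlan plan_query(uint64_t candidate_postings, uint64_t threshold,
                         uint64_t global_rho, unsigned num_threads) {
    ExecutionPlan plan;
    plan.parallel = candidate_postings > threshold;
    if (!plan.parallel) {
        // Queries under the threshold are short enough to run exhaustively
        // on a single thread without endangering the SLA.
        plan.rho_per_thread = candidate_postings;
    } else if (global_rho == 0) {
        // SLP-E: parallelize, but still process every candidate posting.
        plan.rho_per_thread = candidate_postings;
    } else {
        // SLP-A: parallelize and terminate early; each thread gets rho = c / n.
        plan.rho_per_thread = global_rho / num_threads;
    }
    return plan;
}
```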

Effectiveness Analysis. In general, the SLP-E and SLP-A systems outperform the fixed and percentage heuristic systems with respect to effectiveness, as they often process more postings than their counterparts. For example, the SLP-E system exhaustively processes all queries, and thus loses no effectiveness. In addition, the Fixed-7M system outperforms the Fixed-5M system and all of the percentage based systems except for Perc-80 with respect to wins, ties, and losses. The SLP-A systems were generally at least as effective as, if not more effective than, the Fixed-7M system, except for SLP-A-20M with 32 and 40 threads, which had slightly worse effectiveness. Note also that the SLP-A systems tend to become less effective as more threads are added. This is due to the value of ρ being divided and distributed across each thread, enforcing early stopping more often. We remind the reader that approximate processing can actually improve effectiveness. For example, consider the Perc systems; processing only 20% of the candidate postings results in 21% of queries outperforming the exhaustive system, but also results in 27% of queries performing worse than the exhaustive system.

Efficiency Analysis. First, we examine the baselines presented in Table 4.5. The Fixed-5M configuration, based on the recommended value of c from Lin and Trotman, was able to meet the efficiency SLAs quite easily. In addition, the linear prediction model was quite accurate in predicting that processing 7 million postings is possible within the given budget, as the Fixed-7M model was also able to meet both SLAs. Conversely, the percentage based systems violate both

System            Time [ms]                           NDCG@10
                  Mean     P50      P95      P99      Mean    % W/T/L
1 Thread
  Exhaustive      309.9    270.4    736.5    973.6    0.159   -/-/-
† Fixed-5M        103.8    120.7    129.7    132.1    0.153   11/74/15
† Fixed-7M        135.2    156.3    184.6    189.6    0.154   8/81/11
† Perc-20          69.3     56.5    171.2    231.0    0.150   21/52/27
  Perc-40         134.2    106.5    350.4    486.4    0.154   15/67/17
  Perc-60         167.6    132.3    423.2    593.9    0.157   12/77/11
  Perc-80         260.3    222.8    625.0    842.6    0.159   7/87/6
8 Threads
  SLP-E           207.9    186.1    414.6    630.4    0.159   -/-/-
  SLP-A-100M      187.3    190.5    354.1    423.1    0.159   2/96/2
  SLP-A-50M       179.6    182.5    328.1    379.1    0.158   4/92/4
  SLP-A-20M       153.5    167.8    241.4    255.3    0.156   8/84/8
24 Threads
  SLP-E           146.8    151.1    251.7    316.0    0.159   -/-/-
  SLP-A-100M      138.5    147.1    227.7    257.4    0.158   2/95/3
  SLP-A-50M       136.8    147.7    214.9    243.7    0.157   4/91/5
† SLP-A-20M       120.4    135.6    164.5    172.2    0.156   9/81/10
40 Threads
  SLP-E           134.9    139.3    216.3    273.3    0.159   -/-/-
† SLP-A-100M      128.8    137.5    199.3    229.6    0.158   3/94/3
† SLP-A-50M       124.9    135.5    183.6    207.3    0.157   5/89/6
† SLP-A-20M       109.7    120.3    150.6    169.4    0.153   9/78/13

Table 4.5: Time and effectiveness trade-offs for the various approaches. † denotes that a system did not violate the SLAs. W/T/L corresponds to the percentage of Wins/Ties/Losses with respect to the exhaustive system.

SLAs with the exception of Percentage-20. Next, we examine the efficiency of both the selective parallelization and the aggressive selective parallelization systems while varying the number of worker threads. As expected, selective parallelization is able to accelerate the exhaustive processing. For example, the rank safe SLP-E systems are between 1.5× and 2.3× faster than exhaustive processing when considering the mean latency, and 1.5× to 3.5× faster when considering the 99th percentile latency, with no effectiveness loss. However, all of the SLP-E variants violated the service level agreements, as they could not effectively control the tail latency of the postings traversal. Fortunately, the SLP-A methods were generally able to meet the SLAs, while achieving a higher

Figure 4.13: Efficiency characteristics of the baseline and parallel approaches for all 5,682 queries. Dashed horizontal lines denote the P95 and the P99 SLA targets. For each system, the P95 and the P99 values are denoted by a long and short horizontal bar, respectively. Threaded systems are shown using the maximum number of allowed threads (40).

effectiveness than the fixed systems. For convenience, Figure 4.13 summarizes the timings for the

tested systems, including the mean, P50, P95, and P99 latency. Based on our findings, the best approach for efficiently processing queries with many candidate postings without sacrificing considerable effectiveness is to use selective parallel processing, which can keep latency low while improving effectiveness over fixed-parameter systems.

Scalability for Large Candidate Sets. As a final experiment, we run our multi-threaded implementation of JASS exhaustively across all queries, retrieving the top-10,000 documents for each query, using 8, 16, 24, 32 and 40 threads. The aim of this experiment is to understand the scalability of the multi-threaded implementation when retrieving large sets of candidate documents, as is necessary in modern, multi-stage retrieval systems [105, 192]. In comparing the runs which retrieve the top-10,000 documents to those which retrieve the top-10 documents, we find that the large increase in k only results in increases of between 5.5ms and 10.3ms when considering the mean and median latency. This makes sense given that the same amount of work was conducted in processing the postings lists (which dominates the processing time) regardless of k. Hence, the parallel approach scales to any arbitrary value of k.

4.3.5 DISCUSSION

We now discuss the findings and implications from our experimentation with various early termination heuristics and parallel processing.

Processing Many Postings. While using a constant value of ρ allows for finely controlled latency, it can have a negative impact on the effectiveness of queries with a large number of postings. To remedy this issue, we explored how parallel processing can be used such that a larger number of postings can be evaluated while keeping within a competitive time budget. To keep resource usage low, selective parallelization only parallelizes queries that are predicted to exceed the processing budget [119, 129]. We showed that these ideas, previously explored within DAAT processing frameworks, can be easily extended to SAAT systems.

Efficiency vs Consistency. In designing our parallel SAAT algorithm, a number of design choices were made that seem possibly counter-intuitive. The first was to do away with the efficient accumulator initialization technique [120] in favor of a single, iterative, initialization loop. The second involved removing the top-k heap, and instead traversing the entire accumulator table to find the top-k documents. The key aspect of these decisions is to trade efficiency for consistency. For example, clearing the entire accumulator table at the beginning of processing was shown to incur an overhead of around 13ms, and traversing the entire accumulator table at the end of processing was shown to incur an overhead of around 71ms. Hence, we would not expect a query to finish in under roughly 85ms, assuming it had at least one posting to process. However, the variance in the efficiency of both operations does not depend on the input query, meaning that we have much more control over the latency of query processing.
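As a rough illustration of the second design choice, the final top-k selection is performed with a single scan over the fixed-size accumulator table rather than by maintaining a heap during scoring. The sketch below is a simplification; it assumes a dense array of quantized accumulator scores, and the function name is illustrative rather than taken from our implementation.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// One accumulator per document in the shard; quantized impacts are summed
// into this table during SAAT traversal.
using Accumulators = std::vector<uint16_t>;

// Scan the entire accumulator table and return the k highest-scoring
// (score, docid) pairs. Every accumulator is visited regardless of the query,
// which is what keeps the latency of this step predictable.
std::vector<std::pair<uint16_t, uint32_t>> topk_from_accumulators(
    const Accumulators& acc, size_t k) {
    std::vector<std::pair<uint16_t, uint32_t>> candidates;
    for (uint32_t docid = 0; docid < acc.size(); ++docid) {
        if (acc[docid] > 0) {
            candidates.emplace_back(acc[docid], docid);
        }
    }
    k = std::min(k, candidates.size());
    // Only the k largest scores need to be ordered at the front.
    std::partial_sort(candidates.begin(), candidates.begin() + k,
                      candidates.end(),
                      std::greater<std::pair<uint16_t, uint32_t>>());
    candidates.resize(k);
    return candidates;
}
```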

Scalability. Another important consideration for SAAT systems is the size of the collection. Since their efficiency and effectiveness depend mostly on the number of postings that are processed, one simple way to improve SAAT traversal is to limit the size of the search shard by distributing the collection across multiple servers [20, 131]. In doing so, the average proportion of postings lists that can be traversed increases, while allowing the search process to remain efficient.

4.4 DISCUSSION AND CONCLUSION

Efficient top-k search is a fundamental problem within information retrieval. Perhaps the most ubiquitous application of top-k search in modern IR systems is within multi-stage search, where a set of candidate documents must be efficiently identified and returned for a more expensive and accurate ranker to operate on. In this chapter, we investigated the use of both document-ordered and impact-ordered index organizations in this context. While a large amount of prior research has focused on improving the efficiency of DAAT and SAAT search systems, this is the first study that compares the two methods directly in a fair way. This comparison was facilitated by first removing the many confounding factors within each of the query processing frameworks, including the

compression algorithm, the scoring computation, and the processing of the underlying index, resulting in an experimental framework that was as comparable as possible. Furthermore, our analysis focuses on tail latency, looking beyond commonly examined summary statistics. From experimentation across three commonly used collections, we identified a range of factors that influence the efficiency of modern DAAT dynamic pruning algorithms such as WAND and BMW, as well as early termination within SAAT search, as implemented by the JASS system. For example, the efficiency of the DAAT algorithms is influenced by the size of the collection, the number of terms within the input query, and the size of the desired results set, irrespective of whether rank safe or aggressive traversal is used. Aggressive traversal improved the mean and median latency with little effectiveness loss, but was unable to improve the tail latency of these approaches. Exhaustive SAAT retrieval was generally not competitive with respect to latency, as its run time depends directly on the number of candidate postings to process. However, exhaustive SAAT does guarantee rank safe results if the underlying quantization is of sufficient resolution [66], as it was in our experiments. On the other hand, the efficiency of the early-termination SAAT approach is only impacted by the number of postings traversed, which is a system parameter. By processing fewer postings, retrieval latency can be minimized with a potential loss in effectiveness. These observations answer RQ1.1. Further investigation into approximate SAAT retrieval heuristics showed that although the tail can be controlled well for the fixed-cost heuristics, this can lead to effectiveness loss for queries with a high number of candidate postings. Dynamic heuristics that select ρ on a per-topic basis were shown to provide better control over system effectiveness, but were less effective at controlling latency. To achieve the best of both worlds, a parallel SAAT algorithm was outlined, which is able to process many postings segments at once, resulting in competitive efficiency and effectiveness (RQ1.2). Taking this organization further, the notion of selective parallelism was applied, where only queries that were predicted to take longer than a pre-defined time budget would be parallelized [119]. Results across the large ClueWeb12B collection showed that combining selective parallelism with aggressive early termination allowed for efficient query processing while keeping the overall effectiveness high, answering RQ1.3 in the affirmative. There is a wealth of future work to be conducted on the many topics explored in this chapter. Since the time of this study, the state-of-the-art in both DAAT and SAAT retrieval has been advanced through a number of recent works [172, 173, 174, 197, 243]. It would be interesting to conduct another comparison study to determine if the observations from this study are impacted by these innovations. Another interesting question is how the candidate generation stage can go beyond simple conjunctive or disjunctive matching, perhaps exploiting soft Boolean conjunctions [210] or mixed-mode processing for enhanced efficiency/effectiveness trade-offs. For DAAT systems, improving the control of the tail latency would be a very useful focus for future research [167].
As discussed, document-ordered indexes are much more flexible than impact-ordered indexes as they can support additional Boolean matching criteria, can be efficiently updated, and the underlying ranking function is not required to be static. These problems are yet to be solved for impact-ordered indexes.

5 EFFICIENTLY PROCESSING QUERY VARIATIONS

Efficiently processing bag-of-words queries is an important requirement for supporting efficient end-to-end search in large-scale IR systems. Chapter 4 focused on this problem in particular, under the assumption that the input query was the exact query that the system would execute. However, it has been shown that there is a high variability in the queries that different users generate for the same information need [186]. For example, consider Table 5.1, which shows the most commonly submitted user query variations (UQVs) for an information need where users must try to identify a Norway Spruce tree [18]. Even for such a simple information need, there are clearly a multitude of ways to formulate a reasonable query. Prior work has focused on using query variations to improve the quality of search. Most of these works have employed rank fusion techniques to merge a set of results lists arising from processing a set of query variations. For example, Belkin et al. [21] found that combining independently generated Boolean variations increased the effectiveness of their search system, arriving at a general rule of thumb: the more lists fused, the better. Similar findings were also reported in recent work from Bailey et al. [19], who showed large improvements to ad-hoc search quality by using rank fusion across top-k results lists arising from disjunctively processing query variations. Benham and Culpepper [23] showed that rank fusion across query variations and systems can improve effectiveness with a relatively low risk of significant harm with respect to a baseline system, indicating that fusion across query variations is a simple yet robust approach for improving search effectiveness. Other combinations of query variations, rank fusion, and search systems have been explored too, becoming the basis of a number of TREC runs in recent years [24, 26]. Clearly, leveraging rank fusion and query variations to improve search effectiveness is an attractive solution for search practitioners, especially when considering its simplicity and robustness. In this chapter, we explore the implications of processing such groups of related query variations, known as multiple queries (multi-queries).1 In particular, we focus on efficient solutions for computing and fusing a multi-query to generate a single, effective, ranked list of documents, and explore whether such approaches are efficient enough for real-time search.

1 This term was first coined by Catena and Tonellotto [51].


Backstory: “Many tree species look alike, and so it is easy to mistake one type of tree for another. Your friend has said repeatedly that they have a Norway Spruce growing in their garden, but you don’t think it is. You decide to try and identify whether the tree is what your friend says it is.”

Query                             Count   Percentage
Norway Spruce                        19         17.6
Norway Spruce images                  5          4.6
Norway Spruce trees                   3          2.8
Norway Spruce tree                    3          2.8
Norway Spruce picture                 3          2.8
Norway Spruce identification          3          2.8
How to identify a Norway Spruce       3          2.8
How to identify Norway Spruce         3          2.8
Other                                66         61

Table 5.1: An example of the variability of user queries for the same information need. This example shows the associated backstory for UQV100 topic 290 (above) and the most popular query variations submitted (below). A total of 108 query variations were submitted for this topic.

Problem Definition. Let Q denote a set of n unique queries, Q = {q1, q2, ..., qn}, and let Fuse represent a rank fusion function which consumes an arbitrary set of ranked results lists

R = {r1, r2, ..., rn}, and outputs one single representative list r′. The aim of this problem is to efficiently compute the result of a given fusion function Fuse(R). There are two main components of the problem:

1. Processing each query q ∈ Q efficiently to yield R, and

2. Efficiently fusing each r ∈ R together to yield a fused list, r′.

The fusion component of the processing operation is expected to be extremely efficient; we are given a set of n ranked lists, each containing no more than around k = 10,000 elements, and we must traverse each of these lists to fuse the rankings together. On the other hand, the query processing component could be costly; as we have seen in the earlier chapters, it can be difficult to efficiently process even a single query within a reasonable time. The major problem then becomes how to process each query in Q such that the overall latency is minimized. To complicate matters further, the rank fusion process has a data dependency on the query processing stage, which may lead to the fusion operation being delayed by stragglers (slow running queries from Q).

Our Contributions. In this chapter, we study the problem of efficiently processing multi-queries as formulated previously, focusing on the following research question:

RQ2: What is the most efficient way to process multi-queries over inverted indexes?

We also examine the following sub-questions:

RQ2.1: What are the inherent trade-offs of the multi-query processing algorithms?

RQ2.2: Can multi-queries be processed efficiently enough for real-time search?

Since we are the first to study this problem in modern retrieval architectures, we propose a number of approaches for solving this problem based on different processing paradigms. One such approach involves computing each query in parallel, and then conducting a fusion step at the end of processing, denoted as Parallel Fusion (PF). We show how PF can be implemented in both DAAT and SAAT search engines, and explore the trade-offs between both approaches. While PF is shown to be efficient with respect to latency, this approach must use a large amount of system resources, and accesses the same postings lists multiple times. In order to reduce the CPU cost, we propose a family of novel multi-query processing algorithms, denoted Single Pass (SP), that can leverage DAAT dynamic pruning algorithms to efficiently traverse all candidate postings once, reducing the amount of redundant accesses that must be made to the postings. Furthermore, we extend the SP approach for SAAT systems which allows for efficient early termination. We conduct an experimental analysis across the ClueWeb12B collection and the associated UQV100 topics, demonstrating a range of trade-offs that are possible from employing our approaches. In particular, we find that we can achieve low latency at a high CPU cost through the use of the PF approach. On the other hand, we show that our novel SP methods are able to achieve significant CPU cost reductions with respect to the parallel approaches, at the cost of a small increase in latency (RQ2.1). In any case, we conclude that certain instances of our proposed algorithms are suitable for use in real-time scenarios (RQ2.2).

5.1 RELATED WORK

The problem presented in this chapter draws on many aspects of indexing and retrieval. In particular, we refer the reader to Sections 3.1 and 3.2 for the overview of index architectures and query processing across both DAAT and SAAT indexes. Here, we briefly review concepts related to rank fusion and query variations.

5.1.1 RANK FUSION

As discussed previously, the goal of rank fusion is to take an arbitrary number of ranked results

lists R = {r1, r2, ..., rn} and output a single representative list r′. Each list r consists of ranked documents, whereby a document d in the ith list has a relevance score of Si,d. We now discuss the core idea behind rank fusion, and how common fusion techniques combine information to produce enhanced rankings.


Figure 5.1: Four types of rank fusion. On the top left, a single query is processed on three unique systems to generate ranked lists for fusion (system fusion). On the top right, a single query is processed across three different collections and the results lists fused together (collection fusion). On the bottom left, three unique input queries are run on a single system, and the results fused together (query fusion). On the bottom right, two unique input queries are provided to three different systems to generate six lists which are then fused together (double fusion). Note that other combinations can be used for double fusion, and triple fusion is also possible.

Sources of Evidence. Rank fusion aims to generate a ranking based on combined evidence with the aim of improving both the robustness and the effectiveness of the final ranked list. Indeed, a large amount of prior work has studied this problem [19, 65, 95, 116, 136, 146, 189, 190, 225, 251, 261]. For information retrieval systems, there are a multitude of sources of evidence for ranking. Prior work has examined rank fusion across systems, collections, queries, and combinations thereof [139, 261]. System fusion involves processing a given query q across a set of different query processing systems, resulting in a set of results lists to be fused. The idea behind system fusion is to leverage different ranking formulations to build evidence of document relevance. Collection fusion is a less commonly used approach, where a given query is issued to the same system searching across a set of different collections. Query fusion involves submitting a set of queries to a single IR system, and fusing the resultant output lists together. Different combinations of system, collection, and query fusion can be leveraged together [19]. Figure 5.1 shows a pictorial example of each approach, with an example of double fusion across queries and systems. For this chapter, we are focused entirely on query fusion. Thus, we use the term rank fusion interchangeably with query fusion.

Fusing Ranked Lists. The key hypothesis behind fusing multiple lists is that the resultant list should be more effective than using any of the constituent lists individually. Improvements in results from rank fusion can be explained through a few different effects [84, 250]:

! Skimming: different retrieval runs retrieve different relevant items, so a fusion approach that takes the top ranking items from each constituent list will push irrelevant items down the ranking.

! Chorus: different retrieval runs retrieving the same documents indicates that these docu- ments are relevant, and should be ranked highly.

! Dark Horse: a given retrieval run may be much more effective than other runs for at least some of the relevant items.

Interestingly, many of the ideas behind rank fusion were founded well before search engines even existed. In the past, an analogous problem to rank fusion was that of devising a technique to combine votes during an election such that the most preferred candidate would be elected. The Borda count method was one approach for solving this problem. Although the idea was independently devised several times, Borda count was named after Jean-Charles de Borda, who discussed the idea around 1770 [79]. Centuries later, this approach was extended for use in IR systems as the BordaFuse algorithm [15, 79]. Broadly speaking, there are two families of rank fusion algorithms. Unsupervised fusion algorithms do not depend on training data, whereas supervised algorithms do [116, 190, 225]. For this work, we are interested in only unsupervised fusion techniques due to their simplicity and computational efficiency. Within unsupervised fusion there are two families of algorithms [23, 97], namely rank-based and score-based.

Rank-Based Fusion. Rank-based fusion approaches utilize only the rank of each document within each input list to form the final fused list. To this end, rank-based fusion algorithms can be considered as voting algorithms. Indeed, many rank-based fusion methods have origins from problems surrounding the aggregation of votes for an election, including BordaFuse and Condorcet [264]. BordaFuse is a very simple method, which assigns a score to a document d as:

BordaFuse(d) = Σ_{i=1..n} (ki − pi,d + 1) / ki,   (5.1)

where ki is the depth of the ith list, and pi,d is the rank position of d in the ith list [79]. Intuitively, the BordaFuse method applies a linear discount to the weight assigned to each successive document down the ranking. Other approaches such as Reciprocal Rank Fusion (RRF) use a reciprocal distribution to apply a super-linear discount to documents further down the ranking [65]. A similar idea is used in the recently proposed Rank-Biased Centroids (RBC) approach [19], which employs a geometric decay on the contribution of each document down the ranked list. Using an increasing discount places greater emphasis on documents that are higher in each r ∈ R under the assumption that such documents should carry a higher weight, aiming to balance both the chorus and skimming effects.
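Equation 5.1 is straightforward to implement; a minimal sketch (with illustrative type names, and input lists represented simply as document identifiers ordered by decreasing score) is:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using DocId = uint32_t;
using RankedList = std::vector<DocId>;  // documents ordered by decreasing score

// BordaFuse (Equation 5.1): the document at (1-based) rank p in a list of
// depth k contributes (k - p + 1) / k, and contributions are summed over lists.
std::unordered_map<DocId, double> borda_fuse(const std::vector<RankedList>& lists) {
    std::unordered_map<DocId, double> fused;
    for (const RankedList& list : lists) {
        const double k = static_cast<double>(list.size());
        for (size_t i = 0; i < list.size(); ++i) {
            const double rank = static_cast<double>(i + 1);  // p_{i,d}
            fused[list[i]] += (k - rank + 1.0) / k;
        }
    }
    return fused;  // sort by descending value to obtain the final ranking r′
}
```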

Score-Based Fusion. Score-based fusion methods use the actual document scores within each input list to yield the final fused list. Some of the most simple yet effective score-based fusion approaches were proposed by Fox and Shaw [95], and are commonly known as the Combination (Comb) algorithms. Many versions were proposed, including CombMIN, CombMAX, CombMED, CombSUM, and CombMNZ. CombMIN and CombMAX assign document d the minimum (or maximum) of the scores observed for d across all r ∈ R. While seemingly counter-intuitive, both algorithms were designed to prevent special cases; CombMIN avoids ranking a non-relevant document highly, while CombMAX tries to minimize the number of highly-relevant documents

being ranked low in r′. However, as observed by Fox and Shaw, both algorithms solve a specific corner case with little regard to the average case. CombMED, which sets the score of document d to the median score observed for d across all r ∈ R, aims to remedy this issue. A major issue with the previously mentioned approaches is that they rely on the score of a document from a single ranked list to determine the final ranking. On the other hand, CombMNZ and CombSUM use all of the scores present in each list to generate the final output ranking.

CombSUM simply calculates the score for d as the sum of the values of Si,d across each ranked list r ∈ R, where Si,d is the score of document d in the ith list (or Si,d ≡ 0 if d ∉ ri):

CombSUM(d) = Σ_{i=1..n} Si,d.   (5.2)

CombMNZ performs the same computation as CombSUM, but multiplies the value of the CombSUM score by the number of lists in R in which d was present, providing higher weights to documents that occur in many ranked lists, which biases the result to favor the chorus effect:

CombMNZ(d) = |{r ∈ R : d ∈ r}| · Σ_{i=1..n} Si,d.   (5.3)
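Both Comb variants reduce to a single pass over the input lists. A small sketch (illustrative types; each list represented as a docid-to-score map) computing Equations 5.2 and 5.3:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using DocId = uint32_t;
using ScoredList = std::unordered_map<DocId, double>;  // one ranked list r_i

// CombSUM (Equation 5.2): sum the score of d over every list it appears in.
// CombMNZ (Equation 5.3): the CombSUM score multiplied by the number of
// lists that contain d.
std::unordered_map<DocId, double> comb_fuse(const std::vector<ScoredList>& lists,
                                            bool mnz) {
    std::unordered_map<DocId, double> sums;
    std::unordered_map<DocId, uint32_t> hits;
    for (const ScoredList& list : lists) {
        for (const auto& entry : list) {
            sums[entry.first] += entry.second;
            hits[entry.first] += 1;
        }
    }
    if (mnz) {
        for (auto& entry : sums) {
            entry.second *= static_cast<double>(hits[entry.first]);
        }
    }
    return sums;  // sort by descending value to obtain the fused ranking r′
}
```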

In their studies, Fox and Shaw [95] found CombSUM and CombMNZ to be the best performing fusion approaches. More recent studies also show good performance from the CombSUM and CombMNZ approaches [19, 23], making them appealing due not only to their performance, but also to their efficiency and simplicity.

Score Normalization. All of the aforementioned methods have assumed a constant weight for each r ∈ R, meaning that each input list is considered to be equally important to the final ranking. However, this may not always be desirable. For example, fusion across systems may incorporate both strong and weak systems, and so the weight given to stronger systems should be higher than the weaker systems. This is of particular importance for balancing the dark horse effect with the chorus effect. Fortunately, a simple approach based on linear interpolation [261] can be used, whereby each list ri is associated with a weight αi. While this approach works with both rank- and score-based fusion techniques, there is a subtle issue with score-based fusion techniques; the individual scores within each list may have different magnitudes, and hence may implicitly

weight the final fused list. Indeed, this is the case for most score-based fusion techniques, even in the absence of a per-list weighting scheme as outlined above. To avoid this issue, the scores within each list can be normalized prior to the fusion operation. The most commonly used normalization approach is the MinMax approach [189]. Given a list r with a maximum score SMax and a minimum score SMin, the normalized score of document d can be computed as

MinMax(d) = (Sd − SMin) / (SMax − SMin),   (5.4)

where Sd denotes the unnormalized score of d. Thus, after normalizing a list with MinMax, the scores of the documents will be linearly normalized within the range [0,1]. We refer the interested reader to the work of Wu et al. [261], Benham and Culpepper [23], and Kurland and Culpepper [139] for a more detailed overview of rank fusion.
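For completeness, MinMax scaling of a single list of raw scores takes only a few lines; this is a sketch, and the guard for an all-equal list is an assumption about how that degenerate case might be handled rather than part of the original formulation:

```cpp
#include <algorithm>
#include <vector>

// MinMax normalization (Equation 5.4): linearly rescale raw scores into [0, 1].
// If every score is identical, all scores are mapped to 0 to avoid dividing
// by zero.
void minmax_normalize(std::vector<double>& scores) {
    if (scores.empty()) return;
    const auto extremes = std::minmax_element(scores.begin(), scores.end());
    const double lo = *extremes.first;
    const double hi = *extremes.second;
    if (hi == lo) {
        std::fill(scores.begin(), scores.end(), 0.0);
        return;
    }
    for (double& s : scores) {
        s = (s - lo) / (hi - lo);
    }
}
```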

5.1.2 QUERY VARIATIONS

Query variations are simply different ways of expressing a query for the same information need. In a study from 1988, Saracevic and Kantor [220] noted that there was a low degree of agreement in both the search terms used, and consequently, the documents retrieved, when a set of users were asked to formulate and execute search queries for a set of given tasks. Interestingly, this study also showed that the more commonly an item appeared in the results lists for different users, the more likely it was to be relevant – the same observation that drives rank fusion. Of course, there is certainly a relationship between query variations and rank fusion: fusion algorithms commonly use query variations as a means to produce a set of ranked input lists. Recent work has confirmed that users are as diverse as systems [186], meaning that the input query can have as large an effect on the result ranking as using a different IR system. Therefore, query variations are a very important consideration for production IR systems. In particular, questions of result consistency and system robustness arise when expressing an information need with different query variations; it is desirable for a system to produce similar results for similar queries, and the performance of the system should not vary wildly for similar queries. Bailey et al. [19] examined this problem by using rank fusion to generate a consistent ranking across a set of query variations, and then measuring other systems against the fused run. Consistency was found to vary between topics and systems, with consistency generally decreasing as the complexity of a topic increases. Nevertheless, leveraging query variations and rank fusion seems to be a promising approach for improving search robustness beyond single queries [74].

Leveraging Query Variations. Most early work on query variations involved combining them with rank fusion to improve search effectiveness [21, 22]. Recently, a renewed interest in query variations has emerged, which is possibly encouraged by the creation of the UQV100 collection, which contains query variations for a set of 100 topics over the large ClueWeb12B web collection [18]. For example, rank fusion across query variations was revisited for various submissions to the TREC CORE track [24, 26]. Other recent work has examined how the best performing

query variation can be selected via query performance prediction [221, 238], how retrieval systems can be compared in the presence of query variations [275], and how fusion over query variations impacts the risk of retrieval systems [23].

Generating Query Variations. One of the lesser studied aspects of query variations is how they can be generated. Most of the aforementioned studies rely on real users to generate the query variations. For example, the UQV100 collection was created by first generating a so-called backstory for each of the 100 topics. The backstory would outline an information need, and crowdworkers were asked to generate a set of query variations for each [18]. A similar approach was used in the TREC CORE runs from Benham et al. [24, 26], where the TREC topic description was used as the backstory. Another interesting approach for generating query variations is to ask crowdworkers to generate a query variation based on an image or video. Stanton et al. [232] used this approach for generating user queries that variously describe health problems depicted in either an image or a short video clip displayed to the crowdworker. While these techniques are robust in general, they require direct human interaction. Benham et al. [25] use a sampling-based system to automatically generate query variations based on relevance modeling over external corpora. This approach involves inducing relevance models for a given query from a set of external corpora, which can then be sampled iteratively to generate random bag-of-words queries. Then, these bag-of-words queries are processed concurrently on the target collection and fused using RRF. Their results showed this approach to be competitive with a well-tuned RM3 query expansion run [1] at a lower cost. A more realistic method of mining real query variations exists where user query and click logs are available. For example, any reasonably popular commercial search engine will have a huge amount of query and click data at their disposal, and this data can be easily used to mine or generate similar queries. The core idea is to build a representation (usually a bi-partite graph) between user input queries and clicked documents, and use this representation to discover clusters of queries that result in clickthroughs to common or similar documents [257]. This method relies on the assumption that similar queries will result in users clicking on related documents. Various methods of using this data for mining query clusters or variations have been explored, including using random-walks [69], weighted bi-graph clustering [135], and co-visitation algorithms [262]. Other approaches have explored fusion of results from query reformulations [225].

Properties of Query Variations. Because query variations pertain to a common information need, they often vary only slightly from one another. Consider Figure 5.2, which plots the percentage of variations containing the most common 60 terms for each topic in the UQVRobust query variations [23, 24], and in the UQV100 collection. On average, the most popular term in each topic will appear in almost all submitted variations, with a long-tail of terms appearing in just one or two submitted queries. A concrete example for the UQV100 collection can be observed in Table 5.1, where the terms Norway and Spruce appeared in each of the most popular 8 submitted variations. In fact, Norway appeared in 99, and Spruce in 103 of the 108 submitted variations for that topic.


Figure 5.2: Average percentage of UQVs that terms of varying popularity appear in, averaged on a per-topic basis, across both the UQVRobust and UQV100 query variations. Clearly, information needs generally have a few central terms that appear in most variations, and a long tail of rare terms appearing in very few query variations.

5.2 EFFICIENT MULTI-QUERY PROCESSING

Given the proven effectiveness of rank fusion across query variations [18, 21, 23, 24, 26], we now propose a few different approaches for solving the multi-query processing problem. Each approach takes a multi-query Q, containing n unique queries, as input, and returns a ranked list

r′. Our solutions aim to be as efficient as possible, and should be suitable for use where strict service level agreements must be met, such as in commercial web search engines [82].

5.2.1 PARALLEL FUSION

The most obvious approach is to simply spawn a single thread for each q ∈ Q, and execute all of these (or as many as possible) in parallel. Once each thread returns its corresponding results list, the rank fusion algorithm can be applied to assemble the final results list (Figure 5.3). Assuming that sufficient CPU cores are available, this method should remain efficient. However, the latency for processing Q will be roughly bounded by the longest running query in Q, making it vulnerable to high tail latency. Even worse cases can exist for this approach: assume that we have a machine with m CPUs, and n > m. In this case, the latency is not bounded by the slowest query in Q, but rather, by the slowest execution path taken by a single processing core. To be more concrete, let m = 2 and n = 5. Let us also assume that the latency for processing each query is the same. Then, the execution path would be as follows:

! Epoch 1: CPU 1 → q1, CPU 2 → q2.

! Epoch 2: CPU 1 → q3, CPU 2 → q4.

! Epoch 3: (CPU 1 or CPU 2) → q5.


Figure 5.3: An example of the PF technique for a size 3 multi-query. Each query inside the multi-query is processed independently, in parallel, and each result list is then passed through to the Fuse algorithm, yielding the final results list, r′.

In this toy example, the total latency for processing Q is actually 3× larger than processing any single q ∈ Q, which is undesirable for real-time search scenarios. One obvious solution to this problem is to process only up to m queries. Then, latency is guaranteed to be bounded by the slowest query in the set that was processed. However, the problem of selecting a subset of queries to process from a multi-query is a non-trivial problem that is analogous to query performance prediction and the model-selection problem [3, 206, 238]. This problem has implications on both the efficiency and the effectiveness of PF, and is left for future work. Another potential solution for latency reduction in PF is to run the fusion algorithm on a partial set of result lists once a given time budget is exceeded, at the cost of a loss in effectiveness. Indeed, a deeper scheduling problem exists, since the processing time for each query is not known ahead of time, and is not likely to be uniformly distributed. We do not focus on this problem, and it is left for future work. Since this approach does not assume any particular index traversal algorithm, we refer to specific instantiations of the Parallel Fusion approach as PF-a, where a corresponds to the index traversal algorithm used (such as WAND, for example). However, since SAAT approaches involve early termination, we explore two SAAT approaches. The first, denoted PF-JASS-E, processes all postings for each candidate query. The second, PF-JASS-A, will process a maximum of 10 million postings per query (ρ = 10⁷).
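The control flow of PF is simple enough to summarize in a few lines. The sketch below is illustrative only: process_query and fuse are assumed placeholders for a bag-of-words traversal (WAND, VBMW, or JASS underneath) and any fusion function such as CombSUM, and are not part of an actual codebase.

```cpp
#include <cstdint>
#include <future>
#include <string>
#include <unordered_map>
#include <vector>

using DocId = uint32_t;
using ScoredList = std::unordered_map<DocId, double>;

// Assumed: runs a single bag-of-words query against the index and returns
// its top-k results (DAAT or SAAT underneath).
ScoredList process_query(const std::string& query, size_t k);

// Assumed: any rank fusion function over the returned lists, e.g. CombSUM.
ScoredList fuse(const std::vector<ScoredList>& lists);

// Parallel Fusion (PF): one asynchronous task per variation, fused once all
// lists return. Latency is therefore bounded below by the slowest variation.
ScoredList parallel_fusion(const std::vector<std::string>& multi_query, size_t k) {
    std::vector<std::future<ScoredList>> futures;
    futures.reserve(multi_query.size());
    for (const std::string& q : multi_query) {
        futures.push_back(std::async(std::launch::async, process_query, q, k));
    }
    std::vector<ScoredList> lists;
    lists.reserve(futures.size());
    for (std::future<ScoredList>& f : futures) {
        lists.push_back(f.get());  // stragglers delay the fusion step here
    }
    return fuse(lists);
}
```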

5.2.2 SINGLE PASS DAAT

One disadvantage of the PF approach is that it traverses many of the same postings lists multiple times, as each query may contain similar terms. Consider Figure 5.2, which plots the percentage of query variations that contain the most to least popular terms for the given information need across two separate sets of query variations. As discussed earlier, there appears to be a subset of popular terms which appear in many of the submitted variations, with a long tail of other terms


Figure 5.4: An example of the exhaustive SP technique for a size 3 multi-query. Each postings list is traversed simultaneously in a DAAT manner, with scoring taking place for each relevant query. A heap for each query is stored and updated independently. Once all postings lists are exhausted, each result list is then passed through to the Fuse algorithm, yielding the final results list, r′.

that occur in only a single submitted query. Therefore, algorithms which can avoid accessing each posting multiple times should be cheaper and more efficient. We now focus on designing algorithms that limit the number of redundant postings operations.

The first alternative we propose is the exhaustive Document-at-a-Time Single Pass algorithm (SP-Exhaustive). Unlike PF, SP-Exhaustive iterates the postings lists for all candidate terms just once, utilizing an exhaustive DAAT traversal on all postings lists concurrently. First, a cursor is opened on each of the postings lists in Q. The pivot document is the minimum document ID across all of the cursors. At this point, the pivot document is scored, with a score being computed once for each term contained in the pivot, and the appropriate scores being accumulated for each of the unique input queries. In order to store the top-k documents for each query, a unique min-heap is built for each query q ∈ Q. An adjacency list is associated with each postings list, storing a pointer to each top-k heap that is associated with this list, allowing only the relevant heaps to be updated after a scoring operation. Figure 5.4 shows this organization for a 3 query multi-query, where the first two terms appear in all 3 queries, and the third and fourth terms appear in the second and third queries, respectively. After all postings lists are exhausted, the results lists are fused together to create r′. While this approach is guaranteed to score each unique term-document pair just once, it does not leverage dynamic pruning. To this end, the PF algorithms are able to skip documents that are unable to make it into the respective top-k results, leaving an obvious question: is it better to process many postings lists multiple times with skipping enabled, or to exhaustively process each postings list just once? We answer this question in the following experimental analysis. However, an approach that could leverage dynamic pruning while also only processing each postings list once is a seemingly ‘best of both worlds’ solution. We now explore the possibility of such an algorithm.
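To make the traversal concrete, the following is a heavily simplified sketch of the SP-Exhaustive idea: uncompressed in-memory postings, precomputed term-document scores, and unbounded score maps instead of size-k min-heaps. All names are illustrative rather than taken from our actual implementation.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <utility>
#include <vector>

using DocId = uint32_t;
const DocId kNoDoc = std::numeric_limits<DocId>::max();

// One postings list plus its adjacency information: the indexes of the
// queries in the multi-query that contain this term.
struct TermCursor {
    std::vector<std::pair<DocId, double>> postings;  // (docid, W(d, t))
    size_t pos = 0;
    std::vector<size_t> queries;  // adjacency list: which results to update
    DocId current() const {
        return pos < postings.size() ? postings[pos].first : kNoDoc;
    }
};

// Exhaustive single-pass DAAT traversal over all terms of the multi-query.
// Each unique term-document pair is scored once, and the score is routed to
// every query containing the term via the adjacency lists.
std::vector<std::unordered_map<DocId, double>> sp_exhaustive(
    std::vector<TermCursor>& cursors, size_t num_queries) {
    std::vector<std::unordered_map<DocId, double>> results(num_queries);
    while (true) {
        DocId pivot = kNoDoc;  // the smallest docid under any cursor
        for (const TermCursor& c : cursors) pivot = std::min(pivot, c.current());
        if (pivot == kNoDoc) break;  // all postings lists are exhausted
        for (TermCursor& c : cursors) {
            if (c.current() == pivot) {
                const double w = c.postings[c.pos].second;  // scored once
                for (size_t q : c.queries) results[q][pivot] += w;
                ++c.pos;
            }
        }
    }
    return results;  // each map is then truncated to its top-k and fused
}
```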

5.2.3 SINGLE PASS COMBSUM

As discussed above, it is desirable to process each list just once while also avoiding computing the score of documents that are unable to make the top-k results. Modern dynamic pruning algorithms rely on additive rankers to establish upper-bound scores which can be used to determine whether a new document should be evaluated. The monotonic nature of scoring allows such estimations to be made (Section 3.2.4). Recall that the CombSUM fusion function is also strictly monotonic if the lists being fused have positive numeric scores. Using this observation, we now outline how we can combine dynamic pruning with the single pass algorithm, thus creating a new approach for processing multi-queries with CombSUM fusion, known as Single Pass CombSUM, or SP-CS for short.

For convenience, we reproduce Equation 5.2 here. Given a set of ranked lists with positive

numeric scores for each document in each list, R = {r1, r2, ..., rn}, the CombSUM score for a document d can be computed as the sum of the scores of d, computed over its appearances in

each r ∈ R. Assuming the score of some document d in the ith of the lists is given by Si,d, and Si,d ≡ 0 if d ∉ ri, then

CombSUM(d) = Σ_{i=1..n} Si,d.

If the scoring computation that generated the component scores Si,d for a given query qi is additive, then

Si,d = Σ_{t ∈ qi} W(d, t),

where W(d, t) is the term-document score contribution for the term t in the document d according to the chosen retrieval model. Taking these together gives

CombSUM(d) = Σ_{i=1..n} Σ_{t ∈ qi} W(d, t).   (5.5)

Now let ct ≡ |{qi | 1 ≤ i ≤ n ∧ t ∈ qi}|, the number of input queries containing term t. Equation 5.5 can thus be rewritten as

CombSUM(d) = Σ_{t ∈ Q} ct · W(d, t),   (5.6)

making it clear that for additive similarity scoring mechanisms, the CombSUM score for a set

of query variations can be computed by counting term frequencies ct across the variations, and then evaluating a single super-query against the index of the collection by taking a linear sum of the individual contributions of the ranking function W. This approach only works for similarity models that are additive and yield positive scores, as is the case for most dynamic pruning algorithms.
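Building the super-query amounts to a simple counting step; a sketch with illustrative names is shown below. The resulting (term, ct) pairs are then handed to an otherwise unmodified additive ranker, with each term's score contribution and its pruning upper bound scaled by ct.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Collapse a multi-query into a single weighted super-query: each unique term
// t is paired with c_t, the number of variations that contain it. Per
// Equation 5.6, evaluating this super-query with an additive ranker, scaling
// W(d, t) (and the upper bound U_t) by c_t, yields the exact CombSUM fusion
// of the variations in a single pass over the postings.
std::map<std::string, uint32_t> build_super_query(
    const std::vector<std::string>& multi_query) {
    std::map<std::string, uint32_t> term_counts;
    for (const std::string& variation : multi_query) {
        std::istringstream tokens(variation);
        std::string term;
        std::set<std::string> seen;  // count each term once per variation
        while (tokens >> term) seen.insert(term);
        for (const std::string& t : seen) ++term_counts[t];
    }
    return term_counts;
}
```

For the multi-query shown in Figure 5.5, this step produces counts of 3 for the terms appearing in all three variations and 1 for the terms appearing in only a single variation.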


Figure 5.5: An example of the SP-CS technique for a size 3 multi-query. The multi-query is transformed into a single super-query, where each term has its weight multiplied by the number of times it was observed in the multi-query. Then, standard dynamic pruning algorithms can be used to compute the CombSUM fusion across the super-query, resulting in the final fused list, r′.

Document-at-a-Time Processing. In order to employ Equation 5.6 to apply DAAT dynamic pruning to the super-query, the index traversal process must be slightly modified. For each candidate term, t, the upper-bound score Ut must also be multiplied by ct . This ensures that the bound estimation for dynamic pruning is correct with respect to the current ranking. For the block-based approaches, the ct multiplier must be applied to each block-max score, Ub,t , as each block is encountered. Query processing does not differ in any other way to the regular bag-of-words algorithms – a single top-k heap is maintained during processing, and contains the result of both the query processing and the rank fusion stages once the algorithm terminates. Figure 5.5 shows an example of this organization.

Score-at-a-Time Processing. Interestingly, Equation 5.6 can also be applied within a SAAT search system. Recall that in a SAAT index, the value of W(d, t) is pre-computed, quantized, and stored as an impact (id,t ). Thus, when traversing the index, each impact can be multiplied by ct to generate the modified CombSUM impact. Traversal of the index is the same as in the standard bag-of-words SAAT traversal, and at the end of processing, the top-k heap (and subsequent accumulator table) will contain the fused results list.

Score Normalization. As mentioned in Section 5.1.1, CombSUM does not require an explicit score normalization step when the underlying results lists come from the same source [95]. This is of particular importance to the SP-CS approach, as normalization would require each query to be processed independently, which defeats the benefit achieved through Equations 5.5 and 5.6. Indeed, one pass fusion would not be possible as described here, since the full results list for each query would be required to depth k before any normalization can be applied. Hence, we do not use any score normalization in our algorithms. In practice, normalization is important for the fusion of ranked lists arising from different rankers [23, 27], but is less important when fusing lists based on the same ranking function. Indeed, incorporating normalization into the SP algorithms would allow for alternative rank fusion algorithms to be deployed, but is a surprisingly difficult problem which should be explored further in the future. We conduct a preliminary exploration on the impact of score normalization in Section 5.4.1. 102 ! Chapter 5. Efficiently Processing Query Variations

Rank Safety. Another interesting aspect of SP-CS is that it allows for a truly rank safe ranked list to be generated by CombSUM. Usually, fusion algorithms will assemble a fused list from a set of input lists that are generated to some depth k. In this situation, there is no score contribution being generated by documents that exist at any position ≥ k + 1 in any of the lists, as they are inferred to have a score of zero. In contrast, the SP-CS method computes an exact solution of

r′, mirroring the outcome of giving CombSUM each candidate ranking r to an arbitrary depth. We note however that the rank safety of the SAAT methods depends on processing the candidate postings exhaustively. If early termination is employed, rank safety is not guaranteed. Similar to the PF algorithm, the SP-CS algorithm is generalizable to all index traversal strategies. Hence, we denote each instantiation as SP-CS-a, with a representing the index traversal algorithm. Again, since the SAAT algorithms can use early termination to improve efficiency, we distinguish between two SAAT approaches: SP-CS-JASS-E processes all postings for a given query, whereas SP-CS-JASS-A processes a maximum of 10 million postings for a given query (ρ = 10⁷).

5.3 EXPERIMENTAL SETUP

Hardware. Experiments were performed on an otherwise idle Red Hat Enterprise Linux Server with 512 GiB of RAM and two Intel Xeon E5-2690 v4 CPUs. Each CPU has 14 cores with hyperthreading enabled, resulting in a total of 56 processing units. Indexes are stored entirely in main memory, and are warmed up prior to experimentation, which involves accessing each candidate postings list prior to processing. Timings are the average of 3 independent runs.

Software. Algorithms were implemented using C++11 and were compiled with GCC 7.3.1. Where multiple threads were used, up to 56 threads were spawned using C++ STL threads. The DAAT algorithms were implemented within the state-of-the-art VBMW codebase [173] and the SAAT algorithms were implemented within the JASS codebase [148]. Our extensions are made publicly available at https://github.com/RMIT-IR/centroid-boost.

Document Collection and Indexing. The experiments in this section use the ClueWeb12B corpus and the UQV100 query variations and judgments. ClueWeb12B contains close to 52 million web documents, crawled in 2012. We use Indri to index the collection, apply Krovetz stemming and the default Indri stoplist, and then convert the Indri index into the format expected by the VBMW codebase. Next, we reordered the postings lists using the state-of-the-art recursive graph bisection approach [83, 171]. We then build our VBMW index, setting the λ parameter such that we had a mean block size of 40 elements per block [173], and compress the index using the Partitioned Elias-Fano mechanism [191]. A JASS index was also built using the same base index after quantizing the index with 512 quantization levels following best practices [66].

Queries and Relevance. To generate a set of reasonable multi-queries, we employ the UQV100 collection [18]. UQV100 provides a set of query variations across a set of 100 single-faceted topics,

Documents                  52,343,021
Topics                             100
Total queries                   10,835
Unique queries                   4,175
Mean queries per topic            90.1

Table 5.2: The ClueWeb12B and UQV100 resources used. Note that the queries were stopped and stemmed, reducing the number of distinct queries for each topic.

and relevance judgments for these topics on the ClueWeb12B corpus. In total, 10,835 queries were submitted by 263 crowdworkers, with 5,764 of these variations being unique. Due to using both stopping and stemming, this total is reduced to 4,175 unique query variations in our experiments. Table 5.2 summarizes some basic statistics concerning the test collection. Ranking was done using the BM25 similarity function as outlined in Section 2.2, and CombSUM [95] is used as the fusion algorithm. Where search effectiveness is measured, we employ two early-precision metrics, namely NDCG@10, and RBP with φ = 0.8.

5.4 PRELIMINARY EXPERIMENTS

5.4.1 NORMALIZATION AND EFFECTIVENESS

Our first experiment explores the question of whether score normalization has a positive impact on the effectiveness of CombSUM fusion. Earlier, we argued that score normalization is not required in our multi-query processing algorithms because the underlying similarity function, BM25, is the same across all ranked lists. However, consider the BM25 computation as shown in Section 2.2. Since the computation of BM25 is additive across each term in the query, the BM25 function is unbounded, meaning that BM25 can give infinitely large scores as the length of the query tends to infinity. This observation means that, in general, longer queries will have a larger contribution to the CombSUM ranking, as long queries are more likely to result in documents with higher scores. A toy example that exhibits this problem is presented in Table 5.3. In order to test this conjecture, we select a number of pairs of queries from each multi-query, fuse them together using CombSUM both with and without normalizing the individual results lists first, and then record the effectiveness of the fused list. Normalization was conducted using the simple MinMax scaling approach as shown in Equation 5.4 (Section 5.1.1). Pairs of queries are formed by randomly selecting one of the shortest and one of the longest unseen queries. After each iteration, the next-shortest (and next-longest) unseen queries are added to the fusion operation from the previous step, without replacement. Table 5.4 shows the results. As the number of pairs increases, the results improve irrespective of whether normalization was applied. However, we do note that for all cases other than when just a single pair of queries were fused, normalizing the input runs results in improved effectiveness. Furthermore, the uncertainty in the ranking is lower when normalization is applied, which indicates that there is less noise in

        Ranked list r1      Ranked list r2      Fused list r′
Rank    DocID   Score       DocID   Score       DocID   Score
1       A       2.8         B       9.2         B       11.8
2       B       2.6         D       7.3         D       9.4
3       C       2.4         A       6.1         A       8.9
4       D       2.1         C       4.1         C       6.5

1       A       1.00        B       1.00        B       1.71
2       B       0.71        D       0.63        A       1.39
3       C       0.43        A       0.39        D       0.63
4       D       0.00        C       0.00        C       0.43

Table 5.3: An example of the CombSUM rank fusion procedure across two ranked lists, r1 and r2, showing the resultant fused list r′. On the top, no normalization is applied. On the bottom, the same input lists are MinMax scaled prior to fusion. In this example, the two final rankings differ, since r2 contributes more to the final ranking than r1 when normalization is not applied.
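To make the procedure in Table 5.3 concrete, the sketch below reproduces the toy example with a simple CombSUM implementation, with and without MinMax scaling. It is an illustrative sketch only (dictionary-based runs and hypothetical function names), not the code used in the experiments.

```python
def minmax(run):
    """MinMax-scale the scores of a run (a dict of document -> score)."""
    lo, hi = min(run.values()), max(run.values())
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}

def combsum(runs):
    """Fuse runs by summing scores; return documents sorted by fused score."""
    fused = {}
    for run in runs:
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + score
    return sorted(fused.items(), key=lambda item: -item[1])

r1 = {"A": 2.8, "B": 2.6, "C": 2.4, "D": 2.1}
r2 = {"B": 9.2, "D": 7.3, "A": 6.1, "C": 4.1}

print(combsum([r1, r2]))                  # B, D, A, C (top half of Table 5.3)
print(combsum([minmax(r1), minmax(r2)]))  # B, A, D, C (bottom half of Table 5.3)
```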

Thus, it seems that query length may bias CombSUM if score normalization is not considered. However, we note that the effect observed on the UQV100 collection is not very large, most likely because the lengths of individual query variations for each topic are quite uniform. Hence, the bias is likely to be higher when there is more variance in the length of the queries within each multi-query. Since we do not have data to analyze this phenomenon, we leave further investigation for future work.

5.4.2 QUERY LOG CACHING EFFECTS

Since many of the query variations contain similar terms, there may be caching effects which artificially improve query processing efficiency. Thus, before evaluating the efficiency of the proposed algorithms, we first wish to quantify the impact of the order of the input queries to determine if we observe any cache effects that may skew our results. To do so, we took the UQV100 query log and generated three permutations: the default order of the log, a random ordering of the log, and a sorted version, where we lexicographically sort all query variations. Next, we process the queries, one at a time, using the VBMW algorithm, for various values of k. Each query batch is run three times, and the per-query average is taken. The results are shown in Figure 5.6. Although we observed some slight efficiency improvements for the default and sorted query logs, these improvements were negligible with respect to the total processing time. For example, the largest difference in mean latency observed was 7ms. Similar results were observed for the SAAT traversal algorithms. Thus, we conclude that the query ordering will not impact our findings going forward.

         NDCG@10                     RBP φ = 0.8
Pairs    Unnormalized  Normalized    Unnormalized     Normalized
1        0.178         0.168         0.406 (0.133)    0.364 (0.129)
2        0.213         0.226         0.491 (0.089)    0.505 (0.042)
3        0.229         0.243         0.521 (0.063)    0.547 (0.031)
4        0.239         0.256         0.521 (0.051)    0.554 (0.020)
5        0.234         0.259         0.518 (0.049)    0.550 (0.018)
10       0.248         0.268         0.535 (0.037)    0.569 (0.007)
All      0.263         0.277         0.547 (0.015)    0.569 (0.005)

Table 5.4: CombSUM fusion effectiveness with and without score normalization. The values in parentheses are RBP residuals, recording the maximum extent of the RBP score uncertainty.

Figure 5.6: The efficiency of query processing with respect to the query log order, for various values of k. Caching of terms has a very small impact on efficiency.

5.5 EXPERIMENTS

5.5.1 REAL-TIME FUSION PERFORMANCE

Our first main experiment aims to examine how efficiently the proposed algorithms can process multi-queries. We take all of the query variations for each of the 100 topics in the UQV100 collection, and generate 100 multi-queries, each with an average size of |Q| = 42 (where Q is the set of unique variations for a topic). Then, we measure the cost of computing the fused result set, r′, to a depth of 100 documents. In order to compute r′, the representative results list from each q ∈ Q is generated to a depth of k = 1,000 documents and then fused using CombSUM (except in the case of the SP-CS algorithms, which compute the fusion as part of the index traversal).


Figure 5.7: Efficiency of rank fusion, assuming that the starting point is a multi-query. The panes show the total cost in CPU cycles; total cost in terms of postings scored; and the query latency, in seconds.

Three performance indicators are measured and reported: the number of CPU cycles consumed; the number of postings scored; and the response latency. In the case of the parallel approaches, we compute the total CPU cycles and the postings scored by taking the sum of these values across each thread. We also ignore the CPU cost associated with managing the parallel processes, such as any fork(), join(), or lock() operations. Figure 5.7 shows the results across each measure. The first pane, on the top left, shows the total CPU time required for each of the algorithms. The SP-Exhaustive approach exhibits a larger CPU cost than all other approaches, indicating that dynamic pruning or early termination is a highly important aspect of efficiently processing multi-queries. The PF methods are generally more expensive than the SP-CS methods, with SP-CS-MaxScore the most cost effective DAAT approach, and SP-CS-JASS-A the most cost effective SAAT method. Interestingly, using MaxScore in the SP-CS framework yields the most cost effective DAAT solution, whereas using MaxScore in the PF framework results in the least cost effective DAAT solution. We explain this phenomenon shortly. Unsurprisingly, both approximate SAAT approaches outperform their exhaustive counterparts.

The second pane, on the top right, shows the number of postings scored. Clearly, using VBMW within either the PF or SP-CS algorithms results in a significantly lower number of postings evaluated. MaxScore, although the most cost effective DAAT approach (with respect to CPU cycles within each processing family), actually scores more postings than the respective WAND or VBMW algorithms. The explanation for this is that the WAND and VBMW algorithms reduce the number of postings scored at the expense of more non-posting related bookkeeping, which becomes expensive when processing multi-queries. For the SAAT methods, the aggressive JASS approaches process only 10 million postings per query. However, since the PF approaches process each query individually, the postings cost is much higher for PF-JASS-A than for the corresponding SP-CS-JASS-A approach. The redundant processing of postings can also be observed by comparing the exhaustive SAAT methods: PF-JASS-E processes every postings list for every term in the multi-query, whereas SP-CS-JASS-E processes every postings list for every unique term in the multi-query.

Finally, the third pane (bottom left) shows the elapsed wall-clock time. Generally, the PF approaches are the fastest, due to their extensive use of parallel processing. Even though each multi-query has its execution time dictated by the slowest-running execution path, execution is fast on average, especially considering the workload undertaken when processing such large multi-queries. However, this efficiency comes at a large CPU cost, as shown in the first pane. Again, both of the aggressive JASS algorithms outperform the exhaustive versions. We note, however, that these aggressive approaches may lose some effectiveness with respect to the ground truth ranking, since they rely on early termination. We explore this trade-off further in Section 5.5.3. Interestingly, we see that deploying MaxScore within the SP-CS framework results in a faster execution time than using either WAND or the state-of-the-art VBMW algorithm. This can be explained by the bookkeeping undertaken by WAND-like algorithms. In particular, WAND-like algorithms sort their respective postings lists by the document identifiers underneath each cursor before each round of pivot selection (Section 3.2.4). When processing multi-queries, these sort operations become very expensive, especially as the number of postings lists increases. On the other hand, the MaxScore algorithm does not require the postings cursors to be sorted; instead, it partitions the query terms into essential and non-essential lists for pivot selection (Section 3.2.4). This aligns with recent work that showed the performance of MaxScore to be better than the state-of-the-art VBMW method when result sets are large, or queries are long [175].
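To illustrate the bookkeeping difference described above, the following sketch shows the essential/non-essential split that MaxScore performs: given per-term score upper bounds and the current heap threshold, the terms whose combined upper bounds cannot alone exceed the threshold are marked non-essential. This partition is only recomputed when the threshold changes, in contrast to the per-pivot cursor sorting performed by WAND-style methods. The snippet is a simplified sketch with assumed inputs, not the implementation used in the experiments.

```python
def partition_terms(upper_bounds, threshold):
    """Split query terms into (non-essential, essential) lists, MaxScore-style.

    Terms are ordered by ascending upper bound; the non-essential group is the
    longest prefix whose summed upper bounds cannot exceed the heap threshold.
    """
    ordered = sorted(upper_bounds.items(), key=lambda item: item[1])
    non_essential, prefix_sum = [], 0.0
    for term, bound in ordered:
        if prefix_sum + bound <= threshold:
            non_essential.append(term)
            prefix_sum += bound
        else:
            break
    essential = [term for term, _ in ordered[len(non_essential):]]
    return non_essential, essential

bounds = {"cat": 1.2, "video": 2.4, "funny": 3.1, "compilation": 4.0}
print(partition_terms(bounds, threshold=4.0))
# (['cat', 'video'], ['funny', 'compilation'])
```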

Algorithm            Mean       P50        P95        P99
PF-VBMW             168.9      132.9      466.2      654.4
PF-WAND             220.3      186.4      500.9      844.5
PF-MaxScore         243.4      210.0      545.1      854.8
PF-JASS-E           299.3      263.4      588.8      711.1
PF-JASS-A           140.8      133.1      210.2      223.4
SP-Exhaustive     7,083.8    6,716.1   14,678.9   18,824.3
SP-CS-VBMW          727.7      358.1    1,915.9    5,155.5
SP-CS-WAND          689.5      469.5    1,582.9    3,747.0
SP-CS-MaxScore      328.5      246.0      870.3    1,659.9
SP-CS-JASS-E      1,261.1    1,175.1    2,549.3    2,893.7
SP-CS-JASS-A        202.4      195.2      258.8      283.7

Table 5.5: Summary of the latency of the multi-query processing algorithms, in milliseconds. The PF approaches have the best latency across the board, with PF-JASS-A being the best choice for ensuring a low tail latency.

5.5.2 TAIL LATENCY IN REAL-TIME FUSION

We now explore how the different algorithms compare when tail latency is considered. We refer to the service level agreements (SLAs) discussed in Section 4.3.4 as a guide for acceptable tail latency [119, 129, 170, 265]. Those SLAs were set to be realistic, and provide a strict time budget on the tail latency of query processing:

• 95th percentile (P95) latency ≤ 200 ms, and

• 99th percentile (P99) latency ≤ 250 ms.
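Checking a system against these targets only requires computing percentiles over the observed per-query latencies, for example with numpy as sketched below; the latency sample here is synthetic and purely illustrative.

```python
import numpy as np

# Synthetic per-query latencies in milliseconds (illustrative only).
rng = np.random.default_rng(seed=1)
latencies = rng.gamma(shape=2.0, scale=60.0, size=10_000)

p95, p99 = np.percentile(latencies, [95, 99])
print(f"P95 = {p95:.1f} ms -> SLA (200 ms) {'met' if p95 <= 200 else 'violated'}")
print(f"P99 = {p99:.1f} ms -> SLA (250 ms) {'met' if p99 <= 250 else 'violated'}")
```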

For convenience, we tabulate the mean, median (P50), P95, and P99 latency in Table 5.5. Of all the tested algorithms, the PF approaches retain their advantage with respect to latency, with the fastest values across the board. The PF-JASS-A approach is the fastest of all the proposed multi-query processing methods, with a 95th percentile latency of 210ms and a 99th percentile

latency of 223ms. Although this approach violates the proposed P95 SLA by around 10ms, it is likely to be fast enough for consideration in real-time search scenarios. One way in which this approach could be made more efficient is to further reduce the number of postings scored, at a potential loss in effectiveness. We also remind the reader that the PF algorithms use a large number of threads, and hence, CPU cycles. On the other hand, the single-threaded SP-CS-JASS-A method can deliver results within 260ms at the 95th percentile, and 285ms at the 99th percentile, while using an order of magnitude fewer CPU resources than the PF methods.

No. Variations    NDCG@10    RBP φ = 0.8
1                 0.212      0.502 (0.033)
2                 0.239      0.541 (0.010)
5                 0.247      0.536 (0.017)
10                0.263      0.548 (0.021)
20                0.268      0.550 (0.014)
50                0.267      0.547 (0.014)
All               0.263      0.547 (0.015)

Table 5.6: Fusion effectiveness as the number of query variants is increased. Variants are selected from the query clusters such that the most commonly submitted variations are taken first. The values in parentheses are RBP residuals, recording the maximum extent of the RBP score uncertainty.

5.5.3 EFFICIENCY AND EFFECTIVENESS TRADE-OFFS

As seen in the previous experiment, the DAAT approaches are competitive with respect to both mean and median latency, but do not have the same control over tail latency that the SAAT approaches have. However, the fastest SAAT approaches sacrifice some effectiveness for efficiency. Furthermore, the previous experiment considers processing multi-queries made up of all available variations, whereas efficiency may be improved by processing multi-queries made of just a subset of the available variations. To explore these subtle trade-offs further, we generate a new set of multi-queries. Following the methodology of Bailey et al. [19], we generate a multi-query from the most popular (up to) v submitted variations for each topic, with v ∈ {1, 2, 5, 10, 20, 50, all}. We use this set of multi-queries in the following two experiments.
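A minimal sketch of this multi-query construction is shown below, assuming the raw (possibly repeated) variations submitted for a topic are available as a list of strings; all names here are assumptions for illustration.

```python
from collections import Counter

def build_multi_query(variations, v):
    """Return the (up to) v most frequently submitted variations for a topic."""
    return [query for query, _ in Counter(variations).most_common(v)]

topic_variations = ["cheap flights", "cheap flights", "budget airfare",
                    "low cost flights", "cheap flights", "budget airfare"]
print(build_multi_query(topic_variations, v=2))
# ['cheap flights', 'budget airfare']
```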

Multi-Query Size vs Effectiveness. First, we examine the impact on effectiveness as the number of queries included in each multi-query is increased. For each of the multi-query sizes, we computed the CombSUM rankings using the rank safe SP-CS method. Table 5.6 shows the results. Similar to prior literature [19, 21], we observed that adding variations improves effectiveness, but the improvements diminish as more variations are included in the fusion.

Efficiency vs Effectiveness. Since effectiveness improvements diminish as variations are added, we are interested in how the proposed systems differ in both efficiency and effectiveness as the number of input variations increases. In particular, it seems that by processing a smaller multi-query, the various methods may achieve improved efficiency without losing considerable effectiveness. To examine this trade-off, we plot the median, 95th, and 99th percentile latency with respect to the resultant NDCG@10 score for the proposed systems as the size of the multi-query is increased. Figure 5.8 shows the results, plotting only the most competitive systems for clarity. While the DAAT methods (SP-CS-VBMW, SP-CS-MaxScore and PF-VBMW) are efficient when the size of the multi-query is small, they do not scale as effectively as the SAAT methods.


Figure 5.8: Efficiency/effectiveness trade-off for various multi-query processing algorithms. Time is shown for the median (P50), 95th percentile (P95) and 99th percentile (P99) latency, and effectiveness is calculated with NDCG@10. SLAs are denoted by horizontal dashed lines.

This is intuitive, as the aggressive JASS-based systems specify a hard limit on the cost of processing. Indeed, the PF-JASS-A approach gives the best trade-off with respect to efficiency and effectiveness. However, there are circumstances where using multi-core processing is not desirable. For example, when query load is high, there are unlikely to be sufficient resources to facilitate processing a query on multiple threads [34, 117]. In these circumstances, the SP-CS-JASS-A method is the best choice. Using only a single thread, this method can still achieve a competitive tail latency, while matching the effectiveness of the rank safe methods.

5.6 DISCUSSION AND CONCLUSION

In this chapter, we investigated three approaches for solving the problem of efficiently processing multi-queries, whereby a set of query variations must be processed efficiently with their results being fused into a single ranked list. The PF and SP-Exhaustive methods can be applied to any rank fusion algorithm, but are more costly than the SP-CS approaches. However, the cost improvements from SP-CS come with constraints on both the ranker and the fusion method, which must be additive. We found that the PF methods use almost an order of magnitude more CPU cycles than the respective SP-CS methods, making them less appealing when the computing cost is considered. On the other hand, the PF algorithms shine when keeping latency low is important, with the best representative processing multi-queries within reasonable time bounds for real-time search, thereby answering RQ2.1 and confirming RQ2.2. The most efficient approaches were those that used SAAT index traversal with early termination. These methods were always the fastest, and had a comparable effectiveness to the rank safe approaches, making them difficult to beat in practice, thus answering RQ2. We also confirmed that processing more query variations generally leads to improved effectiveness, and showed that score normalization can improve the performance of CombSUM fusion. There are many avenues for future work on this topic, some of which are already being explored. For example, Catena and Tonellotto [51] focus on alternative strategies for condensing multi-queries into representations that are efficient to process. In particular, they focus on query factoring, which involves converting the input multi-query into a more compact yet equivalent representation, which can be used to process sub-queries in conjunctive mode, and fuse the results in disjunctive mode. Another interesting extension to the problem of processing multi-queries is how the fused output list can be used to improve ad-hoc ranking beyond returning just the fused results list. Benham et al. [27] focus on this problem, assuming that the multi-queries are processed offline and cached in a suitable data structure, and propose a number of novel solutions that blend fused results lists with a list that is more closely related to the direct information need of the user. Another related line of work on evaluating complex Boolean expressions [93] may provide new techniques for fusion over query variations. Other open questions include how to select the most effective query variations when building multi-queries, how normalization can be included in the fusion of multi-queries, and how to optimally schedule the processing of a multi-query across many threads.

6 REDUCING TAIL LATENCY VIA QUERY DRIVEN ALGORITHM SELECTION

The competing goals of maximizing both the efficiency and the effectiveness of large-scale IR systems continue to challenge both academic and industry search practitioners. So far, we have explored the causes of tail latency, and methods of improving latency, in the candidate generation stage of cascaded search architectures for both singular bag-of-words queries (Chapter 4) and sets of related queries (Chapter 5). However, these studies ignored the subsequent cost of both feature generation and re-ranking, which are important (and possibly expensive) components of the canonical two-stage retrieval system [252]. In fact, most works that study multi-stage retrieval assume that the candidate generation process will generate a large, fixed number of candidates, irrespective of the input query itself. Recently, Culpepper et al. [75] made the observation that prior works have mostly focused on optimizing the size of the candidate pool, k, on a global basis. However, differences in query difficulty and information needs result in optimal values of k that vary on a query-by-query basis. Furthermore, the value of k directly impacts efficiency in all stages of the retrieval pipeline: as k increases, dynamic pruning algorithms are less effective [68, 196], features must be extracted for more documents [12, 13], and the LtR ranker must be applied on more feature vectors. However, the value of k also impacts the effectiveness of multi-stage retrieval: a larger value of k means that the recall of the candidate generation stage is potentially increased, which could give the LtR ranker more relevant documents to promote to the top of the final SERP. So, there is a delicate trade-off between efficiency and effectiveness in cascade rankers. In this chapter, we focus on the problem of predicting the cutoff depth k for the candidate generation stage of a multi-stage retrieval system. Extending the work of Culpepper et al. [75], we cast this as a regression problem, and show how a reference list approach [61] can be used to accurately label training examples for building effective regression models. We then show how these ideas can be used for prediction of other efficiency/effectiveness parameters such as ρ, the aggression parameter for early-termination SAAT algorithms. Finally, we exploit these ideas together to form a hybrid processing pipeline which aims to improve both the tail latency of the


first phase candidate generation stage while simultaneously minimizing the value of k, without significantly impacting effectiveness. Experiments over the ClueWeb09B collection validate the efficacy of these methods.

Motivation. In most prior work on cascaded retrieval, the candidate generation stage has been assumed to generate a fixed number of candidate documents for all input queries. Macdonald et al. [164] outlined that a number of earlier works have suggested different values for k:

“There is a great variation in the literature and existing test collections about how many documents should be re-ranked when learning models or deploying previously learned models, with large sample sizes such as “tens of thousands” [56], 5,000 [70], 1,000 [205], as well as small samples such as 200 [270] or even 20 [54] observed.”

Furthermore, Macdonald et al. showed that small candidate samples can result in significant effectiveness degradations in the final re-ranked list, as the recall base is not sufficient. Consider the following motivating example, where we are trying to optimize our end-to-end pipeline to minimize cost but maximize effectiveness. Let us define our target effectiveness metric M as

M(r) = (g(d1) + g(d2)) / 2,

where g(dn) is the utility of the nth document in ranked list r, and

g(d) = 1,    if d is highly relevant
       0.5,  if d is marginally relevant
       0,    otherwise.

Let the cost C of generating and re-ranking a set of candidates be defined as

C(r) = |r|.

Furthermore, imagine that we had a perfect method of re-ranking documents, such that any input

ranked list r could be permuted to create the best possible ranking, r′, with respect to M. Now consider Figure 6.1 (a), which shows the candidate generation and re-ranked results for three different cutoff selection methods across three queries. The first approach uses a deep candidate pool for re-ranking. The second uses a shallow pool. The third method, which we advocate for in this chapter, selects the pool depth dynamically. Figure 6.1 (b) shows the subsequent effectiveness and cost achieved by each system, assuming that the candidate lists were re-ranked by the perfect re-ranker as described above. This example demonstrates the merit of predicting the k cutoff on a per-query basis; fixed samples that are large yield effective rankings but waste resources, and fixed samples that are small are cheap but result in a loss of effectiveness.

(a) Candidate generation of three systems (Deep, Shallow, Dynamic) over the same three queries, followed by the re-ranked lists (as re-ranked by a perfect LtR system). Green documents are highly relevant, yellow documents are marginally relevant, and red documents are not relevant.

System      Average M                   Average C
Deep        (1 + 1 + 1)/3 = 1           (7 + 7 + 7)/3 = 7
Shallow     (1 + 0.5 + 0.75)/3 = 0.75   (3 + 3 + 3)/3 = 3
Dynamic     (1 + 1 + 1)/3 = 1           (3 + 5 + 4)/3 = 4

(b) The average effectiveness and cost of the three systems.

Figure 6.1: A toy example of three different methods for selecting the k cutoff in a multi-stage retrieval system over three queries. Dynamic, which uses dynamic cutoff prediction, provides improved cost over Deep without a reduction in effectiveness. It also outperforms Shallow with respect to effectiveness, while only costing slightly more on average.
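The toy metric and cost model used in Figure 6.1 can be expressed directly in code. The relevance labels and ranked list below are hypothetical and serve only to show how M and C are computed; they are not the judgments underlying Figure 6.1.

```python
def gain(doc, qrels):
    """g(d): 1 for highly relevant, 0.5 for marginally relevant, 0 otherwise."""
    return {"high": 1.0, "marginal": 0.5}.get(qrels.get(doc), 0.0)

def effectiveness(ranked, qrels):
    """Toy metric M: the mean gain of the top two ranked documents."""
    return (gain(ranked[0], qrels) + gain(ranked[1], qrels)) / 2

def cost(ranked):
    """Toy cost C: the number of candidates generated and re-ranked."""
    return len(ranked)

qrels = {"A": "high", "E": "high", "B": "marginal"}    # hypothetical judgments
reranked = ["A", "E", "B", "C", "D"]                   # output of the perfect re-ranker
print(effectiveness(reranked, qrels), cost(reranked))  # 1.0 5
```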

Problem Outline. The key problem being solved in this chapter is to predict, on a per-query basis, the number of documents that must be retrieved during candidate generation to satisfy a user, based on the output of a higher-order re-ranker. Given a bag-of-words query, q, and a

target evaluation metric M for the final ranked results list r′, we wish to predict the required number of candidate documents that must be retrieved, k, such that the value of k is minimized

while simultaneously maximizing the effectiveness of the re-ranked candidate set, M(r′). This implicitly minimizes the cost, C, of feature extraction and re-ranking via LtR. A concurrent goal is to minimize the tail latency of the candidate generation process.

Our Contributions. We study the broad problem of efficiently predicting performance critical parameters for the candidate generation phase in a multi-stage retrieval system. To this end, we study the following primary research question:

RQ3: How can we predict performance sensitive parameters on a query-by-query basis to minimize tail latency during candidate generation?

We also focus on the following sub-questions:

RQ3.1: What is the best way to use reference lists to accurately perform dynamic parameter predictions in early-stage retrieval?

RQ3.2: How can we build predictors that minimize effectiveness loss while also minimizing cost?

RQ3.3: How can our prediction framework be used to optimize tail latency during candidate generation?

In order to build efficient and effective prediction models without relevance judgments or human annotations, we employ a reference list training framework that can be used to automatically generate training examples. Using this method, we show that models can be trained to effectively predict performance parameters that impact both efficiency and effectiveness on a query-by-query basis (RQ3.1). In order to achieve savings without negatively impacting result effectiveness, we employ a quantile regression method which allows us to generate risk-averse predictions (RQ3.2). We then apply these models to the candidate generation phase of a multi-stage retrieval architecture, and show how we can reduce the tail latency of this phase, while also implicitly improving the cost of downstream stages. In order to achieve these improvements, we experiment with a hybrid query processing architecture, where a broker node can decide whether to process the incoming query on a document-ordered or impact-ordered index. The advantage of this approach is that it allows us to achieve the best of both worlds: processing queries that require few answers via efficient DAAT algorithms, and processing those that require many answers via early-termination SAAT algorithms (RQ3.3). Our contributions are as follows:

• A unified framework for training and predicting performance-sensitive parameters for multi-stage retrieval systems.

• A highly tunable search architecture that leverages known performance characteristics to provide an improved efficiency/effectiveness trade-off for candidate generation in multi-stage retrieval.

• A general approach for allowing more fine-tuned optimization techniques that can be deployed on a query-by-query basis, and the tools necessary to implement and test systems that leverage this approach.

Finally, we discuss some of the nuances of our findings, outline other applications of the ideas explored in this chapter, and provide some future directions for balancing trade-offs in large-scale IR systems.

6.1 RELATED WORK

The problem explored in this chapter involves end-to-end processing on a two-stage retrieval system, where a candidate generation stage identifies and returns a set of likely relevant documents, which are then re-ranked via a more expensive LtR model. More information on index architecture and query processing can be found in Section 2.4 and Chapter 3. Here, we discuss aspects of query performance prediction, evaluation without ground truth labels, and training effective predictors.

6.1.1 QUERY PERFORMANCE PREDICTION

Query performance prediction (QPP) is a classic problem within information retrieval which involves estimating effectiveness without having any relevance judgments [45]. As discussed by Raiber and Kurland [206], there are three major lines of work on QPP. The first involves generating pre-retrieval prediction methods that rely on only the input query and features of the underlying document corpus [3, 110, 111, 238]. The second line of inquiry allows the retrieved results list to be used for prediction, resulting in a richer feature space and enhanced methods of QPP [47, 73, 112, 214, 226, 271]. The third and final broad line of work focuses on the actual prediction framework itself, with various formalizations of the problem being explored, including probabilistic frameworks [140], distance-based measures [47], and neural approaches [266]. Recently, ideas from QPP have been adapted to other prediction tasks in IR, such as predicting the efficiency of a given query [162], which is analogous to the problem we study in this chapter. Since these types of prediction must be done efficiently, we focus on the ideas related to pre-retrieval QPP in this chapter.

Motivation. Variance in the robustness of search quality (and indeed, efficiency) can make it difficult for search systems to provide good results consistently [17, 19, 106]. Even high performing retrieval systems occasionally fail at returning relevant results for certain queries [23, 86, 87]. Hence, it is useful for a system to be able to estimate how well it will perform for a given query before actually running it, as this would allow the system to trigger alternative actions depending on the outcome of the prediction. For example, if the predicted quality is low, the system could generate a new query that is expected to perform better than the original query, use an alternative ranking method, or add certain cards to the SERP [121], resulting in a better user experience.

Common Features. Since we are interested in pre-retrieval prediction, features must be derived from the collection or query itself rather than results lists arising from a retrieval run (as is the focus for post-retrieval QPP). To this end, many studies have discussed a wide range of simple features that can be generated based only on query terms and the underlying document corpus [3, 45, 111, 238]. Hauff et al. [110] conducted a small survey of such features, showing that the most powerful features depend on the given collection and the underlying ranker. Furthermore, these features are generally cheap, as they can be pre-computed offline for each term, and can be combined as a query is received.

Effectiveness Prediction. The main task in QPP involves predicting how effective the result for a given query q over a corpus D will be. Most prior works have focused on creating pre-retrieval QPP features and evaluating their predictive power, as discussed above. A major disadvantage of pre-retrieval QPP models is that they are unable to take the retrieval method into consideration when making a prediction, as no information about the ranked list is available. To this end, post-retrieval QPP has been shown to make more robust predictions [45]. Recent work from Thomas et al. [238] suggests that pre-retrieval query performance predictors are actually correlated with topic effects, suggesting that they operate as topic difficulty predictors. They argue that pre-retrieval predictors are essentially not useful for estimating query difficulty, and perform only marginally better than a random guess. Zendel et al. [268] extend this idea further, showing how QPP can be improved by considering a set of possible queries that relate to the same information need, thus capturing the complexities of an information need. This is still an ongoing line of research. In other recent work, the idea of using pre-retrieval features beyond difficulty prediction has shown promise. We discuss some of these ideas shortly.

Applications of Pre-Retrieval QPP. Given a well performing QPP model, various actions can be triggered to improve the search experience. One such application of QPP is the notion of selective query expansion; pseudo-relevance feedback has been shown to improve the majority of queries, but significantly degrade the effectiveness of others [64]. QPP models have been exploited to selectively expand queries which are deemed to be improved via expansion [3, 113]. Similar methods have been explored for deciding when to interact with the user [138], selecting search engines in the context of meta-search [258], and selecting the best query variation to use for retrieval [221, 238]. The key theme in all of these approaches is selectivity, where an alternative method can be used depending on the outcome of the prediction. We refer the reader with further interest in QPP to the tutorial and subsequent synthesis lecture from Carmel and Yom-Tov [45, 46], and the Ph.D. dissertation of Hauff [109].

6.1.2 QUERY EFFICIENCY PREDICTION

A similar problem to QPP is that of predicting the latency of a given query. In simple search engines, efficiency prediction can be quite trivial. For example, Moffat et al. [185] state that

“The two chief determinants of system load in a text query evaluation engine are the number of terms to be processed, and the length of each term’s inverted lists.”

This has proven true for SAAT systems, where processing efficiency is correlated with the number of postings that are considered during processing, making efficiency prediction straightforward in practice [148, 170]. However, modern dynamic pruning algorithms, which use advanced processing mechanics, are less predictable [162, 239].

Latency Prediction for Improving Efficiency. Recent work has explored how the general ideas behind QPP can be used for latency prediction, especially for more advanced processing schemes. Tonellotto et al. [239] were the first to explore query efficiency predictors, which are inexpensive, pre-retrieval features. They show that their predictors are very effective across all query lengths, but only when combined using linear regression. A follow-on study shows how these predictors can be leveraged for improved scheduling within a large-scale IR system [162]. Other work focuses on reducing the tail latency of candidate generation in large-scale IR systems. Jeon et al. [119] employ selective parallelization to accelerate queries that are predicted to exceed a given time budget. Similar work was conducted by Kim et al. [129], who targeted extreme tail latency (at the 99.99th percentile). They showed that dynamic prediction features can improve latency prediction. These features are collected by processing a query for a small, fixed period of time (such as 5ms). During this time, features which depend on the actual query processing algorithm, such as how far through the candidate postings lists the cursors are, are collected. Since these features provide a deeper insight into the progress of query processing, the subsequent predictions were shown to be more accurate than using simple pre-retrieval features. Similar ideas have been applied to conjunctive matching, where the fastest conjunctive algorithm can be selected on a query-by-query basis through cost modeling [130].

Latency Prediction for Improving Efficiency/Effectiveness Trade-offs. Broccolo et al. [34] examine a range of efficiency/effectiveness trade-offs that arise from processing queries with different levels of aggression, based on pre-defined time budgets. A simple prediction model is used to decide which instantiation of the index traversal algorithm is used to process the candidate query. Their work examines the real-world phenomenon of peak demand periods, where the workload of the search engine is much higher at particular times of the day. By trading off effectiveness for efficiency, they show that improved trade-offs can be made during peak loads without the need to scale up the search architecture. Another related work attempts to balance the efficiency/effectiveness trade-off in cascaded rankers by predicting either effectiveness or latency before query processing, and selecting the parameters of the retrieval algorithm based on such predictions [240]. In particular, different configurations of WAND are explored, which are selected from a pre-defined set of WAND instances. Each instance has a different value of either F (the aggression parameter) or k (the number of candidate documents to retrieve), and the instance that is deployed is based on a prior efficiency or effectiveness prediction on a query-by-query basis. The authors found that using the

predicted efficiency to bound the latency had a better trade-off than using the estimated query difficulty to select the run-time configuration.

6.1.3 OPTIMIZING RANKING CASCADES

The problem of optimizing ranking cascades has received a lot of attention in recent years. Since cascade ranking involves a sequence of stages, different approaches for optimizing these stages have been explored. Here, we give a brief overview of each major area of focus.

Candidate Generation. Most of the fundamental work on efficient top-k retrieval directly translated to candidate generation, as the underlying goal is very similar. A few works focused directly on examining the underlying trade-offs within candidate generation. For example, efficiency and effectiveness trade-offs between index traversal strategies and Boolean matching criteria were one major area of focus [10, 13, 61, 162]. While there is a wealth of prior work on top-k retrieval and, in turn, optimizing candidate generation, this was discussed primarily in Chapter 4.

Feature Extraction. Once a set of candidate documents is identified, the features for each document must be extracted efficiently such that the candidates can be re-ranked by the LtR model. Depending on the nature and number of the features, feature extraction can be very expensive. However, many features can be pre-computed and stored within a fast-access data structure, which enables rapid look-ups. On the other hand, some features, such as query-dependent features, are too expensive to store, and must be computed at query time. As such, a few different methods for extracting features have been examined previously. Perhaps the most obvious way of generating query-dependent features is to re-traverse the inverted index for the documents in the top-k heap. However, this method is expensive, as it requires the candidate postings to be traversed again, potentially invoking decompression operations as candidates are accessed [241]. Macdonald et al. [165] outline how Terrier [161] generates features efficiently in what they call fattening. During candidate generation, a document that is admitted to the top-k heap has its matching postings cached within the heap. Thus, at the end of traversal, the term postings for all top-k candidate documents are available for feature computation. A disadvantage of this mechanism is that additional terms cannot be readily used, as only the matching terms from the query have their postings cached. An alternative method to the postings-based approaches is to use a forward index or document vector representation, where the contents of a document can be accessed efficiently. Asadi and Lin [12] examined these organizations, including those that store positional information to efficiently compute phrase- and window-based features [155]. The clear disadvantage of this approach is that both the inverted index and the forward index must be stored, which results in a much higher space occupancy than the previously discussed methods. However, forward indexes come with many advantages. Firstly, they facilitate much more efficient feature generation, as documents can be accessed directly. Secondly, terms other than those contained in the input query can be

used for feature generation, making them much more flexible (for example, if query rewriting were to take place after candidate generation). Thirdly, forward indexes can be used to facilitate other operations efficiently, such as field-based scoring [105] and relevance modeling [25].

Document Ranking. Once a set of candidate documents has been identified, and their features extracted, the learned model needs to re-rank them. The most common LtR models use a forest of trees to produce rankings based on decision trees, and a large amount of prior work has focused on how to make faster predictions from such models [14, 78, 158]. Other works have focused on early-exit optimizations [43] and tree pruning [11] for speeding up evaluation. While optimizing the evaluation of learned models is an interesting problem in its own right, it is not directly related to the problem presented in this chapter. We refer the reader to the survey from Tonellotto et al. [241] for a more detailed discussion on these innovations.

End-to-End Cost Reduction. We have briefly outlined a range of improvements that have been made for improving the efficiency of the core stages in a cascade ranker. However, there are some improvements that translate to savings across the entire ranking cascade. Since both feature extraction and ranking with higher-order models are non-trivial operations, reducing the workload of these operations can result in large savings. Culpepper et al. [75] address the problem of selecting the optimal value of k on a query-by-query basis in the context of a cascaded architecture. They argue that minimizing the value of k is of critical importance to multi-stage retrieval systems, as feature extraction and document ranking can be expensive. In order to predict a suitable value of k, they employ a classifier cascade that makes binary decisions on the sufficiency of a set of k values. Each value of k is tested, from small to large, and the value of k selected depends on the exit node of the classifier cascade [208]. To this end, the cascade could select values of k from a bag-of-values, and the classifiers within the cascade were trained to minimize the loss in end-to-end effectiveness. Hence, the cascade would minimize the value of k while also ensuring the results remain of high quality. The other way of reducing the cost of feature extraction (and in turn, document evaluation) is to utilize only a subset of the available features at a given stage of the ranking pipeline. This was the motivation for the seminal work of Wang et al. [252], who proposed the notion of cascade ranking. The key idea is that, as the system advances into a more expensive ranking stage, more expensive features should be used to rank fewer documents, thus balancing the tension between efficiency and effectiveness. More recent work on this problem incorporates the feature cost into the learned models to more accurately balance this trade-off [57, 102].

6.1.4 TRAINING MODELS WITH REFERENCE LISTS

A difficult problem that exists within many different contexts of information retrieval is how effective models can be trained in the absence of ground truth labels. While a breadth of related work has focused on using human interaction data such as click logs to label training instances [122], we focus only on the use of reference lists in this chapter.

List Comparison Methods. Before outlining the use of reference lists for generating labels, we must first examine approaches for comparing lists, as these form the basis of the labeling approach. Many list similarity measures have been explored in the literature, and they can be characterized into different categories which are suitable for different tasks. As described by Webber et al. [256], list similarity measures can be unweighted or top-weighted, and may require conjointness or may support non-conjoint rankings.

Unweighted comparison approaches assume that the elements within the lists being compared are all equally important, whereas top-weighted methods assign a higher weight to elements occurring in earlier ranks. Indeed, the notion of weight is important to ranked retrieval, as users typically examine documents in a top-down approach when presented with a results page. Conjoint comparison methods assume that the lists being compared contain the same elements. However, there are many cases where the input rankings are non-conjoint, and must be handled accordingly. In particular, ranked lists arising from search engines are typically truncated to depth k, which means the lists are often non-conjoint. To aid the following discussion, assume that we

have two ranked input lists, rA and rB , and that we wish to compare the similarity between these lists.

An elementary method for computing the difference between rA and rB is to leverage Overlap:

Overlap(rA, rB) = |rA ∩ rB| / |rA ∪ rB|.   (6.1)

However, Overlap is an unweighted measure, and therefore does not account for the fact that documents appearing higher in the ranking generally have higher importance, as these are the first documents observed by the user.

Webber et al. [256] proposed rank-biased overlap (RBO), motivated by the lack of appropriate list similarity methods for ranked lists that are commonly found in IR tasks. In particular, RBO computes a top-weighted overlap between two possibly non-conjoint sequences, providing a much more robust way to measure list similarity for ranked retrieval. In order to set the weight for each position in the ranked list, a geometric probability distribution is assumed, which captures the patience of a user. The parameter φ is the probability that the user will continue to view the next position, so values of φ close to 1 represent a patient user (who will go deep in the ranking), whereas smaller values of φ represent an impatient user who assigns a very high weight to very few highly ranked documents. RBO can be computed as:

RBO(rA, rB) = (1 − φ) Σ_{i=1}^{∞} φ^{i−1} · |prefix(rA, i) ∩ prefix(rB, i)| / i.   (6.2)
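A minimal sketch of Equation 6.2 is given below, truncating the infinite sum at the evaluation depth of the two (possibly non-conjoint) lists. It omits the extrapolation that RBO applies to truncated rankings, so it is illustrative only.

```python
def rbo(list_a, list_b, phi=0.8, depth=None):
    """Truncated rank-biased overlap (Equation 6.2) between two ranked lists."""
    depth = depth or max(len(list_a), len(list_b))
    seen_a, seen_b, score = set(), set(), 0.0
    for i in range(1, depth + 1):
        if i <= len(list_a):
            seen_a.add(list_a[i - 1])
        if i <= len(list_b):
            seen_b.add(list_b[i - 1])
        agreement = len(seen_a & seen_b) / i
        score += (1 - phi) * phi ** (i - 1) * agreement
    return score

print(rbo([12, 71, 23, 9], [12, 23, 54, 7], phi=0.8))  # ~0.417 at depth 4
```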

Taking this idea one step further, Tan and Clarke [235] propose the maximized effectiveness difference (MED) family of rank similarity methods. Given an effectiveness metric M, and the two

ranked lists rA and rB, MED assigns a set of pseudo relevance judgments such that the effectiveness

difference between rA and rB is maximized, and returns this difference:

MED(rA, rB) = |M(rA) − M(rB)|.   (6.3)

Therefore, the smaller the value of MED between two lists, the less likely that a user would notice any effectiveness difference between such lists. Clarke et al. [61] further examined the correlation between MED and true relevance, and found that MED values below 0.20 usually indicated very little effectiveness difference between the input lists, especially for early precision metrics such as RBP with φ = 0.8. While initially only suitable for utility-based metrics, Moffat [179] outlined how MED can be extended to recall-based metrics such as AP and NDCG.

Measuring Effectiveness Loss. The main goal for the candidate generation stage of a multi-stage retrieval system is to generate a large set of possibly relevant documents. On the other hand, the primary goal for the complex ranker is to identify the relevant documents within the large set of candidates, and push them to the top of the ranked list, allowing the user to easily find them. Hence, there is a mismatch in goals between the stages of retrieval. This makes it difficult to evaluate each stage of the search process individually, as suitable metrics may require a large number of relevance annotations, which is expensive in practice. Consider the case of evaluating the effectiveness of the first stage of retrieval — recall is the most important aspect. However, measuring the true recall of a document set requires annotating the entire corpus for each information need, which is intractable in practice. A much more elegant approach was proposed by Clarke et al. [61]. They show how rank similarity metrics (such as MED) can be used to evaluate any arbitrary stage of a multi-stage retrieval system, which we outline now. Firstly, it is assumed that there exists a system that can produce an effective final ranked list for any input query. In the context of multi-stage rankers, this would be the machine-learned late-stage ranker, which uses a large number of higher-order features to accurately rank documents. The main criterion for such a ranker is that it is trusted to give good results, as we will see shortly. The output list from this ranker, denoted rB, is considered as a reference list or gold standard ranking for the given query. For example (the following example is inspired by that of Clarke et al. [61]), consider the ranking that arises from processing a query:

rB = ⟨12, 71, 23, 9, 54, 32, 7, 22, 42, 17, ...⟩.

Now, suppose we wanted to test a new candidate generation algorithm, which for the same query as above returns a candidate list of documents which contains the subsequence

rA = ⟨12, 23, 54, 7, 22, 42, 17, ...⟩.

Now imagine that we wished to evaluate the effectiveness of the final ranking using RBP with φ = 0.8. By the definition of MED and RBP, the maximum effectiveness difference between two input lists arises when common documents between the lists are considered non-relevant,


and documents that are in rB but not in rA are considered relevant. Note that we treat any document occurring in rA but not in rB as non-relevant. For our example above, this leads to

RBP(rB) = RBP(⟨0, 1, 0, 1, 0, 1, 0, 0, 0, 0⟩)
        = 0.2 (0.8^1 + 0.8^3 + 0.8^5)
        = 0.328

RBP(rA) = RBP(⟨0, 0, 0, 0, ..., 0⟩)
        = 0.

Hence, if the candidate set rA is used to generate the final ranked list rB, 0.328 is the upper bound of effectiveness loss that will be experienced with respect to RBP with φ = 0.8. As another example, consider a different candidate generation algorithm which generates the list rA, with rB ⊆ rA. In this case, the candidate generation phase identified every document in the final ranked list rB, resulting in no effectiveness loss, as

RBP(rB) = RBP(⟨0, 0, ..., 0⟩)
        = 0

RBP(rA) = RBP(⟨0, 0, ..., 0⟩)
        = 0.

Conversely, the worst case scenario arises where the candidate generation stage identifies no

documents that appear in the final ranked list, that is, rA ∩ rB = ∅. In this case, the value of MED-RBP will approach 1. Hence, this method allows us to evaluate the effectiveness loss that arises between any given stage and the expected final results list in a multi-stage retrieval system. Furthermore, this approach was shown to align well with true relevance judgments in practice [61].
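The worked example above can be checked with a short script: assign the MED pseudo-qrels (documents in rB that are missing from rA are relevant, everything else is not), and compute RBP for both lists. This is a simplified sketch of the MED-RBP bound for this specific example, not a general MED implementation.

```python
def rbp(ranked, relevant, phi=0.8):
    """Rank-biased precision over binary judgments with persistence phi."""
    return (1 - phi) * sum(phi ** i for i, doc in enumerate(ranked) if doc in relevant)

r_b = [12, 71, 23, 9, 54, 32, 7, 22, 42, 17]   # reference (gold-standard) list
r_a = [12, 23, 54, 7, 22, 42, 17]              # candidate list

relevant = set(r_b) - set(r_a)                 # MED pseudo-qrels: {71, 9, 32}
med_rbp = abs(rbp(r_b, relevant) - rbp(r_a, relevant))
print(round(med_rbp, 3))                       # 0.328
```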

Leveraging Reference Lists for Labeling. While MED was shown to be a useful tool for evaluating multi-stage retrieval systems in the absence of relevance labels, Culpepper et al. [75] took this notion even further and showed how reference lists can be used in conjunction with MED to annotate training data. Their task, which is studied in this chapter, was to predict the number of candidate documents that must be retrieved on a query-by-query basis in order to minimize cost in multi-stage retrieval systems. To label training examples, the authors used an initial candidate set,

rA, and a reference list, rB. The key idea was to find the shortest prefix of rA which, when evaluated against rB in terms of MED, yielded an acceptable margin of error (denoted as ε henceforth). Intuitively, the idea is to leverage MED to ask “how many documents from the candidate generation stage do I need to pass along to the final stage such that I will observe at most only a very small loss in effectiveness?” As such, the optimal value of k can be used as a training label.

6.2 TRAINING EFFECTIVE PREDICTION MODELS

In the previous section, we discussed a multitude of ideas from different areas of IR, including query performance prediction, query efficiency prediction, list similarity measures, and obtaining training labels in the absence of relevance annotations. Now, we show how these ideas can be leveraged, together, to form a unified framework that can be used to balance efficiency and effectiveness for multi-stage retrieval systems, while also reducing the tail latency of candidate generation.

6.2.1 GENERATING LABELS

The work of Culpepper et al. [75] outlined a novel method for using reference lists and MED to label training data without requiring relevance data or human annotation. However, they cast the problem of predicting k as a multi-label classification problem, which involved predicting k from a fixed set of 9 values. While this method was shown to work well in practice, we argue that predicting the exact value of k should provide even better cost savings, and hence consider it as a regression problem.

Building Effective Reference Lists. Before describing our prediction models, we first describe how we build effective reference lists. In the prior work of Clarke et al. [61] and Culpepper et al. [75], the gold-standard systems which were used to generate the final top-k reference list rankings were selected by taking the best performing runs from the respective TREC tasks. One problem with this approach is that the initial candidate generation stage may not completely align with the final stage runs, meaning that high MED values could be observed for some queries due only to confounding factors such as indexing differences. To avoid this problem, we build our own late-stage ranker, denoted as URisk-Ideal, which can be used to generate a reference list for any arbitrary query. Since our model needs to be robust, we train a risk-sensitive LambdaMART model [103] via a risk-sensitive NDCG@10 loss function. In particular, URisk [86, 87] with α = 1 was used to weight a degradation in effectiveness 2 times heavier than an improvement, resulting in a conservative model which is less likely to hurt the effectiveness of the initial candidate set when re-ranking it [253]. This is important since the model will be used to generate top-k rankings for queries without relevance judgments. The model uses around 150 standard features which were generated using the publicly available system described by Chen et al. [57] and Gallagher et al. [102],² and a description of the features can be found within that codebase. For training, we used 687 queries from the 2009 Million Query Track which contain shallow relevance judgments. As input, we retrieve the top 10,000 documents using the BM25 ranker. When deploying our model across the 2009 ClueWeb09B ad-hoc query topics (1–50), we observe significant effectiveness improvements over the BM25 baseline (Table 6.1).

² https://github.com/rmit-ir/tesserae

System         NDCG@10    ERR@10    RBP φ = 0.8
BM25           0.205      0.096     0.307 (0.173)
URisk-Ideal    0.310      0.135     0.425 (0.219)

Table 6.1: The effectiveness of the candidate generation BM25 stage with respect to the re-ranked LtR stage. LtR significantly outperforms the baseline for all metrics with p < 0.01 using a two-sided Student's t-test.

Building effective and robust gold-standard rankers is an interesting topic in its own right, but beyond the scope of the work presented here. Thus, we refer the interested reader to Liu [152] and to Macdonald et al. [164, 165] for further information on building a competitive LtR system.

Labeling for Regression. Now that we have a candidate set rA to depth k = 10,000 and a subsequent re-ranked reference list rB for each query, we can follow the method of Culpepper et al. [75] to compute the minimal prefix of rA such that

MED(sortedprefix(rA, k, rB), rB) ≤ ε,   (6.4)

where sortedprefix(rA, k, rB) returns the first k documents from rA, sorted with respect to rB. Note that the sorting operation ensures that the following MED computation gives the correct effectiveness difference, as the input list may be ordered differently to the reference list. When the problem is formulated as a multi-label classification problem, the value of Equation 6.4 can be computed by taking a prefix of the candidate list in ascending order of the fixed values of k, sorting it, and measuring MED until the desired effectiveness bound is met. However, our problem is more involved — we need to find the true optimal value of k, which is the smallest value of k that satisfies Equation 6.4. In order to do this efficiently, we iteratively increase the number of candidate documents that are considered by MED until we find the optimal value. While a binary search could be used for this process, our iterative method can cache the partial data from the previous MED computation, meaning that each subsequent MED computation is very cheap. Algorithm 6.1 shows this process.

6.2.2 TRAINING AND PREDICTION

Now that we have labels for our training examples, we will outline the different models that are trained for the prediction task, and the features used to make these predictions.

Features. A critical aspect of the features used for the given prediction tasks is that they are efficient to compute. This is because the prediction task occurs prior to any query processing, so any latency added during feature generation or prediction will have a direct impact on the query latency. As such, we opt to use a set of features that can be computed very efficiently based on pre-computed term and score statistics.

Algorithm 6.1: Finds the smallest value of k such that effectiveness loss is minimized between a candidate list and subsequent re-ranked list.

Input: A candidate list, rA, a reference list rB (the gold standard), and a desired MED threshold, ε.
Output: The smallest value of k such that MED(sortedprefix(rA, k, rB), rB) ≤ ε.
1  CurrentMED ← 1.0
2  k ← 0
3  CandidateSet ← ∅
4  while CurrentMED > ε do
5      k ← k + 1
6      CandidateSet ← sortedprefix(rA, k, rB)
7      CurrentMED ← MED(CandidateSet, rB)
8  end
9  return k
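A minimal Python rendering of Algorithm 6.1 is sketched below. It assumes a hypothetical med(list_a, list_b) callable that computes MED against the reference list; the caching of partial MED state described above is omitted for clarity, and a guard on the candidate list length is added to avoid looping forever if the bound cannot be met.

def sorted_prefix(r_a, k, r_b):
    # Return the first k documents of r_a, re-ordered by their rank in r_b.
    # Documents that do not appear in r_b are placed last, in their original order.
    rank_in_b = {doc: i for i, doc in enumerate(r_b)}
    return sorted(r_a[:k], key=lambda d: rank_in_b.get(d, len(r_b)))

def smallest_k(r_a, r_b, med, epsilon=0.001):
    # Algorithm 6.1: increase k until MED(sortedprefix(r_a, k, r_b), r_b) <= epsilon.
    current_med = 1.0
    k = 0
    while current_med > epsilon and k < len(r_a):
        k += 1
        candidate_set = sorted_prefix(r_a, k, r_b)
        current_med = med(candidate_set, r_b)
    return k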

These features have been used for similar prediction tasks, and have been shown to work well in practice [75, 119, 129, 187, 197]. Table 6.2 describes the features used. The left table reports the statistics that are stored in the feature database for each ranker. In our study, we deploy 7 common bag-of-words rankers, namely TF·IDF, BM25 [212], Query Likelihood [204], DPH, DFR, PL2, and Bose-Einstein [2]. The right table shows how these statistics are combined to create the final feature vector. In total, 147 features are used in our predictive models. While it is unlikely that all features are useful, we leave further ablation studies for future work.
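As a concrete illustration of how such a feature vector might be assembled from pre-computed statistics, consider the sketch below. The statistic names, the dictionary layout, and the exact grouping into 147 features are assumptions made for this example, not the precise implementation used in our experiments.

import statistics

def query_features(query_terms, term_stats, score_stats, rankers):
    # Assemble a QPP feature vector from pre-computed statistics (Table 6.2).
    # term_stats[t]        -> {'f_t': ..., 'F_t': ...}
    # score_stats[(r, t)]  -> {'max': ..., 'p25': ..., 'p50': ..., 'p75': ...,
    #                          'mean': ..., 'hmean': ..., 'var': ..., 'iqr': ...}
    x = []
    for r in rankers:
        per_term = [score_stats[(r, t)] for t in query_terms]
        # Arithmetic mean over query terms of statistics {3, 5, 7, 8, 9, 10}.
        for key in ('max', 'p50', 'mean', 'hmean', 'var', 'iqr'):
            x.append(statistics.mean(s[key] for s in per_term))
        # Maximum and minimum over query terms of statistics {3, 4, 5, 6, 7, 8, 9}.
        for key in ('max', 'p25', 'p50', 'p75', 'mean', 'hmean', 'var'):
            x.append(max(s[key] for s in per_term))
            x.append(min(s[key] for s in per_term))
    # Score-independent features: query length plus aggregates of statistics {1, 2}.
    x.append(len(query_terms))
    for key in ('f_t', 'F_t'):
        vals = [term_stats[t][key] for t in query_terms]
        x.extend([statistics.mean(vals), min(vals), max(vals)])
    return x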

Regression Models. Regression models aim to predict a target variable y given a vector of independent variables (features), x. In order to train a regression model, a model is fit to input data of the form ⟨(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)⟩ by minimizing some loss function, such as the least squares function, which minimizes the residual between the predicted value of y and the true value of y (and is a good estimator of the mean). This is the approach used by an out-of-the-box regression algorithm, such as the random forests [33] method (RF). However, a pitfall of standard regression algorithms is that they can become biased in the presence of heavily skewed distributions, reducing the predictive power of the model. One reason for this is that the underlying distribution of y is assumed to be normal. However, this assumption does not always hold in practice. One simple way to deal with this problem is to employ quantile regression [133] (QR), which estimates a given quantile of the response variable, y. If y has a cumulative distribution function of

F_y(z) = p(y ≤ z),

Term statistics:
 1   No. documents containing term t (f_t)
 2   No. occurrences of t in the collection (F_t)
 3   Maximum similarity score
 4   P25 similarity score
 5   P50 similarity score
 6   P75 similarity score
 7   Arithmetic mean of similarity scores
 8   Harmonic mean of similarity scores
 9   Variance of similarity scores
10   Interquartile range of similarity scores

Feature descriptions (aggregated over the query terms):
Score-dependent: Arithmetic mean of {3, 5, 7, 8, 9, 10}; Maximum of {3, 4, 5, 6, 7, 8, 9}; Minimum of {3, 4, 5, 6, 7, 8, 9}
Score-independent: Query length; Average of {1, 2}; Minimum of {1, 2}; Maximum of {1, 2}

Table 6.2: An overview of the QPP features used for our prediction tasks. The left table shows the pre-computed statistics that can be stored to efficiently generate the QPP features. The right table shows how these statistics can be aggregated to generate features for an input query, with the top half representing features that depend on pre-computed score information, and the bottom half representing score-independent features that depend only on the query terms.

then the τth quantile of y is given by the quantile function

F_y^{-1}(τ) = inf{ z : F_y(z) ≥ τ }.

As such, a new loss function can be defined that minimizes the loss with respect to a given quantile, τ, and the subsequent model will make predictions that estimate the value of the τth quantile of y given the feature vector x. Another benefit of quantile regression is that the aggression of the predictor can be changed by predicting different quantiles. For example, if k was being predicted and very little loss in effectiveness is permitted, the quantile regression model could be deployed such that it is more likely to over-predict the true value of k, resulting in a more costly but more effective retrieval run. To this end, quantile regression allows for effective risk-sensitive prediction, which is the goal of RQ3.2.
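For intuition, the per-example quantile (pinball) loss that underlies such a model can be written in a few lines; this generic form is illustrative only, and is not the exact objective implemented by our learning toolkit.

def quantile_loss(y_true, y_pred, tau):
    # Pinball loss for the tau-th quantile: under-predictions (y_pred < y_true) cost
    # tau * |residual|, while over-predictions cost (1 - tau) * |residual|.
    residual = y_true - y_pred
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

# With tau = 0.75, under-predicting k by 100 costs 75 while over-predicting by 100 costs 25,
# so the fitted predictor is pulled towards the 75th percentile, i.e. larger (safer) values of k.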

Random Forests. Random forests build several decision trees using bagging — each tree learns using only a subset of the data — and the final prediction is the average of the values predicted by each tree. During training, individual trees are learned by branching data down the tree, and only a subset of features is considered for splitting at each node. This results in trees that are less likely to be correlated with each other, improving the robustness of the predictions of the ensemble. At the end of the training phase, the model M is an ensemble of decision trees, where the leaf nodes of each tree contain a prediction value. In order to make a single prediction, M consumes an input feature vector x which is used to traverse the ensemble of trees, 𝒯. For each tree T ∈ 𝒯, the features within the feature vector are looked up to traverse the decision tree until a leaf node is reached. The predicted value M(x) of the ensemble 𝒯 is computed as the mean of the predictions of each tree:

M(x) = (1/|𝒯|) · Σ_{i=1}^{|𝒯|} T_i(x).

A pictorial example of this process is shown in Figure 6.2, where |𝒯| = 3.

Figure 6.2: A toy example of computing the output of a random forests regression model, M, made up of a set of three decision trees. For each tree T ∈ 𝒯, the exit node is found by walking T with respect to x, and the output of each tree is averaged to form the prediction: M(x) = (1/3)(1.8 + 2.1 + 2.4) = 2.1.

Quantile Regression. For quantile regression, we use gradient boosted regression trees (GBRTs) [99, 100]. Similar to random forests, GBRTs utilize an ensemble of decision trees to generate an output. However, instead of training via data and feature sampling, as is done with random forests, GBRTs learn a single tree at a time, and each subsequent tree aims to minimize the remaining regression residual (error). At the end of the training phase, the learnt model M is an ensemble of decision trees, and each of the leaf nodes contains a prediction value. Unlike random forests, each tree T_n may also be associated with a weight w_n ∈ ℝ. To make a prediction over M, a similar process to the random forests model is followed. First, M consumes a feature vector x which is used to traverse each tree T in the ensemble 𝒯. At the exit node of each tree, the prediction value is output, and the final predicted value of the model, M(x), is computed as a weighted sum:

M(x) = Σ_{i=1}^{|𝒯|} w_i · T_i(x).

In our learning framework, the weight for each tree is optimized using line search. To further reduce the sensitivity of the model to outliers, the Huber loss function is employed [114], which penalizes extreme values linearly rather than quadratically, as is the case with squared loss.
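The two ensembles therefore differ only in their final aggregation step, as the simplified sketch below shows; each tree is abstracted as a callable that maps a feature vector to its exit-node value.

def predict_rf(trees, x):
    # Random forests: the unweighted mean of the individual tree outputs.
    return sum(t(x) for t in trees) / len(trees)

def predict_gbrt(trees, weights, x):
    # Gradient boosted regression trees: a weighted sum of the individual tree outputs.
    return sum(w * t(x) for w, t in zip(weights, trees))

# Toy usage mirroring Figure 6.2: three trees whose exit nodes return 1.8, 2.1, and 2.4.
trees = [lambda x: 1.8, lambda x: 2.1, lambda x: 2.4]
print(predict_rf(trees, x=None))  # approximately 2.1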

Figure 6.3: The process for building a training example for the Mk prediction model. In our pipeline, we use an LtR model to re-rank a set of candidate documents, r, to yield the reference list, r′. Both r and r′ are then used to find the optimal k — the shortest prefix of r that would achieve a value of MED below a given value of ε. Then, the optimal value of k (the label) is paired with the QPP feature vector, x, to generate a training instance.

Putting it Together. Figure 6.3 outlines our entire end-to-end training process for a single query, assuming the existence of the LtR model. First, a rank safe candidate generation algorithm generates the top 10,000 candidate documents and passes them to the LtR algorithm. Both of these lists are then taken as input to Algorithm 6.1, which outputs the smallest prefix of the candidate list that achieves a MED score ≤ ε. At the same time, the original query terms are used to generate a vector of features, x. Finally, the feature vector x and the optimal k are consumed by the training regime, which updates the model. This process is repeated for all training queries, and then the model for predicting k, M_k, can be deployed. In our experiments, we use MED-RBP with φ = 0.95 as our list comparison method, and we target a very low effectiveness loss margin with ε = 0.001. It is important to note that the LtR system could be replaced with any other well performing system.
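A compressed sketch of this per-query labeling loop is given below; the retrieval, re-ranking, feature extraction, and MED components are assumed to be available as callables, and smallest_k refers to the Algorithm 6.1 sketch shown earlier.

def build_training_set(queries, top_k, ltr_rerank, extract_features, med, epsilon=0.001):
    # Build (feature vector, optimal k) pairs for training M_k, following Figure 6.3.
    examples = []
    for q in queries:
        r = top_k(q)                      # rank safe candidate list (depth 10,000)
        r_ref = ltr_rerank(r)             # gold-standard reference list
        k_opt = smallest_k(r, r_ref, med, epsilon)
        examples.append((extract_features(q), k_opt))
    return examples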

6.3 EXPERIMENTAL SETUP

Hardware. All experiments were executed on an otherwise idle 12-core Intel Xeon E5-2690 v3 with 512 GiB of RAM hosting Red Hat Enterprise Linux. All experiments were conducted entirely in main memory, and the postings lists were warmed up prior to experimentation. Only a single processing thread was used for the experiments in this chapter. All reported timings are in milliseconds, and are the average of 5 runs.

Software. ATIRE [246] was used for indexing the raw corpus, and then this index was dumped into formats suitable for both the DAAT and SAAT systems. Timings were conducted using

publicly available implementations of BMW3 and JASS,4 the same systems that were used in Chapter 4. These systems were compiled using GCC 6.2.1. The learning-to-rank model (LtR), which was used to generate our reference lists, was built using JForests.5 The random forests (RF) regression model was built using Weka.6 For quantile regression (QR), we used XGBoost.7

Document Collection and Indexing. We use the ClueWeb09B collection, which contains around 50 million web documents, crawled during 2009. The index was stopped using the default Indri stoplist, and stemmed using an s-stemmer. Normalization included converting malformed UTF-8 characters to spaces, stripping XML comments and tags, and segmenting numeric and alphabetic characters. Postings were compressed using QMX compression [242, 244] and then scored via the BM25 similarity model. Quantization was employed, using 9 bits to represent scores [66].
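As a rough illustration of the quantization step, a uniform quantizer over the global score range might look as follows; the actual scheme follows prior work [66], so treat this as a simplified stand-in rather than the method used to build our index.

def quantize(score, min_score, max_score, bits=9):
    # Map a floating point similarity score onto an integer impact in [1, 2^bits - 1].
    buckets = (1 << bits) - 1
    q = int((score - min_score) / (max_score - min_score) * buckets)
    return max(1, min(buckets, q))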

Queries and Relevance. For the prediction tasks, we use the 2009 Million Query Track (MQT) queries [49]. Single term queries were filtered from all test, train, and validation sets, as they can be answered trivially by taking the first k documents from the relevant postings list of the impact-ordered ISN. In addition, we filtered out queries which contained out-of-vocabulary terms, and the queries which were used to train the LtR gold-standard system, resulting in a set of 26,959 unique queries. For all predictions, queries were randomly assigned to 10 folds, and standard 10-fold cross validation was performed to produce the query predictions.

6.4 PRELIMINARY EXPERIMENTS

While we expect that predicting the value of k on a per-query basis will greatly improve the overall efficiency of the search system, it may not greatly improve the tail latency. In Chapter 4, it was shown that DAAT dynamic pruning algorithms were faster for smaller values of k, but queries with high latency could occur for any value of k. This observation held even when aggressive dynamic pruning schemes were used. The alternative approach is to use a SAAT algorithm such as JASS. However, aggressive early termination was shown to possibly hurt effectiveness, especially for queries with many candidate postings. So, rank safety is another confounding factor for improving the efficiency/effectiveness trade-offs in cascade rankers. To re-illustrate these effects, we run both rank safe and approximate versions of BMW and JASS across the ClueWeb09B collection and MQT queries for large values of k. We remind the reader that the rank safe algorithms are denoted as BMW-E and JASS-E, and the aggressive instances are denoted as BMW-A and JASS-A. We examine a few different aggression levels for BMW-A, so we opt to add the corresponding F value to each system identifier. Figure 6.4 shows the outcome, and confirms our earlier conclusions:

3 https://github.com/JMMackenzie/Quant-BM-WAND
4 https://github.com/lintool/JASS/
5 https://github.com/yasserg/jforests
6 https://cs.waikato.ac.nz/ml/weka
7 https://github.com/dmlc/xgboost

Figure 6.4: Efficiency comparison across the 26,959 MQT queries using both rank-safe and aggressive processing schemes for BMW and JASS. Aggressive BMW used a value of F = 1.2, and aggressive JASS used a value of ρ = 5 × 10^6.

boosting the aggression parameter for DAAT dynamic pruning algorithms generally increases the efficiency, but the high tail latency is difficult to reduce.

To further explore the relationship between the tail latency of the DAAT and SAAT systems,

we conduct a simple overlap analysis on the queries exceeding the P95 latency. Table 6.3 shows the results. Exhaustive JASS, exhaustive BMW, and aggressive BMW traversal algorithms all tend to share similar queries in the slowest 5% band. This indicates that these queries are inherently expensive to process. On the other hand, the aggressive JASS traversal shares a much smaller number of queries with the other systems. As previously shown, the aggressive JASS algorithm has its efficiency impacted only by the value of ρ. In light of this new evidence, a pragmatic hypothesis emerges: can we combine the best properties of both the JASS and BMW traversal methods to create a hybrid approach that outperforms either system in isolation?

6.5 HYBRID INDEX ARCHITECTURE

We now propose a hybrid search architecture that tries to combine both DAAT and SAAT index traversal methods to improve search efficiency without negatively impacting effectiveness.

                 BMW-A F = 1.1   BMW-A F = 1.2   JASS-E   JASS-A
BMW-E                86.0            61.7         56.2     16.7
BMW-A F = 1.1         -              67.4         53.1     18.3
BMW-A F = 1.2         -               -           42.3     24.2
JASS-E                -               -            -        8.0

Table 6.3: The percentage overlap of queries that exceed the 95th percentile efficiency band for k = 2,000. Making BMW more aggressive may improve timings, but outliers are still present. On the other hand, it is less common for JASS and BMW to have overlapping queries that contribute to tail latency, especially when a non-exhaustive ρ value is used.
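The overlap figures in Table 6.3 are simple set intersections over the slowest queries of each system; a sketch of this computation, assuming per-system latency measurements keyed by query identifier, is shown below.

def slowest_queries(latencies, percentile=95.0):
    # Return the set of query ids whose latency exceeds the given percentile.
    ordered = sorted(latencies.values())
    cutoff = ordered[int(len(ordered) * percentile / 100.0)]
    return {qid for qid, t in latencies.items() if t > cutoff}

def p95_overlap(lat_a, lat_b):
    # Percentage of system A's slowest queries that also appear in system B's slowest set.
    slow_a, slow_b = slowest_queries(lat_a), slowest_queries(lat_b)
    return 100.0 * len(slow_a & slow_b) / len(slow_a)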

6.5.1 INDEX TRAVERSAL TRADE-OFFS

As discussed in Chapter 4, both DAAT and SAAT processing mechanisms have a range of trade-offs between them that make them attractive for use in varying circumstances. While early-termination SAAT algorithms can bound processing latency, they are sometimes subject to a loss in effectiveness. On the other hand, DAAT dynamic pruning algorithms such as BMW are generally more efficient for smaller values of k, but are susceptible to high tail latency. Our preliminary experiments above confirm these ideas, showing that JASS-A is the clear winner for large values of k and for controlling tail latency, while BMW is faster for smaller values of k and has better mean and median latency.

6.5.2 INDEX REPLICATION

Prior work on distributed IR systems has shown that an effective approach to scaling efficiency is to replicate popular indexes (or index shards) across multiple index server nodes (ISNs) [39, 82, 96, 131, 210]. We leverage this observation, and assume that replicas can be optimized for a specific index traversal operation. In other words, when building replicas, we may opt to build a document-ordered index, which is suitable for DAAT traversal, or we may build an impact- ordered index, which is suitable for SAAT traversal. We comment more on the implications of this approach in the discussion (Section 6.7).

6.5.3 HYBRID PIPELINE

We now describe our hybrid processing pipeline, which aims to leverage query performance prediction to predict performance-sensitive parameters on a per-query basis. Backed by both DAAT and SAAT traversal algorithms, our approach can limit the disadvantages of each mechanism to achieve improved trade-offs for end-to-end retrieval. The first step in our approach is to predict the k cutoff. If k is small, we will use the document-ordered shard to process the query using BMW, as we are confident that it will be efficient and has the benefit of returning rank safe results. On the other hand, the latency of SAAT traversal is not impacted by the value of k, so we farm out queries with large values of k to the impact-ordered

shard. If the SAAT system is used, we will also predict the value of ρ, allowing more aggressive processing which will further reduce tail latency, with an upper bound of ρ = 5 × 10^6. This value is selected as it requires less than 200ms on our current hardware configuration, and does not result in significant effectiveness loss for early precision metrics across ClueWeb09B [68, 148].
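Expressed as code, the routing decision is only a few lines; the postings cap below matches the upper bound just described, while the selection threshold, prediction models, and traversal back-ends are placeholders.

RHO_CAP = 5_000_000  # upper bound on the postings budget for the SAAT system

def process_query(q, m_k, m_rho, daat, saat, extract_features, k_threshold):
    # Route q to the document-ordered (DAAT/BMW) or impact-ordered (SAAT/JASS) replica.
    x = extract_features(q)          # QPP features (Table 6.2)
    k = m_k(x)                       # predicted candidate depth
    if k <= k_threshold:
        return daat(q, k)            # rank safe BMW traversal on the document-ordered index
    rho = min(m_rho(x), RHO_CAP)     # predicted postings budget, capped
    return saat(q, k, rho)           # aggressive JASS traversal on the impact-ordered index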

Predicting ρ. The process for training a model for predicting ρ, Mρ, is the same as the process discussed for Mk. Since the effectiveness of JASS depends on both the value of k and the value of ρ, we use the optimal value of k when conducting training (which we derived from the process of training Mk). Next, the JASS system was modified to output the ranked list that arises after processing every segment for the given query. For example, if the candidate postings contained 14 segments, we would iteratively process the query and derive up to 14 top-k results lists, where k is based on the optimal value of k for the given query. Early in processing, if there are not yet k results present in the top-k heap, we do not output these results lists, as we do not want the value of k to be a confounder for obtaining the optimal value of ρ. Following the same methodology as earlier, we then compare each of these lists with the reference list to find the lowest value of ρ which achieves a value of MED ≤ ε. To improve the efficiency of prediction, the feature vectors are the same for both Mk and Mρ, so features do not need to be recomputed for predicting ρ.
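The labeling loop for ρ can be sketched in the same style as Algorithm 6.1; here segment_results is a hypothetical list of (postings processed, intermediate top-k list) pairs emitted by the modified JASS system in processing order, and sorted_prefix refers to the helper from the Algorithm 6.1 sketch.

def smallest_rho(segment_results, r_ref, med, epsilon=0.001):
    # Find the smallest postings budget whose intermediate ranking is within epsilon MED
    # of the reference list. Lists emitted before the heap holds k results are excluded upstream.
    for rho, ranking in segment_results:
        if med(sorted_prefix(ranking, len(ranking), r_ref), r_ref) <= epsilon:
            return rho
    # Fall back to exhaustive processing if no intermediate ranking is close enough.
    return segment_results[-1][0]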

Full Hybrid Pipeline. Our end-to-end hybrid pipeline is shown in Figure 6.5. We denote this system as Hybrid-k, and report the results for three different instances of this hybrid pipeline, discussed shortly. Since the algorithm selection component is required to decide whether to select the DAAT or SAAT method based on the value of k, we use this selection threshold to generate different trade-offs within each of the hybrid systems. In order to set the selection criteria to target a specific MED-RBP ε ∈ {0.04, 0.06, 0.08, 0.10}, we used the same folds as listed previously to tune the selection threshold such that the given ε value was achieved via 10-fold cross validation. The mean value was then used as the switching point between using the DAAT and SAAT system. As such, the algorithm selection component of the pipeline is driven directly by the predicted value of k.

Cost of Prediction. In the experiments discussed henceforth, we assume the cost of prediction is free. That is, when reporting timings, we do not account for the time taken to predict the value of k or ρ. Hence, the timings reported correspond only to the query processing component of the framework. However, recent work has shown that prediction costs of models similar in nature to ours are less than 0.75ms [119], less than one ms [129], and between 0.6 and 0.9ms [197]. While building optimized prediction models is an interesting problem in its own right, it is orthogonal to the ideas presented here.

Figure 6.5: The hybrid architecture deployed for reducing tail latency. First, the QPP features are extracted, and the value of k is predicted from Mk. Based on this value, either DAAT or SAAT traversal will be used to generate the list of candidate documents, r. If SAAT traversal is selected, the value of ρ is predicted from Mρ. Finally, the LtR model re-ranks the candidate set to yield a new ranking, r′.

6.6 EXPERIMENTAL ANALYSIS

We now describe the experimental analysis which examines the predictive performance of both the random forests and quantile regression instances of Mk and Mρ. We also show how our hybrid system compares to highly tuned fixed parameter systems in a realistic query processing task.

6.6.1 PREDICTIVE MODELS

First, we validate our predictive models. In particular, we explore the performance differences between the Fixed, RF, QR, and Oracle predictors for both k and ρ.

Prediction Distributions. The first experiment aims to gain intuition on whether the QR predictor is more robust at estimating the true value of the predicted variable than the RF predictor. As discussed earlier, the RF predictor targets the mean value of the distribution based on the input feature vector. On the other hand, the QR predictors are more robust to outliers, as they predict for a specified quantile of the distribution. Figure 6.6 shows the distributions of the RF, QR, and Oracle predictors for both the k and ρ prediction tasks across all queries. Clearly, the QR methods are able to more accurately capture the underlying distribution of values, since they are not biased by outliers. As such, we expect the QR method to be more accurate for both prediction tasks.

Predicting k. While the last experiment gave us intuition that the QR predictor is a better candidate than the RF predictor, we did not consider the trade-off that occurs when the value of ε is relaxed (which results in less effective, yet more efficient traversal).


Figure 6.6: The distribution of predicted values of k (left) and ρ (right). The QR predictor uses a quantile of τ = 0.55 for predicting k and a quantile of τ = 0.45 for predicting ρ. The RF predictor predicts the mean value for predicting both k and ρ. Training examples optimize the MED-RBPφ = 0.95 error ε < 0.001.

To examine this trade-off space, we train the RF predictor for a number of MED values, with ε ∈ [0.001, 0.20]. We also compare this with the QR model trained for ε = 0.001, and vary the aggression of this model by sweeping the quantile parameter τ ∈ [0.10, 0.75]. Finally, we include the trade-off that occurs from using a fixed value of k for all queries, and the oracle value of k on a query-by-query basis. Figure 6.7 shows the trade-off for the mean value of k (left) and the median value of k (right). Since the distribution of ideal k values is skewed (Figure 6.6), the median plot captures the trade-off more accurately. Both the RF and QR predictors have very similar trade-offs when considering the mean value of k, but the QR predictor outperforms the RF predictor when considering the median value of k. While these predictors give a better trade-off than simply using a fixed value of k, there is still clearly room for improvement when observing the oracle. In the following experiments, we employ the QR model for predicting k.

Predicting ρ. We repeat the above experiment for predicting the value of ρ. Again, we expect the QR predictor to outperform the RF predictor based on the fit of the prediction distributions in Figure 6.6. We train the RF model to target a minimal loss in effectiveness, aiming for a MED-RBP ε = 0.001. The QR model is also trained to minimize effectiveness loss with ε = 0.001, and the aggression of this model is varied by predicting different quantiles τ ∈ [0.15, 0.75]. We also plot the true oracle point, and the fixed ρ curve. Figure 6.8 shows the results. Again, we note that the RF and QR predictors perform similarly, and are able to outperform using a fixed cutoff across all queries. However, the predictive models cannot achieve effectiveness close to the oracle value, indicating that there is still room for improvement in this prediction task. For all predictions of ρ in subsequent experiments, we use the QR method with τ = 0.45, as we observed it to fit well with the true distribution of values in Figure 6.6.


Figure 6.7: Fixed, oracle and predicted k vs MED-RBP trade-off for both mean (left) and the median (right) values of k.


Figure 6.8: Fixed, oracle and predicted ρ vs MED-RBP trade-off for both mean (left) and the median (right) values of ρ.

6.6.2 HYBRID PIPELINE

Our next experiment aims to show the performance gains that are possible by deploying the Hybrid model as shown in Figure 6.5. We instantiate three Hybrid methods: Hybrid-k-25, Hybrid-k-45, and Hybrid-k-55, where the value following k represents the percentile that was learnt by the QR algorithm for predicting k. This allows us to explore the trade-offs between aggressive and cautious predictions of k. As a baseline, we deploy the BMW-E algorithm with rank safe processing. We also generate two JASS baselines which either exhaustively process each query (JASS-E) or process up to 5 million postings for each query (JASS-A). The baseline systems use a fixed value of k across all queries. Each system is configured to achieve a mean MED-RBP ε ∈ {0.04, 0.06, 0.08, 0.10} to demonstrate different trade-off points.

Candidate Generation Profile. Figure 6.9 shows the efficiency of the candidate generation stage for the baselines and hybrid systems. Where very little effectiveness difference between the final output and the candidate generation stage is allowed (such as when ε = 0.04), the Hybrid algorithms tend to perform similarly to the BMW-E baseline, but have slightly lower mean, median, and tail response latency. JASS-A has a higher mean and median than the hybrid and BMW-E systems, but has a much better bound on the tail. However, as the effectiveness constraints are relaxed, the Hybrid algorithms begin to balance the positives of both DAAT and SAAT traversal. For example, when ε = 0.10, the Hybrid-k-45 system is able to reduce the mean, median, and tail latency with respect to JASS-A while only being slightly slower in mean or median response with respect to the BMW-E system. This is because the Hybrid approaches tend to select the document-ordered ISN more often for situations when effectiveness loss must be minimized, as the impact-ordered ISN and JASS-A processing are only approximations of the true results list. On the other hand, when the effectiveness constraints are relaxed, the Hybrid methods opt to select the SAAT traversal method more often, resulting in a slightly more approximate results set but also a much lower tail latency. Interestingly, the Hybrid approaches could be fine tuned even further — training for

different values of τ in the Mρ predictor, or explicitly setting the algorithm selection bound are two ways of doing so. However, we do not wish to overfit our models — the main take-away is that the Hybrid systems allow for much tighter controls on the inherent trade-offs between low median or mean latency, and keeping the tail latency under control.

Cost Trade-offs. The previous discussion focused directly on the candidate generation stage of the end-to-end pipeline. However, the goal of predicting k is to reduce the number of candidate documents that need to be evaluated in all subsequent stages of the cascaded pipeline. To directly observe the complex trade-offs that exist between the latency of candidate generation, the cost of re-ranking (implicitly through the value of k), and the system effectiveness, we report a series of summary statistics in Table 6.4, where each sub-table corresponds to one of the MED-RBP targets (and hence, a facet of Figure 6.9). Note that we drop the JASS-E system from this analysis as it is outperformed by the BMW-E system in all dimensions. Consider the case where effectiveness loss is minimized (ε = 0.04). Here, the rank safe DAAT and SAAT methods must retrieve around 1,600 candidate documents. However, the best Hybrid approach can achieve the same effectiveness difference by returning only 1,148 documents, resulting in large down-stream savings for subsequent retrieval stages. This system also has the lowest median response latency. On the other hand, the aggressive JASS-A system has the best tail latency, but must retrieve almost 3,000 documents to achieve the same expected effectiveness. Where there is less restriction on effectiveness difference (ε = 0.10), the BMW-E and JASS-E approaches retrieve the lowest number of candidates. However, the best hybrid system, Hybrid-k-25, has a much faster response latency across the mean, median, 95th, and 99th percentile latency when compared to BMW-E. In particular, Hybrid-k-25 can generate the candidates 1.7× faster when considering the mean latency, 2.6× faster at P95, and 3.7× faster at P99. A range of other trade-offs can be observed as the Hybrid algorithms tend towards the DAAT or the SAAT ISN.

Figure 6.9: Response time for each system for different bands of MED-RBP in the candidate generation stage. Different configurations of the hybrid approach provide trade-offs that bridge the gap between the BMW and JASS systems.

While our Hybrid methods can outperform the fixed systems in most dimensions, it is clear that there is extreme tension between efficiency and effectiveness. For example, improvements in tail latency come at the expense of a larger k, which could translate to a slower system overall. On the other hand, when k is minimized, it is difficult to also minimize the latency of the candidate generation stage. However, we have shown that useful trade-offs are possible through our Hybrid pipeline.

6.6.3 VALIDATION

As a final validation step, we deploy our end-to-end processing pipeline on the (previously unseen) 2009 ClueWeb09B Web Track topics. As comparison points, we show the effectiveness of the

System         Median k   Mean time    P50     P95     P99

MED-RBP = 0.04
BMW-E            1,598      60.2       34.0   205.6   377.1
JASS-A           2,970      75.6       87.1   114.1   123.8
Hybrid-k-45      1,148      50.8       29.3   149.9   281.0
Hybrid-k-55      1,660      48.1       31.4   126.9   220.1

MED-RBP = 0.06
BMW-E              889      52.7       28.4   183.4   343.9
JASS-A           1,913      89.9      102.5   139.7   143.7
Hybrid-k-25        631      40.9       25.0   132.8   244.3
Hybrid-k-45      1,148      46.2       30.6   121.9   146.0
Hybrid-k-55      1,659      45.6       34.5   110.0   116.9

MED-RBP = 0.08
BMW-E              578      48.3       25.2   171.2   326.4
JASS-A           1,324      75.6       86.6   119.8   124.7
Hybrid-k-25        631      29.3       23.3    73.4   135.2
Hybrid-k-45      1,148      49.1       38.4   120.9   126.7
Hybrid-k-55      1,659      50.3       41.0   109.3   113.6

MED-RBP = 0.10
BMW-E              419      44.3       22.6   158.2   302.2
JASS-A             959      97.7      111.5   152.2   157.2
Hybrid-k-25        631      25.4       22.1    59.8    81.0
Hybrid-k-45      1,148      56.2       45.2   120.7   126.3

Table 6.4: Summary statistics for k and time. Each sub-table corresponds to a facet of Figure 6.9 and the best values are bold. The Hybrid approaches generally outperform the fixed parameter systems with respect to all time dimensions, while also sometimes improving the median k value.

BM25 ranker (without re-ranking), and the idealized late-stage run, URisk-Ideal. We also compare exhaustive and aggressive fixed parameter systems, and our Hybrid-k-45 system. Table 6.5 shows the results. Remarkably, our hybrid pipeline has no measurable effectiveness loss with respect to the URisk-Ideal system. We employed the Two One-Sided Tests (TOST) procedure [223] with an acceptable range of inequality of ±1%, and found that the runs are significantly equivalent (p < 0.01). Interestingly, the hybrid system was actually able to perform better than the baseline, but this is of course due to chance. In any case, we are confident that our approach does not sacrifice any effectiveness for its improved efficiency, and we remind the reader that we specifically trained our models to minimize effectiveness loss.
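For reference, a TOST-style equivalence check over paired per-query scores can be sketched with two one-sided t-tests; the SciPy-based sketch below interprets the acceptable range of inequality as ±1% of the baseline mean, which is an assumption made for illustration rather than a description of our exact procedure.

import numpy as np
from scipy import stats

def tost_equivalent(scores_a, scores_b, margin_frac=0.01, alpha=0.01):
    # Two one-sided tests on paired per-query effectiveness scores. Equivalence is
    # concluded if both one-sided null hypotheses are rejected at level alpha.
    diff = np.asarray(scores_a) - np.asarray(scores_b)
    delta = margin_frac * np.mean(scores_b)              # equivalence margin
    _, p_lower = stats.ttest_1samp(diff, -delta, alternative='greater')
    _, p_upper = stats.ttest_1samp(diff, delta, alternative='less')
    return max(p_lower, p_upper) < alpha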

System                    MED-RBP   NDCG@10   ERR@10   RBP φ = 0.8
BM25                         -       0.205     0.096   0.307 (0.173)
URisk-Ideal                  -       0.310     0.135   0.425 (0.219)
Fixed (BMW-E, JASS-E)      0.04      0.303     0.130   0.418 (0.224)
Fixed (JASS-A)             0.04      0.303     0.129   0.400 (0.250)
Hybrid                     0.04      0.311     0.136   0.422 (0.222)
Fixed (BMW-E, JASS-E)      0.06      0.305     0.132   0.421 (0.221)
Fixed (JASS-A)             0.06      0.306     0.130   0.403 (0.253)
Hybrid                     0.06      0.314     0.136   0.426 (0.232)
Fixed (BMW-E, JASS-E)      0.08      0.308     0.134   0.423 (0.216)
Fixed (JASS-A)             0.08      0.305     0.131   0.407 (0.248)
Hybrid                     0.08      0.314     0.135   0.425 (0.231)
Fixed (BMW-E, JASS-E)      0.10      0.294     0.127   0.411 (0.220)
Fixed (JASS-A)             0.10      0.309     0.132   0.411 (0.245)
Hybrid                     0.10      0.314     0.135   0.425 (0.231)

Table 6.5: Effectiveness measurements taken across the held-out query set. No statistical significance was measured between the hybrid systems with respect to the ideal system (URisk-Ideal), using the two one-sided test with p < 0.05. We used the Hybrid-k-45 system for this comparison, although the results are not significantly different to the other Hybrid parameterizations.

6.7 DISCUSSION AND FUTURE WORK

We have presented a unified framework for building predictive models from reference lists, and leveraged this framework to show how a hybrid query processing pipeline can be deployed for reducing tail latency and improving cost in the context of multi-stage retrieval. Now, we discuss some of the key components of the training framework and query processing pipeline, and comment on future work in these areas.

Ground Truth Selection. One of the most important aspects of the reference list training framework is selecting a ground truth system to build the reference lists. The model training process involves finding a configuration which will yield a similar outcome to the ground truth. Hence, the ground truth ranker should be both effective and robust, as it is assumed to always give a high quality result. One advantage of this method is that it can be deployed in any production IR system — if a particular search company has a ranker which is serving results to real users, then they are presumably confident in the performance and robustness of the ranker. Then, our general approach involves reducing the cost of their production pipeline without negatively impacting the effectiveness of it. One property of MED which was ignored in this work is that it can be used in conjunction with partial relevance judgments, should they be available [235]. Hence, it would

be interesting to see if the interaction data available in industrial search systems would allow for an even better instance labeling mechanism, whereby soft relevance signals such as clickthrough data [122] could help inform MED to build even better predictors.

Improved Prediction Performance. Another way of improving the performance of the param- eter prediction methods is to experiment with other regression models. There is a wealth of regression algorithms that could be used for this problem [92], and future work could focus on comparing and tuning them for better performance. A similar line of work could involve conducting an ablation study to gain a deeper understanding of the best features to use for these efficiency prediction problems [110].

Load Balancing. Since the selection of which ISN (and index traversal method) will be used to process a query depends on the predicted k (and the threshold within the algorithm selection decision), our method could result in an unbalanced workload across processing ISNs. While we employed cross validation to select a threshold to hit a particular MED target, a real-world system would need to carefully assign the processing based on system loads [34, 131, 132]. Furthermore, it is likely that large-scale IR systems would deploy more than just two replicas for each index, which was the scenario explored here. Hence, an interesting point of further examination is how to best decide on the type of index that will be built when replicating an existing shard.

Alternative Pipelines. While we presented one particular pipeline which involved selecting the query processing algorithm (or ISN) based on the value of k, many alternatives are possible. For example, predicting the time required to process the query could drive the algorithm selection de- cision [119, 129, 162]. Another interesting idea would be to select the query processing algorithm within each ISN. Recent work showed that while VBMW is the state-of-the-art in efficient query processing, it can be outperformed by the MaxScore algorithm when k is large or when the query contains many terms [173, 175]. Hence, selecting a processing algorithm could go beyond using a fixed index traversal method at each ISN [130].

End-to-End Savings. The Hybrid pipeline was shown to reduce the tail latency in the candidate generation stage of processing, and can allow fewer candidate documents to be evaluated by the LtR system. However, this work did not directly quantify the cost savings in the feature extraction or re-ranking stages of the learning-to-rank process, but rather just assumed implicit savings from a reduction in the workload of the LtR model. Work on cascade rankers has investigated the cost of different features and how the actual feature set used for ranking can be different for each stage of the cascade. In particular, more expensive feature sets are generated for smaller sets of documents later in the cascade [57, 102, 252]. It would be interesting to see if our framework could be used to predict the required document cutoff in a more advanced cascade, such as those investigated by Chen et al. [57], which can have up to 5 ranking stages. Similar ideas based on QPP could be used to predict a dynamic feature set (and hence, a dynamic ranker) within each

stage. In fact, since cascade rankers operate on smaller subsets of an initial candidate set, it may even be possible to use ideas from post-retrieval QPP for more accurate modeling. These ideas are left for future investigation.

6.8 CONCLUSION

We have presented and validated a unified framework for tuning multi-stage retrieval systems to improve cost and efficiency while retaining competitive effectiveness, thus answering our primary research question, RQ3. The framework uses a strong ranking system to generate reference lists which are then used in conjunction with MED [235] to generate training annotations for predictive models, thus answering RQ3.1. We experimented with a quantile regression model, which was shown to provide risk-sensitive predictions (RQ3.2). Based on preliminary experimentation (and observations from Chapter 4), we propose a hybrid query processing system that minimizes effectiveness loss while improving the latency and cost of candidate generation. This Hybrid system was shown to yield competitive efficiency and effectiveness trade-offs with respect to well tuned fixed parameter systems (RQ3.3). There are many open questions related directly to this research and related areas, and some of these ideas have already been explored. In recent work, Mohammad et al. [187] showed that the ideas presented in this chapter (and in the work of Culpepper et al. [75]) can effectively solve the problem of dynamic shard cutoff prediction, where a predictor is built to decide how many index shards must be examined to achieve effective results in distributed multi-stage retrieval architectures. It would be interesting to see if this idea could be deployed in tandem with predicting k, especially in the selective search architectures explored by Siedlaczek et al. [227] and Hafizoglu et al. [107]. More recently, similar ideas were used by Petri et al. [197] to train a model which can predict the final threshold of a dynamic pruning algorithm to accelerate query processing without requiring relevance judgments. We hope that future research can continue to apply and improve on the ideas presented here.

7 CONCLUSION

7.1 SUMMARY

Due to the ever-increasing amount of data being generated, supporting efficient and scalable search continues to be a major area of focus in the field of information retrieval. Building large- scale IR systems requires a relentless focus on efficiency in order to maintain the sub-second response times which users have come to expect. A key aspect of efficiency, which has been largely overlooked until now, is the impact of high percentile tail latency, which characterizes the worst-case latency experienced by search engine users. In particular, a range of recent studies have outlined the negative impacts of slow response times on users, confirming that efficiency is not just a performance metric, but is absolutely essential to user satisfaction and engagement. Unfortunately, there has been little focus on increasing the understanding of tail latency in the literature, and, as such, there is little understanding of the causes of tail latency within various index organizations, processing schemes, and contexts. The aim of this dissertation was to close this gap in understanding, and to provide a view on the inherent trade-offs that exist between these different query processing methods. Chapter 4 presented a large-scale empirical analysis of the state-of-the-art index organizations and algorithms for supporting efficient top-k query processing, including DAAT dynamic pruning methods [36, 88] and SAAT early-termination approaches [148]. While DAAT dynamic pruning algorithms are efficient on average, they exhibit less control than SAAT algorithms, making tail latency more difficult to control. On the other hand, the SAAT algorithms are able to provide strict guarantees on tail performance. A further analysis on the under-studied SAAT retrieval approach revealed that strict latency constraints may result in considerable losses in search effectiveness. Alternative heuristics for balancing time/quality trade-offs were then proposed, and a concurrent SAAT traversal algorithm outlined, which was able to provide better trade-offs at a higher processing cost. Motivated by research on effective rank fusion over query variations [19, 23], Chapter 5 ex- plored the novel problem of batch processing a set of related queries to improve search effectiveness through result list fusion. The core contribution of this chapter was the proposal of three separate processing paradigms for efficiently handling multi-queries, and an analysis of these paradigms


with respect to flexibility, latency, and processing cost. The parallel fusion and exhaustive single pass approaches can be applied to any given rank fusion algorithm, but are more costly than the single pass CombSUM approach, which leverages the additive properties of the CombSUM fusion method [95] to accelerate processing. Finally, Chapter 6 examined the problem of tail latency in the context of end-to-end multi- stage retrieval. Based on the work of Culpepper et al. [75], we outlined a method for building predictive models that can improve the efficiency and cost of multi-stage retrieval systems. We employed this framework to predict the number of documents required, k, and the number of postings that must be processed, ρ, on a query-by-query basis. Next, we explored how a hybrid query processing system could leverage these predictive models to improve the tail latency of the candidate generation phase of retrieval, while also reducing the downstream costs of the LtR system. This hybrid approach was shown to provide competitive efficiency and effectiveness trade-offs with respect to the fixed parameter approaches that are currently used in practice.

7.2 FUTURE DIRECTIONS

While this thesis provides various contributions towards further understanding tail latency and the trade-offs between various index organizations and query processing algorithms, there are still a number of interesting open research directions for developing efficient and effective large-scale IR systems.

7.2.1 IMPACT-ORDERED INDEXING AND SCORE-AT-A-TIME TRAVERSAL

A major finding in this dissertation was that impact-ordered indexes and SAAT traversal allow tail latency to be controlled in a fine-grained manner in many retrieval contexts. This property is attractive for large-scale IR systems, where controlling latency is extremely important for meeting competitive SLAs. However, impact-ordered indexes have many limitations which make them less practical for production search systems.

Flexibility. A major issue with SAAT retrieval is that Boolean matching cannot be done efficiently. This is due to the way in which impact-ordered indexes organize the postings as segments. However, there may be some methods for making efficient Boolean search possible in SAAT systems. For example, Anh and Moffat [6] showed that impact-ordered indexes can be searched in a DAAT-like manner, by opening a cursor on each candidate segment, and traversing them at once. A major issue with this approach is that, since each segment is compressed individually, all segments would need to be decoded prior to processing. An alternative method could encode the postings within each segment using fixed-length blocks, and decode them on-demand. It would be interesting to further examine the merits of this approach on modern architectures.

Index Updates. Another inherent difficulty posed by impact-ordered indexes is that the impact for each document/term pair must be computed and quantized during indexing. This means that

the ranking method is fixed for the remainder of the index's lifetime. Hence, when adding a new document to the index, the impact score may be inaccurate (since the ranker may depend on the underlying global index statistics). Mohammed et al. [188] recently showed that, for append-only collections, quantization settings from previous time-frames can be applied to new documents without a significant loss in effectiveness. It would be interesting to extend these ideas to totally dynamic indexes, where deletions are also supported, and to examine how both ranking quality and quantization are affected. Another problem with dynamic impact-ordered indexes is how the relevant segment within a given postings list can be efficiently updated. Since a postings list is a sequence of compressed segments, updating a single segment may require re-organizing the entire postings list. Some possible solutions already exist, including the postings pools method from Twitter [39]. In any case, dynamic indexes are a challenging direction for future research.

7.2.2 CASCADE RANKING

Much of the work in this dissertation was conducted with regard to utilizing multi-stage retrieval systems to balance efficiency and effectiveness. While a simple two-stage retrieval process was employed in this work, systems with an arbitrary number of ranking and filtering stages have been explored. However, there are still many open directions for improving multi-stage retrieval.

Quantifying End-to-End Cost. Chapter 6 explored how predictive models can be used to improve tail latency during candidate generation, and reduce the cost of the re-ranking stage in a multi-stage retrieval system. However, the cost savings were only assumed implicitly. On the other hand, recent work has focused directly on the cost of the feature extraction and re-ranking phases of cascade rankers, but assume the presence of the initial candidate set [57, 102, 252]. It would be useful to further examine the complex trade-offs between latency, cost, and effectiveness, across all stages of the multi-stage architecture.

Tail Tolerant Cascades. The work of Gallagher et al. [102] explores how feature cost can be considered directly in the optimization of machine-learnt ranking cascades, resulting in cascades that are both efficient and effective. An interesting extension to this idea would be to further examine how latency may be considered as an optimization goal. Furthermore, to the best of our knowledge, tail latency has not been considered at all for feature extraction or document re-ranking tasks. Clearly, the latency of these operations is crucial for maintaining competitive end-to-end search efficiency, and further exploration into these processes is warranted.

Unified Processing Pipelines. In a production IR system, SERPs are built using information from a number of different indexes and subsystems, including web-page retrieval, question-answering, ad retrieval, image retrieval, and so on. Generally, each of these subsystems is assumed to be isolated, and the final SERP generated by blending components (documents) from each

component together. It would be interesting to further explore how these pipelines could be unified, which may lead to cost and efficiency improvements for large-scale IR systems.

7.2.3 TAIL LATENCY BEYOND WEB SEARCH

The main theme of this thesis was tail latency for large-scale search systems, with a focus on the common web search problem. However, with the increasing popularity of social media and e-commerce, there are many other applications of large-scale search systems with vastly different requirements. For example, social media documents are often much shorter than web documents, indexes may require rapid updates, and query types may be significantly different. In addition, the ranking challenges are different too, as social media often requires temporal ranking, analogous to the price-oriented rankings found on e-commerce sites. It would be interesting to revisit the indexing and retrieval approaches presented in this thesis in the context of large-scale social media or e-commerce collections.

7.2.4 COST SENSITIVE INFORMATION RETRIEVAL

A final aspect of information retrieval which has been largely ignored is the cost of various search processes and paradigms. While large IR companies may be willing to sacrifice computational resources for small improvements in latency or effectiveness, there are many situations where computational cost cannot be considered infinite. Consider cases such as enterprise search, where computational resources may be proportionally low with respect to the amount of data that must be indexed and searched, or energy-conscious IR systems, which may aim to limit or reduce their power consumption. In these contexts, the priorities of fast and effective search may be in tension with operating at a reduced computational cost. While some work has been conducted on energy-efficient IR [52, 98] and resource-constrained retrieval [131], there are many open directions in this area. Perhaps the most pertinent, however, is how green IR systems can be developed to enable sustainable information retrieval systems for future generations [59]. We look forward to focusing on these issues in the future.

BIBLIOGRAPHY

[1] N. Abdul-Jaleel, J. Allan, W. B. Croft, F. Diaz, L. Larkey, X. Li, M. D. Smucker, and C. Wade. UMass at TREC 2004: Novelty and HARD. In Proc. Text Retrieval Conf. (TREC), 2004.
[2] G. Amati and C. J. Van Rijsbergen. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans. on Information Systems, 20(4):357–389, 2002.
[3] G. Amati, C. Carpineto, and G. Romano. Query difficulty, robustness, and selective application of query expansion. In Proc. European Conf. on Information Retrieval (ECIR), pages 127–137, 2004.
[4] G. Amati, E. Ambrosi, M. Bianchi, C. Gaibisso, and G. Gambosi. FUB, IASI-CNR and University of Tor Vergata at TREC 2007 blog track. In Proc. Text Retrieval Conf. (TREC), 2007.
[5] V. N. Anh and A. Moffat. Inverted index compression using word-aligned binary codes. Information Retrieval, 8(1):151–166, 2005.
[6] V. N. Anh and A. Moffat. Pruning strategies for mixed-mode querying. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 190–197, 2006.
[7] V. N. Anh and A. Moffat. Pruned query evaluation using pre-computed impacts. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 372–379, 2006.
[8] V. N. Anh and A. Moffat. Index compression using 64-bit words. Software Practice & Experience, 40(2):131–147, 2010.
[9] V. N. Anh, O. de Kretser, and A. Moffat. Vector-space ranking with effective early termination. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 35–42, 2001.
[10] N. Asadi and J. Lin. Fast candidate generation for two-phase document ranking: Postings list intersection with Bloom filters. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 2419–2422, 2012.

[11] N. Asadi and J. Lin. Training efficient tree-based models for document ranking. In Proc. European Conf. on Information Retrieval (ECIR), pages 146–157, 2013.
[12] N. Asadi and J. Lin. Document vector representations for feature extraction in multi-stage document ranking. Information Retrieval, 16(6):747–768, 2013.
[13] N. Asadi and J. Lin. Effectiveness/efficiency tradeoffs for candidate generation in multi-stage retrieval architectures. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 997–1000, 2013.
[14] N. Asadi, J. Lin, and A. de Vries. Runtime optimizations for prediction with tree-based models. IEEE Trans. on Knowledge and Data Engineering, pages 2281–2292, 2012.
[15] J. A. Aslam and M. Montague. Models for metasearch. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 276–284, 2001.
[16] X. Bai and B. B. Cambazoglu. Impact of response latency on sponsored search. Information Processing & Management, 56(1):110–129, 2019.
[17] X. Bai, I. Arapakis, B. B. Cambazoglu, and A. Freire. Understanding and leveraging the impact of response latency on user behaviour in web search. ACM Trans. on Information Systems, 36(2):1–42, 2017.
[18] P. Bailey, A. Moffat, F. Scholer, and P. Thomas. UQV100: A test collection with query variability. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 725–728, 2016.
[19] P. Bailey, A. Moffat, F. Scholer, and P. Thomas. Retrieval consistency in the presence of query variations. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 395–404, 2017.
[20] L. A. Barroso, J. Dean, and U. Hölzle. Web search for a planet: The Google cluster architecture. IEEE Micro, 23(2):22–28, 2003.
[21] N. J. Belkin, C. Cool, W. B. Croft, and J. P. Callan. The effect of multiple query variations on information retrieval system performance. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 339–346, 1993.
[22] N. J. Belkin, P. Kantor, E. A. Fox, and J. A. Shaw. Combining the evidence of multiple query representations for information retrieval. Information Processing & Management, 31(3):431–448, 1995.
[23] R. Benham and J. S. Culpepper. Risk-reward trade-offs in rank fusion. In Proc. Australasian Document Computing Symp. (ADCS), pages 1.1–1.8, 2017.
[24] R. Benham, L. Gallagher, J. Mackenzie, T. T. Damessie, R-C. Chen, F. Scholer, A. Moffat, and J. S. Culpepper. RMIT at the 2017 TREC CORE track. In Proc. Text Retrieval Conf. (TREC), 2017.
[25] R. Benham, J. S. Culpepper, L. Gallagher, X. Lu, and J. Mackenzie. Towards efficient and effective query variant generation. In Proc. Conf. on Design of Experimental Search & Information Retrieval Systems (DESIRES), pages 62–67, 2018.
[26] R. Benham, L. Gallagher, J. Mackenzie, B. Liu, X. Lu, F. Scholer, A. Moffat, and J. S. Culpepper. RMIT at the 2018 TREC CORE track. In Proc. Text Retrieval Conf. (TREC), 2018.
[27] R. Benham, J. Mackenzie, A. Moffat, and J. S. Culpepper. Boosting search performance using query variations. ACM Trans. on Information Systems, 37(4):41.1–41.25, 2019.
[28] G. Bennett, F. Scholer, and A. Uitdenbogerd. A comparative study of probabilistic and language models for information retrieval. In Proc. Australasian Database Conf. (ADC), pages 65–74, 2008.
[29] B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970.
[30] P. Boldi and S. Vigna. Compressed perfect embedded skip lists for quick inverted index lookups. In Proc. Symp. on String Processing and Information Retrieval (SPIRE), pages 25–28, 2005.
[31] P. A. Boncz, M. L. Kersten, and S. Manegold. Breaking the memory wall in MonetDB. Communications of the ACM, 51(12):77–85, 2008.
[32] F. Borisyuk, K. Kenthapadi, D. Stein, and B. Zhao. Casmos: A framework for learning candidate selection models over structured queries and documents. In Proc. Conf. on Knowledge Discovery and Data Mining (KDD), pages 441–450, 2016.
[33] L. Breiman. Random forests. Journal of Machine Learning, 45(1):5–32, 2001.
[34] D. Broccolo, C. Macdonald, S. Orlando, I. Ounis, R. Perego, F. Silvestri, and N. Tonellotto. Load-sensitive selective pruning for distributed search. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 379–388, 2013.
[35] A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. Journal of Computer and System Sciences, 60(3):630–659, 2000.
[36] A. Z. Broder, D. Carmel, M. Herscovici, A. Soffer, and J. Zien. Efficient query evaluation using a two-level retrieval process. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 426–434, 2003.

[24] R. Benham, L. Gallagher, J. Mackenzie, T. T. Damessie, R-C. Chen, F. Scholer, A. Moffat, and J. S. Culpepper. RMIT at the 2017 TREC CORE track. In Proc. Text Retrieval Conf. (TREC), 2017. 89, 95, 96, 97 〈 〉 [25] R. Benham, J. S. Culpepper, L. Gallagher, X. Lu, and J. Mackenzie. Towards efficient and effective query variant generation. In Proc. Conf. on Design of Experimental Search & Information Retrieval Systems (DESIRES), pages 62–67, 2018. 96, 121 〈 〉 [26] R. Benham, L. Gallagher, J. Mackenzie, B. Liu, X. Lu, F. Scholer, A. Moffat, and J. S. Culpepper. RMIT at the 2018 TREC CORE track. In Proc. Text Retrieval Conf. (TREC), 2018. 89, 95, 96, 97 〈 〉 [27] R. Benham, J. Mackenzie, A. Moffat, and J. S. Culpepper. Boosting search performance using query variations. ACM Trans. on Information Systems, 37(4):41.1–41.25, 2019. 4, 101, 〈 111 〉 [28] G. Bennett, F. Scholer, and A. Uitdenbogerd. A comparative study of probabilistic and language models for information retrieval. In Proc. Australasian Database Conf. (ADC), pages 65–74, 2008. 52 〈 〉 [29] B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970. 60 〈 〉 [30] P. Boldi and S. Vigna. Compressed perfect embedded skip lists for quick inverted index lookups. In Proc. Symp. on String Processing and Information Retrieval (SPIRE), pages 25–28, 2005. 51 〈 〉 [31] P. A. Boncz, M. L. Kersten, and S. Manegold. Breaking the memory wall in MonetDB. Communications of the ACM, 51(12):77–85, 2008. 71 〈 〉 [32] F. Borisyuk, K. Kenthapadi, D. Stein, and B. Zhao. Casmos: A framework for learning candidate selection models over structured queries and documents. In Proc. Conf. on Knowledge Discovery and Data Mining (KDD), pages 441–450, 2016. 30 〈 〉 [33] L. Breiman. Random forests. Journal of Machine Learning, 45(1):5–32, 2001. 127 〈 〉 [34] D. Broccolo, C. Macdonald, S. Orlando, I. Ounis, R. Perego, F. Silvestri, and N. Tonel- lotto. Load-sensitive selective pruning for distributed search. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 379–388, 2013. 55, 110, 119, 142 〈 〉 [35] A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. Journal of Computer and System Sciences, 60(3):630–659, 2000. 34 〈 〉 [36] A. Z. Broder, D. Carmel, M. Herscovici, A. Soffer, and J. Zien. Efficient query evaluation using a two-level retrieval process. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 426–434, 2003. 4, 47, 50, 51, 52, 57, 61, 145 〈 〉 152 ! BIBLIOGRAPHY

[37] C. Buckley and A. F. Lewit. Optimization of inverted vector searches. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 97–110, 1985. 38, 52, 74 〈 〉 [38] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In Proc. Int. Conf. on Machine Learning (ICML), pages 89–96, 2005. 21, 60 〈 〉 [39] M. Busch, K. Gade, B. Larson, P. Lok, S. Luckenbill, and J. Lin. Earlybird: Real-time search at Twitter. In Proc. Int. Conf. on Data Engineering (ICDE), pages 1360–1369, 2012. 7, 18, 〈 72, 133, 147 〉 [40] S. Büttcher and C. L. A. Clarke. Indexing time vs. query time: Trade-offs in dynamic information retrieval systems. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 317–318, 2005. 72 〈 〉 [41] S. Büttcher and C. L. A. Clarke. A document-centric approach to static index pruning in text retrieval systems. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 182–189, 2006. 51 〈 〉 [42] S. Büttcher, C. L. A. Clarke, and G. V. Cormack. Information Retrieval: Implementing and evaluating search engines. MIT Press, Cambridge, MA, 2010. 17, 72 〈 〉 [43] B. B. Cambazoglu, H. Zaragoza, O. Chapelle, J. Chen, C. Liao, Z. Zheng, and J. Degenhardt. Early exit optimizations for additive machine learned ranking systems. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 411–420, 2010. 121 〈 〉 [44] D. Carmel and E. Amitay. Juru at TREC 2006: TAAT versus DAAT in the terabyte track. In Proc. Text Retrieval Conf. (TREC), 2006. 51 〈 〉 [45] D. Carmel and E. Yom-Tov. Estimating the query difficulty for information retrieval. Synthesis Lectures on Information Concepts, Retrieval, and Services, 2(1):1–89, 2010. 117, 〈 118 〉 [46] D. Carmel and E. Yom-Tov. Estimating the query difficulty for information retrieval. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 911–911, 2010. 118 〈 〉 [47] D. Carmel, E. Yom-Tov, A. Darlow, and D. Pelleg. What makes a query difficult? In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 390–397, 2006. 117 〈 〉 [48] C. Carpineto and G. Romano. A survey of automatic query expansion in information retrieval. ACM Computing Surveys, 44(1):1.1 – 1.56, 2012. 12 〈 〉 BIBLIOGRAPHY ! 153

[49] B. Carterette, V. Pavluy, H. Fang, and E. Kanoulas. Million query track 2009 overview. In Proc. Text Retrieval Conf. (TREC), 2009. 18, 131 〈 〉 [50] C. Castillo. Effective Web Crawling. PhD thesis, University of Chile, 2004. 24 〈 〉 [51] M. Catena and N. Tonellotto. Multiple query processing via logic function factoring. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 937–940, 2019. 89, 111 〈 〉 [52] M. Catena, C. Macdonald, and N. Tonellotto. Load-sensitive CPU power management for web search engines. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 751–754, 2016. 148 〈 〉 [53] K. Chakrabarti, S. Chaudhuri, and V. Ganti. Interval-based pruning for top-k processing over compressed lists. In Proc. Int. Conf. on Data Engineering (ICDE), pages 709–720, 2011. 50 〈 〉 [54] O. Chapelle and Y. Chang. Yahoo! learning to rank challenge overview. In Journal of Machine Learning Research: Conference and Workshop Proceedings, pages 1–24, 2010. 114 〈 〉 [55] O. Chapelle, D. Metzler, Y. Zhang, and P. Grinspan. Expected reciprocal rank for graded relevance. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 621–630, 2009. 16 〈 〉 [56] O. Chapelle, Y. Chang, and T-Y. Liu. Future directions in learning to rank. In Journal of Machine Learning Research: Conference and Workshop Proceedings, pages 91–100, 2010. 114 〈 〉 [57] R-C. Chen, L. Gallagher, R. Blanco, and J. S. Culpepper. Efficient cost-aware cascade ranking in multi-stage retrieval. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 445–454, 2017. 21, 57, 121, 125, 142, 147 〈 〉 [58] F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On compressing social networks. In Proc. Conf. on Knowledge Discovery and Data Mining (KDD), pages 219–228, 2009. 34 〈 〉 [59] G. Chowdhury. An agenda for green information retrieval research. Information Processing & Management, 48(6):1067–1077, 2012. 148 〈 〉 [60] C. L. A. Clarke, G. V. Cormack, and F. J. Burkowski. An algebra for structured text search and a framework for its implementation. Computer Journal, 38(1):43–56, 1995. 13 〈 〉 [61] C. L. A. Clarke, J. S. Culpepper, and A. Moffat. Assessing efficiency—effectiveness tradeoffs in multi-stage retrieval systems without using relevance judgments. Information Retrieval, 19(4):351–377, 2016. 61, 113, 120, 123, 124, 125 〈 〉 154 ! BIBLIOGRAPHY

[62] C. Cleverdon. The Cranfield tests on index language devices. Aslib proceedings, 19(6): 173–194, 1967. 14 〈 〉 [63] C. Cole. A theory of information need for information retrieval that connects information to knowledge. Journal of the American Society for Information Science and Technology, 62 (7):1216–1231, 2011. 8 〈 〉 [64] K. Collins-Thompson and J. Callan. Estimation and use of uncertainty in pseudo-relevance feedback. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 303–310, 2007. 118 〈 〉 [65] G. V. Cormack, C. L. A. Clarke, and S. Büttcher. Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 758–759, 2009. 92, 93 〈 〉 [66] M. Crane, A. Trotman, and R. O’Keefe. Maintaining discriminatory power in quantized indexes. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 1221–1224, 2013. 28, 62, 64, 79, 88, 102, 131 〈 〉 [67] M. Crane, A. Trotman, and D. Eyers. Improving throughput of a pipeline model indexer. In Proc. Australasian Document Computing Symp. (ADCS), pages 2.1–2.4, 2015. 24 〈 〉 [68] M. Crane, J. S. Culpepper, J. Lin, J. Mackenzie, and A. Trotman. A comparison of document-at-a-time and score-at-a-time query evaluation. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 201–210, 2017. 75, 78, 83, 113, 134 〈 〉 [69] N. Craswell and M. Szummer. Random walks on the click graph. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 239–246, 2007. 96 〈 〉 [70] N. Craswell, D. Fetterly, M. Najork, S. Robertson, and E. Yilmaz. Microsoft research at TREC 2009: Web and relevance feedback tracks. In Proc. Text Retrieval Conf. (TREC), 2009. 114 〈 〉 [71] N. Craswell, R. Jones, G. Dupret, and E. Viegas. Web search click data workshop. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), 2009. 83 〈 〉 [72] N. Craswell, B. Billerbeck, D. Fetterly, and M. Najork. Robust query rewriting using anchor data. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 335–344, 2013. 12 〈 〉 [73] S. Cronen-Townsend, Y. Zhou, and W. B. Croft. Predicting query performance. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 299–306, 2002. 117 〈 〉 [74] J. S. Culpepper. Single query optimisation is the root of all evil. In Proc. Conf. on Design of Experimental Search & Information Retrieval Systems (DESIRES), pages 100–100, 2018. 95 〈 〉 BIBLIOGRAPHY ! 155

[75] J. S. Culpepper, C. L. A. Clarke, and J. Lin. Dynamic cutoff prediction in multi-stage retrieval systems. In Proc. Australasian Document Computing Symp. (ADCS), pages 17–24, 2016. 5, 113, 121, 124, 125, 126, 127, 143, 146 〈 〉 [76] V. Dang and W. B. Croft. Query reformulation using anchor text. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 41–50, 2010. 12 〈 〉 [77] V. Dang, M. Bendersky, and W. B. Croft. Two-stage learning to rank for information retrieval. In Proc. European Conf. on Information Retrieval (ECIR), pages 423–434, 2013. 61 〈 〉 [78] D. Dato, C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. Fast ranking with additive ensembles of oblivious and non-oblivious regression trees. ACM Trans. on Information Systems, 35(2):15.1–15.31, 2016. 121 〈 〉 [79] J. C. de Borda. Mémoire sur les élections au scrutin. Histoire de l’Academie Royale des Sciences pour 1781 (Paris, 1784), 1784. 93 〈 〉 [80] L. L. S. de Carvalho, E. S. de Moura, C. M. Daoud, and A. S. da Silva. Heuristics to improve the BMW method and its variants. Journal of Information & Data Management, 6 (3):178–191, 2015. 51 〈 〉 [81] J. Dean. Challenges in building large-scale information retrieval systems: Invited talk. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 1–1, 2009. 31 〈 〉 [82] J. Dean and L. A. Barroso. The tail at scale. Communications of the ACM, 56(2):74–80, 2013. 1, 3, 54, 58, 97, 133 〈 〉 [83] L. Dhulipala, I. Kabiljo, B. Karrer, G. Ottaviano, S. Pupyrev, and A. Shalita. Compressing graphs and indexes with recursive graph bisection. In Proc. Conf. on Knowledge Discovery and Data Mining (KDD), pages 1535–1544, 2016. 34, 35, 62, 72, 102 〈 〉 [84] T. G. Diamond. Information retrieval using dynamic evidence combination, 1998. Un- published PhD Dissertation Proposal. 92 〈 〉 [85] C. Dimopoulos, S. Nepomnyachiy, and T. Suel. Optimizing top-k document retrieval strategies for block-max indexes. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 113–122, 2013. 63 〈 〉 [86] B. T. Dinçer, C. Macdonald, and I. Ounis. Risk-sensitive evaluation and learning to rank using multiple baselines. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 483–492, 2016. 117, 125 〈 〉 [87] B. T. Dinçer, C. Macdonald, and I. Ounis. Hypothesis testing for the risk-sensitive eval- uation of retrieval systems. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 23–32, 2014. 117, 125 〈 〉 156 ! BIBLIOGRAPHY

[88] S. Ding and T. Suel. Faster top-k document retrieval using block-max indexes. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 993–1002, 2011. 4, 35, 50, 52, 57, 61, 62, 64, 145 〈 〉 [89] J. Duda. Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding, 2013. arXiv:1311.2540. 31 〈 〉 [90] A. Elbagoury, M. Crane, and J. Lin. Rank-at-a-time query processing. In Proc. Int. Conf. on Theory of Information Retrieval (ICTIR), pages 229–232, 2016. 40 〈 〉 [91] C. Faloutsos and S. Christodoulakis. Signature files: An access method for documents and its analytical performance evaluation. ACM Trans. on Information Systems, 2(4):267–288, 1984. 29 〈 〉 [92] M. Fernández-Delgado, M. S. Sirsat, E. Cernadas, S. Alawadi, S. Barro, and M. Febrero- Bande. An extensive experimental survey of regression methods. Neural Networks, 111: 11–34. 142 〈 〉 [93] M. Fontoura, S. Sadanandan, J. Shanmugasundaram, S. Vassilvitski, E. Vee, S. Venkatesan, and J. Zien. Efficiently evaluating complex boolean expressions. pages 3–14, 2010. 30, 111 〈 〉 [94] M. Fontoura, V. Josifovski, J. Liu, S. Venkatesan, X. Zhu, and J. Zien. Evaluation strategies for top-k queries over memory-resident inverted indexes. Proc. Conf. on Very Large Databases (VLDB), 4(12):1213–1224, 2011. 38, 45, 52, 74 〈 〉 [95] E. A. Fox and J. A. Shaw. Combination of multiple searches. In Proc. Text Retrieval Conf. (TREC), pages 243–252, 1993. 92, 94, 101, 103, 146 〈 〉 [96] G. Francès, X. Bai, B. B. Cambazoglu, and R. Baeza-Yates. Improving the efficiency of multi-site web search engines. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 3–12, 2014. 133 〈 〉 [97] H. D. Frank and I. Taksa. Comparing rank and score combination methods for data fusion in information retrieval. Information Retrieval, 8(3):449–480, 2005. 93 〈 〉 [98] A. Freire, C. Macdonald, N. Tonellotto, I. Ounis, and F. Cacheda. A self-adapting laten- cy/power tradeoff model for replicated search engines. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 13–22, 2014. 148 〈 〉 [99] J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29:1189–1232, 2000. 129 〈 〉 [100] J. H. Friedman. Stochastic gradient boosting. Computational Statistics & Data Analysis, 38 (4):367–378, 2002. 129 〈 〉 BIBLIOGRAPHY ! 157

[101] L. Gallagher, J. Mackenzie, R. Benham, R-C. Chen, F. Scholer, and J. S. Culpepper. RMIT at the NTCIR-13 We Want Web task. In Proc. NII Testbeds and Community for Information Access Research (NTCIR), 2017. 13 〈 〉 [102] L. Gallagher, R-C. Chen, R. Blanco, and J. S. Culpepper. Joint optimization of cascade ranking models. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 15–23, 2019. 121, 125, 142, 147 〈 〉 [103] Y. Ganjisaffar, R. Caruana, and C. Lopes. Bagging gradient-boosted trees for high precision, low variance ranking models. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 85–94, 2011. 125 〈 〉 [104] S. Geva and C. M. de Vries. TopSig: Topology preserving document signatures. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 333–338, 2011. 29 〈 〉 [105] B. Goodwin, M. Hopcroft, D. Luu, A. Clemmer, M. Curmei, S. Elnikety, and Y. He. Bitfunnel: Revisiting signatures for search. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 605–614, 2017. 29, 61, 86, 121 〈 〉 [106] L. Grauer and A. Lomakina. On the effect of “stupid” search components on user in- teraction with search engines. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 1727–1730, 2015. 117 〈 〉 [107] F. Hafizoglu, E. C. Kucukoglu, and I. S. Altingovde. On the efficiency of selective search. In Proc. European Conf. on Information Retrieval (ECIR), pages 705–712, 2017. 143 〈 〉 [108] D. Harman. Is the Cranfield paradigm outdated? In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 1–1, 2010. 15 〈 〉 [109] C. Hauff. Predicting the effectiveness of queries and retrieval systems. PhD thesis, University of Twente, 2010. 118 〈 〉 [110] C. Hauff, D. Hiemstra, and F. de Jong. A survey of pre-retrieval query performance predictors. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 1419–1420, 2008. 117, 118, 142 〈 〉 [111] B. He and I. Ounis. Inferring query performance using pre-retrieval predictors. In Proc. Symp. on String Processing and Information Retrieval (SPIRE), pages 43–54, 2004. 117, 118 〈 〉 [112] B. He and I. Ounis. Query performance prediction. Journal of Information Systems, 31(7): 585–594, 2006. 117 〈 〉 [113] B. He and I. Ounis. Combining fields for query expansion and adaptive query expansion. Information Processing & Management, 43(5):1294–1307, 2007. 118 〈 〉 158 ! BIBLIOGRAPHY

[114] P. J. Huber. Robust estimation of a location parameter. Annals of Mathematical Statistics, 35(1):73–101, 1964. 129 〈 〉 [115] D. A. Hull. Stemming algorithms: A case study for detailed evaluation. Journal of the American Society for Information Science and Technology, 47(1):70–84, 1996. 24, 25 〈 〉 [116] S. Huo, M. Zhang, Y. Liu, and S. Ma. Improving tail query performance by fusion model. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 559–658, 2014. 92, 93 〈 〉 [117] S-W. Hwang, S. Kim, Y. He, S. Elnikety, and S. Choi. Prediction and predictability for search query acceleration. ACM Trans. on the Web, 10(3):19.1–19.28, August 2016. 22, 110 〈 〉 [118] K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Trans. on Information Systems, 20(4):422–446, 2002. 16 〈 〉 [119] M. Jeon, S. Kim, S-W. Hwang, Y. He, S. Elnikety, A. L. Cox, and S. Rixner. Predictive parallelization: Taming tail latencies in web search. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 253–262, 2014. 4, 19, 54, 59, 82, 〈 83, 84, 87, 88, 108, 119, 127, 134, 142 〉 [120] X-F. Jia, A. Trotman, and R. O’Keefe. Efficient accumulator initialization. In Proc. Australasian Document Computing Symp. (ADCS), pages 44–51, 2010. 37, 77, 78, 87 〈 〉 [121] J. Jimmy, G. Zuccon, B. Koopman, and G. Demartini. Health cards for consumer health search. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 35–44, 2019. 117 〈 〉 [122] T. Joachims. Optimizing search engines using clickthrough data. In Proc. Conf. on Knowl- edge Discovery and Data Mining (KDD), pages 133–142, 2002. 121, 142 〈 〉 [123] S. Jonassen and S. E. Bratsberg. Efficient compressed inverted index skipping for disjunctive text-queries. In Proc. European Conf. on Information Retrieval (ECIR), pages 530–542, 2011. 45 〈 〉 [124] R. Jones, B. Rey, O. Madani, and W. Greiner. Generating query substitutions. In Proc. Conf. on the World Wide Web (WWW), pages 387–396, 2006. 12 〈 〉 [125] T. Kaler, Y. He, and S. Elnikety. Optimal reissue policies for reducing tail latency. In Proc. ACM Symp. on Parallelism in Algorithms and Architectures (SPAA), pages 195–206, 2017. 55 〈 〉 [126] A. Kane and F. W. Tompa. Skewed partial bitvectors for list intersection. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 263–272, 2014. 30 〈 〉 BIBLIOGRAPHY ! 159

[127] A. Kane and F. W. Tompa. Split-lists and initial thresholds for WAND-based search. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 877–880, 2018. 51, 72 〈 〉 [128] S. Karimi, S. Pohl, F. Scholer, L. Cavedon, and J. Zobel. Boolean versus ranked querying for biomedical systematic reviews. BMC Medical Informatics and Decision Making, 10(1): 58–78, 2010. 9 〈 〉 [129] S. Kim, Y. He, S-W. Hwang, S. Elnikety, and S. Choi. Delayed-Dynamic-Selective (DDS) prediction for reducing extreme tail latency in web search. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 7–16, 2015. 54, 83, 87, 108, 119, 127, 134, 142 〈 〉 [130] S. Kim, T. Lee, S w. Hwang, and S. Elnikety. List intersection for web search: Algorithms, cost models, and optimizations. Proc. Conf. on Very Large Databases (VLDB), 12(1):1–13, 2018. 119, 142 〈 〉 [131] Y. Kim, J. Callan, J. S. Culpepper, and A. Moffat. Efficient distributed selective search. Information Retrieval, 20(3):1–32, 2016. 18, 87, 133, 142, 148 〈 〉 [132] Y. Kim, J. Callan, J. S. Culpepper, and A. Moffat. Load-balancing in distributed selective search. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 905–908, 2016. 142 〈 〉 [133] R. W. Koenker and G. Bassett. Regression quantiles. Econometrica, 46(1):33–50, 1978. 127 〈 〉 [134] R. Kohavi, A. Deng, B. Frasca, T. Walker, Y. Xu, and N. Pohlmann. Online controlled experiments at large scale. In Proc. Conf. on Knowledge Discovery and Data Mining (KDD), pages 1168–1176, 2013. 2 〈 〉 [135] J. Kong, A. Scott, and G. M. Goerg. Improving semantic topic clustering for search queries with word co-occurrence and bigraph co-clustering. Google Inc, 2016. URL https://ai.google/research/pubs/pub45569.pdf. 96 〈 〉 [136] A. K. Kozorovitsky and O. Kurland. Cluster-based fusion of retrieved lists. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 893–902, 2011. 92 〈 〉 [137] A. Kulkarni and J. Callan. Selective search: Efficient and effective search of large textual collections. ACM Trans. on Information Systems, 33(4):17.1–17.33, 2015. 18 〈 〉 [138] G. Kumaran and J. Allan. Selective user interaction. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 923–926, 2007. 118 〈 〉 160 ! BIBLIOGRAPHY

[139] O. Kurland and J. S. Culpepper. Fusion in information retrieval: SIGIR 2018 half-day tutorial. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 1383–1386, 2018. 92, 95 〈 〉 [140] O. Kurland, A. Shtok, S. Hummel, F. Raiber, D. Carmel, and O. Rom. Back to the roots: A probabilistic framework for query-performance prediction. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 823–832, 2012. 117 〈 〉 [141] P. Lacour, C. Macdonald, and I. Ounis. Efficiency comparison of document matching techniques. In ECIR Workshop on Efficiency Issues in IR, 2008. 51 〈 〉 [142] A. Lawler. Writing gets a rewrite. Science, 292(5526):2418–2420, 2001. 7 〈 〉 [143] D. Lemire and L. Boytsov. Decoding billions of integers per second through vectorization. Software Practice & Experience, 41(1):1–29, 2015. 57, 62, 63 〈 〉 [144] D. Lemire, L. Boytsov, and N. Kurz. SIMD compression and the intersection of sorted integers. Software Practice & Experience, 46(6):723–749, 2016. 30, 62 〈 〉 [145] D. Lemire, N. Kurz, and C. Rupp. Stream VByte: Faster byte-oriented integer compression. Information Processing Letters, 130:1–6, 2018. 31 〈 〉 [146] S. Liang, Z. Ren, and M. de Rijke. Fusion helps diversification. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 303–312, 2014. 92 〈 〉 [147] J. Lin. The neural hype and comparisons against weak baselines. SIGIR Forum, 52(2): 40–51, 2019. 52 〈 〉 [148] J. Lin and A. Trotman. Anytime ranking for impact-ordered indexes. In Proc. Int. Conf. on Theory of Information Retrieval (ICTIR), pages 301–304, 2015. 4, 39, 55, 61, 68, 75, 83, 84, 〈 102, 119, 134, 145 〉 [149] J. Lin and A. Trotman. The role of index compression in score-at-a-time query evaluation. Information Retrieval, 20(3):199–220, 2017. 40, 79 〈 〉 [150] J. Lin, M. Crane, A. Trotman, J. Callan, I. Chattopadhyaya, J. Foley, G. Ingersoll, C. Mac- donald, and S. Vigna. Toward reproducible baselines: The open-source IR reproducibility challenge. In Proc. European Conf. on Information Retrieval (ECIR), 2016. 11, 75 〈 〉 [151] B. Liu, X. Lu, O. Kurland, and J. S. Culpepper. Improving search effectiveness with field-based relevance modelling. In Proc. Australasian Document Computing Symp. (ADCS), pages 11.1–11.4, 2018. 13 〈 〉 [152] T-Y. Liu. Learning to rank for information retrieval. Foundations & Trends in Information Retrieval, 3(3):225–331, 2009. 14, 126 〈 〉 BIBLIOGRAPHY ! 161

[153] X. Lu. Efficient and Effective Retrieval Using Higher-Order Proximity Models. PhD thesis, RMIT University, 2018. 17 〈 〉 [154] X. Lu, A. Moffat, and J. S. Culpepper. How effective are proximity scores in term depen- dency models? In Proc. Australasian Document Computing Symp. (ADCS), pages 89–92, 2014. 13 〈 〉 [155] X. Lu, A. Moffat, and J. S. Culpepper. On the cost of extracting proximity features for term- dependency models. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 293–302, 2015. 13, 63, 120 〈 〉 [156] X. Lu, A. Moffat, and J. S. Culpepper. Efficient and effective higher order proximity modeling. In Proc. Int. Conf. on Theory of Information Retrieval (ICTIR), pages 21–30, 2016. 13 〈 〉 [157] X. Lu, A. Moffat, and J. S. Culpepper. The effect of pooling and evaluation depth on IR metrics. Information Retrieval, 19(4):416–445, 2016. 17, 64 〈 〉 [158] C. Lucchese, F. M. Nardini, S. Orlando, R. Perego, N. Tonellotto, and R. Venturini. Quickscorer: A fast algorithm to rank documents with additive ensembles of regression trees. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 73–82, 2015. 121 〈 〉 [159] C. Macdonald and N. Tonellotto. Upper bound approximation for BlockMaxWand. In Proc. Int. Conf. on Theory of Information Retrieval (ICTIR), pages 273–276, 2017. 72 〈 〉 [160] C. Macdonald, I. Ounis, and N. Tonellotto. Upper-bound approximations for dynamic pruning. ACM Trans. on Information Systems, 29(4):17.1–17.28, 2011. 72 〈 〉 [161] C. Macdonald, R. McCreadie, R. Santos, and I. Ounis. From puppy to maturity: Expe- riences in developing terrier. In Proc. SIGIR 2012 Workshop on Open Source Information Retrieval, 2012. 120 〈 〉 [162] C. Macdonald, N. Tonellotto, and I. Ounis. Learning to predict response times for online query scheduling. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 621–630, 2012. 55, 83, 117, 119, 120, 142 〈 〉 [163] C. Macdonald, N. Tonellotto, and I. Ounis. Effect of dynamic pruning safety on learning to rank effectiveness. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 1051–1052, 2012. 50, 66 〈 〉 [164] C. Macdonald, R. L. T. Santos, and I. Ounis. The whens and hows of learning to rank for web search. Information Retrieval, 16(5):584–628, 2013. 14, 57, 114, 126 〈 〉 162 ! BIBLIOGRAPHY

[165] C. Macdonald, R. L. T. Santos, I. Ounis, and B. He. About learning models with multiple query-dependent features. ACM Trans. on Information Systems, 31(3):11.1–11.39, 2013. 14, 〈 57, 120, 126 〉 [166] C. Macdonald, N. Tonellotto, and I. Ounis. Efficient & effective selective query rewriting with efficiency predictions. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 495–504, 2017. 83 〈 〉 [167] J. Mackenzie. Managing tail latencies in large scale IR systems. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 1369–1369, 2017. 88 〈 〉 [168] J. Mackenzie, F. M. Choudhury, and J. S. Culpepper. Efficient location-aware web search. In Proc. Australasian Document Computing Symp. (ADCS), pages 4.1–4.8, 2015. 63 〈 〉 [169] J. Mackenzie, R-C. Chen, and J. S. Culpepper. RMIT at the TREC 2016 LiveQA track. In Proc. Text Retrieval Conf. (TREC), 2016. 61 〈 〉 [170] J. Mackenzie, F. Scholer, and J. S. Culpepper. Early termination heuristics for score-at- a-time index traversal. In Proc. Australasian Document Computing Symp. (ADCS), pages 8.1–8.8, 2017. 108, 119 〈 〉 [171] J. Mackenzie, A. Mallia, M. Petri, J. S. Culpepper, and T. Suel. Compressing inverted indexes with recursive graph bisection: A reproducibility study. In Proc. European Conf. on Information Retrieval (ECIR), pages 339–352, 2019. 34, 35, 62, 72, 102 〈 〉 [172] A. Mallia and E. Porciani. Faster BlockMax WAND with longer skipping. In Proc. European Conf. on Information Retrieval (ECIR), pages 771–778, 2019. 72, 88 〈 〉 [173] A. Mallia, G. Ottaviano, E. Porciani, N. Tonellotto, and R. Venturini. Faster BlockMax WAND with variable-sized blocks. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 625–634, 2017. 50, 52, 72, 88, 102, 142 〈 〉 [174] A. Mallia, M. Siedlaczek, J. Mackenzie, and T. Suel. PISA: Performant indexes and search for academia. In Proc. of the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019, pages 50–56, 2019. 11, 88 〈 〉 [175] A. Mallia, M. Siedlaczek, and T. Suel. An experimental study of index compression and DAAT query processing methods. In Proc. European Conf. on Information Retrieval (ECIR), pages 353–368, 2019. 33, 52, 62, 72, 107, 142 〈 〉 [176] B. Mansouri, M. S. Zahedi, R. Campos, and M. Farhoodi. Online job search: Study of users’ search behavior using search engine query logs. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 1185–1188, 2018. 7 〈 〉 [177] M. Mayer. In search of a better, faster, stronger web. Velocity, 2009. 19 〈 〉 BIBLIOGRAPHY ! 163

[178] D. Metzler and W. B. Croft. A Markov random field model for term dependencies. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 472–479, 2005. 13 〈 〉 [179] A. Moffat. Computing maximized effectiveness distance for recall-based metrics. IEEE Trans. on Knowledge and Data Engineering, 30(1):198–203, 2018. 123 〈 〉 [180] A. Moffat and M. Petri. ANS-Based index compression. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 677–686, 2017. 31, 32 〈 〉 [181] A. Moffat and L. Stuiver. Binary interpolative coding for effective index compression. Information Retrieval, 3(1):25–47, 2000. 31 〈 〉 [182] A. Moffat and J. Zobel. Self indexing inverted files for fast text retrieval. ACM Trans. on Information Systems, 14(4):349–379, 1996. 51 〈 〉 [183] A. Moffat and J. Zobel. Rank-biased precision for measurement of retrieval effectiveness. ACM Trans. on Information Systems, 27(1):2, 2008. 16 〈 〉 [184] A. Moffat, J. Zobel, and R. Sacks-Davis. Memory-efficient ranking. Information Processing & Management, 30(6):733–744, 1994. 27, 38 〈 〉 [185] A. Moffat, W. Webber, J. Zobel, and R. Baeza-Yates. A pipelined architecture for distributed text query evaluation. Information Retrieval, 10(3):205–231, 2007. 118 〈 〉 [186] A. Moffat, F. Scholer, P. Thomas, and P. Bailey. Pooled evaluation over query variations: Users are as diverse as systems. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 1759–1762, 2015. 89, 95 〈 〉 [187] H. R. Mohammad, K. Xu, J. Callan, and J. S. Culpepper. Dynamic shard cutoff prediction for selective search. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 85–94, 2018. 127, 143 〈 〉 [188] S. Mohammed, M. Crane, and J. Lin. Quantization in append-only collections. In Proc. Int. Conf. on Theory of Information Retrieval (ICTIR), pages 265–268, 2017. 72, 147 〈 〉 [189] M. Montague and J. A. Aslam. Relevance score normalization for metasearch. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 427–433, 2001. 92, 95 〈 〉 [190] A. Mourão and J. Magalhães. Low-complexity supervised rank fusion models. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 1691–1694, 2018. 92, 93 〈 〉 [191] G. Ottaviano and R. Venturini. Partitioned Elias-Fano indexes. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 273–282, 2014. 31, 57, 〈 102 〉 164 ! BIBLIOGRAPHY

[192] J. Pedersen. Query understanding at Bing. In Invited Talk: Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), 2010. 3, 21, 60, 86 〈 〉 [193] S. A. Perry and P. Willett. A review of the use of inverted files for best match searching in information retrieval systems. Journal of Information Science, 6(2-3):59–66, 1983. 38 〈 〉 [194] M. Persin. Document filtering for fast ranking. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 339–348, 1994. 26, 38, 51 〈 〉 [195] M. Persin, J. Zobel, and R. Sacks-Davis. Filtered document retrieval with frequency sorted indexes. Journal of the American Society for Information Science and Technology, 47(10): 749–764, 1996. 26, 38, 51 〈 〉 [196] M. Petri, J. S. Culpepper, and A. Moffat. Exploring the magic of WAND. In Proc. Australasian Document Computing Symp. (ADCS), pages 58–65, 2013. 51, 53, 113 〈 〉 [197] M. Petri, A. Moffat, J. Mackenzie, J. S. Culpepper, and D. Beck. Accelerated query process- ing via similarity score prediction. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 485–494, 2019. 51, 88, 127, 134, 143 〈 〉 [198] G. E. Pibiri. Space and Time-Efficient Data Structures for Massive Datasets. PhD thesis, University of Pisa, 2018. xi, 33 〈 〉 [199] G. E. Pibiri and R. Venturini. Clustered Elias-Fano indexes. ACM Trans. on Information Systems, 36(1):2.1–2.33, 2017. 31 〈 〉 [200] G. E. Pibiri and R. Venturini. Techniques for inverted index compression, 2019. arXiv:1908.10598. 33 〈 〉 [201] G. E. Pibiri and R. Venturini. On optimally partitioning variable-byte codes. IEEE Trans. on Knowledge and Data Engineering, to appear, 2019. 31 〈 〉 [202] G. E. Pibiri, M. Petri, and A. Moffat. Fast dictionary-based compression for inverted indexes. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 6–14, 2019. 31, 32, 62 〈 〉 [203] J. Plaisance, N. Kurz, and D. Lemire. Vectorized VByte decoding. In Proc. Int. Symp. on Web AlGorithms (iSWAG), 2015. 31 〈 〉 [204] J. Ponte and W. B. Croft. A language modeling approach to information retrieval. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 275–281, 1998. 12, 57, 127 〈 〉 [205] T. Qin, T-Y. Liu, J. Xu, and H. Li. Letor: A benchmark collection for research on learning to rank for information retrieval. Information Retrieval, 13(4):346–374, 2010. 114 〈 〉 BIBLIOGRAPHY ! 165

[206] F. Raiber and O. Kurland. Query-performance prediction: Setting the expectations straight. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 13–22, 2014. 98, 117 〈 〉 [207] V. Ramaswamy, R. Konow, A. Trotman, J. Degenhardt, and N. Whyte. Document reorder- ing is good, especially for e-commerce. In Proc. of the SIGIR 2017 Workshop on eCommerce. 35 〈 〉 [208] V. C. Raykar, B. Krishnapuram, and S. Yu. Designing efficient cascaded classifiers: Tradeoff between accuracy and cost. In Proc. Conf. on Knowledge Discovery and Data Mining (KDD), pages 853–860, 2010. 121 〈 〉 [209] D. Reinsel, J. Gantz, and J. Rydning. The digitization of the world: From edge to core. IDC White Paper #US44413318, 2018. 1 〈 〉 [210] K. M. Risvik, T. Chilimbi, H. Tan, K. Kalyanaraman, and C. Anderson. Maguro, a system for indexing and searching over very large text collections. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 727–736, 2013. 13, 18, 20, 60, 88, 133 〈 〉 [211] S. E. Robertson and H. Zaragoza. The probabilistic relevance framework: BM25 and beyond. Foundations & Trends in Information Retrieval, 3:333–389, 2009. 11, 13 〈 〉 [212] S. E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In Proc. Text Retrieval Conf. (TREC), 1994. 11, 57, 127 〈 〉 [213] S. E. Robertson, H. Zaragoza, and M. Taylor. Simple BM25 extension to multiple weighted fields. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 42–49, 2004. 13 〈 〉 [214] H. Roitman. An enhanced approach to query performance prediction using reference lists. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 869–872, 2017. 117 〈 〉 [215] O. Rojas, V. Gil-Costa, and M. Marin. Efficient parallel block-max WAND algorithm. In Proc. European Conf. on Parallel and (Euro-Par), pages 394–405, 2013. 75 〈 〉 [216] G. Salton. The use of citations as an aid to automatic content analysis, 1962. Technical Report. ISR-2, Section III, Harvard Computation Laboratory, Cambridge, MA. 10, 57 〈 〉 [217] G. Salton. Automatic Index Organization and Retrieval. McGraw-Hill, New York, NY, 1968. 10, 57 〈 〉 [218] M. Sanderson. Test collection based evaluation of information retrieval systems. Founda- tions & Trends in Information Retrieval, 4(4):247–375, 2010. 17 〈 〉 166 ! BIBLIOGRAPHY

[219] M. Sanderson and W. B. Croft. The history of information retrieval research. Proceedings of the IEEE, 100:1444–1451, 2012. 7 〈 〉 [220] T. Saracevic and P. Kantor. A study of information seeking and retrieving. III. Searchers, searches, and overlap. Journal of the American Society for Information Science and Technology, 39(3):197–216, 1988. 95 〈 〉 [221] H. Scells, L. Azzopardi, G. Zuccon, and B. Koopman. Query variation performance prediction for systematic reviews. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 1089–1092, 2018. 96, 118 〈 〉 [222] H. Scells, G. Zuccon, and B. Koopman. Automatic boolean query refinement for systematic review literature search. In Proc. Conf. on the World Wide Web (WWW), pages 1646–1656, 2019. 9 〈 〉 [223] D. J. Schuirmann. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J. Pharmacokinetics and Biopharmaceutics, 15(6):657–680, 1987. 140 〈 〉 [224] E Schurman and J Brutlag. Performance related changes and their user impact. Velocity, 2009. 2, 19 〈 〉 [225] D. Sheldon, M. Shokouhi, M. Szummer, and N. Craswell. LambdaMerge: Merging the results of query reformulations. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 795–804, 2011. 92, 93, 96 〈 〉 [226] A. Shtok, O. Kurland, and D. Carmel. Query performance prediction using reference lists. ACM Trans. on Information Systems, 34(4):19.1–19.34, 2016. 117 〈 〉 [227] M. Siedlaczek, J. Rodriguez, and T. Suel. Exploiting global impact ordering for higher throughput selective search. In Proc. European Conf. on Information Retrieval (ECIR), pages 12–19, 2019. 143 〈 〉 [228] F. Silvestri. Sorting out the document identifier assignment problem. In Proc. European Conf. on Information Retrieval (ECIR), pages 101–112, 2007. 34, 62 〈 〉 [229] A. Singhal. Modern information retrieval: A brief overview. IEEE Data Engineering Bulletin, 24(4):35–43, 2001. 7, 9, 57 〈 〉 [230] A. F. Smeaton and C. J. van Rijsbergen. The nearest neighbour problem in information retrieval: An algorithm using upperbounds. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 83–87, 1981. 38 〈 〉 [231] P. Sondhi, M. Sharma, P. Kolari, and C. Zhai. A taxonomy of queries for e-commerce search. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 1245–1248, 2018. 7 〈 〉 BIBLIOGRAPHY ! 167

[232] I. Stanton, S. Ieong, and N. Mishra. Circumlocution in diagnostic medical queries. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 133–142, 2014. 96 〈 〉 [233] A. A. Stepanov, A. R. Gangolli, D. E. Rose, R. J. Ernst, and P. S. Oberoi. SIMD-based de- coding of posting lists. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 317–326, 2011. 31 〈 〉 [234] T. Strohman and W. B. Croft. Efficient document retrieval in main memory. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 175–182, 2007. 51 〈 〉 [235] L. Tan and C. L. A. Clarke. A family of rank similarity measures based on maximized effectiveness difference. IEEE Trans. on Knowledge and Data Engineering, 27(11):2865–2877, 2015. 122, 141, 143 〈 〉 [236] J. Teevan, D. Ramage, and M. R. Morris. #twittersearch: A comparison of microblog search and web search. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 35–44, 2011. 7 〈 〉 [237] L. H. Thiel and H. S. Heaps. Program design for retrospective searches on large data bases. Information Storage and Retrieval, 8(1):1–20, 1972. 31 〈 〉 [238] P. Thomas, F. Scholer, P. Bailey, and A. Moffat. Tasks, queries, and rankers in pre-retrieval performance prediction. In Proc. Australasian Document Computing Symp. (ADCS), pages 11.1–11.4, 2017. 96, 98, 117, 118 〈 〉 [239] N. Tonellotto, C. Macdonald, and I. Ounis. Query efficiency prediction for dynamic pruning. In Proc. 9th Workshop on Large-scale and Distributed Information Retrieval, pages 3–8, 2011. 119 〈 〉 [240] N. Tonellotto, C. Macdonald, and I. Ounis. Efficient and effective retrieval using selective pruning. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 63–72, 2013. 54, 119 〈 〉 [241] N. Tonellotto, C. Macdonald, and I. Ounis. Efficient query processing for scalable web search. Foundations & Trends in Information Retrieval, 12(4-5):319–500, 2018. 38, 42, 120, 〈 121 〉 [242] A. Trotman. Compression, SIMD, and postings lists. In Proc. Australasian Document Computing Symp. (ADCS), pages 50.50–50.57, 2014. 31, 63, 64, 79, 131 〈 〉 [243] A. Trotman and M. Crane. Micro- and macro-optimizations of SaaT search. Software Practice & Experience, 49(5):942–950, 2019. 40, 72, 88 〈 〉 168 ! BIBLIOGRAPHY

[244] A. Trotman and J. Lin. In vacuo and in situ evaluation of SIMD codecs. In Proc. Australasian Document Computing Symp. (ADCS), pages 1–8, 2016. 31, 63, 79, 131 〈 〉 [245] A. Trotman, J. Degenhardt, and S. Kallumadi. The architecture of eBay search. In Proc. of the SIGIR 2017 Workshop on eCommerce. 7, 18 〈 〉 [246] A. Trotman, X-F. Jia, and M. Crane. Towards an efficient and effective search engine. In Proc. of the SIGIR 2012 Workshop on Open Source Information Retrieval, pages 40–47, 2012. 11, 62, 63, 130 〈 〉 [247] A. Trotman, A. Puurula, and B. Burgess. Improvements to BM25 and language models examined. In Proc. Australasian Document Computing Symp. (ADCS), pages 58–65, 2014. 11, 52 〈 〉 [248] H. R. Turtle and J. Flood. Query evaluation: Strategies and optimizations. Information Processing & Management, 31(6):831–850, 1995. 37, 42, 45, 51, 52, 57 〈 〉 [249] S. Vigna. Quasi-succinct indices. In Proc. ACM Int. Conf. on Web Search and Data Mining (WSDM), pages 83–92, 2013. 32 〈 〉 [250] C. C. Vogt and G. W. Cottrell. Predicting the performance of linearly combined IR systems. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 190–196, 1998. 92 〈 〉 [251] C. C. Vogt and G. W. Cottrell. Fusion via a linear combination of scores. Information Retrieval, 1(3):151–173, 1999. 92 〈 〉 [252] L. Wang, J. Lin, and D. Metzler. A cascade ranking model for efficient ranked retrieval. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 105–114, 2011. 21, 57, 60, 113, 121, 142, 147 〈 〉 [253] L. Wang, P. N. Bennett, and K. Collins-Thompson. Robust ranking models via risk-sensitive optimization. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 761–770, 2012. 125 〈 〉 [254] L. Wang, J. Lin, D. Metzler, and J. Han. Learning to efficiently rank on big data. In Proc. Conf. on the World Wide Web (WWW), pages 209–210, 2014. 21 〈 〉 [255] Q. Wang, C. Dimpoloulos, and T. Suel. Fast first-phase candidate generation for cascading rankers. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 295–304, 2016. 61 〈 〉 [256] W. Webber, A. Moffat, and J. Zobel. A similarity measure for indefinite rankings. ACM Trans. on Information Systems, 28(4):20.1–20.38, 2010. 122 〈 〉 BIBLIOGRAPHY ! 169

[257] J. R. Wen, J. Y. Nie, and H. J. Zhang. Query clustering using user logs. ACM Trans. on Information Systems, 20(1):59–81, 2002. 96 〈 〉 [258] R. W. White, M. Richardson, M. Bilenko, and A. P. Heath. Enhancing web search by promoting multiple search engine use. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 43–50, 2008. 118 〈 〉 [259] A. F. Wicaksono. Measuring job search effectiveness. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 1453–1453, 2019. 7 〈 〉 [260] H. E. Williams, J. Zobel, and P. Anderson. Fast phrase querying with combined indexes. ACM Trans. on Information Systems, 22(4):573–594, 2004. 29 〈 〉 [261] M. Wu, D. Hawking, A. Turpin, and F. Scholer. Using anchor text for homepage and topic distillation search tasks. Journal of the American Society for Information Science and Technology, 63(6):1235–1255, 2012. 92, 94, 95 〈 〉 [262] G-R. Xue, H-J. Zeng, Z. Chen, Y. Yu, W-Y. Ma, W. Xi, and W. Fan. Optimizing web search using web click-through data. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 118–126, 2004. 96 〈 〉 [263] H. Yan, S. Ding, and T. Suel. Inverted index compression and query processing with optimized document ordering. In Proc. Conf. on the World Wide Web (WWW), pages 401–410, 2009. 31, 63 〈 〉 [264] H. P. Young. Condorcet’s theory of voting. American Political Science Review, 82(4): 1231–1244, 1988. 93 〈 〉 [265] J-M. Yun, Y. He, S. Elnikety, and S. Ren. Optimal aggregation policy for reducing tail latency of web search. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 63–72, 2015. 55, 108 〈 〉 [266] H. Zamani, W. B. Croft, and J. S. Culpepper. Neural query performance prediction using weak supervision from multiple signals. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 105–114, 2018. 117 〈 〉 [267] H. Zaragoza, N. Craswell, M. Taylor, S. Saria, and S. E. Robertson. Microsoft Cambridge at TREC 2004: Web and HARD track. In Proc. Text Retrieval Conf. (TREC), 2004. 13 〈 〉 [268] O. Zendel, A. Shtok, F. Raiber, O. Kurland, and J. S. Culpepper. Information needs, queries, and query performance prediction. In Proc. ACM Int. Conf. on Research and Development in Information Retrieval (SIGIR), pages 395–404, 2019. 118 〈 〉 [269] J. Zhang, X. Long, and T. Suel. Performance of compressed inverted list caching in search engines. In Proc. Conf. on the World Wide Web (WWW), pages 387–396, 2008. 31 〈 〉 170 ! BIBLIOGRAPHY

[270] M. Zhang, D. Kuang, G. Hua, Y. Liu, and S. Ma. Is learning to rank effective for web search? In Proc. SIGIR 2008 Workshop on Learning to Rank for Information Retrieval (LR4IR), 2009. 14, 114 〈 〉 [271] Y. Zhou and W. B. Croft. Ranking robustness: A novel framework to predict query performance. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 567–574, 2006. 117 〈 〉 [272] S. Zilberstein. Using anytime algorithms in intelligent systems. AI Magazine, 17(3):73–83, 1996. 39 〈 〉 [273] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38 (2):6:1–6:56, 2006. 12, 25, 57 〈 〉 [274] J. Zobel, A. Moffat, and K. Ramamohanarao. Inverted files versus signature files for text indexing. ACM Trans. on Database Systems, 23(4):453–490, 1998. 29, 61 〈 〉 [275] G. Zuccon, J. Palotti, and A. Hanbury. Query variations and their effect on comparing information retrieval systems. In Proc. ACM Int. Conf. on Information and Knowledge Management (CIKM), pages 691–700, 2016. 96 〈 〉 [276] M. Zukowski, S. Heman, N. Nes, and P. Boncz. Super-scalar RAM-CPU cache compression. In Proc. Int. Conf. on Data Engineering (ICDE), pages 59–71, 2006. 63 〈 〉