Criticality Assessments for Improving Algorithmic Robustness, Thomas B. Jones

Total Pages: 16

File Type: PDF, Size: 1020 KB

University of New Mexico, UNM Digital Repository, Computer Science ETDs, Engineering ETDs, Fall 11-15-2018. Criticality Assessments for Improving Algorithmic Robustness, Thomas B. Jones, University of New Mexico. Follow this and additional works at: https://digitalrepository.unm.edu/cs_etds. Part of the Computer and Systems Architecture Commons, Hardware Systems Commons, Other Computer Engineering Commons, Software Engineering Commons, Systems Architecture Commons, and the Theory and Algorithms Commons.

Recommended Citation: Jones, Thomas B. "Criticality Assessments for Improving Algorithmic Robustness." (2018). https://digitalrepository.unm.edu/cs_etds/96

This Dissertation is brought to you for free and open access by the Engineering ETDs at UNM Digital Repository. It has been accepted for inclusion in Computer Science ETDs by an authorized administrator of UNM Digital Repository. For more information, please contact [email protected].

Thomas Berry Jones, Candidate, Computer Science Department. This dissertation is approved, and it is acceptable in quality and form for publication. Approved by the Dissertation Committee: David Ackley, Chairperson; Darko Stefanovic; George Luger; Thomas Caudell; Lance Williams.

Criticality Assessments for Improving Algorithmic Robustness, by Thomas Berry Jones. B.S., Computer Engineering and Mathematics, Gonzaga University, 2010; M.S., Computer Science, University of New Mexico, 2013. DISSERTATION submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy, Computer Science, The University of New Mexico, Albuquerque, New Mexico, December 2018.

Dedication: To my family, Bryan, Colleen, Tim, and Jon, for their love and support, and to my nephew and niece, Colby and Olivia. A Vehicle, Heinsight, and Carbon on Paper.

Acknowledgments: I would first like to thank my friends, especially James Orrell, Kyle Wilson, and Evan Poncelet, who have, at various times, encouraged me to continue on this path. I don't think I would be here today without them. I would also like to thank my mother, Colleen Jones, who graciously helped with copy-editing the final document on very short notice. Next, I want to thank my committee members, who have all been very kind and have had great suggestions for improving the work. Dr. George Luger has provided immense intellectual, moral, and material assistance during this process. He has always supported my work and has helped me greatly with my career. I am sure that I would not have finished without his assistance. Dr. Darko Stefanovic was very gracious to join my committee with little notice and has taught me a great deal about programming languages. Additionally, I want to thank Dr. Lance Williams, who has always inspired some of the most interesting ideas when I've spoken with him. His suggestions concerning failure redistribution methods were spot on. I would also like to thank Dr. Thomas Caudell, who has been very benevolent in looking after the work of this wayward graduate student after his retirement. Finally, I would like to thank my advisor, Dave Ackley, for putting up with me all these years. I have learned more from him than from anyone else.
Criticality Assessments for Improving Algorithmic Robustness, by Thomas Berry Jones. B.S., Computer Engineering and Mathematics, Gonzaga University, 2010; M.S., Computer Science, University of New Mexico, 2013; Ph.D., Computer Science, University of New Mexico, 2018.

Abstract

Though computational models typically assume all program steps execute flawlessly, that does not imply all steps are equally important if a failure should occur. In the "Constrained Reliability Allocation" problem, sufficient resources are guaranteed for operations that prompt eventual program termination on failure, but those operations that only cause output errors are given a limited budget of some vital resource, insufficient to ensure correct operation for each of them. In this dissertation, I present a novel representation of failures based on a combination of their timing and location, combined with criticality assessments, a method used to predict the behavior of systems operating outside their design criteria. I observe that strictly correct error measures hide interesting failure relationships, failure importance is often determined by failure timing, and recursion plays an important role in structuring output error. I employ these observations to improve the output error of two matrix multiplication methods through an economization procedure that moves failures from worse to better locations, thus providing a possible solution to the constrained reliability allocation problem. I show a 38% to 63% decrease in absolute value error on matrix multiplication algorithms, despite nearly identical failure counts between control and experimental studies. Finally, I show that efficient sorting algorithms are less robust at large scale than less efficient sorting algorithms.

Contents

List of Figures
List of Tables
Glossary

1 Introduction
1.1 Background
1.1.1 From Fault Tolerance to Approximate Computing
1.1.2 Best-Effort Computing
1.2 Problem Statement
1.3 Conceptual Framework
1.4 Research Questions and Hypotheses
1.4.1 How do Measures of Correctness Hide or Uncover Interesting Failure Dynamics Inside Algorithms?
1.4.2 How Are an Algorithm's Failure Dynamics Related to the Algorithm's Known Behavior?
1.4.3 Can Failure Dynamics Analysis Improve Algorithmic Performance in Resource Constrained Environments?
1.4.4 Which Operations Respond Well to This Method?
1.5 Procedures
1.6 Thesis Outline

2 Related Work
2.1 Criticality
2.2 Fault Tolerance
2.3 Fault Injection
2.4 Quantified Correctness
2.5 Approximate Algorithms and Computing
2.6 Error Economization
2.7 Theories of Robustness

3 The Model
3.1 Author Contribution Statement
3.2 Publication Notes
3.3 Introduction
3.4 Criticality Assessments
3.5 Error Measures
3.6 Mathematical Abstractions of Space and Time
3.7 Criticality Defined
3.8 Criticality Explorer
3.9 Failure Shaping

4 Sorting Algorithm Criticality
4.1 Author Contribution Statement
4.2 Publication Notes
4.3 Introduction
4.3.1 Measuring Sortedness
4.4 Results

5 Criticality and Armoring in Matrix Multiplication
5.1 Author Contribution Statement
5.2 Publication Notes
5.3 Introduction
5.4 Failure Shaping on Scalar Multiplication
5.4.1 Criticality Assessment Results on Scalar Multiplication
5.4.2 Failure Shaping Results on Scalar Multiplication
5.5 Failure Shaping Matrix Multiplication
5.5.1 Criticality Assessment Results on Matrix Multiplication
5.5.2 Failure Shaping Results on Matrix Multiplication
5.6 Proxy Failure Shaping
5.6.1 Criticality and Failure Shaping Results on Scalar Multiplication Proxy Method

6 Scalable Robustness
6.1 Author Contribution Statement
6.2 Publication Notes
6.3 Introduction
6.4 Scalable Robustness Defined
6.4.1 Average-Case Scalable Robustness
6.4.2 Worst-Case Scalable Robustness
6.5 Scalable Robustness on Pairwise-Comparison Sorting
6.5.1 Algorithms Old and New
6.6 Average-Case Scalable Robustness on Sorting Algorithms
6.7 Worst-Case Scalable Robustness on the Sorting Algorithms
6.7.1 Quicksort: Spearman's Footrule Error
6.7.2 Bubble Sort: Spearman's Footrule Error
6.7.3 Round Robin Sort: Spearman's Footrule Error

7 Discussion
7.1 Summary of Work
7.2 Questions Answered
7.2.1 Strict Correctness Hides Criticality Structure While Quantified Correctness Reveals It
7.2.2 Criticality Structure Seems Related to Known Algorithmic Behaviors
7.2.3 Algorithmic Performance in Resource-Constrained Environments Is Improved Through Failure Dynamic Analysis
7.2.4 Bulk Program Operations Make Good Choice for Failure Shaping
7.2.5 Numerical Stability
7.2.6 When Will Criticality Structures Scale Effectively?
7.3 Limitations of the Study
7.4 Future Work
7.4.1 Expanded Empirical Studies
7.4.2 Scaling Through Search
7.4.3 Machine Learning and Attribution
7.4.4 Hardware Failure Studies
7.4.5 Spatial Graphs for Coordinated Failures
7.4.6 Stochastic and Continuous Failure Redistribution Policies
7.4.7 Scalable Robustness Beyond Sorting
7.5 Significance
7.6 Conclusion

A Permission to Reproduce Previously Published Content
A.1 IEEE Reproduction Instructions
A.2 Springer Nature License Terms and Conditions

References

List of Figures

3.1 Criticality Assessment and Failure Shaping Overview. Criticality Explorer takes a prepared algorithm with error measures, input generator, and annotated method-level failure interfaces, shown on the left. After a calibration step to evaluate the maximum failure point count and samples per failure point, the tool then estimates failure point criticalities through a Monte Carlo simulation. Each sample produces the output of a single run with, and without, a failure injected on a given failure point and scores the error between the two runs. The errorful inputs, outputs, and absolute difference score can be found in the bottom center of the figure. Failures are then shaped using the median criticality, and error scores are produced for average i.i.d. as well as shaped failures. See Section 3.8 for details. Figure reprinted from [1] with permission.

4.1 Strict Correctness Criticality on Sorting Algorithms. Extremal values dominate in a plot of strict correctness criticality (y axis) vs. the comparisons executed during a sort (`Comparison Index'; x axis): most faults are either critical or not critical. See Section 4.4 for details. Figure reprinted from [2] with permission.
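To make the workflow described in the abstract and in Figure 3.1 concrete, the sketch below estimates per-operation criticality for a small sorting routine by Monte Carlo fault injection: each sample runs the algorithm once without a failure and once with a single comparison flipped, then scores the difference with Spearman's footrule. This is a minimal illustration only, not the dissertation's Criticality Explorer tool; the insertion-sort target, the flipped-comparison fault model, the sample counts, and all function names are assumptions made for the example.

    import random
    import statistics

    def insertion_sort(xs, fail_at=None):
        # Insertion sort whose comparisons act as "failure points".
        # A failure flips the outcome of exactly one comparison (assumed fault model).
        xs = list(xs)
        comparison_index = 0
        for i in range(1, len(xs)):
            j = i
            while j > 0:
                out_of_order = xs[j - 1] > xs[j]
                if comparison_index == fail_at:
                    out_of_order = not out_of_order  # inject the single failure here
                comparison_index += 1
                if not out_of_order:
                    break
                xs[j - 1], xs[j] = xs[j], xs[j - 1]
                j -= 1
        return xs

    def footrule_error(output, reference):
        # Spearman's footrule: total displacement of each element from its correct position.
        pos = {v: i for i, v in enumerate(reference)}
        return sum(abs(i - pos[v]) for i, v in enumerate(output))

    def assess_criticality(n_failure_points, samples=200, size=16):
        # Monte Carlo estimate of the mean output error caused by each failure point.
        # Samples whose run never reaches the chosen comparison index contribute zero error.
        scores = []
        for point in range(n_failure_points):
            errors = []
            for _ in range(samples):
                data = [random.random() for _ in range(size)]
                clean = insertion_sort(data)                  # failure-free run
                faulty = insertion_sort(data, fail_at=point)  # run with one injected failure
                errors.append(footrule_error(faulty, clean))
            scores.append(statistics.mean(errors))
        return scores

    if __name__ == "__main__":
        for index, score in enumerate(assess_criticality(n_failure_points=30)):
            print(f"comparison {index}: mean footrule error {score:.2f}")

Failure shaping, in this hedged picture, would then steer a fixed budget of failures toward the comparisons whose estimated criticality is lowest, in the spirit of the economization procedure described in the abstract.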
Recommended publications
  • Low-Power Computing
    LOW-POWER COMPUTING. Safa Alsalman (a), Raihan ur Rasool (b), Rizwan Mian (c). (a) King Faisal University, Alhsa, Saudi Arabia; (b) Victoria University, Melbourne, Australia; (c) Data++, Toronto, Canada. Corresponding email: [email protected]. Abstract: With the abundance of mobile electronics, demand for low-power computing is pressing more than ever. Achieving a reduction in power consumption requires gigantic efforts from electrical and electronic engineers, computer scientists, and software developers. During the past decade, various techniques and methodologies for designing low-power solutions have been proposed. These methods are aimed at small mobile devices as well as large datacenters, and they cover design paradigms as well as run-time issues. This paper summarizes the main approaches adopted by the IT community to promote low-power computing. Keywords: Computing, Energy-efficient, Low power, Power-efficiency. Introduction: In the past two decades, technology has evolved rapidly, affecting every aspect of our daily lives. The fact that we study, work, communicate, and entertain ourselves using all different types of devices and gadgets is evidence that technology is a revolutionary event in human history. The technology is here to stay, but with big responsibilities come bigger challenges. As digital devices shrink in size and become more portable, power consumption and energy efficiency become critical issues. On one end, circuits designed for portable devices must target increasing battery life. On the other end, the more complex high-end circuits available at data centers have to consider power costs, cooling requirements, and reliability issues. Low-power computing is the field dedicated to the design and manufacturing of low-power consumption circuits, programming power-aware software, and applying techniques for power-efficiency [1][2].
  • HOW MODERATE-SIZED RISC-BASED SMPs CAN OUTPERFORM MUCH LARGER DISTRIBUTED MEMORY MPPs
    HOW MODERATE-SIZED RISC-BASED SMPs CAN OUTPERFORM MUCH LARGER DISTRIBUTED MEMORY MPPs. D. M. Pressel and Walter B. Sturek, Corporate Information and Computing Center, U.S. Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005-5067. Email: [email protected], sturek@arlmil. J. Sahu and K. R. Heavey, Weapons and Materials Research Directorate, U.S. Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005-5066. Email: [email protected], [email protected]. ABSTRACT: Historically, comparison of computer systems was based primarily on theoretical peak performance. Today, comparisons based on delivered levels of performance are frequently used instead. This of course raises a whole host of questions about how to do this. However, even this approach has a fundamental problem: it assumes that all FLOPS are of equal value. As long as one is only using either vector or large distributed memory MIMD MPPs, this is probably a reasonable assumption. However, when comparing the algorithms of choice used on these two classes of platforms, one frequently finds a significant difference in the number of FLOPS required to obtain a solution with the desired level of precision. While troubling, this dichotomy has been largely unavoidable. Recent advances involving moderate-sized RISC-based SMPs have allowed us to solve this problem.
  • Introduction to Analysis of Algorithms Introduction to Sorting
    Introduction to Analysis of Algorithms; Introduction to Sorting. CS 311 Data Structures and Algorithms, Lecture Slides, Monday, February 23, 2009. Glenn G. Chappell, Department of Computer Science, University of Alaska Fairbanks, [email protected]. © 2005–2009 Glenn G. Chappell. Unit Overview, Recursion & Searching. Major Topics: • Introduction to Recursion • Search Algorithms • Recursion vs. Iteration • Eliminating Recursion • Recursive Search with Backtracking. Unit Overview, Algorithmic Efficiency & Sorting. We now begin a unit on algorithmic efficiency & sorting algorithms. Major Topics: • Introduction to Analysis of Algorithms • Introduction to Sorting • Comparison Sorts I • More on Big-O • The Limits of Sorting • Divide-and-Conquer • Comparison Sorts II • Comparison Sorts III • Radix Sort • Sorting in the C++ STL. We will (partly) follow the text; efficiency and sorting are in Chapter 9. After this unit will be the in-class Midterm Exam. Introduction to Analysis of Algorithms, Efficiency [1/3]: What do we mean by an "efficient" algorithm? We mean an algorithm that uses few resources. By far the most important resource is time. Thus, when we say an algorithm is efficient, assuming we do not qualify this further, we mean that it can be executed quickly. How do we determine whether an algorithm is efficient? Implement it, and run the result on some computer? But the speed of computers is not fixed, and there are differences in compilers, etc. Is there some way to measure efficiency that does not depend on the system chosen or the current state of technology? Efficiency [2/3]: Is there some way to measure efficiency that does not depend on the system chosen or the current state of technology? Yes! Rough idea: divide the tasks an algorithm performs into "steps".
  • Software Design Pattern Using: Algorithmic Skeleton Approach, S. Sarika
    International Journal of Computer Network and Security (IJCNS), Vol. 3 No. 1, ISSN: 0975-8283. Software Design Pattern Using: Algorithmic Skeleton Approach. S. Sarika, Lecturer, Department of Computer Science and Engineering, Sathyabama University, Chennai-119. [email protected]
    Abstract - In software engineering, a design pattern is a general reusable solution to a commonly occurring problem in software design. A design pattern is not a finished design that can be transformed directly into code. It is a description or template for how to solve a problem that can be used in many different situations. Object-oriented design patterns typically show relationships and interactions between classes or objects, without specifying the final application classes or objects that are involved. Many patterns imply object-orientation or, more generally, mutable state, and so may not be as applicable in functional programming languages, in which data is immutable or treated as such. Not all software patterns are design patterns. For instance, algorithms solve computational problems rather than software design problems. In this paper we try to focus on the algorithmic pattern for good software design. Key words - Software Design, Design Patterns, Software reusability, Algorithmic pattern. [Fig 1. Primary design pattern]
    1.1 Attributes related to design: Expandability: the degree to which the design of a system can be extended. Simplicity: the degree to which the design of a system can be understood easily. Reusability: the degree to which a piece of design can be reused in another design. 1.2 Attributes related to implementation: Learnability: the degree to which the source code of a system is easy to learn. 1. INTRODUCTION. Many studies in the literature (including some by these authors) have for premise that design patterns improve the quality of object-oriented software systems.
  • Curtis Huttenhower Associate Professor of Computational Biology
    Curtis Huttenhower, Associate Professor of Computational Biology and Bioinformatics, Department of Biostatistics, Chan School of Public Health, Harvard University. 655 Huntington Avenue • Boston, MA 02115 • 617-432-4912 • [email protected]. Academic Appointments: July 2009 - Present, Department of Biostatistics, Harvard T.H. Chan School of Public Health; April 2013 - Present, Associate Professor of Computational Biology and Bioinformatics; July 2009 - March 2013, Assistant Professor of Computational Biology and Bioinformatics. Education: November 2008 - June 2009, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Supervisor: Dr. Olga Troyanskaya, Postdoctoral Researcher. August 2004 - November 2008, Computer Science Department, Princeton University, Adviser: Dr. Olga Troyanskaya, Ph.D. in Computer Science, November 2008; M.A., June 2006. August 2002 - May 2004, Language Technologies Institute, Carnegie Mellon University, Adviser: Dr. Eric Nyberg, M.S. in Language Technologies, December 2003. August 1998 - November 2000, Rose-Hulman Institute of Technology, B.S. summa cum laude, November 2000; majored in Computer Science, Chemistry, and Math; minored in Spanish. August 1996 - May 1998, Simon's Rock College of Bard, A.A., May 1998. Awards, Honors, and Scholarships: • ISCB Overton Prize (Harvard Chan School, 2015) • eLife Sponsored Presentation Series early career award (Harvard Chan School, 2014) • Presidential Early Career Award for Scientists and Engineers (Harvard Chan School, 2012) • NSF CAREER award (Harvard Chan School, 2010) • Quantitative
  • ALGORITHMIC EFFICIENCY INDICATOR FOR THE OPTIMIZATION OF ROUTE SIZE, Ciro Rodríguez, National University Mayor de San Marcos
    3C Tecnología. Glosas de innovación aplicadas a la pyme. ISSN: 2254-4143, Ed. 34, Vol. 9, N.º 2, June - September. ALGORITHMIC EFFICIENCY INDICATOR FOR THE OPTIMIZATION OF ROUTE SIZE. Ciro Rodríguez, National University Mayor de San Marcos, Lima (Perú). E-mail: [email protected]. ORCID: https://orcid.org/0000-0003-2112-1349. Miguel Sifuentes, National University Mayor de San Marcos, Lima (Perú). E-mail: [email protected]. ORCID: https://orcid.org/0000-0002-0178-0059. Freddy Kaseng, National University Federico Villarreal, Lima (Perú). E-mail: [email protected]. ORCID: https://orcid.org/0000-0002-2878-9053. Pedro Lezama, National University Federico Villarreal, Lima (Perú). E-mail: [email protected]. ORCID: https://orcid.org/0000-0001-9693-0138. Received: 06/02/2020. Accepted: 23/03/2020. Published: 15/06/2020. Suggested citation: Rodríguez, C., Sifuentes, M., Kaseng, F., y Lezama, P. (2020). Algorithmic efficiency indicator for the optimization of route size. 3C Tecnología. Glosas de innovación aplicadas a la pyme, 9(2), 49-69. http://doi.org/10.17993/3ctecno/2020.v9n2e34.49-69. ABSTRACT: In software development, experienced programmers sometimes have difficulty determining, before performing their tests, which algorithm will work best to solve a particular problem. The answer will always depend on the type of problem and the nature of the data to be used. In this situation, it is necessary to identify which indicator is the most useful for a specific type of problem, especially in route optimization.
  • Epstein Institute Seminar (ISE 651)
    EPSTEIN INSTITUTE SEMINAR ▪ ISE 651. The Many Faces of Regularization: from Signal Recovery to Online Algorithms. Dr. Maryam Fazel, Associate Professor, Department of Electrical and Computer Engineering. ABSTRACT - Regularization plays important and distinct roles in optimization. In the first part of the talk, we consider sample-efficient recovery of signals with low-dimensional structure, where the key is choosing the right regularizer. While regularizers are well-understood for signals with a single structure (l1 norm for sparsity, nuclear norm for low rank matrices, etc.), the situation is more complicated when the signal has several structures simultaneously (e.g., sparse phase retrieval, sparse PCA). A common approach is to combine the regularizers that are sample-efficient for each individual structure. We present an analysis that challenges this approach: we prove it can be highly suboptimal in the number of samples, thus new regularizers are needed for sample-efficient recovery. Regularization is also employed for algorithmic efficiency. In the second part of the talk, we consider online resource allocation: sequentially allocating resources in response to online demands, with resource constraints that couple the decisions across time. Such problems arise in operations research (revenue management), computer science (online packing & covering), and e-commerce ("Adwords" problem in internet advertising). We examine primal-dual algorithms with a focus on the (worst-case) competitive ratio. We show how certain regularization (or smoothing) can improve this ratio, and how to seek the optimal regularization by solving a convex problem. This approach allows us to design effective regularization customized for a given cost function. The framework also extends to certain semidefinite programs, such as online D-optimal and A-optimal experiment design problems.
  • Science Journals — AAAS
    RESEARCH REVIEW SUMMARY, COMPUTER SCIENCE. There's plenty of room at the Top: What will drive computer performance after Moore's law? Charles E. Leiserson, Neil C. Thompson*, Joel S. Emer, Bradley C. Kuszmaul, Butler W. Lampson, Daniel Sanchez, Tao B. Schardl. ON OUR WEBSITE: Read the full article at https://dx.doi.org/10.1126/science.aam9744. BACKGROUND: Improvements in computing power can claim a large share of the credit for many of the things that we take for granted in our modern lives: cellphones that are more powerful than room-sized computers from 25 years ago, internet access for nearly half the world, and drug discoveries enabled by powerful supercomputers. Society has come [...] in computing power stalls, practically all industries will face challenges to their productivity. Nevertheless, opportunities for growth in computing performance will still be available, especially at the "Top" of the computing-technology stack: software, algorithms, and hardware architecture. [...] and sporadically and must ultimately face diminishing returns. As such, we see the biggest benefits coming from algorithms for new problem domains (e.g., machine learning) and from developing new theoretical machine models that better reflect emerging hardware. Hardware architectures can be streamlined, for instance through processor simplification, where a complex processing core is replaced with a simpler core that requires fewer transistors. The freed-up transistor budget can then be redeployed in other ways, for example by increasing the number of processor cores running in parallel, which can lead to large efficiency gains for problems that can exploit parallelism. Another form of streamlining is domain specialization, where hardware is customized for a particular application domain.
  • Understanding and Improving Graph Algorithm Performance
    Understanding and Improving Graph Algorithm Performance. Scott Beamer, Electrical Engineering and Computer Sciences, University of California at Berkeley. Technical Report No. UCB/EECS-2016-153, http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-153.html, September 8, 2016. Copyright © 2016, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Understanding and Improving Graph Algorithm Performance, by Scott Beamer III. A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley. Committee in charge: Professor Krste Asanović, Co-chair; Professor David Patterson, Co-chair; Professor James Demmel; Professor Dorit Hochbaum. Fall 2016. Abstract: Graph processing is experiencing a surge of renewed interest as applications in social networks and their analysis have grown in importance. Additionally, graph algorithms have found new applications in speech recognition and the sciences. In order to deliver the full potential of these emerging applications, graph processing must become substantially more efficient, as graph processing's communication-intensive nature often results in low arithmetic intensity that underutilizes available hardware platforms.
  • Efficient Main-Memory Top-K Selection for Multicore Architectures
    Efficient Main-Memory Top-K Selection For Multicore Architectures. Vasileios Zois, Vassilis J. Tsotras, Walid A. Najjar, University of California, Riverside. [email protected], [email protected], [email protected]. ABSTRACT: Efficient Top-k query evaluation relies on practices that utilize auxiliary data structures to enable early termination. Such techniques were designed to trade off complex work in the buffer pool against costly access to disk-resident data. Parallel in-memory Top-k selection with support for early termination presents a novel challenge because computation shifts higher up in the memory hierarchy. In this environment, data scan methods using SIMD instructions and multithreading perform well despite requiring evaluation of the complete dataset. Early termination schemes that favor simplicity require random access to resolve score ambiguity, while those optimized for sequential access incur too many [object evaluations]. Related work includes simple selection [15, 29, 18, 22], Top-k group-by aggregation [28], Top-k join [31, 23], as well as Top-k query variations on different applications (spatial, spatiotemporal, web search, etc.) [7, 11, 8, 26, 1, 5]. All variants concentrate on minimizing the number of object evaluations while maintaining efficient data access. In relational database management systems (RDBMS), the most prominent solutions for Top-k selection fit into three categories: (i) using sorted lists [15], (ii) utilizing materialized views [22], and (iii) applying layered-based methods (convex-hull [6], skyline [27]). Efficient Top-k query processing entails minimizing object evaluations through candidate pruning [29] and early termination [15, 2].
  • Chapter 3: Searching and Sorting, Algorithmic Efficiency
    Chapter 3: Searching and Sorting, Algorithmic Efficiency. Search Algorithms. Problem: Would like an efficient algorithm to find names, numbers, etc. in lists/records (could involve DATA TYPES). Basic Search: Unsorted Lists. Purpose: To find a name in an unsorted list by comparing one at a time until the name is found or the end of the array is reached. See `Example 4' in the example programs for a basic program to do this. Searching through data types requires some different syntax. Example: Search through data types for a record corresponding to the name `Zelda'. Suppose each name and number is stored in a data type:
        type person
          character(len=30) :: name
          integer :: number
        end type person
    We declare an array of person(s):
        type(person) :: directory(1000)
    Then search for `Zelda':
        Do i = 1, 1000
          if (directory(i)%name == 'Zelda') then
            print *, directory(i)%number
          end if
        End do
    This algorithm is fine for an unsorted list of names. However, if we know that the list has been stored in some order (i.e. it is sorted), then there are faster methods. Binary Search: Sorted Lists. Purpose: To search through an ordered array more quickly. (Note that <, >=, == etc. can be used with character strings to compare alphabetical order.) Algorithm: 1. Data: The name to be searched for is placed in an alphabetically ordered array `Namearray' that is filled from data files. 2. Set Low = 1 and High = N, where N is Size(Namearray) (Low and High are integers). 3. Compare Name with Namearray(Low) and Namearray(High). If there is a match, then exit. (A complete Python sketch of this search appears after the last entry in this list.)
  • On the Nature of the Theory of Computation (ToC)∗
    Electronic Colloquium on Computational Complexity, Report No. 72 (2018). On the nature of the Theory of Computation (ToC)∗. Avi Wigderson, April 19, 2018. Abstract: [This paper is a (self-contained) chapter in a new book on computational complexity theory, called Mathematics and Computation, available at https://www.math.ias.edu/avi/book]. I attempt to give here a panoramic view of the Theory of Computation that demonstrates its place as a revolutionary, disruptive science, and as a central, independent intellectual discipline. I discuss many aspects of the field, mainly academic but also cultural and social. The paper details the rapid expansion of interactions ToC has with all sciences, mathematics, and philosophy. I try to articulate how these connections naturally emanate from the intrinsic investigation of the notion of computation itself, and from the methodology of the field. These interactions follow the other, fundamental role that ToC played, and continues to play, in the amazing developments of computer technology. I discuss some of the main intellectual goals and challenges of the field, which, together with the ubiquity of computation across human inquiry, makes its study and understanding in the future at least as exciting and important as in the past. This fantastic growth in the scope of ToC brings significant challenges regarding its internal organization, facilitating future interactions between its diverse parts, and maintaining cohesion of the field around its core mission of understanding computation. Deliberate and thoughtful adaptation and growth of the field, and of the infrastructure of its community, are surely required, and such discussions are indeed underway.
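The "Chapter 3: Searching and Sorting" excerpt above breaks off partway through the binary-search steps. As a companion to that entry, here is a minimal, self-contained sketch of the complete algorithm, written in Python rather than the chapter's Fortran; the directory contents and the function name are made up for the example, not taken from the original chapter.

    def binary_search(namearray, name):
        # Binary search over an alphabetically ordered array.
        # Returns the index of `name`, or None if it is absent.
        low, high = 0, len(namearray) - 1
        while low <= high:
            mid = (low + high) // 2
            if namearray[mid] == name:
                return mid            # match found
            elif namearray[mid] < name:
                low = mid + 1         # discard the lower half
            else:
                high = mid - 1        # discard the upper half
        return None                   # name is not in the list

    directory = ["Alice", "Bob", "Link", "Zelda"]   # already sorted alphabetically
    print(binary_search(directory, "Zelda"))        # prints 3

Each iteration halves the remaining search range, which is what makes the sorted-list search faster than the linear scan shown in the excerpt.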