Dissertation submitted in fulfillment of the requirements for the degree of Doctor at the Faculty of Applied Sciences of the Albert-Ludwigs-Universität Freiburg im Breisgau

Potentials and Limitations of Visual Methods for the Exploration of Complex Data Structures

Tobias Lauer

Advisor: Prof. Dr. Thomas Ottmann

Dean of the Faculty of Applied Sciences: Prof. Dr. Bernhard Nebel

First reviewer: Prof. Dr. Thomas Ottmann, Universität Freiburg
Second reviewer: Prof. Dr. Amitava Datta, University of Western Australia

Date of the defense: January 29, 2007

Summary (Zusammenfassung)

Visualizations are frequently employed both in the analysis of algorithms and data structures and in teaching them. In the latter case, they mostly serve to clarify concepts or to present examples, and as a rule the visualizations are prepared by the instructors and presented to the learners. If the visualizations are interactive animations or simulations, they can also be used to let learners explore the data structures and algorithms on their own, freely choosing the input of the algorithms and thereby manipulating the results. In this way, a deeper understanding and hence better learning is expected. Similarly, interactive visualizations can be helpful in research for the analysis of complex data structures and the operations carried out on them. For a formal mathematical analysis, the right approach is usually decisive, and finding it requires a basic intuition or a proof idea. Here, a suitable visual representation of the data structure and of the dynamics of the algorithms to be analyzed can be a valuable aid.

This thesis pursues two goals. First, using the example of a relatively recent and lesser-known data structure, priority search pennants, we show how visualizations can support the analysis of algorithms for more complex geometric search queries on sets of points in the plane. Priority search pennants resemble the better-known priority search trees and are interesting in particular because the heap order that is otherwise customary in a priority queue holds only in a weakened form.
As a consequence, this structure is easier to handle when the set of managed elements is dynamic, i.e. when points are inserted and deleted and the underlying tree structure must be rebalanced. However, the data structure supports certain range queries on the point set with a worse asymptotic running time than priority search trees. It had been unclear so far whether this also holds for other typical kinds of range queries. We analyze the complexity of those queries that determine the rightmost, leftmost, or bottommost point within a given rectangular query range, and show that they have the same asymptotic running time as in priority search trees. In addition, a sharp upper and lower bound on the worst-case search path length is established as a function of the height of the underlying tree.

Applications of range queries include IP router tables, where, for example, for the destination address of an incoming packet, the filter with the most specific address range containing that address must be determined. For an existing approach based on priority search trees, we show that replacing these trees with priority search pennants, or with an even simpler structure, min-augmented range trees, yields a performance gain for both update and lookup operations. Moreover, we show that the router table design we examine contains a redundant data structure which, with a suitable modification, can be omitted, saving about 50% of both the space requirements and the running time of update operations.

The second goal of this thesis is to determine how visualizations can be employed profitably in teaching, and to provide possible solutions. To this end, we first present the results of an empirical study conducted to determine the influence of the learners' degree of interaction with an algorithm visualization on the learning outcome. Earlier studies have given rise to the conjecture that algorithm animations achieve a significant improvement in learning only if the learners do not merely watch them passively but engage with them actively. Our study fits into a research framework established by the ACM Special Interest Group on Computer Science Education (SIGCSE), whose goal is to examine this relationship in detail. Contrary to the hypothesis, our results show no significant differences between learners who merely viewed an animation, those who could interactively choose the next operation to be executed or its input, and those who had to construct the algorithm animation themselves visually from a set of "atomic" building blocks. In contrast, a significant influence of the lecture preceding the test and of the students' overall performance was found. In addition, there are indications that the availability of an option to rewind an animation or to view it step by step may likewise have a significant influence on the learning outcome. Based on our own results and on those of other recent studies, we further conclude that some refinements to the research framework used would be sensible in order to allow more differentiated conclusions in future experiments.

The main reason why algorithm animations, despite instructors' interest, are rarely used in teaching practice is considered to be the time and effort required to find suitable visualization systems, learn how to use them, and prepare appropriate examples. We therefore propose a "radically simple" principle that enables instructors to create visualizations of data structures ad hoc during a presentation and to interact with them. Since lectures are nowadays often given with the help of pen-based electronic input devices, a system was developed that interprets freehand sketches as instances of predefined data structures. The associated operations can be invoked with pen-based commands (gestures), and their execution is shown as an animation. The last part of this thesis presents the architecture of the system and explains its functionality by means of selected examples.

Abstract

Visualizations can be used for the analysis of algorithms and data structures as well as for algorithm teaching and learning. In the latter case, they are usually employed for clarification or as instructive examples. In practice, it is most often the teacher who prepares the visualizations and presents them to the students in a class. If interactive animations or simulations are available, learners can also explore the data structures and algorithms on their own by selecting the input or the next operation to be carried out on a data structure. By seeing how an algorithm reacts to different input sets, a deeper understanding and, hence, better learning is expected. Similarly, visualization can be a helpful research tool for the analysis of complex data structures and the algorithms operating on them. For a formal mathematical analysis, the right approach to the problem is often the most decisive step. A suitable visual representation of the structure and the dynamics of the involved algorithms can provide important clues for the initial idea. This work has two main goals. First, using the example of a relatively recent data structure, the priority search pennant, we show how visualizations can support the analysis of algorithms for complex geometric range queries on sets of points in the two-dimensional plane. Priority search pennants are similar to the better-known priority search trees; the structure is interesting because the rigid heap order known from many implementations of priority queues is weakened. As a result, priority search pennants are easier to maintain than priority search trees when the set of points is dynamic, i.e. when points are inserted and deleted and the underlying tree has to be rebalanced. This advantage comes at a cost: certain range queries have been shown to have a higher asymptotic complexity for priority search pennants than for priority search trees.
However, it has been unclear whether this is also true for other frequently occurring types of range queries. We analyze the complexity of those operations which, for a given rectangular query range, return the leftmost, rightmost, or bottommost point of the set, respectively. It is shown that these operations enjoy the same asymptotic bounds in priority search pennants as they do in priority search trees. In addition, a sharp upper and lower bound for the actual worst-case search path lengths of these queries is established in relation to the height of the underlying tree. As an application for range queries, we consider so-called most-specific range queries in IP router tables, where, e.g., for the destination address of an incoming packet the filter containing the most specific range containing that address must be found. It is shown for an existing router table design based on priority search trees that replacing the tree by a priority search pennant or an even simpler structure, the min-augmented range tree, boosts the performance of update as well as lookup operations considerably. Moreover, we prove that the original router table design contains a redundant structure which can be omitted at no loss of efficiency, thereby reducing the space requirements and the cost of update operations by approximately 50%. The second goal is to investigate how interactive algorithm visualizations can be effectively employed in teaching and learning, and to provide possible solutions. We first present the results of an empirical experiment conducted to evaluate the impact of the level of learner engagement with visualizations on the learning outcome. Early experiments on visualization effectiveness have given rise to the hypothesis that algorithm animations are effective only if they are interactive and engaging.
Our study was carried out within a framework established by the ACM Special Interest Group on Computer Science Education (SIGCSE), whose goal is to investigate the above interrelation. Contrary to the hypothesis, our results showed no significant differences between students who simply viewed algorithm animations, those who could actively choose the input, and those who could even construct the animations visually by assembling the algorithm from “atomic” building blocks. However, a significant influence of the introductory lectures to the topic was found, as well as a strong correlation of test results with the overall performance of the participants in the course. In addition, there is some indication that the possibility to rewind an animation and to watch it in individual steps may have a significant influence as well. Judging from our own results and those of further studies, we also conclude that some refinements to the research framework may be useful in order to allow for more differentiated results in future evaluations. One main reason why many instructors – despite their willingness to use algorithm animations in their teaching – are reluctant to do so seems to be the time and effort required to find suitable visualization systems, to learn how to use them, and to create good examples. We therefore propose a “radically simple” approach; our system allows instructors to sketch an example “on the spot” during a presentation with a standard pen input device. The sketches are then interpreted as instances of a data structure. Commands (issued as pen gestures) allow users to interact with the data structures, e.g. to carry out operations on them and trigger animations of the resulting actions. The last part of the thesis describes the architecture of the system and outlines its properties with the help of selected examples.

Acknowledgments

The work presented in this thesis was carried out during my time as a research assistant at the University of Freiburg. Throughout these years, many people have inspired my work and my life, for which I am very thankful. First of all, I would like to thank my advisor, Thomas Ottmann, for the great freedom and opportunities he allowed me regarding all the different aspects of my research, for many fruitful discussions, and, not least, for suggesting a thesis title that unifies three rather distinct topics within a single phrase. I also wish to thank Amitava Datta for reviewing my thesis. I am very happy to have had such a supportive co-advisor, and I am especially grateful for his willingness to squeeze all the work involved in the review and the defense into his short trip to Germany. I gratefully acknowledge the support of parts of my work through a grant from the German Research Foundation (DFG) within its interdisciplinary program “Netzbasierte Wissenskommunikation in Gruppen”. During my work, I have had a lot of support from my current or former colleagues in our research group. I would like to thank Frank Dal-Ri, Christoph Hermann, Wolfgang Hürst, Christine Kupich, Jochen Lienhard, Khaireel Mohamed, Rainer Müller, Elisabeth Patschke, Robin Pomplun, Stephan Trahasch, and Martina Welte for all their advice and for the great time. I have also had the opportunity to advise or co-advise several student projects and diploma theses. Many of these have provided fresh impetus to my own work. I am particularly grateful to Robert Adelmann, Bettina Bär, Tobias Bischoff, Regina Brugger, and Sandra Busl. Daniel Lundberg and Karina Marx have proofread substantial parts of this thesis. I am very grateful for their patience and their round-the-clock availability during the last days of writing. Naturally, all remaining errors in the script are solely my responsibility. Life would not have been half as interesting if it hadn’t been for the people around me. 
I would like to thank all those who share my enjoyment in asking and trying to answer strange questions, in particular Daniel Lundberg, Frank Neugebauer, Jens Schmitz, Lorenz Bockisch, Claudia Füßler, Frank Zimmermann, Martina Boos, Anne Schlicht, and Ulrich Ruh. Most of all, I wish to thank Karina, who has always been there, and my family for their constant support.

To my parents

Contents

1 Introduction
1.1 Background and motivation
1.2 Overview of the thesis
1.3 Research contribution and publications
1.4 Notation

2 Preliminaries
2.1 Data structures and algorithms
2.2 Visualization of algorithms and data structures
2.2.1 Introduction
2.2.2 Algorithm animation
2.2.3 Coupling of animation and algorithm
2.2.4 JEDAS: a Java library for algorithm animation

Part I Priority Search Queues and Their Application to IP Packet Classification

3 Priority Search Queues
3.1 Introduction
3.2 Operations defined for priority search queues
3.2.1 Dictionary operations
3.2.2 Priority queue operations
3.2.3 Range queries
3.2.4 Further operations
3.3 Priority Search Trees
3.3.1 The data structure
3.3.2 Construction of priority search trees
3.3.3 Update operations
3.3.4 Rebalancing priority search trees
3.3.5 Complexity of PSQ operations in PSTs
3.3.6 Related structures

4 Priority Search Pennants
4.1 Relaxing the heap condition
4.2 Structure of priority search pennants
4.3 A construction method for priority search pennants
4.4 Structural properties
4.4.1 Canonical decomposition
4.4.2 Subtree property
4.5 Update operations
4.5.1 Insertion
4.5.2 Deletion
4.6 Balancing priority search pennants
4.7 Priority queue operations
4.8 South-grounded range queries
4.8.1 The operation enumerateRectangle
4.8.2 Interactive visualizations as an aid for data structure analysis
4.8.3 The operation minXinRectangle
4.8.4 The operation maxXinRectangle
4.8.5 The operation minYinXRange
4.9 Comparison with priority search trees
4.9.1 Worst-case complexities
4.9.2 Average-case behavior
4.9.3 Space requirements
4.10 Priority search trees and priority search pennants – an alternative view
4.10.1 A geometric visualization
4.10.2 Construction algorithm for PST
4.10.3 Construction algorithm for PSP

5 An application for priority search queues
5.1 Introduction: IP packet classification
5.1.1 Problem specification
5.1.2 Approaches to packet classification
5.2 Geometric interpretation
5.3 A data structure based on priority search trees
5.3.1 Detection of conflicts
5.3.2 Summary
5.4 Min-augmented range trees
5.4.1 Definition of the structure
5.4.2 Range queries
5.4.3 Balancing min-augmented range trees
5.5 Comparison of the data structures
5.5.1 Theoretical bounds
5.5.2 Simulation results for average cases
5.6 Improvement of conflict detection
5.7 Experimental results
5.7.1 Prefix ranges
5.7.2 Nonintersecting ranges
5.8 Conclusion and future work

Part II Effectiveness of Algorithm Visualization for Learning

6 Learner engagement with algorithm visualizations
6.1 Introduction
6.2 A research framework for empirical evaluation
6.2.1 The engagement taxonomy
6.2.2 Hypotheses regarding learner engagement
6.2.3 Methodology
6.3 First evaluations within the framework

7 An empirical study on the influence of the engagement level on the learning outcome
7.1 Introduction
7.2 Experiment
7.2.1 Test design
7.2.2 Participants
7.2.3 Contents and learning objectives
7.2.4 Interactive visualization
7.2.5 Preparatory materials
7.2.6 Procedure
7.3 Results
7.3.1 Data analysis
7.3.2 Influence of preparatory materials
7.3.3 Levels of engagement
7.3.4 Differences with respect to other variables
7.4 Additional findings
7.4.1 Importance of a rewind function and intermediate steps
7.4.2 Learning styles
7.5 Conclusions

8 Visualization effectiveness research: an overview
8.1 Further empirical studies
8.1.1 VIEWING vs. CONSTRUCTING
8.1.2 VIEWING vs. NO VIEWING
8.1.3 VIEWING vs. RESPONDING
8.1.4 Representational aspects of animations
8.1.5 Visualizations as programming aids
8.2 Summary and future directions
8.2.1 Conclusions
8.2.2 A refined taxonomy

Part III Supporting Rapid Creation of Interactive Algorithm Visualizations

9 A system for “on the fly” generation of interactive algorithm visualizations
9.1 Goals and general principle
9.2 Related work
9.2.1 Structure recognition
9.2.2 Interactive data structure animation
9.2.3 Creation of animations by pen sketches
9.2.4 Collaborative modeling tools
9.3 System architecture
9.3.1 Client-server architecture
9.3.2 Generic framework and domain-specific modules
9.3.3 Division into layers of specificity
9.3.4 Information streams
9.3.5 The framework structure
9.4 Illustration by example
9.4.1 Object stream
9.4.2 Command stream
9.4.3 Action stream
9.4.4 Advantages
9.5 Creation of animations on the client
9.6 Basic recognition service for data structures
9.7 Implemented modules
9.7.1 Linear list module
9.7.2 modules
9.7.3 Petri-net module
9.7.4 CONNECT4 module
9.7.5 Module management
9.8 Summary and outlook

10 Conclusions and future work

References

Appendix A Proofs and Algorithms for Part I
A.1 Proof of Corollary 4.5
A.2 Range query algorithms
A.2.1 Improved iterative implementation of minXinRectangle in a PSP
A.2.2 Iterative algorithm for minYinXRange in a PSP
A.2.3 Algorithms for minXinRectangle in a PST
A.3 Examples for minXinRectangle queries in PSP and PST
A.3.1 Worst-case example of minXinRectangle in a PSP
A.3.2 Example of minXinRectangle in a PST
A.4 Insertion of ranges in conflict-free sets

Appendix B Materials used in the experiment
B.1 Introductory survey
B.2 Post-test
B.3 Final questionnaire

Chapter 1

Introduction

1.1 Background and motivation

Algorithms and data structures are at the heart of computer science. Despite being one of the oldest fields in the discipline, and notwithstanding the ever-increasing computing power and processing speed, the search for efficient algorithms has lost none of its topicality. On the contrary, in addition to the need for time and space efficiency, today’s trend towards small and mobile devices, for instance, brings along a new demand for energy-efficient algorithms in order to maximize the lifespan of battery-powered devices. Obviously, the study of algorithms and data structures continues to be fundamental for virtually every part of computer science.

The importance of algorithms and data structures is also reflected in computer science education; despite the growing specialization in the discipline, first-year introductory courses on algorithms and data structures can be found in virtually every computer science curriculum around the world. Since computing nowadays is not only the subject of computer science education but also used as an instrument for teaching, e.g. as a medium for presentation, the dynamics of an algorithm can easily be visualized by animations. Moreover, interactive simulations enable students to explore the algorithms and data structures on their own, allowing for alternative ways of learning.

It has long been assumed that visualizations are valuable tools for understanding and analyzing algorithms. Indeed, not only textbooks for students but also the majority of scientific papers on data structures include some form of visual representation to better explain relevant points. In fact, the names of many well-known data structures (such as trees or stacks) originate from specific visual representations. Most people would probably agree that a suitable visualization is helpful for learning and understanding a complex topic.
Interestingly, early empirical evaluations of the effectiveness of algorithm visualization on learning have cast considerable doubt on this apparent truism [58]. In the light of mixed results of further studies, a research framework was established, including a set of hypotheses to be tested in future evaluations [87]. In short, those hypotheses state that visualizations must be interactive and engaging in order to be effective.

While, according to a survey, most computer science instructors are willing to use algorithm visualizations in their courses, very few actually do so [87]. The main reason for this was found to be the time necessary to find and learn to use suitable visualization tools and to create visualization examples. Apparently, as long as actual programming is involved in creating visualizations, teachers perceive the required investment of time as too high. On the other hand, common easy-to-use graphics and animation editors (e.g. [125, 127]) are insufficient, as they only create movie-style animations and do not support the creation of interactive examples. Hence, there is a need for the rapid and effortless creation of algorithm visualizations, both for class presentations and for learners to interact with.

The goal of this work is threefold. In addition to addressing the above need by providing a new framework and system design for the generation of interactive visualizations “on the spot” at virtually no cost, we contribute to the empirical research on the effectiveness of algorithm visualization for learning. Thirdly, we show by an example how data structure visualizations can contribute to the formal analysis of algorithms in the area of priority search queues. Even though this topic has been well-studied, refreshingly new and elegant structures (e.g. [48]) as well as applications [73] have turned up in recent years, and this work offers contributions to both.

1.2 Overview of the thesis

Following this introduction and the preliminaries outlined in the next chapter, this thesis consists of three main parts which can be read independently of each other. Figure 1.1 is a visualization of the structure.

1. Introduction

2. Preliminaries

Part I: Priority search queues and applications for IP packet classification (Chapters 3 to 5)
Part II: Learner engagement with algorithm visualizations (Chapters 6 to 8)
Part III: Supporting rapid creation of interactive algorithm visualizations (Chapter 9)

10. Conclusions

Figure 1.1: Overview of this thesis.

The first and theoretical part employs visualizations to explore a relatively recent data structure called priority search pennant, which is an implementation of priority search queues. After an introduction to this abstract data type, the supported operations, and the best-known classical implementation, Chapter 4 gives a detailed description of priority search pennants and provides an in-depth analysis of the involved algorithms. Employing interactive visualizations as a tool for exploration helps to find a starting point for our formal analysis of certain range queries whose complexities for priority search pennants had not been known so far. Chapter 5 provides an example for the application of this data structure in the context of the IP lookup and packet classification problem. The second part describes our empirical research on the effectiveness of algorithm visualization on learning. After an overview of the state of the art in this field, Chapter 7 presents the findings of an empirical study conducted in an introductory course on algorithms and data structures, where the effects of different levels of learner engagement with algorithm visualizations on learning were examined. The results are summarized and put in relation to those of other studies in Chapter 8. The third part presents a framework and prototypical implementation of a system for easy and rapid generation of interactive visualizations in lecture presentations or collaborative learning sessions. Its main goal is the effortless creation of and interaction with data structure visualizations. The system supports recognition of data structures from freehand examples sketched with an electronic stylus on a digital whiteboard and the interaction with those data structures by pen gestures.

1.3 Research contribution and publications

This section gives an overview of the new contributions of the present work. Some of the research results have been published or accepted for publication as conference papers or journal articles. Only the key publications are listed here; all others will be referenced later in the respective chapters. On the theoretical side, the contributions include a detailed complexity analysis of priority search pennants [48] for so-called south-grounded range queries on sets of points in the two-dimensional plane. We show that for those operations that do not enumerate all points of a given query range but only return one extremal value (the leftmost, rightmost, or bottommost point in a given rectangle), priority search pennants offer the same asymptotic complexity, O(log n), as the better-known and more complex structure, priority search trees. Since priority search pennants are conceptually simpler, easier to maintain under insertions and deletions, and require less space, they are suggested as a replacement for priority search trees in all those practical applications where updates are important and range queries enumerating all points are not required or less important. For the specific example of an existing IP router table design [73], it is shown that this replacement boosts the performance considerably, both for update and for lookup operations. In addition, we outline how to further improve that router table design for nonintersecting ranges, independently of the particular data structure, resulting in a reduction of space requirements and update costs by approximately half. Our theoretical proofs of the asymptotic worst-case complexities are backed up by simulation results and experimental tests for average cases. The key parts of this research, conducted in collaboration with Amitava Datta and Thomas Ottmann, have been accepted for publication in the International Journal of Foundations of Computer Science [70].
The empirical part of the thesis contributes to the research on the effectiveness of algorithm visualization for learning. Our evaluation, carried out within an established research framework [87], shows that a higher level of engagement with visualizations does not necessarily lead to improved learning but that other variables, such as preparatory materials, are likely to be more important. In addition, the results indicate that the role of certain navigational features for interaction with visualizations may be underestimated, and they reveal possible weaknesses of the research framework. Refinements to that framework are proposed which take into account our own results and those of other recent studies. The system used for the evaluation and the results of the empirical study have been published in two papers, which were presented at the 2005 and 2006 ACM Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE), respectively [62, 65]. The system developed and described in Chapter 9 includes both theoretical and practical contributions. The proposed architecture provides a framework for the combination of sketch recognition and interactive animation that is not restricted to the domain of algorithms and data structures. While the system itself builds on existing recognition and animation technologies, it is novel in the way the separate parts are combined by the proposed architecture. Its prototypical implementation allows the easy integration of new (data) structures, alternative animation systems, or advanced sketch and structure recognition techniques. The research and development in this part was carried out in collaboration with Robert Adelmann and Tobias Bischoff. The results have not been published so far, but the system and its documentation are freely available [2, 13]. In its entirety, this thesis can also be regarded as an example of the variety of research methods applied in modern computer science.
Besides rigorous mathematical proof for the analysis of algorithms and data structures, extensive simulations are run in order to assess the average behavior of algorithms, and experimental tests are conducted to compare them in actual applications. Systems design employs strategies such as prototyping, top-down and bottom-up approaches, design patterns, heuristic evaluation, and many more, while the research on human factors of computing, notably in computer-aided teaching and learning, requires empirical testing methods established in the social sciences, together with statistical methods of analysis.

1.4 Notation

Throughout this thesis, the following symbols and abbreviations will be used:

Symbol   Description
PSQ      priority search queue
PST      priority search tree
PSP      priority search pennant
MART     min-augmented range tree
N        (i) a node in a tree structure (in Part I);
         (ii) size of a sample in a statistical test (in Part II)
xN       the x-coordinate of the point stored in a tree node N,
         also known as the key of node N
yN       the y-coordinate of the point stored in a tree node N,
         also known as the priority of node N
sN       the split (or: router) value stored in a tree node N
r        a range or interval [u, v] of integers, where u ≤ v
         (including both u and v)
R        a set of ranges
p        maximum probability of a Type-I error in a significance test
         (if p < 0.05, statistical significance is assumed)


Chapter 2

Preliminaries

Two of the main concepts occurring throughout this work are data structures and visualizations. The purpose of this chapter is to clarify these notions and define how we use them in the remaining chapters.

2.1 Data structures and algorithms

Following Aho et al., we define a data structure as “a collection of variables, possibly of several different data types connected in various ways” [4]. The main purpose of a data structure is to store and manage information efficiently. In order to be usable for specific tasks, a data structure must support certain operations. These operations are usually specified by an abstract data type. An abstract data type is a set of abstract items together with a description of the operations defined on the set. Typically, an abstract data type is defined by the requirements of a certain information management problem. Let us consider a simple example:

Example 2.1: A dictionary (or: finite map) is a set of elements supporting the insertion, lookup, and deletion of an element. Each element has a unique key, by which it can be identified. An everyday example of a dictionary would be a personal address book, with the name of each contact serving as key. Note that a dictionary is not a data structure, since it does not specify how the data are managed. It only dictates the operations that a data structure must support in order to qualify as a dictionary.

Definition 2.2: If a data structure D supports the operations defined by an abstract data type A, we say that D implements A.


For example, a dictionary could be implemented by a linear array, a singly-linked linear list, a hash table, or a balanced search tree (cf. [22, 94]). As a second example of an abstract data type, let us consider priority queues:

Example 2.3: A priority queue is a set of elements, each of which is assigned a priority value. In its simplest form, a priority queue supports the insertion of an element together with its priority as well as the access to and deletion of an element with highest priority (note that several elements can have the same priority). An example (albeit a somewhat contrived one) would be a to-do list, where any task can enter the list but only the most urgent one can be accessed (in order to start it) and removed (after it has been finished). Note that looking up an element is not required in this data type. However, it may be useful to delete an arbitrary element (if it has become dispensable) or change its priority, i.e. to make an element more important. Priorities are usually given by numbers, with a lower value representing a higher priority. Therefore the operation to assign a higher priority to an element is often called decrease or decreasekey (here the key of an element means its priority). Just like dictionaries, priority queues can be implemented by a variety of data structures, for instance binary heaps, binomial queues, Fibonacci heaps, etc. [94].
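The to-do list example can be illustrated with java.util.PriorityQueue, which supports insertion and extractmin but, like most standard priority queues, neither efficient lookup nor decrease. The class and task names below are ours, for illustration only:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class TodoList {
    // Each task has a name and a priority; a lower value means "more urgent".
    record Task(String name, int priority) {}

    // extractmin: remove and return the name of the most urgent task.
    public static String nextTask(PriorityQueue<Task> pq) {
        Task t = pq.poll();
        return t == null ? null : t.name();
    }

    public static void main(String[] args) {
        PriorityQueue<Task> pq =
            new PriorityQueue<>(Comparator.comparingInt(Task::priority));
        pq.offer(new Task("write thesis", 1));   // insert
        pq.offer(new Task("water plants", 5));
        pq.offer(new Task("reply to mail", 3));
        System.out.println(pq.peek().name());    // accessmin: "write thesis"
        System.out.println(nextTask(pq));        // extractmin: "write thesis"
        System.out.println(nextTask(pq));        // then "reply to mail"
    }
}
```

Note that deleting an arbitrary task or decreasing its priority would require a linear scan here, which is exactly the limitation that priority search queues (Chapter 3) remove.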

Different implementations of an abstract data type usually differ in how efficiently they support the respective operations, where efficiency is usually measured in terms of running time or memory requirements and where asymptotic and actual time and space complexity are distinguished [22]. Usually, the goal is to find data structures for which the required operations can be carried out as efficiently as possible. In many cases, however, the efficient implementation of one operation comes at the expense of the running time of another operation. To use our simple introductory example, if a dictionary is implemented by a linear list and a new element is inserted by simply appending it at the end of the list, the insertion operation is very efficient; however, lookup of an element becomes inefficient, since in the worst case, all elements will have to be examined. Hence, the choice of a data structure also depends on the application scenario and the operations required there.

Each operation carried out on a data structure can be described by an algorithm, i.e. a sequence of computational steps; conversely, an algorithm, in order to be efficient, requires a suitable data structure to operate on. Hence, data structures and algorithms go hand in hand and can be regarded as two sides of the same coin – neither one exists without the other [94]. This should also be kept in mind when talking about the visualization of algorithms and data structures; visualizing an algorithm implicitly includes the visualization of the data structure(s) that the algorithm operates on.
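The trade-off in the list-based dictionary can be made concrete with a small sketch (our own illustration, not a data structure used later in this thesis): insertion is O(1) because the new entry is simply appended, while lookup is O(n) because a linear scan may have to examine every entry.

```java
import java.util.LinkedList;

public class ListDictionary {
    // A key-value entry; the key serves as the unique identifier.
    record Entry(String key, String value) {}

    private final LinkedList<Entry> entries = new LinkedList<>();

    // insert: O(1) -- simply append the new entry at the end of the list
    public void insert(String key, String value) {
        entries.add(new Entry(key, value));
    }

    // lookup: O(n) worst case -- linear scan through all entries
    public String lookup(String key) {
        for (Entry e : entries)
            if (e.key().equals(key)) return e.value();
        return null;   // key not contained
    }
}
```

A balanced search tree or a hash table would rebalance this trade-off, making lookup fast at the cost of more work (or more memory) on insertion.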

2.2 Visualization of algorithms and data structures

This section defines our use of the term visualization in the specific context of algorithms and data structures. For a more detailed introduction to visualization in general, we refer to the excellent overview given by Bröcker [16].

2.2.1 Introduction

Visualization has been defined as “the binding (or mapping) of data to a representation that can be perceived” [38]. Note that in this broad sense, visualization also includes audible, tactile, or other representations that can be sensed by humans, whereas in practice the term is most often reduced to purely visual illustration. In computer science, visualization is considered “a method of computing [which] transforms the symbolic to the geometric, enabling the researchers to observe their simulations or computations” [77]. This definition mainly concerns the purposes of visualization for scientific research. In computer science education, visualization is most often used to explain concepts, in particular algorithms. Here, the main purpose of visualization is “to further the understanding of algorithms and data structures inductively, based on observations of an algorithm in operation” [59].

The history of algorithm visualization dates back to at least the early 1980s, when the first animated film on sorting algorithms was released [11]. Since that time, a vast number of visualization systems have been developed and used in computer science education, e.g. [18, 102, 12, 15, 63, 96, 98, 104, 111, 51, 89, 60, 24]. It would be beyond the scope of this thesis to give a detailed description of the development of algorithm visualization or to compare different visualization systems. Instead, we will point out what is relevant for our own work and refer interested readers to the many overviews provided, for instance, in the works of Bröcker [16], Faltin [36], Korhonen [59], Müller [83] and Rößling [103]. The theoretical foundations and many practical examples of algorithm visualization have been compiled in two independent anthologies, both titled Software Visualization [28, 113].

2.2.2 Algorithm animation

Stasko defines algorithm animation as “the process of abstracting the data, operations, and semantics of computer programs, and then creating dynamical graphical views of those abstractions” [112]. In particular, an animation portrays the execution of an algorithm rather than its description. Note, however, that since algorithms operate on data structures, an algorithm animation is almost always at the same time a visualization of a data structure. Visualizing an algorithm does not necessarily involve animation. Series of static images of the state before and after an operation are also common. However, most algorithm visualization systems support the creation of smooth transitions between two consecutive states. We will therefore often use the words animation and visualization interchangeably in the remainder of this work, even though, strictly speaking, the latter is more general than the former.

Note that smooth animations in most cases do not reflect “algorithmic reality”, since algorithms consist of discrete steps. For instance, when two elements x and y are swapped in a program, this is usually done in three discrete operations with the help of a temporary variable:

temp = x;
x = y;
y = temp;

Most visualization systems would represent an exchange of two elements in a sorting algorithm by one smooth animation in which the two graphical objects representing the elements trade places, as shown in Figure 2.1. This is a typical example of abstraction; in this case the swap operation is simplified and represented by a metaphor that is more “tangible” for humans. The temporary variable temp is not visualized at all, and what in algorithmic terms would be three reassignments of values to variables is shown as two synchronous motions. The main reason for using smooth transitions rather than discrete steps is to maintain visual coherence between successive states of an algorithm [68]. It is easier for users to keep track of an object if they can follow its path. Note, however, that this does not necessarily mean that learning with smooth animations will be better than with static images. In fact, a study by Awan and Stevens found no significant difference in procedural learning and knowledge transfer between static and animated pictures, and even found a better retention rate of static information (such as object names) when static images rather than animations are used [10].

Figure 2.1: Animated visualization of a sorting algorithm.

2.2.3 Coupling of animation and algorithm

The connection between an algorithm and its visualization can be realized in different ways. Of course, an animation can be created completely independently of any actual implementation of the algorithm to be visualized. This would be a studio-like approach, comparable to the production of traditional animated films. While this method allows complete freedom in how the resulting visualization looks, it comes with two major disadvantages. First, it is a very time-consuming process to create such an animation. Second, the resulting movie visualizes one specific example. If the algorithm is to be visualized for different input data, a completely new movie has to be produced. This problem is avoided by creating an animation directly from the executed algorithm. Two major approaches can be distinguished.

The declarative method associates variables or other entities in the algorithm (e.g. array positions) with graphical objects in the animation display once in the beginning [28]. Rather than being included in the algorithm, the animation commands are part of these objects, which react to changes of the variables and update themselves accordingly. An advantage of this approach is that the original code of the algorithm stays relatively clean of any animation-related parts and hence remains readable. The main disadvantage is that it is hard to control the visualization explicitly. For instance, the above example of a swap operation in a sorting algorithm (cf. Figure 2.1) will be rather difficult to implement if each array position is associated with one object that is updated independently from all the others.

A far more popular way to connect the algorithm with the visualization is by so-called interesting events [19]. In this approach, the source code is augmented with explicit animation commands inserted at certain interesting points in the algorithm. This approach provides maximum control of the animation.
To use our above example, the whole swap operation can be treated as one interesting event rather than three independent variable updates. Interesting events can either be used to trigger the animations directly or to encode and collect them as a list of animation commands (in some specified format) to be executed later. The former case is also referred to as live animation, whereas the latter is called post mortem animation, to denote that the actual visualization takes place after the algorithm has finished [29].

An advantage of post mortem animation is the independence of the visualization from the underlying algorithm during playback. This makes it very easy to support, for instance, backward navigation in the animation, since the complete animation is given as a linear movie. The major drawback is that interaction with the algorithm during runtime is impossible. For instance, if a sequence of insertions is carried out on a binary tree, the complete input has to be specified at the beginning of the algorithm. In a live animation, on the other hand, an algorithm can stop after the visualization of each operation and wait for the user’s next input. Hence, users can interact much more freely with the algorithm or data structure. This kind of interactive visualization is also referred to as algorithm simulation [59]. It is obvious that going backwards in an interactive simulation is not easy to implement, as the algorithms are usually unidirectional and often the steps cannot simply be reversed or undone. Hence, in the live animation approach the algorithm programmer (rather than the animation system) has to take care of this problem, for instance by saving the current state of the data structure to be restored later if the user wants to undo an action, or by using a persistent version of the data structure [31].
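A minimal sketch of the interesting-events idea (the event-emitting API here is purely illustrative, not the interface of any particular animation system): the sorting code emits one "swap" event per exchange, which a live system would animate immediately and a post mortem system would record for later playback.

```java
import java.util.ArrayList;
import java.util.List;

public class BubbleSortAnimated {
    // Collected event strings; a real animation system would instead
    // trigger a transition (live) or record it (post mortem).
    static final List<String> events = new ArrayList<>();

    // The "interesting event": one call per swap, not one per assignment.
    static void swapEvent(int i, int j) {
        events.add("swap(" + i + "," + j + ")");
    }

    public static void sort(int[] a) {
        for (int n = a.length; n > 1; n--)
            for (int i = 0; i + 1 < n; i++)
                if (a[i] > a[i + 1]) {
                    // the three discrete assignments of the swap ...
                    int temp = a[i]; a[i] = a[i + 1]; a[i + 1] = temp;
                    // ... are reported as a single interesting event
                    swapEvent(i, i + 1);
                }
    }
}
```

Replaying the collected list after sort has returned corresponds to post mortem animation; calling an animator inside swapEvent instead corresponds to live animation.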
Using interactive simulations, learners can actively explore algorithms and data structures by choosing their own input and seeing how an algorithm reacts to it. Moreover, simulations can be used for the more constructivist approach to learning described by Faltin: if the algorithm and its visualization are decomposed into small functions or steps which can be carried out individually, students can discover (or construct) the algorithm visually by combining the correct steps in the right sequence [36]. The interactive visualization serves as a user interface to these functions. Note that such an approach allows students to make mistakes. It can therefore also be used for exercises and assessment, as described, for instance, by Korhonen [59] and Trahasch [116]. When used for explorative learning, it is important to provide students with feedback about the correctness of their actions [8]. Such interactive construction tasks play an important role in our research on the impact of learner engagement on the effectiveness of algorithm visualization, which is described in Part II.

2.2.4 JEDAS: a Java library for algorithm animation

The example in Figure 2.1 and all other visualizations developed in the context of this work were created using the Java Educational Animation System (JEDAS), a powerful and freely available Java animation library [128]. It was originally developed by Müller [83] and has been extended in many respects in the course of our work [67, 114, 62]. During the same period, a great number of visualization examples for various algorithms and data structures were developed, most of them as parts of student projects [66, 54, 74, 20, 93].

In addition to a live coupling of algorithm and visualization through interesting events, JEDAS supports post mortem animations through a recording feature. It also makes use of a useful concept introduced by Stasko: the path-transition paradigm for creating smooth animated sequences [112]. A transition is uniquely defined by a graphical object, a path (including the states before and after the transition, a function for interpolation, as well as its duration), and a transition type specifying which attribute (position, size, color, etc.) of the object is modified. In addition, JEDAS supports the graphical and textual annotation of running animations. In Figure 2.1, the user has drawn two arrows and typed in some text next to the animated content. The animation together with the annotations can be recorded for post mortem replay. These two features were introduced in order to allow instructors to use JEDAS animations in conjunction with presentation recording systems for capturing live presentations using the Authoring on the Fly (AOF) approach [84]. Through a communication interface between JEDAS and the AOF system, interactive animations presented and recorded in a live lecture can automatically be integrated in and synchronized with the recorded slide presentation and audio narration [25].
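The core of the path-transition paradigm can be sketched as follows (a simplified illustration of the idea, not the actual JEDAS API): a transition maps normalized elapsed time to an interpolated value of one attribute of a graphical object, here its position along a straight path.

```java
public class MoveTransition {
    final double x0, y0, x1, y1;   // object position before and after

    public MoveTransition(double x0, double y0, double x1, double y1) {
        this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
    }

    // Linear interpolation along a straight path; t in [0, 1] is the
    // normalized elapsed time of the transition. Other interpolation
    // functions (e.g. ease-in/ease-out) or paths could be substituted.
    public double[] positionAt(double t) {
        return new double[] { x0 + t * (x1 - x0), y0 + t * (y1 - y0) };
    }
}
```

An animation engine would sample positionAt at the display frame rate between the start and end of the transition, producing the smooth motion described above; transitions for size or color work analogously on those attributes.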
In addition, such recorded animations can be exported as Scalable Vector Graphics (SVG) [133] and embedded in web pages [67]. JEDAS also forms the technical basis of the MA&DA system for learner assessment through interactive construction tasks [62], which was used in our evaluation described in Chapter 7. We will argue in the following chapters that interactive visualizations are not only useful in education but can also provide important clues for the analysis of algorithms and data structures. These clues can then be used as a starting point for formal analysis and rigorous proof.

Part I

Priority Search Queues and Their Application to IP Packet Classification

Chapter 3

Priority Search Queues

This chapter provides an introduction to the abstract data type priority search queue and its best-known implementation, the priority search tree [78], which can be considered preliminaries for the discussion of an alternative data structure, the priority search pennant [48], in Chapter 4.

3.1 Introduction

The abstract data type priority search queue (PSQ) unites the properties of a dictionary and a priority queue (cf. Chapter 2.1). To achieve this, each element of the data type contains a pair of entities: a key and a priority. The key is the unique identifier used for looking up elements, while the priority is assigned to support the priority queue operations. The key and the priority must each come from a totally ordered set, but they do not necessarily have to be of the same type. In many practical applications, however, the keys and priorities are interpreted as x- and y-coordinates of points in the plane, as PSQs are often used for problems in computational geometry (cf. [78, 94]). Following this convention, in the remainder of this work we will usually denote keys with x and priorities with y and sometimes treat them as if they were both from the same set. However, we stress that this is not essential for our considerations and involves no loss of generality.

The key x serves as the identifier of an element and therefore must be unique; i.e. no two elements in a PSQ may have the same key. No such restriction is imposed on the priorities. While the uniqueness of keys may sound like a major restriction, especially if points in the plane with the same x-coordinates are to be maintained, it is quite easy to see that we can always achieve this property by an appropriate transformation. One option is to transform each pair (x, y) into the pair ((x, y), y) and define the following order for the new keys:

(x1, y1) < (x2, y2) :⇔ x1 < x2 ∨ (x1 = x2 ∧ y1 > y2)

In this transformation, the y-value is used as a tie-breaker if the x-values are the same; more precisely, a larger y-value will result in a smaller key. This decision is purely arbitrary; we could just as well have defined the order such that a larger y-value produces a larger key. Of course, if the keys of two elements are equal, the priorities must be different in order to distinguish them. It is not possible to store the exact same element (x, y) twice in a PSQ.

In practical applications, both the x- and y-values are often integers from the same finite set. Hence, there is an integer M such that for each point P the coordinates xP, yP < M. In these cases the following alternative transformation, as proposed in [73], can be used:

(x, y) ↦ (x · M − y, y)

This transformation involves a dilation of the x-range by a factor of M and will result in the same overall order of points as the one above:

(1) If x1 < x2: x1·M − y1 ≤ x1·M ≤ x2·M − M < x2·M − y2

(2) If x1 = x2: x1·M − y1 = x2·M − y1 < x2·M − y2 iff y1 > y2

Since one of the above transformations can always be applied to the elements, we will, in the remainder of this work, assume that the elements to be maintained in a PSQ have unique keys, i.e. that they have already undergone the respective transformation, if necessary.
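The transformation from [73] is a one-liner; the sketch below (class and method names are ours) makes the two order-preservation cases above directly checkable:

```java
public class KeyTransform {
    // Map (x, y) to x*M - y, assuming 0 <= x, y < M for all points.
    // A larger y yields a smaller transformed key, so the y-value acts
    // as a tie-breaker for points with equal x-coordinate.
    public static long transform(long x, long y, long M) {
        return x * M - y;
    }
}
```

Case (1): for x1 < x2 the transformed keys keep their order regardless of the y-values; case (2): for equal x, the point with the larger y gets the smaller key, exactly as in the pair-based transformation.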

3.2 Operations defined for priority search queues

PSQs support the standard operations known from dictionaries and priority queues. In addition, there are operations involving both the keys and priorities of elements at the same time, so-called range queries.

3.2.1 Dictionary operations

PSQs support the standard dictionary operations lookup, insert, and delete for the keys:

• lookup(x): finds and returns the element e with key x, if that element is contained in the PSQ
• insert(e): inserts a given element e = (x, y), if an element with key x is not already contained in the PSQ
• delete(x): returns and removes the element with key x from the PSQ, if that element is contained in the PSQ

3.2.2 Priority queue operations

Apart from insert and delete, which are essentially the same as in dictionaries, the most important priority queue operations are extractmin and decrease.

• accessmin(): returns an element e with minimum priority y, if the PSQ is not empty

• extractmin(): returns and removes an element e with minimum priority y from the PSQ, if the PSQ is not empty

• decrease(x, ynew): decreases the priority y of the element e with key x to ynew, if that element is contained in the PSQ and if ynew < y

The important difference from the well-known priority queues implemented, for instance, as binary heaps, Fibonacci heaps [39] or binomial queues [119] is that PSQs support efficient lookup of elements. Hence, the delete and decrease operations do not require direct access to the respective element in the structure but merely require the key as an input.1

3.2.3 Range queries

In addition to the dictionary and priority queue operations, PSQs allow for queries that involve both the x and y values simultaneously. If (x, y) pairs are interpreted as points in the plane, these types of search are called south-grounded (or: three-sided) range queries. Figure 3.1 is a visualization of a set of points and a south-grounded query rectangle (only xleft, xright, ytop must be specified; ybottom, i.e. the lower boundary is given implicitly by the x-axis or, in general, by the lowest possible value of the y-coordinates).

• enumerateRectangle(xleft, xright, ytop): returns all points in the rectangular area bounded by the given parameters and the x-axis, if any such points exist

• minXinRectangle(xleft, xright, ytop): returns the leftmost point in the rectangular area bounded by the given parameters and the x-axis, if such a point exists

• maxXinRectangle(xleft, xright, ytop): returns the rightmost point in the rectangular area bounded by the given parameters and the x-axis, if such a point exists

• minYinXRange(xleft, xright): returns a bottommost point in the given x-range, if such a point exists

In the above example (see Figure 3.1), enumerateRectangle would return all six points contained in the dashed rectangle, while minXinRectangle would only return the leftmost of these points. The query minYinXRange considers a semi-infinite rectangle that is open at the top and in our example would return the bottommost point in the rectangle. Note that an operation maxYinXRange is not included among these range queries. In fact, the y-dimension can be regarded as a “minor” dimension in the context of priority search queues. Recall that a PSQ is only a priority queue for the y-coordinates, not a sorted list; hence, it is easy to find the smallest y-values, but it can be complex to determine the largest one.
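The semantics of the south-grounded queries can be pinned down with a naive reference implementation on a plain point list, answering each query in O(n) time (our own illustration; the point of the PST and PSP structures in the following chapters is to answer these queries in O(log n), plus output size for enumerateRectangle):

```java
import java.util.ArrayList;
import java.util.List;

public class NaivePSQ {
    record Point(int x, int y) {}

    private final List<Point> points = new ArrayList<>();

    public void insert(int x, int y) { points.add(new Point(x, y)); }

    // enumerateRectangle: all points with xleft <= x <= xright and y <= ytop
    public List<Point> enumerateRectangle(int xleft, int xright, int ytop) {
        List<Point> result = new ArrayList<>();
        for (Point p : points)
            if (xleft <= p.x() && p.x() <= xright && p.y() <= ytop)
                result.add(p);
        return result;
    }

    // minXinRectangle: leftmost point in the south-grounded rectangle, or null
    public Point minXinRectangle(int xleft, int xright, int ytop) {
        Point best = null;
        for (Point p : enumerateRectangle(xleft, xright, ytop))
            if (best == null || p.x() < best.x()) best = p;
        return best;
    }
}
```

maxXinRectangle and minYinXRange follow the same pattern, selecting the maximum x or the minimum y among the candidates instead.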

1 In standard priority queues, the implementation of decrease usually requires a direct reference to the element e rather than only to its key x. This is because efficient lookup of an element is not supported. Since for e = (x, y), decrease(e, ynew) is the same as decrease(lookup(x), ynew) in a PSQ, we commonly use the above notation decrease(x, ynew).

Figure 3.1: A set of points and a south-grounded query rectangle given by xleft, xright and ytop.

3.2.4 Further operations

In addition to the above operations, there are further methods which may be useful in certain applications:

• isEmpty(): returns true if and only if the structure does not contain any elements.

• size(): returns the number of elements currently stored in the priority search queue.

• split(c): for some applications, it is useful to split a PSQ Q into two PSQs Qleft and Qright according to a given value c, such that all points in Qleft have x-values smaller than or equal to c and all the x-values of points in Qright are greater than c.

• merge(Qleft, Qright): the reverse operation joins two PSQs into one, provided that xp < xq for all p ∈ Qleft and q ∈ Qright.

Up to now, we have only described the abstract data type with its operations but have not discussed how it can be efficiently implemented. The following section describes the best-known implementation technique of priority search queues.

3.3 Priority Search Trees

The most widely known implementation for priority search queues is the priority search tree (PST) introduced by McCreight [78]. Priority search trees are a blend of binary search trees and binary heaps. For the x-values of the stored points, a PST is a (leaf-oriented) search tree. For the y-values, it is a binary min-heap. Figure 3.2 shows a visualization of a priority search tree.

Figure 3.2: A priority search tree containing 8 elements. Each node stores up to one (x, y) pair and a split key to direct the search for x-values in the tree.

3.3.1 The data structure

Each node N in a PST contains a key-priority pair (xN, yN) as well as a split key (or router) sN. In Figure 3.2, split keys are shown in the bottom half of a node, while the key-priority pair is printed in the upper half. The invariants of the data structure can be summarized as follows:

(1) Min-heap condition: for each node N, the priority yN of the element stored in N is less than or equal to the priorities of the elements stored in the children of N.

(2) Search tree condition: for each internal node N, all keys (x-values) and split keys stored in the left subtree of N are less than or equal to the split key sN, and all keys and split keys stored in the right subtree are greater than sN.

(3) Finite map (dictionary) condition: the PST may not contain two different elements with the same key value.

(4) Contraction condition: each element (x, y) is stored in no more than one node, and empty nodes (i.e. nodes that store no element) may not occur as parents of non-empty nodes.

Conditions (2) and (3) ensure that each element can be found using the standard search procedure known from binary search trees. Starting from the root, we check for each visited node whether it stores the desired x-value. If yes, we are done. Otherwise, we check the split key in order to decide which subtree we have to inspect. If we arrive at a leaf without finding the search key, no element with that key is contained. While condition (1) does not allow an efficient search for priorities, it guarantees that an element with minimum priority is stored in the root and hence supports the priority queue operations, in particular accessmin and extractmin. McCreight describes this hybrid structure as “1.5-dimensional” to indicate that the search operations are not equally powerful for the x- and y-dimensions: there is one

“major” dimension (in x-direction) which is fully sorted and searchable, while the values in the “minor” y-dimension are only heap-ordered [78]. The term “1.5-dimensional” has also been used as referring to the nature of the south-grounded range queries (cf. Section 3.2.3), where the search rectangle has only three free sides, whereas the fourth side is always fixed [79, 94].

In the example in Figure 3.2, most of the leaves and one internal node do not store any elements. Obviously, these nodes could be eliminated without destroying any of the above conditions. We will later see that they are required if the tree needs to be balanced by rotations. First, however, we will show how these nodes come to exist in the first place.
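The standard search procedure in a PST can be sketched as follows (an illustrative simplification with our own names, ignoring empty nodes; the example tree in the test is taken from Figure 3.2):

```java
public class PSTLookup {
    static class Node {
        int x, y, split;       // stored pair (x, y) and split (router) value
        Node left, right;
        Node(int x, int y, int split) { this.x = x; this.y = y; this.split = split; }
    }

    // At each visited node, check the stored pair; otherwise descend
    // left or right according to the split key, as in a leaf-oriented
    // binary search tree.
    public static Node lookup(Node n, int key) {
        while (n != null) {
            if (n.x == key) return n;                  // found the element
            n = (key <= n.split) ? n.left : n.right;   // follow the router
        }
        return null;                                   // key not contained
    }
}
```

The search visits at most one node per level, so its cost is proportional to the height of the tree.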

3.3.2 Construction of priority search trees

Given a set S of n points (x, y) in the plane, we will now outline a method for constructing a PST. Let us assume that the points are given in an array a1, …, an. Let us also assume for simplicity that n = 2^k is a power of two. We will describe the construction procedure by using the analogy of a knockout tournament, where we interpret each point as a player, with the x-coordinate representing the “name” (or other unique label) of the player and the y-coordinate its “strength” (with a lower y-value standing for a stronger player). This analogy has been taken from Hinze [48]. A match between two points is simply the comparison of their y-values, where the one with the smaller y-value is considered the winner. Ties are not possible, so in the case of identical y-values, the winner is determined by random selection or any other policy. Note that a knockout tournament with n players always consists of exactly n − 1 matches, no matter the order or pairings of the matches. This is because every match has exactly one loser, who immediately drops out of the tournament. Thus, after n − 1 matches, only one player, the overall winner, is left.

Tournament trees

A tournament tree can be constructed from S as follows:

1. Sort the points in ascending order of their x-values and insert them from left to right into the n leaf nodes of a binary tree with n − 1 internal nodes. Set the split key of each leaf node to the x-coordinate of the point stored in it. Note that in principle, the shape of the tree does not matter; it corresponds to the pairings of the “matches” played in the tournament. Of course, for efficient search in the tree it makes sense to use a balanced tree, as shown in Figure 3.3a.

2. Set the split key of each internal node N to the largest split value found in the left subtree of N (see Figure 3.3b). This step augments the tree to a leaf-oriented search tree over the keys.

3. While there are internal nodes with empty (x, y) fields, “play a match”: Compare the points stored in the two non-empty children of an empty node. The one with the smaller y-value is considered the winner and is copied into the parent node (Figure 3.3c).

Figure 3.3: Construction of a (search-augmented) tournament tree from 8 elements: (a) Insert elements in the leaves; (b) add split keys to internal nodes; (c) play matches and promote the winners up the tree.

The resulting tree is a (search-augmented) tournament tree. It is quite similar to the visualizations often found in sports events involving knockout tournaments, except that our tree can also be used to search for the “name of a player”. The construction of the tournament tree can be done in time O(n log n) including the initial sorting, or O(n) if the points are already sorted. This is because exactly n − 1 matches are played, one for each internal node of the tree. However, this tree is not a PST: while it satisfies conditions (1), (2) and (3), condition (4) is violated because of the duplicate entries.
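Steps 1 and 3 of the construction can be sketched compactly on an array-embedded complete binary tree (our own illustration; internal nodes occupy indices 1..n−1 and leaves n..2n−1, split keys from step 2 are omitted):

```java
public class TournamentTree {
    // Builds the tournament for points already sorted by x-coordinate;
    // n must be a power of two. Each point is an {x, y} pair, and each
    // node ends up holding the winner (smallest y) of its subtree.
    public static int[][] build(int[][] pointsSortedByX) {
        int n = pointsSortedByX.length;
        int[][] node = new int[2 * n][];
        for (int i = 0; i < n; i++)
            node[n + i] = pointsSortedByX[i];          // step 1: fill the leaves
        for (int v = n - 1; v >= 1; v--) {             // step 3: play n-1 matches
            int[] l = node[2 * v], r = node[2 * v + 1];
            node[v] = (l[1] <= r[1]) ? l : r;          // smaller y wins
        }
        return node;                                   // node[1] holds the winner
    }
}
```

Running this on the eight points of Figure 3.3 promotes the point (7, 1), the strongest player, to the root, and each point appears once per match it wins, which is exactly the duplication that the contraction step below removes.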

Contraction In order to create a PST from our tournament tree, we contract the tree by removing duplicate point entries but leaving the split keys. Each point (x, y) only remains in the topmost node in which it occurs in the tournament tree (see Figure 3.4a). Then, starting from the bottom, we fill each empty node that has at least one non-empty child. If only one child contains a point, we simply move it up to the parent node (cf. Figure 3.4b). If both children contain a point, we compare the y-values, i.e. we play another match, and promote the winner up to the next level (Figure 3.4c). We repeat this until there are no more empty nodes with non-empty children.

The resulting tree is a priority search tree: Clearly, it satisfies condition (1) because of the nature of the tournaments, which always promote points with smaller y-value up the tree. The search tree condition (2) is still fulfilled because each point can only be promoted upward along its search path and hence can never end up in a different subtree of a node than it was in before. Condition (3) has not changed. Condition (4) was achieved by removing all duplicate points.

The contraction can be done in time O(n) in a balanced tree. For each of the n − 1 internal nodes, we have to do the following: check whether or not it is empty; if so, look at its two children, play a match if necessary and move the point from the respective child up to the node. This requires constant time but creates another empty node on the next lower level, for which we have to do the same again, and so on, until we reach the leaf level. Hence, for each internal node, the procedure may require time proportional to its level in the tree, which, in a perfectly balanced binary tree, is at most the height h = log2 n. Since at each level i we have 2^i nodes (if the root is at level 0), the number of node inspections required for the complete contraction is bounded by

\[
\sum_{i=0}^{h-1} 2^i\,(h-i) \;=\; \sum_{i=0}^{h-1} \frac{n}{2^{h-i}}\,(h-i) \;=\; n \sum_{i=0}^{h-1} \frac{h-i}{2^{h-i}} \;=\; n \sum_{i=1}^{h} \frac{i}{2^i} \;<\; 2n,
\]

and hence, the procedure can be carried out in O(n) time (the sum is bounded by the infinite series \(\sum_{i=1}^{\infty} i/2^i\), whose limit is 2). It is clear that after the contraction, each pair (x, y) is still stored in a node on the search path for x, since it has only moved upward along that path. However, if the tree has to be restructured, it is useful to know where a point originally came from. This can be determined by following the search path down to the leaf level.
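The contraction just described can be sketched as follows (again with a hypothetical Node class and a perfect tree, i.e. n a power of two; a compact winner-tree builder is included to make the sketch self-contained):

```python
# Hypothetical sketch of the contraction of a tournament tree into a PST.
class Node:
    def __init__(self, split=None, point=None, left=None, right=None):
        self.split, self.point, self.left, self.right = split, point, left, right

def build_winner_tree(points):
    """Perfect tournament (winner) tree over the points, sorted by x."""
    level = [Node(x, (x, y)) for x, y in sorted(points)]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            l, r = level[i], level[i + 1]
            win = l.point if l.point[1] <= r.point[1] else r.point
            m = l
            while m.right is not None:   # split key = largest key on the left
                m = m.right
            nxt.append(Node(m.split, win, l, r))
        level = nxt
    return level[0]

def contract(root):
    """Remove duplicates, then refill empty nodes bottom-up: yields a PST."""
    def strip(node, inherited):
        if node is None:
            return
        p = node.point
        if p == inherited:           # keep each point only in its topmost node
            node.point = None
        strip(node.left, p)
        strip(node.right, p)

    def pull_up(node):
        # promote the "winner" (smaller y) of the two children, then refill
        # the emptied child the same way, down to the leaf level
        while node.left is not None:
            a, b = node.left.point, node.right.point
            if a is None and b is None:
                return
            if b is None or (a is not None and a[1] <= b[1]):
                node.point, node.left.point = a, None
                node = node.left
            else:
                node.point, node.right.point = b, None
                node = node.right

    def fill(node):                  # bottom-up (post-order) filling
        if node is None or node.left is None:
            return
        fill(node.left)
        fill(node.right)
        if node.point is None:
            pull_up(node)

    strip(root, None)
    fill(root)
    return root

# demo: contracting the example tree yields the PST of Figure 3.4c
root = contract(build_winner_tree([(1, 2), (2, 4), (3, 8), (4, 5),
                                   (5, 4), (6, 9), (7, 1), (8, 3)]))
```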

Definition 3.1: In a priority search tree, we say that a pair (x, y) originates from a subtree T of the tree if the search path for x ends in a leaf node in T.


Figure 3.4: Contraction of the tournament tree: (a) remove duplicate entries; (b) fill empty nodes: entries in children with empty siblings are promoted directly; (c) if both children contain a point, promote the one with lower y-value.

As has been explained above, the empty nodes can be removed after the contraction if the set of points (and hence, the structure of the tree) does not change after the construction. In many applications, however, the set of points is not static but changes over time. The next section shows how to maintain such dynamic point sets in a PST.

3.3.3 Update operations

In the previous section we have seen how to construct a PST from a given set of points. We will now have a brief look at how a PST can be maintained if the set of points changes, i.e. when points are inserted or deleted. Since the respective methods have been described extensively in the literature [78, 79, 94], we only provide a short description and refer to other works for details and examples.

Insertion of a new pair (x, y) can be broken down into the following steps:
1. Search for x in the PST. If the search is successful, do not insert (x, y), since duplicate x-values are not permitted because of condition (3). Otherwise, the leaf node L at the end of the search path is the insert position.
2. Extend the underlying leaf search tree by inserting a new leaf node N with split key x and a new internal node that becomes the parent of N and L and whose split key is set to the split key of its left child in order to satisfy condition (2).
3. Walk down the search path for x from the root until you encounter a node M that is either empty or stores a point with a priority greater than y. Node M is the correct position for (x, y) such that condition (1) is satisfied.

4. Store (x, y) in M. If M already stored a point pM, empty M by “pushing down” pM into the subtree from which it originates, storing pM in the corresponding child of M. If that child is not empty, recursively push down the point stored there. This process may trigger a whole chain of “pushdowns” until an empty node catches the last point (note that such a node always exists, since we have not deleted the empty nodes).

It is obvious that the time required for steps 1, 3 and 4 is proportional to the height h of the tree, while step 2 requires only constant time. If the tree is balanced and stores n elements, h ∈ O(log n). We will take a detailed look at the rebalancing step in the next section. First, however, we will briefly describe how to delete a point from a PST, which is essentially the reverse of insertion.

Deletion of a point given by its key x works as follows:
1. Search for x in the PST. If the search fails, nothing needs to be done. Otherwise, the pair (x, y) is found in a node N.
2. If N is a leaf, then its sibling M must be empty, because the two have “played” against each other and the winner was promoted up to the next level (or further). We can therefore delete N and its sibling, leaving their parent as a new leaf node, and proceed with step (5).
3. If N is an internal node, empty N and continue to follow the search path for x until we arrive at a leaf L. Note that L must be empty, as it is the leaf corresponding to the point (x, y) in the tournament tree. Note also that either L’s sibling is also empty, or N must be L’s parent. This is because if the point stored in L’s sibling was not promoted, then the point in L can only have been promoted up to L’s parent; if it was promoted further, then the parent of L and its sibling would both be empty, contradicting condition (4).

4. Fill N recursively, as we have done during the contraction (by moving up the point of the “winner of the match” between its children). Now L’s sibling is definitely empty, and we can destroy both L and its sibling, leaving their parent node as a new leaf node.
5. Rebalance the tree if necessary.

Again, the time complexity of steps 1, 3 and 4 is bounded asymptotically by the height h of the tree. In order to keep this complexity in O(log n), we must ensure that the tree does not degenerate but remains balanced.
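The core of the insertion (the push-down of steps 3 and 4) can be sketched as follows; the leaf-extension step 2 and rebalancing are omitted, and the skeleton of empty nodes is assumed to exist already. This is a hypothetical illustration, not the implementation discussed in the cited literature.

```python
# Hypothetical sketch of PST insertion by push-down (steps 3 and 4 only).
class Node:
    def __init__(self, split=None, point=None, left=None, right=None):
        self.split, self.point, self.left, self.right = split, point, left, right

def skeleton(keys):
    """Perfectly balanced, empty leaf-search-tree skeleton over sorted keys."""
    if len(keys) == 1:
        return Node(keys[0])
    mid = len(keys) // 2
    return Node(keys[mid - 1], None, skeleton(keys[:mid]), skeleton(keys[mid:]))

def pst_insert(root, x, y):
    """Walk down the search path for x; whenever the carried point wins the
    "match" (smaller y), swap, and push the displaced point further down."""
    p, node = (x, y), root
    while node is not None:
        if node.point is None:          # an empty node catches the point
            node.point = p
            return
        if p[1] < node.point[1]:        # carried point beats the stored one
            p, node.point = node.point, p
        node = node.left if p[0] <= node.split else node.right

# demo: inserting the eight example points yields the PST of Figure 3.4c
root = skeleton(list(range(1, 9)))
for x, y in [(1, 2), (2, 4), (3, 8), (4, 5), (5, 4), (6, 9), (7, 1), (8, 3)]:
    pst_insert(root, x, y)
```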

3.3.4 Rebalancing priority search trees

Repeated insertions or deletions may cause a PST to grow out of balance, which is undesirable, as the height is the worst-case bound for all search operations, including range queries. Standard search trees can be kept in balance by a variety of balancing schemes, most of which use rotations as the basic mechanism for restructuring the tree. It is obvious that carrying out a rotation does not destroy conditions (3) and (4), and in a standard tree, the search tree condition (2) is not affected either. In a PST, however, we also have to maintain the min-heap condition (1), which may cause a problem. As can be seen in the example in Figure 3.5, the min-heap condition is violated after the right-rotation (this is always the case unless the priorities of the rotated nodes N and C are the same and N originates from its right subtree). In our example, where the rotated child C originates from its right subtree and its y-value is smaller than (or equal to) the y-value of its original sibling S, exchanging the points stored in nodes N and C restores the heap property (Figure 3.5 right). In all other cases, however, a purely local change is not sufficient to restore the heap property. Consider the example illustrated in Figure 3.6, where the point in C has been replaced by (4, 6). Here, exchanging the points stored in N and C does not restore the heap property. Furthermore, the point originally stored in C would end up in the wrong subtree of the root, also violating the search tree condition (Figure 3.6 right). In such a case, modifications along a complete chain down a subtree are necessary before and after the rotation.

Figure 3.5: A right-rotation in a priority search tree usually destroys the heap property. In this case, it is restored by swapping the points in nodes N and C.

Figure 3.6: In general, swapping the points stored in nodes C and N after the rotation does not restore the heap property and may also destroy the search tree condition.

In general, a rotation can be conceptualized as follows:
1. Empty the child node C of the rotated node N by recursively “pushing down” the points into the subtrees from which they originate. Note that due to the contraction method described above (cf. Section 3.3.2), there will always be an empty node to catch the last point, so the overall structure of the tree does not change. Figure 3.7 (left) shows the tree after node C has been emptied.
2. Rotate the nodes. Now the empty node C will be the root of the rotated subtree (see Figure 3.7 right).
3. Fill C by first “pulling up” the point in N and then recursively filling the empty nodes with the point with smaller y-value from either the left or right child.

It is clear that this procedure makes sure that the heap property is not destroyed. In fact, none of the conditions (1), (2) and (3) is ever violated during the rotation if it is carried out this way. However, it should also be obvious that the cost of such a rotation is proportional to the height of the rotated subtree, since we may have to modify nodes along two chains from the rotated node down to the leaf level. Hence, in the worst case, i.e. when the root is rotated, the required time is proportional to the overall height of the tree, which is in O(log n) if the tree is balanced. This cost entails some more disadvantages: First, if the cost for each update is to remain in O(log n), balancing of PSTs is restricted to balancing schemes that require no more than a constant number of rotations per update.
This is the case for red-black trees [43], which are used as the balancing mechanism in almost all implementations of balanced PSTs found in the literature (cf. [78, 29, 73]). Many other well-known balancing schemes with a logarithmic bound on the number of rotations, such as AVL-trees [3] or weight-balanced trees [1], would result in an O(log² n) worst-case time for rebalancing a PST after an update.


Figure 3.7: If node C is emptied by a recursive “pushdown” before the rotation, the heap and search tree properties are maintained. After the rotation, node C is re-filled by recursively “pulling up” the point stored in node N.

Second, if rotations as the basic mechanism for rebalancing are not strictly local operations but involve changes to more than just the rotated nodes, it seems impossible to employ so-called relaxed balancing, i.e. to decouple balancing from updates, as proposed in [64]. Relaxed balancing can be an effective instrument in actual applications because costly rebalancing operations can be postponed to times when fewer other operations have to be handled by the structure.

Third, if search queries and updates are to be carried out concurrently in a tree, parts of the structure must be locked during rotations. If rotations are purely local operations, only a constant number of nodes must be locked; if, however, complete subtrees are modified, as is the case in PSTs, considerable parts of the data structure have to be locked, which in turn negatively affects the efficiency of concurrent search operations, which have to wait for nodes to be unlocked.

As a final disadvantage, we also stress that the rotations are the reason why the “empty” nodes in a PST (see Figure 3.4c) cannot simply be omitted. If they were missing, step 1 of the above procedure would require an extension of the tree by a new node to catch the last point that is “pushed down”. This might unbalance the tree even more than it already is, so the tree would be no more balanced after the rotation than before.

Nevertheless, it should be remarked that it is indeed possible to maintain an updatable balanced PST storing n elements that requires only n nodes. This version is described in [78]. However, the node structure becomes considerably more complex, as additional information needs to be maintained: each node in such a PST stores two (x, y) pairs, a primary one satisfying the search tree condition and a secondary one satisfying the min-heap condition. The secondary point field can be empty.
In addition, each node requires two extra Boolean fields to indicate whether the primary pair is duplicated in another node and whether the node actually contains a valid secondary pair. This node structure also makes search and update operations slightly more complicated.
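The three-step rotation described above can be sketched as follows (hypothetical Node class; the demo builds a small PST over four points including its empty leaf nodes; this is an illustrative sketch, not a production implementation):

```python
# Hypothetical sketch of a heap-preserving right-rotation in a PST.
class Node:
    def __init__(self, split=None, point=None, left=None, right=None):
        self.split, self.point, self.left, self.right = split, point, left, right

def push_down(node):
    """Step 1: empty `node` by pushing its point down along its search path;
    an empty node on the path catches it (empty nodes were not deleted)."""
    p, node.point = node.point, None
    while p is not None:
        node = node.left if p[0] <= node.split else node.right
        p, node.point = node.point, p

def pull_up(node):
    """Step 3: refill the empty node from below by repeatedly promoting the
    child point with smaller y-value."""
    while node is not None and node.left is not None:
        a = node.left.point
        b = node.right.point if node.right is not None else None
        if a is None and b is None:
            return
        if b is None or (a is not None and a[1] <= b[1]):
            node.point, node.left.point = a, None
            node = node.left
        else:
            node.point, node.right.point = b, None
            node = node.right

def rotate_right(n):
    c = n.left
    push_down(c)                  # step 1: empty the rotated child
    n.left, c.right = c.right, n  # step 2: rotate; c is the new subtree root
    pull_up(c)                    # step 3: refill c (pulls up the point in n)
    return c

# demo: a small PST over (1,5), (2,2), (3,4), (4,7), with empty leaf nodes
l1, l2, l3 = Node(1), Node(2), Node(3)
l4 = Node(4, (4, 7))
s1 = Node(1, (1, 5), l1, l2)
s3 = Node(3, (3, 4), l3, l4)
root = Node(2, (2, 2), s1, s3)
new_root = rotate_right(root)
```

After the rotation, the heap and search tree properties hold again: the overall minimum (2, 2) is pulled up into the new subtree root, and the displaced points trickle down a chain, illustrating why the cost is proportional to the subtree height.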

3.3.5 Complexity of PSQ operations in PSTs

We have seen that in a balanced PST, all the dictionary operations can be carried out in O(log n) time. We now turn to the priority queue operations: due to the min-heap condition, it is clear that an element with minimum priority is stored in the root and can thus be accessed in constant time. Hence, the time for accessmin is in O(1). Together with the O(log n) time for deletion, we obtain O(log n) as a bound for the extractmin operation, as extractmin can be considered as delete(accessmin). Decreasing the priority y of an element can also easily be achieved in time O(log n): decrease(x, ynew) can simply be implemented as delete(x) followed by insert(x, ynew). Note that a direct implementation may be more efficient in terms of actual running time but will still require Θ(log n) steps in the worst case, as the element may have to be moved from the leaf level up to the root.

Note that PSTs are not the most efficient implementation of priority queues, as they require logarithmic worst-case time for all operations except accessmin. Other implementations such as Fibonacci heaps support insert, decrease and merge in O(1) amortized time. Nonetheless, PSTs enjoy the same bounds as well-known implementations such as binomial queues [119].

Regarding range queries, McCreight and Mehlhorn provide detailed complexity analyses for each of these operations, showing that minXinRectangle, maxXinRectangle and minYinXRange can each be answered in time O(log n) in a balanced PST storing n points [78, 79]. The remaining range query, enumerateRectangle, is a so-called output-sensitive algorithm whose complexity depends on the size of the answer: it is bounded by O(log n + r), where r is the number of reported points in the query rectangle.
While it would be beyond the scope of this chapter to revisit the analyses of these operations in detail, we will present the main ideas, as they are relevant for our own analyses of the range queries for priority search pennants in the next chapter. It is quite obvious that all points falling into a given query rectangle are stored in nodes that are either on the search paths to xleft and xright or in between those paths (cf. Figure 3.8). Let I be the set of those nodes that are between but not on the paths to xleft and xright. In Figure 3.8, the nodes in I are represented by the unfilled circles.

For enumerateRectangle, determining the x-borders of the region containing all points in the rectangle takes O(log n) steps. For each inspected node on the path to xleft, we must also inspect the right child if it is in I (symmetrically, the left child is inspected for the nodes on the path to xright). However, if the point stored in that child is not inside the query rectangle, this must be because its y-value is above ytop, and due to the heap condition we do not have to inspect its children. Hence, we only have to inspect the children of those nodes in I that store points inside the query rectangle, i.e. which are part of the answer. Taken together, if there are r points to be reported by enumerateRectangle, we have to inspect at most a·log n + b·r nodes, with constants a and b. Thus, the overall time complexity of enumerateRectangle is in O(log n + r).
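The pruning idea behind enumerateRectangle can be sketched as follows (hypothetical Node class; the demo hard-codes the PST of Figure 3.4c; an illustrative sketch only):

```python
# Hypothetical sketch of the output-sensitive enumerateRectangle query.
class Node:
    def __init__(self, split=None, point=None, left=None, right=None):
        self.split, self.point, self.left, self.right = split, point, left, right

def enumerate_rectangle(node, x_left, x_right, y_top, out):
    """Report all points with x_left <= x <= x_right and y <= y_top."""
    # after the contraction, empty nodes have only empty subtrees
    if node is None or node.point is None:
        return
    x, y = node.point
    if y > y_top:                 # min-heap: every point below is even higher
        return
    if x_left <= x <= x_right:
        out.append(node.point)
    if x_left <= node.split:      # the query range meets the left subtree
        enumerate_rectangle(node.left, x_left, x_right, y_top, out)
    if x_right > node.split:      # ... and/or the right subtree
        enumerate_rectangle(node.right, x_left, x_right, y_top, out)

# demo: the PST of Figure 3.4c (empty nodes have point=None)
leaves = {k: Node(k) for k in range(1, 9)}
leaves[3].point, leaves[6].point = (3, 8), (6, 9)
s1 = Node(1, (2, 4), leaves[1], leaves[2])
s3 = Node(3, (4, 5), leaves[3], leaves[4])
s5 = Node(5, (5, 4), leaves[5], leaves[6])
s7 = Node(7, None, leaves[7], leaves[8])
root = Node(4, (7, 1), Node(2, (1, 2), s1, s3), Node(6, (8, 3), s5, s7))

result = []
enumerate_rectangle(root, 2, 6, 5, result)
```

The recursion only descends below a node in I when that node's point is itself reported, which is exactly the argument for the O(log n + r) bound.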


Figure 3.8: All points within the x-range bounded by xleft and xright can be found in nodes that are either on the search paths to xleft and xright or in between them.

The operation minYinXRange also needs to inspect only the nodes on the paths to xleft and xright plus at most one child (the one in I) of each of them. Either the topmost node on the left path containing a point in the x-range or the topmost such node on the right path must store the desired point. If none of these nodes contains a point in the x-range, then no such point exists and minYinXRange has no result. In any case, the operation can be accomplished in time O(log n). For minXinRectangle, it can be shown by a similar argument that no more than the nodes on the path to xleft, their right children in I, plus one additional path down from one of these nodes have to be inspected in order to find the leftmost point in the query rectangle [79]. Hence, this operation can also be carried out in time O(log n). The same bound of course holds for maxXinRectangle, which can be treated symmetrically.

3.3.6 Related structures

A similar but different structure is the Cartesian tree [120], also known as the treap [9]. In a treap, only a pair (x, y) and no split key is stored in each node. The x-value of a node is also used as the split key for the search. This results in a completely fixed structure that allows no restructuring. An example can be seen in Figure 3.9. The extra split key in a PST allows for more freedom; e.g. it is possible to rebalance the tree. Hence, a treap can be regarded as a special case of a PST where for each node N the key and the split key are always identical. Conversely, PSTs can be seen as a generalized version of treaps. Nevertheless, as we have outlined, PSTs come with some disadvantages, mainly due to the cost of rotations and the need for 2n − 1 nodes to store n elements. The following chapter will discuss an alternative structure offering improvements for both of these issues.



Figure 3.9: A treap storing 8 points. The x-value is also used as the split key, hence no restructuring is possible.
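The two treap invariants (binary search tree on x, min-heap on y, with x doubling as the split key) can be checked with a short sketch; the tree below hard-codes the treap of Figure 3.9, and the Node class is hypothetical:

```python
# Hypothetical sketch of a treap invariant check.
class Node:
    def __init__(self, point, left=None, right=None):
        self.point, self.left, self.right = point, left, right

def is_treap(node, lo=float("-inf"), hi=float("inf")):
    """Check the BST condition on x-values and the min-heap condition on y."""
    if node is None:
        return True
    x, y = node.point
    if not (lo < x < hi):            # x must lie in the allowed key interval
        return False
    for child in (node.left, node.right):
        if child is not None and child.point[1] < y:
            return False             # a child with smaller y violates the heap
    return is_treap(node.left, lo, x) and is_treap(node.right, x, hi)

# demo: the treap of Figure 3.9
t = Node((8, 1),
         Node((3, 2),
              Node((2, 5)),
              Node((6, 4), Node((4, 6), right=Node((5, 8))), Node((7, 9)))),
         Node((9, 3)))
```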

Chapter 4

Priority Search Pennants

4.1 Relaxing the heap condition

It has been observed that for many applications which employ heap-ordered structures, it is sufficient to use relaxed versions of heaps. One example is the so-called weak-heap [33]. In a weak-heap, the heap condition is relaxed such that only one successor of an element needs to be dominated. In the binary tree representation of a min-heap, this means that the element in each node is smaller than or equal to all elements in one of its subtrees. For instance, each node in the weak-heap shown in Figure 4.1 dominates its left subtree. Since this condition alone does not guarantee that the root always contains a minimal element (which is desirable in order to support the accessmin operation in constant time), we also demand that the root node must have only one non-empty subtree, which is dominated. This type of binary tree with a “unary” root node (cf. Figure 4.1) is also referred to as pennant [108].

Figure 4.1: Example of a weak heap. The dashed lines represent pointers to non-dominated children. The root only has one subtree, which is dominated.

Despite the relaxed heap property, weak-heaps have several advantages compared to standard binary heaps. In particular, they can be merged more easily. The well-known heap sort, for instance, can be made more efficient using this property [33, 34]. One possible perspective on the data structure discussed in this chapter is to regard it as a “priority search tree with weak-heap condition”. While this description is not fully precise and needs some specification, it gives a good first idea of the structure that is discussed in detail in the following sections.

4.2 Structure of priority search pennants

Priority Search Pennants (PSP) were first described by Hinze as an example of a purely functional data structure [49]. Figure 4.2 shows an example of a priority search pennant containing the same 8 elements as the PST in Figure 3.2.


Figure 4.2: A priority search pennant storing 8 elements.

The node structure of a PSP is identical to that of a PST. Each node N stores a key-priority pair, or point, (xN, yN) and also a split key sN. Both structures have in common that they are binary search trees for the x-coordinates of the points and priority queues for the y-coordinates. The major difference is that priority search pennants are not strictly heap-ordered with respect to the y-coordinates of the points stored in them. Instead, the nodes form a so-called semi-heap: the y-value in each node N is smaller than the y-values in all nodes of the subtree that (xN, yN) originates from, i.e. where the search path for xN ends (cf. Definition 3.1). In Figure 4.2, the semi-heap property is illustrated by using two different line styles for the links between a node N and its children. The solid line leads to the subtree from which the point in N originates.

Because of the search tree condition, it is clear that for each node N its point (xN, yN) originates from N’s left subtree if xN ≤ sN; otherwise it originates from N’s right subtree. The invariants of a priority search pennant can be stated as follows (cf. [48]):

(1) Search-tree condition: For each node, the keys (and split keys) in the left subtree are less than or equal to the split key, and the keys in the right subtree are greater than the split key.

(2) Semi-heap condition: For each node L, the priority is less than or equal to the priorities of all the elements originating from the same subtree that the point in L originates from. We call this subtree the dominated subtree of L.

(3) Split-key condition: All keys also occur as split keys (and vice versa).

(4) Finite map (dictionary) condition: There are no two elements with the same key in the pennant.

4.3 A construction method for priority search pennants

Just like priority search trees, priority search pennants can be seen as a special variant of tournament trees. However, instead of the winners, the losers of the matches are stored in the respective nodes. The construction of a PSP from a set S of points can be conceptualized as follows:
1. Construct a tournament (winner) tree for S as described in Section 3.3.2 and illustrated in Figure 3.3.
2. Contract the tree as follows (cf. Figure 4.3):
   a. Create a new node T and store the overall “winner” point of the tournament (which can be found in the root) there. We call T the top node. Set the split key of T to the maximum x-value of all points in S (see Figure 4.3a).
   b. Starting from the root, visit the internal nodes (breadth-first or depth-first) and replace the point found in each node by the “loser” of the two points stored in its child nodes, i.e. the point with larger y-coordinate. Leave the split keys as they are. This is shown in Figure 4.3 (b, c and d).
   c. Make the top node T the parent of the root such that the root is T’s left child and T’s right child is empty.
   d. Delete all leaf nodes.

Step b changes the points stored in the nodes in such a way that each node now stores the loser (rather than the winner) of the match that is represented by that node. Hence, the tournament tree is transformed from a winner tree into a loser tree. Step c turns the binary tree into a pennant. Note that the search tree condition must also be satisfied for the top node T. Since the loser tree is the left subtree of T, this is achieved by setting T’s split key to the largest x-value (cf. step a). Step d contracts the tree by removing nodes storing duplicate points; since in a knockout tournament each player except the overall winner loses exactly one match, all the internal nodes, which represent the matches, must contain different points. Hence, each point stored in a leaf is also found in an internal node (except the winner point, which is stored in the top node).
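The construction can be sketched as follows (hypothetical Node class; a compact winner-tree builder is included for step 1, and n is assumed to be a power of two; an illustrative sketch only):

```python
# Hypothetical sketch of PSP construction from a tournament tree.
class Node:
    def __init__(self, split=None, point=None, left=None, right=None):
        self.split, self.point, self.left, self.right = split, point, left, right

def build_winner_tree(points):              # step 1 (cf. Section 3.3.2)
    level = [Node(x, (x, y)) for x, y in sorted(points)]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            l, r = level[i], level[i + 1]
            win = l.point if l.point[1] <= r.point[1] else r.point
            m = l
            while m.right is not None:      # split key = max key on the left
                m = m.right
            nxt.append(Node(m.split, win, l, r))
        level = nxt
    return level[0]

def to_psp(root, max_key):
    """Step 2: the top node gets the overall winner; every internal node
    keeps the loser of its match; the leaves are deleted."""
    top = Node(max_key, root.point, root)    # steps a and c
    def replace_by_loser(n):                 # step b (pre-order)
        if n.left is None:
            return
        a, b = n.left.point, n.right.point
        n.point = a if a[1] > b[1] else b    # keep the loser (larger y)
        replace_by_loser(n.left)
        replace_by_loser(n.right)
    def delete_leaves(n):                    # step d
        if n.left is None:
            return
        if n.left.left is None:              # both children are leaves
            n.left = n.right = None
        else:
            delete_leaves(n.left)
            delete_leaves(n.right)
    replace_by_loser(root)
    delete_leaves(root)
    return top

# demo: the eight example points yield the pennant of Figure 4.2
points = [(1, 2), (2, 4), (3, 8), (4, 5), (5, 4), (6, 9), (7, 1), (8, 3)]
psp = to_psp(build_winner_tree(points), max(x for x, _ in points))
```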


Figure 4.3: Construction of a PSP from a tournament tree. The overall winner is stored in a new node; for all internal nodes, the winner of the respective match is replaced by the loser. At the end, the top node becomes the new root and the leaf nodes can be deleted.

It is clear that the complete contraction, i.e. all of step 2, can be carried out in time O(n), since each node in the tree only needs to be visited once and constant time is required for each visit. Also note that, unlike for PSTs, no additional matches must be played. All the losers required in step b have already been determined by the matches during the construction of the tournament tree in step 1.

4.4 Structural properties

This section points out two rather obvious but noteworthy structural properties of PSPs, which follow directly from the construction and are intuitively clear with the tournament imagery in mind.

4.4.1 Canonical decomposition

For each PSP of size n > 1, there is a canonical decomposition into two PSPs Pleft and Pright, obtained by the following method (cf. Figure 4.4):


Figure 4.4: Canonical decomposition of a priority search pennant.

1. Separate the top node T from its child C.

2. Separate the right subtree Tright of C and make T the top node of Tright.

3. Compare the key xC and the split key sC of node C. If xC > sC, exchange the points stored in T and C.

It is easy to see that the two resulting pennants Pleft and Pright are legal priority search pennants, i.e. they satisfy all the conditions (1)-(4) listed in Section 4.2. Using the tournament analogy, Pleft and Pright represent the two sub-tournaments that are obtained when the final match of the tournament is ignored. It is also obvious that the canonical decomposition can be carried out in constant time, since only one comparison operation is required. The inverse procedure of this canonical decomposition is the merge operation for two PSPs Pleft and Pright, provided that every key stored in Pleft is strictly smaller than all keys in Pright. Again, in the tournament analogy, this merger represents the final match between the winners of Pleft and Pright. This operation can also be conducted in constant time. We remark that using the two above operations as constructor and destructor, all the operations of priority search pennants can be elegantly defined in a purely functional fashion [48].
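Both constant-time operations can be sketched as follows (hypothetical Node class; the demo hard-codes the pennant of Figure 4.4; an illustrative sketch under the assumptions stated in the text):

```python
# Hypothetical sketch of canonical decomposition and merge for PSPs.
class Node:
    def __init__(self, split=None, point=None, left=None, right=None):
        self.split, self.point, self.left, self.right = split, point, left, right

def decompose(t):
    """Canonical decomposition of the pennant with top node t; O(1)."""
    c = t.left
    t.left, c.right = c.right, None          # steps 1 and 2
    if c.point[0] > c.split:                 # step 3: c's point came from the
        c.point, t.point = t.point, c.point  # right sub-tournament
    return c, t                              # (P_left, P_right)

def merge(p_left, p_right):
    """Inverse operation: play the "final" between the two winners; O(1).
    Assumes every key in p_left is smaller than all keys in p_right."""
    c, t = p_left, p_right
    c.right, t.left = t.left, c
    if c.point[1] < t.point[1]:              # the winner moves to the top node
        c.point, t.point = t.point, c.point
    return t

# demo: the pennant of Figure 4.4
c = Node(4, (7, 2),
         Node(2, (4, 5), Node(1, (2, 4)), Node(3, (3, 8))),
         Node(6, (5, 4), Node(5, (6, 9)), Node(7, (8, 3))))
top = Node(8, (1, 1), c)
p_left, p_right = decompose(top)
```

Merging the two pennants again restores the original structure, since only the final match is replayed.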

4.4.2 Subtree property

The canonical decomposition allows us to point out another structural property which will be used extensively in our later analysis. It states that all points originating from a proper subtree except one can be found in a node inside that subtree.

Corollary 4.1: For each non-empty proper subtree T in a priority search pennant P, there is exactly one point originating from T which is stored in a node outside T.

Proof (by structural induction): If P is empty or contains only one element, there is no proper subtree and hence the statement is trivially true.

If |P| = n ≥ 2, the statement is obviously true for the (only) subtree of the top node: the point in the top node originates from that subtree and is the only point stored outside it. Every other proper subtree of P is a proper subtree of the top node’s child, and hence a proper subtree of either Pleft or Pright, according to the canonical decomposition of P. Since |Pleft| < n and |Pright| < n, the corollary holds for Pleft and Pright by the inductive hypothesis. □

In the tournament analogy, each proper subtree contains the losers of a sub-tournament. Hence, it is always the winner (and only the winner) of a subtree that is stored in a node outside that subtree.

4.5 Update operations

Having described the construction of a PSP from a set of points in Section 4.3, we will now explain how to maintain a PSP under insertions and deletions. Note that using the canonical decomposition as a destructor and its inverse as a constructor, the insert and delete operations can be described very elegantly by short recursive definitions [48]. Here we provide operational descriptions of the insertion and deletion methods, since this better illustrates how these operations actually change the structure of the PSP and provides an intuitive understanding of the complexities. Like for PSTs, an easy (though slightly clumsy) conception is the following: first restore the full tournament tree, then update the underlying leaf-oriented search tree as usual, then replay the necessary parts of the tournament, balance the tree if required, and contract the tree to produce a PSP again. The following descriptions are more compact, as they omit the transformations at the beginning and at the end.

4.5.1 Insertion

In order to insert a point p = (xp, yp) we create a new node N with p as its content and xp as its split key (see Figure 4.5a). If xp is greater than the split key of the top node T, we exchange the split keys of N and T to ensure that the top node always contains the largest x-value as its split key.

We then follow the search path for xp (Figure 4.5b) until we encounter a node M whose y-value is larger than or equal to yp and which originates from the same subtree to which the search path for xp leads (in fact, this second condition is the main difference from the insertion method for PSTs). As shown in Figure 4.5c, we then exchange the points stored in M and N (but leave the split keys) and continue by the same method down the search path for xp. When we have arrived at the end of the search path, we insert N (with its current contents) as the left or right child, as appropriate (Figure 4.5d). It should be clear that the complete insertion can be done in one walk down the tree (excluding the initial search to ensure that xp was not contained in the tree before).


Figure 4.5: Insertion of a new point (8, 5) in a priority search pennant.

Hence, if the tree is balanced (ignoring the top node), an insertion can be carried out in time O(log n).
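To make the walk-down concrete, here is a minimal Python sketch of the insertion just described. The `Node` layout (a point, a split key, two children) and the `PSP` class with its top node are our own illustrative assumptions, and the sketch ignores balancing entirely.

```python
class Node:
    def __init__(self, point, split):
        self.point = point      # stored point (x, y)
        self.split = split      # split key from the underlying search tree
        self.left = None
        self.right = None

class PSP:
    """Minimal, unbalanced priority search pennant (insertion only)."""
    def __init__(self):
        self.top = None         # top node; its split key is the largest x

    def insert(self, p):
        point, split = p, p[0]          # new node N starts with p and x_p
        if self.top is None:
            self.top = Node(point, split)
            return
        if p[0] > self.top.split:       # keep the maximum x at the top
            split, self.top.split = self.top.split, p[0]
        cur = self.top
        while True:
            if cur is self.top:         # the top node has a single subtree
                search_left = origin_left = True
            else:
                search_left = p[0] <= cur.split          # search path of x_p
                origin_left = cur.point[0] <= cur.split  # origin of cur's point
            # swap points if cur's point is no better and originates
            # from the subtree the search path enters
            if cur.point[1] >= point[1] and search_left == origin_left:
                cur.point, point = point, cur.point
            nxt = cur.left if search_left else cur.right
            if nxt is None:             # end of the search path: attach N
                if search_left:
                    cur.left = Node(point, split)
                else:
                    cur.right = Node(point, split)
                return
            cur = nxt

def collect(node):
    """All points stored in the subtree rooted at node."""
    if node is None:
        return []
    return [node.point] + collect(node.left) + collect(node.right)

pts = [(8, 5), (15, 1), (1, 3), (10, 2), (20, 6), (13, 5)]
psp = PSP()
for p in pts:
    psp.insert(p)
```

After these insertions, the pennant stores exactly the given points, the top node holds a point of minimal y-value (as required for accessmin), and its split key is the maximum x-value.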

4.5.2 Deletion

Deletion of a point p from a PSP conceptually works as follows: find the node in the underlying leaf search tree, remove it, repair the split keys, and replay those matches of the tournament that are necessary to restore the semi-heap property. An efficient operational description is illustrated in Figure 4.6.

We walk down the search path for xp until the end. On the path, we find the node N containing p and the node S that contains xp as its split key, and finally arrive at the last node E on the search path (Figure 4.6a). We remove E from the tree (replacing it by its only child node if it has one) and swap its split key with the one in S (this repairs the split keys), as is shown in Figure 4.6b. Then we take the point q stored in E and walk back up the search path, starting at E’s parent (Figure 4.6c), until we reach N. If yq is greater than the y-value yr of the point r stored in the current node, we swap q with r (Figure 4.6c). Note that this may switch the

dominated subtree of the current node but will never violate the semi-heap condition for the subtrees below. Then, we continue by the same method (Figure 4.6d). When we arrive at N (see Figure 4.6e), we can exchange our point with the point p stored in N and finally destroy node E (Figure 4.6f). As we can see, deletion of a point requires one walk down the tree and another one up until N. Thus, the overall time complexity is bounded by O(log n) in a balanced PSP.
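The deletion walk can be replayed in code on the example of Figure 4.6. The sketch below is our own illustration (the node layout with a point, a split key and two children is an assumption): it builds the pennant of Figure 4.6a by hand, deletes the point (20, 3), and ends in the state of Figure 4.6f.

```python
class Node:
    def __init__(self, point, split, left=None, right=None):
        self.point, self.split = point, split
        self.left, self.right = left, right

def delete(top, p):
    """Remove point p from the pennant below `top`; returns the new top."""
    x = p[0]
    path, cur = [], top
    while cur is not None:                     # walk down the search path of x
        path.append(cur)
        go_left = cur is top or x <= cur.split
        cur = cur.left if go_left else cur.right
    E = path[-1]                               # last node on the search path
    S = next(n for n in path if n.split == x)  # node carrying x as split key
    N = next(n for n in path if n.point == p)  # node storing p
    child = E.left if E.left is not None else E.right
    if len(path) == 1:                         # the pennant consisted of E only
        return child
    parent = path[-2]                          # unlink E, lifting its only child
    if parent.left is E:
        parent.left = child
    else:
        parent.right = child
    S.split = E.split                          # repair the split keys
    q = E.point                                # replay the tournament upwards
    for n in reversed(path[path.index(N) + 1:-1]):
        if q[1] > n.point[1]:                  # q loses: deposit q, carry the winner
            q, n.point = n.point, q
    N.point = q                                # the carried winner replaces p
    return top

# the pennant of Figure 4.6a, built by hand
top = Node((1, 2), 20,
    Node((20, 3), 9,
        Node((5, 10), 4,
            Node((3, 7), 1, right=Node((4, 15), 3)),
            Node((9, 12), 7, left=Node((7, 13), 5))),
        Node((10, 4), 14,
            Node((14, 8), 13, left=Node((13, 6), 10)),
            Node((15, 5), 16, Node((16, 11), 15), Node((18, 9), 18)))))

top = delete(top, (20, 3))
```

The result matches Figure 4.6f: the split key 18 of the removed node E has moved to the top, and the points (15, 5), (18, 9) and (10, 4) have been shifted upwards by the replayed matches.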

Figure 4.6: Deletion of the point (20, 3) from the given priority search pennant.

4.6 Balancing priority search pennants

One notable difference between PSTs and PSPs is that the latter allow more freedom regarding balancing. Recall that in order to restore the heap condition in a PST after a rotation, O(log n) steps are needed in the worst case. This restricts PSTs to balancing schemes that require no more than a constant number of rotations per update. As Hinze has shown, a rotation in a PSP requires only constant time, including the steps necessary to restore the semi-heap property [48]. This can be seen in Figure 4.7; each rotation requires only a comparison of the values stored in two nodes. The condition for swapping the points of the involved nodes after the rotation is

x1 > s1 ∧ y2 ≤ y1.

In particular, this means that at most one additional “match” has to be played to restore the semi-heap property. Hence, the operation is strictly local, and no trickle-down processes in any subtrees are required. Apparently, the semi-heap property of PSPs is easier to maintain than the stronger heap property found in PSTs. This advantage relieves PSPs of several of the restrictions mentioned for PSTs in section 1.2.3.

First, it is possible to choose from a much wider variety of balancing schemes and still remain within the O(log n) worst-case bound for updates. Examples that come to mind are weight-balanced trees, AVL trees, or balancing by internal path reduction [41]. The latter two have the advantage of achieving better actual worst-case bounds for the height and internal path length of the tree than, say, red-black trees, thus improving the actual performance of search operations. This can be critical for real-world applications, where the focus is not just on asymptotic but on actual average behavior of the employed data structure. An improvement of the actual time can also be expected for the update operations themselves: if the rotations involved in rebalancing take less time (constant instead of logarithmic), the actual time for updates is also likely to be lower.

Furthermore, with rotations being strictly local operations, it is possible to employ relaxed balancing by decoupling updates from rebalancing operations, as proposed by Larsen et al. [64] and described by Hanke [45]. Also, if search operations are to be conducted concurrently with updates, this will be easier in PSPs than in PSTs, as for each rotation only a constant number of nodes needs to be locked for a constant time. Finally, it is easy to see that neither the empty leaf nodes (from the original leaf search tree) nor any other additional information is required for balancing.
This is because all leaf nodes and no internal nodes are emptied by the contraction, whereas we have seen that in PSTs at least one leaf node will always still carry a point and internal nodes may be emptied (cf. Figure 3.2).
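The locality of the rotation is easy to demonstrate. The helper below is our own sketch (nodes hold a point, a split key and two children); it performs a right-rotation and plays the single extra “match” exactly when the stated condition x1 > s1 ∧ y2 ≤ y1 holds.

```python
class Node:
    def __init__(self, point, split, left=None, right=None):
        self.point, self.split = point, split
        self.left, self.right = left, right

def rotate_right(n2):
    """Right-rotation at n2; O(1) work, returns the new subtree root."""
    n1 = n2.left
    n2.left = n1.right          # subtree t2 changes sides
    n1.right = n2
    # at most one additional "match" restores the semi-heap property
    (x1, y1), (_, y2) = n1.point, n2.point
    if x1 > n1.split and y2 <= y1:
        n1.point, n2.point = n2.point, n1.point
    return n1

# a case where the swap condition holds: x1 = 5 > s1 = 4 and y2 = 2 <= y1 = 3
r = rotate_right(Node((2, 2), 6, left=Node((5, 3), 4)))
# with x1 <= s1 the rotation leaves the points untouched
s = rotate_right(Node((1, 1), 6, left=Node((3, 3), 4)))
```

In both cases the split keys travel with their nodes; only the two points are ever compared, so no trickle-down into the subtrees t1, t2, t3 occurs.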

Figure 4.7: A right-rotation in a priority search pennant. There are four possible cases of subtree domination that have to be distinguished. Only in the last case (bottom) does an additional “match” have to be played. The figure is adapted from [48].


4.7 Priority queue operations

We briefly discuss the complexities of the priority queue operations accessmin, extractmin, and decrease, in order to show that they – just like the dictionary operations – have the same asymptotic bounds for PSPs as they do for PSTs. Since the top node always contains a point with minimal priority, it is clear that the accessmin operation can be carried out in constant time. Extractmin, i.e. removing the element stored in the top node, is possible in O(log n) time if we use the delete operation described in the previous section. Decreasing the priority of a given element (x, y) can be achieved by removing the element with key x and inserting (x, ynew); both operations require O(log n) time. In an actual implementation, the running time is improved if the element is not deleted but its priority is set to the new value directly, after which the necessary parts of the tournament are replayed. This, however, does not change the asymptotic bound. We can summarize our observations in the following lemma.

Lemma 4.2: Priority search pennants support the standard dictionary and priority queue operations in the same asymptotic bounds as priority search trees. The time complexity of accessmin is bounded by O(1), and the time complexities of insert, lookup, delete, extractmin, and decrease are all bounded by O(log n) in the balanced case, where n is the number of elements stored in the tree.

4.8 South-grounded range queries

In his article introducing priority search pennants, Hinze remarks that this data structure does not support range queries as efficiently as priority search trees do [49]. However, an analysis is only provided for enumerateRectangle, while the other types of range queries are ignored. We will first recapitulate Hinze’s analysis of enumerateRectangle and point out the central argument there, before we present our own analyses for the remaining range queries, i.e. minXinRectangle, maxXinRectangle and minYinXRange. We will see that an appropriate visualization of the algorithms can provide important information for the analysis.

4.8.1 The operation enumerateRectangle

The function enumerateRectangle(xleft, xright, ytop) reports all stored points that fall into the rectangle ABCD, where A = (xleft, 0), B = (xleft, ytop), C = (xright, ytop), D = (xright, 0). We have seen in the last chapter that using a balanced priority search tree T, this function can be computed in O(log n + r) time, where n is the number of points stored in T and r is the size of the answer. The algorithm is output-sensitive, i.e. its complexity depends on the number of points found in the query rectangle. Hinze has established the following bound for enumerateRectangle in priority search pennants [49].

Lemma 4.3: In a balanced priority search pennant, enumerateRectangle takes time Θ(r · (log n – log r + 1)).

We will revisit Hinze’s proof, giving some explanations and clarifications on details that have been omitted there. Assume that the number of points returned by the algorithm is r. The goal is to determine the worst-case number of nodes that must be inspected to find all r points.

Definition 4.4: Let P be a balanced PSP. Then succ(r) is the worst-case number of nodes that must be inspected during an enumerateRectangle query returning r points.

Let us assume for simplicity that all points in P are within the x-range of the query rectangle; hence, we only need to consider the y-coordinates when deciding whether to include points in the output list. (It is quite clear that the x-coordinates will not be responsible for any “bad” behavior, as the only important issue here is the heap vs. semi-heap difference between PST and PSP.)

We first note the following important observation: in a semi-heap, whenever an inspected node N enters the output list (i.e. its y-value is below the upper rectangle bound), we will always have to inspect its dominated child D plus all the descendants of D on the ‘non-domination’ path below D (see the dashed path below D in Figure 4.8, left). The reason is that even if D turns out to be above the query rectangle, we cannot exclude any of the nodes on that path without inspecting them, because their y-values were never compared to D’s y-value or to that of any other node on the path. (Of course, N’s non-dominated child also has to be checked. However, note that since N’s parent must have been inspected before N, N’s non-dominated child will already have been inspected, because it lies on the ‘non-domination’ path of N’s parent.)

Hinze points out the natural correspondence between binary semi-heaps and multi-way heaps, which is also shown in Figure 4.8. The transformation is straightforward: starting from the root in a breadth-first manner, for each node all its non-dominated descendants become siblings of that node. If such a multi-way heap is used for enumerateRectangle, then for each node entering the output list all its children have to be inspected. If we assume, for simplicity, that the semi-heap is perfectly balanced and consists of n = 2^h nodes, the corresponding multi-way heap will always be a binomial queue [119] consisting of a single binomial tree.
In such a tree, there is one node (the root) with h children, 2^0 nodes with h – 1 children, 2^1 nodes with h – 2 children, …, 2^(h–2) nodes with 1 child and 2^(h–1) nodes with 0 children. The total number n of nodes can also be expressed as the root plus the sum of all these children:

n = 1 + h + 2^0 · (h – 1) + 2^1 · (h – 2) + … + 2^(h–2) · 1 + 2^(h–1) · 0
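As a quick sanity check (our own snippet, not part of the original analysis), the child counts can be summed numerically and compared with n = 2^h:

```python
def total_nodes(h):
    # the root itself, its h children, and 2^i nodes with
    # h - 1 - i children each, for i = 0, ..., h - 1
    return 1 + h + sum(2**i * (h - 1 - i) for i in range(h))

# e.g. h = 3: 1 + 3 + (1*2 + 2*1 + 4*0) = 8 = 2^3
```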

We write out the right part of this equation as follows:


Figure 4.8: A priority search pennant and the corresponding multi-way heap. Non-dominated descendants of a node D become D’s siblings.

n = 1 + h + (h – 1) + (h – 2) + (h – 2) + (h – 3) + (h – 3) + (h – 3) + (h – 3) + … + (h – (h – 1)) + … + (h – (h – 1)) + (h – h) + … + (h – h)

Let us remark that, written this way, the order of the summands reflects the possible worst-case order of an inspection of nodes as outlined before. First, we look at the root: if it does not enter the output list, we are done and can ignore all other summands. If it enters the output list, we have to inspect all its h children. If one child enters the output list (in the worst case, this will be the child with the maximum number of children), we have to inspect all its h – 1 children. If any of its children or a second child of the root enters the output list, we have to inspect (up to) another h – 2 nodes. Hence, if r nodes end up in the output list, the worst-case number succ(r) of nodes that must be inspected is the sum of the first r + 1 summands of the above sum. We now rewrite some of the constants in the summands as binary logarithms:

n = 1 + h + (h – 1 – lg 1) + (h – 1 – lg 2) + (h – 1 – lg 2) + (h – 1 – lg 4) + (h – 1 – lg 4) + (h – 1 – lg 4) + (h – 1 – lg 4) + … + (h – 1 – lg(n/2)) + … + (h – 1 – lg(n/2))

Since ⌊lg 2^k⌋ = ⌊lg(2^k + 1)⌋ = ⌊lg(2^k + 2)⌋ = … = ⌊lg(2^(k+1) – 1)⌋ = k, the following replacement does not change the equation:

n = 1 + h + (h – 1 – ⌊lg 1⌋) + (h – 1 – ⌊lg 2⌋) + (h – 1 – ⌊lg 3⌋) + (h – 1 – ⌊lg 4⌋) + (h – 1 – ⌊lg 5⌋) + (h – 1 – ⌊lg 6⌋) + (h – 1 – ⌊lg 7⌋) + … + (h – 1 – ⌊lg(n/2)⌋) + … + (h – 1 – ⌊lg(n – 1)⌋)

which can be nicely summarized as

n = 1 + h + ∑_{k=1}^{n–1} (h – 1 – ⌊lg k⌋)

This sum still lists the n + 1 summands in the original order given above. Thus, the maximum number of inspected nodes for r output nodes, succ(r), is obtained by adding up the first r + 1 summands.

succ(r) = 1 + h + ∑_{k=1}^{r–1} (h – 1 – ⌊lg k⌋)

In order to determine the asymptotic growth of succ(r), Hinze uses the following identity:

Corollary 4.5: ∑_{k=1}^{m} ⌊lg k⌋ = (m + 1) · ⌊lg(m + 1)⌋ – 2^(⌊lg(m + 1)⌋ + 1) + 2

This identity can easily be proven by induction (see Appendix A.1). By inserting it into the above equation, we obtain

succ(r) = 1 + h + ∑_{k=1}^{r–1} (h – 1 – ⌊lg k⌋)

= 1 + h + (r – 1) · (h – 1) – ∑_{k=1}^{r–1} ⌊lg k⌋

= 1 + h + r·h – h – r + 1 – (r · ⌊lg r⌋ – 2^(⌊lg r⌋ + 1) + 2)

= r·h – r + 2 – r · ⌊lg r⌋ + 2^(⌊lg r⌋ + 1) – 2

= r · lg n – r – r · ⌊lg r⌋ + 2^(⌊lg r⌋ + 1)

= r · (lg n – lg r) – r + 2^(⌊lg r⌋ + 1)

= r · (lg n – lg r) + Θ(r)

= Θ(r · (log n – log r + 1)) □
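Both Corollary 4.5 and the closed form r·h – r – r·⌊lg r⌋ + 2^(⌊lg r⌋ + 1) obtained in the derivation can be verified numerically. The snippet below is our own check; ⌊lg k⌋ is computed exactly via `int.bit_length`.

```python
def floor_lg(k):
    return k.bit_length() - 1        # exact floor(log2(k)) for k >= 1

def succ_sum(r, h):
    # succ(r) as the sum of the first r + 1 summands
    return 1 + h + sum(h - 1 - floor_lg(k) for k in range(1, r))

def succ_closed(r, h):
    # closed form obtained via Corollary 4.5
    return r * h - r - r * floor_lg(r) + 2**(floor_lg(r) + 1)
```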

Clearly, this bound is worse than the Θ(log n + r) running time for PSTs. If we consider, for instance, cases where r ≈ log n, we get worst-case complexities of Θ(log n) for PSTs but Θ(log² n) for PSPs.

The reason for this difference is quite obvious: the semi-heap property of PSPs is weaker than the strict heap order in PSTs. The intuitive argument is that only one subtree (instead of both) can be pruned whenever a node with a sufficiently large priority is reached. The other, non-dominated subtree still has to be examined, increasing the number of nodes to be inspected by the size of that subtree (in the worst case).

In his discussion of PSPs, Hinze does not consider the other types of range queries, minXinRectangle, maxXinRectangle, and minYinXRange. For the complexities of those queries, the same disadvantage for PSPs might be expected, since they must also make extensive use of the (semi-)heap property of the respective data structure in order to prune subtrees during the search. Hence, whenever a node is reached whose priority is greater than the given upper bound, we cannot completely prune the search there but still have to inspect the non-dominated subtree, as it may contain valid nodes. For this reason, the approach used by Mehlhorn in his analysis of minXinRectangle for PSTs (cf. Chapter 3.3.5) does not work for PSPs. On the other hand, the approach of converting the PSP into a multi-way heap (as Hinze does for enumerateRectangle) does not help for minXinRectangle at all, as this transformation ignores the search-tree condition, which is, of course, necessary for finding the node with the smallest x-value in the rectangle.

Remember that since we are looking for only one element in the query rectangle (and not all of them), we will sometimes also be able to prune subtrees because of the x-values of nodes. For instance, once we have found a possible candidate node C for minXinRectangle, we can exclude all nodes with x-values greater than xC. Hence, the decisive question is whether the “weaker” semi-heap property of PSPs is still strong enough to keep the complexity of minXinRectangle in O(log n).
We will use this question to argue that visualizations can be of great help for getting an initial idea in such a situation, which then leads to a formal proof.

4.8.2 Interactive visualizations as an aid for data structure analysis

Visualizations are one possible method to gain deeper insight into the complexities of data structures. In particular, they can help to construct extreme instances of a structure triggering worst-case behavior, which are often the most interesting examples for analysis. Indeed, we tried to construct PSPs in which a maximum number of nodes needs to be inspected for given minXinRectangle queries. Our interactive visualization allows the user to freely insert and delete points in the structure. At the same time, it maintains all the invariants of the respective structures, i.e. no illegal structures can be built. In addition, range queries may be launched, and all visited nodes are highlighted in the visualization so that the search path can easily be analyzed. Examples can be seen in Figures 4.9 and 4.10. One obvious way to enforce a large number of inspected nodes is to maximize the lengths of the individual paths followed (i.e. to achieve a great depth of the search path). Another is to increase the breadth by maximizing the number of forks on the search path. A fork is a node both of whose children need to be visited. Figure 4.9 shows an example of a PST with three forks on the path for minXinRectangle. All the highlighted nodes must be inspected in order to find the correct result.

Figure 4.9: Interactive visualization of a priority search tree. The highlighted nodes show the search path of a minXinRectangle query; the pink node is the result of the query.

One central part of Mehlhorn’s proof of the logarithmic complexity of minXinRectangle for PSTs was that only a constant number of individual paths from the root down the tree can be of logarithmic length [79]; all side paths branching off any node on these two paths have constant length (in fact, length 1). This can nicely be seen in Figure 4.9. However, it is exactly this part of Mehlhorn’s proof [79] that cannot simply be transferred to the analysis of minXinRectangle for PSPs. The reason is that the semi-heap property does not allow us to prune both subtrees below such a node but only the dominated one. The non-dominated one may contain the desired node somewhere on the bottommost level. Interestingly, rather than being more complex, the search paths in our worst-case examples for PSPs usually looked simpler than those for PSTs. In particular, we were not able to produce any example with more than a single fork on the search path (see Figure 4.10). This surprising finding runs counter to the intuitive assumption that the search in a PSP should be harder than in a PST. However, this remarkable fact forms the basis of our proofs in the next sections, showing that the time complexities of minXinRectangle, maxXinRectangle, and minYinXRange in a PSP are actually all in O(log n).

Figure 4.10: Visualization of the search path for minXinRectangle in a priority search pennant. All visited nodes are highlighted; the pink node is the result of the query.

It should be noted that the proofs themselves are of course completely formal and independent of any visualization. However, we would also like to stress that without the visualization, we would probably not have arrived at the important point of what to prove, nor would it have been clear how to prove it. In the following section, we provide a detailed analysis of these three queries proving the perhaps surprising result that in these cases semi-heaps are not “weaker” than heaps regarding the asymptotic complexities.

4.8.3 The operation minXinRectangle

A query minXinRectangle(xleft, xright, ytop) determines the unique leftmost point in the query rectangle. More precisely, it returns the element N with minimal key xN such that xleft ≤ xN ≤ xright and yN ≤ ytop, or null if there is no such element, i.e. if the query rectangle is empty.

The search algorithm

The procedure for finding the leftmost point can be implemented recursively as shown in Algorithm 4.1.

Algorithm 4.1: minXinRectangle for PSP
Input: 3-sided query rectangle given by xleft, xright, ytop
Output: the leftmost point in the query rectangle (or null, if no such point exists)

1.  minXinRectangle(xleft, xright, ytop)
2.      Min = null
3.      if (!isEmpty)
4.          inspect(root)
5.      return Min

6.  inspect(N)
7.      if (yN ≤ ytop and xN ≥ xleft and xN ≤ min{xright, xMin})
8.          Min = N
9.      if (N.left ≠ null and sN ≥ xleft and (yN ≤ ytop or xN > sN))
10.         inspect(N.left)
11.     if (N.right ≠ null and sN < xMin and (yN ≤ ytop or xN ≤ sN))
12.         inspect(N.right)

We use a global pointer Min that always points to the best valid node found so far. A valid node is one storing a valid point, i.e. a point inside the search rectangle. If the priority search pennant is not empty, the recursive method inspect is called with the root as parameter. At each visited node N, we first examine whether the node stores a point inside the search rectangle and whether it lies further to the left than the best current result (line 7). In line 9 we check whether or not the search must be continued in the left subtree. The left child does not have to be inspected if any one of the following conditions is met:

(a) N does not have a left child.

(b) sN < xleft (i.e. the left subtree only stores points that are left of the search rectangle).

(c) yN > ytop and xN ≤ sN (i.e. the left subtree is dominated by N and only contains points that are above the search rectangle).

If the left subtree cannot be pruned, it is inspected recursively by the same method (line 10). Then we check whether the right subtree of N needs to be inspected (line 11). This is not the case if any one of the following conditions is met:

(d) N does not have a right child.

(e) sN ≥ xMin (i.e. we have already found a valid node whose point is further left than all points stored in the right subtree).

(f) yN > ytop and xN > sN (i.e. the right subtree is dominated by N and only contains points that are above the search rectangle).

If the right subtree cannot be pruned, it is inspected recursively by the same method (line 12). After the traversal is completed, Min is returned (line 5), containing the desired result if the search was successful or null otherwise.

The correctness of the algorithm is obvious; it is essentially a depth-first search in preorder, where all those subtrees are pruned which cannot store a better valid point than the one found so far.
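Algorithm 4.1 translates almost line by line into Python. In the sketch below (our own rendering; the node layout and the hand-built pennant, taken from Figure 4.6a, are illustrative assumptions), a one-element list plays the role of the global pointer Min:

```python
import math

class Node:
    def __init__(self, point, split, left=None, right=None):
        self.point, self.split = point, split
        self.left, self.right = left, right

def min_x_in_rectangle(top, xleft, xright, ytop):
    best = [None]                                    # plays the role of Min

    def xmin():
        return best[0].point[0] if best[0] else math.inf

    def inspect(n):
        x, y = n.point
        if y <= ytop and xleft <= x <= min(xright, xmin()):   # line 7
            best[0] = n
        if n.left and n.split >= xleft and (y <= ytop or x > n.split):
            inspect(n.left)                                   # line 10
        if n.right and n.split < xmin() and (y <= ytop or x <= n.split):
            inspect(n.right)                                  # line 12

    if top is not None:
        inspect(top)
    return best[0].point if best[0] else None

# the priority search pennant of Figure 4.6a, built by hand
pennant = Node((1, 2), 20,
    Node((20, 3), 9,
        Node((5, 10), 4,
            Node((3, 7), 1, right=Node((4, 15), 3)),
            Node((9, 12), 7, left=Node((7, 13), 5))),
        Node((10, 4), 14,
            Node((14, 8), 13, left=Node((13, 6), 10)),
            Node((15, 5), 16, Node((16, 11), 15), Node((18, 9), 18)))))
```

For example, `min_x_in_rectangle(pennant, 6, 20, 9)` returns (10, 4): the nodes (14, 8) and (13, 6) are still inspected, but the subtree below the node with split key 16 is pruned by the x-value of the candidate.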

Complexity analysis

In order to determine the complexity of the minXinRectangle search, we will bound the maximum number of forks on the search path, i.e. of nodes both of whose children need to be visited. For this purpose, we need to introduce the notion of a potential fork.

Definition 4.6: An internal node N inspected during minXinRectangle(xleft, xright, ytop) is called a potential fork if N satisfies none of the above conditions (a)-(f) at the time when it is first visited. N is called an (actual) fork for minXinRectangle(xleft, xright, ytop) if both of its children are inspected during the search.

It is obvious that every actual fork is a potential fork. (This follows directly from the definition, together with the fact that xMin can only become smaller during the search.) We distinguish between potential forks and actual forks because, at the time of the first inspection of a node, it is not necessarily clear whether its right subtree actually needs to be visited. It may happen that during the search in the left subtree (which is expanded first) we find out that the right subtree can be ignored for the further search. This is summarized in the following corollary:

Corollary 4.7: If during minXinRectangle(xleft, xright, ytop) in the left subtree of a potential fork N a node L is found such that yL ≤ ytop and xleft ≤ xL ≤ xMin, then N is not an actual fork.

Proof: Since L satisfies the search criteria, we set Min = L. Thus, xMin = xL ≤ sN, and N now satisfies condition (e), i.e. its right subtree will not be inspected. □

We also point out the following necessary condition for potential forks (and hence also for actual forks):

Corollary 4.8: Each potential fork N for minXinRectangle(xleft, xright, ytop) satisfies the following inequalities at the time of its first visit: xleft ≤ sN < xMin and yN ≤ ytop.

Proof: The first inequality follows directly from conditions (b) and (e): if sN < xleft, then only N’s right subtree needs to be inspected (due to the search-tree condition). If sN ≥ xMin, then N’s right subtree can be excluded as it contains only keys greater than xMin. If yN > ytop, then the dominated subtree can be pruned because it only contains points above the query rectangle. □

Now we can make statements about the distribution of potential and actual forks.

Lemma 4.9: Let N be a fork for minXinRectangle(xleft, xright, ytop). Then there are no potential (or actual) forks in the right subtree of N.

Proof: Let R be any node in N’s right subtree and assume R is a potential fork. Because of Corollary 4.8 we know that xleft ≤ sR < xMin. (Note that xMin may have changed between the visits of N and R. However, since it can only have become smaller, the inequality sR < xMin also holds for all previous values of xMin.) For R’s key xR we now distinguish the following cases:

Case 1: xR < xleft. Because of the search-tree condition, xR > sN, and (since N is a fork) sN ≥ xleft, thus xR > xleft. This is a contradiction.

Case 2: xleft ≤ xR ≤ sR. Since sR < xMin, R satisfies the search conditions xleft ≤ xR < xMin and yR ≤ ytop when it is first visited, and we set Min = R. But then sR ≥ xR = xMin, so according to Corollary 4.7, R cannot be a potential fork. This is a contradiction.

Case 3: xR > sR. In this case, R dominates its right subtree. We consider the winner W of R’s left subtree: it must be further up than R on the search path, because R was the loser of the match against W (cf. Corollary 4.1; see Figure 4.11, left).

Since W originates from N’s right subtree, xW > sN ≥ xleft, and (since W originates from R’s left subtree) xW ≤ sR < xMin. Since also yW ≤ yR ≤ ytop, W satisfies the search condition, and we set Min = W. However, according to Corollary 4.7, R can then not be a potential fork, because sR ≥ xW = xMin. This is a contradiction. □

Lemma 4.10: Let N be a fork for minXinRectangle(xleft, xright, ytop). Then there are no potential (or actual) forks in the left subtree of N.

Proof: Let L be any node in the left subtree of N and assume L is a potential fork. We know that xMin cannot have changed between the visits of N and L, because otherwise N would not be a fork according to Corollary 4.7.

Because of Corollary 4.8, we know that xleft ≤ sN < xMin and xleft ≤ sL < xMin (and yL ≤ ytop). Because of the search-tree condition, xL ≤ sN and thus xL < xMin. Now if, in addition, xL ≥ xleft, we set Min = L, and according to Corollary 4.7, N is not a fork. This is a contradiction. It remains to consider the case xL < xleft (see Figure 4.11, right). In this case, xL < sL, so L’s point originates from L’s left subtree. We consider the winner W of L’s right subtree, which is stored outside that subtree, according to Corollary 4.1. W must be further up than L on the search path (since it has “won the match” against L), and when W is first visited, xW > sL ≥ xleft and xW ≤ sN < xMin. Thus, we set Min = W, and according to Corollary 4.7, N is not a fork. This is a contradiction. □

Figure 4.11: The winning points of the subtrees rooted in R (left side) and L (right side) are found in a node W outside that subtree.

Lemmas 4.9 and 4.10 tell us that neither the left nor the right subtree of a fork contains any other potential or actual fork. Taken together, this means that after the first fork on the search path, there cannot be another one. We summarize this result in the following theorem:

Theorem 4.11: There is at most one fork on the search path of a minXinRectangle(xleft, xright, ytop) query carried out on a priority search pennant.

Proof: The theorem follows directly from Lemma 4.9 and Lemma 4.10. □

Now the bound for the complexity of minXinRectangle is obvious:

Theorem 4.12: The complexity of a minXinRectangle(xleft, xright, ytop) query on a priority search pennant P of height h is bounded by O(h). More precisely, at most 2·h nodes of P are visited. If P is a balanced priority search pennant containing n elements, the complexity of minXinRectangle is bounded by O(log n).

Proof: The height h is defined as the longest distance from a leaf to the root, i.e. one less than the number of levels in the tree. Hence, the longest path from the root to a leaf contains h+1 nodes. Since the first possible fork on the search path is the child of the top node, each of the two sub-paths below it consists of at most h-1 nodes. Thus, together with the top node and its child, at most 2· (h-1) + 2 = 2·h nodes are visited. In a balanced tree, h = O(log n). □

A more efficient implementation

As a consequence of Theorem 4.11, it is easy and straightforward to implement the minXinRectangle search without recursion. While any recursive tree-traversal algorithm can also be implemented iteratively (and it is well known that in many implementations this comes with better actual running times), this usually requires either a stack or parent pointers in the nodes to track back the search path. This is not necessary here. Since we know that in PSPs there is at most one fork on the search path of a minXinRectangle query, all that is required is one additional pointer to the last visited potential fork. This node, at any time, is the only possible candidate for an actual fork, as can be seen from the following corollary.

Corollary 4.13: During a minXinRectangle(xleft, xright, ytop) search on a PSP (when carried out by Algorithm 4.1), whenever a potential fork N is visited for the first time, there cannot be any actual fork among the previously visited nodes.

Proof: Let N be a potential fork just encountered, and assume there is an actual fork F that has been visited before N. Since F is the only fork according to Theorem 4.11, N must be in either F’s left or F’s right subtree. If N is in the left subtree, this contradicts Lemma 4.10; if it is in the right subtree, we have a contradiction with Lemma 4.9. □

Hence, we can emulate the recursive algorithm iteratively by simply following the search path to the left whenever possible and keeping track of the last visited potential fork. If we arrive at the end of the path, we continue the search at the right child of the only fork. A non-recursive implementation can be seen in Algorithm 4.2:

Algorithm 4.2: minXinRectangle for PSP (iterative)
Input: 3-sided query rectangle given by xleft, xright, ytop
Output: the leftmost point in the query rectangle (or null, if no such point exists)

1.  minXinRectangle(xleft, xright, ytop)
2.      Min = null
3.      N = root
4.      fork = null
5.      while (N ≠ null or fork ≠ null)
6.          if (yN ≤ ytop and xN ≥ xleft and xN ≤ min{xright, xMin})
7.              Min = N
8.              fork = null
9.          if (sN ≥ xleft and (yN ≤ ytop or xN > sN))
10.             if (sN < xMin and (yN ≤ ytop or xN ≤ sN))
11.                 fork = N
12.             N = N.left
13.         else if (sN < xMin and (yN ≤ ytop or xN ≤ sN))
14.             N = N.right
15.         else if (fork ≠ null)
16.             N = fork.right
17.             fork = null
18.         else
19.             N = null
20.     return Min


We define a pointer called fork to keep track of the potential candidate for the fork (line 4). Starting from the root, for each visited node N we first check whether it is the current best valid node (line 6). If so, we set Min = N and, in addition, we set fork = null (line 8); we will see below why this is correct. Lines 9 and 10 check whether N is a potential fork. If it is, we set fork = N. We then proceed to N's left child (line 12) if required. If not, we check whether we have to go to the right (line 13). If neither child needs to be inspected and fork ≠ null, we continue the search with the right child of fork (line 16). In addition, we set fork = null (line 17), because there cannot be any further fork. If there was no fork to go on with, i.e. if fork = null (line 18), we are done and can exit the while-loop by setting N = null.

It remains to show why line 8 is correct, i.e. why we should set fork = null whenever Min is updated. Suppose fork points to a potential fork F when we encounter N. Then F must be somewhere above N on the search path. If N is in the left subtree of F, Corollary 4.7 tells us that F cannot be a fork; hence we should set fork = null. If N is in the right subtree of F, then F is an actual fork (because we must have visited its left subtree before and are now in the right subtree). In this case, fork has already been set to null (line 17), so setting it to null again changes nothing.

The above iterative method visits exactly the same nodes in exactly the same order as Algorithm 4.1 does. However, the search takes only O(1) additional space as opposed to O(h) in the recursive implementation, since no memory is consumed by a stack of recursive calls.

We can further improve the algorithm by removing some redundancy from Algorithm 4.2: some of the conditions are checked twice for the same node (e.g. yN ≤ ytop in lines 6, 9, and 10), and the fork pointer can be ignored completely after it has been used to get to the right branch (for a code listing, see Appendix A.2). This reduces the running time drastically compared to the recursive variant (unfortunately, the same holds for its readability). In our own Java implementation, the performance improved by more than 60%. Details on this benchmark, which also compared the running times of the recursive and iterative algorithms for PSPs with the same operations on PSTs, are reported in Section 4.9.
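To make the control flow concrete, the following is a runnable Python sketch of the iterative search. The Node class, its attribute names, and the explicit null checks on the children (which the pseudocode leaves implicit) are our own assumptions; the example pennant is a valid PSP over the four points (1, 5), (2, 3), (3, 8), (4, 1).

```python
class Node:
    """PSP node: point (x, y), split key, and child pointers (assumed layout)."""
    def __init__(self, x, y, split, left=None, right=None):
        self.x, self.y, self.split = x, y, split
        self.left, self.right = left, right

def min_x_in_rectangle(root, xleft, xright, ytop):
    """Iterative minXinRectangle in the spirit of Algorithm 4.2."""
    Min, N, fork = None, root, None
    while N is not None or fork is not None:
        xmin = Min.x if Min is not None else float('inf')
        if N.y <= ytop and xleft <= N.x <= min(xright, xmin):
            Min, fork = N, None          # new best node; no fork above it
            xmin = N.x
        go_left = (N.left is not None and N.split >= xleft
                   and (N.y <= ytop or N.x > N.split))
        go_right = (N.right is not None and N.split < xmin
                    and (N.y <= ytop or N.x <= N.split))
        if go_left:
            if go_right:
                fork = N                 # potential fork: remember it
            N = N.left
        elif go_right:
            N = N.right
        elif fork is not None:
            N, fork = fork.right, None   # resume at the single fork
        else:
            N = None                     # search exhausted
    return Min

# Example pennant over (1, 5), (2, 3), (3, 8), (4, 1):
a = Node(1, 5, 1)
c = Node(3, 8, 3)
b = Node(2, 3, 2, left=a, right=c)
root = Node(4, 1, 4, left=b)
```

For instance, min_x_in_rectangle(root, 2, 4, 3) returns the node storing (2, 3), the leftmost point with y ≤ 3 and x in [2, 4].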

4.8.4 The operation maxXinRectangle

The algorithm for maxXinRectangle(xleft, xright, ytop) is almost completely symmetric to that of minXinRectangle(xleft, xright, ytop).

Algorithm 4.3: maxXinRectangle for PSP
Input: 3-sided query rectangle given by xleft, xright, ytop
Output: the rightmost point in the query rectangle (or null, if no such point exists)

1. maxXinRectangle(xleft, xright, ytop)
2.   Max = null
3.   if (!isEmpty)
4.     inspect(root)
5.   return Max

6. inspect(N)
7.   if (yN ≤ ytop and xN ≤ xright and xN ≥ max{xleft, xMax})
8.     Max = N
9.   if (N.right ≠ null and sN < xright and (yN ≤ ytop or xN ≤ sN))
10.    inspect(N.right)
11.  if (N.left ≠ null and sN > xMax and (yN ≤ ytop or xN > sN))
12.    inspect(N.left)

In this case, the traversal always inspects the right before the left subtree, since it is the rightmost point of the query rectangle that is to be detected. Also note that because of the asymmetric split key condition (the keys in the left subtree may be equal to the split key, while those in the right subtree must be strictly greater), we need to check whether sN < xright (rather than sN ≤ xright) in line 9. This, of course, does not change anything regarding the complexity of the algorithm, which is exactly the same as that of minXinRectangle.
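The symmetric traversal can be sketched in Python as follows; the Node layout, the use of a nested function with a nonlocal best pointer, and the refresh of xMax between the two recursive calls (the pseudocode's xMax always denotes the current maximum) are our own assumptions. The example pennant stores the points (1, 5), (2, 3), (3, 8), (4, 1).

```python
import math

class Node:
    """PSP node: point (x, y), split key, and child pointers (assumed layout)."""
    def __init__(self, x, y, split, left=None, right=None):
        self.x, self.y, self.split = x, y, split
        self.left, self.right = left, right

def max_x_in_rectangle(root, xleft, xright, ytop):
    """Recursive maxXinRectangle in the spirit of Algorithm 4.3 (right subtree first)."""
    best = None
    def inspect(n):
        nonlocal best
        xmax = best.x if best is not None else -math.inf
        if n.y <= ytop and n.x <= xright and n.x >= max(xleft, xmax):
            best = n
        if n.right is not None and n.split < xright and (n.y <= ytop or n.x <= n.split):
            inspect(n.right)
        xmax = best.x if best is not None else -math.inf   # may have improved
        if n.left is not None and n.split > xmax and (n.y <= ytop or n.x > n.split):
            inspect(n.left)
    if root is not None:
        inspect(root)
    return best

# Example pennant over (1, 5), (2, 3), (3, 8), (4, 1):
a = Node(1, 5, 1)
c = Node(3, 8, 3)
b = Node(2, 3, 2, left=a, right=c)
root = Node(4, 1, 4, left=b)
```

Here max_x_in_rectangle(root, 1, 3, 8) returns the node storing (3, 8), the rightmost point with y ≤ 8 and x in [1, 3].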

Theorem 4.14: The complexity of a maxXinRectangle(xleft, xright, ytop) query on a priority search pennant P of height h is bounded by O(h). More precisely, at most 2·h nodes of P are visited. If P is a balanced priority search pennant containing n elements, the complexity of maxXinRectangle is bounded by O(log n).

The proof is completely analogous to that of Theorem 4.12. Likewise, the actual running time of the algorithm can also be improved by an iterative implementation, just like it was done for minXinRectangle.

4.8.5 The operation minYinXRange

It remains to analyze the time complexity of the last range query, minYinXRange. By an argument fairly similar to that in the case of minXinRectangle, we will show that minYinXRange(xleft, xright) is also bounded linearly in the height of the tree.

Implementation

The search strategy can be implemented as follows:

Algorithm 4.4: minYinXRange for PSP
Input: the left and right borders xleft, xright of an x-range
Output: a bottommost point in the x-range (or null, if no such point exists)

1. minYinXRange(xleft, xright)
2.   Min = null
3.   if (!isEmpty)
4.     inspect(root)
5.   return Min

7. inspect(N)
8.   if (xleft ≤ xN ≤ xright and yN < yMin)
9.     Min = N
10.  if (N.left ≠ null and sN ≥ xleft and (yN < yMin or xN > sN))
11.    inspect(N.left)
12.  if (N.right ≠ null and sN < xright and (yN < yMin or xN ≤ sN))
13.    inspect(N.right)

The procedure also uses a pointer Min to keep track of the best current result. We assume that yMin = ∞ while Min = null. Again, we use a depth-first search in preorder (and show later how it might be improved). However, the criteria for pruning subtrees are different. For each inspected node N, we first check whether it is in the given x-range and whether its y-value is smaller than that of the current minimum (line 8). If so, we set Min = N. In line 10, we check whether the left subtree needs to be inspected. We can prune it if all points there are outside the x-range (i.e. sN < xleft) or have a y-value greater than or equal to the current minimum. The latter is the case if yN ≥ yMin and xN ≤ sN (i.e. N dominates its left subtree). If the left subtree cannot be pruned, we inspect it recursively by the same method. Then we check whether the right subtree needs to be searched (line 12). It can be pruned if all points are outside the x-range (i.e. sN ≥ xright) or have a y-value greater than (or equal to) the current minimum. The latter is the case if yN ≥ yMin and xN > sN (i.e. N dominates its right subtree). If the right subtree cannot be pruned, we inspect it recursively by the same method.
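The pruning rules just described can be sketched in Python; as before, the Node layout and the nested function with a nonlocal best pointer are our own assumptions, and the example pennant stores the points (1, 5), (2, 3), (3, 8), (4, 1).

```python
import math

class Node:
    """PSP node: point (x, y), split key, and child pointers (assumed layout)."""
    def __init__(self, x, y, split, left=None, right=None):
        self.x, self.y, self.split = x, y, split
        self.left, self.right = left, right

def min_y_in_x_range(root, xleft, xright):
    """Recursive minYinXRange in the spirit of Algorithm 4.4."""
    best = None
    def inspect(n):
        nonlocal best
        ymin = best.y if best is not None else math.inf
        if xleft <= n.x <= xright and n.y < ymin:
            best = n
            ymin = n.y
        # prune left subtree if it is outside the x-range or dominated by n
        if n.left is not None and n.split >= xleft and (n.y < ymin or n.x > n.split):
            inspect(n.left)
        ymin = best.y if best is not None else math.inf   # may have improved
        # prune right subtree if it is outside the x-range or dominated by n
        if n.right is not None and n.split < xright and (n.y < ymin or n.x <= n.split):
            inspect(n.right)
    if root is not None:
        inspect(root)
    return best

# Example pennant over (1, 5), (2, 3), (3, 8), (4, 1):
a = Node(1, 5, 1)
c = Node(3, 8, 3)
b = Node(2, 3, 2, left=a, right=c)
root = Node(4, 1, 4, left=b)
```

For example, min_y_in_x_range(root, 1, 3) returns the node storing (2, 3), the bottommost point with x in [1, 3].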

Analysis

For minYinXRange, too, it is not obvious how many branches down the tree we will have to walk, since the node we search for may be any node storing a point within the x-range and, unlike in a PST, we cannot prune both subtrees when the y-value in the inspected node is too large. Intuitively, for each node found storing a point inside the x-range, we may, in the worst case, still have to search the complete non-dominated subtree (i.e. about 50% of its descendants). Fortunately, our analysis will show that, here again, the search path will never have more than one fork. We define a fork in the same way as we did for minXinRectangle:

Definition 4.15: An internal node N is called a fork for minYinXRange(xleft, xright) if both of its children are visited during the search.

Now we prove that a fork cannot have another fork as its descendant. First, we point out some properties of forks:

Corollary 4.16: Let N be a fork for minYinXRange(xleft, xright). Then N satisfies all of the following conditions:

(a) xleft ≤ sN < xright (i.e. sN is inside the x-range)

(b) yN < yMin

(c) xN < xleft or xN > xright (i.e. xN is outside the x-range)

Proof: (a) and (b) follow directly from lines 10 and 12 in Algorithm 4.4.

To prove (c), assume the opposite: xleft ≤ xN ≤ xright, i.e. the point stored in N lies inside the x-range. Then, together with (b), the condition in line 8 would be true, and we would have set Min = N in line 9. But then yN = yMin, which contradicts (b). □

We can now establish the same upper bound for the number of forks as we have done for minXinRectangle.

Theorem 4.17: Let N be a fork for minYinXRange(xleft, xright). Then there is no other fork on the search path below N.

Proof: We follow the same strategy as we did for minXinRectangle. Again, we distinguish between the left and the right subtree of N and show that neither one may contain a fork.

(1) Let L be any node in the left subtree of N and assume L is a fork. Then, because of Corollary 4.16, xleft ≤ sL < sN < xright and either xL < xleft or xL > xright. Since xL ≤ sN, xL cannot be greater than xright and thus xL < xleft. Hence, xL < xleft ≤ sL, i.e. L dominates its left subtree. Let us consider the winner W of L's right subtree (see Figure 4.11, right). W must be higher up on the search path than L (because it has "won a match" against L, cf. Corollary 4.1). Thus, yW ≤ yL < yMin and, furthermore, xleft < xW < xright (because sL < xW ≤ sN). This means that when W was inspected, it must have met the search condition (line 8), and therefore Min was set to W. However, since W dominates the whole subtree that it originates from (cf. the semi-heap condition), that subtree – containing L – is pruned (line 10 or 12). Hence, L is never visited during the search and therefore cannot be a fork for minYinXRange(xleft, xright), which contradicts our assumption.

(2) The proof for the right subtree is completely symmetric. □

Hence, on the search path of minYinXRange queries, there cannot be more than one fork. This yields the following bound for the time complexity:

Theorem 4.18: A range query of the type minYinXRange(xleft, xright) on a priority search pennant P of height h is bounded by O(h). More precisely, at most 2·h nodes of P are visited. If P is a balanced priority search pennant containing n elements, then the complexity of minYinXRange is bounded by O(log n).

Proof: Given Theorem 4.17, the proof is completely analogous to that of Theorem 4.12.

Possible improvements to the algorithm

As is the case for minXinRectangle and maxXinRectangle, this algorithm can also be implemented non-recursively without the need for a stack or parent pointers, which reduces the actual running time significantly. Furthermore, the above algorithm always inspects the left before the right subtree of a fork F. It would, however, be more efficient to inspect the non-dominated before the dominated subtree: if a better node (i.e. one with a smaller y-value) is found in the non-dominated subtree and Min is updated, we may find that the dominated subtree of F no longer has to be inspected, due to the semi-heap condition. This is the case when the new yMin ≤ yF. If, on the other hand, the dominated subtree of F is inspected first, we always still have to visit the non-dominated subtree as well, since the semi-heap condition says nothing about the y-values there. The modified algorithm is listed in Appendix A.2.

It is clear that neither the asymptotic bound nor the actual worst-case bound is affected by the above changes; however, the average number of visited nodes can be expected to decrease. On the other hand, determining which child is the dominated one requires additional comparison operations or additional pointers, which, in turn, might increase running time or space. Additional benchmarks would be required to determine the actual gain.

4.9 Comparison with priority search trees

4.9.1 Worst-case complexities

Both PSPs and PSTs, in their balanced versions, support the standard search tree and priority queue operations in O(log n) time. As we have shown, PSPs answer all the typical range queries – with the exception of enumerateRectangle – in O(log n), i.e. the same asymptotic time as PSTs. This result is remarkable insofar as it suggests that for those south-grounded queries which do not report a set of points but only search for a single point in the query rectangle with a minimal or maximal coordinate, the semi-heap property is just as "strong" as the heap property. Hence, for problem domains where enumerateRectangle is not required, PSPs can be a very attractive alternative to PSTs. In Chapter 5, we will discuss an application example where the replacement of PSTs by PSPs improves an existing solution for dynamic IP router tables.

The shortcoming of PSPs with regard to enumerateRectangle is redeemed by several advantages. As explained above, PSPs offer more flexibility and efficiency with respect to balancing (cf. Section 4.6). While a single rotation (the basic building block of a large class of balancing schemes) takes O(log n) time for PSTs, it can be accomplished in constant time for PSPs, since the semi-heap condition is "weak" enough to be restored by a strictly local operation. The same is true for merging two PSPs P1 and P2 of approximately equal size, where each key of P1 is strictly smaller than all keys in P2: this operation takes only O(1) time, whereas the bound for merging two PSTs is logarithmic. This difference is relevant in the context of an alternative construction algorithm for (static) PSTs and PSPs, which is discussed in Section 4.10.

It should be noted that while the analyses of range queries in PSTs described in the literature (e.g. [78, 79]) only yield rough estimates for the actual upper bounds on the number of steps (≤ 4·h visited nodes), our above analysis of PSPs provides tight upper bounds for the number of nodes visited during minXinRectangle, maxXinRectangle, and minYinXRange in the worst case, namely 2·h nodes. It is easy to construct instances that actually require the inspection of this number of nodes; an example is provided in Appendix A.3.

Interestingly, the single-fork property of Theorem 4.11 cannot be proven for priority search trees. This is because the proofs of Lemmas 4.9 and 4.10 both make use of the subtree property (Corollary 4.1), which does not hold for priority search trees: the PST in Figure 3.2 is a counter-example, as both the points (7, 1) and (8, 3) are stored outside their original subtree. In fact, Theorem 4.11 does not hold for priority search trees; it is easy to construct cases where the search path of minXinRectangle contains more than one fork (for an example, see Figure 4.9).
Nevertheless, one might still expect that the actual worst-case number of inspected nodes during minXinRectangle is lower for PSTs than for PSPs due to the heap condition. Quite surprisingly, however, this bound is higher for PSTs, which can simply be shown by an example (see Appendix A.3).

4.9.2 Average-case behavior

The theoretical bounds derived above apply to worst-case situations which are not likely to occur too frequently in practice. In actual applications, it is usually the average case that is of main interest. However, a formal analysis of, say, the average search path length for minXinRectangle or other range queries in a PST or PSP, or the average number of node manipulations during an update operation, seems beyond reach. In fact, the average-case analyses of the update operations in many well-known classes of balanced search trees (e.g. AVL-trees, BB[α]-trees) are still open problems [41]. Furthermore, the question of what is meant by the average case depends very much on the specific application. For example, unsuccessful search may not occur in certain applications and would therefore not be interesting for the average.

It is customary in such situations to resort to simulations in order to gain insight into the average behavior of the data structures. We have run extensive simulations for a specific application scenario, in which the performance of PSTs, PSPs, and a third data structure was compared with respect to insertions, deletions, and minXinRectangle queries. Details and results are described in Chapter 5.7.

Recursive vs. iterative implementations

For priority search trees, an iterative implementation of minXinRectangle cannot take advantage of the single-fork property, since Theorem 4.11 does not hold for PSTs. Our iterative PST algorithm, implemented for comparison and listed in Appendix A.2, uses parent pointers to walk back up the search path. Alternatively, a stack could be used to keep track of the forks on the path.

A benchmark test was conducted to compare search times for both the recursive and iterative versions of minXinRectangle in PSPs and PSTs, both balanced as red-black trees. For each of the tested sizes of the structures, the same 2^32 (i.e. roughly 4.3 billion) queries were carried out using each implementation of the algorithm. The queries covered the whole range of possible points stored in the structures. The overall running time was measured and divided by the number of queries. The experiment was repeated 5 times for each size and the average was taken. The structures and algorithms were implemented in Java, and the tests were run on a 3 GHz Intel Pentium 4® processor with 2 GB RAM and 2 MB cache running a Linux system.

The results are shown in Figure 4.12. The solid lines represent the times for PSPs, the dashed lines those for PSTs. The actual time saving of the iterative implementation for PSPs was about 61% for a pennant containing 1,000 elements and improved slightly with growing tree sizes. We note that for PSTs, too, the running time is reduced significantly by the iterative implementation; the saving here was 42% for a PST storing 1,000 elements and increased to almost 50% for 1,000,000 elements. When comparing the two structures directly, it turned out that with the recursive implementation, the PST performed about 5% better than the PSP. This comes as no surprise, since one can expect PSTs to have a shorter average search path for minXinRectangle than PSPs due to the (stronger) heap property.
Furthermore, for each inspected node in a PSP, one additional comparison operation is required in order to determine the dominated subtree. For the iterative algorithm, however, the PSP performed more than 25% better than the PST. We attribute this pronounced difference to the efficiency provided by the single-fork property of PSPs (cf. Theorem 4.11) for the iterative algorithm.

Let us conclude by remarking that the time required by minXinRectangle queries in PSPs is not only subject to the same asymptotic bound as in PSTs but is also very competitive with PSTs in terms of actual running time.

[Figure: four curves – PSP rec., PSP iter., PST rec., PST iter. – plotting the time per query (in µs, from 0.1 to 0.8) against the structure size (1 K to 1,000 K).]

Figure 4.12: Average time for minXinRectangle queries in PSPs and PSTs of different sizes for recursive and iterative implementations.

4.9.3 Space requirements

Obviously, both PSTs and PSPs are linear-space structures. However, in real-world applications, the actual space requirements (rather than asymptotic bounds) can be very important. The commonly known implementations of balanced PSTs require 2n – 1 nodes for storing n elements [73, 79]. Next to the stored pair (taking two numerical fields), each node requires one additional field for the split key, plus some extra space for balancing information (one bit to encode the color in the case of red-black trees). Hence, a red-black balanced PST storing n elements requires 6n – 3 numerical fields plus 2n – 1 bits.

As we have remarked earlier, McCreight has suggested an implementation of balanced PSTs which requires only n nodes for storing n elements [78]. However, here each node stores up to two complete points: a primary one satisfying the search tree condition (with the x-value of that point serving as the split key) and possibly a secondary one to satisfy the heap condition. Since the field for the secondary pair is never deleted, an additional Boolean value in each node indicates whether that field contains a valid secondary pair. Yet another Boolean marker is required to indicate whether or not the primary pair is duplicated in some other node. Hence, the actual gain in space of this solution is not overwhelming: this type of PST, if balanced as a red-black tree, requires 4n numerical fields and 3n bits. Furthermore, the algorithms for range queries become slightly more complicated and require more actual time, because up to six fields (instead of three) need to be inspected and compared for each visited node.

Priority search pennants, on the other hand, always require n nodes for storing n elements, while having the same simple node structure as PSTs with 2n – 1 nodes. Hence, only 3n numerical fields and n extra bits are required in red-black balanced PSPs. This also makes PSPs attractive for scenarios where storage space is an important issue (think, for example, of micro-sensors organized in a mobile ad-hoc network, where each sensor may have very limited memory). In addition, a smaller memory footprint usually means lower energy consumption of the respective device, which is another important issue in many fields of mobile computing.
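The three space bounds above can be compared with a trivial helper (the function names are ours); each function returns the pair (numerical fields, extra bits) for n stored elements.

```python
def space_pst_classic(n):
    """Red-black balanced PST with 2n-1 nodes: 6n-3 numerical fields, 2n-1 bits."""
    return 6 * n - 3, 2 * n - 1

def space_pst_mccreight(n):
    """McCreight's n-node PST variant (up to two points per node): 4n fields, 3n bits."""
    return 4 * n, 3 * n

def space_psp(n):
    """Red-black balanced PSP with n nodes: 3n numerical fields, n bits."""
    return 3 * n, n
```

For n = 1,000 elements this gives 5,997 fields and 1,999 bits for the classic PST, 4,000 fields and 3,000 bits for McCreight's variant, and 3,000 fields and 1,000 bits for the PSP.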

4.10 Priority search trees and priority search pennants – an alternative view

We conclude this chapter with a method to explain and explore the difference between PSTs and PSPs with the help of an alternative view on the data structures. Here we concentrate only on the construction of the static structure (i.e. ignoring update operations) and also remain entirely within the geometric domain of points in the plane. In order to better illustrate the actual difference in complexity between the two data structures, we will use a divide-and-conquer construction method in which the difference between PST and PSP can be clearly seen, because only the merge steps of the two construction algorithms differ. Throughout this section we will assume that the set S of points to be maintained is given in an array sorted by x-coordinates.

4.10.1 A geometric visualization

We have pointed out above that the "empty" nodes in a PST are relevant only for balancing; if updates are not required, they can be omitted completely (cf. Chapter 3.3.4). In that case, each node of a PST can be uniquely identified with the point it stores (assuming we can also add a split value to a point). The same of course holds for PSPs, as they contain exactly one node per stored point. Hence, we can visualize the structures directly in the plane by simply connecting the points according to the parent–child relations. Examples of a PST and a PSP for the same point set are shown in Figure 4.13.

Figure 4.13: The structure of a PST (left) and a PSP (right) visualized directly in the set of points. Each point is identified with the node in which it is stored.

The heap property of a PST is nicely reflected by the direction of the child pointers, which can never go upward. This is different in a PSP, where the (dashed) pointers to non-dominated children may also point upward.

4.10.2 Construction algorithm for PST

McCreight suggests a recursive construction of a PST from a given set of points [78], which is visualized in Figure 4.14. The idea is to first find a “winner” point w, i.e. one with the lowest y-value. This point will form the root of the PST. Then the remaining points are divided into two sets Sleft and Sright (of approximately equal size) such that the x-value of each point in Sleft is smaller than all the x-values of points in Sright. The maximum x-value occurring in Sleft is used as the split key of w.

The same method is applied recursively to Sleft and Sright. Finally, the roots of the PSTs for Sleft and Sright are made the left and right child nodes of w. The following algorithm describes this method of construction:

Algorithm 4.5: PST_DC(S)
Input: a set S of points (x, y) sorted by x-coordinates
Output: (the root of) the PST corresponding to S

1. if (S = ∅)
2.   root = null
3. else
4.   root = point w with minimum y-value in S
5.   divide the remaining set S \ {w} into Sleft and Sright
6.   root.split = largest x-value in Sleft
7.   root.left = PST_DC(Sleft); root.right = PST_DC(Sright)
8. return root

This is a classic divide-and-conquer algorithm, whose time complexity T hinges on the time required for the divide and merge steps. Assuming n = |S|, we obtain

T(n) = divide(n) + 2·T((n−1)/2) + merge(n).

It is clear that the merge step (lines 6 and 7) in this case takes only constant time c, irrespective of n. The complexity of the divide part in lines 4 and 5 is in O(n): since the points are given in an array sorted by x-value, line 5 can be carried out in constant time, while line 4 still takes time linear in n. Hence, the overall complexity is given by

T(n) = 2·T((n−1)/2) + c·n.

This recurrence, which is typical of many divide-and-conquer algorithms, yields an overall asymptotic running time of O(n log n) (cf. [22], [94]).

Figure 4.14: A divide-and-conquer construction algorithm for priority search trees from a set of points.

While the above construction method for PSTs is conceptually simple, it has one drawback: the overall running time is not asymptotically optimal. Recall that since the points are already sorted by x-value, we can construct a PST in time O(n) (cf. Chapter 3.2.2). The problem is that in line 4 we first need to find the point w with the lowest y-value of the set S (which takes time linear in the size of S) before dividing S \ {w}. This deficiency can be repaired and the complexity improved to O(n) by moving the selection of the "winner" point from the divide to the merge step. The divide step then becomes a constant-time operation:

Algorithm 4.6: PST(S)
Input: a set S of points (x, y) sorted by x-coordinates
Output: (the root of) the PST corresponding to S

1. if (|S| ≤ 1)
2.   return S0
3. else
4.   divide set S into Sleft and Sright
5.   Tleft = PST(Sleft); Tright = PST(Sright)
6.   return pst_merge(Tleft, Tright)

The operation for merging two PSTs Tleft and Tright works as follows:

Algorithm 4.7: pst_merge(Tleft, Tright)
Input: two non-empty PSTs Tleft, Tright of (approximately) equal size, where all x-values in Tleft are strictly smaller than all x-values in Tright
Output: (the root of) the PST obtained by merging Tleft and Tright

1. if (accessmin(Tleft) < accessmin(Tright))
2.   w = extractmin(Tleft)
3. else
4.   w = extractmin(Tright)
5. w.left = Tleft; w.right = Tright; w.split = max-x(Tleft)
6. return w

Accessing and comparing the minimum priorities of the two PSTs is possible in O(1), since only the two roots need to be inspected. The extractmin operation of line 2 or 4 runs in time proportional to the height of the respective tree, i.e. log(n/2). We can assume that max-x(Tleft) has already been stored during the divide step, so no extra time is needed to find it. We now obtain the recurrence relation

T(n) = c + 2·T(n/2) + log2(n/2).

If we assume n = 2^k for simplicity, we obtain

T(2^k) = c + 2·T(2^(k−1)) + (k − 1)
       = c + 2·[c + 2·T(2^(k−2)) + (k − 2)] + (k − 1)
       = 3·c + 4·T(2^(k−2)) + 2·(k − 2) + (k − 1)
       = ...
       = (2^k − 1)·c + 2^k·T(1) + ∑_{i=0}^{k−1} 2^i·(k − i)
       = O(2^k) + ∑_{i=1}^{k} 2^(k−i)·i
       = O(n) + n·∑_{i=1}^{log n} i/2^i

The last summand is the same as in our original analysis in Section 3.2.2. (It is also well known, e.g. from the analysis of the heapsort algorithm, where it describes the time required for converting an unsorted sequence into a heap; see also [94]. This is just what we would expect, since our procedure essentially converts the unsorted sequence of y-values into a min-heap.) The sum is a finite arithmetico-geometric series whose value is bounded by that of the corresponding infinite series, i.e. by the constant 2. Hence, the total asymptotic time of algorithm PST is bounded by O(n). Of course, this procedure is no more than a recursive implementation of the iterative construction algorithm given in Chapter 3.2.2.
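The bound on the remaining sum can be checked numerically; the partial sums of ∑ i/2^i increase toward the limit 2 and never exceed it (a quick sanity check, not part of the proof):

```python
def partial_sum(k):
    """Partial sum of sum_{i=1..k} i / 2**i; the infinite series converges to 2."""
    return sum(i / 2**i for i in range(1, k + 1))
```

For example, partial_sum(5) gives 1.78125, matching the closed form 2 − (k + 2)/2^k for k = 5.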

4.10.3 Construction algorithm for PSP

For priority search pennants, we can describe a similar construction algorithm.

Algorithm 4.8: PSP(S)
Input: a set S of points (x, y) sorted by x-coordinates
Output: (the root of) the PSP corresponding to S

1. if (|S| ≤ 1)
2.   return S0
3. else
4.   divide set S into Sleft and Sright
5.   Pleft = PSP(Sleft); Pright = PSP(Sright)
6.   return psp_merge(Pleft, Pright)


Note that the algorithm is identical to the PST algorithm; the only difference lies in the merge step. Algorithm 4.9 is the inverse of the canonical split operation of PSPs described in Section 4.4.1. It can be stated as follows:

Algorithm 4.9: psp_merge(Pleft, Pright)
Input: two non-empty PSPs Pleft, Pright of (approximately) equal size, where all x-values in Pleft are strictly smaller than all x-values in Pright
Output: (the root of) the PSP obtained by merging Pleft and Pright

1. w = min{accessmin(Pleft), accessmin(Pright)}
2. l = max{accessmin(Pleft), accessmin(Pright)}
3. w.split = accessmin(Pright).split
4. l.split = accessmin(Pleft).split
5. if (w = accessmin(Pleft))
6.   l.right = l.left; l.left = w.left; w.left = l
7. else
8.   l.right = w.left; w.left = l
9. return w
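Algorithms 4.8 and 4.9 can be transcribed almost literally into Python; the Node layout and the assumption of distinct y-values are ours, and accessmin of a pennant is simply its root.

```python
class Node:
    """PSP node (assumed layout); a single point forms a pennant with split = x."""
    def __init__(self, x, y):
        self.x, self.y, self.split = x, y, x
        self.left = self.right = None

def psp_merge(pl, pr):
    """Algorithm 4.9: merge two PSPs, all x in pl strictly smaller than in pr."""
    w, l = (pl, pr) if pl.y < pr.y else (pr, pl)   # winner and loser of the match
    w_split, l_split = pr.split, pl.split          # lines 3-4 use the old split keys
    if w is pl:                                    # winner comes from the left pennant
        l.right, l.left, w.left = l.left, w.left, l
    else:                                          # winner comes from the right pennant
        l.right, w.left = w.left, l
    w.split, l.split = w_split, l_split
    return w

def build_psp(points):
    """Algorithm 4.8: divide-and-conquer construction from x-sorted points."""
    if len(points) == 1:
        return Node(*points[0])
    mid = len(points) // 2
    return psp_merge(build_psp(points[:mid]), build_psp(points[mid:]))
```

Building the pennant for the points (1, 5), (2, 3), (3, 8), (4, 1) this way yields a top node storing the overall winner (4, 1) with split key 4 (the maximum x-value), in accordance with the subtree property.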

Since accessing the minimum is a constant-time operation, the time complexity of the complete merge step is bounded by O(1). Hence, we get the recurrence relation

T(n) = 2·T(n/2) + c,

leading to the overall asymptotic time O(n). Note that in both algorithm PST and algorithm PSP the divide and conquer steps of the two construction methods are completely identical; the only difference is the method of merging. This makes it easy to grasp the difference between the structures. In summary, if PST(n) denotes the time to construct a PST from n elements sorted by x-values, and PSP(n) the time for constructing a PSP (both with the above method), we obtain:

PST(n) = 2·PST(n/2) + O(log n)

PSP(n) = 2·PSP(n/2) + O(1)

Although both recurrence relations lead to the same overall asymptotic complexity O(n), this analysis clearly shows that each single recursive call in the construction of a PST involves a higher cost than for a PSP. This reflects the difference in complexity between constructing a heap and a semi-heap, and it gives a plausible explanation of why PSTs are a "more complex" structure, or contain "more information", than PSPs.
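Under a unit-cost model (our assumption: constant cost c = 1 and a merge cost of log2(n/2) + 1 for the PST), both recurrences can be evaluated for n = 2^k to confirm that both construction times stay linear while the PST pays a higher per-merge cost:

```python
def t_pst(k):
    """PST(n) = 2 PST(n/2) + log2(n/2) + 1, evaluated for n = 2**k (assumed costs)."""
    return 1 if k == 0 else 2 * t_pst(k - 1) + (k - 1) + 1

def t_psp(k):
    """PSP(n) = 2 PSP(n/2) + 1, evaluated for n = 2**k (assumed costs)."""
    return 1 if k == 0 else 2 * t_psp(k - 1) + 1
```

For every k, t_psp(k) = 2^(k+1) − 1 exactly, and t_pst(k)/2^k stays below a constant (here below 4), illustrating that both constructions are linear in n.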

Chapter 5

An application for priority search queues

One possible application area of PSPs is their use in dynamic IP router tables for most-specific range matching.

5.1 Introduction: IP packet classification

Efficient methods for IP table lookup and packet classification have a significant influence on the speed and throughput of current packet-switching networks. This is the reason why these problems have been studied extensively both from the networking and from the algorithms and data structures point of view. Researchers have tried almost every promising technique applicable to these problems in order to find solutions which are faster, need less space, are better suited to new needs, or are simpler to implement.

5.1.1 Problem specification

An IP router table consists of a set of (rule, action) pairs. For each incoming packet, the “best” rule matching the destination address, source address, port numbers and other fields contained in the packet header field has to be determined. Then the corresponding action is performed, e.g. the next hop is selected to which the packet is forwarded. Since we are not interested in the action but only in the selection process, we will henceforth ignore the action part. Selecting a matching rule for a given address means filtering each relevant field in the packet header according to a range of values specified for that field in each rule. Hence, the general problem is r-dimensional, if r is the number of fields listed in the rules. We will, in this chapter, restrict ourselves to the one-dimensional case, which can be regarded as the basis on which all higher-dimensional solutions can be based. If we take, for instance, the destination address as the relevant field of the packet header, we have to select a matching rule by filtering the destination address d according to the values specified in the rule as either a range of addresses or as an address prefix. For example, the range [17, 225] matches all addresses d such that 70

17 ≤ d ≤ 225, and the prefix 1011* matches all addresses from d = 10110000 to d = 10111111 (if addresses are given as binary numbers consisting of 8 bits). If prefixes are used as filters, the router table is called a prefix router table. The lengths of prefixes are limited by the length W of the IP addresses (W = 32 for IPv4 and W = 128 for IPv6). Note that each prefix specifies a range of matching addresses. On the other hand, not every range can be expressed by a prefix. For example, there is no prefix specifying the above range [17, 225]; the first address would be 00010001 and the last one 11100001. Hence, the set of all prefix ranges, i.e. ranges which can be given as prefixes, forms a special subset of the set of all ranges. Sets of prefix ranges have the additional property that any two ranges are either disjoint or one is completely contained in the other. Sets of ranges with this property are called nonintersecting [73]. Again, however, not every set of nonintersecting ranges is necessarily a set of prefix filters.

If there is more than one rule matching a given address, different criteria may be applied for selecting the “best” rule in the lookup table. For example, a priority can be assigned explicitly to each rule, which is then used as a tie-breaker in cases of multiple matching rules. The strategy that we are concerned with here uses the following approach: for a given address d, we always choose the rule containing the most specific range msr(d). In other words, if ranges are nonintersecting, the rule with the smallest range containing d will be chosen. If ranges are given as prefixes, the most specific range corresponds to the longest matching prefix LMP(d). Note that in order to be able to use the most specific range strategy, it must be ensured that msr(d) is always well-defined for each incoming address d.
This is relevant in particular since we are interested in the dynamic version of this problem, where insertions and deletions of ranges as well as lookup operations for addresses occur. Update operations could create a conflict in the resulting set of intervals, meaning that msr(d) may not be well-defined any more for some d. Hence, conflict detection is another issue which must be addressed.
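To make the prefix-range correspondence concrete, the following small sketch (not from the dissertation; the helper name prefix_to_range is ours) expands a W-bit address prefix into the range of addresses it matches:

```python
def prefix_to_range(prefix, W=8):
    """Range [low, high] of W-bit addresses matching an address prefix.

    '1011*' matches every address whose first four bits are 1011; the
    remaining W - 4 bits range over all possibilities.
    """
    bits = prefix.rstrip('*')
    free = W - len(bits)
    low = (int(bits, 2) << free) if bits else 0  # remaining bits all 0
    high = low | ((1 << free) - 1)               # remaining bits all 1
    return low, high

# The example from the text: 1011* matches 10110000 .. 10111111.
print(prefix_to_range('1011*'))  # (176, 191)
```

Note that running the same conversion backwards shows why a range such as [17, 225] has no prefix representation: its length is not a power of two aligned at a multiple of itself.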

5.1.2 Approaches to packet classification

From an abstract data structures point of view, one may classify the many different approaches for solving the static and the dynamic case of the problem into two major categories. The first category explicitly exploits the fact that all interval boundaries as well as all points occurring as arguments of stabbing queries are chosen from a fixed bounded universe, the ordered set of points 0, 1, …, 2^W – 1 on the line. Therefore, one can structure the universe and represent each given set of intervals in this universe. For sets of intervals defined by prefixes, most solutions following the trie-based approach or using hashing schemes fall into this category. (However, the approach of structuring the universe is not restricted to prefix ranges.) The second category contains solutions where one structures the given set of intervals. Most solutions obtained by considering ranges as r-dimensional intervals (instead of prefixes or substrings of strings) fall into this category. Therefore, this approach is sometimes also called the geometric approach. (However, again, the principle of structuring the given set is not restricted to intervals.) It is beyond the scope of this work to present an exhaustive synopsis of the vast literature on this problem. A recent survey of both approaches for the one-dimensional case can be found in [109]. Gupta considers the multidimensional case of packet classification [44]. Ruiz-Sanchez et al. provide a good survey of trie-based solutions for the static case of prefix router tables [107]. Feldmann and Muthukrishnan present an algorithmic framework for studying grid-based solutions for the dynamic version of the one- and two-dimensional packet classification problem [37]. In this context the Ω(log n) comparison-based lower bound for searching a set of n ranges with endpoints taken from a universe of N points on the line does not apply.
The main goal was to achieve optimal sub-logarithmic query time without sacrificing update time too much; this was finally achieved by Thorup [115]. As in the approach by Lu and Sahni [73], our main concern is to consider structures for the dynamic case of the problem in a more general computational model. We allow endpoints of intervals as well as query points to be arbitrary points on the line (and not only those specified by prefixes) and base our solutions, as Lu and Sahni do, on the comparison-based model of computation. More specifically, we restrict ourselves to the one-dimensional case of the IP lookup problem and to solutions which may be considered as following the “geometric” approach. Our work described in this chapter can be seen as an improvement of a specific solution proposed by Lu and Sahni, who have suggested the use of priority search trees as a lookup structure for msr queries [73]. In the following sections, we first discuss their solution. We then show that priority search trees can be replaced by priority search pennants or another alternative structure, min-augmented range trees, both of which reduce the actual running time for update as well as lookup operations. In addition, we improve Lu and Sahni’s solution for the case of nonintersecting ranges by removing a redundant part of the data structure, thus reducing the space requirements and the cost of updates approximately by half.

5.2 Geometric interpretation

It has been observed that the dynamic version of the one-dimensional IP lookup problem can be naturally translated into geometric terms [73]. Ranges (intervals on the line) can be mapped onto points in the plane and vice versa, as both entities are uniquely determined by two values. Following the notation of Lu and Sahni, we denote by map1 the function that maps each range [l, r] onto the point (r, l). For a set R of ranges (intervals), map1(R) is a set of points below or on the main diagonal in the plane (see Figure 5.1). The original intervals are still visible as the horizontal lines connecting the respective point with the main diagonal.


Figure 5.1: A set of ranges (intervals) is mapped onto a set of points in the plane. Points representing smaller intervals are located closer to the main diagonal.

Stabbing queries for sets of intervals on the line turn into range queries for south-grounded, semi-infinite ranges of points in the plane. More precisely:

d ∈ [l, r] ⇔ [l, r] is stabbed by d ⇔ (l ≤ d and d ≤ r) ⇔ the point (r, l) = map1(l, r) lies to the right of and below the point p = (d, d)

Hence, all points representing intervals stabbed by d lie to the right of and below the point p = (d, d). In the example of Figure 5.1, showing a set of four intervals {[1, 6], [4, 9], [7, 15], [10, 12]}, the two points A, B corresponding to [1, 6] and [4, 9] are to the right of and below the point p = (5, 5). In our example, all right endpoints of intervals are pairwise different. Hence, in the set of points corresponding to the set of intervals, no two points have the same x-coordinate. Recall that it is always possible to enforce this property (cf. Chapter 2.1) by first transforming the point set into a new point set, mapping each point (x, y) to the point (2^W·x – y + 2^W – 1, y), where 2^W – 1 is the maximum value that x can assume. Then all transformed points have different first components. For reasons of brevity and clarity, in the remainder of this chapter we will ignore this transformation and tacitly assume that all points have pairwise different x-coordinates. In general, for two ranges [l, r] and [l’, r’], the point map1(l, r) lies to the left of and above the point map1(l’, r’) iff the range [l, r] is contained in (is more specific than) the range [l’, r’]. Since our task is to find the most specific range containing d, msr(d), we need to be sure that msr(d) is always well-defined. This is not the case in our example in Figure 5.1. The search value d = 5 is an example of a value contained in two ranges; however, neither [1, 6] nor [4, 9] is more specific than the other. Lu and Sahni [73] call such a set of ranges conflicting (another example can be seen in Figure 5.2a). Sets of intervals for which msr(d) is always well-defined are called conflict-free.
In Figure 5.2b, although the two larger ranges intersect, msr(d) is well-defined, since a more specific range covers the overlapping region. In the following, we will not consider the most general case of conflict-free ranges but restrict our discussion to a subclass:


Figure 5.2: Examples of sets of ranges: (a) conflicting, (b) conflict-free, (c) nonintersecting

Definition 5.1: A set R of ranges is called nonintersecting if for each pair r, s ∈ R either r ∩ s = ∅ or r ⊂ s or s ⊂ r.

In other words, R is nonintersecting if any two ranges are either disjoint or one is completely contained in the other (see Figure 5.2c). Furthermore, we shall assume that the default range [0, 2^W – 1] containing all possible query points is included in R, thus ensuring that for any d, msr(d) actually exists. It is obvious that a set of nonintersecting ranges is always conflict-free, i.e. for each d the most specific range in R containing d, msr(d), is well-defined (though Figure 5.2b shows that the opposite is not true). Hence, in the corresponding set P = map1(R) of points in the plane representing the set R of ranges, there is a unique topmost-leftmost point in the semi-infinite range to the right of and below every query point p = (d, d) on the main diagonal. This topmost-leftmost point represents the range msr(d). Thus, solving the dynamic version of the IP lookup problem for nonintersecting ranges means maintaining a set of points in the plane for which we can carry out insertions and deletions of points (provided that updates do not introduce conflicts in R) and answer topmost-leftmost queries efficiently. At this point it should be clear that the required operations are exactly what priority search queues can support. Since there is always one unique topmost and leftmost point in each relevant search rectangle, it is sufficient to search for the leftmost point (since that one will also be topmost). Searching for the leftmost point in a given south-grounded rectangle can be done by a minXinRectangle query for that rectangle in a PSQ storing the point set map1(R). Since in our case the rectangle is bounded by xleft = d, xright = ∞, ytop = d, the search for msr(d) is carried out by calling the special query minXinRectangle(d, ∞, d). Note that, of course, the right border of the rectangle is not actually open to infinity but given by the maximum possible value that x can assume, i.e. 2^W – 1.
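The geometric translation above can be sketched as follows. The brute-force scan stands in for the minXinRectangle query of a real priority search queue and is only meant to illustrate the mapping; function names are ours:

```python
def map1(l, r):
    """Map the range [l, r] onto the point (r, l) in the plane."""
    return (r, l)

def msr(ranges, d):
    """Most specific range containing d, for a nonintersecting set that
    includes a default range covering d: among all points map1(l, r) to
    the right of and below p = (d, d), take the leftmost (= topmost) one."""
    candidates = [map1(l, r) for (l, r) in ranges if l <= d <= r]
    x, y = min(candidates)   # leftmost point, i.e. smallest right endpoint
    return (y, x)            # back to interval notation [l, r]

# Nonintersecting example including a default range:
R = [(0, 255), (16, 47), (20, 30)]
print(msr(R, 25))  # (20, 30): the smallest range containing 25
```

For nonintersecting ranges, all candidates are nested, so the one with the smallest right endpoint is the innermost; this is exactly why a single leftmost-point query suffices.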

5.3 A data structure based on priority search trees

Lu and Sahni have suggested a router table design using priority search trees as the underlying search structure for msr queries [73]. As we have seen, balanced PSTs support the required lookup query, i.e. minXinRectangle, in time O(log n). Moreover, insertions and deletions can also be carried out in logarithmic time.

What remains to be done is to check whether an update operation creates a conflict and if so, refuse that operation.

5.3.1 Detection of conflicts

When looking at the conflict detection problem, Lu and Sahni distinguish between router table designs for prefix ranges, nonintersecting ranges, and (general) conflict-free ranges.

Prefix ranges

It is quite obvious that in the case of prefix ranges, avoiding conflicts is not a problem, as prefix ranges are always nonintersecting and hence conflict-free. The only situation where a conflict may occur is when the default range is deleted; in that case msr may not be defined for all query addresses d. Hence, only the deletion of the default range must be precluded in order to guarantee a conflict-free set.

Nonintersecting ranges

When a set R of nonintersecting ranges is maintained in the router table, it is clear that the deletion of a range other than the default range will never create a conflict, because no intersection can be introduced by removing a range from a set that does not contain any intersections in the first place. Hence, the delete operation can be treated in the same way as for prefix ranges. On the other hand, the insertion of a new range [u, v] into the set R may create a conflict. In order to keep R nonintersecting, we have to refuse the insertion of any range that overlaps with an existing range [x, y] in R (see Figure 5.3). The important question is how to detect such overlaps efficiently. A range r = [u, v] cannot be inserted into R if it overlaps with any range contained in R, i.e. if

∃ s = [x, y] ∈ R: x < u ≤ y < v ∨ u < x ≤ v < y

Hence, in order to determine whether a new range r = [u, v] intersects with any of the ranges in R, two conditions have to be checked:

(1) ∃ s = [x, y] ∈ R: x < u ≤ y < v (s left-overlaps r, cf. Figure 5.3a)
(2) ∃ s = [x, y] ∈ R: u < x ≤ v < y (s right-overlaps r, cf. Figure 5.3b)


Figure 5.3: Overlap of ranges: (a) [x, y] left-overlaps [u, v]; (b) [x, y] right-overlaps [u, v].
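Conditions (1) and (2) can be transcribed directly into a brute-force conflict check. This sketch (function names are ours) is for illustration only; the point of this section is to answer the same questions with O(log n) queries:

```python
def left_overlaps(s, r):
    """s = [x, y] left-overlaps r = [u, v]  iff  x < u <= y < v."""
    (x, y), (u, v) = s, r
    return x < u <= y < v

def right_overlaps(s, r):
    """s = [x, y] right-overlaps r = [u, v]  iff  u < x <= v < y."""
    (x, y), (u, v) = s, r
    return u < x <= v < y

def conflicts(R, r):
    """True iff inserting r would intersect some range in R."""
    return any(left_overlaps(s, r) or right_overlaps(s, r) for s in R)

R = [(0, 255), (16, 47)]
print(conflicts(R, (40, 60)))   # True: (16, 47) left-overlaps (40, 60)
print(conflicts(R, (20, 30)))   # False: (20, 30) nests inside (16, 47)
```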

An example is shown in Figure 5.4a. The points corresponding to ranges that intersect with [u, v] all lie in the two rectangular areas. The points corresponding to left-overlapping ranges lie in the smaller left rectangle; those representing right-overlapping ranges are found in the one to the right. In order to check whether a range [u, v] is left-overlapped by any range in R, it is sufficient to determine whether map1(R) contains at least one point located in the left rectangle bounded by xleft = u, xright = v – 1, ytop = u – 1, ybottom = 0. It is easy to see that this can be achieved by calling minXinRectangle(u, v – 1, u – 1); if a value other than null is returned, such a point exists, i.e. r intersects with a range in R (also cf. [73]). In order to see whether [u, v] is right-overlapped by any range in R, one needs to check whether map1(R) contains at least one point located in the semi-infinite rectangle bounded by xleft = v + 1, ytop = v, ybottom = u + 1, as can be seen in Figure 5.4a. (In practice, the rectangle is of course not semi-infinite; its right bound is given by the upper bound of the space of possible ranges, e.g. 2^32 – 1 for IPv4.) Note that this rectangle cannot be directly queried by a south-grounded range query, since that type of query requires that the rectangle be bounded by ybottom = 0. The solution proposed by Lu and Sahni uses a second mapping map2 of the ranges in R onto points in the plane, which is shown in Figure 5.4b. This transformation is orthogonal to the first one: it maps each range [u, v] onto the point (u, 2^W – 1 – v). Geometrically, map2(R) is obtained from map1(R) by a 90-degree clockwise rotation. (If the x-coordinates of the points in map2(R) are not pairwise distinct, an appropriate transformation will also be necessary.) As a result, the rectangular area for checking condition (2) becomes a south-grounded rectangle, as can be seen in Figure 5.4b.
A second PST is used to maintain these points, and it is obvious that calling minXinRectangle(u, v – 1, 2W – 1 – (v+1)) on this structure will return a point if and only if R contains a range that right-overlaps [u, v].

Figure 5.4: (a) Two rectangles must be queried to detect intersections; only the left one is south-grounded. (b) A second, orthogonal mapping of ranges as proposed by Lu and Sahni for querying the second rectangle.

Hence, a new range [u, v] is in conflict with a set R of nonintersecting ranges if and only if at least one of the following conditions is true:

(1) minXinRectangle(u, v – 1, u – 1) exists in map1(R)
(2) minXinRectangle(u, v – 1, 2^W – 1 – (v + 1)) exists in map2(R)

Of course, the second PST storing map2(R) has to be updated in the same way as the original one, i.e. every range has to be inserted into and deleted from both structures. It is clear that an insertion can still be carried out in overall time O(log n), including the test for intersection. Note, however, that the second PST doubles the space taken by the router table as well as the actual time for updates, which is quite unsatisfactory. The extra structure is maintained exclusively for the detection of intersections (or, to be more precise, merely for detecting right-overlaps). The information contained in that PST is redundant, and the extra structure is kept only in order to ensure that condition (2) can also be checked in O(log n) time. We will address this issue in more detail in Section 5.6 and show that the second structure can be completely omitted by verifying condition (2) in an alternative way.
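The following sketch verifies the key geometric fact behind the second mapping by an exhaustive check over a small 8-bit universe: under map2, exactly the right-overlapping ranges land in a south-grounded rectangle. The rectangle bounds below are derived directly from the strict inequalities of condition (2); an implementation that first makes x-coordinates pairwise distinct may use slightly shifted bounds:

```python
W = 8                      # small address width for illustration
TOP = (1 << W) - 1         # 2^W - 1

def map2(x, y):
    """Map the range [x, y] onto the point (x, 2^W - 1 - y)."""
    return (x, TOP - y)

def right_overlaps(s, r):
    (x, y), (u, v) = s, r
    return u < x <= v < y  # condition (2)

def in_south_grounded_rect(p, x_left, x_right, y_top):
    return x_left <= p[0] <= x_right and 0 <= p[1] <= y_top

# For every range s and the query range r = [u, v], membership of map2(s)
# in the rectangle [u+1, v] x [0, 2^W - 1 - (v+1)] must coincide with
# right-overlap of s and r:
u, v = 40, 60
for x in range(0, TOP):
    for y in range(x, TOP + 1):
        geometric = in_south_grounded_rect(map2(x, y), u + 1, v, TOP - (v + 1))
        assert geometric == right_overlaps((x, y), (u, v))
print("map2 rectangle test passed")
```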

Conflict-free ranges

In the case of (general) conflict-free ranges, not only insertions but also deletions may create conflicts. Consider the example in Figure 5.2b: if the small range on top is removed, the remaining two ranges are in conflict, because for points in the overlapping region the most specific range is no longer uniquely defined. While we shall not be concerned with this most general case in the remainder of this work, let us briefly remark that Lu and Sahni have shown how insertions and deletions can still be carried out in overall logarithmic time. However, their structure is blown up considerably: in addition to the second PST, it requires a collection of balanced search trees [73].

5.3.2 Summary

The solution proposed by Lu and Sahni allows the maintenance of sets of ranges under insertions and deletions, including the detection of conflicts, in time O(log n). The detection of the most specific range msr(d) for a query point d can be done by a minXinRectangle query. Hence, all relevant operations can be carried out in time O(log n). Recall, however, that balancing a PST is rather complex, as even a single rotation requires O(log n) time. We have already mentioned the disadvantages resulting from this: the restricted choice of rebalancing methods, the large number of locked nodes when search operations are to be carried out concurrently with updates, and the lack of support for relaxed balancing (cf. Chapter 3.3.4). Furthermore, we have seen that updatable PSTs are not very space-efficient, as they either require 2n – 1 nodes for storing n elements or need to store a considerable amount of extra information in each node.

It would thus be desirable to have a somewhat more update-friendly structure for this problem. This is possible because, in a way, PSTs are almost “overly powerful” for our problem of finding the most specific range for a given address: since minXinRectangle is the only interesting range query, we do not require that other range queries be supported efficiently. As we have seen in the last chapter, by replacing PSTs with PSPs the only thing that is sacrificed is the O(log n + r) complexity for enumerateRectangle (which is not required). On the other hand, we gain a constant time bound for rotations (which are often needed during updates). Although this will not reduce the overall asymptotic complexity of an update operation, we can expect to improve the actual time required for updates. Furthermore, it frees us from the above disadvantages of PSTs. Before we give a more detailed comparison of the data structures in this context, we would like to describe another, even simpler data structure for solving the given problem. Just like PST and PSP, it is an augmentation of an underlying leaf-oriented search tree.

5.4 Min-augmented range trees

Min-augmented range trees were introduced by Datta and Ottmann as an alternative and conceptually simple structure for maintaining sets of ranges efficiently under updates and msr queries [26].

5.4.1 Definition of the structure

A min-augmented range tree (MART) maintaining a set of points with pairwise different x-coordinates stores the points at the leaf nodes such that it is a leaf-oriented search tree for the x-coordinates of the points. Each internal node also carries a pair of values; however, rather than storing complete points, internal nodes contain split keys and information about the minimal y-coordinate of any point stored below that node. In the split key field we store the maximum x-coordinate occurring in the left subtree, just as is done in PST and PSP. The min field carries the minimum y-coordinate of any point stored in a leaf of the subtree rooted at that node. A min-augmented range tree storing a set S of points may be visualized as shown in Figure 5.5. The construction of a MART is very straightforward, as it is virtually identical to the construction of a tournament tree described in Chapter 3.3.2. In fact, it is even simpler, since only the y-values and not complete points have to be promoted up the tree. Although a MART requires 2n – 1 nodes for storing n elements, it takes less space than a PST, since each node only contains two fields (excluding information for balancing).


Figure 5.5: Min-augmented range tree storing 8 points. In internal nodes, the top value represents the minimum y-value of the subtree, the bottom one the split key.
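The bottom-up, tournament-style construction can be sketched as follows (class and field names are ours; the tree is built over x-sorted points, with n a power of two for simplicity):

```python
class Node:
    def __init__(self, split, min_y, left=None, right=None, point=None):
        self.split = split   # max x-coordinate in the left subtree (or own x at a leaf)
        self.min = min_y     # min y-coordinate of any point below this node
        self.left, self.right, self.point = left, right, point

def build_mart(points):
    """Build a MART over points sorted by pairwise distinct x.
    Returns (root, max_x_of_subtree); only y-values are promoted upward."""
    if len(points) == 1:
        x, y = points[0]
        return Node(split=x, min_y=y, point=(x, y)), x
    mid = len(points) // 2
    left, lmax = build_mart(points[:mid])
    right, rmax = build_mart(points[mid:])
    node = Node(split=lmax, min_y=min(left.min, right.min),
                left=left, right=right)
    return node, rmax

# The 8 points of Figure 5.5:
pts = [(1, 2), (2, 4), (3, 8), (4, 5), (5, 4), (6, 9), (7, 1), (8, 3)]
root, _ = build_mart(pts)
print(root.min, root.split)  # 1 4: the root values shown in Figure 5.5
```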

5.4.2 Range queries

This section describes how to answer a minXinRectangle(xleft, ∞, ytop) query, that is, to find the point p with minimal x-coordinate in the semi-infinite range x ≥ xleft and with a y-coordinate below the threshold value ytop. Before listing the algorithm, we give an informal description of the procedure which can be understood intuitively and from which the complexity can be seen immediately. In order to find the desired point p, we first carry out a search for the boundary value xleft. It ends at a leaf storing a point with minimal x-coordinate greater than or equal to xleft (Figure 5.6). If this point has a y-coordinate below the threshold value ytop, we are done. Otherwise, we retrace the search path for xleft bottom-up and inspect the roots of subtrees falling completely into the semi-infinite x-range. These roots appear as right children of nodes on the search path (see Figure 5.6). Among them we determine the first one from below (which is also the leftmost one) that has a min field value below the threshold ytop. This subtree must store the answer to minXinRectangle(xleft, ∞, ytop) in one of its leaves. In order to find it, we recursively proceed to the left child of the current node if its min field shows that the subtree contains a legal point, i.e. if its min field is (still) below the threshold, and proceed to the right child only if we cannot go to the left child (because the min field of the left child is above the threshold ytop). It should be clear that in this way we can find the desired point in a time that is proportional to the height h of the underlying leaf-search tree. The number of nodes to be inspected is bounded by h+1 for the initial search for xleft, plus at most h for retracing that path, plus at most 2·h – 1 for the final search in the correct subtree. 
Note that in an actual implementation it is more efficient to truncate the initial search for xleft and begin retracing the path as soon as the min field of the currently inspected node is above ytop. The following is a recursive description of the algorithm in pseudo-code. It is called with the root of the tree as the initial value for N.

Figure 5.6: The search path of the query minXinRectangle(35, 80, 34) in a min-augmented range tree. Visited nodes are highlighted; the pink node is the result returned by the query.

Algorithm 5.1: minXinRectangle(xleft, ∞, ytop) for a min-augmented range tree
Input: a starting node N, rectangle bounds xleft and ytop
Output: the node storing the leftmost point in the given rectangle

 1. MinXinRectangle(N, xleft, ytop)
 2.   if minN ≤ ytop
 3.     if N.isLeaf and xleft ≤ xN
 4.       return N
 5.     if splitN ≥ xleft
 6.       M = MinXinRectangle(N.left, xleft, ytop)
 7.       if M ≠ null
 8.         return M
 9.     return MinXinRectangle(N.right, xleft, ytop)
10.   return null

Only those nodes with a min value below the threshold ytop are inspected further (cf. line 2 – we assume that the min value of a leaf node equals the y-value of the point stored there). As soon as a leaf with a valid x-value is reached (line 3), we return it as our solution (line 4). For internal nodes, we proceed to the left child if the current split key indicates that there may be a valid point in the left subtree (line 5). If the left subtree contains no valid point (i.e. if the condition in line 5 or the one in line 7 is false), we proceed to the right subtree (line 9). It is quite apparent that this algorithm does just what we have described above and that, with only a few minor modifications, it can be extended to answer general minXinRectangle(xleft, xright, ytop) queries. An iterative version of the algorithm, which has turned out to be about 50% more efficient than the recursive variant in an actual implementation, can be found in Appendix A.2.
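A direct transcription of Algorithm 5.1 into runnable form might look as follows. This is a sketch: the node layout and helper names are ours, and leaves are handled by an explicit test rather than by recursing into null children:

```python
def build(points):
    """MART over x-sorted points with distinct x; returns (node, max_x)."""
    if len(points) == 1:
        x, y = points[0]
        return {'min': y, 'point': (x, y)}, x
    mid = len(points) // 2
    left, lmax = build(points[:mid])
    right, rmax = build(points[mid:])
    return {'min': min(left['min'], right['min']), 'split': lmax,
            'left': left, 'right': right}, rmax

def min_x_in_rectangle(N, x_left, y_top):
    """Leftmost point with x >= x_left and y <= y_top (Algorithm 5.1)."""
    if N['min'] <= y_top:                                      # line 2
        if 'point' in N:                                       # line 3: leaf
            if x_left <= N['point'][0]:
                return N['point']                              # line 4
            return None
        if N['split'] >= x_left:                               # line 5
            M = min_x_in_rectangle(N['left'], x_left, y_top)   # line 6
            if M is not None:                                  # lines 7-8
                return M
        return min_x_in_rectangle(N['right'], x_left, y_top)   # line 9
    return None                                                # line 10

pts = [(1, 2), (2, 4), (3, 8), (4, 5), (5, 4), (6, 9), (7, 1), (8, 3)]
root, _ = build(pts)
print(min_x_in_rectangle(root, 3, 5))  # (4, 5): leftmost point with x >= 3, y <= 5
```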

5.4.3 Balancing min-augmented range trees

Since the complexity of the search is O(h), where h is the height of the tree, it is again desirable to keep the underlying leaf-search tree balanced. All we need to show is that the augmented information stored in the min fields of the nodes can be maintained efficiently during update and rebalancing operations. In order to do this, it is appropriate to think of an update operation for the underlying balanced leaf-search tree as consisting of two successive phases. In the first phase, a point is inserted or deleted as in a normal (unbalanced) binary leaf-search tree, and in the second phase we retrace the search path and carry out rebalancing operations, if necessary. In order to update the information stored in the min fields of internal nodes, the first phase is extended as follows. We retrace the search path and play matches starting from the leaf affected by the update operation: we recursively consider the min fields of the current node and its sibling and store the minimum of both in the min field of their common parent. In this way the information stored in the min fields is correctly updated after the first phase. In order to show that this information can also be maintained during the second phase, i.e. during rebalancing, let us consider a right rotation (see Figure 5.7). Here we assume that a, b, c, d, e are the split keys, in increasing x-order, stored in the internal nodes shown. The values of the min fields are u, v, w, x, y before the rotation and u, v’, w, x’, y after the rotation. Obviously, u, w and y do not have to be changed, because their subtrees are not affected by the rotation. We just have to update the min fields of nodes A and B. Note, however, that the min value that has to be stored at node A is (still) the overall min value x of all subtrees 1, 2, and 3. Hence, it is safe to set x’ = x. Choosing v’ = min{w, y} will finally restore the min fields correctly.
Note the difference between maintaining the min fields during rotations and maintaining the heap order of the nodes stored in a PST (cf. Chapter 3.3.4): no trickle-down process is initiated, and rotations remain constant-time operations. Hence, just as for PSP, we can freely choose an underlying balancing scheme for min-augmented range trees. Moreover, rotations and the process of maintaining the augmented min-information become strictly local, which makes it possible to decouple the update and rebalancing operations as in the case of relaxed balanced trees [64].

Figure 5.7: Right-rotation in a min-augmented range tree.
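The constant-time repair of the min fields during a right rotation can be sketched as follows (node layout and names are ours; u, w, y and the updates x’ = x, v’ = min{w, y} refer to Figure 5.7):

```python
class MNode:
    def __init__(self, left=None, right=None, min_y=None):
        self.left, self.right, self.min = left, right, min_y

def rotate_right(b):
    """Right rotation around b; b.left becomes the new subtree root.
    Min-field repair is O(1): no trickle-down is needed."""
    a = b.left
    b.left = a.right
    a.right = b
    # x' = x: the new root keeps the old root's overall minimum;
    # v' = min{w, y}: b's min is recomputed from its (unchanged) children.
    a.min, b.min = b.min, min(b.left.min, b.right.min)
    return a

# Subtrees 1, 2, 3 with minima u = 5, w = 2, y = 7 (cf. Figure 5.7):
t1, t2, t3 = MNode(min_y=5), MNode(min_y=2), MNode(min_y=7)
a = MNode(left=t1, right=t2, min_y=2)   # v = min{u, w} = 2
b = MNode(left=a, right=t3, min_y=2)    # x = overall minimum
root = rotate_right(b)
print(root.min, root.right.min)  # 2 2: x' = x = 2 and v' = min{2, 7} = 2
```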

5.5 Comparison of the data structures

At this point, it is useful to summarize the similarities and differences of the data structures we have discussed so far. In addition to the theoretical worst-case bounds, we provide simulation results for average cases.

5.5.1 Theoretical bounds

Table 5.1 provides a comparative overview of priority search trees, priority search pennants and min-augmented range trees. The node size lists the number of numerical fields required by a node. It does not include extra information required for balancing, such as color, weight or height balance factor. The size in bits is the memory required to store the numerical fields in all nodes, where 32 bits per value were assumed for IPv4 and 128 bits for IPv6. Note that this number has to be doubled for the key and split key fields when a transformation of values (such as described in 3.1) is necessary. Obviously, the priority search pennant is the most space-efficient structure, as it only requires n nodes. Although min-augmented range trees require 2n – 1 nodes just like priority search trees, each node only stores two values; hence, MARTs require less memory. Regarding updates and minXinRectangle (required for msr queries), all three structures share the same asymptotic bounds. The bounds for enumerateRectangle are only listed for completeness, as this operation is not required in our application; it has not been analyzed for MARTs so far. The last two lines summarize our analyses of the rebalancing operations. PSTs, with their O(log n) cost for a single rotation, are restricted in the choice of a balancing scheme, whereas PSPs and MARTs allow for a much wider variety. Note that this also allows the latter two to be balanced to a better height than PSTs; while red-black trees may result in a tree of height 2 · log2 n, it has been shown, for instance, that the maximum height of an AVL tree or a tree balanced by internal path reduction is bounded by 1.44… · log2 n (cf. [41]).

Table 5.1: Overview of different properties and complexities for PST, PSP, and MART.

                           PST              PSP                         MART
Size for n points          2n – 1 nodes     n nodes                     2n – 1 nodes
Node size (num. fields)    3 fields         3 fields                    2 fields
Size in bits: IPv4         192·n            96·n                        128·n
Size in bits: IPv6         768·n            384·n                       512·n
Updates                    O(log n)         O(log n)                    O(log n)
minXinRectangle            O(log n)         O(log n)                    O(log n)
enumerateRectangle         O(log n + r)     Θ(r·(log n – log r + 1))    ---
Single rotation            O(log n)         O(1)                        O(1)
Balancing scheme           restricted       flexible                    flexible

5.5.2 Simulation results for average cases

So far, when discussing running times, we have looked at worst-case scenarios. In practical applications such as the one discussed here, however, it is the average case that is mainly relevant for the choice of a particular data structure. As was outlined in Chapter 4.9, if a theoretical average-case analysis seems beyond reach, it is common to resort to simulations and use the experimental results to shed light on the average-case behavior. An additional advantage of such tests is that they can be adapted to the actual application rather than assuming all theoretically possible situations. For instance, in our particular scenario the minXinRectangle(xleft, xright, ytop) queries used for the msr search are of a very specific nature, where always xright = ∞ and ytop = xleft. Furthermore, in each node N we have yN ≤ xN, since all points stored in the structure are on or below the main diagonal. This interdependence of x- and y-values may also have an effect on the performance. In our scenario, we are mainly interested in the following variables: The average number of nodes manipulated during an update operation (including rebalancing) is a good measure to compare the actual cost of insertions and deletions. The average number of nodes inspected during a minXinRectangle query is important in order to assess the actual lookup time. For comparing balancing schemes, the average number of rotations triggered by an update operation can be of interest. It may also be interesting to compare measures such as the average height and internal path length. In our simulations we have compared the following structures: PST as used in the approach by Lu and Sahni [73] and PST as suggested by McCreight [78] (referred to as “PST_2”), both of which are balanced with red-black trees, as well as MART and PSP in two versions each (balanced by internal path reduction [41] and with red-black trees).
One test run consisted of generating one instance of each structure (of size n) by first inserting the default range [0, 2³² – 1] and then performing n – 1 successive insertions of randomly selected nonintersecting ranges (from the IPv4 address space), using the same ranges for every structure. Then one million msr(d) queries were carried out on each structure, with the values of d equally distributed over the whole IPv4 address space. Finally, a series of random deletions was carried out. For each insertion and deletion, the number of rotations triggered and the number of node manipulations were counted. We have deliberately neglected the number of pure node inspections, since those should be more or less identical for all structures (the insert/delete position on the leaf level has to be determined), and instead have concentrated on the differences between the structures. Nevertheless, this measure requires some further explanation: a node is manipulated whenever any of its pointers to other nodes or any of its contents are updated (with the exception of color or weight updates, since one can argue that only the structural changes of rebalancing operations have to be carried out before new search queries can be processed, while other information can be updated later, in parallel to search queries). Multiple manipulations of one node occurring at the same time, such as simultaneous updates of pointers and contents during a rotation, are counted as one manipulation. However, if the same node is manipulated several times independently during an update operation (e.g. its content is updated during an insertion and later, during the same insertion, the same node is rotated), each manipulation is counted separately. After each structure was created, its height and internal path length were measured.
(Note that for PSP, only the tree below the top node was considered, since the top node itself never has to be inspected during searches, as it always contains the default range and it only has one child.) For the msr(d) searches, the length of the search path, i.e. the number of inspected nodes during the search, was measured. Tests were made for different sizes n of the structures; for each size, one hundred of the above test runs were carried out; hence, the average search paths in Table 5.2 are averages over 100,000,000 searches each, and for the measures on insertions and deletions, each number is the average over 100·n measured values.

Table 5.2: Overview of average search path lengths (for msr queries) and node manipulations per update operation.

Number of elements   500       1000      2000      5000      10000     20000     50000     100000

Average search path
PST                  10.84385  11.97032  13.08479  14.54128  15.62774  16.70334  18.12349  19.14308
PST_2                 9.86812  10.99020  12.09834  13.54233  14.66064  15.70629  17.12135  18.11471
MART_IPR             12.43751  13.58151  14.70415  16.17944  17.29168  18.36328  19.77570  20.75329
MART_RB              12.50034  13.65053  14.79566  16.27953  17.37603  18.46475  19.89255  20.93622
PSP_IPR              10.11708  11.18305  12.25476  13.66596  14.73814  15.77180  17.15355  18.12058
PSP_RB               10.17207  11.25630  12.34265  13.76235  14.81836  15.86958  17.26567  18.28495

Node manipulations per insert
PST                   9.41198   9.42188   9.48112   9.47493   9.48606   9.48019   9.47830   9.49707
PST_2                12.76194  13.75391  14.81301  16.15099  17.17379  18.19161  19.53612  20.52306
MART_IPR              6.88328   6.85665   6.86598   6.86482   6.85514   6.85133   6.84964   6.85123
MART_RB               6.72138   6.69818   6.71047   6.70283   6.69877   6.69259   6.69206   6.68332
PSP_IPR               4.62710   4.60559   4.61215   4.61276   4.60454   4.60253   4.60014   4.60176
PSP_RB                4.10296   4.08531   4.09010   4.08170   4.08018   4.07491   4.07319   4.06747

Node manipulations per delete
PST                   8.83692   8.99941   9.03846   9.09688   9.10580   9.11823   9.12686   9.13244
PST_2                13.47702  14.56888  15.63341  17.00824  18.03104  19.02062  20.32044  21.29024
MART_IPR              6.39543   6.43983   6.46524   6.48159   6.48365   6.48883   6.48996   6.49054
MART_RB               6.54394   6.59323   6.61581   6.63318   6.63683   6.64337   6.64306   6.64465
PSP_IPR               4.92863   4.95923   4.96595   4.97410   4.98104   4.98250   4.98587   4.98817
PSP_RB                5.08382   5.12521   5.13086   5.14284   5.14382   5.14614   5.14970   5.15122

The experimental results are listed in Table 5.2. The average search path lengths for msr(d) queries are visualized in Figure 5.8. When interpreting this measure, it should be kept in mind that the actual time taken by an msr(d) query is also influenced by the time per node inspection, which roughly corresponds to the number of values stored in a node that have to be compared during the inspection. MART is the simplest structure in this respect, followed by PST, then PSP (because the dominated subtree has to be determined through an additional comparison), and finally PST_2. Interestingly, this order is just the reverse of that for the search path length. The experimental tests in Section 5.7 reveal how this inverse relationship between search path length and operations per inspected node affects the actual running time. The choice of balancing scheme for MART and PSP turns out to be of rather low importance for the search time: for msr(d) queries, the average search path with internal path reduction is less than 1% shorter than with red-black trees. For updates, both PSPs and MARTs, as expected, require fewer node manipulations than PSTs. If we compare the structures under the same balancing scheme, MARTs require, on average, about 30% fewer node manipulations during an insertion than PSTs; for PSPs, the reduction is as much as 57%. This discrepancy can largely be attributed to the fact that PSPs have only about half as many nodes as PSTs and MARTs. It is also noteworthy that for all structures except PST_2, the number of node manipulations grows at an extremely low rate with increasing tree size, indicating that rotations take place mainly at the fringe of the tree (see Figure 5.9, left).
This is somewhat different in the worst case: for comparison, we have deliberately chosen a “bad” insertion sequence (each inserted range is properly included in the next one) triggering as many rotations as possible. The required node manipulations per insertion can be seen in Figure 5.9 (right). From n = 1,000 to n = 100,000, there is only a 0.5 to


Figure 5.8: Average number of visited nodes in a most specific range query.


Figure 5.9: Average number of node manipulations during an insert operation. Left: average over random sequences of insertions; right: worst-case sequence of insertions.

1% increase for PSP and MART, while for PST the same measure went up by about 50% and 77%, respectively, which nicely illustrates the difference between the O(log n) and O(1) costs of a rotation. For deletions, the situation is very similar, as can be seen in Figure 5.10. The number of node manipulations during deletions in a (red-black balanced) PSP is about 43% lower than for a PST; for MART, the reduction is still 27%. Balancing by internal path reduction requires, on average, even slightly fewer rotations (and hence manipulations) per delete operation than red-black trees. The reason for the different complexity of the PST_2 implementation is not a higher number of structural changes, i.e. pointer updates, but the many changes of values inside the nodes during updates. Hence, this effect may not be as drastic when actual running time is measured.


Figure 5.10: Average number of node manipulations during a delete operation.

Summing up, if we compare the number of manipulated nodes per insertion and deletion between PST, MART and PSP, the latter two have advantages over PST regarding update operations. In particular, they seem to be more robust against worst-case sequences of updates (cf. Figure 5.9, right). The complexity of msr(d) queries in these structures seems comparable to that of PST. We have also run benchmark tests comparing the actual running times of all the above data structures; the results are presented in Section 5.7. First, however, we will point out how the router table design suggested by Lu and Sahni can be further improved by removing the redundant second structure.

5.6 Improvement of conflict detection

The router table design described in Section 5.3 used a second PST in order to detect intersections and prevent conflicts in sets of nonintersecting ranges. Recall that this second structure was required merely to verify the second of the following conditions:

(1) ∃ s = [x, y] ∈ R: x < u ≤ y < v (s left-overlaps r, cf. Figure 5.3a)
(2) ∃ s = [x, y] ∈ R: u < x ≤ v < y (s right-overlaps r, cf. Figure 5.3b)

We will now show that such an additional structure is not required and that the verification of condition (2) can also be achieved in O(log n) time by a single (and exceedingly simple) query on the original structure, plus one comparison operation. Our argument is general in the sense that it is not restricted to a specific data structure (in this case, priority search trees) but holds for any structure that supports queries of the type minXinRectangle. Recall that condition (2) is true if any point of map1(R) is inside the hatched rectangle in Figure 5.11 (left). Since the rectangle searched by minXinRectangle(xleft, xright, ytop) is always implicitly bounded by ybottom = 0, we consider the extended rectangle bounded by xleft = v+1, ytop = v, and ybottom = 0, as illustrated in Figure 5.11 (right). Note that calling minXinRectangle for this rectangle will always return an existing range as its result, since we have assumed that R includes the default range that matches all destination addresses. We observe that our original query (i.e. whether the hatched rectangle in Figure 5.11 (left) contains a point of map1(R)) can be translated into the question whether or not the extended rectangle in Figure 5.11 (right) contains any point of map1(R) whose y-coordinate is greater than u. The central point of our argument is the following theorem, stating that the leftmost point in the larger rectangle is always topmost, i.e. there is no other point with a larger y-value in the rectangle.

Theorem 5.2: Let L = minXinRectangle(v+1, ∞, v), i.e. L is the leftmost point of map1(R) in the query rectangle. Then the rectangle contains no other point of map1(R) that is above L, i.e. whose y-coordinate is greater than yL. (That is, L is the unique topmost-leftmost point in the rectangle.)

Figure 5.11: To check whether the original rectangle (cf. the hatched area left) contains any points, the extended rectangle (right) is queried with minXinRectangle.

Proof: Assume that there is a point P = (xP, yP) ∈ map1(R) which is in the rectangle and above L, i.e. yP > yL. We know that xL < xP, since L is the leftmost point in the rectangle and no two points have the same x-coordinate. Since both L and P are in the query rectangle, we also know that xL, xP ≥ v+1 and yL, yP ≤ v. Summarizing the above inequalities yields the following order of values:

yL < yP ≤ v < xL < xP

Recall the correspondence of points and ranges: if P = (xP, yP), then the corresponding range is map1⁻¹(P) = [yP, xP]. The above order clearly shows that map1⁻¹(L) = [yL, xL] and map1⁻¹(P) = [yP, xP] are intersecting. This contradicts our basic precondition that the set R of ranges is nonintersecting. Hence, our initial assumption must be wrong. □
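Theorem 5.2 can also be checked empirically on small nonintersecting range sets. The following brute-force sketch (class and method names are ours; it assumes the map1 convention [lo, hi] → (hi, lo)) verifies that, for every query bound v, the leftmost point of map1(R) in the rectangle xleft = v+1, ytop = v is also topmost:

```java
/** Empirical check of Theorem 5.2 by exhaustive brute force:
 *  among the points of map1(R) in the rectangle xleft = v+1, ytop = v,
 *  the leftmost point must also be the topmost one. */
public class TopmostLeftmostCheck {

    // true iff the leftmost point in the rectangle has the maximum y-value there
    static boolean leftmostIsTopmost(long[][] ranges, long v) {
        long[] leftmost = null;
        long maxY = Long.MIN_VALUE;
        for (long[] r : ranges) {
            long x = r[1], y = r[0];          // map1([lo, hi]) = (hi, lo)
            if (x >= v + 1 && y <= v) {       // point lies in the query rectangle
                if (leftmost == null || x < leftmost[0]) leftmost = new long[] { x, y };
                maxY = Math.max(maxY, y);
            }
        }
        return leftmost == null || leftmost[1] == maxY;
    }

    public static void main(String[] args) {
        // nested or disjoint (hence nonintersecting) ranges, including a default range
        long[][] ranges = { {0, 255}, {8, 63}, {16, 31}, {20, 23}, {100, 199} };
        for (long v = 0; v < 256; v++)
            if (!leftmostIsTopmost(ranges, v))
                throw new AssertionError("counterexample at v = " + v);
        System.out.println("Theorem 5.2 holds for all v in [0, 255]");
    }
}
```

Such a check is of course no substitute for the proof above; it merely illustrates the geometric claim on a concrete example.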

Theorem 5.3: If minXinRectangle(v+1, ∞, v) returns a point P whose y-value yP ≤ u, then the (semi-infinite) rectangle defined by xleft = v+1, ytop = v, ybottom = u+1 does not contain any points of map1(R).

Proof: Assume the smaller rectangle is not empty. Then it must contain a leftmost point Q = (xQ, yQ) ∈ map1(R) with yQ ≥ u+1. Since Q also lies in the larger query rectangle and is above P (yQ ≥ u+1 > yP), the point P returned by minXinRectangle(v+1, ∞, v) would not be topmost in that rectangle, contradicting Theorem 5.2. □

Together with the above remarks, this leads directly to the following condition for intersections:

Theorem 5.4: Let R be a set of nonintersecting ranges represented as points in the plane (by map1) in a data structure supporting minXinRectangle(xleft, xright, ytop) queries. A range r = [u, v] ∉ R intersects with R if and only if one of the following conditions is met:

(1) minXinRectangle(u, v – 1, u – 1) exists

(2) minXinRectangle(v+1, ∞, v) returns a point P such that yP > u

Thus, for nonintersecting ranges, a single structure supporting minXinRectangle queries is sufficient for conflict detection. Since the performance of this query is bounded logarithmically for PST, PSP and MART, the asymptotic time for insertions remains in O(log n). Note that removing the additional structure not only halves the space requirements of Lu and Sahni’s PST solution but also reduces the actual time for updates by almost the same factor, since each insertion or deletion previously required an update of both structures. This can be seen in our benchmark results presented in the next section.
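The complete conflict test of Theorem 5.4 can be sketched as follows. This is an illustration only: the brute-force minXinRectangle stands in for the real O(log n) structures, and the class and method names are our own.

```java
/** Sketch of the conflict test of Theorem 5.4 for a set R of nonintersecting ranges,
 *  assumed to include the default range. Points are map1 images {x, y} = {hi, lo}.
 *  With a PST, PSP or MART answering minXinRectangle in O(log n),
 *  intersects() runs in O(log n) overall. */
public class ConflictSketch {

    static long[] minXinRectangle(long[][] pts, long xleft, long xright, long ytop) {
        long[] best = null;
        for (long[] p : pts)
            if (p[0] >= xleft && p[0] <= xright && p[1] <= ytop
                    && (best == null || p[0] < best[0]))
                best = p;
        return best;
    }

    /** Does the new range [u, v] intersect some range of R (given as map1 points)? */
    static boolean intersects(long[][] pts, long u, long v) {
        // condition (1): some s = [x, y] left-overlaps [u, v], i.e. x < u <= y < v
        if (minXinRectangle(pts, u, v - 1, u - 1) != null) return true;
        // condition (2): some s right-overlaps [u, v]; by Theorems 5.2/5.3 a single
        // query on the extended rectangle plus one comparison suffices
        long[] p = minXinRectangle(pts, v + 1, Long.MAX_VALUE, v);
        return p != null && p[1] > u;
    }

    public static void main(String[] args) {
        // R = {[0, 255], [16, 31]} as map1 points (hi, lo)
        long[][] pts = { {255, 0}, {31, 16} };
        System.out.println(intersects(pts, 20, 40));   // true:  [16, 31] left-overlaps [20, 40]
        System.out.println(intersects(pts, 8, 20));    // true:  [16, 31] right-overlaps [8, 20]
        System.out.println(intersects(pts, 64, 127));  // false: [64, 127] is nonintersecting
    }
}
```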

5.7 Experimental results

A number of tests were conducted to compare the performance of the data structures in an actual implementation. We were interested in update operations (insertion and deletion of ranges) as well as most specific range queries. We used our existing Java implementations of PST, PSP and MART. For comparison, the double-PST router table design for nonintersecting ranges in [73] was implemented as well. All structures were balanced by the red-black tree balancing scheme. A separate test had shown that PSP and MART balanced by internal path reduction take almost twice as long for update operations, while there is almost no gain in search times (less than 1% for PSP); hence these versions were not considered any further. When comparing the two PST versions, it turned out that update times are fairly similar: inserts were slightly faster in the McCreight variant (PST_2), while for deletes, the standard version performed slightly better. However, contrary to our expectation based on the simulation results (cf. Section 5.5), msr searches on the smaller PST_2 implementation took almost twice as long as on the standard version (see Figure 5.12). Apparently, the more complex node structure, and hence the greater number of comparisons per node visit, influences the actual running time more than the length of the search path, which is smaller in these trees. It was therefore decided to consider only the standard variant in the comparison with the other data structures. All tests were carried out on a 1.7 GHz AMD Athlon® processor with 256 KB L2 cache and 1 GB RAM, running a Linux system.

Search time (in µsec):

Size     1000    2000    5000    10,000  20,000  50,000  100,000
PST      0.4209  0.4549  0.4999  0.5457  0.5674  0.6293  0.6934
PST_2    0.7872  0.8603  0.9616  1.0490  1.1188  1.2261  1.3214

Figure 5.12: Comparison of search times for msr queries between the two variants of priority search trees.

The tests were carried out for prefix ranges as well as nonintersecting ranges. For both types, sets of randomly created ranges were used. For the minXinRectangle queries (which are used for msr search and conflict detection), the (faster) iterative version was used in all data structures.

5.7.1 Prefix ranges

In order to compare different sizes of the data structures, random range sets of 1K, 2K, 5K, 10K, 20K, 50K, and 100K ranges were created. For each size, 20 different sets were produced. A test run consisted of:

(1) Successive insertion of all ranges into an initially empty structure. The total time was measured and divided by the number of ranges.
(2) 4.3 million msr queries equally distributed over the IPv4 address space. Again, the time for all queries was measured and divided by the number of queries.
(3) Successive deletion of 33% of the ranges. These ranges had been determined beforehand, so that the same nodes were deleted from every structure. The total time for deletions was measured and divided by the number of deletions.

Each test was repeated 20 times (with different range sets) for each data structure and each of the above sizes. The results listed in the tables below the diagrams in Figure 5.13 are the averages of the 20 iterations. For all operations, both MART and PSP perform better than PST. For sets of 10,000 ranges or more, PSP requires roughly 27% less time for both insertion and deletion, while MART saves about 10% for insertions and 15-20% for deletions, compared to PST. However, the improvement in update times was not as drastic as had been expected from the simulation results, where the number of node manipulations was compared (cf. Section 5.5.2).

Insert time (in µsec):

Size     1000    2000    5000    10,000  20,000  50,000  100,000
PSP      1.4175  1.2480  1.4172  1.7475  2.2112  2.8504  3.4467
MART     1.0576  1.1779  1.5807  1.8712  2.3540  3.1678  4.0707
PST      1.6067  1.5558  1.9847  2.5346  3.1266  4.0565  4.7442

Delete time (in µsec):

Size     1000    2000    5000    10,000  20,000  50,000  100,000
PSP      0.9953  1.0021  1.2967  1.6302  2.0068  2.6566  2.9637
MART     0.9712  1.1242  1.7784  2.3326  2.7832  3.2478  3.6580
PST      1.3921  1.4748  2.0971  2.5570  2.9665  3.6298  4.0485

Search time (in µsec):

Size     1000    2000    5000    10,000  20,000  50,000  100,000
PSP      0.2726  0.2939  0.3232  0.3436  0.3649  0.3964  0.4233
MART     0.2587  0.2736  0.2954  0.3106  0.3257  0.3485  0.3667
PST      0.4209  0.4549  0.4999  0.5457  0.5674  0.6293  0.6934

Figure 5.13: Average times required for insertion, deletion and msr queries in sets of randomly generated prefix ranges.


The greatest improvement was found for most specific range queries. Here, PSP performed 35% better than PST, and MART even saved 45% of the time. Judging from the simulation results, the strong performance of MART was particularly unexpected, as this is the only structure where the length of the search path for msr queries is always in Ω(h), even in the best case. This is because all ranges are stored in the leaf nodes, whereas in PSP and PST, the search may end after inspecting the first two nodes in the best case. Apparently, the number of inspected nodes is less crucial for the actual time than the number and type of comparisons inside each node; hence, the min-augmented range tree can benefit from its much simpler node structure.

5.7.2 Nonintersecting ranges

Benchmark tests were also run for sets of nonintersecting ranges. In addition to the above structures, the double PST of Lu and Sahni was included to see the actual savings of our improvements compared to the previous solution in [73]. The test scenario was similar to the one used for prefix ranges; however, instead of prefix ranges (which are always nonintersecting), sets of arbitrary nonintersecting ranges were randomly created. Also recall that a modified insert procedure has to be used, which includes the test for intersection of the new range with the existing ranges (cf. Section 5.6). As expected, update times were reduced approximately by half through the omission of the second query structure (see Figure 5.14). Deletion time for the two-PST structure is slightly more than twice the time required for one PST, which is surprising but in line with the observation by Lu and Sahni, who explain it by a “disproportionate increase of cache misses” due to the higher total memory requirements [73]. Interestingly, while PSPs are faster than MARTs for deletions (for tree sizes of 10,000 or more), they are slower for insertions. Apparently, the better performance of MART for minXinRectangle (which is used twice during conflict detection) accounts for this phenomenon. Again, the time required for msr queries is drastically lower for PSP and MART than for PST. Surprisingly, the improvement here is more than 50% (compared to 35% and 45% for prefix ranges).

Insert time (in µsec):

Size       1000    2000    5000    10,000  20,000  50,000   100,000
MART       1.7414  1.9064  2.4399  2.8606  3.5862  4.5433   5.2057
PSP        2.3737  2.6026  2.8613  3.3095  3.9812  4.9235   5.6943
PST        2.7220  3.1233  4.0684  4.8770  5.8660  7.1822   8.4096
DoublePST  3.8894  4.7519  6.7208  8.2506  9.0567  12.3425  14.2651

Delete time (in µsec):

Size       1000    2000    5000    10,000  20,000  50,000  100,000
MART       0.9620  1.1237  1.7426  2.2996  2.7620  3.2908  3.6379
PSP        1.1923  1.3299  1.8036  2.1131  2.4534  2.9129  3.2838
PST        1.3498  1.4261  2.0812  2.5600  3.0129  3.6499  4.1177
DoublePST  2.7198  3.3459  4.8448  5.7425  6.7290  7.9619  9.1548

Search time (in µsec):

Size       1000    2000    5000    10,000  20,000  50,000  100,000
MART       0.2220  0.2428  0.2666  0.2834  0.3017  0.3291  0.3512
PSP        0.2700  0.2882  0.3199  0.3395  0.3659  0.4022  0.4245
PST        0.5640  0.5977  0.6568  0.7029  0.7339  0.8078  0.8660
DoublePST  0.5722  0.6162  0.6787  0.7060  0.7589  0.8534  0.9178

Figure 5.14: Average times for insert, delete and msr queries in sets of randomly generated nonintersecting ranges.


5.8 Conclusion and future work

This chapter has presented a practical application of priority search queues in the domain of IP packet classification. We have seen that the problem of finding, for a given destination address d, the most specific range containing d corresponds to a range query of the type minXinRectangle on a set of points. For this particular problem, the PSTs employed in a previous solution offer no advantage over simpler structures such as PSPs or MARTs. On the contrary, the latter structures, with their greater efficiency regarding rebalancing, result in a more efficient implementation of the router table design proposed by Lu and Sahni [73]. Moreover, we have shown how that design can be further improved for nonintersecting ranges by removing a redundant part of the data structure. Experimental results show that in addition to improved update times, the actual time required by most specific range queries is reduced significantly. Lu and Sahni also discuss general conflict-free range sets (cf. Figure 5.2) [73]. In such a set, not only the insertion but also the deletion of a range may create a conflict. Lu and Sahni have extended their two-PST data structure by an additional collection of balanced search trees in order to detect so-called resolving subsets for two intersecting ranges, and they show that with this structure, conflict-free sets can be maintained with O(log n) search and update times. It is unclear whether for conflict-free sets the second PST (or other PSQ structure) can also be completely omitted. We have been able to show that one PSQ structure (plus the collection of balanced search trees) is sufficient as long as only insertions are carried out and no nodes are deleted. A sketch of the proof can be found in Appendix A.4. Additional research is required to see whether deletions can also be handled efficiently in this simplified design.

Part II

Effectiveness of Algorithm Visualization for Learning

Chapter 6

Learner engagement with algorithm visualizations

This chapter provides an overview of research on the effectiveness of algorithm visualization in computer science education. The state of the art in this field is summarized. Particular attention is given to the engagement taxonomy and the hypotheses involved, which have formed a common research framework for recent studies on algorithm animation effectiveness, including our own study described in Chapter 7.

6.1 Introduction

In the appendix of his award-winning dissertation on algorithm visualization, Marc Brown remarks that his work has not answered the question of whether his simulations actually help students learn the contents better [17]. While both instructors and learners are often intuitively convinced of the value of algorithm visualizations when asked in evaluations, this is of course no objective measure of the effectiveness of such learning aids. Studies on algorithm visualizations almost unanimously report that students were “excited”, “enthusiastic”, or “motivated”, and that they “enjoyed” working with the visualizations; often the learners express their firm belief that they have learned better because of the visualizations [10, 117]. However, as Goldstein put it, while such tools may enhance the learning experience, they do not necessarily improve learning [40]. On the contrary, a reduction in performance has even been observed when presentations are laden with too much (and possibly irrelevant) multimedia material [76]. Since the late 1990s, an increasing number of empirical studies have been carried out to assess the pedagogical value of algorithm visualizations. An excellent overview is given in [52]. The outcomes of these evaluations were very mixed: while some studies reported significantly improved learning, others could not detect any difference from traditional teaching without visualizations, or even indicated a negative effect. This dissatisfying result may be attributed to the fact that the settings and designs of the studies were, in most cases, not comparable, and hence a great variety of influencing factors might be responsible for the discrepancies. However, an extensive meta-review of former studies by Hundhausen et al. suggested that there was indeed one factor which could, at least in part, explain the differences among the 21 earlier evaluations under consideration [52].
The authors found that out of the 9 experiments which focused on representational aspects of the visualizations, such as sophisticated graphics and animation, only 3 produced significant results, while 10 of the remaining 12 evaluations found significant effects. These latter 12 experiments differed from the former 9 in that they all engaged the students in activities beyond passive watching. The obvious conclusion was that what matters is not so much what learners see but what they do with visualizations [53].

6.2 A research framework for empirical evaluation

Hundhausen’s work was groundbreaking for research on the effectiveness of algorithm visualizations in the sense that it set a new trend. In the light of his meta-study, the focus of research has shifted from studying representational aspects of the programs towards examining the level of engagement that learners exert when learning with visualizations. In 2002, a working group of ACM’s Special Interest Group on Computer Science Education (SIGCSE) put forward a research framework [87] including a taxonomy of learner engagement, together with a number of hypotheses and testing methods for further evaluations. We describe the framework in some detail, as it is the basis for the empirical study conducted in our own work, which is reported in Chapter 7.

6.2.1 The engagement taxonomy

In a learning scenario, students exert a certain level of engagement with the visualization of an algorithm. The actual degree of engagement certainly depends a great deal on the students’ attitudes towards the learning contents and methods and on their willingness to engage. However, the visualization itself and the context in which it is used also allow or even enforce a certain level of engagement. A rough division into several discrete categories can be made. The six levels of learner engagement proposed in [87] are:

(1) NO VIEWING: This category describes the situation where learners are not provided with any visualization and is thus the default case.

(2) VIEWING: This is the most basic form of engagement with visualization and is included in all the following categories. When purely viewing, learners watch an algorithm visualization more or less passively, i.e. without any interaction other than navigational control of the execution or switching between different views. In particular, there is no interaction with the algorithm under consideration.

(3) RESPONDING: In this category, learners still cannot manipulate the visualization, but at certain points, the visualization is interrupted and learners have to answer questions or quizzes before the visualization proceeds. Examples of questions could be predictions (“what will happen next?”), assigning a segment of the code of the algorithm to the currently visualized part, assessing the correctness of the algorithm, or efficiency analysis.

(4) CHANGING: Students can modify the visualization. For instance, they have to select the input to the algorithm so they can compare the behavior in different cases. Another example would be for learners to choose a sequence of operations carried out on a visualized data structure. Depending on the algorithm or data structure, changing can be done either offline (i.e. before the algorithm visualization starts) or online, in the course of the visualized algorithm.

(5) CONSTRUCTING: In this category students are expected to create their own visualizations of an algorithm. This is the most diverse category in the taxonomy, as there is a multitude of options for constructing visualizations. We distinguish three major subcategories:

a. Hand-constructed visualizations are not connected to the algorithm; they can be created as a movie with any given animation editor [104] or even without computer assistance using art supplies [50]. This type is sometimes also referred to as “low fidelity” or “low tech” algorithm visualization and has been extensively studied by Hundhausen, who also developed the SALSA language for the easy creation and presentation of such low-fidelity algorithm animations [51]. Obvious advantages are very short production times and concentration on the workings of the algorithm rather than on implementation details.

b. Direct construction builds on an implementation of the algorithm. Students either map a given algorithm to a visualization or annotate the algorithm with visualization commands, or they program the algorithm from scratch together with a visualization.

c. In the third construction scenario, learners work on a given graphical representation, on which they are expected to simulate the steps of the algorithm. The graphical representation is linked to an actual implementation of the data structure and can be manipulated via a user interface. This approach is suitable for assessing students’ knowledge; it has been used for exploratory learning [35] and automatic assessment [75], and is also used in our own evaluation, described in Chapter 7.

Note that although some parts of this subdivision are described in [87], the heterogeneity of this category is not further addressed when hypotheses about the effects on learning are proposed. We will discuss later why such a division should be an explicit part of both the taxonomy and the hypotheses.

(6) PRESENTING: Learners themselves present the algorithm or data structure to an audience using visualizations – either created by themselves, or existing ones that they find helpful.

Note that although these categories are, in a certain sense, ordered by an increasing level of engagement with visualizations, the list is not to be understood as a hierarchical scale. With the exception of categories 1 and 2, no category necessarily includes or excludes any of the others. Instead, the last four categories can occur in any combination in a specific learning scenario. Also note that only categories 3, 5 and 6 allow students to actually make mistakes.

6.2.2 Hypotheses regarding learner engagement

The major hypothesis of the framework in [87] is that each level of engagement will result in significantly better learning than the previous ones. For instance, one sub-hypothesis claims that RESPONDING should result in significantly better learning than VIEWING.

Note that there is one exception to this: level 2 (VIEWING) is not hypothesized to result in better learning than level 1 (NO VIEWING). In fact, one hypothesis is that passively viewing an algorithm visualization will not improve learning when compared to no visualization. This claim may seem rather surprising at first, but the hypothesis – like all the others – is derived from and consistent with the results of the majority of former evaluations, most notably the ones reviewed in the meta-study [52]. In addition, when several levels are combined, one additional hypothesis can be paraphrased as “more is better” [87]. This means that scenarios including more than one of the levels 3 to 6 in the taxonomy will result in better learning than those with only a single level of engagement.

6.2.3 Methodology

In addition to the taxonomy and hypotheses, the research framework provides guidelines for the practical realization of future experiments. These guidelines include examples of test scenarios, each comparing two or more engagement levels. The guidelines also contain possible test questions for assessing different types of learning and understanding according to the well-known taxonomy proposed by Bloom and Krathwohl, who distinguish six different levels of educational objectives: knowledge, comprehension, application, analysis, synthesis, and evaluation [14]. Moreover, other aspects to be measured and covariant factors which might influence the results are proposed. For example, a study conducted by Naps and Grissom found that the experience of students with the visualization tool, as well as whether the outcome of a test counts toward the final grade or course credit, can also influence the outcome of an experiment [88].

6.3 First evaluations within the framework

Several studies have been carried out within the proposed framework prior to our own experiment reported in the next chapter. It is important to note that most studies did not strictly adhere to the guidelines suggested in [87]. Nevertheless, the framework has at least provided a common language which makes it much easier to compare experiments, even though the results may not always be directly comparable.

The study conducted by Goldstein et al. tested the effectiveness of an interactive simulation tool in the computer networks domain [40]. Students were able to manipulate a virtual network that was visualized; hence the engagement level in the experiment was CHANGING. While the evaluation found significantly improved understanding of the learners after the self-study session with the visualization tool (as compared to before the session), the results of a control group who had a traditional tutor-led session on the same topic could not be used for comparison (for statistical reasons). Hence, unfortunately, this experiment shows only that an additional practical session with a simulation tool can foster learning. We do not know whether it is better than a traditional session, nor can we conclude that the simulation itself or the engagement with it was actually responsible for the improvement; it is conceivable that reviewing the same topic in a textbook for the same amount of time would have led to the same improvement in understanding.

On the other hand, an experiment carried out at the same institution [30] and on a similar topic found evidence for the hypothesis that mere VIEWING of an animation will not improve learning. The study was originally designed to compare verbally narrated animations with and without an accompanying on-screen text of the narration. No differences in the learning outcome could be detected between the two conditions.
Moreover, apparently no learning at all took place; students did not do significantly better in the post-test than in the pre-test conducted before the visualization session. While other influences on the result, such as the introductory lecture on the topic, can never be excluded, it appears that simply watching movie-style animations as accompanying materials does not result in improved learning.

Since the two studies above were conducted by (partially) the same researchers and under very similar conditions, their results are more comparable than those of completely independent studies. Taken together, they provide at least some indirect evidence for the hypothesis that CHANGING (level 4) is better than VIEWING (level 2): if VIEWING is not better than NO VIEWING [30] but CHANGING is significantly better than NO VIEWING [40], then CHANGING should also be better than VIEWING.

Grissom et al. compared the effects of the levels NO VIEWING, VIEWING and RESPONDING [88]. They found no significant differences between NO VIEWING and VIEWING (as predicted by the hypothesis), but did not find any significant difference between VIEWING and RESPONDING either (contrary to the hypothesis). However, a significant improvement was measured between NO VIEWING and RESPONDING. Hence, while the overall claim that more engagement is better was supported, not all of the individual hypotheses were substantiated. The differences in the learning outcome between increasing levels of engagement may not be as discrete as the levels proposed in the taxonomy, but rather gradual. Moreover, as we shall point out later, there can be differences regarding the degree of engagement within the VIEWING category.

Chapter 7

An empirical study on the influence of the engagement level on the learning outcome

7.1 Introduction

The evaluation conducted in this work compared the effect of learner engagement between the levels of VIEWING, CHANGING and CONSTRUCTING. Unlike most other studies known to us, a rather complex data structure and its operations were chosen as the topic to be taught. While sorting algorithms have been used in many previous studies (it is probably no exaggeration to call them the “fruit fly” of algorithm animation research), there are good reasons for not reusing the same algorithms over and over again. One is that the topic to be taught is itself likely to be an influencing factor on the outcome of the experiment. By increasing the diversity of algorithms, the pool of experimental results becomes more reliable and less dependent on this factor. Further reasons for the choice of this particular topic are given below. The following section provides a detailed description of the experiment. The results are presented and critically discussed in Section 7.3, while Section 7.4 lists additional findings.

7.2 Experiment

An empirical user study (N = 96) was carried out in order to assess the impact of learner engagement on the effectiveness of algorithm visualizations for deepening the procedural understanding of operations used to manipulate a complex data structure. Following the taxonomy and research framework developed in [87], the engagement levels 2, 4 and 5 (VIEWING, CHANGING, and CONSTRUCTING) were compared. The hypothesis was that CHANGING would result in significantly better learning than VIEWING, and CONSTRUCTING was expected to yield significantly better results than CHANGING.

7.2.1 Test design

A between-subjects design of three conditions was used, i.e. each participant belonged to one and only one of three different treatments. (In a “within-subjects” design, each participant would be exposed to all treatments successively.) The conditions were identified according to the degree of interaction the participants had with the animated visualization of a complex data structure.

Students in condition VIEWING (N = 32) watched animations of sequences of operations carried out on sample structures, with interaction restricted to starting, pausing, step forward/backward, and adjusting the animation speed.

Participants in condition CHANGING (N = 31) were free to interactively select the next operation to be carried out on the sample structure together with its parameters (i.e. to determine the input).

The students in condition CONSTRUCTING (N = 33) had to construct a given sequence of operations out of small “building blocks” (sub-operations). Note that this type of CONSTRUCTING the visualization did not involve any programming. Rather, it can be seen as simulating the algorithm on a predefined visualization (cf. Chapter 6.2.1). Instead of selecting a complete operation as in treatment CHANGING, users had to choose small sub-operations in the correct order to assemble a given operation. One notable difference from the other conditions was that this treatment allowed students to make mistakes (and get immediate feedback on the correctness of their actions). However, we note that this is a rather weak form of CONSTRUCTING, since learners neither code the algorithm nor construct their own visual representations.

7.2.2 Participants

The test subjects were 96 students in an introductory course on algorithms and data structures taught at Albert-Ludwigs-Universität Freiburg during the summer term of 2005. All students participated in the study for course credit. The experiment was described to them as a multimedia exercise, and they were informed that the (anonymous) results would be used for scientific evaluation. The multimedia exercise replaced one of the compulsory weekly assignment sheets, and participants got full credit (12 points) for mere participation in the experiment. As an additional incentive, a bonus credit of up to 10 points could be achieved, depending on the performance in the exercise. This bonus – together with other bonus points earned during the semester – could improve the overall final grade for the course.

Regarding their academic status, the participants mainly fell into one of the following groups, as can be seen in Figure 7.1. The majority (50%) were first-year students majoring in computer science. Eight participants had mathematics as their major subject, also in their first year. Twelve students were physics majors, most of them in their second year. Computer science has become a popular minor subject in the biology curriculum, and eleven such students, all of them in their graduate studies, participated

Figure 7.1: Grouping of participants according to major subject of studies (Computer Science: 48, Physics: 12, Biology: 11, Other: 9, Mathematics: 8, Teacher: 8).

in the test. Another eight people studied to be teachers in secondary education, which is a separate curriculum and degree. The remaining nine participants, who did not fall into any of these groups, majored in such diverse subjects as anthropology, history, hydrology, psychology, sociology, or languages, with computer science as their minor subject. 80 participants were male, 16 female. 73 students were native speakers of German (the course language); the others had diverse cultural backgrounds. None of the participants had any previous experience with the visualization system that was used in the experiment.

7.2.3 Contents and learning objectives

As learning contents for the evaluation, the Fibonacci heap data structure was used [39]. Fibonacci heaps are one efficient implementation of priority queues, which are used in graph and network algorithms such as Dijkstra’s one-to-all shortest path algorithm or the computation of minimum spanning trees (cf. [94]). They support the priority queue operations of inserting a new element, finding the element with minimum priority, decreasing the priority value of a given element, and merging two disjoint queues, all in O(1) amortized time. The remaining priority queue operations, i.e. deleting the element with minimum priority and deleting a given element, take amortized time O(log n), where n is the number of elements contained in the Fibonacci heap.

It is not the purpose of this work to give a detailed description of Fibonacci heaps. The data structure has found its way into many textbooks [22, 94] and is usually taught in advanced courses on algorithms and data structures, where it serves as a non-trivial example for amortized analysis. It was selected for this evaluation in a first-year introductory course mainly for two reasons: first, the topic was “exotic” enough to ensure that the participants had not heard of it before, so testing for previous knowledge was not necessary. Second, the topic is not easy for undergraduate students and hence appears to be a good candidate for visualization. Even more, the data structure is complex enough that – in our opinion – the extra effort of creating and employing a dynamic visualization is justified. Nevertheless, the subject fit very well into the overall course syllabus: important prerequisites such as doubly-linked circular lists, heaps, Fibonacci numbers and easy examples of amortized analysis had been covered before, and an application – Dijkstra’s shortest-path algorithm – was going to be taught towards the end of the term.
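To make the interface concrete, the following sketch expresses the priority-queue operations listed above in Python. It is illustrative only: the class name and the lazy-deletion strategy are our own, the queue is backed by a plain binary heap (Python’s heapq) rather than an actual Fibonacci heap – so decrease_key here takes O(log n) instead of O(1) amortized time – and the merge operation is omitted.

```python
import heapq
import itertools

class LazyPriorityQueue:
    """Illustrative priority queue with the Fibonacci-heap interface.

    Backed by a binary heap with lazy deletion; items must be unique.
    A real Fibonacci heap achieves O(1) amortized decrease_key.
    """

    def __init__(self):
        self._heap = []              # entries: [priority, tiebreak, item]
        self._entries = {}           # item -> current (valid) entry
        self._counter = itertools.count()

    def insert(self, item, priority):
        entry = [priority, next(self._counter), item]
        self._entries[item] = entry
        heapq.heappush(self._heap, entry)

    def decrease_key(self, item, new_priority):
        # Lazy strategy: invalidate the old entry, push a fresh one.
        old = self._entries[item]
        assert new_priority <= old[0], "may only decrease the priority"
        old[2] = None                # mark old entry as invalid
        self.insert(item, new_priority)

    def find_min(self):
        self._purge()
        return self._heap[0][2]

    def delete_min(self):
        self._purge()
        entry = heapq.heappop(self._heap)
        del self._entries[entry[2]]
        return entry[2]

    def _purge(self):
        # Drop invalidated entries from the top of the heap.
        while self._heap and self._heap[0][2] is None:
            heapq.heappop(self._heap)
```

For example, after inserting ‘a’ with priority 5 and ‘b’ with priority 3, a decrease_key of ‘a’ to 1 makes ‘a’ the new minimum.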
Our study focused on students’ procedural understanding of the priority queue operations and how they are carried out on Fibonacci heaps, not a detailed analysis of the algorithms. In terms of Bloom’s taxonomy, the educational objectives relevant for the evaluation were at levels 1 and 2 (knowledge and comprehension) [14]. The findings by Grissom et al. suggest that especially these lower levels on Bloom’s scale are supported by visualizations [42]. Hence, while knowledge beyond the comprehension level was not expected to be fostered by the visualizations, some questions were included in the post-test which asked for higher-level concepts such as application, analysis and synthesis.

7.2.4 Interactive visualization

The visualization used in the study can be seen in Figure 7.2 and was developed in the context of the Multimedia Algorithms & Data Structures Assessment (MA&DA) framework [62, 130]. This system is built on top of the JEDAS animation library and is intended for assessing learners’ understanding of algorithms and data structures through construction exercises. The main idea behind this type of exercise is to decompose the operations applied to a data structure into smaller building blocks, each of which can be carried out and visualized separately. Learners are expected to construct a series of operations on a given example by selecting the appropriate building blocks in the correct order. The same approach has also been used in other systems such as SALA [36] and TRAKLA [75].

The MA&DA framework is flexible in three important respects. First, for the creation of new assignments it supports manual, semi-automatic, or fully automated generation of exercises at different levels of difficulty. Second, for learners, different types of feedback are offered, ranging from immediate feedback with error correction through several intermediate levels down to zero feedback in the case where exercises are graded [8]. Third, the resulting visualizations can either be graded automatically, which makes the system very attractive for mass courses, or – if detailed feedback is desired – they can be marked manually by a tutor, who can graphically annotate the visualizations and add textual comments.

The architecture of MA&DA is realized as a framework-and-plugin concept. The framework provides the applications, and individual data structures are implemented as plugins or modules. Currently, there are modules for AVL trees [74], binomial queues and Fibonacci heaps. Each plugin consists of an implementation of the respective data structure and a JEDAS visualization of it. The algorithms operating on the data

Figure 7.2: Screenshot of the interactive visualization.

structures must be divided into sub-operations, each of which can be executed separately. In addition, a knowledge base defines the correct sequences of sub-operations and contains meaningful feedback messages for incorrect sequences. The system and its design are described in detail in [116].

For the developer of a new module, it is important to define sub-operations of suitable granularity. If they are too fine-grained, students may quickly get lost in unnecessary detail. This would be the case, for instance, if each single pointer update in a Fibonacci heap were counted as a basic unit (because of the doubly-linked lists used in Fibonacci heaps, a simple insertion requires at least 4 pointer updates). On the other hand, the building blocks must be simple enough to be understood as a single basic unit. Though this decision is up to the author and hence arbitrary to a certain degree, there is broad agreement between different instructors on the same topic. Usually, the sub-operations are the same as those used when explaining the algorithms in a traditional face-to-face setting. One rule of thumb is that they should be units each of which takes constant time [62].

For the present visualization of Fibonacci heaps, we decided on the following “atomic” sub-operations as building blocks:

(1) Create a new node and insert it into the root list (next to the minimum node).
(2) Remove a node from the root list and replace it with the list of its children.
(3) Set the priority of a node to a different value.
(4) Cut a non-root node (together with all its descendants) off its parent node and insert it in the root list (next to the minimum node).

(5) Mark or unmark a node.
(6) Link two root nodes by making the second selected node a new child of the first selected node.
(7) Update the pointer to the node with minimum priority.

All the Fibonacci heap operations can be constructed from these basic methods. For example, the decrease operation is built from steps (3), (4), (5) and (7), where (4) may have to be carried out more than once (in cases of so-called cascading cuts). Similarly, deleting a given node (which is not the minimum) is done by (4), (5) and (2). The most complex operation, deleting the minimum node, is achieved by carrying out steps (2), (6), (5) and (7), with possible multiple instances of (6) and (5).

It is important to stress that the visualization alone was neither designed nor expected to provide enough understanding of the topic without prior knowledge of the data structure. The session was therefore the second part of a unit on Fibonacci heaps which started with an expository phase of instruction in the form of two lectures. The intended purpose of the visualization was to deepen students’ procedural knowledge of the Fibonacci heap operations so that they would also be able to transfer that knowledge and apply the algorithms to arbitrary examples.
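Assuming a very simple encoding of the heap state, the composition of the decrease operation out of the sub-operations above can be sketched as follows. The function name and its parameters are hypothetical and not part of any of the systems discussed; the root case (roots are never marked) is omitted for brevity.

```python
def decrease_key_steps(node_violates_heap, parent_marks):
    """Return the sequence of sub-operation labels performed by a
    decrease operation, numbered as in the list above.

    `parent_marks` lists, bottom-up, whether each successive ancestor
    is marked (True triggers a cascading cut).  Illustrative only.
    """
    steps = ["(3) set priority"]
    if node_violates_heap:
        steps.append("(4) cut node into root list")
        # Cascading cuts: each marked ancestor is unmarked and cut;
        # the first unmarked ancestor is marked instead, which stops
        # the cascade.
        for marked in parent_marks:
            if not marked:
                steps.append("(5) mark node")
                break
            steps.append("(5) unmark node")
            steps.append("(4) cut node into root list")
    steps.append("(7) update minimum pointer")
    return steps
```

For a node whose new priority does not violate the heap condition, only steps (3) and (7) are performed; a marked parent causes one additional cut.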

7.2.5 Preparatory materials

Two 90-minute lectures introducing the topic were given by the author the week before the actual experiment. The first lecture described the data structure and the involved algorithms in terms of pseudo-code and verbal descriptions, while the second provided an amortized worst-case analysis of the complexities of the operations using the so-called accounting method (cf. [22]). Several visual representations of Fibonacci heaps were shown in the lectures; all of them were static ‘before–after’ images (see Figure 7.3), but looked similar to the visualizations used later during the test. As has been pointed out by Grissom et al., it is important to make sure the visualization and other materials are consistent with each other in order to avoid confusion and possible negative effects on students’ learning [42]. This is particularly important for the chosen topic, as several implementations of Fibonacci heaps can be found in the literature which differ from one another in important aspects (e.g. the implementation of the delete method, the time when nodes are unmarked, etc.). For this study, we consistently followed the original implementation proposed in [39], which is slightly different from some textbook implementations (e.g. [22, 94]).

Like all other lectures of the course, the two on Fibonacci heaps were recorded using the Lecturnity system and made available shortly after the lecture as multimedia documents in several different media formats (with or without the lecturer’s video picture) [124, 129, 131]. The lectures are freely available [132]. Figure 7.3 shows the recording as students would see it in the Lecturnity player software. Students were used to this procedure, and our surveys have shown that the recordings are frequently used, both by students who have missed the class and by attendees who use them for revision [69].

Figure 7.3: The recorded lecture as seen in the Lecturnity player [129], including static ‘before – after’ visualizations and the lecturer’s hand-written annotations.

The participants were told in advance that the multimedia exercise was about Fibonacci heaps and they were asked to attend the respective lectures and/or watch the recordings. Also, the original presentation slides were made available as PowerPoint and PDF documents. It should be noted that on several of the slides there were intentional blanks which were going to be filled by the lecturer during the presentation, so merely reading those static slides was not a recommended option for the students. In the survey preceding the experiment (see Appendix B.1), participants were asked whether and how they had watched the introductory lectures. The answers are shown in Figure 7.4. It may be surprising that the number of students who attended the live lecture is less than 40%. However, these numbers are in line with usual class attendance. As our own studies have shown, many students use the recordings as a complete substitute for the live lectures [47]. Unfortunately, though not quite unexpectedly, about 25% of the participants had neither attended the live lecture nor watched the recordings and more than 10% had not even read the slides. This difference in familiarity with the contents was strongly reflected in the test scores, as will be discussed in the results section.

7.2.6 Procedure

The evaluation took place 4 and 6 days after the second introductory lecture (with a weekend in between), during the usual time of the tutorials, in a closed lab condition. Students were asked to bring their own laptop if they had one. All the others were provided with laptops from the university, so each participant had his or her own laptop

Figure 7.4: Student preparation with accompanying materials (Recording: 36.5%, Live lecture: 31.2%, Slides only: 13.5%, Live + recording: 8.3%, Nothing: 7.3%, Other materials: 3.1%).

to work on. Students were not prevented from talking to each other, but were encouraged to do the exercise on their own. Due to the number of participants on the one hand and the available space and equipment on the other, the evaluation took place in 6 separate sessions, each of which was supervised by two people (a student tutor and one of the experimenters). The participants were randomly assigned to one of three groups, A, B and C, corresponding to the three treatments CONSTRUCTING, CHANGING and VIEWING. Depending on the group, students got the respective visualization, which was installed on their computer, plus an instruction sheet about its usage. In addition, they got a sheet explaining the seven exercises to be worked on with the help of the animations. The overall preparation took about 10-15 minutes. Students then had up to 50 minutes to work with the visualizations. The exercises, which were the same for all treatments, consisted of sample instances of Fibonacci heaps, on which given sequences of operations were (to be) carried out.
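The random assignment to groups can be sketched as below. This is purely illustrative: the function name and the round-robin split are our own, and an even split of 96 participants yields 32/32/32 rather than the 32/31/33 group sizes that resulted in the actual study.

```python
import random

def assign_groups(participants, labels=("A", "B", "C"), seed=None):
    """Randomly assign participants to roughly equal-sized groups.

    Shuffle once, then deal the shuffled participants round-robin
    into the groups.  Illustrative only.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {label: [] for label in labels}
    for i, participant in enumerate(shuffled):
        groups[labels[i % len(labels)]].append(participant)
    return groups
```

With 96 participants and three labels, each group receives exactly 32 members regardless of the shuffle order.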

The VIEWING group could only watch the visualizations. Each example started with the visualization of a given Fibonacci heap and a smooth animation showed all the operations carried out in sequence. Interaction was restricted to starting, pausing, step forward/backward (where a “step” was a sub-operation as defined above, not a complete operation), and changing the animation speed. A description of the operations and all sub-operations was provided in a separate frame (cf. Figure 7.5).

Participants in condition CHANGING were presented with the sample structure and then had to explicitly select the next operation by interacting with the visualization. For instance, in order to decrease the key value of a node, the node itself had to be right-clicked, the command “decrease-key” had to be selected from a menu and the new key value had to be chosen (see Figure 7.6). Then the complete action was carried out in one smooth animation, with the name of the current operation and its sub-steps displayed below the visualization. Also, the animation could be paused and resumed and the speed could be adjusted. For technical reasons, no step forward/backward option and no rewinding capability were provided. We discuss possible implications in Section 7.4.1. It should also be noted that the granularity of steps in this treatment was complete

Figure 7.5: Visualization environment for treatment VIEWING. Navigation was possible through the buttons (bottom right) or by choosing a (sub-)operation from the list.

Figure 7.6: Screenshot of the environment for treatment CHANGING. The next operation could be selected from a popup menu. No backward navigation was possible.

operations, not sub-operations, i.e. the animation was not stopped after each sub-operation but only after the whole operation was completed.

The third group, condition CONSTRUCTING, had to assemble the operations from the “atomic” sub-operations. For example, in order to decrease the key value of a node here, participants first had to set the key to the new value, then cut the node from its parent (if the heap constraint was violated) and insert it into the root list, perform additional cascading cuts if necessary, and update the Fibonacci heap’s minimum pointer. This could all be done by right-clicking the respective object in the visualization and choosing the appropriate action from a context-sensitive menu. Immediate feedback signaled whether the action just completed was correct; if not, some additional textual explanation was given, and the student had the chance to try again. After the same step had been carried out incorrectly three times in a row, the correct one was shown and explained. Figure 7.7 shows a screenshot of the visualization environment.

The last treatment clearly forced students into the most activity, which could be seen from their constant mouse actions during the experiment. Treatment CHANGING also demanded activity, but at larger time intervals: only after a complete operation was finished did the next one have to be selected. In the VIEWING condition, the degree of interaction depended strongly on the learner’s preferences; students could watch the complete series of operations in one continuous animated sequence without any interaction, or navigate stepwise with mouse actions after each sub-operation. In addition, it was possible to go backward step by step.

During all of this phase (see Figure 7.8), students in all treatments could ask the supervisors for help if they had any problems. This occurred very rarely. After spending up to 50 minutes with the visualizations (students could stop earlier if they wanted to), the participants were asked to shut down their computers and were given the final questionnaire.
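The three-attempts feedback rule of the CONSTRUCTING condition described above can be sketched as follows; the function `grade_step` and the message strings are illustrative, not the actual MA&DA interface.

```python
def grade_step(expected, attempts):
    """Simulate the feedback rule of the CONSTRUCTING condition:
    after three consecutive incorrect attempts at the same step, the
    correct sub-operation is revealed.

    `attempts` is the sequence of sub-operations the student chose;
    the returned list is the feedback shown after each attempt.
    """
    feedback = []
    for n, chosen in enumerate(attempts, start=1):
        if chosen == expected:
            feedback.append("correct")
            return feedback
        feedback.append("incorrect: try again")
        if n == 3:
            # Three wrong attempts in a row: show the correct step.
            feedback.append(f"revealed: {expected}")
            return feedback
    return feedback
```

A student who picks the right sub-operation on the second try gets one error message and one confirmation; a third consecutive error triggers the reveal.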
This questionnaire was the same for all participants and can be found in Appendix B.3. It consisted of three parts. There was a one-page section asking for personal information (major subject, age, gender, etc., and some questions about learning behavior). The actual test consisted of nine exercises on Fibonacci heaps. Five of them were of the same “visual” type as the ones the students had just worked on: in each of these questions, participants were presented with (a graphical representation of) a sample Fibonacci heap, and the task was to carry out one or more operations on the structure and draw the resulting state of the structure. These questions tested procedural knowledge and comprehension, and the hypothesis was that the average results should differ significantly between the treatments. Question 6 was also a “visual” question, but required analytical thinking: for 3 small Fibonacci heap examples (each consisting of only 4 or 5 nodes), the students had to find a sequence of operations by which it could have emerged from an initially empty heap.

The remaining questions tested for higher-level knowledge on Bloom’s scale, such as analysis, application, and synthesis [14]. For example, question 9 asked students to implement a new operation, increase (in analogy to the decrease operation), and analyze its complexity, the difficulty being that increasing a key is more complex than decreasing it. In question 7, the participants had to apply Fibonacci heaps to the problem of sorting a set of elements and compare the time and space complexities with other sorting algorithms known to them. Question 8 proposed an apparent improvement of

Figure 7.7: Screenshot of the visualization tool as seen by students in treatment CONSTRUCTING. The given operation (top right) had to be assembled by selecting the correct sub-operations from a popup menu. Navigation was possible in either direction.

Figure 7.8: Photo of one of the experimental sessions.

Fibonacci heaps by changing two of the heap operations, and asked students whether it would really improve the running times (which it did not). It was not expected that the type of knowledge tested by these questions would be fostered by the visualizations, or that there would be any significant differences between treatments. In fact, the final three questions were considered rather difficult and mainly aimed at assessing students’ overall insight into the structure in order to see whether it is appropriate to teach such complex data structures in a first-year course.

The final page of the questionnaire contained seven questions on students’ personal opinion about the visualizations, such as how easily they felt they could follow the animations; whether they thought that text and/or static images would be sufficient; or whether the visualizations should have been more interactive (cf. Appendix B.3). The sheet also provided additional space for students’ own comments. The total time for completing this questionnaire was one hour. The overall evaluation took about 120 minutes.

7.3 Results

In this section the results of the experiment are presented and discussed in detail. Since the findings for the major hypotheses were unexpected, further analyses were carried out in order to identify other influencing factors.

7.3.1 Data analysis

The exercises in the test were marked and graded by two persons independently and double-checked in cases of differing markings. A maximum of 10 points could be achieved for each exercise in order to allow for differentiation in cases of partial solutions or minor errors. The exercises were grouped into “visual” questions (exercises 1-5) and “non-visual” questions (exercises 6-9), and the sum of the scores for each grouping as well as the total score for all questions was calculated for each participant. All data were collected and analyzed using the SPSS statistics software [135].

In order to test for significant differences, two types of tests were used. For normally distributed data, standard t-tests can be used. This was the case for the total scores over all questions, but not when visual and non-visual questions were considered separately (see Figure 7.9). In cases where no normal distribution can be assumed, non-parametric tests have to be used. In our analysis, we applied the well-known Wilcoxon rank-sum test (also known as the Mann-Whitney-Wilcoxon test or Mann-Whitney U test) in all cases of non-normal distribution. The significance level we use is p < 0.05, i.e. differences are assumed to be statistically significant if the probability of a Type-I error, i.e. the probability that the differences have resulted by pure chance, is less than 5%.
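For illustration, the rank-sum statistic underlying the Mann-Whitney U test can be computed in a few lines of standard-library Python. A real analysis, like ours, would use a statistics package (e.g. SPSS), which also supplies the p-value.

```python
def mann_whitney_u(a, b):
    """Compute the Mann-Whitney U statistic for two samples.

    Tied values receive the average of the ranks they occupy.  The
    p-value (via the normal approximation or exact tables) is left to
    a statistics package.
    """
    # Tag each value with its group, then sort the pooled sample.
    combined = sorted((x, g) for g, xs in (("a", a), ("b", b)) for x in xs)
    values = [x for x, _ in combined]
    rank_sum_a = 0.0
    i = 0
    while i < len(combined):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(combined) and values[j] == values[i]:
            j += 1
        avg_rank = (i + 1 + j) / 2      # average of ranks i+1 .. j
        for k in range(i, j):
            if combined[k][1] == "a":
                rank_sum_a += avg_rank
        i = j
    n_a, n_b = len(a), len(b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return min(u_a, n_a * n_b - u_a)
```

For two completely separated samples the statistic is 0, the most extreme possible value; interleaved samples yield larger values.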

7.3.2 Influence of preparatory materials

As was expected and can be clearly seen from Figure 7.10 (left), test scores differed significantly between students who had followed the lectures (live and/or as a recording) and those who had not (two-sample t-test: t = -7.062, df = 94, p < 0.001). The differences between those two groups were also found to be significant when visual and non-visual questions were considered separately (p < 0.001 in both cases).

Figure 7.9: Test scores of students with regard to their method of preparation (left: average scores of all questions; right: distribution of scores for visual questions).

Figure 7.10 (right) shows a box plot for the scores of the visual questions, where the participants are grouped into lecture viewers, those who only read the static slides, and all others. In a box plot, the horizontal line indicates the median; the filled boxes represent the regions containing the middle 50% of all samples, while the span covered by the vertical lines is the whole range of values with the exception of outliers. These values, which lie considerably further away from the median than all others, are plotted as individual dots. The difference between scores for the given groups is even stronger for the non-visual questions. This supports the assumption that the preparatory lecture was more important for answering those questions and that the animations turned out to be helpful mainly for the visual questions measuring procedural comprehension. Among the students who had not followed the lectures, the difference between the scores of participants who said they had read the slides only and those who had not prepared at all was not statistically significant (t = 0.916, df = 22, p = 0.37). This was unexpected, as the slides did contain considerable information which could, at least to a great part, be understood without the verbal narration. One possible explanation for the low scores of the “slides only” students is that some of them had not seen the lecture materials at all, or had only had a quick glance at the slides, but were reluctant to admit this and thus checked the “slides” box. This assumption is supported by the great range of the scores in the “slides only” group, as can be seen in the box plot in Figure 7.10 (right).
We will subsequently refer to the participants who had followed the lectures as prepared students and to all others as unprepared students (even though the latter may have read the static slides). Due to the dramatic difference between the scores of these two groups, we will from now on restrict most of the analysis to a filtered data set containing only the prepared students (N = 72), except where stated otherwise. Overall, the test subjects performed surprisingly well on the visual questions. An average of 43.5 points out of 55 (i.e. 79.1%) was reached, which had not been anticipated given the complexity and novelty of the topic. Even the unprepared students achieved 49.3% of the score on average. For the non-visual questions, 16.5 out of 44 points

Figure 7.10: Test scores of students with regard to their method of preparation (left: average scores of all questions; right: distribution of scores for visual questions).

(37.5%) were reached on average by the prepared students and 5.1 (11.6%) by the unprepared students. The lower scores for non-visual questions came as no surprise since they required higher-level understanding that was not expected to be fostered by the visualizations.

7.3.3 Levels of engagement

The main question addressed in the experiment was the influence of the engagement level on the learning outcome, i.e. whether the test scores of the participants, in particular those of the visual questions, would differ significantly between the three treatments. The results are listed in Table 7.1 and visualized in Figure 7.11.

Table 7.1: Test scores compared between treatments, given as mean scores with standard deviations in parentheses.

Treatment             Visual questions   Other questions   All questions
VIEWING (N=25)        45.08 (6.60)       15.35 (9.82)      60.42 (14.35)
CHANGING (N=21)       41.57 (7.52)       15.62 (9.43)      57.19 (14.96)
CONSTRUCTING (N=26)   43.36 (9.34)       18.40 (12.88)     61.76 (20.04)
All (N=72)            43.46 (7.92)       16.49 (10.82)     59.94 (16.58)

Contrary to the hypotheses, no significant differences between the three treatments could be detected when comparing each pair of treatments. In fact, there are hardly any differences, except that the students in treatment CHANGING did slightly worse on visual questions than the others. We will comment on this later (cf. 7.4.1). Although the best overall results were found in treatment CONSTRUCTING, the same was true for the weakest results. It would seem that some people benefited from the higher level of engagement while others did not. However, a closer look at the participants within that treatment revealed no other differences which would explain the greater variance.

Figure 7.11: Test scores according to experimental treatment. Left: average scores for visual (blue) and non-visual (green) questions; right: box plot of total scores.

When interpreting the findings, before rejecting the hypotheses on the grounds of non-significant results, it makes sense to look at other possible reasons and influencing factors, e.g. flaws in the design of the study or shortcomings in the analysis. For example, it is conceivable that, in spite of the similar test scores, the participants in different treatments made different types of mistakes. In order to check this possibility, the exercises were reviewed again and a detailed error analysis was carried out. In particular, we distinguished between careless mistakes (for example, when copying from one figure to the next in a sequence), procedural errors (for instance, destroying the heap order by a wrong link operation), and errors of “algorithmic detail” (for example, inserting a node at a position to which no pointer was available at that stage in the algorithm). It is especially this last type of error that was expected to be less frequent in the CONSTRUCTING treatment because of the more detailed simulation of the algorithms. However, our detailed review did not reveal any noticeable difference in the types of mistakes across treatments. Another possible explanation for the results would be that the (visual) questions in the post-test were too easy or graded too generously, or that the scoring system was not suited to reveal significant differences between the treatments; this assumption would be supported by the strong overall performance of the participants. However, if this were the case, differences between other groups should also be minimal and non-significant. In order to test this, we analyzed the test scores with respect to additional factors obtained from the questionnaires or other sources and found significant differences for some of these variables.

7.3.4 Differences with respect to other variables

The test scores were analyzed for differences according to other variables, among them demographic aspects such as gender and native language, the major subject of the participants, and factors concerning learning styles. For instance, participants had been asked in the questionnaires how often they attended the lectures and the tutorials, and whether they preferred to work on exercises alone or in groups. In addition, the test scores were put in relation to the participants’ overall performance in the course. All results are listed in Table 7.2.

Table 7.2: Test scores by other variables, given as mean scores with standard deviations in parentheses. Significant differences are discussed in the text.

Variable                                   Visual questions  Other questions  All questions

Personal background
Gender               Female (N=13)         43.31 (8.31)      16.00 (11.86)    59.13 (17.96)
                     Male (N=59)           43.49 (7.91)      16.59 (10.68)    60.08 (16.42)
Native language      German (N=55)         44.73 (7.42)      19.20 (10.18)    63.93 (15.43)
                     Other (N=17)          39.35 (8.33)       7.71 (7.89)     47.06 (13.61)
Major subject        Computer Sci. (N=39)  44.10 (6.99)      17.46 (9.48)     61.56 (14.20)
                     Mathematics (N=7)     51.57 (2.88)      27.00 (5.63)     78.57 (6.02)
                     Physics (N=8)         44.63 (5.80)      16.62 (11.43)    61.25 (15.66)
                     Biology (N=6)         37.50 (8.87)      10.50 (12.45)    48.00 (17.34)
                     Teacher (N=4)         37.50 (9.40)       4.25 (3.77)     41.75 (17.34)
                     Other (N=8)           39.50 (9.91)      13.00 (13.38)    52.50 (20.64)

Learning style
Lecture attendance   Always (N=28)         42.04 (7.78)      14.93 (10.09)    56.96 (15.47)
                     Often (N=14)          42.50 (10.52)     14.93 (11.95)    57.43 (20.19)
                     Seldom (N=15)         44.40 (7.71)      16.93 (9.01)     61.33 (14.57)
                     Never (N=15)          46.07 (5.13)      20.40 (12.63)    66.47 (16.41)
Tutorial attendance  Always (N=43)         43.58 (7.29)      16.60 (9.89)     60.19 (15.09)
                     Often (N=10)          35.80 (10.30)      8.70 (8.76)     44.50 (15.26)
                     Seldom (N=11)         45.55 (5.20)      16.73 (11.38)    62.27 (15.31)
                     Never (N=8)           49.50 (3.16)      25.25 (11.93)    74.75 (13.66)
Exercises            Alone (N=46)          44.17 (6.55)      16.28 (11.72)    60.46 (16.61)
                     Group (N=26)          42.19 (9.92)      16.85 (9.23)     59.04 (16.80)


7.3.4.1 Personal background

No differences were found between the genders; male and female participants performed equally well on both visual and non-visual questions, as can be seen from Table 7.2. However, the participants’ mother tongue did have a strong influence on the test results: non-native speakers of the course language performed significantly worse than native speakers, both for visual questions (p < 0.01) and, even more markedly, for non-visual questions (p < 0.001), as can be clearly seen in Figure 7.12.

Figure 7.12: Test scores according to native language: average scores for visual (blue) and non-visual questions (green).

7.3.4.2 Major subject

When analyzing the results with respect to participants’ major subjects, we found that students majoring in mathematics performed significantly better than the rest. This finding is consistent for both types of questions (p < 0.001 for visual, p < 0.006 for non-visual questions). Even the weakest mathematics student achieved a higher score than 77% of the students in any other group. There are two tentative explanations for this strong difference: one is that the analytical skills developed during the first year of mathematics studies give a great advantage in the field of algorithms and data structures. The other is that mathematics is generally perceived by prospective students as a rather hard subject, while computer science seems to be perceived as easier; hence, students who enroll in mathematics as a major are more likely to have had strong analytical skills from the start. These two assumptions do not exclude each other, and both are supported by the very low variance of scores in the group of mathematics majors, as can be seen in the box plot in Figure 7.13 (right). On the negative side, biology and teacher students did worse than the average, even though the difference was not found to be statistically significant (p = 0.188 for teacher students; p = 0.067 for biology majors) among the prepared students.

Figure 7.13: Test scores depending on major subject of studies. Left: average scores for visual (blue) and non-visual (green) questions; right: box plot of total scores (all questions).

A straightforward explanation for the biology students’ weak results is that most of them were also attending an intensive lab course requiring their full attention during the week when the experiment took place. An explanation for the even weaker performance of the teacher students is harder to find, in particular because this was the most diverse group with respect to year of study and second major subject (all students studying to become a teacher are required to major in at least two subjects).

7.3.4.3 Learning styles

The students’ learning styles were also reflected in their performance in the test. From the questionnaires, we obtained information on their general attendance in the live lectures and the tutorials, and on whether they preferred to learn alone or in groups when working on the weekly exercises. General attendance or absence in the live lectures (not to be confused with attendance in the specific introductory lecture on Fibonacci heaps) seems to have no significant effect on the performance, although there is a slight (but non-significant) tendency for non-attendees to perform better in the test (see Table 7.2). When attendance in the tutorial meetings is examined, an interesting picture emerges, which can be seen in Figure 7.14: students who attended those meetings often but not always showed the lowest performance, while those who never attended the tutorials performed best on average. The difference between the often-attendees and the rest is significant for all questions (p < 0.003). The picture is similar for the non-attendees, who performed significantly better than the others on all questions (p < 0.006). An explanation for the latter might be that only “good” students who are certain of getting the exercises right can afford to skip the meetings. It is unclear, however, why the students who attend the tutorials often should do worse than those who attend always or seldom. No difference was found between the scores of students who prefer to learn in groups and those who prefer to work alone when only the prepared students were taken into account.

Figure 7.14: Test scores according to attendance in tutorials. Left: average scores for visual (blue) and non-visual (green) questions. Right: box plot for total scores.

There is a visible (though non-significant; p = 0.110) advantage for the “loners” when all participants are considered.

7.3.4.4 General course performance

In order to relate the experimental results to the overall academic performance of the participants, we tested for a correlation between the participants’ test scores and their scores in the final exam at the end of the semester. The correlation was found to be statistically significant for the prepared students (Pearson: r = 0.600, p < 0.01, N = 65). As can be expected, this effect is even stronger if the non-visual questions are also included in the score (r = 0.675, p < 0.01). The correlation is visualized by the scatter plot in the left diagram of Figure 7.15. Thus, overall course performance turns out to be a fairly reliable predictor of the test score. However, the same correlation was not found to be significant among the unprepared students (r = 0.347, p = 0.18, N = 17; see Figure 7.15, right). This suggests that the students who did well may have profited more from the preparatory lectures than from the animations, which would again emphasize a strong influencing effect of the introductory lectures. At least for the topic examined here, the (purely expository) explanation of algorithms in a lecture appears to be a stronger influencing factor for the learning outcome than the level of engagement with visualizations. Hence, it may well be that the effects of different levels of learner engagement in a 50-minute session with visualizations, after 180 minutes of lecture on the same topic, are simply not strong enough to produce significant results. For all of the above factors where different groups showed a significant difference between test scores, post-hoc examinations were carried out in order to check whether any of the respective groups were either underrepresented or overrepresented in any of the three experimental treatments. It was found that all such groups were evenly distributed among the treatments, so the above effects could not have evened out possible effects caused by the engagement level.

Figure 7.15: Correlation of general course performance (final exam score) and test scores of prepared (left) and unprepared (right) students.
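The correlation analysis described above can likewise be sketched in a few lines. The following illustrative pure-Python functions (the helper names are our own, not part of the study’s tooling) compute the sample Pearson coefficient and the associated t statistic used to judge its significance.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient for paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def r_to_t(r, n):
    """t statistic (df = n - 2) for testing whether r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

For r = 0.600 and N = 65 this gives t ≈ 5.95 with 63 degrees of freedom, far beyond the two-sided 1% critical value, which is consistent with the reported p < 0.01 for the prepared students.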

7.4 Additional findings

The fact that no differences between the experimental treatments were found has one positive side effect: it is possible to use the data to look at other aspects which might have had an influence on the results. We have already seen some of these factors in the last section. In this section, we will look at the data from some different points of view in order to get some insights into aspects that have not been explicitly addressed by the study.

7.4.1 Importance of a rewind function and intermediate steps

As was pointed out above, students in treatment CHANGING did slightly worse, on average, on the visual questions than the others. A reason for the difference might be that in this condition, step-wise forward and backward navigation was not possible. In fact, there was no division into sub-operations for this treatment either, and students were not able to rewind or undo their actions at all. Once an operation was selected, students had to watch and wait until it was carried out completely. If they felt they had missed or not understood an important point, they could not rewind but had to start the same example from the beginning. Hence, while the engagement level in this treatment was higher than in condition VIEWING, the options for navigation were more restricted. Naps and Grissom have conjectured that both the navigation along intermediate steps and the rewind capability could be important factors for learning [88]. The motivation for that hypothesis was that the latter feature was the most striking difference between their own study on Quicksort visualizations, which found a significant difference between VIEWING and CHANGING, and a similar one that did not find any significant results [57]. Intuitively, the importance of these functions should also grow with the complexity of the topic, since the likelihood of the learner not understanding a certain step will increase as well. The authors therefore suggest that a separate study should be conducted to further explore the importance of a rewind capability [88]. Although our own experiment was not explicitly designed to examine this question, our results may contribute some information and give at least a tentative answer: the performance of the treatment with no rewind function and coarse granularity of operations was slightly lower on visual questions (41.9 points) than that of the other participants (45.9 points).
When both the unprepared students and the non-native speakers are filtered out because of the interfering effects of these factors, the difference between the scores on visual questions of the participants who had a rewind function and intermediate steps and those who could not use these options turns out to be statistically significant (Mann-Whitney U = 205.500, Z = -1.977, p = 0.048; N = 55). Hence, even though the effect does not seem to be very strong, it may well be that the presence or absence of navigational elements such as a rewind capability or the division into intermediate steps can account for better or worse learning. In fact, our results suggest that this type of interaction may be more important than the actual engagement level, at least for the levels compared in the study.

7.4.2 Learning styles

7.4.2.1 Live vs. recorded lectures

It may be interesting to note that participants who had only watched the recordings of the lectures did not perform worse but slightly better than those who had attended the live lectures (cf. Figure 7.10), although the difference was not significant. This result might seem surprising at first, but it is in line with earlier findings by Zupancic and Horz on the performance of students who learn with recordings as a replacement for face-to-face lectures [123]; they also report a tendency towards stronger performance of the students who watch the recordings. One possible reason is the option to go through the lecture at the learner’s own preferred pace, going over difficult parts several times if necessary, and other navigational features, whose influence may be underestimated, as we have just seen. Another explanation does not assume a direct causal relationship but a common cause: students who have enough motivation and self-discipline to learn on their own with only recordings (i.e. without the outside pressure of a fixed time and place of a face-to-face lecture) are likely to be more motivated in general and hence perform better than those who lack this skill.

7.4.2.2 Group work vs. working alone

According to the survey, 57.3% of the participants work on the weekly exercises alone; the others prefer to work in groups. There are no notable differences in this ratio when personal background (gender, native language) is considered separately. When looking at major subjects, however, the teacher students showed dramatically different behavior: only one out of nine (11.1%) preferred to work alone. This confirms our own experience that students studying to be teachers are strong group workers, which seems to be an advantage especially in oral exams. Interestingly, this was not the case in our experiment, where this group showed the weakest performance (cf. Figure 7.13). It may well be that group workers are at a disadvantage when there is not much time for preparation, as was the case in our experiment. To examine this further, we also tested whether these two groups prepared differently: as can be seen in Figure 7.16, the loners were far more likely to have prepared by watching the recordings only (47%) than the group workers (25%). Also, the percentage of unprepared students (i.e. slides only, other materials, or nothing) was almost twice as high among group workers (35%).

Figure 7.16: Ways of preparation for students who work on exercises on their own and those who prefer to work in groups.

7.5 Conclusions

The study reported in this chapter could not confirm the hypotheses on the effects of the level of learner engagement with algorithm visualizations as proposed in [87]. Instead, the test scores were strongly affected by whether the participants had seen the accompanying lectures, and they correlated significantly with the participants’ general performance in the course. An explanation supported by the data is that the lectures introducing the topic had a greater influence on the test results than the visualizations. In retrospect, it might have been helpful to include a control group that was not exposed to any visualization but to texts on the same topic. In addition, the results suggest that navigational features such as the option to rewind an animation or to go through it in intermediate steps may in fact be more important than previously assumed. While sometimes listed as a key feature [105], this issue is not reflected in the engagement taxonomy [87]. We will address this problem and propose a refinement to the research framework in the next chapter.

One surprising finding was that students did not appear to have many difficulties with the rather advanced topic of Fibonacci heaps. As a consequence, this topic was also included one year later in the same course, taught by a different instructor.

Chapter 8

Visualization effectiveness research: an overview

8.1 Further empirical studies

In this section we present recent studies which were carried out in parallel with or after our own experiment and which add to the overall picture of when algorithm visualizations can be effective as learning aids. Most of them were presented at the 2006 Program Visualization Workshop or the 2006 ACM Symposium on Software Visualization.

8.1.1 VIEWING vs. CONSTRUCTING

The study by Urquiza-Fuentes and Velázquez-Iturbide [117] is the only other experiment known to us comparing the VIEWING and CONSTRUCTING levels within the framework proposed in [87], and it is probably the one that can best be compared to our own evaluation. However, their two treatments differed more from each other than the VIEWING and CONSTRUCTING groups in our experiment: while their VIEWING group also only watched a pre-fabricated animation, the CONSTRUCTING group received the source code of the algorithm and had to create an animation from it with the authors’ visualization system WinHIPE. Hence, their type of construction was different from ours, since students also had to deal with the code of the algorithm rather than just a visual representation. The authors found a significantly better learning outcome on the application level of Bloom’s taxonomy, but also on the comprehension level (for one out of four questions) [14]. However, this result must be treated with caution. The difference might also be attributed to the time the students spent with the visualizations: while both groups were allowed to take as much time as they needed, the students in the CONSTRUCTING group took almost three times as long as those in the VIEWING group on average. This is not surprising, as the former had to actually create an animation and not just watch one. However, it is unclear whether the longer time of exposure to the problem or the level of engagement was responsible for the different learning outcome. Nevertheless, and despite the low number of only 15 participants, the significant results indicate that this type of active construction, which involves the code of the algorithm, can actually lead to improved learning.

8.1.2 VIEWING vs. NO VIEWING

The hypothesis that a simple “movie-style” animation is no better than no visualization at all is apparently challenged by a study by Ahoniemi and Lahtinen [5]. They studied the effect of visualizations when students prepared for a programming course session. In addition to printed course material, one group received simple visualizations of the new contents before a homework assignment. The authors observed a significant difference in the test grades between the two treatments when only the “novices and strugglers” of each group were considered (unfortunately, without detailing how they arrived at this separation of the learners). They conclude that the visualizations did help the weaker students but not the stronger ones. However, this result could not be replicated in a second run of the experiment carried out one week later. Another methodological problem of that study is that the two treatments used different tools to accomplish the assignments. While the students in the control group used paper and pen, the VIEWING group had a specialized tool which allowed them to verify and visualize their code. Hence, the effect might have been caused by the students’ method of coding and feedback rather than by the movie-style visualizations.

8.1.3 VIEWING vs. RESPONDING

Rhodes et al. present a study on the effect of interactive pop-up questions built into algorithm animations [101]. In addition to testing the hypothesis that RESPONDING leads to improved learning compared to just VIEWING an animation, the authors were also interested in differences regarding the type of questions and whether or not immediate feedback was provided on the students’ answers. As in our own experiment, great care was taken to eliminate any additional factors that might influence the result. However, one severe weakness is the relatively low number of participants (N = 29 distributed over six treatments). Interestingly, the students who had to answer pop-up questions performed worse than those who simply viewed the animation (although the difference was not statistically significant). This may be surprising, but the result is in line with that of an earlier experiment on the same aspect [57]. Despite this finding, the students who received immediate feedback on their pop-up questions performed significantly better on the pop-up questions than those who got no feedback. The difference was not significant for the regular (non pop-up) questions in the post-test. Also, there was no significant difference between students who answered predictive questions (What will happen next?) and those who had to answer questions about previous steps (What did you just see?). The authors assume that the overall negative impact of the pop-up questions is due to the interruption of the higher-level visualization of the whole algorithm by lower-level questions on small details.

8.1.4 Representational aspects of animations

In addition to studies relating directly to the framework and engagement taxonomy proposed in [87], there have also been recent experiments studying the effects of purely representational aspects of algorithm visualizations on the learning outcome. They are interesting in this context because they specifically focus on algorithm animation, rather than attempting to answer the questions for animations in general, i.e. independently of the discipline and subject domain. Reed et al. report on an experiment evaluating the specific representational aspects of visual cueing and exchange motions in a Quicksort animation [99]. Visual cueing is the attempt to attract the viewer’s attention to specific objects, for instance by highlighting or flashing two objects in an array in order to signal that they are being compared. An exchange of two objects (such as swapping two elements in an array) can be animated in different ways. A very common method is to have the visual representations of the objects trade places, i.e. one moves to the location of the other and vice versa. Another way is to leave the objects at their original positions but change their shapes in such a way that at the end, each one looks like the other one did before. The experiment compared CUEING vs. NO CUEING as well as MOVE vs. CHANGE SHAPE for element exchanges in the Quicksort animation. No significant effects were discovered for questions asking for the overall comprehension of the algorithm. However, for two specific subsets of pop-up questions classified as cue-specific (e.g. Which two elements were just compared?) and exchange-specific (e.g. Which two elements were just exchanged?), a significant benefit of CUEING over NO CUEING and of MOVE over CHANGE SHAPE, respectively, was detected.
In summary, these variables show some influence on learning the low-level behavior of an algorithm when learners are asked immediately after these low-level steps occur, but not on the much more important overall comprehension of the algorithm. This finding is in line with the meta-study by Hundhausen et al., which observed that representational details seem to have no significant effects on learning [52].

8.1.5 Visualizations as programming aids

The focus of the framework described in [87] and adopted for the present work is on students’ understanding of algorithms and data structures. This understanding is usually tested with the help of questions in a (written) post-test. What is often neglected in these tests is the learners’ ability to actually code the respective algorithms. This aspect has not been evaluated in most of the above studies; in fact, none of the six levels in the engagement taxonomy necessarily involves active coding. One might argue that although the ability to code an algorithm is very strong evidence of its understanding, the opposite is not true: comprehension of the abstract concept of an algorithm may not require the ability to program it. However, a survey reported by Jain et al. indicates that one major reason for the high drop-out rate of computer science students is the missing bridge between understanding fundamental principles and their implementation [56]. Even students with appropriate expertise in a programming language apparently lack the skills to code the concepts they have understood. This is also supported by our own practical experience in supervising student projects in which algorithm visualizations are created; a student’s own successful implementation of an algorithm is very important and often takes as much or even more time than augmenting it with a suitable visualization. Jain et al. argue that the gap between understanding a concept and the ability to implement it as an algorithm may be bridged by programming environments which include automatically produced visual representations of the algorithms and data structures that students are coding. They conducted two experiments to test whether such views help students produce and debug code more efficiently and more accurately.
While the time taken by the test subjects with and without visualizations was nearly identical, there were significant differences both for the correctness of written programs and the number of errors found in debugging tasks. Students who had the visualizations performed significantly better than those who used the same programming and debugging environment without visualizations. It seems as if visualizations can be particularly helpful for those tasks.

8.2 Summary and future directions

The findings of experiments conducted within the framework of Naps et al. have only partially confirmed the proposed hypotheses. In general, the results are still inconsistent and often even contradict each other. This may be due to the fact that a number of the experiments did not strictly adhere to the proposed methods and procedures of the framework, often because of factors that could not be modified by the experimenters or that are not explicitly covered by or described in the research framework. A refinement of the taxonomy would appear very useful to us.

8.2.1 Conclusions

Given all the results we have compiled here, it would seem that the pedagogical value of algorithm visualizations is not overwhelming, even if they are engaging and require interaction from the users. However, it also seems as if certain navigational features can enhance the benefit of visualizations. In addition, our own experiment has led us to the conclusion that other learning materials accompanying the visualizations, such as introductory lectures, may play an important role, and their effects can even be stronger than possible effects of the engagement level of visualizations. Rather, we tend to claim that higher engagement is effective only if it involves more than just a visual representation of the data structures and algorithms. The only experiment reporting significantly better results for the CONSTRUCTING level involved students’ working with the code of the algorithm, not just constructing visual representations [117]. Similarly, very recent results such as those of Jain et al. confirm that the importance of actually coding the algorithms must not be underestimated for the students’ success [56]. Indeed, they found significant improvement when programming tasks are supported by visualizations that are part of the development environment. More research is required to verify this hypothesis. This issue points to another weakness – or vagueness – in the engagement taxonomy. Coding, as one important form of engagement, is not explicitly included there and may only be part of the CONSTRUCTING level. It would certainly be beneficial to subdivide that level into forms of construction that involve programming and those that do not.

8.2.2 A refined taxonomy

In the light of our own results and those of other studies, we propose the following refinements to the engagement taxonomy, including an explicit subdivision of two of the engagement levels, VIEWING and CONSTRUCTING.

Instead of a single VIEWING category, we suggest a distinction between PASSIVE VIEWING and ACTIVE VIEWING. The former describes those scenarios where viewing is unidirectional and uninterrupted. This means that learners cannot go back, and there are no fixed break points after intermediate steps (even though users may be able to pause the animation manually). Note that a large number of existing algorithm animations belong to this category. In most cases, the missing rewind function is due to the tight coupling of the animation to the actual algorithm: since the algorithm cannot be rewound, neither can the visualization.

Conversely, the category of ACTIVE VIEWING includes visualizations which allow users to go back to earlier stages of the algorithm and/or provide break points at which the animation stops to better highlight intermediate steps. Our results described in Section 7.4.1 suggest that ACTIVE VIEWING leads to significantly better learning than PASSIVE VIEWING. This hypothesis should be tested in future experiments. In Chapter 6, we have already pointed to the great heterogeneity among possible learning scenarios belonging to the CONSTRUCTING level. We propose an explicit subdivision of that category into what could be labeled CONSTRUCTIVE SIMULATION, HAND-CONSTRUCTING, and CODE-BASED CONSTRUCTING. Only the third of these categories involves working with actual code. This means that learners either program an algorithm themselves or augment the given source code in order to visualize it. Hence, the outcome of such a scenario is always a visualization directly connected with the respective algorithm. In contrast, HAND-CONSTRUCTING includes those settings where learners work, for instance, with graphical editors or use art supplies to create visualizations. Thus, Hundhausen’s low-fidelity type of visualization would belong to that category [50]. By CONSTRUCTIVE SIMULATION, we understand settings where learners simulate an algorithm in a predefined visualization environment by constructing its operation out of smaller building blocks. Examples include the MA&DA system [62], which was used for the study in Chapter 7, as well as the approaches described by Faltin [35] and Korhonen [60]. This is the weakest – or most passive – form of construction, since students neither code the algorithm, nor do they have to come up with their own ideas for a suitable visualization. In fact, one might even consider this engagement level as belonging to the category of CHANGING rather than CONSTRUCTING. The proposed subdivisions also allow us to refine the hypotheses given by Naps et al.
[87]. For example, the original general hypothesis “CONSTRUCTING is better than VIEWING” may have been confirmed for CODE-BASED CONSTRUCTING vs. PASSIVE VIEWING, but it would also predict that mere CONSTRUCTIVE SIMULATION results in better learning than ACTIVE VIEWING, which is rather doubtful in the light of our results. Similarly, an earlier study by Hundhausen suggested that HAND-CONSTRUCTING is not better than ACTIVE VIEWING [50]. Judging from our own experiment and those described in Section 8.2, another hypothesis we propose for evaluation is that CODE-BASED CONSTRUCTING leads to better learning than HAND-CONSTRUCTING or CONSTRUCTIVE SIMULATION. We note that these are not the only possible subdivisions within the framework. However, we are also aware that refinements to any taxonomy are useful only if the resulting categories are still general enough to group entities in a reasonable way. With our refined taxonomy, results of previous studies can still be compared; moreover, the comparison is more accurate and accounts for differences that we consider very important. Finally, we hope that future research on the effectiveness of algorithm visualization will benefit from more specific categories describing different levels of learner engagement and that a clearer picture of the influential factors in this area will emerge.

Part III

Supporting Rapid Creation of Interactive Algorithm Visualizations

Chapter 9

A system for “on the fly” generation of interactive algorithm visualizations

The system presented in this chapter addresses two of the major issues identified by the algorithm animation community. The first has been discussed in detail in the preceding chapters and mainly concerns the learners. As we have seen, there is evidence that the more actively students are engaged with visualizations, the better they will learn. Hence, algorithm animation systems should support as many levels of the engagement taxonomy as possible, from VIEWING to PRESENTING (cf. Chapter 6). The second issue is a problem chiefly concerning instructors. As a large-scale survey among computer science instructors conducted at the 2002 International Conference on Innovation and Technology in Computer Science Education (ITiCSE ’02) found, the main reasons for many teachers’ reluctance to use algorithm visualizations in their courses are the time it takes to find good examples, the time to learn how to use a visualization tool, and the time to create the visualizations [87]. Even though a great number of algorithm animation systems are freely available and offer convenient ways of creating visualizations, it is apparently too time-consuming to build useful examples, especially when actual programming is involved. It is therefore extremely important to support instructors in creating visualizations as quickly and with as little effort as possible, with an easy-to-use system. On the other hand, these visualizations should also be interactive, so that an instructor can develop ad hoc examples during the presentation rather than having to stick to a prefabricated animation. If one takes this goal to the extreme, one could imagine a system that would allow a teacher, during his or her presentation, to simply sketch an example of a data structure, say, a binary search tree, on an interactive display (see Figure 9.1).
The system would automatically recognize the structure and, upon simple pen gestures by the instructor, carry out the steps of an algorithm, for instance a rotation for rebalancing the tree, in a smooth animation directly on the instructor’s input drawing. With such a system, teachers would not only be able to create animations virtually without effort – they could also respond quickly to questions or comments from students, ask and answer “what if” questions, or let students create and present their own examples.

Figure 9.1: Sketching on a wall-size interactive whiteboard [134]. (Photo courtesy of W. Hürst and K. Mohamed.)

When integrated with other hardware and software in the classroom, such a system would also come close to the goal of ubiquitous computing [121], as the instructor would not even have to activate the animation system or switch back and forth between it and the normal presentation. If the system were used by learners outside class on their private computers, they could construct arbitrary examples much more easily than by interacting with most existing animation systems. They could use it to prepare in-class presentations. Moreover, collaborative learning could be supported by allowing students to share the drawing panel and construct visualizations together. In particular, all levels of the engagement taxonomy would be supported: learners could respond to “what-next” questions, specify the input, construct their own visualizations, and present them. The system presented in this chapter is a step towards this vision. It was developed in collaboration with two students, Robert Adelmann and Tobias Bischoff, whose invaluable contributions are gratefully acknowledged. Thanks are also due to Khaireel Mohamed, whose work on gesture recognition [81] could be used for our prototype implementation.

9.1 Goals and general principle

The main goal of the framework described in this chapter is to provide a generic architecture for “on the fly” generation of and interaction with visualizations. Although our focus is on data structures and algorithms, the framework as such is independent of any specific subject domain.

Recognizing structures from pen-based input can be seen as a chain process involving several steps. First, the input traces must be classified as either a primitive shape or a gesture command. The details of this low-level recognition process are not part of our work, as shape recognition is an extensive field of research of its own [122]. We have incorporated an existing system which not only provides robust recognition of primitive shapes based on a small number of training examples (cf. [106]) but also reliably disambiguates graphics from gestures, i.e. input intended as a drawing from input intended as a command [80]. These primitive objects (or collections of them) and gestures must then be classified as domain-specific objects and commands. For example, a circle together with a number drawn inside it may be classified as a node in the binary tree domain; a “crossing out” gesture drawn on top of a node object may be recognized as a delete command. Finally, interrelations between the domain-specific objects must be detected in order to infer structures; for instance, two nodes connected by a line should be interpreted as parent and child nodes in a tree. Likewise, commands must be interpreted as invocations of actions performed on the structure (rather than just a single object). For example, deleting a node in a tree usually involves a change of the complete structure, which implies much more than just removing the respective node object from the visualization. In order to do this, an actual instance of the data structure must be available within the system. When an action is carried out, it usually changes the structure and, consequently, its graphical representation. In our binary tree example, the rotation of a node will alter the shape of the tree considerably. Hence, the results of such actions must be propagated back to the original input visualization, where the collection of primitive objects is updated in order to visualize the change.
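To make the three stages of this chain concrete, they can be sketched in Java. All class names, fields, and the simple node/edge counting heuristic below are hypothetical illustrations invented for this sketch; they are not taken from the actual system, whose recognition is far more elaborate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the recognition chain described above.
// All names and the tree heuristic are illustrative assumptions only.
class Primitive {                 // stage 1 output: e.g. a circle or a line
    final String shape;
    Primitive(String shape) { this.shape = shape; }
}

class DomainObject {              // stage 2 output: e.g. a node in the tree domain
    final String type;
    DomainObject(String type) { this.type = type; }
}

class RecognitionChain {
    // Stage 2: classify primitives as domain-specific objects.
    static List<DomainObject> classify(List<Primitive> primitives) {
        List<DomainObject> objects = new ArrayList<>();
        for (Primitive p : primitives) {
            if (p.shape.equals("circle")) objects.add(new DomainObject("node"));
            else if (p.shape.equals("line")) objects.add(new DomainObject("edge"));
        }
        return objects;
    }

    // Stage 3: infer a structure from relations between domain objects.
    // Here a drawing with n nodes and n-1 edges is simply taken to be a tree;
    // a real recognizer would of course inspect the actual connections.
    static String inferStructure(List<DomainObject> objects) {
        long nodes = objects.stream().filter(o -> o.type.equals("node")).count();
        long edges = objects.stream().filter(o -> o.type.equals("edge")).count();
        return (nodes > 0 && edges == nodes - 1) ? "tree" : "unknown";
    }
}
```

In the real system, the output of the final stage is used to instantiate an actual data structure on the server, as described in the following sections.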
The goal of our system is to combine methods for shape and structure recognition with actual implementations of structures, which are instantiated according to the recognized input. Once the actual structure exists, it can then be manipulated, and the output will be visualized on the input. We note that our main focus is not on sophisticated recognition methods but on an open architecture supporting the complete process chain, in which existing or future recognition and visualization technologies as well as content domains can be “plugged in”. Before we give a detailed description of the architecture, we provide a brief overview of existing systems designed for similar tasks.

9.2 Related work

When looking at related work, we have to distinguish between structure recognition systems, animation and visualization tools, and systems developed in the context of collaborative learning. In the following paragraphs, a brief overview of existing systems and their characteristics will be given, with a focus on their relationship to and relevance for this work.

9.2.1 Structure recognition

There are a number of systems supporting the recognition of structures in specialized domains, most notably for design tasks. As a typical example, we mention the DENIM system for website design [91]. It supports web developers in the early stages of the design process by recognizing and organizing handwritten diagrams representing individual pages and links between them. Seven predefined gestures are supported in order to interact with the diagrams, for instance to cut, copy, paste, undo, redo, etc. These interactions, however, are purely on the editing level, i.e. they support the sketching process itself, whereas the goal in our system is to interact with (the semantics of) more complex structures. As we have mentioned above, the actual recognition process itself is not the main focus of this work. Nevertheless, knowledge from existing recognition systems has been used in order to build the framework that supports the recognition process. A system particularly relevant in this context is the SketchREAD engine described by Alvarado and Davis [7]. SketchREAD is a multi-domain sketch recognition engine that uses Bayesian networks and a combined bottom-up and top-down approach to recognize structures. There are two points that make it interesting for this work: the system itself is also independent of the domain and supports different domains through plug-ins, and its arrangement of the recognition procedure is similar to the one used in the above-mentioned framework. SketchREAD allows for the description of different domains using a hierarchical description language. Given the sketches and such a description of domain objects and patterns, the engine returns the recognized structure, also coded in the description language.
Citrin and Gross present a distributed architecture for diagram recognition from pen-based input, where low-level shape recognition is done on PDA clients, while higher-level diagram recognition from those shapes is carried out on the server with the help of spatial grammars [21]. Our system uses a similar division of tasks between server and clients. Both of the above tools are static in the sense that the structures can only be edited but not further manipulated on the domain level. The focus is clearly on the recognition part; the systems lack the support for higher-level domain-specific operations to be carried out on the recognized structures. What unites both systems with our architecture is the arrangement of the recognition process. It can be divided into the same three levels that have been described by Wenyin as a general principle of graphics recognition: primitive shape recognition, then composite graphical object recognition, and finally the detection of relations among these domain objects [122].

9.2.2 Interactive data structure animation

A central task of our system is to combine structure recognition with structure animation. As we have already seen in the preceding chapters, interaction is a relevant issue. In our application scenario, the easy and interactive construction of examples is particularly important. In many existing animation systems, this is accomplished by allowing the user to specify the input of an algorithm at the beginning, for instance a sequence of keys to be inserted into a search tree successively. Then the algorithm is run on that input and the result is visualized. One drawback of such an approach is that the instructor has to know beforehand what the example should look like. If, for example, the answer to a question by a student requires a different example, it would not be easy to create one. Other systems allow the creation of sample structures by a series of interactive steps carrying out individual operations on a data structure. This means that each step (e.g. an insertion in a tree) can be chosen after the last step has been executed. While this provides more flexibility, the construction of larger examples is very time-consuming. Moreover, both of the above interaction types allow only valid instances of the structure. Hence, no illegal states can be shown, even though this may sometimes be very instructive. The MA&DA system used for the study described in Chapter 7 does allow users to carry out illegal actions by an incorrect combination of sub-operations. However, the interaction is completely restricted to the sub-operations provided by the system.

Some systems, such as ANIMAL [104], include graphical editors which support users in drawing visualizations by providing ready-made objects or graphical primitives occurring frequently in the respective domain. An example would be the visualization of an array in the domain of algorithms. After the objects have been drawn, animation commands can be added either via the GUI or through a scripting language. Those animations can be stored for later replay. In such a scenario, however, an animation is completely predefined. Moreover, it is not linked to the actual algorithm it visualizes – such animations are no more than small movie clips. Hence, it is impossible to interact with the resulting animations. A system that combines the structure recognition and animation processes offers great advantages compared to those traditional algorithm animation tools. One is that each desired situation can be constructed “just in time”, which results in much more flexibility. Returning to the teaching scenario, this means that the lecturer is not restricted to previously created or recorded situations. This fact becomes especially relevant when recalling that the time needed to create a specific animation is a major reason for the still restrained use of algorithm animation tools in teaching. Of course, time has to be invested in our system too, but here the most time-consuming task is the creation of a module for a specific domain. Once such a module has been implemented, the creation of new situations or animations comes at virtually no cost. Another very distinct advantage of a system that works directly with the recognized user input, instead of with a separately created model, is its capability to blend completely into the normal workflow. The person using such a system can do so without having to interrupt her or his current work. No starting of external programs or switching between workspaces is necessary.
A structure can be drawn, explained, and animated on the fly.

In principle, our framework is independent of any specific system for animating the contents. The specific client implementations we have built for the prototype make use of the JEDAS algorithm animation system, since its built-in annotation feature has greatly facilitated the implementation of the graphical input [68].

9.2.3 Creation of animations by pen sketches

Some research has been conducted in recent years on creating animations by pen sketching. Davis and Landay describe a system by which users can draw objects and then create animations by simple pen interaction with those objects, such as dragging them to their destination position via an interactive timeline [27]. However, no structure recognition is involved in the process, and the focus of that research is on the quick creation of arbitrary movie-style animations for later replay, whereas our goal is to construct ad hoc examples for immediate use and interaction during a presentation.

9.2.4 Collaborative modeling tools

One additional feature of our system is its potential to be used as a tool for collaborative learning. Several users can work on the same input and interact with a structure together over a network. This relates the system to other tools for computer-supported collaborative learning. A prime example of such a system is the COLLIDE framework with its Cool Modes environment [97]. Cool Modes is a tool that supports collaborative modeling tasks for various domains and uses plug-in reference frames for encapsulating the semantics of different models. This plug-in concept is not the only characteristic it shares with our system: both systems provide collaborative, potentially distributed environments with shared workspaces. Unlike our own system, Cool Modes mainly focuses on modeling and simulation and lacks any recognition part. The system supports the input of sketches, but these are not interpreted; they are just treated as notes. Instead of recognizing arbitrary user input, the plug-ins encapsulating specific domains contain, among other things, so-called palettes. These palettes plug into the Cool Modes GUI and allow the users to select domain-specific objects and their relations directly. This palette architecture is also responsible for the restriction of the Cool Modes plug-in mechanism to visual languages: only graph-based structures relying on nodes and edges can be modeled [97]. In general, it can be said that our system is novel in its combination of the structure recognition and animation processes and their seamless integration into a minimalist user interface.

9.3 System architecture

The architecture of the system can be regarded from different perspectives, which are presented in the following sections. First, there is the physical structure of the distributed system, with the task division between the server, the client(s), and the infrastructure managing the communication between them. Second, the system has been designed as a domain-independent framework, with domain-specific modules that are plugged in to make it functional for certain subject domains. In addition to structuring the recognition and animation processes, the framework provides services to support the modules, so that the creation of new domain modules is facilitated. Finally, within the framework there is a logical separation of tasks within the structure recognition and animation process, according to the semantic level of specificity on which the individual system components operate and according to the different types of information that are processed (objects, commands, and actions).

9.3.1 Client-server architecture

The system developed has a distributed client-server structure, consisting of one server and an arbitrary number of clients, as depicted in Figure 9.2. Each client provides the user interface which accepts input from a user, transforms it into primitive graphical objects and commands, and hands those objects and commands on to the server for further processing. In addition, a client accepts and executes actions sent from the server, for instance instructions about updates of the structure. There are several reasons for using a distributed system. Some are discussed in detail by Citrin and Gross in their work on distributed architectures for pen-based input and diagram recognition [21]. Its major benefits are:

(1) Functionality: The support for many, potentially distributed, users working on the same system state allows for interesting collaboration scenarios. For example, keeping the teaching scenario from the introduction in mind, distributed lectures would be possible in such a shared-workspace environment. The existing possibilities can be extended further by a flexible rights management that allows the specification of rights for each user.

(2) Performance: The server application, including the potentially time-consuming high-level recognition process, can be executed on a powerful server machine, while the low-level input and output of data can easily be carried out on small devices, even on a PDA.

(3) Flexibility: In a distributed scenario, it is no problem to support different kinds of hardware devices with specialized clients. It is also possible to alter the server implementation, including the way general actions are created out of general primitives and commands, without having to redistribute the clients.

Technically, the Java RMI (remote method invocation) middleware is used for communication (cf. Figure 9.2). It is a middleware system that provides services similar to CORBA or RPC [23]. This middleware technology has been chosen because it provides complete location transparency to the programs using it: in the final program code, no distinction has to be made between local and remote calls. Once implemented, it does not matter whether the programs are running on the same host (as would usually be the case in the traditional lecture scenario) or are distributed over a set of connected hosts (for instance, for collaborative learning). Since the client and server are coded in Java, RMI with its focus on this language was a natural choice compared to other technologies such as CORBA. For a more detailed description of the actual Java implementation of the architecture, interested readers are referred to the technical specification in [2]. When looking at distributed systems, relevant topics include security, system robustness against failures, the coordination of clients, and the maintenance of a consistent global system state. Each of these points is addressed by different components of the infrastructure part. Since it increases the system’s robustness and simplifies its coordination, the clients are stateless in the sense that all relevant information on the system state is stored on the server. This way it becomes possible to reconnect a client after it has crashed or a connection has failed, and to continue working on the current system state without any data being lost. The low-level recognition of primitives and gestures from the pen-based input is done on the client, whereas the server takes care of the recognition of domain-specific objects and structures, i.e. the relations between those objects. In addition, the actual implementation of the data structure resides on the server.
When it is manipulated, the changes are sent back to the clients, which update the visual representation accordingly. This task division was motivated by the observation that the low-level objects, also referred to as graphical primitives, are rather domain-independent. These are geometric shapes such as lines, circles, and squares, as well as numbers or other characters. Also, basic pen gestures, for instance crossing out an object in order to delete it, tend to be similar across domains [21]. Moreover, instead of pure free-hand input, it is possible to use simple graphics editors (as are included in many presentation programs) supporting the creation of primitive objects. These objects can then be handed directly to the domain-specific structure recognition without the need for a low-level shape recognition process. One possible disadvantage of such a division is that it is more difficult to revise and repair misclassifications: as soon as the low-level recognizer on the client has classified some input, it must be taken “as is” by the server. In order to overcome this shortcoming, a back channel was introduced which enables the server to ask a client to revise its interpretation of a primitive or command if an alternative interpretation is available. This is interesting in cases where the low-level recognizer provides a confidence value for each object; this value (between 0 and 1) describes the certainty with which the object was recognized. Such a mechanism is important, since recognition accuracy will never be perfect on the different recognition levels, and the user’s intention can also be ambiguous [122].
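The decision logic behind such a back channel can be illustrated with a small sketch. The class names, fields, and the confidence threshold below are assumptions made solely for this example; the actual system’s revision protocol is described in its technical specification [2].

```java
// Hypothetical sketch of the confidence-based revision back channel.
// Names and the threshold value are illustrative assumptions only.
class RecognizedPrimitive {
    final String bestGuess;
    final String alternative;    // null if no second interpretation is available
    final double confidence;     // in [0, 1], as described in the text
    RecognizedPrimitive(String bestGuess, String alternative, double confidence) {
        this.bestGuess = bestGuess;
        this.alternative = alternative;
        this.confidence = confidence;
    }
}

class RevisionPolicy {
    static final double THRESHOLD = 0.6;   // assumed cut-off, for illustration

    // The server asks the client to revise only if the classification is
    // uncertain AND an alternative interpretation actually exists.
    static boolean shouldRequestRevision(RecognizedPrimitive p) {
        return p.confidence < THRESHOLD && p.alternative != null;
    }
}
```

Note that a revision request is pointless when the recognizer has no alternative to offer, which is why both conditions are checked.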

Figure 9.2: Overview of the distributed architecture of the system. An arbitrary number of clients can be registered and communicate with the server by RMI calls.

The client

The client is responsible for managing the user input of primitives and commands, and for executing the actions it receives from the server. Actions are not restricted to changes and updates of the visualization (i.e. animation actions) but can also be system-related, e.g. a request to revise the interpretation of a primitive. The client is not a fixed part of the system but is specified by a Java interface. This makes it possible to implement new clients depending on the usage scenario, for example to make use of a certain animation system or to integrate it with other software. Presently, two clients are available. Both are stand-alone applications and use the JEDAS library for displaying the visualizations and for rendering smooth animations. The first prototype, shown in Figure 9.3, does not include any low-level recognition; instead, it provides a simple graphics palette for primitive shapes (Figure 9.3, left) and a separate palette for commands (Figure 9.3, right). This client was implemented to have a tool that is independent of any low-level sketch recognition. The second client implementation makes use of the low-level shape and gesture recognition engine developed by Mohamed [81]. The appearance of the interface is kept as minimalist as possible, in order to come close to the envisioned scenario of a more or less invisible environment. The GUI consists only of a whiteboard surface, scrollbars for zooming and scrolling, and two small menus for changing the low-level recognition training profile and for adjusting some parameters (cf. Figure 9.4). When an input trace has been classified as a shape, it is morphed into a beautified version of that shape so the user gets feedback about the recognition process. Traces recognized as gestures disappear after a short period of time so that the whiteboard surface is not cluttered with unnecessary relics.

Figure 9.3: Screenshot of the first prototype client. Instead of low-level shape and gesture recognition, there are palettes for direct creation of graphical primitives and commands.

It would be possible to make the environment even less visible by integrating the client into another application, for instance the presentation software used for lectures, or to overlay it as a transparent layer over another application. Furthermore, the drawn objects could be left completely as they are so that they keep the user’s personal touch and flavor. (However, our experience has shown that users will be confused if no feedback at all is given on whether or not an object has been recognized.)
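Since the client is specified only as a Java interface, a new client boils down to implementing a handful of callbacks. The following is a speculative sketch of what such an interface might look like; all method names are invented for illustration, and the actual interface is defined in the technical specification [2].

```java
import java.util.List;

// Speculative sketch of a client interface as characterized in the text.
// Method names are invented; they merely mirror the responsibilities
// described above: sending input to the server and executing its actions.
interface VisualizationClient {
    // hand recognized primitives and commands on to the server
    void sendPrimitives(List<Object> primitivesAndCommands);

    // execute an action received from the server, e.g. an animation update
    void executeAction(Object action);

    // revise the interpretation of an earlier primitive on server request
    void reviseInterpretation(int primitiveId, String alternative);
}
```

A pen-based client, a palette-based client (like the first prototype), or a client embedded in presentation software would all implement this same contract.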

Infrastructure When a client connects to a server for the first time, it is automatically synchronized with the current system state on the server. The system state is represented by a list of all primitive objects that are visible at the current moment. This is not only useful for system stability but also allows for a late-join option, i.e. new clients can join and participate in running sessions at any time (also cf. [118]). In a distributed system with more than one client, infrastructure-related problems such as authentication and authorization, as well as the maintenance of consistency under concurrent actions initiated by different clients, need to be addressed. Since our focus is on the recognition and animation part, it would go beyond the scope of this chapter to outline how exactly the system handles these issues. Interested readers are referred to the technical

Figure 9.4: The current pen client with a minimalist user interface.

specification in [2]. Unless stated otherwise, in the remainder of this chapter we will assume for simplicity that only one client is connected, as would be the case, for instance, in a lecture presentation. Internally, a system-wide unique logical time is maintained, realized by a counter that is incremented after each relevant system event, e.g. any incoming object or gesture from the client or any action carried out on the data structure. This allows us to establish a unique labeling and a total order of all events. This would not be possible with a millisecond counter measuring real time, since a whole set of events can happen within one millisecond. The logical time also provides an easy way to access, for each event, its succeeding or preceding event.
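The logical time described above can be sketched minimally as a counter incremented per event; the class name is hypothetical.

```java
// Minimal sketch of the system-wide logical time: a counter incremented
// after each relevant system event, yielding a unique label and a total
// order for all events. The class name is hypothetical.
class LogicalClock {
    private long time = 0;

    // Called for each relevant event (incoming object, gesture, or action);
    // returns the event's unique logical time stamp.
    synchronized long nextEvent() {
        return ++time;
    }

    synchronized long current() {
        return time;
    }
}
```

Because the counter advances strictly by one per event, the predecessor and successor of an event with stamp t are simply the events stamped t-1 and t+1.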

9.3.2 Generic framework and domain-specific modules

The overall architecture of the system is completely domain-independent. This is necessary because the aim was to develop a system that can be applied to all sorts of domains. All the semantics needed for a specific domain are encapsulated in a so-called domain module. These modules are plugged into and managed by the core component, which is part of the server application (see Figure 9.5). In general, a module can be seen as an entity implementing the transformations required to map the user input to the specific structure represented by the module, and to inform the GUI about changes of the structure.

[Figure 9.5 shows n modules plugged into the core, which communicates with the GUI; the framework additionally provides services to the modules.]

Figure 9.5: Overview of the framework–module architecture.

Framework The framework component has three main tasks. First, the core provides a skeleton for the recognition and animation process, independently of the domain. This structure will be described in detail in the following sections. Second, it contains an interface to the input from the GUI running on the client. As mentioned above, the client is not a fixed part of the system but can be customized to fit a certain scenario. The framework specifies how clients communicate with the server (cf. Section 9.3.1). Third, the framework provides optional services to the modules. These services comprise tasks which are likely to occur across different domains and hence will be required by many modules. Their purpose is to relieve programmers of new modules of the burden of implementing those tasks themselves. The most important service is a basic recognition system that supports the recognition of structures by analyzing geometrical properties of objects. This service will be described in more detail in Section 9.6. Other services include a reset function and a recording service which can be used to capture and replay the whiteboard interaction during a session.

Modules Seen from the outside, a module takes information about primitive shapes and gestures (as provided by the client) as input and uses that information, together with its current state, to produce general actions as output. This transformation is performed with the help of domain-specific semantic knowledge. Modules encapsulate this semantic knowledge, together with an actual implementation of the (data) structure they represent.

A new domain can be supported by creating a module for it and plugging it in, i.e. registering it with the core. Only the combination of the domain-independent framework and the domain-specific modules forms a functioning system. The sharp encapsulation of semantics in modules allows for some interesting options. For example, it is possible to register more than one module at a time (cf. Figure 9.5). All registered modules can then react concurrently to the given input. It is also possible to activate only some of them, for example the one that best fits the input received so far.

In principle, a module is completely free in how it realizes the transformations. However, in order to keep modules as compact and clear as possible, they are supported by a framework. This framework structures the recognition and animation creation processes performed inside a module. Among other things, the framework also provides some services to the modules. For example, a multi-purpose recognition service has been implemented which is used by all the example modules created so far. Most importantly, however, the goal is to keep the modules structured along the same layers and streams as described above. Hence, a module usually specifies its domain on three semantic layers: the structure (i.e. the relationships between domain objects), the domain objects themselves, and the primitive graphical objects they consist of.

In general, a domain module can be seen as a set of transformations. It transforms the input it receives into some form of output, according to its domain-specific semantics. A module can have an internal state; hence its output does not exclusively depend on its input but also takes into account the current state of the module. The input to a module can be divided into primary and secondary input, depending on the information source.
The primary input is created by a client or another module, and consists of the general primitive and command streams. The secondary input contains information provided to the module by the core management. Similarly, the output a module creates is also divided into primary and secondary output, depending on the recipient. Primary output is destined for the clients and consists of general actions. All other output is labeled secondary and can be of any form, since modules are in no way restricted in their capabilities. For example, a module can write output to files or pass information to other programs. Modules can make use of the full power of a programming language.

In summary, a module is a software component that receives primary and secondary input, and produces primary and secondary output. This transformation process depends on its semantics and is supported by the framework. The core management system communicates with a module through the methods defined in the ModuleInterface interface. This interface provides access to the module's attributes and all relevant module components.

When developing the module concept, two requirements or module characteristics were considered to be important. One major requirement was that the only restriction for a module should be imposed by the form of its primary input and primary output. The structure recognition and animation creation process performed in a module should be completely unrestricted in terms of how it is implemented or how it should work. This is important, as our aim is to keep the system open for alternative existing or future recognition techniques. In fact, although the framework supports the structuring of the recognition process by a skeleton of semantic layers and information streams, as well as a recognition service, modules are not forced to make use of these options. This support provided by the framework is the result of the second important characteristic.
An “ideal” module should only contain domain-specific parts. All other parts that the module also requires to perform its task (but that are independent of the domain semantics) can and should be handled by the framework. The goal was to keep the modules as compact and clear as possible, in order to facilitate the development of further domain modules. We are aware that this is the most critical part regarding the development of visualizations. The implementation of a new module is far from trivial, and most instructors will not be willing to create modules by themselves. Our vision is that a growing library of modules becomes available to instructors, so that they do not have to do any programming at all when preparing for their classes. New modules could be developed by students in course projects; this is how most of the currently available modules were created. An overview of all existing modules is given in Section 9.7.
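The module contract described above — general primitives and commands as primary input, general actions as primary output — might be sketched like this. The method names are illustrative, not the actual ModuleInterface signatures; the toy counting module only shows the data flow.

```java
// Hypothetical sketch of the module contract: primary input arrives as
// general objects and commands, primary output leaves as general actions.
// Method names are illustrative, not the actual ModuleInterface signatures.
import java.util.function.Consumer;

interface Module {
    String getDomainName();                    // a module attribute
    void onGeneralObject(Object primitive);    // object stream (primary input)
    void onGeneralCommand(Object gesture);     // command stream (primary input)
    void setActionSink(Consumer<String> sink); // action stream (primary output)
}

// A trivial module that counts received primitives and acknowledges each
// one with a general action, purely for illustration.
class CountingModule implements Module {
    int objectsSeen = 0;
    private Consumer<String> sink = a -> {};

    public String getDomainName() { return "counting"; }

    public void onGeneralObject(Object primitive) {
        objectsSeen++;
        sink.accept("highlight " + primitive);
    }

    public void onGeneralCommand(Object gesture) { /* no commands supported */ }

    public void setActionSink(Consumer<String> sink) { this.sink = sink; }
}
```

Everything beyond this contract — how the module recognizes structures internally — remains deliberately unconstrained, as required above.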

9.3.3 Division into layers of specificity

As we have mentioned above, the framework structures the recognition and animation processes. One dimension of this structure is the subdivision of the recognition process into three general tasks: low-level shape and gesture recognition, domain-specific object recognition, and the detection of relationships between those objects. Accordingly, the whole system is organized in a sequence of layers, or semantic levels. The purpose of these layers is to reduce the system complexity, to isolate independent tasks from each other, and to provide services to other layers. This structure promotes the system’s openness and extendibility.

The layers can be labeled RAW, GENERAL, DOMAIN, and LOGICAL, according to the data they manage and contain (see Figure 9.6). The task of each layer is to store all information that is available on that particular level of specificity, so the recognition system can transform it to the next higher level. Technically, each layer is implemented as a list of all the elements contained in it.

The RAW layer consists of the ink traces drawn by the user. This layer is only found on the client. The GENERAL layer contains a list of all domain-independent primitive shapes and gestures which have been classified from the raw pen input. Note that this layer also forms the interface between clients and the server; hence, instances of the general objects exist both on the server and on the clients (see Figure 9.6 bottom). These shapes and gestures are then interpreted as and transformed into domain-specific objects and commands, which are stored on the DOMAIN layer. Each object in that layer contains references to all primitive shapes (in the general layer) that it consists of.

[Figure 9.6 depicts the four layers and their contents: pen traces (RAW, on the client), primitive shapes and gestures (GENERAL, shared between client and server), domain objects and commands (DOMAIN), and the data structure and its operations (LOGICAL).]

Figure 9.6: Structure recognition is organized in different layers. Each layer consists of objects on a certain level of domain specificity.

However, domain objects still have a position, size and other attributes by which they relate to each other.

The most specific LOGICAL layer essentially maintains an actual instance of the data structure and manages the operations, i.e. the algorithms to be carried out on that structure. Objects on that level are purely abstract in the sense that their attributes do not relate to any visual representation. We will describe the different layers in some more detail in Section 9.4. The arrows in Figure 9.6 represent the flow of information between the layers, which runs from least specific to most specific for the recognition process and in the opposite direction for the animation process. The dashed arrow indicates an optional connection: only if the client retains the original ink traces after low-level recognition will there be a need to modify them. In our present client implementation, ink traces are replaced by graphical objects immediately after classification, so only the latter need to be modified by an animation.
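The referencing between the layers might be sketched as follows, with hypothetical class names: a domain object keeps references to the primitives it consists of, and a logical object to its domain objects, so the representations on all levels can be reached from one another.

```java
// Sketch of cross-level references between layers (hypothetical names):
// domain objects reference the primitives they consist of, and logical
// objects reference their domain objects, so each level can be reached
// from the others.
import java.util.ArrayList;
import java.util.List;

class GeneralPrimitive {
    final String kind;
    GeneralPrimitive(String kind) { this.kind = kind; }
}

class DomainObject {
    final List<GeneralPrimitive> primitives = new ArrayList<>();
}

class LogicalObject {
    final List<DomainObject> domainObjects = new ArrayList<>();

    // Collect all primitives this logical object is represented by.
    List<GeneralPrimitive> allPrimitives() {
        List<GeneralPrimitive> result = new ArrayList<>();
        for (DomainObject d : domainObjects) {
            result.addAll(d.primitives);
        }
        return result;
    }
}
```

The lists allow many-to-many relationships: a domain object may consist of several primitives, and one primitive may belong to several domain objects.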

9.3.4 Information streams

The second dimension of the structure provided by the framework is the division into types of information. The communication between the client(s) and the server, as well as the data flow within the core system, is based on three information streams, which are illustrated in Figure 9.7. All three streams occur throughout the system layers just described.

1. The object stream consists of the visible graphical objects created on the client and their transformations into domain objects and structures.

Figure 9.7: The information flow within the system consists of three separate streams for objects, commands, and actions.

2. The command stream contains that part of the user’s input that is intended to interact with the structure, i.e. gestures. On their way across the layers, those gestures are transformed into domain-specific commands and operations invoked on the data structure. The direction of the object and command streams is from the user interface on the client to the system core and hence, across the layers, from raw input to the most domain-specific level. Note that it is the task of the low-level recognizer between the RAW and GENERAL levels to separate the user’s pen input into these two streams, i.e. to disambiguate between shapes entering the object stream and gestures forming the command stream. One technique for robust disambiguation in cases of great similarity between a shape and a gesture is the analysis of lag and lead times of the respective pen traces, as described by Mohamed [80].

3. The action stream runs in the opposite direction, i.e. from domain-specific to domain-independent. It consists of the operations carried out on the actual implementation of the data structure and transports this information back to the user interface on the client(s), where it is visualized appropriately. Note that, again, it is completely up to the client how changes to objects are visualized, e.g. whether by discrete steps or by smooth animations.

The separation into three information streams is a rather natural one, given the tasks of the system. In addition, the different streams can be handled independently during client-server communication in order to achieve higher flexibility. This way it is possible to use different programs for the input of primitives and commands than for the output of the structure; for example, specialized pen-input software taking advantage of sophisticated features such as pen pressure could be used for low-level recognition, whereas an advanced animation system may be used for displaying the resulting visualization.
Also, it would be possible to connect passive clients that only display what is happening but provide no means for user interaction. This opens up interesting scenarios, in particular for collaborative learning.

Technically, the streams consist of transporter classes passing the output of one layer on to the next. There are classes for objects, commands and actions; these are explained in detail in Section 9.4.

9.3.5 The framework structure

As mentioned before, a main goal is to keep the module implementations as compact and simple as possible. Taking this as a starting point, it makes sense to pre-structure processes that recur frequently in module implementations and to shift frequently occurring domain-independent tasks into the framework. This has to be done without restricting the possible module implementations. At first sight, these requirements seem to contradict each other. The solution found in this work is to provide an optional framework, where the module implementations themselves can decide which parts of the framework they use, or whether they use it at all.

The complete core architecture of the framework is visualized in Figure 9.8. The horizontal arrows indicate the three information streams, while the vertical blocks represent the semantic levels. The general level (shown at the left) receives its input, i.e. general objects and commands (gestures), from the clients via the infrastructure and also sends its output there. Only the parts with light background color, i.e. the mapping of objects, commands and actions, as well as an implementation of the data structure, need to be provided by the modules. This is accomplished by implementing the respective interfaces given by the framework. The allocation of a yellow part to actual classes in a module is quite flexible: all the yellow parts could be implemented by the same Java class, or several Java classes can be used to realize one yellow component. On the right-hand side, we see the logical representation of the structure that should be recognized and animated.

There are two basic principles behind the framework structure shown in Figure 9.8. The first one is to maintain the separate object, command and action streams inside the modules. This makes sense since, in one way or the other, each module must handle the creation of, interaction with, and output by the structure it represents.
The object stream is used for the structure recognition; along the command stream, the interaction with the recognized structure is handled; and actions created along the action stream change the structure. The second principle is to divide the processes inside the modules into the semantic levels described above. It makes sense to apply this separation not only to the structure recognition process, but also to the command recognition and action creation processes. The next section describes the function of the architecture in more detail by looking at an example.

Figure 9.8: Overview of the core architecture, including the streams and system layers. Yellow parts are implemented by the domain modules.


9.4 Illustration by example

As a simple example, let us take a linear list, for which a prototypical module has been implemented (cf. Section 9.7.1). This module represents a singly-linked linear list and allows the user to draw and delete nodes, to swap the positions of two nodes, to sort the list, or to draw and remove links between nodes. The following paragraphs will now describe the general purposes of the different module parts displayed in Figure 9.8, and show how they work in the linear list module. We look at each of the three information streams individually.

9.4.1 Object stream

Suppose that a user wants to draw a node in order to create a linear list and starts with a circle. If this shape is classified correctly by the low-level recognition on the client, a GeneralPrimitiveOval object will be created and transmitted to the server. Note that this process is completely independent of the module. The GeneralPrimitive class is specified by the framework. It contains an independent coordinate grid so clients using a different visualization system can simply transform the coordinates accordingly when displaying objects. In essence, a general object is an abstraction from what is actually displayed on the user interface; however, it is still domain-independent. For example, a general circle is specified by its position, its radius and its color. This is how it arrives at the framework’s GeneralObjectManager (see Figure 9.8 left). When the GeneralObjectManager is informed about a newly drawn, changed or removed primitive, it will hand this information to the module’s GDObjectMapper (GD is an abbreviation for GENERAL to DOMAIN). The purpose of this class is to recognize and construct domain objects from the existing general primitives. For example, when the GDObjectMapper detects that a number is located inside a circle, it will construct a new DomainNode object. A DomainNode object is defined by a certain position and a value, and it contains references to the general objects it consists of (for instance, a GeneralPrimitiveOval object and a GeneralPrimitiveText object drawn inside the oval). A DomainLink object contains two coordinates, its starting point and its end point. In contrast to a GeneralPrimitiveLine object, it contains the semantic information that this line represents (or could represent) a link in the domain of linear lists. Once the domain object has been created (or changed, or removed), the GDObjectMapper will inform the framework’s DomainObjectManager. 
In turn, this class informs the DLObjectMapper about a newly created, changed or removed domain object (DL stands for DOMAIN to LOGICAL). The task of this part is to recognize or maintain the logical structure contained in the LogicalStructure part, given the available domain objects. The DLObjectMapper first constructs the LogicalNode and LogicalLink objects from the information about domain objects, links them to the respective domain objects, and adjusts the list representation stored in the LogicalStructure part if necessary. What we get at the end is a logical representation of the structure, or structure state, drawn by the user. Logical objects abstract completely from the visual representation. For example, a LogicalNode object only contains an integer representing the node’s value and a reference to the corresponding domain object. A LogicalLink contains two references to LogicalNode objects: one to the node it originates from, and one to the node it points to. Additionally, it also has a reference to the corresponding domain link object. Note that the logical objects are the same ones used in the actual implementation of the data structure.

Between general objects and domain objects, just as between domain objects and logical objects, many-to-many relationships are possible. This means that a domain object can consist of more than one primitive (as the nodes of a linear list do), and that one primitive can also be part of several domain objects. While the latter case does not occur in the linear list domain, one can imagine, for instance, a vertical line in the representation of an array, which could be the right border of the cell to its left and, at the same time, the left border of the cell to its right; hence changes to either cell might modify that line. The same relationships are possible for domain and logical objects.
When looking at a domain-specific object, such as a node of a linear list, it can be said that such an object has three representations inside the framework, one on each semantic level. A node is represented by primitives (i.e. general objects), by a domain object and by a logical object. These different representations are successively built and mapped to each other during the recognition process. It should be rather obvious that references between these objects are necessary: given a logical node object, all primitives associated with it can easily be obtained. Conversely, given a primitive, the domain and logical objects it belongs to can be determined. The framework provides abstract classes for domain objects and logical objects, which have to be extended by every domain or logical object created in the module implementation. The abstract classes support the many-to-many relationships between objects on different semantic levels, since this is a frequently occurring task.

Let us now take a look at the LogicalStructure part located at the right side of Figure 9.8. This object, or collection of objects, contains the actual implementation of the algorithm, data structure, or whatever else should be represented by the respective module. In our example, it contains the implementation of a singly-linked linear list, realized through LogicalNode objects that are connected with the help of LogicalLink objects. The explicit representation of links has been chosen because it allows for a more consistent explanation. Alternatively, a link in the actual implementation could simply consist of a reference (i.e. a pointer) in one LogicalNode to another LogicalNode.
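The GENERAL-to-DOMAIN recognition step for a list node can be sketched as follows; the geometric test (a text primitive lying inside an oval) is the kind of rule the GDObjectMapper applies. All class shapes here are simplified stand-ins, not the framework's actual GeneralPrimitive classes.

```java
// Simplified sketch of the GENERAL-to-DOMAIN mapping for a list node:
// when a text primitive lies inside an oval, a domain node is constructed
// that references both primitives. Class shapes are illustrative only.
class Oval {
    final double cx, cy, r;
    Oval(double cx, double cy, double r) { this.cx = cx; this.cy = cy; this.r = r; }

    boolean contains(double x, double y) {
        return Math.hypot(x - cx, y - cy) <= r;
    }
}

class Text {
    final double x, y;
    final String value;
    Text(double x, double y, String value) { this.x = x; this.y = y; this.value = value; }
}

class DomainNode {
    final Oval circle;   // references to the general primitives
    final Text label;    // this domain object consists of
    DomainNode(Oval circle, Text label) { this.circle = circle; this.label = label; }
}

class GDObjectMapper {
    // Returns a domain node if the text lies inside the oval, null otherwise.
    DomainNode tryBuildNode(Oval oval, Text text) {
        return oval.contains(text.x, text.y) ? new DomainNode(oval, text) : null;
    }
}
```

The references stored in DomainNode are what make the many-to-many mapping between levels navigable in both directions.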

9.4.2 Command stream

For the command stream, the structure interaction process works in a similar way. The framework informs the GDCommandMapper object about newly received general commands, for example a new “cross” gesture performed at a certain position. This general command represents that cross and contains the intersection point of the two lines it consists of. Other gestures may specify two points as their main information, e.g. a connecting arc between two objects, which, in the linear list domain, might stand for exchanging two elements of the list. A gesture could also involve a whole region as a parameter for a command; for instance, a large oval encircling several objects could stand for selecting these objects in order to apply the next operation to all of them simultaneously.

Returning to our “cross” gesture: crossing out an object stands for the deletion of a node or link in the linear list module. Hence, the GDCommandMapper creates a DomainCommandDelete object and hands this domain command to the DomainCommandManager. This DomainCommandDelete object contains as its main information the coordinate at which something should be deleted. After the framework has informed the DLCommandMapper, this class uses the domain objects and information about the existing logical structure to create a logical command out of the received domain command. In our example, it would look up which domain node is located at the position specified in the DomainCommandDelete object, obtain the associated logical node N and create a LogicCommandDelete object containing a reference to the logical node that should be deleted. The created logical command is then handed to the framework, which informs the LogicalStructure part about the new logical command. That component would then call the respective method, for instance delete(N), on the actual instance of the linear list.
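The DOMAIN-to-LOGICAL step of this pipeline — resolving the coordinate carried by a delete command to the logical node located there — might look roughly like this. The class shapes are simplified, hypothetical stand-ins for the framework classes described above.

```java
// Simplified sketch of the DOMAIN-to-LOGICAL command mapping: the domain
// delete command carries only a coordinate; the mapper looks up which node
// lies at that position and references it in the logical command.
import java.util.List;

class LogicalNode {
    final int value;
    LogicalNode(int value) { this.value = value; }
}

// A domain node, reduced to its position, extent, and the logical node
// it is associated with.
class PlacedNode {
    final double cx, cy, r;
    final LogicalNode logical;
    PlacedNode(double cx, double cy, double r, LogicalNode logical) {
        this.cx = cx; this.cy = cy; this.r = r; this.logical = logical;
    }
}

class DLCommandMapper {
    // Resolve the coordinate of a delete gesture to the logical node there.
    LogicalNode resolveDelete(double x, double y, List<PlacedNode> nodes) {
        for (PlacedNode n : nodes) {
            if (Math.hypot(x - n.cx, y - n.cy) <= n.r) {
                return n.logical;
            }
        }
        return null; // no node at that position
    }
}
```

The logical command built from the resolved node would then trigger delete(N) on the actual list instance.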

9.4.3 Action stream

It remains to describe the action stream, which is responsible for handing changes of the logical structure back to the clients, whose task it then is to visualize these changes. It is clear that all intermediate semantic levels must also be notified about such changes in order to maintain consistency. The action stream starts at the LogicalStructure part. Let us continue with our previous example and assume that a node is removed from a linear list. When this operation is carried out on the actual instance of a linear list, it involves (up to) three actions: the deletion of one node, the deletion of the link to the successor of that node, and the modification of another link (the link from the predecessor to the deleted node is redirected to the successor node). For each of those actions, the respective logical action object is created and handed to the framework’s LogicalActionManager. In our example, we have a LogicalActionDelete object which contains references to the LogicalNode and LogicalLink objects that should be deleted, and a LogicalActionModify object containing a reference to the modified LogicalLink object.

Once the framework has received a logical action, it hands this object on to the module implementation’s DLActionMapper class. This class maps from the LOGICAL to the DOMAIN level – the abbreviation DL (rather than LD) was chosen to maintain consistency with the object and command streams (cf. Figure 9.8). Here, the corresponding domain actions are created that execute the tasks intended by the logical action. For example, when the DLActionMapper receives a LogicalActionDelete object, it will create a DomainActionDelete object and hand it to the DomainActionManager. This

DomainActionDelete object now contains a reference to the domain objects that are associated with the logical objects specified in the original LogicalActionDelete object. Likewise, if a LogicalActionModify object for a LogicalLink is received, the associated DomainLink object has to be updated: its coordinates must be set to the coordinates of the DomainNode objects associated with the LogicalNode objects referenced in the LogicalLink. In addition, a DomainActionModify object is created and passed on. After the DomainActionManager has informed the GDActionMapper about a new domain action, this class will finally create general action objects which delete or update those general primitives that are part of the domain objects referenced in the received domain action. Once created, the general action object is passed on to the framework’s GeneralActionManager, which informs the Infrastructure Management component about it, which in turn sends the appropriate commands to the connected clients. General action objects are basically a representation of animation directives. For example, the GeneralActionPDelete object contains a reference to the general primitive object(s) that should be removed.
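The fan-out of a single list deletion into (up to) three logical actions can be sketched as follows; strings stand in for the logical action objects described above, and the class shapes are simplified.

```java
// Sketch of how deleting a node from a singly-linked list fans out into
// (up to) three logical actions: delete the node, delete its outgoing
// link, and modify the predecessor's link to point to the successor.
// Strings stand in for the logical action objects described in the text.
import java.util.ArrayList;
import java.util.List;

class ListNode {
    final int value;
    ListNode next;
    ListNode(int value) { this.value = value; }
}

class LinkedListStructure {
    ListNode head;

    List<String> delete(ListNode target) {
        List<String> actions = new ArrayList<>();
        ListNode pred = null;
        for (ListNode n = head; n != null; n = n.next) {
            if (n.next == target) pred = n;
        }
        actions.add("delete node " + target.value);
        if (target.next != null) {
            actions.add("delete link " + target.value + "->" + target.next.value);
        }
        if (pred != null) {
            pred.next = target.next;  // redirect predecessor's link
            actions.add("modify link " + pred.value + "->"
                    + (target.next == null ? "null" : target.next.value));
        } else if (head == target) {
            head = target.next;
        }
        return actions;
    }
}
```

Each returned entry corresponds to one logical action object handed to the LogicalActionManager and propagated down the action stream.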

9.4.4 Advantages

This framework structure offers several advantages. One is that a module implementation is free to decide which parts to use and to which extent it implements a specific part. Besides that, the framework allows for a reduction of complexity by dividing the main processes performed in a module into three distinct semantic levels and three streams of information, without restricting these processes. Each part of a module located between two semantic levels is concerned with distinct tasks relevant only for that level. Between the general and domain levels, decisions have to be made regarding the actual appearance of domain objects, the possible interactions and the visualization of actions, for example whether a node in a list is represented by a circle, a rectangle, or a combination of primitives. The tasks performed by the module parts located between the domain and logical levels are independent of such lower-level decisions. There, the overall structure is recognized and manipulated, independently of the low-level representation of domain objects. For instance, it is specified that a domain link requires two logical nodes (a source and a destination) in order to be mapped to a logical link.

The logical structure is independent of any structure recognition and manipulation tasks. This is another advantage of the framework: it allows for a quite sharp separation between the actual implementation of the algorithm or data structure that should be represented by a module and the associated recognition and action creation processes. Such a separation makes it possible to extend or modify the logical structure without the need to adjust the recognition or action creation process. For example, if we want to visualize a new sorting algorithm in our linear list module, this algorithm only has to be implemented in the logical structure, completely independently of the actual node positions or appearances.
Once the recognition and action processes are implemented for a domain, changes and innovations implemented on the logical level will automatically be supported; no further adjustments are necessary.

9.5 Creation of animations on the client

There is one very important difference between the object and command streams on the one hand and the action stream on the other. While new objects and commands can simply be handled and interpreted in the same sequence in which they were issued by the user, an exact timing of actions is essential for creating meaningful and appealing animations. Let us consider the example of swapping two nodes in a linear list. A common way to animate this operation is the following:

1. Highlight the two nodes, e.g. by flashing their color and changing it to a more striking one. This is done for both nodes at the same time.

2. Move each node object to its new position in a smooth animation. Again, both nodes are moved simultaneously.

3. Change the colors of the nodes back to the original color (both at the same time).

As can be seen, even a simple animation like this consists of an arrangement of several individual actions, some of which occur simultaneously while others take place sequentially. In usual live-animation settings, where the algorithm is tightly coupled with the visualization (cf. Chapter 2.2.3), this issue is not a problem, since the algorithm controls the animation directly. In our scenario, the coupling of algorithm and animation is rather loose; only changes to objects are transmitted, while each client is free to decide whether or not these changes should be animated at all. Hence, some sort of action management is required to handle the timing relations between actions. Without it, every action would be executed immediately after the client received it. Since the server is not informed about the end of an action, there would be no way to ensure that the execution of an action a2 starts immediately after action a1 has finished. Informing the server about the end of an action would not be very useful, for several reasons. The first one is the possible diversity of clients.
There can be many different kinds of clients connected to the server, resulting in a wide variation of possible execution times for a specific action. A centralized animation control on the server would also be difficult because of the asynchrony of the underlying network. Even when informing the server about every finished action, it would not be possible to guarantee that action a2 starts immediately after a1 on all clients, because of the unbounded message transmission times and delays inherent to asynchronous systems [23]. The solution chosen in our system is to specify for each action its execution time in relation to other actions. This allows the server to create action sequences by defining an execution condition for each action; it is then the task of each client to ensure the correct sequence of actions. If no condition is specified, the action is executed immediately. Otherwise, one of the following execution conditions can be selected: an action can be executed either after all actions previously received from the server have been completed, or after one specified action has been completed (each action can be uniquely identified through its logical time stamp, cf. Section 9.3.1). New and more complex conditions can be added if required.

Figure 9.9 shows the information flow inside the action management component of the client. Central to it are two action lists. The waiting list contains all general action objects whose execution condition has not been met so far, while the execution list stores all actions that are currently being executed. If a new general action object is received, we check whether its execution condition is fulfilled. If not, it is added to the waiting list; if yes, it is added to the execution list and handed to the appropriate component for execution. Later, when this component notifies the action management system about the end of the execution, the action is removed from the execution list. The remaining actions in the waiting list are then checked, and those whose execution conditions are now fulfilled are added to the execution list.
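The waiting-list mechanism just described can be sketched as follows. This is a minimal Java illustration; the names Action, ActionManager, receive and finished are hypothetical, not the framework's actual API, and the two supported conditions ("after all previously received actions" and "after one specified action") follow the description above.

```java
import java.util.*;

// Illustrative sketch of the client-side action management (hypothetical API).
class Action {
    final int id;                 // logical time stamp (unique per action)
    final Integer afterActionId;  // execute after this action; null = no such condition
    final boolean afterAll;       // execute after all previously received actions
    Action(int id, Integer afterActionId, boolean afterAll) {
        this.id = id; this.afterActionId = afterActionId; this.afterAll = afterAll;
    }
}

class ActionManager {
    private final List<Action> waiting = new ArrayList<>();
    private final List<Action> executing = new ArrayList<>();
    private final Set<Integer> completed = new HashSet<>();
    private final List<Integer> received = new ArrayList<>();
    final List<Integer> executionOrder = new ArrayList<>(); // for inspection only

    // Called when a new general action object arrives from the server.
    void receive(Action a) {
        received.add(a.id);
        if (conditionMet(a)) start(a); else waiting.add(a);
    }

    // Called by the executing component when an animation has finished.
    void finished(int actionId) {
        executing.removeIf(a -> a.id == actionId);
        completed.add(actionId);
        // Re-check all waiting actions; start those whose condition now holds.
        List<Action> ready = new ArrayList<>();
        Iterator<Action> it = waiting.iterator();
        while (it.hasNext()) {
            Action a = it.next();
            if (conditionMet(a)) { it.remove(); ready.add(a); }
        }
        ready.forEach(this::start);
    }

    private boolean conditionMet(Action a) {
        if (a.afterAll) {
            for (int id : received)
                if (id != a.id && !completed.contains(id)) return false;
            return true;
        }
        return a.afterActionId == null || completed.contains(a.afterActionId);
    }

    private void start(Action a) { executing.add(a); executionOrder.add(a.id); }
}
```

For the list-swap example above, the two highlight actions would carry no condition, while the move actions would name the highlight actions' time stamps in their execution conditions.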

Figure 9.9: Activity diagram for processing incoming action objects on the clients.

9.6 Basic recognition service for data structures

Since some ingredients of basic structure recognition are likely to occur in many different modules, it makes sense to provide support for these frequently needed tasks in the framework. This way, the module implementations can be kept small, in accordance with the second fundamental module characteristic. In order to demonstrate the usefulness of the framework and to implement the example modules described in the next chapter, some basic structure recognition tools were required. A simple but powerful recognition service was implemented for use in the example modules. It supports the recognition of general primitive relations and properties, such as spatial coordinates, color, or time of creation of an object. The service consists of two layers. The first layer contains basic methods to check whether two primitives are contained in each other or intersect, or whether a general primitive contains a specified general coordinate. Based on this first layer, the second one operates on sets of objects, using a filter method and filter objects. A filter object encapsulates object properties in the form of logical predicates. The interface describing filter objects dictates only one method:

public boolean valid(Object o)

This method returns true if the specified object fulfills the properties represented by the respective filter object. The filter method takes a set of objects together with a filter object and returns the set of all entries of the input set for which the valid method of the filter object returns true. In addition, the second layer provides operations to obtain the union, intersection, and difference of two sets. This simple filter architecture allows for rather complex expressions, since each produced output set can directly be used as input for the next step. Figure 9.10 shows an example.

Figure 9.10: Example of an expression for recognizing object attributes and relations. (Starting from the set of all primitives, a ColorFilter yields all green primitives and a ContainedInFilter all primitives contained in primitive A; intersecting the two sets yields all green primitives that are contained in primitive A.)

Both layers, the basic methods in the first one and the filters used in the second one, support the specification of uncertainties. For example, when checking whether one primitive A is contained in another primitive B, a certain fuzziness can be allowed which still classifies object A as contained in B although a small part of A lies outside the bounding box of B. The amount of uncertainty allowed can be specified. The recognition system comes with a set of predefined filter classes. There are different filters for primitive shapes, creation times, attributes, and the relationships of primitives to each other. One big advantage of this structure is that module implementations are not restricted to the predefined filters but can build their own. These can contain domain-specific semantics or be higher-level filters which use available filters to check more complex object properties. Given this system, it was possible to develop the example modules quickly and with little effort.
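The two-layer architecture and the expression of Figure 9.10 can be sketched as follows. This is an illustrative Java sketch: the Primitive and FilterService classes, and the reduction of the first-layer containment check to bounding boxes without fuzziness, are simplifying assumptions, not the framework's actual implementation.

```java
import java.util.*;

// A primitive reduced to a color and an axis-aligned bounding box.
class Primitive {
    final String color;
    final int x1, y1, x2, y2;
    Primitive(String color, int x1, int y1, int x2, int y2) {
        this.color = color; this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
    }
}

// The single method dictated by the filter-object interface.
interface Filter { boolean valid(Object o); }

// Second layer: filter method and set operations.
class FilterService {
    static List<Object> filter(List<Object> in, Filter f) {
        List<Object> out = new ArrayList<>();
        for (Object o : in) if (f.valid(o)) out.add(o);
        return out;
    }
    static List<Object> intersect(List<Object> a, List<Object> b) {
        List<Object> out = new ArrayList<>(a);
        out.retainAll(b);
        return out;
    }
}

class ColorFilter implements Filter {
    final String color;
    ColorFilter(String color) { this.color = color; }
    public boolean valid(Object o) { return ((Primitive) o).color.equals(color); }
}

class ContainedInFilter implements Filter {
    final Primitive container;
    ContainedInFilter(Primitive container) { this.container = container; }
    public boolean valid(Object o) {
        Primitive p = (Primitive) o; // first-layer containment check, no fuzziness
        return p.x1 >= container.x1 && p.y1 >= container.y1
            && p.x2 <= container.x2 && p.y2 <= container.y2;
    }
}
```

Chaining works as in the figure: filtering the set of all primitives once with a ColorFilter and once with a ContainedInFilter, and intersecting the two result sets, yields all green primitives contained in primitive A.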

9.7 Implemented modules

While the previous sections have introduced the overall system, this section describes the example modules implemented so far. Besides the four domain modules, which will be described in detail, two small service modules have been created. The recording module allows a user to control the recording system by pen gestures, while the reset module enables a gesture-controlled reset of the whole system. If the user starts the recording by performing a certain gesture, a small recording sign appears on the workspace and the recording of actions mentioned in Section 9.2.3 is started. Recording can be stopped by issuing a stop command, in which case the recording sign is removed; a third command starts replaying the recorded part. The reset module is comparable to the recording module. However, instead of providing access to the recording system, it allows users to reset the system by a certain gesture. Such a system reset clears the screens on all connected clients and resets all registered modules. The creation of new modules is supported by three so-called module frames: a minimal, a normal, and an explicit frame. These frames contain predefined classes and methods and are fully functional except for the missing method implementations. A minimal module frame consists of only one class, while the explicit module frame is made up of several classes and sub-packages and contains exemplary access to some framework services. Hence, the creation of new modules for further subject domains is facilitated as much as possible. All modules support different feedback levels. The feedback level is a measure that determines how much feedback a user gets when drawing different domain objects and can be chosen freely. The higher the feedback level, the more the objects are beautified, i.e. the input graphics are morphed into a tidier layout.
(Note that since the low-level recognition of graphical primitives from freehand lines is part of the client, the actual look of the primitives depends on the implementation on the client. In our current pen client implementation, each recognized trace is automatically rendered as a tidy representation of the recognized primitive.)

9.7.1 Linear list module

The linear list module has already been used as an example before. It encapsulates a singly-linked linear list. The module allows the user to draw a linear list and to modify it in various ways. A linear list consists of nodes and links; nodes consist of ovals with a number inside them, and links of lines. The different elements can be drawn in an arbitrary way and order; the list will still be recognized correctly. Figure 9.11 shows a typical situation. Besides drawing new nodes and links, there are several other ways of interaction. It is possible to switch the positions of two nodes, to mark a selected set of nodes, or to insert new nodes between two marked nodes. If a user deletes a node or link, the resulting linear list will be recognized accordingly. Furthermore, a sorting algorithm implemented in the LogicalStructure can be executed either step by step, or the list can sort itself automatically. All these operations are smoothly animated. An obvious application scenario of this module is an introduction to linear lists. It can also be used to explain different sorting algorithms, where it may be more appropriate to interpret the list as a linear array. With the filter-based recognition service introduced in Section 9.6, the recognition of domain objects is very simple and straightforward. A new domain node is recognized by simply checking whether there is a number inside an oval and whether neither object is part of an already existing domain node. A domain link is recognized by checking whether both endpoints of a line object are contained in (or very close to) domain nodes.

Figure 9.11: Screenshot of the linear list module in action. Only meaningful nodes and links are recognized as parts of the structure.

The standard filter objects provided by the recognition service are sufficient to perform these checks. The recognition of the logical structure is very dynamic: if a link or node is removed, the resulting linear list will be recognized correctly. The logical structure is recognized with the help of specialized domain-dependent filter objects. Central to the recognition process is one filter object that takes a domain node n together with a list L of domain links and, when applied to a list of domain nodes, returns all those nodes from the list that are connected to node n via a link contained in L. If a new domain node or link has been drawn, the logical structure is detected using the following approach: First, the filter object just mentioned is used to obtain all domain nodes that are connected to the last linear list node by a domain link and that are not already contained in the linear list. If this collection of domain nodes is not empty, the first-drawn node in it is selected; for this domain node a logic node is constructed and added to the logic representation of the linear list. The whole process is repeated recursively until there are no more domain nodes connected to the last linear list node that are not already part of the list. Afterwards, the same process is performed using the first list node instead of the last. If a list node or link is removed by the user, the linear list is reduced to only a start node, and the recursive process described above is used to recognize the new resulting structure.
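The recursive extension of the list from its last node can be sketched as follows. This is a simplified Java illustration: Node, Link and ListRecognizer are hypothetical names, the filter objects are inlined as plain loops, and the symmetric second pass from the first list node is omitted.

```java
import java.util.*;

// Simplified stand-ins for the module's domain objects.
class Node { final int value; Node(int value) { this.value = value; } }
class Link {
    final Node a, b;
    Link(Node a, Node b) { this.a = a; this.b = b; }
    boolean connects(Node n) { return a == n || b == n; }
    Node other(Node n) { return a == n ? b : a; }
}

class ListRecognizer {
    // Extend the recognized list from its last node as long as some link
    // leads to a node not yet contained in the list.
    static List<Node> recognize(Node start, List<Link> links) {
        List<Node> list = new ArrayList<>();
        list.add(start);
        boolean extended = true;
        while (extended) {
            extended = false;
            Node last = list.get(list.size() - 1);
            for (Link l : links) {
                if (l.connects(last) && !list.contains(l.other(last))) {
                    list.add(l.other(last)); // the first matching node is taken
                    extended = true;
                    break;
                }
            }
        }
        return list;
    }
}
```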

9.7.2 Binary tree modules

The two binary tree modules can be seen as extensions of the linear list module (very much in the same sense as trees can be seen as extensions of lists). The goal of these modules is to exemplify more complex types of structure manipulation, such as deletions from a tree or rebalancing operations, which do not only change small details of the appearance but the complete shape of the tree. Both modules represent standard binary search trees and also support rebalancing by rotations. The symmetric module always maintains a tidy shape of the tree by applying the layout algorithm described by Reingold and Tilford [100]. This means that every recognized node is automatically moved to its position according to the layout; this way, the tree always has a uniform look. The second version leaves the layout completely up to the user's drawing; hence, the nodes are arranged as the user has drawn them. However, it may occur that restructuring the tree by a command, for instance a rotation, results in a "bad" shape, i.e. nodes may overlap, links may cross each other, etc. The recognition of node objects is the same for both modules and identical to the list module. However, an additional domain object was introduced representing abstract subtrees. Such an object can be added to a node as a child but cannot have children of its own, since it stands for a complete subtree. Graphically, a subtree is represented by a triangle, as can be seen in Figure 9.12. Supporting subtrees as objects is particularly useful in teaching scenarios, for instance when explaining operations such as rebalancing; in such a case, creating each subtree explicitly by drawing all of its nodes would take considerable time.

Figure 9.12: Example of a binary tree. Abstract subtrees are represented by triangles.

Recognition of the tree structure consists of detecting the parent-child relations given by links. This is quite straightforward, as it simply requires, in addition to a line connecting two nodes, a comparison of their y-coordinates. Likewise, left and right siblings can easily be distinguished by their x-coordinates. Users can either draw a tree (even if it is not a legal search tree) or build and modify it by commands. In order to insert a new value into a search tree, it is sufficient to draw a node with that value (anywhere on the screen) and issue the insert command. The node is then automatically inserted by a smooth animation, according to the search tree conditions. When the tree is built and modified by commands, both modules try to maintain a "tidy" graphical representation of the tree, always depending on its current shape. Two nodes can be exchanged in exactly the same way as described for linear lists. Directly deleting a node with a gesture command only works for nodes with at most one child, where there is no ambiguity about the action. However, the deletion of an internal node with two children can still easily be accomplished by simulating the steps of the standard delete operation (cf. [94]): first exchange the node with its symmetric predecessor or successor (which always has at most one child) and then delete it. A tree can be rebalanced by carrying out rotations (cf. our extensive discussion of rotations in PSP and PST in Chapters 3 and 4). A simple left or right rotation gesture on a node triggers the respective rotation with that node as the "root" of the rotation, as shown in Figure 9.13.
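The coordinate comparison can be sketched as follows. This is an illustrative snippet, not the module's code; it assumes the usual screen coordinate system, in which y grows downward, so the parent of a drawn link is the node with the smaller y-value.

```java
// Classify a link between two node centers (hypothetical helper class).
class TreeLinkClassifier {
    // Returns the role of the lower node relative to the upper one.
    // Screen coordinates assumed: y grows downward, so the child lies
    // at the larger y-value; the side follows from the x-coordinates.
    static String classify(int parentX, int parentY, int childX, int childY) {
        if (childY <= parentY)
            throw new IllegalArgumentException("child must lie below its parent");
        return childX < parentX ? "LEFT_CHILD" : "RIGHT_CHILD";
    }
}
```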

9.7.3 Petri-net module

This module represents and simulates a state-transition Petri net [95]. A state-transition net is a bipartite graph consisting of state nodes and transition nodes; state nodes are either empty or contain a token.

Figure 9.13: An arrow-shaped pen gesture on a node invokes a right rotation of that node, which will be smoothly animated.

In state-transition nets, a transition node is ready to "fire" if all state nodes connected to it through incoming links contain a token and all state nodes connected to it through outgoing links are empty. Petri nets were chosen as a domain because they have a more complex structure than linear lists; state-transition nets in particular were used because of their easy and concise formal definition. The implemented module can easily be extended to support further kinds of Petri nets. The module allows the user to model a Petri net by drawing state and transition nodes and by connecting them with links. Similar to the linear list module, the recognition is very flexible and dynamic: it is irrelevant in which order or manner the user draws the different elements. Depending on the feedback level, the module allows either the creation of arbitrary structures or only the creation of valid nets. For example, on a high feedback level, the user is not allowed to connect a state node directly to another state node; the connecting line would not be interpreted as a link. Figure 9.14 contains an example of a net that models a mutual exclusion situation, where two processes cannot both be in a critical section at the same time. Besides drawing nodes and links, the user can paint tokens inside state nodes and delete elements. It is possible either to simulate the firing of only one selected node, or to simulate a complete execution step of the net, which corresponds to the firing of all ready transition nodes. A firing transition node is visualized by removing the tokens in all its preceding state nodes and by creating new ones in its succeeding state nodes. If the user selects an area, all transition nodes inside that area switch their status between marked and unmarked. Marked transition nodes signal through their color, and the color of the links connected to them, whether or not they are ready to fire. This module is also well suited for an application in teaching and can be used for a quick and easy simulation and discussion of state-transition Petri nets. The possibility to clearly visualize the status of a model by coloring the transition nodes and links is helpful in both areas.
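The firing rule just described can be sketched as follows. StateNode and Transition are minimal stand-ins for the module's domain classes, not its actual implementation.

```java
import java.util.*;

// A state node either holds a token or is empty.
class StateNode { boolean token; StateNode(boolean token) { this.token = token; } }

class Transition {
    final List<StateNode> in = new ArrayList<>();   // incoming state nodes
    final List<StateNode> out = new ArrayList<>();  // outgoing state nodes

    // Ready iff all incoming states hold a token and all outgoing states are empty.
    boolean ready() {
        for (StateNode s : in) if (!s.token) return false;
        for (StateNode s : out) if (s.token) return false;
        return true;
    }

    // Firing removes the tokens from the preceding states
    // and creates new ones in the succeeding states.
    void fire() {
        if (!ready()) return;
        for (StateNode s : in) s.token = false;
        for (StateNode s : out) s.token = true;
    }
}
```

A complete execution step of the net then simply fires all transitions that are currently ready.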

Recognition. The recognition of domain objects is very straightforward. A domain state node is represented by an oval and a domain transition node by a rectangle. Tokens are recognized as ovals drawn inside domain state nodes that contain no token so far. A link is represented by a line drawn between two nodes of different types.

Figure 9.14: Screenshot of the Petri-net module.

For the recognition of the logical structure, a similar approach as in the linear list module was chosen. The logical structure can be recognized with the development and application of two basic filter objects. The first filter object, ConnectedLinksFilter, takes a domain node n and, when applied to a list of domain objects, returns a list of all domain links that either start or end in node n. The second domain-dependent filter object, ConnectedNodesFilter, takes a node n together with a set L of links and, when applied to a list of domain nodes, returns all nodes that are connected to n via a link in L. These two domain-specific filters are based on the domain-independent filter objects provided by the framework and on three further ones: one that, when applied to a list of domain objects, returns all domain links, one that returns all domain state nodes, and one that returns all domain transition nodes. Every time a new node or link is drawn or removed, the whole structure of the Petri net is recognized using the following approach:

1. From the list of all domain objects, determine all domain nodes (both transition and state nodes).

2. Use the ConnectedLinksFilter to obtain, for each node n, the list L1 of all domain links that start at n and the list L2 of all domain links that end in n.

3. Use the ConnectedNodesFilter together with the node n and the link lists L1 and L2 to retrieve all nodes of a different type that are connected to n.

9.7.4 CONNECT4 module

The largest and most sophisticated module implemented so far represents the well-known game Connect Four [126], which is shown in Figure 9.15.

Figure 9.15: Screenshot of the CONNECT4 game module. The player with the dark pieces has just won the game.

This module encapsulates a completely different domain and therefore shows the flexibility of the system regarding the types of domains that can be represented by modules. It also serves as an example for the use of several of the framework services. The difference in domains is also reflected by the extent of the module's logical structure part. While in the other modules this part is covered by one or two classes, the game CONNECT4 is contained in a separate package. This package contains the pure implementation of the game, including the game dynamics (whose turn it is, whether a player has won, etc.), the game "intelligence" (making it possible to play against the computer), and all other game-relevant issues. As in the other modules, this logical structure can be changed and extended quite independently of the recognition and animation parts. The game starts with the creation of a playfield, which can be done by drawing a grid consisting of an arbitrary number of (approximately) vertical and horizontal lines. Once the playfield is created, the user has several ways to perform a turn. It is possible to draw a piece, fill it if required, and select a column in which to put it. A turn can also be performed faster by simply selecting the column into which the current player wants to throw his or her next piece; a piece is then automatically created and inserted. By issuing another command on the playfield, the current turn is executed by the computer, which selects a column based on the integrated decision algorithm. This allows for three scenarios: users can play against each other, against the computer, or the computer can play against itself. Besides inserting pieces, users have several other ways of interaction.
They can delete all pieces inside a playfield, completely remove playfields, or "cheat" by deleting selected pieces already contained in a playfield. The latter results in a situation where all pieces above the deleted one fall down by one level. If one player wins, the row of pieces responsible for the win is highlighted (see Figure 9.15). If all fields inside the playfield are filled, it is cleared. The module supports several playfields at a time, in which case different games can be played simultaneously. As a side note, due to the underlying distributed structure of the system, we get a multi-user network game "for free" without having to spend a single thought on that fact. The decision algorithm for the computer's moves is very simple; the next move is determined with the help of a simple heuristic. For each column, a value is determined that reflects how many pieces of the current player would be in a row if a piece were thrown into that column. If the other player would win by throwing a piece into a specific column, the value of this column is maximized. At the end, the column with the highest value is chosen; if two or more columns have the same value, one of them is selected randomly. This approach has a lot of weaknesses, but it works sufficiently well for the computer to win from time to time against average players. Since the game has been solved (for the standard grid size, the first player can always win, cf. [6]), a perfect computer player could also be implemented. The recognition of domain objects such as pieces and playfield parts is also rather straightforward. Each line represents a potential part of the playfield, while an oval is interpreted as a domain piece. All primitives drawn into a recognized domain piece are added to that piece's list of associated primitives and are therefore moved and manipulated together with it.
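The column heuristic can be sketched as follows. This is an illustrative Java sketch, not the module's actual code; the board representation (0 = empty, 1/2 = player pieces) and the exact scoring are assumptions.

```java
import java.util.*;

// Sketch of the simple column heuristic described above (hypothetical class).
class Connect4Heuristic {
    static int chooseColumn(int[][] board, int player, Random rnd) {
        int best = Integer.MIN_VALUE;
        List<Integer> candidates = new ArrayList<>();
        for (int c = 0; c < board[0].length; c++) {
            int r = dropRow(board, c);
            if (r < 0) continue;                 // column full
            int value = lineLength(board, r, c, player);
            if (lineLength(board, r, c, 3 - player) >= 4)
                value = Integer.MAX_VALUE;       // opponent would win here: block
            if (value > best) { best = value; candidates.clear(); }
            if (value == best) candidates.add(c);
        }
        return candidates.get(rnd.nextInt(candidates.size())); // random tie-break
    }

    // Row a piece would land in, or -1 if the column is full.
    static int dropRow(int[][] board, int col) {
        for (int r = board.length - 1; r >= 0; r--)
            if (board[r][col] == 0) return r;
        return -1;
    }

    // Longest line through (r, c) if 'player' placed a piece there.
    static int lineLength(int[][] board, int r, int c, int player) {
        int[][] dirs = { {0, 1}, {1, 0}, {1, 1}, {1, -1} };
        int max = 1;
        for (int[] d : dirs) {
            int len = 1 + count(board, r, c, d[0], d[1], player)
                        + count(board, r, c, -d[0], -d[1], player);
            max = Math.max(max, len);
        }
        return max;
    }

    private static int count(int[][] b, int r, int c, int dr, int dc, int player) {
        int n = 0;
        for (int rr = r + dr, cc = c + dc;
             rr >= 0 && rr < b.length && cc >= 0 && cc < b[0].length
                 && b[rr][cc] == player;
             rr += dr, cc += dc) n++;
        return n;
    }
}
```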
The recognition of the playfield is the most difficult part. The recognition process can be divided into several steps. The starting point consists of the line primitives drawn so far by the user. Before the positions of the individual fields can be calculated (which are required in order to place the pieces), the lines that should belong to the playfield must be determined. As a first step, we remove all those lines that are not (approximately) vertical or horizontal. "Removing" in this context means that they will not be considered any further for the playfield recognition. This is done with the help of two filter objects: when applied to a list of primitives, one filter returns all approximately vertical lines, the other all approximately horizontal lines. In the next step, the average lengths of all remaining horizontal lines and of all vertical lines are determined, and those lines are removed whose lengths differ by more than a certain value from the respective average. Afterwards, the average x-values of the left and right endpoints of all horizontal lines are determined, and all horizontal lines whose left or right endpoint differs too much from the respective average are removed. The same is done with the vertical lines (considering their upper and lower endpoints). In the final playfield, no intersections among the (approximately) horizontal lines or among the vertical lines are permitted. Hence, all horizontal lines that intersect another horizontal line are removed; this is done by successively removing lines, starting with the ones with the most intersections, until there are no more intersections. The same is done for the vertical lines. Now the lines of the playfield are known, and what remains is to determine the positions of the fields. In order to obtain them, the lines must be sorted first: the horizontal lines according to their (average) y-values, the vertical ones by their x-values. This is possible since there are no intersections between the lines. After sorting, auxiliary lines are calculated that lie exactly in the middle of two neighboring horizontal lines; the same is done for the vertical lines. The desired field positions can now be calculated, since they correspond to the intersections of the auxiliary lines just computed. This way of playfield recognition has turned out to be quite reliable. However, it has to be mentioned that the approach of using average values to remove non-relevant lines only works if the user solely intends to draw a playfield and does not draw other structures requiring line primitives.
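The final step, computing the field positions from the sorted grid lines, can be sketched as follows. This is an illustrative simplification: grid lines are reduced to single sorted coordinates, and the class and method names are hypothetical.

```java
import java.util.*;

// Sketch of the field-position computation (hypothetical helper class).
class PlayfieldLayout {
    // xs: sorted x-values of the vertical lines; ys: sorted y-values of the
    // horizontal lines. The field centers are the intersections of the
    // mid-lines between neighboring grid lines.
    static List<int[]> fieldCenters(int[] xs, int[] ys) {
        List<Integer> midX = midlines(xs), midY = midlines(ys);
        List<int[]> centers = new ArrayList<>();
        for (int y : midY)
            for (int x : midX)
                centers.add(new int[] { x, y });
        return centers;
    }

    // Coordinates lying exactly in the middle of two neighboring lines.
    private static List<Integer> midlines(int[] v) {
        List<Integer> mids = new ArrayList<>();
        for (int i = 0; i + 1 < v.length; i++)
            mids.add((v[i] + v[i + 1]) / 2);
        return mids;
    }
}
```

Three vertical and two horizontal lines, for example, yield a single row of two field centers.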

9.7.5 Module management

In principle, the framework allows an arbitrary number of modules to function at the same time. These can be different modules encapsulating different tasks and domains, or different instances of the same module, each with different settings. This coexistence of modules requires some form of control, since there is only one global state and all modules work on this state.

Imagine two modules: module a represents a linear list and module b the CONNECT4 game. When a user draws, for instance, a circle, module a will interpret it as (part of) a node, while module b may recognize it as a piece and move it around the playfield. The management system needs a mechanism to control such situations. This is done by managing the information flow to and from modules. Each module can be either enabled or disabled; in addition, it can be either active or inactive. A module is only informed about incoming primitive or command events if it is enabled, and when a module issues a general action, this action is only handed on to the infrastructure component (and from there to the clients) if the module is active. All four combinations of these settings are possible. Given a set of modules and the possibility to enable and activate selected ones, what is still required is a policy that specifies which modules should be enabled and which should be active. Currently, the status of a module can only be set manually when starting the server, or dynamically via the server GUI (see Figure 9.16). Keeping in mind the original goal of the system in a learning environment, it would be very useful to have an automatic policy for setting the status of the different modules. At the moment, a lecturer has to activate the linear list module manually if he or she wants to explain something from that domain. It would be much more convenient if the framework were able to recognize that the primitives the lecturer is drawing and the commands he or she is issuing fit the list module best, and to deactivate the other modules as a result (so that the conflict situation described above cannot occur).

Figure 9.16: User interface for server configuration, including the change of a module’s status, a visualization of the current system state, and a recording window.

The system provides a mechanism that prepares it for the implementation of a simple automatic policy. Each module has to maintain a so-called fitting value. This value between 0 and 1 represents how well the input that a module has received so far fits its domain. A value of 1 indicates that the received input fits the domain of that module perfectly, a value of 0 means that the input does not fit at all. Based on this value, the core management could, for instance, activate only the module(s) with the highest current fitting value or deactivate a module if its fitting value falls below a certain threshold.
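The status flags and the fitting value could be combined into such an automatic policy as sketched below. This is illustrative Java only: ModuleStatus, ModuleManager and the concrete policy (activate only the modules with the highest fitting value, subject to a threshold) are assumptions based on the suggestion above, not the framework's implementation.

```java
import java.util.*;

// Per-module status as described in the text (hypothetical classes).
class ModuleStatus {
    final String name;
    boolean enabled = true;  // receives primitive and command events
    boolean active = false;  // its actions are forwarded to the clients
    double fitting = 0.0;    // in [0, 1]: how well the input fits the domain
    ModuleStatus(String name) { this.name = name; }
}

class ModuleManager {
    // Activate only the enabled module(s) with the highest fitting value,
    // and only if that value reaches the given threshold.
    static void applyPolicy(List<ModuleStatus> modules, double threshold) {
        double best = 0.0;
        for (ModuleStatus m : modules) best = Math.max(best, m.fitting);
        for (ModuleStatus m : modules)
            m.active = m.enabled && m.fitting >= threshold && m.fitting == best;
    }
}
```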

9.8 Summary and outlook

This chapter has introduced a novel approach to creating interactive visualizations "on the spot" by sketching examples with a stylus. The system developed for this task supports the recognition of data structures from the user's freehand input and interaction with the recognized structures by pen gestures.

The framework that we have presented has yet to prove its suitability in real-world situations such as lectures or collaborative learning sessions; so far, it has only been used in selected presentations and undergone informal usability tests. Obviously, new modules for additional data structure domains and algorithms should be developed to cover a growing body of subjects. We consider graph algorithms (such as those for finding shortest paths or minimum spanning trees) very promising candidates, since graphs are prime examples of structures for which sketching is an efficient way of construction. Of course, domains not related to computer science could also be supported, for instance molecular or crystalline structures in chemistry. For many domains, the basic recognition service provided by the current version of the system will not be sufficient, and more advanced techniques based on statistical methods or machine learning may be required. Future possibilities for research also include the concurrent interpretation of input by several different domain modules; a completely automatic detection requires methods for the automated selection of the most suitable domain in a given situation.

Chapter 10

Conclusions and future work

This thesis has investigated the potentials and limitations of interactive visualizations for exploring and explaining data structures from the perspectives of both algorithm analysis and algorithm teaching and learning. When analyzing algorithms, a deep understanding of the involved data structures is essential. We have shown for a relatively recent data structure, the priority search pennant, how a suitable visualization can lead to initial conjectures about the complexities of operations on the data structure, which are then verified through formal proof. Priority search pennants are an interesting data structure, as they relax the rigid heap order known from a similar, well-known data structure, priority search trees. As a result, priority search pennants are easier to maintain under update operations, since rotations are simpler. While the overall asymptotic complexity of insertions and deletions is the same as for priority search trees, the actual cost is reduced. Moreover, a wider variety of balancing schemes, as well as relaxed balancing, can be applied. This advantage comes at a small cost: only one of the typical priority search queue operations, enumerateRectangle, has a worse asymptotic complexity than in priority search trees [49]. In many application domains this query is not required; instead, another type of range query, minXinRectangle, is essential. We have shown that this operation has the same asymptotic complexity in priority search pennants as in priority search trees; the same was proven for the maxXinRectangle and minYinXRange operations. In addition, we have established 2·h (where h is the height of the tree) as a tight upper bound for the worst-case search path lengths of these queries in priority search pennants, and found that this bound is smaller than the corresponding bound for priority search trees, a result that was not expected.
Moreover, it was shown that the search paths for these queries in priority search pennants contain at most one fork, which allows for a more efficient practical implementation of the operations. Queries of the type minXinRectangle for points in the plane correspond to queries on sets of ranges used in IP packet classification, where for a given destination address the most specific range containing that address must be found. In this application domain, we have modified an existing approach based on priority search trees [73] by replacing the priority search trees with priority search pennants or min-augmented range trees [26]. The resulting router table designs offer advantages regarding update times and memory consumption, and also outperform the original approach in actual search time. In addition, it was shown that in the case of nonintersecting ranges, a second structure for range queries (as employed in [73]) is not required. We have thereby reduced the memory requirement and update costs of the original solution by another 50%. As we were able to show, this improvement carries over to the most general case of conflict-free ranges, as long as insertions are the only updates and no deletions of ranges are carried out. Future research will show whether and how deletions can also be handled efficiently with the reduced structure. Much more than in the field of algorithm analysis, visualizations and animations have been used for teaching algorithms and data structures. While the subjective impressions of both learners and instructors suggest improved learning when using visualizations, early experiments had very mixed results. This has led to an active branch of research exploring the factors that determine when and why visualizations can be effective learning aids [87]. Within this framework, we conducted an empirical study on the influence of the level of engagement and interaction that students exert when learning with visualizations.
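The correspondence between minXinRectangle queries and most-specific-range matching described above can be made concrete in a few lines. The following Java sketch is purely illustrative: the class, its names, and the brute-force scan are our own and merely stand in for the actual router-table structures. Each range [start, end] maps to the point (x, y) = (end, start); for a destination address d, the most specific matching range is then the leftmost point in the three-sided rectangle x ≥ d, y ≤ d.

```java
import java.util.List;

/** Illustrative reduction of most-specific-range matching to minXinRectangle.
 *  Each range [start, end] is mapped to the point (x, y) = (end, start).
 *  The most specific range containing address d is then the answer to
 *  minXinRectangle(d, +infinity, d): the leftmost point with x >= d and y <= d.
 *  A brute-force scan stands in for the router-table data structure here. */
public class RangeLookupSketch {
    static final class Range {
        final int start, end;
        Range(int start, int end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + ", " + end + "]"; }
    }

    static Range mostSpecific(List<Range> ranges, int d) {
        Range best = null;                  // plays the role of Min
        for (Range r : ranges) {
            // rectangle test on the point (r.end, r.start): x >= d and y <= d
            if (r.end >= d && r.start <= d && (best == null || r.end < best.end)) {
                best = r;                   // smaller x = smaller endpoint = more specific
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Range> table = List.of(new Range(0, 255),   // default range
                                    new Range(16, 31),
                                    new Range(20, 27));
        System.out.println(mostSpecific(table, 25));     // prints [20, 27]
        System.out.println(mostSpecific(table, 40));     // prints [0, 255]
    }
}
```

For nested, conflict-free range sets such as the one above, the matching point with minimal x is exactly the smallest enclosing range, which is what the pennant- or tree-based router tables compute without the linear scan.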
Contrary to the hypothesis, our results showed no significant differences between students who simply viewed algorithm animations and those who could actively change or even visually construct the data structures. A significant influence of the introductory lectures on the topic was found, as well as a strong correlation of test results with the overall performance of the participants in the course. In addition, there is some indication that the possibility to rewind an animation and to watch it in individual steps may have a significant influence as well. Judging from further results of more recent studies conducted by others, we also speculate that students’ construction of algorithm visualizations should involve the source code of the algorithm. Our refinements to the engagement taxonomy call for further empirical studies to test these hypotheses. When algorithm animations are used in class presentations, they should offer as much interaction as possible, so that instructors can develop new examples step by step, react to students’ questions by modifying the input, or involve the students actively in other ways. In addition, such animations must be easy to create, since teachers are reluctant to spend much time on preparing animations. An ideal system would allow an instructor to simply sketch an example “on the spot” without effort. These seemingly conflicting goals – effortless creation and a high degree of interaction – motivated a new approach to creating interactive algorithm visualizations by sketching. We have developed an open framework for the recognition and animation of data structures from digital ink input. The prototype implementation builds on an existing tool for low-level shape and gesture classification and uses those data as input for the recognition of data structures and commands. Commands (issued as gestures) allow users to interact with the data structures and trigger animations of those actions.
The data structures and algorithms are specified in domain modules, which can be plugged into the domain-independent framework. The system is based on a distributed architecture supporting multiple input and output devices. This opens up possibilities for collaborative use of the system in a variety of learning scenarios.

References

[1] S. Adams. Functional pearls: efficient sets – a balancing act. Journal of Functional Programming, 3 (4), pp. 553-561, October 1993.
[2] R. Adelmann. A framework for interpretation and animation of (data) structures from graphical primitives. Diploma thesis, University of Freiburg, 2005.
[3] G. M. Adel’son-Vel’skiĭ and Y. M. Landis. An algorithm for the organization of information. Soviet Math. Dokl. 3, pp. 1259-1262, 1962.
[4] A. V. Aho, J. E. Hopcroft, and J. Ullman. Data Structures and Algorithms. Addison-Wesley, 1983.
[5] T. Ahoniemi and E. Lahtinen. Visualizations in preparing for programming exercise sessions. Proceedings of the 4th Program Visualization Workshop, Florence, Italy, June 2006.
[6] V. Allis. A knowledge-based approach of Connect-Four. Master’s thesis and Technical Report IR-163, Faculty of Mathematics and Computer Science, Vrije Universiteit Amsterdam, The Netherlands, 1988.
[7] C. Alvarado and R. Davis. SketchREAD: a multi-domain sketch recognition engine. Proceedings of the 17th Annual ACM Symposium on User Interface Software and Technology (UIST ’04), pp. 23-32, Santa Fe, NM, USA, 2004.
[8] J. Anderson, A. Corbett, K. Koedinger, and R. Pelletier. Cognitive tutors: lessons learned. Journal of the Learning Sciences, 4 (2), 1995.
[9] C. R. Aragon and R. G. Seidel. Randomized search trees. Proceedings of the 30th IEEE Symposium on Foundations of Computer Science, pp. 540-545, 1989.
[10] R. N. Awan and B. Stevens. Static/animated diagrams and their effect on students’ perceptions on conceptual understanding in computer aided learning (CAL) environments. In: T. McEwan, J. Gulliksen, and D. Benyon (eds.), People and Computers XIX – The Bigger Picture (Proceedings of HCI 2005), pp. 381-389, London: Springer, 2005.
[11] R. Baecker. ‘Sorting out sorting’: a case study of software visualization for teaching computer science. In: J. T. Stasko, J. Domingue, M. H. Brown, and B. A. Price (eds.), Software Visualization: Programming as a Multimedia Experience, pp. 369-381. Cambridge: MIT Press, 1998.

[12] R. Ben-Bassat Levy, M. Ben-Ari, and P. A. Uronen. The Jeliot 2000 program animation system. Computers and Education, 40 (1), pp. 1-15, 2003.
[13] T. Bischoff. Generierung interaktiver Simulationen von Datenstrukturen aus Freihand-Skizzen. Studienarbeit (project report), University of Freiburg, 2006.
[14] B. S. Bloom and D. R. Krathwohl. Taxonomy of Educational Objectives; the Classification of Educational Goals, Handbook I: Cognitive Domain. Addison-Wesley, 1956.
[15] C. M. Boroni, T. J. Eneboe, F. W. Goosey, J. A. Ross, and R. J. Ross. Dancing with Dynalab. Proceedings of the 27th SIGCSE Technical Symposium on Computer Science Education, pp. 135-139, 1996.
[16] C. A. Bröcker. Verteilte Visualisierung geometrischer Algorithmen und Anwendungen auf Navigationsverfahren in unbekannter Umgebung. Ph.D. thesis, University of Freiburg, 1999.
[17] M. H. Brown. Algorithm Animation. MIT Press, Cambridge, MA, USA, 1988.
[18] M. H. Brown. Zeus: a system for algorithm animation and multi-view editing. Proceedings of the IEEE Workshop on Visual Languages, pp. 4-9, Kobe, Japan, 1991.
[19] M. H. Brown and R. Sedgewick. Interesting events. In: J. T. Stasko et al. (eds.), Software Visualization: Programming as a Multimedia Experience, pp. 155-159. Cambridge: MIT Press, 1998.
[20] R. Brugger. Interaktive Visualisierung von Prioritätssuchstrukturen. Studienarbeit (project report), University of Freiburg, 2006.
[21] W. Citrin and M. D. Gross. Distributed architectures for pen-based input and diagram recognition. Proceedings of AVI ’96, Gubbio, Italy, 1996.
[22] T. Cormen, C. Leiserson, R. Rivest, and C. Stein. Introduction to Algorithms. MIT Press/McGraw-Hill, 2001.
[23] G. Coulouris, J. Dollimore, and T. Kindberg. Distributed Systems – Concepts and Design, third edition. Addison-Wesley, 2004.
[24] P. Crescenzi, C. Demetrescu, I. Finocchi, and R. Petreschi. Leonardo: a software visualization system. Proceedings of the 1st Workshop on Algorithm Engineering (WAE ’97), pp. 146-155, 1997.
[25] M. Danielsson. Migration JEDAS–AOF: Einbettung animierter Elemente in Vorlesungsaufzeichnungen. Diploma thesis, University of Freiburg, 2002.
[26] A. Datta and T. Ottmann. A note on the IP lookup problem. Unpublished report, University of Freiburg, 2005.
[27] R. C. Davis and J. A. Landay. Informal animation sketching: requirements and design. Proceedings of the AAAI 2004 Fall Symposium on Making Pen-Based Interaction Intelligent and Natural, pp. 42-48, Arlington, VA, USA, October 2004.

[28] S. Diehl (ed.). Software Visualization. Berlin: Springer, 2001. (Lecture Notes in Computer Science 2269)
[29] S. Diehl, C. Görg, and A. Kerren. Animating algorithms live and post mortem. In: S. Diehl (ed.), Software Visualization. Berlin: Springer, 2001.
[30] G. Dowling, A. Tickle, K. Stark, J. Rowe, and M. Godat. Animation of complex data communications concepts may not always yield improved learning outcomes. Proceedings of the 7th Australasian Conference on Computing Education, pp. 151-154, Newcastle, Australia, January 2005.
[31] J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan. Making data structures persistent. Journal of Computer and System Sciences, 38 (1), pp. 86-124, 1989.
[32] R. D. Dutton. Weak-heap sort. BIT, 33, pp. 372-381, 1993.
[33] R. D. Dutton. The weak-heap data structure. Technical report, University of Central Florida, Orlando, FL, USA, 1992.
[34] S. Edelkamp and P. Stiegeler. Pushing the limits in sequential sorting. Proceedings of the 4th International Workshop on Algorithm Engineering (WAE 2000), Saarbrücken, Germany, September 2000.
[35] N. Faltin. Structure and constraints in interactive exploratory algorithm learning. In: S. Diehl (ed.), Software Visualization. Berlin: Springer, 2001.
[36] N. Faltin. Strukturiertes aktives Lernen von Algorithmen mit interaktiven Visualisierungen. Ph.D. thesis, University of Oldenburg, 2002.
[37] A. Feldmann and S. Muthukrishnan. Tradeoffs for packet classification. IEEE Infocom 2000, pp. 1193-1202, 2000.
[38] J. Foley and B. Ribarsky. Next-generation data visualization tools. In: L. Rosenblum et al. (eds.), Scientific Visualization – Advances and Challenges, pp. 103-127, Academic Press, 1994.
[39] M. Fredman and R. E. Tarjan. Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the ACM, 34, pp. 596-615, 1987.
[40] C. Goldstein, S. Leisten, K. Stark, and A. Tickle. Using a network simulation tool to engage students in active learning enhances their understanding of complex data communications concepts. Proceedings of the 7th Australasian Conference on Computing Education, pp. 223-228, Newcastle, Australia, January 2005.
[41] G. H. Gonnet. Balancing binary trees by internal path reduction. Communications of the ACM, 26 (12), pp. 1074-1081, 1983.
[42] S. Grissom, M. McNally, and T. Naps. Algorithm visualization in CS education: comparing levels of student engagement. Proceedings of the ACM Symposium on Software Visualization, San Diego, CA, USA, 2003.
[43] L. J. Guibas and R. Sedgewick. A dichromatic framework for balanced trees. Proceedings of the 19th Annual IEEE Symposium on Foundations of Computer Science, pp. 8-21, 1978.

[44] P. Gupta. Multi-dimensional packet classification. In: D. P. Mehta and S. Sahni (eds.), Handbook of Data Structures and Applications, pp. 49-1 to 49-20. Boca Raton: Chapman & Hall/CRC, 2005.
[45] S. Hanke. The performance of concurrent red-black tree algorithms. Proceedings of the 3rd International Workshop on Algorithm Engineering (WAE ’99), London, UK, July 1999.
[46] D. Hendrix, J. H. Cross, J. Jain, and L. Barowski. Providing data structure animations in a lightweight IDE. Proceedings of the 4th Program Visualization Workshop, Florence, Italy, June 2006.
[47] C. Hermann, T. Lauer, and S. Trahasch. Eine lernerzentrierte Evaluation des Einsatzes von Vorlesungsaufzeichnungen zur Unterstützung der Präsenzlehre. In: Tagungsband der 4. e-Learning Fachtagung Informatik (DeLFI 2006), pp. 39-50, Darmstadt, Germany, September 2006.
[48] R. Hinze. A simple implementation technique for priority search queues. Proceedings of the 2001 International Conference on Functional Programming, pp. 110-121, Florence, Italy, September 2001.
[49] R. Hinze. A simple implementation technique for priority search queues. Technical Report UU-CS-2001-09, Department of Computer Science, Utrecht University, March 2001. (This is a more detailed version of [48].)
[50] C. D. Hundhausen and S. A. Douglas. Using visualization to learn algorithms: should students construct their own, or view an expert’s? Proceedings of the IEEE International Symposium on Visual Languages (VL ’00), September 2000.
[51] C. D. Hundhausen and S. A. Douglas. A language and system for constructing and presenting low fidelity algorithm visualizations. In: S. Diehl (ed.), Software Visualization. Berlin: Springer, 2001.
[52] C. D. Hundhausen, S. A. Douglas, and J. T. Stasko. A meta-study of algorithm visualization effectiveness. Journal of Visual Languages and Computing, 13 (3), pp. 259-290, 2002.
[53] C. D. Hundhausen. Toward effective algorithm visualization artifacts: designing for participation and negotiation in an undergraduate algorithms course. Proceedings of CHI ’98, pp. 54-55, 1998.
[54] K. Hünerfauth. Animation und Simulation von Suffix-Trees: Der Algorithmus von Ukkonen. Studienarbeit (project report), University of Freiburg, 2005.
[55] W. Hürst, T. Lauer, and E. Nold. A study of algorithm animations on mobile devices. To appear in Proceedings of the SIGCSE Technical Symposium on Computer Science Education, Covington, Kentucky, USA, March 2007.
[56] J. Jain, J. H. Cross II, D. Hendrix, and L. Barowski. Experimental evaluation of animated-verifying object viewers for Java. Proceedings of the ACM Symposium on Software Visualization (SOFTVIS 2006), pp. 27-36, Brighton, UK, September 2006.

[57] D. J. Jarc, M. B. Feldman, and R. S. Heller. Assessing the benefits of interactive prediction using web-based algorithm animation courseware. Proceedings of the 31st SIGCSE Technical Symposium on Computer Science Education, pp. 377-381, Austin, Texas, USA, March 2000.
[58] C. Kehoe, J. Stasko, and A. Taylor. Rethinking the evaluation of algorithm animations as learning aids: an observational study. International Journal of Human-Computer Studies, 54, pp. 265-284, 2001.
[59] A. Korhonen. Visual Algorithm Simulation. Ph.D. thesis, Helsinki University of Technology, 2003.
[60] A. Korhonen and L. Malmi. Matrix – concept animation and algorithm simulation system. Proceedings of the Working Conference on Advanced Visual Interfaces, pp. 109-114, Trento, Italy, May 2002.
[61] E. T. Kraemer, B. Reed, P. Rhodes, and A. Hamilton-Taylor. SSEA: a system for studying the effectiveness of animations. Proceedings of the 4th Program Visualization Workshop, Florence, Italy, June 2006.
[62] M. Krebs, T. Lauer, T. Ottmann, and S. Trahasch. Student-built algorithm visualizations for assessment: flexible generation, feedback and grading. Proceedings of ACM ITiCSE 2005, Monte de Caparica, Portugal, June 2005.
[63] P. LaFollette, J. Korsh, and R. Sangwan. A visual interface for effortless animation of C/C++ programs. Journal of Visual Languages and Computing, 11 (1), pp. 27-48, 2000.
[64] K. S. Larsen, Th. Ottmann, and E. Soisalon-Soininen. Relaxed balance for search trees with local rebalancing. Acta Informatica, 37 (10), pp. 743-763, 2001.
[65] T. Lauer. Learner interaction with algorithm visualizations: viewing vs. changing vs. constructing. Proceedings of the 11th Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE 2006), Bologna, Italy, June 2006.
[66] T. Lauer. Animation komplexer Datenstrukturen und der dazugehörigen Algorithmen am Beispiel der Fibonacci-Heaps. Staatsexamensarbeit, University of Freiburg, 1999.
[67] T. Lauer and R. Danaei-Boroumand. Flexible creation, annotation, and web-based delivery of instructional animations. Proceedings of the AACE World Conference on Educational Multimedia, Hypermedia & Telecommunications (ED-MEDIA 2004), pp. 2240-2247, Lugano, Switzerland, June 2004.
[68] T. Lauer, R. Müller, and T. Ottmann. Animations for teaching purposes: now and tomorrow. Journal of Universal Computer Science, 7 (5), June 2001.
[69] T. Lauer, R. Müller, and S. Trahasch. Learning with lecture recordings: key issues for end-users. Proceedings of the International Conference on Advanced Learning Technologies (ICALT 2004), pp. 741-743, Joensuu, Finland, August 2004.

[70] T. Lauer, T. Ottmann, and A. Datta. Update-efficient data structures for dynamic IP router tables. To appear in International Journal of Foundations of Computer Science, 2007.
[71] T. Lauer, S. Trahasch, and R. Müller. Web technologies and standards for the delivery of recorded presentations. Proceedings of the 5th IEEE International Conference on Information Technology Based Higher Education and Training (ITHET 2004), Istanbul, Turkey, May 2004.
[72] J. Lienhard and T. Lauer. Multi-layer recording as a new concept of combining lecture recording and students’ handwritten notes. Proceedings of the 10th ACM International Conference on Multimedia, pp. 335-338, Juan-les-Pins, France, December 2002.
[73] H. Lu and S. Sahni. O(log n) dynamic router-tables for prefixes and ranges. IEEE Transactions on Computers, 53 (10), pp. 1217-1230, 2004.
[74] M. Luber. Multimediale Übungen für ausgewählte Datenstrukturen: Ein AVL-Tree-Plugin für das MA&DA-System. Studienarbeit (project report), University of Freiburg, 2006.
[75] L. Malmi, A. Korhonen, and R. Saikkonen. Experiences in automatic assessment on mass courses and issues for designing virtual courses. Proceedings of the 7th Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE 2002), Aarhus, Denmark, June 2002.
[76] R. E. Mayer, J. Heiser, and S. Lonn. Cognitive constraints on multimedia learning: when presenting more material results in less understanding. Journal of Educational Psychology, 93, pp. 187-198, 2001.
[77] B. H. McCormick, T. A. DeFanti, and M. D. Brown. Visualization in scientific computing. Computer Graphics, 21 (6), 1987.
[78] E. McCreight. Priority search trees. SIAM Journal on Computing, 14 (2), pp. 257-276, 1985.
[79] K. Mehlhorn. Data Structures and Algorithms, Vol. 3: Multidimensional Searching and Computational Geometry. EATCS Monographs in Theoretical Computer Science, New York: Springer-Verlag, 1984.
[80] K. A. Mohamed. Increasing the accuracy of anticipation with lead-lag timing analysis of digital freehand writings for the perceptual environment. Proceedings of the International Conference of Computational Methods in Sciences and Engineering 2004 (ICCMSE 2004), pp. 387-390, Athens, Greece, November 2004.
[81] K. A. Mohamed and T. Ottmann. Fast interpretation of pen gestures with competent agents. Proceedings of the IEEE Second International Conference on Computational Intelligence, Robotics and Autonomous Systems, Singapore, December 2003.

[82] A. Moreno and M. S. Joy. Jeliot 3 in a demanding educational setting. Proceedings of the 4th Program Visualization Workshop, Florence, Italy, June 2006.
[83] R. Müller. Wahlfreier Zugriff in Präsentationsaufzeichnungen am Beispiel integrierter Applikationen. Infix-Verlag, St. Augustin, 2000.
[84] R. Müller and T. Ottmann. The ‘Authoring on the Fly’ system for automated recording and replay of (tele)presentations. ACM/Springer Multimedia Systems Journal, 8 (3), 2000.
[85] B. A. Myers. Incense: a system for displaying data structures. Computer Graphics, 17 (3), pp. 115-125, July 1983.
[86] N. Myller. Automatic prediction question generation during program visualization. Proceedings of the 4th Program Visualization Workshop, Florence, Italy, June 2006.
[87] T. L. Naps, G. Rößling, V. Almstrum, W. Dann, R. Fleischer, C. Hundhausen, A. Korhonen, L. Malmi, M. McNally, S. Rodger, and J. A. Velázquez-Iturbide. Exploring the role of visualization and engagement in computer science education. ACM SIGCSE Bulletin, 35 (2), June 2003.
[88] T. L. Naps and S. Grissom. The effective use of quicksort visualizations in the classroom. Journal of Computing Sciences in Colleges, 18 (1), pp. 88-96, October 2002.
[89] T. L. Naps. JHAVÉ: supporting algorithm visualization. Computer Graphics and Applications, 25 (5), pp. 49-55, September/October 2005.
[90] T. L. Naps, J. R. Eagan, and L. L. Norton. JHAVÉ: an environment to actively engage students in web-based algorithm visualizations. Proceedings of the SIGCSE Technical Symposium on Computer Science Education, pp. 109-113, Austin, Texas, March 2000.
[91] M. W. Newman, J. Lin, J. I. Hong, and J. A. Landay. DENIM: an informal web site design tool inspired by observations of practice. Human-Computer Interaction, 18 (3), pp. 259-324, 2003.
[92] J. Nievergelt and E. M. Reingold. Binary search trees of bounded balance. SIAM Journal of Computing, 2, pp. 33-43, 1973.
[93] E. Nold. Algorithmenanimationen für mobile Endgeräte. Diploma thesis, University of Freiburg, 2006.
[94] T. Ottmann and P. Widmayer. Algorithmen und Datenstrukturen. Third edition. Heidelberg: Spektrum Akademischer Verlag, 1996.
[95] C. A. Petri. Kommunikation mit Automaten. Ph.D. thesis, University of Bonn, 1962.
[96] W. Pierson and S. Rodger. Web-based animation of data structures using JAWAA. Proceedings of the 29th SIGCSE Technical Symposium on Computer Science Education, pp. 267-271, Atlanta, GA, USA, 1998.

[97] N. Pinkwart, U. H. Hoppe, L. Bollen, and E. Fuhlrott. Group-oriented modelling tools with heterogeneous semantics. Proceedings of the 6th International Conference on Intelligent Tutoring Systems, Lecture Notes in Computer Science 2363, pp. 21-30. Berlin: Springer, 2002.
[98] R. Rasala. Automatic array algorithm animation in C++. Proceedings of the 30th SIGCSE Technical Symposium on Computer Science Education, pp. 257-260, New Orleans, LA, USA, 1999.
[99] B. Reed, P. Rhodes, E. Kraemer, E. T. Davis, and K. Hailston. The effect of comparison cueing and exchange motion on comprehension of program visualizations. Proceedings of the ACM Symposium on Software Visualization (SOFTVIS 2006), pp. 181-182, Brighton, UK, September 2006.
[100] E. Reingold and J. Tilford. Tidier drawing of trees. IEEE Transactions on Software Engineering, 7 (2), pp. 223-228, 1981.
[101] P. Rhodes, E. Kraemer, and B. Reed. The importance of interactive questioning techniques in the comprehension of algorithm animations. Proceedings of the ACM Symposium on Software Visualization (SOFTVIS 2006), pp. 183-184, Brighton, UK, September 2006.
[102] G.-C. Roman and K. Cox. Pavane: a system for declarative visualization of concurrent computations. Journal of Visual Languages and Computing, 3 (1), pp. 161-193, January 1992.

[103] G. Rößling. ANIMAL-FARM: an extensible framework for algorithm visualization. Ph.D. thesis, University of Siegen, 2002.
[104] G. Rößling. The ANIMAL algorithm animation tool. Proceedings of the 5th Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE 2000), pp. 37-40, Helsinki, Finland, 2000.
[105] G. Rößling and T. Naps. A testbed for pedagogical requirements in algorithm visualizations. Proceedings of the 7th Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE 2002), pp. 96-100, Aarhus, Denmark, June 2002.
[106] D. Rubine. Specifying gestures by example. ACM SIGGRAPH Computer Graphics, 25 (4), pp. 329-337, July 1991.
[107] M. Ruiz-Sanchez, E. Biersack, and W. Dabbous. Survey and taxonomy of IP address lookup algorithms. IEEE Network, 15 (2), pp. 8-23, March/April 2001.
[108] J.-R. Sack and T. Strothotte. A characterization of heaps and its applications. Information and Computation, 86 (1), pp. 69-86, May 1990.
[109] S. Sahni, K. S. Kim, and H. Lu. IP router tables. In: D. P. Mehta and S. Sahni (eds.), Handbook of Data Structures and Applications. Boca Raton: Chapman & Hall/CRC, 2005.
[110] J. T. Stasko. Using direct manipulation to build algorithm animations by demonstration. Proceedings of the Conference on Human Factors and Computing Systems, pp. 307-314, New Orleans, Louisiana, USA, 1991.

[111] J. T. Stasko. TANGO: a framework and system for algorithm animation. IEEE Computer, 23 (9), pp. 27-39, 1990.
[112] J. T. Stasko. The path-transition paradigm: a practical methodology for adding animation to program interfaces. Journal of Visual Languages and Computing, 1 (3), pp. 212-236, 1990.
[113] J. T. Stasko, J. Domingue, M. H. Brown, and B. A. Price (eds.). Software Visualization: Programming as a Multimedia Experience. Cambridge: MIT Press, 1998.
[114] P. Strohm. Erweiterung von JEDAS um Features für ein TableTop-System. Studienarbeit (project report), University of Freiburg, 2006.
[115] M. Thorup. Space efficient dynamic stabbing with fast queries. Proceedings of the ACM Symposium on Theory of Computing (STOC ’03), pp. 649-658, 2003.
[116] S. Trahasch. Skriptgesteuerte Wissenskommunikation und personalisierte Vorlesungsaufzeichnungen. Berlin: Logos Verlag, 2006.
[117] J. Urquiza-Fuentes and J. Á. Velázquez-Iturbide. An evaluation of the effortless approach to build algorithm animations with WinHIPE. Proceedings of the 4th Program Visualization Workshop, Florence, Italy, June 2006.
[118] J. Vogel, M. Mauve, W. Geyer, V. Hilt, and C. Kuhmünch. A generic late-join service for distributed interactive media. Proceedings of the 8th ACM International Conference on Multimedia, pp. 259-267, Los Angeles, CA, USA, 2000.
[119] J. Vuillemin. A data structure for manipulating priority queues. Communications of the ACM, 21 (4), pp. 309-315, 1978.
[120] J. Vuillemin. A unifying look at data structures. Communications of the ACM, 23, pp. 229-239, April 1980.
[121] M. Weiser. Some computer science issues in ubiquitous computing. ACM SIGMOBILE Mobile Computing and Communications Review, 3 (3), July 1999.
[122] L. Wenyin. On-line graphics recognition: state-of-the-art. In: J. Lladós and Y.-B. Kwon (eds.), GREC 2003 – Proceedings of the 5th IAPR International Workshop on Graphics Recognition (LNCS 3088), pp. 291-304, Berlin: Springer, 2004.
[123] B. Zupancic and H. Horz. Lecture recording and its use in a traditional university course. Proceedings of the 7th Annual Conference on Innovation and Technology in Computer Science Education (ITiCSE 2002), pp. 24-28, Aarhus, Denmark, June 2002.
[124] W3. Adobe Flash website. http://www.adobe.com/products/flash/ (last access 29/11/2006).
[125] W3. Beez – SVG Bezier animator. http://beez.sourceforge.net (last access 04/12/2006).
[126] W3. Connect Four on Wikipedia. http://en.wikipedia.org/wiki/Connect_Four (last access 28/11/2006).

[127] W3. Corel Animation Shop 3 product page. http://www.corel.com (last access 04/12/2006).
[128] W3. JEDAS – Java Educational Animation System. http://ad.informatik.uni-freiburg.de/jedas (last access 23/11/2006).
[129] W3. Lecturnity. http://www.lecturnity.de (last access 24/11/2006).
[130] W3. MADA – Multimedia Algorithm & Data Structure Assessment. http://ad.informatik.uni-freiburg.de/mada (last access 02/12/2006).
[131] W3. RealMedia website. http://www.real.com (last access 29/11/2006).
[132] W3. Recorded lectures on Fibonacci heaps. http://electures.informatik.uni-freiburg.de:8484/catalog/chapter.do?courseId=info2ss2005&chapter=16# (last access 24/11/2006).
[133] W3. Scalable Vector Graphics (SVG). http://www.w3.org/TR/SVG/ (last access 24/11/2006).
[134] W3. SMART Board. http://www.smarttech.com (last access 05/12/2006).
[135] W3. SPSS – Statistical package for the social sciences. http://www.spss.com (last access 24/11/2006).

Appendix A

Proofs and Algorithms for Part I

A.1. Proof of Corollary 4.5

Corollary 4.5: For any non-negative integer m,

\[ \sum_{k=1}^{m} \lfloor \lg k \rfloor \;=\; (m+1) \cdot \lfloor \lg(m+1) \rfloor \;-\; 2^{\lfloor \lg(m+1) \rfloor + 1} \;+\; 2, \]

where lg k denotes the binary logarithm of k.

Proof (by induction on m):

m = 0: \( \sum_{k=1}^{0} \lfloor \lg k \rfloor = 0 = 1 \cdot \lfloor \lg 1 \rfloor - 2^{\lfloor \lg 1 \rfloor + 1} + 2 \)

m → m+1: Assume the identity holds for all values smaller than or equal to m. Then

\[
\sum_{k=1}^{m+1} \lfloor \lg k \rfloor
= \lfloor \lg(m+1) \rfloor + \sum_{k=1}^{m} \lfloor \lg k \rfloor
= \lfloor \lg(m+1) \rfloor + (m+1) \cdot \lfloor \lg(m+1) \rfloor - 2^{\lfloor \lg(m+1) \rfloor + 1} + 2
= (m+2) \cdot \lfloor \lg(m+1) \rfloor - 2^{\lfloor \lg(m+1) \rfloor + 1} + 2
\]

Unless m+2 is a power of two, \( \lfloor \lg(m+2) \rfloor = \lfloor \lg(m+1) \rfloor \) and hence

\[ (m+2) \cdot \lfloor \lg(m+1) \rfloor - 2^{\lfloor \lg(m+1) \rfloor + 1} + 2 = (m+2) \cdot \lfloor \lg(m+2) \rfloor - 2^{\lfloor \lg(m+2) \rfloor + 1} + 2. \]

It remains to consider the case when m+2 is a power of two, i.e. ∃i: m+2 = 2^i.

In that case, \( \lg(m+2) = \lfloor \lg(m+2) \rfloor = 1 + \lfloor \lg(m+1) \rfloor \). Hence,

\[
\begin{aligned}
(m+2) \cdot \lfloor \lg(m+1) \rfloor - 2^{\lfloor \lg(m+1) \rfloor + 1} + 2
&= (m+2) \cdot (\lfloor \lg(m+2) \rfloor - 1) - 2^{(\lfloor \lg(m+2) \rfloor - 1) + 1} + 2 \\
&= (m+2) \cdot \lfloor \lg(m+2) \rfloor - (m+2) - 2^{\lfloor \lg(m+2) \rfloor} + 2 \\
&= (m+2) \cdot \lfloor \lg(m+2) \rfloor - 2^{\lg(m+2)} - 2^{\lfloor \lg(m+2) \rfloor} + 2 \\
&= (m+2) \cdot \lfloor \lg(m+2) \rfloor - 2 \cdot 2^{\lfloor \lg(m+2) \rfloor} + 2 \\
&= (m+2) \cdot \lfloor \lg(m+2) \rfloor - 2^{\lfloor \lg(m+2) \rfloor + 1} + 2. \qquad \Box
\end{aligned}
\]
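The identity can also be checked numerically. The following Java sketch (class and method names are ours, for illustration only) maintains the sum on the left-hand side incrementally and compares it with the closed form for all m up to 100,000, using Integer.numberOfLeadingZeros to compute ⌊lg k⌋:

```java
/** Numerical check of Corollary 4.5:
 *  sum_{k=1..m} floor(lg k) = (m+1)*floor(lg(m+1)) - 2^(floor(lg(m+1))+1) + 2 */
public class FloorLogSum {
    /** floor(lg k) for k >= 1, via the position of the highest set bit. */
    static int floorLg(int k) { return 31 - Integer.numberOfLeadingZeros(k); }

    /** Left-hand side: the sum computed term by term. */
    static long lhs(int m) {
        long s = 0;
        for (int k = 1; k <= m; k++) s += floorLg(k);
        return s;
    }

    /** Right-hand side: the closed form of Corollary 4.5. */
    static long rhs(int m) {
        int f = floorLg(m + 1);
        return (long) (m + 1) * f - (1L << (f + 1)) + 2;
    }

    public static void main(String[] args) {
        long sum = 0;                              // running left-hand side
        for (int m = 0; m <= 100000; m++) {
            if (sum != rhs(m)) throw new AssertionError("mismatch at m = " + m);
            sum += floorLg(m + 1);                 // extend the sum to m+1 terms
        }
        System.out.println("identity verified for m = 0..100000");
    }
}
```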

A.2. Range query algorithms

This section lists the implementations of several range query algorithms for priority search pennants, priority search trees, and min-augmented range trees. As experimental tests have shown, iterative versions that do the same as their recursive counterparts perform significantly better (in Java implementations).

A.2.1 Improved iterative implementation of minXinRectangle in a PSP

The following algorithm is our most efficient implementation of minXinRectangle for priority search pennants. It is optimized in the sense that there are no redundant comparison operations and the fork variable is updated only as long as it is needed, which makes it more than 60% faster than our original recursive implementation. Unfortunately, the code becomes rather long and less readable.

Algorithm: minXinRectangle for PSP (optimized)
Input: 3-sided query rectangle given by xleft, xright, ytop
Output: the leftmost point in the query rectangle (or null, if no such point exists)
 1. minXinRectangle (xleft, xright, ytop)
 2.   Min = null
 3.   N = root
 4.   fork = null
 5.   finished = false
 6.
 7.   while (!finished)
 8.     if yN ≤ ytop
 9.       if xN ≥ xleft and xN ≤ min{xright, xMin}
10.         Min = N
11.         fork = null
12.       if N.left ≠ null and sN ≥ xleft
13.         if N.right ≠ null and sN < xMin
14.           fork = N
15.         N = N.left
16.       else if N.right ≠ null and sN < xMin
17.         N = N.right
18.       else
19.         finished = true
20.     else if xN > sN
21.       if N.left ≠ null and sN ≥ xleft
22.         N = N.left
23.       else
24.         finished = true
25.     else
26.       if N.right ≠ null and sN < xMin
27.         N = N.right
28.       else
29.         finished = true
30.   /* endwhile */
31.
32.   if fork ≠ null
33.     N = fork.right
34.     finished = false
35.     while (!finished)
36.       if yN ≤ ytop
37.         if xN ≥ xleft and xN ≤ min{xright, xMin}
38.           Min = N
39.         if N.left ≠ null and sN ≥ xleft
40.           N = N.left
41.         else if N.right ≠ null and sN < xMin
42.           N = N.right
43.         else
44.           finished = true
45.       else if xN > sN
46.         if N.left ≠ null and sN ≥ xleft
47.           N = N.left
48.         else
49.           finished = true
50.       else
51.         if N.right ≠ null and sN < xMin
52.           N = N.right
53.         else
54.           finished = true
55.     /* endwhile */
56.   /* endif */
57.
58.   return Min

A.2.2 Iterative algorithm for minYinXRange in a PSP

The algorithm below is a non-recursive version of minYinXRange. The conditionals checking which subtrees must be visited have been simplified for clarity and brevity. In line 6 we assume that yMin = ∞ if Min = null. Note that this algorithm does not visit the left subtree before the right one, but the non-dominated subtree before the dominated one. Hence, pointers to the dominated and non-dominated child are used rather than pointers to the left and right child. These must either be stored separately or determined by an additional comparison operation.

Algorithm: minYinXRange for PSP
Input: an x-range given by xleft, xright
Output: a bottommost point in the query range (or null, if no such point exists)

1.  minYinXRange(xleft, xright)
2.    Min = null
3.    N = root
4.    fork = null
5.    while N ≠ null or fork ≠ null
6.      if xleft ≤ xN ≤ xright and yN ≤ yMin
7.        Min = N
8.        fork = null
9.      if “non-dominated child must be inspected”
10.       if “dominated child must be inspected”
11.         fork = N
12.       N = N.nondominatedChild
13.     else if “dominated child must be inspected”
14.       N = N.dominatedChild
15.     else if fork ≠ null
16.       N = fork.dominatedChild
17.       fork = null
18.     else
19.       N = null
20.   return Min
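The query semantics of minYinXRange can likewise be captured by a brute-force reference (a hypothetical helper for testing, not the PSP traversal itself): among all points whose x-coordinate lies in the range, return one with minimal y.

```python
def min_y_in_x_range(points, xleft, xright):
    # Brute-force reference: a bottommost point (x, y) with
    # xleft <= x <= xright, or None if the x-range contains no point.
    hits = [p for p in points if xleft <= p[0] <= xright]
    return min(hits, key=lambda p: p[1]) if hits else None

pts = [(3, 9), (5, 2), (7, 4), (11, 1)]
print(min_y_in_x_range(pts, 4, 8))    # -> (5, 2)
print(min_y_in_x_range(pts, 6, 10))   # -> (7, 4)
```

Since the output is "a" bottommost point, ties in y may be broken arbitrarily, matching the specification above.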


A.2.3 Algorithms for minXinRectangle in a PST

The recursive variant of minXinRectangle for a PST is almost identical to that for a PSP (cf. Algorithm 4.1). The only difference is the stronger pruning condition in lines 9 and 11 due to the heap property. Note that an actual implementation can be made more efficient by checking the condition yN ≤ ytop only once; the triple check in the listing below is for readability. As usual, we assume xMin = ∞ if Min = null in lines 7 and 11.

Algorithm: minXinRectangle for PST (recursive)
Input: 3-sided query rectangle given by xleft, xright, ytop
Output: the leftmost point in the query rectangle (or null, if no such point exists)

1.  minXinRectangle(xleft, xright, ytop)
2.    Min = null
3.    if (!isEmpty)
4.      inspect(root)
5.    return Min

6.  inspect(N)
7.    if yN ≤ ytop and xN ≥ xleft and xN ≤ min{xright, xMin}
8.      Min = N
9.    if N.left ≠ null and sN ≥ xleft and yN ≤ ytop
10.     inspect(N.left)
11.   if N.right ≠ null and sN < min{xright, xMin} and yN ≤ ytop
12.     inspect(N.right)

An iterative version cannot be implemented in the same way as for a PSP, since there may be more than one fork on the search path. Hence, a mechanism is required to backtrack along the path to the most recent fork. The implementation below uses the parent pointers of the nodes; an alternative would be a stack on which all forks are stored. Note that this algorithm uses the property proven by Mehlhorn [79] and listed in Chapter 3.3.5. In informal tests, it was about 40% faster than the recursive implementation.
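The backtracking step itself is independent of the PST specifics: from the current node, climb parent pointers while the node is a right child, then hop to the right sibling. A minimal Python sketch of this pattern (node class and helper name are my own, for illustration only):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.parent = None
        for child in (left, right):
            if child is not None:
                child.parent = self

def next_right_subtree(n):
    # Climb while n is a right child; once n is a left child, the next
    # unexplored part of the search path is its right sibling's subtree.
    # Returns None at the root (nothing left to explore).
    while n.parent is not None and n is n.parent.right:
        n = n.parent
    return None if n.parent is None else n.parent.right

root = Node(1, Node(2, Node(4), Node(5)), Node(3))
print(next_right_subtree(root.left.left).key)   # -> 5
print(next_right_subtree(root.left.right).key)  # -> 3
print(next_right_subtree(root.right))           # -> None
```

A stack of fork nodes would replace the climb by a single pop, trading O(1) extra space per node for O(h) stack space per query.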

Algorithm: minXinRectangle for PST (iterative)
Input: 3-sided query rectangle given by xleft, xright, ytop
Output: the leftmost point in the query rectangle (or null, if no such point exists)

1.  minXinRectangle(xleft, xright, ytop)
2.    N = root
3.    Min = null
4.    // Step 1: find leftmost node below y-threshold
5.    while !N.isEmpty and yN ≤ ytop
6.      if xN ≥ xleft and xN ≤ min{xright, xMin}
7.        Min = N
8.      if N.isLeaf
9.        break
10.     else if sN ≥ xleft
11.       N = N.left
12.     else
13.       N = N.right
14.
15.   if xMin ≤ sN
16.     return Min
17.
18.   // Step 2: trace back right children on the search path
19.   repeat
20.     while !N.isRoot and N == N.parent.right
21.       N = N.parent
22.     if N.isRoot or xMin ≤ sN
23.       return Min
24.     else
25.       N = N.parent.right
26.   until yN ≤ ytop
27.
28.   // Step 3: now we only have to search the subtree below N
29.   while !N.isEmpty and yN ≤ ytop
30.     if xN ≥ xleft and xN ≤ min{xright, xMin}
31.       Min = N
32.     if N.isLeaf
33.       break
34.     else if !N.left.isEmpty and yN.left ≤ ytop
35.       N = N.left
36.     else if sN ≥ xleft and !N.right.isEmpty and yN.right ≤ ytop
37.       N = N.right
38.     else
39.       break
40.
41.   return Min


A.2.4 Iterative algorithm for minXinRectangle in a MART

This iterative version of the minXinRectangle algorithm for a MART closely follows the verbal description given in Chapter 5.4.2. In informal tests, it was about 45% faster than the recursive implementation.

Algorithm: minXinRectangle for MART (iterative)
Input: 3-sided query rectangle given by xleft, xright, ytop
Output: the leftmost point in the query rectangle (or null, if no such point exists)

1.  minXinRectangle(xleft, xright, ytop)
2.    N = root
3.    // Step 1: find leftmost node below y-threshold
4.    while (!N.isLeaf and yN ≤ ytop)
5.      if xN ≥ xleft
6.        N = N.left
7.      else
8.        N = N.right
9.
10.   if N.isLeaf and yN ≤ ytop and xN < xright
11.     return N
12.
13.   // Step 2: find first fitting “umbrella” node
14.   while yN > ytop
15.     while !N.isRoot and N == N.parent.right
16.       N = N.parent
17.     if N.isRoot
18.       return null
19.     else
20.       N = N.parent.right
21.
22.   // Step 3: find leftmost node that is below y-threshold
23.   while !N.isLeaf
24.     if yN.left ≤ ytop
25.       N = N.left
26.     else
27.       N = N.right
28.   if xN ≤ xright
29.     return N
30.   else
31.     return null


A.3 Examples for minXinRectangle queries in PSP and PST

A.3.1 Worst-case example of minXinRectangle in a PSP

According to Theorem 4.12, the number of visited nodes for a minXinRectangle query is bounded by 2·h, where h is the height of the PSP. The example in Figure A.3.1 shows that this bound can actually be attained. Hence, 2·h is both an upper and a lower bound for the worst-case number of inspected nodes during minXinRectangle.

Figure A.3.1: For this priority search pennant, the query minXinRectangle(14, 64, 17) results in a search path of maximal length, i.e. twice the height of the tree.

A.3.2 Example of minXinRectangle in a PST

The example depicted in Figure A.3.2 illustrates that the number of forks on the search path of a minXinRectangle query in a priority search tree is bounded by the height of the tree rather than by a constant (recall that in PSP there can be at most one fork). The highlighted nodes mark the search path for the query minXinRectangle(24, 65, 20); the pink node is the one storing the result. In addition, the example shows that the number of inspected nodes can be more than 2h, i.e. twice the height of the tree. In this example, 13 nodes are visited, which is more than 2·h = 10. Hence, the worst case search path for minXinRectangle in a PST is longer than in a PSP with the same height. Note that the height for a PSP and a PST with the same number of elements and the same balancing scheme is the same in most cases. While the PST requires one extra level for the leaves, the PSP has an additional level containing the top node.


Figure A.3.2: The query minXinRectangle(24, 65, 20) results in a search path consisting of 13 visited nodes, including 4 forks.


A.4 Insertion of ranges in conflict-free sets

We briefly sketch our proof showing that the improvement of the router table design proposed in [73] can be generalized to conflict-free ranges, as long as insertions are the only updates. Since restating all the definitions from [73] is beyond the scope of this appendix, we give verbal definitions of the important terms and refer the reader to [73] for details.

Let R be a conflict-free set of ranges and r = [u, v] ∉ R a range to be inserted into R. Let maxP and minP be defined as in [73]. That is, maxP(u, v, R) is the rightmost endpoint of any conflict-free subset whose projection covers a left part of [u, v] (see Figure A.4.1, left). Likewise, minP(u, v, R) is the leftmost endpoint of any conflict-free subset whose projection covers a right part of [u, v] (see Figure A.4.1, right). Note that maxP and minP may not exist. We therefore extend the definition as follows:

maxP(u, v, R) = u − 1 if maxP is undefined according to [73]
minP(u, v, R) = v + 1 if minP is undefined according to [73]

This ensures that maxP and minP always exist, which relieves us of the extra burden of distinguishing cases where either value is undefined. Clearly, the computation of maxP and minP does not become more complex through this modification.


Figure A.4.1: Examples of conflicts between the new range [u, v] and the existing range [x, y] ∈ R.

Lemma A.4.1: Let R be a conflict-free set of ranges and r = [u, v] ∉ R. Then [u, v] is in conflict with R ⇔

∃ [x, y] ∈ R: (x < u ∧ maxP(u, v, R) < y < v), or
∃ [x, y] ∈ R: (u < x < minP(u, v, R) ∧ v < y),

i.e. if some [x, y] left-overlaps [u, v] and y > maxP, or if some [x, y] right-overlaps [u, v] and x < minP (see Figure A.4.1).
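For small instances, the condition of Lemma A.4.1 can be checked by a direct scan of R, given precomputed maxP and minP values (with the sentinel convention above). The Python sketch below is a brute-force illustration of the lemma's formula, not the O(log n) router-table method; R is represented as a list of (x, y) endpoint pairs.

```python
def in_conflict(R, u, v, maxP, minP):
    # Lemma A.4.1: [u, v] conflicts with the conflict-free set R iff
    # some [x, y] in R left-overlaps it with y > maxP, or
    # some [x, y] in R right-overlaps it with x < minP.
    # maxP/minP are assumed precomputed (u-1 resp. v+1 if undefined).
    left = any(x < u and maxP < y < v for (x, y) in R)
    right = any(u < x < minP and v < y for (x, y) in R)
    return left or right

R = [(1, 3), (6, 8)]               # nonintersecting, so maxP = u-1, minP = v+1
print(in_conflict(R, 2, 7, 1, 8))  # -> True  ([1, 3] left-overlaps [2, 7])
print(in_conflict(R, 4, 5, 3, 6))  # -> False ([4, 5] intersects no range)
```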

Proof:

(i) “⇒”: Assume r = [u, v] is in conflict with R. Then there is at least one range s = [x, y] ∈ R which intersects r such that there is no resolving subset for r and s in R, i.e. no conflict-free subset Q ⊂ R whose projection satisfies Π(Q) = r ∩ s.

1st case: s left-overlaps r (cf. Figure A.4.1, left), i.e. x < u ≤ y < v. By its definition, maxP(u, v, R) is the rightmost endpoint of a resolving subset starting at u and ending by v, or maxP(u, v, R) = u − 1 if no such resolving subset exists. Since we have assumed that there is no resolving subset for r and s in R, we know that y > maxP(u, v, R). Hence, ∃ [x, y] ∈ R: (x < u ∧ maxP(u, v, R) < y < v).

2nd case: s right-overlaps r (cf. Figure A.4.1, right), i.e. u < x ≤ v < y. By its definition, minP(u, v, R) is the leftmost endpoint of a resolving subset ending at v and starting by u, or minP(u, v, R) = v + 1 if no such resolving subset exists. Since we have assumed that there is no resolving subset for r and s in R, we know that x < minP(u, v, R). Hence, ∃ [x, y] ∈ R: (u < x < minP(u, v, R) ∧ v < y).

(ii) “⇐”: Both of the above cases also work in the reverse direction; it only remains to show u ≤ y in the first case and x ≤ v in the second to establish the actual overlap of s and r. This is easy to see, since y > maxP(u, v, R) and, by definition, maxP(u, v, R) ≥ u − 1. Likewise, in the second case, x < minP(u, v, R) and minP(u, v, R) ≤ v + 1. □

We now show how the (necessary and sufficient) condition for a conflict from Lemma A.4.1 can be verified by two minXinRectangle queries carried out on our data structure. (We will see that the procedure is a generalization of the same method for nonintersecting ranges.)

For the first part of the condition, we have to check if there is a point P in our structure which lies to the left of and below point (v, u) and further right than maxP(u, v, R), i.e. we look for a point in the south-grounded rectangle bounded by xleft = maxP(u, v, R) + 1, xright = v − 1, and ytop = u − 1. We can do this by performing minXinRectangle( maxP(u, v, R) + 1, v − 1, u − 1 ); this query returns a value other than null if and only if there is a point in the given rectangle, i.e. if there is a conflict due to a left-overlapping range.

For conflicts due to right-overlapping ranges, we must check whether there is a point P in the structure which lies to the right of and above point (v, u) and below minP(u, v, R), i.e. we are looking for a point in the rectangular area bounded by xleft = v + 1, ybottom = u + 1, and ytop = minP(u, v, R) − 1. (The right side of the rectangle is open; the boundary is given implicitly by the maximum possible x-value.)

As was the case for nonintersecting ranges, we cannot directly query that rectangle because it is not south-grounded. We therefore query the extended rectangle with the same borders except that ybottom = 0. Hence, we perform minXinRectangle( v+1, ∞, minP(u, v, R) – 1 ). This query will never return null, since the rectangle contains the point corresponding to the default range.

Theorem A.4.2: Let r = [u, v] ∉ R and L = minXinRectangle(v+1, ∞, minP(u, v, R) -1), i.e. L is the leftmost point of map1(R) in the query rectangle. Then:

If yL ≤ u, then the query rectangle contains no point P ∈ map1(R) with yP > u.

This means that if the leftmost point is not above u, then there will be no other point above u.

Proof: (by contradiction)

Assume that yL ≤ u and that there is a point P = (xP, yP) ∈ map1(R) in the rectangle with yP > u. By the proof of Lemma A.4.1 we then know that [u, v] is in conflict with [yP, xP], i.e. there is no resolving subset for the intersection [yP, v].

Since L is the unique leftmost point, we know that xP > xL. Since both L and P are in the query rectangle, we also know that xP, xL > v and yP, yL < minP(u, v, R) and hence yP, yL ≤ v. This yields the overall order

(*) yL ≤ u < yP < minP(u, v, R) ≤ v+1 ≤ xL < xP

Hence, the ranges corresponding to L and P, [yL, xL] and [yP, xP] are intersecting. Since R is conflict-free, there must be a resolving subset Q within R for those two ranges, i.e.

Q is conflict-free and ∏(Q) = [yP, xL].

Note, however, that Q cannot contain any range t = [tleft, tright] such that tleft < minP(u, v, R) ≤ v+1 ≤ tright, since the respective point P(t) would be inside the query rectangle and be further left than L, which is a contradiction.

Let Qleft = {[x, y] ∈ Q : x < minP(u, v, R)} be the subset of Q containing those ranges that start left of minP(u, v, R). It is clear that the projection ∏(Qleft) must also be a range. Moreover, ∏(Qleft) must end by v, as we just saw that there can be no range starting left of minP(u, v, R) and ending right of v. Also note that there must be a range in Qleft ending at least at minP(u, v, R); otherwise the projection of Q would have a gap at minP(u, v, R), which would also be a contradiction. Hence, ∏(Qleft) = [yP, z], where minP(u, v, R) ≤ z ≤ v. Recall that minP(u, v, R) was defined as the left endpoint of the largest projection ∏(A) of a subset A such that ∏(A) is a range that ends at v. Hence ∏(A) = [minP(u, v, R), v].

Then the projection ∏(A ∪ Qleft) = [yP, v]. Since [yP, v] is a right-aligned sub-range of [u, v] realized as the projection of a conflict-free subset, the definition of minP yields minP(u, v, R) ≤ yP. Since minP(u, v, R) − 1 is the upper bound of our query rectangle, P must be outside the query rectangle, which contradicts our initial assumption. □

Theorem A.4.2 tells us that it is sufficient to query the extended rectangle in order to find out whether or not the original rectangle contains any point, i.e. whether or not there is a conflict due to a right-overlap. This leads us to the following necessary and sufficient condition:

Theorem A.4.3: Let R be a set of conflict-free ranges and r = [u, v] ∉ R. Then r is in conflict with R if and only if at least one of the following conditions is true:

(1) minXinRectangle( maxP(u, v, R)+1, v−1, u−1 ) exists;
(2) minXinRectangle( v+1, ∞, minP(u, v, R)−1 ) returns L such that yL > u.

This theorem is a generalization of the analogous condition for nonintersecting ranges. If R is nonintersecting, it is clear that maxP = u − 1 and minP = v + 1, and hence we obtain exactly the same queries as in Theorem 5.4. Therefore, a single structure supporting minXinRectangle queries in O(log n) time is sufficient to maintain a set of conflict-free ranges under insertions within the desired time bound of O(log n), provided that the values of maxP(u, v, R) and minP(u, v, R) can be computed efficiently. This can be done, for instance, with the collection of red-black trees proposed in [73]. However, we note that for the deletion of ranges, an additional structure may be required to check whether conflicts arise.
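The two queries of Theorem A.4.3 can be sketched in Python under simplifying assumptions: minXinRectangle is replaced by a brute-force scan over the point set map1(R), where (following the proof's convention) a range [x, y] is mapped to the point (y, x), the default range is assumed to contribute the point (∞, 0), and maxP/minP are passed in as precomputed values. All names are my own illustration, not the thesis implementation.

```python
INF = float("inf")

def min_x_in_rectangle(points, xleft, xright, ytop):
    # Brute-force stand-in for the O(log n) structure.
    hits = [p for p in points if xleft <= p[0] <= xright and p[1] <= ytop]
    return min(hits) if hits else None

def conflict_by_queries(points, u, v, maxP, minP):
    # Condition (1): left-overlap conflict via a south-grounded rectangle.
    if min_x_in_rectangle(points, maxP + 1, v - 1, u - 1) is not None:
        return True
    # Condition (2): right-overlap conflict via the extended rectangle;
    # the result L exists by assumption (default-range point), and a
    # conflict is reported iff L lies above u.
    L = min_x_in_rectangle(points, v + 1, INF, minP - 1)
    return L is not None and L[1] > u

# R = {[1, 3], [6, 8]} plus default range -> points (3, 1), (8, 6), (INF, 0)
pts = [(3, 1), (8, 6), (INF, 0)]
print(conflict_by_queries(pts, 4, 5, 3, 6))   # -> False (no overlap)
print(conflict_by_queries(pts, 7, 9, 6, 10))  # -> True  ([6, 8] overlaps [7, 9])
```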

Appendix B

Materials used in the experiment

The following pages list the materials which were used for the experiment described in Chapter 7.

B.1 Introductory survey

The purpose of this questionnaire was to collect demographic data about the participants and information on students’ preparation for the experiment.

Fragebogen: Multimedia-Übung

Studiengang: ______

Hauptfach: ______

Fachsemester: ______

Alter: ______

Muttersprache: ______

Geschlecht: □ männlich □ weiblich

Ich besuche die Vorlesung im Hörsaal ...
□ (fast) immer   □ oft (mehr als 50%)   □ eher selten (weniger als 50%)   □ (fast) nie

Ich beschäftige mich mit den Vorlesungsaufzeichnungen ... □ nur wenn ich nicht in der Vorlesung war □ auch wenn ich in der Vorlesung war □ (fast) gar nicht

Die Vorlesungen zum Thema Fibonacci-Heaps am 6. und 10. Juni ... □ habe ich im Hörsaal mitverfolgt □ habe ich als Aufzeichnung betrachtet □ ich habe nur die Folien durchgelesen □ ich habe andere Materialien gelesen □ kenne ich gar nicht

Ich besuche die Übungsgruppe □ (fast) immer □ oft (mehr als 50%) □ eher selten (weniger als 50%) □ (fast) nie

Übungsaufgaben bearbeite ich meistens: □ alleine □ in der Gruppe

Ich habe vor an der Klausur teilzunehmen: □ ja □ nein

B.2 Post-test

After the time spent with the visualizations, participants had to complete the test which was used for scoring. The first five questions are considered visual questions, as they are very similar to what students had seen or done during the visualization part. Questions 6-10 are referred to as non-visual questions in Chapter 7.


Auf dieser und den folgenden Seiten finden Sie einige Aufgaben zu Fibonacci- Heaps. Bitte bearbeiten Sie diese Aufgaben alleine! Sie können beliebig viele Zwischenschritte angeben. Wenn Sie mehr Platz benötigen, verwenden Sie die Rückseiten. Falls Sie sich bei irgendeinem Schritt nicht sicher sind, können Sie das gerne als Kommentar dazuschreiben.

Aufgabe 1: Führen Sie auf dem folgenden Fibonacci-Heap die Operation decreasekey(54 → 11) aus:

[Abbildung: Fibonacci-Heap-Diagramm; Knoten ebenenweise: 15, 8, 21 / 18, 75, 29, 24, 14, 49 / 31, 31, 41 / 54]

Aufgabe 2: Entfernen Sie aus dem folgenden Fibonacci-Heap mit der delete-Operation den Knoten mit Schlüssel 43:

[Abbildung: Fibonacci-Heap-Diagramm; Knoten ebenenweise: 13, 18, 34 / 45, 25, 54, 21 / 68, 57, 49, 26, 43 / 81, 31, 70, 64]


Aufgabe 3: Führen Sie auf dem folgenden Fibonacci-Heap die Operation deletemin (Entfernen des Minimums) aus:

[Abbildung: Fibonacci-Heap-Diagramm; Knoten ebenenweise: 4, 21, 6, 1, 8 / 49, 25, 35, 13, 5, 29, 24, 14 / 38, 31, 42 / 43]

Aufgabe 4: Geben Sie den Fibonacci-Heap an, der entsteht, wenn man in einen leeren Fibonacci-Heap nacheinander die Schlüssel 14, 78, 3, 7, 90, 52, 24, 15, 30, 71 einfügt und dann die deletemin-Operation ausführt.

Aufgabe 5: Führen Sie auf dem folgenden Fibonacci-Heap nacheinander folgende Operationen durch: insert(82), deletemin, decreasekey(20 → 4), delete(42)

[Abbildung: Fibonacci-Heap-Diagramm; Knoten ebenenweise: 40, 34, 13, 5, 19, 8 / 49, 25, 35, 20, 42, 24, 14 / 31, 62, 79]


Aufgabe 6: Durch welche Folge von Operationen könnten die Fibonacci-Heaps mit dem folgenden Aussehen entstanden sein? (Wenn es keine solche Folge gibt, schreiben Sie „unmöglich“)

(i) [Abbildung: Heap-Diagramm mit den Knoten 21, 6, 1, 4, 13]

(ii) [Abbildung: Heap-Diagramm; Knoten ebenenweise: 5 / 13, 78, 29]

(iii) [Abbildung: Heap-Diagramm; Knoten ebenenweise: 7 / 25 / 31 / 49]

Aufgabe 7: Wie könnte man Fibonacci-Heaps zum Sortieren von n unsortierten Schlüsseln verwenden (kurze Beschreibung in Worten genügt)? Vergleichen Sie das Verfahren mit den Ihnen bekannten Sortierverfahren bzgl. Zeit- und Platzbedarf.


Aufgabe 8: Es wird vorgeschlagen, das Konsolidieren („Aufräumen“) der Wurzelliste statt bei deletemin bei der insert-Operation durchzuführen, da bei insert ohnehin auch das Minimum überprüft und ggfs. aktualisiert werden muss. Die Operation deletemin würde dann lediglich aus einem remove-Schritt bestehen. Wie beurteilen Sie diesen Vorschlag?

Aufgabe 9: Es soll eine zusätzliche Operation increasekey eingeführt werden, die es – analog zu decreasekey – ermöglicht, den Schlüssel eines Knotens N auf den Wert k zu erhöhen. Auch hierdurch wird möglicherweise die Heapbedingung verletzt. Wie könnte man eine solche Funktion implementieren, so dass anschließend die Heapbedingung wieder gilt? (Sie können die Basis-Methoden link, cut und remove für Ihre Beschreibung verwenden.)

Welche Laufzeit hat diese Operation?

B.3 Final questionnaire

After the test, students were asked to complete a final questionnaire about their personal opinion on the experiment and on visualizations.

Bitte beantworten Sie zum Abschluss noch die folgenden Fragen:

1. Ich finde Animationen hilfreich zum Verständnis der Fibonacci-Heaps.
   □ Definitiv ja   □ eher ja   □ weiß nicht   □ eher nein   □ definitiv nein

2. Die Animationen finde ich eher verwirrend.
   □ Definitiv ja   □ eher ja   □ weiß nicht   □ eher nein   □ definitiv nein

3. Eine ausführliche Text-Definition (ganz ohne Bilder) reicht zum Verständnis der Fibonacci-Heaps aus.
   □ Definitiv ja   □ eher ja   □ weiß nicht   □ eher nein   □ definitiv nein

4. Eine Text-Definition zusammen mit statischen Bildern reicht zum Verständnis der Fibonacci-Heaps aus.
   □ Definitiv ja   □ eher ja   □ weiß nicht   □ eher nein   □ definitiv nein

5. Animationen sind für das Verständnis der Fibonacci-Heaps besser als Einzelbilder („Vorher – nachher“).
   □ Definitiv ja   □ eher ja   □ weiß nicht   □ eher nein   □ definitiv nein

6. Ich konnte den Animationen gut folgen.
   □ Definitiv ja   □ eher ja   □ weiß nicht   □ eher nein   □ definitiv nein

7. Die Animationen hätten interaktiver sein sollen.
   □ Definitiv ja   □ eher ja   □ weiß nicht   □ eher nein   □ definitiv nein