
CMU-CS-82-154

Monitoring Distributed Systems: A Relational Approach

Richard Snodgrass

December, 1982

Department of Computer Science Carnegie-Mellon University Pittsburgh, PA 15213

Submitted to Carnegie-Mellon University in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 1982 Richard T. Snodgrass

This research was sponsored in part by the Defense Advanced Research Projects Agency (DOD), ARPA Order No. 3597, monitored by the Air Force Avionics Laboratory under Contract F33615-78-C-1551, in part by the Ballistic Missile Defense Advanced Technological Center under Contract DASG60-81-0077, and in part by an NSF fellowship.

The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency, the BMDATC, the US Government, or Carnegie-Mellon University.

Ada is a registered trademark of the Department of Defense (Ada Joint Program Office). Unix is a registered trademark of Bell Laboratories.

Abstract

Monitoring is the extraction of dynamic information concerning a computational process, as that process executes. Distributed systems, with their qualitative and quantitative increases in complexity, demand an intelligent monitor. The thesis of this dissertation is that monitoring is fundamentally an information processing activity, and that the relational model, as applied in relational databases, is an appropriate formalization of this information. In this approach, the notions of entity (data structures, processes, hardware components, etc.) and relationship (processes running on processors, messages in queues, etc.) are structured as a set of time-varying relations. Queries on this collection of relations are translated into retrieval and computational activities to be performed by the monitor.

Data collection is an important aspect of monitoring. After discussing a model of the environment where data collection takes place, a flexible, strongly typed, efficient data collection mechanism is presented. The impact of various features of the environment on this mechanism is examined.

The user specifies the desired actions of the monitor with a high-level, non-procedural query language called TQuel. This language is a superset of Quel, a relational database query language, with additional syntax and semantics to incorporate time as an integral part of the language. A formal semantics with several useful properties relating to monitoring is presented.

Queries in TQuel must be converted into a procedural form in order to execute efficiently. Update networks, designed for dynamic, incremental updating of derived relations, are introduced as the target language for the TQuel translator. These structures are composed of access nodes, which interface effectively with the system being monitored, and operator nodes, which carry out the desired computations. The generation of update networks involves several sophisticated techniques. Most of these techniques have their origins in relational databases, and have been adapted to the monitoring domain.

In order to validate the relational approach, most of the components of the monitor were implemented and instrumented. Measurements show that the monitor can generate and process several hundred events per second, while at the same time presenting the conceptual viewpoint of time-varying relations to the user.

Acknowledgements

It is a privilege and a pleasure to thank my committee for their contributions to this thesis. Bill Wulf can take a good part of the credit in what research and writing skills I have gained in my years at CMU. Anita Jones was a constant source of good ideas, good questions (equally important) and encouragement. In Joe Newcomer I tapped a wealth of strategies for coming up with just the right abstraction. Zary Segall integrated my research efforts with several others in the department, generating a synergistic whole greater than the sum of its parts. Art Evans provided a valuable outside viewpoint, and did so under severe time constraints. Collectively, they represent almost 50 years of experience in designing, programming, and instrumenting distributed systems. This experience is evident in the ideas they generate and the systems they design and build. One would be hard pressed to find five people more suited to participate in this research.

In addition to my committee, other members of the department contributed to the thesis. The StarOS group, especially Bob Chansler and Ivor Durham, gave freely of their time in explaining the details of StarOS and modifying the operating system to test out my ideas. The other multiprocessor evaluation seminar participants, including Xavier Castillo and Ajay Singh, provided a forum for developing concepts in a friendly atmosphere. Peter Highnam helped me design the protocol and implemented Medic, and Ivor Durham implemented the first version of the StarMon sensors. Karsten Schwans was helpful in all stages of the research. Barat Jayaraman, of the University of North Carolina, was instrumental in developing the TQuel semantics. M. Satyanarayanan, Mark Sherman, and Steve Vegdahl were an immense help in the remote production of this document. Finally, Mike Accetta and Sharon Burks provided their unique expertise, administrative and otherwise, enabling me to finish in a reasonable amount of time.

Getting a Ph.D. is as much a matter of persistence as a matter of performance. My wife, Merrie Brucks, made the whole effort so much more enjoyable by her companionship, empathy, support, and affection.

To all, a sincere thank you.


Table of Contents

I. Approach

1. The Problem
    1.1. The Cause and the Result
    1.2. Definitions
    1.3. The Impact of Complexity on Monitoring
    1.4. Knowledge Representation
2. The Relational Model
    2.1. Entities and Relationships
    2.2. Time
    2.3. Summary

II. A Temporal Query Language

3. An Informal Definition
    3.1. The Quel Retrieve Statement
    3.2. Adding Time to Quel
    3.3. The TQuel Retrieve Statement
        3.3.1. TQuel Expressions
        3.3.2. Temporal Expressions
        3.3.3. Event Expressions
    3.4. The Temporal Selection Component
    3.5. The Temporal Delimiter Component
    3.6. Aggregate Operators
    3.7. An Example
    3.8. Summary
4. Semantics of the TQuel Retrieve Statement
    4.1. Tuple Calculus
    4.2. Path Expressions in TQuel
        4.2.1. The Start, Stop, and At Clauses
        4.2.2. The When Clause
    4.3. Formal Semantics

    4.4. Aggregate Operators
        4.4.1. Informal Semantics
        4.4.2. Formal Semantics
    4.5. Indeterminacy
        4.5.1. Semantics
        4.5.2. Defaults
        4.5.3. Indeterminacy and Aggregate Operators
    4.6. Summary

III. Realization

5. A Low Level Data Collection Mechanism
    5.1. The Environment
    5.2. The Mechanism
    5.3. Integrating Sampling and Tracing
    5.4. Other Uses for Receptacles
    5.5. Interaction with the Remote Monitor
        5.5.1. Naming
        5.5.2. Time
    5.6. Summary
6. The Update Network
    6.1. Incremental Updating of Temporal Relations
    6.2. Generic Nodes
    6.3. Access Nodes
    6.4. Operator Nodes
    6.5. Node Interconnection
7. Generating the Update Network
    7.1. Generating an Initial Network
        7.1.1. Universal Relations
        7.1.2. Compensation
        7.1.3. Checkpoint Tuples
        7.1.4. Other Details
    7.2. Efficiency
        7.2.1. Graph Transformation
        7.2.2. Temporal Order
        7.2.3. Limiting the Semantics of the When Clause
        7.2.4. Node Scheduling
        7.2.5. Other Issues
    7.3. Summary

8. An Implementation
    8.1. General Structure of the Monitor
    8.2. Sensor Specification
        8.2.1. Sensor Description Files
        8.2.2. The Description File Preprocessor
    8.3. The Resident Monitor
        8.3.1. StarMon: General Structure
        8.3.2. Sensor Performance Measurements
        8.3.3. Medic
    8.4. The Ethernet Protocol
    8.5. The Remote Accountant
    8.6. The Update Network
        8.6.1. Tuple Representation
        8.6.2. Interpretation and Compilation
        8.6.3. Update Network Performance
        8.6.4. Relationship to Data Flow
    8.7. A Step Back
    8.8. Evaluation

IV. Conclusion and Appendices

9. Conclusion
    9.1. Surprises
    9.2. Remaining Problems and Future Research
References
Appendix A. BNF of the TQuel Retrieve Statement
Appendix B. Proof of the Conversion Theorem
Appendix C. Operator Nodes
Appendix D. StarMon
    D.1. The StarOS Task Force
    D.2. The Monitor Object
    D.3. Pipes
    D.4. Receptacles and Sensors
    D.5. A Microcoded Sensor
    D.6. Efficiency
Appendix E. An Extended Example
    E.1. Sensor Description File Processing
    E.2. Description File Formats
    E.3. Changes to the Source Code
    E.4. Initializing the Monitor

    E.5. Query Processing
Appendix F. Update Network Performance
    F.1. The Unoptimized Version
    F.2. The Optimized Version
    F.3. The Compiled Version
    F.4. Summary

List of Figures

Figure 2-1: Relationships between Primitive and Derived Events and Periods
Figure 3-1: Instantaneous versus Cumulative Count
Figure 3-2: Configuration of the PDE task force
Figure 4-1: The Overlap versus Coverage Interpretations for Combining Periods
Figure 4-2: Representing Uncertainty
Figure 4-3: Different Strategies for Handling Indeterminacy with Aggregate Operators
Figure 5-1: Skeletons of the Monitor and User Type Managers
Figure 6-1: Parse tree for the relational algebraic expression for the skeletal TQuel retrieve statement
Figure 7-1: An Initial Update Network
Figure 7-2: Monitor-Specific Transformations
Figure 7-3: Applying Multiple Transformations
Figure 8-1: Components of a Distributed Monitor
Figure 8-2: Components of the Monitor Core
Figure 8-3: Structure of the Monitor As Implemented
Figure 8-4: Sensor Description File Syntax
Figure 8-5: The position of DFPre in the program development process
Figure D-1: Representation of the StarOS task force
Figure D-2: The Monitor Object
Figure D-3: The Structure of a StarOS Pipe
Figure D-4: Structure of an Event Record
Figure D-5: The Structure of a StarOS Receptacle
Figure D-6: The Parameter Block for the Sensor Instruction
Figure D-7: Sensor Description File for Measuring Sensors
Figure E-1: PDE Sensor Description File
Figure E-2: Dialogue with DFPre
Figure E-3: Definitions File Produced From PDE SDF
Figure E-4: The Description File Format (DFF) File for StarOS Sensor Descriptions
Figure E-5: Initialization Dialogue from Simon's Perspective
Figure E-6: Retrieving the Information Concerning the PDE
Figure E-7: PDE Query Contained in the file demoquery

Figure E-8: Parsing the Query
Figure E-9: Compiling and Running the Update Network
Figure E-10: Running the Update Network Using Generated Event Records
Figure F-1: Code for the Unoptimized Update Network as Generated by the TQuel Compiler
Figure F-2: The Unoptimized Update Network Generated by the TQuel Compiler, Part 1
Figure F-3: The Unoptimized Update Network Generated by the TQuel Compiler, Part 2
Figure F-4: The Hand-Optimized Update Network
Figure F-5: The Hand-Compiled Update Network

List of Tables

Table 8-1: Performance Measures for StarOS Sensors
Table 8-2: Performance Measures for the Update Network


I. Approach

The title of this dissertation has two components. The first component, Monitoring Distributed Systems, crisply states the problem. The second component, A Relational Approach, provides the solution. The central thesis of this work is that the relational model is an appropriate formalization of the information processed by a distributed monitor. The first part of this dissertation demarcates the problem and motivates the approach. The second part defines the language, first informally and then rigorously, used to query the monitor. Part III presents strategies for processing queries in this language, and examines an implementation of the monitor. Several appendices provide details that would blur the focus of the main text.

The two chapters in this part expand on the title. The first chapter will discuss in more detail what is involved in monitoring distributed systems, and why it is such an interesting and difficult problem. The second chapter introduces the relational model and lists the primary issues involved in the application of this model to monitoring.


Chapter 1
The Problem

1.1. The Cause and the Result

The cause is hidden, but the result is known.

-- Ovid, from Metamorphoses IV

In the realm of computing, as in all analytic endeavors, one must first understand the behavior before one can understand the underlying reasons for that behavior. As the computational structures employed in programs become complex, computer system designers, implementors, and users find it increasingly rare that they can agree with Ovid that "the cause is hidden, but [at least] the result is known." Monitoring is a necessary first step in understanding a computational process, for it provides an indication of what happened, thus serving as a prerequisite to ascertaining why it happened.

The realization that monitoring is a difficult task, one that deserves study, has come only recently to the computing community. When computing systems were simpler, it was possible to understand adequately the system's behavior with rather unsophisticated monitoring tools and (considerably more sophisticated) modeling techniques. Many aspects, such as characterizing the control flow or determining execution times, were so straightforward as to not even be considered issues. Times have changed, and many of these "non-issues" are now so problematic that present monitoring systems often do not provide any solutions to them.

1.2. Definitions

The definition of monitoring employed in this dissertation is a rather general one: monitoring is the extraction of dynamic information concerning a computational process, as that process executes.[1]

A computational process is anything that can be said to compute.[2] Examples include a microprogram, a subroutine, a conventional process, a collection of processes, or even an operating system. Computational processes vary in at least two ways concerning monitoring: (1) the number of components to be monitored, from a single wire on a bus to the entire system, and (2) the time frame in which the measurements take place, ranging from tens of nanoseconds to months. The level of abstraction (the granularity) at which the monitoring takes place has a substantial effect on the methods used to collect data.

Dynamic information may also be spread over a large range of temporal granularity, from information concerning the sequence of microinstructions executed during a particular time interval, to the average amount of time a routine executes, to some global statistic concerning the execution of a whole series of programs. If the information to be collected is not dynamic, there is no need to collect it as the process executes.

Defining a distributed system is difficult. Although John Shoch has several arguments to support the contention that "there is nothing different about 'distributed' computing" [Shoch 81], he also presents several distinctions between distributed and non-distributed systems (his widely-shared belief is that there is a difference). The two relevant to monitoring are

• Distributed systems are characterized by a lack of central control.

• A quantitative difference in the number of system components (processors, memory, addressing domains, etc.) leads to a qualitative difference.[3]

[1] There are at least two other definitions of monitor which should be mentioned. One use of the word monitor, prevalent in the 1960's and early 1970's, is as a synonym for operating system, or at least the user interface of an operating system. The second refers to an arbiter of access to a data structure in order to ensure specified invariants, usually relating to synchronization [Hoare 74]. Both definitions emphasize the control, rather than the observational, aspects of monitoring. The term monitor as used in this dissertation is the (usually software) agent performing the monitoring.

[2] Italics will be used for introducing new terms and for emphasis. Boldface will be used for reserved words. SMALL CAPITALS will be used for names of relations.

[3] At the same workshop [Liskov 81], Richard Watson added several related attributes: more heterogeneity, distribution of state, and communication via messages. David Reed offered perhaps the best argument for monitoring in his characterization of distributed systems: "In centralized systems, it has been possible so far for single persons to understand the entire system (even of the size of MULTICS). This will not be possible for distributed systems. How can we comprehend parts of the system without comprehending all of it?" Two other characterizations of distributed systems have been proposed which will be quite important in monitoring: (a) a completely accurate global clock is not available, and (b) once a remote action has been requested, the requester cannot always determine, in a bounded time, whether or not it has occurred. [Enslow 78] provides yet another definition.

These two aspects conspire to make monitoring a distributed system a difficult (and thus interesting) task. A precise definition of 'distributed' is not important; the intent of the title is to include the above attributes in the problem domain.

The general definitions presented above allow concepts developed in this research to be applied to several previously unrelated domains. This section closes with a discussion of representative utilizations of monitoring information.

One use of monitoring is to facilitate the debugging of complex programs. Debugging proceeds in five stages [Model 79][4]: (1) observe the behavior of a computer program; (2) compare this behavior with the desired behavior; (3) analyze the differences; (4) devise changes to the program to make its behavior conform more closely to the desired behavior; and (5) alter the program in accordance with these changes. Monitoring is concerned with the first two stages in this process. The third and fourth stages are still the province of the programmer (although the Programmer's Apprentice project [Shrobe 79] is making progress in this area); the fifth stage is routinely accomplished using text editors, and could be automated given the automation of the fourth step.

A second use of monitoring tools is in making efficient use of limited computing resources. Ideally, optimization of resources would be done analytically, but in general a priori determination of runtime efficiency is impossible. Thus it is necessary to tune the application once it is implemented. Tuning requires feedback on the program's efficiency, which is determined from measurements on the application while it is running.[5]

A third use of monitoring is to query the system, not for performance measures, but merely for status information, such as how far a computation has progressed, who is logged on the system (the system status command of most time-sharing systems), the state of certain files (the catalogue or directory commands), or the quantity and nature of hardware and software failures.

And finally, monitoring information may also be used internally by the application program for various purposes. For example, consider a program which varies the number of processes dedicated to a particular function based on the request rate for that function. Information concerning the hardware utilization and the number of outstanding requests could be used by the program to determine whether to start up more processes to handle the current demand (if the utilization is low and the request rate high) [Rashid 80a, Wulf et al. 75a]. Monitoring information is also valuable for programs which must be reliable; the fact that a processor (containing particular processes belonging to a program) has failed, for example, is important to the program if it must be able to recover from such failures.[6]

[4] Joseph Newcomer points out that this process is essentially the scientific method.

[5] This tuning has been termed performance debugging: it's not enough just to show that a system works; you want it to work well [Liskov 81].

1.3. The Impact of Complexity on Monitoring

The previous section indicated that monitoring is difficult because of the complexity and decentralization of the process being monitored. The purpose of this section is to determine how increased complexity impinges on the task of monitoring. The impact of decentralization is reflected more in the specific algorithms and will be dealt with in later chapters. We will start by investigating the monitoring of the program counter, certainly an important aspect of the dynamic state.

In the "good old days", a program consisted of a single program executing on a monolithic operating system on a single processor. The program counter could be traced or sampled. Tracing, which involves storing the information each time some event occurs, is usuallydone at the procedure, statement or individual instruction level, with a concomitant increase in overhe_,d. At the beginning of each procedure (or statement or instruction), code is inserted by a preprocessor to increment a counter or generate a timestamp. A postprocessor is often used to correlate the data with the source text of the program.

Sampling involves storing information asynchronously with the execution of the program. Usually sampling is initiated by a clock tick, by an operating system call, or by a separate process. The information gathered by sampling is stochastic; for instance, it can indicate what percentage of execution time takes place within an individual routine, but it cannot reliably determine how many times a routine was invoked. Sampling has the advantage that it requires fewer resources, and thus perturbs the system to a smaller degree than tracing.

In the past two decades the programming environment has changed radically. In some sophisticated systems being developed today, a program consists of many interacting processes running on many geographically distributed computers communicating over high bandwidth networks [Clark 78]. These systems differ quantitatively from systems of the past: where there was one processor, there are now tens to hundreds; where there was one process, there are now many per processor; where there were a few I/O devices, there are now complex communication media, sophisticated encoding formats, and powerful interprocess communication protocols, all supported by large software components; where there was a single contiguous address space, there are now many small, separately addressable objects, each containing code or a specific data structure.

[6] Eric Rosen, in an article describing a particularly interesting instability which occurred on the ARPANET, concluded that "we need a better means of detecting that some high priority process in the Imp [a node on the ARPANET], despite all the safeguards we put in, is still consuming too many resources." [Rosen 81]

Returning to the example of monitoring the program counter, we must first determine what the "program counter" means in a distributed system. One possibility is to use the program counter of each of the processes making up the program of interest. For the single process example, the routine name and statement number within that routine may be quite informative; a printout of, say, fifty routine names and statement numbers is rather overwhelming. This quantitative difference necessitates a qualitative change in the monitor, for there is one aspect that remains unchanged: the user (and especially the information capacity of the user) must still interpret the monitoring data.

The presence of the user has been implicit throughout this discussion. Fundamentally, the user is not interested in the program counter at all; instead, the user wants an understanding of the state of the execution as it evolves through time. This state manifests itself in many forms: the changing values of the variables in the program, the input read by the program and the output produced, the constantly changing program counter. All are valid components of the program state, and each may be sufficient when monitoring a single process. Individually, and in their raw form, however, they are woefully inadequate for monitoring distributed systems, because there is simply too much information, most of it irrelevant. Instead, the monitor must be able to express the system state (as well as other attributes of the system) in a form useful to the user.

As an example, suppose the monitor could provide this description of the program state:

    Process A is waiting on process B to acknowledge the xxx request; process Y is sending process Z information concerning the object yyy; and process M has completed.

There are several aspects to note in this example. The information the monitor displayed is both less and more than a list of program counters. The monitor had to understand that a program counter in a certain range meant that process A was waiting for something, yet the exact program counter was unimportant. Conceivably, the program counter could have been completely different and the monitor would have displayed the same information. In addition, the monitor had to be able to look inside the various queues and buffers maintained by the communication mechanism in order to be able to state that a process is waiting on another process to acknowledge a particular request. Names had to be associated with the various processes, objects, and requests in order to produce an intelligible state description. And finally, the monitor had to know that the user was interested in the current state in terms of interprocess communication. Another perhaps just as useful state description is

    Process A has used 75% of its resources, while processes X, Y and Z have used only 20% of their resources.

The decentralization inherent in distributed systems also necessitates interpretation of the monitoring data. The mention of several processes in the previous example implies a degree of logical decentralization; if those processes are on different processors, then there is also physical decentralization. To present a global view of the program state, the monitor must integrate data collected at geographically distinct sites. Simply determining what information to collect and where to acquire this information becomes a difficult task. Hence the quantitative and decentralized aspects of the monitored system, coupled with the limited information handling capabilities of the user, demand an intelligent monitor. The next section will discuss the organizing concepts for a monitor which can collect information from a variety of sources, interpret this information, and present it in a series of high level views in a format comprehensible to the user.

1.4. Knowledge Representation

In its most general form, the process of monitoring is concerned with retrieving information from the monitored system and presenting this information in a derived form to the user. Viewing the monitor as the proverbial black box, it is fundamentally an information processing agent. As the previous sections have indicated, this activity is rather sophisticated. Looking inside this black box, there is some form of knowledge representation to direct the monitoring activity. Thus, there are at least two ways to view a monitor abstractly: as a knowledge representation system and as an information processing agent. As will be seen, both of these views are fruitful. The rest of this chapter will investigate the knowledge representation issues; chapter 2 will pursue the information processing aspects of monitoring.

In an examination of the discussion of a possible high-level program counter, one starts to notice phrases such as "the monitor had to understand." In one sense, the monitor can't understand; it is, after all, only a computer program. However, computer programs are remarkably versatile (cf. Church's Thesis) and almost any type of desirable behavior can be programmed with the correct selection of data structures and algorithms. Hence, the process of "teaching the monitor" or "making the monitor understand" is transformed into the more intellectually manageable task of deciding what data structures and algorithms to employ within the monitor.

These data structures and algorithms encode the knowledge the monitor can apply to the task at hand. Existing monitors perform little interpretation of the collected data, and thus use rather ad hoc methods for determining what to monitor and how to perform the monitoring. Two recent systems have addressed the monitoring of complex systems; it is useful to analyze the character of knowledge each used to direct the data collection and interpretation.

Model's thesis [Model 79], one of the first to approach this topic systematically, stressed the adoption of a uniform model of a complex activity for use in monitoring. His monitor was designed to be used with programs implemented in artificial intelligence languages such as KRL, which are themselves implemented in Lisp. Despite the sophisticated control and data structures provided by these high level languages, most debugging is still done in the implementation language. The complexity of programs written in these languages is seriously limited by the lack of adequate debugging tools. Model argued that of the five stages present in the debugging process (see section 1.2), monitoring has the most potential for improvement at this time.

The monitor collected events generated by the interpreter (the monitor had no control over which events were collected). These events were related to the program's data and control structures implicitly in the routines generating the events. However, some cross-referencing was done, so that the monitor knew, for example, that some events caused other events. The user could specify which events, as well as which fields in these events, were to be displayed. The knowledge utilized by the monitor was wired into its code.

Gertner's thesis [Gertner 80] focussed on the flow of messages between processes in RIG [Ball et al. 76], a distributed system constructed at the University of Rochester. In his system, Gertner described the computation using finite state automata, with the transitions being events (usually messages sent between two processes in the system). Associated with each message is a set of timestamps relating to the activity involved in processing the message. These timestamps allow the monitor to calculate processing intervals, message counts, overlapping periods, etc. A hierarchy of finite state automata can be defined, with elementary transitions at one level composed of multiple transitions at a lower level. This hierarchy allows monitoring information to be presented at the appropriate level of abstraction. Again, the knowledge of how to derive information from the timestamps was implicit in the monitor's code.

Unfortunately, these approaches are simply inadequate for distributed systems. In Model's system, events capture only the notion of state transitions. The system state must be inferred by the user. Modeling all activity in terms of finite state automata, as in Gertner's system, while expressing to some degree the semantics of the periods between the events, is overly restrictive. Sampling data (as opposed to trace data) does not integrate easily into the scheme. The proliferation of extraneous states is also a problem which results from a total reliance on this model. Because of the restricted models built into these systems, it is unclear how the systems could be extended to eliminate these problems.

In order to construct a monitor which can apply substantial knowledge concerning the system being monitored, this knowledge must be organized in a coherent fashion. Thus a formalism is needed to describe this knowledge. The formalism must, to some degree, encode the following knowledge:

• what information the monitor collects concerning the system;

• how new information can be acquired by the system;

• what dependencies exist between various components of this information;

• how the information relates to the data and control structures within the programs, and to the data and control structures of the underlying operating system; and

• what information the user wants to see.

There are also three basic notions that must be characterized by this formalism: entity, relationship, and time. The monitor must understand that there are such things as processors, processes, memory, message ports, semaphores, etc. and that certain relationships exist between these things, such as a process running on a processor. In Model's thesis, for example, entities were the values of certain attributes, and the relationships were the events themselves.

The third notion is that of time. The monitor must understand that facts are only true for a certain period of time, and that entities and relationships are temporally bounded. For instance, in Model's thesis, time was one of the fields in each event record, and queries could specify which time period the user was interested in. Also, Model's monitor understood that events were sequential, and thus that some events were after others. However, the concept of something being true for a period of time between two events is not represented within the monitor, and thus the user could not request such information. Clearly, a multiprocess monitor must have a better understanding of time.

The aim of this chapter has been to sufficiently refine the original problem statement into one which can be attacked in concrete terms. This chapter has argued that monitoring is concerned with knowledge representation and information processing. The next chapter will investigate the information processing aspects of monitoring.

Chapter 2
The Relational Model

Viewed abstractly, a monitor collects, manipulates, stores, and displays information concerning the dynamic state of a computational process. It is fundamentally an information processing agent: the information describes temporal relationships between entities involved in the computation, and the processing is quite sophisticated, due to the cognitive limitations of the user. Previous work on monitoring has concentrated on techniques for collecting monitoring data. As the previous chapter demonstrated, such a view is inadequate. The approach taken in this thesis is that the information must be structured in such a way that manipulating it is straightforward. Also, there must be powerful algorithms which can satisfy highly variable requests from the user.

A great deal of research has considered effective ways to process information. One of the results of this research has been the relational model [Codd 70]. The relational model provides both a structuring of the information and operations on that information. Information is stored in relations. A relation models a particular relationship between collections of entities. Relations can be thought of as tables having a number of rows and columns. The rows are called tuples and the columns domains. Each tuple of a relation models a particular relationship between entities named in the domains of the tuple. For example, the relation Employee (Name, Salary, Department) might include the tuple (Huttinger, 44000, Commerce). New relations can be derived from existing ones, using one of several data manipulation languages developed for the relational model; these query languages are syntactically simple yet are remarkably powerful in their expressiveness. One important aspect of some query languages is that they are declarative rather than procedural: they specify what information is desired, rather than how this information is to be derived. One possible query on the Employee relation would be to retrieve into a relation GivePerks (Name) all the employees making more than some minimum salary.
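Such a query might be sketched as follows in the Quel syntax described in chapter 3; the relation and domain names are those of the example above, and the minimum salary of 30000 is assumed purely for illustration:

    range of E is Employee
    retrieve into GivePerks (E.Name)
    where E.Salary > 30000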

The central thesis of this work is that the relational model is an appropriate formalization of the information processed by the monitor. The primary benefits include a simple, consistent structure for the information and the existence of powerful declarative query languages. Previous uses of the model have been confined to static databases. The remainder of this chapter discusses how the relational model is applied to the monitoring domain.

2.1. Entities and Relationships

In order to use the relational model in monitoring, monitoring analogues must be specified for each of the components of the model. We will be more formal in order to characterize the application of the relational model to the monitoring domain precisely.

A relation is any subset of the Cartesian product of one or more domains [Ullman 82]. A domain is simply a set of values. These values may be literal, such as integers or character strings, or they may be names of entities. Entities are conceptual objects which exist independently within the system being monitored. Entities may have a physical realization, such as a processor, a disk, a line on a bus, or a word in memory. Alternatively, entities may be virtual, such as a user job, an activity queue, or a capability list. Entities have names (both internal and user-oriented) which allow them to be identified.

There are two types of relations in the monitor: primitive and derived relations. Both represent relationships between entities. Each tuple of a relation indicates a particular relationship between the entities named in the domains of the tuple. An example is the RUNNINGON relation, which has two domains: a process and a processor. The tuple (Process17, Processor5) in this relation represents a fact concerning Process17 and Processor5, namely, that Process17 is running on Processor5.

Conceptually, each primitive relation is associated with a predicate that is true if the relationship is satisfied for a given set of entities. This association provides a well-defined semantics for the relation, since a particular tuple is in the relation if and only if the predicate returns true when applied to that tuple. The predicate for the RUNNINGON relation might be "the process is in the run-queue of the processor." Primitive relations may exist at any monitoring granularity in the system. The sole requirement is the ability to specify the predicate as some function of the system state accessible to the monitor.

Primitive relations are often just that. The user is probably not interested in the level of detail present in the primitive relations; instead, the user desires more summary information extracted from this detail. Query languages provide a powerful mechanism for specifying exactly the information the user wants to retrieve from the monitor concerning the system. In this way, information not anticipated by the designer of the monitor is still available to the user, provided the basic information is available to the monitor through the defined predicates. An example of a derived relation is RUNNINGONPROCESSOR5, containing a Process domain, which would contain exactly one tuple at any instant of time, the process which is currently running on Processor5.
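As a sketch, such a derived relation might be specified by a query of the following form, using the TQuel syntax introduced in chapter 3; the constant Processor5 stands for whatever identifier actually names that processor:

    range of R is RunningOn
    retrieve into RunningOnProcessor5 (R.Process)
    where R.Processor = Processor5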

Section 1.4 listed three notions to be characterized: entity, relationship, and time. The first two are modeled directly in the relational model. The third is perhaps the most fundamental, for without the system state changing as time progresses, there is no need to monitor the system. Hence, it is important that the notion of time be consistently represented within the monitor.

2.2. Time

Time goes, you say? Ah no! Alas, Time stays, we go.

-- Austin Dobson, in "The Paradox of Time"

Time by itself does not exist; but from things themselves there exists a sense of what has already taken place, what is now going on, and what is to ensue. It must not be claimed that anyone can sense time by itself apart from the movement of things or their restful immobility.

--Lucretius, in De rerum natura, Book I

Within the monitor, relations are differentiated temporally: there are event relations and period relations. A tuple in a period relation specifies a relationship valid during the time interval [t1, t2]. The RUNNINGON relation described earlier is a period relation, since each particular tuple is true for a finite stretch of time. Since the relation is a collection of tuples, it is also a collection of periods.

A tuple in an event relation describes a change in the state of the system which occurred at a particular instant of time. Events delimit periods: a given event causes one or more periods to start; other event(s) cause the period(s) to stop. An example is the STARTRUNNING event relation, with processor and process domains. The tuple (Process17, Processor5) in this relation represents the instantaneous event of Process17 starting to run on Processor5. This event caused a similar tuple in the RUNNINGON period relation to be true.

Event and period relations are complete, in that a succession of system states at a particular level of abstraction (the monitoring granularity) can be determined by the appropriate event or period relations. For example, either the RUNNINGON or the STARTRUNNING relations are sufficient to specify when each process was running on each processor. Since they are complete, they are also duals, in that the tuples in a period relation can be determined given the appropriate event relations, and an event relation can be generated given the appropriate period relations. Hence, knowing which events occurred provides the periods which were started or stopped by those events, and knowing when the periods started and stopped provides the events that occurred. Although this duality depends on a number of assumptions which are often difficult to satisfy in practice, it is important because it provides a way to accommodate both sampling data, related to period relations, in that successive samples provide the successive tuples of a period relation, and tracing data, related to event relations, which are generated by a stream of events.

Because relations can be derived from other relations and relations represent events and periods, it is possible to specify derived events and periods. The query language must be augmented with additional semantics to permit the specification of such temporal relationships as simultaneity and consecutivity. Figure 2-1 illustrates how derivation and duality interact.

2.3. Summary

This chapter has provided the fundamental thesis for this research: The information collected by the monitor should be presented to the user as a collection of time-varying relations which can be manipulated by a temporal query language. This contention was supported by considerations of programming environment complexity, cognitive limitations of the user, and the underlying functionality of the monitor. There remains one problem: how can the relational view as presented to the user be supported effectively by the monitor? In this context, effectiveness implies powerful, user-friendly, efficient, and system-independent, all interrelated attributes. Relations are structurally simple, and the query languages are straightforward. The concept of a dynamic database of information on the behavior of the system is easy to comprehend and use. The user still specifies what information is desired; the monitor must apply all of its knowledge to determine how to supply this information: what information to collect and what manipulations on this information are necessary, and it must do this in an efficient manner. The goal is to construct a monitor which can support the relational model in its full generality for the user, yet perform the actual monitoring as effectively as a manually constructed monitor tailored to the specific task.

This dissertation is loosely organized around a sequence of problem and result statements. Implicit in each problem statement are the results generated by preceding statements, because one benefit of acquired knowledge is the ability to ask further, more precise questions. Given the framework presented so far, it is possible to ask several general questions:

Problem: How may traditional query languages be extended for use in monitoring?

Problem: Is it possible to provide effective data collection mechanisms?

Problem: How can the dynamic incremental updating of temporal relations be implemented effectively?

Problem: How can knowledge be used to direct the processing of user queries?

The following chapters will present solutions to all of these problems.

Figure 2-1: Relationships between Primitive and Derived Events and Periods

II. A Temporal Query Language

The two chapters in this part define the language used to query the monitoring data base. An informal definition is given first, emphasizing the way in which the constructs of the language may be used to make meaningful queries involving time. The second chapter then provides a formal semantics for this language. In a first reading, the reader is encouraged to skip most of chapter 4, reading only the introduction to the tuple calculus and the summary (sections 4.1 and 4.6).

Chapter 3
An Informal Definition

This chapter describes a language for querying a temporal data base such as the one supported by the monitor. The user indirectly specifies the actions to be taken by the monitor by describing the desired information in this language. As a result of processing the query, the correct information is collected, processed, and presented to the user. There are two prerequisites for this scenario to be realized: the user's query must contain a complete specification of the desired information (with as little irrelevant detail as possible), and the monitor must be able to take this abstract specification and "do the right things" with it. This chapter will deal with the specification itself, the query language; the next chapter will present a formal semantics for the language.

The language is a strict superset of Quel [Held et al. 75], called Temporal QUEry Language, or TQuel. Quel, used in the Ingres relational database system [Stonebraker et al. 76], is a relational tuple calculus language with 16 basic statement types: Append, Copy, Create, Define, Delete, Destroy, Help, Index, Integrity, Modify, Permit, Print, Range, Retrieve, Save, View. These statement types, presented in detail in [Held et al. 75], support the creation and destruction of databases and relations, storage structure modification, bulk copy of data, consistency, integrity, and concurrency control, retrieval of information, and miscellaneous operations. Since the task of monitoring concerns primarily the retrieval of information, only changes to the retrieve statement were investigated. A full implementation of a monitor would also have to make modifications to several of the other statement types. Since TQuel is a strict superset of Quel, every acceptable Quel retrieve statement is also an acceptable TQuel statement. The complete syntax for the TQuel retrieve statement is given in Appendix A.

TQuel augments the retrieve statement with additional syntax, and provides a more comprehensive semantics by treating time as an integral part of the database. Before investigating these changes, it is instructive to examine the original Quel retrieve statement. The Quel examples all concern the following relation: Employee (Name, Dept, Salary, Manager, Age)

3.1. The Quel Retrieve Statement

Informally, the Quel retrieve statement selects a subset of the tuples in one or more relations, extracts one or more domains from the tuples in this subset, and combines the domains into result tuples. The retrieve statement works in conjunction with the range statement (the syntax is presented in standard BNF, with ε designating the empty string and nonterminals in brackets):

    <range statement> ::= range of <tuple variable> is <relation name>
    <tuple variable> ::= <identifier>
    <relation name> ::= <identifier>

The statement

    range of E is Employee

specifies that the <tuple variable> E will represent the tuples of the relation EMPLOYEE on any subsequent retrieve statements, until E is redefined by another range statement. The expressions appearing in the retrieve statement contain constants and previously defined <tuple variable>s.

The retrieve statement creates a new relation (perhaps only temporarily, if no <result relation name> is provided) with the domains named in the <target list> whose tuples satisfy the <qualification>. For example, to compute the salary divided by Age - 18 for each employee in the Toy department:

    retrieve into T (Comp = E.Salary / (E.Age - 18))
    where E.Dept = "Toy"

results in a new relation T which has a single domain Comp calculated for each qualifying tuple. The <target list> (Comp = E.Salary / (E.Age - 18)) specifies the domains of the new relation. In this case, T will contain one domain called Comp, computed from the Salary and Age domains of the tuples in EMPLOYEE. The <qualification> where E.Dept = "Toy" specifies which tuples will contribute toward the new relation. The retrieve statement thus consists of a domain specification component (the <target list>) and a tuple selection component (the <qualification>). Each may be defaulted; the <target list> to all the domains in one of the underlying relations; and the <qualification> to where true.
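For instance, a query that selects every tuple and every domain of EMPLOYEE might be sketched as follows; E.all is the Quel shorthand for all of the domains of E, the result relation name Everyone is illustrative, and the omitted where clause defaults to where true:

    retrieve into Everyone (E.all)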

The complete syntax is as follows:

    <retrieve statement> ::= <retrieve head> <target list> <qualification>
    <retrieve head> ::= retrieve <result specification>
    <result specification> ::= ε | unique | <result relation name> | into <result relation name>
    <target list> ::= ( <tuple variable> . all ) | ( <t-element list> )
    <t-element list> ::= <t-element> | <t-element list> , <t-element>
    <t-element> ::= <result domain name> <assignment> <expression>
    <assignment> ::= is | =
    <qualification> ::= ε | where <boolean expression>

3.2. Adding Time to Quel

While ladies draw their stockings on,
The ladies they were are up and gone.

--Ogden Nash, in "Time Marches On"

Since TQuel must be able to express queries on temporal relations, the retrieve statement was augmented with additional semantics involving time. As introduced in chapter 2, temporal relations are collections of tuples, each representing either an event or a period. Hence, each <tuple variable> appearing in a retrieve statement will constitute both an assignment of values to domains and a time element. The retrieve statement thus becomes a way to specify the combination of one or more periods and/or events into a resulting period or event. This characterization, as we will see, has far-reaching consequences on the syntax, semantics, and processing of the TQuel retrieve statement.

There is a fundamental decision to be made when adding time: should the time domain be explicit, that is, directly manipulatable by the user in the <target list> and <qualification>, or implicit, manipulatable only through additional clauses in the language. TQuel adopts the latter approach, for several reasons.

One reason TQuel provides separate clauses for the time domain is that the alternative, allowing the user to manipulate time as simply another domain, attempts to ignore an important aspect of the database. Time is fundamental in a temporal database and significantly impacts the processing of such a database (this aspect will be considered in detail in subsequent chapters). Ultimately, the fact that a tuple has a domain specifying the time it was valid determines to a major extent the processing of that tuple. To the database management system (i.e., the monitor), the time domain is much more important than the other domains found in the tuple.

A second reason involves the operations allowed on the proposed time domain. The statement (in an imaginary query language)

    retrieve ..... where A.time < B.time

implicitly states that the tuple represented by the <tuple variable> A was generated before the tuple represented by B. Not only must the monitor discover this fact, but so must the user. A perhaps more satisfying syntax is

    retrieve ..... when A before B

The semantics in the second case is clearer both to the monitor and the user.

Finally, the monitor should permit as much flexibility as possible, without enforcing inane rules to disallow semantically incorrect queries. For example, the query

    retrieve (D = A.D, Time = A.Time) where B.Time < (A.Time or C.Time)

might have the semantics that B must be followed by A or C, and that the time of the result tuple is to be the same as the underlying tuple, represented by the <tuple variable> A. However, another, quite similar, query:

    retrieve (D = A.D, Time = A.Time or C.Time)

does not make sense, although the types still match. The problem is that the time domain is being used in several different ways, yet the same syntax is being applied in all cases. In TQuel, the first statement (in an appropriate syntax) is allowed, while the second is not.

3.3. The TQuel Retrieve Statement

For the reasons expressed above, the approach taken with TQuel was to make the time domain an implicit one, and to extend the retrieve statement with clauses dealing with this implicit domain. As discussed earlier, the Quel retrieve statement consists of a domain specification component and a tuple selection component. TQuel augments the statement with two analogous components: the temporal delimiter component and the temporal selection component:

    <TQuel retrieve statement> ::= <retrieve statement> <temporal selection> <temporal delimiter>
    <temporal selection> ::= <when clause>
    <temporal delimiter> ::= <start clause> <stop clause> <at clause>

The only other change is that the <target list> may be empty, specifying an event or period relation with no explicit domains.

In TQuel, each relation represents a collection of events or periods; for simplicity, let us restrict the discussion to periods. Each tuple (period) of the resulting relation consists of domains from the tuples (periods) of the underlying <tuple variable>s. The combinations of the underlying tuples which are accessible are determined by the selection component (on the explicit domains) and the temporal selection component (on the implicit time domains). The domains of the result tuple are determined from the explicit domains by the domain specification component, and the implicit time domain from the time domains of the underlying tuples by the temporal delimiter component. Thus, for each component of the retrieve statement concerning the explicit domains, there is an analogous component for the implicit time domain.

Before going into the details, an example is appropriate. The following relation will be used: RunningOn (Process, Processor)

RUNNINGON is a period relation with two domains: a processor and a process. Each tuple in this relation describes a period of time when a particular process was running on a particular processor. The query[7]:

    range of R is RunningOn
    retrieve StartRunning (R.Process)
    where R.Processor = Processor1
    when "3:00pm" ; R
    at R.start

derives an event relation called STARTRUNNING with one domain: a process. Each tuple in this relation describes an instant of time when a particular process started running on Processor1. If the processor was multiplexed among many processes, there could be many tuples in STARTRUNNING, one for each time the process was restarted. Only events occurring after 3:00pm are included in this relation (the expression α ; β specifies that α precedes β[8]). In this query, the selection component consists of the <where clause>, and the temporal delimiter component consists of the <at clause>.
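For contrast, a query that retains the period structure instead of deriving events might be sketched as follows. This assumes, as suggested by the defaults discussed in chapter 4, that omitting the at, start, and stop clauses lets each result tuple inherit its period from the underlying R tuple; the result relation name RunningOnP1 is purely illustrative. The result would then be a period relation recording when each process was running on Processor1:

    range of R is RunningOn
    retrieve RunningOnP1 (R.Process)
    where R.Processor = Processor1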

3.3.1. TQuel Expressions

Before we discuss the additional components, we should consider the format of expressions. Standard Quel domains may be of one of the following types: fixed length character string, integer (1, 2, and 4 bytes long), and floating point (4 and 8 bytes long). TQuel augments these domain types with a temporal type, whose value is a linear function of time. A temporal domain is initially created using the duration function, which requires a <tuple variable> as an argument, and whose value is the length of time the tuple is valid. A temporal domain may be operated on by any arithmetic operator, as long as the result is either one of the standard types or is itself a ratio of two linear functions of time (see section 6.3).
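As a sketch of how a temporal domain might be created, the following query computes, for each period during which a process ran on Processor1, the length of that period. Placing duration(R) in the target list is assumed here from the description above, and the relation and domain names (RunTime, Length) are illustrative:

    range of R is RunningOn
    retrieve RunTime (R.Process, Length = duration(R))
    where R.Processor = Processor1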

Quel allows the standard arithmetic, string, trigonometric, and type conversion operators to be performed on domains. Quel also includes a few aggregate operators (count, average, sum, minimum, and maximum) which return the same value for a collection (aggregate) of tuples. TQuel augments these operators with temporal semantics (see section 3.6).

7 Short examples of the various components of the TQuel retrieve statement will appear throughout this chapter. A complete example may be found in section 3.7.

8 This somewhat unusual syntax was taken from path expressions (see section 3.3.2). See [Shaw 80] for a review of non-procedural notations based on regular expressions.

One way to apply operations to the implicit time domain is to redefine the semantics of the operators defined on the explicit domains. For example, addition might mean "later", subtraction "earlier", and equality "at the same time". This approach may be characterized as arbitrarily associating syntax (algebraic operators) and semantics (later, earlier, etc.). This approach was not adopted in TQuel because it encourages quite confusing constructions, and also suffers from most of the drawbacks mentioned earlier of making time an explicit domain.

The approach taken in TQuel was to define three types of expressions: standard expressions, <temporal expression>s, and <event expression>s. <Temporal expression>s evaluate to a Boolean value indicating whether the ordering specified by the expression was satisfied. The clause

    when "3:00pm" ; R

includes a <temporal expression> specifying that two events be sequenced in time. <Event expression>s evaluate to a timestamp indicating a particular event. The clause

    at R.start

contains a particularly simple <event expression>. The use of <temporal expression>s and <event expression>s in the TQuel retrieve statement will be discussed after examining the syntax and informal semantics of these expressions.

3.3.2. Temporal Expressions

There are several straightforward examples of <temporal expression>s: a <tuple variable>, the ticking of a clock, a particular clock time. Since events can be derived from periods (see section 2.1), the start and stop events of a period (the start and stop events of an event are simply the event itself) are included, as are string and integer constants. The string constant specifies a wall clock time (such as "3:00pm"), and the integer specifies the number of time units (such as milliseconds) since the start of the session with the monitor. Both are examples of temporal constants. The ticking of a clock is accommodated by a predefined event relation CLOCK, which contains a tuple designating every "tick".

The above <temporal expression>s all define events, which do not specify interesting orderings. Any nontrivial <temporal expression> must be composed of more than one event or period. The most straightforward <temporal expression> is a sequence of two events, defining a simple ordering of the events. TQuel allows more general expressions: an ordering may be specified as a regular expression on the participating tuples (e.g., the <tuple variable>s). The syntax is that of path expressions, which are regular expressions augmented with parallel operators [Habermann 75, Andler 79]. Path expressions, as originally defined, specify constraints on the dynamic behavior of the program. In TQuel, path expressions are used in two separate ways, in <temporal expression>s and <event expression>s. Indeed, the original motivation does not apply, since the operations have been performed and the event records generated before the processing of those records commences. Path expressions are used in <temporal expression>s to specify the relative ordering of the tuples participating in the query. The <temporal expression> "3:00pm" ; R appeared in the above example. The other application of path expressions is in <event expression>s, where they are used to select an event. <Event expression>s, such as R.start appearing above, will be discussed later.

The following is the syntax for <temporal expression>s:

    <temporal expression> ::= <tuple variable>
                            | <tuple variable> . time
                            | <temporal constant>
                            | <temporal expression> . start
                            | <temporal expression> . stop
                            | <temporal expression> ; <temporal expression>
                            | <temporal expression> "|" <temporal expression>
                            | <temporal expression> , <temporal expression>
                            | ( <temporal expression> )
    <temporal constant> ::= <string constant> | <integer constant>

The syntax of ".start" and ".stop" is designed to exploit the user's mental image of accessing the implicit time domain of the result of the expression (using a syntax reminiscent of record accessing). The same observation holds true for ".time". The informal semantics are

    .start    indicates the starting event;
    .stop     indicates the stopping event;
    α ; β     specifies that β must follow α in sequence;
    α | β     specifies that at least one of the two expressions must be true; and
    α , β     specifies that the two expressions must overlap in time.

Examples will appear shortly.

3.3.3. Event Expressions

<Event expression>s are quite similar to <temporal expression>s. The major difference is the selection operator, which is found in <temporal expression>s but not in <event expression>s. The syntax is therefore almost identical to that of <temporal expression>s:

    <event expression> ::= <tuple variable>
                         | <tuple variable> . time
                         | <event expression> . start
                         | <event expression> . stop
                         | <event expression> ; <event expression>
                         | <event expression> , <event expression>
                         | ( <event expression> )

The informal semantics of <event expression>s are

    .start    selects the start event;
    .stop     selects the stop event;
    α ; β     forms a period starting when α starts and stopping when β stops; and
    α , β     forms a period starting when the second period starts and stopping when
              the first period stops, thereby determining the interval of time when
              both α and β were valid.

To illustrate the difference between the two types of expressions, the <temporal expression> (A ; B) may be used to specify the condition that the tuple associated with A occurred before that associated with B. To select the tuple which occurred first, the following <event expression> would be used:

    (A , B).start

This expression specifies that the tuples (events) associated with A and B occur in parallel, and that we are interested in the one occurring first.
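The contrast can also be pictured operationally. The following sketch is illustrative only: periods are modeled as (start, stop) timestamp pairs, and the function names are invented; it is not TQuel syntax or the monitor's code.

    # A period is modeled as a (start, stop) pair of timestamps.
    A = (3.0, 7.0)
    B = (5.0, 9.0)

    def seq_holds(x, y):
        """Temporal-expression reading of (X ; Y): a Boolean -- X is followed by Y."""
        return x[0] < y[0]          # assumption: ordering is judged by start times

    def seq_stop(x, y):
        """Event-expression reading of (X ; Y).stop: selects the stop event of Y."""
        return y[1]

    print(seq_holds(A, B))   # True -- the ordering A ; B is satisfied
    print(seq_stop(A, B))    # 9.0  -- the timestamp of B's stop event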

3.4. The Temporal Selection Component

The <temporal expression> and the <event expression> are used in the additional constructs of the TQuel retrieve statement. The temporal selection component, the temporal analogue to the <where clause>, specifies the desired temporal ordering of underlying tuples participating in the derivation:

    <when clause> ::= ε | when <temporal logical expression>
    <temporal logical expression> ::= <temporal expression>
                                    | ( <temporal logical expression> )
                                    | <temporal logical expression> and <temporal logical expression>
                                    | <temporal logical expression> or <temporal logical expression>
                                    | not <temporal logical expression>

The <when clause> selects tuples based on their ordering in time, rather than on the values of their domains, as the <where clause> does. The <when clause> includes a logical expression, which in turn contains <temporal expression>s. It is satisfied if the tuples associated with the <tuple variable>s found in the clause do in fact satisfy the <temporal expression>s. For example,

    when A ; (B | C)

specifies that the tuple associated with A must have been generated before those associated with B or C. Four tuple orders will be allowed (abc, acb, bac, cab)9; the other two possible orders (bca, cba) will be rejected, and the particular combinations of tuples exhibiting a disallowed order will not participate in the query10. The <where clause> and <when clause> work in concert to determine which tuples associated with the <tuple variable>s appearing in the query will be used to derive a result tuple.
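The four allowed orders can be verified mechanically. The sketch below (plain Python, not TQuel; the encoding of an ordering as a string of tuple-variable names is an assumption of the sketch) enumerates every ordering of one tuple per variable and keeps those in which a precedes b or a precedes c:

    from itertools import permutations

    def satisfies(order):
        """Temporal expression A ; (B | C): a must precede b, or a must precede c."""
        return order.index('a') < order.index('b') or order.index('a') < order.index('c')

    allowed = [''.join(p) for p in permutations('abc') if satisfies(p)]
    rejected = [''.join(p) for p in permutations('abc') if not satisfies(p)]
    print(allowed)    # ['abc', 'acb', 'bac', 'cab']
    print(rejected)   # ['bca', 'cba']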

3.5. The Temporal Delimiter Component

Time that takes survey of all the world
Must have a stop.

--Shakespeare, in Henry IV, Part 1

The temporal delimiter component specifies the value of the implicit time domain, just as the domain specification component identifies the values of the explicit domains. Two clauses, a <start clause> and a <stop clause>, are used when the result is a period relation; the <at clause> is used when the result is an event relation. The syntax is shown below:

9 Here, the convention being used is that a represents a tuple associated with the <tuple variable> A, and similarly with b and c.

10 Note that tuples must be supplied by all <tuple variable>s; hence, orders such as ab are not considered.

    <temporal delimiter> ::= <period delimiter> | <at clause>
    <period delimiter> ::= <start clause> <stop clause>
    <start clause> ::= ε | start <event expression>
    <stop clause> ::= ε | stop <event expression>
    <at clause> ::= at <event expression>

Using the relation defined earlier,

    RunningOn (Process, Processor)

the queries

    range of R is RunningOn
    retrieve StartRunning (R.all) at R.start
    retrieve StopRunning (R.all) at R.stop

specify the events which temporally delimit the RUNNINGON relation. The query

    range of C is Clock(100)
    retrieve SampledRunning (R.all) at C

specifies an event relation with tuples at every 100 milliseconds designating which processes were running on which processors at that time. The tuples in the RUNNINGON relation valid at the time the clock ticks are placed in the SAMPLEDRUNNING relation. The monitor will implement this derived relation by sampling each processor every 100 milliseconds to determine which process is currently running.
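The effect of such a sampled derivation can be pictured with a small sketch. This is illustrative only: the tuple layout, the function name sampled_running, and the use of complete period data are assumptions of the sketch, not how the monitor actually samples.

    # RunningOn periods: (process, processor, start_ms, stop_ms)
    running_on = [
        ("P1", "Cm1",   0, 250),
        ("P2", "Cm1", 250, 600),
        ("P3", "Cm2", 100, 450),
    ]

    def sampled_running(periods, tick_ms, horizon_ms):
        """One event tuple per clock tick for every period valid at that tick."""
        result = []
        for t in range(0, horizon_ms + 1, tick_ms):
            for proc, cpu, start, stop in periods:
                if start <= t < stop:
                    result.append((t, proc, cpu))
        return result

    for row in sampled_running(running_on, 100, 400):
        print(row)   # e.g. (0, 'P1', 'Cm1'), (100, 'P1', 'Cm1'), (100, 'P3', 'Cm2'), ...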

3.6. Aggregate Operators

Quel uses the aggregate operators count, sum, avg, min, max, and any (the value is 1 if any tuples satisfy the qualification) to aggregate a computed expression over a set of tuples. The argument of such an operator can be either a single domain or any expression involving constants, arithmetic operators, or domains of a single relation. The following query determines the total payroll (using the EMPLOYEE relation introduced at the beginning of this chapter):

    range of E is Employee
    retrieve (PayRoll = Sum(E.Salary))

The argument of the aggregate operator can be qualified; this query determines how many employees work in the toy department:

    retrieve (Number = Count(E.Name where E.Dept = "Toy"))

Both queries are examples of simple aggregates, which evaluate to a single scalar value. Aggregate functions, on the other hand, partition the set of qualifying tuples into groups, each of which is assigned a value for the expression. The query

    retrieve (E.Name, MS = Min(E.Salary by E.Dept))

returns a list of employees, each with the minimum salary of his or her department. Operationally, Min partitions the tuples into groups by department, then assigns a value (the minimum salary) to the tuples in the group. Each tuple receives the same value.

Aggregate operators are more complicated in TQuel, due to the time-varying behavior of relations. Aggregate operators on event relations are cumulative, in that they take all previously valid tuples into account in their computation. For instance, the Count operator on an event relation would count the number of events which had occurred. Figure 3-1a depicts a series of events. Figure 3-1b shows the periods and their values for the query

    retrieve (Value = Count(E))

As another example, the following query implements a simple clock:

    range of C is Clock(1000)
    retrieve (ClockTime = Count(C))

Since the clock "ticks" every second (1000 msec), there would be an event generated every second. The ClockTime domain would record the number of seconds that had passed before the current "tick".
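A minimal sketch of the cumulative Count on an event relation (illustrative Python; representing events as bare timestamps and leaving the final period open are assumptions of the sketch): each event begins a new period whose value is the number of events seen so far, as in Figure 3-1b.

    def cumulative_count(event_times):
        """Return periods (start, stop, value); value counts all events up to 'start'.
        The final period is left open (stop = None)."""
        periods = []
        times = sorted(event_times)
        for i, t in enumerate(times):
            stop = times[i + 1] if i + 1 < len(times) else None
            periods.append((t, stop, i + 1))
        return periods

    print(cumulative_count([2.0, 5.0, 9.0]))
    # [(2.0, 5.0, 1), (5.0, 9.0, 2), (9.0, None, 3)]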

There are two versions of aggregates on period relations, the cumulative and the instantaneous versions. The instantaneous version is the default for aggregates applied to periods; the CountC operator is used to indicate the cumulative version, which works exactly as it does on event relations. The result of the (instantaneous) Count operator will in general go up and down as the periods come and go, while the value of the CountC operator must monotonically increase over time. In Figure 3-1c, a collection of periods is shown. Figure 3-1d shows the result of the Count operator, whereas Figure 3-1e shows the result of the CountC operator. Note that the derived periods of the CountC operator only involve the leading edge of the underlying periods, while those of the Count operator involve both edges.
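The instantaneous Count can be pictured as a sweep over the start and stop events of the underlying periods. The sketch below is illustrative only (periods are modeled as (start, stop) pairs, and handling of coincident endpoints is simplified):

    def instantaneous_count(periods):
        """periods: list of (start, stop).  Returns (start, stop, count) segments,
        one per maximal interval over which the count is constant."""
        edges = sorted({t for p in periods for t in p})
        segments = []
        for lo, hi in zip(edges, edges[1:]):
            count = sum(1 for s, e in periods if s <= lo and hi <= e)
            if segments and segments[-1][2] == count and segments[-1][1] == lo:
                segments[-1] = (segments[-1][0], hi, count)   # merge equal neighbours
            else:
                segments.append((lo, hi, count))
        return segments

    print(instantaneous_count([(1, 4), (2, 6), (5, 8)]))
    # [(1, 2, 1), (2, 4, 2), (4, 5, 1), (5, 6, 2), (6, 8, 1)]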

The avgc operator is slightly different, since it takes the length of time the tuple was valid into account when computing the average. The value of the argument of the avgc operator is weighted by the duration of the tuple, and intervening periods (when no tuple is valid) are treated as tuples with a value of 0 for the argument.
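A sketch of this weighting (illustrative Python; the fixed origin at time 0 and the closed observation interval are assumptions of the sketch, not the monitor's implementation): each tuple's value contributes in proportion to its duration, and instants covered by no tuple contribute 0.

    def avgc_at(periods, t_end):
        """periods: list of (start, stop, value).  Time-weighted average over [0, t_end];
        instants covered by no tuple contribute 0 to the integral."""
        integral = 0.0
        for start, stop, value in periods:
            lo, hi = max(0.0, start), min(t_end, stop)
            if hi > lo:
                integral += value * (hi - lo)
        return integral / t_end

    # One tuple with value 4 valid from t=2 to t=6, observed up to t=10:
    print(avgc_at([(2.0, 6.0, 4.0)], 10.0))   # 1.6  (= 4 * 4 / 10)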

Note that the presence of an aggregate operator in a retrieve statement automatically implies that the resulting relation will be a period relation. The <at clause> may be used to specify that an event relation is to be derived. The conversion from single event relations to period relations is handled by the ExtendC aggregate operator, which extends an event to a period stretching to the next event. It is cumulative since the derived period depends on the preceding event.

    Figure 3-1: Instantaneous versus Cumulative Count. (a) a series of events; (b) the
    cumulative count of the events; (c) a collection of periods; (d) the instantaneous
    Count of the periods; (e) the cumulative CountC of the periods.

3.7. An Example

To illustrate the actions of the monitor, we will examine how a particular program running on Cm* is monitored. The program solves Laplace's partial differential equation with given boundary conditions (Dirichlet's problem) by the method of finite differences. The equation

    ∂²u(x,y)/∂x² + ∂²u(x,y)/∂y² = 0

is solved for points on an m by n rectangular grid, where only the values at the outer edges of the grid are given. The solution is found iteratively. On each iteration, the new value of each element is set to the arithmetic average of the values of its four adjacent neighbors.
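For concreteness, one iteration can be sketched as follows (illustrative Python on a small grid; the actual PDE task force runs on Cm* and is not written in this form):

    def jacobi_step(grid):
        """One iteration: each interior point becomes the average of its four neighbours.
        Boundary values (the outer edge of the grid) are held fixed."""
        m, n = len(grid), len(grid[0])
        new = [row[:] for row in grid]
        for i in range(1, m - 1):
            for j in range(1, n - 1):
                new[i][j] = (grid[i-1][j] + grid[i+1][j] +
                             grid[i][j-1] + grid[i][j+1]) / 4.0
        return new

    # 4x4 grid: edges fixed at 1.0, interior initially 0.0
    g = [[1.0]*4, [1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0], [1.0]*4]
    for _ in range(20):
        g = jacobi_step(g)
    print([round(v, 3) for v in g[1]])   # interior values converging toward 1.0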

Several processes and several processors work on the grid simultaneously. The grid is partitioned into regions, with one process responsible for each region. The configuration is shown in Figure 3-2. Note that the solvers require access to adjacent regions to derive new values for points on the boundary of their region.

There are many possible ways to synchronize the processes, the most efficient being the purely asynchronous method. The processes are only synchronized at the beginning of the computation. This means that, due to differences in the scheduling and in the data that each process is working on, some processes may perform many more iterations than others.

The proposed experiment will investigate the relative synchrony of two of the processes operating on adjacent regions. If one of the processes (call it P2) gets behind the other process (P1), then P1 will be using older values for the points on the boundary, possibly slowing the convergence for the entire grid. This experiment will focus on those periods of time when P2 gets significantly behind P1 (i.e., by more than one iteration).

One sensor is needed, a traced event sensor which generates an event record each time the solver process begins a new iteration. Since this sensor, called Iteration, is traced, the events are automatically converted into a primitive period relation (also called ITERATION) by the monitor. Thus, the primitive period relation ITERATION is defined, with two domains: Process, the name of the process generating the event record, and IterNum, an integer designating the iteration which has just begun. Derived relations can now be specified using the ITERATION relation.

A period begins whenever the IterNum domain changes, with the tuples partitioned into groups according to the Process domain. The query

    range of A is Iteration
    range of B is Iteration
    retrieve AOverB (Diff = B.IterNum - A.IterNum)
    where A.Process = P1 and B.Process = P2
        and A.IterNum > B.IterNum + 1

finds the periods of time in which P2 is behind P1 by at least 2 iterations. The start and stop clauses default to the most conservative situation (see section 4.5.2); that is, the result tuple will be valid only as long as both underlying tuples were valid.
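Operationally, the derivation can be pictured with the following sketch (illustrative Python, not the monitor's update network; iteration events are modeled as (process, iteration number, time) records, and each ITERATION period is assumed to last until the process's next event):

    def iteration_periods(events, process):
        """Turn a process's iteration events into periods (iter_num, start, stop)."""
        evs = sorted((t, n) for p, n, t in events if p == process)
        return [(n, t, evs[i + 1][0] if i + 1 < len(evs) else float("inf"))
                for i, (t, n) in enumerate(evs)]

    def a_over_b(events, p1, p2):
        """Periods during which p1's iteration number exceeds p2's by at least 2.
        The result is valid only while both underlying tuples are valid (the default)."""
        out = []
        for n1, s1, e1 in iteration_periods(events, p1):
            for n2, s2, e2 in iteration_periods(events, p2):
                lo, hi = max(s1, s2), min(e1, e2)
                if n1 > n2 + 1 and lo < hi:
                    out.append((n2 - n1, lo, hi))   # Diff = B.IterNum - A.IterNum
        return out

    evs = [("P1", 1, 0), ("P1", 2, 3), ("P1", 3, 5), ("P1", 4, 7),
           ("P2", 1, 0), ("P2", 2, 8)]
    print(a_over_b(evs, "P1", "P2"))   # [(-2, 5, 7), (-3, 7, 8), (-2, 8, inf)]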

The periods in the AOVERB relation represent the times when process P2 was significantly behind process P1. To determine the percentage of time this was the case, use the query

    range of AB is AOverB
    retrieve Over (Percent = AvgC(AB) * 100)

The aggregate operator used here is the cumulative average operator. Since a single <tuple variable> appears as an argument to the avgc operator, the result will range from 0 (no tuples ever occurred) to 1 (tuples are always present).

    Figure 3-2: Configuration of the PDE task force (a coordinator process and one
    solver process per region of the grid).

And finally, to determine the times P2 caught up with P1,

    retrieve Catch
    where A.Process = P1 and B.Process = P2 and A.IterNum = B.IterNum
    when A.start ; B.start
    at B.start

Since no target domains were specified, the CATCH relation will only contain the implicit time domain. The <when clause> says that P2 started a new iteration during an iteration of P1. Since the IterNum is always increasing, P2 (B) can catch up with P1 only in this manner. The alternative,

    when B.start ; A.start

specifies that the iteration numbers were equal after P1 has started a new iteration, implying that P2 was already ahead. The <at clause> indicates the exact time that P2 does catch up. Of course, P1 will probably start its next iteration shortly, leaving P2 behind once again.

Finally, to view the results of these derivations,

    display Over, Catch

3.8. Summary

Date lists eight criteria to be applied in the evaluation of conceptual views of data which are also helpful in evaluating query languages [Date 76]. It is useful to examine TQuel in relation to these criteria, and, more specifically, in relation to the language from which it was derived, Quel.

1. The number of basic constructs should be small.

The Quel retrieve statement consists of a domain specification component (the <target list>) and a tuple selection component (the <qualification>). The TQuel retrieve statement consists of precisely the same components, so at this level, the number of basic components has remained small.

At the syntactic level, TQuel augments the <target list> and <qualification> with (a) the <start clause>, <stop clause>, and <at clause> (temporal analogues of the <target list>) and (b) the <when clause> (the temporal analogue of the <qualification>). Hence, the syntax is more complex, but not overly so.

2. Distinct concepts should be cleanly separated.

The most important example is the distinction between the temporal delimiter component and the temporal selection component. Although the expressions found in these two components may be quite similar, the very dissimilar semantics is made clear through the use of separate reserved words (start, stop, and at versus when).

3. Symmetry should be preserved.

Symmetry should arise between the explicit domains and the implicit temporal domains, as well as between the start and stop domains. In the former case, both sets of domains are associated with a domain specification component and a tuple selection component. In the latter case, TQuel associates quite similar syntax and semantics with the start and stop domains.

4. Redundancy should be controlled.

This objective is concerned more with the information being stored in the database than with the query language.

5. The number of operator types should be small.

TQuel adds only a few temporal operators (three binary and three unary) and one aggregate operator. Much of the language design involved incorporating time into the semantics of the existing operators.

6. Very high-level operators should be available.

The sequence (;), alternation (|), and concurrency (,) operators are high-level in that they indicate which properties the resulting tuples should have, rather than specifying how the properties are to be obtained.

7. Behavior must be totally predictable and should accord with the user's intuitive expectations.

This objective is dealt with in detail in the next chapter, especially with regard to aggregates, defaults, and semantics when the time domain is fixed.

8. The language should be founded conceptually on a solid base of theory.

The presence of a complete formal semantics for the TQuel retrieve statement, as presented in the next chapter, satisfies this objective.

In summary, the Quel language has been augmented syntactically and semantically to incorporate time. The syntactic changes included four new keywords (when, start, stop, and at); several new functions; and two new types of expressions, <temporal expression>s and <event expression>s. Semantic changes included a new domain type, temporal; additional selection and specification components; and additional semantics for aggregate operators. Examples indicate that TQuel allows complex queries to be specified in a straightforward manner with little irrelevant detail. Hence, TQuel is an existence proof that

    Result: Traditional query languages can be augmented syntactically and semantically to include time.

This introduces a new issue, to be addressed in the next chapter:

    Problem: How can the semantics of TQuel be formalized?

Chapter 4
Semantics of the TQuel Retrieve Statement

The story is told of the Russian poet Samuel Marshak that when he was first in London, and did not know English very well, he went up to a man in the street and asked, 'Please, what is time?' The man looked very surprised and replied, 'But that's a philosophical question. Why ask me?'

--G. J. Whitrow, in The Nature of Time

It is impossible to meditate on time ... without an overwhelming emotion of the limitations of human intelligence.

--A. N. Whitehead

4.1. Tuple Calculus

The semantics of the TQuel retrieve statement is an extension of that of the standard Quel retrieve statement. The semantics of both will be given as tuple calculus expressions, which are of the form

    { t^(i) | ψ(t) }

where the variable t denotes a tuple of some fixed length i, and ψ(t) is a first order propositional calculus expression containing only one free tuple variable t. ψ(t) defines the tuples contained in the relation specified by the retrieve statement. The atoms of ψ are of three types:

• R(s), where R is a relation name and s is a tuple variable, asserting that s is a tuple in relation R;

• s[i] θ u[j], where s and u are tuple variables and θ is an arithmetic comparison operator, asserting that the ith component of s stands in relation θ to the jth component of u; and

• s[i] θ a and a θ s[i], where a is a constant, having a similar meaning.

For example, the intersection of R and S (both of arity i) is expressed by the calculus expression

    { t^(i) | R(t) ∧ S(t) }

A more detailed presentation of tuple calculus can be found in [Ullman 82].

The Quel statement

    range of t1 is R1
        ...
    range of tk is Rk
    retrieve (ti1.D1, ..., tir.Dr)
    where ψ

is equivalent to the tuple calculus statement

    { u^(r) | (∃t1) ... (∃tk) (R1(t1) ∧ ... ∧ Rk(tk)
        ∧ u[1] = ti1[j1] ∧ ... ∧ u[r] = tir[jr] ∧ ψ') }

which states that each ti is in Ri, that the result tuple u is composed of r particular components of the ti's, that Dm is the jmth attribute of the relation Rim, and that the condition ψ (modified trivially for domain names and some Quel syntax conventions) holds for u. In the first example given in the previous chapter (using the EMPLOYEE (Name, Dept, Salary, Manager, Age) relation),

    range of E is Employee
    retrieve into T (Comp = E.Salary / (E.Age - 18))
    where E.Dept = "Toy"

the corresponding tuple calculus statement is

    { u^(1) | (∃E) (Employee(E)
        ∧ u[1] = E[3] / (E[5] - 18)
        ∧ E[2] = "Toy") }

The result is a set of single domain tuples, each with the property that the domain is computed from the third and fifth domains of a tuple from the EMPLOYEE relation which has a value for the second domain equal to "Toy". In the remainder of this chapter, domain names, rather than domain indices, will be used. Hence, this statement can be rewritten

    { u^(1) | (∃E) (Employee(E)
        ∧ u[Comp] = E[Salary] / (E[Age] - 18)
        ∧ E[Dept] = "Toy") }

To review, the primary additions incorporated into TQuel were the selection (when) and specification (start, stop, and at) components. The temporal expressions in the when clause serve to select tuples to participate in the rest of the query. The temporal expression specifies an ordering on the tuples. Conceptually, each possible set of tuples, one for each tuple variable, is applied to the temporal expression. If the tuples satisfy the specified ordering, they will be used for further processing in the query. The specification component uses event expressions yielding events instead of a Boolean. In this framework, every clause in the event expression takes one or more events or periods and yields a value which is either an event or a period, with the complete event expression yielding an event.

4.2. Path Expressions in TQuel

The syntax of both temporal and event expressions is drawn from path expressions. Path expressions were originally proposed as a high-level synchronization construct specifying the allowable sequences of operations on an object of an associated abstract data type. The following is a list of path expression constructs, with α and β representing path expressions and ω denoting an operation on an object:

    empty path               ε
    elementary operation     ω
    parallel                 α , β     α occurs in parallel with β
    sequence                 α ; β     α is followed in time by β
    selection                α | β     either α or β occurs
    repetition               α+        one or more consecutive execution sequences of α
                             α* ( = ε | α+ )   zero or more consecutive execution sequences of α
    concurrency              ω#        one or more concurrent execution sequences of ω

When path expressions are used in TQuel, the operations are replaced by tuple variables. The path expression then specifies a (Boolean or event) value derived from the tuples associated with the tuple variables appearing in the expression. If only the selection, sequence, and parallel operators appear in the expression, then one tuple is associated with each tuple variable. This has the same semantics as the Quel retrieve statement, since each resulting tuple is derived from one tuple associated with each tuple variable appearing in the query. If the repetition or concurrency unary operators are used, then each tuple variable appearing in the expressions involved with these operators can be associated with multiple tuples. For instance, the path expression

    A*

specifies a sequence of non-overlapping tuples, all associated with the tuple variable A and all participating in the derivation of a single result tuple. Allowing multiple tuples per tuple variable complicates the processing of the query. More importantly, the semantics for the Quel retrieve statement given earlier is predicated on one tuple per tuple variable (per output tuple). Therefore, allowing multiple input tuples per tuple variable greatly alters the semantics of the expression. The assumption will be that there is one tuple per tuple variable, and thus that no temporal or event expression contains repetition ('+' and '*') or concurrency ('#') operators.

The semantics for the two uses of path expressions, returning a Boolean and returning an event, will be dealt with separately. Returning an event is closer semantically to expression evaluation, and will thus be considered first.

4.2.1. The Start, Stop, and At Clauses

As discussed previously, the temporal delimiter component specifies the time during which the derived tuple is valid. For derived periods, the start and stop clauses are used; for derived events, the at clause is used. In all three clauses, an event expression is used to specify an event. It is important to note that the event returned by the event expression will in fact be one of the events originally involved in that expression. Hence, the event expression is not actually deriving a new event from the given events; rather, it is selecting one of the given events to return as a value. Of course, the selection criterion can be, and indeed usually is, a function of the relative temporal ordering of the original events.

When full path expressions are allowed in event expressions, they can be ambiguous when returning an event. The problem is the selection operator. There are two possibilities for interpreting the expression (α | β) as returning an event: either the system must ensure that α and β are disjoint, so that only one will return a value, or it must ensure that both yield the same value. Hence, the temporal expression

    (a ; (b | c)).stop

must not be allowed, for the selection terms are not disjoint, nor do they yield the same value. If the temporal ordering of the tuple variables was bac, then this expression would return the event associated with c (b is ignored in this case). However, if instead the temporal ordering was abc, the event associated with either b or c could be returned.

Determining whether the selection terms are disjoint or return the same value is difficult. The solution is to not allow the selection operator in event expressions.

One aspect remains to be specified: what is the value of (α , β)? There are two reasonable interpretations: the overlap (o) interpretation, where the result is valid only when both underlying tuples are valid, and the coverage (c) interpretation, where the result is valid if either of the underlying tuples is valid. The difference between the two interpretations is illustrated in Figure 4-1. Although the overlap interpretation seems more natural, and was used in the previous chapter, the distinction from a semantic viewpoint is minimal, so semantics for both interpretations will be presented.

The syntax of event expressions defines a parse tree containing the following node types: <tuple variable>, <.start>, <.stop>, <sequence>, and <parallel>. The <tuple variable> nodes are the leaves of the parse tree; the <.start> and <.stop> nodes have one descendant, and the <sequence> and <parallel> nodes have two descendants. This tree can be executed directly in a bottom-up fashion: the <tuple variable> nodes will yield either an event (i.e., a timestamp) or a period (a pair of timestamps); the <.start> and <.stop> nodes will take a period and yield an event (the first or second of the timestamps, respectively); the <sequence> node will accept two periods or events and yield a period. The action of the <parallel> node will depend on whether the overlap or coverage interpretation is used; in either case, its action is well-defined. The result of the top node of the tree will be an event from one of the tuple variables in the event expression.

The semantics is now straightforward to specify. Each event expression can be transformed into an expression on the terminals using the parse tree; the expression will contain the following functions (E ranges over timestamps):

    Event:          <tuple variable> → E
    Period:         <tuple variable> → E²
    Start:          E² → E
    Stop:           E² → E
    Sequential1:    E × E → E²
    Sequential2:    E × E² → E²
    Sequential3:    E² × E → E²
    Sequential4:    E² × E² → E²
    Parallel(c)1:   E × E → E²
    Parallel(c)2:   E² × E → E²
    Parallel(c)3:   E × E² → E²
    Parallel(c)4:   E² × E² → E²
    Parallel(o)1:   E × E² → E
    Parallel(o)2:   E² × E → E
    Parallel(o)3:   E² × E² → E²

There are four separate sequential functions, to accommodate all possible combinations of events and periods: (event ; event), (event ; period), (period ; event), and (period ; period). All yield periods. The parallel operator under the overlap interpretation has no semantics for (event , event). Event expressions under this interpretation will consider such cases to be errors.
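These functions can be realized directly. The sketch below is illustrative only (events are modeled as bare timestamps and periods as (start, stop) pairs); it evaluates the sequential and parallel nodes bottom-up under the overlap interpretation:

    def as_period(x):
        """An event (a bare timestamp) is treated as the degenerate period (x, x)."""
        return x if isinstance(x, tuple) else (x, x)

    def start(p):  return p[0]          # Start: E x E -> E applied to a period pair
    def stop(p):   return p[1]          # Stop:  likewise, the second timestamp

    def sequential(a, b):
        """Sequential: yields the period from the start of a to the stop of b."""
        return (start(as_period(a)), stop(as_period(b)))

    def parallel_overlap(a, b):
        """Parallel under the overlap interpretation: the intersection of the operands.
        Two bare events have no overlap semantics and are treated as an error."""
        if not isinstance(a, tuple) and not isinstance(b, tuple):
            raise ValueError("overlap of two events is undefined")
        pa, pb = as_period(a), as_period(b)
        return (max(pa[0], pb[0]), min(pa[1], pb[1]))

    A, B = (1.0, 6.0), (4.0, 9.0)         # two periods, as (start, stop) timestamps
    print(stop(sequential(A, B)))          # 9.0: the event selected by (A ; B).stop
    print(parallel_overlap(A, B))          # (4.0, 6.0): the period yielded by (A , B)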

Let Φρ be the function resulting from these transformations on an event expression ρ. The use of Φρ will be examined after the semantics of the when clause has been discussed.

4.2.2. The When Clause

Several researchers have proposed a formal semantics for particular variations on path expressions, involving denotational and axiomatic definitions [Berzins&Kapur 77], or transformations into Petri nets [Lauer&Campbell 75] or parallel programs [Andler 79]. The approach taken here transforms the temporal expressions directly into a set of execution histories on the tuple variables involved in the expression. Each execution history specifies a valid ordering of the tuples referenced by the expression. For example, the temporal expression, where A, B, and C are tuple variables denoting event relations,

    (A ; B) , C

will be translated into the set

    { ABC, ACB, CAB }

An assignment of tuples to the tuple variables, when ordered by time, must correspond to one of the execution histories in the set. Hence, by providing the transformations and proving that they yield a set of execution histories when applied to a temporal expression, we will have defined the semantics of such expressions.

The syntax of the temporal expressions is as follows:

    <temporal expression> ::= <tuple variable>
                            | <tuple variable> . time
                            | <temporal expression> . start
                            | <temporal expression> . stop
                            | <temporal expression> ; <temporal expression>
                            | <temporal expression> "|" <temporal expression>
                            | <temporal expression> , <temporal expression>
                            | ( <temporal expression> )

Before the transformations are made to the temporal expressions, all tuple variables associated with period relations are replaced with (P.start ; P.stop), where P is the relevant tuple variable. All instances of P.start, P.stop, E, and E.time, where E is any tuple variable associated with an event relation, will be referred to as terminals. Each terminal specifies an event in the temporal expression.

The transformation system defining the semantics of temporal expressions is comprised of sixteen productions (where a and b are terminals, and α, β, and γ are arbitrary temporal expressions):

    (1)  a . start ⟹ a

    (2)  (α | β) . start ⟹ (α . start | β . start)

    (3)  (α ; β) . start ⟹ α . start

    (4)  (α , β) . start ⟹ (α . start ; β . start) | (β . start ; α . start)

    (5)  a . stop ⟹ a

    (6)  (α | β) . stop ⟹ (α | β)

    (7)  (α ; β) . stop ⟹ β . stop

    (8)  (α , β) . stop ⟹ (α . stop | β . stop)

    (9)  (α) ⟹ α

    (10) a , (β ; γ) ⟹ a

    (11) α , (β | γ) ⟹ (α , β) | (α , γ)

    (12) (α | β) , γ ⟹ (α , γ) | (β , γ)

    (13) (a ; α) , b ⟹ (a ; b ; α) | (a ; (α , b))

    (14) (a1 ; ... ; aj) , (b1 ; ... ; bk) ⟹ (a1 ; b1 | b1 ; a1) ; (aj | bk)

    (15) α ; (β | γ) ⟹ (α ; β) | (α ; γ)

    (16) (β | γ) ; α ⟹ (β ; α) | (γ ; α)

Since these productions define the meaning of temporal expressions, it is important that the reader is convinced that each production matches his or her intuitive understanding of the various constructs. Productions (1), (2), (5), (15), and (16) are straightforward. Production (9) is used to remove extraneous parentheses; α must not be a binary expression to apply this production.

Production (3) can best be understood when examined as part of a larger temporal expression:

    (α ; β).start ; δ

This expresses two possible constraints on the ordering of α, β, and δ: (a) β must follow α in time, and (b) δ must follow the beginning of the period begun when α started and ended when β finished. Since .start was explicitly specified, only constraint (b) is taken as the semantics for this expression. To express constraint (a), the user must add the clause (α ; β), conjoined with and, to the expression. Constraint (b) is indicated by replacing the subexpression (α ; β).start with α.start. Therefore, the production would map the example into

    (α.start ; δ)

Similar comments apply to production (7). With regard to production (6), consider

    (α | β).stop ; δ

This states that the end of either α or β must be followed by δ. This statement is equivalent to saying that either α or β must be followed by δ, or

    (α | β) ; δ

Productions (4) and (8) are initially confusing: although the left hand sides are similar, the right hand sides look quite different. As with previous productions, their effect can be understood in terms of their containing expression. For production (4), examine this expression:

    (α , β).start ; δ

α and β occur in parallel; δ must follow the beginning of this parallel activity. In the overlap interpretation, (α , β) is valid at the point when both α and β become valid. Since α.start and β.start are both events (independent of the internal structure of α and β), they can be ordered:

    (α , β).start ⟹ ((α.start ; β.start) | (β.start ; α.start)).stop
                  ⟹ (α.start ; β.start) | (β.start ; α.start)       (by (6))

Similarly, for production (8), examine

    (α , β).stop ; δ

In the overlap interpretation, δ may start when either α or β has stopped. Again, since α.stop and β.stop are events,

    (α , β).stop ⟹ ((α.stop ; β.stop) | (β.stop ; α.stop)).start
                 ⟹ (α.stop | β.stop)       (by (2) and (3))

The next six productions define the semantics of the parallel operator interacting with the sequence and selection operators. Except for (11) and (12), which allow distribution of the parallel operator over selection, the productions all involve at least one terminal. The reasoning behind these productions can be seen by considering a production not on this list:

    α , (β ; γ) ⟹ ((α , β) ; γ) | (β ; (α , γ)) | ?

The problem is that α may occur in parallel with β, in parallel with γ, or in parallel with both (remember that α can have an arbitrary internal structure). Production (10) substitutes the terminal a, which is guaranteed not to have any internal structure, for α. In the overlap interpretation the result of an event occurring in parallel with a period is simply that event. Of course, a side condition is that a must occur after β.start and before γ.stop. As with (3) and (7), this condition must be stated explicitly by the user.

Productions (13) and (14) also use this technique. In (13), b can occur either between a and the start of α, or during α.

Production (14) is a generalization of (13), and is thus more complex. In the overlap interpretation, the result starts as soon as both sequences start, and ends when either sequence ends. Since a1 and b1 (and aj and bk) are events, they temporally delimit the respective sequences. They can also be ordered. Hence

    (a1 ; ... ; aj) , (b1 ; ... ; bk)
        ⟹ (a1 ; b1 | b1 ; a1).stop ; (aj ; bk | bk ; aj).start
        ⟹ (a1 ; b1 | b1 ; a1) ; (aj | bk)       (by (6), (2), (3))

The usefulness of these productions is evident in the following theorem:

Conversion Theorem:

The productions (1) through (16) transform an arbitrary temporal expression into a relational expression (involving only the sequence and selection operators) in standard form (a selection of sequences):

    (a1 ; a2 ; ... ; ai) | (b1 ; ... ; bj) | ... | (z1 ; ... ; zk)

The proof is somewhat involved, and is given in appendix B. The standard form provides the set of execution histories defined by the original temporal expression.

4.3. Formal Semantics

It is now possible to specify a formal semantics for the TQuel retrieve statement. Let τ' be the temporal expression τ with the tuple variables ti which correspond to periods replaced with (ti.start ; ti.stop), the ti.start terms replaced with φi, the ti.stop terms replaced with φi+k, and the ti.time terms (and the ti terms, for those tuple variables corresponding to events) replaced with φi. The φi serve as terminals in the rest of the analysis. This somewhat unusual numbering scheme is necessary in order to make ti.start and ti.stop unique terminals. Define

    T(φi) ≡ ti[starttime] if i ≤ k, and ti−k[stoptime] otherwise

The starttime and stoptime domains are the implicit domains associated with all tuples. Let

    Order(τ') = φm1 ... φmn, where n is the number of unique terminals in τ',
    τ' contains each φmi, and T(φmi) < T(φmi+1), for 1 ≤ i < n.

With all of this mechanism, the semantics of the TQuel retrieve statement

    range of t1 is R1
        ...
    range of tk is Rk
    retrieve (ti1.D1, ..., tir.Dr)
    where ψ
    when τ
    start ν
    stop χ

is quite straightforward:

    { u^(r+2) | (∃t1) ... (∃tk) ( R1(t1) ∧ ... ∧ Rk(tk)
        ∧ u[1] = ti1[j1] ∧ ... ∧ u[r] = tir[jr] ∧ ψ'
        ∧ Order(τ') is one of the execution histories defined by τ'
        ∧ u[starttime] = Φν(tk1, ..., tkp)
        ∧ u[stoptime] = Φχ(tm1, ..., tmq) ) }

Note that p is defined to be the number of unique tuple variables in ν, and q the number of unique tuple variables in χ. Φρ was defined in section 4.2.1 to be the function derived from an event expression ρ on the tuple variables in the expression. The superscript (r + 2) indicates that the tuple u has r explicit domains and 2 implicit domains, the starttime and stoptime domains (events will have only one implicit domain).

The when clause specifies that the tuples, when ordered temporally, correspond to one of the execution histories in the set of execution histories defined by the temporal expression. The start and stop clauses specify the values of the starttime and stoptime domains through the functions defined by the event expressions appearing in those clauses.

There are two aspects remaining to be covered: aggregates and indeterminacy. Although the tuple calculus semantics for Quel retrieve statements without aggregate operators may be found in [Ullman 82], no such semantics is given for the more general case. Indeterminacy is totally absent from Ingres and Quel; the information in the database is assumed to be consistent and complete for the aspects of the real world at a particular time being modeled by the database. Such an approach is infeasible when using temporal databases in monitoring. First, a semantics will be developed for aggregates, and then all of the semantics will be extended to include indeterminacy.

4.4. Aggregate Operators

The semantics of standard Quel aggregate operators (cf. section 3.6) is best handled by defining the result of the aggregate, and then using this result in the tuple calculus expression for the entire statement (the implementation performs an analogous extraction of the aggregate expressions). Simple aggregates result in a scalar value; using an example from the previous chapter:

retrieve (Number = Count(E.Name where E.Dept = "Toy"))

    Num = cardinality of
        { n^(1) | (∃E)(Employee(E) ∧ E[Dept] = "Toy" ∧ n[1] = E[Name]) }

    { u^(1) | (∃E)(Employee(E) ∧ u[Number] = Num) }

Since the tuple variable did not appear outside of an aggregate expression, it can be eliminated in the second tuple calculus expression:

    { u^(1) | u[Number] = Num } ≡ { Num }

Aggregate functions require intermediate relations:

retrieve (E.Name, MS = Min(E.Salary by E.Dept))

    MS = { m^(2) | (∀E)((Employee(E) ∧ E[Dept] = m[1]) ⊃ m[2] ≤ E[Salary])
              ∧ (∃F)(Employee(F) ∧ F[Dept] = m[1] ∧ m[2] = F[Salary]) }

This statement defines a relation MS, with two domains, a department and a minimum salary. The first clause states that all employees of that department make at least the minimum salary of the department; the second clause says that at least one employee does indeed make the minimum salary. Both clauses are necessary to correctly define the minimum salary. The calculus statement defining the result of the retrieve statement uses this temporary relation:

    { u^(2) | (∃E)(∃m)(Employee(E) ∧ MS(m) ∧ m[1] = E[Dept]
        ∧ u[Name] = E[Name] ∧ u[MS] = m[2]) }

The correct minimum salary is selected from the MS relation using the department domain. It is evident that the semantics for an arbitrary Quel aggregate would be similar to the above expressions.

As with the other constructs, the semantics of the TQuel aggregate operators will be obtained by extending the tuple calculus statements just presented. The central issue is how to incorporate time into the predicates of the calculus statement. Before this issue can be addressed, however, the informal semantics must be well-understood.

4.4.1. Informal Semantics

Extending Quel aggregates to include the passage of time is surprisingly complex. The most important goal is to ensure that the semantics are as natural as possible. A second goal is to ensure that, when time is stopped, say, by considering an aggregate at a single instant of time, the semantics are equivalent to the standard Quel semantics (i.e., they reduce to the Quel tuple calculus equivalents).

In Quel, the aggregate operator partitions the tuples into groups, and then applies the operator to each group. In TQuel, the composition of the groups changes over time (by adding and removing tuples), so the value of the function also changes over time. The value returned by an aggregate operator may be instantaneous, that is, derived at each instant in time from the values of tuples valid at that particular time, or cumulative, derived potentially from tuples valid at previous instances of time. The instantaneous version of Count, for instance, determines the number of tuples valid at each instant of time. The cumulative version of Count has as a value at a particular instant of time the number of tuples which are valid at that time, or were valid previously. The cumulative version increases monotonically, whereas the instantaneous version does not. Aggregate operations on event relations must be cumulative, since the probability that two or more events occur simultaneously becomes arbitrarily small as the timestamp granularity decreases. For period relations, two versions of each aggregate operator are provided. Note that for cumulative aggregate operators, there must be some designation of when time started. For instance, the cumulative count will be zero until the first period occurs. The initial tuple will have a value of 0 starting at the designated time and ending when the first tuple starts.

There is one other aspect concerning aggregate operators on temporal relations which should be addressed. The instantaneous Count operator can count periods easily. However, how does one cumulatively count a collection of periods? More specifically, if one period starts at t1 and ends at t2, and another period with exactly the same values for the domains starts at t2 and ends at t3, then should the count be 1 or 2?

This situation is similar to the issue of duplicate tuples in Ingres. Certain operators such as projection often generate duplicate tuples. These duplicates are usually tolerated, since eliminating them can be computationally expensive. However, aggregate operators such as Count will return different results depending on the presence of duplicate tuples. Ingres provides two versions of count, sum, and avg. Count returns the number of (possibly duplicate) occurrences and CountU returns the number of unique occurrences. Note that the other aggregate operators (min, max, any) do not have this problem. Hence, there is only one version of these operators.

In TQuel, there are two dimensions of duplication: identical tuples valid at the same instant of time, and identical tuples (in the explicit domains) valid at different instants (or periods) of time. In the first dimension, relevant to instantaneous aggregates, duplicates are eliminated (e.g., only the analogue to the CountU operator is provided). In the second dimension, the temporal dimension, the issue is more complex.

There are at least three possible results for the situation of a period starting at the same time the previous period ended. The most straightforward is 2, since there are 2 tuples. This solution corresponds to the Ingres Count operator for the similar situation of duplicate tuples. The second possible answer is 1, since there was one contiguous period of time for which the relationship was valid. This approach would count the number of contiguous periods of time, with the periods separated by intervals when a tuple was not valid. This solution is analogous to the Ingres CountU operator. Implementation of this approach on a temporal database is difficult in the presence of indeterminacy. The third answer is neither: the value of count is the integral of the number of periods over time. This solution is not strictly necessary for the count operator, since the same effect could be obtained using sum(duration(T)), where T is a <tuple variable>. Similarly, the integral sumc operator on T.Domain can be obtained using T.Domain * sum(duration(T)). In chapter 3, the simplest option to implement, solution 1, was chosen.

There exists a potential avgc operator for every form of sumc and countc, since avgc is defined to be sumc / countc. In fact, even when only the integral characterization is used, there is some choice involved. There is one instantaneous version, and at least four versions of the integral avgc operator possible. The instantaneous version exhibits changes at both the start and stop events of the tuple. Versions 1 and 2 of the integral avgc operator exhibit changes in the aggregate whenever a new tuple begins, and ignore the duration of the tuple. They differ in whether a period starting at the same time as the previous period ends is considered one or two tuples. Version 3 weights the value by the duration of the tuple. However, the time periods when no tuple is valid are ignored, causing the resulting value to remain constant over those periods. Version 4 treats the intervening periods as tuples with a value of 0. Hence, the value of this version of avgc will, in the absence of tuples, asymptotically approach 0. Version 4 was chosen, since it seemed to be the most intuitive when applied to sample queries.

4.4.2. Formal Semantics

Now that a clear definition of the aggregate operator has been given, it is appropriate to be more formal. Recall that each tuple contains one or two implicit time domains, starttime and stoptime. These domains can be used to define a predicate indicating when a tuple was valid:

    valid(u, t) ≡ u[starttime] ≤ t ≤ u[stoptime]

Since time is represented using timestamps, this predicate involves comparing three real numbers (again, metric time is assumed). The instantaneous Count operator can now be defined as returning the number of tuples valid at any instant of time. Formalizing this statement in an example:

    retrieve (Number = Count(E.Name where E.Dept = "Toy"))

    Numt = cardinality of
        { n^(1+2) | (∃E)(Employee(E) ∧ valid(E, t) ∧ E[Dept] = "Toy"
            ∧ n[1] = E[Name]) }

    { u^(1+2) | (∀t)(valid(u, t) ⊃ u[Number] = Numt) }

The first statement defines a scalar function Numt, the number of employees in the toy department at time t. The second statement specifies that the Number domain of u contains the number of employees in the toy department during the time u is valid. Unfortunately, the period of time u is valid has been incompletely specified; in particular, the result relation should have only one tuple for every period of time when the count is constant, rather than several (perhaps overlapping) tuples containing the same count. The statement can be amended by defining the macro

    Maximal(u, t, V) ≡ (∀t')((t' ≠ t ∧ ¬valid(u, t') ∧ Vt = Vt')
        ⊃ (∃t'')(((t < t'' ∧ t'' < t') ∨ (t' < t'' ∧ t'' < t)) ∧ Vt ≠ Vt''))

This macro ensures that u is valid for the entire period when V has the same value, by specifying that, given u is valid at time t and has a value Vt for the relevant domain, if u is not valid at another time t', yet has the same value, then there must have existed an intermediate time when the value was different. The Maximal predicate can then be used as follows:

    { u^(1+2) | (∀t)(valid(u, t) ≡ ((u[Number] = Numt) ∧ Maximal(u, t, Num))) }

The following is a slightly more complex example.

    retrieve (E.Name, MinSalary = Min(E.Salary by E.Dept))

Taking it one step at a time, the non-temporal version appears earlier in this chapter. The first (incorrect) temporal version is

    MSt = { m^(2) | (∀E)((Employee(E) ∧ valid(E, t) ∧ E[Dept] = m[1])
            ⊃ m[2] ≤ E[Salary])
        ∧ (∃F)(Employee(F) ∧ valid(F, t) ∧ F[Dept] = m[1]
            ∧ m[2] = F[Salary]) }

    { u^(2+2) | (∀t)(valid(u, t) ⊃ (∃E)(∃m)(Employee(E) ∧ valid(E, t) ∧ MSt(m)
        ∧ m[1] = E[Dept] ∧ u[Name] = E[Name] ∧ u[MinSalary] = m[2])) }

Again, the period of time u is valid has been incompletely specified. To specify the maximal period, first define

    Vt(d) ≡ the value v such that (∃E)(∃m)(Employee(E) ∧ valid(E, t) ∧ MSt(m)
        ∧ m[1] = E[Dept] ∧ d = E[Name] ∧ v = m[2])

The correct temporal semantics is then

    { u^(2+2) | (∀t)(valid(u, t) ≡ ((u[MinSalary] = Vt(u[Name]))
        ∧ Maximal(u, t, V(u[Name])))) }

The semantics of the cumulative CountC operator (version 1) looks quite similar to that of the instantaneous Count operator:

retrieve (Number = CountC(E.Name where E.Dept = "Toy"))

    NumCt = cardinality of
        { n^(1+2) | (∃t')(∃E)(Employee(E) ∧ t' ≤ t ∧ valid(E, t')
            ∧ E[Dept] = "Toy" ∧ n[1] = E[Name]) }

    { u^(1+2) | (∀t)(valid(u, t) ≡ ((u[Number] = NumCt) ∧ Maximal(u, t, NumC))) }

In English, this says that the cardinality, at any instant, is equal to the number of employees working in the toy department at any time prior to the instant in question. The reader may find it useful to compare these statements with their nontemporal counterparts. Note that the function NumCt does not check for contiguous periods; it counts 2 contiguous periods as two periods, rather than as one.

The semantics of the avgc operator is more complex, since the duration of the period is involved. For the query which determines the average salary by department over time,

    retrieve (E.Dept, AS = avgc(E.Salary by E.Dept))

    { u^(2+2) | (∀t)(valid(u, t) ≡ (u[AS] = A(u[Dept], t))) }

    where A(d, T) = (1/T) ∫₀^T  Σ_{E ∈ Employee} V(E, valid(E, t') ∧ E[Dept] = d) dt'

    and V(E, P) ≡ E[Salary] if P, and 0 otherwise

There are several assumptions being made in the above statement. In addition to assuming that time is metric (isomorphic to the real numbers), it is assumed that time is equal tempered, that is, the "distance" from t1 to t1 + Δt is equivalent to that from t2 to t2 + Δt. Otherwise the average is meaningless. Also, it is assumed that there is a designated time t = 0. This time is arbitrary, but has a great impact on the value returned by the aggregate; t = 0 will usually correspond to the beginning of the experiment. Note that the semantics are independent of the unit of time chosen; the same values will result whether time is measured in microseconds, minutes, or years.

At this point, a formal semantics for the entire TQuel retrieve statement has been presented. However, this semantics has totally ignored the effects of incomplete information. The remainder of this chapter will examine the sources of incomplete information in the process of monitoring, and will extend the semantics to include this indeterminacy.

4.5. Indeterminacy

The development of the semantics presented above assumed complete and accurate information in the relations being used to derive new relations. Unfortunately, there are many sources of incomplete and incorrect information. Collecting all the events concerning a relation may be impossible, due to inadequate processing or bandwidth resources, or the inability to generate the correct types of events. Sampling is another source of incomplete information. It is impossible to sample a relation continuously, and the slower the sampling frequency, the more changes in the relation are overlooked. Finally, the clocks in a distributed system cannot be totally synchronized, introducing further uncertainty (see section 5.5.2).

The monitor must be able to deal in some way with these limitations, and to determine how valid the collected information is. If it is impossible to acquire the necessary events concerning the primitive relations being monitored, then the monitor must revert to sampling at least a subset of the relations. If the sampling frequency is excessive, then it must be lowered. Each step represents a decrease in the precision of the information available to the user. Such limitations will always be imposed on a monitoring system; it is important that the system can gracefully handle varying amounts of monitoring information, making as much use of this limited information as possible.

A second problem relating to incomplete information is the loss or delay of samples, and the loss of event information. The ramifications of missing or delayed samples should be confined to (a) an increased probability that a condition became true after the last sample and was subsequently invalidated before the next sample, and (b) a certain amount of 'definite' knowledge reduced to 'possible' knowledge.

The problem of lost events (which were generated but never recorded by the monitor) is a more serious one. Probably the only way to successfully deal with this problem is to provide the monitor with the knowledge necessary to detect inconsistencies in the data being collected and (a) generate the missing events, with an appropriate occurrence uncertainty, or (b) at least revise previous information to account for the increased uncertainty. The mechanism presented below is adequate in the presence of samples; handling lost events is beyond the scope of this thesis.

To cope with the presence of incomplete information, all information is classified as determinant (i.e., true), indeterminant, or false. The closed world interpretation (cf. [Reiter 78]) is adopted, so the statement that the truth value of an atomic formula is false is represented by the absence of a tuple from the appropriate relation. Hence, only determinant or indeterminant information need be explicitly represented.

Events as measured are not instantaneous, but are associated with a "fuzzy" period specifying when the event might have actually occurred [Kahn&Gorry 75] (see Figure 4-2). Periods are composed of three underlying periods: the time in which they are possibly valid (the initial portion), the time they are definitely valid (the definite portion), and the time they are possibly invalid (the final portion). These uncertainty components are included in the calculation of derived periods and events by the monitor, and are delimited by the designated times. Spatial fuzziness is handled in a similar manner in [McDermott 80].

4.5.1. Semantics

In TQuel, indeterminacy is handled automatically during the execution of the query. There are no special constructs provided for the user to specify how incomplete information is to be processed; instead, existing constructs are associated with semantics describing how an arbitrary degree of indeterminacy is dealt with. In the limiting case, that of no indeterminacy, the semantics should be identical to those just presented. The primary goal in the specification of the semantics is to preserve as much information as possible in the derived relations, while ensuring that this equivalence holds.

Since the representation of events and periods has just been modified from that assumed earlier, the semantics of the retrieve statement must also be changed. Instead of one implicit domain for events, starttime, there are now two (see Figure 4-2): start-indeterminant (or istart) and start-determinant (or dstart). Similarly, the two implicit domains for periods, starttime and stoptime, are replaced by four: istart, dstart, dstop, and istop. Metric time is still assumed; in particular, the following relationship must hold for all event and period tuples:

istart ≤ dstart ≤ dstop ≤ istop

(the four implicit domains are each real-valued timestamps). The starttime and stoptime domains appeared in four places in the tuple calculus semantics presented above: in the definition of Order, in the term associated with the start clause, in the term associated with the stop clause, and in the terms associated with the aggregate operators. Each occurrence must be rewritten using the new implicit domains.
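As a concrete illustration, the representation assumed here can be sketched in Python as a small record type carrying the four implicit timestamps and enforcing the ordering constraint above; the class and field layout are illustrative assumptions, not part of TQuel.

    # A minimal sketch of a period tuple's implicit temporal domains under
    # indeterminacy; an event tuple would carry only istart and dstart.
    from dataclasses import dataclass

    @dataclass
    class PeriodTimestamps:
        istart: float   # possibly valid from here (start of initial portion)
        dstart: float   # definitely valid from here (start of definite portion)
        dstop: float    # definitely valid until here (end of definite portion)
        istop: float    # possibly valid until here (end of final portion)

        def __post_init__(self):
            # the relationship required of every event and period tuple
            assert self.istart <= self.dstart <= self.dstop <= self.istop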

[Figure 4-2: Representing Uncertainty. (a) Events without and with indeterminacy, delimited by istart and dstart. (b) Periods without and with indeterminacy, delimited by istart, dstart, dstop, and istop.]

The Order function defines a temporal ordering on the events provided as arguments. This temporal ordering is tested for membership in a set of execution histories derived from the temporal expression in the when clause. However, when indeterminacy is considered, a strict temporal ordering of events may not be possible, due to the overlap of the indeterminant portion of two events or periods. The solution is to instead test each execution history against the argument events, with three possible outcomes: (1) no execution history was satisfied by the given events, (2) one or more execution histories were satisfied, or (3) one or more execution histories may have been satisfied, but the indeterminacy of the events prevents a definite decision. Since each execution history is a sequence of events, these tests involve repeated application of the predicate before:

before(a, b) ≡ a[starttime] < b[starttime]

As an example, the temporal expression

    (A ; B), C

is translated into the following set of execution histories

    { ABC, ACB, CAB }

which may easily be translated into the propositional calculus statement

    (before(A, B) ∧ before(B, C)) ∨ (before(A, C) ∧ before(C, B)) ∨ (before(C, A) ∧ before(A, B))

Since indeterminacy requires this predicate to have three results, true, false, and indeterminant (⊥), the tuple calculus must be a three-valued logic system [Rescher 68]. We will base this system upon the following truth tables (where α and β are arbitrary truth values)11:

    F ∧ α = F        α ∧ F = F        T ∧ T = T        otherwise α ∧ β = ⊥
    T ∨ α = T        α ∨ T = T        F ∨ F = F        otherwise α ∨ β = ⊥
    ¬F = T           ¬⊥ = ⊥           ¬T = F

The existential and universal quantifiers behave like iterated ∨ and ∧, respectively. In the tuple calculus statement {u | α}, if α has a truth value of true, then u is in the set; if α has a truth value of false, then u is not in the set (the closed world interpretation); and if α has the truth value ⊥, then u is in the set but is indeterminant.
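To make the three-valued connectives concrete, the following Python sketch implements Kleene's truth tables as given above; the value names T, F, and I (for indeterminant) are illustrative choices, not TQuel notation.

    # Kleene's (strong) three-valued connectives, a minimal sketch.
    T, F, I = "true", "false", "indeterminant"

    def k_and(a, b):
        if a == F or b == F:
            return F
        return T if a == T and b == T else I   # otherwise indeterminant

    def k_or(a, b):
        if a == T or b == T:
            return T
        return F if a == F and b == F else I   # otherwise indeterminant

    def k_not(a):
        return {T: F, F: T, I: I}[a]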

The before predicate has two parameters, both events, and is defined as follows:

before(a, b) ≡ T if a[dstart] < b[istart], F if a[istart] ≥ b[dstart], and ⊥ otherwise

Hence, before(a, b) is true if the entire period when a may be valid occurs prior to any time when b might have occurred, false if b occurs completely before a might have occurred, and indeterminant otherwise. To complete the details, let τ be as defined previously, and let W'(τ) be the propositional calculus expression derived from W(τ) as described above, translating a set of execution histories into a disjunction of conjunctions. Define

11 Many systems of 3-valued logic have been proposed. The system adopted here was introduced in 1938 by S. C. Kleene [Kleene 38]. In this system, a proposition is to bear the third truth value ⊥ if the proposition is unknown. This characterization is in contrast to other systems, such as the one proposed by Lukasiewicz, where ⊥ is considered somewhere between true and false. Lukasiewicz's system was motivated in part by concerns of "future contingency," whose occurrence, such as that of a sea battle tomorrow (Aristotle's example), is not determinable, especially given free will. Future contingencies are handled in a more elegant fashion with temporal logic [Rescher 71]. Although time appears throughout the semantics presented here, the application of temporal logic is not necessary, since the formulae only involve temporally definite statements about the past. The primary operational difference between the two 3-valued systems lies in the truth value of the form ⊥ ⊃ ⊥. Kleene's system defines α ⊃ β as ¬α ∨ β, so ⊥ ⊃ ⊥ = ⊥. In Lukasiewicz's system, α ⊃ α is a tautology, so ⊥ ⊃ ⊥ = T.

T1(δi) ≡ δi[istart] if i ≤ k, and δi[dstop] otherwise

T2(δi) ≡ δi[dstart] if i ≤ k, and δi[istop] otherwise

T1 defines the start of the indefinite portion of the event in question, and T2 defines the end of the indefinite portion. Using these functions, before can be defined:

before(δi, δj) ≡ T if T2(δi) < T1(δj), F if T1(δi) ≥ T2(δj), and ⊥ otherwise

That is, an event is before a second event if it is certain to have occurred before the second event might have occurred. An event is not before a second event if the latter event definitely occurred before the former event might have occurred.

To account for indeterminacy, rewrite

∧ Order(ε) ∈ W(τ)

as

∧ W'(τ)

Since the tuple calculus used here is a three-valued logic system, W'(τ) may also have a value of ⊥. In that case, there is no definite portion in the derived period.

The terms associated with the start and stop clauses are not hard to generalize. Since an event expression function now returns a pair of timestamps (istart, dstart) rather than an individual timestamp, each term is rewritten as two terms:

∧ u[istart] = ν(tk1, ..., tkp)[istart]

∧ u[dstart] = ν(tk1, ..., tkp)[dstart]

∧ u[dstop] = χ(tm1, ..., tmq)[istart]

∧ u[istop] = χ(tm1, ..., tmq)[dstart]

Of course, the functions contained in Φ will be slightly different. For instance, in the function

Start: E² → E

E ranges over pairs of timestamps (real numbers) instead of individual timestamps.
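The effect of these four terms can be sketched in Python; the helper name and the pair representation of events are illustrative assumptions.

    # Deriving the four implicit timestamps of a result period from the start
    # and stop events selected by the start and stop clauses.  Each event is
    # an (istart, dstart) pair.
    def derive_period(start_event, stop_event):
        s_istart, s_dstart = start_event
        e_istart, e_dstart = stop_event
        return {
            "istart": s_istart,   # possibly valid once the start event may have occurred
            "dstart": s_dstart,   # definitely valid once the start event must have occurred
            "dstop":  e_istart,   # definitely valid only until the stop event may have occurred
            "istop":  e_dstart,   # possibly valid until the stop event must have occurred
        }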

4.5.2. Defaults

The semantics should also specify the defaults assumed in the language. The defaults for the additional clauses in TQuel should be natural to the user. If only one tuple variable (say, A) is used, and it is associated with a period relation, then the defaults are

start A.start
stop A.stop

These defaults say that the result tuple is to start when the underlying tuple started and stop when the underlying tuple stopped. When an event relation is associated with the one tuple variable, the default is 'at A', specifying simply that the result tuple was valid at the same instant the underlying tuple was valid.

When two or more tuple variables are used, the situation is more complex. Let us assume initially that all the tuple variables are associated with period relations. The retrieve statement with defaulted temporal constructs looks identical to a standard Quel retrieve statement; thus it should have an identical semantics. An Ingres database is not temporal; instead, it advances in discrete jumps. Whenever a relation is updated, the "clock" advances, and the database is assumed consistent at the new time. Hence, the tuples participating in a retrieval are all valid at the time the query is executed. Extending this semantics to a temporal database is now straightforward: the result tuple is valid at all the points in time when all the underlying tuples were valid. Thus, if the tuple variables t1, t2, ..., tk are involved in the query, then the default temporal clauses are

when (t1.start, ..., tk.start).stop ; (t1.stop, ..., tk.stop).start
start (t1.start, ..., tk.start).stop
stop (t1.stop, ..., tk.stop).start

The start clause specifies that the result tuple is to start the instant all the underlying tuples are valid; the stop clause specifies that the result tuple is to end as soon as any underlying tuple is no longer valid. The when clause states that all the tuples should be valid for a finite period of time, and is equivalent to

when (t1, ..., tk)

which indicates that all the tuples overlap each other. If a particular tuple variable ti is associated with an event relation, simply replace 'ti.start' and 'ti.stop' in the above clauses with 'ti.time', or simply 'ti'.
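Ignoring indeterminacy, the default semantics for several period tuple variables amounts to intersecting the underlying periods; a small Python sketch (with periods as (start, stop) pairs, an illustrative assumption) makes this explicit.

    # The default result period runs from the latest underlying start to the
    # earliest underlying stop, and exists only if that interval is non-empty.
    def default_result_period(periods):
        start = max(p[0] for p in periods)   # instant all tuples are valid
        stop = min(p[1] for p in periods)    # instant any tuple becomes invalid
        return (start, stop) if start < stop else None   # None: when clause fails

    print(default_result_period([(1, 10), (3, 8), (5, 12)]))   # prints (5, 8)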

When aggregate operators are used in period relations, the decision needs to be made whether to consider the instantaneous or cumulative version to be the default. An argument similar to the one above concerning multiple tuple variables concludes that the instantaneous version more closely preserves the semantics of standard Quel. Hence the Count operator will be the instantaneous version; CountC must be used if the cumulative version is desired.

4.5.3. Indeterminacy and Aggregate Operators

As before, to formally define the semantics of aggregate operators given indeterminacy, the informal semantics must first be understood. Without indeterminacy, the aggregate operator partitioned the tuples valid at any instant (or valid at any time prior to the instant, for cumulative aggregate operators) into groups, and assigned a value to the tuples in each group. Due to tuples which start and stop at indeterminate times, the resulting value is also indeterminate. Hence, it is impossible to assign a single value to each tuple (see Figure 4-3a).

There are three possible strategies for dealing with this anomaly. The first is the easiest and least satisfying: restrict the value of Count to be an integer or the special value undefined at any point in time. The value will vacillate between a determinate integer and undefined. Of course, there still won't be a single-valued count for each tuple, but at least there will be a value for each instant of time (see Figure 4-3b).

The second strategy is only slightly more appealing: for those periods when the first strategy assigned undefined, provide a value which makes Count well-behaved, for an appropriate characterization of "well-behaved" (see Figure 4-3c), and make the tuple completely indeterminant (i.e., no definite portion). This allows Count to have an integer value over all time, but is somewhat arbitrary as to the value assigned to indeterminant portions.

The third strategy is to contend with multiple values of Count (see Figure 4-3d). On the one hand, this strategy keeps the most information, which might be useful in further derivations. On the other hand, it suffers in at least two aspects. Implementation is much more difficult because an instant might be represented by several indeterminate tuples, with different values. Also, the information is difficult to deal with: is it helpful to know that the Count of something at 3 pm was perhaps 3, or maybe 5, or even 16?

The first strategy corresponds to extending the domains with null values [Vassiliou 79, Lipski 79, Codd 79]. There are many possible semantics for null values; the one relevant here is "value unknown". The second strategy assigns the truth value ⊥ to the entire tuple.

The third strategy implies a many-valued logic, with its inherent complexity. Since the second strategy is most consistent with the representation of periods described previously, it was selected for the initial version of the system.

At this point, there exists an asymmetry in the expressive power of the language. The at clause allows an event relation to be derived from period relations, and the start and stop clauses allow a period relation to be derived from other period and event relations. However, there is no way to derive a period relation from a single event relation. One use of this functionality is the conversion from traced events to the corresponding periods. In this case, the delimiting events are available, and the user specifies that the period be derived. Another use is the conversion from sampled events to the corresponding periods.

[Figure 4-3: Different Strategies for Handling Indeterminacy with Aggregate Operators. (a) Tuples in R; (b) Count(R), strategy 1; (c) Count(R), strategy 2; (d) Count(R), strategy 3.]

These conversions differ only in the indeterminacy of the resulting periods. In the case of traced events, the indeterminacy is limited to the indeterminacy in the underlying events; in the case of sampled events, exactly the opposite is true: the determinacy is limited to that of the underlying events.

The appropriate conversion operator depends on whether the underlying relation is sampled or traced. There is also the issue of converting a derived event relation into a period relation. Determining whether a derived relation may be considered "traced" for the purposes of converting into a period relation requires substantial knowledge of the functional dependencies of the domains in the relation. Instead, an alternative was adopted requiring less sophistication in the monitor. Traced events, which the monitor knows are traced, are immediately converted to periods. To obtain the underlying events, the at clause may be used. Only one conversion operator is provided for the user; ExtendC assumes the underlying events are sampled. This assumption is correct in the best case; in the worst case the resulting relation will be completely indeterminant.

The formal semantics is now straightforward. The only place indeterminacy appears in the semantics of aggregate operators is in the definition of valid, which may be defined using the new implicit domains as

valid(u, t) ≡ T if (u[dstart] ≤ t) ∧ (t < u[dstop]), F if (t < u[istart]) ∨ (t ≥ u[istop]), and ⊥ otherwise

As with W'(τ), if the truth value for valid is ⊥ for some time t, then the tuple calculus statement reduces to

{u | ⊥}

for the time t, in which case u is indeterminant at time t.
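A Python sketch of this three-valued predicate, with a period tuple represented as a dictionary of its four implicit timestamps (an illustrative assumption), follows.

    # Three-valued validity of a period tuple u at time t, where
    # istart <= dstart <= dstop <= istop.
    T, F, I = "true", "false", "indeterminant"

    def valid(u, t):
        if u["dstart"] <= t < u["dstop"]:        # definitely within the period
            return T
        if t < u["istart"] or t >= u["istop"]:   # definitely outside the period
            return F
        return I                                 # within an indeterminate portion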

It remains to be shown that, in the absence of indeterminacy, the semantics just presented is equivalent to that discussed in the first part of this chapter. Clearly, without indeterminacy, the initial and final portions are absent and the following holds for all event and period tuples:

istart = dstart and dstop = istop

In this limiting case,

T1(δi) = T2(δi)

and thus the before and valid predicates may only have the truth values true and false. ⊥ is eliminated as a truth value, reducing everything to the standard two-valued logic system, with a semantics identical to that defined initially.

4.6. Summary

This chapter has presented a complete formal semantics for the entire TQuel retrieve statement. The chapter proceeded in an incremental fashion, starting with the basic Quel semantics, then adding the new TQuel clauses and additional semantics for aggregate operators, and finally, adding semantics for indeterminacy. After a short review of tuple calculus and a discussion of the application of path expressions in TQuel, the semantics of event expressions was described as functions on events or pairs of events (periods), ultimately yielding an event. A transformation system provides the semantics of temporal expressions, yielding a set of execution histories on the tuples participating in the expression. At that point, a tuple calculus expression for TQuel retrieve statements without aggregates was presented.

The semantics of Quel aggregates involved an auxiliary relation which was then used in the primary tuple calculus statement. Time was added by using a predicate indicating when a tuple was valid. Indeterminacy involved several changes: use of 3-valued logic, a more complex definition of the before and valid predicates, and a slightly altered semantics for the when clause.

As a result, the semantics of a TQuel retrieve statement is defined in terms of a tuple calculus statement which can be mechanically produced from the TQuel statement. Summarizing,

Result: A formal semantics may be defined for the entire TQuel retrieve statement. This semantics has the following properties: (a) it reduces to the standard Quel semantics when the time domain is fixed at a particular time; (b) it includes aggregate operators in a uniform fashion; and (c) it accommodates an arbitrary degree of indeterminacy.

III. Realization

To verify the thesis that the relational model is an appropriate formalization of the information processed by a distributed monitor, it is necessary to show that the monitor can take a query in a temporal query language (in this case, TQuel) and subsequently gather the correct low level information, process it as directed by the query, and present the high level information to the user, all in a relatively efficient manner. This part describes in some detail the mechanisms and techniques enabling the monitor to accomplish these objectives.

Chapter 5 A Low Level Data Collection Mechanism

Data collection techniques have been at the center of attention in previous work in monitoring, to the exclusion of other areas such as data representation and manipulation. Most papers on the monitoring of user programs are variations on the technique of profiling in a variety of programming languages. This approach involves execution counts or timing at the procedure, statement, or instruction level, using sampling or tracing. However, there have been few advances since the early 1960's, when sampling and tracing were first introduced [Plattner&Nievergelt 81]. Data collection for monitoring of operating systems has relied on sampling or tracing [Nutt 79]. Techniques for using special hardware have been more innovative: since additional logic imposes no overhead on the computation, capabilities such as event counters, combinational and sequential logic on events, comparators, and histogram generators can be provided [Bonner 69, Wulf et al. 81]. Network data collection has concentrated on performance evaluation issues and has, in general, been confined to techniques mentioned above [Abrams&Treu 77, Nutt 79].

Recent systems have taken a more integrated approach to monitoring, attempting to reduce the great effort necessary when using the low-level tools previously available. A unified set of facilities for monitoring a packet radio network was developed at UCLA [Tobagi et al. 76]. Gertner's system [Gertner 80], described earlier in section 1.4, allowed relatively painless monitoring of a distributed system at the message passing level. The Computer Network Monitoring System (CNMS), a rather ambitious system designed at the University of Waterloo, used a sophisticated combination of hardware and software to monitor a geographically distributed network [Morgan et al. 75].

In spite of these developments, an integrated approach to monitoring data collection is still lacking. One possible strategy for developing such an approach is to start with the relational model described in the previous chapter. Unfortunately, the relational model is too general to be of much help in characterizing the data to be collected. Another possible strategy proceeds by developing a conceptual model of the behavior of the program to be monitored, and attempts to represent that behavior within the relational model. This strategy will be the one pursued in the present chapter. The next section begins with a comparison of data collection as performed by a conventional data base system and by a monitor. A model of the environment where the data collection takes place is then presented, followed by a discussion of the properties an effective mechanism must have. The remainder of this chapter will present such a mechanism and examine how various aspects of the environment impact it. The discussion will be independent of any particular operating system; details of the implementation of the mechanism in two operating systems can be found in chapter 8.

However, it is assumed that the monitor is partitioned into two communicating components: a resident portion, performing those functions requiring close interaction with the monitored system, and a remote portion, performing the functions requiring close interaction with the user. Both portions together comprise the complete monitor. This separation is necessary when monitoring a distributed system, where a resident monitor would exist at each processor, sending collected data to the centralized remote monitor, which may or may not execute on one of the processors being monitored. Functionally, the resident monitor collects the event records and interacts with the operating system, and the remote monitor analyzes and displays the monitoring data. These functions will be discussed in more detail in Chapter 8.

5.1. The Environment

Data collection in the monitoring domain differs from that in conventional database systems in several ways. In most information processing systems, the emphasis is on information manipulation and retrieval, with minimal aids for data collection. Although some systems provide tools for key entry and point-of-sale data acquisition, data collection remains difficult to automate, because the highly-structured databases must interface with much less regimented mechanisms: written and oral communication, multiple incompatible data representations, psychological and societal constraints. The monitor, as an information processing system, has much more control over the collection of data, since that data is already available in digitized form, either resident on a bus line or network link, or stored in registers, main memory, or on disk, certainly in a more convenient format.

The availability of monitoring information results in a second distinction between data collection as performed by conventional database systems and by the monitor: the monitor in general must contend with massive quantities of data, only a small portion of which may be useful to the user. For example, suppose that the monitor receives a value-time pair for each change in the program counter. The monitor would have to run on a machine several orders of magnitude faster than the one being monitored merely to store the information. However, if only routine timings were desired, the grain of data will be much coarser. As another example, suppose that timings were desired only for a single routine. Unless the data collection mechanism supports filtering, where only data satisfying specified constraints is actually collected, the monitor will have to contend with data concerning all routines. Extraneous data is expensive, because computing resources are required to collect it and to decide to discard it. Thus, data collection for monitoring involves careful selection of data, rather than access and conversion to a more useful representation, as in conventional information processing systems.

In order to discuss the data collection mechanisms, it is necessary to characterize the environment in which the mechanism executes. The model employed here has been used in several recent operating systems [Wulf et al. 81, Jones et al. 78] and languages [Ichbiah et al. 79, Shaw 81], although the model can be used to conceptualize program behavior in any system [Jones 77]. The environment is defined to be a collection of strongly typed objects, both passive (e.g., data structures) and active (e.g., processes). Type managers export functions to be applied to objects of the type(s) supported by the manager; all operations on an object are performed by the type manager through well-defined interfaces (implying the existence of a type-checking mechanism). This model thus identifies the operation being performed on the object by the performer (the type manager) as a result of a request by an initiator. The user can create new types by defining the representation of the object and specifying the operations which can be performed on objects of that type. The model applies to all levels of granularity; in particular, a type manager may be implemented in hardware, firmware, or software.

Examples of type managers include an encapsulation of a set of routines (cf. Ada packages [Ichbiah et al. 79], or a task force [Jones&Schwans 79], or even a data type and supporting procedures as found in Pascal [Wirth 71]). The more general term used here, type manager, avoids limiting the model to a particular language or operating system. The model is especially applicable to monitoring because it forces state changes to be precisely specified: any change to the representation of a data structure (i.e., object) must occur within a function of the type manager as a result of performing a defined operation. Control flow can also be characterized in this manner: all changes in the execution state of an active object (a process) can be accounted for by examining the sequence of operations performed by the process.

There are several properties which should be satisfied by the data collection mechanism. The mechanism should support strong typing, in that typing violations are not necessary to perform the data collection. Multiple type managers should be permitted. The mechanism should rely as little as possible on cooperation by the type managers, to allow additional data types and type managers to be monitored easily. The mechanism should be efficient, especially when disabled. The mechanism should be selective, allowing the monitor to specify exactly the information to be collected, thereby supporting filtering. The latter two properties allow many sensors to be included in a task force. Both sampling and tracing should be possible using the same mechanism. It should be adaptable to different monitoring granularities. And finally, the mechanism should exhibit good software engineering.

5.2. The Mechanism

The data collection mechanism employed in the monitor relies on the type model presented earlier. An event occurs in the context of an operation as defined in the type model. Information concerning an event is collected into an event record, which is potentially of variable length and which contains both predefined and user-defined fields. Event records are typed, with each event type being produced by a particular sensor. The sensor collects the relevant information concerning that event, and sends the information to the remote monitor. Each sensor is placed in a type manager, and is associated with an operation (or set of operations) provided by the type manager. For example, the file system (a type manager for the file object type) may have a ReadFile event sensor located in the code performing the read operation. Other sensors, such as OpenFile, ExtendFile, PhysicalBlockRead, and ModifyProtection, may also be present in the file system. Since state changes on a file object can only occur as the result of operations performed by the file system, sensors within the file system can monitor these state changes for all file objects.

Event records always contain the name of the type manager performing the operation (if the type manager is itself an object), the event type, the name of the referenced object, and the name of the initiator. Thus, all four components of a state change are recorded for later analysis. Additional information, including the time the event occurred, may also be recorded in the event record.

Event records are stored in receptacles, which handle the enabling and synchronization of event records. Receptacles constitute a bridge between a user-defined type and the monitor: the sensors residing in the type manager for this type use the receptacle to communicate event records to the monitor. Similarly, the monitor uses the receptacles to enable or disable sensors in the type manager. Receptacles are abstract objects in their own right, whose type manager is the resident monitor. Events are enabled by the resident monitor by setting switches in the receptacle. Locks in the receptacle arbitrate simultaneous access and modification of the switches by the resident monitor and the sensors.

Figure 5-1 shows a portion of two type managers (written in pseudo-Pascal with type extensions). There are two sensors shown, both in the WriteBlock operation. There are several operations on receptacles supported by the resident monitor. To install a receptacle in an object, the type manager for that object requests a receptacle from the resident monitor. The type manager then places the receptacle in the object at a location determined by the type manager. To enable (or disable) an event for an object, the resident monitor presents the object to the appropriate type manager with a request for the receptacle contained in the object. The resident monitor then modifies the appropriate switch in the receptacle. Therefore, the minimum functionality a type manager must provide to support monitoring is the access receptacle operation and the install receptacle operation. Both of these are straightforward to implement (see section 8.3).

TypeManager FileSystem =
    Requires Monitor;
    Exports
        Type File = Record
            R: Receptacle;
            ...
        End;
    Private
        Var TMReceptacle: Receptacle;
        Procedure InstallReceptacle (AFile: File) =
            AFile.R := NewReceptacle();
        Function AccessReceptacle (AFile: File) returns Receptacle =
            AFile.R;
        Procedure WriteBlock (AFile: File; BlockNum: Integer; Contents: Page) =
            ...
            { sensor filtered by the file object's receptacle }
            StoreEventRecord (AFile.R, WriteBlockEventA, BlockNum, ...);
            ...
            { sensor filtered by the type manager's receptacle }
            StoreEventRecord (TMReceptacle, WriteBlockEventB, ...);
            ...
    End;

TypeManager Monitor =
    Exports
        Type Receptacle = Record ... End;
        Function NewReceptacle returns Receptacle = ...;
        Procedure EnableEvent (Object: Any; EventNumber: Integer) =
            AccessReceptacle(Object).Enable[EventNumber] := true;
        Procedure DisableEvent (Object: Any; EventNumber: Integer) = ...;
        Procedure StoreEventRecord (R: Receptacle; EventNumber: Integer; ...) = ...;
    End;

Figure 5-1: Skeletons of the Monitor and User Type Managers

Any sensor using the receptacle contained in an object must reside in the type manager for that object. When such a sensor is encountered during the processing of a requested operation, the event identification, object name, initiator name, performer name, and receptacle, as well as any additional information provided by the sensor, is passed to the resident monitor, and a store event record operation is performed by the resident monitor. First the appropriate enable switch in the receptacle is checked to ensure that the event is enabled for this receptacle. If so, an event record in the proper format is then sent to the remote monitor.
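A minimal Python sketch of the store event record operation as just described follows; the field names and the send_to_remote parameter are illustrative assumptions, not the actual resident monitor interface.

    # Check the enable switch in the designated receptacle and, if set, format
    # an event record and forward it to the remote monitor.
    def store_event_record(receptacle, event_type, obj, performer, initiator,
                           extra, clock, send_to_remote):
        if not receptacle.enabled[event_type]:
            return                      # event disabled: no record is generated
        send_to_remote({
            "event": event_type,        # fixed part of the event record
            "object": obj,
            "performer": performer,
            "initiator": initiator,
            "time": clock(),
            "data": extra,              # variable, sensor-supplied part
        })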

Placing the enable switches in the receptacle allows great flexibility in the enabling of events. Receptacles are associated with both passive and active objects. A receptacle associated with a passive object arbitrates the collection of monitoring information for that object. Enabling the file read event for the receptacle associated with a particular file causes event records to be collected for file reads only on this file by any file system process. On the other hand, enabling the file read event for the receptacle associated with a particular file system process causes event records to be collected for file reads on any file performed only by this file system process. In the example in Figure 5-1, WriteBlockEventA is enabled for a particular file by enabling the event in that file's receptacle. To enable WriteBlockEventB, the receptacle associated with the type manager (TMReceptacle) must be modified.

The placement of the receptacle allows the event records to be filtered along three dimensions: by object, performer, or initiator. Each sensor supports filtering of an event type in only one dimension. However, several sensors (and event types) can be associated with an event occurrence (such as file read), each designating a different receptacle to arbitrate event record generation by the sensor. Viewing the set of possible event records as a discrete four-dimensional space with axes consisting of event types, objects, performers, and initiators, the event records generated by a particular sensor form a two-dimensional plane, parallel to two of the axes, and intersecting the event axis and one other axis. The event records enabled by a particular receptacle form a series of two-dimensional planes, all parallel yet intersecting the event axis at different points12.

Higher degrees of filtering are also possible. A line in the event space represents enabling events on combinations of two of the components of the operation, such as a file read by this file system on this file. A point represents total control over which event records get generated: a file read by this file system process on this file, as requested by this initiator. Achieving higher degrees of filtering requires either additional information to be stored in the receptacle, and additional processing to determine if the event is indeed enabled, or new receptacles representing component pairs or component triples to be created and associated with the participating objects. Both alternatives require greater than linear space and/or time in the number of objects, and thus are unacceptable in an environment supporting many objects.

The typing model applies to all levels of abstraction, from the hardware (with objects such as memory locations, interrupt lines, device registers), the firmware, the language level (with objects such as variables, semaphores, procedures), to the process level and the program level (with objects such as servers, databases, users). The data collection mechanism can be used at all of these levels, presenting a consistent interface to the rest of the monitor: event records containing the event type, object, performer, and initiator, as well as other, event-specific, information. By associating receptacles with the objects defined at a particular level of abstraction, the full filtering capabilities can also be realized. However, the implementation will differ from level to level, and there must be ways to transmit the information gathered at the lower levels, especially at the hardware and firmware levels, to the higher levels where they can be dealt with by the monitor. Chapter 8 will describe an implementation at the language/process level.

12 A point in this space may include several event records, each representing the same event, object, performer, and initiator, but occurring at different times. Of course, time could be considered as yet another dimension. Visualizing the event space is more difficult with such a change; the author finds four dimensions hard enough!

5.3. Integrating Sampling and Tracing

In the preceding discussion, the assumption was made that the event record is sensed and communicated to the remote monitor when the event occurs. Such event records are called traced event records, since their generation is synchronous with the event, and thus with the operation whose object, performer, and initiator are named in the event record.

Sampled event records, on the other hand, are generated at the request of the monitor, asynchronously with the event. As an example, a sensor located in the scheduler of an operating system could generate traced event records pertaining to context switching: process x started running at time t1, process y started running at time t2, etc. Another sensor located in the scheduler could generate sampled event records at the request of the monitor: process z is now running.

In the context of the type model, both sensors were executed as a result of an operation supported by the scheduler; the former by the dispatch operation, the latter by the "report current process" operation. The only distinction is the nature of the initiator: either a random process in the system, or the resident monitor. As far as the low level data collection mechanism is concerned, there is absolutely no difference between sampling and tracing: the sensor, when encountered in the course of executing the operation, checks the enable switch in the appropriate receptacle, and, if set, sends the event record to the remote monitor. Of course, the higher levels must treat sampled event records somewhat differently than traced event records, although section 2.2 presented evidence that these differences are not as fundamental as previously thought.

There are several means by which the resident monitor may request sampled event records to be generated by the user's type manager. The most obvious method is to have the user provide an entry point for the monitor. The process would then wait to be invoked, either by another user process or by the monitor. Another method is to have the resident monitor simply set an enable flag in the receptacle, and assume that the user process regularly checks this flag. A useful mechanism designed for this case is a one-time-enable flag in the receptacle. If this flag is set, the next time a particular event record is generated, the event will be disabled, thereby allowing exactly one event to be detected. A third technique, sharing aspects with the other two, is to send the user process a message when sampling is desired. Although this technique assumes that the process regularly checks a particular mailbox, the sampled event record could be generated while the user's process is performing another function. All three techniques involve special code in the user's process, yet most of the added complexity is where it should be: in the monitor rather than in the user's (i.e., type manager's) code.

5.4. Other Uses for Receptacles

Since receptacles are the primary means of communication between the resident monitor and the user's application processes, it may be desirable to add fields to receptacles for sending information other than event records to the resident monitor. Placing counters in receptacles can eliminate the overhead of generating the event records. Counters are also useful for keeping track of the number of missed events, or for accumulating an estimate of the processing overhead of generating events. Other possible aggregation mechanisms include histograms and higher moment generators. The primary requirement for an aggregation operator implemented in this way is that the state change of the aggregate function (such as a sum) can only depend on the current event (e.g., the value to be added to the sum) and the current state of the aggregate (e.g., the current sum).
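The requirement amounts to saying that the aggregate must be maintainable by a purely incremental state transition; a counter and a running sum, sketched below in Python (class and field names are illustrative), both satisfy it.

    # An aggregation field kept directly in a receptacle: the new state
    # depends only on the current state and the current event.
    class ReceptacleCounters:
        def __init__(self):
            self.count = 0
            self.total = 0.0

        def record(self, value):
            self.count += 1          # events seen so far
            self.total += value      # running sum of the event values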

Counts and sums can be quite useful in themselves. The values of the counters at a particular point in time as well as the behavior of the values over time can be very informative. Most raw measurements on Cm* in the past have consisted purely of counts and sums [Jones&Gehringer 80, Dannenberg 81].

5.5. Interaction with the Remote Monitor

Since the remote monitor contains a large knowledge base concerning monitoring, it is important that it be system independent. If the remote monitor had to be designed and implemented from scratch for each operating system, then much of the effort would be expended getting a minimal set of functions working correctly, rather than enlarging a common knowledge base. Hence, the remote monitor's view of the world is an abstraction supported by the resident monitor interacting with a particular operating system, with certain assumptions being made about the event records being generated. These assumptions may be easy to satisfy in one resident monitor, yet difficult to satisfy in a different resident monitor, where the operating system supports rather different functions. There is a tradeoff between strong assumptions, which are difficult to support by the resident monitor, and weak assumptions, which make the derivation of high level information from event records difficult. This section is concerned with the environment as seen by the remote monitor, and how this view is supported.

The format of the event record is one aspect to be standardized. Each event record is divided into a fixed and a variable part. The fixed part is identical in format in all event records, and includes the event number, the (possibly nil) names of the initiator, performer, and object, and (possibly) a timestamp. The variable part contains the values from domains of the event record, i.e., the additional information provided by the sensor. Domains must be formatted in a defined external type (currently either a short integer, a long integer, a variable length character string, or a remote name; see section 5.5.1), with the sensor mapping values in an internal format to an external type. Names and timestamps are much harder to standardize; the rest of this section will describe our approach to these two issues in the context of data collection.

5.5.1. Naming

Who hath not owned, with rapture-smitten frame, The power of grace, the magic of a name?

--Thomas Campbell, in Pleasures of Hope

There are several name spaces within the monitor for objects supported by the operating system; this section is concerned with internal names (the operating system specific names), remote names (system independent names which are processed by the remote monitor), and the mapping between these two name spaces. Other chapters will deal with the remaining name spaces active in the monitor.

Internal names allow the resident monitor to gain access to the object in question. The internal name for a file may be the disk address of its directory entry, or the inode number in the case of Unix files [Ritchie&Thompson 74]; the internal name for a process might be a memory address of a process control block, or an offset into the process table. Object-based systems allow a consistent naming scheme to be used; the operating system supports the addressing of all objects by using a name in a standard format. These names are usually protected, so that processes must acquire names, rather than being allowed to arbitrarily generate them. StarOS [Jones et al. 78] and Medusa [Ousterhout et al. 80], the two operating systems the monitor was implemented on, are both object-based systems; internal names are called capabilities and descriptors, respectively. A remote name is an integer which can be mapped to an internal name, thereby allowing the actual object to be referenced by the resident monitor.

It is helpful to examine briefly how names in these two name spaces are used by the monitor. When the monitor starts, it knows no names. The user issues a query, and the remote monitor instructs the resident monitor to find certain objects and to return the remote names of these objects. At that point, the remote monitor sends some of these names back to the resident monitor, instructing it to enable certain events on those objects. As these events subsequently occur, event records containing the remote names are sent to the remote monitor.

In order for this interaction to occur successfully, several invariants concerning remote names must be guaranteed:

Uniqueness: Remote names must be unique (one remote name for each object) for event records to make any sense at all.

Injectivity: Remote names must be unambiguous (one object for each remote name), for the same reason.

Bidirectionality: The mapping must be bidirectional; in particular, the resident monitor must be able to find the object given a remote name.

Completeness: When an event record is sent to the remote monitor, the remote name for the object, the sensor, and the initiator must be available.

Lifetime: The mapping must allow operating system objects to be garbage collected.

Unfortunately, there is no mechanism for producing remote names which will satisfy all five invariants, although different mechanisms violate different invariants. Assume we have a remote name for an object. We can either

• not allow garbage collection, or

• allow garbage collection, and violate the bidirectionality invariant (if the object is deleted, we cannot map the remote name to an internal name), or

• violate the injectivity invariant (map the old remote name onto a new object which has the same internal name as the old, deleted object).

The second and third alternatives apply to operating systems not supporting a consistent, protected internal name space. For example, the disk address of a file may be a valid name for the file while it exists (assuming the directory is not reorganized). However, there is no guarantee that the file will not be deleted and the entry replaced with that of another, newly created file.

The approach adopted here applies to object-based operating systems, and will support capability addressing at the expense of a slightly corrupted bidirectionality invariant: sometimes the mapping from the remote name to the internal name will not work. To see how this is done, we must first explain the mapping between internal and remote names.

Although each internal name at any given time is bound to at most one object, an internal name will in general be bound to a series of objects. Each time an object is garbage collected, the internal name no longer refers to that object, and the time period the binding was in effect, called the epoch, is ended. Since each remote name must be associated with a unique object, the remote name will consist of two components, the internal name and a designator of the epoch, to disambiguate the internal name. Remote names will be of the form <name, epoch>. The <name> will be formed from the internal name in a system-dependent fashion (this does not circumvent protection, since the internal name may be read, but not written). The <epoch> for a particular <name> is an integer initialized to zero and incremented each time the object associated with the <name> is garbage collected. Whenever an event record is constructed, the current <epoch> for each <name> in the event record will be concatenated with the <name>, thereby forming a remote name.

The resident monitor will maintain a list of internal names it has collected from the event records it has sent to the remote monitor. Whenever a remote name is sent to the resident monitor, the associated internal name will be retrieved using the <name> field. Whenever an object is deleted, the <epoch> for the <name> for that object will be incremented, to ensure that the next object created with this internal name will be mapped to a unique remote name. Note that interaction between the garbage collector and the resident monitor is required to keep the <epoch> values consistent.
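A small Python sketch of this <name, epoch> scheme follows; the class, its fields, and the encoding of a remote name as a pair are illustrative assumptions about one possible implementation.

    # Mapping between internal names and remote names, with an epoch counter
    # that is incremented each time the object bound to a name is collected.
    class RemoteNameTable:
        def __init__(self):
            self.epoch = {}      # name component -> current epoch number
            self.objects = {}    # name component -> internal name, while retained

        def remote_name(self, name, internal_name):
            self.objects[name] = internal_name
            return (name, self.epoch.setdefault(name, 0))

        def object_deleted(self, name):
            # the garbage collector reports that the binding has ended
            self.epoch[name] = self.epoch.get(name, 0) + 1
            self.objects.pop(name, None)

        def to_internal(self, remote):
            name, epoch = remote
            if self.epoch.get(name, 0) != epoch:
                return None      # the epoch ended: bidirectionality is violated
            return self.objects.get(name)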

The mechanism outlined above satisfies all of the invariants except the garbage collection invariant. As long as an internal name resides in the resident monitor, the object it refers to cannot be garbage collected. Having many internal names in the resident monitor imposes an unnecessary processing and memory burden on the system. Therefore the mechanism must allow internal names to be removed from the resident monitor.

Stated simply, an internal name should be removed if it will never be needed again13. Since it is impossible to predict when this will be the case, various approximations may be used:

• the object has an explicit destroy operation performed on it;

• there is only one internal name (the resident monitor's) referencing this object (and thus, the object is nonexistent as far as the rest of the environment is concerned);

• the remote monitor will never send the remote name to the resident monitor in a request (or, as a weaker predicate, the remote monitor doesn't have a remote name for this object);

• it has been a long time since an event record has been generated;

• the user has specified that this object is unimportant (or equivalently, has not specified that this object is important);

13 Put another way, all the invariants can be satisfied given an omniscient resident monitor!

• the object is of an unimportant type (or equivalently, is not of an important type).

The first alternative applies only to objects which have a destroy operation that can be monitored. The semantics of the destroy operation from the monitor's point of view is that, after the operation has been performed, nothing interesting will ever happen to this object, i.e., no event records containing a name for this object will ever be generated. The second alternative depends on a garbage collector which can determine whether there is only one extant internal name; garbage collection using reference counts would admit this alternative. The third alternative requires garbage collection of names in the remote monitor, effectively extending the capability name space to include the address space of the remote monitor. The last three heuristics depend on psychological aspects, and thus require human factors experiments to determine their effectiveness.

5.5.2. Time

A person with one watch knows what time it is, a person with two watches is never sure.

-- Proverb

Time is a difficult problem when monitoring a distributed system. The event records contain times relative to the local clock of the processor on which the event occurred. In general, the local clocks on physically separate processors will be arbitrarily out of synchronization. Unfortunately, it is theoretically impossible to synchronize imprecise physical clocks over a geographically distributed network with nondeterministic transmission times [Lamport 78]. A more practical constraint is ensuring that the overhead incurred in synchronizing the local clocks remains acceptably low.

There are at least three possible approaches to the local to global time mapping: (1) the mapping is done by a distributed algorithm in the system being monitored; (2) the mapping is done in the remote monitor, using local times sent from the resident monitors; or (3) the remote monitor uses only local time in its calculations. Option (3) was rejected immediately because it would disallow queries involving interactions between processors, which is the reason behind having a distributed monitor in the first place.

Option (1) can be accommodated easily using the Lamport algorithm. The time-keeping algorithm can be embedded in the operating system itself, with timestamps appended to every message, or in the monitor, with timestamps included in messages sent by the monitor. Note that the monitor may be able to adequately maintain a global clock with few additional messages. The most obvious algorithm for option (2) is to simulate Lamport's algorithm in the remote monitor. This approach incurs a greater overhead than Lamport's algorithm itself, due to the additional communication necessary. Another consideration is that if the operating system provides a reliable communication mechanism, supporting recovery from lost messages or crashed processors, then a global clock is probably already computed by this mechanism (e.g., [Nelson 81]; all reliable communication mechanisms known to the author use some kind of global clock). In any case, if a global clock is provided by the monitor, other components of the operating system may profit from its presence.
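The core of Lamport's rule is simple enough to sketch in a few lines of Python; this is the logical-clock form of the algorithm, shown only as an illustration of the timestamping discipline the resident monitors would follow, not the thesis's implementation.

    # Lamport clock: advance on every local event, and on receipt advance
    # past the timestamp carried by the incoming message.
    class LamportClock:
        def __init__(self):
            self.time = 0

        def local_event(self):
            self.time += 1
            return self.time

        def send(self):
            return self.local_event()    # timestamp appended to the outgoing message

        def receive(self, message_timestamp):
            self.time = max(self.time, message_timestamp) + 1
            return self.time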

The global clock will of necessity have some error; the maximum skew can be bounded in Lamport's algorithm. Lamport's algorithm also has the property that it preserves the partial ordering of message transmission followed by message reception. The partial ordering necessary for debugging will be preserved and the (unknown) total ordering will embed this partial ordering. The semantics of TQuel explicitly incorporates a skewed global clock.

Given these considerations, we will assume that a global clock is implemented by a distributed algorithm, and is available to each resident monitor. If such a clock is not feasible due to efficiency constraints, as in some real-time systems, then more sophisticated approaches, yet to be developed, are necessary.

5.6. Summary

This chapter first presented a model for the environment the data collection mechanism was to execute in, the type model, and then a mechanism which relies on this model. The occurrence of an event is tied to four components: the operation, the object being operated on, the performer of the operation, and the initiator of the operation. These components are recorded in the event record generated by the sensor, along with additional information germane to the event. A new type of object, the receptacle, was introduced to arbitrate the generation of event records, and great flexibility in filtering was shown to be possible by associating the receptacles with the various entities participating in the event. We demonstrated that, from the point of view of the sensors, there was absolutely no difference between traced and sampled event records, the distinction lying instead in the identity of the initiator of the operation involving the sensor. Finally, several issues involved with the interaction with the remote monitor, in particular naming and time, were discussed, and techniques were developed for coping with the problems inherent in those areas.

At this point, it is possible to answer the query of chapter 2,

Problem: Is it possible to provide effective data collection mechanisms?

with a resounding Yes!:

Result: After examining the type model, a data collection mechanism was designed that supports a high degree of filtering, integrates sampling and tracing, and admits solutions to the problems of naming and time.

Throughout this chapter, we have assumed that any mechanism we can devise to overcome the problems of data collection in a distributed, strongly typed environment could be used effectively by higher levels of the monitor, so that the user is still presented with a simple relational query language interface. This assumption results in an extension of the next problem raised in chapter 2 that is now somewhat harder to solve:

Problem: How can the dynamic incremental updating of temporal relations be implemented effectively, and in such a manner that the facilities supplied by the data collection mechanism are also used effectively?

Chapter 6 The Update Network

As discussed in previous chapters, event records are generated by sensors, collected by the resident monitor, and eventually sent to the remote monitor for further processing. The user specifies the nature and extent of this processing through queries in TQuel, a high level, nonprocedural language. This chapter describes in detail the target of the query translator: an update network. The translation of the query into an update network is the topic of the next chapter.

Current relational database systems provide constructs for deriving new relations and for modifying existing relations through the addition, removal, or modification of individual tuples or collections of tuples satisfying some predicate [Ullman 82]. A derived relation may or may not be modified when an underlying relation is modified. If the relation will not be modified (the normal situation), then the derivation is performed once and a new relation is created. If a derived relation is to reflect the changes made to an underlying relation, then a different implementation strategy is used. The relation, called a view, is never actually computed. Instead, a query involving a view is modified by the database system to operate directly on the underlying relations, merging the operations specified for deriving the view with the operations specified by the query. Since the underlying relations are accessed whenever a query refers to the view, the data is guaranteed to be current.

Due to the dynamic nature of a temporal database, neither approach is appropriate. Instead of first collecting the primitive relations and then performing queries on them to produce derived relations, the system should process the tuples as they become available, for two reasons: the monitor can perform the collection and computation in parallel, and the derived information can be presented in (somewhat delayed) real time. The first reason relates to efficiency; the second to efficacy.

6.1. Incremental Updating of Temporal Relations

Incremental update algorithms for temporal relations accept information in the form of "this relationship between these entities was true for the period from time t1 through t2", and use this information, plus stored information concerning the relation, to derive an updated relation. Relations evolve in time through tuples (rows of a relation) being added and removed. These changes cause relations derived from a relation to acquire or lose tuples of their own, a process continuing until the new information has been completely assimilated by the relations defined in the system. It is thus natural to concentrate on the flow of tuples (being added and removed) among the relations that are associated with each other through TQuel queries. This viewpoint results in an implementation which is substantially different from those of conventional database systems.

The monitor will process the tuples using an update network produced from the user's query. The update network is specified as a directed acyclic graph (dag). The nodes in this graph are classified as either access or operator nodes. Information in the form of tuples flows out of the access nodes (which communicate with the resident monitor) and through the network. Operator nodes take tuples from one or more lower nodes and produce tuples which will be sent on to other nodes. The entire network is driven by tuples originating in the access nodes.
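To make the tuple-driven structure concrete, the following sketch (in Python, purely for illustration; the class and method names are assumptions and do not come from the implementation described in chapter 8) shows one way access and operator nodes could push tuples along the arcs of the dag.

    # Illustrative sketch of an update network as a dag of nodes.
    # Tuples enter at access nodes and are pushed to downstream operator nodes.

    class Node:
        def __init__(self):
            self.successors = []      # (node, side) pairs fed by this node's output arc

        def emit(self, tuple_):
            # Push an output tuple to every downstream node.
            for node, side in self.successors:
                node.receive(tuple_, side)

    class AccessNode(Node):
        """Entry point for event records of one event type."""
        def receive_event_record(self, record):
            self.emit(record)         # an event record becomes a tuple in the network

    class SelectionNode(Node):
        def __init__(self, predicate):
            super().__init__()
            self.predicate = predicate

        def receive(self, tuple_, side):
            if self.predicate(tuple_):
                self.emit(tuple_)

    class ProjectionNode(Node):
        def __init__(self, domain_indices):
            super().__init__()
            self.domain_indices = domain_indices

        def receive(self, tuple_, side):
            self.emit(tuple(tuple_[i] for i in self.domain_indices))

    class DisplayNode(Node):
        def receive(self, tuple_, side):
            print("derived tuple:", tuple_)

    # Example wiring: access -> selection -> projection -> display.
    acc, sel = AccessNode(), SelectionNode(lambda t: t[1] == "Done")
    proj, out = ProjectionNode([0]), DisplayNode()
    acc.successors.append((sel, "left"))
    sel.successors.append((proj, "left"))
    proj.successors.append((out, "left"))
    acc.receive_event_record(("P7", "Done", 1042))   # prints: derived tuple: ('P7',)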

It is important to recognize that two radically different paradigms are at work. One paradigm was introduced in chapter 2: the process of monitoring is profitably conceptualized by the user as the collection of time-varying relations which can be manipulated by a temporal, non-procedural query language. The paradigm introduced in this chapter has a different scope: the process of monitoring is profitably structured by the upper levels of the monitor as the creation and execution of a specialized update network which processes the information collected by the resident monitor.

This network approach emphasizes the flow of information from the sensors (access nodes) to the user. Intermediate relations are not explicitly represented in the network; instead, a relation consists of all the tuples appearing at the output of a particular node while the node is in the network. Operator nodes which store and retrieve long-lived relations, and which display relations, are provided. The instantaneous snapshot of each relation is partially contained in the internal state of the node computing that relation.

The update network provides an implementation of the TQuel retrieve statement, and is therefore procedural by design. The tuple calculus semantics of the Quel (and TQuel) retrieve statement as presented in the previous chapter is non-procedural, also by design. The connection between the two formulations can be seen by considering a third formulation: an algebra on relations. Recall that the tuple calculus statement for the following skeletal Quel statement

    range of t1 is R1
    ...
    range of tk is Rk
    retrieve (ti1.D1, ..., tir.Dr)
    where \psi

was

    \{ u^{(r)} \mid (\exists t_1) \ldots (\exists t_k) ( R_1(t_1) \wedge \ldots \wedge R_k(t_k)
        \wedge u[1] = t_{i_1}[j_1] \wedge \ldots \wedge u[r] = t_{i_r}[j_r]
        \wedge \psi' ) \}

(Recall that D_m is the j_m-th attribute of the relation R_{i_m}.) The relational algebraic formulation for the same Quel statement is

    \pi_A ( \sigma_{\psi'} ( R_1 \times R_2 \times \ldots \times R_k ) ), where A = t_{i_1}.D_1, \ldots, t_{i_r}.D_r

which forms the Cartesian product of the underlying relations (R_1 \times R_2 \times \ldots \times R_k), applies the appropriate selection (\sigma_{\psi'}), and projects the desired domains (\pi_A). It can be proved that each tuple calculus statement has an equivalent relational algebraic statement [Ullman 82]. In general, however, the relational algebraic statement is much easier to implement.

The relational operators can be augmented to support the features in TQuel. The projection operator may be used to select the implicit time domains as specified by the start, stop, and at clauses. The selection operator may be used to select tuples satisfying the when clause. Hence, the TQuel retrieve statement

    range of t1 is R1
    ...
    range of tk is Rk
    retrieve (ti1.D1, ..., tir.Dr)
    where \psi
    when \tau
    start \nu
    stop \chi

has the associated relational algebraic form

    \pi_A ( \sigma_{\tau'} ( \sigma_{\psi'} ( R_1 \times R_2 \times \ldots \times R_k ) ) ), where A = \nu', \chi', t_{i_1}.D_1, \ldots, t_{i_r}.D_r

The parse tree of this expression is shown in Figure 6-1. The update network is a modified version of this tree. The operators correspond to the operator nodes in the network, and the tuple variables associated with the primitive relations correspond to access nodes.

Figure 6-1: Parse tree for the relational algebraic expression for the skeletal TQuel retrieve statement

6.2. Generic Nodes

The access and operator nodes present in the update network are instantiated from a set of predefined generic access and generic operator nodes. Some of the operator nodes have parameters, equivalent to the subscripts shown in the relational algebra. For instance, the projection operator takes as an argument a list of the domains to be projected.

6.3. Access Nodes

Access nodes are the mechanism by which information collected by the resident monitor enters the update network. Each generic access node is associated with an event type, and thus with the set of sensors generating event records of that type. Access nodes are instantiated from generic access nodes, and are placed in the network. When a sensor generates an event record, the appropriate tuple is placed on the output arc of all access nodes instantiated from the appropriate generic node. At this point, the tuples start flowing through the network and the processing commences.

Since the access node is the counterpart of the sensor in the network, it must have total control over the sensor. In particular, the access node must in some way determine

• which sensors or objects are enabled, and

• when samples are to be taken (if the event is sampled).

A second restriction is that the control functions of the access nodes must interface cleanly with the general paradigm of tuples flowing in the network. Both functions are accommodated using additional input arcs.

The approach taken to control enabling is to add input arcs to each access node. One input arc (called the enable arc) determines the object, performer, or initiator to be enabled (see section 5.1). Tuples flowing on this arc are assumed to be events with one domain containing the name of an entity (object or process). A start event specifies the entity to be enabled; a subsequent stop event specifies that the entity is to be disabled. Note that there is a potentially long delay before the tuple actually arrives at the access node, since the tuple was the result of an arbitrarily long sequence of calculations. Hence enabling an event based on the occurrence of a different event cannot be guaranteed to be timely in all cases.

If the access node is associated with a sampled event, a second input arc (called the trigger arc) determines when sampling is to occur (trigger arcs are not necessary for access nodes associated with traced events). Whenever a tuple arrives on the trigger arc, a command is sent from the access node to the resident monitor to perform the sampling. The one domain of the tuple specifies which object is to be sampled. Again, there is the proviso that a delayed trigger tuple will cause the sampling to also be delayed.

The two types of input arcs (enable and trigger) can be merged into one by specifying the semantics so that a start event enables the event on the object (for a traced access node), a stop event disables the event on the object (also for a traced access node), and either causes a sample to be taken for a sampled access node. In this characterization, the one input arc is called the control arc.
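As a concrete illustration of the control arc semantics just described, the following sketch (the names and the command tuples sent to the resident monitor are illustrative assumptions) shows how an access node might react to start and stop tuples arriving on its control arc.

    # Sketch of control arc handling in an access node (illustrative only).
    # For a traced access node, start/stop tuples enable/disable the sensor on an
    # entity; for a sampled access node, either tuple triggers a sample request.

    class ControlledAccessNode:
        def __init__(self, sampled, send_to_resident_monitor):
            self.sampled = sampled                  # True if the associated event is sampled
            self.send = send_to_resident_monitor    # callback standing in for the real protocol
            self.enabled_entities = set()

        def receive_control(self, kind, entity):
            """kind is 'start' or 'stop'; entity names an object or process."""
            if self.sampled:
                self.send(("sample", entity))       # either kind of tuple requests a sample
            elif kind == "start" and entity not in self.enabled_entities:
                self.enabled_entities.add(entity)
                self.send(("enable", entity))
            elif kind == "stop" and entity in self.enabled_entities:
                self.enabled_entities.discard(entity)
                self.send(("disable", entity))

    # Example: a traced access node enabling and later disabling a process.
    commands = []
    node = ControlledAccessNode(sampled=False, send_to_resident_monitor=commands.append)
    node.receive_control("start", "FileServer")
    node.receive_control("stop", "FileServer")
    print(commands)   # [('enable', 'FileServer'), ('disable', 'FileServer')]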

The presence of the control arc on the access nodes allows the monitor to construct subnetworks to compute the entities to be enabled and the samples to be taken. By allowing these functions to be performed in the same manner as the calculation of derived relations, the mechanisms already present for manipulating these networks can also be applied to these areas. In particular, the optimization strategies discussed in section 7.2 are completely applicable to the subnetworks controlling the enabling and sampling activities of the access nodes they are connected to.

6.4. Operator Nodes

Operator nodes perform the computation on the event records flowing from the resident monitor. Some nodes compute new domains in the output tuple from existing domains in the input tuple; the rest map the input domains to the output domains in some fashion. The algorithms in the operator nodes, while performing standard relational operations such as join and projection, are nevertheless quite different from their static database counterparts, since they have been tuned for the incremental update of temporal relations.

There are about a dozen generic operator nodes, varying greatly in complexity. Appendix C describes each in more detail.

6.5. Node Interconnection

In order to process the tuples, the access and operator nodes must be instantiated from the available generic nodes and combined into an update network with the arcs corresponding to paths over which the individual tuples flow. Three primitive constructor functions are provided. The create operation (create genericnode name (param1 ... paramn)) creates an instantiation of the genericnode with the specified instantiation parameters and associates the name name with it. The link operation (link fromnode tonode side) creates an arc from fromnode to the side of tonode. The side is either left, right, or control (see section 6.3). The update network must be a directed acyclic graph (dag); the link operation ensures that no cycles are created. The third operation (unlink fromnode tonode) simply removes an arc from the update network. Nodes are garbage collected by the unlink operation when the last arc is removed.
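A minimal sketch of these three constructor operations is given below; the dictionary representation and the cycle test are assumptions, since only the external behaviour of create, link, and unlink is specified in the text.

    # Sketch of the update network constructor operations: create, link, unlink.
    # Arcs record which side (left, right, or control) of the destination they feed.

    network = {}          # name -> node record

    def create(genericnode, name, params):
        network[name] = {"generic": genericnode, "params": params,
                         "in_arcs": [], "out_arcs": []}

    def _reachable(source, target):
        # Depth-first search along output arcs; used by link to reject cycles.
        stack, seen = [source], set()
        while stack:
            n = stack.pop()
            if n == target:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(dst for dst, _ in network[n]["out_arcs"])
        return False

    def link(fromnode, tonode, side):
        if _reachable(tonode, fromnode):
            raise ValueError("link would create a cycle")
        network[fromnode]["out_arcs"].append((tonode, side))
        network[tonode]["in_arcs"].append((fromnode, side))

    def unlink(fromnode, tonode):
        network[fromnode]["out_arcs"] = [a for a in network[fromnode]["out_arcs"] if a[0] != tonode]
        network[tonode]["in_arcs"] = [a for a in network[tonode]["in_arcs"] if a[0] != fromnode]
        # Garbage-collect any node left with no arcs at all.
        for name in (fromnode, tonode):
            if not network[name]["in_arcs"] and not network[name]["out_arcs"]:
                del network[name]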

These operations, along with the predefined collection of generic nodes, define the update network abstraction as seen by the query translator. Specifically, the code generated by the translator is a sequence of create, link, and unlink operations. There are two remaining issues pertaining to the update network: how is the network abstraction supported (considered in section 8.6), and how is the query translated into an update network (dealt with in the next chapter). At this point, the problem given in chapter 3,

Problem: How can the dynamic incremental updating of temporal relations be implemented effectively, and in such a manner that the facilities supplied by the data collection mechanism are also used effectively?

is reduced to a result and a smaller problem:

Result: An update network can be used to implement dynamic incremental updating of derived relations. The network is composed of access nodes, which interface effectively with the data collection mechanism, and operator nodes, which carry out the desired computations.

Problem: How can the update network be implemented to efficiently process the incoming event records?

The next problem to be faced is now more specific:

Problem: How can the knowledge contained in the monitor itself and in the incoming event records be used to direct the translation of user queries into efficient, correct update networks?

Chapter 7 Generating the Update Network

The remote monitor has two primary functions: generating an update network from the user's query, and executing the update network to process event records flowing in from the resident monitor. The update network must be correct and it must be efficient. Update network generation is therefore a vital aspect of monitoring and, to be effective, should use all the knowledge present in the monitor. The network generation component has two sources of knowledge: the information contained in previously processed event records, and the knowledge embedded in the monitor's algorithms. The purpose of this chapter is to describe how these two sources of knowledge can be applied to the task of generating the update network.

7.1. Generating an Initial Network

The query is parsed by a LALR(1) parser generated from the TQuel grammar by the yacc parser-generator [Johnson 75], building a standard parse tree. Semantic analysis consists of resolving the names for tuple variables, relations and domains, type-checking the expressions, and inserting semantic information into the tree. The parse tree is then converted to a relational operator tree (or, equivalently, an initial update network, see section 6.1) by the following process, illustrated using the following query from section 3.7:

    retrieve Catch
        where A.IterNum = B.IterNum
        when A.start ; B.start
        at B.start

1. Collect all the referenced tuple variables. Two tuple variables appear in this query: A and B.

2. Build a tree of Cartesian Product operator nodes of all the relations associated with the referenced tuple variables. The tree is quite simple: A X B, requiring one Cartesian Product operator node.

3. Create ApplyOp operator nodes for all expressions in the parse tree, except those in the where and when clauses. There are no such expressions in this query.

4. Create a series of Selection operator nodes, one for each disjunction in the where clause, after converting the where clause to a conjunction of disjunctions (normal form).14 The where clause is already in normal form, requiring one Selection operator node.

5. Create one or more Selection operator nodes for the when clause, after converting to normal form. The when clause is already in normal form, requiring one Selection operator node.

6. Create a Projection operator node from the target list of the parse tree, including the at, start, and stop clauses. The projection node will extract only one domain, Catch.start.

The resulting initial update network is shown in Figure 7-1. The update network produced by this process is not quite complete. The rest of this section will discuss the modifications required for semantic correctness.
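In terms of the constructor operations of section 6.5, the code emitted by the translator for this query is a short sequence of create and link operations. The sketch below is only suggestive: the generic node names and parameters are illustrative, not the ones actually used by the implementation (those are listed in appendix C).

    # Illustrative translator output for the example query, as (operation, arguments) pairs.
    emitted = [
        ("create", ("AccessNode",       "A_in",      ("A",))),
        ("create", ("AccessNode",       "B_in",      ("B",))),
        ("create", ("CartesianProduct", "prod",      ())),
        ("create", ("Selection",        "sel_where", ("A.IterNum = B.IterNum",))),
        ("create", ("Selection",        "sel_when",  ("A.start ; B.start",))),
        ("create", ("Projection",       "proj",      ("B.start",))),
        ("link",   ("A_in",      "prod",      "left")),
        ("link",   ("B_in",      "prod",      "right")),
        ("link",   ("prod",      "sel_where", "left")),
        ("link",   ("sel_where", "sel_when",  "left")),
        ("link",   ("sel_when",  "proj",      "left")),
    ]

    for operation, arguments in emitted:
        print(operation, arguments)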

7.1.1. Universal Relations

As discussed in section 6.3, the control arc is included on access nodes to determine which objects are to be sampled or traced. The monitor must attach a source of objects to the control arc of each access node in the network; otherwise, no events would be enabled, no samples would be taken, and no event records would be generated. For this purpose, a universal relation is defined for each object type. A universal relation contains exactly one domain denoting the set of objects of a given type known to the monitor. Whenever the monitor is informed of a new object, the object is added to the appropriate universal relation. The control input of each access node is connected to a universal relation determined by the type of the access node. An access node associated with a sensor in the file system would have its control input connected to the universal relation of file system processes. If the sensor was instead associated with a file object, the control input would be connected to the universal relation of files.

Connecting a universal relation to the control arc of each access node implies that, for each primitive relation referenced in the query, all relevant objects known to the monitor will have the associated sensor enabled for the object. Sometimes this is desired; for example, determining the system utilization requires context switch events to be enabled on all the processes in the system. In general, however, the monitor must be much more selective in which sensors are enabled, with the selection criterion extracted from the query itself. This is done by first attaching the universal relation to each control input, assuming initially that all sensors are to be enabled. Then, the update network corresponding to the query is converted into a more efficient one using a collection of transformations discussed later. The new update network will probably have an entire subnetwork replacing the universal relation connected to the control input; in particular, selection nodes are migrated down toward the access nodes, often coming to rest below the access node in the subnetwork connected to the control input. Examples of this optimization will be given later in this chapter.

14 The multiple Selection operator nodes provide flexibility in the optimization phases described later.

Figure 7-1: An Initial Update Network

7.1.2. Compensation

Compensation is necessary because the act of monitoring usually perturbs the system under study. For instance, the monitoring of processor utilization may decrease the utilization, since some of the processing involves monitoring rather than running user jobs. If the monitor has some measure of the perturbation the monitoring of each event has on the other events being monitored, it can compensate by correcting for this perturbation.

Although it is impossible to compensate totally for the operation of the monitor, it is sometimes feasible to perform a few gross corrections. For instance, by knowing how much processor time went into generating event records, the monitor can adjust the computed processor utilization in a straightforward manner. The monitor can determine the approximate overhead of event record generation using the potentially large amount of information it has on its actions:

• which sensors executed when (from the event records themselves);

• how much data was collected (also from the event records);

• the internals of the sensors (from the descriptions of the sensors); and

• the effect each sensor has on other sensors (presumably also from descriptions of the sensors).

The monitor, when generating the update network from the user's query, may add special compensation operator nodes to the network. The information from the sensor descriptions is available at network generation time; the information from the event records must be derived dynamically from the tuples as they flow into the compensation node. Thus, one input to the compensation node would be the tuples to be compensated (the primary tuples); the other input(s) would be for the (secondary) tuples, whose collection perturbed the values of the primary tuples.

As just described, compensation nodes would not work, for there would always be the possibility (actually, the probability) that the primary tuple would have come and gone before the secondary tuples arrived at the compensation node. There are at least two solutions to this problem. In one approach, the network itself would ensure that the tuples arrived in the correct order. It is unclear how the network would enforce the correct ordering. The other approach is at the other end of a local-global spectrum: instead of adding complexity to the network, the nodes requiring a specific ordering of the incoming tuples would achieve that ordering internally. In this approach, the compensation node would store the primary tuples until the correct secondary tuples had arrived, and then output the corrected primary tuples. The latter approach was adopted here, for reasons to be given later.

Unfortunately, this approach introduces another problem. The remote monitor receives tuples periodically as they are retrieved from internal buffers (i.e., the receptacles) within the system being monitored (see section 5.2 and appendix D). The tuples are then sent to the appropriate access nodes for insertion into the update network. Due to the presence of multiple buffers and the unspecified manner in which these buffers are emptied, the incoming tuples are arbitrarily ordered in time. In particular, the differences in time between subsequent tuples can be arbitrarily large, since some buffers fill up very quickly (say, every second) and others quite slowly (say, every hour). This situation implies that the compensation node must at any point be prepared for a secondary tuple arriving much later than a primary tuple. Primary tuples must be stored indefinitely in preparation for such a possibility.

There are two possible solutions to this dilemma. One solution is to have the resident monitor remove event records in an orderly fashion so that the buffers are emptied regularly and the tuples can be ordered. The other solution relies on the generation of checkpoint tuples to aid in the temporal ordering of tuples. Since the latter solution does not put any restrictions on the gathering of tuples by the resident monitor, and allows the temporal order to be changed at will in the update network, it was adopted in the implementation.

7.1.3. Checkpoint Tuples

A checkpoint tuple has the semantics that all tuples of a particular class following the checkpoint tuple will have a time value after that of the checkpoint tuple. A class might contain those tuples of a particular event type, or those tuples pertaining to a particular object or sensor process, or those tuples generated on a particular processor, or even all the tuples generated by the system, depending on how the buffering and movement of tuples was implemented in the resident monitor.15 Most operator nodes simply echo checkpoint tuples on their outputs. Binary operator nodes must generate output checkpoint tuples as a function of checkpoint tuples from the two inputs. To preserve the semantics of the checkpoint, each binary operator node behaves in the following manner. First, the node accepts and stores tuples from both inputs until a checkpoint has been received from both. Assume that the left input had a checkpoint at t1 and the right at t2, with t1 < t2. Also assume that the last checkpoint tuple output by this node had a time t0. The node processes all the internally stored tuples in [t0, t1), then outputs a checkpoint tuple with a time t1. All tuples from both inputs with times less than t1 are purged from internal storage, and the processing continues by again accepting input tuples.
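The checkpoint behaviour of a binary operator node can be sketched as follows; the buffering structure and names are assumptions, and the node-specific computation over an interval is left as a callback.

    # Sketch of checkpoint handling at a binary operator node (illustrative).
    # Data tuples are (time, payload); a checkpoint carries only a time.

    class CheckpointingBinaryNode:
        def __init__(self, process_interval, emit):
            self.process_interval = process_interval   # callback(left_tuples, right_tuples, t0, t1)
            self.emit = emit                            # callback for outgoing checkpoint tuples
            self.buffers = {"left": [], "right": []}
            self.pending = {"left": None, "right": None}
            self.t0 = 0                                 # time of the last checkpoint sent downstream

        def receive(self, side, tuple_):
            self.buffers[side].append(tuple_)

        def receive_checkpoint(self, side, time):
            self.pending[side] = time
            if self.pending["left"] is None or self.pending["right"] is None:
                return                                  # wait until both inputs have checkpointed
            t1 = min(self.pending["left"], self.pending["right"])
            in_window = lambda t: self.t0 <= t[0] < t1
            self.process_interval([t for t in self.buffers["left"] if in_window(t)],
                                  [t for t in self.buffers["right"] if in_window(t)],
                                  self.t0, t1)
            self.emit(("checkpoint", t1))
            # Purge tuples older than t1 and retire the checkpoint that was consumed.
            for s in ("left", "right"):
                self.buffers[s] = [t for t in self.buffers[s] if t[0] >= t1]
                if self.pending[s] == t1:
                    self.pending[s] = None
            self.t0 = t1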

It is useful to examine how checkpoints relate traditional and temporal databases. A traditional static relation, which models reality for an instant or period of time (after which it is updated to model the revised reality), is simply the subset of the tuples of a temporal relation valid during that instant or period of time. Checkpoints break the stream of tuples into consecutive intervals, with tuples in a particular interval all generated after one checkpoint and before the next. The Cartesian product operator performs a series of somewhat conventional Cartesian products over these intervals. Tuples in one interval from the left input will be concatenated with overlapping tuples in only a few intervals from the right input.16 The overall effect is to divide the relation temporally into intervals, and apply the operations to one interval at a time.

15 In both resident monitors implemented to date, a checkpoint applies to the entire system.

16 This is actually a simplification; see section 7.2.3 for the full story.

One issue still remains: when are checkpoint tuples generated? Internal storage requirements in the resident monitor dictate that checkpoints be generated regularly in short time intervals. Also, because tuples are potentially delayed until the next checkpoint, responsiveness to events enabled by tuples flowing into a control input of an access node from lower in the network demands that checkpoints come often. However, since checkpoint tuples themselves require processing resources, they should not be generated too often. This trade-off is complicated when checkpoints are associated with classes of tuples, instead of having each checkpoint refer to the entire system. The optimal frequency of checkpoints is in general a function of the storage capacity of the resident monitor, the tuple processing capability of the update network, the responsiveness desired by the user, and the previous tuples generated by the operating system and user programs. It is desirable to have the update network have control of this aspect, as it does with event enabling and compensation. One way to control the checkpoint frequency is to provide a unique checkpoint access node. Whenever a tuple arrives at the control input (see section 6.3) of this access node, a command would be sent to the resident monitor to generate a checkpoint. By connecting a clock access node to the checkpoint node, checkpoints would be generated regularly. More generally, this arrangement allows checkpoints to be generated as a result of arbitrary processing of event records. A checkpoint tuple would then be similar to a sampled event, with rather unique semantics for the update network. The approach taken in the current implementation is to instead simply generate checkpoints regularly, every few seconds, as specified by the user.

7.1.4. Other Details

The semantic analysis must also concern itself with several issues which, although straightforward, nevertheless should be mentioned:

• The aggregate operator node (see appendix C) assumes the presence of domains indicating (a) which partition the tuple is a member of, (b) the value of the expression the aggregate is to be applied to, and (c) a Boolean indicating whether this tuple is to be included in the aggregate, corresponding to the by clause, expression, and where clause, respectively, in the aggregate clause (see section 3.6). These domains may already exist in the tuple, or they may need to be computed by ApplyOp operator nodes lower in the network. Appendix C, in its discussion of the AggrOp operator node, includes an example illustrating the use of an ApplyOp operator node to accommodate a where clause.

• If a start or stop clause is omitted, the default (see section 4.5.2) must be inserted into the parse tree before the update network is generated.

• The operator nodes expect domain indices as arguments, rather than domain names. Thus, the semantic analysis must keep track of the number of domains and the content of each domain of the tuples flowing over each arc of the network. This accounting is complicated slightly by those operator nodes that compute new domains. For efficiency, lifetime analysis may be used to indicate which domains may be reused, thereby reducing the tuple size.

• Events associated with a traced primitive relation are automatically converted to periods by inserting an EventToPeriod operator node above the access node (see appendix C).

Although the update network produced by this process is semantically correct, it is probably too inefficient to execute directly. The rest of this chapter will discuss ways to improve both the time and space efficiency of the update network while maintaining its correctness.

7.2. Efficiency

In order for the update network to operate in an efficient and timely manner, the arrival of an input tuple at each node should cause a small, relatively uniform amount of processing. If significant processing is required for each tuple, the monitor will be severely constrained as to the maximum incoming tuple rate it can handle. Widely varying processing times cause "jerky" behavior by the network, which may be acceptable when the resulting tuples are stored for later analysis, but is clearly unacceptable when the tuples are being viewed in real time by the user.

Minimizing the internal storage of each node is also important. The ideal, no internal storage at all, is possible only for a few nodes such as ApplyOp and Selection. On the other end of the spectrum, some nodes, such as the Cartesian product, require unbounded internal storage. Clearly this is an unacceptable situation.

Four related approaches are available to increase the efficiency of an update network. One approach applies graph transformations to the update network, mapping an inefficient network into a more efficient one. The second approach exploits the presence of the implicit time domain by temporally ordering the tuples flowing across the arcs in the network. The third approach places restrictions on the tuples the Cartesian product can produce, by limiting the generality of the when clause. The last approach is concerned with the scheduling of ready nodes (those with tuples on their input arcs). These approaches, applied in concert, offer the possibility of increasing the time and space efficiency of an initial update network by several orders of magnitude. A fifth approach, optimizing the update network after it has been generated, will be discussed in section 8.6.2.

The Cartesian product provides a convenient example of all four approaches. A conventional Cartesian product relational operator concatenates a copy of every tuple from one relation with a copy of every tuple from a second relation. If there are n1 tuples in R1 and n2 tuples in R2, there will be n1·n2 tuples in R1 × R2. However, when the tuples represent periods of time, they should be concatenated only if they overlap in time, i.e., only when the start time of each strictly precedes the stop time of the other (the when clause allows other temporal criteria to be used; overlap is the default and most commonly used criterion). A tuple from the first relation will in general overlap only a small fraction of the tuples from the second relation.
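Stated as a predicate, the default overlap criterion is simply that each period starts before the other stops; a minimal sketch follows (argument names are illustrative).

    # Default temporal join criterion: two period tuples are concatenated only if
    # their periods overlap, i.e., each one starts strictly before the other stops.
    def overlaps(a_start, a_stop, b_start, b_stop):
        return a_start < b_stop and b_start < a_stop

    print(overlaps(0, 10, 5, 15))    # True:  the periods share [5, 10)
    print(overlaps(0, 10, 10, 20))   # False: the first stops exactly when the second starts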

The space and time efficiency of the Cartesian product is strongly dependent on the number of tuples that must be processed and stored internally, which in turn depends on four factors:

• the number of tuples in the underlying relations;

• the temporal ordering of the two input tuple streams;

• the number of tuples from one input which are concatenated with a tuple from the other input; and

• the relative synchrony of the input streams.

The first factor depends solely on the computation process generating the events; the second factor is under the control of the update network; the third factor is determined by the exact semantics of the sequence operator in the when clause (to be discussed in more detail below); and the fourth factor depends in part on the scheduling of ready nodes. Graph transformations can be used to reduce the number of tuples participating in the Cartesian product. Through the use of conversion operator nodes, the update network can alter the temporal order of the tuple stream. A modified Cartesian operator node can greatly reduce the number of output tuples generated. And finally, the scheduling of ready nodes strongly affects the relative synchrony of the tuple streams. Each of these approaches will now be discussed in more detail.

7.2.1. Graph Transformation

The first method, graph transformation, is used extensively in conventional relational database systems [Ullman 82], and is a primary reason why the relational approach is now a viable one. Graph transformations map an algebraic operator tree (i.e., the update network used here) into a more efficient tree. Each transformation consists of a predicate and a replacement. The predicate specifies the properties a node or tree of nodes must have in order to satisfy the predicate. The replacement is a node or subtree which replaces the subtree matching the predicate. The transformations are repeatedly applied in a specific order to the operator tree until no predicate can be satisfied. The collection of graph transformations can be viewed as a production system [Forgy 79].

One example is a transformation that maps the subtree \sigma_F(R_1 \times R_2), i.e., the Cartesian product of R1 and R2 followed by a selection by the Boolean function F, into \sigma_{F_1}(R_1) \times \sigma_{F_2}(R_2), if F is of the form (F1 \wedge F2), and Fi only involves domains in Ri. Such a transformation preserves the semantics of the expression, yet can increase the efficiency tremendously. The Cartesian product has a time complexity of O(n1·n2), where ni is the number of tuples in the i-th input relation. In this example, the Cartesian product is provided with fewer tuples from the underlying relations, with the magnitude of the reduction depending on the selectivity of F1 and F2. If F1 and F2 each eliminate 90% of their input tuples (a common occurrence), then the Cartesian product in the transformed tree will produce approximately 1% of the tuples produced by the Cartesian product in the original tree. This example illustrates the use of one transformation to reduce the execution time by two orders of magnitude and the internal storage for the nodes in the network by one order of magnitude.
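This transformation can be phrased directly as a predicate plus a replacement in the production-system style described above. The sketch below uses a made-up tree representation (tuples of operator name, arguments, and children); it is one possible rendering of the selection-pushdown rule, not the monitor's own code.

    # Sketch of the selection-pushdown transformation sigma_F(R1 x R2) ->
    # sigma_F1(R1) x sigma_F2(R2).  A tree node is ("op", args, children);
    # selection arguments are lists of conjuncts tagged with the relations they mention.

    def mentions_only(conjunct, relation):
        return conjunct["relations"] <= {relation}

    def predicate(node):
        # Matches a Selection whose single child is a Cartesian product of two relations.
        if node[0] != "Selection":
            return False
        child = node[2][0]
        return child[0] == "Product" and len(child[2]) == 2

    def replacement(node):
        conjuncts = node[1]
        left, right = node[2][0][2]
        left_rel, right_rel = left[1], right[1]        # relation names at the leaves
        f1 = [c for c in conjuncts if mentions_only(c, left_rel)]
        f2 = [c for c in conjuncts if mentions_only(c, right_rel)]
        rest = [c for c in conjuncts if c not in f1 and c not in f2]
        new_left = ("Selection", f1, [left]) if f1 else left
        new_right = ("Selection", f2, [right]) if f2 else right
        product = ("Product", None, [new_left, new_right])
        # Conjuncts mentioning both relations stay in a selection above the product.
        return ("Selection", rest, [product]) if rest else product

    # Example: F = (A.x = 1) and (B.y = 2)
    tree = ("Selection",
            [{"relations": {"A"}, "text": "A.x = 1"},
             {"relations": {"B"}, "text": "B.y = 2"}],
            [("Product", None, [("Relation", "A", []), ("Relation", "B", [])])])
    if predicate(tree):
        tree = replacement(tree)
    print(tree)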

For the most part, the transformations used for operator trees in conventional database systems, such as the one just described, can be applied directly to update networks. In addition, transformations developed for optimizing compilers, such as common subexpression elimination, are relevant. A few additional transformations not having relational operator analogues have been developed for the specialized nodes of update networks, particularly the access nodes. These transformations will now be examined, followed by an integrated example.

The first transformation makes use of the control input of access nodes (see Figure 7-2a, primitive relations are represented by dark rectangles).

This transformation may be applied if

1. No domains other than domain k are used from B;

2. A is a primitive relation; and

Figure 7-2: Monitor-Specific Transformations

3. Domain A.j is A.Process, if A is associated with a sensor-traced event, or A.Object, if A is associated with an object-traced event (similar restrictions apply if A is associated with a sampled event).

If these criteria apply to both A and B, the one outputting the most tuples should be moved to the top.

The idea behind this transformation is that domain k in B is being used to select tuples out of the primitive relation A. Instead of generating all of A and then discarding most of it, as the original update network does, the modified network uses B.k to generate only the correct tuples in the first place.

The transformed update network has many advantages over the original network. The selection, universal, and costly Cartesian product operator nodes are eliminated (the projection nodes in the transformed network may also be eliminated if the k domain is the first one in the relation). By using the control input to the access node for A, only those objects or sensors found in B are enabled, thereby significantly reducing the number of event records generated.

The second transformation is simply a variation on this theme (Figure 7-2b). The only restriction is that A be associated with a sampled primitive relation. This transformation recognizes that tuples of a relation are desired only at clock ticks. The third transformation (Figure 7-2c) recognizes that only the start event of a traced primitive relation is necessary, so the events are never converted to periods.

As an integrated example, consider the following query, which uses the primitive relation EXECUTIONSTATUS(Process, State) and the derived relation F(Process):

    retrieve StopRunning (E.Process)
        where E.State = Done and E.Process = F.Process
        at E.Start

This query determines the time when each process in F stopped.

The initial update network is shown in Figure 7-3a.

The first transformation eliminates a selection, a Cartesian product, and a universal node (Figure 7-3b); no additional projection is needed since F contains only one explicit domain. The second transformation eliminates an event to period node (Figure 7-3c). The transformed network contains half as many nodes as the original network; the nodes are all memoryless; and the tuples are all events. Only the processes in F will have the ExecutionStatus sensor enabled. Thus, the transformed update network is much more efficient than the original one. A more involved example, complete with performance measurements, appears in section 8.6.

Figure 7-3: Applying Multiple Transformations

Transformations provide one way to encode knowledge concerning monitoring-specific aspects of the update network. The three transformations presented above concern such mechanisms as the control input to access nodes, sampled primitive relations, and the support of primitive period relations associated with traced sensors. As new mechanisms are added both to the update network and the resident monitor, new transformations may be developed so that the network generation component can utilize these additional mechanisms.

7.2.2. Temporal Order

There are four ways to order periods temporally (event tuples can of course be ordered temporally only one way):

1. (non-overlapping) period tuples ordered by their start and stop times;

2. start-period and stop-period event tuples;

3. period tuples ordered by their start times; or

4. period tuples ordered by their stop times.

Different algorithms require different temporal orders. The operator nodes can be divided into two classes with regard to temporal ordering: those that are memoryless, i.e., do not require internal storage, and those that do require internal storage. Space and time complexity are correlated at this level of detail: the greater the internal storage, the more processing is required to search or augment the internal storage when a new input tuple arrives. Of course, once the correct temporal ordering of input tuples for a particular operator node has been determined, then space can be traded off for time. Unary operator nodes whose processing only involves the current input tuple (e.g., ApplyOp, Selection) are memoryless, so their storage requirements do not depend on temporal order. Their execution efficiency is also invariant with respect to temporal order.

The rest of the operator nodes all require internal storage, and therefore are sensitive to temporal order. Some operator nodes require a specific temporal order; the conversion may be done either with special operator nodes, or as a side effect. Representation (1) occurs naturally when traced events are converted to periods, and is a special case of representation (2); (1) can be converted to (2) trivially by converting each incoming period tuple into consecutive start-period and stop-period tuples. If the periods do not overlap, the same holds true for representations (2) and (3). To convert from (3) to (2), maintain a list of tuples ordered internally by stop time. Whenever a new tuple comes in, first output stop-period tuples for all internally stored tuples whose stop time precedes the start time of the input tuple, then output a start-period tuple for the input tuple. Finally, store the input tuple in the correct position in internal storage. A similar algorithm maps (2) into (3). The internal storage contains all the tuples valid at the "current" time (the start time of the current input tuple); some operator nodes require this information anyway in their calculation. Representation (4) is unacceptable, since a node having an input tuple stream in this representation must at any point be prepared for an input tuple arriving much later yet which started before the current time. Tuples must be stored indefinitely in preparation for such a possibility.
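The conversion from representation (3) to representation (2) just described can be sketched as follows (tuple layout and names are assumptions): period tuples arrive ordered by start time, and start-period and stop-period event tuples are emitted.

    # Sketch of converting representation (3), period tuples ordered by start time,
    # into representation (2), start-period / stop-period event tuples.
    import bisect

    class PeriodToEvents:
        def __init__(self, emit):
            self.emit = emit           # callback receiving ("start"|"stop", time, payload)
            self.pending = []          # periods currently "open", ordered by stop time

        def receive(self, start, stop, payload):
            # First flush stop events for every stored period ending before this start.
            while self.pending and self.pending[0][0] < start:
                stop_time, old_payload = self.pending.pop(0)
                self.emit(("stop", stop_time, old_payload))
            # Then announce the start of the incoming period and remember its stop.
            self.emit(("start", start, payload))
            bisect.insort(self.pending, (stop, payload))

    events = []
    conv = PeriodToEvents(events.append)
    conv.receive(0, 5, "p1")
    conv.receive(2, 3, "p2")
    conv.receive(7, 9, "p3")
    print(events)
    # [('start', 0, 'p1'), ('start', 2, 'p2'), ('stop', 3, 'p2'), ('stop', 5, 'p1'), ('start', 7, 'p3')]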

The central point in this discussion is that, for each operator node, some temporal orders are definitely unacceptable, some are acceptable yet imply additional processing, and one temporal order is optimal. The goal is, given a particular update network, to determine a temporal ordering for each arc, consistent with the operator nodes connected to that arc, which minimizes execution time and internal storage for the entire network. This task is made more flexible by providing several versions of each operator node which take as inputs and produce as outputs different temporal orders. Additional flexibility (and complexity) is introduced by operator nodes which change the temporal order. The operation of the ordering operator nodes is to store the incoming tuples until a checkpoint tuple arrives, then output the stored tuples in the appropriate order, using the conversion algorithms given previously, followed by the checkpoint tuple. These operator nodes, although themselves expending time in execution, could potentially increase the efficiency of the entire network if inserted at the correct positions (a similar case could be made for eliminating duplicate tuples).

Determining the optimal ordering for a query is in general impossible, since an analysis of time and space efficiency for the query requires knowing the number and content of tuples which will be flowing over each arc. However, it is sometimes possible to determine an ordering for each arc which is optimal for widely varying tuple counts. There also exist networks where a particular ordering for each arc is optimal for any set of tuples flowing through the network. One heuristic which works well is to require the access nodes to order their output tuples by start and stop times, if the periods do not overlap, and otherwise by start times. Two versions of operator nodes are required, differentiated by the tuple orderings they expect. Determining the temporal ordering at internal arcs is now straightforward, since the ordering of the output of an operator node is fixed once the ordering of the input arcs has been determined. This topic clearly requires more study; the above analysis constitutes a first cut at the problem.

7.2.3. Limiting the Semantics of the When Clause

The innocuous sequence operator in the when clause of the TQuel retrieve statement greatly influences the efficiency of the corresponding update network. The problem occurs in the Cartesian product operator node. Recall that the Cartesian product concatenates all tuples from the left input with all the tuples from the right input. Since all the tuples that have arrived at any point in time from one input are to be concatenated with future tuples from the other input, no tuple can ever be removed from internal storage. This is clearly an expensive option, both in terms of execution time and storage requirements.

However, there is an optimization which makes the Cartesian product feasible. The where and when clauses serve to eliminate some or most of the tuples produced by the Cartesian product. If the when clause does not contain a sequence operator, then all tuples where the underlying tuples did not overlap will be eliminated. Hence, in this case, the Cartesian product operator node need only store incoming tuples internally as long as there is a possibility of overlap with tuples from the other input. Once it is clear the tuple will not contribute to any more output tuples, it can be eliminated from internal storage. The node can utilize checkpoint tuples or the temporal order of the incoming tuples to this end. Since the incoming tuples will be stored internally for only a short amount of time, rather than indefinitely, the space and time requirements will be greatly reduced.
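A sketch of this overlap-only Cartesian product is given below (names are illustrative). Each input tuple is concatenated with every stored tuple from the other input that overlaps it, and checkpoints are used to purge tuples that can no longer overlap anything arriving later.

    # Sketch of an overlap-only Cartesian product operator node.
    # Period tuples are (start, stop, payload); a checkpoint on an input promises that
    # later tuples on that input start at or after the checkpoint time.

    class OverlapProduct:
        def __init__(self, emit):
            self.emit = emit
            self.stored = {"left": [], "right": []}
            self.checkpoint = {"left": 0, "right": 0}

        def receive(self, side, tuple_):
            start, stop, payload = tuple_
            other = "right" if side == "left" else "left"
            for o_start, o_stop, o_payload in self.stored[other]:
                if start < o_stop and o_start < stop:          # periods overlap
                    left, right = (payload, o_payload) if side == "left" else (o_payload, payload)
                    # The derived tuple carries the overlapping portion of the two periods
                    # (one reasonable default).
                    self.emit((max(start, o_start), min(stop, o_stop), left, right))
            self.stored[side].append(tuple_)

        def receive_checkpoint(self, side, time):
            self.checkpoint[side] = time
            other = "right" if side == "left" else "left"
            # A stored tuple on the other input can be dropped once its period ends
            # before anything that can still arrive on this input.
            self.stored[other] = [t for t in self.stored[other] if t[1] > time]

    # Example usage.
    out = []
    node = OverlapProduct(out.append)
    node.receive("left", (0, 10, "A1"))
    node.receive("right", (5, 15, "B1"))      # overlaps A1 -> one output tuple
    node.receive_checkpoint("right", 12)      # A1 (stops at 10) can now be purged
    print(out)    # [(5, 10, 'A1', 'B1')]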

Unfortunately, not allowing the sequence operator is a rather severe restriction. In the following discussion, a series of five implementations of the Cartesian product will be examined, each with a different semantics for the sequence operator in the when clause, trading generality for efficiency. The implementations differ in the tuples that are considered to be consecutive, and thus in the number of tuples that must be retained for possible concatenation with tuples arriving later.

The most restrictive, and thus the most efficient, implementation (1) of the Cartesian product outputs concatenations of periods that overlap. This is the implementation discussed above, where the semantics of the sequence operator is undefined, because it is not allowed.17

The next implementation (2) allows the sequence operator in the when clause, but requires at least one period to be in effect at all times. Tuples are stored internally as long as a partial overlap is possible.

Implementation (3) does not require overlap, but only concatenates tuples from one input with tuples from the other input which overlap or which are closest temporally (closest-neighbor tuples). Hence, for a given input tuple, if no tuples from the other input overlap the tuple, then only two output tuples will be generated: the original tuple concatenated with the tuple from the other input occurring directly before the original tuple, and the original tuple concatenated with the tuple occurring directly after the original tuple.

Implementation (4) uses the closest neighbor approach only for duplicate tuples.18 The Cartesian product must keep one copy of each tuple around (potentially the entire database).

The last implementation (5) supports the sequence operator in its full generality. It is even less efficient than (4).

Since implementations (1) and (2) seem overly restrictive, and implementations (4) and (5) are overly inefficient, implementation (3) (closest-neighbor) offers the best compromise between generality and efficiency. Of course, implementations (1) or (2) should be used in the update network for a query not utilizing the full generality of the when clause.

17 Although the sequence operator is not explicitly allowed, it may be used by the monitor to implement the parallel operator (see section 4.2.2).

18 Note that duplicate tuples as used here match only on the explicit domains; they still occur at different times.

7.2.4. Node Scheduling

At any point in time, there are potentially many nodes with tuples on their input arcs, ready to be invoked. This situation raises the issue of scheduling the nodes: determining the order in which they are invoked.19 There are two general scheduling disciplines: depth-first and breadth-first. In depth-first scheduling, as each node produces a tuple on its output arc, the tuple is immediately sent to the nodes the original node was connected to. In breadth-first scheduling, the output tuples are queued, and sent to their destination nodes in the order of their creation. Breadth-first scheduling ensures that the incoming tuple streams stay at approximately the same level as they travel through the network. This may result in a more favorable synchrony of the tuple streams, thereby reducing the internal storage requirements of the network. However, such a reduction has a cost: the tuples on the arcs must be stored until it is their turn to be sent to the appropriate nodes. Depth-first scheduling uses only the internal storage of the nodes in the network; the arcs have no storage capacity.
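The two disciplines can be contrasted with a small sketch (illustrative only); nodes are modelled here as functions that map an input tuple to a list of (successor, output tuple) pairs.

    # Sketch contrasting depth-first and breadth-first scheduling of ready nodes.
    from collections import deque

    def run_depth_first(node, tuple_):
        # Each output tuple is pushed to its destination immediately (recursively).
        for successor, out in node(tuple_):
            run_depth_first(successor, out)

    def run_breadth_first(node, tuple_):
        # Output tuples are queued on the arcs and delivered in order of creation.
        queue = deque([(node, tuple_)])
        while queue:
            current, t = queue.popleft()
            queue.extend(current(t))

    # Tiny example: a source feeding two sinks.
    trace = []
    sink_a = lambda t: trace.append(("a", t)) or []
    sink_b = lambda t: trace.append(("b", t)) or []
    source = lambda t: [(sink_a, t + 1), (sink_b, t + 2)]
    run_breadth_first(source, 0)
    print(trace)    # [('a', 1), ('b', 2)]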

Both scheduling disciplines require mechanisms for handling checkpoint tuples, for converting between temporal orders, and for internal storage of tuples at nodes which are not memoryless. The central issue becomes the comparison of the increased processing resulting from managing queues for breadth-first scheduling versus a larger number of tuples in internal storage for depth-first scheduling. This issue is discussed further in section 8.6.2.

7.2.5. Other Issues

Several of the operator nodes may generate duplicate tuples.20 A good example is the Projection operator node. Even if all the incoming tuples are unique, the output tuples may not be unique, since one or more of the domains have been discarded by the operator node. Duplicate tuples matter semantically only when they are displayed or when they participate in certain aggregate functions such as Count. The former is usually dealt with by allowing the user to specify whether or not duplicates are to be displayed (cf. Quel [Held et al. 75] and Sequel [Astrahan&Chamberlin 75]). The participation of duplicates in aggregate functions was discussed in section 4.4.1. However, when considering performance, duplicates are of greater concern. The usual approach, adopted here, is to eliminate duplicates whenever easily done in the course of other processing (such as sorting), and to leave them otherwise. Any novel approaches for handling duplicates in standard relational databases would probably also apply to temporal databases.

19 We are assuming that the update network executes sequentially. One appealing alternative is the parallel execution of the update network.

20 Here, the term duplicate is used to mean identical in all domains, including the implicit time domain.

A second issue relating to efficiency is the coupling between the remote and resident monitors. It is fair to state that the transmission time between the resident and remote monitors will be one or more orders of magnitude slower than the time required to move a tuple through the update network.21 The transmission time becomes crucial if the processing of a tuple results in a sample being requested by the remote monitor, or an event being enabled or disabled. In these cases, the monitor might be able to move the critical portion of the update network over to the resident monitor.

As one example, suppose that the system utilization was requested. This query would result in an update network with the universal relation for processes (containing all the processes the remote monitor was aware of) connected to the control input of the ContextSwap access node, specifying that context swaps for all processes were to be monitored. The same effect could be obtained by having the resident monitor instruct the process creator to automatically enable the ContextSwap event for each process it constructed. In effect, a portion of the update network is being performed in the system being monitored, rather than in the remote monitor. This optimization could be implemented using one or more graph transformations. An open question is, what mechanisms, such as automatic event enabling, are useful for implementing specialized portions of update networks in the resident monitor?

There is another aspect regarding efficiency which concerns the delay between the occurrence of an event and the instant the monitor makes the user aware of the event. Most monitoring systems are off-line monitors, in that the data is analyzed after the process has completed executing. Some monitors are on-line, since the data is analyzed concurrently with program execution. The goal is a real-time monitor, guaranteeing a rather short delay between the occurrence and the display of the event. The monitor described here is an on-line monitor rather than a real-time one, due to the presence of operator nodes which are not memoryless. Such nodes may delay tuples for arbitrary lengths of time, invalidating any response time guarantees.

21 In the implementation described in chapter 8, the times differed by a factor of several hundred.

7.3. Summary

This chapter has considered the task of applying as much knowledge as possible to the generation of update networks from user queries. The process is composed of three stages:

1. Generating an initial update network via the relational operator tree for the query;

2. Modifying this network so that it correctly computes the desired information; and

3. Modifying the network so that it executes efficiently.

The knowledge used by the monitor in this process includes:

1. information on the number and types of domains for all relations;

2. additional information on the sensors, such as whether they are traced or sampled;

3. information on the available universal relations;

4. how the sensors affect data collected by other sensors;

5. how to incorporate checkpoints into the update network;

6. defaults for the where, when, start, stop, and at clauses;

7. the collection of graph transformations;

8. information on the temporal orders supported by each operator node;

9. heuristics for determining the temporal order for each arc in the network;

10. other attributes of operator nodes, such as whether they are memoryless; and

11. varieties of the Cartesian product operator node, and how to select the correct variety.

Such knowledge is available from several sources, primarily the algorithms embedded in the monitor, the incoming event records, and the data structures constructed from these event records. At this point, an answer is now available to the problem stated in the previous chapter:

Problem: How can the knowledge contained in the monitor itself and in the incoming event records be used to direct the translation of user queries into correct, efficient update networks?

Result: The primary techniques available for generating update networks are (a) the translation of the query into a relational operator tree and then into an initial update network, (b) the translation of this network into a correct network, and (c) the use of graph transformations, temporal order, and node scheduling to greatly increase the efficiency of the network. The system-dependent aspects can be embodied in the primitive relations and associated access nodes.

The development of such techniques leads to the final major problem to be addressed in this dissertation:

Problem: What is involved in implementing these proposed solutions? Can the relational view be supported in concrete terms by a functional monitor?

Chapter 8 An Implementation

The previous chapters have focussed on the support of the relational model in monitoring distributed systems. This chapter describes in some detail one realization of the solutions and techniques proposed in those chapters. Many past efforts in monitoring were based on models that lent themselves to efficient implementations; this thesis attempts to demonstrate that one can utilize conceptualizations convenient to the user (i.e., the relational viewpoint) without paying for it in terms of a hopelessly slow or complex monitor. A viable implementation is necessary to support this assertion.

A minimal monitor has been implemented. Although certain components have been omitted, and no component has been completely implemented, all aspects have been carried far enough to demonstrate feasibility. The remainder of this chapter will describe in some detail the design and performance of this implementation. Appendix E demonstrates the use of the monitor by stepping through the complete task of monitoring an application program, the one described in section 3.7, from specifying the sensors to viewing the derived relations.

8.1. General Structure of the Monitor

Previous discussions have characterized the monitor as consisting of two components, a remote monitor and a resident monitor. The remote monitor is system independent, accepting user queries, processing the incoming event records, and displaying the derived information, whereas the resident monitor generates, collects, and sends the event records to the remote monitor. The remote and resident monitors each contain subcomponents: the remote monitor consists of the relational database system, the monitor core, and the display module; the resident monitor has a system dependent structure (see Figure 8-1).

In the implementation, the remote monitor runs on a Vax under Berkeley Unix [Ritchie&Thompson 74]. The system being monitored is Cm* [Fuller et al. 78, Swan et al. 77], a tightly-coupled multiprocessor composed of 50 DEC LSI-11's and a substantial amount of memory. All memory in the system is potentially accessible to all processors through five microprogrammed controllers called KMaps. The microcode in the KMaps supports arbitrary memory addressing mechanisms and operating system primitives.

Figure 8-1: Components of a Distributed Monitor

The fortunate presence of two recently developed operating systems for Cm* provides an excellent opportunity to study the interactions of the remote monitor with different resident monitors. Two resident monitors were implemented, one on StarOS [Jones et al. 78, Jones et al. 79, Gehringer&Chansler 82] called StarMon, and one on Medusa [Ousterhout et al. 80] called Medic. Both operating systems are message-based, object-oriented, and fully distributed. Both provide a stable environment of many asynchronous, concurrently executing processes, communicating through messages or shared memory. This environment allows the monitor's ability to interface both with distributed systems (using applications which communicate only through messages) and with traditional multiprocessors (using applications which also communicate through shared memory) to be validated.

The remote monitor on the Vax communicates with the resident monitor on Cm* over an Ethernet [Metcalfe&Boggs 75], a high bandwidth (3 MBaud) network. Since the Ethernet is connected to most of the available machines, the remote monitor may be used to monitor applications in different machines. Interfacing with another system requires the creation of a new resident monitor, as well as the definition of the primitive relations of the new system. In particular, applications running on different machines, using radically different operating systems, can be monitored once the appropriate resident monitors have been constructed.

The remote monitor was partitioned into modules by considering the required functions. The display module handles all the processing required to illustrate entities and relationships, using graphical representations associated with these entities and relations. Although the display module is a vital part of a complete monitoring system, it was not implemented, except for a collection of routines to display derived tuples as they were generated. The coupling between the relational model used for monitoring and the model used for graphics is quite interesting, and is under active investigation.

The relational database component is responsible for the storage and retrieval of relations. This component functions as a "backing store" for the core module. An interface to the Ingres relational database system [Stonebraker et al. 76] has been designed, and is currently being implemented. However, there are still several unresolved issues in the coupling of the monitor core and Ingres, principally in the representation of object names (forming another name space, cf. section 5.5.1) and the interface with the existing Ingres retrieval mechanisms. Since functioning implementations of neither the display module nor the relational database system currently exist, these modules will not be discussed further.

The monitor core is itself comprised of three modules (see Figure 8-2). The Remote Accountant handles the Ethernet protocol, sending tuples to the update network and sending commands to the resident monitor. The TQuel compiler and update network were introduced in earlier chapters.

The process decomposition actually implemented is similar to that described above, but not identical (see Figure 8-3). The TQuel parser has been extracted, and the remainder of the TQuel compiler merged with the update network. The parser was derived from the parser used in Ingres, and can thus benefit from the functions provided by the Ingres terminal handler, particularly the extensive macro facilities. The rest of the compiler should be separated from the update network to allow the monitor to process user commands and event records concurrently; they were combined here purely for ease in implementation. The processes on the Vax communicate using interprocess communication, or IPC [Rashid 80b]. IPC supports communication via structured messages between processes on the same or on different machines. IPC was not used for messages between the Vax and Cm* because it is not supported on Cm*.

The components of StarMon (the StarOS resident monitor) and Medic (the Medusa resident monitor) are illustrated in Figures 8-3b and 8-3c, respectively. The parser and the remote accountant were implemented in C [Kernighan&Ritchie 78], and the TQuel compiler and update network were implemented in FranzLisp [Foderaro 80]. The processes comprising the resident monitors are written in Bliss/11 [Wulf et al. 75b], a high level system programming language supported by both StarOS and Medusa.

The remaining sections of this chapter will focus on each of the components shown in Figure 8-3.

Figure 8-2: Components of the Monitor Core

The specification of sensors will be discussed initially, followed by an overview of the resident monitors. Finally, the implementation of the modules of the monitor core will be examined.

8.2. Sensor Specification

During the initial development of the low level event collection mechanism, it became apparent that there were several procedural difficulties in the placement of sensors in the various operating systems. These difficulties are compounded when general users start specifying sensors of their own. One difficulty was that the sensor routine (which stores the event records) was becoming quite cumbersome to use. One design required six parameters for the simplest sensor, with additional parameters for each user-defined domain. Since the sensors were to be placed throughout the operating system, there was little room for error in the specification of these parameters.

A second problem was the assignment of event numbers: an incorrect event number in a sensor routine would result in the absence of event records of that type--a situation that might be difficult to detect by the user interacting with the remote monitor.

Figure 8-3: Structure of the Monitor As Implemented
(a) Remote Monitor; (b) StarMon; (c) Medic

Two sensors with the same event number would cause havoc within the remote monitor since the monitor would interpret the user-defined domains of some of the event records incorrectly. Assigning unique event numbers to the many projects, programs, and versions of user applications and of the operating system was organizationally unmanageable.

A third problem was maintaining consistency between the remote monitor's view of the world and the world as it actually is. This is especially true during the early development of the monitor, when the collection of sensors inside the operating system, and the various attributes of those sensors, was changing frequently (this situation will continue to exist as long as there is active development of system software). Finally, all these problems are exacerbated by the sheer number of sensors: an operating system might contain several hundred sensors when it has been fully instrumented. The task of ensuring that all of these sensors, which are distributed across many source files, many users, and many versions of each program, are correct and consistent, both with each other and with the data structures within the remote monitor, is unmanageable if it remains a manual one.

In general, the less the system programmer or user has to specify, the less that can go wrong with the specification. In a best-case scenario, the user would specify the interesting events and indicate where the sensors for these events were to be placed in the code. The sensor would be produced automatically from the specifications, and would be as efficient as one crafted by hand. When the program was run, the event types generated by these sensors would automatically be defined as relations with the full query language available for manipulating events generated by the sensor. In addition, enabling and disabling of the sensors in the user's program would be handled automatically as a side effect of evaluating queries referencing these relations.

8.2.1. Sensor Description Files

The solution, described in this section, is to create a database, called the sensor description file, or SDF, containing information on the sensors defined in a given taskforce. A taskforce is a collection of processes cooperating to perform a particular function [Jones&Schwans 79]. Taskforces roughly correspond to the concept of type managers as used in section 5.1. The sensor description file allows the above scenario to be realized in its entirety: placing sensors in a program involves merely creating a sensor description file consisting of a few lines per event type and object type, adding the sensors to the program (one line per sensor), and running a program (called the description file preprocessor, or DFPre) with the SDF as input. All of the details are automatically taken care of by DFPre, in collusion with the resident and remote monitors and the Bliss compiler.

A sensor description file consists of a set of objects partitioned into classes. Each object is associated with a set of class-dependent attributes. There can be one or more values for each attribute, and some attributes can have objects as values. The syntax follows this description quite closely (see Figure 8-4).

As an example, the following description file includes an event object with three attributes, one having domain objects as values (this event class is a fragment of an actual SDF, reproduced in full in Figure E-1):

Figure 8-4: Sensor Description File Syntax

(event (name Iteration)
       (timestamp true)
       (domains (domain (name iternum) (type integer))))

An SDF describes the sensors defined in a given taskforce. Since the operating system is itself a taskforce (or collection of taskforces), one SDF specifies the sensors embedded in the operating system. The Taskforce class includes attributes which hold for the taskforce as a whole. The ObjectType class describes the objects which can be monitored by sensors defined in the SDF. The SensorProcess class contains those attributes relevant to a particular process (i.e., a type manager). The Event class contains most of the attributes, including the following (an illustrative event entry using these attributes is sketched below):

Location the sensor process containing the sensor for this event;

Object the object type this event refers to;

Timestamp whether timestamps are to be included in the event record;

MinorType how the event is to be triggered;

Domains the domains (optional user defined values) included in the event.

The Domain class includes attributes relating to each domain. Detailed descriptions of the various classes and attributes may be found in [Snodgrass 82].
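To make the class and attribute structure concrete, a hypothetical event entry using the attributes listed above might read as follows. The attribute names come from the list, and the style mirrors the fragment shown earlier; the particular values (BlockRead, FileProcess, File, blocknumber) are invented purely for illustration.

    (event (name BlockRead)
           (location FileProcess)
           (object File)
           (timestamp true)
           (domains (domain (name blocknumber) (type integer))))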

8.2.2. The Description File Preprocessor

The description file preprocessor (DFPre) reads in a description, performs syntactic and semantic checking, and outputs several files containing information derived from the input file. The position of DFPre in the program development process is illustrated in Figure 8-5. In this figure, files are in boxes and programs are in ovals. The files provided by the user are marked with an asterisk.

The DFS (DeFinitionS) files (called "require" or "include" files) are processed by the Bliss compiler prior to reading the user's program. They contain macro definitions generated from the SDF. For example, each event class results in the definition of a sensor macro. If the SDF contained the event class shown above, then the IterationSensor macro, with one parameter (IterNum), would be defined. To place a sensor for this event, the user would simply put the line IterationSensor(ThisIterationNumber) in the code. ThisIterationNumber is a variable containing the current iteration number. The DFS files contain virtually all the details necessary for the monitor to interact with this sensor.
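A minimal sketch of the kind of macro a DFS file supplies follows, written in C rather than the Bliss actually generated by DFPre; the names EV_ITERATION, event_enabled, and store_event are assumptions standing in for the real mechanisms.

    /* A minimal sketch, assuming hypothetical names (EV_ITERATION,
       event_enabled, store_event); the real DFS macros are Bliss code
       generated by DFPre and tailored to the event's exact structure. */
    #include <time.h>

    enum { EV_ITERATION = 1 };            /* local event number */

    static int  event_enabled(int ev) { (void)ev; return 1; }
    static void store_event(int ev, long stamp, long iternum)
    { (void)ev; (void)stamp; (void)iternum; }   /* would build the event record */

    #define IterationSensor(IterNum) \
        do { \
            if (event_enabled(EV_ITERATION)) \
                store_event(EV_ITERATION, (long)time(NULL), (long)(IterNum)); \
        } while (0)

    int main(void)
    {
        int ThisIterationNumber = 17;
        IterationSensor(ThisIterationNumber);   /* one line per sensor */
        return 0;
    }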

DFPre also assigns an event number to each event class defined in the SDF. It is important to note that event numbers are unique only within an SDF (and thus, within the task force associated with the SDF). There are always several taskforces executing concurrently on Cm*, and each one has, for instance, an event numbered one. The mechanism used by the remote monitor to disambiguate these events will be discussed in section 8.5.

The remote description (RemDescr) is a specially formatted file containing the information in the SDF of use to the remote monitor. This file is assembled and loaded with the user's program. When the resident monitor encounters the taskforce, it ships the remote description to the remote monitor as a series of event records. This operation is implemented as code in the resident monitor containing sampled sensors. When the operation completes, the remote monitor is aware of the events, sensor processes, and object types defined in the taskforce, and knows how to enable and disable these events. The most important aspect of the remote description is that it resides with the program containing the sensors delineated by the description; hence, consistency is guaranteed.

This mechanism also works for the SDF(s) associated with the operating system. Initially, the remote monitor knows of no events, sensor processes, or object types.

Figure 8-5: The position of DFPre in the program development process.

The resident monitor first establishes contact with the remote monitor, then sends over the remote description of the operating system, thereby defining the events contained in the operating system. Since the resident monitor is part of the operating system, the sensors contained in the resident monitor are handled automatically by this procedure.

In addition to rectifying the problems introduced at the beginning of this section, using SDF's has several other advantages. Since DFPre has detailed information on the structure of each event, the code for that sensor can be tailored precisely to that event (the overhead of sensors produced by DFPre is quite low, as will be seen shortly). DFPre can also collect aggregate information concerning, for instance, all events located in a particular process, in order to perform global optimization. The data structures for each object type can also be configured quite precisely. Bootstrapping the description of sensors in the resident monitor itself is possible by predeclaring in the remote monitor only a few central sensors, primarily those that send the sensor description information to the remote monitor.22 And finally, the information in the remote description can be used to compensate for resources consumed during event collection, since the remote monitor will know what processing was involved in storing each event record.

22 The current implementation uses only 4 predeclared sensors; a full implementation might require as many as a dozen.

A disadvantage of SDF's is that they are rooted in a compile-load-execute paradigm. The sensors must be specified at compile-time by the user. It would be desirable to allow sensors to be inserted dynamically, for example, by a debugger or by the monitor, much as breakpoints are inserted. Note that, by using the techniques developed for the Integrated Programming Environment [Habermann et al. 81], one can have the efficiency advantages of a compiler with the flexibility of an interpreter.

Although most attributes for the various classes are system-independent, a few additional, system-dependent attributes (and classes) are necessary. The allowable classes and attributes are defined in a description file format (DFF) file, input along with the SDF by DFPre. The motivation for a DFF file is similar to that for an SDF: manual specification is simply too difficult and error-prone. A DFF file is similar to an SDF file; the syntax is identical, but the classes and attributes are interpreted by DFPre itself. The DFF file specifies

• the attributes to be placed in the remote description;

• the allowable attributes for each class;

• the allowable values (types) for each attribute;

• the default (if defined) for each attribute; and

• other system-dependent information.

Currently, four DFF files have been developed, including one for StarOS SDF's (see Figure E-4 for the full listing), and one for Medusa SDF's. Changing a particular aspect of SDF's in general, such as the default for a particular attribute, merely involves changing the DFF file.

This section has described how sensors are specified, and how this specification, the SDF, is manipulated by the monitor. The next section will examine the design of the resident monitor, including a detailed analysis of the performance of the sensors generated by DFPre.

8.3. The Resident Monitor

Chapter 5 presented a general low level data collection mechanism which could be applied to the various levels of abstraction present in a computer system. In order to partially validate the efficacy of this approach, the mechanism was implemented to monitor interactions at the process-process and process-operating system level. This particular grain was chosen for several reasons. Single process and single language monitoring tools have existed for quite some time, while multi-process monitoring has little prior experience to build upon. Also, it is not necessary when monitoring at this level to construct new hardware devices or program in microcode, activities which tend to divert attention from the fundamental issues. Finally, interactions at this level are the most complex, taxing the monitor's information retrieval and knowledge representation facilities to the maximum degree.

As was mentioned previously, two resident monitors on different operating systems were implemented: StarMon on StarOS, and Medic on Medusa [Highnam 81]. Although both operating systems are quite similar in the abstractions they support, there are significant differences in the primitive operations provided by each system; these differences are reflected in the structure of the resident monitors. This section will discuss the StarOS implementation in detail, then examine the differences between StarMon and Medic.

8.3.1. StarMon: General Structure

The structure of StarMon is shown in Figure 8-3b. The StarMon Accountant maintains the Ethernet protocol with the remote monitor, in addition to collecting event records to be sent to the remote monitor. Most commands arriving from the remote monitor are simply channeled to MonProc through a StarOS mailbox. MonProc is responsible for performing the indicated operations. This partitioning was necessary both for performance and reliability reasons. Due to limited main memory resources for temporary storage of event records, the StarMon Accountant must continually scan these storage areas and remove event records accumulating there. Any abnormal delays may result in the storage areas overflowing, with a subsequent loss of event records. Rather than having the StarMon Accountant risk losing event records while executing a lengthy command, these commands are instead leisurely performed by MonProc, with few real-time constraints. In addition, an error might occur in executing a command. Since MonProc executes all potentially dangerous commands, it can simply reinitialize itself when an error occurs. The StarMon Accountant does not have such freedom, since it must maintain the Ethernet protocol with the remote monitor. Hence, it executes only the simplest and fastest commands.

Multiple instantiations of MonProc are allowed if the command rate is too high for one process to handle. Multiple instantiations of the StarMon Accountant could be accommodated with minor changes to it and to the Remote Accountant. Such a change assumes that the Remote Accountant is fast enough to handle several Ethernet connections simultaneously.

8.3.2. Sensor Performance Measurements

Small is beautiful.

-- Schumacher's dictum

The various data structures and algorithms employed in StarMon are discussed in Appendix D. This appendix shows that the space overhead for including sensors in a process is between 0.5 and 4%, depending on the number of sensors installed, and less than 2% for allowing an arbitrary StarOS object to be monitored. The memory overhead for temporary event record storage is between 1 and 7%, depending on the variability of the event generation process.23

The performance of StarMon sensors was analyzed along two dimensions: implementation (procedure call, inline code, microcode) and information collected by the sensor. Three implementation versions of StarOS sensors were analyzed. The first version consisted of inline code which tested whether the event was enabled, and, if so, called a procedure to construct and store the event record. The second version consisted totally of inline code. Both versions were implemented using existing microcoded operations for efficiency. The third version was designed to be implemented fully in microcode. Its performance was estimated using computed memory reference counts and extrapolating from previously measured microcoded operations. Details on the microcoded version may be found in appendix D.5.

Measurements for these versions for five sample sensors are shown in Table 8-1. The sensor description file for these sensors (among others also tested) is shown in Figure D-7. These sensors differ primarily in the amount and type of information they store in the event record. Sensor A (Shortest in Figure D-7) is the minimal sensor: no timestamp, no domains, always enabled. An example is FileRead(File), which indicates which file was read, but not the time the file was read or any other information about the read operation. Sensor B (Short) includes the timestamp and one user-defined integer domain. An example would be BlockRead(File, BlockNumber). Sensor C (Medium) includes a user-defined, double precision integer domain. Sensor D

23 If events were generated at a fixed rate, only a small amount of internal storage would be necessary. Larger amounts of storage are necessary if many event records are generated quickly, even if this occurs only occasionally.

Sensor      Procedure call          Inline                        Microcoded
            Size     Time           Size     Time        CALL     Size     Time*       CALL
            words    microsecs      words    microsecs            words    microsecs

A           29       1850           41       585         5.9      7        250         2.5
B           36       2025           60       725         7.3      12       350         3.5
C           42       2150           62       765         7.7      15       400         4.0
D           44       2330           87       1395        14.0     18       560         5.6
Iteration   --       --             96       970*        9.7*     12       350         3.5

* Estimated

Table 8-1: Performance Measures for StarOS Sensors

(Long) includes a variable-length character string domain (the time values are for storing a 15 character string). The last sensor is the one specified in the extended example presented in appendix E. It is similar to sensor B, with more overhead due to tighter memory constraints (analogous to a shortage of high speed registers). For all sensors, if the event was disabled, the sensor took 165 microseconds, equivalent to 23 store operations. If no events were enabled for the process or object, the sensor took only 15 microseconds (or 2 store operations), representing the overhead of installing a sensor and never using it. The microcoded version would take approximately 90 microseconds if the event was disabled, and 40 microseconds if no events at all were enabled. The reason the microcoded version in this one case is significantly slower than the non-microcoded implementation is that there is a fixed overhead of about 30 microseconds to invoke a microcode instruction.

Three surprising results were obtained from these measurements. The first is that inline macro expansion, as opposed to calling a procedure to store the monitoring data, is always appropriate. Calling a procedure is approximately three times slower than inline code. Procedures are always slower than equivalent inline code, so the time penalty was expected. However, the inline code to perform the data collection is only slightly larger than the single procedure call! Since the procedure itself is 350 words long, inline expansion requires less space than procedure calls if there are fewer than 20 or so sensors in the process. The reason the inline expansion is so fast and space efficient is that the code for each sensor is highly customized (by DFPre) to the exact specification of the sensor, as found in the SDF. For the procedure call version, the various alternatives must be communicated to the procedure as parameters, requiring additional space and time. An intermediate alternative would be to have, say, three or four generic procedures, each specialized to a set of parameters, with DFPre selecting the appropriate version for each sensor. At this point, the potential gains start decreasing and one is tempted to simply apply techniques such as generic procedure expansion to the entire program, rather than to only the sensor routines [Saunders 79, Rosenberg 83].
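The contrast can be sketched in C, although the measured sensors were Bliss and every name below (enabled, clock_now, emit, EV_BLOCKREAD) is a hypothetical stand-in: a generic sensor procedure must receive the event's options as run-time parameters and interpret them, while the DFPre-customized expansion resolves those options ahead of time and reduces to a test and a few stores.

    /* Illustrative only: a generic, parameterized sensor procedure versus
       a customized inline expansion; all names are assumptions, not the
       actual StarMon mechanisms.                                          */
    static int  enabled[16];
    static long clock_now(void)              { return 0; }
    static void emit(const long *rec, int n) { (void)rec; (void)n; }

    enum { EV_BLOCKREAD = 3 };

    /* Generic version: the options travel as run-time parameters. */
    static void sensor(int ev, int stamp, int ndom, const long *dom)
    {
        long rec[8];
        int  i, n = 0;
        if (!enabled[ev]) return;
        rec[n++] = ev;
        if (stamp) rec[n++] = clock_now();
        for (i = 0; i < ndom; i++) rec[n++] = dom[i];
        emit(rec, n);
    }

    int main(void)
    {
        long file = 42, block = 7;
        long dom[2];
        enabled[EV_BLOCKREAD] = 1;

        /* Procedure-call form: one call site, but a general-purpose body. */
        dom[0] = file; dom[1] = block;
        sensor(EV_BLOCKREAD, 1, 2, dom);

        /* Customized inline form: the options are already resolved, so
           only the enable test and the stores remain.                    */
        if (enabled[EV_BLOCKREAD]) {
            long rec[4];
            rec[0] = EV_BLOCKREAD;
            rec[1] = clock_now();
            rec[2] = file;
            rec[3] = block;
            emit(rec, 4);
        }
        return 0;
    }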

The second surprise is that inline expansion is close in time to a fully microcoded version. Overall the microcoded version was only 50-60% faster than the mixed version. On the other hand, it would take an estimated two months for an experienced programmer familiar with Cm* microcoding to implement and test the microcoded version [Vegdahl 82]. In addition, the microcoded version would increase KMap contention, which could slow down concurrent memory references or other microcoded operations. There are, however, two distinct benefits to the microcoded version: it is significantly smaller than the inline version, and it is not susceptible to memory addressing constraints. For example, the microcoded version of the Iteration sensor is identical in time to sensor B, which has few memory addressing constraints. The inline version of the Iteration sensor is significantly slower and larger than the inline version of sensor B.

Most efforts at microcoding result in at least an order of magnitude speedup. The reason why microcoding seems not to be a fruitful activity in this case is, again, the careful coding of the mixed version, plus the fact that the mixed version relies heavily on existing microcoded operations (see appendix D.4 for details). The microcoded sensor must be general, whereas the mixed version is specific to the sensor. Hence, these results indicate that microcoding effort should be invested in other parts of the operating system first, where greater gains are possible.

The third surprise resides in the CALL characterization of execution time. An execution time in microseconds is only valid in a comparative analysis on the same machine, as was done above. The CALL column specifies the execution time as the equivalent number of procedure calls (at approximately 100 microseconds each). This characterization of execution time is roughly the ratio of the effort necessary to store the event and the effort to call a procedure, providing an estimate of how fast a similar implementation would be on a machine other than Cm*. The measurements indicate that the monitoring grain is between 6 and 10 times coarser than a procedure call. Put another way, the monitoring grain for this data collection mechanism is larger than a procedure call, but perhaps equal to a procedure that does something interesting, in turn calling other procedures. It should be noted that this comparison is still somewhat machine dependent, since it includes assumptions concerning the overhead of microcode invocation (rather high in Cm*), procedure call overhead (also rather high), and relative efficiencies of microcode versus machine instructions.

The third result also verifies an assumption made earlier, namely, that sophisticated filtering techniques are necessary to greatly reduce the number of event records generated. As a rough estimate, if monitoring with no filtering (all sensors are always enabled) incurs an execution time overhead of 20%, then the 50 processors in Cm* could generate 10,000 event records per second, most of which would be immediately rejected in the update network. If filtering reduced the overhead to 1%, the 50 processors would generate a much more manageable 500 event records per second, most of which would be used by the update network. As will be demonstrated later, it is possible for the other components of the monitor to handle 500 event records per second, but certainly not 20 times that quantity.

8.3.3. Medic

The structure of Medic, the resident monitor for Medusa, is shown in Figure 8-3c. Medic consists of one process composed of a collection of coroutines [Highnam 81]. The sensors are quite similar in StarMon and Medic: the space requirements and execution times are comparable, and the implementations follow each other quite closely. Medic supports an identical protocol (see below), including initially sending the operating system's SDF and executing all the commands. From the previous discussion, the single process structure would appear to be less efficient and less robust than StarMon. More explicit statements concerning the relative efficiency and robustness must wait until Medic itself is instrumented; Medic has been designed and implemented and is now being tested. However, the existence of two resident monitors partially demonstrates independence at the level of the monitored operating system.

8.4. The Ethernet Protocol

The event records are initially generated by the sensors and placed in temporary storage areas, waiting to be picked up by the resident monitor and sent to the remote monitor via the Ethernet. The protocol [Highnam&Snodgrass 81] is a variant of the EFTP (Ethernet File Transfer Protocol) [Shoch 79], simulating a transmission from the remote monitor (the host) to the resident monitor (the slave). This protocol may be thought of as a modified transport protocol using the Pup protocol as the packet layer of the communications hierarchy [Boggs et al. 80, Davies et al. 79]. The commands are sent in the data packets and the event records are placed in the acknowledgement packets. The protocol uses checksums, timeouts, and packet retransmission for reliability. Since the resident monitor is a slave in the protocol, it must wait for the remote monitor to send a packet before it can respond with an acknowledgement containing data. Hence the remote monitor must occasionally send packets even if there are no commands to be sent.24 The resident monitor indicates in every acknowledgement the amount of buffer space it has free, allowing the remote monitor to adjust the packet transmission rate accordingly.

The following commands are supported:

Adjust Object either enable or disable an event;

Checkpoint generate a checkpoint;

Read Entry ask the resident monitor for some information to be sent back as a Report data record;

24 Of course, the minimum packet frequency depends on the event generation rate.

Write Entry send the resident monitor some information;

End indicate the last command in a packet.

The following variable length data records are supported:

Event Record generated by a sensor;

Report generated in response to a ReadEntry command;

Check a checkpoint;

Error report an error;

New Name report that an object has been garbage collected; and

Last Record indicate the last data record in a packet.

Given the record and packet sizes and observed transmission rates for the standard EFTP, a rough maximum event record transmission rate is 600 event records per second, comparable to the maximum event generation rate given in section 8.3.2. The actual transmission rate has not been measured; this estimate provides an indication of the possible performance of this protocol.
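To fix ideas, the framing implied by this description can be sketched as C declarations; the field names and widths are assumptions for illustration and do not reproduce the actual Pup/EFTP packet formats.

    /* Sketch of the protocol framing described above; all names and
       sizes are assumptions, not the actual Pup/EFTP layouts.        */
    #include <stdint.h>

    enum command_kind {                   /* carried in data packets          */
        CMD_ADJUST_OBJECT, CMD_CHECKPOINT, CMD_READ_ENTRY,
        CMD_WRITE_ENTRY, CMD_END
    };

    enum record_kind {                    /* carried in acknowledgement packets */
        REC_EVENT, REC_REPORT, REC_CHECK,
        REC_ERROR, REC_NEW_NAME, REC_LAST
    };

    struct ack_header {
        uint16_t free_buffer_space;       /* lets the remote monitor pace its
                                             packet transmission rate          */
        uint16_t record_bytes;            /* length of the record stream that
                                             follows; each record begins with a
                                             kind byte and a length byte, and
                                             REC_LAST terminates the stream    */
    };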

8.5. The Remote Accountant

The remote accountant handles the details of the Ethernet protocol, including determining which primitive relation each event record is associated with, unpacking the fields of the event record into a tuple, and generating remote names from the internal names present in the event record.

The field unpacking is straightforward: the monitor core informs the remote accountant of the number and types of domains present in each primitive relation (this information is extracted from the SDF sent by the resident monitor). The fields are formatted for efficient handling by the update network.

The mapping from internal names to remote names is also not difficult. The remote accountant stores, for each internal name, the time period during which each remote name was valid (see section 5.5.1). A new epoch for an internal name begins when the object with that name is garbage collected; the remote accountant is informed of this occurrence through a New Name data record (see section 8.4). The timestamp in the event record is used to determine which remote name is appropriate for each internal name. Note that the checkpoint tuples can be used to keep the number of entries low, thereby reducing search time during this mapping. Although not currently done, this process could also be used for object names appearing as values in user-defined domains.
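The bookkeeping can be pictured with a small C sketch; the structure and names below are assumptions about the epoch table rather than the actual remote accountant code.

    /* Sketch of the epoch table implied above: one entry per internal name,
       one epoch per remote name, a new epoch starting when a New Name data
       record reports that the object was garbage collected.                */
    struct epoch {
        long start_time;                  /* when this remote name became valid */
        long remote_name;
    };

    struct name_entry {
        long internal_name;
        int  nepochs;
        struct epoch epochs[8];           /* ordered oldest to newest */
    };

    /* Select the remote name valid at an event record's timestamp. */
    static long remote_name_at(const struct name_entry *e, long timestamp)
    {
        int i;
        for (i = e->nepochs - 1; i > 0; i--)
            if (e->epochs[i].start_time <= timestamp)
                return e->epochs[i].remote_name;
        return e->epochs[0].remote_name;
    }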

The remote accountant uses a variety of information to determine the associated primitive relation for each incoming event record. Each primitive relation is assigned an integer called the unique event number. Each taskforce is also assigned an integer called the unique tf number. Both of these assignments are made by the update network as it processes the event records encoding the remote descriptions of the SDF's. In addition, as the identity of taskforce components is ascertained by the update network, it sends the remote accountant pairs of remote names and unique tf numbers.

When an event record arrives, the remote monitor first examines a specified bit in the event record to determine if the event was an internal or external event. An internal event is associated with the process generating the event; an external event is associated with the object the operation generating the event was performed on (see section 5.2). The remote accountant then extracts either the object name or the sensor name, if the event was an external or internal event, respectively (both names may be present in the event record). This name is used to determine the taskforce (unique tf number) associated with this event. Finally, using the unique tf number, the internal Boolean, and the local event number in the event record (assigned by DFPre; see section 8.2.2), the unique event number is determined. It is this value which identifies the destination(s) (access nodes) in the update network to which the tuple is sent.
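The dispatch step just described can be summarized in a short C sketch; the record fields and lookup helpers are hypothetical stand-ins for the remote accountant's actual tables, and the lookup bodies are placeholders.

    /* Sketch of unique-event-number determination; every name below is
       a hypothetical stand-in for the actual data structures.          */
    struct event_record {
        int  is_internal;                 /* internal vs. external event        */
        long sensor_name;                 /* remote name of the sensor process  */
        long object_name;                 /* remote name of the object          */
        int  local_event;                 /* event number assigned by DFPre     */
    };

    static int lookup_tf_number(long remote_name)      /* name -> unique tf no. */
    { (void)remote_name; return 0; }

    static int lookup_event_number(int tf, int internal, int local)
    { return tf * 1000 + internal * 500 + local; }     /* placeholder rule      */

    static int unique_event_number(const struct event_record *er)
    {
        long name = er->is_internal ? er->sensor_name : er->object_name;
        int  tf   = lookup_tf_number(name);
        return lookup_event_number(tf, er->is_internal, er->local_event);
    }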

The design of the remote monitor assumes that only one user is using the monitor, and therefore that there is only one update network to route incoming event records to. There are no fundamental difficulties envisioned with having multiple users.

8.6. The Update Network

The update network, as described in chapter 6, accepts tuples from the remote accountant and produces derived tuples, either to be stored in the relational database or displayed graphically by the display module. The update network as viewed by the TQuel compiler is specified by the three operations create, link, and unlink, and by the defined generic access and operator nodes, allowing great flexibility in the implementation. For instance, when an access node is created, the monitor updates several data structures in order to ensure that the tuples flowing from the remote monitor are routed efficiently to the correct nodes. Tailoring of the operator nodes based on the instantiation parameters occurs when the nodes are created. Finally, the link operation performs quite different actions depending on the types (and sides) of the nodes it is connecting.

There are two primary approaches to implementing the update network: either the connectivity graph is stored in a data structure and the movement of tuples simulated by an interpreter which invokes the operator nodes, or the network is compiled into code which executes directly. The trade-offs are similar to the implementation of other language systems: interpreters are more flexible, but compiled code is more efficient. An update network interpreter was implemented; from the experience gained in that effort, an update network compiler was designed. Performance measurements were taken to determine the efficiency of the implementation.

This section will first examine the representation of tuples, followed by discussions of two versions of the update network, an interpreted version and a compiled version.

8.6.1. Tuple Representation

Tuples are represented by a list of domains. All tuples have three implicit domains: the unique event number of the tuple, the class of the tuple, and the time value for the tuple, consisting of one or two timestamps. The following classes are included:

Event the tuple represents the occurrence of an event; the time value is a single timestamp;

Period the tuple represents a relationship which existed for a period of time; the time value is a pair of timestamps (start and stop);

StartPeriod the tuple represents the start of a relationship, with one timestamp;

StopPeriod the tuple indicates that the relationship no longer holds, with one timestamp; and

Sample the tuple encodes a relationship which was true at the time specified by the time value, a single timestamp.

A timestamp is itself a pair of times (a lower and an upper bound), allowing for "fuzzy events" (see section 4.5). The primitive relations have little indeterminacy, on the order of tens of microseconds (depending on how variable the execution time of the sensor is). Derived relations, especially those derived from sampled relations, have much greater indeterminacy.

The explicit domains (the object, process, and user-defined domains) are typed, and have a type-dependent structure. Since the tuples flowing on a particular arc of the network compose a relation, they are guaranteed to have an identical structure. The possible domain types are integer, string, remote name, and temporal. A remote name (see section 5.5.1) specifies an object which exists (or did exist) in the system being monitored. A temporal domain is the ratio of two linear functions of time, and is represented by 4 values specifying the slope and the intercept (relative to the start uncertain time of the start timestamp) for the numerator and the denominator. An example of a temporal domain is the domain calculated by the Duration operation (see section 3.6); this domain consists of (1, 0, 0, 1), representing (1 x t + 0) / (0 x t + 1), or t, implying a value starting at 0 and increasing linearly with time. Given these values, it is easy to calculate the value of the domain at any point in time between the start and stop time, and to incorporate fuzziness into the value of the domain. A linear function of time (requiring only two values instead of four) is sufficient for most purposes; the additional values are necessary primarily for cumulative aggregate operators (see section 3.6).
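A C sketch of this representation follows; the struct and function names are invented, but the four-value form and the Duration example are taken from the description above.

    /* Sketch of the four-value temporal domain: the value at time t is a
       ratio of two linear functions of t (names invented for illustration). */
    struct temporal {
        double num_slope, num_intercept;     /* numerator:   a*t + b */
        double den_slope, den_intercept;     /* denominator: c*t + d */
    };

    /* Value of the domain at time t, measured from the start of the
       tuple's start timestamp.                                       */
    static double temporal_value(struct temporal d, double t)
    {
        return (d.num_slope * t + d.num_intercept) /
               (d.den_slope * t + d.den_intercept);
    }

    /* The Duration domain of section 3.6 is (1, 0, 0, 1), i.e. the value t: */
    static const struct temporal duration = { 1.0, 0.0, 0.0, 1.0 };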

8.6.2. Interpretation and Compilation

The interpreted version of the update network is organized around a collection of data structures representing the access and operator nodes. The primary mechanism is the association of functions with each node which are invoked at specific points during each operation involving the node. For instance, associated with each generic node are functions to be invoked when a node is instantiated from this generic node, and when the instantiated node is linked to another node, linked from another node, unlinked, or sent a tuple to its left, right, or control input. In the last three cases, if the function returns a tuple or list of tuples, those tuples are automatically sent to the nodes connected to this node. Also included in the data structure of each node are the instantiation parameters (names or values, depending on whether the node was a generic or instantiated node, respectively), temporary space for values to be stored across invocations of the associated functions, and debugging information. The functions associated with the node are given access to these fields by including the node as one of the parameters to the function.
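A C rendering of this organization might look as follows; the interpreter itself was written in FranzLisp, so the type and field names here are illustrative, and the forwarding rule is simplified to a single input side.

    /* Sketch of a node with associated functions; the actual update network
       interpreter was FranzLisp, and all names here are invented.           */
    struct tuple;                            /* contents irrelevant to the sketch */
    struct node;

    typedef void          (*make_fn)(struct node *self);
    typedef void          (*link_fn)(struct node *self, struct node *other);
    typedef struct tuple *(*input_fn)(struct node *self, struct tuple *t);

    struct node {
        make_fn  on_instantiate;             /* invoked when created from a
                                                generic node                  */
        link_fn  on_link_to, on_link_from, on_unlink;
        input_fn on_left, on_right, on_control;  /* a returned tuple is forwarded */
        void    *params;                     /* instantiation parameters      */
        void    *scratch;                    /* values kept across invocations */
        const char  *debug_info;             /* debugging information         */
        struct node *downstream[4];          /* nodes this node feeds         */
        int          ndownstream;
    };

    /* Forwarding rule (simplified to the left input): if the node's
       function returns a tuple, send it on to the connected nodes.   */
    static void send_left(struct node *n, struct tuple *t)
    {
        struct tuple *out = n->on_left ? n->on_left(n, t) : 0;
        int i;
        if (out)
            for (i = 0; i < n->ndownstream; i++)
                send_left(n->downstream[i], out);
    }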

Organizing the update network as a collection of data structures with associated functions has several advantages over a more standard, monolithic implementation. The create, link, and unlink operations themselves are small; instead of performing the operation directly, they merely check for the presence of an appropriate function, and if present, then invoke the function with the involved node as a parameter. Adding a new generic node is straightforward and does not require changes to any existing code. Such object-oriented programming has found wide use in programming languages [Dahl et al. 67, Ingalls 78], artificial intelligence systems [Winston&Horn 81, Charniak et al. 80], operating systems [Jones et al. 78, Wulf et al. 81, Ousterhout et al. 80, Kahn et al. 81], and command languages [Snodgrass 83], and has served well in this application.

Since the algorithms in the nodes are embedded in the associated functions, the primary task of the update network is ensuring that the tuples get to the correct nodes in the correct order. Breadth-first scheduling (see section 7.2.4) was implemented, since it was felt at the time that allowing the tuple streams to get out of synchronization would cause the update network to generate incorrect results. After carefully considering synchronization and temporal order, it became apparent that breadth-first scheduling alone would not guarantee correctness. Hence, in retrospect, it appears that depth-first scheduling is the preferable approach. As will be seen below, depth-first scheduling was implemented in the compiled version due to its increased efficiency.

The TQuel compiler as implemented generates correct update networks, but does not include most of the optimization strategies discussed in section 7.2. Although the interpreted version was flexible and relatively easy to implement, it had one major drawback: it was slow. As the measurements to be presented below indicate, the maximum tuple rate achieved, even with the optimizations, was less than 25 tuples per second. Given the estimates derived earlier in this chapter, this rate is about an order of magnitude too low. To achieve such a speedup, it was necessary to abandon some of the flexibility afforded by the interpreter.

The update network compiler, as opposed to the TQuel compiler, which produces an update network from a TQuel query, translates a sequence of create and link operations (i.e., the update network) into a collection of Lisp functions, which are then compiled by the Lisp compiler.25

The update network compiler was designed but not implemented. However, the techniques involved in compiling update networks are commonly found in standard compilers. Details of possible optimizations may be found in appendix F. Construction of an update network compiler should be a straightforward task not requiring any new breakthroughs.

8.6.3. Update Network Performance

Table 8-2 compares the various implementation techniques. Three sets of measurements were taken; one with the update network generated by the existing compiler, one with the update network optimized by hand, using only strategies that could be readily implemented, and one with Lisp functions generated by hand from the optimized update network, again using only strategies that could be readily implemented. The update networks all implement the queries given in section 3.7. It should be emphasized that the measurements only apply to this one set of queries, and may not be representative of queries in general. On the other hand, these queries are somewhat complex, involving several clauses and several expressions. The measurements include the number of fires per input tuple (a fire is the invocation of an access or operator node), the execution time per fire, and the number of input tuples which could be absorbed each second. The details of these and additional measurements are given in appendix F; this section will summarize the results.

25 Note that the update network compiler could also be implemented to generate C routines, which would then be compiled into assembly code.

Version        Tuples       Fires Per       Milliseconds    Input Tuples
               Generated    Input Tuple     Per Fire*       Per Second*

Initial        32           19.9            7.6             6.6
Transformed    32           9.2             4.7             15
Compiled       32           1.0             1.5             660

* On a dedicated Vax 11/780

Table 8-2: Performance Measures for the Update Network

Since all three update networks were correct, they generated identical output tuples for the same input tuples. For a set of 50 test input tuples, chosen to produce interesting results, there were 32 output tuples produced (see Table 8-2). All trials with actual input tuples resulted in few output tuples, since one process tended to get and stay ahead of the other process (recall that the queries investigate the interplay between the two processes). Hence, since each input tuple that results in an output tuple causes more node fires than an input tuple that gets eliminated during the processing, the measurements using the test input tuples are quite pessimistic for this update network.

In the non-transformed network, each input tuple resulted in almost 20 node fires, generally as a result of the selection operator nodes being after the Cartesian product nodes (again, see appendix F for details). The execution time per node went down by 40% from the original to the optimized update network, primarily due to the more favorable temporal order. These two reductions cooperatively increase the number of tuples processed per second by an impressive factor of 5.

An even larger increase (a factor of 40) occurs when the update network is compiled. There is only 1 fire per tuple in the compiled version because each update network in this approach is in effect a highly specialized operator node (internal function calls were not counted). In fact, the optimizations on the original update network, coupled with conversion of the update network into Lisp, and then into assembly language, result in an improvement of two orders of magnitude. The number of input tuples processed per second is comparable to previous tuple rates computed for the resident monitor and the Ethernet protocol.

8.6.4. Relationship to Data Flow

The update network as constructed by these operations out of the nodes described above resembles a data-flow program. Data flow programs, in their graphical representation, consist of computational nodes connected in a graph structure. Values, in the form of tokens, flow over the arcs during the computation. Data flow programs meet three criteria [Aggerwala&Arvind 82, Davis&Keller 82]:

• freedom from side-effects,

• data availability firing rule, and

• lack of history sensitivity in procedures.

The first and second criteria follow by definition for the update network; there are no global data structures and a node fires when a tuple appears on its input arc. One can argue that the third criterion does not hold; one example is the join node, which concatenates an input tuple from the left arc with all previous input tuples from the right arc, then applies a predicate to the result. Although there is history sensitivity in the order of the output tuples, the same tuples are eventually generated independent of the order of the input tuples. Since order is semantically irrelevant, the update network is in fact a data flow program.

Since each update network is in fact a data flow program, it follows that techniques and hardware developed for executing data flow programs would apply directly to executing update networks [Dennis 80]. A variation on this idea may be found in [Boral&DeWitt 81, Boral&DeWitt 82], where a dataflow implementation of the (non-temporal) relational algebra at the granularity of pages, rather than tuples, is described.

8.7. A Step Back

Behind the implementation described in this chapter, and at the core of the research, is an iterative process for building quality software:

1. Determine the "correct" conceptualization of the program to be implemented.

2. Develop an interface (user-program or program-program) that adheres as closely as possible to that conceptualization.

3. Test the conceptualization with a highly flexible, probably inefficient implementation.

4. Apply all the available information to a particular instance of the problem in order to efficiently perform the desired action, trading generality for efficiency.

5. Demonstrate as formally as possible the semantic integrity of the translation from general to specific.

6. Use available technology to the greatest extent possible at all stages.

This process certainly contains nothing new: in some form, it is at the root of high level languages, operating systems, many design methodologies, and successful software systems. What is new is the application of this process to yet another software system, a monitor for distributed systems. Three instances of this process as they occurred in the implementation will be outlined, followed by a succinct statement of the major results of the implementation.

Example 1:

1. Monitoring "should" be viewed as retrieving information from a dynamic relational database.

2. TQuel is a user interface that adheres to this viewpoint.

3. The relational calculus serves as a flexible yet inefficient "implementation".

4. A TQuel query can be translated into a relational algebraic expression, and can then be optimized using a collection of transformations.

5. Each TQuel query was proven to have an equivalent tuple calculus expression, which can be proven to have an equivalent relational algebraic expression.

6. The available technology consisted of the mathematical basis for relational databases and the semantics of standard relational queries, as well as the Ingres [Stonebraker et al. 76] and yacc [Johnson 75] systems to aid in lexical analysis and parsing.

Example 2:

1. Sensors "should" be viewed as parameterized actions.

2. Sensor Description Files (SDF's) constitute a user interface for specifying the attributes of the user's sensors.

3. A sensor routine was implemented, and various approaches were investigated to determine how the event information could be communicated to the monitor.

4. DFPre converts an SDF event class into a highly parameterized macro call.

5. The sensor routine and inline macro share most of the code, ensuring that both implementations perform identical operations.

6. The available technology included the highly optimizing Bliss/11 compiler [Wulf et al. 75c], the concept of a linearized representation of a parse tree as developed in the PQCC project [Leverett et al. 80], and the use of frames as a data structure [Winston&Horn 81].

Example 3:

1. The target code for the TQuel compiler "should" be an update network.

2. The interface consists of the three operations create, link, and unlink, and the collection of generic access and operator nodes.

3. An interpreter-based implementation was developed to test the concept of an update network.

4. A compiler was designed to translate an update network into an efficient Lisp program.

5. The integrity of the translation process was partially validated by designing a compiler using many of the same data structures used by the interpreter.

6. The available technology included techniques for compiling production systems [Forgy 79], the Lisp compiler [Foderaro 80], and the specification of Bonsai, a tree transformation system used in the PQCC project [Leverett et al. 80].

8.8. Evaluation

Every honest calling, each walk of life, has its own elite, its own aristocracy based on excellence of performance.

-- James Bryant Conant

A chain is only as strong as its weakest link.

-- Proverb

This chapter has presented, in some detail, an implementation of a relational monitor. Although not all components were implemented (the display module, relational database interface, and update network compiler were only brought through the design stage), and the components that were implemented are incomplete, enough was implemented to assess the viability of a complete implementation.

The criteria given in chapter 2 for an effective monitor relating to implementation were efficiency and system-independence. System-independence is an ill-specified criterion. However, the existence of two resident monitors, running on rather different operating systems, at least indicates that some measure of system-independence has been achieved. Of course, the more resident monitors implemented, the higher the confidence in this assertion.

There are two aspects to the issue of efficiency: is the monitor as a whole efficient enough, and do the components have comparable efficiencies? The latter issue exposes bottlenecks where high efficiency of some of the components is wasted by one or more low efficiency components. Given the monitoring grain supported in the implementation, that of process-process and process-operating system interactions, the efficiency of the implementation is adequate. Section 8.3 showed that an overhead of 1% results in the generation of approximately 500 events per second; section 8.4 came up with an approximate event record transmission rate of 600 events per second; and section 8.6.3 indicated that a processing rate of over 600 input tuples per second was possible on a dedicated Vax. Thus, the Ethernet and remote monitor can essentially keep up with the 50 processors on Cm*. It is also fair to say that if the situation is changed somewhat (say, an additional 20 processors are added to Cm*, or the Ethernet or the Vax becomes loaded), then the monitor as realized here would not be able to keep up.

The point to be emphasized is that it is in fact possible to implement a monitor supporting the high level conceptual viewpoint of a collection of time-varying relations which can be manipulated by a temporal, non-procedural query language, with enough efficiency to monitor a large, complex distributed system. Although further measurements and a careful analysis of the performance of the various components are certainly needed, the issue now is, how large a distributed system can be monitored in this way, rather than, can a distributed system be monitored in this way. Hence,

Result: With careful design and the use of previously developed concepts, techniques, and tools, it is possible to support a powerful, high-level conceptual interface to a distributed system monitor with an efficient, nicely structured implementation. The user does not have to specify how the data is collected, or worry about the details of collecting, formatting, and processing the event records.

IV. Conclusion and Appendices

The previous chapters have attempted to stay as abstract as possible while still presenting enough detail to show how the mechanisms work. For the curious, several appendices have been provided to tie up the loose ends.

Chapter 9 Conclusion

If a man will begin with certainties, he shall end in doubts; but if he will be content to begin with doubts he shall end in certainties.

-- Francis Bacon, in The Advancement of Learning

The thesis of this research is that monitoring distributed systems is fundamentally a knowledge representation and information processing task, and that the user interface and internal structure should be conceptualized in this way, that is, in the relational paradigm. Given this approach, many questions immediately come to mind, involving such issues as

• data collection;

• dynamic incremental updating of temporal relations;

• syntax and semantics of the query language;

• translating a query into a more convenient form; and

• implementing the proposed mechanisms.

The results generated from an examination of these issues are summarized in result statements at the end of each of the previous chapters.

Several specific results constitute the major contributions of this research:

• A temporal relational query language, TQuel, was developed by syntactically and semantically augmenting an existing query language and by incorporating previous work in synchronization (i.e., path expressions).

• A formal semantics was defined for the entire TQuel Retrieve statement. This semantics has several desirable properties:

o It reduces to the standard Quel semantics when the time domain is fixed at a particular time.

o It includes aggregate operators in a uniform fashion.

o It accommodates an arbitrary degree of indeterminacy.

• A low level data collection mechanism was presented. This mechanism, based on a strongly typed model of the environment, supports several dimensions of filtering, integrates sampling and tracing, and admits solutions to the problems of naming and time.

• Update networks were proposed to implement dynamic incremental updating of derived temporal relations. The network is composed of access nodes, which interface effectively with the resident monitor, and operator nodes, which carry out the desired computations.

• Several general techniques were developed to generate correct and efficient update networks from TQuel queries. These techniques include:

o the translation of the query into an initial update network via the relational operator tree;

o modifications to this network to ensure semantic correctness; and

o the use of graph transformations, temporal order, and node scheduling to greatly increase the efficiency of the network.

These techniques made extensive use of knowledge available to the monitor.

• Several problems concerning the specification of sensors were solved by using sensor description files. The user does not have to specify how the data is to be collected, or worry about the details of formatting or processing the event records. By using the information in the sensor description files and by careful use of available microcoded operations, highly optimized StarOS sensors were implemented.

• By compiling the update network, an improvement of over an order of magnitude in execution time was obtained.

9.1. Surprises

Many of the results stated above were at least partially anticipated when this research was first proposed. However, there have been several interesting surprises along the way. These surprises will be presented as previously held assertions that turned out not to be true in this case. Each assertion can be interpreted either as a statement that should have been recognized initially as being false,26 or as a generally valid observation that has less applicability than previously believed.

• Naming is not a problem when monitoring, because the sensors will in general have access to names of objects involved in the events the sensors are recording.

As shown in section 5.5.1, there is in fact no mechanism for producing remote names which will satisfy all the required invariants. The solution adopted there was to slightly corrupt one of the invariants, resulting in a minor reduction in the ability of the monitor to collect information.

• Inserting sensors is a routine, yet time-consuming process.

All past experimentation on Cm* has relied on manually inserted code to collect the information and, later, to analyze it. When the mechanism described in section 5.2 was adopted, sensors became more versatile, and consequently, more complex. Eventually, I realized that inserting sensors was becoming quite difficult procedurally, given sensors spread across many users, programs, and files (see section 8.2.1). Inserting sensors was no longer routine, and was even more time-consuming. The solution was to use automatic assistance via sensor description files, resulting in a routine, fast method for inserting sensors.

• Extending aggregate operators to include time should be easy, yet extending the semantics to incorporate an arbitrary degree of indeterminacy is very difficult.

Specifying the semantics of aggregate operators on temporal relations turned out to be amazingly complex. At least seven (!) different interpretations of the temporal avgc operator were considered (see section 4.4.1). On the other hand, to accommodate indeterminacy required only two changes: adoption of a three-valued logic system and a straightforward redefinition of the before predicate (see section 4.5).

26 In which case, I am merely admitting my prior ignorance.

• Implementing sensors in microcode, rather than in Bliss/11, a high level systems programming language, should result in a significant decrease in execution time and program space.

The tracer routine in C.mmp, which performs a similar function to the StarOS sensor, was originally implemented in assembly language, and required nearly 140 instruction equivalents in execution time (350 microseconds on PDP-11's) [Wulf et al. 81]. Microcoding this operation reduced the execution time by a factor of 5, to 26 instruction equivalents (65 microseconds). Microcoding other instructions has resulted in reductions of one or more orders of magnitude. The experience with the StarOS sensors did not follow these expectations. The microcoded sensor was only about twice as fast as the implementation using a combination of Bliss/11 code and existing microcoded instructions; see section D.6 for the details.

• Inline expansion runs faster, but procedure calls take up less code space.

With regard to StarOS sensors, inline expansion was faster and shorter (again, see section D.6).

• Processing temporal databases must be much slower than processing conventional databases, since they are so much larger (a conventional database contains only the slice of a temporal database currently valid).

By utilizing the techniques presented in section 7.2, it is possible to exploit the temporal order of incoming tuples. In this way, each new tuple affects only a small portion of the database: the portion that was close in time to the incoming tuple. Information in the database that is removed in time from the incoming tuple is not affected by the addition of this tuple. This use of temporal locality effectively reduces the size of the database as seen by the update and retrieval algorithms, making a temporal database as efficient as a conventional database which stores only the most recent portion of the information in the temporal database.

9.2. Remaining Problems and Future Research

It is better to know some of the questions than all of the answers.

--James Thurber

The remaining problems provide excellent opportunities for future research. There are at least four directions along which the work reported here can be extended. The most natural direction is to complete and extend the implementation. The second avenue for further work is to integrate the monitor with the other tools in the programming environment. The relational approach indicates another overlap, that of databases in general and temporal databases in particular. And finally, there is a need to pursue an area motivating this research, that of knowledge representation. Each of these areas suggests several promising research topics.

The most striking need in the implementation is a display module. There has always been a strong connection between data structures (and thus data bases) and graphics. Leap, a language supporting an associative memory (a precursor to the relational model), was developed to aid in the implementation of graphical applications [Feldman&Rovner 69]. Recently, several graphics systems employing a relational data base have been implemented [Becerril et al. 79, Williams 74, Herot 80].

The techniques developed in these systems suggest that a graphical interface for a monitor designed around the relational model would be possible. However, there are still several interesting problems in designing the display module, including representing indeterminacy, controlling the display rate, utilizing the available visual modalities (color, blink rate, absolute and relative position, shape, movement) of the display device, and specifying these attributes in TQuel. Another area for improvement in the implementation is the interface with Ingres. Also, essentially all of the modules require significant effort before the monitor can be used by unsympathetic users.

The monitor was implemented as an autonomous unit, interacting minimally with the other software development tools comprising the programming environment. A much higher degree of interaction is desirable, so that the tools can cooperate and provide functions not feasible with disjoint tools. Such interaction is feasible only if the user's program has a representation accessible through a common interface. One such representation is an attributed parse tree, created by the parser, manipulated by the semantic analyzer, and read by the code generator and debugger [Schatz et al. 79, Goos&Wulf 81, Habermann 79, Habermann et al. 81]. Another possible representation is the collection of relations generated by the relational monitor, and perhaps used by the compiler and loader for optimization based on performance data [Schwans 82, Segall et al. 83, Jones&Schwans 82, Singh 81]. Generalizing this approach, a program is no longer a collection of typed files

(in essence, a very specialized and restricted database), but rather a general database in which the various utilities insert and extract information useful for the operations they implement [Cattell 80]. This database must incorporate in an integrated fashion both static information, such as the parse tree and the symbol table, and dynamic information, such as that gathered by the monitor. Finally, it would be useful if this database were in some way accessible to the program as it executes. The program could use the information in the database to perform dynamic reconfiguration, error recovery, and load balancing.

A more general extension concerning databases is temporal databases. The application of the relational paradigm to monitoring required a serious examination of the role of time in databases. Recently other researchers have been looking into this area [Bolour et al. 82, Bradley 78, Bubenko 77, Anderson 82]. The research reported here has produced significant results, primarily the design, implementation, and formal specification of TQuel. The technology transfer can go both ways, either applying concepts from conventional databases to the relational monitor, or trying to map the ideas behind the relational monitor onto general temporal databases. For example, access nodes are similar to indexed relations, and temporal order is similar to sort order, so techniques involving one concept may also apply to the other concept [Smith&Chang 75, Kim 81, Yao 79, Wong&Youssefi 76].

An even more general and hence less defined area for future research is the extension of the knowledge base used by the monitor. This knowledge base currently involves the information that is processed, stored, and retrieved, as well as the algorithms performing these manipulations. Extensions to the knowledge base include representing causality in the monitor, and using the knowledge base to make inferences about future events. Extending the TQuel semantics to use temporal logic [Rescher 71] might be a step in this direction.

Clearly, there are many interesting further questions one can ask given the framework developed here. The adventure continues.

References

[Abrams&Treu 77] M.D. Abrams and S. Treu. A Methodology for Interactive Computer Service Measurement. Communications of the ACM 20(12):936-944, December, 1977.

[Aggerwala&Arvind 82] T. Aggerwala and Arvind. Data Flow Systems. IEEE Computer Magazine 15(2):10-13, February, 1982.

[Anderson 82] T.L. Anderson. Modelling Time at the Conceptual Level. In Proceedings of the Second International Conference on Databases. Jerusalem, June, 1982.

[Andler 79] S.A. Andler. Predicate Path Expressions: A High-level Synchronization Mechanism. PhD thesis, Carnegie-Mellon University, Computer Science Department, August, 1979.

[Astrahan&Chamberlin 75] M. M. Astrahan and D. D. Chamberlin. Implementation of a structured English query language. Communications of the ACM 18(10):580-588, October, 1975.

[Ball et al. 76] J.E. Ball, J.A. Feldman, J.R. Low, R.F. Rashid, and P.D. Rovner. RIG, Rochester's Intelligent Gateway: System Overview. IEEE Transactions on Software Engineering SE-2(4), December, 1976.

[Becerril et al. 79] J. Becerril, R. Casajuana, and L. Lorie. Gsysr: A Relational Database Interface for Graphics. Springer-Verlag, Florence, Italy, 1979, pages 459-474.

[Berzins&Kapur 77] V. Berzins and D. Kapur. Denotational and Axiomatic Definitions for Path Expressions. Computational Structures Group Memo 153-1, Laboratory for Computer Science, M.I.T., November, 1977.

[Boggs et al. 80] D.R. Boggs, J.F. Shoch, E.A. Taft, and R.M. Metcalfe. Pup: An internetwork architecture. IEEE Transactions on Communications COM-28(4):612-24, April, 1980.

[Bolour et al. 82] A. Bolour, T.L. Anderson, L.J. Debeyser, and H.K.T. Wong. The Role of Time in Information Processing: A Survey. SigArt Newsletter 80:28-48, April, 1982.

[Bonner 69] A.J. Bonner. Using system monitor output to improve performance. IBM Systems Journal 4:290-298, 1969.

[Boral&DeWitt 81] H. Boral and D.J. DeWitt. Processor Allocation for Multiprocessor Database Machines. ACM Transactions on Database Systems 6(2):227-254, June, 1981.

[Boral&DeWitt 82] H. Boral and D.J. DeWitt. Applying Data Flow Techniques to Data Base Machines. IEEE Computer Magazine 15(8):57-63, August, 1982.

[Bourne 78] S.R. Bourne. The UNIX Shell. The Bell System Technical Journal 57(6, part 2):1971-1990, July-August, 1978.

[Bradley 78] J. Bradley. Operations Data Bases. In Proceedings of the Fourth International Conference on Very Large Data Bases, pages 164-176. West Berlin, Germany, September, 1978.

[Bubenko 77] J.A. Bubenko. The Temporal Dimension in Information Modeling. North-Holland Pub. Co., 1977.

[Cattell 80] R.G.G. Cattell. Integrating a Database System and Programming/Information Environment. In M.L. Brodie and S.N. Zilles (editor), Proceedings of the Workshop on Data Abstraction, Databases, and Conceptual Modelling, pages 110-111. ACM, June, 1980.

[Charniak et al. 80] E. Charniak, C.K. Reisbeck, and D.V. McDermott. Artificial Intelligence Programming. L. Erlbaum Associates, Hillsdale, NJ, 1980.

[Clark 78] D.D. Clark, K.T. Pogran, and D.P. Reed. An Introduction to Local Area Networks. Proceedings of the IEEE 66(11):1497-1516, November, 1978.

[Codd 70] E.F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6):377-387, June, 1970.

[Codd 79] E.F. Codd. Extending the Database Relational Model to Capture More Meaning. ACM Transactions on Database Systems 4(4):397-434, December, 1979.

[Dahl et al. 67] O.J. Dahl et al.. Simula 67, a Common Base Language. Technical Report, Norwegian Comput. Centre, Forskningveien, Oslo, 1967.

[Dannenberg 81] R.B. Dannenberg. AMPL: Design, Implementation, and Evaluation of A Multiprocessing Language. Technical Report, Carnegie-Mellon University, Computer Science Department, March, 1981.

[Date 76] C.J. Date. Systems Programming Series: An Introduction to Database Systems. Addison-Wesley Publishing Company, Reading, MA, 1976.

[Davies et al. 79] D.W. Davies, D.L.A. Barber, W.L. Price, and C.M. Solomonides. Computer Networks and Their Protocols. John Wiley and Sons, New York, 1979.

[Davis&Keller 82] A.C. Davis and R.M. Keller. Data Flow Program Graphs. IEEE Computer Magazine 15(2):26-41, February, 1982.

[Dennis 80] J.B. Dennis. Data Flow SuperComputers. IEEE Computer Magazine 13(11):48-56, November, 1980.

[Enslow 78] P.H. Enslow, Jr. What is a distributed processing system? IEEE Computer Magazine 11(1), January, 1978.

[Fabry 74] R.S. Fabry. Capability-based addressing. Communications of the ACM 17(7):403-12, July, 1974.

[Feldman&Rovner 69] J. Feldman and P. Rovner. An Algol-Based Associative Language. Communications of the ACM 12(8):439-449, August, 1969.

[Foderaro 80] J.K. Foderaro. Franz Lisp Manual, Opus 33b edition, UC Berkeley, 1980.

[Forgy 79] C.L. Forgy. On the Efficient Implementation of Production Systems. PhD thesis, Carnegie-Mellon University, Computer Science Department, February, 1979.

[Fuller et al. 78] S. Fuller, J. Ousterhout, L. Raskin, P. Rubinfeld, P. Sindhu and R. Swan. Multi-microprocessors: An overview and working example. Proceedings of the IEEE 66(2):216-28, February, 1978.

[Gehringer&Chansler 82] E.F. Gehringer and R.J. Chansler, Jr. StarOS User and System Structure Manual. Technical Report, Carnegie-Mellon University, Computer Science Department, July, 1982.

[Gertner 80] I. Gertner. Performance Evaluation of Communicating Processes. PhD thesis, Computer Science Department, University of Rochester, May, 1980. Available as TR 74.

[Goos&Wulf 81] G. Goos and W.A. Wulf (eds). Diana Reference Manual. Technical Report CMU-CS-81-101, Carnegie-Mellon University, Computer Science Department, March, 1981. Descriptive Intermediate Attributed Notation for Ada.

[Habermann 75] A.N. Habermann. Path Expressions. Technical Report, Carnegie-Mellon University, Computer Science Department, June, 1975.

[Habermann 79] A.N. Habermann. An Overview of the Gandalf Project. Carnegie-Mellon University, Computer Science Research Review, 1978-79.

[Habermann et al. 81] N. Habermann, D. Perry, P. Feiler, R. Medina-Mora, D. Notkin, G. Kaiser, and B. Denny. A Compendium of Gandalf Documentation. Technical Report, Carnegie-Mellon University, Computer Science Department, April, 1981.

[Held et al. 75] G.D. Held, M. Stonebraker, and E. Wong. INGRES--A relational data base management system. Proceedings of the 1975 National Computer Conference 44:409-416, 1975.

[Herot 80] C.F. Herot. Spatial Management of Data. ACM Transactions on Database Systems 5(4):493-513, December, 1980.

[Highnam 81] P.T. Highnam. Medic: A Resident Monitor for Medusa. Multiprocessor Performance Evaluation Group Internal Memo, Carnegie-Mellon University, Computer Science Department, September, 1981.

[Highnam&Snodgrass 81] P.T. Highnam and R.T. Snodgrass. The Cm*/Simon Protocol. Multiprocessor Performance Evaluation Group Internal Report, Carnegie-Mellon University, Computer Science Department, 1981.

[Hoare 74] C.A.R. Hoare. Monitors: An Operating System Structuring Concept. Communications of the ACM 17(10), October, 1974.

[Ichbiah et al. 79] J.D. Ichbiah et al.. Rationale for the Design of the ADA Programming Language. SIGPLAN Notices 14(6), June, 1979.

[Ingalls 78] D. Ingalls. The Smalltalk-76 Programming System: Design and Implementation. In Proceedings of the Fifth Symposium on Principles of Programming Languages, pages 9-16. ACM, January, 1978.

[Johnson 75] S.C. Johnson. Yacc: Yet Another Compiler Compiler. Computing Science Technical Report 32, Bell Laboratories, Murray Hill, NJ, 1975.

[Jones 77] A.K. Jones. The Narrowing Gap Between Language Systems and Operating Systems. In Proceedings of the IFIP Conference. 1977.

[Jones et al. 79] A.K. Jones, R.J. Chansler, Jr., I. Durham, J. Mohan, K. Schwans, and S. Vegdahl. StarOS, a Multiprocessor Operating System. In Proceedings of the Seventh Symposium on Operating System Principles, pages 117-127. ACM/SIGOPS, Asilomar Conference Grounds, Pacific Grove, CA, December, 1979.

[Jones et al. 78] A.K. Jones, R.J. Chansler, Jr., I. Durham, P. Feiler, D. Scelza, K. Schwans, and S.R. Vegdahl. Programming issues raised by a multiprocessor. Proceedings of the IEEE 66(2):229-37, February, 1978.

[Jones&Gehringer 80] A.K. Jones and Ed Gehringer, eds. The Cm* Multiprocessor Project: A Research Review. Technical Report CMU-CS-80-131, Carnegie-Mellon University, Computer Science Department, July, 1980.

[Jones&Schwans 79] A.K. Jones and K. Schwans. TASK Forces: Distributed Software for Solving Problems of Substantial Size. In Proceedings of the 4th International Conference on Software Engineering. ACM, IEEE, Munich, W. Germany, September, 1979.

[Jones&Schwans 80] A.K. Jones and K. Schwans. The TASK Specification Language. StarOS Group Internal Report, Carnegie-Mellon University, Computer Science Department, 1980.

[Jones&Schwans 82] A.K. Jones and K. Schwans. Specifying Software Configurations for Distributed Computation. 1982. To be published.

[Kahn et al. 81] K.C. Kahn, W.M. Corwin, T.D. Dennis, H. D'Hooge, D.E. Hubka, L.A. Hutchins, J.T. Montague, F.J. Pollack, and M.R. Gifkins. iMAX: A Multiprocessor Operating System for an Object-Oriented Computer. In Proceedings of the Eighth Symposium on Operating Systems Principles, pages 127-136. ACM/SIGOPS, December, 1981.

[Kahn&Gorry 75] K. Kahn and G.A. Gorry. Mechanizing Temporal Knowledge. Artificial Intelligence 9:87-108, September, 1975.

[Kernighan&Ritchie 78] B.W. Kernighan and D.M. Ritchie. Prentice-Hall Software Series: The C Programming Language. Prentice-Hall, Englewood Cliffs, NJ, 1978.

[Kim 81] W. Kim. On Optimizing SQL-like nested queries. Technical Report RJ 3063, IBM, San Jose, CA, 1981.

[Kleene 38] S.C. Kleene. On a Notation for Ordinal Numbers. Journal of Symbolic Logic 3:150-155, 1938.

[Lamport 78] L. Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM 21(7):558-565, July, 1978.

[Lauer&Campbell 75] P.E. Lauer and R.H. Campbell. Formal Semantics of a Class of High-Level Primitives for Coordinating Concurrent Processes. Acta Informatica 5:297-332, 1975.

[Leverett et al. 80] B. Leverett, R.G.G. Cattell, S.O. Hobbs, J.M. Newcomer, A.H. Reiner, B.R. Schatz, and W.A. Wulf. An Overview of the Production Quality Compiler-Compiler Project. IEEE Computer Magazine 13(8):38-49, August, 1980.

[Lipski 79] W. Lipski, Jr. On databases with incomplete information. 1979. Unpublished technical report.

[Liskov 81] B. Liskov, ed. Report on the Workshop on Fundamental Issues in Distributed Computing. Op Sys Review 15(3), July, 1981.

[McDermott 80] D. McDermott. Spatial Inferences with Ground, Metric Formulas on Simple Objects. Research Report 173, Yale University, Department of Computer Science, January, 1980.

[Metcalfe&Boggs 75] R.M. Metcalfe and D.R. Boggs. Ethernet: Distributed Packet Switching for Local Computer Networks. Technical Report CSL-75-7, Xerox Palo Alto Research Center, Palo Alto, CA, November, 1975.

[Model 79] M. Model. Monitoring System Behavior in a Complex Computational Environment. Technical Report CSL-79-1, Xerox PARC, January, 1979.

[Morgan et al. 75] David E. Morgan, Walter Banks, Dale P. Goodspeed, and Richard Kolanko. A Computer Network Monitoring System. IEEE Transactions on Software Engineering SE-1(3), September, 1975.

[Nelson 81] B.J. Nelson. Remote Procedure Call. PhD thesis, Carnegie-Mellon University, Computer Science Department, May, 1981. also published as Xerox PARC Technical Report CSL-81-9.

[Nutt 79] G.J. Nutt. A Survey of Remote Monitors. Special Publication 500-42, National Bureau of Standards, January, 1979.

[Ousterhout et al. 80] J.K. Ousterhout, D.A. Scelza, and P.S. Sindhu. Medusa: an experiment in distributed operating system structure. Communications of the ACM 23(2), February, 1980.

[Plattner&Nievergelt 81] B. Plattner and J. Nievergelt. Monitoring Program Execution: A Survey. IEEE Computer Magazine 16(11):76-93, November, 1981.

[Rashid 80a] R.F. Rashid. A Network Operating System for a Distributed Sensor Network. Internal memo, Carnegie-Mellon University, Computer Science Department, April, 1980.

[Rashid 80b] R.F. Rashid. An Inter-Process Communication Facility for UNIX. Technical Report CMU-CS-80-124, Carnegie-Mellon University, Computer Science Department, March, 1980.

[Reiter 78] R. Reiter. On closed world data bases. Plenum Press, New York, 1978.

[Rescher 68] N.C. Rescher. Many-valued Logic. McGraw-Hill, New York, 1968.

[Rescher 71] N.C. Rescher and A. Urquhart. Temporal Logic. Springer-Verlag, New York, 1971.

[Ritchie&Thompson 74] D.M. Ritchie and K. Thompson. The UNIX time-sharing system. Communications of the ACM 17(7):365-75, July, 1974.

[Rosen 81] E. Rosen. Vulnerabilities of Network Control Protocols: An Example. ACM SIGSOFT Soft Eng Notes, January, 1981.

[Rosenberg 83] J. Rosenberg. Generating Efficient Code for Generics. PhD thesis, Carnegie-Mellon University, Computer Science Department, May, 1983. To be completed.

[Saunders 79] S.E. Saunders. Compiling Customized Executable Representations and Interpreters. PhD thesis, Carnegie-Mellon University, Computer Science Department, June, 1979. Available as TR CS-79-127.

[Schatz et al. 79] B.R. Schatz, B.W. Leverett, J.M. Newcomer, A.H. Reiner, and W.A. Wulf. TCOLAda: An Intermediate Representation for the DOD Standard Programming Language. Technical Report, Carnegie-Mellon University, Computer Science Department, March, 1979.

[Schwans 82] K. Schwans. Configuring software for multiple processor systems. PhD thesis, Carnegie-Mellon University, Computer Science Department, September, 1982.

[Segall et al. 83] Z. Segall, A. Singh, R.T. Snodgrass, A.K. Jones, and D.P. Siewiorek. An Integrated Instrumentation Environment for Multiprocessors. IEEE Computer Magazine, 1983. To be published.

[Shaw 80] A.C. Shaw. Software Specification Languages Based on Regular Expressions. Springer-Verlag, 1980.

[Shaw 81] Mary Shaw, ed. ALPHARD: Form and Content. Springer-Verlag, New York, 1981.

[Shoch 79] John Shoch. EFTP: A Pup-based Ether file transfer protocol. 1979. Unpublished specification.

[Shoch 81] J. Shoch. What's Different About 'Distributed Computing'? 1981. in [Liskov 81].

[Shrobe 79] H. Shrobe. Dependency Directed Reasoning for Complex Program Understanding. PhD thesis, MIT, 1979. published as MIT Technical Report AI-TR-503.

[Singh 81] A. Singh. Pegasus: A Workload Generator for Multiprocessors. Master's thesis, Carnegie-Mellon University, Department of Electrical Engineering, 1981.

[Smith&Chang 75] J.M. Smith and P.Y.-T. Chang. Optimizing the Performance of a Relational Algebra Database Interface. Communications of the ACM 18(10):568-579, October, 1975.

[Snodgrass 82] R.T. Snodgrass. Description File Specifications. Multiprocessor Performance Evaluation Group Internal Report, Carnegie-Mellon University, Computer Science Department, August, 1982.

[Snodgrass 83] R.T. Snodgrass. An Object-Oriented Command Language. IEEE Transactions on Software Engineering 9(1), January, 1983.

[Stonebraker et al. 76] M. Stonebraker, E. Wong, P. Kreps, and G. Held. The Design and Implementation of INGRES. ACM Transactions on Database Systems 1(3):189-222, September, 1976.

[Swan et al. 77] Richard J. Swan, Samuel H. Fuller and D.P. Siewiorek. Cm*: A modular, multi-microprocessor. In Proceedings of the National Computer Conference, pages 637-44. AFIPS, 1977.

[Tobagi et al. 76] F.A. Tobagi, S.E. Lieberson, and L. Kleinrock. On measurement facilities in packet radio systems. In Proceedings of the National Computer Conference, pages 589-596. AFIPS, 1976.

[Ullman 82] J.D. Ullman. Principles of Database Systems, Second Edition. Computer Science Press, Potomac, Maryland, 1982.

[Vassiliou 79] Y. Vassiliou. Null values in database management--a denotational semantics approach. In Proceedings of the International Symposium on Management of Data, pages 162-169. ACM/SIGMOD, 1979.

[Vegdahl 82] S. Vegdahl. private communication.

[Williams 74] R. Williams. On the Application of Relational Data Structures in Computer Graphics. In Information Processing 74, pages 722-726. IFIP, 1974.

[Winston&Horn 81] P.H. Winston and B.K.P. Horn. Lisp. Addison-Wesley, Reading, MA, 1981.

[Wirth 71] N. Wirth. The Programming Language Pascal. Acta Informatica 1(1), 1971.

[Wong&Youssefi 76] E. Wong and K. Youssefi. Decomposition--A Strategy of Query Processing. ACM Transactions on Database Systems 1(3):223-241, September, 1976.

[Wulf et al. 75a] W.A. Wulf, R. Levin, and C. Pierson. Overview of the Hydra Operating System. In Proceedings of the Fifth Symposium on Operating System Principles. ACM/SigOps, November, 1975.

[Wulf et al. 75b] W.A. Wulf, R.K. Johnsson, C.B. Weinstock, S.O. Hobbs, and C.M. Geschke. Bliss-11 Manual. Technical Report, Carnegie-Mellon University, Computer Science Department, 1975.

[Wulf et al. 75c] W.A. Wulf, R.K. Johnsson, C.B. Weinstock, S.O. Hobbs, and C.M. Geschke. Elsevier Computer Science Library: The Design of an Optimizing Compiler. Elsevier, New York, 1975.

[Wulf et al. 81] W.A. Wulf, R. Levin, and S.P. Harbison. HYDRA/C.mmp: An Experimental Computer System. McGraw-Hill, 1981.

[Yao 79] S.B. Yao. Optimization of Query Evaluation Algorithms. ACM Transactions on Database Systems 4(2):133-155, June, 1979.

Appendix A BNF of the TQuel Retrieve Statement

This appendix lists the syntax for the TQuel retrieve statement. Since TQuel is a strict superset of Quel, all legal Quel retrieve statements are also legal TQuel retrieve statements. The following non-terminals are not included in the syntax description because they are identical to their Quel counterparts.

returns a value of type boolean

returns a value of type integer, string, floating point, or temporal

an integer constant

the name of a domain

a relation name

a string constant

the name of a tuple variable

Also not shown are the additional temporal functions and predefined relations found in TQuel.

The syntax is given in standard BNF, with non-terminals in "<>". The empty string is denoted by "ε". Terminals identical to the meta-character "|" are enclosed in double quotation marks.

::=

::= retrieve

::= ε | unique | | into

::= | ( . all ) | ( )

::= | ,

::=

::= is | =

::=

::=

::= ε | where

::= ε | when

::= | ( ) | and | or | not

::= . time | . start | . stop | ; | , | "|" | ( )

::= | |

::= |

::=

::= ε | start

::= ε | stop

::= ε | at

::= . time | . start | . stop | ; | , | "|" | ( )
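As an illustration only (no worked example accompanies the grammar in the original), a retrieve statement exercising several of the optional clauses over two hypothetical period relations M and N, in the style of the examples of Appendix C, might read:

retrieve W (B1 = M.B, B2 = N.B) where M.B = N.B when (M.start ; N.start) start N.start stop M.stop

Here the when clause is meant to assert that M starts before N starts; its exact form is assumed from the temporal-expression syntax above and is not taken from the original text.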

Appendix B Proof of the Conversion Theorem

Before presenting the proof of the Conversion Theorem used in chapter 4, it is useful to repeat the syntax of temporal expressions and to list the productions. The syntax of the temporal expressions is as follows:

<t-exp> ::= <tuple variable> . time | <t-exp> . start | <t-exp> . stop | <t-exp> ; <t-exp> | <t-exp> "|" <t-exp> | <t-exp> , <t-exp> | ( <t-exp> )

Seventeen productions are used to specify the semantics of the above syntax27:

(1) a.start ==> a

(2) (α | β).start ==> (α.start | β.start)

(3) (α ; β).start ==> α.start

(4o) (α, β).start ==> (α.start ; β.start) | (β.start ; α.start)

(4c) (α, β).start ==> (α.start | β.start)

(5) a.stop ==> a

(6) (α | β).stop ==> (α.stop | β.stop)

(7) (α ; β).stop ==> β.stop

(8o) (α, β).stop ==> (α.stop | β.stop)

(8c) (α, β).stop ==> (α.stop ; β.stop) | (β.stop ; α.stop)

(9) (α) ==> α

(10o) α, (β ; γ) ==> ((α, β) ; γ) | (β ; (α, γ))

(10c) α, (β ; γ) ==> ((α, β) ; γ) | (β ; (α, γ)) | (β.stop ; α ; γ.start)

(11) α, (β | γ) ==> (α, β) | (α, γ)

(12) (α | β), γ ==> (α, γ) | (β, γ)

(13o) (α ; a), b ==> (α ; b ; a) | (α ; (a, b))

(13c) (α ; a), b ==> (b ; α ; a) | (α ; a ; b) | (α ; (b, a))

(14o) (a1 ; ... ; aj), (b1 ; ... ; bk) ==> (a1 ; b1 | b1 ; a1) ; (aj | bk)

(14c) (a1 ; ... ; aj), (b1 ; ... ; bk) ==> (a1 | b1) ; (aj ; bk | bk ; aj)

(15) α ; (β | γ) ==> (α ; β) | (α ; γ)

(16) (β | γ) ; α ==> (β ; α) | (γ ; α)

(17c) a, b ==> (a ; b | b ; a)

27 Both the overlap and coverage interpretations are accommodated here. The productions associated with the overlap interpretation are denoted by (o), and with the coverage interpretation by (c).

Production (17c) says that, under the coverage interpretation, when two events occur in parallel, one of them will come first, and the other second, with the result being the period between them. (17c) is not valid under the overlap interpretation of the parallel operator, since in that case the events must have occurred at exactly the same moment.
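As a small worked illustration (this example does not appear in the original), consider the expression (a, (b ; c)) under the coverage interpretation, where a, b, and c are events. Production (10c) rewrites it as

((a, b) ; c) | (b ; (a, c)) | (b.stop ; a ; c.start)

Productions (17c), (5), and (1) then eliminate the remaining parallel operators and the .start and .stop terms, and distributing with (15) and (16) yields the standard expression

(a ; b ; c) | (b ; a ; c) | (b ; c ; a)

that is, the three possible orderings of the event a relative to b and c.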

Theorem 1: The productions eliminate all .start terms. Proof: By induction on the number of terminals in the original expression α.

Basis: k = 1. α is either (a) the terminal a, (b) a.start, (c) a.stop, or (d) (a). In cases (a) and (c), there are no .start terms in α. (9) will eliminate nested parentheses, taking care of case (d). (1) will eliminate the .start term in case (b).

Inductive step: Assume the theorem is true for all expressions containing less than k terminals. Assume α contains k terminals. Then α must be of the form (after multiple parentheses have been eliminated by (9)):

(a) (β | γ)    (d) (β | γ).start    (g) (β | γ).stop
(b) (β ; γ)    (e) (β ; γ).start    (h) (β ; γ).stop
(c) (β , γ)    (f) (β , γ).start    (i) (β , γ).stop

Since β and γ each contain fewer than k terminals, they can be transformed into β' and γ', not containing .start terms. Substituting β' and γ' for β and γ in the above expressions, (a)-(c) and (g)-(i) do not contain any .start terms. Applying (2) to (d), we get (β | γ).start ==> (β.start | γ.start). Since β.start contains fewer than k terminals, it can be transformed into an expression without .start terms, and similarly for γ.start, yielding an expression containing no .start terms. The same holds true for (e) and (f) by using (3) and (4) respectively. []

Corollary: The productions eliminate all .stop terms.

Lemma 2: Any regular expression (involving | and ;) can be converted into standard form.

Proof: By induction on the number of terminals in the original expression α.

Basis: k = 1. Already in standard form.

Inductive step: Assume the lemma is true for all expressions containing less than k terminals. Assume α contains k terminals. α is either (β ; γ) or (β | γ), for some β and γ. β and γ can be converted into standard form. In the first case, by (15) and (16),

(β ; γ) = ((β1 | β2 | ... | βm) ; (γ1 | γ2 | ... | γn))
        ==> (β ; γ1) | (β ; γ2) | ... | (β ; γn)
        ==> (β1 ; γ1) | ... | (βm ; γn)

which is in standard form. In the second case, α is already in standard form, due to the associativity of the selection operator ("|").

[]

Note that converting into standard form usually increases the number of terminals (though not the number of unique terminals).

Lemma 3: An expression α of the form (β, γ), where β and γ are in standard form, can be transformed by the productions into an expression not containing the parallel operator. Proof: Induction on the number of terminals in α.

Basis: k = 2. α = (a, b), where a and b are terminals. Apply (17c).

Inductive step: Assume the lemma is true for all expressions involving less than k terminals. Assume α contains k terminals. Since β is in standard form, it must be either (a) a single terminal b, (b) a single sequence of terminals (b1 ; ... ; bi), or (c) an alternation of sequences β1 | β2 | ... | βm.

(a) β is a single terminal. γ must be either (d) a single terminal, (e) a single sequence of terminals, or (f) an alternation of sequences. (17c) eliminates the parallel operator from α in case (d). In (e) and (f), (10) and (11) transform α into an expression containing subexpressions which contain the parallel operator yet which are in standard form and involve less than k terminals, allowing the parallel operator to be eliminated.

(b) β is a sequence of terminals. γ must be either (g) a single terminal, (h) a sequence of terminals, or (i) a selection of sequences, which may be handled by applying (13), (14), and (11), respectively.

(c) Use (12). []

Conversion Theorem: The productions convert any temporal expression into a standard expression. Proof: By induction on the number of terminals. Let α be the original expression where all .start and .stop terms have been eliminated using Theorem 1 and its corollary.

Basis: k = 1. α cannot contain a parallel operator.

Inductive step: Assume the theorem is true for all expressions containing less than k terminals. Assume α contains k terminals. After eliminating multiple parentheses with (9), α is of the form (a) (β | γ), (b) (β ; γ), or (c) (β, γ), where β and γ have had their parallel operators eliminated. In cases (a) and (b), α no longer contains a parallel operator. For case (c), use Lemma 2 to transform β and γ into standard form, then use Lemma 3 to eliminate the parallel operator.

At this point, the expression contains only sequence and selection operators, and can be converted into standard form by Lemma 2. []

Note that the proof applies regardless of the interpretation placed on the parallel operator. Also note that all the productions ((1) - (17)) were used in the proof. Thus, the set of productions has been shown to be sufficient, and appears to be necessary, to convert a temporal expression into a standard (regular) expression.

Appendix C Operator Nodes

This appendix consists of descriptions of the operator nodes. The description of each node consists of its signature (the node name followed in turn by the instantiation parameters in parentheses) followed by an explanation and example of the computation performed by the node. Unless otherwise noted, an output tuple is generated for each input tuple. Except for the Cartesian and join nodes, all operator nodes are unary.

Domains are specified by the integer index of the domain (starting with 0), with -1 indicating no domain is applicable. The time domain(s) implicit in TQuel is explicit in the update network; indeed, most operator nodes take as arguments the indices of the start and stop domains. Examples use the following primitive period relation A (Process, B, C, D), which has 6 domains: (0) start-time, (1) Process, (2) B, (3) C, (4) D, and (5) stop-time. See the explanation below of this unusual numbering of domains. The example code is quite similar to that produced by the monitor, and has intentionally been left unoptimized for easier understanding.

EventToPeriod (ArgDomain StopDomain); unary

Events associated with a traced primitive relation are automatically converted to periods. The ArgDomain indicates either the sensor or the object domain (depending on whether the event is sensor or object traced); the StopDomain specifies where the computed stop time is to be placed. In the following example, the newly computed stop domain is appended to the tuple, and hence must be domain 5.

Example: retrieve L (A.all)

(create A access-1)
(create eventtoperiod eventtoperiod-1 (1 5))
(link access-1 eventtoperiod-1 left)

Projection (Result(s)); unary

The specified domains are selected from the incoming tuple and concatenated to form the output tuple.

Example: retrieve M (B = L.B)

(create projection projection-1 ((0 5 2)))
(link eventtoperiod-1 projection-1 left)

ApplyOp (ResultDomain Function Argument(s)); unary

This node computes a new domain whose value is the result of performing the specified function on one or more domains of the input tuple. The function may be any Lisp function; special semantics are associated with arithmetic operations on temporal domains (see section 8.6.1).

Example: retrieve N (B = L.B + L.C)

(create applyop applyop-1 (6 (lambda ($2 $3) (+ $2 $3)) (2 3)))
(link eventtoperiod-1 applyop-1 left)
(create projection projection-2 ((0 5 6)))
(link applyop-1 projection-2 left)

Display (Heading DomainName(s)); unary

The Display node maintains a display of the argument relation on the screen.

Example: display N

(create display display-1 (N (Start Stop B))) (link projection-2 display-1 left)

RelationStore (RelationName DomainType(s)); unary

The incoming tuples are stored in a standard Ingres relation.

Example: store N in NRelation

(create relationstore relationstore-1
    (NRelation (time time integer)))
(link projection-2 relationstore-1 left)

StoreFile (FileName DomainType(s)); unary

The incoming tuples are stored in the specified file.

Example: store N in file NFile

(create storefile storefile-1 (NFile (time time integer)))
(link projection-2 storefile-1 left)

Cartesian (StartDomain StopDomain); binary

The output tuple contains all of the domains of the two inputs. Each time a tuple enters the left input, it is concatenated with all of the tuples that have entered the right input, with the resulting tuples put on the output arc. An analogous process occurs each time a tuple enters the right input.

Example: retrieve P (B1 = M.B, B2 = N.B) start M.start stop N.stop

(create cartesian cartesian-1)
(link projection-1 cartesian-1 left)
(link projection-2 cartesian-1 right)
(create projection projection-3 ((0 1 2 5)))
(link cartesian-1 projection-3 left)

Selection (Predicate Argument(s)); unary

Tuples satisfying the predicate are placed on the output arc.

Example: retrieve Q (L.all) where L.B = 1

(create selection selection-1 ((lambda ($2) (eq $2 1)) (2)))
(link eventtoperiod-1 selection-1 left)

Join (StartDomain StopDomain Predicate Argument(s)); binary

The output contains all of the domains of the two inputs. Each time a tuple enters the first input, it is concatenated with all of the tuples that have entered the second input. The predicate is then applied to the resulting tuple. If the predicate is satisfied, the concatenated tuple is placed on the output arc. An analogous process occurs each time a tuple enters the second input. This operator node is equivalent to a Cartesian operator node connected to a Selection operator node.

Example: retrieve R (N.all) where N.B = P.B start N.start stop N.stop

(create join join-1 ((lambda ($2 $5) (equal $2 $5)) (2 5)))
(link projection-2 join-1 left)
(link projection-3 join-1 right)
(create projection projection-4 ((0 1 2)))
(link join-1 projection-4 left)

AggrOp (Function Where Class Argument Result StartDomain StopDomain); unary

The function specifies the aggregate operator (e.g., count, countc, etc.). If the value of the where domain is -1, then the tuple is ignored. The value of the class domain is used to partition the tuples; again, a value of -1 implies that all tuples are of the same class. The aggregate is applied to the argument domain, with the result of the operation stored in the result domain. The where argument is derived from the where-clause and the class argument from the by-clause.

Example: retrieve S (B = AvgC(L.B by L.C where L.D = 1))

(create applyop applyop-1 (6 (lambda ($4) (equal $4 1)) (4)))
(link eventtoperiod-1 applyop-1 left)
(create aggrop aggrop-1 (avgc 6 3 2 7))
(link applyop-1 aggrop-1 left)
(create projection projection-5 ((0 5 7)))
(link aggrop-1 projection-5 left)

SwitchDisposition; unary

When an event is derived from a period, the disposition of the tuples must be changed from period to event. Similarly, when deriving an event from several periods, the disposition must also be changed.

Example: retrieve T (L.all) at L.start


(create projection projection-6 ((0 1 2 3)))
(link eventtoperiod-1 projection-6 left)
(create switchdisposition switchdisposition-1)
(link projection-6 switchdisposition-1 left)

Order; unary

This node stores incoming events internally until a checkpoint tuple arrives. At that point, the events which have times prior to the time the checkpoint was generated are output in temporal order.
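No example accompanies this node in the original; a minimal hypothetical network fragment, following the conventions of the other examples (and assuming the access node access-1 defined earlier), might be:

(create order order-1)
(link access-1 order-1 left)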

PeriodToEvent(StartDomain StopDomain); unary

For each incoming period tuple, a start event tuple and a stop event tuple are output.
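Again, no example is given in the original; assuming the period relation produced by eventtoperiod-1 above (start domain 0, stop domain 5), a hypothetical instantiation might be:

(create periodtoevent periodtoevent-1 (0 5))
(link eventtoperiod-1 periodtoevent-1 left)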

Appendix D StarMon

This appendix describes in some detail the various algorithms and data structures employed in StarMon, the resident monitor for the StarOS operating system. A brief summary of this material may be found in section 8.3.

D.1. The StarOS Task Force

StarMon, being part of StarOS, is a module in the StarOS task force. Other modules in this task force include the loader, process creator, file system, user interface, and debugger. A task force is implemented as an object of type catalogue (see Figure D-1), representing a mapping between module names (character strings) and modules. One of the names in the catalogue is 'Sensor-Description', which is mapped not into a module, but into the remote description of the sensors located in the task force (the Sensor Description Object).

Every process running under StarOS is given access to another catalogue, called the StarOSLibrary when the process is created. One of the objects residing in the StarOSLibrary is the monitor object (see Figure D-2). Contained in this object is everything needed by an application process (or StarOS process) to communicate with StarMon.

D.2. The Monitor Object

The monitor object is a small object containing two words of data and several capabilities (a capability is a protected pointer [Fabry 74]). In the figures of objects in this appendix, the data and capability portions are separated by a heavy line.

Newly created receptacles are automatically sent to the receptacle mailbox so that StarMon may later manipulate them. The declare and delete mailboxes are interfaces between the garbage collector and StarMon (see section 5.5.1). Event records are placed by each sensor into one of the pipes (so-called because they function much like Unix pipes [Ritchie&Thompson 74]) referenced in the monitor object.

Figure D-1: Representation of the StarOS task force (the task force catalogue maps module names such as catalogue-Module, FTP-Module, and Six12-Debugger to modules, and the entry 'Sensor-Description' to the Sensor Description Object).

D.3. Pipes

The structure of a pipe is shown in Figure D-3 and that of an event record in Figure D-4. Event records are allocated from the event record buffer in fixed length blocks (the length is specified in the RecordSize field). Synchronization is handled by the Stack and Queue objects referenced in the pipe. These objects, supported by StarOS in microcode, allow single words (2 bytes) to be pushed or popped atomically. The current implementation uses 4 pipes, each containing 100 blocks of 40 bytes, totaling approximately 16 Kbytes, representing a memory overhead of approximately 2%.

The use of the stack and the queue allows multiple processes to add and remove event records. Block addresses, which indicate the starting offset of the block in the event record buffer of the pipe, are stored in these objects.

Figure D-2: The Monitor Object (data portion: number of pipes and pipe length; capability portion: ReceptacleMB, DeclareMB, DeleteMB, and Pipe 1 through Pipe N).

The presence of a block address on the stack indicates that the block contains no useful information. A block address on the queue indicates that the block contains a valid event record. An address on neither indicates that the event record is being entered or removed. When the pipe is first initialized, the queue contains no entries, and the stack contains the addresses of all the blocks in the event record buffer. To add an event record to a pipe, the address of an event record block is first popped off the stack. The event record, in a suitable format, is then written into the buffer, and the address is pushed onto the back of the queue. This set of actions is reversed to remove an event record: the process pops an address from the front of the queue, copies the event record into an internal buffer, and then pushes the address of the block back onto the stack. A stack is used because pushes are slightly more efficient (four instructions versus six for a queue); a queue must be used for the addresses of the filled blocks in order to implement a first-in-first-out strategy.
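As an illustration only, the add and remove operations just described might be sketched as follows, in the Lisp notation of the remote monitor. The functions atomic-pop, atomic-push, pipe-stack, pipe-queue, write-block, read-block, and increment-missed-events are hypothetical stand-ins for the microcoded StarOS stack and queue operations and for the buffer accesses; this is a sketch of the protocol, not the actual Bliss/11 or microcode implementation.

(defun add-event-record (pipe record)
  ;; Pop a free block address off the pipe's stack; nil means the pipe is full.
  (let ((addr (atomic-pop (pipe-stack pipe))))
    (cond ((null addr)
           (increment-missed-events pipe))          ; discard the record (or busy-wait)
          (t
           (write-block pipe addr record)           ; format the record into the buffer
           (atomic-push (pipe-queue pipe) addr))))) ; publish the filled block on the queue

(defun remove-event-record (pipe)
  ;; Pop a filled block address off the front of the pipe's queue; nil means no record is ready.
  (let ((addr (atomic-pop (pipe-queue pipe))))
    (when addr
      (let ((record (read-block pipe addr)))        ; copy the record into an internal buffer
        (atomic-push (pipe-stack pipe) addr)        ; return the block to the free stack
        record))))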

Since the operating system guarantees indivisibility of the push and pop operations, the store and remove event record operations described above are also indivisible with respect to an individual event buffer. If an error occurs when a block address is on neither the stack nor the queue, then in the worst case the address is never pushed, and the use of the block is effectively lost (this condition is difficult to detect, because the process may be waiting to be rescheduled to finish processing the event). However, the information contained in the other blocks is not corrupted in any way as the result of a block address being lost. Unfortunately, since the buffer is shared by all processes having capabilities for the pipe, valid event records can be corrupted if addresses not within the block indicated by the popped address are written. Since the code for the store and remove operations is small, and only needs to be written once, the probability of an error occurring due to incorrect logic is rather small.

Figure D-3: The Structure of a StarOS Pipe (data portion: record size, missed-event count, and the event record buffer; capability portion: the stack and the queue).

D.4. Receptacles and Sensors

A StarOS receptacle (see Figure D-5) is an object containing a few words of data and three capabilities. The Name field contains the remote name for the object containing the recep- tacle, and the Target field contains the internal name (i.e., the capability) for the same object. The DeclareMB field contains a capability to another StarOS object, a mailbox, where the sensor can send capabilities to be placed in StarMon's internal tables. The DeclareMB field of the Monitor Object references the same mailbox. Capabilities are sent to MonProc because the event records, sent to the StarMon Accountant, may not contain capabilities, primarily for efficiency reasons. All three fields are not functionally necessary, and are present simply to make event record storage a faster operation. Each receptacle also contains a capability in the Pipe field for one of the pipes in the monitor object. The Lock field is used for arbitrating modifications to this receptacle by multiple MonProc processes.

Space must be provided in each object to be monitored; the StarOS implementation requires one data word and one capability (a total of six bytes). The data word simply indicates whether the capability is present and is included for efficiency, since checking for a null receptacle capability is a relatively expensive operation (16 instructions), whereas checking for a 0 word is much cheaper (3 instructions). An average receptacle of 64 events is only 26 bytes long. Since the average size of a StarOS object is approximately 1800 bytes, this represents a memory overhead of less than 2%. Sensors are implemented as macro calls which expand into inline code (see section 8.2.1).

Figure D-4: Structure of an Event Record (length, sensor name, internal event number, object name, an optional timestamp, and optional user-defined domains).

When a sensor is invoked, it first checks the data word in the object to determine if there is a receptacle (since the macro invocation takes place within the type manager for the object containing the receptacle, the position of the receptacle, i.e., the offsets for the data word and the capability, is known). Determining that there is no receptacle requires 14.7 microseconds (equivalent to two store operations). If there is a receptacle, then it is made addressable and the enable bit for that event is tested. If the event was disabled, the sensor took 165 microseconds, equivalent to 23 store operations. This time depends strongly on how often the sensor executes, due to interactions with cached object descriptors. It is anticipated that receptacles will reside in the descriptor cache most of the time.

Otherwise, the event is enabled, and the event record is written into the pipe as described above. If there are no addresses on the stack (indicating that the pipe is full), the sensor can choose to busy-wait or to discard the event record, (indivisibly) incrementing the MissedEvents field of the pipe or the receptacle. Note that it is unnecessary to store all three names (performer, object, initiator) in the event record, since one of these names will always be identical to the Name stored in the receptacle. The resident monitor is responsible for placing this name in the event record before sending it to the remote monitor.
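The checks described in the last two paragraphs can be summarized in the following sketch. It is illustrative only: the real sensor is an inline Bliss/11 (or microcoded) macro, and receptacle-present-p, receptacle-of, event-enabled-p, receptacle-pipe, and make-event-record are hypothetical helpers standing in for the data-word test, the receptacle capability, the enable bit, the pipe capability, and event record formatting.

(defun sensor-fire (object event-number domains)
  ;; Cheap test of the data word: is a receptacle attached to this object?
  (when (receptacle-present-p object)
    (let ((receptacle (receptacle-of object)))          ; make the receptacle addressable
      (when (event-enabled-p receptacle event-number)   ; test the enable bit for this event
        (add-event-record (receptacle-pipe receptacle)
                          (make-event-record event-number domains))))))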

In order to reduce the overhead of the sensor macro, and to take advantage of the multiple processors in Cm*, the formatting, storage, and transmission of the event record is broken into two stages, the first performed by the sensor macro, and the second performed by a process of the resident monitor asynchronously with the execution of the sensor.

Figure D-5: The Structure of a StarOS Receptacle (data portion: lock, name, missed-event count, number of enable bits, and the enable bits themselves; capability portion: Target, DeclareMB, and Pipe).

The sensor macro is responsible for checking the enable switch and writing the event record into the pipe's event record buffer. The resident monitor is responsible for removing event records from all the pipes, collecting them into packets, and handling the protocol for transmitting these packets across the network to the remote monitor. Since the type manager and the StarMon Accountant are probably running on different processors, the overhead of event generation is only that incurred by the sensor, which is optimized to the possible detriment of StarMon's efficiency. This arrangement is possible because StarOS allows shared memory, specifically, the event buffers in the pipes, between processes in different task forces (the sensor and StarMon may be in different task forces). Although Medusa, the other operating system on Cm*, does not allow shared memory between processes in different task forces, it does provide a pipe mechanism which can be used to send events from the task force to the resident monitor.

The store event record operation provided by the resident monitor is implemented as the sensor macro, and the access receptacle operation provided by the type manager is implemented by passing the word and capability offsets to this macro (see section 5.2). Note that these optimizations in the event storage mechanism do not constitute typing violations; rather, they imply efficient implementations of operations on a typed object. The type managers are not required to be separate processes; in this case, the operations of the resident monitor on receptacles are implemented partially as macros within other type managers, and also as code in a collection of processes. Although these changes are consistent with the type model, they have altered the semantics of the store operation: instead of the store event record operation of the resident monitor being invoked with the event record data as parameters, the event record is stored in the pipe referenced by the receptacle at the time the event occurred, and asynchronously removed by the resident monitor at a later time. The primary advantage is that as little work as possible is performed synchronously with the execution of the operation being monitored, lowering the overhead of monitoring that operation; the disadvantage is that an arbitrary amount of time elapses between the occurrence of the event and the recording of the event at the resident monitor. If this time lapse is unacceptable, then an event notification mechanism must be devised. Event notification may also be necessary to inform the StarMon Accountant that a pipe is becoming too full.

Several notification actions were considered in the StarOS environment, including sending a data message to a notification mailbox, sending a capability message containing the relevant receptacle to the mailbox, invoking a StarMon process, or setting a field in a shared data structure. The final strategy adopted for notification shares characteristics with each of these alternatives. The StarMon Accountant continuously cycles through the pipes, checking to see if any new event records were added. Since this check is very fast (the StarMon Accountant simply tries to pop an address off the queue of each pipe), the delay is comparable to the delays induced by the alternatives listed above. All of these notification mechanisms fail if the event generation rate is high, since, in that case, the StarMon Accountant will be continually struggling just to keep up with the sensors, emptying the pipes before the event record buffers become full. Ultimately, whether the resident monitor invocation or the store-in-receptacle/notify-resident-monitor mechanism is used depends on the relative efficiencies of the various operations under the given operating system. The experience with StarOS and Medusa has indicated that the time lapse between event occurrence and notification was not an important issue in most cases, and was acceptably short in the cases where it did matter.
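
To make the division of labor concrete, the following sketch (in Python, not the Bliss/11 and microcode of the actual implementation) models a pipe shared between a sensor and the polling Accountant. The names Pipe, store_event, and drain, the list-based buffer, and the packet size are inventions for this illustration; only the overall protocol (pop a free slot, write the record, count misses when full, drain asynchronously into packets) follows the description above.

    from collections import deque

    class Pipe:
        """Toy model of a StarOS event pipe: a bounded buffer plus a free-slot stack."""
        def __init__(self, capacity):
            self.free = list(range(capacity))   # stack of free buffer slots ("addresses")
            self.filled = deque()               # queue of slots holding event records
            self.buffer = [None] * capacity
            self.missed_events = 0
            self.enabled = True

    def store_event(pipe, event_record):
        """Synchronous, sensor-side half: check the enable switch, grab a free slot, write."""
        if not pipe.enabled:
            return
        if not pipe.free:                       # pipe full: discard and count the loss
            pipe.missed_events += 1
            return
        slot = pipe.free.pop()
        pipe.buffer[slot] = event_record
        pipe.filled.append(slot)

    def drain(pipes, packet_size=4):
        """Asynchronous, Accountant-side half: cycle through the pipes, popping records
        and collecting them into packets for transmission to the remote monitor."""
        packet = []
        for pipe in pipes:
            while pipe.filled:
                slot = pipe.filled.popleft()
                packet.append(pipe.buffer[slot])
                pipe.free.append(slot)          # return the slot for reuse
                if len(packet) == packet_size:
                    yield packet
                    packet = []
        if packet:
            yield packet

    # Example: two sensors write into one small pipe; the Accountant later empties it.
    p = Pipe(capacity=2)
    store_event(p, ("Iteration", 41521, 1))
    store_event(p, ("Iteration", 41653, 1))
    store_event(p, ("Iteration", 41521, 2))     # pipe is full, so MissedEvents is incremented
    print(p.missed_events)                      # -> 1
    print(list(drain([p])))                     # -> one packet containing the two stored records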

There is one further advantage to the partitioning of the event storage operation into a synchronous and an asynchronous portion: self-monitoring is permitted. For instance, sensors may be placed in StarMon, even within the code which assembles packets and handles the protocol with the remote monitor, without triggering the generation of an infinite number of event records. Since a sensor does not invoke a StarMon process when storing an event record, it is possible for other parts of the operating system, in particular the scheduler, to treat the monitor as any other process, and to generate event records concerning the behavior of that process. These event records will be identical to other event records in format and potential for manipulation.

D.5. A Microcoded Sensor

A microcoded version of the sensor was designed to determine the maximum space and time efficiency possible on Cm*. The resulting algorithm is (intentionally) similar to the other microcoded operations in StarOS, in general trading longer execution time for smaller microcode storage requirements and shorter programmer time [Jones&Gehringer 80]. It would take an experienced programmer familiar with the current StarOS microcode an estimated two months to implement and test the version presented below [Vegdahl 82].

This section will merely sketch the design of the microcoded version in order to give the reader an impression of how the operation would work. The instruction is invoked (as are most of the StarOS operations) by writing the address of a parameter block to a particular address. This parameter block is assembled by a Bliss/11 macro at compile time and need not be the concern of the programmer. The components of the parameter block are shown in Figure D-6. The first two locations are set on exit from the instruction. The remainder of the parameters are identical to the ones in the other version of the sensor (all versions have identical semantics--they all write identical event records into the appropriate pipe). The Receptacle parameter contains the address of an object, as well as the offset of the receptacle capability within the object. The Domain Types component encodes the types of all the domains, with four possible types per domain. It and the DomainValues may be omitted if there are no domains.

The size of this version, shown in Table 8-1, is the sum of the size of the parameter block and the size of the code necessary to load the domain values into the parameter block and invoke the microcode.

The time required by this operation is harder to derive. For existing microcoded operations that have been measured, it is possible to predict the execution time by simply multiplying the number of memory references executed by the operation by 10 microseconds [Jones&Gehringer 80]. Applying this approximation to the microcoded operations most similar in complexity to the sensor operation results in calculated execution times within 5% of the actual times in all but one case.
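
As a worked illustration of this rule of thumb, the estimate is a single multiplication; the two constants come from the text, while the reference count below is a made-up example:

    # Back-of-the-envelope estimate of a microcoded operation's execution time,
    # using the 10-microsecond-per-memory-reference rule quoted above.
    US_PER_MEMORY_REFERENCE = 10        # empirical constant from the text
    US_PER_EXTERNAL_REFERENCE = 4.7     # the external-reference component, also from the text

    def estimate_us(memory_references):
        return memory_references * US_PER_MEMORY_REFERENCE

    refs = 30                                         # a hypothetical operation
    print(estimate_us(refs))                          # -> 300 microseconds estimated
    print(refs * US_PER_EXTERNAL_REFERENCE)           # -> 141.0 microseconds of external references,
                                                      #    roughly half of the estimate, as assumed below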

A crucial assumption in this approximation is that the execution time is evenly split between external memory references (at 4.7 microseconds each) and internal processing. The primary indication that this may not be the case with the sensor operation is the complexity of the operation word in the parameter block, which requires extensive unpacking. Note that there is not a simple space-time trade-off here, because an attempt to reduce the execution time by placing the fields in separate words in the parameter block, thereby increasing the space requirements, might actually increase the execution time due to an increased number of memory references to access the parameter block.

[Figure D-6: The Parameter Block for the Sensor Instruction. The block contains, at the byte offsets shown, an Error Code (0), Error Data (2), the Receptacle (4), the Operation word (6), the Domain Types (8), and the Domain Values. The Operation word packs the Event # together with several small fields: N, the number of domains (0-7); D, declare object?; E, the enable action (0: assume enabled, 1: don't touch the Store Enable, 2: clear the Store Enable, 3: check the One-Time Enable); T, timestamp?; and I, internal?]
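
The operation word packs these small fields into a single 16-bit word, which is what makes its unpacking costly. The field widths and bit positions in the sketch below are assumptions (the scanned figure does not preserve them); only the field names are taken from Figure D-6.

    # A hypothetical packing of the sensor operation word (field widths are assumptions;
    # only the field names come from Figure D-6).
    #   bits 0-7 : event number        bits 8-10: N, number of domains (0-7)
    #   bit  11  : D, declare object?  bits 12-13: E, enable action (0-3)
    #   bit  14  : T, timestamp?       bit  15  : I, internal?

    def pack_operation(event, n_domains, declare, enable, timestamp, internal):
        return (event & 0xFF) | ((n_domains & 0x7) << 8) | (int(declare) << 11) \
               | ((enable & 0x3) << 12) | (int(timestamp) << 14) | (int(internal) << 15)

    def unpack_operation(word):
        return {
            "event":     word & 0xFF,
            "n_domains": (word >> 8) & 0x7,
            "declare":   bool((word >> 11) & 0x1),
            "enable":    (word >> 12) & 0x3,    # 0: assume enabled ... 3: check the one-time enable
            "timestamp": bool((word >> 14) & 0x1),
            "internal":  bool((word >> 15) & 0x1),
        }

    w = pack_operation(event=6, n_domains=1, declare=False, enable=0, timestamp=True, internal=False)
    assert unpack_operation(w)["event"] == 6 and unpack_operation(w)["timestamp"]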

D.6. Efficiency

To determine the relative costs of the individual features of the sensor operation, a variety of sensors were specified and measured. Each sensor was run at least 10,000 times on a dedicated processor. The code was changed slightly so that the pipe never filled up, in effect isolating the efficiency of the sensor from the rest of StarMon. All the sensor code was local to the processor, and multiplexing of the processor among several processes was disabled. In addition, constraints on the address space were relaxed to allow the sensors to execute as efficiently as possible (this is essentially another space-time tradeoff). The complete sensor descriptor file for the test sensors is shown in Figure D-7.

(comment Sensor definitions for determining speed of sensor calls)

(taskforce (name RSTest) (Simonfilename RSSimon) )

(sensorprocess (name RSProcess) (requirefilename RSPSen) (rsslot TestSlot) (clockpage ClockPage) )

(objecttype (name RSObject) (rsslot 0) (wordoffset 0) (requirefilename RSOSen) )

(event (name Shortest) (location RSProcess) (minortype sensortraced) (assumeenabled t) (waittime -1) (spacetimeratio 0) )

(event (name Short) (location RSProcess) (minortype sensortraced) (waittime -1) (domains (domain (name ADomain) (type Integer)) ) (spacetimeratio 0) (timestamp t) )

(event (name Medium) (location RSProcess) (minortype sensortraced) (waittime -1) (domains (domain (name ADomain) (type Integer)) (domain (name DblDomain) (type DoubleInteger)) ) (spacetimeratio 0) (timestamp t) )

(event (name Long) (location RSProcess) (minortype sensortraced) (waittime -1) (domains (domain (name ADomain) (type Integer)) (domain (name DblDomain) (type DoubleInteger)) (domain (name StrDomain) (type String)) ) (spacetimeratio 0) (timestamp t) )

... rest of the file

Figure D-7: Sensor Description File for Measuring Sensors

Appendix E An Extended Example

The purpose of this appendix is to illustrate the steps required to monitor an application program. Although the implementation of the monitor is far from complete, this appendix will attempt to demonstrate that the major functions have been implemented, and that it is possible to monitor an application using the relational paradigm. A second purpose is to stress the ease of installing sensors and retrieving and computing high-level monitoring information.

The steps necessary to monitor an application are as follows:

• create a sensor description file (SDF);

• send the SDF through the preprocessor (DFPre) to obtain several intermediate files;

• make minor changes to the source code of the application program to add the sensors;

• compile, link, and load the program onto Cm*;

• invoke the remote monitor; and

• present the remote monitor with a query, which is then compiled into an update network and executed.

All relevant files and terminal sessions will be reproduced here. It is important to note that these listings have not been edited or altered, except as explicitly noted. The reader can be assured that nothing is going on "behind the curtain". Unfortunately, this decision implies that various peculiarities will be present in the displays. Some of the peculiarities are the result of features irrelevant to the present discussion; others could be eliminated through straightforward changes in the code. The reader is requested to be patient and to ignore all manner of quirks in the listings.

E.1. Sensor Description File Processing

The application to be monitored, the partial differential equation program, or PDE, has already been discussed in section 3.7. The PDE is a collection of Bliss/11 [Wulf et al. 75b] and Task [Jones&Schwans 80] code. The Bliss/11 portion contains the algorithms to be used in the individual processes, and the Task portion specifies how data structures are to be shared among the processes and where the components are to be placed among the processors in Cm*. Task is currently specific to the StarOS operating system [Jones et al. 78]; there is a somewhat analogous language and compiler for the Medusa operating system [Ousterhout et al. 80]. For the remainder of this discussion, the use of StarOS will be assumed.

The first step in monitoring an application is to determine what sensors are desired, where these sensors are to be placed, and what attributes each sensor should have. This information is placed in a sensor description file (see section 8.2.1). Figure E-1 contains the complete listing of the sensor description file (SDF) for the PDE. This file is generated by the user.

(taskforce (name PDE) (simonfilename pdesen) )

(sensorprocess (name PDESolver) (functionnumber 2) (requirefilename solsen) (rsslot SPLastCapa) )

(event (name Iteration) (location PDESolver) (timestamp t) (domains (domain (name iternum) (type integer)) ) (minortype sensortraced) )

Figure E-1: PDE Sensor Description File

The SDF for the PDE contains descriptions for three objects. The taskforce object has two attributes: a name (PDE) and a simonfilename (pdesen). The latter is the name of the file where the remote description for this SDF will be placed (see below). There is one sensorprocess object for each process containing a sensor. The requirefilename attribute specifies the name of the require file to be generated by DFPre, containing Bliss macros for the sensors contained in the PDESolver process. The other two attributes are specific to StarOS and will not be discussed here. The event object describes a particular sensor called Iteration, located in the PDESolver process. This sensor will generate event records containing a timestamp and one user-defined integer domain called iternum. A minortype of sensortraced specifies that this sensor will be traced (as opposed to sampled) and that it will be enabled by a switch in the receptacle associated with the process containing the sensor.

The SDF contains virtually all the information required by the monitor. It is read by DFPre, which produces a remote description file and one or more require files (see Figure 8-5). DFPre is a MacLisp program running on a DECSystem-10 (all software development for Cm* is done on the DECSystem-10, with the object code sent to Cm* over the Ethernet). The dialogue between DFPre and the user is illustrated in Figure E-2. As with all dialogues in this appendix, responses by the user are underlined.

After initializing itself, DFPre asks for the description file type. The "s" response instructs DFPre to read the description file format for StarOS sensor descriptions; a portion of that file is shown in Figure E-4. The name of the sensor description file is requested. The SDF is read in, with DFPre printing the names of the objects as they are encountered. The require file print command results in the generation of require files, one per sensor process (the names of the require files appear in the SDF as requirefilename attributes). The OBJ output command results in the generation of the remote description (whose name is the value of the simonfilename attribute).

The one require file produced from the PDE sensor description file is shown in Figure E-3. The primary component of this file is the definition of the ITERATIONSENSOR macro, defined to be a call of the highly parameterized StarMonSensor macro. The details of this macro are not relevant here; the point to be made is that by preprocessing the SDF, a sensor can be exactly configured to its specification in the SDF.
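
As an illustration of the kind of translation DFPre performs, the following Python sketch turns a simplified event description into a macro-definition line. It is not DFPre itself: the real program is written in MacLisp, reads the full SDF syntax of Figure E-1, and emits the much richer StarMonSensor parameter list seen in Figure E-3; the dictionary representation, the TYPE_CODES table, and the parameter ordering below are invented for the example.

    # Toy translation of one (event ...) description into a sensor macro definition.
    # The event is represented as a dictionary instead of the real s-expression syntax.
    TYPE_CODES = {"integer": "INTTYPE", "doubleinteger": "DBLTYPE", "string": "STRTYPE"}

    def sensor_macro(event):
        name = event["name"].upper()
        domain_args = ", ".join(TYPE_CODES[d["type"]] for d in event.get("domains", []))
        params = ["Window", "RSEXISTS", event["rsslot"], "1" if event.get("timestamp") else "0"]
        if domain_args:
            params.append(domain_args)
        return "MACRO %sSENSOR(Window, Value) = StarMonSensor(%s)$;" % (name, ", ".join(params))

    iteration = {"name": "Iteration", "rsslot": "SPLASTCAPA", "timestamp": True,
                 "domains": [{"name": "iternum", "type": "integer"}]}
    print(sensor_macro(iteration))
    # -> MACRO ITERATIONSENSOR(Window, Value) = StarMonSensor(Window, RSEXISTS, SPLASTCAPA, 1, INTTYPE)$;

The point carries over from the sketch to the real system: because the macro text is generated from the description, the sensor code that ends up in the application is configured exactly to what the SDF declares, and nothing more.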

E.2. Description File Formats

As was mentioned both in the previous section and in section 8.5, the allowable classes and attributes for a sensor description are defined in a description file format (DFF) file, read by DFPre during its initialization. The DFF file for StarOS sensor descriptions is shown in Figure E-4. The important aspects to note are

• The syntax is identical to that of other description files.

• For each object (e.g., taskforce, event) defined, there are two groups of attributes specified, input attributes (those present in the input file) and output attributes (those written to the remote description), along with their types.

• DFPre can be configured easily by simply modifying the corresponding DFF file.

DFPre is similar to a language-independent parser, which is given the grammar for a language (cf. the DFF file) and a program in that language (cf. the sensor description file), producing a parse tree of the program (cf. the remote description).

r maclisp
MacLisp for MONITOR

initializing
* (preprocess)                                            run the preprocessor

Description File Preprocessor

Description File Type: M(edusa sensor) S(tarOS sensor) T(ask name) H(elp) Q(uit): s
Initializing description file format specification ...

Name of sensor description file:
Processing DSK:PDE.SDF[X335SMON] ...

PDE                                                       DFPre reading the SDF
PDESOLVER
ITERATION

Resolving forward references ...

Command: N(ew file) O(BJ output) R(equire file print) S(ummary print) H(elp) Q(uit): s

Summary statistics:

Number of sensor locations: 1.
Number of object types: 0.
Number of events: 1.

Sensors:
PDESOLVER: (ITERATION)

Object Types:

Command: N(ew file) O(BJ output) R(equire file print) S(ummary print) H(elp) Q(uit): r
Producing require files ...
Producing DSK:SOLSEN.DFS[X335SMON] ...

Command: N(ew file) O(BJ output) R(equire file print) S(ummary print) H(elp) Q(uit): o

Producing DSK:PDESEN.MIt[X335SMON] ...

Command: N(ew file) O(BJ output) R(equire file print) S(ummary print) H(elp) Q(uit): q

Figure E-2: Dialogue with DFPre

SWITCHES NOLIST; ! This file defines the sensors for the PDESOLVER activity.

require rs.dfs[x335smon];
external RSExists;

MACRO
  MakeInternalRS(Pipe, Window, TempSlot) =
    (EXTERNAL MakeReceptacleSet;
     MakeReceptacleSet(0, RSExists, SPLASTCAPA, Pipe, 1, Window, TempSlot) )$,
  ITERATIONSENSOR(Window, N0007) =

    StarMonSensor(RSEXISTS, SPLASTCAPA, 1, 1, 1, 2, 1, 1, 1, Window, INTTYPE, N0007)$;
NumberOfSensorEvents = 1$;
SWITCHES LIST;

Figure E-3: Definitions File Produced From PDE SDF

E.3. Changes to the Source Code

In addition to writing an SDF, the user must also make a few minor changes to the source code. Three lines must be added to the Task portion: one specifying that a remote description (placed in the file named PdeSen by DFPre) is to be included in the task force,

    SensorDescription: New Basic (Source=("PdeSen"));

and two specifying that code for creating receptacles (the MakeRS routine) should be included in the process,

    "MakeRS[X335SMON](GO)",
and
    "MakeRS[X335SMON](CP)",

Minor changes to the task compiler and process creator in StarOS would obviate the need for these modifications.

Three lines must also be added to the Bliss portion of the PDE. The first line is added at the beginning of the program, instructing the compiler to read the require file generated by DFPre:

    require SolSen.DFS;

The second line is placed in the initialization code for the process, causing a receptacle to be created for the process:

    MakeInternalRS(1, LocalWindow, SPWorkWindow);

This line could be eliminated through a minor modification in the process creator. The third line is the sensor itself, placed in the desired position, in this case, at the end of the iteration loop:

    IterationSensor(LocalWindow, .niter[.pci]);

LocalWindow is a temporary storage location for use by the sensor, and .niter[.pci] is the value for the user-defined domain called iternum. At this point, the program is compiled and linked into an executable load module, ready to be loaded into Cm*.

; Sensor Descriptor File (SDF) format specification for StarOS

(general (filedefault ".sdf") (procnodefunction StarOSprocnode) (commandfunction StarOSDoCommand) )

(fieldtypes (domaintype string integer doubleinteger) (mtype objecttraced sensortraced receptaclesampled messagesampled invocationsampled) )

(input (specifies taskforce) (name atom) (simonfilename atom) ; default: name (version fixp) ; default: 0 (documentation anything) (doc anything) )

(output (specifies taskforce) (systemname string) (year integer) (month integer) (day integer) (hour integer) (minute integer) (version integer) (sdfilename string) )

(input (specifies event) (name atom) (location sensorprocess) (object objecttype) (timestamp boolean) (domains domain) (declareobject boolean) (minortype mtype) (assumeenabled boolean) (checkonetime boolean) (assumeonetime boolean) (spacetimeratio fixp) (waittime fixp) (documentation anything) (doc anything) )

(output (specifies event) (location sensorprocess) (object objecttype) (timestamp boolean) (domains domain) (enableindex integer) (minortype attributename) (internal boolean) )

... remainder of the file

Figure E-4: The Description File Format (DFF) File for StarOS Sensor Descriptions

The changes to the code for the PDE were examined here in such detail to emphasize how little needs to be done to monitor a program. The user had to write an SDF (16 lines long) and add 6 lines to the program, 4 of which were functionally unnecessary. A highly efficient sensor (discussed in appendix D.6) and a remote description of the SDF were created automatically by DFPre, and the remote description loaded with the rest of the code.

Before discussing how the rest of the monitor interfaces with the running program, we must examine how the monitor is started up.

E.4. Initializing the Monitor

As described in section 8.1, the monitor consists of two components: a remote monitor, executing on a Vax, and a resident monitor, executing on Cm*. The remote monitor is called Simon, and the resident monitor for StarOS is called StarMon. When StarMon is invoked, it immediately goes into a quiescent state, awaiting a packet from Simon. This behavior is consistent with the steady-state situation, where StarMon is the slave and Simon the master (see section 8.4).

Figure E-5 illustrates the initialization sequence from the viewpoint of Simon running on a Vax. Recall that Simon is composed of three communicating processes: the parser, the TQuel compiler and update network, and the remote accountant (see Figure 8-2). Ideally Simon would present a unified user interface. However, to ease the task of debugging Simon, the process containing the compiler and update network also accepts commands from the user (the remote accountant was debugged indirectly through this process).

The simonupdate command starts up the latter two processes. The new prompt (an integer) indicates the user is now talking to FranzLisp. The initialization sequence is started with the initacc command. First, the remote accountant is located through interprocess communication (IPC) calls, and a packet is sent to StarMon. StarMon responds with an acknowledgement, followed by event records identifying the modules of the operating system. The command run 1 m instructs Simon to process incoming event records for one minute, and then wait for further commands from the user. A modification to Simon allowing it to wait for event records and user commands (via the parser process) concurrently is relatively straightforward.

Some of the event records cause Simon to display a message. For example, the message [The Loader component of the catalogue 41552 is 41414.] indicates that Simon received an event record stating that a component of the catalogue 41552 (the external name of the StarOS task force, implemented as a catalogue of operating system modules) with the string name "Loader" is the object with the external name 41414.

[shell] simonupdate                                       messages from FranzLisp
1. initacc 0
[Trying to locate Simon Accountant ...]
[Located Simon Accountant.]
[Waiting for Ethernet connection with the resident monitor ...]
[Connection established.]
[Processing the operating system taskforce(s) ...]
[The Catalogue-Module component of the catalogue 41552 is 41389.]
[The FTP-Module component of the catalogue 41552 is 41539.]
[The Garbage-Collector component of the catalogue 41552 is 41367.]
[The Loader component of the catalogue 41552 is 41414.]
[The Nucleus component of the catalogue 41552 is 41248.]
[The Object-Manager component of the catalogue 41552 is 41230.]
[The Process-Module component of the catalogue 41552 is 41240.]
[The PUP-Module component of the catalogue 41552 is 41499.]
[The Reconfiguration component of the catalogue 41552 is 41259.]
[The Sensor-Description component of the catalogue 41552 is 41544.]
[The Six12-Debugger component of the catalogue 41552 is 41491.]
[The SENSOR description for the taskforce 41552 is 41546.]
[Defining the TASKFORCE STAROS.]
    ... other objects and domains are defined
[Defining the DOMAIN INDEX.]
2. run 1 m
[CheckPoint received: Time: 275951.]
[Initializing event number 1: DESCRINTEGER.]
[Initializing event number 2: DESCRSTRING.]
[Initializing event number 3: DESCREND.]
[Initializing event number 4: COMPONENTSTRING.]
[Initializing event number 5: COMPONENTINTEGER.]
[Finished the STAROS taskforce SENSOR definitions.]
[The Rick component of the catalogue 41555 is 41581.]
3. showstatus
Status of the Remote-Resident Monitor Interface

There were 141 event records processed.

The defined task forces are:
Taskforce tf-00014: STAROS
Sensor Description File: (Definition in DSK:STAROS.SDF[X335SMON] (version 1), processed on July 18, 1982, at 15:12):
  DESCRINTEGER ((ATTRIBUTE: INTEGER) (OBJECT: INTEGER) (VALUE: INTEGER) )
  DESCRSTRING ((ATTRIBUTE: INTEGER) (OBJECT: INTEGER) (VALUE: STRING) )
  DESCREND (Timestamped) ()
  COMPONENTSTRING (Timestamped) ((COMPONENT: DOUBLEINTEGER) (NAME: STRING) )
  COMPONENTINTEGER (Timestamped) ((COMPONENT: DOUBLEINTEGER) (INDEX: INTEGER) )
Components:
  Catalogue-Module 41389
    ... rest of the components
  Six12-Debugger 41491

Figure E-5: Initialization Dialogue from Simon's Perspective

After the components of the operating system catalogue have been sent, StarMon sends the remote description for the operating system itself. Finally, the components of the catalogue of users are sent, indicating one user. The showstatus command illustrates that one task force has been defined (the StarOS task force), containing five sensors and eleven components. This description was in fact bootstrapped over the Ethernet: the first three sensors listed were the ones which sent the description in the first place. Note that the initialization sequence required 141 event records to be transferred to Simon from StarMon.

4. (GetComponents Rick)
t
5. run 10 s
[The Library component of the catalogue 41581 is 41552.]
[The M component of the catalogue 41581 is 41647.]
[The P component of the catalogue 41581 is 41814.]
[The User component of the catalogue 41581 is 41555.]
6. (GetComponents P)
t
7. run 30 s
[The PDEModule component of the catalogue 41814 is 41787.]
[The SensorDescription component of the catalogue 41814 is 41786.]
[The SENSOR description of the taskforce 41814 is 41786.]
[Defining the TASKFORCE PDE.]
    ... other objects and domains defined
[Initializing event number 6: ITERATION.]
[Finished the PDE taskforce SENSOR definitions.]
8. showstatus
Status of the Remote-Resident Monitor Interface

There were 196 event records processed.

The defined task forces are:
Taskforce tf-00014: STAROS
Sensor Description File:
  Catalogue-Module 41389

  Six12-Debugger 41491

Taskforce tf-00058: PDE
Sensor Description File: (Definition in DSK:PDE.SDF[X335SMON] (version 0), processed on July 24, 1982, at 16:32):
  ITERATION (Timestamped) ((ITERNUM: INTEGER) )
Components:
  PDEMODULE: 41787
  SENSORDESCRIPTION: 41786
9.

Figure E-6: Retrieving the Information Concerning the PDE

At this point, the initialization sequence has completed, and Simon knows about the sensors located in the operating system, including those sensors within StarMon. The PDE is loaded into StarOS, and the dialogue shown in Figure E-6 is followed. The components sensor is sampled by giving a GetComponents command; the information returns in subsequent event records, as shown. The argument of this command is the external name for a catalogue, in this case, the user catalogue associated with Rick (object 41581). This catalogue contains, among other things, all modules loaded by this user. Since the PDE had been loaded under the name "P", its components are requested. StarMon notices that one of the components has the name "SensorDescription"; it assumes the component contains a remote description, and it automatically sends the information in this component to Simon. The showstatus command now lists two task forces, the StarOS task force and the PDE task force. Since the PDE task force is less complex, only about 50 event records were required to send its description.

E.5. Query Processing

Simon now knows about two task forces containing six sensors. Simon fully supports the conceptualization of a temporal database, so these sensors correspond to primitive relations which can be referenced in TQuel queries. Since the ITERATION sensor is sensortraced, there is an associated primitive period relation (recall that traced events are automatically converted to period relations consisting of two domains, the process containing the sensor and one user-defined domain):

    ITERATION (PROCESS ITERNUM)

This relation is used in the example queries in section 3.7, written as a macro in Figure E-7. This macro has two parameters, specifying the external names for the processes participating in the query. The backslash at the end of each line indicates that the macro extends to the next line (macros are usually terminated by the end of the line).

{define; examplequery $1 $2; \
range of A is Iteration \
range of B is Iteration \
retrieve AOverB (Diff = B.IterNum - A.IterNum) \
    where A.Process = $1 and B.Process = $2 \
    and A.IterNum > B.IterNum + 1 \
\
range of AB is AOverB \
retrieve Over (Percent = AvgC(AB) * 100) \
\
retrieve Catch \
    where A.Process = $1 and B.Process = $2 \
    and A.IterNum = B.IterNum \
    when A.Start ; B.Start \
    at B.Start \
\
display Over, Catch \
}

Figure E-7: PDE Query Contained in the file demoquery
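
The conversion of the traced ITERATION events into the period relation referenced by this query can be pictured with a short sketch. This is an illustration in Python, not the monitor's eventtoperiod operator node; it assumes event records of the form (timestamp, process, iternum) and assumes that a tuple's values remain valid from its own timestamp until the next event recorded for the same process, with the last value's period left open.

    # Sketch of converting a traced event stream into period tuples (start, stop, process, iternum).
    # Assumption: a value remains valid from its event's timestamp until the next event
    # recorded for the same process; the final value's period is left open (stop = None).
    def event_to_period(events):
        last = {}                                    # process -> (start, iternum)
        periods = []
        for timestamp, process, iternum in sorted(events):
            if process in last:
                start, value = last[process]
                periods.append((start, timestamp, process, value))
            last[process] = (timestamp, iternum)
        periods.extend((start, None, process, value)
                       for process, (start, value) in last.items())
        return periods

    events = [(100, 41521, 1), (130, 41521, 2), (120, 41653, 1)]
    for p in event_to_period(events):
        print(p)
    # (100, 130, 41521, 1)
    # (130, None, 41521, 2)
    # (120, None, 41653, 1)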

This query is parsed by the process invoked by the simonmonitor command (see Figure E-8). This program was derived from the terminal interface and parser for Ingres. It therefore shares many nice features with the Ingres system, including a command to invoke the editor, a complete macro facility, command files, etc. Commands are preceded with a backslash; the "\t" command prints the day, date, and time; the "\s" command escapes to the Unix shell, the command interpreter [Bourne 78], where other programs can be run; and the "\i" command reads a file (in this case, the file containing the examplequery macro) into the workspace. Text not preceded by a backslash (e.g., examplequery 41521 41653) is inserted directly into the workspace. 41521 and 41653 are external names of PDE processes; these names were obtained by ad hoc means,28 although in the future there will be a sampled sensor in StarMon for finding the processes associated with a specified task force. The "\g" command causes the contents of the workspace to be evaluated and a parse tree written to the file named parsetree, and the "\q" command causes an exit from the program. The parsetree file (containing 74 nodes) is then displayed on the terminal (the cat program in Unix displays one or more files on the terminal), and the shell is exited using the ^D key, bringing us back to the process containing the TQuel compiler and the update network.

9. shell                                                  invoke the shell (the user interface for Unix)
[shell] simonmonitor
SIMON version 0.2 (7/26/82) login
Wed Jul 28 19:58:43 1982
go
*\t
Wed Jul 28 19:59:10 1982
*\i demoquery
continue
*examplequery 41521 41653
*\g
Executing . . .

continue
*\r
go
*\q
SIMON version 0.2 (7/26/82) logout
Wed Jul 28 20:01:21 1982
goodbye rts -- come again
[shell] cat parsetree
(2 12 13 ITERATION 0 0)
(3 0 35 A 2 0)
    ... the rest of the parsetree
(;; 0 34 nil 74 0)
[shell] ^D
10.

Figure E-8: Parsing the Query

The parsetree file contains the parse tree in a linearized form. The nodes in the tree are listed one per line. The first integer is the node's index; the second integer specifies the type of node; the next two values specify auxiliary information in the node, and the last two integers specify the indices of the left and right sons of the node. This file is generated by a bottom-up scan of the parse tree, and is similar in use to the LG (linear graph) files developed in the PQCC project [Leverett et al. 80].
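
Reconstructing the tree from this linearized form takes only a few lines. The sketch below is illustrative rather than the actual FranzLisp code in Simon; it assumes, as the sample lines suggest, that a son index of 0 means "no son".

    # Rebuild a parse tree from its linearized form: one node per line, each line holding
    # (index, node-type, aux1, aux2, left-son-index, right-son-index), with 0 taken to mean "no son".
    def read_parsetree(lines):
        nodes = {}
        for line in lines:
            index, ntype, aux1, aux2, left, right = line.strip("() \n").split()
            nodes[int(index)] = {"type": ntype, "aux": (aux1, aux2),
                                 "left": int(left), "right": int(right)}
        return nodes

    def subtree(nodes, index):
        if index == 0:
            return None
        node = nodes[index]
        return (node["type"], subtree(nodes, node["left"]), subtree(nodes, node["right"]))

    sample = ["(2 12 13 ITERATION 0 0)", "(3 0 35 A 2 0)"]
    nodes = read_parsetree(sample)
    print(subtree(nodes, 3))      # -> ('0', ('12', None, None), None)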

28 The external names of processes were obtained when the PDE task force was invoked on Cm*. Each process created its receptacle and sent it to StarMon, which then extracted the external name of the process from the receptacle and displayed it on the terminal.

The TQuel compiler is invoked with a processquery command in FranzLisp, resulting in an update network for the query (see Figure E-9). This command reads the node descriptions from the file named parsetree, constructs an internal parse tree, and generates an update network for the query. This update network is specified by a sequence of primitive constructor functions (createaccess, createop, and link--see section 6.5). Since automatic enabling of sensors has not yet been implemented, the correct sensors must be enabled by the user through commands on the Vax. The enableevent command takes two arguments, a sensor name and an external object name. There is an analogous disableevent command.29
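
The flavor of these constructor functions can be conveyed with a small sketch. The real primitives are FranzLisp functions building operator and access node instances (section 6.5); the Python below strips them down to the essential idea of wiring up a graph of nodes through which tuples are pushed, and its createop signature and fire protocol are simplifications invented for the illustration.

    # Minimal illustration of the update-network constructor primitives.  Node behavior is
    # reduced to a function applied to each incoming tuple; None means "tuple eliminated".
    class Node:
        def __init__(self, name, fn=None):
            self.name, self.fn, self.successors = name, fn, []
        def fire(self, tuple_):
            out = tuple_ if self.fn is None else self.fn(tuple_)
            if out is not None:
                for succ in self.successors:
                    succ.fire(out)

    def createaccess(relation, name):            # access node: entry point for one relation
        return Node(name)                        # (the relation name is only recorded in the real system)

    def createop(fn, name):                      # operator node: applies fn to each tuple
        return Node(name, fn)

    def link(source, dest):                      # connect source's output to dest's input
        source.successors.append(dest)

    # Tiny network: access -> selection (Process = 41521) -> display.
    access = createaccess("ITERATION", "access-1")
    select = createop(lambda t: t if t[0] == 41521 else None, "selection-1")
    display = createop(lambda t: print("Display:", t) or t, "display-1")
    link(access, select)
    link(select, display)
    access.fire((41521, 7))                      # printed by the display node
    access.fire((41653, 7))                      # eliminated by the selection node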

To illustrate tuples flowing into the network, a few primitive constructor functions are executed by the user. The PDE is then started, causing event records to be sent across the Ethernet and converted into tuples which flow through the update network until they encounter a display operator node, causing a message to be printed on the terminal. Finally, the sensors must be disabled by the user through commands on the Vax.

The actual event records flowing over the Ethernet from Cm* were not interesting, so a set of 25 event records were constructed to produce interesting results. The fakeaccess functions (see Figure E-10) were loaded, and the testupdate command executed to read the test event records from a file and send them through the update network. The input tuples resulted in 7 output tuples. Recall that Catch is an event relation with no explicit domains, and that Over is a period relation with one temporal domain called Percent. The value of this domain at, say, time 300 can be derived from the tuple valid during the time [285-316]:

Percent = (10000 + 100 * (300-285))/(277 + 1* (300-285)) = 39.4%
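
The same linear form can be evaluated anywhere inside the tuple's period; the following check (plain arithmetic on the expression above, under the assumption that its four coefficients are constant over [285, 316]) reproduces the 39.4% figure and shows how Percent varies across the period:

    # Evaluate the Percent temporal domain of the Over tuple valid during [285, 316].
    # The four coefficients are exactly those appearing in the formula above.
    def percent(t, start=285):
        elapsed = t - start
        return (10000 + 100 * elapsed) / (277 + 1 * elapsed)

    print(round(percent(300), 1))   # -> 39.4, the value quoted above
    print(round(percent(285), 1))   # -> 36.1, the value at the start of the period
    print(round(percent(316), 1))   # -> 42.5, the value at the end of the period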

The last few lines provide statistics used in analyzing the performance of the update network. The results of this analysis are discussed in the final appendix.

29 The external object names currently must be given as signed integers. Hence, the external object name 41521 is entered as -24015.

10. processquery
Processing statement 1...                                 range of A ...
Processing statement 2...                                 range of B ...
Processing statement 3...                                 retrieve AOverB ...
[(createaccess ITERATION access-00377)]
[(createop eventtoperiod etop-00378 (1 4))]
[(link access-00377 etop-00378 left)]
    ... other constructor functions
[(link applyop-00390 projection-00392 left)]
Processing statement 4...                                 range of AB ...
Processing statement 5...                                 retrieve Over ...
[(createop aggrop opAVGC-00393 (opAVGC -1 -1 -1 3 0 1))]
    ...
[(link applyop-00398 projection-00400 left)]
Processing statement 6...                                 retrieve Catch ...
[(createaccess ITERATION access-00404)]

Processing statement 7...                                 display Over ...
nil
11. enableevent ITERATION -24015
[Enabling event ITERATION (taskforce: PDE class: 2) for object 41521.]
12. enableevent ITERATION -23883
[Enabling event ITERATION (taskforce: PDE class: 2) for object 41653.]
13. (createaccess ITERATION a-1)
{(&record . nodeinstance) ... }
14. (createop display d-1 (ITERATION))
{(&record . nodeinstance) ... }
15. (link a-1 d-1)
nil
16. run 30 s
Display (d-1): ITERATION: (ITERNUM)
Display (d-1): ITERATION: (ITERNUM)

Display (d-1): ITERATION: (ITERNUM)
17. disable ITERATION -24015
[Disabling event ITERATION (taskforce: PDE class: 2) for object 41521.]
t
18. disable ITERATION -23883
[Disabling event ITERATION (taskforce: PDE class: 2) for object 41653.]
t

Figure E-9: Compiling and Running the Update Network

19. load fakeaccess
t
20. testupdate 41521 41653
[Enqueuing fake data from thesisdemo/testdata.]
Start: (6146 0)
Display (d-1): ITERATION: (ITERNUM)
    ...
Display (d-1): ITERATION: (ITERNUM)
Display (display-00433): CATCH: (START)
Display (display-00433): CATCH: (START)

Display (display-00433): OVER: (START STOP PERCENT)
Display (display-00433): OVER: (START STOP PERCENT)
    ...
Display (display-00433): OVER: (START STOP PERCENT)
Display (display-00433): OVER: (START STOP PERCENT)
    ...
Display (display-00433): OVER: (START STOP PERCENT)
Stop: (7875 947), compute: 782
< *runq* [runqentry]: 1309 enqueues, not traced, Length: (Initial: 10) (Delta: 10) (present: 282) (Filled: 0)>
21. exit
[shell]

Figure E-10: Running the Update Network Using Generated Event Records

Appendix F Update Network Performance

This appendix provides the details of the measurements of the update network. As discussed in section 8.6.3, three sets of measurements were taken: one with the update network generated by the existing compiler, one with this network hand optimized using the strategies discussed in section 7.2, and one with Lisp functions generated by hand from the optimized network, using only strategies that could be readily implemented. All three versions correspond to the queries given in section 3.7. It should be emphasized that the measurements examined here only apply to this one set of queries, and may not be representative of queries in general. On the other hand, these queries are somewhat complex, involving event and period relations, a derived relation, and an aggregate, as well as several predicates and expressions. Most queries will probably be less complex.

Several measurements were taken for each version. One was the average number of node fires resulting from each input tuple, where a fire is defined to be the invocation of an access or operator node. Note that this value is normally greater than two, since each input tuple causes at least one access node and one initial operator node to fire. This value is related to the average depth attained by an input tuple in the network, but is not equivalent to this value, since an input tuple can trigger the generation of several new tuples, especially in the cartesian product node.

The second measurement taken is the average execution time per node fire. This time has two components: the invocation time for a node, which is independent of the node type, and the execution time of a node once it has been invoked, which is dependent on the type of the node. Given this measurement, it is easy to calculate the number of input tuples that can be absorbed each second, indicating the execution speed of the network.
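
As a worked check using the figures reported in the sections below (no new data), the absorption rate is simply the reciprocal of the time spent per input tuple:

    # Input-tuple absorption rate = 1 / (node fires per tuple * seconds per fire).
    def tuples_per_second(fires_per_tuple, ms_per_fire):
        return 1000.0 / (fires_per_tuple * ms_per_fire)

    print(round(tuples_per_second(19.9, 17.5), 1))   # unoptimized, with GC      -> 2.9
    print(round(tuples_per_second(19.9, 7.6), 1))    # unoptimized, ignoring GC  -> 6.6
    print(round(tuples_per_second(9.2, 7.2)))        # optimized, with GC        -> 15
    print(round(tuples_per_second(1.0, 1.6)))        # compiled, with GC         -> 625, close to the
                                                     # measured 600 reported in section F.3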

Cartesian products are expensive, so the number of output tuples they produce, per tuple flowing into the node, was measured. Since several of the optimizations reflect this cost, we would expect the optimized version to show significant improvement in this regard.

Finally, the number of output tuples per input tuple was determined. This value is invariant across the three versions (given that they implement the same queries) and is more highly dependent on the particular queries (and the input data) than the other measurements. All trials with actual input tuples resulted in few output tuples, since one process tended to get and stay ahead of the other process. Hence, an artificial input stream of 50 tuples was chosen to produce interesting results. For this stream, there were 32 output tuples produced. The other measurements using the test input tuples are quite pessimistic for this update network, since each input tuple resulting in an output tuple causes more node fires, more intermediate tuples to be produced, and generally more computation than an input tuple eliminated during the processing.

The execution times were all measured using the ptime system function, which returns two values, the cumulative runtime of the process and the cumulative runtime used by the garbage collector. Both times are in jiffies, that is, 1/60 of a second, or 16.67 milliseconds. The experiments were replicated many times to reduce this sampling artifact. Runtime measurements indicate performance on a dedicated Vax 11/780, and are therefore relatively invariant to current system load. Measurements involving time will be presented in the format a (b), where a includes the overhead of garbage collection, and b assumes that garbage collection is instantaneous. Both figures are given to emphasize one artifact resulting from the use of Lisp as the implementation language.

The execution times concern only the update network, and thus only a portion of the actual performance of the monitor. In particular, the remote accountant (see section 8.5), also executing on the Vax, will reduce the effective tuple rate. Informal arguments in section 8.8 indicate that the components of the monitor have comparable efficiencies.

F.1. The Unoptimized Version

The code for constructing the update network as generated by the compiler is shown in Figure F-1, and the update network itself is shown in Figures F-2 and F-3. It contains four access nodes and 24 operator nodes, of seven types. The following measurements were taken:

• 19.9 node fires per input tuple;

• 17.5 (7.6) milliseconds per node fire;

• 60% of this time spent in garbage collection;

• 2.8 output tuples generated per input tuple of a cartesian product;

• 2.9 (6.6) input tuples per second processed; and

• 77% of these tuples later eliminated.

Processing statement 1...
Processing statement 2...
Processing statement 3...
[(createaccess ITERATION access-00499)]
[(createop eventtoperiod etop-00500 (1 4))]
[(link access-00499 etop-00500 left)]
[(createaccess ITERATION access-00497)]
[(createop eventtoperiod etop-00498 (1 4))]
[(link access-00497 etop-00498 left)]
[(createop cartesian cartesian-00501 (0 4))]
[(link etop-00500 cartesian-00501 left)]
[(link etop-00498 cartesian-00501 right)]
[(createop selection selection-00502 ((lambda ($3 $8) (greaterp $3 (+ $8 1))) (3 8)))]
[(link cartesian-00501 selection-00502 left)]
[(createop selection selection-00504 ((lambda ($6) (equal $6 41653)) (6)))]
[(link selection-00502 selection-00504 left)]
[(createop selection selection-00506 ((lambda ($1) (equal $1 41521)) (1)))]
[(link selection-00504 selection-00506 left)]
[(createop applyop applyop-00508 (10 (lambda ($0 $5) (stopfunc (NEXTfunc $0 $5))) (0 5)))]
[(link selection-00506 applyop-00508 left)]
[(createop applyop applyop-00510 (11 (lambda ($4 $9) (startfunc (NEXTfunc $4 $9))) (4 9)))]
[(link applyop-00508 applyop-00510 left)]
[(createop applyop applyop-00512 (12 (lambda ($3 $8) (- $8 $3)) (3 8)))]
[(link applyop-00510 applyop-00512 left)]
[(createop projection projection-00514 ((10 11 12)))]
[(link applyop-00512 projection-00514 left)]
Processing statement 4...
Processing statement 5...
[(createop aggrop opAVGC-00518 (opAVGC -1 -1 -1 3 0 1))]
[(link projection-00514 opAVGC-00518 left)]
[(createop applyop applyop-00519 (4 (lambda ($3) (productti $3 100)) (3)))]
[(link opAVGC-00518 applyop-00519 left)]
[(createop projection projection-00521 ((0 1 4)))]
[(link applyop-00519 projection-00521 left)]
Processing statement 6...
[(createaccess ITERATION access-00525)]
[(createop eventtoperiod etop-00526 (1 4))]
[(link access-00525 etop-00526 left)]
[(createaccess ITERATION access-00523)]
[(createop eventtoperiod etop-00524 (1 4))]
[(link access-00523 etop-00524 left)]
[(createop cartesian cartesian-00527 (0 4))]
[(link etop-00526 cartesian-00527 left)]
[(link etop-00524 cartesian-00527 right)]
[(createop selection selection-00528 ((lambda ($3 $8) (equal $8 $3)) (3 8)))]
[(link cartesian-00527 selection-00528 left)]
[(createop selection selection-00530 ((lambda ($1) (equal $1 41653)) (1)))]
[(link selection-00528 selection-00530 left)]
[(createop selection selection-00532 ((lambda ($6) (equal $6 41521)) (6)))]
[(link selection-00530 selection-00532 left)]
[(createop selection selection-00534 ((lambda ($0 $5) (followpred $5 $0)) (0 5)))]
[(link selection-00532 selection-00534 left)]
[(createop projection projection-00536 ((0)))]
[(link selection-00534 projection-00536 left)]
[(createop switchdisposition switchdisposition-00537)]
[(link projection-00536 switchdisposition-00537 left)]
Processing statement 7...
[(createop display display-00538 (CATCH (START)))]
[(link switchdisposition-00537 display-00538)]
[(createop display display-00539 (OVER (START STOP PERCENT)))]
[(link projection-00521 display-00539)]

Figure F-1: Code for the Unoptimized Update Network as Generated by the TQuel Compiler

[Figure F-2: The Unoptimized Update Network Generated by the TQuel Compiler, Part 1. The figure shows the two parallel operator chains fed by cartesian products: the AOverB chain (a projection of Start, Stop, and Diff above ApplyOp nodes computing Stop = (A.Stop, B.Stop).Start, Start = (A.Start, B.Start).Stop, and Diff = B.IterNum - A.IterNum, above selections where A.Process = $1, where B.Process = $2, and where A.IterNum > B.IterNum + 1) and the Catch chain (a SwitchDisposition converting periods to events, a projection at B.Start, a selection when A.Start ; B.Start, selections where A.Process = $1 and where B.Process = $2, and a selection A.IterNum = B.IterNum).]

[Figure F-3: The Unoptimized Update Network Generated by the TQuel Compiler, Part 2. The Over chain: a projection of Start, Stop, and Percent above an ApplyOp computing Percent = AvgC(AB) * 100, above an AggrOp computing AvgC of AOverB.]

The first measurement indicates that most of the tuples move through several nodes before they are eliminated; although the selection nodes are right after the cartesian product nodes, they still are at least four nodes away from the input nodes. The number of tuples generated by the cartesian product nodes is rather high; this value, coupled with the fact that less than a fourth make it through the rest of the network, implies that the cartesian products are contributing to the low efficiency of the network. Finally, the execution time per node fire is excessive, although the performance data does not discern between the node invocation component and the node execution component of this time.

F.2. The Optimized Version

The following optimizations were performed by hand to produce the optimized version shown in Figure F-4:

1. The selection nodes were moved to just after the access nodes, thereby reducing the number of tuples flowing into the cartesian product nodes (the effect is sketched below, after this list).

2. The Diff domain of the intermediate relation AOverB was eliminated through flow analysis of the domains.

3. A more efficient eventtoperiod algorithm was used, because only one process value was possible for the process domain.

4. A more efficient cartesian product could be used, because the inputs are now ordered by their start and stop times.

These optimizations are discussed in more detail in section 7.2.
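
The payoff of the first optimization can be illustrated in relational-algebra terms. The sketch below is not the incremental operator nodes of the update network; it merely counts, for a made-up pair of input streams, how many tuples reach the cartesian product when the selections are applied after it versus before it.

    # Effect of pushing a selection below (i.e., ahead of) a cartesian product.
    # The number of tuples flowing into the product dominates its cost.
    A = [(41521, i) for i in range(10)] + [(41653, i) for i in range(10)]   # (process, iternum)
    B = list(A)

    def product(left, right):
        return [(a, b) for a in left for b in right]

    # Unoptimized: select after the product.
    after = [(a, b) for (a, b) in product(A, B) if a[0] == 41521 and b[0] == 41653]

    # Optimized: select before the product, so far fewer tuples reach it.
    before = product([a for a in A if a[0] == 41521], [b for b in B if b[0] == 41653])

    assert sorted(after) == sorted(before)          # same result either way ...
    print(len(product(A, B)), len(before))          # -> 400 100   ... but 4x fewer intermediate tuples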

The resulting update network contains two access nodes and 18 operator nodes, and has the following characteristics:

• 9.2 node fires per input tuple;

• 7.2 (4.7) milliseconds per node fire;

• 35% of this time was spent in garbage collection;

• 15 (23) input tuples per second;

• 0.9 output tuples generated per input tuple of the cartesian product;

• 28% of these tuples later eliminated.

All of these values were dramatically improved over the unoptimized version. The transfer of the selection nodes resulted in a 50% reduction in the number of node fires per input tuple. The cartesian product effectively eliminated 10% of its input tuples, rather than outputting more tuples than were input (the usual case), due to the requirement that the periods represented by the tuples overlap in order for an output tuple to be generated. The survival rate of those tuples that were output was three times that of the unoptimized version. The 40% reduction in execution time of a node fire may be attributed to two factors: the more efficient eventtoperiod and cartesian product algorithms, and the reduced storage requirements of the cartesian product nodes as a result of fewer input tuples.

F.3. The Compiled Version

The optimizations listed in the previous section were effective at reducing the deleterious effects of the cartesian products. However, the execution time of a node fire is still a significant problem. Even if optimizations could lower the number of node fires per input tuple to 1, the processing rate would still be an unsatisfactory 140 input tuples per second.

As mentioned above, the node fire time has two components, the invocation time for a node and the execution time of a node once it has been invoked. Measurements were taken of an operator node that did nothing but return its input tuple, with an execution time (the second component) of a few microseconds. The average time for a node fire was 2.4 milliseconds.

[Figure F-4: The Hand-Optimized Update Network. Two Iteration access nodes feed the network, with the selections A.IterNum = B.IterNum and A.IterNum > B.IterNum + 1 moved down next to them; the remaining nodes are the ApplyOps computing Start = (A.Start, B.Start).Stop and Stop = (A.Stop, B.Stop).Start, the selection A.Start ; B.Start, a projection at Start producing Catch, a projection of Start and Stop producing AOverB, and the AggrOp AvgC with the ApplyOp Percent = AvgC(AB) * 100 feeding the projection of Start, Stop, and Percent that produces Over.]

Hence, the invocation time accounts for approximately a third of the fire time for a node, and the execution time for the remaining two thirds.

The following factors all contributed to the low efficiency:

• Data structures were implemented as hunks (arrays of Lisp pointers) which are expensive to allocate and to garbage collect.

• The queuing in breadth-first scheduling involved a lot of copying, requiring much allocation and deallocation.

• Operator nodes were general, and thus had to check data structures extensively to determine the specific operations to perform.

Compilation strategies were developed to eliminate these inefficiencies. The update network compiler translates a sequence of create and link operations (the update network) into a collection of Lisp functions, which are then compiled by the Lisp compiler into Vax assembly language. Conceptually, an entire update network is converted into a specialized operator node (see Figure F-5). In fact, implementing queries in this manner allows portions of the update network to be interpreted and other portions to be compiled.

An update network compiler was designed but not implemented. The techniques developed for such a compiler were tested by hand-compiling the optimized version of the previous section, and then measuring the performance of the compiled version. The following optimizations were made during the hand compilation (the net effect is sketched after the list below):

1. No hunks were used; data structures were implemented as list structures or as collections of local variables.

2. Each node was implemented as a Lisp function, allowing a node to be fired by simply invoking the function.

3. Depth-first scheduling was used, eliminating a ready queue of nodes.

4. The code for each node was optimized to perform only the necessary calculations.

5. The functions that were only called once were expanded inline.

6. Nodes were invoked using the Lisp apply function, so that the arguments of the invoked function consist of the domains of the tuple, rather than the tuple itself. This technique, used in the Ops4 implementation [Forgy 79], eliminates the overhead of unpacking the tuple to access its domains.

7. The temporary storage for a node was optimized and was kept in global variables.

[Figure F-5: The Hand-Compiled Update Network. A single DemoQuery operator node takes Iteration tuples as input and produces the Over and Catch relations directly.]
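
The net effect of these strategies is to collapse the interpreted network for a query into one specialized routine, as Figure F-5 suggests. The Python sketch below conveys only the structural idea (the tuple's domains arrive as arguments, node state lives in explicit variables, and firing a successor is a direct call); the "query" it computes is a toy invented for the illustration, not the TQuel semantics of DemoQuery, and the real compiled version consisted of Lisp functions.

    # Interpreted form: generic nodes, a ready queue, tuples packed into data structures.
    # Compiled form: the whole chain is one function whose arguments are the tuple's
    # domains, fired by a direct call -- no queue, no unpacking, no generic dispatch.

    def demo_query_node(process, iternum, state, emit):
        """One compiled 'node' standing in for an access->selection->display chain."""
        if process != 41521:                 # the selection, specialized into the code
            return
        previous = state.get("last")
        if previous is not None and iternum > previous + 1:
            emit(("AOverB", previous, iternum))
        state["last"] = iternum

    state, outputs = {}, []
    for event in [(41521, 1), (41653, 9), (41521, 2), (41521, 5)]:
        demo_query_node(*event, state=state, emit=outputs.append)   # depth-first: a plain call
    print(outputs)                                                   # -> [('AOverB', 2, 5)]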

The performance of the hand-compiled query was impressive:

• 1 node fire per input tuple;

• 1.6 (1.5) milliseconds per node fire;

• < 10% of this time was spent in garbage collection;

• 600 (660) input tuples per second;

• 0.9 output tuples generated per input tuple of the cartesian product; and

• 28% of these tuples were later eliminated.

The execution time for the node fire is one-fourth that of the optimized version (while performing the computation of the entire network!), mirroring the high overhead of breadth-first scheduling in the latter version. The elimination of hunks and the careful use of temporary variables greatly reduced the overhead for garbage collection.

The low garbage collection overhead also indicates that the compiled version is not utilizing Lisp fully. Indeed, the primary reason for compiling the update network is to avoid the convenient features of the language which are expensive in execution time. The target language could have been C instead of Lisp. In fact, it could have been assembly language without too much additional work, since many high level language features (e.g., structure definitions and type checking) would not be utilized anyway.

F.4. Summary

This appendix has detailed the application of a host of techniques for increasing the efficiency of update networks. These techniques, applied in concert, result in a dramatic speedup factor of over 200 for a particular set of queries, without altering the semantics of the network. A common attribute of these techniques is that they exploit knowledge concerning the update network. For example,

• Cartesian products are expensive, so move selections to below them, if possible.

• More efficient operator nodes may be used given particular tuple orders.

• If domains are not used, they need not be computed.

• Depth-first scheduling does not require a ready queue, so it can be eliminated.

• If an operator node is referenced only once in an update network, it can be compiled inline.

These specific techniques are instances of one of the steps in the iterative process outlined in section 8.7:

4. Apply all the available information to a particular instance of the problem in order to efficiently perform the desired action, trading generality for efficiency.
