Master-Thesis 12.07.2014 Prof

Master's Thesis
Name: Yves Fischer
Topic: Monitoring and Diagnostics for C/C++ Real-Time Applications
Faculty of Computer Science and Business Information Systems
Place of work: CERN, Geneva
First examiner: Prof. Dr. Fuchß
Second examiner: Prof. Dr. Hoffmann
Submission date: 12.07.2014
CERN-THESIS-2014-086 21/07/2014
Karlsruhe, 13.01.2014
Chair of the Examination Board: Prof. Dr. Ditzinger

Statutory Declaration

I affirm that I have cited all sources used and that no other person's work has been used without due reference. All lines of text, entire passages, tables or images taken from other works are marked with their source, regardless of whether the source is a book or a publication on the Internet. Direct translations of foreign-language documents are likewise provided with a reference to their source. The German version of this declaration is authoritative.

Prévessin, 12th of July 2014
Yves Johannes Wolfgang Fischer

Acknowledgements

First, I would like to thank my supervisor at CERN, Felix Ehm, for his support and helpful guidance. His advice, expertise and understanding added considerably to my graduate experience. I would like to express my gratitude to Professor Thomas Fuchß, who suggested many valuable points to include and gave me advice whenever it was required. I would also like to thank Stephen Page, who proofread my text and provided me with helpful comments and suggestions.

Abstract

Knowledge about the internal state of computational processes is essential for problem diagnostics as well as for continuous monitoring and for recognizing problems before they lead to failures. The CMX library provides monitoring capabilities similar to the Java Management Extensions (JMX) for C and C++ applications. This thesis provides a detailed analysis of the requirements for monitoring and diagnostics of the C/C++ processes at CERN.
The developed CMX library enables real-time C/C++ processes to expose values without harming their normal execution. CMX is portable and can be integrated into different monitoring architectures.

Contents

1 Introduction
1.1 Motivation
1.2 Overview of CERN
1.3 Structure of this Thesis
2 Monitoring of C/C++ Systems
2.1 Technical Environment
2.2 Motivation
2.3 Related Work
3 Requirements
3.1 Terms
3.2 Functional Requirements
3.3 Technical Requirements
4 Existing Technologies and Solutions
4.1 Monitoring Systems
4.2 Logging Systems
4.3 Interprocess Communications
4.3.1 Possibilities
4.3.2 Evaluation
4.4 Existing Software Solutions
4.5 Conclusions
5 Design of CMX Protocol and Data Structures
5.1 Design of CMX Data Structures
5.2 Shared Memory
5.3 Design of CMX Protocol
5.3.1 Real-Time Constraints
5.3.2 Concurrent Access to Shared Memory
5.3.3 Verification
5.4 Comparison with Similar Algorithms
5.5 Verification with Models
5.5.1 Simple Example of a Promela Model
5.5.2 Model of Two Writers
5.5.3 Model of Concurrent Reader/Writer
5.6 Conclusions
6 Implementation of CMX
6.1 Platform and Toolchain
6.1.1 Compiler
6.1.2 Atomicity of Operations
6.1.3 Processor Memory Consistency
6.1.4 Processor Cache Coherency
6.1.5 Choosing a Suitable Timesource
6.2 Implementation Overview
6.2.1 The Implementation in C
6.2.2 The C++ API
6.2.3 Independent Usage of CMX
6.2.4 Real-Time Compatibility
6.2.5 Automated Testing
6.2.6 Performance Analysis
6.2.7 Possible Extensions
6.3 Conclusions
7 Integration in CERN Infrastructure
7.1 A Remote Agent for CMX
7.1.1 Diagnostic Access in the DIAMON GUI
7.1.2 Monitoring of CMX Enabled Applications in DIAMON
7.2 Interaction of CMX with Build Tools
7.3 Conclusions
8 Summary
Literature
Glossary
List of Definitions and Requirements
List of Figures
List of Tables

1 Introduction

High system availability is essential for successfully operating a large industrial facility. For this reason, it is important to identify sources of errors and potential problems as early as possible. In computing, system and application monitoring serves this purpose. This work describes the implementation of application monitoring and diagnostic tools that are suitable for real-time applications, such as those used in CERN's accelerator control system.

1.1 Motivation

Large installations such as particle accelerators or industrial sites are expensive to build and operate.
The cost of building the LHC accelerator was about 6 billion CHF. The experiments that depend on the correct functioning of the accelerator are funded independently; the material costs for the ATLAS experiment alone were 540 million CHF [1, p. 17]. This investment only pays off while everything works correctly and, in the case of the LHC, collisions can be delivered to the experiments. A properly operating accelerator depends on the reliability of many smaller and bigger hardware and software components.

The BE-CO group, where this work was carried out, is responsible for a large part of the accelerator controls software. Naturally, our primary goal is to provide reliable, fault-tolerant software and, in case of unforeseen events, the shortest possible response times. Monitoring plays a critical role in the early recognition of possible error conditions and the fast identification of problem sources. The monitoring system constantly watches about 2,000 machines and applies many rules to detect problems.

Monitoring is always limited to what developers consider worth monitoring. Hence, enabling developers to expose metrics easily from within their applications in a suitable and standardized way is a key factor for success.

We could not find any existing solution for C/C++ applications that fulfils most of our requirements and is at the same time compatible with the existing monitoring and diagnostic system. This was the initial reason to develop a new monitoring and diagnostics library for C/C++, called CMX, at CERN.

1.2 Overview of CERN

This project was carried out at CERN in Geneva, where physicists and engineers research the fundamental structure of the universe. Founded in 1954, the CERN laboratory was one of Europe's first joint ventures and now has 21 member states.

Figure 1.1: CERN Accelerator Complex [2]

Today CERN hosts many particle physics experiments.
The biggest and best known are the LHC particle accelerator and the ATLAS and CMS detectors, which are famous for the discovery of the Higgs boson. Before a bunch of particles collides in one of the LHC detectors, it passes through a cascade of increasingly powerful accelerators (Fig. 1.1) and reaches a speed of 0.999 999 991 times the speed of light. At the time of collision, every proton has reached a top energy of 3.5 TeV.

Figure 1.2: Pictures related to the discovery of the Higgs boson, CMS Collaboration [3]. (a) “The observed probability (local p-value) that the background-only hypothesis would yield the same or more events as are seen in the CMS data, as a function of the SM Higgs boson mass for the five channels considered. The solid black line shows the combined local p-value for all channels.” (b) “Event recorded with the CMS detector in 2012 at a proton-proton centre-of-mass energy of 8 TeV. The event shows characteristics expected from the decay of the SM Higgs boson to a pair of photons (dashed yellow lines and green towers). The event could also be due to known standard model background processes.”

Compared with energies in everyday life, these energies are still low; in the LHC, however, they are concentrated in space like nowhere else. The two beams circulating in opposite directions in the LHC together contain

$2 \cdot 3.5\,\mathrm{TeV} \cdot (1.1 \cdot 10^{11}\ \text{particles per bunch}) \cdot (2808\ \text{bunches}) \approx 350\,\mathrm{MJ}$,

which is about as much energy as a 400 t train, such as the French TGV, travelling at 150 km/h [1]. When they collide inside one of the four detectors, the protons or lead ions produce sub-atomic particles. Particle detectors use different devices to identify these
