CERN Radiation Monitoring Electronics (CROME) Project in Order to Develop a Replacement
Contributions to the SIL 2 Radiation Monitoring System CROME (CERN RadiatiOn Monitoring Electronics)

Master Thesis
CERN-THESIS-2017-478
Author: Nicola Joel Gerber
Supervisor CERN - HSE-RP/IL: Dr. Hamza Boukabache
Supervisor EPFL - SEL | STI: Dr. Alain Vachoux
March 2017

Abstract

CERN is developing a new radiation monitoring system called CROME to replace the currently used system which is at the end of its life cycle. As radiation can pose a threat to people and the environment, CROME has to fulfill several requirements regarding functional safety (SIL 2 for safety-critical functionalities). This thesis makes several contributions to increase the functionality, reliability and availability of CROME.

Floating point computations are needed for the signal processing stages of CROME to cope with the high dynamic range of the measured radiation. Therefore, several IEEE 754-2008 conforming floating point operation IP cores are developed and implemented in the system. In order to fulfill the requirements regarding functional safety, the IP cores are verified rigorously by using a custom OSVVM based floating point verification suite.

A design methodology for functional safety in SRAM-based FPGA-SoCs is developed. Some parts of the methodology are applied to the newly developed IP cores and other parts were ported back to the existing system.

In order to increase the reliability and availability of CROME, a new in-system communication IP core for decoupling the safety-critical from the non-safety-critical parts of the FPGA-SoC-based system is implemented. As the IP core contains mission critical configuration data and will have an uptime of several years, it is equipped with several single event upset mitigation techniques such as ECC for memory protection and fault-robust FSMs in order to increase the system's reliability and availability. To make this IP core usable for the overall system, the existing Linux kernel and userspace software stack is adapted for it.
Finally, some critical design weaknesses were found in the current system architecture. To remedy those weaknesses, an improved system architecture is proposed for implementation in future design iterations.

Acknowledgements

I would like to thank my supervisor from EPFL, Dr. Alain Vachoux. The numerous courses given by him at EPFL equipped me with all the necessary tools to successfully finish this thesis. His advice during the thesis enabled me to overcome several problems I faced while conducting the project.

I would like to express profound gratitude to my supervisor at CERN, Dr. Hamza Boukabache. His guidance through this project was invaluable. He provided assistance when needed and at the same time left me the freedom to explore my own ideas.

Many thanks to Ciarán Toner for his advice, the fruitful discussions and all his proofreading. And also a big thank you to all the rest of the HSE-RP/IL section for providing such an inspiring and pleasant work environment.

And finally, I would like to thank my family. Thank you for all your love, understanding and support throughout my studies. Without you I would not be where I am.

Contents

1 Introduction
  1.1 Motivation for the CROME Project
  1.2 Scope of this Thesis
  1.3 Organisation of this Report
2 System Architecture
  2.1 System Level Overview
    2.1.1 CROME Measuring and Processing Unit
    2.1.2 Avnet PicoZed
  2.2 Xilinx Zynq Device Architecture
    2.2.1 Zynq Boot Sequence
    2.2.2 Fitness of Zynq for CROME
    2.2.3 PCAP-ICAP Issue
  2.3 Proposed Modifications on the Architecture to Increase Reliability and Availability
    2.3.1 SIL 3 Microcontroller
3 Methodologies and Techniques for the Design of Safety-Critical Systems
  3.1 Development Cycle to Achieve SIL 2
  3.2 Overview of Error Correction Schemes
    3.2.1 Repetition Codes and Parity Bits
    3.2.2 Checksums
    3.2.3 Cyclic Redundancy Checks
    3.2.4 Error-Correcting Codes
    3.2.5 Cryptographic Hash Functions
  3.3 Techniques to Increase Reliability and Availability of FPGA Designs
    3.3.1 Taxonomy of FPGAs
    3.3.2 Influence of Radiation in Deep Sub-Micron Silicon Devices
    3.3.3 Single Event Upset Mitigation Techniques
  3.4 Overview of Verification Methodologies
    3.4.1 Formal Verification
    3.4.2 Open Source VHDL Verification Environment Methodology
    3.4.3 SystemVerilog Direct Programming Interface
4 Securing PS/PL Data Transfer
  4.1 Adaption of the FPGA Firmware
    4.1.1 Formalisation of the Functional Requirements
    4.1.2 Functional Description
    4.1.3 Implementation
    4.1.4 Verification
    4.1.5 Discussion of the Design
  4.2 Adaptions of the Linux Software Stack
    4.2.1 Proposed Changes in the Linux Software Stack for Future Releases
  4.3 Current State of the Development
    4.3.1 Final Remarks on the Verification
5 Floating Point Computation
  5.1 Used Subset of IEEE 754-2008
    5.1.1 Binary Representation and Interpretation
    5.1.2 Rounding of Floating Point Numbers
    5.1.3 Omitted Functionality
  5.2 IEEE 754-2008 Verification Suite
    5.2.1 Implementation of Coverage Points
    5.2.2 Generic Test Functions
  5.3 Floating Point Comparison
    5.3.1 Functional Description
    5.3.2 RTL Architecture
    5.3.3 Verification Environment
    5.3.4 Implementation and Verification Results
  5.4 Integer to Floating Point Conversion
    5.4.1 Functional Description
    5.4.2 RTL Architecture
    5.4.3 Verification Environment
    5.4.4 Implementation and Verification Results
  5.5 Floating Point to Integer Conversion
    5.5.1 Functional Description
    5.5.2 RTL Architecture
    5.5.3 Verification Environment
    5.5.4 Implementation and Verification Results
  5.6 Floating Point Addition
    5.6.1 Functional Description
    5.6.2 RTL Architecture
    5.6.3 Verification Environment
    5.6.4 Implementation and Verification Results
  5.7 Floating Point Multiplication
    5.7.1 Functional Description
    5.7.2 RTL Architecture
    5.7.3 Verification Environment
    5.7.4 Implementation and Verification Results
  5.8 Example Application of the Floating Point Cores: Temperature Compensation
6 Conclusion and Outlook
  6.1 Conclusion on the Completed Work
    6.1.1 Floating Point Core Design and Verification
    6.1.2 Securing PS/PL Data Transfer
    6.1.3 General Methodologies to Increase the Reliability of the Design
  6.2 Future Work
    6.2.1 General Methodologies to Increase the Reliability of the Design
    6.2.2 Architectural Modifications to Increase the Reliability of the Design
Glossary
Bibliography

List of Figures

2.1 Diagram of the complete radiation monitoring system
2.2 PCBs inside the submodules of CROME
2.3 High level block diagram of a PicoZed SoM
2.4 High level block diagram of a Zynq device
2.5 Block diagram of the PS of a Zynq device
2.6 Block diagram of the ICAP and PCAP interfaces
2.7 Simplified functional block diagram of the current state of the system
2.8 System architecture
2.9 Proposed system architecture for the next iteration
3.1 General V-cycle for HDL development
3.2 Formal verification flow for floating point units used at Intel Corporation
4.1 Block diagram of the current link between the PS and the PL
4.2 Block diagram of the secured link between the PS and the PL
4.3 State diagram of the FSM for securing the PS/PL transfer
5.1 Generic binary representation of a floating point number
5.2 Logarithmic to logarithmic plot of the runtime of a result based adder verification procedure against the uniformly spread seed numbers around the target result
5.3 RTL architecture of the integer to floating point converter
5.4 Testbench architecture for floating point to integer and integer to floating point conversion
5.5 RTL architecture of the floating point to integer converter
5.6 RTL architecture of the floating point adder
5.7 Testbench architecture for floating point addition and multiplication
5.8 RTL architecture of the floating point multiplier

List of Tables

2.1 Probability of failure per hour in continuous mode
4.1 Memory mapping of the configuration parameters