Lewis University Dr. James Girard Summer Undergraduate Research Program 2021 Faculty Mentor - Project Application

Exploring the Use of High-level Parallel Abstractions and Parallel Computing for Functional and Gate-Level Simulation Acceleration

Dr. Lucien Ngalamou
Department of Engineering, Computing and Mathematical Sciences

Abstract

System-on-Chip (SoC) complexity has grown non-stop, and time-to-market pressure has driven demand for innovation in simulation performance. Logic simulation is the primary method for verifying the correctness of such systems, and it is used heavily to verify the functional correctness of a design across a broad range of abstraction levels. In mainstream industry verification methodologies, typical setups coordinate the validation effort of a complex digital system by distributing logic simulation tasks among vast server farms for months at a time. Yet the performance of logic simulation is not sufficient to satisfy the demand, leading to incomplete validation processes, escaped functional bugs, and continuous pressure on the EDA¹ industry to develop faster simulation solutions. In this research, we will explore a solution that uses high-level parallel abstractions and parallel computing to boost the performance of logic simulation.

¹ Electronic Design Automation

1 Project Description

1.1 Introduction and Background

SoC complexity is increasing rapidly, driven by demands in the mobile market and, increasingly, by the fast growth of assisted- and autonomous-driving applications. SoC teams use many verification technologies to address their complexity and time-to-market challenges; however, logic simulation remains the foundation of all verification flows and continues to account for more than 90% [10] of all verification workloads. Logic simulation is the primary tool used to validate a wide range of design aspects, foremost among these being the correctness of the system’s functionality, both in its behavioral description (functional verification) and in its structural description (gate-level verification). A large body of research has been devoted to accelerating the logic simulation process [11]. Beyond improving the efficiency of algorithms, parallel computing has long been considered the essential route to scalable simulation productivity. Unfortunately, logic simulation has been one of the most difficult problems to parallelize because of the irregularity of the problem and the hard constraint of maintaining causal relations [8]. Today, commercial logic simulation tools rely on multicore CPUs and clusters, mainly exploiting task-level parallelism. In fact, large simulation farms can consist of hundreds of machines, which is expensive and power hungry. Meanwhile, the communication overhead may eventually outweigh the performance gained by integrating more machines.

Simulators can be grouped into two families based on their internal architecture. Oblivious simulators compute all gates in the system during every simulation cycle and entail a simpler software design. They have the advantage of low control overhead, but can spend significant computation time repeatedly evaluating gates whose output values do not change from cycle to cycle. Event-driven simulators limit the amount of computation by simulating in each cycle only those gates whose inputs have changed since the previous cycle, and whose outputs may thus change in response to the switching stimulus. While the sequencing of gate evaluation in oblivious simulation can be determined statically at compile time, event-driven simulators require a dynamic runtime scheduler and hence entail a more complex software structure. However, this latter approach is vastly more common in commercial tools because the scheduler's performance overhead is largely offset by the fact that, for many designs, only 1 to 10% of the gates switch at each cycle, thus requiring significantly less computation.

When looking for large performance improvements in logic simulators, we note that logic netlists present a high degree of structural parallelism that could be exploited by simulating individual gates concurrently. An ideal platform for leveraging such concurrency is the modern graphics processing unit (GPU), as it includes many simple and identical computational units capable of operating concurrently by executing the same instruction sequence on different data. GPU computing is a recent extension of traditional graphics processing, providing a general-purpose programming interface for GPU devices and making their vast parallel computational resources available for applications beyond the processing of graphic primitives. Platforms for GPU computing include AMD’s FireStream [12] and NVIDIA’s CUDA [13]. In addition, vendor-independent parallel computing standards such as OpenCL [3] have also been developed.
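The difference between the oblivious and event-driven families described above can be illustrated with a minimal Python sketch. It is not the simulator proposed in this project: the four-gate netlist, the signal names, and the zero-delay model are all hypothetical and chosen only for illustration. The oblivious routine evaluates every gate in a fixed order, while the event-driven routine schedules only the fanout of signals that changed.

```python
from collections import deque

# Hypothetical toy netlist: gate name -> (boolean function, input signal names).
GATES = {
    "n1": (lambda a, b: a & b, ("a", "b")),
    "n2": (lambda a, b: a | b, ("n1", "c")),
    "n3": (lambda a: 1 - a, ("n2",)),
    "n4": (lambda a, b: a ^ b, ("c", "d")),
}
ORDER = ["n1", "n2", "n3", "n4"]  # topological order, fixed ahead of time

# Fanout map: signal -> gates that read it (used by the event-driven scheduler).
FANOUT = {}
for gate, (_, ins) in GATES.items():
    for sig in ins:
        FANOUT.setdefault(sig, []).append(gate)


def oblivious_cycle(values):
    """Evaluate every gate once, in static order; return the evaluation count."""
    for gate in ORDER:
        fn, ins = GATES[gate]
        values[gate] = fn(*(values[s] for s in ins))
    return len(ORDER)


def event_driven_cycle(values, changed_signals):
    """Evaluate only gates reached by changed signals; return the evaluation count."""
    queue = deque(g for sig in changed_signals for g in FANOUT.get(sig, []))
    evaluations = 0
    while queue:
        gate = queue.popleft()
        fn, ins = GATES[gate]
        new_value = fn(*(values[s] for s in ins))
        evaluations += 1
        if new_value != values.get(gate):
            # The output switched, so its fanout must be re-evaluated too.
            values[gate] = new_value
            queue.extend(FANOUT.get(gate, []))
    return evaluations


values = {"a": 1, "b": 1, "c": 0, "d": 0}
print(oblivious_cycle(values))            # 4: every gate is evaluated
values["d"] = 1                           # only primary input d toggles
print(event_driven_cycle(values, ["d"]))  # 1: only n4 is re-evaluated
```

In this toy case the event-driven cycle performs one evaluation instead of four, at the cost of maintaining the fanout map and the runtime queue, which is the scheduler overhead mentioned above.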

In recent years, GPU-based accelerations of logic simulation have been proposed [9, 15]. These novel simulation architectures maximize the utilization of concurrent hardware resources while minimizing expensive communication overhead. The experimental results show that GPU-based simulators can handle the validation of industrial-size designs while delivering, on average, more than an order-of-magnitude performance improvement over the fastest commercially available multi-threaded simulators. However, deploying such GPU-based simulators is expensive, both in hardware resources and in their integration into commercial simulators. This approach has not led to a breakthrough in IC² verification, and the use of multi-threaded commercial simulators is therefore still considered best practice.

In this research, we explore the use of high-level parallel abstractions and parallel computing for functional and gate-level simulation acceleration, aiming to develop a fast simulation tool that can be used by medium-size companies that cannot afford to spend 50K on existing commercial simulators. The future tool will also be used by universities to train hardware verification engineers.

1.2 Methodology

The state of IC verification practice is characterized by the following facts:

• The preeminence of three major companies (Cadence, Mentor Graphics, and Synopsys), which control the complete IC verification market.

• The existence of advanced methodologies for functional verification:

  – OSVVM: Open Source VHDL Verification Methodology [4]
  – OVM: Open Verification Methodology [5]
  – UVM: Universal Verification Methodology [6]

• The issues related to timing closure in gate-level simulation, leading to delays in product delivery.

• The nonexistence of viable open-source software for functional and gate-level simulation. Those available, namely Icarus [2], Verilator [7], and EDA Playground [1], cannot be used in a serious IC design project.

• The high demand in industry for well-trained verification engineers.

² Integrated Circuit

The above considerations and personal industrial experience have led us to start this research project with the following motivations:

• Provide Open-Source/Open Hardware Engines for Hardware Verification.

• Address the ever-increasing computing demand for VLSI verification.

• Provide a framework for training new verification engineers.

• Provide low-cost verification solutions for industry.

The proposed architecture of the logic simulator to be developed is given below:

Figure 1: Architecture of the Logic Simulator (EBC: Embedded Computing Module, STA: Static Timing Analyzer)

1.3 Aim and Fundamental Questions

The central focus of this research is to develop a fast logic simulator that uses parallelism in both abstraction and computation. Most existing simulators leverage parallelism at the gate level and deploy the simulation using multi-threading. The approach to be investigated is based on RTL parallelism partitioning, using machine learning techniques, to build a fast and effective logic simulator. The fundamental questions for this research include:

1. Given a synthesizable Verilog-based design (DUT³), what is an adequate verification flow that allows fast verification and complete functional coverage?

2. How can hardware accelerators be deployed within the different modules of the logic simulator, such as the parallel discrete-event simulator kernel and the OpenCL Client Driver?

3. How can a cost-effective and fast embedded computing cluster be built to leverage parallel computation?

Our goal is to provide effective solutions to each research question, which in the end will allow us to develop a cost-effective, fast logic simulator with its associated parallel .

1.4 Student Role

The project is in its initial stage, and the focus will be on research question #1. The initial work to be conducted by the student researcher will be on the software architecture of the Timing-based C/C++ Model generator. Machine learning techniques will be considered in the development of an efficient RTL-based timing model. The proposed timing-based model will have the following characteristics:

• The timing is considered at the module level (RTL), not at the gate level.

• Machine learning techniques will be used to optimize the timing model.

• A module path delay is developed by considering Interface Logic Models (ILMs).

An interface logic model (ILM) captures only part of a block's timing: it keeps the timing of the boundary logic and hides most of the internal register-to-register logic (Figure 2).

Advantages of using Interface Logic Models include:

– Good performance improvements in both STA (Static Timing Analysis) and GLS (Gate-Level Simulation), since only a partial design is used.
– High accuracy, as interface logic and interface cells are preserved.

³ Design Under Test

Figure 2: RTL Timing Modeling [14]
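The idea of keeping only boundary logic can be sketched as a graph traversal. The following Python example is a simplified illustration under stated assumptions, not the model generator to be developed: the block is represented as a hypothetical fanout graph, the node names (`in0`, `g1`, `r1`, and so on) are invented, and the functions `reach`, `invert`, and `interface_logic` are illustrative helpers. The sketch walks forward from the input ports and backward from the output ports, stopping at the first register on each path, and treats everything else as hidden internal logic.

```python
# Hypothetical block: node -> list of fanout nodes.
FANOUT = {
    "in0": ["g1"],
    "g1": ["r1"],
    "r1": ["g2"],
    "g2": ["g3"],
    "g3": ["r2"],
    "r2": ["g4"],
    "g4": ["r3"],
    "r3": ["g5"],
    "g5": ["out0"],
    "out0": [],
}
REGISTERS = {"r1", "r2", "r3"}
INPUTS = ["in0"]
OUTPUTS = ["out0"]


def reach(starts, edges, stop_at):
    """Collect nodes reachable from starts, without traversing past stop_at nodes."""
    seen = set()
    stack = list(starts)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node not in stop_at:
            stack.extend(edges.get(node, []))
    return seen


def invert(edges):
    """Build the fanin graph (node -> drivers) from a fanout graph."""
    fanin = {}
    for src, dsts in edges.items():
        for dst in dsts:
            fanin.setdefault(dst, []).append(src)
    return fanin


def interface_logic(fanout, registers, inputs, outputs):
    """Return boundary nodes: input-to-first-register and last-register-to-output."""
    from_inputs = reach(inputs, fanout, registers)
    to_outputs = reach(outputs, invert(fanout), registers)
    return from_inputs | to_outputs


ilm = interface_logic(FANOUT, REGISTERS, INPUTS, OUTPUTS)
print(sorted(ilm))                  # ['g1', 'g5', 'in0', 'out0', 'r1', 'r3']
print(sorted(set(FANOUT) - ilm))    # ['g2', 'g3', 'g4', 'r2']
```

In this toy block, the register-to-register logic (`g2`, `g3`, `r2`, `g4`) is dropped, while the interface registers and the logic between them and the ports are preserved. A real ILM would also need to retain details this sketch ignores, such as the clock paths of the interface registers.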

1.5 Proposed Timeline with Project Goals

The researcher will meet several times a week with Dr. Ngalamou, in addition to attending the SURE Wednesday seminars.

1.5.1 General Timeline

Weeks 1-4:

• Read selected papers on timing analysis of digital circuits, along with the first four chapters of Farzad Nekoogar’s Timing Verification of ASICs.

• Get familiar with the software tool Verilator.

• Read the first four chapters of Aurélien Géron’s Hands-On Machine Learning with Scikit-Learn and TensorFlow.

Weeks 5-7: Develop a detailed software architecture of the Timing-based C/C++ Model generator.

Weeks 8-9: Start the implementation and testing of the different modules of the Timing-based C/C++ Model generator.

Week 10: Prepare for SURE presentation.

References

[1] EDA Playground. https://www.edaplayground.com/. Accessed: 2020-01-15.

[2] Icarus Verilog. http://iverilog.icarus.com/. Accessed: 2020-01-15.

[3] OpenCL™ (Open Computing Language). https://www.khronos.org/opencl/. Accessed: 2020-01-15.

[4] OSVVM: Open Source VHDL Verification Methodology. http://www.synthworks.com/blog/osvvm/. Accessed: 2020-01-15.

[5] OVM: Open Verification Methodology. https://verificationacademy.com/topics/verification-methodology/ovm. Accessed: 2020-01-15.

[6] UVM: Universal Verification Methodology. https://www.mentor.com/products/fv/uvm. Accessed: 2020-01-15.

[7] Verilator. https://www.veripool.org/wiki/verilator. Accessed: 2020-01-15.

[8] M. L. Bailey, J. V. Briner Jr., and R. D. Chamberlain. Parallel logic simulation of VLSI systems. ACM Computing Surveys, 26(3):255–294, 1994.

[9] Debapriya Chatterjee, Andrew DeOrio, and Valeria Bertacco. Gate-level simulation with GPU computing. ACM Transactions on Design Automation of Electronic Systems, 16(3):30:1–26, 2011.

[10] Harry D. Foster. Trends in functional verification: A 2014 industry study. In Proceedings of the Design Automation Conference, pages 07–11. IEEE, 2015.

[11] R. M. Fujimoto. Parallel and Distributed Simulation Systems. John Wiley & Sons, Inc., 2000.

[12] Frederic P. Miller and Agnes F. Vandome. AMD FireStream: ATI Technologies, Stream processing, , , GPGPU, High-performance computing, , R520, . Alpha Press, 2009.

[13] NVIDIA. NVIDIA CUDA Compute Unified Device Architecture: Reference Manual. NVIDIA Press, 2008.

[14] Gagandeep Singh. Gate-level simulation methodology. Pages 1–34, 2015.

[15] Yuhao Zhu, Bo Wang, and Yangdong Deng. Massively parallel logic simulation with GPUs. ACM Transactions on Design Automation of Electronic Systems, 16(3):1–20, 2011.

2 Description of Funding for Materials and Supplies

We don’t expect a need for any extra materials.

3 Criteria for Student Applicants

This research requires a student to have a background in Digital Logic, Algorithms, and Programming.

4 SURE Seminars

I am interested in Presentation Skills, Literature Search and Library Resources, and Resume Writing.

5 SURE Agreement

The James Girard Summer Undergraduate Research Program is designed to support the execution of this proposed project by the faculty mentor and a single undergraduate student. After review of faculty proposals, selected projects will be advertised to Lewis University students, and all interested undergraduates will then be required to apply to the program, denoting the project for which they would like to be considered. Student applications will be reviewed for completeness by the program director and then forwarded to the appropriate faculty mentor for final selection of a candidate. Faculty may submit up to 2 projects for funding through the program. Although faculty mentors may also mentor additional students in the summer who are not funded through the program, the weekly program events and presentations will be exclusive to students in the program.

By submitting this application, you are agreeing to the following responsibilities of a S.U.R.E. Faculty Mentor:

• Working closely with your student to ensure a worthwhile educational experience.

• Regular interactions with your student (a minimum of once a week, but more frequently
