ABSTRACT

FRANCISCO, LUIS. Machine Learning for Design Rule Checking, Multilayer CMP Hotspot Detection, and PPA Modeling, with Transfer Learning and Synthetic Training. (Under the direction of Dr. Paul Franzon and Dr. W. Rhett Davis.)

Machine learning is a rapidly advancing research topic that has been applied to multiple disciplines with high success. The problems found in electronic design automation (EDA) can also benefit from machine learning (ML) based solutions. With the constant scaling down of technology nodes, the complexity of EDA problems has drastically increased. This dissertation uses ML techniques to approach three of the most relevant problems in integrated circuit design: design rule checking (DRC), hotspot detection, and power, performance, and area (PPA) estimation. What these problems have in common is that they are complex and involve time-consuming steps. Their outcomes are needed early in the design process so that critical decisions, which ultimately affect the final design, can be made.

The number of design rules is drastically increasing as technology nodes scale down. This increase makes the design rule deck creation and checking process complex and time-consuming. This work presents a design rule checking approach using deep learning. The core of the checker consists of a framework to generate convolutional neural networks and a parameterized synthetic data generator for training. The models incorporate incremental transfer learning to reduce the training time when adding new rules. The results show that we can capture most of the checks from a design kit rule manual with less than 1% error and up to 7.5x faster than traditional design rule checkers.

Chemical mechanical polishing (CMP) is a critical process in integrated circuit (IC) manufacturing; it ensures the planarity of the layers which comprise the IC. Areas where this planarity is not met suffer significant degradation, impacting lithographic pattern fidelity and performance variability. It is desirable to predict the location of those hotspot regions. This research uses a deep learning (DL) multilayer convolutional neural network (CNN) algorithm to model CMP hotspots for full-chip multilayer layouts. The proposed model achieves a hotspot prediction accuracy of up to 98% with 10 metal layers and runs up to 10x faster than existing CMP tools.

The power, performance, and area (PPA) of a System-on-Chip (SoC) is known only after a months-long process. This process includes iterations over the architectural design, register transfer level (RTL) implementation, RTL synthesis, and place and route. Knowing the PPA estimates for a system early in the design stages can help resolve tradeoffs that will affect the final design. This work presents a machine learning approach using gradient boost models and neural networks to quickly and accurately predict the PPA. The models use transfer learning to predict the PPA for new design configurations and corner conditions based on previous models. The proposed models achieved PPA predictions up to 99% accurate and, using as few as 10 samples, can achieve accuracies better than 96%.

© Copyright 2021 by Luis Francisco

All Rights Reserved

Machine Learning for Design Rule Checking, Multilayer CMP Hotspot Detection, and PPA Modeling, with Transfer Learning and Synthetic Training

by Luis Francisco

A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Electrical Engineering

Raleigh, North Carolina

2021

APPROVED BY:

Dr. Aydin Aysu                          Dr. Franc Brglez

Dr. Paul Franzon                        Dr. W. Rhett Davis
Co-chair of Advisory Committee          Co-chair of Advisory Committee

DEDICATION

To my family and friends who have supported me in all my journeys.

BIOGRAPHY

Luis Francisco was born in a small town in the Dominican Republic. He received a Bachelor

of Science degree in Electronics Engineering from Pontificia Universidad Católica Madre y

Maestra in 2007. From 2009 to 2011, he attended the University of Puerto Rico at Mayaguez

and obtained a Master of Science degree in Electrical Engineering. In 2016 he went back to

the University of Puerto Rico at Mayaguez for his Ph.D. and transferred to North Carolina

State University in 2017.

At North Carolina State University, Luis joined Dr. Paul Franzon and Dr. Rhett Davis's research group, working on machine learning applied to electronic design automation

problems in emerging technologies. While working on these research topics, he became

a student member of the Center for Advanced Electronics through Machine Learning

(CAEML).

During his time at North Carolina State University, Luis did several full-time and part-time internships at GlobalFoundries Inc.

ACKNOWLEDGEMENTS

Transferring to North Carolina State University to join the Ph.D. program in Electrical Engineering is a decision I will never regret. I will always be indebted to my advisor, Dr. Paul Franzon, for giving me the opportunity to join the program and for supporting me. Without that first opportunity, I would not be writing these dissertation acknowledgments.

I also want to express my gratitude to my co-advisor, Dr. Rhett Davis, for welcoming me

into his research group and supporting me since the first day we met, and for all those much-needed chats, including those not work-related. I am also grateful for the guidance, support,

and patience from my advisors and the opportunity to work in exciting and challenging

research.

I also want to thank Dr. Aydin Aysu and Dr. Franc Brglez for taking the time to be part of my graduate committee and for the feedback they provided.

I am always grateful to have had the opportunity to meet my mentor at GlobalFoundries, Robert Pack, while interning there. He provided me with great professional advice, to the point that he became a friend to me.

My gratitude also goes to my boys from MRC: Billy, Teddy, Yuejiang, Bowen, Yi, Priyank,

Isaac, and Josh. It is a shame that during this last year, we were not able to go to MRC.

Finally, I want to give my most sincere gratitude to my friends and family; it is your support

and love that keeps me going.

TABLE OF CONTENTS

LIST OF TABLES ...... viii

LIST OF FIGURES ...... x

Chapter 1 Introduction ...... 1
1.1 Motivation ...... 3
1.1.1 Need to Revisit Design Rule Checking ...... 3
1.1.2 Chemical Mechanical Polishing Hotspot Detection ...... 4
1.1.3 Need for Fast and Accurate PPA Models ...... 5
1.2 Research Contributions ...... 6
1.3 Outline ...... 8

Chapter 2 State-of-the-art ...... 9
2.1 Introduction ...... 9
2.2 Design Rules Checking ...... 10
2.3 Multilayer CMP Hotspot Modeling ...... 12
2.4 Power, Performance and Area Modeling ...... 13

Chapter 3 Deep Learning Background ...... 16
3.1 Introduction ...... 16
3.2 Deep Learning Models Structure ...... 17
3.3 Deep Learning Hyperparameters ...... 17
3.4 Convolutional Neural Network ...... 19
3.4.1 Loss function ...... 20
3.4.2 Optimizer ...... 21
3.4.3 Metrics to Measure the Model Performance ...... 21

Chapter 4 Design Rule Checking with Deep Learning ...... 23
4.1 Introduction ...... 23
4.2 Design rule checking ...... 24
4.2.1 How to use Deep Learning for DRC ...... 25
4.3 Single Convolutional Neural Network Model ...... 26
4.3.1 Model Structure Selection ...... 27
4.3.2 Dataset to Train and Test the Single CNN Model ...... 29
4.3.3 Dataset Generation Process ...... 30
4.3.4 Results for Testing the Single CNN Model ...... 32
4.4 DRC Framework with Transfer Learning and Synthetic Training ...... 35
4.4.1 Deep Learning Architecture with Transfer Learning ...... 36
4.4.2 Selecting the Layers Size and Adding DRCs ...... 39
4.4.3 Synthetic Dataset Generator ...... 40
4.4.4 Results for the DRC Framework ...... 43
4.5 Testing New Layouts Process ...... 49
4.6 Summary ...... 50

Chapter 5 Multilayer CMP Hotspot Modeling with Deep Learning ...... 52
5.1 Introduction ...... 52
5.2 Chemical Mechanical Polishing ...... 53
5.3 CMP Hotspot Modeling ...... 54
5.4 Proposed Modeling Architecture ...... 56
5.4.1 Multilayer CNN Model ...... 56
5.4.2 Model Details ...... 57
5.5 Dataset Generation ...... 59
5.5.1 Dataset Augmentation ...... 60
5.6 Model Results ...... 62
5.6.1 Results With a Small Training Dataset ...... 63
5.6.2 Results Increasing the Dataset Size ...... 64
5.6.3 High and Low Topography Hotspots ...... 66
5.7 Summary ...... 71

Chapter 6 Power, Performance and Area Modeling ...... 72
6.1 Introduction ...... 72
6.2 Power, Performance and Area ...... 73
6.3 PPA Modeling with Machine Learning ...... 74
6.3.1 Gradient Boost Regressor ...... 75
6.3.2 Neural Network with Transfer Learning ...... 75
6.3.3 Other ML Algorithms Evaluated for PPA Prediction ...... 77
6.4 Parameters Used for the PPA Models ...... 78
6.5 Model Creation and Data Collection Framework ...... 80
6.5.1 Sampling the Input Parameters ...... 81
6.5.2 Re-sampling Existing Dataset ...... 83
6.6 Results ...... 86
6.6.1 The Datasets ...... 86
6.6.2 Comparing ML Models ...... 90
6.6.3 Comparing PPA Models With the Number of Samples ...... 95
6.6.4 Predicting Core Configurations PPA with Transfer Learning ...... 100
6.6.5 Predicting Corner Conditions PPA with Transfer Learning ...... 105
6.7 Summary ...... 108

Chapter 7 Conclusion and Future Work ...... 110
7.1 Conclusions ...... 110
7.2 Future Work ...... 111

BIBLIOGRAPHY ...... 114

APPENDICES ...... 120
Appendix A Additional DRC Results ...... 121
A.1 DRC Results for a Set of SRAM Layouts ...... 121
A.2 Synthetic Layout Clips Samples ...... 123
Appendix B Additional PPA Results ...... 126

B.1 PPA Variations for a CNN Accelerator ...... 126
B.2 Additional PPA Variations for SRAM Dataset ...... 128
B.3 Additional PPA Models Evaluation Performance ...... 129
B.4 Additional Results for Accuracy vs Training Samples ...... 132
B.5 Additional Results for Transfer Learning Models Evaluation for Core Configurations ...... 138
B.6 Additional Results for Transfer Learning Models Evaluation for Corner Conditions ...... 143

LIST OF TABLES

Table 3.1 Confusion matrix definition ...... 21

Table 4.1 Model parameters example for 1 and 3 DRC violations ...... 29
Table 4.2 Labels in the dataset for 3 rules ...... 31
Table 4.3 Results to classify one single DRC violation ...... 33
Table 4.4 Results summary of testing 20 different types of violations in layout data unseen by the model. The data consist of an SRAM layout and a Rocket Core design ...... 48

Table 5.1 Metrics for the model with data augmentation ...... 64
Table 5.2 Confusion matrix and performance stats for model trained with 45k samples ...... 66
Table 5.3 Confusion matrix and stats for model trained for 3 classes ...... 68
Table 5.4 Model accuracy vs. number of layers included in the training data ...... 69

Table 6.1 Designs in the dataset ...... 87
Table 6.2 Models evaluation for Rocket Core Default configuration ...... 92
Table 6.3 Models evaluation for Rocket Core Small configuration ...... 93
Table 6.4 Models evaluation for OpenRisc 1200 core ...... 93
Table 6.5 Models evaluation for SRAM memories ...... 94
Table 6.6 PPA Accuracy for different train samples for Rocket Core ...... 95
Table 6.7 PPA Accuracy for different train samples for OR1200 ...... 97
Table 6.8 PPA Accuracy for different train samples for Memories ...... 100
Table 6.9 Optimal layers to share in the transfer learning neural network ...... 103
Table 6.10 Transfer Learning models to predict core configurations ...... 104
Table 6.11 Transfer Learning models to predict Corners for OR1200 results ...... 107
Table 6.12 Transfer Learning models to predict Corners for Rocket results ...... 108

Table A.1 Results summary of testing 20 different types of violations in layout data unseen by the model. The data consist of a new set of SRAM layouts ...... 122

Table B.1 Models evaluation for Rocket Core Dual configuration ...... 130
Table B.2 Models evaluation for Rocket Core Medium configuration ...... 130
Table B.3 Models evaluation for Rocket Core Tiny configuration ...... 131
Table B.4 Models evaluation for CNN accelerator IP block ...... 131
Table B.5 PPA Accuracy for different train samples for Rocket Core Medium configuration ...... 133
Table B.6 PPA Accuracy for different train samples for Rocket Core Small configuration ...... 134
Table B.7 PPA Accuracy for different train samples for Rocket Core Tiny configuration ...... 135

Table B.8 PPA Accuracy for different train samples for Rocket Core Dual configuration ...... 136
Table B.9 PPA Accuracy for different train samples for CNN IP block ...... 137
Table B.10 Transfer Learning results for Tiny Core base model with 50 and 150 samples ...... 139
Table B.11 Transfer Learning results for Small Core base model with 50 and 150 samples ...... 140
Table B.12 Transfer Learning results for Dual Core base model with 50 and 150 samples ...... 141
Table B.13 Transfer Learning results for Medium Core base model with 50 and 150 samples ...... 142
Table B.14 Transfer Learning results for Rocket Corner FF_0p88V_125C_Rcmax with base model with 50 and 150 samples ...... 144
Table B.15 Transfer Learning results for Rocket Corner SS_0p72V_125C_Cmax with base model with 50 and 150 samples ...... 145
Table B.16 Transfer Learning results for Rocket Corner SS_0p72V_125C_Rcmax with base model with 50 and 150 samples ...... 146
Table B.17 Transfer Learning results for Rocket Corner SS_0p72V_m40C_Cmax with base model with 50 and 150 samples ...... 147
Table B.18 Transfer Learning results for OR1200 Corner FF_0p88V_125C_Rcmax with base model with 50 and 150 samples ...... 148
Table B.19 Transfer Learning results for OR1200 Corner SS_0p72V_125C_Cmax with base model with 50 and 150 samples ...... 149
Table B.20 Transfer Learning results for OR1200 Corner SS_0p72V_125C_Rcmax with base model with 50 and 150 samples ...... 150
Table B.21 Transfer Learning results for OR1200 Corner SS_0p72V_m40C_Cmax with base model with 50 and 150 samples ...... 151

LIST OF FIGURES

Figure 3.1 Deep learning network architecture ...... 18
Figure 3.2 Deep learning training flow ...... 19

Figure 4.1 Basic design rule definition ...... 24
Figure 4.2 Using a deep learning model to detect DRC violations ...... 25
Figure 4.3 Deep learning model training flow ...... 26
Figure 4.4 Proposed model structure ...... 28
Figure 4.5 Dataset generation ...... 31
Figure 4.6 Training and validation loss for 1 DRC model ...... 32
Figure 4.7 Confusion matrix for 3 DRC model ...... 33
Figure 4.8 Training and validation loss for 3 DRC model ...... 34
Figure 4.9 Layout clip samples for one layer (a) and two layers (b) ...... 36
Figure 4.10 Deep network architecture to add new DRC violations incrementally by sharing weights ...... 38
Figure 4.11 Parameterized synthetic dataset generation ...... 40
Figure 4.12 Training and validation accuracy for rules with a completely trained deep network vs. partially trained with transfer learning ...... 45
Figure 4.13 Training vs. validation accuracy for 13 of the rules implemented ...... 46
Figure 4.14 Training vs. validation loss for 13 of the rules implemented ...... 47
Figure 4.15 Inference flow to test new layouts ...... 50

Figure 5.1 Dishing and erosion example for a given layer ...... 53
Figure 5.2 Hotspot definition according to the topography profile ...... 55
Figure 5.3 Basic deep learning structure ...... 56
Figure 5.4 Proposed multilayer deep learning model for CMP hotspot detection ...... 58
Figure 5.5 Model example for 1 layer, 262,000 parameters for 1 layer patterns ...... 59
Figure 5.6 Dataset generation and distribution ...... 60
Figure 5.7 PCA data augmented sample ...... 61
Figure 5.8 Training and validation accuracy vs. epochs for training data with augmentation ...... 63
Figure 5.9 Training and validation accuracy vs. epochs for model trained with 45,000 samples ...... 64
Figure 5.10 Training and validation accuracy vs. epochs for model trained with 54,000 samples ...... 65
Figure 5.11 Training and validation accuracy vs. training epochs for 3 classes model ...... 67
Figure 5.12 Model accuracy vs. number of layers included in the training data ...... 67
Figure 5.13 Model accuracy for the number of lower layers used to detect a HS ...... 69
Figure 5.14 Full layout test using multilayer CMP hotspot model ...... 70

Figure 6.1 Area vs. performance variations for a small IP block design ...... 74
Figure 6.2 Neural network with shared weights ...... 76
Figure 6.3 Model and data generation framework ...... 81

Figure 6.4 Latin hypercube samples for N = 4 samples for a 2-D parameter space ...... 82
Figure 6.5 Latin hypercube 500 samples for the 4 synthesis parameters ...... 82
Figure 6.6 Re-sampled parameters using the algorithm described to reduce the dataset to 50 samples ...... 84
Figure 6.7 Area vs. performance variations for the dataset with 500 samples for 5 corner conditions ...... 85
Figure 6.8 Area vs. performance variations for the dataset with 50 samples for 5 corner conditions ...... 85
Figure 6.9 Area vs. performance for the Rocket core variations ...... 87
Figure 6.10 Area vs. leakage for the Rocket core variations ...... 88
Figure 6.11 Area vs. performance for OR1200 for multiple corner conditions ...... 88
Figure 6.12 Area vs. dynamic power for OR1200 for multiple corner conditions ...... 89
Figure 6.13 Access time vs. word depth for memories ...... 89
Figure 6.14 Dynamic read energy vs. word depth for memories ...... 90
Figure 6.15 Models accuracy vs. number of train samples for Rocket Default configuration ...... 96
Figure 6.16 Models accuracy vs. number of train samples for Rocket Small configuration ...... 96
Figure 6.17 Models accuracy vs. number of train samples for OpenRisc 1200 core ...... 97
Figure 6.18 Individual PPA components accuracy vs. number of train samples for XGB ...... 98
Figure 6.19 Individual PPA components accuracy vs. number of train samples for NN ...... 99
Figure 6.20 Models accuracy vs. number of train samples for SRAM ...... 99
Figure 6.21 Individual PPA SRAM components accuracy vs. number of train samples for NN ...... 101
Figure 6.22 Neural network loss vs. training iterations ...... 101
Figure 6.23 Weights shared vs. accuracy for core configuration modeling ...... 102
Figure 6.24 Accuracy for PPA models with transfer learning to predict core configurations ...... 103
Figure 6.25 Weights shared vs. accuracy for corner condition modeling ...... 105
Figure 6.26 Accuracy for PPA models with transfer learning to predict corner conditions for OR1200 ...... 106
Figure 6.27 Accuracy for PPA models with transfer learning to predict corner conditions for Rocket ...... 107

Figure A.1 Clean layout clip sample 1 ...... 123
Figure A.2 Clean layout clip sample 2 ...... 124
Figure A.3 Violation layout sample 1 ...... 124
Figure A.4 Violation layout sample 2 ...... 125
Figure A.5 Two layers layout sample ...... 125

Figure B.1 Area vs. leakage power for a CNN accelerator IP block for 5 corner conditions...... 127

Figure B.2 Area vs. dynamic power for a CNN accelerator IP block for 5 corner conditions ...... 127
Figure B.3 Area vs. performance for a CNN accelerator IP block for 5 corner conditions ...... 128
Figure B.4 Area vs. word depth for memories ...... 128
Figure B.5 Leakage power vs. word depth for memories ...... 129

CHAPTER

1

INTRODUCTION

Machine learning (ML) is a mature enough topic to solve many problems in multiple disciplines, and electronic design automation (EDA) problems are no exception. For years in electronic design automation, we have been using machine learning as a workaround to traditional algorithms rather than updating the EDA tools to include machine learning algorithms. With the advances in ML, now is the time to rethink that approach and start using machine learning-based algorithms in the core of EDA tools in all the design stages.

Problems that can benefit from ML can be found in the entire integrated circuit (IC) design process, going from the physical verification process to the synthesis and architectural stages. Problems like hotspot detection, design rule checking (DRC), and power, performance, and area (PPA) estimation can benefit from data-driven solutions. Those benefits include fast evaluation times, models that are not technology-dependent, and solutions that apply to multiple designs with minor changes.

As the technology nodes scale down, the complexity of the previously mentioned problems drastically increases. The IC design cycle is known to be time-consuming. There is also a need to fully or partially address those problems early in the design loop. In scenarios like this, ML can play an important role. ML can help accelerate the design process, and when it cannot substitute traditional algorithms, it can give some initial guidance.

Given how fast technology is changing and how the demand for new IC designs constantly increases, we need to learn from previous designs. Solutions that learn from previous designs can help accelerate new designs and technology development. In this situation, machine learning using transfer learning can help create models to explore the design space for new technologies.

This dissertation can be considered multitopic; we address three problems using innovative machine learning solutions. For the first two problems, we use deep learning to detect and classify design rule violations and chemical mechanical polishing (CMP) hotspots. Those two problems are in the physical verification domain. The models we use allow us to reduce any technology dependency and ease the process of transferring the models to new process nodes. We create a design rule checker that includes transfer learning and synthetic data to detect design rule violations, allowing full scalability to include a complete ruleset.

The third problem we address is power, performance, and area (PPA) modeling. This problem is more on the side of the design domain. We create fast and accurate PPA models using gradient boost and neural network algorithms. These models are data-driven and accurate enough to optimize the system with reasonable accuracy. In PPA modeling, the most time-expensive aspect is generating the data. We optimize our solution to require a minimum amount of data; we also incorporate transfer learning to help create models for new designs and corner operating conditions.

The nexus of the three problems we are addressing is the use of machine learning techniques to solve them. The inclusion of machine learning will allow having the outcome of each problem in the early stages of the design process. ML will also help transfer the models to new technology processes and learn from previous designs when creating new ones. We use deep learning to tackle DRC detection and CMP hotspot modeling. To add rules to the DRC, we take advantage of transfer learning, which is also used to do PPA prediction of new designs based on previous ones.

1.1 Motivation

From the architectural design stages to the final physical verification stages, the chip design process is time-consuming. Iterations of each step can take from hours to weeks. We need to find fast and accurate solutions to accelerate those steps. The complexity of each design stage increases exponentially in advanced technology nodes. These issues and needs create a connection among the problems we are addressing in this dissertation.

1.1.1 Need to Revisit Design Rule Checking

Design rule checking (DRC) is the process of verifying that a design layout meets a set of

rules that make it suitable for fabrication. The rules define geometric constraints to satisfy

the physical limitations of the lithography and the fabrication process. Satisfying these

constraints guarantees that the design will achieve a high fabrication yield. If a circuit layout

fails to pass design rule checks, it will not be manufacturable.

The design rule checking process is becoming highly complex in advanced technology

nodes, significantly below 28nm. Every foundry and each technology node requires unique

rule checks. This difficulty is related to the complexity of the lithography process and the

minuscule depth-of-focus tolerance. According to EDA vendors, the number of rules has grown from a few hundred in 60nm nodes to thousands in 7nm nodes. To be more specific, in a 7nm process there can be over 10,000 rule checks [10].

Since there has been an increase in the intricacies of the rules, the speed of the rule

checker has become a critical point in the layout generation of custom circuits. Before

advanced nodes, layout engineers could recognize patterns and fix DRC violations with

relative ease. Nowadays, the design engineer waits a long time for the final sign-off DRC

checker to run. The engineer then fixes the problems flagged, often only to introduce new

ones. This iterative process is becoming unacceptable due to the slow checkers and the complexity of the rules.

A machine learning DRC can help speed up the checking process because it does not

have to iterate over thousands of Boolean operations. We are not suggesting replacing the

final sign-off DRC with an ML-based one but using the faster ML solution while iterating

on the design. As we will explain in chapter 2, there have been some approaches to use ML in DRC, but only to estimate a rough number of violations, which will not help in custom layouts or to fix the violations.

1.1.2 Chemical Mechanical Polishing Hotspot Detection

The minuscule depth-of-focus (DoF) tolerance of advanced lithography necessitates the

best possible wafer surface planarity. Chemical mechanical polishing (CMP) steps ensure we obtain a planar surface on the wafer. The number of CMP steps required for a wafer has increased fourfold from the 28nm to the 7nm process [21]. The areas where the surface is not planar are hotspot areas; the more of those areas we miss, the greater the reduction in manufacturing yield.

The hotspots due to the chemical and mechanical polishing steps are layout pattern-dependent. The more complex a layout becomes, the harder it is to find those areas. Finding and fixing CMP hotspots is critical; they not only affect the manufacturing yield but also degrade the physical features of the chip. We try to reduce the CMP hotspots by adding dummy fills; the problem with dummy fills is that they affect the parasitics.

Every manufacturing process requires unique CMP decks. Creating the CMP decks is a time-consuming process that can take months. As we will discuss in chapter 2, the CMP models are fit with wafer measurements from test chips. This bottleneck creates the need to explore solutions that can be created with existing data, including existing simulation results or real measurement data.

A CMP hotspot modeling approach with machine learning will help shorten the deck creation and reduce the inference time of the models. An ML-based approach like the one we are proposing will reduce the need to rely on expensive proprietary tools to do CMP hotspot detection.

1.1.3 Need for Fast and Accurate PPA Models

The power, performance, and area (PPA) of a System-on-Chip (SoC) is known only after a months-long process. This process includes iterations over the architectural design, register-transfer level (RTL) implementation, RTL synthesis, and place and route. As this process can take months or even years for an entire chip, missing a PPA target can create significant delays in the design process.

Having an estimate of the PPA in the early design stages can help make critical decisions that will affect the entire design process. The earlier in the design process we can estimate the PPA, the more potential we have to optimize [51]. There is also a need to predict power fluctuations accurately enough to aid thermal power management design.

A PPA estimation model with the flexibility to be used in a system-level framework is

highly desired. The model should accurately predict PPA for intellectual property (IP) blocks, cores, and memories. Another desired feature is that it can depend on as many higher-

level parameters as possible and can be integrated into an optimization framework. If each

component is packed with a fast PPA model, the design time will be significantly reduced.

When doing design exploration or PPA optimization, one question always arises: is it possible to use previous designs to predict future design configurations or corner conditions?

As we will detail in chapter 2, one of the main limitations in creating fast and accurate PPA models is the reduced amount of data we can obtain. Generating good and accurate PPA data requires the use of complex EDA tools that take hours or even days to produce a single data point.

Exploring machine learning approaches for PPA modeling that focus on reducing the data required to train the models can significantly help create fast and accurate PPA models. Using sampling strategies and transfer learning to reuse the models trained for previous designs should help with this data reduction. We are proposing ML models, but models that are aware of the amount of data required for training.

1.2 Research Contributions

This research brings forth contributions for each of the problems we are addressing. The

main contributions we can find in this dissertation include:

1. The creation of a design rule checker using CNN feature extraction to detect DRC violations.

Design rule checking with machine learning had been focused on predicting a total number of violations. This work introduces a new approach that detects multiple violations at the same time and labels them. It can detect violations up to 32x faster than traditional checkers and with 92% accuracy.

2. Proposed the use of transfer learning to expand the design rule checker to any

number of rules.

One limitation of the previous CNN-based design rule checker is that adding more rules requires too much data. The inclusion of transfer learning reduces the amount of data needed to train the model for new rules, as well as the training time. Since each sample is tested for all the violations, the false positive rate decreases.

3. Created a fully parameterized synthetic dataset generator to produce DRC training data.

The amount of data available to train complex models for DRC is limited; this scarcity can make the process of including new rules a limitation. The synthetic data generator defines a sampling space of clean and violation layouts and can generate data samples covering the sampling space in minutes.

4. Implemented a fully scalable design rule checker with transfer learning and training with synthetic data.

To scale the design rule checker to any number of rules, we combined the synthetic dataset generator with a deep transfer learning framework to make the process of adding new rules fast and straightforward. This framework can be transferred to a new technology quickly. We achieved a violation detection rate better than 99%, more than 7x faster than traditional checkers.

5. Created a multilayer CMP hotspot model through deep learning.

Typically we find applications of deep learning in EDA with up to 3 channels of input data. The multilayer model can take any number of metal layers in a layout as input and predict hotspots with an accuracy better than 98%, up to 10x faster than existing CMP tools.

6. Fast and accurate power, performance, and area models with machine learning.

Obtaining a PPA simulation is a time-consuming process that can take hours to generate a prediction. The proposed models evaluate in fractions of a second and produce PPA predictions better than 99% accurate. These models can be packed with an IP block or core generator to replace the synthesis tool when doing PPA design exploration.

7. Fast and accurate power, performance, and area models using a reduced number

of samples.

The amount of data needed to create a PPA model can be a limitation. The proposed gradient boost models can create PPA predictions for IP blocks and cores with over 95% accuracy with only 10 training points.

8. Power, performance, and area models to predict core configurations using transfer learning.

One recurrent problem is how to learn from previous designs to predict new designs. Using a transfer learning neural network, the models can predict the PPA for new core configurations based on a previous one with close to 96% accuracy. These models can be used to explore and optimize core configurations.

9. Power, performance, and area models to predict corner conditions using transfer learning.

As mentioned before, the amount of data is the most significant limitation in creating a PPA model. Using transfer learning, the models can predict all the corners requiring only 15 new samples to achieve accuracies higher than 94%.

1.3 Outline

The rest of this document is organized as follows: chapter 2 reviews the most recent literature on the topics we are addressing, and chapter 3 provides some background in deep learning that will be needed in the following chapters. In chapter 4 we detail the proposed approach to design rule checking. Chapter 5 shows the proposed deep learning approach to multilayer chemical mechanical polishing hotspots. Chapter 6 details the proposed solution to power, performance, and area modeling. The final chapter 7 contains a summary and outlines how this work can be extended. Additional supporting results are included in the appendices.

CHAPTER

2

STATE-OF-THE-ART

2.1 Introduction

In the state-of-the-art, we can find relevant work related to the problems we are addressing. Not all the literature found provides a machine learning (ML) approach, but it provides some fundamentals to establish the need for machine learning. ML has proved able to solve multiple problems, showing significant improvement over traditional methods. In chip design and electronic design automation (EDA), ML is being explored from design exploration to physical verification to accelerate the design cycle [30]. This section organizes the most recent work found in the literature into three subsections associated with each problem we are approaching. We provide an overview of the most relevant research for each subsection and how it relates to and differs from the work we are proposing.

2.2 Design Rules Checking

Some work on improving the design rule checking (DRC) rule definitions and the checking process can be found in the research state-of-the-art. A common theme in recent research on DRC is to move from model-based approaches to data-driven solutions. An existing methodology to redefine DRC rules is DRC+ [9]. DRC+ identifies hotspots using pattern printability simulation; when a problematic pattern is identified, a DRC+ rule is crafted to mitigate its effect in future fabrication steps. DRC+ predates the machine learning revolution. It relies more on pattern matching; approaches like this are beneficial for complex rules [15].

Another data-driven approach is found in OpenDFM [5]. In contrast with conventional DRC, OpenDFM is not a binary checker that provides only a pass/fail output; it offers more information about a set of parameterized patterns. This work is a starting point for exploring new data-driven DRC approaches.

Lithography hotspot detection shares some similarities with design rule checking; both have to check polygon geometry. Some works apply ML to lithography [54, 65, 67]. In [67], a support vector machine is trained with geometrical and other critical features. In contrast, [54, 65] proposed deep learning without any previous feature extraction; the model itself performs the feature extraction. [66] presents binary hotspot detection using a convolutional neural network. This work uses an adaptive squish pattern algorithm to generate training data and can detect hotspots in up to two layers. Our proposed work uses a similar deep learning approach as its base model; however, it goes beyond hotspot detection, focusing on detecting multiple DRC violations at the same time. We also focus on multiple layers.

Recent works applying machine learning to the design rule problem, most of them focusing on predicting the number of violations, include [23, 25, 26, 37, 68, 69]. These works focus on layouts generated from a VLSI physical design flow [64]. We can place them on the side of traditional machine learning.

The work in [25, 26] predicts DRC violations before the routing stage by using random forest, gradient boost, and a voting strategy to combine both [25]. The goal of this model is to predict routability and the potential presence of DRC violations. Those works achieve a true positive rate, or recall, of up to 88%. Another work using a similar approach with random forest is found in [69]. This work includes an explanatory model to analyze the features that affect the total number of routing violations.

The research in [37] uses a set of input knobs to the physical design flow to construct a surrogate model and predict the number of violations after the detailed route. This work

tries to predict the number of violations using information from the global route.

The work in [68] employs an ensemble of neural networks (NNs) to predict rule violations after the detailed route stage. The NNs use a set of parameters from the global route as training data. Finally, after a voting algorithm, a final prediction is made and compared to a random forest model. In [57], a machine learning method is used to detect detailed-routing short violations. A neural network is trained to predict short violations using features extracted from a placed and routed netlist.

On the side of deep learning, we can find some work to predict DRC violations [23, 41,

56, 64]. These works focus on a VLSI physical design flow with generated layouts. [64] uses a convolutional neural network (CNN) to predict routability. In this process, they predict

the number of DRC violations and hotspot areas. The model takes input features from

placement and route stages represented as images. Those features are extracted from each

stage of the physical design flow (floorplanning, placement, and routing). [41] uses a CNN and trains it with images of the pin locations in the layout and other features extracted

from the global route. Both [41, 64] can locate hotspot areas around placed cells.

The research in [23] uses a convolutional neural network trained with global routing features to create a congestion map with the total number of violations. The training data consists of up to 15 commercial designs. [56] expands the work in [57]; using a larger deep network, the short-violation prediction accuracy is 90%.

These solutions focus on predicting the number of DRC violations in global and detailed routing; this has relevance only in a physical design flow for digital ICs with generated layouts. To have a DRC violation approach that works for custom layouts, we need to be able to classify multiple violations at the same time. We can find transfer learning applied to some EDA problems [34, 64]. [34] implements neural network models that partially share weights for power and performance estimation in memories and provides some guidance on how sharing weights affects performance. The work in [64] starts with a pre-trained RouteNet structure for image pattern recognition and tunes it with the specific training data.

In contrast with the works previously described, our proposed design rule checker can handle multiple types of DRC violations. This approach not only finds the number of violations and locates possible hotspot areas, but also classifies them and points to the violations in a small action window. This difference makes the framework ideal for both tool-generated and full-custom layouts. The feature extraction is performed by the model using the convolutional layers. This method is expandable to any number of rules, including the most complex ones, and can be extrapolated to different technology nodes. It is suitable for use on chip layouts to quickly and accurately locate and classify violations while iterating on designs. An approach like this can help close the gap between DRC and design for manufacturing (DFM) by providing the ability to train the model with fab data and with feedback from designers.

2.3 Multilayer CMP Hotspot Modeling

In the current state-of-the-art for CMP simulation, most of the modeling techniques involve semi-empirical models [22, 27, 28]. These models consist of kernels that are calibrated/fit with measured data from test chips with artificial and known geometries assumed to be representative of all patterns that will be seen on a real design. Manual geometric extraction from the layout improves the calibration process. The calibration process may be painful, time-consuming, and laborious, resulting in a productivity bottleneck; it can require weeks of measurements and simulations, and due to the need to process test wafers, model calibration can be a costly iterative process. This procedure is typically performed using proprietary commercial tools [17]. These limitations have prompted our exploration of new CMP modeling techniques, in this case, data-driven methods.

Regarding hotspot modeling, most of the work in the literature is for lithography hotspot detection [54, 65, 67]. Much of the ML research for hotspot detection depends on algorithms that require previous feature extraction; the feature extraction is done with physical parameters of the designs and technology nodes [67]. There is ML work for lithography hotspot detection that uses deep learning (DL); in these cases, the algorithm itself does the feature extraction [54, 65].

In terms of machine learning for CMP modeling, only one prior work can be found in recent literature [16]. This work focuses on performing geometric extractions from layouts to train a feed-forward neural network with several hidden layers to obtain a surface profile. Our work proposes a multilayer deep learning algorithm that uses layout patterns to predict CMP hotspots. The proposed model takes as input a tensor containing all the layout layer patterns to identify high- and low-topography hotspots.

2.4 Power, Performance and Area Modeling

Power, performance, and area have been explored using different techniques over time. We can find approaches to estimate PPA at all the design stages, including high-level synthesis, RTL, and place and route.

One of the PPA estimation tools most used in research is McPAT [38]. McPAT allows estimating power by using analytical models, without the need for RTL code. McPAT generates those models from a given structural configuration of a processor. One limitation of this tool is that each core requires unique models, and there are multiple sources of error that can affect the outcomes [60]. In contrast, we propose a model that is less dependent on the core, which can also be used for memories and intellectual property (IP) blocks. [36] presents a regression approach to calibrate McPAT power models; the goal of this work is to reduce the errors in the power estimation of McPAT models.

In terms of machine learning for PPA estimation, we can find several works. The research in [71] uses a convolutional neural network (CNN) approach for power estimation. The CNN is trained with power simulations of an RTL/SystemC model for a RISC-V core. This approach needs different test benches to generate training data; this can be seen as a limitation since extra test benches need to be created. Another limitation is that the model is not trained with architectural parameters or parameters that can include technology information. This approach does not provide a prediction for area and delay.

Other research for PPA estimation that uses ML is found in [19, 62]. [62] uses clustering and neural networks to classify different power/performance profiles. A similar approach appears in [19], but for heterogeneous systems, including multiple core frequencies and memories. The main limitation of these two works is that they require real measurements on the hardware. This limitation reduces the design space exploration ability and does not provide the estimates early in the design process. As the ML algorithms are trained with existing hardware measurements, no area estimations are provided.

The work in [70] focuses on predicting the dynamic average power. This work uses a graph neural network to generate a vector-based power estimation in a 1000-cycle window. A power, performance, and area (PPA) predictor for memories using neural networks is presented in [35]. The proposed model needs over 6,000 data samples to achieve a 3% error. They also offer an optimization strategy to get the optimal PPA. This strategy consists of analyzing all possible memory combinations in the prediction model.

The work from [35] is extended in [34] to use transfer learning and move parameters from one memory compiler model to another. The main limitation of this work is the number of samples needed to create a single memory size model. This work provides some guidance on how to analyze weight sharing. In contrast to this work, we focus on creating PPA models for memories while reducing the number of samples.

Work for design space exploration using high-level synthesis parameters for a specific design can be found in [59]. This work is not focused on PPA but provides insights into changing parameters in a C++ template to produce an optimal design. Another work for design exploration in high-level synthesis is presented in [33]. This work uses transfer learning to explore the quality of a design generated by a high-level synthesis tool, based on a neural network pre-trained for previous designs.

The work in [42] introduces a machine learning framework for dynamic power in field-programmable gate arrays (FPGAs). This framework uses high-level synthesis and feature-extracted test benches to generate switching activity data. They can create models with an average maximum absolute error of 9%. The main difference between these works [33, 42, 59] and what we are proposing is that we focus on the PPA modeling of the final design, not on the quality of a design generated by a high-level synthesis tool.

In contrast with the previous works described in this section, we focus on creating PPA models with a reduced number of samples. We also provide different modeling alternatives that can use high-level parameters and synthesis parameters. We focus our efforts on obtaining PPA models that are at least 95% accurate. We want those models to have the ability to predict new designs and corner conditions based on models created for previous designs and corner conditions.

CHAPTER

3

DEEP LEARNING BACKGROUND

3.1 Introduction

This chapter provides some background on deep learning models and their components. We also give some insights on how to evaluate performance when using deep learning models for inference. This background will be necessary to understand the models we use for the design rule checker and the chemical mechanical polishing hotspot modeling. This chapter also provides fundamentals on neural network (NN) models, which we use combined with transfer learning for power, performance, and area modeling.

3.2 Deep Learning Models Structure

A deep learning (DL) model learns from data emphasizing successive learning layers; that is, extracting successively more meaningful representations of the data [8]. The deep learning model can learn from the data without adding parameters specific to the problem we are solving. This feature creates a wide range of applications for such models. The term deep refers to the peculiarity of this type of machine learning (ML) architecture of including a high number of hidden layers to learn features from the data. The model structure consists of an input layer, a set of feature extraction or hidden layers, an intermediate layer, and an output layer. Figure 3.1 illustrates an artificial neural network architecture.

In the structure from Figure 3.1, the input layer maps the input to the first intermediate layer; this means its size depends on the input size. The hidden layers perform a feature extraction to learn from the data. The intermediate layers map the learned features to an output layer that provides the model output. This output can be a label or prediction probability if we are solving a classification problem, or a discrete or continuous value if doing a regression.
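To make this structure concrete, the following is a minimal sketch in Keras of the generic architecture just described; the layer sizes and the four-class output are illustrative assumptions, not the models used later in this work.

```python
# Minimal sketch of the generic structure described above (Keras API).
# All sizes here are illustrative assumptions, not the models used later.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(64,)),               # input layer: size follows the input data
    layers.Dense(128, activation="relu"),   # hidden layers: successive feature extraction
    layers.Dense(128, activation="relu"),
    layers.Dense(32, activation="relu"),    # intermediate layer: maps the learned features
    layers.Dense(4, activation="softmax"),  # output layer: class probabilities
])
model.summary()
```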

3.3 Deep Learning Hyperparameters

The model learns from the data by training weights for each layer. Those weights are known as the trainable parameters, while settings such as the number of layers and their sizes are hyperparameters. The more parameters the model has, the more time and data it requires to be trained. Finding the optimal number of parameters is an empirical and intuitive process. Too few parameters can cause underfitting, and too many can lead to overfitting. The best approach to find the optimal hyperparameters is to start with a model that overfits, then reduce the parameters until the model no longer overfits on the training and validation data [6, 8].

Figure 3.1 Deep learning network architecture.

The training process of a deep learning model can be seen as an optimization problem; Figure 3.2 illustrates this training process. To learn/train the weights, the model needs feature extraction layers, a loss function, and an optimizer. The optimizer adjusts the weights by minimizing the loss score. The loss score is computed by comparing the predictions with the targets. In the big picture, this creates an optimization problem. To construct the model, we need to understand and define the number of hidden layers, the optimizer, and the loss function.

The accuracy and the performance of the model depend on how fast and how well the weights are optimized and the loss is minimized. After training, the model is used to predict new samples. It is essential to differentiate the training time and computational training cost from the inference/prediction time.
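As a concrete illustration of this flow, the sketch below compiles and trains the model from the previous sketch; `x_train`, `y_train`, and `x_new` are hypothetical arrays of labeled samples and new inputs, and the epoch and batch settings are assumptions.

```python
# Sketch of the training flow in Figure 3.2, assuming the `model` defined
# earlier and hypothetical labeled arrays x_train / y_train.
model.compile(optimizer="rmsprop",                 # optimizer adjusts the weights
              loss="categorical_crossentropy",     # loss score: predictions vs. targets
              metrics=["accuracy"])
history = model.fit(x_train, y_train,
                    epochs=20, batch_size=64,
                    validation_split=0.2)          # track training vs. validation loss

# Inference is much cheaper than training: one forward pass per sample.
predictions = model.predict(x_new)
```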

Figure 3.2 Deep learning training flow.

3.4 Convolutional Neural Network

The set of feature extraction layers is also called a convolutional neural network (CNN); this is because those layers are convolutional layers. The convolutional layers perform the feature extraction. The features are extracted by doing a spatial convolution operation between an input x and a kernel h. This operation is described as follows,

\[
(x \ast h)[n, m] = \sum_{k_1}^{N} \sum_{k_2}^{M} x[k_1, k_2] \cdot h[n - k_1, m - k_2], \tag{3.1}
\]

where N and M are the dimensions of the input and the kernel, respectively. The initial values for the kernel are randomized and tuned on every training iteration. The kernel

19 values represent the weights in Figure 3.2. The size and number of kernels for each layer will determine the number of trainable parameters for the convolutional layer.
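The sketch below implements the windowed operation of Eq. (3.1) directly in NumPy, followed by the ReLU activation introduced just below as Eq. (3.2). Note that, as in most deep learning frameworks, the kernel is not flipped (strictly a cross-correlation); since the kernel values are learned, the distinction does not matter in practice. The toy input and kernel sizes are assumptions.

```python
import numpy as np

def conv2d(x, h):
    """Direct 2-D convolution of input x with kernel h, following Eq. (3.1).
    Computes the 'valid' region only (no padding); the kernel is not flipped,
    as is conventional in CNN implementations."""
    rows = x.shape[0] - h.shape[0] + 1
    cols = x.shape[1] - h.shape[1] + 1
    out = np.zeros((rows, cols))
    for n in range(rows):
        for m in range(cols):
            # weight the input window by the kernel values and accumulate
            out[n, m] = np.sum(x[n:n + h.shape[0], m:m + h.shape[1]] * h)
    return out

def relu(x):
    """Rectified linear unit, Eq. (3.2)."""
    return np.maximum(0.0, x)

x = np.random.rand(8, 8)       # toy input "image"
h = np.random.rand(3, 3)       # randomly initialized kernel (the trainable weights)
features = relu(conv2d(x, h))  # one convolution + activation step of a CNN layer
```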

In a CNN, each convolution operation is followed by an activation function [45]. For the work presented in this dissertation, we use the rectified linear unit (ReLU) function. The ReLU function takes only the positive part of its input. This function is easily implemented as follows,

\[
r(x) = \max(0, x), \tag{3.2}
\]

where x is the input to each neuron. There are other activation functions, such as sigmoid, softmax, and softplus [18]. Softmax is typically used for the output layer.

3.4.1 Loss function

As we will discuss in future chapters, the problems we are solving are detection/classification problems, which leads us to focus on the categorical cross-entropy [18, 61, 63] as the loss function. This function is defined as follows,

\[
E(y_i, \hat{y}_i) = -\sum_{i=1}^{S} y_i \cdot \log(\hat{y}_i), \tag{3.3}
\]

where \(y_i\) is the labeled input and \(\hat{y}_i\) is the model prediction.

The categorical cross-entropy is a good measurement that helps identify how different

the prediction is from the target. This feature makes it ideal for classification problems where the input is labeled as a set of classes.

When we use a neural network for regression in this work, we use the mean absolute error (MAE) as the loss function. This error is computed relative to the targets and is given by,

\[
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|. \tag{3.4}
\]
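A minimal sketch of both loss functions, assuming one-hot labels for Eq. (3.3) and nonzero targets for Eq. (3.4); the small epsilon is a standard numerical-stability guard, not part of the equations.

```python
import numpy as np

def categorical_cross_entropy(y, y_hat, eps=1e-12):
    """Eq. (3.3): y is a one-hot label vector, y_hat the predicted probabilities."""
    return -np.sum(y * np.log(y_hat + eps))

def relative_mae(y, y_hat):
    """Eq. (3.4): mean absolute error relative to the (nonzero) targets."""
    return np.mean(np.abs((y - y_hat) / y))

y = np.array([0.0, 1.0, 0.0])       # true class is the second one
y_hat = np.array([0.1, 0.8, 0.1])   # model prediction
print(categorical_cross_entropy(y, y_hat))  # ~0.223; 0 for a perfect prediction
```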

20 3.4.2 Optimizer

For the optimizer, we focus on RMSprop, a modified stochastic gradient descent algorithm. RMSprop is a mini-batch gradient descent algorithm with an adaptive learning rate, introduced in [20], which can speed up traditional gradient descent. With RMSprop, the learning rate for a weight is divided by a quantity proportional to the root mean square of recent gradients [20]. Each weight w(t) is obtained with

\[
w(t) = w(t-1) - \frac{\eta}{\sqrt{S_q(w, t)}} \frac{\partial E}{\partial w(t)}, \tag{3.5}
\]

with \(S_q(w, t)\) given by

\[
S_q(w, t) = \alpha S_q(w, t-1) + (1 - \alpha) \left( \frac{\partial E}{\partial w(t)} \right)^2, \tag{3.6}
\]

where E is the loss function, \(\eta\) is the learning rate, and \(\alpha\) is a parameter defined between approximately 0.1 and 0.9.
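The following sketch shows one RMSprop update implementing Eqs. (3.5) and (3.6) for a weight array; the epsilon term is a standard numerical-stability addition not shown in the equations, and the default eta/alpha values are assumptions.

```python
import numpy as np

def rmsprop_step(w, grad, sq, eta=0.001, alpha=0.9, eps=1e-8):
    """One RMSprop update. w: weights, grad: dE/dw at step t,
    sq: running mean of squared gradients from step t-1."""
    sq = alpha * sq + (1.0 - alpha) * grad ** 2   # Eq. (3.6)
    w = w - eta * grad / (np.sqrt(sq) + eps)      # Eq. (3.5)
    return w, sq

# Toy usage with hypothetical gradients over a few iterations
w, sq = np.zeros(3), np.zeros(3)
for grad in [np.array([0.5, -0.2, 0.1])] * 10:
    w, sq = rmsprop_step(w, grad, sq)
```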

3.4.3 Metrics to Measure the Model Performance

After training the deep learning models, we use them to perform detection. We need to define some metrics to evaluate how the models perform. The metrics we use are derived from the well-known F1 score, which is obtained from the classification confusion matrix in Table 3.1.

Table 3.1 Confusion matrix definition.

                    Predictions
 Labels        FALSE       POSITIVE
 FALSE          TN            FP
 POSITIVE       FN            TP

To define the F1 score, we first need to obtain the precision, or positive prediction rate, and the recall, or sensitivity. Both precision and recall are defined in terms of true positives (TP), false positives (FP), and false negatives (FN). The recall is also known as the true positive rate (TPR) and the precision as the positive predictive value (PPV). The recall or TPR is given by,

\[
\mathrm{recall} = \frac{TP}{TP + FN}, \tag{3.7}
\]

and the precision,

\[
\mathrm{precision} = \frac{TP}{TP + FP}. \tag{3.8}
\]

The F1 score is the harmonic mean of precision and recall; it can be calculated by,

\[
F_1\ \mathrm{score} = 2 \times \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}. \tag{3.9}
\]

Another metric to consider that can also be obtained from the confusion matrix is the

false positive rate,

\[
\mathrm{FPR} = \frac{FP}{FP + TN}. \tag{3.10}
\]

When detecting a specific target, we try to get a high TPR and a low FPR. For instances where the cost of missing a positive is high, the most desirable outcome is to have a TPR or recall close to one, while always paying attention to keeping a low FPR.
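The sketch below computes the metrics of Eqs. (3.7)-(3.10) from the confusion-matrix counts of Table 3.1; the counts in the usage line are made up for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Metrics of Eqs. (3.7)-(3.10) from confusion-matrix counts."""
    recall = tp / (tp + fn)                              # TPR, Eq. (3.7)
    precision = tp / (tp + fp)                           # PPV, Eq. (3.8)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (3.9)
    fpr = fp / (fp + tn)                                 # Eq. (3.10)
    return recall, precision, f1, fpr

# Hypothetical counts: 90 TP, 5 FP, 10 FN, 95 TN
print(classification_metrics(90, 5, 10, 95))
```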

CHAPTER

4

DESIGN RULE CHECKING WITH DEEP LEARNING

4.1 Introduction

This chapter presents a design rule checker using a deep learning model with transfer learning and synthetic data for training. We detail how we created the deep learning model using transfer learning, and the fully synthetic training data generator. Finally, we present the results that validate the proposed solution and summarize the chapter.

4.2 Design rule checking

Design rule checking (DRC) is the process of checking that the design geometry satisfies a set of layout rules. The ultimate driver for doing a DRC check is to ensure that, when fabricated, the design will achieve a high yield, as limited by the manufacturing tools and steps. The rules are generally specified as a set of logical constraints; for example, a wire width or spacing has to be greater than a certain minimum value. Figure 4.1 illustrates a width and spacing rule definition.

Figure 4.1 Basic design rule definition.

The rules define geometric constraints to satisfy the physical limitations of the lithogra- phy and the fabrication process. Typically, the design rule manual specifies checks consist- ing of width, length, spacing, area, enclosure, and overlap rules. The number of rule checks has increased to thousands in advanced nodes.
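To make the flavor of these checks concrete, the following is a minimal sketch of a width and a spacing check on axis-aligned rectangles, in the spirit of Figure 4.1; the rectangle representation and rule values are hypothetical, not FreePDK15 rules.

```python
def check_width(rects, min_width):
    """Flag rectangles narrower than min_width in either dimension.
    Each rectangle is (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    return [r for r in rects
            if (r[2] - r[0]) < min_width or (r[3] - r[1]) < min_width]

def check_spacing(rects, min_space):
    """Flag pairs of rectangles whose edge-to-edge distance is below min_space."""
    violations = []
    for i, a in enumerate(rects):
        for b in rects[i + 1:]:
            dx = max(b[0] - a[2], a[0] - b[2], 0.0)  # horizontal gap (0 if overlapping)
            dy = max(b[1] - a[3], a[1] - b[3], 0.0)  # vertical gap
            dist = (dx * dx + dy * dy) ** 0.5
            if 0.0 < dist < min_space:
                violations.append((a, b))
    return violations

# Hypothetical 10nm-wide wires that are only 5nm apart
wires = [(0, 0, 10, 50), (15, 0, 25, 50)]
print(check_spacing(wires, min_space=8))   # flags the pair
```

The pairwise loop also hints at why traditional checkers slow down: every rule implies many geometric operations over the whole layout.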

Since there has been an increase in the intricacies of the rules, the speed of the rule checker has become a critical point in the layout generation of custom circuits. Before advanced nodes, layout engineers could recognize patterns and fix DRC violations with relative ease. Nowadays, the design engineer waits a long time for the final sign-off DRC checker to run. The engineer then fixes the problems flagged, often only to introduce new ones. This iterative process is becoming unacceptable due to the slow checkers and the complexity of the rules. Machine learning can help speed up the design rule checking process because a neural network does not have to iterate over thousands of Boolean operations.

As mentioned before, the rules are defined in the design rule manual, and each process design kit (PDK) has its own set of rules. For demonstration purposes, we use the rule set of FreePDK15 to avoid proprietary issues. FreePDK15 is a multi-gate 15nm FinFET open process design kit (PDK) [2, 3, 13].

4.2.1 How to use Deep Learning for DRC

The idea of using deep learning for design rule checking is to take layout images as input to the model. The model learns from those images and recognizes the violations. We want to take a new layout and not only identify regions where violations are located, but point to them in action windows.

Figure 4.2 illustrates the ultimate goal of this work: identifying multiple simultaneous violations. This contrasts with the state-of-the-art, where violations can only be counted or, at most, localized to regions where they might be.

Figure 4.2 Using a deep learning model to detect DRC violations.

4.3 Single Convolutional Neural Network Model

Our first approach consisted of creating a single deep learning model to detect design rule violations (DRCV). One advantage of a deep learning model is that it does not require a separate feature extraction step. This characteristic removes the dependency on the technology node and on human-engineered features, and increases the number of features the model can extract compared to a feature extractor based on designer experience.

A deep learning model consists of various learning layers or hidden layers. The hidden layers are convolutional layers. The learning layers extract features in the form of data rep- resentations of the input. In this case the data representations are done by a convolutional operation described in Chapter 3.

Typically, in between each learning layer, a down-sampling layer is used to reduce the size of the extracted features. Following the hidden layers, a fully connected layer discriminates the learned features to map them to a classification output probability. After defining the model structure, the training process uses labeled data to generate predictions and minimize the prediction error. Figure 4.3 illustrates how the training process is done.

Figure 4.3 Deep learning model training flow.

The training process can be seen as an optimization problem, as explained in Chapter 3; thereby, we must define the loss function and the optimizer. Those parameters have an

impact on model performance. We explored different gradient descent-based optimizers

having the best result with RMSprop introduced in [20]. For the nature of the problem, we use cross-entropy for the loss function. This function captures the difference between

the prediction and the targets. In Chapter 3, we provide more details on each of those

components.

4.3.1 Model Structure Selection

By putting together all the components previously described, we built a deep learning

model with a convolutional neural network to perform the feature extraction. The model

takes as input the layout clip images and provides a classification output probability. By

taking the maximum probability, the layout clips are labeled as a DRC violation, containing

several violations, or DRC clean.

Figure 4.4 summarizes the model structure and illustrates what is expected in each

stage. The model settings used to evaluate different model structures are as follows:

• The input image size is set according to pixels and metal pitch. For example, 1 pixel corresponds to 1nm².

• The number of hidden/convolutional layers is set to avoid overfitting and underfitting, and likewise the number and size of the filters.

• The resampling layers are added to reduce the output size of the convolutional layers.

We use max pooling for the resampling layers.

• The intermediate layer or fully connected layer maps the extracted features to an

output probability. The size M of this layer has an impact on the total number of

trainable parameters for the model.

• The output layer has a dimension d that is given by the combination of all

the possible DRC violations analyzed. The label is assigned, taking the maximum

probability for a class.

Figure 4.4 Proposed model structure.

Table 4.1 shows the settings used to construct two different model structures. The first model detects 1 DRC violation and contains 159,746 trainable parameters with four feature extraction layers. The second model targets 3 DRC violations. The number of trainable parameters increases to 1,673,256 as it must distinguish 8 classes (each violation, their combinations, and the clean class). These parameters determine the model

complexity, which impacts the model performance. To select the parameters we need to

monitor overfitting and underfitting. The number of parameters has more impact on the

training time than inference time.

Following the previous model structure, the model complexity can be set to target any

number of rules. A more complex model will not always result in better performance. A

deep learning model is as good as the diversity or generalization in the training dataset.

Table 4.1 Model parameters example for 1 and 3 DRC violations.

Model Setting    1 DRC                              3 DRC
Input            Size: 200x200, 1px/1nm             Size: 200x200, 1px/1nm
Conv2D #1        32 3x3 filters, params: 320        16 3x3 filters, params: 160
MaxPool2D #1     Downsample 2x2, resize by 2        Downsample 2x2, resize by 2
Conv2D #2        16 3x3 filters, params: 4,624      32 3x3 filters, params: 4,640
MaxPool2D #2     Downsample 2x2, resize by 2        Downsample 2x2, resize by 2
Conv2D #3        16 3x3 filters, params: 2,320      32 3x3 filters, params: 9,248
MaxPool2D #3     Downsample 2x2, resize by 2        Downsample 2x2, resize by 2
Conv2D #4        32 3x3 filters, params: 4,640      64 3x3 filters, params: 18,496
MaxPool2D #4     Downsample 2x2, resize by 2        Downsample 2x2, resize by 2
FC               Size: 128x1, params: 147,584       Size: 128x1, params: 1,638,656
Output           Dimension: 2, params: 258          Dimension: 8, params: 2,056
Total Params     159,746                            1,673,256
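The following Keras sketch (not the original training code) illustrates the 1-DRC structure in Table 4.1. Assuming 'same' convolution padding and the 200x200 clips downscaled by 2 to 100x100 inputs, it reproduces the reported 159,746 trainable parameters.

# A sketch of the 1-DRC model from Table 4.1, under the assumptions above.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(100, 100, 1)),                        # grayscale layout clip
    layers.Conv2D(32, 3, padding="same", activation="relu"),  # 320 params
    layers.MaxPooling2D(2),
    layers.Conv2D(16, 3, padding="same", activation="relu"),  # 4,624 params
    layers.MaxPooling2D(2),
    layers.Conv2D(16, 3, padding="same", activation="relu"),  # 2,320 params
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 3, padding="same", activation="relu"),  # 4,640 params
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),                     # 147,584 params
    layers.Dense(2, activation="softmax"),                    # 258 params: DRCV vs. clean
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(),
              loss="categorical_crossentropy", metrics=["accuracy"])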

4.3.2 Dataset to Train and Test the Single CNN Model

Data that includes a large number of DRC-free layouts along with layouts containing errors is necessary to create the model. A dataset with over 250,000 layout clips was created to train, validate, and test the model. The dataset consists of images mostly from SRAM (static random access memory) layouts using the public 15nm design kit [2]. The initial designs are DRC free, thus creating the need to insert random DRC violations. Across all the tests performed on the model, the training samples are about 80%, the validation 15%, and the testing the remaining 5%.

4.3.3 Dataset Generation Process

The process of generating the dataset is shown in Figure 4.5. After selecting a layout with no DRC violations, the following steps are performed:

1. Extract the metal layers where the DRC will be performed.

2. Insert random variations in the layouts. Those variations are selected to create DRC violations.

3. Run the layouts through a conventional DRC checker to verify that the variations create real rule violations. A mask is created with this information to label each clip as containing rule violations or not.

4. Crop the layout using a given window size; in the results presented, the window is 200nm × 200nm. The cropping window is moved with a stride of 150nm to avoid creating false rules (a minimal sketch of this step follows the list).

5. Convert each layout clip into an image. The relation of image pixels to layout dimensions is 1px to 1nm².
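A minimal sketch of the cropping step (step 4), assuming the layout has already been rasterized to a numpy array at 1px per nm; the window and stride follow the values above.

# Sliding-window cropping of a rasterized layout (assumed 1px = 1nm).
import numpy as np

def crop_clips(layout_img: np.ndarray, window: int = 200, stride: int = 150):
    """Yield (x, y, clip) tuples covering the rasterized layout."""
    h, w = layout_img.shape[:2]
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            yield x, y, layout_img[y:y + window, x:x + window]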

To have an evenly balanced dataset, one sample with no DRC violations is added for

each added clip containing violations. The clips in the dataset can include one or multiple

DRC violations. By doing so, the model can identify each rule or diverse rule violations.

The seed for the dataset consists of fifty different SRAM designs generated by fifty teams in a graduate VLSI class. These were designed using the NCSU 15nm PDK [2], and all were DRC free. In total, 5,000 variants of these designs were auto-generated using random parameters.

The rules targeted in this initial dataset are the width and spacing standard design rules

in the FreePDK15 [2]. In the dataset we included the following DRC violations:

Figure 4.5 Dataset generation.

• M 1.1 - Minimum width of M1 metal layer is 28nm.

• M 1.6 - Minimum spacing of M1 to M1 should be 36nm.

• M 1.7 - End-of-Line spacing of M1 to M1 should be 45nm.

Table 4.2 shows how the dataset samples are labeled for three rules. The dataset includes samples with each independent rule, samples with combinations of them, and clean samples. The number of samples in each category remains the same.

Table 4.2 Labels in the dataset for 3 rules.

Label      Description
DRC1       Minimum width of M1 metal layer is 28nm
DRC2       Minimum spacing of M1 to M1 should be 36nm
DRC3       End-of-Line spacing of M1 to M1 should be 45nm
DRC12      Sample containing DRC1 and DRC2
DRC13      Sample containing DRC1 and DRC3
DRC23      Sample containing DRC2 and DRC3
DRC123     Sample containing DRC1, DRC2 and DRC3
NDRC       No DRC violation

4.3.4 Results for Testing the Single CNN Model

A primary model was created to detect one DRC violation. In this initial experiment, the

created model was trained with 50,000 layout clips containing many variations of the same

DRC rule and 50,000 DRC clean. The model was validated with 15,000 samples and tested with 5,000 clips never seen by the model. Figure 4.6 shows the train and validation losses

for the model with no overfitting or underfitting. The model requires around 7 minutes

to be trained; note that the training is done in a 6 CPUs computer without using GPU

acceleration.

Figure 4.6 Training and validation loss for 1 DRC model.

Table 4.3 shows the confusion matrix for the testing samples. The accuracy or recall in detecting the DRC violation (TPR) is 92%. The false positive rate (FPR) is around 28%, and the false negative rate (FNR) is around 8%. This result is encouraging, as users can better tolerate false positives than false negatives.

Figure 4.7 presents the classification results for 4,800 samples containing 3 DRC viola-

Table 4.3 Results to classify one single DRC violation.

                  Model predictions
True labels       NDRC       DRCV       Accuracy
NDRC              1800       700        72%
DRCV              205        2295       92%

TPR/recall: 92%    FPR: 28%

tions. The confusion matrix shows that the model is classifying the majority of the samples correctly and that the false negatives are not as frequent as false positives. The samples used for the confusion matrix are samples never seen by the model.

Figure 4.7 Confusion matrix for 3 DRC model.

For 3 rules, the diversity in the training dataset is not as high as for the 1 rule. When more training iterations are added, the model overfits and the validation accuracy decreases. Figure 4.8 illustrates how the validation loss of the model does not decrease while the training loss decreases. This behavior is mainly because, in the results presented, we use only SRAM designs as seeds to generate the dataset. We tried to reduce the model

complexity by adding dropout and regularizer layers, but no significant improvements

resulted. To solve this issue, we proposed to increase the dataset diversity by including

multiple types of designs and adding more rule variations. The community has studied this issue, and its nature makes it challenging to achieve high detection rates on never-seen data [50].

Figure 4.8 Training and validation loss for 3 DRC model.

Testing a full 1000µm × 1000µm layout, equivalent to 5,000 samples, takes less than 13 seconds. We compared the speed of the CNN engine versus running the same rule set in a conventional DRC engine on one of the layouts. The CNN took 13 seconds versus 420 seconds for the conventional DRC engine, 32x faster. Note that the conventional check was done with a runset containing only the violations implemented in the CNN model.

While the accuracy of the proposed solution can be improved, the speed factor enables it

to be used in the early stages of the design flow. Detecting and fixing violations early in the

design reduces the design cycle time.

4.4 DRC Framework with Transfer Learning and Synthetic

Training

After facing a wall with the single convolutional neural network model, we had to improve

our solution to not depend only on random dataset generation. Another goal we wanted to

meet is to have the ability to expand to support a complete rule set easily, including rules

that involve the interaction of multiple layers. Using a synthetic dataset generator, we solve overfitting and prevent the model from learning the rough global structure of specific layout

polygons. With the inclusion of transfer learning, we make the process of adding new rules

fast and straightforward. Transfer learning reduces the training time and the amount of

data to train the model for new rules.

The proposed design rule checker framework consists of an ensemble of deep learning

(DL) models which share weights (transfer learning) and a parametrized synthetic dataset

generator (PSDG). The PSDG takes the input of layout and polygons parameters to gen-

erate a sampling space for each design rule violation (DRV). The main advantage of this

framework is that it can be expandable to any number of rules. With the PSDG, we can

generate data to include new checks to the model. The transfer of weights allows reducing

the training time and training resources.

The inputs to the model are layout clips in the form of W × W image tensors I(x, y). In the case of rules that involve only one layer, each pixel I(x, y) is a scalar value representing a grayscale image. When multiple layers are involved, I(x, y) has three components to represent an RGB image. Each sample that the PSDG creates is an image as described before. We choose the image size W × W so that one pixel corresponds to 1nm. By reducing the action window W, we increase the resolution of the detection. In each image, there is an area without polygons, creating centered images. This centering area is given by the distance c. Figure 4.9 shows an example of one- and two-layer image clips. In Figure 4.9, the layout clip on the left is for a single layer, for example, a metal layer, while the layout on the right is for two interacting layers, for example, a metal layer and a via layer.

Figure 4.9 Layout clip samples for one layer (a) and two layers (b).

4.4.1 Deep Learning Architecture with Transfer Learning

The deep learning model involves a convolutional neural network (CNN) to perform auto-

matic feature extraction and a set of fully connected layers to perform the DRC violations

classification. The CNN contains convolutional layers (CL) and downsampling layers. The

CLs perform a convolution operation to their input, and the downsampling reduces their

output size. We use max pooling for the downsampling layers.

Similar to the single CNN model, we use cross-entropy as the loss function and root mean square propagation (RMSprop) as the optimizer. The activation function in the CLs and the FC layers is the rectified linear unit (ReLU).

To classify each DRC violation, we create an ensemble of CNN and fully connected (FC)

layers, creating a complex deep network structure. Figure 4.10 illustrates the final deep

network architecture. We use an ensemble of models instead of a single multiclass to reduce

the degradation in performance found in section 4.3.4. This new approach reduces the number of trainable parameters and provides an incremental framework to add new DRC checks.

Figure 4.10 Deep network architecture to add new DRC violations incrementally by sharing weights.

4.4.2 Selecting the Layers Size and Adding DRCs

To reach the final base model structure in Figure 4.10, we took the single-CNN work in section 4.3 as a starting point. The number of CLs in the final base model is five (CL1 to

CL5). To obtain this number, we follow an intuitive approach that consists of increasing

the number of CLs until the model starts overfitting. When we reach overfitting, we go back

to the previous number of CLs. We use the same intuitive approach to the FC layers and

the size of both FC layers and CLs. The CLs size is the number of filters and filter size. Note

that there is no deterministic path to find the number of layers and their size; the best way

is to use intuition and experimentation.

To select the percentage of weight sharing between rules, we follow experimentation as

in [34, 40, 52]. From [52], we can see that sharing a high percentage of weights drastically affects the accuracy. They test sharing up to 60% of the parameters for a generic image

classification problem. In our case, we are working with a particular and complex problem; we try to keep the percentage of shared weights around 10% or lower to prevent the model

from learning polygons related to a single violation. Also, the synthetic dataset generator allows us to quickly and easily produce training data.

To incrementally add new design rule checks, we start with a pre-trained base model.

The base model is pre-trained for the first rules added to the model (CL1, CL2, CL3). For

each new rule we train the layers CL4, CL5, FC1, and FC2.

The output layer size is two to classify the layout clips as a specific violation or clean.

The output layer contains two probability values. To assign the clip to a class, we take the

maximum of the two. In the case that two or more networks flag an input as a violation, we assign that input to have multiple violations. If the networks do not flag the input for any DRC, we consider it clean.
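A sketch of this incremental step in Keras; the layer names and model path are hypothetical, but the pattern — freeze the shared layers CL1-CL3 and retrain CL4, CL5 and the fully connected layers per rule — follows the description above.

# A sketch of the transfer-learning step (hypothetical layer names/path).
from tensorflow.keras import models

base = models.load_model("base_model.h5")   # hypothetical pre-trained base
for name in ("cl1", "cl2", "cl3"):          # shared weights stay frozen
    base.get_layer(name).trainable = False
# cl4, cl5, fc1, fc2 and the 2-way output remain trainable for each new rule.
base.compile(optimizer="rmsprop", loss="categorical_crossentropy",
             metrics=["accuracy"])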

4.4.3 Synthetic Dataset Generator

One of the key elements of the framework is the parameterized synthetic dataset generator

(PSDG). The goal of the PSDG is to improve the quality and the diversity in the dataset.

By sampling the layout parameters, we plan to have data that mimics all patterns in real layouts, not limited to SRAM layouts but also standard cells and custom layouts. Figure

4.11 illustrates the proposed method to extend the dataset.

Figure 4.11 Parameterized synthetic dataset generation.

The generator takes a set of input parameters for clean and violation layouts to generate image clips to train the model. With the parameters, it creates a sampling space using the Latin hypercube (LHC) sampling strategy [24, 53]: a sampling space of N samples where each sample n_i has the same probability, meaning the samples are equally spaced. In contrast with random sampling, LHC generates a sampling space that takes previously generated samples into consideration.

By creating a sampling space, we obtain a wide range of clean and violation clips with different polygon arrangements; this increases the variance in the dataset. With the PSDG, we prevent the model from learning only the rough global structure of the polygons and make sure it learns more about the violations. Another advantage of the PSDG is that we can

adapt our models to a new technology node by changing the parameters.

In the PSDG, we can define parameters for clean and violation samples for each layout layer, via layer, alignment, enclosure, and area. Next, we list the parameters and what each of them defines in the layout clips:

• Min and max layer width: range of polygon widths.

• Min and max layer length: range of polygon lengths.

• Min and max horizontal spacing: horizontal spacing between polygons.

• Min and max vertical spacing: vertical spacing between polygons.

• Min and max number of polygons per sample.

• Polygons orientation: only vertical, only horizontal, or both.

• Polygons overlap ratio: how much overlap there will be between horizontal and

vertical shapes. Allows creating complex polygons.

• Clean to violation ratio: how many polygons will contain violations.

• Min and max vias width and length: defines vias shapes.

• Vias or layer enclosure (min/max horizontal and vertical distances): defines the enclosure distance for vias and layers.

• Vias or layer alignment (min/max horizontal and vertical distances): determines layers and vias alignment.

• Min and max area for polygons: defines enclosure and alignment areas.

To create a dataset of N samples, the generator creates a sample space of N samples of the parameters (P). We chose LHC for the intelligent sampling because it is not entirely random; each new sample considers the previous ones. Another benefit of LHC for this approach is that we can have a multi-dimensional sampling space. Algorithm 1 describes a simplified version of the process to generate the training dataset for a DRC violation; each P is re-sampled to create a new sampling space for the polygons corresponding to that layout clip.

Algorithm 1 Synthetic Dataset Generator for a DRC
 1: N = number of samples
 2: Obtain clean parameters vector Pc = [p0, ..., pm]
 3: Obtain violation parameters vector Pv = [p0, ..., pm]
 4: Sample layout clean parameters Lc = LHC(Pc)
 5: Sample layout violation parameters Lv = LHC(Pv)
 6: i = 0
 7: while i < N do
 8:     Create polygons sample space pic = LHC(Pc[i])
 9:     Create polygons sample space piv = LHC(Pv[i])
10:     DRP(pic)
11:     DRP(piv)
12:     i = i + 1
13: procedure DRP(poly, n)
14:     create empty image I
15:     while j < n do
16:         xy = create coordinates from poly[j] parameters
17:         if xy not inside image & center distance then
18:             adjust xy to image size
19:         Check overlap and adjust spacing
20:         Draw coordinates xy in image
21:         Save image
22:         j = j + 1
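A sketch of the LHC sampling step using scipy's qmc module (an assumption; the original generator's sampler is not specified here). The parameter names and bounds are hypothetical placeholders; each row of `params` would drive one synthetic layout clip.

# LHC sampling of hypothetical layout parameters with scipy.
import numpy as np
from scipy.stats import qmc

lower = np.array([20.0, 30.0, 30.0, 2.0])    # e.g. min width, length, spacing, polygon count
upper = np.array([80.0, 200.0, 120.0, 12.0])
sampler = qmc.LatinHypercube(d=len(lower), seed=0)
unit = sampler.random(n=1000)                # N equally probable samples in [0, 1)^d
params = qmc.scale(unit, lower, upper)       # map to layout-parameter ranges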

To choose the values of the parameters used to create the dataset, we only require the constraints for the DRC violation from the process design manual. For the clean layouts, we need to understand the design types targeted by this process. The dataset generator enables creating data representing custom layouts, standard-cell VLSI layouts, memories,

among others.

4.4.4 Results for the DRC Framework

We implemented, tested and validated our design rule checking framework using Python

and TensorFlow. For demonstration purposes, we use the rule set for FreePDK15 to avoid

proprietary issues. The FreePDK15 is a multi-gate 15nm FinFET open process design kit

(PDK) [2]. This PDK contains over 200 DRC rules. Those rules cover width, length, spacing, area, enclosure, overlap, and alignment.

In the rest of this section, we detail the final model architecture and the dataset gener-

ated. We also analyze the transfer learning and the overall performance of our framework.

4.4.4.1 Base Model Architecture and Generated Dataset

After following the procedure described in section 4.4.2, we ended up with five convolutional layers and two fully connected layers for our base model. The layer sizes [CL1, CL2, CL3, CL4, CL5] are [8, 16, 24, 24, 16]. The fully connected layer sizes [FC1, FC2] are [64, 64]. Each convolution filter is 3×3, and the downsampling layers are 2×2. This base model has a total of 43,362 trainable parameters for an input of 450×450 pixels downscaled by a factor of 2.


In the case of rules that involve more than one layer, we use color images. For those rules, the size of the input layer is adjusted to map a three-channel image. Another point to consider is that, depending on the complexity of the rule, we change each layer size. We ended up with up to three base models and choose among them when adding new rules.

For example, for a simple width rule, the base model has 43,362 parameters, and for a

more complex one, it has 252,306 parameters. We tested models with over 2.5 million parameters. Those models resulted in some overfitting and degradation in the testing accuracies.

To train the base model, we generated a dataset with a width and a spacing violation.

The samples in this dataset are 80,000 clean and 40,000 with the two violations. The generated dataset uses an action window of 400nm × 400nm centered with a distance of 50nm; this results in 450×450-pixel images. To choose this action window, we considered that in the FreePDK15 the polygon width is 24nm and the minimum spacing is 2nm. The resolution inside a 400nm × 400nm area is sufficient to detect the violations.

After training the base model, we added 20,000 samples for each new DRC added to the model. We use this dataset to fine-tune CL4, CL5, and the fully connected layers. For new rules, the model shares 11% of the weights, for a total of 38,634 trainable parameters.

To analyze the effect of sharing 11% of the model weights, we created two DRC models, with and without transfer learning. Figure 4.12 shows that the loss in accuracy when sharing the weights is less than 0.01%. In Figure 4.12, drcx_t is training accuracy, drcx_v is validation accuracy, and drcx_wtl_t/v is training/validation accuracy for the transfer learning model.

With a negligible impact on the accuracy, we obtain a decrease in the training time of about 56%. The training time goes from 2.52 hours to 1.10 hours, a 2.3x speedup in training. This training time is an example for one of the base models. As another example, a rule with 2 layers trains in 3.7 hours without transfer learning and 1.9 hours with it, a 48% improvement and a 1.9x speedup. This training time reduction occurs because the model converges faster when using transfer learning, and also because the number of training samples decreases by 4x.

Figure 4.12 Training and validation accuracy for rules with a completely deep network trained vs partially trained with transfer learning.

4.4.4.2 Model Training

The hardware used to train the models is a machine with 56 Intel(R) Xeon(R) Gold 5120

CPU @ 2.20GHz processors and a total of 376GB of memory. This machine does not have

any GPU. The total time to train the model for 200 violations is about 187 hours, and the

time to generate the dataset to train the models is 36 hours. The training time is affected by

how many training iterations we perform. To help improve the training time, we use early stopping [6]. When the training and validation accuracies meet a threshold and start decreasing, we stop the training process and save the model weights for the best validation accuracy achieved.
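A sketch of this early-stopping setup with a standard Keras callback; the monitored quantity and patience value are assumptions, while restoring the best validation weights follows the description above.

# Early stopping that restores the best validation weights (hypothetical settings).
from tensorflow.keras.callbacks import EarlyStopping

stop = EarlyStopping(monitor="val_accuracy", patience=5,
                     restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, callbacks=[stop])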

To test the model, we selected 20 rules in different layers that cover most of the rules in

the FreePDK15. Figure 4.13 shows the model accuracy during training and validation. In

Figure 4.13, drcx_t is training accuracy and drcx_v validation accuracy. From Figure 4.13, we can verify that there is no overfitting or underfitting. Both the training and validation accuracy converge between 98% and 99% after 40 training iterations.

Figure 4.13 Training vs. validation accuracy for 13 of the rules implemented.

Figure 4.14 shows that the training and validation losses also settle down after 40 training

iterations. In Figure 4.14, drcx_t is training loss and drcx_v is validation loss. At the

beginning of the training, there is some noise and high losses. These are more significant for complex rules that involve the interaction of multiple layers.

Figure 4.14 Training vs. validation loss for 13 of the rules implemented.

4.4.4.3 Testing the Model with Unseen Data From Real Layouts

Table 4.4 shows the results summary. The model achieves a true positive rate (TPR) between

96.4% and 100%, and a false positive rate (FPR) between 1.1% and 5.3%. If we compare with

our previous models in section 4.3.4, we have a significant improvement in both TPR and FPR, and now we have a fully scalable framework that can extend to cover any number of violations.

Table 4.4 Results summary of testing 20 different types of violations in layout data unseen by the model. The data consist of an SRAM layout and a Rocket Core design.

DRC     Type                          Violations: Label / Found / Missed / TPR      Clean: Label / Found / Missed / FPR
drc0    Horizontal width              32,630 / 32,150 / 480 / 98.5%                 18,850 / 17,854 / 996 / 5.3%
drc1    Horizontal spacing            36 / 36 / 0 / 100.0%                          7,720 / 7,484 / 236 / 3.1%
drc2    Vertical overlap              4,464 / 4,326 / 138 / 96.9%                   15,796 / 15,027 / 769 / 4.9%
drc3    Vertical length               59,159 / 58,813 / 346 / 99.4%                 18,289 / 17,693 / 596 / 3.3%
drc4    Vertical spacing              120 / 119 / 1 / 99.2%                         31,043 / 30,036 / 1,007 / 3.2%
drc5    H/V width                     20,912 / 20,875 / 37 / 99.8%                  6,414 / 6,162 / 252 / 3.9%
drc6    Polygon area                  21,331 / 21,197 / 134 / 99.4%                 16,779 / 16,311 / 468 / 2.8%
drc7    H/V spacing                   18 / 18 / 0 / 100.0%                          37,034 / 34,985 / 2,049 / 5.5%
drc8    Spacing dependent on W & L    2,687 / 2,657 / 30 / 98.9%                    48,350 / 47,804 / 546 / 1.1%
drc9    H/V width                     10,698 / 10,633 / 65 / 99.4%                  4,853 / 4,789 / 64 / 1.3%
drc10   Polygon area                  2,405 / 2,352 / 53 / 97.8%                    9,841 / 9,522 / 319 / 3.2%
drc11   H/V spacing and notch         30 / 30 / 0 / 100.0%                          9,207 / 8,898 / 309 / 3.4%
drc12   Spacing dependent on W & L    19 / 19 / 0 / 100.0%                          18,351 / 18,109 / 242 / 1.3%
drc13   Via shape square              93 / 93 / 0 / 100.0%                          13,118 / 12,634 / 484 / 3.7%
drc14   Via shape rectangular         77 / 77 / 0 / 100.0%                          35,816 / 34,749 / 1,067 / 3.0%
drc15   Via spacing                   51 / 51 / 0 / 100.0%                          63,513 / 61,045 / 2,468 / 3.9%
drc16   Via inside layer              230 / 228 / 2 / 99.1%                         2,017 / 1,945 / 72 / 3.6%
drc17   Via enclosure horizontal      241 / 238 / 3 / 98.8%                         503 / 483 / 20 / 4.0%
drc18   Via enclosure vertical        99 / 98 / 1 / 99.0%                           738 / 699 / 39 / 5.3%
drc19   Horizontal width              22,788 / 22,711 / 77 / 99.7%                  16,241 / 15,791 / 450 / 2.8%
Totals                                178,088 / 176,721 / 1,367 / 99.2%             374,473 / 362,020 / 12,453 / 3.3%

The results in Table 4.4 show that with the synthetic dataset generator, the model is not learning the global structure of the polygons but is learning the violations. The model identifies multiple violations in the defined action windows of 400nm × 400nm and labels each of them. Note that this action window is another parameter we can adjust.

The testing data came from several instances of a 160µm × 160µm Rocket Core layout. There are also SRAM instances in the testing dataset. The SRAMs are merged to create multiple 16µm × 16µm layouts. We crop those layouts and test them using a conventional

design rule checker tool as reference.

The works we can find in the state-of-the-art, as described in Chapter 2, do not identify or label each violation. If we compare the overall TPR and FPR, 99.2% and 3.3% respectively, our approach presents a significant improvement. Those works estimate the total number of violations using random forests, neural networks, surrogate models, and CNNs, with a TPR as

low as 60% and up to 97% in the best-case scenario. Our approach not only improves those

TPRs, but it detects multiple violations and labels them.

Finally, we measure the run-time of a conventional DRC engine vs. the inference time

for the deep learning model. We use the same machine used for training but limit the number of cores to 8. We test one violation at a time and only load the layout with the layers needed for that violation. As a result, the deep learning model completed the checking 7.5x faster.

4.5 Testing New Layouts Process

We can also focus on another metric that determines how fast a model can perform in

testing data and new layouts; this is also called the inference time. The process to test a

new layout is shown in Figure 4.15 and consists of the following steps:

1. Extract the metal layers to analyze.

2. Crop the layout into square windows of the target size, using the same centering distance as for the dataset.

3. Convert layout clips to images and keep track of the coordinates in the layout.

4. Test each individual image clip in the model.

5. Create a mask image and/or keep track of the coordinates of each clip.

Figure 4.15 Inference flow to test new layouts.

Using the previously described flow, we can estimate how fast our approach performs compared to traditional DRC checkers. The flow can be used for any number of rules added to the model, and it allows the model to be included in an interactive DRC flow.
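A minimal sketch of this inference flow, reusing the hypothetical crop_clips() helper sketched in section 4.3.3; the class index used for a violation is an assumption.

# Sketch: classify each clip and collect flagged layout coordinates.
import numpy as np

def check_layout(model, layout_img, window=400, stride=400):
    flagged = []
    for x, y, clip in crop_clips(layout_img, window, stride):
        prob = model.predict(clip[np.newaxis, ..., np.newaxis], verbose=0)
        if prob.argmax(axis=-1)[0] == 1:   # class 1 = violation (assumed)
            flagged.append((x, y))         # layout coordinates of the clip
    return flagged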

4.6 Summary

In this chapter, we presented a deep learning approach to design rule checking. The initial approach proved that it is possible to use a deep learning model to do design rule checking. Using this model, we can detect DRC violations with an accuracy of up to 92%, up to 32x faster than traditional Boolean checkers.

To improve the initial approach, we moved to a parameterized synthetic dataset generator and transfer learning to add new design rule checks. The new framework detects DRC violations with a detection rate of 96.4% to 100%, depending on how complex the rule is. The false alarm rate is 5.3% in the worst-case scenario. The approach can locate the violations and identify them in action windows 7.5x faster than conventional checkers.

CHAPTER

5

MULTILAYER CMP HOTSPOT MODELING

WITH DEEP LEARNING

5.1 Introduction

This chapter presents our approach to modeling chemical mechanical polishing (CMP) hotspots. We provide details of the proposed multilayer deep neural network model and how we train it. Finally, we present the results to validate the proposed solution and summarize

the chapter.

5.2 Chemical Mechanical Polishing

The minuscule depth-of-focus (DoF) tolerance of advanced lithographic processes is a

crucial challenge that necessitates the best possible wafer surface planarity. To achieve this planarity, chemical mechanical polishing (CMP) is used. CMP is a wafer planarization process that combines a chemical slurry with mechanical polishing. This planarization process is used in both front end of line (FEOL) and back end of line (BEOL) layers [32]. When CMP is applied to a layer, dishing and erosion may occur; these are what cause CMP hotspots. Figure 5.1 shows how a layer should be planar and how polishing may create dishing and erosion.

Figure 5.1 Dishing and erosion example for a given layer.

CMP is layout pattern-dependent [58] and dishing/erosion may occur in regions of the IC with sub-optimal pattern density resulting in non-uniform surface topography. These

regions are known as hotspots (HS). It is vital to quickly find and fix CMP hotspots in

a design prior to tapeout. These topographical excursions may impact the lithographic

process window or degrade the physical features. Undetected HS areas reduce the yield, and maximizing yield is the ultimate goal of all design for manufacturing (DFM) techniques.

CMP hotspots may be effectively mitigated by the addition of dummy fill patterns to

FEOL and BEOL layers to make the pattern density more uniform across the IC. Adding

dummy fill requires judicious trade-offs due to the potential impact of these fill structures

on critical net parasitics which may impact performance. Therefore optimal fill and accurate

modeling are critical. However, application of CMP is increasingly complex for advanced

nodes. The number of CMP steps has doubled from the 28nm to 10/7nm nodes [21] due to a greater number of layers, new materials, and overall design complexity. These have

created a need for new CMP modeling techniques.

5.3 CMP Hotspot Modeling

The prediction of hotspots due to chemical mechanical polishing consists of identifying

locations on a physical design that are most susceptible to failure during the manufacturing

process. To detect those regions, we need to model the surface profile of the chip. The

nature of the CMP process has a cumulative effect; this means that to detect a HS in a

particular layer, the potential impact of the lower-level layer topographical variations must

be considered. The combination of dishing and erosion in layers can create a surface height

above or below an average threshold. The threshold is a tolerable surface height defined by

the lithographic process and the desired planarity.

When considering a hotspot, an action window is defined; typically, the window can be 5µm × 5µm but can scale up or down according to the desired resolution of the model. The average surface of each window is modeled. According to the surface average, we can define a low-level or high-level hotspot; this is illustrated in Figure 5.2. The steps to identify hotspots are as follows:

1. Define an action window.

2. Model the average surface for the action window.

3. Define a threshold according to the technology node.

4. If the average surface is higher than the threshold, a high surface hotspot (HSTH) is detected.

5. If this average is lower than the threshold, a low surface hotspot (HSTL) is predicted.


Figure 5.2 Hotspot definition according to the topography profile.

The most complicated part of developing a CMP model is the calibration process, which is done using data extracted from test chips. This process is costly and time-consuming, and each new technology needs new CMP models. Prior-art models depend on physical parameters and the geometric properties of layout patterns. These issues create the need for exploring new CMP techniques. Machine learning creates the opportunity to mitigate those issues with data-driven models. An ML model can quickly adapt to a new process early in the design stages and facilitate an early assessment for early-stage technology and faster turn-around time. ML also provides the opportunity for a quicker runtime model, especially for smaller window sizes.

5.4 Proposed Modeling Architecture

The idea of using a deep learning (DL) model is to create the model without using parameters

that strictly depend on the technology characteristics of the designs. This DL feature allows

for creating a model that can quickly adapt to new technologies and that can be initially

trained with data from existing models.

To do the CMP hotspot detection, we propose a deep learning model similar to the base model we use in the design rule checking problem in Chapter 4; Figure 5.3 illustrates this structure. The main difference here, and what makes this unique, is that the input data is not a grayscale or RGB image; it is a multidimensional input tensor. Each input dimension represents a layer in the layout. For example, for a 12-layer layout, the input x to the model has 12 dimensions.

Figure 5.3 Basic deep learning structure.

5.4.1 Multilayer CNN Model

The proposed deep learning model consists of a convolutional neural network (CNN) and

a fully connected layer for the classification. The input to the model is an n dimensional

tensor, where n represents the number of desired layers. Feeding the network multilayer input data helps to capture the potential effect of layer interaction and to model the cumulative impact of topography levels among layers. Notice that when we

refer to a multilayer model, it is not only the multilayers of the CNN structure but also that

the model considers multilayer input data.

In the state-of-the-art setup, the CNN typically uses input data with up to 3 channels; as mentioned before, we are proposing a model with input data of up to n channels. This difference makes this a unique modeling approach for electronic design automation problems.

Figure 5.4 illustrates an overview of the multilayer CNN model. In the proposed model, each channel in the input data represents a layout pattern. All the patterns for the desired number of layers are stacked in a data structure creating a multi-channel image.

For each channel, a combination of hidden convolutional layers and downsampling layers

extract features from the input. Those features are flattened to form a fully connected layer

and obtain a probability to classify each sample as containing a hotspot (HS) or non-hotspot

(NH).

An important distinction must be made when we talk about multilayer. It refers not

only to the hidden layers of the proposed network structure but also to the way the model works with the input data structure; the data includes multiple design layers of the layout.

This enables training with complex multiple design layer patterns to capture the potential

inter-layer interaction effects. The model is trained and validated using a labeled dataset

and can capture effects that go beyond physical parameters of the technology and the

design.

5.4.2 Model Details

The overall idea of using deep learning is that the model can learn from the data without

the need for manual feature extraction. Similarly, the model can learn from the multilayer

input patterns without the need of performing time-consuming geometric extraction or

the need to train the model with physical parameters. For achieving this goal, the hidden

layers do the feature extraction and the fully connected layer classifies each input.

Figure 5.4 Proposed multilayer deep learning model for CMP hotspots detection.

We constructed a deep learning convolutional neural network followed by a fully con-

nected layer to do the final classification. Different loss functions and optimizers were

analyzed and tested. We obtained good classification performance and accuracy using

categorical cross-entropy for loss function and a modified stochastic gradient descent

RMSprop as the optimizer. For the activation function, we used rectified linear unit (ReLU).

We evaluated sigmoid, softmax, and softplus, but ReLU provided the best results. Each of these components of the model was detailed and formulated in the background Chapter 3.

Figure 5.5 shows an example of how the model would look for input of only one channel

and three hidden layers. This model is used to predict HSTL and HSTH hotspots. This

model was also modified to stack the hidden convolutional layers and expanded to n dimensional data. This simple model consists of about 262,000 trainable parameters.

[Figure: input layer followed by four convolutional layers of 16, 32, 32, and 64 3×3 filters, each followed by a 2×2 pooling layer; the output is flattened to 4,096 features and mapped to a 64-unit fully connected layer and the output layer.]

Figure 5.5 Model example for 1 layer, 262,000 parameters for 1 layer patterns.
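A Keras sketch of this example generalized to an n-channel input, one channel per metal layer. The 125×125 input size follows the scaling described in section 5.6, while the exact layer output sizes and total parameter count will differ from Figure 5.5 depending on the input size and padding; this is an illustration, not the original code.

# Sketch of the multilayer CNN with the filter counts from Figure 5.5.
import tensorflow as tf
from tensorflow.keras import layers, models

n_layers = 5                                   # metal layers stacked as channels
model = models.Sequential([
    layers.Input(shape=(125, 125, n_layers)),  # multilayer input tensor
    layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(2),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(2),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(3, activation="softmax"),     # HSTH / HSTL / NH
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(),
              loss="categorical_crossentropy", metrics=["accuracy"])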

5.5 Dataset Generation

In order to train, validate, and test the model, a dataset was created. The dataset consists of 14nm layouts. The layouts contain fill test layouts, CMP test chips, and some test designs. By combining these three types of layouts, we were able to obtain good pattern diversity in the data. Note that diversity and variability are essential features of any dataset for machine

learning. The dataset was labeled using existing CMP simulations, but it is possible to

generate it using DFM data. Figure 5.6 presents the flow for the dataset creation; this flow is

as follows:

• For each layout, run a CMP simulation on a specific layer. The window for the CMP simulation was 5µm × 5µm for the results included in this chapter.

• Assign a label to each window: non-hotspot (NH), hotspot-low topography (HSTL), or hotspot-high topography (HSTH).

• Convert the windows to a grayscale image for each layer and stack the images obtained across the metal stack vertically in a multi-channel data structure. One pixel in the image corresponds to 10nm, hence the image size is 500 × 500 pixels.

The dataset is split into three portions: training data (~80%), validation data (~15%), and test data (~5%).


Figure 5.6 Dataset generation and distribution.

The test samples are never seen by the model and are used to measure the model performance. In the same way, layouts new to the model are used to test full designs and estimate the model performance. The dataset consists of more than 100,000 samples with known hotspots. The samples used to train, validate, and test the model are varied as part of the experiments performed to verify the flow.

5.5.1 Dataset Augmentation

In the case of small training datasets, in order to increase the diversity of the training

patterns we use data augmentation. With data augmentation, we expect to smooth the

learning curve and reduce the convergence time of the training and validation accuracy. The

augmentation can also be used to verify the diversity in the current dataset, by comparing

the performance between a dataset with and without augmentation.

The data augmentation consists of creating five variations of a single sample. The variations take the multi-channel input data into account. The variations are as follows:

• Flip the sample in the x-axis and in the y-axis. The flip is applied across all channels.

• Flip the sample in both the x- and y-axes.

• Rotate each channel of a sample. The rotation angle is uniformly distributed between −90° and 90°.

• Create a sample with a principal component analysis (PCA) of all layers.

To create the PCA variation, for each pixel p_j(i) in the input data, a size-n vector is extracted by,

\[ p_j(i) = \left[ z_1(x, y),\; z_2(x, y),\; \ldots,\; z_n(x, y) \right], \tag{5.1} \]

where z(x, y) is the pixel value for a layer z. With this vector, a new pixel is formed as,

\[ \bar{p}_j(i) = p_j(i) + \alpha_i\, w_i \cdot v_i^{T}, \tag{5.2} \]

where i = 1, ..., n, n is the number of layers, \alpha_i is a random factor with normal distribution N(0, 1), w_i is the ith eigenvalue of a normalized covariance matrix created with p_j, and v_i is the ith eigenvector. Figure 5.7 graphically illustrates a sample with PCA augmentation.
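A numpy sketch of Eqs. (5.1)-(5.2); estimating the covariance over all pixels of the sample and redrawing alpha per augmented sample are assumptions about details the text leaves open.

# PCA augmentation of a multi-channel sample, per Eqs. (5.1)-(5.2).
import numpy as np

def pca_augment(sample: np.ndarray, rng=np.random.default_rng()) -> np.ndarray:
    """sample: (H, W, n) stack of n layer images."""
    h, w, n = sample.shape
    pixels = sample.reshape(-1, n)              # Eq. (5.1): one n-vector per pixel
    cov = np.cov(pixels, rowvar=False)          # n x n covariance across layers
    eigvals, eigvecs = np.linalg.eigh(cov)      # w_i and v_i
    alpha = rng.normal(0.0, 1.0, size=n)        # alpha_i ~ N(0, 1)
    shift = eigvecs @ (alpha * eigvals)         # sum_i alpha_i * w_i * v_i
    return (pixels + shift).reshape(h, w, n)    # Eq. (5.2)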

Figure 5.7 PCA Data augmented sample.

5.6 Model Results

We implemented the proposed method to validate that CMP hotspots can be detected using

a multilayer deep learning model. The implementation was done using Python, Keras, and

TensorFlow. A set of experiments was designed to measure the performance and accuracy of the model. In the tests, parameters including the number of layers, layer dimensions, optimizers, and loss functions are varied to obtain the optimal model. Two other factors considered in the experiments are the number of training samples and the layouts from which the samples were extracted. The results presented in this section are from the model we consider to have the optimal parameters. While determining the best model, in addition to the performance, we consider the sample size and model overfitting and underfitting.

The procedure to obtain the results presented in the rest of this section is as follows:

the previously generated dataset is split into training, validation, and testing samples; the

model is trained and validated at the same time using the training and validation samples;

the model is tested using the testing samples to measure and analyze the performance.

The size of the original patterns (500 × 500 pixels) is scaled down to 125 × 125 pixels. After testing with different sizes, we verified that the model performance was not affected. In

contrast, not scaling down the inputs can cause some overfitting.

To measure the model performance, we use some of the metrics described in Chapter 3. These metrics are the recall or true positive rate (TPR) and the false positive rate (FPR). The desired value for the F1 score is 1. We also focus on getting a high recall for detecting a hotspot, since the nature of the problem can tolerate the existence of false positives.

Another metric, more difficult to estimate, is the inference time of the model. For

obtaining this metric, we run existing CMP decks to test a full layout and the proposed

model. We ran both models in the same machine and estimated the runtime in both cases.

The datasets used to measure the performance of the models are evenly balanced: 50% of the samples are labeled as hotspots while the rest are non-hotspot samples. The baseline accuracy in the datasets is therefore 50%.

5.6.1 Results With a Small Training Dataset

The model was trained with a small number of samples from a single layout: 2,560 for training, 500 for validation, and 500 for testing. These samples contain data for 5 metal layers; the hotspots are detected in the highest layer.

We used the data augmentation described in section 5.5.1 to increase the size of this small dataset and the number of training samples. The augmentation was also used in this experiment to evaluate whether data augmentation would be needed at all.

This experiment was done as an initial approach to the test of the proposed flow. Figure

5.8 presents the results, where we can see that the validation accuracy is up to 99%. From

Figure 5.8, we can see that the learning curve is similar with and without augmentation.

In Table 5.1, we can see that for HS the recall and the F-1 score are up to 0.99. This validates that the model can be used to detect hotspots, and we can move forward to

incorporate more data samples with more CMP features into the dataset.

Figure 5.8 Training and validation accuracy vs. epochs for training data with augmentation.

Table 5.1 Metrics for the model with data augmentation.

                   Accuracy                 Recall           F-1 Score
                   Training   Validation    HSTH    NH       HSTH    NH
No augmentation    1.00       0.99          0.99    0.99     0.99    0.99
Augmentation       0.99       0.98          0.98    0.98     0.98    0.98

5.6.2 Results Increasing the Dataset Size

Figure 5.9 presents the training and validation accuracy for the model trained with 45,000 samples. Figure 5.10 shows the training and validation for 54,000 samples from multiple layouts; the validation is done with 10,000 samples. The samples contain up to 5 metal layers, so the input data includes five-channel images. From Figures 5.9 and 5.10, we can see that the training and validation accuracies converge to around 92.0% in both cases.

Figure 5.9 Training and validation accuracy vs. epochs for model trained with 45,000 samples.

In Table 5.2 we can see the confusion matrix when classifying 3000 samples containing

Figure 5.10 Training and validation accuracy vs. epochs for model trained with 54,000 samples.

hotspots due to a high topography range. Of the 3,000 samples, only 67 samples containing a hotspot are missed, for an accuracy of up to 95.5% and a high recall of 0.96, while the F1 score is up to 92%. The overall accuracy of the model is up to 91%. The FPR is 13.26%.

In order to train the model to obtain these results, the training time was approximately

1hr and the inference time for 3000 samples for 5 layers was less than 4 seconds. Using the

same number of samples with data augmentation, the recall improves to 0.98. The problem with the data augmentation is that the training time increases 4x and the computational

cost of computing all the augmented samples is high. The efficiency of the numerical

and ML methods used enabled all DL training and inference experiments to run on a

conventional 64 core non-GPU accelerated computer with no more than 512GB of memory.

The methods are moreover highly scalable to enable far larger full chip DL training and

inference.

Table 5.2 Confusion matrix and performance stats for model trained with 45k samples.

Confusion Matrix Summary
classes    samples    correct    incorrect    acc
HSTH       1500       1433       67           95.53%
NH         1500       1301       199          86.73%
Totals     3000       2734       266          91.13%

Performance Stats
           precision    recall    f1-score    support
HSTH       0.88         0.96      0.92        1500
NH         0.95         0.87      0.91        1500
avg/total  0.91         0.91      0.91        3000

5.6.3 High and Low Topography Hotspots

Figure 5.11 shows the result of training the model for three classes: HSTL, HSTH, and NH. The model is trained with 62,000 samples and validated with 12,000. The validation

accuracy and the training accuracy converge to 87.2%.

In Table 5.3, we can see that the recall to detect hotspots (TPR) is 97%, which is a high value. The FPR is 12.3%. These metrics reflect that the model can classify high- and low-level

topography hotspots and keep a high accuracy.

5.6.3.1 Layer Effects in the Hotspot Detection

Figure 5.12 illustrates how the model accuracy is affected when adding more layers to

predict the CMP hotspots in a higher layer. Figure 5.13 shows the testing accuracy of the

model according to the number of lower layers used to predict a hotspot in the 10th layer. In this case, the model was trained with 22,000 samples and validated with 5,000 samples; the dataset contained up to 10 metal layers, giving a 10-dimensional input tensor. Table 5.4 presents the final model accuracy versus the number of layers used to detect

Figure 5.11 Training and validation accuracy vs. training epochs for the 3-class model.

Figure 5.12 Model accuracy vs. number of layers included in the training data.

Table 5.3 Confusion matrix and stats for model trained for 3 classes.

Confusion Matrix Summary
classes    samples    correct    incorrect    acc
HSTH       1667       1625       42           97.48%
HSTL       1667       1605       62           96.28%
NH         1667       1132       535          67.91%
HS         3334       3230       104          96.88%
Totals     5001       4362       639          87.22%

Performance Stats
           precision    recall    f1-score    support
HSTH       0.87         0.97      0.92        1667
HSTL       0.84         0.96      0.90        1667
NH         0.92         0.68      0.78        1667
HS         0.86         0.97      0.91        3334
avg/total  0.88         0.87      0.87        5001

the hotspot. We can see that, in this case, we can detect hotspots in layer 10 using only 3 lower metal layers with an accuracy of 88%. This analysis is possible because the model considers multilayer input data and can model the cumulative effect of the CMP process.

5.6.3.2 Test Full Layout

After training the model, it can be used to test a full layout. Using the same code structure created to test and validate the model, a complete layout was cropped into 500 × 500-pixel windows and tested. Using a custom Python script, a Calibre RDB file was created containing the locations of the hotspots found. This enables annotation of the HS on the layout for any and all layers. Figure 5.14 illustrates results from checking a small layout with 10 metal layers. The detail shows hotspots for layer 10. An existing CMP modeling deck using state-of-the-art techniques was run on the same computer where the proposed model was run. The runtime for the proposed model is approximately 10x faster.

Figure 5.13 Model accuracy for the number of lower layers used to detect a HS.

Table 5.4 Model accuracy vs. number of layers included in the training data.

# Layers    Accuracy
L1          0.7907489
L2          0.88908118
L3          0.88554122
L4          0.88333858
L5          0.88398438
L6          0.88078125
L7          0.86539063
L8          0.89238515
L9          0.87861863
L10         0.91134361

Figure 5.14 Full Layout test using multilayer CMP hotspots model.

5.7 Summary

This chapter presented a unique and powerful new approach to model and detect CMP hotspots for full-chip multilayer layouts. This approach uses a deep learning

algorithm. The algorithm is implemented using a multilayer convolutional neural network

(CNN). The model consists of convolutional layers for automatic feature extraction enabling

the flow to be independent of the technology node. Since the model was not created with geometric shapes from the layouts, it can learn and capture effects that go beyond physical parameters; these effects can be discovered from previous technologies with transfer learning.

Although in the results presented the model is trained using CMP simulation results

from the existing models and is validated using different test designs and calibrated CMP

simulation tools, it can also be trained with topography measurement data. The proposed

method can predict CMP hotspots with an accuracy of up to 98% with up to 10 metal layers.

The inference time to test a full chip is up to 10x faster than existing CMP tools.

CHAPTER

6

POWER, PERFORMANCE AND AREA

MODELING

6.1 Introduction

This chapter presents how we model the power, performance, and area (PPA) of a core, memories, and intellectual property (IP) blocks using machine learning techniques. It also includes an approach with transfer learning to do PPA predictions for new designs and operating corner conditions.

When creating the PPA models, we focus on the number of samples we need to collect

to create the models. We do so because data generation is a highly time-consuming process.

We have a sampling strategy to choose the data points smartly and to reuse existing data by re-sampling it.

6.2 Power, Performance and Area

Power, performance, and area (PPA) estimates are needed in all the design stages of a system on a chip (SoC), but they are known only after a months-long process. To accelerate the PPA estimation, we use machine learning data-driven models. With a data-driven model, we can learn from previous chip designs to accelerate this process even more.

The power has two main components: leakage and dynamic power. The leakage power

is a static power mainly generated by the sub-threshold operation, reverse drain diode, and

gate-induced drain currents. The dynamic power depends on the transitions (charging and

discharging transistor and net capacitance) and is directly affected by how fast the system

operates.

The performance determines how fast the chip can operate; it is measured in terms of the clock period and is determined by the difference between the clock period and the worst slack. The area of the design has a direct impact on the fabrication cost [29], and increasing the system speed will increase its area. The trade-off between power, performance, and area can be complex and significantly impact the final product.

When designing a chip, we want to optimize the PPA, but as mentioned before, the process to estimate it is time-expensive. We have to run electronic design automation tools that can take hours for a small design. Figure 6.1 shows estimates based on standard-cell netlists for the same RTL code of a neural network accelerator under varying synthesis settings. This figure illustrates how much variation we can have in performance and area for a small design. In this design, a single synthesis run can take more than an hour, and the PPA is known only after the synthesis is done. We are not proposing to optimize the PPA in this work; rather, we propose creating fast and accurate models. These models should allow the designer to explore different settings and optimization techniques. If we replace time-consuming design stages with fast PPA prediction models, we can optimize quickly.

Figure 6.1 Area vs. Performance variations for a small IP block design.

6.3 PPA Modeling with Machine Learning

In this work, we propose to use machine learning to predict the PPA. Since gathering data

is costly, we focus on modeling algorithms that require a small number of samples. We

explored different ML algorithms like neural networks, decision trees, and surrogate models.

As we will explain later in this chapter, we get better results with gradient boost regression

(GB) and neural networks (NN). For this reason, we focus on GB and NN. We also implement

a shared weights strategy with a neural network to transfer what the model learns from a

design to new designs.

6.3.1 Gradient Boost Regressor

Gradient boosting works for both regression and classification problems. It creates a final prediction based on an ensemble of weak predictions. Usually, the weak predictions are in the form of decision trees [44]. In machine learning algorithms we need to optimize a loss function $L(y, F(x))$ and store the weights of a learning kernel $h(x, \theta)$. The gradient boost algorithm suggests creating a new kernel function $\alpha h(x_i, \theta)$ parallel to the negative gradient $g_m(x_i)$ [14, 44]. This gradient is given by

$$g_m(x_i) = \left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F(x) = F_{m-1}(x)}. \qquad (6.1)$$

As the new kernel function and the loss function are highly correlated, we can reduce the loss optimization problem to a least-squares minimization problem:

$$(\theta_m, \alpha_m) = \underset{\theta, \alpha}{\operatorname{argmin}} \sum_{i=1}^{N} \left[ g_m(x_i) - \alpha h(x_i, \theta) \right]^2. \qquad (6.2)$$

Finally, we can approximate the next update of the regression target function as

$$F_m(x) = F_{m-1}(x) + \rho_m h(x, \alpha_m). \qquad (6.3)$$

A very efficient, scalable implementation of gradient boosting is XGBoost [7]. XGBoost uses a more accurate approximation of the gradients: instead of only first-order gradients of the loss function, it uses second-order gradients. In this work, we assess both a standard gradient boost implementation and XGBoost. Because XGBoost includes a sparsity-aware algorithm for sparse data, it tends to perform well.
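As a minimal sketch of how one PPA component can be fit with XGBoost, assuming the `xgboost` Python package: the hyperparameter values and the placeholder data below are illustrative assumptions, not the configuration used in this work.

```python
# Hedged sketch: fit one PPA component (e.g., area) with XGBoost.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))           # e.g., Tclk, MaxTran, Fanout, Uncertainty
y = X @ np.array([2.0, 0.5, 0.1, 0.3])   # placeholder target, not real PPA data

model = XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X[:400], y[:400])              # 80% of samples for training
pred = model.predict(X[400:])            # remaining 20% for testing
```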

6.3.2 Neural Network with Transfer Learning

A neural network is a set of connected nodes or neurons. Each connection is associated with a weight that, when trained, learns from the data. The neural network learns by setting those weights to extract different representations of the input data. For an input $x$, the output of a neuron $y$ is a function of a weight $w$ and a bias $b$:

$$y = a(wx + b). \qquad (6.4)$$

The function $a(\cdot)$ is known as the activation function [45, 61]. In the NN models we propose, we use a rectified linear unit (ReLU) activation for the hidden layers and a linear function for the output layer (see Chapter 3).

Since the learning is stored in the weights, a NN can share its learning across similar problems. This means that if we train a NN for a specific problem, it is possible to share some of the layers with a problem that has similar features. We propose a model structure that consists of a base model, where a set of weights is shared, plus a set of layers trained for each unique problem. This allows us to create future PPA predictions for new designs and to predict the operating corners among designs. Figure 6.2 illustrates this structure.

Figure 6.2 Neural network with shared weights.

With the model in Figure 6.2, we can have a base model created for one SoC design and re-train only a subset of weights for each new design. This sharing reduces the amount of data needed to develop new PPA predictions. In the same way, we can predict multiple operating corners for a design. We base this approach on the fact that the internal design structures are similar across core and IP block designs: if we analyze the RTL, they will always be combinational and sequential logic blocks.
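The sketch below illustrates this shared-weights structure with ReLU hidden layers and a linear output, following the activation choices described above. It assumes Keras/TensorFlow as the framework; the function names and layer sizes are illustrative, not the exact configuration used in this work (see Section 6.6.4 for that).

```python
# Minimal Keras sketch of the shared-weights structure in Figure 6.2:
# a base feature extractor whose weights are shared (frozen) across designs,
# plus design-specific layers that are re-trained per design.
import tensorflow as tf

def make_base(n_inputs):
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(n_inputs,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(256, activation="relu"),
    ], name="shared_base")

def make_design_model(base, trainable_base=False):
    base.trainable = trainable_base  # freeze shared layers for transfer learning
    return tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(256, activation="relu"),   # re-trained per design
        tf.keras.layers.Dense(1, activation="linear"),   # one PPA component
    ])

# Usage idea: train the base fully on the first design, then fine-tune only
# the design-specific layers on a handful of samples from each new design.
```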

6.3.3 Other ML Algorithms Evaluated for PPA Prediction

In addition to the neural network and gradient boost models, we evaluate other decision tree and surrogate models. These algorithms are used for comparison with the models we are proposing. As we will explain in the results, GB and NN are the best models we found.

In terms of surrogate models [49], we consider four approaches implemented in [11]. These can be considered more traditional response surface models:

• RBF Interpolant: This is a radial basis function interpolant with linear and cubic

kernels [11].

• GP Regressor: Gaussian stochastic process regressor. This algorithm treats the predic-

tion as a Gaussian process [11].

• Poly Regressor: A third-order traditional polynomial regressor [47]. In simple words, this approach tries to fit the data to a third-order polynomial.

• MARS Interpolant: A multivariate adaptive regression technique. This regressor uses an ensemble of linear functions, which are combined as needed to reduce the overall model error [12].

For other decision tree and ensemble models, we considered another four models implemented in [48]:

• Bagging Regressor: Creates a set of regressors trained on subsets of the training data. The final prediction is based on the average of the group of regressors [31].

• AdaBoost Regressor: This regression algorithm uses the principle of GB but with individual weight adjustments [55].

• Decision Tree Regressor: A single decision tree regressor; we use up to 1,000 estimators or trees [43].

• Random Forest Regressor: This is a random forest regressor that creates a set of the

decision trees to make a prediction [4]. We use up to 1,000 estimators or trees.

6.4 Parameters Used for the PPA Models

We can find parameters associated with each SoC design stage. Those include the architectural design, register-transfer level (RTL) implementation, RTL synthesis, and place and route (PR). We focus on modeling the PPA using RTL synthesis parameters and architectural parameters. In the case of memories, we also include memory organization parameters.

We selected synthesis parameters that can affect the gate-level netlist:

1. Clock period (Tclk).

2. Maximum transition time at a logic node (MaxTran).

3. Maximum fan-out at a logic node (Fanout).

4. Clock uncertainty (Uncertainty).

We also have parameters that affect the RTL code; those are considered architectural parameters. The architectural parameters are categorical, and data generation with those parameters is very limited. In this case we use only:

1. Number of functional units.

2. L1 cache configurations.

When modeling the PPA for memories, we consider the following set of parameters that affect the memory generator. Those parameters affect the memory organization: we can have a similar memory size but arranged differently:

1. Number of addressable locations (WordDepth).

2. Number of data bits per word (8 to 256) (IOWidth).

3. Number of bit-cells that share a column mux and data-pin (Mux).

4. Number of cycles of latency due to pipelining (Latency).

5. Number of redundant columns for built-in self-repair (Redundancy).

In this work we model the PPA as a four-component vector. In the case of an intellectual property (IP) block or a core, those components are (a short computational sketch follows this list):

1. Dynamic power: In terms of the dynamic energy from internal cell power, the switching power, and the clock period: $E_{dyn} = (P_{Internal} + P_{Switching}) \cdot T_{clk}$.

2. Leakage power: Total static leakage power ($P_{leak}$).

3. Performance: The minimum clock period that can be achieved by the design. It is measured as the target clock period minus the worst slack: $C_{path} = T_{clk} - \text{WorstSlack}$.

4. Area: Total core area, including all the cells and buffers.
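As a small worked illustration of how this vector follows from synthesis report values, the helper below computes the two formulas above; the function and variable names are assumptions for illustration.

```python
# Illustrative computation of the four-component PPA vector for a core or
# IP block from synthesis report quantities (names assumed).
def ppa_vector(p_internal, p_switching, p_leak, t_clk, worst_slack, area):
    e_dyn = (p_internal + p_switching) * t_clk  # dynamic energy, E_dyn
    c_path = t_clk - worst_slack                # achievable clock period, C_path
    return [e_dyn, p_leak, c_path, area]
```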

For the memories we have a five-component vector. This vector includes the power for the read and write states:

1. Leakage power: Static leakage power provided for normal (active) and sleep modes ($P_{leak}$).

2. Dynamic read power: In terms of the combined read energy through both supplies: $\text{ReadEnergy} = \text{ReadEnergy}_{VDDIO} + \text{ReadEnergy}_{VDDAR}$.

3. Dynamic write power: In terms of the combined write energy through both supplies: $\text{WriteEnergy} = \text{WriteEnergy}_{VDDIO} + \text{WriteEnergy}_{VDDAR}$.

4. Performance: Memory access time (AccessTime).

5. Area: Memory total area.

6.5 Model Creation and Data Collection Framework

We developed an interactive framework that can be used to generate the models in an expandable and automated way. This framework supports model creation with synthesis parameters or architectural parameters; it is also expandable to any IP or memory generator. Figure 6.3 illustrates the framework operation flow.

The model generation framework can take as input a design RTL, a memory generator

or a core generator. Using a range of input parameters it creates a sampling space using

Latin hypercube (LHC) and runs the core generator, memory generator, or synthesis tool to

extract the PPA. We can train the model on the fly with those samples or store the dataset.

This framework allows the creation of the models and evaluates their performance in an automated way. It also provides the ability to:

• Explore the model parameters.

• Vary the number of samples used to construct the models.

• Evaluate the model performance in terms of accuracy, training time, and model parameters.

Figure 6.3 Model and data generation framework.

6.5.1 Sampling the Input Parameters

Choosing the right samples can significantly affect the model accuracy and the design space that we can explore. We create the parameter sample space using the Latin hypercube (LHC) sampling strategy [24, 53]. LHC generates a sampling space that takes into consideration previously generated samples.

Latin hypercube is a near-random technique to design experiments. Suppose we want to generate N samples of the one-dimensional input x using LHC. In that case, we partition the cumulative distribution function of x into N even regions and randomly choose one sample inside each region. This is a very simple but powerful strategy.

If the input x is two-dimensional, we generate an LHC space in each dimension and create a grid of N by N size, randomly picking one sample in each grid cell [39]. Note that the input x must be independent in each dimension. This strategy extends to n-dimensional inputs. Figure 6.4 illustrates an example of 4 samples for a two-dimensional input parameter.
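A minimal sketch of generating the synthesis-parameter sampling space with LHC is shown below, here via scipy's `qmc` module (the tooling is an assumption; any LHC implementation works). The bounds match the parameter ranges quoted in the next paragraph.

```python
# Hedged sketch: 500 Latin hypercube samples over the 4 synthesis parameters.
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=4, seed=0)
unit = sampler.random(n=500)  # 500 samples in the 4-D unit hypercube

#        Tclk(ps) MaxTran(ps) Uncertainty(ps) Fanout
lower = [500,     20,         0,              4]
upper = [10000,   500,        200,            50]
samples = qmc.scale(unit, lower, upper)  # one row per synthesis run
```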

Figure 6.5 shows 500 samples generated using Latin hypercube sampling. From the plot we can see how much variation we get for the four parameters. The ranges are Tclk (500–10,000 ps), MaxTran (20–500 ps), Uncertainty (0–200 ps), and Fanout (4–50).

Figure 6.4 Latin hypercube samples for N = 4 samples of a two-dimensional parameter.

Figure 6.5 Latin hypercube 500 samples for the 4 synthesis parameters.

6.5.2 Re-sampling Existing Dataset

Since dataset generation is a highly time-consuming process, we need to partition an existing dataset if we want to analyze the model performance for a small number of samples. It would take far too long to analyze the impact of varying the number of samples if those samples had to be continually regenerated.

Taking samples randomly from the existing dataset does not guarantee that we cover the entire sampling space and the range of variations in the parameters. Instead of randomly taking samples from the initial dataset, we propose generating a new Latin hypercube sampling space (LHS). To do so, we follow Algorithm 2. This algorithm generates a new LHS and, for each new sample, finds which point in the existing dataset is closest to it.

Algorithm 2 Re-sampling Existing Dataset
 1: W ← read existing dataset
 2: M ← number of samples in the existing dataset
 3: N ← number of new samples, N < M
 4: LHS ← new LHS of N samples
 5: for s in LHS do
 6:     i ← 0
 7:     for k in W do
 8:         compute the Euclidean distance d ← ‖s − k‖
 9:         D[i] ← d
10:         i ← i + 1
11:     j ← index i where D is minimum
12:     append W[j] to the new dataset
13:     remove the sample W[j] from W
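A runnable version of Algorithm 2 is sketched below. It assumes the existing dataset W has been normalized to the unit hypercube so that Euclidean distances are meaningful; the function name is illustrative.

```python
# Hedged implementation of Algorithm 2: draw a fresh N-sample LHS and keep,
# for each new point, the nearest remaining point of the existing dataset W.
import numpy as np
from scipy.stats import qmc

def resample_dataset(W, n_new, seed=0):
    W = np.asarray(W, dtype=float).copy()            # (M, d), assumed in [0, 1]^d
    lhs = qmc.LatinHypercube(d=W.shape[1], seed=seed).random(n=n_new)
    picked = []
    for s in lhs:
        d = np.linalg.norm(W - s, axis=1)            # distance to every remaining point
        j = int(np.argmin(d))
        picked.append(W[j])
        W = np.delete(W, j, axis=0)                  # sample without replacement
    return np.array(picked)
```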

Figure 6.6 shows the parameters re-sampled within the same ranges as in Figure 6.5. We can see that we still have a significant amount of variance in those parameters even after reducing the number of samples to 50. We can also see that both sampling spaces share similar statistics: the mean and standard deviation in both cases are comparable.

We analyze how the area and the performance of a design vary after resizing the dataset by comparing Figures 6.7 and 6.8. After reducing the dataset size by 10x with the proposed re-sampling method, we still cover most of the design space. This result validates the approach: we can start with an extensive dataset and reduce its size to analyze how the models perform for a small number of samples.

Figure 6.6 Re-sampled parameters using the algorithm described to reduce the dataset to 50 samples.

Figure 6.7 Area vs. Performance variations for the dataset with 500 samples for 5 corner conditions.

Figure 6.8 Area vs. Performance variations for the dataset with 50 samples for 5 corner conditions.

6.6 Results

In this section, we present results for the PPA models we created for multiple designs. We analyze up to six different designs and a memory generator. We examine the accuracy of the machine learning models and explain the process of creating them. We also include analysis of the number of samples needed to develop an accurate PPA estimator. This section also presents the proposed transfer learning approach to predict PPA for different designs and corner conditions. For transfer learning, we analyze how sharing weights affects model accuracy.

6.6.1 The Datasets

We created a dataset consisting of parameterized cores and IP blocks (logic datasets). This dataset is used to train and validate the proposed modeling approaches. The dataset also includes memory macros from a memory generator (SRAM data). The cores do not include their L1 caches; that is the main reason to include separate SRAM data.

The logic datasets consist of RTL code and synthesis-only results for power, performance, and area for several Rocket core configurations (L1 cache not included) and one other design (OpenRISC 1200). This dataset also includes a convolutional neural network (CNN) accelerator IP block. Rocket is an open-source parameterized core generator; it generates synthesizable RTL [1]. The OpenRISC 1200 [46] is another open-source core, non-parameterized; we include only one configuration for this core.

Table 6.1 summarizes the designs we include in the dataset. Each Rocket core configuration has different parameters, such as the number of functional units and cache organization. We used a commercial 22nm technology with a reference methodology for synthesis.

We can see significant variations in the PPA for the Rocket configurations. Figure 6.9

shows the area as a function of the performance for the five configurations; we can see the

Table 6.1 Designs in the dataset.

Design          Samples  Description
Rocket Default  500      1 big core
Rocket Medium   500      1 medium core
Rocket Dual     500      2 big cores
Rocket Small    500      1 small core
Rocket Tiny     500      1 tiny core
OR1200          1,000    OpenRISC CPU
CNN             500      CNN accelerator
SRAM            8,248    Total bits ranging from 2K to 99K, 1 port

difference of up to 3x. We can also see variations in the leakage power and the area in Figure 6.10. As expected, leakage power is highly correlated with the area.

Figure 6.9 Area vs. performance for the Rocket core variations.

The dataset also contains power, performance, and area under five unique corner conditions. Figure 6.11 shows how the area and the performance vary across all the corner conditions. We can see the variations among the corner conditions, especially for low clock periods. Figure 6.12 shows that the variations are consistent for the dynamic energy.

As mentioned before, the dataset contains the PPA for multiple memory configurations.

Figure 6.10 Area vs. leakage for the Rocket core variations.

Figure 6.11 Area vs. performance for OR1200 for multiple corner conditions.

Figure 6.12 Area vs. dynamic power for OR1200 for multiple corner conditions.

From Figures 6.13 and 6.14 we can see that there are significant variations in the access time and read energy when we arrange the word depth differently. This result is consistent across the rest of the PPA components. Appendix B contains additional results for the rest of the PPA components.

Figure 6.13 Access time vs. word depth for memories.

Figure 6.14 Dynamic read energy vs. word depth for memories.

Given the variations we find when we try to correlate the PPA metrics, we can see how important it is to have a fast and accurate PPA model. The value of this dataset goes beyond the scope of our work: given the complexity and the time needed to generate a dataset like this, it can be used for future research projects.

6.6.2 Comparing ML Models

We created the PPA models for the logic cores, the IP block, and the SRAM using the previously generated data. Initially, those models were created using 500 samples, of which 80% were used to train and validate the models. The models were tested with the remaining 20%, for a total of 100 test points.

We evaluate each model based on the overall accuracy, given in terms of the percentage mean absolute error (MAE):

$$\text{MAE} = \frac{100}{N} \sum_{i=1}^{N} \operatorname{abs}\left( \frac{y_i - \hat{y}_i}{y_i} \right). \qquad (6.5)$$

We also compute the standard deviation of the absolute error. In addition, we evaluate the time required to train the models and to test new samples. We measure those times relative to the shortest time; we choose this instead of iteration counts because each model approach has a different target function. We analyze the accuracy by PPA component as well as the overall accuracy.
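As a small worked form of this metric, the helper below returns the accuracy (100% minus the MAE of Equation 6.5) together with the standard deviation of the absolute percentage error; the function name is an illustrative assumption.

```python
# Evaluation metric as used in the tables: accuracy (%) = 100 - MAE.
import numpy as np

def accuracy_and_std(y_true, y_pred):
    err = 100.0 * np.abs((y_true - y_pred) / y_true)  # absolute percentage error
    return 100.0 - err.mean(), err.std()
```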

The input to each model is a vector with the parameters previously described. Instead of having a single model predict the PPA output vector, we decided to use one model per component. For the results in this section, the input to each model is normalized to have zero mean and unit variance. We do not perform any normalization or scaling on the output, except for the leakage power values when they are too small. In the case of the SRAM models, we normalized the PPA outputs to values between zero and one.

The neural network model uses four layers of size [128, 256, 256, 256], for a total of 165,505 trainable parameters. We adjusted the network size so that the training and validation losses converge, reducing possible overfitting and underfitting.

Table 6.2 shows the model results for the Rocket core default configuration. This table shows that the XGBoost model achieves the highest overall accuracy with the lowest standard deviation. In general, the boosting algorithms perform especially well in terms of accuracy. If we analyze the training and testing time, XGB compares well to the smallest training and test times, obtained by the decision tree regressor (DTR). The absolute training times range from a few milliseconds to five minutes, and the testing time per 100 points is in fractions of microseconds. The neural network, as expected, has the highest training and testing cost and lower accuracy.

In Table 6.2 and in the rest of this section, we use the following naming convention: RBFCubic is a radial basis function regressor with a cubic kernel, RBFLinear one with a linear kernel, and Poly a polynomial regressor. GP is a Gaussian process regressor, and MARS is a multivariate adaptive regression model. RF refers to the random forest regressor, BR to the bagging regressor, and AB to the AdaBoost regression model. GB stands for the gradient boost regression model, NN for the neural network, and XGB for XGBoost, the modified version of gradient boost. Finally, DTR is the decision tree regression model.

Table 6.2 Model evaluation for the Rocket core default configuration.

Accuracy (%) = 100% - MAE; Train, Test, and Size are relative time / size.

Model      Dynamic Pwr  Leakage Pwr  Critical Path  Area   PPA    Std Dev  Train   Test   Size
RBFCubic   96.26        95.10        99.06          97.48  96.97  3.63     16      7      153
RBFLinear  97.58        95.59        98.77          97.68  97.40  2.91     6       2      153
GP         91.77        91.44        95.88          91.99  92.77  7.62     726     4      76
Poly       96.74        92.80        98.40          96.40  96.09  3.31     2       4      1
MARS       97.40        94.80        99.24          97.58  97.26  3.84     47      3      1
RF         97.87        96.99        99.15          98.43  98.11  2.38     906     333    1,886
BR         97.86        96.95        99.16          98.43  98.10  2.39     924     412    1,893
AB         97.56        88.96        98.33          95.73  95.14  2.99     159     83     17
GB         97.58        96.89        99.17          98.47  98.03  2.25     217     10     73
NN         97.59        96.55        99.11          97.71  97.74  2.46     69,009  1,354  162
XGB        97.84        97.15        99.21          98.72  98.23  1.99     20      2      3
DTR        97.18        95.81        98.80          97.83  97.40  3.11     1       1      3

When analyzing the training and testing times, we must emphasize that we are competing with tools that take hours to generate a single sample. Also, the training time is required only once per design. In Table 6.2, the training time goes from 5 minutes down to less than 5 milliseconds.

Table 6.3 shows the model results for another Rocket core configuration (the small configuration). These results are consistent with what we saw for the default configuration. Another conclusion we can gather from these results and the previous table is that the surrogate models are significantly less accurate than the boosting and decision tree algorithms. They also have a higher standard deviation, meaning they will have more dispersion in the prediction errors.

Table 6.4 shows additional results for the OpenRISC 1200 core. For this design, we still get the highest accuracy with the boosting and decision tree algorithms. We also get better accuracy for the neural network; this is mostly due to the range of output values. Thus, when we focus on optimizing the NN for transfer learning, we scale the output values to lie between zero and one. In Appendix B we show additional results for other designs; those results are consistent with what we show in this section.

By using the five parameters previously described, we created the PPA models for the

Table 6.3 Model evaluation for the Rocket core small configuration.

Accuracy (%) = 100% - MAE; Train, Test, and Size are relative time / size.

Model      Dynamic Pwr  Leakage Pwr  Critical Path  Area   PPA    Std Dev  Train   Test   Size
RBFCubic   95.94        95.02        98.69          97.47  96.78  3.15     8       8      153
RBFLinear  96.12        95.28        98.50          97.52  96.85  2.89     5       2      153
GP         91.98        92.18        95.89          92.91  93.24  8.33     694     4      76
Poly       96.04        93.41        98.13          96.49  96.02  3.31     1       4      1
MARS       96.01        95.30        98.85          97.63  96.95  3.51     66      3      1
RF         96.63        97.00        99.11          98.54  97.82  2.99     749     325    1,886
BR         96.64        97.01        99.12          98.54  97.83  2.98     793     426    1,891
AB         96.53        93.96        98.18          96.88  96.39  2.72     32      18     5
GB         96.17        96.91        99.13          98.56  97.69  2.54     181     10     73
NN         96.61        96.65        98.73          97.91  97.47  2.54     56,208  1,344  162
XGB        96.78        97.21        99.21          98.83  98.01  2.38     17      3      3
DTR        95.89        96.38        98.79          98.13  97.30  3.44     1       1      3

Table 6.4 Model evaluation for the OpenRISC 1200 core.

Accuracy (%) = 100% - MAE; Train, Test, and Size are relative time / size.

Model      Dynamic Pwr  Leakage Pwr  Critical Path  Area   PPA    Std Dev  Train   Test   Size
RBFCubic   97.25        94.49        98.45          97.39  96.90  3.84     24      33     313
RBFLinear  97.56        94.84        98.60          97.48  97.12  3.29     14      4      313
GP         94.31        90.83        95.93          92.73  93.45  7.03     2,386   11     156
Poly       96.89        92.37        98.39          96.23  95.97  3.42     1       3      1
MARS       97.97        95.70        98.70          97.99  97.59  2.56     54      2      1
RF         98.49        98.31        98.94          99.20  98.73  1.74     781     374    1,942
BR         98.49        98.30        98.94          99.18  98.73  1.74     806     451    1,946
AB         96.73        86.15        97.43          93.41  93.43  3.56     254     209    24
GB         98.34        98.23        98.88          99.20  98.66  1.47     150     14     38
NN         98.23        98.01        98.85          98.99  98.52  1.60     58,863  1,163  84
XGB        98.48        97.99        98.95          99.04  98.62  1.59     19      3      2
DTR        97.83        97.34        98.53          98.73  98.11  2.72     2       1      3

one-port SRAM. This memory ranges from 2K to 99K bits. As for the logic cores, we used the total samples in the dataset (8,248), split into 80% for training and validation and the remainder for testing.

Table 6.5 shows the model evaluation results for the memories. From those results, we can see that gradient boost provides good accuracy, but the best accuracy comes from the neural network. The NN also has a low standard deviation. If we compare the error of our models to state-of-the-art models with around 3% error, our best model has an error lower than 0.5%. Our models fit better, but we also used more samples for training. The neural network takes longer to train but has a better evaluation time than some of the tree algorithms.

Table 6.5 Model evaluation for SRAM memories.

Accuracy (%) = 100% - MAE; Train, Test, and Size are relative time / size.

Model      Read Energy  Write Energy  Leakage Pwr  Access Time  Area   PPA    Std Dev  Train    Test   Size
RBFCubic   99.80        99.88         98.01        99.31        98.97  99.19  1.98     686      714    9,972
RBFLinear  99.82        99.89         98.26        99.39        99.05  99.29  1.87     627      121    9,972
Poly       99.59        99.47         93.83        97.56        97.10  97.51  2.35     4        12     5
MARS       90.34        91.69         85.82        92.37        86.88  89.42  10.71    15       6      5
RF         99.22        99.01         98.88        99.91        97.32  98.87  1.37     704      2,545  4,666
BR         99.23        99.00         98.89        99.91        97.31  98.87  1.36     763      2,907  4,667
AB         89.27        88.21         85.78        92.46        81.26  87.40  11.07    313      511    19
GB         99.48        99.36         98.54        99.48        97.98  98.97  0.96     190      92     19
NN         99.55        99.62         99.27        99.53        99.20  99.43  1.15     299,916  1,796  41
XGB        98.53        98.27         95.96        98.74        96.39  97.58  2.10     29       13     1
DTR        98.52        98.22         98.13        99.88        96.00  98.15  2.17     1        1      6

From the results in this section, we can conclude that it is possible to create accurate PPA models. The model creation and evaluation times are significantly lower than the time required by EDA tools to generate a PPA output. Our next effort was to reduce the number of samples used to create the models.

6.6.3 Comparing PPA Models With the Number of Samples

After creating models that accurately predict the power, performance, and area, we focused on reducing the number of samples needed to generate the models. To achieve this goal, we analyzed how the accuracy of the models is affected when we reduce the number of training data points. A model requiring a reduced number of samples is useful in practical chip design applications.

To analyze which model performs better with a reduced number of samples, we re-sampled the dataset to extract subsets of 10 to 490 samples. The models were trained on each subset and tested with the remaining samples.

Figure 6.15 shows how the accuracy of the models changes with different numbers of training samples for the Rocket core default configuration. From this figure, we can see that gradient boost performs best. At the 10-sample mark, the surrogate models achieve an accuracy of 88.5% to 90.7%, while the gradient boost algorithms reach up to 95.1%. XGB provides the best accuracy for a low number of samples. As expected with such a low number of samples, the NN gets the lowest accuracy, 79.7%. If we move to 20 training samples, the gradient boost algorithms go up to 96.6% while the NN improves to 91.1%. Table 6.6 shows the accuracy values for 10 to 100 training samples.

Table 6.6 PPA Accuracy for different train samples for Rocket Core.

Accuracy (%)

Train
Samples  RBFCubic  RBFLinear  GP     Poly   MARS   RF     BR     AB     GB     NN     XGB    DTR
10       88.53     91.25      84.39  85.59  90.67  92.80  92.81  94.14  94.17  79.65  95.09  94.07
20       93.01     91.85      86.63  89.32  95.26  96.32  96.31  96.15  96.16  91.08  96.55  95.61
30       95.23     94.77      91.26  93.31  96.30  96.85  96.84  96.67  96.74  93.37  96.75  96.19
40       94.67     94.20      90.62  93.03  96.82  96.92  96.91  96.48  96.96  92.85  97.01  95.75
50       95.15     94.45      91.39  93.26  96.94  96.95  96.95  96.60  97.09  94.91  97.11  96.18
60       94.82     94.79      91.81  92.42  96.43  97.12  97.13  96.62  97.31  94.87  96.70  96.36
70       95.44     95.38      91.77  94.54  97.11  97.06  97.06  96.39  96.97  94.99  97.17  96.26
80       95.39     95.86      91.31  94.42  95.45  96.96  96.98  96.00  97.05  95.78  96.71  96.32
90       95.78     96.01      91.71  94.73  97.20  97.09  97.09  96.54  97.27  95.84  97.36  96.67
100      95.32     95.82      92.10  95.04  97.46  97.18  97.17  96.51  97.30  95.79  97.36  96.80

Figure 6.15 Models accuracy vs. number of train samples for Rocket Default Configuration.

In Figure 6.16 we present the results for the Rocket core small configuration. The results are consistent with the default configuration, with XGB having the best accuracy for a reduced number of samples.

Figure 6.16 Models accuracy vs. number of train samples for Rocket Small Configuration.

If we repeat the same analysis for another design, in this case the OpenRISC 1200, we come to the same conclusion. From Figure 6.17 we can see that XGBoost provides the highest accuracy for a reduced number of samples. For 10 samples, it achieves 95.7% overall accuracy, while the surrogate models achieve an accuracy of 81.5% to 84.7%. For this design, at the 10-sample mark the NN achieves 80.4%. We can verify those results in Table 6.7.

Figure 6.17 Models accuracy vs. number of train samples for OpenRisc 1200 core.

Table 6.7 PPA Accuracy for different train samples for OR1200.

Accuracy (%)

Train
Samples  RBFCubic  RBFLinear  GP     Poly   MARS   RF     BR     AB     GB     NN     XGB    DTR
10       81.46     80.91      81.64  86.20  84.74  90.58  90.66  94.60  94.25  80.44  95.96  95.71
20       91.09     89.08      82.05  85.67  91.56  94.48  94.69  95.68  95.36  83.49  95.28  92.51
30       90.88     91.23      86.72  86.01  94.20  96.72  96.76  96.28  96.66  87.33  96.73  95.71
40       90.53     92.29      81.74  91.30  96.30  96.82  96.81  96.35  97.14  91.76  97.25  96.33
50       93.73     93.42      87.03  91.64  95.71  96.64  96.64  96.24  96.96  92.78  96.68  94.12
60       94.59     93.02      89.62  92.73  96.20  97.10  97.12  96.77  97.34  92.80  97.52  96.55
70       93.71     93.89      89.06  94.48  96.58  96.71  96.73  95.53  96.97  93.98  96.89  95.97
80       94.18     93.98      88.66  93.38  96.63  97.20  97.18  96.60  97.25  94.25  97.34  96.64
90       94.55     94.56      90.45  94.61  97.27  97.47  97.46  96.34  97.64  95.39  97.63  96.91
100      95.38     94.69      91.01  94.70  96.95  97.45  97.45  96.62  97.64  95.10  97.91  97.00

In Figure 6.18 we take a closer look at XGBoost, the model that provides the best accuracy at a low number of samples. We can see that the PPA component that XGB predicts with the lowest accuracy is the leakage power; this is primarily due to the linearity of this metric. The component with the best prediction is the performance. In general, the overall PPA is accurate for sample counts as low as 10.

Figure 6.18 Individual PPA components accuracy vs. number of train samples for XGB

The other model we focus on is the neural network (NN). If we analyze the NN performance in Figure 6.19, it predicts all the PPA components with consistent accuracy, though, in contrast with XGB, this accuracy is lower. To achieve an accuracy over 93%, it requires at least 30 training points. To address this downside, we used transfer learning, as explained in the next section.

We also evaluated the model accuracy for the SRAM models. In this case, the models with the better performance at low sample counts were the radial basis function model with a cubic kernel (RBFCubic) and the neural network; the two models achieved 95% accuracy with 200 and 300 samples, respectively. This is a significant positive result, considering that we can explore this wide range of memories with a low number of samples. Figure 6.20 illustrates the results for all the models. We can see that the gradient boost models start to give accuracies higher than 95% after 750 samples.

Figure 6.19 Individual PPA components accuracy vs. number of train samples for NN.

Figure 6.20 Models accuracy vs. number of train samples SRAM.

Table 6.8 shows the detailed results from 10 to 3,000 samples. Note that 3,000 is only 36% of the full dataset, and for most of the models, we get a high accuracy at this point.

Table 6.8 PPA Accuracy for different train samples for Memories.

Accuracy (%)

Train
Samples  RBFCubic  RBFLinear  Poly   MARS   RF     BR     AB     GB     NN     XGB    DTR
10       87.69     87.70      73.27  86.59  76.42  76.37  78.32  74.44  68.54  62.52  66.62
50       89.48     89.45      80.26  88.71  83.48  83.34  84.05  84.32  87.01  83.16  79.21
100      92.83     93.06      86.41  90.13  85.61  85.65  83.90  88.12  91.35  87.19  85.22
200      95.08     93.00      90.47  89.23  86.14  86.15  85.75  89.15  91.16  88.86  84.81
300      95.15     95.35      95.64  89.67  88.76  88.72  86.16  91.51  94.84  90.79  87.39
400      95.66     95.47      95.32  89.06  89.66  89.64  86.38  92.85  96.00  91.71  89.64
500      96.45     96.35      96.22  89.36  90.21  90.14  86.49  94.87  96.87  92.80  89.73
750      96.93     96.58      96.92  89.34  91.79  91.80  87.57  95.62  97.54  94.08  90.29
1000     96.82     96.73      97.05  88.79  92.93  92.90  86.95  95.76  97.65  94.18  91.85
1250     97.17     97.20      97.07  89.25  93.83  93.86  87.73  96.74  97.75  94.65  93.33
1500     97.25     97.38      97.08  89.06  93.82  93.83  86.90  96.96  97.95  94.50  93.31
1750     97.62     97.41      97.10  88.84  94.31  94.35  87.34  97.11  98.14  95.12  93.96
2000     97.43     97.91      97.28  88.77  95.57  95.58  88.55  97.56  98.62  95.22  95.21
2250     97.95     98.08      97.34  88.93  95.63  95.64  87.99  97.39  98.71  94.92  95.29
2500     98.13     98.18      97.49  89.44  95.78  95.79  87.94  97.49  98.64  95.00  95.14
2750     98.30     98.37      97.47  89.62  96.29  96.27  88.20  97.59  98.68  95.49  95.09
3000     98.42     98.50      97.49  89.75  96.56  96.56  89.19  97.97  98.82  95.62  95.49

If we analyze the neural network model in Figure 6.21, we can see that it predicts all the components with very consistent accuracy. The individual PPA component accuracies converge to the overall PPA accuracy as we increase the number of training samples.

6.6.4 Predicting Core Configurations PPA with Transfer Learning

One question we have been trying to answer is whether we can learn from previous designs to predict new designs; in this case, whether we can learn from previous core configurations when we need to model the PPA for a new one. To address this problem, we previously described a transfer learning approach based on a neural network model.

We created a base neural network model trained on the default Rocket core configuration. For every new configuration whose PPA we are trying to predict, we fine-tune the output layers of this model with data from the design we are predicting. We use the same approach to predict corner operating conditions.

Figure 6.21 Individual PPA SRAM components accuracy vs. number of train samples for NN.

The neural network model used for transfer learning has five layers of size [128, 256, 512, 256, 64], for a total of 313,089 trainable parameters. The size of this network was chosen to avoid over- and underfitting [6]. Figure 6.22 shows that after 70 training iterations, the training and validation losses converge and reduce to a value close to zero. For this neural network, we normalize the input data to zero mean and unit variance and scale the output data to be between zero and one.
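The pre-processing just described can be sketched with scikit-learn scalers; the tooling choice and the placeholder arrays below are assumptions for illustration.

```python
# Hedged sketch of the pre-processing: inputs to zero mean / unit variance,
# outputs scaled to [0, 1].
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X_train = np.random.rand(150, 4)   # placeholder parameter samples
y_train = np.random.rand(150, 1)   # placeholder PPA component values

x_scaler = StandardScaler().fit(X_train)   # inputs: zero mean, unit variance
y_scaler = MinMaxScaler().fit(y_train)     # outputs: between zero and one
X_n, y_n = x_scaler.transform(X_train), y_scaler.transform(y_train)
# model predictions are mapped back with y_scaler.inverse_transform(...)
```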

Figure 6.22 Neural network loss vs training iterations.

To decide the number of parameters to share in the network, we re-trained the model to predict new core configurations for different weight-sharing percentages. From Figure 6.23 we can see that for a low number of samples, the optimal share ratio is around 50%. This means that in our final model we share the first three layers and re-train the last two layers for each new core configuration.

Figure 6.23 Weights Shared vs Accuracy for core configuration modeling.

From Table 6.9, sharing 3 layers means fine-tuning only 147,841 parameters. From this table, we can see that for both core configurations and corner conditions, PPA prediction sharing 3 layers provides accuracy over 96% when re-training the network with only 10 samples. To evaluate the model accuracy, we use the rest of the samples remaining in the dataset, in this case 490. This table was created with a base model trained with 150 samples.

Figure 6.24 shows the results of predicting four core configurations with the transfer learning model. We can see that for as low as 10 samples, we get over 94% accuracy. This result shows that the model is learning from the previous core configuration. A single neural network model predicting core configurations achieved only 91% accuracy (see Table 6.6). Note that the base model used in the results in Figure 6.24 was trained with only 50 samples.

Table 6.9 Optimal layers to share in the transfer learning neural network.

Accuracy with re-train samples

Shared Layers  Share Parameters  100    50     20     10     Sharing %
NN to Predict Core Configurations
5              65                96.58  96.08  95.81  95.71  99.98%
4              16,513            96.63  96.12  95.84  95.50  94.74%
3              147,841           96.85  96.05  96.04  96.14  52.88%
2              279,425           97.05  96.52  95.78  95.79  10.93%
1              313,089           97.32  96.87  95.43  95.47  0.20%
NN to Predict Corners
5              65                97.28  97.38  97.00  96.62  99.98%
4              16,513            97.35  97.42  96.62  96.75  94.74%
3              147,841           97.41  97.35  96.55  96.65  52.88%
2              279,425           97.39  97.30  96.36  96.58  10.93%
1              313,089           97.45  97.33  96.42  96.23  0.20%


Figure 6.24 Accuracy for PPA models with transfer learning to predict core configurations.

In Table 6.10 we can see the results for predicting core configurations with a base model trained with 150 samples. With this base model and only 15 training points, we can predict the PPA of new core configurations with 95.3% accuracy and up. Each component is predicted with a consistent accuracy close to the average; the leakage power component is the one with the least accuracy. Another point to consider is that, compared with the previous PPA models, we get a training speedup of up to 8x.

The accuracy of the NN model without transfer learning for the Rocket configurations was about 91% with 20 samples, and the neural network structure presented in the first section of the results requires at least 40 data points to achieve 91% accuracy. With transfer learning, we can get close to 94% and up for the four configurations using only 5 samples. This result means that the model is learning from the previous designs.

Considering that we need to generate the 150 or 50 samples only once to train the base model, and that there can be a significantly high number of core configurations, this is a valuable solution for predicting new designs.

Table 6.10 Transfer Learning models to predict core configurations.

Accuracy (%)

Train    Dynamic  Leakage  Critical  Area   PPA    Std   Train
Samples  Pwr      Pwr      Path                    Dev   Speedup
Tiny Configuration
5        96.16    92.46    96.95     96.36  95.48  3.1   8.1
15       96.37    93.57    95.96     96.56  95.61  3.3   7.9
Small Configuration
5        93.90    89.22    97.70     92.68  93.38  4.4   8.1
15       95.19    93.44    97.86     96.52  95.75  3.6   8.2
Medium Configuration
5        94.20    92.65    97.77     95.46  95.02  4.0   --
15       94.87    92.37    97.96     96.07  95.31  3.9   6.9
Dual Configuration
5        94.42    89.61    96.52     93.82  93.59  5.7   8.8
15       94.94    92.36    98.28     95.93  95.38  4.1   7.6

6.6.5 Predicting Corner Conditions PPA with Transfer Learning

With PPA prediction for multiple corner conditions, we can increase the possibilities for design exploration and optimization. Using the previously generated model for one corner condition, we applied the same transfer learning approach to predict four new corner conditions.

From Figure 6.25 we can see that, for corner prediction as well, sharing 3 layers in the model provides the best accuracy when retraining with a low number of samples. This allows us to reuse the existing models without the need to retrain the entire model or generate new data.

Figure 6.25 Weights Shared vs Accuracy for corners condition modeling.

Figure 6.26 shows the result of using the model with transfer learning to predict the PPA at four corners of the OpenRISC 1200 core. In Figure 6.26 we can see that with only 15 samples, the model can predict each corner condition with over 94% accuracy. The base model here was pre-trained with 50 data points for the corner FF_0p88V_m40C_Cmin.

Figure 6.26 Accuracy for PPA models with transfer learning to predict corner conditions for OR1200.

Table 6.11 summarizes the results of using the base model trained for corner condition FF_0p88V_m40C_Cmin with 50 samples. As before, the worst accuracy is for the leakage power, due to its linearity. We can see that with 15 samples, we can predict the PPA with up to 95% accuracy for all four corners. With transfer learning, we reduce the training time by up to 9x for the corner conditions. Note that the area does not change for different corner conditions.

Figure 6.27 shows the result of predicting corners for a Rocket configuration. These results are from the base model trained for the corner FF_0p88V_m40C_Cmin of a Rocket core configuration. We can see that starting at 10 samples, the PPA prediction for the four corners is higher than 93%.

Finally, Table 6.12 summarizes the results for predicting the corner operating conditions for the Rocket model. With 15 samples, we can predict the PPA for all the conditions with up to 95% accuracy and reduce the training time by 8x. The results from this design are consistent with what we saw in the OpenRISC core. As with the core configuration predictions, with only 5 samples we improve on the NN accuracy without transfer learning: with 5 training data points, the transfer learning model achieves accuracy over 92%, and with 15 samples, over 95%. This result means that the model is learning the new corners from the previous corner conditions. In Appendix B we include additional results for other designs that align with

Table 6.11 Transfer Learning models to predict Corners for OR1200 results.

Accuracy (%)

Train    Dynamic  Leakage  Critical  Area   PPA    Std   Train
Samples  Pwr      Pwr      Path                    Dev   Speedup
SS_0p72V_m40C_Cmax
5        94.24    91.03    95.93     91.76  93.24  5.1   8.2
15       96.06    91.84    96.81     95.54  95.06  4.6   9.3
SS_0p72V_125C_Cmax
5        92.94    91.05    91.97     91.76  91.93  6.3   7.5
15       95.27    92.40    93.71     95.54  94.23  5.5   6.8
SS_0p72V_125C_Rcmax
5        94.40    91.09    92.30     91.76  92.39  5.8   9.0
15       95.26    91.63    94.12     95.54  94.14  5.4   8.4
SS_0p72V_m40C_Cmax
5        94.54    88.41    92.47     91.76  91.79  6.5   7.8
15       96.26    90.74    93.68     95.54  94.05  5.7   8.0

Figure 6.27 Accuracy for PPA models with transfer learning to predict corner conditions for Rocket.

the ones described in this section.

Table 6.12 Transfer Learning models to predict Corners for Rocket results.

Accuracy (%)

Train    Dynamic  Leakage  Critical  Area   PPA    Std   Train
Samples  Pwr      Pwr      Path                    Dev   Speedup
SS_0p72V_m40C_Cmax
5        93.53    90.67    94.10     94.07  93.09  5.7   8.5
15       94.47    92.91    98.15     95.66  95.30  4.1   8.3
SS_0p72V_125C_Cmax
5        94.88    91.01    87.95     94.07  91.98  7.2   6.4
15       94.95    93.15    97.41     95.66  95.29  4.6   7.2
SS_0p72V_125C_Rcmax
5        92.82    90.79    88.99     94.07  91.67  6.9   8.4
15       93.85    93.00    97.42     95.66  94.98  4.2   7.9
SS_0p72V_m40C_Cmax
5        94.39    89.85    90.24     94.07  92.14  6.8   6.5
15       93.29    92.89    97.42     95.66  94.81  4.3   6.7

6.7 Summary

This chapter presented a machine learning based approach to model the power, performance, and area for a system on a chip, memories, and an intellectual property block. The proposed models can predict the PPA quickly and with high accuracy. We evaluated different modeling techniques, getting the best results with gradient boost models and neural networks. Using different designs to test the models, we obtained accuracies of up to 99%. When constraining the number of samples used, we can predict the PPA with 96% accuracy, training the model with only 10 data points.

This chapter also showed an approach to model the PPA of new designs and operating corner conditions using a neural network with shared weights. This transfer learning approach allows learning from previous cores and designs to estimate the PPA of new ones using a low number of data points. With only 10 to 15 new samples, we could model the PPA by sharing weights from previous models. With 5 to 15 samples, we can predict all core configurations with 93.6% to 95.8% accuracy, and corner operating conditions with 91.7% to 95.3% accuracy. These new models improved on the accuracy of the base neural network models, demonstrating that the model acquired some learning from the earlier data. Models like these can be included with a core generator to help with parameter optimization and exploration.

Given that generating data for design exploration to optimize and model the PPA is very expensive, the work presented in this chapter is valuable and suitable for practical chip design applications.

CHAPTER 7

CONCLUSION AND FUTURE WORK

7.1 Conclusions

In this work, we applied machine learning to three of the most relevant problems we encounter in electronic design automation when designing integrated circuits: design rule checking, hotspot detection, and power, performance, and area modeling. Note that the contributions of this research were listed in the introduction, Chapter 1.

In Chapter 4 we presented a deep learning approach to design rule checking. The approach uses a parameterized synthetic dataset generator to generate training data quickly. We included transfer learning to add new design rule checks. This approach makes adding new design rule violations fast and straightforward; it can also be expanded to a new technology node. We showed the results of using this model on a completely new layout, with a detection rate of 96.4% to 100%, depending on how complex the rule is. The false alarm rate is as low as 5.3% in the worst-case scenario. The proposed DRC checker can identify and locate violations in small action windows and is 7.5x faster than conventional checkers. This solution to DRC provides an alternative while iterating on the design. The work in Chapter 4 yields four of the contributions in this dissertation.

The other problem we addressed is chemical mechanical polishing (CMP) hotspot detection, presented in Chapter 5. The proposed approach uses a deep learning model with multilayer input data to capture the cumulative effect of CMP. The model can detect hotspots in new chips with 10 metal layers with up to 98% accuracy, and it can do the inference 10x faster. A key benefit of this DL approach is that it enables the detection of hotspots that may escape traditional approaches based on design rules or restrictive semi-empirical or physics-based modeling. The work in Chapter 5 results in one of the main contributions of this research.

In Chapter 6, we presented a machine learning approach to power, performance, and area modeling. The proposed PPA models focus on gradient boost models and neural networks. We were able to create PPA models with accuracies better than 99%. Those models can create synthesis PPA predictions in fractions of a second, in contrast with tools that can take hours or days. We focused on reducing the number of samples needed to create the models and could achieve high accuracies of over 96% with only 10 data points. Our approach also included transfer learning to learn from previous designs and predict the PPA for new core configurations and IP block corner conditions. These models are valuable when doing design exploration and PPA optimization. This work yields four of the contributions in this dissertation.

7.2 Future Work

This work contributed to establishing a solid base for using machine learning for design rule checking, chemical mechanical polishing hotspot detection, and fast and accurate power, performance, and area modeling. However, some aspects can be improved and can help extend this work. Next, we list some ideas on how this can be done for each specific topic:

1. Design Rule Checking

• We demonstrated that this approach is scalable to any number of rules; future work can include transferring the design rule checking framework to another process design kit.

• As in the work done for CMP hotspots, develop a model that can take all the layers in a layout as a single input. For this, a multilayer deep learning model will be needed.

• The inference time can be improved if the framework implementation is opti-

mized.

2. Chemical Mechanical Polishing Hotspots Detection

• A logical extension of this approach is to model mixed effects which, when combined, predict a hotspot that either effect separately might have missed; for example, how CMP hotspots may affect lithography hotspots.

• Explore the dual effect of CMP dishing and shallow lithographic process depth

of focus.

• Explore a transfer learning approach and translate this model to a new process

node.

3. Power, Performance and Area Modeling

• Use the transfer learning approach to create PPA predictions for later physical design stages. With transfer learning and the existing models, the PPA at the global route stage could be predicted using a reduced number of samples. The same can be done for the detailed route.

• Use the existing models to create a framework to optimize the PPA of a system-on-chip, including cores, memories, and accelerators. The models can substitute for the time-consuming runs of the synthesis tools.

• Expand the number of parameters used for core configurations to create a more

generic core generator optimization and design exploration framework.

BIBLIOGRAPHY

[1] Asanović, K. et al. The Rocket Chip Generator. Tech. rep. UCB/EECS-2016-17. EECS Department, University of California, Berkeley, 2016.

[2] Bhanushali, K. & Davis, W. R. “FreePDK15: An Open-Source Predictive Process Design Kit for 15Nm FinFET Technology”. Proceedings of the 2015 Symposium on Interna- tional Symposium on Physical Design. ISPD ’15. Monterey, California, USA: ACM, 2015, pp. 165–170.

[3] Bhanushali, K. N. “Design Rule Development for FreePDK15 : An Open Source Pre- dictive Process Design Kit for 15nm FinFET Devices” (2014).

[4] Biau, G. “Analysis of a Random Forests Model”. 13.null (2012), 1063–1095.

[5] Buurma, J. et al. “OpenDFM Bridging the Gap Between DRC and DFM”. IEEE Design Test of Computers 29.6 (2012), pp. 84–90.

[6] Caruana, R. et al. “Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping”. Proceedings of the 13th International Conference on Neural Information Processing Systems. NIPS’00. Denver, CO: MIT Press, 2000, 381–387.

[7] Chen, T. & Guestrin, C. “XGBoost: A Scalable Tree Boosting System”. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’16. San Francisco, California, USA: ACM, 2016, pp. 785–794.

[8] Chollet, F. Deep learning with python. en. New York, NY: Manning Publications, 2017.

[9] Dai, V. et al. “Developing DRC Plus Rules through 2D Pattern Extraction and Clustering Techniques”. Proc. SPIE 7275 (2009).

[10] Design Rule Complexity Rising. http://www.semiengineering.com/design-rule-complexity-rising. (Accessed: March 2021). 2018.

[11] Eriksson, D. et al. “pySOT and POAP: An event-driven asynchronous framework for surrogate optimization”. arXiv preprint arXiv:1908.00420 (2019).

[12] Everingham, Y. & Sexton, J. “An introduction to Multivariate Adaptive Regression Splines for the cane industry”. 2011.

[13] FreePDK15. https://www.eda.ncsu.edu/wiki/FreePDK15:Contents. 2020.

[14] Friedman, J. H. “Greedy function approximation: A gradient boosting machine.” The Annals of Statistics 29.5 (2001), pp. 1189–1232.

[15] Ghaida, R. S. et al. “A Framework for Double Patterning-Enabled Design”. Proceedings of the International Conference on Computer-Aided Design. ICCAD ’11. San Jose, CA, USA: IEEE Press, 2011, 14–20.

[16] Ghulghazaryan, R. & Wilson, J. “Application of Machine Learning and Neural Net- works for Generation of Pre-CMP Profiles of Advanced Deposition Processes for CMP Modeling”. ICPT 2017; International Conference on Planarization/CMP Technology. 2017, pp. 1–6.

[17] Ghulghazaryan, R. et al. “FEOL CMP modeling: Progress and challenges”. 2015 Inter- national Conference on Planarization/CMP Technology (ICPT). 2015, pp. 1–4.

[18] Goodfellow, I. et al. Deep Learning. MIT Press, 2016.

[19] Greathouse, J. L. & Loh, G. H. “Machine Learning for Performance and Power Mod- eling of Heterogeneous Systems”. 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 2018, pp. 1–6.

[20] Hinton, G. Neural Networks for Machine Learning. http://www.cs.toronto.edu/ ~tijmen/csc321/slides/lecture_slides_lec6.pdf. (Accessed: March 2021). 2014.

[21] Huang, H. et al. “New CMP processes development and challenges for 7nm and beyond”. 2018 China Semiconductor Technology International Conference (CSTIC). 2018, pp. 1–5.

[22] Hui, C. et al. “Hotspot detection and design recommendation using silicon calibrated CMP model”. Proc.SPIE 7275 (2009), pp. 7275 –7275 –12.

[23] Hung, W.-T. et al. “Transforming Global Routing Report into DRC Violation Map with Convolutional Neural Network”. Proceedings of the 2020 International Symposium on Physical Design. ISPD ’20. Taipei, Taiwan: Association for Computing Machinery, 2020, 57–64.

[24] Iman, R. “Latin Hypercube Sampling” (1999).

[25] Islam, R. & Shahjalal, M. A. “Soft Voting-Based Ensemble Approach to Predict Early Stage DRC Violations”. 2019 IEEE 62nd International Midwest Symposium on Circuits and Systems (MWSCAS). 2019, pp. 1081–1084.

[26] Islam, R. & Shahjalal, M. A. “Predicting DRC Violations Using Ensemble Random Forest Algorithm”. Proceedings of the 56th Annual Design Automation Conference 2019. DAC ’19. Las Vegas, NV, USA: Association for Computing Machinery, 2019.

[27] Katakamsetty, U. et al. “20nm CMP model calibration with optimized metrology data and CMP model applications”. Vol. 9427. 2015, pp. 9427 –9427 –9.

[28] Katakamsetty, U. et al. “Cutting-edge CMP modeling for front-end-of-line (FEOL) and full stack hotspot detection for advanced technologies”. Proc. SPIE 10148 (2017).

[29] Katz, R. H. http://bnrg.cs.berkeley.edu/~randy/Courses/CS252.S96/ Lecture05.pdf.

[30] Khailany, B. et al. “Accelerating Chip Design With Machine Learning”. IEEE Micro 40.6 (2020), pp. 23–32.

[31] Kotsiantis, S. et al. “Bagged Averaging of Regression Models”. Vol. 204. 2006, pp. 53–60.

[32] Krishnan, M. et al. “Chemical Mechanical Planarization: Slurry Chemistry, Materials, and Mechanisms”. Chemical Reviews 110.1 (2010). PMID: 19928828, pp. 178–204. eprint: https://doi.org/10.1021/cr900170z.

[33] Kwon, J. & Carloni, L. P. “Transfer Learning for Design-Space Exploration with High-Level Synthesis”. Proceedings of the 2020 ACM/IEEE Workshop on Machine Learning for CAD. MLCAD ’20. Virtual Event, Iceland: Association for Computing Machinery, 2020, 163–168.

[34] Last, F. & Schlichtmann, U. “Partial Sharing Neural Networks for Multi-Target Re- gression on Power and Performance of Embedded Memories”. Proceedings of the 2020 ACM/IEEE Workshop on Machine Learning for CAD. MLCAD ’20. Virtual Event, Iceland: Association for Computing Machinery, 2020, 123–128.

[35] Last, F. et al. “Predicting Memory Compiler Performance Outputs Using Feed-Forward Neural Networks”. ACM Trans. Des. Autom. Electron. Syst. 25.5 (2020).

[36] Lee, W. et al. “PowerTrain: A learning-based calibration of McPAT power models”. 2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED). 2015, pp. 189–194.

[37] Li, B. & Franzon, P.D. “Machine learning in physical design”. 2016 IEEE 25th Confer- ence on Electrical Performance Of Electronic Packaging And Systems (EPEPS). 2016, pp. 147–150.

[38] Li, S. et al. “McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures”. 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). 2009, pp. 469–480.

[39] Li, X. https://users.ece.cmu.edu/~xinli/classes/cmu_18660/Lec01. pdf.

[40] Li, Z. & Hoiem, D. “Learning Without Forgetting”. Vol. 9908. 2016, pp. 614–629.

[41] Liang, R. et al. “DRC Hotspot Prediction at Sub-10nm Process Nodes Using Cus- tomized Convolutional Network”. Proceedings of the 2020 International Symposium

116 on Physical Design. ISPD ’20. Taipei, Taiwan: Association for Computing Machinery, 2020, 135–142.

[42] Lin, Z. et al. “HL-Pow: A Learning-Based Power Modeling Framework for High-Level Synthesis”. 2020 25th Asia and South Pacific Design Automation Conference (ASP- DAC). 2020, pp. 574–580.

[43] Loh, W.-Y. “Classification and Regression Trees”. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1 (2011), pp. 14 –23.

[44] Natekin, A. & Knoll, A. “Gradient Boosting Machines, A Tutorial”. Frontiers in neuro- robotics 7 (2013), p. 21.

[45] Nwankpa, C. et al. “Activation Functions: Comparison of trends in Practice and Research for Deep Learning” (2020).

[46] OpenRisc 1200 HP, Hyper Pipelined OR1200 Core. https://opencores.org/ projects/or1200_hp. (Accessed: March 2021).

[47] Ostertagova, E. “Modelling Using Polynomial Regression”. Procedia Engineering 48 (2012), 500–506.

[48] Pedregosa, F.et al. “Scikit-learn: Machine learning in Python”. Journal of machine learning research 12.Oct (2011), pp. 2825–2830.

[49] Qi, W. “IC Design Analysis, Optimization and Reuse via Machine Learning” (2017).

[50] Reddy, G. R. et al. “Machine Learning-Based Hotspot Detection: Fallacies, Pitfalls and Marching Orders”. 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 2019, pp. 1–8.

[51] Reducing Power At RTL. https://semiengineering.com/reducing-power- at-the-rtl-level. 2020.

[52] Sarwar, S. S. et al. “Incremental Learning in Deep Convolutional Neural Networks Using Partial Network Sharing”. IEEE Access 8 (2020), pp. 4615–4628.

[53] Shields, M. D. & Zhang, J. “The generalization of Latin hypercube sampling”. Reliabil- ity Engineering and System Safety 148 (2016), pp. 96–108.

[54] Shin, M. & Lee, J.-H. “Accurate lithography hotspot detection using deep convolu- tional neural networks”. Journal of Micro/Nanolithography, MEMS, and MOEMS 15.4 (2016), pp. 1 –13.

[55] Solomatine, D. & Shrestha, D. “AdaBoost.RT: A boosting algorithm for regression problems”. Vol. 2. 2004, 1163 –1168 vol.2.

117 [56] Tabrizi, A. F.et al. “Eh?Predictor: A Deep Learning Framework to Identify Detailed Routing Short Violations From a Placed Netlist”. IEEE Transactions on Computer- Aided Design of Integrated Circuits and Systems 39.6 (2020), pp. 1177–1190.

[57] Tabrizi, A. F. et al. “A Machine Learning Framework to Identify Detailed Routing Short Violations from a Placed Netlist”. Proceedings of the 55th Annual Design Au- tomation Conference. DAC ’18. San Francisco, California: Association for Computing Machinery, 2018.

[58] Tripathi, S. et al. “CMP Modeling as a part of Design for Manufacturing”. International Conference on Planarization / CMP Technology. 2007, pp. 1–6.

[59] Venkatesan, R. et al. “MAGNet: A Modular Accelerator Generator for Neural Networks”. 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 2019, pp. 1–8.

[60] Walker, M. et al. “Hardware-Validated CPU Performance and Energy Modelling”. 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 2018, pp. 44–53.

[61] Wang, Q. et al. “A Comprehensive Survey of Loss Functions in Machine Learning”. Annals of Data Science (2020).

[62] Wu, G. et al. “GPGPU performance and power estimation using machine learning”. 2015 IEEE 21st International Symposium on High Performance Computer Architec- ture (HPCA). 2015, pp. 564–576.

[63] Wu, G. et al. “Improved Expected Cross Entropy Method for Text Feature Selection”. 2015 International Conference on Computer Science and Mechanical Automation (CSMA). 2015, pp. 49–54.

[64] Xie, Z. et al. “RouteNet: Routability Prediction for Mixed-Size Designs Using Convolu- tional Neural Network”. Proceedings of the International Conference on Computer- Aided Design. ICCAD ’18. San Diego, California: Association for Computing Machin- ery, 2018.

[65] Yang, H. et al. “Imbalance aware lithography hotspot detection: a deep learning approach”. Design-Process-Technology Co-optimization for Manufacturability XI. Ed. by Capodieci, L. & Cain, J. P. Vol. 10148. International Society for Optics and Photonics. SPIE, 2017, pp. 31 –46.

[66] Yang, H. et al. “Detecting Multi-Layer Layout Hotspots with Adaptive Squish Patterns”. Proceedings of the 24th Asia and South Pacific Design Automation Conference. ASP- DAC ’19. Tokyo, Japan: Association for Computing Machinery, 2019, 299–304.

118 [67] Yu, Y. et al. “Machine-Learning-Based Hotspot Detection Using Topological Clas- sification and Critical Feature Extraction”. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 34.3 (2015), pp. 460–470.

[68] Zeng, W. et al. “Design Rule Violation Hotspot Prediction Based on Neural Network Ensembles”. CoRR abs/1811.04151 (2018). arXiv: 1811.04151.

[69] Zeng, W. et al. “Explainable DRC Hotspot Prediction with Random Forest and SHAP Tree Explainer”. Proceedings of the 23rd Conference on Design, Automation and Test in Europe. DATE ’20. Grenoble, France: EDA Consortium, 2020, 1151–1156.

[70] Zhang, Y. et al. “GRANNITE: Graph Neural Network Inference for Transferable Power Estimation”. Proceedings of the 57th ACM/EDAC/IEEE Design Automation Confer- ence. DAC ’20. Virtual Event, USA: IEEE Press, 2020.

[71] Zhou, Y. et al. “PRIMAL: Power Inference using Machine Learning”. 2019 56th ACM / IEEE Design Automation Conference (DAC). 2019, pp. 1–6.

APPENDICES

APPENDIX

A

ADDITIONAL DRC RESULTS

This appendix presents additional results from testing a new SRAM layout with the DRC framework. It also shows sample images for some of the implemented rules.

A.1 DRC Results for a Set of SRAM Layouts

This layout contains a set of SRAM cells from FreePDK15, replicated to create a new 16 um x 16 um test layout.
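As a rough illustration of how such a test layout can be assembled, the sketch below tiles a single clip into a larger array, assuming clips are handled as binary numpy images; the clip contents and the 10 nm pixel grid are hypothetical, not the framework's actual settings.

```python
import numpy as np

# Hypothetical 2 um x 2 um SRAM clip rasterized as a binary image
# (1 = shape present, 0 = empty), on an assumed 10 nm pixel grid.
clip = np.zeros((200, 200), dtype=np.uint8)

# Replicate the clip 8 x 8 times to build a 16 um x 16 um test layout.
layout = np.tile(clip, (8, 8))
print(layout.shape)  # (1600, 1600)
```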

Table A.1 Results summary of testing 20 different types of violations on layout data unseen by the model. The data consist of a new set of SRAM layouts. The first four numeric columns describe the violation clips (True Label, Found, Missed, TPR); the last four describe the clean clips (True Label, Found, Missed, FPR).

DRC  Type  True  Found  Missed  TPR  True  Found  Missed  FPR
drc0  Horizontal width  659  653  6  99.0%  62  62  0  0.0%
drc1  Horizontal Spacing  28  27  1  96.0%  604  593  11  1.8%
drc2  Vertical overlap  187  180  7  96.0%  293  273  20  6.8%
drc3  Vertical length  731  704  27  96.0%  294  281  13  4.4%
drc4  Vertical spacing  28  27  1  96.0%  453  426  27  6.0%
drc5  H/V width  1220  1205  15  99.0%  33  31  2  6.1%
drc6  Polygon area  763  724  39  95.0%  91  86  5  5.5%
drc7  H/V Spacing  625  614  11  98.0%  112  106  6  5.4%
drc8  Space dependent on W&L  0  0  0  100.0%  576  537  39  6.8%
drc9  H/V width  61  59  2  97.0%  8  7  1  12.5%
drc10  Polygon area  18  18  0  100.0%  24  23  1  4.2%
drc11  H/V spacing and notch  0  0  0  0.0%  42  39  3  7.1%
drc12  Space dependent on W&L  0  0  0  100.0%  62  61  1  1.6%
drc13  Via shape square  12  12  0  100.0%  83  76  7  8.4%
drc14  Via shape rectangular  0  0  0  100.0%  299  271  28  9.4%
drc15  Via spacing  0  0  0  0.0%  512  507  5  1.0%
drc16  Via inside layer  14  14  0  100.0%  57  52  5  8.8%
drc17  Via enclosure horizontal  99  95  4  96.0%  244  230  14  5.7%
drc18  Via enclosure vertical  54  48  6  96.0%  241  225  16  6.6%
drc19  Horizontal width  653  648  5  99.0%  63  61  2  3.2%
Totals  5,152  5,028  124  97.8%  4,153  3,947  206  5.0%
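The TPR and FPR columns in Table A.1 follow directly from the per-rule counts. A minimal sketch of the metric computation (the record type and field names are illustrative, not the framework's API):

```python
from dataclasses import dataclass

@dataclass
class RuleResult:
    """Per-rule counts; field names are illustrative."""
    viol_found: int    # violation clips correctly flagged
    viol_missed: int   # violation clips the model missed
    clean_found: int   # clean clips correctly passed
    clean_missed: int  # clean clips incorrectly flagged

def tpr(r: RuleResult) -> float:
    """True positive rate (%) over a rule's violation clips."""
    total = r.viol_found + r.viol_missed
    return 100.0 * r.viol_found / total if total else 100.0

def fpr(r: RuleResult) -> float:
    """False positive rate (%) over a rule's clean clips."""
    total = r.clean_found + r.clean_missed
    return 100.0 * r.clean_missed / total if total else 0.0

# drc0 from Table A.1: 653/659 violations found, 0/62 clean clips flagged.
drc0 = RuleResult(653, 6, 62, 0)
print(f"TPR {tpr(drc0):.1f}%, FPR {fpr(drc0):.1f}%")  # TPR 99.1%, FPR 0.0%
```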

A.2 Synthetic Layout Clip Samples

Figure A.1 Clean layout clip sample 1.

Figure A.2 Clean layout clip sample 2.

Figure A.3 Violation layout sample 1.

Figure A.4 Violation layout sample 2.

Figure A.5 Two-layer layout sample.

APPENDIX

B

ADDITIONAL PPA RESULTS

This appendix shows additional results for the PPA chapter.

B.1 PPA Variations for a CNN Accelerator

Figure B.1 Area vs. leakage power for a CNN accelerator IP block for 5 corner conditions.

Figure B.2 Area vs. dynamic power for a CNN accelerator IP block for 5 corner conditions.

Figure B.3 Area vs. performance for a CNN accelerator IP block for 5 corner conditions.

B.2 Additional PPA Variations for the SRAM Dataset

Figure B.4 Area vs. word depth for memories.

Figure B.5 Leakage power vs. word depth for memories.

B.3 Additional PPA Model Evaluation Performance

Table B.1 Model evaluation for the Rocket Core Dual configuration.

Accuracy (%) = 100% - MAE. Train and Test times and model Size are relative.

Model  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train  Test  Size
RBFCubic 96.30 95.07 99.06 97.38 96.95 3.34 9 8 153
RBFLinear 97.11 95.87 98.97 97.73 97.42 2.93 6 2 153
GP 91.39 89.93 95.60 91.29 92.05 8.67 646 5 76
Poly 96.67 92.24 98.51 95.92 95.83 3.42 1 4 1
MARS 97.05 94.46 99.24 97.43 97.04 3.63 80 3 1
RF 97.59 97.20 99.15 98.63 98.14 1.59 783 327 1,887
BR 97.60 97.19 99.16 98.63 98.15 1.59 827 436 1,892
AB 97.20 88.15 98.44 94.97 94.69 4.65 387 218 48
GB 97.34 96.94 99.08 98.54 97.98 2.03 190 11 73
NN 97.16 96.39 99.03 98.23 97.70 2.59 54,784 1,305 162
XGB 97.67 97.42 99.11 98.72 98.23 1.76 20 3 4
DTR 96.56 96.43 98.80 98.31 97.52 2.45 1 1 3
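The accuracy metric in Tables B.1 through B.4 is 100% minus the MAE; reading the MAE as the mean absolute prediction error relative to the true value (a hedged interpretation of the header), it could be computed as follows:

```python
import numpy as np

def ppa_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Accuracy (%) = 100% - mean absolute relative error."""
    mae_pct = 100.0 * np.mean(np.abs((y_pred - y_true) / y_true))
    return 100.0 - mae_pct

# Toy example: each prediction is within 1% of the true value.
y_true = np.array([1.00, 2.00, 4.00])
y_pred = np.array([1.01, 2.02, 4.04])
print(f"{ppa_accuracy(y_true, y_pred):.2f}%")  # 99.00%
```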

Table B.2 Model evaluation for the Rocket Core Medium configuration.

Accuracy (%) = 100% - MAE. Train and Test times and model Size are relative.

Model  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train  Test  Size
RBFCubic 95.09 94.69 98.59 97.29 96.42 4.35 9 9 153
RBFLinear 96.00 95.68 98.63 97.73 97.01 3.34 5 2 153
GP 89.57 91.00 95.50 93.24 92.33 7.83 627 5 76
Poly 96.05 94.16 98.60 96.83 96.41 3.01 1 3 1
MARS 96.08 94.45 99.08 97.10 96.68 3.76 66 3 1
RF 96.48 96.35 99.13 98.26 97.55 2.34 718 320 1,884
BR 96.48 96.36 99.12 98.22 97.54 2.34 760 410 1,890
AB 96.36 92.27 98.04 96.31 95.74 2.68 251 173 41
GB 96.17 96.85 99.05 98.48 97.64 2.32 183 10 73
NN 95.76 96.85 98.82 97.93 97.34 2.77 50,171 1,279 162
XGB 96.45 96.49 99.14 98.29 97.59 2.20 16 2 3
DTR 95.87 94.92 98.73 97.41 96.73 3.15 1 1 3

Table B.3 Model evaluation for the Rocket Core Tiny configuration.

Accuracy (%) = 100% - MAE. Train and Test times and model Size are relative.

Model  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train  Test  Size
RBFCubic 96.26 95.20 98.77 97.59 96.95 2.67 11 7 153
RBFLinear 96.97 95.75 98.48 97.63 97.21 2.59 7 2 153
GP 92.46 92.21 95.44 92.57 93.17 7.05 711 5 76
Poly 96.43 93.46 98.26 96.41 96.14 3.03 2 3 1
MARS 97.02 94.51 99.10 97.53 97.04 3.56 63 3 1
RF 97.56 96.59 99.17 98.62 97.98 2.45 810 316 1,880
BR 97.56 96.61 99.18 98.61 97.99 2.44 868 405 1,887
AB 97.02 93.93 98.26 97.13 96.59 3.02 60 30 7
GB 97.49 97.17 99.17 98.81 98.16 1.91 195 10 73
NN 97.49 96.79 99.03 97.37 97.67 2.55 57,147 1,218 162
XGB 97.57 97.03 99.26 98.78 98.16 1.94 19 2 3
DTR 96.62 95.49 98.83 97.80 97.18 3.52 1 1 3

Table B.4 Model evaluation for the CNN accelerator IP block.

Accuracy (%) = 100% - MAE. Train and Test times and model Size are relative.

Model  Dynamic Pwr  Static Pwr  Critical Path  Area  PPA  Std Dev  Train  Test  Size
RBFCubic 93.18 83.68 97.25 97.26 92.84 7.38 8 14 153
RBFLinear 94.17 88.22 97.26 97.36 94.25 5.96 4 4 153
GP 88.65 78.21 94.83 93.07 88.69 9.94 571 10 76
Poly 94.10 89.79 95.07 96.84 93.95 5.46 2 9 1
MARS 95.98 90.64 98.02 98.54 95.80 3.95 84 8 1
RF 96.04 89.22 98.52 99.20 95.74 4.03 674 743 1,820
BR 96.02 89.23 98.52 99.19 95.74 4.05 695 958 1,826
AB 95.31 81.40 97.97 97.51 93.05 4.18 184 161 30
GB 95.80 87.98 98.51 99.37 95.42 4.34 143 25 72
NN 93.48 92.18 98.20 98.02 95.47 5.35 50,154 1,376 162
XGB 96.16 90.03 98.48 99.27 95.98 3.78 19 3 3
DTR 94.82 86.38 98.24 98.97 94.60 6.18 1 1 3

B.4 Additional Results for Accuracy vs. Training Samples

Table B.5 PPA accuracy for different numbers of training samples for the Rocket Core Medium configuration.

Accuracy (%)
Train Samples  RBFCubic  RBFLinear  GP  Poly  MARS  RF  BR  AB  GB  NN  XGB  DTR
10 96.91 96.76 95.66 96.91 96.50 93.75 93.77 94.61 94.64 82.82 94.97 93.96
20 97.31 96.65 96.41 96.01 98.00 96.45 96.41 96.21 96.06 85.26 96.29 95.03
30 97.75 97.47 95.45 96.75 98.62 96.69 96.70 96.43 96.70 92.40 96.46 96.00
40 98.08 97.64 95.92 97.49 97.98 95.97 95.97 95.47 96.55 93.04 95.67 94.97
50 98.27 98.16 97.23 98.00 98.74 96.89 96.86 96.02 97.04 93.74 96.66 96.01
60 98.10 98.18 96.56 97.69 98.62 97.11 97.10 96.33 97.05 94.10 97.06 96.13
70 97.83 98.10 96.36 97.57 98.72 96.81 96.81 96.24 96.90 94.53 96.94 96.06
80 98.24 98.26 97.46 98.20 98.47 97.01 97.03 96.00 97.16 95.23 96.52 95.70
90 97.83 98.29 96.90 97.72 98.85 96.82 96.80 96.19 96.96 95.59 97.00 96.14
100 98.01 98.44 96.84 98.03 98.13 97.36 97.37 96.71 97.42 95.46 96.26 96.89
110 98.19 98.27 96.80 97.54 98.19 97.28 97.28 96.20 97.28 95.58 96.87 96.26
120 98.50 98.59 97.27 98.30 98.63 97.13 97.14 96.10 97.14 96.32 97.21 96.31
130 98.29 98.46 97.64 98.18 98.01 97.35 97.35 96.45 97.42 96.25 96.22 96.60
140 98.46 98.47 97.13 98.08 98.36 97.32 97.32 96.63 97.50 96.23 97.48 96.89
150 98.27 98.57 97.29 98.32 98.93 97.08 97.09 96.33 97.10 96.06 96.93 96.30
160 98.31 98.58 97.79 98.44 98.76 97.17 97.16 95.81 97.10 96.00 97.17 96.07
170 98.23 98.53 97.24 98.41 98.76 97.25 97.25 95.79 97.23 96.55 97.12 96.56
180 98.12 98.41 96.98 98.25 98.40 97.43 97.42 95.96 97.59 96.43 97.26 96.68
190 98.38 98.44 97.70 98.22 98.36 97.44 97.43 95.99 97.33 96.49 97.31 96.42
200 98.22 98.50 97.15 98.13 98.43 97.32 97.30 95.63 97.30 96.50 97.29 96.48
210 98.45 98.49 97.13 98.31 98.09 97.38 97.39 95.91 97.24 96.54 97.35 96.45
220 98.49 98.81 97.27 98.57 98.68 97.59 97.59 95.73 97.47 96.95 97.30 96.76
230 98.63 98.76 97.01 98.49 98.73 97.59 97.59 95.53 97.45 96.84 97.36 96.84
240 98.31 98.62 97.45 98.47 98.07 97.40 97.37 95.80 97.32 96.74 97.10 96.50
250 98.63 98.73 97.70 98.48 98.63 97.52 97.52 95.83 97.49 96.98 97.30 96.96
260 98.61 98.83 97.63 98.38 98.64 97.58 97.58 95.92 97.52 97.00 97.43 96.76
270 98.52 98.70 97.18 98.45 98.41 97.53 97.53 96.04 97.42 96.95 97.38 96.73
280 98.39 98.57 97.43 98.37 98.39 97.51 97.50 95.65 97.42 96.78 97.35 96.83
290 98.39 98.53 97.04 98.39 98.76 97.40 97.39 95.44 97.29 96.99 97.35 96.60
300 98.44 98.77 97.16 98.46 98.40 97.73 97.72 95.17 97.42 97.18 97.34 96.86
310 98.24 98.56 97.02 98.11 98.09 97.57 97.54 95.77 97.23 96.84 97.34 96.17
320 98.45 98.61 97.22 98.29 98.43 97.44 97.45 95.97 97.35 97.03 97.42 96.56
330 98.49 98.72 97.67 98.46 98.72 97.57 97.57 96.45 97.45 97.03 97.52 96.70
340 98.59 98.67 97.53 98.44 98.41 97.78 97.77 95.86 97.50 97.11 97.55 96.78
350 98.40 98.65 97.38 98.42 98.36 97.67 97.68 95.81 97.67 96.84 97.59 96.72
360 98.44 98.74 97.29 98.38 98.38 97.68 97.67 95.37 97.47 97.24 97.70 96.20
370 98.56 98.69 97.37 98.39 98.34 97.49 97.48 95.25 97.46 97.12 97.45 96.78
380 98.48 98.69 97.89 98.43 98.34 97.60 97.60 95.79 97.34 97.20 97.48 96.53
390 98.57 98.62 97.39 98.53 98.35 97.63 97.63 95.74 97.45 96.96 97.60 96.30
400 98.70 98.88 97.68 98.59 98.32 97.56 97.57 95.65 97.72 97.35 97.69 96.38
410 98.75 98.67 97.82 98.29 98.40 97.50 97.48 95.79 97.43 96.83 97.24 96.66
420 98.38 98.48 97.51 98.26 98.12 97.10 97.10 95.45 97.00 96.59 97.08 96.31
430 98.52 98.51 97.71 98.12 98.32 97.81 97.83 95.92 97.65 96.94 97.69 96.82
440 98.68 98.78 96.96 98.49 98.21 97.81 97.82 95.67 97.62 97.34 97.71 96.88
450 98.65 98.62 97.81 98.29 98.35 97.82 97.84 95.62 97.65 97.21 97.61 96.78
460 98.32 98.47 97.22 98.01 97.77 97.16 97.20 95.59 96.55 97.03 96.84 95.85
470 97.83 97.84 96.66 97.65 97.69 97.19 97.17 95.84 97.08 96.11 96.85 96.18
480 98.53 98.76 95.90 98.43 97.28 98.32 98.30 95.62 98.15 97.57 98.08 96.96
490 99.02 99.02 97.60 98.17 98.31 98.23 98.24 96.74 97.74 98.04 97.66 96.84
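Curves like those in Tables B.5 through B.9 come from retraining each model type with progressively more samples and scoring it on held-out data. A minimal sketch of that sweep, using scikit-learn's gradient boosting regressor on synthetic stand-in data (the data and model choice are illustrative only):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(600, 5))                       # stand-in design knobs
y = 2.0 + X @ np.array([1.0, 0.5, 0.2, 0.1, 0.05])   # stand-in PPA target

X_test, y_test = X[500:], y[500:]                    # fixed held-out set
for n in range(10, 491, 10):                         # 10, 20, ..., 490 samples
    model = GradientBoostingRegressor().fit(X[:n], y[:n])
    rel_err = np.mean(np.abs((model.predict(X_test) - y_test) / y_test))
    print(n, f"{100.0 * (1.0 - rel_err):.2f}")       # accuracy (%) per row
```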

Table B.6 PPA accuracy for different numbers of training samples for the Rocket Core Small configuration.

Accuracy (%)
Train Samples  RBFCubic  RBFLinear  GP  Poly  MARS  RF  BR  AB  GB  NN  XGB  DTR
10 87.00 90.44 83.29 85.91 91.40 93.88 94.02 94.33 94.28 75.73 95.11 94.84
20 92.52 91.18 89.11 88.89 94.80 96.04 96.07 95.90 95.86 84.78 96.43 94.64
30 94.66 94.03 86.57 90.80 96.50 96.63 96.61 96.31 96.68 91.76 96.67 95.76
40 94.40 93.80 90.76 92.45 96.73 96.77 96.77 96.42 96.79 92.74 96.43 96.05
50 93.16 93.34 87.66 87.98 95.19 96.61 96.61 95.99 96.59 93.14 96.36 95.83
60 95.15 94.60 90.55 94.60 96.81 97.00 97.00 96.41 97.03 94.27 97.05 96.30
70 95.08 94.90 89.92 93.30 95.37 97.00 97.02 96.14 97.04 94.19 96.93 96.25
80 95.99 95.65 92.13 94.75 96.71 97.09 97.10 96.36 97.20 94.26 97.11 96.40
90 95.56 95.88 91.78 95.01 96.63 97.10 97.11 96.16 96.89 95.03 97.05 96.46
100 95.91 95.65 93.06 95.60 97.17 97.11 97.12 96.21 97.19 95.42 97.19 96.55
110 95.17 95.85 89.87 94.40 96.31 97.33 97.32 96.35 97.32 95.60 97.38 96.65
120 95.63 96.14 91.40 95.22 96.92 97.23 97.25 96.36 97.15 96.12 97.06 96.50
130 96.26 96.13 92.81 95.31 96.64 97.24 97.22 96.16 97.26 96.07 97.22 96.51
140 95.70 95.87 91.54 94.86 97.01 97.16 97.16 96.35 97.21 95.90 97.22 96.54
150 95.67 96.38 90.59 95.85 96.62 97.10 97.10 96.41 97.18 96.50 96.94 96.46
160 95.30 96.11 91.00 95.47 95.96 97.46 97.45 96.48 97.34 96.47 96.77 96.73
170 95.93 96.10 93.00 95.02 96.97 97.35 97.30 95.39 97.24 96.30 97.70 96.31
180 95.68 95.96 91.58 95.26 96.23 97.53 97.52 96.16 97.30 96.53 97.42 96.92
190 96.18 96.25 92.34 95.89 96.56 97.40 97.40 96.53 97.24 96.55 97.36 96.78
200 96.20 96.59 92.96 95.55 96.75 97.37 97.36 96.25 97.33 96.77 97.45 96.49
210 96.39 96.51 92.89 95.31 96.75 97.54 97.54 95.74 97.40 96.40 97.53 96.46
220 96.13 96.41 92.99 95.61 96.62 97.39 97.37 96.44 97.32 96.95 97.58 96.68
230 96.63 96.66 93.72 95.58 96.41 97.52 97.51 96.20 97.59 96.51 97.64 96.78
240 96.68 96.57 93.33 95.62 96.48 97.58 97.58 96.12 97.50 96.79 97.58 96.92
250 96.26 96.55 92.22 96.09 96.30 97.39 97.39 96.33 97.40 96.33 96.80 96.80
260 96.24 96.48 92.51 95.63 96.62 97.66 97.65 95.75 97.46 96.36 97.44 96.64
270 96.45 96.66 93.29 95.97 96.82 97.50 97.49 96.38 97.38 96.98 97.60 96.73
280 96.05 96.26 91.25 95.06 96.50 97.61 97.60 95.51 97.47 97.20 97.63 96.87
290 96.42 96.70 93.32 95.63 96.18 97.35 97.36 95.82 97.35 97.04 97.37 96.72
300 96.57 96.85 92.10 95.77 96.64 97.68 97.68 95.54 97.47 97.05 97.56 96.70
310 96.39 96.44 92.35 95.38 96.73 97.36 97.37 96.13 97.03 97.02 97.66 96.62
320 96.62 96.75 93.03 95.23 96.28 97.39 97.40 95.93 97.37 97.04 97.58 96.75
330 96.43 96.70 92.47 95.93 96.66 97.46 97.46 95.46 97.28 96.97 97.51 96.71
340 96.19 96.32 93.43 95.61 96.46 97.56 97.56 95.48 97.50 97.18 97.55 97.02
350 96.66 96.71 93.50 95.74 96.50 97.48 97.50 95.84 97.26 97.13 97.53 96.69
360 96.61 96.77 93.44 95.77 96.42 97.62 97.62 96.36 97.43 97.09 97.66 97.07
370 96.52 96.60 93.45 95.80 96.47 97.20 97.20 96.07 97.22 96.89 97.42 96.44
380 96.28 96.38 92.22 95.62 96.19 97.02 97.03 96.05 97.16 96.66 96.91 96.48
390 96.38 96.57 91.95 95.79 96.40 97.31 97.31 95.97 97.17 96.63 97.18 96.70
400 96.20 96.09 91.32 95.53 96.34 97.33 97.28 96.00 96.88 96.28 97.42 96.75
410 95.95 96.01 92.64 95.43 95.99 96.95 96.95 95.81 97.28 96.42 97.21 96.32
420 96.24 96.21 92.35 95.57 96.06 97.76 97.77 95.71 97.65 97.04 97.77 97.04
430 97.21 97.02 94.75 95.41 96.92 97.63 97.62 95.47 97.54 97.25 97.57 96.79
440 96.45 96.64 90.70 95.33 96.27 97.33 97.33 94.64 97.18 97.07 97.42 96.60
450 95.86 95.99 92.66 95.14 95.98 96.79 96.81 95.49 96.63 96.79 96.96 96.17
460 95.40 95.89 90.59 95.36 93.66 97.03 97.05 95.25 97.20 96.67 97.11 96.44
470 95.75 95.71 92.70 94.79 95.35 96.61 96.59 94.85 96.48 96.35 96.59 95.91
480 95.87 95.93 91.88 94.44 95.15 97.37 97.37 96.55 97.91 95.93 97.54 96.44
490 94.98 95.20 85.40 93.35 96.09 96.86 96.89 94.47 97.05 96.27 97.16 96.05

Table B.7 PPA accuracy for different numbers of training samples for the Rocket Core Tiny configuration.

Accuracy (%)
Train Samples  RBFCubic  RBFLinear  GP  Poly  MARS  RF  BR  AB  GB  NN  XGB  DTR
10 97.20 96.64 95.43 96.24 96.42 93.34 93.40 94.50 95.13 75.43 95.64 94.51
20 96.88 96.30 96.52 94.64 97.89 96.15 96.16 95.95 95.78 85.18 96.29 94.18
30 98.07 97.61 96.06 96.13 97.14 96.63 96.64 96.56 96.69 87.67 95.13 95.74
40 97.73 97.49 97.05 97.15 98.39 96.53 96.51 96.18 96.54 93.85 96.43 95.31
50 98.07 98.01 97.11 97.79 98.43 96.97 96.97 96.33 96.72 94.05 96.95 95.99
60 97.76 97.85 96.83 97.43 98.00 97.34 97.36 96.59 97.41 94.35 96.91 96.46
70 98.07 98.24 97.34 97.81 98.68 97.21 97.20 96.21 97.35 95.20 97.25 96.32
80 98.11 98.19 96.75 98.00 98.42 97.30 97.31 96.77 97.38 95.57 97.14 96.86
90 98.46 98.28 97.69 98.25 98.82 97.30 97.30 96.72 97.37 95.16 97.14 96.86
100 98.47 98.42 97.89 98.07 98.50 97.52 97.53 96.41 97.63 95.99 97.48 96.73
110 98.35 98.43 96.96 98.34 98.86 97.39 97.40 96.56 97.29 96.29 97.31 96.63
120 98.50 98.62 97.69 97.87 98.23 97.73 97.74 96.59 97.77 95.45 97.74 97.11
130 98.58 98.56 98.08 98.20 98.76 97.60 97.59 96.69 97.54 96.33 97.50 96.90
140 98.54 98.62 97.89 98.40 98.82 97.64 97.64 96.50 97.67 96.62 97.53 97.11
150 98.36 98.49 97.44 97.89 98.69 97.67 97.68 96.51 97.73 96.21 97.39 97.17
160 98.56 98.65 97.94 98.53 98.77 97.44 97.45 96.64 97.51 97.05 97.49 96.51
170 98.40 98.44 97.52 98.35 98.93 97.46 97.46 96.55 97.41 96.79 97.39 97.00
180 98.64 98.68 97.64 98.34 98.11 97.83 97.85 96.83 97.80 96.87 97.38 97.19
190 98.59 98.67 97.98 98.20 98.80 97.70 97.71 96.36 97.76 96.66 97.54 97.20
200 98.64 98.75 98.04 98.18 98.45 98.01 98.01 96.24 97.93 96.99 97.92 97.20
210 98.58 98.66 97.78 98.51 98.83 97.72 97.72 96.54 97.67 96.94 97.74 97.03
220 98.65 98.65 98.03 98.03 98.49 97.78 97.79 96.72 97.66 97.14 97.59 97.18
230 98.73 98.74 97.94 98.10 98.59 97.87 97.88 96.58 97.83 97.10 97.76 97.30
240 98.52 98.64 97.67 98.13 98.43 97.72 97.71 95.67 97.78 97.07 97.68 97.17
250 98.68 98.76 97.95 98.30 98.72 97.85 97.85 95.86 97.77 97.18 97.81 97.17
260 98.68 98.77 97.97 98.47 98.75 97.50 97.50 96.63 97.49 97.11 97.56 96.76
270 98.69 98.81 98.41 98.03 98.32 97.99 97.99 96.05 97.88 97.39 97.90 97.34
280 98.72 98.78 97.88 98.31 97.87 98.08 98.08 96.48 98.00 97.22 97.77 97.11
290 98.81 98.76 98.33 98.38 98.62 97.65 97.65 96.11 97.62 97.31 97.67 97.07
300 98.64 98.73 97.97 98.27 98.56 97.88 97.88 95.66 97.81 97.36 97.75 97.29
310 98.63 98.74 97.87 98.20 98.34 97.94 97.94 96.63 97.87 97.42 97.57 97.09
320 98.69 98.93 97.79 98.56 98.42 97.93 97.95 95.72 98.01 97.62 97.93 97.34
330 98.76 98.84 98.13 98.33 98.69 97.77 97.76 96.35 97.87 97.20 97.94 97.00
340 98.70 98.86 98.19 98.08 97.98 98.06 98.05 96.17 98.04 97.56 98.05 97.25
350 98.74 98.77 98.32 98.20 98.22 97.77 97.76 96.57 97.72 97.21 97.75 97.21
360 98.69 98.84 98.27 98.11 98.38 98.05 98.05 96.46 97.94 97.68 98.04 96.89
370 98.48 98.66 97.29 98.11 98.58 98.15 98.15 95.84 98.02 97.19 98.16 97.31
380 98.66 98.69 98.03 98.21 98.06 97.84 97.82 96.18 97.79 97.28 97.86 96.99
390 98.72 98.86 97.66 98.45 98.33 97.86 97.84 96.46 97.69 97.44 97.70 96.98
400 98.76 98.78 98.44 98.44 98.18 97.89 97.89 95.69 97.88 97.19 97.87 97.23
410 98.71 98.87 98.09 98.15 98.09 97.61 97.62 96.02 97.77 97.41 97.64 97.14
420 98.70 98.58 97.49 98.20 98.14 97.66 97.67 96.19 97.63 97.02 97.55 96.88
430 98.71 98.68 97.95 98.01 98.30 97.89 97.90 95.61 97.89 97.24 98.00 97.18
440 98.59 98.69 98.06 98.31 97.80 97.78 97.80 95.81 97.90 97.22 97.89 97.03
450 98.90 98.98 98.39 98.17 98.52 98.23 98.23 96.99 98.06 97.60 98.12 97.47
460 98.51 98.69 97.51 98.05 97.93 97.71 97.72 96.03 97.74 97.54 97.85 97.04
470 98.26 98.54 97.05 98.40 97.33 97.76 97.76 96.20 97.88 96.82 97.76 96.63
480 98.30 98.27 97.72 97.66 97.35 98.23 98.22 96.08 98.03 97.92 98.04 97.47
490 98.66 98.31 97.46 97.79 96.57 97.80 97.82 95.82 97.85 97.27 98.24 96.52

Table B.8 PPA accuracy for different numbers of training samples for the Rocket Core Dual configuration.

Accuracy (%)
Train Samples  RBFCubic  RBFLinear  GP  Poly  MARS  RF  BR  AB  GB  NN  XGB  DTR
10 94.49 95.53 95.42 94.26 96.20 92.62 92.37 93.88 94.21 71.15 94.21 93.18
20 96.81 97.03 94.24 95.78 97.40 94.73 94.60 93.55 96.26 88.19 94.13 94.17
30 97.85 97.68 97.12 97.48 98.01 96.40 96.42 95.96 96.43 91.06 96.57 95.19
40 98.05 97.62 97.10 94.88 97.74 96.25 96.36 95.59 96.57 92.09 95.77 94.52
50 97.84 97.98 96.60 96.82 98.34 96.70 96.72 96.25 96.92 94.06 96.70 96.19
60 98.09 98.05 97.61 98.10 98.58 96.85 96.86 96.14 96.99 94.92 96.73 96.10
70 98.28 98.27 96.53 98.01 98.73 97.02 97.02 96.50 97.08 94.51 96.99 96.64
80 98.25 98.26 97.28 97.74 98.36 96.94 96.93 95.72 97.08 95.35 96.52 96.54
90 98.40 98.31 97.56 97.82 98.66 97.35 97.34 96.18 97.38 95.45 97.19 96.70
100 98.39 98.16 96.90 98.19 98.66 97.26 97.24 96.22 97.40 95.56 97.09 96.52
110 98.32 98.28 97.56 98.44 98.75 97.02 97.03 95.85 97.23 95.07 97.22 96.56
120 98.57 98.59 96.76 97.94 98.42 97.50 97.50 96.07 97.72 96.55 97.40 96.72
130 98.21 98.55 97.61 97.70 97.91 97.34 97.30 96.49 97.70 96.35 96.93 96.54
140 98.49 98.32 97.74 97.91 98.24 97.54 97.53 96.29 97.62 96.80 97.22 96.59
150 98.59 98.60 97.55 97.97 98.43 97.25 97.26 95.74 97.46 96.97 97.18 96.80
160 98.59 98.57 97.76 98.05 98.00 97.44 97.44 96.18 97.43 96.90 96.88 96.68
170 98.43 98.56 97.45 98.38 98.68 97.27 97.27 95.94 97.46 96.70 97.27 96.46
180 98.64 98.66 97.94 98.11 98.44 97.37 97.37 96.12 97.59 97.21 97.48 96.67
190 98.53 98.48 98.18 97.94 98.40 97.57 97.58 96.05 97.70 96.74 97.45 96.60
200 98.59 98.61 97.89 98.16 98.46 97.53 97.53 96.26 97.68 96.80 97.58 96.70
210 98.67 98.61 98.00 98.04 98.41 97.58 97.59 95.42 97.67 96.82 97.73 96.78
220 98.46 98.65 96.80 98.34 98.59 97.26 97.25 96.03 97.43 96.89 97.43 96.67
230 98.68 98.64 97.69 98.19 98.56 97.44 97.44 95.53 97.58 97.35 97.57 97.02
240 98.74 98.76 97.95 98.12 98.55 97.69 97.69 95.90 97.80 97.22 97.80 96.76
250 98.68 98.57 97.90 98.18 98.52 97.76 97.75 95.69 97.69 97.10 97.50 97.06
260 98.58 98.80 97.47 98.23 98.60 97.70 97.69 95.50 97.74 97.26 97.71 96.86
270 98.70 98.74 97.64 98.29 98.65 97.51 97.50 95.80 97.59 97.37 97.67 96.92
280 98.80 98.61 97.76 98.08 98.60 97.07 97.05 95.61 97.42 96.90 97.56 96.49
290 98.74 98.71 97.51 98.20 98.54 97.56 97.53 95.28 97.71 97.35 97.62 97.02
300 98.57 98.63 97.07 97.95 98.10 97.60 97.59 95.72 97.80 97.21 97.65 96.71
310 98.64 98.71 97.53 97.94 98.14 97.63 97.65 95.91 97.48 96.75 97.22 96.37
320 98.66 98.75 97.87 98.03 98.21 97.78 97.76 95.70 97.63 97.40 97.37 97.00
330 98.74 98.80 97.85 98.22 98.54 97.46 97.46 95.50 97.68 97.22 97.77 96.81
340 98.87 98.78 97.63 98.13 98.37 97.79 97.77 95.74 97.74 97.33 97.83 97.10
350 98.72 98.70 97.90 98.21 98.45 97.10 97.11 95.32 97.43 97.14 97.36 96.80
360 98.80 98.88 97.49 98.33 98.47 97.81 97.80 95.21 97.77 97.37 97.88 97.28
370 98.37 98.38 97.55 97.86 98.10 97.54 97.55 95.39 97.64 96.72 97.70 97.05
380 98.80 98.68 98.21 97.90 98.11 97.58 97.60 95.35 97.54 97.34 97.50 96.91
390 98.57 98.56 96.96 98.11 98.23 97.17 97.18 95.70 97.34 96.74 97.33 96.68
400 98.77 98.66 97.59 97.98 98.07 97.19 97.16 95.31 97.55 96.92 97.43 96.40
410 98.62 98.42 97.10 97.68 97.44 97.21 97.23 95.55 97.28 96.82 97.64 96.44
420 98.43 98.36 97.16 97.46 97.77 97.40 97.42 95.16 97.40 96.70 97.43 96.69
430 98.86 98.76 97.61 97.92 97.94 97.56 97.54 95.68 97.67 96.94 97.02 96.65
440 98.55 98.50 97.79 97.76 98.32 97.67 97.66 94.88 97.63 97.24 97.86 97.04
450 98.50 98.62 97.75 97.78 98.13 97.66 97.70 94.98 97.79 96.69 97.71 96.33
460 98.50 98.54 96.40 97.53 97.11 97.75 97.75 95.01 97.76 96.65 96.88 96.56
470 99.01 98.48 97.15 96.83 96.19 97.53 97.51 95.70 97.66 96.32 97.69 97.03
480 98.62 98.60 97.90 97.79 97.45 97.36 97.36 94.81 97.28 96.95 97.46 96.56
490 98.17 98.65 96.86 98.21 96.95 98.18 98.17 96.27 98.66 96.72 98.52 97.66

Table B.9 PPA accuracy for different numbers of training samples for the CNN IP block.

Accuracy (%)
Train Samples  RBFCubic  RBFLinear  GP  Poly  MARS  RF  BR  AB  GB  NN  XGB  DTR
10 78.82 85.21 73.48 79.80 89.35 90.84 90.85 91.73 92.02 74.81 92.06 91.90
20 90.98 92.10 86.30 88.03 93.05 94.92 95.00 94.76 94.48 87.06 94.68 93.64
30 90.24 92.09 82.01 88.16 93.98 94.48 94.48 94.62 94.29 88.99 94.47 93.05
40 89.29 92.42 86.01 90.05 94.26 94.90 94.88 94.78 94.62 91.56 94.57 93.92
50 88.70 91.72 80.14 88.81 93.96 93.92 93.97 93.33 93.41 90.06 93.10 92.87
60 91.36 92.93 84.51 90.10 95.17 95.46 95.43 94.95 95.27 92.04 95.46 94.48
70 91.53 93.24 85.88 91.73 95.20 95.41 95.42 94.99 94.87 92.33 95.23 93.82
80 91.85 93.73 85.17 92.36 95.36 95.66 95.65 95.21 95.50 93.15 95.76 94.57
90 92.14 93.47 88.28 91.73 95.61 95.75 95.72 94.88 95.43 93.18 95.60 94.82
100 92.11 93.52 87.30 92.44 95.05 95.46 95.43 94.88 95.14 94.12 95.24 94.66
110 92.11 93.66 86.39 92.57 95.70 95.76 95.76 95.50 95.58 94.00 95.80 94.53
120 91.81 93.66 86.03 92.89 95.43 95.46 95.42 94.09 94.97 94.17 95.47 94.28
130 91.14 93.36 86.17 92.48 95.22 95.37 95.38 93.98 95.05 94.33 95.47 94.12
140 92.86 93.90 86.68 92.82 95.55 95.70 95.68 94.79 95.35 93.99 95.66 95.17
150 91.57 93.39 86.08 92.73 95.47 95.57 95.59 94.81 95.18 94.02 95.59 94.86
160 91.54 93.34 85.51 92.56 95.65 95.36 95.34 94.03 94.83 94.54 95.45 93.87
170 92.32 93.92 86.20 93.30 95.67 95.74 95.75 94.58 95.39 94.63 95.68 94.54
180 91.33 93.31 85.12 93.22 95.17 95.49 95.49 94.34 95.18 94.85 95.69 94.66
190 93.21 94.27 88.59 93.89 95.79 95.96 95.93 94.19 95.66 95.13 95.97 95.01
200 91.87 93.62 84.88 92.67 95.50 95.15 95.13 93.84 94.78 93.65 95.26 94.66
210 92.00 93.65 86.59 93.35 95.39 95.41 95.37 94.00 95.25 94.84 95.60 94.62
220 92.14 93.97 87.50 93.43 95.64 95.69 95.71 93.42 95.38 95.10 95.80 94.38
230 91.89 93.77 86.12 93.56 95.47 95.62 95.59 94.05 95.29 95.03 95.72 93.80
240 92.78 93.99 87.56 93.38 95.75 95.65 95.67 94.63 95.33 95.32 95.70 94.54
250 92.54 94.01 87.32 93.60 95.50 95.73 95.71 94.70 95.27 95.06 95.80 94.62
260 91.58 93.69 86.24 93.26 95.54 95.31 95.30 93.68 94.80 95.12 95.35 94.20
270 92.85 94.01 87.03 93.49 95.45 95.74 95.71 94.28 95.61 95.02 95.84 94.90
280 92.85 94.12 84.27 93.58 95.54 95.59 95.61 94.19 95.24 95.36 95.73 94.30
290 92.82 94.20 87.46 93.83 95.52 95.78 95.79 94.23 95.37 95.21 95.79 94.68
300 92.23 93.81 87.40 93.50 95.59 95.18 95.18 93.67 94.86 95.38 95.43 94.22
310 92.20 93.98 86.48 93.76 95.60 95.78 95.78 93.20 95.59 95.69 95.83 94.73
320 93.22 94.33 87.91 93.58 95.59 95.53 95.51 94.36 95.14 95.34 95.60 94.61
330 92.78 94.35 85.77 93.73 95.46 95.84 95.84 94.96 95.55 95.33 95.91 94.97
340 92.47 94.04 87.71 93.77 95.25 95.12 95.16 94.28 95.07 95.29 95.48 94.07
350 92.54 93.61 88.26 93.42 95.28 95.20 95.17 93.57 95.06 95.41 95.26 94.44
360 92.22 93.69 86.86 93.57 95.35 95.56 95.60 93.23 95.29 95.03 95.85 94.82
370 92.62 93.83 86.63 93.77 95.57 95.44 95.46 93.45 95.18 95.07 95.84 94.15
380 93.01 94.36 87.07 93.97 95.59 95.61 95.63 93.25 95.30 95.43 95.80 94.18
390 92.60 94.05 86.49 93.66 95.54 95.49 95.48 93.15 95.27 95.35 95.56 95.00
400 92.60 94.29 84.57 93.31 95.55 95.30 95.31 94.41 94.75 95.46 95.60 94.62
410 92.36 93.93 86.47 93.51 95.46 95.33 95.33 94.37 95.05 95.46 95.59 94.11
420 91.81 93.84 86.31 93.33 95.37 95.07 95.09 93.07 94.99 95.43 95.47 94.44
430 92.81 94.16 88.31 93.86 95.38 95.46 95.48 93.61 95.09 95.31 95.66 94.53
440 92.34 94.19 85.37 94.05 95.52 95.29 95.37 94.44 94.81 95.39 95.58 94.94
450 93.12 94.41 89.01 94.12 95.48 95.78 95.77 93.71 95.50 95.66 95.78 93.72
460 92.86 94.46 85.84 93.82 95.86 95.60 95.59 93.32 94.98 95.95 95.74 94.18
470 91.74 93.90 85.99 93.65 95.75 95.33 95.36 93.19 94.77 95.85 96.19 93.56
480 91.87 93.85 84.54 93.50 95.35 94.96 94.95 93.27 94.32 95.70 95.43 92.74
490 93.49 94.82 87.24 93.98 95.07 95.76 95.74 92.34 96.14 95.14 95.73 96.04

B.5 Additional Results for Transfer Learning Model Evaluation for Core Configurations
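The results in this section start from a base network trained on one core configuration (with 50 or 150 samples) and fine-tune it with 5 to 100 samples from the target configuration. A minimal Keras-style sketch of that flow; the layer sizes, freezing policy, and training settings are illustrative, not the dissertation's exact setup:

```python
import numpy as np
from tensorflow import keras

def make_model(n_inputs: int) -> keras.Model:
    """Small regression net predicting the four PPA targets."""
    return keras.Sequential([
        keras.layers.Input(shape=(n_inputs,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(4),  # dyn pwr, leak pwr, critical path, area
    ])

# Base model trained on the source configuration (stand-in data).
src_X, src_y = np.random.rand(150, 8), np.random.rand(150, 4)
model = make_model(8)
model.compile(optimizer="adam", loss="mae")
model.fit(src_X, src_y, epochs=200, verbose=0)

# Transfer: freeze all but the output layer, then fine-tune on a
# handful of samples from the target configuration (stand-in data).
for layer in model.layers[:-1]:
    layer.trainable = False
model.compile(optimizer="adam", loss="mae")  # recompile after freezing
tgt_X, tgt_y = np.random.rand(10, 8), np.random.rand(10, 4)
model.fit(tgt_X, tgt_y, epochs=100, verbose=0)
```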

Table B.10 Transfer Learning results for the Tiny Core base model with 50 and 150 samples.

Rocket Tiny Base Model, 50 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 93.42 91.29 96.05 94.74 93.87 4.37 8.4
10 94.94 92.56 96.23 96.00 94.93 4.01 7.5
15 93.89 90.96 96.60 93.13 93.65 4.65 8.1
20 95.23 91.49 96.87 95.47 94.77 4.26 7.3
25 95.53 91.98 97.25 96.39 95.29 3.57 5.7
30 95.97 92.73 96.94 96.16 95.45 3.41 5.9
35 94.78 92.85 97.28 95.92 95.21 3.76 5.7
40 95.63 92.76 97.80 96.55 95.69 3.36 6.0
45 96.06 94.17 97.57 96.58 96.09 3.19 4.9
50 95.75 93.34 97.60 95.80 95.62 3.61 4.9
60 95.43 94.21 97.49 97.13 96.07 3.21 4.2
70 95.88 92.31 97.56 96.55 95.58 3.69 4.1
80 95.78 93.52 97.84 96.42 95.89 3.47 4.4
90 95.25 94.16 97.60 96.97 96.00 3.36 3.0
100 95.85 94.38 97.91 97.29 96.36 3.10 3.0

Rocket Tiny Base Model, 150 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 96.16 92.46 96.95 96.36 95.48 3.14 8.1
10 96.25 93.14 97.62 96.33 95.84 3.13 9.4
15 96.37 93.57 95.96 96.56 95.61 3.29 7.9
20 95.56 93.29 98.00 96.11 95.74 3.12 5.6
25 96.06 94.18 98.00 96.66 96.23 3.02 6.0
30 96.66 94.13 97.77 97.24 96.45 2.78 6.0
35 94.12 93.56 97.51 96.12 95.33 3.01 6.3
40 96.31 94.10 98.06 96.84 96.33 2.98 6.2
45 96.43 94.28 97.19 97.04 96.23 2.84 4.5
50 96.21 94.18 98.14 97.09 96.41 2.85 4.7
60 96.76 92.98 97.85 96.76 96.09 3.03 4.4
70 96.59 94.81 97.76 97.16 96.58 2.71 4.1
80 96.54 94.58 98.20 97.10 96.60 2.81 4.5
90 96.61 94.18 97.60 96.85 96.31 3.22 3.0
100 96.16 95.04 98.10 97.10 96.60 2.74 2.9

Table B.11 Transfer Learning results for the Small Core base model with 50 and 150 samples.

Rocket Small Base Model, 50 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 93.90 89.22 97.70 92.68 93.38 4.42 8.1
10 95.32 92.43 97.56 95.78 95.27 4.16 9.0
15 95.19 93.44 97.86 96.52 95.75 3.63 8.2
20 95.40 93.51 98.17 94.39 95.37 3.41 6.6
25 94.12 93.93 97.97 96.36 95.59 3.74 5.9
30 95.21 92.13 98.17 96.09 95.40 4.12 6.7
35 95.14 92.66 97.91 95.35 95.27 3.89 5.4
40 95.01 91.63 98.29 96.30 95.31 3.61 5.4
45 95.01 93.47 98.29 96.45 95.80 3.97 5.1
50 95.40 93.79 98.27 96.92 96.10 3.19 4.7
60 94.33 93.01 97.86 96.46 95.42 4.19 4.2
70 95.48 92.69 98.09 96.76 95.76 3.80 3.7
80 95.11 94.61 98.03 96.61 96.09 3.55 4.2
90 95.29 94.85 98.51 96.95 96.40 3.18 2.9
100 94.95 93.59 98.06 96.77 95.84 3.91 3.1

Rocket Small Base Model, 150 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 95.38 92.61 97.24 96.30 95.38 3.46 7.4
10 96.22 93.29 97.50 97.04 96.01 3.09 7.2
15 94.82 94.16 98.37 96.74 96.02 3.35 7.3
20 95.47 94.52 98.38 96.93 96.32 3.14 6.6
25 96.11 94.55 98.12 96.75 96.38 2.91 5.8
30 96.08 94.86 98.53 97.14 96.65 2.99 6.0
35 96.01 94.51 98.63 97.07 96.55 2.92 5.4
40 95.81 94.04 98.60 97.00 96.36 2.89 6.2
45 96.17 95.10 98.58 97.34 96.80 2.80 4.7
50 96.38 95.30 98.73 97.42 96.96 2.60 5.1
60 96.04 94.98 98.74 97.15 96.73 3.07 4.3
70 96.26 94.52 98.63 96.91 96.58 2.86 4.1
80 96.15 95.25 98.73 97.37 96.88 2.89 4.7
90 95.92 94.87 98.73 97.36 96.72 2.69 3.1
100 96.06 95.48 98.84 97.27 96.91 2.96 3.0

Table B.12 Transfer Learning results for the Dual Core base model with 50 and 150 samples.

Rocket Dual Base Model, 50 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 94.42 89.61 96.52 93.82 93.59 5.68 8.8
10 94.11 91.64 97.57 94.92 94.56 4.33 7.2
15 94.94 92.36 98.28 95.93 95.38 4.07 7.6
20 95.25 92.86 97.86 96.48 95.61 3.70 8.3
25 95.89 90.30 97.97 94.79 94.74 3.78 5.5
30 95.84 92.22 98.17 94.52 95.19 3.64 6.2
35 95.87 93.17 98.34 95.88 95.81 3.87 6.0
40 95.30 94.22 97.74 96.20 95.87 3.73 6.0
45 95.45 93.07 98.41 96.58 95.88 4.00 5.0
50 95.96 94.59 98.39 96.61 96.39 3.41 5.0
60 95.99 93.51 98.32 96.22 96.01 3.72 4.5
70 95.67 93.73 98.34 96.66 96.10 3.65 4.1
80 95.70 92.56 98.40 96.61 95.82 3.63 3.9
90 96.04 94.74 98.38 96.97 96.53 3.54 2.9
100 96.34 94.48 98.30 96.72 96.46 3.50 3.1

Rocket Dual Base Model, 150 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 95.15 91.54 97.49 94.80 94.74 3.94 9.4
10 96.42 94.58 98.24 96.19 96.36 3.11 8.9
15 96.88 94.51 98.89 96.62 96.73 3.05 8.6
20 97.09 94.20 98.54 97.06 96.72 2.96 6.9
25 96.75 89.17 98.90 96.55 95.34 3.32 5.7
30 96.84 94.45 98.86 96.71 96.71 2.77 6.6
35 97.03 95.18 98.74 97.19 97.03 2.84 6.2
40 96.29 95.28 97.73 97.10 96.60 3.01 6.5
45 96.66 95.60 98.94 97.24 97.11 2.78 5.0
50 96.92 95.48 98.81 97.29 97.12 2.74 4.7
60 97.00 95.18 98.89 97.02 97.02 2.88 4.8
70 96.90 94.98 98.99 97.04 96.98 2.88 4.1
80 97.21 95.41 98.93 96.97 97.13 2.82 4.6
90 97.11 95.40 98.97 97.15 97.16 2.91 3.3
100 97.06 95.38 98.99 96.99 97.11 2.78 2.8

Table B.13 Transfer Learning results for the Medium Core base model with 50 and 150 samples.

Rocket Medium Base Model, 50 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 94.20 92.65 97.77 95.46 95.02 3.97 8.7
10 95.28 92.08 97.40 96.40 95.29 3.60 8.7
15 94.87 92.37 97.96 96.07 95.31 3.92 6.9
20 95.21 91.80 97.79 95.50 95.08 4.13 7.1
25 95.23 90.72 98.04 96.27 95.06 3.99 5.6
30 94.44 92.33 98.05 95.91 95.18 3.92 5.9
35 95.15 93.54 97.45 96.53 95.67 3.81 5.6
40 95.25 94.16 98.23 96.72 96.09 3.15 5.7
45 95.59 93.85 98.32 95.72 95.87 3.53 4.2
50 95.37 93.02 98.25 96.72 95.84 3.65 4.6
60 94.83 94.28 98.25 97.05 96.10 3.42 4.2
70 95.45 93.85 98.14 97.00 96.11 3.37 4.1
80 95.40 93.98 98.18 97.44 96.25 3.28 4.5
90 95.54 94.34 98.02 97.07 96.24 3.35 3.0
100 95.05 95.39 98.27 97.54 96.56 2.93 3.0

Rocket Medium Base Model, 150 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 95.08 93.76 97.57 95.07 95.37 3.39 8.6
10 94.85 94.09 97.70 97.50 96.03 3.05 8.3
15 93.67 94.91 98.70 96.17 95.86 2.86 8.0
20 94.66 94.53 98.76 96.91 96.21 2.79 7.2
25 96.26 94.35 98.59 96.92 96.53 2.81 5.8
30 95.19 94.37 98.50 96.61 96.17 2.99 4.9
35 95.79 95.44 98.46 97.67 96.84 2.81 5.8
40 96.12 95.28 98.84 97.48 96.93 2.57 6.0
45 96.15 95.38 98.72 97.32 96.89 2.65 4.9
50 96.05 95.04 98.73 97.37 96.80 2.65 5.2
60 96.41 95.64 98.66 97.47 97.04 2.61 4.4
70 95.83 95.40 98.69 97.44 96.84 2.60 4.1
80 96.34 95.81 98.69 97.33 97.04 2.48 4.4
90 96.33 95.30 98.76 97.64 97.01 2.63 3.1
100 96.36 95.75 98.87 97.56 97.13 2.53 3.0

B.6 Additional Results for Transfer Learning Model Evaluation for Corner Conditions
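In the tables that follow, the last column reports a training-time speedup; a hedged reading, assuming it compares fine-tuning the base model against training an equivalent model from scratch:

\[ \text{Train Time Speedup} = \frac{T_{\text{train, from scratch}}}{T_{\text{train, fine-tune}}} \]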

Table B.14 Transfer Learning results for Rocket corner FF_0p88V_125C_Rcmax with base models trained with 50 and 150 samples.

Rocket FF_0p88V_125C_Rcmax Base Model, 50 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 93.53 90.67 94.10 94.07 93.09 5.70 8.5
10 93.36 89.79 97.92 95.01 94.02 4.94 8.4
15 94.47 92.91 98.15 95.66 95.30 4.12 8.3
20 95.04 92.10 98.06 95.90 95.28 4.08 7.6
25 94.58 93.23 97.82 95.96 95.40 4.32 6.1
30 95.81 93.77 98.21 96.54 96.08 3.81 6.2
35 95.79 92.59 98.04 94.91 95.33 3.65 5.9
40 95.79 94.15 98.22 96.27 96.11 3.81 6.4
45 95.89 93.36 98.35 96.17 95.94 3.87 4.5
50 95.43 93.01 97.97 95.79 95.55 4.23 4.3
60 96.03 93.15 98.09 96.95 96.06 3.70 4.0
70 95.96 94.19 98.31 96.95 96.35 3.21 4.1
80 95.83 94.77 98.29 96.73 96.41 3.26 4.3
90 96.08 93.95 98.46 97.07 96.39 3.64 3.1
100 95.55 94.35 98.19 96.71 96.20 3.71 3.1

Rocket FF_0p88V_125C_Rcmax Base Model, 150 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 94.81 90.52 97.56 95.05 94.48 4.82 9.4
10 96.63 91.14 98.94 95.51 95.56 3.72 8.0
15 96.69 91.96 98.59 95.27 95.63 3.09 8.9
20 96.89 95.93 98.59 97.37 97.19 2.85 6.8
25 96.69 95.42 98.83 96.92 96.96 2.93 5.5
30 96.63 96.14 98.55 97.20 97.13 2.84 5.6
35 96.83 94.40 98.92 96.96 96.78 2.73 5.8
40 97.06 96.26 98.37 97.55 97.31 2.79 6.0
45 96.72 95.85 98.76 97.13 97.11 2.87 4.6
50 96.90 95.93 98.71 97.18 97.18 2.95 4.6
60 96.99 96.15 98.91 97.61 97.41 2.79 4.2
70 97.00 96.03 98.76 97.39 97.30 2.76 3.8
80 96.44 95.86 98.91 97.37 97.15 2.66 4.3
90 97.02 96.18 98.96 97.60 97.44 2.84 3.3
100 96.74 95.29 98.80 97.30 97.03 2.97 2.9

Table B.15 Transfer Learning results for Rocket corner SS_0p72V_125C_Cmax with base models trained with 50 and 150 samples.

Rocket SS_0p72V_125C_Cmax Base Model, 50 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 94.88 91.01 87.95 94.07 91.98 7.23 6.4
10 92.79 90.51 96.89 95.01 93.80 5.14 6.3
15 94.95 93.15 97.41 95.66 95.29 4.56 7.2
20 95.90 92.14 95.29 95.90 94.81 4.61 6.1
25 95.12 93.31 96.54 95.96 95.23 4.64 5.1
30 96.26 93.96 97.72 96.54 96.12 3.85 5.2
35 95.92 92.86 97.81 94.91 95.38 3.84 4.7
40 96.37 94.21 96.83 96.27 95.92 4.01 5.3
45 95.78 93.49 97.59 96.20 95.77 3.90 4.6
50 95.63 93.11 96.71 95.73 95.29 4.59 4.5
60 96.31 92.86 97.09 96.81 95.77 3.91 4.0
70 96.31 94.21 97.53 96.99 96.26 3.40 3.7
80 96.24 94.75 97.43 96.99 96.35 3.44 3.9
90 96.42 94.05 97.67 97.01 96.29 3.76 2.9
100 95.99 94.62 97.44 96.71 96.19 3.81 2.6

Rocket SS_0p72V_125C_Cmax Base Model, 150 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 94.70 91.20 95.17 95.05 94.03 5.08 7.7
10 96.91 91.11 96.40 95.51 94.98 3.99 6.6
15 96.98 92.73 94.48 95.27 94.87 3.52 7.7
20 97.35 95.49 97.04 97.37 96.81 2.93 7.1
25 97.20 95.48 97.79 96.92 96.85 3.05 5.1
30 96.89 95.90 97.95 97.20 96.99 3.01 5.3
35 96.07 93.46 98.27 96.96 96.19 2.87 5.1
40 97.30 96.14 97.87 97.55 97.21 2.85 5.8
45 97.24 95.96 98.10 97.28 97.15 2.87 3.8
50 97.15 95.94 97.91 97.18 97.05 3.03 4.2
60 97.66 96.11 97.72 97.57 97.26 2.96 3.6
70 97.39 96.10 98.21 97.35 97.26 2.83 3.5
80 97.11 95.80 98.01 97.62 97.13 2.64 4.1
90 97.46 96.19 98.11 97.61 97.34 2.89 3.0
100 97.34 95.60 97.89 96.95 96.95 3.16 2.8

Table B.16 Transfer Learning results for Rocket corner SS_0p72V_125C_Rcmax with base models trained with 50 and 150 samples.

Rocket SS_0p72V_125C_Rcmax Base Model, 50 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 92.82 90.79 88.99 94.07 91.67 6.90 8.4
10 92.88 90.64 96.93 95.01 93.87 5.06 6.8
15 93.85 93.00 97.42 95.66 94.98 4.22 7.9
20 94.92 92.03 95.86 95.90 94.68 4.39 6.7
25 94.39 93.15 96.48 95.96 95.00 4.61 5.9
30 96.04 93.89 97.84 96.54 96.08 3.90 5.5
35 95.42 92.82 97.86 94.91 95.25 3.70 5.5
40 95.90 94.23 96.83 96.27 95.81 4.05 5.3
45 96.02 93.47 97.76 96.21 95.86 4.00 4.6
50 95.56 93.03 96.69 95.80 95.27 4.51 4.8
60 96.23 93.04 97.16 96.66 95.77 3.97 3.8
70 96.29 94.18 97.56 96.90 96.24 3.36 3.7
80 96.16 94.51 97.51 96.69 96.22 3.57 4.0
90 96.23 93.85 97.69 96.96 96.18 3.79 2.7
100 95.74 93.67 97.57 96.71 95.92 3.93 3.0

Rocket SS_0p72V_125C_Rcmax Base Model, 150 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 94.38 90.08 95.53 95.05 93.76 5.45 8.1
10 96.71 90.99 97.02 95.51 95.06 4.02 7.2
15 97.06 94.22 95.48 95.27 95.51 3.45 7.6
20 97.18 95.88 97.25 97.37 96.92 3.00 5.8
25 96.13 95.58 97.73 96.92 96.59 3.08 5.6
30 97.30 96.07 97.78 97.20 97.09 2.96 6.2
35 96.96 93.70 98.29 96.96 96.48 2.86 5.8
40 97.19 96.27 97.77 97.55 97.19 2.89 6.2
45 96.89 95.91 98.14 97.38 97.08 2.85 5.1
50 97.10 95.85 97.84 97.03 96.95 3.14 4.5
60 97.27 96.14 97.67 97.59 97.17 2.97 4.3
70 97.24 96.03 98.16 97.50 97.23 2.89 4.0
80 97.27 95.83 97.98 97.52 97.15 2.72 4.1
90 97.30 96.13 97.89 97.68 97.25 2.97 2.8
100 97.07 95.45 98.06 97.20 96.95 3.19 3.0

Table B.17 Transfer Learning results for Rocket corner SS_0p72V_m40C_Cmax with base models trained with 50 and 150 samples.

Rocket SS_0p72V_m40C_Cmax Base Model, 50 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 94.39 89.85 90.24 94.07 92.14 6.84 6.5
10 92.15 88.72 96.87 95.01 93.19 5.47 6.3
15 93.29 92.89 97.42 95.66 94.81 4.33 6.7
20 94.82 90.88 95.12 95.90 94.18 4.74 5.1
25 94.97 92.74 96.30 95.96 94.99 4.93 5.4
30 96.03 93.44 97.40 96.54 95.85 4.00 4.4
35 95.50 92.58 97.69 94.91 95.17 3.86 5.1
40 95.63 93.78 96.72 96.27 95.60 4.21 4.9
45 95.81 92.87 97.76 96.17 95.65 4.14 4.1
50 95.20 92.47 96.66 95.80 95.03 4.78 4.6
60 96.07 92.56 97.04 96.78 95.61 4.13 3.6
70 96.16 93.97 97.56 96.92 96.15 3.47 3.2
80 96.02 94.76 97.45 97.04 96.32 3.55 3.4
90 96.08 93.56 97.69 97.10 96.11 3.86 2.7
100 95.66 93.85 97.38 96.62 95.88 3.98 2.9

Rocket SS_0p72V_m40C_Cmax Base Model, 150 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 93.12 89.22 95.30 95.05 93.17 5.88 6.9
10 96.67 90.60 96.56 95.51 94.83 4.16 6.5
15 96.75 93.09 94.21 95.27 94.83 3.64 6.3
20 96.93 95.39 97.13 97.37 96.71 3.08 6.2
25 95.49 95.49 97.83 96.92 96.43 3.21 4.9
30 97.15 95.52 98.04 97.20 96.97 2.96 4.6
35 96.90 94.27 98.24 96.96 96.59 2.93 5.2
40 97.09 95.66 97.84 97.05 96.91 3.09 5.0
45 96.68 95.60 98.05 97.07 96.85 3.00 3.9
50 97.02 95.60 97.82 97.15 96.90 3.19 3.9
60 97.13 95.82 97.72 97.56 97.06 3.06 3.8
70 97.05 95.52 98.21 97.25 97.01 2.91 3.6
80 97.17 95.73 98.16 97.54 97.15 2.75 4.0
90 97.12 95.87 98.12 97.67 97.19 3.04 2.6
100 96.94 95.26 98.02 97.19 96.85 3.24 2.9

Table B.18 Transfer Learning results for OR1200 corner FF_0p88V_125C_Rcmax with base models trained with 50 and 150 samples.

OR1200 FF_0p88V_125C_Rcmax Base Model, 50 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 94.24 91.03 95.93 91.76 93.24 5.08 8.2
10 94.59 90.49 97.09 95.25 94.35 5.50 8.4
15 96.06 91.84 96.81 95.54 95.06 4.58 9.3
20 95.73 91.53 96.68 95.69 94.91 4.49 7.7
25 96.30 92.05 97.07 94.53 94.99 4.79 7.7
30 96.21 92.52 96.91 95.55 95.30 4.54 8.2
35 95.94 92.21 97.20 95.69 95.26 4.31 7.6
40 96.03 92.11 97.25 95.44 95.20 4.58 8.4
45 96.39 90.22 96.79 94.34 94.44 4.17 6.8
50 95.68 91.67 96.84 95.87 95.02 4.81 7.0
60 96.47 93.90 97.45 95.47 95.82 4.15 5.6
70 96.30 92.32 97.36 96.07 95.51 4.50 5.5
80 96.74 93.64 97.43 96.21 96.00 4.17 6.4
90 96.15 93.79 97.37 96.23 95.89 4.08 4.7
100 96.68 94.19 97.48 95.76 96.03 4.01 4.3

OR1200 FF_0p88V_125C_Rcmax Base Model, 150 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 94.52 92.55 96.67 95.95 94.92 3.89 10.8
10 96.92 94.31 97.15 95.90 96.07 3.82 10.2
15 94.46 93.49 96.62 96.61 95.30 4.29 10.2
20 94.76 94.46 97.18 95.02 95.36 5.15 7.0
25 96.92 94.59 97.19 95.85 96.14 4.24 7.6
30 96.68 93.86 97.58 97.00 96.28 4.06 7.5
35 96.44 94.35 97.73 96.99 96.37 4.14 7.9
40 96.77 94.62 97.57 97.14 96.52 3.78 8.5
45 96.14 94.74 97.64 96.96 96.37 4.09 6.8
50 96.49 94.18 97.70 97.06 96.36 4.27 7.1
60 96.71 94.12 98.03 97.01 96.47 4.16 6.4
70 96.87 94.57 98.24 97.09 96.70 3.76 5.4
80 96.72 94.17 97.98 97.11 96.50 3.97 5.9
90 96.81 94.91 97.96 97.25 96.73 3.77 4.5
100 97.04 94.31 98.08 97.18 96.65 4.15 4.6

Table B.19 Transfer Learning results for OR1200 corner SS_0p72V_125C_Cmax with base models trained with 50 and 150 samples.

OR1200 SS_0p72V_125C_Cmax Base Model, 50 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 92.94 91.05 91.97 91.76 91.93 6.26 7.5
10 95.93 90.41 74.19 95.25 88.94 11.19 9.4
15 95.27 92.40 93.71 95.54 94.23 5.54 6.8
20 94.99 91.97 94.18 95.73 94.22 5.31 7.3
25 95.73 91.88 92.46 94.53 93.65 5.61 6.6
30 95.60 92.73 94.19 95.55 94.52 5.27 7.4
35 96.04 92.25 94.23 95.69 94.55 5.32 7.2
40 95.02 92.77 94.60 95.44 94.46 5.14 6.6
45 94.80 91.86 95.17 94.07 93.97 4.73 6.2
50 95.99 91.53 93.71 95.79 94.25 5.95 6.0
60 96.40 94.01 94.26 94.85 94.88 5.24 5.5
70 96.53 92.74 95.11 96.09 95.12 5.09 5.3
80 96.26 94.09 95.17 96.21 95.43 4.95 5.9
90 96.58 94.07 95.74 95.47 95.47 4.78 4.3
100 96.30 94.04 95.80 96.11 95.56 4.61 4.2

OR1200 SS_0p72V_125C_Cmax Base Model, 150 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 93.34 92.20 74.43 95.95 88.98 9.90 8.7
10 96.26 94.59 87.94 95.90 93.67 6.58 8.9
15 95.77 93.89 91.61 96.61 94.47 5.45 8.8
20 94.85 94.74 92.22 95.11 94.23 6.52 7.7
25 96.06 94.67 92.95 95.85 94.88 5.52 7.0
30 96.36 94.47 92.58 97.00 95.10 5.54 7.5
35 96.57 93.91 94.52 96.99 95.50 5.26 7.3
40 96.24 94.71 94.38 97.14 95.62 4.96 6.4
45 96.58 95.11 95.07 96.99 95.93 5.10 6.9
50 96.49 94.56 93.86 97.02 95.48 5.54 6.0
60 96.46 94.20 93.99 97.01 95.41 5.54 5.7
70 96.80 94.74 94.31 97.09 95.74 5.38 5.4
80 96.63 94.60 95.72 97.02 95.99 4.85 5.7
90 96.96 95.26 95.02 97.25 96.12 4.96 4.4
100 96.73 94.60 95.74 97.16 96.06 5.12 4.1

Table B.20 Transfer Learning results for OR1200 corner SS_0p72V_125C_Rcmax with base models trained with 50 and 150 samples.

OR1200 SS_0p72V_125C_Rcmax Base Model, 50 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 94.40 91.09 92.30 91.76 92.39 5.83 9.0
10 96.12 90.71 82.81 95.25 91.22 8.86 8.8
15 95.26 91.63 94.12 95.54 94.14 5.39 8.4
20 96.04 91.52 94.24 95.66 94.36 5.26 7.1
25 96.52 92.04 92.78 94.53 93.97 5.45 7.1
30 96.37 92.54 94.32 95.55 94.70 5.08 7.0
35 96.12 92.33 95.20 95.69 94.84 4.81 7.5
40 96.22 92.68 94.68 95.44 94.75 5.09 7.9
45 96.26 91.66 95.08 94.76 94.44 4.58 6.6
50 95.72 92.12 94.03 95.81 94.42 5.51 6.6
60 96.63 93.88 94.23 96.23 95.24 4.77 5.7
70 96.34 93.54 95.30 96.14 95.33 4.67 5.7
80 96.68 94.01 95.35 96.21 95.56 4.84 6.2
90 96.35 94.28 95.72 96.09 95.61 4.45 4.5
100 96.82 94.45 95.68 95.92 95.72 4.37 4.2

OR1200 SS_0p72V_125C_Rcmax Base Model, 150 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 94.01 92.95 74.46 95.95 89.34 9.66 10.0
10 97.04 94.43 88.29 95.90 93.92 6.43 10.1
15 95.71 93.77 91.19 96.61 94.32 5.27 9.5
20 95.10 94.69 93.04 95.02 94.46 6.31 8.3
25 97.07 94.54 93.18 95.85 95.16 5.51 7.7
30 96.94 94.38 91.73 97.00 95.01 5.69 7.7
35 96.19 94.07 94.40 96.99 95.41 5.07 7.6
40 96.85 94.39 94.64 97.14 95.76 4.61 7.2
45 96.67 94.90 94.81 96.57 95.74 5.01 6.4
50 96.71 94.53 93.57 97.11 95.48 5.45 6.6
60 96.99 94.19 94.56 97.00 95.69 5.18 5.9
70 96.96 94.55 94.69 97.10 95.82 5.15 5.5
80 96.92 94.48 95.35 97.09 95.96 4.66 6.5
90 96.73 95.18 94.77 97.24 95.98 5.03 4.0
100 97.06 94.42 95.85 97.14 96.11 5.04 4.3

Table B.21 Transfer Learning results for OR1200 corner SS_0p72V_m40C_Cmax with base models trained with 50 and 150 samples.

OR1200 SS_0p72V_m40C_Cmax Base Model, 50 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 94.54 88.41 92.47 91.76 91.79 6.53 7.8
10 96.38 90.64 72.86 95.25 88.78 10.90 8.0
15 96.26 90.74 93.68 95.54 94.05 5.75 8.0
20 95.95 90.60 93.70 95.77 94.00 5.58 6.4
25 96.36 90.99 92.51 94.53 93.60 5.87 6.3
30 96.31 91.67 93.90 95.55 94.36 5.37 7.0
35 96.08 91.48 94.21 95.69 94.36 5.40 5.6
40 96.13 91.55 94.55 95.44 94.41 5.36 6.4
45 96.34 90.16 94.84 93.79 93.78 4.79 5.8
50 95.52 90.43 93.49 95.85 93.82 6.16 5.2
60 96.55 93.14 93.78 95.69 94.79 5.25 5.6
70 96.68 93.40 94.94 96.01 95.26 4.89 5.0
80 96.57 92.99 95.22 96.11 95.22 4.97 5.4
90 96.14 93.58 95.51 96.07 95.33 4.70 4.1
100 96.78 94.16 95.61 96.01 95.64 4.36 4.1

OR1200 SS_0p72V_m40C_Cmax Base Model, 150 Samples
Train Samples  Dynamic Pwr  Leakage Pwr  Critical Path  Area  PPA  Std Dev  Train Time Speedup
5 94.13 93.37 70.00 95.95 88.36 10.98 7.0
10 96.72 94.19 87.45 95.90 93.57 6.88 7.4
15 96.68 92.86 90.01 96.61 94.04 5.86 8.3
20 94.89 94.22 91.95 95.02 94.02 6.88 5.8
25 97.01 92.00 92.75 95.85 94.40 5.83 6.9
30 96.88 93.36 92.13 97.00 94.84 5.77 6.0
35 95.81 93.73 94.37 96.99 95.23 5.35 6.5
40 96.78 94.14 92.62 97.14 95.17 5.38 6.5
45 97.00 94.06 94.20 97.11 95.59 5.51 6.1
50 96.61 93.75 93.22 97.11 95.17 5.96 6.2
60 96.88 92.99 94.43 97.01 95.33 5.61 5.2
70 96.90 93.75 93.77 97.09 95.38 5.62 5.2
80 96.94 93.76 95.26 96.81 95.69 4.86 5.8
90 96.83 94.52 94.89 97.23 95.87 5.22 4.1
100 97.07 93.86 95.39 97.18 95.87 5.43 4.1
