EQUITY RESEARCH INDUSTRY UPDATE

June 3, 2021

The Next Technology Frontier

TECHNOLOGY/SEMICONDUCTORS & COMPONENTS

SUMMARY
Artificial Intelligence, once the stuff of science fiction, has arrived. Interest is high and adoption is increasing from datacenters to smartphones. Investors have taken note and rewarded early leaders like NVIDIA. Advances in semiconductors and software have enabled sophisticated neural networks, further accelerating AI development. Models continue to grow in size and sophistication, delivering transformative breakthroughs in image recognition, natural language processing, and recommendation systems. We see AI as a leading catalyst for Industry 4.0, a disruptive technology with broad societal/economic benefits. In this report, we explore key concepts underpinning the evolution of AI from a hardware and software perspective. We consulted more than a dozen leading public and private companies working on the latest AI platforms. We see a large and rapidly expanding AI accelerator opportunity. We estimate AI hardware platform TAM at $105B by 2025, a 34% CAGR.

KEY POINTS

■ AI/ML/DL: Artificial Intelligence enables machines to simulate human intelligence. Machine learning (ML) is one of the most prevalent AI techniques, where trained models allow machines to make informed predictions. Within ML, deep learning (DL) uses Artificial Neural Networks to replicate the compute capabilities of biological neurons. DL is showing promise in AI research, providing machines the ability to self-learn.

■ Drivers: We highlight three factors driving the latest DL breakthroughs: 1) Rapid Data Growth—global data is expected to reach 180ZB by 2025 (25% CAGR), necessitating AI to process this data and create meaningful inferences; 2) Advanced Processors—the decline of Moore’s Law and the shift to heterogeneous computing have sparked specialized AI silicon development, providing critical performance gains; 3) Neural Networks—DL performance scales with exponential data and neural network model growth.

■ Hardware/Software: As Moore’s Law sunsets, we see diminishing performance gains from transistor shrinkage. Chip engineers are increasingly focused on architectural improvements. The market is seeing a growing trend toward heterogeneous computing, where multiple processors (GPUs, ASICs, FPGAs, DPUs, CPUs) work together to improve performance. Software is critical to accelerated AI performance and is seeing corresponding incremental investment.

■ Applications/Markets: AI workloads are classified as Training or Inference. Training is the creation of an AI model through repetitive data processing/learning. Training is compute-intensive, requiring the most advanced AI hardware/software. Generally located in hyperscale datacenters, we estimate training TAM at $21B by 2025. Inference utilizes a trained model to predict results from a dataset. We see inference increasingly moving to edge devices, improving speed/cost. Led by Smartphones/PCs/IoT/Auto, we see an $84B Edge market by 2025.

■ Competitive Backdrop: NVIDIA is the clear AI leader, with dominant training share (~99%) and growing inference share (~20%). Being nimble is key, as competitors must adapt quickly to a rapidly changing market. Hyperscalers are developing in-house AI solutions for custom/proprietary workloads, where merchant silicon is not available. Traditional semi vendors are consolidating to strengthen Cloud/Edge AI offerings. AI has also inspired a wave of semiconductor startups.

Rick Schafer, 720-554-1119
Wei Mok, 212-667-8387
Andrew Hummel, CFA, 312-360-5946

For analyst certification and important disclosures, see the Disclosure Appendix.
Disseminated: June 3, 2021 23:45 EDT; Produced: June 3, 2021 23:36 EDT


Contents

Artificial Intelligence: The Next General-Purpose Technology ...... 3
AI: The Next General-Purpose Technology ...... 3
The Industrial Revolution and Industry 4.0 ...... 3
Artificial Intelligence ...... 5
Background and AI Classification ...... 5
Artificial Intelligence Fundamentals ...... 7
Machine Learning ...... 7
Training ...... 9
Inference ...... 11
Deep Learning ...... 11
Artificial Neural Networks ...... 12
AI Applications ...... 13
Image Processing ...... 13
Natural Language Processing ...... 13
Recommendation Systems ...... 13
Case Study: History of AI Cycles ...... 15
Moore’s Law and the Implications on Semiconductor Industry ...... 17
Moore’s Law: Industry Guide to Innovation in the Last Half Century ...... 17
Dennard Scaling ...... 18
A New Compute Paradigm: Emergence of AI Specialized Silicon ...... 19
AI Hardware: CPU, GPU, ASIC, FPGA, DPU ...... 21
AI Silicon: It Starts with the Hardware ...... 21
CPU: x86 and ARM ...... 22
GPU ...... 24
ASIC ...... 25
FPGA ...... 26
DPU ...... 27
Heterogenous Computing: All Chips Play a Role ...... 29
AI/ML Software, Frameworks/Libraries; Software 2.0 ...... 30
Programming Languages ...... 30
Deep Learning Frameworks and Libraries ...... 31
TensorFlow ...... 32
Pytorch ...... 32
Caffe ...... 32
Keras ...... 32
Scikit-Learn ...... 32
Application Programming Interfaces ...... 33
Software 2.0: Software Writing Software ...... 33
Measuring Silicon and AI Performance ...... 34
Plateau of Clock Speeds and the Megahertz Myth ...... 34
Measuring Performance with FLOPS and TOPS ...... 34
Benchmarking AI Training/Inference Results ...... 37
ResNet-50 ...... 37
Single Shot Detection (SSD) ...... 37
Neural Machine Translation (NMT) ...... 37
Transformer ...... 37
NLP (BERT) ...... 38
Deep Learning Recommendation Model (DLRM) ...... 38
Mini-Go ...... 38
AI Accelerators in Datacenters ...... 40
Enterprise Servers ...... 40
Cloud Computing ...... 40
Hyperscalers ...... 42
Datacenter AI Startups ...... 44
AI Accelerators at the Edge ...... 45
Edge Infrastructure: Cloud and Telco ...... 46
Robotics: Rise of Machines ...... 47
Autonomous Vehicles: New Age of Transportation ...... 49
Endpoint Devices: PCs, Smartphones, … ...... 53
PCs and Smartphones ...... 53
Embedded ...... 54
Silicon IP, Custom Silicon ...... 55
Leading Public Companies Developing AI Silicon ...... 56
Achronix ...... 56
AMD ...... 57
Broadcom ...... 57
...... 57
Marvell ...... 58
NVIDIA ...... 59
NXP ...... 60
...... 61
...... 61
Leading Startup Companies Developing AI Silicon ...... 63
Blaize Semi ...... 63
Cerebras Systems ...... 63
EdgeCortix ...... 63
Flex Logix ...... 64
Graphcore ...... 64
Groq ...... 64
Mythic AI ...... 65
SambaNova Systems ...... 65
SiFive ...... 66
Tenstorrent ...... 66


Artificial Intelligence: The Next General-Purpose Technology

AI: The Next General-Purpose Technology

Artificial Intelligence (AI) has the potential to impact societal structures and drive economic growth. AI is often described as a general-purpose technology (GPT). A GPT is pervasive across industries, offers clear cost/performance gains, and facilitates innovation. Prior GPTs like the steam engine, electricity, and the internet had profound impacts that altered economic and social structures. AI is similarly disruptive and center stage for the next phase of industrial development, Industry 4.0. Advances in hardware and software, coupled with rapidly evolving machine learning (ML)/deep learning (DL) techniques, are allowing the promise of AI to become reality. In this paper, we take a deep dive to explore artificial intelligence and its impact on the technology of the future.

Exhibit 1: Industrial Revolution

Source: Oppenheimer & Co. Estimates

The Industrial Revolution and Industry 4.0

Since the 1700s, the world has gone through three distinct technological periods, referred to as industrial revolutions. These periods marked turning points in history where technology innovation accelerated productivity and economic growth and improved daily lives. The first industrial revolution, from 1760 to 1840, harnessed the power of steam. The steam engine shifted production from labor-intensive “made-by-hand” processes to more mechanized ones, giving rise to the textiles industry and the factory system while greatly improving productivity. The second industrial revolution (late 1800s) marked advancements in manufacturing, introducing electricity, the internal combustion engine, and the assembly line. This in turn allowed railroads, the telegraph, and automobiles to develop and flourish. The third industrial revolution, also known as the digital revolution or Industry 3.0, was led by the development of the transistor, often cited as the most important invention of the 20th century. Manufacturing improvements and transistor density scaling led to incredible compute performance gains. This paved the way for things we take for granted every day like personal computers, the internet, and smartphones. As computing performance continues to grow and semiconductors proliferate, we are entering Industry 4.0.

The concept of Industry 4.0, or the Fourth Industrial Revolution, was popularized by Prof. Klaus Schwab, Founder and Executive Chairman of the World Economic Forum, in 2016. Industry 4.0 is defined by a range of new technologies that combine the physical, digital, and biological worlds to improve human lives. Markets from retail to healthcare to finance are embracing and adopting new technologies to transform businesses and disrupt their industries.

Industry 1.0 was relatively slow, taking decades to reach mass adoption. Industry 2.0 and 3.0 saw the pace of adoption accelerate with increased globalization and trade. In 1903, about 10% of US households operated a landline telephone; it took nearly ten decades to reach 95% of US households in 2002. In comparison, cellular phones were in 10% of US households in 1994 and took roughly two decades to reach 95% by 2018. Industry 4.0 will be marked by rapid breakthroughs across a range of technologies, including artificial intelligence, augmented reality, 5G, blockchain, internet of things, 3D printing, and robotics. AI and ML/DL, which allow machines to program machines, will drive the rapid proliferation of Industry 4.0.

Exhibit 2: Share of US Household Telephone/Cellphone Adoption

[Chart: Landline Telephone vs. Cellular Phone, % of US households, 1903–2019]

Source: Visualcapitalist, Oppenheimer & Co. Estimates


Artificial Intelligence

Background and AI Classification

Artificial Intelligence (AI) will be a driving force for Industry 4.0. While a common buzzword today, the term “Artificial Intelligence” is mostly credited to Dartmouth professor John McCarthy. In the summer of 1956, he hosted an eight-week research conference where the term was officially coined and the field of artificial intelligence was born. AI is the simulation of intelligence in machines to reflect cognitive behavior similar to humans. Such cognitive functions include perception, learning, reasoning, and problem solving. By its strictest definition, basic “AI” has been around since the beginning of modern computing, via rule-based mathematical algorithms that achieve seemingly “intelligent” results. Fast forward to today: improvements in technology, process, data management, and distribution (among others) have made AI “real.”

Artificial intelligence enables machines to replicate characteristics and behaviors of humans. Thus, the primary method of classifying these machines is based on how well they replicate human-like behavior. AI can be classified along two dimensions. Type 1 is based on capability, comparing the machine’s intelligence to human intelligence. Type 2 is based on functionality, classifying the machine’s likeness to the human mind and its ability to think and feel.

Exhibit 3: Artificial Intelligence Classification

Artificial Intelligence
Type 1 (Capability): Weak AI, Strong AI, Super AI
Type 2 (Functionality): Reactive Machines, Limited Memory, Theory of Mind, Self Awareness

Source: Oppenheimer & Co. Estimates

Type-1 (capability) artificial intelligence classifies a machine’s level of intelligence and how it compares against human intelligence. This is classified into three different types: I) narrow AI, II) general AI, III) super AI.

I) Artificial Narrow Intelligence (ANI), known as “weak AI,” is the most prevalent form of AI that exists today. Narrow AI is designed to perform a specific task and cannot function beyond what it is programmed to do. Some examples include virtual assistants, facial recognition, manufacturing robots, recommender systems, and self-driving cars. Leveraging the density and performance of existing compute technology, ANI excels at identifying correlations and patterns within large datasets, and it achieves these results faster and more accurately than humans. As such, ANI could threaten to displace traditional human jobs. Below are some modern examples of ANI:


AI Assistants: Siri (Apple), Alexa (Amazon), Cortana (Microsoft)
Language Processing: Translation Engines
Recommender Systems: Facebook, Netflix, YouTube, TikTok
Autonomous: Robotics, Self-driving cars

II) Artificial General Intelligence (AGI), referred to as “strong AI,” is a concept where machines have the ability to perform any intellectual task a human can. AGI resembles human intelligence, with the ability to autonomously learn and solve problems. AGI can perform perceptual tasks such as vision and language, and cognitive tasks such as reasoning, understanding, and thinking. Currently, no AGI systems or technology exist, but considerable progress has been made in this field. Researchers believe neural networks and quantum computing are key to advancing AGI. Some examples of AGI portrayed in movies are Jarvis (Iron Man) and Samantha (Her).

III) Artificial Super Intelligence (ASI) is a theoretical concept that occurs when machines become sentient, with consciousness and intelligence surpassing the capacity of humans. ASI decision-making and problem-solving capabilities would exceed the human thought process, making machines more capable at everything. The potential of ASI is appealing but raises concerns, as self-aware machines may evoke emotions, needs, beliefs, and desires of their own. A wide range of ASI examples are creatively portrayed in science fiction films: HAL (2001: A Space Odyssey), TARS (Interstellar), Ava (Ex Machina), and Skynet (Terminator).

Type-2 (functionality) artificial intelligence classifies AI machines based on likeness to the brain and how they “think.” The four types of functionality AI are: I) reactive machine, II) limited memory, III) theory of mind, and IV) self-awareness.

I) Reactive Machines are the simplest form of AI and have no memory capabilities. Lacking memory, the machine is unable to use past data or experiences to determine future actions; it reacts only to the data it is presented with. It is designed to perform specific programmed tasks. These machines have no concept of the wider world and respond to a limited set of inputs. IBM’s Deep Blue and Google’s AlphaGo are examples of reactive machines.

II) Limited Memory, as the name suggests, is a machine with some form of memory capability. This machine can make informed decisions based on past data or events, although its memory is short-lived. Most AI applications today fall under this category. Self-driving vehicles are a popular example: the vehicle collects real-time data on other vehicles, lane markings, and traffic lights, and makes decisions along the way. However, the driving experience doesn’t shape the machine long term the way a human driver learns over many years behind the wheel. New AI models are breaking this mold, as networks are constantly retrained on data gathered during inference, allowing the model to adapt to changes in road conditions, detours, construction, new sidewalks, etc.

III) Theory of Mind is an advanced type of AI that can relate to and understand others. A term taken from psychology, theory of mind focuses on the ability of AI to understand the entities it interacts with and discern their thought processes, needs, beliefs, and emotions. In humans, this ability is crucial for social interaction and the formation of societies. Theory of mind applied to AI remains conceptual, although a branch of research into artificial emotional intelligence is working to advance the field.


IV) Self Awareness is the state of AI that has its own consciousness and is self-aware. Self-aware AI can understand and evoke emotions similar to humans while having its own thoughts. This type of AI most closely resembles human intelligence.

Artificial Intelligence Fundamentals

In this section, we dive deeper into three fundamental AI concepts: machine learning, deep learning, and neural networks. While the three terms may appear somewhat interchangeable, each is unique in its own respect. AI is the broadest term, outlining all the aspects that make machines reflect human-like qualities. Machine learning (ML) is the algorithmic technique applied to machines giving them the ability to learn without being explicitly programmed by a person. Since machine learning is a subset of AI, all ML is AI, but not all AI is ML. An example of AI that is not ML would be a rule-based system, such as if-then-else logic statements: through pre-programmed code, the system replicates human knowledge, demonstrating “fake intelligence,” though it lacks the ability to learn. Deep learning is a newer method of machine learning, utilizing multi-layered artificial neural networks inspired by biological neurons. Deep learning can automatically extract, learn, and infer results, delivering higher accuracy in human-like tasks such as object detection, speech recognition, and language translation. Given the resemblance to human brains, the availability of computing power, and advances in AI research, deep learning use is increasing broadly.
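To make the distinction concrete, the short Python sketch below contrasts a hand-coded rule with a model that learns its own decision boundary from data. The temperature example, threshold, and dataset are our own hypothetical illustration, not drawn from any specific system.

```python
# Hypothetical illustration: rule-based "AI" vs. a model that learns from data.
from sklearn.linear_model import LogisticRegression

# Rule-based system: the "knowledge" is hard-coded and never changes.
def rule_based_fan(temp_c):
    if temp_c > 30:          # threshold chosen by a human programmer
        return "fan on"
    return "fan off"

# Machine learning: the decision boundary is inferred from labeled examples.
temps = [[18], [22], [27], [31], [35], [40]]   # inputs (deg C)
labels = [0, 0, 0, 1, 1, 1]                    # 0 = fan off, 1 = fan on
model = LogisticRegression().fit(temps, labels)

print(rule_based_fan(33))        # always follows the fixed rule
print(model.predict([[33]]))     # decision learned from the data
```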

Exhibit 4: Artificial Intelligence, Machine Learning, Deep Learning Venn Diagram

Source: Oppenheimer & Co. Estimates

Machine Learning (ML) is a subset of AI where trained models/algorithms provide machines the ability to learn from data/experiences. These machines then make predictions on new information without explicit programming. ML has been around since the 1980s in the form of supercomputers for university and government research. More recently, two factors have contributed to the rising prevalence of ML research: 1) rapid growth of data and model datasets, and 2) increased compute processing power. The shift to digital has driven exponential data growth, coming from essentially every device and industry. As the pace of data generation continues to grow, the volume of data has surpassed humans’ general ability to comprehend it. There was roughly 45ZB (zettabytes) of data generated in 2019. The amount of data generated annually is expected to reach 180ZB by 2025, a 26% CAGR. Concurrently, semiconductor innovation has led to a dramatic improvement in compute ability. The fastest supercomputer today, Fugaku, reached 442 PFLOPs according to Top500. Humans will increasingly rely on automated ML tools to learn from and utilize this vast data lake.

Exhibit 5: Data Generated Forecast (ZB) Exhibit 6: Supercomputer Processing

[Charts: annual data generated growing from 45 ZB in 2019 to 180 ZB in 2025; supercomputer processing performance (TFLOPs, Log10 scale), 1990–2020, with Fugaku at 442 PFLOPs]
Note: TFLOPS axis shown in Log10 scale
Source: IDC, Top500, Oppenheimer & Co. Estimates

ML is already being used in a wide range of products, though it isn’t obvious to most users as it powers applications in the background. Image recognition, fraud detection, and recommendation systems are utilized across a broad variety of use cases, from shopping and security to transportation and healthcare. ML will eventually be ubiquitous as industries leverage troves of data to improve efficiency and the customer/user experience.

The ML process involves seven steps: 1) gathering data, 2) preparing data, 3) model selection, 4) training, 5) evaluation, 6) tuning, and 7) prediction. We note some additional non-trivial steps include deployment, ongoing maintenance, and retraining to maintain accuracy. Steps one to three are typically viewed as the preparation stages of ML. The first step is to collect the sample dataset. The data can vary in format, including text, audio, or images. The dataset is then scrubbed to ensure proper organization with correct tags and annotations. Quality data is critical for a trained model to deliver accurate predictions. Model selection is important as models vary by algorithm and use case. The three main types of ML algorithms include: 1) Regression—simple linear, lasso, logistic, multivariate; 2) Classification—Naive Bayes, K-Nearest Neighbors, decision trees, random forest; and 3) Clustering—TwoStep, K-means, Mean-Shift, density-based spatial clustering of applications with noise (DBSCAN), and Expectation-Maximization (EM) clustering using Gaussian Mixture Models (GMM). Neural networking, or deep learning, is an emerging technique, which we expand on later in this report.
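For readers who want to see the workflow end-to-end, below is a minimal sketch of the seven steps using Scikit-Learn (one of the libraries covered later in this report). The synthetic dataset, Ridge regression model, and hyperparameter grid are illustrative assumptions, not a prescription.

```python
# Illustrative sketch of the seven ML steps using scikit-learn on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# 1-2) Gather and prepare data (here: synthetic, already clean and labeled).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.5, size=500)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 3) Model selection: a simple regression algorithm.
model = Ridge()

# 4) Training: fit the model's weights/biases to the training data.
model.fit(X_train, y_train)

# 5) Evaluation: validate on data the model has not seen.
print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))

# 6) Tuning: adjust hyperparameters and retrain.
tuned = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=5).fit(X_train, y_train)

# 7) Prediction (inference): use the trained model on new data.
print("Prediction for a new sample:", tuned.predict([[5.0, 2.0, 7.0]]))
```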


Exhibit 7: Machine Learning Methods

Regression Classification Clustering

Source: Medium ML Concepts, Oppenheimer & Co. Estimates

ML is often split into two distinct phases: training and inference. Training (associated with above steps four to six) refers to the process of using data to develop a predictive model. Inference (step 7) refers to making decisions and predictions based on new data.

Exhibit 8: AI Training and Inference

Source: Intel, Oppenheimer & Co. Estimates

Training is the process of feeding labeled data into a system to develop a prediction model. This process can be repeated multiple times to refine prediction results and improve accuracy.

To better explain training concepts, we can use a standard linear equation represented as y = (m * x) + b. y is the output and x is the input. The dataset has many parameters, tags, and annotations. The collection of these parameters is outlined in a matrix and denoted as model weights (the strength of the relationship between input/output). Weights are represented in the equation as m. Similarly, b is grouped into a matrix of its own denoted as biases. Weights and biases are important concepts in ML as they are the primary variables determining how models are trained. In the training step, randomized weights and biases are used to predict the output. During the first training cycle, models typically yield poor results. As the proper weights and biases are identified, and values are adjusted, the output improves with each successive training cycle. Once satisfied, the model is validated against an untested sample dataset. This evaluation step is used to identify how a model will perform with real world data. The tuning step involves further
parameter adjustments to improve model results. One example of a parameter we can adjust is the number of passes we make over the training dataset. Once a target accuracy level has been reached, the trained network is said to have “converged,” meaning it is ready to be optimized and deployed as part of an AI-powered service.
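To make the weight/bias mechanics above concrete, here is a minimal sketch of a training loop that fits y = m * x + b by gradient descent on synthetic data. The learning rate, cycle count, and data are illustrative assumptions, not a description of any production training system.

```python
# Minimal training loop for y = m*x + b: start from a random weight and bias,
# then repeatedly adjust them to reduce prediction error (mean squared error).
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = 3.0 * x + 0.5 + rng.normal(0, 0.1, 200)   # "ground truth": m = 3.0, b = 0.5

m, b = rng.normal(), rng.normal()             # randomized initial weight and bias
lr = 0.1                                      # learning rate (a tuning parameter)

for cycle in range(200):                      # each pass over the data = one training cycle
    y_pred = m * x + b
    error = y_pred - y
    grad_m = 2 * np.mean(error * x)           # gradient of the loss with respect to m
    grad_b = 2 * np.mean(error)               # gradient of the loss with respect to b
    m -= lr * grad_m                          # adjust the weight
    b -= lr * grad_b                          # adjust the bias

print(f"learned m={m:.2f}, b={b:.2f}")        # converges near m=3.0, b=0.5
```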

Exhibit 9: Machine Learning Weights, Biases Exhibit 10: Training Cycle

y = m * x + b (output = weight * input + bias)

WEIGHTS = [ m1,1  m1,2 ; m2,1  m2,2 ; m3,1  m3,2 ]
BIASES  = [ b1,1  b1,2 ; b2,1  b2,2 ; b3,1  b3,2 ]

Training cycle: TRAIN DATA -> MODEL [w, b] -> PREDICTION -> TEST & UPDATE [w, b] -> repeat

Source: Google, Oppenheimer & Co. Estimates Source: Google, Oppenheimer & Co. Estimates

Training is compute- and time-intensive (potentially taking days to weeks), particularly compared to inference, due to the number of steps, repetition, the volume of data, and model complexity. For this reason, training is mostly done in large-scale datacenters (e.g., Amazon, Microsoft, Google, research supercomputers) with better access to resources and the latest computing capabilities. To date, training has been dominated by GPU compute, representing ~99% of the market. A confluence of factors, including the slowing of Moore’s Law and the recognized need for AI-optimized silicon, has ushered in a new wave of semiconductor innovation. New ASIC-based startups have emerged vying for a piece of the training market. Additionally, cloud hyperscalers are increasingly designing chips in-house aimed at this market.

There are different training methods in ML. We touch on the main ones here.

Supervised Machine Learning is the most common form, where the ML model is trained on a labeled data set. The results are compared to the output, and parameters are calibrated with human assistance. The model is easier to train with this approach because of the high degree of human involvement in both labeling the data properly and adjusting parameters of the model.

Unsupervised Machine Learning is used on unlabeled data. The machine looks at a dataset and attempts to decipher patterns without human intervention. The goal is for the machine to identify patterns and recognize some form of structure humans would miss.

Semi-supervised Machine Learning offers a mix between the prior two methods. In this method, a model is trained on a small labeled dataset. The results are then used to label features on an even larger unlabeled dataset. The combined two datasets are then trained together to develop the model.

Reinforcement Machine Learning learns from its interactions. This method is considered more difficult given the algorithm isn’t fed a training dataset. Instead, it learns on-the-go through trial and error, building a dataset from the data it gathers on its own actions and the results they produce. This method is often used in robotics, gaming, and navigation. Reinforcement learning was used in Google’s AlphaGo to beat humans in the board game Go.
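As a toy sketch of the trial-and-error idea behind reinforcement learning (the corridor environment, rewards, and hyperparameters below are our own illustrative assumptions, far simpler than anything used in AlphaGo): an agent is never shown the answer, yet learns from the rewards of its own actions to walk right along a five-cell corridor to reach a goal.

```python
# Toy Q-learning sketch: 5-cell corridor, actions are move left (0) or right (1).
import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]   # learned value of each action
alpha, gamma, epsilon = 0.5, 0.9, 0.2              # learning rate, discount, exploration

for episode in range(500):
    state = 0
    while state != n_states - 1:                   # goal is the right-most cell
        if random.random() < epsilon:              # explore: try a random action
            action = random.randrange(n_actions)
        else:                                      # exploit: use what has been learned
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

print([round(max(q), 2) for q in Q])               # learned values increase toward the goal
```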

Inference is the end result of the machine learning process. Inference refers to the deployment of a trained model to make accurate predictions (inferences) using real-world data. At this stage, the value of machine learning is realized. Compared to training, performing inference on new data is less resource-intensive. For this reason, general-purpose CPUs have historically been “good enough.” While inference can be performed in the datacenter, workloads are increasingly performed on edge devices. Applying inference on the local device, closer to the data source, removes latency and yields faster real-time results. Computer vision, voice recognition, and language processing all leverage ML inference.

Deploying AI inference comes with a unique set of challenges. Some include load- balancing, auto-scaling, and maintaining high-utilization for cost optimization. Open- source orchestration platforms such as Kubernetes are playing critical roles in deploying, managing, and scaling software container applications. Additionally, inference needs to meet certain accuracy requirements, latency budgets, and service-level agreements (SLAs).

Training and inference are often viewed as distinct sub-segments of the machine learning process. Early AI processor/accelerator products were generally targeted toward, and utilized for, specific training or inference workloads. As technology and use-cases have evolved, AI is seeing an increasing convergence of training/inference workloads on the same chip. Major AI players have introduced products applicable for both training/inference workloads.

Exhibit 11: Neural Network Model Parameters (B) Exhibit 12: Deep Learning Performance Scales with Data

[Charts: neural network model parameters (B) for BERT, GPT-2, Megatron LM, T5, and Turing NLG, culminating in GPT-3 at 175B parameters, Oct-18 through Jun-20; deep learning performance scales with data while older ML algorithms plateau]

Source: TensorFlow, Towards Data Science, Oppenheimer & Co. Estimates

Deep Learning is a specific method of machine learning, where training is based on complex sets of algorithms called neural networks. Deep learning algorithms attempt to draw conclusions by continually analyzing data within a given logical structure. Traditional ML excels at structured data, while deep learning is able to process large volumes of unstructured data (e.g., images, voice). An advantage of deep learning over traditional ML is its ability to perform feature extraction: a machine analyzes and discovers features from datasets on its own, without human intervention. This proves valuable considering the unstructured nature of real-world data. One simply feeds raw data into the algorithm and the rest is done by the model. Deep learning performance is correlated to the amount of data used for training. Natural language model parameters are increasing 10x a year (GPT-3 has 175B parameters). As shown in Exhibit 12, older machine learning algorithms generally reach a performance plateau with data, while deep learning performance scales with increased data.

AI researchers have embraced deep learning. Deep learning artificial neurons are inspired by biological neurons (Exhibits 13 and 14 below). The goal of an artificial neuron is to mimic the function of its biological counterpart. Data inputs resemble the dendrites, the node or summation function resembles the nucleus, and the activation function (output) resembles the axon terminals, which connect to the dendrites of other biological neurons. When connecting multiple artificial neurons, we get a multi-layered web of algorithms called a neural network.

Exhibit 13: Biological Neuron Exhibit 14: Artificial Neuron

Source: Towards Data Science, Oppenheimer & Co. Estimates Source: Towards Data Science, Oppenheimer & Co. Estimates
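A minimal sketch of the artificial neuron described above: inputs (the “dendrites”) are multiplied by weights, summed with a bias (the summation function), and passed through an activation function (the output). The input values, weights, and choice of a sigmoid activation are arbitrary illustrative assumptions.

```python
# One artificial neuron: weighted sum of inputs plus bias, passed through an activation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))        # squashes the sum into the range (0, 1)

inputs  = np.array([0.8, 0.2, 0.5])        # the "dendrites" (arbitrary example values)
weights = np.array([0.4, -0.7, 0.9])       # strength of each connection
bias    = 0.1

summation = np.dot(inputs, weights) + bias   # the node/summation function
output    = sigmoid(summation)               # activation function = the neuron's output
print(output)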

Artificial Neural Networks are computational networks inspired by biological brains. In the context of AI, neural networks are a collection of artificial neurons, or interconnected nodes (mathematical functions), working together to create the deep learning algorithm. The basic structure of a neural network contains three main layers: 1) input layer, 2) hidden layer, and 3) output layer. The input layer represents the external raw data fed into the neural network. The real training process is performed in the hidden layer, which includes a collection of nodes (or neurons) that perform mathematical calculations. The term “deep” refers to the hidden layers in a neural network. A traditional neural network contains 2–3 hidden layers as shown in Exhibit 15; deep neural networks can have as many as 150 layers and are continuing to grow. The connections between nodes are assigned weights that represent each node’s importance. At each node, calculations are performed based on the associated weight. The results are summed, and a bias term is added. The results are then passed through an activation function, which determines whether the calculation satisfies certain criteria. If it doesn’t, no data is passed from that node; if it does, data is passed to the next layer in the network, and the process repeats until data arrives at the output layer, where the final result is presented. The initial data run through a neural network may yield undesired results. A key concept in neural networks is back-propagation: after comparing the predicted result with the desired result, the algorithm can travel back from the output layer to earlier nodes in the hidden layers and self-adjust the weights. The process repeats and, in essence, the model “trains itself” until the desired output is achieved.


Exhibit 15: Neural Network

Source: Oppenheimer & Co. Estimates
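A minimal sketch of such a network using the Keras API covered later in this report, assuming TensorFlow is installed; the layer sizes and synthetic data are illustrative assumptions. Calling fit() performs the forward pass, loss comparison, and back-propagation weight updates described above.

```python
# Small fully connected network: input layer -> two hidden layers -> output layer.
import numpy as np
import tensorflow as tf

# Synthetic binary classification data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),                        # input layer (4 raw features)
    tf.keras.layers.Dense(16, activation="relu"),      # hidden layer 1
    tf.keras.layers.Dense(8, activation="relu"),       # hidden layer 2
    tf.keras.layers.Dense(1, activation="sigmoid"),    # output layer
])

# Back-propagation adjusts the weights/biases of every node on each pass over the data.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, batch_size=32, verbose=0)
print(model.predict(X[:3]))    # inference on a few samples
```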

AI Applications

Artificial intelligence has the disruptive ability to impact applications across industries. While AI is utilized in a variety of applications today, its deployment remains in the very early innings relative to its eventual potential. Some examples of common AI applications include virtual assistants, image recognition, fraud detection, spam filtering, navigation systems, and electronic payments. Underpinning these applications are three fields of AI research concentrated on extracting and understanding real world data: 1) image processing, 2) natural language processing, and 3) recommendation systems. In this section, we explore these applications and how AI is utilized across various industries.

Image Processing involves the application of mathematical functions and transformations (e.g., smoothing, sharpening, contrasting, and stretching) to raw images. By adjusting or tuning certain parameters, the resulting image can yield higher resolution, normalized brightness, improved contrast ratios, cropping, edge detection, and other enhancements. The processed image increases a machine’s ability to analyze specific data elements. Image processing is a subset of computer vision, which attempts to teach machines how to perceive and understand visual information, with the goal of replicating human vision traits. Rapid advancements in ML algorithms, specifically deep learning, along with improved compute processing power, have enabled real-time image inference.
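As a small sketch of the kind of transformation described above (the kernel values and the random stand-in “image” are illustrative assumptions): a 3x3 sharpening filter applied to a grayscale image with SciPy.

```python
# Illustrative image sharpening: convolve a grayscale image with a 3x3 kernel.
import numpy as np
from scipy.ndimage import convolve

image = np.random.default_rng(0).uniform(0, 255, size=(64, 64))  # stand-in grayscale image

# Classic sharpening kernel: boosts the center pixel relative to its neighbors.
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)

sharpened = convolve(image, sharpen, mode="nearest")
sharpened = np.clip(sharpened, 0, 255)      # keep pixel values in a valid range
print(image[:2, :2], sharpened[:2, :2], sep="\n")
```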

Natural Language Processing (NLP) provides machines the ability to read, interpret, and derive meaning from human languages. Traditional computer languages (e.g., C#, Java, Python) follow a specific instruction set and run only when free of syntax errors. Human (natural) languages contain complex features such as large, varied vocabularies, grammar/syntax rules, semantics, word play, context, and linguistic differences. This makes human languages diverse and complex; however, humans can effectively interpret these contextual complexities. Such complexities are difficult to seamlessly program into machines. For these reasons, NLP is more challenging than image processing. For example, parameters in image processing models are growing ~2x/year while NLP model parameters are growing ~10x/year.

Recommendation Systems provide personalized products, services and information to users based on data analysis. The recommendation system can derive predictions from a variety of factors, including the behavior and history of
the user and/or other users. Recommendation systems are being used to transform the digital world, reflecting the experiences, behaviors, preferences, and interests of the user. They provide quick, to-the-point suggestions tailored to each user’s needs and preferences. These applications already operate behind the scenes at many content providers. As such, recommendation systems play an essential role in the generation of accurate search results (Google), streaming content (Netflix), online shopping (Amazon), social media (Facebook), and so on. Past recommendation systems largely relied on traditional machine learning techniques (i.e., content-based filtering, collaborative filtering). The emergence of deep learning is driving renewed interest, as new models deliver meaningful improvements over traditional methods.

Exhibit 16: Examples of AI Applications

Image Processing: Facial Recognition, Object Detection, Color Enhancement, Edge Detection, Handwriting Recognition, Image Restoration, Pattern Recognition
Natural Language Processing: Sentiment Analysis, Speech Recognition, Chatbots, Text Extraction, Machine Translation, Intent Classification, Urgency Detection, Text Summarization
Recommendation Systems: Online Shopping, Movie Suggestions, Customized Playlists, Personalized News, Job Recommendations, Search Queries, Restaurant Suggestions, Social Networking

Source: Algorithmxlab, Edureka, Oppenheimer & Co. Estimates

Industries

AI, as a general-purpose technology, has use-cases spanning nearly every industry. Its pervasiveness drives innovation, transformation, and generates broad economic opportunities for its adopters. Exhibit 17 outlines several industries adopting AI and some common applications.

Exhibit 17: Select Industries and Examples of AI

Agriculture: Crop and Soil Monitoring, Agricultural Robots, Predictive Analytics
Education: Personalized Learning, Voice Assistants, AI Tutors
Finance: Algorithmic Trading, Fraud Detection, Credit Underwriting
Healthcare: Drug Discovery, Interpretation of Images, Robot-Assisted Surgery
Law: Review Processes, Draft Documents, Analyze Contracts, Predictive Results
Media & Entertainment: Video Quality Upscaling, Image Restoration, Personalized News Recaps
Manufacturing: Industrial Automation, Domestic Robotics
Transportation: Robotaxis, Drones, Space Exploration, Travel Assistance

Source: Algorithmxlab, Edureka, Oppenheimer & Co. Estimates


Case Study: History of AI Cycles

The history of artificial intelligence has been characterized by two hype cycles (with booms and busts). Hype is common with emerging technologies, such as the railway mania of the 1840s, the dot-com bubble of the 1990s, and most recently the crypto bubble in 2018. As Gartner describes, the potential of new technologies triggers interest; publicity then fuels growth to a “peak of inflated expectations,” followed by a steep decline into a “trough of disillusionment” as interest wanes and technologies fail to deliver. For AI, the confluence of exuberance and elevated expectations led to subsequent fall-out periods known as AI winters—periods where interest, research, and funding declined. In this section, we explore prior boom cycles and AI winters, and highlight reasons we believe AI is at the beginning of another boom and here to stay.

Golden Years—1950s—The golden years of AI came during the 1950s, which saw an explosion of ideas and concepts in the field of “thinking machines.” The famous Turing Test, a test to determine whether a machine can imitate a human, was proposed by Alan Turing in 1950. This period saw other groundbreaking innovations such as semantic networks (a precursor to neural networks), the perceptron (a simple conception of a neuron that could recognize 20x20 pixel images), and the first Russian-English machine translation using a 250-word vocabulary. This period of innovation culminated at the Dartmouth Conference in 1956, where the term artificial intelligence was coined. Hype and potential prompted funding from DARPA. Machine translation was particularly important given the geopolitical climate of the Cold War. With plentiful government funding and breakthroughs in research, optimism was high for the future of AI.

AI Winter—1970s—Inflated expectations and a failure to deliver on those promises led to the first AI Winter in the 1970s. Reports commissioned by government agencies to assess the state of AI research concluded it was expensive, had made little progress, and was unlikely to achieve anything useful. In retrospect, the demise can be attributed to two obstacles: 1) limits on the definition of AI and what problems it could solve; and 2) limited computational power. Little to no progress was made on machine translation. Eventually DARPA, the primary contributor to AI research funding, pulled its support.

AI Boom—1980s—Interest in AI research was rekindled in the 1980s with the rise of “expert systems” such as XCON and dedicated AI hardware from LISP Machines and Symbolics. These systems simulated human decision-making to answer questions within a specific knowledge domain. They followed rule-based “if-then” logic and were implemented by corporations to reduce labor costs, helping automate and simplify decision-making tasks across industries. Government funding returned to support expert systems, and the field also attracted interest from the public and the tech sector. Once again, media attention and hype elevated expectations for AI.

AI Winter II—1987 to 1993—The second AI Winter began in 1987. Expectations for expert systems had spiraled out of control. Expert systems were criticized for being expensive, lacking common-sense results, and not being “true AI.” Queries for information yielded convoluted, pre-programmed answers. Expert systems were difficult to update, could not learn, and were unable to handle multiple inputs. Meanwhile, desktop computers from Apple and IBM were steadily improving in compute performance and gaining commercial adoption. With the crash of world financial markets in 1987, companies and governments pulled back funding for AI research. Eventually, expert systems became obsolete with the rise of personal computers.


AI Boom—2010s to Present—The tech sector saw tremendous growth during the mid-90s with advances in compute performance (hardware and software) and the evolution of the internet. During this time, fundamental aspects of the technology ecosystem laid the groundwork for more realistic AI expectations. A new AI boom was triggered in the early 2010s with advances in machine learning and deep learning techniques. Neural networks became a reality as data collection/storage capabilities became abundant and compute resources more affordable. Government investment has picked up, along with funding from enterprises, academia, and venture capital. We believe we’re in the early stages of a “forever” AI growth cycle.

Exhibit 18: Artificial Intelligence Historical Hype Cycle

Source: Oppenheimer & Co. Estimates

While earlier AI cycles brought innovation, over-inflated expectations led to disappointment. Over the last few decades, the field of AI has steadily progressed with a more grounded approach. Interest in AI has picked up and “artificial intelligence” has once again become a buzzword, leading to speculation of another hype cycle. We believe the current AI cycle is in the early stages, and is more durable than past cycles for several reasons. 1) As opposed to over-inflated expectations in prior cycles focused on lofty end goals, our discussions with AI thought leaders suggest more realistic expectations on silicon hardware, full stack software, systems integration, neural network models, cloud computing, cloud-to-edge, and ML/DL techniques. 2) Whereas AI development in prior cycles was led by research institutions and government funding, today, commercial enterprises have made AI core to the future of their businesses. Large tech companies have dedicated AI research teams and venture capitalists are increasingly making investments. 3) Prior cycles failed to deliver viable products. Today, we’re seeing AI applications grounded in practicality, with products including image tagging, personalization of news feeds, product recommendations, voice-to-text translations, grammar/spelling checks, and personal AI chat-bots to name a few. AI remains an evolving field with repeated changes in sentiment. In our view, AI is here to stay.


Moore’s Law and the Implications on the Semiconductor Industry

Moore’s Law: Industry Guide to Innovation in the Last Half Century

Semiconductors are the foundation of the digital age. In 1956, the same year “Artificial Intelligence” was coined at Dartmouth, William Shockley founded Shockley Semiconductor Laboratory in Palo Alto, CA. This would be the first commercial establishment to develop semiconductors and lay the foundation for what would become “Silicon Valley.” Since the introduction of the microprocessor, the pace of innovation has been relentless, giving birth to new technologies like the personal computer, the internet, and smartphones. Increasing numbers of integrated circuits have been produced every year; in 2018, over one trillion integrated circuits (ICs) were produced. Already ubiquitous in everyday devices, IC proliferation continues unabated. For the most part, silicon innovation has relied on transistor shrinkage for performance gains. As transistors reach their physical limitations and demand for unique AI workloads grows, we’ll see less emphasis on increased transistor density and more focus on performance optimization and acceleration.

Moore’s Law has been viewed as the IC industry’s innovation road map for the last 55 years. Integrated circuits are assembled by building many transistors into a chip instead of individually wiring them together. In 1965, Gordon Moore, who co-founded Intel, published his observations on transistors in integrated circuits in an article titled “Cramming more components onto integrated circuits.” From his observations, Moore predicted the number of transistors in an integrated circuit would double every year, revised in 1975 to a doubling every two years. His prediction became Moore’s Law and foretold future technologies, including “home personal computers, automatic controls for automobiles and personal portable communicating equipment,” which have now all become ubiquitous. Companies and research institutions alike regarded Moore’s Law as a principal guideline, dedicating time and resources to ultimately fulfill this prophecy. Companies like Intel and AMD rode the wave, sustaining growth through general-purpose CPUs. The first commercial microprocessor, the Intel 4004 (1971), had a transistor count of 2,250. By comparison, today AMD’s EPYC (Rome) processor has 39.5B transistors and AI startup Graphcore’s 2nd-gen Colossus MK2 processor contains 59.4B transistors. While these trends apply to microprocessors, similar trends can be seen in other ICs. For example, in NAND memory, price per gigabyte has decreased 37% annually since 2000.
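A quick back-of-the-envelope check on the doubling cadence cited above (our own illustrative arithmetic): compounding the Intel 4004’s 2,250 transistors at one doubling every two years lands in the same order of magnitude as today’s chips.

```python
# Compound the 1971 Intel 4004 transistor count at one doubling every two years.
start_year, start_transistors = 1971, 2_250
for year in (1981, 1991, 2001, 2011, 2019):
    doublings = (year - start_year) / 2
    projected = start_transistors * 2 ** doublings
    print(year, f"{projected:,.0f}")
# The 2019 projection is ~3.8e10, the same order of magnitude as AMD Rome's 39.5B transistors.
```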

Exhibit 19: Growth of CPU Transistors (No. of Transistors); Exhibit 20: NAND Flash ASP per GB ($/GB)

[Charts: CPU transistor counts (Log10 scale) from the Intel 8086 through the Pentium IV to AMD Rome, 1970–2020; NAND flash ASP declining from $7.94/GB in 2007 to $0.12/GB in 2020]

Note: Transistors axis is Log10 scale
Source: Gartner, Oppenheimer & Co. Estimates


The doubling of IC transistors was made possible by shrinking the dimensions of transistor nodes, in turn, improving performance and power efficiency. For example, the Intel 4004 was manufactured on a 10-micron (10,000nm) node while the AMD Rome and Graphcore Colossus MK2 processors are manufactured on 7nm geometry 1,400x smaller. Chip makers are already manufacturing on smaller 5nm nodes, with 3nm projected for volume production in 2H22 (per TSMC).

Performance gains started to moderate, despite successive node shrinkage, beginning in 2005 with the demise of Dennard Scaling. Linked to Moore’s Law, Dennard Scaling stated that as transistors get smaller, their power density remains constant, allowing manufacturers to raise clock frequencies ~1.4x per generation without impacting power consumption and heat generation. Therefore, integrated circuits were roughly 40% faster every generation. This concept held true until transistors approached sub-65nm nodes and clock speeds approached 4.0 GHz, where the effects of quantum mechanics presented a greater challenge. One of these effects is quantum tunneling—a phenomenon where electrons jump the barrier between the source and drain because the distance between them is so small. This leakage created excess heat and higher energy consumption. It became apparent that the tradeoff between higher clock speeds and power was unfavorable. Thus, the industry collectively capped clock speeds around 4.0 GHz and shifted focus to architectural design.
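A short numerical sketch of the idealized Dennard relationship described above (textbook scaling assumptions, not measured device data): with capacitance and voltage scaled down by 1/k and frequency raised by k, power per transistor falls by 1/k², exactly offsetting the k² increase in transistor density, which is why power density stayed constant.

```python
# Idealized Dennard scaling: dynamic power per transistor P ~ C * V^2 * f.
k = 1.4                      # ~0.7x linear shrink per generation => k = 1/0.7 ≈ 1.4

C = 1 / k                    # capacitance scales with feature size
V = 1 / k                    # supply voltage scales down
f = k                        # clock frequency scales up ~1.4x per generation

power_per_transistor = C * V**2 * f     # = 1/k^2 of the previous generation
density = k**2                          # transistors per unit area increase by k^2
power_density = power_per_transistor * density
print(round(power_per_transistor, 2), round(power_density, 2))   # ~0.51x per transistor, ~1.0x density
```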

Exhibit 21: Moore’s Law

[Chart: transistors (thousands), single-thread performance (SpecINT x10³), frequency (MHz), power (Watts), and number of cores, 1970–2020, Log10 scale]

Source: Github/Karlrupp, Oppenheimer & Co. Estimates

Over time, innovation in multi-core designs, architectural enhancements (e.g., FinFET, SuperFin), chiplets, 3D scaling, advanced packaging techniques, and transistor scaling has contributed to performance increases. Unfortunately, these performance gains were more gradual than the speed-ups from significant clock frequency increases. Multi-core processors also face another phenomenon known as dark silicon. With dark silicon, power levels are unable to efficiently support the operation of all cores, resulting in non-operational transistors spanning as much as 80% of chip area at a 5nm node. The benefits of incrementally smaller transistors have become less apparent from a performance/cost perspective. Performance/speeds have plateaued while costs have been rising. Packing transistors closer together continues to increase heat and resistance, negating speed and power gains. Higher-cost equipment such as advanced EUV lithography further fuels rising capital intensity. Global semi equipment spending topped $107B in 2020, a 4x increase since 2009, and is forecast to exceed $130B by 2025. These challenges are most apparent at Intel, the semiconductor process leader for most of the last 50 years. Recently, Intel has struggled with multi-year production delays at its 10nm node and has also pushed back its 7nm roadmap.

Exhibit 22: Worldwide Semiconductor Capital Equipment Spending ($B)

[Chart: worldwide semiconductor capital equipment spending ($B) and Y/Y growth (%), 2011–2025E]

Source: IDC, Oppenheimer & Co. Estimates

A New Compute Paradigm: Emergence of Specialized AI Silicon

Moore’s Law is rooted in the physics of transistor shrinkage, and the industry has extended it of late mostly through physical enhancements. We’re now seeing an increased focus on raw computing performance via new hardware and software methodologies. Researchers have increasingly turned to GPU hardware as AI computation outpaces Moore’s Law. A 2018 analysis from OpenAI shows that prior to 2012, AI tracked closely with Moore’s Law—compute performance doubled every two years. Post 2012, AI training compute has doubled every ~3.4 months (growing ~300,000x since 2012). More recently, NVIDIA has suggested AI models are doubling every two months. This performance is attributable to a confluence of new deep learning algorithms coupled with specialized silicon hardware and software.
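A quick sanity check on the doubling rates cited above (our own arithmetic, with the time span an assumption consistent with the quoted figures, not OpenAI’s code): roughly 300,000x growth over the span from AlexNet to the largest 2017-era training runs works out to a doubling time of about 3.4 months, versus the 24-month Moore’s Law cadence.

```python
# Sanity check: ~300,000x growth in training compute over ~5.2 years implies
# a ~3.4-month doubling time, vs. 24 months under Moore's Law.
import math

growth = 300_000
months = 5.2 * 12                                # assumed span, ~2012 to late 2017

doublings = math.log2(growth)                    # ~18.2 doublings
implied_doubling_time = months / doublings       # ~3.4 months
moore_growth = 2 ** (months / 24)                # what a 2-year doubling would deliver

print(f"implied doubling time: {implied_doubling_time:.1f} months")
print(f"Moore's Law over the same span: ~{moore_growth:.0f}x vs. {growth:,}x")
```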


Exhibit 23: Two Distinct Eras in AI Training Models (PFLOP/s-days)

[Chart: training compute (PFLOP/s-days, Log10 scale) for notable models from NETtalk, LeNet-5, and TD-Gammon v2.1 through Deep Belief Nets, DQN, AlexNet, ResNets, Neural Machine Translation, and AlphaGoZero, 1980–2020, contrasting the 2-year doubling era (Moore's Law) with the 3.4-month doubling era]

Source: OpenAI, Oppenheimer & Co. Estimates

AI performance saw a significant boost with the 2012 release of the AlexNet model. AlexNet used GPU accelerators and convolutional neural networks to top its competitors (by a wide margin) in an image classification competition. CPUs are serial processors, typically with 2–64 cores computing sequentially. GPU architectures, alternatively, consist of thousands of cores, allowing thousands of computations to run in parallel. Initially utilized for rendering graphics at high frame rates (largely for gaming), GPUs found a massive new market opportunity in high performance computing (HPC). NVIDIA, the high-performance GPU leader, was quick to capitalize on this opportunity, releasing software and GPUs dedicated to AI and HPC applications. As a result, NVIDIA’s market value increased >50x from ~$7B in 2012 to >$400B today, surpassing Intel to become the most valuable semiconductor company. NVIDIA’s AI success hasn’t gone unnoticed, leading to the rise of multiple AI startups. Some in the industry argue GPUs are not optimized for AI workloads, while others envision lower-cost, purpose-built silicon to accelerate specific AI workloads. FPGAs and ASICs figure to play a significant role in the growth of AI. A new crop of AI entrepreneurs is designing chips from the ground up, specialized for AI. Hyperscale cloud service providers (e.g., Google, Amazon) have simultaneously pursued in-house AI chip designs tailored to their proprietary workloads. As Moore’s Law winds down its amazing 50-year run, new innovations in AI silicon will define computing for the next half century.


AI Hardware: CPU, GPU, ASIC, FPGA, DPU

AI Silicon: It Starts with the Hardware

The use of artificial intelligence in datacenters and edge computing has created a market for specialized AI hardware. Given the slowing of Moore’s Law and the shift to heterogeneous system architectures, we’re seeing expanding opportunities for all types of processors. We estimate the global AI hardware platform market at $25B in 2020, growing at a 34% CAGR to a $105B TAM by 2025. We estimate GPUs reaching 25% of the AI hardware platform market by 2025. We see the more nascent ASIC market as the fastest growing, exceeding 10% of the AI hardware platform market by 2025. We see the DC/Cloud infrastructure market at $6B in 2020, growing to $21B by 2025, and the larger edge market at $19B in 2020, expanding to $84B, or 80% of the AI market, by 2025.

Exhibit 24: AI Hardware Platform by Processor Type Exhibit 25: AI Hardware Platform by End Market

[Charts: AI hardware platform TAM growing from $25B in 2020 to $105B in 2025, split by processor type (CPU, GPU, ASIC, FPGA, Other) and by end market (Edge, DC/Cloud Infra)]

Note: Other includes DPU, MCU, DSP, and ASSP; Edge includes Auto, IoT, and Endpoint (PCs, Smartphones, Tablets, etc.)
Source: IDC, Gartner, Oppenheimer & Co. Estimates
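For transparency, the compounding arithmetic behind the TAM figures above is shown below; the segment-level rates are simply derived from the dollar estimates already cited, not separate forecasts.

```python
# CAGR = (ending value / beginning value)^(1 / years) - 1
begin, end, years = 25.0, 105.0, 5          # $B, 2020 -> 2025
cagr = (end / begin) ** (1 / years) - 1
print(f"Total AI hardware platform: {cagr:.1%}")   # ~33.2%, in line with the ~34% cited

# Same math for the segments: DC/Cloud infra $6B -> $21B, Edge $19B -> $84B
for name, b, e in [("DC/Cloud Infra", 6, 21), ("Edge", 19, 84)]:
    print(name, f"{(e / b) ** (1 / years) - 1:.1%}")
```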

There are five main hardware technologies used for AI/ML workloads: the central processing unit (CPU), graphics processing unit (GPU), field-programmable gate array (FPGA), application-specific integrated circuit (ASIC), and data processing unit (DPU). For most of computing history, the CPU has been the primary system processor, complemented by co-processors. Co-processors are specialized hardware used to offload specific tasks, such as mathematical calculations or sound and graphics, thereby alleviating the burden on the CPU. The CPU has preserved its staying power, propelled by consistent performance improvements and the versatility of its general-purpose nature. AI requires unique techniques and computations, requiring new architectures to meet dynamic performance needs.

Training a neural network involves the use of billions of data points and calculations. Parameters and weights are repeatedly updated and re-prioritized to optimize a model. The sheer volume of data and these repetitive computations have implications for hardware preference. Serial computing—calculations performed sequentially, one at a time—has traditionally been dominated by the CPU. However, as new processor architectures have come to market, CPUs have been displaced for training, as serial speeds struggle to keep pace with parallel compute. Parallel computing—performing multiple calculations simultaneously—has become essential to driving higher throughput and faster training speeds. Additionally, given the large data volumes flowing through compute and memory cores, proximity between the two plays a critical role in enabling faster speeds and efficiency. Inference has a unique set of power and performance requirements, whether executed in a datacenter or on an edge device. Accordingly, researchers utilize different hardware to optimize performance for different AI use cases. Chip designers, compelled to increase AI compute performance, have integrated AI-specific cores into processors or have built entirely new architectures to accelerate AI/ML workloads. These AI-specific hardware designs are referred to as AI accelerators. Here we break down some applications of AI accelerators.

Exhibit 26: Serial Computing Exhibit 27: Parallel Computing

Source: Oppenheimer & Co. Estimates Source: Oppenheimer & Co. Estimates
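A small sketch of the serial-vs-parallel distinction, illustrative only: it uses Python's multiprocessing on a toy workload of our own choosing, whereas real AI accelerators parallelize at a far finer grain in hardware. The same batch of independent computations is run one at a time and then across a pool of worker processes.

```python
# Toy comparison of serial vs. parallel execution of independent tasks.
import time
from multiprocessing import Pool

def work(n):
    # Stand-in for an independent computation (e.g., one row of a matrix multiply).
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    tasks = [200_000] * 16

    t0 = time.time()
    serial = [work(n) for n in tasks]            # one at a time, like a single CPU core
    t1 = time.time()

    with Pool(processes=4) as pool:              # several workers, like parallel cores
        parallel = pool.map(work, tasks)
    t2 = time.time()

    print(f"serial:   {t1 - t0:.2f}s")
    print(f"parallel: {t2 - t1:.2f}s")           # faster when tasks are independent
    assert serial == parallel
```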

CPU: x86 and ARM

The Central Processing Unit (CPU) is the main processor in a computer system, responsible for executing the binary calculations (1s and 0s) of an instruction set. Often referred to as the “brain” of a device, the flexibility of the CPU has led to its dominance for most of modern computing. CPUs are ubiquitous. In AI, while CPUs are not ideally equipped to support the intensive demands of neural network training, they are still widely used for inference workloads.

CPUs are general-purpose processors capable of effectively performing a wide range of computations. In 1978, Intel released the 8086 processor, setting the precedent for the x86 processors manufactured today. These processors are based on the Complex Instruction Set Computer (CISC) architecture. With the success of the x86 processor family and continued performance gains from Moore’s Law, Intel established dominance in the desktop/notebook PC and server markets. Today, Intel and AMD are the primary designers of x86 processors, with Intel still holding dominant market share, although AMD has recently gained significant ground. The complexity of x86 processors allows for higher peak performance but with higher power consumption relative to ARM (discussed below).

Exhibit 28: PCs (Desktop and Notebook) x86 CPU Units Market Share (Intel vs. AMD)
Exhibit 29: Server x86 CPU Units Market Share (Intel vs. AMD)
Source: IDC, Oppenheimer & Co. Estimates


ARM (Advanced RISC Machines) processors have become increasingly relevant as electronics transition to smaller, more portable form factors and larger devices strive for greater power efficiency. The Reduced Instruction Set Computer (RISC) architecture has a simpler, less complex instruction set compared to x86 (CISC), therefore consuming less power. This attribute makes ARM desirable for smartphones, tablets and IoT devices, where power efficiency is critical given a battery's limited power supply. By their nature, x86 processors were ill-suited to device portability and longer battery life. As such, the electronics industry turned to ARM processors for their simpler design, lower cost, better power efficiency and lower heat generation. ARM-based processors are beginning to gain traction in laptops and even in servers, encroaching on markets where x86 dominates. Notably, NVIDIA recently (April 2021) announced its first CPU, Grace, an ARM-based processor designed for large-scale AI and HPC in the datacenter (launching in 2023).

Exhibit 30: ARM Chips Sold (Billions)

Annual ARM chip shipments reached ~25B units in 2020 (2000–2020 shown), with more than 190B cumulative ARM chips sold.

Source: ARM, Oppenheimer & Co. Estimates

Arm Ltd developed the ARM architecture and licenses its technology platform to chip designers/OEMs (e.g., Apple, Qualcomm, Samsung, etc.), who in turn, integrate ARM into their own SoC (system on chip) designs. ARM generates revenue from: 1) upfront license fees; and 2) royalties per chip. As of 2020, there were more than 190 billion ARM-based processors in devices cumulatively worldwide (>25B in 2020 alone). SiFive is an emerging competitor with technology based on RISC-V. RISC-V is an open-source architecture that allows for customization of instruction sets to optimize performance and power for specific tasks.

CPUs are fairly generic and less practical for intense AI/ML training workloads. CPUs execute calculations sequentially rather than in parallel. Most computers today still use the von Neumann architecture, developed in 1945. Because the design specified separate memory and computing cores, performance is limited by the speed of memory access, as the connection is shared through a common bus. This created a data flow choke point known as the "von Neumann bottleneck." Despite this limitation, CISC CPUs (e.g., Intel Xeon, AMD EPYC) are still used for datacenter inference workloads. RISC-based ARM processors have emerged for datacenter general purpose compute (e.g., AWS Graviton and Huawei Kunpeng) and for inference workloads (e.g., AWS Inferentia). RISC and RISC-V will be significant for inference in edge devices. RISC-V has also appeared in datacenter AI use cases (e.g., Alibaba Hanguang).


GPU

Graphics Processing Units (GPUs) are specialized processors originally designed to offload and accelerate image rendering, animation, and video processing from the CPU. Today, advanced GPUs perform complex calculations for big data research, HPC, AI/ML, and training neural networks.

On-board graphics co-processors for consumer applications existed in the '80s; when video game graphics transitioned from 2D to 3D in the '90s, add-in graphics cards referred to as 3D accelerators rose to prominence. NVIDIA's iconic GeForce 256, released in 1999, was marketed as "the world's first GPU," giving recognition to this new category of programmable graphics accelerator.

Gaming graphics present a computationally intensive workload. Gaming has become increasingly complicated as games require faster frame rates, environments mimic real-world physics, and visuals approach photorealism. Games require billions of computations per second. AR (augmented reality) and VR (virtual reality) acceptance will accelerate this trend. Computational complexity essentially required GPUs to be parallel computing devices. Unlike CPUs with two to 64 cores, GPUs contain thousands of cores, enabling massive parallel processing. For example, NVIDIA's GeForce RTX 3090 has 10,496 cores. All else equal, the higher the core count, the higher the processing power. NVIDIA brands its GPU cores as "CUDA cores," while AMD calls theirs "Stream Processors." With the GPU's significant success in graphics workloads, data scientists began applying its massive parallel computing capability to other applications.

Exhibit 31: PCs (Desktop and Notebook) GPU Units Market Share (NVIDIA vs. AMD)
Exhibit 32: Server GPU Units Market Share (NVIDIA vs. AMD)

Source: IDC, Oppenheimer & Co. Estimates

In the mid-2000s, researchers realized GPUs were well suited to handle high performance computing (HPC) applications such as machine learning, oil exploration, weather modeling, scientific computing, genomics, financial risk modeling, etc. These applications accelerated when NVIDIA launched CUDA, a computing platform/API that streamlined software programming to harness the multi-core performance of GPUs for general purpose computing (GPGPU). GPGPU computation targeted workloads traditionally done by CPUs and extended the GPU's applicability beyond graphics. Arguably, the most widely adopted GPU application today is artificial intelligence, and more specifically deep learning.
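As a rough, hedged illustration of GPGPU acceleration (our own toy example, not a vendor benchmark), the sketch below times a large matrix multiplication on the CPU and, where PyTorch can see a CUDA-capable GPU, on the GPU; the matrix size is an arbitrary assumption and results will vary widely by system.

```python
import time
import torch

def time_matmul(device: str, n: int = 4096) -> float:
    """Time an n x n matrix multiplication on the given device."""
    x = torch.randn(n, n, device=device)
    y = torch.randn(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # ensure setup is finished before timing
    start = time.perf_counter()
    result = x @ y                # the heavy parallel workload
    if device == "cuda":
        torch.cuda.synchronize()  # wait for the GPU kernel to complete
    return time.perf_counter() - start

print(f"CPU matmul: {time_matmul('cpu'):.3f}s")
if torch.cuda.is_available():
    print(f"GPU matmul: {time_matmul('cuda'):.3f}s")
else:
    print("No CUDA GPU detected; skipping GPU timing.")
```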


Recognizing the GPU’s value and AI market opportunity, NVIDIA released the Tesla V100 GPU in 2017. The Tesla V100 featured new “Tensor Cores,” designed specifically for AI neural network computing. Driven by its best in-class AI accelerators, NVIDIA became the market leading AI hardware provider with ~99% datacenter training share. NVIDIA has since launched its T4 Tensor Core GPU AI accelerator for inference and the Ampere- based A100 for training/inference workloads. (Discussed in more detail later in this report). The success of GPU led other manufacturers to re-think AI-specialized hardware. AMD repurposed its GPGPU into its RDNA (Radeon for gaming) and CDNA (Compute for HPC/AI) technologies, the latter driving its Instinct MI100 GPU for datacenters. Additionally, after exiting the discrete GPU market on more than one occasion, Intel has returned with its Xe product line.

Exhibit 33: NVIDIA Gaming GPU FP32 TFLOP Exhibit 34: Datacenter AI Accelerator GPU

FP32 TFLOPS shown for NVIDIA gaming GPUs (GTX 1060 through RTX 3090) and for datacenter AI accelerator GPUs (V100, A100).

Source: NVIDIA, Oppenheimer & Co. Estimates

ASIC

Application Specific Integrated Circuits (ASICs) are custom-designed integrated circuits for specific applications. While GPUs generally outperform CPUs for AI/ML workloads, GPUs are not necessarily optimal for every artificial intelligence application. As such, a wave of startups developing purpose-built AI silicon has emerged to challenge GPU dominance.

The recent increase in ASIC designs for the AI accelerator market has brought with it wide variations in naming parlance. Neural processing unit (NPU) and deep learning processor (DLP) are commonly cited. Google calls its AI accelerator "the TPU" (tensor processing unit). Startup Graphcore calls its processor "the IPU" (intelligence processing unit). The vision processing unit (VPU) for machine vision has been used by Intel. Other references include neural net processor (NNP), reconfigurable dataflow unit (RDU) and graph streaming processor (GSP), among others. These chips generally contain AI-specific cores that accelerate deep learning algorithms through parallel computing.

As ASICs are tailored for specific (fixed) applications, they should theoretically yield higher performance with better power efficiency vs. programmable solutions. In reality, there are a number of factors determining performance, power and cost that manufacturers need to consider. Designing a new ASIC chip requires substantial capital investment and an experienced engineering team. Designs need frequent updates (e.g., new generations) to keep up with the latest neural network models and manufacturing processes. Additionally, there needs to be high volume demand to justify the cost: the higher the volume, the lower the production cost per chip. Lastly, software design is


proving increasingly important for optimal ASIC performance, particularly in AI. While hardware design is important, software helps organize and prioritize data flow through the processor, thus helping with performance, power, cost, and accuracy. While some ASIC designers go it alone, others have partnered with existing scaled ASIC players to take advantage of engineering and manufacturing resources (e.g., Google TPU uses Broadcom; Groq TSP uses Marvell). Partnering can make the design process more efficient in terms of functionality, cost, and timeline. Some notable AI accelerator startups for the datacenter include Cerebras Systems, Graphcore, Groq, Habana Labs (Intel), and SambaNova, while companies like Blaize and Mythic focus more on edge compute.

Exhibit 35: Notable AI Startups with Custom ASIC Solutions

Company | Processor | Node | High-Level Attributes
Blaize | El Cano Graph Streaming Processor (GSP) | 14nm | 16 TOPS INT8 compute; 7W power consumption
Cerebras Systems | Wafer Scale Engine-2 (WSE-2) | 7nm | 46,225mm²; 2.6 trillion transistors; 850,000 AI-optimized cores
Graphcore | Colossus MK2 GC200 Intelligence Processing Unit (IPU) | 7nm | 59.4 billion transistors; 250 TFLOPS FP16 performance
Groq | GroqChip Tensor Streaming Processor (TSP) | 14nm | 250 TFLOPS FP16 compute; 1,000 TOPS INT8 performance
Mythic | M1108 Analog Matrix Processor (AMP) | 40nm | Analog/compute-in-memory architecture; 35 TOPS INT8 compute
SambaNova Systems | Cardinal SN10 Reconfigurable Dataflow Unit (RDU) | 7nm | 40 billion transistors; 100s of TFLOPS of compute
Tenstorrent | Grayskull | 12nm | 368 TOPS INT8 performance; 65W power consumption

Source: Company Reports; Oppenheimer & Co. Estimates

FPGA

Field Programmable Gate Arrays (FPGAs) are semiconductor ICs designed to allow future customization. As opposed to the fixed architectures common in ASICs/GPUs/CPUs, FPGA hardware includes configurable logic blocks and programmable interconnects. These allow functionality updates even after the chip has been shipped and deployed. FPGAs are gaining recognition in AI/ML given their flexibility and parallel computing ability.

Commercial FPGAs came into prominence in the 1980s, led by the formation of industry mainstays Altera (in 1983) and Xilinx (in 1984). The '90s saw a period of rapid growth for FPGAs in the networking and telecom industries. The telecom market is a particularly large user of FPGAs, as wireless connectivity standards continually change with generational transitions (e.g., the evolution from 1G to now 5G connectivity). FPGAs are sold "off the shelf," allowing products to enter the market more quickly than ASICs. Equipment manufacturers often use FPGAs to emulate ASIC-type performance on the first version of equipment. Once functionality and standards are settled, vendors often transition to the lower cost, higher performance, and more energy efficient ASIC as a streamlined alternative for high volume applications (e.g., 5G RAN).

More recently, FPGA use has increased in the datacenter for HPC and AI/ML acceleration. FPGA incumbents have become attractive acquisition targets. Intel acquired Altera in 2015, and AMD announced plans to acquire Xilinx in 2020.

Perhaps the biggest reason FPGAs weren't adopted early for AI workloads is their programming difficulty. FPGAs are often cited by engineers as difficult to program. Traditional software is programmed sequentially, whereas FPGAs require concurrent programming throughout all the logic blocks. These programming challenges have led to a limited pool of engineers. However, we have noticed an increasing degree of development resources to


mitigate these drawbacks. There are now more software resources available, such as Zebra from Mipsology. Zebra can facilitate an FPGA replacing a CPU/GPU in a trained neural network. In our view, the flexibility of FPGAs will afford them a role in the future of heterogeneous computing, likely in inference applications.

FPGA investment continues, as providers try to carve out an AI niche. Intel (Altera) has introduced the Stratix 10 NX FPGA AI accelerator and launched several software development kits to facilitate programming. AMD (w/Xilinx) has the Alveo AI accelerator cards and also has its own Vitis software platform. More recently, Achronix announced it will go public via SPAC, making it the only independent public FPGA company once the deal finalizes. Achronix also provides embedded FPGA IP (Speedcore), which has shown increasing traction as chip designers look to incorporate the flexibility of FPGA for certain compute functions. We're seeing more FPGA startups coming out of stealth mode, including Flex Logix, Efinix, and Neuchips. We see FPGAs finding a place in AI/ML as researchers strike a balance between efficiency and flexibility.

Exhibit 36: List of Select FPGA Vendors

Company | FPGA Products
Intel (Altera) | Stratix 10 NX FPGA
Xilinx | Virtex UltraScale+
Achronix | Speedster7t
Flex Logix | InferX X1 Edge Co-Processor
Efinix | Trion, Titanium
Neuchips | RecAccel Accelerator

Source: Company Reports, Oppenheimer & Co. Estimates

DPU

Data Processing Units (DPUs) are specialized processors that offload specific tasks from the CPU. As datacenter infrastructure disaggregates and network speeds exceed 100G/200G/400G and beyond, a new class of processors has emerged to tackle the rising challenge of data movement. Network Interface Cards (NICs) were the prelude to the DPU. As additional features including memory, storage, and accelerators were added to datacenter networks, NICs evolved into a category of their own, increasingly referred to as SmartNICs. SmartNICs not only enable connectivity to the network, but also accelerate a variety of network-specific capabilities including network services, security, and storage functions. Amazon AWS was an early pioneer of SmartNICs with its AWS Nitro product.

Hyperscalers and other cloud service providers are consistently expanding capacity to keep pace with growing data demand. SmartNICs offloaded networking tasks from the CPU, and over time, their role expanded to include storage and security. The industry eventually saw an opportunity to bring SmartNICs to the public cloud and broader market. As such, DPUs were born. The DPU is an SoC generally containing several key elements: high-speed networking interfaces, software-programmable processing cores (often ARM, FPGA, MIPS), accelerator engines (for offloading networking tasks and optimizing AI/ML, security, and storage), high-speed packet processing, memory controllers, and the ability to run its own operating system.

DPUs are best known for offloading networking/infrastructure workloads from the CPU, thus accelerating overall network performance. DPUs are also capable of a variety of networking tasks such as compute offload, storage, security, and data path optimization. Increased demand for cloud services has put pressure on datacenters, and


more specifically the host CPU. By offloading network related tasks to the DPU, datacenters free up CPU capacity, allowing them to more effectively process application-based workloads (e.g., compute instances for cloud clients).

DPUs remain in the early stages of adoption. NVIDIA sees every datacenter having its own DPU platform within the next five years. As DPUs play an increasingly critical role in modern cloud architecture, we've seen corresponding investment and M&A activity, particularly as companies look to strengthen their datacenter positioning. In 2020, NVIDIA closed its acquisition of Mellanox—adding Mellanox's Bluefield DPU (among other networking products) to its market-leading GPU repertoire. Now in its second generation, the Bluefield-2's ARM core processor offloads critical networking, storage, and security features from the CPU, freeing up ~one-third of the CPU's cores for application processing. NVIDIA also recently announced its DOCA 1.0 software stack to better integrate and optimize Bluefield DPU functionality in the datacenter. NVIDIA's upcoming Bluefield-3, sampling in early 2022, is expected to deliver a 10x performance improvement over Bluefield-2.

Also in 2020, AMD announced its planned acquisition of Xilinx. Recall, Xilinx acquired Solarflare for its NIC capabilities in 2019. With Xilinx, AMD will receive an FPGA-based SmartNIC, leading to a three-pronged datacenter product roadmap including CPU, GPU, and FPGA. In April 2021, Marvell closed its acquisition of Inphi, a leader in silicon photonics connectivity. The combination provides Marvell a unique portfolio of datacenter/networking offerings, integrating Inphi's high-speed silicon photonics with Marvell's OCTEON SmartNICs, custom ASICs, and Prestera top-of-rack switches. In 2021, Achronix announced plans to trade publicly, led by its FPGA/eFPGA-driven DPU, 5G, and AI/ML product lines.

The evolving landscape of SmartNICs/DPUs has created market opportunities for both established and emerging semiconductor companies. In addition to the aforementioned offerings from NVIDIA, AMD, Marvell, and Achronix, we note Broadcom and Intel also offer SmartNIC solutions. Broadcom has its Stingray product line while Intel has FPGA-based SmartNICs. Several private companies developing SmartNIC/DPU solutions include Fungible, Pensando, Netronome, and Ethernity Networks.

Exhibit 37: Port Speeds Outpacing CPU Performance Exhibit 38: SmartNIC, DPU Vendors

Exhibit 37 (chart): network port speed per server (Gbps) is outpacing compute cycles per server over 2012–2022E; DPU need is increasing.
Vendors | SmartNIC, DPU Products
NVIDIA/Mellanox | Bluefield DPU
Broadcom | Stingray SmartNIC
Ethernity Networks | ACE-NIC
Netronome | Agilio SmartNIC
Pensando Systems | Distributed Services Card
Marvell | LiquidIO DPU
Intel/Altera | Intel FPGA SmartNIC
Xilinx | Alveo U25 SmartNIC
Napatech | NT200A02 SmartNIC
Fungible | Fungible F1 DPU
Source (Exhibit 37): Oppenheimer & Co. Estimates; Source (Exhibit 38): Xilinx, Oppenheimer & Co. Estimates


Heterogeneous Computing: All Chips Play a Role

The computing industry has been historically dominated by homogeneous computing architectures. Originally, x86 multi-core CPUs were used to execute and manage compute tasks, relegating the GPU to rendering graphics and running simulations. This was inefficient, as the CPU had to process operating system overhead along with application-specific tasks. With the wind-down of Moore's Law, the emergence of compute-intensive fields like AI, and the rise of parallel computing, a new heterogeneous system architecture has emerged and gained traction.

Heterogeneous system architecture, or heterogeneous computing, refers to a system containing multiple types of computing architectures, such as multicore CPUs, GPUs, FPGAs, DPUs, and ASICs working together. The CPU operates as a general-purpose processor that runs the operating system and manages compute resources. The other processors are accelerators for specific types of workloads/computation. FPGAs can accelerate data processing and networking, and GPUs/ASICs perform the heavy-lifting computations. DPUs offload and accelerate networking tasks. Networking fabric plays an increasingly critical role in scaling datacenter performance as workloads scale to hundreds and thousands of nodes. Various interconnects including optical connectivity, InfiniBand, Ethernet, and Omni-Path are used to connect these compute units. Memory and software also play integral roles in unifying the system.
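The toy sketch below, built entirely on our own assumptions, illustrates the division of labor heterogeneous systems aim for: the CPU handles general-purpose preprocessing and orchestration, while the heavy matrix math is dispatched to an accelerator (here, a CUDA GPU if PyTorch can see one).

```python
import torch

# CPU side: general-purpose work (data preparation, control flow).
raw = torch.randn(1024, 512)                              # pretend this arrived from storage/network
features = (raw - raw.mean(dim=0)) / raw.std(dim=0)       # simple normalization on the CPU

# Accelerator side: heavy parallel math is offloaded if an accelerator exists.
accelerator = "cuda" if torch.cuda.is_available() else "cpu"
weights = torch.randn(512, 128, device=accelerator)
activations = features.to(accelerator) @ weights          # offloaded matrix multiply

# Results return to the CPU for downstream, general-purpose handling.
print(f"ran matmul on: {accelerator}, output shape: {tuple(activations.cpu().shape)}")
```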

The primary advantages of heterogeneous computing are higher computing performance, better efficiency, and lower latency. Based on the 2020 TOP500 supercomputers list, 28% of the 500 highest performing supercomputers in the world utilized an NVIDIA GPU accelerator, up from 14% in 2015. The TOP500 serves as a barometer of technology innovation and reflects the latest trends in the computing industry. We believe that heterogeneous computing will represent an increasing share of supercomputer and datacenter architectures, dominated by a combination of GPU, ASIC, and FPGA.

Exhibit 39: New Era in Processor Performance

Three panels chart performance over time: single-thread performance in the Single-Core Era, throughput performance in the Multi-Core Era, and application-specific performance in the Heterogeneous Computing era, each with a "we are here" marker.
Source: Opensource Forum, Oppenheimer & Co. Estimates


AI/ML Software, Frameworks/Libraries; Software 2.0

Software plays a critical role in the operation of AI hardware and development of AI frameworks. Hardware relies on software to execute AI/ML algorithms and to prioritize/optimize data flow through the system. AI model demands are growing well in excess of hardware advances, putting stress on the underlying hardware to achieve incremental processing efficiency. We see software as integral to increasing processor efficiency on an individual level and also as part of a scaled-out system. In speaking with chip designers, a common theme is increasing software investment. Several companies highlighted software engineer headcount exceeding hardware engineer headcount; one startup noted ~two-thirds of its engineering talent is software-focused. We attribute this to the inherent flexibility of software and its ability to improve hardware performance. Additionally, software is less capital-intensive relative to hardware. The massive growth in AI research has led to an evolution in software programming, particularly in machine learning, deep learning frameworks, and libraries. We see the overall AI software market growing at an 18% CAGR to $280B by 2025, with core AI-centric software—technologies critical to the function of the application—accounting for 20%, or $55B.

Exhibit 40: Worldwide Artificial Intelligence Software Forecast ($B)

Worldwide AI software revenue (core plus non-core) grows from ~$125B in 2020 to ~$280B by 2025.

Source: IDC, Oppenheimer & Co. Estimates

Programming Languages

Machine learning is one of the fastest growing fields in computer science. While there is no “best programming language” for ML, some are more appropriate and better suited for specific AI workloads. For example, the R language was created by statisticians in academia for data analysis and modeling. This makes R ideal for biomedical and bioengineering industries, as these fields are largely rooted in AI/ML data analysis workloads. Here we cover some of the commonly used AI computing languages.

Python is the most widely used language in AI/ML due to its simplicity and ease of use. It’s often the go-to language for AI given its broad support and large community of developers. Developers frequently utilize Python for NLP, chatbot, and data mining applications given the breadth of libraries and frameworks.


Additionally, many universities and coding boot camps teach Python as a staple of their curriculums. As a testament to its popularity, there are approximately 8.2 million Python developers in the world.

R is a popular programming language for statistics, data analysis, and graphical visualization. It's commonly used for implementing ML techniques such as sentiment analysis, regression, classification, and decision tree formation. R has seen good traction in health-based industries including bioinformatics, bioengineering, and biomedical.

JavaScript is an object-oriented programming language often used for creating dynamic websites. It is a widely used language that's built on top of standard HTML and CSS, allowing the developer to interact with objects on websites. JavaScript is an unconventional option for high-performance use cases. That said, it's relatively easy to code and may be suitable for certain ML applications such as recognizing and identifying objects on web cameras.

Java is one of the most ubiquitous programming languages. It is broadly used across many applications outside of AI. Within AI, its strengths lie in customer support applications, network security, fraud detection, and other general AI programming. Java also includes graphical add-on packages, which are compelling for projects that require attractive graphic interfaces.

C++ is one of the oldest and most popular programming languages. C++ has been proven reliable, stable, and versatile across a broad range of applications. Accordingly, C++ is often used by developers looking to enhance applications or devices with AI, including IoT, AR/VR, games, and robotics.

Exhibit 41: Survey of Programming Languages Used for AI/ML in Last 12 Months

Python leads at 77% of respondents, followed by R (22%), JavaScript (22%), Java (20%), and C++ (20%).

Note: Survey results are not mutually exclusive. Data as of 3/15/21. Source: Developer Economics, Oppenheimer & Co. Estimates

Deep Learning Frameworks and Libraries

It’s impractical to build brand new software algorithms for every new neural network model. Models are complex and include numerous layers. Frameworks and libraries provide the building blocks for designing, training, and validating deep learning models. At a high level, frameworks and libraries offer pre-written code that solve for common coding problems, allowing developers to focus on the core algorithm. In other words, the developer doesn’t have to “re-invent the wheel” for developing common tasks. The developer can leverage pre-existing code from community repositories such as Github and implement them into their code. There are many machine learning frameworks and 31 TECHNOLOGY / SEMICONDUCTORS & COMPONENTS

libraries targeted for different purposes. A few high-level criteria needed to build a sustainable framework include ease of programming, execution speed, a large/dedicated developer community, and an open-source ecosystem. Here we highlight several of the most popular frameworks:

TensorFlow is by far the most commonly used open-source machine learning framework, created and maintained by Google. TensorFlow offers a robust library for the most complex deep learning neural networks, with use cases spanning text applications, image recognition, sound recognition, time series analysis, and others. It supports multiple programming languages including Python, C++, and R. TensorFlow contains a vast collection of documentation and a deep community for support and updates.
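As a hedged flavor of the framework (our own toy example, not Google's), the sketch below uses TensorFlow's automatic differentiation (GradientTape) to fit a simple linear relationship from data.

```python
import tensorflow as tf

# Fit y = 3x + 2 with a single trainable weight and bias using TensorFlow's
# automatic differentiation (GradientTape).
x = tf.random.normal([256, 1])
y = 3.0 * x + 2.0

w = tf.Variable(tf.random.normal([1, 1]))
b = tf.Variable(tf.zeros([1]))
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

for step in range(200):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(x @ w + b - y))   # mean squared error
    grads = tape.gradient(loss, [w, b])
    optimizer.apply_gradients(zip(grads, [w, b]))

# The learned parameters should approach 3.0 and 2.0.
print(w.numpy().item(), b.numpy().item())
```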

PyTorch is an up-and-coming open-source ML/DL framework developed by Facebook. PyTorch is known for simplicity and customizability, and it expedites the design process to reach production deployment more quickly. PyTorch has an active development community, supporting applications such as computer vision and natural language processing. It was built mainly for Python but also supports C++. PyTorch also comes with CUDA integration out of the box.
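For comparison, a minimal PyTorch sketch (again our own toy example) defines a small network and runs a single training step on random data.

```python
import torch
import torch.nn as nn

# Toy two-layer network for a 10-class classification problem.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(8, 64)              # batch of 8 random samples
targets = torch.randint(0, 10, (8,))     # random class labels

optimizer.zero_grad()
loss = loss_fn(model(inputs), targets)
loss.backward()                          # autograd computes gradients
optimizer.step()                         # one optimization step
print(f"loss: {loss.item():.3f}")
```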

Caffe is a deep learning framework developed at the University of California, Berkeley. Caffe is used by more experienced developers for deep learning. An open-source framework, Caffe is known for its image processing speed and is accordingly suitable for image recognition workloads. Caffe can process over 60 million images per day with the NVIDIA K40 GPU. Caffe is compatible with multiple programming languages including C, C++, Python, and MATLAB. Caffe also supports both CPU and GPU processors.

Keras is a widely used open-source framework developed in Python to run on top of TensorFlow. Keras was designed for fast experimentation to achieve quick results. Keras automates core tasks to more quickly generate an output. Keras models can be easily deployed on the web, mobile iOS and Android applications. Keras is known for its fast computation, user-friendliness, and ease of access— attractive for beginners. Keras supports Python and runs on GPU and CPU processors.
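A minimal Keras sketch (our own illustrative example, on synthetic data) shows how the framework compresses model definition, compilation, and training into a few declarative calls.

```python
import numpy as np
from tensorflow import keras

# Toy dataset: classify whether the sum of 16 random features is positive.
x = np.random.randn(1000, 16).astype("float32")
y = (x.sum(axis=1) > 0).astype("int32")

# Keras builds, compiles, and trains a model in a few declarative calls.
model = keras.Sequential([
    keras.Input(shape=(16,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=3, batch_size=32, verbose=0)
print(model.evaluate(x, y, verbose=0))   # [loss, accuracy]
```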

Scikit-Learn is a free open-source Python framework used for machine learning. Scikit-Learn is a strong option for traditional machine learning, with integrated graphing/visualization support. It has a wide assortment of ML algorithms, making it great for beginners and simpler data analytics. One of its main drawbacks is that it offers no libraries for deep learning or reinforcement learning.
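And for traditional, non-deep machine learning, a short Scikit-Learn sketch (our own example on the library's bundled Iris dataset) trains and scores a classifier in a handful of lines.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Classic "plain" machine learning: no neural network required.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {accuracy_score(y_test, clf.predict(X_test)):.2f}")
```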

Exhibit 42: 2021 Github Stars by AI Frameworks

GitHub stars (2021) rank TensorFlow highest, followed by Keras, PyTorch, Scikit-learn, Caffe, MXNet, CNTK, and Theano.

Source: AI Index, GitHub, Oppenheimer & Co. Estimates

Application Programming Interface

Application Programming Interfaces (APIs) are a common way to allow programmers to interact with different software and hardware solutions. APIs let developers programmatically interact and exchange data without sharing the underlying source code of a software platform. This provides developers better access to software ecosystems, eases coding, and limits the risk of exposing trade secrets within the underlying source code. As an example of how APIs work, travel booking websites can pull data from various sources including airlines, hotels, and cruise lines, utilizing an API instead of individually sourcing code from each provider. When a customer books a trip, the booking site exchanges details with each provider through the API. In computing, APIs allow developers access to hardware devices through the software supporting them. Given diminishing returns from Moore's law, there is a growing emphasis on tighter integration between hardware and software to maximize performance. With their ability to bridge the development community with software/hardware solutions, APIs provide access to the computing power (e.g., access to GPU, CPU, ASIC) needed for neural network development, training, and inference. This is a major reason for NVIDIA's success—its GPU and CUDA combination was a key contributor to the recent popularity of AI research.

CUDA (Compute Unified Device Architecture) is NVIDIA's proprietary software platform and API that allows programmers to best utilize NVIDIA GPUs. Released in 2006, CUDA was the first API to allow CPU-based applications to leverage the large core resources in GPUs for GPGPU computing. Over time, NVIDIA improved its software designs, and in combination with its best-in-class GPU hardware, created a complete platform for AI research. CUDA comes with deep learning libraries such as cuDNN, TensorRT, and DeepStream that can further accelerate AI training/inference workloads. The NVIDIA A100, purpose-built for AI, goes from 19.5 TFLOPS of FP32 performance (no Tensor Cores) to 312 TFLOPS on mixed precision with Tensor Cores, a 16x increase. CUDA supports only NVIDIA hardware but is compatible with multiple programming languages including C++, Fortran, Python, and others. The CUDA developer ecosystem boasts more than two million users.
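To illustrate how this hardware/software pairing is exposed to developers, the hedged sketch below uses PyTorch's CUDA automatic mixed precision (AMP) API so eligible operations can run on Tensor Cores; it is our own toy example, requires an NVIDIA GPU, and actual speedups vary by model and hardware.

```python
import torch

assert torch.cuda.is_available(), "This sketch requires a CUDA-capable NVIDIA GPU."

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()      # scales the loss to keep low-precision gradients stable

data = torch.randn(256, 1024, device="cuda")
target = torch.randn(256, 1024, device="cuda")

for _ in range(10):
    optimizer.zero_grad()
    # Inside autocast, eligible ops run in reduced precision on Tensor Cores.
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.mse_loss(model(data), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

print(f"final loss: {loss.item():.4f}")
```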

OpenCL (Open Computing Language) is an open-source parallel computing platform that supports all types of computing hardware including CPU, GPU, FPGA, DSP (digital signal processors), and other processors. Developed by The Khronos Group (a non-profit consortium of 150 member companies), OpenCL was released in 2009 as an open standard to accelerate parallel computing tasks across heterogeneous hardware. A key aspect of OpenCL is its portability across hardware. Considering its broad compatibility, the FPGA community, particularly Altera and Xilinx, has been a key advocate for OpenCL.

Software 2.0: Software Writing Software

The traditional method of writing software (e.g., software engineers coding in C++, Python, etc.) is being challenged by new neural network capabilities where machines program machines. Software 1.0 is the process where code is written by humans, explicitly telling a machine what to do with specific lines of code. This process is slow, tedious, prone to error, and requires a technical skill set.

Given the arrival of neural networks, we’re seeing the early stages of a shift toward automation in software development. With Software 2.0, neural networks have become more adept at coding and software development. Instead of humans generating explicit


instructions line by line, the neural network allows the machine to program by example. For example, a model is specified for a particular desired outcome (e.g., "to win a chess game"). Next, a basic neural network is developed to perform optimization (back propagation) techniques, eventually deriving a software program that achieves the desired outcome. We are seeing an increasing number of tools focused on software automation, allowing individuals with limited to no coding experience to train a model; platforms from cloud hyperscalers such as AWS' SageMaker and Google's AutoML are examples. While still in the early innings, Software 2.0 is an emerging functionality built on the capabilities of neural networks.
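As a toy illustration of programming by example (our own sketch, not any vendor's tooling), the code below never encodes the underlying rule explicitly; gradient descent derives the "program" (a weight and a bias) purely from input/output pairs.

```python
import numpy as np

# Software 1.0 would hard-code the rule (e.g., "return 2*x - 1").
# Software 2.0 specifies only examples of desired behavior and lets
# optimization derive the program (here, a weight and a bias).
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1))
y = 2.0 * x - 1.0                          # the "specification," given only as data

w, b = 0.0, 0.0                            # the learned "program"
lr = 0.1
for _ in range(500):
    pred = w * x + b
    grad_w = 2 * np.mean((pred - y) * x)   # gradients of mean squared error
    grad_b = 2 * np.mean(pred - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned program: y = {w:.2f} * x + {b:.2f}")   # approaches 2.00 and -1.00
```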

Measuring Silicon/AI Performance

Plateau of Clock Speeds and the Megahertz Myth

There are many variables that make direct product comparisons challenging. Traditional CPU metrics include Megahertz (MHz) and Gigahertz (GHz) clock rates (aka clock speed). Clock speeds were heavily marketed by Intel to promote performance improvements during the '80s and '90s. During this period, Intel was in full stride, meeting a yearly performance cadence in line with Moore's law. As Intel rolled out annual processor updates advertising higher clock speeds, the "megahertz myth" took hold. Clock speeds were suitable for comparing CPUs within the same product generation and family. However, clock speeds were less useful for explaining computational performance or for comparing against competing processors.

In 1995 clock speeds were 100 MHz and grew to 3.8 GHz by 2005, a 40x increase. Clock speeds have since plateaued. As previously mentioned, a collective industry decision in 2005 left clock speeds at ~3.8 GHz as CPU Integrated Device Manufacturers (IDMs) experienced a spike in heat dissipation at clock speeds >4.0 GHz. This tradeoff was untenable, as power consumption and heat generation greatly outpaced incremental clock speeds. 2021 top-of-the-line Xeon server CPUs run in the 2.0 GHz to 3.8 GHz range. As a side note, clock speed caps have created a hobbyist/enthusiast market for overclocking, a practice of increasing CPU/GPU clock rates above the certified manufacturer rate, using advanced aftermarket cooling systems (e.g., larger heatsinks, heat pipes, water cooling, refrigerants) to push higher performance levels.

Measuring Performance with FLOPS and TOPS

Computers use binary code (0s and 1s), with the level of computational precision presented in various number point formats. Some common fixed-point integer formats include 4-bit (INT4) and 8-bit (INT8). There are also floating-point formats including 16-bit half-precision (FP16), 32-bit single precision (FP32), and 64-bit double precision (FP64). Generally, higher precision formats need more processing power, more memory and consume more power, due to the higher bit count. Calculation complexity at higher bit counts also lowers processing speed.

While not all encompassing, training workloads are frequently done in floating point (FP) formats, as FP can handle a wider numerical range more accurately. This is notable as the distribution of weights and activations in deep learning models can vary widely. FP32 has become the adopted standard for AI training; however, FP16 has gained traction for applicable models, given the aforementioned system/performance benefits of lower precision formats. Inference workloads are often done in integer formats, as these formats drive higher throughput, lower latency, and lower power consumption, with only a modest reduction in accuracy.


Some chip providers have developed a hybrid mixed precision format. For example, NVIDIA found success with its own hybrid TensorFloat-32 (TF32) format, which has the precision of FP16 but the range of FP32. Mixed precision format reduces memory requirements and shortens training time. While similar, TF32 isn't exactly the same as FP32.

AI inference capabilities are increasingly adopted into edge devices, where the benefits of lower precision number formats become paramount. With many neural networks trained on FP32, the network can be quantized into an integer format (INT8 is a popular example) to run at the edge. Accordingly, quantization simplifies a larger set of data (in this case an FP32 neural network) into a smaller set of more generalized data (an INT8 neural network). Quantization lowers the bits required to represent a specific piece of information.
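A minimal sketch of post-training quantization, under our own simplifying assumptions (symmetric, per-tensor scaling with no zero-point), shows how FP32 values are mapped onto the INT8 range and back:

```python
import numpy as np

def quantize_int8(weights_fp32: np.ndarray):
    """Symmetric per-tensor quantization of FP32 values into INT8."""
    scale = np.max(np.abs(weights_fp32)) / 127.0                 # map observed range onto [-127, 127]
    q = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("max abs error:", np.max(np.abs(w - w_hat)))               # small, bounded by ~scale/2
print("memory: fp32 =", w.nbytes, "bytes, int8 =", q.nbytes, "bytes")  # 4x reduction
```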

FLOPS (Floating Point Operations per Second) has become a widely adopted performance metric and is a more accurate estimation relative to clock speeds. FLOPS approximates how many floating-point calculations (FP16, FP32, FP64) a processor can execute per second. GPUs and supercomputers were early adopters of FLOPS to measure performance. It made sense for GPUs, since rendering images requires millions of calculations on vectors to determine lines and shapes. All else equal, higher FLOPS equates to higher processor performance. NVIDIA's first GPU in 1999, the GeForce 256, generated 50 GFLOPS (gigaFLOPS; one billion FLOPS) of FP32 performance. NVIDIA's GeForce RTX 3090 (released in 2020) delivers 36 TFLOPS (teraFLOPS; one trillion FLOPS) of FP32 performance. NVIDIA's A100 achieves a massive 312 TFLOPS with mixed-precision Tensor Cores (19.5 TFLOPS of standard FP32). While FLOPS are a useful reference, measured performance on real-world applications should be part of any performance evaluation.
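As a back-of-the-envelope worked example using the FP32 figures cited above (the matrix size and the 100% utilization assumption are ours), higher FLOPS translate directly into shorter compute times:

```python
# A dense matrix multiply of (M x K) by (K x N) takes roughly 2*M*K*N
# floating-point operations (one multiply and one add per term).
M = K = N = 8192
flops_required = 2 * M * K * N                 # ~1.1e12 FLOPs

geforce_256_fp32 = 50e9                        # 50 GFLOPS (1999)
rtx_3090_fp32 = 36e12                          # 36 TFLOPS (2020)

for name, rate in [("GeForce 256", geforce_256_fp32), ("RTX 3090", rtx_3090_fp32)]:
    # Idealized time at peak throughput; real workloads never reach 100% utilization.
    print(f"{name}: ~{flops_required / rate:.3f} seconds at peak FP32")
```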

Exhibit 43: TFLOP (FP16) Performance

Company | TFLOPS (FP16) | Max TDP (Watts) | TFLOPS/Watt
Cerebras WSE-1 | 2,580 | 20,000 | 0.13
Google TPU v2 | 46 | 280 | 0.16
Google TPU v3 | 123 | 450 | 0.27
GraphCore IPU1 | 125 | 150 | 0.83
GraphCore IPU2 | 250 | 150 | 1.67
Groq TSP100 | 250 | 300 | 0.83
NVIDIA A100 | 312 | 400 | 0.78
NVIDIA V100 | 125 | 300 | 0.42
SambaNova SN10 | >300 | N/A | N/A

Source: Company Reports, Oppenheimer & Co. Estimates

TOPS (Tera Operations per Second) is another metric that has emerged to gauge neural network performance. TOPS is a measure of max throughput, presented as the trillions of operations per second a chip can process. Similar to FLOPS, the higher TOPS metric indicates better overall throughput, all else equal. Over the last few years, edge processors have increasingly included AI acceleration, with performance typically measured in TOPS. As inference chips are more prevalent at the edge, limited power consumption, low latency and output accuracy are all critical metrics. This makes integer formats more appropriate for edge inference applications. That said, many datacenter inference processors also utilize INT4/INT8 formats, leveraging the same operational advantages. To provide some examples, Apple’s A14 Bionic generates 11 TOPS, while Qualcomm’s Snapdragon 888 generates 26 TOPS. NXP’s i.MX 8M Plus includes a NPU that achieves 2.3 TOPS. NVIDIA’s server T4 GPU reaches 130 TOPS on INT8.


Exhibit 44: Datacenter AI Vendors TOPS (INT8) Exhibit 45: Edge AI Vendors Reported TOPS

AI Vendors for Datacenter | TOPS (INT8) | TDP (Watts) | TOPS/Watt
Alibaba Hanguang 800 | 825 | 280 | 2.95
AWS Inferentia | 128 | N/A | N/A
Baidu Kunlun | 260 | 150 | 1.73
Cambricon MLU 270 | 128 | 75 | 1.71
Google TPU v2 | 45 | 200 | 0.23
Google TPU v3 | 90 | 200 | 0.45
Groq TSP100 | 1,000 | 300 | 3.33
Huawei Ascend 910 | 640 | 310 | 2.06
NVIDIA A100 | 624 | 400 | 1.56
NVIDIA T4 | 130 | 70 | 1.86

Edge AI Chip Vendors | Markets | Reported TOPS
Apple A14 Bionic | Mobile | 11
Google Edge TPU v1 | IoT | 4
Hailo-8 | Auto, IoT | 26
Horizon Robotics Journey 3 | Auto | 5
Huawei Ascend 310 | IoT | 22
MediaTek Dimensity 1000 | Mobile | 4.5
Mythic Analog Matrix Processor | High-end Edge | 35
NXP i.MX 8M Plus | Edge Computing | 2.3
Qualcomm Snapdragon 888 | Mobile | 26
Samsung Exynos 990 | Mobile | 15

Source: Company Reports, Oppenheimer & Co. Estimates

Distilling everything into one metric does have its benefits for ease and simplification, although one must be aware of the intricacies of assessing AI performance. FLOPS and TOPS are a great starting point; however, there are numerous other variables impacting real-world processor/system performance. There are differences in chip architecture, application specialization, model size, networking bandwidth, memory, power consumption, transistor density, core count, system scale, and software integration, among others.

Using specialization as an example, a chip may be developed specifically for image recognition models, another for recommendations, while another could be designed for all-around purposes. These specifications will likely drive varying degrees of performance depending on use case. Power consumption is another important factor, as it can impact both pure performance and total cost of ownership for the end user. Power consumption has clear implications for battery-powered devices but is becoming increasingly important for datacenters, which are tasked with building world-class compute capabilities on a specific power budget. Accordingly, a common metric to incorporate power into performance evaluation is FLOPS/TOPS per watt. For example, QCOM's Cloud AI 100 generates 400 TOPS of peak performance with up to 75W of power consumption (equates to ~5 TOPS/W). Alternatively, NVIDIA's A100 (NVLink version) generates 624 TOPS of peak performance (1,248 TOPS w/ sparsity) with up to 400W of power consumption (equates to ~1.5 TOPS/W, or ~3 TOPS/W w/sparsity). However, we note some caveats with perf/watt metrics. Peak TDP (Thermal Design Power) sits in the denominator, though in real-world applications this value is rarely reached. Additionally, performance analysis should consider average power consumption over a fixed amount of work (e.g., one million inferences), which provides "energy consumed for a set amount of work" calculations.
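The perf/watt math reduces to simple division; the sketch below reproduces the cited figures and adds a hypothetical "energy per fixed amount of work" calculation (the 50W average and 40-second runtime are assumed purely for illustration):

```python
def tops_per_watt(peak_tops: float, tdp_watts: float) -> float:
    return peak_tops / tdp_watts

print(f"Cloud AI 100:  {tops_per_watt(400, 75):.1f} TOPS/W")      # ~5.3
print(f"A100:          {tops_per_watt(624, 400):.1f} TOPS/W")     # ~1.6
print(f"A100 (sparse): {tops_per_watt(1248, 400):.1f} TOPS/W")    # ~3.1

# "Energy for a fixed amount of work": assume (hypothetically) a chip averages
# 50W while completing one million inferences in 40 seconds.
avg_watts, seconds = 50, 40
joules = avg_watts * seconds
print(f"Energy for 1M inferences: {joules} J ({joules / 1e6 * 1000:.1f} mJ per inference)")
```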

Software is another key contributor to AI performance gains. NVIDIA was able to garner significant performance gains by adding software optimizations to its pure GPU capabilities. For example, NVIDIA’s A100 goes from FP32 (no Tensor Cores) 19.5 TFLOPS to mixed precision with Tensor Cores at 312 TFLOPS, a 16x increase. While we highlighted several of the challenges comparing AI performance, industry groups such as MLCommons were established to promote AI research and provide fair unbiased evaluation of hardware and software performance for AI training and inference.


Benchmarking AI Training/Inference Results

MLCommons is a consortium of industry and academic institutions with the goal of promoting and accelerating AI innovation. AI as an industry remains nascent, thus the field is fragmented with many hardware, software, and modeling techniques. MLCommons attempts to unify those differences through: 1) Benchmarks and Metrics—to deliver fair, standardized, and transparent like-for-like performance measurement; 2) Datasets and Models—to provide large databases for startups to develop AI models; 3) Best Practices—to establish common conventions and enable model sharing across infrastructure and researchers globally.

MLCommons is best known for its MLPerf AI training/inference benchmarks. The growing diversity and volume of hardware and deep learning software frameworks has driven a need for standardized benchmarking. We believe benchmarking drives competition and accelerates the pace of innovation.

The MLPerf Training v0.7 benchmark consists of eight models used to reflect diverse commercial and research AI/ML training tasks: three Vision (Image Classification, Light & Heavy Object Detection), three Language (Recurrent & Non-recurrent Translation, NLP), one Commerce (Recommendation), and one Research (Reinforcement Learning).

ResNet-50—Residual Neural Network or ResNet, is one of the most widely used models for evaluating image classification performance. ResNet came to prominence after it won the 2015 ImageNet challenge, successfully training a 150-plus-layer deep neural network. In comparison, prior winners AlexNet only had eight layers and VGG had 19 layers. MLPerf uses ResNet-50 v1.5 model, a 50-layer convolutional deep learning model.
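For reference, ResNet-50 is available off the shelf in common frameworks; the hedged sketch below instantiates torchvision's randomly initialized version (no pretrained weights, so nothing is downloaded) and confirms its parameter count and output shape.

```python
import torch
from torchvision.models import resnet50

model = resnet50()                         # randomly initialized 50-layer ResNet
model.eval()

params = sum(p.numel() for p in model.parameters())
print(f"parameters: ~{params / 1e6:.1f}M")                 # roughly 25.6M

# A single 224x224 RGB image maps to logits over ImageNet's 1,000 classes.
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print("output shape:", tuple(logits.shape))                # (1, 1000)
```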

Single Shot Detection (SSD)— SSD is an established neural network model used for low latency object detection. In MLPerf, SSD is used as the lightweight variation for object detection. The lightweight variation uses 300x300 image size, typical of images on smartphone devices.

Mask RCNN—Mask RCNN performs object detection and instance segmentation on images. In MLPerf, it is used as the heavyweight variant in the object detection benchmark. Object detection is a regression task. It identifies areas of interest on an image and creates bounds around these objects. The instance segmentation feature detects and classifies distinct objects within the image. Mask-RCNN uses an image size of 800x1333 and is often used in automotive applications.

Neural Machine Translation (NMT)—NMT is a machine translation approach that uses artificial neural networks to translate sentences, as opposed to legacy statistical machine translation and phrase-based translation systems. In MLPerf v0.7, the benchmark uses the WMT English-German dataset. Google’s Neural Machine Translation (GNMT) is one of the first large-scale commercial deep learning translation models. It is an end-to-end design that translates words and sentences from a source language to a target language with improved fluency and accuracy over time. It is used in Google Translate.

Transformer—Transformer is a language translation model. The benchmark is tested against the WMT English to German dataset. One key differentiating component of Transformer compared to NMT models is the use of an attention-based ML technique. This technique mimics cognitive attention, placing emphasis (attention) on the important parts of the data input.


NLP (BERT)—Bi-directional Encoder Representation from Transformers (BERT) is one of the most commonly trained models used in natural language processing (NLP) tasks. BERT is an open-source NLP model developed by Google. BERT allows machines to understand the meaning of text, and it has been used to improve the accuracy of search results. In MLPerf, the benchmark is tested against a Wikipedia dataset.

Deep Learning Recommendation Model (DLRM)—is a neural network-based recommendation model for addressing personalization and recommendation tasks. Developed by Facebook, DLRM became an open-source model in 2019. In the MLPerf v0.7 training benchmark, the dataset uses the Criteo 1TB click-through logs from online shopping, search results, and social media content ranking.

Mini-Go—Mini-Go is a reinforcement learning model that learns by repetition. Mini-Go plays against itself whereby the model is continuously updated from the game’s results. It is inspired by the AlphaGo model.

MLPerf submissions are per system, not per chip, and vary in scale. The benchmark submissions measure the time required to train a dataset to achieve a specific accuracy level for the eight different models. Per MLPerf v0.7 results, 71 AI systems submitted results—54 from NVIDIA, eight from Google, seven from Intel (no AI accelerator), and two from Huawei. NVIDIA dominated the benchmark results for commercially available cloud and on-premise systems; its V100 and A100 demonstrated the fastest performance. We also note that in the research, development, and internal category, Google's TPU led most results.

Exhibit 46: Select MLPerf Training v0.7 Results (Results in Minutes)

ID | Accelerator | # | Software | Result(s) (minutes)
Commercially Available
3 | Ascend 910 | 512 | MindSpore | 1.59
17 | A100 | 8 | Merlin HugeCTR | 3.33
30 | A100 | 480 | PyTorch | 0.62
33 | A100 | 1,024 | MXNet | 0.82
34 | A100 | 1,024 | PyTorch | 0.71, 1.5
35 | A100 | 1,536 | MXNet | 0.83
36 | A100 | 1,792 | TensorFlow | 17.07
38 | A100 | 2,048 | PyTorch | 0.8
Research and Development
65 | TPU v3 | 4,096 | JAX | 0.47, 0.26, 0.4
67 | TPU v3 | 4,096 | TensorFlow | 0.48, 0.46, 0.35, 0.4
69 | TPU v4 | 64 | TensorFlow | 4.5, 1.43, 2.08, 1.63, 5.7, 1.21
70 | TPU v4 | 256 | TensorFlow | 1.82, 1.06, 1.29, 0.78, 1.8
71 | Ascend 910 | 512 | TensorFlow | 1.56
Each submission reports times for a subset of the eight benchmarks (Image Classification, Object Detection, Recurrent/Non-recurrent Translation, NLP, Recommendation, Reinforcement Learning); per-benchmark breakdowns are available at mlperf.org.

Note: Benchmark Results in minutes; full training results at mlperf.org Source: MLPerf, Oppenheimer & Co. Estimates

While MLPerf strives to provide balanced benchmarking, its testing is imperfect and open for interpretation. One area of critique is the categorization of AI systems in two buckets: 1) Commercially Available, and 2) Research & Development. Commercially Available includes AI systems already deployed in the market (cloud or on-premise), thus having undergone the rigorous testing/qualifications for entry into the mass commercial market. Submissions in the research & development category contain experimental, in


development, and internally used hardware/software configurations, which have different standards requirements (allowing adjustments to maximize certain performance). On all commercially available systems, NVIDIA’s A100 trained the BERT NLP model in 49 seconds, faster than Google’s TPU v3, which trained the same model in 57 minutes. Alternatively, the research & development variant of Google’s TPU v3 trained the model in 23 seconds.

Additionally, the results don’t account for the number of processor and efficiency metrics such as cost and power consumption. Using the prior example, the commercial A100 training results utilized 2,048 processors, while the research variant of Google’s TPU v3 utilized 4,096 processors. The results also don’t consider cost and power consumption, factors that are important for scaling out datacenters. Another area of critique is that many of the AI benchmarks are based on older models, and not reflective of the latest billion- parameter AI models. Additionally, the benchmarks do not entirely represent real-world AI applications.

Larger organizations have the resources to optimize system performance for benchmark tests, and submissions from AI startups have been limited. Given their limited scale, we believe startups' resources and talent are better allocated to chip/system design and to addressing specific customer requirements, rather than to benchmark testing.

Exhibit 47: AI Accelerator Relative Speedup Compared to V100

Speedup over V100 across the eight MLPerf benchmarks (Image Classification/ResNet-50, heavy and light Object Detection, NLP/BERT, Recurrent and Non-recurrent Translation, Recommendation, Reinforcement Learning) ranges from roughly 0.7x to 2.7x for the TPU v4 (R&D) and A100 (commercial) systems shown.

Source: NVIDIA, MLPerf, Oppenheimer & Co. Estimates

We don’t see MLPerf results as the “end all be all” of AI performance ranking, rather a general guideline for system capability. In our view, the results provide a gauge for how far the AI accelerator industry has come. For example, when comparing 2020’s v0.7 results to 2019’s v0.6 results, the best results for the five unchanged benchmarks improved by an average of 2.7x. We also see leaders in the market innovating at a similar pace. NVIDIA’s A100 Ampere scores improved 2–4x compared to V100 while Google’s TPU v4 was on average 2.7x faster than its TPU v3. MLPerf also serves as a marketing tool for chip companies to advertise performance vs. competitors.


AI Accelerators in Datacenters

Datacenters centralize an organization's IT equipment and operations for the purpose of storing, managing, and disseminating data critical to a company. At a high level, datacenter infrastructure consists of server racks and storage, networked together to enable shared data resources and applications. The evolution of datacenter infrastructure coincides with changes in technology. Accordingly, datacenters can range in size and scope. In the past, businesses used proprietary on-premise mainframes maintained by internal IT teams. In recent years, the evolution of big data, IoT, artificial intelligence and the rise of cloud computing have driven datacenter transformation. Organizations are moving away from on-premise "enterprise" datacenters and are increasing focus on public, private, and hybrid cloud datacenters. We estimate the DC/Cloud AI hardware platform market at $6B in 2020, growing at a 28% CAGR to $21B by 2025.

Exhibit 48: DC/Cloud AI Hardware Platform Market Forecast ($B)

DC/Cloud AI hardware platform revenue grows from ~$6B in 2020 to ~$21B by 2025 (Y/Y growth shown on the right axis).

Source: IDC, Gartner, Oppenheimer & Co. Estimates

Enterprise servers were the standard in the pre-cloud environment. Companies housed equipment on premise to facilitate their IT infrastructures and to establish an online presence. Enterprise server systems are generally costly and require ongoing maintenance. Scale also becomes a challenge with enterprise models. In periods of high usage, enterprises need to add additional servers to meet capacity; however, this drives costs higher during periods of low usage. Additionally, as technology evolves and in-house servers age, enterprises need to re-invest in new technologies to maintain updated/competitive infrastructures. As such, cloud computing offered an outsourced infrastructure model, leading to a transformation in how companies allocate compute resources and cost. While enterprise servers maintain a meaningful presence in corporate infrastructure, cloud is clearly taking share as a percentage of datacenter capacity.

Cloud computing is a dense network of servers/storage, providing a broad range of outsourced computing services—compute, storage, networking, database, analytics, software, AI—over the internet ("the cloud"). Cloud service providers typically use a pay-as-you-go, or computing "as a service," model, delivering "utility-like" computing akin to how power companies deliver electricity and gas.

Some of the services offered through the cloud include Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). Businesses can generally elect to pay for services when needed, allowing companies to better allocate


resources elsewhere, increasing productivity and enhancing core value. These services also allow businesses to scale infrastructure up or down as business needs change. The development/evolution of cloud computing has been instrumental in accelerating startup culture. Cloud services are expanding to every industry and also to consumers, where music, video, and gaming content are available through numerous streaming services. We believe consumer AI services have become the largest users of GPU services in the datacenter. More broadly, GPU-based AI accelerators, which are available from all major CSPs, have allowed companies to scale out HPC and AI/ML applications. In the case of IaaS, virtual compute machines, servers, storage, and networking give users access to compute abilities otherwise inaccessible due to physical infrastructure limitations and high costs.

Exhibit 49: 2019 Cloud IaaS Market Share Exhibit 50: Cloud IaaS Forecast ($B)

2019 cloud IaaS market share ($49B market): Amazon 47%, Microsoft 13%, Alibaba 7%, IBM 5%, Google 4%, Tencent 2%, Others 22%. The IaaS forecast (compute and storage) grows from $49B in 2019 to $230B by 2025.

Source: IDC, Oppenheimer & Co. Estimates

Amazon Web Services is a pioneer in cloud services. Amazon introduced its Amazon Elastic Compute Cloud (EC2) in 2006. EC2 allows users to rent virtual machines and run applications on Amazon’s servers. AWS’ first virtual machine was a Linux OS running on a 1.7GHz Xeon processor with 1.74 GB of RAM. Amazon charged $0.10/hour for this virtual machine. Since then, EC2 has expanded to over 400 virtual machines offering a broad set of hardware configurations to support various customer infrastructure needs. Some of the common offerings include general purpose compute, compute optimization, memory optimization, accelerated computing and storage optimization. AWS is a perennial IaaS market leader, accounting for $23B revenues, or 47% of the IaaS market in 2019. We see the public cloud IaaS market growing at a 30% CAGR to a $230B market by 2025.

Exhibit 51: Amazon Web Services EC2 Accelerated Compute Services

Instance | Tasks | Chips | Memory (GB) | Bandwidth (Gbps) | On-demand $Price/Hr | 1-yr Reserved Effective Hourly | 3-yr Reserved Effective Hourly
p4d.24xlarge | Training, AI/ML, HPC | 8 NVIDIA A100 | 1152 | 400 | $32.77 | $19.22 | $11.57
p3dn.24xlarge | Training, AI/ML, HPC | 8 NVIDIA V100 | 768 | 100 | $31.22 | $18.30 | $9.64
inf1.24xlarge | AI/ML Inference | 16 AWS Inferentia | 192 | 100 | $7.62 | $4.57 | $3.05
g4dn.metal | AI/ML Inference, Graphics | 8 NVIDIA T4 | 384 | 100 | $7.82 | $4.69 | $3.13
g4ad.16xlarge | Graphics | 4 AMD Radeon Pro | 256 | 25 | $3.47 | $2.08 | $1.62
f1.16xlarge | — | 8 Xilinx UltraScale+ FPGA | 976 | 25 | $13.20 | $8.50 | $6.10

Note: Prices are as of 2/28/2021 Source: AWS, Oppenheimer & Co. Estimates
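To put these list prices in context, a rough back-of-the-envelope sketch using the Exhibit 51 figures (actual bills depend on region, utilization, and negotiated discounts):

    # Illustrative cost math using the Exhibit 51 list prices (USD per hour)
    on_demand_hr = 32.77      # p4d.24xlarge (8x NVIDIA A100), on-demand
    reserved_3yr_hr = 11.57   # 3-yr reserved, effective hourly

    hours_per_year = 24 * 365
    print(f"On-demand, full year: ${on_demand_hr * hours_per_year:,.0f}")       # ~$287K
    print(f"3-yr reserved discount: {1 - reserved_3yr_hr / on_demand_hr:.0%}")  # ~65%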


Hyperscalers

Rapid cloud transformation is driving an arms race amongst cloud vendors, leading to broader market consolidation. Accordingly, a number of leading cloud service providers (CSP) have evolved, including Amazon AWS, Microsoft Azure, Google Cloud, Alibaba, Tencent, and Baidu. Along with Apple and Facebook, these eight hyperscalers are making efforts to dominate big data and cloud computing, further expanding their vertical reach. These providers have collective buying power capable of driving seismic shifts in the tech landscape. This is particularly true given their substantial infrastructure investment requirements. These tech giants in combination account for nearly 50% of global infrastructure server and storage spend. Hyperscalers spent $32B on infrastructure in 2020 and are forecast to increase spend at a 12% CAGR to >$57B by 2025.
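The ~12% figure is simply the compound annual growth rate implied by those two endpoints; a quick sketch of the arithmetic, using the figures in the paragraph above:

    # Implied CAGR from $32B (2020) to ~$57B (2025) of hyperscaler infrastructure spend
    begin, end, years = 32.0, 57.0, 5
    cagr = (end / begin) ** (1 / years) - 1
    print(f"Implied CAGR: {cagr:.1%}")   # ~12.2%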

Exhibit 52: Infrastructure Spend ($B) Exhibit 53: Infrastructure Spend ($B)

[Exhibit 52/53 bar charts: infrastructure spend by segment (Digital Dedicated, Comm. Cloud non-tier 1, Hyperscaler), 2020–2025, with Y/Y% growth overlay.]

Source: IDC, Oppenheimer & Co. Estimates Source: IDC, Oppenheimer & Co. Estimates

The eight large hyperscalers reported combined 2020 revenue of $1.3T. Utilizing their massive scale, these companies have gradually expanded their sector reach, and can transform an industry. This behavioral pattern is rippling into the semiconductor value chain. Multiple hyperscalers have developed their own chips for artificial intelligence applications. With hyperscalers at the forefront of outsourced computing services, it is understandable they would need tailored, AI-specific architectures to achieve optimal performance for a broad range of AI models/applications.


Exhibit 54: 2020 Hyperscaler Revenues ($B) Exhibit 55: 2020 Hyperscaler Balance Sheet Cash ($B)

[Exhibit 54/55 bar charts: 2020 revenues and balance sheet cash for the eight hyperscalers (Amazon, Apple, Google, Microsoft, Alibaba, Facebook, Tencent, Baidu).]

Source: Factset, Oppenheimer & Co. Estimates Source: Factset, Oppenheimer & Co. Estimates

Semiconductor hardware is naturally tied to Moore’s law. That said, the pace of hyperscale technology consumption isn’t synchronized with chip vendor roadmaps. Hyperscalers are pushing for faster systems that deliver better personalized services. Armed with strong balance sheets (combined $625B in cash exiting 2020) and a willingness to spend, hyperscalers are entering the semiconductor market both organically (hiring their own development teams) and via M&A.

Early signs of this activity can be traced to 2008 when Apple acquired PA Semi. Apple has since developed its series of SoC/SiP processors: A-series (iPhone), M-Series (Mac), S-Series (Apple Watch), T-Series (TouchID), W-Series (Bluetooth/WiFi), and H-Series (Headsets). Despite not being a dedicated semiconductor IDM, Apple is TSMC’s largest customer, accounting for 25% of revenues in 2020. Apple’s flagship A14 Bionic processor contains a 16-core neural engine dedicated to AI and imaging workloads.

While Apple has made a mark in semiconductor design for edge devices, several other hyperscalers are making inroads in the datacenter market. Datacenter training accelerators are currently dominated by NVIDIA’s V100 and A100 GPUs, accounting for ~99% of the AI training market. Hyperscalers and CSPs are encroaching on this market, pursuing customized AI/ML silicon. Google, which trails AWS and Microsoft in cloud scale, is leading with its custom TPU AI accelerator. Now in its fourth generation, the TPU v4 is an ASIC developed for AI training and inference workloads. Google remains a top A100 customer. Amazon acquired Annapurna Labs in 2015, and its engineers have since designed Graviton2, an ARM-based CPU for general purpose server compute. Amazon has also developed its AWS Inferentia processor, a custom ASIC for accelerating AI inference workloads. Facebook and Microsoft have yet to officially launch internal AI silicon, though both have made public announcements expressing development interest in custom solutions. Alibaba has developed and deployed its own Hanguang ASIC to the cloud for AI inference, search, and recommendation workloads. Baidu is planning to launch its 2nd generation Kunlun 2 ASIC, a multi-purpose cloud-to-edge AI accelerator. Beyond internal development, hyperscalers are increasingly partnering with ASIC providers to source custom training/inference capabilities. For example, AWS recently partnered with Intel’s Habana Labs to offer instances based on its Gaudi training processors. We believe the path of least resistance is not for hyperscalers to completely vertically integrate their processor needs. Rather, we see CSPs selectively developing tailored/custom solutions for select applications where differentiation is key and merchant silicon unavailable.


Datacenter AI Startups

The large datacenter AI market opportunity, in combination with increasing AI investment (both by strategic and venture investors), has seeded multiple AI startups. A handful of AI chip unicorns have grabbed the most attention and venture capital funding. One of the early leaders is SambaNova Systems with $1.1B raised to-date, following its $676M Series D capital raise in April 2021. Graphcore has raised $710M to-date, followed by Cerebras with ~$475M, Groq with $367M, and Tenstorrent with $42M, to name a few.

SambaNova Systems, based in Palo Alto, CA, has taken a total system approach to its DataScale platform, emphasizing both hardware and software solutions. DataScale features SambaNova’s custom AI accelerator, the Cardinal SN10 Reconfigurable Dataflow Unit (RDU), and full-stack software capabilities. DataScale is a complete AI neural network processing system built for training and inference workloads. Graphcore, based in Bristol, UK, developed an AI processor called the intelligence processing unit (IPU), now in its second generation—the Colossus GC200. Graphcore is designing Colossus into its IPU-M2000 server unit and into its IPU-POD64 server rack. Graphcore is taking this approach one step further with Graphcloud, a complete cloud-based dedicated AI/ML platform.

Cerebras Systems has grabbed the industry’s attention by building the largest chip ever. Based in Los Altos, CA, Cerebras Systems’ 2nd generation Wafer Scale Engine 2 (WSE-2) boasts a size 56x larger than a typical GPU, including 2.6 trillion transistors and 850K AI cores. Its WSE-2 powers its CS-2 datacenter system to accelerate AI training. Groq, another Palo Alto-based startup, developed a Tensor Streaming Processor (TSP) it calls GroqChip. GroqChip accelerates training and inference workloads, with a focus on AI inference at batch size 1. This makes Groq well-suited for real-time processing and other low-latency inference applications. Groq completed a $300M Series C funding round in April 2021, bringing its total capital raised to $367M. Tenstorrent, based in Toronto, Canada, is focused on accelerating AI inference in the datacenter. Tenstorrent developed the Grayskull AI chip (now in production) and is currently testing its next generation Wormhole processor. Notably, Tenstorrent hired legendary chip architect Jim Keller, who led development of Apple’s first SoC, AMD’s Zen architecture, and Tesla’s self-driving vehicle chip.

Exhibit 56: Venture Capital Investment

[Bar chart: annual VC semiconductor investments ($B), 2000–2020.]

Source: Pitchbook, Oppenheimer & Co. Estimates


AI Accelerators at the Edge

In the last decade, computing has gradually migrated from wired computers and on-premise datacenters to mobile devices and cloud datacenters. Datacenters were early adopters of AI given the need to collect, store and analyze massive amounts of data. Large amounts of data are transferred every day from edge devices into the cloud. This data movement creates added latency. New products/services including smart retail, smart cities, smart robotics, and self-driving cars create massive troves of high-resolution sensor data. As these services expand, the amount of data generated will eventually prove too large to efficiently transfer to the cloud. One solution is bringing AI compute and storage closer to the data source, or closer to the edge (Edge computing). In turn, this reduces latency, power consumption, and bandwidth requirements.

Faster network connectivity (e.g., 5G) is a key enabler of edge acceleration. Global public 5G network build-outs remain at an early stage, with China furthest along to-date. As 5G technology improves, we see growing demand for enterprise/industrial private 5G networks. We see the expansion of both public/private 5G networks, and associated faster data speeds, driving demand for consumer, commercial and industrial connected devices (e.g., IoT, automotive, factory robotics, etc.). We believe eventually every electronic device will include some form of AI-acceleration.

Improving latency is a critical factor behind edge computing growth. In a typical cloud network, data travels from local devices to cloud servers (often located hundreds of miles away) and back again. In edge computing, a processor is located on the device and computation is performed locally. This can reduce latency from seconds to milliseconds. Lower latency reduces costs and can yield near real-time results, generating a better user experience. Edge processors also have a size advantage over datacenter processors, thus consuming less power. While power consumption varies by design and use-case, edge processors generally draw one to ten watts (W). Alternatively, datacenter servers have meaningfully higher power consumption. For example, a system with two CPUs and 16 GPUs can consume upwards of 10kW.
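For intuition on the latency gap, here is a rough sketch of the network math. The 800km distance, fiber propagation speed, overhead, and on-device inference time below are illustrative assumptions, not measured figures; with large sensor payloads and congested links, real cloud round trips can stretch far longer.

    # Rough round-trip estimate for a cloud inference call vs. local edge inference
    distance_km = 800              # assumed distance to a cloud region (~500 miles)
    fiber_speed_kms = 200_000      # light in fiber travels at roughly 2/3 the speed of light
    propagation_rtt_ms = 2 * distance_km / fiber_speed_kms * 1000
    network_overhead_ms = 30       # assumed routing/queuing/serialization overhead
    cloud_rtt_ms = propagation_rtt_ms + network_overhead_ms
    edge_latency_ms = 2            # assumed on-device inference time
    print(f"Cloud round trip: ~{cloud_rtt_ms:.0f} ms vs. edge: ~{edge_latency_ms} ms")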

Exhibit 57: Cloud to Edge Computing Relationship Image

Source: Kalray, Oppenheimer & Co. Estimates

Edge computing also benefits data privacy and security. Growth in personalized services has led to a large amount of sensitive consumer data stored online. Edge computing can increase data security as local processing limits data transmission to the cloud, thus lowering the risk of data interception. Additionally, edge computing enables more efficient encryption layers and unique security features including biometric sensing and facial recognition.

Despite significant interest in, and use cases for, edge computing, we see it as a complement to hyperscale cloud computing, rather than a replacement. For example, AI training will continue to reside in the cloud where compute resources are more robust. AI inference will increasingly be executed at the edge, driven by efficiency gains from faster and/or real-time processing. Utilizing edge computing, disruptive technologies including VR/AR, facial recognition, voice recognition, smart robotics, and Advanced Driver-Assistance Systems (ADAS), among many others, can be deployed more efficiently in smartphones, IoT devices, industrial devices, and autonomous vehicles. Many established semiconductor companies have recognized this opportunity and are now investing in dedicated and embedded AI chips. For example, Apple included a dedicated 16-core Neural Engine in its A14 Bionic SoC. Additionally, we’re seeing several AI hardware startups focusing on the edge inference market. We estimate the edge AI hardware platform market at $19B in 2020 growing at a 36% CAGR to $84B by 2025.

Exhibit 58: Edge AI Hardware Platform Forecast ($B)

[Exhibit 58 bar chart: edge AI hardware platform revenue, 2020–2025, with Y/Y% growth overlay.]

Note: Edge includes Auto, IoT, and Endpoint (PC, Smartphones, Tablets, etc.) Source: IDC, Gartner, Oppenheimer & Co. Estimates

Edge Infrastructure: Cloud and Telco

The rapid growth of edge computing has led cloud and telecom service providers to rethink their edge market strategy. Cloud and telecom providers are increasingly shifting from centralized mega datacenters to regionally distributed edge datacenters, to enable improved last-mile connectivity. Edge datacenters are smaller facilities, typically 50 to 100 racks and up to 50,000 square feet, located closer to dense populations. This allows edge datacenters to deliver lower latency and improved user experiences. Edge datacenters are typically connected to a central datacenter through fiber optic interconnects.

AI and emerging data intensive applications—video streaming, cloud-based gaming, AR/VR, IoT, etc.—require fast network response times and low latency (usually defined as sub-5 milliseconds). Critical AI applications such as smart factories, cashier-less retail, and autonomous driving require AI compute processing at the edge. Accordingly, edge datacenters are seeing growing investment. Edge infrastructure was ~12% of overall infrastructure spending in 2019, and is expected to grow at a 20% CAGR to $40B by 2025, representing 24% of total infrastructure (core and edge) spend.

Exhibit 59: Core and Edge Spending Forecast ($B); Mix (%) Exhibit 60: Edge Infrastructure Spending ($B)

[Exhibit 59: core vs. edge infrastructure spend mix, 2019–2025, with edge rising from 12% of spend in 2019 to 24% by 2025. Exhibit 60: edge infrastructure spend by server/storage with Y/Y% growth.]

Source: IDC, Oppenheimer & Co. Estimates Source: IDC, Oppenheimer & Co. Estimates

Robotics: Rise of Machines

We see robotics as a natural beneficiary of artificial intelligence. AI algorithms, and more specifically neural networks, are a high-level attempt to replicate human intelligence. Alternatively, robotics is anchored in the physical domain, with programming designed to elicit a specific action or movement from a robot. As AI algorithms become more complex and are increasingly used to program robotic machines, we see an interesting opportunity for intelligent robots.

The majority of robots today are industrial robots. They are programmable machines that carry out specific tasks, mostly utilized in large scale manufacturing operations. Industrial robots have generated the most traction with automotive and electronics manufacturers, making up nearly 60% of the overall market mix. Industrial robots come in a variety of forms: articulated, Cartesian, cylindrical, SCARA (selective compliance assembly robot arm), delta; and can perform many tasks such as welding, assembly, painting, and packaging, among others. Robots have a wide array of sensors and actuators, allowing them to perform repetitive tasks autonomously and accurately. The ability to customize industrial robots, consistent repetition, endurance, and operational precision make robots more capable and efficient than humans at many manufacturing tasks.


Exhibit 61: Global Operational Industrial Robots (1000s) Exhibit 62: 2019 End Markets of Operational Industrial Robots

[Exhibit 61: global operational industrial robots (thousands), 2009–2019, reaching ~2,722K in 2019. Exhibit 62: 2019 end-market mix of operational industrial robots, led by automotive (~34%) and electronics/electrical (~25%), followed by metal/machinery, plastic/chemical, food, others, and unspecified.]

Source: International Federation of Robotics, Oppenheimer & Co. Estimates Source: International Federation of Robotics, Oppenheimer & Co. Estimates

Advances in compute ability, safety standards and the implementation of artificial intelligence have led to a nascent market for collaborative robotics or cobots—robots that are intended to collaborate and work alongside humans. The key distinction from stand-alone robotics is the lack of safety fence requirements typical of industrial robots. Technology and design innovations have eased robot interactions with humans, including the addition of AI safety mechanisms for perception and response, and the use of lightweight materials and round edges. Cobots are seeing increased usage in traditional industrial automation, and in home and commercial service functions.

To collaborate with humans, cobots rely on proximity and vision sensor data. Transmitting data to the cloud and back is too slow. This data needs to be processed at the point of interaction to achieve the desired low-latency, real-time response. Accordingly, robotics designers are increasingly building cobots with internal AI inference capabilities.

Mobile robots, including automated guided vehicles (AGV) and more recently autonomous mobile robots (AMR), have been gaining traction. AGVs are commonly found in warehouses, distribution centers, manufacturing facilities, and other environments where repetitive material/supply movement is common. AGVs have minimal on-board intelligence, and navigate using magnetic strips, guide wires, and sensors that require extensive (and expensive) coordination with the facility infrastructure. Due to AGVs’ limitations, AMRs have grown in popularity. AMRs navigate via on-site GPS, AI-driven machine vision, sensors, and on-board edge processors to understand the environment and optimally execute tasks. AMRs are able to maneuver around obstacles and determine/utilize alternative routes to reach a destination. Additionally, AMRs are able to receive over-the-air software updates, creating added efficiency for operators. While AMRs are rooted in industrial use-cases, their functionality can be viewed as a scaled down version of autonomous vehicles, which we will discuss in more detail in the next section.

We see the trend toward smarter robotics continuing as: 1) improving network connectivity, including private 5G networks, enables more connected devices, 2) robots become increasingly capable of collaborating with humans, 3) automation increases efficiency and reduces operating costs, and 4) robots help address trade barrier challenges. The International Federation of Robotics estimates there were 2.7 million industrial robots operating in global factories during 2020. We expect this number to roughly triple to ~8 million robots by 2025. As robot functionalities become increasingly autonomous, we see the semiconductor industry, and specifically AI providers, as primary beneficiaries.


Exhibit 63: Top 10 Countries Industrial Robots Installation 2019 Exhibit 64: Top 15 Countries Mfg. Robot Density 2019

[Exhibit 63: top 10 countries by 2019 industrial robot installations (thousands), led by China, Japan, the US, South Korea, and Germany. Exhibit 64: top 15 countries by 2019 manufacturing robot density (robots per 10K employees), led by Singapore and South Korea.]

Source: International Federation of Robotics, Oppenheimer & Co. Estimates Source: International Federation of Robotics, Oppenheimer & Co. Estimates

Autonomous Vehicles: New Age of Transportation

Technology advancements and increasing competitive pressures are driving transformation in the automotive industry at an unprecedented rate. Previous innovations were mostly focused on design and manufacturing and came largely from within the industry. For example, the assembly line, seat belts, internal combustion engine, and automatic transmission are all auto OEM innovations. Historically, car manufacturers held tight rein over innovation and oversaw a vertically integrated supply chain that prioritized only a few trusted external suppliers. This created a level of conservatism in the industry, leading to new component development/qualification timelines lasting anywhere from 18 to 24 months. While long design cycles are commonplace in the auto industry (and unlikely to meaningfully change), we see emerging technologies and the entry of new EV/AV OEMs like Tesla forcing incumbent auto OEMs to adapt to the changing landscape.

We see semiconductors facilitating the current wave of disruption. In our view, three of the most important emerging trends in the automotive industry include: 1) Connectivity— allowing vehicles to communicate with other vehicles and the outside environment, 2) Electrification (EV)— the shift away from fossil fuels toward drivetrain electrification for hybrid and fully electric vehicles, and 3) Autonomous Vehicles (AV)—including the development of Advanced Driver Assist Systems (ADAS) to improve safety, avoid collisions, and the eventual development of fully autonomous or self-driving vehicles. Building on these trends, we see automotive as one of the fastest growing markets for semiconductor content over the next several years. Automotive was a $37B market in 2020, forecast to grow at a 10% CAGR, reaching $72B by 2025. For semiconductor IDMs, automotive is typically less cyclical than consumer (particularly mobile) and datacenter markets. This is due to stringent quality and specification requirements, driving long product cycles and high switching costs. Accordingly, an automotive design win tends to be “sticky” and long-lasting for semiconductor providers. We estimate ~$400/vehicle of semiconductor content today, and see it exceeding $5,000/vehicle when fully autonomous E/V robo-taxis arrive.


Exhibit 65: Automotive Semiconductor Market Forecast

[Exhibit 65 stacked bar chart: automotive semiconductor market by segment (Other, EV/HEV, ADAS), 2019–2025, with Y/Y growth overlay; by 2025, Other reaches ~$41B, ADAS ~$20B, and EV/HEV ~$11B.]

Note: Other includes aftermarket, body, chassis, infotainment, instrument cluster, powertrain and safety Source: Gartner, Oppenheimer & Co. Estimates

Automotive OEMs are accelerating their AV capabilities with an early focus on ADAS. Several common ADAS features include (but are not limited to) emergency braking, lane assistance, backup cameras, adaptive cruise control, and self-parking systems. ADAS features were initially available in luxury vehicles but are increasingly available in lower-end models as consumer demand, industry standards and government regulations have expanded attach rates. For semiconductors specifically, ADAS was a $6B market in 2020 growing at a 23% CAGR to $20B by 2025. EV/HEV was a $4B market in 2020 forecast to grow at a 22% CAGR (similar to ADAS) reaching $11B by 2025.

Technology developers and automotive OEMs are working to broaden ADAS feature sets. Accordingly, new technologies such as AI machine vision, LiDAR, radar, and AI inference processors are driving vehicles closer to fully autonomous operation. AVs are increasingly able to steer, accelerate, brake, and avoid collisions without human intervention. We expect the deployment of commercial vehicles, or robo-taxis that can drive without human presence, to begin ramping over the next several years. While AV progress continues apace, we anticipate reliable, fully autonomous vehicles are 5–10 years from becoming commonplace. Today’s vehicles have varying degrees of technology adoption and levels of autonomy. As such, the Society of Automotive Engineers (SAE) has defined six levels of driving automation.

Level 0: No Automation—All driving aspects are done by the human even in vehicles with enhanced warning or intervention systems.

Level 1: Driver Assistance—The vehicle can support the driver with steering, accelerating, and braking in single-driving tasks (e.g., cruise control). The human performs all aspects of driving.

Level 2: Partial Automation—Using information from the environment, one or multiple ADAS systems can execute various steering, acceleration, and braking driving tasks dynamically, with the expectation for human intervention.

Level 3: Conditional Automation—At level 3, the vehicle becomes the “primary” driver and is classified as an autonomous driving system (ADS). The vehicle undertakes most aspects on behalf of the driver. The driver still must be present with the expectation to intervene and take active control when requested.


Level 4: High Automation—The ADS performs all dynamic driving tasks. If a human driver does not respond to requests for intervention, the vehicle can still safely pull over.

Level 5: Full Automation—The ADS can autonomously perform all driving tasks in all roadway and environmental conditions with no assistance required from humans.

Exhibit 66: Autonomous Vehicle Forecast (000s)

[Exhibit 66 bar chart: autonomous vehicle shipments (000s) by SAE level (Levels 1–4), 2019–2025.]

Source: IDC, Oppenheimer & Co. Estimates

Level 1 and 2 vehicles represent most autonomous vehicles shipped today. In 2019, IDC estimated 31M autonomous vehicles shipped (Level 0 excluded), with Level 1 and 2 nearly 100% of the AV market. AV shipments are expected to double to 62M by 2025, with Level 1 and 2 approximating 96% of the market. Scaling Level 3 vehicles into the market will mark a key step toward broad AV adoption. With ADS as the “primary” driver, the industry will likely face stricter regulatory requirements along with hesitancy from consumers. As these challenges are overcome, we expect growing adoption for Level 3 and 4 vehicles. IDC forecasts Level 3 and 4 vehicles accounting for ~4% of the 62M AV market in 2025. Meaningful adoption of Level 5 vehicles is well beyond the forecast period.

Automotive manufacturers are developing an ecosystem of hardware and software systems to further vehicle autonomy. Full AV relies on the integration of numerous sensor and communication systems, including GPS, light detection and ranging (LiDAR), cameras, radar, infrared sensors, ultrasonic sensors, dedicated short range communication (DSRC), inertial navigation system (INS), odometry sensors, and maps. The vehicle needs to capture and process high quality images covering a 360° field of vision around the vehicle. Considering the sizeable amount of data collected and processed, AVs are lightheartedly referred to as “datacenters on wheels.” The auto market is very well suited for on-board AI inference processors to execute real-time driving decisions.


Exhibit 67: Select Vendors Making AV Chips

AI Vendor | AI/ADAS Products
NVIDIA | DRIVE Platform; Orin, Atlan Processors
Mobileye | Mobileye 8 Connect, EyeQ SoCs
Tesla | Proprietary Full Self Driving (FSD) Chip
Horizon Robotics | Horizon Journey Processor
Indie Semiconductor | FMCW LiDAR; Sonosense, Ecosense Self-Parking
Qualcomm | Snapdragon 602 Automotive Platform
NXP | S32 Automotive Platform
Texas Instruments | Jacinto 7 Processors
ADI | Drive 360
Ambarella | CVflow Chip

Source: Company Reports, Oppenheimer & Co. Estimates

We see semiconductor IDMs using different approaches to develop/expand their automotive market opportunities. Accordingly, we’ve witnessed: 1) established automotive vendors bolstering their portfolios via M&A; 2) companies expanding existing product lines to fit automotive use-cases; and 3) emerging startups addressing niche auto markets. Most notably, in 2016 Qualcomm attempted to acquire NXP, the No. 2 automotive semiconductor supplier (the deal eventually fell through due to regulatory issues). Infineon became the No. 1 automotive semi supplier after its acquisition of Cypress Semiconductor, a broad supplier of automotive products. Renesas, the No. 3 automotive semi supplier, acquired both IDT and Intersil. Analog Devices is acquiring Maxim Integrated, in large part to strengthen its catalog of automotive offerings.

NVIDIA, Intel, and Qualcomm are three of the largest semiconductor IDMs, with their primary businesses outside of the automotive space. That said, each has made inroads in auto. NVIDIA has its DRIVE AGX platform for robotics/AV. The NVIDIA DRIVE AGX is an AI software-defined platform that processes data from an array of sensors. The current generation Orin SoC (in production for 2022 models) was designed to run deep neural network inference, delivering Level 2 to Level 5 autonomous driving capabilities. NVIDIA suggests its next gen Atlan SoC can reach 1,000 TOPS, a ~4x improvement over Orin. Intel built upon its $15B acquisition of Mobileye to become a leader in ADAS. Mobileye’s fifth generation EyeQ5 SoC fuses all sensing data to achieve cohesive AV performance. Qualcomm is leveraging its platform of Snapdragon SoCs (originally for mobile devices) and connectivity solutions to gain a foothold in AV and vehicle-to-“X” communications (V2X). Qualcomm partnered with Veoneer to develop the Snapdragon Ride Platform for autonomous vehicles.

The auto industry has been historically challenging for semiconductor startups. Considering typically long auto product cycles, the strength of a startup’s balance sheet (or often lack thereof) can make or break a relationship with OEMs/suppliers. In essence, OEMs need to feel comfortable that their suppliers will be “in business” through a potential seven-year-plus product cycle. Additionally, stringent quality requirements and supply chain reliability present further risks. These barriers have led auto OEMs to give preference to established partners. While these risks remain top of mind for auto OEMs, this trend is beginning to change as innovative automotive technologies are increasingly developed by the startup community. We also see increasing competitive pressures from a wave of disruptive entrants, including Tesla, Waymo, Cruise, Aptiv, ARGO, Indie, Valens, Autotalks, AEVA, Navitas, and others. Notably, Tesla drew attention with the development of its internally designed Full Self-Driving SoC (the FSD; previously named Autopilot Hardware 3.0). The FSD includes ARM cores, a GPU, and two neural processor units. Auto manufacturers are recognizing the need to be faster and more flexible with innovation. They also appear more open to working with startups. A host of startups including Horizon Robotics, indie Semiconductor, Hailo, GEO Semiconductor, and many more are working on AI chips to better facilitate autonomous driving and related technologies.

Endpoint Devices: PCs, Smartphones, Internet of Things

Compute, storage, and connectivity applications are increasingly permeating consumer end devices. The consumer edge market was long dominated by desktop and notebook PCs, which have unsurprisingly been surpassed by smartphones. Internet of Things (IoT) presents a relatively new growth opportunity, opening the door to countless endpoint devices for semi providers. Enhanced by 5G and power efficient ARM chips, we see all consumer endpoint devices eventually adopting some degree of AI acceleration. Given low barriers to entry and a large market opportunity, we see established semiconductor vendors and startups increasingly entering this market.

PCs and Smartphones

As noted above, PCs were the traditional consumer endpoint computing device, prior to broad adoption of tablets and smartphones. The PC’s breadth of available applications and overall compute ability made it an essential consumer device. PCs have limited mobility due to higher power consumption and connectivity limitations. PCs are largely designed with x86 processors, which deliver high performance at the expense of high power consumption. Further, many PCs are unable to connect to cellular networks, relegating them to areas where WiFi is available. With the advent of more mobile and power efficient ARM processors, computing is now available in a broader selection of devices, with smartphones growing to represent the largest install base. That said, we see PCs increasingly adopting features, including ARM-based processors and better connectivity interfaces, to improve mobility and consumer experience. The PC market has been in slow decline but recovered in 2020 as the COVID pandemic accelerated work-from-home/learn-from-home trends. PCs had a 1.8B install base in 2020 compared to a 4.3B install base for smartphones. The vast majority of today’s edge AI chips are used in high-end smartphones. Notably, about one-third of the 1.3 billion smartphone units shipped in 2020 contained an AI processor.

Exhibit 68: Global PC and Smartphone Install Base Exhibit 69: Smartphone Units Forecast (B)

[Exhibit 68: global install base of PCs/tablets (~1.8B) vs. smartphones (~4.3B). Exhibit 69: smartphone unit shipments by OS (iOS/Android), 2019–2025.]

Source: Gartner, Oppenheimer & Co. Estimates Source: IDC, Oppenheimer & Co. Estimates

Apple, Samsung, Qualcomm, and MediaTek are leading smartphone edge processor development. Apple’s internally developed A14 Bionic SoC includes a dedicated 16-core Neural Engine. Samsung’s Exynos 2100 SoC features an AI engine with a 3-core NPU. Qualcomm and MediaTek are two prominent merchant handset SoC designers. Qualcomm’s Snapdragon 888 SoC includes dedicated AI cores and a Hexagon 780 processor. MediaTek’s Dimensity 1200 SoC features a 6-core APU 3.0 for AI. These AI-enabled processors, in concert with software solutions, allow smartphone OEMs to provide an increasing number of value-added features. Some of these features include faster data processing and connectivity, better video quality, facial recognition, recommendation and sorting, camera image enhancement, language translation and voice assistants, among many others.

Embedded

ARM processors, and their associated power efficiency, are driving a new wave of connected consumer devices beyond smartphones. Coupled with AI cores, these edge AI chipsets are increasingly penetrating IoT devices such as smart home appliances, speakers, drones, tablets, wearables, hearables and cameras, to name a few. ARM has a vast ecosystem, with 25B units shipped in 2020, and >190B cumulative shipped to date. Enabling these devices with edge AI capabilities improves speed, response time, data security, and creates a better overall user experience.

Bringing AI capability to the edge presents a huge opportunity for embedded microcontroller (MCU) systems. A microcontroller is an integrated circuit designed to perform a single task or program within a device (e.g., TV remote control, functions on a microwave, etc.). MCUs are ubiquitous in all modern electronic devices. There are over 300 billion MCUs in the world today, and more than 26 billion units were sold in 2020 according to SIA. MCUs represented a $15B market in 2020, forecast to grow to $20B by 2025. MCUs are poised to achieve greater product penetration with emerging technologies in cloud, mobile, AI, and IoT. One of the most important trends in edge AI is the evolution of IoT machine learning, also referred to as TinyML (e.g., embedding ML accelerators onto single-board MCUs).

Exhibit 70: Microcontroller Units (B) and ASP ($)

[Exhibit 70: annual MCU unit shipments (B) and ASPs ($), 2000–2020.]

Source: SIA, Oppenheimer & Co. Estimates

TinyML applies inference workloads on small low-powered devices. Several advantages underpin TinyML, including lower latency, higher bandwidth, lower power consumption, and improved privacy capabilities. CPUs consume 65 to 85 watts, and GPUs can consume anywhere from 200W to 500W. Alternatively, a typical single MCU consumes milliwatts, or microwatts, roughly 1000x less power consumption compared to CPU/GPU.


These AI optimized devices enable inferencing at low cost. The average MCU price in 2020 was $0.60 and ASPs continue to fall. Google’s deployment of TensorFlow Lite, an open-source deep learning framework for mobile/edge inference, provided market validation for TinyML.
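For a sense of what TinyML development looks like in practice, below is a minimal, illustrative sketch of TensorFlow Lite post-training INT8 quantization. The toy model, input shape, and random calibration data are placeholders rather than a specific production workload.

    import numpy as np
    import tensorflow as tf

    # Tiny stand-in model; a real TinyML workload would be a trained keyword-spotting
    # or vision model (the shapes here are illustrative).
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(49, 10, 1)),
        tf.keras.layers.Conv2D(8, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(4, activation="softmax"),
    ])

    def representative_data():
        # A handful of sample inputs lets the converter calibrate INT8 ranges.
        for _ in range(100):
            yield [np.random.rand(1, 49, 10, 1).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]           # enable quantization
    converter.representative_dataset = representative_data
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    tflite_model = converter.convert()
    with open("model_int8.tflite", "wb") as f:
        f.write(tflite_model)   # flatbuffer small enough to store in MCU flash

The resulting INT8 flatbuffer can then be executed on low-power devices with the TensorFlow Lite (or TensorFlow Lite Micro) interpreter.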

Embedded edge AI is attracting investment from both established semiconductor vendors and venture capital. Edge AI hardware design is often simpler as inference workloads don’t require the same raw computing power as training workloads. Accordingly, the less stringent compute requirement has allowed many startups to develop silicon on relatively older 28nm and 14nm process nodes. Additionally, since edge compute is utilized across a broad group of industries (finance, healthcare, retail, transportation, etc.) and endpoint applications (processing, computer vision, memory), we see companies developing niche product competencies.

Silicon IP, Custom Silicon

Silicon IP allows vendors to use pre-designed or custom IP cores/blocks to accelerate development of differentiated silicon solutions. ARM is the most notable example in the semiconductor industry, licensing its ARM cores across various low power endpoint devices. Many established semiconductor vendors (and startups) design custom ASIC silicon. Broadcom and Marvell boast two of the largest custom ASIC design teams. Google used Broadcom’s custom ASIC design to manufacture the Tensor Processing Unit (TPU), and Groq partnered with Marvell to develop the Tensor Streaming Processor (TSP).

Exhibit 71: Edge and Endpoint Compute, Embedded, Startups, IP

Edge Compute: Apple, Qualcomm, Samsung, Intel, NVIDIA, AMD, Broadcom, Marvell
Embedded: NXP, STMicro, Ambarella, Microchip, Renesas, Synaptics, MediaTek
Startups: BrainChip, Horizon Robotics, Gyrfalcon, Mythic, GreenWaves, Esperanto, Achronix, Blaize
IP Firms: ARM, Cadence, Rambus, CEVA

Source: IDC, Company Reports, Oppenheimer & Co. Estimates


Leading Public Companies Developing AI Silicon

Achronix Semiconductor— Founded in 2004, Achronix develops high-performance, standalone FPGAs and embedded FPGA (eFPGA) IP solutions. These include its flagship Speedster7t FPGA family (standalone devices), Speedcore eFPGA IP (embedded), VectorPath accelerator cards, and the Achronix tool suite. Achronix targets the high-end FPGA market (upwards of $10B market opportunity by 2025) and sees a broad opportunity for data acceleration and AI functionality. Accordingly, Achronix FPGAs and eFPGA IP solutions are well suited for applications in 5G infrastructure, AI and machine learning, ADAS, computational storage, SmartNIC, industrial and military, among others.

We see Speedster7t ASPs in the $1,000 range, with VectorPath cards in the $8,500 ASP range. Speedcore royalties result in much lower ASPs but typically are associated with very high volumes, with customers paying a larger up-front license fee. Currently, Achronix is in the process of going public via its recently announced SPAC combination with ACE Convergence Acquisition Corp. (Nasdaq: ACEV). The deal is expected to close in mid-2021.

The Speedster7t family is Achronix’s latest generation of standalone FPGAs. Designed on TSMC’s 7nm FinFET process technology, the Speedster7t family is optimized for AI/ML, compute acceleration and high-performance networking applications. Speedster7t devices feature high-speed, dedicated 2D network-on-chip (NoC) technology, and support GDDR6, PCIe Gen5, and 400G Ethernet interfaces with SerDes technology that operates up to 112 Gbps. Within the core fabric, Speedster7t devices include configurable machine learning processor (MLP) blocks that have up to 32 multipliers and support integer formats from 4-32 bits with a wide range of floating-point modes. Currently, Achronix has 26 customers sampling Speedster7t devices and VectorPath cards, and expects Speedster7t FPGAs to enter into full production in 2H21. Achronix is already working on its next generation (5nm) Speedster FPGA family, which will contribute meaningfully to the company’s growth in 2024. Management sees Speedster products as ~70% of its revenue mix long-term.

Achronix’s Speedcore eFPGA technology is delivered as an IP core that is embedded within a customer’s SoC or ASIC. When Speedcore IP is licensed, chip designers can choose how much logic, memory, and other common FPGA blocks are needed for their particular design. Following the up-front license fee, Achronix anticipates collecting royalty revenue on shipments of associated SoC/ASIC products. Achronix does not see large FPGA competitors (Intel/Xilinx) competing with its eFPGA solutions. While ASPs are lower, Speedcore gross margins are quite high (>90%) as Achronix does not bear the manufacturing costs. Management sees Speedcore IP as ~30% of its revenue mix long-term.

Achronix’s VectorPath accelerator cards, developed with BittWare, include one Speedster7t FPGA, 16 GB of GDDR6 memory (delivering 4 Tbps of GDDR6 bandwidth), 4 GB of DDR4 memory, 200G and 400G Ethernet optical ports, plus PCIe Gen5 support. The VectorPath card can achieve >80 TOPS of INT8 performance and is designed to handle a broad set of AI applications, including voice recognition, search, image recognition, recommendation engines and compute intensive financial analysis, among other use cases.


AMD—AMD has gradually gained ground on Intel in the server CPU market; however, Intel remains the dominant player. AMD x86 server share reached 7% in 2020 (per IDC; up from <1% in 2017), and we believe it could reach 15% exiting 2021. AMD’s 3rd generation EPYC CPU “Milan” began initial production/shipments in 4Q20 and officially launched in mid-March 2021. Milan is built on TSMC’s 7nm process node and is powered by AMD’s Zen 3 core architecture. The Milan CPU generates a ~19% increase in instructions-per-clock vs. its prior gen Rome CPU. Management also claims performance up to 106% faster than the competition in HPC and cloud applications, and up to 117% faster in enterprise applications. AMD’s 4th generation “Genoa” CPU is expected to be built on the 5nm process node with initial launch expectations for 2022. Early product specs leaked to the media (per tech blog Chips and Cheese; not confirmed by AMD) indicate Genoa performance potentially exceeding Milan by up to 29% like-for-like.

AMD’s flagship datacenter GPU is the Instinct MI100 accelerator (announced Nov. 2020). The MI100 is built on AMD’s 2nd generation 7nm CDNA architecture and is optimized for machine learning and high-performance compute applications. AMD suggests the MI100 delivers nearly a 7x increase in FP16 performance compared to AMD’s prior generation accelerators. The Radeon Instinct MI50 accelerator is built on AMD’s Vega 7nm architecture, and is designed for deep learning, high performance compute, cloud computing, and rendering system applications.

Beyond AMD’s existing CPU/GPU capabilities, AMD previously announced its planned all-stock acquisition of Xilinx (for 1.7234 shares of AMD stock per Xilinx share), which is expected to close by end-of-year 2021. Xilinx adds FPGA/SoC processor tech to AMD’s portfolio, broadens its datacenter reach, and opens new end market opportunities (more on Xilinx below).

Broadcom—Broadcom is a market leading provider of networking and connectivity solutions, with a notable presence in datacenter, telecom, and smartphone end markets (among many others). Specific to AI and compute acceleration, Broadcom offers a portfolio of SmartNICs/DPUs and custom ASICs. Broadcom’s Stingray SmartNIC offloads and accelerates critical infrastructure computations from the CPU, opening up compute bandwidth for application-related tasks. The Stingray SoC is built on a 16nm process node, and includes eight ARM Cortex A72 cores in addition to hardware accelerators for data flow, cryptographic security, storage processing, and PCIe connectivity. The Stingray is well suited for data-intensive applications including AI/ML. Notably, Broadcom announced in March 2020 that its Stingray 100G SmartNIC is powering Baidu Cloud Services.

In late 2020, Broadcom started sampling its first 5nm ASIC for datacenter and cloud infrastructure. Compared to the prior gen, the new ASIC provides a 2x increase in on-die training/inference compute, 2x higher bandwidth (112-Gbps SerDes), 2–4x higher memory bandwidth and up to a 30% reduction in power consumption. Broadcom is also in development of additional ASICs targeting AI, HPC, and 5G wireless infrastructure. Broadcom is also an early leader in silicon optics for high-speed DC connectivity.

Intel—Intel holds the dominant market position in the datacenter server processor market with nearly 90% CPU share. Specific to AI, CPUs are used more broadly for inference today, and Intel remains the leading player, although GPU/ASIC-based inference is slowly gaining ground. Beyond its strong CPU positioning, Intel also makes FPGAs and ASICs for AI workloads.

Intel’s 3rd Gen Xeon Scalable Processor, Ice Lake, officially launched in April 2021 (although Intel shipped more than 200K units pre-launch). Ice Lake is built on Intel’s 10nm process and is designed for single or dual socket systems, with up to 40 cores per socket.

Ice Lake generates ~46% average performance improvement over the prior gen, based on popular datacenter workloads. Ice Lake includes Intel’s DL Boost technology, providing built-in AI acceleration. When layered with software optimizations, Ice Lake delivers a 74% improvement in AI performance vs. the prior gen. Intel’s Deep Learning Boost (DL Boost) is an AI-specific instruction set that accelerates the compute of the CPU. DL Boost utilizes Vector Neural Network Instructions (VNNI), which use a single instruction for deep-learning computations that previously required three separate instructions. VNNI also uses INT8 data instead of FP32, which increases power efficiency by lowering compute/memory bandwidth requirements. INT8 inference produces significant performance benefits with minimal accuracy loss. Included within Intel’s 3rd Gen products is its 14nm Cooper Lake CPU, focused on specific four- and eight-socket systems and for specific AI workloads. Following Ice/Cooper Lake, Intel’s 4th Gen Sapphire Rapids, built on Intel’s 7nm process, is expected to launch in 1H22.
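To illustrate why INT8 inference loses little accuracy, here is a small, self-contained sketch of symmetric INT8 quantization applied to a dot product. It is illustrative only; it mimics the quantize, multiply-accumulate, and rescale flow that instructions like VNNI accelerate in hardware, not Intel's actual implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(256).astype(np.float32)   # activations
    w = rng.standard_normal(256).astype(np.float32)   # weights

    def quantize(v):
        scale = np.max(np.abs(v)) / 127.0              # symmetric per-tensor scale
        return np.round(v / scale).astype(np.int8), scale

    xq, sx = quantize(x)
    wq, sw = quantize(w)

    fp32_result = float(np.dot(x, w))
    # INT8 multiplies accumulate into INT32, then rescale back to real units
    int8_result = int(np.dot(xq.astype(np.int32), wq.astype(np.int32))) * sx * sw
    print(fp32_result, int8_result)    # the two results typically agree closely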

Intel acquired Altera in 2015 for $16.7B, adding FPGA capabilities to its product portfolio. Intel’s Stratix 10 NX FPGA is its first AI optimized FPGA chip and is currently in production. Built on Intel’s 14nm process node, the Stratix 10 NX realized up to 15x higher INT8 compute performance vs. its prior Stratix 10 MX for AI workloads. The Stratix is targeted for AI workloads requiring numerous variables to be evaluated in real-time across multiple nodes and is well suited for natural language processing and financial fraud applications. Intel’s next-gen Agilex FPGA, based on 10nm, boasts increased performance/watt compared to the Stratix 10 NX.

To further augment its AI chip solutions, Intel acquired Habana Labs in December 2019 for $2B. Habana Labs is a developer of datacenter AI processors optimized for training deep neural networks and for inference workloads. Habana’s Gaudi deep learning (training) processor features eight programmable Tensor Processing Core (TPC) 2.0 cores and is compatible with a broad array of development tools, data types, and libraries. A single Gaudi card running a TensorFlow ResNet50 model delivers 1,590 images per second of training throughput, with eight Gaudi cards achieving 12,008 images per second. Notably, in December 2020, AWS announced it had chosen Habana’s eight-card Gaudi processor solution for its EC2 training instances. Further, AWS executives suggest Gaudi provides ~40% better price performance than current GPU-based EC2 instances.
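Those throughput figures imply near-linear scaling across cards; quick arithmetic on the numbers above:

    # Gaudi ResNet50 training throughput scaling (figures from the text above)
    single_card = 1_590        # images/sec, one Gaudi card
    eight_cards = 12_008       # images/sec, eight Gaudi cards
    efficiency = eight_cards / (8 * single_card)
    print(f"Scaling efficiency: {efficiency:.0%}")   # ~94% of perfect linear scaling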

Habana’s Goya inference processor boasts superior performance with lower latency, power efficiency, and cost savings for cloud, DC, and emerging use cases. Goya also includes eight TPC programmable cores. In November 2020, Habana published Goya ResNet50 performance results of 15,488 images per second at 0.8ms latency.

Intel completed its acquisition of Mobileye in 2017. Mobileye augmented Intel’s capabilities in automated driving solutions and gave Intel inroads to the burgeoning AI-auto opportunity, which we expect to scale further as semiconductor dollar content continues to rise in new vehicles. Mobileye is utilized by over 25 global auto OEMs, is present in over 60M vehicles worldwide, and is designed into over 300 car models.

Marvell—Marvell provides a broad range of data infrastructure products, including networking and storage solutions, for datacenter, enterprise, wireless infrastructure, and automotive end markets. Marvell has built a top merchant ASIC franchise, with >2K custom ASICs developed over the last 25 years. Marvell’s ARM-based OCTEON processor platform powers its data processing unit (DPU) and wireless network infrastructure offerings. Marvell sees opportunity for DPU acceleration across its end markets with applications including SmartNIC offloading, security, video transcoding and storage virtualization, among others. Its 5nm product portfolio delivers ~20% faster speeds with ~40% power reduction vs. its prior gen 7nm portfolio. Marvell is currently engaged with several datacenter and automotive OEMs to develop ASIC solutions. Notably, Groq partnered with Marvell to develop its Tensor Streaming Processor. With its recent acquisition of Inphi (IPHI), we see MRVL well positioned in the emerging silicon optics/high speed DC connectivity market.

NVIDIA—The AI accelerator 800lb gorilla, NVIDIA has built an early AI scale advantage, utilizing the parallel processing abilities of its GPUs to accelerate AI computations. NVIDIA continues to advance its solution set and estimates a datacenter AI accelerator market opportunity of $100B by 2024. This includes $45B from hyperscalers, $30B from enterprise, $15B from edge, and $10B from HPC. NVIDIA has built significant market share in AI training (~99%) and is making meaningful progress on AI inference (we estimate ~20% share today). Notably, NVIDIA’s A100 GPU, built on its 7nm Ampere architecture, is designed to better facilitate both training and inference workloads, a concept we see gaining traction with other providers. GPU cloud compute capacity already exceeds CPU for inference, and we see A100 accelerating inference share gains vs. CPU as new workloads proliferate.

Exhibit 72: Total Cloud AI Inference Compute Capacity (Peta Ops)

Source: NVIDIA Fall 2020 GTC Presentation

The A100 is NVIDIA’s 3rd generation Tensor Core GPU and offers 312 TFLOPS of performance, a 20x improvement over NVIDIA’s prior generation V100 (Volta) GPU. NVIDIA also provides a more robust integrated AI system called the DGX A100, utilizing eight A100 GPUs. The DGX A100 delivers five PetaFLOPS of performance in a single system, supports a broad array of software stacks (including Spark 3.0, RAPIDS, TensorFlow, PyTorch, and Triton), is optimized for elastic scale-up or scale-out computing, and is highly expandable via Mellanox networking. The new system provides a significant upgrade over existing GPU/CPU-based systems for training/inference. For perspective, NVIDIA suggests that an existing system consisting of 50 DGX-1 nodes (for training) and 600 CPU nodes (for inference) can be replaced by just five DGX A100 nodes (for training and inference) at 1/10th of the cost and only 1/20th the power consumption. The A100 is also a “multi-instance GPU,” meaning cloud providers can split the processing capabilities of an individual A100 amongst multiple customers. This is useful as not all customers purchasing cloud-based AI instances require the full A100 capability. This allows for more efficient use of the chip’s overall compute capacity.


Beyond its robust GPU portfolio, NVIDIA offers DPUs and more recently announced its first CPU offering. In mid-April, NVIDIA announced its next gen Bluefield 3 DPU. Bluefield 3 is NVIDIA’s DC infrastructure processor/accelerator, designed to offload critical infrastructure management workloads, freeing up CPU bandwidth for application specific processes. Bluefield 3 provides a 10x performance boost vs. Bluefield 2. NVIDIA also provided an early look at its Bluefield 4 DPU, suggesting it can achieve 1,000 TOPS of compute performance. Also in mid-April, NVIDIA announced its ARM-based Grace CPU (due 2023), initially targeted at heavy compute HPC/hyperscale AI (e.g., Training). Grace allows NVIDIA to tap into a new $25B opportunity. When coupled with GPU, management expects 10x performance vs. today’s x86-based DGX systems.

Software is a critical element to the functionality and efficiency of accelerated compute. NVIDIA’s CUDA platform has become foundational for many of the world’s fastest computers. CUDA is a parallel computing platform and programming model, enabling developers to speed up compute-intensive applications utilizing NVIDIA’s GPUs. Management estimates 6M CUDA downloads in 2020 (with 20M downloads to-date) and estimates 2.3M developers on the platform. NVIDIA also provides a software SDK for its Bluefield DPU products, called DOCA 1.0. We view DOCA’s relationship to the DPU as similar to CUDA’s with the GPU.
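As a concrete illustration of the CUDA programming model, here is a minimal sketch of a data-parallel kernel. NVIDIA's own tooling is C/C++-first; this version uses the third-party Numba library's CUDA bindings and assumes a CUDA-capable GPU and the numba package are available.

    import numpy as np
    from numba import cuda

    @cuda.jit
    def vector_add(a, b, out):
        i = cuda.grid(1)            # absolute index of this GPU thread
        if i < a.shape[0]:          # guard against threads past the array end
            out[i] = a[i] + b[i]

    n = 1_000_000
    a = np.random.rand(n).astype(np.float32)
    b = np.random.rand(n).astype(np.float32)
    out = np.zeros_like(a)

    threads_per_block = 256
    blocks = (n + threads_per_block - 1) // threads_per_block
    vector_add[blocks, threads_per_block](a, b, out)   # Numba handles host/device copies

The same kernel-plus-grid pattern underpins far larger workloads; frameworks such as TensorFlow and PyTorch invoke comparable GPU kernels (via cuDNN and related CUDA libraries) on developers' behalf.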

In the datacenter, NVIDIA has already built a strong AI customer base. NVIDIA counts all the largest cloud hyperscalers as customers, including Alibaba Cloud, AWS, Baidu AI Cloud, Google Cloud, Microsoft Azure, Oracle Cloud, Tencent Cloud, and IBM Cloud. NVIDIA supplies eight of the ten fastest supercomputers in the world with GPUs, networking, or both, including the No. 1 systems in both the US and Europe. NVIDIA also supplies a multitude of enterprise customers including HP, Lenovo, and Fujitsu.

NVIDIA continues to develop its Edge compute applications, including its Drive platform (for automotive), Jarvis Platform (for conversational AI), Clara platform (for healthcare), Isaac platform (for robotics), and its Merlin recommendation system. In addition to hardware offerings, NVIDIA provides pre-trained models (available in the cloud from its NGC software catalog) for its various Edge verticals. While each vertical is gaining traction, NVIDIA sees recommendation systems as its most important Edge pipeline today. The pending strategic acquisition of ARM, set to close next year, bolsters NVIDIA’s mobile reach at the edge.

In our view, NVIDIA continues to be a thought leader. Within the semiconductor space, NVIDIA was one of the first to recognize the AI market opportunity and notably pivoted its offerings to address the market more quickly than peers. NVIDIA has continually demonstrated performance leadership on industry-standard benchmarks like MLPerf (both inference and training), and publishes large volumes of performance data on its developer web site. While NVIDIA has an inherent scale advantage, it is facing increasing competition from ASICs, FPGAs, and co-processors. That said, we expect NVIDIA to maintain dominant share of GPU accelerators for the foreseeable future, as it further scales in DCs and as the edge market proliferates.

NXP—NXP provides a suite of products to enable AI machine learning at the edge. Its EdgeVerse platform offers a diverse portfolio of AI-optimized application processors. Coupled with wireless connectivity and secure authentication solutions, NXP provides a path for accelerating ML applications in the automotive, industrial, and IoT markets. Led by the new i.MX 9 series, NXP’s application processors integrate dedicated neural processing units (NPU) for ML applications (graphics, image, display, audio, and voice) across a range of edge devices. i.MX 9 features its Energy Flex architecture to optimize energy efficiency and EdgeLock for enhanced security. The first chips will be manufactured using 16/12nm FinFET process technology.


NXP’s EdgeVerse platform also includes i.MX RT Crossover MCUs, which combine an ARM Cortex-M core with high performance DSP cores. Additionally, its S32 platform of MCUs/MPUs targets automotive connectivity, security, and safety applications, among others. NXP’s eIQ ML software development platform eases deployment of inference algorithms across its MCUs and i.MX family of SoCs. NXP was the No. 1 MCU market player in 2020.

Qualcomm—Qualcomm was foundational in the development of mobile chip platforms, utilizing ARM-based chip designs to manage power efficiency. With a long history of mobile device success, Qualcomm is broadening its energy-efficient portfolio to additional AI use cases. Qualcomm’s technology is well suited to edge AI applications and should continue to see rapid growth as IoT use cases ramp. Accordingly, QCOM sees AI attach rates eventually reaching 100% across all edge products. Automotive appears another natural extension via its Snapdragon Ride platform and should contribute incrementally as the overall auto semi market continues to scale. Qualcomm also provides AI accelerators for datacenters and 5G base stations, among other applications, where customers increasingly demand high levels of compute at a lower power footprint.

Qualcomm’s flagship mobile platform, the Snapdragon 888 (which we suspect powers most 2021 Android flagship handsets), brings incremental gen/gen performance via its 6th-generation AI Engine and Hexagon 780 processor. The new architecture removes the physical distance between the scalar, vector, and tensor accelerators, fusing them into a single large accelerator. Combined with 16x larger shared memory, more powerful underlying accelerators, and better GPU performance, the architecture drives incrementally better AI performance. Accordingly, the Snapdragon 888 boasts max performance of 26 TOPS (vs. ~15 TOPS for the Snapdragon 865), with performance/watt up 3x vs. the prior generation. QCOM’s mobile platforms power multiple AI-based experiences, including enhanced photography, gaming, and real-time AI voice translation, all at low power.

In Auto, Qualcomm launched its Snapdragon Ride platform in January 2020 and recently expanded its product roadmap. Snapdragon Ride supports a broad range of ADAS and autonomous driving applications, from Level 1 ADAS solutions with 10 TOPS of performance to Level 4 fully autonomous driving solutions with 700-plus TOPS. Additionally, Snapdragon Ride offers a broad software ecosystem, with several available stacks supporting vision perception, parking, and driver monitoring.

Qualcomm’s Cloud AI 100 accelerator is built on a 7nm process node and targets inference workloads. The Cloud AI 100 can handle a wide range of use-cases from datacenter to 5G infrastructure and even edge applications. The Cloud AI 100 achieves performance up to 400 TOPS with up to 75W of power consumption. QCOM views the Cloud AI 100 as an ASIC/co-processor and is agnostic to CPU integrations. That said, QCOM is partnering with AMD/Gigabyte on an AI inferencing platform. The Gigabyte server system includes two AMD EPYC 7003 (Milan) processors and up to 16 QCOM Cloud AI 100 accelerators. Combined, one server can generate up to 6.4 PetaOPS of performance with a 19-plus card server rack generating >100 PetaOPS of performance. Additionally, one Cloud AI 100 can process ~19K images/second on ResNet50, scaling up to ~6M images/second with a full rack system.
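The quoted 6.4 PetaOPS per-server figure follows directly from the card count and per-card peak; a quick back-of-the-envelope check (assuming all 16 cards run at their 400 TOPS INT8 peak):

    # Sanity check of the quoted per-server figure (assumes 16 cards at 400 TOPS INT8 peak each).
    cards_per_server = 16
    tops_per_card = 400                                  # peak INT8 TOPS per Cloud AI 100
    server_petaops = cards_per_server * tops_per_card / 1000
    print(server_petaops)                                # 6.4 PetaOPS per server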

Xilinx—Xilinx provides FPGA and programmable SoC products for applications including artificial intelligence and compute acceleration. Xilinx’s Adaptive Compute Acceleration Platform (ACAP) is a multi-core compute platform that is dynamically customizable for both hardware and software. Xilinx’s Versal AI Core series ACAP provides multiple compute engines, including scalar, adaptable, and intelligent engines. Utilizing its integrated AI engines, the Versal AI Core series is geared toward AI inference and wireless acceleration, with Xilinx suggesting more than 100x compute performance improvement versus existing server-class CPUs.

Xilinx’s Alveo line of datacenter accelerator cards is designed for parallel processing applications utilizing large data sets. Accordingly, these accelerators are well suited to activities such as video transcoding, financial analysis, genomics, and machine learning, among others. Xilinx also offers an AI development environment, Vitis, supporting mainstream frameworks including Caffe, PyTorch, and TensorFlow. Vitis is optimized for AI inference on Xilinx hardware, spanning both edge and datacenter applications.


Leading Startup Companies Developing AI Silicon

Blaize—Blaize, founded in 2010, is an AI startup focused on edge compute workloads. Blaize’s Graph Streaming Processor (GSP) is a fully programmable, graph-native architecture delivering high performance and scalability with low power consumption. Its 2nd-gen GSP, El Cano, generates 16 TOPS of INT8 compute at typical power consumption of ~7W. Blaize’s edge compute platforms, Pathfinder and Xplorer, in combination with its software suite (including the Blaize Picasso SDK and AI Studio), allow developers to build edge AI applications across a wide array of end markets. Highlighted use cases include smart applications (e.g., retail, city, manufacturing, vision), automotive, robotics, security, and edge servers.

Cerebras Systems—Cerebras Systems has architected the largest AI processor to date, the Wafer Scale Engine (WSE). Now in its 2nd generation, Cerebras’ WSE-2 is 56x larger than the largest competing GPU, containing 2.6T transistors, 850,000 cores, and 40GB of on-chip memory. The chip is built on TSMC’s 7nm process and features 20PB/s of memory bandwidth, all significant improvements over the 1st-generation WSE-1. The WSE-2 is designed for sparsity, essentially eliminating computations on zeros in the neural network. In addition to sparsity, Cerebras sees its high core count and memory-processor proximity as competitive advantages, offering exceptional performance at lower latency and energy consumption. Cerebras’ datacenter solution, the CS-2, contains a single WSE-2 and is seeing a more than 2x improvement in training time on a BERT-style network. For perspective, scientists at the US National Energy Technology Laboratory (NETL) created a simulation testing the prior-gen CS-1 against the Joule supercomputer. The CS-1 completed the simulation 200x faster than Joule, which utilizes 84,000 CPU cores and consumes 450kW of power.
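For illustration, the snippet below shows the general idea behind sparsity-aware compute: multiplications against zero-valued weights contribute nothing and can be skipped entirely. It uses SciPy's CSR sparse format as a software stand-in and is not a representation of Cerebras' hardware scheme.

    # General illustration of exploiting sparsity in software; not Cerebras' implementation.
    import numpy as np
    from scipy.sparse import csr_matrix

    rng = np.random.default_rng(0)
    mask = rng.random((1024, 1024)) > 0.9            # keep only ~10% of weights non-zero
    dense_w = rng.standard_normal((1024, 1024)) * mask
    x = rng.standard_normal((1024, 1))

    sparse_w = csr_matrix(dense_w)                   # stores only the non-zero weights
    y = sparse_w @ x                                 # multiply-accumulate only where weights != 0
    assert np.allclose(y, dense_w @ x)               # same result as the dense computation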

Cerebras’ software platform also integrates with widely used machine learning frameworks/libraries, including TensorFlow and PyTorch. The CS-2 can be scaled out for greater combined performance, and given the size and commensurate compute power of each system, it takes fewer CS-2s than GPU-based systems to reach a given level of compute. Additionally, datacenter integration is more efficient, and Cerebras claims higher utilization with distributed training across CS-2 nodes. Cerebras most recently raised $276M in a November 2019 Series E round, with ~$475M raised to-date.

EdgeCortix—EdgeCortix, founded in mid-2019, is a fabless semiconductor design company headquartered in the United States, with teams located in both Japan and the US. The company is focused on bringing intelligence to edge devices and servers with its hardware/software co-design platform, proprietary compiler, and AI hardware intellectual property (IP). The company’s edge-AI acceleration engine aims to fine-tune AI model deployment, bringing cloud-level performance to edge applications. EdgeCortix’s Dynamic Neural Accelerator (DNA) architecture is runtime reconfigurable, scalable, and power-efficient, and is designed for low-latency AI inference at the edge. DNA, coupled with its proprietary MERA software compiler, allows engineers to optimize machine-learning-specific processor development, reducing time to market and driving cost savings. The EdgeCortix DNA IP series is designed to scale across different technology nodes and, together with the MERA compiler, can be implemented on ASIC systems-on-chip or off-the-shelf FPGAs; it is also optimized for small-batch or batch-size-1 inference. Recently, EdgeCortix announced a partnership with PALTEK to integrate its solution with Xilinx Alveo FPGA accelerator cards. The card leverages EdgeCortix’s latest Dynamic Neural Accelerator DNA-F400 hardware IP and MERA compiler, bringing efficient AI inference capabilities. Management suggests the card delivers ~7 INT8 TOPS at 300MHz on the Alveo U50 and ~17 INT8 TOPS on the Alveo U250. The company is currently finalizing a demonstration ASIC system-on-chip using its DNA IP for production, to be fabricated by TSMC (Taiwan Semiconductor Manufacturing Company) on a 12nm technology node. Based on a recent leading semiconductor analyst report, the chip is expected to deliver 54 INT8 TOPS of compute at under 10 watts, with class-leading efficiency and latency for AI inference.

Flex Logix—Flex Logix, founded in 2014, is a provider of edge inference processors and embedded FPGA (eFPGA) IP cores. Flex Logix’s InferX X1 inference co-processor is optimized for large models, processing megapixel images at a batch size of one. Management suggests the InferX X1 generates 7.5 TOPS of INT8 compute yet is able to outperform accelerators with many more TOPS, because the InferX architecture focuses on maximizing utilization of its compute resources (efficiency), resulting in an inference throughput/$ advantage of 10-100x depending on the workload. InferX has a reconfigurable architecture, enabling it to reconfigure the data paths between its numerous tensor processor units and optimize itself for each layer of a neural network. The InferX architecture leverages intellectual property first developed for embedded FPGA. Flex Logix has a six-year history of developing eFPGA cores, with commercial EFLX eFPGA products spanning 180nm to 12nm processes, and is developing its next-gen eFPGA on TSMC’s 7/6nm process.
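To illustrate the utilization argument with purely hypothetical numbers (none of these are vendor-measured figures), delivered throughput is peak TOPS multiplied by the fraction of the compute array actually kept busy, so a modest-TOPS part with high utilization can match a much larger accelerator that sits mostly idle at small batch sizes:

    # Hypothetical, illustrative numbers only; the point is effective_tops = peak_tops * utilization.
    peak_tops_small, util_small = 7.5, 0.70      # assumed high-utilization design
    peak_tops_large, util_large = 100.0, 0.05    # assumed low utilization at batch size 1
    print(peak_tops_small * util_small)          # ~5.3 effective TOPS
    print(peak_tops_large * util_large)          # ~5.0 effective TOPS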

Graphcore—Founded in 2012, UK-based Graphcore is an AI processor startup developing the Intelligence Processing Unit (IPU). The IPU is a parallel processing platform optimized for AI and machine learning workloads and can be used for both training and inference, or what Graphcore describes as “machine intelligence.” Graphcore’s second-generation Colossus MK2 GC200 IPU (released in July 2020) is built on TSMC’s 7nm process and contains 59.4B transistors and 1,472 processor cores. The MK2 offers 250 TFLOPS of FP16 performance, a ~3-4x increase over its prior-generation MK1 processor. Scaling beyond the individual MK2, Graphcore’s IPU-M2000 datacenter solution includes four MK2 IPUs and delivers one PFLOP of FP16 compute. The M2000 fits in a 1U server blade, which eases scalability.

In late December 2020, Graphcore raised $220M in a Series E funding round, bringing total cash to approximately $440M and its valuation to nearly $3B. The cash was earmarked for incremental technology investment in addition to general balance sheet strengthening ahead of a potential public listing. Having raised in excess of $700M exiting 2020, Graphcore is one of the best-capitalized AI startups.

Groq—Groq, founded in 2015 and based in Mountain View, CA, is an AI accelerator startup that designed the Tensor Streaming Processor (TSP), branded as GroqChip. Inference is Groq’s initial target application; however, management sees the GroqChip accelerator eventually used for both training and inference workloads. The first-gen chip is built on a 14nm process node with ~27B transistors and 220MB of on-die SRAM. Groq approached its design with a software-first mindset, creating a simpler processor architecture and allowing software to control/manipulate the most complex allocations of compute resources. In lieu of a multi-core architecture, the GroqChip includes a single large processor core with hundreds of supporting functional units. The compiler knows exactly how long each computation takes to complete, allowing for more efficient data flow and better overall processor performance; it also limits the additional tuning needed for an algorithm to meet hardware requirements.


The GroqChip delivers deterministic performance, producing exactly the same result (output and time to process) each time a specific model is run. The chip can also process at batch size one, which is well suited to workloads requiring real-time responsiveness (e.g., autonomous driving). Groq sees this as a notable advantage over GPU-based systems, as GPUs tend to see higher latency when processing at lower batch sizes.

From a performance perspective, Groq claims its chip can deliver up to 250 TFLOPS of FP16 performance and one PetaOPS of INT8 performance. Groq also offers a scalable server solution called GroqNode, with each node containing eight GroqCards (each card housing one GroqChip). A GroqNode can deliver six PetaOPS of INT8 and 1.5 PFLOPS of FP16 performance at 3.3kW of power consumption. Groq completed a $300M Series C funding round in April 2021, bringing its total capital raised to $367M, and is currently developing its next-gen AI processor.

Mythic—Mythic takes a unique approach to AI inference, having developed an analog compute-in-memory architecture. The Mythic M1108 Analog Matrix Processor (AMP) integrates 108 AMP tiles (each with an analog compute engine) and can store up to 113M weight parameters without requiring external memory. The M1108 delivers peak performance of 35 TOPS with low latency and typical power consumption of ~4W at peak throughput. With the lower cost of its 40nm process and no external memory (which limits the latency and power consumption of data movement), Mythic claims up to a 10x cost advantage over comparable architectures.

From a software perspective, Mythic is compatible with existing AI ecosystems including ONNX and TensorFlow. Mythic’s SDK includes an optimization suite that converts trained neural network weights from 32-bit floating point to 8-bit integer format (better suited for edge applications) and makes the data compatible with analog compute-in-memory. The SDK also includes a graph compiler, which automatically converts the neural network graph into machine code to run on Mythic’s processor. The M1108 is well suited to a wide range of edge applications, including smart home, drones, video, smart city, and factory automation, among others.
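As a rough sketch of the underlying technique (generic post-training affine quantization, not Mythic's actual SDK), converting FP32 weights to INT8 amounts to scaling them into the signed 8-bit range and rounding:

    # Generic post-training symmetric quantization sketch (FP32 -> INT8); not Mythic's SDK.
    import numpy as np

    def quantize_int8(w):
        scale = np.abs(w).max() / 127.0                       # map the max magnitude to 127
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(256, 256).astype(np.float32)          # stand-in for trained FP32 weights
    q, s = quantize_int8(w)
    print("max abs error:", np.abs(dequantize(q, s) - w).max())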

SambaNova Systems—SambaNova was founded in 2017 and is developing a full-stack, software-defined hardware approach to AI computing. SambaNova’s processor, the Cardinal SN10 Reconfigurable Dataflow Unit (RDU), is built on TSMC’s 7nm process with 40B transistors and does not have a designated instruction set. Instead, the RDU prioritizes optimal data flow through the processor, minimizing bottlenecks and preventing the excess caching and data movement found in existing core-based architectures (i.e., CPUs/GPUs). This frees up processor bandwidth for incremental compute. The RDU is programmable to meet model specifications, creating better processor optimization, higher throughput, and lower latency.

SambaNova places significant attention on its software stack, SambaFlow. SambaFlow is designed to operate with a user’s new or existing models and automatically optimizes the RDU to process a model in the most efficient way, eliminating the need for excess model optimization/tuning. SambaFlow also integrates with existing machine learning frameworks, including PyTorch and TensorFlow. As researchers scale workloads out across multiple RDUs, SambaFlow automates that interaction to efficiently utilize the available processing capability.
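The general workflow such software stacks target looks like the sketch below: a standard, unmodified PyTorch model is exported to a framework-neutral graph that an accelerator compiler can then map onto its hardware. This only illustrates the pattern using ONNX; SambaFlow's actual entry points and APIs differ.

    # Generic "bring your existing model" pattern via ONNX export; not SambaFlow's API.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    example_input = torch.randn(1, 128)
    torch.onnx.export(model, example_input, "model.onnx",
                      input_names=["x"], output_names=["y"])
    # An accelerator toolchain would consume model.onnx and handle mapping/optimization from here.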

SambaNova’s DataScale system is a rack-based datacenter accelerator system integrating its RDU and software technologies. One DataScale node includes eight Cardinal SN10 RDUs and fits into a quarter of a datacenter rack. DataScale can include one or more nodes and incorporates integrated networking and management infrastructure. SambaNova views its full-stack solution as a competitive advantage, noting that new processor technology alone is not enough to fully tackle the compute challenges inherent in AI today. As the lines continue to blur between distinct training and inference workloads, SambaNova’s solution can effectively bridge that gap. DataScale is well suited to a broad range of workloads, including AI training, AI inference, data & analytics, and high-performance compute. DataScale also enables easy integration into existing datacenter infrastructure and can begin processing customer workloads within about 45 minutes out of the box.

As the need for AI-based compute grows, companies often lack the technical expertise or employee bandwidth to effectively manage their own AI infrastructure. To address this, SambaNova created Dataflow-as-a-Service, a monthly subscription-based service giving customers access to language, recommendation, and vision solutions. Dataflow-as-a-Service delivers performance and accuracy in line with SambaNova’s DataScale platform and provides similarly fast customer onboarding. SambaNova is very well funded, with $1.1B raised to-date following its $676M Series D capital raise in April 2021.

SiFive—SiFive is a leading provider of RISC-V based technology and was founded by the inventors of the RISC-V instruction set architecture. SiFive’s core IP portfolio ranges broadly from high-performance multi-core processors to smaller, low-power embedded microcontrollers. SiFive also offers a broad suite of software development tools.

SiFive’s Intelligence platform combines hardware and software, creating a solution for energy-efficient inference acceleration. Intelligence includes SiFive’s RISC-V core IP, RISC-V Vector (RVV) support, and various software tools and hardware extensions. Intelligence is programmable, scalable, and configurable to meet a broad range of AI requirements; it is compatible with many of the most popular ML models and supports the full capability of TensorFlow Lite. The SiFive Intelligence X280 RISC-V processor, with vector/Intelligence extensions, is specifically designed for AI/ML edge applications.
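Because the platform advertises full TensorFlow Lite support, a developer targeting it could start from the standard TFLite post-training quantization flow shown below. These are generic TensorFlow APIs with placeholder paths and a synthetic calibration set; nothing here is SiFive-specific tooling.

    # Standard TensorFlow Lite post-training INT8 conversion flow (placeholder paths/model).
    import numpy as np
    import tensorflow as tf

    def representative_data():
        # Synthetic calibration samples; a real workflow would use real input data.
        for _ in range(100):
            yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # placeholder path
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    tflite_model = converter.convert()
    open("model_int8.tflite", "wb").write(tflite_model)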

Notably, Tenstorrent recently (April 2021) announced it will license SiFive’s Intelligence X280 processor IP and include it within its most recent AI training/inference processor. SiFive is partnering with Renesas to bring high-end RISC-V solutions to automotive applications (including AI capabilities). SiFive also extended its partnership with Samsung Foundry to accelerate the development of AI/ML inference/training SoCs based on RISC-V.

Tenstorrent—Tenstorrent, founded in 2016, is an AI startup headquartered in Toronto, Canada. As AI models become larger and more complex, Tenstorrent is addressing a growing need for programmable, efficient AI processors better able to meet the computational requirements of existing and new models. Management suggests its processor technology achieves higher compute efficiency through conditional computing and dynamic sparsity handling (eliminating unnecessary computations and irrelevant results), breaking the linear relationship between model size and required compute.

Tenstorrent’s Grayskull processor, built on GlobalFoundries’ 12nm process, was designed predominantly for inference applications and is dynamically adaptable to the exact inputs of a model. Grayskull includes 120 Tensix cores, each with five RISC cores, one compute engine, and 1MB of local SRAM. In total, Grayskull has 120MB of local SRAM and supports up to 16GB of external DRAM. Grayskull can deliver 368 TOPS of INT8 performance at 65 watts of power consumption. Tenstorrent’s next-gen Wormhole processor (currently in the lab) is expected to begin sampling in mid-2021; Wormhole targets AI training and delivers ~2x better performance vs. Grayskull.


Stock Prices of Other Companies Mentioned in This Report (as of 6/3/21):

CEVA, Inc. (CEVA, $43.45, Not Covered)
International Business Machines Corporation (IBM, $145.55, Not Covered)
MediaTek Inc (2454-TW, NT$981, Not Covered)
Microchip Technology Incorporated (MCHP, $151.71, Not Covered)
Rambus Inc. (RMBS, $19.24, Not Covered)
Renesas Electronics Corporation (6723-JP, ¥1268, Not Covered)
Samsung Electronics Co., Ltd. (005930-KR, KRW82800, Not Covered)
STMicroelectronics NV (STM-FR, €30.13, Not Covered)
Taiwan Semiconductor Manufacturing Co., Ltd. Sponsored ADR (TSM, $116.82, Not Covered)
Xilinx, Inc. (XLNX, $125.83, Not Covered)


Disclosure Appendix

Oppenheimer & Co. Inc. does and seeks to do business with companies covered in its research reports. As a result, investors should be aware that the firm may have a conflict of interest that could affect the objectivity of this report. Investors should consider this report as only a single factor in making their investment decision.

Analyst Certification - The author certifies that this research report accurately states his/her personal views about the subject securities, which are reflected in the ratings as well as in the substance of this report. The author certifies that no part of his/her compensation was, is, or will be directly or indirectly related to the specific recommendations or views contained in this research report. Potential Conflicts of Interest: Equity research analysts employed by Oppenheimer & Co. Inc. are compensated from revenues generated by the firm including the Oppenheimer & Co. Inc. Investment Banking Department. Research analysts do not receive compensation based upon revenues from specific investment banking transactions. Oppenheimer & Co. Inc. generally prohibits any research analyst and any member of his or her household from executing trades in the securities of a company that such research analyst covers. Additionally, Oppenheimer & Co. Inc. generally prohibits any research analyst from serving as an officer, director or advisory board member of a company that such analyst covers. In addition to 1% ownership positions in covered companies that are required to be specifically disclosed in this report, Oppenheimer & Co. Inc. may have a long position of less than 1% or a short position or deal as principal in the securities discussed herein, related securities or in options, futures or other derivative instruments based thereon. Recipients of this report are advised that any or all of the foregoing arrangements, as well as more specific disclosures set forth below, may at times give rise to potential conflicts of interest. Important Disclosure Footnotes for Companies Mentioned in this Report that Are Covered by Oppenheimer & Co. Inc: Stock Prices as of June 3, 2021 Aeva Inc. (AEVA - NYSE, $10.19, OUTPERFORM) Analog Devices (ADI - NASDAQ, $162.07, OUTPERFORM) Alphabet Inc. (GOOG - NASDAQ, $2,404.61, OUTPERFORM) Amazon.Com, Inc. (AMZN - NASDAQ, $3,187.01, OUTPERFORM) Netflix, Inc. (NFLX - NASDAQ, $489.43, OUTPERFORM) Facebook, Inc. (FB - NASDAQ, $326.04, OUTPERFORM) Intel Corp. (INTC - NASDAQ, $56.24, PERFORM) NVIDIA Corp. (NVDA - NASDAQ, $678.79, OUTPERFORM) (AMD - NYSE, $80.28, PERFORM) Broadcom Ltd. (AVGO - NYSE, $464.80, OUTPERFORM) Marvell Technology Group (MRVL - NASDAQ, $47.18, OUTPERFORM) Apple Inc. (AAPL - NASDAQ, $123.54, OUTPERFORM) QUALCOMM Incorporated (QCOM - NASDAQ, $131.78, PERFORM) Alibaba Group Holding Ltd. (BABA - NYSE, $217.04, OUTPERFORM) Baidu.com, Inc. (BIDU - NASDAQ, $189.97, OUTPERFORM) Microsoft Corporation (MSFT - NASDAQ, $245.71, OUTPERFORM) Tencent Holdings Ltd. (TCEHY - OTC PK, $78.35, OUTPERFORM) Maxim Integrated Products (MXIM - NASDAQ, $100.19, PERFORM) Aptiv plc (APTV - NYSE, $155.99, OUTPERFORM) Tesla, Inc. (TSLA - NASDAQ, $572.84, OUTPERFORM) Ambarella Inc. (AMBA - NASDAQ, $99.42, PERFORM) Synaptics, Incorporated (SYNA - NASDAQ, $129.26, OUTPERFORM) CEVA Inc. (CEVA - NASDAQ, $43.45, OUTPERFORM)


All price targets displayed in the chart above are for a 12- to 18-month period. Prior to March 30, 2004, Oppenheimer & Co. Inc. used 6-, 12-, 12- to 18-, and 12- to 24-month price targets and ranges. For more information about target price histories, please write to Oppenheimer & Co. Inc., 85 Broad Street, New York, NY 10004, Attention: Equity Research Department, Business Manager.

Oppenheimer & Co. Inc. Rating System as of January 14th, 2008: Outperform(O) - Stock expected to outperform the S&P 500 within the next 12-18 months. Perform (P) - Stock expected to perform in line with the S&P 500 within the next 12-18 months. Underperform (U) - Stock expected to underperform the S&P 500 within the next 12-18 months. Not Rated (NR) - Oppenheimer & Co. Inc. does not maintain coverage of the stock or is restricted from doing so due to a potential conflict of interest. Oppenheimer & Co. Inc. Rating System prior to January 14th, 2008: Buy - anticipates appreciation of 10% or more within the next 12 months, and/or a total return of 10% including dividend payments, and/or the ability of the shares to perform better than the leading stock market averages or stocks within its particular industry sector. Neutral - anticipates that the shares will trade at or near their current price and generally in line with the leading market averages due to a perceived absence of strong dynamics that would cause volatility either to the upside or downside, and/ or will perform less well than higher rated companies within its peer group. Our readers should be aware that when a rating change occurs to Neutral from Buy, aggressive trading accounts might decide to liquidate their positions to employ the funds elsewhere. Sell - anticipates that the shares will depreciate 10% or more in price within the next 12 months, due to fundamental weakness perceived in the company or for valuation reasons, or are expected to perform significantly worse than equities within the peer group.

Distribution of Ratings/IB Services Firmwide

                                      IB Serv/Past 12 Mos.
Rating               Count   Percent     Count   Percent

OUTPERFORM [O]         455     70.11       239     52.53
PERFORM [P]            194     29.89        69     35.57
UNDERPERFORM [U]         0      0.00         0      0.00

Although the investment recommendations within the three-tiered, relative stock rating system utilized by Oppenheimer & Co. Inc. do not correlate to buy, hold and sell recommendations, for the purposes of complying with FINRA rules, Oppenheimer & Co. Inc. has assigned buy ratings to securities rated Outperform, hold ratings to securities rated Perform, and sell ratings to securities rated Underperform. Note: Stocks trading under $5 can be considered speculative and appropriate for risk tolerant investors.

Company Specific Disclosures

CANTA-STO, LVT-AU, MDXH-BB, NYAX-TASE, SANION-SE, STLC-CN, TCEHY: This research report is intended for use only by institutions to which the subject security or securities may be sold pursuant to an exemption from state securities registration in the state in which the institution is located.

In the past 12 months Oppenheimer & Co. Inc. has provided investment banking services for AEVA.

Oppenheimer & Co. Inc. expects to receive or intends to seek compensation for investment banking services in the next 3 months from AEVA and BABA.


In the past 12 months Oppenheimer & Co. Inc. has provided non-investment banking, securities-related services for SYNA.

In the past 12 months Oppenheimer & Co. Inc. or an affiliate has received compensation for non-investment banking services from SYNA.

Additional Information Available

Company-Specific Disclosures: Important disclosures, including price charts, are available for compendium reports and all Oppenheimer & Co. Inc.-covered companies by logging on to https://www.oppenheimer.com/client- login.aspx or writing to Oppenheimer & Co. Inc., 85 Broad Street, New York, NY 10004, Attention: Equity Research Department, Business Manager.

Other Disclosures This report is issued and approved for distribution by Oppenheimer & Co. Inc. Oppenheimer & Co. Inc. transacts business on all principal exchanges and is a member of SIPC. This report is provided, for informational purposes only, to institutional and retail investor clients of Oppenheimer & Co. Inc. and does not constitute an offer or solicitation to buy or sell any securities discussed herein in any jurisdiction where such offer or solicitation would be prohibited. The securities mentioned in this report may not be suitable for all types of investors. This report does not take into account the investment objectives, financial situation or specific needs of any particular client of Oppenheimer & Co. Inc. Recipients should consider this report as only a single factor in making an investment decision and should not rely solely on investment recommendations contained herein, if any, as a substitution for the exercise of independent judgment of the merits and risks of investments. The analyst writing the report is not a person or company with actual, implied or apparent authority to act on behalf of any issuer mentioned in the report. Before making an investment decision with respect to any security recommended in this report, the recipient should consider whether such recommendation is appropriate given the recipient's particular investment needs, objectives and financial circumstances. We recommend that investors independently evaluate particular investments and strategies, and encourage investors to seek the advice of a financial advisor. Oppenheimer & Co. Inc. will not treat non-client recipients as its clients solely by virtue of their receiving this report. Past performance is not a guarantee of future results, and no representation or warranty, express or implied, is made regarding future performance of any security mentioned in this report. The price of the securities mentioned in this report and the income they produce may fluctuate and/or be adversely affected by exchange rates, and investors may realize losses on investments in such securities, including the loss of investment principal. Oppenheimer & Co. Inc. accepts no liability for any loss arising from the use of information contained in this report, except to the extent that liability may arise under specific statutes or regulations applicable to Oppenheimer & Co. Inc. All information, opinions and statistical data contained in this report were obtained or derived from public sources believed to be reliable, but Oppenheimer & Co. Inc. does not represent that any such information, opinion or statistical data is accurate or complete (with the exception of information contained in the Important Disclosures section of this report provided by Oppenheimer & Co. Inc. or individual research analysts), and they should not be relied upon as such. All estimates, opinions and recommendations expressed herein constitute judgments as of the date of this report and are subject to change without notice. Nothing in this report constitutes legal, accounting or tax advice. Since the levels and bases of taxation can change, any reference in this report to the impact of taxation should not be construed as offering tax advice on the tax consequences of investments. As with any investment having potential tax implications, clients should consult with their own independent tax adviser. This report may provide addresses of, or contain hyperlinks to, Internet web sites. Oppenheimer & Co. Inc. 
has not reviewed the linked Internet web site of any third party and takes no responsibility for the contents thereof. Each such address or hyperlink is provided solely for the recipient's convenience and information, and the content of linked third party web sites is not in any way incorporated into this document. Recipients who choose to access such third-party web sites or follow such hyperlinks do so at their own risk. This research is distributed in the UK and elsewhere throughout Europe, as third party research by Oppenheimer Europe Ltd, which is authorized and regulated by the Financial Conduct Authority (FCA). This research is for information purposes only and is not to be construed as a solicitation or an offer to purchase or sell investments or related financial instruments. This research is for distribution only to persons who are eligible counterparties or professional clients. It is not intended to be distributed or passed on, directly or indirectly, to any other class of persons. In particular, this material is not for distribution to, and should not be relied upon by, retail clients, as defined under the rules of the FCA. Neither the FCA's protection rules nor compensation scheme may be applied. https://opco2.bluematrix.com/ sellside/MAR.action Distribution in Hong Kong: This report is prepared for professional investors and is being distributed in Hong Kong by Oppenheimer Investments Asia Limited (OIAL) to persons whose business involves the acquisition, disposal or holding of securities, whether as principal or agent. OIAL, an affiliate of Oppenheimer & Co. Inc., is regulated by the Securities and Futures Commission for the conduct of dealing in securities and advising on securities. For professional investors in Hong Kong, please contact [email protected] for all matters and queries relating to this report. This report or any portion hereof may not be reprinted, sold, or redistributed without the written consent of Oppenheimer & Co. Inc.


This report or any portion hereof may not be reprinted, sold, or redistributed without the written consent of Oppenheimer & Co. Inc. Copyright © Oppenheimer & Co. Inc. 2021.
