

MFG304: Electronic design automation: Scaling EDA workflows

Mark Duffield, WW Tech Lead, Semiconductor, Amazon Web Services
Simon Burke, Distinguished Engineer, Xilinx

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Abstract

Semiconductor product development is constantly pushing the boundaries of physics to meet power, performance, and area (PPA) requirements for silicon devices. Electronic design automation (EDA) workflows, from RTL to GDSII, require scale-out architectures to meet the constantly changing semiconductor design process. This session will discuss deployment tools, methods, and use cases for running the entire EDA workflow on AWS. Using customer examples, we will show how AWS can improve performance, meet tape-out windows, and effortlessly scale out to meet unforeseen demand.

Agenda

EDA on AWS

Customer use cases

The Xilinx AWS journey with Simon Burke

Deployment tools and methods

Related breakouts

[MFG206-L] [Leadership session: AWS for the Semiconductor industry] Monday, Dec 2, 4:00 PM - 5:00 PM – Aria, Level 1 West, Bristlecone 9 Red

[MFG404] [Using Amazon SageMaker to improve semiconductor yields] Wednesday, Dec 4, 8:30 AM - 9:30 AM – Aria, Level 3 West, Starvine 1

[MFG403] [Telemetry as the workflow analytics foundation in a hybrid environment] Wednesday, Dec 4, 10:00 AM - 11:00 AM – Aria, Plaza Level East, Orovada 3

[MFG405] [Launch a turnkey scale-out compute environment in minutes on AWS] Thursday, Dec 5, 12:15 PM - 2:30 PM – Aria, Level 1 East, Joshua 7

[MFG304] [Electronic design automation: Scaling EDA workflows] Thursday, Dec 5, 3:15 PM - 4:15 PM – Aria, Level 1 West, Bristlecone 7 Green

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Semiconductor design to product distribution

Design and verification → Wafer production → Chip packaging → PCB and assembly → Product integration → Product distribution

Many opportunities for cloud-accelerated innovation

Digital IC design workflow

Phases:
Front-end design: Design specification → Design verification → Synthesis
Back-end design: Physical layout → Physical verification → Power/signal analysis
Production and test: Tape out/manufacturing → Silicon validation

Design specification: design capture, design modeling
Design verification: simulation (functional, formal, gate-level), DFT insertion
Physical layout: floorplanning, placement, routing
Physical verification: LVS/DRC/ERC, extraction, timing
Power/signal analysis: power, thermal, signal integrity
Tape out/manufacturing: OPC
Silicon validation: chip tests, wafer tests, yield analysis

Workload characteristics:
Front-end design: high job concurrency; single-threaded; mixed random/sequential file I/O, metadata-intensive; millions of jobs and small files
Back-end design: more multi-threading; memory intensive; long run times; large files; more sequential data access patterns
Production and test: often performed by third parties; big data analytics; AI/ML

Advanced node design and signoff
Cloud is becoming the new signoff platform

Electronic Design Automation infrastructure

Traditional EDA IT stack

Corporate data center:
• Remote desktop servers (accessed via remote desktop clients)
• License managers
• Workload schedulers
• Directory services
• Compute nodes
• Shared file storage

Electronic Design Automation infrastructure on AWS

On AWS, secure and well-optimized EDA clusters can be automatically created, operated, and torn down in just minutes, with encryption everywhere, using your own keys.

Virtual Private Cloud on AWS: remote desktop, license managers, workload schedulers, directory services, cloud-based auto-scaling HPC clusters, shared file storage with a storage cache, machine learning and analytics, and Amazon Simple Storage Service (Amazon S3) and Amazon S3 Glacier.

Corporate datacenter: on-premises HPC resources and shared file storage, connected to AWS over AWS Direct Connect, with bulk transfer via AWS Snowball; third-party IP providers and collaborators also connect in.

Faster design throughput with rapid, massive scaling

Scale up when needed, then scale down. In a traditional EDA datacenter, the only certainty is that you always have the wrong number of servers: too few or too many.

Think big: what if you could launch one million concurrent verification jobs?

Every additional EDA server launched in the cloud can improve speed of innovation, if there are no other constraints to scaling. [Chart: CPU cores over time across the product development cycle.] Overnight or over-weekend workloads are reduced to an hour or less.

Our own journey: Our own digital transformation

2011: Annapurna startup formed in Israel; started with an on-prem datacenter only
2014: AWS acquisition of Annapurna; born-in-the-cloud team formed in Austin; hybrid model alongside the on-prem data center
2015: Israel expands productivity via AWS; multi-site development; on-prem data center
2016: US expands deployment in AWS; multi-site development; on-prem data center
2017: US expands deployment in AWS; AWS silicon optimizations; multi-site development; on-prem data center
Today: AWS One Team; full SoC development in the cloud on the latest semiconductor fab 7nm process; multiple end-to-end silicon projects in AWS; on-prem data center only for emulators

AWS global infrastructure

22 geographic regions: a region is a physical location in the world where we have multiple Availability Zones.

69 Availability Zones: distinct locations that are engineered to be insulated from failures in other Availability Zones.

Network: AWS offers highly reliable, low-latency, and high-throughput network connectivity, achieved with a fully redundant 100 Gbps network that circles the globe.

Amazon custom hardware

• The AWS global infrastructure is built on Amazon’s own hardware: silicon, compute servers, storage servers, routers, and load balancers
• By using its own custom hardware, AWS provides customers with the highest levels of reliability and the fastest pace of innovation, all at the lowest possible cost
• AWS optimizes this hardware for only one set of requirements: workloads run by AWS customers

AWS Inferentia: Custom silicon for deep learning

aws.amazon.com/machine-learning/inferentia/

Amazon silicon

AWS Graviton: powerful and efficient server chip for modern applications
AWS Inferentia: machine learning hardware and software at scale
AWS Nitro System: cloud hypervisor, network, storage, and security

100% developed in the cloud: RTL → GDSII

High clock speed compute instances: z1d
• Z1d instances are optimized for memory-intensive, compute-intensive applications
• Up to 4 GHz sustained, all-turbo performance
• Up to 24 physical cores
• Custom Intel Xeon Scalable processor
• Up to 384 GiB DDR4 memory
• Enhanced networking, up to 25 Gbps throughput

[Each instance slide shows the same EDA stack on AWS diagram: desktop visualization, license managers, workload schedulers, directory services, cloud-based auto-scaling HPC clusters, shared file storage, and a storage cache.]
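As an illustration of how cluster-automation tooling brings up instances like these, here is a minimal boto3 sketch; the AMI ID, key pair, and tag values are placeholders, not values from this session.

    # Minimal sketch: launch one z1d.12xlarge EDA compute node with boto3.
    # ImageId, KeyName, and tags below are placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-west-2")

    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder: your EDA compute AMI
        InstanceType="z1d.12xlarge",      # 24 physical cores, 384 GiB, up to 4 GHz
        KeyName="eda-keypair",            # placeholder
        MinCount=1,
        MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "workload", "Value": "eda-batch"}],
        }],
    )
    print(resp["Instances"][0]["InstanceId"])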

High memory instances: R5

• R5 instances are optimized for memory-intensive, compute-intensive applications
• Up to 3.1 GHz sustained, all-turbo performance
• Up to 48 physical cores
• Custom Intel Xeon Scalable processor
• Up to 768 GiB DDR4 memory
• Enhanced networking, up to 25 Gbps throughput

High memory instances: X1e

• X1e instances are optimized for memory-intensive workloads
• High-frequency 2.3 GHz Intel Xeon E7-8880 v3 (Haswell) processors with Turbo Boost
• Up to 64 physical cores
• Up to 4 TiB DDR4 memory
• Enhanced networking, up to 25 Gbps throughput

FPGA accelerator development: F1

Up to 8x Xilinx UltraScale+ VU9P FPGAs; each FPGA has:
• Dedicated PCIe x16 interface to the CPU
• Approx. 2.5 million logic elements
• Approx. 6,800 DSP engines
• 64 GiB ECC-protected memory on a 288-bit-wide bus
• Virtual JTAG interface for debugging
• Fabricated using a 16 nm process

Instance capability:
• 2.7 GHz Turbo on all cores; 3.0 GHz Turbo on one core
• Up to 976 GiB of memory
• Up to 4 TB of NVMe SSD storage

Amazon Elastic Compute Cloud (Amazon EC2) bare metal instances

EC2 BARE METAL

• Provide applications with direct access to hardware

• Built on the Nitro System and ideal for workloads that are not virtualized, require specific types of hypervisors, or have licensing models that restrict virtualization

Comprehensive storage portfolio

Block storage: Amazon Elastic Block Store (Amazon EBS), with SSD (io1, gp2) and HDD (st1, sc1) volume types
File storage: Amazon Elastic File System (Amazon EFS) and Amazon FSx for Lustre
Object storage: Amazon S3 and Amazon S3 Glacier, with Amazon S3 lifecycle management between them

Mapping storage to EDA data types

Data type           | Access     | Lifetime   | Storage solutions
Tools, IP libraries | Read-only  | Persistent | DIY/Marketplace NFS server, Amazon EFS, Amazon FSx for Lustre; Amazon S3 archive
Project, Home       | Read/write | Persistent | DIY/Marketplace NFS server, Amazon FSx for Lustre; Amazon S3 archive
Workspaces, Scratch | Read/write | Temporary  | DIY/Marketplace NFS server, Amazon FSx for Lustre
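To make the mapping concrete, the sketch below mounts EFS for the read-only tools tier and FSx for Lustre for scratch on a fresh compute node; the filesystem DNS names and mount points are placeholders, and the options follow the standard EFS (NFSv4.1) and Lustre client syntax.

    # Illustrative only: mount the storage tiers from the table above.
    # DNS names and mount points are placeholders; run as root.
    import subprocess

    # Read-only tools and IP libraries on Amazon EFS (NFSv4.1)
    subprocess.run(
        ["mount", "-t", "nfs4", "-o", "nfsvers=4.1,ro",
         "fs-12345678.efs.us-west-2.amazonaws.com:/", "/tools"],
        check=True,
    )

    # High-throughput temporary scratch on Amazon FSx for Lustre
    subprocess.run(
        ["mount", "-t", "lustre",
         "fs-0123456789abcdef0.fsx.us-west-2.amazonaws.com@tcp:/fsx", "/scratch"],
        check=True,
    )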

Commercial schedulers

AWS is supported by popular workload and resource managers:
• IBM Spectrum LSF resource connector
• Univa UGE and NavOps Launch
• Altair Accelerator (RTDA NC)
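A hypothetical Python wrapper around LSF's bsub shows the submission pattern these integrations build on: jobs landing in a cloud-backed queue trigger the resource connector to provision EC2 capacity. The queue name and resource string are illustrative.

    # Hypothetical job-submission wrapper for IBM Spectrum LSF.
    # The "aws" queue name and memory figure are illustrative.
    import subprocess

    def submit(cmd: str, mem_mb: int, queue: str = "aws") -> None:
        # rusage[mem=...] lets LSF pick (or, via the resource
        # connector, launch) a host with enough memory for the job.
        subprocess.run(
            ["bsub", "-q", queue, "-R", f"rusage[mem={mem_mb}]", cmd],
            check=True,
        )

    submit("run_regression --suite smoke", mem_mb=60000)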

Remote desktops with NICE DCV
• Native clients on Linux, Mac, and Windows
• HTML5 web client
• Dynamic hardware compression
• Encrypted communication
• Multi-monitor support
• Support for various peripherals
• Single or multiple persistent sessions on an Amazon EC2 instance
• No added cost on Amazon EC2
• Optional GPU acceleration

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Example: Astera Labs

Industry: Semiconductor and electronics
Headquarters: San Jose, CA
Website: www.asteralabs.com

“At Astera Labs, we are intensely focused on delivering high-quality PCIe connectivity solutions to our customers and reducing time-to-results. Our high-performance compute (HPC) infrastructure is hosted entirely on AWS, and we heavily leverage the cloud scalability enabled by AWS and Synopsys tools to accelerate our development schedule.”
Jitendra Mohan, CEO, Astera Labs

About Astera Labs: We are intensely focused on customers' needs. We execute to meet our promises on-time, on-spec, and on-cost. We innovate exponentially rather than incrementally in everything we do. We operate with integrity and the highest ethical standards, aiming to earn our partners' trust.

Example: Arm Limited

For details, see session MFG206-L

[Diagram: The hybrid platform. Jobs come IN through a common user interface, stating preferences for cost/speed. An intelligent scheduler runs each job in the most suitable location, cloud or on-prem, with AI/ML modeling improving placement over time. Telemetry, visualization, and modeling information flows back to the user and the scheduler to improve the workflow. Results go OUT to the user.]

• Migrating EDA to AWS for a hybrid cloud platform
• Goal: improve engineering productivity and shift-left silicon verification
• Using intelligent job scheduling with advanced telemetry and automation
• Range of EDA applications

Example: MediaTek

For details, see session MFG206-L

Proven results for EDA running on AWS
• Static timing analysis (STA) for a 7nm-process SoC
• 1,000 AWS instances (32,000 physical cores)
• 12 million core-hours of computing for STA
• 8 PB of data moved between Taiwan and the US West AWS Region
• Successfully eliminated the IT compute resource bottleneck
• World’s first 5G SoC, announced at Computex 2019 (May 29)

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Xilinx develops highly flexible and adaptive processing platforms that enable rapid innovation across a variety of technologies, from the endpoint to the edge to the cloud. Xilinx is the inventor of the FPGA, hardware-programmable SoCs, and the ACAP, designed to deliver the most dynamic processor technology in the industry and enable the adaptable, intelligent, and connected world of the future. The explosion of AI and pervasive intelligence, combined with the demand for exponentially increasing computing power after Moore's Law, has given rise to domain-specific architectures (DSAs). Xilinx technology is ideally suited for DSAs, as it can be programmed and tuned to address today's most complex and demanding architectures, with impressive results across a wide variety of workloads and applications. The same piece of silicon can be updated and reconfigured to tackle multiple tasks. For more information, visit www.xilinx.com.

© Copyright 2018 Xilinx

Broad collaboration key in maximizing potential of latest technologies

• New packaging and integration technologies offer numerous scaling opportunities
• FPGA designs have very specific requirements that demand more than standard flows and methodologies
• Amazon and AWS are our chosen cloud vendor for deployment of EDA flows at Xilinx
• Tight collaboration between TSMC, Xilinx, and Synopsys on cloud technologies enables the fastest path to productization

© Copyright 2018 Xilinx

Cloud enablement key considerations: Use model

In the industry we see three major types of cloud use model:

˃ All-in model
  Storage, compute, and licenses are cloud-based (no infrastructure on premise)
  Popular with startups and smaller companies with no existing infrastructure
  Third-party companies can provide “turnkey” enablement and solutions
  EDA vendors provide “EDA locked” solutions on custom clouds

˃ Hybrid model
  Similar to the all-in model, but a single project or flow is cloud-enabled; that project or flow has cloud storage, licenses, and compute
  ‒ Other projects may exist on premise only
  ‒ Other flows may exist on premise only
  ‒ A flow may be duplicated on cloud and on premise but be partitioned by design or design type
  A flow or tool run can be run either on cloud or on premise, determined at project setup time

˃ Burst model
  A project or flow exists both on cloud and on premise
  Data and licenses are shared between cloud and on premise
  A flow or tool run can be run either on cloud or on premise, determined at flow run time

© Copyright 2018 Xilinx

Cloud enablement key considerations: Use model

Xilinx has chosen to pursue a burst model for cloud deployment, to augment our on-premise farm for existing projects.

© Copyright 2018 Xilinx

Cloud enablement key considerations: Storage

Cloud vendors are very good at compute and networking. However, POSIX-based storage management is a challenge, especially for hybrid and burst use models:
˃ Fundamentally, cloud and on-premise infrastructures are different
˃ The cloud typically uses block storage, where the data is accessed by only one host at a time, which is incompatible with most EDA tools
˃ Complex EDA tool workflows rely on network-shared POSIX filesystems based on an NFS filer to ensure that the same coherent data is accessible across thousands of nodes
˃ However, NFS filers are not available as a native instance in the cloud, and cloud NFS equivalents can have performance issues

Storage falls into two broad categories:
˃ Large semi-static read data
  Tool binaries, IP views, etc.
  Access can be sparse, and is typically read-only (but constantly changing)
˃ Smaller dynamic workspaces
  Sometimes prepopulated with data, sometimes empty at start
  The flow appends or creates data in this filesystem
  Access is typically heavy, with both reads and writes

Today, companies typically try to attain hybrid workflows by setting up the cloud environment, copying the data, and then running jobs using pseudo-NFS filesystems. But uploading data and keeping data “in sync” between on premise and cloud is time-consuming to set up and manage.

© Copyright 2018 Xilinx

Cloud enablement key considerations: Storage

Xilinx has chosen to use a virtual filesystem model for both the semi-static and workspace storage, based on the IC Manage PeerCache product.

© Copyright 2018 Xilinx

Cloud enablement key considerations: Cost management

Cloud vendors are very good at providing near-infinite compute and networking; however, it comes at a per-compute-instance, per-hour cost that can accumulate quickly. There are numerous cost management tools available that run after the fact, but few that run ahead of the job to manage cost to a budget. Consequently, Xilinx has created a cost management process built into the job submission architecture:

• All submitted jobs create a unique signature used to track predicted and actual run time and server usage
• Signatures are used to predict the next run's profile and cost
• A budget database is dynamically updated, initially with predicted costs, later with actual costs
• AWS instances are dynamically sized for job needs (cost management)

[Flowchart: On job submission, if the on-premise queue is not full, the job runs on the on-premise queue. If it is full, the job is checked in turn: is it eligible for cloud, does the user/group have budget, and does its predicted cost stay within that budget? Jobs passing all checks run on the AWS queue; otherwise they run on the on-premise queue. A sketch of this gate follows.]
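A minimal sketch of that decision flow, with invented names for the job, budget database, and queue objects; the real system derives predicted cost from the per-job signature history described above.

    # Sketch of the cost-managed routing decision above (names invented).
    def route_job(job, budget_db, onprem_queue) -> str:
        if not onprem_queue.is_full():
            return "on-premise"           # room on the farm: stay on prem
        if not job.cloud_eligible:
            return "on-premise"           # job/flow not enabled for cloud
        predicted_cost = job.predicted_hours * job.instance_hourly_rate
        if predicted_cost > budget_db.remaining(job.user_group):
            return "on-premise"           # over budget: queue on prem
        # Reserve the predicted cost now; reconcile with actual cost at job end.
        budget_db.reserve(job.user_group, predicted_cost)
        return "aws"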

Cloud enablement key considerations: Other considerations

Other considerations include:
˃ Security: something to be aware of, but not a show-stopper issue today
˃ EDA vendor license agreements usually prohibit off-premise execution; addendums are required
˃ IP vendor agreements usually prohibit off-premise storage and use; addendums are required
˃ Become best friends with your IT organization and cloud vendors
˃ Although cost is a factor, we're focusing on agility, scalability, and fast time to tapeout

© Copyright 2018 Xilinx

Cloud enablement key considerations: Overview

Use model: Burst
Storage: IC Manage PeerCache virtual storage for semi-static and workspace data
Compute: AWS C5d, Z1d, R5, and X1e, depending on job type
Queue: LSF, including the LSF resource connector for instance creation and cleanup; custom daemons for additional cleanup to manage runaway instances (a sketch follows)
Network: the cloud vendor's network within the cloud; secure AWS Direct Connect between Xilinx and the cloud
Licenses: hosted on premise, served to the cloud
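A hypothetical sketch of such a cleanup daemon: sweep burst instances (identified here by an illustrative tag) that have outlived a maximum lifetime and terminate them.

    # Hypothetical runaway-instance sweeper; tag and lifetime are illustrative.
    from datetime import datetime, timedelta, timezone
    import boto3

    MAX_AGE = timedelta(hours=24)
    ec2 = boto3.client("ec2", region_name="us-west-2")

    def sweep() -> None:
        pages = ec2.get_paginator("describe_instances").paginate(
            Filters=[
                {"Name": "tag:cluster", "Values": ["eda-burst"]},
                {"Name": "instance-state-name", "Values": ["running"]},
            ]
        )
        now = datetime.now(timezone.utc)
        stale = [
            inst["InstanceId"]
            for page in pages
            for res in page["Reservations"]
            for inst in res["Instances"]
            if now - inst["LaunchTime"] > MAX_AGE  # LaunchTime is timezone-aware
        ]
        if stale:
            ec2.terminate_instances(InstanceIds=stale)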

© Copyright 2018 Xilinx

Burst model cloud network, storage, and execute architecture

[Diagram: On the Xilinx side, a NetApp NFS filer, PeerCache VTRQ, LDAP, and MongoDB/SQL databases connect over a secure network to the Amazon side, where an IC Manage proxy server (8 TB EBS), an LDAP proxy, and PeerCache Holodeck (2 TB EBS) serve compute instances, each with its own NVMe workspace.]

© Copyright 2018 Xilinx

Amazon EC2 server selection and instance types

Instance purchasing option | Risk | Cost   | Features
On-demand                  | Low  | High   | Pay, by the second, for the instances that you launch
Reserved                   | Low  | Medium | Dedicated compute, paid for up front
Spot                       | High | Low    | Spare compute at steep discounts; Spot Instances can be interrupted by Amazon EC2 with two minutes of notification when Amazon EC2 needs the capacity back (see the sketch below)

AWS instance  | Core count | Max memory | On-demand cost/hr | Reserved cost/hr | Spot cost/hr | Spot vs. OnD cost ratio | Xilinx usage
c5d.9xlarge   | 18         | 72 GB      | $1.73             | $1.02            | $0.36        | 21%                     | “50G” jobs
c5d.18xlarge  | 72         | 144 GB     | $3.46             | $2.34            | $1.16        | 33%                     | Not used (cost)
r5d.24xlarge  | 48         | 768 GB     | $6.91             | $4.07            | $6.89        | 99%                     | Not used (cost)
r4.16xlarge   | 32         | 488 GB     | $4.26             | $2.50            | $0.64        | 15%                     | “512GB” jobs
x1.16xlarge   | 32         | 976 GB     | $13.40            | $7.67            | $2.00        | 15%                     | “1TB” jobs
x1.32xlarge   | 64         | 1952 GB    | $26.68            | $15.35           | $4.00        | 15%                     | “2TB” jobs

Costs are provided as examples only, from public data; costs change constantly, so refer to cloud vendors for specific details.
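Because Spot capacity can be reclaimed with two minutes of notice, burst jobs typically poll the EC2 instance metadata service so they can checkpoint before termination. A minimal sketch, assuming IMDSv1-style requests are allowed on the instance; the endpoint returns 404 until an interruption is scheduled.

    # Poll the Spot interruption notice from the EC2 instance metadata service.
    import time
    import urllib.error
    import urllib.request

    URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

    def interruption_pending() -> bool:
        try:
            with urllib.request.urlopen(URL, timeout=1) as resp:
                return resp.status == 200   # notice published: ~2 minutes left
        except urllib.error.HTTPError:
            return False                     # 404: no interruption scheduled

    while not interruption_pending():
        time.sleep(5)
    # ...checkpoint job state and drain here...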

© Copyright 2018 Xilinx

SI workload general cloud guidelines

˃ The decision between instance types and on-prem is based on requirements (encoded in the sketch below):
  Low duty cycle + low job restart cost: Spot instances
  High duty cycle + low job restart cost: Spot instances
  Low duty cycle + high job restart cost: On-demand instances
  High duty cycle + high job restart cost: On-premise infrastructure

Duty cycle: the average amount of time HPC (high performance compute) servers will be in use computing engineering jobs, in a day or a year
  A 50% duty cycle is 12 hours of 24 hours, or 6 months in a year
  A 35% duty cycle is 8.4 hours of 24 hours, or 4.2 months in a year
  A 25% duty cycle is 6 hours of 24 hours, or 3 months in a year

Inflection point/break-even point: the point (measured in quarters) at which the expense on AWS will surpass the expense of acquiring, installing, and operating the same number of servers on premise
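The quadrant above reduces to a few lines of code; the 50% duty-cycle threshold here is illustrative, since the real break-even point depends on the inflection-point analysis above.

    # Direct encoding of the quadrant above; the 0.5 threshold is illustrative.
    def choose_capacity(duty_cycle: float, restart_cost_high: bool) -> str:
        high_duty = duty_cycle >= 0.5        # e.g. busy 12+ hours per day
        if restart_cost_high:
            # Expensive restarts cannot tolerate Spot interruptions.
            return "on-premise" if high_duty else "on-demand"
        # Cheap restarts tolerate Spot interruptions at any duty cycle.
        return "spot"

    assert choose_capacity(0.25, restart_cost_high=False) == "spot"
    assert choose_capacity(0.75, restart_cost_high=True) == "on-premise"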

© Copyright 2018 Xilinx

Cloud enablement problem statement

Xilinx already uses the AWS environment for internal software regressions and now VCS verification execution, so we have an existing infrastructure on which to execute a new POC.

As part of our internal product development, we run a flow called timing capture to create a database that supports our proprietary FPGA place & route tools:
˃ This involves capturing net delays for net segments and logical blocks on our devices and providing them to our place & route tools. Path delays of customer designs are then calculated from this data.

The capture flow uses EDA standard tools in non-standard use models to collect this data set:
˃ Use Synopsys PrimeTime for primary path selection and secondary delay calculation
˃ Use Synopsys HSPICE for primary path delay calculation (validated against the PrimeTime delay)
˃ Accumulate delay data into a single XML file for Q/A and delivery

Xilinx decided to investigate a deployment of this flow as part of a proof of concept for execution on the AWS cloud in burst mode:
˃ Flow setup completed on premise using the Altair FlowTracer environment
˃ Major compute executed on the cloud, submitted via LSF (from Xilinx to AWS)
˃ Final Q/A and delivery completed on premise using the Altair FlowTracer environment

© Copyright 2018 Xilinx

Cloud enablement problem statement

Timing capture is not a vanilla STA run

© Copyright 2018 Xilinx

AWS proof of concept results: Flow execution

˃ The diagrams show our Altair FlowTracer environment running on-premise flows. Two flows are shown: a small test case and a larger production test case.
˃ Each box corresponds to a task or tool execution; color corresponds to run state.
˃ For POC purposes, high-compute flow steps were redirected to the AWS environment.

© Copyright 2018 Xilinx

AWS proof of concept results: Runtime metrics

˃ Fig. 1: Total runtime on prem versus on AWS for the small test case
˃ Fig. 2: PrimeTime sample path runtime on prem versus on AWS for the small test case
˃ Fig. 3: HSPICE sample path runtime on prem versus on AWS for the small test case

Fig. 1:
Total runtime | On prem | On AWS   | Delta (AWS)
PT            | 61 hrs  | 33.8 hrs | 1.8x
Spice         | 61 hrs  | 115 hrs  | 0.5x
Total         | 122 hrs | 149 hrs  | 0.82x

Fig. 2:
Path group | PT on prem (sec) | PT AWS (sec) | PT delta (AWS)
90         | 1110             | 1140         | 1x
91         | 1130             | 830          | 1.3x
92         | 1370             | 840          | 1.6x
93         | 2580             | 1030         | 2.5x

Fig. 3:
Path group | HSPICE on prem (sec) | HSPICE AWS (sec) | HSPICE delta (AWS)
90         | 1040                 | 2680             | 0.4x
91         | 2070                 | 3860             | 0.5x
92         | 1590                 | 3236             | 0.5x
93         | 2590                 | 2732             | 1x

Design metrics:
˃ Small test case
  AWS c5d.18xlarge instance type: 72 vCPUs, 144 GB RAM; used 16 CPUs, 60 GB RAM
  Input design (pre-filtered to not load unused FSRs): 3 FSRs; components: 1M (IP blocks); nets: 1.5B (SoC nets)
  Pruned output design, 3 FSRs: components: 250k (4:1 reduction); nets: 300M (5:1 reduction)
˃ Large test case
  AWS z1d.12xlarge instance type: 48 vCPUs, 384 GB RAM; used 16 CPUs, 360 GB RAM
  Input design (pre-filtered to not load unused FSRs): 70 FSRs; components: 2.3B (IP blocks); nets: 16.5B (SoC nets)
  Pruned output design, Group0: components: 32M (72:1 reduction); nets: 1.1B (16:1 reduction)

© Copyright 2018 Xilinx

AWS proof of concept results: Delay correlation

˃ Comparing final delays calculated in the AWS environment to Xilinx on-premise results using the same flow
˃ Results correlate 100% (within an acceptable data noise margin)

© Copyright 2018 Xilinx

Conclusion

Using our existing infrastructure (deployed to support VCS verification flow execution in burst mode on AWS), we were able to quickly deploy a new timing capture flow, not previously designed to run on the cloud, and execute the compute-intensive parts on the cloud while the rest of the flow ran on premise.

This was a proof-of-concept exercise, so this is not a production-ready flow as is, but productizing it is within the scope of an incremental development if we choose.

The POC demonstrated that we can, on demand, execute part of an internal flow on the cloud versus on premise with minimal impact to runtime, turnaround time, or quality of results, taking advantage of the server scale-out provided by cloud vendors that may not be available on premise.

Thanks to TSMC, Synopsys, AWS, IC Manage, and Xilinx for supporting this work and enabling this to be possible

© Copyright 2018 Xilinx

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Scale-out computing on AWS
aws.amazon.com/solutions/scale-out-computing-on-aws

• EDA/HPC environment on AWS
• Easy installation in your AWS account
• Web UI with simple job submission
• OS agnostic, with AMI support
• Desktop cloud visualization (DCV graphical sessions)
• Automatic error handling
• 100% customizable (Python scripts)
• Persistent and unlimited storage
• Centralized user management
• Support for network licenses
• EFA support
• Simple cost/budget management
• Detailed cluster analytics
• Used in production

[Diagram: Users reach the web UI, DCV, or ssh through Elastic Load Balancing; an Amazon EC2 scheduler instance uses EC2 Auto Scaling to launch Amazon EC2 instances that run jobs; storage options include Amazon FSx for Lustre, Amazon EFS, and Amazon S3 for persistent or ephemeral data; AWS Secrets Manager stores cluster information, and Amazon Elasticsearch Service stores job and host information.]

IBM LSF workshop
https://github.com/aws-samples/aws-eda-workshops/blob/master/workshops/eda-workshop-lsf

[Architecture: a corporate data center connected to the AWS Cloud, with a login server, an LSF master, an Amazon EC2 NFS server (/ec2-nfs/proj, /ec2-nfs/scratch), Amazon EFS (/tools//lsf), and the FPGA Developer AMI (/opt/Xilinx).]

1. The user logs into the login server from within the corporate network
2. IBM Spectrum LSF binaries, configuration, and logs are read from and written to Amazon EFS
3. The user submits simulation jobs from the login server
4. IBM Spectrum LSF provisions Amazon EC2 instances to satisfy the workload in the queue
5. Provisioned Amazon EC2 instances join the cluster as dynamic execution hosts
6. Jobs are dispatched to the new execution hosts
7. Jobs load the pre-licensed Xilinx Vivado Design Suite from the FPGA Developer AMI
8. Vivado loads example IP and design from /ec2-nfs/proj
9. Vivado writes job runtime data and results to /ec2-nfs/scratch
10. Amazon EC2 instances are terminated by LSF after jobs finish

NICE DCV remote desktop
https://github.com/aws-samples/aws-remote-desktop-for-eda

1. Subscribe to the FPGA Developer AMI in AWS Marketplace; the Xilinx Vivado Design Suite is included with this AMI
2. Specify the required parameters (VPC, subnet, AZ, etc.) and launch the AWS CloudFormation stack
3. Optional: create an Elastic IP address (persistent IP)
4. Choose a remote desktop instance type that works for your tools
5. Connect to NICE DCV using the NICE DCV client or a web browser, over port 8443
6. In the FPGA Developer AMI, launch the Xilinx Vivado Design Suite by typing “vivado” in a terminal window
7. The remote desktop is displayed on the engineer's local system
8. Optional: configure Amazon S3 bucket access to load design data
9. Optional: specify additional existing security groups

Serverless scheduler with resource automation
https://github.com/aws-samples/aws-decoupled-serverless-scheduler

[Architecture: Amazon S3, AWS Lambda, Amazon SQS, Amazon DynamoDB, an AWS Step Functions workflow, and an Amazon EC2 Auto Scaling group, deployed as two CloudFormation stacks.]

Part 1 (CloudFormation stack):
1. Users upload input files and executables for the job(s)
2. AWS Lambda triggers from the S3 event, then creates and submits the new job(s) (a sketch follows)
3. AWS Lambda monitors the job queue and updates the Auto Scaling group with the desired instance count (customizable)

Part 2 (CloudFormation stack):
4. The EC2 Auto Scaling group scales the number of workers from 0 to a defined maximum
5. Users download results
6. Users monitor job status through the AWS Console or AWS CLI

The user uploads the job input file(s) and executable to the S3 bucket instead of SQS. This upload triggers the job start, and instance management is handled by the Auto Scaling group; there is no longer a need to create a JSON job definition.
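A minimal sketch of the S3-triggered Lambda in step 2, assuming the queue URL is passed in via an environment variable; the message shape is illustrative rather than the sample repository's exact schema.

    # Sketch: S3 upload event -> job message on SQS (step 2 above).
    import json
    import os
    import boto3

    sqs = boto3.client("sqs")
    QUEUE_URL = os.environ["JOB_QUEUE_URL"]   # illustrative configuration

    def handler(event, context):
        for record in event["Records"]:       # one record per uploaded object
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            sqs.send_message(
                QueueUrl=QUEUE_URL,
                MessageBody=json.dumps({"bucket": bucket, "key": key}),
            )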

Semiconductor white papers
https://aws.amazon.com/whitepapers

Related content: AWS re:Invent 2018

Leadership session: AWS semiconductor (MFG201-L)
• Slides: http://bit.ly/2TQ5A8N
• Recording: http://bit.ly/2S5ZK1E

Amazon on Amazon: How Amazon designs chips on AWS (MFG305)
• Slides: http://bit.ly/2TR4vhd
• Recording: http://bit.ly/2tpiQG0

How to build performant, highly available license services in the cloud (MFG306)
• Slides: http://bit.ly/2BO9bNZ

Rightsizing your silicon design environment: Elastic clusters for EDA workloads (MFG401)
• Slides: http://bit.ly/2DL7S26

Thank you!

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.