
FPGAs Fundamentals, Advanced Features, and Applications in Industrial Electronics Juan José Rodríguez Andina, Eduardo de la Torre Arnanz, María Dolores Valdés Peña

Embedded Processors in FPGA Architectures

Publication details https://www.routledgehandbooks.com/doi/10.1201/9781315162133-4 Juan José Rodríguez Andina, Eduardo de la Torre Arnanz, María Dolores Valdés Peña

How to cite :- Juan José Rodríguez Andina, Eduardo de la Torre Arnanz, María Dolores Valdés Peña. 21 Feb 2017, Embedded Processors in FPGA Architectures from: FPGAs, Fundamentals, Advanced Features, and Applications in Industrial Electronics CRC Press Accessed on: 02 Oct 2021 https://www.routledgehandbooks.com/doi/10.1201/9781315162133-4



3 Embedded Processors in FPGA Architectures

3.1 Introduction

Only 10 years ago we would have thought about the idea of a smart watch enabling us to communicate with a mobile phone, check our physical activity or heart rate, get weather forecast information, access a calendar, receive notifications, or give orders by voice as the subject of a futuristic movie. But, as we know now, smart watches are only one of the many affordable gadgets readily available in today's market.

The mass production of such consumer electronics devices providing many complex functionalities comes from the continuous evolution of electronic fabrication technologies, which allows SoCs to integrate more and more powerful processing and communication architectures in a single device, as shown by the example in Figure 3.1.

FIGURE 3.1 Processing and communication features in a smart watch SoC.

FPGAs have obviously also taken advantage of this technological evolution. Actually, the development of FPSoC solutions is one of the areas (if not THE area) FPGA vendors have concentrated most of their efforts on over recent years, rapidly moving from devices including one general-purpose microcontroller to the most recent ones, which integrate up to 10 complex processor cores operating concurrently. That is, there has been an evolution from FPGAs with single-core processors to FPGAs with homogeneous or heterogeneous multicore architectures (Kurisu 2015), with symmetric multiprocessing (SMP) or asymmetric multiprocessing (AMP) (Moyer 2013).

This chapter introduces the possibilities FPGAs currently offer in terms of FPSoC design, with different hardware/software alternatives. But, first of all, we will discuss the broader concept of SoC and introduce the related terminology, which is closely linked to processor architectures.

From Chapter 1, generically speaking, a SoC can be considered to consist of one or more programmable elements (general-purpose processors, microcontrollers, DSPs, FPGAs, or application-specific processors) connected to and interacting with a set of specialized peripherals to perform a set of tasks. From this concept, a single-core, single-thread processor (general-purpose, microcontroller, or DSP) connected to memory resources and specialized peripherals would usually be the best choice for embedded systems aimed at providing specific, non-time-critical functionalities. In these architectures, the processor acts as system master controlling data flows, although, in some cases, peripherals with memory access capabilities may take over data transfers with memory during some time intervals. Using FPGAs in this context provides higher flexibility than nonconfigurable solutions, because whenever a given software-implemented functionality does not provide good-enough timing performance, it can be migrated to hardware. In this solution, all hardware blocks are seen by the processor as peripherals connected to the same communication bus.

In order for single-core architectures to cope with continuous market demands for faster, more computationally powerful, and more energy-efficient solutions, the only option would be to increase operating frequency (taking advantage of nanometer-scale or 3D stacking technologies) and to reduce power consumption (by reducing power supply voltage). However, from the discussion in Chapter 1, it is clear that for the most demanding current applications, this is not a viable solution, and the only ones that may work are those based on the use of parallelism, that is, the ability of a system to execute several tasks concurrently.

The straightforward approach to parallelism is the use of multiple single-core processors (with the corresponding resources and power consumption) and the distribution of tasks among them so that they can operate concurrently. In these architectures, memory resources and peripherals are usually shared among the processors and all elements are connected through a common communication bus. Another possible solution is the use of multithreading processors, which take advantage of dead times during the sequential execution of programs (for instance, while waiting for the response from a peripheral or during memory accesses) to launch a new thread executing a new task. Although this gives the impression of parallel execution, it is in fact just multithreading. Of course, these two options are relatively simple (at least conceptually) and valid for a certain range of applications, but they have limited applicability, for instance, because of interconnection delays between processors or saturation of the multithreading capabilities.

3.1.1 Multicore Processors

The aforementioned limitations can be overcome by using multicore processors, which integrate several processor cores (either multithreading or not) on a single chip. Since in most processing systems the main factor limiting performance is memory access time, trying to achieve improved performance by increasing operating frequency (and, hence, power consumption) does not make sense above certain limits, defined by the characteristics of the memories. Multicore systems are a much more efficient solution because they allow tasks to be executed concurrently by cores operating at lower frequencies than those a single processor would require, while reducing communication delays among processors, all of them being within the same chip. Therefore, these architectures provide a better performance–power consumption trade-off.

3.1.1.1 Main Hardware Issues

There are many concepts associated with multicore architectures, and the commercial solutions to tackle them are very diverse. This section concentrates just on the main ideas needed to understand and assess the ability of FPGAs to support SoCs. Readers can easily find additional information in the specialized literature about computer architecture (Stallings 2016).

The first multicore processors date back some 15 years, when IBM introduced the POWER4 architecture (Tendler et al. 2002). The evolution since then resulted in very powerful processing architectures, capable of supporting different OSs on a single chip. One might think the ability to integrate multiple cores would have a serious limitation related to increased silicon area and, in turn, cost. However, nanometer-scale and, more recently, 3D stacking technologies have enabled the fabrication of multicore chips at reasonably affordable prices. Today, one may easily find 16-core chips in the market.

FIGURE 3.2 Homogeneous (a) and heterogeneous (b) multicore processor architectures. (a) Homogeneous architecture: processor cores are identical; (b) heterogeneous architecture: combines different processor cores.
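The performance–power argument made in Section 3.1.1 can be illustrated with the usual first-order model of CMOS dynamic power. The model and the numbers below are assumptions introduced here for illustration and do not come from this chapter: $\alpha$ is the switching activity, $C$ the switched capacitance, $V$ the supply voltage, and $f$ the operating frequency. The workload is assumed to split perfectly across two cores, and the 20% supply voltage reduction at half the frequency is a hypothetical figure.

```latex
\begin{align}
P_{\mathrm{dyn}} &\approx \alpha C V^{2} f \\
P_{1} &= \alpha C V^{2} f
  && \text{(one core at frequency } f\text{)} \\
P_{2} &= 2\,\alpha C\,(0.8V)^{2}\,\frac{f}{2}
       = 0.64\,\alpha C V^{2} f
       = 0.64\,P_{1}
  && \text{(two cores at } f/2\text{)}
\end{align}
```

Under these assumptions, two cores running at half the frequency provide the same aggregate throughput for roughly two-thirds of the dynamic power. Static power, imperfect parallelization, and inter-core communication overhead are ignored in this sketch.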
As shown in Figure 3.2, multicore processors may be homogeneous (all of whose cores have the same architecture and instruction set) or heterogeneous (consisting of cores with different architectures and instruction sets). Most general-purpose multicore processors are homogeneous. In them, tasks (or threads) are interchangeable among processors (even at run time) with no effect on functionality, according to the availability of processing power in the different cores. Therefore, homogeneous solutions make an efficient use of parallelization capabilities and are easily scalable.

In spite of the good characteristics of homogeneous systems, there is a current trend toward heterogeneous solutions. This is mainly due to the very nature and the increasing complexity of target applications, whose growing need for execution of highly specialized tasks requires the use of platforms combining different architectures, such as, for instance, microcontrollers, DSPs, and GPUs. Therefore, heterogeneous architectures are particularly suitable for applications where functionality can be clearly partitioned into specific tasks requiring specialized processors and not needing intensive communication among tasks.

Communications is actually a key aspect of any system, but even more for multicore processors, which require low-latency, high-bandwidth communications not only between each processor and its memory/peripherals but also among the processors themselves. Shared buses may be used for this purpose, but most current SoCs rely on crossbar interconnections (Vadja 2011). Given the importance of this topic, the on-chip buses most widely used in FPSoCs are analyzed in Section 3.5.

To reduce data traffic, multicore systems usually have one or two levels of local cache memory associated with each processor (so it can access the data it uses more often without affecting the other elements in the system), plus one higher level of shared cache memory. A side benefit of using shared memory is that in case the decision is made to migrate some sequential programming to concurrent hardware or vice versa, the fact that all cores share a common space reduces the need for modifications in data or control structures. Examples of usual cache memory architectures are shown in Figure 3.3.

FIGURE 3.3 Usual cache memory architectures.
[Figure 3.3 panels: single core with private cache (a) and main memory; multicore with private caches (a) and main memory; multicore with private caches (a), shared cache (b), and main memory. (a) Usually level 1 (L1) cache. (b) Usually level 2 (L2) cache.]

The fact that some data (shared variables) can be modified by different cores, together with the use of local cache memories, implicitly creates problems related to data coherence and consistency. In brief, coherence means all cores see any shared variable as if there were no cache memories in the system, whereas consistency means instructions to access shared variables are programmed in the right order. Therefore, coherence is an architectural issue (discussed in the following) and consistency a programming one (beyond the scope of this book).

A multicore system using cache memories is coherent if it ensures all processors sharing a given memory space always "see" at any position within it the last written value. In other words, a given memory space is coherent if a core reading a position within it retrieves data according to the order in which the cores sharing it have written values for that variable in their local caches. Coherence is obviously a fundamental requirement to ensure all processors access correct data at any time. This is the reason why all multicore processors include a cache-coherent memory system.

Although there are many different approaches to ensure coherence, all of them are based on modification–invalidation–update mechanisms. In a simplistic way, this means that when a core modifies the value of a shared variable in its local cache, copies of this variable in all other caches are invalidated and must be updated before they can be used.
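The modification–invalidation–update idea described above can be sketched with a toy model. This is a minimal illustration, not the protocol of any real processor: the class name `CoherentSystem`, its methods, and the write-through policy are assumptions chosen here to keep the example short (real systems typically use more elaborate state-based protocols).

```python
class CoherentSystem:
    """Toy write-invalidate model: private caches over one shared memory."""

    def __init__(self, n_cores):
        self.memory = {}  # shared memory: address -> value
        self.caches = [dict() for _ in range(n_cores)]  # one private cache per core

    def read(self, core, addr):
        cache = self.caches[core]
        if addr not in cache:
            # Miss, or the local copy was invalidated: update from shared memory.
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, core, addr, value):
        # Modification in the writer's local cache.
        self.caches[core][addr] = value
        # Write-through to shared memory (simplifying assumption).
        self.memory[addr] = value
        # Invalidation of the copies held by all other caches.
        for other, cache in enumerate(self.caches):
            if other != core:
                cache.pop(addr, None)


system = CoherentSystem(n_cores=2)
system.write(0, 0x10, 5)
first = system.read(1, 0x10)   # core 1 caches the value written by core 0
system.write(0, 0x10, 7)       # core 1's copy is invalidated
second = system.read(1, 0x10)  # core 1 must update its copy before using it
```

In this sketch, core 1 never observes a stale value because its copy is discarded on every remote write, at the cost of an extra access to shared memory on the next read.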
3.1.1.2 Main Software Issues

As in the case of hardware, there are many software concepts to be considered in embedded systems, and in multicore ones in particular, at different levels (application, middleware, OS), including, but not limited to, the necessary mechanisms for multithreading control, partitioning, resource sharing, or communications.

Different scenarios are possible depending on the complexity of the software to be executed by the processor and that of the processor itself,* as shown in Figure 3.4. For simple programs to be executed in low-end processors, the usual approach is to use bare-metal solutions, which do not require any software control layer (kernel or OS). Two intermediate cases are the implementation of complex applications in low-end processors or simple applications in high-end processors. In both cases, it is usual (and advisable) to use at least a simple kernel. Although this may not be deemed necessary for the latter case, it is highly recommended for the resulting system to be easily scalable. Finally, in order to efficiently implement complex applications in high-end processors, a real-time or high-end OS is necessary (Walls 2014). Currently, this is the case for most embedded systems.

Other important issues to be considered are the organization of shared resources, task partitioning and sequencing, as well as communications between tasks and between processors. From the point of view of the software architecture, these can be addressed by using either AMP or SMP approaches, depicted in Figure 3.5.

FIGURE 3.4 Software scenarios.

* Just to have a straightforward idea about complexity, we label as low-end processors those whose data buses are up to 16-bit wide and as high-end processors those with 32-bit or wider data buses.
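The four scenarios of Figure 3.4 amount to a small decision table, sketched below. The function name `software_scenario` and its parameters are hypothetical; the 16-bit and 32-bit thresholds follow the footnote above, which does not classify intermediate widths, so the sketch rejects them rather than guessing.

```python
def software_scenario(data_bus_bits, complex_software):
    """Return the Figure 3.4 scenario for a processor/software pair."""
    if data_bus_bits <= 16:
        high_end = False      # low-end processor per the footnote
    elif data_bus_bits >= 32:
        high_end = True       # high-end processor per the footnote
    else:
        raise ValueError("bus widths between 16 and 32 bits are not classified")

    if not high_end and not complex_software:
        return "bare-metal solution"
    if high_end and complex_software:
        return "OS or RTOS solution"
    # Both intermediate cases: at least a simple kernel is advisable.
    return "simple kernel solution"


choices = [
    software_scenario(8, complex_software=False),
    software_scenario(16, complex_software=True),
    software_scenario(32, complex_software=False),
    software_scenario(64, complex_software=True),
]
```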

SMP architectures apply to systems with two or more homogeneous cores sharing memory space. They are based on using only one OS (if required) for all cores. Since the OS has all the information about the whole system hardware at any point, it can efficiently perform a dynamic distribution of workload among cores (which implies extracting application parallelism, partition of tasks/threads, and dynamic assignment of tasks to cores), as well as the control of the ordering of task completion and of resource sharing among cores. Resource sharing control is one of the most important advantages of SMP architectures. Another significant one is easy interprocess communication, because there is no need for implementing any specific communication protocol, thus avoiding the overhead this would introduce. Finally, debugging tasks are simpler when working with just one OS.

SMP architectures are clearly oriented to get the most possible advantage of parallelism to maximize processing performance, but they have a main limiting factor, related to the dynamic distribution of workload. This factor affects the ability of the system to provide a predictable timing response, which is a fundamental feature in many embedded applications. Another past drawback, the need for an OS supporting multicore processing, is not a significant problem anymore given the wide range of options currently available (Linux, embedded Windows, and Android, to cite just some).

In contrast to SMP, AMP architectures can be implemented in either homogeneous or heterogeneous multicore processors. In this case, each core runs its own OS (either separate copies of the same or totally different ones; some cores may even implement a bare-metal system). Since none of the OSs is specifically in charge of controlling shared resources, such control must be very carefully performed at the application level. AMP solutions are oriented to applications with a high level of intrinsic parallelism, where critical tasks are assigned to specific resources in order for a predictable behavior to be achieved. Usually, in AMP systems, processes are locked (assigned) to a given processor. This simplifies the individual control of each core by the designer. In addition, it eases migration from single-core solutions.

FIGURE 3.5 AMP and SMP multiprocessing.

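[Figure 3.5 panels. Multicore processor, AMP architecture: Application 1 (threads 1 to N) runs on Core 1 under an OS, and Application 2 (threads 1 to M) runs on Core 2 under an RTOS. Multicore processor, SMP architecture: one application (threads 1 to N) runs over middleware and a single OS spanning Core 1 and Core 2.]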
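The contrast between the dynamic workload distribution of SMP and the locked assignment of AMP can be sketched as follows. This is an illustrative model only: the function names, the task names and costs, and the least-loaded-core policy standing in for an SMP scheduler are all assumptions made here, not a description of any particular OS.

```python
def smp_assign(task_costs, n_cores):
    """Single-OS view: each task goes to the currently least-loaded core."""
    loads = [0] * n_cores
    placement = {}
    for name, cost in task_costs.items():
        core = loads.index(min(loads))  # dynamic decision, depends on current load
        placement[name] = core
        loads[core] += cost
    return placement, loads


def amp_assign(task_costs, pinning, n_cores):
    """AMP view: every task is locked to the core chosen by the designer."""
    loads = [0] * n_cores
    for name, cost in task_costs.items():
        loads[pinning[name]] += cost
    return dict(pinning), loads


tasks = {"control_loop": 4, "logging": 1, "comms": 3, "ui": 2}

smp_placement, smp_loads = smp_assign(tasks, n_cores=2)

# Hypothetical pinning: the critical task gets core 0 to itself.
pinning = {"control_loop": 0, "logging": 1, "comms": 1, "ui": 1}
amp_placement, amp_loads = amp_assign(tasks, pinning, n_cores=2)
```

In the SMP sketch the critical task ends up sharing its core with whatever the scheduler places there, so its timing depends on the rest of the workload. In the AMP sketch the placement is fixed by design, which is what makes the behavior of the critical task predictable, at the price of a less balanced load.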

3.1.2 Many-Core Processors

Single- and multicore solutions are the ones most commonly found in SoCs, but there is a third option, many-core processors, which find their main niche in systems requiring a high scalability (mostly intensive computing applications), for instance, cloud computing datacenters. Many-core processors consist of a very large number of cores (up to 400 in some existing commercially available solutions [NVIDIA 2010; Nickolls and Dally 2010; Kalray 2014]), but are simpler and have less computing power than those used in multicore systems. These architectures aim at providing massive concurrency with a comparatively low energy consumption. Although many researchers and vendors (Shalf et al. 2009; Jeffers and Reinders 2015; Pavlo 2015) claim this will be the dominant processing architecture in the future, its analysis is out of the scope of this book, because until now, it has not been adopted in any FPGA.

3.1.3 FPSoCs

At this point two pertinent questions arise: What is the role of FPGAs in SoC design, and what can they offer in this context? Obviously, when speaking of parallelism or versatility, no hardware platform compares to FPGAs. Therefore, combining FPGAs with microcontrollers, DSPs, or GPUs clearly seems to be an advantageous design alternative for a countless number of applications demanded by the market. Some years ago, FPGA vendors realized the tremendous potential of SoCs and started developing chips that combined FPGA fabric with embedded microcontrollers, giving rise to FPSoCs.

The evolution of FPSoCs can be summarized as shown in Figure 3.6. Initially, FPSoCs were based on single-core soft processors, that is, configurable microcontrollers implemented using the logic resources of the FPGA fabric. The next step was the integration in the same chip as the FPGA fabric of single-core hard processors, such as PowerPC. In the last few years, several families of FPGA devices have been developed that integrate multicore processors (initially homogeneous architectures and, more recently, heterogeneous ones). As a result, the FPSoC market now offers a wide portfolio of low-cost, mid-range, and high-end devices for designers to choose from depending on the performance level demanded by the target application.

FPGAs are among the few types of devices that can take advantage of the latest nanometer-scale fabrication technologies. At the time of writing this book, according to FPGA vendors (… 2015; Kenny 2016), high-end FPGAs are fabricated in 14 nm process technologies, but new families have already been announced based on 10 nm technologies, whereas the average for ASICs is 65 nm. The reason for this is just economic viability. When migrating a chip design to a more advanced node (let us say from 28 to 14 nm), the costs associated with hardware and software design and verification grow dramatically, to the extent that for the migration to be economically viable, the return on investment must be in the order of hundreds of millions of dollars. Only chips for high-volume applications or those that can be used in many different applications (such as FPGAs) can get to those figures.

The different FPSoC options currently available in the market are analyzed in the following sections.

3.2 Soft Processors

As stated in Section 3.1.3, soft processors are involved in the origin of FPSoC architectures. They are processor IP cores (usually general-purpose ones) implemented using the logic resources of the FPGA fabric (distributed logic, specialized hardware blocks, and interconnect resources), with the advantage of having a very flexible architecture.

FIGURE 3.6 FPSoC evolution.

[Figure 3.6 content: timeline from 1999 to 2015 showing the number of processor cores (1, 2, 4) against year, distinguishing soft processors from FPGA families including hard processors, and homogeneous from heterogeneous architectures. Soft processors shown: PicoBlaze (Xilinx), Nios (Altera), MicroBlaze (Xilinx), Nios-II (Altera), LM8 (Lattice), LM32 (Lattice). Families including hard processors shown: AT94K, E5 (Triscend), Excalibur (Altera), QuickMIPS (QuickLogic), A7 (Triscend), Virtex-II Pro (Xilinx), µPSD3200 (ST), Virtex-4 (Xilinx), Virtex-5 (Xilinx), SmartFusion (Microsemi), Arria V (Altera), SmartFusion 2 (Microsemi), Zynq-7000 (Xilinx), Cyclone V (Altera), Arria 10 (Altera), Zynq UltraScale+ (Xilinx), Stratix 10 (Altera).]

As shown in Figure 3.7, a soft processor consists of a processor core, a set of on-chip peripherals, on-chip memory, and interfaces to off-chip memory. Like microcontroller families, each soft processor family uses a consistent instruction set and programming model.

Although some of the characteristics of a given soft processor are predefined and cannot be modified (e.g., the number of instruction and data bits, instruction set architecture [ISA], or some functional blocks), others can be defined by the designer (e.g., type and number of peripherals or memory map). In this way, the soft processor can, to a certain extent, be tailored to the target application. In addition, if a peripheral is required that is not available as part of the standard configuration possibilities of the soft processor, or a given available functionality needs to be optimized (for instance, because of the need to increase processing speed in performance-critical systems), it is always possible for the designer to implement a custom peripheral using available FPGA resources and connect it to the CPU in the same way as any "standard" peripheral.

The main alternative to soft processors are hard processors, which are fixed hardware blocks implementing specific processors, such as ARM's Cortex-A9 (ARM 2012), included by Altera and Xilinx in their latest families of devices. Although hard processors (analyzed in detail in Section 3.3) provide some advantages with regard to soft ones, their fixed architecture causes not all their resources to be necessary in many applications, whereas in other cases there may not be enough of them. Flexibility then becomes the main advantage of soft processors, enabling the development of custom solutions to meet performance, complexity, or cost requirements. Scalability and reduced risk of obsolescence are other significant advantages of soft processors. Scalability refers both to the ability of adding resources to support new features or update existing ones along the whole lifetime of the system and to the possibility of replicating a system, implementing more than one processor in the same chip. In terms of reduced risk of obsolescence, soft processors can usually be migrated to new families of devices. Limiting factors in this regard are that the soft processor may use logic resources specific to a given family of devices, which may not be available in others, or that the designer is not the actual owner of the HDL code describing the soft processor.

FIGURE 3.7 Soft processor architecture.
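The statement that a custom peripheral is connected to the CPU in the same way as any standard one can be pictured with a small software model of a designer-defined memory map. Everything in this sketch is hypothetical: the `Bus`, `Ram`, and `SquareAccelerator` classes, the base addresses, and the read/write interface are illustrative choices made here, and no vendor bus protocol is modeled.

```python
class Ram:
    """Standard peripheral: a block of on-chip memory."""

    def __init__(self, size):
        self.data = [0] * size

    def read(self, offset):
        return self.data[offset]

    def write(self, offset, value):
        self.data[offset] = value


class SquareAccelerator:
    """Hypothetical custom peripheral: returns the square of the last operand."""

    def __init__(self):
        self.operand = 0

    def read(self, offset):
        return self.operand * self.operand

    def write(self, offset, value):
        self.operand = value


class Bus:
    """Address decoder: every device is reached through the same mechanism."""

    def __init__(self):
        self.regions = []  # list of (base, size, device)

    def attach(self, base, size, device):
        for other_base, other_size, _ in self.regions:
            if base < other_base + other_size and other_base < base + size:
                raise ValueError("overlapping address ranges in the memory map")
        self.regions.append((base, size, device))

    def _decode(self, addr):
        for base, size, device in self.regions:
            if base <= addr < base + size:
                return device, addr - base
        raise ValueError("no device mapped at address %#x" % addr)

    def read(self, addr):
        device, offset = self._decode(addr)
        return device.read(offset)

    def write(self, addr, value):
        device, offset = self._decode(addr)
        device.write(offset, value)


bus = Bus()
bus.attach(0x0000, 0x1000, Ram(0x1000))      # standard peripheral
bus.attach(0x8000, 0x0004, SquareAccelerator())  # custom peripheral, same interface

bus.write(0x0010, 42)
bus.write(0x8000, 12)
ram_value = bus.read(0x0010)
accel_value = bus.read(0x8000)
```

From the point of view of the software running on the processor, the custom block is just another address range in the memory map, which is the property the text attributes to soft processors.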

Soft processor cores can be divided into two groups:

1. Proprietary cores, associated with an FPGA vendor, that is, supported only by devices from that vendor.
2. Open-source cores, which are technology independent and can, therefore, be implemented in devices from different vendors.

These two types of soft processors are analyzed in Sections 3.2.1 and 3.2.2, respectively. Although there are many soft processors with many diverse features available in the market, without loss of generality, we will focus on the main features of the most widely used cores, which will give a fairly comprehensive view of the different options available for designers.

3.2.1 Proprietary Cores

Proprietary cores are optimized for a particular FPGA architecture, so they usually provide a more reliable performance, in the sense that the information about processing speed, resource utilization, and power consumption can be accurately determined, because it is possible to simulate their behavior from accurate hardware models. Their major drawback is that the portability and the possibility of reusing the code are quite limited.

Open-source cores are portable and more affordable. They are relatively easy to adapt to different FPGA architectures and to modify. On the other hand, not being optimized for any particular architecture, their performance is usually worse and less predictable, and their implementation requires more FPGA resources to be used.

Xilinx's PicoBlaze (Xilinx 2011a) and MicroBlaze (Xilinx 2016a) and Altera's Nios-II* (Altera 2015c), whose block diagrams are shown in Figure 3.8a through c, respectively, have consistently been the most popular proprietary processor cores over the years. More recently, Lattice Semiconductor released the LatticeMico32 (LM32) (Lattice 2012) and LatticeMico8 (LM8) (Lattice 2014) processors,† whose block diagrams are shown in Figure 3.8d and e, respectively.

* Altera previously developed and commercialized the Nios soft processor, predecessor of Nios-II.
† Although LM8 and LM32 are actually open-source, free IP cores, since they are optimized for Lattice FPGAs, they are better analyzed together with proprietary cores.

FIGURE 3.8 Block diagrams of proprietary processor cores: (a) Xilinx's PicoBlaze, (b) Xilinx's MicroBlaze, (c) Altera's Nios-II, (d) Lattice's LM32, and (e) Lattice's LM8.

PicoBlaze and LM8 are 8-bit RISC microcontroller cores optimized for Xilinx* and Lattice FPGAs, respectively. Both have a predictable behavior, particularly PicoBlaze, all of whose instructions are executed in two clock cycles. Both also have similar architectures, including:

* KCPSM3 is the PicoBlaze version for Spartan-3 FPGAs, and KCPSM6 for Spartan-6, Virtex-6, and Virtex-7 Series.

Memory and peripherals Downloaded By: 10.3.98.104 At: 00:06 02 Oct 2021; For: 9781315162133, chapter3, 10.1201/9781315162133-4 istics and features: and istics vendors, respective FPGAs ofthe their have they character common many 3.5.4. Section in OpenCores, aWishbone described interface from whereas LM8 uses up to 256 input up through and to them 256 output with municates ports, must separately be erals FPGA implemented the fabric. in PicoBlaze com periph interface. required None peripherals, of all so it includes internal PicoBlaze communication LM8 the and is between difference main The Architectures in FPGA Processors Embedded Similarly, although MicroBlaze, with Nios-II, associated LM32 and also are • • • • • • • • • • • • • • • • • • • • and Nios-II). and eCos, FreeRTOS,ThreadX, uC/OS-II, (only or embOS MicroBlaze in support: OS μ Linux, real-time and Standard logic. debug Hardware peripherals. adding custom and Possibility for creating capabilities. Exception handling Multiple sources. interrupt peripherals. memories and toInterfaces off-chip Nios-II). and MicroBlaze point floating computationSingle-precision capabilities in (only memory interfaces. other I/O, interfaces,nication general-purpose SDRAM controllers, and commu serial Wide timers, as peripherals such range of standard or performance. area Possibility of variable pipeline, to optimize memory (only management Nios-II). and MicroBlaze in virtual to support requiring (MMU) OSs Memory unit management data memories. cache and Instruction 32-bitThirty-two registers. general-purpose . 32-bit set, data path, address space. and instruction 32-bit processors. RISC general-purpose 8 in LM8). Interrupt management(oneinterrupt source inPicoBlaze,upto Unit Logic (ALU).Arithmetic pages LM8). 256- in in memory scratchpad (64 PicoBlaze, RAM in Internal up to 4 GB Up to 4 Kof 188-bit-wide memory. instruction (16 registers General-purpose PicoBlaze, in 16 or 32 LM8). 
in CLinux, MicroC/OS-II, MicroC/OS-II, CLinux, - 71 - - - Downloaded By: 10.3.98.104 At: 00:06 02 Oct 2021; For: 9781315162133, chapter3, 10.1201/9781315162133-4 72 the Nios-II architecture consists of the following functional blocks: following functional of the consists Nios-IIthe architecture Figure 3.8c, shown in As processors. applicable soft also to similar any other example, Nios-II an core as processor are soft but analyses avast of majority the target the application. of needs specific notbutdoes havebe adaptedcan it some to the afixed structure, extent to tionality, or complexity. performance, words, other In core processor asoft whether to or include not, for them func to according system requirements applications. it for up and designer optional, is to all the are Some of them required core up are the building blocks not hand, other all usage. the On to reduce FPGA resource software emulated be in but also somecan of them implemented hardware, ISA usually the in are supporting blocks functional elements. The core to external the peripherals, to connect resources, and data memories, and addition blocks, in to instruction for of functional aset need the implies ISA. to This support designed is acertain processor A soft Most of the remainder of this section is focused on the architecture of the of the architecture on the focused is section of this Most remainder of the • • • ones used by other custom instructions and by and native instructions). instructions by custom other used ones (the registers Nios-II oflogic block with general-purpose instead of their registers operate internal the can with that those are tions instruc custom file register . 
asingle mented Internal in (up to imple be to 256) or multicycle instructions combinational several allow cycle to completed.one be clock Extended instructions more than cycle, require whereas multicycle (sequential) instructions clock a single within its function a logic block performs that through implemented is face) instruction supported. be A combinational can ­ tional, interesting feature of the Nios-II architecture not provided by others. Nios-II of the architecture feature interesting ALU, the integrated in block is that an Figure 3.9. is shown in as This created generates alogic hardware. instruction new Each custom in by one executed asingle to of substitute asequence native instructions to accelerate idea execution. to able designer for be The is the algorithm for example, components custom only of instructions, but custom also Custom instruction logic ( of performance. expense the at to save software, emulated for purposes in other FPGA resources (e.g.,have some instructions division) implemented or hardware in the core, to may choose designers configuring When instructions. rotate and logic, shift supports and relational, arithmetic, and ters ALU execution time. turn, latency and, in mayto shadow 63 reduce defined to be sets context register switch 32-bit up and to thirty-two Optionally,registers control registers. 
up Register sets Up to 256 custom instructions of five different types types of(combina five different Up to instructions 256 custom : It regis contents operates the general-purpose of with the multicycle, extended, internal register file, and external inter multicycle, file,and external register extended, internal : They are organized in thirty-two 32-bit thirty-two in general-purpose organized are : They FPGAs: Fundamentals, Advanced Features, and Applications and Features, Advanced Fundamentals, FPGAs: optional ): Nios-II addition supports the of not ------Downloaded By: 10.3.98.104 At: 00:06 02 Oct 2021; For: 9781315162133, chapter3, 10.1201/9781315162133-4 FIGURE 3.9 Connection of custom instruction logic to the ALU. the to logic instruction of custom Connection Embedded Processors in FPGA Architectures in FPGA Processors Embedded • • • interface specification. The Avalon bus is analyzed in Section 3.5.2.TheSection in interface specification. analyzed is Avalon bus Avalon-MM ports, to according Altera’s master Avalon proprietary implemented 32-bit both using are data buses and separate instruction Instruction and data buses notcontroller is implemented. interrupt to internal it the and connected also are sources interrupt EIC EIC, an using interface. it corenect When internal through to the EIC create con an may and also Designers by software. determined sources, whose is priority interrupt hardware up to 32ports internal Internal and external interrupt controller routine. exception response sponding corre the calls exception of cause and the the assesses that dler exception han an through interrupts, hardware internal including Exception controller applications: floating-point computation-intensive IEEE Std. 754-2008 or IEEE Std. 754-1985 specifications) to support (according to floating-pointincludeinstructions single-precision These logic. instruction custom from built instructions predefined it advantage wish) anywayuse they to take if of instructions. 
custom assembly code to (they use for need programmers the may eliminating any Cor C++ in instantiated application directly be ated code, can that tion interfaces to elements access interfaces outsidetion processor’s of the data path. generate communica Finally, instructions interface custom external instr Arithmetic Relational Shif Cu Logica In addition to user-defined instructions, set Nios-IIof a offers instructions, addition In to user-defined Whenever a new custom instruction is created, a macro is gener created, is is amacro Whenever instruction anew custom ro

stom uction t and ta te Nio l s- II AL U : It possible exceptions, to provides all response Result : Nios-II is based on a Harvard architecture. The The architecture. on aHarvard : Nios-II based is A(31..0) B(31..0) clk_en n(7..0) b(4..0) a(4..0) c(4..0) wr_c re rd_b star rd_a cl set k t Cu Ex ( stom inst Combinatoria EIC ter register file Multicycle Extended Internal nal interfac ) ( optional ru ction logi l e ): Nios-II sup ): Nios-II c re done sult(31..0) ------73 Downloaded By: 10.3.98.104 At: 00:06 02 Oct 2021; For: 9781315162133, chapter3, 10.1201/9781315162133-4 74 • • • • • ( memories Instruction and data cache FPGA the fabric)mented in used. is memory, slower memory, off-chip peripherals (imple on-chip and However, most usually, of acombination (fast) embedded on-chip FPSoC. the when defining configured are either. features These nor way used, to be them the can peripherals that to connect and ries of memo or type number the not does specify Nios-II architecture to executed processor. be by the (reads)fetches instructions the just bus data peripherals, memory and instruction whereas the resources. removed and phase tion FPGA operation, for releasing normal thus verifica and design the during used be module debugging can the context, that advantage is the processors this to hard regard In with data. execution trace real-time memory contents,and collecting and registers editing and watchpoints, and analyzing breakpoints ting to memory, stopping execution,programs and set program starting downloading are tasks supported Some of debugging the purposes. to remotely be processor for soft allows debugging accessed the This core signals. JTAG to internal and on-chip to the connects circuitry protection unitMemory ( execution. 
ware without proper authorization, avoidingsections thus soft errant to memoryprevent protection and to to memory write any process Avalon of the address lines the in (the bus), sets hardware the ones (software) physical into memory addresses of addresses tion virtual to transla memory. processes, memory allocation are tasks Its main virtual requiring OS an with conjunction in sense only makes its use MMU aTCM with port. Several TCMs one may associated each used, be core. to the but data on chip external TCMs, and tion are which instruc both connect applications. ports These time-critical in access low-latency at ensuring memory aimed TCMincludes ports optional Tightly coupled memories software. in managed coherence and are Cachemanagement both. available are one or of to methods them bypass Software optional. core, of is the part use but their intrinsic an are data caches and tion instruc Both data ports. master and instruction the supported in JTAG debug module generated.is memory access, exception an unauthorized an attempts to perform a process case In memory map to defined the be software. by in regions agement different to the not. is It allows permissions access memory man but virtual required are features memory protection The data allows bus memory-mappedThe read/write to access both ( optional FPGAs: Fundamentals, Advanced Features, and Applications and Features, Advanced Fundamentals, FPGAs: ): This block handles virtual memory, virtual and, therefore, block handles ): This ( optional ( MPU TCM ): As shown in Figure 3.10,): Figure shown in As block this ) ( ) ( optional optional optional ): This block is used when block used is ): This ): The Nios-II architecture ): Nios-II The architecture ): Cache memories are ------Downloaded By: 10.3.98.104 At: 00:06 02 Oct 2021; For: 9781315162133, chapter3, 10.1201/9781315162133-4 sors. 
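The behavior described for the MPU (software defines access permissions per memory region, and an unauthorized access raises an exception) can be illustrated with a small software model. This is only a sketch under stated assumptions: the class names, the region layout, and the first-match rule are hypothetical and do not reproduce the actual Nios-II MPU registers or its region priority scheme.

```python
from dataclasses import dataclass


class AccessViolation(Exception):
    """Models the exception generated on an unauthorized memory access."""


@dataclass(frozen=True)
class Region:
    base: int        # first byte address of the region
    size: int        # region length in bytes
    readable: bool   # read permission defined by software
    writable: bool   # write permission defined by software

    def contains(self, address: int) -> bool:
        return self.base <= address < self.base + self.size


class SimpleMPU:
    """Toy model: every access is checked against software-defined regions."""

    def __init__(self, regions):
        self.regions = list(regions)

    def check(self, address: int, write: bool) -> Region:
        for region in self.regions:
            if region.contains(address):
                allowed = region.writable if write else region.readable
                if allowed:
                    return region
                break  # assumption: the first matching region decides
        kind = "write" if write else "read"
        raise AccessViolation(f"{kind} at 0x{address:08X} denied")


# Hypothetical memory map, chosen only for illustration.
mpu = SimpleMPU([
    Region(base=0x0000_0000, size=0x1000, readable=True, writable=False),  # code
    Region(base=0x0000_1000, size=0x1000, readable=True, writable=True),   # data
])
```

In this model, a write to the data region is accepted, whereas a write to the read-only code region or any access outside the defined map raises `AccessViolation`, which plays the role of the exception mentioned in the MPU description.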
FIGURE 3.10
Connection of the JTAG debug module.

To ease the task of configuring the Nios-II architecture to fit the requirements of different applications, Altera provides three basic models from which designers can build their own core, depending on whether performance or complexity weighs more in their decisions. Nios-II/f (fast) is designed to maximize performance at the expense of FPGA resource usage. Nios-II/s (standard) offers a balanced trade-off between performance and resource usage. Finally, Nios-II/e (economy) optimizes resource usage at the expense of performance.

The similarities between the hardware architecture of Altera's Nios-II and Xilinx MicroBlaze can be clearly noticed in Figure 3.8. Both are 32-bit RISC processors with Harvard architecture and include fixed and optional blocks, most of which are present in the two architectures, even if there may be some differences in the implementation details. Lattice's LM32 is also a 32-bit RISC processor, but much simpler than the two former ones. For instance, it does not include an MMU block. It can be integrated with OSs such as μCLinux, uC/OS-II, and TOPPERS/JSP kernel (Lattice 2008).

The processor core is not the only element a soft processor consists of, but it is the most important one, since it has to ensure that any instruction in the ISA can be executed no matter what the configuration of the core is. In addition, the soft processor includes peripherals, memory resources, and the required interconnections. A large number of peripherals are or may be integrated in the soft processor architecture. They range from standard resources (GPIO, timers, counters, or UARTs) to complex, specialized blocks oriented to signal processing, networking, or biometrics, among other fields. Not only FPGA vendors provide peripherals to support their soft processors, but others are also available from third parties.

Communication of the processor core with peripherals and external circuits in the FPGA fabric is a key aspect in the architecture of soft processors. In this regard, there are significant differences among the three soft processors being analyzed. Nios-II has always used, from its very first versions to date, Altera's proprietary Avalon bus. On the other hand, Xilinx initially used IBM's CoreConnect bus, together with proprietary ones (such as local memory bus [LMB] and Xilinx CacheLink [XCL]), but most current devices use ARM's AXI interface. The Lattice LM32 processor uses WishBone interfaces. A detailed analysis of the on-chip buses most widely used in FPSoCs is made in Section 3.5.

At this point, readers may be daunted by the huge amount and diversity of concepts, terms, hardware and software design alternatives, or decisions designers must face when dealing with soft processors. Fortunately, they have at their disposal robust design environments as well as an ecosystem of design tools and IP cores that dramatically simplify the design process. The tools supporting the design of SoPCs are described in Section 6.3.

3.2.2 Open-Source Cores

In addition to proprietary cores, associated with certain FPGA architectures/vendors, there are also open-source soft processor cores available from other parties. Some examples are ARM's Cortex-M1 and Cortex-M3, Freescale's ColdFire V1, MIPS Technologies' MP32, OpenRISC 1200 from the OpenCores community, Aeroflex Gaisler's LEON4, as well as implementations of many different well-known processors, such as 8051, 80186 (88), and 68000. The main advantages of these solutions are that they are technology independent, low cost, based on well-known, proven architectures, and supported by a full set of tools and OSs.

The Cortex-M1 processor (ARM 2008), whose block diagram is shown in Figure 3.11a, was developed by ARM specifically targeting FPGAs. It has a 32-bit RISC architecture and, among other features, includes configurable instruction and data TCMs, interrupt controller, or configurable debug logic. The communication interface is ARM's proprietary 32-bit AMBA AHB-Lite bus (described in Section 3.5.1.1). The core supports Altera, Microsemi, and Xilinx devices, and it can operate in a frequency range from 70 to 200 MHz, depending on the FPGA family.

The OpenRISC 1200 processor (OpenCores 2011) is based on the OpenRISC 1000 architecture, developed by OpenCores targeting the implementation of 32- and 64-bit processors. OpenRISC 1200, whose block diagram is shown in Figure 3.11b, is a 32-bit RISC processor with Harvard architecture. Among other features, it includes general-purpose registers, instruction and data caches, MMU, floating-point unit (FPU), MAC unit for the efficient implementation of signal processing functions, and exception/interrupt management units. The communication interface is WishBone (described in Section 3.5.4). It supports different OSs, such as Linux, RTEMS, FreeRTOS, and eCos.

LEON4 is a 32-bit processor based on the SPARC V8 architecture, originated from European Space Agency's project LEON. It is one of the most complex and flexible (configurable) open-source cores. It includes an ALU with hardware multiply, divide, and MAC units, IEEE-754 FPU, MMU, and debug module with instruction and data trace buffer. It supports two levels of instruction and data caches and uses the AMBA 2.0 AHB bus (described in Section 3.5.1.1) as communication interface. From a software point of view, it supports Linux, eCos, RTEMS, Nucleus, VxWorks, and ThreadX.

FIGURE 3.11
Some open-source soft processors: (a) Cortex-M1, (b) OpenRISC1200, and (c) LEON4.

Table 3.1 summarizes the performance of the different soft processors analyzed in this chapter. It should be noted that data have been extracted from information provided by vendors and, in some cases, it is not clear how this information has been obtained.
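As a rough illustration of how the two columns of Table 3.1 can be combined, the sketch below multiplies the DMIPS/MHz figure of each 32-bit core by its maximum reported frequency to obtain an indicative peak DMIPS value. The figures are taken from Table 3.1; the function and variable names are hypothetical. The result is only an upper-bound estimate, since it assumes that both figures were obtained on the same device and configuration, which, as noted above, is not always clear from vendor data.

```python
# Figures from Table 3.1: (DMIPS/MHz, maximum reported frequency in MHz).
TABLE_3_1 = {
    "MicroBlaze": (1.34, 343),
    "Nios-II/e": (0.15, 200),
    "Nios-II/s": (0.74, 165),
    "Nios-II/f": (1.16, 185),
    "LatticeMico32": (1.14, 115),
    "Cortex-M1": (0.8, 200),
    "OpenRISC1200": (1.0, 300),
    "LEON4": (1.7, 150),
}


def peak_dmips(dmips_per_mhz: float, f_max_mhz: float) -> float:
    """Indicative peak performance: DMIPS/MHz times clock frequency in MHz."""
    return dmips_per_mhz * f_max_mhz


def ranking(table):
    """Return (name, peak DMIPS) pairs sorted from fastest to slowest."""
    rows = [(name, peak_dmips(*figures)) for name, figures in table.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, dmips in ranking(TABLE_3_1):
        print(f"{name:<14} {dmips:7.1f} DMIPS (estimate)")
```

Under these assumptions, a core with a modest DMIPS/MHz figure but a high clock frequency can rank above one with better per-cycle efficiency, which is why both columns need to be read together.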

TABLE 3.1
Performance of Soft Processors

Soft Processor    MIPS or DMIPS/MHz    Maximum Frequency Reported (MHz)
PicoBlaze(a)      100 MIPS             240
LatticeMico8      No data              94.6 (LatticeECP2)
MicroBlaze        1.34 DMIPS/MHz       343
Nios-II/e         0.15 DMIPS/MHz       200
Nios-II/s         0.74 DMIPS/MHz       165
Nios-II/f         1.16 DMIPS/MHz       185
LatticeMico32     1.14 DMIPS/MHz       115
Cortex-M1         0.8 DMIPS/MHz        200
OpenRISC1200      1 DMIPS/MHz          300
LEON4             1.7 DMIPS/MHz        150

(a) Up to 200 MHz or 100 MIPS in a Virtex-II Pro FPGA (Xilinx 2011a).

Since several soft processors can be instantiated in an FPGA (to the extent that there are enough resources available), many diverse FPSoC solutions can be developed based on them, from single to multicore. These multicore systems may be based on the same or different soft processors, or their combination with hard processors, and support different OSs. Therefore, it is possible to design homogeneous or heterogeneous FPSoCs, with SMP or AMP architectures.

3.3 Hard Processors

Soft processors are a very suitable alternative for the development of FPSoCs, but when the highest possible performance is required, hard processors may be the only viable solution. Hard processors are commercial, usually proprietary, processors that are integrated with the FPGA fabric in the same chip, so they can be considered somehow as another type of specialized hardware blocks. The main difference with the stand-alone versions of the same processors is that the hard ones are adapted to the architectures of the FPGA devices they are embedded in so that they can be connected to the FPGA fabric with minimum delay. However, very interestingly, from the point of view of software developers, there is no difference, for example, in terms of architecture or ISA.

There are many advantages derived from the use of optimized, state-of-the-art processors. Their performance is similar to the corresponding ASIC implementations (and well known from these implementations); they have a wide variety of peripherals and memory management resources, are highly reliable, and have been carefully designed to provide a good performance/functionality/power consumption trade-off. Documentation is usually extensive and detailed, and they have whole development and support ecosystems provided by the vendors. There are also usually many reference designs available that designers can use as starting point to develop their own applications.

Hard processors also have some drawbacks. First, they are not scalable, because their fixed hardware structure cannot be modified. Second, since they are fine-tuned for each specific FPGA family, design portability may be limited. Finally, same as for stand-alone processors, obsolescence affects hard processors. This is a market segment where new devices with ever-enhanced features are being continuously released, and as a consequence, production of (and support for) relatively recent devices may be discontinued.

The first commercial FPSoCs including hard processors were proposed by Atmel and Triscend.* For instance, Atmel developed the AT94K Field Programmable System Level series (Atmel 2002), which combined a proprietary 8-bit RISC AVR processor (1 MIPS/MHz, up to 25 MHz) with reconfigurable logic based on its AT40K FPGA family. Triscend, on its side, commercialized the E5 series (Triscend 2000), including an 8032 microcontroller (8051/52 compatible, 10 MIPS at 40 MHz). In both cases, the reconfigurable part consisted of resources accounting for roughly 40,000 equivalent logic gates, and the peripherals of the microcontrollers consisted of just a small set of timers/counters, serial communication interfaces (SPI, UART), capture and compare units (capable of generating PWM signals), and interrupt controllers (capable of handling both internal and external interrupt sources). None of these devices is currently available in the market, although Atmel still produces AT40K FPGAs.

After only a few months, 32-bit processors entered the FPSoC market with the release of Altera's Excalibur family (Altera 2002), followed by QuickLogic's QuickMIPS ESP (QuickLogic 2001), Triscend's A7 (Triscend 2001), and Xilinx's Virtex-II Pro (Xilinx 2011b), Virtex-4 FX (Xilinx 2008), and Virtex-5 FXT (Xilinx 2010b). This was a big jump ahead in terms of processor architectures, available peripherals, and operating frequencies/performance.

Altera and Triscend already opted at this point to include ARM processors in their FPSoCs, whereas QuickLogic devices combined a MIPS32 4Kc processor from MIPS Technologies† with approximately 550,000 equivalent logic gates of proprietary fabric (QuickLogic's Via-Link antifuse technology).

* Microchip Technology acquired Atmel in 2016, and Xilinx acquired Triscend in 2004.
† Imagination Technologies acquired MIPS Technologies in 2013.
Xilinx Virtex-II Pro and Virtex-4 FX devices included up to two IBM PowerPC 405 cores and Virtex-5 FX up to two IBM PowerPC 440 cores. Only these latter three families are still in the market, although Xilinx recommends not to use them for new designs.

It took more than 5 years for a new FPSoC family (Microsemi's SmartFusion, Figure 3.12) to be released, but since then there has been a tremendous evolution, with one common factor: All FPGA vendors opted for ARM architectures as the main processors for their FPSoC platforms.

FIGURE 3.12
SmartFusion architecture.

Microsemi's SmartFusion and SmartFusion 2 (Microsemi 2013, 2016) families include an ARM Cortex-M3 32-bit RISC processor (Harvard architecture, up to 166 MHz, 1.25 DMIPS/MHz), with two 32 kB SRAM memory blocks, 512 kB of 64-bit nonvolatile memory, and 8 kB instruction cache. It provides different interfaces (all based on ARM's proprietary AMBA bus, described in Section 3.5.1) for communication with specialized hardware blocks or custom user logic in the FPGA fabric, as well as many peripherals to support different communication standards (USB controller; SPI, I2C, and CAN blocks, multi-mode UARTs, or Triple-Speed Ethernet media access control). In addition, it includes an embedded trace macrocell block intended to ease system debug and setup.

Altera and Xilinx include ARM Cortex-A9 cores in some of their most recent FPGA families, such as Altera's Arria 10 (Figure 3.13), Arria V, and Cyclone V (Altera 2016a,b) and Xilinx's Zynq-7000 AP SoC (Xilinx 2014). The ARM Cortex-A9 is a 32-bit dual-core processor (2.5 DMIPS/MHz, up to 1.5 GHz). Dual-core architectures are particularly suitable for real-time operation, because one of the cores may run the main OS and application programs, whereas the other core is in charge of time-critical (real-time) functions. In both Altera and Xilinx devices, the processors and the FPGA fabric are supplied from separate power sources. If only the processor is used, it is possible to turn off the power supply for the fabric, hence allowing power consumption to be reduced. In addition, the logic can be fully or partially configured from the processor at any time.*

FIGURE 3.13
Arria 10 hard processor system.

* FPGA configuration is analyzed in Chapter 6.

The main features of the ARM Cortex-A9 dual-core processor are as follows:

• Ability to operate in single-processor, SMP dual-processor, or AMP dual-processor modes.
• Each core has its own separate level 1 (L1) instruction and data caches, 32 kB each, and both cores share 512 kB of level 2 (L2) cache.
• Up to 256 kB of on-chip RAM and 64 kB of on-chip ROM.
• Dynamic length pipeline (8–11 stages).
• Eight-channel DMA controller supporting different data transfer types: memory to memory, memory to peripheral, peripheral to memory, and scatter–gather.
• MMU.
• Single- and double-precision FPU.
• NEON media processing engine, which enhances FPU features by providing a quad-MAC and additional 64-bit and 128-bit register sets supporting single-instruction, multiple-data (SIMD) and vector floating-point instructions. NEON technology can accelerate multimedia and signal processing algorithms such as video encode/decode, 2D/3D graphics, gaming, audio and speech processing, image processing, telephony, and sound synthesis.
• Available peripherals include interrupt controller, timers, GPIO, as well as 10/100/1000 tri-mode Ethernet Media Access Control, USB 2.0, CAN, SPI, UART, and I2C interfaces.
• Hard memory interfaces for DDR4, DDR3, DDR3L, DDR2, LPDDR2, flash (QSPI, NOR, and NAND), and SD/SDIO/MMC memories.
• Connections with the FPGA fabric (distributed logic and specialized hardware blocks) through AXI interfaces (described in Section 3.5.1.3).
• ARM CoreSight Program Trace Macrocell, which allows the instruction flow being executed to be accessed for debugging purposes (Sharma 2014).

At the time of writing of this book, the two most recently released FPSoC platforms are Altera's Stratix 10 (Altera 2015d) and Xilinx's Zynq UltraScale+ (Xilinx 2016b), both including an ARM Cortex-A53 quad-core processor. Most of the features of the ARM Cortex-A53 processor (Figure 3.14) are already present in the ARM Cortex-A9, but the former is smaller and has lower power consumption. The cores in Stratix 10 and Zynq UltraScale+ families can operate up to 1.5 GHz, providing 2.3 DMIPS/MHz performance.

FIGURE 3.14
Processing systems of (a) Altera's Stratix 10 FPGAs and (b) Xilinx's Zynq UltraScale+ MPSoCs.

In addition, Zynq UltraScale+ devices include an ARM Cortex-R5 dual-core processor and an ARM Mali-400 MP2 GPU, as shown in Figure 3.14b, resulting in a heterogeneous multiprocessor SoC (MPSoC) hardware architecture. The ARM Cortex-R5 is a 32-bit dual-core real-time processor,* capable of operating at up to 600 MHz and providing 1.67 DMIPS/MHz performance. Cores can work in split (independent) or lock-step (parallel) modes. Lock-step operation is intended for safety-critical applications requiring redundant systems.

* Cortex-A series includes “Application” processors and Cortex-R series “real-time” ones.

The main features of each core are as follows:

• 32 kB L1 instruction and data caches and 128 kB TCM for highly deterministic or low-latency applications (real-time single-cycle access). All memories have ECC and/or parity protection.
• Interrupt controller.
• MPU.
• Single- and double-precision FPU.
• Embedded trace macrocell for connection to ARM CoreSight debugging system.
• AXI interfaces (described in Section 3.5.1.3).

The ARM Mali-400 GPU is a low-power graphics acceleration processor, capable of operating at up to 667 MHz. Its 2D vector graphics are based on OpenVG 1.1,* whereas 3D graphics are based on OpenGL ES 2.0.† It supports Full Scene Anti-Aliasing and Ericsson Texture Compression to reduce external memory bandwidth and is fully autonomous to operate in parallel with the ARM Cortex-A53 application processor. It consists of five main blocks:

1. Geometry processor, in charge of the vertex processing stage of the graphics pipeline: It generates lists of primitives and accelerates building of data structures for pixel processors.
2. Pixel processors (two), which handle the rasterization and fragment processing stages of the graphics pipeline: They produce the framebuffer results that screens display as final images.
3. MMU: Both the geometry processor and the pixel processors use MMUs for access checking and translation.
4. L2 cache: Geometry and pixel processors share a 64 kB L2 read-only cache.
5. Power management unit, supporting power gating for all the other blocks.

* OpenVG is a royalty-free, cross-platform API for hardware accelerated two-dimensional vector and raster graphics.
† OpenGL ES is a royalty-free, cross-platform API for full-function 2D and 3D graphics on embedded systems.

Some devices in the Zynq UltraScale+ family also include a video codec unit in the form of specialized hardware block (i.e., as part of the FPGA resources), supporting simultaneous encoding/decoding of video and audio streams. Its combination with the Mali-400 GPU results in a very suitable platform for multimedia applications.

Tools and methodologies for FPGA-based design are analyzed in Chapter 6, where special attention is paid to SoPC design tools (Section 6.3).
or even implemented external in specialized as system, FPGA implemented processing the fabric, the in in available there (memorya whole of resources peripherals integrated hardware and bunch may have kernels) to interact (maybe share some proprietary and with also OSs real-time heterogeneous multicore and where FPSoC, general-purpose about what it to debug may to is, just think one take has a this how true ize market. To the platforms in ofsuccess these potential real the in importance alsoparamount of is and tasks tools for verification design efficientsoftware context. this role play in also afundamental available FPGA the fabric Chapter (analyzed 2) resources in in The FPSoCs. to ones of considertant*) only not design the when the are addressing (although performance and impor extremely features their that emphasize 3.5.1.3). AMBA Section AXI4through (described interfaces in FPGA the fabric with and among themselves interconnected of are consists * user’s due is context) to devices. This of success contributed to these the has context awareness (identification real-time of multiple enabling sensors of mobile devices (tablets, in integration The wearables, smartphones, IoT) and Hubs Sensor 3.4.1 getapplication specific domains. theytar readers acomprehensiveor architectures) of view SoC configurable book, haveFPGA might but and included excluded to give are been this from (some not even are on of them based aforementioned basicthe architecture they either because do not follow characteristics solutions specific other with analyzes section applications. on specific This not focused is, are that they awide range of target application they fact that the processors, and domains, FPGA one of or and more an embedded consisting basic architecture, the haveable all They at have characteristics: least common analyzed. two been avail solutions commercially FPSoC most typical the previous sections, In 3.4

ment of more and more sophisticated SMP and AMP platforms. AMP and SMP sophisticated more and of more ment develop continuous the to related is years recent in platforms evolution of FPSoC fast the for reason main the that 3.3 is 3.2 and Sections in analyses the from deriving conclusion A clear Given the increasing complexity of platforms, availability Given of FPSoC the increasing the to it have important processors analyzed, is soft been and hard Once system of Zynq devices processing the blocks UltraScale+ building All Other “Configurable” Solutions SoC Other 85 - - - - - Downloaded By: 10.3.98.104 At: 00:06 02 Oct 2021; For: 9781315162133, chapter3, 10.1201/9781315162133-4 mainly through SPI I and through mainly host processor the and sensors the with DMA,and connect whereas they rate heart sensors),and of well as data as storage. eter, pressure, ambient and light, proximity, temperature, gesture, humidity, (accelerometer, front-end sensors sampling and ing, magnetom gyroscope, targeting always-on context awareness applications.targeting tion). includesdetector change and a It reconfiguration supports in-system detec or gesture dead indoor reckoning, navigation,monitoring, pedestrian rate heart motion-compensated (such recognition, and voice as triggering algorithms data processing of sensor charge in is processor one. FFE The to use it necessary is it case OS, and the tasks, hosts in processing purpose (single-cycle of general- MAC) charge in processor. core is VLIW ARM The DSP-like proprietary (FFE),QuickLogic is a flexiblefusionengine which Cortex-M4F, memory,of SRAM up FPU and to 512 kB an including a and FPGA fabric. an and blocks hardware 3.15.Figure of specialized It aset of amulticore consists including processor shown in is Its basic or sensors. light architecture inertial, or environmental, microphones, mobile devices, in high-performance as such range of sensors (QuickLogic 2010). 
SoC (QuickLogic 2015) and Customer-Specific Standard Product (CSSP) ing twodesignplatformsinthisarea, namely, EOSS3SensorProcessing 86 it is notified and takes overthetakes and process. it notified is host attention, of context requires change when the Only time. real context in user’s in changes to hardware detect necessary includeprocessing. They the microwatts) theof range moreof tens efficient,(in and power-consuming less faster, in resulting tasks, management sensor from ahost processor relieving very rapidlyis at developing. aimed systems hubs coprocessing are Sensor gave hubs, to anew paradigm, sensor which rise This used. platforms are processing power traditional if ahigh implies ment consumption of sensors solutionsmanage tackle specific it.to Real-time requires which concern, tal of mobile devices, afundamen powerbecomes batteries from consumption or DSP-based microcontroller- by traditional systems. However, case the in very complex. usually are algorithms processing necessary the because data acquisition, computational storage ahigh well power, as as analysis, and involving process always-on context aware decision-making and monitoring apps to work corresponding the properly, to have place in it an necessary is to to respond order voice conditions, In mental ability for or the commands. 
state (e.g., user as location, environ or sleeping, running), walking, sitting, knowledge on the based offered of be data such can that services many the Data transfer among processors is carried out multiple-packet using carried FIFOS is among processors Data transfer A third processor, the Sensor Manager, is in charge of initializing, calibrat of initializing, Manager, processor, charge in Sensor is the A third ARM processors, two an executed in are tasks Control processing and platform SoC intended to support processing awide asensor is EOS S3 QuickLogic specifically focuses on sensor hubs for mobile devices, offer performed easily be thatcan tasks are these At sight, first may onethink FPGAs: Fundamentals, Advanced Features, and Applications and Features, Advanced Fundamentals, FPGAs: 2 C serial interfaces. Analog inputs connected to inputs connected interfaces. Analog C serial ------Downloaded By: 10.3.98.104 At: 00:06 02 Oct 2021; For: 9781315162133, chapter3, 10.1201/9781315162133-4 for direct connection of integrated interchip sound (I sound of integrated interchip connection for direct voice always-listening supporting applications. include interfaces These low-speed peripherals. ing analog 12-bit available sigma-delta ADCs are or for monitoring battery connect Architectures in FPGA Processors Embedded EOS S3 block diagram. block S3 EOS FIGURE 3.15 blocks such as RAM, FIFO, RAM, as such blocks most (in complex and the devices) SPI and device families. ArcticLink and QuickLogic’s platforms hardware are supporting PolarPro The domains. fast implementation target the applicationing of specific new the products in of portfolio alarge (mostlyplatforms and parameterizable) blocks, IP allow devices, but ofhardware approach, configurable adesign use on the based of mobile afamily devices. in CSSP not is actually visualization and tivity hubs,of sensor support applications but other also it related to can connec be to added. 
accelerated,functionalities be user-defined and to processor FFE or the ARM either the executed in extended, algorithms the savings. out,energy carried providing significant are tasks nition case, voice voice, the is microphone actually recog is the when only this and from coming sound the block capable if is inputs.sound This of identifying low-level voice from ofdetector technology, commands detecting charge in CODECs), accelerator ahardware and on Sensory’s based low power sound voice for without need using the on-chip recognition PCM for high-accuracy (PCM) converter (which converts output the of low-cost PDM microphones to ­modulation (PDM) microphones, modulation PDM a hardware to pulse-code Given the importance of audioGiven mobile devices, importance in the includes EOS S3 resources PolarPro is a family of simple devices with a few specialized hardware hardware PolarPro of simple afew afamily specialized devices with is CSSPThe platform of predecessor implementation for EOS S3 was the the Finally, to be processor FFE of FPGAthe the fabric features allows the

S Mobile de A

ound sensors pr pplication oc

esso vi r ce UA Fi PDM AD lo RT Os SP I xe 2 gi RT S c. C I C d c Cor SR FIFOs PDM FP DM PC AR to AM EOS S3 GA fabric tex-M4 M

A M manager Low S ensor dete sound Flexible fu eng po ct sion wer or ine

I2C/SPI I2C 2 Mobile de Gyroscop He Pr H S) and pulse-density S) pulse-density and umidity ox ar Mag Acce Ambient ligh imity Te t rate mp netomete leromete e vi eratur ce s Pressure Ge Ga UV ensor stur e t r s r e s 87 - - - - Downloaded By: 10.3.98.104 At: 00:06 02 Oct 2021; For: 9781315162133, chapter3, 10.1201/9781315162133-4 ent Windows Android, OSs, and as such Linux, Mobile. CSSP and EOS S3 have drivers available differ devices to with integrate the blocks, or security-related Finally, among others. processing, image both Proven System include data storage, Blocks. These network connection, available blocks CSSPIP soft the platform, fordevices supporting the called of portfolio alarge blocks, hardware is there addition specialized In to their hub: sensor and connectivity, visualization, application of CSSP:shows main domains possible solutions three for the target application.able device specific each depend on the 3.16 in Figure avail blocks of number functional and displays. the types The in sumption or reduce to blocks improve devices, conS3 processing and visualization available EOS to those in processors, similar manager 2.4.5) sensor and FFE Section mentioned interfaces in communication (in addition serial to the 88 I 2 C interfaces. ArcticLink is a family of specific-purpose FPGAs that includesthatFPGAs of specific-purpose afamily is C interfaces. ArcticLink • • • • tion to the host applications to the tion processor. available devices) EOS S3 to those in aSPI and interface for connec (similar processors Manager devices, Sensor include and FFE which 3S2 hubCSSP applications ArcticLink supports sensor through consumption).overall systems, of displays the responsible for these are 30%–60% in that savings energy (it shouldbe significant noted achieving brightness, data providedDPO statistical by VEE HD+ HD+ to uses adjust conditions. 
lighting different under perception image improving range, contrast, to optimized, color be images and in saturation reduce and power battery VEE HD+ allows dynamic consumption. oriented to improve devices are visualization image ArcticLink in HD+)Display (DPO Definition HD+) High and PowerOptimizer (VEE Engine Enhancement Visual Definition High blocks hard The output. MIPI and input LVDS and output, MIPI input RGB and output, or RGB input MIPI includes input LVDS devices with and family output, RGB (namely, MIPI, RGB, LVDS). and VX5 III ArcticLink the For instance, most widely display the bridges interfaces as bus used between ing serv blocks hardware include specialized family ArcticLink the interfaces. To some devices from interface interconnection, ease CPU bus display main of and lack between compatibility the is problems mobile devices in visualization most of typical the One offer asuitableor applications. ArcticLink) support to these (e.g., interfaces communication serial hard FPGAs with PolarPro 3E keyboards, headphone as devices such nal jacks, or even computers. exter and resources internal both with host of processor the nection con intended to facilitate the applications those are Connectivity FPGAs: Fundamentals, Advanced Features, and Applications and Features, Advanced Fundamentals, FPGAs: ------Downloaded By: 10.3.98.104 At: 00:06 02 Oct 2021; For: 9781315162133, chapter3, 10.1201/9781315162133-4 Embedded Processors in FPGA Architectures in FPGA Processors Embedded (a) Connectivity solution. (b) Visualization solution. (c) hub solution. solution. Sensor (b) solution. Visualization (a) Connectivity FIGURE 3.16 (b

Mobile Application ) (a) processor Mobile de PC SDI (c) devic I, displ pr Sy Communication O/SPI oc stem Mobile FPGA f Fix esso Application e manager processor vi ay ed logic ce , r ArcticLink 3S2 devic abric Incoming RG Fix I PLL 2 C ed logic e Pr interfaces interface Storage and high-sp pe V I/O blo oc SD/SDIO/ C fusion engine ide ripheral MMC/ SPI essor E-A Flexible manager o Sensor B TA eed FP Logic blocks ck FPGA f ArcticLink II Fi ArcticLink GA fabric xe s s d lo VEE abric gi 2 bus controller Mobile de frame buffer

I C c R I/O bloc OT interfaces interfaces Network USB 2. egisters Storage RAM de USB G PH vi ce Gyr Mobile Heart Pr Humidity 0 DPO vi ks Y o

ce oscope ximit Magnet Acceler Ambient ligh Tem rate device sensor y per Mobile de omete fl omete ID PC Pr atur Gestur as USB hub, SDI SD E, essure Ga h, AT UV I, UA Mobile Application NAND e O/SPI t Blueto processor r Ca s r Senso Displa e API, rd RT s vi

device ce , ot
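The sensor hub principle described in Section 3.4.1 (the hub processes every sensor sample and notifies the host only when the user's context changes) can be illustrated with a minimal Python sketch. It is not QuickLogic's implementation: the `classify` thresholds, the `SensorHub` class, and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


def classify(accel_magnitude_g: float) -> str:
    """Toy context classifier; the thresholds are illustrative assumptions."""
    if accel_magnitude_g < 1.1:
        return "sitting"
    if accel_magnitude_g < 1.8:
        return "walking"
    return "running"


@dataclass
class SensorHub:
    """Hypothetical always-on hub: it sees every sample, the host sees only changes."""
    notify_host: Callable[[str], None]
    context: str = "sitting"
    samples_processed: int = 0

    def on_sample(self, accel_magnitude_g: float) -> None:
        self.samples_processed += 1
        new_context = classify(accel_magnitude_g)
        if new_context != self.context:      # change detector
            self.context = new_context
            self.notify_host(new_context)    # the only point where the host wakes up


host_wakeups: List[str] = []
hub = SensorHub(notify_host=host_wakeups.append)
for sample in [1.0, 1.0, 1.05, 1.4, 1.5, 1.45, 2.2, 2.3, 1.0]:
    hub.on_sample(sample)

print(hub.samples_processed, host_wakeups)
```

In this run the hub handles nine samples while the host is woken only three times, which is the kind of saving the text attributes to offloading sensor management from the host processor.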

rs y h 89 Downloaded By: 10.3.98.104 At: 00:06 02 Oct 2021; For: 9781315162133, chapter3, 10.1201/9781315162133-4 Customizable processors. Customizable FIGURE 3.17 90 hardware, versa. vice and some new require will anew most instruction set, likely because instruction to extend the ability to the strongly linked option latter is components. This ers, or MMUs), or user-defined of adding new registers possibility or the pliers, dividers, FPUs, DMA, GPIO, MAC controller, interrupt units, tim of predefined adding or components removing possibility as (such multi interface, etc.), communications external structure, the register buses, nal data memory and controllers, of number bits core of (instruction the inter for example, to acceleratenew instructions, functions. critical by set creating application, target to the instruction tics extend well the as as theto adaptprocessor of characteris sometheresources of its configure can set. Users giveninstruction and a abasic created from corebe configuration 2015). 2014; 3.17) (Figure processors solution such One customizable tions. are (Cadence the fordevelopment specific applica flexibility targeting SoCs of a certain non-FPGA-based also are solutions configurable designers There offering Processors Customizable 3.4.2 Resource configuration includes the parameterization of some features of someof features includesthe parameterization configuration Resource or multicore to single- Customizable processors custom allow processors

Pr Processor Exception controller Interrupt edefined handling controls R registers Timer egiste rs s FPGAs: Fundamentals, Advanced Features, and Applications and Features, Advanced Fundamentals, FPGAs: fe atur es Base register Instruction Functional execution Base AL Base ISA decoder pipeline units file User-defined features U Instruction controller controller memory memory Data Configur Cache Cache RO RA RO RA M M M M External interface able fe atur
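The link between instruction-set extension and user-defined hardware noted in Section 3.4.2 can be pictured with a small configuration model. This is only a sketch under stated assumptions: `CoreConfig`, the base instruction names, and the `crc32_step`/`crc_unit` pair are hypothetical and do not correspond to any vendor tool or format.

```python
from dataclasses import dataclass, field
from typing import Set

# Predefined optional components named in the text (identifiers are illustrative).
PREDEFINED = {"multiplier", "divider", "fpu", "dma", "gpio", "mac",
              "interrupt_controller", "timer", "mmu"}


@dataclass
class CoreConfig:
    """Hypothetical description of a customizable core."""
    bus_width_bits: int = 32
    components: Set[str] = field(default_factory=set)
    user_hardware: Set[str] = field(default_factory=set)
    instructions: Set[str] = field(
        default_factory=lambda: {"add", "sub", "load", "store"})  # assumed base ISA

    def add_component(self, name: str) -> None:
        if name not in PREDEFINED:
            raise ValueError(f"{name!r} is not a predefined component")
        self.components.add(name)

    def add_instruction(self, mnemonic: str, needs_hardware: str) -> None:
        # A new instruction brings its own user-defined hardware with it.
        self.instructions.add(mnemonic)
        self.user_hardware.add(needs_hardware)


cfg = CoreConfig(bus_width_bits=64)
cfg.add_component("fpu")
cfg.add_instruction("crc32_step", needs_hardware="crc_unit")
print(sorted(cfg.instructions), sorted(cfg.user_hardware))
```

Keeping instructions and user hardware in the same object mirrors the point made in the text: the two are extended together rather than independently.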

3.5 On-Chip Buses

One key factor to successfully develop embedded systems is to achieve an efficient communication between processors and their peripherals. Therefore, one of the major challenges of SoC technology is the design of the on-chip communication infrastructure, that is, the communication buses ensuring fast and secure exchange of information (either data or control signals), addressing issues such as speed, throughput, and latency. At the same time, it is very important (particularly when dealing with FPSoC platforms) that such functionality is available in the form of reusable IP cores, allowing both design costs and time to market to be reduced. Unfortunately, some IP core designers use different communication protocols and interfaces (even proprietary ones), complicating their integration and reuse, because of compatibility problems. In such cases, it is necessary to add glue logic to the designs. This creates problems related to degraded performance of the IP core and, in turn, of the whole SoC. To address these issues, over the years, some leading companies in the SoC market have proposed different on-chip bus architecture standards. The most popular ones are listed here:

• Advanced Microcontroller Bus Architecture (AMBA) from ARM (open standard)
• Avalon from Altera (open-source standard)
• CoreConnect from IBM (licensed, but available at no licensing or royalty cost for chip designers and core IP and tool developers)
• CoreFrame from PalmChip (licensed)
• Silicon Backplane from Sonics (licensed)
• STBus from STMicroelectronics (licensed)
• WishBone from OpenCores (open-source standard)

Most of these buses originated in association with certain processor architectures, for instance, AMBA (ARM processors), CoreConnect (PowerPC), or Avalon (Nios-II). Integration of a standard bus with its associated processor(s) is quite straightforward, resulting in modular systems with optimized and predictable behavior. Due to this, there is a trend, not only in the case of chip vendors but also of third-party IP companies, toward the use of technology-independent standard buses in library components, which ease design integration and verification.

In the FPGA market, AMBA has become the de facto dominating connectivity standard in industry for IP-based design because the leading vendors (Xilinx, Altera, Microsemi, QuickLogic) are clearly opting for embedding ARM processors (either Cortex-A or Cortex-M) within their chips. Other buses widely used in FPSoCs are Avalon and CoreConnect because of their association with the Nios-II and MicroBlaze soft processors, respectively. Wishbone is also used in some Lattice and OpenCores processors. These four buses are analyzed in detail in Sections 3.5.1 through 3.5.4.

3.5.1 AMBA

AMBA originated as the communication bus for ARM processor cores. It consists of a set of protocols included in five different specifications. The most widely used protocols in FPSoCs are Advanced eXtensible Interface (AXI3, AXI4, AXI4-Lite, and AXI4-Stream) and Advanced High-performance Bus (AHB). Therefore, these are the ones analyzed in detail here, but at the end of the section, a table is included to provide a more general view of AMBA.

3.5.1.1 AHB

AMBA 2 specification, published in 1999, introduced AHB and Advanced Peripheral Bus (APB) protocols (ARM 1999). AMBA 2 uses by default a hierarchical bus architecture with at least one system bus (main, AHB) and secondary buses (peripheral, APB) connected to it through bridges. The performance and bandwidth of the system bus ensure the proper interconnection of high-performance, high clock frequency modules such as processors, on-chip memories, and DMA devices. Secondary buses are optimized to connect low-power or low-bandwidth peripherals, their complexity being, as a consequence, also low. Usually these peripherals use memory-mapped registers and are accessed under programmed control.

The structure of a SoC based on this specification is shown in Figure 3.18. The processor and high-bandwidth peripherals are interconnected through an AHB bus, whereas low-bandwidth peripherals are interconnected through an APB bus. The connection between these two buses is made through a bridge that translates AHB transfer commands into APB format and buffers all address, data, and control signals between both buses to accommodate their (usually different) operating frequencies. This structure allows the effect of slow modules in the communications of fast ones to be limited.

FIGURE 3.18 SoC based on AHB and APB protocols.

In order to fulfill the requirements of high-bandwidth modules, AHB supports pipelined operation, burst transfers, and split transactions, with a configurable data bus width up to 128 bits. As shown in Figure 3.19, it has a master–slave structure with arbiter, based on multiplexed interconnections and four basic blocks: AHB master, AHB slave, AHB arbiter, and AHB decoder.

FIGURE 3.19 AHB bus structure according to AMBA 2 specification.

AHB masters are the only blocks that can launch a read or write operation, by generating the address to be accessed, the data to be transferred (in the case of write operations), and the required control signals. In an AHB bus, there may be more than one master (multimaster architecture), but only one of them can take over the bus at a time.

AHB slaves react to read or write requests and notify the master if the transfer was successfully completed, if there was an error in it, or if it could not be completed so that the master has to retry (e.g., in the case of split transactions).

The AHB arbiter is responsible for ensuring that only one master takes over the bus (i.e., starts a data transfer) at a time. Therefore, it defines the bus access hierarchy, by means of a fixed arbitration protocol.

Finally, the AHB decoder is used for address decoding, generating the right slave selection signals. In an AHB bus, there is only one arbiter and one decoder.

Operation is as follows: All masters willing to start a transfer generate the corresponding address and control signals. The arbiter then decides which master signals are to be sent to all slaves through the corresponding MUXs, while the decoder selects the slave actually involved in the transfer through another MUX. In case there is an APB bus, it acts as a slave through the corresponding bridge, which provides a second level of decoding for the APB slaves.

In FPSoCs using AHB, the processor is a master; the DMA controller is usually a master too. On-chip memories, external memory controllers, and APB bridges are usually AHB slaves. Although any peripheral can be connected as an AHB slave, if there is an APB bus, slow peripherals would be connected to it.

3.5.1.2 Multilayer AHB

AMBA 3 specification (ARM 2004a), published in 2003, introduces the multilayer AHB interconnection scheme, based on an interconnection matrix that allows multiple parallel connections between masters and slaves to be established. This provides increased flexibility, higher bandwidth, the possibility of associating the same slave to several masters, and reduced complexity, because arbitration tasks are limited to the cases when several masters want to access the same slave at the same time.

The simplest multilayer AHB structure is shown in Figure 3.20, where each master has its own AHB layer (i.e., there is only one master per layer). The decoder associated with each layer determines the slave involved in the transfer. If two masters request access to the same slave at the same time, the arbiter associated with the slave decides which master has the higher priority. The input stages of the interconnection matrix (one per layer) store the addresses and control signals corresponding to the pending transfers so that they can be carried out later.

FIGURE 3.20 Multilayer interconnect topology.

The number of input and output ports of the interconnection matrix can be adapted to the requirements of different applications. In this way, it is possible to build structures more complex than the one in Figure 3.20. For instance, it is possible to have several masters in the same layer, define local slaves (connected to just one layer), or group a set of slaves so that the interconnection matrix treats them as a single one. This is useful, for instance, to combine low-bandwidth slaves.

Slave Slave Slave 2 3 1 interconnect matrix stage2 stage1 Input Input Multilayer Decoder Decoder Arbiter Arbiter Arbiter - - - - Downloaded By: 10.3.98.104 At: 00:06 02 Oct 2021; For: 9781315162133, chapter3, 10.1201/9781315162133-4 ARM Cortex-M3 core and peripherals in SmartFusion2 devices. SmartFusion2 in peripherals and core Cortex-M3 ARM FIGURE 3.21 namely, AXI3, AXI4, AXI4 and are AXI3 AXI4-Lite, Both AXI4-Stream. and implementation eitherSoC ASICs in or FPGAs. suitable for result, provides architecture interface AXI and acommunication vendorsdesigners—FPGA among them) cooperate its development. in a As (including OEM, companies some 35 that leading is success EDA, chip and tiple clock domains). (i.e., systems peripherals, for well as as multifrequency mul with systems provides high-frequency avery efficientwith solution for communicating AXI4 (ARM AMBA in 2011). in 4, resulting additional functionalities AXI or, more precisely, (ARM AXI3 was provided 2004b). with architecture The namely, AMBAarchitecture, a new in 3specification introduced AXI ARM AXI 3.5.1.3 (APB_0 APB_1). and multilayer. is matrix bus AHB The bridge to AHB AHB APB two bridges and an slaves, through connected slaves (MM), masters (MS), 7direct of number secondary alarge and 10 Cortex-M3 in ARM includes an of peripherals a organized set core and 2013). (Microsemi family SoC 3.21, Figure SmartFusion2 shown in As it low-bandwidth slaves.combine to for instance, useful, is one. a single as This them treats matrix nection (connected to just one layer), of or group slaves aset intercon the that so possible layer, to same have the in several masters it is slaves define local Architectures in FPGA Processors Embedded AMBA AMBAthe 3and different versions 4 defineprotocol, of four busing. Aproof for ade of on-chip facto its standard currently is AXI AHB/APB uses example that An of FPSoC Microsemi the is buses ARM Cortex-M3 subsystem (MSS) Microcontroller FIC_ FIC_ FIC_

very robust, high-performance, memory-mapped solutions.* AXI4-Lite is a very reduced version of AXI4, intended to support access to control registers and low-performance peripherals. AXI4-Stream is intended to support high-speed streaming applications, where data access does not require addressing.

* Memory-mapped protocols refer to those where each data transfer accesses a certain address within a memory space (map).

As shown in Figure 3.22, AXI architecture is conceptually similar to that of AHB, in that both use master–slave configurations, where data transfers are launched by masters and there are interconnect components to connect masters to slaves. The main difference is that AXI uses a point-to-point channel architecture, where address and control signals, read data, and write data use independent channels. This allows simultaneous, bidirectional data transfers between a master and a slave to be carried out, using handshake signals. A direct implication of this feature is that it eases the implementation of low-cost DMA systems.

FIGURE 3.22 Architecture of the AXI protocol. [Diagram: masters 1 and 2 and slaves 1 to 3, each attached through an interface to the interconnect component.]

The AXI connection defines a single interface either to connect a master or a slave to the interconnect component or to directly connect a master to a slave. This interface has five different channels: read address channel, read data channel, write address channel, write data channel, and write response channel. Figure 3.23 shows read and write transactions in AXI.

Address and control information is sent through either the read or write address channels. In read operations, the slave sends the master both data and a read response through the read data channel. The read response notifies the master that the read operation has been completed. The protocol includes an overlapping read burst feature, so the master may send a new read address before the slave has completed the current transaction. In this way, the slave can start preparing data for the new transaction while completing the current one, thus speeding up the read process. In write operations, the master sends data through the write data channel, and the slave replies with a completion signal through the write response channel. Write data are buffered, so the master can start a new transfer before the slave notifies the completion of the current one. Read and write data bus widths are configurable from 8 to 1024 bits. All data transfers in AXI (except AXI4-Lite) are based on variable-length bursts, up to 16 transfers in AXI3 and up to 256 in AXI4. Only the starting address of the burst needs to be provided to start the transfer.

FIGURE 3.23 Read (a) and write (b) transactions in AXI protocol.

The interconnect component in Figure 3.22 is more versatile than the interconnection matrix in AHB. It is a component with more than one AMBA interface, in charge of connecting one or more masters to one or more slaves. In addition, it allows a set of masters or slaves to be grouped together, so they are seen as a single master or slave.

In order to adapt the balance between performance and complexity to different application requirements, the interconnect component can be configured in several modes. The most usual ones are shared address and data buses, shared address buses and multiple data buses, and multilayer, with multiple address and data buses. For instance, in systems requiring much higher bandwidth for data than for addresses, it is possible to share the address bus among different interfaces while having an independent data bus for each interface. In this way, data can be transferred in parallel at the same time as address channels are simplified.
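The burst mechanism described above (only the starting address is issued, and the address of every other beat follows from it) can be illustrated with a minimal Python sketch. The function name, the `MAX_BEATS` table, and the restriction to aligned, incrementing bursts are assumptions made for illustration; only the burst-length limits and the 8- to 1024-bit data widths are taken from the text.

```python
# Illustrative model only; names and structure are hypothetical.
MAX_BEATS = {"AXI3": 16, "AXI4": 256, "AXI4-Lite": 1}  # limits quoted in the text


def burst_addresses(start, data_bits, beats, protocol="AXI4"):
    """Return the byte address of every beat of an incrementing burst."""
    if data_bits not in (8, 16, 32, 64, 128, 256, 512, 1024):
        raise ValueError("data width must be a power of two from 8 to 1024 bits")
    if not 1 <= beats <= MAX_BEATS[protocol]:
        raise ValueError(f"{protocol} allows at most {MAX_BEATS[protocol]} beats per burst")
    beat_bytes = data_bits // 8
    if start % beat_bytes:
        raise ValueError("this sketch assumes a start address aligned to the beat size")
    # The master issues only `start`; the remaining addresses are derived from it.
    return [start + i * beat_bytes for i in range(beats)]


if __name__ == "__main__":
    print([hex(a) for a in burst_addresses(0x1000, 32, 4)])
```

Requesting 17 beats with `protocol="AXI3"` raises an error in this model, whereas the same request is accepted for AXI4, which mirrors the difference in maximum burst length between the two versions.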

Other interesting features of AXI are as follows:

• It supports pipeline stages (register slices in ARM's terminology) in all channels, so different throughput/latency trade-offs can be achieved depending on the number of stages. This is feasible because all channels are independent of each other and send information in only one direction.
• Each master–slave pair can operate at a different frequency, thus simplifying the implementation of multifrequency systems.
• It supports out-of-order transaction completion. For instance, if a master starts a transaction with a slow peripheral and later another one with a fast peripheral, it does not need to wait for the former to be completed before attending the latter (unless completing the transactions in a given order is a requirement of the application). In this way, the negative influence of dead times caused by slow peripherals is reduced. Complex peripherals can also take advantage of this feature to send their data out of order (some complex peripherals may generate different data with different latencies). Out-of-order transactions are supported in AXI by ID tags. The master assigns the same ID tag to all transactions that need to be completed in order and different ID tags to those not requiring a given order of completion.

We are here just intending to highlight some of the most significant features of AXI, but it is really a complex protocol because of its versatility and high degree of configurability. It includes many other features, such as unaligned data transfers, data upsizing and downsizing, different burst types, system cache, privileged and secure accesses, semaphore-type operations to enable exclusive accesses, and error support.

Today, the vast majority of FPSoC vendors use this type of interface, and include a large variety of IP blocks based on it, which can be easily connected to create highly modular systems. In most cases, when including AXI-based IPs in a design, the interconnect logic is automatically generated and the designer usually just needs to define some configuration parameters.

The most important conclusion that can be extracted from the use of this solution is that it enables software developers to implement SoCs without the need for deep knowledge of FPGA technology, but mainly concentrating on programming tasks.

As an example, Xilinx adopted AXI as a communication interface for the IP cores in its FPGA families Spartan-6, Virtex-6, UltraScale, 7 series, and Zynq-7000 All Programmable SoC (Sundaramoorthy et al. 2010; Singh and Dao 2013; Xilinx 2015a). The Xilinx portfolio of AXI-compliant IP cores includes a large number of peripherals widely used in SoC design, such as processors, timers, UARTs, memory controllers, Ethernet controllers, video controllers, and PCIe. In addition, a set of resources known as Infrastructure IP are also available to help in assembling the whole FPSoC. They provide features such as routing, transforming, and data checking.
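The ID-tag rule for out-of-order completion (same tag means in-order completion, different tags may complete in any order) can be sketched with a small Python model. This is a simplified illustration, not the AXI signaling itself: the function name, the one-issue-per-cycle assumption, and the fixed latencies are all hypothetical.

```python
def completion_order(transactions):
    """Return transaction names in completion order.

    `transactions` is a list of (name, id_tag, latency) tuples, assumed to be
    issued one per clock cycle in list order.
    """
    last_done = {}  # completion cycle of the latest transaction seen per ID tag
    finished = []
    for issue_cycle, (name, id_tag, latency) in enumerate(transactions):
        done = issue_cycle + latency
        # Transactions sharing an ID tag may not overtake each other.
        done = max(done, last_done.get(id_tag, 0))
        last_done[id_tag] = done
        finished.append((done, issue_cycle, name))
    # Ties are resolved in issue order.
    return [name for _, _, name in sorted(finished)]


if __name__ == "__main__":
    # Different ID tags: the fast access overtakes the slow one.
    print(completion_order([("slow", 0, 10), ("fast", 1, 2)]))
    # Same ID tag: the fast access must wait for the slow one.
    print(completion_order([("slow", 0, 10), ("fast", 0, 2)]))
```

With different tags the fast transaction finishes first; with a shared tag the issue order is preserved, which is the behavior a master selects when the application requires ordered completion.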
Examples of such Infrastructure IP blocks are as follows:

• AXI Interconnect IP, to connect memory-mapped masters and slaves. It performs the tasks associated with the interconnect component by combining a set of IP cores (Figure 3.24). As commented earlier, AXI does not define the structure of the interconnect component, but it can be configured in multiple ways. The AXI Interconnect IP core supports the use models shown in Figure 3.25, which highlights the versatility and power of AXI for the implementation of FPSoCs.
• AXI Crossbar, to connect AXI memory-mapped peripherals.
• AXI Data Width Converter, to resize the datapath when master and slave use different data widths.
• AXI Clock Converter, to connect masters and slaves operating in different clock domains.
• AXI Protocol Converter, to connect an AXI3, AXI4, or AXI4-Lite master to a slave that uses a different protocol (e.g., AXI4 to AXI4-Lite or AXI4 to AXI3).
• AXI Data FIFO, to connect a master to a slave through FIFO buffers (it affects read and write channels).
• AXI Register Slice, to connect a master to a slave through a set of pipeline stages. In most cases, this is intended to reduce critical path delay.
• AXI Performance Monitors and Protocol Checkers, to test and debug AXI transactions.

FIGURE 3.24 Block diagram of Xilinx's AXI Interconnect IP core. [Diagram: masters 1 and 2 attach to the slave interface and slaves 1 to 3 to the master interface; each side includes register slices, clock converter, data width converter, and data FIFO blocks around a crossbar matrix, with a protocol converter on the master interface side.]

FIGURE 3.25 Xilinx's AXI Interconnect IP core use models. [Panels: conversion (data width conversion, clock rate conversion, AXI4-Lite slave adaptation, AXI3 slave adaptation, pipelining through register slices or data channel FIFO); N-to-1 interconnect, where multiple master devices arbitrate for access to a single slave device; 1-to-N interconnect, where a single master device accesses multiple slave devices; N-to-M interconnect with Shared-Address, Multiple-Data (SAMD) topology, with single-threaded write and read address arbitration and sparse data crossbar connectivity, so data transfers can occur independently and concurrently.]

In order for readers to have easy access to the most significant information regarding the different variations of AMBA, their main features are summarized in Table 3.2.

TABLE 3.2 Specifications and Protocols of the AMBA Communication Bus
Columns: Year; Spec.; Protocol; Aim and Features

1999; AMBA 2; AHB
  Supports high-bandwidth system modules. Main system bus in microcontroller usage. Some features are
  • 32-bit address width and 8- to 128-bit data width
  • Single shared address bus and separate read and write data buses
  • Default hierarchical bus topology support
  • Supports multiple bus masters
  • Burst transfers
  • Split transactions
  • Pipelined operation (fixed pipeline between address/control and data phases)
  • Single-cycle bus master handover
  • Single-clock edge operation
  • Non-tri-state implementation
  • Single frequency system

1999; AMBA 2; APB
  Simple, low-power interface to support low-bandwidth peripherals. Some features are
  • Local secondary bus encapsulated as a single AHB slave device
  • 32-bit address width and 32-bit data width
  • Simple interface
  • Latched address and control
  • Minimal gate count for peripherals
  • Burst transfers not supported
  • Unpipelined
  • All signal transitions are only related to the rising edge of the clock

1999; AMBA 2; ASB
  Obsolete

2003; AMBA 3; AXI (AXI3)
  Intended for high-performance memory-mapped requirements. Key features:
  • 32-bit address width and 8- to 1024-bit data width
  • Five separate channels: read address, write address, read data, write data, and write response
  • Default bus matrix topology support
  • Simultaneous read and write transactions
  • Support for unaligned data transfers using byte strobes
  • Burst-based transactions with only start address issued
  • Fixed-burst mode for memory-mapped I/O peripherals
  • Ability to issue multiple outstanding addresses
  • Out-of-order transaction completion
  • Pipelined interconnect for high-speed operation
  • Register slices can be applied across any channel

2003; AMBA 3; AHB-Lite
  The main differences with regard to AHB are that it does not support multiple bus masters and extends data width up to 1024 bits

2003; AMBA 3; APB
  Includes two new features with regard to AMBA 2 specification, namely, wait states and error reporting

2003; AMBA 3; ATB
  Advanced Trace Bus: adds a data diagnostic interface to the AMBA specification for debugging purposes

2011; AMBA 4; ACE
  AXI Coherency Extensions: extends the AXI4 protocol and provides support for hardware-coherent caches. Enables correctness to be maintained when sharing data across caches

2011; AMBA 4; ACE-Lite
  Small subset of ACE signals

2011; AMBA 4; AXI4
  The main difference with regard to AXI3 is that it allows up to 256 beats of data per burst instead of just 16. It supports Quality of Service signaling

2011; AMBA 4; AXI4-Lite
  A subset of AXI4 intended for simple, low-throughput memory-mapped communications. Key features:
  • Burst length of one for all transactions
  • 32- or 64-bit data bus
  • Exclusive accesses not supported

2011; AMBA 4; AXI4-Stream
  Intended for high-speed data streaming. Designed for unidirectional data transfers from master to slave, greatly reducing routing. Key features:
  • Supports single and multiple data streams using the same set of shared wires
  • Supports multiple data widths within the same interconnect

2011; AMBA 4; APB
  Includes two new functionalities with regard to AMBA 3 specification, namely, transaction protection and sparse data transfer

2013; AMBA 5; CHI
  Coherent Hub Interface: it defines the interconnection interface for fully coherent processors and dynamic memory controllers. Used in networks and servers

3.5.2 Avalon

Avalon is the solution provided by Altera to support FPSoC design based on the Nios-II soft processor. The original specification dates back to 2002, and a slightly modified version can be found in Altera (2003).

Avalon basically defines a master–slave structure with arbiter, which supports simultaneous data transfers among multiple master–slave pairs. When multiple masters want to access the same slave, the arbitration logic defines the access priority and generates the control signals required to ensure all requested transactions are eventually completed. Figure 3.26 shows the block diagram of a sample FPSoC including a set of peripherals connected through an Avalon Bus Module.
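The text states that the arbitration logic sets access priority and ensures that every requested transaction is eventually completed, but it does not say which policy is used. As a hedged illustration, the Python sketch below assumes a plain round-robin policy (an assumption, chosen because it guarantees that no requesting master waits forever); the class and method names are hypothetical.

```python
class RoundRobinArbiter:
    """Grants one slave to one of several requesting masters per cycle.

    Round robin is an assumed policy for illustration; the Avalon text only
    requires that all requested transactions are eventually completed.
    """

    def __init__(self, n_masters):
        self.n = n_masters
        self.last = n_masters - 1  # so that master 0 is considered first

    def grant(self, requests):
        """`requests` holds one bool per master; returns the granted index or None."""
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if requests[candidate]:
                self.last = candidate  # the winner gets lowest priority next time
                return candidate
        return None


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(3)
    # Masters 0 and 1 request continuously; they are served alternately.
    print([arbiter.grant([True, True, False]) for _ in range(4)])
```

Because the most recently granted master moves to the back of the queue, two masters competing for the same slave alternate instead of one starving the other.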
The Avalon Bus Module includes all address, data, and control signals, as well as arbitration logic, required to connect the peripherals and build up the FPSoC. Its functionality includes address decoding for peripheral selection, wait-state generation to accommodate slow peripherals that cannot provide responses within a single clock cycle, identification and prioritization of interrupts generated by slave peripherals, or dynamic bus sizing to allow peripherals with different data widths to be connected. The original Avalon specification supports 8-, 16-, and 32-bit data.

Avalon uses separate ports for address, data, and control signals. In this way, the design of the peripherals is simplified, because there is no need for decoding each bus cycle to distinguish addresses from data or to disable outputs.

Although it is mainly oriented to memory-mapped connections, where each master–slave pair exchanges a single datum per bus transfer, the original Avalon specification also includes streaming peripherals and latency-aware peripherals modes (included in the Avalon Bus Module), oriented to support high-bandwidth peripherals. The first one eases transactions between streaming master and streaming slave to perform successive data transfers, which is particularly interesting for DMA transfers. The second one allows bandwidth usage to be optimized when accessing synchronous peripherals that require an initial latency to generate the first datum, but after that are capable of generating a new one in each clock cycle (such as in the case of digital filters). In this mode, the master can execute a read request to the peripheral, then move to other tasks, and resume the read operation later.

FIGURE 3.26 Sample FPSoC based on Altera's Avalon bus. [Diagram: a Nios processor with instruction and data masters, connected through the Avalon bus to instruction memory, data memory, and peripherals 1 and 2, all acting as slaves.]
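Two of the Avalon Bus Module functions listed above, address decoding for peripheral selection and wait-state generation for slow peripherals, can be sketched as follows. The address map, the peripheral names, and the wait-state counts are hypothetical values chosen for illustration; they do not come from the Avalon specification.

```python
# Hypothetical address map: (slave name, base address, span in bytes).
ADDRESS_MAP = [
    ("instruction_memory", 0x0000_0000, 0x1_0000),
    ("data_memory",        0x0001_0000, 0x1_0000),
    ("peripheral_1",       0x0002_0000, 0x100),
    ("peripheral_2",       0x0002_0100, 0x100),
]

# Hypothetical wait states for slaves that cannot respond in one clock cycle.
WAIT_STATES = {"peripheral_1": 2}


def decode(address):
    """Select the slave that owns `address` and return (name, local offset)."""
    for name, base, span in ADDRESS_MAP:
        if base <= address < base + span:
            return name, address - base
    raise ValueError(f"no slave mapped at {address:#x}")


def transfer_cycles(address):
    """Clock cycles for one transfer: one cycle plus the slave's wait states."""
    name, _ = decode(address)
    return 1 + WAIT_STATES.get(name, 0)


if __name__ == "__main__":
    print(decode(0x0002_0104), transfer_cycles(0x0002_0000))
```

Keeping the decoding and wait-state logic in the bus module, as the text describes, is what lets each peripheral expose a simple interface with separate address, data, and control ports.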

As the demand for higher bandwidth and throughput was growing in many application domains, Avalon and the Nios-II architecture evolved to cope with it. The current Avalon specification (Altera 2015a) defines seven different interfaces:

1. Avalon Memory Mapped Interface (Avalon-MM), oriented to the connection of memory-mapped master–slave peripherals. It provides different operation modes supporting both simple peripherals requiring a fixed number of bus cycles to perform read or write transfers and much more complex ones, for example, with pipelining or burst capabilities. With regard to the original specification, maximum data width increases from 32 to 1024 bits. Like AMBA and many other memory-mapped buses, Avalon provides generic control and handshake signals to indicate the direction (read or write), start, end, successful completion, or error of each data transfer. Examples of such signals are "read," "write," or "response" in Figure 3.27. There are also specific signals required in advanced modes, such as arbitration signals in multimaster systems, wait signals to notify the master that the slave cannot provide an immediate response to the request ("wait_request" in Figure 3.27), data valid signals (typical in pipelined peripherals to notify the master that there are valid data in the data bus, "read_data_valid" in Figure 3.27), or control signals for burst transfers.
2. Avalon Streaming Interface (Avalon-ST, Figure 3.28), oriented to peripherals performing high-bandwidth, low-latency, unidirectional point-to-point transfers. The simplest version supports a single stream of data, which only requires the signals "data" and "valid" to be used and, optionally, "channel" and "error." The sink interface samples data only if "valid" is active (i.e., there are valid data in "data"). The signal "channel" indicates the number of the channel, and "error" is a bit mask stating the error conditions considered in the data transfer (e.g., bit 0 and bit 1 may flag CRC and overflow errors, respectively). Avalon-ST also allows interfaces supporting backpressure to be implemented. In this case, the source interface can only send data to the sink when this is ready to accept them (the signal "ready" is active). This is a usual technique to prevent data loss, for example, when the FIFO at the sink is full. Finally, Avalon-ST supports burst and packet transfers. In packet-based transfers, "startofpacket" and "endofpacket" identify the first and last valid bus cycles of the packet. The signal "empty" identifies empty symbols in the packet, in the case of variable-length packets.
3. Avalon Conduit Interface, which allows data transfer signals (input, output, or bidirectional) to be created when they do not fit in any of the other types of Avalon interface. These are mainly used to design interfaces with external (off-chip) devices. Several conduits can be connected if they use the same type of signals, of the same width, and within the same clock domain.
4. Avalon Tri-State Conduit Interface (Avalon-TC), oriented to the design of controllers for external devices sharing resources such as address or data buses, or control signals in the terminals of the FPGA chip. Signal multiplexing is widely used to access multiple external devices minimizing the number of terminals required. In this case, the access to the shared terminals is based on tri-state signals. Avalon-TC includes all control and arbitration logic to identify multiplexed signals and give bus control to the right peripheral at any moment.
5. Avalon Interrupt Interface, which is in charge of managing interrupts generated by interrupt senders (slave peripherals) and notifying them to the corresponding interrupt receivers (masters).
6. Avalon Reset Interface, which resets the internal logic of an interface or peripheral, forcing it to a user-defined safe state.
7. Avalon Clock Interface, which defines the clock signal(s) used by a peripheral. A peripheral may have clock input (clock sink), output (clock source), or both (for instance, in the case of PLLs). All other synchronous interfaces a peripheral may use (MM, ST, Conduit, TC, Interrupt, or Reset) are associated with a clock source acting as synchronization reference.

FIGURE 3.27 Typical read and write transfers of the Avalon-MM interface. [Timing diagram of the signals clk, address, read, write, wait_request, read_data, read_data_valid, write_data, and response.]

FIGURE 3.28 Avalon-ST interface signals. [Diagram: channel, startofpacket, endofpacket, empty, error, valid, and data run from source to sink; ready runs from sink to source.]

An FPSoC based on the Nios-II processor and Avalon may include multiple different interfaces or multiple instances of the same interface. Actually, a single component within the FPSoC may use any number and type of interfaces, as shown in Figure 3.29.

FIGURE 3.29 Sample FPSoC using different Altera Avalon interfaces. [Legend: M, master and S, slave (Avalon-MM); Cn, Avalon Conduit Interface; TCM and TCS, Avalon-TC master and slave; Csrc, clock source and Csnk, clock sink (Avalon Clock Interface). The diagram includes a Nios-II processor, user logic blocks, flash and SRAM controllers, DMA, PCIe, SPI, a PLL, a tri-state conduit bridge, and external flash and SRAM devices.]

To ease the design and verification of Avalon-based FPSoCs, Altera provides the system integration tool Qsys (Altera 2015b), which automatically generates the suitable interconnect fabric (address/data bus connections, bus width matching logic, address decoder logic, arbitration logic) to connect a large number of IP cores available in its design libraries. Actually, Qsys also eases the design of systems using both AXI and Avalon and automatically generates bridges to connect components using different buses (Altera 2013).
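The Avalon-ST backpressure mechanism described in the interface list above can be illustrated with a cycle-level Python sketch: a datum moves from source to sink only in cycles where "valid" and "ready" are both active, and the sink drops "ready" while its FIFO is full. The function, its parameters, the periodic draining of the FIFO, and the zero-cycle ready latency are assumptions made for illustration.

```python
from collections import deque


def stream(data, fifo_depth, drain_every):
    """Simulate a source feeding a sink FIFO with backpressure.

    The sink removes one datum from its FIFO every `drain_every` cycles
    (hypothetical consumer behavior). Returns (received data, cycles used).
    """
    pending = deque(data)  # data the source still has to send
    fifo = deque()         # FIFO inside the sink
    received = []
    cycle = 0
    while pending or fifo:
        ready = len(fifo) < fifo_depth  # sink drives "ready"
        valid = bool(pending)           # source drives "valid"
        if valid and ready:
            fifo.append(pending.popleft())  # transfer only when both are active
        if cycle % drain_every == drain_every - 1 and fifo:
            received.append(fifo.popleft())
        cycle += 1
    return received, cycle


if __name__ == "__main__":
    print(stream([1, 2, 3, 4], fifo_depth=2, drain_every=2))
```

In this run the source stalls whenever the two-entry FIFO is full, so every datum arrives in order and none is lost, at the cost of extra cycles.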

3.5.3 CoreConnect

CoreConnect is an on-chip interconnection architecture proposed by IBM in the 1990s. Although the current strong trend to use ARM cores in the most recent FPGA devices points to the supremacy of AMBA-based solutions, CoreConnect is briefly analyzed here because Xilinx uses it for the MicroBlaze (soft) and PowerPC (hard) embedded processors.

CoreConnect consists of three different buses, intended to accommodate memory-mapped or DMA peripherals of different performance levels (IBM 1999; Bergamaschi and Lee 2000):

1. Processor Local Bus (PLB), a system bus to serve the processor and connect high-bandwidth peripherals (such as on-chip memories or DMA controllers).
2. On-Chip Peripheral Bus (OPB), a secondary bus to connect low-bandwidth peripherals and reduce traffic in PLB.
3. Device Control Register (DCR), oriented to provide a channel to configure the control registers of the different peripherals from the processor and mainly used to initialize them.

The block diagram of the CoreConnect bus architecture is shown in Figure 3.30, where structural similarities with AMBA 2 (Figure 3.18) may be noticed. Like AMBA 2, CoreConnect uses two buses, PLB and OPB, with different performance levels, interconnected through bridges.

FIGURE 3.30 Sample FPSoC using CoreConnect bus architecture. [Diagram: processors and high-bandwidth peripherals (main processor, coprocessor, memory, peripherals) share the PLB with its arbiter; a PLB-to-OPB bridge connects to the OPB, with its own arbiter, which serves low-bandwidth peripherals 1 to 4; the DCR bus is also shown.]

Both PLB and OPB use independent channels for addresses, read data, and write data. This enables simultaneous bidirectional transfers. They also support a multimaster structure with arbiter, where bus control is taken over by one master at a time.

PLB includes functionalities to improve transfer speed and safety, such as fixed- or variable-length burst transfers, line transfers, address pipelining (allowing a new read or write request to be overlapped with the one currently being serviced), master-driven atomic operation, split transactions, or slave error reporting, among others.

PLB-to-OPB bridges allow PLB masters to access OPB peripherals, therefore acting as OPB masters and PLB slaves. Bridges support dynamic bus sizing (same as the buses themselves), line transfers, burst transfers, and DMA transfers to/from OPB masters.

Former Xilinx Virtex-II Pro and Virtex-4 families include embedded PowerPC 405 hard processors (Xilinx 2010a), whereas PowerPC 440 processors are included in Virtex-5 devices (Xilinx 2010b). In all cases, CoreConnect is used as communication interface. Specifically, PLB buses are used for data transfers and DCR for initializing the peripherals as well as for system verification purposes.

Although the most recent versions of the MicroBlaze soft processor (from 2013.1 on) use AMBA 4 (AXI4 and ACE) as main interconnection interfaces, they can optionally implement the Xilinx proprietary bus, LMB, and OPB.
siz Bridges bus support dynamic error reporting, among others. reporting, error serviced),being atomic master-driven operation,or slave split transactions, (allowing a new read torequest or overlapped be write one current the with transfers, address pipelining line transfers, fixed- burst or variable-length atone master atime. arbiter, with over where control taken bus structure by is amultimaster port sup also They transfers. bidirectional enables simultaneous data.write This Architectures in FPGA Processors Embedded The general Wishbone architecture is shown in Figure 3.31. Figure shown in is It includes general Wishbone two architecture The (from most processor versionsAlthough recent the soft MicroBlaze of the include embedded Virtex-II Virtex-4 families Pro and Former Xilinx PLB-to-OPB bridges PLB allow to OPB access masters peripherals, there safety, and to improve speed as such transfer PLB includes functionalities for addresses, read PLB data,Both OPB and and independent channels use • • • WishBone slaves, place to take at atime. but allows transaction only one single one or or two more with more masters bus, connects which Shared aslave interface and amaster stage pipeline has each interface. Data flow, topology,implementthis to In used systems. pipelined slave. to asingle master Point asingle to point, connects which 109 ------Downloaded By: 10.3.98.104 At: 00:06 02 Oct 2021; For: 9781315162133, chapter3, 10.1201/9781315162133-4 110 parity or error correction bits, vectors, or correction error interrupt parity or control cache operations. data as transfers, such information to identify helpful especially are They address bus, to a data an bus, or for cycle. a bus information appending used the form of in ones “tags”user-defined (TAGN in 3.31).Figure may These be slowest of the at speed the run cycles Wishbone interface). 
bus rate to data adjusted be transfer for allowing every cyclemation bus (all and [ERR], [RTY], of infor cycle [CYC]) and retry transmission correct ensuring (selection [SEL], signals handshake [STB], strobe acknowledge [ACK], error (ADR, 64-bit) data (DAT, and 8-/16-/32- or 64-bit) of aset well as as buses, theyhave so Wishbone specification, the user defined. be to in slaves. the accesses when master and each However, not arbiters defined are arbitration to define topologies switch how require crossbar and bus Shared topologies of Wishbone connection and interfaces. architecture General FIGURE 3.31 In addition to the signals defined in specification,Wishbone its defined supports additionIn signals to the 3.31,According to Figure have Wishbone interfaces independent address • Ma Ma tion channels. tion to or two moreously slaves; connected is, that it several connec has allows or two to more simultane which switch, be masters Crossbar 1 2 ster ster Master

TA TA ADDR AC DA DA CY CLK RS ST SEL WE GN GN Shar C K T T T B bu INTERCO s FPGAs: Fundamentals, Advanced Features, and Applications and Features, Advanced Fundamentals, FPGAs: ed SYSCON defined User N Sl Sla Sl av av ve 2 e 3 e1 RST TAGN TAGN CYC ACK STB SEL WE DAT DAT ADDR CLK Slave Ma Ma 2 1 ster ster Master 1 Crossbar swit P oint topoint ch Slave 1 Sl Sl Sl av av av e 2 e 3 e 1 - - - Downloaded By: 10.3.98.104 At: 00:06 02 Oct 2021; For: 9781315162133, chapter3, 10.1201/9781315162133-4 respectively. 3.2.1 Sections 3.2.2, and in processors, described soft OpenRISC1200 Lattice’s in used Wishbone is LM8 LM32, and OpenCores’ in well as as Cadence. 2014. datasheet. processor 11 Tensilica Xtensa customizable In cores. using systems-on-chip W.R. Lee, R.A. and Designing 2000. Bergamaschi, data circuit integrated levelsystem AT94KAtmel. 2002. field programmable series (rev. manual reference 2012.ARM. r4p1). Cortex-A9 technical MPCore ID091612. 2011.ARM. 0022D. IHI ACEdatasheet. and AMBA AXI protocol specification DDI 0413D. manual. reference 2008. Cortex-M1ARM. technical 2004b.(v1.0) protocol specification AMBAARM. AXI 0022B. IHI datasheet. DVI 0045B. datasheet. overview Multilayer AHB 2004a. ARM. 1999.ARM. 0011A. 2.0)IHI AMBA (rev specification datasheet. Altera. 2016b. 10 A10-DATASHEET. sheet. data device Arria Available manual. reference Altera. 2016a. technical system 10 processor hard Arria Altera. 2015d. 10 S10-OVERVIEW. sheet. data overview Stratix device NII5V1 2015.04.02. guide. Altera. reference 2015c. processor Nios classic II QPS5V1 2015.05.04. handbook. edition standard prime Altera. 2015b. Quartus Altera. 2015a.MNLAVABUSREF Avalon specifications. interface 2015.03.04. Altera. 2013. Available Avalon Altera Qsys. and AMBA AXI using at: Interoperation MNL-AVABUSREF-1.2. manual. Altera. 2003.reference Avalon specification bus DS-EXCARM-2.0. sheet. data overview device Altera. Excalibur 2002. 
References Architectures in FPGA Processors Embedded Wishbone supports three basic data transfer modes: basic data transfer Wishbone supports three 2. 3. 1. Angeles, CA. Angeles, 37th the of Proceedings (DAC Conference Automation Design 2000) 1138F-FPSLI-06/02.sheet. 20, 2016. November Accessed https://www.altera.com/en_US/pdfs/literature/hb/arria-10/a10_5v4.pdf. at: 20, 2016.https://www.youtube.com/watch?v=LdD2B1x-5vo. November Accessed given moment. phores available is to indicate whether agiven resource or not at a sema using resources share processes software where different multiprocessor or multitask in systems mode used is transfer This halves cycle. of both 3.31) the (Figure during signal asserted remains CYC_O The half. second the during performed is datawrite transfer performed, whereas a cycle, is of the read data asingle half transfer first the cycle. bus same During agiven the memory locationin in Read–modify–write, written allows read data and which to both be Block read/write, transfers. burst in used read/write,Single transfers. single-data in used . June 5–9, Los - 111 Downloaded By: 10.3.98.104 At: 00:06 02 Oct 2021; For: 9781315162133, chapter3, 10.1201/9781315162133-4 Tendler, J.M., J.S., Dodson, Fields Jr., J.S., Le, H., Sinharoy, B. and POWER4 2002. sys 2015.Synopsys. datasheet. DesignWare processor ARC HS34 Sundaramoorthy, N., Rao, T. N., Hill, and 2010. paves way the to AXI4 interconnect Stallings, W. 2016.ComputerOrganization andArchitecture. Designingfor Perfor­ AXI4 based Xilinx V. using Singh, performance Dao, system and K. 2013. Maximize Sharma M. 2014. CoreSight SoC enabling efficient design of custom debug and trace Kurisu, W. 2015. Addressing design challenges in heterogeneous multicore embed multicore heterogeneous W.Kurisu, in 2015. challenges design Addressing Kenny, Watt, and R. J. tri-gate advantage 2016. with for FPGAs breakthrough The Kalray. 2014. MPPA Available ManyCore. at: http://www.kalrayinc.com/IMG/pdf/ Jeffers, J. J. 