
Technology Watch is sponsored by RS Components

Complexity is the enemy

Debugging is, on the face of it, a simple process: looking for flaws in software by running the program on the target hardware and fixing the errors.

Not so long ago, debugging was a matter of plugging in something like an in circuit emulator and watching what happened. In some ways, it was like looking into a room through a fairly large keyhole. Today, a combination of functional integration, smaller packages and the advent of multiple processor cores has made the process of ‘plugging in’ impossible – the ‘keyhole’ is no longer there and more elegant solutions have had to be developed.

Embedded systems are, essentially, custom designs. Engineers choose the best pieces from a vast collection of components and work to increasingly stringent constraints. So it’s no surprise to find that debugging standards have been fairly thin on the ground.

As embedded systems – and microcontrollers in particular – became more complex, the electronics world adopted its standard fall back position of the proprietary solution; if you had a Motorola/Freescale processor, for example, you used background debug mode. Standing alongside these was the other fall back position – the printf command.

The automotive industry drove an attempt to bring standardisation to debug in the 1990s; a move that resulted in Nexus 5001. But while it has support from most leading mcu developers, Nexus 5001 is not in general use.

The problem is put into focus when you look at ARM and its embedded trace methodologies. Heterogeneous systems make it hard to apply what is, effectively, a proprietary solution, even though Cortex-M cores are used widely.

What is certain is that, as embedded systems become more integrated and more complex, effective debug methods will become even more important. It all comes down to time to market; if you can’t debug your system effectively, you can’t release it. And, today, time to market is a synonym for time to money.

Glenn Jarrett, head of electronics marketing, RS Components

With operations in 32 countries and 17 warehouses, RS Components stocks 550,000 products from 2500 leading suppliers. It serves 1.6 million customers worldwide, shipping more than 46,000 parcels on the day the orders are received.
rswww.com tel: 08457 201201

Technology Watch Research & Development

The quest for visibility

As technology becomes more complex, so too does working out what’s going on inside an mcu. By Chris Edwards.

24 April 2012 www.newelectronics.co.uk

Visibility is everything in debug, but it is hard to achieve in the world of embedded systems. Compared to desktop work, where a software monitor can easily show the internal state of a system, debugging an embedded target can seem more like keyhole surgery.

For years, the in circuit emulator (ICE) was the staple of mcu users. A specialised connector made it possible to watch activity and provide shadow regions of memory that would help control what the cpu was running. This was vital for rom based mcus as the emulator memory could be changed on the fly, in contrast to the fixed memory on the target mcu.

State machines in the emulator would extract bus activity data and use it to determine where a cpu was in its execution and what data it saw. When the state machine saw that a particular combination of events had happened, it could provide a breakpoint: stepping in and stopping execution and then downloading the registers and other state information from the cpu core. Or it could record the data as watchpoints.

As mcus became more complex and began to incorporate additional peripherals – such as on chip memory and caches to increase speed and reduce system cost – visibility suffered. ICE makers had to buy custom ‘bondout’ versions of the target mcu that exposed to the emulator the internal bus address and data lines that would normally be hidden from view. But, as these connections were not through proper I/O pads, they did not have the same level of electrostatic discharge protection. Bondout versions were far more electrically fragile – and were expensive to replace when a misplaced connection resulted in the device latching up forever.

Although programmers had the option of falling back on the standby of printf debugging – instrumenting the code to spit out hints about its behaviour on a serial port – this approach was generally prone to timing problems. Code that worked fine in debug mode would suddenly break when compiled for production. The delays caused by the printf statements or other signals might have masked fatal deadlocks and race conditions.

Embedded programmers got a lucky break in the late 1980s after the Joint Test Action Group (Jtag), whose work was later standardised as IEEE 1149.1, put together a plan to reduce the cost of test. Chip designers realised that increased integration was making it tougher for OEMs to test assembled boards at the end of production using conventional ‘bed of nails’ testers – so many functions were hidden away inside individual pieces of silicon and the chips themselves so tightly spaced that board level probes were unable to reach them individually to see if they worked.

The answer lay in providing access to test logic inside each chip through a serial bus that wove its way around the board and inside each device. Using this bus, a tester could disable the core logic cells inside each chip and allow signals to pass through. This made it possible to test the continuity between the devices on the board without probing the pcb traces directly.

Designers realised the ability to send commands to logic blocks inside a chip could not only be extended to test functions inside the device itself, but also to send commands to the logic blocks and to extract information from them as they ran in a functioning system. The first target for this extension of Jtag was the mcu’s central processing unit (cpu), which was beginning to suffer from a similar visibility problem to that experienced by board level test engineers.

Intel quickly adopted the Jtag port for hardware assisted debug, as well as for board level test – implementing it on the 80486 – which pushed other manufacturers to use it. Motorola – now Freescale – developed its Background Debug Mode (BDM) interface, later embracing Jtag as the access port for the BDM logic on its products. However, Jtag implementations were far from equal. Although many companies put the debug functions on a different internal scan chain to the test functions, some popular devices – such as the IBM PowerPC 600 series – did not, making the job of accessing the debug support often quite tedious.

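The serial test bus described above behaves, in effect, like one long shift register threaded through every device on the board. The following Python sketch illustrates only that idea. The `ScanChain` class, its method names and the register widths are hypothetical, and it does not attempt to model the Jtag control state machine or any real device.

```python
from collections import deque


class ScanChain:
    """Toy model of a serial test bus: device registers joined end to end."""

    def __init__(self, widths):
        # One shift register per device; the widths are illustrative only.
        self.registers = [deque([0] * w, maxlen=w) for w in widths]

    def shift(self, bit_in):
        """Clock one bit into the chain and return the bit that falls out."""
        carry = bit_in
        for reg in self.registers:
            out = reg[-1]          # bit about to leave this device
            reg.appendleft(carry)  # maxlen discards the old last bit
            carry = out            # it becomes the input of the next device
        return carry

    def shift_many(self, bits):
        """Clock a sequence of bits through and collect what comes out."""
        return [self.shift(b) for b in bits]


# Two devices with a 1 bit register either side of a device with a 4 bit one.
demo = ScanChain([1, 4, 1])
demo.shift_many([1, 0, 1, 1, 0])   # pattern plus one bit to push it past device 0
print(list(demo.registers[1]))     # [1, 1, 0, 1]
```

Because every bit has to pass through each device in turn, the time needed to reach one register grows with the total length of the chain, which is one reason a shared chain for test and debug functions makes debug access more tedious.
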
Fig 1: The Nexus 5001 standard divides debug support into several classes
Class 1: basic run control – static debug; read/write registers and memories; start/stop processor; hardware/software breakpoints
Class 2: instruction trace, watchpoints – watchpoint messages; ownership trace messages; program trace messages
Class 3: read/write access, data trace – read/write access; data trace messages
Class 4: memory and port substitution – memory substitution; port replacement

Since its early days, the x86 had included an instruction intended for use by software debuggers. The INT 3 instruction is a single byte instruction – with an opcode of 0xCC – that forces the processor to run an interrupt. Generally, this would be used by a debugger to patch a location in memory to force a breakpoint when the processor hit that location. The use of a single byte made it possible to patch any instruction in the x86 set, some of which are only 1 byte long.

The 80386 greatly extended hardware support for debugging with the inclusion of six debug registers that could be used to set breakpoints without patching memory directly. The registers could be set to any location in memory – allowing breakpoints on data as well as instruction accesses. Another register setting made it possible to single step through the code. The processor would halt after every executed instruction. The inclusion of the Jtag port on the 80486 made it possible for external hardware to program these registers without needing to interfere with the running program.

Motorola’s BDM brought greater visibility to lower cost mcus and processors, such as its popular 68332, without adding dedicated breakpoint registers. Instead, BDM provided an onchip controller for the cpu that could change
or fetch register and memory contents through its serial connection without halting the target, using a mechanism similar to direct memory access (DMA). As a separate piece of hardware, the BDM controller could interrogate the system after a software crash or force the core into single step mode. Later parts added breakpoint registers to avoid the need to patch memory with trap instructions.

Unfortunately, a comparatively slow serial connection is only good for relaying start-stop commands and extracting small chunks of data at a time. One big advantage of the ICE was its ability to provide a real time trace of program execution. Without trace, programmers had to, again, lace their code with printf statements to work out which branches the processor took during execution. Some real time operating systems, such as VxWorks, added their own level of instrumentation to improve debug visibility – storing a record of recent system calls in a chunk of memory set aside for the purpose – but this again could alter timing related behaviour. And it was only suitable for targets that could justify the inclusion of an rtos.

Fig 2: An example of debug connectivity in a multicore ARM SoC. Blocks shown: a Cortex core and an ARM dsp core, each with a cross trigger interface; a cross trigger matrix; the AMBA AXI bus with bus bridges (AHB, AXI, APB); a debug access port; a Jtag port; and a debug bus.
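Patching memory with a trap instruction, the technique that breakpoint registers were added to avoid, can be illustrated in a few lines of Python. This is a sketch against a simulated memory image rather than a real debugger: the `SoftBreakpoints` class and its method names are hypothetical, and only the 0xCC opcode is taken from the article.

```python
INT3 = 0xCC  # single byte x86 breakpoint opcode cited in the article


class SoftBreakpoints:
    """Toy software breakpoint manager working on a simulated memory image."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.saved = {}  # address -> original byte that the trap replaced

    def set(self, addr):
        """Save the original byte at addr and overwrite it with the trap."""
        if addr not in self.saved:
            self.saved[addr] = self.memory[addr]
            self.memory[addr] = INT3

    def clear(self, addr):
        """Restore the original byte so the instruction can run normally."""
        self.memory[addr] = self.saved.pop(addr)

    def hit(self, pc):
        """True if pc points at a trap that this manager planted."""
        return pc in self.saved and self.memory[pc] == INT3


code = bytearray([0x90, 0x90, 0xC3])  # NOP, NOP, RET
breakpoints = SoftBreakpoints(code)
breakpoints.set(1)
print(hex(code[1]))                   # 0xcc
breakpoints.clear(1)
```

The sketch also shows why the approach needs writable program memory: the debugger has to replace a byte of the code itself, which is not possible when the program runs from rom.
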

Evolutionary process

The next step in the evolution of debug was on chip trace in an attempt to provide more of the features of an emulator, letting the debugger follow the progress of the code running inside the cpu. The problem was one of pins. Despite being a core part of any embedded development project, debug is far from being a priority when it comes to allocating pins. On small package mcus, pins are relatively expensive. But full trace is bandwidth hungry as, potentially, every address visited by the program needs to be output on a dedicated trace bus.

However, even in tightly looping code, only a fraction of the instructions are responsible for a change in flow. Otherwise, instruction flow is highly predictable. In a typical C program, a branch is encountered only every 10 instructions. Companies such as Freescale implemented branch trace modes that output small packets of data only on a change in flow. This greatly reduced the amount of data that needed to be sent from the chip, allowing the use of a narrower trace bus. In rare cases where the flow of control does not change after more than 200 instructions or so, the debug controller outputs a synchronisation message.

To allow the recording of more detailed trace data that carries information about memory locations that are accessed by instructions and not just instruction execution, vendors have developed two main mechanisms. One is to multiplex the trace bus pins with those of regular peripherals so that, when the mcu is in debug mode, the pins provide high bandwidth trace data. Clearly, this strategy only works if the pins are not needed for regular I/O work.

Compression makes it possible to reduce the bandwidth demand. Because the flow of a program is relatively predictable, it is possible to compress the number of address bits passed through the trace port, restricting full addresses to those of monitored data locations.

If access to pins is highly restricted, an alternative is to dedicate an area of on chip memory to a circular buffer that records the most recent cpu activity – its contents can be read while the target is halted either through the Jtag port or by switching the relevant multiplexed pins into trace mode.

Because it is often not practical or desirable to stop all the cores on a multitasking multicore device, trace support is becoming more important. And it is no longer limited to processors. Hardware accelerators are beginning to sport trace functions to make it easier to visualise the interactions between them and the on chip processors. A trace buffer that can record events in the correct order is vital to understanding whether the application is suffering from race conditions.

A further enhancement turning up in these multicore systems is a new take on printf debugging. More recent ARM cores, for example, contain instrumentation buffers that record events under software control. By setting aside an onchip buffer, the program does not use potentially precious hardware I/O resources such as serial ports. The debugger can read out the logged events through the Jtag or the trace port.

In many cases, it is not practical to halt the processor completely on every breakpoint. In a multitasking system where interrupts need to be handled in real time – because, for example, they may be used to control a motor – the breakpoint is used to halt a particular thread, but not the actual processor. ARM calls this
‘monitor mode debugging’. Instead of stopping the processor, control for a thread of execution passes to a debug monitor that runs as a privileged task alongside normal threads. This monitor takes care of accesses to debug registers, trace buffers and stored register contents – which will be flushed to memory while other tasks run. When the thread needs to be started from where it finished, the debug monitor returns its state to ready to run and restores the correct stack state so the rtos can schedule the thread for execution.

Despite the success of Jtag as a low level standard for debug, standardisation at a higher level proved elusive. Core Jtag pins are almost always available: the questions focus on additional pins that might be present for relaying data to the emulator or host computer and the protocols used to control the target.

The longest lived attempt to unify debug is the Nexus 5001 consortium, formed in 1998 in response to demands from car makers for silicon suppliers to converge on one debug interface they could use for all the processors they bought. Although Freescale promoted Nexus enthusiastically and other silicon manufacturers selling into the automotive industry embraced it, support for Nexus elsewhere is far from widespread.

In practice, de facto standards such as ARM’s CoreSight, Motorola/Freescale’s BDM and OnCE port or the Intel Extended Debug Port (XDP) – the successor to the 80486’s debug port – are the winners so far.

Things began to change in the mid 2000s, when it became clear that multicore architectures were the future for the embedded world. In the desktop world, de facto standards could be expected to survive as the focus has, so far, been on homogeneous multicore processors. However, in the embedded space, heterogeneous architectures are becoming common, making it difficult to use single vendor debug architectures, even one as pervasive as ARM’s.

In principle, debug systems for different processors can coexist on the same Jtag connection. The protocol is designed to support multiple test targets within a device, cycling through the scan chain until the correct one for a given client is found. However, this is not an efficient use of resources and a lack of communication between cores makes certain functions that can be useful during multicore debugging – such as synchronised halt and restart or cross triggering between threads running on different cores – hard, if not impossible, to achieve.

During the second half of the last decade, a number of bodies decided to create their own multicore debugging standards. The OCP-IP group, which defined the on-chip interconnect architecture used by companies such as Texas Instruments, has teamed up with Nexus to define a set of multicore debug standards. Other efforts included the EU funded Sprint Consortium, the Taiwan SoC Consortium and a proposal for mobile phone processors put together by the MIPI Alliance. While OCP-IP is currently the most active, there is still a lack of commonality in multicore debug protocols.

Fig 3: The proposed OCP-IP 3.0/Nexus debug interface. Labels shown: OCP bus fabric; bus interface; test socket; core; trace interface; cross trigger interface; debug IP with memory mapped, Jtag mapped and Nexus mapped registers; Jtag control; Nexus data trace; debug software.

Making debug tools work better

For version 3.0 of its interconnect specification, OCP-IP is planning to make debug tools work better with systems where the power state of individual cores can change rapidly over time. For example, without specific support, the chip’s power manager may attempt to power down a core while it is stopped, preventing access to its registers from the debugger. Another focus is on cache coherency – making the debugger aware of state changes in the cache that might affect the timing of software loops. For example, a cache update may block the main memory bus, affecting a number of processor cores in a cluster.

Outside of the standards efforts, a wider range of data is likely to be captured by trace ports. Xilinx made it possible to send temperature and other environmental data over the Jtag port. TI has indicated that future versions of its low power microcontrollers will record more data about energy consumption in real time for use in debuggers, making it easier to see the effect of code changes on overall power consumption.

Support for all these different aspects of debug is still patchy across the industry, but the trend is gradually towards greater levels of visibility which, in turn, should cut down the length of time it takes to verify that an [...] works as planned.
