Keywords: Design-for-Debug, validation, testing, debug, silicon, prototypes

Editorial Feature Tabs: Methods-Tools / Design for Debug (DFD)

@head:Is There a Practical Approach to Reducing Silicon Debug Time?

@deck:A design-for-debug approach expands data observability so engineers can efficiently validate silicon prototypes before volume production.

@text:Preparing a design for volume production doesn't end after tape-out. Indeed, the time spent from the arrival of the first silicon prototypes to achieving high-volume production is steadily growing. One of the main tasks during this stage is "chip bring-up," which is the process of validating the prototypes while they're plugged into the parent systems. This in-situ validation is similar to pre-silicon verification in that engineers must debug any abnormal behavior that's uncovered. As the complexities of silicon devices and their corresponding systems increase, however, so do the difficulty and effort required to debug silicon errors. Design-for-debug (DFD) methodologies may be used to better understand internal silicon-device behavior during in-situ silicon validation.

The development of integrated circuits (ICs) begins with the creation of the specification, continues through a series of implementation and verification procedures, and culminates with the manufacture of the first silicon prototypes. Throughout this process, engineering teams use a multitude of tools to verify the design's functionality as well as its electrical and physical-layout requirements. Because these tools are bounded by capacity and performance, however, it isn't possible to check all scenarios prior to tape-out. As a result, silicon-validation teams must run the silicon prototypes in the target systems to check for errors that escaped the pre-silicon verification process. Common sources of errors (see Figure 1) are listed below, along with the drivers of each type of error:

• Specification: The design is inadequately or incorrectly specified. Specification errors increase with rising design complexity.
• Coding: The register-transfer-level (RTL) code is written incorrectly or doesn't capture the intended functionality. Coding errors increase when verification methods and verification-tool performance don't keep pace with the growth in the amount of RTL per design.
• Synthesis: The output of synthesis doesn't match the intended functionality described by the RTL. This error source is tied to the quality of synthesis tools.
• Layout: The layout doesn't meet all of the physical specification requirements. This problem grows with insufficient modeling and design-rule-check (DRC) tool quality.
• Timing: Process variations combined with small design margins prevent devices from running at the intended frequency. Timing errors increase with smaller geometries.
• Electrical: Circuit values are altered by dynamic power consumption, crosstalk, and RF effects. This error source increases with smaller geometries and model inaccuracies.
• Manufacturing: Physical defects are introduced during the manufacturing process. This problem increases with smaller geometries and is compounded by the resulting difficulty in defect-model accuracy.

Unfortunately, no standard methodology for system validation exists. Most validation teams plug the silicon devices into the systems, turn on the power, and run various modes and scenarios as described in a test plan. The lack of a standard methodology has its biggest impact, though, when the validation team notices erroneous behavior. The next critical step is to understand the cause of that behavior.

Most of the engineers tasked with debugging erroneous silicon behavior describe it as extremely difficult due to the lack of internal signal observability. For example, what are the next steps if the system freezes? While they may know the last applied scenario on the system, engineers need to know what is happening inside the silicon device. Signal values may be present at the external pins and monitored with equipment like logic analyzers. Yet engineers find the status of the data, status registers, and internal state machines much more useful. In effect, they want to observe the internal silicon signal values to better understand the device’s behavior. With this information, the validation engineer can better communicate with the original RTL designer to aid in the debug process. Observability of internal signals must be provided in order to make silicon debug easier and more efficient.

Emerging Silicon-Debug Methodology

To minimize system-validation time, a silicon-debug methodology must bridge the communication gap between the system-validation engineer and the designer. At the same time, it must provide excellent visibility into silicon signal data. An emerging methodology called design-for-debug is gaining attention as a means of improving silicon debug. It involves placing logic on the silicon to help with its debug, and it requires up-front planning.

The concept of placing dedicated hardware on chip for the purpose of silicon debug isn't new. Indeed, some of the first silicon-debug papers and articles concerned processors. As with many other design methodologies, the engineering teams behind leading processors were among the first to deploy DFD; such processors included those offered by Intel, Motorola, and Sun Microsystems. These teams took advantage of external buses by multiplexing them to gain access to internal signals. In other cases, they reused scan chains that were already in place for manufacturing test. The latter cases mark the start of the recent trend of implementing DFD by exploiting design-for-test (DFT) structures. The most straightforward DFD methodology reuses existing DFT, such as scan chains, in conjunction with the IEEE 1149.1 test modules also known as JTAG (see Figure 2). Over half of all designs are now outfitted with an IEEE 1149.1 controller, and more than 80% contain scan chains. As a result, a DFD methodology that reuses these test structures is the most viable option for most design teams. Its primary objective is to provide register values by shifting out scan-chain values via the IEEE 1149.1 controller while the device is in the system.

As part of this method, design teams must make two up-front decisions:

1. What part(s) of the device needs to be observed?
2. Does the system need to continue operation after observing the scan-chain values?

Some of the best candidates for the first question are areas containing complex state machines or other sequential elements with values that aren’t easily deduced. The response to the second question determines the type of scan chain that the team will need to implement: destructive versus non-destructive scan. Destructive scan chains destroy the values of the registers when shift-out operations occur. If the values are associated with registers that are critical to system operation, the shifting can occur only once before the system must be rebooted.

The advantage of this type of scan chain is its minimal impact on area and timing. Non-destructive scan chains, on the other hand, don't disrupt register values. The system-validation team can shift out values to observe and then continue system operation. Because non-destructive scan chains consume significantly more area, many design teams limit their use to the registers needed for debug.
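To make the distinction concrete, the following Python model is a minimal behavioral sketch of the two scan-chain styles. The class names and values are invented for illustration; real scan chains are hardware structures, not software.

class DestructiveScanChain:
    """Registers double as the shift path, so reading them out corrupts them."""
    def __init__(self, values):
        self.regs = list(values)

    def shift_out(self, fill=0):
        captured = list(self.regs)
        self.regs = [fill] * len(self.regs)   # functional state is lost after the shift
        return captured


class NonDestructiveScanChain:
    """A shadow chain snapshots the registers, so functional state survives."""
    def __init__(self, values):
        self.regs = list(values)
        self.shadow = [0] * len(values)

    def shift_out(self):
        self.shadow = list(self.regs)   # snapshot into shadow flops
        return list(self.shadow)        # functional registers are untouched


state = [1, 0, 1, 1]
destructive = DestructiveScanChain(state)
nondestructive = NonDestructiveScanChain(state)
print(destructive.shift_out(), destructive.regs)        # [1, 0, 1, 1] [0, 0, 0, 0]
print(nondestructive.shift_out(), nondestructive.regs)  # [1, 0, 1, 1] [1, 0, 1, 1]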

To access scan chains of a device in situ, a set of debug instructions needs to be added to the IEEE 1149.1 test-access-port (TAP) controller. The 1149.1 standard defines a set of mandatory instructions. But it also allows additional user-defined instructions. The minimum set of instructions should include:

• Debug mode -- maintain or "freeze" the device in its current state
• Shift -- enable serial shift operation of all the scan chains via the TAP pins

In debug mode, the design is readied for additional debug instructions. The user can then issue the Shift instruction to serially shift out the values of all of the scan chains. If the design contains a mix of both destructive and non-destructive scan chains, it should contain an extra instruction to shift out only the non-destructive scan chains. This arrangement allows the system to continue operation after exiting debug mode.
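A host-side control sequence built on these instructions might look like the Python sketch below. The opcode values and the pod interface (load_instruction, shift_dr) are assumptions for illustration; actual opcodes are defined by the design team's TAP implementation, and the access routines come from the library supplied with the chosen 1149.1 pod.

DEBUG_MODE        = 0b1000  # freeze the device in its current state (assumed opcode)
SHIFT_ALL         = 0b1001  # shift out every scan chain (assumed opcode)
SHIFT_NONDESTRUCT = 0b1010  # shift out only the non-destructive chains (assumed opcode)

def capture_scan_data(pod, chain_length, nondestructive_only=False):
    """Freeze the device, shift out one snapshot of the scan chains, return the raw bits."""
    pod.load_instruction(DEBUG_MODE)                  # enter debug mode
    opcode = SHIFT_NONDESTRUCT if nondestructive_only else SHIFT_ALL
    pod.load_instruction(opcode)
    return pod.shift_dr(chain_length)                 # serial shift out through TDO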

Beyond shifting capabilities, other techniques to improve silicon debug include the use of breakpoints and register loading. Breakpoints are hardware monitors that halt the system when triggered by an event. These monitors usually detect a notable condition (for example, a specific value in a counter). For greater flexibility, design teams can make the comparison values for the breakpoints user-programmable. In contrast, register loading is useful for testing a hypothesis about an error and its resolution. It forces flip-flop and latch values to specific states. The corresponding high-level instructions for these options include:

• Stop-on-breakpoint -- operate the device until a breakpoint is reached
• Initialize -- initialize key registers to default values
• Single step -- allow single-step clock operation
• Load register -- write a new value to a specific register

Note that if more than one register is writeable, the TAP controller needs additional commands to specify the assigned debug-mode address of each register.
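Continuing the same hypothetical host-side sketch, these additional instructions might be driven as follows. The opcodes, the address/value packing, and the comparator address are all assumptions; the real encoding is defined by each design's TAP implementation.

STOP_ON_BREAKPOINT = 0b1011  # run until a breakpoint monitor fires (assumed opcode)
SINGLE_STEP        = 0b1100  # advance the clocks by a single cycle (assumed opcode)
LOAD_REGISTER      = 0b1101  # write a value into an addressed register (assumed opcode)

def load_register(pod, address, value, value_width, addr_width=8):
    """Force an addressed register to a known state, e.g. to test an error hypothesis."""
    pod.load_instruction(LOAD_REGISTER)
    packed = (address << value_width) | value          # assumed {address, value} packing
    pod.shift_dr_write(packed, addr_width + value_width)

def arm_breakpoint(pod, compare_value, value_width, comparator_addr=0x01):
    """Program a user-programmable breakpoint comparator, then let the device run to it."""
    load_register(pod, comparator_addr, compare_value, value_width)
    pod.load_instruction(STOP_ON_BREAKPOINT)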

One of the key design challenges for implementing debug mode is the handling of clock and reset controls. Ideally, the clocks should transition from system to debug mode without hazards. To accomplish this goal, the clock-control logic must not introduce glitches. In designs with multiple clocks, all clocks should stop simultaneously to accurately take a “snapshot” of data in the registers.

The data must be transferred from the device to a software debug environment. An electrical connection must therefore be made from the device to a personal computer (PC). Accommodating this type of connection requires that the system board contain an electrical connection from the TAP pins of the device to a physical socket. Prior to turning on the system, the socket is connected via an IEEE 1149.1 pod to a port on the PC, such as the Universal Serial Bus (USB) port. The pod takes care of matching the different electrical characteristics and protocol between the socket and the USB port. Because there is no standardized type of socket or pod, both should be chosen in advance of the printed-circuit-board design.

A crucial piece of application software, created by the system-validation team to control the DFD logic, runs on the PC. In many cases, the team can build this application on the software libraries that come with the selected pod. With the system and application running, the user can load various debug instructions during normal system operation. Loading these instructions doesn't affect the application running on the system.
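The host-side routines sketched earlier assume only a small pod interface such as the one below. In practice, the system-validation team would implement it as a thin wrapper around whatever library ships with the selected 1149.1 pod; the method names here are illustrative assumptions, not a vendor API.

from typing import Protocol

class DebugPod(Protocol):
    def load_instruction(self, opcode: int) -> None: ...    # shift an instruction into the TAP IR
    def shift_dr(self, num_bits: int) -> list[int]: ...     # shift num_bits out of the data register
    def shift_dr_write(self, value: int, num_bits: int) -> None: ...  # shift a value into the data register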

To gain insight into device behavior, design teams can shift values from the scan chains, through the TAP controller, across the pod, and into the PC. Once the raw signal data has been captured, it must be translated to correspond with the HDL design. It can then be imported into existing HDL-centric debug systems (see Figure 3). This translation involves three steps. The notion of time or cycle must first be re-introduced into the captured data. Next, the data must be associated with the gate-level signal names in the HDL design. Lastly, it must be formatted into a standard like value change dump (VCD).
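The following Python sketch shows a minimal version of this translation. The scan order (which bit position corresponds to which signal) and the capture time are assumed inputs that would come from the design's DFD documentation or scan-insertion reports.

def bits_to_vcd(bits, scan_order, capture_time_ns, path="capture.vcd"):
    """scan_order[i] names the signal captured in bit position i of the shifted-out data."""
    ids = {name: chr(33 + i) for i, name in enumerate(scan_order)}  # short VCD identifiers
    with open(path, "w") as vcd:
        vcd.write("$timescale 1ns $end\n")
        for name, vcd_id in ids.items():
            vcd.write(f"$var reg 1 {vcd_id} {name} $end\n")
        vcd.write("$enddefinitions $end\n")
        vcd.write(f"#{capture_time_ns}\n")                  # re-introduce the capture time
        for bit, name in zip(bits, scan_order):
            vcd.write(f"{bit}{ids[name]}\n")

bits_to_vcd(bits=[1, 0, 1],
            scan_order=["top.u_ctrl.state_reg_0_", "top.u_ctrl.state_reg_1_", "top.u_fifo.full_reg"],
            capture_time_ns=125_000)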

Synthesized gate-level designs can be difficult to understand because the level of abstraction is too low. In addition, synthesis tools perform transformations that don't necessarily provide a one-to-one mapping to RTL code. To understand the behavioral intent of gate-level logic, it's important for the engineer to know the corresponding RTL-signal values. To accomplish this goal, map the signal values to RTL rather than gate-level instances prior to formatting the VCD file.
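As a rough illustration, the gate-to-RTL remapping can be applied to the scan order before the VCD is written. The mapping table below is invented for this example; in practice it would be derived from the name-correspondence data exported by the synthesis flow.

GATE_TO_RTL = {
    "top.u_ctrl.state_reg_0_": "top.u_ctrl.state[0]",
    "top.u_ctrl.state_reg_1_": "top.u_ctrl.state[1]",
    "top.u_fifo.full_reg":     "top.u_fifo.full",
}

def remap_to_rtl(scan_order):
    """Replace gate-level register names with RTL names; fall back to the original name."""
    return [GATE_TO_RTL.get(name, name) for name in scan_order]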

Once data capture, translation, and formatting are complete, designers can use standard debug systems and apply pre-silicon techniques to isolate the root cause of any problem, all while operating in the designer's environment. The standard capabilities available in a debug tool, such as the Verdi Automated Debug System, include source-code tracing, waveform viewing, schematic viewing, and value annotation on any design view. If available, expected data based on golden simulation results can help narrow down an error's source by highlighting differences from the actual silicon data.
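Such a comparison can be as simple as diffing the captured silicon values against the expected values from a golden simulation at the same capture time, as in this small sketch (the signal names and values are illustrative):

def diff_against_golden(silicon, golden):
    """Both arguments map signal names to values at the same capture time."""
    return {sig: {"expected": golden[sig], "silicon": silicon[sig]}
            for sig in golden
            if sig in silicon and silicon[sig] != golden[sig]}

mismatches = diff_against_golden(
    silicon={"top.u_ctrl.state[0]": 1, "top.u_ctrl.state[1]": 0},
    golden={"top.u_ctrl.state[0]": 1, "top.u_ctrl.state[1]": 1},
)
print(mismatches)   # {'top.u_ctrl.state[1]': {'expected': 1, 'silicon': 0}}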

The debug system assists the user in finding the cause and location of an error, which should then be confirmed. This latter step may require replicating the cause in simulation or emulation and performing additional signal-integrity checks. The engineer then needs to verify that the predicted response matches the observed silicon response.

In response to the growing problem of insufficient observability, this silicon-debug methodology promises to let validation teams efficiently validate silicon prototypes prior to volume production. Its impact on design-logic implementation is minimal because it reuses existing DFT resources that would otherwise sit idle during system-level testing. The methodology provides a means to extract and process data from the silicon for further exploration and analysis by debug systems and associated verification environments. This approach eases the silicon-debug bottleneck for both validation engineers and designers. At the same time, it increases the probability that errors will be found in a timely manner.

Reference:
1. "Scan design called portal for hackers," EE Times, October 25, 2004 (citing a Gartner Dataquest study).

Yu-Chin Hsu, Vice President of Research & Development at Novas, is the former head of the synthesis product line at Avant! He has over 16 years of research and development experience in EDA. Hsu has held faculty positions at the University of California and Tsing Hua University in Taiwan. He holds a Bachelor of Science degree from Taiwan University and a Master of Science degree and PhD from the University of Illinois.

Robert Ruiz is the Senior Product Marketing Manager at Novas. Prior to joining Novas, Ruiz held various marketing and technical positions for the verification and test-automation products at Synopsys and Viewlogic Systems. His background includes over 13 years in functional verification and design-for-test methodologies as well as several years as an ASIC designer. Ruiz has a BSEE from Stanford University.

Captions:

Figure 1: The errors and defects that are introduced throughout the design process make planning for silicon test and debug essential.

Figure 2: This diagram shows an example DFD architecture that reuses DFT and adds debug instruction registers, control, and optional monitors.

Figure 3: Scan data from the silicon device is viewed textually or imported into an HDL-based debug system. Additional scan data may be needed from the silicon to track down the root cause of a problem.
