<<

IT2354 EMBEDDED SYSTEMS

A Course Material on

Embedded Systems

By

Mr. D. PRABHAKARAN

ASSISTANT PROFESSOR

DEPARTMENT OF INFORMATION TECHNOLOGY

SASURIE COLLEGE OF ENGINEERING

VIJAYAMANGALAM – 638 056

SCE DEPT OF IT 1 IT2354 EMBEDDED SYSTEMS

QUALITY CERTIFICATE

This is to certify that the e-course material

Subject Code : IT2354

Subject : EMBEDDED SYSTEMS

Class : III B.Tech ()

being prepared by me and it meets the knowledge requirement of the university curriculum.

Signature of the Author

Name:

Designation:

This is to certify that the course material prepared by Mr. D. Prabhakaran is of adequate quality. He has referred to more than five books, among them at least one by a foreign author.

Signature of HOD

Name:

SEAL


IT2354 EMBEDDED SYSTEMS L T P C 3 0 0 3

UNIT I EMBEDDED COMPUTING 9
Challenges of Embedded Systems – Embedded system design process. Embedded processors – 8051 Microcontroller, ARM processor – Architecture, Instruction sets and programming.

UNIT II MEMORY AND INPUT/OUTPUT MANAGEMENT 9
Programming Input and Output – Memory system mechanisms – Memory and I/O devices and interfacing – Interrupt handling.

UNIT III PROCESSES AND OPERATING SYSTEMS 9
Multiple tasks and processes – Context switching – Scheduling policies – Interprocess communication mechanisms – Performance issues.

UNIT IV EMBEDDED SOFTWARE 9
Programming embedded systems in assembly and C – Meeting real time constraints – Multi-state systems and function sequences. Embedded software development tools – Emulators and debuggers.

UNIT V EMBEDDED SYSTEM DEVELOPMENT 9
Design issues and techniques – Case studies – Complete design of example embedded systems.
TOTAL: 45 PERIODS
TEXT BOOKS
1. Wayne Wolf, “Computers as Components: Principles of Embedded Computer System Design”, Elsevier, 2006.
2. Michael J. Pont, “Embedded C”, Pearson Education, 2007.

REFERENCES:
1. Steve Heath, “Embedded System Design”, Elsevier, 2005.
2. Muhammed Ali Mazidi, Janice Gillispie Mazidi and Rolin D. McKinlay, “The 8051 Microcontroller and Embedded Systems”, Pearson Education, Second edition, 2007.


UNIT-I EMBEDDED COMPUTING

Challenges of Embedded Systems – Embedded system design process. Embedded processors –

8051 Microcontroller, ARM processor – Architecture, Instruction sets and programming.

1.1 INTRODUCTION

Embedded system: An embedded system is a special-purpose computer system designed to perform a dedicated function. Unlike a general-purpose computer, such as a personal computer, an embedded system performs one or a few pre-defined tasks, usually with very specific requirements, and often includes task-specific hardware and mechanical parts not usually found in a general-purpose computer.

Embedded computer system: Any device that includes a programmable computer but is not itself intended to be a general-purpose computer is called an embedded computer system.

Embedding computers: Computers have been embedded into applications since the earliest days of computing. e.g) Whirlwind, a computer designed at MIT in the late 1940s and early 1950s. Whirlwind was also the first computer designed to support real-time operation and was originally conceived as a mechanism for controlling an aircraft simulator.

Microprocessor: A microprocessor is a single-chip CPU. The first microprocessor, the Intel 4004, was designed for an embedded application, namely, a calculator. Microprocessors come in many different levels of sophistication; they are usually classified by their word size. Microprocessors execute programs very efficiently.

 An 8-bit microcontroller is designed for low-cost applications and includes on-board memory and I/O devices;
 A 16-bit microcontroller is often used for more sophisticated applications that may require either longer word lengths or off-chip I/O and memory;
 A 32-bit RISC microprocessor offers very high performance for computation-intensive applications.

Why use microprocessors?
 Microprocessors are a very efficient way to implement digital systems.
 Microprocessors make it easier to design families of products that can be built to provide various feature sets at different price points and can be extended to provide new features to keep up with rapidly changing markets.

Characteristics of Embedded Computing Applications
Embedded computing is in many ways much more demanding than the sort of programs that you may have written for PCs or workstations. On the one hand, embedded computing systems have to provide sophisticated functionality:
■ Complex algorithms: The operations performed by the microprocessor may be very sophisticated. For example, the microprocessor that controls an automobile engine must perform complicated filtering functions to optimize the performance of the car while minimizing pollution and fuel utilization.
■ User interface: Microprocessors are frequently used to control complex user interfaces that may include multiple menus and many options. The moving maps in Global Positioning System (GPS) navigation are good examples of sophisticated user interfaces.

Embedded computing operations must often meet deadlines:
■ Real time: Many embedded computing systems have to perform in real time: if the data is not ready by a certain deadline, the system breaks. In some cases, failure to meet a deadline is unsafe and can even endanger lives. In other cases, missing a deadline does not create safety problems but does create unhappy customers: missed deadlines in printers, for example, can result in scrambled pages.
■ Multirate: Many embedded computing systems have several real-time activities going on at the same time. They may simultaneously control some operations that run at slow rates and others that run at high rates. Multimedia applications are prime examples of multirate behavior. The audio and video portions of a multimedia stream run at very different rates, but they must remain closely synchronized.
Costs of various sorts are also very important:
■ Manufacturing cost: The total cost of building the system is very important in many cases. Manufacturing cost is determined by many factors, including the type of microprocessor used, the amount of memory required, and the types of I/O devices.


■ Power and energy: Power consumption directly affects the cost of the hardware, since a larger power supply may be necessary. Energy consumption affects battery life, which is important in many applications, as well as heat dissipation, which can be important even in desktop applications.

1.2 CHALLENGES IN EMBEDDED COMPUTING SYSTEM DESIGN
The following external constraints are one important source of difficulty in embedded system design.
 How much hardware do we need? We have a great deal of control over the amount of computing power we apply to our problem. We can not only select the type of microprocessor used, but also select the amount of memory, the peripheral devices, and more. With too little hardware the system fails to meet its deadlines; with too much hardware it becomes too expensive.
 How do we meet deadlines? It is entirely possible that increasing the CPU clock rate may not make enough difference to execution time, since the program’s speed may be limited by the memory system.
 How do we minimize power consumption? In battery-powered applications, power consumption is extremely important. Even in non-battery applications, excessive power consumption can increase heat dissipation. One way to make a digital system consume less power is to make it run more slowly, but naively slowing down the system can obviously lead to missed deadlines.
 How do we design for upgradability? The hardware platform may be used over several product generations or for several different versions of a product in the same generation, with few or no changes.
 Does it really work? Reliability is always important when selling products: customers rightly expect that products they buy will work.
The sources that make the design so difficult are:
■ Complex testing: Exercising an embedded system is generally more difficult than typing in some data. The timing of data is often important, meaning that we cannot separate the testing of an embedded computer from the machine in which it is embedded.

■ Limited observability and controllability: Embedded computing systems usually do not come with keyboards and screens. This makes it more difficult to see what is going on and to affect the system’s operation. We may be forced to watch the values of electrical signals on the microprocessor pins, for example, to know what is going on inside the system. Moreover, in real-time applications we may not be able to easily stop the system to see what is going on inside.
■ Restricted development environments: The development environments for embedded systems (the tools used to develop software and hardware) are often much more limited than those available for PCs and workstations.

1.3 THE EMBEDDED SYSTEM DESIGN PROCESS

The overview of the embedded system design process is aimed at two objectives. First, it will give us an introduction to the various steps in embedded system design before we delve into them in more detail. Second, it will allow us to consider the design methodology itself. Figure 1.1 summarizes the major steps in the embedded system design process. In this top–down view, we start with the system requirements; in the next step comes the specification.

Fig 1.3.1 Embedded system design process

Top–down Design—we will begin with the most abstract description of the system and conclude with concrete details. The alternative is a bottom–up view in which we start with components to build a system. Bottom–up design steps are shown in the figure as dashed-line


arrows. We need bottom–up design because we do not have perfect insight into how later stages of the design process will turn out. During the design process we have to consider the major goals of the design such as ■ manufacturing cost; ■ performance (both overall speed and deadlines); and ■ power consumption. Design steps are detailed below:

1.3.1 A.Requirements:

Requirements may be functional or nonfunctional. We must capture the functions of the embedded system, but a functional description is often not sufficient. Typical nonfunctional requirements include:
1. Performance: The speed of the system is often a major consideration both for the usability of the system and for its ultimate cost. Performance may be a combination of soft performance metrics, such as approximate time to perform a user-level function, and hard deadlines by which a particular operation must be completed.
2. Cost: The target cost or purchase price for the system is almost always a consideration. Cost typically has two major components:
o Manufacturing cost includes the cost of components and assembly;

o Nonrecurring engineering (NRE) costs include the personnel and other costs of designing the system.
3. Physical size and weight: The physical aspects of the final system can vary greatly depending upon the application. e.g) An industrial control system for an assembly line may be designed to fit into a standard-size rack with no strict limitations on weight, but a handheld device typically has tight requirements on both size and weight that can ripple through the entire system design.
4. Power consumption: Power, of course, is important in battery-powered systems and is often important in other applications as well. Power can be specified in the requirements stage in terms of battery life.
A mock-up can also be used to capture requirements. The mock-up may use canned data to simulate functionality in a restricted demonstration, and it may be executed on a PC or a workstation. But it should give the


customer a good idea of how the system will be used and how the user can react to it. Physical, nonfunctional models of devices can also give customers a better idea of characteristics such as size and weight.
e.g) Requirements of a GPS moving map:
The moving map is a handheld device that displays for the user a map of the terrain around the user’s current position; the map display changes as the user and the map device change position. The moving map obtains its position from the GPS, a satellite-based navigation system.

The moving map display might look something like the following figure

Fig.1.3.1.1 G.P.S Moving map

■ Functionality: This system is designed for highway driving and similar uses, not nautical or aviation uses that require more specialized databases and functions. The system should show major roads and other landmarks available in standard topographic databases.
■ User interface: The screen should have at least 400 × 600 pixel resolution. The device should be controlled by no more than three buttons.
■ Performance: The map should scroll smoothly. Upon power-up, a display should take no more than one second to appear, and the system should be able to verify its position and display the current map within 15 s.
■ Cost: The selling cost (street price) of the unit should be no more than $100.
■ Physical size and weight: The device should fit comfortably in the palm of the hand.
■ Power consumption: The device should run for at least eight hours on four AA batteries.


Fig.1.3.1.2 Requirements form for GPS Moving map

1.3.2 B.Specification

The specification is more precise: it serves as the contract between the customer and the architects. The specification should be understandable enough that someone can verify that it meets system requirements and the overall expectations of the customer. It should also be unambiguous: the specification must be carefully written so that it accurately reflects the customer’s requirements and does so in a way that can be clearly followed during design. A specification of the GPS system would include several components:
 Data received from the GPS satellite constellation
 Map data
 User interface
 Operations that must be performed to satisfy customer requests
 Background actions required to keep the system running, such as operating the GPS receiver
UML is a language often used for describing specifications.

1.3.3 C.Architecture Design

The specification does not say how the system does things, only what the system does. Describing how the system implements those functions is the purpose of the architecture. Figure 1.3 shows a sample system architecture in the form of a block diagram that shows major operations and data flows among them.


Fig 1.3.3.1 Block diagram of the moving map

To add implementation detail, we refine the system block diagram into two block diagrams: one for hardware and another for software. These two more refined block diagrams are shown in Figure 1.4. The hardware block diagram clearly shows that we have one central CPU surrounded by memory and I/O devices. In particular, we have chosen to use two memories: a frame buffer for the pixels to be displayed and a separate program/data memory for general use by the CPU.

Fig 1.3.3.2 Hardware design   Fig 1.3.3.3 Software design

1.3.4 D. Designing Hardware and Software Components

The component design effort builds those components in conformance to the architecture and specification. The components will in general include both hardware (FPGAs, boards, and so on) and software modules. Some of the components will be ready-made. In the moving map, the GPS receiver is a good example of a specialized component that will nonetheless be a predesigned, standard component. We can also make use of standard software modules. One good example is the topographic database.


1.3.5 E. System Integration

The components built are put together and we see how the system works. If we debug only a few modules at a time, we are more likely to uncover the simple bugs and be able to easily recognize them. Only by fixing the simple bugs early will we be able to uncover the more complex or obscure bugs.

1.4 EMBEDDED PROCESSORS

Embedded processors can be broken into two broad categories: ordinary microprocessors (μP) and microcontrollers (μC), which have many more peripherals on chip, reducing cost and size. In contrast to the personal computer and server markets, a fairly large number of basic CPU architectures are used: there are Von Neumann as well as various degrees of Harvard architectures, RISC as well as non-RISC and VLIW; word lengths vary from 4-bit to 64-bit and beyond (mainly in DSP processors), although the most typical remain 8/16-bit. Most architectures come in a large number of different variants and shapes, many of which are also manufactured by several different companies. e.g) ARM

1.4.1 8051 MICROCONTROLLER

8051 is an excellent device for building many embedded systems. One important factor is that the 8051 requires a minimum number of external components in order to operate. It is a well-tested design; introduced in its original form by Intel in 1980, the development costs of this device now start at less than US $1.00. At this price, you get a performance of around 1 million instructions per second, 256 bytes (not megabytes!) of on-chip RAM, 32 port pins and a serial interface.
Features of 8051: The main features of the 8051 microcontroller are:
 RAM – 128 Bytes (Data memory)
 ROM – 4 Kbytes (Program memory)
 Serial port – Using UART makes it simpler to interface for serial communication.

 Two 16-bit Timer/Counters
 Input/Output Pins – 4 Ports of 8 bits each on a single chip
 6 Interrupt Sources
 8-bit ALU (Arithmetic Logic Unit)

 Harvard Memory Architecture – It has a 16-bit Address bus (each of RAM and ROM) and an 8-bit Data Bus.
 The 8051 can execute 1 million one-cycle instructions per second with a clock frequency of 12 MHz.
This microcontroller is also called a “System on a chip” because it has all the features on a single chip. The block diagram of the 8051 microcontroller is shown in the figure below.

8051 Microcontroller Block diagram Pin configuration of 8051 The following is the Pin diagram of 8051 microcontroller.


A) BASIC PINS

PIN 9: PIN 9 is the reset pin which is used to reset the microcontroller’s internal registers and ports upon starting up. (Pin should be held high for 2 machine cycles.)

PINS 18 & 19: The 8051 has a built-in oscillator amplifier; hence we need only connect a crystal at these pins to provide clock pulses to the circuit.

PIN 40 and 20: Pins 40 and 20 are VCC and ground respectively. The 8051 chip needs +5V 500mA to function properly, although there are lower powered versions like the 2051 which is a scaled down version of the 8051 which runs on +3V.

PINS 29, 30 & 31: As described in the features of the 8051, this chip contains built-in program memory. In order to program this we need to supply a voltage of +12V at pin 31. If external memory is connected then PIN 31, also called EA/VPP, should be connected to ground to indicate the presence of external memory. PIN 30 is called ALE (address latch enable), which is used when multiple memory chips are connected to the microcontroller and only one of them needs to be selected. PIN 29 is called PSEN ("program store enable"). In order to use the external memory it is required to provide a low voltage (0) on both the PSEN and EA pins.

B) PORTS

There are 4 8-bit ports: P0, P1, P2 and P3.

PORT P1 (Pins 1 to 8): The port P1 is a general purpose input/output port which can be used for a variety of interfacing tasks. The other ports P0, P2 and P3 have dual roles or additional functions associated with them based upon the context of their usage.

PORT P3 (Pins 10 to 17): PORT P3 acts as a normal IO port, but Port P3 has additional functions such as, serial transmit and receive pins, 2 external interrupt pins, 2 external counter inputs, read and write pins for memory access.

PORT P2 (pins 21 to 28): PORT P2 can also be used as a general purpose 8 bit port when no external memory is present, but if external memory access is required then PORT P2 will act as an address bus in conjunction with PORT P0 to access external memory. PORT P2 acts as A8-A15, as can be seen from fig 1.1

PORT P0 (pins 32 to 39): PORT P0 can be used as a general purpose 8 bit port when no external memory is present, but if external memory access is required then PORT P0 acts as a multiplexed address and data bus that can be used to access external memory in conjunction with PORT P2. P0 acts as AD0-AD7.

C) OSCILLATOR CIRCUITS

The 8051 requires the existence of an external oscillator circuit. The oscillator circuit usually runs around 12MHz, although the 8051 (depending on which specific model) is capable of running at a maximum of 40MHz. Each machine cycle in the 8051 is 12 clock cycles, giving an effective cycle rate at 1MHz (for a 12MHz clock) to 3.33MHz (for the maximum 40MHz clock). The oscillator circuit generates the clock pulses so that all internal operations are synchronized.

The external interface of the Standard 8051

 Small 8051: Low-cost members of the 8051 family with a reduced number of port pins and no support for off-chip memory. Typical application: low-cost consumer goods.
 Standard 8051: The device from which the Small 8051s and the Extended 8051s are derived.
 Extended 8051: Members of the 8051 family with an extended range of on-chip facilities (e.g. CAN controllers, ADC, DAC, etc.), large numbers of port pins, and, in recent devices, support for large amounts of off-chip memory. Typical applications: industrial and automotive systems.

Reset requirements

A ‘reset routine’ must be run to place the hardware into an appropriate state before it can begin executing the user program. Running this reset routine takes time, and requires that the microcontroller’s oscillator is operating. Where the system is supplied by a robust power supply, which rapidly reaches its specified output voltage when switched on, rapidly decreases to 0V when switched off, and, while switched on, cannot ‘brown out’ (drop in voltage), then we can safely use low-cost reset hardware based on a capacitor and a resistor to ensure that the system will be reset correctly: this form of reset circuit is shown in the figure.


Figure: Reset requirements

Clock frequency and performance

All digital computer systems are driven by some form of oscillator circuit: the 8051 is certainly no exception. If the oscillator fails, the system will not function at all; if the oscillator runs irregularly, any timing calculations performed by the system will be inaccurate.

Fig: Clock frequency and performance

Memory Issues
A. Types of memory

 Dynamic RAM (DRAM)
 Static RAM (SRAM)
 Mask Read-Only Memory (ROM)
 Programmable Read-Only Memory (PROM)
 UV Erasable Programmable Read-Only Memory (UV EPROM)
 EEPROM and Flash ROM


Dynamic RAM vs Static RAM:
 Storage: DRAM is a read-write memory technology that uses a small capacitor to store information; SRAM is a read-write memory technology that uses a form of electronic flip-flop to store the information.
 Refresh: DRAM must be frequently refreshed to maintain the stored information; SRAM requires no refreshing.
 Cost and complexity: DRAM is less complex and lower cost; SRAM is more complex, and costs can be several times that of the corresponding size of DRAM.
 Speed: DRAM access time is large compared to SRAM; SRAM access times may be one-third those of DRAM.

 Mask Read-Only Memory (ROM): Mask ROM is – from the software developer’s perspective – read only. A ‘mask’ is provided by the company for which the chips are being produced. Such devices are therefore sometimes referred to as ‘factory programmed’.
 Programmable Read-Only Memory (PROM): PROM is a form of Write-Once, Read-Many (WORM) or ‘One-Time Programmable’ (OTP) memory. Basically, we use a device programmer to blow tiny ‘fuses’ in the device.
 UV Erasable Programmable Read-Only Memory (UV EPROM): UV EPROMs are programmed electrically. Unlike PROMs, they also have a quartz window which allows the memory to be erased by exposing the internals of the device to UV light.

 EEPROM and Flash ROM: Electrically-Erasable Programmable Read-Only Memories (EEPROMs) and ‘Flash’ ROMs are a more user-friendly form of ROM that can be both programmed and erased electrically. EEPROM and Flash ROM are very similar. EEPROMs can usually be reprogrammed on a byte-by-byte basis, and are often used to store passwords or other ‘persistent’ user data. Flash ROMs generally require a block-sized ‘erase’ operation before they can be programmed.


B) Memory organization and ‘hex’

All data items are represented in memory as binary codes, each containing a certain number of bits. To simplify the storage and retrieval of data items, these memory bits are organized into memory locations, each with a unique address. The other memory representations are shown below.

Fig: Memory Organization and Hex representation

C) The 8051 memory architecture
There are two distinct memory regions in an 8051 device: the DATA area and the CODE area.
 DATA memory: DATA memory is used to store variables and the program stack while the program is running. The DATA area will be implemented using some form of RAM. Most of the DATA area has a byte-oriented memory organization. However, within the DATA area is a 16-byte BDATA area which can also be accessed using bit addresses.

Care is therefore needed to avoid confusing a bit address such as 0x24 with the byte address 0x24.
 CODE memory: The CODE area is used to store the program code, usually in some form of ROM. The CODE area may also contain read-only variables (‘constants’), such as filter coefficients or data for speech playback.

D) 8-bit family, 16-bit address space

An ‘8-bit microcontroller’ refers to the size of the registers and data bus. This means that the family will handle 8-bit data very quickly and process 16-bit or 32-bit data rather less efficiently. The 16-bit address space means that the device can directly address 2^16 bytes of memory: that is, 64 kbytes.

E) I/O pins
Most 8051s have four 8-bit ports, giving a total of 32 pins you can individually read from or control. All of the ports are bidirectional: that is, they may be used for both input and output.
F) Timers/Counters
All members of the 8051 family have at least two timer/counters, known as Timer 0 and Timer 1; most also have an additional timer (Timer 2). These are 16-bit timers, so they can hold values from 0 to 65535 (decimal). There are many things we can do with such a timer:

 Use it to measure intervals of time, e.g. measure the duration of a function by noting the value of a timer at the beginning and end of the function call, and comparing the two results.
 Use it to generate precise hardware delays.
 Use it to generate ‘time out’ facilities: this is a key requirement in systems with real-time constraints.
 Most important of all, we can use it to generate regular ‘ticks’, and drive an operating system.
G. Interrupts
An interrupt is a hardware mechanism used to notify a processor that an ‘event’ has taken place: such events may be ‘internal’ events (such as the overflow of a timer) or ‘external’ events (such as the arrival of a character through a serial interface). The original ‘8051’ (‘8052’) architecture supported seven interrupt sources:
 Two or three timer/counter interrupts
 Two UART-related interrupts
 Two external interrupts
 The ‘power-on reset’ (POR) interrupt
In the figure, the system executes two (background) functions, Function 1 and Function 2. During the execution of Function 1, an interrupt is raised, and an

‘interrupt service routine’ (ISR1) deals with this event. After the execution of ISR1 is complete, Function 1 resumes its operation. During the execution of Function 2, another interrupt is raised, this time dealt with by ISR 2.

H. Serial interface
Such an interface is common in embedded processors and is widely used. Here are some examples:
 The serial port may be used to debug embedded applications, using a desktop PC.
 The serial port may be used to load code into flash memory for ‘in-circuit programming’.
 The serial port may be used to transfer data from embedded data acquisition systems to a PC, or to other embedded processors.
I. Power consumption
All modern implementations of 8051 processors have at least three operating modes:
 Normal mode
 Idle mode
 Power-down mode
The ‘Idle’ and ‘Power-Down’ modes are intended to save power at times when no processing is required. We generally aim for an average power consumption of less than 10 mA.
 Idle mode: In the idle mode the oscillator of the C501 continues to run, but the CPU is gated off from the clock signal. The interrupt system, the serial port and all timers are still connected to the clock. The idle mode is entered by setting the flag bit IDLE (PCON.0). The easiest way in ‘C’ is:

PCON |= 0x01; // Enter idle mode
There are two ways to exit idle mode: activating any enabled interrupt, or performing a hardware reset.
 Power-down mode: The power-down mode is entered by setting the flag bit PDE (PCON.1). This is done in ‘C’ as follows:
PCON |= 0x02; // Enter power-down mode
The only exit from power-down mode is a hardware reset.


1.4.2 ARM PROCESSOR

The ARM processor is widely used in cell phones and many other systems.

INTRODUCTION:
 Complex instruction set computers (CISC): These provided a variety of instructions that may perform very complex tasks, such as string searching; they also generally used a number of different instruction formats of varying lengths.
 Reduced instruction set computers (RISC): These computers tended to provide somewhat fewer and simpler instructions.
 Streaming data: Data sets that arrive continuously and periodically are called streaming data.
 Assembly language: Assembly language has the following features:
   One instruction appears per line.
   Labels, which give names to memory locations, start in the first column.
   Instructions must start in the second column or after, to distinguish them from labels.
   Comments run from some designated comment character (; in the case of ARM) to the end of the line.
Assemblers must also provide some pseudo-ops to help create complete assembly language programs. An example of a pseudo-op is one that allows data values to be loaded into memory locations. These allow constants, for example, to be set into memory. The ARM % pseudo-op allocates a block of memory of the size specified by the operand and initializes those locations to zero.
e.g)
label1 ADR r4,c
       LDR r0,[r4] ; a comment
       ADR r4,d
       LDR r1,[r4]
       SUB r0,r0,r1 ; another comment


ARM PROCESSOR:

ARM is actually a family of RISC architectures that have been developed over many years. ARM does not manufacture its own VLSI devices; rather, it licenses its architecture to companies who either manufacture the CPU itself or integrate the ARM processor into a larger system. The textual description of instructions, as opposed to their binary representation, is called an assembly language. ARM instructions are written one per line, starting after the first column. Comments begin with a semicolon and continue to the end of the line. A label, which gives a name to a memory location, comes at the beginning of the line, starting in the first column.
e.g)
      LDR r0,[r8] ; a comment
label ADD r4,r0,r1
1. Processor and Memory Organization
The ARM architecture supports two basic types of data:
 The standard ARM word is 32 bits long.
 The word may be divided into four 8-bit bytes.
ARM7 allows addresses up to 32 bits long. An address refers to a byte, not a word. Therefore, word 0 in the ARM address space is at location 0, word 1 is at 4, word 2 is at 8, and so on.

The ARM processor can be configured at power-up to address the bytes in a word in either:
 little-endian mode (with the lowest-order byte residing in the low-order bits of the word), or
 big-endian mode (with the lowest-order byte stored in the highest bits of the word).

Fig: Byte organizations within an ARM word.


Data Operations:

Arithmetic and logical operations in C are performed on variables, which are implemented as memory locations. Consider a fragment of C code with data declarations and several assignment statements; the variables a, b, c, x, y, and z all become data locations in memory. In most cases data are kept relatively separate from instructions in the program’s memory image. In the ARM processor, arithmetic and logical operations cannot be performed directly on memory locations. While some processors allow such operations to directly reference main memory, ARM is a load-store architecture—data operands must first be loaded into the CPU, and results must be stored back to main memory. For example, the C fragment:

int a, b, c, x, y, z;
x = (a + b) - c;
y = a*(b + c);
z = (a << 2) | (b & 15);

Following Fig shows the registers in the basic ARM programming model. ARM has 16 general-purpose registers, r0 through r15. Except for r15, they are identical—any operation that can be done on one of them can be done on the others as well. The r15 register has the same capabilities as the other registers, but it is also used as the program counter (PC).

The other important basic register in the programming model is the current program status register (CPSR). This register is set automatically during every arithmetic, logical, or shifting operation. The top four bits of the CPSR hold the following useful information about the results of that arithmetic/logical operation:


■ The negative (N) bit is set when the result is negative in two’s-complement arithmetic.

■ The zero (Z) bit is set when every bit of the result is zero.

■ The carry (C) bit is set when there is a carry out of the operation.

■ The overflow (V) bit is set when an arithmetic operation results in an overflow.

Instruction Sets in ARM

Following figure summarizes the ARM move instructions. The instruction MOV r0, r1 sets the value of r0 to the current value of r1. The MVN instruction complements the operand bits (one’s complement) during the move.

Fig: ARM data Instructions

LDRB and STRB load and store bytes rather than whole words, while LDRH and STRH operate on half-words and LDRSH extends the sign bit on loading. An ARM address may be 32 bits long. The ARM load and store instructions do not directly refer to main memory addresses, since a 32-bit address would not fit into an instruction that includes an opcode and operands. Instead, the ARM uses register-indirect addressing.


C assignments in ARM instructions

We will use the assignments of Figure 2.7. The semicolon (;) begins a comment after an instruction, which continues to the end of that line. The statement

x = (a + b) - c;

can be implemented by using r0 for a, r1 for b, r2 for c, and r3 for x. We also need registers for indirect addressing. In this case, we will reuse the same indirect addressing register, r4, for each variable load. The code must load the values of a, b, and c into these registers before performing the arithmetic, and it must store the value of x back to memory when it is done. This code performs the necessary steps:

ADR r4,a     ; get address for a
LDR r0,[r4]  ; get value of a
ADR r4,b     ; get address for b, reusing r4
LDR r1,[r4]  ; load value of b
ADD r3,r0,r1 ; set intermediate result for x to a + b
ADR r4,c     ; get address for c
LDR r2,[r4]  ; get value of c
SUB r3,r3,r2 ; complete computation of x
ADR r4,x     ; get address for x
STR r3,[r4]  ; store x at proper location

Flow of Control

The B (branch) instruction is the basic mechanism in ARM for changing the flow of control. The address that is the destination of the branch is often called the branch target. Branches are PC-relative—the branch specifies the offset from the current PC value to the branch target. The offset is in words, but because the ARM is byte-addressable, the offset is multiplied by four (shifted left two bits, actually) to form a byte address. Thus, the instruction B #100 will add 400 to the current PC value.


Implementing an if statement in ARM

We will use the following if statement as an example:

if (a < b) {
   x = 5;
   y = c + d;
}
else x = c - d;

The implementation uses two blocks of code, one for the true case and another for the false case. A branch may either fall through to the true case or branch to the false case:

; compute and test the condition
       ADR r4,a     ; get address for a
       LDR r0,[r4]  ; get value of a
       ADR r4,b     ; get address for b
       LDR r1,[r4]  ; get value of b
       CMP r0,r1    ; compare a < b
       BGE fblock   ; if a >= b, take branch
; the true block follows
       MOV r0,#5    ; generate value for x
       ADR r4,x     ; get address for x
       STR r0,[r4]  ; store value of x
       ADR r4,c     ; get address for c
       LDR r0,[r4]  ; get value of c
       ADR r4,d     ; get address for d
       LDR r1,[r4]  ; get value of d
       ADD r0,r0,r1 ; compute c + d
       ADR r4,y     ; get address for y
       STR r0,[r4]  ; store value of y
       B after      ; branch around the false block
; the false block follows
fblock ADR r4,c     ; get address for c
       LDR r0,[r4]  ; get value of c
       ADR r4,d     ; get address for d
       LDR r1,[r4]  ; get value of d
       SUB r0,r0,r1 ; compute c - d
       ADR r4,x     ; get address for x
       STR r0,[r4]  ; store value of x
after  ...          ; code after the if statement


UNIT II MEMORY AND INPUT / OUTPUT MANAGEMENT

Programming Input and Output – Memory system mechanisms – Memory and I/O devices and interfacing – Interrupts handling.

2.1 PROGRAMMING INPUT AND OUTPUT

The basic techniques for I/O programming can be understood relatively independently of the instruction set. In this section, we cover the basics of I/O programming and place them in the contexts of both the ARM and C55x. We begin by discussing the basic characteristics of I/O devices so that we can understand the requirements they place on programs that communicate with them.

FIGURE 2.1 Structure of a typical I/O device

INPUT AND OUTPUT DEVICES

Input and output devices usually have some analog or non-electronic component—for instance, a disk drive has a rotating disk and analog read/write electronics. But the digital logic in the device that is most closely connected to the CPU very strongly resembles the logic you would expect in any computer system. Figure 2.1 shows the structure of a typical I/O device and its relationship to the CPU. The interface between the CPU and the device’s internals (e.g., the rotating disk and read/write electronics in a disk drive) is a set of registers. The CPU talks to the device by reading and writing the registers.


Devices typically have several registers:

■ Data registers hold values that are treated as data by the device, such as the data read or written by a disk.

■ Status registers provide information about the device’s operation, such as whether the current transaction has completed.

Some registers may be read-only, such as a status register that indicates when the device is done, while others may be readable or writable. Application Example 2.1 describes a classic I/O device.

Application Example

The 8251 UART

The 8251 UART (Universal Asynchronous Receiver/Transmitter) [Int82] is the original device used for serial communications, such as the serial port connections on PCs. The 8251 was introduced as a stand-alone integrated circuit for early microprocessors. Today, its functions are typically subsumed by a larger chip, but these more advanced devices still use the basic programming interface defined by the 8251.

The UART is programmable for a variety of transmission and reception parameters. However, the basic format of transmission is simple. Data are transmitted as streams of characters, each of which has the following form:

Every character starts with a start bit (a 0) and a stop bit (a 1). The start bit allows the receiver to recognize the start of a new character; the stop bit ensures that there will be a transition at the start of the stop bit. The data bits are sent as high and low voltages at a uniform rate. That rate is known as the baud rate; the period of one bit is the inverse of the baud rate.


Before transmitting or receiving data, the CPU must set the UART’s mode registers to correspond to the data line’s characteristics. The parameters for the serial port are familiar from the parameters for a serial communications program:

■ the baud rate;

■ the number of bits per character (5 through 8);

■ whether parity is to be included and whether it is even or odd; and

■ the length of a stop bit (1, 1.5, or 2 bits).


The UART includes one 8-bit register that buffers characters between the UART and the CPU bus. The Transmitter Ready output indicates that the transmitter is ready to accept a data character; the Transmitter Empty signal goes high when the UART has no characters to send. On the receiver side, the Receiver Ready pin goes high when the UART has a character ready to be read by the CPU.

INPUT AND OUTPUT PRIMITIVES

Microprocessors can provide programming support for input and output in two ways: I/O instructions and memory-mapped I/O. Some architectures, such as the Intel x86, provide special instructions (in and out in the case of the Intel x86) for input and output. These instructions provide a separate address space for I/O devices. But the most common way to implement I/O is by memory mapping—even CPUs that provide I/O instructions can also implement memory-mapped I/O. As the name implies, memory-mapped I/O provides addresses for the registers in each I/O device. Programs use the CPU’s normal read and write instructions to communicate with the devices.

Example: Memory-mapped I/O on ARM

We can use the EQU pseudo-op to define a symbolic name for the memory location of our I/O device:

DEV1 EQU 0x1000

Given that name, we can use the following standard code to read and write the device register:

LDR r1,#DEV1 ; set up device address
LDR r0,#8    ; set up value to write
STR r0,[r1]  ; write 8 to device

How can we directly write I/O devices in a high-level language like C? When we define and use a variable in C, the compiler hides the variable’s address from us. But we can use pointers to manipulate addresses of I/O devices. The traditional names for functions that read and write arbitrary memory locations are peek and poke.

The peek function can be written in C as:

int peek(char *location) {
   return *location; /* de-reference location pointer */
}

The argument to peek is a pointer that is de-referenced by the C * operator to read the location. Thus, to read a device register we can write:

#define DEV1 0x1000
dev_status = peek(DEV1); /* read device register */

The poke function can be implemented as:

void poke(char *location, char newval) {
   (*location) = newval; /* write to location */
}

To write to the status register, we can use the following code:

poke(DEV1,8); /* write 8 to device register */

These functions can, of course, be used to read and write arbitrary memory locations, not just devices.

BUSY-WAIT I/O

The most basic way to use devices in a program is busy-wait I/O. Devices are typically slower than the CPU and may require many cycles to complete an operation. If the CPU is performing multiple operations on a single device, such as writing several characters to an output device, then it must wait for one operation to complete before starting the next one. (If we try to start writing the second character before the device has finished with the first one, for example, the device will probably never print the first character.) Asking an I/O device whether it is finished by reading its status register is often called polling. The following example illustrates busy-wait I/O.

Busy-wait I/O programming

In this example we want to write a sequence of characters to an output device. The device has two registers: one for the character to be written and a status register. The status register’s value is 1 when the device is busy writing and 0 when the write transaction has completed. We will use the peek and poke functions to write the busy-wait routine in C. First, we define symbolic names for the register addresses:

#define OUT_CHAR 0x1000   /* output device character register */
#define OUT_STATUS 0x1001 /* output device status register */

The sequence of characters is stored in a standard C string, which is terminated by a null (0) character. We can use peek and poke to send the characters and wait for each transaction to complete:

char *mystring = "Hello, world."; /* string to write */
char *current_char;               /* pointer to current position in string */
current_char = mystring;          /* point to head of string */
while (*current_char != '\0') {   /* until null character */


   poke(OUT_CHAR, *current_char); /* send character to device */
   while (peek(OUT_STATUS) != 0); /* keep checking status */

   current_char++; /* update character pointer */
}

The outer while loop sends the characters one at a time. The inner while loop checks the device status—it implements the busy-wait function by repeatedly checking the device status until the status changes to 0.

Example: Copying characters from input to output using busy-wait I/O

We want to repeatedly read a character from the input device and write it to the output device. First, we need to define the addresses for the device registers:

#define IN_DATA 0x1000
#define IN_STATUS 0x1001
#define OUT_DATA 0x1100
#define OUT_STATUS 0x1101

The input device sets its status register to 1 when a new character has been read; we must set the status register back to 0 after the character has been read so that the device is ready to read another character. When writing, we must set the output status register to 1 to start writing and wait for it to return to 0. We can use peek and poke to repeatedly perform the read/write operation:

while (TRUE) {                   /* perform operation forever */
   /* read a character into achar */
   while (peek(IN_STATUS) == 0); /* wait until ready */
   achar = (char)peek(IN_DATA);  /* read the character */
   /* write achar */
   poke(OUT_DATA, achar);
   poke(OUT_STATUS, 1);          /* turn on device */
   while (peek(OUT_STATUS) != 0); /* wait until done */
}


2.2 MEMORY SYSTEM MECHANISMS

Modern microprocessors do more than just read and write a monolithic memory. Architectural features improve both the speed and capacity of memory systems. Microprocessor clock rates are increasing at a faster rate than memory speeds, such that memories are falling further and further behind microprocessors every day. As a result, computer architects resort to caches to increase the average performance of the memory system. Although memory capacity is increasing steadily, program sizes are increasing as well, and designers may not be willing to pay for all the memory demanded by an application. Modern memory management units (MMUs) perform address translation that provides a larger virtual address space in a small physical memory. In this section, we review both caches and MMUs.

Caches

Caches are widely used to speed up memory system performance. Many microprocessor architectures include caches as part of their definition. The cache speeds up average memory access time when properly used, but it increases the variability of memory access times—accesses in the cache will be fast, while accesses to locations not cached will be slow. This variability in performance makes it especially important to understand how caches work so that we can better understand how to predict cache performance and factor variability into system design. A cache is a small, fast memory that holds copies of some of the contents of main memory. Because the cache is fast, it provides higher-speed access for the CPU; but since it is small, not all requests can be satisfied by the cache, forcing the system to wait for the slower main memory. Caching makes sense when the CPU is using only a relatively small set of memory locations at any one time; the set of active locations is often called the working set. Figure 3.6 shows how the cache supports reads in the memory system. A cache controller mediates between the CPU and the memory system comprised of the main memory. The cache controller sends a memory request to the cache and main memory. If the requested location is in the cache, the cache controller forwards the location’s contents to the CPU and aborts the main memory request; this condition is known as a cache hit. If the location is not in the cache, the controller waits for the value from main memory and forwards it to the CPU; this situation is known as a cache miss. We can classify cache misses into several types depending on the situation that generated them:

■ A compulsory miss (also known as a cold miss) occurs the first time a location is used,

■ A capacity miss is caused by a too-large working set, and

■ A conflict miss happens when two locations map to the same location in the cache.


Even before we consider ways to implement caches, we can write some basic formulas for memory system performance. Let h be the hit rate, the probability that a given memory location is in the cache. It follows that 1 - h is the miss rate, or the probability that the location is not in the cache. Then we can compute the average memory access time as

tav = h * tcache + (1 - h) * tmain

Where tcache is the access time of the cache and tmain is the main memory access time. The memory access times are basic parameters available from the memory manufacturer. The hit rate depends on the program being executed and the cache organization, and is typically measured using simulators. The best-case memory access time (ignoring cache controller overhead) is tcache, while the worst-case access time is tmain. Given that tmain is typically 50–60 ns for DRAM, while tcache is at most a few nanoseconds, the spread between worst-case and best-case memory delays is substantial. Modern CPUs may use multiple levels of cache as shown in Figure 3.7. The first-level cache (commonly known as L1 cache) is closest to the CPU, the second-level cache (L2 cache) feeds the first-level cache, and so on. The second-level cache is much larger but is also slower. If h1 is the first-level hit rate and h2 is the rate at which accesses hit the second-level cache but not the first-level cache, then the average access time for a two-level cache system is

tav = h1 * tL1 + h2 * tL2 + (1 - h1 - h2) * tmain

As the program’s working set changes, we expect locations to be removed from the cache to make way for new locations. When set-associative caches are used, we have to think about what happens when we throw out a value from the cache to make room for a new value. We do not have this problem in direct-mapped caches because every location maps onto a unique block, but in a set-associative cache we must decide which set will have its block thrown out to make way for the new block. One possible replacement policy is least recently used (LRU), that is, throw out the block that has been used farthest in the past. We can add relatively small amounts of hardware to the cache to keep track of the time since the last access for each block. Another policy is random replacement, which requires even less hardware to implement.

The simplest way to implement a cache is a direct-mapped cache, as shown in the following figure. The cache consists of cache blocks, each of which includes a tag to show which memory location is represented by this block, a data field holding the contents of that memory, and a valid tag to show whether the contents of this cache block are valid. An address is divided into three sections. The index is used to select which cache block to check. The tag is compared against the tag value in the block selected by the index. If the address tag matches the tag value in the block, that block includes the desired memory location. If the length of the data field is longer than the minimum addressable unit, then the lowest bits of the address are used as an offset to select the required value from the data field. Given the structure of the cache, there is only one block that must be checked to see whether a location is in the cache—the index uniquely determines that block. If the access is a hit, the data value is read from the cache. Writes are slightly more complicated than reads because we have to update main memory as well as the cache. There are several methods by which we can do this. The simplest scheme is known as write-through—every write changes both the cache and the corresponding main memory location (usually through a write buffer). This scheme ensures that the cache and main memory are consistent, but may generate some additional main memory traffic. We can reduce the number of times we write to main memory by using a write-back policy: If we write only when we remove a location from the cache, we eliminate the writes when a location is written several times before it is removed from the cache.


The direct-mapped cache is both fast and relatively low cost, but it does have limits in its caching power due to its simple scheme for mapping the cache onto main memory. Consider a direct-mapped cache with four blocks, in which locations 0, 1, 2,and 3 all map to different blocks. But locations 4, 8,12,…all map to the same block as location 0;locations 1, 5, 9,13,…all map to a single block; and so on. If two popular locations in a program happen to map onto the same block, we will not gain the full benefits of the cache. As seen in Section 5.6, this can create program performance problems.

The limitations of the direct-mapped cache can be reduced by going to the set-associative cache structure shown in Figure 3.9. A set-associative cache is characterized by the number of banks or ways it uses, giving an n-way set-associative cache. A set is formed by all the blocks (one for each bank) that share the same index. Each set is implemented with a direct-mapped cache. A cache request is broadcast to all banks simultaneously. If any of the sets has the location, the cache reports a hit. Although memory locations map onto blocks using the same function, there are n separate blocks for each set of locations. Therefore, we can simultaneously cache several locations that happen to map onto the same cache block. The set-associative cache structure incurs a little extra overhead and is slightly slower than a direct-mapped cache, but the higher hit rates that it can provide often compensate. The set-associative cache generally provides higher hit rates than the direct-mapped cache because conflicts between a small number of locations can be resolved within the cache. The set-associative cache is somewhat slower, so the CPU designer has to be careful that it doesn’t slow down the CPU’s cycle time too much. A more important problem with set-associative caches for embedded program design is predictability.


Because the time penalty for a cache miss is so severe, we often want to make sure that critical segments of our programs have good behavior in the cache. It is relatively easy to determine when two memory locations will conflict in a direct-mapped cache. Conflicts in a set- associative cache are more subtle, and so the behavior of a set-associative cache is more difficult to analyze for both humans and programs. Example 3.8 compares the behavior of direct-mapped and set-associative caches.

Example Direct-mapped vs. set-associative caches For simplicity, let’s consider a very simple caching scheme. We use 2 bits of the address as the tag. We compare a direct-mapped cache with four blocks and a two-way set-associative cache with four sets, and we use LRU replacement to make it easy to compare the two caches. A 3-bit address is used for simplicity. The contents of the memory follow:

We will give each cache the same pattern of addresses (in binary to simplify picking out the index): 001, 010, 011, 100, 101, and 111. To understand how the direct-mapped cache works, let’s see how its state evolves.


We can use a similar procedure to determine what ends up in the two-way set-associative cache. The only difference is that we have some freedom when we have to replace a block with new data. To make the results easy to understand, we use a least-recently-used replacement policy. For starters, let’s make each way the size of the original direct-mapped cache. The final state of the two-way set-associative cache follows:

Of course, this is not a fair comparison for performance because the two-way set associative cache has twice as many entries as the direct-mapped cache. Let’s use a two-way, set-associative cache with two sets, giving us four blocks, the same number as in the direct- mapped cache. In this case, the index size is reduced to 1 bit and the tag grows to 2 bits.

In this case, the cache contents are significantly different than for either the direct- mapped cache or the four-block, two-way set-associative cache. The CPU knows when it is fetching an instruction (the PC is used to calculate the address, either directly or indirectly) or data. We can therefore choose whether to cache instructions, data, or both. If cache space is limited, instructions are the highest priority for caching because they will usually provide the highest hit rates.

A cache that holds both instructions and data is called a unified cache.

Various ARM implementations use different cache sizes and organizations [Fur96]. The ARM600 includes a 4-KB, 64-way (wow!) unified instruction/data cache. The StrongARM uses a 16-KB, 32-way instruction cache with a 32-byte block and a 16-KB, 32-way data cache with a 32-byte block; the data cache uses a write-back strategy.


The C5510, one of the models of C55x, uses a 16-K byte instruction cache organized as a two-way set-associative cache with four 32-bit words per line. The instruction cache can be disabled by software if desired. It also includes two RAM sets that are designed to hold large contiguous blocks of code. Each RAM set can hold up to 4-K bytes of code organized as 256 lines of four 32-bit words per line. Each RAM has a tag that specifies what range of addresses is in the RAM; it also includes a tag valid field to show whether the RAM is in use and line valid bits for each line.

Memory Management Units and Address Translation

An MMU translates addresses between the CPU and physical memory. This translation process is often known as memory mapping since addresses are mapped from a logical space into a physical space. MMUs in embedded systems appear primarily in the host processor. It is helpful to understand the basics of MMUs for embedded systems complex enough to require them. Many DSPs, including the C55x, do not use MMUs. Since DSPs are used for compute-intensive tasks, they often do not require the hardware assist for logical address spaces. Early computers used MMUs to compensate for limited address space in their instruction sets. When memory became cheap enough that physical memory could be larger than the address space defined by the instructions, MMUs allowed software to manage multiple programs in a single physical memory, each with its own address space. Because modern CPUs typically do not have this limitation, MMUs are used to provide virtual addressing. As shown in Figure 3.10, the MMU accepts logical addresses from the CPU. Logical addresses refer to the program’s abstract address space but do not correspond to actual RAM locations. The MMU translates them, using tables, to physical addresses that do correspond to RAM. By changing the MMU’s tables, you can change the physical location at which the program resides without modifying the program’s code or data. (We must, of course, move the program in main memory to correspond to the memory mapping change.) Furthermore, if we add a secondary storage unit such as flash or a disk, we can eliminate parts of the program from main memory. In a virtual memory system, the MMU keeps track of which logical addresses are actually resident in main memory; those that do not reside in main memory are kept on the secondary storage device.


When the CPU requests an address that is not in main memory, the MMU generates an exception called a page fault. The handler for this exception executes code that reads the requested location from the secondary storage device into main memory. The program that generated the page fault is restarted by the handler only after the required memory has been read back into main memory and the MMU’s tables have been updated to reflect the changes. Of course, loading a location into main memory will usually require throwing something out of main memory. The displaced memory is copied into secondary storage before the requested location is read in. As with caches, LRU is a good replacement policy. There are two styles of address translation: segmented and paged. Each has advantages, and the two can be combined to form a segmented, paged addressing scheme. As illustrated in Figure 3.11, segmenting is designed to support a large, arbitrarily sized region of memory, while pages describe small, equally sized regions. A segment is usually described by its start address and size, allowing different segments to be of different sizes. Pages are of uniform size, which simplifies the hardware required for address translation. A segmented, paged scheme is created by dividing each segment into pages and using two steps for address translation. Paging introduces the possibility of fragmentation as program pages are scattered around physical memory. In a simple segmenting scheme, shown in Figure 3.12, the MMU would maintain a segment register that describes the currently active segment. This register would point to the base of the current segment. The address extracted from an instruction (or from any other source for addresses, such as a register) would be used as the offset for the address. The physical address is formed by adding the segment base to the offset.
Most segmentation schemes also check the physical address against the upper limit of the segment by extending the segment register to include the segment size and comparing the offset to the allowed size. The translation of paged addresses requires more MMU state but a simpler calculation. As shown in Figure 3.13, the logical address is divided into two sections, including a page number and an offset. The page number is used as an index into a page table, which stores the physical address for the start of each page. However, since all pages have the same size and it is easy to ensure that page boundaries fall on the proper boundaries, the MMU simply needs to


concatenate the top bits of the page starting address with the bottom bits from the page offset to form the physical address. Pages are small, typically between 512 bytes and 4 KB. As a result, the page table is large for an architecture with a large address space. The page table is normally kept in main memory, which means that an address translation requires a memory access.


The page table may be organized in several ways, as shown in Figure 3.14. The simplest scheme is a flat table. The table is indexed by the page number and each entry holds the page descriptor. A more sophisticated method is a tree. The root entry of the tree holds pointers to pointer tables at the next level of the tree; each pointer table is indexed by a part of the page number. We eventually (after three levels, in this case) arrive at a descriptor table that includes the page descriptor we are interested in. A tree-structured page table incurs some overhead for the pointers, but it allows us to build a partially populated tree. If some part of the address space is not used, we do not need to build the part of the tree that covers it.


The efficiency of paged address translation may be increased by caching page translation information. A cache for address translation is known as a translation lookaside buffer (TLB). The MMU reads the TLB to check whether a page number is currently in the TLB cache and, if so, uses that value rather than reading from memory. Virtual memory is typically implemented in a paging or segmented, paged scheme so that only page-sized regions of memory need to be transferred on a page fault. Some extensions to both segmenting and paging are useful for virtual memory: ■ at minimum, a present bit is necessary to show whether the logical segment or page is currently in physical memory.

2.3 MEMORY DEVICES

In this section, we introduce the basic types of memory components that are commonly used in embedded systems. Now that we understand the operation of the bus, we are able to understand the pinouts of these memories and how values are read and written. We also need to understand the varieties of memory cells that are used to build memories. There are several varieties of both read-only and read/write memories, each with its own advantages. After discussing some basic characteristics of memories, we describe RAMs and then ROMs. Memory Device Organization The most basic way to characterize a memory is by its capacity, such as 256 MB. However, manufacturers usually make several versions of a memory of a given size, each with a different data width. For example, a 256-MB memory may be available in two versions: ■ As a 64 M x 4-bit array, a single memory access obtains a 4-bit data item, with a maximum of 2^26 different addresses. ■ As a 32 M x 8-bit array, a single memory access obtains an 8-bit data item, with a maximum of 2^25 different addresses. The height/width ratio of a memory is known as its aspect ratio. The best aspect ratio depends on the amount of memory required. Internally, the data are stored in a two-dimensional array of memory cells as shown in Figure 4.15. Because the array is stored in two dimensions, the n-bit address received by the chip is split into a row and a column address (with n = r + c).
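The row/column split (n = r + c) can be shown with a small sketch; the 13-bit row and 13-bit column widths below are invented for illustration of a 2^26-address part and do not describe any particular device.

```c
#include <stdint.h>

/* Illustrative address split for a part with 2^26 addresses, assuming a
 * square array: 13 row bits and 13 column bits (n = r + c = 26). */
#define COL_BITS 13u

uint32_t row_of(uint32_t addr) { return addr >> COL_BITS; }          /* top r bits   */
uint32_t col_of(uint32_t addr) { return addr & ((1u << COL_BITS) - 1u); } /* bottom c bits */
```

A DRAM presents the row address (with RAS) and then the column address (with CAS) on the same pins, which is why the chip itself performs this split.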


As shown in Figure 4.16, transitions on the control signals are related to a clock [Mic00]. RAS and CAS can therefore become valid at the same time. The address lines are not shown in full detail here; some address lines may not be active depending on the mode in use. SDRAMs use a separate refresh signal to control refreshing. DRAM has to be refreshed roughly once per millisecond. Rather than refresh the entire memory at once, DRAMs refresh part of the memory at a time. When a section of memory is being refreshed, it cannot be accessed until the refresh is complete. The refresh is spread out over time so that each section is refreshed every few microseconds. SDRAMs include registers that control the mode in which the SDRAM operates. SDRAMs support burst modes that allow several sequential addresses to be accessed by sending only one address. SDRAMs generally also support an interleaved mode that exchanges pairs of bytes.


Read-Only Memories Read-only memories (ROMs) are preprogrammed with fixed data. They are very useful in embedded systems since a great deal of the code, and perhaps some data, does not change over time. Read-only memories are also less sensitive to radiation-induced errors. There are several varieties of ROM available. The first-level distinction to be made is between factory-programmed ROM (sometimes called mask-programmed ROM) and field-programmable ROM. Factory-programmed ROMs are ordered from the factory with particular programming. ROMs can typically be ordered in lots of a few thousand, but clearly factory programming is useful only when the ROMs are to be installed in some quantity. Field-programmable ROMs, on the other hand, can be programmed in the lab. Flash memory is the dominant form of field-programmable ROM and is electrically erasable. Flash memory uses standard system voltage for erasing and programming.

2.4 I/O DEVICES In this section we survey some input and output devices commonly used in embedded computing systems. Some of these devices are often found as on-chip devices in microcontrollers; others are generally implemented separately but are still commonly used. Looking at a few important devices now will help us understand both the requirements of device interfacing in this chapter and the uses of devices in programming in this and later chapters. Timers and Counters Timers and counters are distinguished from one another largely by their use, not their logic. Both are built from logic with registers to hold the current

value, with an increment input that adds one to the current register value. However, a timer has its count connected to a periodic clock signal to measure time intervals, while a counter has its count input connected to an aperiodic signal in order to count the number of occurrences of some external event. Because the same logic can be used for either purpose, the device is often called a counter/timer.

Figure shows enough of the internals of a counter/timer to illustrate its operation. An n-bit counter/timer uses an n-bit register to store the current state of the count and an array of half subtractors to decrement the count when the count signal is asserted. Combinational logic checks when the count equals zero; the done output signals the zero count. It is often useful to be able to control the time-out, rather than require exactly 2^n events to occur. For this purpose, a reset register provides the value with which the count register is to be loaded. The counter/timer provides logic to load the reset register. Most counters provide both cyclic and acyclic modes of operation. In the cyclic mode, once the counter reaches the done state, it is automatically reloaded and the counting process continues. In acyclic mode, the counter/timer waits for an explicit signal from the microprocessor to resume counting.
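The behavior just described (a count register, a reset register, a done output, and cyclic versus acyclic modes) can be modeled in a few lines of C. The structure and function names are hypothetical; a real counter/timer is hardware, and this is only a behavioral sketch.

```c
#include <stdint.h>

/* Behavioral model of a down-counting counter/timer. */
typedef struct {
    uint16_t count;  /* current state of the count           */
    uint16_t reset;  /* value reloaded into count            */
    int cyclic;      /* 1 = reload automatically on done     */
    int done;        /* asserted when the count reaches zero */
} counter_timer;

void ct_load(counter_timer *ct, uint16_t reset_val, int cyclic)
{
    ct->reset = reset_val;
    ct->count = reset_val;
    ct->cyclic = cyclic;
    ct->done = 0;
}

/* One count pulse: a periodic clock edge for a timer, or an external
 * event for a counter. */
void ct_tick(counter_timer *ct)
{
    if (ct->count == 0) return;            /* acyclic: wait for reload */
    if (--ct->count == 0) {
        ct->done = 1;                      /* signal the zero count    */
        if (ct->cyclic) ct->count = ct->reset;
    }
}
```

Loading the reset register with a value smaller than 2^n - 1 is exactly how software controls the time-out interval.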

A watchdog timer is an I/O device that is used for internal operation of a system. As shown in Figure 4.18, the watchdog timer is connected into the CPU bus and also to the CPU's reset line. The CPU's software is designed to periodically reset the watchdog timer, before the timer ever reaches its time-out limit. If the watchdog timer ever does reach that limit, its time-out action is to reset the processor. In that case, the presumption is that either a software flaw or a hardware problem has caused the CPU to misbehave. Rather than diagnose the problem, the system is reset to get it operational as quickly as possible.
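The watchdog pattern can be sketched as follows. The time-out value, the variable names, and the software simulation of the reset line are all assumptions made for illustration; in a real system the counter and reset line are hardware.

```c
#include <stdint.h>

#define WDT_TIMEOUT 1000u   /* illustrative time-out, in timer ticks */

uint32_t wdt_count = 0;     /* simulated watchdog count register  */
int cpu_was_reset = 0;      /* simulated CPU reset line           */

/* The CPU's software calls this periodically, before the limit. */
void watchdog_kick(void) { wdt_count = 0; }

/* Called once per watchdog clock tick. */
void watchdog_clock(void)
{
    if (++wdt_count >= WDT_TIMEOUT) {
        cpu_was_reset = 1;  /* time-out action: reset the processor */
        wdt_count = 0;
    }
}
```

The key design point is that only a correctly running foreground program keeps calling watchdog_kick(); a hung program stops kicking and the reset fires.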

Keyboards A keyboard is basically an array of switches, but it may include some internal logic to help simplify the interface to the microprocessor. In this chapter, we build our understanding from a single switch to a microprocessor-controlled keyboard.


A switch uses a mechanical contact to make or break an electrical circuit. The major problem with mechanical switches is that they bounce as shown in Figure 4.19. When the switch is depressed by pressing on the button attached to the switch's arm, the force of the depression causes the contacts to bounce several times until they settle down. If this is not corrected, it will appear that the switch has been pressed several times, giving false inputs. A hardware debouncing circuit can be built using a one-shot timer. Software can also be used to debounce switch inputs. A raw keyboard can be assembled from several switches. Each switch in a raw keyboard has its own pair of terminals, making raw keyboards impractical when a large number of keys is required. More expensive keyboards, such as those used in PCs, actually contain a microprocessor to preprocess button inputs. PC keyboards typically use a 4-bit microprocessor to provide the interface between the keys and the computer. The microprocessor can provide debouncing, but it also provides other functions as well. An encoded keyboard uses some code to represent which switch is currently being depressed. At the heart of the encoded keyboard is the scanned array of switches shown in Figure 4.20. Unlike a raw keyboard, the scanned keyboard array reads only one row of switches at a time. The demultiplexer at the left side of the array selects the row to be read. When the scan input is 1, that value is transmitted to one terminal of each key in the row. If the switch is depressed, the 1 is sensed at that switch's column. Since only one switch in the column is activated, that value uniquely identifies a key. The row address and column output can be used for encoding, or circuitry can be used to give a different encoding.
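Software debouncing, mentioned above, is commonly done by accepting a new switch state only after it has been stable for several consecutive samples. The sketch below shows one such scheme; the sample count and the names are invented for illustration.

```c
#define DEBOUNCE_COUNT 5   /* samples the input must hold steady */

int debounced = 0;         /* last accepted (debounced) switch state */
int candidate = 0;         /* most recent raw state seen             */
int stable = 0;            /* how long candidate has been stable     */

/* Call at a fixed sampling rate with the raw switch reading (0 or 1). */
void debounce_sample(int raw)
{
    if (raw == candidate) {
        /* state unchanged: accept it once it has been stable long enough */
        if (stable < DEBOUNCE_COUNT && ++stable == DEBOUNCE_COUNT)
            debounced = candidate;
    } else {
        candidate = raw;   /* bounce or new press: restart the count */
        stable = 0;
    }
}
```

The sampling period times DEBOUNCE_COUNT must exceed the switch's bounce time, which is typically a few milliseconds.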

A consequence of encoding the keyboard is that combinations of keys may not be represented. For example, on a PC keyboard, the encoding must be chosen so

that combinations such as control-Q can be recognized and sent to the PC. Another consequence is that rollover may not be allowed. For example, if you press “a,” and then press “b” before releasing “a,” in most applications you want the keyboard to send an “a” followed by a “b.” Rollover is very common in typing at even modest rates. A naive implementation of the encoder circuitry will simply throw away any character depressed after the first one until all the keys are released. The keyboard microcontroller can be programmed to provide n-key rollover, so that rollover keys are sensed, put on a stack, and transmitted in sequence as keys are released.

Touchscreens A touchscreen is an input device overlaid on an output device. The touchscreen registers the position of a touch to its surface. By overlaying this on a display, the user can react to information shown on the display. The two most common types of touchscreens are resistive and capacitive. A resistive touchscreen uses a two-dimensional voltmeter to sense position. As shown in Figure 4.23, the touchscreen consists of two conductive sheets separated by spacer balls. The top conductive sheet is flexible so that it can be pressed to touch the bottom sheet. A voltage is applied across the sheet; its resistance causes a voltage gradient to appear across the sheet. The top sheet samples the conductive sheet's applied voltage at the contact point. An analog/digital converter is used to measure the voltage and resulting position. The touchscreen alternates between x and y position sensing by alternately applying horizontal and vertical voltage gradients.

2.5 COMPONENT INTERFACING

Building the logic to interface a device to a bus is not too difficult but does take some attention to detail. We first consider interfacing memory components to the bus, since that is relatively simple, and then use those concepts to interface to other types of devices.

Memory Interfacing If we can buy a memory of the exact size we need, then the memory structure is simple. If we need more memory than we can buy in a single chip, then we must construct the memory out of several chips. We may also want to build a memory that is wider than we can buy on a single chip; for example, we cannot generally buy a 32-bit-wide memory chip. We can easily construct a memory of a given width (32 bits, 64 bits, etc.) by placing RAMs in parallel. We also need logic to turn the bus signals into the appropriate memory signals. For example, most busses won't send address signals in row and column form. We also need to generate the appropriate refresh signals.

Device Interfacing Some I/O devices are designed to interface directly to a particular bus, forming glueless interfaces. But glue logic is required when a device is connected to a bus for which it is not designed. An I/O device typically requires a much smaller range of addresses than a memory, so addresses must be decoded much more finely. Some additional logic is required to cause the bus to read and write the device’s registers. Example 4.1 shows one style of interface logic.

Example: A glue logic interface. Below is an interfacing scheme for a simple I/O device.


The device has four registers that can be read and written by presenting the register number on the regid pins, asserting R/W as required, and reading or writing the value on the regval pins.

2.6 INTERRUPTS

Basics Busy-wait I/O is extremely inefficient—the CPU does nothing but test the device status while the I/O transaction is in progress. In many cases, the CPU could do useful work in parallel with the I/O transaction, such as:

■ computation, as in determining the next output to send to the device or processing the last input received, and

■ control of other I/O devices.

To allow parallelism, we need to introduce new mechanisms into the CPU.

The interrupt mechanism allows devices to signal the CPU and to force execution of a particular piece of code. When an interrupt occurs, the program counter's value is changed to point to an interrupt handler routine (also commonly known as a device driver) that takes care of the device: writing the next data, reading data that have just become ready, and so on. The interrupt mechanism of course saves the value of the PC at the interruption so that the CPU can return to the program that was interrupted. Interrupts therefore allow the flow of control in the CPU to change easily between different contexts, such as a foreground computation and multiple I/O devices. As shown in Figure, the interface between the CPU and I/O device includes the following signals for interrupting:

■ the I/O device asserts the interrupt request signal when it wants service from the CPU; and

■ the CPU asserts the interrupt acknowledge signal when it is ready to handle the I/O device’s request.

The I/O device's logic decides when to interrupt; for example, it may generate an interrupt when its status register goes into the ready state. The CPU may not be able to immediately service an interrupt request because it may be doing something else that must be finished first—for example, a program that talks to both a high-speed disk drive and a low-speed keyboard should be designed to finish a disk transaction before handling a keyboard interrupt. Only when the CPU decides to acknowledge the interrupt does the CPU change the program counter to point to the device's handler. The interrupt handler operates much like a subroutine, except that it is not called by the executing program. The program that runs when no interrupt is being handled is often called the foreground program; when the interrupt handler finishes, it returns to the foreground program, wherever processing was interrupted.


Before considering the details of how interrupts are implemented, let’s look at the interrupt style of processing and compare it to busy-wait I/O. Example 3.4 uses interrupts as a basic replacement for busy-wait I/O; Example 3.5 takes a more sophisticated approach that allows more processing to happen concurrently.

Example Copying characters from input to output with basic interrupts As with Example 3.3, we repeatedly read a character from an input device and write it to an output device. We assume that we can write C functions that act as interrupt handlers. Those handlers will work with the devices in much the same way as in busy-wait I/O by reading and writing status and data registers. The main difference is in handling the output— the interrupt signals that the character is done, so the handler does not have to do anything. We will use a global variable achar for the input handler to pass the character to the foreground program. Because the foreground program doesn’t know when an interrupt occurs, we also use a global Boolean variable, gotchar, to signal when a new character has been received. The code for the input and output handlers follows:

void input_handler() { /* get a character and put in global */
    achar = peek(IN_DATA); /* get character */
    gotchar = TRUE; /* signal to main program */
    poke(IN_STATUS,0); /* reset status to initiate next transfer */
}

void output_handler() { /* react to character being sent */
    /* don't have to do anything */
}

The main program is reminiscent of the busy-wait program. It looks at gotchar to check when a new character has been read and then immediately sends it out to the output device.

main() {
    while (TRUE) { /* read then write forever */
        if (gotchar) { /* write a character */
            poke(OUT_DATA,achar); /* put character in device */
            poke(OUT_STATUS,1); /* set status to initiate write */
            gotchar = FALSE; /* reset flag */
        }
    }
}

The use of interrupts has made the main program somewhat simpler. But this program design still does not let the foreground program do useful work. The next example uses a more sophisticated program design to let the foreground program work completely independently of input and output.

Example

Copying characters from input to output with interrupts and buffers Because we do not need to wait for each character, we can make this I/O program more sophisticated than the one in Example 3.4. Rather than reading a single character and then writing it, the program performs reads and writes independently. The read and write routines communicate through the following global variables:

■ A character string io_buf will hold a queue of characters that have been read but not yet written.

■ A pair of integers buf_head and buf_tail will point to the first and last characters read.

■ An integer error will be set to 1 whenever io_buf overflows.

The global variables allow the input and output devices to run at different rates. The queue io_buf acts as a wraparound buffer—we add characters to the tail when an input is received and take characters from the head when we are ready for output. The head and tail wrap around the end of the buffer array to make most efficient use of the array. Here is the situation at the start of the program's execution, where the head points to the next character to be removed and the tail points to the first available slot. As seen below, because the head and tail are equal, we know that the queue is empty.

When the first character is read, the tail is incremented after the character is added to the queue, leaving the buffer and pointers looking like the following:

When the buffer is full, we leave one character in the buffer unused. As the next figure shows, if we added another character and updated the tail pointer (wrapping it around to the beginning of the buffer array), we would be unable to distinguish a full buffer from an empty one.

Here is what happens when the output goes past the end of io_buf: The following code provides the declarations for the above global variables and some service routines for adding and removing characters from the queue. Because interrupt handlers are regular code, we can use subroutines to structure code just as with any program.

#define BUF_SIZE 8
char io_buf[BUF_SIZE]; /* character buffer */
int buf_head = 0, buf_tail = 0; /* current position in buffer */
int error = 0; /* set to 1 if buffer ever overflows */

int empty_buffer() { /* returns TRUE if buffer is empty */
    return buf_head == buf_tail;
}

int full_buffer() { /* returns TRUE if buffer is full */
    return (buf_tail+1) % BUF_SIZE == buf_head;
}

int nchars() { /* returns the number of characters in the buffer */
    if (buf_tail >= buf_head) return buf_tail - buf_head;
    else return BUF_SIZE + buf_tail - buf_head;
}

void add_char(char achar) { /* add a character at the buffer tail */
    io_buf[buf_tail++] = achar;
    /* check pointer */
    if (buf_tail == BUF_SIZE)
        buf_tail = 0;
}

char remove_char() { /* take a character from the buffer head */
    char achar;
    achar = io_buf[buf_head++];
    /* check pointer */
    if (buf_head == BUF_SIZE)
        buf_head = 0;
    return achar;
}


Assume that we have two interrupt handling routines defined in C, input_handler for the input device and output_handler for the output device. These routines work with the device in much the same way as did the busy-wait routines. The only complication is in starting the output device: If io_buf has characters waiting, the output driver can start a new output transaction by itself. But if there are no characters waiting, an outside agent must start a new output action whenever the new character arrives. Rather than force the foreground program to look at the character buffer, we will have the input handler check to see whether there is only one character in the buffer and start a new transaction. Here is the code for the input handler:

#define IN_DATA 0x1000
#define IN_STATUS 0x1001

void input_handler() {
    char achar;
    if (full_buffer()) /* error */
        error = 1;
    else { /* read the character and update pointer */
        achar = peek(IN_DATA); /* read character */
        add_char(achar); /* add to queue */
    }
    poke(IN_STATUS,0); /* set status register back to 0 */
    /* if buffer was empty, start a new output transaction */
    if (nchars() == 1) { /* buffer had been empty until this interrupt */
        poke(OUT_DATA,remove_char()); /* send character */
        poke(OUT_STATUS,1); /* turn device on */
    }
}

#define OUT_DATA 0x1100
#define OUT_STATUS 0x1101

void output_handler() {
    if (!empty_buffer()) { /* start a new character */
        poke(OUT_DATA,remove_char()); /* send character */
        poke(OUT_STATUS,1); /* turn device on */
    }
}

The foreground program does not need to do anything—everything is taken care of by the interrupt handlers. The foreground program is free to do useful work as it is occasionally interrupted by input and output operations. The following sample execution of the program in the form of a UML sequence diagram shows how input and output are interleaved with the foreground program. (We have kept the last input character in the queue until output is complete to make it clearer when input occurs.) The simulation shows that the foreground program is not executing continuously, but it continues to run in its regular state independent of the number of characters waiting in the queue.

Interrupts allow a lot of concurrency, which can make very efficient use of the CPU. But when the interrupt handlers are buggy, the errors can be very hard to find. The fact that an interrupt can occur at any time means that the same bug can manifest itself in different ways when the interrupt handler interrupts different segments of the foreground program. Example 3.6 illustrates the problems inherent in interrupt handlers.

Example

Debugging interrupt code Assume that the foreground code is performing a matrix multiplication operation y = Ax + b:

for (i = 0; i < M; i++) {
    y[i] = b[i];
    for (j = 0; j < N; j++)
        y[i] = y[i] + A[i][j]*x[j];
}

We use the interrupt handlers of Example 3.5 to perform I/O while the matrix computation is performed, but with one small change: read_handler has a bug that causes it to change the value of j. While this may seem far-fetched, remember that when the interrupt handler is written in assembly language such bugs are easy to introduce. Any CPU register that is written by the interrupt handler must be saved before it is modified and restored before the handler exits. Any type of bug—such as forgetting to save the register or to properly restore it—can cause that register to mysteriously change value in the foreground program.

What happens to the foreground program when j changes value during an interrupt depends on when the interrupt handler executes. Because the value of j is reset at each iteration of the outer loop, the bug will affect only one entry of the result y. But clearly the entry that changes will depend on when the interrupt occurs. Furthermore, the change observed in y depends on not only what new value is assigned to j (which may depend on the data handled by the interrupt code), but also when in the inner loop the interrupt occurs. An interrupt at the beginning of the inner loop will give a different result than one that occurs near the end. The number of possible new values for the result vector is much too large to consider manually—the bug cannot be found by enumerating the possible wrong values and correlating them with a given root cause. Even recognizing the error can be difficult—for example, an interrupt that occurs at the very end of the inner loop will not cause any change in the foreground program's result. Finding such bugs generally requires a great deal of tedious experimentation.

Priorities and vectors

Providing a practical interrupt system requires having more than a simple interrupt request line. Most systems have more than one I/O device, so there must be some mechanism for allowing multiple devices to interrupt. We also want to have flexibility in the locations of the interrupt handling routines, the addresses for devices, and so on. There are two ways in which interrupts can be generalized to handle multiple devices and to provide more flexible definitions for the associated hardware and software:

■ interrupt priorities allow the CPU to recognize some interrupts as more important than others, and ■ interrupt vectors allow the interrupting device to specify its handler. Prioritized interrupts not only allow multiple devices to be connected to the interrupt line but also allow the CPU to ignore less important interrupt requests

while it handles more important requests. As shown in Figure 3.3, the CPU provides several different interrupt request signals, shown here as L1, L2, up to Ln. Typically, the lower-numbered interrupt lines are given higher priority, so in this case, if devices 1, 2, and n all requested interrupts simultaneously, 1's request would be acknowledged because it is connected to the highest-priority interrupt line. Rather than provide a separate interrupt acknowledge line for each device, most CPUs use a set of signals that provide the priority number of the winning interrupt in binary form (so that interrupt level 7 requires 3 bits rather than 7). A device knows that its interrupt request was accepted by seeing its own priority number on the interrupt acknowledge lines.
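The selection of the winning interrupt can be modeled in C: given a mask of pending request lines, return the number of the highest-priority (lowest-numbered) line in binary form. This is an illustrative software model of the priority logic, not actual hardware; the eight-level limit and the function name are assumptions.

```c
/* Model of prioritized interrupt selection. Bit 0 of pending represents
 * line L1, bit 1 represents L2, and so on up to L8. Lower-numbered lines
 * win. Returns the winning level in binary form, or 0 if none pending. */
unsigned int winning_interrupt(unsigned int pending)
{
    for (unsigned int level = 1; level <= 8; level++) {
        if (pending & (1u << (level - 1)))
            return level;  /* this number appears on the acknowledge lines */
    }
    return 0;              /* no request pending */
}
```

A device compares the number on the acknowledge lines with its own priority to learn whether its request was the one accepted.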

UNIT III

PROCESSES AND OPERATING SYSTEMS

■ The process abstraction. ■ Switching contexts between programs. ■ Real-time operating systems (RTOSs). ■ Interprocess communication. ■ Task-level performance analysis and power consumption. ■ A telephone answering machine design.

We are particularly interested in real-time operating systems (RTOSs), which are OSs that provide facilities for satisfying real-time requirements. An RTOS allocates resources using algorithms that take real time into account. General-purpose OSs, in contrast, generally allocate resources using other criteria like fairness. Trying to allocate the CPU equally to all processes without regard to time can easily cause processes to miss their deadlines. In the next section, we will introduce the concepts of task and process. A later section introduces some basic concepts in interprocess communication. Section 6.5 considers the performance of RTOSs while Section 6.6 looks at power consumption. Section 6.7 walks through the design of a telephone answering machine.

3.1 MULTIPLE TASKS AND MULTIPLE PROCESSES

Most embedded systems require functionality and timing that is too complex to embody in a single program. We break the system into multiple tasks in order to manage when things happen. In this section we will develop the basic abstractions that will be manipulated by the RTOS to build multirate systems. To understand why this partitioning of an application into tasks may be reflected in the program structure, consider how we would build a stand-alone compression unit based on the compression algorithm we implemented in Section 3.7. As shown in Figure 6.1, this device is connected to serial ports on both ends. The input to the box is an uncompressed stream of bytes. The box emits a compressed string of bits on the output serial line, based on a predefined compression table. Such a box may be used, for example, to compress data being sent to a modem. The program's need to receive and send data at different rates—for example, the program may emit 2 bits for the first byte and then 7 bits for the second byte—will obviously find itself reflected in the structure of the code. It is easy to create irregular, ungainly code to solve this problem; a more elegant solution is to create a queue of output bits, with those bits being removed from the queue and sent to the serial port in 8-bit sets. But beyond the need to

create a clean data structure that simplifies the control structure of the code, we must also ensure that we process the inputs and outputs at the proper rates. For example, if we spend too much time in packaging and emitting output characters, we may drop an input character. Solving timing problems is a more challenging problem.

The text compression box provides a simple example of rate control problems. A control panel on a machine provides an example of a different type of rate control problem, the

asynchronous input. The control panel of the compression box may, for example, include a compression mode button that disables or enables compression, so that the input text is passed through unchanged when compression is disabled. We certainly do not know when the user will push the compression mode button—the button may be depressed asynchronously relative to the arrival of characters for compression. We do know, however, that the button will be depressed at a much lower rate than characters will be received, since it is not physically possible for a person to repeatedly depress a button at even slow serial line rates. Keeping up with the input and output data while checking on the button can introduce some very complex control code into the program. Sampling the button's state too slowly can cause the machine to miss a button depression entirely, but sampling it too frequently and duplicating a data value can cause the machine to incorrectly compress data.

One solution is to introduce a counter into the main compression loop, so that a subroutine to check the input button is called once every n times the compression loop is executed. But this solution does not work when either the compression loop or the button-handling routine has highly variable execution times—if the execution time of either varies significantly, it will cause the other to execute later than expected, possibly causing data to be lost. We need to be able to keep track of these two different tasks separately, applying different timing requirements to each.

This is the sort of control that processes allow. The above two examples illustrate how requirements on timing and execution rate can create major problems in programming. When code is written to satisfy several different timing requirements at once, the control structures necessary to get any sort of solution become very complex very quickly. Worse, such complex control is usually quite difficult to verify for either functional or timing properties.

3.1.2 Multirate Systems

Implementing code that satisfies timing requirements is even more complex when multiple rates of computation must be handled. Multirate embedded computing systems are very common, including automobile engines, printers, and cell phones. In all these systems, certain operations must be executed periodically, and each operation is executed at its own rate. Application Example 6.1 describes why automobile engines require multirate control.


3.1.3 Timing Requirements on Processes

Processes can have several different types of timing requirements imposed on them by the application. The timing requirements on a set of processes strongly influence the type of scheduling that is appropriate. A scheduling policy must define the timing requirements that it uses to determine whether a schedule is valid. Before studying scheduling proper, we outline the types of process timing requirements that are useful in embedded system design. Figure illustrates different ways in which we can define two important requirements on processes: the release time and the deadline. What happens when a process misses a deadline? The practical effects of a timing violation depend on the application—the results can be catastrophic in an automotive control system, whereas a missed deadline in a multimedia system may cause an audio or video glitch. The system can be designed to take a variety of actions when a deadline is missed. Safety-critical systems may try to take compensatory measures such as approximating data or switching into a special safety mode. Systems for which safety is not as important may take simple measures to avoid propagating bad data, such as inserting silence in a phone line, or may completely ignore the failure. Even if the modules are functionally correct, their improper timing behavior can introduce major execution errors. Application Example 6.2 describes a timing problem in space shuttle software that caused the delay of the first launch of the shuttle. We need a basic measure of the efficiency with which we use the CPU. The simplest and most direct measure is utilization:

Utilization is the ratio of the CPU time that is being used for useful computations to the total available CPU time. This ratio ranges between 0 and 1, with 1 meaning that all of the available CPU time is being used for system purposes. The utilization is often expressed as a percentage. If we measure the total execution time of all processes over an interval of time t as T, then the CPU utilization is U = T/t.

3.1.5 Process State and Scheduling

The first job of the OS is to determine which process runs next. The work of choosing the order of running processes is known as scheduling. The OS considers a process to be in one of three basic scheduling states:

[State diagram: a process moves among three states, waiting, ready, and executing. A ready process that is chosen to run becomes executing; an executing process that needs data becomes waiting; a waiting process that receives its data becomes ready; and an executing process that is preempted returns to ready until it gets data and the CPU is ready again.]

Schedulability means whether there exists a schedule of execution for the processes in a system that satisfies all their timing requirements. In general, we must construct a schedule to show schedulability, but in some cases we can eliminate some sets of processes as unschedulable using some very simple tests. Utilization is one of the key metrics in evaluating a scheduling policy. Our most basic requirement is that CPU utilization be no more than 100%, since we cannot use the CPU more than 100% of the time.

When we evaluate the utilization of the CPU, we generally do so over a finite period that covers all possible combinations of process executions. For periodic processes, the length of time that must be considered is the hyperperiod, which is the least common multiple of the periods of all the processes. If we evaluate over the hyperperiod, we are sure to have considered all possible combinations of the periodic processes. The next example evaluates the utilization of a simple set of processes.

We will see that some types of timing requirements for a set of processes imply that we cannot utilize 100% of the CPU's execution time on useful work, even ignoring context switching overhead. However, some scheduling policies can deliver higher CPU utilizations than others, even for the same timing requirements. The best policy depends on the required timing characteristics of the processes being scheduled.

One very simple scheduling policy is known as cyclostatic scheduling, or sometimes as Time Division Multiple Access (TDMA) scheduling. Processes always run in the same time slot. Two factors affect utilization: the number of time slots used and the fraction of each time slot that is used for useful work. Depending on the deadlines for some of the processes, we may need to leave some time slots empty. And since the time slots are of equal size, some short processes may have time left over in their time slot. We can use utilization as a schedulability measure: the total CPU time of all the processes must fit within the available time. If a process does not have any useful work to do, the round-robin scheduler moves on to the next process in order to fill the time slot with useful work. In this example, all three processes execute during the first hyperperiod, but during the second one, P1 has no useful work and is skipped. The processes are always evaluated in the same order. The last time slot in the hyperperiod is left empty; if we have occasional, non-periodic tasks without deadlines, we can execute them in these empty time slots. Round-robin scheduling is often used in hardware such as buses because it is very simple to implement but provides some amount of flexibility.

In addition to utilization, we must also consider scheduling overhead: the execution time required to choose the next process to execute, which is incurred in addition to any context switching overhead. In general, the more sophisticated the scheduling policy, the more CPU time it takes during system operation to implement it. Moreover, we generally achieve higher theoretical CPU utilization by applying more complex scheduling policies with higher overheads. The final decision on a scheduling policy must take into account both theoretical utilization and practical scheduling overhead.

3.1.7 Running Periodic Processes

We need to find a programming technique that allows us to run periodic processes, ideally at different rates. For the moment, let's think of a process as a subroutine; we will call them p1(), p2(), and so on for simplicity. Our goal is to run these subroutines at rates determined by the system

designer. A very simple program could run our process subroutines repeatedly in an endless loop, but a timer is a much more reliable way to control execution of the loop. We would probably use the timer to generate periodic interrupts. Let's assume for the moment that the pall() function is called by the timer's interrupt handler. Then this code will execute each process once after a timer interrupt:

    void pall() {
        p1();
        p2();
    }

But what happens when a process runs too long? The timer’s interrupt will cause the CPU’s interrupt system to mask its interrupts, so the interrupt will not occur until after the pall() routine returns. As a result, the next iteration will start late. This is a serious problem, but we will have to wait for further refinements before we can fix it.

Our next problem is to execute different processes at different rates. If we have several timers, we can set each timer to a different rate. We could then use a function to collect all the processes that run at each rate:

    void pA() {
        /* processes that run at rate A */
        p1();
        p3();
    }

    void pB() {
        /* processes that run at rate B */
        /* ... */
    }

This solution allows us to execute processes at rates that are simple multiples of each other. However, when the rates aren't related by a simple ratio, the counting process becomes more complex and more likely to contain bugs. We have developed somewhat more reliable code, but this programming style is still limited in capability and prone to bugs. To improve both the capabilities and reliability of our systems, we need to invent the RTOS.

3.2 CONTEXT SWITCHING

PREEMPTIVE REAL-TIME OPERATING SYSTEMS

An RTOS executes processes based upon timing constraints provided by the system designer. The most reliable way to meet timing constraints accurately is to build a preemptive OS and to use priorities to control what process runs at any given time. We will use these two concepts to build up a basic RTOS. We will use FreeRTOS.org [Bar07] as our example OS; it runs on many different platforms.

3.2.1 Preemption

Preemption is an alternative to the C function call as a way to control execution. To be able to take full advantage of the timer, we must change our notion of a process into something more than a function call. We must, in fact, break the assumptions of our high-level language. We will create new routines that allow us to jump from one subroutine to another at any point in the program. That, together with the timer, will allow us to move between functions whenever necessary based upon the system's timing constraints.

We want to share the CPU across two processes. The kernel is the part of the OS that determines what process is running. The kernel is activated periodically by the timer. The length of the timer period is known as the time quantum because it is the smallest increment in which we can control CPU activity. The kernel determines what process will run next and causes that process to run. On the next timer interrupt, the kernel may pick the same process or another process to run.

Note that this use of the timer is very different from our use of the timer in the last section. Before, we used the timer to control loop iterations, with one loop iteration including the execution of several complete processes. Here, the time quantum is in general smaller than the execution time of any of the processes. How do we switch between processes before a process is done? We cannot rely on C-level mechanisms to do so. We can, however, use assembly language to switch between processes. The timer interrupt causes control to change from the currently executing process to the kernel; assembly language can be used to save and restore registers. We can similarly use assembly language to restore registers not from the process that was interrupted by the timer but from any process we want. The set of registers that defines a process is known as its context, and switching from one process's register set to another is known as context switching. The data structure that holds the state of the process is known as the process control block.

3.2.2 Priorities

How does the kernel determine what process will run next? We want a mechanism that executes quickly so that we don't spend all our time in the kernel and starve out the processes that do the useful work. If we assign each task a numerical priority, then the kernel can simply look at the processes and their priorities, see which ones actually want to execute (some may be waiting for data or for some event), and select the highest-priority process that is ready to run. This mechanism is both flexible and fast. The priority is a non-negative integer value. The exact value of the priority is not as important as the relative priority of different processes. In this book, we will generally use priority 1 as the highest priority, but the opposite convention, in which larger numbers denote higher priorities, is equally reasonable.

■ A process continues execution until it completes or it is preempted by a higher-priority process.

Let's define a simple system with three processes as seen below.

    Process   Priority   Execution time
    P1        1          10
    P2        2          30
    P3        3          20

In addition to describing the properties of the processes in general, we need to know the environmental setup. We assume that P2 is ready to run when the system is started, P1 is released at time 15, and P3 is released at time 18. Once we know the process properties and the environment, we can use the priorities to determine which process is running throughout the complete execution of the system. To understand the basics of a context switch, let's assume that the set of tasks is in steady state: everything has been initialized, the OS is running, and we are ready for a timer interrupt. The diagram shows the hardware timer and all the functions in the kernel that are involved in the context switch:

    /* Acknowledge the interrupt at AIC level... */
    AT91C_BASE_AIC->AIC_EOICR = portCLEAR_AIC_INTERRUPT;

    /* Restore the context of the new task. */
    portRESTORE_CONTEXT();

}

vPreemptiveTick() has been declared as a naked function; this means that it does not use the normal procedure entry and exit code that is generated by the compiler. Because the function is naked, the registers for the process that was interrupted are still available; vPreemptiveTick() doesn't have to go to the procedure call stack to get their values. This is particularly handy since the procedure mechanism would save only part of the process state, making the state-saving code a little more complex.

The first thing that this routine must do is save the context of the task that was interrupted. To do this, it uses the routine portSAVE_CONTEXT(), which saves all of the context on the stack. It then performs some housekeeping, such as incrementing the tick count. The tick count is the internal timer that is used to determine deadlines. After the tick is incremented, some tasks may have become ready as they passed their deadlines.

Next, the OS determines which task to run next using the routine vTaskSwitchContext(). After some more housekeeping, it uses portRESTORE_CONTEXT() to restore the context of the task that was selected by vTaskSwitchContext(). The action of portRESTORE_CONTEXT() causes control to transfer to that task without using the standard C return mechanism.

The code for portSAVE_CONTEXT(), in the file portmacro.h, is defined as a macro and not as a C function. It is structured in this way so that it doesn't disturb the register values that need to be saved. Because it is a macro, it has to be written in a hard-to-read way: all code must be on the same line, or end-of-line continuations (backslashes) must be used. Here is the code in more readable form, with the end-of-line continuations removed and the assembly language that is the heart of this routine temporarily removed:

    #define portSAVE_CONTEXT()
    {
        extern volatile void *volatile pxCurrentTCB;
        extern volatile unsigned portLONG ulCriticalNesting;

        /* Push R0 as we are going to use the register. */
        asm volatile( /* assembly language code here */ );

        (void) ulCriticalNesting;
        (void) pxCurrentTCB;
    }

The asm statement allows assembly language code to be introduced in-line into the C program. The keyword volatile tells the compiler that the assembly language may change register values,

which means that many compiler optimizations cannot be performed across the assembly language code. The code uses ulCriticalNesting and pxCurrentTCB simply to avoid compiler warnings about unused variables; the variables are actually used in the assembly code, but the compiler cannot see that. The asm statement requires that the assembly language be entered as strings, one string per line, which makes the code hard to read. The fact that the code is included in a #define makes it even harder to read. Here is a cleaned-up version of the assembly language code from the asm volatile() statement:

    STMDB  SP!, {R0}
    /* Set R0 to point to the task stack pointer. */
    STMDB  SP, {SP}^
    NOP
    SUB    SP, SP, #4
    LDMIA  SP!, {R0}
    /* Push the return address onto the stack. */
    STMDB  R0!, {LR}
    /* Now we have saved LR we can use it instead of R0. */
    MOV    LR, R0
    /* Pop R0 so we can save it onto the system mode stack. */

3.3 SCHEDULING POLICIES

PRIORITY-BASED SCHEDULING

Now that we have a priority-based context switching mechanism, we have to determine an algorithm by which to assign priorities to processes. After assigning priorities, the OS takes care of the rest by choosing the highest-priority ready process. There are two major ways to assign priorities: static priorities that do not change during execution, and dynamic priorities that do change. We will look at examples of each in this section.

3.3.1 Rate-Monotonic Scheduling

Rate-monotonic scheduling (RMS), introduced by Liu and Layland [Liu73], was one of the first scheduling policies developed for real-time systems and is still very widely used. RMS is a static

scheduling policy. It turns out that these fixed priorities are sufficient to efficiently schedule the processes in many situations.

The theory underlying RMS is known as rate-monotonic analysis (RMA). This theory, as summarized below, uses a relatively simple model of the system.

■ All processes run periodically on a single CPU.

■ Context switching time is ignored.

■ There are no data dependencies between processes.

■ The execution time for a process is constant.

■ All deadlines are at the ends of their periods.

■ The highest-priority ready process is always selected for execution.

The major result of RMA is that a relatively simple scheduling policy is optimal under certain conditions. Priorities are assigned by rank order of period, with the process with the shortest period being assigned the highest priority. This fixed-priority scheduling policy is the optimum assignment of static priorities to processes, in that it provides the highest CPU utilization while ensuring that all processes meet their deadlines. Example 6.3 illustrates RMS.

If, on the other hand, we give higher priority to P2, then critical-instant analysis tells us that we must execute all of P2 and all of P1 in one of P1's periods in the worst case:

    C1 + C2 ≤ T1.

There are cases where the first relationship can be satisfied and the second cannot, but there are no cases where the second relationship can be satisfied and the first cannot. We can inductively show that the process with the shorter period should always be given higher priority for process sets of arbitrary size. It is also possible to prove that RMS always provides a feasible schedule if such a schedule exists.

The bad news is that, although RMS is the optimal static-priority schedule, it does not always allow the system to use 100% of the available CPU cycles. In the RMS framework, the total CPU utilization for a set of n tasks is

    U = C1/T1 + C2/T2 + ... + Cn/Tn,

where Ci/Ti is the fraction of time that the CPU spends executing task i.

It is possible to show that for a set of two tasks under RMS scheduling, the CPU utilization U will be no greater than 2(2^(1/2) - 1) ≈ 0.83. In other words, the CPU will be idle at least 17% of the time. This idle time is due to the fact that priorities are assigned statically; we will see in the next section that more aggressive scheduling policies can achieve higher utilizations.

3.3.2 Earliest-Deadline-First Scheduling

Earliest-deadline-first (EDF) is another well-known scheduling policy that was also studied by Liu and Layland [Liu73]. It is a dynamic priority scheme: it changes process priorities during execution based on initiation times. As a result, it can achieve higher CPU utilizations than RMS. The EDF policy is also very simple: it assigns priorities in order of deadline. The highest-priority process is the one whose deadline is nearest in time, and the lowest-priority process is the one whose deadline is farthest away. Clearly, priorities must be recalculated at every completion of a process. However, the final step of the OS during the scheduling procedure is the same as for RMS: the highest-priority ready process is chosen for execution.

Example 6.4 illustrates EDF scheduling in practice. In some applications, it may be acceptable for some processes to occasionally miss deadlines. For example, a set-top box for video decoding is not a safety-critical application, and the occasional display artifacts caused by missed deadlines may be acceptable in some markets. What if your set of processes is unschedulable and you need to guarantee that they meet their deadlines? There are several possible ways to solve this problem:

■ Get a faster CPU. That will reduce execution times without changing the periods, giving you lower utilization. This will require you to redesign the hardware, but this is often feasible because you are rarely using the fastest CPU available.

■ Redesign the processes to take less execution time. This requires knowledge of the code and may or may not be possible.

■ Rewrite the specification to change the deadlines. This is unlikely to be feasible, but it may be in a few cases where some of the deadlines were initially made tighter than necessary.


3.4 INTERPROCESS COMMUNICATION MECHANISMS

3.4.1 Shared Memory Communication

[Figure: shared memory communication. The CPU and I/O devices both read and write a shared location in memory.]

In general, a process can send a communication in one of two ways: blocking or nonblocking. After sending a blocking communication, the process goes into the waiting state until it receives a response. Nonblocking communication allows the process to continue execution after sending the communication. Both types of communication are useful.

There are two major styles of interprocess communication: shared memory and message passing. The two are logically equivalent: given one, you can build an interface that implements the other. However, some programs may be easier to write using one rather than the other.

The input data arrive at a constant rate and are easy to manage. But because the output data are consumed at a variable rate, these data require an elastic buffer. The CPU and output UART share a memory area: the CPU writes compressed characters into the buffer and the UART removes them as necessary to fill the serial line. Because the number of bits in the buffer changes constantly, the compression and transmission processes need additional size information. In this case, coordination is simple: the CPU writes at one end of the buffer and the UART reads at the other end. The only challenge is to make sure that the UART does not overrun the buffer. Coordination of this kind relies on an atomic test-and-set instruction, which tests a memory location and sets it in a single step, so that when the instruction returns true the location is in fact set. The bus supports this as an atomic operation that cannot be interrupted. Programming Example 6.1 describes a test-and-set operation in more detail.

A test-and-set can be used to implement a semaphore, which is a language-level synchronization construct. For the moment, let's assume that the system provides one semaphore that is used to guard access to a block of protected memory. Any process that wants to access the memory must use the semaphore to ensure that no other process is actively using it. As shown below, the semaphore names, by tradition, are P() to gain access to the protected memory and V() to release it.

    /* some nonprotected operations here */
    P();   /* wait for semaphore */
    /* do protected work here */
    V();   /* release semaphore */

The P() operation uses a test-and-set to repeatedly test a location that holds a lock on the memory block. The P() operation does not exit until the lock is available; once it is available, the test-and-set atomically sets the lock. Once past the P() operation, the process can work on the protected memory block. The V() operation resets the lock, allowing other processes access to the region by using the P() function.

3.4.2 Message Passing

Message passing communication complements the shared memory model. As shown in Figure 6.15, each communicating entity has its own message send/receive unit. The message is not stored on the communications link, but rather at the senders/receivers at the end points. In contrast, shared memory communication can be seen as a memory block used as a communication device, in which all the data are stored in the communication link/memory. Applications in which units operate relatively autonomously are natural candidates for message passing communication. For example, a home control system has one microcontroller per household device: lamp, faucet, appliance, and so on. The devices must communicate relatively infrequently; furthermore, their physical separation is large enough that we would not naturally think of them as sharing a central pool of memory.

3.5 EVALUATING OPERATING SYSTEM PERFORMANCE

The scheduling policy does not tell us all that we would like to know about the performance of a real system running processes. Our analysis of scheduling policies makes some simplifying assumptions:

■ We have assumed that context switches require zero time.

We also assumed that processes don't interact, but the cache causes the execution of one program to influence the execution time of other programs. The techniques for bounding the cache-based performance of a single program do not work when multiple programs are in the same cache. Many real-time systems have been designed based on the assumption that there is no cache present, even though one actually exists. This grossly conservative assumption is made

because the system architects lack tools that permit them to analyze the effect of caching. Since they do not know where caching will cause problems, they are forced to retreat to the simplifying assumption that there is no cache. The result is extremely overdesigned hardware, which has much more computational power than is necessary. However, just as experience tells us that a well-designed cache provides significant performance benefits for a single program, a properly sized cache can allow a microprocessor to run a set of processes much more quickly. By analyzing the effects of the cache, we can make much better use of the available hardware.

Li and Wolf [Li99] developed a model for estimating the performance of multiple processes sharing a cache. In the model, some processes can be given reservations in the cache, such that only a particular process can inhabit a reserved section of the cache; other processes are left to share the cache. We generally want to use cache partitions only for performance-critical processes, since cache reservations are wasteful of limited cache space. Performance is estimated by constructing a schedule, taking into account not just the execution time of the processes but also the state of the cache. Each process in the shared section of the cache is modeled by a binary variable: 1 if present in the cache and 0 if not. Each process is also characterized by three total execution times: assuming no caching, with typical caching, and with all code always resident in the cache. The always-resident time is unrealistically optimistic, but it can be used to find a lower bound on the required schedule time. During construction of the schedule, we can look at the current cache state to see whether the no-cache or typical-caching execution time should be used at this point in the schedule. We can also update the cache state if the cache is needed for another process. Although this model is simple, it provides much more realistic performance estimates than assuming the cache either is nonexistent or is perfect. Example 6.9 shows how cache management can improve CPU utilization.

Going into a low-power mode takes time; generally, the more that is shut off, the longer the delay incurred during restart. Because power-down and power-up are not free, modes should be changed carefully. Determining when to switch into and out of a power-up mode requires an analysis of the overall system activity.

■ Avoiding a power-down mode can cost unnecessary power.

■ Powering down too soon can cause severe performance penalties.

■ Re-entering run mode typically costs a considerable amount of time.

A straightforward method is to power up the system when a request is received. This works as long as the delay in handling the request is acceptable. A more sophisticated technique is predictive shutdown. The goal is to predict when the next request will be made and to start the system just before that time, saving the requestor the start-up time. In general, predictive shutdown techniques are probabilistic: they make guesses about activity patterns based on a probabilistic model of expected behavior. Because they rely on predictions, they may not always correctly guess the time of the next activity.

3.6 TELEPHONE ANSWERING MACHINE

In this section we design a digital telephone answering machine. The system will store messages in digital form rather than on an analog tape. To make life more interesting, we use a simple algorithm to compress the voice data so that we can make more efficient use of the limited amount of available memory.

3.6.1 Theory of Operation and Requirements

In addition to studying the compression algorithm, we also need to learn a little about the operation of telephone systems.

The compression scheme we will use is known as adaptive differential pulse code modulation (ADPCM). Despite the long name, the technique is relatively simple but can yield 2x compression ratios on voice data.

3.6.2 Specification

Figure 6.21 shows the class diagram for the answering machine. In addition to the classes that perform the major functions, we also use classes to describe the incoming and outgoing messages (OGMs). As seen below, these classes are related. The buttons and lights simply provide attributes for their input and output values. The phone line, microphone, and speaker are given behaviors that let us sample their current values. The message classes are defined in Figure 6.23. Since incoming and outgoing message types share many characteristics, we derive both from a more fundamental message type. The major operational classes are Controls, Record, and Playback. The Controls class provides an operate() behavior that oversees the user-level operations. The Record and Playback classes provide behaviors that handle writing and reading sample sequences. The state diagram for the Controls activate behavior is shown in Figure 6.25. Most of the user activities are relatively straightforward. The most complex is answering an incoming call. As with the software modem of Section 5.11, we want to be sure that a single depression of a button causes the required action to be taken exactly once; this requires edge detection on the button signal. State diagrams for record-msg and playback-msg are shown in Figure 6.26. We have parameterized the specification for record-msg so that it can be used either from the phone line or from the microphone. This requires parameterizing the source itself and the termination condition.


UNIT-IV

EMBEDDED SOFTWARE

PROGRAMMING EMBEDDED SYSTEMS IN ASSEMBLY AND C

C AND ASSEMBLY

Many programmers are more comfortable writing in C, and for good reason: C is a mid- level language (in comparison to Assembly, which is a low-level language), and spares the programmers some of the details of the actual implementation.

However, there are some low-level tasks that either can be better implemented in assembly or can only be implemented in assembly language. Also, it is frequently useful for the programmer to look at the assembly output of the C compiler and hand-edit or hand-optimize the assembly code in ways that the compiler cannot. Assembly is also useful for time-critical or real-time processes, because unlike with high-level languages, there is no ambiguity about how the code will be compiled. The timing can be strictly controlled, which is useful for writing simple device drivers. This section will look at multiple techniques for mixing C and assembly program development.

INLINE ASSEMBLY

One of the most common methods for using assembly code fragments in a C programming project is a technique called inline assembly. Inline assembly is invoked in different compilers in different ways. Also, the assembly language syntax used in the inline assembly depends entirely on the assembly engine used by the C compiler. Microsoft's C compiler, for instance, only accepts inline assembly commands in MASM syntax, while GNU GCC only accepts inline assembly in GAS syntax (also known as AT&T syntax). This section will discuss some of the basics of mixed-language programming in some common compilers:

Microsoft C Compiler

Turbo C Compiler

GNU GCC Compiler

Borland C Compiler

LINKED ASSEMBLY

When an assembly source file is assembled by an assembler, and a C source file is compiled by a C compiler, those two object files can be linked together by a linker to form the final executable. The beauty of this approach is that the assembly files can be written using any syntax and assembler that the programmer is comfortable with. Also, if a change needs to be made in the assembly code, all of that code exists in a separate file that the programmer can easily access. The only disadvantages of mixing assembly and C in this way are that (a) both the assembler and the compiler need to be run, and (b) those files need to be manually linked together by the programmer. These extra steps are comparatively easy, although it does mean that the programmer needs to learn the command-line syntax of the compiler, the assembler, and the linker.

INLINE ASSEMBLY VS. LINKED ASSEMBLY

ADVANTAGES OF INLINE ASSEMBLY:

Short assembly routines can be embedded directly in a C function in a C code file. The mixed-language file can then be compiled completely with a single command to the C compiler (as opposed to compiling the assembly code with an assembler, compiling the C code with the C compiler, and then linking them together). This method is fast and easy. If the inline assembly is embedded in a function, then the programmer doesn't need to worry about calling conventions, even when changing compiler switches to a different calling convention.

ADVANTAGES OF LINKED ASSEMBLY:

If a new microprocessor is selected, all the assembly commands are isolated in a ".asm" file. The programmer can update just that one file -- there is no need to change any of the ".c" files (if they are portably written).

CALLING CONVENTIONS

When writing separate C and Assembly modules, and linking them with your linker, it is important to remember that a number of high-level C constructs are very precisely defined, and need to be handled correctly by the assembly portions of your program. Perhaps the biggest obstacle to mixed-language programming is the issue of function calling conventions. C functions are all implemented according to a particular convention that is selected by the programmer (if you have never "selected" a particular calling convention, it's because your compiler has a default setting). This page will go through some of the common calling conventions that the programmer might run into, and will describe how to implement these in assembly language.

Code compiled with one compiler won't work right when linked to code compiled with a different calling convention. If the code is in C or another high-level language (or assembly language embedded in-line to a C function), it's a minor hassle -- the programmer needs to pick which compiler / optimization switches she wants to use today, and recompile every part of the program that way. Converting assembly language code to use a different calling convention takes more manual effort and is more bug-prone.

Unfortunately, calling conventions often differ from one compiler to the next -- even on the same CPU. Occasionally the calling convention changes from one version of a compiler to the next, or even within the same compiler when given different "optimization" switches.

Unfortunately, many times the calling convention used by a particular version of a particular compiler is inadequately documented. So assembly-language programmers are forced to use reverse-engineering techniques to figure out the exact details they need to know in order to call functions written in C, and in order to accept calls from functions written in C.

The typical process is:

• Write a ".c" file with stub functions that have exactly the same number and type of inputs and outputs that you want the assembly-language function to have.
• Compile that file with the appropriate switches to produce a mixed assembly-language-with-C-in-comments file (typically a ".cod" file). (If your compiler can't produce an assembly language file, there is the tedious option of disassembling the binary ".obj" machine-code file.)
• Copy that ".cod" file to a ".asm" file. (Sometimes you need to strip out the compiled hex numbers and comment out other lines to turn it into something the assembler can handle.)
• Test the calling convention: compile the ".asm" file to an ".obj" file, and link it (instead of the stub ".c" file) to the rest of the program. Test to see that "calls" work properly.
• Fill in your ".asm" file: it should now include the appropriate header and footer on each function to properly implement the calling convention. Comment out the stub code in the middle of each function and fill out the function with your assembly language implementation.
• Test. Typically a programmer single-steps through each instruction in the new code, making sure it does what they wanted it to do.

PARAMETER PASSING

Normally, parameters are passed between functions (whether written in C or in assembly) via the stack. For example, if a function foo1() calls a function foo2() with two parameters (say, characters x and y), then before control jumps to the start of foo2(), two bytes (the normal size of a character on most systems) are filled with the values that need to be passed. Once control jumps to the new function foo2(), and you use the values (passed as parameters) in the function, they are retrieved from the stack and used.

There are two parameter passing techniques in use,

1. Pass by Value
2. Pass by Reference

Parameter passing techniques can also differ in the order in which arguments are pushed:

• right-to-left (C style)
• left-to-right (Pascal style)

On processors with lots of registers (such as the ARM and the Sparc), the standard calling convention puts *all* the parameters (and even the return address) in registers.


On processors with inadequate numbers of registers (such as the 80x86 and the M8C), all calling conventions are forced to put at least some parameters on the stack or elsewhere in RAM.

Some calling conventions allow "re-entrant code".

PASS BY VALUE

With pass-by-value, a copy of the actual value (the literal content) is passed. For example, if you have a function that accepts two characters like

void foo(char x, char y)
{
    x = x + 1;
    y = y + 2;
    putchar(x);
    putchar(y);
}

and you invoke this function as follows:

char a, b;
a = 'A';
b = 'B';
foo(a, b);

then the program pushes a copy of the ASCII values of 'A' and 'B' (65 and 66 respectively) onto the stack before the function foo is called. You can see that there is no mention of variables 'a' or 'b' in the function foo(). So, any changes that you make to those two values in foo will not affect the values of a and b in the calling function.

PASS BY REFERENCE

Imagine a situation where you have to pass a large amount of data to a function and apply the modifications done in that function to the original variables. An example of such a situation might be a function that converts a string with lower-case letters to upper case. It would be unwise to pass the entire string (particularly if it is a big one) to the function, and, when the conversion is complete, pass the entire result back to the calling function. Instead, we pass the address of the variable to the function. This has two advantages: one, you don't have to pass huge data, thereby saving execution time; and two, you work on the data directly, so that by the end of the function, the data in the calling function is already modified. But remember, any change you make to a variable passed by reference will result in the original variable getting modified. If that's not what you wanted, then you must manually copy the variable before calling the function.


CDECL

In the CDECL calling convention the following holds:

• Arguments are passed on the stack in right-to-left order, and the return value is passed in eax.
• The calling function cleans the stack. This allows CDECL functions to have variable-length argument lists (aka variadic functions). For this reason the number of arguments is not appended to the name of the function by the compiler, and the assembler and the linker are therefore unable to determine if an incorrect number of arguments is used.

Variadic functions usually have special entry code, generated by the va_start(), va_arg() C pseudo-functions.

Consider the following C instructions:

_cdecl int MyFunction1(int a, int b)
{
    return a + b;
}

and the following function call:

x = MyFunction1(2, 3);

These would produce the following assembly listings, respectively:

:_MyFunction1
    push ebp
    mov ebp, esp
    mov eax, [ebp + 8]
    mov edx, [ebp + 12]
    add eax, edx
    pop ebp
    ret

and

    push 3
    push 2
    call _MyFunction1
    add esp, 8

When translated to assembly code, CDECL functions are almost always prepended with an underscore (that's why all previous examples have used "_" in the assembly code).


STDCALL

STDCALL, also known as "WINAPI" (and a few other names, depending on where you are reading it) is used almost exclusively by Microsoft as the standard calling convention for the Win32 API. Since STDCALL is strictly defined by Microsoft, all compilers that implement it do it the same way.

• STDCALL passes arguments right-to-left, and returns the value in eax. (The Microsoft documentation erroneously claims that arguments are passed left-to-right, but this is not the case.)
• The called function cleans the stack, unlike CDECL. This means that STDCALL doesn't allow variable-length argument lists.

Consider the following C function:

_stdcall int MyFunction2(int a, int b)
{
    return a + b;
}

and the calling instruction:

x = MyFunction2(2, 3);

These will produce the following respective assembly code fragments:

:_MyFunction@8
    push ebp
    mov ebp, esp
    mov eax, [ebp + 8]
    mov edx, [ebp + 12]
    add eax, edx
    pop ebp
    ret 8

and

    push 3
    push 2
    call _MyFunction@8

There are a few important points to note here:

1. In the function body, the ret instruction has an (optional) argument that indicates how many bytes to pop off the stack when the function returns.


2. STDCALL functions are name-decorated with a leading underscore, followed by an @, and then the number (in bytes) of arguments passed on the stack. This number will always be a multiple of 4, on a 32-bit aligned machine.

FASTCALL

The FASTCALL calling convention is not completely standard across all compilers, so it should be used with caution. In FASTCALL, the first 2 or 3 32-bit (or smaller) arguments are passed in registers, with the most commonly used registers being edx, eax, and ecx. Additional arguments, or arguments larger than 4-bytes are passed on the stack, often in Right-to-Left order (similar to CDECL). The calling function most frequently is responsible for cleaning the stack, if needed.

Because of the ambiguities, it is recommended that FASTCALL be used only in situations with 1, 2, or 3 32-bit arguments, where speed is essential.

The following C function:

_fastcall int MyFunction3(int a, int b)
{
    return a + b;
}

and the following C function call:

x = MyFunction3(2, 3);

Will produce the following assembly code fragments for the called, and the calling functions, respectively:

:@MyFunction3@8
    push ebp
    mov ebp, esp    ; many compilers create a stack frame even if it isn't used
    add eax, edx    ; a is in eax, b is in edx
    pop ebp
    ret

and

    ; the calling function
    mov eax, 2
    mov edx, 3
    call @MyFunction3@8

The name decoration for FASTCALL prepends an @ to the function name, and follows the function name with @x, where x is the number (in bytes) of arguments passed to the function.


Many compilers still produce a stack frame for FASTCALL functions, especially in situations where the FASTCALL function itself calls another subroutine. However, if a FASTCALL function doesn't need a stack frame, optimizing compilers are free to omit it.

MEETING REAL TIME CONSTRAINTS

CPS (cyber-physical systems) must meet real-time constraints:

• A real-time system must react to stimuli from the controlled object (or the operator) within the time interval dictated by the environment.

• For real-time systems, right answers arriving too late are wrong.

• "A real-time constraint is called hard, if not meeting that constraint could result in a catastrophe" [Kopetz, 1997].

• All other time-constraints are called soft.

• A guaranteed system response has to be explained without statistical arguments

Are CPS, ES and Real-Time Systems synonymous?

• For some embedded systems, real-time behavior is less important (e.g., smart phones).
• For CPS, real-time behavior is essential: every CPS is a real-time system (RTS ⊇ CPS).
• CPS models also include a model of the physical system.
• ES models typically just model the IT components: CPS model = (ES-)IT components model + physical model.
• Typically, CPS are reactive systems: "A reactive system is one which is in continual interaction with its environment and executes at a pace determined by that environment" [Bergé, 1995].
• Dedicated towards a certain application: knowledge about behavior at design time can be used to minimize resources and to maximize robustness.
• Dedicated user interface (no mouse, keyboard and screen).
• The situation is slowly changing here: systems are becoming less dedicated.

Characteristics and Challenges of RTS

• Real-time systems are computing systems in which meeting timing constraints is essential to correctness.
• If the system delivers the correct answer, but after a certain deadline, it can be regarded as having failed.

Types of Real-Time Systems

 Hard real-time system

A system where “something very bad” happens if the deadline is not met.

Examples: control systems for aircraft, nuclear reactors, chemical plants, jet engines, etc.

 Soft real-time system

A system where the performance is degraded below what is generally considered acceptable if the deadline is missed.

Example: multimedia system

UTILITY FUNCTION (TASK VALUE FUNCTION)

ISSUES IN REAL-TIME COMPUTING

• Real-time computing inherits problems from other areas of computing: problems in fault-tolerant computing and operating systems are also problems in real-time computing, with the added complexity of having to meet real-time constraints.
• Real-time computer systems differ from general-purpose systems:
  – They are more specific in their applications.
  – The consequences of their failure are more drastic.
  – Emphasis is placed on meeting task deadlines.

Example Problems

• Example 1: Task Scheduling. A general-purpose system can use round-robin scheduling. This is NOT suitable for real-time systems, because high-priority tasks may miss their deadlines under round-robin scheduling; a priority mechanism is necessary.
• Example 2: Cache Usage. A general-purpose system typically allows the process that is currently executing the right to use the entire cache area. This keeps the cache miss rate low, but has a side effect: task run times become less predictable, which is undesirable for real-time systems.
• Real-time systems can be constructed out of sequential programs, but typically they are built out of concurrent programs, called tasks.
• Tasks are usually divided into:
  – Periodic tasks: an infinite sequence of identical activities, called instances, which are invoked within regular time periods.
  – Non-periodic (aperiodic) tasks: invoked by the occurrence of an event. (Sporadic tasks are aperiodic tasks with a bounded interarrival time.)

SCHEDULING

• Offline scheduling: the scheduler has complete knowledge of the task set and its constraints.
• Online scheduling: scheduling decisions are made during run-time.
• Deadline: the maximum time within which a task must complete its execution with respect to an event.

Real-time systems are divided into two classes: hard and soft real-time systems.

SCHEDULABILITY ANALYSIS

• At this point we must check that the temporal requirements of the system can be satisfied, assuming the time budgets assigned in the detailed design stage.
• In other words, we need to make a schedulability analysis of the system based on the temporal requirements of each component.

REAL TIME OPERATING SYSTEMS

• A multi-tasking OS designed to permit tasks (processes) to complete within precisely stated deadlines:
  – If deadline constraints cannot be met for a new task, it may be rejected.
  – If a new task would result in deadline violations for other tasks, it may be rejected.
• Example commercial real-time operating systems:
  – VRTX (Mentor Graphics)
  – VxWorks and pSOS (Wind River Systems)
  – RTLinux (FSMLabs, later acquired by Wind River Systems)

MULTI-STATE SYSTEMS AND FUNCTION SEQUENCES

Definition: A system that can exist in multiple states (one state at a time) and transition from one state to another.

Characteristics:
• A series of system states.
• Each state requires one or more functions.
• Rules exist to determine when to transition from one state to another.

Typical solution: every time tick, the system checks whether it is time to transition to the next state; when it is, the appropriate control variables are updated to reflect the new state.

Categories:
• Timed multi-state systems: transitions depend only on time.
• Input-based multi-state systems: transitions depend only on external input. Not commonly used, due to the danger of an indefinite wait.
• Input-based/timed multi-state systems: transitions depend both on external input and on time.

Example Timed System: Traffic Light

States: Red, Red-and-Amber, Green, Amber

Time Constants

#define RED_DURATION           20
#define RED_AND_AMBER_DURATION  5
#define GREEN_DURATION         30
#define AMBER_DURATION          5

State Update Code

switch (Light_state_G) {
case RED:
    Red_light   = ON;
    Amber_light = OFF;
    Green_light = OFF;
    if (++Time_in_state == RED_DURATION) {
        Light_state_G = RED_AND_AMBER;
        Time_in_state = 0;
    }
    break;
/* ... the remaining states are handled analogously ... */
}

Example Timed System: Robotic Dinosaur

[States were illustrated in a figure in the original material.]

Input/Timed Systems

• Two or more states.
• Each state is associated with one or more function calls.
• Transitions between states are controlled by a combination of time and user input.

Solution Characteristics
• The system keeps track of time.
• If a certain user input is detected, a state transition occurs.
• If no input occurs for a pre-determined period, a state transition occurs.

EMBEDDED TOOLS

EMULATORS

In computing, an emulator is hardware or software (or both) that duplicates (or emulates) the functions of one computer system (the guest) in another computer system (the host), different from the first one, so that the emulated behavior closely resembles the behavior of the real system (the guest). This focus on exact reproduction of behavior is in contrast to some other forms of computer simulation, in which an abstract model of a system is being simulated. For example, a computer simulation of a hurricane or a chemical reaction is not emulation.

Emulators in computing

Emulation refers to the ability of a computer program in an electronic device to emulate (imitate) another program or device. Many printers, for example, are designed to emulate Hewlett-Packard LaserJet printers because so much software is written for HP printers. If a non-HP printer emulates an HP printer, any software written for a real HP printer will also run in the non-HP printer emulation and produce equivalent printing.

A hardware emulator is an emulator which takes the form of a hardware device. Examples include the DOS-compatible card installed in some old-world Macintoshes like Centris 610 or Performa 630 that allowed them to run PC programs and FPGA-based hardware emulators.

In a theoretical sense, the Church-Turing thesis implies that any operating environment can be emulated within any other. However, in practice, it can be quite difficult, particularly when the exact behavior of the system to be emulated is not documented and has to be deduced through reverse engineering. It also says nothing about timing constraints; if the emulator does not perform as quickly as the original hardware, the emulated software may run much more slowly than it would have on the original hardware, possibly triggering time interrupts that alter performance.

Emulation in preservation

Emulation is a strategy in digital preservation to combat obsolescence. Emulation focuses on recreating an original computer environment, which can be time-consuming and difficult to achieve, but valuable because of its ability to maintain a closer connection to the authenticity of the digital object.[2]

Emulation addresses the original hardware and software environment of the digital object, and recreates it on a current machine.[3] The emulator allows the user to have access to any kind of application or operating system on a current platform, while the software runs as it did in its original environment.[4] Jeffery Rothenberg, an early proponent of emulation as a digital preservation strategy states, "the ideal approach would provide a single extensible, long-term solution that can be designed once and for all and applied uniformly, automatically, and in synchrony (for example, at every refresh cycle) to all types of documents and media".[5] He further states that this should not only apply to out of date systems, but also be upwardly mobile to future unknown systems.[6] Practically speaking, when a certain application is released in a new version, rather than address compatibility issues and migration for every digital object created in the previous version of that application, one could create an emulator for the application, allowing access to all of said digital objects.


Benefits

Basilisk II emulates a Macintosh 68k using interpretation code and dynamic recompilation.

• Emulators allow users to play games for discontinued consoles.
• Emulators maintain the original look, feel, and behavior of the digital object, which is just as important as the digital data itself.[7]
• Despite the original cost of developing an emulator, it may prove to be the more cost-efficient solution over time.[8]
• Emulators reduce labor hours: rather than continuing an ongoing task of continual data migration for every digital object, once the environments of past and present operating systems and application software are established in an emulator, these same environments can be used for every document using those platforms.
• Many emulators have already been developed and released under the GNU General Public License through the open source environment, allowing for wide-scale collaboration.[9]
• Emulators allow software exclusive to one system to be used on another. For example, a PlayStation 2 exclusive could (in theory) be played on a PC or Xbox 360 using an emulator. This is especially useful when the original system is difficult to obtain, or incompatible with modern equipment (e.g. old video game consoles may be unable to connect to modern TVs).

OBSTACLES

Intellectual property - Many technology vendors implemented non-standard features during program development in order to establish their niche in the market, while simultaneously applying ongoing upgrades to remain competitive. While this may have advanced the technology industry and increased vendors' market share, it has left users lost in a preservation nightmare with little supporting documentation, due to the proprietary nature of the hardware and software.

• Copyright laws are not yet in effect to address saving the documentation and specifications of software and hardware in an emulator module.


• Emulators are often used as a copyright infringement tool, since they allow users to play video games without having to buy the console, and rarely make any attempt to prevent the use of illegal copies. This leads to a number of legal uncertainties regarding emulation, and leads to software being programmed to refuse to work if it can tell the host is an emulator; some video games in particular will continue to run, but not allow the player to progress beyond some late stage in the game, often appearing to be faulty or just extremely difficult. These protections make it more difficult to design emulators, since they must be accurate enough to avoid triggering the protections, whose effects may not be obvious.

EMULATORS IN NEW MEDIA ART

Because of its primary use of digital formats, new media art relies heavily on emulation as a preservation strategy. Artists such as Cory Arcangel specialize in resurrecting obsolete technologies in their artwork and recognize the importance of a decentralized and deinstitutionalized process for the preservation of digital culture.

In many cases, the goal of emulation in new media art is to preserve a digital medium so that it can be saved indefinitely and reproduced without error, so that there is no reliance on hardware that ages and becomes obsolete. The paradox is that the emulation and the emulator have to be made to work on future computers.

EMULATION IN FUTURE

Emulation techniques are commonly used during the design and development of new systems. It eases the development process by providing the ability to detect, recreate and repair flaws in the design even before the system is actually built.[15] It is particularly useful in the design of multi-cores systems, where concurrency errors can be very difficult to detect and correct without the controlled environment provided by virtual hardware.[16] This also allows the software development to take place before the hardware is ready,[17] thus helping to validate design decisions.

TYPES OF EMULATORS


Windows XP running an Acorn Archimedes emulator, which is in turn running a Sinclair ZX Spectrum emulator.

Most emulators just emulate a hardware architecture: if an operating system or application software is required for the desired software, it must be provided as well (and may itself be emulated). Both the OS and the software will then be interpreted by the emulator, rather than being run by native hardware. Apart from this interpreter for the emulated binary machine's language, some other hardware (such as input or output devices) must be provided in virtual form as well; for example, if writing to a specific memory location should influence what is displayed on the screen, then this would need to be emulated.

While emulation could, if taken to the extreme, go down to the atomic level, basing its output on a simulation of the actual circuitry from a virtual power source, this would be a highly unusual solution. Emulators typically stop at a simulation of the documented hardware specifications and digital logic. Sufficient emulation of some hardware platforms requires extreme accuracy, down to the level of individual clock cycles, undocumented features, unpredictable analog elements, and implementation bugs. This is particularly the case with classic home computers such as the Commodore 64, whose software often depends on highly sophisticated low-level programming tricks invented by game programmers and the demoscene.

In contrast, some other platforms have had very little use of direct hardware addressing. In these cases, a simple compatibility layer may suffice. This translates system calls for the emulated system into system calls for the host system e.g., the compatibility layer used on *BSD to run closed source Linux native software on FreeBSD, NetBSD and OpenBSD. For example, while the Nintendo 64 graphic processor was fully programmable, most games used one of a few pre-made programs, which were mostly self-contained and communicated with the game via FIFO; therefore, many emulators do not emulate the graphic processor at all, but simply interpret the commands received from the CPU as the original program would.

Developers of software for embedded systems or video game consoles often design their software on especially accurate emulators called simulators before trying it on the real hardware. This is so that software can be produced and tested before the final hardware exists in large quantities, so that it can be tested without taking the time to copy the program to be debugged at a low level and without introducing the side effects of a debugger. In many cases, the simulator is actually produced by the company providing the hardware, which theoretically increases its accuracy.

Math coprocessor emulators allow programs compiled with floating-point math instructions to run on machines that don't have a math coprocessor installed, although the extra work done by the CPU may slow the system down. If a math coprocessor isn't installed or present on the CPU, when the CPU executes a coprocessor instruction it will raise a specific interrupt (coprocessor not available), calling the math emulator routines. When the instruction is successfully emulated, the program continues executing.

STRUCTURE OF AN EMULATOR

Typically, an emulator is divided into modules that correspond roughly to the emulated computer's subsystems. Most often, an emulator will be composed of the following modules:

• a CPU emulator or CPU simulator (the two terms are mostly interchangeable in this case)
• a memory subsystem module
• emulators for the various I/O devices

Buses are often not emulated, either for reasons of performance or simplicity, and virtual peripherals communicate directly with the CPU or the memory subsystem.

MEMORY SUBSYSTEM

It is possible for the memory subsystem emulation to be reduced to a simple array of elements, each sized like an emulated word; however, this model fails very quickly as soon as any location in the computer's logical memory does not match physical memory.

This clearly is the case whenever the emulated hardware allows for advanced memory management (in which case, the MMU logic can be embedded in the memory emulator, made a module of its own, or sometimes integrated into the CPU simulator).

Even if the emulated computer does not feature an MMU, though, there are usually other factors that break the equivalence between logical and physical memory: many (if not most) architectures offer memory-mapped I/O; even those that do not often have a block of logical memory mapped to ROM, which means that the memory-array module must be discarded if the read-only nature of ROM is to be emulated. Features such as bank switching or segmentation may also complicate memory emulation.

As a result, most emulators implement at least two procedures for writing to and reading from logical memory, and it is these procedures' duty to map every access to the correct location of the correct object.

On a base-limit addressing system where memory from address 0 to address ROMSIZE-1 is read-only memory, while the rest is RAM, something along the lines of the following procedures would be typical:

void WriteMemory(word Address, word Value)
{
    word RealAddress;
    RealAddress = Address + BaseRegister;
    /* >= ROMSIZE: addresses 0..ROMSIZE-1 are read-only */
    if ((RealAddress < LimitRegister) && (RealAddress >= ROMSIZE)) {
        Memory[RealAddress] = Value;
    } else {
        RaiseInterrupt(INT_SEGFAULT);
    }
}

word ReadMemory(word Address)
{
    word RealAddress;
    RealAddress = Address + BaseRegister;
    if (RealAddress < LimitRegister) {
        return Memory[RealAddress];
    } else {
        RaiseInterrupt(INT_SEGFAULT);
        return NULL;
    }
}

CPU SIMULATOR

The CPU simulator is often the most complicated part of an emulator. Many emulators are written using "pre-packaged" CPU simulators, in order to concentrate on good and efficient emulation of a specific machine.

The simplest form of a CPU simulator is an interpreter, which is a computer program that follows the execution flow of the emulated program code and, for every instruction encountered, executes operations on the host processor that are semantically equivalent to the original instructions.

This is made possible by assigning a variable to each register and flag of the simulated CPU. The logic of the simulated CPU can then more or less be directly translated into software algorithms, creating a software re-implementation that basically mirrors the original hardware implementation.

The following example illustrates how CPU simulation can be accomplished by an interpreter. In this case, interrupts are checked for before every instruction executed, though this behavior is rare in real emulators for performance reasons (it is generally faster to use a subroutine to do the work of an interrupt).

void Execute(void)
{
    if (Interrupt != INT_NONE) {
        SuperUser = TRUE;
        WriteMemory(++StackPointer, ProgramCounter);
        ProgramCounter = InterruptPointer;
    }
    switch (ReadMemory(ProgramCounter++)) {
    /*
     * Handling of every valid instruction
     * goes here...
     */
    default:
        Interrupt = INT_ILLEGAL;
    }
}

Interpreters are very popular as computer simulators, as they are much simpler to implement than more time-efficient alternative solutions, and their speed is more than adequate for emulating computers of more than roughly a decade ago on modern machines.

I/O

Most emulators do not, as mentioned earlier, emulate the main system bus; each I/O device is thus often treated as a special case, and no consistent interface for virtual peripherals is provided.

This can result in a performance advantage, since each I/O module can be tailored to the characteristics of the emulated device; designs based on a standard, unified I/O API can, however, rival such simpler models, if well thought-out, and they have the additional advantage of "automatically" providing a plug-in service through which third-party virtual devices can be used within the emulator.

A unified I/O API may not necessarily mirror the structure of the real hardware bus: bus design is limited by several electric constraints and a need for hardware concurrency management that can mostly be ignored in a software implementation.

Even in emulators that treat each device as a special case, there is usually a common basic infrastructure for:

 managing interrupts, by means of a procedure that sets flags readable by the CPU simulator whenever an interrupt is raised, allowing the virtual CPU to "poll for (virtual) interrupts"

 writing to and reading from physical memory, by means of two procedures similar to the ones dealing with logical memory (although, contrary to the latter, the former can often be left out, and direct references to the memory array be employed instead)
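A minimal sketch of that shared infrastructure, using illustrative names chosen to match the Execute() fragment earlier in the section (the memory size and flag values are assumptions, not from any particular emulator):

```c
#include <stdint.h>

#define MEM_SIZE  65536
#define INT_NONE  0

uint8_t Memory[MEM_SIZE];       /* simulated physical memory */
int     Interrupt = INT_NONE;   /* flag polled by the CPU loop */

uint8_t ReadMemory(uint16_t addr)             { return Memory[addr]; }
void    WriteMemory(uint16_t addr, uint8_t v) { Memory[addr] = v; }

/* A virtual peripheral "raises" an interrupt simply by setting the
   flag; the simulated CPU notices it before its next instruction. */
void RaiseInterrupt(int num) { Interrupt = num; }
```

A virtual device model calls RaiseInterrupt() exactly where the real hardware would assert an interrupt line, which is what lets the CPU simulator "poll for (virtual) interrupts" as described above.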


VIDEO GAME CONSOLE EMULATORS

Video game console emulators are programs that allow a personal computer or video game console to emulate another video game console. They are most often used to play older video games on personal computers and more contemporary consoles, but they are also used to translate games into other languages, to modify existing games, and in the development of homebrew demos and new games for older systems. The internet has helped the spread of console emulators, as most, if not all, of the emulated consoles would be unavailable for sale in retail outlets. Examples of console emulators released in the last two decades are Dolphin, ZSNES, Kega Fusion, DeSmuME, ePSXe, Project64, VisualBoyAdvance, NullDC and Nestopia.

TERMINAL EMULATORS

Terminal emulators are software programs that provide modern computers and devices with interactive access to applications running on operating systems or other host systems such as HP-UX or OpenVMS. Terminals such as the IBM 3270 or VT100, and many others, are no longer produced as physical devices. Instead, software running on modern operating systems simulates a "dumb" terminal and is able to render the graphical and text elements of the host application, send keystrokes and process commands using the appropriate terminal protocol. Some terminal emulation applications include Attachmate Reflection, IBM Personal Communications, Stromasys CHARON-VAX/AXP and Micro Focus Rumba.

IN-CIRCUIT EMULATOR

An in-circuit emulator (ICE) is a hardware device used to debug the software of an embedded system. Historically it took the form of a bond-out processor, which has many internal signals brought out for the purpose of debugging. These signals provide information about the state of the processor.

More recently the term also covers JTAG based hardware debuggers which provide equivalent access using on-chip debugging hardware with standard production chips. Using standard chips instead of custom bond-out versions makes the technology ubiquitous and low cost, and eliminates most differences between the development and runtime environments. In this common case, the in-circuit emulator term is a misnomer, sometimes confusingly so, because emulation is no longer involved.

Embedded systems present special problems for a programmer because they usually lack keyboards, monitors, disk drives and other user interfaces that are present on computers. These shortcomings make in-circuit software debugging tools essential for many common development tasks.

In-circuit emulation can also refer to the use of hardware emulation, when the emulator is plugged into a system (not always embedded) in place of a yet-to-be-built chip (not always a processor). These in-circuit emulators provide a way to run the system with "live" data while still allowing relatively good debugging capabilities. It can be useful to compare this with an in-target probe (ITP) sometimes used on enterprise servers.

FUNCTION

An in-circuit emulator provides a window into the embedded system. The programmer uses the emulator to load programs into the embedded system, run them, step through them slowly, and view and change data used by the system's software.

An "emulator" gets its name because it emulates (imitates) the central processing unit (CPU) of the embedded system's computer. Traditionally it had a plug that inserted into the socket where the CPU chip would normally be placed. Most modern systems use the target system's CPU directly, with special JTAG-based debug access. Emulating the processor, or direct JTAG access to it, lets the ICE do anything that the processor can do, but under the control of a software developer.

ICEs attach a terminal or PC to the embedded system. The terminal or PC provides an interactive user interface for the programmer to investigate and control the embedded system. For example, it is routine to have a source-level debugger with a graphical windowing interface that communicates through a JTAG adapter ("emulator") to an embedded target system which has no user interface of its own.

Notably, when their program fails, most embedded systems simply become inert lumps of nonfunctioning electronics. Embedded systems often lack basic functions to detect signs of software failure, such as an MMU to catch memory access errors. Without an ICE, the development of embedded systems can be extremely difficult, because there is usually no way to tell what went wrong. With an ICE, the programmer can usually test pieces of code, then isolate the fault to a particular section of code, and then inspect the failing code and rewrite it to solve the problem.

In use, an ICE provides the programmer with execution breakpoints, memory display and monitoring, and input/output control. Beyond this, the ICE can be programmed to look for any range of matching criteria to pause at, in an attempt to identify the origin of the failure.

Most modern microcontrollers utilize resources provided on the manufactured version of the microcontroller for device programming, emulation and debugging features, instead of needing another special emulation-version (that is, bond-out) of the target microcontroller.[1] Even though it is a cost-effective method, since the ICE unit only manages the emulation instead of actually emulating the target microcontroller, trade-offs have to be made in order to keep the prices low at manufacture time, yet provide enough emulation features for the (relatively few) emulation applications.

ADVANTAGES

Virtually all embedded systems have a hardware element and a software element, which are separate but tightly interdependent. The ICE allows the software element to be run and tested on the actual hardware on which it is to run, but still allows programmer conveniences to help isolate faulty code, such as "source-level debugging" (which shows the program the way the programmer wrote it) and single-stepping (which lets the programmer run the program step-by-step to find errors).

Most ICEs consist of an adaptor unit that sits between the ICE host computer and the system to be tested. A header and cable assembly connects the adaptor to a socket where the actual CPU or microcontroller mounts within the embedded system. Recent ICEs enable a programmer to access the on-chip debug circuit that is integrated into the CPU via JTAG or BDM (Background Debug Mode) in order to debug the software of an embedded system. These systems often use a standard version of the CPU chip, and can simply attach to a debug port on a production system. They are sometimes called in-circuit debuggers or ICDs, to distinguish the fact that they do not replicate the functionality of the CPU, but instead control an already existing, standard CPU. Since the CPU does not have to be replaced, they can operate on production units where the CPU is soldered in and cannot be replaced. On x86 Pentiums, a special 'probe mode' is used by ICEs to aid in debugging.[2]

In the context of embedded systems, the ICE is not emulating hardware. Rather, it is providing direct debug access to the actual CPU. The system under test is under full control, allowing the developer to load, debug and test code directly.

Most host systems are ordinary commercial computers unrelated to the CPU used for development - for example, a Linux PC might be used to develop software for a system using a Freescale 68HC11 chip, which itself could not run Linux.

The programmer usually edits and compiles the embedded system's code on the host system, as well. The host system will have special compilers that produce executable code for the embedded system. These are called cross compilers/cross assemblers.

SIMULATORS

Software instruction simulators provide simulated program execution with read and write access to the internal processor registers. Beyond that, the tools vary in capability. Some are limited to simulation of instruction execution only. Most offer breakpoints, which allow fast execution until a specified instruction is executed. Many also offer trace capability, which shows instruction execution history.
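The trace capability mentioned above can be sketched as a small ring buffer of executed instruction addresses, so the most recent history can be displayed when execution stops. The names and buffer depth below are illustrative:

```c
#include <stdint.h>

#define TRACE_DEPTH 8   /* how many recent addresses to keep */

uint16_t      trace_buf[TRACE_DEPTH];
unsigned long trace_count = 0;   /* total instructions recorded */

/* Called by the simulator once per executed instruction. */
void trace_record(uint16_t pc) {
    trace_buf[trace_count % TRACE_DEPTH] = pc;
    trace_count++;
}

/* Fetch the i-th most recent recorded address (0 = newest);
   caller must ensure i < trace_count. */
uint16_t trace_recent(unsigned i) {
    return trace_buf[(trace_count - 1 - i) % TRACE_DEPTH];
}
```

When a breakpoint hits, walking trace_recent(0), trace_recent(1), ... reproduces the instruction execution history the text describes.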

While instruction simulation is useful for algorithm development, embedded systems by their very nature require access to peripherals such as I/O ports, timers, A/Ds, and PWMs. Advanced simulators help verify timing and basic peripheral operation, including I/O pins, interrupts, and status and control registers.

These tools provide various stimulus inputs ranging from pushbuttons connected to I/O pin inputs, logic vector I/O input stimulus files, regular clock inputs, and internal register value injection for simulating A/D conversion data or serial communication input. Many embedded systems can effectively be debugged using proper peripheral stimulus.

Simulators offer the lowest-cost development environment. However, many real-time systems are difficult to debug with simulation only. Simulators also typically run at speeds 100 to 1,000 times slower than the actual embedded processor, so long timeout delays must be eliminated when simulating.

IN-CIRCUIT EMULATORS

In-circuit emulators are plugged into a system in place of the embedded processor. They offer real-time code execution, full peripheral implementation, and breakpoint capability. High-end emulators also offer real-time trace buffers, and some will time-stamp instruction execution for code profiling (see figure).

An in-circuit emulator is plugged into a target system in place of the embedded processor.

Some in-circuit emulators have special ASICs or FPGAs that imitate core processor code execution and peripherals, but there may be behavioral differences between the actual device and the emulator. The problem is sidestepped with "bond-out" emulation devices that provide direct I/O and peripheral access using the same circuit technology as the processor and that provide access to the internal data registers, program memory, and peripherals.

This is accomplished by using emulation RAM instead of the processor's internal program memory. The microcontroller firmware is downloaded into the emulation RAM and the bond-out processor executes instructions using the same data registers and peripherals as the target processor. The I/Os of the bond-out are made available on a socket that is plugged into the system under development instead of the target processor being emulated.

State-of-the-art emulators provide multilevel conditional breakpoints and instruction trace, including code coverage and timestamping. Some even allow code changes without halting the processor.

There is a large gap in performance and price between software simulators and hardware emulators. Between the two extremes, there is a lot of room for intermediate solutions.


Burn-and-learn method

Simulation is most often used with the burn-and-learn method of run-time firmware development. A chip is burned with a device programmer and plugged into the hardware; then, typically, the system crashes.

At this point, an attempt is made to figure out what went wrong; the source code is changed, the executable is rebuilt, and another chip is burned. This cycle is repeated until the chip works properly.

Routines can be added to dump vital debugging information to a serial port for display on a terminal. I/O pins can be toggled to indicate program flow.

If more symptoms are provided by the system as it runs, then more logical changes to the source code can be made. Overall, this method of debugging is inefficient, slow, and tedious.
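The serial-dump and pin-toggle instrumentation just described can be sketched as follows. On real hardware, DEBUG_PUTS would write to a UART data register and DEBUG_PIN_TOGGLE would flip a port bit visible on a scope or LED; here a shadow variable stands in for the port so the idea runs on a host. All names are illustrative:

```c
#include <stdio.h>

unsigned char port_shadow = 0;    /* simulated output port register */
#define DEBUG_PIN          0x01
#define DEBUG_PIN_TOGGLE() (port_shadow ^= DEBUG_PIN)
#define DEBUG_PUTS(s)      fputs("DBG: " s "\n", stderr)  /* stand-in for a UART dump */

/* Hypothetical task instrumented for burn-and-learn debugging. */
void vSendPacket(void) {
    DEBUG_PIN_TOGGLE();            /* mark entry on the debug pin */
    DEBUG_PUTS("sending packet");  /* progress message to the "serial port" */
    /* ... real transmit work would go here ... */
    DEBUG_PIN_TOGGLE();            /* mark exit */
}
```

Watching the pin on a scope shows how long vSendPacket() runs and how often it is entered, which is often the only visibility available with this method.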

In-circuit simulators

When simulating code where branches are conditional on the state of an input pin or other hardware condition, a simulator can get information on that state from the hardware using simulator stimulus. This is the first step away from software-only simulation, allowing a certain amount of hardware debug.

This approach is taken by integrating the capability of a simulator with a communication module that acts as the target processor. Stimulus is provided to the simulation directly from the processor's digital input pins, which allows the simulator to set binary values on the output pins.

However, such tools run at the speed of a simulator, so they can't set and clear output pins fast enough to implement timing-critical features like a software UART. They often don't support complex peripheral features such as A/Ds and PWMs.

Run-time monitors

In-circuit simulators provide a communication channel between the host development system and the hardware, with something like a UART in the hardware to provide the communication. The host can then issue commands to the processor to get it to perform debug functions such as setting or reading memory contents.

Some code execution control is also possible. This multiplexed execution of code under development and communication of debug information with a cross-development host is usually referred to as a run-time monitor.

Software breakpoints can be built in at compile time. Other features, such as hooking into a periodic timer interrupt to copy data of interest from the target to the host, can also be included when building the executable. Since these techniques don't require writing to program memory at run time, they can be used with burn-and-learn debug using one-time-programmable devices.

The routines required to transfer debug data to the host or to download a new executable--the monitor code itself--occupy a certain amount of program memory on the target device. They also use data memory and bandwidth in addition to the UART or other communication device. However, most of this overhead occurs when code execution is not in progress.

Use of a software run-time monitor is a great step forward from simulation in isolation from the target hardware. Adding just a few simple support features in the silicon of the target processor can turn the monitor into a system that provides all the basic features of an emulator.

In-circuit debuggers

The system becomes more than a software monitor when custom silicon features are added to the target processor. Development tools with special silicon features to support code debug and having serial communication between host and target are typically referred to as "debuggers."

In the past, some systems have used external RAM program memory to support this feature. Now reprogrammable flash memory has made in-circuit debuggers (ICDs) practical for single-chip embedded microcontrollers. ICDs allow the embedded processor to "self-emulate."

A good example of an in-circuit debugger is the MPLAB-ICD, a powerful, low-cost, run-time development tool. MPLAB-ICD uses the in-circuit debugging capability built into the PIC16F87X or PIC18Fxxx microcontrollers.

This feature, along with the In-Circuit Serial Programming (ICSP) protocol, offers cost-effective in-circuit flash program load and debugging from the graphical user interface of the MPLAB Integrated Development Environment. The designer can develop and debug source code by watching variables, single-stepping, and setting breakpoints. Running the device at full speed enables testing hardware in real time.

The in-circuit debugger consists of three basic components: the ICD module, ICD header, and ICD demo board. The MPLAB software environment is connected to the ICD module via a serial or full-speed USB port.

When instructed by MPLAB, the ICD module programs and issues debug commands to the target microcontroller using the ICSP protocol, which is communicated via a five-conductor cable using a modular RJ-12 plug and jack. A modular jack can be designed into a target circuit board to support direct connection to the ICD module, or the ICD header can be used to plug into a DIP socket.

The ICD header contains a target microcontroller and a modular jack to connect to the ICD module. Male 40- and 28-pin DIP headers are provided to plug into a target circuit board. The ICD header may be plugged into either the included ICD demo board or the user's custom hardware.


The demo board provides 40- and 28-pin DIP sockets that will accept either a microcontroller device or the ICD header. It also offers LEDs, DIP switches, an analog input, and a prototyping area.

Immediate prototype development and evaluation are feasible even if hardware isn't yet available. The complete hardware development system, along with the host software, provides a powerful run-time development tool at a very reasonable price.

Run-time operation

The debug kernel is downloaded along with the target firmware via the ICSP interface. A nonmaskable interrupt vectors execution to the kernel under three conditions: when the program counter equals a preselected hardware breakpoint address, after a single step, or when a halt command is received from MPLAB.

As with all interrupts, this pushes the return address onto the stack. On reset, the breakpoint register is set equal to the reset vector, so the kernel is entered immediately when the device comes out of any reset.

The ICD module issues a reset to the target microcontroller immediately after a download. Thus, after a download the kernel is entered, and control is passed to the host.

The user can now issue commands to the target processor and modify or interrogate all RAM registers, including the PC and other special function registers. The user can single-step, set a breakpoint, animate, or start full-speed execution.

Once started, a halt of program execution causes the PC address prior to kernel entry to be stored, which allows the software to display where execution halted in the source code. When the host is commanded to run again, the kernel code executes a return from interrupt instruction, and execution continues at the address that pops off the hardware stack.
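The halt-and-resume mechanism described above can be modeled on a host in a few lines. The register names and the single hardware breakpoint below illustrate the mechanism only; they are not taken from any particular PIC part:

```c
#include <stdint.h>

uint16_t breakpoint_reg;   /* loaded by the host before execution starts */
uint16_t saved_pc;         /* return address "pushed" on kernel entry */
int      in_kernel = 0;    /* nonzero while the debug kernel has control */

/* Entering the kernel saves the halt address so the host can show
   where execution stopped in the source code. */
void enter_debug_kernel(uint16_t pc) {
    saved_pc  = pc;
    in_kernel = 1;
}

/* Called before executing the instruction at pc: the comparator
   matches the PC against the breakpoint register. */
void check_breakpoint(uint16_t pc) {
    if (pc == breakpoint_reg)
        enter_debug_kernel(pc);
}

/* "Return from interrupt": resume at the saved address. */
uint16_t resume(void) {
    in_kernel = 0;
    return saved_pc;
}
```

On the real device the same comparison happens in silicon and the save/restore uses the hardware stack, but the control flow is the one sketched here.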

Silicon support requirements

The breakpoint address register and comparator make up most of what is needed in silicon, along with some logic to single-step and recognize asynchronous commands from the host. Since the ICSP programming interface is already in place to support programming of the device, this doesn't constitute an additional silicon requirement. When this channel is used for in-circuit debug, these two I/O pins may not be used for other run-time functions.

EMBEDDED SYSTEM DEVELOPMENT TOOLS

Emulators, simulators, and in-circuit debuggers all have features useful to the designer (see table).

Features of emulators, simulators, and in-circuit debuggers

Parameter                                          Burn and learn   Software simulation   In-circuit simulation        In-circuit emulator   In-circuit debugger
Real-time execution                                Yes              No                    No                           Yes                   Yes
Low cost                                           Yes/Free         Yes/Low/Free          Yes/Low                      No/High               Yes/Low
Full device peripheral implementation              Yes              No/Limited            Yes/Limited                  Yes                   Yes
Automatic download of new program                  No               Yes                   Yes                          Yes                   Yes
No loss of target device I/O pins when debugging   Yes              Yes/Sometimes         Yes/Sometimes                Yes                   No
Target system cabling                              None             None                  Complex                      Complex               Simple
View and modify RAM and peripherals                No               Yes                   Yes/Limited                  Yes                   Yes
I/O resources needed to support debugging          None             None                  Serial port, some I/O pins   None                  Two or fewer I/O pins
No loss of program memory or data RAM              No               Yes                   No                           Yes                   No
Simple connectivity of debugger to target          None             None                  No/Complex                   No/Complex            Yes
Real-time trace buffer                             None             No/Limited            No/Limited                   Yes                   No
Single stepping                                    None             Yes                   Yes                          Yes                   Yes
Hardware breakpoints                               None             Yes/Unlimited         One/Unlimited                One/Unlimited         Yes/One


With the basic run-time features of single-step, full-speed execution and watching variables, an in-circuit debugger can help in a lot of designs where only an emulator would have done before. Investing in the right development tools up front will pay back handsomely in faster development and debug cycles and a shorter time to market.

DEBUGGING

Debugging is a methodical process of finding and reducing the number of bugs, or defects, in a computer program or a piece of electronic hardware, thus making it behave as expected. Debugging tends to be harder when various subsystems are tightly coupled, as changes in one may cause bugs to emerge in another. Many books have been written about debugging, as it involves numerous aspects, including interactive debugging, control flow analysis, integration testing, log files, monitoring (application, system), memory dumps, profiling, statistical process control, and special design tactics to improve detection while simplifying changes.

ORIGIN

There is some controversy over the origin of the term "debugging".

The terms "bug" and "debugging" are both popularly attributed to Admiral Grace Hopper in the 1940s.[1] While she was working on a Mark II computer at Harvard University, her associates discovered a moth stuck in a relay, thereby impeding operation, whereupon she remarked that they were "debugging" the system. However, the term "bug", in the sense of a technical error, dates back at least to 1878 and Thomas Edison, and "debugging" seems to have been used as a term in aeronautics before entering the world of computers. Indeed, in an interview Grace Hopper remarked that she was not coining the term; the moth fit the already existing terminology, so it was saved.

The Oxford English Dictionary entry for "debug" quotes the term "debugging" used in reference to airplane engine testing in a 1945 article in the Journal of the Royal Aeronautical Society. An article in "Airforce" (June 1945, p. 50) also refers to debugging, this time of aircraft cameras. Hopper's bug was found on September 9, 1947. The term was not adopted by computer programmers until the early 1950s. The seminal article by Gill in 1951 is the earliest in-depth discussion of programming errors, but it does not use the term "bug" or "debugging". In the ACM's digital library, the term "debugging" is first used in three papers from the 1952 ACM National Meetings. Two of the three use the term in quotation marks. By 1963, "debugging" was a common enough term to be mentioned in passing without explanation on page 1 of the CTSS manual.

SCOPE

As software and electronic systems have become generally more complex, the various common debugging techniques have expanded with more methods to detect anomalies, assess impact, and schedule software patches or full updates to a system. The words "anomaly" and "discrepancy" can be used, as being more neutral terms, to avoid the words "error" and "defect" or "bug" where there might be an implication that all so-called errors, defects or bugs must be fixed (at all costs). Instead, an impact assessment can be made to determine if changes to remove an anomaly (or discrepancy) would be cost-effective for the system, or perhaps a scheduled new release might render the change(s) unnecessary. Not all issues are life-critical or mission-critical in a system. Also, it is important to avoid the situation where a change might be more upsetting to users, long-term, than living with the known problem(s) (where the "cure would be worse than the disease"). Basing decisions on the acceptability of some anomalies can avoid a culture of a "zero-defects" mandate, where people might be tempted to deny the existence of problems so that the result would appear as zero defects. Considering the collateral issues, such as the cost-versus-benefit impact assessment, broader debugging techniques will expand to determine the frequency of anomalies (how often the same "bugs" occur) to help assess their impact on the overall system.

TOOLS

Debugging on video game consoles is usually done with special hardware, such as an Xbox debug unit intended only for developers.

Debugging ranges, in complexity, from fixing simple errors to performing lengthy and tiresome tasks of data collection, analysis, and scheduling updates. The debugging skill of the programmer can be a major factor in the ability to debug a problem, but the difficulty of software debugging varies greatly with the complexity of the system, and also depends, to some extent, on the programming language(s) used and the available tools, such as debuggers. Debuggers are software tools which enable the programmer to monitor the execution of a program, stop it, restart it, set breakpoints, and change values in memory. The term debugger can also refer to the person who is doing the debugging.

Generally, high-level programming languages, such as Java, make debugging easier, because they have features such as exception handling that make real sources of erratic behaviour easier to spot. In programming languages such as C or assembly, bugs may cause silent problems such as memory corruption, and it is often difficult to see where the initial problem happened. In those cases, memory debugger tools may be needed.

In certain situations, general purpose software tools that are language specific in nature can be very useful. These take the form of static code analysis tools. These tools look for a very specific set of known problems, some common and some rare, within the source code. All such issues detected by these tools would rarely be picked up by a compiler or interpreter, thus they are not syntax checkers, but more semantic checkers. Some tools claim to be able to detect 300+ unique problems. Both commercial and free tools exist in various languages. These tools can be extremely useful when checking very large source trees, where it is impractical to do code walkthroughs. A typical example of a problem detected would be a variable dereference that occurs before the variable is assigned a value. Another example would be to perform strong type checking when the language does not require such. Thus, they are better at locating likely errors, versus actual errors. As a result, these tools have a reputation of false positives. The old Unix lint program is an early example.

For debugging electronic hardware (e.g., computer hardware) as well as low-level software (e.g., BIOSes, device drivers) and firmware, instruments such as oscilloscopes, logic analyzers or in-circuit emulators (ICEs) are often used, alone or in combination. An ICE may perform many of the typical software debugger's tasks on low-level software and firmware.

TYPICAL DEBUGGING PROCESS

Normally the first step in debugging is to attempt to reproduce the problem. This can be a non-trivial task, for example as with parallel processes or some unusual software bugs. Also, specific user environment and usage history can make it difficult to reproduce the problem.

After the bug is reproduced, the input of the program may need to be simplified to make it easier to debug. For example, a bug in a compiler can make it crash when parsing some large source file. However, after simplification of the test case, a few lines from the original source file can be sufficient to reproduce the same crash. Such simplification can be done manually, using a divide-and-conquer approach. The programmer will try to remove some parts of the original test case and check if the problem still exists. When debugging the problem in a GUI, the programmer can try to skip some user interaction from the original problem description and check if the remaining actions are sufficient for the bug to appear.

After the test case is sufficiently simplified, a programmer can use a debugger tool to examine program states (values of variables, plus the call stack) and track down the origin of the problem(s). Alternatively, tracing can be used. In simple cases, tracing is just a few print statements, which output the values of variables at certain points of program execution.

TECHNIQUES

 Print debugging (or tracing) is the act of watching (live or recorded) trace statements, or print statements, that indicate the flow of execution of a process. This is sometimes called printf debugging, due to the use of the printf statement in C. This kind of debugging was turned on by the command TRON in the original versions of the novice-oriented BASIC programming language. TRON stood for "Trace On": it caused the line numbers of each BASIC command line to print as the program ran.
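A minimal C sketch of print debugging: a TRACE macro that tags each message with the file and line it came from, and compiles away when tracing is disabled. The DEBUG_TRACE flag and the instrumented function are illustrative:

```c
#include <stdio.h>

#define DEBUG_TRACE 1

#if DEBUG_TRACE
#define TRACE(msg) fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, msg)
#else
#define TRACE(msg) ((void)0)   /* compiled out in release builds */
#endif

/* Hypothetical function instrumented with trace statements. */
int parse_length(const char *s) {
    TRACE("entering parse_length");
    int n = 0;
    while (*s) { n++; s++; }
    TRACE("leaving parse_length");
    return n;
}
```

Because the macro expands to nothing when DEBUG_TRACE is 0, the trace statements can be left in the source permanently without affecting the shipped build.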


 Remote debugging is the process of debugging a program running on a system different than the debugger. To start remote debugging, a debugger connects to a remote system over a network. The debugger can then control the execution of the program on the remote system and retrieve information about its state.

 Post-mortem debugging is debugging of the program after it has already crashed. Related techniques often include various tracing techniques and/or analysis of a memory dump (or core dump) of the crashed process. The dump of the process could be obtained automatically by the system (for example, when the process has terminated due to an unhandled exception), or by a programmer-inserted instruction, or manually by the interactive user.

 "Wolf fence" algorithm: Edward Gauss described this simple but very useful and now famous algorithm in a 1982 article for Communications of the ACM as follows: "There's one wolf in Alaska; how do you find it? First build a fence down the middle of the state, wait for the wolf to howl, determine which side of the fence it is on. Repeat process on that side only, until you get to the point where you can see the wolf."[9] This is implemented, for example, in the Git version control system as the command git bisect, which uses the above algorithm to determine which commit introduced a particular bug.
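The wolf-fence idea reduces to a binary search over an ordered range. The sketch below assumes an is_bad() predicate that reports whether the bug is present at or after a given step, so once the predicate turns true it stays true; the function and predicate names are illustrative:

```c
/* Binary search for the first "bad" step in [lo, hi):
   the same scheme git bisect applies to commit history. */
int find_first_bad(int lo, int hi, int (*is_bad)(int)) {
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (is_bad(mid))
            hi = mid;        /* the wolf is on this side of the fence */
        else
            lo = mid + 1;    /* the wolf is on the other side */
    }
    return lo;               /* first index where is_bad() holds */
}

/* Example predicate: the bug appears at step 42 and persists after. */
int bug_at_42(int i) { return i >= 42; }
```

Each probe halves the remaining range, so even a history of thousands of steps needs only a handful of tests.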

 Delta Debugging: a technique for automating test case simplification.[10]:p.123

 Saff Squeeze: a technique for isolating the failure within a test, using progressive inlining of parts of the failing test.[11]

DEBUGGING FOR EMBEDDED SYSTEMS

In contrast to the general purpose computer software design environment, a primary characteristic of embedded environments is the sheer number of different platforms available to the developers (CPU architectures, vendors, operating systems and their variants). Embedded systems are, by definition, not general-purpose designs: they are typically developed for a single task (or a small range of tasks), and the platform is chosen specifically to optimize that application. Not only does this fact make life tough for embedded system developers, it also makes debugging and testing of these systems harder, since different debugging tools are needed for different platforms. Debugging of embedded systems is typically performed for two purposes:

 to identify and fix bugs in the system (e.g. logical or synchronization problems in the code, or a design error in the hardware);

 to collect information about the operating states of the system that may then be used to analyze the system: to find ways to improve its performance or to optimize other important characteristics (e.g. energy consumption, reliability, real-time response etc.).

DEBUGGING TECHNIQUES

 Introduction

 Rule of thumb: write good, bug-free code from the start if you can

 Testing and debugging embedded software is more difficult than testing and debugging application software

 Post-shipment problems are more tolerable in desktop application software than in embedded (real-time or life-critical) software

 10.1 Testing on Host Machine

 Some reasons why you can't test (much, if at all) on the target machine:

 Test early (the target may not be ready or completely stable)

 Exercise all code, including exceptions (real situations may be difficult to exercise)

 Develop reusable, repeatable tests (difficult to do in the target environment, where the likelihood of hitting the same bug twice is low)

 Store test results (the target may not even have a disk drive to store results)

 Target system on the left: (hardware-independent code, hardware-dependent code, hardware)

 Test system (on the host) on the right: (hardware-independent code – same; scaffold – the rest)

 The scaffold provides (in software) all the functionality and calls to hardware found in the hardware-dependent and hardware components of the target system – in effect, a simulator for them!

SCE DEPT OF IT 107 IT2354 EMBEDDED SYSTEMS

 Radio.c – hardware-independent code

 Radiohw.c – hardware-dependent code (the only interface to the hardware: inp() and outp(), supporting the vTurnOnTransmitter() and vTurnOffTransmitter() functions)

 inp() and outp() must contain real hardware code to read/write byte data correctly – this makes testing harder!

 Replace Radiohw.c with the scaffold, eliminating the need for real inp() and outp() – both are simulated in software as program stubs!
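The stub idea above can be sketched in host-side C. This is a minimal illustration, not the book's code: the port address, the control-register bit, and the frequency handling are all invented; only the names inp(), outp(), vTurnOnTransmitter() and vTurnOffTransmitter() come from the text.

```c
#include <assert.h>

/* Simulated I/O space: the stubs read and write here instead of hardware */
#define RADIO_CTRL_PORT 0x40   /* hypothetical control-register address */
#define RADIO_ON_BIT    0x01   /* hypothetical "transmitter on" bit */

static unsigned char simulated_ports[256];

/* Program stub: reads from the simulated port map instead of the hardware */
static unsigned char inp(unsigned char port)
{
    return simulated_ports[port];
}

/* Program stub: writes to the simulated port map instead of the hardware */
static void outp(unsigned char port, unsigned char value)
{
    simulated_ports[port] = value;
}

/* The hardware-dependent functions now run unchanged against the stubs */
void vTurnOnTransmitter(unsigned char frequency)
{
    (void)frequency;   /* frequency handling omitted in this sketch */
    outp(RADIO_CTRL_PORT,
         (unsigned char)(inp(RADIO_CTRL_PORT) | RADIO_ON_BIT));
}

void vTurnOffTransmitter(void)
{
    outp(RADIO_CTRL_PORT,
         (unsigned char)(inp(RADIO_CTRL_PORT) & (unsigned char)~RADIO_ON_BIT));
}
```

Because the stubs keep the same names and signatures as the real routines, the hardware-dependent code compiles and runs on the host without modification, and the test scaffold can inspect simulated_ports[] to verify behavior.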

Figure 10.2 A Poor Plan for Testing

/* File: radio.c */

void vRadioTask (void)
{
    . . .
    !! Complicated code to determine if turning on the radio now
    !! is within FCC regulations.
    . . .
    !! More complicated code to decide if turning on the radio now
    !! makes sense in our protocol.

    if (!! Time to send something on the radio)
    {
        vTurnOnTransmitter (FREQ_NORMAL);
        !! Send data out
        vTurnOffRadio ();
    }
}

(continued)


 Embedded systems are interrupt-driven, so tests must be driven by interrupts

 1) Divide interrupt routines into two components

 A) a component that deals with the hardware

 B) a component of the routine which deals with the rest of the system

 2) To test, structure the routine such that the hardware-dependent component (A) calls the hardware-independent part (B).

 3) Write component B in C-language, so that the test scaffold can call it

 The hardware component (A) is vHandleRxHardware(), which reads characters from the hardware

 The software component (B) is vHandleByte(), called by A to buffer characters, among other things

 The test scaffold, vTestMain(), then calls vHandleByte() to test whether the system works [where vTestMain() pretends to be the hardware sending the characters to vHandleByte()]
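The split described above can be sketched as follows. This is an illustration under stated assumptions: vHandleByte() and vTestMain() are named in the text, but the buffer layout and the idea of feeding a string of "pretend hardware" bytes are inventions for this sketch.

```c
#include <assert.h>

#define RX_BUF_SIZE 64

static char rx_buffer[RX_BUF_SIZE];   /* assumed receive buffer */
static int  rx_count = 0;

/* Component B: hardware-independent, written in plain C so the
   scaffold can call it – buffers one received byte */
void vHandleByte(char byte)
{
    if (rx_count < RX_BUF_SIZE)
    {
        rx_buffer[rx_count++] = byte;
    }
}

/* Test scaffold: plays the role of the hardware-dependent component
   (vHandleRxHardware), delivering characters one at a time */
void vTestMain(const char *pretend_hardware_input)
{
    while (*pretend_hardware_input)
    {
        vHandleByte(*pretend_hardware_input++);
    }
}
```

The key design point is that component B never touches the hardware, so the same compiled logic is exercised on the host and on the target.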

 Calling the Timer Interrupt Routine

 Design the test scaffold to call the timer interrupt routine directly, rather than relying on any other part of the host environment, to avoid interruptions in the scaffold's timing of events

 This way, the scaffold has control over sequences of events in the test which must occur within intervals of timer interrupts
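A minimal sketch of this idea, with invented routine and counter names: the scaffold advances "time" only by calling the timer interrupt routine itself, so the number of ticks between test events is completely under its control.

```c
#include <assert.h>

static unsigned long tick_count = 0;   /* simulated system time, in ticks */

/* Hardware-independent body of the timer interrupt routine */
void vTimerInterruptBody(void)
{
    tick_count++;
    /* ... periodic work (state machine updates, timeouts) would go here ... */
}

/* Scaffold helper: advance simulated time by n timer ticks */
void advance_ticks(unsigned n)
{
    while (n--)
    {
        vTimerInterruptBody();
    }
}
```

A test can then interleave calls such as advance_ticks(1200) with simulated inputs to check timeout behavior deterministically, something that is hard to do against a free-running hardware timer.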

 Script Files and Output Files

 To let the scaffold test the system in some sequence or repeated times, write a script file (of commands and parameters) to control the test

 Parse the script file, test system based on commands/parameters, and direct output – intermixture of the input-script and output lines – into an output file
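A tiny sketch of such a script interpreter is shown below. The one-letter command syntax ('r XX' meaning "pretend the hardware received byte XX") is invented; the idea of echoing each script line into the output, intermixed with the resulting output lines, comes from the text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Interpret one script line; write the intermixed echo + result into out.
   A real scaffold would read lines from the script file in a loop and
   append each result to the output file. */
void interpret_line(const char *line, char *out, size_t out_size)
{
    unsigned int byte_val;

    if (sscanf(line, "r %x", &byte_val) == 1)
    {
        /* echo the script line, then the simulated system response */
        snprintf(out, out_size, "> %s\nreceived 0x%02X", line, byte_val);
    }
    else
    {
        snprintf(out, out_size, "> %s\n? unknown command", line);
    }
}
```

Keeping the script line and the response together in the output file makes each run self-documenting, so stored results from different runs can be compared directly.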

 More Advanced Techniques

 Making the scaffold automatically control sequence of events – e.g., calling the printer interrupt many times but in a controlled order to avoid swamping

 Making the scaffold automatically queue up requests to send output lines, by automatically calling the button interrupt routine: each simulated button press lets the next output line be passed to the hardware (the printer interrupt routine). In this way, the hardware-independent software is controlled by the scaffold, with the button interrupts serving as a switch

 The scaffold may contain multiple instances of the hardware-independent code, and the scaffold serves as a controller of the communication between the instances – each instance is called by the scaffold when the corresponding hardware interrupt occurs (e.g., the scanner or the cash register). In this way, the scaffold simulates the hardware (scanner or register) and provides communication services to the hardware-independent code instances it calls. – See Fig 10.7


 Using software to simulate:

 The target microprocessor instruction set

 The target memory (types - RAM)

 The target microprocessor architecture (interconnections and components)

 Simulator – must understand the linker/locator Map format, parse and interpret it

 Simulator – takes the Map as input, reads the instructions from simulated ROM, reads/writes from/to simulated registers

 Provide a user interface to simulator for I/O, debugging (using, e.g., a macro language)

 Capabilities of Simulators:

 Collect statistics on # instructions executed, bus cycles for estimating actual times

 Easier to test assembly code (for startup software and interrupt routines) in simulator

 Easier to test for portability since the simulator takes the same Map as the target

 Other parts, e.g., timers and built-in peripherals, can be tested in the corresponding simulated versions in the simulated microprocessor architecture
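The statistics-gathering capability mentioned above can be illustrated with a toy fetch-execute loop. The three-opcode instruction set and the cycle costs below are invented purely to show the counting mechanism; a real simulator would of course model the target processor's actual instruction set and timing.

```c
#include <assert.h>

enum { OP_NOP = 0x00, OP_LOAD = 0x01, OP_HALT = 0xFF };

static unsigned long instr_count = 0;   /* # instructions executed */
static unsigned long bus_cycles  = 0;   /* bus cycles, for estimating times */

void simulate(const unsigned char *rom)   /* simulated ROM image */
{
    unsigned pc = 0;

    for (;;)
    {
        unsigned char op = rom[pc++];
        bus_cycles++;                 /* one bus cycle per opcode fetch */

        if (op == OP_HALT)
        {
            break;
        }

        instr_count++;

        if (op == OP_LOAD)
        {
            pc++;                     /* skip the operand byte */
            bus_cycles += 2;          /* assumed cost of one memory read */
        }
    }
}
```

Multiplying bus_cycles by the target's bus cycle time gives a rough estimate of real execution time, which is exactly the kind of statistic a simulator can report without any target hardware.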

 What simulators can’t help:

 Simulating and testing ASICs, sensors, actuators, specialized radios (perhaps, in future systems!!)

 Lacking I/O interfaces in simulator to support testing techniques discussed (unless additional provision is made for I/O to support the scaffold; and scripts to format and reformat files between the simulator, simulated memory, and the scaffold)

 Logic analyzers are like storage scopes that (first) capture many signals and display them simultaneously

 A logic analyzer knows only of VCC and ground voltage levels (displays are like timing diagrams) – real scopes display exact voltages (analog)

 Can be used to trigger on a symptom and track back in the stored signal to isolate the problem

 Triggering can be set on many signals' low and/or high levels, and on how long they remain in that state

 Used in Timing or State Mode

 Logic Analyzers in Timing Mode

 Find out if an event occurred – did cordless scanner turn on the radio?

 Measure how long it took software to respond to an interrupt (e.g., between a button interrupt signal and the activation signal of a responding device – to turn off a bell)

 Is the software putting out the right pattern of signals to control a hardware device – looking back in the captured signal for elapsed time

 Logic Analyzers in State Mode

 Captures signals only on clock-event occurring from the attached hardware

 Typical use: instructions read/fetched by microprocessor, data read from or written to ROM, RAM, or I/O devices

 To do so, connect the LA to the address and data signals and the RE/ signal on the ROM (or RAM)

 If triggering is on rising edge of RE/ pin, address and data signals will be captured

 Output of LA, called trace, is stored for later analysis – see Fig 10.18

 LA can be triggered on unusual event occurrences, then capture signals therefrom – especially for debugging purposes (e.g., writing data to wrong address, tracking a rarely occurring bug, filtering signals for select devices or events)

 LA can’t capture all signals, e.g., on fetch from caches, registers, un-accessed memory


UNIT V EMBEDDED SYSTEM DEVELOPMENT

DESIGN METHODOLOGIES

This section considers the complete design methodology—a design process—for embedded computing systems.

Design goals: three of these goals are summarized below.

■ Time-to-market: Customers always want new features.

 The product that comes out first can win the market, even setting customer preferences for future generations of the product. The profitable market life for some products is 3–6 months—if you are 3 months late, you will never make money.

 In some categories, the competition is against the calendar, not just competitors.

 Calculators, for example, are disproportionately sold just before school starts in the fall.

 If you miss your market window, you have to wait a year for another sales season.

■ Design cost:

 Many consumer products are very cost sensitive.

 Industrial buyers are also increasingly concerned about cost.

 The costs of designing the system are distinct from the manufacturing cost—the cost of designers' salaries, computers used in design, and so on must be spread across the units sold.

 Design costs can also be important for high-volume consumer devices when time-to-market pressures cause teams to swell in size.

■ Quality:

 Customers not only want their products fast and cheap, they also want them to be right.

 A design methodology that cranks out shoddy products will soon be forced out of the marketplace.

Design Flows

 A design flow is a sequence of steps to be followed during a design.

 Some of the steps can be performed by tools, such as compilers or CAD systems; other steps can be performed by hand. In this section we look at the basic characteristics of design flows.

The figure shows the waterfall model introduced by Royce [Dav90], the first model proposed for the software development process.  The waterfall development model consists of five major phases: requirements analysis determines the basic characteristics of the system; architecture design decomposes the functionality into major components; coding implements the pieces and integrates them; testing uncovers bugs; and maintenance entails deployment in the field, bug fixes, and upgrades.

SCE DEPT OF IT 114 IT2354 EMBEDDED SYSTEMS

The figure above illustrates an alternative model of software development called the spiral model. While the waterfall model assumes that the system is built once in its entirety, the spiral model assumes that several versions of the system will be built.

Successive refinement design methodology:

 In this approach, the system is built several times.

 A first system is used as a rough prototype, and successive models of the system are further refined. This methodology makes sense when you are relatively unfamiliar with the application domain for which you are building the system.

 Refining the system by building several increasingly complex systems allows you to test out architecture and design techniques. The various iterations may also be only partially completed; for example, continuing an initial system only through the detailed design phase may teach you enough to help you avoid many mistakes in a second design iteration that is carried through to completion.

 Embedded computing systems often involve the design of hardware as well as software.

 Even if you aren't designing a board, you may be selecting boards and plugging together multiple hardware components as well as writing code.

Concurrent engineering

 Concurrent engineering attempts to take a broader approach and optimize the total flow.

 Reduced design time is an important goal for concurrent engineering, but it can help with any aspect of the design that cuts across the design flow, such as reliability, performance, power consumption, and so on.

 It tries to eliminate "over-the-wall" design steps, in which one designer performs an isolated task and then throws the result over the wall to the next designer, with little interaction between the two.

 In particular, reaping the most benefits from concurrent engineering usually requires eliminating the wall between design and manufacturing.

Concurrent engineering efforts comprise several elements:

■ Cross-functional teams include members from the various disciplines involved in the process, including manufacturing, hardware and software design, marketing, and so forth.

■ Concurrent product realization process activities are at the heart of concurrent engineering. Doing several things at once, such as designing various subsystems simultaneously, is critical to reducing design time.

■ Incremental information sharing and use helps minimize the chance that concurrent product realization will lead to surprises. As soon as new information becomes available, it is shared and integrated into the design. Cross-functional teams are important to the effective sharing of information in a timely fashion.

■ Integrated project management ensures that someone is responsible for the entire project, and that responsibility is not abdicated once one aspect of the work is done.

■ Early and continual supplier involvement helps make the best use of suppliers' capabilities.

■ Early and continual customer focus helps ensure that the product best meets customers' needs.

REQUIREMENTS ANALYSIS

 Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

 Both requirements and specifications are, however, directed to the outward behavior of the system, not its internal structure.

The overall goal of creating a requirements document is effective communication between the customers and the designers. A good set of requirements should meet several tests:

■ Correctness: The requirements should not mistakenly describe what the customer wants. Part of correctness is avoiding over-requiring—the requirements should not add conditions that are not really necessary.

■ Unambiguousness: The requirements document should be clear and have only one plain-language interpretation.

■ Completeness: All requirements should be included.

■ Verifiability: There should be a cost-effective way to ensure that each requirement is satisfied in the final product. For example, a requirement that the system package be "attractive" would be hard to verify without some agreed-upon definition of attractiveness.

■ Consistency: One requirement should not contradict another requirement.

■ Modifiability: The requirements document should be structured so that it can be modified to meet changing requirements without losing consistency, verifiability, and so forth.

■ Traceability: Each requirement should be traceable in the following ways:

 We should be able to trace backward from the requirements to know why each requirement exists.

 We should also be able to trace forward from documents created before the requirements (e.g., marketing memos) to understand how they relate to the final requirements.

 We should be able to trace forward to understand how each requirement is satisfied in the implementation.

 We should also be able to trace backward from the implementation to know which requirements it was intended to satisfy.

SPECIFICATIONS

Control-Oriented Specification Languages

 We have already seen how to use state machines to specify control in UML.

 An example of a widely used state machine specification language is the SDL language.

 SDL specifications include states, actions, and both conditional and unconditional transitions between states. SDL is an event-oriented state machine model.

Statechart

 The Statechart is one well-known technique for state-based specification that introduced some important concepts.

 The Statechart notation uses an event-driven model.

 Statecharts allow states to be grouped together to show common functionality.

 There are two basic groupings: OR and AND.

A related technique is the AND/OR table, used to describe similar relationships between states. An example AND/OR table and the Boolean expression it describes are shown in the figure.
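The evaluation rule behind an AND/OR table can be sketched in C: entries down one column are ANDed together, and the columns are ORed. The 2x2 table below (two conditions, two columns, with one "don't care" entry) is invented for illustration; it encodes the expression (c0 AND NOT c1) OR c1.

```c
#include <assert.h>
#include <stdbool.h>

#define N_CONDS 2
#define N_COLS  2

/* 1 = condition must be true, 0 = must be false, -1 = don't care */
static const int table[N_CONDS][N_COLS] = {
    { 1, -1 },   /* condition 0 */
    { 0,  1 },   /* condition 1 */
};

bool eval_and_or(const bool conds[N_CONDS])
{
    int col, row;
    for (col = 0; col < N_COLS; col++)
    {
        bool col_ok = true;   /* AND of the entries in this column */
        for (row = 0; row < N_CONDS; row++)
        {
            if (table[row][col] >= 0 && conds[row] != (table[row][col] == 1))
            {
                col_ok = false;
            }
        }
        if (col_ok)
        {
            return true;      /* one satisfied column suffices (OR) */
        }
    }
    return false;
}
```

Reading the table column by column in this way is what makes the notation easy to review by hand: each column is one complete scenario under which the overall condition holds.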

SYSTEM ANALYSIS AND ARCHITECTURE DESIGN

 The CRC card methodology is a well-known and useful way to help analyze a system's structure.  It is particularly well suited to object-oriented design since it encourages the encapsulation of data and functions.  The acronym CRC stands for the following three major items that the methodology tries to identify: ■ Classes define the logical groupings of data and functionality.

■ Responsibilities describe what the classes do.

■ Collaborators are the other classes with which a given class works.

The name CRC card comes from the fact that the methodology is practiced by having people write on index cards. (In the United States, the standard size for index cards is 3 x 5 inches.) An example card is shown in the figure.


The walkthrough process used with CRC cards is very useful in scoping out a design and determining what parts of a system are poorly understood. This informal technique is a valuable precursor to tool-based design and coding. The following steps are followed when using it to analyze a system:

 Develop an initial list of classes: Write down the class name and perhaps a few words on what it does.

 Write an initial list of responsibilities and collaborators: The responsibilities list helps describe in a little more detail what the class does.

 Create some usage scenarios: These scenarios describe what the system does.

 Walk through the scenarios: This is the heart of the methodology. During the walkthrough, each person on the team represents one or more classes.

 Refine the classes, responsibilities, and collaborators: Some of this will be done during the course of the walkthrough, but making a second pass after the scenarios is a good idea.

 Add class relationships: Once the CRC cards have been refined, subclass and superclass relationships should become clearer and can be added to the cards.

QUALITY ASSURANCE

The quality assurance (QA) process is vital for the delivery of a satisfactory system.

QUALITY ASSURANCE TECHNIQUES

The International Standards Organization (ISO) has created a set of quality standards known as ISO 9000.

 ISO 9000 was created to apply to a broad range of industries, including but not limited to embedded hardware and software.  A standard developed for a particular product, such as wooden construction beams, could specify criteria particular to that product, such as the load that a beam must be able to carry.  However, a wide-ranging standard such as ISO 9000 cannot specify the detailed standards for every industry.

CASE STUDY INTRUDER ALARM SYSTEM

In this case study, we will consider the design and implementation of a small intruder alarm system suitable for detecting attempted thefts in a home or business environment.

The figure shows the same gallery with the alarm system installed. In this figure, each of the windows has a sensor to detect glass breakage. A magnetic sensor is also attached to the door. In each case, the sensors appear to be simple switches as far as the alarm system is concerned. The figure also shows a 'bell box' outside the property: this will sound if an intruder is detected.

Inside the door (in Figure 10.2), we have the alarm control panel: this consists mainly of a small keypad, plus an additional ‘buzzer’ to indicate that the alarm has sounded. The alarm system is designed in such a way that the user – having set the alarm by entering a four-digit password – has time to open the door and leave the room before the monitoring process starts. Similarly, if the user opens the door when the system is armed, he or she will have time to enter the password before the alarm begins to sound. When initially activated, the system is in ‘Disarmed’ state.

_ In Disarmed state, the sensors are ignored. The alarm does not sound. The system remains in this state until the user enters a valid password via the keypad (in our demonstration system, the password is '1234'). When a valid password is entered, the system enters 'Arming' state.

_ In Arming state, the system waits for 60 seconds, to allow the user to leave the area before the monitoring process begins. After 60 seconds, the system enters ‘Armed’ state.

_ In Armed state, the status of the various system sensors is monitored. If a window sensor is tripped, the system enters 'Intruder' state. If the door sensor is tripped, the system enters 'Disarming' state.

The keypad activity is also monitored: if a correct password is typed in, the system enters ‘Disarmed’ state.

_ In Disarming state, we assume that the door has been opened by someone who may be an authorized system user.

The system remains in this state for up to 60 seconds, after which – by default – it enters Intruder state. If, during the 60-second period, the user enters the correct password, the system enters 'Disarmed' state.

_ In Intruder state, an alarm will sound. The alarm will keep sounding (indefinitely), until the correct password is entered

The key software components used in this example are as follows:

_ The embedded operating system, sEOS.

_ A simple 'keypad' library, based on a bank of switches. The final system would probably use at least 10 keys (see Figure 10.3); support for additional keys can be easily added if required.

_ The RS-232 library.

Running the program

The software

Part of the intruder-alarm code (Main.c)

/*------------------------------------------------------------------*-
   Main.c (v1.00)
   Simple intruder alarm system.
-*------------------------------------------------------------------*/

#include "Main.H"
#include "Port.H"
#include "Simple_EOS.H"
#include "PC_O_T1.h"
#include "Keypad.h"
#include "Intruder.h"

/* ................................................................ */

void main(void)
{
    // Set baud rate to 9600
    PC_LINK_O_Init_T1(9600);

    // Prepare the keypad
    KEYPAD_Init();

    // Prepare the intruder alarm
    INTRUDER_Init();

    // Set up simple EOS (5 ms tick)
    sEOS_Init_Timer2(5);

    while(1) // Super Loop
    {
        sEOS_Go_To_Sleep(); // Enter idle mode to save power
    }
}

Part of the intruder-alarm code (Intruder.H)

/*------------------------------------------------------------------*-
   Intruder.H (v1.00)
   – See Intruder.C for details.
-*------------------------------------------------------------------*/

#include "Main.H"

// ------ Public function prototypes ------
void INTRUDER_Init(void);
void INTRUDER_Update(void);

Part of the intruder-alarm code (Intruder.C)

/*------------------------------------------------------------------*-
   Intruder.C (v1.00)
   Multi-state framework for intruder alarm system.
-*------------------------------------------------------------------*/

#include "Main.H"
#include "Port.H"
#include "Intruder.H"
#include "Keypad.h"
#include "PC_O.h"

// ------ Private data type declarations ------

// Possible system states
typedef enum {DISARMED, ARMING, ARMED, DISARMING, INTRUDER} eSystem_state;

// ------ Private function prototypes ------
bit INTRUDER_Get_Password_G(void);
bit INTRUDER_Check_Window_Sensors(void);
bit INTRUDER_Check_Door_Sensor(void);
void INTRUDER_Sound_Alarm(void);

// ------ Private variables ------

void INTRUDER_Init(void)
{
    // Set the initial system state (DISARMED)
    System_state_G = DISARMED;

    // Set the 'time in state' variable to 0
    State_call_count_G = 0;

    // Clear the keypad buffer
    KEYPAD_Clear_Buffer();

    // Set the 'New state' flag
    New_state_G = 1;

    // Set the (two) sensor pins to 'read' mode
    Window_sensor_pin = 1;
    Sounder_pin = 1;
}

/* ------------------------------------------------------------ */

void INTRUDER_Update(void)
{
    // Incremented every time
    if (State_call_count_G < 65534)
    {
        State_call_count_G++;
    }

    // Call every 50 ms
    switch (System_state_G)
    {
        case DISARMED:
        {
            if (New_state_G)
            {
                PC_LINK_O_Write_String_To_Buffer("\nDisarmed");
                New_state_G = 0;
            }

            // Make sure alarm is switched off
            Sounder_pin = 1;

            // Wait for correct password ...
            if (INTRUDER_Get_Password_G() == 1)
            {
                System_state_G = ARMING;
                New_state_G = 1;
                State_call_count_G = 0;
                break;
            }
            break;
        }

        case ARMING:
        {
            if (New_state_G)
            {
                PC_LINK_O_Write_String_To_Buffer("\nArming...");
                New_state_G = 0;
            }

            // Remain here for 60 seconds (50 ms tick assumed)
            if (++State_call_count_G > 1200)
            {
                System_state_G = ARMED;
                New_state_G = 1;
                State_call_count_G = 0;
                break;
            }
            break;
        }

        case ARMED:
        {
            if (New_state_G)
            {
                PC_LINK_O_Write_String_To_Buffer("\nArmed");
                New_state_G = 0;
            }

            // First, check the window sensors
            if (INTRUDER_Check_Window_Sensors() == 1)
            {
                // An intruder detected
                System_state_G = INTRUDER;
                New_state_G = 1;
                State_call_count_G = 0;
                break;
            }

            // Next, check the door sensors
            if (INTRUDER_Check_Door_Sensor() == 1)
            {
                // May be authorised user – go to 'Disarming' state
                System_state_G = DISARMING;
                New_state_G = 1;
                State_call_count_G = 0;
                break;
            }

            // Finally, check for correct password
            if (INTRUDER_Get_Password_G() == 1)
            {
                System_state_G = DISARMED;
                New_state_G = 1;
                State_call_count_G = 0;
                break;
            }
            break;
        }

        case DISARMING:
        {
            if (New_state_G)
            {
                PC_LINK_O_Write_String_To_Buffer("\nDisarming...");
                New_state_G = 0;
            }

            // Remain here for 60 seconds (50 ms tick assumed)
            // to allow user to enter the password
            // – after time up, sound alarm
            if (++State_call_count_G > 1200)
            {
                System_state_G = INTRUDER;
                New_state_G = 1;
                State_call_count_G = 0;
                break;
            }

            // Still need to check the window sensors
            if (INTRUDER_Check_Window_Sensors() == 1)
            {
                // An intruder detected
                System_state_G = INTRUDER;
                New_state_G = 1;
                State_call_count_G = 0;
                break;
            }

            // Finally, check for correct password
            if (INTRUDER_Get_Password_G() == 1)
            {
                System_state_G = DISARMED;
                New_state_G = 1;
                State_call_count_G = 0;
                break;
            }
            break;
        }

        case INTRUDER:
        {
            if (New_state_G)
            {
                PC_LINK_O_Write_String_To_Buffer("\n** INTRUDER! **");
                New_state_G = 0;
            }

            // Sound the alarm!
            INTRUDER_Sound_Alarm();

            // Keep sounding alarm until we get correct password
            if (INTRUDER_Get_Password_G() == 1)
            {
                System_state_G = DISARMED;
                New_state_G = 1;
                State_call_count_G = 0;
            }
            break;
        }
    }
}

/* ------------------------------------------------------------ */

bit INTRUDER_Get_Password_G(void)
{
    signed char Key;
    tByte Password_G_count = 0;
    tByte i;

    // Update the keypad buffer
    KEYPAD_Update();

    // Are there any new data in the keypad buffer?
    if (KEYPAD_Get_Data_From_Buffer(&Key) == 0)
    {
        // No new data – password can't be correct
        return 0;
    }

    // If we are here, a key has been pressed
    // How long since the last key was pressed?
    // Must be pressed within 50 seconds (assume 50 ms 'tick')
    if (State_call_count_G > 1000)
    {
        // More than 50 seconds since last key
        // – restart the input process
        State_call_count_G = 0;
        Position_G = 0;
    }

    if (Position_G == 0)
    {
        PC_LINK_O_Write_Char_To_Buffer('\n');
    }

    PC_LINK_O_Write_Char_To_Buffer(Key);
    Input_G[Position_G] = Key;

    // Have we got four numbers?
    if ((++Position_G) == 4)
    {
        Position_G = 0;
        Password_G_count = 0;

        // Check the password
        for (i = 0; i < 4; i++)
        {
            if (Input_G[i] == Password_G[i])
            {
                Password_G_count++;
            }
        }
    }

    if (Password_G_count == 4)
    {
        // Password correct
        return 1;
    }
    else
    {
        // Password NOT correct
        return 0;
    }
}

/* ------------------------------------------------------------ */

bit INTRUDER_Check_Door_Sensor(void)
{
    // Single door sensor (access route)
    if (Door_sensor_pin == 0)
    {
        // Someone has opened the door...
        PC_LINK_O_Write_String_To_Buffer("\nDoor open");
        return 1;
    }

    // Default
    return 0;
}

/* ------------------------------------------------------------ */

void INTRUDER_Sound_Alarm(void)
{
    if (Alarm_bit)
    {
        // Alarm connected to this pin
        Sounder_pin = 0;
        Alarm_bit = 0;
    }
    else
    {
        Sounder_pin = 1;
        Alarm_bit = 1;
    }
}

/*------------------------------------------------------------------*-
   ---- END OF FILE ----
-*------------------------------------------------------------------*/

Figure shows one possible organization for an embedded system.

An embedded system encompasses the CPU as well as many other resources.

In addition to the CPU and memory hierarchy, there are a variety of interfaces that enable the system to measure, manipulate, and otherwise interact with the external environment. Some differences with desktop computing may be:

 The human interface may be as simple as a flashing light or as complicated as real-time robotic vision.

 The diagnostic port may be used for diagnosing the system that is being controlled -- not just for diagnosing the computer.

 Special-purpose field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), or even non-digital hardware may be used to increase performance or safety.

 Software often has a fixed function, and is specific to the application.

In addition to the emphasis on interaction with the external world, embedded systems also provide functionality specific to their applications. Instead of executing spreadsheet, word processing, and engineering analysis programs, embedded systems typically execute control laws, finite state machines, and signal processing algorithms. They must often detect and react to faults in both the computing and the surrounding electromechanical systems, and must manipulate application-specific user interface devices.

Table 1. Four example embedded systems with approximate attributes.

In order to make the discussion more concrete, we shall discuss four example systems (Table 1). Each example portrays a real system in current production, but has been slightly genericized to represent a broader cross-section of applications as well as protect proprietary interests. The four examples are a Signal Processing system, a Mission Critical control system, a Distributed control system, and a Small consumer electronic system. The Signal Processing and Mission Critical systems are representative of traditional military/aerospace embedded systems, but in fact are becoming more applicable to general commercial applications over time.

Using these four examples to illustrate points, the following sections describe the different areas of concern for embedded system design: computer design, system-level design, life-cycle support, business model support, and design culture adaptation.

Desktop computing design methodology and tool support is to a large degree concerned with the initial design of the digital system itself. To be sure, experienced designers are cognizant of other aspects, but with the recent emphasis on quantitative design (e.g., [8]), life-cycle issues that aren't readily quantified could be left out of the optimization process. However, such an approach is insufficient to create embedded systems that can effectively compete in the marketplace. This is because in many cases the issue is not whether design of an immensely complex system is feasible, but rather whether a relatively modest system can be highly optimized for life-cycle cost and effectiveness.

While traditional digital design CAD tools can make a computer designer more efficient, they may not deal with the central issue -- embedded design is about the system, not about the computer. In desktop computing, design often focuses on building the fastest CPU, then supporting it as required for maximum computing speed. In embedded systems, the combination of the external interfaces (sensors, actuators) and the control or sequencing algorithms is of primary importance. The CPU simply exists as a way to implement those functions. The following experiment should serve to illustrate this point: ask a roomful of people what kind of CPU is in the personal computer or workstation they use. Then ask the same people which CPU is used for the engine controller in their car (and whether the CPU type influenced the purchasing decision).

In high-end embedded systems, the tools used for design are invaluable. However, many embedded systems, both large and small, must meet additional requirements that are beyond the scope of what is typically handled by design automation. These additional needs fall into the categories of special computer design requirements, system-level requirements, life-cycle support issues, business model compatibility, and design culture issues.

LIFE-CYCLE SUPPORT

Figure shows one view of a product life-cycle (a simplified version of the view taken by [13]).

Figure An embedded system lifecycle.

First a need or opportunity to deploy new technology is identified. Then a product concept is developed. This is followed by concurrent product and manufacturing process design, production, and deployment. But in many embedded systems, the designer must see past deployment and take into account support, maintenance, upgrades, and system retirement issues in order to actually create a profitable design. Some of the issues affecting this life-cycle profitability are discussed below.

COMPONENT ACQUISITION

Because an embedded system may be more application-driven than a typical technology-driven desktop computer design, there may be more leeway in component selection. Thus, component acquisition costs can be taken into account when optimizing system life-cycle cost. For example, the cost of a component generally decreases with quantity, so design decisions for multiple designs should be coordinated to share common components to the benefit of all.

Design challenge:

• Life-cycle, cross-design component cost models and optimization rather than simple per-unit cost.

SYSTEM CERTIFICATION

Embedded computers can affect the safety as well as the performance of the system. Therefore, rigorous qualification procedures are necessary in some systems after any design change in order to assess and reduce the risk of malfunction or unanticipated system failure. This additional cost can negate any savings that might have otherwise been realized by a design improvement in the embedded computer or its software. This point in particular hinders use of new technology by resynthesizing hardware components -- the redesigned components cannot be used without incurring the cost of system recertification.

One strategy to minimize the cost of system recertification is to delay all design changes until major system upgrades occur. As distributed embedded systems come into more widespread use, another likely strategy is to partition the system in such a way as to minimize the number of subsystems that need to be recertified when changes occur. This is a partitioning problem affected by potential design changes, technology insertion strategies, and regulatory requirements.

Design challenge:

• Partitioning/synthesis to minimize recertification costs.

LOGISTICS AND REPAIR

Whenever an embedded computer design is created or changed, it affects the downstream maintenance of the product. A failure of the computer can cause the entire system to be unusable until the computer is repaired. In many cases embedded systems must be repairable in a few minutes to a few hours, which implies that spare components and maintenance personnel must be located close to the system. A fast repair time may also imply that extensive diagnosis and data collection capabilities must be built into the system, which may be at odds with keeping production costs low.

Because of the long system lifetimes of many embedded systems, proliferation of design variations can cause significant logistics expenses. For example, if a component design is changed it can force changes in spare component inventory, maintenance test equipment, maintenance procedures, and maintenance training. Furthermore, each design change should be tested for compatibility with various system configurations, and accommodated by the configuration management database.

Design challenge:

• Designs optimized to minimize spares inventory.
• High-coverage diagnosis and self-test at system level, not just digital component level.

Upgrades

Because of the long life of many embedded systems, upgrades to electronic components and software may be used to update functionality and extend the life of the embedded system with respect to competing with replacement equipment. While it may often be the case that an electronics upgrade involves completely replacing circuit boards, it is important to realize that the rest of the system will remain unchanged. Therefore, any special behaviors, interfaces, and undocumented features must be taken into account when performing the upgrade. Also, upgrades may be subject to recertification requirements.

Of special concern is software in an upgraded system. Legacy software may not be executable on upgraded replacement hardware, and may not be readily cross-compiled to the new target CPU. Even worse, timing behavior is likely to be different on newer hardware, but may be both undocumented and critical to system operation.

Design challenge:

• Ensuring complete interface, timing, and functionality compatibility when upgrading designs.

Long-term component availability

When embedded systems are more than a few years old, some electronic components may no longer be available for production of new equipment or replacements. This problem can be especially troublesome with obsolete processors and small-sized dynamic memory chips.

When a product does reach a point at which spare components are no longer economically available, the entire embedded computer must sometimes be redesigned or upgraded. This redesign might need to take place even if the system is no longer in production, depending on the availability of a replacement system. This problem is a significant concern on the Distributed example system.

Design challenge:

• Cost-effectively update old designs to incorporate new components.


UNIT I - EMBEDDED COMPUTING PART A

1. Define Embedded Systems with examples. An embedded system is defined as any device that includes a programmable computer but is not itself intended to be a general-purpose computer. Examples: mobile phones, digital cameras, and automobile engine controllers.

2. Write the sequence of design process of an embedded system. Requirements, specification, architecture, designing hardware and software components, and system integration.

3. Differentiate between Microcontroller and Microprocessor. A microprocessor is typically designed to be a general-purpose processor, which requires separate external memory and I/O interfaces. Example: ARM processor. Microcontrollers can be considered self-contained, cost-effective systems with a processor, on-chip memory, and input/output peripherals built into a single package. Example: 8051 microcontroller.

4. What are functional and non-functional requirements? Functional description gives the basic functions of the embedded system being designed. Non-functional requirements are the other requirements such as performance, cost, physical size and weight, power consumption etc.

5. Why is the system integration phase difficult? System integration is difficult because it usually uncovers problems. It is often hard to observe the system in sufficient detail to determine exactly what is wrong because the debugging facilities for embedded systems are usually much more limited than what you would find on desktop systems.

6. Differentiate between Von-Neumann and Harvard architectures.
Harvard: contains separate memories for program and data; self-modifying programs are difficult to write.
Von Neumann: contains the same memory for both program and data; more possibility of self-modifying programs.

7. Differentiate big and little-endian byte ordering modes. In big-endian mode, the most significant byte of a word is stored at the lowest address; in little-endian mode, the least significant byte is stored at the lowest address.


8. Give the difference between RISC and CISC architecture.
CISC: contains a complex instruction set and more addressing modes.
RISC: contains a simple instruction set and few addressing modes.

9. What is the use of the CPSR register? The Current Program Status Register (CPSR) in ARM is set automatically during every arithmetic, logical, or shifting operation. The top four bits of the CPSR hold useful information about the results of that arithmetic/logical operation.

10. Give the details of the CPSR register.
(N) Negative - the negative (N) bit is set when the result is negative in two's-complement arithmetic.
(Z) Zero - the zero (Z) bit is set when every bit of the result is zero.
(C) Carry - the carry (C) bit is set when there is a carry out of the operation.
(V) Overflow - the overflow (V) bit is set when an arithmetic operation results in an overflow.
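The flag rules above can be sketched in C. This is only an illustrative model of how N, Z, C, and V would be derived for a 32-bit addition, not ARM reference behaviour; the struct and function names are invented for the demo.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of the four CPSR condition flags after a 32-bit ADD. */
typedef struct { int n, z, c, v; } Flags;

static Flags add_flags(uint32_t a, uint32_t b, uint32_t *result)
{
    uint32_t r = a + b;
    Flags f;
    f.n = (r >> 31) & 1;   /* negative: top bit of the result is set      */
    f.z = (r == 0);        /* zero: every bit of the result is zero       */
    f.c = (r < a);         /* carry: unsigned addition wrapped around     */
    /* overflow: operands share a sign but the result's sign differs */
    f.v = ((~(a ^ b) & (a ^ r)) >> 31) & 1;
    *result = r;
    return f;
}
```

For example, adding 1 to 0x7FFFFFFF (the largest positive signed value) sets N and V but not C, while 0xFFFFFFFF + 1 sets Z and C.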

11. Give an example for register indirect addressing in ARM. LDR r0, [r1] loads r0 with the word at the address held in register r1.

12. What is post indexing in ARM? In post-indexing addressing mode the offset is not added to the base register until after the fetch from the base register has been performed.

13. What is base plus offset addressing in ARM? In base plus offset addressing the value in the base register is not used directly as the address; an offset is added to the register value to form the address.

14. What are the components of an 8051 microcontroller? The 8051 microcontroller has 128 bytes of RAM, 4K bytes of on-chip ROM, two timers, one serial port, four I/O ports (each 8 bits wide), and six interrupt sources.

15. Name the most widely used registers in 8051. The most widely used registers in the 8051 are A (accumulator, used for all arithmetic and logic instructions), B, R0, R1, R2, R3, R4, R5, R6, R7, DPTR (data pointer), and PC (program counter).

16. What is the program status word register? Give its structure. The program status word (PSW) register, also referred to as the flag register, is an 8-bit register. Only 6 bits are used; four of them, CY (carry), AC (auxiliary carry), P (parity), and OV (overflow), are called conditional flags, meaning that they indicate some condition that resulted after an instruction was executed. PSW3 and PSW4 are designated RS0 and RS1 and are used to change the register bank. The two unused bits are user-definable.

17. Give the structure of RAM in 8051. There are 128 bytes of RAM in the 8051, assigned addresses 00 to 7FH. The 128 bytes are divided into three groups as follows:
1) A total of 32 bytes from locations 00 to 1FH are set aside for register banks and the stack.
2) A total of 16 bytes from locations 20H to 2FH are set aside for bit-addressable read/write memory.
3) A total of 80 bytes from locations 30H to 7FH are used for read and write storage, called the scratch pad.

18. How can you switch to other register banks in 8051 microcontroller? We can switch to other banks by use of the PSW register. Bits D4 and D3 of the PSW are used to select the desired register bank. We use the bit-addressable instructions SETB and CLR to access PSW.4 and PSW.3.


19. What is the default register bank in 8051? RAM locations from 0 to 7 are set aside for bank 0 of R0-R7 where R0 is RAM location 0. Register bank 0 is the default when 8051 is powered up.

20. How are stacks accessed in 8051? The register used to access the stack is called the SP (stack pointer) register. The stack pointer register is only 8 bits wide and can take values of 00 to FFH. When 8051 is powered up, the SP contains value 07. This means RAM location 08 is the first location being used for the stack by 8051.
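The stack behaviour described above can be sketched as a small C model. The reset value 07H and the pre-increment PUSH semantics (SP is incremented first, so the first byte lands at 08H) follow the text; the struct and function names are invented for the demo.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the 8051 stack: PUSH pre-increments SP, then stores. */
typedef struct {
    uint8_t ram[128];   /* internal RAM, addresses 00H-7FH    */
    uint8_t sp;         /* stack pointer, 07H after power-up  */
} Cpu8051;

static void cpu_reset(Cpu8051 *c) { c->sp = 0x07; }

static void cpu_push(Cpu8051 *c, uint8_t value)
{
    c->sp++;                 /* increment SP first...          */
    c->ram[c->sp] = value;   /* ...then store at the new SP    */
}

static uint8_t cpu_pop(Cpu8051 *c)
{
    return c->ram[c->sp--];  /* read at SP, then decrement     */
}
```

A first push after reset therefore writes to RAM location 08H, matching the answer above.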

21. Name some of the hardware parts of embedded systems.
• Power source
• Clock oscillator circuit
• Timers
• Memory units
• DAC and ADC
• LCD and LED displays
• Keyboard/Keypad

PART-B
1. Explain in detail the design process in embedded system. [16]
2. What are the challenges of Embedded Systems? [8]
3. Explain various architectural features of the 8051 microcontroller. [8]
4. Explain ARM Processor, Architecture and Instruction Set Programming. [16]
5. Discuss the data operations of the ARM processor. [8]
6. (i) What are the various problems that must be taken into account in embedded system design? [8]
   (ii) List the hardware units that must be present in the embedded systems. [8]
7. (i) Explain the various data operations involved in ARM. [8]
   (ii) Implement an if statement in ARM. [8]
8. With the help of a neat block diagram explain the 8051 processor. [16]
9. (i) List the various types of instructions in 8051. [8]
   (ii) Write an 8051 C program to toggle all the bits of P0, P1 and P2 continuously with a 250 ms delay. Use the sfr keyword to declare the port address. [8]

UNIT II - MEMORY AND INPUT / OUTPUT MANAGEMENT PART A

1. What are Data registers in I/O devices? Data registers hold values that are treated as data by the device, such as the data read or written by a disk.

2. What are Status registers in I/O devices? Status registers provide information about the device’s operation, such as whether the current transaction has completed.

3. What is baud rate? The data bits are sent as high and low voltages at a uniform rate. This uniform rate at which data bits are sent or received is known as baud rate.

4. What are I/O-mapped I/O and memory-mapped I/O? I/O-mapped I/O uses separate I/O instructions for I/O programming, and a separate address space is provided for I/O operations. Memory-mapped I/O uses the same address space used by CPU instructions for the registers in each I/O device, and programs use the CPU's normal read and write instructions to communicate with the devices.

5. What is polling? Checking an I/O device's status register to see whether the device has finished its operation is often called polling.
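A minimal polling sketch in C. The device, its BUSY bit, and the function names are all invented for the demo; on real hardware the status register would be a memory-mapped location rather than a struct field, and the fake device here simply "completes" after a few polls so the loop can be exercised.

```c
#include <assert.h>
#include <stdint.h>

#define STATUS_BUSY 0x01   /* hypothetical busy bit in the status register */

typedef struct {
    uint8_t status;
    uint8_t data;
    int polls_until_done;   /* simulation only: polls left until ready */
} FakeDevice;

static uint8_t read_status(FakeDevice *d)
{
    if (d->polls_until_done > 0 && --d->polls_until_done == 0)
        d->status &= (uint8_t)~STATUS_BUSY;   /* operation finished */
    return d->status;
}

/* Busy-wait (poll) until the device is no longer busy, then read its data. */
static uint8_t poll_and_read(FakeDevice *d)
{
    while (read_status(d) & STATUS_BUSY)
        ;   /* spin: this repeated status check is the "polling" */
    return d->data;
}
```

The spin loop wastes CPU cycles, which is exactly the drawback interrupts (question 6 below) avoid.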

6. Define Interrupt. An interrupt is an asynchronous signal indicating the need for attention, or a synchronous event in software indicating the need for a change in execution. A hardware interrupt causes the processor to save its state of execution and begin execution of a handler. Software interrupts are usually implemented as instructions in the instruction set, which cause a context switch to an interrupt handler similar to a hardware interrupt. Interrupts are a commonly used technique for computer multitasking, especially in real-time computing. Such a system is said to be interrupt-driven.

7. What are interrupt priorities and vectors? Interrupt priorities allow the CPU to recognize some interrupts as more important than others, and interrupt vectors allow the interrupting device to specify its handler.

8. What is interrupt Masking? The priority mechanism must ensure that a lower-priority interrupt does not occur when a higher-priority interrupt is being handled. The decision process is known as masking.

9. What do you mean by a non-maskable interrupt? The highest-priority interrupt is normally called the non-maskable interrupt (NMI).

10. What is ISR?

An interrupt handler, also known as an interrupt service routine (ISR), is a callback subroutine in an operating system or device driver whose execution is triggered by the reception of an interrupt.

11. What is a cache memory? A cache is a small, fast memory that holds copies of some of the contents of main memory. Because the cache is fast, it provides higher-speed access for the CPU; but since it is small, not all requests can be satisfied by the cache, forcing the system to wait for the slower main memory. Caching makes sense when the CPU is using only a relatively small set of memory locations at any one time; the set of active locations is often called the working set.

12. What is a cache controller? A cache controller mediates between the CPU and the memory system comprised of the main memory. The cache controller sends a memory request to the cache and main memory. If the requested location is in the cache, the cache controller forwards the location’s contents to the CPU and aborts the main memory request. If the location is not in the cache, the controller waits for the value from main memory and forwards it to the CPU.

13. What do you mean by a cache hit and a miss? If the requested location is found in the cache memory, this is called a cache hit. If it is not found in the cache, the request is forwarded to the main memory; this is called a cache miss.

14. What are the types of cache misses? Compulsory miss, capacity miss and conflict miss

15. What is compulsory cache miss? A compulsory miss, also known as a cold miss, occurs the first time a location is used.

16. What is capacity cache miss? A capacity miss is caused by a working set that is too large for the cache.

17. What is conflict cache miss? A conflict miss happens when two locations map to the same location in the cache.
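A toy direct-mapped cache in C can make these miss types concrete. The 4-line size and the names are arbitrary choices for the demo; each line holds one word and the index is simply the address modulo the number of lines.

```c
#include <assert.h>
#include <stdint.h>

#define NLINES 4   /* arbitrary demo size: 4 one-word lines */

typedef struct {
    int      valid[NLINES];
    uint32_t tag[NLINES];
} Cache;

/* Returns 1 on a hit, 0 on a miss (and fills the line on a miss). */
static int cache_access(Cache *c, uint32_t addr)
{
    uint32_t index = addr % NLINES;   /* which line the address maps to */
    uint32_t tag   = addr / NLINES;   /* identifies the address within the line */
    if (c->valid[index] && c->tag[index] == tag)
        return 1;                     /* hit */
    c->valid[index] = 1;              /* miss: allocate the line */
    c->tag[index]   = tag;
    return 0;
}
```

The first access to any address is a compulsory miss; two addresses that map to the same line (here, 0 and 4) evict each other, producing conflict misses.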

18. What is write-through and write-back? In write-through policy every write changes both the cache and the corresponding main memory location. This scheme ensures that the cache and main memory are consistent, but may generate some additional main memory traffic. We can reduce the number of times we write to main memory by using a write-back policy. If we write only when we remove a location from the cache, we eliminate the writes when a location is written several times before it is removed from the cache. This is known as write-back policy.
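The difference in main-memory traffic between the two policies can be sketched by counting memory writes for repeated writes to a single cached location. A single line is modelled and all names are invented, purely for illustration.

```c
#include <assert.h>

typedef enum { WRITE_THROUGH, WRITE_BACK } Policy;

typedef struct {
    Policy policy;
    int dirty;        /* write-back only: line differs from memory */
    int mem_writes;   /* count of writes that reached main memory  */
} CacheLine;

static void cache_write(CacheLine *c)
{
    if (c->policy == WRITE_THROUGH)
        c->mem_writes++;   /* every write also goes to main memory */
    else
        c->dirty = 1;      /* write-back: just mark the line dirty */
}

static void cache_evict(CacheLine *c)
{
    if (c->policy == WRITE_BACK && c->dirty) {
        c->mem_writes++;   /* one deferred write on eviction */
        c->dirty = 0;
    }
}
```

Ten writes to the same location cost ten memory writes under write-through but only one (at eviction) under write-back, which is the traffic saving described above.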

19. What is a page fault? When the CPU requests an address that is not in main memory, the memory management unit (MMU) generates an exception called a page fault.

20. What is a TLB? The efficiency of paged address translation may be increased by caching page translation information. A cache for address translation is known as a translation lookaside buffer (TLB).

PART B
1. (i) Explain the cache memory along with its structure. [8]
   (ii) Explain the memory management units and address translation techniques. [8]
2. Explain in detail the programming of input and output devices. [16]
3. (i) List the various input and output devices and explain them in detail. [8]
   (ii) Explain the various memory devices along with component interfacing. [8]
4. Explain the various interrupt handling mechanisms. [16]
5. Discuss the memory devices and their interfacing concepts. [8]
6. Discuss the I/O devices and their interfacing concepts. [8]

UNIT III - PROCESSES AND OPERATING SYSTEMS PART A

1. Define Process. A process can be defined as a program unit in execution, the state of which is controlled by the OS. It is a single execution of a program. A process is also defined as a computational unit which is scheduled and runs on the CPU by a kernel. The kernel controls the creation of the process and the states of the process.

2. Define Thread. A thread is a lightweight process. Processes that share the same address space are often called threads.

3. What are multi rate systems? Multirate embedded computing systems are very common, including automobile engines, printers, and cell phones. In all these systems, certain operations must be executed periodically, and each operation is executed at its own rate.

4. What is release time and deadline of a process? The release time is the time at which the process becomes ready to execute. A deadline specifies when a computation must be finished. The deadline for an aperiodic process is generally measured from the release time, since that is the only reasonable time reference.

5. What is the period and rate of a process? The period of a process is the time between successive executions. The process’s rate is the inverse of its period.

6. What is a task graph? A set of processes with data dependencies is called a task graph.

7. Define CPU utilization.

Utilization is the ratio of the CPU time that is being used for useful computations to the total available CPU time. This ratio ranges between 0 and 1, with 1 meaning that all of the available CPU time is being used for system purposes.

8. What are the three basic scheduling states of a process? Waiting state, ready state, and execution state.

9. Define scheduling policy. A scheduling policy defines how processes are selected for promotion from the ready state to the running state.

10. What is hyperperiod? Hyperperiod is the least-common multiple of the periods of all the processes. It is a finite period that covers all possible combinations of process executions.
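Since the hyperperiod is the least common multiple of the periods, it can be computed directly; the sketch below uses the standard Euclidean GCD, with function names chosen for the demo.

```c
#include <assert.h>

/* Euclid's algorithm for the greatest common divisor. */
static long gcd(long a, long b)
{
    while (b) { long t = a % b; a = b; b = t; }
    return a;
}

static long lcm(long a, long b) { return a / gcd(a, b) * b; }

/* Hyperperiod = LCM of all process periods. */
static long hyperperiod(const long *periods, int n)
{
    long h = 1;
    for (int i = 0; i < n; i++)
        h = lcm(h, periods[i]);
    return h;
}
```

For example, processes with periods 4, 6, and 10 time units have a hyperperiod of 60: every possible phasing of their executions repeats after 60 time units.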

11. What is scheduling overhead? The execution time required to choose the next execution process, which is incurred in addition to any context switching overhead is the scheduling overhead.

12. What do you mean by a context switch? The set of registers that define a process are known as its context and switching from one process’s register set to another is known as context switching.

13. What is process control block? The data structure that holds the state of the process is known as the process control block.

14. What are two major ways to assign priorities to a process? There are two major ways to assign priorities: static priorities that do not change during execution and dynamic priorities that do change.

15. What is rate-monotonic analysis (RMA)? Rate-monotonic analysis is the theory behind rate-monotonic scheduling (RMS). The simple model of the system is as follows:
• All processes run periodically on a single CPU.
• Context switching time is ignored.
• There are no data dependencies between processes.
• The execution time for a process is constant.
• All deadlines are at the ends of their periods.
• The highest-priority ready process is always selected for execution.
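Under these assumptions, a standard sufficient schedulability test for RMS is the Liu and Layland utilization bound, sum(Ci/Ti) <= n * (2^(1/n) - 1). This bound is not stated in the text above, so the sketch below is an added assumption; the nth root of 2 is found by bisection to keep the example self-contained.

```c
#include <assert.h>

/* Find x in [1,2] with x^n = 2, by bisection (avoids needing libm). */
static double nth_root_of_2(int n)
{
    double lo = 1.0, hi = 2.0;
    for (int iter = 0; iter < 60; iter++) {
        double mid = (lo + hi) / 2.0, p = 1.0;
        for (int i = 0; i < n; i++) p *= mid;
        if (p < 2.0) lo = mid; else hi = mid;
    }
    return (lo + hi) / 2.0;
}

/* Sufficient RMS test: utilization no greater than n*(2^(1/n) - 1). */
static int rms_schedulable(const double *C, const double *T, int n)
{
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += C[i] / T[i];                        /* total utilization */
    double bound = n * (nth_root_of_2(n) - 1.0); /* ~0.78 for n = 3   */
    return u <= bound;
}
```

Three tasks with execution times 1, 2, 1 and periods 4, 6, 10 have utilization about 0.683, under the n = 3 bound of about 0.780, so they pass the test.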

16. What is the response time and critical instant of a process? The response time of a process is the time at which the process finishes. The critical instant for a process is defined as the instant during execution at which the task has the largest response time.


17. What are the two ways of process communication? A process can send a communication in one of two ways: blocking or non-blocking. After sending a blocking communication, the process goes into the waiting state until it receives a response. Non-blocking communication allows the process to continue execution after sending the communication.

18. Define earliest deadline first scheduling (EDF)? Earliest deadline first (EDF) is a scheduling policy which uses a dynamic priority scheme—it changes process priorities during execution based on initiation times. As a result, it can achieve higher CPU utilizations than RMS. The EDF policy is very simple: It assigns priorities in order of deadline. The highest-priority process is the one whose deadline is nearest in time, and the lowest priority process is the one whose deadline is farthest away.
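The EDF selection rule reduces to picking the ready process with the nearest deadline. The array representation below is an invented simplification for the demo; a real scheduler would use a priority queue and re-evaluate as deadlines change.

```c
#include <assert.h>

/* EDF: among ready processes, pick the one whose deadline is nearest.
   Returns the index of the chosen process, or -1 if none is ready. */
static int edf_pick(const int *deadline, const int *ready, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (ready[i] && (best < 0 || deadline[i] < deadline[best]))
            best = i;   /* earlier deadline means higher priority */
    return best;
}
```

Because priorities follow deadlines rather than fixed rates, the choice can differ from run to run, which is what makes EDF a dynamic priority scheme.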

19. Define preemption. Preemption is the act of forcing a process out of execution i.e. making a context switch.

20. If your set of processes is unschedulable and you need to guarantee that they meet their deadlines, give possible ways to solve this problem. The techniques available are as follows:
• Get a faster CPU.
• Redesign the processes to take less execution time.
• Rewrite the specification to change the deadlines.

21. What is the priority inversion problem? Priority inversion is a situation in which a lower-priority task runs while blocking a higher-priority task that is waiting for a resource. The two processes may deadlock.

PART B
1. With appropriate diagrams explain multiple tasks and multiple processes. [16]
2. Explain in detail the various scheduling policies with example. [16]
3. Explain the following:
   a) Inter-process communication [8]
   b) Context switching [8]
4. Explain in detail the various performance issues. [16]
5. Compare RMS and EDF with suitable examples. [8]

UNIT IV - EMBEDDED SOFTWARE PART A

1. State some advantages of assembly language.
• It gives precise control of the processor's internal devices and full use of processor-specific features in its instruction set and its addressing modes.
• The machine codes are compact.
• With the help of assembly language the basic concepts can be easily studied.
• Memory required for the system is less.
• Only a minimum of assembly language instructions is needed for device drivers.

2. Write the advantages of high-level language.
• Standard library functions
• Modular programming approach
• Bottom-up design
• Top-down design
• Data types
• Type checking
• Control structures
• Portability

3. Give the details of TCON SFR.

• TR0, TR1 - Timer run control bits
• TF0, TF1 - Timer overflow flags
• IE0, IE1 - Interrupt flags
• IT0, IT1 - Interrupt type control bits

4. Give the details of TMOD SFR?

• Mode 1 (M1 = 0; M0 = 1) - 16-bit timer/counter (with manual reload)
• Mode 2 (M1 = 1; M0 = 0) - 8-bit timer/counter (with 8-bit automatic reload)
• GATE - Gating control
• C/T - Counter or timer select bit

5. What is the advantage of a hardware generated delay over a software generated delay? A software generated delay is unreliable; it may change under compiler optimization or depending on the machine on which it runs. Hardware generated delays are precise and reliable.

6. What is the advantage of a portable hardware delay? In a portable hardware delay the operating frequency can be changed dynamically, and the system is easy to maintain.

7. Give the need for the timeout mechanism. For a system to be reliable, it must be guaranteed not to get stuck in an indefinite loop. To provide this assurance we need a timeout mechanism: either a loop timeout, which is software based, or a hardware timeout, which is based on hardware.
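A software loop timeout of the kind mentioned above can be sketched in C. The iteration bound and the names are made up for the demo; in practice the bound would be chosen from the CPU clock rate and the worst acceptable wait.

```c
#include <assert.h>
#include <stdint.h>

#define LOOP_TIMEOUT 10000   /* made-up iteration bound for the demo */

/* Poll a flag, but give up after a fixed number of iterations so the
   system cannot hang indefinitely.
   Returns 1 if the flag became nonzero in time, 0 on timeout. */
static int wait_with_timeout(const volatile uint8_t *flag)
{
    uint32_t count = LOOP_TIMEOUT;
    while (--count)
        if (*flag)
            return 1;
    return 0;   /* timed out: the caller can recover instead of hanging */
}
```

Compare this with the unguarded polling loop earlier in the unit: if the device never becomes ready, this version still returns, letting the caller take recovery action.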

8. Give the uses of Timer 2. It is used to produce delays. It is employed in:
1. Creation of an operating system
2. Real-time applications

9. Give the types of multi-state systems.
1. Multi-state (timed)
2. Multi-state (input/timed)
3. Multi-state (input)

10. Give the characteristics of multi-state systems.
• They involve a series of system states.
• In each state, one or more functions may be called.
• There will be rules defining the transitions between states.
• As the system moves between states, one or more functions may be called.

11. What are multi-state timed systems? Give example. In a multi-state (timed) system, the transition between states will depend only on the passage of time. A basic traffic-light control system is an example.
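The traffic-light example named above can be sketched as a timed state machine in C: the only thing that drives a transition is elapsed time. The tick durations, types, and names are arbitrary choices for the demo.

```c
#include <assert.h>

/* Multi-state (timed) sketch: transitions depend only on elapsed time. */
typedef enum { RED, GREEN, YELLOW } LightState;

typedef struct {
    LightState state;
    int ticks_in_state;
} TrafficLight;

static const int DURATION[] = { 5, 4, 2 };   /* ticks in RED, GREEN, YELLOW */

/* Called once per timer tick; advances the state when its time is up. */
static void light_tick(TrafficLight *t)
{
    if (++t->ticks_in_state >= DURATION[t->state]) {
        t->state = (t->state == RED)   ? GREEN
                 : (t->state == GREEN) ? YELLOW
                 :                       RED;
        t->ticks_in_state = 0;
    }
}
```

In a real controller, light_tick would be invoked from a periodic timer interrupt rather than a loop.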

12. What are multi-state timed input systems? Give example. In a multi-state (input/timed) system the transition between states (and the behavior in each state) will depend both on the passage of time and on system inputs. Autopilot systems and intruder alarm systems are examples.

13. What do you mean by a host and target machine? A typical embedded computing system has a relatively small amount of everything, including CPU horsepower, memory, I/O devices, and so forth. As a result, it is common to do at least part of the software development on a PC or workstation known as a host. The hardware on which the code will finally run is known as the target.

14. What is a cross-compiler? A cross-compiler is a compiler that runs on one type of machine but generates code for another. After compilation, the executable code is downloaded to the embedded system by a serial link or perhaps burned in a PROM and plugged in.

15. What is test-bench program? The test-bench program generates inputs to simulate the actions of the input devices; it may also take the output values and compare them against expected values, providing valuable early debugging help.

16. What is a breakpoint? A breakpoint is a debugging tool for the user to specify an address at which the program’s execution is to break. When the PC reaches that address, control is returned to the monitor program. From the monitor program, the user can examine and/or modify CPU registers, after which execution can be continued.

17. What is an in-circuit emulator? The microprocessor in-circuit emulator (ICE) is a specialized hardware tool that can help debug software in a working embedded system. At the heart of an in-circuit emulator is a special version of the microprocessor that allows its internal registers to be read out when it is stopped. The in-circuit emulator surrounds this specialized microprocessor with additional logic that allows the user to specify breakpoints and examine and modify the CPU state.

18. What is a logic analyzer? A logic analyzer is an array of inexpensive oscilloscopes—the analyzer can sample many different signals simultaneously (tens to hundreds) but can display only 0, 1, or changing values for each. All these logic analysis channels can be connected to the system to record the activity on many signals simultaneously.

19. What are the two modes in which logic analyzer can acquire data? A logic analyzer can acquire data in either of two modes that are typically called state and timing modes.

20. What is a simulator? A simulator is a software tool that runs on your host and simulates the behavior of the microprocessor and memory in your target system.

21. What is a Monitor? Monitors are debugging tools which are used to run the software on the actual target microprocessor while still giving a debugging interface similar to that of an in-circuit emulator.

22. Why debugging is a challenge in real time systems? Real-time programs are required to finish their work within a certain amount of time; if they run too long, they can create much unexpected behavior. The exact results of missing real- time deadlines depend on the detailed characteristics of the I/O devices and the nature of the timing violation. This makes debugging real-time problems especially difficult.

PART B
1. (i) Explain the various software development tools with their applications. [8]
   (ii) List the features of (a) Source code engineering tool [4] (b) Integrated Development Environment [4]
2. Explain in detail programming the embedded systems in assembly and C. [16]
3. Discuss the concept of programming in assembly language vs. high-level language. [16]
4. Implement the multi-state timed system with an example. [16]
5. Implement the multi-state input/timed system with an example. [16]
6. Explain the time-out mechanisms with examples. [16]
7. Explain (i) Emulators [8] (ii) Debuggers [8]

UNIT V - EMBEDDED SYSTEM DEVELOPMENT PART A

1. What is design technology? Design technology involves the manner in which we convert our concept of desired system functionality into an implementation. Design methodologies are used in taking the decisions at the time of designing the large systems with multiple design team members.

2. What are the goals of design process? A design process has several important goals beyond function, performance, and power. They are time to market, design cost and quality

3. What is a design flow? A design flow is a sequence of steps to be followed during a design.

4. What are the phases in the waterfall development model? The waterfall development model consists of five major phases: requirements analysis, architecture, coding, testing, and maintenance.

5. What are the elements of concurrent engineering?
• Cross-functional teams
• Concurrent product realization
• Incremental information sharing and use
• Integrated project management
• Early and continual supplier involvement
• Early and continual customer focus

6. What are requirements and specification? Requirements are informal descriptions of what the customer wants, while specifications are more detailed, precise, and consistent descriptions of the system that can be used to create the architecture.

7. What are the several tests met by a good set of requirements? The several tests that should be met by a good set of requirements are Correctness, Unambiguousness, Completeness, Verifiability, Consistency, Modifiability and Traceability.

8. What does the acronym CRC stand for? CRC stands for Classes, Responsibilities, and Collaborators.
• Classes - define the logical groupings of data and functionality.
• Responsibilities - describe what the classes do.
• Collaborators - are the other classes with which a given class works.

9. What are the steps to be followed in the CRC card methodology?
• Develop an initial list of classes
• Write an initial list of responsibilities and collaborators
• Create some usage scenarios
• Walk through the scenarios
• Refine the classes, responsibilities, and collaborators
• Add class relationships

10. What is the need for quality assurance (QA)? The quality assurance (QA) process is vital for the delivery of a satisfactory system.

11. What are the observations about quality management based on ISO 9000?
• Process is crucial
• Documentation is important
• Communication is important

12. What are the five levels of maturity in capability maturity model? Initial, Repeatable, Defined, Managed and Optimizing are the five levels of maturity in CMM.

13. What is a design review? Design review is a simple, low-cost way to catch bugs early in the design process. A design review is simply a meeting in which team members discuss a design, reviewing how a component of the system works.

14. Give the members of the design review team. The designers, the review leader, the review scribe, and the review audience are the members of the design review team.

15. What is the role of a review scribe in a design review? The review scribe records the minutes of the meeting so that designers and others know which problems need to be fixed.

16. Give the role of the review leader in a design review team. The review leader coordinates the pre-meeting activities, the design review itself, and the post-meeting follow-up. During the meeting, the leader is responsible for ensuring that the meeting runs smoothly.

17. What are the potential problems to be looked for by the audience of a design review meeting?
- Is the design team’s view of the component’s specification consistent with the overall system specification, or has the team misinterpreted something?
- Is the interface specification correct?
- Does the component’s internal architecture work well?
- Are there coding errors in the component?
- Is the testing strategy adequate?

18. Why is the verification of the specification very important? Verifying the requirements and specification is very important for the simple reason that bugs in the requirements or specification can be extremely expensive to fix later on. A bug introduced in the requirements or specification and left until maintenance could force an entire redesign of the product.

19. What is a prototype? A prototype is a model of the system being designed. Prototypes are a very useful tool when dealing with end users: rather than simply describing the system to them in broad, technical terms, a prototype lets them see, hear, and touch at least some of the important aspects of the system.

20. Define the successive refinement design methodology. In the successive refinement design methodology, the system is built several times. A first system is used as a rough prototype, and successive models of the system are further refined. This methodology makes sense when you are relatively unfamiliar with the application domain for which you are building the system.

PART B

1. Draw and explain the basic system of an automatic chocolate vending system. [16]
2. Discuss, with a diagram, the task synchronization model for a specific application. [16]
3. Explain in detail the design issues and techniques in embedded system development. [16]
4. (i) Explain the case study of an embedded system for a smart card. [8]
   (ii) Explain the features of VxWorks. [8]
5. Discuss the complete design of a typical embedded system. [16]
