Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 2 (2013), pp. 155-162 © Research India Publications http://www.ripublication.com/aeee.htm

Reducing Context Switching Overhead by Processor Architecture Modification

Divi Pruthvi

M.Tech (VLSI & Embedded Systems), Sreenidhi Institute of Science & Technology, Hyderabad, Andhra Pradesh, INDIA.

Abstract

The operating system (OS) is an essential component of the system software in any conventional computer system, including real-time systems (RTS); an OS running on an RTS is referred to as a real-time operating system (RTOS). In real-time systems, however, the time at which an output is produced is of major concern. In hard real-time systems in particular, deadlines must be met strictly, and missing them can lead to disaster. Multi-tasking, one of the major attributes of an OS, involves context switching: switching between tasks requires saving the context of the currently running task onto the stack and restoring the context of the task to be executed. The context of a task includes the values of the temporary registers, the global pointer and the stack pointer. The process of storing this context from processor to memory and restoring it from memory to processor is referred to as context switching. Context switching adds a significant overhead to the overall task execution time, because transferring data between processor and memory is relatively slow. To help meet deadlines, i.e. to produce output at the stipulated time, this overhead should be minimized by avoiding the use of external memory during context switching. This paper reduces the overhead by modifying the processor architecture: register files are added to the existing register bank of the processor so that a context can be saved on the processor itself, eliminating the extra clock cycles spent storing it to and restoring it from memory. The architecture is developed in VHDL. A co-operative RTOS, together with test applications, is developed in C to test the new architecture, and a modified GCC compiler is used to build the executable files. An XC3S1600E FPGA is used as the target board, and the results are observed on the HyperTerminal of a PC in the form of time stamps.

Keywords: RTS, RTOS, Context Switching, Co-operative RTOS, Modified GCC compiler, XC3S1600E FPGA.

1. Introduction

An operating system (OS) is a collection of software that manages the computer hardware resources and provides common services for computer programs. There are different types of OS, including multi-tasking OSs (e.g., Windows). In a multi-tasking OS, multiple tasks appear to execute at the same time, although the CPU can execute only one task at a time. Tasks are switched so quickly that the user has the illusion that the system is handling several tasks simultaneously. The tasks are executed on a time-sharing basis: each task is allotted a prescribed amount of time for its execution, and once that time has elapsed, the next task gets the CPU. The tasks are scheduled by the scheduler, which can use different algorithms, the simplest being round-robin scheduling [1]. Multi-tasking plays a vital role in real-time systems, where everything depends on meeting deadlines, especially in hard real-time systems, where a missed deadline can lead to disaster. Multi-tasking involves a process called context switching, which imposes an overhead because of its timing requirements [2] and in turn degrades the performance of the RTOS.
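The time-sharing scheme just described can be illustrated with a small C simulation. This is a hypothetical sketch, not code from the paper: round_robin, burst, quantum and finish are illustrative names, and each task's CPU demand is assumed to be known in advance. Each task receives at most quantum ticks before the scheduler moves on to the next ready task, so every slice handed out ends either with the task finishing or with a context switch.

```c
#include <assert.h>

#define MAX_TASKS 8

/* Round-robin time sharing: each ready task gets at most `quantum`
 * ticks of CPU time before the scheduler moves on to the next one.
 * burst[i]  : CPU ticks task i needs in total (input, must be > 0)
 * finish[i] : tick at which task i completes (output)
 * Returns the number of dispatches (time slices handed out). */
int round_robin(const int burst[], int n, int quantum, int finish[])
{
    int remaining[MAX_TASKS];
    int now = 0, done = 0, dispatches = 0;

    assert(n > 0 && n <= MAX_TASKS && quantum > 0);
    for (int i = 0; i < n; i++) {
        assert(burst[i] > 0);          /* a zero burst would never finish */
        remaining[i] = burst[i];
    }

    while (done < n) {
        for (int i = 0; i < n; i++) {
            if (remaining[i] == 0)
                continue;              /* task already finished */
            int slice = remaining[i] < quantum ? remaining[i] : quantum;
            now += slice;              /* task i runs for one slice */
            remaining[i] -= slice;
            dispatches++;
            if (remaining[i] == 0) {
                finish[i] = now;
                done++;
            }
        }
    }
    return dispatches;
}
```

With three tasks needing 5, 3 and 1 ticks and a quantum of 2, the tasks finish at ticks 9, 8 and 5 after six dispatches. A smaller quantum improves responsiveness but increases the number of dispatches, and with it the number of context switches whose cost this paper tries to reduce.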

1.1 Context-Switching & its Effect

The essential feature of a multi-tasking OS is context switching; it enables multiple processes to share a single CPU. Context switches can be triggered by both hardware and software events, such as interrupts and function calls [9]. A context switch stores the context (state) of the currently executing task onto the stack and restores the context of the task to be executed from the stack. The context of a task includes the program status word (flag register), the program counter, the stack pointer, the temporary register values and data related to the next task [3]. As mentioned, context switching imposes an overhead because of its timing requirements. This overhead can be reduced by migrating kernel services such as scheduling, time-tick processing (the time tick is a periodic interrupt that keeps track of time and at which the scheduler makes a decision) [4][8] and interrupt handling to hardware. The main objective of this paper is to reduce the effect of the context-switching overhead so that the deadlines of real-time systems are met. Dedicating registers to a thread eliminates the need to save and restore its context, but it reduces the number of registers available to other threads, potentially increasing their register-memory traffic and slowing their execution [7]. In this paper, the overhead is instead reduced by avoiding memory accesses during context switching, using register files added to the processor. Context switches therefore complete much faster, which reduces the overhead.
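To see where the overhead comes from, the following C model performs a conventional context switch through memory for the 12-register context defined in Section 2.1. It is a sketch under stated assumptions: a real kernel does this in assembly with sw/lw instructions, and tcb_t and switch_via_memory are hypothetical names. What it illustrates is that every switch costs one store per register for the outgoing task and one load per register for the incoming task.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTX_SIZE 12

/* Context registers (Section 2.1): $16-$23 and $30 (fp), $28 (gp),
 * $29 (sp), $31 (ra). */
static const int ctx_regs[CTX_SIZE] = {16, 17, 18, 19, 20, 21, 22, 23,
                                       28, 29, 30, 31};

/* Hypothetical task control block: the save area lives in external RAM
 * (e.g. on the task's stack). */
typedef struct {
    uint32_t saved[CTX_SIZE];
} tcb_t;

/* Conventional switch: store the outgoing context to memory, then load
 * the incoming one. Returns the number of memory accesses performed,
 * which is the traffic the modified processor avoids. */
int switch_via_memory(uint32_t reg[32], tcb_t *from, const tcb_t *to)
{
    int accesses = 0;
    assert(reg != NULL && from != NULL && to != NULL);
    for (int i = 0; i < CTX_SIZE; i++) {   /* like sw: save outgoing   */
        from->saved[i] = reg[ctx_regs[i]];
        accesses++;
    }
    for (int i = 0; i < CTX_SIZE; i++) {   /* like lw: restore incoming */
        reg[ctx_regs[i]] = to->saved[i];
        accesses++;
    }
    return accesses;
}
```

For the 12-register context this gives 24 memory accesses per switch; how many clock cycles that costs depends on the memory latency, which the model does not attempt to capture.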


2. Implementation

The implementation has two parts. The first is the hardware part: modification of the processor architecture and hardware implementation of two new instructions, scxt and rcxt. The second is the software part: developing a co-operative operating system, modifying the GNU MIPS assembler to add the two new instructions, and finally developing applications to check the functionality.

2.1 Architecture Modification

The first step is the design of register files for the existing register bank of the Plasma MIPS processor. The Plasma MIPS processor implements the “reg_bank” module in the FPGA’s block RAM, and with this approach it is not possible to save all the context registers to a register file in one CPU clock cycle. To achieve this, the original Plasma MIPS design is modified so that all the “reg_bank” registers are implemented in the FPGA’s logic blocks. This design requires more FPGA logic resources but provides faster access to the registers than the original design. In addition to the existing 32 registers, 4 register files are implemented, each holding 12 context registers. A thread’s context consists of nine saved registers ($16-$23 and $30, the frame pointer), the global pointer register ($28), the stack pointer register ($29) and the link register ($31) [6]. Figure 1 shows the modified MIPS architecture with the register files added to the register bank. The next step is to implement two context-switching instructions, scxt and rcxt, which access these register files to store and restore the context during a context switch. Implementing scxt and rcxt involves defining their instruction encoding: the function code occupies the six least significant bits (111100 for scxt and 111101 for rcxt), and the rt field holds the index of the register file, i.e. it selects which set of context registers is used.
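The behaviour of scxt and rcxt can be modelled in C as copies between the architectural registers and one of the four register files. This is a behavioural sketch, not the VHDL implementation; reg_bank_t, switch_via_regfile and the array layout are illustrative assumptions. Unlike a memory-based switch, the context never leaves the processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REG_FILES 4   /* register files added next to reg_bank */
#define CTX_SIZE      12  /* context registers held by each file   */

/* $16-$23 and $30 (fp), $28 (gp), $29 (sp), $31 (ra). */
static const int ctx_regs[CTX_SIZE] = {16, 17, 18, 19, 20, 21, 22, 23,
                                       28, 29, 30, 31};

typedef struct {
    uint32_t reg[32];                       /* architectural registers */
    uint32_t file[NUM_REG_FILES][CTX_SIZE]; /* on-chip context storage */
} reg_bank_t;

/* scxt: copy the live context into register file `idx` (the rt field).
 * The modified hardware is intended to do this in one clock cycle since
 * all registers sit in FPGA logic; the loop only models data movement. */
void scxt(reg_bank_t *rb, unsigned idx)
{
    assert(rb != NULL && idx < NUM_REG_FILES);
    for (int i = 0; i < CTX_SIZE; i++)
        rb->file[idx][i] = rb->reg[ctx_regs[i]];
}

/* rcxt: load the context held in register file `idx` back into the
 * architectural registers. No memory access is involved. */
void rcxt(reg_bank_t *rb, unsigned idx)
{
    assert(rb != NULL && idx < NUM_REG_FILES);
    for (int i = 0; i < CTX_SIZE; i++)
        rb->reg[ctx_regs[i]] = rb->file[idx][i];
}

/* Switch from the task parked in file `from` to the task in file `to`. */
void switch_via_regfile(reg_bank_t *rb, unsigned from, unsigned to)
{
    scxt(rb, from);
    rcxt(rb, to);
}
```

A switch between two tasks is then one scxt into the outgoing task's file followed by one rcxt from the incoming task's file. With only four register files, at most four tasks can use this fast path at once in this model; any additional task would have to keep its context in memory.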

2.2 Co-operative RTOS & Assembler Modification

The software part is further divided into two parts. The first deals with developing a co-operative operating system with basic functions for initializing the OS, creating tasks and scheduling them, and with developing the application files used to test the efficiency of the modified architecture. The second part involves modifying the GCC compiler: in this step it is made aware of the newly added instructions, and it is then used to compile the MIPS C files.
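A minimal, hypothetical C sketch of such a co-operative round-robin kernel follows. InitOS, createTask and schedule are the function names used in the paper, but their bodies here are assumptions: the real RTOS switches contexts with scxt/rcxt or through memory, whereas this portable sketch simply calls each task's function, and a task yields by returning. The meaning of cnxt_type (register file vs. external RAM) is also an assumption, inferred from the second test application in Section 2.4, and funcptr is given a function-pointer type instead of the void * shown in Figure 2 so that the code is portable C.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 4

/* ASSUMPTION: cnxt_type selects where a task's context is kept. */
enum { CTX_REGFILE = 0, CTX_EXTRAM = 1 };

typedef void (*task_fn)(void);

struct task {
    task_fn entry;           /* task body; yields by returning     */
    unsigned char cnxt_type; /* CTX_REGFILE or CTX_EXTRAM          */
    int used;                /* slot holds a created task          */
};

static struct task tasks[MAX_TASKS];
static int current = -1;     /* index of the task that ran last     */

void InitOS(void)
{
    for (int i = 0; i < MAX_TASKS; i++)
        tasks[i].used = 0;
    current = -1;
}

void createTask(int TaskID, task_fn funcptr, unsigned char cnxt_type)
{
    assert(TaskID >= 0 && TaskID < MAX_TASKS && funcptr != NULL);
    tasks[TaskID].entry = funcptr;
    tasks[TaskID].cnxt_type = cnxt_type;
    tasks[TaskID].used = 1;
}

/* Co-operative round-robin: run the next created task after `current`.
 * A real kernel would save/restore contexts here according to
 * cnxt_type. Returns the ID of the task that ran, or -1 if none. */
int schedule(void)
{
    for (int step = 1; step <= MAX_TASKS; step++) {
        int next = (current + step) % MAX_TASKS;
        if (tasks[next].used) {
            current = next;
            tasks[next].entry();
            return next;
        }
    }
    return -1;
}

/* Hypothetical example task in the spirit of test application 1:
 * increment two globals and store their sum. */
static int a, b, sum;
void task1(void)
{
    a++;
    b++;
    sum = a + b;
}
```

Creating tasks with IDs 0 and 2 and calling schedule() three times runs task 0, then task 2, then task 0 again, so task1 executes three times and sum ends at 6.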


Figure 1: Modified MIPS Architecture.

void createTask(int TaskID, void *funcptr, unsigned char cnxt_type)

Figure 2: Task Structure.

After initializing the OS by calling InitOS, the application calls the createTask OS function to create the different threads. The parameters associated with the threads are defined when the threads are created, as shown in Figure 2, and the schedule function is called to schedule the threads. A round-robin algorithm is used for scheduling the tasks. The scheduler picks a task, and once the allotted time is finished, the schedule function is called, which calls the starting function of the next thread. The second part of the software work involves modifying the compiler. GCC is used to compile the MIPS C files, and so that the toolchain understands the newly added instructions, they are added to the GNU assembler. The instructions are specified in GNU “binutils” version 2.19. The “binutils-2.19/gas” (GNU assembler) folder contains the source code of the MIPS assembler, and the file “mips-opc.c” in “binutils-2.19/opcodes” contains all the instructions supported by the MIPS processor [10]. The new instructions are added to “mips-opc.c”. The parameters that must be defined in the assembler for the new instructions are shown in Table 1.
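The matching mechanism behind an opcode table such as mips-opc.c can be sketched in C. Each entry pairs a mnemonic with a match value and a mask: an instruction word belongs to the entry when (word & mask) == match, and the assembler builds a word by starting from match and inserting the operand fields. The struct below is a simplified, hypothetical stand-in for binutils' struct mips_opcode, which has further fields such as pinfo and membership. The match and mask values are assumptions derived from the encoding in Section 2.1 (function code in bits 5..0, register-file index in the rt field, bits 20..16) with an assumed primary opcode of 0; they are not the values from Table 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified opcode-table entry (hypothetical type). */
struct opcode_entry {
    const char *name;  /* mnemonic                                  */
    const char *args;  /* operand string; "t" denotes the rt field  */
    uint32_t match;    /* required value of the significant bits    */
    uint32_t mask;     /* 1 bits are significant for matching       */
};

/* ASSUMED encodings: opcode 0, funct 0x3C / 0x3D, rt = register-file
 * index. The mask clears bits 20..16 so that any rt value matches. */
static const struct opcode_entry new_ops[] = {
    {"scxt", "t", 0x0000003Cu, 0xFFE0FFFFu},
    {"rcxt", "t", 0x0000003Du, 0xFFE0FFFFu},
};
#define NUM_NEW_OPS (sizeof new_ops / sizeof new_ops[0])

/* Assembler side: start from `match` and insert the 5-bit rt operand. */
uint32_t assemble(const struct opcode_entry *op, unsigned rt)
{
    assert(op != NULL && rt < 32);
    return op->match | ((uint32_t)rt << 16);
}

/* Disassembler side: find the entry whose significant bits match. */
const char *lookup(uint32_t word)
{
    for (size_t i = 0; i < NUM_NEW_OPS; i++)
        if ((word & new_ops[i].mask) == new_ops[i].match)
            return new_ops[i].name;
    return NULL;  /* not one of the new instructions */
}
```

Under these assumptions, assembling scxt with register-file index 2 gives 0x0002003C, and lookup maps that word back to "scxt".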

Table 1: mips-opc.c structure elements.

Adding the instructions to mips-opc.c makes the GCC toolchain compatible with the modified architecture, and this compiler is used to compile the co-operative RTOS and the application files written in C.

2.3 Final Code Synthesis & Experimentation

The entire architecture is implemented in VHDL. The co-operative RTOS and the application files are implemented in C, and these files, together with the debug serial file, are compiled into MIPS executables using the modified GCC compiler. The object files produced by compilation are linked together to form an .axf file, and finally “ram_image.vhd” is generated, which holds the program image for the RAM in the FPGA’s block memory. The compilation, along with the generation of ram_image, is done in an environment where a single makefile holds the commands for performing the steps above. The ram_image.vhd file and the other VHDL files of the modified MIPS architecture are synthesized using Xilinx ISE, which produces a bit file that is downloaded to the XC3S1600E FPGA. The output of the application is observed on HyperTerminal. The performance improvement is shown in the form of time stamps, i.e. the number of clock cycles saved.

2.4 Application & Results

To show that the modified architecture reduces the context-switching overhead, test applications are developed using the functions defined in the co-operative RTOS. In the first application, four tasks are created using the createTask() function, and each task is assigned a specific operation. The variables are declared globally and initialized. The first task increments variables, adds them, stores the result in the sum variable and finally displays the number of clock cycles consumed. At the end of every task, the schedule function queues the next task. The result of this application is shown in Figure 3. The second application also comprises four tasks: two tasks are structured to undergo fast context switching using the internal register files, and the other two undergo context switching using external RAM. The output of this application contains the number of clock cycles consumed by context switching using the internal register files and using external RAM, and the difference between the two, which indicates the number of clock cycles saved, as shown in Figure 4.
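The bookkeeping behind the second application's output (cycles per method and the difference between them) can be sketched as follows. This assumes time stamps are read from a free-running 32-bit clock-cycle counter before and after each switch; how the design exposes such a counter is not stated in the paper, and stamp_t, elapsed, total_cycles and cycles_saved are hypothetical names. Unsigned subtraction keeps each interval correct even when the counter wraps around between the two reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One time-stamp pair taken around a context switch. ASSUMPTION: both
 * values come from a free-running 32-bit clock-cycle counter. */
typedef struct {
    uint32_t start;
    uint32_t end;
} stamp_t;

/* Cycles between two stamps; unsigned arithmetic is modulo 2^32, so the
 * result is correct even if the counter wrapped between the reads. */
uint32_t elapsed(stamp_t s)
{
    return s.end - s.start;
}

/* Total cycles over n measured switches. */
uint64_t total_cycles(const stamp_t *s, int n)
{
    uint64_t sum = 0;
    assert(n >= 0 && (s != NULL || n == 0));
    for (int i = 0; i < n; i++)
        sum += elapsed(s[i]);
    return sum;
}

/* Cycles saved by register-file switches relative to external-RAM
 * switches; a negative value would mean the register files were slower. */
int64_t cycles_saved(const stamp_t *ram, const stamp_t *regfile, int n)
{
    return (int64_t)total_cycles(ram, n) - (int64_t)total_cycles(regfile, n);
}
```

Summing over several switches before taking the difference smooths out variation between individual measurements, such as a switch that happens to coincide with other bus activity.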

Figure 3: Test Application-1 Result.

Figure 4: Test Application-2 Result.

3. Conclusion

This paper introduced the concept of confining context switching to the processor itself, without the need for external memory to store and restore task contexts. This presumably removes the time spent transferring context between memory and processor from the overall execution time and thereby improves the efficiency of the OS. The concept is valuable for hard real-time systems, where deadlines must be met and the output depends not only on the logic but also on the time at which it is produced, i.e. in time-critical applications. The work can be extended by avoiding the need to modify the hardware and instead reducing the clock cycles consumed in transferring context between memory and processor.

References

[1] Krithi Ramamritham, John A. Stankovic, “Scheduling Algorithms and Operating Systems Support for Real-Time Systems”, Proceedings of the IEEE, Vol. 82, No. 1, January 1994.
[2] Xiangrong Zhou, Peter Petrov, “Rapid and Low-Cost Context-Switch through Embedded Processor Customization for Real-Time and Control Applications”, DAC 2006, July 24-28, 2006, San Francisco, California, USA.
[3] Francis M. David, Jeffrey C. Carlyle, Roy H. Campbell, “Context Switch Overheads for Linux on ARM Platforms”, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL.
[4] Jeffrey S. Snyder, David B. Whalley, Theodore P. Baker, “Fast Context Switches: Compiler and Architectural Support for Preemptive Scheduling”, Microprocessors and Microsystems, 1995, pp. 35-42.


[5] Pekka Jaaskelainen, Pertti Kellomaki, Jarmo Takala, Heikki Kultala, Mikael Lepisto, “Reducing Context Switch Overhead with Compiler-Assisted Threading”, Department of Computer Systems, Tampere University of Technology.
[6] Howard Huang, “Basic MIPS Architecture”, January 27, 2003.
[7] Siddhartha Shivshankar, Sunil Vangara, Alexander G. Dean, “Balancing Register Pressure and Context-Switching Delays in ASTI Systems”, Center for Embedded Systems Research, Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, USA.
[8] Pramote Kuacharoen, Mohamed A. Shalan, Vincent J. Mooney III, “A Configurable Hardware Scheduler for Real-Time Systems”, Center for Research on Embedded Systems and Technology, School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia.
[9] Dan Tsafrir, “The Context-Switch Overhead Inflicted by Hardware Interrupts (and the Enigma of Do-Nothing Loops)”, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598.
[10] www.opencores.org
[11] Spartan-3E FPGA Starter Kit Board User Guide, www.xilinx.com
