Reducing Context Switching Overhead by Processor Architecture Modification
Advance in Electronic and Electric Engineering. ISSN 2231-1297, Volume 3, Number 2 (2013), pp. 155-162
© Research India Publications
http://www.ripublication.com/aeee.htm

Divi Pruthvi
M.Tech (VLSI & Embedded Systems), Sreenidhi Institute of Science & Technology, Hyderabad, Andhra Pradesh, INDIA.

Abstract

The operating system is an essential component of the system software in any conventional system, including a real-time system (RTS); an OS running on an RTS is referred to as an RTOS. In real-time systems the time at which the output is produced is of major concern, and in hard RTS in particular the deadlines must be met strictly, since missing them can lead to disaster. Multi-tasking, one of the major attributes of an OS, involves context switching. Switching between tasks involves saving the context of the currently running task onto the stack and restoring the context of the task to be executed. The context of a task comprises the values of the program counter, the temporary registers, the global pointer and the stack pointer. This process of storing the context from the processor to memory and restoring it from memory to the processor is referred to as context switching. Context switching adds significant overhead to the overall task execution time, because transferring contents between processor and memory is time consuming. This overhead has to be minimized by avoiding the use of external memory during context switching, so that deadlines are met, i.e. the output is produced at the stipulated time. This paper focuses on reducing the overhead by modifying the processor architecture: additional register files are added to the existing register bank so that the context can be saved on the processor itself, which eliminates the additional clock cycles needed for storing to and restoring from memory. The architecture is developed in VHDL.
A co-operative RTOS is developed along with the applications in the C language to test the new architecture. A modified GCC compiler is used to compile the executable files. An XC3S1600E FPGA is used as the target board, and the results are observed on the HyperTerminal of a PC in the form of time stamps.

Keywords: RTS, RTOS, Context Switching, Co-operative RTOS, Modified GCC compiler, XC3S1600E FPGA.

1. Introduction

An operating system (OS) is a collection of software that manages the computer hardware resources and provides common services for computer programs. Different types of OS are available, including multi-tasking OSs (e.g. Windows). In a multi-tasking OS, multiple tasks appear to execute at the same time, although the CPU can execute only one task at a time. Switching between tasks is so fast that it gives the user the illusion that the system handles multiple tasks simultaneously. The tasks are executed on a time-sharing basis: each task is allotted a prescribed amount of time for its execution, and once that time has elapsed the next task gets CPU time. The tasks are dispatched by the scheduler according to a scheduling algorithm, the simplest of which is round-robin scheduling [1].

Multi-tasking plays a vital role in real-time systems, where everything depends on meeting deadlines, especially in hard real-time systems, where failure to meet a deadline leads to disaster. Multi-tasking involves a process called context switching, which imposes overhead due to its timing requirements [2], in turn degrading the performance of the RTOS.

1.1 Context-Switching & its Effect

The essential feature of a multi-tasking OS is context switching; it enables multiple processes to share a single CPU.
Context switches can be triggered by function calls and by interrupts, both hardware and software [9]. A context switch involves storing the context (state) of the currently executing task onto the stack and restoring the context of the task to be executed from the stack. The context of a task includes the program status word (flag register), the program counter, the stack pointer, the temporary register values and data related to the next task [3].

As mentioned, context switching imposes overhead due to its time requirements. The overhead can be reduced by migrating kernel services to hardware, such as scheduling, time-tick processing (a time tick is a periodic interrupt used to keep track of time, during which the scheduler makes a decision) [4][8], and interrupt handling. The main objective of this paper is to reduce the overhead caused by context switching, so that the deadlines of real-time systems are met. Dedicating registers to a thread eliminates the need for saving and restoring its context, but it reduces the number of registers available to other threads, potentially increasing their register-memory traffic and slowing execution [7]. In this paper, the overhead is reduced by avoiding the use of memory during context switching: register files are added to the processor so that the context stays on chip. This makes the context switch considerably faster, thereby reducing the overhead.

2. Implementation

The implementation has two parts. The first is the modification of the processor architecture and the hardware implementation of two instructions (scxt, rcxt). The second is the software part, which involves developing a co-operative operating system, modifying the GNU MIPS assembler to add the two new instructions, and finally developing applications to check the functionality.
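As a baseline for the modification described in the following sections, the conventional memory-based switch of Section 1.1 can be modelled in a few lines of C. This is only a behavioural sketch: the type and function names, the number of temporary registers (8) and the word-counting return value are assumptions made for illustration and are not taken from the paper.

```c
#include <assert.h>
#include <stdint.h>

/* Number of temporary registers in the context: an assumed value. */
#define NUM_TEMP_REGS 8

/* Context fields named in Section 1.1, modelled as 32-bit words. */
typedef struct {
    uint32_t psw;                  /* program status word (flags) */
    uint32_t pc;                   /* program counter */
    uint32_t sp;                   /* stack pointer */
    uint32_t temp[NUM_TEMP_REGS];  /* temporary registers */
} context_t;

/* Hypothetical per-task save area located in external memory. */
typedef struct {
    context_t saved;
} task_t;

/*
 * Conventional switch: the outgoing context is copied to memory and the
 * incoming context is copied back into the processor.
 * Returns the number of words moved between processor and memory.
 */
unsigned switch_via_memory(context_t *cpu, task_t *from, task_t *to)
{
    const unsigned words = (unsigned)(sizeof(context_t) / sizeof(uint32_t));

    from->saved = *cpu;   /* store: processor -> memory */
    *cpu = to->saved;     /* restore: memory -> processor */
    return 2u * words;
}

/* Small self-check; returns the transfer count of one switch. */
unsigned demo_switch_via_memory(void)
{
    context_t cpu = { .psw = 0, .pc = 0x100, .sp = 0x8000, .temp = {0} };
    task_t a = { .saved = {0} };
    task_t b = { .saved = { .psw = 0, .pc = 0x200, .sp = 0x9000, .temp = {0} } };

    unsigned transfers = switch_via_memory(&cpu, &a, &b);

    /* Task a's state is now in memory and task b's state is in the CPU. */
    assert(a.saved.pc == 0x100 && cpu.pc == 0x200);
    return transfers;
}
```

With the 11-word context assumed here, every switch moves 22 words across the processor-memory boundary; it is this traffic that the added register files are meant to remove.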
2.1 Architecture Modification

The first part deals with the register file design for the existing register bank of the Plasma MIPS processor. Plasma implements the “reg_bank” module in the FPGA’s block RAM, and with this approach it is not possible to save all the context registers to a register file in one CPU clock cycle. To achieve this, the original Plasma MIPS design is modified by implementing all the “reg_bank” registers in the FPGA’s logic blocks. This design requires more FPGA logic resources but provides faster access to the registers than the original design. In addition to the existing 32 registers, 4 additional register files are implemented, each holding 12 context registers. A thread’s context comprises 9 saved or temporary registers ($16 - $23 and the frame pointer $30), the global pointer register ($28), the stack pointer register ($29), and the link register ($31) [6]. Figure 1 shows the modified MIPS architecture with the register files added to the register bank.

The next part involves implementing two context-switching instructions, named scxt and rcxt, which access these register files to store and restore the context during a context switch. Implementing scxt and rcxt involves defining the instruction bitmap. The bitmap contains the function value, given by the 6 least significant bits (111100 for scxt, 111101 for rcxt), and the rt field, which holds the index of the register file and thus selects which set of context registers is used.

2.2 Co-operative RTOS & Assembler Modification

The software part is further divided into two parts. The first deals with developing a co-operative operating system, with basic functions for initializing the OS, creating tasks and scheduling the tasks, and with developing the application files for testing the efficiency of the modified architecture. The second part involves modification of the GCC compiler.
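To make concrete what the modified toolchain has to emit and what the hardware does with it, the following C sketch packs scxt and rcxt instruction words from the fields given in Section 2.1 and models their effect on 4 register files of 12 context registers each. The function values (111100, 111101), the use of rt as the register-file index and the list of context registers come from the text; the primary opcode of 0 and the rt position in bits 20..16 follow the standard MIPS R-type layout and are assumptions, as are all identifier names.

```c
#include <assert.h>
#include <stdint.h>

/* Function-field values from Section 2.1. */
#define FUNCT_SCXT 0x3Cu   /* 111100 */
#define FUNCT_RCXT 0x3Du   /* 111101 */

/* ASSUMPTION: standard MIPS R-type layout, primary opcode 0. */
#define OPCODE_SHIFT 26
#define RT_SHIFT     16
#define RT_MASK      0x1Fu
#define FUNCT_MASK   0x3Fu

#define NUM_REG_FILES 4u
#define CTX_REGS      12u

/* Context registers per Section 2.1: $16-$23, $28 (gp), $29 (sp),
 * $30 (fp), $31 (ra). */
static const unsigned ctx_reg[CTX_REGS] = {
    16, 17, 18, 19, 20, 21, 22, 23, 28, 29, 30, 31
};

static uint32_t gpr[32];                           /* main register bank */
static uint32_t reg_file[NUM_REG_FILES][CTX_REGS]; /* added register files */

/* Build an scxt/rcxt word; all other fields are left at zero. */
uint32_t encode_cxt(uint32_t funct, uint32_t file_index)
{
    return ((file_index & RT_MASK) << RT_SHIFT) | (funct & FUNCT_MASK);
}

/* Behavioural model: returns 1 if the word was a valid scxt/rcxt. */
int execute_cxt(uint32_t word)
{
    uint32_t funct = word & FUNCT_MASK;
    uint32_t index = (word >> RT_SHIFT) & RT_MASK;
    unsigned i;

    if ((word >> OPCODE_SHIFT) != 0u || index >= NUM_REG_FILES)
        return 0;

    if (funct == FUNCT_SCXT) {          /* save context on chip */
        for (i = 0; i < CTX_REGS; i++)
            reg_file[index][i] = gpr[ctx_reg[i]];
        return 1;
    }
    if (funct == FUNCT_RCXT) {          /* restore context from chip */
        for (i = 0; i < CTX_REGS; i++)
            gpr[ctx_reg[i]] = reg_file[index][i];
        return 1;
    }
    return 0;
}

/* Save to file 1, clobber two registers, restore; returns 1 on success. */
int demo_cxt_round_trip(void)
{
    int saved, restored;

    gpr[16] = 0xAAAAu;
    gpr[29] = 0x7FF0u;
    saved = execute_cxt(encode_cxt(FUNCT_SCXT, 1));
    gpr[16] = 0;
    gpr[29] = 0;
    restored = execute_cxt(encode_cxt(FUNCT_RCXT, 1));

    assert(saved == 1 && restored == 1);
    return gpr[16] == 0xAAAAu && gpr[29] == 0x7FF0u;
}
```

The loops in the model copy one register per iteration only because this is software; in the modified hardware the 12 registers are meant to be transferred together, which is why the register bank is moved out of block RAM.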
The GCC compiler is made aware of the newly added instructions in this step and is in turn used to compile the MIPS C files.

Figure 1: Modified MIPS Architecture.

void createTask(int TaskID, void *funcptr, unsigned char cnxt_type)

Figure 2: Task Structure.

After initializing the OS by calling InitOS, the application needs to call the createTask OS function to create the different threads. The parameters associated with a thread are defined at the time the thread is created, as shown in Figure 2, and the schedule function is called to schedule the threads. The round-robin algorithm is used for scheduling the tasks. The scheduler picks up a task, and once its allotted time is finished the schedule function is called, which in turn calls the starting function of the next thread.

The second part of the software involves the compiler modification. The GCC compiler is used to compile the MIPS C files, and in order for the compiler to understand the newly added instructions, they are added to the GNU assembler. The instructions are specified in GNU “binutils” version 2.19. The “binutils-2.19/gas” (GNU assembler) folder contains the source code for the MIPS assembler. The file “mips-opc.c” in “binutils-2.19/opcodes” contains all the instructions supported by the MIPS processor [10]. The new instructions have been added to the file “mips-opc.c”. The parameters specific to the new instructions that have to be defined in the assembler are shown in Table 1.

Table 1: mips-opc.c structure elements.

Adding the instructions to mips-opc.c makes the GCC compiler compatible with the modified architecture, and this compiler is used to compile the co-operative RTOS and the application files written in C.

2.3 Final Code synthesis & Experimentation

The entire architecture is implemented in VHDL.
The co-operative RTOS and the application files are implemented in C, and these files, together with the debugserial file, are compiled for MIPS using the modified GCC compiler. The object files produced by compilation are linked together to form an .axf file, and finally “ram_image.vhd” is generated, which determines the entry point of the RAM location in the FPGA block. Compilation, along with the generation of ram_image, is done in a Linux environment, where a single makefile holds the commands for performing the actions mentioned above. The ram_image.vhd file and the other VHDL files of the modified MIPS architecture are synthesized using Xilinx ISE, which produces a bit file that is downloaded to the XC3S1600E FPGA.
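The co-operative OS interface described in Section 2.2 (InitOS, createTask, schedule, round-robin order) can be sketched on a host machine as follows. This is a minimal illustration and not the author's RTOS: the task table, the limit of 4 tasks and the demo tasks are assumptions, createTask takes a typed function pointer instead of the `void *funcptr` shown in Figure 2 so that the sketch stays within standard C, and cnxt_type is stored without being interpreted because the text does not define its meaning.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TASKS 4   /* assumption: one task per added register file */

typedef void (*task_fn_t)(void);

/* Hypothetical task control block. */
typedef struct {
    int           id;
    task_fn_t     entry;      /* starting function of the thread */
    unsigned char cnxt_type;  /* stored only; meaning not given in the text */
} tcb_t;

static tcb_t  task_table[MAX_TASKS];
static size_t task_count;
static size_t next_task;

void InitOS(void)
{
    task_count = 0;
    next_task = 0;
}

void createTask(int TaskID, task_fn_t funcptr, unsigned char cnxt_type)
{
    if (task_count >= MAX_TASKS || funcptr == NULL)
        return;   /* table full or no entry point: request ignored */
    task_table[task_count].id = TaskID;
    task_table[task_count].entry = funcptr;
    task_table[task_count].cnxt_type = cnxt_type;
    task_count++;
}

/*
 * Round-robin step: call the starting function of the next task and
 * advance the index. The task co-operates by returning when its slot
 * is over. Returns the id of the task that ran, or -1 if none exist.
 */
int schedule(void)
{
    tcb_t *t;

    if (task_count == 0)
        return -1;
    t = &task_table[next_task];
    next_task = (next_task + 1) % task_count;
    t->entry();
    return t->id;
}

/* Demo tasks that only count how often they were run. */
static int runs_a, runs_b;
static void task_a(void) { runs_a++; }
static void task_b(void) { runs_b++; }

/* Creates two tasks, runs three slots, returns the total number of runs. */
int demo_round_robin(void)
{
    int first, second, third;

    InitOS();
    runs_a = 0;
    runs_b = 0;
    createTask(1, task_a, 0);   /* cnxt_type values are arbitrary here */
    createTask(2, task_b, 1);

    first = schedule();
    second = schedule();
    third = schedule();

    assert(first == 1 && second == 2 && third == 1);
    return runs_a + runs_b;
}
```

On the modified processor, the point at which schedule hands over to the next task is where scxt and rcxt would replace the memory-based save and restore; the host sketch leaves that step out because it has no register files to switch.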