<<

International Journal of Computer Information Systems and Industrial Management Applications. ISSN 2150-7988 Volume 5 (2013) pp. 557-563 © MIR Labs, www.mirlabs.net/ijcisim/index.html

Network Traffic Monitoring and Control for Multi-core Processors in Cloud Computing Applications

A. S. Radhamani 1, E. Baburaj 2

1 Research Scholar, Department of Computer Science and Engineering, Manonmaniam Sundaranar University, Tirunelveli. [email protected]

2 Professor, Department of Computer Science and Engineering, Sun College of Engineering and Technology, Nagercoil. [email protected]

Abstract- Parallel Programming (PP) used to be an area confined to scientific and engineering applications. However, with the proliferation of multicore processors, parallel programming has definitely become a mainstream concern. To satisfy this requirement, one can leverage multi-core architectures to parallelize traffic monitoring so as to improve information processing capabilities over traditional uni-processor architectures. In this paper an effective framework for multi-core processors that strikes a balance between control over the system and an effective network traffic control mechanism for high-performance computing is proposed. The proposed Cache Fair Thread Scheduling (CFTS) uses information supplied by the user to guide thread scheduling and also, where necessary, gives fine control over thread placement. For this, wait-free data structures are applied in lieu of conventional lock-based methods for accessing internal scheduler structures, alleviating to some extent serialization and hence the degree of contention. Cloud computing has recently received considerable attention as a promising approach for delivering network traffic services by improving the utilization of data centre resources. The primary goal of the scheduling framework is to improve application throughput and overall system utilization in cloud applications. The resultant aim of the framework is to improve fairness so that each thread continues to make good forward progress. The experimental results show that the parallel CFTS-WF could not only increase the processing rate, but also maintain good stability, which is most important for cloud computing. This makes it an effective network traffic control mechanism for cloud computing.

Keywords: Cache Fair Thread Scheduling, multicore, wait-free data structure, cloud computing, network traffic

I. Introduction

Cloud computing and multicore processor architectures are two emerging classes of execution environments that are rapidly becoming widely adopted for the deployment of Web services. Explicit parallel architectures require specification of parallel tasks along with their interactions. In a cloud computing environment, multicore network traffic analysis becomes a challenge for several reasons. First, packet capture applications are memory bound, but memory bandwidth does not seem to increase as fast as the number of cores available [3]. Second, balancing the traffic among different processing units is challenging, as it is not possible to predict the nature of the incoming traffic. Exploiting the parallelism with general-purpose operating systems is even more difficult, as they have not been designed for accelerating packet capture. During the last three decades, memory access has always been one of the worst bottlenecks, and thus several solutions to this problem have been proposed. With the advent of Symmetric Multiprocessor Systems (SMP), multiple processors are connected to the same memory bus, thereby causing processors to compete for the same memory bandwidth. Integrating the memory controller inside the main processor is another approach for increasing the memory bandwidth. The main advantage of this architecture is fairly obvious: multiple memory modules can be attached to each processor, thus increasing bandwidth. In multiprocessors such as SMP and NUMA (Non Uniform Memory Access), a cache coherence protocol [19] must be used in order to guarantee synchronization among processors. A multicore processor is a processing system composed of two or more individual processors, called cores, integrated onto a single chip package. As a result, the inter-core bandwidth of multicore processors can be many times greater than that of SMP systems. For traffic monitoring and control applications, the most efficient approach to optimize bandwidth utilization is to reduce the number of packet copies. In multicore and multiprocessor architectures, memory bandwidth can be wasted in many ways, including improper scheduling, wrong balancing of Interrupt ReQuests (IRQ) and subtle mechanisms such as false sharing [14]. For these reasons, squeezing performance out of those architectures requires additional effort. Even though there is a lot of ongoing research in this area, most of the existing schedulers are unaware of architectural differences between cores; therefore the scheduling does not guarantee the best memory bandwidth utilization. This may happen when two threads using the same data set are scheduled on different processors, or on the same multicore processor having separate cache domains. Therefore an effective software scheduler to better distribute the workload among threads can substantially increase the scalability achieved by CFTS. In cloud computing, software


services are provided in the cloud and not on the local computer. In a multicore processor, several processor cores are placed on the same chip. These cores can be utilized either to execute several independent programs at the same time or to execute a single program faster. In the cloud, all the resources including infrastructure, platform and software are delivered as services, which are available to customers in a pay-per-use model. In this way, cloud computing gains advantages such as cost savings, high availability, and easy scalability [15]. The cloud must guarantee all resources as a service and provide them to users by means of customized Service Level Agreements (SLAs). However, in the cloud, there might be tens of thousands or even more users accessing resources simultaneously, which puts extremely high pressure on the cloud. When all users are requiring service from the cloud, the outgoing traffic will be tremendous and the network bandwidth must be managed effectively to serve all users [23]. So, it is necessary to control the data streams and make better utilization of free network resources in the cloud.

In this paper, network traffic monitoring and control for multi-core processors based on the cloud is proposed. It monitors application behavior at runtime, analyzes the collected information, and optimizes multicore architectures for cloud computing resources. The traditional operations on packet structures are modified to reduce the strong dependency in the sequential code. Then wait-free data structures are applied selectively to make multicore parallelization easier and manageable. Based on this, the parallel network traffic controller can run in a 1-way 2-stage pipelined fashion on a multicore processor, which not only increases the processing speed significantly, but also performs well in stability. The scheduling framework is implemented such that it first compares the existing deadline monotonic scheduler without and with the wait-free data structure. Similarly, for parallel applications like cloud computing, CFTS is also compared with and without the wait-free data structure. The remainder of this paper is organized as follows. After the related work in Section 2, Section 3 gives a description of the problem for multi-core systems in cloud applications using the Cache Fair Thread Scheduling algorithm, Section 4 describes problem evaluation, and Section 5 provides results and discussions, followed by the conclusion.

II. Related Work

Characterizing scientific and transactional applications in IaaS cloud infrastructures, identifying the best virtual machine configuration in terms of the optimal processor allocation for executing parallel and distributed applications, is proposed in [6]. A self-organized architecture for a Cooperative Scheduling System, considering that it must be able to perform scheduling in highly dynamic environments where there is incomplete information and changes often occur, is implemented in [2]. The effect of heterogeneous data on the scheduling mechanisms of the cloud technologies, and a comparison of the performance of the cloud technologies under virtual and non-virtual hardware platforms, are given in [7]. The number of cores which fit on a single chip is growing at an exponential rate while off-chip main memory bandwidth is growing at a linear rate at best. This core count to off-chip bandwidth disparity causes per-core memory bandwidth to decrease as technology advances. An analytic model to study the tradeoffs of utilizing increased chip area for larger caches versus more cores is introduced in [1]. In [18], different scheduling policies for multicore processors with wait-free data structures are evaluated in cloud computing applications. The idea of treating clouds and multicores as a single computing environment has been introduced by David Wentzlaff et al. within the context of Fos [20]. They propose the development of a modern Operating System to target, at the same time and in parallel, the two different computing platforms, using a well defined service based architecture. In this study, constructing manycore architectures well suited for the emerging application space of cloud computing, where many independent applications are consolidated onto a single chip, is described. Nevertheless, both computing platforms require the software to correctly use a very large and potentially heterogeneous pool of available execution resources. For instance, in the near future multicore machines with more than 64 cores will become mainstream [17]. On such a platform, the correct usage of each core will become a relevant issue, not only affecting the overall performance of the service but also impacting its power consumption.

As cloud computing is a relatively new concept, it is still at an early stage of research. Most of the published works focus on general descriptions of the cloud, such as its definition, advantages, challenges, and future [4] and [10]. In detail, security is a very popular and important research field in cloud computing. Some researches focus on data confidentiality and integrity in cloud computing, and some analyze security problems in cloud computing from different points of view, such as network, servers, storage, system management and application layer [22]. Besides security, another hot topic in cloud computing is virtualization technologies. A secure Private Virtual Infrastructure (PVI) and an architecture that cryptographically secures each virtual machine are proposed in [12]. A virtual machine image management system is introduced in [21], and a real-time protection solution for virtual machines is given in [5]. A Run-Time Monitor (RTM), which is system software to monitor the characteristics of applications at runtime, analyze the collected information, and optimize resources on a cloud consisting of multicore processors, is described in [16]. To make CFTS capable for cloud computing, parallelization technology on multicore processors is required. A research trend in multicore parallelization is wait-free data structures. A general introduction to wait-free data structures is given in [8].

III. Problem Statement

This section first describes CFTS in detail by comparing it with a static scheduling algorithm (deadline monotonic) for multi-core systems while running cloud applications. The scheduling primitives must support a wide variety of parallelization requirements. Moreover, some applications need different scheduling strategies for different program phases. Under deadline monotonic scheduling, the time that a program takes to run will not depend only on its computation requirements but also on those of its co-runners. Therefore, the scheduler must select not only the task to be launched (e.g., a critical task) but also the appropriate core. The core must be selected according to the computational requirements of the tasks already running on each core. Therefore it is measured as time consuming for cloud applications. Also, as the number of tasks (traffic) increases, the static scheduling system employed by a traditional OS will no longer guarantee optimal execution of tasks. The proposed CFTS therefore reduces the effects of unequal CPU cache sharing that occur on these processors and cause unfair CPU sharing, priority inversion, and inadequate CPU accounting. CFTS attempts to minimize the effect of thread-level data dependencies and maintain efficient execution of all threads in the processor without excessive overhead for scheduling computation. In our work, thread scheduling is performed in parallel with CPU operation by utilizing resources and thereby minimizing the amount of time spent in the OS. In order to schedule CFTS threads effectively, the scheduler must have information regarding the current state of all threads currently executing in the processor. To gain this knowledge, wait-free data structures are implemented to store threads and maintain information about their current status. Parallelizing CFTS by applying wait-free design principles can therefore be used for the allocation and management of shared network resources among different classes of traffic streams with a wide range of performance requirements and traffic characteristics in high-speed packet-switched network architectures. The proposed CFTS maximizes the throughput, and can then be used for efficient bandwidth management and traffic control in order to achieve high utilization of network resources while maintaining the desired level of service for cloud computing by CFTS scheduling in multi-core systems.

A. Description of Cache Fair Thread scheduling algorithm

CFTS is suitable for cloud computing for its idea of bandwidth borrowing. It can not only control the bandwidth of all users, guaranteeing that all users could be given different levels of service by their payment, but also make more effective usage of free resources and give a better user experience. In the cloud, there could be different kinds of leaf users, e.g. 2 Mbps, 5 Mbps, regarding different service levels. Because CFTS is a dynamic traffic control mechanism, classes can be added or removed dynamically. This makes CFTS scalable enough for cloud computing.

On real hardware, it is possible to run only a single task at once, so while that one task runs, the other tasks that are waiting for the CPU are at a disadvantage: the current task gets an unfair amount of CPU time. In CFTS this fairness imbalance is expressed and tracked via the per-task p->wait_runtime (nanosecond unit) value. "wait_runtime" is the amount of time the task should now run on the CPU for it to become completely fair and balanced. CFTS's task picking logic is based on this p->wait_runtime value and is thus very simple: it always tries to run the task with the largest p->wait_runtime value. So CFTS always tries to split up CPU time between runnable tasks as close to 'ideal multitasking hardware' as possible. This algorithm redistributes CPU time to threads to account for unequal cache sharing: if a thread's performance decreases due to unequal cache sharing it gets more time, and vice versa. The challenge in implementing this algorithm is determining how a thread's performance is affected by unequal cache sharing using limited information from the hardware. The cache-fair scheduling algorithm does not establish a new CPU sharing policy but helps enforce existing policies. The key part of our algorithm is correctly computing the adjustment to the thread's CPU quantum. The given four steps are used to compute the cache-fair scheduling algorithm adjustment:
1. Determine a thread's fair L2 cache miss rate – a miss rate that the thread would experience under equal cache sharing.
2. Compute the thread's fair CPI rate – the cycles per instruction under the fair cache miss rate.
3. Estimate the fair number of instructions – the number of instructions the thread would have completed under the existing scheduling policy if it ran at its fair CPI rate (divide the number of cycles by the fair CPI). Then measure the actual number of instructions completed.
4. Estimate how many CPU cycles to give or take away to compensate for the difference between the actual and the fair number of instructions. Adjust the thread's CPU quantum accordingly.

The algorithm works in two phases:
1. Searching phase: the scheduler computes the fair L2 cache miss rate for each thread.
2. Calibration phase: a single calibration consists of computing the adjustment to the thread's CPU quantum and then selecting a thread from the best-effort class whose CPU quantum is adjusted to offset the adjustment to the cache-fair thread's quantum. Calibrations are repeated periodically.

The challenge in implementing this algorithm is that, in order to correctly compute adjustments to the CPU quanta, one needs to determine a thread's fair CPI ratio using only limited information from hardware counters [13]. This algorithm reduces L2 contention by avoiding the simultaneous scheduling of problematic threads while still ensuring basic real-time constraints.

B. Description of Static Algorithm (Deadline Monotonic)

To meet hard deadlines implies constraints upon the way in which system resources are allocated at runtime. This includes both physical and logical resources. Conventionally, resource allocation is performed by scheduling algorithms whose purpose is to interleave the executions of processes in the system to achieve a predetermined goal. For hard real-time systems the obvious goal is that no deadline is missed. One scheduling method that has been proposed for hard real-time systems is a type of deadline monotonic algorithm [11]. This is a static priority based algorithm for periodic processes in which the priority of a process is related to its period. This algorithm has several useful properties, including a schedulability test that is sufficient and necessary, but the constraints that it imposes on the process system are severe: processes must be periodic, independent and have deadline equal to period. The processes to be scheduled are characterized by the following relationship:

Computation time < deadline < period

Based on this, each core is characterized by:
1) The frequency of each core, fj, given in cycles per unit time. With DVS, fj can vary from fj_min to fj_max, where 0 < fj_min < fj_max. From the frequency it is easy to obtain the speed of the core, Sj, which is simply the inverse of the frequency.
2) The specific architecture of a core, A(corej), which includes the type of the core, its speed in GHz, I/O, and local cache and/or memory in Bytes.
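The four calibration steps of the cache-fair algorithm in Section III-A can be illustrated with the following sketch. This is an illustrative Python fragment, not the authors' implementation: the function name and the sample counter values are invented, and in practice the cycle and instruction counts would come from hardware performance counters [13].

```python
def cachefair_adjustment(cycles_run, actual_instructions, fair_cpi):
    """Steps 3-4 of the cache-fair calibration: estimate how many
    instructions a thread should have retired at its fair CPI, then
    convert the shortfall (or surplus) back into CPU cycles to add to
    (or subtract from) the thread's next quantum."""
    # Step 3: fair instruction count = cycles divided by the fair CPI.
    fair_instructions = cycles_run / fair_cpi
    # Step 4: cycles needed to compensate the difference between the
    # fair and the actually measured instruction counts.
    return (fair_instructions - actual_instructions) * fair_cpi

# A thread that ran for 1e6 cycles at a fair CPI of 2.0 should have
# retired 500,000 instructions; if cache contention let it retire only
# 400,000, it is owed 200,000 extra cycles in its next quantum.
owed = cachefair_adjustment(1_000_000, 400_000, 2.0)
print(owed)  # 200000.0
```

A thread that ran faster than its fair CPI yields a negative adjustment, i.e. its quantum is shortened, and a best-effort thread absorbs the offset, as described in the calibration phase above.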

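The static deadline monotonic policy of Section III-B can likewise be sketched in a few lines. The task set below is invented for illustration, and the utilization bound used is the classic Liu-Layland sufficient test, which applies to the deadline-equals-period case the paper assumes; the paper itself does not spell out its schedulability test.

```python
# Deadline monotonic: static priorities, shorter relative deadline means
# higher priority. Task tuples are (name, computation, deadline, period)
# with computation < deadline <= period, as required in Section III-B.
tasks = [("t1", 1, 10, 10), ("t2", 2, 5, 5), ("t3", 1, 20, 20)]

# Sorting by relative deadline yields the static priority order.
by_priority = sorted(tasks, key=lambda task: task[2])
print([task[0] for task in by_priority])  # ['t2', 't1', 't3']

# Sufficient utilization test (Liu-Layland bound for n tasks, valid
# here because each deadline equals its period).
n = len(tasks)
utilization = sum(c / p for _, c, _, p in tasks)
bound = n * (2 ** (1 / n) - 1)
print(utilization <= bound)  # True: this sample set is schedulable
```

Because the priorities are fixed offline, the scheduler does no per-quantum recomputation, which is exactly why, as argued above, it cannot react to growing task counts or cache contention the way CFTS does.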
Tasks: Consider a parallel application, T = {t1, t2, …, tn}, where ti is a task. Each task is characterized by:
1) The computational cycles, ci, that it needs to complete. (The assumption here is that ci is known a priori.)
2) The specific core architecture type, A(ti), that it needs to complete its execution.
3) The deadline, di, before which the task has to complete its execution.
The application, T, also has a deadline, D, which is met if and only if the deadlines of all its tasks are met. Here, the deadline can be larger than the minimum execution time and represents the time that the user is willing to tolerate because of the performance-energy tradeoffs. The number of computational cycles required by ti to execute on corej is a finite positive number, denoted by cij. The execution time of ti under a constant speed Sij, given in cycles per second, is tij = cij/Sij.

C. Description of Wait free data structure

A wait-free data structure is a lock-free data structure with the additional property that every thread accessing the data structure can complete its operation within a bounded number of steps, regardless of the behaviour of other threads. Each thread is guaranteed to make progress itself, or through a cooperative thread [9]. This property means that high-priority threads accessing the data structure never have to wait for low-priority threads to complete their operations on the data structure, and every thread will always be able to make progress when it is scheduled to run by the OS. For real-time or semi-real-time systems this can be an essential property, as the indefinite wait periods of blocking or non-wait-free lock-free data structures do not allow their use within time-limited operations. A wait-free data structure has the maximum potential for true concurrent access, without the possibility of busy waits.

IV. Problem Evaluation

The advent of cloud computing platforms and the growing pervasiveness of multicore processor architectures have revealed the inadequateness of traditional programming models based on sequential computations, opening up many challenges for research on parallel programming models for building distributed, service-oriented systems. To investigate the effects of different scheduler configurations on the performance of a multicore processor, the implementation is a suite of MATLAB simulations. Parallel Computing Toolbox provides several high-level programming constructs that help us convert our applications to take advantage of computers equipped with multi-core processors. Parallel programming improves performance by breaking down a problem into smaller sub-problems that are distributed to multiple processors. Thus, the benefits are twofold. First, the total amount of computation performed by each individual processor is reduced, resulting in faster computation. Second, the size of the problem can be increased by using the additional memory available on multiple processors.

A parfor (parallel for) loop, which is useful in situations that require many loop iterations of a simple calculation, is used. As maxNumCompThreads() controls the parallelism of the multithreading approach, the matlabpool command controls the parallel behavior of the parfor syntax. matlabpool sets up a task-parallel execution environment in which parallel for-loops can be executed interactively from the MATLAB command prompt. The iterations of parfor loops are executed on labs (MATLAB sessions that communicate with each other). As a result, they can run on separate processors connected via a network. In the proposed work we only need to know that Parallel Computing Toolbox makes parfor work efficiently on a single multicore system.

EffThread = Nthread(index);
s = [];
e = 1;
t = 0;
limit = numel(EffThread);
tStart = tic;
while (limit > e)
    for k = 1:CacheMemory
        for i = 1:NrMultiCore
            if (limit <= e)
                break;
            end
            t = EffThread(e);
            e = e + 1;
            if (limit <= e)
                break;
            end
            if (i > 1)
                t1 = EffThread(e);
                if (t == t1)
                    s(k,i) = EffThread(e); % same core
                    pause(.002)
                else
                    s(k,i) = EffThread(e); % other cores
                    pause(.002)
                end
            end
        end
        % pause(.05);
    end
    if (limit <= e)
        break;
    end
    pause(.12)
end
te = toc(tStart);

Table 1. CFTS Algorithm

The main function of the deadline monotonic scheduling phase is to coordinate the execution order of tasks. Based on this, execution priorities are assigned to each task and arranged in descending order, and the tasks are then executed. For CFTS, the implementation assumes a multicore platform consisting of M cores and A cache partitions, and a set of т independent tasks whose numbers of cache partitions (cache space size needed) and WCET (Worst Case Execution Time) Ei are known for the platform. Further, тi = (Ai, Ei, Di, Ti) denotes a task, where Ai is the cache space size, Ei is the worst-case execution time (WCET), Di ≤ Ti is the relative deadline for each release, and Ti is the minimum inter-arrival separation time, also referred to as the period of the task. We further assume that all tasks are ordered by priority, i.e., тi has higher priority than тj iff i < j.
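The wait-free FIFO that links the pipeline stages of the parallel CFTS can be sketched as a bounded single-producer/single-consumer ring buffer, in the spirit of the FastForward queue [9]. The Python fragment below only illustrates the idea (class and variable names are our own; in Python the GIL makes the index updates safe, whereas a real implementation would rely on atomic loads and stores in C): each operation completes in a bounded number of steps and fails fast instead of blocking, which is the wait-free property described in Section III-C.

```python
class WaitFreeSPSCQueue:
    """Bounded single-producer/single-consumer FIFO. push() and pop()
    each touch a bounded number of steps and never block or spin on a
    lock, so producer and consumer cannot delay each other."""

    def __init__(self, capacity):
        self.buf = [None] * (capacity + 1)  # one slot is kept empty
        self.head = 0  # advanced only by the consumer
        self.tail = 0  # advanced only by the producer

    def push(self, item):
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:        # full: report failure, do not wait
            return False
        self.buf[self.tail] = item  # write the slot first,
        self.tail = nxt             # then publish it
        return True

    def pop(self):
        if self.head == self.tail:  # empty: report failure, do not wait
            return None
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        return item

q = WaitFreeSPSCQueue(2)
print(q.push("pkt1"), q.push("pkt2"), q.push("pkt3"))  # True True False
print(q.pop(), q.pop(), q.pop())  # pkt1 pkt2 None
```

Because head and tail are each written by exactly one thread, no lock is needed between the capture stage and the control stage, which is what lets the two pipeline stages of the traffic controller run concurrently.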

(CFTS) schedules are compared. This quantity is referred to as completion time. When running with the static scheduler, the difference between completion times is larger, but when running with the cache fair scheduler, it is significantly smaller. Figure 1 demonstrates normalized completion times with the static scheduler and Figure 2 demonstrates normalized completion times with the Cache Fair Thread scheduler. Figure 3 shows the performance variability of the different types of schedulers with variations in the cache and core sizes for the different applications, and the table shows the average completion time for the different applications with the various schedulers. From the performance analysis and comparison made, it is clear that CFTS-WF provides the maximum speed-up and load balancing among all the scheduling strategies used to obtain a minimum processing time.

Figure 3(a). Performance Variability with core-2, cache-2

Figure 3(b). Performance Variability with core-4, cache-2

Hence CFTS-WF seeks to maximize the use of concurrency by mapping independent tasks onto different threads, to minimize the total completion time by ensuring that processes are available to execute the tasks on the critical path as soon as such tasks become executable, and to minimize interaction among processes by mapping tasks with a high degree of mutual interaction onto the same process.

Figure 3(c). Performance Variability with core-8, cache-4


Figure 3(d). Performance Variability with core-32, cache-4

Algorithm    Core-2, cache-2   Core-4, cache-4   Core-8, cache-4   Core-32, cache-4
DM           5.9               2.6               1.7               1.6
DM-WF        3.8               1.9               1.4               1.5
CFTS         1.9               1.7               2.0               1.7
CFTS-WF      1.3               1.2               1.3               1.4

Table 2. Average completion time (in sec) for the different algorithms

VI. Conclusion

A significant body of the performance modeling research literature has focused on various aspects of the parallel computer scheduling problem and the allocation of computing resources among the parallel jobs submitted for execution. Several classes of scheduling strategies have been proposed for such computing environments, each differing in the way the parallel resources are shared among the jobs. This includes the class of space-sharing strategies that share the processors in space by partitioning them among different parallel jobs, the class of time-sharing strategies that share the processors by rotating them among a set of jobs in time, and the class of scheduling strategies that combine both space-sharing and time-sharing. In this paper, a wait-free based parallel CFTS is implemented for effective and stable traffic control in the cloud. Based on the algorithms for accessing data structures and the usage of a wait-free FIFO, the parallel CFTS can run in a pipelined fashion. The experimental analysis and evaluation results both indicate that the parallel CFTS-WF is more suitable for cloud computing, due to its excellent performance on both line rate and stability. Moreover, a parallel network application based on multicore processors is cheaper and more scalable than special hardware, and we explore its more effective usage in cloud computing. This scheduler is a good solution to achieve the best VM (Virtual Machine) environment for running applications under Cloud Computing environments.
In future, CFTS-WF can be executed on a cluster with more processor scalability to allocate and deallocate cores in the VM in heterogeneous communication and computation environments.

The experiment is again designed to test whether the wait-free based CFTS could be competent for the cloud scenario described in Section 3 under extremely high traffic pressure. Figure 4 and Figure 5 show the output rate sampled at a certain time interval for both the total CFTS traffic and a randomly selected user. To make the results more accurate, different time intervals for total traffic and user traffic are used, because their rates are at different levels. It can be seen that the wait-free FIFO based CFTS-WF performs traffic control quite stably, and the resulting rate lines are nearly smooth, indicating that the traffic rates are accurately retained. The stability is important for cloud computing because it might face tens of thousands of users at the same time.

Figure 4. Output traffic rate of total traffic

Figure 5. Output traffic rate of a selected user

References

[1] Agarwal, Anant; Miller, Jason; Beckmann, Nathan; Wentzlaff, David, "Core Count vs Cache Size for Manycore Architectures in the Cloud", CSAIL Technical Reports, Volume 6568/2011, pp. 39-50.
[2] Ana Madureira and Ivo Pereira, "Intelligent Bio-Inspired System for Manufacturing Scheduling under Uncertainties", International Journal of Computer Information Systems and Industrial Management Applications, Volume 3 (2011), pp. 072-079.
[3] K. Asanovic and others, "The landscape of parallel computing research: A view from Berkeley", Technical Report UCB/EECS-2006-183, EECS Department, University of California, Berkeley, December 2006.
[4] R. Buyya, "Market-Oriented Cloud Computing: Vision, Hype, and Reality of Delivering Computing as the 5th Utility", Proc. of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'09), Shanghai, China, May 2009, pp. 1-6, doi: 10.1109/CCGRID.2009.97.
[5] M. Christodorescu, R. Sailer, D. L. Schales, D. Sgandurra and D. Zamboni, "Cloud Security Is Not (Just) Virtualization Security", Proc. of the ACM Workshop on Cloud Computing Security 2009, Chicago, IL, USA, 2009, pp. 97-102, doi: 10.1145/1655008.1655022.
[6] Denis R. Ogura, Edson T. Midorikawa, "Characterization of Scientific and Transactional Applications under Multicore Architectures on Cloud Computing Environment", IEEE International Conference on Computational Science and Engineering, pp. 314-320, 2010.
[7] Ekanayake, J.; Gunarathne, T.; Qiu, J., "Cloud Technologies for Bioinformatics Applications", IEEE Transactions on Parallel and Distributed Systems, Volume 22, pp. 998-1011.
[8] K. Fraser and T. Harris, "Concurrent programming without locks", ACM Transactions on Computer Systems, vol. 25 (2), May 2007.
[9] J. Giacomoni, T. Moseley, and M. Vachharajani, "FastForward for efficient pipeline parallelism: A cache-optimized concurrent lock-free queue", Proc. of PPoPP'08, New York, NY, USA, February 2008, pp. 43-52.
[10] J. Heiser and M. Nicolett, "Assessing the Security Risks of Cloud Computing", Gartner Inc., Stamford, CT, 2008, http://www.gartner.com/.
[11] Ishfaq Ahmad and Sanjay Ranka, "Using Game Theory for Scheduling Tasks on Multi-Core Processors for Simultaneous Optimization of Performance and Energy", 2008, pp. 1-6.
[12] F. J. Krautheim, "Private Virtual Infrastructure for Cloud Computing", in HotCloud'09, San Diego, CA, USA, June 2009.
[13] S. Kim, D. Chandra and Y. Solihin, "Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture", in Intl. Conference on Parallel Architectures and Compilation Techniques (PACT), 2004, pp. 111-122.
[14] C. Leiserson and I. Mirman, "How to Survive the Multicore Software Revolution", Cilk Arts, 2009.
[15] Lizhe Wang, Jie Tao, Gregor von Laszewski, Holger Marten, "Multicores in Cloud Computing: Research Challenges for Applications", Journal of Computers, vol. 5, no. 6, June 2010.
[16] Mikyung Kang, Dong-In Kang, Stephen P. Crago, Gyung-Leen Park and Junghoon Lee, "Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing", Sensors 2011, pp. 3595-3610.
[17] D. Patterson, "The Trouble with Multicore", IEEE Spectrum, 47(7), pp. 28-32, 2010.
[18] A. S. Radhamani and E. Baburaj, "Implementation of Cache Fair Thread Scheduling for multi core processors using wait free data structures in cloud computing applications", Proc. of the 2011 World Congress on Information and Communication Technologies, IEEE, 2011, pp. 604-609.
[19] T. Tartalja and V. Milutinovich, "The Cache Coherence Problem in Shared-Memory Multiprocessors: Software Solutions", ISBN: 9780818670961, 1996.
[20] D. Wentzlaff and A. Agarwal, "Factored Operating Systems (FOS): the Case for a Scalable Operating System for Multicores", ACM SIGOPS Operating Systems Review, 43(2): 76-85, 2009.
[21] J. Wei, X. Zhang, G. Ammons, V. Bala, and P. Ning, "Managing security of virtual machine images in a cloud environment", Proc. of the ACM Workshop on Cloud Computing Security 2009, Chicago, IL, USA, 2009, pp. 91-96, doi: 10.1145/1655008.1655021.
[22] M. Yildiz, J. Abawajy, T. Ercan, and A. Bernoth, "A Layered Security Approach for Cloud Computing Infrastructure", Proc. of the 10th International Symposium on Pervasive Systems, Algorithms, and Networks, 2009, pp. 763-767, doi: 10.1109/ISPAN.2009.157.
[23] Zheng Li, Nenghai Yu, Zhuo Hao, "A Novel Parallel Traffic Control Mechanism for Cloud Computing", 2nd IEEE International Conference on Cloud Computing Technology and Science, 2010, pp. 376-382.

Author Biographies

A. S. Radhamani received her Under Graduate degree in Electronics and Communication Engineering from Bharathiyar University and her Master's Degree in Computer Science and Engineering from Manonmaniam Sundaranar University in 1995 and 2004 respectively. She is the author of six publications. She is currently pursuing her PhD Degree at MSU, Tirunelveli. She is an active researcher and her research interests include multi-core computing, parallel and distributed processing and cloud computing.

E. Baburaj received his Under Graduate and Post Graduate degrees in Computer Science and Engineering from Madurai Kamaraj University. He obtained his Doctoral Degree in Computer Science and Engineering from Anna University Chennai. Currently, he is the Dean of PG Studies and Research of the Computer Science and Engineering Department, SCET, Nagercoil. His main research focuses on High Performance Computing and Computer Networks. More than 20 publications are credited to his name.