
EFFECTS OF SYSTEM PAGING / THRASHING OVERHEAD ON SYSTEM PERFORMANCE BASED ON DEGREE OF MULTIPROGRAMMING

Akhigbe–Mudu Thursday Ehis

Department of Computer Science, Federal University of Agriculture Abeokuta, Nigeria

ABSTRACT: There is usually a strong concern about the functionality of computer systems and very little concern about their performance. In general, performance becomes an issue only when it is perceived as poor by users. Major constraints on systems show themselves in the form of external symptoms, with stress conditions and paging being the chief forms. Thrashing occurs when a computer's virtual memory subsystem is in a constant state of paging, rapidly exchanging data in memory for data on disk, to the exclusion of most application processing. This causes the performance of the system to degrade by multiple orders of magnitude. We developed analytical models with computational algorithms that provide the values of the desired performance measures. Lyapunov stability criteria provide sufficient conditions for the validity of the system; the result shows that the savings generated from the page-based approach offset the additional time needed to process the increased number of page faults.
KEYWORDS: Paging, Thrashing, Multiprogramming, Interfault, Frames, Overhead

1. INTRODUCTION

Paging, from the layman's point of view, is a technique used by a virtual memory operating system to help ensure that the data you need is available as quickly as possible. When a program needs a page that is not in main memory, the operating system copies the required page into memory and copies another page back to the disk. As this process continues, a page fault occurs when the address of the page being requested is invalid, and in that case the application is usually aborted. It is the page-in rate that is of primary concern, because page-in activity occurs synchronously [MAD94]. A page-in incurs a time cost for the physical I/O and a more significant increase in processor usage. Page-in activity slows down the rate at which transactions flow through the system; that is, transactions take longer to get through the system. If a system does not have enough pages, thrashing, a state of high paging activity, sets in and the page-fault rate is high. This leads to low CPU utilization. In modern computers, thrashing may occur in the paging system (if there is not sufficient physical memory or the disk access time is overly long) or in the communications system (especially in conflicts over internal bus access). Depending on the configuration and algorithms involved, the throughput and latency of a system may degrade by multiple orders of magnitude as a result of excessive paging [MA82]. In a virtual memory system, thrashing may be caused by programs or workloads that present insufficient locality: if the working set of a program or a workload cannot be held efficiently within physical memory, then constant data swapping, i.e. thrashing, may occur. Hence, the performance evaluation of computer systems and networks has become a permanent concern of all who use them. The battle cry for optimal system performance therefore resonates with a growing and widespread concern to improve the performance of computer systems; failure to do so today will amount to inviting catastrophic system failure tomorrow.

1.1. Identifying System Constraints

Major constraints on a system show themselves in the form of external symptoms, with stress conditions and paging being the chief forms. The fundamental thing that has to be understood is that practically every symptom of poor performance arises in a system that is congested. For example, if there is a slowdown in a system, transactions doing data set activity pile up; there are waits on strings; there are more transactions in the system and therefore greater virtual storage demand; there is paging; and because there are more transactions in the system, the task dispatcher uses more processor power scanning the task chains. We then have task constraints; the transaction class limit is exceeded and adds to the processor overhead because of retries, and so on. The result of all this is that the system shows heavy use of its resources, and this is the typical picture of system stress. Thrashing occurs when a computer's virtual memory subsystem is in a constant state of paging, rapidly exchanging data in memory for data on disk, to the exclusion of most application-level processing. This causes the performance of the system to degrade or collapse.

1.2. Memory Queuing

Multiprogramming is a central part of modern operating systems. It allows several programs (jobs), transactions, processes and requests to be memory resident at the same time, sharing the system resources, such as processors and I/O devices.

Most programs do not overlap their I/O and processing activities, so that while one program is making use of the processor, others are undergoing I/O processing. Therefore, multiprogramming leads to a better use of the system resources, by keeping I/O devices running concurrently with processors. However, due to the finiteness of real memory, the number of programs in execution may affect system performance. For example, in virtual memory systems, as the multiprogramming level increases, the paging and swapping activities also increase and may cause a reduction in the throughput of productive work. Thus, there must be control of the maximum number of programs allowed to share memory at the same time. Usually, operating systems control the overall load by acting on the degree of multiprogramming.

1.3. Problem Description

Paging moves individual pages of processes from main memory to disk and back, and as a result generates I/O traffic. Each time a page is needed that is not currently in memory, a page fault occurs. An invalid page fault occurs when the address of the page being requested is invalid; in this case, the application is usually aborted, and, depending on the configuration and algorithms involved, the throughput and latency of a system may degrade by multiple orders of magnitude. The intensity of paging activity is measured by page-ins (pages moved from disk to memory) and page-outs (pages moved from memory to disk), as exhibited in figure 1. A page-in incurs a time cost for the physical I/O and a more significant increase in processor usage, since the process has to wait until the page transfer completes. Most present-day operating systems use demand paging, which means that a page of code or data is not read into memory until the page is needed. In an overcommitted situation, where the number of pages actively in use exceeds the number of pages of physical memory, it becomes necessary to select pages that will be removed from physical memory in order to make room for new pages. When the paging activity starts to increase, the load on the paging device increases, and the paging device eventually becomes the bottleneck. The throughput decreases and eventually goes to zero (thrashing); as such, paging activity should be included in performance models of systems. The rest of this paper is structured in the following manner: section 2 discusses the literature review; section 3 focuses on the methodology of the work; section 4 analyzes the paging activities, implements the method discussed in the body of the paper and evaluates the results using the Lyapunov stability criteria; section 5 presents the summary and conclusion of the paper.

Figure 1: System Paging Activities (address spaces and virtual memory mapped onto physical memory and the disk drive)
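
As a concrete illustration of the demand-paging behaviour described above, the short Python sketch below counts page faults for a memory reference string under a fixed frame allocation and an LRU replacement policy. It is only an illustrative example of ours; the paper does not prescribe a particular replacement policy, and the reference string and frame counts are invented.

```python
from collections import OrderedDict

def count_page_faults(reference_string, num_frames):
    """Count page faults for a reference string under LRU replacement.
    (Illustrative only; the paper does not prescribe a replacement policy.)"""
    frames = OrderedDict()                   # page -> None, ordered by recency of use
    faults = 0
    for page in reference_string:
        if page in frames:
            frames.move_to_end(page)         # hit: refresh recency
        else:
            faults += 1                      # page fault: page not resident
            if len(frames) >= num_frames:
                frames.popitem(last=False)   # evict the least recently used page
            frames[page] = None
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
for f in (3, 4, 5):
    print(f, count_page_faults(refs, f))     # fewer frames -> more faults (usually)
```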

2. LITERATURE REVIEW

Poor performance of the operating system can have a considerable impact on application performance. To keep the operating system from limiting application performance, it must be kept highly current. In the early 1990s, researchers identified memory performance as critical to system performance. [KGP89] noted that cache performance and coherency are critical aspects of hardware which must be taken into account. They explicitly advocated that operating systems must be optimized to meet the demands of users for high performance. They pointed out that operating systems are large and complex, that the optimization task is difficult and that, without care, tuning can result in increased complexity with little impact on the end-user's performance. The drawback here was that the key to focused optimization is identifying performance problems. [WW00]'s implementation and analysis of page-based compression in the OS/2 operating system yielded two results with respect to memory. First, an OS with page-based compression uses less virtual memory than an operating system without page-based compression. In an unconstrained environment, the system with page-based compression will generate fewer page faults than the system without page-based compression, which translates into faster performance. Secondly, a system with page-based compression requires more locked (wired) memory than a system without page-based compression,

which effectively reduces the amount of memory available to the rest of the system. This will produce no notable effect in an unconstrained environment; however, a small reduction in the amount of available memory will cause an increase in the number of page faults. We also found that the time saved due to compression offsets the additional time caused by the increase in the number of page faults. Though page-based compression improves the performance of a demand paging operating system, it reduces the amount of memory available to the rest of the system, which does not support our effort. [BD06] used the number of queries being executed concurrently as a measure of multiprogramming level, and chose to define the multiprogramming level as the number of queries in any phase of execution. The drawback here is that data sharing is based on the observation that, in certain database applications, multiple queries may be accessing the same relations concurrently, and there is a high probability that index pages will be accessed by these applications. [AEP11] proposed a tool for planning and control, since costing systems play a considerable role in providing the information needs of managers. That research explores such organizational factors as organization size, industry type, cost structure, the importance of cost information, and products and services. The study fails to incorporate other important variables which are likely to influence cost system design.

2.1. System Thrashing Overhead

Overhead denotes hardware resources used by the operating system. It can be viewed as composed of two parts: a constant one and a variable one. The former corresponds to those activities performed by an OS that do not depend on the level of system load, such as the CPU time to handle an I/O interrupt. The variable component of overhead stems from activities that are closely associated with the system load [ZNQ93]. For example, as the number of jobs in memory increases, the work done by the memory management modules also increases (thrashing) [MAD94]. Basically, there are two approaches for representing overhead in performance models. One approach uses a special class of the model for representing the overhead of the OS activities performed on behalf of the application programs. There are problems associated with this approach. Because of its variable component, the service demands of the special class have to be made load-dependent. Thus, whenever the intensity parameters (e.g. multiprogramming level and arrival rate) of the application classes change, the service demands of the overhead class also have to be modified. The interdependency between overhead parameters and the multiprogramming mix may make this approach impractical [MLT12].

3. METHODOLOGY

Most modern computer systems have paging disks that hold the pages of the virtual address spaces of the several processes. As a process's page fault rate (number of page faults divided by the number of memory references) increases, a greater portion of the process execution time is spent on the paging device. The load on the paging device affects all processes in execution. We shall show how the paging activity can be taken into account when modeling a computer system.

3.1. Modeling Paging Activity

Let pag be the index of the paging device. Because this device can also be used to store files other than process address spaces, a process's service demand at the paging device may be decomposed into two components: one due to paging (D_{pag}^{P}) and another not due to paging (D_{pag}^{NP}), which is a given parameter:

D_{pag} = D_{pag}^{NP} + D_{pag}^{P}    (1)

The problem here is how to estimate D_{pag}^{P}. The average service demand at the paging device due to paging is equal to the average number of page faults multiplied by the average disk time to service a page fault (S_{pag}). Hence

D_{pag}^{P} = (average number of page faults) \times S_{pag}    (2)

Since the average service time per page fault can easily be computed as a function of the disk's physical characteristics and the size of a virtual page, our problem reduces to estimating the average number of page faults per process. Let f be the number of page frames allocated to a process in main memory. Let IFT(f) be a function that gives the inter-fault time of a process when f page frames are allocated to the process. The inter-fault time is defined as the average time between consecutive page faults. So, if we divide the total CPU time of a process (D_{cpu}) by the value of the IFT(f) function, we obtain the average number of page faults. The graph of the IFT function is monotonically increasing, as shown in figure 2.
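
As a quick numerical illustration of Eqs. (1) and (2), the sketch below computes the paging component of the service demand from an assumed interfault time; the variable names and sample values are ours and purely illustrative.

```python
d_cpu = 0.5          # total CPU time of the process (s), assumed
ift_f = 0.002        # interfault time IFT(f) for the current allocation (s), assumed
s_pag = 0.03         # average disk time to service a page fault (s)
d_pag_np = 0.01      # non-paging demand at the paging device (s), assumed

avg_page_faults = d_cpu / ift_f            # average number of page faults
d_pag_p = avg_page_faults * s_pag          # Eq. (2)
d_pag = d_pag_np + d_pag_p                 # Eq. (1)
print(avg_page_faults, d_pag_p, d_pag)     # 250 faults, 7.5 s, 7.51 s
```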

Figure 2: Interfault Time versus Number of Page Frames (interfault time IFT(f) on the vertical axis, number of page frames on the horizontal axis)

Initially, the interfault time grows very fast as the process acquires more page frames. However, as the number of page frames approaches the size of the process working set, the rate of increase of the interfault time decreases sharply and tends to zero as the number of page frames allocated to the process approaches its total number of pages. At this point, there are no page faults and the interfault time becomes equal to the total CPU time of the process. A possible expression for such a function is:

IFT(f) = D_{cpu} / [1 + (a/f)^2 - (a/F)^2]    (3)

where F is the number of virtual pages that compose a process address space and the constant a may be obtained from measurement data. Let NP be the total number of page frames to be divided among all processes and let n be the multiprogramming level. Assuming that the page frames are equally divided among all processes, each will be allocated a number of frames f given by NP/n. Hence, the service demand due to paging can be written as:

D_{pag}^{P} = [D_{cpu} / IFT(NP/n)] \times S_{pag}    (4)

Using Eq. (3) in Eq. (4), we get:

D_{pag}^{P} = [1 + (a/(NP/n))^2 - (a/F)^2] \times S_{pag}    (5)

Therefore, using Eq. (5) in Eq. (1), we get:

D_{pag} = D_{pag}^{NP} + [1 + (a/(NP/n))^2 - (a/F)^2] \times S_{pag}    (6)

Note that Eq. (6) is valid only when the degree of multiprogramming is such that there are not enough page frames to store the complete address spaces of all processes (i.e. when n \times F > NP). Otherwise, there are no page faults and the paging device service demand due to paging activity is zero. We can now use Eq. (6) to adjust the residence time recurrence expression for the paging device in the algorithm given in figure 3:

R'_{pag}(n) = { D_{pag}^{NP} + [1 + (a/(NP/n))^2 - (a/F)^2] \times S_{pag} } \times [1 + \bar{n}_{pag}(n-1)]    (7)
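
To make the use of Eqs. (3) through (6) concrete, the following sketch evaluates the interfault time and the paging-device demand for a given multiprogramming level. It is a minimal illustration under our own naming; the sample values are the ones used later in the implementation example of section 4.

```python
def ift(f, a, F, d_cpu):
    """Eq. (3): interfault time of a process holding f page frames."""
    return d_cpu / (1.0 + (a / f) ** 2 - (a / F) ** 2)

def paging_demand(n, a, F, NP, s_pag, d_pag_np):
    """Eq. (6): service demand at the paging device for multiprogramming
    level n (non-paging component plus the paging component of Eq. (5))."""
    if n * F <= NP:              # all address spaces fit in memory: no page faults
        return d_pag_np
    f = NP / n                   # page frames per process (equal partition)
    d_pag_p = (1.0 + (a / f) ** 2 - (a / F) ** 2) * s_pag    # Eq. (5)
    return d_pag_np + d_pag_p

# Values from the example in section 4: a = 4000, F = 2048 pages,
# NP = 10,240 frames, S_pag = 0.03 s, D_pag^NP = 0, D_cpu = 0.02 s.
print(round(ift(10240 / 8, 4000, 2048, 0.02), 6))   # interfault time at n = 8
for n in range(1, 11):
    print(n, round(paging_demand(n, 4000, 2048, 10240, 0.03, 0.0), 4))
```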

3.2. Throughput Estimation

We see that to compute the residence times R'_{i,r}(N) we need the values of the probabilities P_i(j-1 | N). To compute these probabilities we need the values of the throughputs X_{0,r}(N), which in turn depend on the residence time values. We propose the following iterative approach:
1. Estimate initial values for the throughputs X_{0,r}(N). These estimates can be obtained by approximating the throughput by its asymptotic upper bound, namely X_{0,r} \le \min( N_r / \sum_{i=1}^{k} D_{i,r} , 1 / \max_i D_{i,r} ). Although this is a rather loose upper bound, it is usually good enough as a starting point for the iteration.
2. Compute the probabilities P_i(j-1 | N).
3. Compute the residence times.
4. Compute new values for the throughputs using Little's result, X_{0,r}(N) = N_r / \sum_{i=1}^{k} R'_{i,r}(N).
5. If the relative error between the throughputs obtained in the current iteration and those of the previous one is greater than a certain tolerance ε, go to step (2). Otherwise, compute the final metrics and stop.
This approach is specified in detail in the form of an algorithm in figure 3.

3.3. Algorithms

Input parameters: D_{i,r}, S_i, N, k, R, ε, α_i(j)

Initialization:
  for r := 1 to R do
    for i := 1 to k do if D_{i,r} > 0 then \bar{n}_{i,r}(N) := N_r / K_r   { K_r = number of devices with D_{i,r} > 0 }
  for r := 1 to R do X_{0,r}^{prev} := \min( N_r / \sum_{i=1}^{k} D_{i,r} , 1 / \max_i D_{i,r} )
  Error := ε + 1;   { force entering the loop for the first time }

while Error > ε do
begin
  { compute queue lengths for non-LD devices }
  for all non-LD devices i do
    for r := 1 to R do
      \bar{n}_i(N - 1_r) := [(N_r - 1) / N_r] \bar{n}_{i,r}(N) + \sum_{j \ne r} \bar{n}_{i,j}(N)
  { compute probabilities for LD devices }
  for all LD devices i do
    P_i(0 | N) := [ 1 + \sum_{j=1}^{N} \prod_{k=1}^{j} ( \sum_{r=1}^{R} D_{i,r} X_{0,r}^{prev}(N) ) / α_i(k) ]^{-1}
    P_i(j | N) := P_i(0 | N) \prod_{k=1}^{j} ( \sum_{r=1}^{R} D_{i,r} X_{0,r}^{prev}(N) ) / α_i(k) ,   j = 1, ..., N
  { compute residence times }
  for i := 1 to k do
    for r := 1 to R do
      R'_{i,r}(N) := D_{i,r} [ 1 + \bar{n}_i(N - 1_r) ]                      { non-LD devices }
      R'_{i,r}(N) := D_{i,r} \sum_{j=1}^{N} [ j / α_i(j) ] P_i(j-1 | N)      { LD devices }
  { compute throughputs }
  for r := 1 to R do X_{0,r}^{curr}(N) := N_r / \sum_{i=1}^{k} R'_{i,r}(N)
  { compute queue lengths per class }
  for all non-LD devices i do
    for r := 1 to R do \bar{n}_{i,r}(N) := X_{0,r}^{curr}(N) \times R'_{i,r}(N)
  { compute relative error }
  Error := \max_r | X_{0,r}^{curr}(N) - X_{0,r}^{prev}(N) | / X_{0,r}^{curr}(N)
  { prepare for next iteration }
  for r := 1 to R do X_{0,r}^{prev}(N) := X_{0,r}^{curr}(N)
end;   { while iteration loop }

Figure 3: Multiple Class Algorithms
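
For readers who prefer executable code, the sketch below implements the fixed-point structure of this algorithm for queuing (non-load-dependent) devices only, using a Bard-Schweitzer style estimate of \bar{n}_i(N - 1_r); the load-dependent probability computation needed for the paging device is omitted for brevity, and the function and variable names are ours rather than the paper's.

```python
def approx_mva(D, N_pop, eps=1e-4, max_iter=1000):
    """Approximate multiclass MVA for queuing devices only (a simplified sketch).
    D[i][r]  : service demand of class r at device i (seconds)
    N_pop[r] : population of class r
    Returns per-class system throughputs X[r]."""
    k, R = len(D), len(D[0])
    # initial queue lengths: spread each class evenly over the devices it uses
    devices_used = [sum(1 for i in range(k) if D[i][r] > 0) for r in range(R)]
    n = [[(N_pop[r] / devices_used[r] if D[i][r] > 0 else 0.0)
          for r in range(R)] for i in range(k)]
    # initial throughputs from the asymptotic upper bound (step 1 of section 3.2)
    X_prev = [min(N_pop[r] / sum(D[i][r] for i in range(k)),
                  1.0 / max(D[i][r] for i in range(k))) for r in range(R)]
    for _ in range(max_iter):
        # residence times with the estimate of n_i(N - 1_r)
        Rp = [[0.0] * R for _ in range(k)]
        for i in range(k):
            for r in range(R):
                n_minus = ((N_pop[r] - 1) / N_pop[r]) * n[i][r] + \
                          sum(n[i][j] for j in range(R) if j != r)
                Rp[i][r] = D[i][r] * (1.0 + n_minus)
        # Little's law for the new throughputs (step 4)
        X_curr = [N_pop[r] / sum(Rp[i][r] for i in range(k)) for r in range(R)]
        # update queue lengths per class
        for i in range(k):
            for r in range(R):
                n[i][r] = X_curr[r] * Rp[i][r]
        err = max(abs(X_curr[r] - X_prev[r]) / X_curr[r] for r in range(R))
        X_prev = X_curr
        if err <= eps:      # step 5: stop when the relative error is small enough
            break
    return X_prev

# two devices (CPU and disk), one class of 4 customers, demands as assumed values
print(approx_mva([[0.02], [0.06]], [4]))
```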


4. IMPLEMENTATION

Example: A computer system has one CPU and two disks (disk D1 and disk D2); disk D2 is used for paging purposes only. The service demands at the CPU and at disk D1 are 0.02 and 0.06 seconds respectively. The paging disk has an average service time equal to 30 msec. The system has 10 Mbytes of memory available for paging. The size of a page frame is equal to 1024 bytes. The address space of each process is equal to 2 Mbytes. The constant a for the interfault time function is assumed to be equal to 4000. Plot a graph of the throughput versus the multiprogramming level.

We need to compute the model parameters from the problem data. The number of virtual pages per address space, F, is equal to 2 \times 1024 \times 1024 / 1024 = 2048 pages. The total number of page frames is equal to NP = 10 \times 1024 \times 1024 / 1024 = 10,240 frames. Also, D_{D2}^{NP} = 0, since the paging disk is used for paging only, and S_{pag} = 0.03 sec. We can now use the algorithm and compute the throughput for various values of the degree of multiprogramming n. We need to use Eq. (7) for the residence time of the paging disk. For the other devices we use the normal residence time equation. In this example, the residence time equation for the paging disk becomes:

R'_{D2}(n) = [ 1 + (4000 n / 10,240)^2 - (4000 / 2048)^2 ] \times 0.03 \times [ 1 + \bar{n}_{pag}(n-1) ]
           = (1 + 0.153 n^2 - 3.81) \times 0.03 \times [ 1 + \bar{n}_{pag}(n-1) ]    for n > 10,240 / 2048 = 5

R'_{D2}(n) = 0    for n \le 5,   n = 1, ..., N

The resulting throughput graph is given in figure 4. It predicts the expected result, namely that at the beginning the throughput increases with the degree of multiprogramming. After the paging activity starts to increase, the load on the paging device increases and the paging device eventually becomes the bottleneck. The throughput decreases and eventually goes to zero. This phenomenon is called thrashing.

Figure 4: Throughput (jobs/sec) versus Multiprogramming Level
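
A rough way to reproduce the shape of figure 4 is to combine the load-dependent paging demand of Eq. (6) with a single-class MVA recursion over the multiprogramming level, as sketched below for the CPU, disk D1 and paging disk D2 of this example. This is a simplified reconstruction under our own naming, not the paper's code, and the exact numbers depend on the approximations used; qualitatively, the throughput rises, peaks, and then collapses as paging grows.

```python
def throughput_vs_mpl(N_max, a=4000, F=2048, NP=10240,
                      d_cpu=0.02, d_disk1=0.06, s_pag=0.03):
    """Single-class MVA in which the paging-disk demand follows Eq. (6)/(7)."""
    q = [0.0, 0.0, 0.0]          # queue lengths at CPU, disk D1, paging disk D2
    results = []
    for n in range(1, N_max + 1):
        # load-dependent demand of the paging disk at this multiprogramming level
        if n * F > NP:
            d_pag = (1.0 + (a * n / NP) ** 2 - (a / F) ** 2) * s_pag
        else:
            d_pag = 0.0          # all address spaces fit in memory: no paging
        demands = [d_cpu, d_disk1, d_pag]
        r = [d * (1.0 + q[i]) for i, d in enumerate(demands)]   # residence times
        x = n / sum(r)                                           # throughput (jobs/s)
        q = [x * ri for ri in r]                                 # new queue lengths
        results.append((n, x))
    return results

for n, x in throughput_vs_mpl(12):
    print(f"n={n:2d}  X={x:6.2f} jobs/s")
```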


4.1. System Evaluation

For a given system, stability is usually the most important property to be determined. The Lyapunov stability criterion is the most general method for the determination of the stability of non-linear and/or time-varying systems [B+06]. Lyapunov presented two methods (the first and second methods) for determining the stability of dynamic systems described by ordinary differential equations. The first method comprises all procedures in which the explicit form of the solutions of the differential equations is used for the analysis. The second method does not require the solutions of the differential equations. Therefore, the second method is quite convenient for the stability analysis of non-linear systems, for which the exact solution may not be obtainable, because solving non-linear and/or time-varying state equations is usually very difficult. Lyapunov provides a sufficient condition for asymptotic stability, and the theorem is stated as follows.

4.2. Theorem

Suppose that a system is described by

\dot{x} = f(x, t),   where f(0, t) = 0 for all t.

If there exists a scalar function v(x, t) having continuous first partial derivatives and satisfying the following conditions:
1. v(x, t) is positive definite,
2. \dot{v}(x, t) is negative definite,
then the equilibrium state at the origin is uniformly asymptotically stable.

We shall consider the second-order system described in the paper by

[\dot{x}_1 ; \dot{x}_2] = [[0, 1] ; [-1, -1]] [x_1 ; x_2]

and we shall determine the stability of this state. Let us assume a tentative Lyapunov function v(x) = x'Px, where P is to be determined from A'P + PA = -I, or

[[0, -1] ; [1, -1]] [[p_{11}, p_{12}] ; [p_{21}, p_{22}]] + [[p_{11}, p_{12}] ; [p_{21}, p_{22}]] [[0, 1] ; [-1, -1]] = [[-1, 0] ; [0, -1]]

By expanding this matrix equation, we obtain three simultaneous equations as follows:

-2 p_{12} = -1
p_{11} - p_{12} - p_{22} = 0
2 p_{12} - 2 p_{22} = -1

Solving for p_{11}, p_{12}, p_{22} we obtain

[[p_{11}, p_{12}] ; [p_{21}, p_{22}]] = [[3/2, 1/2] ; [1/2, 1]]

To test the positive definiteness of P, we check the determinants of the successive principal minors. Q = -(A'P + PA) = I is positive definite, the first principal minor of P is p_{11} = 3/2 > 0, and

det [[p_{11}, p_{12}] ; [p_{21}, p_{22}]] = (3/2)(1) - (1/2)^2 = 5/4 > 0.

Clearly, P is positive definite. Hence the equilibrium state at the origin is asymptotically stable in the large, and the Lyapunov function is

v(x) = x'Px = (1/2)(3 x_1^2 + 2 x_1 x_2 + 2 x_2^2),
\dot{v}(x) = -(x_1^2 + x_2^2).

If the page fault rate falls below the computed range, the process loses a frame, and if the page fault rate gets too high, the process gains a frame (see figure 5), and this results in fewer page faults. Therefore, our approach has been able to evaluate the effectiveness of the analytical model, which is composed of a set of formulas and/or computational algorithms. Interestingly, we find that the savings generated from the page-based approach offset the additional time needed to process the increased number of page faults.
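
The small numpy check below reproduces this computation: it solves A'P + PA = -I for the symmetric matrix P and then confirms that P is positive definite. It is a verification sketch only, not part of the original derivation; the variable names are arbitrary.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [-1.0, -1.0]])

# Solve A'P + PA = -I for symmetric P = [[p11, p12], [p12, p22]].
# Expanding gives the three simultaneous equations used in the text.
M = np.array([[0.0, -2.0, 0.0],    # -2*p12            = -1
              [1.0, -1.0, -1.0],   #  p11 - p12 - p22  =  0
              [0.0, 2.0, -2.0]])   #  2*p12 - 2*p22    = -1
p11, p12, p22 = np.linalg.solve(M, np.array([-1.0, 0.0, -1.0]))
P = np.array([[p11, p12], [p12, p22]])

print(P)                                           # [[1.5, 0.5], [0.5, 1.0]]
print(np.allclose(A.T @ P + P @ A, -np.eye(2)))    # True: Lyapunov equation holds
print(np.all(np.linalg.eigvalsh(P) > 0))           # True: P is positive definite
```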


Figure 5: Page Fault Frequency (number of page faults versus number of frames)
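
The frame-adjustment rule described in section 4.1 (a process gains a frame when its page-fault rate is too high and loses one when the rate falls below the acceptable range) can be sketched as follows; the threshold values and the function name are illustrative assumptions, not taken from the paper.

```python
def adjust_frames(frames, fault_rate, lower=2.0, upper=10.0, min_frames=1):
    """Page-fault-frequency style control: keep the fault rate (faults/s)
    inside [lower, upper] by adding or removing one frame at a time."""
    if fault_rate > upper:
        return frames + 1                    # too many faults: grant a frame
    if fault_rate < lower:
        return max(min_frames, frames - 1)   # few faults: reclaim a frame
    return frames                            # within the acceptable range

print(adjust_frames(50, 15.0))   # 51
print(adjust_frames(50, 0.5))    # 49
```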

5. CONCLUSION

When a program needs a page that is not in main memory, the operating system copies the required page into memory and copies another page back to the disk; this is referred to as paging. It is the page-in rate that is of primary concern, because a page-in incurs a time cost for the physical I/O and a more significant increase in processor usage. Depending on the configuration and algorithms involved, the throughput and latency of a system may degrade by multiple orders of magnitude. Hence, the battle cry resonates with a growing and widespread concern for effective system performance. In this paper, an analytical model composed of a set of formulas and computational algorithms provides the values of the desired performance measures as a function of the set of values of the performance parameters. The Lyapunov stability criterion provides a sufficient condition for the validity of the system. Interestingly, the result shows that the savings generated from the page-based approach offset the additional time needed to process the increased number of page faults, even in a memory-constrained environment.

Terminologies

The term workload designates all the processing requests submitted to a system by the user community during any given period of time.
Throughput is the number of jobs that can be completed per unit time.
Residence time is the time spent by the job once it is loaded into memory.
The measure of load is given by the average number of batch jobs that are concurrently in execution; this number is called the average degree of multiprogramming and is denoted by N.

Notations

K = number of devices
R = number of classes of customers
N_r = class r population
S_{i,r} = average service time of class r customers at device i
D_{i,r} = average service demand of class r customers at device i; D_{i,r} = V_{i,r} S_{i,r}
R_{i,r} = average response time per visit of class r customers to device i
R'_{i,r} = average residence time of class r customers at device i
\bar{n}_{i,r} = average number of class r customers at device i
X_{i,r} = class r throughput at device i

REFERENCES

[AEP11] Taha Ahmadzadeh, Hossein Etemadi, Ahmed Pifeh - Exploration of Factors Influencing on Choice of the Activity-Based Costing System in Iranian Organizations, International Journal of Business Administration, Vol. 2, No. 1, 2011.
[BD06] Haran Boral, David J. DeWitt - A Methodology for Database System Performance Evaluation, Computer Science Department, University of Wisconsin-Madison, 2006.
[B+06] Qi Bi, Pi-Chun Chen, Yang Yang, Qinqing Zhang - An Analysis of VoIP Service Using 1xEV-DO Revision A System, IEEE Journal on Selected Areas in Communications, Vol. 24, No. 1, pages 36-45, 2006.
[KGP89] R. Katz, G. Gibson, D. Patterson - Disk System Architectures for High Performance Computing, Proceedings of the IEEE, Vol. 77, No. 12, Dec. 1989.
[MA82] D. A. Menasce, V. A. Almeida - Operational Analysis of Multiclass Systems with Variable Multiprogramming Level and Memory Queuing, Computer Performance, Vol. 3, No. 3, Sept. 1982.
[MAD94] Daniel Menasce, Virgilio A. F. Almeida, Larry W. Dowdy - Capacity Planning and Performance Modeling, Prentice Hall, A Simon and Schuster Company, Englewood Cliffs, New Jersey 07632, pages 64-210, 1994.
[MLT12] Heng Ngee Mok, Yeow Leong Lee, Wee Kiat Tan - Setting up a Low-Cost Lab Management System for a Multipurpose Computing Lab Using Virtualization Technology, Australasian Journal of Educational Technology, 28(2), pp. 266-278, 2012.
[WW00] Allen Wynn, Jie Wu - The Effect of Compression on Performance in a Demand Paging Operating System, The Journal of Systems and Software, 50 (2000), pp. 151-170.
[ZNQ93] Xiaodong Zhang, Nagas Nalluri, Xiaohan Qin - A Tool for Monitoring and Visualizing MIN-Based Multiprocessor Performance, Journal of Parallel and Distributed Computing, 18, pp. 231-241, 1993.