
Mono versus .Net: A Comparative Study of Performance for Distributed Processing

Gregory J. Blajian and *Roger Eggen
Computer and Information Sciences
University of North Florida
Jacksonville, FL USA
Phone 904.620.1326, Fax 904.620.2988
[email protected]

Maurice Eggen and Gerald Pitts
Department of Computer Science
Trinity University
San Antonio, TX USA
Phone 210.999.7487, Fax 210.999.7477
[email protected]

Abstract: Microsoft has released .NET, a platform dependent standard for the C# programming language. Sponsored by Ximian/Novell, Mono, the open source development platform based on the .NET framework, has been developed to be a platform independent version of the C# programming environment. While .NET is platform dependent, Mono allows developers to build Linux and cross-platform applications. Mono's .NET implementation is based on the ECMA standards for C#. This paper examines both of these programming environments with the goal of evaluating the performance characteristics of each. Testing is done with various algorithms. We also assess the trade-offs associated with using a cross-platform versus a platform dependent implementation.

Keywords: distributed processing, algorithm efficiency, native code

1 Introduction

This research compares the relative efficiency of Mono, [1] a cross-platform implementation of C#, with Microsoft's .Net environment. [2] Microsoft introduced .Net and the Common Language Runtime, CLR, as a runtime environment similar to Sun's Java Runtime Environment, JRE. [3] The CLR is an environment that accepts an intermediate language, MSIL, which is interpreted at execution time similar to the byte code used by the JRE.

While both languages use an intermediate language, there are some distinct differences between Microsoft's CLR and Sun's Java Virtual Machine, JVM. In Microsoft's implementation, the MSIL is actually compiled to managed native code on first execution so that subsequent executions run at native code speed. At a higher level, many languages have compilers that target MSIL for execution in the CLR, whereas the JRE is essentially a Java-only environment. This paper compares the cross-platform CLR implementation, Mono, with Microsoft's CLR as applied to parallel distributed computing.

2 Fundamentals

As with any application performance study, a reasonable set of performance criteria must be established at the beginning. This involves the selection of the hardware platform, the operating systems to be used, and the performance measuring code requirements. The hardware selection was simply based on reasonable availability. The research facility is a cluster of six Dell personal computers. [4] Each personal computer had an Intel Pentium III 550MHz processor, the same motherboard, BIOS, and built-in Network Interface Card, and either 128MB or 256MB of RAM. The study was not meant to test memory management or memory performance, so the differences in available memory do not impact the results in any significant or measurable way.

The next choice was that of operating systems. Microsoft Windows XP Professional [5] with Service Pack 2 seemed the appropriate choice for one of the two sets of tests since it is the incumbent operating system on which the first implementation of the CLR runs and for which the CLR was designed. The expectation was that the Microsoft implementation of the CLR should easily outperform any CLR implemented by a competitor. The choice of alternative operating system was Fedora Core 4 Test 3. [6] Arguably, this choice may not have been the best of the Linux operating systems available, but it seemed quite reasonable given the strong ties to Redhat's previous commercial offerings. [7] The choice of a Linux operating system was not expected to impose any penalties on the study due to the performance characteristics of the operating system. After all, the operating systems would be run on identical hardware and were expected to perform in a similar manner regardless of the operating system installed.

A variety of algorithms exhibiting varied performance characteristics were chosen to challenge the distributed capabilities of each environment. Using data sets of varying size and algorithms requiring different execution times tests communication as well as runtime efficiency. A bubble sort algorithm, O(n²), requires more processor time and correspondingly less communication time. The second algorithm was chosen to more evenly align compute and data transfer times; a heap sort algorithm running in O(n log₂ n) was used. The last algorithm is a simple exhaustive search algorithm, O(n), which places more burden on network performance than on processor performance.
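The performance-measuring code itself is not listed in the paper; as a rough illustration of the kind of harness such a study needs, the sketch below times one workload across the data-set sizes and thread counts that appear in the figures. The class name SortBenchmark, the Sorter delegate, and the use of Stopwatch are assumptions, not the authors' code.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing harness: fills an array of a given size with random
// integers, runs one workload over it with a given thread count, and reports
// the elapsed wall-clock time. Names and structure are illustrative only.
public static class SortBenchmark
{
    public delegate void Sorter(int[] data, int threadCount);

    public static TimeSpan RunTrial(Sorter sorter, int size, int threadCount)
    {
        int[] data = new int[size];
        Random rng = new Random(size);              // fixed seed per size, for repeatable trials
        for (int i = 0; i < size; i++)
            data[i] = rng.Next();

        Stopwatch clock = Stopwatch.StartNew();     // elapsed wall-clock time
        sorter(data, threadCount);
        clock.Stop();
        return clock.Elapsed;
    }

    public static void Main()
    {
        int[] sizes = { 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072 };
        int[] threadCounts = { 1, 3, 6, 9, 12, 15, 18 };

        // Array.Sort stands in here for the bubble, heap, and search workloads.
        Sorter placeholder = (data, threads) => Array.Sort(data);

        foreach (int n in sizes)
            foreach (int t in threadCounts)
                Console.WriteLine("n={0,7}  threads={1,2}  time={2}",
                    n, t, RunTrial(placeholder, n, t));
    }
}
```
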
3 O(n²)

If an O(n²) algorithm receives double the amount of data, then execution time increases four times, hence the notation O(n²). In a distributed environment, an O(n²) algorithm will take more execution time as the problem size, n, grows larger; however, if the problem is split, either to execute in separate threads on a symmetric multiprocessor (SMP) or to execute as separate processes on separate machines, the problem no longer grows in a strictly O(n²) manner. For instance, assume a problem of size n is run in a single thread on a single machine; the execution time will be O(n²). Now assume the same problem set is divided among k processors, where each subset is distributed and the same algorithm is applied to each subset. One can easily see that the time is now k · (n/k)² plus the time to distribute the subsets and to reconstitute the full resulting set. For the purposes of the study the resulting set is reconstituted using an ordered merge sort in O(n) time. At this point it becomes difficult to determine how much performance benefit is gained simply by decreasing the size of each calculated element and how much is due to the CLR environment in which it is running. However, both environments are organized in exactly the same manner, so the relative performance characteristics are meaningful.
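To make the k · (n/k)² argument concrete, here is a minimal single-machine sketch (assuming a class named DistributedSort and simple equal-sized chunks) that bubble sorts k chunks on separate threads and reconstitutes the result with ordered merges; it illustrates the decomposition described above rather than the paper's actual distributed code.

```csharp
using System;
using System.Threading;

// Minimal sketch: k * (n/k)^2 bubble sort work done in parallel, then ordered
// merges to reconstitute the full result. Illustrative only.
public static class DistributedSort
{
    public static int[] BubbleSortChunks(int[] data, int k)
    {
        int chunkSize = (data.Length + k - 1) / k;
        int[][] chunks = new int[k][];
        Thread[] workers = new Thread[k];

        for (int i = 0; i < k; i++)
        {
            int start = i * chunkSize;
            int length = Math.Min(chunkSize, Math.Max(0, data.Length - start));
            chunks[i] = new int[length];
            Array.Copy(data, start, chunks[i], 0, length);

            int[] chunk = chunks[i];                   // capture this chunk for its thread
            workers[i] = new Thread(() => BubbleSort(chunk));
            workers[i].Start();                        // each chunk costs (n/k)^2
        }
        foreach (Thread t in workers) t.Join();

        // Reconstitute the full result by merging the sorted chunks in order.
        int[] result = new int[0];
        foreach (int[] chunk in chunks) result = Merge(result, chunk);
        return result;
    }

    static void BubbleSort(int[] a)
    {
        for (int i = a.Length - 1; i > 0; i--)
            for (int j = 0; j < i; j++)
                if (a[j] > a[j + 1]) { int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t; }
    }

    static int[] Merge(int[] a, int[] b)
    {
        int[] m = new int[a.Length + b.Length];
        int i = 0, j = 0, o = 0;
        while (i < a.Length && j < b.Length) m[o++] = a[i] <= b[j] ? a[i++] : b[j++];
        while (i < a.Length) m[o++] = a[i++];
        while (j < b.Length) m[o++] = b[j++];
        return m;
    }
}
```
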

4 O(n log₂ n)

Interestingly, an O(n log₂ n) algorithm does not benefit nearly as much from creating subsets of the problem set for a distributed environment. In fact, the contribution of the merge sort used to reconstitute a distributed O(n log₂ n) sort can be significant in the overall processing time required for even reasonably sized problem sets. This must be kept in mind when viewing the results, but it does not invalidate the original assumption that the amount of time required to move the data between machines would be a reasonably close approximation of the time required to perform the heap sort on each subset of the overall problem set: the O(n log₂ n) sort dominates the O(n) merge.
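For reference, the O(n log₂ n) workload can be pictured as a standard in-place heap sort like the sketch below; the paper does not list its implementation, so the HeapSorter class is an assumption rather than the authors' code.

```csharp
using System;

// Standard in-place heap sort, O(n log2 n): build a max-heap, then repeatedly
// move the largest remaining element to the end of the array. Illustrative only.
public static class HeapSorter
{
    public static void HeapSort(int[] a)
    {
        // Build the max-heap from the bottom up.
        for (int i = a.Length / 2 - 1; i >= 0; i--)
            SiftDown(a, i, a.Length);

        // Repeatedly swap the root (maximum) into its final position.
        for (int end = a.Length - 1; end > 0; end--)
        {
            int t = a[0]; a[0] = a[end]; a[end] = t;
            SiftDown(a, 0, end);
        }
    }

    static void SiftDown(int[] a, int root, int size)
    {
        while (true)
        {
            int child = 2 * root + 1;
            if (child >= size) return;
            if (child + 1 < size && a[child + 1] > a[child]) child++;
            if (a[root] >= a[child]) return;
            int t = a[root]; a[root] = a[child]; a[child] = t;
            root = child;
        }
    }
}
```
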

5 O(n)

The exhaustive search algorithm, O(n), presents a somewhat interesting problem when run in a distributed environment. The problem set is only required to move one way, and the result set is expected to be significantly smaller than the outbound set. As with the O(n log₂ n) algorithm, this does not impact the results or invalidate the initial assumption that more time would be required to marshal and un-marshal the problem set across the network than is required to perform the algorithm itself.
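The one-way traffic pattern can be sketched as follows: each thread scans only its own slice of the array and hands back nothing but the (much smaller) list of matching positions. The LinearSearch class and the use of matching indices as the result set are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Exhaustive O(n) search split across k threads. The full problem set moves
// outward once; only the indices of matches come back. Illustrative only.
public static class LinearSearch
{
    public static List<int> FindAll(int[] data, int target, int k)
    {
        List<int>[] partial = new List<int>[k];
        Thread[] workers = new Thread[k];
        int chunkSize = (data.Length + k - 1) / k;

        for (int i = 0; i < k; i++)
        {
            int id = i;
            int start = id * chunkSize;
            int end = Math.Min(start + chunkSize, data.Length);
            partial[id] = new List<int>();
            workers[id] = new Thread(() =>
            {
                for (int j = start; j < end; j++)      // scan this slice only
                    if (data[j] == target)
                        partial[id].Add(j);            // the result set stays small
            });
            workers[id].Start();
        }
        foreach (Thread t in workers) t.Join();

        List<int> result = new List<int>();
        foreach (List<int> p in partial) result.AddRange(p);
        return result;
    }
}
```
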

6 Results

The machines used in this study were configured in the fashion indicated in Figure 6.1. The objects marked S indicate the servers, the one marked C is the client machine, and the one marked Sw is the 10/100Mbps switch. All the NICs in the machines were 100Mbps and operated at this same speed for both sets of tests. The client machine shared the same hardware characteristics as the servers.

Figure 6.1. Cluster configuration: server machines (S) and the client machine (C) connected through a 10/100Mbps switch (Sw).

6.1 The Mono Results

The first tests to be conducted were the Linux tests using Mono. The Fedora Core 4 Test 3 operating system was installed on each server and the client in the default package mode for a personal desktop. Next, the Mono CLR environment, version 1.1.7.1, was installed on all seven machines. Figure 6.1.2 shows the results obtained running the O(n²) algorithm in the Mono CLR on the Fedora Core 4 Test 3 machines.

Figure 6.1.2. Linux (Mono) O(n²): time versus number of threads (18 down to 1) for array sizes 1024 through 131072.

One may note that even as the problem set size gets to the reasonably large value of 131072 integers, all of the problem sizes exhibit negligible increases in the time required for 6 or more threads. It is only when the larger array sizes must be sorted with 3 or fewer threads that the expected n² behavior becomes clearly evident.

Figure 6.1.3 contains the results for the same environment using the O(n log₂ n) algorithm.

Figure 6.1.3. Linux (Mono) O(n log₂ n): time versus number of threads for array sizes 1024 through 1048576.

The O(n log₂ n) graph shows somewhat expected behavior as a result of the merge sort combined with the network and algorithm processing requirements for each machine. Note that the times required actually decrease as the number of threads approaches the number of physical machines, and for the smaller array sizes even beyond that point. The smaller array sizes actually suffer a significant amount of time lost due to the splitting and reconstitution of the array. However, as the array size approaches a significant value, the benefit of the distributed model begins to show itself.

Figure 6.1.4 shows the results for the same environment using the O(n) algorithm.

Figure 6.1.4. Linux (Mono) O(n): time versus number of threads for array sizes 1024 through 1048576.

It is certainly not unexpected that the O(n) algorithm displays near linear behavior in a negative direction, since one would not expect to gain any benefit from splitting an array up for O(n) processing via threads, or via threads in combination with a distributed system. That being said, some benefit may be gained if the array size is large enough that the time required to process each subset of the array begins to outweigh the cost of transferring the array across the network. This would seem to be borne out for array size 524288 and more so for array size 1048576.

6.2 The .Net Results

The results for the Windows set of tests were expected to be significantly better than those obtained in the Mono CLR environment, and the results of the .Net CLR tests certainly seemed to bear that out at first glance. Unfortunately, things are not always what they seem. The results from the O(n²) algorithm on the Windows operating system using the .Net CLR are illustrated in Figure 6.2.1.

Figure 6.2.1. Windows (.Net) O(n²): time versus number of threads (18 down to 1) for array sizes 1024 through 131072.

Figure 6.2.2 contains the results of the .Net CLR on Windows for the O(n log₂ n) algorithm.

Figure 6.2.2. Windows (.Net) O(n log₂ n): time versus number of threads for array sizes 1024 through 1048576.

And lastly, Figure 6.2.3 contains the results of the O(n) algorithm.

Figure 6.2.3. Windows (.Net) O(n): time versus number of threads for array sizes 1024 through 1048576.

You may note that the results look quite a bit different when compared to the Linux results using the Mono CLR. The O(n²) behavior is pretty similar, but a look at the actual numbers unveils a significant difference. At this point a suspicion may be dawning that things are not as they should be for this comparison. After all, the times required for the O(n) algorithm on Windows are completely linear, whereas they were not on the Linux platform. The O(n log₂ n) results, while reasonably similar, are again quite a bit faster. This leads to some speculation that there may be other factors at work. It is also important to note that there was more difference in the results than these graphs indicate. The Linux machines all demonstrated a much quicker propensity to go to the hard drive cache/swap space as the problem size grew, which led to wild fluctuations in the results. Those results were discarded as not pertinent, since the study was not intended to validate the efficiency of the memory management, but the effect could not be entirely ignored either.

6.3 Native Code

Since the results seemed to indicate that the .Net CLR outperformed the Mono CLR, written and supported by operating system vendor Novell, [8] by such a wide margin, the suspicion led to native code testing on each operating system. Windows was tested using a bare O(n²) bubble sort algorithm written in C/C++ and compiled using Microsoft Visual Studio 6 in release mode. The Fedora Core 4 Test 3 operating system was tested using the exact same code compiled using GCC, version 3.4.4, without debugging. The results are indicated in Figure 6.3.1.

Figure 6.3.1. Linux versus Windows, native O(n²) code: time versus size of data set (1024 through 131072).

Please note the vast difference between the two operating systems even utilizing native code. At 131072 integers in the array, the Fedora Core 4 Test 3 operating system performed more than 3 times slower than the exact same code on Windows.

6.4 Combined Results

The following three figures show the direct comparison between .Net and Mono. In the figures, N = .Net and M = Mono. Figure 6.4.1 contains the O(n²) results, Figure 6.4.2 contains the O(n log₂ n) results, and Figure 6.4.3 contains the O(n) results.

Figure 6.4.1. .Net (N) versus Mono (M), O(n²): time versus number of threads for array sizes 1024 through 131072.

Figure 6.4.2. .Net (N) versus Mono (M), O(n log₂ n): time versus number of threads for array sizes 1024 through 262144.

Figure 6.4.3. .Net (N) versus Mono (M), O(n): time versus number of threads for array sizes 1024 through 262144.

7 Conclusions

So where does this leave the results as a function of the study? Clearly the use of the Fedora Core 4 Test 3 operating system severely handicapped the Mono CLR as a test environment. It would seem, however, that not all is as bad as it may at first appear. Looking at the time required to run the O(n²) algorithm in the Mono CLR with 131072 integers in the array, 774.64s, and comparing that to the time it took to run the same number of integers using native code, 825.14s, it would seem that the Mono CLR is indeed MORE efficient than even native code. Though the operating system handicapped the results such that truly direct comparisons seem unreasonable to make, there is certainly no reason, given the results of Mono versus native code, to discount Mono as a viable alternative to .Net. In fact, if one were considering such a move, one might get even more performance than otherwise expected.

If one is familiar with Java and RMI, the Java equivalent of .Net/Mono distributed processing, it is worth pointing out that the Mono/.Net version was easier to implement than the same functionality in Java.
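The paper does not show the mechanism its client and servers use to exchange arrays, so the following is only a plausible sketch of a .NET Remoting setup of that era, which both .Net and Mono supported; the SortWorker type, port number, and object URI are assumptions.

```csharp
using System;
using System.Runtime.Remoting;
using System.Runtime.Remoting.Channels;
using System.Runtime.Remoting.Channels.Tcp;

// A remotable worker that sorts the chunk it is given and returns the result.
public class SortWorker : MarshalByRefObject
{
    public int[] SortChunk(int[] chunk)
    {
        Array.Sort(chunk);          // stand-in for the bubble or heap sort under test
        return chunk;
    }
}

public static class RemotingDemo
{
    // On each server machine: expose the worker over TCP.
    public static void StartServer()
    {
        ChannelServices.RegisterChannel(new TcpChannel(8080), false);
        RemotingConfiguration.RegisterWellKnownServiceType(
            typeof(SortWorker), "SortWorker", WellKnownObjectMode.Singleton);
        Console.WriteLine("Worker listening...");
        Console.ReadLine();
    }

    // On the client machine: obtain a proxy and ship a chunk to a server.
    public static int[] SortRemotely(string host, int[] chunk)
    {
        SortWorker worker = (SortWorker)Activator.GetObject(
            typeof(SortWorker), "tcp://" + host + ":8080/SortWorker");
        return worker.SortChunk(chunk);
    }
}
```

The point of the sketch is the small amount of ceremony involved: one remotable class, one registration call on each server, and a single Activator.GetObject call on the client, which is consistent with the authors' remark about ease of implementation.
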

8 Future Directions

It is easy to argue that these results more than justify further study. Of the free Linux operating systems available, one could try Debian, Ubuntu, Knoppix, FreeBSD, or another not listed here, possibly even one or more commercial variants of Linux, such as Suse/Novell or Redhat. One should consider a direct comparison of native code speeds before pursuing the CLR comparison.

Another possible direction would be to compare the Mono CLR and the .Net CLR on the Windows operating system. Mono will run on Windows, and doing so would eliminate the operating system's impact.

A third possible study would entail a comparison of the Mono CLR with Java. Both could be run on a Linux platform and neither would necessarily gain an operating system advantage. It may even be reasonable to expect that one could compare the two in the same clustered environment as this study did.

9 References

[1] What is Mono? (31 Jan 2006). Retrieved February 16, 2006, from http://www.mono-project.com/Main_Page

[2] .NET Highlights. (December 27, 2005). Retrieved February 20, 2006, from http://www.microsoft.com/net/default.mspx

[3] Java.sun.com: The Source for Java Developers. (2006). Retrieved February 24, 2006, from http://java.sun.com/

[4] Dell Products and Services. (2006). Retrieved February 1, 2006, from http://www.dell.com/

[5] Windows XP Editions. (2006). Retrieved February 1, 2006, from http://www.microsoft.com/windowsxp/default.mspx

[6] Fedora. (March 20, 2006). Retrieved February 21, 2006, from http://fedora.redhat.com/

[7] Red Hat. (March 20, 2006). Retrieved February 21, 2006, from http://redhat.com/

[8] Novell: Software for the Open Enterprise. (2006). Retrieved February 19, 2006, from http://www.novell.com/