
Unix performance benchmarking

Isolating application performance issues
Establishing performance benchmarks

Why bother?

• Identify issues with shared resources
• Understanding your SAS processes
• Helping System Administrators who don't understand SAS
• Prove that something is wrong

Measurements

• Disk Read/Write (I/O)
• Memory
• CPU

SAS vs non-SAS

• Disk IO – Not many options within SAS code (bufno, bufsize)
• Memory – no Unix equivalent to some SAS features (realmemsize, sortsize)
• CPU – SAS threaded kernel too complex to replicate in Unix

Basic SAS Disk IO testing

options fullstimer ;

/* Write performance */
libname outlib '/disk_output_path' ;
data outlib.mybigfile ;
  do n = 1 to 100000 ;
    randnum = ranuni(0) ;
    output ;
  end ;
run ;

/* Read performance */
data _null_ ;
  set outlib.mybigfile ;
run ;
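As a sketch of the few IO options SAS does give you, the write test above can be repeated with explicit buffer settings. BUFSIZE= and BUFNO= are standard SAS system and data set options, but the values shown here are illustrative assumptions, not tuning recommendations.

options fullstimer bufsize=128k bufno=10 ;   /* illustrative values only */

/* Write performance with explicit buffering on the output data set */
data outlib.mybigfile (bufsize=128k bufno=10) ;
  do n = 1 to 100000 ;
    randnum = ranuni(0) ;
    output ;
  end ;
run ;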

Basic Unix IO testing

$ cp /path_to_big_file /path_to_write_location

Or

$ ./iotest.sh -t /path_to_write_location

Disk caching

• Most Unix servers are 64-bit and have lots of memory
• Disk IO is always a bottleneck
• Modern Unix uses spare memory instead of disk
• Performance is always great after a reboot
• Caching distorts IO performance measurement

Real Unix IO testing

• Disable caching (Linux)
• Lots of concurrent processes
• Large files (20 GB) created by each process
• Capture results

Filesystems

• Local (ext4)
• Shared (gfs2, gpfs)
• External (xfs) – includes Fusion IO
• Temp (tmpfs)

On Linux, use df -hT

Typical results

Minimum SAS requirements

• ETL – 50-75 MB/sec per CPU core
• Ad hoc analytics – 15-25 MB/sec per CPU core
• SASWORK – 50-75 MB/sec per CPU core

Comparing channels

Running tests

• Schedule a quiet time
• Start small: 3 concurrent tests of 2 GB against /saswork (15,625 blocks × 128 KB = 2 GB per stream):
  $ ./iotestv1.sh -i 3 -t /saswork -b 15625 -s 128

• $ df /write_location_mountpoint
• $ du -s /path_to_write_location
• Clean up afterwards:
  $ find /saswork -type f -name 'iotest*' -user $USER -exec rm -f {} \; 2>&1 | grep -v 'Permission denied'

Gathering results

Example script output in listing:

dc5cad,07Jan2014:00:09:37,3,64,312500,60,30.17,/saswork,iotestv1.sh-writetest.out.2
dc5cad,07Jan2014:00:09:37,3,64,312500,60,2.02,/saswork,iotestv1.sh-readtest.out.2
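To pull these lines into SAS for the processing step that follows, a minimal sketch is shown below. The input file name is a placeholder for wherever the script output has been concatenated, and every field is read as plain text, because the exact field layout depends on the version of iotest.sh.

/* Read the raw iotest.sh output; the generic names f3-f7 are placeholders, */
/* not documented iotest.sh field definitions.                              */
data raw_results ;
  infile '/path/to/iotest_results.csv' dsd dlm=',' truncover ;
  length hostname dtime f3-f7 $ 20 target outfile $ 60 ;
  input hostname $ dtime $ f3 $ f4 $ f5 $ f6 $ f7 $ target $ outfile $ ;
run ;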

Processing results

hostname  streams  blksize  blocks  target    mode  dtime             iteratn  filesz    elapsed  thruput
dc5cad    3        64       312500  /saswork  R     07JAN14:00:09:37  1        20000000  62.04    322372.66
dc5cad    3        64       312500  /saswork  R     07JAN14:00:09:37  2        20000000  62.02    322476.62
dc5cad    3        64       312500  /saswork  R     07JAN14:00:09:37  3        20000000  64.36    310752.02
dc5cad    3        64       312500  /saswork  W     07JAN14:00:09:37  1        20000000  90.18    221778.66
dc5cad    3        64       312500  /saswork  W     07JAN14:00:09:37  2        20000000  90.17    221803.26
dc5cad    3        64       312500  /saswork  W     07JAN14:00:09:37  3        20000000  48       416666.67
dc5cad    3        128      156250  /saswork  R     07JAN14:00:11:48  1        20000000  50.15    398803.59
dc5cad    3        128      156250  /saswork  R     07JAN14:00:11:48  2        20000000  50.1     399201.6
dc5cad    3        128      156250  /saswork  R     07JAN14:00:11:48  3        20000000  50.03    399760.14
dc5cad    3        128      156250  /saswork  W     07JAN14:00:11:48  1        20000000  79.7     250941.03
dc5cad    3        128      156250  /saswork  W     07JAN14:00:11:48  2        20000000  79.13    252748.64
dc5cad    3        128      156250  /saswork  W     07JAN14:00:11:48  3        20000000  79.14    252716.7
dc5cad    4        64       312500  /saswork  R     07JAN14:00:14:49  1        20000000  70.21    284859.71

Results for interpretation and analysis

At this point, the testing has produced a set of results. The results are groups of tests, varying by:
– The server used for execution;
– How many concurrent streams were executed;
– The block size of the files being transferred;
– Whether the data were being written to or read from disk.
File size is constant (20 GB), and the information available for analysis is elapsed time.

[Diagram: several concurrent 20 GB streams, each completing in about 80 seconds]

• If we have two concurrent streams, each reading 20 GB in 80 seconds, our throughput is logically 40 GB in 80 seconds, or 0.5 GB/second (500 MB/second).
• In the case of four concurrent streams, it would be 80 GB in 80 seconds, or 1 GB/second.

Therefore, for each group, we must collapse the iterations, selecting the MAX of elapsed time and the SUM of the data volumes (individually always 20 GB in our case), keeping the host name, number of streams, block size, and mode as classification variables. We then calculate throughput as volume / elapsed time, to use as the analysis variable in our graphs (a sketch of this step follows).
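One possible implementation of this collapsing step is the minimal SAS sketch below. It assumes the processed results sit in a dataset named RESULTS with the variable names used in the table above (hostname, streams, blksize, mode, filesz, elapsed); the dataset names are assumptions, and the KB units are inferred from the table, where thruput equals filesz divided by elapsed (for example, 20,000,000 / 62.04 ≈ 322,373 KB/second).

/* Collapse the iterations in each test group: MAX of elapsed time,      */
/* SUM of the data volumes, with host, streams, block size and mode as   */
/* classification variables.                                              */
proc summary data=results nway ;
  class hostname streams blksize mode ;
  var elapsed filesz ;
  output out=collapsed (drop=_type_ _freq_)
         max(elapsed)=elapsed
         sum(filesz)=volume ;
run ;

/* Throughput = volume / elapsed, the analysis variable for the graphs.  */
/* With filesz in KB, dividing by 1024 gives MB/second, which can be     */
/* compared with the per-core figures quoted earlier.                    */
data collapsed ;
  set collapsed ;
  thruput_kb = volume / elapsed ;
  thruput_mb = thruput_kb / 1024 ;
run ;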

Impact of block size, 64 KB and 128 KB

Duration increases with concurrent streams

Throughput is independent of number of concurrent streams

[Charts: reading data vs. writing data]

Throughput reading data is much higher than writing

Does server performance vary?

[Charts: reading data vs. writing data, by server]

References

• Margaret Crevar's definitive guide to performance tuning: http://support.sas.com/rnd/papers/sgf07/sgf2007-iosubsystem.pdf

Authors

• Tom Kari: [email protected]
• Andrew Farrer: [email protected]

Acknowledgements

• Dan Gelinas, IBM Canada, for deep insights into the filesystem cache
• Clifford Myers, SAS Institute, the original author of iotest.sh