Pontificating on Profiling
Lisa Hagemann
VP Engineering, Dyn Inc.
twitter.com/lhagemann
twitter.com/dyninc

What is Profiling?

A way to evaluate what your program is doing.

Commonly evaluates the behavior of the program, measuring the frequency and duration of subroutine calls, including their call paths.

Used to evaluate areas ripe for optimization.

What is Benchmarking?

Defining a measurement for comparison

Benchmark time, memory, database calls

Provides data, not answers

TMTOWTDI

There’s More Than One Way To Do It

What’s the Best way to do it?

use Benchmark;

perldoc Benchmark

Built-in module which encapsulates a number of routines to help you figure out how long it takes to execute some code.

#!/usr/bin/env perl
# List::Util::first() is slower than a for loop:

use Benchmark qw(:all :hireswallclock);
use List::Util qw(first);

my @list = 1..100;

my $results = timethese(1_000_000, {
    'first' => sub {
        my $f;
        $f = first { $_ == 5 } @list;
        return $f;
    },
    'loop' => sub {
        my $f;
        for (@list) {
            if ($_ == 5) {
                $f = $_;
                last;
            }
        }
        return $f;
    },
});

cmpthese($results);

timethese(COUNT, CODEHASHREF, [STYLE]): times COUNT iterations of each coderef in CODEHASHREF.

cmpthese(COUNT, CODEHASHREF, [STYLE]) or cmpthese(RESULTSHASHREF, [STYLE]): uses timethese() (or the results of a timethese() call) and outputs a comparison table.

$ perl simpleloop2.pl
Benchmark: timing 1000000 iterations of first, loop...
     first: 1.59767 wallclock secs ( 1.48 usr +  0.01 sys =  1.49 CPU) @ 671140.94/s (n=1000000)
      loop: 1.08002 wallclock secs ( 0.92 usr +  0.01 sys =  0.93 CPU) @ 1075268.82/s (n=1000000)

            Rate  first   loop
first   671141/s     --   -38%
loop   1075269/s    60%     --
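The two-step timethese()/cmpthese() flow above can also be collapsed into a single call: cmpthese() accepts a COUNT directly, and a negative COUNT tells Benchmark to run each sub for at least that many CPU seconds instead of a fixed iteration count. A minimal sketch, reusing the first-vs-loop comparison:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(cmpthese :hireswallclock);
use List::Util qw(first);

my @list = 1..100;

# Negative COUNT: run each sub for at least 1 CPU second,
# then print the comparison table in one step.
cmpthese(-1, {
    'first' => sub { my $f = first { $_ == 5 } @list; return $f },
    'loop'  => sub {
        my $f;
        for (@list) { if ($_ == 5) { $f = $_; last } }
        return $f;
    },
});
```

This is handy when you don't know in advance how many iterations give a stable measurement.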

Output from timethese() uses the default style, 'auto': the key of each coderef, followed by the times ('wallclock' time, user time, and system time), followed by the rate.

Output from cmpthese() gives us a comparison chart sorted from slowest to fastest, showing the percent speed difference between each pair of tests.

Things to consider

Focus on code that will be executed the most (think loops)

Are there expensive comparisons/computations that can be cached sensibly?
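For instance, a pure computation whose inputs repeat can be cached in a hash. A sketch, where expensive_total() is a hypothetical stand-in for a costly, repeatable computation:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical stand-in for an expensive, repeatable computation.
sub expensive_total {
    my ($n) = @_;
    $calls++;
    my $sum = 0;
    $sum += $_ for 1 .. $n;
    return $sum;
}

my %cache;
sub cached_total {
    my ($n) = @_;
    $cache{$n} //= expensive_total($n);   # compute once, reuse after
    return $cache{$n};
}

print cached_total(100), "\n" for 1 .. 3;   # computed once, printed 3 times
print "expensive_total called $calls time(s)\n";
```

Note the //= operator only caches defined results; a computation that can legitimately return undef needs an exists() check instead.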

Are there chains of comparisons that could be reordered so the most likely cases are tested first?

Unnecessary sorting?
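Sorting a whole list just to take one element is a common case; List::Util (in core since Perl 5.8) gets the same answer in a single pass. A sketch with made-up data:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max);

my @sizes = (3, 97, 15, 42, 8);

# O(n log n): sorts the entire list just to grab the largest element
my $biggest_sorted = (sort { $b <=> $a } @sizes)[0];

# O(n): single pass, no sort needed
my $biggest_scan = max @sizes;

print "$biggest_sorted $biggest_scan\n";   # both 97
```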

Are you reinventing the wheel?

A simple text parsing script

Uses a package for creating objects

Simple parsing of a zone file into DNS records: hash ‘em if we know how

20K lines to parse

#!/usr/bin/env perl -l

use strict;
use warnings;

use RR;
use Net::DNS::RR;

use Benchmark qw(:hireswallclock);

my $t0 = new Benchmark;

while (my $line = <>) {
    chomp($line);

    # Ignore blank lines
    next unless $line;

    my $obj = RR->new($line);

    # Generate Net::DNS::RRs
    my $rr;
    if ($obj->as_hash()) {
        $rr = Net::DNS::RR->new($obj->as_hash());
    } else {
        $rr = Net::DNS::RR->new($obj->as_string());
    }
}

my $t1 = new Benchmark;
my $runtime = timestr(timediff($t1,$t0));

print "Zone parse time: $runtime";

new(): returns the current time as an object the Benchmark methods use.

timediff(T1, T2): returns a Benchmark object representing the difference between two Benchmark times, suitable for passing to timestr().

timestr(TIMEDIFF, [STYLE, [FORMAT]]): returns a string in the requested format, suitable for printing. FORMAT defaults to '%5.2f'.

$ perl zoneparse.pl zone.com.txt
Zone parse time: 4.61972 wallclock secs ( 4.55 usr + 0.02 sys = 4.57 CPU)

4.61 seconds to process the file

Good? Bad? Can it be better?

Profiling Packages

Devel::DProf

Built in; produces an output file, plus a utility to format it

watches subroutine calls noting elapsed time

totals the calls into an overall time spent in each subroutine

Devel::SmallProf

Install from CPAN

Human-readable output file, but clunky for programs that import many libraries

Devel::NYTProf

http://search.cpan.org/~timb/Devel-NYTProf-4.06/

Devel::NYTProf from CPAN is a powerful, fast, feature-rich perl source code profiler*

Statement and Subroutine profiling showing Inclusive and Exclusive Time

Inclusive time includes time spent in subroutines called from within a subroutine; exclusive time does not.
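A toy illustration of the distinction (hypothetical subs, not from the deck): when profiling report() below, its inclusive time counts the loop inside tally(), while its exclusive time covers only the string formatting it does itself.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

sub tally {
    my $sum = 0;
    $sum += $_ for 1 .. 1_000;   # the real work lives here
    return $sum;
}

sub report {
    # Time spent in tally() counts toward report()'s inclusive time,
    # but not its exclusive time.
    my $total = tally();
    return "total=$total";
}

print report(), "\n";   # total=500500
```

In an NYTProf report, a sub with high inclusive but low exclusive time is usually just a caller; the hot spot is further down the call chain.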

Handy report HTML generator

Run the script with Devel::NYTProf

$ perl -d:NYTProf zoneparse.pl zone.com.txt

The -d flag starts Perl's debugging mode; -d:NYTProf is shorthand for -MDevel::NYTProf

loads the module Devel::NYTProf before running the provided script

produces nytprof.out file

Adds a little overhead

$ nytprofhtml -o ./nytprof_run1 -f ./nytprof_run1.out --open
Reading ./nytprof_run1.out
Processing ./nytprof_run1.out data
Writing sub reports to ./nytprof_run1 directory
 100% ...
Writing block reports to ./nytprof_run1 directory
 100% ...
Writing line reports to ./nytprof_run1 directory
 100% ...

nytprofhtml generates HTML report

Useful flags for keeping multiple runs

-f --file: file name to use; defaults to ./nytprof.out

-o --out: the output directory to place all the html files

#!/usr/bin/perl -l

use strict;
use warnings;

use RR;
use Net::DNS::RR;

use Benchmark qw(:hireswallclock);

my $t0 = new Benchmark;

while (my $line = <>) {
    chomp($line);

    # Ignore blank lines
    next unless $line;

    my $obj = RR->new($line);

    # Generate Net::DNS::RRs
    my $rr = Net::DNS::RR->new($obj->as_string());
}

my $t1 = new Benchmark;
my $runtime = timestr(timediff($t1,$t0));

print "Zone parse time: $runtime";

$ perl zoneparse2.pl zone.com.txt
Zone parse time: 2.03943 wallclock secs ( 2.01 usr + 0.01 sys = 2.02 CPU)

4.61 seconds down to 2.03 seconds

> 50% speed up! Any others?

$ perl -d:NYTProf zoneparse2.pl zone.com.txt
Zone parse time: 10 wallclock secs ( 9.43 usr + 0.03 sys = 9.46 CPU)

$ nytprofhtml -o ./nytprof_run2 -f ./nytprof_run2.out --open
Reading ./nytprof_run2.out
Processing ./nytprof_run2.out data
Writing sub reports to ./nytprof_run2 directory
 100% ...
Writing block reports to ./nytprof_run2 directory
 100% ...
Writing line reports to ./nytprof_run2 directory
 100% ...

Net::DNS

presentation2wire
name2labels
wire2presentation

stripdot

Net::DNS::RR::new_from_string

#!/usr/bin/env perl -l

use strict;
use warnings;

use RR;
use Net::DNS;
use Net::DNS::RR;
use Benchmark qw(:hireswallclock);

sub stripdot {
    my ($str) = @_;
    # Replace any period at the end of a label that is not escaped by '\'
    $str =~ s{(?...        # (regex truncated in the source)
    ...
}

# ... (lines elided in the source) ...

while (my $line = <>) {
    ...
}

my $t1 = new Benchmark;
my $runtime = timestr(timediff($t1,$t0));
print "Zone parse time: $runtime";

$ perl zoneparse3.pl zone.com.txt
Subroutine Net::DNS::RR::stripdot redefined at zoneparse3.pl line 19.
Zone parse time: 1.10183 wallclock secs ( 1.10 usr + 0.00 sys = 1.10 CPU)

2.03 seconds to 1.10 seconds

Another ~50% speed up!
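The "Subroutine Net::DNS::RR::stripdot redefined" warning in the output above comes from replacing a sub inside another package at runtime. The general technique is a glob assignment; a sketch using a toy package (not the real Net::DNS::RR, and without the escaped-dot handling the real stripdot needs):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

package Toy::RR;
sub stripdot {                     # original, deliberately slow version
    my ($str) = @_;
    $str =~ s/^(.*)\.\z/$1/s;      # backtracking-heavy way to drop a trailing dot
    return $str;
}

package main;
{
    # Without this pragma, perl emits the "Subroutine ... redefined" warning
    # seen in the slide output.
    no warnings 'redefine';
    *Toy::RR::stripdot = sub {     # faster drop-in replacement
        my ($str) = @_;
        $str =~ s/\.\z//;
        return $str;
    };
}

print Toy::RR::stripdot("www.example.com."), "\n";   # www.example.com
```

Overriding a CPAN module's sub this way is fragile across upgrades, so it's best reserved for hot paths a profiler has actually identified.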

$ perl -d:NYTProf zoneparse3.pl zone.com.txt
Subroutine Net::DNS::RR::stripdot redefined at zoneparse3.pl line 19.
Zone parse time: 3 wallclock secs ( 3.51 usr + 0.02 sys = 3.53 CPU)

$ nytprofhtml -o ./nytprof_run3 -f ./nytprof_run3.out --open
Reading ./nytprof_run3.out
Processing ./nytprof_run3.out data
Writing sub reports to ./nytprof_run3 directory
 100% ...
Writing block reports to ./nytprof_run3 directory
 100% ...
Writing line reports to ./nytprof_run3 directory
 100% ...

77% speed up

Total run time: 4.61 seconds down to 1.10 seconds

Time in external module reduced from nearly 5 secs to 6ms.

This includes the overhead of calling the profiling module

References

CPAN

http://search.cpan.org/~timb/Devel-NYTProf-4.06/

http://search.cpan.org/~salva/Devel-SmallProf-2.02/

PerlMonks.org

Perl Best Practices by Damian Conway

Modern Perl by chromatic