<<

The nofib Benchmark Suite

of Haskell Programs

Will Partain

University of Glasgow

Abstract

This p osition pap er describ es the need for makeup of and rules of the

game for a b enchmark suite of Haskell programs It do es not include

results from running the suite Those of us working on the Glasgow

Haskell hop e this suite will encourage sound quantitative as

sessment of lazy systems This version of this

pap er reects the state of play at the initial prerelease of the suite

Towards lazy functional b enchmarking

History of b enchmarkingfunctional

The quantitative measurement of systems for lazy functional programming is a

nearscandalous sub ject Dancing b ehind a thin veil of disclaimers researchers

in the eld can still b e found quoting nbssec or something equally egre

gious as if this refers to anything remotely interesting

The April Computer Journal sp ecial issue on lazy functional program

ming is a notto odated selfp ortrait of the community that promotes comput

ing in this way It is one that nonsp ecialists are likely to see There are three

pap ers under the heading Eciency of functional languages

The Yale group rep orting on their ALFL compiler cites results for the

b enchmarks queens nfib tak mm deriv tfib qsort and

init noting that several are from the Gabriel suite of LISP b enchmarks They

say that results from these tests indicate that functional languages are indeed

b ecoming comp etitive with conventional languages page

Augustsson and Johnsson have a section ab out p erformance in their pap er

on the LML compiler They consider some of the usual susp ects queens

fib prime and kwic comparing against implementations of these algo

1

rithms in Edinburgh and New Jersey SML and Miranda To their credit

the wondrous Chalmers hackers are somewhat ap ologetic conceding that mea

suring p erformance of the compiled co de is very dicult

Finally Wray and Fairbairn argue for programming techniques that make

essential use of nonstrictness and for an implementation TIM that makes

these techniques inexp ensive Though they delve into a substantial spread

sheetlike example program they do not rep ort any actual measurements How

ever they astutely take issue with the usual toy b enchmarks There was in

the past a tendency for implementations to b e judged on their p erformance for

unusually strict b enchmarks

1

Miranda is a trademark of Research Software Ltd

History of b enchmarkingimp erative

Our imp erativeprogramming colleagues are not far removed from our brutish

b enchmarking condition Only a few years ago MIPS ratings Dhrystones

and friends were all the rage marketeers bandied them ab out shamelessly

compiler writers tweaked their to sp ot sp ecic constructs in certain

b enchmarks users were baed and noone learned much that was worth know

ing The section on p erformance in Hennessy and Pattersons standard text on

computer architecture is an admirable exp ose of these shenanigans and is well

worth reading

Then in enter the Standard Performance Evaluation Corp oration

SPEC b enchmarking suite The initial version included source co de and

sample inputs workloads for four mostlyinteger programs and six mostly

oatingp oint programs These are all either real programs eg the GNU C

compiler or the kernel of a real program eg matrix oatingp oint

matrix multiplication Computer vendors have since put immense eort into

improving their SPECmarks and this has delivered real b enet to the work

station user

Towards lazy b enchmarking

The SPEC suite is the most visible artifact of an imp ortant shift towards sys

tem b enchmarking A big reason for the shift lies in the b enchmarked sys

tems themselves Fifteen years ago a typical computer systemhardware and

softwareprobably came from one manufacturer sat in one ro om and was a

computing environment all on its own

An excellent discussion ab out b enchmarking from the selfcontainedsys

tems era is Gabriel and Masinters pap er ab out LISP systems The prop er

role of b enchmarking is to measure various dimensions of Lisp system p erfor

mance and to order those systems along each of these dimensions page

A toy b enchmark of the typ e I have derided so far can fo cus on one of these

dimensions thus contributing to an overall picture

Much early measurement work in functional programming was of this plot

alongdimensions style however the concern was usually to assess a particular

implementation technique not the system as a whole For example Hailp ern

Huynh and Revesz tried to compare systems that use strict versus lazy evalua

tion They went to considerable eort to factor out irrelevant details hoping

to end up with pristine data p oints along interesting dimensions Hartels eort

to characterise the relative costs of xedcombinator versus programderived

combinator implementations was even more elab orate using nontoy SASL

programs

So can SPEC b e seen as a culmination of go o d practice in b enchmarking

bycharacteristics No SPEC makes no eort to pinp oint systems along in

teresting dimensions except for the very simplestelapsed wallclo ck time

An underlying premise of SPEC is that systems are suciently complicated

that we probably wont even b e able to pick out the interesting dimensions to

measure much less characterise b enchmarks in terms of them SPEC represents

a shift to lazy b enchmarking of systems

Conte and Hwus survey conrms that at least in computer architecture

this shift towards lazy systemoriented b enchmarking is supp orted as a Go o d

Thing The trend can also b e seen in some sp ecialised areas of computing

the Perfect Benchmarks for sup ercomputers Crays etc running FORTRAN

programs and the Stanford b enchmarks for parallel sharedmemory sys

tems are two examples

Some serious b enchmarks nofib

We the Glasgow Haskell compiler group wish to help develop and promote

a freelyavailable b enchmark suite for lazy functional programming systems

called the nofib suiteconsisting of

Source co de for real Haskell programs to compile and run

Sample inputs workloads to feed into the compiled programs along with

the exp ected outputs

Rules for compiling and running the b enchmark programs and more

notably for rep orting your b enchmarking results and

Sample rep orts showing by example how results should b e rep orted

Our nonmotivations in creating this suite

Benchmarking is a delicate art and science and its hard work to b o ot We

have quite limited goals for the nofib suite are hoping for lots of help and are

prepared to overlo ok considerable shortcomings esp ecially at the b eginning

Motivations

 Our main initial motivation is to give functionallanguage implementors

including ourselves a common of real Haskell programs to attack

and study We encourage implementors to tackle the problems that ac

tual ly make Haskell programs large and slow thus hastening solutions to

those problems

And of course b ecause the b enchmark programs are shared it will b e p os

sible to compare p erformance results b etween systems running on identical

hardware eg Chalmers HBC vs Glasgow Haskell b oth running on the

same Sun Racing is the fun part

 Our ultimate motivation for this b enchmark suite is to provide end users

of Haskell implementations with a useful predictor of how those systems

will p erform on their own programs

The initial nofib suite will have no value as a predictive to ol Perhaps

those with greater exp ertise will help us correct this If necessary we will

gladly hand over the token for the suite to a more disinterested party

 We are very keen on some might say paranoid ab out readilyacces

sible reproducible results That is the whole p oint of the rep orting rules

elsewhere in this pap er

Go o dbutirrepro ducible b enchmarking results are very damaging b e

cause they lull implementors into a false sense of security

 After the initial prerelease of the suite which will b e for p ossibly ma jor

debugging we intend to keep the suite stable so that sensible comparisons

can b e made over time

 Having said that b enchmarks must change over time or they b ecome

stale It is dicult to brim with condence ab out the Gabriel b enchmarks

for LISP systems they are more than a decade old

Being forced to change a b enchmark suite can b e a mark of success The

SPEC p eople made substantial changes to their suite in so much

work had gone into compiler tricks that improved SPEC p erformance

results that some tests were no longer useful notably the matrix test

mentioned earlier

Nonmotivations

We are profoundly uninterested in distilling a single gure of merit eg

MIPS to characterise a Haskell implementations p erformance

Initially at least we are also uninterested in any statistics derived from the

raw nofib numb ers eg various means standard deviations etc You may

calculate and rep ort any such numb ersall honest eorts to understand these

b enchmarks are welcomebut the raw underlying numb ers must b e readily

available

An imp ortant issue we are not addressing with this suite is interlanguage

comparisons How do es program X written in Haskell fare against the same

program written in language Y Such comparisons raise a nest of issues all

their own for example is it really the same program when written in the two

compared languages This disclaimer aside we do provide the nofib program

sources in other languages if we happ en to have them

The Real subset

The nofib programs are divided into three subsets Real Imaginary and Sp ec

tral somewhere b etween Real and Imaginary

The Real subset of the nofib suite is by far the most imp ortant In fact

we insist that anyone who wishes to rep ort any results from running this suite

in whatever form must rst distribute their complete raw results for the Real

subset in a public forum eg available by anonymous FTP

The programs in the Real subset are listed in Table Each one meets most

of the following criteria

 Written in standard Haskell version or greater

 Written by someone trying to get a job done not by someone trying to

make a p edagogical or stylistic p oint

 Performs some useful task such that someone other than the author might

want to execute the program for other than watchademo reasons

 Neither implausibly small or imp ossibly large the Glasgow Haskell com

piler written in Haskell falls in the latter category

Program Description Origin

anna Strictness analyser Julian Seward Manchester

calc arbitraryprecision calculator Liang Mirani Yale

compress Text compression Paul Sanders BT

fluid Fluiddynamics program Xiaoming Zhang Swansea

gamteb Monte Carlo photon transp ort Pat Fasel Los Alamos

gg Graphs from GRIP statistics Iain Checkland York

hpg Haskell program generator Nick North NPL

infer HindleyMilner typ e inference Phil Wadler Glasgow

lift Fullylazy lamb da lifter David Lester Manchester

Simon Peyton Jones Glasgow

maillist Mailinglist generator Paul Hudak Yale

mkhprog Haskell program skeletons Nick North NPL

parser Partial Haskell parser Julian Seward Manchester

pic Particle in cell Pat Fasel Los Alamos

prolog miniProlog interpreter Mark Jones Oxford

reptile Escher tiling program Sandra Foubister York

veritas Theoremprover Gareth Howells Kent

Table nofib b enchmarks Real Subset

 The run time and space for the compiled program must neither b e to o

small eg time less than ve secs or to o large eg such that a research

student in a typical academic setting could not run it

Other desiderata for the Real subset as a whole

 Written by diverse p eople with varying functionalprogramming skills

and styles at dierent sites

 Include programs of varying ages from rst attempts to heavilytuned

rewrittenfourtimes b ehemoths to transliterationsfromLML etc

 Span across as many dierent application areas as p ossible

 The suite as a whole should b e able to compile and run to completion

overnight in a typical academic computing environment

The Sp ectral subset

The programs in the Sp ectral subset of nofiblisted in Table are those

that dont quite meet the criteria for Real programs usually the stipulation

that someone other than the author might want to run them Many of these

programs fall into Hennessy and Pattersons category of kernel b enchmarks

b eing small key pieces from real programs page

The Imaginary subset

The Imaginary subset of the suite is the usual small toy b enchmarks eg

primes kwic queens and tak These are distinctly unimp ortant and you

Program Description Origin

boyer Gabriel suite b oyer b enchmark Denis Howe Imp erial

cichelli Perfect hashing function Iain Checkland York

clausify Prop ositions to clausal form Colin Runciman York

fish Draws Eschers sh Satnam Singh Glasgow

knights Knights tour Jon Hill QMW

life Game of Life John Launchbury Glasgow

mandel Mandelbrot sets Jon Hill QMW

minimax tictacto e s and Xs Iain Checkland York

multiplier Binarymultiplier simulator John ODonnell Glasgow

pretty Prettyprinter John Hughes Chalmers

primetest Primality testing David Lester Manchester

rewrite Rewriting system Mike Spivey Oxford

scc Stronglyconnected comp onents John Launchbury Glasgow

sorting Sorting algorithms Will Partain Glasgow

Table nofib b enchmarks Sp ectral Subset

may get a sp ecial commendation if you ignore them completely They can b e

quite useful as test programs eg to answer the question Do es the system

work at all after Simons changes

Rules for running and rep orting

Glasgow will provide the nofib program sources as well as input workloads

and exp ected outputs We will also provide some scaolding to make it

easier to run the b enchmarks

Anyone can then run the b enchmark programs through their Haskell system

The price for using the b enchmark suite is that you must follow our rules if

you rep ort your results in any public forum including any publication

In the bigmoneyontheline world of the SPEC suite the running and

rep orting rules are complicated and arcane Thats b ecause there are many

p eople who would rather b e sneaky than do honest work to improve their

systems p erformance For now we assume that functional programmers are

more noble creatures the nofib rules are therefore quite simple

The basic rep orting principle is You must provide enough information and

results that someone with a similar hardwaresoftware conguration can easily

duplicate your results

The most imp ortant sp ecic nofib rep orting rule is if you wish to rep ort

or publish some results from running some part of the nofib suite then you

must rst le a complete set of howIdiditwhatIgot information for the

entire Real subset of programs in some public forum a newsgroup mailing

list an anonymousFTP directory Thereafter you may claim whatever

you like the idea b eing that p eople can lo ok up your led information and

laugh at you if youre making unsubstantiated claims

We are not insisting on these rules b ecause we like playing lawyer We

intend as little hindrance as p ossible to creative honest uses of this suite

There are more details ab out the rep orting rules in the version of this pap er

that is distributed with the suite

Concluding remarks

Inattention to b enchmarking is not just sloppy it ends up as selfdelusion

Assertions that various functionallanguages compilers generate co de that

2

is comp etitive with that generated by conventional language compilers

are simply false by any commonsense measure whats more when they are

rep eated by Resp ected People they are downright harmful they detract from

the urgency of building b etter implementations

By intro ducing the nofib suite of Haskell programs we hop e for an imme

diate payo simply by giving all Haskell implementors a common set of sources

with which to race each other We also hop e that we are setting the foundation

for a sound predictor of Haskellsystem p erformance

This suite follows the general trend away from plotthecharacteristics

b enchmarking and towards lazy systems b enchmarking of which the SPEC

suite is the most prominent example This approach to b enchmarking gives the

greatest credence is given to gross system b ehaviour on sizable real programs

Comments on this pap er and on the nofib suite itself are most welcome

Contributions of substantial functional programs that could b e added to the

suite would b e even more welcome I can b e reached by electronic mail at

glasgowhaskellrequestdcsglasgowacuk

Haskellrelated things including the nofib suite can b e retrieved by anony

mous FTP from ftpdcsglasgowacuk in pubhaskellglasgow The sites

nebulacsyaleedu and animalcschalmersse usually have copies as well

in the same directory

An uptodate version of this pap er will b e included in the nofib distribu

tion There is also a toplevel README which is the rst le you should consult

Acknowledgements My thanks to John Mashey for his many ne articles in

comparch that promote sensible b enchmarking and to Je Reilly for providing

information ab out SPEC Vincent Delacour Denis Howe John ODonnell Paul

Sanders and Julian Seward were among those who provided helpful comment

on earlier versions of this pap er Of course we are most indebted to those

p eople who have let their co de b e included in the suite

References

L Augustsson and T Johnsson The Chalmers LazyML compiler Com

puter Journal April

Adrienne Bloss P Hudak and J Young An optimising compiler for

a mo dern functional language Computer Journal April

2

Citation withheld to protect the guilty

Thomas M Conte and Wenmei W Hwu A brief survey of b enchmark us

age in the architecture community Computer Architecture News

June

Richard P Gabriel and Larry M Masinter Performance of Lisp systems In

Conference Record of the ACM Symposium on LISP and Functional

Programming pages Pittsburgh PA August

Brent Hailp ern Tien Huynh and Gyorgy Revesz Comparing two func

tional programming systems IEEE Transactions on Software Engineering

May

Pieter H Hartel Performance of lazy combinator graph reduction

SoftwarePractice and Experience March

John L Hennessy and David A Patterson Computer Architecture A

Quantitative Approach Morgan Kaufmann Publishers Inc San Mateo

CA

Lynn Pointer editor Perfect rep ort CSRD Rep ort Center for Su

p ercomputing Research and Development University of Illinois Urbana

IL March

Jaswinder Pal Singh WolfDietrich Web er and Ano op Gupta SPLASH

Stanford parallel applications for sharedmemory Computer Architecture

News March

S C Wray and J Fairbairn Nonstrict languagesprogramming and

implementation Computer Journal April