A joint newsletter of the Statistical Computing & Statistical Graphics Sections of the American Statistical Association.

December 96 Vol.7 No.3

SPECIAL FEATURE ARTICLE

Computational Anatomy: An Emerging Discipline

By Ulf Grenander and Michael I. Miller

The last several decades have witnessed a revolutionary change in medical imaging. New imaging instrumentation has transformed the field from one that was dominated by the time-honored flat X-rays to a quickly expanding technology with powerful tools like MRI (Magnetic Resonance Imaging), PET (Positron Emission Tomography) and SPECT (Single Photon Emission Computed Tomography), to mention but a few. This has radically changed the diagnosticians' ability to acquire patient data with more or less non-invasive procedures. Some sensors, for example ultrasound, produce very noisy data; others like MR have a better signal-to-noise ratio. In both cases the presence of randomness necessitates statistical treatment, but it should be emphasized that the main difficulty is not due to noise, whose probabilistic properties can be derived from the physics of the sensor with some degree of accuracy.

Instead the overwhelming challenge facing the analyst of data from medical imaging is understanding the randomness that represents biological variability. This is more terra incognita than noise analysis, although pioneering work was done as early as the 1970's by Bookstein (1978) and others studying shape change. Today an increasing number of mathematicians, statisticians, anatomists and radiologists are exploring the new discipline computational anatomy.

Statisticians are wont to complain about the scarcity of data, but computational anatomy should be the statistician's El Dorado, since data come in huge quantities, especially in 3D imaging where $10^8$ to $10^{10}$ data points are not unusual. We believe that with the emergence of fundamentally new technologies in CRYOSECTIONING

CONTINUED ON PAGE 4

EDITORIAL

One Year Later...

We have been editing the newsletter for a year, now, and proudly notice that it hasn't folded yet! On the contrary, we have been able to bring you stimulating articles on a wide variety of topics and to keep you abreast of recent developments concerning our sections and our areas of interest. Of course, in this era of "electronic information," it is difficult, if not impossible, for the printed medium to remain fully competitive as far as timeliness is concerned. We remind you, however, that the electronic version of the newsletter becomes available ahead of the printed version and can be easily accessed by following the links to either of our sections from the ASA web page (http://www.amstat.org/). In our sections' web pages you will also find an abundance of additional professional information and resources.

This issue contains the last columns as section chairs by Sallie Keller-McNulty and Bill DuMouchel. Looking back at what happened over the past year, they gladly recognize the numerous accomplishments of our sections and share with us their thoughts for the future. We are certain to convey the sentiment of all of our readers by extending to both of them our thanks for their managerial and administrative efforts.

The special feature article by Ulf Grenander and Michael Miller that begins on page 1 is an excellent introduction to a fascinating emerging discipline: computational anatomy. The authors argue that, to take full advantage of the huge amount of diagnostic data generated by novel medical imaging techniques, a clear understanding of biological variability is needed. In the article, they describe their approach to examining the geometric properties of the brain substructures based on the construction of statistical measures of anatomical variation. They also point out the challenges that lie ahead in this area of research.

After a year-long hiatus, we are happy to bring you back the "Bits from the Pits" column. Al Liebetrau, who used to contribute regular pieces featuring statistical computing and statistical graphics activities in science and industry, has kindly agreed to become the column editor. For this issue, he has secured an interesting contribution from two of his colleagues at the Columbus headquarters of Battelle Memorial Institute. Beginning on page 9, Stephen Wall and John Orban describe a computer simulation program that they developed to illustrate the use of different statistical sampling methodologies for collecting highway traffic data.

In the "Unix Computing" column starting on page 14, Phil Spector picks up from where he left off in the August issue and discusses more advanced applications of the Perl programming language. In another of our regular columns, "Topics in Information Visualization" (page 16), Daniel Carr and Suzanne Pierson address the issue of redesigning choropleth maps using micromaps. Their article points out visual and representational problems encountered with typical choropleth maps and describes a new template that overcomes these problems by linking micromaps and row-labeled plots.

This issue also contains a (belated) summary of the highly successful scientific programs that our sections sponsored at the Joint Statistical Meetings in Chicago and a report on the annual joint business meeting and mixer. A clear sign of the vitality of our scientific community is given by the many conferences on topics related to statistical computing and graphics that will take place in the next several months. Please take a minute to check out the notices printed in the last few pages of the newsletter and decide if you would like to attend. As usual, if you have any comments or if you wish to send us a contribution, we would be happy to hear from you. Thank you and enjoy your reading.

Mark Hansen
Editor, Statistical Computing Section
Bell Laboratories
[email protected]

Mario Peruggia
Editor, Statistical Graphics Section
The Ohio State University
[email protected]

A WORD FROM OUR CHAIRS

Statistical Graphics

William DuMouchel is the outgoing 1996 Chair of the Statistical Graphics Section. The editors would like to thank him for his help in providing this column during the past year.

As my year as Chair of the Statistical Graphics Section ends, I would like to thank all the officers of the Section for working so hard during the year to make our activities successful. Many of them deserve special mention: Stephen Eick organized and shepherded a great program for last August's meetings in Chicago. Dianne Cook, our incoming Program Chair, has lined up another great program for next summer. Robert Newcomb, our Secretary/Treasurer, has, in addition to his official duties,

CONTINUED ON PAGE 3

Statistical Computing

This is Sallie Keller-McNulty's last column as 1996 Chair of the Statistical Computing Section. The editors would like to thank her for her contributions to the newsletter.

This is my final column as your Section Chair. I am pleased to say that I have now placed the leadership of the Statistical Computing Section in the able hands of Daryl Pregibon. Some of our other new Section Officers who will be performing important functions for the Section this coming year are James Marron, our new Publication Liaison; Russell Wolfinger, the 1998 Joint Statistical Meetings (JSM) Section Program Chair; Karen Kafadar, our Section Chair-Elect; and our new Council of Sections Representatives, Naomi Altman and Terry Therneau.

The Section Executive Committee has made an important addition to the committee. We have added an appointed position of Electronic Communications Liaison to the committee. Our first Electronic Communications Liaison is Tom Devlin. Tom will serve a three year term. Even before this appointment, Tom has been active in this capacity for the Section. He is the creator of our homepage (http://www-stat.montclair.edu/asascs/).

2 Statistical Computing & Statistical Graphics Newsletter December 96

Tom ([email protected]) welcomes suggestions about the homepage and any other thoughts you have about activities with which he should become involved.

Section dues have been used on a variety of exciting activities this past year. We again sponsored a student paper competition session for the JSM this past August. The session was a huge success and the Journal of Computational and Graphical Statistics has requested that the students submit their papers for publication. We sponsored three short courses at the JSM and one at the Interface Symposium in Australia. We help support an undergraduate data analysis competition. The winners of the competition (Theresa Crofts, Victoria Field, AAron Holt, Jeffrey Edwardy and Amy Stai from Winona State; Matthew Haubrich and Matthew Schwab from Iowa State; Aidan Palmer from Carnegie Mellon; Ellie Nagel and Todd Nelson from Brigham Young) gave presentations at the JSM last August, as well as joining in the festivities at our Joint Mixer.

In the upcoming year, the Section will sponsor another student paper competition for the 1997 JSM this August in Anaheim. We will also be supporting travel for Peter Huber to do a special presentation on massive datasets at the 1997 JSM. We are helping to support the Third North American Conference of New Researchers in Statistics that is being organized by Snehalta Huzurbazar and Aparna Huzurbazar (see page 25 for information). Finally, the Section is co-sponsoring a symposium this spring at New York University on Recent Developments in Smoothing Methods (see page 26 for information).

The year promises to be an exciting one for our community. The 1997 JSM will focus on "Shaping Statistics for Success in the 21st Century." With this theme we will see many sessions dealing with the topic of massive datasets. There will also be much discussion at the JSM about the impact of computing and computational sciences on statistics in the 21st century. In February, the International Association for Statistical Computing will host the Second World Conference in Pasadena, California. The conference theme is Computational Statistics & Data Analysis...on the Eve of the 21st Century (http://www.stat.unipg.it/iasc.html). We should all be sure not to miss this year's Interface Symposium being chaired by David Scott. The symposium theme is "Mining and Modeling of Massive Data Sets In Science, Engineering, and Business" (http://www.stat.rice.edu/). Another meeting of interest is being sponsored by the American Association for Artificial Intelligence. It is the Third International Conference on Knowledge Discovery and Data Mining. It will be held immediately after the JSM in Newport Beach, California (see page 8 for information).

So, as you can see, there are many events of interest that the Section is involved in this year! I wish you all a productive new year and I look forward to seeing many of you at the various conferences and symposia.

Sallie Keller-McNulty
Kansas State University
[email protected]

FROM OUR CHAIRS (Cont.)...

Statistical Graphics

CONTINUED FROM PAGE 1

continued to develop and maintain our Web page. Mario Peruggia, as the Graphics Section co-editor of this newsletter, has, with Mark Hansen, continued to maintain its wonderfully high quality. And Sally Morton, the incoming Chair, assisted with many administrative tasks throughout her year as Chair-Elect. We can all look forward to a great year for the Statistical Graphics Section with Sally as Chair!

I hope all of you have had reason to access the ASA's Web site [www.amstat.org] by now. Primarily because of the support of our two Sections, it has become very useful and professional-looking - we should all be proud. I find the member index really useful. And of course the link to our own Graphics Section web site is especially valuable! I would like to direct your attention in particular to two parts of our site. First, the list of JSM-96 Exhibitors who contributed prizes to the raffle conducted at our mixer in Chicago. Second, the description of the 1997 Data Exposition.

The data for this year's Exposition were compiled by Colin Goodall, who is also a Council of Sections Representative for our Section. The challenge for participants in the 1997 Exposition is especially relevant and topical - Can statistical and graphical methods help improve the quality of our health care system? Let's assume the prerequisite for improving the quality of any system is deciding how to measure its quality. Our data consist of over a hundred variables describing costs, patient outcomes and sample sizes for subpopulations of patients in each of over 1000 hospitals. Is it possible to prepare report cards that fairly summarize hospital performance and separate the good hospitals from the not-so-good?

Health care in the U.S. represents almost a trillion dollar annual expenditure. Shouldn't the tools for improving process quality that our profession is famous for having developed be able to increase the efficiency of this process by at least a few percent? If statisticians don't participate in the solution of this problem, we will have only ourselves to blame if whatever methodology gets eventually adopted ends up supplying endless examples for "how to lie with statistics".

So please check out our web site for more discussion and description (and the data, of course) and enter the contest!

Best wishes to you all for the new year.

Bill DuMouchel
AT&T Labs - Research
[email protected]

SPECIAL FEATURE ARTICLE

CONTINUED FROM PAGE 1

and high field strength MR there is a need for examination of statistical measures of anatomical variation in the context of the geometric properties of the brain substructures. This has been the principal focus of our work. It is now possible to study the cortical surfaces, folds and subvolumes of the brain, simply because the new imaging technologies are defining upwards of 10 times the resolution of previous 1-2mm MRI supporting the geometric properties of these brain structures.

To illustrate examples of brain structures which can now be imaged, in the left panel of Figure 1, we present a cryosection through a macaque brain, the whole 3-D brain consisting of $640 \times 480 \times 200$ voxel volume elements at approximately $0.1\,\mathrm{mm}^3$ resolution. Notice the exquisite cortical folds of white and gray matter. The right panel shows a whole brain MRI-MPRAGE image volume with a section through the whole brain showing a clear delineation of the surface of the hippocampus. As well, mapping tools now exist which support the geometry of these brain structures (see, for example, Christensen, Rabbitt, and Miller 1996). Thus far we have principally worked in the homogeneous anatomy setting, in which the space of anatomical imagery is assumed topologically equivalent. Space limitations do not allow us to give more than a rough sketch of the pattern theoretic methodology. To fix ideas, say that we are interested in a nucleus or brain structure such as the hippocampus $H$ in the brain (right panel of Figure 1) and for the moment only consider its shape as a volume $H \subset \mathbb{R}^3$ and its texture given by a function $I : X \to \mathbb{R}$ inside and outside the nucleus located in a background space $X$, where $X$ will usually be a rectangular set in 3-D. To represent the shape variability of $H$, we introduce a similarity group $S$ with element $s : X \to X$ mapping homeomorphically so that we represent the actual shape as the deformed template

$$I(x) = I_{\mathrm{temp}}(sx), \quad s \in S, \quad x \in X.$$

Then the anatomical ensemble (space of anatomical realizations) is the orbit (deformable template) under $S$ so that any two images $I_1, I_2 \in \mathcal{I}$ are topologically

Figure 1: The left panel shows a whole brain cryosection through a macaque brain illustrating white and gray matter folds. Data taken from David Van Essen of the Department of Anatomy and Neurobiology at Washington University. The right panel shows a section through the whole brain MRI scan delineating the hippocampus. Data taken from Dr. John Csernansky of the Department of Psychiatry at Washington University.


Figure 2: Showing transverse sections through MRI images with three correspondence points (arrows) depicted between the two images. These are three elements of the vector field transformation determining the similarity $s \in S$ mapping one anatomical coordinate system to another.

equivalent in the sense that

$$\exists\, s \in S \ \text{such that}\ I_1(x) = I_2(sx), \quad x \in X. \qquad (1)$$

After choosing one image, call it the template $I_{\mathrm{temp}}$, all anatomies can be generated from it. Characterizing normal anatomy becomes an empirical procedure of constructing probability laws on the transformations $S$ from the family of transformations observed in actual anatomies. In this setting, disease or abnormality then corresponds to a transformation which is a large deviation from the identity in the group of transformations as reflected by the perhaps normal and disease measures. Building anatomical representations becomes an empirical procedure of constructing probabilities on the transformations from the various populations of anatomies.

As human anatomy is exquisitely complex, we have been studying similarities which are constructed at their highest resolution from vector fields, products of the basic translation groups. They correspond to transformations of dimension equivalent to the number of voxels in the imagery, on the order of 100,000,000 parameters. Figure 2 shows two sections from volume imagery associated with MRI image data, depicting three points in a similarity corresponding to the vector fields defining the mapping of one anatomical coordinate system to another, as well as vectors associated with three points forming the vector fields.

The major challenge facing computational anatomists in the future is that even in this "homogeneous anatomy setting," representing variability of shape and other entities in anatomy requires new mathematical tools. One of them is pattern theory as presented for example in Grenander (1993) in which probability measures are introduced on groups and other families of transformations in order to create knowledge representations of anatomical variability.

Here we have started from one or several templates $I_{\mathrm{temp}}$ that describe the intensity field in and around a typical shape $H_{\mathrm{temp}}$ in $X$. How to determine the template from data is a problem in statistical estimation of unconventional type that cannot be discussed here. Anyway, once $I_{\mathrm{temp}}$ has been chosen, we represent biological variability by a measure $\mu$ on the group $S$. Call the probability density $\pi$ with respect to an invariant measure $m$ on $S$:

$$\pi(s) = \mu(ds) / m(ds).$$

In passing we point out that we insist on modelling the anatomy in the continuum $\mathbb{R}^3$ in which it lives, not on any discrete lattice as has been customary in most pattern recognition. Of course, the computing will have to be done discretely since we rely on digital machines, so that discretization has to be done sooner or later. Our advice is: do it later!

Let us denote the observed image by $I^D(y)$, $y \in Y$, which is an array indexed by $y$. For an MRI, for example, $Y$ could be $256 \times 256 \times 256$ so the image would be an enormous 3D array, say of dimension $d$. Now we specify the likelihood function $L(I^D \mid s)$, i.e. the conditional probability density of $I^D$ in $\mathbb{R}^d$, so that we can apply Bayes' theorem and get the conditional probability density of the unknown group element $s$ given the deformed image $I^D$ through the proportionality

$$p(s \mid I^D) \propto \pi(s)\, L(I^D \mid s).$$

We have avoided the question of how to perform the actual modelling of images through configuration spaces, connectors and other pattern theoretic concepts. The reader is referred to Grenander and Miller (1994) for a

Figure 3: The left panel shows the template $I_{\mathrm{temp}} = 87A$. The middle panels show the transformed templates $I_{\mathrm{temp}}(s_1 x), I_{\mathrm{temp}}(s_2 x)$. The right panel shows the two targets $I_1 = 93G$ and $I_2 = 90C$.

Figure 4: The left panel shows a section through template $I_{\mathrm{temp}} = 87A$. The middle panel shows sections through the transformed templates $I_{\mathrm{temp}}(s_1 x), I_{\mathrm{temp}}(s_2 x)$. The right panel shows two target sections $I_1 = 93G$ and $I_2 = 90C$.

detailed discussion of these topics in the context of cytological micrographs as well as to a more concise treatment of brain anatomies in Miller, Christensen, Amit, and Grenander (1993).

Note that the posterior $p$ contains all the information available for analyzing the data given only the MRI: both medical knowledge from an "anatomical textbook", and empirical knowledge residing in the observed image. Any inference should therefore be derivable from $p$, so that the main question has now been reduced to handling $p$ analytically and/or computationally. Once we have an estimate $\hat{s}$ of the group element $s$ we obtain an analysis of the brain data by computing $I_{\mathrm{temp}}(\hat{s} x)$, $x \in X$.

Automated tools for generation of whole brain maps corresponding to these similarities $s \in S$ from these fabulously high resolution scans are only now starting to be generated in laboratories throughout the country. To illustrate, Figures 3 and 4, taken from Joshi, Grenander, and Miller (1996), demonstrate whole brain maps in the macaque monkey. The left panel of Figure 3 shows locations of the deep sulcal folds, called the fundi of the sulci. These have been labeled in David Van Essen's laboratory in the Department of Anatomy and

Figure 5: Showing the sample brain with a section through it. Generated by mapping three whole brains to one common coordinate system. Whole brain volume reconstructions taken from the laboratory of David Van Essen of the Department of Anatomy and Neurobiology at Washington University.

Figure 6: The left and middle panels show rendered surfaces from two of the target volumes resulting from mapping the template hippocampus onto two targets. The right panel shows the mean hippocampus generated from a population of 10 anatomies. Data taken from Dr. John Csernansky of the Department of Psychiatry at Washington University. Figures taken from Joshi, Grenander, and Miller 1996.

Neurobiology at Washington University, and are depicted via the superimposed lines. The brains were then mapped one to another; the template brain 87A was mapped to two target brains 90C and 93. Figure 3 shows the results from the whole brain mapping procedure. Figure 4 shows sections through the whole brains. The left panels show the template $I_{\mathrm{temp}}$. The middle panels show $I_{\mathrm{temp}}(s_1 x), I_{\mathrm{temp}}(s_2 x)$ resulting from mapping the template to the two targets. The right panels show the targets $I_1, I_2$. The similarity transformations consisted of vector fields the dimension of which is equivalent to the number of voxels. In this case, the mapping consisted of about 10,000,000 parameters. Notice the fantastic correspondence and the detail that is accomplished by the whole volume transformation.

Statistical properties of such mappings are beginning to emerge. Groups have already begun characterizing large populations (see Evans, Collins, and Holmes 1996, and Mazziotta, Toga, Evans, Fox, and Lancaster 1995, for example). Average brains can now be generated, with variations around the average studied via probability measures on the similarities. Such an example is depicted in Figure 5 showing the mean brain generated from 3 whole macaque brains.

The mean brain was generated by mapping a whole macaque provisory template brain onto two whole macaque brains, then averaging the coordinate system transformations and applying the average transformation to the template. This average transformation applied to the provisory template is then the closest man-made brain to the population.

Shown in Figure 6 are average surfaces associated with the hippocampus. Panels 1 and 2 show two rendered surfaces embedded in two of the target volumes $M(s_i x)$, $i = 1, 2$. These surfaces were generated by transforming the provisory template hippocampus volume through the volume transformations carrying the template onto the target, and then composing the map

with the 2-dimensional surface manifold $M$ representing the provisory template hippocampus. The right panel shows the average hippocampus generated by combining ten anatomical maps from a population of ten anatomies.

This is an emerging discipline in its earliest stages of development, with tremendous opportunity for the development of new statistical methods for characterizing human variation. We call upon the statistical community to engage itself in research in the field of computational anatomy. Not only can the results be of value to medical science and ultimately to patients; the intellectual challenges are overwhelming, and while much progress has been made during the last five years we can expect major practical breakthroughs as well as theoretical advances in the years to come.

References

Bookstein, F.L. (1978), The Measurement of Biological Shape and Shape Change, New York: Springer-Verlag.

Christensen, G.E., Rabbitt, R.D., and Miller, M.I. (1996), "Deformable Templates Using Large Deformation Kinematics," IEEE Transactions on Image Processing, 5, 1435-1447.

Evans, A.C., Collins, D.L., and Holmes, C.J. (1996), "Computational Approaches to Quantifying Human Neuroanatomical Variability," in Brain Mapping: The Methods, 343-361, eds. Toga, A.W., and Mazziotta, J.C., San Diego: Academic Press.

Grenander, U. (1993), General Pattern Theory, Oxford, UK: Oxford University Press.

Grenander, U., and Miller, M.I. (1994), "Representations of Knowledge in Complex Systems," Journal of the Royal Statistical Society, Ser. B, 549-603.

Joshi, S., Grenander, U., and Miller, M.I. (1996), "On the Geometry of Brain Sub-Manifolds," International Journal of Pattern Recognition and Artificial Intelligence, Special Issue on Processing of MR Images of the Human Brain, to appear.

Mazziotta, J.C., Toga, A.W., Evans, A., Fox, P., and Lancaster, J. (1995), "Probabilistic Atlas of the Human Brain: Theory and Rationale for Its Development," Neuroimage, 2, 89-101.

Miller, M.I., Christensen, G.E., Amit, Y., and Grenander, U. (1993), "Mathematical Textbook of Deformable Neuroanatomies," Proceedings of the National Academy of Sciences, 11944-11948.

Also visit the Web Sites:

http://www.dam.brown.edu/pattern/
http://www.cis.wustl.edu

Ulf Grenander
[email protected]

Michael I. Miller
Washington University
[email protected]

CALL TO ACTION

Data Mining

By Daryl Pregibon

While it has been considered a dirty word in Statistics, "data mining" is an emerging area that is projected to be a multi-billion dollar industry by the year 2000. The modern usage of the term data mining connotes the goal of "extracting information from data", that is, refining crude and abundant raw data into high grade information for decision making. The field of data mining emerged from the applications side of the Machine Learning community, a largely theoretical subgroup of Artificial Intelligence (AI). As such, data mining is a blend of statistics, AI, and database research. There is much overlap between statistics and data mining and therefore tremendous opportunity for statisticians who are looking for challenging new areas in which to apply their skills and experience in data analysis and inference.

Statisticians interested in learning about the area will have several opportunities to do so at the Joint Statistical Meetings in Anaheim this summer. Special sessions are planned of both a tutorial and research nature. Immediately following the Joint Meetings, in nearby Newport Beach CA, the American Association for Artificial Intelligence, in cooperation with the ASA, will be hosting the 3rd International Conference on Knowledge Discovery and Data Mining (KDD-97). This is the preeminent conference in data mining and a good place to see first hand what all the hype is about. For more details point your browser to http://www-aig.jpl.nasa.gov/kdd97/.

Daryl Pregibon
AT&T Labs Research
[email protected]
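The Bayesian step in the Grenander and Miller article, forming a posterior on the group element from a prior density and a likelihood and then estimating the transformation, can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and not the authors' method: the "anatomy" is one-dimensional, the group is restricted to translations (s x = x + s), the noise is Gaussian, the prior is a Gaussian centred on the identity, and all function and variable names are hypothetical. The article itself works with vector-field transformations of millions of parameters in 3-D.

```python
import math
import random

def template(x):
    """Hypothetical 1-D template intensity: a smooth bump centred at 0."""
    return math.exp(-0.5 * x * x)

def log_prior(s, scale=2.0):
    """Log of an (unnormalised) Gaussian prior on the shift s, centred on the identity s = 0."""
    return -0.5 * (s / scale) ** 2

def log_likelihood(observed, grid, s, sigma):
    """Gaussian-noise log-likelihood of the observed image given the shift s."""
    return sum(-0.5 * ((obs - template(x + s)) / sigma) ** 2
               for obs, x in zip(observed, grid))

def posterior_over_shifts(observed, grid, shifts, sigma):
    """Normalised posterior on a finite set of candidate shifts (prior times likelihood)."""
    logp = [log_prior(s) + log_likelihood(observed, grid, s, sigma) for s in shifts]
    top = max(logp)  # subtract the maximum before exponentiating, for numerical stability
    weights = [math.exp(v - top) for v in logp]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
grid = [i * 0.1 for i in range(-50, 51)]       # sample points x in X
true_shift, sigma = 1.0, 0.02                  # assumed ground truth and noise level
observed = [template(x + true_shift) + rng.gauss(0.0, sigma) for x in grid]

shifts = [i * 0.1 for i in range(-30, 31)]     # candidate group elements s
posterior = posterior_over_shifts(observed, grid, shifts, sigma)
s_hat = shifts[max(range(len(shifts)), key=posterior.__getitem__)]
print("estimated shift:", round(s_hat, 1))
```

The grid search over candidate shifts stands in for whatever analytical or computational handling of the posterior a real application would need; it is only feasible here because the toy group has a single parameter.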

BITS FROM THE PITS

Edited by Albert Liebetrau

A Simulation Tool For Evaluating Design And Analysis Options For Monitoring Highway Traffic Characteristics

By Stephen M. Wall and John E. Orban

1. Introduction

Transportation planning agencies, faced with a need to improve highway system performance and safety, are making greater efforts to monitor travel trends and the changing characteristics of the vehicles used on the nation's roadways. For example, the Federal Highway Administration (FHWA) established the Highway Performance Monitoring System (HPMS) to monitor traffic volumes, annual vehicle distance traveled (AVDT), annual average daily traffic (AADT), vehicle classification, and truck weight (Traffic Monitoring Guide 1995). Truck safety compliance rates and vehicle occupancy rates are also quantities of interest to transportation planners. Each of these monitoring programs requires the development of sampling plans and estimation procedures that are statistically valid and practical to implement. Kinateder et al. (1997) discuss many of these issues from a statistical perspective. This article discusses how a computer simulation program was developed to evaluate and illustrate the statistical properties of different sampling strategies and analysis techniques. We begin by discussing some of the statistical sampling and analysis options that might be under consideration in a given situation; then, we present an overview of how the simulation program is used to evaluate these options.

2. Options for Sampling Designs and Analysis Procedures

The choices of sampling designs and analysis procedures to be employed on a specific monitoring program depend on several factors. The first consideration is the level of stratification required. The monitoring program might involve several regions (states, counties, etc.) or functional classes of roadway (interstate, rural, etc.). Furthermore, the planners might be interested in estimating traffic characteristics at various times of the day or different days of the week. The second factor to consider is the choice of performance measures. Often this defines constraints on the sampling approach. For example, when estimating traffic volumes or average vehicle distance traveled, it is possible, in most instances, to count all vehicles that pass a sampling location. On the other hand, determining if commercial vehicles are complying with safety regulations imposes constraints on the number of vehicles that can be inspected. Other considerations include whether or not prior information is available to "optimize" the sampling allocation, limits on the number of sampling locations or level of sampling at each location, and regulatory issues such as the requirements to include certain locations or to inspect certain types of vehicles.

Just as there are different options for designing the monitoring program, there are also options on how the results should be reported. From a statistical perspective, one would like to have estimators that are unbiased with minimum variance. However, depending on the complexity of the sampling design, the "optimal" approach may not be obvious. Furthermore, regulators and administrators may not be comfortable with complex analysis routines, especially when a "simple" approach produces essentially the same answer.

Thus, in considering the various options for designing the monitoring program and analyzing the results, a tool is needed to evaluate the statistical implications of the decisions. Such a tool is the topic of our article.

3. Simulation Modules

In designing the simulation program, we identified the following four modules that would be incorporated into the first version of the software: Stratum Module, Sampling Design Module, Simulation Module, and Results Module. Because each of these modules represents a distinct component of the sampling problem, our goal was to make them as independent as possible. Each module would be developed with well-defined inputs and outputs that could be used by the other modules, but all data manipulation would be localized within the modules. This approach would provide flexibility for modifying individual modules without affecting the operation of other modules. This approach also minimizes the initial development and testing effort, and the amount of integration testing required when individual modules are modified.

3.1 Stratum Module

The Stratum Module is used to define the stratification strategy. In its current form (see Figure 1), one can select different locations, functional classes, and sampling times. However, as statistical methodology is developed to handle complex stratification schemes, this module will be modified to allow the user to classify each stratum and determine the sampling times for that stratum. This information is used by the Simulation Module to predict the number of vehicles that will be traveling on the stratum and the number of violations that will occur during sampling.

Figure 1: Stratum Module.

3.2 Sampling Design Module

The Sampling Design Module is used to define the sampling strategy. The program screen representing the Sampling Design Module is shown in Figure 2. The data input to this module define how the Simulation Module will select the highway links and the number of vehicles to be sampled on each link. Also, the user can choose among several different performance measures and specify whether the prior information (e.g., traffic volumes) is known exactly or approximately. Each performance measure dictates how the simulation algorithm is executed and what data items are displayed as

Figure 2: Sampling Design Module.

Figure 3: Simulation Module.

of safety violations. Eventually, the program will be set up to display different simulations depending on the performance measure selected in the Sampling Design Module.

The Simulation Module is also where the user executes the simulation algorithm, views the simulated sampling protocol, and performs replicate sampling. The random sampling process is illustrated by "clicking" on the "Regenerate Sample" button. The new sample immediately appears on the screen. The "Replicate Sampling" function generates multiple simulated samples in order to evaluate the sampling distributions of estimators. The results are displayed in the Results Module. The user specifies the number of replicate samples to generate. Eventually, we foresee that there may be several different simulation algorithms, or at least "flavors" of the original algorithm, incorporating different performance measures. Currently, we only have one algorithm in this version of the software.
part of the Simulation Module. 3.4 Results Module 3.3 Simulation Module The Results Module presents a statistical summary of In setting up the Simulation Module (Figure 3), the user the simulated estimators. The screen representing the enters the data that characterize the links in each stra- Results Module is shown in Figure 4. Currently, the tum. The data that are displayed depend on which per- module is set up to compare two methods for estimat- formance measure is selected in the Sampling Design ing violation rates ± one based on simple averages and Module. In Figure 3, the module displays a simulation the other using statistical sampling weights. As we

December 96 Statistical Computing &Statistical Graphics Newsletter 11 Figure 4: Results Module.

Figure 5: Results Module with New Title and Plot Style.

12 Statistical Computing &Statistical Graphics Newsletter December 96 implement new performance measures and simulation 5. Future Plans algorithms, the Results Module will be expanded to in- When we began the development of our simulation pro- clude other comparison methods. gram, we recognized the need for a customized tool that could evolve as we explore different statistical design These plots show the sampling distributions of the two and analysis approaches. For example, we may want estimators. Speci®c properties, such as bias, standard to look at different sampling designs, add the ability to deviation, and root mean squared error (RMSE) are also include strati®cation, or evaluate different performance presented for each estimator. We developed the Results measures. Or, if we decide there are better ways to com- Module in such a manner to allow the user to customize pare the simulation results, we can add them to the Re- these plots. For example, the user can change titles, sults Module. With the software selected, we have com- fonts, or type of plotting routine in much the same way plete ¯exibility to change nearly every aspect of the pro- one makes these changes in a modern spreadsheet pack- gram. age. Figure 5 shows an alternative way to present the results of the simulation. We started the development process before Microsoft, Inc. released Windows 95. Because we believe that a 4. Software Development Environment 32 bit operating system will provide greater stability and enhanced program execution, we plan to port the soft- Because our goal was to develop an easy, intuitive ware to operate with Windows 95 and Windows NT. user interface for the simulation program, we selected Since these are the predominate platforms for today's Microsoft Windows 3.x as our development platform. desktop computing, we don't expect to develop the soft- Throughout the development process, we made every ware to operate with any other operating system. 
effort to make the program feel like a typical Windows Other enhancements to the program include: incorpo- application. Most users would be familiar with these rating the remaining performance measures, adding ad- types of programs. ditional stratum classi®cation parameters, and adding the ability to simulate a sampling protocol that includes As our software development tool, we chose DelphiÐ multiple strata. a rapid application development (RAD) product from Borland, Inc. Delphi provides a highly optimized com- References piler and a visual programming environment that is ex- Traf®c Monitoring Guide, 3rd ed. (1995), U.S. Depart- cellent for prototyping applications. With the prototype ment of Transportation, Federal Highways Administra- approach, we were able to design and evaluate the user tion, Of®ce of Highway Information Management. interface early in the development process. Changes to Kinateder, J.G., McMillan, N.J., Orban, J.E., Skarp- the interface could be made before developing the sim- ness, B.O., and Wells, D. (1997), ªSampling Designs ulation and analysis routines. The prototype approach and Estimators for Monitoring Vehicle Characteristics also allowed us to develop and unit test the program on When Inspection Capacity is Limited,º Transportation a module by module basis. Research Board Publication No. 971372, to appear. Another reason for choosing Delphi is its drag &drop, component-based architecture. This was important to Stephen M. Wall us because we could build our application using com- Battelle Memorial Institute ponents that were developed and tested by Borland and [email protected] other software vendors. Delphi includes such compo- nents as tabbed notebooks, database controls, grids, and John E. Orban edit boxes. A good example of a software vendor com- Battelle Memorial Institute ponent is embedded in the Results Module. To provide [email protected] the level of customization required, we included a plot-

ting component called GigaSoft ProEssentials from Gi- gasoft, Inc. By incorporating these pre-tested compo- nents in our application we were able to greatly increase productivity while decreasing development costs. We chose Paradox from Borland, Inc. as our database be- cause it provided fast data retrieval, a relational data model, and seamless integration with Delphi.
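The replicate-sampling comparison described for the Simulation and Results Modules (Sections 3.3 and 3.4) can be illustrated with a short Python sketch. This is not the authors' Delphi program. It assumes a single hypothetical stratum of twelve links with made-up traffic volumes and violation rates, simple random sampling of links, a fixed number of inspections per link, and traffic volume as the "statistical sampling weight"; all names and numbers are illustrative. Each replicate draws one sample and computes both estimators, and the summary reports bias, standard deviation, and RMSE for each.

```python
import math
import random
import statistics

# Hypothetical stratum of 12 highway links (illustrative numbers only).
# Busier links get lower violation rates so the two estimators disagree.
VOLUMES = [1000 * (k + 1) for k in range(12)]    # vehicles per link
RATES = [0.30 - 0.02 * k for k in range(12)]     # true violation rate per link
TRUE_RATE = sum(v * p for v, p in zip(VOLUMES, RATES)) / sum(VOLUMES)


def draw_sample(rng, n_links=4, n_inspected=50):
    """Pick links by simple random sampling and inspect a fixed number
    of vehicles on each; return (volume, observed rate) pairs."""
    sample = []
    for i in rng.sample(range(len(VOLUMES)), n_links):
        violations = sum(rng.random() < RATES[i] for _ in range(n_inspected))
        sample.append((VOLUMES[i], violations / n_inspected))
    return sample


def simple_average(sample):
    """Unweighted mean of the per-link observed rates."""
    return sum(rate for _, rate in sample) / len(sample)


def volume_weighted(sample):
    """Per-link observed rates weighted by (assumed known) traffic volume."""
    return sum(v * rate for v, rate in sample) / sum(v for v, _ in sample)


def summarize(estimates, truth):
    """Bias, standard deviation, and RMSE of a list of replicate estimates."""
    bias = statistics.fmean(estimates) - truth
    sd = statistics.pstdev(estimates)
    rmse = math.sqrt(statistics.fmean([(e - truth) ** 2 for e in estimates]))
    return {"bias": bias, "sd": sd, "rmse": rmse}


def replicate_sampling(n_replicates=5000, seed=1996):
    """Generate replicate samples and summarize both estimators."""
    rng = random.Random(seed)
    samples = [draw_sample(rng) for _ in range(n_replicates)]
    return {
        "simple": summarize([simple_average(s) for s in samples], TRUE_RATE),
        "weighted": summarize([volume_weighted(s) for s in samples], TRUE_RATE),
    }


if __name__ == "__main__":
    for name, stats in replicate_sampling().items():
        print(name, {k: round(v, 4) for k, v in stats.items()})
```

Because volume and violation rate are negatively related in this made-up population, the simple average overstates the vehicle-level violation rate, while the volume-weighted estimator carries only the small bias typical of a ratio estimator. A different population or sampling design could reverse the comparison, which is the kind of question the simulation tool is meant to answer.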

UNIX COMPUTING

Advanced Perl Applications

By Phil Spector

In the previous article in this series, I gave a brief introduction to some of the features of perl that make it so attractive for rapidly developing applications like data formatting and report generating. In this article, I'd like to expand the range of possible applications by introducing the idea of packages. Basically, packages are collections of perl functions, either distributed with perl, or contributed by other authors. You include packages in your program with the require directive; perl then searches the directories stored in the @INC array to find the necessary programs, and then includes them. After that, you can refer to the functions in the packages in your program, and greatly expand the range of what you can do with your perl programs.

To illustrate these points, I'll provide simple programs to perform two interesting tasks; the first will use the system-provided chat2 package to communicate with the University of Michigan weather server, and the second will use Steven Brenner's cgi-lib package (available at http://www.bio.cam.ac.uk/cgi-lib) to produce and process an HTML form suitable for a web browser.

Communication with other Computers

The University of Michigan operates a computer which provides weather reports for the entire United States. In normal use, you telnet to the machine madlab.sprl.umich.edu through port 3000, and make a variety of menu choices to find the information you want. After displaying a banner, the following prompt appears:

Press Return for menu, or enter 3 letter forecast city code:

At this point, I would select the code "sfo" for my local forecast, which would be displayed on the screen. After the forecast, it displays a menu entitled "CITY FORECAST MENU" and prompts for a menu selection with the phrase "Selection:". I thought it would be convenient to have a program that I could execute which would supply the city code, print the forecast, and then exit from the menu, without any intervention on my part.

#!/usr/local/bin/perl
require 'chat2.pl';

&chat'open_port("madlab.sprl.umich.edu",3000) || die "Couldn't open port";
*S = *chat'S;

# Change input separator to read up until the first prompt
$/ = "3 letter forecast city code:";
$str = <S>;

# The plus sign suppresses paging on this particular server
print S "sfo+\n";

# Now we can read the input until it says "CITY FORECAST MENU", so we'll
# make that the record separator
$/ = "CITY FORECAST MENU";

# read up to the prompt, and eliminate the final line
($str = <S>) =~ s#\n.*$/##;

print $str;     # print the weather report

$/ = "Selection:";
$str = <S>;
# Send an "x" to exit, and we're done!
print S "x\n";

To do this task in perl, one feature I repeatedly used is the ability to change the definition of the input record separator. This variable, which is known as $/ in perl, defaults to newline, so that each record read by the <> input operator will normally read one line of text. Since the menu prompts from the weather server don't have newlines, I redefine the separator to read up until the next prompt, so at each stage, I can read everything I want with one execution of the input operator. Since the chat2 package opens a socket, I can use the same perl file handle to both read from and write to the remote computer.

To use the chat2 package, you first open the required port using the function &chat'open_port. The apostrophe in the name indicates that the open_port function is part of the chat package. This should return a non-null value, unless the program was unable to make a connection. Next, you associate a local file handle with the one defined within the chat2 package; from that point on, reading from the filehandle is equivalent to reading the output from the remote machine and printing to the filehandle is equivalent to typing responses to be interpreted by the remote machine. The perl program is listed above. For communicating with standard protocols (like NNTP, HTTP, etc.), port numbers can be determined by using the system call getservbyname (provided in perl) or by looking at the file /etc/services.

Processing HTML Forms

The general topics of HTML, forms and configuration of HTTP servers are beyond the scope of this article. An excellent introduction to the general topic of HTML can be found at http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html; an "instantaneous" introduction to forms and CGI scripts can be found at http://kuhttp.cc.ukans.edu/info/forms/forms-intro.html.

The basic idea behind the CGI interface is that if you point your web browser at an executable script, and if the HTTP server on the machine where the script is located permits that script to execute, the script will be run with its printed output going back to the web browser which originally called it. HTTP servers are generally configured to only allow scripts in certain directories to be executed, so check with your server's administrator to find out where to install your scripts. The cgi-lib package provides a routine called &ReadParse which gets the input from a form and creates an associative array, indexed by the names of the input fields in the form, which contains the values which were entered in each particular field. Since the script's output is displayed in the browser which called it, most CGI programmers write what are known as "comboforms"; that is, a script which both displays the form on the browser, and processes the results.

Before this starts sounding too complicated, we should look at an example, because it really is remarkable how simple it is to produce a comboform. In the example below (which is continued in a second listing further on), the perl subroutine doform is written to display the blank form. The PrintHeader subroutine of the cgi-lib package prints the necessary "Content-type" line to inform the world that it is producing an HTML document; all other fields need to be filled in by the script. The program starts by calling the &ReadParse routine. If the submit field of the form has a value, that means that the program is being called from a form; otherwise, it's being called directly, and it simply needs to print the necessary HTML to display the form. To keep this example simple, we'll just have one fill-in-the-blank field, and a set of three checkboxes, but any of the more complex form elements can be easily accommodated with this scheme. Similarly, the results are just printed back to the browser, but the full range of perl's capabilities could be used to do something more interesting with this information.

#!/usr/local/bin/perl

require 'cgi-lib.pl';

# The ReadParse routine from cgi-lib.pl creates an associative array
# containing all the variables entered from the html form

&ReadParse(*input);
if($input{"submit"}){
    print &PrintHeader;
    # (HTML tags in the print statements are reconstructed; any
    # equivalent markup will do)
    print("<html><head><title>Results from Sample Form</title></head>\n");
    print("<body>
<h1>Thanks for using the Sample Form!</h1>
<hr>
\n");
    foreach $k (keys(%input)){
        printf("<b>%s:</b><br>
%s
<p>
",$k,$input{$k});
    }
    print("</body></html>");
}
else{
    &doform;
}

Further Resources

This article doesn't begin to show you the wide variety of packages and scripts that are available to extend perl's capabilities. As always, the first place to start is the CPAN archives (http://www.perl.com/perl/CPAN/CPAN.html). Another source of scripts is the Metronet archive at http://www.metronet.com/perlinfo/scripts.

Phil Spector
UC at Berkeley

TOPICS IN INFORMATION VISUALIZATION

Emphasizing Statistical Summaries and Showing Spatial Context with Micromaps

By Daniel B. Carr and Suzanne M. Pierson

1. Introduction

This article concerns redesigning a choropleth map. During my (Dan) fellowship at the Bureau of Labor Statistics (BLS), the staff showed me press releases with maps similar to the map in Figure 1. They did not like the map and asked me to develop new and innovative methods for displaying the data. My thoughts turned quickly to micromaps. In response to Tony Olsen's guiding query, I had previously proposed micromaps for linking row-labeled plots to ecoregion maps (Olsen, Carr, Courbois, and Pierson 1996). Sue Pierson's first implementations and Pip Courbois' variations demonstrated to the team that micromaps do more than provide links. Micromap sequences directly reveal spatial patterns. Below is our rationale for evolving from a traditional choropleth map to a new and powerful template that links micromaps and row-labeled plots.

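# doform: second part of the sample form script from "Processing HTML Forms"
sub doform{
# The action field in the form below is the location of this script.
# (The tags, the action path, and the field names other than "submit"
# are reconstructed placeholders; substitute your own.)

print &PrintHeader;
print <<EOF;
<html><head><title>Sample Form</title></head>
<body>
<h1>Welcome to the Sample Form!</h1>
<form method="post" action="/cgi-bin/sampleform.pl">
Fill in the blank:<br>
My name is <input type="text" name="name">
<p>
Choose your favorite color:<br>
<input type="checkbox" name="color" value="blue">Blue<br>
<input type="checkbox" name="color" value="green">Green<br>
<input type="checkbox" name="color" value="red">Red<br>
<p>
<input type="submit" name="submit" value="Submit">
</form>
</body></html>
EOF
}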

2. Visual and Representational Problems

The choropleth map in Figure 1 has several visual and representational problems. Visually, the map is reminiscent of the pen plotter era. Representing values by line density and crosshatching is predicated upon notions of reproduction ease and cost, not upon notions of aesthetic communication. That is, lines copy better than half-toned gray and less expensively than color. Questions about state grouping reveal the groups to be administrative divisions that have little bearing on analysis. The state grouping can be dropped. However, the problems extend beyond unaesthetic appearance and irrelevant grouping.

When evaluating Figure 1, consider the story to be presented. The story focuses on sample-based estimates of unemployment rates with associated uncertainty estimates, counts of unemployed, and spatial indices. Directly to the point, The Power of Maps by Wood (1992) contains an intriguing chapter entitled "Every Map Shows This ... But Not That." If we look through statistical eyes at what is and is not represented in Figure 1, we see cartographic bias and representational problems.

The cartographic bias in Figure 1 reflects representational choice and relative emphasis. The cartographic literature (for example, see Bertin 1983, MacEachren 1994) provides systematic treatment of ways to represent variables on maps. Cartographic choices implicitly assume that the best representation, position along a common scale, is devoted to the two spatial coordinates. This leaves second best choices to show data values and other information. The systematic treatment fails to show scatterplot alternatives favoring statistics, e.g., melanoma rates on the y axis, cloud-free days on the x axis, latitude encoded as circle area and longitude encoded as circle color. Carr, Littlefield, Nicholson, and Littlefield (1987) provide an early example of a balanced representation. They use position along a common scale for data values, spatial coordinates, and time while linking subsets across panels with color. Our redesign of Figure 1 also uses position along a common scale for both data values and spatial coordinates, improving representation of the statistical information.

The traditional cartographic choice plays out in terms of emphasis. Figure 1 emphasizes state boundaries. The often-used Albers projection preserves the relative areas of the continental U.S. states. A large number of vertices is devoted to boundary representation and substantial graphic space is dedicated to representing state area. In contrast, detail associated with unemployment rates is limited to a few class boundaries. The unemployment rates appear as a caricature because the conversion to class intervals adds noise.

Class interval options are caricature options for statistical distributions. Common choices include equal size intervals and gap-based intervals covering the range. Even for percentage point options, the default is often based on the number of regions. Carr and Olsen (1995a and 1995b) argue for class intervals based on percentage points of a cumulative distribution function chosen for interpretation purposes, such as the percent of people involved. Carr and Olsen also propose a visual summary of the distribution that can appear in a small legend. In the current case, only fifty-one estimates are to be represented. We take the radical approach of directly showing all estimates.

The representational problem is more than caricaturizing the unemployment rates. The map does not show estimate uncertainties. While MacEachren (1994) describes methods for representing uncertainty on maps, they are rarely used. (A notable exception is Pickle, Mungiole, Jones, and White 1996.) Beyond failing to provide details about estimate precision, the omission of estimate uncertainties is serious on two counts.

First, the presence of uncertainties suggests that sound statistical methodology produces the estimates. The world is awash in convenience-based guesstimates. The public needs clues to decide if data is statistically sound. Omission of clues helps politics and sales compete on an equal footing with science. Frederick Mosteller (*) says, "It is easy to lie with statistics, but easier to lie without them." A corollary is: it is easy to lie with confidence intervals, but it is much easier to lie without them.

A second consideration is that the public needs to be educated about uncertainty. People may not like the probabilities of weather forecasters, but over time the reporting convention has become familiar. The failure to show confidence intervals for estimates is a missed chance to educate the public.

The most important design task is to represent the statistical summary. The spatial component of the summary is important, but secondary. The new design should reflect this priority.

Figure 1: Unemployment rates by state, 1995 annual averages. Choropleth map with states grouped into regions (New England, Middle Atlantic, South Atlantic, East North Central, East South Central, West North Central, West South Central, Mountain, Pacific). Legend classes: 7.0% or over; 6.0% - 6.9%; 5.0% - 5.9%; 4.0% - 4.9%; 3.9% or below. (U.S. rate = 5.6 percent)

3. Dot Plots, Visual Simplicity and Grouping Considerations

The variable of interest is the state unemployment rate. Labeled dot plots (Cleveland 1985, 1993) provide a good way to show such estimates. Unfortunately, dot plots have been slow to appear in government publications. The efforts of Carr, Valliant, and Rope (1996) are intended to help remedy the situation by providing JAVA-based network tools and examples using government data. Our effort extends the scope of the examples. The design goals for the current example include 1) adding information to ease interpretation and 2) simplifying visual appearance to facilitate communication with the public. This section addresses the second task, simplifying visual appearance.

For improved visual interpretation, Cleveland (1985) advocates presenting dots in sorted order. Carr (1994) and Carr and Olsen (1996) echo this advice and note that sorting improves plot simplicity by reducing the visual path between dots.

One can further simplify plot appearance. Important visual simplification techniques include grouping and layering. For example, see Kosslyn (1994) for a discussion about grouping and Tufte (1983 and 1990) for discussions of small multiples and layering. Graphics software has not successfully automated thoughtful grouping and layering, so some thought about the current example is instructive.

A list with fifty-one lines can be visually intimidating (see Carr and Olsen 1996). By analogy, a fifty-one line paragraph may visually intimidate many people. One can lose one's place. Breaking a long paragraph into short paragraphs helps the reader with visual tracking. Similarly, breaking a list of names into groups helps the reader in visual tracking. A further benefit of creating smaller perceptual groups is that readers can easily spot names at group edges. Spotting an interesting name draws the reader into the graphic. In other words, small perceptual groups increase the number of interest-based entry points.

How do we divide fifty-one states into visually effective groups? (Here we include the District of Columbia as a state, but it is often preferable to treat D.C. as a city.) One choice would be to partition the states using large jumps in the sorted rates. This has merit, but can get awkward when many states have similar values. Our approach starts with regular partitioning into groups of five. (For many applications Kosslyn (1994) recommends groups of four or less.) For vertical grouping we find that groups of five facilitate counting and still allow quick label and value matching by relative position. For example, one can easily match the third label in a group of five labels to the third dot in a group of five dots. This works even when the labels and dots are separated by nearly a page width. Consequently, grouping obviates the need for the horizontal dots that Cleveland (1985 and 1993) used to assist in visual tracking. The partitioning produces ten groups of five and one group of one.

The first layer of grouping produces eleven groups. Eleven groups are too many to put into a single simple-appearing perceptual unit. The groups need to be grouped. As shown in Figure 2, we create an additional information layer using three larger groupings with a 5-1-5 pattern. This creates symmetry and calls attention to the median.

The basic layout for Figure 2 is an eleven row (5-1-5 pattern) by four column matrix. The state names appear in the second column and unemployment rates with 95 percent confidence intervals appear in the third column. (The confidence intervals are model-based and subject to refinement. The BLS has not extended the confidence interval calculations to the seasonally adjusted estimates it often shows.) The confidence intervals detract a bit from the goal of visual simplicity, but the above discussion motivates their inclusion.

Representation of confidence intervals is a design challenge. Typical error bars draw visual attention to the least precise estimates (Carr 1994). Further, error bars detract from the visual flow in following the estimates. In Figure 2, the interval endpoints appear as small gray dots. Connecting adjacent endpoints with a thin black line reinforces desirable vertical flow and brackets the group of estimates. The pinch points call attention to the most precise estimates. In other examples, an estimate dot can overplot the confidence interval dots. In isolated cases, the slope of confidence interval lines from above or below can suggest the size of the hidden interval. In the new design, even those new to confidence intervals may surmise a connection between the California pinch point and the large number unemployed. Perhaps better approaches will emerge, but the approach in Figure 2 has considerable merit.

The next steps are to show the remaining secondary information. The information includes spatial positions and estimated numbers of people unemployed. The final step is to link everything together with the state names.

4. Micromap Design

Associated with each unemployment rate is a spatial position, the state location. In the current example, as in many studies, the exact spatial position and precise boundaries are not important. All that is needed is a map caricature showing the general position and neighborhood relationships. As demonstrated in the first column of Figure 2, a micromap for each group of five can show the state locations. A full page map is not required.

The design of small maps requires attention because distinguishing hues in small regions can be difficult. The map caricature needs to enlarge small regions while retaining enough features to provide region recognition.

December 96 Statistical Computing & Statistical Graphics Newsletter 19

The task is not as simple as it might seem. After several independent attempts, we chose to modify a state visibility map developed over a decade ago by Monmonier and illustrated in Mapping It Out (Monmonier 1993). Figure 2 shows ten micromaps on a standard page in portrait orientation. Making the micromaps much smaller will complicate color perception for relatively small states like Rhode Island. The example is pretty close to the minimum size limit. Observe that Illinois has the median rate and appears in black in the two middle maps. This avoids the need for an eleventh micromap.

5. Related Data

Carr, Valliant and Rope (1996) argue that graphics should provide metadata to facilitate proper interpretation. The current example is static, so we cannot exploit web-based access to the Bureau of Labor Statistics Handbook of Methods or to other information. However, the design readily accommodates one or two additional columns. Figure 2 shows the number of unemployed in the fourth column. While unemployment rates are useful for comparison purposes, the number of unemployed shows the importance of the rates in terms of human lives.

The confidence intervals for the counts were not available for this article. As an approximation, the rate intervals could be scaled by the population size. We chose to focus attention on the rates and to omit confidence intervals for the secondary information. The elegance of the confidence interval representation depends on sorting. Adding confidence intervals to the count estimates competes with the goal of visual simplicity.

6. Region Labels and Linking

In Figure 2, the state labels link the information. The relative vertical position of a label and a dot within a group of five is an adequate positional link. The colored dot beside the state name may not increase the matching speed over the positional link, but may remove doubt about a correct match. The colored dot in the label is an important link to a particular state. Some may assume that everyone knows state positions and question the need for a color link. However, many informed U.S. citizens may hesitate when labeling all the states on a map. Further, altered boundaries in the map caricature may slow recognition for some states. We conjecture that color links to the map increase matching speed and sometimes provide an educational tool. The color links go both ways. Some people use maps rather than names to find their state and the corresponding estimate.

With the holiday season as motivation, we seriously considered using colored names in the label column and dropping the dots. This drops one visual element and simplifies plot appearance. However, Monmonier (1993) recommends against colored labels because they need to be large enough to carry color and are difficult to read. Switching to a bold font in Figure 2 works fine for carrying color. We concur that changing colors makes reading a little harder. Further, the colorful names draw visual attention away from other columns. Still, we think the problems are minor and that colored names might be used on occasion to add variety.

7. Interpretation and Comparison

Figure 2 provides much more statistical information than Figure 1. Consider four questions.

1. What is the unemployment rate for California?
2. Is the unemployment rate higher for California or Alaska?
3. What are the confidence bounds for the California estimate?
4. On average, how many were unemployed in California?

For the first question, Figure 2 provides a more precise determination of the estimated rate. For perceptual accuracy of extraction (Cleveland and McGill 1984), dot plots with grid lines are a hard graphic to beat. For question two, Figure 2 provides a complete ranking while Figure 1 only provides a ranking of equivalence classes. Figure 1 does not provide answers for questions three and four.

The traditional choropleth map does not do as well as the linked micromap row-labeled plot template in regard to the above questions. This motivates the search for tasks in which the large choropleth map has performance advantages. Two tasks seem evident: finding the value of a position-known state and locating the values of neighboring states. The micromap template requires a scan of small maps or a list of fifty-one names before linking to a value. The scan is a slow process. An additional alphabetic-label position-link column (not illustrated here) may speed locating a given state by name. The position link following a name can be little black dots in a 5-1-5 pattern with the location dot highlighted by color (or shape). For Hawaii, the third dot would be red and link to red in the third vertical group. The real memory and search intensive task in using micromaps is to find values for a state's neighbors. With micromaps one can quickly observe if neighboring states have similar rates, but that is not the same as finding all the values. Traditional choropleth maps have a few advantages.
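The approximation mentioned in Section 5, scaling a rate interval by a population size to get a rough interval for the count, can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `scale_rate_interval` is hypothetical, the base count is treated as a known constant (its own sampling error is ignored), and the numbers are made up for the example rather than taken from the BLS estimates behind Figure 2.

```python
def scale_rate_interval(rate_low_pct, rate_high_pct, base_count):
    """Approximate a count interval by scaling a percent-rate interval.

    base_count is treated as a known constant, so any sampling error in it
    is ignored; that is the sense in which this is only an approximation.
    """
    if rate_low_pct > rate_high_pct:
        raise ValueError("lower bound exceeds upper bound")
    if base_count < 0:
        raise ValueError("base_count must be non-negative")
    return (rate_low_pct / 100.0 * base_count,
            rate_high_pct / 100.0 * base_count)


# Illustrative numbers only (not BLS estimates): a 95% interval of
# 7.0% to 8.0% applied to a base of 1,000,000 people.
count_low, count_high = scale_rate_interval(7.0, 8.0, 1_000_000)

# Report in the unit used on the count axis of Figure 2 (100,000 people).
print(f"{count_low / 1e5:.2f} to {count_high / 1e5:.2f} hundred thousand")
```

Because the scaled interval inherits its width entirely from the rate interval, it would likely understate the uncertainty of a count whose base is itself estimated.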

Unemployment Rate By State 1995 Annual Average

[Figure 2 (image not reproduced). Columns, left to right: Maps; States; Rates and 95% CI (axis: Percent, 2 to 9); No. Unemployed (axis: 100,000 People, 0 to 12). Rows are sorted by rate, highest first, in groups of five, with Illinois alone at the median:
D.C., West Virginia, California, Alaska, Rhode Island
Louisiana, Washington, New Jersey, New York, New Mexico
Alabama, Mississippi, Texas, Pennsylvania, Montana
Hawaii, Maine, Florida, Connecticut, Nevada
Massachusetts, Kentucky, Idaho, Michigan, Tennessee
Illinois
South Carolina, Maryland, Arizona, Georgia, Arkansas
Wyoming, Oregon, Ohio, Missouri, Oklahoma
Indiana, Virginia, Kansas, North Carolina, Delaware
Vermont, Colorado, New Hampshire, Wisconsin, Minnesota
Utah, Iowa, North Dakota, South Dakota, Nebraska]

Figure 2: Improved display with micromaps.
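The row layout behind Figure 2 (fifty-one regions sorted by rate, split into ten groups of five around a single median row, with a hue repeated in the same position of every group) can be sketched as follows. This is a minimal illustration, not the authors' Splus code: the state order is the one read from the label column of Figure 2, while the function names and the hue names are assumptions introduced here.

```python
# Row order as read from the label column of Figure 2 (highest rate first).
STATES = [
    "D.C.", "West Virginia", "California", "Alaska", "Rhode Island",
    "Louisiana", "Washington", "New Jersey", "New York", "New Mexico",
    "Alabama", "Mississippi", "Texas", "Pennsylvania", "Montana",
    "Hawaii", "Maine", "Florida", "Connecticut", "Nevada",
    "Massachusetts", "Kentucky", "Idaho", "Michigan", "Tennessee",
    "Illinois",
    "South Carolina", "Maryland", "Arizona", "Georgia", "Arkansas",
    "Wyoming", "Oregon", "Ohio", "Missouri", "Oklahoma",
    "Indiana", "Virginia", "Kansas", "North Carolina", "Delaware",
    "Vermont", "Colorado", "New Hampshire", "Wisconsin", "Minnesota",
    "Utah", "Iowa", "North Dakota", "South Dakota", "Nebraska",
]

# Hypothetical hue names: the article specifies only cyclic colors in a
# rough spectral order, not the actual palette.
HUES = ["red", "orange", "green", "blue", "violet"]


def micromap_groups(ranked, size=5):
    """Split a ranked list into groups of `size` above and below the median."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 0 or mid % size != 0:
        raise ValueError("need an odd count with whole groups on each side")
    upper = [ranked[i:i + size] for i in range(0, mid, size)]
    lower = [ranked[i:i + size] for i in range(mid + 1, n, size)]
    return upper, ranked[mid], lower


def color_links(groups, hues=HUES):
    """Map each region to the hue assigned to its position within its group."""
    return {name: hues[pos] for group in groups for pos, name in enumerate(group)}


upper, median, lower = micromap_groups(STATES)
links = color_links(upper + lower)
links[median] = "black"  # the median row appears in black in the two middle maps

# One line per micromap panel: the five regions it highlights and their hues.
for panel, group in enumerate(upper + lower, start=1):
    print(panel, [(name, links[name]) for name in group])
```

Keeping the median out of both halves is what lets ten panels cover fifty-one regions; the same split would apply to any odd count whose halves divide evenly into groups.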

Cartographers may also suggest that a single choropleth map provides better global spatial pattern perception than a sequence of micromaps. Since integrating information while scanning across all the micromaps is nontrivial, the claim is likely true. However, many important tasks involve local pattern perception.

For tasks involving local pattern perception, micromap sequences may be very competitive with a single choropleth map. While the colors in Figure 2 may be distracting, ten micromaps provide the rough equivalent of ten class intervals. For sound perceptual reasons, typical choropleth maps show six or fewer class intervals. This complicates direct comparison in cognitive testing. Different class intervals bring out different patterns, so there may be no clear winner.

The micromap patterns can be quite suggestive. Figure 2 shows many small groups and raises questions about economic similarities. As two of several examples, Vermont and New Hampshire form a pair while Virginia and North Carolina form another. The two bottom micromaps show a larger group of states in the upper Midwest and Northern Plains. It is not hard to integrate patterns across two small juxtaposed maps. While beyond the domain of local pattern perception, the micromap sequence wins hands down when it comes to ranking. The cyclic colors in a rough spectral order break the ties within the ten micromaps. The combination of multiple maps and color provides a complete ranking without reference to the other columns.

Variations on micromaps may strengthen global spatial pattern perception or at least bring out additional patterns. A darker shade of gray can distinguish all states above median and provide another layer of information in the top five micromaps. The darker gray region appears the same in all five maps, except for the overriding hue-linked states that provide compositional detail for the high unemployment region. New patterns emerge. For example, Appalachian states have above average rates. A similar approach accentuates the low unemployment region in the bottom five micromaps.

8. Closing Remarks

Viewing graphics as puzzles to be solved is often instructive. How do the pieces fit together and what do they mean? We conjecture that the template illustrated by Figure 2 is a puzzle accessible to many. Learning to read a dot plot is easy. Linking by color and position is simple. Determining that the visual islands are Alaska, Hawaii, and D.C. should be manageable by those not familiar with the U.S. A deep understanding of confidence intervals goes beyond the graphic, but readers new to statistics may surmise that the big dots for rate estimates are not exactly THE TRUTH. For those with a little background, the figure is pretty close to being self-explanatory.

Figure 2 has educational merit beyond showing unemployment rates. One can learn about the positions of the states. The figure can prepare people to answer the question, to what side does the median belong? More importantly, the figure can prepare people to learn about uncertainty. To the thoughtful, the figure suggests that rank orderings are a bit arbitrary in the presence of uncertainty. The figure also suggests that rate magnitude and rate importance are distinct concepts. The template of linked micromaps and row-labeled plots extends to many other spatial contexts. For example, one can show county data within a state. When fifty or fewer counties will be displayed, a layout similar to Figure 2 will likely suffice. The main challenge would be to develop a county within state visibility map. Some states are not easy.

Many variations of the template are possible besides those mentioned above. Some may prefer a different set of hues. For example, one can pick a set designed for the color blind. A good reference concerning color choice and mapping is Brewer (1994). Other variations may slightly improve perceptual accuracy of extraction for dot plots. For example, translucent dots in the right two panels will help keep the grid lines visible. A small, similar hue dot inside each big dot may help locate the dot center precisely without being too distracting. The possibilities are numerous.

The data and Splus source code used to generate Figure 2 are available through anonymous ftp (galaxy.gmu.edu). The newsletter software directory seems to change periodically. The current path for this article is pub/dcarr/newsletter/micromap. For Splus users, the matrix layout tools should be of interest by themselves and a technical report with documentation details should be available in the same time frame as this article.

I (Dan) continue to seek design challenges and opportunities for collaboration. Also, I appreciate gentle comments about potential improvements. Please contact me at the address below.

Acknowledgments

Conceptual work behind this paper was supported by the EPA under cooperative agreement No. CR8280820-01-0. Specific work on this application was supported by the BLS. The article has not been subject to review by either the EPA or the BLS, does not necessarily reflect the view of the agencies, and no official endorsement should be inferred. Special thanks go to Tony Olsen for the design challenges, Pip Courbois for experiments in micromap variations, and to Rick Valliant and Dan Rope for coordinating the BLS data and graphics access.

References

Bertin, J.B. (1983), Semiology of Graphics Diagrams Networks Maps, Translated by Berg, W.J., London, UK: The University of Wisconsin Press.

Brewer, C.A. (1994), "Color Use Guidelines for Mapping and Visualization," in Visualization in Modern Cartography, 123–147, eds. MacEachren, A.M., and Taylor, D.R.F., Oxford, UK: Pergamon/Elsevier Science.

Carr, D.B. (1994), "Converting Plots to Tables," Technical Report No. 101, Center for Computational Statistics, George Mason University, Fairfax, VA 22030.

Carr, D.B., Littlefield, R.J., Nicholson, W.L., and Littlefield, J.S. (1987), "Scatterplot Matrix Techniques For Large N," Journal of the American Statistical Association, 82, 424–436.

Carr, D.B., and Olsen, A.R. (1995a), "Parallel Coordinate Plots For Representing Distribution Summaries in Map Legends," Proceedings 1 of the 17th International Cartography Association Conference, 10th General Assembly of the ICA, 733–742.

Carr, D.B., and Olsen, A.R. (1995b), "Parallel Coordinate Variants of CDF and Quantile Plots," Statistical Computing and Statistical Graphics Newsletter, Vol. 6, No. 1, 13–18.

Carr, D.B., and Olsen, A.R. (1996), "Simplifying Visual Appearance By Sorting: An Example Using 159 AVHRR Classes," Statistical Computing and Statistical Graphics Newsletter, Vol. 7, No. 1, 10–16.

Carr, D.B., Valliant, R., and Rope, D. (1996), "Plot Interpretation and Information Webs: A Time-Series Example From the Bureau of Labor Statistics," Statistical Computing and Statistical Graphics Newsletter, Vol. 7, No. 2, 19–26.

Cleveland, W.S., and McGill, R. (1984), "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods," Journal of the American Statistical Association, 79, 531–554.

Cleveland, W.S. (1985), The Elements of Graphing Data, Summit, NJ: Hobart Press.

Cleveland, W.S. (1993), Visualizing Data, Summit, NJ: Hobart Press.

Kosslyn, S.M. (1994), Elements of Graph Design, New York, NY: W.H. Freeman and Company.

MacEachren, A.M. (1994), Some Truth with Maps: A Primer on Symbolization & Design, Washington, D.C.: Association of American Geographers.

Monmonier, M. (1993), Mapping It Out, Chicago, IL: The University of Chicago Press.

Olsen, A.R., Carr, D.B., Courbois, J.P., and Pierson, S.M. (1996), "Presentation of Data in Linked Attribute and Geographic Space," Poster presentation, ASA Annual Meeting, Chicago, IL.

Mosteller, F. (*), Personal communication. The quote is not from a publication but likely originated in a speech.

Pickle, L.W., Mungiole, M., Jones, G.K., and White, A.A. (1996), Atlas of United States Mortality, Hyattsville, MD: Public Health Service Pub. No. 97-1015.

Tufte, E.R. (1983), The Visual Display of Quantitative Information, Cheshire, CT: Graphics Press.

Tufte, E.R. (1990), Envisioning Information, Cheshire, CT: Graphics Press.

Woods, D. (1992), The Power of Maps, New York, NY: The Guilford Press.

Daniel B. Carr
George Mason University
[email protected]

Susanne M. Pierson
Anteon Corporation
[email protected]

REPORTS FROM JSM 96

Scientific Program
by James Rosenberger and Stephen Eick

Statistical Computing

At the Joint Statistical Meetings in Chicago in August 1996, the Statistical Computing Section sponsored 5 invited sessions and 17 contributed paper sessions. In addition there were 16 posters and 5 roundtable luncheons sponsored by the section.

The invited sessions were organized by Rob Tibshirani, Program Chair, and included the following stimulating themes. A session on Monday focused on "Bayesian Inference for High-Dimensional Problems." Chaired and organized by Radford Neal, it covered topics on A [...] Multiresolution Model; Bayesian Regression using Gaussian Process Priors; and Spatial Subordinates and [...]. Art Owen organized a session on Tuesday titled "Statistics and Numerical Integration" which included talks on: Monte Carlo and Quasi-Monte Carlo; Number-Theoretic Methods in Multivariate Statistical Tests; and Lattices and Dual Lattices in Experimental Design for Fourier Models. Also on Tuesday a session on "Wavelets and Time-Frequency Analysis," organized by Jonathan Buckheit, included talks on: Curve Estimation with the Stationary Wavelet Transform; Improved Local Discriminant Bases Using Empirical Probability Estimation; and Alternatives to Karhunen-Loeve Expansions Using Libraries of Orthogonal Bases. A Wednesday session organized by Martin Schumacher on "Resampling and Cross-Validation in Model Building" included talks on Cross-Validation in [...]; The Bootstrap and Modulation Estimators; and Reduction of Bias Caused by Model Building. The final invited session, "Algebraic Algorithms," organized by James Stafford, included talks on: Symbolic Ito Calculus: A Further Application to the [...] of Shape; Stochastic Differential Equations; A Computer Algebra for Sample Survey Theory; and An Operator for the Symbolic Calculation of Properties of Bootstrap Estimates.

The contributed paper sessions included many excellent presentations, nicely organized around coherent themes, due to the large number of submissions. Excellent sessions were organized on Linear Models and Regression, [...] and Mixed Models, Tree-Based and other Models, EM Algorithm and Simulation, Bayesian and Hierarchical Bayes Methods, Multivariate Analysis, Markov Chain Monte Carlo Implementation, Clustering, Classification, Time Series and Nonparametrics. Noteworthy sessions also included the Special Contributed Session with the winners of the Student Paper Competition sponsored by the Statistical Computing Section. A special co-sponsored session highlighted presentations from the Undergraduate Data Analysis Contest.

As program chair for the 1997 meeting, I look forward to your many contributed paper and poster submissions to complement the invited program, and am putting together an exciting program for the Anaheim meeting. Expect to see more electronic presentations, and special poster sessions highlighted this year. Feedback and suggestions for improving the meetings are always welcome. A preliminary look at the program will be provided in the next Newsletter.

James L. Rosenberger
Statistical Computing
1996 Program Chair-Elect
[email protected]

Statistical Graphics

The 1996 summer meetings featured four invited Statistical Graphics sessions: Information Visualization: The Next Wave In Statistical Graphics, organized by Steve Eick; Innovations in Graphics, again organized by Steve Eick; Statistical Graphics and Multimedia Education, organized by David Scott; and Transactional Data Analysis, organized by Daryl Pregibon. As past graphics program chair, I now appreciate how difficult it is to organize new and innovative invited sessions. Please, those of you with session ideas, become involved. The chair needs your help! In my opinion, as you might expect, all of this year's sessions were outstanding.

Visualization, of which Statistical Graphics is part, is an exploding research area. A new IEEE journal recently started (March 1995), IEEE Transactions on Visualization and Computer Graphics, and has over 6 thousand subscribers. A new symposium, Information Visualization, is now in its third year. Much of the interest in this area has been generated by the explosive growth of the World-Wide-Web, widespread availability of networked databases, and recent surge of interest in Data Mining. Computer Scientists, particularly those involved with Human Computer Interaction, have been building many novel and dynamic interfaces for displaying data.

One of the themes for this year's sessions was to invite some of the well-known computer scientists to participate in our meetings. I occasionally see powerful and effective statistical graphical displays that lack the engaging user interfaces that characterize much of the work in [...]. Conversely, the computer scientists' tools often lack the data analysis sophistication that we take for granted. By bridging the two communities, I hoped to encourage progress in both disciplines and to broaden our respective perspectives. Several of the sessions were followed by intense and stimulating discussions, suggesting progress. I welcome your thoughts, feedback, and suggestions.

Stephen G. Eick
Statistical Graphics, 1996 Program Chair
[email protected]

Business Meeting and Mixer at JSM 96
By Nandini Raghavan

The Statistical Graphics and Computing Sections held their joint business meeting and mixer on the evening of Monday, August 5th. With snacks and wine aplenty, the buzz of conversations filling the air and the pleasant anticipation of prizes to be won later, it made for a very enjoyable evening. Sallie Keller-McNulty, Chair, Statistical Computing Section, and William DuMouchel, Chair, Statistical Graphics Section, informed the attendees about the happenings in their sections. And then, it was time for the raffle drawings! The door prizes were generously donated by the following exhibitors. Their email addresses and URLs can be found through the section home page at: http://orion.oac.uci.edu/~rnewcomb/statistics/graphics/graphics.html

Academic Press Inc.
Arnold Publishers
Chapman & Hall
Conceptual Software, Inc.
Current Index to Statistics
Data Description, Inc.
The Interface Foundation
Mathsoft, Inc.
McGraw-Hill
NCSS Statistical Software
[...] Stats
Sage Publications, Inc.
SAS Institute, Inc.
Scientific Computing Associates
SIAM: Contact Vickie Kearn, [email protected]
Springer-Verlag New York, Inc
SPSS, Inc
Statistical Graphics Corp.
Texas Instruments
John Wiley & Sons Inc, Publishers

Nandini Raghavan
The Ohio State University
[email protected]

CONFERENCE NOTICES

The Third North American Conference of New Researchers
July 23-26, 1997
Laramie, Wyoming

The purpose of this meeting is to provide a venue for recent Ph.D. recipients in Statistics and Probability to meet and share their research ideas. All participants will give a short expository talk or poster on their research work. In addition, three senior speakers will present overview talks. Anyone who has received a Ph.D. after 1992 or expects to receive one by 1998 is eligible. The meeting is to be held immediately prior to the IMS Annual Meeting in Park City, Utah (July 28–31, 1997), and participants are encouraged to attend both meetings. Abstracts for papers and posters presented in Laramie will appear in the IMS Bulletin.

The New Researchers' Meeting will be held on the campus of the University of Wyoming in Laramie, and housing will be provided in the dormitories. Transportation to Park City will be available via a charter bus. Partial support to defray travel and housing costs is available for IMS members who will also be attending the Park City meetings, and for members of sponsoring sections of the ASA. The Section on Statistical Computing is one of the many sponsors of this meeting and applicants who are members of this section should indicate this on their application. Additional information on the conference and registration is available at the website: http://www.math.unm.edu/NR97.html. Or contact Prof. Snehalata Huzurbazar, Department of Statistics, University of Wyoming, Laramie, WY 82071-3332, USA; e-mail: [email protected]; fax: 307-766-3927.

This meeting is sponsored in part by the Institute of Mathematical Statistics; the National Science Foundation, Statistics and Probability Program; and the ASA Sections on Bayesian Statistical Sciences, Statistical Computing, and Quality and Productivity.

New York University Symposium on Recent Developments in Smoothing Methods
Friday, May 30th, 1997

• Sponsored by the Department of Statistics and Operations Research, Leonard N. Stern School of Business
• Cosponsored by the Section on Statistical Computing, American Statistical Association

The existence of high speed, inexpensive computing has made it easy to look at data in ways that were once impossible. These computational advances have led to great interest in what might be termed flexible models, where strict parametric forms are replaced with smooth representations of underlying patterns and relationships.

The symposium will bring together leading researchers in the theory and practice of smoothing methods. The objective is to create a forum for the discussion of recent developments and long standing issues in the field. All interested researchers and practitioners are welcome.

Speakers and tentative titles of talks:

• Assessing lack of fit for parametric regression models via [...] techniques
  R. L. Eubank (Texas A&M University)
• A versatile approach to local modeling
  J. Fan (University of North Carolina)
• Extended linear modeling and an application to speech recognition
  C. L. Kooperberg (University of Washington)
• Bayesian wavelet shrinkage
  J. S. Marron (University of North Carolina)
• Remarks on making smoothing methods work better in several dimensions
  D. W. Scott (Rice University)

For additional information contact Jeffrey S. Simonoff:
phone: (212) 998-0452
Fax: (212) 995-4003
E-mail: [email protected]
WWW: http://www.stern.nyu.edu/SOR (Click on "Events")

INTERFACE '97

29th Symposium on the Interface: Computing Science and Statistics
May 14-17, 1997
Holiday Inn - Houston Medical Center
Houston, Texas

The theme of Interface '97 is "Mining and Modeling of Massive Data Sets in Science, Engineering, and Business," with subthemes in [...] and Graphics. The keynote speaker is Jerry Friedman, Stanford University. Over 25 invited paper sessions have been organized, and contributed papers are sought. Partial funding for young investigators and students may be available, subject to final grant funding.

The meeting is sponsored by the Interface Foundation of North America. Cooperating Institutions are the American Statistical Association, the Institute of Mathematical Statistics, the International Association for Statistical Computing, the Society for Industrial and Applied Mathematics, the Institute for Operations Research and the Management Sciences, and the Biometrics Society (ENAR and WNAR).

Organizer and program chair is David W. Scott, and host institutions include Rice University and M.D. Anderson Cancer Institute. Full information on hotel, program, exhibitors, and activities can be obtained by accessing the World Wide Web site http://www.stat.rice.edu/, by sending e-mail to [email protected], or by writing to Interface '97, c/o David W. Scott, Department of Statistics, MS-138, Rice University, 6100 Main Street, Houston, TX 77005-1892, USA, 713-527-6037.

SECTION OFFICERS

Statistical Computing Section - 1997

Daryl Pregibon, Chair; 908-582-3193; AT&T Laboratories; [email protected]
Karen Kafadar, Chair-Elect; 303-556-2547; University of Colorado-Denver; [email protected]
Sallie Keller-McNulty, Past-Chair; 913-532-6883; Kansas State University; [email protected]
James L. Rosenberger, Program Chair; 814-865-1348; The Pennsylvania State University; [email protected]
Russell D. Wolfinger, Program Chair-Elect; SAS; 919-677-8000; [email protected]
Mark Hansen, Newsletter Editor (96-98); 908-582-3869; Bell Laboratories; [email protected]
Evelyn M. Crowley, Secretary-Treasurer (97-98); 317-494-6030; Purdue University; [email protected]
James S. Marron, Publications Liaison Officer; 919-962-5604; University of North Carolina, Chapel Hill; [email protected]
MaryAnn H. Hill, Rep. (95-97) Council of Sections; SPSS; 312-329-2400; [email protected]
Janis P. Hardwick, Rep. (96-98) Council of Sections; 313-769-3211; University of Michigan; [email protected]
Terry M. Therneau, Rep. (97-99) Council of Sections; 507-284-1817; Mayo Clinic; [email protected]
Naomi S. Altman, Rep. (97-99) Council of Sections; 607-255-1638; Cornell University; naomi [email protected]

Statistical Graphics Section - 1997

Sally C. Morton, Chair; 310-393-0411 ext 7360; The Rand Corporation; Sally [email protected]
Michael M. Meyer, Chair-Elect; 412-268-3108; Carnegie Mellon University; [email protected]
William DuMouchel, Past-Chair; 212-305-7736; Columbia University; [email protected]
Dianne H. Cook, Program Chair; 515-294-8865; Iowa State University; [email protected]
Edward J. Wegman, Program Chair-Elect; George Mason University; 703-993-1680; [email protected]
Mario Peruggia, Newsletter Editor (96-97); 614-292-0963; Ohio State University; [email protected]
Robert L. Newcomb, Secretary/Treasurer (97-98); 714-824-5366; University of California, Irvine; [email protected]
Michael C. Minnotte, Publications Officer; 801-797-1844; Utah State University; [email protected]
Lorraine Denby, Rep. (96-98) to Council of Sections; 908-582-3292; Bell Laboratories; [email protected]
Colin R. Goodall, Rep. (95-97) to Council of Sections; 814-865-3993; The Pennsylvania State University; [email protected]
Roy E. Welsch, Rep. (97-99) to Council of Sections; MIT, Sloan School of Management; 617-253-6601

INSIDE

A WORD FROM OUR CHAIRS
  Statistical Computing .... 1
  Statistical Graphics .... 1
SPECIAL FEATURE ARTICLE
  Computational Anatomy: An Emerging Discipline .... 1
EDITORIAL .... 2
CALL TO ACTION
  Data Mining .... 8
BITS FROM THE PITS
  A Simulation Tool For Evaluating Design And Analysis Options For Monitoring Highway Traffic Characteristics .... 9
UNIX COMPUTING
  Advanced Perl Applications .... 14
TOPICS IN INFORMATION VISUALIZATION
  Emphasizing Statistical Summaries and Showing Spatial Context with Micromaps .... 16
REPORTS FROM JSM 96
  Scientific Program .... 24
  Business Meeting and Mixer at JSM 96 .... 25
CONFERENCE NOTICES
  The Third North American Conference of New Researchers .... 25
  New York University Symposium on Recent Developments in Smoothing Methods .... 26
  Interface '97 .... 26
SECTION OFFICERS
  Statistical Graphics Section - 1997 .... 27
  Statistical Computing Section - 1997 .... 27

The Statistical Computing & Statistical Graphics Newsletter is a publication of the Statistical Computing and Statistical Graphics Sections of the ASA. All communications regarding this publication should be addressed to:

Mark Hansen
Editor, Statistical Computing Section
Statistics Research
Bell Laboratories
Murray Hill, NJ 07974
(908) 582-3869  FAX: 582-3340
[email protected]
http://cm.bell-labs.com/who/cocteau/

Mario Peruggia
Editor, Statistical Graphics Section
Department of Statistics
The Ohio State University
Columbus, OH 43210-1247
(614) 292-0963  FAX: 292-2096
[email protected]
http://stat.ohio-state.edu/~peruggia/

All communications regarding membership in the ASA and the Statistical Computing or Statistical Graphics Sections, including change of address, should be sent to:

American Statistical Association
1429 Duke Street
Alexandria, VA 22314-3402 USA
(703) 684-1221  FAX (703) 684-2036
[email protected]

This publication is available in alternative media on request.