arXiv:0806.0560v1 [astro-ph] 3 Jun 2008 and analysis. config- scientific meet- highly exploratory for and of promise needs open wider the and such ing deeper that offer suggest systems urable and scriptability demon- extensible ISIS We and of modularity more strong of exploita- systems. the reach full the strate analysis within its the interactive are limit that in configurable Problem, tasks to scientific Box for referred Black tion the interpreter, as TCL archi- large controlling and XSPEC of numerical its internals that and the is between theme nu- discontinuities anal- tectural emergent and of The enablers generate powerful interpreted quickly ysis. as of to them roles manipulate capability the merically the A upon and placed practice. arrays, in is astronomy emphasis series of heavy a representative from examples drawn systems contrasts of both with of surveyed, features broadly The are criteria. 2002), general (Houck these as- ISIS meet and in 1996) used (Arnaud actively XSPEC tronomy, applications analysis spectral source of details plumbing platform. the computational on algorithmic given less on a and free- more be concerns possible, concentrate scientific should as to or general customization practitioner and the the powerful, ing of simple, mech- the as results that made and even is corollary perhaps for A or anisms supported, creators. and yet its by not system crucial envisioned, directions the is in incor- into it to it scien- codes drive scientists experimental so enables for custom, This complete, systems porate extensible. software truly be never sense they that are this analysis to In can used tific knowledge technology new distilled. which of from be limits data analyze the and also acquire but knowledge, of lcrncades [email protected] address: Electronic rpittpstuigL using 2018 typeset 2, Preprint November version Draft h aetmjrvrin fII n XSPEC, and ISIS of versions major latest The open- two which to extent the compares paper This boundaries the only not pushes science of pursuit The 12.x n xesbeaayi ltomsc sII aiiae ayforms many a facilitates how ISIS headings: show as Subject such we platform generation, paralle analysis th code analysis, extensible interface, and numerical automated top-level and modeling, management, the custom presen data in from structure examples hidden internal from features Drawing XSPEC We important the its modeling. its user, spectral that of conventional astronomical and beyond the “loca astronomers, use contributed of its to user with value of limitations inclusion immense co of the of of via example commonly strong been a extensible—most has as offered XSPEC Spectral ISIS Interactive with that the analysis, customized ISIS on and is XSPEC tool, analysis spectral epeetaqatttv oprsnbtensfwr feature software between comparison quantitative a present We epciey eeue norsuy While study. our in used were respectively, EODXPC OAD IHYCNIUAL SRPYIA A ASTROPHYSICAL CONFIGURABLE HIGHLY TOWARDS XSPEC: BEYOND 1. A INTRODUCTION T E tl mltajv 08/22/09 v. emulateapj style X aaAayi n Techniques and Analysis Data al nttt o srpyisadSaeResearch Space and Astrophysics for Institute Kavli ascuet nttt fTechnology of Institute Massachusetts .S ol,M .Nowak A. M. Noble, S. M. rf eso oebr2 2018 2, November version Draft abig,M 02139 MA Cambridge, 1.4.x ABSTRACT igepeiinFortran77. precision single l rga.Oesol o,hwvr qaeteact the equate however, sin- a not, as should seamless One of as appear program. to gle made with be can, effort, applications scheme sufficient Cooperating balancing sensible and a entity. goals. with reuse, is logical deadlines it fostering complexity, single tools, managing known a for in into trust programs on Building distinct be. would but memory, tary in image object an shared smooth a to importing say by however module, space ISIS,” an - address in “working issuing external its be an enlarging example, not spawn would to for tool ISIS sualization scheme, from call this system address- In operating applications or the access within memory. only data able which or code thereof, the portions modify or commands, of earlaeo SE 2wscra20) ti more is it models custom 2005), non- of circa variety public, a was with first 12 extend (the to XSPEC use difficult of wide release in beta remains 11 XSPEC rcal,cnrt icsin oad hsedw de- we end this a Towards into fine debate discussion. open-ended concrete potentially tractable, a transform to aim are achieved we be demonstrated principle tools, in or might from what using languages with result of concerned any not combination virtually any replicate program- nearly strong eventually a can that premise with mer the system On each effort. in achieve reasonable can scientist motivated cally favorably 12. less compare XSPEC would does extensibility than of terms in so 1 ercgieteuiiyo nertn complemen- integrating of utility the recognize We udn u nevrwl etesneo htatypi- a what of sense the be will endeavor our Guiding o xml,XPC1 uprsol oa oescddin coded models local only supports 11 XSPEC example, For • okn ihna application an within working aapout ln h a,te odn these application loading parent the then into way, back the data temporary create new along and state products this se- read data a to spawning tools files, of disk ries to state internal dumping ihrXPCo SSbtrte htcnbe can what rather but ISIS or XSPEC either saBakBxPolm ihmany with Problem, Box Black a ts emphasis Our System. Interpretation sdsorgn srcustomization. user discouraging us within favne srpyia inquiry. astrophysical advanced of fiual otae hl noting While software. nfigurable ftedfcosadr X-ray standard defacto the of s ueial citbe modular, scriptable, numerically oes—eietf series a identify models”—we l ru htfo h viewpoint the from that argue optto,visualization, computation, l cetfi oei moderately is core scientific ah yaodn yohtcl we hypotheticals avoiding By each. oma h execution the mean to NALYSIS 1 and 2 Noble et al. with the ability to was recently added (ca. 2005). XSPEC was also one of the earliest astronomical applications to adopt a widely- • invoke a function to operate upon a data structure, used, general-purpose scripting language for interactive with both resident in memory. use and batch control; the incorporation of TCL, intro- duced in version 10 and stabilized in version 11, provided In this paper we will argue that the latter offers greater programmability and a path to graphical interfaces immediacy of use, as well as significantly more algorith- with the Tk widget set. mic, data, and visualization flexibility. It also addresses ISIS was conceived to support analysis of high- real scalability concerns as datasets increase in size or resolution Chandra X-Ray gratings spectra, then quickly tasks increase in time complexity. A stark example of grew into a general-purpose analysis system. ISIS is es- this is given in Davis et al. (2005), which discusses the sentially a superset of XSPEC, combining all of its mod- methods employed to analyze the megasecond observa- els (though they are not required for ISIS operation) and tion of Cassiopeia A (Hwang et al. 2004) by the Chan- more with the S-Lang scripting language (Ayers 1996), dra X-Ray Observatory in 2004, which yielded the first whose mathematical performance rivals both commer- ever map of electron acceleration in a supernova remnant cial packages such as MatLab (Gilat 2004) or IDL (Kling (Stage et al. 2006). Over 9 million PHA spectra were 1999) as well as the numerical extensions for popular extracted from over 300 million events, with hundreds of open source languages (Noble 2007, §3). thousands of spectral fits performed in ISIS to yield arc- As with XSPEC, users may define custom models in second resolution spatial maps of plasma temperatures, compiled code, external tables, or as a string specifying line emission, and Doppler velocities. Custom modules simple arithmetic combinations of existing models, but were developed to manage the multi-gigabyte files in- ISIS takes it further by also allowing models to be inter- volved and distribute the spectral extractions and fits preted S-Lang functions; this supports rapid prototyping across a network of workstations using the Parallel Vir- and, because of the high-performance of S-Lang numer- tual Machine (Geist et al. 1994). Based upon the time ics (§3), need not sacrifice speed for the convenience of required to extract a single spectrum from these enor- using an interpreter. XSPEC does not support the use mous datasets, with the standard Chandra dmextract of TCL procedures as models. tool in CIAO (ca. version 3.2), the authors extrapolated The chief means of controlling XSPEC and ISIS in- that the spectral extractions alone would have required teractively is by textual prompt, with both providing nearly 10 years had a tool-based approach been taken. In VI and key bindings through GNU readline, contrast, each run through the entire PVM-based analy- as well as filename completion and persistent command sis sequence required less than 12 days to complete. history. XSPEC accepts commands entered in abbre- 2. viated form, such as mo for model, but does not auto- FEATURE SURVEY complete TCL command or variable names from partial XSPEC and ISIS have been primarily concerned with input. ISIS provides strong auto-completion facilities, al- fitting models to 1D spectra: a theoretical model of pho- lowing any S-Lang function or variable name to be iden- ton counts is calculated, convolved with an instrument tified from minimal partial input. In addition to the response, and compared to the actual photon counts ob- command prompt, optional packages providing graphi- served by the detector, using a given fit statistic (typi- cal interfaces, such as the Gtk-based VWhere (§7.4) or the cally χ2). Each provides mechanisms for indexed loading SAOds9 (Joye and Mandel 2003) image display module and management of data (observed counts, instrument (§7.2), have been publicly available and utilized within responses, et cetera), as well as the capacity to visually ISIS for several years. Graphical extensions for XSPEC inspect, using PGPLOT to construct a variety of 2D plots, have been discussed in the literature (Jordan et al. 1994) the data, models, residuals, fit statistics, and so forth. and online, but, other than wholly external tools like fv While initially targeted for X-ray analysis, both XSPEC (Pence et al. 1997), none seem publicly available for use and ISIS have been utilized in multiple wavebands. The within XSPEC proper. primary authors of XSPEC and ISIS are practicing as- The XSPEC prompt presents an appealingly simple, tronomers with active publication records in the domains everything-is-a-command interface, and fosters ease of served by their applications. Both applications are thus use by offering brief, germane command names, assuming subject to continuous, rapid, and rigorous scientific vet- sensible defaults, and in some cases transparently com- ting. ISIS and XSPEC update cycles are measured in bining multiple operations into one: e.g. loading both terms of days and weeks, which can be beneficial for users data and background files with a single command. A under time constraints before their proprietary data go typical command is shorter and requires less punctua- public. tion in XSPEC, although the auto-completion capability XSPEC is the tool most widely used in X-ray astron- of ISIS can compensate for its relative verbosity. In short, omy for spectral fitting; with a legacy spanning more XSPEC makes it easy to perform many common spectral than two decades and hundreds of citations, its value analysis tasks. to the scientific community is beyond question. XSPEC The ISIS prompt was designed to be a proxy S-Lang bundles more than 50 theoretical models, each of which interpreter (Fig. 3), making ISIS programmable and ex- may also be used by external programs simply by link- tensible since inception. Its everything-is-a-function phi- ing to its libraries. The core set of models may be ex- losophy permeates the implementation and user experi- tended with either precomputed file-based table models ence, providing, at the cost of a steeper initial learning curve, considerable flexibility and power. Consider for il- or compiled codes. The bulk of compiled XSPEC mod- lustration the Astrophysical Plasma Emission Database els are coded in Fortran, however as part of a significant (APED) of Smith et al. (2001). Because ISIS provides redesign effort support for custom and C++ models Beyond XSPEC: Towards Highly Configurable Astrophysical Analysis 3

Fig. 1.— Ionization fractions for Fe XVII through XXI. Fig. 2.— Performance of IDL 6.1 (binary distribution) and S- Lang (statically linked), on √b2 4ac for arrays of various sizes. programmatic access to virtually all of APED, curves − like those of Fig. 1, showing the ionization fractions at terpreted numeric arrays cannot be utilized in XSPEC, 25 temperatures for Fe XVII through XXI, can be cre- because the TCL extensions BLT and NAP (see next sec- ated with just a few statements tion) make this possible; nor does the problem lie with temperature = 10.0^[6.0:7.2:0.05]; PGPLOT, since it provides more capabilities than XSPEC foreach ion ( [17, 18, 19, 20, 21] ) { currently exploits. Rather, it is the XSPEC architecture fraction = ion_frac(Fe, ion, temperature); which does not connect the two in a straightforward man- oplot (temperature, fraction); ner, leaving the user little choice but to look elsewhere } for open-ended numerics and visualization. This opacity and disconnectedness is evident in other functional areas either directly from the ISIS prompt or within analysis of XSPEC, notably the manner in which data are input scripts. The importance to X-ray analysis of this sort of & managed (§4) and how custom models are specified facile manipulation of atomic data is heavily underscored (§5). in Kallman and Palmeri (2007). Echoing similar criticisms levied upon graphical inter- faces when compared to command lines (Norman 2007; 2.1. The Black Box Problem Bland et al. 2007), we collectively refer to these issues as It is not clear how a similar ad-hoc query of APED the Black Box Problem: by hiding complexity to make could be formulated in XSPEC, nor how the results of common tasks easy, uncommon or novel tasks can be such might be used within it for further analysis. One made difficult or impossible. problem is that the use of APED in XSPEC is highly The ease of use that has been a hallmark of XSPEC– constrained, due to it being hidden, e.g., within the in- written by astronomers to do things astronomers need, in ternals of the apec model and identify commands. An- a way natural to them—has served the community well. other obstacle is that high-level plotting in XSPEC is not A challenge, though, is that as instruments and the data generic: the plot and iplot commands are also opaque, collected from them grow in size and complexity, caus- in that they take their vectors from internal state rep- ing us to consider new questions and possibly develop resenting the current data or model under inspection. new techniques to answer them, the Black Box Problem Intimately tied to the semantics of 1D spectral analysis, can engender You May Only! patterns of analysis, rather these commands may not be used until after a dataset than foster rapid, ad hoc explorations of What If? sce- has been loaded. narios. More modular, configurable, and open implemen- While ISIS contains similar visualization commands tations (Kiczales 1996) can help resolve this tension, al- specifically tailored to spectral analysis, the oplot func- lowing applications to rapidly evolve to suit specific user tion shown above is part of a family of routines which needs while freeing their primary authors of the burden of allows one to create arbitrary image, scatter, histogram, coding and maintaining such enhancements themselves. and contour [over]plots from arbitrary data, indepen- dently of the current spectra being analyzed. This makes 3. HIGH PERFORMANCE NUMERICS exploratory coding and visualization like Compact multidimensional numerics are a major ba- isis> x = [PI/2: 2*PI: PI/100] sis for the popularity of commercial toolsets such as IDL isis> plot(x, cos(x)) or MatLab. The innate capability of S-Lang in this re- isis> oplot(x, sin(x)) gard was a primary motivator for its adoption in ISIS,3 as natural in ISIS as it is in other interactive analy- and endows it with analytic expressiveness and scripting sis systems such as IDL or PyRAF (de La Pe˜na et al. performance not equaled in XSPEC. 2001). In XSPEC such dynamic creation of interpreted As indicated in the earlier code fragments, complex arrays and direct mathematical manipulation and visu- mathematical abstractions may be stated concisely in S- alization of them is more difficult, involving a melange Lang, without regard to whether they will operate upon of TCL code, XSPEC commands, intermediate files, and 3 2 Another motivator was wide availability: S-Lang is utilized low-level QDP/PLT directives. The issue is not that in- in numerous open source projects and, in part by virtue of being bundled as a core component of every major Linux distribution, is 2 Or going outside of XSPEC to use tools like fv or XIMAGE. available on millions of machines worldwide. 4 Noble et al.

ISIS Command Interpreter S−Lang Interpreter Dynamically S−Lang Library Loadable Modules

Off−The−Shelf (OTS) and System Libs

cfitsio PGplot readline XSPEC GSL PVM GTK volview ISIS ISIS ISIS ISIS ISIS ISIS module module module module module module module module Atomic Emis− ASCII Fitting Modeling Histo− DB sivity I/O module module gram cfitsio PGplot GNU XSPEC GNU volpack module module module or Scient. PVM GIMP module library library models toolkit Toolkit rendering S−Lang Library library readline library Libraries

Fig. 3.— The ISIS main() program is a thin ANSI C layer above the S-Lang interpreter, used primarily to establish hooks to modules and gather user input. Taking modularity to this extreme, where carefully orthogonalized components provide the bulk of capability, provides a clean and flexible implementation. Most modules may also be used outside of ISIS. scalars, vectors, multidimensional arrays, or some com- scripting means that rapid development techniques— bination of each, and are computed with performance on irrespective of language—can be applied to a broad scope par with commercial software (e.g., as in Fig. 2). In this of analysis problems, allowing the writing of compiled section we discuss the more involved example of Fig. 4, code to gradually become a last resort instead of the consisting of an orbital model implemented in pure S- primary avenue of attack. S-Lang bears a strong re- Lang and taken “as is” from an MIT research effort The semblance to C and IDL, arguably the most popular model was fit to a coarsely binned lightcurve (provid- scripting language used by astronomers, and in fact we ing an example of how ISIS can model non-spectroscopic data directly from in-memory S-Lang arrays, as discussed define orbital(lo, hi, par) { in §4), and represents a steady amplitude curve inter- variable a,c,w1,w2,s1,s2,s3,m1,m2,m3,b1,b2,b3,i,=@lo; rupted by an eclipse with a quadratic ingress and egress, a = par[0]; c = par[1]; w1 = par[2]; w2 = par[3]; with low-level emission during the eclipse also modeled s1 = par[4]; s2 = par[5]; s3 = par[6]; m1 = par[7]; by a quadratic function. (Trowbridge et al. 2007). m2 = par[8]; m3 = par[9]; b1 = par[10]; b2 = par[11]; As a means of assessing the numerics and extensibil- b3 = par[12]; ity of XSPEC we attempted to translate this model into i = where( lo >= c-w1/2 and hi <= c+w1/2 ); TCL; for completeness we also made Perl and Python r[i] = s1 * ((hi[i]+lo[i])/2)^2 + s2 * ((hi[i]+lo[i])/2) + s3; translations, since these languages are actively employed i = where( lo >= c+w1/2 and hi <= c+w2/2); by astronomers in systems like PDL and PyRAF and so r[i] = m1 * ((hi[i]+lo[i])/2)^2 + m2 * ((hi[i]+lo[i])/2) + m3; are useful contrasts for gauging the numerical capabili- i = where( lo >= c-w2/2 and hi <= c-w1/2); ties of S-Lang. Because neither TCL , Perl, nor Python r[i] = b1 * ((hi[i]+lo[i])/2)^2 + b2 * ((hi[i]+lo[i])/2) + b3; are intrinsically vectorized we used numerical extension modules for each conversion: BLT & NAP for TCL, PDL i = where( lo >= c+w2/2 or hi <= c-w2/2); r[i] = 1; for Perl, and Numeric, NumArray, & NumPy for Python. The Python and Perl conversions were straightforward, i=where((lo < c-w1/2 and hi > c-w1/2) or (lo < c+w1/2 and hi > c+w1/2)); with Python being the somewhat easier of the two; we did r[i] = 0; not complete the conversion to TCL, however, largely be- i=where((lo < c-w2/2 and hi > c-w2/2) or (lo < c+w2/2 and hi > c+w2/2)); cause neither BLT nor NAP provided a clear equivalent r[i] = 1; to the where command or a means of using the results of return a*r; such for array slicing. This model would therefore need } to be recoded in a compiled language before it could be Fig. 4.— Pure S-Lang orbital model, fit by ISIS directly to in- used in XSPEC. A plot of the performance of Perl 5.8.4 memory data arrays. As described in Trowbridge et al. (2007), the and Python 2.4 implementations of this model, divided model represents a lightcurve of constant amplitude a, with an eclipse centered at c, with a width w2 from the start of ingress to by the corresponding S-Lang 2.0.7 performance, is given the end of egress. The ingress and egress have a quadratic form in Fig. 5. (m and b parameters), and there is a quadratic “bounceback” (s The testing methodology behind Figs. 2 and 5 is de- parameter) with width w1. scribed in Noble (2007), along with memory statistics and additional tests showing similar performance trends. and colleagues have converted numerous IDL and Mat- Briefly, the datapoints in each curve are the mean times Lab scripts to S-Lang for use in ISIS with relative ease. of 1000 invocations of the model on a given grid size (31 in all, from 1 to 1e6 bins); all codes were com- 4. piled using GCC 3.3 with -O3 optimization, and exe- ARRAY-BASED INPUT cuted on a dual Athlon (1.8Ghz) machine running De- As we found with the orbital model, converting such bian 3.1. Although not a comprehensive series of bench- scripts to TCL for use in XSPEC would pose a consider- marks, these results hint that the numerical engine of able challenge: although the primary interface of XSPEC ISIS is among the strongest available. High performance is a scriptable interpreter in which multidimensional ar- rays may be created and mathematically manipulated— Beyond XSPEC: Towards Highly Configurable Astrophysical Analysis 5

ing file-based input in the manner of XSPEC, but with ASCII in addition to FITS, ISIS also permits most facets of the modeling process to be specified directly from in- terpreted S-Lang arrays in memory, including the the- oretical and observed counts, ARF, RMF, and back- ground. The problem of augmenting spectroscopy with data in foreign formats is thus reduced from having to master XSPEC internals to merely being able to create S-Lang arrays from such files. This is within the reach of end users, not just application authors, particularly because automatic code generators can then be employed to sim- plify the creation of scriptable wrappers for the relevant I/O libraries. Fig. 5.— Performance of orbital models implemented in Perl 5.8.4 and Python 2.4, relative to S-Lang 2.0.7. TCL is not represented As an example of how these considerations can mat- here because neither of its numerical extensions, BLT and NAP, ter in practical use, consider the HDF5 file format and offered a clear equivalent to the where() function. I/O library (Folk et al. 1999). While FITS is the stan- dard format for archiving and distributing astrophysical with the aid of the BLT or NAP—the results of such observations, HDF5 has become the defacto standard for cannot be straightforwardly utilized for spectroscopy. storing astrophysical simulations such as those generated The fact that the internal data tables of XSPEC are by FLASH (Fryxell et al. 2001). Having the ability to not exposed for direct population from the TCL inter- easily compare and contrast observations with simula- preter is another instance of the Black Box Problem; tions, e.g., using simulation output as source model in- data may only be read from FITS files, and only via put, could foster more sophisticated analyses. We have the data command or its INTEGRAL mission-specific explored such questions in ISIS as part of the HYDRA4 variant SPIdata. There is no documented provision by project, using SLIRP to generate the SLh5 module5 for which interpreted arrays may be used, for instance, to reading and writing HDF5 data directly to and from in- specify observed counts or responses. memory S-Lang arrays. The resulting objects may be Reflecting a fundamental difference in approach, sliced with S-Lang array manipulators or mathematically XSPEC is more static than ISIS, taking the position transformed in arbitrary ways (§3) before further analy- that extensive data preparation happens outside the sis, such as being treated as fit data or model data, or application. An advantage of this approach is self- passed to 2D plotting or 3D volumetric routines for vi- documentation: well-written tools record the path along sualization. This module also obviates the need for a which a FITS file travels as history keywords in its conversion tool to migrate data from HDF5 to FITS for header. A drawback is the underlying implication that spectroscopy6. input data need little massaging during interactive anal- This spirit of imposing fewer constraints on how its in- ysis; when the need for such data manipulation arises, ternal tables may be populated or manipulated leads to the solution typically involves dumping XSPEC state to more flexible and mathematically diverse analysis. For disk files, and/or running FTOOLS or other programs example, consider the problem of grouping data to en- to generate new file products, then reloading these data sure an adequate signal to noise ratio is obtained for each back into XSPEC. detector channel. With XSPEC data are grouped prior This files-only data management paradigm can lead to input, usually with the grppha tool. The assigned to slower performance and more cumbersome analytics grouping persists for the life of the loaded dataset, and (§1, §7.4). It can also be an evolutionary disadvantage, may only be changed by re-running grppha outside of i.e. by potentially limiting the pool of individuals able XSPEC, deleting the original dataset in XSPEC, and or willing to contribute new I/O codes to XSPEC. As reloading it with the new grouping. With ISIS spec- exemplified by the implementation of the SPIdata com- tra may be dynamically rebinned in memory— using the mand, endowing XSPEC with the ability to directly read functions rebin data(), group data(), or variants— additional file or mission formats requires detailed mas- with the associated dataset remaining intact. ISIS also tery of its internals. This means general-purpose code provides options not available in the XSPEC grppha generators like SWIG (Beazley 1996) or SLIRP (Noble paradigm, such as grouping by signal to noise ratio, or 2007) cannot be leveraged to automate the wrapping allowing datasets to be combined directly in memory, of I/O libraries and simplify the use of new formats bringing to bear the full numerical power discussed in in XSPEC. To foster the widest possible use, by de- §3 and §7, instead of shelling out to run tools such as sign such wrapper generators know nothing about the mathpha, marfrmf, or addrmf—an approach of more lim- applications in which the bindings they create will be ited mathematical generality. ISIS also supports combin- used. They emit code targeted to the scope of a scripting ing non-linear (e.g. piled-up) data, which in general can- language, rather than application-specific C++ methods not be done with tools because of the model evaluation like SPI Data::read() which is suitable only for internal involved. XSPEC use. 4 http://space.mit.edu/hydra 4.1. Towards Dynamic Data Management 5 http://space.mit.edu/cxc/software/slang/modules/slh5 6 http://www.hdfgroup.org/RFC/fits2h5/fits2h5.htm describes ISIS aims for a more dynamic and configurable ap- preliminary efforts to go in the other direction. proach to data management. In addition to support- 6 Noble et al.

5. CUSTOM MODELS The result is that custom modeling cannot be scripted in In §3 we detailed how the performance of S-Lang XSPEC with the same level of flexibility as in ISIS. For couples with the ability of ISIS to use purely inter- example, initpackage does not expose the routines it preted models to provide an alternative to the tradi- wraps to the top-level TCL interpreter, but rather makes tional method of using compiled codes as custom spec- them accessible only through the the XSPEC tral models. This significant feature, not available to model command. Not being able to call models directly XSPEC users since it does not permit TCL procedures from TCL means the convenience of an interpreter can- to be used as models, can encourage rapid experimenta- not be leveraged when testing and using the models. tion with short, highly analytic codes. These brief, self- This can be easy to overlook, because many model contained scripts are easily exchanged among users, as writers devise their own nominal tests in whatever com- they may be loaded immediately into ISIS from any di- piled language they used to write the model. However, rectory. This is in contrast to code that must be compiled it would be valuable for both writers and users of models into a shareable object then accessed from fixed locations if they could craft TCL commands or scripts to exercise and in conjunction with external description files, such them (in a more powerful and streamlined manner than as $LOCAL MODEL DIRECTORY and lmodel.dat are the multiplexing tclout command, or faking a dataset) respectively used in XSPEC. or pass their output to other features in the application Moving to the realm of compiled models, while version for further numerical processing. Consider Fortran com- 12 of XSPEC provides many improvements over version mon blocks, for instance, and the XSPEC warmabs model 11—notably support for C and C++, double precision in particular: after evaluation the ewout block in this numerics, and the initpackage automated model wrap- model contains a list of the strongest lines, sorted by el- ping command—a number of limitations persist. To be- ement and ion. Wrapping warmabs with SLIRP makes the content of this and every other common block in the gin, using local models at all means one must first build 7 XSPEC from source. Although HEASOFT generally model accessible to ISIS as S-Lang variables. This pro- builds well, this can still be daunting, and is unnecessary vides scientists another means of accessing the internals with ISIS as both interpreted and compiled user mod- of a code, without needing to master the internals of the els may be used from a binary installation. Next, model application for which that code was initially written. components in XSPEC must be designated as having a In addition, initpackage wraps only the outermost specific form, e.g. additive, multiplicative, convolution, interface routine of a model: if the spectrum returned by etc. This too is unnecessary in ISIS, as it places no re- a model is actually computed in several steps by call- striction upon the computational form of a model—any ing additional routines, those lower level routines are conceivable mathematical function may be used. not exposed by initpackage for use in XSPEC outside As XSPEC has been the defacto standard for so long, the model evaluation. As an example of how exposing it is also easy for spectral model developers to code ex- the low level routines could be useful, we consider the pressly for XSPEC, by relying too heavily upon its inter- pshock and vpshock models. Internally, both rely upon nal functions (such as udmget for memory allocation or the lower-level ionseqs() routine to calculate nonequi- xwrite for message output) when crafting models. When librium ionization ionic concentrations. In response to XSPEC was the only X-ray spectral modeling tool avail- a collaborator (Ji 2007, priv. comm.) who wished to able this was not a concern, but today it makes reusing obtain these ionic concentrations independently, we used models in other environments more difficult. For exam- SLIRP to wrap ionseq.f in a matter of only minutes, ple, pulling the pexriv model out of XSPEC 11, which which allowed ionseqs() to be called directly from a remains widely used, required S-Lang script in ISIS. -lxspec -lxspec_lfn -lxspec -lwcs The jet model of Markoff et al. (2005) provides an- -lxanlib -lreadline -lcfitsio -lpgplot other good example. The output spectrum is summed -lcurses -ltermcap from 4 individual components (disk, comptonization, synchrotron, and synchrotron self-comptonization spec- on the link line, plus the introduction of a “dummy” tra), each computed by distinct routines within a single function to resolve an unused symbol name Fortran source file. To analyze or visualize the inde- (Markoff and Nowak 2004). Standing out here is pendent contribution of each component it is necessary the recursive use of -lxspec, and the need for auxiliary to isolate each from the total flux. To achieve this the terminal management and plotting libraries that provide model can be executed in different modes, one of which no numerical or physics contribution to the model. will execute write statements to output component con- Although this situation is improved in XSPEC 12, these tributions to disk files as they are computed. Because our show that it is not enough to simply make source code SLIRP wrapper for this jet model makes every routine available to the public: orthogonality is an important within the Fortran file visible to the top-level S-Lang in- aspect of open and flexible software. To promote terpreter, not just the outermost model routine, the com- external reusability, code must also be separable into ponent fluxes can in principle now be accessed by mak- logically distinct units free of unwanted dependencies. ing the relevant routine calls, or accessing the associated These are core implementation principles of ISIS (Fig. common blocks, directly from ISIS. To our knowledge the 3). only way to achieve a similar result in XSPEC would be 5.1. to wrap the code twice, once with initpackage to make Automated Model Wrapping the custom model routine visible to the model command, While initpackage does help automate the use of cus- tom models in XSPEC 12, it is not a generic, application- 7 Common block values may also be modified using regular S- neutral wrapper generator in the style of SLIRP or SWIG. Lang assignment expressions. Beyond XSPEC: Towards Highly Configurable Astrophysical Analysis 7 and again as a TCL extension module to expose the re- As discussed in Noble et al. (2006) and earlier in §1, maining routines to the TCL interpreter. Tools like ftcl for several years PVM has been used in ISIS9 to ap- or critcl8 can simplify bindings generation in the lat- ply pools of machines to coarse-grained calculations that ter case, but not automate it to the level that SLIRP do not fit within the compute space of one worksta- does; it is also unclear whether they support a range of tion. Consider relativistic Kerr disk models, for exam- features seen in general scientific codes, such as common ple. Historically, implementors have opted to use pre- blocks, string and multidimensional arrays, or complex computed tables to gain speed at the expense of limit- datatypes. ing flexibility in searching parameter space. However, Using initpackage for models which call external li- by recognizing that contributions from individual radii braries can also be problematic. For example, collabora- may be computed independently the model has been tors are extending the above jet model with codes that parallelized in ISIS to avoid this tradeoff (A. Young, call GNU Scientific Library routines, but initpackage priv. comm., Brenneman and Reynolds 2006). Likewise, documents no mechanism by which the GSL can be the aforementioned jet model can be expensive to com- linked in at build time. Following compiler convention, pute, particularly when calculating error bars: generat- this is achieved with SLIRP by specifying -lgsl on the ing 90% confidence limits for 12 free parameters, at the command line. very coarse tolerance of 0.5, required 4 days on a single Finally, functions wrapped by SLIRP can be automat- 2.66 GHz Intel Core 2 Duo processor. To increase per- ically vectorized, allowing them to operate over entire ar- formance we assigned the confidence limit search to 12 rays in a single call, and at the speed of compiled C code Intel Xeon 3.4GHz processors10 within a PVM. This re- instead of using slower, interpreted looping constructs. duced the runtime at 0.5 tolerance to under 24 hours, a Vectorized wrappers can also be tuned for parallelization greater than 75% speedup; it also allowed us to discern with OpenMP (Noble 2007), allowing scientists to take finer features in parameter space by increasing the toler- better advantage of their multicore machines during ISIS ance resolution by a factor of 500, to 0.001, while keeping analysis. To our knowledge autogenerated vector parallel the overall runtime to 50 hours, or approximately half of wrappers are currently unique to SLIRP, and therefore the serial runtime at 0.5 resolution. inaccessible in XSPEC. Temperature mapping is another problem that was straightforward to parallelize with ISIS. For instance, 6. PARALLEL AND DISTRIBUTED COMPUTATION Wise and Houck (2004) provides a map of heating in the intracluster medium of Perseus, computed from 10,000 Parallel computing is not new, but to this day few spectral extractions and fits on 20+ CPUs in just several astronomers employ parallelism for standard problems hours. Additional efforts have also led to improvements in data analysis. To provide a quantitative sense of in related areas of research, such as pvm xstar, a paral- this assertion relative to X-ray spectroscopy (Noble et al. lelizing wrapper for XSTAR which has made it feasible 2006), we performed the following full text searches on for us to probe thousands of photoionized gas physical all published papers indexed in the ADS (Kurtz et al. scenarios in the time it has previously taken to compute 1993) by September of 2005. Extra keywords were in- only a handful of such models (Noble & Ji, in prep.). At the other end of the architectural scale, we have also Keywords Number of Hits shown that ISIS can make transparent use of OpenMP parallel AND pvm 38 to exploit shared memory multiprocessors (Noble 2007). message AND passing AND mpi 21 This is especially relevant in light of the emergence of xspec 832 multicore architectures: soon most astronomers will have xspec AND parallel AND pvm 0 parallel computers on their desktops, but few astron- xspecANDmessageANDpassingANDmpi 0 omy applications are poised to take advantage of them. Analysis systems which do not embrace parallelism can TABLE 1 process at most the workload of only 1 CPU, result- Published references to parallelism and XSPEC, as ing in a dramatic 1/n underutilization of resources as indexed in ADS through 2005. more CPU cores are added. However, astronomers are well versed in scripting, particularly with very high-level, array-oriented numerical packages like IDL, PDL, and S- Lang, to name a few. They combine easy manipulation cluded with PVM and MPI (the Message Passing In- of mathematical structures of arbitrary dimension with terface, Gropp et al. 1999) so as to cull false matches most of the performance of compiled code, with the lat- ter due largely to moving array traversals from the inter- (e.g. with the Max Planck Institute). Queries in ADS preted layer into lower-level code like this C fragment on other modeling tools, or with other search engines such as Google, all yielded similar trends: astronomers case SLANG_TIMES: and astrophysicists have employed parallel computing, for (n = 0; n < na; n++) but mainly for highly customized, large-scale problems c[n] = a[n] * b[n]; in simulation, image processing, or data reduction. Even though a majority of papers published in observational which provides vectorized multiplication in S-Lang. This astronomy over recent decades result from fitting models suggests that much of the strength and appeal of nu- within established software systems, virtually no one is 9 PVM was used due its longstanding tolerance of faults on het- employing parallelism to do so, especially in the interac- erogeneous clusters of workstations, but in principle an MPI mod- tive context. ule could be used just as easily within ISIS. 10 Using new versions of the cl master and cl slave scripts 8 http://ftcl.sourceforge.net, http://wiki.tcl.tk/2523 described on the ISIS website. 8 Noble et al. merical scripting stems from relatively simple loops over regular structures, making them ripe for automatic par- allelization. This pattern was exploited in SLIRP and S-Lang to effect the OpenMP parallelizations described in (Noble 2007). We believe that ISIS is the only general purpose spec- troscopy system in which such a range of parallelism— from single processors on multiple machines to multiple processors on single machines—has been demonstrated. In both cases speedups were obtained without exposing parallelism at the ISIS application level. For instance, the same serial implementations of spectroscopic mod- els have been used for both single- and multi-processor execution. This minimizes two traditional barriers to the use of parallelism by non-specialists: learning how to program for concurrency and recasting sequential algo- Fig. 6.— Power Spectrum of X1820-303, generated from the PCA rithms in parallel form. We believe these considerations in RXTE observation P10075 (data set 10075-01-01-02). are important because the emerging ubiquity of multi- core architectures, combined with the ever-growing size features desired in one’s work. In such cases it can be of datasets and analysis complexity, makes the regular expedient to craft new modules which add missing fea- use of parallelism in astronomy not a question of if it tures, or make certain functionality more easy to incor- will occur, but rather one of merely when. porate into modeling and analysis. For example, consider Fig. 7, drawn from Chandra 7. BEYOND 1D SPECTROSCOPY observation 4800 of interacting galaxy pair NGC7714, and the following series of commands: Data from modern telescopes are getting larger and more detailed, with the use of multiple datasets from isis> require("ds9") multiple missions becoming commonplace. Making sense isis> require("gsl") of it all often requires techniques more sophisticated isis> image = ds9_get_array() than traditional plots or images. Canned data reduc- isis> hist = sum(image, 0) isis> range = [440:680] tion threads, while an important piece of the puzzle, isis> hplot(range-1, range, hist[range]) can only go so far. Data openness and architectural isis> xa = [440:680:4]; ya = hist[xa] modularity (Fig. 3) make it easy to push ISIS beyond isis> smoothed = interp_cspline(range, xa, ya) its original mission of 1D spectroscopy to address this isis> oplot(range, smoothed) broader set of analysis problems. As we illustrate here, it has become unnecessary to use entirely separate appli- First the SAOds9 and GNU Scientific Library exten- cations (as XImage, XSelect, or Xronos are used along- sion modules (Primini et al. 2005) are loaded; we show side XSPEC) or be constrained by a strictly file → tool this explicitly for didactic purposes, but ISIS can be con- → file model, in order to supplement spectral modeling figured to load either module automatically when the with, for example, advanced filtering, imaging, or timing given functions are invoked. The 1024×1024 image is analysis. For each of the examples shown here it is not then retrieved directly into a properly byteswapped 2D clear how a similar result might be obtained so directly S-Lang pixel array, using an XPA (Mandel et al. 1995) within XSPEC. binary transfer instead of intermediate file I/O. This im- age is collapsed along its X axis using the native S-Lang 7.1. Timing Analysis sum function, to create an intensity histogram plot. The 11 GSL cubic spline function is then called to smooth the The optional SITAR module endows ISIS with tim- brightest portion of the intensity histogram, interpolat- ing analysis capability, obviating the need to use a sep- ing over every fourth point therein, and finally we over- arate timing application such as XRONOS. The power plot the result (dotted red line). Although the task here spectrum in Fig. 6 was generated entirely in ISIS: the is relatively straightforward, it again shows how open- data were read, the FFTs were performed and aver- ended analysis objectives can be achieved by weaving ex- aged, the Power Spectrum was logarithmically binned isting tools together in new ways, using an array-based over Fourier frequency, the constant + two Lorentzians interpreter as the thread. DS9 is extended beyond its model was fit, and the results plotted, operating directly essential role as a qualitative display tool and into the upon arrays in memory, all without ever leaving the pro- realm of quantitative analysis, while the ISIS user is able gram. to exploit the imaging capabilities of DS9 within inter- active or scripted analysis. 7.2. Modular Imaging with S-Lang Numerics Another example is evt2img,12 a simple guilet13 which Many excellent imaging tools are available to as- combines the histogram and Gtk modules to provide tronomers, but it is impossible for any one of them to 12 http://space.mit.edu/cxc/software/slang/modules/hist/evt2img.html do everything. Here again, the extensibility and modu- 13 larity of the ISIS paradigm helps avoid the fate of “doing This term is used in Noble (2005) to refer to graphical applets embedded within a primarily command-line application, and typ- nothing” while waiting for the implementation of new ically coded in a scripting language, allowing the use of graphical interfaces when convenient while avoiding an exclusively graphical 11 http://space.mit.edu/CXC/analysis/SITAR interface. Beyond XSPEC: Towards Highly Configurable Astrophysical Analysis 9

Fig. 7.— ACIS image from 2004 Chandra observation of NGC7714, with corresponding square and smoothed pixel intensity histograms. an interactive mechanism for creating 3-color images di- an animation. Here we consider a 99 spectra data cube rectly from event lists. Images are created in evt2img by generated by CUBISM15 from Spitzer observation 3310 defining 3 energy band filters, binning the events selected of Cassiopeia A, with dimensions RA, DEC, and λ. To by each filter with the hist2d function, and overlaying identify where emission is strongest in the spatial and the resulting red, green, and blue monochrome images wavelength domains, the entire FITS datacube can be within a Gtk display window. As shown in Fig. 8, a passed directly to volview (Fig. 10) useful feature of evt2img is its ability to plot the energy isis> require("volview") spectrum as a 1D histogram in order to tune the color isis> cube = fits_read_image("casa_ll1_s12.fits") band filters in real time, simply by adjusting the indi- isis> volview(cube) vidual red, green, and blue sliders. For interactive ex- ploration this method is faster and more intuitive than This visualization makes it instantly clear that the the traditional file → tool → file approach, and produces clumpy regions correspond to bright emission at fairly no temporary file litter while experimenting with color specific wavelengths. The volview guilet is our interface assignment filters. Evt2img may be run standalone from to the Volpack rendering library (Lacroute and Levoy the operating system prompt or as a function in ISIS, 1994), which, although somewhat dated, is small, has no and in the latter case allows event data to be input di- external dependencies, requires no special hardware, and rectly from in-memory interpreted arrays in addition to is very fast. The shear-warp factorization algorithm in traditional event files. The HYDRA project at MIT pro- Volpack has been generally recognized as the fastest ren- vides a wealth of more sophisticated examples14 of how dering technique available (Meißner et al. 2000), a cru- advanced imaging can be tied more directly to numeri- cial factor for interactive analysis. In contrast to high- cal modeling and analysis in ISIS. Under the auspices of end tools like ParaView and VISIT16, volview provides the NASA AISRP program, a broad suite of 2D and 3D a simple, low buy-in path to volume rendering and is fitting, geometry, and visualization routines have been actively used in the HYDRA project.17. developed, e.g. to do forward folding and comparison of models with 2D event-based data sets. These are de- 7.4. Visual Correlation and Multidimensional Filtering scribed in detail online, but are briefly illustrated here in Having the ability to cut, visualize, and correlate data, the context of an analysis of cavities in Hydra A, using in complex ways and from multiple missions, is invaluable the Chandra observation 4969. A model is defined by to analysis. This exploratory process ultimately seeks to combining traditional spectral components (e.g. mekal derive sets of constraints C = {C0, ..., Cn} upon input and powerlaw) with 3D geometric components (e.g. an data D by iterating through a series of models, compar- AGN modeled as a sphere). The fitting is performed di- isons to data, and refinements of assumptions and model rectly in ISIS, as is the generation of the residual, data, parameters. Because its analytic process does not en- and model images, and 2D and 3D model component compass whole-array numerics (§3, §4), an iteration of projections (Fig. 9). this cycle in XSPEC can involve dumping application state to disk, followed by the execution of multiple tools 7.3. Volume Visualization such as fselect or fcalc to transform or cut file data, Astrophysical observation and simulation are in the possibly fv or XIMAGE to visualize filtered data subsets, midst of a transition beyond 1D spectra and 2D images, and XSPEC to incorporate the results back into analysis. and into the realm of three-dimensional phenomena. A This process creates temporary file litter products which number of astronomy software tools exist which enable can quickly become difficult to manage, and requires slow visualization of so-called 3D data cubes, but a problem disk I/O passes over files when applying each unique set shared by many of them is that they are limited to dis- playing one 2D slice at a time, optionally in series as 15 http://ssc.spitzer.caltech.edu/archanaly/contributed/cubism 16 http://www.paraview.org, http://www.llnl.gov/visit 14 http://space.mit.edu/hydra/E2D demo/e2d demo.html 17 http://space.mit.edu/hydra/v3d.html 10 Noble et al.

Fig. 8.— Interactive 3-color image creation in evt2img, using the level-2 event file acisf00120N001 evt2.fits from Chandra observation 120 of supernova remnant E0102-72. One may pan around (by clicking to center the image), zoom in and out, and set various scaling properties by clicking on the corresponding buttons. The image in the left column shows a fair amount of high energy background noise, as evidenced by the blue background pixels. These were removed in realtime by adjusting the colored energy band sliders that define the RGB levels (bottom panels), to yield the improved image in the right column. of filter constraints. Other limiting factors are that it re- 1 (Nowak et al. 2005), correlate model parameters with stricts the scope of exploration to data expressed in the fit results—directly in memory from dozens of fits—for FITS file format, and that TCL-based I/O in tools like scores of observations (Wilms et al. 2006; Markoff et al. fv does not scale well to datasets containing millions of 2007; Nowak et al. 2008), and quickly corroborate col- events. league results (Heinz et al. 2007). In contrast, VWhere (Noble 2005), a graphical ex- Filtering in VWhere amounts to manipulating regions of tension of the S-Lang where array slicing function, interest on plots generated from S-Lang arrays (Fig. 11), unifies the constraint cycle by enabling each phase to with no syntax required and the effects upon any combi- be performed within a single application. The result nation of vectors automatically visualized for inspection can be an intuitive, faster, and more powerful alter- (Fig. 12). This approach reveals data correlations that native to file-based data mining. By supporting the can be difficult and time-consuming to ferret out with use of interpreted S-Lang arrays ISIS & VWhere make tool-based techniques. New data vectors may be created it easy to combine data from multiple sources and for- on the fly and with no additional I/O overhead, using mats, not just FITS files, and examine them in natu- anything from simple arithmetic combinations of existing ral ways, as well as foster scalability. VWhere has been vectors to any mathematical transformation that can be used to simultaneously mine hundreds of observations specified in S-Lang or imported from external C, C++, from multiple telescopes, e.g. for evidence of transi- or Fortran codes. tions between the soft and hard states of Cygnus X- In contrast, file-based filtering tools require the use of Beyond XSPEC: Towards Highly Configurable Astrophysical Analysis 11

Fig. 9.— Images created in ISIS by HYDRA Event-2D and Source-3D routines, while analyzing cavities within the Chandra observation 4969 of Hydra A. Top: residuals, data, and model. Bottom: 2D and 3D projections.

Fig. 10.— Entire datacube from Spitzer observation 3310; the two clumpy areas are instantly recognizable as regions of strong emission. 12 Noble et al. syntax which may conflict in subtle ways from package of science, and remains a topic of debate in wider circles to package. Specifying mathematical operations to such (Collins 1992; Giles 2006), but is not treated in depth tools, e.g., in the form of quoted Unix shell syntax, can be within the astronomy literature. We do not attempt tedious and cumbersome, and does not approach the ex- to fill that gap here. Rather, we aim only to address pressive power (consider recursion, for example) or per- the perception that configurable, interpreter-based meth- formance of ISIS numerics. As an illustration, consider ods compromise reproducibility relative to file-based tool the use of calculator tools for the discriminant computa- tion in §3: the 1-million element S-Lang array datapoint methods, and hope that this may contribute to a more in Fig. 2 corresponds to a runtime of ca. 0.15 seconds: complete treatment of reproducibility elsewhere. Central to reproducibility is the recording of history. isis> a = [1:1e6+1]; b = 3*a; c = 2*a Many tools assist that process by annotating modified isis> tic; d = sqrt(b^2 - 4*a*c); toc files with FITS history keywords. This is of clear utility, 0.151172 especially when tracing pipelines and other well-defined Using tools to perform similar operations upon files data reduction procedures. For open-ended analysis, however, we do not believe it is superior to the forms fcalc in out d "sqrt(b^2 - 4*a*c)" ftcalc in out d "sqrt(b^2 - 4*a*c)" of history recorded by full-fledged analysis applications. dmtcalc in out expression="d=sqrt(b^2 - 4*a*c)" Consider what is being captured in history records writ- ten by tools: each keystroke of the command used to (from a locally compiled -O2 distribution of HEASOFT invoke the tool. This capability is not unique to tools. 6.1 and a CIAO3.3 binary distribution, respectively) was XSPEC and ISIS record keystroke history by virtue of consistently 1 to 2 orders of magnitude slower: roughly their GNU readline command recall, editing, and log- 1.8 sec for fcalc, 2.3 sec for ftcalc and 70 sec for ging capabilities, as do many other systems. Focusing dmtcalc. If we include the time to write on keystrokes alone, though, obscures a larger point: re- isis> s = struct{a,b,c}; s.a=a; s.b=b; s.c=c gardless of whether one prefers logs or header keywords, isis> fits_write_binary_table("in","arrays",s) indiscriminately long lists of commands typed are of little value without some sense of their relevance to publish- the arrays to disk and then read them back in able results. isis> d = fits_read_table("out").d In the end what matters most is that results may be regenerated so that conclusions may be plausibly con- the performance penalty of the tool approach is even firmed by others, rather than having every bump or greater. Moreover, the output product of each tool is wrong turn along the way reproduced in high fidelity. not directly useful: to be visually inspected or used in Scripts, the logical endpoint of our interpreter-based ap- further analysis, for instance, it first needs to be loaded proach to analysis, confer this conceptual prioritization into another program (e.g. fv, or XSPEC), incurring ad- by making explicit the data and algorithms of greatest ditional I/O and application overheads that are not seen significance. Such scripts arise in tool-based analysis as in ISIS because its result arrays may be immediately ma- well, only they tend to be expressed in system-oriented nipulated or passed to subsequent computations. languages like Bourne shell or Perl, rather than intrinsi- Shared memory can, in principle, mitigate the disk I/O cally whole-array numeric languages like S-Lang or IDL. cost, but in this case it did not help: instructing ftcalc It can be argued, then, that scripts lead to higher forms to write to shmem://h1 consistently resulted in runtimes of duplicability than FITS history records alone. They of nearly 3 minutes, while the CIAO dmtcalc tool would are also easier to annotate with additional commentary. not permit the creation of a shared memory FITS table. We therefore conclude that that tool- and interpreter- Even if shared memory were faster than files in this case, based approaches to analysis are approximately equiva- the point raised earlier in §4 still remains, namely that lent in the degree to which they facilitate reproducibility. XSPEC documents no clear provision though which such Care must be still taken when “feature creep” introduces generic data arrays may be utilized in analysis. It must incompatibilities that make the use of older scripts prob- also be noted that few, if any, file-based transformation lematic with newer software versions, so it is important tools are extensible. If we wished to use a hypergeometric that version information be recorded and that older soft- function in VWhere it is as simple as loading the GSL ware remains accessible on the internet. module with require("gsl") and calling the relevant function. If the mathematical operations required for 8. CONCLUSION their research are not supported, tool users would have to Progress in science can be measured by our ability to either make an enhancement request and wait or create pose increasingly advanced, open-ended questions and their own solution. address them with flexible techniques of commensurate 7.5. Aside On Reproducibility sophistication. Since the age of Newton, one implication of this is that scientists must also practice mathematics, We have been refining the approach to analysis es- and since the age of Turing it has also meant they must poused in this paper for much of the past decade. Dur- practice programming. ing that time, one of the more persistent concerns we In this paper we have compared two open-source spec- have privately encountered is that it can lead to dimin- tral analysis applications, XSPEC and ISIS, in an at- ished reproducibility, particularly in contrast to the file 18 tempt to gauge how their scientific and mathematical → tool → file model. Reproducibility is a cornerstone reach may be extended with custom programming. Con- 18 This is relevant to the overall paper because it is the dominant trasts between the two have been drawn from the con- means through which data are prepared for analysis in XSPEC, but text of current research efforts utilizing high performance applies to other analysis packages as well. computation, numerical modeling, atomic physics, visu- Beyond XSPEC: Towards Highly Configurable Astrophysical Analysis 13

Fig. 11.— VWhere in ISIS using SITAR and the AGLC package to generate a lightcurve for Chandra observation 3814 of Cygnus X-1. Configurable plots are generated from an axis expression window containing several text fields. Each X/Y expression may contain any valid S-Lang statement, including calls to C, C++ or Fortran functions within external modules. The chief constraint upon an expression is that, when evaluated, it generate a numeric vector equal in length to the existing data vectors. Here we show VWhere launched with a struct containing 3 input vectors, as well as how easily new vectors (e.g. lchan + mchan) can be fabricated from them on the fly. Each vector expression is saved in the Choose dropdown menu, for easy reuse with no retyping. Vectors of arbitrary size may also be overplotted, while the S-Lang> prompt may be used as an interactive command line, e.g., to issue ISIS commands or execute arbitrary S-Lang code.

Fig. 12.— A polygon filter applied to the color intensity diagram, and its effect upon the lightcurve (Hanke et al, in prep.). 14 Noble et al. alization, automated code generation, and data manage- vorably equipped to support ad-hoc research needs while ment. We have demonstrated how these have led to the not appreciably compromising reproducibility. Origi- incorporation of new analytic techniques in ISIS, in ways nally envisioned as a tool for Chandra gratings spec- that are either unexplored or problematic for the current troscopy, ISIS has heavily influenced the development XSPEC architecture and its associated file → tool → file of additional Chandra analysis software, while also re- model of analysis. ceiving NASA AISRP funding to continue its evolution XSPEC has been of tremendous value to the X-ray within the aforementioned HYDRA project. Although community, continues to be actively developed, and has some of the ISIS capabilities we’ve emphasized do exist a strong history of community enhancement via local in other astronomy software, we are unaware of any pub- models (and to some extent, TCL/TK scripts). How- lications demonstrating how similar breadths of flexibil- ever, we have argued that the top-level simplicity of the ity and computational power have been collected under XSPEC interface, long and rightfully one of the pillars the umbrella of a single open-source analysis application of its appeal, can shroud much of its internal computa- and brought to bear on published research in spectral tional and data handling mechanisms. This in turn can analysis and X-ray astronomy in the manner discussed render XSPEC less adaptable—by the typical user—for here. evolving research needs. Commands which operate as black-boxed common denominators of analysis, and per- mitting only file-based data input, may not be enough We would like to thank Daniel Dewey, J¨orn Wilms, to probe some of the more computationally challenging and our MIT colleagues for reviewing drafts of the problems facing astronomers. paper, as well as Tracey DeLaney for supplying the We conclude that analysis applications such as ISIS, Spitzer 3D spectral cubes of CasA. We are grateful to endowed with interpreted whole-array numerical capabil- our anonymous referee for constructive criticism, and to ities and an open, modular, and scriptable architecture NASA for funding this work through the AISRP grant designed expressly for high configurability, are more fa- NNG06GE58G.

REFERENCES Arnaud, K. A. (1996). XSPEC: The First Ten Years. In Jacoby, Gropp, W., Lusk, E., and Skjellum, A. (1999). Using MPI (2nd G. H. and Barnes, J., editors, ASP Conf. Ser. 101: ed.): portable parallel programming with the message-passing Astronomical Data Analysis Software and Systems V, pages interface. MIT Press, Cambridge, MA, USA. 17–+. Heinz, S., Schulz, N. S., Brandt, W. N., and Galloway, D. K. Ayers, L. (1996). S-Lang Applications for Linux. The Linux (2007). Evidence of a Parsec-Scale X-Ray Jet from the Gazette, 12. http://linuxgazette.net/issue12/slang.html. Accreting Neutron Star Circinus X-1. ApJ, 663:L93–L96. Beazley, D. (1996). SWIG : An Easy to Use Tool for Integrating Houck, J. C. (2002). ISIS: The Interactive Spectral Interpretation Scripting Languages with C and C++. In 4th Tcl/Tk System. In Branduardi-Raymont, G., editor, High Resolution Workshop Proceedings. X-ray Spectroscopy with XMM-Newton and Chandra, Mullard Bland, W., Naughton, T., Vallee, G., and Scott, S. L. (May 2007). Space Science Laboratory of University College, London. Design and implementation of a menu based oscar command Hwang, U., Laming, J. M., Badenes, C., Berendse, F., Blondin, line interface. High Performance Computing Systems and J., Cioffi, D., DeLaney, T., Dewey, D., Fesen, R., Flanagan, Applications, 2007. HPCS 2007. 21st International Symposium K. A., Fryer, C. L., Ghavamian, P., Hughes, J. P., Morse, J. A., on, pages 25–25. Plucinsky, P. P., Petre, R., Pohl, M., Rudnick, L., Sankrit, R., Brenneman, L. W. and Reynolds, C. S. (2006). Constraining Slane, P. O., Smith, R. K., Vink, J., and Warren, J. S. (2004). Black Hole Spin via X-Ray Spectroscopy. ApJ, 652:1028–1043. A Million Second Chandra View of Cassiopeia A. ApJ, Collins, H. M. (1992). Changing Order: Replication and 615:L117–L120. Induction in Scientific Practice. University of Chicago Press. Jordan, J. M., Jennings, D. G., McGlynn, T. A., Bonnell, J. T., Davis, J. E., Houck, J. C., Allen, G. E., and Stage, M. D. (2005). Gliba, G. W., Ruggiero, N. G., and Serlemitsos, T. A. (1994). Analyzing the Cas A Megasecond in Less than a Megasecond. An Xwindows/Motif Graphical User Interface for Xspec. In In Shopbell, P., Britton, M., and Ebert, R., editors, Crabtree, D. R., Hanisch, R. J., and Barnes, J., editors, ASP Astronomical Society of the Pacific Conference Series, pages Conf. Ser. 61: Astronomical Data Analysis Software and 444–+. Systems III, pages 71–+. de La Pe˜na, M. D., White, R. L., and Greenfield, P. (2001). The Joye, W. A. and Mandel, E. (2003). New Features of SAOImage PyRAF Graphics System. In Harnden, Jr., F. R., Primini, DS9. In Payne, H. E., Jedrzejewski, R. I., and Hook, R. N., F. A., and Payne, H. E., editors, Astronomical Data Analysis editors, ASP Conf. Ser. 295: Astronomical Data Analysis Software and Systems X, volume 238 of Astronomical Society Software and Systems XII, pages 489–+. of the Pacific Conference Series, pages 59–+. Kallman, T. R. and Palmeri, P. (2007). Atomic data for x-ray Folk, M., McGrath, R., and Yeager, N. (1999). HDF: an update astrophysics. Reviews of Modern Physics, 79:79–133. and future directions. In IEEE Geoscience and Remote Kiczales, G. (Jan 1996). Beyond the black box: open Sensing Symposium, 1999, Vol. 1, pages 273–275. implementation. Software, IEEE, 13(1):8, 10–11. Fryxell, B., Zingale, M., Timmes, F. X., Lamb, D. Q., Olson, K., Kling, R. (1999). Application Development with IDL. Ronn Calder, A. C., Dursi, L. J., Ricker, P., Rosner, R., Truran, Kling Consulting. J. W., MacNeice, P., and Tufo, H. (2001). Numerical Kurtz, M. J., Karakashian, T., Grant, C. S., Eichhorn, G., simulations of thermonuclear flashes on neutron stars. Nuclear Murray, S. S., Watson, J. M., Ossorio, P. G., and Stoner, J. L. Physics A, 688:172–176. (1993). Intelligent Text Retrieval in the NASA Astrophysics Geist, A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, R., Data System. In Hanisch, R. J., Brissenden, R. J. V., and and Sunderam, V. S. (1994). PVM: Parallel Virtual Machine: Barnes, J., editors, Astronomical Data Analysis Software and A Users’ Guide and Tutorial for Networked Parallel Systems II, volume 52 of Astronomical Society of the Pacific Computing. Conference Series, pages 132–+. Gilat, A. (2004). MATLAB: An Introduction with Applications, Lacroute, P. and Levoy, M. (1994). Fast volume rendering using 2nd Ed. John Wiley & Sons. a shear-warp factorization of the viewing transformation. In Giles, J. (2006). The trouble with replication. Nature, SIGGRAPH ’94: Proceedings of the 21st annual conference on 442:344–+. Computer graphics and interactive techniques, pages 451–458, New York, NY, USA. ACM Press. Beyond XSPEC: Towards Highly Configurable Astrophysical Analysis 15

Mandel, E., Swick, R., and D., T. (1995). The X Public Access Nowak, M. A., Wilms, J., and Pottschmidt, K. (2008). In Mechanism: Software Cooperation for Space Science and preparation. Beyond. Proceedings of the 9th Annual X Technical Pence, W., Xu, J., and Brown, L. (1997). FV: A New FITS File Conference, Boston, MA. Visualization Tool. In Hunt, G. and Payne, H., editors, ASP Markoff, S. and Nowak, M. A. (2004). Constraining X-Ray Conf. Ser. 125: Astronomical Data Analysis Software and Binary Jet Models via Reflection. ApJ, 609:972–976. Systems VI, pages 261–+. Markoff, S., Nowak, M. A., and Wilms, J. (2005). Going with the Primini, F., Noble, M., and The CXC Science Data Systems Flow: Can the Base of Jets Subsume the Role of Compact Group (2005). Extending the Capabilities of CIAO with Accretion Disk Coronae? ApJ, 635:1203–1216. S-Lang-based Tools. In Shopbell, P., Britton, M., and Ebert, Markoff, S., Young, A., and Nowak, M. A. (2007). In preparation. R., editors, ASP Conf. Ser. 347: Astronomical Data Analysis Meißner, M., Huang, J., Bartz, D., Mueller, K., and Crawfis, R. Software and Systems XIV, pages 17–+. (2000). A practical evaluation of popular volume rendering Smith, R. K., Brickhouse, N. S., Liedahl, D. A., and Raymond, algorithms. In VVS ’00: Proceedings of the 2000 IEEE J. C. (2001). Collisional Plasma Models with APEC/APED: symposium on Volume visualization, pages 81–90, New York, Emission-Line Diagnostics of Hydrogen-like and Helium-like NY, USA. ACM Press. Ions. ApJ, 556:L91–L95. Noble, M. S. (2005). VWhere: A Visual, Extensible ‘where’ Stage, M. D., Allen, G. E., Houck, J. C., and Davis, J. E. (2006). Command. In Shopbell, P., Britton, M., and Ebert, R., Cosmic-ray diffusion near the Bohm limit in the Cassiopeia A editors, ASP Conf. Ser. 347: Astronomical Data Analysis supernova remnant. Nature Physics, 2:614–619. Software and Systems XIV, pages 237–+. Trowbridge, S. R., Nowak, M. A., and Wilms, J. (2007). Noble, M. S. (2007). Getting More From Your Multicore: Tracking the Orbital and Super-orbital Periodicities of SMC Exploiting OpenMP From An Open Source Numerical X-1. In press. Scripting Language. Computation and Concurrency: Practice Wilms, J., Nowak, M. A., Pottschmidt, K., Pooley, G. G., and and Experience. In press. arXiv e-print at Fritz, S. (2006). Long term variability of Cygnus X-1. IV. http://adsabs.harvard.edu/abs/2007arXiv0706.4048N. Spectral evolution 1999-2004. A&A, 447:245–261. Noble, M. S., Houck, J. C., Davis, J. E., Young, A., and Nowak, Wise, M. and Houck, J. (2004). Mapping Heating in Clusters of M. (2006). Using the Parallel Virtual Machine for Everyday Galaxies. In 35th COSPAR Scientific Assembly, volume 35 of Analysis. In Gabriel, C., Arviset, C., Ponz, D., and Enrique, COSPAR, Plenary Meeting, pages 3997–+. S., editors, ASP Conf. Ser. 351: Astronomical Data Analysis Software and Systems XV, pages 481–+. Norman, D. (2007). The next ui breakthrough: command lines. interactions, 14(3):44–45. Nowak, M. A., Wilms, J., Heinz, S., Pooley, G., Pottschmidt, K., and Corbel, S. (2005). Is the “IR Coincidence” Just That? ApJ, 626:1006–1014.