The Messenger
No. 167 – March 2017

Telescopes and Instrumentation
DOI: doi.org/10.18727/0722-6691/5000

The ALMA Science Archive

Felix Stoehr¹
Alisdair Manning¹
Christophe Moins¹
Dustin Jenkins²
Mark Lacy³
Stéphane Leon⁴
Erik Muller⁵
Kouichiro Nakanishi⁵
Brenda Matthews⁶
Séverin Gaudet²
Eric Murphy³
Kyoko Ashitagawa⁵
Akiko Kawamura⁵

¹ ESO
² Canadian Astronomical Data Centre (CADC), National Research Council of Canada, Victoria, Canada
³ National Radio Astronomy Observatory (NRAO), Charlottesville, USA
⁴ Joint ALMA Observatory (JAO), Vitacura, Santiago, Chile
⁵ National Astronomical Observatory of Japan (NAOJ), National Institutes of Natural Sciences, Tokyo, Japan
⁶ National Research Council of Canada, Victoria, Canada

Science archives help to maximise the scientific return of astronomical facilities. After placing science archives into a slightly larger context, we describe the current status and capabilities of the ALMA Science Archive. We present the design principles and technology employed for three main contexts: query; result set display; and data download. A summary of the ALMA data flow is also presented as are access statistics to date.

Introduction

The overall success of an astronomical facility is measured by the quality and quantity of science produced by its community. By helping the principal investigators (PIs) and archival researchers of the facility to easily discover, explore and download the data they need, a science archive helps to maximise the scientific return and thus to increase the success of the facility. In addition to the delivery of data to PIs, provision of data-persistence for independent verification of scientific results and duplication checking in the proposal process, one of the main purposes of a science archive is indeed to enable independent research.

For only a very small fraction (of the order 1–3 %) of the total yearly operational cost of a facility, substantial additional scientific progress can be obtained through public provision of a science archive. This is, for example, true in the case of the Hubble Space Telescope (HST), where publications making use of archival data have by now outnumbered the publications of PI observations by the proposing teams. Romaniello et al. (2016) also report the growth of an ESO archive community, where almost 30 % of users downloading data from the Science Archive Facility (SAF) have never been PI or co-investigator of an ESO proposal. For the still very young Atacama Large Millimeter/submillimeter Array (ALMA) facility and its ALMA Science Archive¹, we can report a rapidly increasing fraction of publications making use of archival data (Figure 1), already reaching 16 % (or 27 % including publications from Science Verification data) in 2016 (see also Stoehr et al., 2015).

Figure 1. Fraction of ALMA publications that make use of either only archival ALMA data (green) or both ALMA PI and archival data at the same time (blue). 2013 was the first year when ALMA PI data became public and thus the first archival publications appear in 2014. (Vertical axis: 0 % to 20 %; horizontal axis: 2012 to 2016; legend: PI + archival, archival.)

The requirement that data be well described and easy to discover through science archives can be expected to grow rapidly in the future, as the amount of data increases exponentially. For example, we estimate that the fully operational Square Kilometre Array (SKA) will deliver around 200 TB per year of science images for every active astronomer in the world at that time. As astronomy will inevitably transform into a science where the largest fraction of observed pixels will never be looked at by a human, machine-aided analysis will inevitably increase in importance. This approach includes scientific pre-analysis (for example, the ALMA Data MIning Toolkit, ADMIT: Teuben et al., 2015), remote visualisation (for example, the Cube Analysis and Rendering Tool for Astronomy, CARTA: Rosolowsky et al., 2015) and remote analysis (code-to-data), as well as analysis based on machine learning. In particular, deep learning is currently witnessing an epochal change and dramatic new possibilities can be expected over the next few years. Successful approaches, like automatic caption generation for images² and human-quality astronomical object classification (Dieleman et al., 2015), give an indication of the future prospects in this area. A powerful well-characterised science archive is the basis of such data-mining.

Depending on the nature of the project and its goals, and notwithstanding the remark about the small operational costs of archives, the fraction of the total cost that astronomical facilities spend on data management is expected to slowly increase. An extreme showcase of this evolution, admittedly in a different context, is the Large Synoptic Survey Telescope (LSST), where 52 % of the total survey cost of $1.25 billion is expected to be spent on data management³.

Figure 2. Query interface of the ALMA Science Archive with grouped keywords displaying self-opening input fields, unobtrusive tooltip help and the three different views for selection.

Querying

Searching astronomical data via search interfaces differs greatly from standard web searches, such as that provided by Google. Whereas the latter solve the problem “find words in a collection of text documents”, searches in astronomical archives are inherently multi-dimensional and many parameters are numerical rather than textual. In that sense, astronomical search engines are closer to product-finder search engines⁴. Moreover, the target audience of astronomical searches is extremely homogeneous and highly educated, as the vast majority of the users will hold degrees in astronomy or physics.

With this consideration in mind, our main design principles in the ALMA Science Archive are: access to the full parameter space; a maximally physical query; and, at the same time, minimal interaction cost. We consider each of these principles in turn.

Full parameter space

In the ALMA Science Archive we provide the capability to place query constraints simultaneously on observations, publications and proposals. Currently 31 input fields are available, of which 14 are numerical. For the input fields, a variety of operators can be used (equals, like, or, <, >, range, not, …). The query is completely unscoped, that is we do not require users to first query by position or object name, or even require any constraint at all. Hitting search without constraints will return the full holdings. This choice also has the positive side-effect that the multi-parameter search capability is automatically extended to all the more rarely used columns in the results table which do not show up on the query form, but for which we still provide a sub-filtering capability on the results page, like, for example, whether or not an observation is a mosaic or which antenna types were employed. The user can choose to display the results of any query in a view where one row corresponds to one observation, or to one project, or even to one publication. Given the homogeneous and educated audience, we intentionally chose not to provide an additional “basic” interface.

This multi-dimensional unscoped interface permits powerful queries to be executed. For example:
– show all public, but unpublished, observations. This enables the ALMA project to survey non-publishing PIs and to investigate the reasons why they could not publish (Stoehr et al., 2016);
– show all publications making use of full-polarisation data;
– show the proposals, data from which were used in publications having “molecular hydrogen” in the publication abstract;
– show all publications making use of data from the programme “Discs around high-mass stars”;
– show all observations of active galaxies reaching line sensitivities of 1 mJy/beam at 10 km s⁻¹ resolution or continuum sensitivities of 0.1 mJy/beam.

Maximally physical query

Great efforts have been made to allow constraints to be placed on as many physical parameters as possible, according to the main properties a photon carries: position, energy, time and polarisation; see also Stoehr et al. (2014). Examples are the angular and spectral resolutions, the field of view, frequencies, bandwidth and the largest angular scale. In addition, users can now also query on the estimated sensitivity expected to be reached for line or continuum observations. This value, corresponding to the limiting magnitude in optical observations, is a particularly useful constraint. In addition, we capture the physical content of the observations from the users, offering the scientific keywords specified when the proposal was written and the scientific categories, as well as allowing searches through the titles and abstracts of the proposals and also publications making use of ALMA data.

Figure 3. Results page for the ALMA Science Archive featuring footprint display on Aladin Lite and the results table with sub-filtering, sorting, adding/removing of any of the 37 columns and the bookmarking/exporting links.

… ALMA archive context this means reducing the cost of reading, identifying, as well as memorising, the structure and functionality of the interface. It also includes reducing the mouse travel distance and the number of mouse clicks, as well as ensuring that users should not be forced to leave the page during their interaction with the interface. A key to reducing interaction cost is to only provide the information to the users that …

In contrast to the one-line interfaces of word-in-text searches, the knowledge of the search space (“what constraints can be given”) on advanced interfaces is not trivially acquired by the users. Therefore the first task of any such interface must be to explain that search space. In order to reduce the interaction cost of this process, we visually group the concepts, order them by importance within the groups, remove everything that is