Everything You've Heard About Agile Development Is Wrong
2016 ASTRONOMICAL DATA ANALYSIS SYSTEMS AND SOFTWARE CONFERENCE

Tutorials

T1: Simon O'Toole, Australian Astronomical Observatory
Everything you've heard about Agile development is wrong

I propose an educational session on Agile software development. There are many astronomical projects where using Agile methods can lead to great efficiency gains. This is especially important at a time of ever more limited resources. I will put to rest many of the misconceptions about Agile and provide an overview of the various Agile methodologies. The main focus of the session will be to introduce the common Agile techniques and the basics of prioritisation and timeboxing. The goal of the tutorial is to teach skills that can be incorporated into new, and even existing, astronomical software projects.

Tutorial abstract
TRIESTE, ITALY 16 - 20 October 2016

T2: Thomas Robitaille, Freelance Scientific Software Developer
Multi-dimensional linked data exploration with glue

Modern data analysis and research projects often incorporate multi-dimensional data from several sources, and new insights are increasingly driven by the ability to interpret data in the context of other data. Glue (http://www.glueviz.org) is a graphical environment built on top of the standard scientific Python stack to visualize relationships within and between data sets. With glue, users can load and visualize multiple related data sets simultaneously and specify the logical connections that exist between them. Glue then uses this information transparently, as needed, to enable visualization across files. Glue includes a number of data viewers, such as a scatter plot viewer, an image viewer, and more advanced 3D viewers, and also provides a mechanism for users to build their own custom visualizations.
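The idea of a linked selection can be illustrated without glue itself. The minimal Python sketch below assumes two hypothetical datasets whose right-ascension columns have different names but are declared to describe the same quantity. A single range selection on that shared quantity is then applied to every linked dataset. The names `Dataset`, `LINKS` and `linked_selection` are illustrative only and are not part of glue's API.

```python
# Conceptual sketch of linked selections; NOT glue's API.
from dataclasses import dataclass


@dataclass
class Dataset:
    label: str
    columns: dict[str, list[float]]  # attribute name -> values


# Hypothetical link table: (dataset label, attribute) -> shared quantity.
# Two differently named columns are declared to mean the same thing.
LINKS = {
    ("catalog", "ra_deg"): "ra",
    ("spectra", "RA"): "ra",
}


def linked_selection(datasets, quantity, lo, hi):
    """Apply one range selection on a shared quantity to every dataset
    that has an attribute linked to it; return selected row indices."""
    selections = {}
    for ds in datasets:
        for (label, attr), shared in LINKS.items():
            if label == ds.label and shared == quantity:
                values = ds.columns[attr]
                selections[ds.label] = [
                    i for i, v in enumerate(values) if lo <= v <= hi
                ]
    return selections


catalog = Dataset("catalog", {"ra_deg": [10.0, 150.2, 210.5]})
spectra = Dataset("spectra", {"RA": [149.9, 300.0]})
print(linked_selection([catalog, spectra], "ra", 100.0, 200.0))
# → {'catalog': [1], 'spectra': [0]}
```

The design point is that a link is declared once. After that, any selection expressed in terms of the shared quantity reaches every dataset that carries it, whatever the column is called locally. Datasets with no linked attribute are simply left out of the result.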
The aim of this (beginner) tutorial will be to get users set up with glue, loading datasets, interactively learning about the different viewers, and exploring data with linked selections.

Tutorial abstract

Session 1 - Key Theme 4 - Long-term Management of Data Archives

I1.1: Christophe Arviset, ESA, European Space Astronomy Centre, Madrid, Spain
From ISO to Gaia: a 20-year journey through data archives management

In the mid-90s, ESA decided to change its data management strategy and started to build data archives at ESAC for its space science missions. It began with the Infrared Space Observatory, then expanded to other astronomy missions and, later on, to planetary and solar and heliospheric missions. The ESAC Science Data Centre now hosts more than 15 science archives, with various others in preparation. Technology has evolved a lot through this period, from simple web pages towards rich thin-layer web applications and interoperable archives with Virtual Observatory (VO) support built in. Maintaining old legacy archives while building new, state-of-the-art ones (e.g. Gaia), managing people and preserving expertise over many years, and offering innovative multi-mission services and tools to enable new science (ESASky) have been some of the many challenges that had to be dealt with. Future prospects also look exciting with the advent of the "Archives 2.0" concept, where scientists will be able to work "within" the archive itself, bringing their analysis code to the data and sharing their data, code and results with others. Data archives have been, and continue to be, in constant transformation, and they are now evolving into collaborative science exploitation platforms.
Oral abstract

O1.2: Sara Nieto, ESAC - European Space Astronomy Center, European Space Agency, Spain
THE EUCLID ARCHIVE SYSTEM: A Data-centric approach to big data

Euclid is ESA's M2 mission and a milestone in the understanding of the geometry of the Universe. Euclid faces two main challenges from a data-processing point of view. The first is the unprecedented accuracy that must be achieved to meet the scientific goals. The second is that the mission will depend heavily on the processing and reprocessing of ground-based data, which will form the bulk of the stored data volume. In total, Euclid will produce up to 26 PB of observations per year.

The Euclid Archive System (EAS) is at the core of the Euclid Science Ground Segment. It supports the processing and storage of Euclid data, from the raw frames to the creation of science-ready images and catalogues. The EAS consists of three components. The Data Processing System (DPS) provides a centralized metadata storage system to support data processing, while the Distributed Storage System (DSS) stores the data files. For long-term preservation, the EAS will provide access to the most valuable scientific metadata through the Science Archive System (SAS). The SAS is being built at the ESAC Science Data Centre (ESDC), which is responsible for the development and operation of the scientific archives for ESA's astronomy, planetary and heliophysics missions. The SAS is focused on the needs of the scientific community and will provide access to the most valuable scientific metadata through a set of public data releases.
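The division of labour among the three EAS components can be pictured with a small Python sketch. It assumes the following:

* A hypothetical metadata store plays the DPS role.
* Each record carries its file's location in distributed storage, which is the DSS role.
* A filter exposes only publicly released products as flat, relational-style rows, which is the SAS role.

Class names, fields and the `dss://` location scheme are invented for illustration. They do not describe the actual Euclid implementation.

```python
# Hypothetical sketch of the DPS / DSS / SAS split; not the real EAS code.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductMetadata:
    product_id: str
    product_type: str        # hypothetical product type label
    dss_location: str        # where the file lives in distributed storage
    release: Optional[str]   # public data release tag, or None if internal


class MetadataStore:
    """Plays the DPS role: holds metadata only, never the file bytes."""

    def __init__(self) -> None:
        self._records: dict[str, ProductMetadata] = {}

    def register(self, meta: ProductMetadata) -> None:
        self._records[meta.product_id] = meta

    def locate(self, product_id: str) -> str:
        # A pipeline asks the metadata store where a file is,
        # then fetches the bytes from the storage system (DSS role).
        return self._records[product_id].dss_location

    def public_rows(self, release: str) -> list[tuple[str, str]]:
        # SAS role: only products in a public release, flattened
        # into relational-style (product_id, product_type) rows.
        return [
            (m.product_id, m.product_type)
            for m in self._records.values()
            if m.release == release
        ]


store = MetadataStore()
store.register(ProductMetadata("p1", "catalogue", "dss://node3/p1.fits", "DR1"))
store.register(ProductMetadata("p2", "raw_frame", "dss://node1/p2.fits", None))
print(store.locate("p2"))        # → dss://node1/p2.fits
print(store.public_rows("DR1"))  # → [('p1', 'catalogue')]
```

Keeping file bytes out of the metadata store is what lets processing and storage scale separately. The public view is a derived, flattened projection rather than a copy of the full operational model.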
The DPS implements the object-oriented Euclid Common Data Model, which describes both the scientific data (data products generated by pipelines) and the processing/operational metadata. The latter includes the processing and data distribution orders, the location of each file in the DSS, and processing plans. The content of the DPS is mapped to the SAS, which implements the relational Science Exploitation Data Model, optimised for scientific exploration. We review the architectural design of the system, progress in implementation and testing, and the main challenges in building the EAS.

Oral abstract

O1.3: Stephan Witz, NRAO - National Radio Astronomy Observatory
Towards a Self-Healing Archive

The new NRAO Archive encompasses data from the Jansky VLA, the legacy VLA, the Green Bank Telescope and the VLBA, while additionally providing access to ALMA data stored and managed separately. In this environment, metadata is extracted centrally but generated independently by different software for each instrument. Errors in metadata generation and extraction are unavoidable, but after fixing the bug, how do you correct the data? This paper introduces a self-healing approach that leverages otherwise idle archive storage nodes by having them continuously re-parse stored metadata with the latest software. Upon detecting a difference, the re-parser can take certain actions, such as updating incorrect records in the searchable metadata database or broadcasting a notification. Data validity can be verified at the same time if desired.
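The re-parse-and-compare loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. Stored files and the searchable metadata database are stand-in dictionaries. `parse`, `notify` and the `repair` flag are hypothetical hooks: the first stands in for the latest parser, the second for the notification channel, and the third controls whether records are updated automatically.

```python
# Hypothetical sketch of a self-healing metadata pass.
from typing import Callable, Optional

Record = dict[str, str]


def heal(
    stored_files: dict[str, bytes],
    metadata_db: dict[str, Record],
    parse: Callable[[bytes], Record],
    notify: Callable[[str, Optional[Record], Record], None],
    repair: bool = True,
) -> list[str]:
    """Re-parse every stored file with the current parser and return the IDs
    whose database record differed from the fresh result."""
    changed = []
    for file_id, raw in stored_files.items():
        fresh = parse(raw)               # metadata from the latest software
        old = metadata_db.get(file_id)   # what the search database holds now
        if fresh != old:
            changed.append(file_id)
            notify(file_id, old, fresh)  # e.g. broadcast a notification
            if repair:
                metadata_db[file_id] = fresh  # correct the searchable record
    return changed


def parse_v2(raw: bytes) -> Record:
    # Stand-in "fixed" parser for KEY=VALUE headers.
    key, value = raw.decode().split("=")
    return {key.lower(): value}


files = {"obs1": b"BAND=L", "obs2": b"BAND=X"}
db = {"obs1": {"band": "L"}, "obs2": {"band": "C"}}  # obs2 was mis-parsed earlier
log = []
print(heal(files, db, parse_v2, lambda fid, old, new: log.append(fid)))  # → ['obs2']
print(db["obs2"])  # → {'band': 'X'}
```

Running this on idle storage nodes, as the abstract proposes, would amount to scheduling `heal` over each node's local files. A data-validity check, for example a checksum comparison, could be added inside the same loop.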
Oral abstract

Session 2 - Key Theme 4 - Long-term Management of Data Archives

O2.1: Walter Landry, IRSA - NASA/IPAC Infrared Science Archive
Instantaneous Archives

The NASA/IPAC Infrared Science Archive (IRSA) is one of the largest and busiest astronomy archives in the world. In the past, our main emphasis was on making new data and new capabilities available. With the widespread implementation of Virtual Observatory protocols, there are now a number of useful tools that can quickly and easily run insightful, sophisticated queries against archives around the world. If not handled quickly, these queries can easily overwhelm the site and interfere with other users. In addition, reducing latency below the threshold of human perception enables more interactive and exploratory science. In this talk, I will discuss our multi-pronged efforts to improve performance at all levels. These include:
1) Upgrading network hardware and links.
2) Fine-tuning the indexing and partitioning strategies for our traditional databases.
3) Benchmarking various spatial indexing schemes (htm, q3c, h3c, postgis).
4) Rearchitecting our query pipeline to eliminate process, filesystem, and database connection overheads.
Taken together, these changes have delivered radical, order-of-magnitude improvements in latency and throughput. I will also discuss how distributed and in-memory databases could be used to improve performance even further.

Oral abstract

O2.2: Sarah Frances Graves, East Asian Observatory
The JCMT SCUBA-2 Legacy Release: Unexpected Benefits and Lessons Learned

The East Asian Observatory is currently releasing all SCUBA-2 850 µm data taken from 2011 to 2015, re-reduced in a uniform manner with automatically produced coadds and catalogs. While the primary reason for this release was to produce a scientifically useful data product for our community, the process of creating it generated many benefits for the observatory itself. We produced a new 'generic'