Transcript

A COURSE ON ASTRONOMY AND ASTROPHYSICS, IUCAA

MODULE 9 ASTRONOMY FROM ARCHIVAL DATA

Chapter 9.2 Astronomy from archival data - II Yogesh Wadadekar, NCRA-TIFR

[00:00:10] The tools that we have looked at so far, TOPCAT, Aladin, etc., were developed as part of a worldwide effort known as the Virtual Observatory. Let's spend a few minutes looking at what the Virtual Observatory is all about.

Section 9.2.1 The Virtual Observatory

[00:00:32] The Virtual Observatory had a vision that astronomical data sets and other resources related to astronomy should work as a seamless whole. Many different projects and many different organisations and data centres across the world are working towards this goal. They have come together and formed the International Virtual Observatory Alliance, which debates and agrees on the technical standards that are needed to make a working Virtual Observatory.

[00:01:11] So the main contribution of the International Virtual Observatory Alliance is the development of standards and protocols that ensure that astronomical data taken with different telescopes, in space and on the ground, are completely interoperable. That means you can take data from one telescope, analyse it with any software, and then combine it with data obtained at some other telescope.

[00:01:48] The organisation also acts as a focus group for VO aspirations, provides a framework for discussing and sharing ideas and technology, and is a body for promoting and publicising the Virtual Observatory. It has done a lot of good work over the last decade or decade and a half. And the tools that we have looked at, TOPCAT, Aladin, etc., were all developed by the VO community.

[00:02:23] India is one of the pioneers in this effort and has developed other tools like VOPlot, a VO-compatible plotting tool used by many applications within the VO community. In fact, tools like Aladin and TOPCAT use VOPlot as a plotting tool for their applications. A number of VO tools have been developed for different applications, and a full list is available at the URL listed at the bottom of this page. There are some important concepts related to the VO that anyone who wants to use archival data for research should be aware of, so I'm just going to talk about them very briefly.

[00:03:24] The first is the FITS data format, which we've already encountered. It stands for the Flexible Image Transport System, and it is the standard format for astronomical images and tables. It provides a standard way of expressing metadata, that is, descriptions of the data contained in the image or table, as pairs of a keyword and its value. It specifies a small number of mandatory keywords along with a somewhat larger number of reserved keywords, which must carry their defined meaning whenever a file uses them.
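To make the keyword and value idea concrete, here is a minimal Python sketch that splits one FITS header card into its keyword, value and comment. It assumes the fixed-format card layout of the FITS standard (80-character cards, keyword in columns 1 to 8, the value indicator "= " in columns 9 and 10) and deliberately handles only simple strings, logicals, integers and floats; escaped quotes, `D` exponents and long-string conventions are ignored. The example cards are invented, and for real work a library such as `astropy.io.fits` does this properly.

```python
def parse_card(card):
    """Split one FITS header card into (keyword, value, comment).

    Simplified sketch: understands quoted strings, logicals (T/F), integers and floats.
    """
    card = card.ljust(80)[:80]            # cards are exactly 80 characters
    keyword = card[:8].strip()            # columns 1-8 hold the keyword
    if card[8:10] != "= ":                # columns 9-10 hold the value indicator
        # COMMENT, HISTORY, END and similar cards carry no value.
        return keyword, None, card[8:].strip()
    rest = card[10:]
    if rest.lstrip().startswith("'"):
        # String value: the text between the opening and closing quotes.
        start = rest.index("'")
        end = rest.index("'", start + 1)
        value = rest[start + 1:end].rstrip()   # trailing blanks are not significant
        comment = rest[end + 1:].partition("/")[2].strip()
        return keyword, value, comment
    raw, _, comment = rest.partition("/")
    raw = raw.strip()
    if raw in ("T", "F"):
        value = raw == "T"
    else:
        try:
            value = int(raw)
        except ValueError:
            value = float(raw)
    return keyword, value, comment.strip()


header = [
    "SIMPLE  =                    T / conforms to FITS standard",
    "BITPIX  =                   16 / bits per data value",
    "OBJECT  = 'M31     '           / target name",
    "END",
]
for card in header:
    print(parse_card(card))
```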

[00:04:10] The second major innovation brought in by the VO was the Astronomical Data Query Language, or ADQL for short. What is this? It's a standard language for querying any astronomical database, and it is very closely aligned to SQL, the Structured Query Language. Standardization of this kind was necessary because the popular commercial and open-source variants of SQL all differ slightly, and the Virtual Observatory wanted a standard way of specifying a region, which SQL does not provide.

[00:04:59] A region is basically a small part of the sky, a concept that comes entirely from astronomy. Not surprisingly, SQL, which is a general-purpose database language, is not able to handle this concept. So standard SQL was taken and enhanced with some astronomy-specific capabilities, and what emerged from that process was ADQL, which is used for querying any kind of astronomical database.
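As a sketch of what the region extension looks like, the hypothetical helper below builds an ADQL query using the geometry functions `CONTAINS`, `POINT` and `CIRCLE`, which plain SQL lacks. The table name `radio_catalog` and the column names `ra` and `dec` are assumptions for illustration; each real service documents its own schema.

```python
def adql_cone_query(table, ra_deg, dec_deg, radius_deg, columns=("ra", "dec")):
    """Build an ADQL query returning rows inside a circle on the sky.

    Assumes (hypothetically) that the table stores ICRS positions in
    columns named ra and dec, in degrees.
    """
    if not 0 < radius_deg <= 180:
        raise ValueError("radius must lie in (0, 180] degrees")
    column_list = ", ".join(columns)
    return (
        f"SELECT {column_list} FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )


print(adql_cone_query("radio_catalog", 10.68, 41.27, 1.0, columns=("name", "ra", "dec", "flux")))
```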

[00:05:43] The most basic thing that one can do is what is called a Cone Search. You want to know: in your survey, your catalogue or the image that you're providing, which sources are located within a given angular distance of a certain position? So I might specify a position on the sky by its right ascension and declination and ask the Virtual Observatory service: give me a list of all sources in your database that are within one degree of the position that I have specified.
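Behind the scenes, a Cone Search amounts to computing the angular separation between the requested position and each candidate source, and keeping those within the radius. Here is a minimal Python sketch using the haversine formula; the small catalogue is made up, and a real service would use a spatial index rather than a linear scan. Away from the equator a given difference in right ascension spans a smaller angle on the sky, which the formula handles through its cos(dec) terms.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance between two sky positions, all angles in degrees (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra0, dec0, radius_deg):
    """Return the catalogue entries lying within radius_deg of (ra0, dec0)."""
    return [src for src in catalog
            if angular_separation_deg(ra0, dec0, src["ra"], src["dec"]) <= radius_deg]


# Made-up catalogue entries near M31 (RA 10.68, Dec 41.27 degrees).
catalog = [
    {"name": "A", "ra": 10.70, "dec": 41.30},
    {"name": "B", "ra": 13.00, "dec": 41.30},
    {"name": "C", "ra": 10.70, "dec": 43.00},
]
print([src["name"] for src in cone_search(catalog, 10.68, 41.27, 1.0)])
```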

[00:06:29] Viewed in 3D, of course, this radius defines a cone of space. In a Cone Search you get back a fixed set of columns, set by the data service in question. So if I query a data service that holds a radio source catalogue, it might return the positions of all the radio sources in the part of the sky that I am interested in, along with some information about them: how bright they are, how large they are, whether they are seen at other frequencies, what their redshift is, and so on.
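On the wire, the IVOA Simple Cone Search protocol is an HTTP GET request that carries the position and radius as `RA`, `DEC` and `SR` parameters in decimal degrees. The service answers with a VOTable whose columns it chooses itself. The sketch below only builds such a URL; `https://example.org/scs` is a placeholder, not a real service.

```python
from urllib.parse import urlencode


def scs_url(base_url, ra_deg, dec_deg, sr_deg):
    """Build a Simple Cone Search request: RA, DEC and SR (search radius) in decimal degrees."""
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": sr_deg})
    separator = "&" if "?" in base_url else "?"   # the base URL may already carry parameters
    return f"{base_url}{separator}{query}"


# Placeholder endpoint; a real response would be a VOTable with service-defined columns.
print(scs_url("https://example.org/scs", 10.68, 41.27, 1.0))
```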

[00:07:10] A concept related to the Cone Search is Simple Image Access, which is specified in a protocol known as SIAP, the Simple Image Access Protocol. A SIAP service is an archive that returns astronomical images covering a specified position and radius. So it goes beyond what a Cone Search does: a Cone Search will only give you a list, a table of astronomical sources in that part of the sky, whereas a SIAP service will return entire images of that part of the sky.

[00:07:48] Similar to SIAP is SSAP, the Simple Spectral Access Protocol, which, instead of returning an image, returns spectra for the objects located in that part of the sky.
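The image and spectrum protocols follow the same pattern: a position plus a size in an HTTP GET request. The sketch below builds request URLs in the style of SIA version 1 (`POS`, `SIZE`, `FORMAT`) and SSA version 1 (`REQUEST=queryData`, `POS`, `SIZE`). The base URLs are placeholders, and newer protocol versions use different parameters, so check each service's documentation before relying on these names.

```python
from urllib.parse import urlencode


def sia_query_url(base_url, ra_deg, dec_deg, size_deg, image_format="image/fits"):
    """SIA v1-style query: images overlapping a region size_deg across, centred on POS."""
    params = {"POS": f"{ra_deg},{dec_deg}", "SIZE": size_deg, "FORMAT": image_format}
    return f"{base_url}?{urlencode(params, safe=',')}"   # keep the comma in POS readable


def ssa_query_url(base_url, ra_deg, dec_deg, diameter_deg):
    """SSA v1-style query: spectra within a circle of the given diameter around POS."""
    params = {"REQUEST": "queryData", "POS": f"{ra_deg},{dec_deg}", "SIZE": diameter_deg}
    return f"{base_url}?{urlencode(params, safe=',')}"


# Placeholder endpoints, queried around M31.
print(sia_query_url("https://example.org/sia", 10.68, 41.27, 0.5))
print(ssa_query_url("https://example.org/ssa", 10.68, 41.27, 0.1))
```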

[00:08:07] In addition to using FITS as a format for tables, the Virtual Observatory developed its own format for storing tabular data, called VOTable. It's the standard format for storing and exchanging tabular data within the Virtual Observatory. Most data archives now offer export of data in VOTable format, and a large number of tools read VOTables. When we looked at TOPCAT, we saw that it could read FITS tables, but it can just as happily read VOTables. Any FITS table can be expressed as a VOTable, so inter-conversion is possible: for example, you could load a FITS file in TOPCAT and then write it out as a VOTable.

[00:09:06] In order to improve access to astronomical catalogue data or tables, the VO developed a new protocol called TAP, the Table Access Protocol. TAP provides query-driven access to astronomical tables and databases. By contrast, when you do a simple Cone Search, you can only search by sky position, and it returns a fixed set of columns specified by the data service that you are accessing.

[00:09:38] A TAP service, on the other hand, allows you to make searches along the lines of: give me all the records with B-V greater than 2.0 (B-V is a colour index), and give me just columns B, D, F and G. So you can choose which columns you want, and you can put conditions on the kind of sources that you are looking for. This kind of search can become arbitrarily complex. You can say: show me only quasars with redshifts between 5 and 6 in this part of the sky. Or: show me a list of white dwarfs that are brighter than a certain magnitude in this part of the sky, and so on.
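Because ADQL is built on SQL, the column selection and colour cut described above are just a `SELECT` list and a `WHERE` clause. The sketch below runs the equivalent query against a throwaway in-memory SQLite table using Python's standard `sqlite3` module. The table, its columns and its values are invented. SQLite stands in for a TAP service here only to show the shape of the query, and it has no ADQL geometry functions.

```python
import sqlite3

# Hypothetical photometric table: identifiers and magnitudes are made up.
ROWS = [
    ("obj1", 18.2, 15.9, 10.60, 41.20),
    ("obj2", 17.0, 16.5, 10.70, 41.30),
    ("obj3", 19.4, 17.1, 10.90, 41.10),
]


def select_red_sources(rows, min_colour=2.0):
    """Return (id, ra, dec, b_v) for rows with B - V greater than min_colour."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE photometry (id TEXT, b REAL, v REAL, ra REAL, dec REAL)")
        conn.executemany("INSERT INTO photometry VALUES (?, ?, ?, ?, ?)", rows)
        # The chosen columns go in the SELECT list; the colour condition goes in WHERE.
        query = ("SELECT id, ra, dec, b - v AS b_v FROM photometry "
                 "WHERE b - v > ? ORDER BY id")
        return conn.execute(query, (min_colour,)).fetchall()
    finally:
        conn.close()


for row in select_red_sources(ROWS):
    print(row)
```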


[00:10:30] Queries to a TAP service need to be formulated in the standard Astronomical Data Query Language, ADQL. Often, though, the tool that you are using will construct this for you automatically, and there are now some advanced tools that allow you to do TAP queries without knowing the details of ADQL.
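For a sense of what such a tool sends, a synchronous TAP query is an HTTP request to the service's `/sync` endpoint that carries the ADQL text. The Python sketch below only builds the URL, using the TAP 1.0 parameter names `REQUEST=doQuery`, `LANG=ADQL` and `QUERY`. The service address and the `photometry` table are placeholders. Clients such as TOPCAT, or the `pyvo` Python package, take care of this for you, including the asynchronous mode used for long-running queries.

```python
from urllib.parse import urlencode


def tap_sync_url(tap_base_url, adql):
    """Build a synchronous TAP request URL carrying an ADQL query (TAP 1.0 parameter names)."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return f"{tap_base_url.rstrip('/')}/sync?{urlencode(params)}"


# Placeholder service and hypothetical table: red sources, only the columns we want.
query = "SELECT TOP 10 id, ra, dec FROM photometry WHERE b - v > 2.0"
print(tap_sync_url("https://example.org/tap/", query))
```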

Section 9.2.2 Major astronomical archives

[00:11:04] Let us now move on and look at some of the major astronomical archives that exist. Large archives are found at all wavelengths, and there are a very large number of them. I found it really difficult to shortlist a few to show you, but simply listing all of them would not have been very useful. So I have chosen a few archives at optical, radio and X-ray wavelengths, which I have listed here. I feel that these archives are useful to a large community. Of course, this is my biased view, and I apologise if I have left your favourite archive out of this list. So let's look at some of the major archives in optical astronomy.

[00:12:03] As you know, the Hubble Space Telescope has been one of the major astronomy facilities over the last two and a half decades or so. They have constructed a very large archive, known as the Hubble Legacy Archive, located at the URL shown there. A very large optical survey was carried out starting in 2000. It is still ongoing, and we shall go into its details in a few minutes. This is the Sloan Digital Sky Survey, and it of course has a very large archive.

[00:12:46] ESO, the European Southern Observatory, headquartered in Germany, is a consortium of mostly European countries that runs a number of telescope facilities in the southern hemisphere, in Chile. It has a large archive, which is also very useful.

[00:13:10] A satellite named Gaia was launched about five years ago to study, using parallax measurements, the distribution of stars in our Galaxy, and also to measure radial velocities for many of them as well as their proper motions. The Gaia data archive had a major data release, referred to as Data Release 2 or DR2, a few months ago, and it is available at this URL. It's an extremely large database because they have measured parameters for more than a billion stars. They will come out with a series of data releases that will refine these measurements over the next few years. So it is a very important archive, particularly for people who want to study stars in our Galaxy.
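The basic parallax relation is simple: the distance in parsecs is the reciprocal of the parallax in arcseconds. Gaia catalogues quote parallaxes in milliarcseconds, so the Python sketch below divides 1000 by the parallax. This naive inversion is only reasonable when the parallax is measured with a small fractional error. For noisy or negative parallaxes, a proper probabilistic treatment is needed.

```python
def parallax_to_distance_pc(parallax_mas):
    """Distance in parsecs from a parallax in milliarcseconds, via d = 1 / p[arcsec]."""
    if parallax_mas <= 0:
        raise ValueError("naive inversion needs a positive parallax")
    return 1000.0 / parallax_mas


for p_mas in (100.0, 10.0, 1.0):
    print(f"parallax {p_mas} mas -> {parallax_to_distance_pc(p_mas):.0f} pc")
```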

[00:14:13] In radio bands, there are again a number of archives, and I have chosen three here. The first one listed is the archive of the GMRT, the Giant Metrewave Radio Telescope, which operates at a site about 80 kilometres north of Pune in India. Over the last 15 years, the GMRT has archived all the interferometric data gathered since Cycle 1, and they are available at this URL. We are going to look at that in a minute.

[00:14:49] The other large radio telescope is the VLA, the Very Large Array in the United States, now referred to as the Jansky VLA. This telescope has been collecting data for more than 30 years, and all of it is available in the VLA data archive.

[00:15:12] A new millimetre and sub-millimetre telescope has come online over the last few years. Its name is ALMA, the Atacama Large Millimeter/submillimeter Array, and it is located in Chile. It explores a region of parameter space, in terms of wavelength coverage, resolution and sensitivity, that has not been probed before. Its archive is therefore very important, and it's available at the URL that I've indicated there.

[00:15:46] In X-rays, there are two large telescopes that have been observing for many years now. The Chandra telescope has its own archive, the Chandra Data Archive, which is hosted at Harvard. XMM-Newton, a European-led X-ray telescope, also has its own large archive, which is given at this URL.

[00:16:16] In addition to the archives that I have described here, there are literally hundreds of other websites that serve data, typically releases from large surveys. We will look at the Sloan Digital Sky Survey archive as a case study of how a well-designed archive works. And how it allows you to access data in a wide variety of ways. But before we go there as an example of a large archive of radio data, let me show you the GMRT data archive.

[00:17:12] So here is the search page of the archive. Many other archives have a similarly simple, easy-to-use interface like this one that allows you to search for data. You could, for example, type in a proposal code and search for all data in the archive that are available for that proposal code. For the GMRT archive, you can search for data without being a registered user, but if you want to download data, you need to register and then log in. When you do that, the archive, like many others, provides you with a shopping-cart kind of interface. So you go shopping for data: you identify the data sets that you want, simply add them to the cart, and you're good to go.

[00:18:25] A search can be quite detailed. So here is our first search page. For the GMRT, you can search by proposal: if you know the code of the observing proposal that you're interested in, you can use that. You can use the name of the principal investigator or the title of the proposal, and even the scientific category and the observation type can be used to narrow down your search. You can search by observation number if you happen to know it. You can also search by coordinates, so let's try that.

[00:19:03] Let's say I'm interested in the galaxy M31, the Andromeda galaxy. I don't even have to remember its right ascension and declination: I simply type M31 and use either SIMBAD or NED to resolve the coordinates. This sends a query from my machine to these Virtual Observatory services, which return the right ascension and declination of this particular galaxy. It then asks me for the search radius, and perhaps I could increase it to about 60 arcminutes. I could narrow the search down by frequency band, channel spacing, time on source and things like that, but I'm not going to do that for now. And I run the search.

[00:19:56] When I do that, I get a number of results. For each one, it tells me the proposal ID, who the PI (Principal Investigator) was, what the observation number is, the exact RA and Dec that were observed, the time spent on source, and the frequency at which it was observed. If I select all of them and click "show all in Google Sky", I get a pictorial representation of where these observations are. In the background is an optical image of the Andromeda galaxy, and overplotted on it you can see circles of various colours. You will notice that the blue circles are the smallest ones.

[00:20:42] The green circles are somewhat bigger, the yellow circles are bigger still, the orange circles even bigger, and so on. The diameter of each circle corresponds to the half-power beam width of the GMRT at the frequency of that observation. Naturally, the smallest circles correspond to the highest frequency, the 1420 megahertz L-band of the GMRT, and as you go to lower and lower frequencies, the diameter of the circles increases.
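The trend in circle sizes follows from diffraction: the half-power beam width of a dish scales roughly as λ/D, so it shrinks as frequency rises. The Python sketch below evaluates k·λ/D for the GMRT's 45 m dishes, assuming k = 1 and the nominal legacy GMRT bands of 150, 235, 325, 610 and 1420 MHz. The factor k depends on how the dish is illuminated, so only the scaling with frequency should be trusted here. The absolute beam sizes are best taken from the observatory's documentation.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0
GMRT_DISH_DIAMETER_M = 45.0


def hpbw_arcmin(freq_mhz, diameter_m=GMRT_DISH_DIAMETER_M, k=1.0):
    """Approximate primary-beam half-power width, k * lambda / D, in arcminutes.

    k depends on the dish illumination; k = 1.0 is an assumption, so treat the
    absolute numbers as rough and the frequency scaling as the point.
    """
    wavelength_m = SPEED_OF_LIGHT_M_S / (freq_mhz * 1e6)
    return math.degrees(k * wavelength_m / diameter_m) * 60.0


for band_mhz in (150, 235, 325, 610, 1420):
    print(f"{band_mhz:5d} MHz: ~{hpbw_arcmin(band_mhz):5.1f} arcmin")
```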

[00:21:18] So what we are able to see here, at a glance, is which part of the Andromeda galaxy is covered by observations at which frequency. You can see that the centre of the Andromeda galaxy has a nice 610 megahertz pointing, shown in green. The outer edges of the galaxy have data at different frequencies. Unfortunately, the L-band data do not cover the centre of the galaxy, only the edges.

[00:21:58] So at a glance we are able to tell what data are available. With one click, you could also show the time on source for each of these observations, and thereby determine whether an observation in the archive is of scientific interest to you. The GMRT archive currently provides only raw data: the raw interferometric visibilities as observed at the telescope. There are plans to process these raw data sets into user-friendly, science-ready images. If and when that happens, we will make sure that these are well integrated with the Virtual Observatory, so that people are able to use tools like Aladin and TOPCAT to visualise the very large number of radio images that will be produced by this effort.

Section 9.2.3 Digital sky survey, SDSS

[00:23:05] Let us now move on and take a detailed look at one of the largest surveys that has been carried out so far, the Sloan Digital Sky Survey. I'm going to spend the next several minutes describing what this survey is all about, and at the end I'll tell you why I went into such great detail in describing it.

[00:23:40] We've already looked at the Palomar Observatory Sky Survey, which was carried out in the optical in the 1950s. The left panel shows an image of a small portion of the sky as observed in the Palomar Observatory Sky Survey, and the right panel shows an image taken by the Sloan Digital Sky Survey. You can see very easily that there is a dramatic improvement in the quality of the data as you go from a digitised photographic plate to a fully digital survey.

[00:24:24] So what are the basic characteristics of the Sloan survey? It is an imaging and spectroscopic survey of π steradians, or about 10,000 square degrees, roughly a quarter of the sky, using a custom-designed telescope. It carries out near-simultaneous imaging in five optical broadband filters using a large 120-megapixel camera. For some of the objects detected in the imaging, it carries out follow-up spectroscopy, and these include more than 100,000 quasar candidates and more than a million galaxies. The main science goal of the original survey was to study the large-scale structure of the Universe. Over time this goal has expanded, and the Sloan survey is used today to study everything from asteroids to the most distant quasars.
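The quoted area is easy to check. One steradian is (180/π)² ≈ 3283 square degrees, and the whole sky is 4π steradians. So π steradians is about 10,300 square degrees, exactly a quarter of the sky. A short Python check:

```python
import math

SQ_DEG_PER_SR = (180.0 / math.pi) ** 2   # one steradian in square degrees (about 3282.8)
FULL_SKY_SR = 4.0 * math.pi               # solid angle of the whole celestial sphere


def sr_to_sq_deg(solid_angle_sr):
    """Convert a solid angle from steradians to square degrees."""
    return solid_angle_sr * SQ_DEG_PER_SR


survey_sr = math.pi
print(f"pi sr = {sr_to_sq_deg(survey_sr):.0f} square degrees")
print(f"fraction of the sky = {survey_sr / FULL_SKY_SR:.2f}")
```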

[00:25:35] Why is the Sloan survey so special? It's special because it's the largest freely available homogeneous data set of imaging and spectroscopy. The quality of the data and of the data processing are uniform and excellent. Many data products are available and we will look at some of them: ranging all the way from raw images and spectra to catalogs of measured parameters from the images and spectra.

[00:26:13] It is such a monumental effort that it is unlikely to be superseded for decades. Just as the Palomar Observatory Sky Survey was the dominant sky survey for about 40 years, the Sloan Digital Sky Survey is likely to be used by astronomers for many decades. Since the data sets are precisely defined and free, other researchers can easily reproduce your results. This is very important because provenance, the knowledge of exactly how a piece of data was processed, greatly improves the credibility of a scientific result.

[00:26:58] The Sloan survey is carried out with a dedicated 2.5 metre telescope with a fairly large 3 degree field of view, in an f/5 configuration. There is no dome. The telescope is protected by a co-moving light baffle structure, which you can see here in the background. It is supported by a smaller 20-inch telescope that is used for photometric calibration.

[00:27:32] It is located at the Apache Point Observatory in New Mexico. Towards the left you can see the Sloan telescope, hanging over the cliff. Towards the centre, the small shiny dome is the 20-inch photometric calibration telescope, and there is a larger 3.5 metre telescope in the background.

[00:28:02] This is an image of the Sloan imaging camera, which contains a mosaic of a very large number of CCDs. You will see here that there are six columns of CCDs, each column containing one CCD in each of five filters, ranging from the redder end of the optical spectrum to the bluest end. So there are six CCDs in each filter and five filters, for a total of 30 CCDs. These CCDs are rather large, and the camera is correspondingly quite big.

[00:28:55] It operates in what is known as drift-scan mode: stars are allowed to drift across the CCDs, and the CCDs are read out at the same rate at which the stars pass over them. In this arrangement, each object is observed for 54.1 seconds in each filter, in the order r, i, u, z, g. In addition to the 30 large CCDs that record the photometry of the sky, there are some smaller auxiliary CCDs that are used for calibration and astrometry. So there are a total of 54 CCDs on the focal plane, with a total of 145 million pixels. The data rate is not high by modern standards, but it was quite high 20 years ago when the survey started: about 5 megabytes per second, or 18 gigabytes per hour. The camera can cover the sky at 20 square degrees per hour.
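The quoted rates are consistent with each other, as this back-of-the-envelope Python calculation shows: 5 MB per second over 3600 seconds is 18 GB per hour. The code also extends the calculation to a 10,000 square degree footprint at 20 square degrees per hour. That extension is an illustrative extrapolation that assumes a single pass with no overlaps, repeats or calibration overhead, so the resulting raw-imaging volume is an order-of-magnitude estimate only.

```python
MB_PER_SECOND = 5.0            # imaging data rate quoted above
SECONDS_PER_HOUR = 3600
SQ_DEG_PER_HOUR = 20.0         # sky coverage rate quoted above
SURVEY_AREA_SQ_DEG = 10_000.0  # nominal survey footprint

gb_per_hour = MB_PER_SECOND * SECONDS_PER_HOUR / 1000.0   # decimal units: 1 GB = 1000 MB
# Illustrative extrapolation: single pass, no overlaps, repeats or calibration overhead.
imaging_hours = SURVEY_AREA_SQ_DEG / SQ_DEG_PER_HOUR
raw_volume_tb = imaging_hours * gb_per_hour / 1000.0

print(f"{gb_per_hour:.0f} GB per hour")
print(f"about {imaging_hours:.0f} hours of imaging, about {raw_volume_tb:.0f} TB of raw pixels")
```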

[00:30:15] Objects observed in the imaging are followed up with spectroscopy, and there are two spectrographs. You see one of them here, towards the left. It is permanently mounted at the back end of the telescope. There are two spectrographs because that allows you to gather twice as many spectra in one go.

[00:30:39] Here is a little schematic of the SDSS spectrographs. A metal plate with holes drilled into it is fixed at the focal plane. From the holes, optical fibres lead into one of two dual spectrographs. In the original SDSS there were 320 fibres going into each spectrograph, and that number has since been increased. In each spectrograph, a dichroic splits the light, sending part of it into a blue camera and part into a red camera. Each arm has its own grism, and the blue camera with blue-sensitive CCDs and the red camera with red-sensitive CCDs gather spectra simultaneously. So in one go, in about an hour, several hundred spectra of stars and galaxies can be gathered.

[00:31:48] This is how the SDSS operates. It obtains imaging data on pristine nights, when there are no clouds or Moon and the atmospheric seeing is excellent. The photometric observations are automatically calibrated with an auxiliary telescope. The data are processed to the point of calibrated object lists, with about two kilobytes of information measured for every object. From the resulting catalogues, one can extract lists of galaxies that are brighter than a certain r magnitude, as well as quasar candidates.

[00:32:32] These spectroscopic targets are then assigned to spectroscopic plates so as to maximise the observing efficiency. Once you have the positions where you want to obtain spectra, you drill the spectroscopic plates and plug the fibres in by hand. You then determine the correspondence between each fibre and the object it observes, load the plate onto the telescope, and observe it spectroscopically on non-pristine nights. The output is calibrated spectra, classifications, redshifts, etc. This is an extremely complex operation that requires a large number of people, but it has been carried out very efficiently for the last two decades or so.

[00:33:22] The survey first went online in the summer of 2000. After five years of operation were completed, it received a three-year extension called SDSS-II. SDSS-II completed the original extragalactic survey for large-scale structure and added two new components. One was SEGUE, the Sloan Extension for Galactic Understanding and Exploration, which obtained spectroscopy for a large number of stars within our Galaxy. The other was a survey for Type Ia supernovae, which are important for cosmology.

[00:34:00] That phase of SDSS was again very successful, and it got a further extension: SDSS-III ran from 2008 to 2014. SDSS-IV is currently ongoing and will run for a six-year period starting in 2014. Plans are already fairly advanced for the SDSS-V project, which will start from 2020 onwards.

[00:34:31] SDSS-IV, the survey that is currently running, again has three major components. It is extending precision cosmological measurements to a critical early phase of cosmic history via the eBOSS project. The "e" stands for extended, and BOSS stands for the Baryon Oscillation Spectroscopic Survey, which uses baryon acoustic oscillations to study the cosmological history of the Universe.

[00:35:07] There is also a revolutionary infrared spectroscopic survey of stars in our Galaxy called APOGEE. It is now in its second iteration, APOGEE-2, which runs on two telescopes: the Sloan telescope in the northern hemisphere and a similar telescope in the southern hemisphere. What is new with SDSS-IV is that the Sloan spectrographs are now used to make spatially resolved maps of individual galaxies, under the MaNGA project: Mapping Nearby Galaxies at APO.

[00:35:46] The scientific returns from SDSS have been prodigious. As of July this year, more than 8,000 papers had been published in the refereed literature. These papers had been cited more than 400,000 times and had a citation h-index of 250. The survey has made discoveries in many scientific areas that were not originally planned for, for example asteroids, Kuiper belt objects and dwarf stars.
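For reference, a set of papers has h-index h when h of the papers have at least h citations each. Here is a minimal Python sketch of that definition, with made-up citation counts in the usage example:

```python
def h_index(citations):
    """Largest h such that h of the papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


print(h_index([25, 8, 5, 3, 3]))
```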

[00:36:22] From the beginning, the Sloan survey adopted a data distribution policy that was extremely democratic and open. Data are available to collaboration members immediately after they are obtained. Members can use these data for scientific analysis, and they can also use them to give SDSS operations feedback about the quality of the data being obtained. All the data are reduced through automatic pipelines and then released in a staggered manner, about once a year, as a public data release. This is why the SDSS is one of the largest archives available to us today. There was an early data release around 2000, followed by data releases DR1 through DR14, which have been made to date, and by the end of this year, 2018, DR15 should be out.

[00:37:32] Let us look at some of the features of the most recent data release 14, DR14. It has a large imaging footprint: 14,000 square degrees of sky. And there are a staggering 469 million unique objects that are listed in the survey. Spectroscopy is also extensive and the spectra that have been obtained so far include about two and a half million galaxies, more than 600,000 quasars, and nearly a million stars. These numbers encapsulate why the SDSS is a treasure trove for archival research in astronomy.

[00:38:23] Because the data volumes are so large, it is not possible for the SDSS collaboration to do all the research themselves. So even after many papers have been written, new papers exploring new concepts and ideas in astronomy continue to be tried and tested with the SDSS data archive. For almost any science that you can think of, and for almost any object in the Universe, the SDSS will provide a valuable resource.


[00:39:02] Their practices for data access and distribution have been adopted as best practices by the entire community, and the ease with which they made the data accessible set the standard for other surveys to emulate. So how does the SDSS provide this data access? We are going to go into that in a minute. But let me first justify why I've spent so much time describing this survey in great detail. When using data from this survey, or any other survey, for astronomical research, one needs to spend a considerable amount of time understanding the characteristics of the survey, its various biases and limitations, in order to produce results that are reproducible and believable. So if one were doing an analysis with SDSS data, one would need to understand in great depth how the survey was carried out and how it actually operates. That is why I spent so much time looking at SDSS.

Section 9.2.4 Accessing data from the sky survey, SDSS

[00:40:34] Let's now move on to the next stage: how does one access the SDSS data? SDSS data can be accessed in various ways. The most recent data releases serve data through the Science Archive Server, which lets you take an interactive look at the spectra and image mosaics and download raw as well as processed data. A companion website called SkyServer provides resources for learning SQL and browser-based access to the catalogues. Many users need to run complex SQL queries on the imaging and spectroscopic catalogues.

[00:41:30] This is made possible through the CasJobs interface, which allows you to run very short, quick queries, but also queries that take a long time to run and therefore have to be placed in a queue. After such a query runs, the user gets an email saying that the results are available and can then download them. There are also a number of tools to explore the objects in the survey, viewing the images and the spectra simultaneously, to generate finding charts and so on. We are now going to look at each of these in turn.

[00:42:17] So let us begin by searching for SDSS Data Release 14. I search for it, the URL loads up, and here I am on the landing page of SDSS Data Release 14. It tells you the main ways in which one can access the data, and this is indeed what we are looking for. As I mentioned previously, the DR14 Science Archive Server is an interactive way to access images and spectra. The DR14 SkyServer provides browser-based access to the Catalog Archive Server, with lots of resources for learning SQL and also projects to do science with. The third is CasJobs, which allows us to query the SDSS catalogues directly. If you are an expert and want to look directly at the images and spectra, the DR14 FITS files will allow you to do that. The data model gives you a description of the details of the Science Archive Server: its directory structure, file formats and so on.

[00:43:31] So let us begin by looking at the DR14 Science Archive Server. This page describes what is available and the various formats in which it is available. Let us now go to the SDSS SkyServer and see what tools it provides. As I mentioned, SkyServer gives you access not only to various tools to browse through the data, but also to a number of activities that educators can use in their lesson plans, college lab activities and so on.

[00:44:23] Let me spend a few minutes introducing these, because many of the listeners of these programs will be school and college teachers. If you go to the education tab on SkyServer, it gives you a list of projects that can be done with SDSS data, categorised into various subcategories. There are some fairly basic projects suitable for middle school and high school students. There are some advanced projects appropriate for very advanced high school and college students, and for people who want a detailed understanding of the underlying astronomy.

[00:45:09] They also have a category of projects that they refer to as research challenges. These are independent research projects in astronomy: they specify the problem for you, and how to solve it is left open. These are excellent problems for a science fair or some other major science project. There are some projects designed for very young kids, in the "for kids" section, and there are projects that are games, contests and so on.

[00:45:50] I don't have the time to go through all of these in detail, but I'm going to click on Advanced Projects and show you the kind of projects that are quite easy to do with SDSS data. The Hubble Diagram project allows you to retrace the steps of Edwin Hubble, who discovered the expansion of the Universe. You can measure the recession velocities of various galaxies from SDSS spectroscopy, gather information about their distances, for example from Cepheid variable observations, and then use that to verify the expansion of the Universe.
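
The arithmetic behind such a Hubble diagram can be sketched in a few lines of Python. This is a minimal sketch, assuming the low-redshift approximation v ≈ cz and a straight-line fit forced through the origin; the function names and the three sample galaxies are hypothetical placeholders, not SDSS or Cepheid measurements. The fitted slope of velocity against distance is the Hubble constant.

```python
# Minimal sketch of a Hubble-diagram fit.
# Assumptions: v = c*z (valid only for small z); the sample data are invented.

C_KM_S = 299_792.458  # speed of light in km/s


def recession_velocity(z: float) -> float:
    """Recession velocity in km/s from redshift, using v = c*z."""
    return C_KM_S * z


def fit_hubble_constant(distances_mpc, velocities_km_s):
    """Least-squares slope of v = H0 * d for a line through the origin."""
    num = sum(d * v for d, v in zip(distances_mpc, velocities_km_s))
    den = sum(d * d for d in distances_mpc)
    return num / den


# Hypothetical galaxies: (distance in Mpc, measured redshift)
sample = [(20.0, 0.00467), (50.0, 0.01168), (100.0, 0.02335)]
d = [dist for dist, _ in sample]
v = [recession_velocity(z) for _, z in sample]
print(f"H0 ~ {fit_hubble_constant(d, v):.1f} km/s/Mpc")
```

With real data the points scatter around the line, and the quality of the distance estimates usually dominates the uncertainty in the slope.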

[00:46:38] In the second project, titled Color, you can look at stars of various colours and try to understand how the colour of a star can give us a lot of information about the physical conditions of that star, like its mass, its surface temperature and so on. The third project is on spectral types. It tells you how astronomers make sense of the millions of stars that they see, and it teaches you how to use the spectrum of a star to classify it into its spectral type. You can do that yourself, and there's a vast amount of data for this: remember, the Sloan survey has now gathered spectra of about a million stars.
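
Why colour tracks temperature can be seen by treating a star as a blackbody and comparing its brightness in two filters. The Python sketch below is a simplification under stated assumptions: real stars are not perfect blackbodies, the effective wavelengths assumed for the SDSS g and r filters (477 nm and 623 nm) are approximate, and the resulting g − r is only defined up to an unknown zero-point offset. It shows the trend: hotter stars come out bluer (smaller g − r).

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K

# Assumed, approximate effective wavelengths of the SDSS g and r filters (m).
LAMBDA_G = 477e-9
LAMBDA_R = 623e-9


def planck_nu(wavelength_m: float, temp_k: float) -> float:
    """Blackbody specific intensity B_nu at the frequency matching wavelength_m."""
    nu = C / wavelength_m
    return (2 * H * nu**3 / C**2) / math.expm1(H * nu / (K_B * temp_k))


def g_minus_r(temp_k: float) -> float:
    """Blackbody g - r colour in magnitudes, up to an unknown zero-point offset."""
    ratio = planck_nu(LAMBDA_G, temp_k) / planck_nu(LAMBDA_R, temp_k)
    return -2.5 * math.log10(ratio)


for t in (3500, 5800, 10000):
    print(f"T = {t:>5} K  ->  g - r ~ {g_minus_r(t):+.2f} (relative)")
```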

[00:47:39] You can use the colours and magnitudes of stars to construct what is known as the Hertzsprung-Russell diagram, which is one of the most fundamental diagrams in stellar astrophysics. It's very easy to construct such a diagram with Sloan survey data, and this particular project teaches you how to do that. In the Galaxies project, you examine a large number of galaxies and try to understand their properties. In the Sky Surveys project, you learn how astronomers have mapped the sky, from ancient times to today's cutting-edge technology, and you will understand how archives have evolved from very simple beginnings to the complex system that is the Sloan Digital Sky Survey today.
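
To put stars from different distances on the same Hertzsprung-Russell diagram, apparent magnitudes have to be converted to absolute magnitudes with the distance modulus, m − M = 5 log10(d / 10 pc). A minimal Python sketch, assuming the distance is already known and ignoring interstellar extinction; the star's magnitudes and distance are made-up values.

```python
import math


def absolute_magnitude(apparent_mag: float, distance_pc: float) -> float:
    """Absolute magnitude from m - M = 5 log10(d / 10 pc).

    Ignores interstellar extinction (an assumption of this sketch).
    """
    return apparent_mag - 5.0 * math.log10(distance_pc / 10.0)


# Hypothetical star: g = 15.0 mag, r = 14.6 mag, at 1000 pc.
g, r, d = 15.0, 14.6, 1000.0
colour = g - r                     # x-axis of a colour-magnitude (HR) diagram
abs_r = absolute_magnitude(r, d)   # y-axis; brighter stars are more negative
print(f"g - r = {colour:.2f}, M_r = {abs_r:.2f}")
```

For stars in a single cluster, which are all at roughly the same distance, plotting colour against apparent magnitude already gives the shape of the diagram.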

[00:48:36] There's a separate project on quasars that will allow you to learn a lot more about quasars, which are among the most distant and most luminous objects in the entire Universe. So the projects listed in the Education section of SkyServer are a very useful and valuable resource for any kind of educational activity with SDSS data. In this context I would also like to mention the Galaxy Zoo project, which allows you to classify galaxies, and several other projects that are listed on the Zooniverse website.

[00:49:18] There are a number of guides available. So for example in the Education section, there are a number of instructor guides, which will give detailed information to teachers to help them guide their students in producing the finest projects.

[00:49:42] But let's now come back to the use of the SDSS archive for astronomical data analysis. One of the most useful ways of accessing SDSS data is the SDSS Explore interface. I've clicked on Explore and we are now redirected to another page, where I see an image of a galaxy. If I click on it, I get a bigger image of the same galaxy, and I can zoom in and out as I wish. But let's go back to the Explore interface.

[00:50:34] So what does it tell us? It tells us what type of object this is, in this case a galaxy, what its object ID is and what its coordinates are. It lists the various flags that were generated when this image was processed. This is very important, because it is these flags that tell us a lot about the quality of the data and whether there are any limitations in the data themselves. It gives the magnitude of the galaxy in all five SDSS filters (u, g, r, i and z) and the corresponding uncertainties in those measurements. It tells you the size of the galaxy, its photometric redshift and so on. And it shows you an image of the optical spectrum of the galaxy.
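
Processing flags like these are typically stored as bits packed into a single integer, with each bit recording one condition. The Python sketch below shows how such a bitmask is decoded. The flag names and bit positions in it are an illustrative assumption only; the real values should be taken from the official SDSS flag documentation, and the example flag value is made up.

```python
# Hypothetical subset of flag names and bit positions, for illustration only.
# Check the official SDSS flag documentation for the real definitions.
ILLUSTRATIVE_FLAGS = {
    "EDGE": 2,
    "BLENDED": 3,
    "SATURATED": 18,
}


def decode_flags(flags: int, bit_names: dict[str, int]) -> list[str]:
    """Return the names of all bits that are set in an integer bitmask."""
    return [name for name, bit in bit_names.items() if flags & (1 << bit)]


def has_flag(flags: int, name: str, bit_names: dict[str, int]) -> bool:
    """True if the bit associated with `name` is set in `flags`."""
    return bool(flags & (1 << bit_names[name]))


example = (1 << 3) | (1 << 18)   # a made-up value with two bits set
print(decode_flags(example, ILLUSTRATIVE_FLAGS))
print(has_flag(example, "EDGE", ILLUSTRATIVE_FLAGS))
```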

[00:51:35] Right, if I click on Interactive Spectrum over here, it loads the spectrum of the galaxy in an interactive window. Here it's showing me the spectrum of the galaxy and all the lines that are detected. Emission lines are indicated in blue and absorption lines in red. This display is interactive: I could, for example, switch off the emission lines and the absorption lines. Now I'm seeing only two things: the spectrum itself, which is shown in black, and the best-fit spectrum for that particular object.

[00:52:43] If I switch off the flux, then I see only the best-fit spectrum, which is shown in red. If I turn the flux on again and switch off the best-fit spectrum, then I see only the actual data. It's possible to mark certain sections and zoom in. I've now zoomed in on a group of emission lines close to the Hα line. I can switch on the emission-line filter, and then I am able to identify, for example, the Hα line, which is the strongest line in this particular star-forming galaxy.
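
Identifying a line works because every line is shifted by the same factor: λ_obs = λ_rest (1 + z). A minimal Python sketch, assuming a vacuum rest wavelength of 6564.61 Å for Hα (SDSS spectra are on a vacuum wavelength scale) and a hypothetical galaxy redshift; the helper names are illustrative.

```python
# Assumed vacuum rest wavelength of H-alpha, in Angstroms.
H_ALPHA_REST_A = 6564.61


def observed_wavelength(rest_a: float, z: float) -> float:
    """Observed wavelength of a line emitted at rest_a by a source at redshift z."""
    return rest_a * (1.0 + z)


def redshift_from_line(observed_a: float, rest_a: float) -> float:
    """Redshift implied by identifying a line at observed_a with rest wavelength rest_a."""
    return observed_a / rest_a - 1.0


z = 0.05  # hypothetical galaxy redshift
lam = observed_wavelength(H_ALPHA_REST_A, z)
print(f"H-alpha expected near {lam:.1f} A")
print(f"recovered z = {redshift_from_line(lam, H_ALPHA_REST_A):.4f}")
```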

[00:53:33] So with just a few clicks we are able to drill down into the details of the object and its spectrum, measure the characteristics of the spectrum, and even determine what kind of object it is. This is extremely useful. It is not available for every object that has been observed with the SDSS, though: for some objects there are no optical spectra, so you won't be able to do what we just did. But the basic photometric information is available for 469 million objects in the imaging survey.

[00:54:26] Let us now look at the additional information that's available for this galaxy. I'm not going to go into the details, but it turns out that this particular galaxy was also observed by the MaNGA survey. So in addition to the image and the spectrum, there is a data cube, which gives you a spectrum for every position in this galaxy as observed by MaNGA. This is something new: it has only become possible with recent data releases such as SDSS DR14, which contain so much data.
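
A data cube is easiest to picture as a three-dimensional array indexed by two spatial coordinates and one wavelength coordinate. Fixing the position gives a spectrum, and fixing the wavelength gives an image. The Python sketch below uses a tiny toy cube held in nested lists. Its sizes, values and cube[y][x][k] index order are assumptions made for illustration; real MaNGA cubes are FITS files with thousands of wavelength channels.

```python
# Toy data cube stored as cube[y][x][k]: flux at spatial pixel (x, y) and
# wavelength channel k. Sizes and values are made up for illustration.
NY, NX, NWAVE = 3, 4, 5

cube = [[[float(100 * y + 10 * x + k) for k in range(NWAVE)]
         for x in range(NX)]
        for y in range(NY)]


def spectrum_at(cube, x: int, y: int) -> list[float]:
    """Full spectrum (all wavelength channels) at spatial pixel (x, y)."""
    return cube[y][x]


def image_at(cube, k: int) -> list[list[float]]:
    """Monochromatic image: the flux in wavelength channel k at every (x, y)."""
    return [[cube[y][x][k] for x in range(len(cube[y]))] for y in range(len(cube))]


print(spectrum_at(cube, x=2, y=1))   # one spectrum per position
print(image_at(cube, k=0))           # one image per wavelength
```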

[00:55:08] Right. So this Explore tool is a very easy way of gaining access to the data themselves. So far we've looked at measurements derived from the data. Suppose we want the actual data. There are links here at the top left: there's a link called FITS, and the tooltip says "Get FITS images of the SDSS fields containing this object".

[00:55:41] If I click on that, it shows me a list of all the available images that contain this particular galaxy. It gives me what are referred to as the corrected frames, which are the final processed data, but it also gives me the binned frames, the mask frames and so on, which are relatively raw, less processed versions of the SDSS data.

[00:56:18] One can do the same for the spectra: if I click on Spectral Summary, it gives me access to the spectrum FITS file and so on, along with some options for larger searches for nearby objects. So it's possible to gain access to processed parameters, to processed data and also to the raw data. All of these are very powerful interfaces to the data on a particular object.

[00:57:01] Let me now show you one of the other important capabilities of the SDSS, which is called CasJobs. I click on CasJobs from SkyServer here and it asks me to log in. I've already created an account, which I will use to log in. I must mention here that the SDSS data are now served through a portal known as SciServer. I'll show you towards the end what the capabilities of SciServer are, but for the moment let us just look at CasJobs.

[00:57:55] I sign in and I'm shown an empty screen, into which I can type whatever query I want. For example, I will type a simple query: select top 10 * from PhotoObjAll. Now this might look like gobbledygook if you don't know the SQL language, so let me explain what it means. It tells the CasJobs server that I want it to select just 10 objects (that's why top 10), the 10 objects at the beginning of the catalogue. The star (*) means I want it to select all the columns. If I wanted it to select specific columns, I could have given a comma-separated list of the columns that I wanted. But for now, I'm going to say just list all the columns.

[00:59:07] And from PhotoObjAll: what is PhotoObjAll? PhotoObjAll is a large table that contains information on all objects for which the SDSS has a photometric measurement. So I've typed in the query. Now I have to set the context; by context I mean, on which data release do I want to run it? I want to run it on the most recent one, DR14, so I set the context to DR14. Before I submit the query I can do a syntax check. So if I'm not sure of my SQL, I do a syntax check, and in this case it came out as Syntax OK, which means the syntax is fine.
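
The meaning of "top 10" and "*" can be tried locally on a toy table. The Python sketch below uses the standard sqlite3 module as a stand-in for CasJobs. This is an assumption of convenience: CasJobs uses SQL Server syntax (SELECT TOP 10 ...), whereas SQLite expresses the same limit as LIMIT 10, and the table here is a tiny invented imitation of PhotoObjAll with only four columns.

```python
import sqlite3

# Toy stand-in for a photometric table; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PhotoObjAll (objID INTEGER, ra REAL, dec REAL, r REAL)")
conn.executemany(
    "INSERT INTO PhotoObjAll VALUES (?, ?, ?, ?)",
    [(i, 150.0 + 0.01 * i, 2.0, 18.0 + 0.1 * i) for i in range(25)],
)

# "*" selects every column; LIMIT 10 (SQLite's form of TOP 10) keeps 10 rows.
# Without ORDER BY, SQL does not guarantee which 10 rows come back.
all_columns = conn.execute("SELECT * FROM PhotoObjAll LIMIT 10").fetchall()

# A comma-separated column list selects only those columns.
some_columns = conn.execute("SELECT objID, r FROM PhotoObjAll LIMIT 10").fetchall()

print(len(all_columns), len(all_columns[0]))     # rows, columns
print(len(some_columns), len(some_columns[0]))
```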

[00:59:53] Then I can do one of two things: Quick or Submit. Submit pushes the query to a queue, where it waits along with the queries of all the other people who are running queries of this sort or any other sort. Eventually I get an email saying that my query results are available, and then I can go to the SDSS and download those query results. But we don't have that much time now, so I'm going to click on Quick.

[01:00:26] Quick runs the query immediately, and only short-duration, data-limited queries can be run this way. In this case, I know this query will not take any time because it's a very simple query: it's only asking CasJobs to show me the first ten objects in the PhotoObjAll table. If I click Quick, the query executes immediately and within a few seconds the results are displayed at the bottom of my screen. There are many, many columns listed here: about 600 independent measurements are made for every photometric object, so there are many hundreds of columns for each object.

[01:01:24] Instead of looking for objects with photometry, I could look for objects in the spectroscopic table. SpecObj is a table that contains information about all objects for which the SDSS has a spectrum. Once again, I click on Syntax to verify that the syntax is correct, and I get a Syntax OK message indicating all is fine. I do a quick submission, and I get a list of spectroscopic objects. Again there are many measurements here, including how the objects were targeted and so on.

[01:02:25] There is a measurement here called z, which gives the redshift of the particular object in SpecObj. Now suppose I were interested only in high-redshift objects: suppose I wanted to look at quasars with a redshift greater than 6, in the extremely distant Universe. I can quickly modify my query to say select top 10 * from SpecObj where z > 6. Again, I need to check the syntax. It says the syntax is OK, and I'm going to run the query.

[01:03:27] Now it goes through the entire SpecObj table and returns a list of ten quasars, all of which have a redshift greater than 6. There you are. It doesn't take very long: in two or three seconds, the catalogues (the quasar catalogue, with about a hundred thousand entries, and the spectroscopic catalogue, with more than a million entries) have been searched and the results have been returned. Let's check that the redshifts of these objects are what we think they are. There you go.

[01:04:10] Now you notice immediately that there is something called the zWarning flag. A nonzero value indicates that there could be some problem with the spectroscopic redshift determination. So we are going to improve our query by saying where z > 6 and zWarning = 0, which means there is no warning on the redshift. I do a syntax check once again. I forgot the "and", so the query failed the syntax check and I was warned. I put that in.

[01:04:59] The syntax is OK and I'm ready to submit the query again. I click on Quick, and now I have a list of quasars that have a redshift greater than 6 and whose redshift is reliable, because their zWarning flag equals 0. There you go: these are quasars with redshifts all greater than 6, where the redshift error is relatively small. Needless to say, their class is QSO in every case, because quasi-stellar objects are the only objects in the SDSS spectroscopic sample at such prodigiously high redshifts.
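
The sequence just shown (a query that fails the syntax check because of the missing "and", then the corrected filter on z and zWarning) can be reproduced on a toy table. The Python sketch below again uses sqlite3 only as a local stand-in for CasJobs, so it writes LIMIT 10 where CasJobs would write TOP 10. The column names z, zWarning and class follow the lecture, but every row is invented.

```python
import sqlite3

# Toy SpecObj-like table; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SpecObj (specObjID INTEGER, z REAL, zWarning INTEGER, class TEXT)")
conn.executemany(
    "INSERT INTO SpecObj VALUES (?, ?, ?, ?)",
    [
        (1, 0.12, 0, "GALAXY"),
        (2, 6.10, 0, "QSO"),
        (3, 6.40, 4, "QSO"),   # high z, but flagged: should be excluded
        (4, 2.30, 0, "QSO"),
        (5, 6.05, 0, "QSO"),
    ],
)

# Step 1: the forgotten "and" makes the statement fail to parse.
try:
    conn.execute("SELECT * FROM SpecObj WHERE z > 6 zWarning = 0")
    syntax_ok = True
except sqlite3.OperationalError as err:
    syntax_ok = False
    print("syntax check failed:", err)

# Step 2: the corrected query keeps only z > 6 with no redshift warning.
rows = conn.execute(
    "SELECT specObjID, z, class FROM SpecObj "
    "WHERE z > 6 AND zWarning = 0 ORDER BY specObjID LIMIT 10"
).fetchall()
print(rows)
```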

[01:06:03] The queries themselves are easy to use. CasJobs also provides what it calls MyDB, which is your own database. You can upload your own catalogue and match it with the SDSS, with either the spectroscopic or the photometric catalogue, and then get outputs. They give you a reasonable amount of disk space; if you want more, you need to use something known as SciServer. So I will spend the next few minutes looking at SciServer, which is a new interface to the SDSS data.
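
Matching an uploaded catalogue against the SDSS, or one survey against another, is at heart a positional cross-match: for each source, find the nearest counterpart within a small angular radius. Below is a minimal brute-force Python sketch. It assumes a fixed 2-arcsecond radius and nearest-neighbour matching, and the helper names and catalogue entries are hypothetical. Large services typically rely on spatial indexing rather than comparing every pair as this sketch does.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))


def cross_match(mine, reference, radius_arcsec=2.0):
    """Map each id in `mine` to the nearest reference id within the radius, else None.

    Both catalogues are lists of (id, ra_deg, dec_deg) tuples.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = {}
    for my_id, ra, dec in mine:
        best_id, best_sep = None, radius_deg
        for ref_id, rra, rdec in reference:
            sep = angular_separation_deg(ra, dec, rra, rdec)
            if sep <= best_sep:
                best_id, best_sep = ref_id, sep
        matches[my_id] = best_id
    return matches


# Hypothetical uploaded catalogue and reference catalogue.
mine = [("a", 150.0, 2.0), ("b", 151.0, 2.5)]
reference = [(101, 150.0002, 2.0001), (102, 152.0, 3.0)]
print(cross_match(mine, reference))
```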

[01:06:49] Note that this is different from SkyServer, which is spelt s-k-y; SciServer is spelt s-c-i. So what is SciServer? It's a collaborative research environment for large-scale, data-driven science. It provides you with a number of tools and hosts a number of data sets, of which the SDSS is just one; there are other data sets, from biology and so on, available as well. But what are the tools that SciServer provides?

[01:07:35] SciServer provides a dashboard, which is basically a tool to manage your files and data sets across SciServer. It provides access to CasJobs, which allows you to run complex queries on large data sets and save and share the results for future analysis. In CasJobs, it's possible to run queries, save the outputs of your queries and share them with your collaborators. This already makes it a very useful tool when you are working with other people: you create a table and simply share it on CasJobs, so when they log into CasJobs they are able to see the table and use it in their analysis.

[01:08:23] SciServer also provides SciServer Compute, which is a very useful resource because it allows you to write Jupyter notebooks. Jupyter is a technology for writing notebooks: programs that you type into a browser window and which can then run on the compute resources provided by SciServer. So, for example, if you want to run some kind of complex query or some kind of complex processing, you can use the SciServer Compute environment.

[01:09:12] SciServer Compute jobs can be batch processed using the Compute Jobs tool. You write a program that may take several hours or several days to run, and you use the Compute Jobs resource to run it on SciServer's compute resources. So you don't necessarily need a powerful computer at your end: you can submit it from your laptop and the job itself will run in the cloud on SciServer. SciServer also provides SciDrive, a Dropbox-like distributed storage environment for science data that interfaces with both databases and flat file systems, so it gives you some amount of storage space.

[01:10:09] SkyQuery is a new utility that allows users to cross-match data from multiple astronomical data sets, so that discoveries can be made from multi-wavelength observations. So that is an additional tool. The whole SciServer interface is proving to be very useful and valuable, primarily because it allows you to do complex astronomical analysis without having access to a large amount of compute and storage resources of your own. So I encourage all of you to explore the SciServer portal, create accounts for yourselves and use it to analyse data, including the SDSS data.