UC Irvine Previously Published Works

Title: Analysis of self-describing gridded geoscience data with netCDF Operators (NCO)
Permalink: https://escholarship.org/uc/item/547296qf
Journal: Environmental Modelling and Software, 23(10-11)
ISSN: 1364-8152
Author: Zender, CS
Publication Date: 2008-10-01
DOI: 10.1016/j.envsoft.2008.03.004
License: https://creativecommons.org/licenses/by/4.0/
Peer reviewed

Environmental Modelling & Software 23 (2008) 1338-1342

Short communication

Analysis of self-describing gridded geoscience data with netCDF Operators (NCO)

Charles S. Zender
Department of Earth System Science, University of California, Irvine, CA 92697-3100, USA

Article history: Received 22 August 2007; received in revised form 24 February 2008; accepted 14 March 2008; available online 28 April 2008.

Keywords: Geoscience; Time-series; Data analysis; Arithmetic; Metadata; Model design; netCDF

Abstract: The netCDF Operator (NCO) software facilitates manipulation and analysis of gridded geoscience data stored in the self-describing netCDF format. NCO is optimized to efficiently analyze large multi-dimensional data sets spanning many files. Researchers and data centers often use NCO to analyze and serve observed and modeled geoscience data, including satellite observations and weather, air quality, and climate forecasts. NCO's functionality includes shared-memory threading, a message-passing interface, network transparency, and an interpreted-language parser. NCO treats data files as a high-level data type whose contents may be simultaneously manipulated by a single command. Institutions and data portals often use NCO as middleware to hyperslab and aggregate data set requests, while scientific researchers use NCO to perform three general functions: arithmetic operations, data permutation and compression, and metadata editing. We describe NCO's design philosophy and primary features, illustrate techniques to solve common geoscience and environmental data analysis problems, and suggest ways to design gridded data sets that can ease their subsequent analysis.

© 2008 Elsevier Ltd. All rights reserved.

Software availability

Name of software: netCDF Operators (NCO)
Developer: Charles S. Zender
Operating system: All
Programming languages: C/C99/C++
Availability and cost: Freely available at http://nco.sf.net

The netCDF Operators have evolved over the past decade to serve the needs of individual researchers and data centers for fast, flexible tools to help manage netCDF-format data sets. The NCO User's Guide (Zender, 2007) documents NCO's functionality and calling conventions. Zender and Mangalam (2007) describe the core NCO arithmetic algorithms and their theoretical and measured scaling with data set size and structure. This paper describes NCO's design philosophy and primary features, illustrates techniques to solve common geoscience and environmental data analysis problems, and suggests ways to design gridded data sets that can ease their subsequent analysis.

1. Introduction

Gridded geoscience model and sensor data sets present an interesting set of challenges for researchers and the data portals that serve them (Foster et al., 2002). Many geoscience disciplines have transitioned or are transitioning from data-poor and simulation-poor to data-rich and simulation-rich (NRC, 2001). A software ecosystem has evolved to help researchers exploit this transition with fast data discovery, aggregation, analysis, and dissemination techniques (e.g., Domenico et al., 2002; Cornillon et al., 2003). In this ecosystem are the netCDF Operators (NCO), software for manipulation and analysis of gridded geoscience data stored in the self-describing netCDF format. NCO is used in several niches of the geoscience data analysis workflow (Woolf et al., 2003) because its functionality is independent of and complementary to data discovery, aggregation, and dissemination.

We will demonstrate the NCO paradigm and features by applying them to frequently occurring geoscience data reduction problems taken from the field of climate data analysis. The reader will see that these problems are generic to disciplines where large gridded data sets are regularly produced and analyzed. Modern weather, climate, and remote sensing research often requires identical analyses of hundreds of variables in thousands of files. Traditional analysis approaches that use low-level, compiled languages and most high-level, interpreted languages fail to scale well to this problem space (Wang et al., 2007). Re-coding compiled or interpreted data analysis scripts to act on new variables and new data sets is tedious and non-productive when it requires, for example, manually changing variable names and loop counters even when the underlying analysis (such as averaging) does not change.

NCO helps solve this problem by using the self-describing capability of the netCDF data format (Rew and Davis, 1990) and POSIX shells (Newham and Rosenblatt, 1998) to define a specific analysis of a generic type without user intervention. This flexibility is important to geoscience researchers, who often analyze and inter-compare gridded data sets in an open-ended fashion, creating unique analysis workflows through trial and error. For the same reasons, many data portals use NCO to fulfill the unpredictable hyperslab requests issued by users on their WWW front-ends, e.g., the NCAR Community Data Portal (CDP; https://cdp.ucar.edu) and the NOAA Climate Diagnostics Center (CDC; http://www.cdc.noaa.gov/PublicData). NCO is middleware in that it processes data sets in netCDF format, generated by models or retrieval procedures, into new netCDF data sets more suitable for graphical display, dissemination, or numerical analysis.
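Because the netCDF format is self-describing, such a hyperslab request can be expressed as a single generic command; NCO reads variable and dimension definitions from the file itself rather than from hand-edited code. A minimal sketch of a portal-style subsetting operation, with hypothetical file, variable, and dimension names:

    # Extract variable T over a 30-50 degree latitude band and the
    # first twelve time steps; coordinate variables come along automatically
    ncks -v T -d lat,30.0,50.0 -d time,0,11 in.nc out.nc

In ncks, a -d range containing decimal points is interpreted as coordinate values, while an integer range is interpreted as zero-based indices, so the same syntax serves both physical and positional subsetting.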
Geoscience researchers use many toolkits besides NCO to analyze large volumes of gridded data. These include the Climate Data Analysis Tools (CDAT) (Fiorino and Williams, 2002), the Climate Data Operators (CDO; http://www.mpimet.mpg.de/fileadmin/software/cdo), the Grid Analysis and Display System (GrADS; http://www.iges.org/grads/grads.html), the Interactive Data Language (IDL; http://www.ittvis.com/idl), MATLAB (http://www.mathworks.com), and the NCAR Command Language (NCL; http://www.ncl.ucar.edu). Of these toolkits, CDO is closest to NCO in that both use command-line operators constructed to perform chainable operations like traditional UNIX filters. Unlike NCO and CDO, the CDAT, GrADS, IDL, MATLAB, and NCL toolkits support comprehensive integrated visualization capabilities, but their design is not optimized for batch-driven operations on large numbers of files.

2. Design philosophy

Traditional geoscience data processing works with an intra-file paradigm in which users open one or a few files to read and manipulate one or a few variables at a time. The intra-file paradigm works well in cases where all the pertinent data reside in a few files, and the processing of each variable is unique and requires hand-coding. In large geoscience applications, data storage requirements may dictate that relevant data be spread over multiple files. Level one satellite data, for example, are often stored in a file-per-day or file-per-orbit format. Data produced by geophysical time-stepping […]

3. […] multi-file sequences, the metadata in the first file, along with a list of the other files, adequately preserves the processing history. By convention, NCO keeps this information in the history attribute (Rew et al., 2005).

4. There is value in maintaining the distinctions and associations between dimensions, coordinates, and variables (Rew and Davis, 1990) during data analysis. Unless otherwise specified, NCO automatically attaches coordinate data (i.e., dimension values) to variables it transfers.

5. Tools should treat data as generically as possible, and impose no software limitations on data dimensionality, size, type, or ordering.
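Items 3 and 4 of this philosophy are visible directly from the shell. A minimal sketch, with hypothetical file and variable names:

    # Extracting one variable also transfers its coordinate data (item 4)
    ncks -v T in.nc T_only.nc
    # Print the global metadata, including the history attribute that
    # accumulates a log of the NCO commands applied to the file (item 3)
    ncks -M T_only.nc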
This design philosophy allows users to remain relatively ignorant of the details of file and variable names, field geometry, and NCO itself.

3. Operators

NCO partially fulfills the netCDF designers' original vision for a follow-on set of generic data operators (Rew and Davis, 1990). Presently NCO includes 12 utilities built from a common library (Table 1). Operator names are acronyms for their functionality, prefixed with "nc" to indicate their relationship to netCDF. The 12 operators typically read netCDF files as input, perform some manipulations, then write netCDF files as output. In this sense the operators are filters, or middleware. The NCO User's Guide (Zender, 2007) documents the functionality and calling conventions for all operators.

The primary purpose of the arithmetic operators is to alter existing data or create new data. The other operators, called metadata operators, manipulate metadata or re-arrange (but do not alter) data. The arithmetic operators can be quite computationally intensive, in contrast to the metadata operators, which are mostly I/O-dominated. The amount of data processed varies strongly by operator type. The multi-file operators (MFOs) are the most data-intensive. Often they are applied to entire […]
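The arithmetic/metadata distinction is easiest to see in use. A minimal sketch pairing an arithmetic multi-file operator with a metadata operator, with hypothetical file and variable names:

    # ncra, an arithmetic MFO, averages the record (time) dimension
    # across many daily input files into a single output file
    ncra in_2007_??_??.nc mean_2007.nc
    # ncatted, a metadata operator, edits an attribute in place
    # without altering any data values
    ncatted -a units,T,m,c,"kelvin" mean_2007.nc

The first command reads and averages every value in every input file; the second changes only attribute text, which is why metadata operations stay cheap even on very large data sets.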