NCAR/TN-189+PROC    NCAR TECHNICAL NOTE    April 1982

Proceedings of the Second Annual Computer Users Conference

Computing in the Atmospheric Sciences in the 1980s

Editor: Linda Besen

SCIENTIFIC COMPUTING DIVISION    NATIONAL CENTER FOR ATMOSPHERIC RESEARCH    BOULDER, COLORADO

Section I:   Introduction
             Program
             List of Participants

Section II:  Future Developments within SCD

Section III: Computing in the Atmospheric Sciences in the 1980's
             Summary of Discussion

Section IV:  Data Communication Needs for UW/NCAR Computer Link
             Data Communication and the Atmospheric Sciences
             Achieving a Uniform NCAR Computing Environment
             Conference Recommendations: Data Communications/Uniform Access

Section V:   Gateway Machines to NCAR Network Hosts
             Conference Recommendations: Gateway Machines

Section VI:  Data Inventories, Archive Preparation, and Access
             The Typical Data Life Cycle and Associated Computing Strategies
             Data Analysis and the NCAR SCD
             Conference Recommendations

SECTION I: INTRODUCTION

The Scientific Computing Division hosted its second annual Computer Users' Conference on January 7-8, 1982, in Boulder, Colorado. The purpose of these conferences is to provide a formal channel of communication for users of the Division and to obtain regular feedback from them which can be used to aid in planning for the future of the SCD.

The conference opened with formal greetings from Walter Macintyre (SCD Director), Wilmot Hess (NCAR Director), and Lawrence Lee (National Science Foundation).

Walter Macintyre discussed current planning topics for the Division. His report in Section II includes material on future developments within the Division.

Warren Washington presented the keynote address, "Computing in the Atmospheric Sciences in the 1980's." A summary of his talk, and the resulting discussion, is presented in Section III.

Users then attended one of three separate and concurrent workshops which took place on the first day of the conference. Among the various topics considered important to SCD users in the Atmospheric Sciences are access to the host machines at NCAR; the practicality of moving large files to and from remote sites; and the collection, interchange and display of large data files. Papers given in these workshops and the Conference Recommendations resulting from each workshop are covered in Sections IV, V and VI.

Section VII contains Walter Macintyre's Response and Concluding Remarks. "I think the most important and exciting result of the conference," said Dr. Macintyre, "is the clear, unanimous, and unambiguous statement from our users that NCAR must continue to offer the most powerful computing system available. This is necessary not only for our users to continue their current scientific endeavors, but, more particularly, to allow them to explore new avenues of research--avenues that are currently inaccessible with a CRAY-1 class of machine."

Acknowledgments

Many persons within SCD have contributed to the conference. Buck Frye was Chairman of the Conference Committee. He was also responsible for the Workshop Issues and Guidelines. Darlene Atwood was responsible for the arrangements and invitations. Cicely Ridley provided University Liaison support. Linda Besen was Editor of this Conference Proceedings. Ann Cowley was responsible for the displays and documentation distribution, as well as the consulting service at the conference.

The workshop leaders and SCD members for each workshop are listed below:

Workshop I: Data Communications/Uniform Access

Panel Discussion: Dave Houghton, Chair; Dave Fulker, Herb Poppe, Dick Sato

Workshop II: Gateway Machines to the NCAR Network Hosts

Panel Discussion: Steve Orszag, Chair; Paul Rotar, Gary Jensen, Buck Frye

Workshop III: Data Access and Display

Panel Discussion: Francis Bretherton, Chair; Margaret Drake, Roy Jenne, Bob Lackman, Gary Rasmussen

JANUARY 7

9:00 a.m.   Introductions                        Walter Macintyre
            Welcome                              Wilmot Hess
            Introduction by NSF                  Larry Lee
            Division Status and Planning         Walter Macintyre

10:30 a.m. Coffee Break

10:45 a.m.  Computing in the Atmospheric         Warren Washington
            Sciences in the 1980s

1:30 p.m. CONFERENCE WORKSHOPS (concurrent)

Workshop I: Data Communications/Uniform Access
            User Requirements                    Dave Houghton
            External Network Alternatives        Dave Fulker
            Uniform Access to Network            Herb Poppe

Workshop II: Gateway Machines
            Scientific Requirements              Steve Orszag
            Front End Machines                   Gary Jensen
            Configuration                        Paul Rotar

Workshop III: Data Access and Display
            Collection & Interchange of
              Scientific Data: User Requirements Francis Bretherton
            Archives and Data Access             Roy Jenne
            Computing Strategies                 Bob Lackman
            Software Tools                       Gary Rasmussen

6:30 p.m. Dinner at The Harvest House

JANUARY 8

9:00 a.m. Opening Remarks Walter Macintyre

CONFERENCE RESULTS
            Workshop I - Data Communications/
              Uniform Access                     Dave Houghton

10:30 a.m. Coffee Break

11:00 a.m.  CONFERENCE RESULTS (cont.)
            Workshop II - Gateway Machines       Steve Orszag
            Workshop III - Data Access
              and Display                        Francis Bretherton

            Conclusions                          Walter Macintyre

LIST OF PARTICIPANTS

Paul Bailey Mary Downton NCAR ACAD NCAR ASP

Linda Bath Margaret Drake NCAR AAP NCAR SCD

W. R. Barchet Jim Drake Battelle Northwest NCAR CSD Richland, Washington Sal Farfan William Baumer NCAR SCD SUNY/Buffalo Richard Farley Ray Bovet Inst. of Atmospheric Sciences NCAR AAP South Dakota School of Mines & Technology Francis Bretherton NCAR AAP Carl Friehe NCAR ATD Gerald Browning NCAR SCD Buck Frye NCAR SCD Garrett Campbell NCAR ASP Dave Fulker NCAR SCD Celia Chen NCAR ATD Bonnie Gacnik NCAR SCD Robert Chervin NCAR AAP Lawrence Gates Climatic Research Institute Julianna Chow Oregon State University NCAR AAP Ron Gilliland Ann Cowley NCAR HAO NCAR SCD James Goerss Robert Dickinson CIMMS NCAR AAP Norman, Oklahoma

Dusan Djuric Gil Green Dept. of Meteorology NCAR SCD Texas A&M University Kadosa Halasi Ben Domenico Dept. of Mathematics NCAR SCD University of Colorado

John Donnelly Barbara Hale NCAR SCD Graduate Center for Cloud Physics Research University of Missouri

Lofton Henderson Lawrence Lee NCAR SCD National Science Foundation

Barbara Horner Doug Lilly NCAR SCD NCAR AAP

David Houghton William Little Dept. of Meteorology Woods Hole Oceanographic Institution University of Wisconsin Timothy Lorello Hsiao-ming Hsu Hinds Geophysical Sciences Atmospheric Sciences Chicago Illinois University of Wisconsin Walter Macintyre Roy Jenne NCAR SCD NCAR SCD Ton Mayer Gary Jensen NCAR AAP NCAR SCD William McKie Jeff Keeler Climatic Research Institute NCAR ATD Oregon State University

Robert Kelly Jack Miller Cloud Physics Laboratory NCAR HAO University of Chicago Robert Mitchell Thomas Kitterman NCAR SCD Dept. of Meteorology Florida State University Carl Mohr NCAR CSD Daniel Kowalski College of Engineering Donald Morris Rutgers University NCAR SCD

Carl Kreitzberg Nancy Norton Dept. of Physics NCAR AAP Drexel University Bernie O'Lear Michael Kuhn NCAR SCD NCAR AAP Stephen Orszag Chela Kunasz Dept. of Mathematics JILA M.I.T. University of Colorado Richard Oye Bob Lackman NCAR ATD NCAR SCD Jan Paegle Ron Larson Dept. of Meteorology Cray Research University of Utah

Pete Peterson James Tillman NCAR SCD Dept. of Atmos. Science University of Washington Vic Pizzo NCAR HAO Greg Tripoli Dept. of Atmos. Sciences Gandikota Rao Colorado State University Dept. of Earth & Atmospheric Sciences St. Louis University Stacy Walters NCAR ACAD Gary Rasmussen NCAR SCD Thomas Warner Dept. of Meteorology Cicely Ridley Pennsylvania State University NCAR SCD Warren Washington Paul Rotar NCAR AAP NCAR SCD Rick Wolski Robert Pasken NCAR AAP Dept. of Meteorology University of Oklahoma

Eric Pitcher Dept. of Meteorology University of Miami

John Roads Scripps Inst. of Oceanography

Herb Poppe NCAR SCD

Richard Sato NCAR SCD

Tom Schlatter NOAA/PROFS

Bert Semtner NCAR AAP

David Stonehill Rochester University

Eugene Takle Climatology-Meteorology Iowa State University

James Telford Desert Research Institute University of Nevada

SECTION II: DIVISION STATUS AND PLANNING

Future Developments within SCD

Walter Macintyre
National Center for Atmospheric Research

As of the date of writing (10/26/81), it is extremely difficult to forecast how many of the things that we would like to do we will actually be able to deliver. We are one month into the fiscal year without knowing exactly our Divisional budget for FY82. The outlook for FY83 is still more uncertain, but the prophets of gloom and doom seem to outnumber the optimists. Therefore I am reviewing in this presentation some things I feel the community must have from the SCD, with the commitment that I will do everything within my power to ensure that the computational needs of the community are in fact met, but with only a modest expectation of success. However, in a recent letter to the Chairman of the SCD Advisory Panel, the President of NCAR declared that enhancement of the NCAR Computing Facility was the top institutional priority in the months and years ahead.

These needs include some relatively novel developments. Last January, the message was loud and clear that the primary need perceived by the community was for more computing power. I am relatively sure that the funding is available to replace the CDC 7600 with a more powerful machine, i.e., one with both more flexibility and more throughput. The real question is whether the 7600 should be replaced with a very powerful scalar machine or whether we should attempt to augment the CRAY capacity instead. It is far from clear that the funding available is adequate to support the latter choice. Augmentation of the CRAY involves adding another vector mainframe, and adding another vector mainframe means adding additional front-end capacity. Thus the option of adding another vector machine rapidly becomes a very expensive one when fully costed.

You have all experienced the effects of the loss of a TBM channel during July and August 1981. The reasons for such a prolonged failure are too long to recite here, but the bottom line is that Ampex, even though they tried very hard, had neither the personnel nor the spare parts necessary to repair the system more quickly. We are taking steps to try to ensure that another such disaster does not happen. However, given the nature of the TBM, it is impossible to guarantee that it won't. Furthermore, the load on the TBM, which during the first few years at NCAR increased linearly with time, is now showing a higher-order dependence. It would appear that we are entering a period of exponential growth in the material stored on the mass store. It is essential that the TBM be replaced at the earliest possible date with a more reliable system which also has a capacity at least two orders of magnitude greater than our current capacity. The new mass store must also be able to accommodate overflow in a graceful manner. Unfortunately, no such system exists on today's market. All sorts of interesting sounds are emanating from various sources. It seems possible that a new mass storage system which meets our needs will be announced within about two years' time. It is our intention to plan to replace the TBM within that timeframe, but it must be remembered that such a procurement may be delayed not only for budgetary reasons but because there is no product to procure.

Computing at NCAR is moving into an era that will be characterized by interactive access to our major systems. This change opens the door to qualitatively different types of computing. These differences include relatively long connect times between the remote site and the central facility at NCAR, and also rather large data flows between the remote user and the central facility. Long connect times mean large telephone bills for the user. Therefore, it is essential that we develop mechanisms that enable the user to have access to the NCAR system at local telephone rates. We have had some discussions with commercial packet-switching network operators, but so far these have been inconclusive. They are able to offer the user relatively cheap local service. However, there are two drawbacks to these systems: they are unable to provide data rates in excess of 9600 bits/second, and the charges to NCAR are substantial. The high data rates that will follow from the development of interactive access can only be accommodated by satellite communications. We are currently putting together plans for a system that has relatively cheap access, to minimize the costs of connect time, and very high bandwidth, to permit the most productive use of that connect time.

A feature of the NCAR computing center is its diversity. In past years great strides have been made in overcoming the problems inherent in trying to communicate between machines, each of which is very different from the others. Our internal network, which has been under development for five years, is now finally on the air, with our major systems connected. Now that the hardware interfaces have been perfected, and the low- to medium-level software for these interfaces is up and running, the time has come to make a major effort to eliminate the last remnant of this diversity from the users' attention: the individual JCL for each machine. In the past year, a great deal of study has been given to the proposal that a uniform, or as close to uniform as possible, JCL be developed to cover NCAR's major systems. This JCL must be able to adapt to changes in major hardware with at most very minor modifications. This can only be accomplished by replacing native JCL with procedures, most of which will be written by SCD staff in the native JCL, but some of which will have to be rewritten by SCD staff in a machine-dependent language when new hardware is procured. It remains to be seen how effective and how uniform such a JCL can be, but we are very optimistic.

Finally I would like to say a word about the Community Climate Model. The SCD has been intimately involved in the development of the CCM from the start. We accept a continuing responsibility to assist in its maintenance and documentation. This is an excellent example of the type of cooperative effort between SCD and other NCAR Divisions that I would like to see develop. Recently there has been some talk of other community models. These may be circulation models or they could be chemical models. The SCD intends to work actively with the scientific divisions involved in the development and maintenance of such models. Such a model is a major piece of software, and maintenance of these models could be regarded as being no different from maintenance of, say, the NCAR graphics package. It is our hope that in this way we can assist the community by freeing the scientific staff in the Divisions to move on to more interesting science while providing to the community well-documented and well-maintained standard models.

SECTION III: COMPUTING IN THE ATMOSPHERIC SCIENCES IN THE 1980'S

Computing in the Atmospheric Sciences in the 1980's -- Warren Washington
Summary of Discussion

COMPUTING IN THE ATMOSPHERIC SCIENCES IN THE 1980'S

Warren M. Washington National Center for Atmospheric Research

The use of computers in the atmospheric sciences has made an enormous impact on the type of science that has been done over the last 30 years and has led to many scientific achievements. An enumeration of these achievements will not be made here, except to point out that the progress made in numerical weather forecasting and the ability to simulate solar, weather, climate, and ocean phenomena would have been impossible without the modern-day electronic computer. The first annual NCAR computer users' conference, held in January 1981, examined future computing needs in astrophysics, the upper atmosphere, climate, fluid dynamics, oceanography, cloud physics, weather prediction, and mesoscale modeling. The participants made projections of their needs in each specific field. These expressions of need were very useful for the future plans of NCAR's Scientific Computing Division.

NCAR has played a unique and important role as the chief provider of large computing facilities for the NSF atmospheric and oceanic academic sciences. It may be useful to briefly review the history of computers at NCAR. NCAR acquired its first computer in the mid-1960's. Up until that time it was making use of an IBM 709 at the University of Colorado. As usage increased it quickly became apparent that NCAR needed to acquire its own computer. In 1964 NCAR acquired a CDC 3600, which was subsequently replaced by a CDC 6600 in 1965. In 1971 we acquired a CDC 7600 computer and in July 1977 a CRAY-1. The NCAR Scientific Computing Division (SCD) now has a CDC 7600 and a sixth-generation CRAY-1 as its chief large computers. Even with this capacity it is clear that the present system is definitely saturated. Much of the discussion at this meeting will focus on how to increase capacity in the future. Even though the focus of this meeting will be on NCAR's need for large computers, it may be useful to discuss the needs of the entire atmospheric science research community in the 1980's.

Some of the material that will be presented in this paper is taken from a recent study by the National Advisory Committee on Oceans and Atmosphere (NACOA), which pointed out the need for new and improved research facilities in the atmospheric sciences. Part of that report looked at the future needs for computers in atmospheric sciences research. Table 1 of the report summarizes, as of FY1980, the present array of computers used in the atmospheric sciences, separated by agency and location, type of computer, capital and operating costs, the percentage of users who are in-house and outside, and the percentage of computer use for atmospheric research. Because different agencies account differently, the table may not be entirely consistent. As might be expected from Table 1, there are enormous Federal capital and operating costs associated with large computer installations. The NOAA installations have a large percentage of use for atmospheric research; these installations are the Geophysical Fluid Dynamics Laboratory (GFDL) at Princeton University, the Environmental Research Laboratories/NOAA (ERL) in Boulder, Colorado, and Suitland, Maryland, which is part of the National Meteorological Center (NMC). Also NASA, through its research at Goddard

Institute for Space Studies (GISS) in New York, Goddard Laboratory for Atmospheric Sciences (GLAS) in Greenbelt, Maryland, and NASA Langley in Hampton, Virginia, provides a large part of the atmospheric sciences computing requirements. NASA/Ames is not included in the table. It is interesting to note that 20% of NASA's Greenbelt computer is devoted to university atmospheric sciences.

The National Science Foundation sponsors the computer installations at NCAR, of which 100% of usage is for atmospheric research. About half is for university research and the other half is used for NCAR research. About 15% of the Environmental Protection Agency (EPA) computer usage is for atmospheric research. The Department of Energy (DOE) has a large array of computers at various installations, but most of them are used less than 5% for atmospheric research. It should be noted that GFDL/NOAA and GLAS/NASA have both contracted to obtain sixth-generation Cyber 205 computers in the early 1980's.

The increase in computer speed over the last three decades has been impressive. Figure 1 shows the speed of a computer in millions of instructions per second (mips) versus year. This figure was based on relative tests at GFDL. Since such tests depend on what test programs are used, the figure should not be taken too literally. It shows a rapid increase from 1954 to 1970 and a somewhat slower increase after 1970. Note that the Cyber 205 is capable of 100-200 mips. It is expected that by 1990 there will be computers several times faster, with very large central memories.

Since new computers are becoming faster and more efficient to operate, it may not always be wise to delay the purchase of a new computer, even in hard times. To demonstrate the increase in efficiency, Figure 2 shows mips per million dollars versus year. This figure is also based upon relative measurements made at GFDL. It shows in dramatic fashion that the cost of doing 1 mips worth of computing has dropped by a factor of 1000 over the last 30 years. The rate of decrease seems to have slowed since 1970 but still appears to be quite substantial.

Given the state of large-computer technology development, the 1980's ought to be a period of optimism, but it clearly is not. A contracting Federal budget will make obtaining large capital items more difficult. If NCAR and other Federal installations are not to fall behind in having the most advanced computers, some difficult choices must be made in order to purchase the computers required, as well as the necessary peripheral equipment such as mass storage devices, communications equipment, and graphics equipment.

NCAR has accumulated about 10^12 to 10^13 bits of atmospheric and oceanic data. This data is stored on a variety of different storage media. Even though this is an impressive amount of data, it is 10 times less than what a single weather satellite transmits in a single year. For example, the Geostationary Operational Environmental Satellite (GOES) transmits 2 million bits per second, or about 10^14 bits per year. Not all of this data needs to be stored, since much of it can be compressed. Obviously the amount of data will grow during the 1980's, and provision for reliable and efficient storage devices must be sought. Other devices having a large impact on the way research and operations are being carried out are interactive graphics systems. These systems are especially useful in the monitoring and analysis of large meteorological data sets such as those from the GARP Atlantic Tropical Experiment (GATE) and the World Weather

Experiment. Also, the new Automated Forecasting and Observing System (AFOS), being developed by the National Weather Service, is being implemented. These devices have been particularly beneficial for the analysis of mesoscale and satellite data. Already the McIDAS interactive graphics system, which was developed for satellite use at the University of Wisconsin-Madison, is being tested at the NOAA Severe Storms Forecast Center. Some scientific divisions at NCAR make extensive use of such devices. One question to be asked for the future is how much capability for interactive graphics should reside in the Scientific Computing Division.

It is clear that this Federal administration, and perhaps future administrations, will be taking an even harder look at what to fund and what not to fund. If a program does not have a unique role within an agency or within the Federal government, it will be a likely candidate for elimination. Returning to the NCAR context, the question to be asked is whether NCAR's computing facility is unique and does not overlap with facilities found at the universities or other Federal installations. The practical question for NCAR management for FY1982-84 is, for example, whether it should take out the CDC 7600 or decrease the amount of user support. The elimination of the CDC 7600 would reduce the effective computing power of the CRAY-1 by an estimated 20 to 40%, which would make the NCAR facility less unique.

Another but similar question could be applied to overlapping facilities outside the NCAR context. Do the Army, Navy and Air Force need separate and somewhat overlapping facilities for weather forecasting? Are the requirements of each so different that having separate and distinct facilities can be justified? Should NASA have three separate facilities; should NOAA have three separate facilities? Of course, these are questions that will not be answered here but ones that are being asked in Washington, D.C.

As some of you know, the Federal Government has a Federal coordinator for computers in the atmospheric sciences. The coordinator prepares a Federal plan for various parts of the administration, such as the Office of Management and Budget (OMB) and Congress, to show that each installation is playing a unique role. During good financial times this plan is not taken too seriously, and it is up to the individual agencies to justify their own computer requirements. During tight budget times, such a coordination plan becomes more important and is one of the tools used by OMB and Congress to make a 'yes' or 'no' decision. It is expected to be especially difficult to find sufficient funds for large capital items during FY1982-84. Creative financing (tax-exempt bonds), such as was used for the purchase of the CRAY-1, will probably be employed in the future. In this way the cost of a new computer can be spread over a larger part of the machine's lifetime. However, such creative financing is only attractive during times of low interest rates and low inflation. Such schemes are not attractive at present.

Now it is time to gaze into the crystal ball and speculate about the future, given the trends seen in Washington, D.C. It is expected that there will be a shedding of responsibilities by various Federal agencies engaged in atmospheric research. This narrowing of purpose will be forced by expected 5 to 12% cuts over the next three years. Which institutions will be the survivors? Probably those that have a unique and important role to play in providing atmospheric services and research.

Table 1. Summary of existing large-scale operations in support of atmospheric modeling research, fiscal year 1980***

                                                                       Capital  Annual                          % use for
                                                                       costs    operating                       atmospheric
Agency/Location                   Computers (types)                    ($M)     costs ($M)  Users               research

Department of Commerce
 NOAA
  GFDL, Princeton Univ.           TI/ASC                               4.6      1.5         In-house            70-80
  Environmental Research Labs.,   CDC Cyber 170/750                    3.2                  In-house            50-60
   Boulder, Colo.
  Suitland, Md.                   (3) IBM 360/195                      14.9     5           In-house            15

Department of Defense
  AFGL, Bedford, Mass.            CDC 6600                             n/a      n/a         In-house            n/a
  U.S. Army, White Sands          U 1108                               n/a      n/a         In-house            n/a
  U.S. Navy (FNOC),               CDC 6500, Cyber 170/175,             0.3      8           In-house            10
   Monterey, Calif.               Cyber 170/720, Cyber 203

NASA
  GISS, New York, N.Y.            IBM 360/95                           6        1           In-house            75
  GSFC, Greenbelt, Md.            Amdahl                               5        2           In-house, 80%;      80
                                                                                            Universities, 20%
                                  (2) IBM 360/95                       12       3           In-house, 90%       20
  Hampton, Va.                    CDC Cyber 203, Cyber 170/173,        50       10          In-house, 95%;      20
                                  (2) Cyber 173, (2) Cyber 175,                             Other, 5%
                                  (2) 6600
  Pasadena, Calif.                (3) U 1108                           13       6.5         In-house            1.5

NSF
  NCAR, Boulder, Colo.            Cray 1, CDC 7600                     12       5           NCAR and            100
                                  (front end to Cray)                                       Universities

EPA
  Research Triangle Park, N.C.    (2) U 1100, IBM 360/168              0.5**    23.4        In-house            15

DOE
  Argonne National Lab.,          IBM 370/75, IBM 370/95,              48       23          In-house            5
   Argonne, Ill.                  IBM 370/50, (2) 30/33
  Brookhaven National Lab.,       CDC 7600, (2) CDC 6600,              23       8.5         In-house            3
   Upton, N.Y.                    DEC PDP/10, Sigma 7
  Battelle Northwest Lab.,        n/a                                  26       21          In-house            2
   Richland, Wash.
  Idaho Nat'l Engineering Lab.,   IBM 360/75, Cyber 76                 21       10.7        In-house            n/a
   Idaho Falls, Id.
  Los Alamos Scientific Lab.,     (2) CDC 6600, (4) CDC 7600,          85       29          In-house            0.4
   Los Alamos, N.M.               Cray 1, (2) Cyber 73
  Lawrence Berkeley Lab.,         CDC 7600, CDC 6600, CDC 6400         20       7.9         In-house            1
   Berkeley, Calif.
  Lawrence Livermore Lab.,        Cray 1, (4) CDC 7600,                77       21          In-house            n/a
   Livermore, Calif.              (2) Star/100, CDC 6600
  Oak Ridge National Lab.,        IBM 360/75, PDP/10, SEL 810B         27       30          In-house            3
   Oak Ridge, Tenn.
  Sandia Labs.,                   U 1108, U 1100-82, (3) CDC 6600,     66       17          In-house            0.5
   Albuquerque, N.M.              CDC 6400, CDC 7600, Cyber 76,
                                  PDP 10
  Savannah River Lab.,            IBM 360/115                          20       12          In-house            0.3
   Aiken, S.C.

n/a - not available.

*Interim system pending fiscal year 1982 procurement.
**Leased equipment.
***Taken from Special Report: A Review of Atmospheric Science Research Facilities, NACOA, 30 June 1981.

[Figure 1. Performance of top-of-the-line computers, in millions of instructions per second (mips), versus year, 1953-2000; machines plotted range from the IBM 701 through the CDC 6600, CDC 7600, IBM 360/195, TI/ASC, CRAY 1, and CYBER 205. (Based on relative measurements at GFDL.) (Source: J. Smagorinsky, December 1980.)]

[Figure 2. Performance/cost for top-of-the-line computers, in mips per million dollars, versus year, 1952-1982; not normalized for the decreasing value of the dollar. (Based on relative measurements at GFDL.) (Source: J. Smagorinsky, December 1980.)]

Summary of Discussion

Warren Washington (AAP) believes that Congress will now assert itself in budget matters more than it did last year. We will be looking for more cost-cutting, and we shall be watching to see what the impact will be. He expects the trend toward new computers in the federal government to grow because of cost-effectiveness; as maintenance costs escalate, more pressure will be brought to switch to newer, faster machines. There will be more pressure for coordinated approaches to proposals in which each department and agency must demonstrate a unique role. We must be sure that NCAR always provides a unique service, one that can be separated out from the rest of the federal programs.

A question was asked as to the cost-effectiveness of purchasing another CRAY, at a cost of $15 to $20 million, to give us additional capacity. In reply, Larry Lee (NSF) said it is very difficult to talk about cost-effectiveness. The NSF prefers to look at the mix of computers that best serves the demands of the university and scientific community rather than to look at cost alone.

Francis Bretherton (AAP) continued the discussion of whether NCAR should try to continue to provide the type of computing that can't be done elsewhere. What do we really want most to do over the next few years, and why? We need to convince ourselves and our colleagues but must convince the NSF also. The governmental decision-making structure has everyone competing for the money. We need a demonstration of the most effective way for us to go; and this must be articulated first. When we brought about the procurement of the 7600 and the CRAY-1, teams visited universities and explicitly asked what was needed. We then prepared a document which was sent to the NSF. Now we are operating in ways such as this conference for feedback.

What, indeed, are our perceptions of the kinds of problems for which an increment of the type of machine we want would make a qualitative difference? First, numerical weather prediction; climate modeling, which has an insatiable appetite; mesoscale modeling; chemical areas; oceanographic areas; getting data from ships at sea, etc. We need to go through these areas bit by bit and see what is needed and where we are going. This must be in the conference report.

Carl Kreitzberg (Drexel University) said last year's conference particularly addressed that issue. He asked if it was felt that what came out of the conference a year ago would be sufficient to provide the scientific basis necessary.

Washington said that if we do get the next generation CRAY, what can we do that couldn't be done before? At last year's conference we came up with a wish list of what we would like to have, and it was five CRAYs. Is there anything that clearly states what we hope to do with the additional computing capacity?

Bretherton said we must single out a few problem areas that will be widely accepted where this upgrade will be crucial.

The statement was made that asking for one thing may mean giving up something else. We need long-range interactive computing, but may have to give up some of our pure computing ability.

On the subject of interactive computing, Macintyre said that we are talking here about two different things. We are not talking about interactive computing as something desirable; we are talking about interactive access to the system. There is a fine gray area between these two ideas. We are not talking about providing interactive computing in the main. We provide services that cannot be provided on the campus. We will not compete with them. We will not replace campus minis, etc.

SECTION IV: DATA COMMUNICATIONS/UNIFORM ACCESS

Data Communication Needs for UW/NCAR Computer Link -- Dave Houghton

Data Communication and the Atmospheric Sciences -- Dave Fulker

Achieving a Uniform NCAR Computing Environment -- Herb Poppe

Conference Recommendations

DATA COMMUNICATION NEEDS FOR UW/NCAR COMPUTER LINK

David D. Houghton University of Wisconsin-Madison Workshop I: Data Communications/Uniform Access

It is useful to summarize the communication needs implied by actual usage patterns on the University of Wisconsin RJE link to NCAR, since this represents the experience of one of the larger NCAR external computer users. Records for the past eight months are summarized to highlight the communication implications.


A. User Community at Wisconsin

There are approximately ten projects using the facility, involved with large-scale modeling, mesoscale modeling, cloud modeling and data processing. Mesoscale modeling is the research area with the largest use of the facility.

B. Facility

The RJE system consists of a card reader, a 300 line per minute printer and a CRT control station, all supported by a Harris/6 central processor. A leased line with 4800-baud capability connects the facility to NCAR.

Summary of Terminal Activity for First 8 Months of 1981

             Jobs           CPU (min)      PPU/IO (hr)    Pages     Frames of    Tape     WS
Month        7600    CRAY   7600    CRAY   7600    CRAY   Printed   Film Output  Mounts   Requests

Jan.         293     218    48      23     4.3     1.4    4537      1742         29       371
Feb.         263     235    184     115    3.5     6.4    4674      16094        32       291
Mar.         247     403    140     69     4.3     2.9    6484      11337        56       183
Apr.         309     388    376     41     7.0     1.7    7335      10089        87       253
May          378     299    627     46     8.7     2.5    7600      16825        64       242
Jun.         484     325    523     51     7.5     1.4    12492     12401        48       360
Jul.         206     359    116     85     4.2     3.8    7958      5749         41       252
Aug.         144     468    87      144    2.4     6.7    8276      2436         29       165

TOTAL        2324    2695   2101    574    41.9    26.8   59356     76673        386      2117

AVE          291     337    263     72     5.2     3.3    7419      9584         44       265

The primary indicators for communication data volume are given by the pages printed and frames of film output information.

A. Pages printed

The average load was 7,419 pages/month, with a range from 4,537 to 12,492 (roughly ±50%). If the average page is assumed to contain 5,000 characters or 40,000 bits, the data transfer rate is about 3 x 10^8 bits per month (300 megabits), which is an average of 10^7 bits per day or about 10^2 bits per second. This is about 2% of the total capacity of the leased line on an average basis.
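As a check on these figures, a minimal calculation sketch follows (illustrative only; the 30-day month and 8-bit characters are assumptions made here, not stated in the original report):

    # Rough check of the printed-output data rate quoted above.
    pages_per_month = 7419            # average pages printed per month
    bits_per_page = 5000 * 8          # assumed: 5,000 characters of 8 bits each
    bits_per_month = pages_per_month * bits_per_page
    bits_per_day = bits_per_month / 30            # assumed 30-day month
    bits_per_second = bits_per_day / 86400
    line_capacity = 4800                          # leased-line capacity, bits/second
    print(f"{bits_per_month:.1e} bits/month")     # about 3e8 (300 megabits)
    print(f"{bits_per_second:.0f} bits/second")   # about 100
    print(f"{100 * bits_per_second / line_capacity:.1f}% of the 4800-baud line")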

B. Frames of film output

Currently the film is produced at NCAR and mailed to Madison. However, it is instructive to consider the requirements for direct line transfer of this information. The 9,584 frames per month (range 1,742 to 16,825) represents the following data transfer according to the content of the frames.

Type of Output          Data/frame (kilobits)   Total data per       Average rate    % of 4800-
                        (Ref. Dave Fulker)      month (megabits)     (bits/sec)      baud line

Same as printed page    40                      384                  148             3
Preview graphics        25-150                  240-1440             93-550          2-11
Production graphics     50-1000                 480-9600             185-3700        4-77
Raster images           1000-8000               9600-76,600          3700-29,500     77-615

The current use of a 4800-baud leased line for computer data communications between the University of Wisconsin and NCAR provides ample capacity to handle current data transmission (currently the use is only several percent of capacity on a monthly basis).

Nevertheless, if current graphical output were to be transmitted to Wisconsin for real-time display, the data transfer requirement could approach 50% of the full capacity of the leased line if production-quality graphics were considered. It appears to be out of the question to transmit raster images at anything near the current frame output volume using the present communication system.

The RJE system is still in its growth stage at Wisconsin, so the requirements (needs) of the future are expected to be greater than the analysis here depicts.

DATA COMMUNICATION AND THE ATMOSPHERIC SCIENCES

David Fulker
National Center for Atmospheric Research
Workshop I: Data Communications/Uniform Access

Certainly one of the fastest growing fields of computer-related activity is electronic data communication. Technology developments are occurring rapidly (in such areas as satellite communication, digital service on telephone-like circuits, fiber optics, and networking techniques), but interest in new applications has developed even faster than the technology -- exciting new concepts include electronic mail, televideoconferencing, financial transactions without cash or checks, electronic access to newspapers and magazines, rapid access to a variety of databases, and networks of computers serving a variety of specialized but interrelated functions.

The possibilities are numerous for the application of electronic data communication techniques to the work of atmospheric scientists. However, to realize these possibilities requires budgeting and planning, and indeed, the value of such efforts must be assessed. Planned for the upcoming SCD Annual Users' Conference is a workshop on these topics as they pertain to NCAR. The rest of this document describes various data communication options that might be of significant value to the atmospheric science community, and poses questions that may help in assessing the actual merits and costs of the options. Of particular concern are the needs for data communication in conjunction with high speed computing as envisioned at NCAR and elsewhere.

PRESENT DATA COMMUNICATION SERVICES AT NCAR

An overview of the present state of data communication at NCAR will be presented at the workshop. This will include descriptions of the services offered, patterns of use, and problems that can be identified.

Workshop Questions:

- Within the context of the current services, what changes, if any, should be made to the NCAR data communication facilities?

- In particular, what communication protocols should be supported?

- What are the priorities for such changes?

A WISH LIST OF DATA COMMUNICATION CONCEPTS

Many of the recent developments and concepts in data communication have exciting potential for the atmospheric science community. This is especially true because of its broad geographic distribution, its reliance on centralized computing and data collection facilities, its innovative use of graphical representations, its need for globally collected datasets, and its interactions with other disciplines.

Each of the following concepts will be elaborated at the workshop.

* Connecting to NCAR by a local call to a public network.

* Transmitting various forms of graphical images from NCAR to remote sites.

* Rapid access to archived meteorological datasets.

* Use of datasets and graphical images for instructional purposes.

* Real or near real time access to remotely collected data.

* Exchanging datasets among various computer systems.

* Computerized access to scientific journals and library materials.

* Electronic mail and its use for conferencing.

* Televideoconferencing.

Workshop Questions:

- Which of the above concepts are valuable to the atmospheric science community, especially those that involve NCAR?

- What are the priorities among them?

A SURVEY OF HIGH SPEED DATA COMMUNICATION NEEDS

The SCD distributed a questionnaire on high speed data communication to just over 300 atmospheric scientists, all of whom are present or past users of the NCAR computing services. More than 150 responses were received, providing qualitative and quantitative information on the desirability of many items from the above wish list.

The results of the questionnaire will be summarized and discussed at the workshop.

Workshop Questions:

- How should the survey results be incorporated into the assessment of needs and priorities for data communication?

- Can the results be translated into quantitative needs?

AN AUDIT OF COMMERCIAL COMMUNICATION SERVICES

A brief study of commercial data communication facilities will be presented at the workshop. This will include descriptions of the services offered, potential applications to NCAR and the atmospheric science community, and estimates of associated costs.

Topics will include:

* Conventional Direct Distance Dialing network.

* Public packet switching networks.

* AT&T Dataphone Digital Service.

* Satellite Communication channels.

* The local distribution problem.

* The vacuum left by XTEN and the business world's abhorrence of a vacuum.

Workshop Questions:

- Which commercial communication facilities should NCAR use to best meet existing requirements?

- What alternative configuration(s) would meet a reasonable proportion of the needs as identified in previous questions under the wish list and the user survey?

AN NCAR DATA COMMUNICATION PLAN (PRELIMINARY)

An attempt will be made to prepare a coherent plan for NCAR data communication, with options for several levels of service to be provided, and assessments of the merits and costs for each. Hopefully, these assessments will encompass the entire environment in which data communication needs must be balanced against needs for other computer services.

Workshop Questions:

- In view of the costs versus value received, how vigorously should NCAR pursue developing its data communication services?

- What are the options and priorities for such developments?

ACHIEVING A UNIFORM NCAR COMPUTING ENVIRONMENT

Herb Poppe National Center for Atmospheric Research Workshop I: Data Communications/Uniform Access

The NCAR Local Network is an interconnection of all of the computer systems of the SCD and some of the computers of the other NCAR divisions. These computers, together with their front ends, peripherals, operating systems, command languages, programming language compilers, utilities and libraries, constitute the computing environment available to NCAR staff, UCAR members and visitors. A diagram of the NCAR Local Network, as currently configured, is shown in Figure 1.

At the present time, a significant subset of this environment (the CDC-7600, the CRAY-1A and the Ampex TBM mass-store) can be used with only a knowledge of FORTRAN programming and the modest command language (job control language) of the CDC-7600. Jobs to be run on the CRAY can be expressed in the command language of the CDC-7600; these jobs are translated to the command language of the CRAY by the Link Translator prior to their being sent to the CRAY via the 7600-CRAY Direct Link.

It is inevitable that the CDC-7600 will be retired. Hopefully, it can be replaced with a machine of the power of the CRAY-1S (or CRAY-2?) or the CYBER 205. Both the current CRAY and the CDC-7600's hoped-for replacement will be front-ended by another machine on the Network, either the IBM-4341 (or its upgrade) or one of the other mini-computers on the Network. In order to access the equivalent computing resources after the CDC-7600 is retired, the user possessing only a knowledge of CDC-7600 concepts and command language must suddenly learn the concepts and command languages of three new systems! In addition, this user must learn how the concepts of one system are related to the concepts of the other systems in this non-uniform computing environment.

It is a purpose of this workshop to examine the plight of the user in a non- uniform computing environment and to explore what remedies may be taken to alleviate the situation. The following issues will be addressed:

* What problems will such a user face in a non-uniform computing environment?

* Can a uniform computing environment be established at NCAR?

* What level of uniformity among systems can be achieved?

* What should the uniform computing environment look like?

* Can uniformity be achieved at an acceptable cost?

* Should SCD proceed with the development of a uniform computing environment?

PROBLEMS IN A NON-UNIFORM COMPUTING ENVIRONMENT

The problems that a user now submitting jobs on cards over the counter for execution on the CDC-7600 or CRAY-1A (using the 7600-CRAY Link Translator) will face in the future are similar to the problems faced by a user now submitting jobs over the Network via the IBM-4341 or a mini-computer such as HAO's PDP-11/70 UNIX system.

Terminology

One would hope to be able to transfer previous knowledge gained in learning to use one system to learning to use another system. A major impediment to this knowledge transfer is the use of different terminology for concepts which are the same between two systems or the use of the same terminology for concepts which are different between two systems.

For example, a dataset on the CRAY-1A has characteristics which are analogous to a volume on the CDC-7600. On other systems, the term dataset often has characteristics associated with the term file.

Often the basic concepts underlying one system diverge dramatically from those of another system.

Some systems are based on the notion of "device-independence". On these systems, the user treats all devices in exactly the same way. For example, a single COPY command supports the copying of a disk file to tape, a card deck to the line printer, etc. To varying degrees, the UNIX, CDC-7600 and CRAY-1A systems support this concept. (It is carried to a high level in the RSX-11M operating system, which supports the DICOMED.) On the IBM-4341, however, each device is treated differently. For example, there are 9 commands which implement the "copy" function (COPYFILE, DDR, DISK, MOVEFILE, PRINT, PUNCH, READCARD, TAPE, TYPE) under CMS.
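As an illustration of the device-independence idea (a sketch only, not code from any of the systems named above), a single copy routine can serve any source and destination that honor one common read/write interface:

    # Sketch: one "copy" operation for any source/destination pair that
    # supports a common read/write interface (disk file, tape, terminal, ...).
    def copy(src, dst, block_size=4096):
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)

    # The same routine copies disk to disk, disk to a printer stream, etc.:
    # with open("input.dat", "rb") as f, open("output.dat", "wb") as g:
    #     copy(f, g)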

Functionality

The functionality of the systems in a non-uniform computing environment may be so different that the user has difficulty porting his applications among machines.

The IBM-4341 supports a command programming language (EXEC's); neither the CRAY-1A nor the CDC-7600 do.

The user has no way of recovering from an error occurring in a system utility (e.g., the FORTRAN compiler) on the CDC-7600; the job is automatically terminated. On the CRAY-1A, the user can use the EXIT command to cause control to be transferred to a series of commands to recover from the error, although the nature of the error cannot be determined. On the IBM-4341, within EXEC's, the user can not only branch to a series of commands after an error is encountered, but, since the nature of the error is known, the error recovery can be conditional upon the error.

Semantics

Even though two systems may have the same functionality, the actions taken when the function is invoked may differ; that is, the two commands have different semantics.

Both the IBM-4341 and UNIX provide a file "rename" function. On the IBM-4341, an error is reported if a file of the same name as the renamed file already exists. On UNIX, the existing file is replaced without warning. On some other systems, the renamed file is given a version number one greater than the existing version.
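A minimal sketch (not NCAR code) of how a uniform environment could impose a single "rename" semantics, reporting an error rather than overwriting, on top of a host whose native rename silently replaces the target:

    import os

    def uniform_rename(old, new):
        # Uniform semantics: refuse to overwrite an existing file, and say so.
        if not os.path.exists(old):
            raise FileNotFoundError(f"{old} does not exist; no action taken")
        if os.path.exists(new):
            raise FileExistsError(f"{new} already exists; {old} not renamed")
        os.rename(old, new)   # the host's native rename might overwrite silently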

Command Syntax

The difference in syntax among commands in a non-uniform computing environment is a source of error and frustration to users.

One aspect of differing syntax is that commands that perform the same function have different names. On UNIX systems, a list of the files catalogued in the working directory can be obtained by typing "ls". On the IBM-4341, a list of files catalogued on the "A" mini-disk can be obtained by typing "1". A user moving between these two systems often confuses the commands.

It is a major effort to remember all of the nuances of the command syntax of the various systems. A typical error made by first time users of the command language of the CRAY Operating System is forgetting to terminate every command with a ".", since none of the other systems on the Network require an explicit command terminator.

File Referencing

The way in which files are referenced can make for some of the greatest differences among systems.

On the CDC-7600, files may be referenced by name or by sequence number (position). On the CRAY-1A, files cannot be referenced by name (they have none) nor explicitly referenced by position. Rather, files are referenced implicitly by the current position within a dataset (which may not be the beginning of a file); the current position can be changed via dataset positioning commands (REWIND, SKIP, etc.).

Network Access

The Network allows for transfer of data between a device on one system and a device (or the job input queue) on another system. Device parameters are specified using the syntax of the corresponding system. One or more commands are available on each system for accessing the Network. The syntax of these commands is in the style of the command language of the system on which they are invoked. Hence, Network access is inherently non-uniform.

The syntax of a system on which a Network command is invoked does not always have a graceful way of embedding the syntax of remote device parameters. For example, the character code used on the CDC-7600 cannot express lowercase

alphabetic characters. But user names and file names on UNIX are generally expressed in lowercase and are different entities than those expressed in uppercase. Thus a method had to be developed on the CDC-7600 to specify lowercase so that files could be disposed to a UNIX system. An example of a command to dispose the data on the CDC-7600 print unit to a file named 'xyz' belonging to a user with the login name of 'jones' on HAO's UNIX system is:

*DISPOSE,DN=PR,MF=HA, DC=ST,TEXT=USER= IJO 'U LXYS U

Access is also not uniformly implemented across the Network; the CDC-7600 has the ability to dispose, but not acquire, data across the Network.

FEATURES OF A UNIFORM COMPUTING ENVIRONMENT

From the above description of the problems a user will face in a non-uniform computing environment, the following are desirable features of a uniform computing environment:

* Uniform Terminology

* Uniform Concepts

* Uniform Functionality

* Uniform Semantics

* Uniform Command (Job Control) Language Syntax

* Uniform File Access

* Uniform Network Access

The following features are also desirable in a uniform computing environment:

* Uniform Programming Language

* Uniform Utilities (Tools)

* Uniform Libraries

* Uniform Record Access

* Uniform External Data Representation

* Uniform Internal Data Representation

The level of uniformity which can be achieved is a function of the number of items in the list above that can be implemented. The number of items which can be implemented must be traded against the cost of implementation and the negative side-effects the implementation may have on the resulting computing environment.

These areas of uniformity are interrelated. For example, if one wished to introduce a uniform file naming convention, most of the host system's commands would have to be modified to accept the new file naming convention. It might be just as easy to implement a uniform command language which accepted the new file naming conventions as it would be to modify all of the existing commands. On the other hand, it would be possible to implement a uniform command language without implementing a uniform file naming convention.

A SYSTEMATIC APPROACH TO DISCUSSING A UNIFORM COMPUTING ENVIRONMENT

Rather than discuss the differences and similarities among different real systems by comparing them to each other, it is often useful to compare each against a hypothetical system.

Modeling a Uniform Computing Environment - The Abstract System

A uniform computing environment implemented on different real systems can be described in terms of an abstract system; the abstract system is a logical model of the uniform computing environment. The abstract system can be viewed as a hypothetical computer system with devices, files, etc.

For example, the abstract system may contain the notion of distinct collec- tions of data, called files.

* A file may be stored with other files in a file system.

* A file may be stored permanently or temporarily.

* A temporary file may be stored until the program that created it ter- minates or until the end of the session in which it was created.

The notion of an abstract system is extremely useful because it provides a common terminology to describe concepts which are common among real systems.

Abstract System to Real System Mapping

In order to realize the uniform computing environment, the abstract system must be mapped onto the real system. (In some cases, the mapping is one to one; that is, the abstract system is the real system).

For example, a single-file, local dataset on the CRAY-1A would be mapped into a temporary file in the abstract system that is deleted when the session in which it was created terminates. Despite the name, CRAY-1A "permanent" files are not permanent; they will be deleted by the system as needed to prevent the CRAY disks from saturating. The residence time of large datasets after termination of the job which created them can be very short on the CRAY. Therefore, a single-file, permanent dataset on the CRAY-1A would also be mapped into a temporary file in the abstract system that is deleted when the session in which it was created terminates. Permanent files within a uniform computing environment on the CRAY could be implemented by disposing the datasets into which they map to the TBM at session termination and automatically ascending them from the TBM at the beginning of the next session.
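A small sketch of such a mapping table follows (hypothetical and illustrative only; the dataset categories and lifetimes are taken from the discussion above, not from an actual SCD design):

    from enum import Enum

    class Lifetime(Enum):
        SESSION_TEMPORARY = "deleted when the creating session ends"
        PERMANENT = "retained across sessions"

    # Hypothetical mapping of CRAY-1A dataset kinds onto the abstract model.
    # CRAY "permanent" datasets map to session-temporary because the system
    # may purge them; true permanence requires staging to the TBM mass store.
    CRAY_1A_MAPPING = {
        "local dataset": Lifetime.SESSION_TEMPORARY,
        "permanent dataset": Lifetime.SESSION_TEMPORARY,
        "dataset staged to the TBM at session end": Lifetime.PERMANENT,
    }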

Not all of the concepts in the abstract system may be mapped into entities within the real system. Different systems have different capabilities; the existence of a common abstract system does not change that. Just as one sys- tem may not have a device possessed by another system, one system may have both permanent and temporary files and another only temporary files. Thus, an application that depended on the semantics of permanent files cannot be ported unchanged to another system that only supports session temporary files.

Not all of the entities in the real system may be mapped into concepts within the abstract system. A uniform command language may not be able to provide all of the capabilities available via the vendor supplied command language.

It is possible, however, to access system specific capabilities via the uni- form command language in a portable fashion. For example, the user may be able to set the size of virtual memory on the IBM-4341 from within the uniform environment. If this operation were invoked on the CRAY, it would merely be ignored. Thus, uniformity is maintained, even though a system specific feature is invoked.

Functional Capabilities

The operations which may be carried out on (or with) the entities of the abstract system constitute the functional capabilities of the uniform computing environment. The ability to change the name of a file within the abstract system is an example of a functional capability. The usefulness of the uniform computing environment is largely determined by the level of functionality it supports.

Syntax and Semantics - The OSCRL

The user invokes the operations to be performed on or with the entities of the abstract system by means of an Operating System Command and Response Language (OSCRL). (Note that the responses directed to the user as the result of the execution of any OSCRL command are as much a subject of uniformity as are the commands.) The OSCRL has a specific syntax and semantics; for example, the command

    RENAME ft.fortran TO fft.fortran

would be the means by which the user would invoke the "rename" function: to change the name of the file 'ft.fortran' to 'fft.fortran', with the proviso that if the file 'ft.fortran' did not exist, the user would be so informed and no further action would be taken.

The OSCRL Processor is that entity which parses the syntax of the OSCRL and realizes (or causes to be realized) the semantics of the OSCRL. The essence of implementing the abstract system on any given computer is in the implementation of the OSCRL Processor on that computer. The OSCRL Processor is not necessarily a single entity. The OSCRL Processor is not the operating system, although it may be a part of it.
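A minimal sketch (illustrative only) of the parsing half of such a processor, handling the RENAME command above and reporting errors in a uniform way regardless of the host that carries out the action:

    def process_oscrl(command, rename_fn):
        # Parse "RENAME old TO new" and hand the action to a host-specific
        # rename function; the responses to the user stay uniform.
        words = command.split()
        if len(words) == 4 and words[0].upper() == "RENAME" and words[2].upper() == "TO":
            old, new = words[1], words[3]
            try:
                rename_fn(old, new)
                return f"{old} renamed to {new}"
            except FileNotFoundError:
                return f"{old} does not exist; no action taken"
        return f"unrecognized command: {command}"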

An implementation approach is the method by which the abstract system is made to execute on a real system; it is the method by which the OSCRL Processor is implemented. For example, a uniform computing environment could be realized by implementing a portable operating system on each NCAR computer system.

Evaluating the Approaches

Each approach has its advantages and disadvantages. For the purposes of comparison, each approach could be evaluated against the following criteria:

* What limitations are imposed by the processors, peripheral devices, existing file systems and operating systems?

* What resources are required for each implementation approach?

* What impact does each approach have on existing user programs and datasets?

* What computing environments are possible with respect to both programming languages and job control languages?

* What is the relative impact on the functionality and efficiency of user level I/O?

* Which of the capabilities deemed desirable in an NCAR uniform computing environment can be implemented; which must be excluded?

* What impact does each approach have on user availability of any Network node during the implementation process?

* Does the approach have leverage (does the work performed in implementing the uniform computing environment on the first system reduce the amount of work necessary to implement the uniform computing environment on the next system; that is, what proportion of the code is directly transferable to another system)?

* Is the implementation approach inherently reliable?

* How does the implementation approach affect maintainability?

* How does the approach affect the importation of user media from outside the NCAR Network environment?

Possible Implementation Approaches

Outlined below are several possible approaches to implementing a uniform computing environment at NCAR. These approaches can be logically divided into two categories:

1) Abstract System = Real System

In these approaches, the abstract system is identical to the real system. A selection of one of these approaches is an implicit selection of the abstract system, the functional capabilities available to the user for manipulating entities within the abstract system, and the syntax and semantics of those functions.

* Single Architecture, Single Operating System

In the future, NCAR would acquire only systems of the same architecture, all running the same operating system. The processors could be different, upwardly compatible models of the same architecture, obtained from different vendors. The operating systems on these processors could differ by model, provided that the computing environment remained the same.

* Portable Operating System

A uniform computing environment could be realized if the host operating system on each computer were replaced with a portable operating system.

2) Abstract System Mapped onto Real System

In the following approaches, the abstract system is mapped onto the real system.

* Rewrite or Extend System Level Code

On each system, the command language processor and file system of the host operating system would be replaced by new system level code to implement the abstract system. This approach would make use of the existing device drivers, etc.

* Implementation via the Host System Command Language

The command languages of some computer systems are akin to programming languages. They incorporate such features as variables, expressions, assignment, parameterized procedures, etc. Each command in the uniform command language is implemented by a command procedure in the host command language. Mappings, for example from a uniform file name to the host file name, are carried out by statements of the host command language. The actions are carried out by host commands having semantics similar to the uniform command being implemented.

* JCL Translation

In this approach the OSCRL Processor is a language translator; the translator reads commands in the uniform command language, performs file name mappings, maps the commands in the uniform command language into commands of the underlying command language, and passes those commands to the underlying operating system. The translator may require the input of all of the commands in the uniform command language before sending the translated commands to the underlying command language processor. (A small sketch of this approach appears after this list.)

* 'On-top-of' Uniform Command Language and File System

The OSCRL Processor is a layer of software built 'on-top-of' the software of the existing operating system. The user interfaces to this topmost layer. This software has the same relationship to the operating system as any user program. The OSCRL Processor is written in a high level language common to all of the systems in the uniform computing environment; and, to the greatest extent possible, the code is portable. Some of the functions requested by the user are carried out by manipulating entities which are solely contained within this layer; for example, setting the value of a command procedure variable. Other functions requested by the user are carried out by calling upon the services of the underlying operating system; for example, deleting a file.
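For the JCL translation approach above, a toy sketch of such a translator is given below. The host command names (HOSTREN, HOSTDEL) and the file name mapping rule are invented purely for illustration and do not correspond to any actual NCAR command language.

```python
# Toy sketch of the JCL-translation approach: parse a uniform command, map
# file names to host conventions, and emit an equivalent host command.
# HOSTREN/HOSTDEL and the name-mapping rule are invented for illustration.

UNIFORM_TO_HOST = {
    "RENAME": "HOSTREN",
    "DELETE": "HOSTDEL",
}

def map_file_name(uniform_name):
    # Assumed host convention: upper case, no extension, at most 8 characters.
    return uniform_name.split(".")[0].upper()[:8]

def translate(uniform_command):
    """Translate one uniform command line into a host command line."""
    words = uniform_command.split()
    verb, args = words[0].upper(), words[1:]
    host_verb = UNIFORM_TO_HOST[verb]
    host_args = [map_file_name(a) for a in args if a.upper() != "TO"]
    return " ".join([host_verb] + host_args)

print(translate("RENAME ft.fortran TO fft.fortran"))   # -> HOSTREN FT FFT
```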

Figure 1: NCAR Local Network

[Figure not reproduced. The diagram shows the communications lines of the NCAR local network, including the connections to the DICOMED plotter and various small computers. Notes on the original figure: at present, MODCOMP lines do not pass through the PACX; one connection is logical only, being physically attached via the 3705.]

The Workshop on Data Communications/Uniform Access was well attended. There was a great deal of participation by the users, resulting in considerable exchange of information between the SCD and the user community and also between individual users.

Case Study of the University of Wisconsin Usage

Dave Houghton from the University of Wisconsin presented statistics on the usage of their RJE link to NCAR for the first 8 months of 1981. Included in the data were averages for the number of pages printed per month and the number of frames of microfilm generated per month. The current 4800 baud leased line was quite adequate for handling the transmission of their current printer output, but if current graphical output of production quality were to be transmitted, 50% of the capacity of the line could be required. Therefore, higher bandwidth links may be required as interest in online graphics capabilities increases.

The trade-offs between leased lines and dialup service were also discussed. The cost of the Wisconsin leased line was $1500-$1800 per month. Examples of typical costs of dialup lines were $25-$35 per hour from Woods Hole and about $1000 per month from the University of Miami. Although more expensive in some cases, leased lines provided more reliable connections than dialup lines; many dialup users found it necessary to try several calls before acquiring a good connection. In conclusion, where usage is high enough to justify the costs, leased lines are clearly preferred to dialup service.

A serious problem caused by satellite delays was brought up. Since the phone company may route long distance calls through satellite links rather than over land lines, delays on the order of 1/2 second may be incurred, causing problems at the modem end. Several solutions were discussed and are listed in the Recommendations section which follows.

Data Communications

Dave Fulker discussed the current RJE system. The asynchronous access is based on the CDC UT200 protocol, allowing simple one-line commands. It was mentioned that MIT is in the process of converting from a 4800 baud leased line connection to a Dataphone Digital Service (DDS) line. The cost was expected to be on the order of $25,000 per year.

The results of the questionnaire that Dave distributed to NCAR remote users in August 1981 on the subject of high speed communications were presented. Topics rated in the High and Critical categories by 40-68% of the respondents included quick look and production graphics capabilities and the transmission of archived data and model fields, indicating a need for higher bandwidth links in the future.

A description of a packet switching network was given. The cost of $5-$7 per hour of connect time (for Telenet) seemed very attractive compared with current charges of about $25 per hour for long distance dialup connections.

Although the overall cost to users would be decreased, the cost to NCAR would increase, since a processor from the packet switching network vendor would be required at the host end and its charges would be billed to NCAR. However, users were willing to accept a back-charge for their usage.

Other alternatives were discussed. Satellite communications links would provide very high bandwidth, but the cost seemed prohibitive and the end point distribution problem would still exist. Receive-only systems for users were suggested, since the amount of data transmitted 'to' users was typically considerably greater than the amount of data transmitted 'from' users to NCAR, and the cost for such systems would be more feasible. The DDS line being implemented by MIT, and digital termination services from private organizations selling portions of a satellite channel, were topics requiring further research.

Uniform Computing Environment

Herb Poppe presented a talk on uniformity. The goal would be to provide a uniform computing environment for users of the NCAR systems, which would increase user productivity. Currently, running a job usually requires going through more than one system; for example, most jobs run on the CRAY-1 also go through the CDC 7600. This requires a user to know the JCL of several systems. If a uniform JCL were available on all such systems, users would not be required to know multiple operating systems and their corresponding JCL. A uniform environment includes not only uniform JCL but also uniform access (or device independence) and uniform tools. Several questions were brought up.

- How much effort is involved?

- Can a user learn a new vendor OS/JCL faster than NCAR can implement a uniform system?

- Is a portable operating system (like UNIX) a solution?

- Is the loss of efficiency of the OS worth the benefits?

Data Communication

The first priority was to solve the current problem caused by transmission delays created by the routing of long distance calls through satellite links rather than land lines. The alternatives, in order of priority, were:

1. Implement a new protocol not sensitive to such delays (e.g. X.25)

2. Add a 'black box' available from the phone company which would handle the delays.

3. Look into Dataphone Digital Service.

4. Replace 208A modems with new Bell modems which handle the delay.

It was recommended that a study be done of implementing a new protocol (such as X.25) to handle host to host transmissions.

For future communications links, it was recommended that other alternatives, listed below in order of priority, be examined:

1. Subscription to a public packet switching network.

2. Use of AT&T's Dataphone Digital Service (DDS).

3. Use of satellite links with 'receive only' satellite systems for users.

4. Use of full satellite communications system.

Achieving a Uniform NCAR Computing Environment

The workshop concluded that a uniform computing environment was highly desirable. SCD users are already saturated with a proliferation of terminology, concepts, command languages and file systems; the projected replacement of the CDC 7600 will further aggravate the situation.

It was recommended that SCD proceed with a phased implementation of a uniform command language and file system, using the "on-top-of" approach. There was interest in a portable system that could be installed at user sites as well as at NCAR. It was felt that adherence to any projected standard was of negligible importance compared to NCAR's fixing the command language and supporting it through future system acquisitions. It would be desirable to have the first phase implemented by the arrival of the CDC 7600 replacement.

SECTION V: GATEWAY MACHINES TO THE NCAR NETWORK HOSTS

Gateway Machines to NCAR Network Hosts -- Paul Rotar
Conference Recommendations

Paul Rotar
National Center for Atmospheric Research
Workshop II: Gateway Machines to the NCAR Network Hosts

The Scientific Computing Division operates a distributed network of computers serving a variety of needs. These computers can be classified as large-scale batch machines, small-scale interactive front end machines, job entry machines, and terminal control devices. A group of network adaptors interconnect most of the systems in the complex. Many of these systems cannot be rigidly classified and perform more than a single function.

The operation and maintenance of the gateway system is nontrivial. Some of the maintenance responsibility belongs to NCAR. In many cases, identifying and isolating problems is the responsibility of the Operations Section, even when the ultimate solution must be provided by vendors. Training a large staff to properly handle the variety of equipment is difficult and keeping current with the software changes that affect operation is a continuing problem.

Gateway functions fall into the following areas: front end machines, remote job entry machines/devices, and port selection equipment. Systems performing combinations of these functions form the gateway into the NCAR network hosts. Figure 1 shows the NCAR gateway system. The configuration of the IBM 4341 and the MODCOMP II is shown in Figures 2, 3 and 4. These are included for reference as they are the major remote user gateway systems.

At the end of the gateway chain are front end machines. These machines remove the burden of peripheral device handling from the large-scale processors, simplifying their system software and eliminating device overhead. Such peripheral devices include standard one-half inch magnetic tapes, card readers, printers, microfilm and mass storage. Front end machines provide storage and archiving for user programs and data. They are a convenient location for background queues for the large-scale machines. They offload the editing and file and job preparation tasks, leaving the large-scale processors for computing. There may be some preprocessing of data in the front end operation before jobs are submitted to large batch machines. Postprocessing of data is also part of the front end function.

Outbound from the front end machines are the gateway processors. Typically, gateway machines perform functions related to the connection of terminals, or minicomputers behaving as terminals, to the internal network and its front end machines. A gateway device may be nothing more than a terminal controller or concentrator. It may be a minicomputer that handles a number of terminal types, interprets the terminal protocol, and provides for the data and protocol conversion that will allow data to migrate into the local network. Such a minicomputer allows flow control problems to be gracefully handled. Flow control is defined as the resolution of the situation in which faster devices in a network overwhelm slower devices with the amount of data transferred in any given time. After completing the gateway functions, a user's information has been integrated into the formats and files needed for processing by a front end processor.

Gateway machines communicate with terminal classes denoted by the SCD as Remote Terminals and Remote Computers. The former use asynchronous protocols; the latter use a number of synchronous protocols.

Asynchronous protocols are teletype-compatible, transferring one character at a time. Each character is encased in start/stop bits and a parity bit. Typical data transfer rates of asynchronous protocols are 300 and 1200 baud on dial-up phone lines. Interactive ASCII terminals use asynchronous protocols.

Synchronous protocols transmit a block of characters. Each block is encased in control information. Transfer rates run from 2000 baud to 9600 baud and beyond. NCAR supports 3 synchronous protocols: Binary Synchronous, Control Data UT200, and HASP. Binary Synchronous is a line bid protocol allowing either the gateway or the terminal to start a transmission. UT200 is a polling protocol managed by the host. HASP utilizes Binary Synchronous to provide multiple interleaved data streams on a single line and is used to communicate with IBM front end machines. Historically, these protocols existed on specially built remote job entry terminals, such as the Control Data UT200, or the IBM 2780 or 3780, but now these are generally emulated by minicomputers and the terminal name has become the protocol name.
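The framing overhead of the two protocol classes can be illustrated with a little arithmetic. The figures below assume one start bit, seven data bits, one parity bit, and one stop bit per asynchronous character (10 transmitted bits per character), and an arbitrary 10 bytes of control information per 512-byte synchronous block; the exact numbers vary by equipment and protocol.

```python
# Illustrative framing arithmetic under the assumptions stated above.

def async_chars_per_second(baud, bits_per_char=10):
    """Characters per second on an asynchronous line with 10-bit framing."""
    return baud / bits_per_char

def sync_efficiency(block_size=512, control_bytes=10):
    """Fraction of transmitted bytes that are data on a synchronous line."""
    return block_size / (block_size + control_bytes)

for baud in (300, 1200):
    print(f"{baud} baud asynchronous: about "
          f"{async_chars_per_second(baud):.0f} characters/second")

print(f"synchronous line efficiency (assumed framing): {sync_efficiency():.1%}")
```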

The gateway machines' ports are connected to a port selection device (PSD) which allows an incoming line to select any valid gateway machine. It provides control when more incoming lines exist than there are ports on a gateway machine. This reduces costs, since ports on the PSD are less expensive than gateway ports. The PSD allows the load on each gateway machine to be regulated by changing the ratio of lines to ports.

Figure 1 shows the configuration of front end and gateway machines. The four machines marked with an * serve NCAR in-house interactive front end needs. The MODCOMP II and the IBM 4341 provide gateway capacity to the university community as well as to NCAR. The other minicomputers shown provide specialized interactive services to NCAR divisions. These divisions also operate minicomputers off the mesa site that are linked to the central complex only by dedicated RJE lines.

The MODCOMP II currently has 32 ports. During calendar year 1981, we tried to add 10 more ports, 6 of which would be synchronous and 4 asynchronous. This upgrade will allow 9600 baud service, but has not yet been completed. All of the additional ports would be dial-up; how the service will be divided among the different baud rates and protocols has not been determined.

The following table shows the distribution of ports:

Table 1. Port Distribution (rate/protocol)

              300A  1200A  2000B  2000B  2400B  2400U  4800B  4800U  TOTAL PORTS

  Dedicated     0     0      0      0   1(NMC)    1      7      3        12

  Dial Up       4     4      2      3      1      1      2      3        20

The dedicated ports serve only one site. Two serve university sites. Five are connected to NCAR interactive minicomputers. Three of these computers are connected directly to the NCAR local computer network. These lines could be released for other use.

The MODCOMP provides no interactive or other front end functions, but it stores an entire job on a disk and forwards it when it has been successfully received. The state of the workload on the 7600 is periodically sent to the MODCOMP, and if the MODCOMP detects that a job has been lost, it resubmits the job. The MODCOMP's ability to accept and store an entire output file insulates the user from backend instability and lost output. Sometime during FY82, and before the CDC 7600 is removed, the MODCOMP will be placed on the local network. At that time, the MODCOMP will be a direct gateway to the CRAY. The automatic job resubmittal will be removed before the MODCOMP is on the network.

There are a number of unanswered questions about the continued use of the MODCOMP as a gateway. Relatively easy questions include: What services should be provided by the MODCOMP's additional ports? Should we allow any more sites to use the UT200 protocol? More difficult questions include: When will we remove the MODCOMP? Should we release it before or at the time of the 7600's removal? In the past, users have overwhelmingly approved its retention, yet at some point it should be scheduled for removal due to its age, especially the age of its peripheral components. Moreover, the vendor is raising the service rates in an effort to induce NCAR to retire the equipment.

The IBM 4341 provides both gateway and front end capacity (see Figure 2 for the hardware configuration). Gateway functions are performed through the 3705 terminal controller and the IBM Series 1 and PDP 11/34, both of which emulate terminal controllers. There are 16 synchronous and 32 asynchronous ports available on the 3705. One of the synchronous ports communicates with the PDP 11/34. The PDP 11/34 and the IBM Series 1 provide a total of 48 additional synchronous ports that operate non-IBM terminals in full screen mode and provide the functions found on IBM 3277 terminals. Remote terminals connected to the asynchronous ports of the 3705 generally will be of the teletype class and will be limited to single line mode.

Remote computers connected to the synchronous 3705 ports will communicate with a virtual machine on the IBM 4341 that is running the Remote Spooling Communications Subsystem (RSCS) provided by the vendor. This normally supports remote computer sites permanently assigned to a given port. The SCD has modified RSCS so that no fixed port assignment exists. The available ports may now serve more users. RSCS will place an incoming job into the Spool File System, and when the job has entered the Spool File System, it will be turned over to a virtual batch machine for further processing and transmission, if necessary, over the network to the back end systems.

The IBM 4341 also provides a full range of interactive services in addition to front ending. Plug compatible upgrades to this machine will allow us to extend such services to our whole user community. Only the cost of telephone service will impede further service.

The NCAR local network provides a connection mechanism for attaching additional gateway/front end systems to the complex. The major connection constraints are the candidate system's ability to accept the network software, the proximity to existing trunks and a network adaptor, or an available port on an existing adaptor. Complete software packages exist for the following hardware/software systems:

PDP 11/70 under UNIX

PDP 11/34 under RSX-11M

SEL 32/35 under RIM

SEL 32/55 under RIM

IBM 4341 under VM/SP CMS

Other systems can be incorporated into the local network as gateways with six to nine months of effort, depending on the implementor's knowledge of both the network and the candidate system. The network software is a fairly portable high-level language program. Quite a few system-dependent routines support it, but these have fixed specifications and are not complex. The network software runs at user level and thus makes minimal system demands. To conserve resources, the Network Control portion is swapped out in the absence of user activity and only a short 'keep alive' routine is needed in memory.

At this time, two other NCAR divisions have gateway machines on the network. The first of these is the SEL 32/35 at the Atmospheric Chemistry and Aeronomy Division (ACAD). It has been on the network since January 1979, with some interruptions as the software evolved. The second machine, a PDP 11/70 at HAO, was connected in the summer of 1981. Microwave radio equipment is available and could extend the NCAR local network to the 30th Street facilities. However, this would cost about $200,000 and has not been budgeted at this time.

Figure 1. NCAR Gateway System

[Figure not reproduced. The diagram, dated October 1981, shows the gateway and front end machines, their terminal connections, and the links to the back-end systems; machines serving NCAR in-house interactive front end needs are marked with an asterisk.]

Figure 2. NCAR IBM 4341 Hardware Configuration

[Figure not reproduced; dated October 1981.]

Figure 3. MODCOMP II Configuration

[Figure not reproduced. The diagram, dated October 1981, shows the MODCOMP port and interface layout, with annotations on internal and external DMP transfer timing and a capacity of up to 64 ports.]

Figure 4. NCAR Technical Control Station (TCS)

[Figure not reproduced. The table lists, for each MODCOMP channel group, the remote site, telephone number or hard-wired connection, protocol and line speed (200UT, binary synchronous, and asynchronous service from 300 to 4800 bps), and the associated modem and interface equipment.]

Issues

The format used by this workshop was to discuss a number of issues and vote on their relative importance. The issues, in order of priority, were:

1. Number crunching: This issue encompasses relief of saturation on the CRAY-1 and 7600 mainframes, and includes a highly computational capability, not merely more running of models.

2. Friendliness: This issue is defined as ease of learning the system and the required job control language.

3. Software compatibility: Since users are already familiar with the Cray Operating System (COS), this issue may be resolved by buying another Cray mainframe.

4. Memory size and mass store speed and capacity

5. Data transfer - tape to mass store

Other issues which were discussed but not voted to have a high priority were:

* Network compatibility

* Interactive job preparation and output

* Interactive computing

* Universal access

Results of Voting on Issues

The votes on these issues were tallied as follows (perceptions of issues critical to the SCD mission, on a scale of 0 (low) to 5 (high); "?" marks counts that are not legible in the original):

  Issue                     5    4    3    2    1    0   Score  Rank

  Number crunch             ?    ?    2    2    1    0     98     1

  Software compatibility    0    7    6    5    2    6     58     3

  Interactive capacity      0    1    3    3    2   18     21

  Interactive computing     0    1    0    0    0   26      4

  Friendliness              5    3    6    3    0   10     61     2

  Network compatibility     0    1    4    1    4   17     22

  Data transfer             3    3    2    3    3   13     32     5

  Memory size, etc.         2    8    2    1    3   10     54     4

  Universal access          0    0    0    2    3   22      7

Options

Given the priorities listed above, the following options remain open to SCD:

1. Do nothing. Keep the 7600. Get more software now but hold out for the CRAY-2.

2. Multi-minis. Procure a multi-mini farm to serve as a front end machine. Of the total work computed on the CRAY-1, roughly 25% is of the 2-minute or less variety and could be moved to scalar machines.

3. Non-plug-compatible. Get a machine that is not IBM plug-compatible (another CRAY or other possibilities) to augment number crunching and not worry about interactive computing. Arrangements would have to be made to accommodate mass store and tapes.

4. Plug-compatible. Get an IBM plug-compatible machine such as an IBM or AMDAHL. This would relieve scalar computation and provide tape handling, mass store, and interactive capability.

Front-End Procurement

The following possibilities were discussed for the Front-End procurement, assuming funding of approximately $4.5 million:

1. Drive class VII machine

2. Manage mass store

3. Tape handling

4. Scalar computations

5. Large memory

6. Immediate off load of 7600 and 25% off load of CRAY-1

SECTION VI: DATA ACCESS AND DISPLAY

Data Inventories, Archive Preparation, and Access -- Roy Jenne

The Typical Data Life Cycle and Associated Computing Strategies -- Bob Lackman

Data Analysis and the NCAR SCD -- Gary Rasmussen
Conference Recommendations

DATA INVENTORIES, ARCHIVE PREPARATION, AND ACCESS

Roy Jenne
National Center for Atmospheric Research
Workshop III: Data Access and Display

We will present a brief summary of available data sets. Much of this information is summarized in Jenne (1981). At NCAR, we have over 100 datasets, many of which involve a number of tapes. Figure 1 is an example for upper air observations. Some of the problems, costs, and time delays involved in obtaining datasets will also be discussed.

In Jenne (1975), we discussed a number of the datasets that are available in the United States for meteorological and oceanographic research. More of the foreign data sources and additional data types are included in the report, "A Global Data Base for Climatic Research" (Jenne, 1981). In 1978, WMO published a Catalogue of Meteorological Data for Research. The WMO (1977) report giving statistical information on activities in operational hydrology has much information about the density of observing networks for rainfall, evaporation, and stream flow in the world. Karen Posey prepared a helpful report about selected NASA datasets (NASA, 1979). NOAA took abstracts of many datasets and entered them on tape and in the publication Ropelewski (1980). WMO has obtained this tape and is also defining other data inventory activities. At NCAR we are still expanding our inventory of datasets. For our own data, selected information about the datasets and the tape or mass store volumes is online. Also, a brief listing of available datasets is kept up to date, and is available by writing to the Data Support Section at NCAR. For several datasets, there is also detailed information on tape about the availability of stations (or grids), by days or months, over many years. At the Users' Meeting, we will describe our information about the data. Comments from users are desired about the usefulness of online inventory information and microfiche slides. Associated with each dataset, there are tape lists and pointers to data access programs, inventories, programs that prepared the data, etc.

The strategies that we use to prepare and access sets of scientific data are described in Jenne (1981), but a brief summary will now be given. We maintain some datasets in their original format. In other cases, we may simplify the format, change it to a common format, and/or make it more efficient. For datasets having a large volume, we often choose a binary format to reduce computing time and storage volume. For high-volume data, we try to prepare subsets of data that are much easier to access. This may be a different organization of the same data, samples of data, or averages. Usually there is not enough time to do a thorough quality control, but we do enough that many of the problems are removed. We then work with users to uncover and fix additional problems.

The preparation and updating of archives takes much of our time. There are always many new sets that are needed, old ones that need improvement, and the updating problem tends to get worse as we acquire more sets. The investment of time in dataset improvement, and the use of higher density tapes and mass storage, has helped us to stay afloat. We also take advantage of the efforts of others that result in datasets.

DATA ACCESS

If a user has an approved project on the NCAR computers, he can access our data archives directly. In other cases, the data can be copied to tape at cost and mailed to the user. During the tape copy process, data usually can be selected by date, type, and level. Selections that would require format changes generally are not performed.

When data are used at NCAR, there is typically a simple access program which handles the data unpacking for each dataset. Usually the identification of a report is unpacked first, and the associated data field is unpacked only if it is needed. The Data Support Section also maintains some commonly used subroutines for data interpolation and common calculations, such as interpolating a latitude-longitude grid from a polar stereographic grid or calculating geostrophic winds. We also maintain selected display programs that use standard graphics routines to accomplish specific tasks. For data sets that are used infrequently, we probably will have only a simple access program. This is usually enough to get users started quickly on their work, because the learning time is almost nothing.
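The access pattern described here, unpacking the identification of a report first and unpacking the packed data field only when the report is actually wanted, can be sketched as follows. The record layout (a 4-byte station identifier, a 4-byte date, and a block of twenty packed 16-bit values) is invented for the example and does not describe any particular NCAR dataset.

```python
import struct

def read_reports(path, wanted_station):
    """Yield (date, values) for reports from one station, unpacking the data
    field only for reports that are actually wanted."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)              # 4-byte station id + 4-byte date
            if len(header) < 8:
                break
            station, date = struct.unpack(">ii", header)
            packed = f.read(40)             # twenty packed 16-bit values
            if station != wanted_station:
                continue                    # skip unpacking the data field
            values = struct.unpack(">20h", packed)
            yield date, values
```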

The cost of developing major software systems is usually high. After a system is developed, changes are usually necessary. Then the person making the changes must determine where changes should be made, and what the adverse interactions might be. This is often difficult even for the person who designed the system. Therefore, we usually offer selected modules, such as the data access routines, rather than a full system. For certain tasks, such as inventory access and interactive work like McIdas, it is necessary to develop systems, but they should be as simple and modular as possible. Concerning machine efficiency, Jenne and Joseph (1978) describe the hardware constraints that make it better to store data in blocks than to try to directly access each small data element using pointers. In Jenne (1981) a more detailed analysis is made of computing processes and costs.

Figure 1: Upper Air Observations

[Figure not reproduced. The original is a timeline chart, spanning roughly 1958-1980, of upper air data holdings:

UA observed data from GTS (rawinsonde, etc.): US-NMC (Mar 62, Jun 66), Germany (Aug 66), Australia.

UA observed data, not GTS: from MIT, 1958-1963.

Special aircraft data: Sadler (1960-1972), Air Quality (Mar 75), AIDS (Jun 74), ASDAR (Mar 77).

Balloons: S. Hem. TWERLE (winds, H), Jul 75; S. Hem. EOLE (winds), Aug 71 - Dec 72; FGGE Tropical (winds), Jan 79.

Rockets: from 1968 (archive needs work).

Satellite: Cloud Winds, Jun 69 - Oct 74; IR Sounders, Nov 72; TIROS-N (IR and microwave), Jan 79.]

1. Jenne, Roy L., 1981: Global Data Bases for Climate Forecasting Research. Presented at the IAMAP Hamburg meeting, Symposium on Long Range Forecasting. Available from NCAR, Boulder, CO.

2. ----, 1981 draft: The cost and efficiency of data storage, data movements, and calculations in computing systems. NCAR, Boulder, CO, 22 pp.

3. ----, (in preparation): The global data base for climatic research (a report to the WMO-ICSU Joint Scientific Committee for the WCRP).

4. ----, 1980: Strategies to develop and access large sets of scientific data. In Proc. Workshop on Frontiers in Data Storage, Retrieval, and Display, NGSDC, NOAA, Boulder, CO.

5. ----, and D.H. Joseph, 1978: Management of atmospheric data. From NASA Conference Publication 2055: Engineering and Scientific Data Management, 261 pp.

6. ----, 1975: Data sets for meteorological research. NCAR-TN/IA-111, National Center for Atmospheric Research, Boulder, Colorado 80303, 194 pp.

7. NASA, 1979: Candidate NASA data sets applicable to the climate program. Goddard Space Flight Center, Greenbelt, Maryland.

8. Ropelewski, C., M. Predoehl, M. Platto, 1980: The Interim Climate Data Inventory, NOAA-EDIS, Washington D.C.

9. WMO, 1978: Catalogue of Meteorological Data for Research. WMO, Geneva.

10. WMO, 1977: Statistical information on activities in operational hydrology, WMO No. 464, Geneva.

THE TYPICAL DATA LIFE CYCLE AND ASSOCIATED COMPUTING STRATEGIES

Robert L. Lackman National Center for Atmospheric Research Workshop III: Data Access and Display

A TYPICAL DATA LIFE CYCLE

Collection

Modern data collection systems are typically micro or mini-computer driven electronic systems which write the data on magnetic tape or transmit the data to some collection site for real time display and/or permanent storage.

With the recent giant strides forward in this area, the volume of data which has been and can be collected is increasing dramatically, as shown in Figure 1. It is an example of the growth in data volume which occurred in RAF aircraft data when the NCAR Electra turbulence system and its associated mini-computer data collection system were installed.

Not only does the increased computing capability provide increased capacities, it also allows for more front end pre-processing of the data to increase the quality of the data collected. This reduces post-processing requirements and makes the data more suitable for real time analysis and display.

Verification

Data are usually validated/corrected by user scrutiny and subjective replacement of perceived outliers, or they are validated/corrected by programmed algorithms which replace the outliers based upon objective statistical criteria.

Access

In real time systems, data access is typically through such media as direct EM links for line-of-sight transmissions, with relay towers and satellites for long distance links. As satellite bandwidths continue to increase, the capabilities of global real time data nets will do likewise.

Since any real time data network node can have data recording capability, the possibilities for more complete and extensive post-processing analyses of this data are also expanding.

Analysis

Data observations are used in many ways by the scientific researchers and analysts. Several examples of the use of these data are for:

1. Start-up data for computer models,

2. Direct observations of scientific phenomena, - 6 -

3. Direct verification of model results or predictions, and

4. Monitoring of equipment, events, and processes.

Some of these analyses might be highly sensitive to inaccuracies in the data values. One of the most serious and least understood problems with observed data is that of data aliasing (Ref. 2). If the measured data are not electronically or naturally band limited to frequencies below the Nyquist frequency (one over twice the sampling interval), signal power above that frequency is folded (aliased) into the computed power spectrum. Attempts to interpret such a power spectrum can lead to serious "egg on the face".
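A small numerical illustration of this folding, assuming uniformly spaced samples, is given below: a 7 Hz sine wave sampled 10 times per second (Nyquist frequency 5 Hz) produces exactly the same sample values as a 3 Hz sine wave of opposite sign, so its power appears at 3 Hz in the computed spectrum.

```python
import math

dt = 0.1                        # sampling interval in seconds (10 samples/s)
f_nyquist = 1.0 / (2.0 * dt)    # 5 Hz
f_signal = 7.0                  # above the Nyquist frequency

samples = [math.sin(2 * math.pi * f_signal * n * dt) for n in range(10)]
aliased = [math.sin(2 * math.pi * (f_signal - 1.0 / dt) * n * dt) for n in range(10)]

# The two sequences agree to rounding error: the 7 Hz signal is
# indistinguishable, after sampling, from a signal at 7 - 10 = -3 Hz.
print(max(abs(a - b) for a, b in zip(samples, aliased)))
```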

Another often misunderstood problem in data analysis is associated with data transformations. Any smoothing, interpolation, or averaging applied to a dataset is the application of a filter to that data. Any filter has an amplitude response function and a phase response function which should be measured, estimated, or at least understood by the researcher. Typical techniques used in data analyses (perhaps because they are easily visualized) are running and block averages for data smoothing. In reality, a straight averaging window allows a data outlier to have an equal contribution across the full width of the averaging window. This is not the case for Bartlett, Hamming, or Hanning windows, which are also simple to implement.
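The difference can be seen in a short sketch: with equal (boxcar) weights, a single outlier contributes equally across the whole averaging window, while Hanning weights taper its influence toward the window edges. The window length and data below are arbitrary choices for the illustration.

```python
import math

def boxcar_weights(n):
    return [1.0 / n] * n

def hanning_weights(n):
    w = [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]
    total = sum(w)
    return [v / total for v in w]

def smooth(series, weights):
    n, half = len(weights), len(weights) // 2
    out = []
    for i in range(half, len(series) - half):
        window = series[i - half:i - half + n]
        out.append(sum(w * x for w, x in zip(weights, window)))
    return out

data = [0.0] * 10 + [10.0] + [0.0] * 10       # a single outlier
print(smooth(data, boxcar_weights(5)))        # outlier spread evenly over 5 points
print(smooth(data, hanning_weights(5)))       # outlier tapered toward the edges
```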

The data analysts and research scientists must feed back to the data collection engineers the accuracies needed in the data observations. The researchers must then select and verify, for their subsequent analyses, processes which will support their conclusions at a statistically significant level.

Display

Once analysts or researchers have acquired their data and configured analysis tools of sufficient quality, they need a means of presenting the data such that they, and others, can easily visualize and assimilate the information. As the ideas, analyses, and relationships grow in scope and complexity, the means for communicating this information must be improved.

An example of a graphical display which would be useful for studying the growth of a severe storm is a three-dimensional display of the cloud structure which can be moved backward, moved forward, or fixed in time. Also selectable by the researcher would be cross sectional views on adjacent screens, which could be chosen for any plane, varied in the third dimension, and moved backward, moved forward, or fixed in time. Data from other sources such as balloons, dropsondes, or aircraft tracks could be overlaid in alternate colors as a function of time. It is assumed that the cloud formation patterns represent the optimal use of radar, satellite, and other data sources.

Interchange

There is always a need for data to be shared by investigators at different computing sites or for data to be moved from one computing system to another, even by an individual. Some computing systems label their magnetic tapes, some do not. Some swap bytes within words when the data is recorded on magnetic tape, others do not. Various computers can have different bit lengths associated with a data value, different character sets, etc.

The two most frequently used data forms for porting have been: positively scaled integers and character data.

Some significant promise for improvement in this area seems to be possible by changing the emphasis from re-copying data into a more portable format, to shipping the data with portable software which is sophisticated enough to read data of diverse types. (Ref. 3)

Archival

Data derived from observations or analysis which are deemed to be of potential future use are generally entered into an archive data base. The storage device is usually magnetic tape or some mass storage system. Hopefully a catalog system is part of the data base, and items are recorded in the catalog which describe the data and provide the means of access. With a little more luck, protection passwords and access/update statistics are also included.

SOFTWARE NEEDS AND THE ROLE OF THE SCD

Subsequent discussions pertaining to the software needs associated with the data life cycle will exclude the mini- and micro-computer applications associated with the data collection phase. Discussion emphasis will be on the supportive role to be provided by the NCAR Computing Division to its user community.

There are data handling problems which translate into software requirements at each phase of the data life cycle. Some of the more familiar problems to data users include:

A. Acquiring and accessing the needed data,

B. Problems associated with scattered, missing, and invalid data.

C. Transforming of the data into the appropriate coordinate system.

D. Merging of asynchronous data as well as data of diverse dimensionality, formats, and types.

E. Data interchange within the scientific community.

F. Display of scientific data which provides:

a. Ease of use,

b. Compression of the information into comprehensible form,

c. Multiple dimensions, field overlays, and field types, and

d. Time and space progressions of the fields.

There are generally a number of computing options which can be pursued with respect to each of the problem areas. Some of these strategies are discussed below. An attempt is made to avoid judging 'best' approaches. The primary objective of this article is to stimulate discussion with respect to problem areas and the appropriate role of the Scientific Computing Division in addressing these problems.

Interactive vs Batch

The NCAR host machines have typically been batch processors. This has resulted from the view that the large complex models and problems to be run on these machines needed all available cycles, whereas interactive operating systems have been known to erode up to 40% of the machine capacity.

There are, however, many applications, such as interactive FORTRAN debugging, which are many times more efficient than their batch alternatives. To provide both capabilities the SCD has recently acquired an IBM 4341 system for local use and encourages remote users to link their own interactive systems to the central NCAR facility.

Some of the criteria which are useful in determining whether an application is best done in batch or interactive mode are:

* Amount of data to be processed

* Complexity of algorithms

* Importance of integrity and consistency

* Availability of resources

* Which resource is most available: manpower or computer

* Individual preferences

Real Time vs Post Processing

Although the SCD does not currently provide real-time processing capabilities, it has been involved with the scientific divisions in the collection of real-time data for their use. Real-time systems are available within the NCAR Atmospheric Technology Division for their Aircraft, Radar, and Portable Automated Mesonet data acquisition systems. Whether data are to be reviewed in real time or through post-processing is affected by the:

* Amount of real-time resource available.

* Complexity of algorithms

* Need to support field operations vs research analysis -9-

Centralized vs Distributed Processing

This is an area which has been the center of considerable controversy in recent years. In my view the comparative evaluations of these two approaches are too often based upon psychological and emotional considerations rather than technical and efficiency considerations. One of the red hot issues is always access to and response from the system. Good response can be had from either approach. The key is under-utilization! In a costly, highly scrutinized central facility the approach is usually to drain the last cycle from the system. In small locally distributed systems, often the 1st shift has free time and shifts 2 and 3 are virtually unused.

Efficiency

A recent visitor to NCAR mentioned that his company had over 200 distributed mini-computers. If these machines are only running at 30% utilization, they might represent the equivalent of the unused computing capacity of a CRAY. As far as staffing of distributed systems is concerned, the numbers expressed by the owners of such systems seldom include anything but dedicated staff. What about the operational and maintenance tasks performed by all of the users?

Local vs central management

Distributed systems, like local government, can be more responsive to local needs. Local management can more readily set and change project scheduling and priorities. On the other hand, it is easier for the general management of a large project to control that project through regulation of one central facility, rather than several distributed ones.

Porting vs central sharing of tools and data

As was previously mentioned, porting of software and data is a problem area. Not only is there a porting cost, but multiple copies mean costly duplication in storage, management, and maintenance.

A positive aspect of porting is that 'fine tuning' for local needs is some- times possible.

Types of tools and facilities available

A central facility generally provides greater capabilities in hardware, local software, commercial libraries, etc., because the high costs associated with these items, while beyond the means of a local distributed group, are reasonable expenditures when spread across a large user base.

Distributed systems, on the other hand, often integrate specialized hardware tuned to a local need, whereas central facilities tend toward generic needs.

Simple Tools vs. Complex Software Systems

The definition of a simple tool might be a small FORTRAN program that reads a dataset of a specific format from tape or disk into an array in memory. Another example might be an IBM 4341 Exec which allows a user to transfer a named dataset from an archival device to the 4341 disk.
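A present-day analogue of such a simple tool, written in Python rather than FORTRAN and assuming an invented format of one whitespace-separated record of numbers per line, is just a few lines:

```python
def read_dataset(path):
    """Read a dataset of one assumed, specific format -- one whitespace-
    separated record of floating point values per line -- into memory."""
    records = []
    with open(path) as f:
        for line in f:
            if line.strip():
                records.append([float(v) for v in line.split()])
    return records

# Example: records = read_dataset("soundings.dat")
```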

A complex software system attempts to provide the user with all of the capa- bilities which are needed to handle an entire class of problems. Some exam- ples of complex software systems are:

1. The statistical BMDP, SPSS, or SAS packages,

2. Any DBMS such as EsRL FRAMIS,

3. The NCAR GENPRO system now under development.

Availability of manpower

The simple tool approach is one in which the user breaks a major task or research goal into its smallest task elements, orders the subtasks, and proceeds one step at a time. If a tool is available for a subtask he can use it; if not, he must write one or have it written. Each step is basically independent of the last one. This approach is manpower intensive and requires a high programming skill level.

Type of staff available

A complex software system tries to provide most of the capability that the user needs to accomplish some class of major tasks. The system usually has a significant interface which must be learned, but one which is generally not of a programming nature. Thus, scientists, students, and less experienced programming staff can effectively use such a package.

One-shot or long term requirements

The expense of acquiring and maintaining a complex system as well as the time spent learning the system must be traded off against the amount the package would be used. Moreover, there are many types of problems for which a complete software package is not available.

Classical DBMS vs. the GENPRO Approach

Data Access

The classical Data Base Management System (DBMS) has a fixed data format requirement which requires that all raw data first be 'loaded' into the data base. In the GENPRO approach, the user describes the data formats through input data directives and the INPUT module decodes the existing format.

Functionality

Most DBMSs have a fixed set of built-in functions. GENPRO has an available set of functional modules, plus a module shell and recipe for assembling new functional modules.

Data Quantities

Business data bases and most of the current scientific data bases are structured for and loaded with a limited number of statistical events. Meteorological data, on the other hand, typically are measured over long time periods and often include very large amounts of data. Thus typical DBMSs operate on memory-contained datasets. GENPRO operates over any defined time interval, windowing the semi-infinite series through memory.
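The windowing idea can be sketched independently of GENPRO itself (whose actual directives and modules are not shown here): an arbitrarily long series is streamed through memory a fixed-length window at a time, and each window is handed to a processing function.

```python
def windows(samples, window_len):
    """Yield successive non-overlapping windows from an iterable of samples,
    so that only one window need be held in memory at a time."""
    window = []
    for sample in samples:
        window.append(sample)
        if len(window) == window_len:
            yield window
            window = []
    if window:                              # final partial window
        yield window

def process(samples, window_len=3600):
    """Stand-in for a functional module: the mean of each window."""
    for window in windows(samples, window_len):
        yield sum(window) / len(window)

# Example: averages = list(process(range(10000), window_len=1000))
```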

Workshop III of the second annual SCD User's Conference gives us an opportunity to detect and prioritize major problem areas in the scientific data life cycle.

We in the SCD hope that your workshop participation will help us to focus on the areas you feel are the most critical, and where the concentration of our efforts might prove to be the most productive and helpful to you, our user community.

This article lists some computing strategies which might be taken to attack a list of problem areas associated with the life cycle of scientific data. None of these lists is believed to be all-inclusive. Hopefully, however, they will provide a starting point for discussion and debate.

1. Kelley, N. D., and Lackman, R. L., 1976. A Study of the RAF Data Processing Project. NCAR Report, Boulder, Co. 80307

2. Blackman, R. B., and Tukey, J. W., 1958. The Measurement of Power Spectra. Dover Publications, New York, N.Y., pp. 117-120.

3. Lackman, R. L., 1981. An Overview of the GENPRO Scientific Data Processor. NCAR SCD Publication, Boulder, Co. 80307 (Copies of this article will be available at the Conference or upon request) - 12 -

Figure 1: RAF Annual Output Data Volume from 1971-1975.

[Figure not reproduced. The bar chart plots annual output data volume (vertical axis scaled from 0 to 1400) against fiscal years 1971 through 1975.]

DATA ANALYSIS AND THE NCAR SCD

Gary Rasmussen National Center for Atmospheric Research Workshop III: Data Access and Display

DATA ANALYSIS AND THE NCAR SCD

Data Analysis = Exploration + Confirmation + . . .

Before proceeding, we should define what we mean by "data analysis". We are concerned with "data" which arises in the course of research in the atmospheric sciences. Much of this data resides in the archives managed by the SCD Data Support Section, but by no means all of it. By "analysis" we mean, in the broadest sense, statistical analysis, including both exploratory analysis and confirmatory analysis. We are concerned, moreover, with applications that require (or can profitably utilize) statistical techniques. An example of such an application is the objective analysis of meteorological fields. This example typifies a central concern: use of both statistical ideas and scientific ideas in the analysis of scientific data.

Data Analysis = Data + . . .

Data archives are essential for the advancement of scientific research. At NCAR we are fortunate to have extensive archives of atmospheric and oceanographic data.

Data Analysis = Work + Hardware + Software + . . .

We buy, rent, develop, and borrow hardware and software tools to reduce our workload and increase our productivity. We must share these tools because they are expensive to buy, rent, and develop.

SCD's Role

SCD makes hardware and software tools available to NCAR and UCAR scientists. In the hardware and general mathematical software arenas we are doing pretty well. In the atmospheric science data analysis software arena we could be doing better.

CURRENT NCAR/SCD ACTIVITIES IN SUPPORT OF DATA ANALYSIS

The Multi-User Software Group (MUSG) is actively engaged in developing and testing data analysis software for atmospheric science applications.

Time Series Data Processing - GENPRO

A comprehensive and highly modular data processing package for time series applications is currently being tested by a limited number of friendly users. An important class of applications handled well by GENPRO is the processing of research aircraft data. Further details are available from Bob Lackman of the MUSG. - 4 -

Software for Mesoscale Data Analysis

NCAR/SCD is collaborating with NOAA/PROFS and Drexel University to develop portable multi-user software for the statistical objective analysis of mesoscale surface and boundary-layer features. This code will be made available to the community in the near future. Further details are available from Gary Rasmussen of the MUSG.

WHAT MORE NCAR/SCD COULD DO IN SUPPORT OF DATA ANALYSIS

Centralization of some aspects of data analysis support can improve productivity and decrease overall costs.

Data Analysis Software Exchange

A modest effort is underway in the SCD Library Group (XLIB). In addition, the Data Support Section maintains a few simple data access and utility routines. We could focus these efforts by starting an Atmospheric Science Data Analysis Software Exchange, by actively soliciting contributions and by issuing user guides and catalogs. The goal of this exchange would be to provide a central depository in the atmospheric science community for data analysis software. Contributors of software would be expected to provide at least a minimum level of documentation in a standardized format. The degree of SCD maintenance on these codes would depend on their level of use and how easy they are to maintain.

Data Analysis Software Enhancement

A great deal of very worthwhile software is written with a single user in mind. The average scientist is not a specialist on multi-user software and typically does not write portable, maintainable, robust multi-user code. Often, however, he does develop software which would tremendously benefit the community, but only if it were enhanced. We could provide the necessary skills and expertise and some (but not all) of the manpower for worthy codes.

Data Analysis Software Specialist Support

Every scientist must, to some extent, be a data analyst and a code developer. However, very few can be data analysis software specialists. It has long been recognized (e.g., Tukey, 1962) that in every science there have to be some scientists who devote much of their efforts to analyzing data and interpreting the results of statistical analysis in light of scientific knowledge. We might add that in every science there also have to be experts in developing multi-user data analysis software. Typically such individuals are actively engaged in consulting activities as well as in data analysis and code development. We could provide the data analysis expertise for both consulting and multi-user code development.

WHAT NCAR/SCD AND ITS USERS CAN DO

Cooperative efforts can benefit all of us. - 15 -

Communication is Essential

We need your advice and many of you could profitably use ours. How can we communicate more effectively? This User's Conference is a good, albeit infrequent, forum for exchanging ideas. What other channels of communication should we develop?

Focusing Our Attention

What do you think is needed in data analysis software? - in data analysis consulting? What are your priorities? What do you think ours should be?

Sharing the Fruits of Your Labors - and Some of the Labor

We believe that you have or know of code that others can profitably use. Will you help us identify worthy code? Will you help us enhance it by consulting with us and perhaps providing some of the labor?

1. Tukey, J. W., 1962: The future of data analysis. Annals of Mathematical Statistics, 33, 1-67.

The following recommendations are made in descending order of priority:

Exploration with existing and coming observational tools, such as satellites and radar, implies enormous volumes of data. A large data processing engine is essential. Many data processing functions can and should be distributed, but a major resident number-crunching and data-handling capability should be co-located with the data archives.

Interchange of Datasets

Successful interchange must be grounded in good data management and planning prior to the field phase. Given this, the role of SCD should be to make software utilities available at an appropriate level to facilitate the exchange of datasets. SCD needs to make available to the community, and encourage by example, formatting conventions and basic utility software to make data exchange easy. This could be accomplished with a set of translation routines in an easily-read format.

Archival of Data

The NCAR SCD Data Support Section has already demonstrated very well the type of leadership which is necessary to the community and appropriate to NCAR. As datasets and analyses of suitable availability and high national interest become available in new areas, the archival activities should be extended to include these.

Archival Inversion

Frequently, the atmospheric simulation models run at SCD create large history files of data that contain variables in three-dimensional space and time. These files are often easy to access along a one- or two-dimensional axis but very difficult and expensive to access along the other axis. For example, time studies of a variable often demand handling the whole file in order to get a small subset of the data. Software for the creation of, and later access to, these files needs to be developed in order to provide easier and more efficient access to selected portions of the files. SCD needs to study the methods and the feasibility of supporting appropriate software.

Image Processing

Systems like McIdas have proven to be powerful tools in the analysis of satellite and field program data. While image processing systems are very project-specific, SCD should study the common properties of these systems and study the possibility of providing a basic system to groups interested in developing image-processing tools.

Graphics Needs

The dd80 needs to be replaced in the very near future. The ability to optimize the following items separately is needed: quick turnaround, low cost, and high quality.

Software Exchange

SCD should study the feasibility of working with a journal publisher for distributing application software. SCD should advocate contribution and certification of such software.