viewpoints

JULY 2016 | VOL. 59 | NO. 7 | COMMUNICATIONS OF THE ACM

DOI:10.1145/2935880
Thomas Haigh

Historical Reflections
How Charles Bachman Invented the DBMS, a Foundation of Our Digital World
His 1963 Integrated Data Store set the template for all subsequent database management systems.

Fifty-three years ago a small team working to automate the business processes of the General Electric Company built the first database management system. The Integrated Data Store (IDS) was designed by Charles W. Bachman, who won the ACM’s 1973 A.M. Turing Award for the accomplishment. Before General Electric, he had spent 10 years working in engineering, finance, production, and data processing for the Dow Chemical Company. He was the first ACM A.M. Turing Award winner without a Ph.D., the first with a background in engineering rather than science, and the first to spend his entire career in industry rather than academia.

Some stories, such as the work of Babbage and Lovelace, the creation of the first electronic computers, and the emergence of the personal computer industry, have been told to the public again and again. They appear in popular books, such as Walter Isaacson’s recent The Innovators: How a Group of Hackers, Geniuses and Geeks Created the Digital Revolution, and in museum exhibits on computing and innovation. In contrast, perhaps because database management systems are rarely experienced directly by the public, database history has been largely neglected. For example, the index of Isaacson’s book does not include entries for “database” or for any of the four people to have won Turing Awards in this area: Charles W. Bachman (1973), Edgar F. Codd (1981), James Gray (1998), or Michael Stonebraker (2014).

That’s a shame, because if any technology was essential to the rebuilding of our daily lives around digital infrastructures, which I assume is what Isaacson means by “the Digital Revolution,” then it was the database management system. Databases undergird the modern world of online information systems and corporate intranet applications. Few skills are more essential for application developers than a familiarity with SQL, the standard database query language, and a database course is required for most computer science and information systems degree programs. Within ACM, SIGMOD, the Special Interest Group for Management of Data, has a long and active history fostering database research. Many IT professionals center their entire careers on database technology: the Census Bureau estimates the U.S. alone employed 120,000 database administrators in 2014 and predicts faster than average growth for this role.

Figure 1. This image, from a 1962 internal General Electric document, conveyed the idea of random access storage using a set of “pigeon holes” in which data could be placed. Courtesy of Charles W. Bachman and the Charles Babbage Institute.

Bachman’s IDS was years ahead of its time, implementing capabilities that had until then been talked about but never accomplished. Detailed functional specifications for the system were complete by January 1962, and Bachman was presenting details of the planned system to his team’s in-house customers by May of that year. It is less clear from archival materials when the system first ran, but Bachman tells me that a prototype installation of IDS was tested with real data in the summer of 1963, running twice as fast as a custom-built manufacturing control system performing the same tasks.

The details of IDS, Bachman’s life story, and the context in which it arose have been explored elsewhere.[2,6] In this column, I focus on two specific questions:
- Why do we view IDS as the first database management system, and
- What were its similarities and differences versus later systems?

There will always be an element of subjectivity in judgments about “firsts,” particularly as IDS predated the concept of a database management system. As a fusty historian I value nuance and am skeptical of the idea that any important innovation can be fully understood by focusing on a single breakthrough moment. I have documented many ways in which IDS built on earlier file management and report generation systems.[7] However, if any system deserves the title of “first database management system” then it is clearly IDS. It became a model for the earliest definitions of “data base management system” and included most of the core capabilities later associated with the concept.

What Was IDS For?

Bachman created IDS as a practical tool, not an academic research project. In 1963 there was no database research community. Computer science was just beginning to emerge as an academic field, but its early stars focused on programming language design, theory of computation, numerical analysis, and operating system design. In contrast to this academic neglect, the efficient and flexible handling of large collections of structured data was the central challenge for what we would now call corporate information systems departments, and was then called business data processing.

During the early 1960s the hype and reality of business computing diverged dramatically. Consultants, visionaries, business school professors, and computer salespeople had all agreed that the best way to achieve real economic payback from computerization was to establish a “totally integrated management information system.”[8] This would integrate and automate all the core operations of a business, ideally with advanced management reporting and simulation capabilities built right in. The latest and most expensive computers of the era had new capabilities that seemed to open the door to a more aggressive approach. Compared to the machines of the 1950s they had relatively large memories. They featured disk storage as well as tape drives, could process data more rapidly, and some were even used to drive interactive terminals.

The reality of data processing changed much more slowly than the hype, and remained focused on simple administrative applications that batch processed large files to accomplish tasks such as weekly payroll processing, customer statement generation, or accounts payable reporting.

Many companies announced their intention to build totally integrated management information systems, but few ever claimed significant success. A modern reader would not be shocked to learn that firms were unable to create systems of comparable scope to today’s Enterprise Resources Planning and data warehouse projects using computers with perhaps the equivalent of 64KB of memory, no real operating system, and a few megabytes of disk storage. Still, even partially integrated systems covering significant portions of a business would have real value. The biggest roadblocks to even modest progress toward this goal were the sharing of data between applications and the difficulties application programmers faced in exploiting random access disk storage.

Getting a complex job done might involve dozens of small programs and the generation of many working tapes full of intermediate data. These banks of whirring tape drives provided computer centers with their main source of visual interest in the movies of the era. Tape-based processing techniques evolved directly from those used with pre-computer mechanical punched card machines: files, records, fields, keys, grouping, merging data from two files, and the hierarchical combination of master and detail records within a single file. These applied to
magnetic tape much as they had done to punched cards, except that tape storage made sorting much harder. The formats of tape files were usually fixed by the code of the application programs working with the data. Every time a field was added or changed all the programs working with the file would need to be rewritten. If applications were integrated, for example, by treating order records from the sales accounting system as input for the production scheduling application, the resulting web of dependencies made it increasingly difficult to make even minor changes when business needs shifted.

The other key challenge was making effective use of random access storage in business application programs. Sequential tape storage was conceptually simple, and the tape drives themselves provided some intelligence to aid programmers in reading or writing records. Applications were batch-oriented because searching a tape to find or update a particular record was too slow to be practical. Instead, master files were periodically updated with accumulated data or read through to produce reports. With the arrival, in the early 1960s, of disk storage a computer could theoretically apply updates one at a time as new data came in and generate reports as needed based on current data. Indeed this was the target application of IBM’s RAMAC computer, the first to be equipped with a hard disk drive. A programmer working with a disk-based system could easily instruct the disk drive to pull data from any particular platter or track, but the hard part was figuring out where on the disk the desired record could be found. The phrase “data base” was associated with random access storage but was not particularly well established, so Bachman’s alternative choice of “data store” would not have seemed any more or less familiar at the time.

Without significant disk file management support from the rudimentary operating systems of the era only elite programmers could hope to create an efficient random access application. Mainstream application programmers were beginning to shift from assembly language to high-level languages such as COBOL, which included high-level support for structuring data in tape files but lacked comparable support for random access storage. Harnessing the power of disks meant finding ways to sequence, insert, delete, or search for records that did not simply replicate the sequential techniques used with tape. Solutions such as hashing, linked lists, chains, indexing, inverted files, and so on were quickly devised but these were relatively complex to implement and demanded expert judgment to select the best method for a particular task (see Figure 1).

IDS was intended to substantially solve these two problems, so that applications could be integrated to share data files and ordinary programmers could effectively develop random access applications using high-level languages. Bachman designed IDS to meet the needs of an integrated systems project called MIACS, for Manufacturing Information and Control System. General Electric had many factories spread over its various divisions, and could not produce and support a different integrated manufacturing system for each one. Furthermore, it was entering the computer business, and its managers recognized that a flexible and generic integrated system based on disk storage would be a powerful tool in selling its machines to other companies. A prototype version of MIACS was being built and tested on the firm’s Low Voltage Switch Gear department by a group of systems-minded staff specialists.

Was IDS a Database Management System?

By interposing itself between application programs and the disk files in which they stored data, IDS carried out what we still consider the core task of a database management system. Programs could not manipulate data files directly, instead making calls to IDS so that it would perform the data operations on their behalf.

Like modern database management systems, IDS explicitly stored and manipulated metadata about the records and their relationships, rather than expecting each application program to understand and respect the format of every data file it worked with. It enforced relationships between different record types, and would protect database integrity. Database designers specified record clusters, linked list sequencing, indexes, and other details of record organization to boost performance based on expected usage patterns. However, the first versions did not include a formal data description language. Instead of being defined through textual commands the metadata was punched onto specially formatted input cards. A special command told IDS to read and apply this information. New elements could be added without deleting existing records. Each data manipulation command contained a reference to the appropriate element in the metadata.

IDS was designed to be used with a high-level programming language. In the initial prototype version, operational in early 1963, this was General Electric’s own GECOM language, though performance and memory concerns drove a shift to assembly language for the application programming in a higher performance version completed in 1964. Calls to IDS operations such as store, retrieve, modify, and delete were evaluated at runtime against embedded metadata. As high-level languages matured and memory grew less scarce, later versions of IDS worked with application programs written in COBOL.

This provided a measure of what is now called data independence for programs. If a file was restructured to add fields or modify their length then the programs using it would continue to work properly. Files could be moved around and records reorganized without rewriting application programs. That made running different application programs against the same database much more feasible. IDS also included its own system of paging data in and out of memory, to create a virtual memory capability transparent to the application programmer.

The concept of transactions is fundamental to modern database management systems. Programmers specify that a series of interconnected updates must take place together, so that if one fails or is undone they all are. IDS was also transaction oriented, though not in exactly the same sense. Bachman devised an innovative transaction processing system, which he called the Problem Controller. The Problem Controller and IDS were loaded when the computer was booted. Together they occupied 4,000 words of memory and took control of the entire computer, which might have only 8,000 words of memory. The residual area in memory was used for paging buffers by IDS’s virtual memory manager.

Requests from users to process particular transactions were read from “problem control records” stored and retrieved by IDS in the same manner as application data records. Transactions could be simple, or contain a batch of data cards to be processed. The Problem Controller processed one transaction at a time by executing the designated application program. It worked its way through the queue of transaction requests, choosing the highest priority outstanding job and refreshing the queue from the card reader after each transaction was finished.

The Problem Controller did not appear in later versions of IDS but did provide a basis for an early online transaction processing system. By 1965 an expanded version of the Problem Controller was built and installed at Weyerhaeuser, on a computer hooked up to a national Teletype network. The system serviced remote users at their Teletypes without any intervention needed by local operators. Requests to process order entry, inventory management, invoicing, and other business transactions were processed automatically by the Problem Controller and application programs.

Figure 2. This drawing, from the 1962 presentation “IDS: The Information Processing Machine We Need,” shows the use of chains to connect records. The programmer looped through GET NEXT commands to navigate between related records until an end-of-set condition was detected. Courtesy of Charles W. Bachman and the Charles Babbage Institute.

Bachman’s original version of IDS lacked a backup and recovery system, a key feature of later database management systems. This was added in 1964 by the International General Electric team that produced and operated the first production installation of IDS. A recovery and restart magnetic tape logged each new transaction as it was started and captured database pages “before” and “after” they were modified by the transaction, so that the database could be restored to a prior consistent state if something went wrong before the transaction was completed. The same tape also served as a backup of all changes written to the disk in case there was a disk failure since the last full database backup.

The first packaged versions of IDS did lack some features later viewed as essential for database management systems. One was the idea that specific users could be granted or denied access to particular parts of the database. This omission was related to another limitation: IDS databases could be queried or modified only by writing and executing programs in which IDS calls were included. There was no capability to specify “ad hoc” reports or run one-off queries without having to write a program.[a] These capabilities did exist during the 1960s in report generator systems (such as 9PAC and MARK IV) and in online interactive data management systems (such as TDMS) but these packages were generally seen as a separate class of software from database management systems. By the 1970s report generation packages, still widely used, included optional modules to interface with data stored in database management systems.

[a] On reading this observation, Bachman noted “IDS came into use long before the notion of online, interactive users came into vogue. There is no record of anyone writing an IDS transaction processing application program that processed transactions that specified a query or report and returned the desired output. However, the capability of IDS and the Problem Controller to handle such a query or report specifying transaction programs was clearly available. A missed opportunity!”

IDS and CODASYL

After Bachman handed IDS over to a different team within General Electric in 1964 it was made available as a documented and supported software package for the company’s 200-series computers. In those days software packages from computer manufacturers were paid for by hardware sales and given to customers without an additional charge. Later versions supported its 400- and 600-series systems. New versions followed in the 1970s after Honeywell bought out General Electric’s computer business. IDS was a strong product, in many respects more advanced than IBM’s competing IMS that appeared several years later. However, IBM machines so dominated the industry that software from other manufacturers was doomed to relative obscurity.

During the late 1960s the ideas Bachman created for IDS were taken up by the Database Task Group of CODASYL, a standards body for the data processing industry best known for its creation and promotion of the COBOL language. Its initial report, issued in 1969, drew heavily on IDS in defining a proposed standard for database management systems, in part thanks to Bachman’s own service on the committee.[4] The report documented foundational concepts and vocabulary such as data definition language, data manipulation language, schemas, data independence, and program independence. It went beyond early versions of IDS by adding security features, including “privacy locks” and “sub-schemas,” roughly equivalent to views in modern systems, so that particular programs could be constrained to work with defined subsets of the database.

CODASYL’s definition of the architecture of a database management system and its core capabilities was quite close to that included in textbooks to this day. In particular, it suggested that a database management system should support online, interactive applications as well as batch-driven applications and have separate interfaces. In retrospect, the committee’s work, and a related effort by CODASYL’s Systems Committee to evaluate existing systems within the new framework,[5] were significant primarily for formulating and spreading the concept of a “data base management system.”

Although IBM itself refused to support the CODASYL approach many other computer vendors endorsed the committee’s recommendations and eventually produced systems incorporating these features. The most successful CODASYL system, IDMS, came from an independent software company. It began as a port of IDS to IBM’s dominant System/360 mainframe platform.[b]

[b] The importance of the database management system to the emerging packaged software industry is a major theme in M. Campbell-Kelly, From Airline Reservations to Sonic the Hedgehog: A History of the Software Industry. MIT Press, Cambridge, MA, 2003 and is explored in detail in T.J. Bergin and T. Haigh, “The Commercialization of Database Management Systems, 1969–1983.” IEEE Annals of the History of Computing 31, 4 (Oct.–Dec. 2009), 26–41.

The Legacy of IDS

IDS and CODASYL systems did not use the relational data model, formulated years later by Ted Codd, which underlies today’s dominant SQL database management systems. Instead, IDS introduced what would later be called the “network data model.” This encoded relationships between different kinds of records as a graph, rather than the strict hierarchy enforced by tape systems and some other software packages of the 1960s such as IBM’s later and widely used IMS. The network data model was widely used during the
1970s and 1980s, and commercial database management systems based on this approach were among the most successful products of the mushrooming packaged software industry.

Bachman spoke memorably in his 1973 Turing Award lecture of the “Programmer as Navigator,” charting a path through the database from one record to another.[3] The network approach used in IDS required programmers to work with one record at a time. Performing the same operation on multiple records meant retrieving a record, processing and if necessary updating it, and then moving on to the next record of interest to repeat the process. For some tasks this made programs longer and more cumbersome than the equivalent in a relational system, where a task such as deleting all records more than a year old or adding 10% to the sales price of every item could be performed with a single command.

IDS and other network systems encoded what we now think of as the “joins” between different kinds of records as part of the database structure rather than specifying them in each query and rebuilding them when the query is processed (see Figure 2). Bachman introduced a data structure diagramming notation, often called the “Bachman diagram,” to describe these relationships.[c] Hardcoding the relationships between record sets made IDS much less flexible than later relational systems, but also much simpler to implement and more efficient for routine operations.

[c] C.W. Bachman, “Data Structure Diagrams,” Data Base 1, 2 (Summer 1969), 4–10 was very influential in spreading the idea of data structure diagrams, but internal GE documents make clear he was using a similar technique as early as 1962.

IDS was a useful and practical tool for business use from the mid-1960s, while relational systems were not commercially significant until the early 1980s. Relational systems did not become feasible until computers were orders of magnitude more powerful than they had been in 1963 and some extremely challenging implementation issues had been overcome by pioneers such as IBM’s System R group and Berkeley’s INGRES team. Even after relational systems were commercialized the two approaches were seen for some time as complementary, with network systems used for high-performance transaction-processing systems handling routine operations on large numbers of records (for example, credit card transaction processing or customer billing) and relational systems best suited for “decision support” analytical data crunching. IDMS, the successor to IDS, underpins some very large mainframe applications and is still being supported and enhanced by its current owner Computer Associates, most recently with release 18.5 in 2014. However, it and other database management systems based on Bachman’s network data model have long since been superseded for new applications and for mainstream computing needs.

Although by any standard a successful innovator, Bachman does not fit neatly into the “hackers, geniuses, and geeks” framework favored by Walter Isaacson. During his long career Bachman also founded a public company, played a leading role in formulating the OSI seven-layer model for data communications, and pioneered online transaction processing. In 2014, he visited the White House to receive from President Obama a National Medal of Technology and Innovation in recognition of his “fundamental inventions in database management, transaction processing, and software engineering.”[d] Bachman sees himself above all as an engineer, retaining a professional engineer’s zest for the elegant solution of difficult problems and faith in the power of careful and rational analysis. As he wrote in a note at the end of the transcript of an oral history interview I conducted with him in 2004, “My work has been my play.”[1]

[d] The 2012 medals were presented at the White House in November 2014.

When database specialists look at IDS today they immediately see its limitations compared to modern systems. Its strengths are more difficult to recognize, because its huge influence on the nascent software industry meant that much of what was revolutionary about it in 1963 was soon taken for granted. Without IDS, or Bachman’s tireless championing of the ideas it contained, the very concept of a “database management system” might never have taken root. IDS did more than any other single piece of software to broaden the range of business problems to which computers could usefully be applied and so to usher in today’s “digital” world where every administrative transaction is realized through a flurry of database queries and updates rather than by completing, routing, and filing in triplicate a set of paper forms.

References
1. Bachman, C.W. Oral history interview by Thomas Haigh, September 25–26, 2004, Tucson, AZ. ACM Oral History Interviews collection. ACM Digital Library, http://dl.acm.org/citation.cfm?id=1141882.
2. Bachman, C.W. The origin of the integrated data store (IDS): The first direct-access DBMS. IEEE Annals of the History of Computing 31, 4 (Oct.–Dec. 2009), 42–54.
3. Bachman, C.W. The programmer as navigator. Commun. ACM 16, 11 (Nov. 1973), 653–658.
4. CODASYL Data Base Task Group. CODASYL Data Base Task Group: October 1969 Report.
5. CODASYL Systems Committee. Survey of Generalized Data Base Management Systems, May 1969. Association for Computing Machinery, New York, 1969.
6. Haigh, T. Charles W. Bachman: Database software pioneer. IEEE Annals of the History of Computing 33, 4 (Oct.–Dec. 2011), 70–80.
7. Haigh, T. How data got its base: Generalized information storage software in the 1950s and 60s. IEEE Annals of the History of Computing 31, 4 (Oct.–Dec. 2009), 6–25.
8. Haigh, T. Inventing information systems: The systems men and the computer, 1950–1968. Business History Review 75, 1 (Spring 2001), 15–61.

Thomas Haigh is a visiting professor at Siegen University, an associate professor of information studies at the University of Wisconsin-Milwaukee, and immediate past chair of the SIGCIS group for historians of computing.

Copyright held by author.
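
The contrast drawn in “The Legacy of IDS” between record-at-a-time navigation and a single set-oriented relational command can be sketched in modern code. What follows is only an illustration under stated assumptions, not IDS or CODASYL syntax: the Record class, the build_chain and get_next helpers, and the item table are hypothetical names invented for this sketch. The first half walks a chain of linked records in the spirit of the GET NEXT loop described in the Figure 2 caption; the second half uses Python’s standard sqlite3 module to apply the same 10% price increase with one SQL statement.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical item record; next_in_chain links to the next member of its set."""
    name: str
    price_cents: int
    next_in_chain: Optional["Record"] = None


def build_chain(items):
    """Link (name, price) pairs into a chain and return the first record."""
    head = None
    for name, price_cents in reversed(items):
        head = Record(name, price_cents, head)
    return head


def get_next(current):
    """Stand-in for a GET NEXT call: the next record, or None at end-of-set."""
    return current.next_in_chain


def raise_prices_navigationally(head, percent):
    """Visit one record at a time, updating each, until end-of-set is reached."""
    current = head
    while current is not None:
        current.price_cents += current.price_cents * percent // 100
        current = get_next(current)


def raise_prices_relationally(items, percent):
    """Apply the same change with a single set-oriented SQL UPDATE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (name TEXT PRIMARY KEY, price_cents INTEGER)")
    conn.executemany("INSERT INTO item VALUES (?, ?)", items)
    conn.execute(
        "UPDATE item SET price_cents = price_cents + price_cents * ? / 100",
        (percent,),
    )
    rows = conn.execute("SELECT name, price_cents FROM item ORDER BY name").fetchall()
    conn.close()
    return rows


ITEMS = [("bolt", 200), ("gear", 1500), ("switch", 4000)]

# Navigational style: the program itself steers from record to record.
chain = build_chain(ITEMS)
raise_prices_navigationally(chain, 10)

navigational_result = []
record = chain
while record is not None:
    navigational_result.append((record.name, record.price_cents))
    record = get_next(record)

# Relational style: one command, and the system decides how to reach the rows.
relational_result = raise_prices_relationally(ITEMS, 10)
```

Both halves end with the same prices. The difference is where the loop lives: in the navigational version the application program owns it and depends on how the records are chained, while in the relational version the loop is hidden inside the database system, which is the flexibility the column attributes to later relational products.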
