Method for Coding Data on Microbial Strains for Computers (Edition AB)
Morrison Rogosa, Micah I. Krichevsky, and Rita R. Colwell
National Institute of Dental Research, National Institutes of Health, Bethesda, Maryland 20014, and Department of Biology, Georgetown University, Washington, D.C. 20007

TABLE OF CONTENTS: SECTION 1 through SECTION 34

INTRODUCTION

Experience with computers in microbial identification and classification is now sufficient to warrant planning for the establishment of an international data bank for cultures and culture collections. This would be an invaluable resource to be used in many profitable ways. In fact, the present population of computers, with the available machine operating and storage capacities and software (i.e., programs and programming versatilities), makes the concept of an international microbiological data bank feasible. The maximum data grid for present information can easily be handled by the models of computers available to major university and research centers in many countries of the world. A microbiological data bank, as a centralized repository of information of all types, should include historical, nomenclatural, classificatory, and epidemiological information for groups and clones of such microorganisms as bacteria, rickettsiae, chlamydiae, and viruses.
The great potential value of a data bank outweighs the onerous task of transferring currently available data to an international computer center. The very magnitude of the unflagging increase in information produced daily in the outflow from thousands of laboratories in the United States and other countries of the world requires some concerted, unified effort to catalogue and file these new data for future reference. In tandem, of course, runs the need for constant updating of major microbiological reference works such as Bergey's Manual of Determinative Bacteriology (1957) and Index Bergeyana (1966). In spite of these needs, no solutions to the information explosion problem have been proposed on an international basis. In fact, Bergey's Manual is at present being updated by international committees, whose efforts, by present methods, can provide a volume revision and updating only about every ten or fifteen years because of the sheer volumes of information to be sorted, analyzed, and collated. As one example of the value of computer assistance, references to the literature on taxonomy of microorganisms can be stored on a computer at an international center for rapid call-up and consultation by any individual in any other country at any time. Identification, classification, and nomenclature of microorganisms require the same diligence in literature search as other scientific activities, and the lack of immediate access to pertinent literature on microorganisms, such as is available, for example, on chemical compounds, puts a major obstacle in the way of those microbiologists seeking to identify and classify microorganisms, whether for purposes of pure taxonomy or for such applications as medical diagnosis. A computer library with ready call-up to all or specific components of appropriately stored data would meet these needs.
6A ROGOSA, KRICHEVSKY, AND COLWELL (ed. AB) I JSBA INTRODUCTION
Information is never lost, since all data put into computer storage are retained and can be called up by appropriate query to the computer at any time. The classes of information acceptable for computer storage include clonal data, with strain number specified where applicable, and morphology, physiology, biochemistry, genetics, serology, ecology, nomenclature, pathogenicity, and all other data anticipated to be obtained by clinical and research investigators at the present time or in future. The Index Bergeyana (1966), for example, provides a format suitable for transfer to the computer by present keypunching operations without major modifications or rearrangements. That is, synonyms, as listed in the Index Bergeyana (1966), keypunched for computer entry, provide a cross-indexed list of taxonomic epithets. Similarly, laboratory data record books may, in many cases, be amenable to immediate entry to the computer. Selection of data for a variety of purposes is possible with present programming versatilities. The kinds of questions which can be put to a data center are virtually unlimited. That is, requests for strain nomenclatural information, location of specific microbial strains in world repositories, or enumeration of specific strains with given biochemical properties, or a combined request for any or all of these, may be answered within a matter of seconds if done by a computer system. Controls centrally located, or computer-linked terminals at points remote from the computer installation, would provide instant access to computer memory storage. For example, an individual investigator relays diagnostic data to the central computer unit from a remote terminal and requests identification of the organism in question. By any one of several internally programmed methods, the organism is identified.
If insufficient data were provided, the list of additional tests necessary for positive identification would be relayed back to the investigator at the remote control terminal station. Thus, an investigator-computer dialogue ensues, constituting, in essence, a feed-back control system for diagnostic purposes. With these points in mind, it is timely to pay serious attention to the data gathering and recording procedures for efficient computer applications. From our combined experience in computer analyses of microbiological data, we are able to avoid the common pitfalls and errors in programming and data handling. In addition, a mechanism for transfer of data to a computer system has been derived which involves a minimum amount of time and labor. Finally, we have compiled a comprehensive set of criteria currently used in microbiology for describing, identifying, and classifying microorganisms, which comprises the bulk of a microbial data storage system. The major sources of this information were Bergey's Manual (1957) and Skerman's Guide to the Identification of Genera of Bacteria (1968). The list of characteristics provided by Skerman (1962) formed the backbone of the documentation. Pertinent monographs, reviews, and individual papers were also consulted.
vol. 21, 1971 METHOD FOR CODING MICROBIAL DATA (ed. AB) 7A INTRODUCTION
Appropriate data format planning for a microbiological data center is under way, taking general needs into account. Allowance is made for information as yet uncollected or to be gained in the future as research work in microbiology progresses. Suitable programs for handling literature references are already available in some laboratories (Simpson, 1968). The documentation for computer storage, retrieval, and analysis of data descriptive of microbial taxa has been prepared in the form of a questionnaire. Questions have been generally designed to be test- and taxon-independent.
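The investigator-computer dialogue described in this introduction can be illustrated with a minimal sketch. It is not the authors' program: the reference table, the taxon names, the test names, and the functions `candidates`, `additional_tests`, and `identify` are all hypothetical, and the matching rule (exact agreement on every observed binary result) is an assumption chosen for simplicity. Given a partial set of results, the sketch either names the single matching taxon or returns the untested characters that would separate the remaining candidates.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical reference table: taxon -> binary test results (True = "YES").
# Taxon and test names are invented for illustration only.
REFERENCE: Dict[str, Dict[str, bool]] = {
    "Taxon A": {"gram_positive": True, "motile": False, "catalase": True},
    "Taxon B": {"gram_positive": True, "motile": True, "catalase": True},
    "Taxon C": {"gram_positive": False, "motile": True, "catalase": False},
}


def candidates(observed: Dict[str, bool]) -> List[str]:
    """Return taxa whose stored results agree with every observed result."""
    return [
        taxon
        for taxon, profile in REFERENCE.items()
        if all(profile.get(test) == value for test, value in observed.items())
    ]


def additional_tests(observed: Dict[str, bool], remaining: List[str]) -> List[str]:
    """List untested characters on which the remaining candidates differ."""
    untested = sorted(
        {test for taxon in remaining for test in REFERENCE[taxon]} - set(observed)
    )
    return [
        test
        for test in untested
        if len({REFERENCE[taxon].get(test) for taxon in remaining}) > 1
    ]


def identify(observed: Dict[str, bool]) -> Tuple[Optional[str], List[str]]:
    """Return (taxon, []) when exactly one taxon matches; otherwise return
    (None, tests) where tests would discriminate among the candidates."""
    remaining = candidates(observed)
    if len(remaining) == 1:
        return remaining[0], []
    return None, additional_tests(observed, remaining)
```

Calling `identify({"gram_positive": True})` leaves two candidates in this toy table and returns the one further test that distinguishes them; supplying that result in a second call completes the identification, which is the feed-back loop the text describes.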
There are difficulties in comparing results of some methods presently in use in many microbiological laboratories. The assumption has been made that, regardless of how determined, the test result is acceptable for coding so long as a method is reproducible and reliable within available knowledge and technology. Once the tests and test methods are entered from the files into the data bank, where the particular tests used give conflicting results, either in the same laboratory or among laboratories, the computer will be programmed to detect discrepancies and calculate efficiencies of tests and test methods. The verification or comparison of test results from contributing laboratories would be possible with this information, as would be a calculation of indices of efficiency of various testing methods for specific characteristics. These points, incidentally, represent two of the many advantages derived from computer-assisted microbial data storage and retrieval. Ultimately the problem of tests and testing methods will have to be the subject of a separate and intensive scrutiny. For the moment, simply keeping a file of the methods used in any given case will prevent loss of information on the specific tests used. Variability of a character of a strain is not scored because of the practical matter that most tests are scored in a binary, "YES" or "NO", fashion. Variability would be determined only by multiple entries for the same strain. Each individual answer is assumed to be the answer to only one determination of a property. One could search the proposed data bank for a list of characteristics (or array of test results) required to identify or classify a set of microorganisms by requesting a minimum set of criteria required, a set for a given reliability, or indeed, by any other codable constraint on the search.
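As a rough illustration of how multiple binary entries for the same strain could expose variability and conflicts among laboratories, the sketch below groups single determinations by strain and test. The record layout, the strain and laboratory labels, and the function names are hypothetical. The paper does not define its efficiency index, so the agreement fraction computed here is only an assumed stand-in for such a measure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One entry is one determination: (strain, test, laboratory, result).
# True stands for "YES", False for "NO". All values are invented.
Entry = Tuple[str, str, str, bool]

ENTRIES: List[Entry] = [
    ("strain-1", "catalase", "lab-X", True),
    ("strain-1", "catalase", "lab-Y", True),
    ("strain-1", "motile", "lab-X", True),
    ("strain-1", "motile", "lab-Y", False),
    ("strain-2", "catalase", "lab-X", False),
]


def group_results(entries: List[Entry]) -> Dict[Tuple[str, str], List[bool]]:
    """Collect every determination recorded for each (strain, test) pair."""
    grouped: Dict[Tuple[str, str], List[bool]] = defaultdict(list)
    for strain, test, _lab, result in entries:
        grouped[(strain, test)].append(result)
    return dict(grouped)


def discrepancies(entries: List[Entry]) -> List[Tuple[str, str]]:
    """Return (strain, test) pairs whose repeated entries disagree."""
    return sorted(
        key for key, results in group_results(entries).items()
        if len(set(results)) > 1
    )


def agreement_by_test(entries: List[Entry]) -> Dict[str, float]:
    """Assumed index: for each test, the fraction of repeatedly determined
    strains on which all determinations agree."""
    repeated: Dict[str, int] = defaultdict(int)
    agreed: Dict[str, int] = defaultdict(int)
    for (_strain, test), results in group_results(entries).items():
        if len(results) > 1:
            repeated[test] += 1
            agreed[test] += int(len(set(results)) == 1)
    return {test: agreed[test] / repeated[test] for test in repeated}
```

Pairs determined only once, such as the single entry for `strain-2`, carry no information about variability and are left out of the index, consistent with the statement that variability can be determined only from multiple entries for the same strain.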
Individuals wishing to analyze data by computer in an array compatible with internationally accepted testing schema would use the suggested format for their data input to the data bank. Comparisons with stored information would then be possible. The need for standardization of methods for collecting and reporting data is more urgently appreciated when considering