Theory of Files Non-Arithmetic Data Processing, Data Flow, Algebraic
absorbed by IFIPS. Also, the British Standards Institution recently circulated to many individuals in the U.S. a "Draft British Standard Glossary of Terms Used in Data Processing" for comment.

All these groups are working toward the same goal. They hope to clarify and establish the meaning of certain technical terms used in the computing literature so that these words convey the same meaning to the readers.

Theory of Files

Numerical Analysis Research Project, University of California at Los Angeles, Calif.
Reported by: Lionello Lombardi (May 1961)
Descriptors: theory of files, flow control expressions, non-arithmetic data processing, data flow, algebraic business language, von Neumann languages, sorting, Boolean algebra

In hardware design, the theory of files is used as a tool to analyze the data flow features of the systems for which the equipment is designed. As a result, it has proved possible to formulate a punched card data processing system considerably less expensive than any presently available. The logical features of this system, based on a novel organization of data flow, are determined by synthesis.

Currently, the theory of files is being applied to the study of the flow of information through random-access fixed-plus-erasable memory systems. The aim is first to select a compact set of parameters which characterize the flow involved in any specific application, and then to formulate relations between these system parameters, the minimum storage capacity requirements of the systems, and the optimum storage capacity distribution among the components.

Development and applications of the theory of files are carried out under the sponsorship of the Office of Naval Research. The following results have been obtained:

(a) Determination of a common pattern to which the coordinated data flow of any non-arithmetic data processing procedure involving files conforms. Such procedures include machine accounting, access to stored information, dictionary analysis, etc. In particular, it is shown that the data flow configuration of any procedure involving n files can be fully represented at any time by means of only 5n+4 boolean variables (the indicators).

(b) Development of a pattern of system language (the Algebraic Business Language) which allows for the use of logico-mathematical techniques to describe and control the data flow by means of specially designed boolean expressions (the Flow Control Expressions).

(c) Description in rigorous mathematical terms of processes where the relevance of arithmetic operations is low and the main problem is logical input-output coordination. Such procedures, where a small amount of processing is performed on relatively large amounts of information, include most business data handling and document processing procedures. Now it is possible to represent such processes as sets of operations defined in boolean algebras whose elements are files. For example, a complete k-way v. Neumann sorting procedure is represented by the formula

$$F_i^{(m+1)} = \sum_{s=k\cdot i+1}^{k\cdot(i+1)} F_s^{(m)}, \qquad (i = 0, 1, 2, \ldots;\ m = 0, 1, 2, \ldots)$$

where $F_s^{(t)}$ represents the sth ordered sequence of records available after t runs.

(d) Representation of procedures as sets of equations relating the input data to the output results.

(e) Development of other languages, e.g., the Algebraic Business Language. The v. Neumann-type data processing languages (i.e., languages in which procedures are represented as images of flow charts) do not satisfy the needs of most non-arithmetic procedures.

REFERENCES:
GOLDSTINE, H.; NEUMANN, J. v. Planning and coding for an electronic digital computer. The Institute for Advanced Study Publications, 1947.
LOMBARDI, L. Theory of files. Proc. 1960 Eastern Joint Comput. Conf., Paper 3.3.
LOMBARDI, L. System handling of functional operators. J. ACM 8, (1961), 168.
LOMBARDI, L. Mathematical structure of non-arithmetic data processing procedures. Submitted for publication.
LOMBARDI, L. Inexpensive punched card equipment. In press, Journal of Machine Accounting.
LOMBARDI, L. Logic of automation of system communications. In preparation.
LOMBARDI, L. Coding in storage and searching systems. In P. Garvin (Editor), Natural Language and the Computer, ch. 14; in press, McGraw-Hill Book Co.

A Class of Search-Models for Machine Retrieval

Information Retrieval Project, IBM Research Center, Yorktown Heights, N. Y.
Reported by: Eugene Wong (May 1961)
Descriptors: search, search-strategy, optimization, access to mechanical storage, information retrieval, stochastic models

A class of theoretical models for studying optimum search-strategies in one dimension has been constructed with a view towards the eventual application of these strategies to machine retrieval. The basic assumptions underlying these models can be briefly stated as follows:

(1) The probability that the object of search lies in the interval (x, x+dx) is p(x)dx, where the density function p(x) is known a priori.

(2) The search is conducted with constant speed v.

(3) It is possible to skip, i.e., to move from one point to another without searching, with constant speed s. In general it can be assumed that s > v.

The major feature of this model is a balance between the advantage of always searching where the probability of success is the highest and the disadvantage of the frequent skipping that this may incur. This feature exists in many practical storage devices, e.g., magnetic tape.

Without loss of generality any search procedure can be expressed as a sequence of alternately searching and skipping operations. Using the letters v for searching and s for skipping, the search-sequence can be expressed as

$$x_0 \xrightarrow{v} x_1 \xrightarrow{s} x_2 \xrightarrow{v} x_3 \xrightarrow{s} x_4 \xrightarrow{v} \text{etc.}$$

The object of the analysis is to find the sequence $\{x_n\}$ which optimizes the search in some sense.

The model described above, with $p(x) = (a/2)e^{-a|x|}$ and the criterion of minimum mean search-time T, has been studied intensively with considerable success. Both parametric optimization, e.g., let $x_n = (-1)^n (x_0 + n\delta)$ and find $x_0$ and $\delta$ which minimize T, and non-parametric optimization have been achieved. Analyses with other forms of p(x) and other criteria for optimization are being undertaken. Modifications of the basic model to adapt it more closely to practical storage arrangements are also being considered.

324 Communications of the ACM
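The search-and-skip model of the search-models report can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the author's method: the searcher is assumed to start at x0 at time zero, the first leg is assumed to be a search, the object is assumed to be detected the moment a searching leg reaches it, and the names `time_to_find`, `laplace_density` and `mean_search_time` are hypothetical. Because a finite sequence covers only part of the line, the sketch reports the covered probability together with the integral of p(y) t(y) over the covered part, taking p(x) = (a/2)e^{-a|x|}.

```python
import math


def time_to_find(points, y, v, s):
    """Elapsed time until an object at position y is reached.

    points = [x0, x1, x2, ...]; legs alternate search (x0 -> x1, speed v)
    and skip (x1 -> x2, speed s). Returns math.inf if no search leg covers y.
    Assumption of this sketch: the first leg is a search leg.
    """
    elapsed = 0.0
    for leg, (start, end) in enumerate(zip(points, points[1:])):
        searching = leg % 2 == 0
        if searching and min(start, end) <= y <= max(start, end):
            return elapsed + abs(y - start) / v
        elapsed += abs(end - start) / (v if searching else s)
    return math.inf


def laplace_density(y, a):
    """The density p(y) = (a/2) * exp(-a * |y|) used in the report."""
    return 0.5 * a * math.exp(-a * abs(y))


def mean_search_time(points, a, v, s, lo, hi, steps=20000):
    """Midpoint-rule estimate over [lo, hi].

    Returns (integral of p(y) * t(y) over covered positions,
             probability mass of the covered positions).
    """
    h = (hi - lo) / steps
    weighted_time = 0.0
    covered = 0.0
    for i in range(steps):
        y = lo + (i + 0.5) * h
        t = time_to_find(points, y, v, s)
        if math.isfinite(t):
            weight = laplace_density(y, a) * h
            weighted_time += weight * t
            covered += weight
    return weighted_time, covered


if __name__ == "__main__":
    # Search 0 -> 1, skip back 1 -> 0, then search 0 -> -1.
    points = [0.0, 1.0, 0.0, -1.0]
    partial_T, covered = mean_search_time(points, a=1.0, v=1.0, s=4.0,
                                          lo=-2.0, hi=2.0)
    print(partial_T, covered)
```

Comparing the returned pair for different sequences at the same covered probability gives a rough way to see the trade-off between searching near the peak of p(x) and the time lost to skipping.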
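The k-way sorting formula of the Theory of Files report, in which each new ordered sequence is formed from k consecutive sequences of the previous run, can be sketched in Python as repeated merge passes. This is a minimal sketch under assumptions the report does not state: the combining operation is taken to be an ordinary merge of ordered runs, run indices are zero-based, the initial runs hold one record each, and the names `merge_pass` and `k_way_sort` are hypothetical.

```python
import heapq


def merge_pass(runs, k):
    """One run of the procedure: merge each group of k consecutive
    ordered sequences into a single ordered sequence."""
    return [list(heapq.merge(*runs[i:i + k]))
            for i in range(0, len(runs), k)]


def k_way_sort(records, k):
    """Repeat merge passes until one ordered sequence remains."""
    if k < 2:
        raise ValueError("k must be at least 2")
    runs = [[r] for r in records]  # assumed starting point: single-record runs
    while len(runs) > 1:
        runs = merge_pass(runs, k)
    return runs[0] if runs else []


if __name__ == "__main__":
    print(merge_pass([[1, 4], [2, 3], [0, 9], [5]], 2))
    print(k_way_sort([5, 3, 8, 1, 9, 2, 7], 3))
```

Each call to `merge_pass` divides the number of sequences by roughly k, so about log_k(n) passes are needed for n records.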