<<

Session 18 Automatic Programming

A GLOBAL VIEW OF AUTOMATIC PROGRAMMING

Robert M- Ba)zer USC Information Sciences Institute 4676 Admiralty Way Marina del Rey, California 9C291

Abstract into this domain and, In fact, the term "automatic programming itself was applied to This paper presents a framework for early systems during the 1950s. characterizing automatic programming systems In terms of how a task Is communicated to the What is needed, therefore Is to either system, the method and time at which the appropriately restrict the definition of system acquires the knowledge to perform the automatic programming, or to provide a set of task, and the characteristics of the resulting criteria used for measuring the acceptability program to perform that task. Jt describes or performance of an automatic programming one approach In which both tasks and knowledge system. Such a measure of system merit is about the task domain are stated in natural extremely hard to arrive at but would contain language In the terms of that domain. All the following man, machine, and system knowledge of necessary to applicability components: Implement the task Is Internalized Inside the system. The efficiency of the resulting program; Preface and Acknowledgement The amount of computer This paper represents the author's resources expended In arriving at persona I vIew of a globaI description of that program; Automatic Programming. This view resulted from the author's dlscussions wlth and The elapsed time used in suggestions from numerous colleagues on this arriving at the resulting program; area for which he Is deeply indebted. The amount of effort expended This paper Is a condensation of this by the user in specifying the task; view, taken from a larger work [1] which attempts to structure the field on the The reliability and ruggedness conceptualization expressed here. The of the resulting program; interested reader should consult this work, which describes the issues In greater detail The ease with which future and contains supporting evidence for the views modifications can be incorporated; summarized here. and finally. Introduction The range and complexity of tasks which can be handled by the The goals of automatic programming are system. deceptIve1y simple; namely, t he effective utilization of computers. This Implies both The Basic Model simplicity of use and efficient application of the computing resources. One goal of any automatic programming system Is to allow its users to state their Thus, automatic programming is the problems and any advice about Its solution In application of a computing system to the terms natural to the problem. Although most problem of effectively utilizing that or problems and their solutions can be most another computing system in the performance of naturally described in the terrnms native to a task specified by the user. Although this their fields, some can best be stated and/or is certainly what is meant by automatic solved In terms of a different field, such as programming, this definition does little to mathematics. Occasionally, this other field restrict the set of applicable computer is computer science, and the problem or Its systems included in the automatic programming solutlon is expressed in terms of data domain. All , operating systems, structures and their manipulations. Such debugging systems, text editors, etc., fall descriptions, in terms other than those of the problem domain, are entirely satisfactory as This research Is supported by the Advanced long as they are part of the conceptual Research Projects Agency under Contract No. repertoire of the user and are not DAHC15 72 C 0308, ARPA Order No. 2223/1, artificially Introduced to enable the system Program Code No. 3D30 and 3P10. to comprehend the problem or process Its solution.

494 We therefore treat both the native terms problem in terms of a model of the domain, of a Held and the terms of other fields which which obtains a solution for the problem in users have found useful to describe and terms of this model, and produces an efficient conceptualize problems and solutions in that computer implementation of this solution in field as the problem domain terms of that the form of a program. field. With this definition, we conjecture that the solution of every computable problem System Reguirements can be represented entirely In problem domain terms as a sequence, which may involve loops There are seven facilities to be provided and conditionals, of actions in that domain by, or criteria to be satisfied by future which affect a data base of relationships Automatic Programming Systems. The first is between the entities of the domain. Included an interactive system Interaction between either as part of the data base or as a the system and the user Is required so that separate part of the model. Is the history of the specification can be given incrementally the model (I.e., the sequence of actions and any errors or discrepancies that arise in applied to the model). This logically or from such specification can be handled as completes the model and enables questions or they occur. actions involving historical Information to be handled. In a strong sense, such a solution Along with this Interactiveness the Is a direct simulation of the domain. The system should be very forgiving. It should system models at each step what would occur In allow great flexibility in the way and time at the domain. which information is specified. It should also be forgiving by allowing the user to The important part of the above change or retract previous communications with conjecture Is that any computable problem can the system. be solved, and hence described, In problem domain terms. This enables us to divide the The second criteria is the amount of solution into two parts, an external and an non-proceduralness used In the specification Internal part. The external part Is the of the task to be performed. As far as problem specification given by the user In possible the system should be told what to do completely domaln specific terms. The rather than how to do it. There i s a requirements for such users is no longer a continuum between the statement of a problem comprehensive knowledge of computers, but as the transformation from an Initial state to rather the ability to completely characterize a goal state, and the specification of how to the relevant relationships between entities of perform such a transformation. Most of the problem domain and the actions in that computer languge development can be viewed as domain. In addition, such users should have a a movement from specifying HOW to do something rough awar eness of the problem solving towards a statement of WHAT is desired. The capability of the system so that they can level of non-proceduralness achi evable wlthin provide additional help where needed In the an Automatic Programming Sytem is directly form of more appropriate -actions, related to the system's capability of turning recommendations about the use of certain goals Into actions and this Is dependent upon act Ions, and/or Imperative sequences which fts knowledge of how to acleve certain results will solve part or all of the problem In in the problem domain. The ability of the problem related terms. system to achieve results in the problem domain Is used as the main distinction between The Internal part Is first concerned with non-procedural and procedural languages. finding a solution In problem related terms. Thus, problems must be stated in a language if this has not already been provided by the appropriate for that domain, I.e., one that user. Second, this part Is concerned with can express the structure and finding efficient solutions given the interrelationships of the entitles within that available computing resources. Such domain, and one that users are familiar with optimizations occur at two levels beyond what for discussing and describing tasks and is normally considered optimizatlon. First, problems within that domain. Only with such a at the problem level, recognition that certain language can the system know how to achieve entities and/or relation ships are irrelevant the desired results rather than being directly enables their removal from the model. Second, told how to produce the desired result. Some since only part of the state of the modelled actions can, however, best be described In domain Is required, and only at certain points terms of bow to accomplish them rather than by in the solutIon process, rather than the resulting state. Such procedural simulating the model completely at each step, descriptions are quite acceptable as long as the system can employ alternative they are specified entirely in problem domain representations which require less maintenance terms rather than implementatlon computer and which either directly mirror the required terms. part of the domain state or allow such parts to be computationally inferred. Such In addition to problem oriented representations may also enable more direct specifications, the amount of Information that solution of the problem. It Is these must be specified for the system to correctly optimizations which form the main distinction process the problem must be reduced. This can between t he code generation par t of an best be done by removing details from the Automatic Programming system and current state specification and allowing the system to fill of the art compilers. them in. As with non-proceduralness there Is a continuum here, but the level that is Thus, our definition of an Automatic desi red is one which omits from the Programming system Is one which accepts a specification all references to entities not

495 contained within the problem domain (and which handle and their method of obtaining this are not necessary for transferring knowledge knowledge. between users of the field). Specifically, artificial references to the data structures The Four Phases: An Overvlew of computer science (e.g.,lists, rings, arrays, symbol tables, and the like), that Automatic Programming begins with the have been Invented as a means of specifying application of problem solving to problem "how" rather than "what" to do must be statements rather than problem solutions; avoided. I.e., with the attempt by a computer system to obtain an understanding of the task being Next, a mechanism Is required for the specified. Once the task has been understood. modification of specifications that have been If it is not in process form. It must be previously entered either because they don't transformed into one. This Is the traditional work, or because the environment In which they area of Artificial Intelligence and human are operating has changed, or because program design. It must be verified that the df fferent behavior Is desired. Such resulting process model Is the one desired by specifications should be given In terms of the user and that It Is adequate for the changes in deslred results, and, entirely user's problem. If not, It must be modified within the language of the problem domain and transformed by the above steps and Itself. reverified. It must then be made Into an efficiently running program. This involves Once a problem has been specified, a the automation of the ad hoc knowledge of mechanism Is needed for Insuring that the computer science. system produced Is the one desired. This Is especially critical since the system will be A complete Automatic Programming system great i y augmenting the specifications. thus consists of four major phases- Problem Discrepancies between the desired system and Acqulsition, Process Transformation, Model the one produced can arise from an Inadequate Verification, and Automatic Coding. Problem knowledge of the system about the domain, from Acquisition Is the process by which the system a mis-statement by the user, or from a obtains (1) a description of the problem to be misinterpretation of something that was solved or task to be performed In a form specified. Whatever the cause. It Is processable by the system, and (2) the Important that the user can see the produced knowledge needed to solve the problem. The system In operation In his own terms. I.e., In result of this phase Is a well formed problem the language of the problem domain Itself, so and knowledge base which can be manipulated by that he can check the expected behavior the system and transformed Into a high level against the behavior produced. In addition, process for solving the problem during the the system should be able to generate test second phase. The third phase Is used to Input data so that a wide range of behavior of verify that this process is the one desired the task specified can be observed by the and that it is adequate for the problem user. If a discrepancy Is found, the user solution. The fourth phase, Automatic Coding, additionally requires capabilities to locate fills In the necessary details, optimizes the and Isolate the source of the discrepancy, and process, and produces the actual code to solve then, to modify it to obtain the desired the problem. result. The Automatic Programming Model After a correct program has been produced, a mechanism Is needed for One of the most striking and deep rooted transforming It Into an efficient one. Such features of the Automatic Programming model efficiency will rest on two kinds of presented here Is the Interface It creates informatlon. First, knowledge about the between a high level external specification of problem domain which enables alternative ways a problem which omits data structures which of performing the same task to be evaluated. are not part of the domain and the internal Secondly, Information about efficient ways to implementation of that specificaton In an utilize computers so that a total cost can be efftclent representation. assigned to each of the different ways of solving the problem In domain terms. implied, This choice of a basic Interface has but not stated In the above, Is that Automatic predicated large parts of the entire model. Programming systems don"t just automate Through this choice as the basic Interface programming. They also provide facilities wlthln the Automatic Programming model four which help the user move from an understanding Important gains are expected. of a task to be accomplished to a finished running system which performs that task. An First, the complete mode] conjecture Automatic Programming system Is a system which states that such a division Is feasible for aids the user In all the steps from problem stating and solving domain dependent problems. definition and design to final completed running programs. Second, since the choice of a data representatton and the mat ntenance of is To meet all the above criteria automatic consistency occupy such a large portion of programming systems require detailed Knowledge current programs, the size and complexity of about the problem domain. The requirement for spect fications without such representations this knowledge li mits the , system's should be drastically reduced over those which applicability to other areas, and hence, one retain these representations. measure of such systems Is the range of problem domains which they can adequately

496 Third, since so much detail has been We can now define the adequacy and removed from the specification It Is easier appropriateness of the models for an Automatic for. the system to understand what the task Is Programming system. A fregean model i s rather than getting lost In the details of adequate If it contains a complete enough what is going on. description of the relations between the entitles in the problem domain that a sequence Finally, since the problem has not been of operations or transformations on this model overspecified with a particular choice of can be built to solve the problem posed by the representation so that the probIem was user. This Is what was referred to as the expressible, the system Is now free to choose complete model In the Basic Model Section. a representation that will efficiently solve Thus, the adequacy of a model Is dependent the problem at hand. The system has been upon the use of that model to solve the given Increased flexibility in Its choice and problem. Operationall, this requires that may well outperform humans In correctly making the Automatic Programming system is capable of representation choices; not because the system finding the complete set of applicable Is more Intelligent than the user, but because transformations on the model and can calculate it can cycle through more possibilities and the consequences of each of these actions. bring to bear a greater level of effort In The appropriateness of the model is a measure such optimizations than any user Is willing or of how well suited the available able to Invest In such issues. transformations are to solving the problem at hand, I.e., an adequate model can be made more Problem Acquisitlon approprlate by addi ng to It non-primi tlve transformations made up of a sequence of The Problem Acquisition phase I s primitive ones, which are suitable building concerned with obtaining an understanding of blocks for the problem being posed. The model the users problem and the domain In which It rr,dy also be made more appropriate by including exists so that the Process Transformation recommendatlons about the suitablliy of phase can attempt to find a sequence of alternative strategies for sequences of model transformations or operations in that domain transformations. which will obtain the solution required by the user. Thus, the Problem Acquisition phase is Users can significantly reduce the concerned with building a model of the users well-known problem of building a powerful domain which represents the i nteractions general purpose problem solver by tailorIng between the entities of that domain and the the specified model to make It more effect on those entities by the allowed appropriate for the problem at hand. transformations or operations applicable within the domain. It is our primary The state of the art In natural language contention that only through the development understanding appears adequate for the of such a model of the user's domain can the description of problems and of models and Automatic Programming system have any degree beyond our ability to utilize the information of generality In the domains for which It Is thus obtained, and hence, should not be a applicable. bottleneck in an Automatic Programming system. Evidence for this viewpoint comes from the Currently, a] 1 such models of user work of Woods in the Moonrocks Program, from domains have been coded into a system. It Is Winograd in the blocks Descr i ption Program, proposed here that such models can be and from Martin Kay in the Mind System. Each specified to the system by Its users and that of these systems represents an alternative through these models the system can acquire linguistic technology and each Is capable of the knowledge necessary to solve problems handling a wide range of linguistic forms within these domains and to understand what Is within the domain of its competence. required for such a solution. The two main Issues, then, are what constitutes an adequate The basic viewpoint, then, is to process and appropriate model and how Is such a model the user's natural language communication with specified or communicated to the system. the understanding that It is meant to convey to the Automatic Programming system a model of There are basically two types of his problem domain. Towards this end the models,analogical and fregean. Analogical system can extract entitles and the models bear a strong resemblence to the relationships between them from the structure of the object being described, such communication. It can further query the user as the floor plan for a room, or the diagram as to the relationships between entitles which model used, by Gerlernter In his geometry have not, as yet, been explicitly specified proving program. but which have been inferred by the previous communication. Such inferences by the system Fregean systems, on the other hand, are about the Incompleteness of the model require linguistic or relation based, l,n which a sophisticated understanding not only of the expressions are built up on the relation communication but of the types of models used .between functions and arguments to those for problem domain specification. functions. unfortunately, our sophistication In both these areas Is quite limited. In Since one of the basic goals of the communication we need to be able to understand Automatic Programming system is the generality how Information Is ordered for presentation, of problem domains that It Is willing and how context Is established and utilized, how capable of handling, the fregean model the capabilities of the recipient effects the approach has been chosen. communication, and how these capabilities are perceived by the speaker. In modelling we

497 need to have a space of possible models, an Automatic Coding understanding of how the parts of a model Interact, a means for recognlzlng Automatic Coding Is concerned with incompleteness and inconsistencies In models, finding an efflet ent computer implementati on a means for obtaining all the aIlowed of the process description obtained from the operations on the model, and the means for proceeding phase. This description does not transforming the models with these operations. yet Include a choice of data representations, but does specify the major processing elements Process Transformation and sequences. It Is intended that this phase will not need any domain specific knowledge Our contention Is that the main activity except for input frequency and distribution In programming Is not finding a solution but information. The major logical representation In finding a solution which drops out the and processing decisions have already been Irrelevancles and which abstracts the made by the Process Transformation phase. necessary processing so that it can be efficiently implemented. It is recogni zed Of alI the phases in the Automatic that this Is a strong contention, but In most Programming System, the Automatic Coding one programming problems It Is felt that a is the one essential component of any solution is known and the main concern is In Automatic Programming System. Without it the finding a more efficient one. This Is not system cannot produce programs, and hence, optimization In the normal sense of the term. though It may be useful It Is not an Automatic The concern, rather. is with finding Programming System. irrelevancies in the complete model and representational abstractions based on the Most people are not truly creative when required processing of that model. Once these they reorganize sections of their program to logical representations have been found they increase efficiency. Rather than inventlng must be efficiently Implemented. totally new representations, they appear to select one out of an Ill-defined set of such The above contention. If true, greatly possible representations and to adapt and shifts the emphasls within the Process modify It to function In the current Transformation Phase from that of a general situation. This is probably the main problem solver solving problems In a domain challenge to the Automatic Coding phase, the Independent way to modifying a solution so ability not only to cycle through a set of that it does not maintain any irrelevant alternative representations but to adapt and portions of the complete model and which modify them to the existing situation. Such abstracts the relevant portions into a more an ability would vastly increase the effident representation for the process ing appl[cableness of a small set of alternative required. Together with Problem Acquisition, representations. the ability to find representational abstractions and transform complete model From such Automatic Coding studies, one solutions into ones which utilize these would expect to see both a set of heuristics representations represent the mat n and a calculus, eventually, for data technological deficiencies with obtaining an representation choices. Automatic Programming system. Summary and Conclusions Model Verification The definition of automatic programming Although the Automatic Coding phase will started with a goal, namely, reducing the produce only correct code. Program Testing effort required to get a task running on a cannot disappear. This Is because the Problem computer. From this a framework was adopted Acquisiti on Phase and the Process in which the external characteristics of Transformation Phase will undoubtedly employ a Automatic Programming Systems could be number of heuristies and may very well described in terms of: Incorrectly Interpret either the problem statement or the allowed transformations that 1. The terms in which the problem is can occur In the user's model. Because of stated; this, the user must verify that the system created is the one that he desired. 2. The method and time at which the system acquires the knowledge of the problem The technology for this is at hand. It domain; consists of today's methods wherein a test case is given to the system and its 3. The characteristi c of the resulting performance Is used to validate the model that program. it constructed. Additionally, the system can aid the process by generating test cases of The choices we made are; its own which probe areas of which It Is uncertain and which could have led to either 1. problem statement In natural language in misunderstanding or incompleteness in the terms of the problem domain. original model. One might also expect that program debugging would disappear, but for 2. Know I edge abou t the domain acquired very similar reasons It too will remain under i nteractively in natural language in Automatic Programming. if there is a terms of the complete model of the disparity between the user's model and the problem domaln. system's model, then the reason for this disparity must be obtained.

498 3. Resulting programs which are optimized 1. R. M. Balzer, Automatic Programming. with respect to data representations, Institute Technical Memorandum. control structure, and code. USC/Information Sciences Institute, September 1972. This approach requires significant advances In Artificial Intelligence techniques, in such areas as knowledge representation, Inference systems, learning, and problem solving, and In the codification of programming knowledge in the areas of data representations, aIgorithm selection, and optimization techniques.

Other choices could certainly be made, and the resulting systems might look very different. One particular set of choices, suggested by. Al Perils, represents a system based on a completely different paradigm. His system is predi cated on incremental growth from an accepted base, namely, . This Idea Is to Increase the declarative parts of the language at the expense of the procedural. The declarative parts are In turn replaced by a series of questions from the system which specify how a defined concept Is to be used. For instance, the concept "array* might generate queries to see whether the size was dynamic at run time, whether insertions or deletions are being done, whether the elements are homogeneous, and whether they are accessed sequentially. From such questions and the programs which use these concepts, an optimal representation can be chosen.

In this system, higher levels occur when enough example systems have been generated and understood by humans that someone can codify this knowledge and Introduce a new level of semantics and questions. There Is no doubt that such a system would be useful, and It has the appealing attribute that the facilities of the system can be incrementally expanded from a widely distributed and available base. Also, such a system could Instantly be used by existing programmers who could gradually learn to use the new facilities on top of their ability to use FORTRAN.

On the other hand. it Is not clear how far such a technique can be pushed, and whether this Is the best way to achieve the goal except In the short-term. Automatic programming systems, however realized, would substantially reduce the effort and training required by Its users and would enable the subsystem produced to more closely reflect the intents of their designers. So much time, money, and effort is currently being expended, and even greater amounts forecasted In the future, for the creation of software products, that the potential benefits from automatic programming systems are enormous. Therefore, since the required technologies seem feasible, such systems, utilizing either the approach outlined In this paper or various others, should be extensively Investigated.

References

499