Database Structures for Mathematical Programming Models Robert Fourer
Total Page:16
File Type:pdf, Size:1020Kb
Database Structures for Mathematical Programming Models Robert Fourer Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL 60208-3119, U.S.A. Abstract. In both the design and use of large-scale mathematical programming systems, a substantial portion of the effort has no direct relation to the variables and constraints, but is instead concerned with the description, manipulation and display of data. Established principles of database design do not apply directly to mathematical programming, however, because there are significant differences of organization and content between the data for an optimization model and the data for a conventional database application such as payroll or order entry. This paper derives fundamental principles of database construction for large- scale mathematical programming, by use of a steel mill planning model as an exam- ple. Alternative formulations for the model—which incorporate aspects of produc- tion and network linear programming—are presented at the outset, and are shown to correspond to relational and hierarchical database schemes that have contrasting strengths and weaknesses. A particular implementation of the steel optimization package is then presented as an illustration. A concluding section puts this work into perspective, by surveying and categorizing a variety of approaches for providing data management features in mathematical programming applications. The views of data offered by this paper’s approach are seen to differ substantially from the views offered by traditional mathematical programming systems, and certain “in- termediate” strategies for integration of database and mathematical programming software are identified as having particular promise for future work. Published version: Decision Support Systems 20 (1997) 317–344. This work has been supported in part by grants from the American Iron and Steel Institute, and by grant DDM-8908818 from the National Science Foundation. 1. Introduction The work recounted here grew out of a project to design an optimization package for steel mill planning. Because the project was supported by the American Iron and Steel Institute, it was to be based on a generic model—one that any particular steel company could specialize to its own operations, simply by supplying its own data. Users of the model would be concerned mainly with entering and maintaining their data, and with reporting the optimal production levels. In light of the data’s central role, it was decided to implement the optimization package in the context of a database management system. The user would thus enter the description of a steel mill as a collection of materials and facilities records of a prescribed structure. An associated linear program would then be automati- cally generated and solved, and the optimal values would be automatically left in appropriate fields. Finally, the results would be displayed as desired, by use of the database system’s varied reporting options. As the implementation of the steel optimization package has progressed, it has become clear that many of the underlying principles are of much broader applicabil- ity. Certainly, the generic linear programming model can describe other productive enterprises that transform flows of raw materials into diverse finished products, through a series of processing steps. There are equally important general principles at work, however, in the way that the database structure is related to the formulation of the linear program. The goal of this paper is to suggest some of the principles of database construc- tion for linear programming (and for large-scale optimization in general) by use of the steel optimization model as an example. Two likely formulations of the linear program are introduced initially, and are shown to correspond to relational and hi- erarchical database schemes that have contrasting strengths and weaknesses. An implementation of the steel optimization package is then described, and its presen- tation of the data is contrasted to the data views typically provided by specialized optimization software. In conclusion, a variety of promising generalizations and alternatives are discussed. The next part of this introduction briefly describes traditional connections be- tween mathematical programming and database management systems, and cites some notable previous studies and implementations. The remainder of the paper is then outlined in greater detail. 1.1 Background Large-scale linear programming applications optimize over hundreds or thou- sands of variables, subject to comparably large numbers of constraints. The main- tenance of these models has long been recognized to involve a substantial task of data management. Indeed, in both the design and use of a mathematical program- ming system (or MPS), a substantial portion of the effort has no direct relation to the objective and constraints, but is instead concerned with the description, manip- ulation and display of data. This holds true both for traditional matrix generation systems [2, 38] and for more modern systems based on algebraic modeling languages 1 [8, 18] or matrix block schematics [10, 47]. The data required by mathematical programming models is not typified by the payroll and order entry data that appear as examples in textbooks on database management [14, 45]. As described for example by Palmer [37], an MPS’s data tables tend to be smaller but more complex in structure, and tend to contain more numerical (as opposed to symbolic) information. Designers have responded to these features by creating data representations that are specific to MPS software. Most popular is a scheme of tables whose rows and columns are specially labeled [46], and whose entries may be maintained independently or may be generated automatically from a larger corporate database [30]. The characteristics of an MPS’s data are not so special, however, as to preclude the use of standard database structures for purposes of mathematical programming. The similarities between MPS data tables and relational database tables have been remarked upon by several authors, including Welch [46], Choobineh [11] and M¨uller- Merbach [34]. MPS implementations using the so-called network database model were investigated by Bonczek, Holsapple and Whinston [7] and by Stohr and Tanniru [44]; more recently, an implementation of an MPS that employs aspects of the hierarchical and relational database models has been described by Baker [1]. Other researchers (to be cited later) have proposed ways in which standard database concepts or software can be integrated with—or interfaced to—various mathematical programming systems. In some cases the database aspect takes the lead, while the ability to optimize is regarded as an added feature. In other designs the lead is given to the mathematical programming aspect, with the database being just one feature of an integrated MPS. A third approach employs both a general- purpose database management system and a mathematical programming system, which pass information between them. 1.2 Outline Section 2 presents two formulations of our linear programming example, which differ in how they describe the indexing of model components. Sections 3 and 4 then describe relational and hierarchical database structures, respectively, that correspond to the organization of data in the two formulations. Each structure has certain advantages and disadvantages, as explained in Section 5, with respect to ease of use, data storage, and data retrieval. The relationships between the linear programming formulations (in Section 2) and the database structures (in Sections 3 and 4) are not merely a fortuitous con- sequence of the application at hand. One can readily discern general rules that connect familiar algebraic characterizations of data to familiar database concepts such as records, fields, and keys. Sections 3 and 4 also codify these rules and com- ment on their application. The implementation designed for the steel optimization project is described in Section 6. Particular attention is given in this section to practical issues of speed and convenience, and to the appearance and mechanics of the user interface. As a complement to Section 6’s very specific focus, the concluding section com- 2 pares a variety of strategies for integrating database and linear programming soft- ware. Sections 7.1 and 7.2 survey ideas for the addition of linear programming features to database management systems, and for the incorporation of database features into linear programming systems, respectively; the latter also observes how traditional MPS’s differ fundamentally from typical database management systems in their ways of organizing data and presenting it to the user. Section 7.3 describes several promising options for intermediate levels of integration between database and optimization software. Different integration strategies are seen to correspond to different kinds of coordination between a modeling system’s algebraic description of data and a database management system’s data scheme. 3 2. Formulations We adopt the following general formulation that covers all linear programs (LPs) to be considered in this paper: n Maximize cjxj j=1 row ≤ n ≤ row Subject to li j=1 aijxj ui ,i=1,...,m col ≤ ≤ col lj xj uj ,j=1,...,n It will be of conceptual and practical value to investigate database structures that row row col are capable of representing any LP of this form, by storing all li , ui , cj, lj and col uj , along with all aij that are nonzero.