Symmetric multiprocessing algorithm for conceptual design

Eric S Fraga

Department of Chemical Engineering, University College London, Torrington Place, London, United Kingdom WC1E 7JE

Automated process synthesis can be a computationally demanding application. However, the recent introduction of affordable personal computers based on the symmetric multiprocessing architecture makes it feasible to consider parallel applications for everyday use. This paper describes a new algorithm for process synthesis which is based on multithreading and shared memory. The new algorithm has been implemented in Java and results for the computationally intensive problem of looking at dynamic behaviour in synthesis are presented.

1. INTRODUCTION

Automated process synthesis is based on the solution of mixed integer nonlinear programming problems. Although a number of approaches have been taken [5], one approach that has been shown to be particularly successful for a large class of problems is the use of implicit enumeration with dynamic programming and branch and bound [1,2]. In particular, implicit enumeration techniques are efficient when implemented on multicomputers [3,4].

Distributed memory multicomputers are either not commonly available or are difficult to manage. Recently, however, we have seen the introduction of affordable symmetric multiprocessing (SMP) personal computers. These systems have initially been made available in dual configurations although 4 and 8 processor systems are now appearing. The main advantage of SMP systems is the shared memory paradigm they implement. Shared memory encourages the development of new parallel algorithms. This paper describes one such algorithm for automated process synthesis. The implementation makes good use of the resources provided by SMP-PCs and therefore provides an affordable route for the use of parallel computing in industrial applications.

1.1. The Jacaranda system for automated synthesis

The Jacaranda system is the implementation of an implicit enumeration algorithm for automated process synthesis. The use of implicit enumeration, combined with dynamic programming and branch and bound, together with the underlying discrete programming approach, yields the following advantages:

- Problem formulation consists of the list of processing technologies available, the list of available raw materials, the desired product specifications and a set of ranking criteria for process selection.
- The use of discrete programming makes it possible to generate efficiently and simultaneously a ranked list of solutions for each of the ranking criteria specified [6].

Jacaranda is ideal as an exploration tool in early design. The results of Jacaranda can subsequently be used as input for more rigorous and detailed evaluation, although there are no limitations on the rigour or detail of the models used in Jacaranda.

class Problem
    method init()
        best ← new List()
    end method
    method solve()
        while hasMoreElements() do
            node ← nextElement()
            node.evaluate()
            best.insert(node)
        end while
    end method
    method hasMoreElements() ... returns boolean
    method nextElement() ... returns Node
    inner class Node
        method evaluate() ...
    end class
end class

Fig. 1. Object oriented framework for search procedure.

Figure 1 presents the pseudo-code for the search procedure. The problem object represents a particular sub-problem in the search graph. A sub-problem is defined by a set of streams and the desired or required properties of the solution. The solution procedure consists of generating a set of nodes corresponding to unit designs. The hasMoreElements and nextElement methods together implement the implicit enumeration aspects of the algorithm through the use of the unit models available for the problem. Unit designs may generate output streams and each output stream is used to define a new sub-problem. The procedure is therefore recursive and is based on a depth-first pre-order generation and traversal of the search graph. The use of discretization ensures a finite search graph and enables the use of dynamic programming.

Jacaranda is written in Java and provides a generic object oriented framework for automated synthesis. The generic nature of the framework provides a basis for the implementation of a parallel version which should inherit the positive attributes of the sequential version.
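The recursive search described above can be illustrated with a small Java sketch. It is not the Jacaranda code: the toy problem (sharp splits of an ordered list of components), the cost model, the class name SequentialSketch and the memo table keyed on a string are all assumptions made for illustration. It keeps only the single best cost rather than a ranked list and omits branch and bound.

```java
import java.util.HashMap;
import java.util.Map;

class SequentialSketch {
    // Dynamic programming table: sub-problem key -> best cost found for it.
    static final Map<String, Double> solved = new HashMap<>();
    static int evaluations = 0;   // unit designs evaluated so far

    /** Toy sub-problem: separate components lo..hi (inclusive) into pure products. */
    static class Problem {
        final int lo, hi;
        private int nextCut;      // state of the implicit enumeration
        double best = Double.POSITIVE_INFINITY;

        Problem(int lo, int hi) { this.lo = lo; this.hi = hi; this.nextCut = lo; }

        boolean hasMoreElements() { return nextCut < hi; }

        Node nextElement() { return new Node(nextCut++); }

        double solve() {
            if (lo == hi) return 0.0;                 // a pure stream needs no processing
            String key = lo + ":" + hi;
            Double known = solved.get(key);
            if (known != null) return known;          // reuse a solved sub-problem
            while (hasMoreElements()) {
                Node node = nextElement();
                best = Math.min(best, node.evaluate());
            }
            solved.put(key, best);
            return best;
        }

        /** One unit design: a sharp split after component 'cut'. */
        class Node {
            final int cut;
            Node(int cut) { this.cut = cut; }

            double evaluate() {
                evaluations++;
                double unitCost = hi - lo + 1;        // hypothetical cost model
                // Each output stream defines a new sub-problem (depth-first recursion).
                return unitCost + new Problem(lo, cut).solve()
                                + new Problem(cut + 1, hi).solve();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(new Problem(1, 4).solve());
        System.out.println(evaluations);
    }
}
```

With this toy cost model, four components give a best cost of 8.0 after 10 unit design evaluations; the memo table is what prevents shared sub-problems such as components 2..3 from being enumerated more than once.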

2. THE IE-SMP ALGORITHM

This section describes a new algorithm, based on an implicit enumeration approach, for implementation on SMP systems. The aim is to retain all the positive attributes of the sequential approach while enhancing efficiency through multiprocessing. Shared memory makes it possible to implement a parallel algorithm which mimics the behaviour of the sequential algorithm.

Previous parallel approaches have been based on designing a new procedure for generating and traversing the search graph. Fraga & McKinnon [3] described a dynamic programming approach. Although scalable to large numbers of processors, the search graph generated was larger than required by the sequential procedure. An improved method was subsequently developed based on a two step procedure in which a tighter search graph was generated before traversing it [4]. However, this method still suffered from the loss of the effect of pruning by the branch and bound procedure used in the sequential method. For small numbers of processors, the parallel implementation was less effective.

The motivation for the new approach, therefore, is to preserve both the dynamic programming and branch and bound aspects of the sequential approach and be efficient for small numbers of processors.

class Problem
    method init()
        best ← new List()
    end method
    method doStep()
        while nodes available do
            if hasMoreElements() then
                jobs.put(nextElement())
            end if
        end while
    end method
end class

class Node
    method doStep()
        evaluate()
        for each unit output o do
            jobs.put(new Problem(o))
        end for
    end method
end class

Fig. 2. IE-SMP algorithm: Problem and implicit enumeration node classes.

The new parallel algorithm is based on the same branch & bound depth-first pre-order traversal of the search graph. Parallelism is inherent in this procedure because there are often several units which may be considered for processing the streams associated with any sub-problem and because each unit often has a set of design alternatives for the available feed streams. Both aspects will be exploited as neither is sufficient on its own. There are problems in which either there is only one type of processing unit (e.g. distillation sequence synthesis) or where units have few processing alternatives (e.g. in biochemical processes).

To ensure load balance, an asynchronous multithreaded job queue scheme is used. A set of threads is created and each is responsible for both enumerating nodes in the search graph and evaluating specific nodes. Each thread retrieves a job from the job queue and executes one step of the job (via the doStep method). A step may create new jobs; these jobs are placed in the job queue and a link between the new jobs and the current job is created so that the current job can be notified when the new jobs have been completed.
There are two types of jobs: those that correspond to the enumeration of nodes and those which evaluate specific nodes. Figure 2 presents the pseudo-code for the new parallel algorithm.

There is one parameter which controls the overall behaviour of the new algorithm: each new problem object has associated with it a set of implicit enumeration nodes. When the enumeration procedure identifies a new node in the search graph, a node object is retrieved from the set. In the sequential algorithm, there is essentially one node available for each problem object. By increasing this number, the degree of parallelism is increased. By default, in the parallel implementation, the set consists of two enumeration nodes.

[Figure 3: task timelines against Elapsed Time (s), 0 to 400, for processor P2 (tasks 2 and 3), processor P1 (tasks 1 and 4) and the sequential run Seq (tasks 1 to 4). (a) Linux version 2.2.5-15smp with IBM's JDK, version 1.1.8. (b) MS Windows NT 4.0 with Sun's JDK, version 1.2.2.]

Fig. 3. Timings for 3 component separation problem on a dual 450 MHz Pentium II system.
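The job queue scheme of Section 2 can be sketched in Java as follows. This is an illustrative sketch under several assumptions, not the Jacaranda implementation: it uses java.util.concurrent (which is not part of Java 1.1), a toy separation problem with a hypothetical cost model, and hypothetical names (IeSmpSketch, nodeDone, outputDone, maxNodes). It omits dynamic programming and branch and bound, and a completed node triggers the next step of its problem directly instead of placing the problem back in the queue. What it does show is the two job types, the links from new jobs back to the job that created them, and the parameter that limits the set of enumeration nodes per problem.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

class IeSmpSketch {
    interface Job { void doStep(); }

    final BlockingQueue<Job> jobs = new LinkedBlockingQueue<>();  // shared job queue
    final CountDownLatch finished = new CountDownLatch(1);
    final int maxNodes;          // size of the enumeration node set per problem (>= 1)
    volatile double result;

    IeSmpSketch(int maxNodes) { this.maxNodes = maxNodes; }

    /** Enumeration job: separate components lo..hi (toy problem). */
    class Problem implements Job {
        final int lo, hi;
        final Node parent;       // link used to notify the job that created this one
        int nextCut, active;     // enumeration state; nodes currently in flight
        double best = Double.POSITIVE_INFINITY;

        Problem(int lo, int hi, Node parent) {
            this.lo = lo; this.hi = hi; this.parent = parent; this.nextCut = lo;
        }

        public synchronized void doStep() {
            while (active < maxNodes && nextCut < hi) {   // a node is available
                active++;
                jobs.add(new Node(this, nextCut++));
            }
            if (active == 0) {                            // nothing in flight or left
                double cost = (lo == hi) ? 0.0 : best;
                if (parent != null) parent.outputDone(cost);
                else { result = cost; finished.countDown(); }
            }
        }

        synchronized void nodeDone(double cost) {
            best = Math.min(best, cost);
            active--;
            doStep();                                     // refill the node set or finish
        }
    }

    /** Evaluation job: one unit design, a sharp split after component 'cut'. */
    class Node implements Job {
        final Problem owner;
        final int cut;
        double cost;
        int pending = 2;         // output streams whose sub-problems are unsolved

        Node(Problem owner, int cut) { this.owner = owner; this.cut = cut; }

        public synchronized void doStep() {
            cost = owner.hi - owner.lo + 1;               // evaluate(): hypothetical cost
            jobs.add(new Problem(owner.lo, cut, this));   // one problem per output stream
            jobs.add(new Problem(cut + 1, owner.hi, this));
        }

        synchronized void outputDone(double subCost) {
            cost += subCost;
            if (--pending == 0) owner.nodeDone(cost);
        }
    }

    static double solve(int components, int threads, int maxNodes) {
        IeSmpSketch s = new IeSmpSketch(maxNodes);
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(() -> {
                try { while (true) s.jobs.take().doStep(); }
                catch (InterruptedException e) { /* shutdown requested */ }
            });
            pool[i].start();
        }
        s.jobs.add(s.new Problem(1, components, null));
        try {
            s.finished.await();                           // root problem completed
            for (Thread t : pool) t.interrupt();
            for (Thread t : pool) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return s.result;
    }

    public static void main(String[] args) {
        System.out.println(solve(4, 2, 2));   // 4 components, 2 threads, 2 nodes per problem
    }
}
```

Notifications in this sketch only ever travel from a job to the job that created it, so locks are always taken in the child-to-parent direction, which is what keeps the nested synchronized calls free of deadlock. Raising maxNodes puts more node jobs in the queue at once, at the cost of a broader search front.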

3. RESULTS

The primary motivation for the use of multiprocessing is to reduce computational times. Historically, in process synthesis, simple models have been used to alleviate computational resource problems. Recently, however, the need for high fidelity modelling has become apparent, especially for design for controllability, operability and reliability.

One of our interests is the design of operable processes and the application of synthesis procedures to this problem. Therefore, the sample problem comes from that area. We are interested in generating process flowsheets that exhibit good behaviour in a variety of aspects: economic performance (capital and operating costs), maximum deviation from steady state due to disturbances in the feed stream, and the time to steady state from start-up. Using the multicriteria feature of Jacaranda, all three aspects can be considered simultaneously. The need for dynamic models places a significant demand on computational resources. Therefore, this problem is ideal for parallel computation.

The implementation of the SMP version of Jacaranda is based on version 1.1 of Java. Although the majority of development has been undertaken on a Linux system, the resulting code is indeed write once, run anywhere, as Sun claims. The Java language provides the basic requirements for defining, using and manipulating multiple threads within a single program. The basics for synchronisation are also provided. However, performance can vary between different architectures.

To demonstrate the algorithm, we first consider a small three component separation problem. Separation is based on distillation which is modelled using a rigorous tray by tray procedure. Figure 3(a) shows the task allocation for the problem both sequentially (the bottom of the three time graphs) and in parallel (the two top graphs, each of which corresponds to one processor on a dual processor Pentium II Xeon 450 MHz system).
These timing results were generated using Linux 2.2.5-15smp as distributed with Red Hat 6.0. The parallel implementation is not particularly effective at reducing the computational time involved. There are four instances of unit model designs in this problem, as indicated on the figure:

1. separation of component A from BC,
2. separation of AB from C,
3. separation of A from B, and
4. separation of B from C.

The same task takes longer in the parallel version than in the sequential case. Analysis of the processor usage during the parallel run shows that the system seldom reaches 100% utilisation. It is important to note that the times recorded are wall-clock or elapsed times as opposed to actual CPU time.

At first glance, one could assume that the reduction in computational efficiency (on a per processor basis) is due to the new algorithm's implementation. For instance, the new algorithm does result in an increase in object management, purely on the basis of having a queue of jobs to handle. However, using the same hardware, we have solved the same problem using MS Windows NT and the results are shown in Figure 3(b). The sequential version is about 10% faster on Windows NT as compared with Linux. Version 1.2 of Java from Sun was used in Windows NT and version 1.1.8 from IBM on Linux.

The interesting result is that the parallel version on Windows does make efficient use of the two processors. The elapsed time reduces from 350 seconds to a little over 200. We do not expect 100% efficiency (i.e. a reduction to 180 or so seconds) for a problem with so few tasks due to the coarse granularity of the task definition. The labelling on the graph shows that each task takes approximately the same amount of time either sequentially or in parallel.

One possible conclusion is that the handling of threads is better on Windows NT than on Linux. This can be due to either the operating system or the actual Java run-time system.
Indications from the current work on the Linux kernel are that it is the former that is responsible for the loss of efficiency on Linux. The next version of the kernel for Linux is promised to be more efficient for symmetric multiprocessing systems. In fact, some tests with the latest development Linux kernel (version 2.3.33) show an improvement of approximately 15-20% over the current stable version. Further improvements are expected so it will be interesting to re-visit these results in the coming year.

An interesting side-effect noticed on Windows NT is that one processor appears to be faster than the other. In particular, tasks on the "first" processor (middle time line in the figure) are completed in approximately the same amount of time as in the sequential approach. Tasks on the second processor, however, take a little longer (see tasks 2 and 3). There are two possible explanations for this and both are based on how the scheduler in Windows NT works: either the scheduler is biased towards the first processor, or the garbage collecting thread in Java is allocated to one processor and remains there for the life of the program.

It should also be noted that the sequential approach does make use of the multiple processors available on the system. The Java runtime system uses multiple threads and so the garbage collection thread, for instance, will typically use the idle processor.

A larger 5 component separation problem has also been solved. The timings for the NT version are shown in Figure 4(a). Again, the total elapsed time has been reduced significantly. However, it can be seen that the second processor is idle for a significant amount of time at around 1500 seconds into the problem. This is because the job queue has emptied and no new jobs are generated until the job currently being processed by the first processor finishes.

One parameter that can affect this behaviour is the number of simultaneous implicit enumeration nodes to generate for any given sub-problem.
Figure 4(a) corresponds to the generation of just 2 nodes for each sub-problem. If we increase this number, this will have the effect of increasing the number of jobs sitting in the job queue at any moment. Figure 4(b) shows what happens if we increase the number of available implicit enumeration nodes to 4.

[Figure 4: task timelines for processors P1 and P2 and the sequential run Seq against Elapsed Time (s), 0 to 4000. (a) With 2 simultaneous implicit enumeration nodes. (b) With 4 simultaneous implicit enumeration nodes.]

Fig. 4. Timings for 5 component separation problem on Windows NT.

The gap disappears and the overall elapsed time decreases accordingly. In general, increasing the number of nodes leads to a broader front through the search graph and thereby reduces the amount of pruning that may be possible. For the problem discussed in this paper, this is not an issue as the search is essentially exhaustive due to the multi-criteria nature of the ranking.
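As a worked check on the three component timings reported above for Windows NT, the speedup and parallel efficiency can be estimated from the quoted elapsed times. The symbols S (speedup), E (efficiency) and p (number of processors) are introduced here for illustration only, and the parallel time is taken as 200 s although the text says "a little over 200", so both figures are slight overestimates.

```latex
\[
S = \frac{T_{\mathrm{seq}}}{T_{\mathrm{par}}} \approx \frac{350\ \mathrm{s}}{200\ \mathrm{s}} = 1.75,
\qquad
E = \frac{S}{p} \approx \frac{1.75}{2} \approx 0.88
\]
```

Perfect efficiency on two processors would correspond to an elapsed time of 350/2 = 175 s, in line with the "180 or so seconds" quoted in the text.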

4. CONCLUSIONS

Symmetric multiprocessing is increasingly affordable and provides an easy route for increasing the computational resource available on engineers' desks. The use of Java, with its built-in support for multithreading, enables the use of multiple processors in a shared memory architecture. Together the result is an easy to use parallel computing resource.

Synthesis can be a computationally demanding application and the Jacaranda system has shown how it is possible to use SMP systems for interesting synthesis problems. Symmetric multiprocessing provides a shared memory architecture which makes it possible to implement parallel algorithms which inherit the positive features of the sequential algorithm.

Java provides a good framework for multithreaded applications. The promise of write once, run anywhere is achieved in terms of code portability but not necessarily in terms of performance. Improvements in the underlying thread support in some systems are needed before multithreaded applications can be truly portable across the wide range of systems used in practice.

REFERENCES

1. Fraga, E S, 1998, Chem Eng Res Des 76(A1) 45-54.
2. Fraga, E S & K I M McKinnon, 1994, Chem Eng Res Des 72(A3) 389-394.
3. Fraga, E S & K I M McKinnon, 1994, Computers chem. Engng 18(1) 1-13.
4. Fraga, E S & K I M McKinnon, 1995, Computers chem. Engng 19(6/7) 759-773.
5. Grossmann, I E, J A Caballero & H Yeomans, 1999, Korean J Chem Eng 16(4) 407-426.
6. Steffens, M A, E S Fraga & I D L Bogle, 1999, Computers chem. Engng 23(10) 1455-1467.