
Electric Power Systems Research 65 (2003) 169–177 www.elsevier.com/locate/epsr

Distributed algorithms with theoretic scalability analysis of radial and looped load flows for power distribution systems

Fangxing Li *, Robert P. Broadwater

ECE Department, Virginia Tech, Blacksburg, VA 24060, USA

* Corresponding author. Tel.: +1-919-807-5707; fax: +1-919-807-5060. E-mail addresses: [email protected], [email protected] (F. Li).

Received 15 April 2002; received in revised form 14 December 2002; accepted 16 December 2002

Abstract

This paper presents distributed algorithms for both radial and looped load flows for unbalanced, multi-phase power distribution systems. The distributed algorithms are developed from tree-based sequential algorithms. Formulas of scalability for the distributed algorithms are presented. It is shown that computation time dominates communication time in the distributed model. This provides benefits to real-time load flow calculations, network reconfigurations, and optimization studies that rely on load flow calculations. Also, test results match the predictions of the derived formulas. This shows the formulas can be used to predict the computation time when additional processors are involved. © 2003 Elsevier Science B.V. All rights reserved.

Keywords: Distributed computing; Scalability analysis; Radial load flow; Looped load flow; Power distribution systems

1. Introduction

Parallel and distributed computing has been applied to many scientific and engineering computations such as weather forecasting and nuclear simulations [1,2]. It has also been applied to power system analysis calculations [3–14]. Parallel computing usually refers to computing carried out on a single machine with multiple CPUs and shared memory. Distributed computing is usually thought of as a less 'coupled' form of computing carried out among multiple, separate machines without shared memory.

Parallel schemes for power flow based on sparse matrices have been investigated in previous works [3–9] for transmission systems. These works employed factorization to utilize sparse matrices in parallel computers. Power transmission systems were the target of these works. The previous work [10] presented a parallel power flow method for radial distribution systems. Like [3–9], a Jacobian matrix was employed. Also, the method presented in Ref. [10] was tested on radial distribution systems with no more than 528 buses.

More recent works [11–14] presented distributed implementations for power flows or power-flow-based algorithms like optimizations and contingency analysis. These works also targeted power transmission systems. Instead of using shared memory, a scheme of message passing was used.

This paper discusses distributed algorithms to calculate load flows for radial and looped power distribution systems based on tree traverses. The algorithms are implemented on up to 8 workstations distributed in an Ethernet LAN. Ways in which this work differs from previous works are now considered:

(1) Previous works presented scalability based on test results. This work not only presents test results of scalability, but also presents theoretic scalability formulas from the point of view of algorithm analysis (AA). AA [1,2] is an approach to explore the performance of algorithms from the point of view of mathematics. Scalability, including speedup and efficiency, is the most important measurement of performance for parallel and distributed algorithms. It is a function of n and p, where n is the number of elements in a power distribution system, and p is the number of processors involved in parallel or distributed schemes. This paper explores theoretic scalability analysis and estimates the scalability for the proposed distributed algorithms. Estimations will be checked with test results.

(2) Previous works were based on dedicated parallel/distributed architectures, while this work is based on Ethernet LANs that are widely available. Some previous parallel solutions of load flows were implemented on dedicated and/or expensive high-performance parallel computers [4–10], such as the Sequent Symmetry S81, iPSC/2, and Transputer. Other previous work utilized specific software like the parallel virtual machine [11–13] or a specific virtual architecture [14] like a ring or hypercube. Here we use existing, generic computer networks like Ethernet LANs, without any extra hardware or software, to fulfill distributed solutions for load flows. This makes the proposed algorithms very practical since Ethernet LANs are widely available. The communication delay in a LAN is generally longer than in a specific architecture, but there is no additional cost. Fig. 1 shows the Ethernet LAN architecture. Since there is no shared memory or storage, message passing through the standard TCP/IP protocol is used to exchange information among different machines. Here a multi-instruction, multi-data (MIMD) model is used in the distributed algorithm. Also, a multithreading approach is employed to fully utilize the concurrency of hardware.

(3) Previous works used matrix-based load flows, while this work uses a tree-based approach for unbalanced, multi-phase distribution systems, including both radial and looped systems. Previous works employed the sparse/Jacobian matrix to solve load flow or load-flow-based algorithms for transmission systems [3–9,11–14] and radial distribution systems [10]. This work utilizes a tree-based approach to solve the unbalanced, multi-phase load flow for radial and looped distribution systems, based on the previous works [15–19], especially the previous work [19]. Compared with the matrix approach, the tree-based approach may reduce the complexity of software implementation for distribution systems [20–22]. The reason is that the matrix approach needs to model nodes and edges for the system topology, while the tree-based approach only needs to model edges. In addition, with the tree-based approach, the looped load flow can be viewed as inherited from the radial load flow. Hence, development and implementation work can be reduced in practice. Since the distributed algorithms presented in this work are based on the (sequential) algorithms in the previous works [15–19], the advantages of the tree-based approach flow naturally to the distributed algorithms.

(4) Previous works focused on power transmission systems or small distribution systems, while this work focuses on power distribution systems having a much larger system scale. The number of elements (including customer loads) in a large-scale power distribution system model may well exceed the number of elements in a transmission system model. The previous works [3–9,12–14] were tested on transmission systems with at most two thousand elements. The previous work [10] handled small radial distribution systems with less than 600 elements. This work is tested with radial and looped distribution systems consisting of approximately 40 000 elements.

Section 2 presents a general approach for distributed algorithms of tree-based systems. Section 3 presents a distributed algorithm for radial load flow. Section 4 presents a distributed algorithm for looped load flow. Section 5 employs techniques in AA to derive formulas of the scalability (speedup and efficiency) of the distributed algorithms discussed in Sections 3 and 4, and Section 6 presents test results of a distribution system with approximately 40 000 elements.

2. General approach for distributed algorithms of tree-based analysis

2.1. Controller and workers

The distributed algorithm defines two types of processes, controllers and workers, which play different roles in the message passing scheme. A controller is a process with the following responsibilities:

- To accept user input
- To assign initial data (a job) to workers
- To invoke workers to execute tasks
- To send/receive intermediate data to/from workers
- To do system-wide, non-intensive calculations
- To terminate the algorithm at the appropriate time and notify workers to stop.

A worker is a process with the following responsibilities:

- To receive initial data from a controller
- To do intensive calculations upon a controller's request
- To send/receive intermediate data to/from a controller
- To stop when notified.

Fig. 1. An Ethernet LAN with multiple workstations.

In the approach here there is only one controller, while there are multiple workers. The controller is mainly responsible for coordinating all workers and maybe some non-intensive computations, while workers are responsible for intensive computations. The controller has system-wide knowledge, while a worker has knowledge of only a part of the system, i.e., the job assigned by the controller. In this sense, a controller behaves like a manager in a company, while a worker behaves like a salesman or an engineer.

2.2. General approach

Fig. 2 illustrates the steps of the general approach using unified modeling language (UML) notation [23], a standard for software modeling. In this figure, CController is the controller class and CWorker is the worker class. Also, c is an object of class CController, and w is an object of class CWorker. Each directed line represents a message or an activity, ordered by its sequence in time. The name of each message or activity is self-explanatory. Also, an asterisk (*) indicates a repeated activity.

Fig. 2. General approach of the distributed computing model for a tree-based system.

2.3. Approach of tree partition

In Step 1, a tree is partitioned into p sub-trees, or p tasks. Again, p is the number of processors. Here, the minimum unit of a task is a feeder from a substation (source). So there must be at least p feeders to fully utilize all processors. Since the proposed distributed algorithm aims to solve large-scale distribution systems, this condition is usually satisfied.

If there are k (k ≥ p) feeders, the following scheme is used to evenly assign tasks to each processor (a sketch follows this list):

- Sort all feeders based on the number of components, so that the 1st feeder contains the most components, while the kth feeder contains the least.
- Assign the 1st feeder to the 1st processor, the 2nd feeder to the 2nd processor, ..., and the pth feeder to the pth processor.
- Assign the (p+1)th feeder to the pth processor, the (p+2)th feeder to the (p−1)th processor, ..., and the (2p)th feeder to the 1st processor.
- Assign the (2p+1)th feeder to the 1st processor, the (2p+2)th feeder to the 2nd processor, and so on.
- Stop when all k feeders are assigned.
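As a concrete illustration, the following minimal sketch implements the sorted, back-and-forth ("serpentine") distribution of feeders over processors described above. The Feeder structure and function name are illustrative assumptions; the paper does not publish its data structures.

```cpp
// Minimal sketch of the serpentine feeder-assignment scheme of Section 2.3.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Feeder {
    int id;
    std::size_t numComponents;  // number of elements in the feeder
};

// Returns assignment[i] = the list of feeders given to processor i (0-based).
std::vector<std::vector<Feeder>> assignFeeders(std::vector<Feeder> feeders,
                                               std::size_t p) {
    // Sort so the 1st feeder has the most components, the kth the least.
    std::sort(feeders.begin(), feeders.end(),
              [](const Feeder& a, const Feeder& b) {
                  return a.numComponents > b.numComponents;
              });
    std::vector<std::vector<Feeder>> assignment(p);
    for (std::size_t i = 0; i < feeders.size(); ++i) {
        std::size_t pass = i / p;     // which sweep over the processors
        std::size_t offset = i % p;
        // Even sweeps run 1st..pth processor; odd sweeps run pth..1st,
        // which balances the component counts across processors.
        std::size_t proc = (pass % 2 == 0) ? offset : p - 1 - offset;
        assignment[proc].push_back(feeders[i]);
    }
    return assignment;
}
```

Alternating the sweep direction pairs large feeders with small ones, so each processor receives roughly the same total number of components.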

3. Distributed algorithm for radial load flow

3.1. Sequential algorithm based on tree approach

The term sequential is used here to indicate one processor. The radial load flow algorithm used here is based on tree traverses [15–20]. Compared with traditional load flow algorithms like Gauss–Seidel and Newton–Raphson, this approach suits power distribution systems. First, it is designed for unbalanced, multi-phase models with large R/X ratios, so that it has good convergence characteristics. Secondly, it can be extended to networked distribution systems as discussed in Section 4.

The sequential algorithm is carried out by several iterations. Every iteration consists of a backward traverse, followed by a forward traverse. The backward traverse calculates the currents through elements. The forward traverse calculates the voltages across elements. These calculations are represented by the following equations, respectively:

I = Σ_m I_m + (S_load / V_2)*   (1a)

V_2 = V_1 − Z·I   (1b)

where I = current through the element; m = all preceding downstream elements; S_load = load attached to the element; V_2 = voltage at the downstream node of the element; V_1 = voltage at the upstream node of the element; Z = the impedance of the element.

Note: for unbalanced three-phase systems, I, S, and V are 3×1 vectors and Z is a 3×3 matrix.

The sequential algorithm for the tree-based radial load flow is given as follows (a code sketch follows the steps):

1) Starting from an ending element, backward traverse the tree element-by-element. Eq. (1a) is applied to calculate the current for each element.
2) Starting from the root element, forward traverse the tree element-by-element. Eq. (1b) is applied to calculate the voltages for each element.
3) Check the convergence criteria. If converged, stop; otherwise, go back to Step 1.
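To make Eqs. (1a) and (1b) concrete, here is a minimal single-phase sketch of the sequential backward/forward sweep. The element layout (a root-first array with parent indices) and all names are illustrative assumptions, not the authors' implementation.

```cpp
// Single-phase ladder (backward/forward sweep) load flow, Eqs. (1a)-(1b).
#include <algorithm>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

struct Element {
    int parent;      // index of the upstream element, -1 for the root
    Complex Z;       // series impedance of the element
    Complex Sload;   // complex power of the load at the downstream node
    Complex V;       // voltage at the downstream node (V2 in Eq. (1b))
    Complex I;       // current through the element
};

// `elems` must be stored in root-first topological order, so iterating in
// reverse visits every element after all of its downstream descendants.
void radialLoadFlow(std::vector<Element>& elems, Complex sourceV,
                    double tol = 1e-6, int maxIter = 50) {
    for (auto& e : elems) e.V = sourceV;  // flat start

    for (int it = 0; it < maxIter; ++it) {
        // Backward traverse, Eq. (1a): start each element with its load
        // current (S/V)*, then accumulate downstream currents into parents.
        for (auto& e : elems) e.I = std::conj(e.Sload / e.V);
        for (std::size_t i = elems.size(); i-- > 0;)
            if (elems[i].parent >= 0)
                elems[elems[i].parent].I += elems[i].I;

        // Forward traverse, Eq. (1b): V2 = V1 - Z*I, root first.
        double maxChange = 0.0;
        for (auto& e : elems) {
            Complex v1 = (e.parent >= 0) ? elems[e.parent].V : sourceV;
            Complex v2 = v1 - e.Z * e.I;
            maxChange = std::max(maxChange, std::abs(v2 - e.V));
            e.V = v2;
        }
        if (maxChange < tol) break;  // Step 3: convergence criterion
    }
}
```

Storing the tree as a root-first array means both traverses are simple linear passes, which is one reason the tree-based approach scales well to tens of thousands of elements.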

3.2. Distributed algorithm for radial load flow

A distributed algorithm for radial load flow is now presented. It is based upon the sequential algorithm for radial load flow.

The UML sequence diagram in Fig. 3 illustrates the algorithm. In this diagram, c is an object of class CController, and w1 and w2 are two objects of class CWorker. Object c initializes the algorithm by breaking a system into different sub-trees (feeders) and sending tasks to w1 and w2 simultaneously. The messages with the same index numbers occur concurrently.

After the job assignment, w1 and w2 start backward traverses concurrently. Then, w1 and w2 send the current information at their sub-tree roots to c, so that c can compute voltage drops at substations. Then, c sends voltage information at the sub-tree roots to w1 and w2 concurrently to initialize voltage calculations. Next, w1 and w2 start forward traverses concurrently. After the convergence criteria are met in both sub-trees handled by w1 and w2, respectively, the algorithm terminates.

Fig. 3. Sequence diagram for the radial load flow distributed algorithm.
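The following is a minimal sketch of one controller iteration in this message sequence. WorkerLink is a hypothetical stand-in for a TCP/IP channel to one worker, with stub methods marking where the real serialized messages would flow; the substation voltage-drop calculation shown is a single-phase simplification.

```cpp
// Sketch of the controller's per-iteration coordination (cf. Fig. 3).
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

struct WorkerLink {
    Complex rootCurrent{0.0, 0.0};        // filled in by the real receive code
    Complex feederImpedance{0.01, 0.02};  // substation-to-feeder-root impedance
    bool localConverged = false;
    Complex receiveRootCurrent() { return rootCurrent; }  // stub for recv()
    void sendRootVoltage(Complex) {}                      // stub for send()
    bool receiveConverged() { return localConverged; }    // stub for recv()
};

// One iteration: gather sub-tree root currents from the backward traverses,
// compute the substation voltage drops (the controller's system-wide,
// non-intensive calculation), then send each worker its feeder-root voltage
// so the forward traverses can proceed concurrently.
bool controllerIteration(std::vector<WorkerLink>& workers, Complex substationV) {
    std::vector<Complex> rootI;
    for (auto& w : workers)
        rootI.push_back(w.receiveRootCurrent());
    bool allConverged = true;
    for (std::size_t i = 0; i < workers.size(); ++i) {
        Complex v = substationV - workers[i].feederImpedance * rootI[i];
        workers[i].sendRootVoltage(v);
        allConverged = allConverged && workers[i].receiveConverged();
    }
    return allConverged;  // when true, the controller notifies workers to stop
}
```

Note that only one complex current and one complex voltage cross the network per worker per traverse, which is why the per-iteration communication term in Section 5 is proportional to p rather than to n.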

4. Distributed algorithm for looped load flow

4.1. Sequential algorithm based on sensitivity matrix

In power distribution systems, there may be closed switches that connect two or more radial power distribution feeders. These closed switches create loops in the systems. They are referred to here as co-tree elements [19,20]. The number of co-tree elements in a distribution network may approach 100. To calculate the load flow in this scenario, the previous works [17–19] presented different but similar algorithms based on compensations at co-tree elements. The method in Ref. [19] utilized approximate and easy-to-compute sensitivity (compensation) matrices at co-tree elements, as opposed to the methods in Refs. [17,18] that required the actual and more complex impedance sensitivity matrix. The method in Ref. [19] is used as the sequential algorithm here and will be transformed into a distributed algorithm in Section 4.2. This sequential algorithm is based on the sequential radial load flow presented in Section 3.1 in conjunction with a sensitivity matrix. The description of this sequential algorithm follows.

1) Break the network into a number of radial trees by disconnecting the co-tree elements, usually switches.
2) Converge each radial tree using the radial load flow method.
3) Inject one ampere into the radial circuits at each co-tree element.
4) Converge each radial tree using the radial load flow method.
5) Calculate the sensitivity matrix elements as the change in voltage divided by the current injection using Eq. (2).
6) Calculate changes in co-tree or loop current flows that force co-tree voltage drops to zero.
7) Treat co-tree current flows as injected load currents for associated distribution feeders.
8) Converge each of the connecting circuits using the radial load flow.
9) Check convergence. If converged, stop; else, go to Step 5.

The sensitivity matrix of each co-tree element is given as follows:

[S] = d[ΔV_a  ΔV_b  ΔV_c]^T × d[(I_a)^−1  (I_b)^−1  (I_c)^−1]   (2)

where S = 3×3 sensitivity matrix; ΔV_i = voltage drop at the co-tree element at phase i, i = a, b, and c; d[ΔV...] = 3×1 vector of the change of ΔV between two iterations; (I_i)^−1 = reciprocal of the co-tree node injection current at phase i, i = a, b, and c; d[I...] = 1×3 vector of the change of (I_i)^−1 between two iterations.

Since this sensitivity-based algorithm is highly dependent on the radial load flow, it can be easily implemented by extending the radial algorithm in Section 3. Also, this algorithm is highly parallelizable since the key idea is to break the looped tree into many radial trees. Each radial tree can then be solved in parallel with the other radial trees.

4.2. Distributed algorithm for looped load flow

A distributed version of the above algorithm is presented in this section. It is similar to the key idea of the distributed version of the radial load flow in Section 3. Assume that there is a unique controller and p workers. The step-by-step description of the algorithm follows.

1) The controller breaks the circuit into p radial sub-tree circuits by disconnecting co-tree elements. Then, the controller assigns sub-trees to each worker.
2) The controller and the workers work together until all radial load flow calculations converge. Here, the distributed radial load flow algorithm is used.
3) The controller sets the injected current at each co-tree switch to 1 A.
4) The controller and workers again complete the distributed radial load flow algorithm.
5) The controller calculates the sensitivity matrix elements as the change in voltage divided by the current injections.
6) The controller calculates the change of injection current at each co-tree element (see the sketch at the end of this section).
7) The controller and workers complete the radial load flow distributed algorithm.
8) The convergence criteria are checked. If converged, stop; otherwise, return to Step 5.

In this algorithm, the controller has more responsibility than in the distributed radial load flow algorithm. It needs to calculate the sensitivity matrix, which is required by the looped load flow. The controller is responsible for this since it has global knowledge of the system, while a worker has only local knowledge of its own sub-tree. Note that the size of the sensitivity matrix corresponds to the number of co-tree elements.
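The following single-phase sketch shows the compensation update at the heart of Steps 5–8, with the 3×3 matrix of Eq. (2) reduced to a scalar sensitivity. The types and the surrounding radial solve are assumptions for illustration only.

```cpp
// Single-phase sketch of the co-tree compensation update (Steps 5-8).
// The scalar `sensitivity` plays the role of the 3x3 matrix in Eq. (2);
// all names are illustrative assumptions.
#include <complex>
#include <vector>

using Complex = std::complex<double>;

struct CotreeElem {
    Complex injection;    // loop current injected at the open co-tree point
    Complex deltaV;       // measured voltage drop across the open point
    Complex sensitivity;  // d(deltaV)/d(injection), estimated per Eq. (2)
};

// One compensation pass: adjust each loop current so that the voltage drop
// across its co-tree element is forced toward zero (Step 6). The caller then
// treats the injections as load currents on the affected feeders (Step 7),
// re-converges the radial trees (Step 8), and re-measures deltaV.
bool compensationStep(std::vector<CotreeElem>& cotree, double tol) {
    bool converged = true;
    for (auto& c : cotree) {
        c.injection += -c.deltaV / c.sensitivity;  // change in loop current
        converged = converged && (std::abs(c.deltaV) < tol);
    }
    return converged;
}
```

Because each compensation pass reduces to independent radial solves plus this small update, the distributed version reuses the radial machinery of Section 3 almost unchanged.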

5. Convergence, complexity and efficiency

AA is employed in this section to derive theoretic formulas of scalability, including speedup and efficiency, for the proposed distributed algorithms. The formulas are used in Section 6 to predict scalability and to check against the test results.

5.1. Distributed algorithm for radial load flow

The radial load flow distributed algorithm preserves the convergence characteristics of the original sequential algorithm, because it does not change the mathematical kernel of the sequential algorithm. All it does is split the computation into small parts. Each part is carried out in a separate worker process, and the results are reassembled at the controller through message passing.

Next, the complexity, speedup, and efficiency are discussed. Here the term time refers to wall-clock time.

Assume that the sequential load flow converges within k iterations (from experiments, typically k = 4–12), each having one backward and one forward traverse. Hence, 2k traverses are needed. Also assume that the time to traverse an element is the same in the forward and backward calculations. The time complexity for the sequential algorithm, involving 2k traverses of a tree system, is given as

T_seq = 2kαn   (3)

where k = the number of forward traverses = the number of backward traverses; α = the time to backward traverse one element = the time to forward traverse one element; n = the number of elements in the system.

Assume that there are p processors for the distributed algorithm. Since the controller process is computationally lightly loaded, one processor runs the controller process and a worker process for maximum CPU usage. Each of the remaining p−1 processors runs a worker process. Hence, there are p+1 processes on p processors.

The time complexity for the distributed algorithm consists of two parts. One is for computation, and the other is for communication. Since the traverse is carried out in parallel at p workers, the complexity for computation is 1/p of the sequential execution. Hence, we have

T_dist-comp = 2kαn/p   (4)

Next consider the communication overhead during the initial job assignment. During initialization, the controller sends p sub-trees, one to each worker. Each sub-tree takes roughly n/p elements (and n/p units of information). Since the communication channel can serve only one communicating pair at a given time, the communication overhead for initialization is given as

T_dist-comm-init = p(βn/p) = βn   (5)

where β = the time of transmitting the information of one element.

The communication overhead during the computation is much smaller. In the backward traverse, each worker needs to pass one unit of current information at its sub-tree root to the controller. Then, the controller forwards this information to the other workers. Hence, the communication overhead for k backward traverses is given as

T_dist-comm-bt = k(βp + βp) = 2kβp   (6)

The forward traverse has the same communication overhead as in Eq. (6). Then, we have

T_dist-comm-ft = 2kβp   (7)

The overall communication overhead is given as

T_dist-comm = T_dist-comm-init + T_dist-comm-bt + T_dist-comm-ft = βn + 4kβp   (8)

Practically, n ≫ 4kp for p < 64. This gives

T_dist-comm ≈ βn   (9)

The overall complexity for the distributed algorithm is given by

T_dist = T_dist-comp + T_dist-comm = 2kαn/p + βn   (10)

The speedup [1] is given as

S_R = T_seq/T_dist = p/(1 + βp/(2kα))   (11)

The efficiency [1] is given as

E_R = S_R/p = 1/(1 + βp/(2kα))   (12)

5.2. Distributed algorithm for looped load flow

As stated in Section 5.1, the distributed algorithm for the looped load flow does not change the mathematical characteristics. Hence, it does not affect the convergence features. Practical experience demonstrates that the sequential program converges in about 4–6 iterations of the sensitivity matrix calculation. Please note that within each sensitivity matrix calculation, a radial load flow needs to be run. Also, the last iteration of the load flow may take fewer traverses than the first iteration of the load flow, due to the different injection currents and initial system states. The total number of backward or forward traverses falls in the range of 8–35.

Assume that there are k′ forward and k′ backward traverses in total. Therefore, there are 2k′ system traverses in the looped load flow. Then we have the complexity of the sequential algorithm given by

T_seq = 2k′αn   (13)

With the same assumption as in Section 5.1, there are p processors running p+1 processes in the distributed algorithm. Similarly, the time complexities for T_dist-comp, T_dist-comm-ft and T_dist-comm-bt are given by the following equations:

T_dist-comp = 2k′αn/p   (14)

T_dist-comm-bt = T_dist-comm-ft = 2k′βp   (15)

T_dist-comm-init = βn   (16)

T_dist-comm = T_dist-comm-init + T_dist-comm-ft + T_dist-comm-bt = βn + 4k′βp   (17)

Since n ≫ 4k′p for p < 64, T_dist-comm-ft and T_dist-comm-bt are much smaller than T_dist-comm-init. Eq. (17) can be simplified as

T_dist-comm ≈ βn   (18)

The overall complexity of the distributed algorithm is given as

T_dist = T_dist-comp + T_dist-comm = 2k′αn/p + βn   (19)

The speedup and the efficiency are given as

S_L = T_seq/T_dist = p/(1 + βp/(2k′α))   (20)

E_L = S_L/p = 1/(1 + βp/(2k′α))   (21)

Compared with Eq. (12), E_L is greater than E_R because the number of traverses k′ is 2–4 times as great as k. This matches the test results presented in the following section.
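As a worked instance of Eqs. (12) and (21), using the values measured later in Section 6 (k = 12, k′ = 31, β/α ≈ 1.63, and p = 8), the predicted efficiencies are:

```latex
% Predicted efficiencies from Eqs. (12) and (21), with k = 12, k' = 31,
% beta/alpha = 1.63 and p = 8 (values taken from Section 6):
\begin{align*}
E_R &= \frac{1}{1 + \beta p/(2k\alpha)}  = \frac{1}{1 + 1.63 \cdot 8/24} \approx 0.65,\\
E_L &= \frac{1}{1 + \beta p/(2k'\alpha)} = \frac{1}{1 + 1.63 \cdot 8/62} \approx 0.83,
\end{align*}
```

matching the 65 and 83% estimations reported in Table 3.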

6. Test results

Using a prototype of a software framework for power system analysis [22], example solutions are now presented. One set of solutions implements the sequential algorithms, and the other set implements the distributed algorithms. For simplicity, the load flow results presented are for single-phase systems.

6.1. Development and test environment

Both the sequential and distributed programs are developed with Microsoft Visual C++ 6.0 on a Windows NT 4.0 platform. Tests are run on eight machines, connected with a 10BASE-T Ethernet LAN.

The sequential load flow program is installed and tested on each machine individually.

The distributed load flow program consists of a controller part and a worker part, distributed among the above machines. One machine runs a controller process and a worker process concurrently. At each of the remaining machines only a worker process is run. These workers communicate with the controller through the TCP/IP protocol. Distributed tests are run with 2, 4, and 8 machines.

A multithreading technique is used for each worker process so that CPU and I/O are fully concurrent. That is, one thread is responsible for the mass computation, while the other thread is responsible for data communication over the Ethernet LAN. A sketch of this two-thread arrangement follows.
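The sketch below uses modern std::thread for brevity (the original Visual C++ 6.0 code would have used Win32 threads); the queue and message format are illustrative assumptions standing in for the real TCP/IP layer.

```cpp
// Two-thread worker sketch: one thread for communication, one for computation.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class MessageQueue {
    std::queue<std::string> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void push(std::string msg) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(msg)); }
        cv_.notify_one();
    }
    std::string pop() {  // blocks until a message arrives
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        std::string msg = std::move(q_.front());
        q_.pop();
        return msg;
    }
};

int main() {
    MessageQueue inbox;
    // Communication thread: in the real worker this blocks on the TCP/IP
    // socket and forwards controller messages to the computation thread.
    std::thread comm([&] { inbox.push("root-voltage 1.05"); });
    // Computation thread (here, the main thread): consumes messages and
    // runs the traverses, so CPU work and network I/O overlap.
    std::cout << "received: " << inbox.pop() << '\n';
    comm.join();
    return 0;
}
```

Overlapping the socket wait with the traverse computation is what keeps the measured communication times in Tables 1 and 2 small relative to the computation times.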

6.2. Test systems

A system containing approximately 40 000 elements is used. It is configured as shown in Fig. 4.

Fig. 4. Test system.

There are two substations, SUB1 and SUB2, in Fig. 4. Each substation has four feeders. Each feeder has an electric load of approximately 1.0 MVA. The number of elements in each feeder ranges from 4000 to 6000. This system will be broken into two, four, or eight parts to fit the distributed computing environment. The four co-tree switches are open when running the radial load flow, while they are closed when running the looped load flow.

6.3. Test results for sequential and distributed algorithms

Table 1 presents the test results of the radial load flow sequential algorithm (with one processor) and distributed algorithm (with 2, 4 and 8 processors). The load flow takes 12 iterations (i.e. k = 12) to converge.

Table 1
Results for the radial load flow

# of processors         1      2      4      8
Communication time (s)  -      1.16   1.03   1.26
Computation time (s)    25.75  12.80  6.43   3.38
Total time (s)          25.75  13.96  7.46   4.64
Speedup                 1.00   1.84   3.45   5.55
Efficiency (%)          100    92     86     69

Table 2 presents the test results of the looped load flow sequential algorithm (with one processor) and distributed algorithm (with 2, 4 and 8 processors). The load flow takes 4 iterations for the sensitivity matrix calculations (4 radial load flows, in fact). These 4 radial load flows take 12, 8, 6 and 5 backward and forward traverses to converge, respectively. That is, k′ equals 31 in this case. Again, the time in Tables 1 and 2 is wall-clock time.

Table 2
Results for the looped load flow

# of processors         1      2      4      8
Communication time (s)  -      0.97   1.20   1.43
Computation time (s)    61.93  32.47  16.67  7.87
Total time (s)          61.93  33.44  17.87  9.30
Speedup                 1.00   1.85   3.47   6.66
Efficiency (%)          100    93     87     83

The ratio β/α can be calculated using Eqs. (12) and (21) for the cases with {2, 4, 8} processors. These ratios are {1.04, 0.96, 1.32} from Table 1 and {2.51, 2.37, 1.56} from Table 2. The average of β/α is 1.63. Applying this value to Eqs. (12) and (21) again, the efficiency is estimated as shown in Table 3. This table also gives the errors of the estimated efficiency with respect to the efficiency from the tests. Similar results can be calculated for speedup.

Table 3
Efficiency from test results and estimations using β/α = 1.63

# of processors                             2     4     8
Distributed algorithm of radial load flow
  Test results (%)                          92    86    69
  Estimations (%)                           88    79    65
  Error (%)                                 -4.3  -8.1  -5.3
Distributed algorithm of looped load flow
  Test results (%)                          93    87    83
  Estimations (%)                           95    91    83
  Error (%)                                 2.2   4.6   0.0
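For instance, rearranging Eq. (11) and inserting the measured radial speedup for two processors (k = 12, p = 2, S_R = 1.84 from Table 1) recovers the first of the three Table 1 ratios quoted above:

```latex
% Back-solving beta/alpha from Eq. (11) with k = 12, p = 2, S_R = 1.84:
\[
\frac{\beta}{\alpha} = \frac{2k}{p}\left(\frac{p}{S_R} - 1\right)
  = \frac{24}{2}\left(\frac{2}{1.84} - 1\right) \approx 1.04 .
\]
```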

6.4. Results analysis

From Tables 1–3 the following observations can be drawn for the distributed algorithms.

- The overall time to complete the load flow calculation is significantly decreased with the distributed algorithms.
- The computation time dominates the overhead time of job assignment. So, the proposed distributed algorithms tend to be efficient.
- The computation time decreases almost linearly with the number of processors.
- The derived scalability formulas approximately predict the scalability of the proposed distributed algorithms.

The differences observed in Tables 1 and 2 are now considered. The looped load flow is more computationally intensive than the radial load flow, while the job assignment time for the two cases is approximately the same. Therefore, the job assignment time counts for much less in the looped load flow. So, a better speedup and efficiency is achieved for the looped load flow. This is also shown in Fig. 5, which displays the efficiency curves of the distributed algorithms for the radial load flow and the looped load flow. The looped load flow has a flatter curve as the number of processors goes up.

Fig. 5. Efficiency comparison between the distributed algorithms of radial load flow and looped load flow.

If the above results are compared with the results presented in the previous works [3–10], especially the results in Ref. [10] that were tested on small radial distribution systems, this work demonstrates improved speedup and efficiency. The reasons for this improvement may be attributed to the following:

- This work deals with 40 000 elements, while previous works handled only hundreds of elements. Hence, each processor in this work needs to process many more elements, and the ratio of computation time versus communication time in this work is much greater than in the previous works. That is, the portion of communication overhead in this work is smaller than that in the previous works.
- This work uses an efficient multithreading approach for direct communication with TCP/IP among different processes, while the previous works may have additional communication overhead due to add-on software modules.

additional communication overhead due to add-on 3 (4) (1988) 1471/1478. software modules. [4] K. Lau, D.J. Tylavsky, A. Bose, Coarse grain in parallel triangular factorization and solution of power system

matrices, IEEE Trans. Power Syst. 6 (2) (1991) 708/714. [5] G. Huang, W. Ongsakul, Speedup and overhead 7. Conclusions analysis of Gauss/Seidel type algorithms on a sequent balance machine, IEE Proc. Part C 141 (5) (1994) 437/444. The following conclusions can be drawn: [6] J.Q. Wu, A. Bose, Parallel solution of large sparse matrix equations and parallel power flow, IEEE Trans. Power Syst. 10 . Tree-based load flow for unbalanced, multi-phase (3) (1995) 1343/1349. power distribution systems is a computationally [7] J.Q. Wu, A. Bose, A new successive relaxation scheme for the W- matrix solution method on a shared memory parallel computer, intensive problem that is suited for distributed IEEE Trans. Power Syst. 11 (1) (1996) 233/238. computing. [8] T.T. Nguyen, Load-flow parallel processing system: conjugate- . The looped load flow distributed algorithm is com- gradient neutral , Electric Power Syst. Res. putationally more intense and more efficient than the 39 (1) (1996) 73/79. distributed algorithm for radial load flow. [9] S.D. Chen, J.F. Chen, Fast load flow using multiprocessors, Int. J. Electrical Power Energy Syst. 22 (4) (2000) 231/236. . Formulas derived from AA can be used to estimate [10] Y. Fukuyama, Y. Nakanishi, H-D Chiang, Fast distribution load scalability of the radial and looped load flow algo- flow with multiprocessors, Int. J. Electrical Power Energy Syst. 18 rithms presented. (5) (1996) 331/337. [11] V.C. Ramesh, On distributed computing for on-line power system Some additional comments follow: applications, Int. J. Electrical Power and Energy Syst. 18 (8) (1996) 527/533. . The communication overhead would be even smaller [12] R. Baldick, B.H. Kim, C. Craig, Y. Luo, A fast distributed if Fast Ethernet and Gigabit Ethernet are applied for implementation of optimal power flow, IEEE Trans. Power Syst. 14 (3) (1999) 858/864. the distributed algorithms. They are 10 and 100 times [13] B.H. Kim, R. Baldick, A comparison of distributed optimal faster, respectively, than 10BASE-T Ethernet. power flow algorithm, IEEE Trans. Power Syst. 15 (2) (2000) . The distributed algorithms are more efficient for 599/604. multi-phase load flows, because their computational [14] J.R. Santos, A.G. Exposito, J.L.M. Ramos, Distributed contin- scale is up to 9 times as large as that of single-phase gency analysis: practical issues, IEEE Trans. Power Syst. 14 (4) (1999) 1349/1354. load flows. Hence, in the multi-phase load flows, [15] W.H. Kersting, D.L. Mendive, Application of Ladder Network computation time dominates communication time Theory to the Solution of Three-Phase radial load flow Problems, more than in the case of single-phase load flows. Proceedings of IEEE PES Winter Meeting, New York, 1976. F. Li, R.P. Broadwater / Electric Power Systems Research 65 (2003) 169/177 177

References

[1] V. Kumar, A. Grama, A. Gupta, G. Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms, Benjamin-Cummings/Addison-Wesley, 1994.
[2] U. Manber, Introduction to Algorithms: A Creative Approach, Addison-Wesley, 1989.
[3] A. Abur, A parallel scheme for the forward/backward substitutions in solving sparse linear equations, IEEE Trans. Power Syst. 3 (4) (1988) 1471-1478.
[4] K. Lau, D.J. Tylavsky, A. Bose, Coarse grain scheduling in parallel triangular factorization and solution of power system matrices, IEEE Trans. Power Syst. 6 (2) (1991) 708-714.
[5] G. Huang, W. Ongsakul, Speedup and synchronization overhead analysis of Gauss-Seidel type algorithms on a Sequent Balance machine, IEE Proc. Part C 141 (5) (1994) 437-444.
[6] J.Q. Wu, A. Bose, Parallel solution of large sparse matrix equations and parallel power flow, IEEE Trans. Power Syst. 10 (3) (1995) 1343-1349.
[7] J.Q. Wu, A. Bose, A new successive relaxation scheme for the W-matrix solution method on a shared memory parallel computer, IEEE Trans. Power Syst. 11 (1) (1996) 233-238.
[8] T.T. Nguyen, Load-flow parallel processing system: conjugate-gradient neural network, Electric Power Syst. Res. 39 (1) (1996) 73-79.
[9] S.D. Chen, J.F. Chen, Fast load flow using multiprocessors, Int. J. Electrical Power Energy Syst. 22 (4) (2000) 231-236.
[10] Y. Fukuyama, Y. Nakanishi, H.-D. Chiang, Fast distribution load flow with multiprocessors, Int. J. Electrical Power Energy Syst. 18 (5) (1996) 331-337.
[11] V.C. Ramesh, On distributed computing for on-line power system applications, Int. J. Electrical Power Energy Syst. 18 (8) (1996) 527-533.
[12] R. Baldick, B.H. Kim, C. Craig, Y. Luo, A fast distributed implementation of optimal power flow, IEEE Trans. Power Syst. 14 (3) (1999) 858-864.
[13] B.H. Kim, R. Baldick, A comparison of distributed optimal power flow algorithms, IEEE Trans. Power Syst. 15 (2) (2000) 599-604.
[14] J.R. Santos, A.G. Exposito, J.L.M. Ramos, Distributed contingency analysis: practical issues, IEEE Trans. Power Syst. 14 (4) (1999) 1349-1354.
[15] W.H. Kersting, D.L. Mendive, Application of ladder network theory to the solution of three-phase radial load flow problems, Proceedings of the IEEE PES Winter Meeting, New York, 1976.
[16] R.P. Broadwater, A. Chandrasekaran, C.T. Huddleston, A.H. Khan, Power flow analysis of unbalanced multiphase radial distribution systems, Electric Power Syst. Res. 14 (1988) 23-33.
[17] D. Shirmohammadi, H.W. Hong, A. Semlyen, G.X. Luo, A compensation-based power flow method for weakly meshed distribution and transmission networks, IEEE Trans. Power Syst. 3 (2) (1988) 753-762.
[18] G.X. Luo, A. Semlyen, Efficient load flow for large weakly meshed networks, IEEE Trans. Power Syst. 5 (4) (1990) 1309-1316.
[19] M. Ellis, The Ladder Load Flow Method Extended to Distribution Networks, Ph.D. Dissertation, Department of Electrical Engineering, Virginia Tech, July 1994.
[20] R. Broadwater, J. Thompson, M. Ellis, H. Ng, N. Singh, D. Loyd, Application programmer interface for the EPRI distribution engineering workstation, IEEE Trans. Power Syst. 10 (1) (1995) 499-505.
[21] R.E. Brown, Electric Power System Reliability, Marcel Dekker, 2001.
[22] F. Li, A Software Framework for Advanced Power System Analysis: Case Studies in Networks, Distributed Generation, and Distributed Computation, Ph.D. Dissertation, Dept. of Electrical and Computer Engineering, Virginia Tech, 2001.
[23] R. Pooley, P. Stevens, Using UML: Software Engineering with Objects and Components, Addison-Wesley, 1999.

Biographies

Fangxing Li received his BS and MS degrees in Electric Power Engineering from Southeast University, Nanjing, China, in 1994 and 1997, respectively. He received his Ph.D. degree in Electrical and Computer Engineering from Virginia Tech in 2001. He is presently a Senior R&D Engineer at ABB Inc., where he specializes in computer applications in power systems, including software architecture and distributed computing.

Robert Broadwater is a power systems and software engineering Professor at Virginia Tech, where he teaches courses in applied software engineering and large-scale software development. Dr Broadwater works in the area of computer-aided engineering for electrical distribution system analysis, design, and operations.