AMS 530: PRINCIPLES OF PARALLEL COMPUTING

PROJECT 3: TRAVELLING SALESMAN PROBLEM

Submitted by

ASHA NAYAK

PROJECT DESCRIPTION:

Problem Description:

A traveling salesman must travel to 20 cities (visiting each at least once) of a country whose borders form a rectangle of length 3000 miles (east-west) and width 1500 miles (south-north). First, 20 such cities must be randomly and uniformly generated; then the salesman must start traveling from the most northeast city (defined as the city closest to the northeast corner of the country) to the remaining 19 cities. The aim of the project is to find a path that has the shortest total distance.

Solution:

The Travelling Salesman Problem (TSP) is a deceptively simple combinatorial problem. It can be stated very simply:

A salesman spends his time visiting n cities (or nodes) cyclically. In one tour he visits each city just once, and finishes up where he started. In what order should he visit them to minimize the distance travelled?

An assumption made is that this is a symmetric TSP - that is, for any two cities A and B, the distance from A to B is the same as that from B to A. In this case the tour length remains the same on reversing the order in which the cities are visited. For the symmetric case there are (n-1)!/2 distinct solutions for an n-city TSP. The number of solutions becomes extremely large for large n, so that an exhaustive search is impractical. The problem has some direct importance, since quite a lot of practical applications can be put in this form. It also has a theoretical importance in complexity theory, since the TSP is one of the class of "NP-complete" combinatorial problems. NP-complete problems are intractable in the sense that no one has found any really efficient way of solving them for large n. They are also known to be more or less equivalent to each other; if you knew how to solve one kind of NP-complete problem efficiently, you could solve them all.

The approach followed here is an imitation of the brute-force method, with the goal of eliminating the large time complexity inherent in that approach. Each processor computes the path length for a set of sequences and records the best solution amongst them; thus each processor exhausts a portion of the solution space. Finally the central processor chooses the shortest path amongst these local best solutions.

Specifically, in this project, the following 3 steps were performed:

(1) The CPU time necessary to find the shortest path using one processor is computed and then used as the benchmark for the runs with multiple processors.

(2) Then, using P = 2, 4, 8 nodes, the shortest path is computed again individually. The calculations are terminated on reaching a shortest path value less than that given by P = 1 (as in (1) above), or on completing the specified number of iterations in the program.

(3) Lastly, the speed-up curve is plotted.

ALGORITHM DESCRIPTION

1. Start.
2. Run the program.
3. Initialize the MPI routines and obtain the rank and size.
4. Initialize the coordinate positions of each of the 20 cities.
5. Calculate the straight-line distance between every pair of cities and place it in the distance matrix.
6. Find the northeast city and, using this city as the starting point, calculate the length of an initial random path.
7. Record the time tstart.
8. Initialize nloop to 1.
9. Parallelize the work between the processors (i.e. each processor generates a different possible sequence in which the cities could be traversed).
10. Each processor calculates the length of the path it has generated and compares it with the length of the previously generated path (in the first iteration, the initial path).
11. Choose the minimum of these lengths as the new path length and retain the sequence corresponding to the path which has been accepted.
12. Increment nloop and repeat steps 9 to 11 until nloop = 100000 or until a pre-defined path length is reached.
13. Each processor passes the best path sequence it has generated, along with the path length, to a central processor. The central processor chooses the shortest path amongst these and retains the corresponding sequence.
14. Record the time totaltime when the shortest path calculation ends.
15. Find the time taken to perform the shortest path calculation (totaltime - tstart).
16. Pass this time to the central processor and calculate the average time taken by all the processors.
17. Display the shortest path and its length.
18. End.

PROGRAMS:

/******* This program computes the shortest path a travelling salesman can take in order to tour 20 cities ****/

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "mpi.h"

#define NO_OF_LOOPS 250000
#define PROC1COST 19000

struct position { float x; float y; };

int var = 1000, my_rank, p, n = 1;
long int no = 1;
struct position pos[20];
int start;
float tstart, totaltime, finaltime;
float dist[20][20];
int a[20], b[20], minseq[20], tag[20];

int main(int argc, char **argv)
{

    int i, j, k, l, m, flag = 0, minrank, temprank;
    int tempseq[20];
    float newcost, mincost, startdist, newdist, tempcost;

    MPI_Status status;
    float rand1();
    void create_sequence();
    float calculate_cost();

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    /***** initial configuration generated by processor p-1 *****/
    if (my_rank == (p-1)) {
        for (i = 0; i < 20; i++) {
            pos[i].x = rand1() * 3000;
            pos[i].y = rand1() * 1500;
            printf("the x coord is %f and the y coord is %f of %d\n",
                   pos[i].x, pos[i].y, i);
        }

        /*** calculation of the distance matrix by processor p-1 ***/
        for (i = 0; i < 20; i++)
            dist[i][i] = 0;
        for (i = 0; i < 20; i++) {
            for (j = i+1; j < 20; j++) {
                dist[i][j] = sqrt((pos[i].x - pos[j].x) * (pos[i].x - pos[j].x)
                                + (pos[i].y - pos[j].y) * (pos[i].y - pos[j].y));
                dist[j][i] = dist[i][j];
            }
        }

        /**** computation of the north-east-most point by processor p-1 ****/
        start = 0;
        startdist = sqrt((3000 - pos[0].x) * (3000 - pos[0].x)
                       + (1500 - pos[0].y) * (1500 - pos[0].y));
        for (i = 1; i < 20; i++) {
            newdist = sqrt((3000 - pos[i].x) * (3000 - pos[i].x)
                         + (1500 - pos[i].y) * (1500 - pos[i].y));
            if (newdist < startdist) {
                startdist = newdist;
                start = i;
            }
        }
        for (i = 0; i < 20; i++) {
            a[i] = i;
            tag[i] = 0;
            b[i] = i;
        }
        a[start] = a[0];
        a[0] = start;
        b[start] = a[start];
        b[0] = a[0];
        tag[0] = 1;

    } /* my_rank == (p-1) ends here */

    if (p != 1) {
        if (my_rank == (p-1)) {
            for (i = 0; i < (p-1); i++) {
                MPI_Send(&start, 1, MPI_INT, i, i+9, MPI_COMM_WORLD);
                for (j = 0; j < 20; j++)
                    MPI_Send(&a[j], 1, MPI_INT, i, i+j+20, MPI_COMM_WORLD);
                for (k = 0; k < 20; k++)
                    for (j = 0; j < 20; j++)
                        /* dist is float, so the type must be MPI_FLOAT */
                        MPI_Send(&dist[k][j], 1, MPI_FLOAT, i, i+k+j+40,
                                 MPI_COMM_WORLD);
            } /* end of first loop */
        } /* end of if my_rank */
        else {
            MPI_Recv(&start, 1, MPI_INT, p-1, my_rank+9, MPI_COMM_WORLD, &status);
            for (j = 0; j < 20; j++)
                MPI_Recv(&a[j], 1, MPI_INT, p-1, my_rank+j+20,
                         MPI_COMM_WORLD, &status);
            for (k = 0; k < 20; k++)
                for (j = 0; j < 20; j++)
                    MPI_Recv(&dist[k][j], 1, MPI_FLOAT, p-1, my_rank+j+k+40,
                             MPI_COMM_WORLD, &status);
        } /* else ends here */
    } /* end of p != 1 */

    tstart = MPI_Wtime();
    create_sequence();
    mincost = calculate_cost();
    for (m = 0; m <= 19; m++)
        minseq[m] = b[m];

    /* printf("The initial distance is %f", mincost); */

    while (no < NO_OF_LOOPS) {

        /* each processor generates a random sequence of cities to be visited */
        create_sequence();

        /* each processor calculates the path length */
        newcost = calculate_cost();

        /* the shortest path found so far is recorded */
        if (newcost < mincost) {
            mincost = newcost;
            for (i = 0; i < 20; i++)
                minseq[i] = b[i];
        }

        if (p != 1) {

            /* each processor other than p-1 sends its best solution to p-1
               (sent once, since p-1 posts one receive per rank; mincost is
               float, so the type must be MPI_FLOAT) */
            if (my_rank != (p-1)) {
                MPI_Send(&mincost, 1, MPI_FLOAT, p-1, my_rank+10, MPI_COMM_WORLD);
                MPI_Send(&my_rank, 1, MPI_INT, p-1, my_rank+11, MPI_COMM_WORLD);
            }

            if (my_rank == (p-1)) {
                /* the terminating condition is checked by p-1 for itself
                   (condition truncated in the source; reconstructed) */
                if (mincost < PROC1COST) {
                    flag = 1;
                    minrank = my_rank;
                }

                for (i = 0; i < (p-1); i++) {

                    MPI_Recv(&tempcost, 1, MPI_FLOAT, i, i+10,
                             MPI_COMM_WORLD, &status);
                    MPI_Recv(&temprank, 1, MPI_INT, i, i+11,
                             MPI_COMM_WORLD, &status);

                    /* the terminating condition is checked for the other
                       processors (condition truncated in the source;
                       reconstructed) */
                    if (tempcost < PROC1COST)
                        flag = 1;
                    if (tempcost < mincost) {
                        mincost = tempcost;
                        minrank = temprank;
                    }

                } /* i ends here */

                /* p-1 sends the flag to all the other processors */
                for (m = 0; m < (p-1); m++)
                    MPI_Send(&flag, 1, MPI_INT, m, m+20, MPI_COMM_WORLD);
            } /* my_rank == p-1 ends here */

            if (my_rank != (p-1))
                MPI_Recv(&flag, 1, MPI_INT, p-1, my_rank+20,
                         MPI_COMM_WORLD, &status);

            if (flag) {
                if (my_rank != (p-1)) {
                    for (l = 0; l <= 19; l++)
                        MPI_Send(&minseq[l], 1, MPI_INT, p-1, my_rank+30,
                                 MPI_COMM_WORLD);
                }
                else {
                    for (i = 0; i < (p-1); i++) {
                        for (l = 0; l <= 19; l++)
                            MPI_Recv(&tempseq[l], 1, MPI_INT, i, i+30,
                                     MPI_COMM_WORLD, &status);
                        if (minrank == i) {
                            for (l = 0; l <= 19; l++)
                                minseq[l] = tempseq[l];
                        }
                    }
                    printf("The path travelled has the following sequence\n");
                    for (l = 0; l < 20; l++)   /* all 20 cities, not 19 */
                        printf(" %d ", minseq[l]);
                    printf("\nThe shortest path has the length %f", mincost);
                }
                break;
            } /* flag ends here */

        } /* p != 1 ends here */

        no++;

    } /* while ends here */

    /* display of the final result */
    printf("The min distance is %f and rank is %d\n", mincost, my_rank);
    for (i = 0; i < 20; i++)
        printf(" %d ", minseq[i]);
    printf("\n");
    totaltime = MPI_Wtime() - tstart;

    if (my_rank == p-1)
        finaltime = totaltime;

    if (p != 1) {
        if (my_rank == p-1) {
            for (m = 0; m < (p-1); m++) {
                /* totaltime is float, so the type must be MPI_FLOAT */
                MPI_Recv(&totaltime, 1, MPI_FLOAT, m, m+14,
                         MPI_COMM_WORLD, &status);
                finaltime += totaltime;
            }
            finaltime = finaltime / p;
        }
        else
            MPI_Send(&totaltime, 1, MPI_FLOAT, p-1, my_rank+14, MPI_COMM_WORLD);
    } /* p != 1 ends here */

    if (my_rank == (p-1))
        printf("\nAverage Time is %f", finaltime);
    MPI_Finalize();

}

float rand1() /* this function returns a random value between 0 and 1 */
{
    float s;

    var = var + 1000;
    srand(var);
    s = (float) rand() / RAND_MAX;
    return s;
}

void create_sequence() /* generates a random sequence of cities to be visited */
{
    int c, i, j;

    tag[0] = 1;
    for (i = 1; i < 20; i++)
        tag[i] = 0;
    b[0] = start;
    for (j = 1; j < 20; j++) {
        i = 0;
        c = (rand1() * 20) + 1;
        /* walk the untagged cities c times, skipping slot 0 (the start) */
        while (c >= 0) {
            i++;
            if (i == 20)
                i = 1;
            if (tag[i] == 0)
                c = c - 1;
        }
        b[j] = a[i];
        tag[i] = 1;
    }
}

float calculate_cost() /* this function calculates the path length */
{
    float cost;
    int i;

    cost = 0;
    for (i = 0; i < 19; i++)
        cost = cost + dist[b[i]][b[i+1]];
    cost = cost + dist[b[19]][b[0]]; /* return to the starting city */
    return cost;
}

RESULTS & ANALYSIS

Results: The results yielded were as follows:

Table1: This table gives the length of the shortest path for different number of iterations using only one processor (p=1).

# of iterations    Shortest path length    Computation time (sec)
1000               19003                   0.5
5000               17522                   2.51
10000              16848                   3.96
50000              16464                   40.20
250000             15095                   99.16
1000000            15095                   502

[Figure 1: shortest path length and time elapsed vs. # of iterations]

Table 2: This table compares the computation time required using P = 1, 2, 4, 8 processors without the use of any terminating condition (i.e. allowing all the processors to run for the specified number of iterations).

# of processors    Shortest path length    Computation time (sec)
1                  15095.500977            99.253
2                  14473.726562            81.534737
4                  14473.726562            55.636208
8                  14473.726562            79.641678

The speed-up is calculated using the formula:

Speed-up = (time required by one processor) / (time required by p processors)

The speed-up is as follows:

P    Speed-up
1    1
2    1.217
4    1.78396
8    1.246

[Figure 2: speed-up curve; x axis: # of nodes, y axis: speed-up]

Table 3: This table compares the computation time required using P = 1, 2, 4, 8 processors with the use of a terminating condition (i.e. the program is first run on one processor, and the shortest path it finds is used as the terminating condition when running the program on 2, 4 and 8 processors).

# of iterations    P1     P2      P4      P8
1000               0.5    0.26    4.12    18.04

Analysis of experimental results:

Analysis of Table 1: From Table 1 it can be seen that as the number of iterations is increased the solution improves, i.e. a shorter path is found. At each iteration a different sequence is tested for its optimality, so as the number of iterations increases more of the solution space is covered and the probability of finding a good path rises; the decrease in path length is evidence of this. But it can also be noticed that beyond a certain number of iterations there is no further improvement in the path: specifically, in Table 1, after 250000 iterations the path length does not improve. This suggests that there is no need to cover the entire solution space, since past a certain point the improvement in performance is marginal and can be neglected. Based on this conclusion, 250000 iterations have been used in this program.

Analysis of Table 2 and the speed-up curve: From Table 2 it can be seen that as the number of processors increases, the computation time of the program decreases. But the speed-up curve shows that the speed-up is not significant. The likely reason is that as the number of processors increases, the message passing between the processors also increases, adding to the total time. Since the number of cities to be visited (n = 20) is small, the time required to compute the different possible paths is small, so message passing dominates the time required to complete the process. A larger value of n, requiring a large amount of computation compared to message passing, would yield a better speed-up curve.

Analysis of Table 3: From Table 3 it can be seen that as the number of processors increases, the computation time increases. The basic reason for this is message passing. In this case a terminating condition has been used: the program is first run on one processor and the shortest path it finds is used as the terminating condition when running the program on 2, 4 and 8 processors. Hence, unlike the case of Table 2, where every processor works independently until the iterations are complete, here every processor sends a status flag to the central processor at the end of each iteration. The status flag indicates whether any processor has satisfied the terminating condition. For p processors there are therefore p-1 messages sent to the central processor per iteration; over nloop iterations, the terminating condition adds nloop x (p-1) messages. The time required for this message passing overrides the benefit derived from using more processors and hence results in poor performance of the program.

ANALYSIS OF PROGRAM PERFORMANCE:

The traveling salesman problem is an NP-complete problem. As we know, the main characteristic of an NP-complete problem is that the solution space is vast, and hence the time required to cover the entire solution space is extremely large. This large time complexity forces us to think beyond the brute-force method to get the optimal solution. One point to bear in mind is that since the brute-force method cannot be employed, a near-optimal solution rather than the optimal solution should be expected.

The alternatives to the brute-force method are genetic algorithms or simulated annealing, but these algorithms also yield near-optimal solutions. In this project a method that imitates the brute-force method has been employed. This was done with the intention of exploiting the basic advantage of the brute-force method, namely the accuracy of an optimal solution, while eliminating the factor that deters us from using it, i.e. its extremely large time complexity. The detailed description of the implementation is as follows:

The program is initially run using only one processor and the shortest path found by it is used as a benchmark while using more than one processor.

In the case of more than one processor, each processor experiments with different possible orders in which the cities could be visited. The sequence which gives the shortest path is finally accepted. The terminating condition is either reaching the benchmark set by the single-processor run or completing the given number of iterations. Each processor then sends its best result to a central processor, which chooses the best amongst these results and displays it.

It can be noted that as the number of processors increases, the percentage of the solution space covered increases. For example, considering the case of 100 iterations: if one processor is used, it will search 100 different sequences, but if 4 processors are used then 400 different sequences will be tried before coming to a conclusion. Moreover, this is achieved without increasing the time required to reach the solution.

Thus, as the number of processors is increased, the solution moves from a near-optimal solution toward the optimal solution.