Website Appendix: Fortran Source Code and Numerical Example

Below is the source code for the Fortran program ILP6.FOR, which generates the CPLEX input for the sequential integer programming method:

      PROGRAM GENERATE
C
      IMPLICIT INTEGER (A-Z)
      REAL*8 TIMEA,TIMEB,RR
      INTEGER A(17000,500),E(17000),R(50,50),O(500)
C
C     THIS IS THE PROGRAM THAT WILL CORRESPOND TO THE MAIN MODEL IN
C     THE PAPER. THE WEIGHTS FOR THE OBJECTIVE FUNCTION ARE GONE.
C     THE GOAL IS TO MAXIMIZE THE SIZE OF CLUSTER K SUBJECT TO
C     CONSTRAINTS ON THE EARLIER SIZES.
C
      OPEN(1, FILE = 'SMATRIX')     ! PROXIMITY MATRIX - input
      open(3, file= 'CF.MPS')       ! MPS file for cplex - output
C
      WRITE(*,*) ' PLEASE INPUT NUMBER OF ITEMS '
      READ(*,*) N                   ! Read number of ITEMS
      READ(1,*) ((R(I,J),J=1,N),I=1,N)
      WRITE(*,*) ' PLEASE INPUT NUMBER OF CLUSTERS 2 TO 10'
      READ(*,*) C
      WRITE(*,*) ' PLEASE INPUT THE PROXIMITY THRESHOLD '
      READ(*,*) LAMBDA
C
C
      CALL GETTIM (IHR, IMIN, ISEC, I100)
      CALL GETDAT (IYR, IMON, IDAY)
      TIMEA = DFLOAT(86400*IDAY+3600*IHR+60*IMIN+ISEC)+DFLOAT(I100)/100.
C
C     SET OBJECTIVE COEFFICIENTS
C
      NV = 0
      DO K = 1,C
         DO I = 1,N
            NV = NV + 1
            IF(K.EQ.C) THEN
               O(NV) = 1
            ELSE
               O(NV) = 0
            END IF
         END DO
      END DO
C
C     SET CONSTRAINTS SO THAT EACH ITEM IS NOT ASSIGNED TO MORE
C     THAN ONE CLUSTER
C
      NV = 0
      NC = 0
      DO I = 1,N
         NC = NC + 1
         E(NC) = 1
         DO K = 1,C
            NCOL = N*(K-1) + I
            A(NC,NCOL) = 1
         END DO
      END DO
C
C     NOW SET THE TRIAD CONSTRAINTS TO GUARANTEE THAT THE LOWER
C     BOUND CONDITION IN ELLIS IS MET - THESE CONSTRAINTS ARE
C     NOT APPLICABLE TO THE PROBLEM IN THE PAPER
C
C     DO I = 1,N-2
C        DO J = I+1,N-1
C           DO K = J+1,N
C              VIOL = 0
C              IF(R(I,J).LT.R(I,K)*R(J,K)) VIOL = 1
C              IF(R(I,K).LT.R(I,J)*R(J,K)) VIOL = 1
C              IF(R(J,K).LT.R(I,K)*R(I,J)) VIOL = 1
C              IF(VIOL.EQ.1) THEN
C                 DO L = 1,C
C                    NC = NC + 1
C                    E(NC) = 2
C                    NCOL1 = N*(L-1)+I
C                    NCOL2 = N*(L-1)+J
C                    NCOL3 = N*(L-1)+K
C                    A(NC,NCOL1) = 1
C                    A(NC,NCOL2) = 1
C                    A(NC,NCOL3) = 1
C                 END DO
C              END IF
C           END DO
C        END DO
C     END DO
C
C     NOW SET THE DYAD CONSTRAINTS TO GUARANTEE THAT A PAIR
C     OF ITEMS IS NOT ASSIGNED IF THEIR SIMILARITY FALLS
C     SHORT OF THE 'LAMBDA' THRESHOLD.
C
      DO I = 1,N-1
         DO J = I+1,N
            VIOL = 0
            IF(R(I,J).LT.LAMBDA) VIOL = 1
            IF(VIOL.EQ.1) THEN
               DO L = 1,C
                  NC = NC + 1
                  E(NC) = 1
                  NCOL1 = N*(L-1)+I
                  NCOL2 = N*(L-1)+J
                  A(NC,NCOL1) = 1
                  A(NC,NCOL2) = 1
               END DO
            END IF
         END DO
      END DO
C
      NCUTS = C-1     ! PUT IN CUTS FOR ALL CLUSTER SIZES, BUT C
      ICOL = 0
      DO I = 1,NCUTS
         NC = NC + 1
         DO J = 1,N
            ICOL = ICOL + 1
            A(NC,ICOL) = 1
         END DO
         WRITE(*,*) ' INPUT NUMBER OF ITEMS IN GROUP ',I
         READ(*,*) E(NC)
      END DO
      COL = N*C
      ROW = NC
      WRITE(*,*) COL,ROW
C
C     THE INPUT GENERATION IS COMPLETED HERE! ALL OF THE STUFF
C     BELOW IS TO GENERATE THE FORMATTED INPUT FOR CPLEX.
C
1147  WRITE(3,31)
      WRITE(3,1144)
1144  FORMAT('OBJSENSE')
      WRITE(3,1145)
1145  FORMAT(' MAX')
      WRITE(3,32)
      WRITE(3,33)
      DO I = 1,ROW-NCUTS
         IF(I.LE.9) THEN
            WRITE(3,34) I
         ELSEIF(I.LE.99) THEN
            WRITE(3,134) I
         ELSEIF(I.LE.999) THEN
            WRITE(3,234) I
         ELSEIF(I.LE.9999) THEN
            WRITE(3,334) I
         ELSE
            WRITE(3,434) I
         END IF
      END DO
C     THE CUT ROWS ARE EQUALITIES, SO THEY USE THE ' E' FORMATS
      DO I = ROW-NCUTS+1,ROW
         IF(I.LE.9) THEN
            WRITE(3,734) I
         ELSEIF(I.LE.99) THEN
            WRITE(3,1734) I
         ELSEIF(I.LE.999) THEN
            WRITE(3,2734) I
         ELSEIF(I.LE.9999) THEN
            WRITE(3,3734) I
         ELSE
            WRITE(3,4734) I
         END IF
      END DO
      WRITE(3,35)
      DO 400 J = 1,COL
         IF(J.LE.9) THEN
            WRITE(3,36) J,O(J)
         ELSE IF(J.LE.99) THEN
            WRITE(3,236) J,O(J)
         ELSE IF(J.LE.999) THEN
            WRITE(3,336) J,O(J)
         ELSE IF(J.LE.9999) THEN
            WRITE(3,436) J,O(J)
         ELSE
            WRITE(3,536) J,O(J)
         END IF
         DO 401 I = 1,ROW
            IF(A(I,J).EQ.0) GO TO 401
            IF(J.LE.9) THEN
               IF(I.LE.9) THEN
                  WRITE(3,1138) J,I,A(I,J)
               ELSEIF(I.LE.99) THEN
                  WRITE(3,1238) J,I,A(I,J)
               ELSEIF(I.LE.999) THEN
                  WRITE(3,1338) J,I,A(I,J)
               ELSEIF(I.LE.9999) THEN
                  WRITE(3,1438) J,I,A(I,J)
               ELSE
                  WRITE(3,1538) J,I,A(I,J)
               END IF
            ELSE IF(J.LE.99) THEN
               IF(I.LE.9) THEN
                  WRITE(3,2138) J,I,A(I,J)
               ELSEIF(I.LE.99) THEN
                  WRITE(3,2238) J,I,A(I,J)
               ELSEIF(I.LE.999) THEN
                  WRITE(3,2338) J,I,A(I,J)
               ELSEIF(I.LE.9999) THEN
                  WRITE(3,2438) J,I,A(I,J)
               ELSE
                  WRITE(3,2538) J,I,A(I,J)
               END IF
            ELSE IF(J.LE.999) THEN
               IF(I.LE.9) THEN
                  WRITE(3,3138) J,I,A(I,J)
               ELSEIF(I.LE.99) THEN
                  WRITE(3,3238) J,I,A(I,J)
               ELSEIF(I.LE.999) THEN
                  WRITE(3,3338) J,I,A(I,J)
               ELSEIF(I.LE.9999) THEN
                  WRITE(3,3438) J,I,A(I,J)
               ELSE
                  WRITE(3,3538) J,I,A(I,J)
               END IF
            ELSE IF(J.LE.9999) THEN
               IF(I.LE.9) THEN
                  WRITE(3,4138) J,I,A(I,J)
               ELSEIF(I.LE.99) THEN
                  WRITE(3,4238) J,I,A(I,J)
               ELSEIF(I.LE.999) THEN
                  WRITE(3,4338) J,I,A(I,J)
               ELSEIF(I.LE.9999) THEN
                  WRITE(3,4438) J,I,A(I,J)
               ELSE
                  WRITE(3,4538) J,I,A(I,J)
               END IF
            ELSE
               IF(I.LE.9) THEN
                  WRITE(3,5138) J,I,A(I,J)
               ELSEIF(I.LE.99) THEN
                  WRITE(3,5238) J,I,A(I,J)
               ELSEIF(I.LE.999) THEN
                  WRITE(3,5338) J,I,A(I,J)
               ELSEIF(I.LE.9999) THEN
                  WRITE(3,5438) J,I,A(I,J)
               ELSE
                  WRITE(3,5538) J,I,A(I,J)
               END IF
            END IF
401      CONTINUE
C
400   CONTINUE
C
      WRITE(3,41)
      DO I = 1,ROW
         IF(I.LE.9) THEN
            WRITE(3,42) I,E(I)
         ELSEIF(I.LE.99) THEN
            WRITE(3,142) I,E(I)
         ELSEIF(I.LE.999) THEN
            WRITE(3,242) I,E(I)
         ELSEIF(I.LE.9999) THEN
            WRITE(3,342) I,E(I)
         ELSE
            WRITE(3,442) I,E(I)
         END IF
      END DO
      WRITE(3,43)
      DO I = 1,COL
         IF(I.LE.9) THEN
            WRITE(3,544) I
         ELSEIF(I.LE.99) THEN
            WRITE(3,545) I
         ELSEIF(I.LE.999) THEN
            WRITE(3,546) I
         ELSEIF(I.LE.9999) THEN
            WRITE(3,547) I
         ELSE
            WRITE(3,548) I
         END IF
      END DO
C     DO I = M*NR+N*NC+1,ISET
C        IF(I.LE.9) THEN
C           WRITE(3,744) I
C        ELSEIF(I.LE.99) THEN
C           WRITE(3,745) I
C        ELSEIF(I.LE.999) THEN
C           WRITE(3,746) I
C        ELSEIF(I.LE.9999) THEN
C           WRITE(3,747) I
C        ELSE
C           WRITE(3,748) I
C        END IF
C     END DO
      WRITE(3,44)
C
31    FORMAT('NAME MIKEB')
32    FORMAT('ROWS')
33    FORMAT(' N OBJECTIV')
134   FORMAT(' L C',I2)
234   FORMAT(' L C',I3)
334   FORMAT(' L C',I4)
434   FORMAT(' L C',I5)
34    FORMAT(' L C',I1)
734   FORMAT(' E C',I1)
1734  FORMAT(' E C',I2)
2734  FORMAT(' E C',I3)
3734  FORMAT(' E C',I4)
4734  FORMAT(' E C',I5)
35    FORMAT('COLUMNS')
36    FORMAT(' X',I1,' OBJECTIV ',I7)
236   FORMAT(' X',I2,' OBJECTIV ',I7)
336   FORMAT(' X',I3,' OBJECTIV ',I7)
436   FORMAT(' X',I4,' OBJECTIV ',I7)
536   FORMAT(' X',I5,' OBJECTIV ',I7)
1138  FORMAT(' X',I1,' C',I1,' ',I7)
1238  FORMAT(' X',I1,' C',I2,' ',I7)
1338  FORMAT(' X',I1,' C',I3,' ',I7)
1438  FORMAT(' X',I1,' C',I4,' ',I7)
1538  FORMAT(' X',I1,' C',I5,' ',I7)
2138  FORMAT(' X',I2,' C',I1,' ',I7)
2238  FORMAT(' X',I2,' C',I2,' ',I7)
2338  FORMAT(' X',I2,' C',I3,' ',I7)
2438  FORMAT(' X',I2,' C',I4,' ',I7)
2538  FORMAT(' X',I2,' C',I5,' ',I7)
3138  FORMAT(' X',I3,' C',I1,' ',I7)
3238  FORMAT(' X',I3,' C',I2,' ',I7)
3338  FORMAT(' X',I3,' C',I3,' ',I7)
3438  FORMAT(' X',I3,' C',I4,' ',I7)
3538  FORMAT(' X',I3,' C',I5,' ',I7)
4138  FORMAT(' X',I4,' C',I1,' ',I7)
4238  FORMAT(' X',I4,' C',I2,' ',I7)
4338  FORMAT(' X',I4,' C',I3,' ',I7)
4438  FORMAT(' X',I4,' C',I4,' ',I7)
4538  FORMAT(' X',I4,' C',I5,' ',I7)
5138  FORMAT(' X',I5,' C',I1,' ',I7)
5238  FORMAT(' X',I5,' C',I2,' ',I7)
5338  FORMAT(' X',I5,' C',I3,' ',I7)
5438  FORMAT(' X',I5,' C',I4,' ',I7)
5538  FORMAT(' X',I5,' C',I5,' ',I7)
41    FORMAT('RHS')
42    FORMAT(' RHS C',I1,12X,I7)
142   FORMAT(' RHS C',I2,11X,I7)
242   FORMAT(' RHS C',I3,10X,I7)
342   FORMAT(' RHS C',I4,9X,I7)
442   FORMAT(' RHS C',I5,8X,I7)
43    FORMAT('BOUNDS')
544   FORMAT(' BV BOUND X',I1)
545   FORMAT(' BV BOUND X',I2)
546   FORMAT(' BV BOUND X',I3)
547   FORMAT(' BV BOUND X',I4)
548   FORMAT(' BV BOUND X',I5)
744   FORMAT(' UP BOUND X',I1,' 1')
745   FORMAT(' UP BOUND X',I2,' 1')
746   FORMAT(' UP BOUND X',I3,' 1')
747   FORMAT(' UP BOUND X',I4,' 1')
748   FORMAT(' UP BOUND X',I5,' 1')
44    FORMAT('ENDATA')
889   STOP
      END
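The row-building logic of the listing can be restated compactly outside Fortran. The following Python sketch is an illustration only, not part of ILP6.FOR: the function name `build_rows`, the tuple layout, and the three-item matrix are assumptions made for this example. It reproduces the column indexing N*(K-1)+I and the three groups of rows (assignment rows, dyad rows, and the size cuts for the first C-1 clusters), but it does not write an MPS file.

```python
def build_rows(R, n_clusters, lam, prior_sizes=()):
    """Return constraint rows as (sense, columns, rhs) with 1-based columns.

    Column n*(k-1)+i stands for "item i is assigned to cluster k",
    the same indexing the Fortran listing uses for NCOL.
    """
    n = len(R)
    if len(prior_sizes) != n_clusters - 1:
        raise ValueError("need one size for each of the first C-1 clusters")

    def col(item, cluster):
        return n * (cluster - 1) + item

    clusters = range(1, n_clusters + 1)
    rows = []
    # Assignment rows: each item belongs to at most one cluster.
    for i in range(1, n + 1):
        rows.append(("L", [col(i, k) for k in clusters], 1))
    # Dyad rows: a pair below the threshold cannot share any cluster.
    for i in range(1, n):
        for j in range(i + 1, n + 1):
            if R[i - 1][j - 1] < lam:
                for k in clusters:
                    rows.append(("L", [col(i, k), col(j, k)], 1))
    # Cut rows: the sizes of the earlier clusters are fixed (equalities).
    for k, size in enumerate(prior_sizes, start=1):
        rows.append(("E", [col(i, k) for i in range(1, n + 1)], size))
    return rows


# Hypothetical three-item matrix (not from the paper): items 1 and 2
# fall below the threshold of 5, so they get one dyad row per cluster.
R = [[0, 2, 9],
     [2, 0, 8],
     [9, 8, 0]]
rows = build_rows(R, n_clusters=2, lam=5, prior_sizes=[2])
for sense, cols, rhs in rows:
    print(sense, cols, rhs)
```

The rows come out in the same order as in the Fortran program, so row r of the list corresponds to constraint Cr in the generated file.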

Below is a small numerical example that uses the Fortran program.

The input proximity matrix ‘smatrix’ corresponds to n = 5 objects (note that the main diagonal is ignored). The value selected for lambda is λ = 7.

smatrix =
 0  6  5  8  5
 6  0  7 14  8
 5  7  0  7 12
 8 14  7  0  4
 5  8 12  4  0

Enter ‘ILP6’ at the command prompt:

> ILP6

This will yield the following prompts (each typed reply follows its prompt):

> PLEASE INPUT NUMBER OF ITEMS
> 5
> PLEASE INPUT NUMBER OF CLUSTERS 1 TO 10
> 1
> PLEASE INPUT THE PROXIMITY THRESHOLD
> 7

This generates the file ‘CF.MPS’, which is the CPLEX input file. CPLEX is initiated and the file is read.

Log started (V12.1.0) Mon Nov 02 15:02:55 2015

Specified objective sense: MAXIMIZE
Selected objective name: OBJECTIV
Selected RHS name: RHS
Selected bound name: BOUND

Problem 'c:\isp\cf.mps' read.
Read time = 0.00 sec.

The formulation is:

Maximize
 OBJECTIV: X1 + X2 + X3 + X4 + X5
Subject To
 C1: X1 <= 1
 C2: X2 <= 1
 C3: X3 <= 1
 C4: X4 <= 1
 C5: X5 <= 1
 C6: X1 + X2 <= 1
 C7: X1 + X3 <= 1
 C8: X1 + X5 <= 1
 C9: X4 + X5 <= 1
Bounds
 0 <= X1 <= 1
 0 <= X2 <= 1
 0 <= X3 <= 1
 0 <= X4 <= 1
 0 <= X5 <= 1
Binaries
 X1 X2 X3 X4 X5

The solution is as follows:

Tried aggregator 2 times.
MIP Presolve eliminated 6 rows and 2 columns.
Aggregator did 3 substitutions.
All rows and columns eliminated.
Presolve time = -0.00 sec.

Solution pool: 1 solution saved.

MIP - Integer optimal solution: Objective = 3.0000000000e+000
Solution time = 0.00 sec. Iterations = 0 Nodes = 0

The first subset has a maximum cardinality of three, as shown by the optimal objective function value above. Since not all 5 items were assigned, we will increase the number of subsets from k = 1 to k = 2 and continue the process.
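Because n = 5, the single-subset result can be cross-checked without CPLEX by enumerating every subset. The sketch below is an independent check, not part of the method in the paper; the helper names `feasible` and `largest_feasible_subsets` are introduced only for this illustration, and the matrix is the ‘smatrix’ input given earlier.

```python
from itertools import combinations

# The example proximity matrix 'smatrix' and threshold from the text.
S = [
    [0, 6, 5, 8, 5],
    [6, 0, 7, 14, 8],
    [5, 7, 0, 7, 12],
    [8, 14, 7, 0, 4],
    [5, 8, 12, 4, 0],
]
LAMBDA = 7


def feasible(subset, S, lam):
    """True if every pair of (1-based) items in subset has proximity >= lam."""
    return all(S[i - 1][j - 1] >= lam for i, j in combinations(subset, 2))


def largest_feasible_subsets(S, lam):
    """Return (size, subsets) for the largest subsets meeting the threshold."""
    n = len(S)
    for size in range(n, 0, -1):
        found = [c for c in combinations(range(1, n + 1), size)
                 if feasible(c, S, lam)]
        if found:
            return size, found
    return 0, []


size, subsets = largest_feasible_subsets(S, LAMBDA)
print(size, subsets)  # 3 [(2, 3, 4), (2, 3, 5)]
```

The enumeration agrees with the objective value of 3 and shows that two different subsets attain it. Enumeration is only practical for very small n; it is used here purely as a sanity check.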

Re-enter ‘ILP6’ at the command prompt:

> ILP6

This will yield the following prompts (each typed reply follows its prompt). Note that there is now an extra prompt, because we must supply the constraint that the first subset contains exactly three objects.

> PLEASE INPUT NUMBER OF ITEMS
> 5
> PLEASE INPUT NUMBER OF CLUSTERS 2 TO 10
> 2
> PLEASE INPUT THE PROXIMITY THRESHOLD
> 7
> INPUT NUMBER OF ITEMS IN GROUP 1
> 3

This generates the file ‘CF.MPS’, which is the CPLEX input file. CPLEX is initiated and the file is read. Notice that this formulation is bigger. Another n variables (X6 to X10) that correspond to possible assignments of objects to the second subset are necessary. Also notice that only X6 to X10 are in the objective function because the goal is to maximize the size of the second subset – the size of the first subset is constrained to be 3, as shown by the constraint X1+X2+X3+X4+X5 = 3 in the formulation.

Specified objective sense: MAXIMIZE
Selected objective name: OBJECTIV
Selected RHS name: RHS
Selected bound name: BOUND

Problem 'c:\isp\cf.mps' read.
Read time = 0.00 sec.

Maximize
 OBJECTIV: X6 + X7 + X8 + X9 + X10
Subject To
 C1: X1 + X6 <= 1
 C2: X2 + X7 <= 1
 C3: X3 + X8 <= 1
 C4: X4 + X9 <= 1
 C5: X5 + X10 <= 1
 C6: X1 + X2 <= 1
 C7: X6 + X7 <= 1
 C8: X1 + X3 <= 1
 C9: X6 + X8 <= 1
 C10: X1 + X5 <= 1
 C11: X6 + X10 <= 1
 C12: X4 + X5 <= 1
 C13: X9 + X10 <= 1
 C14: X1 + X2 + X3 + X4 + X5 = 3
Bounds
 0 <= X1 <= 1
 0 <= X2 <= 1
 0 <= X3 <= 1
 0 <= X4 <= 1
 0 <= X5 <= 1
 0 <= X6 <= 1
 0 <= X7 <= 1
 0 <= X8 <= 1
 0 <= X9 <= 1
 0 <= X10 <= 1
Binaries
 X1 X2 X3 X4 X5 X6 X7 X8 X9 X10

The solution is as follows:

Tried aggregator 2 times.
MIP Presolve eliminated 12 rows and 8 columns.
MIP Presolve modified 3 coefficients.
Aggregator did 2 substitutions.
All rows and columns eliminated.
Presolve time = -0.00 sec.

Solution pool: 1 solution saved.

MIP - Integer optimal solution: Objective = 2.0000000000e+000
Solution time = 0.00 sec. Iterations = 0 Nodes = 0
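The two-subset model is also small enough to check by enumeration: each of the five items is either unassigned, in subset 1, or in subset 2, giving 3^5 = 243 labelings. The sketch below is an independent check under those assumptions, not the CPLEX procedure; `compatible` and `maximize_second_subset` are names made up for this illustration. It keeps subset 1 free to vary, subject only to the size constraint, as the formulation does.

```python
from itertools import combinations, product

# The example proximity matrix 'smatrix' and threshold from the text.
S = [
    [0, 6, 5, 8, 5],
    [6, 0, 7, 14, 8],
    [5, 7, 0, 7, 12],
    [8, 14, 7, 0, 4],
    [5, 8, 12, 4, 0],
]
LAMBDA = 7


def compatible(items, S, lam):
    """True if all pairs of (1-based) items meet the proximity threshold."""
    return all(S[i - 1][j - 1] >= lam for i, j in combinations(items, 2))


def maximize_second_subset(S, lam, first_size):
    """Enumerate labelings (0 = unassigned, 1 or 2 = subset) and keep the
    one with the largest subset 2 among those where subset 1 has exactly
    first_size items and both subsets satisfy the threshold."""
    n = len(S)
    best = None
    for labels in product((0, 1, 2), repeat=n):
        first = [i + 1 for i, lab in enumerate(labels) if lab == 1]
        second = [i + 1 for i, lab in enumerate(labels) if lab == 2]
        if len(first) != first_size:
            continue
        if not (compatible(first, S, lam) and compatible(second, S, lam)):
            continue
        if best is None or len(second) > len(best[1]):
            best = (first, second)
    return best


first, second = maximize_second_subset(S, LAMBDA, first_size=3)
print(first, second)  # [2, 3, 5] [1, 4]
```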

The second subset has a maximum cardinality of two, as shown by the optimal objective function value above. Since the first subset contains the other three items, we are done! We can display the binary solution variables to see which items are assigned to which subset.

Incumbent solution
Variable Name           Solution Value
X2                            1.000000
X3                            1.000000
X5                            1.000000
X6                            1.000000
X9                            1.000000
All other variables in the range 1-10 are 0.

Since X2 = X3 = X5 = 1, items 2, 3, and 5 are assigned to subset 1. Since X6 = X9 = 1, we know that items (6-5) = 1 and (9-5) = 4 are assigned to subset 2. Notice from ‘smatrix’ that a14 = 8, which exceeds λ = 7. Also, a23 = 7, a25 = 8, and a35 = 12, each of which equals or exceeds λ = 7. So, the optimal partition consists of one subset {2, 3, 5} with the maximum cardinality of 3 and a second subset {1, 4} with a cardinality of 2.
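The mapping from variable numbers back to items and subsets inverts the column index n*(k-1)+i used by the program. A minimal Python sketch of that decoding follows; the function name `decode` is an assumption introduced for this illustration.

```python
def decode(nonzero_columns, n_items):
    """Map 1-based variable numbers X_j with value 1 to {subset: [items]}.

    Column j = n_items*(k-1) + i, so integer division recovers the
    subset k and the remainder recovers the item i.
    """
    subsets = {}
    for j in sorted(nonzero_columns):
        k, i = divmod(j - 1, n_items)
        subsets.setdefault(k + 1, []).append(i + 1)
    return subsets


# Variables reported at 1 in the incumbent solution: X2, X3, X5, X6, X9.
partition = decode([2, 3, 5, 6, 9], n_items=5)
print(partition)  # {1: [2, 3, 5], 2: [1, 4]}
```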

It is interesting to note that there is another maximum cardinality subset of size three; however, it does not allow a second subset of size 2 to be extracted. Consider the subset {2, 3, 4}, which satisfies the λ constraint for all pairs (a23 = 7, a24 = 14, and a34 = 7). Selecting it leaves only items 1 and 5, and because a15 = 5 is less than λ = 7, we cannot form a second subset of cardinality 2.
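The effect of the choice of first subset can be made concrete by fixing each candidate in turn and finding the largest admissible subset among the leftover items. This is a sketch for illustration only: the sequential integer programming method does not fix the first subset's membership (it constrains only its size), and the helper names below are made up for this example.

```python
from itertools import combinations

# The example proximity matrix 'smatrix' and threshold from the text.
S = [
    [0, 6, 5, 8, 5],
    [6, 0, 7, 14, 8],
    [5, 7, 0, 7, 12],
    [8, 14, 7, 0, 4],
    [5, 8, 12, 4, 0],
]
LAMBDA = 7


def compatible(items, S, lam):
    """True if all pairs of (1-based) items meet the proximity threshold."""
    return all(S[i - 1][j - 1] >= lam for i, j in combinations(items, 2))


def largest_leftover_subset(S, lam, first):
    """Largest threshold-respecting subset of the items not in `first`."""
    rest = [i for i in range(1, len(S) + 1) if i not in first]
    for size in range(len(rest), 0, -1):
        for candidate in combinations(rest, size):
            if compatible(candidate, S, lam):
                return list(candidate)
    return []


after_234 = largest_leftover_subset(S, LAMBDA, [2, 3, 4])
after_235 = largest_leftover_subset(S, LAMBDA, [2, 3, 5])
print(after_234, after_235)  # [1] [1, 4]
```

Fixing {2, 3, 4} leaves a second subset of only one item, whereas {2, 3, 5} leaves {1, 4}. This is why the size constraint on the first subset, rather than a fixed membership, matters in the second-stage model.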
