Spanning Trees with many Leaves in Regular Bipartite Graphs.

Emanuele G. Fusco and Angelo Monti

Dipartimento di Informatica “Sapienza”, Universit`adi Roma , via Salaria, 113-00198 Rome, Italy {fusco, monti}@di.uniroma1.it

Abstract. Given a d-regular Gd, whose nodes are di- vided in black nodes and white nodes according to the partition, we consider the problem of computing the spanning of Gd with the maximum number of black leaves. We prove that the problem is NP hard for any fixed d ≥ 4 and we present a simple greedy algorithm that gives a constant approximation ratio for the problem. More precisely our al- gorithm can be used to get in linear time an approximation ratio of 2 − 2/(d − 1)2 for d ≥ 4. When applied to cubic bipartite graphs the algorithm only achieves a 2-approximation ratio. Hence we introduce a local optimization step that allows us to improve the approximation ratio for cubic bipartite graphs to 1.5. Focusing on structural properties, the analysis of our algorithm proves a lower bound on lB (n, d), i.e., the minimum m such that every Gd with n black nodes has a spanning tree with at least m black leaves. In n particular, for d = 3 we prove that lB (n, 3) is exactly ˚ 3 ˇ + 1.

1 Introduction

The problem of finding spanning trees with many leaves has been thoroughly investigated [1,2,4–6,8–11,13–15,17]. It is known to be NP hard [4]. Lu and Ravi [14, 15] provided 3-approximation algorithms and a 2-approximation al- gorithm was presented by Solis-Oba [17]. It is known that the problem remains NP hard even if the input is restricted to d-regular graphs for any fixed d ≥ 3 [11]. A7/4 approximation algorithm for cubic graphs is presented in [13]. Finding ap- proximation algorithms with ratio less than 2 for d-regular graphs remains an open problem for d ≥ 4. The NP hardness of the optimization problem leads to seek constructive proofs for related extremal problems. A constructive proof that all graphs in a particular class have spanning trees with at least m leaves becomes an algorithm to produce such a tree for graphs in this class. Let l(n, d) be the maximum integer m such that every connected n-vertex graph with minimum vertex degree at least d has a spanning tree with at least m leaves. The value l(n, d) is known for d ≤ 5. Trivially l(n, 2) = 2. Storer [18] proved that l(n, 3) = ⌈n/4+2⌉. Griggs 2 8 and Wu [9] and Kleitman and West [10] proved l(n, 4) = 5 n + 5 . In [9] it is   3 also proved that l(n, 5) = 6 n +2 . For d ≥ 6 the exact value of l(n, d) remains unknown. For more information about this topic see [3].   In [16] a variation of the maximum leaf spanning tree problem has been introduced. This variation restricts the problem to bipartite graphs and asks to find a spanning tree having the maximum number of leaves in one of the partited set. We call nodes in this set black nodes, and white nodes those in the other. In [12] it is proved that this variation of the problem is NP hard for planar bipartite graphs. In this paper we study the variation of the problem proposed in [16] restricted to the class of regular bipartite graphs. We prove that the problem is NP hard for d-regular bipartite graphs for any fixed d ≥ 4. We remark that our proof of NP hardness relies on a construction involving non planar regular bipartite graphs. It remains an open question to determine if the problem is NP hard for regular planar bipartite graphs. We present greedy algorithms working in linear time that find, for any d- regular graph, a spanning tree having a constant fraction of the maximum num- ber of black leaves. Our algorithm for d-regular bipartite graphs provides an approximation ratio of 2 − 2/(d − 1)2. The analysis of the performance ratio is based on the assumption that d ≥ 4: in order to reach approximation ratio 1.5 on cubic bipartite graphs we present a refinement on our base algorithm based on local optimization. Define lB(n, d) as the maximum m such that every d-regular bipartite graph with n black nodes has a spanning tree with at least m black leaves. Trivially lB(n, 2) = 1. We prove that lB(n, 3) = ⌈n/3⌉ + 1. For d ≥ 4 the exact value of lB(n, d) remains unknown, however we provide upper and lower bounds. More 2 d−1 (d−1) d−2 precisely we prove: 2d n + 2d ≤ lB(n, d) ≤ d n + 1. l m   2 Preliminaries

In this section we introduce some terminology and notation. Let G be a graph; we use V (G) to denote the set of nodes in G and E(G) to denote the set of edges in G. For a node v in V (G), ΓG(v) denotes the set of neighbours of v in G. We denote by Gd a d-regular bipartite graph. We use colors black and white to identify the two sets of the partition. As in any regular bipartite graph the number of black nodes is equal to the number of white nodes, we use the letter n to denote the number of black nodes in Gd (clearly, the total number of nodes in Gd is equal to 2n). Given a spanning tree T of Gd, λi(T ), 1 ≤ i ≤ d, denotes the number of black nodes of degree i in T . We omit T when it is clear form the context.

Lemma 1. Let T be a spanning tree of Gd, it holds

d

λ1(T )=1+ (i − 2)λi(T ) (1) i=3 X d Proof. As T spans all the nodes in Gd, it holds that i=1 λi = n. Any edge in T has endpoints in one black node and one white node, so the total number of edges is given by the sum of the degree of the black (orP white) nodes, giving that d i=1 iλi =2n − 1. From these two equations we obtain Equation 1. ⊓⊔ P 3 Regular Bipartite Graphs

We start our analysis from the general case of regular bipartite graphs, while on Sec. 4 we focus on cubic graphs to refine our results in this restricted case.

(d−2)n+1 Lemma 2. Let T be a spanning tree of Gd, then λ1(T ) ≤ d−1 . j k d Proof. From Equation 1, as i=1 λi = n, we have that λ1 is maximized when λ1 + λd = n and λ2, λ3,...,λd−1 are all 0. As we search for integer solutions, (d−2)n+1 P λ1 ≤ d−1 and the thesis follows. ⊓⊔

We now describe the algorithm span that, given a graph Gd, produces a spanning tree TA for Gd. The algorithm first builds a forest F, then it connects the trees in F and the isolated nodes to form TA. Every tree Ti in F is built by first choosing a black node v such that ΓGd (v) ∩ V (F) = ∅. Each tree Ti is augmented as long as a new black node w with at most d−2 neighbours in Ti can be found. When a tree Ti cannot be augmented, the algorithm starts building a new tree Ti+1. In Algorithm 1 we formalize the algorithm span by providing its pseudo code.

Algorithm 1 span(Gd) 1: F←∅ 2: i ← 1

3: while ∃ a black node v ∈ Gd \ V (F) such that ΓGd (v) ∩ V (F)= ∅ do

4: E(Ti) ←{(v, x) such that x ∈ ΓGd (v)}

5: V (Ti) ←{v} ∪ ΓGd (v) 6: while ∃ a black node w ∈ Gd \ (V (F) ∪ V (Ti)) and a white node y ∈

V (Ti) such that |ΓGd (w) ∩ (V (F) ∪ V (Ti)) | ≤ d − 2 and (w,y) ∈ E(Gd) do

7: E(Ti) ← E(Ti) ∪{(w,y)}∪{(w,z) such that z ∈ ΓGd (w) \ ΓTi (w)}

8: V (Ti) ← V (Ti) ∪{w} ∪ ΓGd (w) 9: F←F∪ Ti 10: i ← i +1 11: build TA by connecting F and all the nodes in V (Gd) − V (F) in a tree 12: return TA Lemma 3. For any Gd with d ≥ 4 there exists a spanning tree TA such that d − 1 (d − 1)2 λ (T ) ≥ n + 1 A 2d 2d   Proof. The proof of this lemma consists in the performance evaluation of Algo- rithm 1. Let T1,T2,...,Tk be the trees built by the algorithm with input Gd. Any black node in a tree Ti has degree at least 3 and all the black nodes that do not belong to any tree have at least d − 1 neighbours belonging to one tree Th for some value 1 ≤ h ≤ k. Let bi be the number of black nodes in V (Gd) \ V (F) having at least d − 1 neighbours in Ti. Obviously,

k d n = bi + λj (Ti) (2)   i=1 j=3 X X   d − d(1+Pj=3(j 2)λj (Ti)) Now we bound the number bi with d−1 by considering that Ti d   d contains 1 + j=3(j − 1)λj(Ti) white nodes and hence there are d(1 + j=3(j − 2)λj (Ti)) edges from nodes in V (Ti) to nodes in V (Gd) \ V (Ti). P d − − P dk+Pj=3(dj d 1)λj (TA) From Equation 2 we have n ≤ d−1 and as in the forest each tree Ti has at least one node of degree d (i.e. the root) we have that

d dλd(TA)+ (dj − d − 1)λj (TA) n ≤ j=3 (3) P d − 1 2 If d ≥ 4, d − 4d +1 > 0, and as λd(TA) ≥ 1 the following chain of disequa- tions holds:

d dλd(TA)+ (dj − d − 1) λj (TA)= j=3 X d d dλd(TA)+ 2d (j − 2) λj (TA) − ((j − 3)d + 1) λj (TA) ≤ j=3 j=3 X X dλd(TA)+2d (λ1(TA) − 1) − ((d − 3)d + 1) λd(TA)= 2 2dλ1(TA) − 2d − λd(TA)(d − 4d + 1) ≤ 2 2dλ1(TA) − 2d − (d − 4d +1)= 2 2dλ1(TA) − (d − 1) Hence from Equation 3 we have 2dλ (T ) − (d − 1)2 n ≤ 1 A d − 1 and the thesis follows. ⊓⊔ Combining Lemma 3 and Lemma 2 we can state the following theorem:

Theorem 1. The problem of finding a Spanning Tree with the maximum number of black leaves for a d-regular bipartite graph Gd, d ≥ 4, can be approximated by an algorithm running in linear time with approximation ratio ≤ 2 − 2/(d − 1)2.

∗ Proof. Let T be a spanning tree of Gd with the maximum number of black leaves and let TA be the spanning tree of Gd produced by Algorithm 1. From Lemma 2 ∗ (d−2)n+1 d−1 we have that λ1(T ) ≤ d−1 and from Lemma 3 we have λ1(TA) ≥ 2d n+ 2 ∗ (d−1) λj1(T ) k d(d−2)n+d d(d−2) 2 . It follows that ≤ 2 2 3 < 2 2 =2 − 2 . ⊓⊔ 2d λ1(TA) (d−1) n+(d−1) (d−1) (d−1)   4 Cubic Bipartite Graphs

Algorithm 1 can obviously be applied to a graph G3. An analysis for the case d = 3 gives λ1(TA) ≥ n/4 and combining this with Lemma 2 we obtain an approximation ratio of 2. Now consider the example in Figure 1: the graph in the example is a necklace composed by l repetitions of the same block. A run of Algorithm 1 can produce a forest where every tree Ti contains a black node only (thick edges in figure represent the edges in the forest). In such a case, λ3 = l and so λ1 = l + 1, while the optimum solution can assign degree 3 to all but one the black nodes on the bottom, thus achieving 2l black leaves.

Fig. 1. Worst case example for Algorithm 1 on cubic graphs.

As suggested by the example above, in order to improve the approximation ratio, we modify Algorithm 1 by adding a procedure that tries to reduce the number of trees with only one black node in F. We call span3 the modified version of Algorithm 1. If a tree Ti has only one black node at the of the while loop of line 6 of Algorithm 1, span3 searches in Gd \F for the pattern depicted in figure 2.a. If such a pattern is present, Ti is destroyed and rebuilt starting from node x. Then, the augmenting process of lines 6 to 8 of the original algorithm is applied, resulting in a tree with at least 2 black nodes. Ti Ti

x y

z t a b

Fig. 2. a) Pattern for local optimization in Algorithm span3. b) Performance analysis case.

Lemma 4. For any G3 there exists a spanning tree TA such that λ1(TA) ≥ n 3 +1. Proof. The proof of this lemma consists in the performance evaluation of Algo- rithm span3. In the following we assume G3 has at least 5 black nodes (if n =3 or n = 4, the lemma trivially holds as any spanning tree having one black node of degree 3 and 2 black leaves is optimal). Let T1,T2,...,Tk be the trees built by Algorithm span3 with input G3. All black nodes in V (F) have degree 3 while black nodes in V (G3) \ V (F) have 2 neighbors in some tree Ti ∈ F. Let bi be the number of black nodes outside F that have at least 2 neighbors in the tree Ti ∈F. It holds that

k

n = bi + λ3(Ti) i=1 X 

Now we prove that bi ≤ 2λ3(Ti) for all 1 ≤ i ≤ k: notice that this is enough to prove the lemma since it implies that n ≤ 3λ3(TA)=3λ1(TA) − 3. 3λ3(Ti)+3 With the same reasoning applied in Lemma 3, we can bound bi with 2 .

If λ3(Ti) ≥ 2 it holds bi ≤ 2λ3(Ti) so the only case we need to analyzej is when k λ3(Ti)=1. Assume by contradiction that b1 = 3: there must be 3 black nodes, say x, y, and z, such that each of them has two white neighbors in Ti. Now consider the set W of white neighbors of x, y, and z outside Ti. It cannot be |W |≥ 2 as otherwise the pattern of figure 2.a would have been detected by span 3 and Ti would have been a tree with at least 2 black nodes. On the other hand, it cannot be |W | = 1 as the only G3 where this happens is the one depicted in figure 2.b: this graph has 4 black nodes only while we assumed n ≥ 5. It follows that bi ≤ 2 thus completing the proof. ⊓⊔

Remark 1. The lower bound given in Lemma 4 is tight. For a given value n we can build a graph composed by l = ⌊n/3⌋ subgraphs closed on a necklace. The fist l − 1 subgraphs are repetitions of k3,3 − e while the last one can have 3, 4 or 5 black and white nodes depending on the value of n mod 3 (see figure 3 for an example where n mod 3 = 0). Any spanning tree has to assign degree greater than 1 to at least 2 black nodes in any subgraph but one. It follows that λ1 ≤ n − 2 ⌊n/3⌋+1= ⌈n/3⌉+ 1.

ε Fig. 3. G3.

Theorem 2 below immediately follows from Lemma 2 and Lemma 4.

Theorem 2. The problem of finding a Spanning Tree with the maximum number of black leaves for a cubic bipartite graph G3 can be approximated by an algorithm running in linear time with approximation ratio ≤ 3/2.

Remark 2. The analysis of Algorithm span3 is tight. Consider the example in figure 4: the graph in the example is a necklace composed by l repetitions of the same block. A run of algorithm span3 can produce a forest where every tree Ti contains two black nodes only (thick edges in figure represent the edges in the forest). In such a case, λ3 =2l and so λ1 =2l + 1, while the optimum solution can assign degree 3 to all but one the black nodes on the bottom, thus achieving 3l black leaves.

Fig. 4. Worst case example for Algorithm span3. 5 NP Hardness

In this section we prove that the problem of finding a spanning tree with the maximum number of black leaves in a 4-regular bipartite graph is NP-hard. Details of the extension of the proof for any fixed d ≥ 4 are omitted. Our proof relies in a reduction to a restricted version of the well known NP-complete problem 3-exact cover. 3-exact cover (in short 3EC) requires, given a universe U and a collection S of 3-subsets of U, to determine if there exists a subcollection ′ S of pairwise disjoint sets in S that forms a partition of U. We consider instances of 3EC where each element of U occurs in exactly three subsets of S and |U| = 4 · 3i, i ≥ 1. We denote 3EC∗ this restricted version of 3EC. It is known that 3EC remains NP-complete when each element of U occurs in at most 3 subsets of S (see, e.g., [7] pag. 222). From this fact it is a simple exercise to prove that 3EC is polynomially reducible to 3EC∗. Hence we have: Lemma 5. 3EC∗ is NP complete.

Theorem 3. The problem of determining, given G4, if there exists a spanning tree T of G4 such that λ2(T )= λ3(T )=0 is NP complete. Proof. Starting from any instance I of 3EC∗, we construct in polynomial time a graph G4, and the thesis follows from the fact that G4 admits a spanning tree T with λ2(T )= λ3(T ) = 0 if and only if I admits a solution. To build G4, we create a black node for each set in S. We add new nodes to form a tree whose leaves are the 4 · 3i black nodes representing sets in S. The tree is made by white nodes of degree 4 and internal black nodes of degree 2. More precisely the tree is rooted in a white node; even levels contain white nodes while odd levels contain black nodes. By construction, the tree will have 2i +1 levels and at level 2j + 1, j ≥ 0 there will be 4 · 3j black nodes (i.e., the levels with black nodes have an even number of nodes). Then we create a white node for each element of the universe U and we put an edge between the white node representing element ui ∈ U and the black leaf representing set Sj ∈ S if ui ∈ Sj .

a b

Fig. 5. a) Gadget Ga1. b) Gadget Ga2.

To complete the construction, we connect each of the white nodes represent- ing elements in U to the outgoing edges of a gadget Ga1 (see figure 5.a) and we connect each of the black nodes with degree 2 with a gadget Ga2 twice (see figure 5.b): this operation can always be done as the number of internal black nodes in the tree is even. The resulting graph is outlined in figure 6.

Ga2 Ga2

Ga2 Ga2 Ga2 Ga2 Ga2 Ga2

S

U

Ga1 Ga1 Ga1 Ga1 Ga1 Ga1

Fig. 6. Construction of graph for the reduction. Edges between black nodes representing sets in S and white nodes representing elements of U are missing.

Given a solution for I, it is easy to derive a spanning tree for G4, having black nodes of degree 4 and 1 only, by assigning degree 4 to all the black nodes connected to gadgets Ga2 and to the black nodes representing sets forming the ∗ solution for I. On the other hand, if a spanning tree T of G4 exists having black nodes of degree 4 and 1 only, a solution for I can always be found. First notice that in any such T ∗, all the black nodes connected to gadgets Ga2 must have degree 4, as nodes inside the gadgets must be connected to the tree and black nodes inside the gadgets cannot be assigned degree 4 without resulting in some black node of degree 2 or 3. As a result, all the black nodes representing sets of ∗ S in G4 that have degree 4 in T are connected from above, thus they cannot be adjacent in T ∗ to the same white node representing an element of U without creating a loop (i.e., if we consider the sets represented by black nodes of degree 4 in T ∗, they are pairwise disjoints). Finally, all the white nodes representing elements in U need to be connected from above in T ∗, as connecting a white node from below means passing trough a gadget Ga1, thus producing at least a black node of degree 2 or two of degree 3. It follows that starting from T ∗, a solution for I can be found by taking all sets of S corresponding to black nodes of degree 4 in T ∗. ⊓⊔

Corollary 1. The problem of finding a spanning tree with the maximum number of black leaves is NP hard for 4-regular bipartite graphs. Proof. Define the BLST problem as follows: given Gd and an integer k, determine if it exists a spanning tree T of Gd such that λ1(T ) ≥ k. It is easy to show that BLST on 4-regular bipartite graphs is NP complete. Indeed, from Theorem 3, we know that it is NP complete to determine if a given G4 has a spanning tree T such that λ2(T )= λ3(T ) = 0. It follows that the BLST problem is NP complete (d−2)n+1 when k = d−1 . Having an algorithm that resolves in polynomial time the problem of finding a spanning tree of a given G4 with the maximum number of black leaves, would allow us to solve BLST in polynomial time, thus proving the problem to be NP hard. ⊓⊔ All the proofs in this section can be extended to any value d ≥ 5. We omit the details. Theorem 4. The problem of finding a spanning tree with the maximum number of black leaves is NP hard for d-regular bipartite graphs for any fixed d ≥ 4.

6 Conclusions and Open Questions

Spanning trees of connected graphs are a major topic of research in the area of graph algorithms. In this paper we studied the problem of finding a spanning tree with the maximum number of black leaves in regular bipartite graphs. We proved that the problem is NP hard for any fixed d ≥ 4 and we presented a linear time algorithm that achieve approximation ratio 2 − 2/ (d − 1)2. It is an interesting question whether this problem is NP hard or polynomial time solvable in the case of cubic bipartite graphs. Our contribution for this class of graphs is a linear time algorithm with approximation ratio 1.5. Our proof of NP hardness relies on a construction involving regular bipartite graphs that are non planar. It remains an open question to determine if the problem remains NP hard for regular planar graphs also. In [12] it is shown that the problem is NP hard for planar non regular bipartite graphs. Finally, define lB(n, d) to be the maximum m such that every Gd with n black nodes has a spanning tree with at least m black leaves. Obviously lB(n, 2) = 1. n From Lemma 4 and Remark 1 we have l(n, 3) = ⌈ 3 ⌉ + 1. It would be interesting to determine precisely the value lB(n, d) for any d ≥ 4. We know that

d − 1 (d − 1)2 d − 2 n + ≤ l (n, d) ≤ n +1 2d 2d B d & '   where the lower bound follows from Lemma 4 and the upper bound can be obtained by generalizing Remark 1, using as a building block for the necklace the graph kd,d − e.

References

1. H. L. Bodlaender. On linear time minor tests and depth first search. In Frank K. H. A. Dehne, J¨org-R¨udiger Sack, and Nicola Santoro, editors, WADS, volume 382 of Lecture Notes in Computer Science, pages 577–590. Springer, 1989. 2. Paul S. Bonsma, Tobias Br¨uggemann, and Gerhard J. Woeginger. A faster fpt algorithm for finding spanning trees with many leaves. In Branislav Rovan and Peter Vojt´as, editors, MFCS, volume 2747 of Lecture Notes in Computer Science, pages 259–268. Springer, 2003. 3. Yair Caro, Douglas B. West, and Raphael Yuster. Connected domination and span- ning trees with many leaves. SIAM Journal on Discrete , 13(2):202– 211, 2000. 4. Guoli Ding, Thor Johnson, and Paul Seymour. Spanning trees with many leaves. Journal of , 37:189–197, 2001. 5. Tetsuya Fujie. The maximum-leaf spanning tree problem: Formulations and facets. Networks, 43(4):212–223, 2004. 6. G. Galbiati, F. Maffioli, and A. Morzenti. A short note on the approximability of the maximum leaves spanning tree problem. Inf. Process. Lett., 52(1):45–49, 1994. 7. Michael R. Garey and David S. Johnson. Computers and Intractability; A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1979. 8. Jerrold R. Griggs, Daniel J. Kleitman, and Aditya Shastri. Spanning trees with many leaves in cubic graphs. Journal of Graph Theory, 13(6):669–695, 1989. 9. Jerrold R. Griggs and Mingshen Wu. Spanning trees in graphs of minimum degree 4 or 5. Discrete Math., 104(2):167–183, 1992. 10. Daniel J. Kleitman and Douglas B. West. Spanning trees with many leaves. SIAM Journal on Discrete Mathematics, 4(1):99–106, 1991. 11. Paul Lemke. The maximum leaf spanning tree problem for cubic graphs is np- complete. Technical Report IMA Preprint Series # 428, University of Minnesota, July 1988. 12. P. C. Li and M. Toulouse. Variations of the maximum leaf spanning tree problem for bipartite graphs. Inf. Process. Lett., 97(4):129–132, 2006. 13. Krzysztof Lorys and Grazyna Zwozniak. Approximation algorithm for the maxi- mum leaf spanning tree problem for cubic graphs. In Rolf H. M¨ohring and Rajeev Raman, editors, ESA, volume 2461 of Lecture Notes in Computer Science, pages 686–697. Springer, 2002. 14. Hsueh-I Lu and R. Ravi. The power of local optimizations: Approximation al- gorithms for maximum-leaf spanning tree (draft)*. Technical Report CS-96-05, 1996. 15. Hsueh-I Lu and R. Ravi. Approximating maximum leaf spanning trees in almost linear time. J. Algorithms, 29(1):132–141, 1998. 16. M. Sohel Rahman and M. Kaykobad. Complexities of some interesting problems on spanning trees. Inf. Process. Lett., 94(2):93–97, 2005. 17. Roberto Solis-Oba. 2-approximation algorithm for finding a spanning tree with maximum number of leaves. In Gianfranco Bilardi, Giuseppe F. Italiano, Andrea Pietracaprina, and Geppino Pucci, editors, ESA, volume 1461 of Lecture Notes in Computer Science, pages 441–452. Springer, 1998. 18. J.A. Storer. Constructing full spanning trees for cubic graphs. Inf. Process. Lett., 13:8–11, 1981.