Independent b-matching Approximation Algorithm

with Applications to Peer-to-Peer Networks

A thesis submitted to the

Graduate School

of the University of Cincinnati

in partial fulfillment of the

requirements for the degree of

Master of Science

in the Department of Electrical Engineering and Computer Science

of the College of Engineering

Christopher S. Ochs

B.S. Computer Science University of Cincinnati

June 2020

Committee Chair:

Kenneth A. Berman, Ph.D. Fred S. Annexstein, Ph.D. Anca L. Ralescu, Ph.D.

Abstract

Finding matchings in a graph, and their generalization to b-matchings, has numerous real-world applications. Given a mapping b of the vertex set of a graph G to the nonnegative integers, a perfect b-matching is a mapping M from the edge set to the nonnegative integers such that the sum of the values of M over all edges incident with each vertex v equals b(v). Two b-matchings are independent if their supports, i.e., the edges with nonzero values, are disjoint. In this thesis we show that the problem of determining the existence of two independent perfect b-matchings is NP-complete, even when restricted to the complete graph.

However, in the case where no b-value is too large, i.e., at most one-sixth the sum of the b-values over all the vertices, we present an O(n log n) algorithm for computing two independent perfect b-matchings in the complete graph, where n denotes the number of vertices.

In addition, we present an efficient approximation algorithm, b-dominator, for maximum b-matchings, with application to computing an approximation to independent maximum b-matchings. We report on empirical research we conducted on the quality of the approximation achieved. Our results show that algorithm b-dominator performed best when the variance in the degree of the vertices is low, the average degree is sufficiently large and the 2-hop cover of the vertices is evenly distributed over the graph. Finally, we discuss the possibility of using algorithm b-dominator for reliable peer-to-peer network backup.

Acknowledgements

This opportunity to work on this problem and perform research for this thesis has been a great experience. The journey has been full of excitement, as I have had the pleasure to deeply study my favorite area of computer science, graph data structure applications.

First off, I would like to thank my advisor, Dr. Kenneth Berman, whose help and guidance have allowed me to succeed in this research. I will forever be thankful for the great patience and persistence he showed in helping me understand and thrive in this research.

I would also like to thank Dr. Fred Annexstein and Dr. Anca Ralescu. As members of the committee they have provided excellent perspectives and key insights that have been invaluable in the completion of this thesis.

I would like to thank my classmates for always being available to brainstorm ideas and willing to discuss my results. Their knowledge, perspectives and help have been crucial in the completion of the empirical research.

I would finally like to thank my significant other, Kayla. Through a very challenging and stress-filled year, she has helped guide me with compassion and grace.

Thank you all…

Table of Contents

Chapter 1: Introduction

1.1 Overview

1.2 Goals

1.3 Organization of Remaining Chapters

Chapter 2: Background

2.1 Graphs

2.2 Graph Types

2.2.1 Complete Graph

2.2.2 Traditional Random Graphs

2.2.3 Barabási-Albert Graph

2.2.4 Watts-Strogatz Small World

2.3 Sum of Subsets

2.4 Approximation Algorithms

2.5 Back-up Schemes

2.6 B-matching

Chapter 3: b-matching

3.1 b-dominator Approximation Algorithm

3.2 Empirical Analysis of b-dominator

3.2.1 Empirical Analysis Setup

3.2.2 Analysis of Complete Graph

3.2.3 Analysis of Random Graph

3.2.4 Analysis of Barabási-Albert Graph

3.2.5 Analysis of Watts-Strogatz Graph

3.2.6 Analysis of Special Cases

3.3 Discussion

Chapter 4: Difficulty of Independent Matchings

Chapter 5: Empirical Analysis of b-dominator for Independent Matchings

5.1 b-dominator for Independent b-matchings

5.2 Empirical Analysis of Independent b-matchings

5.2.1 Empirical Analysis Setup

5.2.2 Analysis of Complete Graph

5.2.3 Analysis of Random Graph

5.2.4 Analysis of Barabási-Albert Graph

5.2.5 Analysis of Watts-Strogatz Small World

5.2.6 Analysis of Special Cases

5.3 Discussion

Chapter 6: Conclusion

Chapter 7: Further Research

List of Figures

FIGURE 1 - BARABÁSI-ALBERT PREFERENTIAL ATTACHMENT GRAPH GENERATED USING NETWORKX

FIGURE 2 - RING GRAPH N=10 K=6 CREATED USING NETWORKX

FIGURE 3 - REWIRING OF RING GRAPH IN FIGURE 2 CREATED USING NETWORKX

FIGURE 4 - RANDOM GRAPH N=5 M=8 WITH B-VALUES CREATED USING NETWORKX

FIGURE 5 - A PERFECT B-MATCHING OF FIGURE 2'S GRAPH CREATED USING NETWORKX

FIGURE 6 - STRUCTURE OF PULLEYBLANK'S ALGORITHM CREATED BY MÜLLER-HANNEMANN AND SCHWARTZ

FIGURE 7 - EDGE PROBABILITY VS RECOVERY PERCENTAGE IN PROBABILITY BASED RANDOM GRAPH

FIGURE 8 - EDGE COUNT VS RECOVERY PERCENTAGE IN EDGE BASED RANDOM GRAPH

FIGURE 9 - INITIAL ATTACHMENTS VS RECOVERY PERCENTAGE BARABÁSI-ALBERT

FIGURE 10 - REWIRING PROBABILITY VS RECOVERY PERCENTAGE WATTS-STROGATZ SMALL WORLD GRAPH

FIGURE 11 - COMPLETE GRAPH INDEPENDENT B-MATCHING RECOVERY PERCENTAGE

FIGURE 12 - PROBABILITY BASED RANDOM GRAPH INDEPENDENT B-MATCHING RECOVERY PERCENTAGE

FIGURE 13 - EDGE BASED RANDOM GRAPH INDEPENDENT B-MATCHING RECOVERY PERCENTAGE

FIGURE 14 - BARABÁSI-ALBERT GRAPH INDEPENDENT B-MATCHING RECOVERY PERCENTAGE

FIGURE 15 - WATTS-STROGATZ GRAPH WITH INITIAL DEGREE 10, INDEPENDENT B-MATCHING RECOVERY PERCENTAGE

FIGURE 16 - WATTS-STROGATZ GRAPH WITH INITIAL DEGREE 15, INDEPENDENT B-MATCHING RECOVERY PERCENTAGE

FIGURE 17 - WATTS-STROGATZ GRAPH WITH INITIAL DEGREE 20, INDEPENDENT B-MATCHING RECOVERY PERCENTAGE

FIGURE 18 - WATTS-STROGATZ GRAPH WITH INITIAL DEGREE 25, INDEPENDENT B-MATCHING RECOVERY PERCENTAGE

List of Equations

EQUATION 1 - SET PARTITION INTO SUBSETS OF EQUAL SUM

EQUATION 2 - DEFINITIONS FOR B-DOMINATOR ALGORITHM

EQUATION 3 - PSEUDO CODE FOR B-DOMINATOR APPROXIMATION ALGORITHM

EQUATION 4 - PSEUDO CODE FOR B-DOMINATOR APPLIED TO INDEPENDENT MATCHINGS

List of Tables

TABLE 1 - COMPLETE GRAPH B-DOMINATOR PERFORMANCE RESULTS

TABLE 2 - PROBABILITY-BASED RANDOM GRAPH PERFORMANCE RESULTS

TABLE 3 - EDGE BASED RANDOM GRAPH B-DOMINATOR PERFORMANCE RESULTS

TABLE 4 - BARABÁSI-ALBERT PREFERENTIAL ATTACHMENT GRAPH B-DOMINATOR PERFORMANCE RESULTS

TABLE 5 - SMALL WORLD GRAPH B-DOMINATOR PERFORMANCE RESULTS

TABLE 6 - NEAR-COMPLETE GRAPH B-DOMINATOR PERFORMANCE RESULTS

TABLE 7 - RING GRAPH B-DOMINATOR PERFORMANCE RESULTS

TABLE 8 - PERFORMANCE OF B-DOMINATOR WITH A DOMINATING VERTEX

TABLE 9 - COMPLETE GRAPH PERFORMANCE OF B-DOMINATOR FOR FINDING INDEPENDENT B-MATCHINGS

TABLE 10 - PROBABILITY BASED RANDOM GRAPH PERFORMANCE OF B-DOMINATOR FOR FINDING INDEPENDENT B-MATCHINGS

TABLE 11 - EDGE BASED RANDOM GRAPH PERFORMANCE OF B-DOMINATOR FOR FINDING INDEPENDENT B-MATCHINGS

TABLE 12 - BARABÁSI-ALBERT GRAPH PERFORMANCE OF B-DOMINATOR FOR FINDING INDEPENDENT B-MATCHINGS

TABLE 13 - WATTS-STROGATZ GRAPH PERFORMANCE OF B-DOMINATOR FOR FINDING INDEPENDENT B-MATCHINGS

TABLE 14 - NEAR-COMPLETE GRAPH PERFORMANCE OF B-DOMINATOR FOR FINDING INDEPENDENT B-MATCHINGS

TABLE 15 - RING GRAPH PERFORMANCE OF B-DOMINATOR FOR FINDING INDEPENDENT B-MATCHINGS

Chapter 1: Introduction

1.1 Overview

Consider a simple graph G = (V, E) with a vertex set V of size n and an edge set E of size m. Throughout this thesis, we will always assume that G has no self-loops or multiple edges. Let b represent a mapping from V to the nonnegative integers, let b(v) be the b-value of a vertex v, and let b(V) be the sum of the b-values over the vertex set. Given a mapping M from E to the nonnegative integers, let M(v) = ∑_{vw ∈ E} M(vw). A b-matching is a mapping M such that M(v) ≤ b(v) for all v ∈ V. A b-matching is considered a perfect b-matching if M(v) = b(v) for all v ∈ V.
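The definitions above can be checked mechanically. The following is a minimal sketch, assuming a matching is stored as a dictionary from edge tuples to nonnegative integers and the b-values as a dictionary keyed by vertex; the function names and the representation are illustrative choices, not part of the thesis implementation.

```python
def incident_sum(M, v):
    """M(v): the sum of M over all edges that contain vertex v."""
    return sum(value for (x, y), value in M.items() if v in (x, y))


def is_b_matching(b, M):
    """True when M(v) <= b(v) holds for every vertex v."""
    return all(incident_sum(M, v) <= b[v] for v in b)


def is_perfect_b_matching(b, M):
    """True when M(v) == b(v) holds for every vertex v."""
    return all(incident_sum(M, v) == b[v] for v in b)


# Hypothetical example: a path 1 - 2 - 3 with b-values 2, 3 and 1.
b = {1: 2, 2: 3, 3: 1}
M = {(1, 2): 2, (2, 3): 1}
print(is_b_matching(b, M), is_perfect_b_matching(b, M))  # True True
```

Storing only the edges with nonzero value keeps the support of the matching explicit, which is the quantity that must be disjoint for two b-matchings to be independent.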

A b-matching has an obvious application in a “peer-to-peer backup scheme”. Consider a peer-to-peer distributed network where peers exchange equal-size blocks of data with other peers to create a backup of this data. A scheme modeled this way is fair because each peer stores exactly the same amount of data it sends out for backup. A perfect block-exchange scheme is a set of block exchanges between peers such that all of the data can be backed up. Multiple redundant backups can be implemented if independent matchings are found, so lost data can be easily recovered after the failure of a vertex.

1.2 Goals

The research goal of this thesis is to explore the topic of b-matchings. This thesis has several sub-goals or milestones. They are as follows:

1. Develop an efficient approximation algorithm for finding a maximum b-matching in a graph, as well as an adaptation to multiple independent matchings.

2. Perform empirical research on the performance of the algorithm for finding a single b-matching in numerous types of graphs to stress test the algorithm’s ability to handle various scenarios.

3. Perform empirical research on the algorithm’s ability to find independent b-matchings and observe its ability to maximize each subsequent matching.

4. Show the NP-completeness of finding independent perfect b-matchings even in the complete graph.

5. Explore the possibility of using the algorithm for developing a peer-to-peer network in which bytes of data can be exchanged between peers for the use of a back-up scheme.

1.3 Organization of Remaining Chapters

The remainder of this thesis is structured as follows. Chapter 2 covers various background topics that are necessary to this thesis: graph theory terminology, graph generators, sum of subsets, b-matchings and other topics. Chapter 3 introduces an approximation algorithm for b-matchings, sets up an empirical experiment for single matchings and provides the results of the research. Chapter 4 provides a proof of the NP-completeness of the partition problem and the partition of equal sized subsets, and a proof of the difficulty of finding two independent b-matchings in a single graph. Chapter 5 introduces a modification of the approximation algorithm for independent b-matchings and sets up an empirical experiment for multiple independent matchings. Chapter 6 provides a conclusion of the results in Chapter 3 and Chapter 5 as well as the significance of the findings. Chapter 7 provides insights for further research and possible extension of the approximation algorithm.

Chapter 2: Background

This chapter encompasses a wide range of topics relevant to the goal of this research. A standardization of the terminology used throughout this thesis is provided to remove the ambiguity that appears due to the interrelation between the fields of Network Science and Graph Theory. A brief overview of the types of graphs that will be analyzed later, as well as an explanation of their generators, is given to set up the empirical research. A key concept used when analyzing independent matchings, sum-of-subsets, is introduced. Next, we provide an overview of approximation algorithms and data backup schemes in peer-to-peer networks. Lastly, b-matchings are introduced along with a history of various algorithms used to find b-matchings.

2.1 Graphs

Graphs are the key data structure used to obtain the results in this thesis. Therefore, understanding the results requires understanding the fundamental terminology of a graph. A graph is a very common structure used to represent a set of relationships between different entities. A vertex, v, is a single object within the graph that, depending on the parameters defined, can form relationships with other vertices and with itself. An edge, e, is a relationship defined between a pair of vertices. A graph, G, is defined by a vertex set, V, and an edge set, E. The number of edges containing an individual vertex is known as the vertex’s degree, commonly denoted as k. A graph where the edges have an orientation, creating a one-way connection between the vertices, is referred to as a directed graph. On the other hand, an undirected graph has bi-directional edges between vertices. In Network Theory, a vertex, an edge, and a graph are referred to as a node, a link, and a network respectively. Throughout this thesis, the standard Graph Theory terminology of vertex, edge, and graph will be used exclusively.

2.2 Graph Types

Several graph types are used throughout the empirical research and are described in the following sub-sections. All of the graphs discussed are assumed to be undirected and simple and to contain no isolated vertices. All the types of graphs below are fully detailed in Albert-László Barabási’s book Network Science. [1]

2.2.1 Complete Graph

The complete graph is quite trivial. The complete graph generator simply takes the number of vertices, n, and creates edges at all the possible positions. In other words, an edge exists between every pair of vertices. In the complete graph there are n(n − 1)/2 edges and the average degree is n − 1.
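As a small illustration of these counts, the sketch below enumerates the edges of a complete graph with the standard library rather than NetworkX; the function name is a hypothetical helper introduced only for this example.

```python
from itertools import combinations


def complete_graph_edges(n):
    """Return one edge for every unordered pair of the vertices 0..n-1."""
    return list(combinations(range(n), 2))


n = 6
edges = complete_graph_edges(n)
print(len(edges))  # 15, which equals n(n - 1)/2

# Every vertex appears in exactly n - 1 edges.
degree = {v: sum(v in e for e in edges) for v in range(n)}
print(set(degree.values()))  # {5}
```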

2.2.2 Traditional Random Graphs

There are two traditional types of generators used to create a random graph, one using a fixed number of edges and one using a probability to create edges. The first model generates a graph using a fixed number of vertices, n, and a second fixed integer, m, which corresponds to the number of edges in the graph. The generator places the m edges at random among the n vertices so that every graph with n vertices and m edges is equally likely to be generated. This model was created and heavily explored by the renowned mathematicians Pál Erdős and Alfréd Rényi. [1]

The second model again uses a fixed number of vertices, n, but the second parameter, p, represents a probability between 0 and 1. This model starts with n isolated vertices and then considers each candidate edge for creation according to the probability. The n(n − 1)/2 candidate edges are all the pairs of vertices. For each candidate edge, a random value between 0 and 1 is drawn from a random number generator. If this random value is less than or equal to p, then the edge is created and added to the graph. In rare cases, this model does allow for the creation of isolated vertices. However, given a sufficiently large n and a moderate p, the probability of isolated vertices is low because all n − 1 candidate edges incident to a vertex must be assigned a value greater than p. In extreme cases, the model also allows the generation of the complete or empty graph. This is quite improbable given that n is sufficiently large and p is not extremely close to either 0 or 1. [1]
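The two generators described above can be sketched in a few lines. This is an illustrative version built on the standard `random` module, not the NetworkX generators used for the experiments; the function names and the explicit `rng` argument are assumptions made for the example.

```python
import random
from itertools import combinations


def random_graph_fixed_edges(n, m, rng):
    """Pick m of the n(n-1)/2 vertex pairs uniformly at random."""
    candidates = list(combinations(range(n), 2))
    return rng.sample(candidates, m)


def random_graph_edge_probability(n, p, rng):
    """Keep each candidate pair when its random draw is at most p."""
    return [e for e in combinations(range(n), 2) if rng.random() <= p]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
print(len(random_graph_fixed_edges(10, 15, rng)))        # 15
print(len(random_graph_edge_probability(10, 1.0, rng)))  # 45
```

Neither helper removes isolated vertices; in the setup described later in this thesis that is a separate step performed after generation.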

2.2.3 Barabási-Albert Graph

The Barabási-Albert model uses a concept called preferential attachment to generate the graph. [1] Preferential attachment is a concept added to the generation process that is able to better represent a social network than a traditional random graph. Under preferential attachment, new vertices entering an existing graph are more likely to form new edges with the vertices that have higher degrees. This form of graph generation relies upon an already existing small graph. The concept of a time step, where each vertex is added sequentially, is critical to the generator of a Barabási-Albert graph. The sequential creation allows the generator to evaluate the graph every time a vertex is to be added and determine its adjacent vertices. An example of a graph generated using the Barabási-Albert model can be seen in Figure 1, where vertices are labeled with their degree. The maximum degree of a Barabási-Albert graph varies drastically from the average degree. This attribute is a significant difference from the traditional random graphs, whose maximum degree is often very close to the average degree.

Figure 1 - Barabási-Albert Preferential Attachment graph generated using NetworkX
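Preferential attachment can be sketched as follows. This is a simplified illustration rather than the NetworkX generator: it assumes the seed graph is a complete graph on m + 1 vertices and that each new vertex attaches to m distinct existing vertices, and the function name is hypothetical.

```python
import random


def preferential_attachment_graph(n, m, rng):
    """Grow a graph to n vertices; each new vertex adds m degree-biased edges."""
    # Assumed seed: a complete graph on the first m + 1 vertices.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each vertex appears in this list once per incident edge, so a uniform
    # draw from it selects a vertex with probability proportional to degree.
    endpoints = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((t, new))
            endpoints.extend((t, new))
    return edges


edges = preferential_attachment_graph(20, 2, random.Random(1))
print(len(edges))  # 37: 3 seed edges plus 2 edges for each of the 17 new vertices
```

Keeping the flat `endpoints` list avoids recomputing degrees at every time step, at the cost of memory proportional to the number of edges.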

2.2.4 Watts-Strogatz Small World

The previous graphs all model structured relationships between vertices. However, they fail to capture the small world property, or small world phenomenon, that is present in normal human relations. The small world property states that the average distance between vertices is proportional to log(n). [2] Watts and Strogatz introduced this idea in a letter to the journal Nature. The small world phenomenon is popularly known as the “six degrees of separation”, which states that all humans are at most six relationships away from one another. The concept of the six degrees of separation is commonly credited to Hungarian author Frigyes Karinthy.

Watts and Strogatz developed an algorithm to generate a graph that has this small world property. The algorithm takes three parameters: the number of vertices N, the average degree K and a rewiring probability β. The algorithm begins by creating a ring graph such as the one in Figure 2. Every edge is then considered for rewiring with probability β. If an edge e = (u, w) is rewired, then it is removed and a new edge e′ = (u, w′) is added. The new vertex, w′, is selected at random from the vertex set V such that e′ is not an existing member of the edge set E and e′ doesn’t create a self-loop. The resulting graph exhibits the small world property with an average distance proportional to log(n). The rewiring of the graph in Figure 2 can be seen in Figure 3.

Figure 2 - Ring Graph n=10 k=6 created using NetworkX

Figure 3 - Rewiring of Ring Graph in Figure 2 created using NetworkX
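The ring construction and the rewiring step can be sketched as below. This is a minimal illustration, assuming an even degree k smaller than n and edges stored as two-element frozensets; it is not the NetworkX implementation, and the function names are hypothetical.

```python
import random


def ring_lattice(n, k):
    """Join every vertex to its k // 2 nearest neighbours on each side."""
    return {frozenset((v, (v + j) % n))
            for v in range(n) for j in range(1, k // 2 + 1)}


def small_world_graph(n, k, beta, rng):
    """Rewire each ring edge (v, w) to (v, w') with probability beta."""
    edges = ring_lattice(n, k)
    for v in range(n):
        for j in range(1, k // 2 + 1):
            old = frozenset((v, (v + j) % n))
            if old in edges and rng.random() < beta:
                # w' may not equal v (self-loop) or duplicate an existing edge.
                candidates = [w for w in range(n)
                              if w != v and frozenset((v, w)) not in edges]
                if candidates:
                    edges.remove(old)
                    edges.add(frozenset((v, rng.choice(candidates))))
    return edges


graph = small_world_graph(10, 6, 0.5, random.Random(3))
print(len(graph))  # 30: rewiring moves edges but never changes their number
```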

2.3 Sum of Subsets

A classical problem in computer science is the subset sum problem: given a set or multiset of integers, find a subset whose sum is equal to a specified value. This problem was proven to be NP-Complete by Richard M. Karp. [3] A solution is quite easy to verify, but difficult to find. Several approximation and pseudo-polynomial time algorithms have been developed for this problem. However, no deterministic polynomial time algorithm is known for the subset sum problem.

The partition problem is a variant of the subset sum problem. [4] The partition problem begins with the same premise as the subset sum problem, but rather than summing to a specified value, the goal is to split the set or multiset into two subsets whose sums are equal. The series of equations in Equation 1 illustrates a set S and the properties necessary to satisfy the partition problem.

\[
\begin{aligned}
S &= \{0, \dots, n\} \\
A &\subset S \\
B &\subset S \\
A \cap B &= \emptyset \\
S &= A \cup B \\
\sum A &= \sum B
\end{aligned}
\]

Equation 1 - Set partition into subsets of equal sum

S is any finite set or multiset. S is then partitioned into two disjoint sets whose sums are equal.

An extension of the partition problem that will be examined in this thesis is the partition of equal sized subsets (PESS). PESS maintains all the conditions of the partition problem outlined in Equation 1, but adds the constraint that the cardinality of A must equal the cardinality of B. In Chapter 4, we will show that PESS is indeed NP-Complete.
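To make the two decision problems concrete, the sketch below decides the partition problem with the usual pseudo-polynomial set of reachable sums and decides PESS by brute force over subsets of half the size. Both functions are illustrative helpers with hypothetical names; the brute-force search is exponential and is only meant for tiny inputs.

```python
from itertools import combinations


def can_partition(values):
    """Partition problem: can the values be split into two equal-sum parts?"""
    total = sum(values)
    if total % 2:
        return False
    reachable = {0}  # sums obtainable from the values seen so far
    for x in values:
        reachable |= {s + x for s in reachable}
    return total // 2 in reachable


def can_partition_equal_size(values):
    """PESS: additionally require both parts to have the same cardinality."""
    n, total = len(values), sum(values)
    if n % 2 or total % 2:
        return False
    return any(sum(part) == total // 2
               for part in combinations(values, n // 2))


print(can_partition([1, 1, 1, 3]))             # True: {3} and {1, 1, 1}
print(can_partition_equal_size([1, 1, 1, 3]))  # False: no two values sum to 3
```

The example shows that PESS is strictly more restrictive: an instance can admit an equal-sum partition while admitting none with equal cardinalities.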

2.4 Approximation Algorithms

The need for approximation algorithms arose from the need to find solutions for NP-Complete and NP-Hard problems. An approximation algorithm will not always find an optimal solution, but instead will find a solution that comes close to the optimal in most cases. In their book, The Design of Approximation Algorithms, Williamson and Shmoys define an approximation algorithm as follows: “An α-approximation algorithm for an optimization problem is a polynomial-time algorithm that for all instances of the problem produces a solution whose value is within a factor of α of an optimal solution.” [5]

For an approximation algorithm, the performance guarantee is represented by α. Other common terminology for the performance guarantee is the approximation ratio or the approximation factor. This performance guarantee is used to score an approximation algorithm to determine how well the algorithm approximates an optimal solution for the problem it is attempting to solve. However, the performance guarantee can only be calculated when an optimal solution is known and can be found.

When designing an approximation algorithm, it is crucial to determine whether the underlying problem is a minimization or a maximization problem. In the case of a maximization problem, the goal is to produce a solution whose value is as close as possible to, but no greater than, the optimal value. Therefore, an approximation algorithm that always produces a solution of at least a third of the value of the optimal solution has a performance guarantee of α = 1/3. A minimization algorithm that always produces a solution of at most two times the optimal solution’s value has a performance guarantee of α = 2. [5] In both cases, the closer the performance guarantee is to 1, the better the algorithm is able to approximate a solution.

2.5 Back-up Schemes

A back-up scheme is a system designed to allow users to safeguard important information from loss and corruption. A peer-to-peer backup scheme, pStore, was developed by Christopher Batten, Kenneth Barr, Arvind Saraf and Stanley Trepetin. [6] pStore allows users to join a peer-to-peer network of other peers who wish to back up their important files. The back-up scheme is designed with data security in mind: every file is split into several file blocks, which are tracked using metadata stored in file block lists. These file blocks are then encrypted by the pStore client using symmetric key generation and randomly dispersed across the network using a salt. pStore sends multiple copies of these data chunks to allow for recovery in the case of a node failure. In addition, it also allows for sophisticated space optimization and versioning through advanced techniques.

2.6 B-matching

An example of the process for finding a b-matching is described in this section. Figure 4 is a random graph generated using a fixed number of edges. Each vertex has been assigned a random b-value. Then, an algorithm computes the b-matching and assigns a weight to each of the edges in the matching. Figure 5 is a perfect b-matching, as all of the b-values have been successfully moved to the edges. This is the best-case scenario for a b-matching algorithm, but it is not always achievable.

Figure 4 - Random Graph n=5 m=8 with b-values created using NetworkX

Figure 5 - A Perfect b-matching of Figure 2's Graph created using NetworkX

An excellent survey paper written by Müller-Hannemann and Schwartz covers and implements numerous b-matching algorithms. [7] We will cover the highlights of the evolution of b-matching algorithms to illustrate the significance of the results shown in this thesis. The first algorithm to find a maximum b-matching was developed in 1965 by Edmonds. [8] The b-matching problem can be reduced to a 1-matching problem, but this increases the size of the problem. A 1-matching is simply a b-matching with b(v) = 1 for every vertex v. In large graphs, the reduction from a b-matching to a 1-matching is quite impractical due to the size of the graph that it creates. [9] Therefore, finding a solution using this methodology is not always a practical option.

Pulleyblank created a modification to a primal-dual algorithm to solve for a maximum b-matching. [10] The primal-dual algorithm is also commonly known as the Blossom algorithm, of which numerous revisions have been seen, the latest being Blossom V. [11] Pulleyblank’s algorithm constructs a linear programming problem with constraints, a dual and slack elements. The algorithm begins with an existing b-matching that is neither a perfect b-matching nor a maximum b-matching. Going forward, the algorithm attempts to progress towards meeting the conditions provided in the constraints and dual. The Blossom algorithm allows Pulleyblank’s algorithm to create pseudo-nodes by combining several nodes into a single node or petal, allowing the conditions to be more easily met by expanding and shrinking the graph. Once a condition is met, it is held invariant until the algorithm terminates. After the primal step is complete, the algorithm checks if a perfect b-matching has been achieved; otherwise the algorithm continues through the primal-dual loop until all the conditions are met and the maximum weight b-matching is returned. Figure 6, from Müller-Hannemann and Schwartz, illustrates the action of the algorithm.

Figure 6 - Structure of Pulleyblank's algorithm created by Müller-Hannemann and Schwartz [7]

A polynomial branch and cut algorithm was created by Padberg and Rao to solve the weighted b-matching problem using odd cut constraints. [12] Huang and Jebara approached the b-matching problem using belief propagation. Belief propagation allowed them to achieve an impressive space requirement of O(|V|) and a time complexity of O(|V|^2.5). The space improvement was most significant, as it made it possible to solve for a b-matching using a non-parallelized algorithm on a home computer, whereas the memory requirements of previous algorithms made this impractical. [13] The space savings originate from the idea of storing the belief for each edge in such a way that the previous belief can be recursively recomputed, so not all previous beliefs need to be stored. Thus, once the belief matrix is updated, the selected beliefs can be stored and the unnecessary data is discarded. This drastically reduces the memory requirements for the belief matrix relative to Huang and Jebara’s previous algorithm. The time savings come from exploiting the observation that each belief is the sum of a weight and an α or β value. [13] These values can be presorted before iterating over them so that the search does not have to continue past a certain stopping criterion. The sorting guarantees this, since the beliefs are bounded by certain parameters. This optimization presents unique challenges which are discussed by Huang and Jebara, but these issues can be overcome and lead to an improvement in time complexity from O(N^3) to O(N^2).

Recently, an approximation algorithm for b-matching named b-SUITOR was developed by a large collaboration of researchers [9]. Their algorithm is built upon the SUITOR algorithm released by Manne and Halappanavar [14]. However, the b-matching they solve for is constructed slightly differently than the b-matching discussed earlier in this thesis. In their graph, every vertex has a b(v) value which bounds the number of incident edges that can be added to the final matching. Every edge has a weight that is used to calculate the final weight of the b-matching, which is the sum of the weights of the edges in the final matching. Do not confuse this b(v) value with the b-value described earlier in this section.

The b-SUITOR algorithm works by maintaining, for each vertex v, a priority queue S of potential suitors. Note that the priority queue may not contain more than b(v) suitors. The algorithm iterates through each vertex, which proposes to the neighbors whose edges have the largest weights. The algorithm continues until no vertex’s priority queue is updated.

Chapter 3: b-matching

Throughout this chapter, we will introduce an approximation algorithm to find a b-matching on various connected graphs with b-values. This algorithm has been implemented in Python 3.7.4 using the NetworkX library. An empirical analysis of this approximation algorithm is provided in comparison to the optimal matching. The source code will be released in the appendix.

3.1 b-dominator Approximation Algorithm

The algorithm, b-dominator, was developed by Kenneth Berman in collaboration with Christopher Ochs. The algorithm produces an approximation of the maximum b-matching of a graph. Empirical research has shown that the performance guarantee of b-dominator is 3/4.

Succinctly, the algorithm works by iterating through each vertex and deciding which neighboring vertex is the best one to perform the block exchange operation with. The block exchange operation takes a pair of vertices that share an edge and reduces the b-value of each vertex by the exchange parameter value c, such that the input to the block exchange operation is (uv, c). The algorithm then adds the edge formed by the vertices, together with c, to the current matching. The algorithm terminates once there is no longer an edge that is eligible for the block exchange operation. The necessary definitions follow in Equation 2, along with the pseudo code for b-dominator in Equation 3.

G = (V, E) with b-values. For U ⊆ V:

\[
\begin{aligned}
b(U) &= \sum_{u \in U} b(u) \\
N(u) &= \{ v \mid uv \in E \} \\
\eta(u) &= b(N(u)) - b(u) \\
\mu(u) &= \min \{ \eta(v) \mid v \in N(u) \}
\end{aligned}
\]

Equation 2 - Definitions for b-Dominator algorithm

To better understand the b-dominator algorithm, a brief description of the definitions is provided. The b(U) value is simply the sum of all the b-values of the vertices contained in the subset U of V. The N(u) function returns the set of all the vertices that are neighbors of u. The η(u) value is the sum of the b-values of u’s neighbors minus the b-value of u. Finally, μ(u) is the minimum value of η(v) over the neighbors v of u.

M = ∅ (1)
While ∃ a vertex v such that η(v) ≤ 0 (2)
    For all u ∈ N(v) (3)
        Perform block exchange operation (uv, b(u)) (4)
        Add (uv, b(u)) to M (5)
While ∃ an edge uv such that b(u) and b(v) are not zero (6)
    x = min{b(u), b(v), μ(u)/2, μ(v)/2} (7)
    Perform block exchange operation (uv, x) (8)
    Add (uv, x) to M (9)
    While ∃ a vertex v such that η(v) = 0 (10)
        For all u ∈ N(v) (11)
            Perform block exchange operation (uv, b(u)) (12)
            Add (uv, b(u)) to M (13)
Output the b-matching determined by M (14)

Equation 3 - Pseudo code for b-Dominator approximation algorithm

The first part of the b-dominator algorithm begins by initializing an empty matching 푀 that will represent the b-matching found by the algorithm. Next, the algorithm iterates through all the vertices and checks if any are dominating their neighbors i.e., the b-value of a vertex is greater than the sum of its neighbors b-values. If a vertex 푢 is locally dominating or equal to the sum of its neighbors b-values, then the block exchange operation is performed with each of 푢’s neighbors and each edge is added to the matching 푀. This operation will lower the b-value of 푢 as much as possible, but not eliminate it.

Once b-dominator has been determined that no vertices are dominating, the algorithm proceeds to the next step. In the second part of the b-dominator algorithm, line 6, all the edges are iterated through and a candidate edge 푢푣 is selected such that 푏(푢) and 푏(푣) are not zero.

Then the exchange value $x$ is determined by selecting the minimum of the b-values of the two endpoints of the edge and half of the $\mu$ value of each endpoint, which ensures that a locally dominating vertex is not created. The block exchange is performed using the parameters $(uv, x)$ and the edge is added to the matching $M$. From here, line 10, the algorithm determines if there is a vertex $v$ whose $\eta(v)$ value is zero. If such a vertex exists, then the block exchange is performed with $v$ and all of its neighbors, causing the b-value of $v$ to be reduced to zero. After this, the algorithm determines if there is another candidate edge $uv$ such that $b(u)$ and $b(v)$ are non-zero. If there is another candidate edge, then the algorithm returns to line 7 and begins the process over again.

Once no candidate edge exists, the algorithm returns the matching $M$, which is an approximation of the maximum b-matching.
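The quantities used on line 7 of Equation 3 can be sketched in Python as follows. This is only an illustration under stated assumptions and not the implementation used in this thesis: the adjacency dictionary `adj`, the b-value dictionary `b` and the three function names are introduced here for the example, and the exchange value is left as a possibly fractional number because the text does not state how halves are rounded.

```python
def eta(adj, b, u):
    """Sum of the b-values of u's neighbors minus the b-value of u."""
    return sum(b[w] for w in adj[u]) - b[u]

def mu(adj, b, u):
    """Minimum eta value over the neighbors of u."""
    return min(eta(adj, b, v) for v in adj[u])

def exchange_value(adj, b, u, v):
    """Exchange value x = min{b(u), b(v), mu(u)/2, mu(v)/2} for the edge uv."""
    return min(b[u], b[v], mu(adj, b, u) / 2, mu(adj, b, v) / 2)

# Hypothetical example data: a triangle on the vertices 0, 1 and 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
b = {0: 4, 1: 3, 2: 5}
x = exchange_value(adj, b, 0, 1)
```

In this small example no vertex is dominating, since every eta value is positive, and the halved mu values are what limit the exchange on the edge between vertices 0 and 1.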

3.2 Empirical Analysis of b-dominator

An empirical analysis of the performance of the b-dominator was performed. The setup of the testing is outlined in this section along with a discussion of the results at the end of this chapter.

3.2.1 Empirical Analysis Setup

To perform this empirical analysis, the NetworkX Python library was used because it provides an easy-to-use graph object along with numerous optimized helper methods.

Analysis was performed on the five types of graphs discussed in Chapter 2: the complete graph, probability random graph, edge based random graph, Barabási-Albert preferential attachment graph and the Watts-Strogatz small world graph. To perform an empirical analysis of the b-dominator algorithm, a metric to score the effectiveness of b-dominator is required. The metric created is called the recovery percentage of a b-matching. The sum of all of the b-values of the edges in the matching is the recovery value. The recovery percentage of a b-matching is the recovery value of the matching output by b-dominator divided by the optimal recovery value of the perfect b-matching. Thus, the closer the recovery percentage is to one, the better the algorithm performed on the graph.

Each type of graph is generated using a NetworkX generator method and isolated vertices are removed. Random weights are then added to each of the edges in the graph, with the maximum weight at most 1000 times the minimum weight. Next, the weights are moved in the reverse of the block exchange operation: the weights of all the edges incident to a vertex are summed together and this sum is the b-value of that vertex. Because the b-values are derived from the edge weights in this way, a perfect b-matching is guaranteed to exist. The graph with b-values is then passed to the b-dominator algorithm and a matching is returned. This matching is then scored using the recovery percentage metric.
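The construction of the b-values and the scoring step can be sketched as follows. This is a minimal illustration, not the code used for the experiments: the function names, the uniform integer weights in the range 1 to 1000 and the four-vertex example graph are assumptions made here, and the graph is a plain edge list rather than a NetworkX object so that the sketch has no dependencies.

```python
import random

def assign_b_values(edges, rng, low=1, ratio=1000):
    """Give each edge a random integer weight in [low, low * ratio] and set
    b(v) to the sum of the weights of the edges incident to v."""
    weights = {e: rng.randint(low, low * ratio) for e in edges}
    b = {}
    for (u, v), w in weights.items():
        b[u] = b.get(u, 0) + w
        b[v] = b.get(v, 0) + w
    return weights, b

def recovery_percentage(matching, weights):
    """Recovery value of the matching divided by the optimal recovery value,
    which here is the total of the weights used to build the b-values."""
    return sum(matching.values()) / sum(weights.values())

rng = random.Random(0)                    # fixed seed for repeatability
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # hypothetical 4-cycle
weights, b = assign_b_values(edges, rng)
perfect_score = recovery_percentage(weights, weights)
```

The weights themselves form a perfect b-matching for the derived b-values, so scoring them against themselves gives a recovery percentage of one; a matching returned by an approximation algorithm would be passed as the first argument instead.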

The recovery percentage depends heavily upon the layout of the graph. Because of this, numerous input parameters for the generators are tested when performing this empirical analysis. Specific values are provided in the subsection corresponding to each graph type. Every graph is generated 1000 times with each set of input parameters and then the best, worst, and average recovery percentages are reported.

3.2.2 Analysis of Complete Graph

The b-dominator algorithm will always return a perfect b-matching if it exists in the complete graph, or else it will return a maximum b-matching of the graph. Note that the only way a perfect matching will not exist is if $b(V)$ is odd, but due to the way the b-values are constructed, this is not an issue. However, to ensure consistency across this empirical test, the complete graph is still tested. Complete graphs with 50, 75, 100, 150, 200 and 250 vertices are tested and the results can be seen in Table 1.

Complete Graph
Vertices   Avg. Recovery   Best Case    Worst Case
50         100.0000%       100.0000%    100.0000%
75         100.0000%       100.0000%    100.0000%
100        100.0000%       100.0000%    100.0000%
150        100.0000%       100.0000%    100.0000%
200        100.0000%       100.0000%    100.0000%
250        100.0000%       100.0000%    100.0000%

Table 1 - Complete graph b-dominator performance results

3.2.3 Analysis of Random Graph

There are two types of standard random graphs, the edge based and the probability based graph. First, the probability based random graph was tested with a constant vertex count of 100 and a varying edge probability of 0.4, 0.5, 0.6, 0.7 and 0.8. This range of probabilities allowed for a sufficient representation of the graph without approaching too close to either the complete graph or an extremely sparse graph. The results can be seen in Table 2, along with a plot of the edge probability and recovery percentages in Figure 7. The edge based random graph is tested with a vertex count of 100 and varying edge counts of 300, 400, 500, 1000, 2000, 2500 and 4500. This range of edge counts is again selected to cover the range of graphs from the sparse to the dense. The results can be seen in Table 3 along with a plot of the edge count and recovery percentages in Figure 8.

Probability Based Random Graph
Vertices   Edge Prob.   Recovery Percentage   Best Case   Worst Case
100        40%          98.6329%              100%        94.8552%
100        50%          98.9515%              100%        95.7113%
100        60%          99.2213%              100%        97.1098%
100        70%          99.4486%              100%        97.1030%
100        80%          99.6391%              100%        97.1738%

Table 2 - Probability-based random graph performance results

[Plot omitted: Edge Probability vs Recovery Percentage; x-axis: Edge Probability, y-axis: Recovery Percentage]

Figure 7 – Edge probability vs recovery percentage in probability based random graph

Edge Based Random Graph
Vertices   Edges   Recovery Percentage   Best Case    Worst Case
100        300     91.5581%              97.3158%     84.0232%
100        400     93.1899%              98.2813%     86.7066%
100        500     94.4974%              98.4696%     88.3413%
100        1000    97.0723%              100.0000%    92.5400%
100        2000    98.6378%              100.0000%    95.2805%
100        2500    98.9709%              100.0000%    94.9816%
100        4500    99.8269%              100.0000%    97.9124%

Table 3 - Edge based random graph b-dominator performance results

[Plot omitted: Edge Count vs Recovery Percentage; x-axis: Edge Count, y-axis: Recovery Percentage]

Figure 8 - Edge count vs recovery percentage in edge based random graph

3.2.4 Analysis of Barabási-Albert Graph

The Barabási-Albert preferential attachment graph was selected due to its scale-free nature. A scale-free graph is a graph with a degree distribution that follows a power law. [1] A scale-free network will have several vertices with a degree that is significantly greater than the average degree of the graph, creating a graph with a hub topology. A hub topology is commonly used to model the networks created by the world wide web. The graph of the world wide web is constructed using hyperlinks as directed edges and webpages as vertices. Scale-free networks can also model social networks in certain cases: when individual accounts have millions of followers, a hub topology is expected to be observed.

However, a recent journal article, “Scale-free networks are rare” by Anna Broido and Aaron Clauset, reported an empirical study of the scale-free properties of social networks. The results showed that only half of the social networks exhibit a scale-free property, and that those that do exhibit scale-freeness do so poorly. [15] The results of that paper do not impact the analysis performed in this thesis; rather, they question whether or not the Barabási-Albert model is a good representation of numerous social networks. This question has been studied rigorously and a consensus has not been reached, as there are papers which both support and refute this claim. [15]

The analysis of the Barabási-Albert graph is performed with a constant vertex count of 100 and a varying initial attachment factor of 2, 4, 6, 8, 10, 15 and 20. The generator starts with a vertex and performs the time step operations of adding vertices to the graph, each with a number of edges equal to the initial attachment factor, until the number of vertices is 100. The neighbors of each incoming vertex are chosen according to the degrees of the existing vertices: the higher the degree of an existing vertex, the more probable it is to become a neighbor of the incoming vertex. The attachment factors were selected to model both a sparse and a dense scale-free graph. This allows for a sufficient sampling of the performance of b-dominator across numerous Barabási-Albert graphs. The results can be seen in Table 4, along with a plot of the initial attachment and recovery percentages in Figure 9.

Barabási-Albert Preferential Attachment
Vertices   Attachment   Edges   Recovery Percentage   Best Case   Worst Case
100        2            196     82.8168%              89.7190%    74.9843%
100        4            384     88.8868%              93.8470%    81.7763%
100        6            564     91.7570%              96.2269%    86.1071%
100        8            736     93.4729%              97.7682%    88.5160%
100        10           900     94.5785%              98.7678%    90.5383%
100        15           1275    96.2902%              99.3301%    91.4599%
100        20           1600    97.2203%              99.6399%    93.1196%

Table 4 - Barabási-Albert preferential attachment graph b-dominator performance results

[Plot omitted: Barabási-Albert Preferential Attachment; x-axis: Initial Attachment, y-axis: Recovery Percentage]

Figure 9 - Initial attachments vs recovery percentage Barabási-Albert

3.2.5 Analysis of Watts–Strogatz Graph

The best way to model the phenomenon of the six degrees of separation is using the graph model created by Watts and Strogatz, often referred to as a small world graph, which captures the small world phenomenon. Because these graphs are generated with two varying parameters, the performance analysis generates significantly more data. Thus, these results are summarized slightly differently than those of the previous graphs, with all data plotted on a single graph with multiple series.

The Watts-Strogatz small world graph is tested with a constant vertex count of 100 along with a pair of varying parameters, the starting degree and the rewiring probability. The starting degree is selected from the set 3, 5, 10, 15, 20, 25 and the rewiring probability is selected from twenty-five percent to seventy-five percent, incrementing by five percent. Each combination of starting degree and rewiring probability is tested. The abridged results can be seen in Table 5, along with a plot of the rewiring probability and recovery percentages in Figure 10.

[Plot omitted: Watts-Strogatz Small World Graph; x-axis: Rewiring Probability, y-axis: Recovery Percentage; one series each for Initial Degree 3, 5, 10, 15, 20 and 25]

Figure 10 – Rewiring probability vs Recovery percentage Watts-Strogatz small world graph

Watts-Strogatz Small World Graph
Initial Degree   Rewiring Probability   Recovery    Best Case    Worst Case
3                0.25                   94.5097%    98.9830%     87.9979%
3                0.35                   92.8293%    97.9196%     86.5158%
3                0.45                   91.3594%    96.2581%     85.6845%
3                0.55                   90.0245%    96.3228%     82.2309%
3                0.65                   88.8218%    95.6273%     81.2631%
3                0.75                   87.9588%    93.5805%     81.4637%
5                0.25                   97.9180%    100.0000%    93.6328%
5                0.35                   96.8503%    99.8461%     91.7136%
5                0.45                   95.5806%    99.1089%     89.0813%
5                0.55                   94.4324%    98.8326%     88.6701%
5                0.65                   93.2197%    98.7454%     85.6520%
5                0.75                   92.0510%    96.7534%     85.6756%
10               0.25                   99.5685%    100.0000%    97.5028%
10               0.35                   99.3376%    100.0000%    96.3159%
10               0.45                   98.9782%    100.0000%    95.8037%
10               0.55                   98.5685%    100.0000%    94.7504%
10               0.65                   97.9080%    100.0000%    92.5875%
10               0.75                   97.0502%    100.0000%    91.3995%
15               0.25                   99.5712%    100.0000%    97.5868%
15               0.35                   99.3988%    100.0000%    96.8293%
15               0.45                   99.2051%    100.0000%    96.6759%
15               0.55                   98.9577%    100.0000%    95.9677%
15               0.65                   98.5083%    100.0000%    95.6395%
15               0.75                   97.8860%    100.0000%    93.9402%
20               0.25                   99.5558%    100.0000%    97.1431%
20               0.35                   99.3956%    100.0000%    95.7471%
20               0.45                   99.2207%    100.0000%    96.8619%
20               0.55                   99.0526%    100.0000%    96.3610%
20               0.75                   98.3751%    100.0000%    94.8502%
25               0.25                   99.5716%    100.0000%    97.3777%
25               0.35                   99.4336%    100.0000%    96.8318%
25               0.45                   99.2876%    100.0000%    96.8462%
25               0.55                   99.0758%    100.0000%    96.6382%
25               0.65                   98.8860%    100.0000%    95.4808%
25               0.75                   98.5912%    100.0000%    94.8753%

Table 5 - Small world graph b-dominator performance results

3.2.6 Analysis of Special Cases

There are specific scenarios that the previous analysis did not cover and that require the graph to be crafted specifically for the test rather than produced by a predefined generator. These scenarios cover edge cases that test the capability of b-dominator in poor conditions, as well as other interesting cases.

The first scenario that will be tested is the near-complete graph. The near-complete graph is the complete graph with a single edge removed and will have $\frac{n(n-1)}{2} - 1$ edges. The goal of this test is to identify the importance of the final edge in relation to achieving a perfect b-matching. The near-complete graph will be tested with varying vertex counts of 50, 75, 100, 150, 200 and 250. The test will be run 1000 times and the number of times that a perfect matching is achieved is recorded as well. The results from the near-complete graph can be seen in Table 6.
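As a small sketch of the near-complete construction, the following builds the edge list of a complete graph with one edge removed. The function name and the choice of which edge to remove are assumptions made for this example; the text does not specify which edge is dropped.

```python
from itertools import combinations

def near_complete_edges(n, removed=(0, 1)):
    """Edge list of the complete graph on vertices 0..n-1 without `removed`.
    The removed edge is an arbitrary choice for this illustration."""
    return [e for e in combinations(range(n), 2) if e != removed]

# Edge counts for the vertex counts tested in this section.
edge_counts = {n: len(near_complete_edges(n)) for n in (50, 75, 100, 150, 200, 250)}
```

For the tested vertex counts this gives 1224, 2774, 4949, 11174, 19899 and 31124 edges, which agrees with the Edges column of Table 6.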

The second scenario that will be tested is when a single vertex dominates its neighbors. When a vertex dominates its neighbors, the sum of all of its neighbors' b-values is less than the b-value of the vertex. Clearly, a perfect b-matching is not possible when this is the case; to compensate for this, the recovery percentage is still computed to see how close the result is to a perfect b-matching. Since the previous analysis did not allow a dominating vertex to exist, due to the construction of the graphs, we force a dominating vertex to be created. This is achieved by selecting the vertex with the maximum b-value and increasing its b-value to be ten percent greater than the sum of its neighbors' b-values. This methodology will always create a dominating vertex. The complete graph and random graphs will be tested with an added dominating vertex. The results can be seen in Table 8.
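The forcing step can be sketched as follows. This is an illustration only: the function name, the adjacency and b-value dictionaries and the example data are assumptions made here, and the new b-value is left unrounded because the text does not say how it is rounded.

```python
def force_dominating_vertex(adj, b, factor=1.1):
    """Return the chosen vertex and a copy of b in which the vertex with the
    largest b-value is raised to `factor` times its neighbors' b-value sum."""
    new_b = dict(b)
    v = max(b, key=b.get)
    new_b[v] = factor * sum(b[w] for w in adj[v])
    return v, new_b

# Hypothetical example data: a triangle on the vertices 0, 1 and 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
b = {0: 4, 1: 3, 2: 5}
v, new_b = force_dominating_vertex(adj, b)
neighbor_sum = sum(new_b[w] for w in adj[v])
```

After the call the selected vertex has a b-value strictly larger than the sum of its neighbors' b-values, so no perfect b-matching exists for the modified instance.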

The final scenario that will be tested is a very specific graph that rarely occurs naturally in the random graph generators, but exhibits a key property. The final special case is a ring graph. A ring graph is a uniform graph with a systematic structure; refer to Figure 2 for a visualization of the ring graph. The ring graph is organized in such a way that a traversal between any two vertices can be completed in a small number of hops. The results for the ring graph can be seen in Table 7.

Near-Complete Graph
Vertices   Edges   Recovery Percentage   Perfect Matchings   Worst Case
50         1224    99.9974%              998                 98.1236%
75         2774    99.9995%              999                 99.5243%
100        4949    99.9984%              999                 98.3967%
150        11174   99.9993%              999                 99.3436%
200        19899   100.0000%             1000                100.0000%
250        31124   100.0000%             1000                100.0000%

Table 6 – Near-complete graph b-dominator performance results

Ring Graph
Vertices   Degree   Recovery Percentage   Perfect Matchings   Worst Case
100        3        99.39463%             39                  98.03%
100        4        99.86388%             754                 98.45%
100        5        99.88870%             774                 98.47%
100        6        99.99988%             999                 99.88%
100        7        99.99899%             999                 98.94%
100        8        100.00000%            1000                100.00%
100        9        100.00000%            1000                100.00%

Table 7 – Ring graph b-dominator performance results

Dominating Vertex
Vertices   Edges   Recovery Percentage   Best Case   Worst Case
50         150     88.5357%              94.1727%    81.7451%
50         250     91.1032%              94.9213%    83.2150%
50         500     93.1521%              95.1615%    89.0789%
50         1250    94.1553%              94.2284%    93.9972%
100        300     89.7094%              94.5517%    83.5489%
100        500     92.2223%              96.3836%    87.7872%
100        1000    94.3248%              96.5490%    91.4773%
100        2500    94.9665%              95.8393%    93.3440%
100        4500    94.6343%              94.9731%    93.8453%
100        4950    94.7178%              94.7435%    94.6757%
150        450     90.0884%              94.1296%    84.5073%
150        750     92.7766%              96.0269%    88.4206%
150        1500    94.9844%              97.2163%    91.4146%
150        3750    95.7154%              96.7130%    93.8879%
150        6750    95.3463%              95.8895%    94.2584%
150        11175   94.8973%              94.9103%    94.8710%
200        600     90.3451%              94.2642%    86.1350%
200        1000    92.9605%              96.2410%    88.3912%
200        2000    95.3314%              97.4928%    92.5736%
200        5000    96.1805%              97.2154%    94.4044%
200        9000    95.8736%              96.5040%    94.7915%
200        19900   94.9858%              94.9932%    94.9662%

Table 8 - Performance of b-dominator with a dominating vertex

3.3 Discussion

The results of the b-dominator analysis are quite intriguing. Since the goal of a b-matching is to maximize the sum of the b-values found in the matching, intuition would lead to the assumption that as the average degree increases, b-dominator has more candidates for selection and therefore performs better. However, this assumption is not entirely true. The variance between the maximum vertex degree and the minimum vertex degree will reduce the final recovery percentage. Therefore, the average degree alone cannot be the determining factor and the variance in the degrees must be factored in. The trend can be seen in the Barabási-Albert preferential attachment results and the edge based random graph results. Comparing Table 3 and Table 4, the edge based random graph has a higher recovery percentage than a Barabási-Albert graph with a similar edge count (for example, 93.1899% with 400 edges versus 88.8868% with 384 edges). A graph with a low initial attachment value will have a very large variance in the degree of the vertices because a few select vertices will have a large degree and the rest of the vertices will have a small degree. Barabási-Albert graphs with a low initial attachment will, as such, exhibit a poor recovery percentage. On the other hand, a large initial attachment performs very well, as the difference between the average degree and maximum degree is small.
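The degree statistics referred to in this discussion can be computed from an edge list as in the sketch below. The function name and the two toy graphs are assumptions made for this example; they only illustrate how a hub raises the degree variance while the average degree stays small, and they are not data from the experiments.

```python
from statistics import mean, pvariance

def degree_stats(edges):
    """Return (average degree, population variance of the degrees,
    maximum degree minus minimum degree) for an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    values = list(degree.values())
    return mean(values), pvariance(values), max(values) - min(values)

# Two hypothetical five-vertex graphs: a hub (star) and a cycle.
star = [(0, i) for i in range(1, 5)]
cycle = [(i, (i + 1) % 5) for i in range(5)]
star_stats = degree_stats(star)
cycle_stats = degree_stats(cycle)
```

The star has a lower average degree than the cycle but a much larger spread between its maximum and minimum degree, which is the situation described above for a low initial attachment.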

Another factor that will influence the performance of b-dominator is the two-hop neighbors. The two-hop neighbors of a vertex are the vertices that are neighbors of the neighbors of the vertex. The larger the percentage of the two-hop cover that is redundant, the better b-dominator will perform. In other terms, when the vertices that are neighbors of a vertex are also two hops away, they can be considered redundant. This trend can be easily seen in the results of the ring graph. When the degree of all vertices increases in the ring graph, the two-hop vertex cover covers an increasingly higher percentage of the graph, and the recovery percentage increases accordingly until b-dominator consistently returns a perfect matching. This occurs because of line 7 in Equation 3, where the algorithm looks at vertices two hops away from the candidate edge. When a large portion of this two-hop cover is redundant, it allows new vertices to be considered and added to the matching slowly, thus allowing a higher recovery percentage to be achieved. Another interesting discovery is that when a ring graph has a degree above seven, b-dominator will always return a perfect b-matching. This observation was tested using varying weights and graphs with extremely large numbers of vertices.
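The two-hop neighborhood and its redundancy can be sketched as follows. The function names are assumptions made for this example, and the redundancy measure (the fraction of two-hop neighbors that are also direct neighbors) is one possible reading of the description above rather than a definition taken from the experiments.

```python
def two_hop_neighbors(adj, v):
    """Vertices that are neighbors of a neighbor of v, excluding v itself."""
    reached = set()
    for u in adj[v]:
        reached |= adj[u]
    reached.discard(v)
    return reached

def redundant_fraction(adj, v):
    """Fraction of v's two-hop neighbors that are also direct neighbors of v.
    Returns 0.0 when v has no two-hop neighbors."""
    two_hop = two_hop_neighbors(adj, v)
    if not two_hop:
        return 0.0
    return len(two_hop & adj[v]) / len(two_hop)

# Hypothetical example graphs given as adjacency sets.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
dense = redundant_fraction(triangle, 0)
sparse = redundant_fraction(path, 1)
```

In the triangle every two-hop neighbor is also a direct neighbor, so the cover is fully redundant, while on the path none of it is.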

The b-dominator algorithm returns a perfect b-matching in the case of the complete graph because of the lack of any variance in the degree of its vertices, as well as the ability to perform the block exchange with every vertex. The complete graph is also able to absorb the blow of a dominating vertex, as there are numerous other routes along which to perform the block exchange with the remaining b-values.

Chapter 4: Difficulty of Independent Matchings

A simple proof of the NP-completeness of the partition of equal sized subsets problem follows. Recall that the subset sum problem has been proven to be NP-complete by Karp. [3] In addition, recall that an extension of the subset sum problem, the partition problem, is also in NP. [4] This is a significant result that will be used later in this thesis.

Lemma 1: The partition problem is NP-complete.

The PARTITION PROBLEM is:

Input: $\langle A \rangle$ where $A$ is a set or multiset whose members are non-negative integers.

Question: Find a partition of $A$ into two subsets whose sums are equal.

The PARTITION OF EQUAL SIZED SUBSETS problem is:

Input: $\langle A \rangle$ where $A$ is a set or multiset whose members are non-negative integers.

Question: Find a partition of $A$ into two subsets of equal size whose sums are equal.

Claim 1: The PARTITION PROBLEM is in NP.

Proof: The sum of subsets problem has been proven to be NP-complete by Karp. The sum of subsets problem takes a set or multiset $A = \{a_1, \ldots, a_n\}$ of nonnegative integers and an integer $c$. The goal of the sum of subsets problem is to find a subset $S$ of $A$ whose sum is $c$ or, equivalently, to find a subset $T$ whose sum is $s - c$, where $s = a_1 + a_2 + \ldots + a_n$. Let $b = \left| \frac{s}{2} - c \right|$ and let $A'$ be the set obtained from $A$ by adding the element $b$. Then, a partition of $A'$ yields a solution to the sum of subsets problem $(A, c)$ or equivalently $(A, s - c)$.

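Claim 2: PARTITION PROBLEM $\le_p$ PARTITION OF EQUAL SIZED SUBSETS

Proof: Define a function $f$ that takes an input $\langle A \rangle$, where $A$ is a set or multiset of non-negative integers with $n = |A|$, and outputs $\langle A' \rangle$, where $A'$ is the multiset constructed from $A$ with $n$ zeroes appended.

Suppose that $A'$ has a partitioning into the subsets $x$ and $y$ of equal cardinality $|x| = |y|$ and equal sum $\sum x = \sum y$. Then the subsets $x'$ and $y'$, in which all the added zeroes have been removed, are a solution to the partition problem. Conversely, if $A$ has a partition into two subsets of sizes $k$ and $n - k$ with equal sums, then adding $n - k$ zeroes to the first subset and $k$ zeroes to the second gives a partition of $A'$ into two subsets of size $n$ with equal sums.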
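The padding function from Claim 2 can be tried on a small instance with the sketch below. The function names and the example multiset are assumptions made here, and the brute-force search is exponential; it is included only to make the reduction concrete and is not a practical algorithm.

```python
from itertools import combinations

def pad_with_zeros(A):
    """The function f from Claim 2: append n = |A| zeroes to the multiset A."""
    return list(A) + [0] * len(A)

def equal_size_partition(A):
    """Brute-force search for an index set of size |A|/2 whose elements sum to
    half of the total. Returns None if no such index set exists."""
    n, total = len(A), sum(A)
    if n % 2 or total % 2:
        return None
    for idx in combinations(range(n), n // 2):
        if 2 * sum(A[i] for i in idx) == total:
            return idx
    return None

A = [3, 1, 1, 1]             # {3} and {1, 1, 1} have equal sums but unequal sizes
direct = equal_size_partition(A)
padded = pad_with_zeros(A)
idx = equal_size_partition(padded)
# Drop the appended zeroes (positions >= len(A)) to recover a partition of A.
half = [padded[i] for i in idx if i < len(A)]
```

The instance has no equal sized partition on its own, but after padding an equal sized partition exists, and removing the appended zeroes leaves one side of an ordinary partition of the original multiset.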

Consider two matchings $M_1$ and $M_2$ in the graph $G$, with supports $S_1$ and $S_2$ respectively. $M_1$ and $M_2$ are said to be independent matchings if the intersection of $S_1$ and $S_2$ is the empty set. In other terms, no edge may appear in both $M_1$ and $M_2$. Additional independent matchings can be found as long as the condition above holds for every pair of matchings found. The normalized maximum is the ratio of the largest b-value to the sum of all b-values. The following theorems illustrate the findings on independent matchings with normalized maximums of a third and a sixth.

Theorem 1. Finding two independent perfect b-matchings is NP-complete even for the complete graph with $\eta(G) = \frac{1}{3}$.

Consider the complete graph $G = (V, E)$ with vertex set $V$ and edge set $E$. For $U \subseteq V$, let $b(U) = \sum_{u \in U} b(u)$. We define the normalized maximum $\eta(b) = \max_{v \in V} \left\{ \frac{b(v)}{b(V)} \right\}$.

Given a multiset $A = \{a_1, a_2, \ldots, a_n\}$ where $n$ is even, Lemma 1 proves that it is NP-complete to find a subset $B \subseteq A$ that satisfies the following conditions: $\sum_{x \in B} x = \frac{1}{2} \sum_{a \in A} a$ and $|B| = \frac{n}{2}$.

Let $C$ be a large integer greater than $\sigma = \sum_{a \in A} a$. For the vertices $v_1, \ldots, v_n$, let $b(v_i) = a_i + C$. The maximum vertex, $v_{n+1}$, has $b(v_{n+1}) = \frac{n}{2} C + \frac{\sigma}{2}$.
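The b-values of this construction can be checked numerically with the sketch below. It assumes that the maximum vertex receives the b-value (n/2)C + σ/2, which is the value that makes it equal to the b-sum of half of the other vertices when their a-values sum to σ/2; the function name and the example multiset are likewise assumptions made for this illustration.

```python
from fractions import Fraction

def theorem1_b_values(A, C):
    """b(v_i) = a_i + C for i = 1..n, followed by the b-value assumed here
    for the maximum vertex v_{n+1}: (n/2) * C + sigma/2."""
    n, sigma = len(A), sum(A)
    b = [a + C for a in A]
    b.append(Fraction(n, 2) * C + Fraction(sigma, 2))
    return b

A = [3, 1, 2, 2]          # hypothetical multiset with n = 4 and sigma = 8
C = sum(A) + 1            # any integer greater than sigma
b = theorem1_b_values(A, C)
normalized_maximum = max(b) / sum(b)
```

Under this assumption the total of the b-values is three times the b-value of the maximum vertex, so the normalized maximum is exactly one third, as required by Theorem 1.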

Claim 1: ∃ partition with equal sum and size ⟹ ∃ two independent perfect b-matchings.

Proof: Assume that there exists a partition of $V$ into three subsets: the maximum vertex $v_{n+1}$, $X$ and $Y$. The sets $X$ and $Y$ both sum to $b(v_{n+1})$. The first perfect b-matching, $M_1$, is found by matching every vertex in $X$ with $v_{n+1}$. Since the set $Y$ contains no dominating vertex, due to the large integer $C$ that scales all the b-values, a perfect b-matching of $Y$ can be found using edges with both endpoints in $Y$. A second perfect b-matching, $M_2$, can be found by matching every vertex in $Y$ with $v_{n+1}$. Then all the vertices in $X$ can be b-matched among themselves, as was done for $Y$ in $M_1$. $M_1$ and $M_2$ are perfect b-matchings and share no edges. Therefore, there exist two independent perfect b-matchings.

Claim 2: ∃ two independent perfect b-matchings ⟹ ∃ partition with equal sum and size.

Proof: Assume that two independent perfect b-matchings, $M_1$ and $M_2$, exist in $G$. Consider that $\frac{n}{2}$ vertices must match with $v_{n+1}$ in each matching. The vertices that match with the dominating vertex can be formed into sets of their own in such a way that the vertices that match with the dominating vertex in $M_1$ form a set $X$ and those in $M_2$ form a set $Y$. Then, due to the definition of independent matchings, these sets must be disjoint, sum to the same value $b(v_{n+1})$ and be of equal size.

Claim 3: ∃ two independent perfect b-matchings ⟺ ∃ partition with equal sum and size.

Proof: Claim 3 is a result of the bidirectional conditionality of Claim 1 and Claim 2.

Claim 4: Finding two independent perfect b-matchings is NP-complete even for $\eta(G) = \frac{1}{3}$.

Proof: Suppose that there is a polynomial time algorithm that finds two independent perfect b-matchings. By Claim 3, the transformation above turns any instance of the Partition of Equal Sized Subsets (PESS) problem into an instance of this problem, so the algorithm would also solve PESS. PESS has been shown in the previous section to be NP-complete. The algorithm could then also be used to solve all other NP-complete problems. Therefore, finding two independent perfect b-matchings is NP-complete when $\eta(G) = \frac{1}{3}$.

Theorem 2: There exist two independent perfect b-matchings in the complete graph when $\eta(G) \le \frac{1}{6}$.

Preliminary: Sort the b-values such that $b_0 \le b_1 \le \ldots \le b_n$. Let $i$ denote the vertex having b-value $b_i$, $i = 0, \ldots, n$. We may assume without loss of generality that $n$ is even. If $n$ is odd, then add a vertex with b-value 0. Solving the problem in the latter complete graph on $n + 1$ vertices solves it in the original complete graph on $n$ vertices.

Construction of $M_1$: This matching is constructed using the block exchange operation described in Chapter 3, where each edge is selected as $(i, i + 1)$ with an exchange value of $b_i$, $i = 1, 2, \ldots, p - 1$. If $b_{p-1} = b_p$, then the support of $M_1$ is $(i, i + 1)$, $i = 1, \ldots, p - 1$, together with $(k, n)$, $k = p + 1, \ldots, n - 1$; if not, the support is the same, but with the addition of $(p, n)$.

Construction of $M_2$: Let $X = \{1, 3, \ldots, n - 1\}$ and $Y = \{2, 4, \ldots, n - 2\}$ and let $G[X]$ and $G[Y]$ be the complete subgraphs induced by the sets $X$ and $Y$, respectively. Note that neither $G[X]$ nor $G[Y]$ contains any edge from the support $S(M_1)$ of $M_1$. Let $B_{odd}$ and $B_{even}$ denote the sum of the b-values of the vertices in $X$ and $Y$, respectively. Let $B_i = b_1 + \ldots + b_i$, $i = 1, \ldots, n$.

The algorithm for constructing $M_2$ consists of three stages. Stage 1 involves performing block exchange operations to zero vertex $n$. Stage 2 involves performing block exchange operations so that $b_{n-1} \le \frac{B_{odd}}{2}$, $b_{n-2} \le \frac{B_{even}}{2}$ and both $B_{odd}$ and $B_{even}$ are even. Stage 3 involves applying the b-dominator algorithm to $G[X]$ and $G[Y]$ to compute a set of block exchanges for the sets $X$ and $Y$.

Stage 1: Since $b_{p-1} \le b_p \le b_n \le \frac{1}{6} B$, it follows that $B_p \ge \frac{5}{6} B$ and $B_{p-1} \ge \frac{4}{6} B$. Therefore, we can perform a sequence of block exchange operations of the form $(i, n, c)$, where $c = \min(b_i, b_n)$ and $1 \le i \le p - 1$, until $b_n$ is zero, without using any edge $\{i, n\}$ in the support of $M_1$. Note that after Stage 1 is completed the new value $B'$ of $B$ is at least $\frac{2}{3}$ of the previous value $B$, that is, $B \le \frac{3}{2} B'$. It follows that:

$b_{n-2} \le b_{n-1} \le \frac{1}{6} B \le \frac{1}{6} \left( \frac{3}{2} B' \right) = \frac{1}{4} B'$

Stage 2: Since $b_n = 0$ after Stage 1, it follows that $B = B_{odd} + B_{even}$. Since $b_i \ge b_{i-1}$, we have $B_{odd} \ge B_{even}$. We can conclude that $B_{odd} \ge \frac{B}{2}$. Since $b_{n-1} \le \frac{B}{4}$, we have $b_{n-1} \le \frac{B_{odd}}{2}$.

Case 1 - $b_{n-2} > \frac{B_{even}}{2}$: Since $B = B_{odd} + B_{even}$ and $b_{n-2} \le b_{n-1} \le \frac{B}{4}$, we can perform a sequence of block-exchange operations that results in $b_{n-2} = \frac{B_{even}}{2}$ while preserving $b_{n-1} \le \frac{B_{odd}}{2}$. Further, since $B_{odd} + B_{even} - b_{n-1} - b_{n-3} \ge \frac{B}{2}$, these block-exchange operations can be restricted to avoid $n - 1$ and $n - 3$. Finally, perform a sequence of block exchanges $((n - 2, k), c)$ where $c = \min \{ b_k, \frac{B_{even}}{2} - b_{n-2} \}$ and $k \in \{1, 3, \ldots, n - 5\}$, until $\frac{B_{even}}{2} = b_{n-2}$.

Case 2 - $b_{n-2} \le \frac{B_{even}}{2}$: If $B_{even}$ is odd, then perform the block-exchange $(n - 2, n - 5, 1)$.

Stage 3: After Stage 1 and Stage 2 are completed, $b_n = 0$. Also, $b_{n-1} \le \frac{B_{odd}}{2}$, $b_{n-2} \le \frac{B_{even}}{2}$ and both $B_{odd}$ and $B_{even}$ are even. Thus, we can apply b-dominator to $G[X]$ and $G[Y]$ to compute the series of block exchanges to zero $X$ and $Y$ respectively.

Proof: The union of all the block exchanges applied in Stages 1, 2 and 3 forms a zeroing set of block exchanges that corresponds to the matching $M_2$, whose support is disjoint from that of $M_1$. Thus, there exist two independent perfect b-matchings in the complete graph with a normalized maximum of at most one-sixth.

Chapter 5: Empirical Analysis of b-dominator for Independent Matchings

An empirical analysis of the performance of b-dominator for independent matchings was performed. The setup of the testing is outlined and the results are discussed in this chapter.

5.1 b-dominator for Independent b-matchings

A simple extension of the b-dominator algorithm can be used to solve for independent b-matchings. Recall that b-dominator will return a b-matching on a graph. We will now introduce an algorithm for $k$ independent b-matchings.

$G_1 = G$ (1)
For $i = 1$ to $k$ do: (2)
    $M_i$ = b-dominator($G_i$) (3)
    $G_{i+1} = G_i - Support(M_i)$ (4)
Output independent set of b-matchings $(M_1, \ldots, M_k)$ (5)

Equation 4 - Pseudo code for b-dominator applied to independent matchings

The algorithm begins by taking in a graph $G$ and a goal number $k$ of independent matchings to achieve. The algorithm then iterates from 1 to $k$ and performs the b-dominator algorithm on $G_i$. A matching $M_i$ is returned and a new graph, $G_{i+1}$, is created by subtracting the support of $M_i$ from $G_i$. The algorithm continues until $k$ is reached, and then a set of independent matchings is returned.
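The loop in Equation 4 can be sketched as follows. The matching routine is passed in as a parameter; the greedy routine used in the example is a simple stand-in introduced here for illustration and is not the b-dominator algorithm. All function names and the example graph are assumptions.

```python
from itertools import combinations

def greedy_b_matching(edges, b):
    """Stand-in matcher (NOT b-dominator): scan the edges once and give each
    edge the largest value that both endpoints can still absorb."""
    remaining = dict(b)
    matching = {}
    for u, v in edges:
        x = min(remaining[u], remaining[v])
        if x > 0:
            matching[(u, v)] = x
            remaining[u] -= x
            remaining[v] -= x
    return matching

def independent_matchings(edges, b, k, matcher=greedy_b_matching):
    """Run the matcher k times, removing the support of each matching from
    the edge list before the next round, as in Equation 4."""
    current = list(edges)
    result = []
    for _ in range(k):
        m = matcher(current, b)
        result.append(m)
        support = set(m)
        current = [e for e in current if e not in support]
    return result

edges = list(combinations(range(4), 2))   # hypothetical complete graph K4
b = {v: 2 for v in range(4)}
matchings = independent_matchings(edges, b, 2)
```

Because the support of each matching is deleted before the next call, the supports of the returned matchings are disjoint by construction, whichever matching routine is supplied.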

5.2 Empirical Analysis of Independent b-matchings

This section discusses the setup of the empirical testing and then shows the results of testing all of the graphs. Lastly, a discussion of the results is provided at the end of this section.

5.2.1 Empirical Analysis Setup

To perform this empirical analysis, the NetworkX library was again used. Analysis was performed on the five types of graphs discussed in Chapter 2: the complete graph, probability random graph, edge based random graph, Barabási-Albert preferential attachment graph, and the Watts-Strogatz small world graph. To score the effectiveness of the b-dominator algorithm for independent matchings, the recovery percentage is used again.

Once more, each type of graph is generated using a NetworkX generator method and isolated vertices are removed if applicable to the graph type. A graph with b-values is generated similarly to the method described in Chapter 3, such that a perfect b-matching is guaranteed to exist. This graph is passed to the b-dominator for independent matchings. The recovery percentage for each independent matching from 1 to $k$ is recorded, as well as the maximum number of possible independent matchings before a vertex becomes isolated. Each type of graph is tested 1000 times and the data is averaged together.

5.2.2 Analysis of Complete Graph

To analyze the complete graph, four independent matchings were discovered by b-dominator for each vertex count of 50, 75, 100, 150, 200 and 250. The results were averaged together and can be seen in Table 9, which also lists the maximum number of matchings for each vertex count. A plot of the average recovery percentage of each independent matching can be seen in Figure 11.

Complete Graph Independence
Vertices   Final Recovery   Maximum Matchings   $M_1$     $M_2$    $M_3$    $M_4$
50         90.3309%         21.97               100.00%   96.54%   96.98%   95.62%
75         90.4985%         33.27               100.00%   97.70%   97.87%   97.06%
100        90.5321%         44.49               100.00%   98.43%   98.48%   97.82%
150        90.5171%         67.35               100.00%   98.85%   99.03%   98.50%
200        90.4692%         90.45               100.00%   99.25%   99.18%   98.98%
250        90.5169%         113.43              100.00%   99.31%   99.38%   99.17%

Table 9 - Complete graph performance of b-dominator for finding independent b-matchings

[Plot omitted: Complete Graph Independence; x-axis: Vertices, y-axis: Recovery Percentage; one series each for Matching 1, Matching 2, Matching 3 and Matching 4]

Figure 11 - Complete graph independent b-matching recovery percentage

5.2.3 Analysis of Random Graph

Both types of random graphs, edge based and probability based, are tested for four independent matchings with a vertex count of 100. First, the probability based random graph was tested with a varying edge probability of 0.4, 0.5, 0.6, 0.7 and 0.8. The independent matchings' recovery percentages can be seen in Table 10, along with a plot of the edge probability and recovery percentages for each matching in Figure 12. The edge based random graph is tested with a varying edge count of 300, 400, 500, 1000, 2000, 2500 and 4500. The independent matchings' recovery percentages can be seen in Table 11, along with a plot of the edge count and recovery percentages for each matching in Figure 13.

Probability Based Random Graph Independence

Edge Probability  Final Recovery  Maximum Matchings  M1      M2      M3      M4
40%               90.5918%        17.236             98.60%  97.07%  96.19%  95.16%
50%               90.5217%        21.996             99.03%  97.73%  96.80%  95.94%
60%               90.5132%        26.559             99.21%  98.23%  97.52%  96.95%
70%               90.5646%        31.039             99.46%  98.47%  97.85%  96.84%
80%               90.5233%        35.623             99.65%  98.62%  97.96%  97.20%

Table 10 - Probability based random graph performance of b-dominator for finding independent b-matchings

[Plot: Probability Based Random Graph Independence. x-axis: Edge Probability; y-axis: Recovery Percentage; series: Matching 1, Matching 2, Matching 3, Matching 4]

Figure 12 - Probability based random graph independent b-matching recovery percentage

Edge Based Random Graph Independence

Edges  Final Recovery  Maximum Matchings  M1      M2      M3      M4
300    90.8373%        1.354              91.68%  Null    Null    Null
400    90.7597%        2.145              93.41%  Null    Null    Null
500    90.5964%        3.148              94.31%  90.30%  Null    Null
1000   90.4411%        7.944              97.00%  94.77%  91.98%  90.02%
2000   90.5861%        17.502             98.62%  97.37%  96.19%  94.93%
2500   90.5426%        22.185             99.12%  97.75%  97.03%  95.79%
4500   90.5483%        40.390             99.81%  98.50%  98.17%  97.93%

Table 11 - Edge based random graph performance of b-dominator for finding independent b-matchings

[Plot: Edge Based Random Graph Independence. x-axis: Edge Count; y-axis: Recovery Percentage; series: Matching 1, Matching 2, Matching 3, Matching 4]

Figure 13 – Edge based random graph independent b-matching recovery percentage

5.2.4 Analysis of Barabási-Albert Graph

The analysis of b-dominator’s performance for finding independent matchings in a Barabási-Albert graph is performed with a vertex count of 100 and an initial attachment factor of 2, 4, 6, 8, 10, 15 and 20. The generator works in the same manner described in section 3.3.4. Due to the structure of a Barabási-Albert graph, a single vertex will have a very large b-value and control a significant portion of the weight of the graph. Independent matchings are therefore scarce, as the first matching uses many of the available edges in the graph and leaves few for the remaining matchings. For this reason, we search for only three independent matchings when analyzing the Barabási-Albert graph. The recovery percentages of the independent matchings can be seen in Table 12, along with a plot of the initial attachment and recovery percentages for each matching in Figure 14.

Barabási-Albert Preferential Attachment Independence

Initial Attachments  Final Recovery  Maximum Matchings  M1      M2      M3
2                    82.6668%        1.08               82.74%  Null    Null
4                    82.5892%        2.77               89.18%  Null    Null
6                    81.5015%        4.91               91.83%  86.48%  Null
8                    83.6202%        5.64               93.42%  89.33%  Null
10                   83.0792%        7.53               94.61%  91.08%  Null
15                   87.1238%        8.32               96.23%  93.79%  91.66%
20                   86.4001%        11.72              97.15%  95.14%  92.96%

Table 12 – Barabási-Albert graph performance of b-dominator for finding independent b-matchings

[Plot: Barabási-Albert Preferential Attachment Independence. x-axis: Initial Attachments; y-axis: Recovery Percentage; series: Matching 1, Matching 2, Matching 3]

Figure 14 – Barabási-Albert graph independent b-matching recovery percentage

5.2.5 Analysis of Watts-Strogatz Small World

The analysis of b-dominator’s performance for finding independent matchings in a Watts-Strogatz graph is performed with a vertex count of 100, along with a pair of varying parameters: starting degree and rewiring probability. The starting degree is selected from the set {3, 5, 10, 15, 20, 25} and the rewiring probability ranges from twenty-five percent to seventy-five percent in increments of five percent. Each combination of starting degree and rewiring probability is tested. Due to the low number of edges in the graphs with starting degrees of 3 and 5, they were unable to reach more than a single independent matching. Since only one matching is found, the results of graphs with initial degrees of 3 and 5 are not plotted. The abridged recovery percentages of the independent matchings can be seen in Table 13, along with plots of the rewiring probability and recovery percentages of each matching in Figure 15, Figure 16, Figure 17 and Figure 18.

Watts-Strogatz Small World Graph Independence

Initial Degree  Rewiring Probability  Final Recovery  Maximum Matchings  M1      M2      M3      M4
3               0.25                  94.5029%        1                  94.31%  Null    Null    Null
3               0.35                  92.8398%        1                  93.11%  Null    Null    Null
3               0.45                  91.4948%        1                  91.50%  Null    Null    Null
3               0.55                  90.1226%        1                  90.29%  Null    Null    Null
3               0.65                  88.8084%        1                  89.16%  Null    Null    Null
3               0.75                  87.8852%        1                  87.80%  Null    Null    Null
5               0.25                  97.8599%        1.011              97.78%  Null    Null    Null
5               0.35                  96.8307%        1.007              96.70%  Null    Null    Null
5               0.45                  95.6076%        1.01               95.57%  Null    Null    Null
5               0.55                  94.2883%        1.025              94.33%  Null    Null    Null
5               0.65                  93.1575%        1.032              93.14%  Null    Null    Null
5               0.75                  91.9947%        1.053              92.30%  Null    Null    Null
10              0.25                  96.0113%        3.888              99.47%  98.26%  95.81%  Null
10              0.35                  95.1779%        3.891              99.24%  97.91%  94.21%  Null
10              0.45                  94.5318%        3.862              99.02%  97.01%  93.27%  Null
10              0.55                  93.7871%        3.867              98.50%  96.16%  92.21%  Null
10              0.65                  93.1184%        3.894              97.99%  95.27%  91.57%  Null
10              0.75                  92.5400%        3.922              97.03%  94.58%  90.80%  Null
15              0.25                  95.8712%        5.71               99.57%  98.78%  98.12%  96.35%
15              0.35                  95.1369%        5.718              99.40%  98.50%  97.50%  95.14%
15              0.45                  94.4648%        5.737              99.14%  98.23%  96.56%  94.19%
15              0.55                  93.8441%        5.793              98.93%  97.52%  95.52%  93.02%
15              0.65                  93.2915%        5.844              98.55%  96.93%  94.94%  92.81%
15              0.75                  92.8407%        5.888              97.92%  96.08%  94.17%  92.29%
20              0.25                  95.5607%        8.524              99.59%  98.68%  98.40%  97.67%
20              0.35                  94.9349%        8.569              99.44%  98.78%  98.17%  97.46%
20              0.45                  94.3754%        8.649              99.31%  98.48%  97.90%  96.74%
20              0.55                  93.9113%        8.721              99.03%  98.12%  97.41%  96.18%
20              0.65                  93.4955%        8.768              98.71%  97.71%  96.71%  95.84%
20              0.75                  93.1314%        8.857              98.45%  97.24%  96.45%  95.28%
25              0.25                  95.3748%        10.428             99.60%  98.93%  98.39%  97.85%
25              0.35                  94.7789%        10.531             99.38%  98.73%  98.04%  97.78%
25              0.45                  94.2594%        10.629             99.23%  98.57%  97.89%  97.10%
25              0.55                  93.8740%        10.728             99.16%  98.32%  97.74%  96.95%
25              0.65                  93.5583%        10.784             98.77%  97.97%  97.51%  96.48%
25              0.75                  93.1898%        10.867             98.48%  97.61%  96.83%  96.52%

Table 13 – Watts-Strogatz graph performance of b-dominator for finding independent b-matchings

[Plot: Watts-Strogatz - Initial Degree 10. x-axis: Rewiring Probability; y-axis: Recovery Percentage; series: Matching 1, Matching 2, Matching 3]

Figure 15 - Watts-Strogatz graph with initial degree 10, independent b-matching recovery percentage

[Plot: Watts-Strogatz - Initial Degree 15. x-axis: Rewiring Probability; y-axis: Recovery Percentage; series: Matching 1, Matching 2, Matching 3, Matching 4]

Figure 16 - Watts-Strogatz graph with initial degree 15, independent b-matching recovery percentage

[Plot: Watts-Strogatz - Initial Degree 20. x-axis: Rewiring Probability; y-axis: Recovery Percentage; series: Matching 1, Matching 2, Matching 3, Matching 4]

Figure 17 - Watts-Strogatz graph with initial degree 20, independent b-matching recovery percentage

[Plot: Watts-Strogatz - Initial Degree 25. x-axis: Rewiring Probability; y-axis: Recovery Percentage; series: Matching 1, Matching 2, Matching 3, Matching 4]

Figure 18 - Watts-Strogatz graph with initial degree 25, independent b-matching recovery percentage

5.2.6 Analysis of Special Cases

The same special cases covered in Chapter 3 are analyzed again, and the performance of b-dominator for finding independent matchings is tested. However, the results for the random and complete graphs are not shown, because in every scenario in which there is a dominating vertex only a single matching will ever be found. This occurs because a single vertex controls such a large portion of the b-values that it must perform a block exchange operation with every neighbor, and thus further independent matchings cannot exist. The results for a near-complete graph can be seen in Table 14. The results for the ring graph can be seen in Table 15.

Near-Complete Graph Independence

Vertices  Edges  M1    M2        M3        M4
50        1224   100%  96.7884%  96.9970%  95.7284%
75        2774   100%  97.4990%  98.0493%  96.8851%
100       4949   100%  98.4222%  98.4785%  97.8383%
150       11174  100%  98.8517%  99.0791%  98.5403%
200       19899  100%  99.1909%  99.2350%  98.8580%
250       31124  100%  99.3095%  99.4437%  99.1480%

Table 14 - Near-Complete graph performance of b-dominator for finding independent b-matchings

Ring Graph Independence

Vertices  Degree  M1         M2        M3        M4
100       2       99.3847%   Null      Null      Null
100       3       99.3697%   Null      Null      Null
100       4       99.8258%   Null      Null      Null
100       5       99.8606%   Null      Null      Null
100       6       99.9954%   98.0866%  Null      Null
100       7       99.9980%   98.0466%  Null      Null
100       8       100.0000%  98.9320%  96.1744%  Null
100       10      100.0000%  98.9152%  98.4094%  Null
100       12      100.0000%  98.9130%  98.5296%  97.8088%
100       15      100.0000%  98.9302%  98.4477%  98.2563%
100       20      100.0000%  98.8514%  98.6408%  98.3492%
100       25      100.0000%  98.8902%  98.7441%  98.3362%

Table 15 - Ring graph performance of b-dominator for finding independent b-matchings
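The dominating-vertex condition discussed in this section can be illustrated with a small sketch. It follows the eta value computed by get_eta_value in the appendix (the sum of the neighbors’ b-values minus the vertex’s own b-value), but works on plain dictionaries. The star graph and the names eta and dominating_vertices are hypothetical and chosen only for illustration.

```python
def eta(adjacency, b_values, v):
    """Sum of the neighbours' b-values minus b(v)."""
    return sum(b_values[u] for u in adjacency[v]) - b_values[v]

def dominating_vertices(adjacency, b_values):
    """Vertices with a positive b-value whose eta value is at most zero."""
    return [v for v in adjacency
            if b_values[v] > 0 and eta(adjacency, b_values, v) <= 0]

# Hypothetical star: centre 0 with b = 12 and three leaves with b = 4 each.
adjacency = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
b_values = {0: 12, 1: 4, 2: 4, 3: 4}

print(dominating_vertices(adjacency, b_values))  # the centre, vertex 0
```

In this example the centre needs the full b-value of every leaf, so a perfect b-matching places weight on all three edges and no edge is left over for a second independent matching.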

5.3 Discussion

B-dominator was able to find multiple independent b-matchings with high recovery percentages on most of the graph types tested. Unsurprisingly, the complete and near-complete graphs performed very well when finding independent matchings. The complete graph performed the best of all the graphs, especially for matchings after the initial matching. The complete graph also yielded significantly more independent matchings, but in none of the trials was a matching M_n with n > 1 a perfect b-matching. This is surprising, as intuition suggests that the large number of remaining edges would make a second perfect matching more likely, and that the algorithm would find one in at least some scenarios. The random graphs also performed reasonably well. The performance of b-dominator improved in all matchings as the graph became closer to the complete graph. This is expected, as more candidate edges remain available for subsequent matchings.
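Why later matchings have fewer candidate edges can be seen from the definition of independence: two b-matchings are independent when their supports (edges with nonzero weight) are disjoint. The sketch below checks that condition and removes a matching’s support from the edge list. The thesis does not spell out this bookkeeping here, so removing used edges before the next matching is an assumption, and the names support, are_independent and remaining_edges are hypothetical.

```python
def support(matching):
    """Edges carrying nonzero weight, stored as unordered pairs."""
    return {frozenset(edge) for edge, weight in matching.items() if weight != 0}

def are_independent(m1, m2):
    """Two b-matchings are independent if their supports are disjoint."""
    return support(m1).isdisjoint(support(m2))

def remaining_edges(edges, matching):
    """Edges still available after the support of a matching is removed."""
    used = support(matching)
    return [edge for edge in edges if frozenset(edge) not in used]

# Hypothetical example on the complete graph with four vertices.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]
m1 = {(0, 1): 3, (2, 3): 3}
m2 = {(1, 0): 1, (0, 2): 2}   # reuses the edge between 0 and 1
m3 = {(0, 2): 2, (1, 3): 2}

print(are_independent(m1, m2), are_independent(m1, m3))
print(remaining_edges(edges, m1))
```

Storing edges as frozensets makes the check insensitive to the order in which the endpoints are written.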

The Barabási-Albert preferential attachment graph performed poorly in some scenarios but adequately in others. B-dominator performed very poorly when the initial attachment is low, but improved drastically as the initial attachment increased. Independent matchings are hard to find in graphs with a high variance in the degree of the vertices. This occurs in the Barabási-Albert graph because a few high-degree vertices control the graph, producing a hub topology. A vertex with a high degree requires a significant number of block exchanges to reduce its b-value to zero. Subsequent matchings cannot use those edges again, which leads to a larger drop in the recovery percentage from matching M_n to M_n+1 than the drop observed in the other graphs.
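The link between preferential attachment and degree variance can be made concrete with a short sketch. This is not the NetworkX generator used in the experiments. It is a simplified preferential-attachment process written with the standard library only, and the function name, the seed and the parameter values are assumptions for illustration.

```python
import random
from statistics import pvariance

def preferential_attachment_degrees(n, m, seed=0):
    """Grow a graph one vertex at a time; each new vertex attaches to m
    existing vertices chosen with probability proportional to their degree.
    Returns the list of vertex degrees."""
    rng = random.Random(seed)
    degree = [0] * n
    targets = list(range(m))   # the first new vertex attaches to the m seed vertices
    endpoints = []             # every vertex appears once per incident edge
    for new in range(m, n):
        for t in targets:
            degree[new] += 1
            degree[t] += 1
        endpoints.extend(targets)
        endpoints.extend([new] * m)
        chosen = set()
        while len(chosen) < m:  # sample m distinct targets, weighted by degree
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return degree

n, m = 100, 2
hub_degrees = preferential_attachment_degrees(n, m)
complete_degrees = [n - 1] * n   # every vertex of a complete graph has degree n - 1

print(pvariance(hub_degrees), pvariance(complete_degrees))
```

The complete graph has zero degree variance, while the preferential-attachment degrees are spread out, which matches the contrast in recovery percentages described above.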

B-dominator performs quite well on the Watts-Strogatz graph when the initial degree is high and the rewiring probability is low. Recall that the generator for a small world graph begins with a ring graph and rewires edges, making the graph less like the ring graph and more similar in structure to the random graphs. The recovery percentage results also show that, as the rewiring probability increases, the performance trends towards that of the edge based random graph with the same number of edges.
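The rewiring step can be sketched as follows. This is a simplified stand-in for the small world generator, written with the standard library only, and it is not the NetworkX implementation used in the experiments. It joins each vertex to k // 2 neighbors on each side, so an odd k behaves like k - 1 in this sketch. All names and parameter values are assumptions for illustration.

```python
import random
from statistics import pvariance

def ring_lattice(n, k):
    """Join each vertex to its k // 2 nearest neighbours on each side of a ring."""
    edges = set()
    for i in range(n):
        for offset in range(1, k // 2 + 1):
            j = (i + offset) % n
            edges.add((min(i, j), max(i, j)))
    return edges

def rewire(edges, n, p, rng):
    """With probability p, replace edge (u, v) by (u, w) for a random new w,
    avoiding self-loops and duplicate edges."""
    result = set(edges)
    for u, v in sorted(edges):
        if rng.random() < p:
            candidates = [w for w in range(n)
                          if w != u and (min(u, w), max(u, w)) not in result]
            if candidates:
                w = rng.choice(candidates)
                result.remove((u, v))
                result.add((min(u, w), max(u, w)))
    return result

def degree_variance(edges, n):
    """Population variance of the vertex degrees."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return pvariance(degree)

n, k = 100, 10
lattice = ring_lattice(n, k)
rewired = rewire(lattice, n, 0.5, random.Random(1))

print(len(lattice), len(rewired))
print(degree_variance(lattice, n), degree_variance(rewired, n))
```

Rewiring keeps the number of edges fixed, so any change in recovery percentage as the rewiring probability grows comes from how the edges are distributed rather than how many there are.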

Chapter 6: Conclusion

The research described in this thesis is driven by the desire to understand the relationship between various characteristics in a graph and their impact upon the ability to find a b-matching in the graph. The goals of this thesis are as follows:

1. Develop an efficient approximation algorithm for finding a maximum b-matching in a graph, as well as multiple independent matchings.
2. Perform empirical research on the performance of the algorithm for finding a single b-matching in numerous types of graphs, to stress test the algorithm’s ability to handle various scenarios.
3. Perform empirical research on the algorithm’s ability to find independent b-matchings and observe its ability to maximize each subsequent matching.
4. Show the NP-completeness of finding independent perfect b-matchings, even in the complete graph.
5. Explore the possibility of using the algorithm to develop a peer-to-peer network in which bytes of data can be exchanged between peers as part of a back-up scheme.

The work of this thesis extends the current progress on developing b-matching algorithms. This thesis introduced an approximation algorithm, b-dominator, with an experimentally found α = 3/4. In addition to the analysis of finding a single matching using b-dominator, this research focused closely on finding independent b-matchings within a graph.

The analysis is extended to search for multiple matchings within a single graph. The b-matchings were scored using a recovery percentage metric, which is the sum of the b-values covered by the matching divided by the sum of the b-values covered by a maximum matching. Since a maximum matching is difficult to compute in general, the graphs were crafted in such a way that the maximum matching is a perfect matching, and therefore the recovery score can be calculated by summing all of the b-values in the graph.
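A minimal sketch of the recovery percentage follows, under the assumption that each unit of weight placed on an edge consumes one unit of b-value at each endpoint, so a matching covers twice its total edge weight. The function name and the triangle example are hypothetical and are not taken from the thesis code.

```python
def recovery_percentage(matching_weights, b_values):
    """Share of the total b-value covered by the matching, in percent.

    Assumes the instance admits a perfect b-matching, so the denominator
    is simply the sum of all b-values in the graph.
    """
    covered = 2 * sum(matching_weights.values())
    total = sum(b_values.values())
    return 100.0 * covered / total

# Hypothetical triangle whose hidden weights are 3, 2 and 2.
b_values = {0: 5, 1: 5, 2: 4}
perfect = {(0, 1): 3, (1, 2): 2, (0, 2): 2}
partial = {(0, 1): 3, (1, 2): 2}

print(recovery_percentage(perfect, b_values))
print(recovery_percentage(partial, b_values))
```

The perfect matching covers all 14 units of b-value, while the partial matching covers 10 of them.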

The b-dominator algorithm was shown to perform best when three conditions were met: the average degree is large, the variance in the degree is small or zero, and the two-hop neighbors of a vertex are also neighbors of that vertex.

The thesis also shows that it is NP-complete to find multiple perfect matchings in the case of a normalized maximum of one third, by a reduction from the problem of partitioning a set into equal-sized subsets.

A peer-to-peer back-up scheme could use the b-dominator algorithm to ensure a fair exchange of data. The back-up scheme can implement redundancy using independent matchings. Data that is not backed up on the peer-to-peer network could simply be stored on a small server, as a large majority of the data would be stored on the network.

Chapter 7: Further Research

Future research for this thesis can be taken in either an empirical or a theoretical direction. Empirically, there may well exist optimizations to the b-dominator algorithm that further implementations would reveal. A more rigorous analysis could reveal that specific graphs allow for an optimization of the algorithm. A true implementation of the algorithm for use in a peer-to-peer network would be interesting because the algorithm, as designed, must be started from an empty matching. An online implementation of the algorithm would be especially interesting, as it would have to handle peers constantly entering and leaving the network. In the proposed scenario of corporate and photography back-up, however, the graph would be quite static. Further analysis could also be conducted on the three values that were found to influence the performance of b-dominator.

Theoretical future research could include further analysis of the proofs using the normalized maximum in Chapter 4, as well as their application to future problems as they arise. The initial empirical research showed that b-dominator always returned at least seventy-five percent of the maximum; further analysis could lead to a theoretical approximation factor to support this empirical finding. This empirical research also found that a ring graph with a degree of eight is sufficient to find a perfect b-matching. Since the ring graph is a subgraph of the complete graph, there should be a transition from a complete graph to a ring graph that would still return a perfect b-matching. Finally, a rigorous analysis of the ring graph could reveal whether there is a point at which a perfect b-matching cannot be found in a ring graph with a degree of eight. The results of this theoretical research would be quite intriguing.

Bibliography

[1] Barabási and Pósfai, Network science, Cambridge, United Kingdom: Cambridge University Press, 2016.

[2] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440-442, 4 June 1998.

[3] R. M. Karp, "Reducibility among Combinatorial Problems," in R. E. Miller, J. W. Thatcher and J. D. Bohlinger (eds), Complexity of Computer Computations, The IBM Research Symposia Series, Boston, MA: Springer, 1972.

[4] R. E. Korf, "A complete anytime algorithm for number partitioning," Artificial Intelligence, vol. 106, no. 2, pp. 181-203, 1998.

[5] D. P. Williamson and D. B. Shmoys, The design of approximation algorithms, Cambridge, New York: Cambridge University Press, 2011.

[6] C. Batten, K. Barr, A. Saraf and S. Trepetin, "pStore: A secure peer-to-peer backup system," Unpublished report, MIT Laboratory for Computer Science, pp. 130-139, 2001.

[7] M. Müller-Hannemann and A. Schwartz, "Implementing Weighted b-Matching Algorithms: Towards a Flexible Software Design," Journal of Experimental Algorithmics (JEA), vol. 4, pp. 7-es, 1999.

[8] J. Edmonds, "Maximum Matching and a Polyhedron With 0,1-Vertices," Journal of Research of the National Bureau of Standards - B. Mathematics and Mathematical Physics, vol. 69B, no. 1 and 2, pp. 125-130, 1965.

[9] A. Khan, A. Pothen, M. A. Patwary, N. Satish, N. Sundaram, F. Manne, M. Halappanavar and P. Dubey, "Efficient Approximation Algorithm for Weighted B-Matching," SIAM Journal on Scientific Computing, vol. 38, no. 5, pp. S593-S619, 2016.

[10] W. R. Pulleyblank, "Faces of Matching Polyhedra," Faculty of Mathematics, Waterloo, Ontario, 1973.

[11] V. Kolmogorov, "Blossom V: a new implementation of a minimum cost perfect matching algorithm," Mathematical Programming Computation, vol. 1, no. 1, pp. 43-67, 2009.

[12] M. W. Padberg and M. R. Rao, "Odd Minimum Cut-Sets and b-Matchings," Mathematics of Operations Research, vol. 7, no. 1, pp. 67-80, February 1982.

[13] B. C. Huang and T. Jebara, "Fast b-matching via sufficient selection belief propagation," in International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, Florida, 2011.

[14] F. Manne and M. Halappanavar, "New Effective Multithreaded Matching Algorithms," in IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, AZ, 2014.

[15] A. D. Broido and A. Clauset, "Scale-free networks are rare," Nature Communications, vol. 10, no. 1, pp. 1-10, 2019.

Appendix:

The code below can be found on GitHub at https://github.com/Christopher-Ochs/B-dominatorAnalysis. Install all the requirements listed in the requirements file and run either single_matching.py or k_inpendent_matchings.py. Ensure that utils.py is in the same folder as single_matching.py and k_inpendent_matchings.py.

Utils.py

import copy
import operator
import random as rand

import matplotlib.pyplot as plt
import networkx as nx

def generate_random_prob_graph(num_nodes, prob):
    """
    Generates a random probability graph and removes isolated nodes
    :param num_nodes: number of nodes in the graph
    :param prob: probability every pair of nodes will create an edge
    :return: Graph
    """
    graph = nx.gnp_random_graph(num_nodes, prob)
    graph.remove_nodes_from(list(nx.isolates(graph)))

    # Regenerate until the graph (without isolated nodes) is connected
    while not nx.is_connected(graph):
        graph = nx.gnp_random_graph(num_nodes, prob)
        graph.remove_nodes_from(list(nx.isolates(graph)))

    return graph

def generate_random_edge_graph(num_nodes, num_edges):
    """
    Generates a networkx graph and removes any isolated nodes in the graph
    :param num_nodes: number of nodes in the generated graph
    :param num_edges: number of edges in the generated graph
    :return: Graph
    """
    graph = nx.dense_gnm_random_graph(num_nodes, num_edges)
    graph.remove_nodes_from(list(nx.isolates(graph)))

    # Regenerate until the graph (without isolated nodes) is connected
    while not nx.is_connected(graph):
        graph = nx.dense_gnm_random_graph(num_nodes, num_edges)
        graph.remove_nodes_from(list(nx.isolates(graph)))

    return graph

def generate_barabasi_pref_graph(num_nodes, attachments):
    """
    Generates a networkx graph according to Barabasi Albert's preferential attachment
    :param num_nodes: number of nodes in the generated graph
    :param attachments: number of edges to create when a new node is added
    :return: Graph
    """
    return nx.barabasi_albert_graph(num_nodes, attachments)


def generate_complete_graph(num_nodes):
    """
    Generates a complete networkx graph
    :param num_nodes: number of nodes in the generated complete graph
    :return: Graph
    """
    return nx.complete_graph(num_nodes)


def generate_small_world(num_nodes, starting_degree, rewire_probability):
    """
    Generates a small world graph.
    :param num_nodes: number of nodes
    :param starting_degree: initial degree of each node
    :param rewire_probability: probability that each edge is destroyed and a new neighbor is selected.
        The criteria for new neighbor selection can be seen here:
    :return: Graph
    """
    return nx.connected_watts_strogatz_graph(num_nodes, starting_degree, rewire_probability)

def add_weights_to_graph(graph, weight_min, weight_max):
    """
    Adds random weights to the edges of the graph and returns a copy of it.
    Note: the weights are also set on the graph that is passed in.
    :param graph: Graph to add weights to
    :param weight_min: Minimum weight value allowed for edge
    :param weight_max: Maximum weight value allowed for edge
    :return: A deep copy of the graph with weights added to edges
    """
    rand.seed()

    for (u, v, w) in graph.edges.data():
        # add_edge on an existing edge only updates its attributes
        graph.add_edge(u, v, weight=rand.randint(weight_min, weight_max))

    return copy.deepcopy(graph)

def add_b_values_to_graph(graph):
    """
    :param graph: Networkx graph to add b-values to, must have weighted edges
    :return: a new graph with b-values added
    """
    # We don't want to modify the original graph so make a copy first
    g = copy.deepcopy(graph)
    nx.set_node_attributes(g, 0, 'bValue')
    weights = nx.get_edge_attributes(g, 'weight')

    # Each edge weight contributes to the b-value of both of its endpoints
    for (node1, node2), w in weights.items():
        g.add_node(node1, bValue=(w + get_b_value(g, node1)))
        g.add_node(node2, bValue=(w + get_b_value(g, node2)))

    # Clear the edge weights so that only the b-values remain
    nx.set_edge_attributes(g, 0, 'weight')

    return g

def show_graph_with_b_values(graph, pos):
    """
    Displays the graph with b values on each node
    :param graph: Networkx Graph with param 'bValue' as node attribute
    :param pos: Position of each node
    :return: None
    """
    plt.figure(1, figsize=(12, 12))
    labels = nx.get_node_attributes(graph, 'bValue')
    nx.draw_networkx(graph, pos, labels=labels, width=6, node_size=700)
    plt.show()


def show_graph(graph, pos):
    """
    Displays the graph; may contain weighted edges
    :param graph: Networkx graph
    :param pos: position of nodes
    :return: None
    """
    plt.figure(1, figsize=(12, 12))
    nx.draw_networkx_nodes(graph, pos, node_size=700)
    nx.draw_networkx_labels(graph, pos, font_size=20, font_family='sans-serif')
    nx.draw_networkx_edges(graph, pos, width=6)
    labels = nx.get_edge_attributes(graph, 'weight')
    nx.draw_networkx_edge_labels(graph, pos, edge_labels=labels)
    plt.show()

def get_b_value(graph, node):
    """
    :param graph: Networkx graph
    :param node: Node
    :return: the value of attribute 'bValue' on the node
    """
    return graph.nodes[node]['bValue']


def get_maximum_b_value(graph):
    """
    :param graph: Networkx graph with node attribute bValues
    :return: node id and value of maximum bValue
    """
    node_attribute_dict = nx.get_node_attributes(graph, "bValue")

    return max(node_attribute_dict.items(), key=operator.itemgetter(1))


def sum_neighbors_b_values(graph, node):
    """
    :param graph: Networkx graph with bValues
    :param node: node id
    :return: sum of node's neighbors bValues
    """
    neighbors_sum = 0

    for n in graph.neighbors(node):
        neighbors_sum += get_b_value(graph, n)

    return neighbors_sum


def get_eta_value(graph, node):
    """
    Returns the subtraction of the b value of the node from the summation of the b values of neighboring nodes
    :param graph: Networkx graph
    :param node: Node
    :return: the eta value
    """
    return sum_neighbors_b_values(graph, node) - get_b_value(graph, node)


def get_mu_value(graph, node):
    """
    :param graph: Networkx graph
    :param node: Node
    :return: the minimum value of the eta value of the neighbors of node
    """
    candidate_values = []

    for n in graph.neighbors(node):
        candidate_values.append(get_eta_value(graph, n))

    return min(candidate_values)

def has_neighbor_with_b_value(graph, node):
    """
    :param graph: Networkx graph with bValues
    :param node: Node id
    :return: true if any neighbor has a nonzero bValue; else false
    """
    for n in graph.neighbors(node):
        if get_b_value(graph, n) != 0:
            return True

    return False


def find_eta_lt0(graph):
    """
    finds a node with an eta value less than or equal to 0; if none exists returns negative 1
    NOTE: We don't want to cover the case where all neighbors bValues are 0
    :param graph: Networkx Graph
    :return: node or -1
    """
    for n in graph.nodes():
        if get_b_value(graph, n) != 0 and get_eta_value(graph, n) <= 0 and has_neighbor_with_b_value(graph, n):
            return n

    return -1


def find_eta_e0(graph):
    """
    finds a node with eta value equal to 0; if none exists returns negative 1
    :param graph: Networkx Graph
    :return: node or -1
    """
    for n in graph.nodes():
        # We must ensure that we no longer count nodes whose b value is 0
        if get_b_value(graph, n) != 0 and get_eta_value(graph, n) == 0:
            return n

    return -1


def find_candidate_edge(graph):
    """
    Enumerates through the edges of the graph and finds two nodes whose b-values are not 0.
    :param graph: Networkx graph
    :return: the pair of nodes or (-1, -1)
    """
    for u, v in graph.edges():
        if get_b_value(graph, u) != 0 and get_b_value(graph, v) != 0:
            return u, v

    return -1, -1

def block_exchange(original_graph, matching_graph, node1, node2, value):
    """
    Reduces the b-value of each node by the param value and adds to matching
    :param original_graph: Graph holding the remaining b-values; modified in place
    :param matching_graph: Graph holding the matching built so far; modified in place
    :param node1: A node, must share edge with node 2
    :param node2: A node, must share edge with node 1
    :param value: The value to increase the edge weight and reduce the b-values
    :return: None
    """
    if original_graph.nodes[node1]['bValue'] < value or original_graph.nodes[node2]['bValue'] < value:
        raise ValueError(
            'Value should not exceed b-value of either node. Value was {}, Node1 b-value {}, Node2 b-value {}'.format(
                value, original_graph.nodes[node1]['bValue'], original_graph.nodes[node2]['bValue']))

    original_graph.nodes[node1]['bValue'] -= value
    original_graph.nodes[node2]['bValue'] -= value

    # A node whose b-value is exhausted can no longer take part in an exchange
    if original_graph.nodes[node1]['bValue'] == 0:
        original_graph.remove_node(node1)

    if original_graph.nodes[node2]['bValue'] == 0:
        original_graph.remove_node(node2)

    if matching_graph.has_edge(node1, node2):
        matching_graph.get_edge_data(node1, node2)['weight'] += value
    else:
        matching_graph.add_edge(node1, node2, weight=value)

def get_total_weights(graph):
    """
    Returns the total weights of the edges of the graph.
    :param graph: Networkx graph
    :return: total weight of graph
    """
    total = 0

    for u, v in graph.edges:
        total += graph.get_edge_data(u, v)['weight']

    return total

def perform_b_matching(original_graph):
    """
    Performs a b-matching on the provided graph with b-values on the nodes.
    Note: original_graph is modified in place (b-values are reduced and exhausted nodes are removed).
    :param original_graph: Networkx graph
    :return: a graph with weights on edges assigned from the b-values
    """
    # Start from an empty matching
    matching_graph = nx.Graph()

    # Find any dominating nodes and perform the block exchange with all its neighbors
    node = find_eta_lt0(original_graph)

    while node != -1:
        # Take a snapshot of the neighbors because block_exchange may remove nodes
        neighbors = list(original_graph.neighbors(node))

        for n in neighbors:
            block_exchange(original_graph, matching_graph, n, node, get_b_value(original_graph, n))

        # Search for new dominating node
        node = find_eta_lt0(original_graph)

    # Find a candidate edge such that the b-value of both nodes is non zero and continue while this is true
    (u, v) = find_candidate_edge(original_graph)

    while u != -1:
        # Get the b-value and mu value for each node in the edge
        b_u = get_b_value(original_graph, u)
        b_v = get_b_value(original_graph, v)
        mu_u = get_mu_value(original_graph, u) // 2
        mu_v = get_mu_value(original_graph, v) // 2

        # Select the smallest value greater than 0 and perform the block exchange with that value
        value = min(i for i in [b_u, b_v, mu_u, mu_v] if i > 0)
        block_exchange(original_graph, matching_graph, u, v, value)

        # If there is a node whose neighbors' b-values sum to its b-value, perform the exchange with all neighbors.
        node = find_eta_e0(original_graph)

        while node != -1:
            # Take a snapshot of the neighbors because block_exchange may remove nodes
            neighbors = list(original_graph.neighbors(node))

            for n in neighbors:
                block_exchange(original_graph, matching_graph, node, n, get_b_value(original_graph, n))

            node = find_eta_e0(original_graph)

        # Search for new candidate edge
        (u, v) = find_candidate_edge(original_graph)

    return matching_graph