Biased Edge Dropout for Enhancing Fairness in Graph Representation Learning

Indro Spinelli, Graduate Student Member, IEEE, Simone Scardapane, Amir Hussain, Senior Member, IEEE, and Aurelio Uncini, Member, IEEE

Abstract—Graph representation learning has become a ubiquitous component in many scenarios, ranging from social network analysis to energy forecasting in smart grids. In several applications, ensuring the fairness of the node (or graph) representations with respect to some protected attributes is crucial for their correct deployment. Yet, fairness in graph representation learning remains under-explored, with few solutions available. In particular, the tendency of similar nodes to cluster on several real-world graphs (i.e., homophily) can dramatically worsen the fairness of these procedures. In this paper, we propose a biased edge dropout algorithm (FairDrop) to counteract homophily and improve fairness in graph representation learning. FairDrop can be plugged in easily on many existing algorithms, is efficient, adaptable, and can be combined with other fairness-inducing solutions. After describing the general algorithm, we demonstrate its application on two benchmark tasks, specifically, as a random walk model for producing node embeddings, and to a graph convolutional network for link prediction. We prove that the proposed algorithm can successfully improve the fairness of all models up to a small or negligible drop in accuracy, and compares favourably with existing state-of-the-art solutions. In an ablation study, we demonstrate that our algorithm can flexibly interpolate between biasing towards fairness and an unbiased edge dropout. Furthermore, to better evaluate the gains, we propose a new dyadic group definition to measure the bias of a link prediction task when paired with group-based fairness metrics. In particular, we extend the metric used to measure the bias in the node embeddings to take into account the graph structure.

Impact Statement—Fairness in graph representation learning is under-explored. Yet, the algorithms working with these types of data have a fundamental impact on our digital life. Therefore, although the law prohibits unfair treatment based on sensitive traits, social networks and recommender systems can systematically discriminate against minorities. Current solutions are computationally intensive or significantly lower the accuracy score. To solve the fairness problem, we propose FairDrop, a biased edge dropout. Our approach provides protection against the unfairness generated by the network's homophily w.r.t. the sensitive attributes. It is easy to integrate FairDrop into today's solutions for learning network embeddings or downstream tasks. We believe that the lack of expensive computations and the flexibility of our fairness constraint will spread awareness of the fairness issue.

Index Terms—Graph representation learning, graph embedding, fairness, link prediction, graph neural network.

Manuscript received xx xx, xx. (Corresponding author: Indro Spinelli.)
I. Spinelli, S. Scardapane, and A. Uncini are with the Department of Information Engineering, Electronics and Telecommunications (DIET), Sapienza University of Rome, 00184 Rome, Italy (e-mail: [email protected]; [email protected]; [email protected]).
A. Hussain is with the School of Computing, Edinburgh Napier University, UK (e-mail: [email protected]).
Handled by yyy.

I. INTRODUCTION

Graph-structured data, ranging from friendships on social networks to physical links in energy grids, powers many algorithms governing our digital life. Social network topologies define the stream of information we will receive, often influencing our opinion [1][2][3][4]. Bad actors sometimes define these topologies ad-hoc to spread false information [5]. Similarly, recommender systems [6] suggest products tailored to our own experiences and history of purchases. However, pursuing the highest accuracy as the only metric of interest has let many of these algorithms discriminate against minorities in the past [7][8][9], despite the law prohibiting unfair treatment based on sensitive traits such as race, religion, and gender. For this reason, the research community has started looking into the biases introduced by algorithms working over graphs [10].

One of the most common approaches to process graphs is via learning vector embeddings for the nodes (or the edges), e.g., [11]. These are a low-dimensional representation of the nodes (or the edges), encoding the local topology. Downstream tasks then use these embeddings as inputs. Some examples of these tasks are node classification, community detection, and link prediction [12]. We will focus on the latter due to its widespread application in social networks and recommender systems. Alternatively, graph neural networks (GNNs) [13] solve link prediction or other downstream tasks in an end-to-end fashion, without prior learning of embeddings through ad-hoc procedures. The techniques developed in this paper can be applied to both scenarios.

In this work, we will concentrate on the bias introduced by one of the key aspects behind the success of GNNs and node embedding procedures: homophily. Homophily is the principle that similar users interact at a higher rate than dissimilar ones. In a graph, this means that nodes with similar characteristics are more likely to be connected. In node classification, this encourages smoothness in terms of label distributions and embeddings, yielding excellent results [14].

From the fairness point of view, the homophily of sensitive attributes directly influences the prediction and introduces inequalities. In social networks, the "unfair homophily" of race, gender or nationality limits the contents accessible by the users, influencing their online behaviour [1]. For example, the authors of [2] showed that users affiliated with majority political groups are exposed to new information faster and in greater quantity. Similarly, homophily can put minority groups at a disadvantage by restricting their ability to establish links with a majority group [15].

An unfair link prediction magnifies this issue, known as the "filter bubble", by increasing the segregation between the groups.

To mitigate this issue, in this paper we propose a biased dropout strategy that forces the graph topology to reduce the homophily of sensitive attributes. At each step of training, we compute a random copy of the adjacency matrix biased towards reducing the amount of homophily in the original graph. Thus, models trained on these modified graphs undergo a fairness-biased data augmentation regime. Our approach limits the biases introduced by the unfair homophily, resulting in fairer node representations and link predictions while preserving most of the original accuracy.

Measuring fairness in this context requires some adaptations. Most works on fairness measures focus on independent and identically distributed (i.i.d.) data. These works proposed many metrics, each one protecting against a different bias [16]. In particular, group fairness measures determine the level of equity of the algorithm's predictions between groups of individuals. Link prediction requires a dyadic fairness measure that considers the influence of both sensitive attributes associated with the connection [17]. However, it is still possible to apply group fairness metrics by defining new groups for the edges.

A. Contributions of the paper

We propose a preprocessing technique that modifies the training data to reduce the predictability of its sensitive attributes [18]. Our algorithm introduces no overhead and can be framed as a biased data augmentation technique. A single hyperparameter regulates the intensity of the fairness constraint. This ranges from maximum fairness to an unbiased edge dropout. Therefore FairDrop can be easily adapted to the needs of different datasets and models. These characteristics make our framework extremely adaptable. Our approach can also be applied in combination with other fairness constraints.

We evaluate the fairness improvement on two different tasks. Firstly, we measure the fairness imposed on the outcomes of end-to-end link prediction tasks. Secondly, we test the capability of removing the contributions of the sensitive attributes from the resulting node embeddings generated by representation learning algorithms. To measure the improvements for the link prediction, we also propose a novel group-based fairness metric on dyadic-level groups. We define a new dyadic group and use it in combination with the ones described in [17]. Instead, for the predictability of the sensitive attributes from the node embeddings, we measure the representation bias [19]. It is common to use node embeddings as the input of a downstream link prediction task. Therefore, we introduce a new metric that also considers the graph's topology.

B. Outline of the paper

The rest of the paper is structured as follows. Section II reviews recent works about enforcing fairness for graph-structured data and their limitations. Then, in Section III we introduce GNN models and group-based fairness metrics for i.i.d. data. FairDrop, our proposed biased edge dropout, is first introduced in Section IV and then tested in Section VI. We conclude with some general remarks in Section VII.

II. RELATED WORKS

The literature on algorithmic bias is extensive and interdisciplinary [20]. However, most approaches study independent and identically distributed data. Just recently, with the success of GNNs, some works started to investigate fairness in graph representation learning. Some works focused on the creation of fair node embeddings [10][21][19] that can be used as the input of a downstream task of link prediction. Others targeted directly the task of a fair link prediction [17][22].

Some of these approaches base their foundations on adversarial learning. Compositional fairness constraints [10] learn a set of adversarial filters that remove information about particular sensitive attributes. Similarly, fairness-aware link prediction [17] employs an adversarial learning approach to ensure that inter-group links are well-represented among the predicted links. FairAdj [22] learns a fair adjacency matrix during an end-to-end link prediction task. The two techniques closest to our work are DeBayes [19] and Fairwalk [21], which we describe in-depth below.

DeBayes [19] extends directly "Conditional Network Embedding" (CNE) [23] to improve its fairness. CNE uses the Bayes rule to combine prior knowledge about the network with a probabilistic model for the Euclidean embedding conditioned on the network. Then, DeBayes maximizes the obtained posterior probability for the network conditioned on the embedding to yield a maximum likelihood embedding. DeBayes models the sensitive information as part of the prior distribution.

Fairwalk [21] is an adaptation of Node2Vec [11] that aims to increase the fairness of the resulting embeddings. It modifies the transition probability of the random walks at each step, by weighing the neighbourhood of each node according to their sensitive attributes.

III. PRELIMINARIES

A. Graph representation learning

In this work we will consider an undirected graph G = (V, E), where V = {1, . . . , n} is the set of node indexes, and E = {(i, j) | i, j ∈ V} is the set of arcs (edges) connecting pairs of nodes. The meaning of a single node or edge depends on the application. For some tasks a node i is endowed with a vector x_i ∈ R^d of features. Each node is also associated to a sensitive attribute s_i ∈ [0, 1] (e.g., political preference in a bipartite system), which may or may not be part of its features. Connectivity in the graph can be summarized by the adjacency matrix A ∈ {0, 1}^(n×n). This matrix is used to build different types of operators that define the communication protocols across the graph. The vanilla operator is the symmetrically normalized graph Laplacian [24] L̂ = D^(−1/2) L D^(−1/2), with D being the diagonal degree matrix where D_ii = Σ_j A_ij, and L the Laplacian matrix L = D − A.

We will focus on the task of link prediction, where the objective is to predict whether two nodes in a network are likely to have a link [25]. For graphs having vectors of features associated to the nodes, these can be combined with the structural information of the graph by solving an end-to-end optimization problem [26]:

    Â = σ(Z Z^T), with Z = GNN(X, A),     (1)

where GNN(·, ·) is a GNN operating on the graph structure (see [27], [28], [29] for representative surveys on GNN models).
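As a concrete illustration of Eq. (1), the following sketch (ours, not the authors' reference implementation) builds a two-layer GCN-style encoder on a dense, symmetrically normalized adjacency matrix and scores candidate links with an inner-product decoder; names such as GCNLinkPredictor and normalize_adjacency are illustrative placeholders.

    import torch
    import torch.nn as nn

    def normalize_adjacency(A: torch.Tensor) -> torch.Tensor:
        # Symmetric normalization D^{-1/2} (A + I) D^{-1/2} commonly used by GCN layers.
        A_hat = A + torch.eye(A.size(0))
        deg = A_hat.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.pow(-0.5))
        return d_inv_sqrt @ A_hat @ d_inv_sqrt

    class GCNLinkPredictor(nn.Module):
        # Two-layer graph convolutional encoder + inner-product decoder, as in Eq. (1).
        def __init__(self, in_dim: int, hidden_dim: int = 128):
            super().__init__()
            self.lin1 = nn.Linear(in_dim, hidden_dim)
            self.lin2 = nn.Linear(hidden_dim, hidden_dim)

        def encode(self, X: torch.Tensor, A_norm: torch.Tensor) -> torch.Tensor:
            Z = torch.relu(A_norm @ self.lin1(X))
            return A_norm @ self.lin2(Z)

        def decode(self, Z: torch.Tensor, pairs: torch.Tensor) -> torch.Tensor:
            # Probability of a link for each candidate pair (i, j): sigma(z_i . z_j).
            src, dst = pairs
            return torch.sigmoid((Z[src] * Z[dst]).sum(dim=-1))

Sparse, message-passing implementations are preferable for large graphs; the dense form is kept here only for readability.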
Another approach first uses unsupervised learning techniques [30][11] to create a latent representation of each node and then performs a downstream task of link prediction using the learned embeddings and standard classifiers (e.g., support vector machines). Differently from most GNNs, node embeddings are obtained in an unsupervised fashion, and they can be re-used for multiple downstream tasks. Recently, self-supervised models for GNNs have been explored to fill this gap [31]. The framework we propose in this paper can be used to improve fairness in all these scenarios.

B. Fairness measures in i.i.d. data

Fairness in decision making is broadly defined as the absence of any advantage or discrimination towards an individual or a group based on their traits [32]. This general definition leaves the door open to several different fairness metrics, each focused on a different type of discrimination [16]. In this paper we focus on the definition of group fairness, known also as disparate impact. In this case, the unfairness occurs when the model's predictions disproportionately benefit or damage people of different groups defined by their sensitive attribute. These measures are usually expressed in the context of a binary classification problem. In the notation of the previous section, denote by Y ∈ [0, 1] a binary target variable, and by Ŷ = f(x) a predictor that does not exploit the graph structure. As before, we associate to each x a sensitive binary attribute S. Two widely used criteria belonging to this group are demographic parity [33] and equalized odds [34]:

• Demographic Parity (DP) [33]: Ŷ satisfies DP if the likelihood of a positive outcome is the same regardless of the value of the sensitive attribute S:

    P(Ŷ | S = 1) = P(Ŷ | S = 0).     (2)

• Equalized Odds (EO) [34]: Ŷ satisfies EO if it has equal rates for true positives and false positives between the two groups defined by the protected attribute S:

    P(Ŷ = 1 | S = 1, Y = y) = P(Ŷ = 1 | S = 0, Y = y).     (3)

These definitions trivially extend to the case where the sensitive attribute is a categorical variable with |S| > 2 and, for the rest of the paper, we will consider this scenario. To measure the deviation from these ideal metrics it is common to use the differences between the groups defined by S. In particular, the DP difference is defined as the difference between the largest and the smallest group-level selection rate:

    ∆DP = max_s E[Ŷ | S = s] − min_s E[Ŷ | S = s].     (4)

For the EO difference, we can report the largest discrepancy between the true positive rate difference and the false positive rate difference between the groups defined again by the protected attribute S [35]:

    ∆TPR = max_s E[Ŷ = 1 | S = s, Y = 1] − min_s E[Ŷ = 1 | S = s, Y = 1],     (5)

    ∆FPR = max_s E[Ŷ = 1 | S = s, Y = 0] − min_s E[Ŷ = 1 | S = s, Y = 0].     (6)

The final metric is then given by:

    ∆EO = max(∆TPR, ∆FPR).     (7)
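For concreteness, the deviations in Eqs. (4)-(7) can be estimated from predictions as in the short sketch below (our own illustration for a categorical sensitive attribute; the toy arrays are hypothetical).

    import numpy as np

    def dp_difference(y_pred: np.ndarray, s: np.ndarray) -> float:
        # Delta DP (Eq. 4): gap between largest and smallest group-level selection rate.
        rates = [y_pred[s == g].mean() for g in np.unique(s)]
        return max(rates) - min(rates)

    def eo_difference(y_pred: np.ndarray, y_true: np.ndarray, s: np.ndarray) -> float:
        # Delta EO (Eqs. 5-7): worst gap among true-positive and false-positive rates.
        gaps = []
        for y in (1, 0):  # y = 1 gives the TPR gap, y = 0 the FPR gap
            rates = [y_pred[(s == g) & (y_true == y)].mean() for g in np.unique(s)]
            gaps.append(max(rates) - min(rates))
        return max(gaps)

    # Toy usage with |S| = 3 groups.
    y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
    y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    s = np.array([0, 0, 1, 1, 2, 2, 0, 1])
    print(dp_difference(y_pred, s), eo_difference(y_pred, y_true, s))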
However, these definitions are not suitable to be directly applied to graph-structured data.

IV. PROPOSED BIASED EDGE DROPOUT TO ENHANCE FAIRNESS

In this work, for every epoch of training (either of the node embedding or the GNN), we generate a fairer random copy of the original adjacency matrix to counteract the negative effects introduced by the homophily of the sensitive attributes. To do so, we reduce the number of connections between nodes sharing the same sensitive attributes.

FairDrop can be divided into three main steps, as shown in Figure 1 and summarized in Algorithm 1:
1) Computation of a mask encoding the network's homophily w.r.t. the sensitive attributes.
2) Biased random perturbation of the mask.
3) Removal of connections from the original adjacency matrix according to the computed perturbation.

Algorithm 1: FairDrop
    Input: Adjacency matrix A, sensitive attribute (vector) s, mask M for the adjacency matrix initialized with zeros, δ ∈ [0, 1/2]
    Output: Fair adjacency matrix A_fair
    forall edges (i, j) ∈ G do
        m_ij ← (s[i] ≠ s[j])
        if RANDOM ≤ 1/2 + δ then
            rr(m_ij) ← m_ij
        else
            rr(m_ij) ← 1 − m_ij
    A_fair ← A ◦ rr(M)

We conjecture that training a model over multiple fairness-enforcing random copies of the adjacency matrix (one for each epoch) increases the randomness and the diversity of its input. Thus the entire pipeline can be considered a data augmentation procedure biased towards fairness, similarly to how standard dropout can be interpreted as noise-based data augmentation [36].

A. Sensitive attribute homophily mask

We build an adjacency matrix M encoding the heterogeneous connections between the sensitive attributes. Its computation is the same for binary and categorical ones. We define each element of M as:

    m_ij = { 1   if s_i ≠ s_j
           { 0   otherwise,     (8)

where m_ij = 1 indicates a heterogeneous edge with respect to the sensitive attribute S. It is easy to see that if we want to improve the fairness of the graph we have to promote the first type of connections.
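A minimal sketch of step 1 (the homophily mask of Eq. (8)), written by us for illustration on a dense adjacency matrix: homophily_mask marks the heterogeneous (inter-group) edges with a one.

    import numpy as np

    def homophily_mask(A: np.ndarray, s: np.ndarray) -> np.ndarray:
        # Eq. (8): m_ij = 1 where the two endpoints carry different sensitive attributes.
        hetero = (s[:, None] != s[None, :]).astype(A.dtype)
        # As in Algorithm 1, only positions of existing edges are relevant.
        return hetero * A

    # Toy graph: 4 nodes, binary sensitive attribute.
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 1],
                  [1, 1, 0, 1],
                  [0, 1, 1, 0]])
    s = np.array([0, 0, 1, 1])
    M = homophily_mask(A, s)  # 1 on inter-group edges such as (0, 2), 0 elsewhere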

Fig. 1. Schematics of FairDrop. From left to right: (i) Construction of the sensitive attribute homophily mask (Section IV-A); (ii) Randomized response used to inject randomness and to regulate the bias imposed by the constraint (Section IV-B); (iii) Final graph obtained by dropping the connections according to the randomized mask (Section IV-C).

B. Randomized response

Randomized response [37] adds a layer of "plausible deniability" to structured interviews. With this protection, respondents are able to answer sensitive questions while maintaining their privacy. The mechanism is simple and yet elegant. In its basic form, randomized response works with binary questions. The interviewee throws a coin to decide whether to respond honestly or at random with another coin toss. With unbiased coins, the user will tell the truth 2/3 of the times and lie the remaining times. More formally, randomized response can be defined as follows:

    rr(m_ij) = { m_ij       with probability 1/2 + δ
               { 1 − m_ij   with probability 1/2 − δ,     (9)

with δ ∈ [0, 1/2]. When δ = 0, randomized response always gives random answers, maximizing the privacy protection but making the acquired data useless. Conversely, δ = 1/2 returns the clean data without any privacy protection.

We use this framework, as shown in the central block of Fig. 1, where the input is the mask encoding the homophily of the sensitive attributes and the output is its randomized version.
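Step 2 can be sketched as follows (again an illustrative snippet of ours, not the authors' code): every entry of the mask is kept with probability 1/2 + δ and flipped with probability 1/2 − δ, as in Eq. (9).

    import numpy as np

    def randomized_response(M: np.ndarray, delta: float, rng=None) -> np.ndarray:
        # Eq. (9): keep each mask entry with probability 1/2 + delta, flip it otherwise.
        assert 0.0 <= delta <= 0.5
        rng = np.random.default_rng() if rng is None else rng
        keep = rng.random(M.shape) <= 0.5 + delta
        # For an undirected graph, one may additionally want to symmetrize the draw.
        return np.where(keep, M, 1 - M)

    # delta = 0 yields a purely random mask (unbiased dropout);
    # delta = 0.5 returns M unchanged (maximum bias towards fairness).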

C. Fair edge dropout

Finally, we are ready to drop the unfair connections from the original adjacency matrix:

    A_fair = A ◦ rr(M),     (10)

where ◦ denotes the Hadamard product between matrices. The resulting matrix A_fair is a fairness-enforcing random copy of the original adjacency matrix. In particular, with rr(M) we bias the dropout towards fairness by cutting more connections between nodes sharing the same sensitive attribute. A model trained on this version of the graph will adapt accordingly. The reduction of this unfair homophily will be reflected in the network embeddings or directly in the model's predictions.

It is easy to see that the parameter δ regulates the level of fairness we enforce on the topology. In fact, setting δ = 0 corresponds to an unbiased edge dropout with p = 1/2, like the one proposed by [38]. Increasing δ incrementally promotes fair connections by removing the unfair links in the graph. When δ is at its maximum value, FairDrop removes all the unfair edges and keeps all the fair ones. The possibility of varying how much the algorithm is biased towards fairness makes it adaptable to the needs of different datasets and algorithms. There are almost no additional costs with respect to a standard edge dropout [38], making this fairness "constraint" extremely fast. Furthermore, FairDrop shares its capability of alleviating over-fitting and over-smoothing.

It is possible to apply FairDrop when multiple sensitive attributes are present at once. We can build a fair adjacency matrix for every sensitive attribute, and these can then be alternated during the training phase.
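Putting the three steps together, a compact, self-contained sketch of the whole procedure (our illustration of Eq. (10), assuming a dense symmetric adjacency matrix and a categorical sensitive attribute) could look like this:

    import numpy as np

    def fairdrop(A: np.ndarray, s: np.ndarray, delta: float, rng=None) -> np.ndarray:
        # Return a fairness-enforcing random copy A_fair = A o rr(M) of the adjacency matrix.
        rng = np.random.default_rng() if rng is None else rng
        M = (s[:, None] != s[None, :]).astype(A.dtype)    # step 1: homophily mask (Eq. 8)
        keep = rng.random(A.shape) <= 0.5 + delta          # step 2: randomized response (Eq. 9)
        keep = np.triu(keep) | np.triu(keep, 1).T          # symmetric draw for undirected graphs
        rr_M = np.where(keep, M, 1 - M)
        return A * rr_M                                    # step 3: biased edge dropout (Eq. 10)

Practical implementations typically operate on sparse edge lists rather than dense matrices; the dense form is used here only for clarity.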

D. Deployment

FairDrop can be considered a data augmentation technique. It modifies the training data to reduce the influence of its sensitive attributes [18][39]. We are interested in link prediction, both in an end-to-end fashion and by learning node embeddings and then predicting the links as a downstream task. For both approaches, we generate different random copies that enforce the fairness of the original graph. Like in [38], this augments the randomness and the diversity of the input data ingested by the learning model. Thus it can be considered a data augmentation biased towards fairness. Our approach can also be applied in combination with other fairness constraints.

In the first scenario, the model performs every training epoch on a different fair random copy of the input data. The model can be one of the many proposed in recent years that fit in the framework described by Equation (1), like GCN [24], GAT [40], AP-GCN [41] and many others.

For learning the network embedding, we paired our edge sampling with Node2Vec [11]. This algorithm learns a continuous representation for the nodes using local information obtained from truncated random walks [30], balancing the exploration-exploitation trade-off. In this case, we generate fairness-enforcing versions of the graph for every batch of random walks that we want to compute.
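In the end-to-end scenario, deployment amounts to resampling a fair copy of the graph at every epoch. A rough sketch of such a training loop is given below (ours; model, features, train_pairs and labels are hypothetical placeholders, and it reuses the fairdrop and normalize_adjacency helpers sketched above).

    import torch

    def train(model, optimizer, A, s, features, train_pairs, labels,
              epochs: int = 200, delta: float = 0.25):
        # Train a link predictor, regenerating a fair random copy of the topology each epoch.
        loss_fn = torch.nn.BCELoss()
        for _ in range(epochs):
            A_fair = torch.tensor(fairdrop(A, s, delta), dtype=torch.float32)  # fresh FairDrop copy
            A_norm = normalize_adjacency(A_fair)
            optimizer.zero_grad()
            Z = model.encode(features, A_norm)
            preds = model.decode(Z, train_pairs)
            loss = loss_fn(preds, labels)
            loss.backward()
            optimizer.step()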

TABLE I
DATASETS

Dataset    S             |S|   Features   Nodes    Edges
Citeseer   paper class     6       3703    2110     3668
Cora-ML    paper class     7       2879    2810     7981
PubMed     paper class     3        500   19717    44324
DBLP       continent       5       None    3980     6965
FB         gender          2       None    4039    88234

V. PROPOSED FAIRNESS METRICS FOR GRAPH REPRESENTATION LEARNING

For graph-structured data, it is common to have attributes, including the sensitive ones, associated with the nodes. Therefore, if we are interested in evaluating the fairness of a node classification task, we can use the metrics described previously in Section III-B. However, for the link prediction and node embedding tasks, those metrics cannot be used straightforwardly. For link prediction, we need a dyadic fairness criterion expecting the predictions to be statistically independent from both sensitive attributes associated with the edges. In Section V-A we propose a novel set of fairness metrics for the task. For node embeddings, in Section V-B we extend the ideas presented in [42] to the downstream task of link prediction.

A. Fairness for link prediction

An edge connects two nodes and thus two sensitive attributes. Our objective is to use the same fairness criteria defined for i.i.d. data (DP, EO) for the links. Therefore, we have to define so-called dyadic groups, mapping the groups defined by the sensitive attribute for the nodes to new ones for the edges. It is thus possible to define different dyadic fairness criteria depending on how we build these dyadic groups. In [17] the authors introduced two dyadic criteria, mixed dyadic-level protection and subgroup dyadic-level protection. We propose a third criterion called group dyadic-level protection. It aims to be closer to the original groups defined by the sensitive attributes. The original groups S generate different dyadic subgroups associated with the edges D of the graph instead of the nodes. In particular, from the coarsest to the finest, we have (a small illustrative sketch is given at the end of this subsection):

• Mixed dyadic [17] (|D| = 2): the original groups determine two dyadic groups. An edge will be in the intra-group if it connects a pair of nodes with the same sensitive attribute. Otherwise, it will be part of the inter-group. This dyadic definition is common in recommender systems, where intra-group edges are known to create a "filter bubble" issue.

• Group dyadic (|D| = |S|): there is a one-to-one mapping between the dyadic and node-level groups. To do so, we have to consider each edge twice, once for every sensitive attribute involved. This dyadic definition checks whether one group, in particular, is prominent in the formation of the links.

• Sub-group dyadic [17] (|D| = (|S| + 2 − 1)! / (2! (|S| − 1)!)): each dyadic sub-group represents a connection between two groups (or an intra-group connection). The fairness criteria protect the balance between all the possible inter-group combinations together with the intra-group ones.

For example, considering the mixed dyadic group, and denoting the two groups as D = 0 for the inter-group and D = 1 for the intra-group, we say that a link predictor achieves DP (extending (2)) if:

    P(Ŷ | D = 0) = P(Ŷ | D = 1).     (11)

All other metrics from Section III-B extend in a similar way to the three dyadic groups. We present a visualization of all the dyadic groups in Figure 2. In Figure 3, instead, we show the composition of these groups for the Citeseer dataset.
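The three dyadic group definitions can be made concrete with a short sketch (ours, purely for illustration): given the sensitive attributes of the two endpoints of every edge, it derives the mixed, group and sub-group labels.

    import numpy as np

    def dyadic_groups(s_src: np.ndarray, s_dst: np.ndarray):
        # Mixed dyadic (|D| = 2): intra-group (1) vs. inter-group (0) edges.
        mixed = (s_src == s_dst).astype(int)
        # Group dyadic (|D| = |S|): every edge is counted twice, once per endpoint.
        group = np.concatenate([s_src, s_dst])
        # Sub-group dyadic: one label per unordered pair of groups, e.g. "1-2".
        lo, hi = np.minimum(s_src, s_dst), np.maximum(s_src, s_dst)
        subgroup = np.array([f"{a}-{b}" for a, b in zip(lo, hi)])
        return mixed, group, subgroup

    # Toy example: 4 edges, |S| = 3.
    s_src = np.array([0, 1, 2, 0])
    s_dst = np.array([0, 2, 2, 1])
    mixed, group, subgroup = dyadic_groups(s_src, s_dst)
    # mixed    -> [1, 0, 1, 0]
    # subgroup -> ['0-0', '1-2', '2-2', '0-1']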

B. Fairness for node embeddings

Representation learning aims to convert the graph's structure into a low-dimensional space. This representation is fair if it obfuscates as much as possible the unfair presence of sensitive attributes in the resulting embeddings [42]. Network embeddings are obfuscated if a classification task trying to predict the sensitive attribute yields the same accuracy as a random classifier [10][19]. If this is the case, the embeddings are fair, and so is the downstream task built upon them. This type of bias afflicting node embeddings is also known as representation bias (RB). To measure the RB, we define an RB score [19] associated with the node embeddings Z as the balanced accuracy obtained by a classifier:

    Node RB = Σ_{s∈S} (|Z_s| / |V|) · Accuracy(Z_s),     (12)

where Z_s is the set of node embeddings associated to nodes having S = s and Accuracy(·) denotes the accuracy of a classifier trained to predict the sensitive attribute, restricted to the set Z_s.

Since we focus on link prediction, we also extend the definition of RB to the edges. We define the edge embeddings as the concatenation of the embeddings of the nodes participating in the links, Z^[src,dst]. We treat the classification problem as a multi-label one, since the edges have two sensitive attributes associated with their source node embedding Z^src and their destination Z^dst. We conjecture that the additional knowledge of the relation between the nodes helps the classification task. Consequently, it is harder to obtain a good score, making this metric more challenging. We define the representation bias score for the links as the mean of the balanced accuracies obtained from the two target labels:

    Link RB = (1/2) Σ_{s∈S} [ (|Z_s^src| / |E|) · Accuracy_src(Z_s^[src,dst]) + (|Z_s^dst| / |E|) · Accuracy_dst(Z_s^[src,dst]) ],     (13)

where Accuracy_src(·) is the accuracy with respect to the label of the source node, while Accuracy_dst(·) is the accuracy with respect to the label of the destination node.
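As a sketch of how the Node RB of Eq. (12) can be estimated with an off-the-shelf classifier (our own example, using scikit-learn's logistic regression; the split ratio is arbitrary), one could write the following. The Link RB of Eq. (13) follows the same pattern after concatenating the two endpoint embeddings and predicting both labels.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def node_rb(Z: np.ndarray, s: np.ndarray, seed: int = 0) -> float:
        # Eq. (12): group-size-weighted accuracy of a classifier predicting s from the embeddings.
        Z_tr, Z_te, s_tr, s_te = train_test_split(Z, s, test_size=0.3,
                                                  random_state=seed, stratify=s)
        clf = LogisticRegression(max_iter=1000).fit(Z_tr, s_tr)
        score = 0.0
        for g in np.unique(s_te):
            idx = s_te == g
            score += idx.mean() * clf.score(Z_te[idx], s_te[idx])  # (|Z_s| / |V|) * Accuracy(Z_s)
        return score  # lower values indicate better obfuscation of the sensitive attribute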

Fig. 2. Visualization of the dyadic group D definitions. We use a single sensitive attribute with |S| = 3. From left to right: (i) the mixed dyadic group definition generates two link groups, "d = in" for intra-group connections and "d = out" for inter-group ones; (ii) the group dyadic definition creates a one-to-one mapping between the sensitive attribute S and the dyadic group D; (iii) the sub-group dyadic definition generates a group for every possible combination of the original sensitive attributes.


Fig. 3. Barplot showing the distribution of the different dyadic groups for the Citeseer dataset.

VI. EVALUATION

We test the fairness imposed by our algorithm both by evaluating the network embeddings obtained with Node2Vec and on a downstream task of link prediction. We summarize the datasets used in this evaluation in Table I. We ran several experiments with different random seeds and data splits (80%/20% train and test) and we report the average together with the standard deviations.

A. Fair link prediction

In these experiments, we test FairDrop's effect on an end-to-end link prediction task. We tested a GCN [24] and a GAT [40], both with two layers, a hidden dimension of 128 and a learning rate of 0.005. We performed the optimization once with an unbiased edge dropout and once with our fairness-enforcing dropout. We use the three benchmark citation networks from Table I, i.e., Citeseer, Cora-ML, and PubMed. In these graphs, nodes are articles with an associated bag-of-words representation of the abstract. Links represent a citation, regardless of the direction. The sensitive attribute, in this case, is the category of the article. We measured the link prediction quality with the accuracy and the area under the ROC curve (AUC), the latter representing the trade-off between true and false positives for different thresholds. To evaluate the fairness of the predictions, we used the demographic parity difference (∆DP) and the equalized odds difference (∆EO), computed on the dyadic groups described in Section V-A. We report the results in Tables II, III and IV, where the subscript of the fairness metrics is the initial of the dyadic group over which we computed the quantity. Arrows in the first row describe whether higher or lower values are better for the respective metric.

In every dataset, the networks trained with FairDrop achieved superior fairness performances. In some cases, FairDrop also improved the classification score. To study this effect, we perform an ablation study in Section VI-C.

B. Fair representation learning

Next, we compare our fairness constraint against two state-of-the-art frameworks for debiasing node embeddings, DeBayes [19] and FairWalk [21]. We compute both the node and link representation bias scores described in Section V-B using three different downstream classifiers. In particular, we compare a logistic regression (LR), a three-layer neural network (NN) and a random forest (RF) to compute these quantities. To compute the optimal hyperparameters, we perform a grid search with cross-validation. We repeat this step for each one of the generated embeddings to be as fair as possible in the evaluation. After that, we use the same classifiers to perform a downstream task of link prediction. Ideally, the best algorithm will achieve the lowest scores on both representation biases while keeping the accuracy high.

For this task we use a co-authorship network built in [19] from DBLP [43]. They extracted the authors from the accepted papers of eight different conferences. The nodes of the graph, representing the authors, are linked if they have collaborated at least once. The sensitive attribute is the continent of the author's institution. The other dataset we considered is FB [44], which is a combination of ego-networks obtained from a social network. The graph encodes users as nodes, with gender as a sensitive attribute and friendships as links.

We use the same hyperparameters of Node2Vec for FairWalk and FairDrop. For DeBayes, we used the parameters provided in their implementation. We present the results in Tables V and VI, respectively for DBLP and FB.

Node2Vec coupled with FairDrop improves the obfuscation of the sensitive attribute in the resulting node embeddings, with the biggest gain in the Node RB evaluation. Furthermore, the obfuscation is robust against the choice of the downstream classifier. This improvement comes with a small drop in the accuracy of the downstream task. The Link RB scores are higher than the previous ones due to the additional information provided by the concatenation of the two node embeddings forming the link. In this case, FairDrop performs on par with DeBayes if we limit our observation to the number of times the algorithm got the best score. However, Node2Vec with FairDrop is orders of magnitude faster, taking on average just 50 seconds against the 33 minutes on average required by DeBayes to converge in its standard configuration. FairWalk is less aggressive, and this results in less protection but higher accuracy scores on the downstream tasks.

TABLE II
LINK PREDICTION ON CORA

Method         Accuracy ↑   AUC ↑        ∆DPm ↓       ∆EOm ↓       ∆DPg ↓       ∆EOg ↓       ∆DPs ↓       ∆EOs ↓
GCN+EdgeDrop   82.4 ± 0.9   90.1 ± 0.7   56.4 ± 2.4   36.5 ± 4.3   12.3 ± 2.6   15.4 ± 3.3   90.2 ± 2.7   100.0 ± 0.0
GAT+EdgeDrop   80.5 ± 1.2   88.3 ± 0.8   53.7 ± 2.5   37.1 ± 3.2   18.8 ± 3.6   22.5 ± 4.2   93.6 ± 2.9   100.0 ± 0.0
GCN+FairDrop   82.4 ± 0.9   90.1 ± 0.7   52.9 ± 2.5   31.0 ± 4.9   11.8 ± 3.2   14.9 ± 3.7   89.4 ± 3.4   100.0 ± 0.0
GAT+FairDrop   79.2 ± 1.2   87.8 ± 1.0   48.9 ± 2.8   31.9 ± 4.3   15.3 ± 3.2   18.1 ± 3.5   94.5 ± 2.0   100.0 ± 0.0

TABLE III
LINK PREDICTION ON CITESEER

Method         Accuracy ↑   AUC ↑        ∆DPm ↓       ∆EOm ↓       ∆DPg ↓       ∆EOg ↓       ∆DPs ↓       ∆EOs ↓
GCN+EdgeDrop   78.9 ± 1.3   88.0 ± 1.3   44.9 ± 2.5   27.5 ± 4.1   20.1 ± 2.9   21.6 ± 5.0   71.0 ± 3.4   73.2 ± 9.5
GAT+EdgeDrop   76.3 ± 0.9   85.6 ± 1.0   42.6 ± 2.5   28.4 ± 5.0   22.2 ± 5.1   27.6 ± 6.3   76.7 ± 3.0   77.5 ± 8.8
GCN+FairDrop   79.2 ± 1.4   88.4 ± 1.4   42.6 ± 2.5   26.5 ± 4.2   18.7 ± 4.0   17.6 ± 5.5   67.7 ± 3.5   64.3 ± 9.5
GAT+FairDrop   78.2 ± 1.1   87.1 ± 1.1   42.9 ± 2.2   28.3 ± 4.3   22.5 ± 3.4   25.9 ± 5.2   75.3 ± 3.2   73.4 ± 9.1

TABLE IV
LINK PREDICTION ON PUBMED

Method         Accuracy ↑   AUC ↑        ∆DPm ↓       ∆EOm ↓       ∆DPg ↓       ∆EOg ↓       ∆DPs ↓       ∆EOs ↓
GCN+EdgeDrop   88.0 ± 0.5   94.6 ± 0.3   43.7 ± 1.0   12.8 ± 0.8    6.3 ± 0.7    6.0 ± 1.1   57.5 ± 1.4   26.3 ± 2.3
GAT+EdgeDrop   80.6 ± 0.9   88.8 ± 0.7   43.5 ± 1.1   24.5 ± 1.9    4.8 ± 1.6    7.5 ± 1.5   60.1 ± 1.9   49.3 ± 3.6
GCN+FairDrop   88.4 ± 0.4   94.8 ± 0.2   42.5 ± 0.5   12.2 ± 0.7    5.6 ± 1.8    5.1 ± 0.9   55.7 ± 1.5   26.6 ± 2.6
GAT+FairDrop   79.0 ± 0.8   87.6 ± 0.7   37.4 ± 0.9   19.7 ± 1.1    2.0 ± 1.0    6.4 ± 1.4   56.8 ± 2.1   47.3 ± 4.1

TABLE V
REPRESENTATION LEARNING ON THE DBLP DATASET

Method              Classifier   Node RB ↓    Link RB ↓    Accuracy ↑
DeBayes             LR           20.0 ± 0.0   21.6 ± 1.0   87.2 ± 0.3
                    NN           63.9 ± 4.3   77.4 ± 3.4   91.2 ± 0.7
                    RF           20.9 ± 0.3   28.0 ± 0.6   91.6 ± 0.5
FairWalk            LR           53.5 ± 5.3   67.1 ± 3.4   92.9 ± 0.6
                    NN           74.2 ± 4.9   81.9 ± 3.0   93.5 ± 0.9
                    RF           20.8 ± 0.7   28.2 ± 0.4   90.9 ± 0.7
Node2Vec+FairDrop   LR           31.6 ± 3.5   58.8 ± 4.3   88.4 ± 0.8
                    NN           34.4 ± 3.5   71.7 ± 3.1   88.1 ± 0.6
                    RF           20.0 ± 0.0   26.4 ± 0.3   87.2 ± 0.7

TABLE VI
REPRESENTATION LEARNING ON THE FB DATASET

Method              Classifier   Node RB ↓    Link RB ↓    Accuracy ↑
DeBayes             LR           52.0 ± 0.5   55.7 ± 0.5   96.7 ± 0.1
                    NN           58.9 ± 1.3   91.6 ± 0.5   98.5 ± 0.2
                    RF           53.4 ± 0.6   62.7 ± 0.6   97.9 ± 0.2
FairWalk            LR           53.1 ± 0.6   63.3 ± 1.0   97.4 ± 0.2
                    NN           58.9 ± 1.3   99.5 ± 0.2   97.6 ± 0.3
                    RF           53.5 ± 0.6   62.3 ± 0.8   97.0 ± 0.2
Node2Vec+FairDrop   LR           51.3 ± 1.0   62.3 ± 0.8   96.0 ± 0.2
                    NN           49.9 ± 1.3   99.0 ± 0.1   96.7 ± 0.5
                    RF           50.0 ± 0.0   61.1 ± 0.4   96.4 ± 0.2

C. Ablation

We perform an ablation study on the behaviour imposed by the parameter δ. We report in Figure 4 the AUC score and both representation biases obtained using a random forest classifier on a single data split, for one hundred runs on the FB dataset. We recall that for a higher value of δ, the sampling is more biased towards fairness. Surprisingly, the AUC of the downstream link prediction task grows as the representation biases decrease. Despite seeming counter-intuitive, we think this is in line with what the authors of [45] observed. They examined the problem of over-smoothing of GNNs for node classification tasks. They argued that one factor leading to this issue is the over-mixing of information and noise. In particular, intra-class connections can bring valuable information, while inter-class interactions may lead to indistinguishable representations across classes. However, for the link prediction task, the inverse is true. An excessive similarity between intra-class embeddings and dissimilarity between inter-class ones results in the known "filter bubble" issue.

Fig. 4. Visualization on the FB dataset of the trade-off between the representation biases and the performances on the downstream link prediction task.

VII. CONCLUSION

We proposed a flexible biased edge dropout algorithm for enhancing fairness in graph representation learning. FairDrop targets the negative effect of the network's homophily w.r.t. the sensitive attribute. A single hyperparameter regulates the intensity of the constraint, ranging from maximum fairness to an unbiased edge dropout. We tested FairDrop's performances on two common benchmark tasks, specifically unsupervised node embedding generation and link prediction. It is possible to frame FairDrop as a biased data augmentation pipeline. Therefore it can be used with different architectures and paired with other fairness constraints.

To measure the gains, we adapted the group fairness metrics to estimate the disparities between the dyadic groups defined by the edges. We proposed a new dyadic group aimed to be closer to the original groups defined by the sensitive attributes. Finally, we extended the previous metric evaluating the representation bias to take into account the graph structure.

A possible limitation of FairDrop is its extension to the case where multiple sensitive attributes are present at once. We proposed to alternate the masks of the different sensitive attributes to guide the biased dropout. However, this pipeline may be sub-optimal.

Randomness is a fundamental concept for FairDrop. However, not every connection has the same importance inside a graph. Future lines of research may combine perturbation-aware algorithms [46] with fairness constraints to perform a more selective edge sampling procedure.

REFERENCES

[1] M. McPherson, L. Smith-Lovin, and J. M. Cook, "Birds of a feather: Homophily in social networks," Annual Review of Sociology, vol. 27, no. 1, pp. 415–444, 2001.
[2] Y. Halberstam and B. Knight, "Homophily, group size, and the diffusion of political information in social networks: Evidence from twitter," Journal of Public Economics, vol. 143, pp. 73–88, 2016.
[3] D. Centola, "The spread of behavior in an online social network experiment," Science, vol. 329, pp. 1194–1197, 2010.
[4] E. Lee, F. Karimi, C. Wagner, H.-H. Jo, M. Strohmaier, and M. Galesic, "Homophily and minority-group size explain perception biases in social networks," Nature Human Behaviour, vol. 3, 10 2019.
[5] P. K. Roy and S. Chahar, "Fake profile detection on social networking websites: A comprehensive review," IEEE Transactions on Artificial Intelligence, pp. 1–1, 2021.
[6] Y. Tian, S. Peng, X. Zhang, T. Rodemann, K. C. Tan, and Y. Jin, "A recommender system for metaheuristic algorithms for continuous optimization based on deep recurrent neural networks," IEEE Transactions on Artificial Intelligence, vol. 1, no. 1, pp. 5–18, 2020.
[7] A. Datta, M. C. Tschantz, and A. Datta, "Automated experiments on ad privacy settings: A tale of opacity, choice, and discrimination," CoRR, vol. abs/1408.6491, 2014.
[8] S. Corbett-Davies, E. Pierson, A. Feller, S. Goel, and A. Huq, "Algorithmic decision making and the cost of fairness," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '17, (New York, NY, USA), p. 797–806, Association for Computing Machinery, 2017.
[9] Z. Obermeyer, B. Powers, C. Vogeli, and S. Mullainathan, "Dissecting racial bias in an algorithm used to manage the health of populations," Science, vol. 366, no. 6464, pp. 447–453, 2019.
[10] A. Bose and W. Hamilton, "Compositional fairness constraints for graph embeddings," in Proceedings of the 36th International Conference on Machine Learning (K. Chaudhuri and R. Salakhutdinov, eds.), vol. 97 of Proceedings of Machine Learning Research, pp. 715–724, PMLR, 09–15 Jun 2019.
[11] A. Grover and J. Leskovec, "Node2vec: Scalable feature learning for networks," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, (New York, NY, USA), p. 855–864, Association for Computing Machinery, 2016.
[12] P. Cui, X. Wang, J. Pei, and W. Zhu, "A survey on network embedding," IEEE Transactions on Knowledge and Data Engineering, vol. 31, pp. 833–852, 2019.
[13] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, "Geometric deep learning: going beyond Euclidean data," IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 18–42, 2017.
[14] Q. Huang, H. He, A. Singh, S.-N. Lim, and A. Benson, "Combining label propagation and simple models out-performs graph neural networks," in International Conference on Learning Representations, 2021.
[15] F. Karimi, M. Génois, C. Wagner, P. Singer, and M. Strohmaier, "Homophily influences ranking of minorities in social networks," Scientific Reports, vol. 8, 07 2018.
[16] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, and A. Galstyan, "A survey on bias and fairness in machine learning," ArXiv, vol. abs/1908.09635, 2019.
[17] F. Masrour, T. Wilson, H. Yan, P.-N. Tan, and A. Esfahanian, "Bursting the filter bubble: Fairness-aware network link prediction," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 841–848, Apr. 2020.
[18] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian, "Certifying and removing disparate impact," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, (New York, NY, USA), p. 259–268, Association for Computing Machinery, 2015.
[19] M. Buyl and T. De Bie, "DeBayes: a Bayesian method for debiasing network embeddings," in Proceedings of the 37th International Conference on Machine Learning (H. D. III and A. Singh, eds.), vol. 119 of Proceedings of Machine Learning Research, pp. 1220–1229, PMLR, 13–18 Jul 2020.
[20] A. Romei and S. Ruggieri, "A multidisciplinary survey on discrimination analysis," The Knowledge Engineering Review, vol. 29, pp. 582–638, 2013.
[21] T. Rahman, B. Surma, M. Backes, and Y. Zhang, "Fairwalk: Towards fair graph embedding," in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 3289–3295, International Joint Conferences on Artificial Intelligence Organization, 7 2019.
[22] P. Li, Y. Wang, H. Zhao, P. Hong, and H. Liu, "On dyadic fairness: Exploring and mitigating bias in graph connections," in International Conference on Learning Representations, 2021.
[23] B. Kang, J. Lijffijt, and T. D. Bie, "Conditional network embeddings," in International Conference on Learning Representations, 2019.
[24] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in International Conference on Learning Representations (ICLR), 2017.
[25] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," Journal of the American Society for Information Science and Technology, vol. 58, p. 1019–1031, May 2007.
[26] T. N. Kipf and M. Welling, "Variational graph auto-encoders," NIPS Workshop on Bayesian Deep Learning, 2016.
[27] D. Bacciu, F. Errica, A. Micheli, and M. Podda, "A gentle introduction to deep learning for graphs," Neural Networks, 2020.

[28] Z. Zhang, P. Cui, and W. Zhu, "Deep learning on graphs: A survey," IEEE Transactions on Knowledge and Data Engineering, 2020.
[29] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, "A comprehensive survey on graph neural networks," IEEE Transactions on Neural Networks and Learning Systems, 2020.
[30] B. Perozzi, R. Al-Rfou, and S. Skiena, "Deepwalk: Online learning of social representations," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, (New York, NY, USA), p. 701–710, Association for Computing Machinery, 2014.
[31] W. Jin, T. Derr, H. Liu, Y. Wang, S. Wang, Z. Liu, and J. Tang, "Self-supervised learning on graphs: Deep insights and new direction," arXiv preprint arXiv:2006.10141, 2020.
[32] N. A. Saxena, K. Huang, E. DeFilippis, G. Radanovic, D. C. Parkes, and Y. Liu, "How do fairness definitions fare? Examining public attitudes towards algorithmic definitions of fairness," in Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, AIES '19, (New York, NY, USA), p. 99–106, Association for Computing Machinery, 2019.
[33] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel, "Fairness through awareness," in Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS '12, (New York, NY, USA), p. 214–226, Association for Computing Machinery, 2012.
[34] M. Hardt, E. Price, E. Price, and N. Srebro, "Equality of opportunity in supervised learning," in Advances in Neural Information Processing Systems, vol. 29, Curran Associates, Inc., 2016.
[35] S. Bird, M. Dudík, R. Edgar, B. Horn, R. Lutz, V. Milan, M. Sameki, H. Wallach, and K. Walker, "Fairlearn: A toolkit for assessing and improving fairness in AI," Tech. Rep. MSR-TR-2020-32, Microsoft, May 2020.
[36] D. P. Helmbold and P. M. Long, "On the inductive bias of dropout," The Journal of Machine Learning Research, vol. 16, no. 1, pp. 3403–3454, 2015.
[37] S. L. Warner, "Randomized response: A survey technique for eliminating evasive answer bias," Journal of the American Statistical Association, vol. 60, no. 309, pp. 63–69, 1965. PMID: 12261830.
[38] Y. Rong, W. Huang, T. Xu, and J. Huang, "Dropedge: Towards deep graph convolutional networks on node classification," in International Conference on Learning Representations, 2020.
[39] F. Calmon, D. Wei, B. Vinzamuri, K. Natesan Ramamurthy, and K. R. Varshney, "Optimized pre-processing for discrimination prevention," in Advances in Neural Information Processing Systems (I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds.), vol. 30, Curran Associates, Inc., 2017.
[40] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph attention networks," International Conference on Learning Representations, 2018.
[41] I. Spinelli, S. Scardapane, and A. Uncini, "Adaptive propagation graph convolutional network," IEEE Transactions on Neural Networks and Learning Systems, pp. 1–6, 2020.
[42] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork, "Learning fair representations," in Proceedings of the 30th International Conference on Machine Learning, ICML '13, p. III–325–III–333, JMLR.org, 2013.
[43] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su, "Arnetminer: Extraction and mining of academic social networks," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, (New York, NY, USA), p. 990–998, Association for Computing Machinery, 2008.
[44] J. Leskovec and J. Mcauley, "Learning to discover social circles in ego networks," in Advances in Neural Information Processing Systems (F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, eds.), vol. 25, Curran Associates, Inc., 2012.
[45] D. Chen, Y. Lin, W. Li, P. Li, J. Zhou, and X. Sun, "Measuring and relieving the over-smoothing problem for graph neural networks from the topological view," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 3438–3445, Apr. 2020.
[46] E. Ceci and S. Barbarossa, "Graph signal processing in the presence of topology uncertainties," IEEE Transactions on Signal Processing, vol. 68, pp. 1558–1573, 2020.

Indro Spinelli received the master's degree in artificial intelligence and robotics in 2019 from the Sapienza University of Rome, Italy, where he is currently working toward the Ph.D. degree with the Department of Information Engineering, Electronics, and Telecommunications. His main research interests include graph deep learning and trustworthy machine learning for graph-structured data. Before joining the "Intelligent Signal Processing and Multimedia" (ISPAMM) group, his work focused on robotics perception (SLAM).

Simone Scardapane is currently an Assistant Professor with the Sapienza University of Rome, Italy, where he works on deep learning applied to audio, video, and graphs, and their application in distributed and decentralized environments. He has authored more than 70 articles in the fields of machine and deep learning. Dr. Scardapane is a member of the IEEE CIS Social Media Sub-Committee, the IEEE Task Force on Reservoir Computing, and the "Machine learning in geodesy" joint study group of the International Association of Geodesy. He is the Chair of the Statistical Pattern Recognition Techniques TC of the International Association for Pattern Recognition, and the chairman of the Italian Association for Machine Learning.

Amir Hussain is founding Director of the Centre of AI and Data Science at Edinburgh Napier University, UK. His research interests are cross-disciplinary and industry-led, aimed at developing cognitive data science and AI technologies to engineer the smart and secure systems of tomorrow. He has (co)authored three international patents and nearly 500 publications, including around 200 journal papers and over 20 books/monographs. He has led major national, EU and internationally funded projects, and supervised over 30 PhD students to date. He is founding Chief Editor of the Cognitive Computation journal (Springer Nature) and the Springer Book Series on Socio-Affective Computing. Amongst other distinguished roles, he is a member of the UK Computing Research Committee (expert panel of the IET and the BCS), General Chair of IEEE WCCI 2020 (the world's largest IEEE CIS technical event in computational intelligence, comprising IJCNN, IEEE CEC and FUZZ-IEEE), and IEEE UK and Ireland Chapter Chair of the IEEE Industry Applications Society.

Aurelio Uncini (M'88) received the Laurea degree in Electronic Engineering from the University of Ancona, Italy, in 1983 and the Ph.D. degree in Electrical Engineering in 1994 from the University of Bologna, Italy. Currently, he is a Full Professor with the Department of Information Engineering, Electronics and Telecommunications, where he is teaching Neural Networks, Adaptive Algorithms for Signal Processing and Digital Audio Processing, and where he is the founder and director of the "Intelligent Signal Processing and Multimedia" (ISPAMM) group. His present research interests include adaptive filters, adaptive audio and array processing, machine learning for signal processing, blind signal processing and multi-sensor data fusion. He is a member of the Institute of Electrical and Electronics Engineers (IEEE), of the Associazione Elettrotecnica ed Elettronica Italiana (AEI), of the International Neural Networks Society (INNS) and of the Società Italiana Reti Neuroniche (SIREN).