Biased Edge Dropout for Enhancing Fairness in Graph Representation Learning

Indro Spinelli, Graduate Student Member, IEEE, Simone Scardapane, Amir Hussain, Senior Member, IEEE, and Aurelio Uncini, Member, IEEE

Abstract—Graph representation learning has become a ubiquitous component in many scenarios, ranging from social network analysis to energy forecasting in smart grids. In several applications, ensuring the fairness of the node (or graph) representations with respect to some protected attributes is crucial for their correct deployment. Yet, fairness in graph representation learning remains under-explored, with few solutions available. In particular, the tendency of similar nodes to cluster on several real-world graphs (i.e., homophily) can dramatically worsen the fairness of these procedures. In this paper, we propose a biased edge dropout algorithm (FairDrop) to counteract homophily and improve fairness in graph representation learning. FairDrop can be plugged in easily on many existing algorithms, is efficient, adaptable, and can be combined with other fairness-inducing solutions. After describing the general algorithm, we demonstrate its application on two benchmark tasks, specifically, as a random walk model for producing node embeddings, and to a graph convolutional network for link prediction. We prove that the proposed algorithm can successfully improve the fairness of all models up to a small or negligible drop in accuracy, and compares favourably with existing state-of-the-art solutions. In an ablation study, we demonstrate that our algorithm can flexibly interpolate between biasing towards fairness and an unbiased edge dropout. Furthermore, to better evaluate the gains, we propose a new dyadic group definition to measure the bias of a link prediction task when paired with group-based fairness metrics. In particular, we extend the metric used to measure the bias in the node embeddings to take into account the graph structure.

Impact Statement—Fairness in graph representation learning is under-explored. Yet, the algorithms working with these types of data have a fundamental impact on our digital life. Therefore, although the law prohibits unfair treatment based on sensitive traits, social networks and recommender systems can systematically discriminate against minorities. Current solutions are computationally intensive or significantly lower the accuracy score. To solve the fairness problem, we propose FairDrop, a biased edge dropout. Our approach provides protection against the unfairness generated by the network's homophily w.r.t. the sensitive attributes. It is easy to integrate FairDrop into today's solutions for learning network embeddings or downstream tasks. We believe that the lack of expensive computations and the flexibility of our fairness constraint will spread awareness of the fairness issue.

Index Terms—Graph representation learning, graph embedding, fairness, link prediction, graph neural network.

Manuscript received xx xx, xx. (Corresponding author: Indro Spinelli.)
I. Spinelli, S. Scardapane, and A. Uncini are with the Department of Information Engineering, Electronics and Telecommunications (DIET), Sapienza University of Rome, 00184 Rome, Italy (e-mail: [email protected]; [email protected]; [email protected]).
A. Hussain is with the School of Computing, Edinburgh Napier University, UK (e-mail: [email protected]).
Handled by yyy.

I. INTRODUCTION

Graph-structured data, ranging from friendships on social networks to physical links in energy grids, powers many algorithms governing our digital life. Social network topologies define the stream of information we will receive, often influencing our opinion [1][2][3][4]. Bad actors sometimes define these topologies ad-hoc to spread false information [5]. Similarly, recommender systems [6] suggest products tailored to our own experiences and history of purchases. However, pursuing the highest accuracy as the only metric of interest has let many of these algorithms discriminate against minorities in the past [7][8][9], despite the law prohibiting unfair treatment based on sensitive traits such as race, religion, and gender. For this reason, the research community has started looking into the biases introduced by algorithms working over graphs [10].

One of the most common approaches to process graphs is via learning vector embeddings for the nodes (or the edges), e.g., [11]. These are a low-dimensional representation of the nodes (or the edges), encoding the local topology. Downstream tasks then use these embeddings as inputs. Some examples of these tasks are node classification, community detection, and link prediction [12]. We will focus on the latter due to its widespread application in social networks and recommender systems. Alternatively, graph neural networks (GNNs) [13] solve link prediction or other downstream tasks in an end-to-end fashion, without prior learning of embeddings through ad-hoc procedures. The techniques developed in this paper can be applied to both scenarios.

In this work, we will concentrate on the bias introduced by one of the key aspects behind the success of GNNs and node embedding procedures: homophily. Homophily is the principle that similar users interact at a higher rate than dissimilar ones. In a graph, this means that nodes with similar characteristics are more likely to be connected. In node classification, this encourages smoothness in terms of label distributions and embeddings, yielding excellent results [14].

From the fairness point of view, the homophily of sensitive attributes directly influences the prediction and introduces inequalities. In social networks, the "unfair homophily" of race, gender or nationality limits the contents accessible by the users, influencing their online behaviour [1]. For example, the authors of [2] showed that users affiliated with majority political groups are exposed to new information faster and in greater quantity. Similarly, homophily can put minority groups at a disadvantage by restricting their ability to establish links with a majority group [15].

An unfair link prediction magnifies this issue, known as the "filter bubble", by increasing the segregation between the groups.

To mitigate this issue, in this paper we propose a biased dropout strategy that forces the graph topology to reduce the homophily of sensitive attributes. At each step of training, we compute a random copy of the adjacency matrix biased towards reducing the amount of homophily in the original graph. Thus, models trained on these modified graphs undergo a fairness-biased data augmentation regime. Our approach limits the biases introduced by the unfair homophily, resulting in fairer node representations and link predictions while preserving most of the original accuracy.

Measuring fairness in this context requires some adaptations. Most works on fairness measures focus on independent and identically distributed (i.i.d.) data. These works proposed many metrics, each one protecting against a different bias [16]. In particular, group fairness measures determine the level of equity of the algorithm's predictions between groups of individuals. Link prediction requires a dyadic fairness measure that considers the influence of both sensitive attributes associated with the connection [17]. However, it is still possible to apply group fairness metrics by defining new groups for the edges.

A. Contributions of the paper

We propose a preprocessing technique that modifies the training data to reduce the predictability of its sensitive attributes [18]. Our algorithm introduces no overhead and can be framed as a biased data augmentation technique. A single hyperparameter regulates the intensity of the fairness constraint. This ranges from maximum fairness to an unbiased edge dropout. Therefore FairDrop can be easily adapted to the needs of different datasets and models. These characteristics make our framework extremely adaptable. Our approach can also be applied in combination with other fairness constraints.

We evaluate the fairness improvement on two different tasks. Firstly, we measure the fairness imposed on the outcomes of end-to-end link prediction tasks. Secondly, we test the capability of removing the contributions of the sensitive attributes from the resulting node embeddings generated by representation learning algorithms. To measure the improvements for the link prediction, we also propose a novel group-based fairness metric on dyadic-level groups. We define a new dyadic group and use it in combination with the ones described in [17]. Instead, for the predictability of the sensitive attributes from the node embeddings, we measure the representation bias [19]. It is common to use node embeddings as the input of a downstream link prediction task. Therefore, we introduce a new metric that also considers the graph's topology.

B. Outline of the paper

The rest of the paper is structured as follows. Section II reviews recent works about enforcing fairness for graph-structured data and their limitations. Then, in Section III we introduce GNN models and group-based fairness metrics for i.i.d. data. FairDrop, our proposed biased edge dropout, is first introduced in Section IV and then tested in Section VI. We conclude with some general remarks in Section VII.

II. RELATED WORKS

The literature on algorithmic bias is extensive and interdisciplinary [20]. However, most approaches study independent and identically distributed data. Just recently, with the success of GNNs, some works started to investigate fairness in graph representation learning. Some works focused on the creation of fair node embeddings [10][21][19] that can be used as the input of a downstream task of link prediction. Others targeted directly the task of a fair link prediction [17][22].

Some of these approaches base their foundations on adversarial learning. Compositional fairness constraints [10] learn a set of adversarial filters that remove information about particular sensitive attributes. Similarly, fairness-aware link prediction [17] employs an adversarial learning approach to ensure that inter-group links are well-represented among the predicted links. FairAdj [22] learns a fair adjacency matrix during an end-to-end link prediction task. The two techniques closest to our work are DeBayes [19] and Fairwalk [21], which we describe in-depth below.

DeBayes [19] extends directly "Conditional Network Embedding" (CNE) [23] to improve its fairness. CNE uses the Bayes rule to combine prior knowledge about the network with a probabilistic model for the Euclidean embedding conditioned on the network. Then, DeBayes maximizes the obtained posterior probability for the network conditioned on the embedding to yield a maximum likelihood embedding. DeBayes models the sensitive information as part of the prior distribution.

Fairwalk [21] is an adaptation of Node2Vec [11] that aims to increase the fairness of the resulting embeddings. It modifies the transition probability of the random walks at each step, by weighing the neighbourhood of each node according to their sensitive attributes.

III. PRELIMINARIES

A. Graph representation learning

In this work we will consider an undirected graph G = (V, E), where V = {1, . . . , n} is the set of node indexes, and E = {(i, j) | i, j ∈ V} is the set of arcs (edges) connecting pairs of nodes. The meaning of a single node or edge depends on the application. For some tasks a node i is endowed with a vector x_i ∈ R^d of features. Each node is also associated to a sensitive attribute s_i ∈ [0, 1] (e.g., political preference in a bipartite system), which may or may not be part of its features. Connectivity in the graph can be summarized by the adjacency matrix A ∈ {0, 1}^(n×n). This matrix is used to build different types of operators that define the communication protocols across the graph. The vanilla operator is the symmetrically normalized graph Laplacian [24] L̂ = D^(−1/2) L D^(−1/2), with D being the diagonal degree matrix where D_ii = Σ_j A_ij, and L the Laplacian matrix L = D − A.

We will focus on the task of link prediction, where the objective is to predict whether two nodes in a network are likely to have a link [25]. For graphs having vectors of features associated to the nodes, these can be combined with the structural information of the graph by solving an end-to-end optimization problem [26]:

    Â = σ(Z Z^T), with Z = GNN(X, A),     (1)

where GNN(·, ·) is a GNN operating on the graph structure (see [27], [28], [29] for representative surveys on GNN models).
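As a concrete illustration of Eq. (1), the following sketch (ours, not the authors' reference implementation) builds a two-layer GCN-style encoder on a dense, symmetrically normalized adjacency matrix and scores candidate links with an inner-product decoder; names such as GCNLinkPredictor and normalize_adjacency are illustrative placeholders.

    import torch
    import torch.nn as nn

    def normalize_adjacency(A: torch.Tensor) -> torch.Tensor:
        # Symmetric normalization D^{-1/2} (A + I) D^{-1/2} commonly used by GCN layers.
        A_hat = A + torch.eye(A.size(0))
        deg = A_hat.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.pow(-0.5))
        return d_inv_sqrt @ A_hat @ d_inv_sqrt

    class GCNLinkPredictor(nn.Module):
        # Two-layer graph convolutional encoder + inner-product decoder, as in Eq. (1).
        def __init__(self, in_dim: int, hidden_dim: int = 128):
            super().__init__()
            self.lin1 = nn.Linear(in_dim, hidden_dim)
            self.lin2 = nn.Linear(hidden_dim, hidden_dim)

        def encode(self, X: torch.Tensor, A_norm: torch.Tensor) -> torch.Tensor:
            Z = torch.relu(A_norm @ self.lin1(X))
            return A_norm @ self.lin2(Z)

        def decode(self, Z: torch.Tensor, pairs: torch.Tensor) -> torch.Tensor:
            # Probability of a link for each candidate pair (i, j): sigma(z_i . z_j).
            src, dst = pairs
            return torch.sigmoid((Z[src] * Z[dst]).sum(dim=-1))

Sparse, message-passing implementations are preferable for large graphs; the dense form is kept here only for readability.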
Another approach first uses unsupervised learning techniques [30][11] to create a latent representation of each node and then performs a downstream task of link prediction using the learned embeddings and standard classifiers (e.g., support vector machines). Differently from most GNNs, node embeddings are obtained in an unsupervised fashion, and they can be re-used for multiple downstream tasks. Recently, self-supervised models for GNNs have been explored to fill this gap [31]. The framework we propose in this paper can be used to improve fairness in all these scenarios.

B. Fairness measures in i.i.d. data

Fairness in decision making is broadly defined as the absence of any advantage or discrimination towards an individual or a group based on their traits [32]. This general definition leaves the door open to several different fairness metrics, each focused on a different type of discrimination [16]. In this paper we focus on the definition of group fairness, known also as disparate impact. In this case, the unfairness occurs when the model's predictions disproportionately benefit or damage people of different groups defined by their sensitive attribute. These measures are usually expressed in the context of a binary classification problem. In the notation of the previous section, denote by Y ∈ [0, 1] a binary target variable, and by Ŷ = f(x) a predictor that does not exploit the graph structure. As before, we associate to each x a sensitive binary attribute S. Two widely used criteria belonging to this group are demographic parity [33] and equalized odds [34]:

• Demographic Parity (DP) [33]: Ŷ satisfies DP if the likelihood of a positive outcome is the same regardless of the value of the sensitive attribute S:

    P(Ŷ | S = 1) = P(Ŷ | S = 0).     (2)

• Equalized Odds (EO) [34]: Ŷ satisfies EO if it has equal rates for true positives and false positives between the two groups defined by the protected attribute S:

    P(Ŷ = 1 | S = 1, Y = y) = P(Ŷ = 1 | S = 0, Y = y).     (3)

These definitions trivially extend to the case where the sensitive attribute is a categorical variable with |S| > 2 and, for the rest of the paper, we will consider this scenario. To measure the deviation from these ideal metrics it is common to use the differences between the groups defined by S. In particular, the DP difference is defined as the difference between the largest and the smallest group-level selection rate:

    ∆DP = max_s E[Ŷ | S = s] − min_s E[Ŷ | S = s].     (4)

For the EO difference, we can report the largest discrepancy between the true positive rate difference and the false positive rate difference between the groups defined again by the protected attribute S [35]:

    ∆TPR = max_s E[Ŷ = 1 | S = s, Y = 1] − min_s E[Ŷ = 1 | S = s, Y = 1],     (5)

    ∆FPR = max_s E[Ŷ = 1 | S = s, Y = 0] − min_s E[Ŷ = 1 | S = s, Y = 0].     (6)

The final metric is then given by:

    ∆EO = max(∆TPR, ∆FPR).     (7)
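For concreteness, the deviations in Eqs. (4)-(7) can be estimated from predictions as in the short sketch below (our own illustration for a categorical sensitive attribute; the toy arrays are hypothetical).

    import numpy as np

    def dp_difference(y_pred: np.ndarray, s: np.ndarray) -> float:
        # Delta DP (Eq. 4): gap between largest and smallest group-level selection rate.
        rates = [y_pred[s == g].mean() for g in np.unique(s)]
        return max(rates) - min(rates)

    def eo_difference(y_pred: np.ndarray, y_true: np.ndarray, s: np.ndarray) -> float:
        # Delta EO (Eqs. 5-7): worst gap among true-positive and false-positive rates.
        gaps = []
        for y in (1, 0):  # y = 1 gives the TPR gap, y = 0 the FPR gap
            rates = [y_pred[(s == g) & (y_true == y)].mean() for g in np.unique(s)]
            gaps.append(max(rates) - min(rates))
        return max(gaps)

    # Toy usage with |S| = 3 groups.
    y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
    y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    s = np.array([0, 0, 1, 1, 2, 2, 0, 1])
    print(dp_difference(y_pred, s), eo_difference(y_pred, y_true, s))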
However, these definitions are not suitable to be directly applied to graph-structured data.

IV. PROPOSED BIASED EDGE DROPOUT TO ENHANCE FAIRNESS

In this work, for every epoch of training (either of the node embedding or the GNN), we generate a fairer random copy of the original adjacency matrix to counteract the negative effects introduced by the homophily of the sensitive attributes. To do so, we reduce the number of connections between nodes sharing the same sensitive attributes.

FairDrop can be divided into three main steps, as shown in Figure 1 and summarized in Algorithm 1:
1) Computation of a mask encoding the network's homophily w.r.t. the sensitive attributes.
2) Biased random perturbation of the mask.
3) Removal of connections from the original adjacency matrix according to the computed perturbation.

Algorithm 1: FairDrop
    Input: Adjacency matrix A, sensitive attribute (vector) s, mask M for the adjacency matrix initialized with zeros, δ ∈ [0, 1/2]
    Output: Fair adjacency matrix A_fair
    forall edges (i, j) ∈ G do
        m_ij ← (s[i] ≠ s[j])
        if RANDOM ≤ 1/2 + δ then
            rr(m_ij) ← m_ij
        else
            rr(m_ij) ← 1 − m_ij
    A_fair ← A ◦ rr(M)

We conjecture that training a model over multiple fairness-enforcing random copies of the adjacency matrix (one for each epoch) increases the randomness and the diversity of its input. Thus the entire pipeline can be considered a data augmentation procedure biased towards fairness, similarly to how standard dropout can be interpreted as noise-based data augmentation [36].

A. Sensitive attribute homophily mask

We build an adjacency matrix M encoding the heterogeneous connections between the sensitive attributes. Its computation is the same for binary and categorical ones. We define each element of M as:

    m_ij = { 1   if s_i ≠ s_j
           { 0   otherwise,     (8)

where m_ij = 1 indicates a heterogeneous edge with respect to the sensitive attribute S. It is easy to see that if we want to improve the fairness of the graph we have to promote the first type of connections.
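A minimal sketch of step 1 (the homophily mask of Eq. (8)), written by us for illustration on a dense adjacency matrix: homophily_mask marks the heterogeneous (inter-group) edges with a one.

    import numpy as np

    def homophily_mask(A: np.ndarray, s: np.ndarray) -> np.ndarray:
        # Eq. (8): m_ij = 1 where the two endpoints carry different sensitive attributes.
        hetero = (s[:, None] != s[None, :]).astype(A.dtype)
        # As in Algorithm 1, only positions of existing edges are relevant.
        return hetero * A

    # Toy graph: 4 nodes, binary sensitive attribute.
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 1],
                  [1, 1, 0, 1],
                  [0, 1, 1, 0]])
    s = np.array([0, 0, 1, 1])
    M = homophily_mask(A, s)  # 1 on inter-group edges such as (0, 2), 0 elsewhere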

Fig. 1. Schematics of FairDrop. From left to right: (i) Construction of the sensitive attribute homophily mask (Section IV-A); (ii) Randomized response used to inject randomness and to regulate the bias imposed by the constraint (Section IV-B); (iii) Final graph obtained by dropping the connections according to the randomized mask (Section IV-C).

B. Randomized response

Randomized response [37] adds a layer of "plausible deniability" to structured interviews. With this protection, respondents are able to answer sensitive questions while maintaining their privacy. The mechanism is simple and yet elegant. In its basic form, randomized response works with binary questions. The interviewee throws a coin to decide whether to respond honestly or at random with another coin toss. With unbiased coins, the user will tell the truth 2/3 of the times and lie the remaining times. More formally, randomized response can be defined as follows:

    rr(m_ij) = { m_ij       with probability 1/2 + δ
               { 1 − m_ij   with probability 1/2 − δ,     (9)

with δ ∈ [0, 1/2]. When δ = 0, randomized response always gives random answers, maximizing the privacy protection but making the acquired data useless. Conversely, δ = 1/2 returns the clean data without any privacy protection.

We use this framework, as shown in the central block of Fig. 1, where the input is the mask encoding the homophily of the sensitive attributes and the output is its randomized version.
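Step 2 can be sketched as follows (again an illustrative snippet of ours, not the authors' code): every entry of the mask is kept with probability 1/2 + δ and flipped with probability 1/2 − δ, as in Eq. (9).

    import numpy as np

    def randomized_response(M: np.ndarray, delta: float, rng=None) -> np.ndarray:
        # Eq. (9): keep each mask entry with probability 1/2 + delta, flip it otherwise.
        assert 0.0 <= delta <= 0.5
        rng = np.random.default_rng() if rng is None else rng
        keep = rng.random(M.shape) <= 0.5 + delta
        # For an undirected graph, one may additionally want to symmetrize the draw.
        return np.where(keep, M, 1 - M)

    # delta = 0 yields a purely random mask (unbiased dropout);
    # delta = 0.5 returns M unchanged (maximum bias towards fairness).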

C. Fair edge dropout

Finally, we are ready to drop the unfair connections from the original adjacency matrix:

    A_fair = A ◦ rr(M),     (10)

where ◦ denotes the Hadamard product between matrices. The resulting matrix A_fair is a fairness-enforcing random copy of the original adjacency matrix. In particular, with rr(M) we bias the dropout towards fairness by cutting more connections between nodes sharing the same sensitive attribute. A model trained on this version of the graph will adapt accordingly. The reduction of this unfair homophily will be reflected in the network embeddings or directly in the model's predictions.

It is easy to see that the parameter δ regulates the level of fairness we enforce on the topology. In fact, setting δ = 0 corresponds to an unbiased edge dropout with p = 1/2, like the one proposed by [38]. Increasing δ incrementally promotes fair connections by removing the unfair links in the graph. When δ is at its maximum value, FairDrop removes all the unfair edges and keeps all the fair ones. The possibility of varying how much the algorithm is biased towards fairness makes it adaptable to the needs of different datasets and algorithms. There are almost no additional costs with respect to a standard edge dropout [38], making this fairness "constraint" extremely fast. Furthermore, FairDrop shares its capability of alleviating over-fitting and over-smoothing.

It is possible to apply FairDrop when multiple sensitive attributes are present at once. We can build a fair adjacency matrix for every sensitive attribute, and these can then be alternated during the training phase.
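Putting the three steps together, a compact, self-contained sketch of the whole procedure (our illustration of Eq. (10), assuming a dense symmetric adjacency matrix and a categorical sensitive attribute) could look like this:

    import numpy as np

    def fairdrop(A: np.ndarray, s: np.ndarray, delta: float, rng=None) -> np.ndarray:
        # Return a fairness-enforcing random copy A_fair = A o rr(M) of the adjacency matrix.
        rng = np.random.default_rng() if rng is None else rng
        M = (s[:, None] != s[None, :]).astype(A.dtype)    # step 1: homophily mask (Eq. 8)
        keep = rng.random(A.shape) <= 0.5 + delta          # step 2: randomized response (Eq. 9)
        keep = np.triu(keep) | np.triu(keep, 1).T          # symmetric draw for undirected graphs
        rr_M = np.where(keep, M, 1 - M)
        return A * rr_M                                    # step 3: biased edge dropout (Eq. 10)

Practical implementations typically operate on sparse edge lists rather than dense matrices; the dense form is used here only for clarity.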

D. Deployment

FairDrop can be considered a data augmentation technique. It modifies the training data to reduce the influence of its sensitive attributes [18][39]. We are interested in link prediction, both in an end-to-end fashion and by learning node embeddings and then predicting the links as a downstream task. For both approaches, we generate different random copies that enforce the fairness of the original graph. Like in [38], this augments the randomness and the diversity of the input data ingested by the learning model. Thus it can be considered a data augmentation biased towards fairness. Our approach can also be applied in combination with other fairness constraints.

In the first scenario, the model performs every training epoch on a different fair random copy of the input data. The model can be one of the many proposed in recent years that fit in the framework described by Equation (1), like GCN [24], GAT [40], AP-GCN [41] and many others.

For learning the network embedding, we paired our edge sampling with Node2Vec [11]. This algorithm learns a continuous representation for the nodes using local information obtained from truncated random walks [30], balancing the exploration-exploitation trade-off. In this case, we generate fairness-enforcing versions of the graph for every batch of random walks that we want to compute.
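In the end-to-end scenario, deployment amounts to resampling a fair copy of the graph at every epoch. A rough sketch of such a training loop is given below (ours; model, features, train_pairs and labels are hypothetical placeholders, and it reuses the fairdrop and normalize_adjacency helpers sketched above).

    import torch

    def train(model, optimizer, A, s, features, train_pairs, labels,
              epochs: int = 200, delta: float = 0.25):
        # Train a link predictor, regenerating a fair random copy of the topology each epoch.
        loss_fn = torch.nn.BCELoss()
        for _ in range(epochs):
            A_fair = torch.tensor(fairdrop(A, s, delta), dtype=torch.float32)  # fresh FairDrop copy
            A_norm = normalize_adjacency(A_fair)
            optimizer.zero_grad()
            Z = model.encode(features, A_norm)
            preds = model.decode(Z, train_pairs)
            loss = loss_fn(preds, labels)
            loss.backward()
            optimizer.step()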

TABLE I
DATASETS

Dataset    S             |S|   Features   Nodes    Edges
Citeseer   paper class     6       3703    2110     3668
Cora-ML    paper class     7       2879    2810     7981
PubMed     paper class     3        500   19717    44324
DBLP       continent       5       None    3980     6965
FB         gender          2       None    4039    88234

V. PROPOSED FAIRNESS METRICS FOR GRAPH REPRESENTATION LEARNING

For graph-structured data, it is common to have attributes, including the sensitive ones, associated with the nodes. Therefore, if we are interested in evaluating the fairness of a node classification task, we can use the metrics described previously in Section III-B. However, for the link prediction and node embedding tasks, those metrics cannot be used straightforwardly. For link prediction, we need a dyadic fairness criterion expecting the predictions to be statistically independent from both sensitive attributes associated with the edges. In Section V-A we propose a novel set of fairness metrics for the task. For node embeddings, in Section V-B we extend the ideas presented in [42] to the downstream task of link prediction.

A. Fairness for link prediction

An edge connects two nodes and thus two sensitive attributes. Our objective is to use the same fairness criteria defined for i.i.d. data (DP, EO) for the links. Therefore, we have to define so-called dyadic groups, mapping the groups defined by the sensitive attribute for the nodes to new ones for the edges. It is thus possible to define different dyadic fairness criteria depending on how we build these dyadic groups. In [17] the authors introduced two dyadic criteria, mixed dyadic-level protection and subgroup dyadic-level protection. We propose a third criterion called group dyadic-level protection. It aims to be closer to the original groups defined by the sensitive attributes. The original groups S generate different dyadic subgroups associated with the edges D of the graph instead of the nodes. In particular, from the coarsest to the finest, we have (a small illustrative sketch is given at the end of this subsection):

• Mixed dyadic [17] (|D| = 2): the original groups determine two dyadic groups. An edge will be in the intra-group if it connects a pair of nodes with the same sensitive attribute. Otherwise, it will be part of the inter-group. This dyadic definition is common in recommender systems, where intra-group edges are known to create a "filter bubble" issue.

• Group dyadic (|D| = |S|): there is a one-to-one mapping between the dyadic and node-level groups. To do so, we have to consider each edge twice, once for every sensitive attribute involved. This dyadic definition checks whether one group, in particular, is prominent in the formation of the links.

• Sub-group dyadic [17] (|D| = (|S| + 2 − 1)! / (2! (|S| − 1)!)): each dyadic sub-group represents a connection between two groups (or an intra-group connection). The fairness criteria protect the balance between all the possible inter-group combinations together with the intra-group ones.

For example, considering the mixed dyadic group, and denoting the two groups as D = 0 for the inter-group and D = 1 for the intra-group, we say that a link predictor achieves DP (extending (2)) if:

    P(Ŷ | D = 0) = P(Ŷ | D = 1).     (11)

All other metrics from Section III-B extend in a similar way to the three dyadic groups. We present a visualization of all the dyadic groups in Figure 2. In Figure 3, instead, we show the composition of these groups for the Citeseer dataset.
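The three dyadic group definitions can be made concrete with a short sketch (ours, purely for illustration): given the sensitive attributes of the two endpoints of every edge, it derives the mixed, group and sub-group labels.

    import numpy as np

    def dyadic_groups(s_src: np.ndarray, s_dst: np.ndarray):
        # Mixed dyadic (|D| = 2): intra-group (1) vs. inter-group (0) edges.
        mixed = (s_src == s_dst).astype(int)
        # Group dyadic (|D| = |S|): every edge is counted twice, once per endpoint.
        group = np.concatenate([s_src, s_dst])
        # Sub-group dyadic: one label per unordered pair of groups, e.g. "1-2".
        lo, hi = np.minimum(s_src, s_dst), np.maximum(s_src, s_dst)
        subgroup = np.array([f"{a}-{b}" for a, b in zip(lo, hi)])
        return mixed, group, subgroup

    # Toy example: 4 edges, |S| = 3.
    s_src = np.array([0, 1, 2, 0])
    s_dst = np.array([0, 2, 2, 1])
    mixed, group, subgroup = dyadic_groups(s_src, s_dst)
    # mixed    -> [1, 0, 1, 0]
    # subgroup -> ['0-0', '1-2', '2-2', '0-1']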

B. Fairness for node embeddings

Representation learning aims to convert the graph's structure into a low-dimensional space. This representation is fair if it obfuscates as much as possible the unfair presence of sensitive attributes in the resulting embeddings [42]. Network embeddings are obfuscated if a classification task trying to predict the sensitive attribute yields the same accuracy as a random classifier [10][19]. If this is the case, the embeddings are fair, and so is the downstream task built upon them. This type of bias afflicting node embeddings is also known as representation bias (RB). To measure the RB, we define an RB score [19] associated with the node embeddings Z as the balanced accuracy obtained by a classifier:

    Node RB = Σ_{s∈S} (|Z_s| / |V|) · Accuracy(Z_s),     (12)

where Z_s is the set of node embeddings associated to nodes having S = s and Accuracy(·) denotes the accuracy of a classifier trained to predict the sensitive attribute, restricted to the set Z_s.

Since we focus on link prediction, we also extend the definition of RB to the edges. We define the edge embeddings as the concatenation of the embeddings of the nodes participating in the links, Z^[src,dst]. We treat the classification problem as a multi-label one, since the edges have two sensitive attributes associated with their source node embedding Z^src and their destination Z^dst. We conjecture that the additional knowledge of the relation between the nodes helps the classification task. Consequently, it is harder to obtain a good score, making this metric more challenging. We define the representation bias score for the links as the mean of the balanced accuracies obtained from the two target labels:

    Link RB = (1/2) Σ_{s∈S} [ (|Z_s^src| / |E|) · Accuracy_src(Z_s^[src,dst]) + (|Z_s^dst| / |E|) · Accuracy_dst(Z_s^[src,dst]) ],     (13)

where Accuracy_src(·) is the accuracy with respect to the label of the source node, while Accuracy_dst(·) is the accuracy with respect to the label of the destination node.
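As a sketch of how the Node RB of Eq. (12) can be estimated with an off-the-shelf classifier (our own example, using scikit-learn's logistic regression; the split ratio is arbitrary), one could write the following. The Link RB of Eq. (13) follows the same pattern after concatenating the two endpoint embeddings and predicting both labels.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def node_rb(Z: np.ndarray, s: np.ndarray, seed: int = 0) -> float:
        # Eq. (12): group-size-weighted accuracy of a classifier predicting s from the embeddings.
        Z_tr, Z_te, s_tr, s_te = train_test_split(Z, s, test_size=0.3,
                                                  random_state=seed, stratify=s)
        clf = LogisticRegression(max_iter=1000).fit(Z_tr, s_tr)
        score = 0.0
        for g in np.unique(s_te):
            idx = s_te == g
            score += idx.mean() * clf.score(Z_te[idx], s_te[idx])  # (|Z_s| / |V|) * Accuracy(Z_s)
        return score  # lower values indicate better obfuscation of the sensitive attribute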

Fig. 2. Visualization of the dyadic group D definitions. We use a single sensitive attribute with |S| = 3. From left to right: (i) the mixed dyadic group definition generates two link groups, "d = in" for intra-group connections and "d = out" for inter-group ones; (ii) the group dyadic definition creates a one-to-one mapping between the sensitive attribute S and the dyadic group D; (iii) the sub-group dyadic definition generates a group for every possible combination of the original sensitive attributes.


Fig. 3. Barplot showing the distribution of the different dyadic groups for the Citeseer dataset.

VI. EVALUATION

We test the fairness imposed by our algorithm both by evaluating the network embeddings obtained with Node2Vec and on a downstream task of link prediction. We summarize the datasets used in this evaluation in Table I. We ran several experiments with different random seeds and data splits (80%/20% train and test) and we report the average together with the standard deviations.

A. Fair link prediction

In these experiments, we test FairDrop's effect on an end-to-end link prediction task. We tested a GCN [24] and a GAT [40], both with two layers, a hidden dimension of 128 and a learning rate of 0.005. We performed the optimization once with an unbiased edge dropout and once with our fairness-enforcing dropout. We use the three benchmark citation networks from Table I, i.e., Citeseer, Cora-ML, and PubMed. In these graphs, nodes are articles with an associated bag-of-words representation of the abstract. Links represent a citation, regardless of the direction. The sensitive attribute, in this case, is the category of the article. We measured the link prediction quality with the accuracy and the area under the ROC curve (AUC), the latter representing the trade-off between true and false positives for different thresholds. To evaluate the fairness of the predictions, we used the demographic parity difference (∆DP) and the equalized odds difference (∆EO), computed on the dyadic groups described in Section V-A. We report the results in Tables II, III and IV, where the subscript of the fairness metrics is the initial of the dyadic group over which we computed the quantity. Arrows in the first row describe whether higher or lower values are better for the respective metric.

In every dataset, the networks trained with FairDrop achieved superior fairness performances. In some cases, FairDrop also improved the classification score. To study this effect, we perform an ablation study in Section VI-C.

B. Fair representation learning

Next, we compare our fairness constraint against two state-of-the-art frameworks for debiasing node embeddings, DeBayes [19] and FairWalk [21]. We compute both the node and link representation bias scores described in Section V-B using three different downstream classifiers. In particular, we compare a logistic regression (LR), a three-layer neural network (NN) and a random forest (RF) to compute these quantities. To compute the optimal hyperparameters, we perform a grid search with cross-validation. We repeat this step for each one of the generated embeddings to be as fair as possible in the evaluation. After that, we use the same classifiers to perform a downstream task of link prediction. Ideally, the best algorithm will achieve the lowest scores on both representation biases while keeping the accuracy high.

For this task we use a co-authorship network built in [19] from DBLP [43]. They extracted the authors from the accepted papers of eight different conferences. The nodes of the graph, representing the authors, are linked if they have collaborated at least once. The sensitive attribute is the continent of the author's institution. The other dataset we considered is FB [44], which is a combination of ego-networks obtained from a social network. The graph encodes users as nodes, with gender as a sensitive attribute and friendships as links.

We use the same hyperparameters of Node2Vec for FairWalk and FairDrop. For DeBayes, we used the parameters provided in their implementation. We present the results in Tables V and VI, respectively for DBLP and FB.

Node2Vec coupled with FairDrop improves the obfuscation of the sensitive attribute in the resulting node embeddings, with the biggest gain in the Node RB evaluation. Furthermore, the obfuscation is robust against the choice of the downstream classifier. This improvement comes with a small drop in the accuracy of the downstream task. The Link RB scores are higher than the previous ones due to the additional information provided by the concatenation of the two node embeddings forming the link. In this case, FairDrop performs on par with DeBayes if we limit our observation to the number of times the algorithm got the best score. However, Node2Vec with FairDrop is orders of magnitude faster, taking on average just 50 seconds against the 33 minutes on average required by DeBayes to converge in its standard configuration. FairWalk is less aggressive, and this results in less protection but higher accuracy scores on the downstream tasks.

TABLE II
LINK PREDICTION ON CORA

Method         Accuracy ↑   AUC ↑        ∆DPm ↓       ∆EOm ↓       ∆DPg ↓       ∆EOg ↓       ∆DPs ↓       ∆EOs ↓
GCN+EdgeDrop   82.4 ± 0.9   90.1 ± 0.7   56.4 ± 2.4   36.5 ± 4.3   12.3 ± 2.6   15.4 ± 3.3   90.2 ± 2.7   100.0 ± 0.0
GAT+EdgeDrop   80.5 ± 1.2   88.3 ± 0.8   53.7 ± 2.5   37.1 ± 3.2   18.8 ± 3.6   22.5 ± 4.2   93.6 ± 2.9   100.0 ± 0.0
GCN+FairDrop   82.4 ± 0.9   90.1 ± 0.7   52.9 ± 2.5   31.0 ± 4.9   11.8 ± 3.2   14.9 ± 3.7   89.4 ± 3.4   100.0 ± 0.0
GAT+FairDrop   79.2 ± 1.2   87.8 ± 1.0   48.9 ± 2.8   31.9 ± 4.3   15.3 ± 3.2   18.1 ± 3.5   94.5 ± 2.0   100.0 ± 0.0

TABLE III
LINK PREDICTION ON CITESEER

Method         Accuracy ↑   AUC ↑        ∆DPm ↓       ∆EOm ↓       ∆DPg ↓       ∆EOg ↓       ∆DPs ↓       ∆EOs ↓
GCN+EdgeDrop   78.9 ± 1.3   88.0 ± 1.3   44.9 ± 2.5   27.5 ± 4.1   20.1 ± 2.9   21.6 ± 5.0   71.0 ± 3.4   73.2 ± 9.5
GAT+EdgeDrop   76.3 ± 0.9   85.6 ± 1.0   42.6 ± 2.5   28.4 ± 5.0   22.2 ± 5.1   27.6 ± 6.3   76.7 ± 3.0   77.5 ± 8.8
GCN+FairDrop   79.2 ± 1.4   88.4 ± 1.4   42.6 ± 2.5   26.5 ± 4.2   18.7 ± 4.0   17.6 ± 5.5   67.7 ± 3.5   64.3 ± 9.5
GAT+FairDrop   78.2 ± 1.1   87.1 ± 1.1   42.9 ± 2.2   28.3 ± 4.3   22.5 ± 3.4   25.9 ± 5.2   75.3 ± 3.2   73.4 ± 9.1

TABLE IV
LINK PREDICTION ON PUBMED

Method         Accuracy ↑   AUC ↑        ∆DPm ↓       ∆EOm ↓       ∆DPg ↓       ∆EOg ↓       ∆DPs ↓       ∆EOs ↓
GCN+EdgeDrop   88.0 ± 0.5   94.6 ± 0.3   43.7 ± 1.0   12.8 ± 0.8    6.3 ± 0.7    6.0 ± 1.1   57.5 ± 1.4   26.3 ± 2.3
GAT+EdgeDrop   80.6 ± 0.9   88.8 ± 0.7   43.5 ± 1.1   24.5 ± 1.9    4.8 ± 1.6    7.5 ± 1.5   60.1 ± 1.9   49.3 ± 3.6
GCN+FairDrop   88.4 ± 0.4   94.8 ± 0.2   42.5 ± 0.5   12.2 ± 0.7    5.6 ± 1.8    5.1 ± 0.9   55.7 ± 1.5   26.6 ± 2.6
GAT+FairDrop   79.0 ± 0.8   87.6 ± 0.7   37.4 ± 0.9   19.7 ± 1.1    2.0 ± 1.0    6.4 ± 1.4   56.8 ± 2.1   47.3 ± 4.1

TABLE V
REPRESENTATION LEARNING ON THE DBLP DATASET

Method              Classifier   Node RB ↓    Link RB ↓    Accuracy ↑
DeBayes             LR           20.0 ± 0.0   21.6 ± 1.0   87.2 ± 0.3
                    NN           63.9 ± 4.3   77.4 ± 3.4   91.2 ± 0.7
                    RF           20.9 ± 0.3   28.0 ± 0.6   91.6 ± 0.5
FairWalk            LR           53.5 ± 5.3   67.1 ± 3.4   92.9 ± 0.6
                    NN           74.2 ± 4.9   81.9 ± 3.0   93.5 ± 0.9
                    RF           20.8 ± 0.7   28.2 ± 0.4   90.9 ± 0.7
Node2Vec+FairDrop   LR           31.6 ± 3.5   58.8 ± 4.3   88.4 ± 0.8
                    NN           34.4 ± 3.5   71.7 ± 3.1   88.1 ± 0.6
                    RF           20.0 ± 0.0   26.4 ± 0.3   87.2 ± 0.7

TABLE VI
REPRESENTATION LEARNING ON THE FB DATASET

Method              Classifier   Node RB ↓    Link RB ↓    Accuracy ↑
DeBayes             LR           52.0 ± 0.5   55.7 ± 0.5   96.7 ± 0.1
                    NN           58.9 ± 1.3   91.6 ± 0.5   98.5 ± 0.2
                    RF           53.4 ± 0.6   62.7 ± 0.6   97.9 ± 0.2
FairWalk            LR           53.1 ± 0.6   63.3 ± 1.0   97.4 ± 0.2
                    NN           58.9 ± 1.3   99.5 ± 0.2   97.6 ± 0.3
                    RF           53.5 ± 0.6   62.3 ± 0.8   97.0 ± 0.2
Node2Vec+FairDrop   LR           51.3 ± 1.0   62.3 ± 0.8   96.0 ± 0.2
                    NN           49.9 ± 1.3   99.0 ± 0.1   96.7 ± 0.5
                    RF           50.0 ± 0.0   61.1 ± 0.4   96.4 ± 0.2

C. Ablation

We perform an ablation study on the behaviour imposed by the parameter δ. We report in Figure 4 the AUC score and both representation biases obtained using a random forest classifier on a single data split, for one hundred runs on the FB dataset. We recall that for a higher value of δ, the sampling is more biased towards fairness. Surprisingly, the AUC of the downstream link prediction task grows as the representation biases decrease. Despite seeming counter-intuitive, we think this is in line with what the authors of [45] observed. They examined the problem of over-smoothing of GNNs for node classification tasks. They argued that one factor leading to this issue is the over-mixing of information and noise. In particular, intra-class connections can bring valuable information, while inter-class interactions may lead to indistinguishable representations across classes. However, for the link prediction task, the inverse is true. An excessive similarity between intra-class embeddings and dissimilarity between inter-class ones results in the known "filter bubble" issue.

Fig. 4. Visualization on the FB dataset of the trade-off between the representation biases and the performances on the downstream link prediction task.

VII. CONCLUSION

We proposed a flexible biased edge dropout algorithm for enhancing fairness in graph representation learning. FairDrop targets the negative effect of the network's homophily w.r.t. the sensitive attribute. A single hyperparameter regulates the intensity of the constraint, ranging from maximum fairness to an unbiased edge dropout. We tested FairDrop's performances on two common benchmark tasks, specifically unsupervised node embedding generation and link prediction. It is possible to frame FairDrop as a biased data augmentation pipeline. Therefore it can be used with different architectures and paired with other fairness constraints.

To measure the gains, we adapted the group fairness metrics to estimate the disparities between the dyadic groups defined by the edges. We proposed a new dyadic group aimed to be closer to the original groups defined by the sensitive attributes. Finally, we extended the previous metric evaluating the representation bias to take into account the graph structure.

A possible limitation of FairDrop is its extension to the case where multiple sensitive attributes are present at once. We proposed to alternate the masks of the different sensitive attributes to guide the biased dropout. However, this pipeline may be sub-optimal.

Randomness is a fundamental concept for FairDrop. However, not every connection has the same importance inside a graph. Future lines of research may combine perturbation-aware algorithms [46] with fairness constraints to perform a more selective edge sampling procedure.

REFERENCES

[1] M. McPherson, L. Smith-Lovin, and J. M. Cook, "Birds of a feather: Homophily in social networks," Annual Review of Sociology, vol. 27, no. 1, pp. 415–444, 2001.
[2] Y. Halberstam and B. Knight, "Homophily, group size, and the diffusion of political information in social networks: Evidence from twitter," Journal of Public Economics, vol. 143, pp. 73–88, 2016.
[3] D. Centola, "The spread of behavior in an online social network experiment," Science, vol. 329, pp. 1194–1197, 2010.
[4] E. Lee, F. Karimi, C. Wagner, H.-H. Jo, M. Strohmaier, and M. Galesic, "Homophily and minority-group size explain perception biases in social networks," Nature Human Behaviour, vol. 3, 10 2019.
[5] P. K. Roy and S. Chahar, "Fake profile detection on social networking websites: A comprehensive review," IEEE Transactions on Artificial Intelligence, pp. 1–1, 2021.
[6] Y. Tian, S. Peng, X. Zhang, T. Rodemann, K. C. Tan, and Y. Jin, "A recommender system for metaheuristic algorithms for continuous optimization based on deep recurrent neural networks," IEEE Transactions on Artificial Intelligence, vol. 1, no. 1, pp. 5–18, 2020.
[7] A. Datta, M. C. Tschantz, and A. Datta, "Automated experiments on ad privacy settings: A tale of opacity, choice, and discrimination," CoRR, vol. abs/1408.6491, 2014.
[8] S. Corbett-Davies, E. Pierson, A. Feller, S. Goel, and A. Huq, "Algorithmic decision making and the cost of fairness," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '17, (New York, NY, USA), p. 797–806, Association for Computing Machinery, 2017.
[9] Z. Obermeyer, B. Powers, C. Vogeli, and S. Mullainathan, "Dissecting racial bias in an algorithm used to manage the health of populations," Science, vol. 366, no. 6464, pp. 447–453, 2019.
[10] A. Bose and W. Hamilton, "Compositional fairness constraints for graph embeddings," in Proceedings of the 36th International Conference on Machine Learning (K. Chaudhuri and R. Salakhutdinov, eds.), vol. 97 of Proceedings of Machine Learning Research, pp. 715–724, PMLR, 09–15 Jun 2019.
[11] A. Grover and J. Leskovec, "Node2vec: Scalable feature learning for networks," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, (New York, NY, USA), p. 855–864, Association for Computing Machinery, 2016.
[12] P. Cui, X. Wang, J. Pei, and W. Zhu, "A survey on network embedding," IEEE Transactions on Knowledge and Data Engineering, vol. 31, pp. 833–852, 2019.
[13] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst, "Geometric deep learning: going beyond Euclidean data," IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 18–42, 2017.
[14] Q. Huang, H. He, A. Singh, S.-N. Lim, and A. Benson, "Combining label propagation and simple models out-performs graph neural networks," in International Conference on Learning Representations, 2021.
[15] F. Karimi, M. Génois, C. Wagner, P. Singer, and M. Strohmaier, "Homophily influences ranking of minorities in social networks," Scientific Reports, vol. 8, 07 2018.
[16] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, and A. Galstyan, "A survey on bias and fairness in machine learning," ArXiv, vol. abs/1908.09635, 2019.
[17] F. Masrour, T. Wilson, H. Yan, P.-N. Tan, and A. Esfahanian, "Bursting the filter bubble: Fairness-aware network link prediction," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 841–848, Apr. 2020.
[18] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian, "Certifying and removing disparate impact," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, (New York, NY, USA), p. 259–268, Association for Computing Machinery, 2015.
[19] M. Buyl and T. De Bie, "DeBayes: a Bayesian method for debiasing network embeddings," in Proceedings of the 37th International Conference on Machine Learning (H. D. III and A. Singh, eds.), vol. 119 of Proceedings of Machine Learning Research, pp. 1220–1229, PMLR, 13–18 Jul 2020.
[20] A. Romei and S. Ruggieri, "A multidisciplinary survey on discrimination analysis," The Knowledge Engineering Review, vol. 29, pp. 582–638, 2013.
[21] T. Rahman, B. Surma, M. Backes, and Y. Zhang, "Fairwalk: Towards fair graph embedding," in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pp. 3289–3295, International Joint Conferences on Artificial Intelligence Organization, 7 2019.
[22] P. Li, Y. Wang, H. Zhao, P. Hong, and H. Liu, "On dyadic fairness: Exploring and mitigating bias in graph connections," in International Conference on Learning Representations, 2021.
[23] B. Kang, J. Lijffijt, and T. D. Bie, "Conditional network embeddings," in International Conference on Learning Representations, 2019.
[24] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in International Conference on Learning Representations (ICLR), 2017.
[25] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," Journal of the American Society for Information Science and Technology, vol. 58, p. 1019–1031, May 2007.
[26] T. N. Kipf and M. Welling, "Variational graph auto-encoders," NIPS Workshop on Bayesian Deep Learning, 2016.
[27] D. Bacciu, F. Errica, A. Micheli, and M. Podda, "A gentle introduction to deep learning for graphs," Neural Networks, 2020.

[28] Z. Zhang, P. Cui, and W. Zhu, "Deep learning on graphs: A survey," IEEE Transactions on Knowledge and Data Engineering, 2020.
[29] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip, "A comprehensive survey on graph neural networks," IEEE Transactions on Neural Networks and Learning Systems, 2020.
[30] B. Perozzi, R. Al-Rfou, and S. Skiena, "Deepwalk: Online learning of social representations," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, (New York, NY, USA), p. 701–710, Association for Computing Machinery, 2014.
[31] W. Jin, T. Derr, H. Liu, Y. Wang, S. Wang, Z. Liu, and J. Tang, "Self-supervised learning on graphs: Deep insights and new direction," arXiv preprint arXiv:2006.10141, 2020.
[32] N. A. Saxena, K. Huang, E. DeFilippis, G. Radanovic, D. C. Parkes, and Y. Liu, "How do fairness definitions fare? Examining public attitudes towards algorithmic definitions of fairness," in Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, AIES '19, (New York, NY, USA), p. 99–106, Association for Computing Machinery, 2019.
[33] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel, "Fairness through awareness," in Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, ITCS '12, (New York, NY, USA), p. 214–226, Association for Computing Machinery, 2012.
[34] M. Hardt, E. Price, E. Price, and N. Srebro, "Equality of opportunity in supervised learning," in Advances in Neural Information Processing Systems, vol. 29, Curran Associates, Inc., 2016.
[35] S. Bird, M. Dudík, R. Edgar, B. Horn, R. Lutz, V. Milan, M. Sameki, H. Wallach, and K. Walker, "Fairlearn: A toolkit for assessing and improving fairness in AI," Tech. Rep. MSR-TR-2020-32, Microsoft, May 2020.
[36] D. P. Helmbold and P. M. Long, "On the inductive bias of dropout," The Journal of Machine Learning Research, vol. 16, no. 1, pp. 3403–3454, 2015.
[37] S. L. Warner, "Randomized response: A survey technique for eliminating evasive answer bias," Journal of the American Statistical Association, vol. 60, no. 309, pp. 63–69, 1965. PMID: 12261830.
[38] Y. Rong, W. Huang, T. Xu, and J. Huang, "Dropedge: Towards deep graph convolutional networks on node classification," in International Conference on Learning Representations, 2020.
[39] F. Calmon, D. Wei, B. Vinzamuri, K. Natesan Ramamurthy, and K. R. Varshney, "Optimized pre-processing for discrimination prevention," in Advances in Neural Information Processing Systems (I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds.), vol. 30, Curran Associates, Inc., 2017.
[40] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, "Graph attention networks," International Conference on Learning Representations, 2018.
[41] I. Spinelli, S. Scardapane, and A. Uncini, "Adaptive propagation graph convolutional network," IEEE Transactions on Neural Networks and Learning Systems, pp. 1–6, 2020.
[42] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork, "Learning fair representations," in Proceedings of the 30th International Conference on Machine Learning, ICML '13, p. III–325–III–333, JMLR.org, 2013.
[43] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su, "Arnetminer: Extraction and mining of academic social networks," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, (New York, NY, USA), p. 990–998, Association for Computing Machinery, 2008.
[44] J. Leskovec and J. Mcauley, "Learning to discover social circles in ego networks," in Advances in Neural Information Processing Systems (F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, eds.), vol. 25, Curran Associates, Inc., 2012.
[45] D. Chen, Y. Lin, W. Li, P. Li, J. Zhou, and X. Sun, "Measuring and relieving the over-smoothing problem for graph neural networks from the topological view," Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 3438–3445, Apr. 2020.
[46] E. Ceci and S. Barbarossa, "Graph signal processing in the presence of topology uncertainties," IEEE Transactions on Signal Processing, vol. 68, pp. 1558–1573, 2020.

Indro Spinelli received the master's degree in artificial intelligence and robotics in 2019 from the Sapienza University of Rome, Italy, where he is currently working toward the Ph.D. degree with the Department of Information Engineering, Electronics, and Telecommunications. His main research interests include graph deep learning and trustworthy machine learning for graph-structured data. Before joining the "Intelligent Signal Processing and Multimedia" (ISPAMM) group, his work focused on robotics perception (SLAM).

Simone Scardapane is currently an Assistant Professor with the Sapienza University of Rome, Italy, where he works on deep learning applied to audio, video, and graphs, and their application in distributed and decentralized environments. He has authored more than 70 articles in the fields of machine and deep learning. Dr. Scardapane is a member of the IEEE CIS Social Media Sub-Committee, the IEEE Task Force on Reservoir Computing, and the "Machine learning in geodesy" joint study group of the International Association of Geodesy. He is the Chair of the Statistical Pattern Recognition Techniques TC of the International Association for Pattern Recognition, and the chairman of the Italian Association for Machine Learning.

Amir Hussain is founding Director of the Centre of AI and Data Science at Edinburgh Napier University, UK. His research interests are cross-disciplinary and industry-led, aimed at developing cognitive data science and AI technologies to engineer the smart and secure systems of tomorrow. He has (co)authored three international patents and nearly 500 publications, including around 200 journal papers and over 20 books/monographs. He has led major national, EU and internationally funded projects, and supervised over 30 PhD students to date. He is founding Chief Editor of the Cognitive Computation journal (Springer Nature) and the Springer Book Series on Socio-Affective Computing. Amongst other distinguished roles, he is a member of the UK Computing Research Committee (expert panel of the IET and the BCS), General Chair of IEEE WCCI 2020 (the world's largest IEEE CIS technical event in computational intelligence, comprising IJCNN, IEEE CEC and FUZZ-IEEE), and IEEE UK and Ireland Chapter Chair of the IEEE Industry Applications Society.

Aurelio Uncini (M'88) received the Laurea degree in Electronic Engineering from the University of Ancona, Italy, in 1983 and the Ph.D. degree in Electrical Engineering in 1994 from the University of Bologna, Italy. Currently, he is a Full Professor with the Department of Information Engineering, Electronics and Telecommunications, where he is teaching Neural Networks, Adaptive Algorithms for Signal Processing and Digital Audio Processing, and where he is the founder and director of the "Intelligent Signal Processing and Multimedia" (ISPAMM) group. His present research interests include adaptive filters, adaptive audio and array processing, machine learning for signal processing, blind signal processing and multi-sensor data fusion. He is a member of the Institute of Electrical and Electronics Engineers (IEEE), of the Associazione Elettrotecnica ed Elettronica Italiana (AEI), of the International Neural Networks Society (INNS) and of the Società Italiana Reti Neuroniche (SIREN).