Recommending Collaborations Using Link Prediction

RECOMMENDING COLLABORATIONS USING LINK PREDICTION

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science

By NIKHIL CHENNUPATI
B. Tech., Gandhi Institute of Technology and Management, India, 2016

2021
Wright State University

WRIGHT STATE UNIVERSITY GRADUATE SCHOOL

April 21, 2021

I HEREBY RECOMMEND THAT THE THESIS PREPARED UNDER MY SUPERVISION BY Nikhil Chennupati ENTITLED Recommending Collaborations Using Link Prediction BE ACCEPTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science.

Tanvi Banerjee, Ph.D., Thesis Director
Mateen M. Rizki, Ph.D., Chair, Department of Computer Science and Engineering

Committee on Final Examination
Tanvi Banerjee, Ph.D.
Krishnaprasad Thirunarayan, Ph.D.
Michael L Raymer, Ph.D.

Barry Milligan, Ph.D., Vice Provost for Academic Affairs, Dean of the Graduate School

ABSTRACT

Chennupati, Nikhil. M.S., Department of Computer Science and Engineering, Wright State University, 2021. Recommending Collaborations Using Link Prediction.

Link prediction in the domain of scientific collaborative networks refers to exploring and determining whether a connection between two entities in an academic network may emerge in the future. This study aims to analyse the relevance of academic collaborations and identify the factors that drive co-author relationships in a heterogeneous bibliographic network. Using topological, semantic, and graph representation learning techniques, we measure the authors' similarities with respect to their structural and publication data to identify the reasons that promote co-authorships. Experimental results show that the proposed approach successfully infers the co-author links by identifying authors with similar research interests.
Such a system can be used to recommend potential collaborations among the authors.

Table of Contents

1. Introduction
    1.1. Overview
    1.2. Link Prediction for Recommending Author Collaborations
    1.3. Research Questions and Contributions
    1.4. Thesis Outline
2. Related Work
    2.1. Feature Extraction Based Methods
        2.1.1. Similarity-based Metrics
        2.1.2. Probabilistic and Maximum-Likelihood Models
    2.2. Feature Learning Methods
        2.2.1. Matrix Factorization Methods
        2.2.2. Random Walk Based Methods
        2.2.3. Neural Network-based Methods
3. Methods
    3.1. Feature Extraction Methods
        3.1.1. Feature Extraction Based on Topology
        3.1.2. Feature Extraction Based on Node Attributes (Semantic Similarity)
    3.2. Network Embedding Based Approach for Link Prediction
        3.2.1. Homogeneous Network Embedding
        3.2.2. Heterogeneous Network Embedding
        3.2.3. Weighted Meta-path Biased Random Walks
        3.2.4. Heterogeneous Skip-gram Model
    3.3. Supervised Machine Learning Algorithms
        3.3.1. Logistic Regression
        3.3.2. Support Vector Machines
        3.3.3. Random Forests
        3.3.4. AdaBoost
    3.4. Evaluation Metrics
        3.4.1. Precision
        3.4.2. Recall
        3.4.3. F-measure
        3.4.4. AUC Score
4. Data and Experimental Setup
    4.1. Data
        4.1.1. Microsoft Academic Graph
        4.1.2. Data Collection
        4.1.3. Building a Collaboration Graph
    4.2. Link Prediction Problem
        4.2.1. Case 1: Experiment with Negative Samples as Nodes n-hop Away
        4.2.2. Case 2: Experiment with Randomly Chosen Negative Samples
    4.3. Generating Link Prediction Features
    4.4. Choosing a Binary Classifier
    4.5. Network Embedding Based Approach for Predicting Future Collaborations
        4.5.1. Generating Node Embeddings
        4.5.2. Prediction Pipeline
5. Results and Discussion
    5.1. Feature Extraction Based Approach Results
        5.1.1. Results of Experiments with Negative Samples as Nodes n-hop Away
        5.1.2. Results of Experiments with Randomly Chosen Negative Samples
        5.1.3. Comparing Results of Case-1 and Case-2
    5.2. Network Embedding Based Approach Results
        5.2.1. Author’s Node Embedding Visualizations
        5.2.2. Weighted Meta-path Based Supervised Learning Results
    5.3. Case Study: Relevant Author Search
    5.4. Comparison of Feature Extraction Based and Network Embedding Based Approach
6. Conclusion and Future Work
References

List of Figures

Figure 1. Trending authors in machine learning (adapted from academic.microsoft.com)
Figure 2. Trending topics in all fields (adapted from academic.microsoft.com)
Figure 3. A sample collaboration graph of authors from different institutes
Figure 4. Pipeline of the feature extraction and learning-based approach
Figure 5. Overarching block diagram of weighted meta-path-based network embedding method
Figure 6. Taxonomy of link prediction approaches
Figure 7. Local probabilistic model
Figure 8. Frequency of common authors vs Percentage of collaborations
Figure 9. Weighted meta-path approach using supervised learning
Figure 10. Weighted
