Unified Mathmatical Treatment of Complex Cascaded Bipartite Networks: the Case of Collections of Journal Papers

By STEVEN ALLEN MORRIS

Bachelor of Science, Tulsa University, Tulsa, Oklahoma, 1983
Master of Science, Tulsa University, Tulsa, Oklahoma, 1987

Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of DOCTOR OF PHILOSOPHY, May, 2005

Thesis Approved:
Dr. Gary Yen, Thesis Advisor
Dr. Chris Hutchens
Dr. Martin Hagan
Dr. David Pratt
Dr. A. Gordon Emslie, Dean of the Graduate College

PREFACE

In this study, a mathematical treatment is proposed for the analysis of entities and relations among entities in complex networks consisting of cascaded bipartite networks. This treatment is applied to the case of collections of journal papers. In this case, entities are distinguishable objects and concepts, such as papers, references, paper authors, reference authors, paper journals, reference journals, institutions, terms, and term definitions. Relations are associations between entity-types, such as papers and the references they cite, or paper authors and the papers they write.

An entity-relationship model is introduced that explicitly shows direct links between entity-types and possibly useful indirect relations. From this, a matrix formulation and generalized matrix arithmetic are introduced that allow easy expression of relations between entities and calculation of the weights of indirect links and co-occurrence links. Occurrence matrices, equivalence matrices, membership matrices, and co-occurrence matrices are described. A dynamic model of growth describes recursive relations in occurrence and co-occurrence matrices as papers are added to the paper collection. Graph theoretic matrices are introduced to allow information flow studies of networks of papers linked by their citations. Similarity calculations and similarity fusion are explained. Derivation of feature vectors for pattern recognition techniques is presented. The relation of the proposed mathematical treatment to seriation, clustering, multidimensional scaling, and visualization techniques is discussed.

It is shown that most existing bibliometric analysis techniques for dealing with collections of journal papers are easily expressed in terms of the proposed mathematical treatment: co-citation analysis, bibliographic coupling analysis, author co-citation analysis, journal co-citation analysis, Braam-Moed-van Raan (BMV) co-citation/co-word analysis, latent semantic analysis, hubs and authorities, and multidimensional scaling.

This report describes an extensive software toolkit, developed for this research, for analyzing and visualizing entities and links in a collection of journal papers. Additionally, an extensive case study is presented, analyzing and visualizing 60 years of anthrax research through a collection of journal papers.

When dealing with complex networks that consist of cascaded bipartite networks, the treatment presented here provides a general mathematical framework for all aspects of the analysis of static network structure and dynamic network growth. As such, it provides a basic paradigm for thinking about and modeling such networks: computing direct and indirect links, expressing and analyzing statistical distributions of network characteristics, describing network growth, deriving feature vectors, clustering, and visualizing network structure and growth.
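
The matrix formulation summarized in the preface can be illustrated with a minimal Python sketch. It builds two small occurrence matrices, cascades them by matrix multiplication to obtain indirect author-to-reference link weights, and computes bibliographic coupling and co-citation counts as co-occurrence matrices. The toy data, the helper names (transpose, matmul, overlap), and the choice of a sum of element-wise minima as the overlap function are assumptions made for illustration; they are not taken from the thesis text reproduced here.

```python
# Toy collection (hypothetical data): 3 papers, 2 authors, 3 references.
# paper_author[p][a] = 1 if paper p was written by author a.
paper_author = [
    [1, 0],
    [1, 1],
    [0, 1],
]
# paper_ref[p][r] = 1 if paper p cites reference r.
paper_ref = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
]


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Ordinary matrix product: sum of products along the shared dimension."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def overlap(a, b):
    """Assumed overlap function: sum of element-wise minima instead of products."""
    bt = transpose(b)
    return [[sum(min(x, y) for x, y in zip(row, col)) for col in bt] for row in a]


# Indirect links: cascade author->paper and paper->reference.
# author_ref[a][r] counts the papers by author a that cite reference r.
author_ref = matmul(transpose(paper_author), paper_ref)

# Co-occurrence links between papers (shared references) and
# between references (papers citing both).
bib_coupling = overlap(paper_ref, transpose(paper_ref))
co_citation = overlap(transpose(paper_ref), paper_ref)

print(author_ref)    # [[1, 2, 1], [0, 2, 1]]
print(bib_coupling)  # [[2, 1, 1], [1, 2, 1], [1, 1, 1]]
print(co_citation)   # [[1, 1, 0], [1, 3, 1], [0, 1, 1]]
```

For binary occurrence matrices such as these, the sum of minima and the ordinary matrix product give identical co-occurrence counts; the two differ only when link weights exceed one.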

ACKNOWLEDGEMENTS

I wish to express my appreciation to my major advisor, Dr. Gary Yen, for his guidance and encouragement. I appreciate the freedom and trust that he gave me during my studies and research. I am grateful to Dr. Chris Hutchens for encouraging me to start this whole journey, and for providing financial support when it was needed. I also deeply appreciate the help of Dr. Tom Collins, who encouraged the original research and helped with financial support by employing me on the ASSET project. It was a great pleasure to have fellow student Michel Goldstein as a coworker; his ideas on ontology research were the inspiration for the entity-relation model that is the basis of this study.

Finally, I wish to express my heartfelt appreciation to my wife Esther, who never faltered in her support throughout the whole project. Both Esther and my daughter Jenny showed great patience and cheerfulness throughout the whole journey.

TABLE OF CONTENTS

1. INTRODUCTION
   1.1 Motivation
   1.2 Summary
2. ENTITY-RELATIONSHIP MODEL OF A COLLECTION OF JOURNAL PAPERS
   2.1 Definition of collections of journal papers
   2.2 Entities in collection of journal papers
   2.3 Relationships among entities in collections of journal papers
   2.4 An entity-relationship diagram for collections of journal papers
   2.5 Dyadic links and dyad notation
3. COMPUTATION OF LINK WEIGHTS
   3.1 Bipartite networks
   3.2 Cascaded bipartite networks
   3.3 Matrix multiplication
   3.4 Overlap function
   3.5 Inverse Minkowski function
4. OCCURRENCE MATRICES
   4.1 Definition of occurrence matrix
   4.2 Indirect occurrence matrices calculated by cascading occurrence matrices
   4.3 Equivalence matrices
   4.4 Membership matrices
5. CO-OCCURRENCE MATRICES
   5.1 Definition of co-occurrence matrix
   5.2 Overlap function for calculating co-occurrence
   5.3 Commonly used co-occurrence matrices
6. BIBLIOMETRIC DISTRIBUTIONS
   6.1 Dyadic distributions
   6.2 Fixed occurrence dyadic distributions
   6.3 Cumulative occurrence dyadic distributions
   6.4 Co-occurrence distributions
   6.5 Clustering coefficient distributions
7. RECURSIVE MATRIX GROWTH
   7.1 Introduction
   7.2 Growth of the paper-reference matrix
   7.3 Dynamic growth of the paper-reference matrix
   7.4 Dynamic growth of the bibliographic coupling matrix
   7.5 Dynamic growth of the co-citation matrix
   7.6 General growth of occurrence and co-occurrence matrices
8. GRAPH THEORETIC MATRICES
   8.1 Direct citation
   8.2 Longitudinal