Unified Mathmatical Treatment of Complex Cascaded Bipartite Networks: the Case of Collections of Journal Papers
Total Page:16
File Type:pdf, Size:1020Kb
UNIFIED MATHMATICAL TREATMENT OF COMPLEX CASCADED BIPARTITE NETWORKS: THE CASE OF COLLECTIONS OF JOURNAL PAPERS By STEVEN ALLEN MORRIS Bachelor of Science Tulsa University Tulsa, Oklahoma 1983 Master of Science Tulsa University Tulsa, Oklahoma 1987 Submitted to the Faculty of the Graduate College of the Oklahoma State University in partial fulfillment of the requirements for the Degree of DOCTOR OF PHILOSOPHY May, 2005 UNIFIED MATHMATICAL TREATMENT OF COMPLEX CASCADED BIPARTITE NETWORKS: THE CASE OF COLLECTIONS OF JOURNAL PAPERS Thesis Approved: __________________Dr. Gary Yen___________________ Thesis Advisor ________________Dr. Chris Hutchens________________ _________________Dr. Martin Hagan________________ __________________Dr. David Pratt_________________ ________________Dr. A. Gordon Emslie______________ Dean of the Graduate College ii PREFACE In this study, a mathematical treatment is proposed for analysis of entities and relations among entities in complex networks consisting of cascaded bipartite networks. This treatment is applied to the case of collections of journal papers. In this case, entities are distinguishable objects and concepts, such as papers, references, paper authors, reference authors, paper journals, reference journals, institutions, terms, and term definitions. Relations are associations between entity-types such as papers and the references they cite, or paper authors and the papers they write. An entity-relationship model is introduced that explicitly shows direct links between entity-types and possible useful indirect relations. From this a matrix formulation and generalized matrix arithmetic are introduced that allow easy expression of relations between entities and calculation of weights of indirect links and co-occurrence links. Occurrence matrices, equivalence matrices, membership matrices and co-occurrence matrices are described. A dynamic model of growth describes recursive relations in occurrence and co-occurrence matrices as papers are added to the paper collection. Graph theoretic matrices are introduced to allow information flow studies of networks of papers linked by their citations. Similarity calculations and similarity fusion are explained. Derivation of feature vectors for pattern recognition techniques is presented. The relation of the proposed mathematical treatment to seriation, clustering, multidimensional scaling, and visualization techniques is discussed. It is shown that most existing bibliometric analysis techniques for dealing with collections of journal papers are easily expressed in terms of the proposed mathematical treatment: co-citation analysis, bibliographic coupling analysis, author co-citation analysis, journal co-citation analysis, Braam-Moed-vanRaan (BMV) co-citation/co-word analysis, latent semantic analysis, hubs and authorities, and multidimensional scaling. This report discusses an extensive software toolkit that was developed for this research for analyzing and visualizing entities and links in a collection of journal papers. Additionally, an extensive case study is presented, analyzing and visualizing 60 years of anthrax research through a collection of journal papers. When dealing with complex networks that consist of cascaded bipartite networks, the treatment presented here provides a general mathematical framework for all aspects of analysis of static network structure and network dynamic growth. As such, it provides a basic paradigm for thinking about and modeling such networks: computing direct and indirect links, expressing and analyzing statistical distributions of network characteristics, describing network growth, deriving feature vectors, clustering, and visualizing network structure and growth. iii ACKNOWLEDGEMENTS I wish to express my appreciation to my major advisor, Dr. Gary Yen, for his guidance and encouragement. I appreciate the freedom and trust that he gave me during my studies and research. I am grateful to Dr. Chris Hutchens for encouraging me to start this whole journey, and for providing financial support when it was needed. I also deeply appreciate the help of Dr. Tom Collins, who encouraged the original research and helped with financial support by employing me on the ASSET project. It was a great pleasure to have fellow student Michel Goldstein as a coworker, his ideas on ontology research were the inspiration for the entity-relation model that is the basis of this study. Finally, I wish to express my heartfelt appreciation to my wife Esther who never faltered in her support throughout the whole project. Both Esther and my daughter Jenny showed great patience and cheerfulness throughout the whole journey. iv TABLE OF CONTENTS Chapter Page 1. INTRODUCTION...............................................................................................................................1 1.1 Motivation ..........................................................................................................................1 1.2 Summary.............................................................................................................................4 2. ENTITY-RELATIONSHIP MODEL OF A COLLECTION OF JOURNAL PAPERS.....................7 2.1 Definition of collections of journal papers .........................................................................7 2.2 Entities in collection of journal papers ...............................................................................7 2.3 Relationships among entities in collections of journal papers ............................................9 2.4 An entity-relationship diagram for collections of journal papers .....................................11 2.5 Dyadic links and dyad notation ........................................................................................12 3. COMPUTATION OF LINK WEIGHTS ..........................................................................................15 3.1 Bipartite networks.............................................................................................................15 3.2 Cascaded bipartite networks .............................................................................................16 3.3 Matrix multiplication........................................................................................................20 3.4 Overlap function ...............................................................................................................22 3.5 Inverse Minkowski function .............................................................................................24 4. OCCURRENCE MATRICES...........................................................................................................27 4.1 Definition of occurrence matrix........................................................................................27 4.2 Indirect occurrence matrices calculated by cascading occurrence matrices .....................28 4.3 Equivalence matrices ........................................................................................................30 4.4 Membership matrices........................................................................................................32 5. CO-OCCURRENCE MATRICES ....................................................................................................35 5.1 Definition of co-occurrence matrix...................................................................................35 5.2 Overlap function for calculating co-occurrence................................................................36 5.3 Commonly used co-occurrence matrices ..........................................................................38 6. BIBLIOMETRIC DISTRIBUTIONS ...............................................................................................41 6.1 Dyadic distributions..........................................................................................................41 6.2 Fixed occurrence dyadic distributions ..............................................................................43 6.3 Cumulative occurrence dyadic distributions.....................................................................43 6.4 Co-occurrence distributions..............................................................................................45 6.5 Clustering coefficient distributions...................................................................................46 v Chapter Page 7. RECURSIVE MATRIX GROWTH..................................................................................................49 7.1 Introduction ......................................................................................................................49 7.2 Growth of the paper-reference matrix...............................................................................49 7.3 Dynamic growth of the paper-reference matrix................................................................50 7.4 Dynamic growth of the bibliographic coupling matrix.....................................................51 7.5 Dynamic growth of the co-citation matrix........................................................................52 7.6 General growth of occurrence and co-occurrence matrices..............................................53 8. GRAPH THEORETIC MATRICES .................................................................................................56 8.1 Direct citation ...................................................................................................................56 8.2 Longitudinal