Devezas and Nunes Appl Netw Sci (2020) 5:79
https://doi.org/10.1007/s41109-020-00320-z
Applied Network Science

RESEARCH Open Access

Characterizing the hypergraph-of-entity and the structural impact of its extensions

José Devezas* and Sérgio Nunes

*Correspondence: [email protected]
INESC TEC and Faculty of Engineering, University of Porto, Rua Dr. Roberto Frias, s/n, 4200-465 Porto, PT, Portugal

Abstract
The hypergraph-of-entity is a joint representation model for terms, entities and their relations, used as an indexing approach in entity-oriented search. In this work, we characterize the structure of the hypergraph, from a microscopic and macroscopic scale, as well as over time with an increasing number of documents. We use a random walk based approach to estimate shortest distances and node sampling to estimate clustering coefficients. We also propose the calculation of a general mixed hypergraph density measure based on the corresponding bipartite mixed graph. We analyze these statistics for the hypergraph-of-entity, finding that hyperedge-based node degrees are distributed as a power law, while node-based node degrees and hyperedge cardinalities are log-normally distributed. We also find that most statistics tend to converge after an initial period of accentuated growth in the number of documents. We then repeat the analysis over three extensions (materialized through synonym, context, and tf_bin hyperedges) in order to assess their structural impact in the hypergraph. Finally, we focus on the application-specific aspects of the hypergraph-of-entity, in the domain of information retrieval. We analyze the correlation between the retrieval effectiveness and the structural features of the representation model, proposing ranking and anomaly indicators, as useful guides for modifying or extending the hypergraph-of-entity.
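
The abstract mentions node sampling as a way to estimate clustering coefficients. As a rough illustration of that general idea only, the following Python sketch averages the local clustering coefficient over a uniform random sample of nodes in an ordinary undirected graph. The adjacency-dictionary layout, the function names `local_clustering` and `sampled_clustering`, the toy graph, and the use of uniform sampling are all assumptions made for this example; it is not the procedure the authors apply to the hypergraph-of-entity.

```python
import random


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves adjacent."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        # Fewer than two neighbours: no pairs to close, so treat as 0.
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def sampled_clustering(adj, sample_size, seed=0):
    """Estimate the average clustering coefficient from a random node sample."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    # Sample without replacement; cap at the number of nodes available.
    sample = rng.sample(nodes, min(sample_size, len(nodes)))
    return sum(local_clustering(adj, v) for v in sample) / len(sample)


# Hypothetical toy graph: a triangle (a, b, c) with a pendant node d on c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
estimate = sampled_clustering(adj, sample_size=4)
```

When `sample_size` covers every node, as in the toy call above, the estimate coincides with the exact average; on a large graph a smaller sample trades accuracy for speed, which is the motivation for sampling in the first place.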