KGTK: A Toolkit for Large Knowledge Graph Manipulation and Analysis Filip Ilievski1[0000−0002−1735−0686], Daniel Garijo1[0000−0003−0454−7145], Hans Chalupsky1[0000−0002−8902−1662], Naren Teja Divvala1, Yixiang Yao1[0000−0002−2471−5591], Craig Rogers1[0000−0002−5818−3802], Rongpeng Li1[0000−0002−6911−8002], Jun Liu1, Amandeep Singh1[0000−0002−1926−6859], Daniel Schwabe2[0000−0003−4347−2940], and Pedro Szekely1 1 Information Sciences Institute, University of Southern California filievski,dgarijo,hans,divvala,yixiangy,rogers, rli,junliu,amandeep,
[email protected] 2 Dept. of Informatics, Pontificia Universidade Cat´olicaRio de Janeiro
[email protected] Abstract. Knowledge graphs (KGs) have become the preferred tech- nology for representing, sharing and adding knowledge to modern AI applications. While KGs have become a mainstream technology, the RDF/SPARQL-centric toolset for operating with them at scale is hetero- geneous, difficult to integrate and only covers a subset of the operations that are commonly needed in data science applications. In this paper we present KGTK, a data science-centric toolkit designed to represent, cre- ate, transform, enhance and analyze KGs. KGTK represents graphs in tables and leverages popular libraries developed for data science applica- tions, enabling a wide audience of developers to easily construct knowl- edge graph pipelines for their applications. We illustrate the framework with real-world scenarios where we have used KGTK to integrate and manipulate large KGs, such as Wikidata, DBpedia and ConceptNet. Resource type: Software License: MIT DOI: https://doi.org/10.5281/zenodo.3828068 Repository: https://github.com/usc-isi-i2/kgtk/ Keywords: knowledge graph · knowledge graph embedding · knowl- edge graph filtering · knowledge graph manipulation arXiv:2006.00088v3 [cs.AI] 26 May 2021 1 Introduction Knowledge graphs (KGs) have become the preferred technology for representing, sharing and using knowledge in applications.