Undefined 0 (2015) 1 1 IOS Press

N-ary Relation Extraction for Simultaneous T-Box and A-Box Knowledge Base Augmentation

Marco Fossati a,*, Emilio Dorigatti b, and Claudio Giuliano c

a Data and Knowledge Management Unit, Fondazione Bruno Kessler, via Sommarive 18, 38123 Trento, Italy
E-mail: [email protected]
b Department of Computer Science, University of Trento, via Sommarive 9, 38123 Trento, Italy
E-mail: [email protected]
c Future Media Unit, Fondazione Bruno Kessler, via Sommarive 18, 38123 Trento, Italy
E-mail:
[email protected]

Abstract. The Web has evolved into a huge mine of knowledge carved in different forms, the predominant one still being the free-text document. This motivates the need for Intelligent Web-reading Agents: hypothetically, they would skim through corpora drawn from disparate Web sources and generate meaningful structured assertions to fuel Knowledge Bases (KBs). Ultimately, comprehensive KBs, like WIKIDATA and DBPEDIA, play a fundamental role in coping with information overload. In light of this vision, this paper presents the FACT EXTRACTOR, a complete Natural Language Processing (NLP) pipeline that reads an input textual corpus and produces machine-readable statements. Each statement is supplied with a confidence score and undergoes a disambiguation step via Entity Linking, which allows the assignment of KB-compliant URIs. The system implements four research contributions: (1) it performs N-ary relation extraction by applying the linguistic theory of Frame Semantics, as opposed to binary techniques; (2) it simultaneously populates both the T-Box and the A-Box of the target KB; (3) it relies on a single NLP layer, namely part-of-speech tagging; (4) it enables a fully supervised yet reasonably priced machine learning environment through a crowdsourcing strategy.
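To make the contrast between N-ary and binary extraction concrete, the following minimal sketch shows one way a frame-based statement with a confidence score could be represented, and what flattening it into binary triples looks like. It is an illustration only, not the FACT EXTRACTOR implementation: the class names, the frame and role labels ("Victory", "Winner", "Competition", "Time"), the example.org URI, and the score are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FrameElement:
    role: str                  # hypothetical frame element label
    text: str                  # surface form found in the sentence
    uri: Optional[str] = None  # KB URI from Entity Linking, if resolved


@dataclass(frozen=True)
class NaryStatement:
    frame: str                 # hypothetical frame label
    elements: Tuple[FrameElement, ...]
    confidence: float          # assumed to be a score in [0, 1]

    def to_binary(self, subject_role: str) -> List[Tuple[str, str, str]]:
        """Flatten into (subject, predicate, object) triples around one role."""
        subject = next(e for e in self.elements if e.role == subject_role)
        return [
            (subject.uri or subject.text, f"{self.frame}/{e.role}", e.uri or e.text)
            for e in self.elements
            if e.role != subject_role
        ]


# One N-ary statement keeps every participant of the event together.
statement = NaryStatement(
    frame="Victory",
    elements=(
        FrameElement("Winner", "Germany", "http://example.org/Germany"),
        FrameElement("Competition", "World Cup"),
        FrameElement("Time", "2014"),
    ),
    confidence=0.87,
)

# Flattening yields independent triples that no longer share an event.
triples = statement.to_binary("Winner")
```

In this sketch the flattened triples no longer record that the competition and the year belong to the same event, which is the information an N-ary representation preserves.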