João Pedro Valladão Pinheiro Improving the Quality of the User

João Pedro Valladão Pinheiro

Improving the Quality of the User Experience by Query Answer Modification

Master's Dissertation

Dissertation presented to the Programa de Pós-graduação em Informática of PUC-Rio in partial fulfillment of the requirements for the degree of Mestre em Informática.

Advisor: Prof. Marco Antonio Casanova
Co-advisor: Profa. Elisa Souza Menendez

Rio de Janeiro
April 2021

Approved by the Examination Committee:

Prof. Marco Antonio Casanova
Advisor
Departamento de Informática – PUC-Rio

Profa. Elisa Souza Menendez
Co-advisor
Campus Xique-Xique – Instituto Federal de Educação, Ciência e Tecnologia Baiano

Prof. Antonio Luz Furtado
Departamento de Informática – PUC-Rio

Prof. Luiz André Portes Paes Leme
Departamento de Ciências da Computação – UFF

Rio de Janeiro, April 30th, 2021

All rights reserved.

João Pedro Valladão Pinheiro

João Pedro Valladão Pinheiro holds a bachelor's degree in Computer Engineering from the Pontifical Catholic University of Rio de Janeiro (PUC-Rio). His main research topics are Semantic Web and Information Retrieval.

Bibliographic data

Pinheiro, João Pedro V.
Improving the Quality of the User Experience by Query Answer Modification / João Pedro Valladão Pinheiro; advisor: Marco Antonio Casanova; co-advisor: Elisa Souza Menendez. – 2021.
55 f: il. color. ; 30 cm
Dissertation (master's) - Pontifícia Universidade Católica do Rio de Janeiro, Departamento de Informática, 2021.
Includes bibliography.
1. Computer Science – Theses. 2. Informatics – Theses. 3. Question Answering (QA). 4. Natural Language Query (NLQ). 5. Aggregation. 6. RDF. 7. Semantic Web. I. Casanova, Marco A. II. Menendez, Elisa S. III. Pontifícia Universidade Católica do Rio de Janeiro.
Departamento de Informática. IV. Title.
CDD: 004

Acknowledgments

First, a special thank you to my advisor, Prof. Marco Antonio Casanova. His full support, patience, and wisdom were key contributors to my academic achievements. Prof. Casanova brilliantly guided me through my first article publication. He was also very understanding when my daughter was born. I will always remember this.

I would like to thank my co-advisor, Profa. Elisa Souza Menendez. Who could have imagined that an email about the QALD dataset would result in an academic partnership? Profa. Elisa gave me excellent insights drawn from her deep experience with RDF knowledge bases. Thank you very much for joining us on this journey.

I cannot forget to thank all the staff, professors, and classmates from the Department of Informatics of PUC-Rio for the hours of work, study, and fun, especially Alexandre Novello, Claudio Escudero, Lauro de Souza, and Luis Eduardo Craizer.

Last but not least, my deep gratitude to my wife, Amanda, for her love, support, partnership, and encouragement during these years. Our daughter, Maria Eduarda, reflects our love. My gratitude also goes to my parents, Maristela and Pedro Carlos, who have always encouraged me to invest in my education.

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.

Abstract

Pinheiro, João Pedro V.; Casanova, Marco A. (Advisor); Menendez, Elisa S. (Co-Advisor). Improving the Quality of the User Experience by Query Answer Modification. Rio de Janeiro, 2021. 55p. Master's Dissertation – Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro.

The answer to a query submitted to a database or a knowledge base is often long and may contain redundant data. The user is frequently forced to browse through a long answer, or to refine and repeat the query until the answer reaches a manageable size.
Without proper treatment, consuming the query answer may become a tedious task. This study therefore proposes a process that modifies the presentation of a query answer to improve the quality of the user’s experience, in the context of an RDF knowledge base. The process reorganizes the original query answer by applying heuristics to summarize the results. The original SPARQL query is modified, and an exploration of the result set starts through a guided navigation over predicates and their facets. The dissertation also includes experiments based on RDF versions of MusicBrainz, enriched with DBpedia data, and IMDb, each with over 200 million RDF triples. The experiments use sample queries from well-known benchmarks.

Keywords

Question Answering (QA); Natural Language Query (NLQ); Aggregation; RDF; Semantic Web.

Resumo

Pinheiro, João Pedro V.; Casanova, Marco A.; Menendez, Elisa S. Melhorando a Qualidade da Experiência do Usuário através da Modificação da Resposta da Consulta. Rio de Janeiro, 2021. 55p. Master's Dissertation – Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro.

The answer to a query submitted to a database or knowledge base is usually long and may contain redundant data. The user is frequently forced to navigate through a long answer, or to refine and repeat the query until the answer reaches a manageable size. Without adequate treatment, consuming the query answer can become a tedious task. This study therefore proposes a process that modifies the presentation of the query answer to improve the quality of the user’s experience, in the context of an RDF knowledge base. The process reorganizes the original query answer by applying heuristics to compress the results. The original SPARQL query is modified, and an exploration of the result set begins through guided navigation over predicates and their facets. The dissertation also includes experiments based on RDF versions of MusicBrainz, enriched with DBpedia data, and IMDb, each with more than 200 million RDF triples. The experiments use sample queries from well-known benchmarks.

Keywords

Question Answering (QA); Natural Language Query (NLQ); Aggregation; RDF; Semantic Web.

Table of contents

1 Introduction
2 Background and Related Work
2.1 Background
2.1.1 Linked Data
2.1.2 Question Answering
2.2 Related Work
2.2.1 Aggregation and Summarization
2.2.2 Faceted Browsing
2.3 Chapter Conclusion
3 The query answer modification process
3.1 Heuristics and Thresholds
3.2 Transforming single-column into three-column result sets
3.3 Frequency analysis based on computed metadata
3.4 Verifying stop condition
3.5 Chapter Conclusion
4 Experiments
4.1 Setup
4.2 MusicBrainz Results
4.2.1 Overview
4.2.2 Applying the Σ heuristic over MusicBrainz
4.2.3 Applying the Π heuristic over MusicBrainz
4.2.4 Applying the Ω heuristic over MusicBrainz
4.3 IMDb Results
4.3.1 Sample Query
4.3.2 Applying the Σ heuristic over IMDb
4.3.3 Applying the Π heuristic over IMDb
4.3.4 Applying the Ω heuristic over IMDb
4.4 Compression Rate
5 Conclusion and Future Work

List of figures

Figure 1.1 Search result for the input “Denzel Washington”
Figure 2.1 RDF example
Figure 3.1 K-D tree structure representing the only step over IMDb dataset
Figure 3.2 K-D tree structure representing 1st step over MusicBrainz dataset
Figure 3.3 K-D tree structure representing 2nd step over MusicBrainz dataset
Figure 3.4 Query answer modification process
Figure 3.5 Transformation with SPARQL queries
Figure 3.6 Filtered predicates by literal and highlighted InfoRank
Figure 4.1 MusicBrainz Schema
Figure 4.2 IMDb Schema
Figure 4.3 Predicate taxonomy examples

List of tables

Table 1.1 Example of question answer for an open-ended question
Table 3.1 Examples of restrictive and embracing predicates/facets
Table 4.1 Preview of the original SPARQL result
Table 4.2 Available predicates in the 1st reformulation process
Table 4.3 Results from 1st reformulation process
Table 4.4 Available predicates in the 2nd reformulation process
Table 4.5 Results from 2nd reformulation process
Table 4.6 Available predicates in the 1st reformulation process
Table 4.7 Results from 1st reformulation process
Table 4.8 Available predicates in the 2nd reformulation process
Table 4.9 Results from 2nd interaction
Table 4.10 Available predicates in the 3rd reformulation process
Table 4.11 Results from 3rd interaction
Table 4.12 Available facets in the 1st reformulation process
Table 4.13 Preview of the result filtered by “male”
Table 4.14 Available facets in the 2nd reformulation process
Table 4.15 Preview of the result filtered by “non_vocal_instrumentalist”
Table 4.16 Available facets in the 3rd reformulation process
Table 4.17 Preview of the result filtered by “Musician”
Table 4.18 Preview of the original SPARQL result
Table 4.19 Available predicates
Table 4.20 Results from 1st interaction
Table 4.21 Available predicates in the 1st reformulation process
Table 4.22 Results from 1st reformulation process
Table 4.23 Available predicates in the 2nd reformulation process
Table 4.24 Results from 2nd interaction
Table 4.25 Available facets in the 1st reformulation process
Table 4.26 Preview of the result filtered by “Color”
Table 4.27 Available facets in the 2nd reformulation process
Table 4.28 Preview of the result filtered by “Dolby Digital”
Table 4.29 Available facets in the 3rd reformulation
