Extending Neural Question Answering with Linguistic Input Features

Fabian Hommel¹  Matthias Orlikowski¹  Philipp Cimiano¹,²  Matthias Hartung¹
¹Semalytix GmbH, Bielefeld, Germany
²Semantic Computing Group, Bielefeld University, Germany
[email protected]

Abstract

Considerable progress in neural question answering has been made on competitive general domain datasets. In order to explore methods to aid the generalization potential of question answering models, we reimplement a state-of-the-art architecture, perform a parameter search on an open-domain dataset and evaluate a first approach for integrating linguistic input features such as part-of-speech tags, syntactic dependency relations and semantic roles. The results show that adding these input features has a greater impact on performance than any of the architectural parameters we explore. Our findings suggest that these layers of linguistic knowledge have the potential to substantially increase the generalization capacities of neural QA models, thus facilitating cross-domain model transfer or the development of domain-agnostic QA models.

1 Introduction

Recently, deep neural network approaches for question answering (QA) have gained traction. The strong interest in this task may be explained by two promises that resonate in neural QA approaches: For one thing, QA is claimed to bear the potential to subsume many other NLP challenges. From this perspective, almost every task can be framed as a natural language question (Kumar et al., 2016). Thus, a QA model with the capacity to learn mappings from natural language terminology to formal linguistic concepts could be used as a surrogate model, reducing annotation and training effort and providing fast solutions to potentially complex NLP problems. For another, QA systems have always been considered intuitive natural language interfaces for information access in various domains of (technical) knowledge.

Like any other practical NLP solution targeting specialized domains, QA systems face the inherent challenges of cross-domain generalization or domain adaptation, respectively. However, QA approaches can be considered particularly suitable for this kind of problem, as the semantic underpinnings of question/answer pairs capture a universal layer of meaning that is domain-agnostic to some extent (but might require fine-tuning with respect to particular domain concepts or terminology).

We hypothesize that a promising approach towards rapid information access in specialized domains would be (i) to learn the aforementioned universal meaning layer from large collections of open-domain question/answer pairs, and (ii) to adapt the resulting meaning representations to more specific domains subsequently. In this paper, we focus on the first problem.

Our work is based on the assumption that rich representations of linguistic knowledge at high levels of syntactic and semantic abstraction facilitate neural NLP models to capture "universal", domain-agnostic meaning, which in turn fosters performance in open-domain QA. Against this backdrop, we evaluate the impact of explicitly encoded linguistic information in terms of part-of-speech tags, syntactic dependencies and semantic roles on the open-domain performance of a state-of-the-art neural QA model. We find that our re-implementation of the deep neural QANet architecture (Yu et al., 2018) benefits considerably from these linguistically enriched representations, which we consider a promising first step towards generalizable, rapidly adaptable QA models.

2 Related Work

In recent years, research on feature engineering for NLP models has subsided to some extent. This might be attributed to the ability of neural networks to perform hierarchical feature learning (Bengio, 2009). Using neural approaches, many of the core NLP tasks like part-of-speech (PoS) tagging (Koo et al., 2008), dependency parsing (Chen and Manning, 2014), named entity recognition (Lample et al., 2016) and semantic role labelling (Roth and Woodsend, 2014; Zhou and Xu, 2015) have been improved. However, recent papers that make use of the improved performance in these areas are few (Alexandrescu and Kirchhoff, 2006; Sennrich and Haddow, 2016). Thus, we want to evaluate whether adding linguistic information to the inputs of a QA model improves its performance. Our approach of integrating linguistic input features by embedding each individually and concatenating the embeddings is inspired by Sennrich and Haddow (2016), who apply this approach in the context of machine translation.

This paper builds upon a host of recent developments in neural architectures for question answering or reading comprehension. While most approaches rely heavily on recurrent layers (Huang et al., 2017; Hu et al., 2018; Seo et al., 2016; Shen et al., 2017; Wang et al., 2017; Xiong et al., 2016), we chose to reimplement QANet, a self-attention based architecture (Yu et al., 2018). Apart from that, we use the tools from Roth and Woodsend (2014) for extracting semantic roles over the whole Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016).

3 Extending QANet with Linguistic Input Features

As a testbed for assessing the impact of linguistic input features in neural QA models, we make use of (a re-implementation of) QANet (Yu et al., 2018). By default, QANet solely uses word and character inputs. However, numerous off-the-shelf NLP tools are available that could be used to enrich these inputs with explicit linguistic information. This option is potentially interesting when trying to adapt a model to other domains: While additional training data might be expensive to obtain, these linguistic input features could boost performance by providing a scalable, domain-agnostic source of information. We expand the per-word inputs with three different kinds of linguistic features: part-of-speech (PoS) tags, dependency relation labels and semantic roles.

PoS Tags. We hypothesized that information about the part-of-speech of input tokens would help the neural network by reducing the number of answer candidates for specific types of questions. To extract PoS tags for all contexts and questions, we used the coarse-grained PoS tag set of the spaCy library¹.

Dependency Relation Labels. We expected that syntactic information might help the model to predict the boundaries of spans with more precision. Again, we use spaCy to extract dependency information for questions and contexts. To extract dependency information per input word, we use the type label of that dependency relation in which the word is the child.

Semantic Roles. Semantic Role Labeling (SRL) deals with the problem of finding shallow semantic structure in sentences by identifying events ("predicates") and their participants ("semantic roles"). By identifying predicates and related participants and properties, SRL helps to answer "who" did "what" to "whom", "where", "when" and "how". To do that, each constituent in a sentence is assigned a semantic role from a predefined set of roles like agent, patient or location (Màrquez et al., 2008). Since semantic role labeling aims at identifying relevant aspects of events that are directly related to the above-mentioned WH questions, question answering models should directly benefit from this kind of information. We used the mate-plus tools (Roth and Woodsend, 2014) for parsing the complete SQuAD dataset and to obtain PropBank-labeled semantic roles per input word (Palmer et al., 2005). We added the predicate role to the set of semantic roles to provide the model with pointers to the basic events. Words that did not correspond to any semantic role were assigned a dedicated placeholder label.

Integration of Linguistic Features in QANet. In the standard QANet architecture, words and corresponding characters are embedded individually and then concatenated to obtain one representation vector per input word. Following Sennrich and Haddow (2016), we enrich this process by mapping each of the linguistic input features described above to its own embedding space and then including them into the concatenation. Figure 1 shows an updated version of the input embedding layer of QANet that includes the linguistic input features.

¹ Available at https://spacy.io/

Figure 1: The low-level structure of the input embedding layer, enriched with additional linguistic inputs.
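For illustration, the per-token extraction of the two syntactic features can be sketched as follows. This is a minimal example using spaCy's standard token attributes (`pos_` for the coarse-grained tag set, `dep_` for the relation in which the token is the child), not the authors' released code; the pipeline name is an assumption, and the mate-plus semantic roles are assumed to come from a separate processing step.

```python
# Sketch: extracting coarse PoS tags and dependency relation labels
# with spaCy, as described in Section 3.
import spacy

# Pipeline name is illustrative; any English spaCy model exposes
# .pos_ (coarse-grained PoS) and .dep_ (dependency label).
nlp = spacy.load("en_core_web_sm")

def extract_syntactic_features(text):
    """Return per-token (token, coarse PoS tag, dependency label) triples."""
    doc = nlp(text)
    return [(tok.text, tok.pos_, tok.dep_) for tok in doc]

print(extract_syntactic_features("Who wrote the Declaration of Independence?"))
# e.g. [('Who', 'PRON', 'nsubj'), ('wrote', 'VERB', 'ROOT'), ...]
```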

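The enriched input embedding layer of Figure 1 can be sketched in PyTorch as follows. This is our own minimal reconstruction under stated assumptions, not the paper's implementation: vocabulary sizes, embedding dimensions and the character convolution are illustrative. Multi-valued semantic roles are summed per word, as motivated in the following paragraph.

```python
import torch
import torch.nn as nn

class EnrichedInputEmbedding(nn.Module):
    """Sketch of Figure 1: concatenate word, character, PoS, dependency
    label and (summed) semantic role embeddings per input word."""

    def __init__(self, n_words, n_chars, n_pos, n_dep, n_roles,
                 d_word=300, d_char=64, d_pos=100, d_dep=100, d_role=200):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, d_word)
        self.char_emb = nn.Embedding(n_chars, d_char)
        # Character CNN: one common way to pool char embeddings per word.
        self.char_conv = nn.Conv1d(d_char, d_char, kernel_size=5, padding=2)
        self.pos_emb = nn.Embedding(n_pos, d_pos)
        self.dep_emb = nn.Embedding(n_dep, d_dep)
        # padding_idx=0 keeps empty role slots at a zero vector, so summing
        # over the (up to 8) role slots ignores padding.
        self.role_emb = nn.Embedding(n_roles, d_role, padding_idx=0)

    def forward(self, words, chars, pos, dep, roles):
        # words: (B, L); chars: (B, L, C); pos, dep: (B, L); roles: (B, L, 8)
        w = self.word_emb(words)                                 # (B, L, d_word)
        B, L, C = chars.shape
        c = self.char_emb(chars.view(B * L, C)).transpose(1, 2)  # (B*L, d_char, C)
        c = self.char_conv(c).max(dim=2).values.view(B, L, -1)   # (B, L, d_char)
        p = self.pos_emb(pos)                                    # (B, L, d_pos)
        d = self.dep_emb(dep)                                    # (B, L, d_dep)
        r = self.role_emb(roles).sum(dim=2)                      # (B, L, d_role)
        return torch.cat([w, c, p, d, r], dim=-1)
```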
Each embedding vector consists of the embedded information of the word, its characters, its PoS tag, the label of the dependency relation in which the respective word is the child, and its semantic roles. While PoS tags and dependency relation labels are single word-level features and can be embedded by standard indexing and look-up, each word can have multiple semantic roles. Therefore, we embed each semantic role separately and aggregate over them. After preliminary experimentation with convolution, summing and taking the maximum, we decided for summing along each dimension of the semantic role embeddings². This results in one aggregated semantic role embedding vector per input word. Note that we intentionally do not compute any combinations of the features mentioned above manually. We simply enrich the available word-level input information and rely on the network to find meaningful connections.

² We set the maximum number of semantic roles per word to 8.

4 Experiments

To obtain a baseline, we reimplemented QANet and performed a parameter search. After that, we evaluated the integration of linguistic input features against that baseline.

4.1 Parameter Exploration in QANet

In the first experiment, we explore the effect of various parameters on open-domain QA performance in our re-implementation of QANet. The aim is to understand the impact of each parameter in order to compare it to the contribution of linguistic input features.

Dataset. We use the Stanford Question Answering Dataset (SQuAD) (Rajpurkar et al., 2016) for parameter search. Yu et al. (2018) state that the results on development and test set are strongly correlated. Thus, improvements on the development set of SQuAD should also lead to improvements on the test set. Based on this claim, we only report results on the development set, since the test set of SQuAD is not publicly available. The training set consists of 87,599 samples and the development set consists of 10,570 samples. All texts are in English.

Preprocessing. To preprocess SQuAD, we used the spaCy library for tokenization. We truncated or padded each paragraph to length 400 and each question to length 30. Each token was transformed into lower case and embedded using pre-trained GloVe (Pennington et al., 2014) embeddings. All words that were either out-of-vocabulary or not present at training time were mapped to a randomly initialized unknown token (UNK). For each token, we extracted all characters and then truncated or padded them to 16 characters per word. Each character embedding was initialized randomly.

Training Parameters. For regularization, we adopted the methods from Yu et al. (2018): We apply L2 weight decay on all trainable variables with λ = 3 × 10⁻⁷ and dropout on word embeddings (p = 0.1), as well as on character embeddings (p = 0.05). Additionally, a dropout rate of 0.1 is applied on every layer except the output layer. Apart from that, we employ the stochastic depth method (Huang et al., 2016) inside each stack of encoder blocks during training. To compute the survival probabilities p_l, we use a linear decay rule, p_l = 1 − (l/L)(1 − p_L), following Huang et al. (2016) and Yu et al. (2018), where L denotes the index of the last layer in a stack of encoder blocks and the corresponding probability p_L is set to 0.9. As an optimizer, we use ADAM (Kingma and Ba, 2014) with β₁ = 0.8, β₂ = 0.999 and ε = 10⁻⁷ (Yu et al., 2018). To prevent exploding gradients, we use gradient norm clipping (Pascanu et al., 2013) with a threshold of 5. We use a learning rate warm-up schema with a logarithmic increase from 0.0 to 0.001 in the first 1000 gradient steps. After that, the learning rate is kept fixed at 0.001. Finally, we apply an exponential moving average with decay rate 0.9999 on all trainable variables. We implemented our model in PyTorch³ (Paszke et al., 2017) and trained on a GeForce GTX 1080 GPU with 12 GB RAM.

³ pytorch.org

Results. For evaluation, we use the F1 and exact match (EM) metrics from the official SQuAD evaluation script. See Table 1 for an overview of the impact of parameters on the scores.

Parameter                        ∆F1  ∆EM
word embeddings                  2.4  1.8
character embeddings             1.6  1.7
# convolutional layers           1.5  2.2
shared weights in encoding       1.3  1.3
# encoder blocks                 0.9  0.9
# attention heads                0.6  1.2
# highway layers                 0.4  1.0
model dimensionality             0.5  0.8
pointwise feed-forward layers    0.2  0.0
combination of best settings     1.7  1.9

Table 1: Impact of evaluated individual parameters and the combination of their best settings on F1 and exact match (EM) scores.

Looking at the individual impact, the most influential parameters are the dimensionality of the inputs, namely the word and character embedding sizes. The parameter with the highest impact on F1 and the second highest impact on exact match is the word embedding size (∆F1 = 2.4, ∆EM = 1.8). Surprisingly, the best setting is the smallest embedding size (50). In contrast, bigger character embedding sizes perform better than smaller ones. The best setting was a size of 300 (∆F1 = 1.6, ∆EM = 1.7). Possibly, using no word embeddings at all and scaling up the character embeddings could increase the performance further and eliminate the need for pre-trained embeddings altogether.

The most influential architectural parameter was the number of convolutional layers in the model encoder blocks (∆F1 = 1.5, ∆EM = 2.2), with the highest impact of all parameters on the exact match score when using 5 convolutional layers instead of 2. The second most influential structural parameter was sharing the weights between the embedding encoder layers for question and context (∆F1 = 1.3, ∆EM = 1.3). The third most influential structural parameter was the number of encoder blocks in the model encoder layer, where 5 instead of 7 blocks yielded the best result (∆F1 = 0.9, ∆EM = 0.9). The number of attention heads had a minor influence on F1 (∆F1 = 0.6) but a notable influence on exact match (∆EM = 1.2). Notably, these gains were obtained with only 2 attention heads. Thus, we propose to use fewer attention heads in order to improve results and reduce computational cost. The remaining parameters (the number of highway layers in the input embedding layer, the model dimensionality and the usage of pointwise feed-forward layers) all had negligible impact on performance. In general, we propose to set them to small values to reduce computational cost.

Combining the best settings for each parameter in isolation did not yield the best overall results in either F1 or exact match (∆F1 = 1.7, ∆EM = 1.9). This combination was, however, the second best option for both F1 and exact match. Since this balanced quality was not apparent in any other setting, we decided to use this combination of parameters for all later experiments. The final performance of this baseline model was F1 = 67.8 and exact match = 55.5.
                      F1    EM
Baseline              67.8  55.5
50d PoS embeddings    69.6  58.1
100d PoS embeddings   69.7  58.9
200d PoS embeddings   69.7  57.8
300d PoS embeddings   69.5  57.9
50d DL embeddings     68.6  56.6
100d DL embeddings    68.9  57.8
200d DL embeddings    67.8  56.6
300d DL embeddings    69.0  57.2
50d SRL embeddings    67.9  55.1
100d SRL embeddings   68.0  55.5
200d SRL embeddings   68.8  56.6
300d SRL embeddings   68.7  56.0

Table 2: Results of enriching the inputs with linguistic input embeddings of different sizes, in terms of F1 and exact match (EM) scores. DL refers to dependency labels, SRL to semantic role labels. No linguistic features were used in the baseline.

                               F1    EM
Baseline (QANet Re-Impl.)      67.8  55.5
Baseline + Linguistic Inputs   70.5  60.2

Table 3: Results of using the combination of all three linguistic input features, using the previously optimized embedding sizes (PoS and dependency tags of size 100, semantic role labels of size 200). No linguistic features were used in the baseline setting.

4.2 Impact of Linguistic Input Features

For evaluating the impact of linguistic input features, we used the same evaluation setup as for the baseline, including training data and meta-parameter choices. The embeddings for linguistic inputs were initialized randomly and then included as trainable parameters. Table 2 shows the results of varying the embedding dimensionality for each type of input feature, and Table 3 the results of their combination.

PoS Tags. Adding PoS tags to the embedding space improves the overall performance of the network, regardless of the size of the embeddings. The best result is obtained by using PoS embeddings of size 100 (∆F1 = 1.9, ∆EM = 3.4).

Dependency Relation Labels. Enriching the inputs with dependency relation information improves the overall performance. To achieve a strong improvement, the size of the embeddings matters: While embeddings of size 200 only improve the exact match (∆F1 = 0.0, ∆EM = 1.1), all other sizes increase the F1 score as well. The biggest improvement in F1 score was achieved by an embedding size of 300 (∆F1 = 1.2), while the biggest improvement in EM score was achieved by an embedding size of 100 (∆EM = 2.3).

Semantic Role Labels. Again, the linguistic inputs improve upon the baseline performance. However, in the setting with embedding size 50, the performance slightly deteriorates. The best performance is achieved when using embeddings of size 200 (∆F1 = 1.0, ∆EM = 1.1).

Combination of all Linguistic Input Features. Table 3 shows the results for the combination of all three linguistic input features. This setting performs best in all our experiments, beating the previous baseline and all individual linguistic input features (∆F1 = 2.7, ∆EM = 4.7).
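The F1 and EM numbers in Tables 1–3 follow the official SQuAD evaluation script. For reference, the two metrics can be re-sketched as follows; this is a simplified stand-in for the official implementation, whose answer normalization we abbreviate here.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Simplified answer normalization: lowercase, strip punctuation
    and the articles a/an/the, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, ground_truth):
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(ground_truth))

def f1_score(prediction, ground_truth):
    """Token-level overlap F1 between predicted and gold answer span."""
    pred = normalize(prediction).split()
    gold = normalize(ground_truth).split()
    common = sum((Counter(pred) & Counter(gold)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(pred), common / len(gold)
    return 2 * precision * recall / (precision + recall)
```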

5 Discussion

Due to a gap in performance between our implementation of QANet (cf. Table 3) and the results from the original paper⁴, we are not able to tell whether our optimized parameters are specific to our settings or should be preferred in general. The mismatch in performance could be due to various implementational differences (such as using PyTorch instead of Tensorflow), variation in preprocessing (e.g. tokenization with spaCy instead of NLTK) or the training procedure (such as trainable word embeddings). Still, we consider the relative improvements due to linguistic inputs compared to our baseline to be very insightful and promising.

⁴ F1 = 82.7, EM = 73.6, as reported by Yu et al. (2018).

Overall, each individual linguistic input achieved a small improvement of the performance. The best performing feature were the PoS embeddings, with the biggest improvements in both F1 and exact match (∆F1 = 1.9, ∆EM = 3.4). The second best feature were the dependency relation labels (∆F1 = 1.1, ∆EM = 2.3), followed by semantic role labels (∆F1 = 1.0, ∆EM = 1.1). Combining all linguistic input features led to even better results (∆F1 = 2.7, ∆EM = 4.7). This indicates that although the PoS tag embeddings have the strongest impact on the performance, dependency relation and semantic role embeddings still provide useful additional information.

Importantly, this also shows that adding linguistic features to a model with optimally selected hyperparameters yields an additional performance benefit, suggesting that linguistic input features have a bigger impact on model performance than hyperparameter optimization alone. These findings underline the usefulness of linguistic features as a simple and readily available tool for enhancing the performance of QA models.

Contrary to our intuitions, the increase in performance seems to be lower for those features that are higher up in the hierarchy of linguistic abstraction: PoS tags, which merely provide a shallow and coarse-grained approximation of meaning, perform better than semantic roles, which provide a lot of information that should support the question answering task. This could be explained as follows: First, PoS tags (and dependency relation labels) are available for every word in a sentence, but only some words correspond to semantic roles. This sparsity renders the input feature less reliable. Second, our current aggregation approach towards semantic role embeddings might not be optimal.

The results suggest that syntactic information in dependency labels might help the model to find more precise boundaries: While the F1 score only increased by 1.1, the exact match was improved by 2.3. Even though the employed embedding process for the dependency labels was rather simple and not tailored to the underlying dependency tree structure, the feature was useful to the network. Probably, the performance of this feature could be increased either by classical feature engineering to construct more specific information from the dependency tree or by using a more suitable network architecture for embedding such structures.

Feature Engineering for Neural Architectures. At least in higher-level tasks like QA, we observe a recent trend in the literature to focus on improving model architecture over improving input features, possibly due to the ability of neural networks to learn hierarchical features. Following this paradigm, the state-of-the-art in many tasks has recently been improved, for example in semantic role labeling through a deep highway BiLSTM (He et al., 2017). However, the last papers that make use of semantic roles are mostly from the last decade (Shen and Lapata, 2007; Kaisser and Webber, 2007; Sammons et al., 2009; Wu and Fung, 2009; Liu, 2009; Gao and Vogel, 2011). Most current QA architectures use word and character embeddings only (Seo et al., 2016; Wang et al., 2017; Xiong et al., 2016; Shen et al., 2017; Hu et al., 2018; Yu et al., 2018).

Our results suggest that the QANet model benefits more from better inputs than from optimizing its structure: When exploring the parameters, we observed that the embedding dimensionality of words and characters had the biggest impact. In the experiments regarding linguistic input features, we found that injecting linguistic information into the input had a stronger effect than any of the previously explored parameters.

Based on this, it might be worthwhile to invest more work into the investigation of the types of inputs one feeds into a neural model. An advantage of better inputs is their adaptability, since input features can possibly be used off-the-shelf for a wide range of tasks, while architectures typically have to be fine-tuned on supervised training data. Another advantage might be that empirical gains are better interpretable: In our experiments, linguistic input features had a bigger impact on exact match (finding the exact boundaries of the correct answer) than on F1 (overlap with the correct answer). A thorough investigation of this intuitive link between input and output qualities would be necessary to support this claim.

6 Conclusion

This paper addresses the question as to whether neural QA models benefit from linguistic input features at different levels of abstraction in terms of their presumed generalization capacities across domains. To this end, we evaluate the impact of injecting different linguistic inputs (PoS tags, syntactic dependencies, semantic roles) into a deep neural QA architecture on open-domain performance over SQuAD, which constitutes the largest open-domain benchmark data set currently available.

In our experiments, the highest individual performance gain was achieved by adding PoS tags to the input representation, but a combination of all evaluated linguistic features led to the best results overall. We noticed that linguistic input features had a bigger impact on the exact match score than on the F1 score. Thus, we hypothesize that the linguistic information facilitates boundary detection, while locating answer candidates in general may largely depend on word-level semantics.

In future work, it might be instructive to explore how well linguistic features perform without word or character inputs. This would give a better basic intuition about the performance of each individual feature. Another interesting research direction could be to investigate further linguistic information like lemmatized words, subword tags and morphological features (Sennrich and Haddow, 2016), named entities, and distance and position features. Apart from that, novel approaches for integrating this information could be worthwhile: While features that have one value per word (like PoS tags) can be easily embedded, relational features or features with multiple values per word (like semantic roles) need some kind of aggregation. While we chose summing over individual dimensions, various other approaches like convolution, recurrent layers or even self-attention might lead to better results. Tree-structured features like dependency trees might benefit from recursive encoding layers (Socher et al., 2011).

We conclude that linguistic input features provide meaningful information to neural QA models and that they improve performance on a general domain dataset. In the future, we plan to evaluate whether they might be employed for generalizing QA models to new specialized domains.

Acknowledgments

This work was supported in part by the H2020 project Prêt-à-LLOD under Grant Agreement number 825182.

References

Andrei Alexandrescu and Katrin Kirchhoff. 2006. Factored neural language models. In Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, June 4-9, 2006, New York, New York, USA.

Yoshua Bengio. 2009. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127.

Danqi Chen and Christopher D. Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, pages 740–750.

Qin Gao and Stephan Vogel. 2011. Utilizing target-side semantic role labels to assist hierarchical phrase-based machine translation. In Proceedings of the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation, SSST@ACL 2011, Portland, Oregon, USA, 23 June, 2011, pages 107–115.

Luheng He, Kenton Lee, Mike Lewis, and Luke Zettlemoyer. 2017. Deep semantic role labeling: What works and what's next. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 473–483.

Minghao Hu, Yuxing Peng, Zhen Huang, Xipeng Qiu, Furu Wei, and Ming Zhou. 2018. Reinforced mnemonic reader for machine reading comprehension. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, pages 4099–4106.

Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, and Kilian Q. Weinberger. 2016. Deep networks with stochastic depth. In Computer Vision - ECCV 2016 - 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part IV, pages 646–661.

Hsin-Yuan Huang, Chenguang Zhu, Yelong Shen, and Weizhu Chen. 2017. FusionNet: Fusing via fully-aware attention with application to machine comprehension. CoRR, abs/1711.07341.

Michael Kaisser and Bonnie Webber. 2007. Question answering based on semantic roles. In Proceedings of the Workshop on Deep Linguistic Processing, pages 41–48. Association for Computational Linguistics.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR, abs/1412.6980.

Terry Koo, Xavier Carreras, and Michael Collins. 2008. Simple semi-supervised dependency parsing. In ACL 2008, Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, June 15-20, 2008, Columbus, Ohio, USA, pages 595–603.

Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, and Richard Socher. 2016. Ask me anything: Dynamic memory networks for natural language processing. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1378–1387, New York, New York, USA. PMLR.

Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. 2016. Neural architectures for named entity recognition. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, USA, June 12-17, 2016, pages 260–270.

Tie-Yan Liu. 2009. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225–331.

Lluís Màrquez, Xavier Carreras, Kenneth C. Litkowski, and Suzanne Stevenson. 2008. Semantic role labeling: An introduction to the special issue. Computational Linguistics, 34(2):145–159.

Martha Palmer, Paul Kingsbury, and Daniel Gildea. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106.

Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, pages 1310–1318.

Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch. In NIPS-W.

Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, pages 1532–1543.

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 2383–2392.

Michael Roth and Kristian Woodsend. 2014. Composition of word representations improves semantic role labelling. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, pages 407–413.

Mark Sammons, V. G. Vinod Vydiswaran, Tim Vieira, Nikhil Johri, Ming-Wei Chang, Dan Goldwasser, Vivek Srikumar, Gourab Kundu, Yuancheng Tu, Kevin Small, Joshua Rule, Quang Do, and Dan Roth. 2009. Relation alignment for textual entailment recognition. In Proceedings of the Second Text Analysis Conference, TAC 2009, Gaithersburg, Maryland, USA, November 16-17, 2009.

Rico Sennrich and Barry Haddow. 2016. Linguistic input features improve neural machine translation. In Proceedings of the First Conference on Machine Translation, WMT 2016, colocated with ACL 2016, August 11-12, Berlin, Germany, pages 83–91.

Min Joon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2016. Bidirectional attention flow for machine comprehension. CoRR, abs/1611.01603.

Dan Shen and Mirella Lapata. 2007. Using semantic roles to improve question answering. In EMNLP-CoNLL 2007, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, June 28-30, 2007, Prague, Czech Republic, pages 12–21.

Yelong Shen, Po-Sen Huang, Jianfeng Gao, and Weizhu Chen. 2017. ReasoNet: Learning to stop reading in machine comprehension. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13-17, 2017, pages 1047–1055.

Richard Socher, Cliff Chiung-Yu Lin, Andrew Y. Ng, and Christopher D. Manning. 2011. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 - July 2, 2011, pages 129–136.

Wenhui Wang, Nan Yang, Furu Wei, Baobao Chang, and Ming Zhou. 2017. Gated self-matching networks for reading comprehension and question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, Canada, July 30 - August 4, Volume 1: Long Papers, pages 189–198.

Dekai Wu and Pascale Fung. 2009. Semantic roles for SMT: A hybrid two-pass model. In Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, May 31 - June 5, 2009, Boulder, Colorado, USA, Short Papers, pages 13–16.

Caiming Xiong, Victor Zhong, and Richard Socher. 2016. Dynamic coattention networks for question answering. CoRR, abs/1611.01604.

Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V. Le. 2018. QANet: Combining local convolution with global self-attention for reading comprehension. CoRR, abs/1804.09541.

Jie Zhou and Wei Xu. 2015. End-to-end learning of semantic role labeling using recurrent neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pages 1127–1137.