
UNIGE SE @ PRELEARN: Utility for Automatic Prerequisite Learning from Italian Wikipedia

Alessio Moggio, Andrea Parizzi
DIBRIS, Università degli Studi di Genova
{s4062312, s4048705}@studenti.unige.it

Abstract

The present paper describes the approach proposed by the UNIGE SE team to tackle the EVALITA 2020 shared task on Prerequisite Relation Learning (PRELEARN). We developed a neural network classifier that exploits features extracted both from raw text and from the structure of the Wikipedia pages provided by the task organisers as training sets. We participated in all four sub-tasks proposed by the task organisers: the neural network was trained on a different set of features for each of the two training settings (i.e., raw and structured features) and evaluated in all proposed scenarios (i.e., in-domain and cross-domain). When evaluated on the official test sets, the system improved over the provided baselines, even though it ranked third (out of three participants). This contribution also describes the interface we developed to compare multiple runs of our models.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Prerequisite relations constitute an essential relation between educational items, since they express the order in which concepts should be learned by a student to allow a full understanding of a topic. Automatic prerequisite learning is therefore a relevant task for the development of many educational applications.

Prerequisite Relation Learning (PRELEARN) (Alzetta et al., 2020), a shared task organised within EVALITA 2020, the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (Basile et al., 2020), addresses automatic prerequisite relation learning between pairs of concepts. For the purposes of the shared task, concepts are represented as learning materials written in Italian: each concept corresponds to a page of the Italian Wikipedia having the concept name as title. The goal of the shared task is to build a system able to automatically identify the presence or absence of a prerequisite relation between two given concepts. The task is divided into four sub-tasks: to make a valid submission, participants are asked to build at least one model for automatic prerequisite learning, to be tested in both in-domain and cross-domain scenarios, since the task organisers released four official training sets, one for each domain of the dataset. The model can exploit either 1) information extracted from the raw textual content of Wikipedia pages, or 2) information acquired from any kind of structured knowledge resource (excluding the prerequisite-labelled datasets). We submitted our results on the official test sets for all four proposed sub-tasks.

To tackle the problem proposed in the shared task, we adopt a deep learning approach that classifies concept pairs using different sets of features, in order to comply with the requirements of each sub-task. We also developed a user interface to support the comparison between the results obtained by running the model with different sets of features. Besides selecting which features should be used to train the model, the user can exploit the interface to set the values of a number of parameters that customise the classifier structure. For each run, the interface reports standard evaluation metrics (i.e., accuracy, precision, recall and F-score) and other statistics that allow the user to explore the model's performance.

The remainder of the paper is organised as follows: we present our approach and system in Section 2, then we discuss the results and evaluation (Section 3). Section 4 describes the interface in detail. We conclude the paper in Section 5.
2 System Description

In this section we present our approach for automatic prerequisite learning between Wikipedia pages. We exploited a deep learning model that can be customised by the user through a dedicated GUI. The model was trained and tested on the official dataset of the PRELEARN task.

ITA-PREREQ (Miaschi et al., 2019) is a binary labelled dataset in which the labels stand for the presence or absence (1 or 0) of a prerequisite relation between a pair of concepts. Each concept is an educational item associated with a Wikipedia page, so the concept name matches the title of the corresponding Wikipedia page. Hence, the dataset released for the shared task also includes the content and the links of the Wikipedia pages referring to the concepts appearing in the dataset. It covers four domains, namely precalculus, geometry, physics and data mining.

[Figure 1: System architecture]

2.1 Classifier

The classifier was built with the aim of testing combinations of different hand-crafted features on the automatic prerequisite learning task. More specifically, our classifier, whose architecture is described in Figure 1, is a neural network with two dense layers, built using the Scikit-Learn and Keras libraries (with Keras running on top of TensorFlow). The activation function of the hidden layer is ReLU, and the Adam optimiser (Kingma and Ba, 2014) is used as the training algorithm. The output layer consists of a single neuron with sigmoid activation.

Some structural properties of the classifier can be customised by the user from the dedicated GUI. Concerning the structure of the neural network, the user can define the size of the hidden layer and the number of epochs, while for the evaluation the user can set the number of cross-validation folds. Moreover, training can be performed on a customisable set of features (see Section 2.2 for the complete list), since the input layer is set to dynamically match the size of the feature vector. For the specific purposes of this work, we used in every scenario a model with a 20-neuron hidden layer trained for 15 epochs. A 4-fold cross validation was used for the in-domain scenario.

Training. The official training set, containing concept pairs and their binary labels, was formatted as a pair of numpy arrays: one, of variable length, contains the serialised features and serves as the model input, while the other contains the binary labels of the pairs. For the in-domain scenario, the model was trained using stratified random folds of concept pairs that preserve the original proportion of each domain's pairs. For the cross-domain evaluation scenario, a "leave one domain out" approach was used, training the model on all domains except the one used for testing.
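As a reference, the following is a minimal sketch of the network described in Section 2.1, written in plain Keras (the actual system wraps Keras with Scikit-Learn utilities); the function name and argument names are illustrative, not the authors' code.

```python
# Minimal sketch of the two-dense-layer classifier described above,
# assuming plain Keras; names and defaults are illustrative.
from tensorflow import keras

def build_classifier(n_features: int, hidden_size: int = 20) -> keras.Model:
    """ReLU hidden layer, single sigmoid output neuron for the 0/1 label."""
    inputs = keras.Input(shape=(n_features,))   # sized to the feature vector
    hidden = keras.layers.Dense(hidden_size, activation="relu")(inputs)
    output = keras.layers.Dense(1, activation="sigmoid")(hidden)
    model = keras.Model(inputs, output)
    # Adam optimiser; binary cross-entropy fits the binary prerequisite labels.
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Example usage with the 15-epoch setting reported in the paper:
# model = build_classifier(n_features=X_train.shape[1])
# model.fit(X_train, y_train, epochs=15)
```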

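Similarly, the two evaluation splits described in the Training paragraph can be sketched as follows; `X`, `y` and `domains` are assumed numpy arrays holding the feature vectors, the binary labels, and the domain name of each concept pair (hypothetical variable names).

```python
# Sketch of the two evaluation splits described above; variable names
# and the stratification-on-domains reading are assumptions.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def in_domain_folds(X, y, domains, n_splits=4):
    # 4-fold CV with folds stratified on the domain labels, so each fold
    # preserves the original proportion of each domain's pairs.
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(X, domains):
        yield (X[train_idx], y[train_idx]), (X[test_idx], y[test_idx])

def leave_one_domain_out(X, y, domains, held_out):
    # Cross-domain setting: train on all domains except `held_out`.
    test = domains == held_out
    return (X[~test], y[~test]), (X[test], y[test])
```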
2.2 Features

We defined a set of features extracted from the content and structure of the Wikipedia pages; they are available in the GUI and can be selected by the user to train their model. While the page contents were provided in the official release of the training set, we exploited the Wikipedia API (https://github.com/martin-majlis/Wikipedia-API) to extract the structure of the Wikipedia pages (i.e., categories and links). Depending on the sub-task requirements, we trained our models with different combinations of features; a sketch of some of these features is given after the lists below.

• Features used for the raw features model:

  – titleInText: given a pair (A, B), it checks whether the title of page A/B is mentioned in the page of the other concept.
  – Jaccard similarity: a concept-based metric that measures the similarity between two pages by the number of words shared between them.
  – LDA: the Shannon entropy of the LDA (Deerwester et al., 1990) of nouns and verbs in A and B. Nouns and verbs are identified through a morpho-syntactic analysis of the page content performed with the UDPipe pipeline (Straka and Straková, 2017).
  – LDA Cross Entropy: the cross entropy between the LDA vectors of A and B.

• Features used for the raw and structural features model. We exploited all the above features combined with the following:

  – extractCategories: the Wikipedia category(-ies) to which each page of the pair (A, B) belongs.
  – extractLinkConnections: for each pair of concepts (A, B), it checks whether the Wikipedia page of B contains a link to A.
  – totalIncoming/OutgoingLinks: it computes how much a concept is linked to/from other concepts.
  – Reference distance: a link-based metric that measures the relation between two pages by the links contained in each of them, using the EQUAL weight (Liang et al., 2015).
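To make the feature definitions concrete, here is a hedged sketch of three of them: titleInText, Jaccard similarity, and the EQUAL-weighted reference distance. The page representation and helper names are assumptions rather than the authors' code; the reference distance follows the definition of Liang et al. (2015).

```python
# Illustrative implementations of three of the features above; the data
# representation (dicts with "title"/"text", and a `links` mapping from
# each concept to the set of concepts its page links to) is an assumption.

def title_in_text(page_a: dict, page_b: dict) -> bool:
    # titleInText: is the title of either page mentioned in the other's text?
    return (page_a["title"].lower() in page_b["text"].lower()
            or page_b["title"].lower() in page_a["text"].lower())

def jaccard_similarity(page_a: dict, page_b: dict) -> float:
    # Word-overlap similarity between the vocabularies of the two pages.
    words_a = set(page_a["text"].lower().split())
    words_b = set(page_b["text"].lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

def refd_equal(a: str, b: str, links: dict) -> float:
    # Reference distance with EQUAL weights (Liang et al., 2015):
    # fraction of B's linked concepts that link to A, minus the
    # fraction of A's linked concepts that link to B.
    def frac(x: str, y: str) -> float:
        out = links.get(x, set())
        if not out:
            return 0.0
        return sum(1 for c in out if y in links.get(c, set())) / len(out)
    return frac(b, a) - frac(a, b)
```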
3 Results and Error Analysis

Table 1 reports the results obtained by our models on the runs submitted for all four sub-tasks. On average, the performance of our systems in the different scenarios shows that, as expected, training the model in an in-domain scenario yields better results. Among the different sets of features used, exploiting structural information extracted from the Wikipedia pages' structure is in general more effective than relying only on raw textual data. Accordingly, our best performing model is the one exploiting both raw and structural features evaluated in the in-domain scenario, which achieves an average accuracy of 0.700 across all four domains. Interestingly, Data Mining constitutes the only case where raw textual features are more effective than a combination of raw and structural features. In fact, this domain shows lower accuracies in the structured settings, possibly due to the lower number of entries in the Data Mining dataset with respect to the other three domains and to its lower coverage in Wikipedia.

If we compare our results for each domain with those obtained by the official baseline, there are only two cases where our models do not outperform the baseline, i.e. Geometry in the raw features in-domain sub-task and Physics in both cross-domain sub-tasks. During error analysis on Geometry pairs, we observed that, while pages about geometric figures, e.g. "Rettangolo" and "Poligono", show a prerequisite relation in the gold dataset, our systems always fail to classify them correctly. Concerning Physics, we observed that in both cross-domain settings the classifier did not consider the page "Fisica" as a prerequisite of other pages belonging to the Physics domain, causing the performance to fall below the baseline.

If we look at the variation of accuracy for each model with respect to the classifier confidence (see Figure 2), we notice that, although the four systems have similar accuracy when the confidence is low, the two in-domain models show a similar increase in accuracy as the confidence grows. Comparing the cross-domain settings, we notice that only the structured one is able to reach higher accuracy, but only when it is highly confident.

[Figure 2: Variation of correctly labelled pairs with respect to the classifier confidence for all submitted models.]

Sub-Task                   Data Mining   Geometry   Physics   Precalc   AVG
Raw Feats in-domain        0.595         0.620      0.530     0.675     0.605
Raw+Struct in-domain       0.565         0.755      0.725     0.755     0.700
Baseline in-domain         0.494         0.675      0.500     0.675     0.586
Raw Feats cross-domain     0.565         0.515      0.465     0.595     0.535
Raw+Struct cross-domain    0.545         0.665      0.560     0.710     0.620
Baseline cross-domain      0.494         0.500      0.605     0.500     0.525

Table 1: Results obtained for each sub-task by our models and the baseline on the PRELEARN official test sets.

4 System Interface

Together with our system we also developed a user interface aimed at customising the network and comparing the results obtained with different models. The interface is composed of the following three modules: i) setup module; ii) results module; iii) statistics module.

The setup module, loaded at the start of the program, allows the user to define:

• The input dataset;
• The parameters to set up the neural network architecture;
• The features for training the model.


The module also includes a table where previously saved configurations can be selected in order to run them again.

After running the model, the user can reach the results module, which displays the performance statistics (accuracy, precision, recall, F-score) achieved by the executed configuration. Besides, the results module provides buttons that allow the user to:

• Save the executed configuration.

• See the results of the classifier on the labelling of concept pairs.
• Save and download the results as a CSV or TXT file.

The statistics module plots, in four bar charts, the accuracy, precision, recall and F-score values of all the configurations saved in the interface. The repository containing the system and its GUI is available on GitHub (https://github.com/mnarizzano/se20-project-16).

5 Conclusion

In this paper we described the approach proposed by the UNIGE SE team for the EVALITA 2020 PRELEARN shared task. The classifier relied on a set of features that was customised to address the specific requests of each sub-task. The results obtained by our models are all above the baseline (when averaging the accuracies across all domains), although in some cases the results obtained by the baseline are still highly competitive. This suggests that automatic prerequisite learning is a difficult task requiring many different types of information to train the models. However, the obtained results suggest that, at least in an in-domain setting, features extracted from raw texts are sufficient to achieve competitive results, while in the cross-domain setting exploiting only this type of features is not enough. In any case, using information extracted from knowledge structures leads to better results in all sub-tasks. Although the results we obtained are promising, future work will focus on analysing the impact of each feature on model training and on exploring the inclusion of new features to improve the performance of the classifier.

References

Chiara Alzetta, Alessio Miaschi, Felice Dell'Orletta, Frosina Koceva, and Ilaria Torre. 2020. PRELEARN@EVALITA 2020: Overview of the prerequisite relation learning task for Italian. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. 2020. EVALITA 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for Italian. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.

Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Chen Liang, Zhaohui Wu, Wenyi Huang, and C. Lee Giles. 2015. Measuring prerequisite relations among concepts. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1668–1674.

Alessio Miaschi, Chiara Alzetta, Franco Alberto Cardillo, and Felice Dell'Orletta. 2019. Linguistically-driven strategy for concept prerequisites learning on Italian. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 285–295.

Milan Straka and Jana Straková. 2017. Tokenizing, POS tagging, lemmatizing and parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 88–99.