On Applying Controlled Natural Languages for Ontology Authoring and Semantic Annotation
Total Page:16
File Type:pdf, Size:1020Kb
Provided by the author(s) and NUI Galway in accordance with publisher policies. Please cite the published version when available. Title On Applying Controlled Natural Languages for Ontology Authoring and Semantic Annotation Author(s) Davis, Brian Patrick Publication Date 2013-02-07 Item record http://hdl.handle.net/10379/4176 Downloaded 2021-09-25T01:14:11Z Some rights reserved. For more information, please see the item record link above. On Applying Controlled Natural Languages for Ontology Authoring and Semantic Annotation Brian Patrick Davis Submitted in fulfillment of the requirements for the degree of Doctor of Philosophy PRIMARY SUPERVISOR: SECONDARY SUPERVISOR: Prof. Dr. Siegfried Handschuh Professor Hamish Cunningham National University of Ireland Galway University of Sheffield INTERNAL EXAMINER: Prof. Dr. Manfred Hauswirth National University of Ireland Galway EXTERNAL EXAMINER: EXTERNAL EXAMINER: Professor Chris Mellish Professor Laurette Pretorius University of Aberdeen University of South Africa Digital Enterprise Research Institute, National University of Ireland Galway. February 2013 i Declaration I declare that the work covered by this thesis is composed by myself, and that it has not been submitted for any other degree or professional qualification except as specified. Brian Davis The research contributions reported by this thesis was supported(in part) by the Lion project supported by Science Foundation Ireland under Grant No. SFI/02/CE1/I131 and (in part) by the European project NEPOMUK No FP6-027705. ii iii Abstract Creating formal data is a high initial barrier for small organisations and individuals wishing to create ontologies and thus benefit from semantic technologies. Part of the so- lution comes from ontology authoring, but this often requires specialist skills in ontology engineering. Defining a Controlled Natural Language (CNL) for formal data description can enable naive users to develop ontologies using a subset of natural language. How- ever despite the benefits of CNLs, users are still required to learn the correct syntactic structures in order to use the Controlled Language properly. This can be time consum- ing, annoying and in certain cases may prevent uptake of the tool. The reversal of the CNL authoring process involves generation of the controlled language from an existing ontology using Natural Language Generation (NLG) techniques, which results in a round trip ontology authoring environment: one can start with an existing imported ontology (re)produce the CNL using NLG, modify or edit the text as required and subsequently parse the text back into the ontology using the CNL authoring environment. By in- troducing language generation into the authoring process, the learning curve associated with the CNL can be reduced. While the creation of ontologies is critical for the Se- mantic Web, without a critical mass of richly interlinked metadata, this vision cannot become a reality. Manual semantic annotation is a labor-intensive task requiring training in formal ontological descriptions for the otherwise non-expert user. Although automatic annotation tools attempt to ease this knowledge acquisition barrier, their development often requires access to specialists in Natural Language Processing (NLP). This chal- lenges researchers to develop user-friendly annotation environments. While CNLs have been applied to ontology authoring, little research has focused on their application to se- mantic annotation. In summary, this research applies CNL techniques to both ontology authoring and semantic annotation, and provides solid empirical evidence that for certain scenarios applying CNLs to both tasks can be more user friendly than standard ontology authoring and manual semantic annotation tools respectively. iv v Acknowledgements I would first like to thank Professor Hamish Cunningham, University of Sheffield, for giving me the opportunity to pursue postgraduate study and start an academic career at DERI, NUI Galway. In particular, I want to thank him and the GATE team for their support and invaluable training, Most importantly, I would like to thank my supervisor, Prof. Dr. Siegfried Handschuh for his patience, guidance and mentorship with respect to scientific writing, research project management, teaching, proposal writing, research leadership etc and for giving me countless opportunities to development my research skills and academic career. Other academics, I would like to thank include Prof. Dr. Stefan Decker, for his inspiration, Dr Conor Hayes for his practical advice on research project management and Dr Paul Buitelaar for being an veritable treasure chest of natural language processing experience. I would like to thank Prof. Chris Mellish and Prof Laurette Pretorius, (who made the long journey from South Africa) and Prof. Dr. Manfred Hauswirth for ensuring that the final submission is in pristine condition, such that in later years I will be able to glance through this thesis with pride! DERI (now INSIGHT) has and continues to be a stimulating environment to study and work in. I have met so many wonderful, intelligent and helpful people and made so many friends and acquaintances over the years that to list them all would be impossible! I want in particular thank the old Nepomuk/SMILE team past and present: Tudor, Knud, Laura, Cipri, Georgeta, Ioana, Ismael, Simon, VinhTuan, Alex, Pradeep, Vit, Manuela, vi Keith, Judy and Jeremy, who have all gone on to bigger and better things! A particular thanks to Behrang for always raising the NLP bar with his vast knowledge and wonderful discussions! Thanks to Luk, Laleh and Bahareh for tea, friendship and the occasional needed kick in the backside! Thanks to the DERI Operations, Administrative and Technical team over the years, which without, DERI would simply have ground to a halt: Hilda, Claire, Brian Wall, Ger and Andrew (and Caragh!) Gallagher, Carmel, Michelle and Sylvia and all the Irish not forgetting Ed and Sean! You have put up with a lot over the years! To my other baby OCD Ireland, special thanks to the team past and present and my good friend Leslie for her wisdom over the years and for taking on so much extra work in the last year so I could finish thesis! A special mention to Prof. Fionnuala as well! To all my childhood friends in Dublin, Neil, Ian and Ross, Eoin and Mark, who had no idea what I was studying and why it took so long, but have made me feel extremely proud! To staff of the former M. Davis and Co. for always making me feel welcome and not forgetting Rose and the infamous Mr Kitt! I want to thank my mother and father, Louis and Mary for their unconditional love, support, understanding and patience over the years, as well as my brother Colm and sister Louise. I want to acknowledge Mark and Yvonne, and my army of nieces and nephews, Caragh, Eoghan, Amelie and my Goddaughters Niamh and Ella and all the extended family!! I also want to thank my fianc´eAlessandra, who has always been a steady force of love, patience and much needed motivation to finish, this thesis even when I resisted! Finally, I want to mention my father, Louis Davis who passed away unexpectedly a year before this thesis was defended. I miss you very much Dad, and I wish you were here to celebrate with me. vii viii Dedicated to my father, Louis Michael Davis (1940 - 2012). x Table of Contents Abstract iv Acknowledgements v Table of Contents xi List of Figures xiv List of Tables xviii I Prelude 1 1 Introduction 2 1.1 Motivation and Problem Description . .... 2 1.2 ResearchQuestions............................... 6 1.3 Contributions.................................. 7 1.4 ThesisStructure ................................ 8 II Foundations 10 2 Human Language Technology and the Semantic Web 11 2.1 TheSemanticWebandLinkedData . 11 2.2 HumanLanguageTechnology . 25 xi 2.3 InformationExtraction. 28 2.4 Ontology based Information Extraction (OBIE) . ....... 40 2.5 Natural Language Generation . 46 2.6 Natural Language Generation from Ontologies . ...... 53 2.7 Conclusions ................................... 59 3 Controlled Natural Languages and Semantic Annotation 63 3.1 Control Natural Languages for the Semantic Web . ...... 63 3.2 SemanticAnnotation.. .. .. .. .. .. .. .. .. 74 3.3 Conclusions ................................... 87 III Core Research 92 4 CLOnE - Controlled Language for Ontology Editing 93 4.1 Introduction................................... 93 4.2 Requirements and Implementation . 95 4.3 Evaluation.................................... 101 4.4 RelatedWork.................................. 110 4.5 Conclusion ................................... 113 5 Round Trip Ontology Authoring 115 5.1 Introduction................................... 115 5.2 DesignandImplementation . 117 5.3 Evaluation.................................... 127 5.4 Relatedwork .................................. 137 5.5 Conclusion&Discussion. 143 6 Towards Controlled Natural Language for Semantic Annotation 146 6.1 Introduction................................... 146 6.2 CLANN: Design and Implementation . 154 xii 6.3 Evaluation.................................... 171 6.4 Relatedwork .................................. 184 6.5 ConclusionandFutureWork . 188 IV Conclusion and Future Work 191 7 Conclusions and Future Work 192 7.1 ResearchContributions . 192 7.2 OpenQuestions................................. 195 7.3 OngoingWorkwithCLANN . 196 7.4 FutureWork .................................. 197 7.5 Summary .................................... 199 Bibliography 201 A Evaluation documents for CLIE/CLOnE 231 A.1 Trainingmanual