An Approach to Managed Clustering for Knowledge-Based Networks
A thesis submitted to the University of Dublin, Trinity College
for the degree of Doctor of Philosophy

Dominic Hugh Jones, B.Sc. (Hons), M.Sc.
Knowledge and Data Engineering Group (KDEG)
School of Computer Science and Statistics
Trinity College, Dublin
[email protected]

Supervised by: Dr. David Lewis
Co-Supervised by: Dr. Declan O’Sullivan

Declaration and Online Access

I declare that this thesis has not been submitted as an exercise for a degree at this or any other university and it is entirely my own work. I agree to deposit this thesis in the University’s open access institutional repository or allow the library to do so on my behalf, subject to Irish Copyright Legislation and Trinity College Library conditions of use and acknowledgement.

______________________
Dominic Hugh Jones
April 2013

Acknowledgements

I would first like to thank Dr. David Lewis for providing me with this opportunity to pursue the degree of Doctor of Philosophy. Dr. Lewis has been supportive, critical and encouraging, and has become both a friend and a mentor over my time in KDEG; it has been an honour to work with him. Dr. Lewis has been able to offer me funding through Science Foundation Ireland under Grant No. 05/RFP/CMS014 (Mecon) and Grant No. 07/CE/I1142 (CNGL). For this I am extremely appreciative.

It has been a privilege to have the co-supervision of Dr. O’Sullivan, who offered me advice and support that have been invaluable over the years.

Special thanks to Dr. John Keeney, Dr. Alexander O’Connor, Dr. Seamus Lawless, Dr. Owen Conlan and Mrs. Mary Sharp, who have supported and encouraged me throughout the years and made the difference to many a day.

It was Ms. Marie Carroll who guided me through my undergraduate study, provided endless cups of tea and was there when needed. Marie will always be a great friend, who started me on my path through university-level education and is herself an asset to education.
However, most importantly, I would like to thank my parents, Jan and Hughie Jones, as well as my de facto American Mom, Dr. P. Jane Gale. Without their combined support this thesis could easily have disappeared during its final year, in which they picked me up, stood by me, guided me and had faith in me.

“Insanity is doing the same thing, over and over again, but expecting different results.”
- Possibly Albert Einstein, possibly not.

Abstract

In publish/subscribe networks the capacity of the delivery network can easily be exceeded as the volume and range of content increases. This is known as the flooding problem. One solution to this problem is to group publishers and subscribers with similar interests into clusters. A publish/subscribe network typically operates across a broker overlay. The work described in this thesis offers a novel solution to the flooding problem based on reducing the number of brokers between related publishers and subscribers, thus reducing the routing message volume. With fewer brokers between publisher and subscriber, fewer hops are required to route each message and ultimately the network is more efficient, with fewer resources required to deliver the same number of messages.

This thesis describes the design, implementation and evaluation of a prototype system in which a Semantic-based Publish/Subscribe network is used to cluster publishers of content with the message brokers that are responsible for routing content to subscribers with related interests. The objective of this clustering is to reduce the number of hops required in the routing process. This contribution has been implemented as part of this thesis in a prototype, KBNCluster, which provides a platform for the clustering of KBNImpl. This clustering has been shown to reduce the hop count required to deliver publications to subscribers and thus to decrease the time taken to deliver publications.
A further contribution of this work is the automatic calculation, placement and re-clustering of publishers and subscribers around their interests using a Policy-based Network Management approach. This contribution is evaluated in terms of management data collection costs, storage and policy execution and is shown to scale efficiently.

This work is novel in the way in which semantic interests are calculated from a client’s publication or subscription, using a semantic map of concepts to calculate a single semantic point deemed representative of the client. An additional source of novelty in this work is the use of the network’s own Semantic-based Publish/Subscribe capability to deliver management messages to the management system using a push-based approach, in contrast to more standard pull-based architectures.

Table of Contents

Declaration and Online Access
Acknowledgements
Abstract
Table of Contents
List Of Figures
List Of Tables
List Of Code Examples
Abbreviations

1 INTRODUCTION
  1.1 Research Question
  1.2 Research Objectives
  1.3 Methodological Approach Taken
  1.4 Evaluation Findings
  1.5 Thesis Contribution
  1.6 Selected publications
  1.7 Thesis Outline
2 BACKGROUND
  2.1 Key Terms
  2.2 Semantics
    2.2.1 RDF and Ontologies
  2.3 Publish/Subscribe Middleware
    2.3.1 Topic-based Publish/Subscribe
    2.3.2 Content-based Publish/Subscribe
    2.3.3 Knowledge-based Networks in comparison to CBNs
  2.4 Policy-based Network Management (PBNM)
  2.5 Conclusion
3 STATE OF THE ART
  3.1 Content-based Networks
    3.1.1 Siena
    3.1.2 Hermes
    3.1.3 Gryphon
    3.1.4 Elvin
    3.1.5 Analysis
      3.1.5.1 Advantages of CBNs
      3.1.5.2 Disadvantages of CBNs
  3.2 Semantic-based Publish/Subscribe (SBPS)
    3.2.1 Knowledge-based Networks
      3.2.1.1 KBNImpl