Linked Data Meets Computational Intelligence Position Paper

Linked Data Meets Computational Intelligence Position paper Christophe Gueret´ Vrije Universiteit Amsterdam De Boelelaan 1105 1081 HV Amsterdam, Nederlands Abstract Data source Number of triples US Census data ≈1 billion The Web of Data (WoD) is growing at an amazing rate and DBPedia ≈274 million it will no longer be feasible to deal with it in a global way, MusicBrainz ≈36 million by centralising the data or reasoning processes making use of DBLP ≈10 million that data. We believe that Computational Intelligence tech- ≈ niques provides the adaptiveness, robustness and scalability WordNet 2 million that will be required to exploit the full value of ever growing amounts of dynamic Semantic Web data. Table 1: Example of publicly available data sets and their respective size. Introduction The Web of Data (WoD) is growing at an amazing rate as 2. Privacy and coherence problems of centralisation more and more data-sources are being made available on- Data providers may see privacy issues in providing their line in RDF, and linked. Because of its size and dynamicity data to a central store. This is particularly true for in- it is no more possible to consider the WoD as a large and formation related to social networks which is, by na- static information system. It should instead be considered a ture, highly personal. People providing social informa- complex system in constant evolution. Thus, new algorithms tion about themselves and their friends want to keep con- will have to be developed to cope with this new context. trol over their data. For these reasons, decentralisation is considered to be the future of social networks (Yeung et The decentralized nature, and future, of the WoD al. 2009). As the WoD grows we believe a centralized approach where Additionally, the centralisation of data also raise compat- data is aggregated into large databases will no longer be pos- ibility issues. Data sources using different schemas are sible. This assertion is based on the following 3 observa- better off kept separated and made compatible through tions: a translation layer rather than translated into a single 1. The massive amount of data and data sources schema and merged into one data store. A Peer Data Man- Data stores have become increasingly efficient and can agement System (PDMS) (Tatarinov et al. 2003) is an ex- now deal with billions of triples but the number of differ- ample of such a decentralised system. A PDMS is defined ent data sources is also growing rapidly. As the number of by a set of autonomous peers hosting data according to data stores increases, incoherences, uncertainty and con- different schemas they relate through semantic mappings. tradictions are more likely to appear. As an illustration For new peers willing to share data, PDMS has the advan- of these size issues, Table 1 shows examples of publicly tage of a low entry cost, which is limited to the definition available datasets. of mappings with one of the peers already present. The execution of federated queries over live SPARQL One may also consider having data physically centralised endpoints is known to be extremely expensive, because but kept de-centralised. This model is that of a single data known optimizations (for example to deal with joins) do store hosting several named graphs or that of a cloud- not work in the distributed case. Instead, snapshots are based architecture hosting several stores. Although pre- taken at intervals, dumped into gigantic repositories and serving the provenance/context information, this model made available in database style for querying. But both leads to having the data being served by a single store. the exponential increase of figures and the risk for con- As such, privacy concerns may still be present. flicts creates a serious threat to the viability of this ap- 3. Opaque data locality proach. The recent development of cloud computing has high- Copyright c 2010, Association for the Advancement of Artificial lighted an interesting fact about linked data consumers Intelligence (www.aaai.org). All rights reserved. and providers: their interest is in having data available 59 somewhere and, more importantly, always accessible - global efficiency of the entire population, making the pop- not in knowing where or how it is stored or processed. ulation robust against individual failures. And this is exactly what cloud computing provides: an It is widely recognised that new adaptive approaches to- abstraction over physical constraints related to data serv- wards robust and scalable reasoning are required to exploit ing. Once put in a cloud, such as Amazon EC2 (Amazon the full value of ever growing amounts of dynamic Semantic Web Services 2009), data is made available in a scalable Web data (Fensel and Harmelen 2007). Computational In- fashion. The addition/removal of nodes in the cloud is telligence provides the features that algorithms dealing with a transparent operation for the user which sees the data the WoD will have to exhibit. provider as a single entity. Cloud computing comes close to the idea of Tuple-space This paper based middleware such as the Triple Space Comput- ing (Fensel 2004). Such middleware creates a persistent This paper highlights some of the research that has been shared information space every peer can read from and conducted so far to apply CI to Semantic-Web problems. write data to in an asynchronous way. Tasks such as sort- Although this combination is a recent field of study, some ing & storing that data or dealing with failure of stor- achievements have already been made: the eRDF evolution- age components are in the charge of autonomic entities ary algorithm makes it possible to query live SPARQL end- that exhibit several self-* behaviours (Kephart and Chess points without any pre-processing, such as the creation of an 2003). index or the retrieval of data dumps; by gossiping semantic information a PDMS can self-organise and improve its We believe that the field of Computational Intelligence interoperability at large; a swarm of agents taking care of provides a set of techniques that can deal with the decen- the localisation of data within a Triple Space produces opti- tralised nature of the future WoD. Before discussing how mised data clusters and query routes. Computational Intelligence applies to the Semantic Web, we All of these applications and some others will be detailed briefly introduce the field. in the rest of the paper. A particular emphasis will be put on the joint work currently conducted by the CI and Knowledge Computational intelligence Representation and Reasoning groups of the Vrije Univer- Computational Intelligence (CI) is a branch of AI focused siteit of Amsterdam. We are particularly interested in fos- on heuristic algorithms for learning, adaptation and evolu- tering emergence and self-organisation (Wolf and Holvoet tion. Evolutionary computation and collective intelligence 2005) and turning data-related tasks into anytime optimisa- are among the most popular paradigms of CI. They provide tion problems. particular advantages : • Learning and adaptation: the performance of CI algo- Emergence and self-organisation in rithms improves during their execution. As good results decentralised knowledge systems are found, they learns why these results are good and/or how to find similar ones. This learning can also cope with One can define emergence as the apparition of global be- changing conditions and consequently adaptat. haviour/structure/pattern from local interaction between el- • ements of a system (Wolf and Holvoet 2005). The use of Simplicity: CI design principles are simple. For instance, emergence can be compared to the divide&conquer-like ap- an evolutionary algorithm is based on the survival of the proach commonly used for distributed computing. In di- fittest: in an iterative process, solutions are guessed, ver- vide&conquer, a supervisor divides a global task into sub- ified and deleted if they are not fit. The expected result, task, send it to a set of computing resources and coordinate that is to find an optimal solution to the problem, comes the results (Figure 1). as a consequence of the basic mecanism. This bottom-up approach differs from the complex top-down approaches commonly used to deal with hard problems. Coordinator Result • Interactivity: CI techniques are used in a continously running, typically iterative, process. This implies two things: First, at any-time, the best result found so far can be returned as an answer to the posed problem. Secondly, Peer Peer Peer CI algorithms can be made interactive and incorporate a user into the loop. This interaction is of great benefit when the quality of a result is difficult to appreciate in an auto- Figure 1: Divide and conquer resolution. A central coordi- mated way (Takagi 2001). nator direct peers toward a specific goal • Scalability, robustness and parallelisation: all of these three advantages result from using a population of co- Architectures focused on emergence define the local in- evolving solutions instead of focusing on only one. Be- teraction between entities. The global goal to achieve is un- cause each member of the population is independent, al- known to the entities, nor does one entity have as a com- gorithms are easy to parallelise. Also, the bad perfor- plete view of the solution at any moment. The expected goal mance of some members will be compensated by the emerges from the local interactions (Figure 2). 60 Peer Peer query routing, these traces are used to efficiently dissemi- Result Result nate information. Ants carying a piece of information will follow paths on which traces are the most similar to the car- Peer ried item. Result Note that collective intelligence has already proven to be efficient for network routing (Dorigo and Caro 1998) and data clustering (Deneubourg et al. 1990). The work reported Figure 2: Global properties emerge from local interaction in this section shows that these techniques can be combined among the peers with Semantic Web technology in order to make a better use of decentralised data.

Load more