From the knowledge acquisition bottleneck to the knowledge acquisition overflow: A brief French history of knowledge acquisition Nathalie Aussenac-Gilles, Fabien Gandon

To cite this version:

Nathalie Aussenac-Gilles, Fabien Gandon. From the knowledge acquisition bottleneck to the knowl- edge acquisition overflow: A brief French history of knowledge acquisition. International Journal of Human-Computer Studies, Elsevier, 2013, vol. 71 (n° 2), pp. 157-165. ￿10.1016/j.ijhcs.2012.10.009￿. ￿hal-01124416￿

HAL Id: hal-01124416 https://hal.archives-ouvertes.fr/hal-01124416 Submitted on 6 Mar 2015

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Open Archive TOULOUSE Archive Ouverte (OATAO) OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible.

This is an author-deposited version published in : http://oatao.univ-toulouse.fr/ Eprints ID : 12333

To link to this article : DOI :10.1016/j.ijhcs.2012.10.009 URL : http://dx.doi.org/10.1016/j.ijhcs.2012.10.009

To cite this version : Aussenac-Gilles, Nathalie and Gandon, Fabien From the knowledge acquisition bottleneck to the knowledge acquisition overflow: A brief French history of knowledge acquisition. (2013) International Journal of Human-Computer Studies, vol. 71 (n° 2). pp. 157-165. ISSN 1071-5819

Any correspondance concerning this service should be sent to the repository administrator: [email protected] From the knowledge acquisition bottleneck to the knowledge acquisition overflow: A brief French history of knowledge acquisition

Nathalie Aussenac-Gillesa,n, Fabien Gandonb

aIRIT - CNRS, Toulouse, France bINRIA Wimmics, Sophia Antipolis, France

Abstract

This article is an account of the evolution of the French-speaking research community on knowledge acquisition and knowledge modelling echoing the complex and cross-disciplinary trajectory of the field. In particular, it reports the most significant steps in the parallel evolution of the web and the knowledge acquisition paradigm, which finally converged with the project of a semantic web. As a consequence of the huge amount of available data in the web, a paradigm shift occurred in the domain, from knowledge-intensive problem solving to large-scale data acquisition and management. We also pay a tribute to Rose Dieng, one of the pioneers of this research community.

Keywords: Knowledge modelling; French research; Knowledge-based systems

In this article we give a localized account of the with the research trends and paradigm shifts which have evolution of the domain of knowledge acquisition (KA) characterized the KA domain. that Brian Gaines has presented in a broader perspective in With this paper, we also wish to pay tribute to the his contribution to this special issue (Gaines, this issue). generosity, the scientific talent and the unforgettable smile We contrast the evolution of KA with the parallel evolu- of one of the pioneers in the French and international KA tion of the Web and indeed, in the last 10 years, knowledge research communities, our colleague Rose Dieng. engineering as a research domain and the Web have converged in particular in the Semantic Web project and 1. When AI requires knowledge acquisition the current Web of Data. Here we describe the evolution of the French-speaking conference on knowledge acquisition The French AI scientific groups interested in building rule- and knowledge modelling in order to give an overview and based systems or learning systems highlighted knowledge a brief history of the domain, which has at times been acquisition as a research issue as early as 1986. Pioneer hidden by the language barrier. This evolution echoes the researchers and engineers from innovative companies experi- complex and cross-disciplinary trajectory of the field menting expert systems (like CEA and EDF) organized an presented in Brian Gaines’ exhaustive outline and is also informal scientific meeting in 1988, whereas J.G. Ganascia consistent with Musen’s historical outlook on the last 25 and Y. Kodratoff co-chaired one of the first EKAW work- years of the international workshops (Musen, this issue). shops in Paris in 1989. At times one or two years earlier or later, the French KA The French-speaking conference on knowledge engineer- conferences reveal a strong convergence and consistency ing is called IC, for Inge´nierie des Connaissances (). It started soon after, in 1990 and was known nCorresponding author. at the time as the JAC, for Journe´e d’Acquisition des E-mail addresses: [email protected] (N. Aussenac-Gilles), Connaissances, literally the knowledge acquisition day. [email protected] (F. Gandon). Approximately at the same time the idea of Web was born at CERN. The acquisition problem is directly inherited of a ‘‘file structure for the complex, the changing, and the from expert systems and focuses on capturing the knowl- indeterminate’’ information to the scale of the Internet, edge needed to feed them. Research on Knowledge Acquisi- creating a whole new social medium of knowledge. tion was one of the two trends motivated by the limitations In 1991 la JAC became les JAC (Knowledge Acquisition faced by expert systems, the other one being the definition Days) spanning several days and the program focused on the of richer logic-based knowledge representations. methodologies for the acquisition and modelling of knowledge In 1990, the first JAC (Knowledge Acquisition Day) was including references to methods like KADS (Born, 1990) and organized by CNET in Lannion (Centre National d’Etudes KOD (Vogel, 1988). Newell’s knowledge level (Newell, 1980) en Te´ le´ communications) following a meeting on the same was systematically cited as the right one to describe problem topic organized by the Artificial Intelligence research group solving knowledge before its encoding into a concrete GDR-PRC in January 4, 1989. At the time, the JAC aimed symbol-level representation. The first established results of at gathering the francophone community in the field of a large European project named KADS suggested that knowledge acquisition and to clarify the relationship several description layers were required to analyse the system between this domain and machine learning. This first knowledge from various perspectives. Influenced by edition was on purpose positioned within a multidisciplin- Chandrasekaran’s (1986) generic Tasks and McDermott ary framework, as shown by the diverse origins of the group’s work on Role Limiting Methods (McDermott, presentations that day, including computer science, indus- 1988) (reusable problem solving models), KADS clarified trial research, and psychology. The French knowledge how the system goals and tasks differed from the processes engineering community (IC) still maintains as a birth mark and methods followed to carry out the task. So, one of this specificity to be a multidisciplinary conference, rather the most studied layers was the problem solving model. than a sub-domain of Artificial Intelligence. Over the years, It described reasoning methods independently of the domain the disciplines involved in knowledge engineering have knowledge, only taking the type of problem into account. changed in keeping with the research main trends. In The two foundational research issues here are knowledge 1990, the strong reference to structuralism in expert systems reuse (how much can a knowledge model be reused and how? assumed that rules and frames are more than convenient what is reusable in a knowledge model?) as well as the implementation paradigms: they have a cognitive validity definition of modelling primitives (what are the components of and reflect cognitive structures. Expert systems map human a conceptual model at each layer? how are these components expertise, often the one of a single expert, and they aim to and layers linked together?). solve problems using the same heuristics as the expert. That same year, the first Web server was installed outside The research issues formulated at the time determined Europe. most of the domain structuring paradigms for the next ten years: What is the right abstraction level to describe the 2. Knowledge acquisition for modelling, modelling to guide system problem solving behaviour independently of the acquisition formal representation? How can a conceptual model guide the identification of the knowledge to be captured in the In 1992 the JAC (Knowledge Acquisition Days) were system? What is the structure and content of these models? particularly interested in the analysis of textual corpora, And finally which formalisms should be used and which through natural language processing, to acquire knowl- processes should be supported? Acquisition methods edge. Another key issue was the nature and reuse of included interviews as well as psychological techniques knowledge components for generic models. The emergence like card sorting or the repertory grids inspired from of issues related to the exploration of textual documents as Kelly’s Personal Construct Theory and promoted by knowledge source resulted from the systematic rewriting of Boose (1984), Gaines and Shaw (1980). In France, AI experts’ interviews. Language analysis was basic and researchers collaborated with cognitive psychologists to reflected the simplistic semantic hypotheses formulated at define acquisition methods, one of the most famous being that time: each sentence was expected to provide various KOD by Vogel (1988). predicates and logic formulae. The same year, Tim Berners-Lee proposed for the The then famous KADS’s four-layers modelled to several second time at CERN a memo specifying a system he first significant changes in the role of the conceptual model: firstly, called ‘‘Mesh’’, where he suggested the use of hypertext for a conceptual model no longer modelled an expert knowledge, information management within CERN, by extending the it represented the system knowledge; secondly, the model references of hyperlinks to network addresses of docu- could be used as a grid that drove the acquisition process. ments in order to build a ‘‘mesh’’ between documents Once selected and adapted, a problem solving method stored on different machines. The memo was returned to determined the roles played by domain knowledge and him with the handwritten ‘‘vague but exciting’’ and so identified various sets of rules to be applied at each step of began the development of the technical architecture of the the method. Model-based KA opened new perspectives to Web. This technology will in itself revolutionize the building knowledge-based systems. Whereas the ‘‘Knowledge problem of knowledge acquisition and knowledge publishing Acquisition Journal’’ first issue contributed to state that KA by extending Ted Nelson’s hypertext principle (Nelson, 1965) was not just a short-term problem, but rather a complex research issue that would survive rule-based systems, a special the most important one was Mosaic which allowed for the issue of the RIA1 French journal (Aussenac-Gilles et al., first time to visualize the images directly in the text of a page. 1992) made this research domain more visible in the French With this browser, the Web would actually spread around AI landscape. In 1988, a French special interest group about the world leaving behind his ancestors Gopher, WAIS, and KA (called GRACQ2) was set up with the help of the CNRS FTP. The so-called CGI (Common Gateway Interface) was and the French association of AI (AFIA). It gathered most also proposed to allow web servers to not simply send static of the research teams working in this area. GRACQ pages but also to execute a program to return a generated organized 6 annual meetings for several subgroups dedicated content. This technical detail would open a huge new avenue to problem solving methods, control knowledge, reuse, text for the Web, enabling it to go beyond the documentary analysis, extraction techniques and methodological issues. service to provide the means of universal access to service There were then a dozen web servers and new Web applications. browsers appeared in the course of the year. In 1994, the program of the JAC included a new The (so often cited) paper (Gruber, 1991) by T. Gruber word alongside knowledge models: ‘‘ontology’’. A about ontology issued in 1991 suggested that an ontology single paper examined the notion of ontology (Reynaud can be used as a model to represent domain knowledge in a and Tort, 1994)4, and explained to the KA French reusable way for various applications. Gruber’s paper researchers what makes ontologies different from managed to overcome a double difficulty: to turn a domain models. Hot topics were modelling methods, complex philosophical notion into a rather easy to under- model validation and simulation, the use of cases in stand technical object; and to solve two reusability issues knowledge models, the comparison of methods and the (format standardization and content unification, consensus feedback analysis of their experimental use. Another within a domain) with one single model. It would take 2 original contribution (Lepine and Aussenac-Gilles, years before this paper impacted on the French knowledge 1994) was dedicated to an innovative way to exploit acquisition community. texts as knowledge sources: instead of a systematic In 1993, JAC was not organized because the European sentence to model mapping, the idea was to use a term equivalent event, the EKAW’93 conference, took place in extractor and build up a domain vocabulary to speed France for the second time. The numerous activities of up the modelling process. GRACQ were presented via posters during the open day, The Web browser Mosaic was spreading fast and demonstrating the dynamism of more than 30 research groups. researchers in the French community wondered what The domain continued to evolve in keeping with new would be the impact of this tool compared to the Minitel5 perspectives on knowledge-based systems (KBS): knowledge system which had been deployed in France. More than 600 sources were diversified, the experts’ know-how was better Web servers were now online, but more importantly, the characterized and collaborations with human factor specialists first edition of the World Wide Web conference (WWW underlined the necessity to capture knowledge in use, to 19946) invited Tim Berners-Lee to present his vision of focus on the actual human activity and not only on their what would become the Semantic Web. In parallel, with discourse about it. Rather than independent problem solvers, the proliferation of browsers the ‘‘browser war’’ began and knowledge-based systems were defined as users’ assistants that to avoid the dangers of fragmentation or monopoly, a carry out only a part of the overall task. Feedbacks from standardization body was created for the Web: the W3C. various attempts to provide users with expert systems proved that the quality of the solving capabilities does 3. From knowledge in use to models: knowledge engineering not guarantee that the system is useful for its users. Human- centered design (Norman, 1986) suggested that the focus was In 1995, alongside topics now classic at JAC like not the system on its own but the pair (user, system) and its knowledge acquisition, explanation, and modelling and interactions. Zacklad (1993),3 by is one of the first papers at representation languages, two new topics first appeared, JAC to explore what a knowledge-based interactive support which would remain over the following years: (1) return system could be. on experience, including the problem of integration with That same year, CERN leaders officially announced that legacy systems and (2) bidirectional links between knowl- Web technology would be free and royalty free. This edge modelling on the one hand and learning and reasoning important step would allow viral penetration of these on the other hand. 1995 was the year when the Common- technologies in other information systems. Earlier this year, KADS (Schreiber et al., 1999) project started as a revision there were about fifty servers. New browsers appeared but of KADS that would take into account the context in which a system is used, task sharing between the user and 1Revue d’Intelligence Artificielle (AI Journal). 2GRACQ stands for Groupe de recherche ACQuisition des Connais- the KBS, and the possible conflicts and divergences sances. Founders and early leaders of this group are J.P. Krivine, between knowledge sources. CommonKADS also aimed P. Laublet, N. Aussenac-Gilles, G. Kassel, J. Charlet and R. Dieng. http://www.irit.fr/GRACQ/ 4early ideas that were later developed in (Reynaud and Tort, 1996) 3Read (Zacklad and Rousseaux, 1996) for an English presentation of 5http://en.wikipedia.org/wiki/Minitel this work 6http://www94.web.cern.ch/www94/ at more flexibility in the definition of specific problem the social and collective dimension of knowledge in the solving methods from generic and reusable ones. definition of an information system. The information More than 10,000 Web servers were then available. By system not only included a KBS, but it also defined a introducing CSS, the W3C began looking at how to work organisation that distributed knowledge and pro- standardize the decoupling of Web content and its pre- blem solving over the system, its users, documents, and sentation in particular to facilitate multiple processing of other human activities. A debate arose concerning the the same content. By separating the content structure from scope of KE: Should it systematically include a technical the presentation format the web made a first step toward a and engineering AI-based contribution? Should it system- web of structured content. atically require the integration of human factors and In 1996, the JAC program highlights were: ontologies, organizational issues? This debate is an on-going one, refinement of models and methods, cooperative system although the strong influence of computer science in the construction and industrialization of approaches. One of field often sets the technical issues at first place. the challenges was to push KA out of the academic area and to be able to demonstrate the feasibility of these 4. Knowledge is distributed, collective and embedded in approaches when dealing with ‘‘real’’ applications. Worry- activity ing about actual applications and not only toy data-sets was one of the key features that implied changing from In 1998, the word ‘Web’ appeared for the first time in the Knowledge Acquisition to Knowledge Engineering (KE). title of an article at IC (Knowledge Engineering confer- The complexity of knowledge and contextual features to ence). It was about time since the Web had already passed be taken into account in industrial applications required the one million servers mark the same year. While knowl- combining in an innovative way more basic or elementary edge representation, modelling languages and corporate contributions, techniques and tools. The technical feasi- memories were still very present in IC’98 program, hyper- bility of this combination raised engineering issues, and its text and hypermedia also confirmed their importance as a validity, given the theoretical background and methodolo- means to support access to knowledge in information gical assumptions behind each technique, was a true systems. It was admitted that knowledge-based systems research problem per se, at the frontier of Knowledge do not necessarily solve problems or use logic languages to Engineering. produce inferences. Knowledge systems provide a more or There were at the time more than 100,000 Web servers. less dynamic access to knowledge as it can be captured by This year also saw one of the first actions on web security language in text, by use traces or cases. A session was also with PICS recommendation for the protection of children dedicated to computer mediated cooperative information against inappropriate content: the size and the diversity of systems. Vincent Quint gave an invited talk about going the content of the web were starting to create new beyond HTML to rich documents on the Web. Indeed the challenges in parallel with their growth. same year the first recommendation on XML was pub- In 1997, JICAA (Days of Knowledge Engineering and lished. The other invited keynote from Richard Benjamins Machine Learning) is the hinge between JAC (Knowledge opened up perspectives on how to transform part of the Acquisition Days) and IC (Knowledge Engineering con- Web into a knowledge base. That same year, Tim Berners- ference) showing how the challenges are broadening from Lee published a paper called ‘‘Semantic Web Road map’’ the problem of acquisition. KBS were by then part of the (Berners-Lee, 1998). broader context of information systems, and knowledge In the KE domain, CommonKADS was about to end. acquisition and validation had to be positioned within Many papers referred to its use, evaluation or adaptation knowledge engineering. The trend was to open conferences to specific domain data. Modelling the problem solving to very different areas so that theoretical and applied knowledge was the core issue that had been significantly concerns could coexist with a maximum of relevant improved with regard to KADS. Rather than a library of different approaches. Among the session topics, we find static methods, CommonKADS offered a library of adap- ontology, terminology and knowledge extraction from table components and, for each of them a set of questions. text. The switch observed in 1994 about text analysis Answering these questions guided the selection of the best became obvious. The use of mining techniques to explore adapted components that would form the system problem text and extract linguistics clues from which pieces of solving model. CommonKADS also provided modelling model could be built up defined a new trend. It opened the languages to represent a variety of forms of knowledge way to designing, combining and testing innovative tech- including problem solving methods, domain knowledge niques for Natural Language Processing and knowledge and ontologies, and traces of activity tasks. modelling. Additional topics were: hypertext and docu- More generally, KE research issues concerned both ment modelling, corporate memories and collaborative technological and representation issues, as well as organi- systems, validation of knowledge for KBS, data mining, sation and human factors. Methodological questions like case-based reasoning, and cognitive approaches. As an how to access collaborative knowledge, knowledge in use echo to Clancey’s notion of ‘‘situated cognition’’ (Clancey, and practices in organizations needed to call for concepts, 1997), the challenge addressed here was the integration of techniques or theories (like complexity theory and systemic approaches) from other disciplines, such as, management, proposed to structure knowledge in triples that connect sociology or ergonomics. The scope of knowledge engi- URIs and values with labelled relations. The set of triples neering was widening and contributions explored very in a domain share some nodes and form a complex graph. different and complementary questions in the process that Reasoning with graphs, querying graphs of instances with ranges from qualifying source knowledge in human activ- engines like Corese (Corby et al., 2004) were one of the ities to making it operational or available via an important research issues that emerged then. That same application. year, Rose Dieng published a book in French with her In 1999, IC (Knowledge Engineering conference) still team on methods and tools for knowledge management had sessions on ontologies, and knowledge acquisition (Dieng-Kuntz et al., 2000) while chairing the International from text but a new theme was introduced on cognitive Conference EKAW 2000. ergonomics and knowledge engineering vs. requirement In 2001, the Semantic Web appeared in the program of engineering. A particular problem raised was the one of IC. It would be the theme of a session in addition to other the links between the study of information systems, the topics including: Information Systems for Decision Sup- engineering of knowledge-based systems and the manage- port, epistemology, practical experiences, cooperation and ment of organizations in which these systems are deployed. case-based reasoning. The challenge was to integrate the fact that our systems are The collective dimension of knowledge requires one to immersed in usage. The conference papers showed that the define how to combine various knowledge sources when introduction of new processes leveraging individual and building an ontology. For instance, the ontology building collective knowledge raised a set of interrelated problems platform Prote´ ge´ was modified to support the collaborative that could no longer be treated separately but had to be design of ontologies (Tudorache et al., 2008), thus allowing addressed jointly by computer scientists, cognitive scien- different kinds of organizations that distribute roles and tists, researchers in ergonomics, sociologists and managers. rights to contribute to the ontology, and to check and W3C working groups at this time generalized the validate these contributions. Another emerging challenge integration of multimedia objects and scripts in Web was to optimize knowledge reuse and model combination. pages, which would allow the Web to acquire richer, more The Web was unavoidable with 26 million servers. W3C dynamic and more reactive content. These were the basic groups were interested in generalizing the notion of link in techniques that enabled the birth of Web 2.0 in the structured documents (XLink) and in allowing access to following years. In turn this new richness of multimedia the Web by voice and hearing (Voice XML), including the content and interaction means would tremendously change phone. These changes prefigured the ideas of linked data and augment the content of the Web and the applications and mobile web. it would allow. In 2002, IC was still marked by multidisciplinary contribu- tions and topics as diverse as: problem solving systems, 5. From problem solving systems to knowledge browsing and hypermedia systems, education engineering, terminology model querying extraction from texts and knowledge management. Several contributions compared these paradigms and stressed how In 2000, the emphasis in IC was on Intranet and closely the issues raised by each paradigm were related to Internet, information systems and education engineering. knowledge management. Companion disciplines were natural An important part was also given to proofs of concepts, language processing and computational linguistics, human through a demonstration session with 20 software applica- factors and distance learning. The topic of ontologies had tions and two parallel sessions on important achievements become central. The Semantic Web was also now a topic of in the field. A session was dedicated to the epistemology of its own. Industrial design emerged as a new topic in the knowledge engineering and another session was concerned application sessions. An intelligent system was no longer an with the links between hypertext and ontologies. Termi- active problem solver but rather an assistant that drives the nology and knowledge extraction from texts were still core user towards the relevant information that will assist him topics of the program but we also discussed the evolution properly. Navigation through information sources thanks to of the ‘‘knowledge’’ capital in a business, object-oriented semantic annotations turned out to be one of the most modelling of knowledge and knowledge management. The popular knowledge-based applications. After the pioneering role of conceptual models had then definitively changed: work by Guarino and Masolo (Guarino et al., 1999) they were no longer just intermediary representations ontologies were used for concept-based annotation of before operationalisation, but rather representations to domain specific documents. The larger the document collec- be browsed, queried or used to support man-system tions, the more generic the ontology. The popularity of cooperation and task distribution. In this context, graphs lexical databases like WordNet, and their limitations emerged as a very rich and relevant knowledge representa- (Gangemi et al., 2003), suggested the need for developing tion. In particular, conceptual graphs, defined at first to more structured and appropriate ontologies. represent natural language interpretations, were used to At the same time, W3C offered a first recommendation operationalize the first versions of the Semantic Web (P3P) to promote respect for the privacy of users and these languages defined by the W3C. RDF and RDFs standards aspects were starting to worry more and more people in the community. In parallel, another group working on web openness to other disciplines such as organization theory, accessibility provided good practices to ensure that knowl- cooperative work systems and education engineering. edge on the web remained accessible. Web addresses (URLs) became multilingual (IRI), as In 2003, among the topics we could find modelling, text characters other than ASCII ones could now be used in and ontologies, Semantic Web, knowledge management, addresses. W3C also launched an initiative to promote the design and implementation. Semantic web applications Web in developing countries and a working group was were mature enough to run experiments where the gain created to work on the contributions of the Semantic Web brought to users could be qualitatively (if not quantita- to the medical field. tively) qualified. The social dimension of the annotation In 2006, annotations were an important topic of IC, as well processes and ontology definition was one of the few as mapping and visualization of knowledge. These research references to human factors at this conference. The issues were directly linked to the increasing amount of increasing number of technical issues, related to ontology information and documents available on the web. Existing mapping, concept-language association and the improve- wikis, tagged documents and tag hierarchies revealed new ment of reasoning facilities, became a major focus that put types of ‘‘social’’ knowledge. They provided valuable aside the study of human acceptance and use of these resources to identify usage-driven relationships between technologies. The W3C proposed an evolution of Web concepts. As the web grew in size and diversity, the challenge forms that could be generalized to the whole family of to turn it into a semantic web became more complex and XML languages. The web continued to host more and included the management of large data-sets, as well as the more applications and not only content. access to text and document content. Concept-based annota- tion combined the use of information extraction and NLP 6. A world of Ontologies techniques. Once annotations are available, new browsing and reading devices can be designed that take benefit from In 2004, IC considered the relationship between knowl- additional knowledge expressed in either natural language or edge engineering and document engineering, cooperative a formal representation. The question ‘‘Can the Semantic activities and support systems. In addition to text mining, Web be social?’’ was raised for debate (Zacklad, 1999; the topic of data mining was added, whereas it was Gandon, 2006): beyond the technical issues, beyond the addressed mainly in other conferences up to that time. seduction of the semantic web infrastructure, how can The popularity of mining approaches to acquire knowl- ontologies and ontology-based annotations take into account edge models revealed the increasing efficiency of machine the diversity of social groups and their points of view? What learning and statistics to ‘‘learn’’ the rules and knowledge are the social and technical challenges to be addressed before revealed by data regularity. Mining and learning took this technology gets a real take-up from users? advantage of the increasing number of available electronic The issue of internationalization spanned several work- textual documents. These methods competed with (and ing groups at W3C, which also inaugurated an office in often out-performed) the manual analysis of human China. Internationalization addressed cultural differences, expertise and expert documents. There was also a session language and domain specificities. on conflict resolution and consensus building, which In 2007, IC discussed the Semantic Web, its comple- confirmed that knowledge was diverse and that unifying mentarities with the Web 2.0 and concept-based search for models had to cope with this diversity. A panel reported on information, with a focus on the synergy between annota- the transfer to industry of knowledge technologies. Je´ r ˆome tions and knowledge-based systems. Other topics included Euzenat’s invited talk introduced OWL, the official recom- the analysis of texts and ontologies, applications of knowl- mendation from W3C to extend the expressiveness of the edge engineering, as well as cooperation and sharing of Semantic Web formalisms. knowledge within human communities. During this same The number of Web servers exceeded 46 million. Web year, the FreeBase7 database, described as ‘‘an open shared access from mobile phones and PDA had now become an database of the world’s knowledge’’ was made available. important activity in W3C. Tim O’Reilly defined it as ‘‘the bridge between the bottom In 2005, ontologies attracted more and more research up vision of Web 2.0 collective intelligence and the more work, either from a technical point of view or regarding structured world of the semantic web.’’ Another major their use and engineering. The IC conference managed to community project was launched: the first version of the bring together the issues of building and using ontologies DBpedia8 ontology was built from Wikipedia info-boxes on the one hand and of knowledge engineering in organi- (Auer and Lehmann, 2007). The same year appeared the zations on the other hand. Several papers provided first Linked Open Data cloud on the web, which was based methodological proposals for the construction of ontolo- on metadata collected and curated by contributors to the gies from text corpus or from existing knowledge bases. CKAN directory. This cloud would mirror in the following There was a development of the topic of the alignment of existing ontologies. Indexing and annotation using ontol- ogies for intelligent information search were also well 7http://wiki.freebase.com/wiki/Main_Page presented in this edition. Several articles illustrated the 8http://dbpedia.org years the growth of available linked data on the web, each and population of ontologies. Ontology-based annotation dataset acquired from very different sources. was again present as well as the design of interfaces and W3C, finally adopted a query language for XML interactions. In addition, two original sessions were added: (XQuery) and proposed (GRDDL) as a gateway between a session dedicated to the evaluation of semantic similarity the structured Web (XML documents) and the Semantic and the adaptation of ontologies to the user and a session Web (RDF graphs). W3C opened an office in South Africa devoted to modelling processes, practices and cases. The reminding us that the digital divide was real and that the notion of ontology design pattern, defined in the NeOn12 progresses we have made on knowledge engineering and European project, adapted the software engineering notion the semantic web sometimes could lead to the exclusion of of design pattern to ontology engineering so that ontology even more users, who could not benefit from them. structures could become more reliable and reflect the actual meaning that ontologist has originally assigned to them. 7. The semantic web: turning data into knowledge In 2010, IC promoted emerging issues thanks to four workshops (such as, medical semantic web and ontology In 2008, IC sessions included the dissemination of evaluation) and a tutorial on modelling before formalizing medical knowledge, querying knowledge graphs, ontologi- ontologies. Rather than technical, current issues were cal knowledge-based systems and ontology design. The methodological ones. Many papers focused on how to invited speaker Ivan Herman, Head of the W3C Semantic evaluate the quality and relevance of these data, and how Web Activity, gave an overview of the Semantic Web that to adapt models and metadata to specific applications and led to a very rich discussion on the relationships between users. A striking evolution was the convergence between the Semantic Web and the Social Web. A session on knowledge engineering and information extraction and Semantic Web and Web 2.0 echoed a co-located workshop information retrieval. Although evaluation criteria ‘‘IC 2.0’’, considering advances towards a social knowl- remained different in these scientific fields, the perspective edge engineering process, which used Web 2.0 to change of the semantic web offered a ground for collaborations. knowledge engineering practices. Other sessions addressed Concept-based and ontology-based annotations were the extraction of knowledge from texts, knowledge-based experimented as an alternative to current search engines. instrumentation of practices, traces and inscriptions of The expected gains are not that easy to demonstrate. knowledge and learning and adaptation. Another debate Evaluation would require to improve word matching, to concerned the question of the value of a French-speaking be able to evaluate synonymy and to define measures that Knowledge Engineering conference in its relations with the compare concepts from one or several ontologies. All these international conferences in the same domain. research issues define some of the recurrent topics of the On the Web, a very large number of ontologies was then last four years. Another consequence of the large amount available, which could be accessed thanks to semantic of web documents was the increasing number of references engines like Watson9, Swoogle10 or Sindice11. The majority to learning, either through Natural Language Processing of these ontologies contained concepts with labels in or through activity analysis on the basis of digital traces. English or no label at all. When trying to semantically In 2011, presentations at IC confirmed the importance of index text documents in languages other than English, building ontological reference models and their complemen- several difficulties arose to select a relevant ontology: the tarities, as knowledge sources, with human expertise, text, lack of resources in the same language as the document, and social organisation. New issues must be dealt with as the complexity of understanding the cultural background large ontologies are now used in dynamic contexts: ontology of an ontology and the need to check whether the points of modularity and ontology evolution. An important applica- view are the same in the ontology and the documents. tion domain that emerged recently is the management of Research on ontology localization attempted to define rich business rules, their maintenance, validation and use to representations for multilingual or cross-cultural ontolo- generate documents. Domain ontologies integrate formal gies. The question here covers a general issue that has to be representations of such rules as constraints on domain dealt with if knowledge models are to be widely spread: concepts. The web has by now gathered a large variety of How can one ensure the correct interpretation and local knowledge structures, of different nature and form. More adaptation of an ontology? What are similar ontologies? than a third of the papers explored the use of ontologies for How can we appreciate the adequacy of an ontology to applications as diverse as management of geographic knowl- index a document? edge, map legend design and temporal annotations. In 2009, IC took place in Tunisia, outside France for the first time. The conference had a special session on the topic ‘‘Knowledge and online communities’’ and we can find in 8. Research challenges for semantic applications the programme issues of content, construction, life cycle This year at IC 2012 web 2.0 and social web had a whole 9http://kmi-web05.open.ac.uk/WatsonWUI/ session, while the semantic web had two sessions in 10http://swoogle.umbc.edu/ 11http://sindice.com/ 12http://www.neon-project.org/ addition to ontology design, semantic annotation, ontolo- Boose J., 1984, Personal construct theory and the transfer of human gies, e-learning, terminology, and data mining. Invited expertise. In: Proceedings of the 6th European Conference on speakers addressed two topics: conceptualizations in the Artificial Intelligence ECAI’84, Advances in Artificial Intelligence. T. O’Shea Ed.: 51–60. domain of law, rights and legal matters; and the analysis of Born, G. 1990. KADS: a methodology for developing large AI systems, complex networks to explore interaction networks using CompEuro ‘90. In: Proceedings of the 1990 IEEE International Con- their community structure. The questions related to ontol- ference on Computer Systems and Software Engineering, pp.166–171, ogy building and use had a strong continuity with the early http://dx.doi.org/10.1109/CMPEUR.1990.113622. KA issues: Which kind of knowledge can be modelled? Chandrasekaran, B., 1986. Generic tasks in knowledge-based reasoning: high-level building blocks for expert system design. IEEE Expert How important is the distortion between the computer Autumn 86, 23–30. interpretation of an ontology and the human interpreta- Clancey, W.J., 1997. Situated Cognition: on Human Knowledge and tion of the associated knowledge? What kind of intelligent Computer Representations. Cambridge University Press. behaviour and tasks can formal representations be Corby O., Dieng R. and Faron-Zucker C., 2004 Querying the Semantic useful for? Web with Corese Search Engine, In: Proceedings of Prestigious Applications of Intelligent Systems PAIS, ECAI, Valencia. Clearly, the shift from early works comes from the amount Dieng-Kuntz, R., Corby, O., Gandon, F., Giboin, A., Golebiowska, J., of data available as sources: the current hypothesis is that Matta, N., Ribiere, M., 2000. Knowledge Management: Me´ thodes et quantity and redundancy of basic formal information (like Outils Pour la Gestion des Connaissances, InfoPro. Management des RDFs triples in the web of data) will balance the lack systemes d’information, Dunod. of precision and complexity. The initial problem of KA of Gaines, B.R., Shaw, M.L.G., 1980. New directions in the analysis and interactive elicitation of personal construct systems. International having too few sources has shifted towards the problem of Journal Man-Machine Studies 13, 81–116. too many sources, too big, too diverse, too dynamic, too Gaines, B.R. Knowledge acquisition: past, present, and future. Interna- distributed, etc. New challenges arise in addressing data and tional Journal of Human–Computer Studies, http://dx.doi.org/10. schema heterogeneity and provenance, incoherence, and 1016/j.ijhcs.2012.10.010, this issue. social life cycles. Old challenges are increased tenfold by Gandon F., 2006. Le Web se´ mantique n’est pas antisocial, In: Proceedings Inge´nierie des Connaissances, Semaine de la Connaissance, Nantes, the scale and complexity of the web (acquisition, formalisa- pp. 131–140, 20–30 Juin 2006. tion, evaluation, evolution, etc.). Gangemi, A., Guarino, N., Masolo, C., Oltramari, A., 2003. Sweetening Moreover, as the web becomes ubiquitous, not only do WORDNET with DOLCE. AI Magazine 24 (3), 13–24. we deploy a ubiquitous means of access to knowledge, we Gruber, T. R. 1991. The role of common ontology in achieving sharable, also deploy a ubiquitous means of acquisition through a reusable knowledge bases. In J.A. Allen, R. Fikes, & E. Sandewall (Eds.), Principles of Knowledge Representation and Reasoning: mobile web, a web of sensors, a web of things. And the Proceedings of the Second International Conference, Cambridge, MA, long tail of knowledge resources on the web requires many pp. 601–602, Morgan Kaufmann. of the advances established by knowledge engineering, in Guarino, N., Masolo, C., Vetere, G., 1999. OntoSeek: content-based order to be fully exploited, if we want knowledge and access to the web. IEEE Intelligent Systems 14 (3), 70–80. intelligence to emerge from the overall graphs they weave. Lepine, P., Aussenac-Gilles, N., 1994. Mode´ lisation de la re´ solution de problemes: comparaison expe´ rimentale de KADS et MACAO. Actes In accordance with Wielinga (this issue), we can conclude des 5emes Journe´ es d’Acquisition des Connaissances, Strasbourg, that the results and experience acquired in knowledge mars 1994. engineering in the past 20 years will be important to avoid McDermott, J., 1988. Preliminary steps toward a taxonomy of problem- repeating with the web of data the same errors that were solving methods. in: Marcus, S. (Ed.), Automating knowledge Acqui- made when trying to model human expertise. The current sition for Expert Systems. Kluwer Academic Publishers. Musen M., The knowledge acquisition workshops: a remarkable conver- standardization efforts provide a unique syntactical access gence of ideas, International Journal of Human–Computer Studies, to data but they do not necessarily make them ‘‘under- http://dx.doi.org/10.1016/j.ijhcs.2012.10.011, this issue. standable’’ or semantically manageable by a software Nelson, T.H., 1965. Complex information processing: a file structure for program. In other words, the deployment of the web of the complex, the changing and the indeterminate. In: Proceedings of data is only the first phase of the deployment of a fully- the 1965 20th National Conference of the Association for Computing Machinery, pp. 84–100, New York, NY, USA. ACM Press. fledged semantic web, a fully-fledged web of acquired Newell, A., 1980. The Knowledge Level. Presidential Address, American knowledge. Association for Artificial Intelligence. AAAI’, 80. Stanford University. Norman, D., 1986. A. et. in: Draper, S.W. (Ed.), User Centered System Design. Lawrence Erlbaum Associates Publishers. Reynaud, C., Tort, F., 1994. Connaissances du domaine d’un SBC et References ontologies: discussion. Actes des 5emes Journe´ es d’Acquisition des Connaissances, Strasbourg, mars 1994. Aussenac-Gilles, N., Krivine, J.P., Sallantin, J., 1992. Nume´ ro spe´ cial Reynaud, C., Tort, F., 1996. Use of expertise ontologies in the knowledge acquisition des connaissances. Revue d’Intelligence Artificielle (RIA), engineering process. Journal: IEEE Transactions on Applications and Hermes, Paris 6 (2), 1992. Industry, 106–109. Auer S., Lehmann J. 2007. What have Innsbruck and Leipzig in common? Schreiber, G., Akkermans, H., Anjewierden, A., de Hoog, R., Shadbolt, Extracting Semantics from Wiki Content. In Franconi et al. (eds), N., Van de Velde, W., Wielinga, B., 1999. Knowledge Engineering and Proceedings of European Semantic Web Conference (ESWC’07), LNCS Management, The CommonKADS Methodology. MIT PressMIT 4519, pp. 503–517, Springer. Press ISBN-10: 0-262-19300-0. Berners-Lee T. 1998, Semantic Web Road map, W3C Note, /http://www. Tudorache T., N.F. Noy, S.W. Tu, M.A. Musen. 2008, Supporting w3.org/DesignIssues/Semantic.htmlS. collaborative ontology development in Prote´ ge´ . In: Proceedings of Seventh International Semantic Web Conference, Karlsruhe, Germany, Computer Supported Cooperative Work: The Journal of Collaborative Springer. Computing 5, 133–154. Vogel, C., 1988. Le ge´ nie Cognitif, 180. Masson, Paris pages. Zacklad, M., 1993. Les SBC vus comme des Systemes Interactifs d’Aide a Wielinga, B. Reflections on 25þ years of knowledge acquisition. Inter- la Re´ solution de Probleme. Actes des 4emes Journe´ es d’Acquisition national Journal of Human–Computer Studies, http://dx.doi.org/10. des Connaissances du PRC-IA, Saint-Raphae¨ Šl, 31 Mars-2 Avril 1993. 1016/j.ijhcs.2012.10.007, this issue. Zacklad M., 1999. Introduction aux ontologies se´ miotiques dans le Web Zacklad, M., Rousseaux, F., 1996. Modelling co-operation in the Socio Se´ mantique, in actes de la confe´ rence, In: Jaulent, M.-C. 16emes design of knowledge production systems: the MadeIn’Coop method. journe´es francophones d’Inge´nierie des Connaissances, Grenoble: PUG.