
Semantic Web Beyond Science Fiction

Author

Dr. Usha Thakur, ATS Technical Research HCL Technologies, Chennai


© 2009, HCL Technologies Ltd.

August, 2009


Contents

Introduction

Purpose

Do We Need Semantic Web?

Challenges and Responses

Key Initiatives

Do We Have Any Takers?

Current Initiatives

Semantic Web in Health Sciences/Care and Life Sciences

The Point Is…

Acronyms


Introduction

Initially, almost all new ideas appear esoteric and even crazy to folks who have yet to consider the possibilities of what could happen down the road, and why. Moreover, when some of those ideas happen to be in the area of technology, many would not hesitate to relegate them to the realm of science‐fiction. Although not that new, Semantic Web is creating a lot of excitement among many techies and businesses whose efforts have started to bear fruit. In fact, Semantic Web proponents are predicting that the world they are visualizing will soon belong less to science fiction and more to the real world we live in. Yes, Semantic Web is expected to be that pervasive!

It is important to note that Semantic Web is not separate from the WWW we have come to know and embrace. Semantic Web is a stage in the evolution of the Web. As Mills Davis aptly notes,

The first stage, Web 1.0, was about connecting information and getting on the net. Web 2.0 is about connecting people — putting the “I” in user interface, and the “we” into a web of social participation. The next stage, [W]eb 3.0, is starting now. It is about representing meanings, connecting knowledge, and putting them to work in ways that make our experience of [the internet] more relevant, useful, and enjoyable. Web 4.0 will come later. It is about connecting intelligences in a ubiquitous web where both people and things can reason and communicate together.1

Purpose

The main objective of this paper is to understand what the Semantic Web trend is all about, how it will impact our use of the Web, and why this trend is here to stay. We will also examine some of the industry verticals as well as technology vendors that are already embracing Semantic Web technologies and using them as business differentiators.

1 Mills Davis, "Semantic Wave 2008: Industry Roadmap to Web 3.0 and Multibillion Dollar Market Opportunities," http://project10x.com/about.php [June 2009] ‐> indicates when this website was accessed.


Do We Need Semantic Web?

The essence of Semantic Web ‐ also referred to as Web 3.0 ‐ has been expressed in many ways. Some refer to it as “a network of data on the Web”2 or simply as “a Web of data.”3 Others view it as a “World Wide Database … [because it involves] going from a Web of connected documents to a Web of connected data.”4 Irrespective of which definition one prefers, there is a general consensus that Semantic Web structures data through various semantic tools and applications in such a way that it can be read, understood, analysed, and processed by machines within a given context. In short, the aim of Semantic Web is to allow users to phrase their queries in a way they are already used to doing (e.g., “I am wondering what time of the year will be best for wind surfing off the coast of South Africa, and at which locations?”), and have the system return only relevant information. Of course, as we will see later in this paper, there is much more to Semantic Web than just queries.
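To make the contrast between returning pages and returning data concrete, the following is a minimal Python sketch. It assumes that hotel offers have already been published as structured, machine‐readable records; the `HotelOffer` fields, the sample offers, and the `find_offers` helper are all invented for illustration and do not correspond to any real service or Semantic Web API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HotelOffer:
    """One machine-readable offer; every field name here is hypothetical."""
    hotel: str
    city: str
    ocean_facing: bool
    price_per_night_usd: float


# Invented sample data standing in for offers published as structured data.
OFFERS = [
    HotelOffer("Hotel A", "Vancouver", True, 95.0),
    HotelOffer("Hotel B", "Vancouver", True, 150.0),
    HotelOffer("Hotel C", "Vancouver", False, 85.0),
    HotelOffer("Hotel D", "Boston", True, 90.0),
]


def find_offers(offers, city, ocean_facing, min_price, max_price):
    """Return only the offers that satisfy every stated constraint."""
    return [
        o for o in offers
        if o.city == city
        and o.ocean_facing == ocean_facing
        and min_price <= o.price_per_night_usd <= max_price
    ]


matches = find_offers(OFFERS, "Vancouver", True, 80, 100)
for offer in matches:
    print(f"{offer.hotel}: ${offer.price_per_night_usd:.0f} per night")
```

Because each constraint is checked against a typed field rather than against the text of a page, the result is the single matching offer instead of a list of pages that merely mention the search terms.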

In general, the key assumptions or drivers that are behind the increasing significance of Semantic Web are as follows:

1. Users can raise contextual queries such as, “what is the best fare from Boston to Vancouver next Saturday and the possibility of an ocean‐facing hotel room in Vancouver for this Saturday and Sunday priced between $80 and $100 per night?”

2. Search and Find functions will be much faster because only contextually relevant results will be returned. For instance, “There is only one hotel with ocean‐facing rooms in Vancouver for this Saturday and Sunday priced at $95 US. Similar rooms are available for $150 US at these hotels…”

3. Research institutions and similar organizations can protect (laboratory and other) data (contained, e.g., in pharma/medical lab notebooks) for reuse and IP creation.5

2 This is how Nigel Shadbolt and Tim Berners‐Lee define Semantic Web. See Nigel Shadbolt and Tim Berners‐Lee, “Web Science: Studying the Internet to Protect Our Future” (October 2008) P. 3, http://www.scientificamerican.com/article.cfm?id=web‐science&page=3 [June 2009].

3 This is the definition adopted by W3C. For further details, see “W3C Semantic Web Frequently Asked Questions,” Question number 1.1, http://www.w3.org/2001/sw/SW‐FAQ [June 2009].

4 This is how Nova Spivack of Radar Networks conceptualizes Semantic Web. See John Markoff, "Entrepreneurs See a Web Guided by Common Sense," (November 12, 2006) P.1, http://www.nytimes.com/2006/11/12/business/12web.html?pagewanted=1&_r=3&sq=interview%20with%20tim%20berners%20lee%20on%203.0&st=cse&scp=2 [May 2009].

4. Businesses, governments, and institutions can increase employee productivity (and thereby increase profitability); employees can use their time for creating assets instead of searching for relevant information.

5. Users can obtain advisory services by raising queries such as “what is the best car to buy this year and why?”

So what is it about Semantic Web that is likely to make the above a reality on a universal scale? Why is the World Wide Web of today not adequate anymore?

As far back as 2001, Tim Berners‐Lee, who is widely acknowledged as the inventor of the Web, and some of his colleagues visualized a world of Semantic Web where “software agents” would automatically perform sophisticated tasks for users such as:

• Going to the Web page of a particular clinic.

• Understanding and interpreting the difference among words like ‘treatment, medicine, physical therapy’ and knowing that a particular doctor works at his clinic on Mondays, Wednesdays, and Fridays.

• Combing through the doctor’s calendar and that of users and negotiating possible times for an appointment.

• Rescheduling other meetings on behalf of users, in the event of a conflict.

This is a very simple scenario; the proponents of Semantic Web are promising that in the near future Semantic Web technologies will understand (and analyse) complex user requests and process them automatically with relative ease,6 and even play the role of an (intelligent) advisor, e.g., a financial advisor.7 One of the main reasons why the current Web is inadequate for performing the aforementioned tasks is that it is set up for searching and returning the URLs of pages containing the requested information rather than for returning only the relevant data in those pages. Consequently, users end up spending a great deal of their time sifting through Web pages for relevant information. To drive home this point, let us take an example in the areas of Search and Find.

5 Today the state of content management systems is such that researchers end up wasting a great deal of their time on inefficient activities. According to an IDC study, a typical knowledge worker uses his/her time as follows: 25% in search, 22% in information gathering, 26% in analysis, 8% in content recreation, 9% in unsuccessful search, and 10% in format reconversion. For a good analysis of how and why semantic tools can go a long way in the management of unstructured information, see Fabio Rizzotto, “Quality and Value in the Management of Unstructured Information: Tools Based on Semantic Analysis,” (Sponsored by Expert System: September 2006) http://www.expertsystem.net/page.asp?id=1521 [June 2009].

6 For one such example, see the introduction in Tim Berners‐Lee, et al., “The Semantic Web: A New Form of Web Content that is Meaningful to Computers will Unleash a Revolution of New Possibilities,” (May 17, 2001) http://www.scientificamerican.com/article.cfm?id=the‐semantic‐web [June 2009].

7 John Markoff feels that "in the future, more powerful systems could act as personal advisers in areas as diverse as financial planning, with an intelligent system mapping out a retirement plan for a couple, for instance, or educational consulting, with the Web helping a high school student identify the right college." John Markoff, Op cit., P. 1. Although the scenario visualized by Markoff may seem far‐fetched today, down the road it may very well be a part of our reality as the vision of Semantic Web matures.

In a recent presentation to the PRISM Forum SIG on Semantic Web, Lee Feigenbaum reported some very interesting results. The W3C Health Care and Life Sciences (HCLS) interest group wanted to know the "genes involved in signal transduction that are related to pyramidal neurons." When the group searched for "pyramidal neurons signal transduction" on Google, it returned 223,000 hits with 0 relevant results. When it searched for "signal transduction pyramidal neurons" on the PubMed website, it returned 2,580 potential results. When the group searched for the same information in specific databases, it encountered the problem of silo applications. The interest group then proceeded to use Semantic Web technologies to receive precise answers to a complex request: Find me genes involved in signal transduction that are related to pyramidal neurons. This one query searched several databases and returned the precise results.8
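The idea of one query spanning several databases can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HCLS group's actual method or data: the two "databases", the predicate names `involved_in` and `related_to`, and the gene identifiers are invented placeholders, and plain Python sets stand in for real RDF stores and a real query language.

```python
# Each "database" exposes its facts as (subject, predicate, object) triples.
# All gene names and facts below are invented placeholders, not real biology.
PROCESS_DB = {
    ("gene:A", "involved_in", "signal transduction"),
    ("gene:B", "involved_in", "signal transduction"),
    ("gene:C", "involved_in", "cell division"),
}
NEURON_DB = {
    ("gene:A", "related_to", "pyramidal neuron"),
    ("gene:C", "related_to", "pyramidal neuron"),
    ("gene:D", "related_to", "granule cell"),
}


def subjects(triples, predicate, obj):
    """Return every subject that has the given predicate and object."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


# Because both sources use the same identifiers, they merge with a set union.
merged = PROCESS_DB | NEURON_DB

# One query over the merged data: genes that satisfy both conditions.
genes = (
    subjects(merged, "involved_in", "signal transduction")
    & subjects(merged, "related_to", "pyramidal neuron")
)
print(sorted(genes))
```

The point of the sketch is that the join only works because both sources agree on how a gene is identified; with differing identifiers in each silo, the intersection would be empty even though the underlying facts overlap.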

Understanding the limitations of the existing Web 1.0 and Web 2.0 is one thing and embracing an alternative path is quite another. There are indeed several challenges in moving from the current WWW to Web 3.0 or Semantic Web. The speed with which those challenges can be overcome depends upon the speed with which Semantic Web is embraced by everyone.

8 For details on the steps involved in obtaining precise results, see Lee Feigenbaum, VP Technology & Standards, Cambridge Semantics and Co‐chair, W3C SPARQL Working Group, “The 2009 Semantic Web Landscape: Technologies, Tools, and Projects," (For PRISM Forum SIG on Semantic Web, May 12, 2009) Slides 10‐13, http://www.slideshare.net/LeeFeigenbaum/semantic‐web‐landscape‐2009 [June 2009]. According to Shadbolt and Berners‐Lee, "Today searching Google for ‘Toyota used cars for sale in western Massachusetts under $8,000’ returns more than 2,000 general Web pages. Once Semantic Web capabilities are added, a person will instead receive detailed information on seven or eight specific cars, including their price, color, mileage, condition and owner, and how to buy them." See Nigel Shadbolt and Tim Berners‐Lee, Op cit., P.3.


Challenges and Responses

As rightly pointed out by W3C, today, “[t]here is a lot of data we all use every day, and it's not part of the Web. For example, I can see my bank statements on the [W]eb, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar? Why not? Because we don't have a web of data. Because data is controlled by applications, and each application keeps it to itself.”9 Consequently, even if we buy into the promises of Semantic Web and have all the technologies in place, the vision of Semantic Web cannot be realised so long as data is kept locked up within the confines of independent applications. Feigenbaum rightly reminds us that in order for Semantic Web to work, there needs to be:

• Agreement on common terms and relationships

• Incremental, flexible data structure

• Good‐enough modelling

• Query interface tailored to the data model10
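The W3C calendar example above, together with the first requirement in the list (agreement on common terms), can be sketched in Python as follows. The sketch assumes that three separate applications can each export their records and that all of them agree on one shared term, the calendar date; the record layouts, file names, and the `calendar_view` helper are hypothetical.

```python
from datetime import date, datetime

# Hypothetical exports from three separate applications.
photos = [
    {"file": "IMG_001.jpg", "taken": datetime(2009, 6, 14, 10, 30)},
    {"file": "IMG_002.jpg", "taken": datetime(2009, 6, 20, 18, 5)},
]
appointments = [
    {"title": "Conference talk", "day": date(2009, 6, 14)},
    {"title": "Dentist", "day": date(2009, 6, 16)},
]
bank_lines = [
    {"payee": "Conference hotel", "amount": 120.00, "day": date(2009, 6, 14)},
]


def calendar_view(photos, appointments, bank_lines):
    """Group items from all three sources under the shared key: the date."""
    view = {}
    for p in photos:
        view.setdefault(p["taken"].date(), []).append(("photo", p["file"]))
    for a in appointments:
        view.setdefault(a["day"], []).append(("appointment", a["title"]))
    for b in bank_lines:
        view.setdefault(b["day"], []).append(("payment", b["payee"]))
    return view


view = calendar_view(photos, appointments, bank_lines)
for day in sorted(view):
    print(day.isoformat(), view[day])
```

Once the data is no longer locked inside each application, showing photos and bank statement lines "in a calendar" reduces to grouping records by a term that all sources share.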

The other challenges are of a ‘how to’ nature. For instance, how to “provide a language that expresses both data and rules for reasoning about data and that allows rules from any existing knowledge‐representation system to be exported onto the Web”?11 What is the logic that should be used for tagging and merging data? How should the relationships among terms be defined? How can the authenticity of software agents be determined? How will security of data/transactions be ensured? What are the Standards that Semantic Web technologies need to adhere to? These are just a few of the considerations that need to be taken into account when embracing the Semantic Web approach.

Key Initiatives

Specifications and Recommendations

For the last 10 years W3C has been quite busy working on the Specifications/Recommendations of the technologies that are required for fulfilling the vision of Semantic Web (see Figure 1).12 Note that the technology stack emphasizes the importance of the foundational layers and the long‐term roadmap/vision of what is needed for fulfilling the vision of Semantic Web, but it does not imply any order to the research.13

9 “W3C Semantic Web Frequently Asked Questions,” Question number 1.1, http://www.w3.org/2001/sw/SW‐FAQ [June 2009].

10 Lee Feigenbaum, Op cit., Slide 14.

11 Tim Berners‐Lee, et al., Op cit., P. 2.

12 For the latest developments and Recommendations, see Ivan Herman, "What is New in W3C Land?" (June 2009) http://www.w3.org/2009/Talks/0615‐SanJose‐talk‐IH/ [June 2009].

Figure 1: Semantic Web Technology Stack14

Data with additional meaning, which allows more people (and more machines) to do more with the data

When modeling a domain, Ontology provides a shared vocabulary in terms of the type of objects and/or concepts that exist, and their properties and relations

Web Ontology Language (OWL): For describing a domain well enough to capture (some of) the meaning of resources and relationships in that domain

SPARQL Protocol And RDF Query Language (SPARQL): Languages for querying data

Rule Interchange Format (RIF): Standard representation for exchanging sets of logical and business rules

Resource Description Framework (RDF): A schema‐less data model that features unambiguous identifiers and named relations between pairs of resources

RDF Schema (RDFS): Defines terms, types, and hierarchies for describing resources

Uniform Resource Identifiers (URI) / Internationalized Resource Identifier (IRI): Foundational layer

Other Semantic Web Technologies

Gleaning Resource Descriptions from Dialects of Languages (GRDDL): for getting RDF data from XML and XHTML documents

Relational Database to RDF (RDB2RDF): for defining a standard way to map from relational databases to RDF (and SPARQL)

Linked Data: for publishing RDF data on the Web (over HTTP)

RDF in Attributes (RDFa): for allowing RDF to be embedded directly in Web pages

Protocol for Web Description Resources (POWDER): for describing groups of online resources
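How the lower layers of the stack fit together can be illustrated with a small, self‐contained Python sketch. It models RDF statements as (subject, predicate, object) triples, applies a single RDFS‐style rule (an instance of a class is also an instance of its superclasses), and answers a pattern query of the kind SPARQL supports. It is a toy under stated assumptions rather than a real RDF store or reasoner: the `ex:` names, including the clinic and the doctor, are hypothetical, and only `rdf:type` and `rdfs:subClassOf` mirror terms from the actual standards.

```python
# Triples are (subject, predicate, object). The "ex:" names are hypothetical;
# "rdf:type" and "rdfs:subClassOf" mirror the standard RDF/RDFS terms.
TRIPLES = {
    ("ex:Cardiologist", "rdfs:subClassOf", "ex:Doctor"),
    ("ex:Doctor", "rdfs:subClassOf", "ex:Person"),
    ("ex:DrRao", "rdf:type", "ex:Cardiologist"),
    ("ex:DrRao", "ex:worksAt", "ex:CityClinic"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def infer_types(triples):
    """Add rdf:type facts implied by rdfs:subClassOf until none are new."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for (inst, _, cls) in match(inferred, p="rdf:type"):
            for (_, _, parent) in match(inferred, s=cls, p="rdfs:subClassOf"):
                fact = (inst, "rdf:type", parent)
                if fact not in inferred:
                    inferred.add(fact)
                    changed = True
    return inferred


graph = infer_types(TRIPLES)
people = sorted(t[0] for t in match(graph, p="rdf:type", o="ex:Person"))
print(people)
```

Nothing in the original data states that the doctor is a person; that fact is derived from the class hierarchy, which is the kind of added meaning the schema and ontology layers are intended to supply on top of plain RDF.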

As illustrated in Figure 2, a number of key Semantic Web technologies are being developed to enable people to create data stores on the Web,15 build vocabularies, and write rules for handling data. The W3C is working with different industries to promote the use of Semantic Web technologies, and several projects are under way in Health Care and Life Sciences, eGovernment, Energy, and other areas for increasing collaboration, improving research and development, and enhancing search. For instance, Semantic Web technologies are bridging many forms of biological and medical information across institutions, thereby aiding decision‐making in clinical research.16

13 For an interesting take on this, see Valentin Zacharias, “Ban the Semantic Web Layer Cake!” http://www.valentinzacharias.de/blog/2007/04/ban‐semantic‐web‐layer‐cake.html [June 2009].

14 Source: http://www.w3.org/2007/03/layerCake‐small.png [June 2009]. See also Lee Feigenbaum, Op cit. Slides 17‐21.

15 For an interesting article on this in the context of Medicine, see Lee Feigenbaum, et al., "Boca: An Open‐Source RDF Store for Building Semantic Web Applications," http://bib.oxfordjournals.org/cgi/content/full/bbm017v1 [July 2009].

Figure 2: Semantic Web Technology Timeline17

Research Initiatives

Besides the initiatives at W3C, a number of Semantic Web collaborative research initiatives and projects are being undertaken around the world by private companies, government institutions and universities. Consider, for instance, the joint initiative between Microsoft and Science Commons, which aims to enable researchers to easily add scientific hyperlinks as semantic annotations, drawn from ontologies, to their documents and research papers;18 the various semantic projects at the Digital Enterprise Research Institute (DERI) in Ireland;19 the Turing Center at the University of Washington;20 the Kno.e.sis Center at Wright State University in Ohio;21 the Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) project at MIT;22 the Neurocommons Project at Science Commons;23 the Semantic initiatives at the University of Texas M. D. Anderson Cancer Center;24 and the CALO Project, which is aimed at developing new software that could revolutionize how computers support decision‐makers.25 Also worth mentioning is the Linked Open Data Project, which started out as a community project within the W3C Semantic Web Education & Outreach group in 2007, but has since expanded manyfold. It consists of a wealth of existing, open Web‐based data sets exposed in RDF and linked together with a growing number of publicly available SPARQL endpoints. DBpedia is probably the most important 'hub' in the project.26 This list is by no means complete but is representative of the kind of research taking place around the globe.

16 In order to have an idea about the adoption of Semantic Web technologies across industry verticals, go to http://www.w3.org/2001/sw/sweo/public/UseCases/ [June 2009].

17 Source: Lee Feigenbaum, Op cit., Slide 21.

18 For further details, go to http://sciencecommons.org/resources/funder/dispatches/first‐quarter‐2009/ [June 2009].

19 For a list of projects, go to http://www.deri.ie/research/projects/ [June 2009].

20 The Turing Center encourages interdisciplinary research. For details, go to http://turing.cs.washington.edu/ [June 2009].

Furthermore, virtually every major university in North America and Europe is offering Web Sciences programmes and courses. In 2006 the University of Southampton in the UK and MIT announced a multidisciplinary collaborative research initiative to study the Web. This collaboration brings together academics, scientists, sociologists, entrepreneurs, and decision makers from around the world to look at the Web holistically.27 Emphasizing the significance of the 'multidisciplinary' aspect of the Web, Tim Berners‐Lee said, “[T]he Web was built upon principles of universality. So any person on any device should be able to make use of any kind of data and access any kind of information. Our stated goal is to lead the Web to its full potential – that’s not a place but a direction, and our work is focused on trying to find the right direction.”28 Today, several countries in Europe are part of the European Academy for Semantic‐Web Education (EASE).29

21 The research at this Center is concentrated around the use of semantic and services science for data integration, analysis, and process management. For further details, go to http://knoesis.org/ [June 2009].

22 The main aim of the SIMILE project at MIT is to enhance interoperability among digital assets, schemata/vocabularies/ontology, metadata, and services. For details, go to http://simile.mit.edu/ [June 2009].

23 The Neurocommons project focuses on three areas: data integration, text mining, and analytical tools. For more information, go to http://sciencecommons.org/projects/data/nc_technical_overview [June 2009]. The overall focus of Science Commons is to accelerate the research cycle: the continuous production and reuse of knowledge that is at the heart of the scientific method.

24 Lynn Vogel, VP and CIO of the University of Texas M. D. Anderson Cancer Center, gave an interview to PricewaterhouseCoopers on the potential benefits of and challenges in adopting Semantic Web technologies to bridge the gap between clinical and research data. For details of that interview, see Alan Morrison, et al., "How the Semantic Web Might Improve Cancer Treatment," http://www.pwc.com/extweb/pwcpublications.nsf/docid/C2597E0D5F39788B852575BA006354AD [June 2009]. On the subject of Semantic Web in the biopharma and biomedicine world, see Huajun Chen, et al., "Introduction to Semantic e‐Science in Biomedicine”; Alan Ruttenberg, "Advancing Translational Research with the Semantic Web," (May 9, 2007) http://www.biomedcentral.com/1471‐2105/8/S3/S1 [June 2009].

25 The Defense Advanced Research Projects Agency (DARPA) has awarded SRI International three years of a five‐year contract to develop an enduring Personalized Assistant that Learns (PAL). The program responds to DARPA’s New Cognitive Systems Vision. The project is bringing together leading computer scientists and researchers in artificial intelligence, machine learning, natural language processing, knowledge representation, human‐computer interaction, flexible planning, and behavioral studies. For further details, go to http://caloproject.sri.com/ [June 2009].

26 For details, go to http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/ [June 2009].

In spite of all the above‐mentioned developments, it is widely acknowledged that at the present time the skills required for implementing the Semantic Web vision are fairly scarce.30 Software engineers who implement the Semantic Web vision need to, at the least,

• Have an interdisciplinary awareness in order to understand the nuances of the context of terminologies.

• Be well versed in Ontology Modeling and in order to accurately express the meaning of words and the contextual relationship among them.

• Possess working knowledge of

o Semantic Web technologies such as RDF, SPARQL, OWL, SKOS,31 and others over and above those mentioned in Figure 1

o Natural Language Processing (NLP)

o Semantic Web data modeling

o Semantic Web architecture

• Be knowledgeable about specific domains

While most companies and organizations are likely to wait and see how the Semantic Web trend shapes up in the near future, some have already started building Semantic Web capabilities, and are in fact becoming trend setters in their respective domains.

27 See "Southampton and MIT launch Web Science Collaboration," http://www.ecs.soton.ac.uk/about/news/1047 [June 2009].

28 Adwoa Gyimah‐Brempong, "W3C: Maintaining a Web of Humanity," (June 19, 2009) http://www.csail.mit.edu/csailspotlights/feature6 [June 2009].

29 For more details, go to http://www.semantic‐web‐academy.eu/home.html [June 2009].

30 See, for instance, Jennifer Zaino, "Sharpen Your Skills for the Semantic Web," (September 11, 2007) http://www.intranetjournal.com/articles/200711/ij_11_09_07a.html [June 2009].

31 Simple Knowledge Organization System (SKOS) is similar to OWL but used mainly for implementations that do not require the sophisticated features of OWL.


Do We Have Any Takers?

The first to get excited about Semantic Web were the intellectuals at universities and private institutes, and W3C members. Of late, many start‐ups and big corporations that understand the value of Semantic Web are looking at developing Semantic Web technologies and applications in a serious way. Since the development and deployment of the Semantic Web vision is just under way, it is not surprising that almost all the research pertaining to the potential market for Semantic Web is based on the pace of adoption of Web technologies. According to recent research by Mills Davis, the potential market for Semantic Web technologies (based on current developments) is expected to be in the billions in about a decade or so, and later in the trillions. His prediction is that public and private R&D on Semantic Web technologies will exceed $8 billion US by 2010 and that the market for products as well as services developed using Semantic Web technologies will reach $52.4 billion US by 2010.32

While it may not be easy to predict the potential market size and revenue with accuracy, given that Semantic Web is in its infancy, there is no doubt that the use of Semantic Web technologies by private and public organizations has been on the rise across business domains, including Health Sciences, Life Sciences, Education, Tourism, Oil, Gas and Energy, Telecom, IT, Finance, and Automobiles, to mention only a few. Current implementations of Semantic Web technologies to address specific problems bear ample testimony to the fact that irrespective of domain, geography, and organization, Semantic Web technologies not only improve search results but also increase the extent to which data can be shared and re‐used.33 Since the business benefits are real, there is a growing list of vendors who are using Semantic Web technologies to create offerings in a number of areas (see Table 1).

Until a couple of years ago the general impression was that making Semantic Web a reality would be very complex, because the then widespread assumption was that it would be necessary to use all the (ontology, reasoning, tagging, and other) tools. However, as Ivan Herman rightly reminds us, that is not so. The increasing use of Resource Description Framework (RDF) by numerous applications has changed people’s perception about implementing Semantic Web.34 Over the last couple of years, a few complementary technologies have emerged to take us closer to the vision of Semantic Web, thanks to the initiatives of W3C.35

32 Mills Davis, Op cit. See also “Market Potential,” http://www.softwaremind.pl/en/Market_Potential [June 2009].

33 In this regard, see the Case Studies and Use Cases that are listed at http://www.w3.org/2001/sw/sweo/public/UseCases/ [June 2009].

34 See a transcript of the interview Herman gave to Journal du net “Le Web sémantique ne tient plus de la science‐fiction,” (June 9, 2006) http://www.journaldunet.com/developpeur/itws/060608‐itw‐w3c‐herman.shtml [June 2009].

35 For further details, see “W3C Semantic Web Activity Publications: Specifications (Recommendations and Notes),” http://www.w3.org/2001/sw/Specs.html [June 2009].


Current Initiatives

Besides the initiatives of W3C, there are numerous Semantic Web commercial projects already underway. As shown in Table 1, vendors who see a serious future for Semantic Web have already started adding Semantic Web offerings in a number of areas. The list in this table is by no means exhaustive but is representative of the developments under way among vendors.

Table 1: Semantic Technology Vendors36

Services

Company Solution Middleware NLP Database Platform Ontology Search Consumer Web Services Developer Web

Aduna37 ✓ ✓ ✓

The Calais Initiative38 ✓ ✓

Cambridge Semantics39 ✓ ✓

Dow Jones Client Solutions40 ✓ ✓ ✓

Expert System41 ✓

36 Source: David Provost, “On the Cusp: A Global Review of the Semantic Web Industry,” (September 30, 2008) P. 7, http://davidjprovost.typepad.com/my_weblog/2008/09/index.html [June 2009].

37 Sesame is Aduna’s open source Semantic database, which is regarded as the cornerstone of Semantic Web. For further details on Sesame as well as tools, libraries, and frameworks for Semantic Web solutions, go to http://www.aduna‐software.com/home/overview.view [June 2009].

38 The Calais Initiative is sponsored by Thomson Reuters and it comprises several tools, the NLP engine being the main one, for extracting content from unstructured text and then processing it in a meaningful manner. For further details, go to http://www.opencalais.com/about [June 2009].

39 Anzo for Excel is a versatile collaborative tool from Cambridge Semantics for building and editing ontologies, collecting information, and linking to other (in fact any) spreadsheets on a network or on the Web. Since it allows users to enter and/or edit data at source level, all changes and updates are available almost immediately. For further information on Anzo for Excel, go to http://www.cambridgesemantics.com/home [June 2009].

40 Synaptica is a vocabulary management solution from Dow Jones Client Solutions. It allows organizations to build as well as manage enterprise taxonomies, thesauri, name catalogs and other authority files, and to add structure and value to existing information. For more details, go to http://www.synaptica.com/djcs/synaptica/ [June 2009].

Franz42 ✓

Mondeca43 ✓

Ontoprise44 ✓ ✓ ✓ ✓ ✓ ✓ ✓

Ontos45 ✓ ✓ ✓

OpenLink Software46 ✓ ✓ ✓

Primal Fusion47 ✓

41 Cognito from Expert System S.p.A supports a range of NLP activities such as search, content categorization, and analysis and extraction from unstructured information and texts in an intelligent manner. For more details, go to http://www.expertsystem.net/page.asp?id=1521 and http://www.expertsystem.net/page.asp?id=1515&idd=200 [June 2009].

42 AllegroGraph is a database and application framework from Franz for building Semantic Web applications. It can store data and meta‐data as triples (subject, predicate, and object); query these triples through various query APIs such as SPARQL and Prolog; and apply RDFS++ reasoning with its built‐in reasoner. For further information, go to http://www.franz.com/agraph/allegrograph/ [June 2009].

43 ITM T3 is a platform from Mondeca. It supports a range of integrated “functions for managing enterprise shareable multilingual domain‐specific taxonomies, thesaurus, and terminologies.” ITM T3 is already being used by a number of large international groups for managing reference material. For further details, go to http://www.mondeca.com/index.php/en/applications/itm_t3 [July 2009].

44 Ontoprise has a range of Semantic products and solutions that are fairly well advanced and have a track record of success with major customers. For complete details, go to http://www.ontoprise.de/en/home/ [June 2009].

45 Ontos has two Semantic products: i) OntosMiner, which analyses documents on the basis of linguistic rules; and ii) LightOntos for Workgroups, which manages documents including their annotations, and contains several possibilities for analysis and investigation. For more information on both products as well as on the company's upcoming Ontos API, go to http://www.ontos.com [June 2009].

46 Virtuoso is a key Semantic product from OpenLink Software. It enables end users, systems architects, systems integrators, and developers to interact with data at a conceptual as opposed to the traditional logical level. For more details on how that is done, go to http://www.openlinksw.com [June 2009].

47 Primal Fusion has an application that will allow it to provide what it calls a "thought networking" service in the very near future. What does that mean? In Primal Fusion’s own words, “[it] helps you take control of the Internet. Make your thoughts concrete so computers can understand what you're thinking about…Primal Fusion helps you quickly pull together information about broad subjects from several popular sources on the Web. It does this by first summarizing related thoughts about a subject of interest, then encouraging you to make selections that describe how you think about the subject. Armed with this knowledge, Primal Fusion can act on your thoughts to help you get stuff done online.” In the Semantic world, Primal Fusion’s ‘thought’ can be deployed as a query. For further details, go to http://www.primalfusion.com/ [June 2009].

Saltlux48 ✓ ✓ ✓

Sindice49 ✓

Thetus50 ✓ ✓

TopQuadrant51 ✓ ✓ ✓

48 Insider Information (IN2) is a platform from Saltlux that is at the heart of its capabilities in NLP, ontology, and . For more details, go to http://www.saltlux.com/en/ and http://semanticwiki-en.saltlux.com/index.php/IN2%20Platform%20Overview [June 2009].

49 Sindice is a research project of the Digital Enterprise Research Institute (DERI) in Galway, Ireland, which is considered to be the world’s largest institute for Semantic Web research. Sindice offers a number of APIs for locating and accessing data published on the Web. For more information, go to http://sindice.com/ [June 2009].

50 Thetus Publisher from Thetus leverages Semantic Web technologies for creating, managing and evolving semantic knowledge models. A broad range of government and industry sectors depend on Thetus infrastructure software to automate information discovery, perform robust data extraction and produce contextual and relevant knowledge. For further details, go to http://www.thetus.com/index.html [June 2009].

51 TopBraid Suite from TopQuadrant leverages Semantic Web technology to help customers connect silos of data, systems and infrastructure and build flexible applications from linked data models. All components of the suite work within an open architecture platform built specifically to implement W3C standards for integration and combination of data drawn from diverse sources. For more details, go to http://www.topquadrant.com/products/TB_Suite.html [June 2009].

Services

Company Solution Middleware NLP Database Platform Ontology Search Consumer Web Services Developer Web

Twine/Radar Networks52 ✓

Yahoo/SearchMonkey53 ✓ ✓

Semantic Web in Health Sciences/Care and Life Sciences

For a long time, researchers in Health Sciences/Care and Life Sciences (henceforth Health and Life Sciences) made very little use of IT solutions, largely because they were expected to adapt to IT ways of addressing problems rather than the other way around. With Semantic Web, however, there is less resistance; in fact, most of the Health and Life Sciences domains are embracing Semantic Web technologies, and doing so in a great hurry, primarily because those technologies address a major challenge in those domains. Where scientists, researchers and other professionals in Health and Life Sciences once felt that embracing IT solutions was just not worth their time and effort, they now see in Semantic Web technologies a solution to one of their greatest pain points: distributed databases and formats.

In light of the above, W3C established the Semantic Web for Health Care and Life Sciences Interest Group (HCLSIG) to help organizations in their adoption of the Semantic Web.54 On the basis of the Semantic Web specifications from W3C, many Health and Life Sciences organizations have developed a number of applications and tools. Take, for instance, the prototype infrastructure for integrated data management and analysis, developed jointly by a number of scientists;55 GoPubMed from Transinsight GmbH;56 LEAD™ by scientists at the Winship Cancer Institute, School of Medicine at Emory University in Atlanta (USA);57 and TCM Search for integrating traditional Chinese medicine data from a wide variety of sources and for improving search at the China Academy of Chinese Medicine Sciences,58 just to name a few.

52 Twine is a Web application from Radar Networks and has been built using Semantic Web technologies. Its target audience is people who wish to keep track of their interests. One can use Twine to collect any online content, collate it according to topics, and then share that information with anyone. Twine also has the ability to automatically learn about one’s interests and makes connections and recommendations accordingly. For more information, go to http://www.twine.com/about [June 2009].

53 SearchMonkey from Yahoo is a Semantic tool for developers and site owners who can use structured data to make Yahoo! Search results more useful and visually appealing. For further information, go to http://developer.yahoo.com/searchmonkey/# [June 2009]. Also visit http://gallery.search.yahoo.com/ to see a list of Yahoo customers already using SearchMonkey for enhanced search results.

54 For full details on W3C’s activities in the Health and Life Sciences domains, go to http://esw.w3.org/topic/HCLS/WWW2008, http://www.w3.org/2001/sw/hcls/, and http://www.w3.org/TR/hcls-kb/ [June 2009].

Major pharmaceutical companies are also using Semantic Web technologies to develop applications and tools to address key challenges they encounter during drug discovery. In the words of Susie Stephens of Eli Lilly:

Pharmaceutical researchers need to be able to have access to an integrated view of all of their data in order to be able to make effective decisions as to which drug targets and compounds to pursue. Companies want to minimize costly late‐stage attrition by identifying and eliminating drugs that do not have desirable safety profiles or sufficient efficacy as early on as possible. The need for effective data integration has become stronger as the cost of drug discovery and development has soared to over $1 billion… Data commonly originates in different departments in which varying terminologies are used. Further, the data itself is very heterogeneous in nature, and consists of data types that include electronic patient records, chemical structures, biological sequences, images, biological pathways, and scientific papers. Many companies have attempted to create data warehouses that contain all of this data, but many have found this approach lacking the flexibility required within a scientific discipline. Consequently, many pharmaceutical companies are exploring the use of the Semantic Web for data integration.59

55 Currently being tested by the University of Texas Lung Cancer SPORE. For details, see Helena F. Deus, et al., "A Semantic Web Management Model for Integrative Biomedical Informatics," http://www.plosone.org/article/info:doi/10.1371/journal.pone.0002946, [June 2009]. See also, http://www.s3db.org/ [June 2009].

56 GoPubMed is a search engine for querying the database at the National Library of Medicine, USA and for visualizing the results according to terms from the Gene Ontology and Medical Subject Headings. To use the search engine go to http://gopubmed.org/ [June 2009].

57 Lymphoma Enterprise Architecture Data‐system™ (LEAD™) integrates the pathology, pharmacy, laboratory, cancer registry, clinical trials, and clinical data from institutional databases. For a detailed report, see Taoying Huang, et al., "Development of the Lymphoma Enterprise Architecture Database: A caBIG(tm) Silver Level Compliant System," (April 3, 2009) http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2675136 [June 2009].

58 For further details, see Zhaohui Wu, et al., "Case Study: Semantic‐based Search and Query System for the Traditional Chinese Medicine Community," (May 2007) http://www.w3.org/2001/sw/sweo/public/UseCases/UniZheijang/ [June 2009].
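The integration problem Stephens describes, with data originating in departments that use varying terminologies, can be made concrete with a minimal sketch in Python. It assumes two hypothetical departmental record sets that name their fields differently and a hand‐written mapping onto a shared vocabulary; the field names, the `ex:` identifiers, and the helper `to_triples` are illustrative assumptions, not details of Eli Lilly's system or of any W3C specification. Each record is flattened into subject, predicate, object statements, which is the shape in which RDF represents data.

```python
# Hypothetical records from two departments that describe the same compound
# with different field names (all names and values are illustrative).
chemistry_records = [
    {"compound_id": "C-001", "mol_weight": 180.16},
]
clinical_records = [
    {"drug": "C-001", "adverse_event": "nausea"},
]

# Assumed mapping from each department's local terms to a shared vocabulary.
TERM_MAP = {
    "mol_weight": "ex:molecularWeight",
    "adverse_event": "ex:adverseEvent",
}


def to_triples(records, id_field):
    """Turn each record into (subject, predicate, object) triples."""
    triples = set()
    for record in records:
        subject = "ex:" + record[id_field]
        for field, value in record.items():
            if field == id_field:
                continue
            # Fall back to a generated predicate when no mapping is defined.
            predicate = TERM_MAP.get(field, "ex:" + field)
            triples.add((subject, predicate, value))
    return triples


# Both sources now share identifiers and predicates, so combining them
# is a plain set union rather than a schema redesign.
merged = to_triples(chemistry_records, "compound_id") | to_triples(
    clinical_records, "drug"
)

for triple in sorted(merged, key=str):
    print(triple)
```

The point of the sketch is the flexibility argument in the quotation: adding a third department means adding records and mapping entries, not restructuring a warehouse schema.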

It is, therefore, not surprising that many biopharma firms are using applications and tools based on Semantic Web technologies in drug discovery and related activities. In this regard, it is worth mentioning the development of Ultralink for text‐mining and knowledge management at Novartis,60 a proof of concept (PoC) for data integration from multiple sources at Merck,61 a full range of data integration products and services at Semantic Laboratories,62 the Drug Ontology Project for Elsevier (DOPE) for data integration and improving search as well as content discovery,63 SemanticDB for integrating clinical research data at Cleveland Clinic,64 and the Drug Target Assessment Tool for evaluating and prioritizing drug targets at Eli Lilly.65 This is just a sample of a much larger body of work, most of which lies outside the scope of this paper.

59 Susie Stephens, "Case Study: Prioritization of Biological Targets for Drug Discovery," (October 2007) http://www.w3.org/2001/sw/sweo/public/UseCases/Lilly/ [June 2009].

60 Ultralink is a powerful text‐mining tool and knowledge management platform. It provides a framework for data / knowledge integration that enables researchers to derive high quality information from texts and data, as well as retrieve, link, synthesize, analyse, infer and interpret Life Sciences data. For further details, see Therese Vachon, "Bridging Knowledge Gaps in Drug Discovery with Semantic Technologies," Presentation at the Conference on Semantics in Healthcare and Life Sciences (Cambridge/Boston MA, February 26 2009) http://www.iscb.org/cms_addon/conferences/cshals2009/presentations/VachonTFeb09BridgingKnowledgeGaps.pdf [June 2009]. See also, Manuel C. Peitsch, "How and Why Novartis is Exploiting GRID Technology," Slides 24‐37, http://www.hpts.ws/papers/2005/HPTS_Presentations/Day%202/GRID%20Computing%20in%20Drug%20Discovery%20-%20HPTS%20-%20%20Sept%2027,%202005.ppt [June 2009].

61 For details see Jaime Melendez, "Exploring Semantic Web Technologies within Research and Development," Presentation at the Conference on Semantics in Healthcare and Life Sciences (Cambridge/Boston MA, February 26 2009) http://www.iscb.org/cms_addon/conferences/cshals2009/presentations/MelendezJFeb09.pdf [June 2009].

62 This integration allows drug discovery research teams to access and integrate diverse data resources and retrieve data to perform broad domain‐specific meta‐analysis using experiment data stored in corporate databases and available from public domain repositories. For more information, go to http://www.semanticlaboratories.com/Default.htm [June 2009].

63 For complete details, see Anita de Waard, et al., "Use Case: Drug Ontology Project for Elsevier (DOPE)," (July 2007) http://www.w3.org/2001/sw/sweo/public/UseCases/Elsevier/ [June 2009].

64 Chimezie Ogbuji, et al., "Semantic Web Content Repository for Clinical Research," (October 2007) http://www.w3.org/2001/sw/sweo/public/UseCases/ClevelandClinic/ [June 2009].

65 For further details, see Susie Stephens, Op cit.

The Point Is…

Today Semantic Web has many fans not just in fantasy land but also in well‐known research organizations, companies, and government circles, thanks to the standardization effort of W3C and to the availability of a wide range of commercial and open‐source interoperable tools and systems that have been developed using Semantic Web technologies.66 Moreover, many “Enterprise Semantic Web projects are beginning to move beyond proofs of concept to serious production implementations, and community projects on the World Wide Web have linked hundreds of public data sets into an emergent Semantic Web.”

Researchers and businesses have long been plagued by problems arising from multiple sources and formats of information and from the inability to quickly query data across multiple databases. Semantic Web technologies offer ways of addressing those problems successfully and of increasing collaboration.
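What querying across multiple databases can look like once the data shares one representation is sketched below in Python. In practice this role is played by SPARQL over RDF stores; the sketch only imitates the idea of matching statement patterns and joining the results. The two sources, the `ex:` identifiers, and the functions `match` and `drugs_for_diagnosis` are hypothetical and exist only for illustration.

```python
# Two hypothetical sources, already expressed as (subject, predicate, object)
# statements. All identifiers and values are illustrative.
registry_db = {
    ("ex:patient1", "ex:diagnosis", "ex:lymphoma"),
    ("ex:patient2", "ex:diagnosis", "ex:asthma"),
}
pharmacy_db = {
    ("ex:patient1", "ex:prescribed", "ex:drugA"),
    ("ex:patient2", "ex:prescribed", "ex:drugB"),
}


def match(triples, pattern):
    """Yield the triples that fit a pattern; None acts as a wildcard."""
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            yield triple


def drugs_for_diagnosis(sources, diagnosis):
    """Join across sources: drugs prescribed to patients with a diagnosis."""
    graph = set().union(*sources)  # one graph, regardless of origin
    patients = {s for s, _, _ in match(graph, (None, "ex:diagnosis", diagnosis))}
    return sorted(
        o for s, _, o in match(graph, (None, "ex:prescribed", None)) if s in patients
    )


print(drugs_for_diagnosis([registry_db, pharmacy_db], "ex:lymphoma"))
```

Neither source alone can answer the question; the answer appears only because both describe the same patient with the same identifier.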

Unlike established technologies, which have a presence in the market and a competitive landscape of buyers and suppliers, Semantic Web technologies are nowhere near that point today. So, depending on whether an organization sees the glass as half‐full or half‐empty, it may or may not take advantage of them. In a world as competitive as ours, technologies that can be used to build tools and applications, which in turn can act as business differentiators, are worthy of our attention; Semantic Web technologies fall in that category.

The world of Semantic Web will come to fruition; all the available evidence is pointing in that direction. What remains to be seen is how fast that will happen.

66 For a list of Semantic Web tools available today, go to http://esw.w3.org/topic/SemanticWebTools [July 2009].

Acronyms

TERMINOLOGY DESCRIPTION

EASE European Academy for Semantic‐Web Education

HCLS Health Care and Life Sciences

NLP Natural Language Processing

RDF Resource Description Framework

RDFa RDF in Attributes

RDFS RDF Schema

RDF2RDB RDF to Relational Data Base

OWL Web Ontology Language

GRDDL Gleaning Resource Descriptions from Dialects of Languages

SPARQL SPARQL Protocol and RDF Query Language

RIF Rule Interchange Format

POWDER Protocol for Web Description Resources

URI Uniform Resource Identifier

IRI Internationalized Resource Identifier

SKOS Simple Knowledge Organization System
