SPARQL Assist Language-Neutral Query Composer
Total Page:16
File Type:pdf, Size:1020Kb
McCarthy et al. BMC Bioinformatics 2012, 13(Suppl 1):S2 http://www.biomedcentral.com/1471-2105/13/S1/S2 RESEARCH Open Access SPARQL Assist language-neutral query composer Luke McCarthy, Ben Vandervalk, Mark Wilkinson* From Semantic Web Applications and Tools for Life Sciences (SWAT4LS) 2010 Berlin, Germany. 10 December 2010 Abstract Background: SPARQL query composition is difficult for the lay-person, and even the experienced bioinformatician in cases where the data model is unfamiliar. Moreover, established best-practices and internationalization concerns dictate that the identifiers for ontological terms should be opaque rather than human-readable, which further complicates the task of synthesizing queries manually. Results: We present SPARQL Assist: a Web application that addresses these issues by providing context-sensitive type-ahead completion during SPARQL query construction. Ontological terms are suggested using their multi- lingual labels and descriptions, leveraging existing support for internationalization and language-neutrality. Moreover, the system utilizes the semantics embedded in ontologies, and within the query itself, to help prioritize the most likely suggestions. Conclusions: To ensure success, the Semantic Web must be easily available to all users, regardless of locale, training, or preferred language. By enhancing support for internationalization, and moreover by simplifying the manual construction of SPARQL queries through the use of controlled-natural-language interfaces, we believe we have made some early steps towards simplifying access to Semantic Web resources. Background SIO_010302 is considerably harder to glean without The health care and life science sectors have been some looking up its ontological definition. This becomes par- of the most enthusiastic adopters of Semantic Web ticularly acute in the context of SPARQL queries. While technologies. The benefits of the RDF/OWL data model often the subject and object portions of a SPARQL are well-understood by bioinformaticians who have too WHERE clause contain variables, and therefore are long had to deal with the problem of integrating data inherently human-readable, the predicates are usually from multiple sources with wildly different underlying explicitly specified in the query. Thus while: schema.Thesebenefitsarelessobvious,however,to clinicians and researchers who merely see one myster- ?gene SIO:is_homologous_to ?gene ious query language (SQL) exchanged for another (SPARQL). Even a Semantic Web-savvy informatician is quite clear to both the query composer and the can be daunted when faced with the challenge of query- human reader, the opaque equivalent ing an unfamiliar data source whose particular RDF vocabulary is initially unknown. ?gene SIO:010302 ?gene The issue is compounded by the growing use of opa- que, semantic-free URIs for ontological classes and is effectively meaningless to the reader, and absurdly properties (OBO [1], SIO [2], CWA [3]). While the difficult for the query composer to remember. Neverthe- meaning of rdf:type or dc:title is relatively clear to the less, there are many valid reasons for designing ontolo- human reader, the meaning of, for example, sio: gies using opaque identifiers, not the least of which is language neutrality. RDF/XML provides built-in language neutrality by way * Correspondence: [email protected] Providence Heart + Lung Institute at St. Paul’s Hospital, University of British of the xml:lang attribute; an ontology can therefore Columbia, Room 166 - 1081 Burrard St., Vancouver, BC, Canada V6Z 1Y6 © 2011 McCarthy et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http:// creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. McCarthy et al. BMC Bioinformatics 2012, 13(Suppl 1):S2 Page 2 of 9 http://www.biomedcentral.com/1471-2105/13/S1/S2 easily be internationalized by providing multiple rdfs: properties, individuals and namespaces that have been label or rdfs:comment properties with appropriate xml: indexed in JSON format. It can also be deployed with a lang attributes. However, even those projects who have, server-side Java component that will index OWL ontolo- in principle, adopted language neutrality for their classes gies directly. These ontologies may be parsed a priori as (e.g. OBO), most have not done so for their properties part of the SPARQL Assist installation/configuration, or (OBO Relationship Ontology [4]). This is no-doubt due, maybereferredtointheFROMclauseofaSPARQL at least in part, to the already discussed difficulty of query, in which case they are parsed dynamically at composing SPARQL queries in which predicates have query time. In the future, as much computation as pos- opaque identifiers. Nevertheless, it is crucial that we do sible will be transferred to the client side to improve not allow convenience to direct the development of a both performance and flexibility of deployment, but at core global resource (the Semantic Web) and thus the present the limited support for OWL in JavaScript problem should be solved at the level of the tools pro- requires a server-side component for ontology parsing. vided, rather than the resources themselves. Our initial implementation of SPARQL Assist was SPARQL Assist is a web application that facilitates the undertaken in the context of creating queries that will construction of SPARQL queries by providing context- be resolved by the Semantic Health and Research Envir- sensitive type-ahead completion. In addition to assis- onment (SHARE [5]), and we present here both the tance with basic syntax, ontological terms are indexed core SPARQL Assist software, as well as the extension by their labels, using the xml:lang attribute to record specific to SHARE. SHARE is an advanced SPARQL the language of each label for each term. Ontological query client built on top of the SADI Framework [6] for terms and their labels, are read on-the-fly from any Semantic Web Services. SADI services attach properties ontology specified in a SPARQL FROM clause, but to input OWL instances and these services are indexed SPARQL Assist can also be configured to pre-load in a central registry based on the properties they attach. terms from particular ontologies or SPARQL endpoints, SHARE maps the triple patterns presented in the reducing the burden on the user to know the existence WHERE clauses of a SPARQL query onto these indexed and location of all relevant vocabularies. properties, in order to discover SADI Web Services cap- The entire query, as it is being constructed, is used to able of generating the required triples. The RDF data provide context for the type-ahead suggestions. Pre- required to answer a given query is thus dynamically viously declared variables or known individuals (i.e. generated through the invocation of SADI services in values) are suggested during type-ahead for the subject response to the query being posed. or object position of the WHERE clause. In the predi- In the context of SPARQL Assist, the fact that proper- cate position, indexed ontological predicates are sug- ties do not exist at query composition time might be gested, with preference given to properties that an considered a barrier, since there is no pre-existing RDF individual is known to have if this information is avail- Store to inspect for candidate properties and individuals. able from ontological indexing and/or other query To compensate for this, the SADI extension to SPARQL clauses. Similarly, if a clause contains a variable that can Assist uses the SADI registry, in addition to any loaded ultimately be connected to a known individual in ontologies, to suggest properties to be used in a query. another part of the query, that connection is used to As in the generic case, if a clause contains a named find the most likely properties in the current clause. individual or a variable previously connected to an indi- Together, these functionalities should simplify the vidual, this information is used to further refine the sug- manual construction of SPARQL queries by (a) making gestions; in this case by filtering out properties the query more similar to natural language, and (b) sup- generated by services that cannot accept a particular porting any language in which the required terms have individual as input and highlighting properties generated been labeled. Moreover, insodoing,SPARQLAssist by services that can. Thus, not only is the lack of a static also provides ontology designers more freedom to follow triple store not a barrier to query composition, but the best-practices in ontology design and internationaliza- ability to construct a likely-successful query is, in fact, tion by reducing the barrier to query construction using enhanced by utilization of the underlying SADI opaque terms. SPARQL Assist thus represents one alter- infrastructure. native, alongside visual query builders and faceted brow- Although it was designed with SHARE in mind, sers, for helping unfamiliar users explore semantic data. SPARQL Assist can be used with any SPARQL endpoint that can be queried from client-side JavaScript. Due to Implementation the limitations of cross-domain JavaScript requests, the SPARQL Assist is implemented in JavaScript (jQuery) endpoint must reside at the same domain name as the and is intended to be accessed through a browser. In its SPARQL Assist query form, or the endpoint must be simplest configuration,