Brede Tools and Federating Online Neuroinformatics Databases∗
Finn Årup Nielsen

May 14, 2013

∗The final publication is available at link.springer.com: http://dx.doi.org/10.1007/s12021-013-9183-4

Abstract

As open science neuroinformatics databases, the Brede Database and Brede Wiki seek to make distribution and federation of their content as easy and transparent as possible. The databases rely on simple formats and allow other online tools to reuse their content. This paper describes the possible interconnections on different levels between the Brede tools and other databases.

1 Introduction

The concept of open science entails free access to data and methods. In neuroinformatics, open science would allow federation of databases, so researchers can make queries across data sets and ontologies.

Open science represents the first step, enabling data sharing. As the next step we would like to query across the different databases, and in a further step we would like to use the data across multiple databases in statistical computations so that a meta-analytic consensus can emerge. For neuroinformatics these last two steps are hindered by the plethora of different data formats, brain atlases and terminology, and by the division of data between several databases, making even the discovery of relevant resources difficult. We would like the databases to expose their data in both human-readable and machine-readable format. With a machine-readable format neuroinformaticians can work with data en masse and merge the data across databases. However, even with open science data in a machine-readable format one still has to match and link heterogeneous data in different formats. This is where integrative neuroinformatics tools come into play. For information retrieval these tools should have an understanding of concepts rather than just keywords (Gupta et al, 2008).

Several tools have been described for integrating or federating neuroscience databases (Gupta et al, 2008; Ashish et al, 2010; Cheung et al, 2009). One of the major database federation efforts is the Neuroscience Information Framework (NIF), which uses the Neuroscience Information Framework standardized (NIFSTD) ontology (Bug et al, 2008). With this ontology NIF performs term expansion from a user query. The expanded query is translated to queries for the different source databases, and through a data mediator the queries are sent to these external databases and the results aggregated. The system relies on tools, e.g., for registering the schema of the external database and for full text search. The complete system may search on web pages, databases, Extensible Markup Language (XML) and other documents.

One way to federate databases is through the so-called Semantic Web. First prominently described by Berners-Lee et al (2001), its community has now established a number of technologies around the concept, e.g., triple stores, Resource Description Framework (RDF), Notation3 (N3) and the Web Ontology Language (OWL). Building and using actual Semantic Web databases has gained momentum, especially with the so-called Linked Data approach, and the Linking Open Data cloud (http://lod-cloud.net) organizes many open data sets. IBM Watson DeepQA has recently shown the powerful applicability of the Semantic Web as a part of a system for general question answering in the televised game Jeopardy, where the system operated on "structured and semi-structured knowledge available from for example the Semantic Web" (IBM, 2012). Among more than 100 techniques, DeepQA relied on the Semantic Web resources DBpedia and the Yago ontology (Ferrucci et al, 2010). Semantic Web technologies have not yet taken a strong foothold in neuroinformatics, but some projects have used them: for example, the Cognitive Paradigm Ontology (CogPO), based on the BrainMap taxonomy, expresses its components in the Semantic Web OWL format (Turner and Laird, 2011). Other neuroinformatics efforts, AlzPharm (Lam et al, 2007), Cognitive Atlas (Poldrack et al, 2011) and the NIFSTD ontology with NeuroLex (Larson et al, 2009), also apply Semantic Web technologies.

One of the software packages that allows users to collaboratively construct Semantic Web resources online is the Semantic MediaWiki extension, which extends MediaWiki-based wikis so that wiki links can be typed and semantic information queried (Krötzsch et al, 2006). Further extensions on top of Semantic MediaWiki allow for, e.g., form-based input and import and export of data in comma-separated values. Semantic MediaWiki has been embraced in neuroinformatics with NeuroLex and ConnectomeWiki (Gerhard et al, 2010).

The Semantic Web is no silver bullet for neuroinformatics. Data may not align well with the framework provided by the Semantic Web. For instance, central to neuroimaging are the volume files with associated image processing and statistical analysis. Such files can be linked and described by the Semantic Web, but it does not make specialized data analysis methods or image-based queries available. In some cases neuroscience data are conveniently described in tabular form, and indeed many current neuroinformatics databases use relational databases where mature database engines provide complex query facilities. Sufficiently complex queries required for scientific data may simply not be available in the Semantic Web query language SPARQL (Gray et al, 2009). Furthermore, for neuroimaging results we found that Semantic MediaWiki does not provide sufficient functionality for implementing the statistical analysis needed for meta-analysis (Nielsen et al, 2012).

The following sections first introduce the Brede Database and Wiki and then describe how the neuroinformatics databases link up with other databases.

2 Brede tools

The Brede Database (http://neuro.compute.dtu.dk/services/brededatabase/) (Nielsen, 2003) contains data from 186 published neuroimaging articles that include stereotaxic coordinates. In a fashion inspired by the BrainMap database, the Brede Database structures data from each article into one or multiple experiments, each of which may contain one or more stereotaxic coordinates. There are 586 experiments, and the Brede Database is supported by simple ontologies for topics, brain regions, journals and persons. The purpose of the Brede Database is to provide an open science data source for visualization and meta-analysis in neuroimaging.

The Brede Wiki (http://neuro.compute.dtu.dk/wiki/) (Nielsen, 2009a) has a broader scope than the Brede Database, not only recording studies with stereotaxic coordinates, but also brain morphometry and personality genetics studies as well as studies outside neuroscience. It contains descriptions of 869 journal papers and 174 conference papers, 129 pages with stereotaxic coordinates, 479 pages each describing a brain region and 599 pages describing a 'topic'. In total the wiki presently has 3,291 content pages (this number can be compared with the 7,555 content pages in NeuroLex). Apart from providing open science data, the purpose of the Brede Wiki is also to provide a more direct way to contribute data and ontology information, to be able to handle many different forms of data (not just stereotaxic coordinates) and to provide means for freeform textual annotation of scientific publications.

The Brede Wiki for Personality Genetics (Nielsen, 2010) has data from 87 published personality genetics studies with information about personality scores, genotype and subject group. Its purpose is to perform a mass meta-analysis of personality genetics and to provide a testing ground for an open science collaborative online "real-time" mass meta-analysis with visualization.

Kötter (2001) laid out the main points to consider in the construction of neuroscience databases. The following paragraphs will use these main points to describe the Brede Database and Brede Wiki.

Data acquisition: Results from published neuroimaging experiments are manually entered in the Brede Database via a graphical user interface implemented in Matlab and available via the Brede Toolbox. Other users can download the toolbox and enter data for inclusion in the Brede Database. In the Brede Wiki, registered and "anonymous" wiki users can enter data and text directly in the standard wiki interface. To aid data entry, web services can extract bibliographic information from PubMed and extract part of the brain coordinate information from scientific Portable Document Format (PDF) documents by converting the PDFs to text and matching brain coordinates with regular expressions. The method is related to the extraction method used for the NeuroSynth database (Yarkoni et al, 2011). To handle the slight differences in stereotaxic coordinate spaces, the Brede Toolbox transforms brain coordinates to the Talairach atlas during data entry, storing both the original reported and the transformed coordinates in the Brede Database. For the Brede Wiki the original reported coordinates are entered along with a field indicating the stereotaxic space. A template in the Brede Wiki creates links to these two web services based on the PubMed identifier and a link to an accessible PDF. The Brede Wiki for personality genetics provides online form-based input

System implementation: The philosophy behind the Brede Database and Wiki of using simple and standardized components makes them easy to migrate and distribute. We use standard free software for the online components of the Brede tools: Apache, MediaWiki, Perl and Python running on a Debian/Ubuntu computer. The Brede