Building Web Observatories for Health Web Science

Dominic DiFranzo, John S. Erickson, Marie Joan Kristine T. Gloria, Deborah L. McGuinness, Joanne S. Luciano

Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY 12180
difrad, erickj4, [email protected] dlm, [email protected]

Abstract. As the World Wide Web continues to grow and change, so does the need to study and understand it. Web Science is an effort to do just this. Due to the multidisciplinary nature of Web Science, and the wide variety of data on the Web that it studies and produces, Web Observatories are needed to help foster collaboration and to archive work and studies for future generations of researchers. In this paper, we use health data on the Web, and Health Web Science, as a use case to show how and why we need to build Web Observatories. We characterize the unique challenges that exist in building Web Observatories and present a methodology that accommodates these challenges.

1 Introduction

As the World Wide Web continues to evolve and impact our lives on a daily basis, researchers from a range of disciplines have turned their attention to its study. The field of Web Science, created as an approach for a coherent and connected study of the Web as an entity unto itself, faces several unique challenges, ranging from conflicting methodological philosophies to the development of a suitable infrastructure for its scientists to collaborate and share resources. Web scientists, therefore, are looking to mixed-methodology practices, new computational infrastructure and large-scale analytics in order to better make sense of these complex phenomena.

1.1 Web Observatories

As a means for managing this research complexity, the Web Science community has undertaken the development of a new distributed platform to facilitate the collection, analysis and sharing of data about the Web.
Whether viewed as individual observatories or as a federation, the Web observatory will be a distributed archive of data on the Web and its activity. Additionally, it will expose mechanisms and tools that enable researchers to explore the Web's past development, examine its present condition and establish potential developments in the future [1]. Web observatories promise to be a vast improvement over the resources currently in use by Web scientists, which are centralized, ad hoc in their composition, and often proprietary. With these challenges in mind, the Web Science Trust's Web Observatory Project has three main objectives [1]:

1. Create a global data resource that moves beyond the traditional understanding of a centralized data warehouse to that of a more distributed environment for interdisciplinary analysis and knowledge sharing.
2. Provide Web scientists with a space to foster the development and sharing of toolsets, frameworks and workflows. This can only be accomplished by adopting a bottom-up approach that aggregates individual repositories into a virtual infrastructure.
3. Encourage and empower researchers not only to use quantitative correlation methods on datasets, but also to explore and incorporate qualitative analyses that may help provide a more comprehensive understanding of the socio-technical evolution of the Web.

2 Health Web Science and the Health Web Observatory Imperative

One sub-domain of particular interest to Web scientists is the interplay between health, health sciences, and the Web. At this intersection we face multi-level, multidisciplinary questions about not just the impact on users of the Web, but also the evolution of the Web's infrastructure and related public policies. Thus, Health Web Science (HWS) is an emerging discipline whose practitioners aim to "describe how the Web shapes, and is shaped by, medicine and health care ecosystems.
Through this information, HWS will help engineer the Web and Web-related technologies to facilitate health-related endeavors and empower health professionals, patients, health researchers and lay communities." For example, HWS-based research may focus on developing and/or evaluating user responses to Web-based applications that seek to promote the formation of communities and the sharing of experiences [5]. Unique to this discipline, HWS researchers would not only examine the social practices of such experiences, but also raise questions regarding the role of the data platform itself in ensuring the proper anonymization of data and user privacy. Furthermore, HWS, in general, includes the examination and understanding of “citizen science”, the discovery of new questions and answers in the Web’s metadata, and the representation and use of health knowledge by patients, medical professionals, and their machines [8].

2.1 From Health Data to a Health Web Observatory

Given the copious amounts of richly heterogeneous health data, which are provided in many file formats, are both unstructured and structured, and are highly sensitive, the HWS community would benefit greatly from the development of a distributed network of Health Web observatories. This would allow for better gathering, quantifying, connecting, sharing, and searching over health and related data on the Web (both quantitative and qualitative in nature). This federated Health Web Observatory (HWO) would focus on:

“the synthesis, curation, and discovery of Web pages containing health information; the structure and utilization of interactive social media sites relevant to patient support groups; and semantic annotation and linking of health records and data to facilitate mechanized exploration and analysis” [8].

We argue that the health domain, and Health Web observatories in particular, may serve as an exemplar for all future Web observatories given its technical, social and political complexity.
In general, an HWO must be a materialization of the Web Observatory information model as represented by the Web Observatory schema¹. This model allows Web scientists to leverage the vast scale of commercial search engines to uncover properly described and published resources. Using the Web Observatory model, certain data characteristics of interest to Health Web Science investigators must be clearly expressed:

• Data identification and description: What is this data? In particular, is it statistical in nature? Does it describe a scientific or medical result?
• Origination: What is the source of this data? Did it originate with physicians, or patients, or researchers, or governments? Was the data scraped, or submitted, or generated through some study?
• Usage: How has this data been used previously, and by whom? Has it been used in research, or advocacy, or treatment, or for decision making?
• Citation: How is the data related to specific publications in the scientific and medical literature?
• Provenance: How has this data been aggregated and shared? Was (for example) the data made available through a built-for-purpose data repository? Is it an ad hoc source? Has it been provided through a public "data hub" such as datahub.io?
• Policy: What policies and restrictions govern the collection and use of the data? This applies both to data submitted (collection terms) and to data provided by a particular Web observatory. For any given dataset, what policies affect it?

¹ http://logd.tw.rpi.edu/web_observatory

3 Building a Web Observatory

As outlined previously, the goals of the Web Observatory Project are to create a distributed data resource that enables collaboration among Web scientists across multiple domains and from around the world. The project also aims to promote the exploration and incorporation of both qualitative and quantitative methodologies.
It is, therefore, unsurprising that one of the biggest obstacles in designing and building a Web observatory (WO) is anticipating the variety of innovative and interesting uses by others that are not explicitly built into the WO. To minimize this uncertainty, we present a process that prioritizes internal purposes first, while promoting the use of technologies and systems that enable the duplication, understanding and reuse of your Web observatory by others and for alternative purposes.

This general process draws its inspiration from the Semantic eScience methodology [6]. Due to the unique challenges and nature of Web observatories [1], the Semantic eScience methodology does not completely address some of the issues that arise in building one. In the following sections, we use many parts of the Semantic eScience methodology as a basis for a new methodology for building Web observatories.

3.1 Forming Your Purpose, Questions, Ideas and Users

We propose that the first step be to define the purpose of your specific Web observatory. In doing so, one should define the audience of the WO as well as its initial, or alpha, users. We recommend asking questions such as: What research domain do we want to explore? What types of questions would we like to answer? What will this WO contribute to the greater Web Science community? This phase differs from the "use case" step provided in the eScience methodology in that we focus on identifying a "user story" rather than a formal, specific use case. The intent is to avoid restricting the observatory to serving only alpha users at the cost of alienating new users and applications.

3.2 Small Core Team, Mixed Skills

The next step is to create a small team with a diverse skill set. Similar to the second step of the eScience methodology, this team is necessary for researching and discovering datasets and tools to be used in the Web observatory.
This step has yet to be formalized within our own current Web observatory process; however, we recognize that it offers a great advantage in ensuring that your WO encompasses relevant and authoritative data.

3.3 Gather Data Sources and Data Sets

The third step is one of the most difficult. Gathering data sources and data sets may seem an insurmountable task, as the Web is a vast source of information. Identifying which datasets and data sources exist is only one part of the puzzle. One must also consider whether the data collected is trustworthy, non-proprietary, privacy-sensitive, etc.
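
The data characteristics listed in Section 2.1 and the screening questions above could be captured in a simple record, sketched here in Python. This is an illustrative sketch only: the class, its field names, its conservative defaults and the screening rules are all hypothetical and are not drawn from the Web Observatory schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetDescription:
    """Hypothetical record; field names are illustrative assumptions,
    not terms from the Web Observatory schema."""
    title: str                                            # identification and description
    origin: str                                           # origination, e.g. "patients"
    collection_method: str                                # e.g. "scraped", "submitted", "study"
    prior_uses: List[str] = field(default_factory=list)   # usage
    citations: List[str] = field(default_factory=list)    # citation
    provenance: str = ""                                  # how the data was aggregated and shared
    policies: List[str] = field(default_factory=list)     # policy
    # Screening flags for the data-gathering step; defaults assume the worst
    # until a team member has reviewed the dataset.
    trustworthy: bool = False
    proprietary: bool = True
    privacy_sensitive: bool = True


def screening_issues(d: DatasetDescription) -> List[str]:
    """Return the open concerns that should be resolved before inclusion."""
    issues = []
    if not d.trustworthy:
        issues.append("source not yet judged trustworthy")
    if d.proprietary:
        issues.append("proprietary data")
    if d.privacy_sensitive and not d.policies:
        issues.append("privacy-sensitive data with no stated policy")
    return issues


# A made-up dataset that has not been reviewed yet.
example = DatasetDescription(
    title="Hypothetical patient forum posts",
    origin="patients",
    collection_method="scraped",
)
print(screening_issues(example))
```

A dataset would be a candidate for inclusion once the returned list is empty. In practice each flag would be set by a member of the core team after review rather than assumed.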
