Serious Collectives
Total Page:16
File Type:pdf, Size:1020Kb
Ubiquitous Infrastructure for Deep Accountability Robert E. McGrath National Center for Supercomputing Applications University of Illinois, Urbana-Champaign [email protected] ABSTRACT national CyberInfrastructure standards, can substantially The “Web 2.0” has created new capabilities to “mash up” improve accountability for digital communities of many types. content and “meet” on-line to create new collective knowledge. The general principle is to design flexible and reusable But to develop, maintain, and sustain serious collective middleware that provides the “right” set of services, without knowledge, serious accountability is needed. The emerging “wiring in” a specific set of assumptions about how the systems CyberInfrastructure provides ubiquitous mechanisms for must be used. accessing shared resources, but serious collective knowledge 2. COLLABORATION IN DIGITAL COMMUNITIES requires deep accountability for “everything”: data, process, and assertions. This will be accomplished by software infrastructure While on-line collaboration is scarcely new ([1, 20, 25]), the that implements: advent of so-called “Web 2.0” technologies has enabled the • Stable identities for every entity of interest Internet to support collective activity of all types. Social • Open metadata computing technologies offer inexpensive and ubiquitous • Data and process provenance sharing and communication, with capabilities for reuse and These critical issues are addressed by current developments border-crossing. These technologies include blogs, RSS feeds, from many sources. Several projects show that these principles wikis, file sharing, folksonomies, and simple technologies for are within reach. While emerging from large-scale science and aggregating this content [7, 36, 48, 51]. engineering, they provide a toolkit that will be useful for many types of collaborations, and will enhance and expand the Using these technologies, it is comparatively easy for ordinary deployment of mass collaborations such as Wikipedia. users to “repurpose” web content to create their own views and aggregate (“mash up”) information from many sources. Information is reused outside the borders of its original context, Keywords: CyberEnvironments, CyberInfrastructure, perhaps for radically different purpose. Collective Knowledge, Collaboration, Accountability Serious collaborations have existed long before the Internet, in the form of dictionaries, encyclopedias, scholarship, science, 1. INTRODUCTION and other knowledge intensive activities. Conventional scholarship places stock in expertise and professional The internet provides a ubiquitous decentralized infrastructure credentials, and heavily filters “content” on the way to for sharing information. This is evolving into a global publication. Historic standards of publication, citation, and CyberInfrastructure to support collaboration in virtual review have carried forward to be applied to digital objects. organizations [34]. For example, the Grid—e.g., the Open Grid However, digital environments are too complex and the data too Services Infrastructure [49]—provides strong security, and voluminous to manually create all the necessary content and facilities for managing data and computation. However, it is context. In short, digital collaborations cannot rely on manual centralized, complex, and not flexible enough for many uses. cataloging, selection, and annotation, as was done in the past. For example, the Grid community has struggled to support anonymous community accounts, which are essential for For example, in scientific fields, instruments can easily generate collaborative groups [41]. thousands of results per minute—far too many for any human or group to check or annotate the purpose, quality, and use of each The so-called “Web 2.0”—blogs, wikis, etc.—has enabled measurement. Furthermore, it is increasingly important to study ordinary users to develop complex applications [36, 44, 48]. problems and systems that span phenomena and disciplines, This has led to the emergence of web-based activities that aim arenas where there is no single discipline or group of experts. to create collective knowledge—once the domain of These critical studies of complex systems require knowledge scholarship, science, and experts. This is epitomized by “mash ups” across multiple domains of expertise, in which it is wikipedia (http://www.wikipedia.org), a decentralized, not possible for a single person or group to create all the content volunteer-fueled knowledge base, similar to an encyclopedia, needed to understand and use the digital artifacts. The producers but written and corrected by anyone who donates their effort of software, data, and knowledge do not know who the [35]. Wikipedia employs simple social networking technology consumers may be, let alone what they may wish to do. and “mass amateurization”, and has generated knowledge at a fantastic pace on a broad array of topics [35, 44, 50]. However, The Web 2.0 deal with this scaling challenge through “mass in many cases, the longevity and/or quality of the knowledge is amateurization” and a “publish, then filter” approach [44]. suspect (e.g., [52]). Wikipedia, in particular, has proved to be a serious provocation to scholarly communities, which have well-established methods Does the Web 2.0 have “too little accountability”, and the Grid for creating cumulative, collective knowledge. Comparing “too much accountability”, or perhaps both are just not what is today’s popular collectives (e.g., wikipedia) to established needed? What needs to be built? This paper suggests a specific scholarship (e.g., a journal such as Nature (www.nature.com)); infrastructure that, together with Web, Grid, and emerging they have similar goals and appearance. However, there are Accountability cannot be done through a centralized authority, significant differences in practices and expectations about it will follow the philosophy and model of the Web: simple accountability: a substantial difference in culture. reliable mechanisms that enable users to “mash up” accountability for their own purposes. The general principle is The question of accountability underlies the potential value of a to design flexible and reusable middleware that provides the “serious collective”. Is the cumulative knowledge authoritative “right” set of services, without “wiring in” a specific set of (reliable, correct, useful, etc.)? What would need to be done to assumptions about how the systems must be used. establish a great wiki page as “the authoritative” source on that topic, upon which others should rely and build new knowledge? 4. CYBERENVIRONMENTS: INFRASTRUCTURE For what audience and for what purposes? If not, what could be FOR COLLABORATION done to make it so? In recent years, a ubiquitous middleware has emerged to enable The following sections outline new and achievable large-scale, multidiscipline, system-level science and infrastructure that will make it possible to achieve deep engineering. This CyberInfrastructure (CI) is greatly decreasing accountability when needed within Web 2.0 style open the costs of sharing data, instrument, and computational networks. resources [34]. For many users, ubiquitous access is necessary 3. COMMUNITIES FOR CREATING COLLECTIVE but not sufficient. Additional software is needed to enable KNOWLEDGE information-intensive communities to exploit local resources and national CI in their research, development, and teaching A collaborative community is about sharing, presence, and activities, which we have termed Cyberenvironments [31, 33]. commitment to common goals. Some collaborations—such as Rather than focusing on universal access to resources, scientific communities—have deep and serious goals, such as Cyberenvironments emphasize the integration of resources into developing and promulgating reliable knowledge. Creating end-to-end scientific processes, integration across knowledge requires more than socializing, easy file sharing, and Cyberenvironments, and the continuing development and the capability to “mash up” information, it requires clear dissemination of new resources and new knowledge. understanding of the context and quality of the information underlying the knowledge. This understanding is based on This vision of Cyberenvironments assumes that research results accounts of “where it comes from”, “who says so” and “why”. such as papers, processes, and data can be conveyed with Supposing that collective knowledge is generated, how can it be enough information about themselves to be incorporated into propagated and built on by others? A classical approach is to further research work. This capability is essential to establishing capture knowledge in highly authoritative artifacts (e.g., journal accountability, and requires infrastructure for managing articles), constructed with extreme care toward sources and metadata (i.e. what units are the data in) and provenance (which arguments. Centuries of scholarly and scientific practice have data was discussed in a paper, what analysis was applied to it) worked from the principle that transparency and accountability [31]. are more important than majority votes. Much of the so-called Cyberenvironments draw inspiration from reflective software scientific method and real life scientific practice is dedicated to [22]. Adaptive systems employ brokers to dynamically compose accounting for the sources and destinations of data and software components using several critical design principles arguments. [33]: Digital collectives are exploring