Monitoring Virtual Team Collaboration: Methods, Applications, and Experiences in Engineering Design

Dissertation submitted for the academic degree of Doctor of Engineering (Dr.-Ing.)

submitted to the Faculty of Mathematics and Natural Sciences of the University of Potsdam

by Matthias Uflacker, M.Sc.

Potsdam, November 2010

Abstract

What distinguishes high-performance engineering teams from lower-performing ones in the design of complex software, products, and services? Answering this question traditionally involves extensive protocol studies and retrospective assessments of team effectiveness, e.g., in terms of adherence to budget and timelines, customer satisfaction, or innovation. Little attention has been paid to developing applicable techniques for observing performance-relevant differentiators directly in the behavioral aspects of digitally mediated creative teamwork. The expanding role of ‘virtual collaboration’ in engineering projects requires new computational instruments to efficiently study if and how designing is reflected in the implicit processes, tactics, and strategies carried out over today’s dense network of groupware, e-mail, and Web 2.0 services.

This dissertation addresses two important aspects in the realization of such an instrument. First, it develops an adaptable monitoring service platform called d.store to capture and analyze virtual collaboration activities live, i.e., while a project is still ongoing. Second, it applies the services in global, small-group engineering teams to identify structural differences in collaboration behavior that correlate with independent team performance measures.

With the services provided by the d.store platform, it is possible to tap into heterogeneous online communication channels and to automatically generate a descriptive model of how teams virtually communicate, interact, and share information over the course of a project. The semantics and temporal attributes of the identified actors, resources, and relationships are represented in so-called Team Collaboration Networks. The platform is evaluated in the conceptual design phases of eleven distributed, multidisciplinary engineering projects over a period of eight months each.
The activities monitored in the e-mail archives, Wikis, and file sharing systems provide the basis for a detailed visual and quantitative examination of differences and similarities in the collaboration behavior of the observed teams. The analysis conducted on the generated Team Collaboration Networks indicates that high-performance design teams produce different collaboration patterns than lower-performing ones. Furthermore, the patterns that correlate with team performance suggest that an adherence to basic design principles has positive effects: teams that applied an ‘outside-in’ perspective by emphasizing interactions with team-external stakeholders, contacts to domain experts, or group-internal knowledge sharing were generally more satisfied with their work, explored more design alternatives, or received higher ratings from independent judges. This is relevant because it demonstrates that automatically collected, objective, real-time collaboration metrics can provide valuable insights into performance-relevant aspects of teamwork.

The contribution of this work is a tested, non-interfering monitoring instrument, which establishes a technological foundation for the scientific observation, comparison, and analysis of virtual collaboration activities as a service. A pilot application in engineering design gives first evidence that meaningful team performance indicators can be drawn from this approach. The results encourage a continued and intensified utilization of the instrument to assist in the evaluation of IT-mediated collaboration processes, ultimately promoting a new paradigm in the conduct of real-time team diagnostics and support in engineering design.

Acknowledgements

I am grateful to the many people who have accompanied me along this journey. First, I would like to thank Prof. Hasso Plattner for his inspiring, generous, and liberal guidance over the course of this dissertation. His passion and striking commitment to the topic encouraged me to pursue this line of research. The experiences that I was able to gain during this time were exceptional and invaluable, and they will never be forgotten.

I would also like to thank the professors of the HPI research school, especially Prof. Christoph Meinel and Prof. Andreas Polze, for their valuable feedback and guidance when it was needed. Many thanks also to Alexander Zeier, who provided me with the environment and freedom to finish this dissertation and to pursue my research interests in various projects.

I am deeply grateful to have met Prof. Larry Leifer and the people at the Center for Design Research, who were an invaluable source of inspiration and the second lighthouse as I progressed through this endeavor. Many of you have become good friends. Philipp Skogstad and Martin Steinert deserve special recognition. Thank you for the fruitful discussions, support, and motivation that I have received from you.

I am thankful to have worked with my colleagues from the EPIC research group and the HPI research school, of whom I can mention only a few here: Jürgen Müller and Thomas Kowark for providing unbiased feedback on the drafts of this dissertation; Martin Faust and David Schwalb for their support during the implementation and data analysis. My special thanks go to Vishal Sikka and Sam Yen from SAP for their interest in my research and for providing professional feedback and industry perspectives.

Finally, and most importantly, I would like to thank my parents, the rest of my family, and especially Meike. Thank you for your patience, understanding, and support while I took my time to complete this chapter of my life.

Potsdam, November 2010
Matthias Uflacker

Table of Contents

List of Tables

List of Figures

Listings

1 Introduction
  1.1 Motivation
  1.2 Why Monitoring of Design Collaboration is Relevant
  1.3 Problem Definition
  1.4 Research Objective
    1.4.1 A Service Platform for Virtual Collaboration Monitoring
    1.4.2 Scope of the Dissertation
    1.4.3 Underlying Principles and Assumptions
  1.5 Research Approach & Guiding Questions
    1.5.1 Step 1: Development of a Descriptive Model
    1.5.2 Step 2: System Implementation & Customization
    1.5.3 Step 3: Application in Conceptual Engineering Design
  1.6 Results and Contribution
  1.7 Outline of the Thesis

Part I Background & Preliminaries

2 A Review of Engineering Design Literature
  2.1 Conceptual Engineering Design
    2.1.1 The Fuzzy Front End of Innovation
    2.1.2 User-Centered Design
    2.1.3 Design Thinking
    2.1.4 Conclusions Drawn From Review
  2.2 Teamwork, Information & Virtual Collaboration
    2.2.1 Design Teams: A Working Definition
    2.2.2 Models of Design Work
    2.2.3 Virtual Collaboration in Design
    2.2.4 Conclusions Drawn From Review
  2.3 CSCW and Groupware in Conceptual Design
    2.3.1 Basics of CSCW in Design
    2.3.2 Synchronous & Asynchronous Groupware
    2.3.3 Hypermedia & Web-based Collaboration Platforms
    2.3.4 Application Lifecycle Management Platforms
    2.3.5 Conclusions Drawn From Review
  2.4 Instruments for Virtual Collaboration Monitoring
    2.4.1 Monitoring of Information Artifacts
    2.4.2 Monitoring of Process Participants
    2.4.3 Combined Monitoring of Information and Participants
    2.4.4 Derivation of System Requirements
    2.4.5 Moving Beyond the Existing Literature
  2.5 Chapter Summary

3 Technological Foundations
  3.1 Definitions
  3.2 Representational State Transfer
  3.3 Of Resources and Semantics
  3.4 Semantic Web
    3.4.1 Ontologies
    3.4.2 The Resource Description Framework
    3.4.3 The OWL Web Ontology Language
    3.4.4 A Graphical Notation for RDF/OWL Ontologies
  3.5 Chapter Summary

Part II Models for Team Collaboration Capture

4 Team Collaboration Networks
  4.1 Foundations
  4.2 Temporal Network Properties
  4.3 Representing Team Collaboration Networks in OWL
    4.3.1 Motivation
    4.3.2 Terminological Components
    4.3.3 Assertion Components
  4.4 Chapter Summary

5 An Ontology System for Team Collaboration Networks
  5.1 Foundations
  5.2 Named Graph Partitioning
    5.2.1 Domain Ontologies & Rule Graphs
    5.2.2 The TCN-S Concept Graph
    5.2.3 The TCN-S Instance Graph
    5.2.4 TCN Concept Graphs
    5.2.5 TCN Instance Graphs
  5.3 Chapter Summary

Part III System Implementation

6 d.store: A Resource-oriented Team Collaboration Network System
  6.1 Platform Architecture Overview
    6.1.1 Client Applications
    6.1.2 RDF/OWL Graph Component
    6.1.3 Service Interface
  6.2 The d.store Concept Graph
  6.3 Processing Temporal Network Properties
    6.3.1 Storing Temporal RDF Statements
    6.3.2 Modifying the RDF/OWL Subsystem
    6.3.3 Modifying the Relational Storage Interface
    6.3.4 Advantages and Disadvantages of the Approach
  6.4 Implementing the Service Interface
    6.4.1 Platform Resources
    6.4.2 Exploring Team Collaboration Network Resources
    6.4.3 Manipulating Team Collaboration Network Resources
  6.5 Chapter Summary

7 System Configuration
  7.1 Domain Ontologies for Online Collaboration
    7.1.1 web: An Ontology for Hyperlinked Collaboration Resources
    7.1.2 wiki: An Ontology for Wiki-based Collaboration
    7.1.3 email: An Ontology for Email-based Messaging
    7.1.4 file: An Ontology for Shared Document Storages
  7.2 Inference Rules
  7.3 Preparing the Data Collection Process
    7.3.1 Initializing the Networks
    7.3.2 Setting up the Sensor Clients
    7.3.3 Specifying Participant Roles and Alias Names
  7.4 Chapter Summary

Part IV Evaluation & Discussion

8 A Pilot Application in Engineering Design
  8.1 ME310: A Global Academic Project Testbed
    8.1.1 Project & Team Setups
    8.1.2 Process Participants & Team Interactions in ME310
    8.1.3 A Shared ICT Infrastructure for Virtual Collaboration
    8.1.4 Privacy and Confidentiality of the Observations
    8.1.5 me310: An Ontology for Project Roles & Participants in ME310
  8.2 A Quantitative Appraisal of the Generated Networks
    8.2.1 Activities Captured From Email Lists
    8.2.2 Activities Captured From the Wiki System
    8.2.3 Activities Captured From WebDAV Folders
    8.2.4 Summary
  8.3 Temporal Variations & Dynamics in Team Collaboration
    8.3.1 Individual Participation on Project Email Lists
    8.3.2 Evolution of Project Wiki Spaces
  8.4 Performance Correlations
    8.4.1 How Team Performance was Measured
    8.4.2 Finding Dependencies in the Captured Group Activities
    8.4.3 Correlations with External Communication Activities
    8.4.4 Correlations with the Number of Shared Resources
    8.4.5 Correlations with Coach Engagement
  8.5 Summary of Findings & Critical Discussion

9 Conclusion
  9.1 Contribution
    9.1.1 Contribution to Design Practice
    9.1.2 Contribution to Design Theory
  9.2 Discussion
  9.3 Legal & Moral Aspects
    9.3.1 Legislations on Monitoring Employees Communication
    9.3.2 Employee’s Privacy and Autonomy
  9.4 Recommendations for Monitoring Virtual Team Collaboration
    9.4.1 Organizations Should Implement a Monitoring Scheme in Agreement with Employees and Legal Regulations
    9.4.2 Organizations Should Govern a Virtual Collaboration Infrastructure To Support Monitoring Objectives
  9.5 Recommendations for Distributed Engineering Design Teams
    9.5.1 Teams Should Assign a Project Communication & Information Manager
    9.5.2 Teams Should Engage With Domain Experts as Quickly as Possible
  9.6 Ongoing Research & Future Work

References

Appendix

Case Study Data
  A.1 Individual Team Member Scores
  A.2 Team Level Scores

Regression Analysis Results

d.store - API Reference

Acronyms

Glossary

Index

List of Tables

2.1 Differences between the Front End of Innovation (FEI) and New Product & Process Development.
2.2 Contrasting the traditional approach to design with UCD.
2.3 Time and space-based views of CSCW technologies.
2.4 A comparison of properties met by existing instruments for capturing and analyzing the work of engineering design teams.

4.1 Statements declaring three node types Email, Person, and TeamMember.
4.2 Statements declaring a relationship type ‘hasSent’.
4.3 Statements declaring an attribute type ‘address’.
4.4 Statements declaring a network node for a resource identified by ‘ex:email’.
4.5 A reified statement to define the validity interval of a ‘tcn:createdBy’ relationship.
4.6 A reified statement to define the validity interval of a ‘tcn:mailbox’ attribute.

5.1 Graph names and aliases used in the following examples.
5.2 Statements of a domain ontology graph to declare common node, relationship, and attribute types for email-based conversations.
5.3 A TCN-S Concept Graph with basic class definitions.
5.4 Statements of a TCN-S Instance Graph to assign two named graphs as concept and instance models to a TCN instance.
5.5 A TCN Concept Graph with import statements and a custom node type ‘tag’.
5.6 A TCN Instance Model with two node instances and attributes.

6.1 The 5-tuple schema assigns a validity interval to every record of an RDF statement, reducing the overall number of statements and allowing for efficient time-point querying of TCN components.

8.1 Overview of ME310 projects in 2007/2008 (Skogstad et al., 2009).
8.2 Key dimensions of the generated Team Collaboration Networks.
8.3 Summary of findings from testing relationships quantitatively at the design team level.

A.1 Individual team member attributes queried from the generated Team Collaboration Networks.
A.2 Team-level attributes queried from the generated Team Collaboration Networks.

List of Figures

1.1 The instrumentation of virtual collaboration facilitates “in-flight” monitoring of engineering design processes by means of computational capture and analysis of collaboration activities.
1.2 A monitoring service platform provides a common interface to distributed clients that are specialized in the capture, analysis, and monitoring of collaboration activities.

2.1 A standard model in German industry for designing new products.
2.2 A hierarchy of requirements for system acceptability. The purpose of user-centered design is to achieve usability, a prerequisite for usefulness and practical system acceptability.
2.3 ISO 13407 - Human-Centred Design Processes for Interactive Systems, an abstraction of basic principles in UCD processes.
2.4 The “sweet spot” of good design: innovation is stimulated by leveraging expertise in each of the interrelated areas of human factors, technical factors, and business factors.
2.5 Opportunity for groupware in early design stages. Tools to support collaboration groups in the conceptual design phase are scarce.

3.1 Adding a semantic layer to information in a virtual collaboration process by means of descriptive resources.
3.2 An example RDF graph.
3.3 Subclass relationships between OWL and RDF/RDFS.
3.4 A graphical notation for RDF/OWL-based ontologies.

4.1 A Team Collaboration Network with different types and instances of nodes, relationships, and attributes.
4.2a Team Collaboration Network at time t − 1.
4.2b Team Collaboration Network at time t.
4.2c Team Collaboration Network at time t + 1.
4.3 Graphical notation of node, relationship, and attribute types that define the terminological components (TBox) of a Team Collaboration Network.

5.1 Partitioning of a Team Collaboration Networks system into named graphs. Socialization of common domain concepts and isolation of independent TCN instances is achieved through the transitive import of ontological fragments.

6.1 The d.store platform architecture. A variable set of clients access and modify the state of Team Collaboration Networks via a RESTful server interface.
6.2 The TCN-S Concept Graph of the d.store platform.
6.3 Customized Jena types to address the n-tuple extension in a Triple.

7.1 Hyperlinks are a popular medium to share and point others to relevant information on the Web.
7.2 An ontology for hyperlink relationships between general collaboration resources (dstore:Resource) and Web resources.
7.3 wiki: An ontology for concepts and properties in Wiki-based collaboration.
7.4 email: An ontology for concepts and properties in email-based communication.
7.5 file: An ontology for basic collaboration activities in shared document storages.
7.6 d.person: A client application to manage the roles (types) and alias attributes of person nodes in Team Collaboration Networks.

8.1 Roles and process participants in ME310.
8.2 Team members and teaching team during a weekly project meeting.
8.3 Impressions of the ME310 design space at Stanford University. A shared ICT infrastructure supports global collaboration between distributed team members.
8.4 An ontology of project participants and roles in the observed design curriculum ME310.
8.5 Weekly amounts of emails, hyperlinks, and file attachments sent via the project lists during the observed project period.
8.6 Total amount of emails sent to the project lists of each team.
8.7 This email was sent by one team member to the rest of the global project team. The message body provides additional context for attached and hyperlinked resources.
8.8 Representation of the relationships between email messages (green), attachments (red), and email receivers (blue) captured in one of the projects.
8.9 Total amount of Wiki pages created in the projects.
8.10 Total amount of files shared in the WebDAV team folders.
8.11 Snapshot of an interactive visualization of the individual participation in email-based project communication.
8.12 Wiki spaces of projects Theta, Alpha, Gamma.
8.13 The proportional amount of outbound emails (compared to team-internal messages) sent by a team correlates positively and significantly with the average team member satisfaction.
8.14 The average team member satisfaction correlates positively and significantly with variables in the online interaction behavior of the teams.
8.15 The total number of distinct URLs shared within design teams correlates positively and significantly with output performance, suggesting that breadth of shared information impacts performance.
8.16 The number of coach emails correlates positively and significantly with the total number of prototyping activities undertaken by design teams, suggesting that coaches have a positive impact on prototyping.

Listings

5.1 A SWRL inference rule in RDF/XML syntax.
6.1 Example of a SQL query that finds all node instance resources valid on June 1st, 2009.
6.2 Creating a statement table for a TCN instance graph.
6.3 Binding a trigger function ’timetravel’ to the statement table of a TCN instance model.
6.4 Querying the currently valid statements from the statement table of a TCN instance model.
6.5 Querying statements from a statement table that have been valid on June 1st, 2009.
6.6 Examples of valid d.store URL paths.
6.7 Retrieving all nodes from the latest network representation.
6.8 Retrieving all nodes in the network as on June 1st, 2009.
6.9 Retrieving a list of all Email-typed nodes in the current network.
6.10 Retrieving a JSON-formatted node instance.
6.11 SPARQL statements to filter the list of node instances.
6.12 Retrieving Wiki pages that have been referenced in an email.
6.13 Creating a Node Instance.
6.14 Removing a Node Instance.
7.1 Creating a TCN instance.
8.1 Querying emails that have been sent from team members to at least one team-external person.
8.2 Querying emails that have been sent from team members to other team members only.
8.3 Querying Web resources referenced in at least one email that has been sent by a team member.
8.4 Query clause to determine the emails that a team has received from its coaches.

1 Introduction to the Dissertation

1.1 Motivation

The digital revolution and the rise of information and communication technology (ICT) over the last decades have had a major impact on all industries, but this impact is perhaps most visible in commercial aviation. The improvements in processes and equipment are apparent when comparing modern air transportation with the pioneering days of aviation. Digital sensors and on-board computer networks today provide pilots with a wealth of data that is constantly measured, processed, and evaluated throughout a flight. Key metrics and activities are continuously recorded and communicated to ground control stations for monitoring and guidance. Unexpected signal deviations indicate potential problems, allowing pilots to respond accordingly and in time. Clearly, the instrumentation of aircraft has led to permanent control over relevant in-flight parameters and to aviation becoming more reliable, predictable, and efficient (Spenser, 2009).

Collaboration processes in the engineering industry lack a comparable set of “in-flight” metrics. In the development of new products, software, and services, teams and managers rarely have objective real-time data about the status of their projects. Numerous studies, reports, and personal experiences have shown that various problems may result from this lack of awareness: communication and information barriers, intransparent processes, unforeseen defects, and incorrect assessment of progress and failures (e.g., Brooks, 1995; DeMarco and Lister, 1999). Because a detailed overview of team activities is not available, project teams are often limited to “post-mortem” metrics in judging and evaluating their collaborative work (Sterpe et al., 2007; Tucci, 2008). Efficient instrumentation is required to better monitor engineering design processes and to gain insights into beneficial or detrimental activity patterns.

The improvements in computer hardware and software have created new opportunities for monitoring team collaboration. Information and communication technology pervades the work infrastructure of today’s global organizations. Since its public emergence in the 1990s, the Internet has become an inherent component in the work of engineering teams (cf. Mankin et al., 1996). Email and tools for computer-supported cooperative work (CSCW) gave rise to the digitalization of task coordination and knowledge exchange across temporal and geographical distance. This trend continues as new communication channels such as Wikis, social networks, and blogs are finding their way into the enterprise. The widespread use of digital media to communicate and coordinate in distributed and co-located collaboration groups has led to a common form of teamwork collectively referred to as ‘virtual collaboration’, a subject of intensive research in many disciplines (Powell et al., 2004). How the growing digital footprint of collaboration in engineering design teams can be efficiently exploited for monitoring purposes, and what we can learn from it, are questions that have not yet been sufficiently answered.

Research has begun to examine virtual collaboration in co-located and distributed design teams. But the lack of a generally applicable instrument for recording and analyzing the full range of online team activities hinders a broader investigation. Existing approaches commonly rely on custom tools that work well in a specific scenario, collaboration system, or work environment, but which cannot be applied even in similar observation contexts. Transferring existing instruments into a broader set of application scenarios often requires rewriting the software or interfering with the subjects under study. Repetitive efforts and costs in the observation of virtual collaboration processes are the consequence. Furthermore, the possibility for other researchers to replicate or verify previous findings, an important criterion for relevance and rigor in empirical design research (see, e.g., Dixon, 1987), is severely lowered by the use of isolated instruments and data formats.

A common technological foundation for monitoring and studying virtual collaboration in the field is needed, one that minimizes the effort for data collection and analysis and facilitates comparative research in design. This work aims to contribute to this goal by providing methods, applications, and experiences in the monitoring of engineering design teams.

1.2 Why Monitoring of Design Collaboration is Relevant

The need for improved instrumentation in virtual collaboration processes in engineering design is driven by the economic interests of the industry. As argued in the following three steps, the way design teams work, communicate, and interact by means of ICT ultimately influences an organization’s potential to grow and innovate.

1. Economic Growth Needs Innovation. The dynamics of economic life have always been influenced by wave-like movements, often triggered by new technology and innovative products entering the market (Kondratieff and Stolper, 1935). Innovation is the motor for economic growth and is therefore to be maximized (Audretsch, 1995). Companies are constantly challenged by what Schumpeter (1942) termed ‘creative destruction’: the incessant generation of viable products that is imperative to stay ahead and survive in a global market. It is necessary to understand how innovation occurs in order to systematically increase innovative potential. The foundation for innovation is laid by new concept creation during the early stages of engineering projects. This ‘front end of innovation’ (Koen et al., 2001) is poorly understood and presents one of the greatest opportunities for improving the innovation process (Reinertsen, 1999). Iterative methodologies such as User-centered Design (e.g., Nielsen, 1994; Constantine and Lockwood, 1999) and Design Thinking (e.g., Kelley and Littman, 2001; Cross, 2006) have been shown to stimulate the generation of innovative concepts (Brown, 2008; Mao et al., 2005). At the same time, there is little understanding of how ‘designerly ways’ of interacting and working together can be quantitatively measured, expressed, or analyzed.

2. Innovation Needs Collaboration. Schön (1992) describes design as a reflective conversation with the materials of a design situation. He sketches design as an iterative process of “seeing–moving–seeing”: interpreting and making sense of the world (seeing), performing actions to effect a desired change (moving), and assessing the effects in the changed environment (seeing). Transferred into a team context, design becomes an inherently social, interactive, and collaborative process. This perspective on design reveals not only the unstructured nature of the process, but also the importance of sharing knowledge and relevant design information in a team. Essays such as The Mythical Man-Month (Brooks, 1995), Peopleware (DeMarco and Lister, 1999), and other more research-oriented studies (e.g., Driskell et al., 2003; Thompson and Coovert, 2003) document that efficient team collaboration is a dominant factor for the well-being and success of engineering projects. Other studies of the design process further indicate that communication within design teams is instrumental to successful design activity (Skogstad, 2009; Cockayne, 2004; Cross and Clayburn, 1995). However, while it is generally accepted that collaboration is a critical factor in design, little is known about how collaboration patterns reflect or impact qualitative aspects of the design process.

3. Collaboration Needs ICT. The proliferation of CSCW and virtual collaboration has fundamentally changed design environments and the way teams communicate and share information (Mankin et al., 1996). Software for virtual collaboration (groupware) has moved to the Internet and the World Wide Web, providing a ubiquitous and effective medium for connecting people and information. In fact, communicating over the Internet has become standard and indispensable, especially in global organizations. About 62% of all employed Americans have Internet access and virtually all of those (98%) use email on the job (Fallows, 2002). Web 2.0 services and other communication channels such as instant messaging and Voice over Internet Protocol (VoIP) are increasingly moving into the workplace (Shiu and Lenhart, 2004). Anytime/anywhere access to groupware and project resources creates an unprecedented level of information availability and facilitates the collaborative creation of a shared knowledge base. Virtual collaboration has even become common in co-located teams, as the use of groupware technology permits interactions to take place within time frames that fit the convenience and different work patterns of team members (Katzenbach and Smith, 2001).

The growing availability of collaboration activity that is documented online motivates the application of computational techniques to capture, monitor, and analyze the collective work streams of engineering design teams. Being able to observe virtual collaboration activities on a detailed level provides a foundation for the scientific exploration of ICT-supported design processes. Combined with traditional empirical techniques, the computational evaluation of team strategies promises to be a powerful method to optimize communication and coordination in design processes and to improve the innovative potential in organizations (Jacoby and Rodriguez, 2008; Ashworth, 2007).

1.3 Problem Definition

The instrumentation of virtual collaboration processes in engineering de- sign is confronted with technical and conceptual challenges. These can be generally related to two major factors: the complexity of design processes and the ambiguity of computationally collected data. Process Complexity. The computational recording and analysis of vir- tual collaboration demands for a well-defined and unambiguous repre- sentation of the processes under study. Team interactions in concep- tual design, however, are highly dynamic, unstructured, and ad-hoc on the level of individual process participants and activities. Dispersed teams, distributed collaboration landscapes, and concurrent interac- tions further add to this complexity. This calls for an extensible and non-prescriptive approach to capture the ‘who’, ‘what’, and ‘when’ of a collaboration process in a structured and meaningful way. Due to the diversity of existing and future collaboration environments, the con- structs needed for such a representation can not be statically defined. 1.3. Problem Definition 5

Also, teams generally use more than one collaboration system during the course of a project, depending on the prevailing task and situation at hand (Perry et al., 1999). Monitoring a single communication channel can only capture fractional or isolated parts of the collaboration process (Olson and Teasley, 1996). To overcome this challenge, a new approach is required, one that enables design researchers and practitioners to monitor collaboration activities on top of a mixed groupware environment. The system must be generic and adaptable to the specific types, numbers, and locations of process participants and the utilized collaboration tools.

Data Ambiguity. With data available at little or no additional cost, the computational analysis of the digital footprint of team communication promises to be a cost-effective approach to gain insight into the work of virtually collaborating groups. In contrast to manual data collection techniques (cf. Baya, 1996), the automated recording of groupware activities allows high volumes of data to be collected for empirical research (Liang et al., 1999). However, the quality of this data in terms of how it can be interpreted in a concrete team situation is largely unknown. The extent to which the digital representation of groupware activity can be a surrogate for the original intent is a fundamental question that is deeply rooted in the nature of communication itself (Shannon and Weaver, 1949). How precisely do the captured symbols convey the desired meaning? Recent design studies provide initial evidence that data at the technical level of communication can be an observable surrogate for the semantic intent of a message (Milne, 2005). Still, it is unclear to what extent the digital traces of communication reflect or indicate differences in design team activity and performance. What are relevant patterns and metrics in technologically mediated collaboration that are worth observing? Correlations between the digital footprint of collaboration processes and independent measures of team effectiveness are relatively unexplored. Providing evidence of their existence (or non-existence) requires a systematic approach to monitor virtual collaboration processes on a larger scale.

In summary, the dissertation addresses the following problems:

1. There is no common technological foundation for the computational recording and analysis of virtual collaboration activities that can meet the requirements for monitoring the early, unstructured stages of engineering design processes in the field. This has the following effects:
   a) Detailed scientific investigations into the use of groupware during the conceptualization of new products are rare and hindered by the extra costs and expertise required to implement custom solutions for data collection and processing.
   b) Existing instruments and studies are generally limited with regard to the type and number of communication channels being considered, preventing a holistic view of the virtual collaboration process and transfer into different design environments.
   c) Individual tools and data formats hinder a broader comparison and verification of findings by other researchers. Data samples are not accessible or cannot be reused to reproduce previous findings.

2. Due to the lack of a shareable monitoring platform, existing studies in design research are unaligned and knowledge about relevant criteria in the technological mediation of design collaboration is fragmented. Coordinated case studies are needed to empirically explore and learn about design team behavior and to understand how performance is reflected in or influenced by groupware use.

1.4 Research Objective

The objective of this dissertation is to contribute to a better instrumentation and understanding of virtual collaboration processes in engineering design teams by providing a platform to systematically measure and study the use of online communication channels. It accounts for the fact that the digital traces of virtual collaboration, in their entirety, have up to now remained mostly unexploited in the context of observing concept generation during the early stages of engineering projects. Approaching the above-mentioned challenges, the work seeks to create a monitoring instrument that facilitates detailed studies of virtual collaboration processes, thus allowing design researchers to share their observations and to interconnect their collective analysis efforts. As a second objective, the methods and applications developed in this work are evaluated in a series of engineering design projects to show empirically how the virtual collaboration behavior of high-performance teams differed from that of lower-performing ones. Thus, by contributing new tools and experiences in design team monitoring, the dissertation aims to promote the generation of new hypotheses for further research rather than to produce generalized knowledge about the process itself.

The work articulates requirements and architectural specifications for an “in-flight” monitoring instrument that is able to serve as a technological foundation for capturing and analyzing virtual collaboration activities in real-time [1]. Common virtual collaboration activities in engineering design include, e.g., synchronous/asynchronous messaging, or the sharing, accessing, and editing of digital documents. The instrument coordinates the automated capture of such activities without interfering with the design process or the applied groupware tools. It establishes a central point of access to detailed information and up-to-date records of how teams interact and communicate during their work. Simultaneously, the records can be used to explore collaboration patterns, trends, and characteristics in the observed activities, thus providing a basis for empirically researching the implicit processes in engineering design projects. New insights, theories, and hypotheses can be drawn from the observations and promote conceptual design methodology as well as the identification of critical collaboration metrics. Based on this knowledge, the instrument can be iteratively refined and leveraged to provide engineering teams with improved in-process support and guidance (Fig. 1.1).

[1] Considering its ambivalent meaning in computer science, the notion of ‘real-time’ refers here to the ability to monitor collaboration activities in an almost undelayed manner, i.e., within the range of seconds after an activity took place.

[Figure 1.1: diagram connecting Design Processes, Monitoring Instrument, and Design Research through the links Capture, Analysis, Guidance, Refinement, and Theories, Metrics & Hypotheses.]

Figure 1.1. The instrumentation of virtual collaboration facilitates “in-flight” monitoring of engineering design processes by means of computational capture and analysis of collaboration activities. New insights stimulate design research and may lead to the refinement of the instrument and its application to guide the work of engineering teams.

1.4.1 A Service Platform for Virtual Collaboration Monitoring

From a software engineering point of view, the work addresses the challenge of creating an instrument that is applicable in a variety of different collaboration environments without restricting or interfering with the processes under study. The proposed monitoring system must be configurable and respond to distributed design environments, heterogeneous collaboration tool landscapes, and concurrent team interactions.

The work aims for a service-oriented approach to design monitoring by decomposing the system into dedicated function blocks for capturing and processing virtual collaboration activities. Service-orientation defines an architectural paradigm for distributed and loosely coupled software systems that promotes integrability, flexibility, and reusability (Erl, 2005). In a service-oriented monitoring infrastructure, the capturing, organization, and processing of activity records is decoupled to achieve a high degree of generality in terms of the observed communication channels, activities, and the intended analysis procedures.

[Figure 1.2: diagram of the Monitoring Service Platform holding Activity Records, with Sensor Clients attached to the Collaboration Tools, Analysis Clients used by Researchers, and Dashboard Clients used by Engineers.]

Figure 1.2. A monitoring service platform provides a common interface to distributed clients that are specialized in the capture, analysis, and monitoring of collaboration activities.

Figure 1.2 sketches the outlined client-server architecture that is elaborated in this work. A central monitoring service platform provides a common interface to store, read, and update a set of records that describe the activities that have occurred in a virtual collaboration process. Client applications insert or modify records or evaluate the captured activities in one or more engineering design processes: groupware-specific sensor clients continuously scan collaboration activities and feed the information to the platform; analysis clients visualize, explore, and compare different patterns, trends, and characteristics in the recorded events; dashboard clients support the work of the engineers by providing functionality to navigate through the records and to track relevant collaboration metrics during the design process.
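The division of labor between platform and clients can be illustrated with a minimal Python sketch. It is not the implementation elaborated in this work: the record layout (`ActivityRecord`), the in-memory `MonitoringPlatform`, and the two client functions are hypothetical names chosen for illustration, and the sample messages are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ActivityRecord:
    """One observed collaboration activity (hypothetical record layout)."""
    actor: str
    action: str
    resource: str
    channel: str
    timestamp: datetime


@dataclass
class MonitoringPlatform:
    """In-memory stand-in for the platform's common store/read interface."""
    records: List[ActivityRecord] = field(default_factory=list)

    def store(self, record: ActivityRecord) -> None:
        self.records.append(record)

    def read(
        self, predicate: Callable[[ActivityRecord], bool] = lambda r: True
    ) -> List[ActivityRecord]:
        # Records are returned in chronological order, whatever the
        # order in which the sensor clients delivered them.
        matching = [r for r in self.records if predicate(r)]
        return sorted(matching, key=lambda r: r.timestamp)


def email_sensor(
    platform: MonitoringPlatform,
    raw_messages: List[Tuple[str, str, datetime]],
) -> None:
    """Sensor client: translates channel-specific events into records."""
    for sender, recipient, sent_at in raw_messages:
        platform.store(
            ActivityRecord(sender, "sent_email_to", recipient, "email", sent_at)
        )


def activity_per_actor(platform: MonitoringPlatform) -> Dict[str, int]:
    """Analysis client: counts the recorded activities of each actor."""
    counts: Dict[str, int] = {}
    for record in platform.read():
        counts[record.actor] = counts.get(record.actor, 0) + 1
    return counts


# Invented sample data, delivered out of chronological order.
platform = MonitoringPlatform()
email_sensor(platform, [
    ("alice", "carol", datetime(2010, 1, 5, 14, 0)),
    ("alice", "bob", datetime(2010, 1, 4, 9, 0)),
    ("bob", "alice", datetime(2010, 1, 4, 9, 30)),
])
print(activity_per_actor(platform))
```

Because sensor and analysis clients only share the record format and the platform interface, a sensor for a further channel could be added without changing any analysis code.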

1.4.2 Scope of the Dissertation

In the realization of the outlined monitoring infrastructure, the following research tasks fall into the scope of this work: the elaboration of descriptive models to record and analyze collaboration activity, the design and prototypical implementation of a service-based monitoring instrument, the integration and application of this instrument in a series of distributed engineering design projects, and the evaluation of the captured activity records with a view towards metrics that correlate with team performance. Research that is beyond the scope of this dissertation includes an analysis of how this instrument can be further refined and used to guide engineering design teams in their processes and how the work of designers and managers is influenced by the availability of dashboards and real-time collaboration metrics.

1.4.3 Underlying Principles and Assumptions

Several assumptions about virtual collaboration underlie this research and determine the scope of the work. Assumptions are made with regard to the intent of the collaboration, the structure of the teams, and the type of collaboration environments that are being monitored. An elaboration of these principles is given in Chapter 2.

Collaboration Intent: The work investigates virtual collaboration activities in conceptual engineering design, i.e., the early stages of an engineering project, in which opportunities and innovative concepts for a new product, software, or service are ideated, (re-)designed, and conceptualized before the detailed production can begin. The outcome of these activities is initially unknown and open-ended, but intends to achieve a viable, feasible, and desirable solution to an identified market need. The nature of conceptual design is experimental, unstructured, difficult to plan, and involves considerable application of knowledge, judgment, and expertise in a team.

Team Setup: The team formations in engineering design follow the patterns of project teams often encountered in organizations ranging from start-ups to multi-national firms. Design teams are specifically assembled and usually time-limited to the duration of a project. Team members are drawn from different disciplines and/or functional units, so that specialized expertise can be applied to the design task at hand. All members usually show a high level of commitment to the project, trust each other, and share responsibility for the result. Design teams produce a non-repetitive outcome, representing either an incremental improvement over an existing concept or a radically different new idea.

Collaboration Environment: The work further assumes that team collaboration takes place in an organizational environment that requires and facilitates the informal exchange of knowledge and ideas through a set of virtual collaboration tools and groupware. In such an environment, team members frequently use Internet-based media (email, World Wide Web, etc.) to communicate, document knowledge, and to disseminate information to collaboration partners separated by time or space. Teams have ubiquitous access to a centrally managed ICT infrastructure that fulfills basic communication and coordination needs.

1.5 Research Approach & Guiding Questions

The work takes three necessary steps in the construction, application, and evaluation of a service-based monitoring instrument for virtual collaboration processes. Each step is guided by, and provides an answer to, one particular research question. The questions are presented in the following sections.

1.5.1 Step 1: Development of a Descriptive Model

Guiding Question: How can arbitrary virtual collaboration activities of engineering design teams be logically modeled and represented in a generic format?

The work begins by asking for a descriptive model that is able to serve as a record of the course of virtual collaboration activities in heterogeneous team processes. To be widely applicable, the model must not be restricted to a predetermined set of concepts for describing activities within specific groupware or project settings. An extensible schema is needed, which can be customized to fit any kind of observed collaboration process. At the same time, the model needs to preserve previous states of the recorded process to allow for retrospective analyses.

In this work, a labeled graph approach is proposed that describes collaboration activities as relationships between resources and/or people. Team Collaboration Networks (TCN) define the semantics, occurrences, and temporal properties of the actors, resources, and relationships identified in a collaboration process. An extensible set of ontologies provides the linguistic constructs needed to describe the interactions with different groupware applications. By recording the ‘who’, ‘what’, and ‘when’ in a virtual collaboration process, Team Collaboration Networks establish a computer-processable description of the online activities in a project and provide the basis for an analysis of the communication and coordination behavior of design teams.
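The idea of a labeled graph with temporal properties can be made concrete with a small Python sketch. It is only an illustration under stated assumptions and not the TCN definition given in Chapter 4: the class and method names, the set-based `vocabulary` standing in for an ontology, and the sample identifiers are all hypothetical. The sketch shows the two properties named above, an extensible set of relationship labels and the preservation of previous states.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime
from typing import List, Optional, Set


@dataclass(frozen=True)
class Relationship:
    """Labeled, time-stamped edge between two actors or resources."""
    source: str
    label: str
    target: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still valid"


@dataclass
class TeamCollaborationNetwork:
    """Hypothetical sketch of a labeled graph that keeps its history."""
    vocabulary: Set[str] = field(default_factory=set)
    edges: List[Relationship] = field(default_factory=list)

    def extend_vocabulary(self, *labels: str) -> None:
        # New relationship types can be added at any time.
        self.vocabulary.update(labels)

    def relate(self, source: str, label: str, target: str,
               when: datetime) -> Relationship:
        if label not in self.vocabulary:
            raise ValueError(f"unknown relationship label: {label}")
        edge = Relationship(source, label, target, when)
        self.edges.append(edge)
        return edge

    def retract(self, edge: Relationship, when: datetime) -> Relationship:
        # Close the validity interval instead of deleting the edge,
        # so that earlier states remain available for analysis.
        closed = replace(edge, valid_to=when)
        self.edges[self.edges.index(edge)] = closed
        return closed

    def snapshot(self, at: datetime) -> List[Relationship]:
        """All relationships that were valid at the given point in time."""
        return [
            e for e in self.edges
            if e.valid_from <= at and (e.valid_to is None or at < e.valid_to)
        ]


# Invented sample data.
tcn = TeamCollaborationNetwork()
tcn.extend_vocabulary("member_of", "authored")
tcn.relate("alice", "member_of", "team-1", datetime(2010, 1, 1))
authored = tcn.relate("alice", "authored", "wiki/Concept", datetime(2010, 1, 4))
tcn.retract(authored, datetime(2010, 2, 1))

january = [e.label for e in tcn.snapshot(datetime(2010, 1, 15))]
march = [e.label for e in tcn.snapshot(datetime(2010, 3, 1))]
```

Closing the validity interval rather than deleting an edge is one simple way to support retrospective analyses. Other designs, such as an append-only event log, would serve the same purpose.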

1.5.2 Step 2: System Implementation & Customization

Guiding Question: What is an appropriate architectural layout for a virtual collaboration monitoring platform and what functionality does it need to provide?

A service-based software system to capture, monitor, and analyze virtual collaboration activities (Fig. 1.2) presents a new approach in design research. What services does such a system need to provide to be seamlessly integrable and applicable in real collaboration environments? Critical components of the system architecture and their relationships need to be defined.

The work introduces d.store, a service platform for the computational construction and analysis of Team Collaboration Networks. Following the resource-oriented architectural style defined by Fielding (2000), d.store represents a Team Collaboration Network’s state as a set of resources that can be addressed in a universal syntax and manipulated by a set of well-defined operations. The applicability of this platform is demonstrated in this work. The system is implemented and configured to capture distributed, multi-channel collaboration activities from the groupware systems utilized in different teams. It is shown how the services can be used to inspect patterns and other characteristics in the observed collaboration processes.
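The resource-oriented style can be sketched in a few lines of Python. The example below does not reproduce the d.store interface, which is presented in Chapter 6. The class `ResourceStore`, the URI layout, and the returned status codes are assumptions made purely for illustration. It shows the two ingredients named above: every piece of network state has its own address, and a small fixed set of operations applies uniformly to all addresses.

```python
from typing import Any, Dict, List, Optional, Tuple


class ResourceStore:
    """Hypothetical in-memory store with a uniform, HTTP-like interface."""

    def __init__(self) -> None:
        self._resources: Dict[str, Any] = {}

    def put(self, uri: str, representation: Any) -> int:
        """Create or replace the resource at `uri`."""
        created = uri not in self._resources
        self._resources[uri] = representation
        return 201 if created else 200

    def get(self, uri: str) -> Tuple[int, Optional[Any]]:
        """Return a status code and the current representation, if any."""
        if uri in self._resources:
            return 200, self._resources[uri]
        return 404, None

    def delete(self, uri: str) -> int:
        """Remove the resource at `uri`."""
        if uri in self._resources:
            del self._resources[uri]
            return 204
        return 404

    def list(self, prefix: str) -> List[str]:
        """List the addresses of all resources below a path prefix."""
        return sorted(u for u in self._resources if u.startswith(prefix))


# Invented addresses and representations.
store = ResourceStore()
first = store.put("/networks/team-1/actors/alice", {"type": "Person"})
second = store.put("/networks/team-1/actors/alice",
                   {"type": "Person", "role": "designer"})
store.put("/networks/team-1/resources/wiki-concept", {"type": "WikiPage"})

status, alice = store.get("/networks/team-1/actors/alice")
team_1 = store.list("/networks/team-1/")
```

Since the operations do not depend on what a resource represents, clients for new groupware systems only need to agree on addresses and representations, not on new operations.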

1.5.3 Step 3: Application in Conceptual Engineering Design

Guiding Question: Can patterns in the monitored collaboration behavior of design teams indicate team performance?

Hypothesizing that high-performance design teams produce different collaboration patterns than lower-performing ones, the work continues by applying d.store in the analysis of eleven engineering projects during an eight-month period of early-stage concept creation and prototyping. The activities scanned from email archives, Wiki pages, and shared document folders are represented as Team Collaboration Networks and provide the basis for a detailed inspection and comparison of collaboration patterns. In particular, the system is used to test whether the occurrence of specific patterns correlates with independent measures of team effectiveness. Teams have been ranked based on different performance criteria: the average satisfaction of team members as determined by a team diagnostic survey (cf. Wageman et al., 2005), ratings by judges reviewing the project outcome, and the number of explored design alternatives. Statistical correlations between patterns and team effectiveness are tested by means of linear regression analysis (e.g., Backhaus et al., 2008).
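For readers unfamiliar with the technique, the following Python sketch shows an ordinary least-squares fit of one performance measure against one pattern metric, together with the Pearson correlation coefficient. It is a textbook formulation, not the analysis procedure of Chapter 8. The function name is hypothetical, and the five data points are invented for illustration and are not results of this work. Significance testing is omitted.

```python
import math
from typing import Sequence, Tuple


def least_squares_fit(
    x: Sequence[float], y: Sequence[float]
) -> Tuple[float, float, float]:
    """Fit y = intercept + slope * x and return (slope, intercept, r).

    Assumes equally long inputs with non-zero variance in both x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-deviations.
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r = s_xy / math.sqrt(s_xx * s_yy)  # Pearson correlation coefficient
    return slope, intercept, r


# Invented values: one pattern metric and one performance score per team.
pattern_metric = [1.0, 2.0, 3.0, 4.0, 5.0]
performance = [2.0, 4.0, 5.0, 4.0, 5.0]

slope, intercept, r = least_squares_fit(pattern_metric, performance)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.2f}")
```

With a sample of only eleven teams, a coefficient obtained in this way would have to be accompanied by a significance test before any conclusion is drawn.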

1.6 Results and Contribution

Facing the importance of design excellence, global organizations are searching for new methods to monitor and evaluate factors that impact the performance of their engineering teams. Understanding the necessary requirements (i.e., how to monitor) and relevant metrics (i.e., what to monitor) is a prerequisite. This work contributes answers to both the ‘how’ and the ‘what’. It articulates a novel approach to capturing and analyzing ICT-mediated team activities in the field and empirically tests observable collaboration patterns for performance indicators. Team Collaboration Networks provide a descriptive model of the semantic and temporal properties of actors and information resources during project-based collaboration. d.store, an adaptable monitoring instrument consisting of distributed sensor and analysis components, allows researchers and designers to tap into heterogeneous data sources and to handle the increasing complexity of technology-enabled design spaces. This way, d.store connects information silos that otherwise would be scattered across the virtual collaboration landscape. The instrument automates the data collection process, provides real-time evaluation capabilities, and establishes a technological foundation for the exploration, quantification, and comparison of collaboration processes in engineering design. Details of this approach have been reviewed and published (Uflacker and Zeier, 2008a,c; Uflacker, 2007).

Findings from a pilot application give first indications that performance conclusions can be drawn from virtual collaboration patterns. Patterns that correlate with independent team performance metrics can be interpreted as surrogates for ‘outside-in’-driven design and team-internal information sharing. For example, a positive and significant correlation exists between the self-reported satisfaction of team members and a team’s tendency to contact external process participants (e.g., end-users, customers, domain experts), suggesting that a close involvement of team-external process stakeholders has beneficial effects on a project. The results indicate that high-performance design teams exhibit different collaboration patterns than low-performance teams, endorsing a continued utilization of the instrument to evaluate relevant performance indicators and new opportunities in conducting real-time team diagnostics. The application and results of the analysis have been presented at various international conferences and in publications (e.g., Uflacker and Zeier, 2010b, 2009; Uflacker et al., 2009).

The contribution of this dissertation is also visible in the work of other researchers who have applied d.store to analyze virtual team collaboration. Skogstad (2009) has developed new theory about how designers gain insights required to create novel solutions and how reviewers can have both positive and negative effects on this process. Parts of his hypothesis testing are based on data retrieved from the instrument presented in this work.

1.7 Outline of the Thesis

The document is structured into four parts.

Part I – ‘Background & Preliminaries’ introduces the research domain, provides technical and theoretical basics, and elaborates on the needs and gaps addressed by this dissertation. Chapter 2 – ‘A Review of Engineering Design Literature’ – gives an overview of existing work in the area of engineering design research, in particular of ICT systems to support the design process and its scientific exploration. Chapter 3 – ‘Technological Foundations’ – introduces standards and technologies in the background of resource-oriented architectures and knowledge representation, which form a foundation for the development of the service-based monitoring instrument.

Part II – ‘Models for Team Collaboration Capture’ introduces data structures that are required to describe virtual collaboration activities. Chapter 4 – ‘Team Collaboration Networks’ – defines a structure to describe the actors, resources, and relationships in a single team or project. Chapter 5 – ‘An Ontology System for Team Collaboration Networks’ – describes how multiple instances of Team Collaboration Networks can be integrated to facilitate the concurrent monitoring and comparison of different collaboration processes.

Part III – ‘System Implementation’ continues with the development of a service-based tool landscape for monitoring virtual collaboration activities in teams. Based on the requirements and data models defined before, a system architecture and implementation is presented in Chapter 6 – ‘d.store: A Resource-oriented Team Collaboration Network System’. The configurability of this system and its preparation for a specific groupware landscape is demonstrated in Chapter 7 – ‘System Configuration’.

Part IV – ‘Evaluation & Discussion’ deploys the monitoring platform in real project environments and critically appraises the findings and insights obtained from the application. Chapter 8 – ‘A Pilot Application in Engineering Design’ – presents the project setup and the analysis of the collaboration behavior of eleven distributed teams over a period of eight months using d.store. Chapter 9 – ‘Conclusion’ – summarizes the work and discusses the contribution against the backdrop of collaboration monitoring and future research in engineering design.

Part I

Background & Preliminaries

2 A Review of Engineering Design Literature

The study of collaboration processes in engineering design is not a new research area and has produced a substantial body of literature. In recent years, social and cognitive viewpoints have increasingly influenced research directions in engineering design and opened the field to diverse disciplines and communities. A common goal of design research is to develop a better understanding of what it is that designers do when they do design (Ju et al., 2007). Design research has begun to identify critical process characteristics and factors that affect the quality of the design outcome, influencing the development of new tools and techniques to support designers in the execution and researchers in the observation of this complex activity. A review of influential literature in this multifaceted field is therefore expedient and necessary before beginning with the construction of an aligned monitoring instrument.

The review begins with a general overview of theories in conceptual design, and then narrows the focus to concentrate on software systems to support the design process and its computational observation. The chapter introduces technology and related work to construct a common terminology for the remainder of this thesis, and forms a foundation for the design of a monitoring system. The review is structured into four interrelated fields of design research in order to:

• give an introduction into conceptual engineering design processes, user-centered design, and design thinking, and to explain how and why these concepts are understood as a driver for innovation (Sect. 2.1);
• present theories and models for information handling and coordination in design teams, and to explain the role of ICT and virtual collaboration in engineering design processes (Sect. 2.2);
• give an overview of past and present software solutions used for computer-supported cooperative work in conceptual design (Sect. 2.3); and to
• present related work and existing software instruments to support the capture and analysis of team-based engineering design activities (Sect. 2.4).

The chapter concludes with a set of system requirements for a virtual collaboration monitoring instrument that are derived from the reviewed literature and which are used to differentiate this work from previous research.

2.1 Conceptual Engineering Design

Conceptual design refers to the activities that occur at the first stages of a product life cycle. It is an iterative and incremental process and is often considered the most critical phase of product design (Wang et al., 2002). Vosinakis et al. (2007) point out that “decisions made at this phase determine the rest of product development” and that “any unintended mistakes, misconceptions and omissions have significant negative impact to the project”.

To reflect and theorize about the design process, a diversity of prescriptive engineering process models have been described in the literature, generally varying in terminology, granularity, and industry focus. Prescriptive models suggest a systematic and algorithmic procedure that should be carried out, structuring the design process into a set of compartments with well-defined boundaries (Baya, 1996). This contrasts sharply with how most designers work and with what is often observed in empirical, descriptive studies (Finger and Dixon, 1989). Nevertheless, a few shall be mentioned here briefly to give an impression of what is considered conceptual engineering design.

In general, engineering processes are triggered by a market need or a new idea and start off with the conceptualization of a solution, i.e., the mental creation of a new product. However, there is dissent regarding the scope of this design phase, i.e., where conceptual design begins and where it ends. For example, Ulrich and Seering (1987) define conceptual design very broadly as the transformation of functional or behavioral requirements into structural embodiments or descriptions. Other, more detailed and prescriptive models of the design process (e.g., Pahl et al., 1996, Fig. 2.1) isolate the conceptual design phase from planning and task clarification and the embodiment design. Yet other, more design- and innovation-centric viewpoints explicitly count the discovery of opportunities and concepts as part of this creative process (e.g., Weiss, 2002, Sect. 2.1.3). This work adopts a comprehensive interpretation in that it understands conceptual design as outlined in the following definition:

Definition (2.1): Conceptual design in engineering embraces the iterative development and optimization of an innovative principle solution, comprising diverse activities such as need finding, the formulation of product proposals, task clarification, elaboration of requirements, the search for working principles, and the evaluation against technical, economical, and human criteria.

[Figure 2.1: flow chart of the design process with the stages Task; Planning and clarifying the task; Requirements list (design specification); Conceptual design; Concept (principle solution); Embodiment design; Preliminary layout; Definitive layout; Detail design; Product documentation; Solution.]

Figure 2.1. A standard model in German industry for designing new products (Pahl et al., 1996). Conceptual design is considered the process of developing a principle solution from a list of requirements.

It should be emphasized that the extent of conceptual design activities varies from project to project. Situations exist in which a solution is fully known from the outset and direct progress to the embodiment and the detailed engineering phase is reasonable. The focus of this work, however, is on open-ended, new product development projects, where the basic solution path needs to be laid down through the collaborative elaboration of an innovative principle. In such projects, conceptual design presents a necessary and essential part of the product life cycle. Gero (1998) points out that “in conceptual designing not all that is needed to be known to complete a design is known at the outset, i.e. part of the process of designing involves finding/determining what is needed”. This lack of clarity and certainty at the beginning of a project has motivated the scientific exploration of early-stage design activities, often called the “Fuzzy Front End” of innovation.

2.1.1 The Fuzzy Front End of Innovation

The term ‘Fuzzy Front End’ (FFE) has been coined by Reinertsen (1999), who describes it as the stage “between when work on a new idea could start and when it actually starts”. Kim and Wilemon (1999) adopt this notion and speak of the “period between when an opportunity is first considered and when it is judged ready for development”. Koen et al. (2001) define FFE as “activities that take place prior to the formal, well-structured New Product and Process Development”. The same authors also prefer to use the term ‘Front End of Innovation’ (FEI) as opposed to ‘Fuzzy Front End’. They argue that the use of the term FFE incorrectly suggests that unknowable and uncontrollable factors dominate the front end, implying that this initial part of the innovation process can never be managed (ibid.). For the remainder of this work, both terms are treated as equivalent.

The fuzzy front end is the beginning of an innovation process that is structured into three distinct phases: FFE, New Product and Process Development (NPPD), and commercialization (Koen et al., 2002). Several studies indicate that those organizations that excel in managing the fuzzy front end are more likely to succeed in the following phases and to win the innovation race (e.g., Cooper, 1998; Cooper and Kleinschmidt, 2000). Furthermore, there is broad consensus that FFE is usually full of opportunities for improvement and that it presents one of the greatest opportunities for improving the overall innovation process (Reinertsen, 1999; Koen et al., 2001). Studies of project managers in fast time-to-market industries also show that the initial phase of a complex project has a disproportionately large impact on the end results (Gary, 2003).

However, Kim and Wilemon (2002) note that “many firms, unfortunately, acknowledge serious weaknesses in the predevelopment steps of their innovation process. In fact, data on resources spent [...] show that limited time and money are devoted to these early, critical steps”. A reason for this is that FFE, as opposed to NPPD, is typically poorly understood. Many of the practices that aid in the NPPD do not apply to the FFE. They fall short because the nature of work, commercialization date, funding level, revenue expectations, activities and measures of progress are fundamentally different (Table 2.1).

Table 2.1. Differences between the Front End of Innovation (FEI) and New Product & Process Development (Koen et al., 2001).

Aspect | Front End of Innovation (FEI) | New Product & Process Development (NPPD)
Nature of Work | Experimental, often chaotic. Difficult to plan. | Structured, disciplined and goal-oriented with a project plan.
Commercialization Date | Unpredictable | Definable
Funding | Variable. In the beginning phases, many projects may be “bootlegged”, while others will need funding to proceed. | Budgeted
Revenue Expectations | Often uncertain. Sometimes done with a great deal of speculation. | Believable and with increasing certainty, analysis and documentation as the product release date gets closer.
Activity | Both individual and team in areas to minimize risk and optimize potential. | Multi-functional product and/or process development team.

The differences between FEI and NPPD suggest that “a distinctly different approach, skill-set and mindset are required to succeed in each phase” (Skogstad, 2009). Design researchers therefore place great value on better understanding and optimizing the front end of innovation, since a product is more likely to be successfully developed and marketed when the FFE activities are understood and carefully managed (Kim and Wilemon, 2002). Deeper insights into the critical methods by which designers negotiate the creative process are required in order to improve its management. Hence, efficient instruments for the observation and assessment of FFE activities are needed.

Several methodologies and mindsets have been developed to guide designers in the organization and execution of the early design phases. Two examples are briefly presented here: user-centered design, a participatory approach often associated with the development of interactive software systems, and design thinking, a neoteric term that comprises creativity-enhancing methods and best practices for the front end of engineering projects in general.

2.1.2 User-Centered Design

User-centered design (UCD) describes a design approach that acknowledges the following basic principles: the active involvement of users for a clear understanding of user and task requirements, iterative design and evaluation, and a multi-disciplinary approach (Vredenburg et al., 2002b). Thus, UCD emphasizes what is often neglected in the early stages of traditional product development processes: empathy for the targeted user group and awareness of human needs. UCD aims for a shift from traditional ‘inside-out’ design that is driven by technology and engineers to a user-driven ‘outside-in’ approach, which is grounded in information about the people who will use the product. Table 2.2 summarizes the key differences to a traditional, technology-driven design approach.

Table 2.2. Contrasting the traditional approach to design with UCD (Vredenburg et al., 2002a).

Traditional Approach | UCD
Technology driven | User driven
Component focus | Solution focus
Limited multidisciplinary cooperation | Multidisciplinary team work
Focus on internals architecture | Focus on externals design
Some competitive focus | Focus on competition
Development prior to user validation | Develop only user validated designs
Product defect view of quality | User view of quality
Limited focus on user measurement | Prime focus on user measurement
Focus on current customers | Focus on current and future customers

The underlying motivation for UCD is to maximize the usability of a product, i.e., “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” (ISO/IEC, 1998). Nielsen (1994) argues that usability results from the observance of a number of quality measures that are addressed by UCD: learnability, efficiency of use, memorability, error rate, and subjective satisfaction. In this argumentation, UCD becomes a dominant driver for the usefulness and practical acceptability of a product or system (Fig. 2.2).

[Figure 2.2: diagram not reproduced. System acceptability divides into social acceptability and practical acceptability. Practical acceptability comprises usefulness, cost, compatibility, reliability, etc. Usefulness divides into utility and usability. Usability comprises: easy to learn, efficient to use, easy to remember, few errors, subjectively pleasing.]

Figure 2.2. A hierarchy of requirements for system acceptability (Nielsen, 1994). The purpose of user-centered design is to achieve usability, a prerequisite for usefulness and practical system acceptability.

Over the last two decades, UCD has received broad attention in the design of interactive systems. User experience has become a dominant differentiator in markets where customers “can take their business elsewhere with just one mouse click” (Mao et al., 2005). Consequently, UCD has been advocated by a growing research community, resulting in a number of prescriptive process models and tool sets to guide the implementation of UCD principles. The basic concepts have their origins in the seminal work of Norman and Draper (1986) on user-centered system design. To promote the adoption of UCD best practices, an abstraction of these principles has been defined in ISO 13407: ‘Human-Centred Design Processes for Interactive Systems’ (Fig. 2.3). Several instantiations of these guidelines have been proposed in the literature: Usability Engineering (Nielsen, 1994), Contextual Design (Beyer and Holtzblatt, 1998), the Usability Engineering Lifecycle (Mayhew, 1999), and others (e.g., Jokela, 2002; Vredenburg et al., 2002a). However, usability is often hindered from playing a strategic role in organizations due to several problems in its implementation (Venturi and Troost, 2004). Surveys have shown that obstacles arise, for example, from resource constraints, ineffective communication (Rosenbaum et al., 2000), and a lack of effective usability metrics (Vredenburg et al., 2002b).

[Figure 2.3: diagram not reproduced. An iterative cycle: identify need for user-centered design; understand and specify the context of use; specify user and organizational requirements; produce design solutions; evaluate designs against requirements; repeat until the system satisfies the specified requirements.]

Figure 2.3. ISO 13407 - Human-Centred Design Processes for Interactive Systems (ISO/IEC, 1999), an abstraction of basic principles in UCD processes.

2.1.3 Design Thinking

Having gained momentum in recent years, design thinking has become an expression for a design approach that systematically draws on established principles and methods in the conceptualization of a product. Design thinking directs a team’s attention to the full scope of design: understanding the design problem in a holistic context, collaborative generation of many ideas, and prototyping and testing increasingly sophisticated design alternatives. A similar view on design is given in a definition from Pahl et al. (1996), in which design is described as an engineering activity that

• affects almost all areas of human life;
• uses the laws and insights of science;
• builds upon special expertise; and
• provides the prerequisites for the physical realization of solution ideas.

Accordingly, Sheppard (2003) describes what engineers do as “scope, generate, evaluate, and realize ideas”. In contrast to design in the context of industrial or artistic design, where it refers to aesthetic form giving, engineering design refers to the iterative and collaborative process of creating new concepts (Skogstad, 2009). Design thinking, as understood in this work, is equivalent to design as seen in the context of engineering. It is a systematic, intelligent process in which designers generate, evaluate, and specify concepts for devices, systems, or processes whose form and function achieve clients’ objectives or users’ needs while satisfying a specified set of constraints (Dym et al., 2006). It involves all aspects of organizational problem solving and combines inspiration, ideation, and implementation in an iterative process of concept discovery and concept design (Brown, 2008).

Three factors drive design thinking (Fig. 2.4): desirability (the human factor), i.e., the understanding of how people interpret and interact with the things they encounter in the world; feasibility (the technical factor), i.e., the understanding of how new technologies can be harnessed to make a nascent product or service concept come to life in a way that is meaningful for users; and viability (the business factor), i.e., the understanding of whether embracing a new technology or supporting a particular user need is truly aligned with the organization’s strategic objectives and competitive positioning (Weiss, 2002).

Figure 2.4. The “sweet spot” of good design: innovation is stimulated by leveraging expertise in each of the interrelated areas of human factors, technical factors, and business factors (Weiss, 2002). Design thinking is the iterated discovery, design, and refinement of concepts against the backdrop of desirability, feasibility, and viability.

In contrast to a traditional ‘technology push’ methodology, design thinking embraces factors such as utility, costs, and social acceptability. With this holistic approach, it is moving more and more to the center of organizational strategies. Brown (2008) states that “rather than asking designers to make an already developed idea more attractive to consumers, companies are [now] asking them to create ideas that better meet consumers’ needs and desires”. He further points out that “the former role is tactical, and results in limited value creation; the latter is strategic, and leads to dramatic new forms of value”. Design thinking has become a crucial business asset (Jacoby and Rodriguez, 2008) and provides best practices for innovation by “making the client’s business objectives relevant to the end user, and enabling the user’s needs to influence the development of the client’s business objectives” (Weiss, 2002). In this work, design thinking is further characterized by the following three basic design principles:

Close involvement of end-users and customers. Design thinking is radically “outside-in”. It acknowledges that user requirements and human factors are often unclear and must be well understood from the beginning. Critical needs must be identified and continuously revised through observations of the use context, user tasks, and customer demands.

Interdisciplinary knowledge sharing. Design thinking emphasizes the formation of cross-functional teams and the involvement of different disciplines in the problem solving process. Intellectual diversity aims to facilitate the collaborative exploration of “outside-the-box” opportunities through knowledge exchange across broad areas of expertise. As David Kelley puts it, “successful design is done by teams. Creative leaps might be taken by individuals, but design thrives on the different points of view found in teams. You want a multidisciplinary team [...]. You want different brains working on the problem. Otherwise, the person with the power, or the person who speaks the loudest, sets the direction for the whole design” (Kelley and Hartfield, 1996).

A culture of prototyping. Prototyping is an effective mechanism to evaluate (intermediary) design solutions against technical, economic, and human requirements. Following the maxim of “fail early, fail cheap”, prototypes in the early stages of concept creation present a low-cost technique to communicate and test design ideas and to ensure that requirements are well understood.

A more theoretical characterization of design thinking has been presented by Eris (2002), who formulates design as an alternation of divergent and convergent questioning. Eris distinguishes between generative design questions, where the questioner attempts to diverge from facts to the possibilities that can be created from them, and deep reasoning questions, which attempt to converge and reveal facts. Design thinking becomes a process of inquiry, which is only effective if it includes both “a convergent component of building up to asking deep reasoning questions by systematically asking lower-level, convergent questions, and a divergent component in which generative design questions are asked to create the concepts on which the convergent component can act” (Dym et al., 2006). From this point of view, the above-mentioned principles of user-centeredness, knowledge sharing, and prototyping can be understood as drivers for diverging and converging in the solution space.

2.1.4 Conclusions Drawn From Review

The early stages of engineering projects are considered the most critical phase of a product lifecycle. Decisions made in this phase determine the rest of the engineering process, and any misconceptions and omissions have a significant impact on the project outcome. Design thinking describes an approach to facilitate the conceptualization of design solutions that meet human, technological, and business criteria, and is considered a motor for the “front end of innovation”. This front end must be better understood in order to support the design process and to systematically strengthen the innovative potential of organizations.

2.2 Teamwork, Information & Virtual Collaboration

Conceptual design is essentially a collaborative process and involves a multi-disciplinary team of engineers, domain experts, clients, and others (Vosinakis et al., 2007). Because design outcome derives from teamwork, any study of design produces more relevant results if it focuses on design teams (Milne, 2005). This is supported by a study of Frankenberger et al. (1997), who report cases in which designers spent 85% of their time working alone, but noted that 88% of the critical events took place during cooperative interactions. Hence, studying how members of a design team work together is a necessary step towards a better understanding of the design process, because “both the idea and the project team are primary determinants of FFE performance and, in turn, influence and shape the development phase” (Kim and Wilemon, 2002).

This section gives an overview of concepts, models, and theories that describe the work of design teams, in particular with respect to information handling, coordination, and computer-supported tasks.

2.2.1 Design Teams: A Working Definition

Before reflecting on the work of design teams, it is worthwhile to look at the defining characteristics of individual designers and teams. A designer is a person who is committed to the identification of an engineering problem and the creative elaboration of an innovative solution to this problem. Designers go beyond the simple coordination of individualistic work to engage in joint activity aimed at the co-construction of collective work products (Geisler and Rogers, 2000). Hence, “designers contribute to finding solutions and developing products in a very specific way” (Pahl et al., 1996). Their individual role and responsibility in a project is critical, since “their ideas, knowledge and skills determine the technical, economic and ecological properties of the product in a decisive way” (ibid.).

A design team can now be delineated by the following core characteristics:

Definition (2.2): A design team is a collection of designers who are drawn from different disciplines and functional units, are interdependent in their tasks, and share responsibility for the design outcome. It is time-limited and set up to produce an improved or radically new concept for a product, software, or service being marketed.

Other typical properties of a design team include its self-managing nature, i.e., design teams are responsible for defining the conceptual framework for the project and identifying objectives and methods for accomplishing their tasks. Their work is non-repetitive and involves considerable application of knowledge, judgment, and expertise (Cohen and Bailey, 1997; Mankin et al., 1996). Due to the cross-functional, multidisciplinary setup, members of a design team are often geographically distributed.

2.2.2 Models of Design Work

Several prescriptive models, i.e., models that define how a design process ought to proceed, have been suggested (Sect. 2.1). It is an often implicit (and untested) assumption of this research that if designers follow the prescribed process, better designs will result. Prescriptive models have limited value if they are not grounded in a reasonable body of empirical research (Tebay et al., 1984; Finger and Dixon, 1989).

Unlike their prescriptive counterparts, descriptive models aim to explain empirically how design processes really proceed. Based on formal observations and in-depth analysis of design activity, descriptive models define the design process from a behavioral standpoint, forming the basis for the development of new theories and hypotheses and for improving design practice. Work in this area acknowledges that design collaboration basically evolves through a process of social argumentation (Geisler and Rogers, 2000) and that the observation of designers as they engage in collaborative activities can yield a much more nuanced comprehension of how they work than any prescriptive model of a design process could.

Various approaches for observing and analyzing group design activity have been presented (e.g., Bessant, 1979; Tang and Leifer, 1991). Central to many of those group activity studies is the investigation of how designers communicate and interact with information. The following sections give a brief overview.

Information Handling in Design

A majority of design activities are associated with information handling tasks, i.e., tasks undertaken by designers which involve design information. Design information refers to all data that is generated, used, referred to, or consulted during the design process. Information is handled by means of computational and non-computational tools and exists in a variety of media and artifacts such as text, graphics, audio, video, technical drawings, and others. Designers generate and share information artifacts to communicate knowledge, insights, and proposals about the design state. Each artifact constitutes an explicit representation of facts, ideas, or emotions that are related to the design process.

Baya (1996) has developed an ‘Information Handling Framework’ to improve the understanding and support of information handling needs in design. The framework provides a multi-layered classification scheme for ‘information fragments’, including information activity, and allows the use of design information to be characterized at a fine-grained level. However, the work focuses exclusively on individual engineers working alone on design problems, so it cannot identify activities that would be characteristic of team activity. Milne (2005) extended the framework to account for activities that would be present in a group-user scenario. The ‘Team Handling Framework’ provides a coding scheme that can be applied to analyze activities of design teams during the early phases of conceptual design. A quantitative analysis of the design activity was conducted, based on a verbal protocol analysis of two design teams. However, these teams engaged in a co-located conceptual design activity, and the study did not consider computer-based or distributed collaboration scenarios.

Wiegers and Knoop (1998) have applied verbal protocol analysis combined with software techniques to capture and visualize information handling activities and to locate blockades and bottlenecks in conceptual design processes. Their diagrams depict process activities and information handling along a time axis, so that points of interest in the design process can be found by identifying fundamental information actions within its timeline. Unfortunately, their model does not reflect group interactions and focuses on individual design subjects only.

Information Spaces & Common Ground

Designers continuously mediate between private and public information spaces by traversing six stages of collaboration and information integration (Geisler et al., 1999). Members of a design team come together to share the results of their private work, to propose, discuss, and ratify next steps in the design process, to update their current understanding of the design work, and to disseminate the results of the collaborative conversation back to the private space. Through this repeated positioning of individual viewpoints within a community, process participants construct what is often denoted as ‘common ground’ (Clark, 1996) or ‘common information spaces’ (Bannon and Bødker, 1997), i.e., the source of conversants’ ability to coordinate and the set of knowledge, beliefs, and suppositions that they believe they share. Clark (1996) stresses that designers solve coordination problems in undertaking a joint activity by using conversational turns to display their understanding of the current state of activity; an understanding that other process participants may, in subsequent turns, either ratify or correct. Geisler and Rogers (2000) add that “through sequences of such conversational pairs, participants accumulate the common ground necessary to support common goals”.

The generation of information spaces and of common ground is directly affected by the way information is handled in a design team. Information spaces are a critical source of context and form the basis for decision making. Thus, it is critical in the empirical observation of a design process to keep a record of the activities and individual contributions to a team’s information space.

Collaborating in a Situated Context

While design information is often represented explicitly in shared artifacts, a considerable portion of knowledge remains tacit and implicit in the designers’ heads. These internal conceptions are influenced by previous experiences and determine how the design context is perceived. From this theoretical point of view, designing becomes a function of the internal and external representations of a designer, which in turn affects both of these worlds through the actions the designer takes. Thus, the context of design changes continuously through the recursive activities performed in a team. This concept of situatedness has been described by Schön (1992) as a reflective conversation with the materials of a design situation, which he sketches as a process of “seeing–moving–seeing” (see also Sect. 1.1, Clancey, 1997; Winograd, 1996, pp. 171).

Fischer et al. (1992) were among the first to describe the design process as a structured set of issues and responses to those issues. In their work, the authors link collaboration with individual work tasks and focus on the argumentative aspect of design. Following Schön’s understanding of situatedness, they introduce the concept of construction and argumentation, where construction is the process of shaping the solution (e.g., manipulating form) and argumentation is the process of reasoning about the problem and its solution. More recently, Gero and Kannengiesser (2004) presented a general model for situatedness that makes allowance for constant modifications of the design state, based on present knowledge and expectations of the designers. Design representations are distinguished by three semantic classes: function (what the object is for), behavior (what the object does), and structure (how the object is built) (Gero, 1990). The relationships between design representations and designers are expressed as interactions between the mental and physical environments of the design process. An extension to the situated function–behavior–structure framework has been proposed to integrate the notion of user needs into the model and to explicitly reflect core elements of user-centered software design processes (Uflacker and Zeier, 2008b).

Seeing design as a reflective, situated conversation in private and public information spaces reveals the importance of communicating and creating common ground in engineering teams. Failure to do so can negatively impact the design process and compromise its outcome (Layzell et al., 2000). The creation of common ground is a crucial and challenging task for design teams, and it is further complicated when team members are distributed across different locations or otherwise rely on technology-mediated interactions (Perry et al., 1999). The special requirements of virtually collaborating teams need thoughtful consideration and are the subject of the following section.

2.2.3 Virtual Collaboration in Design

ICT is the primary enabler for organizations to quickly adapt to new and ever-changing requirements in their competitive landscapes. Hence, computer-supported and distributed collaboration has become a common phenomenon in engineering design (Jarvenpaa and Ives, 1994). While some authors claim that distance may no longer be a limiting factor in a group’s ability to communicate and is quickly becoming irrelevant (e.g., Cairncross, 2001), many researchers have found that the nature of technology-enabled interactions differs in a number of important ways from face-to-face interactions, concluding that distance still matters (Olson and Olson, 2000). Group members who use computer-based tools to coordinate their group effort are more likely to face obstacles in the collaboration process (Driskell et al., 2003). The following sections explore some of the reasons in more detail.

Virtual Teams

The effects of global collaboration in design are often discussed in research. One notion that is frequently used in this context is that of virtual teams. Driskell et al. (2003) give a short but precise definition of a virtual team:

Definition (2.3): A virtual team is a group of collaborating individuals whose members are mediated by time, distance, or technology.

Because of the distributed nature of their work unit, members of a virtual team are brought together by information and communication technologies to accomplish one or more common tasks. Hence, the distinctive feature of a virtual team is that people rely frequently – and sometimes exclusively – on ICT systems to communicate and work together while they are often dispersed across space, time, and/or organizational boundaries (Powell et al., 2004; DeSanctis and Poole, 1994; Lurey and Raisinghani, 2001). As a result of these characteristics, virtual teams are more likely to suffer from problems of information distribution (Cramton, 1997), face difficulties in creating and maintaining good working relationships (Johnson, 1999), and are more likely to have problems in developing mutual trust (Jarvenpaa and Leidner, 1999).

Virtual teams collaborate by means of groupware systems (Sect. 2.3), which establish different kinds of communication channels to support team members in their information handling tasks. Studying how these systems are utilized by virtual teams in distributed and co-located team settings is a relatively new subject in design research. Olson and Teasley (1996) criticize empirical observations that target only one specific collaboration tool, such as email, video conferencing, or workflow systems, concluding that it is necessary to study teams with a full set of collaborative tools to cover all modes of work.

Digital Information Spaces

While the information spaces of traditional co-located engineering teams are formed by physical artifacts and design representations, virtual teams create information spaces that increasingly consist of digital communication and information artifacts. With the proliferation of virtual collaboration, the problem of achieving “common ground” has received new attention and influenced the development of collaboration tools to support the organization of digital information spaces. Schmidt and Bannon (1992) argue that the construction and management of common information spaces has been somewhat neglected in tool implementations, despite its critical importance for the accomplishment of many distributed work activities.

Early work investigated how people in a distributed setting can work cooperatively in a digital information space by maintaining a central archive of information with some level of shared agreement. Such archives are generally constituted and maintained by different actors, who employ different conceptualizations and multiple decision making strategies, supported by technology. However, digital information spaces do not simply consist of objects and events, but also crucially involve the joint interpretation of these objects and events by the actors involved: “Cooperative work is not facilitated simply by the provision of a shared database, but requires the active construction by the participants of a common information space where the meanings of the shared objects are debated and resolved” (ibid.). Accordingly, Fischer et al. (1992) suggest that virtual spaces need to provide for the integration of private and public work.

The properties of “virtual” objects shared in a digital information space have been investigated by Geisler and Rogers (2000), who identified three characteristics shared by such artifacts. They are mutable, meaning their functions and features change during the development process; they are translucent, i.e., the knowledge of their functions and features is unevenly distributed across the project team; and they are considered under construction, i.e., taken by participants to be modifiable rather than done.

Considering the diversity of prevailing collaboration tools and platforms, it becomes apparent that the problem of organizing and sharing common information spaces is still relevant in modern collaboration landscapes. Digital information spaces are clustered and distributed across different team members, formats, and communication channels. Communicating and coordinating along these heterogeneous sources of information is therefore a critical component in the work of virtual teams that must be better understood.

Communication & Coordination Challenges

Communication and coordination defects in virtual teams are one of the major issues to impact team performance (Powell et al., 2004). Commu- nication comprises meetings, email conversations, file exchanges, etc., that teams use to share information, negotiate their goals, and make decisions. Coordination is mediated through explicit messages sent at definite times to specified recipients, or by creating shared information spaces contain- ing versions of the design artifact, argumentation structures, and design rationale (Fischer et al., 1992). The challenges to effective communication and coordination are manifold. They include the failure to retain contex- tual information, unevenly distributed information, time delays in sending feedback, disagreement in the salience of information, differences in speed of access to information, and difficulty interpreting the participation of re- mote team members (Johansson et al., 1999; Lurey and Raisinghani, 2001; Cramton, 2001; McGrath and Hollingshead, 1994). Compared with their face-to-face counterparts, computer-mediated teams viewed their discus- sions as more confusing and less satisfying, spent more time devising deci- sions, and felt less content with their outcomes (Thompson and Coovert, 2003). Furthermore, dispersed members often assume that co-located team members are talking and sharing information that is not communicated to them (Sarker and Sahay, 2002). The lack of social context and mutual knowledge in virtual teams is a fundamental, overarching problem created by text-based and other online communication channels (Cramton, 2001; Steinfield et al., 2002). Nonver- bal communication, an important component of team communication, is usually reduced in virtual teams because electronic media is intrinsically leaner than face-to-face communication and conveys a limited set of com- munication cues (Sproull and Kiesler, 1986, 1992). 
Thus, teams operating in a virtual environment face greater obstacles to information exchange than their traditional counterparts, especially when the virtual team is physically distributed (Hightower et al., 1998; McDonough et al., 2001).

Empirical studies of how communication and coordination are quantitatively associated with team performance are relatively rare. Fussell et al. (1998) found that how much teams communicated, what they communicated about, and the technologies they used to communicate predicted coordination, which in turn predicted team success. Powell et al. (2004) mention similar studies, which suggest that the frequency and predictability of communication and the extent to which feedback is provided on a regular basis improve communication effectiveness, leading to higher trust and performance (e.g., Jarvenpaa and Leidner, 1999; Maznevski and Chudoba, 2000). Conversely, unpredictable communication patterns have been found to undermine the coordination and success of virtual teams (Johansson et al., 1999; Cramton, 2001). With respect to the extent of communication, virtual teams have been found to communicate more frequently than traditional teams (Eveland and Bikson, 1988; Galegher and Kraut, 1994).

2.2.4 Conclusions Drawn From Review

At the core of any design process is communication. Monitoring the work of design teams means observing argumentative and multidisciplinary interactions that take place in private and public information spaces. The basis and result of these interactions is information in the form of resources that encode facts, ideas, or emotions relevant to the process participants. The virtualization of collaboration processes through computer-based tools implies that the work of design teams is reflected in the way digital information resources are created, shared, and involved in the communication process. This motivates a computational monitoring instrument to simplify and to extend the scope of empirical design studies. The automated creation of a descriptive representation of team activities, distributed design work, and temporal aspects of online information handling creates new opportunities for researching performance-relevant collaboration patterns on a larger scale. Distributed team structures and virtual collaboration environments call for a scalable and flexible approach to observe design interactions on multiple communication channels.

2.3 CSCW and Groupware in Conceptual Design

This section highlights existing technology and tools to support virtual design teams in the coordination and communication of their efforts. It outlines the evolution of design support systems and collaboration platforms up to the point of Internet- and Web-based technologies. The review gives an overall picture of computer support systems in conceptual design and works out the requirements for a generic monitoring instrument to observe the utilization of these systems in engineering teams.

2.3.1 Basics of CSCW in Design

The notion of CSCW was coined by Greif (1988) and others in the 1980s to describe computer-assisted coordinated activity that is carried out by groups of collaborating individuals (Baecker et al., 1995, p. 741). According to Jabi (2003), the basic principles of CSCW can be traced back to the work of Engelbart (1962), who envisions that computers could help a design team to solve problems and make decisions. Engelbart states that “there proves to be a really phenomenal boost in group effectiveness over any previous form of cooperation” and argues: “The whole team can join forces at a moment’s notice to ‘pull together’ on some stubborn little problem, or to make group decision.” Shortly after, Coons (1963) outlines requirements for a computer-aided design system: “The Computer-Aided Design System should be capable of carrying on conversations with, and performing computations for several designers at several consoles substantially all at once. In this way each designer can be immediately aware of what the other designers are doing, and thus avoid one of the truly severe problems of intercommunication that designers face today.” Evidently, achieving common ground and mutual knowledge was identified early as a subject of critical importance in CSCW research.

The pioneering visions of Engelbart and Coons set the tone of the research agenda for the next forty years (Jabi, 2003). However, their oversimplified assumptions about computer-supported work could not withstand the challenges that reality brings: “Engelbart believes that merging different solutions is an easy task and resolving conflicts can occur naturally. Similarly, Coons falsely believes that synchronicity and social awareness are the only needed features in addressing the problems that designers face” (ibid.).
It took more than two decades before CSCW was understood as a multi-disciplinary research discipline, which needs to combine “the understanding of the way people work in groups with the enabling technologies of computer networking and associated hardware, software, services and techniques” (Wilson, 1991). Around that time, the term groupware was coined. The following definition for groupware has been adopted from Ellis et al. (1991):

Definition (2.4): Groupware denotes computer-based systems that support groups of people engaged in a common task or goal and that provide an interface to a shared environment.

Early research on groupware systems focused on capturing design rationale and organizational memory to support decision making. Prominent work in this field includes experimental software systems such as gIBIS (Conklin and Begeman, 1989) or Answer Garden (Ackerman and Malone, 1990). A second stream of activities aimed at the visibility of concurrent design activities over distance in time or space (e.g., Ishii, 1990). However, most of the early research-driven efforts to leverage information systems in cooperative design practice have consistently fallen short of expectations. Grudin (1988) identified low benefits of use and large overhead introduced to the collaboration process as common challenges for the development of CSCW applications.

Table 2.3. Time and space-based views of CSCW technologies (Baecker et al., 1995, p. 742).

Same Time, Same Place: Face to Face Interactions
• Public computer displays
• Electronic meeting rooms
• Group decision support

Same Time, Different Places: Remote Interactions
• Shared desktop systems
• Video conferencing
• Media spaces

Different Times, Same Place: Ongoing Tasks
• Team rooms
• Group displays
• Shift work groupware
• Project management

Different Times, Different Places: Communication & Coordination
• Email
• Bulletin boards
• Structured messaging systems
• Workflow management
• Cooperative

2.3.2 Synchronous & Asynchronous Groupware

With the beginning of the 1990s, the maturing and convergence of telecommunications and personal computing technology led to an exploding number of groupware solutions. Especially in the field of cooperative design, researchers and practitioners started early to adopt and experiment with groupware (Schmidt, 1998). In fact, the idea of leveraging technological apparatus to support collaborative design dates back to the early days of computers and is considered “an inherent human objective” (Kvan, 2000; Vosinakis et al., 2008). To categorize the different research streams in CSCW, DeSanctis and Gallupe (1987) presented a typology of group support systems, subsequently refined by Johansen (1988) to become the well-known 2-by-2 matrix as seen in Table 2.3. Groupware applications are structured in terms of their ability to alleviate temporal and geographical distance. Group members may work synchronously (same time) in face-to-face meetings or remotely across multiple meeting sites. Asynchronous collaboration (different times) takes place on-site or across different floors, buildings, cities, or continents. Although a number of extensions for this taxonomy have been proposed over the years (e.g., Nunamaker et al., 1991; Grudin, 1994), this basic differentiation on a time and space dimension presents a suitable schema for groupware categorization. However, as systems become more functional and grow in complexity, the classification of groupware is often ambiguous and cannot be based on a single category.
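
The time and space taxonomy, and the ambiguity that arises for functionally rich systems, can be illustrated with a small sketch. It is only an illustration under stated assumptions: the names `Time`, `Place`, `TOOL_QUADRANTS`, `quadrants_of`, and `is_ambiguous` are hypothetical, the first two tool entries follow Table 2.3, and the "integrated platform" entry is invented to show a tool that does not fit a single category.

```python
from enum import Enum


class Time(Enum):
    SAME = "same time"
    DIFFERENT = "different times"


class Place(Enum):
    SAME = "same place"
    DIFFERENT = "different places"


# Each tool maps to the set of (time, place) quadrants it supports.
# The first two entries follow Table 2.3; the last one is a hypothetical
# integrated platform, added only to illustrate ambiguous classification.
TOOL_QUADRANTS = {
    "video conferencing": {(Time.SAME, Place.DIFFERENT)},
    "email": {(Time.DIFFERENT, Place.DIFFERENT)},
    "integrated platform (hypothetical)": {
        (Time.SAME, Place.DIFFERENT),
        (Time.DIFFERENT, Place.DIFFERENT),
    },
}


def quadrants_of(tool):
    """Return the set of quadrants a tool is assigned to (empty if unknown)."""
    return TOOL_QUADRANTS.get(tool, set())


def is_ambiguous(tool):
    """A tool is ambiguous if it falls into more than one quadrant."""
    return len(quadrants_of(tool)) > 1
```

Mapping each tool to a set of quadrants rather than to a single quadrant is one possible way to keep the 2-by-2 schema while admitting that an integrated system may belong to several cells at once.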

Synchronous Collaboration

Synchronous groupware assists a group of individuals in working together at the same time. Electronic meeting rooms or remote interactions via shared desktop or video conferencing systems are examples of ICT-enabled synchronous collaboration. A central theme in computer-mediated synchronous collaboration is ‘WYSIWIS’ (What You See Is What I See), an “idealization of multiuser interfaces in which everyone sees exactly the same image of the shared meeting workspace and can see where everyone else is pointing” (Baecker et al., 1995, p. 745). A pioneering project in this field is the ‘Colab’ system, an experimental meeting room designed at Xerox PARC to support collaborative brainstorming, argument development, and freestyle sketching in face-to-face meetings (Stefik et al., 1987). The WYSIWIS paradigm quickly led to various research efforts in the field of ‘desktop conferencing’. Desktop conferencing describes real-time, computer-based conferences in which users may share data through their personal computers (Ahuja et al., 1990). One early example is the ‘Rapport’ system built at AT&T Bell Laboratories, a “multimedia conferencing system that allows a group of people, using the computers and phones in their offices, to hold real-time discussions sharing voice, data, and images” (Ahuja et al., 1988).

Systems such as Colab and Rapport represent the starting points for synchronous collaboration activities mediated over computer networks. With the invention of wide area networks and the Internet, the usage of those systems was no longer restricted to a single room or building. Synchronous remote collaboration became technically feasible and dispersed work groups, who were interacting across longer distances, became commonplace in organizations.
However, research-driven progress in electronic meeting support did not meet expectations because of shortcomings with available technology, poor integration, and incomplete understanding of the nature of group decision making (Kraemer and King, 1988). In the meantime, lightweight general-purpose utilities found their way into distributed workspaces, supporting ad-hoc communication between two or more participants via text-based instant messaging or audio/video conferencing. Today, instant messaging, video telephony, or desktop sharing applications are standard tools on personal computers, workstations, and in the daily business of virtual teams (Shiu and Lenhart, 2004).

Asynchronous Collaboration

Asynchronous groupware supports communication and problem solving among groups of individuals who contribute at different times (Baecker et al., 1995, p. 743). Dennis and Valacich (1999) have elaborated several advantageous properties that are linked to this type of groupware. Asynchronous communication allows team members more time to fine-tune or edit messages in order to establish the reasoning behind them. The communication process is staggered and can proceed independently from the availability and individual schedule of communication partners. Participants can post or reply to shared information when they have time to deal with it, making it a convenient tool for teams distributed across time zones. Parallelism in the course of communication allows for the simultaneous input of information, which mitigates blocks in the collaboration process. Finally, messages can be reexamined and processed again later in the process. Thus, asynchronous groupware facilitates the creation of an electronic team memory and shared information spaces (Schmidt et al., 2001).

The most successful asynchronous coordination tool to date is electronic mail. Between 1998 and 2006, the number of email mailboxes grew from 253 million to nearly 1.6 billion, and it continues to grow (Gantz et al., 2007). Email is fast, easy to use, can address one or multiple receivers, and incorporates file transfer of, e.g., text, images, audio, and video content. 77% of email workers say email helps them keep up with events at work and 63% find email more effective than using the phone or talking in person for making arrangements and appointments (Fallows, 2002). Loftus et al. (2008) report on a recent study in which 35% of participating engineers spend on average over two hours a day reading and answering emails. Perry et al. (1999) observed that email was used by distributed design teams to communicate non-urgent messages, allowing the sender to manage their time resources more flexibly. They report that “email was used as a means of distributing collaborative work over time so that shared work could be carried out when convenient to its recipients. Email could be tightly targeted at particular people and did not take up ‘group time’, which was a valuable commodity”. In other cases, email was also used as a means of reminding the others to perform tasks at the appropriate time. “This function of email as a ‘demon’ was used by the team whenever they updated information in their group space that they felt the others would need to know about” (ibid.).

Several extensions to electronic mail have been proposed, including Winograd’s work on the theory of conversation (Winograd, 1986) and Malone’s semistructured messaging systems (Malone et al., 1987). Structural and semantic frameworks (e.g., McDowell et al., 2004), and other methods of improvement imposed on email systems, were expected to bring additional benefits in the coordination of processes and information (Fischer et al., 1992). However, most of these systems failed to receive broader attention in practice.

2.3.3 Hypermedia & Web-based Collaboration Platforms

Hypermedia systems have been recognized in engineering design even before the World Wide Web became popular (e.g., Conklin and Begeman, 1987; McCall et al., 1990). The value that hypermedia brings to engineering teams essentially results from two inherent characteristics: (a) the multiplicity of connections between media fragments as opposed to the linear structure of traditional text, and (b) the availability of media other than text (Fischer et al., 1992). Hyperlinked documents establish relationships between explanatory, elaborative, and other correlated information, creating multimedia information spaces beyond the text-based level. Graphics, animation, and sound are more effective than text in conveying certain kinds of information such as two- and three-dimensional spatial relationships as well as processes, behaviors, and the evolution of a designed product (ibid.). However, it was not until the vast proliferation of the World Wide Web that this concept broadly influenced design collaboration. The Web has simplified the platform-independent integration, combination, and dissemination of distributed information resources. With the new century approaching, the Internet became a unique infrastructure for resource integration, data sharing, and design collaboration and constitutes a designer’s reference library (Wang et al., 2002). Distributed design processes have been physically enabled by the Internet and are functionally supported by numerous types of applications and services provided on the Web.

Wang et al. (2002) remark that mature groupware has found its way especially into areas such as simulation, analysis, and optimization. Later-stage engineering activities such as detailed design and production receive intensive tool support, for example through professional computer-aided design (CAD) solutions.
But relatively few applications exist at the conceptual design stage, where the impact of decisions is still high (Fig. 2.5). Many approaches to support the early-stage conceptual design process failed to gain a foothold, “because knowledge of the design requirements and constraints during this early phase of a product’s lifecycle is usually imprecise and incomplete, making it difficult to utilise computer-based systems or prototypes” (ibid.).

Figure 2.5. Opportunity for groupware in early design stages (Wang et al., 2002). Tools to support collaboration groups in the conceptual design phase are scarce.

Leifer et al. (2002) stress the importance of user-editable Web page systems in conceptual engineering design. Early systems such as Sparrows (Chang, 1998) and the subsequent popular Wiki applications lower the barrier to entry for Web content producers by providing means to contribute and edit content directly in a Web browser. Chen et al. (2005) have highlighted Weblogs (blogs) and Wikis as valuable knowledge management and group communication tools in engineering communities. They point out that “the rapid rise of interest in software to support group interaction can be attributed to an emerging Web-based platform based on blogs, Wikis, and RSS feeds, on ease of use, and on the ubiquity of web access. In the professional and personal worlds, social interactions increasingly occur and move fluidly between virtual and face-to-face environments”. Standardized formats and protocols further promoted this trend and led to a richness of convenient, interactive, and interconnected online services collectively termed the ‘Web 2.0’ (O’Reilly, 2005).

Several other tools and frameworks to support conceptual design have been presented (for a comprehensive review see Wang et al., 2002). The variety of work modes in engineering collaboration led to a trend towards functionally rich platforms that integrate synchronous and asynchronous tool sets under one user interface and administration. One example is BSCW¹ (Basic Support for Cooperative Work). BSCW was the first fully Web-based integrated groupware system without need for special client software (Appelt, 1999). It is a shared workspace system, where authenticated members own and manage hierarchically structured document repositories stored on a central server. BSCW features distributed document sharing, group discussions, workflow management, mail lists, polls, and awareness mechanisms such as a daily report, information on currently online members, or history of workspace events (Fischer, 2007).
Over the years a large number of such versatile solutions have entered the market, oftentimes referred to as group decision support systems. More recent examples include enterprise collaboration platforms such as WebSpace² (formerly ipTeam), a content management system offering a suite of tools for supporting collaborative product development, and SAP StreamWork³, a flexible and modular Web application for project planning and management. Microsoft’s product portfolio features several solutions to support asynchronous collaboration and information handling in virtual teams. SharePoint⁴, e.g., is an application that provides a Web-compatible interface for sharing, editing, annotating, and searching documents and information in a central project repository. It is extensible, can be integrated with other Microsoft products, and has basic support for workflow management.

Systems such as BSCW and other tools for virtual collaboration management were the precursors of a special branch of groupware solutions that specialize in the improvement and support of software engineering processes.

¹ http://www.bscw.de/
² http://www.nexprise.com/
³ http://www.sapstreamwork.com
⁴ http://sharepoint.microsoft.com

2.3.4 Application Lifecycle Management Platforms

The term Application Lifecycle Management (ALM) denotes the coordination of activities to produce software applications, i.e., the management of their development, deployment, and maintenance phases. An ALM solution is the integration (or connection) of different lifecycle tools to support this process (Schwaber et al., 2006). Several such solutions, commercial and non-commercial, are available to provide teams with a usually exhaustive number of integrated lifecycle components, e.g., for project management, modeling and design, requirements analysis, change and configuration management, testing, and deployment. More recent ALM tools also increasingly support the monitoring of software development metrics.

A few examples of ALM solutions shall be briefly mentioned. TeamForge⁵ offers project management and collaboration support for software development teams, integrating issue tracker tools, source code version control, discussion forums, Wikis, release management, project dashboards, and much more in a Web-based interface. Microsoft’s Team Foundation Server⁶ (TFS) provides a rich ALM platform designed to support large software development teams. It offers typical application lifecycle management functionality and allows version control, build management, and other tools to be combined in an integrated development environment. TFS also supports the automated creation of reports to inform stakeholders, developers, and managers about software metrics, issue tracking, or the status of the project in general. Borland Management Solutions⁷ connect a modular and extensible ALM tool infrastructure on top of an open ALM platform. An integrated data warehouse allows the generation of reports to analyze the software development process from different angles.

ALM solutions do not focus only on the early project stages of conceptual design activities, but aim to support the software development and delivery process from a holistic standpoint.
Hence, the monitoring capabilities provided by some of the solutions primarily address the observation and analysis of typical software development metrics usually concerned with source code measures, requirement coverage, performance checks, etc. Social factors, such as the communication and interactions between involved stakeholders, are usually not considered or captured.

⁵ http://www.collab.net/
⁶ http://www.microsoft.com/visualstudio/en-us/products/2010-editions/team-foundation-server
⁷ http://www.borland.com/de/solutions/software-delivery-management/index.html

2.3.5 Conclusions Drawn From Review

Having come a long way from the early days of the Internet and before, CSCW in design has reached the World Wide Web in an era where the handling and exchange of multimedia documents, messages, social bonds, expert knowledge, and community-generated content is omnipresent. The Web has become a universal platform for versatile groupware applications and for synchronous, asynchronous, remote, and co-located design collaboration. While it can be assumed that the Web will retain its central role as a technological platform for virtual collaboration, it is apparent that groupware applications will continue to evolve and improve. Functionally comprehensive, yet domain-specific collaboration platforms such as TFS or TeamForge can cover most of the basic information handling needs of specific virtual teams. But especially in the early stages of idea generation and conceptualization in cross-disciplinary projects, large integrated toolsets may prove to be too inflexible and are likely to be replaced or temporarily complemented by other, more appropriate services. Monitoring virtual collaboration activities therefore requires considering heterogeneous groupware landscapes and calls for an external instrument that is largely independent of the utilized tools and platforms. The resource-oriented client-server architecture of the Web represents the common denominator of modern groupware solutions and forms a stable basis for the design of such an instrument.

2.4 Instruments for Virtual Collaboration Monitoring

The final section of this chapter summarizes work that was conducted on software instruments to capture, monitor, and analyze collaboration activities in engineering design teams. The purpose of this review is to learn from previous approaches and to understand basic needs in the instrumentation of design processes. A set of system requirements for a virtual collaboration monitoring instrument is derived from the reviewed literature and used to differentiate this work from previous research.

The review considers work that comprises software instruments to capture data about the communication and coordination behavior in a collaboration process with the goal of documenting and studying this process and/or supporting the work of the collaborating group. Particular attention is given to those approaches that focus on the application in engineering design processes. Three different categories of instruments are distinguished: 1) those that concentrate on information artifacts and the temporal and semantic relationships between those artifacts, 2) those that concentrate on the actors and participants in the collaboration process and their relationships that evolve through communication, and 3) those that apply a combined view that addresses both the information artifacts and the involved stakeholders. Existing work from each of these three categories is presented in the following sections.

2.4.1 Monitoring of Information Artifacts

The tracking of documents and other information resources created in a design process has a rather long history and represents the beginning of computational monitoring instruments in this field. In general, the goal of such instruments is to capture syntactic or semantic relationships between artifacts that have been created by the designers or the design tools during the course of a design process. The information is used, e.g., to retrace the design evolution and rationale, or to make the structural and temporal dependencies between relevant information resources more transparent.

One of the first approaches to supervise the use of design tools is a system presented by Di Janni (1986). The ‘Monitor’ system is based on an extended Petri Net model and handles the interactions among a set of tools for designing integrated circuits. The places of the Petri Net represent input and output files and the transitions represent tools. The focus of the system is on design automation. It aims to support users in the execution of tool chains and to provide updated documentation for individual subparts of the development process. The rigidity of the underlying Petri Net model and the absence of a mechanism to maintain a history of the design are major deficiencies of this approach.

Casotto et al. (1990) presented an automatic design manager (ADM) based on design traces. A design trace is a directed and acyclic graph, in which nodes represent either design data or CAD transactions. The trace is a syntactic and historical model of the design activity that is built automatically and non-intrusively by the design tools. A client-server architecture supports the integration of tools that inform the server about their execution and other programs that query the server for information about design activity.
Despite many innovative features, the ADM system generates a rather abstract representation of transactions initiated from a UNIX shell that use some data objects as inputs and produce other data objects as outputs. Non-linear and parallel activities in team-based design processes, as understood in this work, cannot be addressed by this solution.

SHARE (Toye et al., 1994) is an open, heterogeneous, network-oriented environment for concurrent engineering, particularly for design information and data capturing and sharing through asynchronous communication. SHARE enables engineers to participate in a distributed team, allowing them to achieve a shared understanding of their processes and artifacts using email, Web-based tools, and agent-based services.

A method to visualize the progress of engineering design processes has been developed by Wiegers and Knoop (1998). The process is depicted as a series of activity boxes and information events that are aligned along a time line. Information events are either requests for information, answers to requests (information inputs), or results produced by the designer (information outputs). While the visualizations are generated by a software tool, the approach relies on data that is manually assembled during the observation. The presented information handling process model is simplistic and does not reflect complex relationships in realistic, team-based design landscapes.

Yen (2000) has analyzed the relationship between verbal communication and sketch activity in conceptual design with a tool called ‘Recall’. The system has been applied in a number of short-term design sessions to simultaneously capture audio and video along with sketching activity on a digital whiteboard.
His analysis shows that sketch activities serve as a precise and relevant index into the conversation of a design session, suggesting that a time-stamped sketch archive might be an effective technique for retrieving design information.

Lim and Sato (2001) developed a Design Information Framework (DIF) to support work among multi-disciplinary team members during the design of interactive software systems. DIF establishes an evolutionary ontology for design information consisting of general and project-specific elements. Based on their work, Jung et al. (2005) describe a knowledge management system that organizes distributed multi-media information resources to facilitate knowledge sharing and reuse in the creation of user scenarios. The system focuses on automating the evaluation of different design aspects and does not consider design history, nor does it capture the process of designing itself.

2.4.2 Monitoring of Process Participants

The idea to track the activities of individual actors in the design process is relatively young. Most of the existing work has been influenced by the growing research interest in the area of social networks and social network analysis. A common goal of these studies is to identify relationships between actors in the design process that result from interactions or information that has been exchanged between two or more participants. The resulting structural properties of a team can be used to analyze, e.g., the impact of social factors, team setup, or specific communication patterns on the design process.

Wong and Burton (2000) presented a comprehensive study of the characteristics of virtual teams and the impact of distributed collaboration on team performance. Using a discrete event simulation model, the authors simulated different virtual team models and examined their impact on various team performance dimensions. Task structures and the description of actors were the input variables for the simulation. Their results describe different effects that virtual team characteristics can have on team performance.

Bird et al. (2006) studied the relationship of communication and coordination activities of open-source software developers, as revealed in the email archives, to their software development activities documented in the source code repository log files. By collecting data from the Apache HTTP server project, the authors applied social network measures to identify significant roles and actors in the generated communication structures. The work does not go into details about the underlying data model and the applicability to other projects and groupware.

‘TeCFlow’ (Gloor and Zhao, 2004) is a tool to generate interactive movies of communication flows among individuals by mining email log files.
It visualizes the social bonds between email senders and receivers in a graph and allows the evolution of this graph to be explored over time. The TeCFlow system has been applied in open-source communities and different student groups to identify correlations between the temporal communication patterns and different measures of team performance (Kidane and Gloor, 2007; Gloor et al., 2008). The ‘iQuest’ system (Gloor and Zhao, 2006) extends the TeCFlow concept by adding support for multiple data sources and functionality to explore related communication resources based on term frequency.
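
The kind of structural measure that such participant-centered instruments derive from email traffic can be illustrated with a minimal sketch. It is not the method of Bird et al. or of TeCFlow; it only assumes a hypothetical log of (sender, recipients) pairs and computes a normalized degree centrality per actor. All names and data are illustrative.

```python
from collections import defaultdict

# Hypothetical email log: (sender, [recipients]); names are illustrative only.
email_log = [
    ("alice", ["bob", "carol"]),
    ("bob", ["alice"]),
    ("carol", ["alice"]),
    ("dave", ["alice"]),
]

def build_ties(log):
    """Return an undirected tie map: actor -> set of actors they exchanged mail with."""
    ties = defaultdict(set)
    for sender, recipients in log:
        for recipient in recipients:
            if recipient != sender:
                ties[sender].add(recipient)
                ties[recipient].add(sender)
    return ties

def degree_centrality(ties):
    """Normalized degree centrality: distinct contacts divided by (n - 1)."""
    n = len(ties)
    if n < 2:
        return {actor: 0.0 for actor in ties}
    return {actor: len(contacts) / (n - 1) for actor, contacts in ties.items()}

ties = build_ties(email_log)
centrality = degree_centrality(ties)
most_central = max(centrality, key=centrality.get)
```

In this toy log every other actor has exchanged mail with `alice`, so she receives the highest centrality. Real instruments would additionally have to resolve aliases and account for the temporal order of messages.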

2.4.3 Combined Monitoring of Information and Participants

Few approaches exist that monitor the design process in terms of the information resources and the actors that are involved in the handling of those resources. The benefit of such a holistic view on the design process is that it combines the analytical potential of the two previous categories. By recording the relationships between captured activities, information resources, and the participants involved, the instruments are able to establish a detailed description of the collaboration process.

Ramesh and Tiwana (1999) have developed a system to support knowledge management tasks in collaborative new product development. The system allows a group of distributed users to co-create a semantic network of design concepts, issues, alternatives, assumptions, or any other ontological component for classifying their design knowledge. Web resources can be linked to the nodes in the network to provide supporting documents. Basic functionality for semantic integrity checks and deductive rules is provided. Using this facility, team members conduct conversations via a graphical interface to generate a structured representation in terms of the primitives specified in the ontology. The users can communicate their viewpoints and expertise and map their views of the problem with those of others. How the generation of these models can be automated and linked to other communication channels is not reported.

Milne (2005) presented an information-theoretic approach to study the use of ubiquitous computing workspaces in distributed engineering design teams. The study is based on a groupware system called ‘GroupBoard’, which supports distributed engineering design teams in synchronous collaboration tasks. GroupBoard integrates different communication channels such as audio, video, sketching, text messaging, and application sharing in multiple physical workspaces.
In the analysis, event messages that are exchanged between connected workspaces to synchronize remote interactions are captured to provide a high-resolution record of pen strokes, keyboard entries, etc., per individual participant. Thus, the instrument is well suited to efficiently examine how the nature of the developed collaboration tool influenced design activities in detail. The exploration of long-term conceptual design processes, in which asynchronous information handling on different communication channels considerably determines the tenor of communication in virtual teams, has not been addressed.

2.4.4 Derivation of System Requirements

From the above review of conceptual engineering design, information handling, and virtual collaboration in design teams, it is apparent that the monitoring of groupware use ‘in the field’ is a complex objective to achieve. Due to the number of activities to observe at multiple sites, the wide variability that may be found in tools and group composition, and the range of environmental factors, the demands for an applicable monitoring system are challenging. Based on the reviewed literature and the research objectives defined before (Sect. 1.4), this section lists five basic properties P1 – P5 that have been identified to be relevant in the context of observing virtual collaboration processes. Hence, the properties define important requirements for the monitoring system presented in this work.

P1: Extensibility in Terms of Data Collection. The software landscape in creative collaboration processes like engineering design will remain heterogeneous. The unstructured and varying nature of team interactions in the early stages of design demands diverse sets of support tools with different strengths and characteristics. Extensibility with regard to data collection ensures that a monitoring system can be easily configured to incorporate new types of communication channels and groupware activities without the need to modify the underlying structure of the system. This property reinforces the need for a flexible and generic data model to consolidate different collaboration activities, and for a uniform interface to access and store information about heterogeneous communication artifacts.

P2: Extensibility in Terms of Data Analysis. Extensibility in terms of data analysis provides for the possibility to implement custom procedures for the evaluation of the collaboration activities being recorded by the instrument. The analytical extensions may comprise, e.g., new forms of visualizations, statistical evaluations, or collaboration dashboards. To allow for a variety of analytical procedures to be applied, the monitoring system must provide an appropriate interface to a canonical data model through which attributes and patterns of the monitored collaboration process can be queried.

P3: Automated, Non-Interfering Data Collection. The automated collection of collaboration data should not interfere with or hinder the work of the observed process participants. Hence, it must not rely on or be manually triggered by explicit actions being introduced into the collaboration process. Any additional efforts required will interfere with the natural behavior of the process participants and lead to changed behavior and a low acceptance of the system.

P4: Real-Time Analysis. The term ‘monitoring’ denotes the inspection of the current conditions of a system under observation. Hence, the monitoring of team collaboration requires the immediate processing of collaboration activities in order to allow for real-time inspection and analysis routines.
Only if the system provides an up-to-date view on the status quo of the collaboration process can researchers and practitioners respond to a given situation in a timely manner. Although the utilization for the purpose of guidance and team support is not explicitly addressed in this research, the basic capabilities for real-time process diagnoses need to be provided by a monitoring system.

P5: History & Backtracking Support. Cooperative work in engineering design is situated in social and historical activities, which are influenced by former practice, experiences, and information shared in the team. The study of groupware use in engineering projects should therefore be able to extend over a longer period of time, in order to address temporal and causal dependencies in the analysis. For this reason, a monitoring instrument should be able to reproduce previous states of the communication structures to allow for the backtracking of collaboration activities and the exploration of trends and prior conditions.
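
What property P5 asks for can be made concrete with a small sketch: if every captured relationship carries the time at which it was established (and, optionally, removed), any earlier state of the communication structure can be reproduced by filtering. The record type, the integer timestamps, and all names below are assumptions made for illustration only; they do not describe the data model developed in this work.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Relationship:
    """A time-stamped link between two entities (hypothetical record type)."""
    source: str
    target: str
    kind: str
    created_at: int                    # e.g. days since project start (assumed unit)
    removed_at: Optional[int] = None   # None means the relationship still exists

def snapshot(history: List[Relationship], t: int) -> List[Relationship]:
    """Return the relationships that existed at time t."""
    return [
        r for r in history
        if r.created_at <= t and (r.removed_at is None or t < r.removed_at)
    ]

history = [
    Relationship("alice", "spec.doc", "created", created_at=1),
    Relationship("bob", "spec.doc", "commented", created_at=5),
    Relationship("alice", "sketch.png", "shared", created_at=3, removed_at=8),
]

state_day_4 = snapshot(history, 4)  # before bob's comment, sketch still shared
state_day_9 = snapshot(history, 9)  # after the sketch was withdrawn
```

Because records are never overwritten, the same history serves both an up-to-date view (P4) and the backtracking of prior conditions (P5).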

The following section summarizes the shortcomings of current solutions and argues that meeting all of the above properties marks an important step towards a better instrumentation of virtual collaboration processes.

2.4.5 Moving Beyond the Existing Literature

New tools are required to advance design research in the age of the Internet and the World Wide Web. A flexible approach is needed, which lowers the technical monitoring barriers for researchers and practitioners by being applicable in various collaboration environments. Only by decoupling the data collection process from the data modeling and analysis can a reusable observation platform be created that is able to adapt to different scenarios at a time and which is not restricted to a specific CSCW environment. The five general system properties listed above set the framework for the design of such a monitoring instrument that is extensible in terms of data collection and analysis, non-interfering, and essential for the evaluation of virtual collaboration processes. Table 2.4 gives a subjective summary of how well these properties are met by existing monitoring approaches presented in the reviewed research work.

The table shows that some of the more recent approaches are moving in a promising direction. Especially the work of Milne (2005) has set a new benchmark for a versatile monitoring instrument that makes it possible to observe how teams interact with synchronous groupware on a fine-grained level. Unfortunately, the instrument is tightly integrated into a prescribed solution for “shared desktop” design sessions and seems not to work well with the long-term, asynchronous collaboration structures of email and Web-based groupware systems.

Other monitoring approaches that are not often discussed in this line of research are those that are integrated into commercial ALM platforms such as Microsoft’s TFS and Borland Management Suite. While these solutions can offer advanced reporting functionality with a wide range of predefined and extensible project metrics, the scope and focus of such systems generally do not conform well with the needs of early-stage conceptual design teams.
Status reports and team statistics integrated into single CSCW tools or collaboration platforms can only address part of the information space and hence capture fractions of the communication structures in a team.

Thus, in order to make progress with the broad instrumentation of virtual collaboration processes, a technological foundation for monitoring arbitrary collaboration activities must be created that is stand-alone and independent of existing collaboration tools. This forms a basis for providing monitoring functionality as a lightweight service to teams and observers, who can tailor the system to their specific environments, compare their findings, and reduce the overall efforts caused by setting up a custom infrastructure. The instrument developed in this work realizes this approach.

Table 2.4. A comparison of properties met by existing instruments for capturing and analyzing the work of engineering design teams.

                              P1   P2   P3   P4   P5
Focus on Information Artifacts
  Di Janni (1986)             +    ◦    ◦    +    ◦
  Casotto et al. (1990)       +    +    +    +    +
  Toye et al. (1994)          +    +    +    ◦    ◦
  Wiegers and Knoop (1998)    +    ◦    ◦    ◦    ++
  Yen (2000)                  ◦    ◦    +    ◦    ++
  Jung et al. (2005)          +    +    ◦    ◦    ◦
Focus on Process Participants
  Wong and Burton (2000)      +    +    ◦    ◦    ◦
  Bird et al. (2006)          ◦    ◦    ++   ◦    ◦
  Gloor and Zhao (2004)       ◦    ◦    ++   ◦    ++
Combined View
  Ramesh and Tiwana (1999)    ◦    ◦    ◦    +    +
  Milne (2005)                +    +    +    +    +
  Gloor and Zhao (2006)       +    ◦    ++   ◦    ++

Legend:
  P1  Extensibility in Terms of Data Collection
  P2  Extensibility in Terms of Data Analysis
  P3  Automated, Non-Interfering Data Collection
  P4  Real-Time Analysis
  P5  History & Backtracking Support
  ++  fully meets property
  +   mostly meets property
  ◦   doesn’t have property, or limited

2.5 Chapter Summary

The chapter has introduced engineering design as a core discipline of organizations involved in the creation of new products, systems, and services. Design comprises the iterative team-based generation of concepts, fueled by intense collaboration and information handling, and aiming to achieve an innovative concept solution to be productized. Ad-hoc communication and coordination of information resources is at the heart of engineering design, resulting in mostly unstructured and unpredictable workflows. ICT is playing a major role in supporting these workflows. Different types of groupware applications and communication channels determine today’s virtual collaboration environments. With the digital footprint of design work growing, new opportunities for the instrumentation of the design process arise. To better understand and optimize engineering design collaboration, means to efficiently monitor and analyze ICT-mediated team activities are required.

Existing instruments that support the observation of virtual collaboration activities in research and practice have been presented. They have been compared on the basis of key properties found to be relevant in the computational monitoring of engineering design, and a need for an extensible approach has been identified. The new system has to respond to the requirements of online, Web-based team collaboration, and facilitate non-interfering data aggregation, processing, and real-time analysis. Flexible standards and a sound technological foundation are required to establish such a technique. The following chapter introduces concepts and technology that provide the basis for the development of an applicable monitoring instrument.

3 Technological Foundations

The design and implementation of a service platform for virtual collaboration monitoring is tightly linked with the service-oriented computing and architecture paradigm in software engineering. This chapter gives an introduction to relevant concepts and standards to provide a common understanding of the underlying design principles that influence the development of the instrument.

3.1 Definitions

A number of basic terms need to be clarified before continuing with an elaboration on service engineering principles and Web technology. Some of them have already been used in this work sporadically, but have not been precisely defined yet. To start with, the notion of a system is introduced.

Definition (3.1): A system is something of interest as a whole or as comprised of parts. Therefore a system may be referred to as an entity. A component of a system may itself be a system, in which case it may be called a subsystem (ISO/IEC, 1996).

For the scope of this work, two types of systems can be differentiated. One are the instances of engineering design processes that we want to better understand and that comprise complex subsystems such as involved stakeholders and software tools to support these processes (Chap. 2). The other type of systems treated in this work are monitoring systems, which give insight into the current state of another (monitored) system.

Definition (3.2): A monitoring system is a system that continuously records, processes, and reacts to observable changes in the state of one or more monitored systems with the goal to provide feedback about activities and to assist in the understanding and control of these systems.

In particular, a monitoring system for virtual collaboration processes observes group interactions carried out by means of digital media. One element of this research work is to describe the internal structure of such a monitoring system, i.e., its components and subsystems, by means of defining an eligible system architecture.

Definition (3.3): The architecture of a system is the fundamental organization of a system embodied in its components, their relationships to each other and to the environment and the principles guiding its design and evolution (IEEE, 2000).

Over the last decades, the software engineering discipline has suggested several styles of system architectures, each one responding to specific requirements and environmental constraints of the system under development.

Definition (3.4): An architectural style is a coordinated set of architectural constraints that restricts the roles/features of architectural elements and the allowed relationships among those elements within any architecture that conforms to that style (Fielding, 2000).

One style that has recently gained extensive momentum in industry and academia is that of service-oriented architectures (SOA, Erl, 2005). Service orientation realizes the design of distributed, loosely-coupled systems, whose core architectural elements are services and service consumers (Matthew et al., 2006). Hence, SOA as an architectural style can be briefly defined as follows.

Definition (3.5): A service-oriented architectural style is a coordinated set of constraints that restricts the roles, features and allowed relationships of services and service consumers (Usländer, 2010).

The most predominant architectural element in a SOA is the “service”. A very broad and abstract definition of this overloaded notion is given by Preist (2004), who defines a service as “the provision of something of value, in the context of some domain of application, by one party to another”. We adopt a more precise, but still generic definition that highlights the role of a service in the specification of a software system.

Definition (3.6): A service is a distinct part of the functionality that is provided by an entity through interfaces, whereby an interface is a named set of operations that characterize the behavior of an entity (ISO/IEC, 2005; Usländer, 2010).

Note that the definition of a service does not imply technical requirements of its implementation. Several strategies have been developed to implement services and service-oriented architectures, ranging from message-oriented and object-oriented middleware (e.g., Mowbray and Ruh, 1998) up to Web service standards and protocols (Erl, 2005). Hence, the implementation of a service and the way it is described, provided, and consumed very much depends on the characteristics of the computational environment (Usländer, 2010). Independent of that, a service platform can be defined as follows.

Definition (3.7): A service platform is a software system that provides a set of distinct, but logically related services that adhere to the architectural rules and restrictions defined by the system’s architectural style.

Hence, a service platform becomes a critical component (subsystem) in the design of software systems that directly or indirectly consume the provided services. A monitoring service platform can now be defined using the terminology introduced above.

Definition (3.8): A monitoring service platform is a service platform for monitoring systems that enables the recording, access, management, and processing of information about one or more monitored systems by means of the provided set of services.

In particular, a monitoring service platform for virtual collaboration processes enables the recording, access, management, and processing of information about virtual collaboration activities carried out through email, groupware, or other ICT-based collaboration tools.

3.2 Representational State Transfer

The fundamental architectural style of resource-oriented systems has been identified and described by Roy Fielding as “Representational State Transfer”, or REST (Fielding, 2000; Fielding and Taylor, 2002). The term emphasizes one of the key characteristics of that architectural style: the transfer of application state between components by means of resources and their representations.

Definition (3.9): A resource-oriented architectural style is a coordinated set of architectural constraints that restricts the identification, characteristics and allowed links and methods of resources and their representations (Usländer, 2010).

The REST architectural style evolved during work on HTTP (Fielding et al., 1999a) – the HyperText Transfer Protocol. It was formalized as a guideline to the transition from HTTP/1.0 to HTTP/1.1 and thus HTTP is the dominant implementation of the resource-oriented architectural style (Overdick, 2007). In many ways, REST provides an idealized, abstract view of the architectural goals of HTTP and describes a style that is well-suited to very large scale distributed hypermedia applications. REST conforms with the client-server architectural style (Sommerville, 2006, pp. 270), advocating separation of concerns across multiple platforms. Another important restriction in a RESTful architecture is that communication between its components is stateless. This means that each client request must contain all of the information necessary to understand the request on the server side. The advantage of stateless message exchange is that servers do not need to store per-client context information between requests, which simplifies the design of a resource-oriented software system (Dunkel et al., 2008).

As already mentioned, the key abstraction of information in REST is a resource. So, what is a resource? According to Fielding (2000), any information that can be named can be a resource: a document or image, a temporal service, a collection of other resources, or a non-virtual object such as a person.

Definition (3.10): A resource is anything that is important enough to be referenced as a thing itself. Resources have a globally shared request message classification system called uniform interface and are addressable via uniform resource identifiers.

REST components perform actions on a resource by using a representation to capture the current or intended state of that resource and transferring that representation between components (Fielding, 2000). Thus, the application state in a resource-oriented architecture is manipulated through the stateless exchange of resource representations between clients and servers.

Definition (3.11): The representation of a resource comprises any useful information about the current state of a resource. A resource may have (and usually has) several representations. A representation of a resource may contain one or more links to another representation of the same or another resource (Usländer, 2010).
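
Definitions (3.10) and (3.11) can be illustrated with a short sketch in which one resource is rendered in two representations, both of which link to a related resource. The URIs, field names, and the choice of media types are hypothetical and serve only as an example.

```python
import json

# A hypothetical resource: the internal state of a design sketch.
resource = {
    "uri": "http://example.org/sketches/42",
    "title": "Prototype sketch",
    "author_uri": "http://example.org/people/alice",
}

def as_json(res):
    """JSON representation, including links to the resource itself and its author."""
    return json.dumps({
        "title": res["title"],
        "links": [
            {"rel": "self", "href": res["uri"]},
            {"rel": "author", "href": res["author_uri"]},
        ],
    }, sort_keys=True)

def as_text(res):
    """Plain-text representation of the same state."""
    return f'{res["title"]} <{res["uri"]}> (author: <{res["author_uri"]}>)'

# One resource, several representations, selected by media type.
REPRESENTATIONS = {"application/json": as_json, "text/plain": as_text}

def represent(res, media_type):
    return REPRESENTATIONS[media_type](res)
```

Both representations describe the same resource state; a client that follows the `author` link reaches another resource without knowing how either one is implemented.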

A key component of the REST architectural style is the uniform interface between server and clients. A uniform interface has commonly agreed, well-defined semantics and allows access to and the manipulation of resources. The advantage of generic interaction semantics is that components are able to create, update, read, and delete resources, regardless of the underlying implementation of resources and communication mechanisms. All that is needed to interact with a resource is its resource identifier. In order to obtain a uniform interface, REST is defined by four interface constraints: identification of resources, manipulation of resources through representations, self-descriptive messages, and hypermedia as the engine of application state (Fielding, 2000).

The most successful application of resource orientation today is the World Wide Web. Here, resources are identified by Uniform Resource Identifiers (URI, Berners-Lee et al., 1998) and a common data format for the representations of Web resources is the Hypertext Markup Language (HTML, Connolly and Masinter, 2000). HTTP is the protocol standard of the Web and, as such, it is apparent in the daily use of Web browsers, where it mostly controls hyperlink traversal and form-based data submission. However, the uniform resource interface of HTTP allows for a larger set of operations than the use of GET to recover the representation of a resource and the use of POST to send data to a Web application. The basic HTTP methods for fully managing the life cycle and state of resources through representation interchange are:

GET – Request a representation of the resource state: Messages labeled as GET have an empty service request and are guaranteed to have no substantial effect within the receiver of such a request, i.e., they are safe to call. GET responses are expected to be a description of the current state of the targeted resource.
These attributes allow GET to act as a universal reflection mechanism: it can be issued without any prior knowledge of the resource (Overdick, 2007).

PUT – Update the state of the resource: Messages labeled as PUT do cause an effect in the targeted resource, but do so in an idempotent fashion. An idempotent interaction is defined as replayable, i.e., the effect of n messages is the same as that of 1. In a distributed system, where transactions may not be readily available, this is a great help to error recovery. Again, this assumption can be made without any prior semantic knowledge of the resource involved (ibid.).

DELETE – Remove the resource: Messages labeled as DELETE do cause an effect in the targeted resource, where that effect has a negative connotation. Just as PUT, DELETE is defined as idempotent. However, as with all messages, the interpretation is solely the responsibility of the receiver, i.e., a DELETE has to be regarded as “please terminate” (ibid.).

POST – Create a child resource: The POST method is used to request that the origin server accepts the entity enclosed in the request as a new subordinate of the resource identified by a URI enclosed in the request (Fielding et al., 1999a). Thus, POST messages cause an effect in the receiver and are not safe to replay.

This limited set of well-defined operations distinguishes REST from other distributed computing paradigms such as remote procedure calls (RPCs), in which resources can be the target of arbitrary operations. While REST is often criticized for this architectural restriction, the simple interface design has the potential to afford large-scale distributed application development by means of loosely-coupled, late-binding service components.

The resource-oriented architectural style, as it has been summarized in this section, defines the principal design patterns for the implementation of a monitoring service platform later in this work.
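
The different guarantees of the four methods described above can be sketched with a toy in-memory store. This is an illustration of the semantics only, not an HTTP server and not the platform developed in this work; the class name, the paths, and the numbering scheme for child resources are assumptions.

```python
import itertools

class ResourceStore:
    """Toy in-memory stand-in for a server with a uniform interface (hypothetical)."""

    def __init__(self):
        self._state = {}
        self._ids = itertools.count(1)

    def get(self, uri):
        # Safe: reads the current state and never changes it.
        return self._state.get(uri)

    def put(self, uri, representation):
        # Idempotent: repeating the same PUT leaves the same state.
        self._state[uri] = representation

    def delete(self, uri):
        # Idempotent: deleting an already absent resource changes nothing.
        self._state.pop(uri, None)

    def post(self, collection_uri, representation):
        # Not idempotent: every call creates a new subordinate resource.
        child_uri = f"{collection_uri}/{next(self._ids)}"
        self._state[child_uri] = representation
        return child_uri

store = ResourceStore()
store.put("/teams/a", {"name": "Team A"})
store.put("/teams/a", {"name": "Team A"})                          # replay, same effect
first = store.post("/teams/a/messages", {"text": "hello"})
second = store.post("/teams/a/messages", {"text": "hello"})        # replay, new resource
store.delete("/teams/a")
store.delete("/teams/a")                                           # replay, same effect
```

Replaying the PUT or the DELETE leaves the store in the same state as a single call, whereas replaying the POST yields a second child resource. This is why a client recovering from a network error can blindly retry PUT and DELETE but not POST.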

Definition (3.12): A resource-oriented monitoring service platform is a monitoring service platform that enables the recording, access, management, and processing of information about one or more monitored systems by means of resources and their representations.

Thus, in the case of monitoring virtual team collaboration, such a service platform provides a set of resources that are necessary to represent information about the entities and activities in (for example) engineering design processes.

3.3 Of Resources and Semantics

Virtual collaboration, as it is practiced today, comprises team activities that in most instances involve services and information provided over the Internet and the World Wide Web. As described above, Web technology defines the standard for providing and updating information in a hypermedia environment, in which addressable resources are the basic logical entities for information representation. In the context of conceptual design collaboration, a resource can represent, e.g., product requirements collected on a Wiki page, a prototype sketch, the video of a user observation, the discussion on a design decision, source code, a calendar, etc. In addition, while online tools and services to support knowledge exchange by means of hyperlinked resources have become commonplace, the Web has further evolved into a platform for representing and providing meta-information about such resources. Several extensions and Web standards summarized under the term ‘Semantic Web’ provide a basis for defining computer-understandable descriptive properties and relationships that can be assigned to any resource.

Arguing that virtual collaboration is reflected in the way resources are handled (i.e., created, shared, manipulated, etc.) and hyperlinked in a team process, it is reasonable to represent collected meta-information about the handling of these resources again in terms of resources and relationships. We define resources that represent meta-information about other resources as descriptive resources.
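
The idea of a descriptive resource introduced in the paragraph above can be sketched as a small data structure that holds properties of, and time-stamped relationships from, a described resource without touching that resource. The class, its fields, the relationship name, and all URIs are hypothetical; they are meant as an illustration rather than as the data model used later in this work.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DescriptiveResource:
    """Meta-information about one described resource (hypothetical structure)."""
    uri: str          # URI of the descriptive resource itself
    describes: str    # URI of the described resource, which stays unmodified
    properties: Dict[str, str] = field(default_factory=dict)
    relationships: List[dict] = field(default_factory=list)

    def relate(self, kind: str, target: str, established: str) -> None:
        """Record a thematic relationship together with its temporal property."""
        self.relationships.append(
            {"kind": kind, "target": target, "established": established}
        )

    def links(self) -> List[str]:
        """URIs that a representation of this descriptive resource would link to."""
        return [self.describes] + [rel["target"] for rel in self.relationships]

sketch_meta = DescriptiveResource(
    uri="http://example.org/meta/sketch-1",
    describes="http://example.org/wiki/sketch-1",
    properties={"introduced": "2009-03-01"},
)
sketch_meta.relate("discussed_in", "http://example.org/forum/topic-7", "2009-03-04")
```

Since the meta-information lives at its own URI, it can be added for resources hosted by closed or third-party tools, which only need to be addressable.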

Definition (3.13): A descriptive resource $dr$ defines properties of a resource $r$ and its relationships to other resources. In particular, representations of $dr$ link to $r$ and to any related resource $r'$ or, respectively, to its descriptive resource $dr'$.

The properties of a resource are associated with concepts taken from a semantic model, e.g., an ontology specified in a formal language (Sect. 3.4.1). In the context of describing resources that play a role in virtual collaboration, the set of properties comprises thematic properties (i.e., “what information does it represent?”, “what are the relationships?”) as well as temporal properties (i.e., “when has the resource been introduced?”, “when have relationships been established to another resource?”). For example, let us assume a descriptive resource $dr$ that defines an ‘is topic of’ relationship between $r$ (e.g., a sketch or technical drawing) and $r'$ (e.g., a forum topic or blog entry). A representation of $dr$ would contain temporal information and references to the drawing, its discussion, and to $dr'$, thus supporting the traversal and exploration of meta-information about semantically related resources.

Figure 3.1 visualizes the interplay between resources and descriptive resources with another example. Here, let us consider the Web-based information space of a collaboration team as a directed, not necessarily connected graph of resources and hyperlinks, and let us further assume a collaboration scenario comprising four distinct resources (Fig. 3.1a). The resources are not further classified and may represent elements of different groupware tools and arbitrary Web applications. While hyperlinks in a resource representation indicate some unidirectional dependencies between the resource and another, the actual but implicit contextual relationships of a resource are usually more comprehensive and not specified. Figure 3.1b gives an example of what these relationships might look like for the previous four resources.
Note that their relationships extend to nodes that were not part of the original graph. These can represent, e.g., non-virtual entities such as persons who are involved in the handling of the other resources. Also note the bidirectional nature of the edges, which follows from the general invertibility of this kind of relationship (e.g., ‘is topic of’ vs. ‘has topic’). Given such implicit properties of an information space, descriptive resources can now be used to create an explicit representation of these properties, thus establishing a semantic layer on top of existing information without altering the original resources (Fig. 3.1c).

Figure 3.1. Adding a semantic layer to information in a virtual collaboration process by means of descriptive resources. (a) Four resources representing (hyperlinked) information being handled in a collaboration process. (b) Semantic relationships of collaboration resources are often complex and implicit. (c) Descriptive resources establish a semantic layer without altering the existing resources.

The representation of additional context information in a semantic layer of descriptive resources is a feasible way to define the thematic and temporal properties for arbitrary resources. The representations and the underlying implementations of the to-be-described resources remain unaffected, which is a benefit especially when resources are provided by closed or third-party systems. Given a service platform that coordinates the generation of descriptive resources, a central source and broker for describing or retrieving the semantic annotations for any type of collaboration resource can be established. All that is needed are the identifiers of the described resources.

Several efforts to formally describe meta-information about resources in a computer-understandable format have been made and have come to popular attention in recent years. The notion of the ‘Semantic Web’ combines de facto standards and formats to define and computationally reason about concepts and situations in a resource environment. The basic principles of the Semantic Web are presented in the following section.
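To make the notion of a descriptive resource from Definition 3.13 more tangible, the following is a minimal Python sketch under stated assumptions. The class names `DescriptiveResource` and `DescriptionBroker`, all field names, and the example URIs are hypothetical and not part of the platform described in this work. The sketch only illustrates that a descriptive resource links to the described resource and to a related one, carries a thematic and a temporal property, and can be looked up through a central broker using nothing but the identifier of the described resource.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DescriptiveResource:
    """Hypothetical record for one relationship of a described resource."""
    uri: str                 # identifier of the descriptive resource dr
    describes: str           # identifier of the described resource r
    relationship: str        # thematic property, e.g. 'isTopicOf'
    related: str             # identifier of the related resource r'
    since: int               # temporal property: when the relationship was established
    related_description: Optional[str] = None  # optional link to dr'


class DescriptionBroker:
    """Hypothetical central lookup keyed by the described resource's identifier."""

    def __init__(self):
        self._by_resource = {}

    def register(self, dr):
        self._by_resource.setdefault(dr.describes, []).append(dr)

    def describe(self, resource_uri):
        # The original resource is never touched; only its identifier is needed.
        return list(self._by_resource.get(resource_uri, []))


broker = DescriptionBroker()
broker.register(DescriptiveResource(
    uri="http://example.org/meta/1",
    describes="http://example.org/sketch/7",
    relationship="isTopicOf",
    related="http://example.org/forum/42",
    since=1270000000,
))
links = broker.describe("http://example.org/sketch/7")
```

Keeping the annotations in a separate store mirrors the idea of a semantic layer that leaves closed or third-party resources unchanged.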

3.4 Semantic Web

The basic idea behind the Semantic Web is to provide information about resources in a format that allows machines to handle this information in a meaningful and serviceable way (Hitzler et al., 2008). This requires open and interoperable standards that allow the information to be described and exchanged across different applications and platforms. To achieve this goal, Semantic Web technology has been grounded on a number of well-defined and extensible standards recommended by the World Wide Web Consortium (W3C), the most relevant being XML, the Resource Description Framework (RDF), RDF Schema (RDFS), and the OWL Web Ontology Language. Thus, the Semantic Web can be understood as an extension to the ‘traditional’ Web and its fundamental technologies such as Uniform Resource Identifiers (URIs).

In computer science, the term semantics generally denotes the meaning of literal strings and their interrelationships. Given a formal language to describe this meaning, computational methods can be applied to derive (infer) “new” information from specified axioms and already inferred theorems. The Semantic Web constitutes such a formal system by establishing means to define well-defined premises that are linked to resources and a set of inference rules to define possible conclusions.

3.4.1 Ontologies

Central to semantic technologies is the notion of ontologies. Many attempts to define what constitutes an ontology have been made. A popular, yet broad definition is given by Gruber (1993), stating that an ontology is “an explicit specification of a conceptualization”. In this context, a conceptualization means “an abstract model of some aspect of the world, taking the form of a definition of the properties of important concepts and relationships” (Baader et al., 2004). An explicit specification means that the concepts and relationships are represented in a well-defined format, allowing humans and machines to unambiguously interpret and reason about the model. Thus, we specifically refer to an ontology also as an information object and engineering artifact. The W3C has defined standard models and logic-based representation languages for dealing with ontologies. Those languages make it possible to write explicit, formal conceptualizations of domain models by means of well-defined, unambiguous syntax and semantics, efficient reasoning support, and sufficient expressive power (Antoniou and Van Harmelen, 2004). The standard languages RDF/RDFS and OWL are briefly introduced below.

The primary goal of ontologies is to enable agreement on the meaning of specific vocabulary terms and, thus, to facilitate information integration across individual applications (Cimpian et al., 2008). They are used to formally specify concepts and their relationships and provide the means to create semantic metadata for any kind of object. In database terms, we can divide an ontology into two parts: a schema and instance data (Perry, 2008). The schema models a domain by defining class types (e.g., Institute, City) and relationship types (e.g., located in). The schema is populated with instances of classes and relationships (e.g., Hasso Plattner Institute located in Potsdam) to create facts representing knowledge of the domain.
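The division into schema and instance data can be illustrated with a small Python sketch. It reuses the Institute/City example from the text; the identifier `locatedIn`, the dictionary layout, and the function `conforms` are assumptions made for illustration. Note also that the sketch follows the database analogy and treats the schema as an integrity check, whereas RDFS and OWL use domain and range statements to infer class membership rather than to reject data.

```python
# Schema: class types and relationship types (with the classes they connect).
classes = {"Institute", "City"}
relationship_types = {"locatedIn": ("Institute", "City")}

# Instance data: typed individuals and facts that populate the schema.
individuals = {"Hasso Plattner Institute": "Institute", "Potsdam": "City"}
facts = [("Hasso Plattner Institute", "locatedIn", "Potsdam")]


def conforms(fact):
    """Check a fact against the schema (database-style reading)."""
    subject, relationship, obj = fact
    if relationship not in relationship_types:
        return False
    source_class, target_class = relationship_types[relationship]
    return (individuals.get(subject) == source_class
            and individuals.get(obj) == target_class)


all_conform = all(conforms(fact) for fact in facts)
```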

3.4.2 The Resource Description Framework

The Resource Description Framework (RDF) has been adopted by the W3C as a standard for representing decentralized metadata on the Web (W3C, 2004b,f). It defines a language for describing arbitrary Web resources and any other (physical or virtual) entities that are globally identified by a URI, which, in fact, can be anything. The language allows properties to be freely defined and used to describe resources by means of binary relationships to other resources or to literals such as strings or numbers. The binary relationships are encoded as triples of the form (subject, predicate, object). Each triple denotes that a resource – the subject – has a property, called the predicate, with a value, the object. Triples are also referred to as statements, corresponding to the notion of statements formed by simple sentences in a natural language. For example, the following statement could assert that an entity named John is the creator of a particular resource identified as UIConcept.

    John has created UIConcept

‘John’ is the subject of the statement, ‘has created’ is the predicate, and ‘UIConcept’ is the object. Transferred to RDF, i.e., statements about resources identified by URIs, the entity ‘John’ could, e.g., be referenced by the URI ‘http://example.org/John’, which could point to a resource representing the identity of a person named John. A common and convenient way of writing long URIs is to define a prefix for frequently used namespaces. Assuming that ‘ex’ is the prefix for the namespace ‘http://example.org/’, the above resource can be rewritten as ‘ex:John’. Likewise, the property and object of the RDF triple could be uniquely identified by the URIs ‘ex:hasCreated’ and ‘ex:UIConcept’, yielding the following RDF triple:

    (ex:John, ex:hasCreated, ex:UIConcept)

A set of RDF triples is called an RDF graph, as triples can be represented as a directed, labeled graph with labels on both edges and nodes.
A directed edge labeled with the property identifier connects a subject to the object of a triple. An example graph is shown in Fig. 3.2.
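The triple structure and the prefix shorthand described above can be sketched in a few lines of Python. This is only an illustration with plain tuples and sets; the helper names `expand` and `objects` are assumptions and do not correspond to any particular RDF library.

```python
# Namespace prefixes, as in the 'ex' example from the text.
PREFIXES = {"ex": "http://example.org/"}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'ex:John' into a full URI."""
    prefix, separator, local = name.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + local
    return name  # already a full URI or an unknown prefix


# One statement as a (subject, predicate, object) triple.
triple = ("ex:John", "ex:hasCreated", "ex:UIConcept")
expanded = tuple(expand(term) for term in triple)

# An RDF graph is a set of triples.
graph = {expanded}


def objects(graph, subject, predicate):
    """Follow all edges labeled with the predicate that leave the subject."""
    return {o for s, p, o in graph if s == subject and p == predicate}


created = objects(graph, expand("ex:John"), expand("ex:hasCreated"))
```

Because every term is a globally unique URI after expansion, graphs from different sources can be merged by a simple set union.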

[Figure 3.2: RDF graph with three edges: ex:John –ex:hasCreated→ ex:UIConcept; ex:John –vcard:email→ an email address literal; ex:UIConcept –rdf:type→ ex:Gallery.]

Figure 3.2. An example RDF graph.

In this notation adopted from (W3C, 2004b), elliptical nodes represent resources and rectangular nodes represent literal values. The example graph shows three triples, one of which is the above statement. A second triple assigns an email address to ‘ex:John’ using a property from the vCard vocabulary (Halpin et al., 2010). The third triple states that the resource being created is a member of the class (i.e., “of type”) ‘ex:Gallery’ using the built-in RDF property ‘rdf:type’. A class represents a collection of resources, for example, the class of image galleries. A member of a class is said to be an instance of the class. The set of members of a class is called the class extension of the class.

The fact that classes and properties in RDF are themselves resources and identified by URIs provides the basis for constructing ontologies by means of RDF graphs. The standard vocabulary that is required to describe ontological relationships between classes and properties used in RDF graphs is provided by the RDF Schema vocabulary (RDFS, W3C, 2004d). RDFS offers a set of built-in concepts that can be used to define hierarchies or arbitrary graphs of classes (as instances of ‘rdfs:Class’) and properties (as instances of ‘rdf:Property’). The ‘rdfs:subClassOf’ property is used to state that one class is a subclass of another, i.e., that the class extension of the subject class is a subset of the object class extension (McBride, 2004). RDFS also makes it possible to state that the subjects and objects of a property belong to a certain class. The triple (S, rdfs:domain, O) states that all subjects of a property S are members of the class O. Likewise, the triple (S, rdfs:range, O) states that all objects of a property S are members of O. For a more detailed introduction to RDF and RDFS see, e.g., McBride (2004). The W3C has further defined a set of entailment rules for RDF and RDFS (W3C, 2004c).
Conceptually, these rules specify that an additional triple can be added to an RDF graph if the graph contains triples of a specific pattern (Perry, 2008). Such rules describe, for example, the transitivity of the ‘rdfs:subClassOf’ property: if ‘A subClassOf B’ and ‘B subClassOf C’ then ‘A subClassOf C’.

Thus, in summary, the unique aspects of RDF, when compared to other data models, are demonstrated by the following characteristics: (1) relationships that are represented as first-class objects rather than represented implicitly with, e.g., foreign key constraints in the relational model, and (2) formal semantics, which are specified according to the defined entailment rules for RDF and RDFS entities (ibid.).
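The pattern-based character of the entailment rules can be illustrated with a small forward-chaining sketch in Python. It covers only four patterns mentioned in this section (transitivity of ‘rdfs:subClassOf’, type inheritance along subclass edges, ‘rdfs:domain’, and ‘rdfs:range’) and is a simplification, not an implementation of the complete W3C rule set; for instance, it ignores that literals cannot be typed via ‘rdfs:range’ in the same way as resources. The class names ‘ex:Collection’, ‘ex:Resource’, and ‘ex:Person’ are hypothetical.

```python
SUBCLASS, DOMAIN, RANGE, TYPE = (
    "rdfs:subClassOf", "rdfs:domain", "rdfs:range", "rdf:type")


def rdfs_closure(triples):
    """Add entailed triples until no rule produces anything new."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                if p == SUBCLASS and p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))   # subclass transitivity
                if p == SUBCLASS and p2 == TYPE and o2 == s:
                    new.add((s2, TYPE, o))       # instances inherit superclasses
                if p == DOMAIN and p2 == s:
                    new.add((s2, TYPE, o))       # subject is a member of the domain
                if p == RANGE and p2 == s:
                    new.add((o2, TYPE, o))       # object is a member of the range
        if new <= graph:
            return graph
        graph |= new


closure = rdfs_closure([
    ("ex:Gallery", SUBCLASS, "ex:Collection"),
    ("ex:Collection", SUBCLASS, "ex:Resource"),
    ("ex:hasCreated", DOMAIN, "ex:Person"),
    ("ex:John", "ex:hasCreated", "ex:UIConcept"),
    ("ex:UIConcept", TYPE, "ex:Gallery"),
])
```

The loop terminates because the rules only recombine terms that already occur in the graph, so the set of possible triples is finite.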

3.4.3 The OWL Web Ontology Language

The OWL Web Ontology Language is a formal language developed by the W3C for representing ontologies in the Semantic Web (W3C, 2004a). OWL is heavily based on Description Logics and is designed to facilitate greater machine interpretability of data (i.e., more logical reasoning) than is possible with RDF and RDFS (Perry, 2008).

It extends the basic fact-stating ability of RDF and the class- and property-structuring capabilities of RDF Schema in several important ways (Horrocks et al., 2003). OWL classes can be specified as logical combinations (intersections, unions, or complements) of other classes, or as enumerations of specified objects. OWL properties can be specified as transitive, symmetric, functional, or as the inverse of another property. Additionally, OWL makes it possible to define restrictions on how properties behave that are local to a class. For example, we can state that the class ‘Canadian’ is defined precisely as those members of the class ‘Person’ that have ‘Canada’ as a value of the property ‘Nationality’ (ibid.).

OWL provides three increasingly expressive sublanguages: OWL-Lite, OWL-DL, and OWL-Full. Every legal OWL-Lite ontology is a legal OWL-DL ontology, and every legal OWL-DL ontology is a legal OWL-Full ontology. OWL-DL – the Description Logic style of using OWL – allows the maximum expressiveness that still permits efficient reasoning support and guarantees decidable inference. OWL-Lite consists of a subset of the OWL-DL constructors that eliminates some computational complexity problems during the inferencing process, but which has restricted expressivity. OWL-Full provides maximum expressiveness with no computational guarantees (Perry, 2008).

Classes in OWL are defined using the ‘owl:Class’ element, which is a subclass of ‘rdfs:Class’ (Fig. 3.3). OWL distinguishes between object properties, which relate objects to other objects, and datatype properties, which relate objects to datatype values (Antoniou and Van Harmelen, 2004). For a detailed overview of the design of OWL and its constructs see Horrocks et al. (2003).

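[Figure 3.3: class hierarchy. rdfs:Class and rdf:Property are subclasses of rdfs:Resource; owl:Class is a subclass of rdfs:Class; owl:ObjectProperty and owl:DatatypeProperty are subclasses of rdf:Property.]

Figure 3.3. Subclass relationships between OWL and RDF/RDFS.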
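The effect of declaring an OWL property as transitive, symmetric, or as the inverse of another property can be sketched as follows. This is a hedged illustration in plain Python and not a description-logic reasoner: it handles only these three property characteristics, and the identifiers ‘ex:isTopicOf’, ‘ex:hasTopic’, and ‘ex:partOf’ as well as the function name are assumptions.

```python
def apply_property_axioms(facts, transitive=(), symmetric=(), inverse_of=None):
    """Derive the facts implied by simple property characteristics."""
    inverses = dict(inverse_of or {})
    # An inverse declaration holds in both directions.
    inverses.update({q: p for p, q in list(inverses.items())})
    graph = set(facts)
    while True:
        new = set()
        for s, p, o in graph:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverses:
                new.add((o, inverses[p], s))
            if p in transitive:
                new.update((s, p, o2) for s2, p2, o2 in graph
                           if p2 == p and s2 == o)
        if new <= graph:
            return graph
        graph |= new


derived = apply_property_axioms(
    facts=[
        ("ex:sketch", "ex:isTopicOf", "ex:thread"),
        ("ex:a", "ex:partOf", "ex:b"),
        ("ex:b", "ex:partOf", "ex:c"),
    ],
    transitive={"ex:partOf"},
    inverse_of={"ex:isTopicOf": "ex:hasTopic"},
)
```

The inverse pair corresponds to the ‘is topic of’ and ‘has topic’ relationships used earlier as an example of invertible relationships.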

3.4.4 A Graphical Notation for RDF/OWL Ontologies

Lacking standards for the graphical notation of RDF/OWL-based ontology models, a custom presentation language to describe the elements of the platform ontologies is presented and used throughout this thesis. The notation borrows concepts from DLG2¹, a graphical presentation language for RDF and OWL, but has been tailored and extended to address the requirements in the context of this work. To give a short introduction to the notation, Fig. 3.4 shows a simple ontology model.

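[Figure 3.4: example ontology in the namespace http://hpi-web.de/ns/dstore/example/ with the prefixes xsd: http://www.w3.org/2001/XMLSchema# and ext: http://www.example.com/namespace#. It shows the classes ‘Domain’ (subclass of ext:ParentClass), ‘ParentClass’, and ‘Range’, the datatype property ‘attribute’ (owl:DatatypeProperty, value range xsd:int), and the object property ‘relation’ (owl:ObjectProperty) leading from ‘Domain’ to ‘Range’.]

Figure 3.4. A graphical notation for RDF/OWL-based ontologies.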

In this notation, the ontology namespace along with a number of prefix definitions for reused ontologies is stated on top of the graphical representation. Class definitions are represented as boxes and property types are represented as flat, pointed shapes. Associated superordinates for classes and properties are listed above the particular definition, while additional type allocations are appended below. The example ontology in Fig. 3.4 features a class ‘Domain’, which is a subclass of a class ‘ParentClass’ defined in the ‘ext’ namespace. Note that internal superclass/subclass relationships within a namespace can also be expressed by a connecting arrow as shown for the classes ‘ParentClass’ and ‘Range’ in the ontology namespace. The ontology also defines two properties, a data property ‘attribute’ and an object property ‘relation’. The latter defines a relationship type between two resource classes. The domain and range types of a property are indicated by inbound and outbound connections to the associated resources. In this example, the ontology specifies that all instances that are the subject of a property ‘relation’ are of type ‘Domain’. Likewise, the range of ‘relation’ asserts that the target node instances of that relationship are of type ‘Range’ (and hence of type ‘ParentClass’). In the case of data-valued properties, the type of the value range is stated on the outbound side of the property (e.g., ‘xsd:int’ for property ‘attribute’).

With these graphical primitives, an adequate overview of the ontology specifications that form the foundation of Team Collaboration Networks can be provided. The following sections make use of this notation in the introduction of ontologies for virtual collaboration activities and their organization in a monitoring instrument.

1 Directed Labeled Graph 2: http://www.charlestoncore.org/dlg2/

3.5 Chapter Summary

The chapter has given basic definitions for the terms and technical concepts relevant for the remainder of this work. REST has been introduced as a resource-oriented architectural style that facilitates lightweight integration of loosely coupled services (resources) through a uniform resource interface and stateless client-server communication. The abstract concept of a descriptive resource has been introduced as an approach to provide meta-information about any other resource, e.g., about its history and handling in a collaboration process.

It has been further explained how the Web, as an incarnation of a resource-oriented system, has been enriched with formal languages and semantics that make it possible to specify information about resources and their relationships in the form of ontologies. RDF and OWL define the current standards for this Semantic Web. A graphical notation for the description of RDF/OWL-based ontologies has been presented.

Part II

Models for Team Collaboration Capture

4 Team Collaboration Networks

This chapter introduces Team Collaboration Networks (TCN), a graph-based model to describe the relationships between heterogeneous information resources and actors during a collaboration process¹. A TCN comprises a concerted vocabulary of node and relationship types and the individual instantiations of these concepts as they appear over time. The model is cumulative, meaning that the temporal evolution of a network is reproduced in the model by the incremental recording of any changes in the set of nodes and edges, while maintaining historical information about preceding network states.

With the structural and temporal properties of Team Collaboration Networks defined in this chapter, a medium for capturing unstructured team interactions in virtual collaboration environments is presented. A mapping between TCN representations and the OWL Web Ontology Language establishes a computer-processable representation of chronological collaboration activities in distributed teams. This is the basis for the analysis of virtual team collaboration as outlined later in this work.

4.1 Foundations

Team Collaboration Networks describe a graph structure to express directed relationships (edges) between persons and/or information resources (nodes). Nodes and edges can be assigned an arbitrary number of properties in the form of literal attribute values. Nodes, edges, and attributes represent the basic building blocks of a network. Every instance of a node, edge, or attribute is typed, i.e., semantically classified by one or more terminological concepts. For example, a node can be defined to be of the type ‘Person’, ‘Email’, or ‘Wiki page’. A relationship between nodes might be classified by the types ‘has sent’, ‘is replying to’, or ‘is author of’. Attributes represent literal properties such as an ‘email address’ or a ‘username’.

1 Initially termed Team Communication Networks in previous publications (Uflacker and Zeier, 2008c; Uflacker, 2009), I decided to amend the notion to Team Collaboration Networks to avoid ambiguity in the area of telecommunication networks.

In order to maintain historical information about the temporal evolution of a network, the model implements a no-overwrite strategy by accumulating changes to the network structure and preserving the original state. For this reason, two timestamps are associated with every instance of a node, relationship, or attribute. This way, Team Collaboration Networks keep track of each element’s validity, i.e., the interval during which a node, relationship, or attribute value holds true for the depicted collaboration space.

Definition (4.1): Let the validity interval of a node, edge, or attribute in a Team Collaboration Network be determined by two time values $t_{start} \in \mathbb{N}$ and $t_{end} \in \mathbb{N} \cup \{\infty\}$ with $t_{end} > t_{start}$. The entity is considered valid at a given point $t$ if $t_{start} \le t < t_{end}$. In particular, $t_{end} = \infty$ if the end of the validity is undefined, i.e., if the entity has not (yet) been invalidated.

Thus, a Team Collaboration Network is defined by a set of type definitions for nodes, relations, and attributes, and a set of corresponding type instantiations, whose validity intervals specify the temporal dimension of the network structure:

Definition (4.2): A Team Collaboration Network is a directed labeled graph $TCN := (T_v, T_e, T_a, V, E, A)$, where

• $T_v$ is a set of node types to define the semantic classes of network vertices.
• $T_e$ is a set of relationship types to define the semantic classes of network edges.
• $T_a$ is a set of attribute types to define the semantic classes of attributes that can be assigned to nodes and relationships.
• $V$ is a set of labeled network nodes. Each node $v \in V$ is a 4-tuple $\langle id, types, t_{start}, t_{end} \rangle$ with $id$ being the unique label of the node and $types$ being a list of assigned node types $t \in T_v$. $t_{start}$ and $t_{end}$ define the validity interval of the node in the network.
• $E$ is a set of directed edges. Every edge connects two nodes $v_i, v_j \in V$ to express a relationship of type $r \in T_e$ between $v_i$ and $v_j$. Hence, every edge $e \in E$ is a 5-tuple $\langle s, r, o, t_{start}, t_{end} \rangle$, where $s \in V$ is a source node, $r \in T_e$ is a relationship type, and $o \in V$ is a target node. $t_{start}$ and $t_{end}$ define the validity interval of the relationship in the network.
• $A$ is a set of attributes. An attribute $a \in A$ is a 5-tuple $\langle s, p, val, t_{start}, t_{end} \rangle$, where $s$ is either a node $v \in V$ or an edge $e \in E$ and represents the node or edge to which $a$ is assigned. $p \in T_a$ represents the type of the attribute. $val$ is the literal value of $a$. $t_{start}$ and $t_{end}$ define the validity interval of the attribute in the network.

The elements $T_v$, $T_e$, and $T_a$ define the conceptual level of a Team Collaboration Network. They describe its terminology in terms of a controlled vocabulary of node, relationship, and attribute types. $V$, $E$, and $A$ define the individual instantiations of the terminological constructs as they appear over time in a particular collaboration context.

To give an example, Fig. 4.1 shows a Team Collaboration Network consisting of a set of four nodes ($V$ := {1, 2, 3, 4}). Nodes 1 and 4 each represent a person that is participating in the collaboration process. Node 2 depicts a Wiki page that has been created by node 1. Node 3 represents an email that has been sent by node 1 to node 4 and which is holding a reference to the Wiki page. The terminology of this network is defined by $T_v$ := {Person, Email, WikiPage}, $T_e$ := {hasSent, hasReceived, linkedFrom, createdBy}, and $T_a$ := {mailbox, wikiuser, from, to, create_user}. For the sake of clarity, the temporal properties of the displayed nodes, relationships, and attribute instances have been omitted from the figure. Details on the temporal aspects and the evolution of Team Collaboration Networks are provided in the next section.

[Figure 4.1: network with the person nodes 1 and 4, the Wiki page node 2, and the email node 3, connected by the relationships ‘createdBy’, ‘hasSent’, ‘hasReceived’, and ‘linkedFrom’. The attribute types shown are ‘wikiuser’, ‘mailbox’, ‘from’, ‘to’, and ‘create_user’.]

Figure 4.1. A Team Collaboration Network with different types and instances of nodes, relationships, and attributes.
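The tuple structure of Definitions 4.1 and 4.2 and the example network of Fig. 4.1 can be sketched as Python data classes. This is an illustrative sketch under stated assumptions and not the d.store implementation: the class and field names are chosen freely, the timestamps are invented (the figure omits them), and the edge directions are one plausible reading of the relationship names.

```python
import math
from dataclasses import dataclass, field

INF = math.inf  # models an undefined end of validity (t_end = infinity)


@dataclass(frozen=True)
class Node:
    id: int
    types: tuple          # assigned node types, each taken from Tv
    t_start: int
    t_end: float = INF


@dataclass(frozen=True)
class Edge:
    source: int           # id of the source node s
    rel_type: str         # relationship type r from Te
    target: int           # id of the target node o
    t_start: int
    t_end: float = INF


@dataclass(frozen=True)
class Attribute:
    owner: object         # node id or Edge the attribute is assigned to
    attr_type: str        # attribute type p from Ta
    value: str            # literal value
    t_start: int
    t_end: float = INF


def is_valid(entity, t):
    """Definition 4.1: valid at t if t_start <= t < t_end."""
    return entity.t_start <= t < entity.t_end


@dataclass
class TCN:
    node_types: set
    edge_types: set
    attr_types: set
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)
    attributes: list = field(default_factory=list)

    def add(self, entity):
        """Append an instance after checking it against the vocabulary."""
        if isinstance(entity, Node):
            ok, target = set(entity.types) <= self.node_types, self.nodes
        elif isinstance(entity, Edge):
            ok, target = entity.rel_type in self.edge_types, self.edges
        else:
            ok, target = entity.attr_type in self.attr_types, self.attributes
        if not ok:
            raise ValueError(f"type not in vocabulary: {entity}")
        target.append(entity)


# Example network of Fig. 4.1; timestamps are invented for illustration.
tcn = TCN(
    node_types={"Person", "Email", "WikiPage"},
    edge_types={"hasSent", "hasReceived", "linkedFrom", "createdBy"},
    attr_types={"mailbox", "wikiuser", "from", "to", "create_user"},
)
for entity in [
    Node(1, ("Person",), 10), Node(2, ("WikiPage",), 10),
    Node(3, ("Email",), 20), Node(4, ("Person",), 20),
    Edge(2, "createdBy", 1, 10), Edge(1, "hasSent", 3, 20),
    Edge(4, "hasReceived", 3, 20), Edge(2, "linkedFrom", 3, 20),
    Attribute(1, "wikiuser", "paul", 10), Attribute(4, "wikiuser", "john", 20),
]:
    tcn.add(entity)

valid_at_15 = [n.id for n in tcn.nodes if is_valid(n, 15)]
```

Separating the vocabulary sets from the instance lists mirrors the distinction between the conceptual level ($T_v$, $T_e$, $T_a$) and the instantiations ($V$, $E$, $A$).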

4.2 Temporal Network Properties

Team Collaboration Networks store the validity intervals of nodes, relationships, and attributes by associating two timestamps $t_{start}$ and $t_{end}$ with every instance. The interpretation of these values depends on the actual type of instance and the way a TCN is prepared in a particular collaboration process. For a node of type ‘Email’, e.g., $t_{start}$ can denote the moment at which the message has been sent. For a ‘Person’ node, it delineates the point in time an actor has joined the project or first appeared in the collaboration process. For a relationship of type ‘editedBy’ between a ‘WikiPage’ and a ‘Person’, the value could indicate the time at which the page has been edited, etc. The end of a node’s validity ($t_{end}$) implies that an object no longer appears in the collaboration process. While this is unusual for ‘persistent’ objects such as archived emails, $t_{end}$ could for example define the time at which a file has been deleted from a shared storage drive. Likewise, an attribute is invalidated when its value has been changed and replaced by a new attribute instance. By definition, $t_{end} = \infty$ as long as an entity has not been invalidated.

With the validity intervals specified, a cumulative, non-overwriting model behavior can be established. If an entity is updated or removed from a network, the original instance is retained but marked as invalid by setting $t_{end}$ to the time of its modification. Accordingly, the actual representation of a Team Collaboration Network at a given point $t$ is limited to those instances whose validity interval contains $t$. With this approach, the model preserves the history of previous network states and maintains the traceability of changes in the network structure. The exploration of trends and the evolution of collaboration behavior over the course of a long-running project becomes feasible.
The following directives outline a procedure to handle temporal network properties when a node, relationship, or attribute instance is inserted into, updated in, or removed from a Team Collaboration Network:

• If instance $a$ is inserted in a network at time $t$, then $a$ is initialized with $t_{start} = t$ and $t_{end} = \infty$.
• If instance $a$ is changed to $a'$ at time $t$, then $a$ is marked as invalid with $t_{end} = t$ and a new instance $a'$ is created with $t_{start} = t$ and $t_{end} = \infty$.
• If instance $a$ is to be removed from a network at time $t$, then $a$ is retained but marked as invalid with $t_{end}$ set to $t$.

A basic example to demonstrate the temporal aspects of a Team Collaboration Network is given by the series of three succeeding representations shown in Figs. 4.2a – 4.2c. The series starts with an earlier version of the example network presented in the last section (Fig. 4.2a). The network comprises a person (node 1) and a Wiki page (node 2), and asserts that the Wiki page has been created by the person. In the next iteration (Fig. 4.2b), the same person has sent an email (node 3) to a second person identified by node 4. The email further contains a reference to the initially created Wiki page. Apparently, the creator of the Wiki page wants to disseminate the provided information by sharing the URL of the resource. In the final iteration (Fig. 4.2c), the Wiki page has been edited by the receiver of the email, which becomes evident in a new ‘editedBy’ relationship and ‘edit_user’ attribute.

[Figure 4.2a: person node 1 and Wiki page node 2, connected by ‘createdBy’.]

Figure 4.2a. Team Collaboration Network at time t − 1.

[Figure 4.2b: the network of Fig. 4.2a extended by email node 3 and person node 4 with the relationships ‘hasSent’, ‘hasReceived’, and ‘linkedFrom’.]

Figure 4.2b. Team Collaboration Network at time t.

[Figure 4.2c: the network of Fig. 4.2b extended by an ‘editedBy’ relationship and an ‘edit_user’ attribute on the Wiki page.]

Figure 4.2c. Team Collaboration Network at time t + 1.

A general mechanism to represent versioning information about the actual content of collaboration resources has intentionally been excluded from the model. This is grounded on the position that the management of resource revisions in a collaborative process falls within the responsibility of the providing application. Information about the delta between two revisions should be provided by the collaboration software (e.g., as is the case for many Wiki systems). Ideally, every revision of a resource is exposed by the application as a dedicated URI and hence would become a node in its own right. Revisions can then be associated with previous versions through corresponding relationships, resulting in a more explicit representation of content changes in the Team Collaboration Networks.

Through the selective exclusion of nodes, relationships, and attributes whose validity interval does not contain a given time point, historical network states can be reconstructed. Later in Chap. 6, it is demonstrated how time point queries on TCN models can be efficiently realized at the database level.
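The insert, update, and remove directives and the reconstruction of historical states by time point queries can be sketched with a small append-only store. The sketch is an assumption-laden illustration in Python, not the database-level realization discussed in Chap. 6: the class `TemporalStore`, the key layout, the ‘title’ attribute, the file node, and the concrete time values are all hypothetical.

```python
import math

INF = math.inf  # t_end of instances that have not been invalidated


class TemporalStore:
    """Append-only store: records are invalidated, never overwritten."""

    def __init__(self):
        self.records = []

    def _current(self, key):
        for record in self.records:
            if record["key"] == key and record["t_end"] == INF:
                return record
        return None

    def insert(self, key, value, t):
        # Directive 1: initialize with t_start = t and t_end = infinity.
        self.records.append(
            {"key": key, "value": value, "t_start": t, "t_end": INF})

    def remove(self, key, t):
        # Directive 3: retain the instance but set t_end to t.
        record = self._current(key)
        if record is None:
            raise KeyError(key)
        if t <= record["t_start"]:
            raise ValueError("t_end must be greater than t_start")
        record["t_end"] = t

    def update(self, key, value, t):
        # Directive 2: invalidate a at t, then create a' valid from t.
        self.remove(key, t)
        self.insert(key, value, t)

    def snapshot(self, t):
        """Time point query: all instances with t_start <= t < t_end."""
        return {r["key"]: r["value"] for r in self.records
                if r["t_start"] <= t < r["t_end"]}


store = TemporalStore()
store.insert(("edge", 2, "createdBy", 1), True, 1)
store.insert(("attr", 2, "title"), "Draft", 1)         # hypothetical attribute
store.insert(("node", 3), "Email", 2)
store.insert(("node", "file-7"), "File", 2)            # hypothetical shared file
store.update(("attr", 2, "title"), "Requirements", 3)
store.insert(("edge", 2, "editedBy", 4), True, 3)
store.remove(("node", "file-7"), 4)

before = store.snapshot(2)
after = store.snapshot(4)
```

Because the validity intervals are half-open, the old and the new instance of an updated attribute are never valid at the same time point, so a snapshot contains at most one value per key.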

4.3 Representing Team Collaboration Networks in OWL

The Web Ontology Language OWL (cf. Sect. 3.4.3) defines a constrained vocabulary of classes, properties, and instances that can be used to describe Team Collaboration Networks. This section motivates OWL as an eligible TCN representation format and exemplifies the serialization of networks in the form of language-compliant ‘subject – predicate – object’ statements.

4.3.1 Motivation

A mapping between Team Collaboration Networks and OWL establishes a computer-processable description of actors, resources, and relationships in distributed collaboration activities. Choosing OWL as the target representation for TCNs can be justified by several reasons:

Resource-Orientation: The purpose of Team Collaboration Networks is to make statements about the resources in a collaboration process. OWL is a vocabulary extension to the Resource Description Framework and is intended to define ontological statements about resources by means of class and property assertions. The underlying URI addressing scheme fits naturally into the application context of describing online, Web-based team collaboration.

Ontological Extensions: OWL ontologies are serialized as RDF graphs consisting of RDF triples. The triples present a universal linguistic construct to express arbitrary TCN properties in a generic format. Hence, the semantics that can be expressed in a TCN are decoupled from an application-specific data schema. The extension and adaptation of TCN terminology is simplified, which presents an advantage in versatile application domains such as engineering design.

Description Logic Satisfiability: By limiting the vocabulary of TCN models to the OWL-DL subset, the mapping is restricted to classes, data ranges, and facts, which are analogues of concepts, concrete datatypes, and axioms in description logics (Horrocks and Patel-Schneider, 2004). Therefore, the well-understood computational properties of description logics also apply to Team Collaboration Networks, allowing the application of formal reasoning and decision procedures.

Tool Availability: OWL is widely recognized as a Semantic Web standard and supported by various tool implementations that provide efficient ontology handling and storage services. Sound, complete, and efficient reasoner modules are available to computationally infer consequences from TCN knowledge bases.

The effect of resource-orientation is that resources become the primary entity in the OWL-based TCN representation. Every type and every instance defined in a TCN is represented by a resource and identified by a URI. Decomposed into a set of RDF/OWL statements, a Team Collaboration Network $G := (T_v, T_e, T_a, V, E, A)$ is described by a set of ‘subject – predicate – object’ triples. Each triple asserts an atomic fact about a subject resource, expressing a particular role or relationship to an object resource. The set of triples of a Team Collaboration Network defines the knowledge base (KB) of a TCN. A knowledge base contains a set of terminological statements – called TBox – that define the conceptual network vocabulary, and a set of assertional statements – called ABox – that define the individual instances of nodes, relationships, and attributes (cf. Baader et al., 2003). Hence, statements defining the elements $T_v$, $T_e$, and $T_a$ are part of the TBox. Statements that specify the sets of instances ($V$, $E$, $A$) are part of the ABox.

4.3.2 Terminological Components

The terminological part of a TCN knowledge base (TBox) defines the existing types of nodes ($T_v$), relationships ($T_e$), and attributes ($T_a$). The following examples make use of the standard namespace aliases for RDF (‘rdf’), RDF Schema (‘rdfs’), OWL (‘owl’), and XML Schema (‘xsd’). The ‘tcn’ prefix is used as a shorthand alias for an example Team Collaboration Network namespace.

Node Types

Node types provide semantic value to the nodes of a Team Collaboration Network, representing concepts for information classification and actor roles on different levels of abstraction. The specification of the node type collection ($T_v$) is realized through the instantiation of OWL classes (‘owl:Class’), which are hierarchically ordered in subordinate/superordinate relationships. The root node type, and hence the superordinate of all other node types, is represented by the abstract class ‘tcn:Resource’, which is an implicit element in $T_v$. Subclass relationships are expressed by the ‘rdfs:subClassOf’ property. To give an example, the statements listed in Table 4.1 define two node types ‘tcn:Email’ and ‘tcn:Person’, as well as a special class of person identified as ‘tcn:TeamMember’.

Table 4.1. Statements declaring three node types Email, Person, and TeamMember.

Subject          Predicate        Object
tcn:Resource     rdf:type         owl:Class
tcn:Email        rdf:type         owl:Class
tcn:Person       rdf:type         owl:Class
tcn:TeamMember   rdf:type         owl:Class
tcn:Email        rdfs:subClassOf  tcn:Resource
tcn:Person       rdfs:subClassOf  tcn:Resource
tcn:TeamMember   rdfs:subClassOf  tcn:Person
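The subordinate/superordinate ordering declared in Table 4.1 can be made tangible with a minimal Python sketch. It stores the table rows as plain tuples and walks the ‘rdfs:subClassOf’ statements to collect all direct and indirect superclasses of a node type. The function name `superclasses` and the tuple encoding are assumptions chosen for illustration; a real reasoner would derive the same hierarchy from the OWL semantics.

```python
# Sketch: resolve the superordinate node types implied by the
# 'rdfs:subClassOf' statements of Table 4.1 (tuple encoding is assumed).

TRIPLES = [
    ("tcn:Resource", "rdf:type", "owl:Class"),
    ("tcn:Email", "rdf:type", "owl:Class"),
    ("tcn:Person", "rdf:type", "owl:Class"),
    ("tcn:TeamMember", "rdf:type", "owl:Class"),
    ("tcn:Email", "rdfs:subClassOf", "tcn:Resource"),
    ("tcn:Person", "rdfs:subClassOf", "tcn:Resource"),
    ("tcn:TeamMember", "rdfs:subClassOf", "tcn:Person"),
]


def superclasses(node_type, triples):
    """Return all direct and indirect superclasses of node_type."""
    found = set()
    pending = [node_type]
    while pending:
        current = pending.pop()
        for subject, predicate, obj in triples:
            is_parent = subject == current and predicate == "rdfs:subClassOf"
            if is_parent and obj not in found:
                found.add(obj)
                pending.append(obj)
    return found


print(sorted(superclasses("tcn:TeamMember", TRIPLES)))
# prints ['tcn:Person', 'tcn:Resource']
```

Because ‘tcn:TeamMember’ is a subclass of ‘tcn:Person’, every team member node is implicitly also a person and a resource.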

Relationship Types

The collection of relationship types ($T_e$) describes the classes of associations that can be defined between any two nodes of a Team Collaboration Network. One example is a ‘hasSent’ relationship that can exist between a person and an email node (Table 4.2). The ‘tcn:Relation’ concept serves as a parent class for all relationship types and is a specialization of the object property class defined in OWL. The properties ‘rdfs:domain’ and ‘rdfs:range’ specify the types of the source and target nodes. In this case, the source node of the ‘hasSent’ relationship is defined to be of type ‘tcn:Person’ and the target node is of type ‘tcn:Email’.

Table 4.2. Statements declaring a relationship type ‘hasSent’.

Subject       Predicate        Object
tcn:Relation  rdf:type         owl:Class
tcn:Relation  rdfs:subClassOf  owl:ObjectProperty
tcn:hasSent   rdf:type         tcn:Relation
tcn:hasSent   rdfs:domain      tcn:Person
tcn:hasSent   rdfs:range       tcn:Email
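As an illustration of how the domain and range declarations of Table 4.2 relate source and target nodes, the following Python sketch checks a relationship instance against the declared types. Strictly speaking, RDFS domain and range statements license the inference of types rather than constrain instances, so the closed-world check below is a convenience assumed for this example. The instances ‘ex:Paul’ and ‘ex:email’ and all function names are hypothetical.

```python
# Sketch: check a relationship instance against the 'rdfs:domain' and
# 'rdfs:range' declarations of its relationship type (cf. Table 4.2).
# This closed-world check is an assumption; OWL itself would infer types.

TBOX = {
    ("tcn:hasSent", "rdfs:domain", "tcn:Person"),
    ("tcn:hasSent", "rdfs:range", "tcn:Email"),
}

# Node types asserted for two hypothetical example resources.
NODE_TYPES = {
    "ex:Paul": {"tcn:Resource", "tcn:Person"},
    "ex:email": {"tcn:Resource", "tcn:Email"},
}


def declared(relation, predicate, tbox):
    """Return the types declared for a relation via the given predicate."""
    return {o for s, p, o in tbox if s == relation and p == predicate}


def conforms(source, relation, target, tbox, node_types):
    """Return True if source and target carry the declared node types."""
    source_types = node_types.get(source, set())
    target_types = node_types.get(target, set())
    domain_ok = declared(relation, "rdfs:domain", tbox) <= source_types
    range_ok = declared(relation, "rdfs:range", tbox) <= target_types
    return domain_ok and range_ok


print(conforms("ex:Paul", "tcn:hasSent", "ex:email", TBOX, NODE_TYPES))
# prints True
```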

Attribute Types

The set of attribute types ($T_a$) describes the classes of literal attribute values that can be assigned to the nodes and edges of a Team Collaboration Network. To give an example, we consider the attribute type ‘mailbox’, which is used to assign email addresses to persons (Table 4.3). Attribute types are defined as instances of the class ‘tcn:Attribute’, a subclass of ‘owl:DatatypeProperty’. Datatype properties link individuals to data values. The value range of an attribute type is specified by the ‘rdfs:range’ property.

Table 4.3. Statements declaring an attribute type ‘mailbox’.

Subject        Predicate        Object
tcn:Attribute  rdf:type         owl:Class
tcn:Attribute  rdfs:subClassOf  owl:DatatypeProperty
tcn:mailbox    rdf:type         tcn:Attribute
tcn:mailbox    rdfs:domain      tcn:Person
tcn:mailbox    rdfs:range       xsd:string

Using the graphical notation introduced in Sect. 3.4.4, the type declarations of the previous examples (Tables 4.1 – 4.3) are summarized in Fig. 4.3.

[Figure 4.3: diagram in the namespace http://hpi-web.de/ns/tcn/ showing the node types Resource, Person, TeamMember, and Email, the relation hasSent between Person and Email, and the attribute mailbox with range xsd:string.]

Figure 4.3. Graphical notation of node, relationship, and attribute types that define the terminological components (TBox) of a Team Collaboration Network.

4.3.3 Assertion Components

The second category of OWL statements defines the assertional components in a TCN knowledge base (ABox) and completes the ontological specification of a Team Collaboration Network. The individual nodes ($V$), relationships ($E$), and attributes ($A$) represent instantiations of the terminological concepts in the TBox. In the following examples, the ‘ex’ prefix is used to denote a fictitious namespace for collaboration resources.
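The division of a knowledge base into terminological and assertional statements can be sketched in a few lines of Python. The sketch assumes that a triple is terminological if it uses one of a few schema predicates or declares a type via ‘owl:Class’, ‘tcn:Relation’, or ‘tcn:Attribute’. This classification heuristic, the helper names, and the two instance resources are assumptions made for illustration only.

```python
# Sketch: partition (subject, predicate, object) triples into TBox and ABox.
# The classification heuristic below is an assumption for illustration.

SCHEMA_PREDICATES = {"rdfs:subClassOf", "rdfs:domain", "rdfs:range"}
TYPE_DECLARING_OBJECTS = {"owl:Class", "tcn:Relation", "tcn:Attribute"}


def is_terminological(triple):
    """Return True if the triple defines vocabulary rather than an instance."""
    _, predicate, obj = triple
    if predicate in SCHEMA_PREDICATES:
        return True
    # Statements such as 'x rdf:type owl:Class' declare a type.
    return predicate == "rdf:type" and obj in TYPE_DECLARING_OBJECTS


def split_knowledge_base(triples):
    """Split triples into a (tbox, abox) pair of sets."""
    tbox = {t for t in triples if is_terminological(t)}
    abox = set(triples) - tbox
    return tbox, abox


kb = {
    ("tcn:Email", "rdf:type", "owl:Class"),
    ("tcn:Email", "rdfs:subClassOf", "tcn:Resource"),
    ("tcn:hasSent", "rdf:type", "tcn:Relation"),
    ("tcn:hasSent", "rdfs:domain", "tcn:Person"),
    ("ex:email", "rdf:type", "tcn:Email"),    # hypothetical instance
    ("ex:Paul", "rdf:type", "tcn:Person"),    # hypothetical instance
}
tbox, abox = split_knowledge_base(kb)
print(len(tbox), len(abox))
# prints 4 2
```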

Node Instances

Every node $v := \langle id, types, t_{start}, t_{end} \rangle \in V$ in a Team Collaboration Network is a proxy for a distinct resource that is part of a collaboration process. Nodes provide semantic meta-information about resources by determining a set of associated node types $types \subseteq T_v$. In an OWL-based representation, collaboration resources are identified by a URI, so that a surjective mapping exists between the set of node instances (node labels) and the set of resource URIs. The surjection implies that two different node instances with distinct labels $id_1$ and $id_2$ can map to the same URI. However, in this case the validity intervals of the two nodes must not overlap in order to prevent ambiguous meta-information about a resource at a given point in time. A node is instantiated by stating that a resource is of the predefined type ‘tcn:Resource’. For every node type $t \in types$, a further ‘rdf:type’ association between the resource and $t$ is inserted into the knowledge base. Table 4.4 gives an example: a fictitious resource ‘ex:email’ is associated with the node type ‘tcn:Email’ (a subclass of ‘tcn:Resource’). Note that the URI of the email resource does not necessarily need to point to an actual representation of the message. However, it can, for example, point to the resource of a Web-based email client or archive, through which the content of the email is accessible. The network-internal unique label of a node is assigned through the ‘tcn:instanceId’ property.

Table 4.4. Statements declaring a network node for a resource identified by ‘ex:email’.

Subject   Predicate       Object
ex:email  rdf:type        tcn:Resource
ex:email  rdf:type        tcn:Email
ex:email  tcn:instanceId  “3”
ex:email  tcn:validFrom   “2009-06-03T11:31:22”
ex:email  tcn:validTo     “INF”

The validity interval of a node is asserted by two properties ‘tcn:validFrom’ and ‘tcn:validTo’ to define $t_{start}$ and $t_{end}$, respectively. The time values are encoded in a standardized date string format. In case of $t_{end} = \infty$, the special string ‘INF’ is used to indicate that a node has not been removed from the network. Whether a node is eventually invalidated at all depends largely on the nature of the resource it represents. Archived emails, for example, are persistent and usually ‘stay’ in the information space once they have been sent or received. Other collaboration resources are more volatile: Web resources or shared files in public team folders tend to be replaced or removed more frequently.
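To illustrate how the validity interval and the special value ‘INF’ might be evaluated, the following Python sketch tests whether a node is valid at a given moment. It assumes ISO 8601 date strings as used in the example tables and a half-open interval (the start is included, the end is excluded). Both the interval convention and the function names are assumptions, not part of the TCN definition.

```python
from datetime import datetime

# Sketch: decide whether a node is valid at a given point in time.
# Assumptions: ISO 8601 strings, half-open intervals [valid_from, valid_to).


def parse_bound(value):
    """Parse a validity bound; 'INF' maps to the latest representable time."""
    if value == "INF":
        return datetime.max
    return datetime.fromisoformat(value)


def is_valid_at(valid_from, valid_to, moment):
    """Return True if valid_from <= moment < valid_to."""
    return parse_bound(valid_from) <= moment < parse_bound(valid_to)


# A node that was added on 3 June 2009 and has not been removed since.
print(is_valid_at("2009-06-03T11:31:22", "INF", datetime(2010, 1, 1)))
# prints True
```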

Relationship Instances

A relationship instance of type $P$ between two resources $S$ and $O$ is expressed in RDF/OWL through a single statement $(S, P, O)$. For example, to state that the resource ‘ex:wikipage’ has been created by a person represented by the resource ‘ex:Paul’, one can insert the triple ‘ex:wikipage – tcn:createdBy – ex:Paul’ into the knowledge base. However, a TCN relationship $e := \langle s, r, o, t_{start}, t_{end} \rangle \in E$ requires two additional values $t_{start}$ and $t_{end}$ to be associated with the statement. To support statements about statements, RDF provides a mechanism called ‘reification’ (W3C, 2004b). Reification is a technique that can be used to represent relations with arity greater than two (Andronikos et al., 2009). The principle behind reification is that statements themselves become a resource. A statement is decomposed (‘reified’) into a set of four cohering replacement triples, the so-called ‘reification quad’. A reified statement becomes itself a resource and is identified by a dedicated URI. This resource is the subject of the four triples. The first triple identifies the new resource to be of type ‘rdf:Statement’. The three other triples express the original statement via the ‘rdf:subject’, ‘rdf:predicate’, and ‘rdf:object’ properties. The same subject URI can then be used to specify additional statement properties such as temporal dimensions. Table 4.5 shows a reified statement that is identified by the URI ‘tcn:relation_1’. The first four statements represent the reification quad, which asserts that a resource ‘ex:wikipage’ holds a ‘tcn:createdBy’ relationship to the resource ‘ex:Paul’. The example implies that the source, predicate, and object resources are defined as network nodes and relationship types. To specify the validity interval of the relationship, two additional properties ‘tcn:validFrom’ and ‘tcn:validTo’ have been appended to the knowledge base.

Table 4.5. A reified statement to define the validity interval of a ‘tcn:createdBy’ relationship between two resources ‘ex:wikipage’ and ‘ex:Paul’.

Subject         Predicate      Object
tcn:relation_1  rdf:type       rdf:Statement
tcn:relation_1  rdf:subject    ex:wikipage
tcn:relation_1  rdf:predicate  tcn:createdBy
tcn:relation_1  rdf:object     ex:Paul
tcn:relation_1  tcn:validFrom  “2009-06-01T09:15:59”
tcn:relation_1  tcn:validTo    “INF”

Attribute Instances

The specification of TCN attribute instances in OWL is analogous to that of relationships. The last example shows the specification of an attribute that is identified by the URI ‘tcn:attribute_1’ and which is of type ‘tcn:mailbox’ (Table 4.6). Like relationships, attributes are expressed in the form of reified statements in order to support the annotation of temporal information. However, attributes are datatype properties in OWL, meaning that the object of the statement is a literal value rather than another node instance. In this case, the literal “[email protected]” is assigned to a resource identified as ‘ex:Paul’. Again, the example assumes that ‘ex:Paul’ is the identifier of a network node and that ‘tcn:mailbox’ represents an attribute type defined in the TBox.

Table 4.6. A reified statement to define the validity interval of a ‘tcn:mailbox’ attribute.

Subject          Predicate      Object
tcn:attribute_1  rdf:type       rdf:Statement
tcn:attribute_1  rdf:subject    ex:Paul
tcn:attribute_1  rdf:predicate  tcn:mailbox
tcn:attribute_1  rdf:object     “[email protected]”
tcn:attribute_1  tcn:validFrom  “2009-06-01T08:00:00”
tcn:attribute_1  tcn:validTo    “INF”
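The construction of the reified statements in Tables 4.5 and 4.6 follows one pattern, which the Python sketch below captures under a few assumptions: triples are plain tuples, the statement URI is written with an underscore (‘tcn:relation_1’), and the helper names `reify` and `original_statement` are hypothetical.

```python
# Sketch: reify a statement into the four-triple 'reification quad' and
# attach a validity interval to the resulting statement resource.
# Tuple encoding, URI spelling, and helper names are assumptions.


def reify(statement_uri, subject, predicate, obj, valid_from, valid_to="INF"):
    """Return the triples describing one temporally annotated statement."""
    return [
        (statement_uri, "rdf:type", "rdf:Statement"),
        (statement_uri, "rdf:subject", subject),
        (statement_uri, "rdf:predicate", predicate),
        (statement_uri, "rdf:object", obj),
        (statement_uri, "tcn:validFrom", valid_from),
        (statement_uri, "tcn:validTo", valid_to),
    ]


def original_statement(triples, statement_uri):
    """Recover the (subject, predicate, object) of a reified statement."""
    parts = {p: o for s, p, o in triples if s == statement_uri}
    return parts["rdf:subject"], parts["rdf:predicate"], parts["rdf:object"]


relation = reify("tcn:relation_1", "ex:wikipage", "tcn:createdBy", "ex:Paul",
                 "2009-06-01T09:15:59")
print(original_statement(relation, "tcn:relation_1"))
# prints ('ex:wikipage', 'tcn:createdBy', 'ex:Paul')
```

The same helper serves attributes if a literal value is passed as the object, since relationships and attributes differ only in the kind of object they carry.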

4.4 Chapter Summary

This chapter has introduced Team Collaboration Networks, a flexible data structure to describe the classes, occurrences, and relationships of shared information resources and actors during a collaboration process. It has further demonstrated how TCNs can be represented as a set of RDF/OWL triples. The next chapter deals with the organization of these triples in a system that facilitates the concurrent management of multiple TCN instances and the reuse of common terminology in different collaboration environments. It provides a blueprint for an adaptable and configurable TCN implementation, in particular for the software platform presented later in this work.

5 An Ontology System for Team Collaboration Networks

A mapping for Team Collaboration Networks has been demonstrated, in which temporal collaboration activities are represented as RDF graphs. Multiple such graphs need to be organized and maintained concurrently if the activities of different project teams are to be captured and evaluated in parallel. Furthermore, projects within the same organization often share collaboration infrastructure and use similar groupware. This creates demand for an integrated system of TCN instances, which is able to set up and handle multiple independent knowledge bases and to reuse common terminological concepts across different networks. In this chapter, I introduce a generic structure for such a system that facilitates flexibility and a low configuration footprint for individual network instances.

5.1 Foundations

A Team Collaboration Network system (TCN-S) manages a collection of individual Team Collaboration Networks (Uflacker, 2009). It contains a configurable set of ‘domain ontologies’, which provide a vocabulary for the description of common collaboration activities. A domain ontology encapsulates node, relationship, and attribute types that apply to a specific communication channel, groupware, or collaboration platform. For example, an ‘email’ domain ontology would specify a vocabulary to describe the structure of email-based conversations. The types defined in a domain ontology can be dynamically mapped into the graphs of Team Collaboration Networks and become available to describe the individual instances of these concepts in a concrete project scenario. A TCN-S further consists of a set of inference rules, which define logic to derive implicit facts from the knowledge bases of Team Collaboration Networks. Inference rules represent a formal specification of universal if – then dependencies among domain-specific concepts. For example, if the value of an attribute ‘from’ of a node with type ‘Email’ matches the value of another node’s ‘mailbox’ attribute, then a ‘hasSent’ relationship can be inferred between the latter node and the email node.

Definition (5.1): A Team Collaboration Network System is a 4-tuple TCN-S := (graphs, domains, rules, map), where

• graphs is a set of Team Collaboration Networks $\{TCN_1, \ldots, TCN_i\}$.
• domains is a set of domain ontologies $\{Dom_1, \ldots, Dom_j\}$. Each ontology describes node, relationship, and attribute types that apply to a specific domain of collaboration channels.
• rules is a set of inference rules $\{r_1, \ldots, r_k\}$.
• map is a set of tuples $\{m_1, \ldots, m_l\}$ with $m_{1..l} \in (domains \times graphs) \cup (rules \times graphs)$. Each tuple $(a, b)$ introduces domain ontology or rule $a$ to network $b$.
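Definition 5.1 can be read as a small data structure, as the following Python sketch suggests. It models the four components as sets of names only (graph contents are omitted) and enforces that every tuple in map pairs a known domain ontology or rule with a known network. The class name `TCNSystem`, the field name `mapping` (chosen to avoid shadowing Python's built-in `map`), and the method names are assumptions for illustration.

```python
from dataclasses import dataclass, field

# Sketch of Definition 5.1: a TCN-S as a 4-tuple (graphs, domains, rules, map).
# Only names are stored; class, field, and method names are hypothetical.


@dataclass
class TCNSystem:
    graphs: set = field(default_factory=set)    # names of TCN instances
    domains: set = field(default_factory=set)   # names of domain ontologies
    rules: set = field(default_factory=set)     # names of inference rules
    mapping: set = field(default_factory=set)   # 'map': (item, graph) tuples

    def introduce(self, item, graph):
        """Add a tuple (item, graph) to map after checking both components."""
        if graph not in self.graphs:
            raise ValueError(f"unknown network: {graph}")
        if item not in self.domains and item not in self.rules:
            raise ValueError(f"unknown domain ontology or rule: {item}")
        self.mapping.add((item, graph))

    def vocabulary_of(self, graph):
        """Return the domain ontologies and rules introduced to a network."""
        return {item for item, target in self.mapping if target == graph}


system = TCNSystem(graphs={"tcn1"}, domains={"domain1", "domain2"}, rules={"r1"})
system.introduce("domain2", "tcn1")
system.introduce("r1", "tcn1")
print(sorted(system.vocabulary_of("tcn1")))
# prints ['domain2', 'r1']
```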

In the remainder of this chapter, a template for the organization of the TCN-S components graphs, domains, rules, and map by means of discrete named ontology graphs is introduced. It is demonstrated how a collection of independent TCN instances can be assembled through the dynamic composition of such graphs. This suggests a data layout for TCN-S implementations that facilitates configurability and the re-use of common domain vocabulary across independent Team Collaboration Networks.

5.2 Named Graph Partitioning

Mapped to a set of OWL statements, a Team Collaboration Network constitutes an RDF graph that fully describes the configuration (concepts and instances) of a network. In the following realization of a TCN-S, the elements of a TCN are decomposed into a set of discrete ontological fragments, which separate general concepts from specific network properties. Each fragment is a ‘named graph’ (Carroll et al., 2005), an RDF graph which has a name assigned in the form of a URI reference, and which comprises a logically coherent set of either terminological or assertional facts. Accordingly, the graphs are categorized into different groups. Concept graphs contain terminological components, i.e., a vocabulary of classes, properties, and rules. Instance graphs define assertion components, i.e., the instantiations of classes and properties. On an orthogonal dimension, the graphs are further classified into system graphs with system-wide concepts or instances, and network graphs, which comprise facts specific to a particular TCN (Fig. 5.1).

The named graphs of a TCN system are hierarchically organized and composed through import operations. Imports merge the elements of one graph into another. More precisely, let imports(A,B) be defined as a transitive operation that introduces the ontological statements in B to those in A. The transitivity of imports implies that if A imports B, and B imports C, then A imports the statements of both B and C.
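The transitivity of imports amounts to a reachability computation over the import relation. The Python sketch below shows one way to compute it, assuming the direct imports are given as a dictionary from a graph name to the names it imports. The graph names A to D and the function name are placeholders chosen for this example.

```python
# Sketch: compute the transitive closure of graph imports.
# IMPORTS maps a graph name to the names it imports directly (assumed data).

IMPORTS = {
    "A": ["B"],
    "B": ["C", "D"],
    "C": [],
    "D": [],
}


def imported_graphs(graph, imports):
    """Return every graph whose statements are introduced into `graph`."""
    visible = set()
    pending = list(imports.get(graph, []))
    while pending:
        current = pending.pop()
        if current in visible:
            continue  # already handled; this also guards against cycles
        visible.add(current)
        pending.extend(imports.get(current, []))
    return visible


print(sorted(imported_graphs("A", IMPORTS)))
# prints ['B', 'C', 'D']
```

Graph A never names C or D directly, yet their statements become part of A because B imports them.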

[Figure 5.1: matrix of named graphs. Columns: Concept Graphs (TBox) and Instance Graphs (ABox). Rows: System Graphs (Domain Ontologies & Rule Graphs, TCN-S Concept Graph, TCN-S Instance Graph) and Network Graphs (for each of TCN 1 to TCN n: domain-specific Domain Ontologies & Rule Graphs, TCN Concept Graph, TCN Instance Graph). Arrows denote imports.]

Figure 5.1. Partitioning of a Team Collaboration Networks system into named graphs. Socialization of common domain concepts and isolation of independent TCN instances is achieved through the transitive import of ontological fragments.

A dynamic composition of individual TCN knowledge bases according to map can now be achieved with the provided graph layout (Fig. 5.1). A set of global Domain Ontologies and Rule Graphs provides concepts and terminology that are available to all TCN instances. The collection establishes a shared vocabulary and foundation for the description of common collaboration activities across all networks. The global components are complemented by network-specific instances of domain ontologies and rules that are mapped into the conceptual specification of single networks. A TCN-S Concept Graph provides general and recurring type definitions that are needed to describe a system of Team Collaboration Networks. Additional information about the concrete state and configuration of the system is defined in a TCN-S Instance Graph.

For each instance $TCN_1, \ldots, TCN_n \in graphs$, the system allots two network-specific graphs. A TCN Concept Graph imports the global type definitions and domain ontologies. Network-specific domain ontologies and rules can be imported into the conceptual domain of a network without affecting the individual state and type configuration of other networks. A TCN Instance Graph imports the terminological specifications and adds assertional facts to complete the specification of the network.

In the following sections, the different graphs are explored in more detail. Short examples demonstrate how a TCN is composed from general and network-specific ontological fragments maintained in the system. The examples make use of graph names and aliases listed in Table 5.1.

Table 5.1. Graph names and aliases used in the following examples.

Alias    Graph Name                       Description
tcns-c   http://hpi-web.de/ns/tcns-c/     TCN-S Concept Graph
tcns-i   http://hpi-web.de/ns/tcns-i/     TCN-S Instance Graph
domain1  http://hpi-web.de/ns/domains/1/  Domain & Rule Graph 1
domain2  http://hpi-web.de/ns/domains/2/  Domain & Rule Graph 2
tcn1-c   http://hpi-web.de/ns/tcn1-c      TCN1 Concept Graph
tcn1-i   http://hpi-web.de/ns/tcn1-i      TCN1 Instance Graph

5.2.1 Domain Ontologies & Rule Graphs

Domain ontologies define node, relationship, and attribute types that can be imported into one or more Team Collaboration Networks. Each ontology describes a vocabulary specific to a certain communication channel or project setting. For example, a domain ontology for email-based conversations would define a node type ‘Email’, a relationship type ‘hasSent’, and an attribute type ‘mailbox’, among others. Table 5.2 adopts this example and shows definitions of email-related node, relationship, and attribute types as a set of RDF/OWL triples in the domain1 ontology.

Table 5.2. Statements of a domain ontology graph ‘domain1’ to declare common node, relationship, and attribute types for email-based conversations.

Subject                          Predicate        Object
http://hpi-web.de/ns/domains/1/  rdf:type         owl:Ontology
domain1:Email                    rdfs:subClassOf  tcns-c:Resource
domain1:sender                   rdf:type         tcns-c:Relation
domain1:sender                   rdfs:domain      domain1:Email
domain1:sender                   rdfs:range       tcns-c:Person
domain1:sender                   owl:inverseOf    dom1:hasSent
domain1:mailbox                  rdf:type         tcns-c:Attribute
domain1:mailbox                  rdfs:domain      tcns-c:Person
domain1:mailbox                  rdfs:range       xsd:string
domain1:from                     rdf:type         tcns-c:Attribute
domain1:from                     rdfs:domain      domain1:Email
domain1:from                     rdfs:range       xsd:string

With the terminological definitions provided in domain ontologies, a shared vocabulary of common network building blocks is established in the system. Domain ontologies can extend and build upon each other, thereby creating rich semantic descriptions of concepts and relationships in a collaboration process. The description of domain-specific properties and conditions can be further supported by means of inference rules, provided in the form of antecedent ⇒ consequent statements. For example, given the above domain model, the existence of a ‘domain1:sender’ relationship between any two nodes ‘a’ and ‘b’ can be computationally inferred if the value of a’s ‘domain1:from’ attribute matches the value of b’s ‘domain1:mailbox’ attribute. This is expressed by the following rule statement:

domain1:from(a,x) ∧ domain1:mailbox(b,x) ⇒ domain1:sender(a,b)

It states that for any node a with an attribute of type domain1:from and value x and any node b with an attribute of type domain1:mailbox and the same value x, there exists a domain1:sender relationship between a and b. In this case, a would be a node of type ‘domain1:Email’ and b would be of type ‘tcns-c:Person’.

Rules can be formulated as a graph through an RDF-compliant representation format defined by the Semantic Web Rule Language (SWRL) (W3C, 2004g). The classes and properties defined in the SWRL namespace support the specification of rules in the form of implications (‘swrl:Imp’) resulting from an antecedent (‘swrl:body’) and consequent (‘swrl:head’). To give an example, Listing 5.1 shows the above rule expressed in SWRL. The interested reader is referred to the RDF/XML syntax specification (W3C, 2004e) for detailed information on how to map this rule definition to a set of RDF triples.

[Listing 5.1: SWRL rule in RDF/XML syntax, 25 lines; the listing body is not available in this copy.]

Listing 5.1. A SWRL inference rule in RDF/XML syntax.
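Independently of its SWRL serialization, the effect of the rule domain1:from(a,x) ∧ domain1:mailbox(b,x) ⇒ domain1:sender(a,b) can be sketched as a join over attribute values. The Python example below is such a sketch: it assumes that attributes are available as plain (not reified) triples, ignores validity intervals, and uses made-up resources and email addresses. It is not a substitute for a rule engine.

```python
# Sketch: apply the rule
#   domain1:from(a, x) AND domain1:mailbox(b, x) => domain1:sender(a, b)
# by joining on the shared value x. Plain triples and the sample
# resources and addresses below are assumptions for illustration.


def infer_senders(triples):
    """Return the domain1:sender triples implied by matching values."""
    # Index mailbox owners by address: x -> {b, ...}
    owners = {}
    for subject, predicate, value in triples:
        if predicate == "domain1:mailbox":
            owners.setdefault(value, set()).add(subject)

    inferred = set()
    for subject, predicate, value in triples:
        if predicate == "domain1:from":
            for owner in owners.get(value, ()):
                inferred.add((subject, "domain1:sender", owner))
    return inferred


facts = {
    ("ex:email", "domain1:from", "paul@example.org"),
    ("ex:Paul", "domain1:mailbox", "paul@example.org"),
    ("ex:Anna", "domain1:mailbox", "anna@example.org"),
}
print(infer_senders(facts))
# prints {('ex:email', 'domain1:sender', 'ex:Paul')}
```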

5.2.2 The TCN-S Concept Graph

The TCN-S Concept Graph is a central component in the hierarchy of named graphs. General terminology to define a collection of Team Collaboration Networks and their structure is defined here and imported into the knowledge bases of all TCN instances. It imports the global domain ontologies and rules in order to pass a shared vocabulary down to the Team Collaboration Networks. In the example shown in Table 5.3, one domain ontology is imported via the transitive ‘owl:imports’ property provided by OWL. The graph further defines a class ‘tcns-c:TCNGraph’ to represent TCN instances in the system. The classes ‘tcns-c:Resource’, ‘tcns-c:Relation’, and ‘tcns-c:Attribute’ are defined to provide global, abstract parent classes for all types that are specified in domain ontologies or individual networks. The graph further defines a standard node type ‘tcns-c:Person’ to denote resources representing human actors.

Table 5.3. A TCN-S Concept Graph with basic class definitions.

Subject                       Predicate        Object
http://hpi-web.de/ns/tcns-c/  rdf:type         owl:Ontology
http://hpi-web.de/ns/tcns-c/  owl:imports      http://hpi-web.de/ns/dom1/
tcns-c:TCNGraph               rdf:type         owl:Class
tcns-c:Resource               rdf:type         owl:Class
tcns-c:Relation               rdfs:subClassOf  owl:ObjectProperty
tcns-c:Attribute              rdfs:subClassOf  owl:DatatypeProperty
tcns-c:Person                 rdfs:subClassOf  tcns-c:Resource

5.2.3 The TCN-S Instance Graph

The TCN-S Instance Graph asserts the individual state of the system and the composition of TCN instances. The necessary terminological constructs are defined in and imported from the TCN-S Concept Graph. The kind of information that is specified in the TCN-S Instance Graph depends on the actual implementation and functionality of the system. It may address, e.g., general aspects of the system state, such as user accounts and access rights for individual networks. In the example below, the graph instantiates a Team Collaboration Network (identified as ‘http://hpi-web.de/ns/tcn1’) and assigns two graphs as the network-specific concept and instance models (Table 5.4). With that, the statements assert that the two named graphs identified by the given URIs represent the terminological and assertional components of the TCN instance.

Table 5.4. Statements of a TCN-S Instance Graph to assign two named graphs as concept and instance models to a TCN instance.

Subject                       Predicate             Object
http://hpi-web.de/ns/tcns-i/  rdf:type              owl:Ontology
http://hpi-web.de/ns/tcns-i/  owl:imports           http://hpi-web.de/ns/tcns-c/
http://hpi-web.de/ns/tcn1     rdf:type              tcns-c:TCNGraph
http://hpi-web.de/ns/tcn1     tcns-c:conceptGraph   http://hpi-web.de/ns/tcn1-c
http://hpi-web.de/ns/tcn1     tcns-c:instanceGraph  http://hpi-web.de/ns/tcn1-i

5.2.4 TCN Concept Graphs

The concept graph of an individual TCN instance represents the terminological components $T_v$, $T_e$, and $T_a$ that are available to describe the occurrences and relationships of people and information in a collaboration process. It forms an ontology that is composed of the system-wide building blocks imported from the TCN-S Concept Graph, individually imported domain and rule models, as well as internal, network-specific type definitions (cf. Fig. 5.1). The example shown in Table 5.5 imports the obligatory TCN-S Concept Graph to receive the global type definitions, as well as one additional domain ontology to further extend the network vocabulary. A private node type ‘tag’ is defined on network level for the individual annotation of resources.

Table 5.5. A TCN Concept Graph with import statements and a custom node type ‘tag’.

Subject                          Predicate        Object
http://hpi-web.de/ns/tcn1-c      rdf:type         owl:Ontology
http://hpi-web.de/ns/tcn1-c      owl:imports      http://hpi-web.de/ns/tcns-c/
http://hpi-web.de/ns/tcn1-c      owl:imports      http://hpi-web.de/ns/dom2/
http://hpi-web.de/ns/tcn1-c#tag  rdfs:subClassOf  tcns-c:Resource

5.2.5 TCN Instance Graphs

The instance graph of a Team Collaboration Network completes the knowledge base by specifying the sets of individual instances $V$, $E$, and $A$. Using the type vocabulary imported from the associated TCN concept graph, it defines nodes, relationships, and attributes to form a network of collaboration resources. Table 5.6 gives an example. The described TCN instance contains two nodes: a resource ‘ex:Paul’ of type ‘tcns-c:Person’, and a resource ‘ex:wikipage’, which is of type ‘domain2:WikiPage’. The graph further defines two attribute instances ‘domain2:wikiuser’ and ‘domain2:create_user’ for the two nodes, respectively.

Table 5.6. A TCN Instance Model with two node instances and attributes.

Subject                      Predicate          Object
http://hpi-web.de/ns/tcn1-i  rdf:type           owl:Ontology
http://hpi-web.de/ns/tcn1-i  owl:imports        http://hpi-web.de/ns/tcn1-c/
ex:Paul                      rdf:type           tcns-c:Person
ex:Paul                      tcns-c:instanceId  “1”
ex:Paul                      tcns-c:validFrom   “2009-06-01T08:00:00”
ex:Paul                      tcns-c:validTo     “INF”
tcn1-i:attribute_1           rdf:type           rdf:Statement
tcn1-i:attribute_1           rdf:subject        ex:Paul
tcn1-i:attribute_1           rdf:predicate      domain2:wikiuser
tcn1-i:attribute_1           rdf:object         “paul”
tcn1-i:attribute_1           tcn:validFrom      “2009-06-01T08:00:00”
tcn1-i:attribute_1           tcn:validTo        “INF”
ex:wikipage                  rdf:type           domain2:WikiPage
ex:wikipage                  tcns-c:validFrom   “2009-06-01T09:15:59”
ex:wikipage                  tcns-c:validTo     “2009-06-04T16:14:27”
ex:wikipage                  tcns-c:instanceId  “2”
tcn1-i:attribute_2           rdf:type           rdf:Statement
tcn1-i:attribute_2           rdf:subject        ex:wikipage
tcn1-i:attribute_2           rdf:predicate      domain2:create_user
tcn1-i:attribute_2           rdf:object         “paul”
tcn1-i:attribute_2           tcn:validFrom      “2009-06-01T09:15:59”
tcn1-i:attribute_2           tcn:validTo        “2009-06-04T16:14:27”

Though not explicitly asserted in the table, one can derive from the matching attribute values that the person identified as ‘ex:Paul’ is the creator of the Wiki page. In order to express this association in the network, a corresponding relationship must be added to the graph. This can be realized by adding a reified relationship to the instance graph or through the application of an inference rule, analogous to the one presented for sender relationships in Sect. 5.2.1. For example, assuming that such a rule is imported into the instance graph, a ‘domain2:createdBy’ relationship between the resource ‘ex:wikipage’ and the resource ‘ex:Paul’ can be computationally derived from the knowledge base. The resulting graph resembles the Team Collaboration Network that is depicted in Fig. 4.2a.

5.3 Chapter Summary

With the flexible composition of named graphs as outlined in the above examples, a system of independent Team Collaboration Networks can be maintained and organized. The outlined structure of a TCN-S presents a flexible approach to the concurrent monitoring, comparison, and analysis of different project teams by means of individually configured network instances. In order to transfer the concept of a TCN-S into a realistic application scenario, a prototype of a service-based Team Collaboration Network system has been implemented and is presented in the next chapter.

Part III

System Implementation

6 d.store: A Resource-oriented Team Collaboration Network System

This chapter presents the d.store platform (Uflacker and Zeier, 2008a), a Java-based implementation of a Team Collaboration Network system. The software provides a service interface to continuously capture heterogeneous online collaboration activities via distributed sensor clients. At the same time, the services can be leveraged for immediate real-time monitoring and the historical analysis of complex characteristics in the virtual collaboration behavior of project teams. This software implementation is a foundation for the temporal evaluation of collaborative team activities, in particular for the pilot application in engineering design processes that I present later in this work.

6.1 Platform Architecture Overview

The chapter begins with an overview of the software architecture of the d.store platform. It describes its central components and assigns the functionality of the system to these components. The technical structure and dynamic characteristics are outlined to create a template for the prototypical implementation of a resource-oriented Team Collaboration Network system.

The system design of the d.store platform embodies a client-server architecture. Communication between clients and server is regulated by a resource-oriented service interface that adheres to REST principles (Sect. 3.2). Clients retrieve or manipulate the state of the resources provided by the platform to access, create, and manipulate Team Collaboration Networks. Figure 6.1 gives an overview of the architecture and the different components, which are described in more detail below.

6.1.1 Client Applications

The system classifies three types of client applications according to their primary purpose. ‘Sensor Clients’ continuously scan heterogeneous data sources in the collaboration infrastructure for events and feed information to the server. An event, in this case, is the occurrence of a specific collaborative activity at a specific point in time, such as ‘A sends an email to B’ or ‘A updates document C’. The events are computationally captured by the clients, e.g., by processing server log files, archives, or data provided by collaboration tool APIs. ‘Navigation Clients’ support browsing and exploring Team Collaboration Networks. They provide a central point of access to resources, relationships, and attributes and give users a contextual understanding of the concurrent and distributed activities in the process. ‘Analysis Clients’ query existing TCN structures to process and quantify complex network properties during and after data collection. This allows the observation of specific aspects of the collaboration behavior and the evaluation of network properties to support different use cases (e.g., monitoring, notifications, or statistics).

[Figure 6.1: architecture diagram. Labels shown: Project Team; Collaboration Infrastructure; Collaboration Events; Sensor Clients, Navigation Clients, Analysis Clients; Resource Controllers / REST API with RDF, JSON, HTML, and GraphML representations; d.store with a Customized RDF/OWL Framework (TCN-S Concept Graph, TCN-S Instance Graph, Domain Ontologies & Rules, TCN Concept Graphs, TCN Instance Graphs/Models, Reasoner) and an RDBMS (n-tuple Store).]

Figure 6.1. The d.store platform architecture. A variable set of clients access and modify the state of Team Collaboration Networks via a RESTful server interface.
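To make the notion of an event more concrete, the following Python sketch models a collaboration event as a small record and shows how a sensor client might turn one line of a mail log into such a record. The log format, the class name `CollaborationEvent`, and the function `parse_mail_log_line` are assumptions made for illustration only; they are not part of the d.store API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CollaborationEvent:
    """One collaborative activity at one point in time (hypothetical record)."""
    actor: str           # who performed the activity, e.g. "A"
    action: str          # what happened, e.g. "sends_email_to"
    target: str          # the affected actor or resource, e.g. "B"
    timestamp: datetime  # when the activity occurred


def parse_mail_log_line(line: str) -> CollaborationEvent:
    """Parse an assumed log format: '<ISO timestamp> <sender> -> <receiver>'."""
    stamp, sender, arrow, receiver = line.split()
    if arrow != "->":
        raise ValueError(f"unexpected log line: {line!r}")
    return CollaborationEvent(sender, "sends_email_to", receiver,
                              datetime.fromisoformat(stamp))


# 'A sends an email to B', captured from a (hypothetical) log line.
event = parse_mail_log_line("2009-06-02T11:31:22 A -> B")
print(event)
```

A real sensor client would then submit such records to the server through the service interface; that step is omitted here.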

6.1.2 RDF/OWL Graph Component

Chapter 5 has demonstrated how a system of Team Collaboration Networks can be represented as a composition of modular RDF/OWL graphs. The d.store platform adopts this strategy and utilizes the ‘Jena Semantic Web Framework’ (henceforth ‘Jena’) to programmatically handle and organize collections of RDF triples. Jena is a Java-based implementation of standards and recommendations for the Semantic Web, including RDF, RDF Schema, and OWL. The framework provides a comprehensive API for storing and querying RDF/OWL-based graph models and for the integration of decision procedures and rule processors via external reasoner components (cf. Carroll et al., 2004; Wilkinson et al., 2003). Reasoners apply graph-to-graph transformations on TCN instance models to compute enriched knowledge bases that include logically derived statements. By restricting the asserted graphs to OWL-DL compliant statements, the class of computationally decidable facts in a TCN knowledge base comprises useful derivations such as type hierarchies (is ‘A’ a subclass/subproperty of ‘B’?), instance checking (is resource ‘A’ of type ‘B’?), or inverse relationships (for more details about decidability in OWL see Horrocks et al., 2003).

TCN instances keep track of the moment an element has been added to or removed from a Team Collaboration Network in order to maintain temporal aspects of the model. The necessary association of a validity interval with every network node, relationship, and attribute creates room for optimizing the storage of RDF data. To improve the query performance, the d.store platform continuously keeps a copy of the momentarily valid network state in memory (instances with t_end = ∞). The historical graph structures are journalized in a non-overwriting, relational database management system (RDBMS).
This enables the efficient answering of common read requests directly from data that is resident in main memory, while previous states of the networks are restored from the persistent storage on demand. In addition, the following preparations have been carried out to support the handling and practicable exploration of temporal characteristics in TCN data structures.

• Modification of the relational triple schema to allot two additional columns ‘ValidFrom’ and ‘ValidTo’ to every RDF statement.
• Modification of the Jena Semantic Web framework to consider validity intervals in the implementation of RDF triples and to support time-point queries into the relational data storage to recover historical TCN representations.
• Implementation of stored procedures to effect a non-overwriting and auto-journaling execution of graph operations on database level.

Detailed presentations of these modifications to the RDF/OWL subcomponent and the database layer are given in Sect. 6.3.

6.1.3 Service Interface

A resource-oriented service interface gives client applications access to the functionality that is needed to create, read, update, and delete elements of a Team Collaboration Network system. Requests are dispatched to URI-referenced resources, representing logical entities such as a Team Collaboration Network, a type, a set of nodes, or a single node instance. Each resource provides a uniform set of operations that can be used to retrieve or manipulate its state by means of exchanging resource representations between clients and server (REST, cf. Fielding, 2000). The underlying components for the implementation of this architectural style in d.store are the Hypertext Transfer Protocol (HTTP/1.1, Fielding et al., 1999b) and platform-independent representation formats such as the JavaScript Object Notation (JSON). The decision to use HTTP/1.1 as the application protocol for d.store is obvious: the protocol provides a basis for large-scale distributed hypermedia applications and is the foundation of the World Wide Web as it is experienced today. Hence, the integration of the platform into a wide range of Web-based collaboration landscapes is facilitated. Section 6.4 gives an overview of the different d.store resources and shows examples for the utilization of the interface. A complete reference of the platform API, the provided resources, and their representations is listed in Appendix C.

6.2 The d.store Concept Graph

The concept graph of a TCN-S defines general concepts and properties that are required to describe a system of Team Collaboration Networks (cf. Sect. 5.2.2). This section presents the TCN-S Concept Graph as it has been specified in the d.store platform, providing a terminological basis for other network components that are introduced later.

Figure 6.2 visualizes the ontological concepts and relationships in the familiar graphical notation (Sect. 3.4.4). The classes ‘TCNConceptGraph’, ‘TCNInstanceGraph’, and ‘DomainGraph’ represent the set of named graphs as introduced in Sect. 5.2. A Team Collaboration Network is represented by the ‘TCNGraph’ concept. Together with the three properties ‘conceptGraph’, ‘instanceGraph’, and ‘domainGraph’, the occurrence of a TCN and its individual composition of concept and instance graphs can be described. The platform also establishes a basis for a simple user account and authorization management by introducing the notion of a ‘TCNSUser’.

The ‘Resource’ class provides an abstract root class for all node types that are defined in a Team Collaboration Network. The platform distinguishes between ‘DomainResource’ types that are defined by domain ontologies, and ‘CustomResource’ types, which are specified on network level to extend the vocabulary of single networks. The concept graph also defines the global node type ‘Person’, which represents the set of individual human actors in the collaboration process. The numerical ‘instanceId’ property reflects the node identifier id that is assigned to every node instance (Def. 4.2, p. 68).

[Figure 6.2: ontology diagram in the namespace http://hpi-web.de/ns/dstore/, using the prefixes owl (http://www.w3.org/2002/07/owl#), xsd (http://www.w3.org/2001/XMLSchema#), and foaf (http://xmlns.com/foaf/0.1/). Classes shown: Graph, TCNGraph, TCNConceptGraph, TCNInstanceGraph, DomainGraph, TCNSUser, Resource, DomainResource, CustomResource, Person, Relation, DomainRelation, CustomRelation, Attribute, DomainAttribute, CustomAttribute, AliasAttribute, AliasReferenceAttribute. Properties shown: conceptGraph, instanceGraph, domainGraph, user, aliasAttribute (object properties), and instanceId (functional datatype property with range xsd:positiveInteger).]

Figure 6.2. The TCN-S Concept Graph of the d.store platform.

The same differentiation that applies to node types (domain vs. custom resources) also applies to relationship and attribute types. For the latter, two additional classes are introduced to further characterize attribute types. An ‘AliasAttribute’ is an attribute that defines a personal name alias for a resource, e.g., a user account name or email address assigned to a person. Complementing this, the ‘AliasReferenceAttribute’ is an attribute that gives a literal reference to an alias, e.g., the ‘to:’-field of an email or a user name that appears in the list of editors for a particular Wiki page. Matching AliasAttributes and AliasReferenceAttributes are specified by the ‘aliasAttribute’ property. Having this knowledge explicitly encoded in domain ontologies allows the platform to optimize the processing of network operations at run-time. For example, when adding a new email resource with an AliasReferenceAttribute ‘to’ whose value cannot be matched with that of a corresponding AliasAttribute ‘mailbox’ of any other resource, a second resource can automatically be created to represent the previously unknown receiver.
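The alias matching described above can be illustrated with a small Python sketch. It keeps nodes in a plain dictionary and, when an email is added whose ‘to’ value matches no known ‘mailbox’ alias, creates a second node for the unknown receiver. The dictionary layout, the function name `add_email`, the example addresses, and the choice to type the new node as ‘Person’ are all assumptions for illustration; the text does not specify how d.store implements this step.

```python
def add_email(nodes, next_id, recipient):
    """Add an email node; create a placeholder node for an unknown recipient.

    `nodes` maps node id -> attribute dict. 'mailbox' plays the role of an
    AliasAttribute and 'to' the role of an AliasReferenceAttribute.
    Returns the next free node id.
    """
    # Check first, so the new email itself cannot influence the lookup.
    known = any(n.get("mailbox") == recipient for n in nodes.values())
    nodes[next_id] = {"type": "Email", "to": recipient}
    next_id += 1
    if not known:
        # No resource carries a matching alias: represent the unknown receiver.
        nodes[next_id] = {"type": "Person", "mailbox": recipient}
        next_id += 1
    return next_id


# Hypothetical sample data: one known person with a mailbox alias.
nodes = {1: {"type": "Person", "mailbox": "paul@example.com"}}
next_id = add_email(nodes, 2, "paul@example.com")        # alias is known
next_id = add_email(nodes, next_id, "anna@example.com")  # alias is unknown
print(len(nodes))
```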

6.3 Processing Temporal Network Properties

This section describes the necessary actions to realize an efficient reproduction of temporal TCN properties in an RDF-based data model.

By definition, Team Collaboration Networks keep historical audit trail information about when a particular node, relationship, or attribute has been added to or removed from the network. With this temporal information preserved in the data structure, the evaluation and reconstruction of historical network properties via time-point queries becomes feasible. The consequence of this temporal annotation is that, when mapping Team Collaboration Networks to RDF-based triple sets, statements must be made about the point in time at which other statements have become valid or invalid. We refer to this meta-information as the ‘temporal provenance’ of a statement.

An RDF-compliant way to make a statement itself the subject of another statement is given through reification (Sect. 4.3.3). Temporal provenance data can be associated with a reified statement using its URI as the subject and time values as the object of validity properties. Introducing the notion of time to RDF graphs via reification is a common approach and has been the subject of recent studies (e.g., Andronikos et al., 2009; Gutierrez et al., 2007). However, reification is inefficient to implement, because it multiplies the number of triples that must be processed and requires the generation and management of a globally unique ID for each triple (Futrelle, 2006). Since all nodes, relationships, and attributes stored in a TCN instance model require the determination of a validity interval, a considerable overhead introduced through the handling of temporal provenance can be expected. Therefore, an alternative solution has been chosen to bypass the exploding number of reified triple statements and to optimize the execution of time-point queries. The following sections describe the necessary adaptations and show how common classes of problems can be resolved by using SQL as an abstraction.
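The cost of reification can be made tangible with a short Python sketch that lists the triples needed to attach a validity interval to a single statement by means of the standard RDF reification vocabulary. The property names `dstore:validFrom` and `dstore:validTo` follow the usage in this chapter; the statement identifier `ex:stmt1` and the function name are hypothetical.

```python
def reify_with_validity(stmt_id, s, p, o, valid_from, valid_to):
    """Return the triples that describe statement (s, p, o) and its validity."""
    return [
        # The reification quad from the RDF vocabulary.
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
        # Two further triples for the start and end of the validity interval.
        (stmt_id, "dstore:validFrom", valid_from),
        (stmt_id, "dstore:validTo", valid_to),
    ]


triples = reify_with_validity(
    "ex:stmt1", "ex:wikipage", "tcn:createdBy", "ex:Paul",
    "2009-06-01T09:15:59", "infinity")
print(len(triples))
```

One time-annotated relationship thus occupies six records, each of which also needs the statement identifier to be generated and managed.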

6.3.1 Storing Temporal RDF Statements

Collections of RDF statements are commonly stored in a relational database to provide a persistency layer for applications processing the knowledge base. A widely-used schema for this is the triple store. In its simplest form, the triple store schema is a 3-tuple, resembling the structure of the example statements presented in the previous chapters. Each RDF statement is stored as a single record in a three-column statement table. Each column holds the character string of a URI, identifying the subject, predicate, and object resources of an RDF triple. Several extensions such as normalized and denormalized triple stores have been proposed and implemented to improve the scalability of this approach (e.g., Wilkinson et al., 2003).

The comprehensive audit trail that is established by recording the temporal provenance of all elements in a TCN multiplies the number of RDF statements required to describe the state of each network. In a standard triple store, each node requires at least three triples to define its type and the two timestamps t_start and t_end. For the relationships and attributes of a TCN, the number of necessary triples is multiplied by a factor of six: the reification quad plus two triples for the start and end of the validity interval. Additional overhead is generated by the management of URIs associated with the reified statements. Relational schemas exist to optimize the storing of reified statements (e.g., Carroll et al., 2004), but can only partially diminish these costs.

The validity intervals of nodes, relationships, and attributes do not only affect the storage volume of a triple store. At run-time, the values determine whether an element matches a given time-point query, thus requiring the execution of self-join operations on the statement table (Listing 6.1). As an inherent characteristic of searching RDF data, join operations pose a performance challenge even for medium-sized datasets (Abadi et al., 2007). However, time-point queries are expected to be commonplace in the context of Team Collaboration Networks, as read requests either target the current state of the network or some historical snapshot.

    SELECT p1.Subject
    FROM tcn AS p1, tcn AS p2, tcn AS p3
    WHERE p1.Predicate = 'rdf:type'
      AND p1.Object = 'dstore:Resource'
      AND p1.Subject = p2.Subject
      AND p2.Predicate = 'dstore:validFrom'
      AND p2.Object <= '2009-06-01'
      AND p1.Subject = p3.Subject
      AND p3.Predicate = 'dstore:validTo'
      AND p3.Object >= '2009-06-01';

Listing 6.1. The standard triple store schema imposes self-join operations on time-point queries: Example of a SQL query that finds all node instance resources valid on June 1st, 2009.

While recent studies already present efficient optimizations for join operations in RDF datasets (Neumann and Weikum, 2009), the approach chosen in the implementation of d.store aims for a general avoidance of self-joins in the execution of time-point queries. For this purpose, the triple store schema used in d.store has been extended with two additional columns ‘ValidFrom’ and ‘ValidTo’, resulting in a 5-tuple schema for storing the validity interval of every node, relationship, or attribute directly on statement level. Having two time values explicitly stored with the subject, predicate, and object identifiers in one single record reduces the number of statements and alleviates the need for reification in a TCN knowledge base. The schema also eliminates join operations when filtering by validity and supports the implementation of efficient journaling mechanisms on database level, as the next sections will show. Table 6.1 depicts the modified schema and gives an example for three time-annotated RDF statements. Note that the semantic content of this table is virtually identical to that of Tables 4.4 – 4.6.

Table 6.1. The 5-tuple schema assigns a validity interval to every record of an RDF statement, reducing the overall number of statements and allowing for efficient time-point querying of TCN components.

Subject      Predicate      Object             ValidFrom            ValidTo
ex:email     rdf:type       tcn:Email          2009-06-02T11:31:22  infinity
ex:wikipage  tcn:createdBy  ex:Paul            2009-06-01T09:15:59  infinity
ex:Paul      tcn:mailbox    "[email protected]"  2009-06-01T08:00:00  infinity

A consequence of this 5-tuple schema extension is that the validity intervals of nodes, relationships, and attributes are no longer represented as discrete RDF statements. The schema deviates from RDF-conformable representations and entails the need to modify the underlying semantic framework. These modifications are the topic of the following sections.

6.3.2 Modifying the RDF/OWL Subsystem

To appropriately reflect and leverage the two time properties of each statement in the Jena subsystem, some modifications to the core data structures of the framework are necessary. The following gives a short overview of the relevant changes in the code.

Modifications to the class structures primarily affect the Jena types ‘Graph’ and ‘Triple’. Graphs represent collections of Triple instances and define operations on these collections. Each triple comprises three ‘Node’ properties for the subject, predicate, and object of a statement. Two additional timestamp properties have been added to a ‘Triple’ to specify the beginning and the end of a statement’s validity interval. The original interface of a ‘Graph’ supports modification (add and delete triples) and access (test if a triple is present or list all triples matching some pattern). For example, the method find(Node S, Node P, Node O) returns an iterator over all the triples of the graph which match the triple (S, P, O). To ‘match’ means to be equal to or for the S, P, or O node to be a wildcard that matches any node. This allows the graph to be queried for all the properties of some particular subject, all the predicates with some particular object, or all the triples in the graph (Carroll et al., 2004).

These graph operations were complemented with time-parameterized versions that take the validity of a triple into account. The find operation has been overloaded to accept an extra match condition of type ‘Date’, specifying a point in time at which a triple must be valid in order to match. Figure 6.3 shows an excerpt of the extended class properties.

[Figure 6.3: class diagram. Model (multiplicity 1) Graph with the operations find(Node, Node, Node) and find(Node, Node, Node, Date); Graph (multiplicity *) Triple with the properties Node subject, Node predicate, Node object, Timestamp validFrom, Timestamp validTo.]

Figure 6.3. Customized Jena types to address the n-tuple extension in a Triple.
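The idea of a time-parameterized find operation can be sketched independently of Jena. The following Python toy model is not the Jena API; the class and attribute names mirror Figure 6.3, but the code is an assumption-laden illustration. Here a call without a time point matches only statements that have not been invalidated, while a call with a time point matches statements whose validity interval contains that point.

```python
from datetime import datetime
from typing import NamedTuple, Optional

ANY = None               # wildcard that matches any node
FOREVER = datetime.max   # stand-in for an open-ended validity interval


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str
    valid_from: datetime
    valid_to: datetime


class Graph:
    def __init__(self, triples):
        self.triples = list(triples)

    def find(self, s, p, o, at: Optional[datetime] = None):
        """Return triples matching (s, p, o) that are valid at `at`.

        Without `at`, only currently valid triples (valid_to == FOREVER) match.
        """
        result = []
        for t in self.triples:
            pattern_ok = all(want is ANY or want == have
                             for want, have in zip((s, p, o), t[:3]))
            if at is None:
                time_ok = t.valid_to == FOREVER
            else:
                time_ok = t.valid_from <= at <= t.valid_to
            if pattern_ok and time_ok:
                result.append(t)
        return result


graph = Graph([
    Triple("ex:wikipage", "tcn:createdBy", "ex:Paul",
           datetime(2009, 6, 1, 9, 15, 59), datetime(2009, 6, 4, 16, 14, 27)),
    Triple("ex:email", "rdf:type", "tcn:Email",
           datetime(2009, 6, 2, 11, 31, 22), FOREVER),
])
print(graph.find(ANY, "tcn:createdBy", ANY, at=datetime(2009, 6, 2)))
print(graph.find(ANY, ANY, ANY))
```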

6.3.3 Modifying the Relational Storage Interface

Jena supports the dynamic creation of RDF models in a relational database system and interfaces the triples through a set of SQL commands. The database provides a persistent backend for the RDF models that are dynamically loaded and processed into the internal data structures of the framework. To accommodate the temporal validity properties of the modified Triple class, a number of preparations in the relational storage subsystem of the Jena framework are necessary. These include (1) the modification of the relational database schema, (2) the implementation of stored procedures to organize validity intervals on database level, and (3) query modifications to be able to select those statements that are valid at a given point in time.

Preparing the Triple Store Schema

The d.store graph models are organized in a denormalized triple store schema (Wilkinson et al., 2003): depending on the length of a URI or literal value, the data is either stored directly in the statement table or referenced via foreign key relationships from dedicated literal and resource tables. This hybrid approach presents a trade-off between a relatively slow, but storage-efficient normalized approach (requiring a 3-way join of statement, literals, and resources tables) and a plain triple store schema, in which all values are directly stored in the statement table.

To give room for the schema extension presented above, Jena’s statement tables need to be extended with two time values that define the start and end of a statement’s validity interval. Listing 6.2 presents the modified SQL command as it is dispatched by the framework to a PostgreSQL database to create the statement table for a TCN instance model. Two additional columns ‘ValidFrom’ and ‘ValidTo’ are introduced to specify the validity of a statement in the TCN knowledge base. A statement is considered to be valid at a given time t if t >= ValidFrom and t <= ValidTo. By definition, the special value ‘infinity’ is the largest value in the range of the PostgreSQL data type ‘abstime’.

    CREATE TABLE tcn1_instance (
        Subj      character varying(250) NOT NULL,
        Pred      character varying(250) NOT NULL,
        Obj       character varying(250) NOT NULL,
        -- start of the validity interval
        ValidFrom abstime,
        -- end of the validity interval; open-ended by default
        ValidTo   abstime DEFAULT 'infinity'
    );

Listing 6.2. Creating a statement table for a TCN instance graph.

Database-level Management of Time Properties

One of the underlying design decisions in the architecture of the d.store system was to decouple the platform as much as possible from the data management overhead introduced by the temporal TCN properties. The increase in complexity is caused by additional functionality, which is required to define and update the validity intervals of added, updated, or removed instances. In d.store, this functionality has been delegated to the relational database system, where stored procedures implement the temporal network properties that have been described in Sect. 4.2.

The recording of changes to data records directly in the relational table structure rather than in external system log files has its origin in the design of a no-overwrite storage manager (Stonebraker et al., 1990). This should be contrasted with a conventional approach where the update of a previous record in a table results in overwriting it with a new one. With a ‘no-overwrite’ strategy the old record remains in the database whenever an update occurs and serves the purpose normally performed by a write-ahead log. This creates the possibility to implement ‘time travel’: a feature that allows a user to perform a historical query and the database will automatically return information from the record valid at the correct time (ibid.).

This functionality has been integrated into d.store’s PostgreSQL database by registering a stored procedure, which is triggered before any insert, update, or delete operation is executed on the statement table of a TCN instance graph. Accordingly, the Jena framework has been customized to bind the trigger function ‘timetravel’ to created graph models (Listing 6.3).

    CREATE TRIGGER timetravel
        BEFORE INSERT OR UPDATE OR DELETE ON tcn1_instance
        FOR EACH ROW
        EXECUTE PROCEDURE
        timetravel(ValidFrom, ValidTo);

Listing 6.3. Binding a trigger function ’timetravel’ to the statement table of a TCN instance model.

Any update of a TCN instance graph results in the execution of corresponding INSERT, UPDATE, or DELETE commands dispatched by the Jena framework. The ‘timetravel’ procedure is triggered before any of these commands is committed to the rows of a statement table. The algorithm of the procedure is outlined below, assuming the execution on a row tuple (s, p, o, ValidFrom, ValidTo):

• Before INSERT, if ValidFrom is NULL then set ValidFrom to the current date. If ValidTo is NULL then set ValidTo to ‘infinity’. Insert the tuple.
• Before UPDATE, if ValidTo of the original tuple is ‘infinity’ then set ValidTo to the current date and insert a new tuple with the update values s′, p′, o′, with ValidFrom′ set to the current date and ValidTo′ set to ‘infinity’. Skip the update.
• Before DELETE, if ValidTo is ‘infinity’ then set ValidTo to the current date. Skip the deletion.
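The three rules above can be traced with a small in-memory model. The following Python sketch is not the stored procedure used by d.store; it is a toy table that applies the same no-overwrite rules, with `datetime.max` as an assumed stand-in for ‘infinity’ and with the current time passed in explicitly so that the behavior is reproducible.

```python
from datetime import datetime

INFINITY = datetime.max  # assumed stand-in for the database value 'infinity'


class NoOverwriteTable:
    """Toy statement table that journals changes instead of overwriting rows."""

    def __init__(self):
        self.rows = []  # each row: [s, p, o, valid_from, valid_to]

    def insert(self, s, p, o, now, valid_from=None, valid_to=None):
        # Missing interval bounds default to 'now' and 'infinity'.
        self.rows.append([s, p, o, valid_from or now, valid_to or INFINITY])

    def update(self, old, new, now):
        for row in self.rows:
            if tuple(row[:3]) == old and row[4] == INFINITY:
                row[4] = now                             # close the old interval
                self.rows.append([*new, now, INFINITY])  # journal the new version
                break                                    # the in-place update is skipped

    def delete(self, s, p, o, now):
        for row in self.rows:
            if tuple(row[:3]) == (s, p, o) and row[4] == INFINITY:
                row[4] = now  # invalidate the row, but keep it in the table


table = NoOverwriteTable()
table.insert("ex:wikipage", "tcn:createdBy", "ex:Paul", now=datetime(2009, 6, 1))
table.update(("ex:wikipage", "tcn:createdBy", "ex:Paul"),
             ("ex:wikipage", "tcn:createdBy", "ex:Anna"), now=datetime(2009, 6, 2))
table.delete("ex:wikipage", "tcn:createdBy", "ex:Anna", now=datetime(2009, 6, 4))
for row in table.rows:
    print(row)
```

After these three operations the table still holds both versions of the statement, each with a closed validity interval, so any earlier state can be reconstructed.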

Querying the Statement Tables

The previous sections have outlined the preparations of Jena’s data structure and the relational database backend to accept time-point queries and to manage the temporal properties of TCN statements. In a final step, the SQL commands that are dispatched by the framework to load valid network elements from the modified statement tables need to be aligned accordingly.

A differentiation can be made between graph model operations that query the network for elements that are valid at a given time point (via the newly-added methods in the graph interface) and those where a time point is omitted (i.e., via standard Jena calls). In the latter case, the statement table is queried for the latest and current description of the network, i.e., those records where ValidTo = ‘infinity’. Listing 6.4 gives an example for a query that returns all statements that have not been invalidated.

    SELECT S.Subj, S.Pred, S.Obj, S.ValidFrom, S.ValidTo
    FROM tcn1_instance S
    WHERE S.ValidTo = 'infinity';

Listing 6.4. Querying the currently valid statements from the statement table of a TCN instance model.

Historical time-point queries can be realized by parameterizing the SELECT statement with conditions for the fields ValidFrom and ValidTo. In other words, the Jena framework retrieves and processes only those statements from the knowledge base, whose validity intervals fall within a given time frame or date. Listing 6.5 shows how a date parameter that is passed to the find(S,P,O,Date) method of a graph determines the filtering of records according to the validity intervals specified in the ‘ValidFrom’ and ‘ValidTo’ fields.

    SELECT S.Subj, S.Pred, S.Obj, S.ValidFrom, S.ValidTo
    FROM tcn1_instance S
    WHERE S.ValidFrom <= '2009-06-01' AND S.ValidTo >= '2009-06-01';

Listing 6.5. Querying statements from a statement table that have been valid on June 1st, 2009.

As visible in these examples, the temporal filtering of TCN knowledge bases no longer requires the execution of self-join operations (cf. Sect. 6.3.1). Retrieving a valid snapshot of a Team Collaboration Network for a given point in time can be achieved in a single table scan.
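The effect of the 5-tuple schema can be tried out with Python’s built-in `sqlite3` module. This is a sketch under stated assumptions, not the d.store setup: SQLite replaces PostgreSQL, timestamps are stored as ISO 8601 strings (which compare correctly as text), the constant `INFINITY` stands in for the special value ‘infinity’, and the table name `statements` as well as the sample rows are hypothetical.

```python
import sqlite3

INFINITY = "9999-12-31T23:59:59"  # assumed stand-in for 'infinity'

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE statements (
        Subj TEXT NOT NULL, Pred TEXT NOT NULL, Obj TEXT NOT NULL,
        ValidFrom TEXT NOT NULL, ValidTo TEXT NOT NULL)""")
conn.executemany(
    "INSERT INTO statements VALUES (?, ?, ?, ?, ?)",
    [
        ("ex:Paul", "rdf:type", "dstore:Person",
         "2009-06-01T08:00:00", INFINITY),
        ("ex:wikipage", "tcn:createdBy", "ex:Paul",
         "2009-06-01T09:15:59", "2009-06-04T16:14:27"),
        ("ex:email", "rdf:type", "tcn:Email",
         "2009-06-02T11:31:22", INFINITY),
    ])


def snapshot(at):
    """Statements valid at ISO timestamp `at`: one table scan, no self-join."""
    return conn.execute(
        "SELECT Subj, Pred, Obj FROM statements "
        "WHERE ValidFrom <= ? AND ValidTo >= ?", (at, at)).fetchall()


def current():
    """Statements that have not been invalidated."""
    return conn.execute(
        "SELECT Subj, Pred, Obj FROM statements WHERE ValidTo = ?",
        (INFINITY,)).fetchall()


print(snapshot("2009-06-02T00:00:00"))
print(current())
```

Both queries touch each record once and compare only columns of that record, which is the property the schema extension is designed to provide.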

6.3.4 Advantages and Disadvantages of the Approach

With two extra time fields appended to the triple store schema, the presented 5-tuple approach breaks out of the standard RDF representation model. The temporal provenance of network nodes, relationships, or attributes is no longer expressed through explicit semantic facts within the RDF knowledge base. In other words, time is not a semantic element in the network representation. This makes logical reasoning on temporal attributes complicated and can be considered a disadvantage of this approach.

In return, the 5-tuple schema facilitates the time-based slicing of TCN instance models on database level. The join complexity of interval and time-point queries (e.g., “What was the state of the network on June 1st, 2009?”) is reduced compared to a conventional triple store. The necessary no-overwrite storage behavior and time travel functionality can be efficiently implemented by means of stored procedures. At the same time, the schema extension significantly reduces the need for statement reification and thus decreases the total number of statements stored in the tables. The integration of this approach into an existing RDF/OWL software component has been demonstrated. In the next section, it is shown how the non-graph-based determination of temporal network properties is realized in d.store via a URI-based filtering of network instances.

6.4 Implementing the Service Interface

This section introduces the resource-oriented service interface of the d.store platform. Short examples indicate how client applications can leverage the functionality of the system to query and manipulate Team Collaboration Networks. A complete reference of the d.store API can be found in Appendix C.

6.4.1 Platform Resources

The resources provided by the d.store platform can be generally classified into system resources, i.e., resources that represent basic aspects of the system state (e.g., the collection of all networks), and network resources, which represent one or more logical entities of a TCN instance. The latter could be, e.g., a network type, a collection of nodes, relationships, or attributes that satisfy certain characteristics, or an individual instance. Hence, the set of resources provided by the platform is not statically quantifiable, but depends on the system configuration and state of the networks at run-time. However, the collection of available resources can be defined by describing the syntax of valid URL paths, as presented in Listing 6.6.

    1 path        = '/' [ 'login' | 'logoff' | ( 'graphs' [ tcn ] ) ];
    2 tcn         = '/' tcn_id '/' ( date | 'now' ) '/' [ tcn_element ];
    3 tcn_element = ( nodes | node_types | relation_types |
    4                 attribute_types );
    5 nodes       = 'resources' [ node | filter ];
    6 node        = '/' node_id [ node_relations | node_attributes ];

Listing 6.6. Examples of valid d.store URL paths (syntax in extended Backus-Naur form).

Besides general system resources for session and graph management (which are not discussed in detail) the provided excerpt of the grammar also describes paths of network-specific resources, such as node collections and instances. Every instance of a Team Collaboration Network is identified by a unique label, usually a short identifier for the associated project or team name (‘tcn_id’, line 2). This is followed by a date identifier that asserts the validity of the represented network entity for the specified point in time. A special identifier ‘now’ can be used to retrieve the latest and current version of a network. This mechanism allows clients to select between up-to-date network representations or historical data and triggers the execution of corresponding time-point queries as sketched out in the previous section.

The nodes of a network are combined under the ‘resources’ keyword (line 5). A numerical identifier ‘node_id’ can be appended to identify a particular instance by its ‘instanceId’ property (Sect. 4.3.3). Alternatively, a list of node type identifiers (Sect. 4.3.2) may act as a filter on the set of nodes. Hence, the currently valid node with label ‘1’ of a TCN ‘tcn1’ is referenced by the path ‘/graphs/tcn1/now/resources/1’, whereas the set of all currently valid nodes of type ‘Email’ is referenced by ‘/graphs/tcn1/now/resources/Email’. The following examples explain the run-time behavior of the d.store platform in more detail and show how the resources can be leveraged to explore and manipulate Team Collaboration Networks.
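A client can assemble such paths mechanically. The following Python helper is a minimal sketch that covers only the node-related part of the grammar; the function name `node_path` and its parameters are hypothetical, and dates are written in ISO format (YYYY-MM-DD) as in the examples of this chapter.

```python
from datetime import date
from typing import Optional, Union


def node_path(tcn_id: str, when: Union[date, str] = "now",
              node_id: Optional[int] = None,
              node_type: Optional[str] = None) -> str:
    """Build the URL path of a node collection, a node, or a typed node set."""
    if node_id is not None and node_type is not None:
        raise ValueError("give either node_id or node_type, not both")
    # A date selects a historical snapshot; 'now' selects the current state.
    stamp = when.isoformat() if isinstance(when, date) else when
    path = f"/graphs/{tcn_id}/{stamp}/resources"
    if node_id is not None:
        path += f"/{node_id}"
    elif node_type is not None:
        path += f"/{node_type}"
    return path


print(node_path("tcn1"))
print(node_path("tcn1", node_id=1))
print(node_path("tcn1", date(2009, 6, 2), node_type="Email"))
```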

6.4.2 Exploring Team Collaboration Network Resources

The primary function of navigation and analysis clients is to read and process the state of Team Collaboration Networks. The state is provided by network-specific resources, which represent one or more entities (e.g., nodes) of a TCN. Client applications that seek to explore the state of those entities request representations of the resources from the server via HTTP/1.1 GET operations. The sections below give some examples and assume HTTP connectivity between a client and a d.store server running at dstore.hpi-web.de. Furthermore, the examples do not consider any user identification and authorization aspects.

Retrieving All Nodes in a Network

Clients can request a list of all nodes in a Team Collaboration Network to provide browsing functionality or to analyze the topology of a network. The corresponding node collection is identified by the ‘resources’ keyword in the URL path. The interaction between client and server shown in Listing 6.7 demonstrates the request and response, using sample data similar to the previous examples.

Client Request:

     1 GET /graphs/tcn1/now/resources HTTP/1.1
     2 Host: dstore.hpi-web.de
     3 Accept: application/json

Server Response:

     4 HTTP/1.1 200 OK
     5 Content-Type: application/json
     6
     7 {"totalCount": 2057,
     8  "resources": [
     9    {"id": 1,
    10     "url": "http://www.example.com/profiles/Paul",
    11     "label": "Paul",
    12     "validFrom": "2009-06-01 08:00:00" },
    13    {"id": 3,
    14     "url": "http://mail.example.com/archive/0001.html",
    15     "label": "Meeting",
    16     "validFrom": "2009-06-03 11:31:22" },
           ...
    17 ] }

Listing 6.7. Retrieving all nodes from the latest network representation.

Clients can make use of the ‘Accept’ field transmitted in the HTTP request header to negotiate the representation format that is returned from the server. The d.store platform supports different standard representation formats such as JSON or HTML to facilitate easy client integration and data accessibility. In the above example, the client has requested to receive a JSON representation of the list of all node instances in a TCN that are currently valid (‘/now/resources’, line 1).

If a historical view on a TCN is required, the ‘now’ keyword in the URL can be replaced by the date of interest. This is demonstrated in the following example (Listing 6.8). The result set contains two node instances, as indicated by the ‘totalCount’ attribute. The second instance is an example of a node that has been invalidated during the course of the collaboration process, caused for example by the deletion of this Wiki page.

Client Request:

    GET /graphs/tcn1/2009-06-02/resources HTTP/1.1
    Host: dstore.hpi-web.de
    Accept: application/json

Server Response:

    HTTP/1.1 200 OK
    Content-Type: application/json

    {"totalCount": 2,
     "resources": [
       {"id": 1,
        "url": "http://www.example.com/profiles/Paul",
        "label": "Paul",
        "validFrom": "2009-06-01 08:00:00"},
       {"id": 2,
        "url": "http://www.example.com/wiki/ToDoList",
        "label": "Wiki - ToDoList",
        "validFrom": "2009-06-01 09:15:59",
        "validTo": "2009-06-04 16:14:27"}
    ]}

Listing 6.8. Retrieving all nodes in the network as of June 2nd, 2009.
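The time-point semantics of ‘now’ and dated URLs can be mimicked on the client side. The following Python sketch is an illustration only and not part of d.store, which performs this filtering on the server. It assumes that a missing ‘validTo’ means the node is still valid and that validity intervals are half-open; the helper names parse_ts, valid_at, and snapshot are hypothetical, and the sample data is modelled on Listings 6.7 and 6.8.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # timestamp layout used in the sample listings


def parse_ts(text):
    """Parse a timestamp such as '2009-06-01 08:00:00'."""
    return datetime.strptime(text, TS_FORMAT)


def valid_at(node, when):
    """Return True if the node instance is valid at time `when`.

    Assumption: the interval is [validFrom, validTo) and a missing
    'validTo' means the node has not been invalidated yet.
    """
    start = parse_ts(node["validFrom"])
    end = parse_ts(node["validTo"]) if "validTo" in node else None
    return start <= when and (end is None or when < end)


def snapshot(nodes, when):
    """IDs of all nodes valid at `when`, in input order."""
    return [n["id"] for n in nodes if valid_at(n, when)]


# Sample data modelled on the listings above
nodes = [
    {"id": 1, "label": "Paul", "validFrom": "2009-06-01 08:00:00"},
    {"id": 2, "label": "Wiki - ToDoList",
     "validFrom": "2009-06-01 09:15:59", "validTo": "2009-06-04 16:14:27"},
    {"id": 3, "label": "Meeting", "validFrom": "2009-06-03 11:31:22"},
]
```

Calling snapshot with a time on June 2nd, 2009 selects nodes 1 and 2, which matches the result set of Listing 6.8. How the server resolves a date without a time of day is not specified in the text; this sketch leaves that choice to the caller.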

Retrieving Node Collections by Type

The list of nodes returned from the server can be filtered by node types, so that it contains only those resources that satisfy the type restrictions expressed in the URL. By appending one or more ‘+’-separated type identifiers, the result set is restricted to those nodes that exhibit all of the listed types. Listing 6.9 gives an example for retrieving all ‘Email’-typed nodes from a network. Here, the server responds with an HTML-formatted representation of the resource, as requested by the client in line 3. This stripped-down version of an HTML-formatted representation demonstrates how the exploration of TCN data is directly supported in the Web browser. With the links provided, users can easily navigate back and forth between related node instances and collaboration resources.

Client Request:

     1  GET /graphs/tcn1/now/resources/Email HTTP/1.1
     2  Host: dstore.hpi-web.de
     3  Accept: text/html

Server Response:

     4  HTTP/1.1 200 OK
     5  Content-Type: text/html
     6
     7
     8  d.store - Nodes: Email
     9
    10
    11
    12  ID 3:
    13  Meeting,
    14  June 3rd, 2009, 11:31:22
    15  ...
    16
    17
    18

Listing 6.9. Retrieving a list of all Email-typed nodes in the current network. HTML has been negotiated as the representation format.
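The URL conventions described above (time point, ‘resources’ keyword, ‘+’-separated type identifiers, and the ‘Accept’ header) can be collected in a small client helper. The Python sketch below is a hypothetical illustration; the function names are assumptions, and the request object is only constructed, not sent.

```python
import urllib.request
from urllib.parse import quote


def node_collection_path(graph, when="now", types=()):
    """Build the path of a node collection, optionally filtered by type.

    `when` is either the keyword 'now' or a date string such as
    '2009-06-04'; `types` are joined with '+' as described in the text.
    """
    path = "/graphs/{}/{}/resources".format(
        quote(graph, safe=""), quote(when, safe=""))
    if types:
        path += "/" + "+".join(quote(t, safe="") for t in types)
    return path


def build_get_request(host, path, accept="application/json"):
    """Create (but do not send) a GET request with an Accept header."""
    return urllib.request.Request(
        "http://{}{}".format(host, path), headers={"Accept": accept})


# Request corresponding to Listing 6.9
request = build_get_request(
    "dstore.hpi-web.de",
    node_collection_path("tcn1", types=["Email"]),
    accept="text/html",
)
```

Passing the request to urllib.request.urlopen would perform the actual call; that step is left out because it needs a reachable server.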

Retrieving a Node Instance

A node instance in d.store materializes what has been introduced in Sect. 3.3 as a descriptive resource. It describes meta-information about a collaboration resource, its semantic types, relationships, and associated attribute values. Descriptive resources are identified by the distinct node instance ID. In the following example, a client requests the state of a node with instance ID ‘3’ as it was valid on June 4th, 2009 (Listing 6.10).

Client Request:

    GET /graphs/tcn1/2009-06-04/resources/3 HTTP/1.1
    Host: dstore.hpi-web.de
    Accept: application/json

Server Response:

    HTTP/1.1 200 OK
    Content-Type: application/json

    {"id": 3,
     "url": "http://mail.example.com/archive/0001.html",
     "label": "Meeting",
     "validFrom": "2009-06-03 11:31:22",
     "types": [
       {"name": "Email", "type": "domain",
        "validFrom": "2009-06-03 11:31:22"}],
     "attributes": [
       {"name": "from", "type": "domain", "value": "[email protected]",
        "validFrom": "2009-06-03 11:31:22"},
       {"name": "to", "type": "domain", "value": "[email protected]",
        "validFrom": "2009-06-03 11:31:22"}],
     "relations": [
       {"name": "sender", "type": "domain", "target": {"id": 1,
        "url": "http://example.org/profiles/Paul"}, "inferred": true},
       {"name": "recipient", "type": "domain", "target": {"id": 4,
        "url": "http://example.org/profiles/John"}, "inferred": true},
       {"name": "hyperlink", "type": "domain", "target": {"id": 2,
        "url": "http://www.example.com/wiki/ToDoList"},
        "validFrom": "2009-06-03 11:31:22"}]
    }

Listing 6.10. Retrieving a JSON-formatted node instance.

The representation returned from the server describes basic node information along with associated types, attributes, and relationships. In the example above, the described collaboration resource is of the requested type ‘Email’. Two attributes list the mailboxes of the sender and a recipient. The node also has three relationships to other resources of the network. The relationships to sender and recipient are marked as inferred, e.g., by an inference rule checking for matching ‘mailbox’ attributes of persons in the TCN. Note that this instance closely resembles node 3 in the example network shown in Fig. 4.1.
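An analysis client typically needs to tell explicitly stored relationships apart from inferred ones. The Python sketch below shows one way to do so on an abbreviated copy of the representation in Listing 6.10. It is an illustration under assumptions: the function name split_relations is hypothetical, and a relation without an ‘inferred’ flag is treated as explicit.

```python
import json

# Abbreviated copy of the representation in Listing 6.10
NODE_JSON = """
{"id": 3,
 "label": "Meeting",
 "types": [{"name": "Email", "type": "domain"}],
 "relations": [
   {"name": "sender", "target": {"id": 1}, "inferred": true},
   {"name": "recipient", "target": {"id": 4}, "inferred": true},
   {"name": "hyperlink", "target": {"id": 2},
    "validFrom": "2009-06-03 11:31:22"}]}
"""


def split_relations(node):
    """Separate explicitly stored relations from inferred ones.

    Returns two lists of (relation name, target node ID) pairs.
    Assumption: a relation without an 'inferred' flag is explicit.
    """
    explicit, inferred = [], []
    for rel in node.get("relations", []):
        pair = (rel["name"], rel["target"]["id"])
        (inferred if rel.get("inferred", False) else explicit).append(pair)
    return explicit, inferred


node = json.loads(NODE_JSON)
explicit, inferred = split_relations(node)
```

For the sample node, only the ‘hyperlink’ relationship to node 2 ends up in the explicit list, while ‘sender’ and ‘recipient’ are reported as inferred.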

Complex Queries

To provide more powerful filtering mechanisms to the clients, the d.store platform supports URL-encoded SPARQL queries (W3C, 2008) to be executed on the TCN knowledge bases. The conditions on the nodes are formulated in the URL, while the rest of the SPARQL query, such as prefixes and select clauses, is completed by the platform before being dispatched to the RDF layer. This simplifies the formulation of queries for the clients and keeps the request short. For example, the following three conditions produce the list of all Wiki pages that have been referenced in at least one email (Listing 6.11).

    ?resource rdf:type ex:WikiPage .
    ?resource ex:linkedFrom ?x .
    ?x rdf:type ex:Email

Listing 6.11. SPARQL statements to filter the list of node instances.

By appending this query fragment as a URL-encoded parameter to the URL of a node collection, the result list is filtered down to resources that satisfy the constraints of the ‘?resource’ variable (Listing 6.12). This example also illustrates the combination of SQL-based time point queries and the graph-based filtering on non-temporal network properties. With the time point restriction provided in the URL, the SPARQL query operates on a historical representation of the network, which is recovered from the relational storage layer as described in Sect. 6.3. In plain language, this query can be expressed as “Give me a list of all Wiki pages that were referenced in at least one email on June 4th, 2009.”

Client Request:

    GET /graphs/tcn1/2009-06-04/resources/?query=%3Fresource+rdf%3Atype+ex%3AWikiPage.%3Fresource+ex%3AlinkedFrom+%3Fx.%3Fx+rdf%3Atype+ex%3AEmail HTTP/1.1
    Host: dstore.hpi-web.de
    Accept: application/json

Server Response:

    HTTP/1.1 200 OK
    Content-Type: application/json

    {"totalCount": 1,
     "resources": [
       {"id": 2,
        "url": "http://www.example.com/wiki/ToDoList",
        "label": "Wiki - ToDoList",
        "validFrom": "2009-06-01 09:15:59",
        "validTo": "2009-06-04 16:14:27"}
    ]}

Listing 6.12. Retrieving Wiki pages that have been referenced in an email.
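The query parameter of the request can be produced from the three SPARQL conditions with standard URL encoding. The following Python sketch uses urllib.parse.quote_plus for this purpose; joining the conditions with a bare ‘.’ mirrors the request shown in the listing, and the function name encode_conditions is a hypothetical choice.

```python
from urllib.parse import quote_plus, unquote_plus


def encode_conditions(conditions):
    """Join SPARQL triple patterns with '.' and URL-encode the result."""
    return quote_plus(".".join(conditions))


conditions = [
    "?resource rdf:type ex:WikiPage",
    "?resource ex:linkedFrom ?x",
    "?x rdf:type ex:Email",
]

query = encode_conditions(conditions)
# Path of the filtered node collection at the chosen time point
path = "/graphs/tcn1/2009-06-04/resources/?query=" + query
```

Decoding the parameter with unquote_plus restores the original condition string, so the encoding is lossless for this fragment.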

6.4.3 Manipulating Team Collaboration Network Resources

Sensor clients constantly manipulate the state of Team Collaboration Networks by requesting the d.store server to adapt the structure of the networks according to the captured collaboration events. The following examples show how basic operations for the creation and deletion of node instances are implemented. For a comprehensive documentation of network operations including node updates and the manipulation of attributes and relationships, please refer to Appendix C.

Creating a Node Instance

The creation of a node instance denotes the occurrence of a new actor or resource in the collaboration process of a team. A sensor client informs the d.store server about the additional resource by posting the corresponding meta-information to the network. The entity transmitted by the client includes the URL of the collaboration resource and optional parameters for attributes and relationships to other resources. In the following example, a sensor client posts a new email resource with two attributes and one relationship (Listing 6.13).

Client Request:

     1  POST /graphs/tcn1/now/resources HTTP/1.1
     2  Host: dstore.hpi-web.de
     3  User-Agent: email-sensor/1.0
     4  Accept: application/json
     5  Content-Type: application/json
     6
     7  {"url": "http://mail.example.com/archive/0001.html",
     8   "label": "Meeting",
     9   "types": "Email",
    10   "attributes": [
    11     {"name": "from", "value": "[email protected]"},
    12     {"name": "to", "value": "[email protected]"}],
    13   "relations": [
    14     {"name": "hyperlink", "target": "http://www.example.com/wiki/Idealog"}]
    15  }

Server Response:

    16  HTTP/1.1 201 CREATED
    17  Location: http://dstore.hpi-web.de/graphs/tcn1/now/resources/3
    18  Content-Type: application/json
    19
    20  {"id": 3,
    21   "status": "created",
    22   "validFrom": "2009-06-02 12:12:29 +0100"}

Listing 6.13. Creating a Node Instance.

If the node has been successfully created in the network, the d.store server assigns a distinct ID and points the client to the URL of the new instance (‘Location’ header, line 17).
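A sensor client could assemble such a posting as sketched below in Python. This is a minimal illustration, not the implementation used in this work: the function name build_create_request is hypothetical, the mailbox values are placeholders because the addresses in the listing are not legible, and the request is only built, not sent.

```python
import json
import urllib.request


def build_create_request(base_url, graph, url, label, types,
                         attributes=None, relations=None):
    """Build (but do not send) a POST request that creates a node instance."""
    payload = {
        "url": url,
        "label": label,
        "types": types,
        "attributes": [{"name": n, "value": v}
                       for n, v in (attributes or [])],
        "relations": [{"name": n, "target": t}
                      for n, t in (relations or [])],
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        "{}/graphs/{}/now/resources".format(base_url, graph),
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
    )


request = build_create_request(
    "http://dstore.hpi-web.de", "tcn1",
    url="http://mail.example.com/archive/0001.html",
    label="Meeting",
    types="Email",
    # Placeholder addresses, not taken from the original listing
    attributes=[("from", "paul@example.org"), ("to", "john@example.org")],
    relations=[("hyperlink", "http://www.example.com/wiki/Idealog")],
)
```

Serializing the payload with json.dumps guarantees well-formed JSON, including the separators between members that are easy to miss when a body is written by hand.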

Removing a Node Instance

Nodes are removed from a network to indicate the deletion of a collaboration resource, such as a shared file or a Wiki page. Sensor clients detecting that a resource has been removed from the team’s information space can notify the d.store server by requesting the removal of the corresponding node instance via HTTP DELETE (Listing 6.14). The operation invalidates the node instance in the current network representation at the time point specified in the URL.

Client Request:

    DELETE /graphs/tcn1/now/resources/2 HTTP/1.1
    Host: dstore.hpi-web.de
    User-Agent: wiki-sensor/1.0
    Accept: application/json

Server Response:

    HTTP/1.1 200 OK
    Content-Type: application/json

    {"id": 2,
     "status": "deleted",
     "validTo": "2009-06-02 10:17:00 +0100"}

Listing 6.14. Removing a Node Instance.

Note that subsequent requests to this URL will result in a ‘404 NOT FOUND’ server response. To receive the invalidated node at a later point, the URL path needs to be altered to refer to a network state before the deletion of the node.
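The difference between invalidating and erasing a node can be illustrated with a toy in-memory model. The Python sketch below is not the d.store storage layer (which is relational, see Sect. 6.3); the class TemporalNodeStore and its methods are hypothetical and only mimic the status codes described above.

```python
from datetime import datetime


class TemporalNodeStore:
    """Toy store that invalidates nodes instead of erasing them."""

    def __init__(self):
        # node ID -> {"validFrom": datetime, "validTo": datetime or None}
        self._nodes = {}

    def create(self, node_id, when):
        self._nodes[node_id] = {"validFrom": when, "validTo": None}

    def delete(self, node_id, when):
        """Mark the node as invalid from `when` on; history is kept."""
        self._nodes[node_id]["validTo"] = when

    def status(self, node_id, when):
        """Return 200 if the node is valid at `when`, else 404."""
        node = self._nodes.get(node_id)
        if node is None or when < node["validFrom"]:
            return 404
        if node["validTo"] is not None and when >= node["validTo"]:
            return 404
        return 200


store = TemporalNodeStore()
store.create(2, datetime(2009, 6, 1, 9, 15, 59))
store.delete(2, datetime(2009, 6, 2, 10, 17, 0))
```

After the deletion, a lookup at a later time point reports 404, while a lookup for a time point before the deletion still reports 200, which corresponds to altering the URL path to an earlier network state.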

6.5 Chapter Summary

In this chapter, the d.store platform and its architectural components have been introduced. The platform implements a Team Collaboration Network system, which can be independently utilized by client applications through its resource-oriented service interface. The system constitutes an instrument for the computational observation and analysis of virtual collaboration groups and is applied in a pilot application presented later in this work. With the specification of domain ontologies, the platform can be flexibly tailored to the different groupware environments and teams under observation. This customization process is the subject of the next chapter. It introduces a set of domain ontologies to prepare for the pilot application presented later in the work.

7 System Configuration

The d.store platform introduced in the previous chapter presents a generic solution for capturing the heterogeneous information sharing activities in online collaboration. This chapter illustrates the flexibility of the system by demonstrating how it is configured to integrate with actual groupware and team landscapes. It presents domain ontologies and rules, which provide common TCN terminology for widespread application scenarios such as email communication and collaborative Wikis. With those ontologies defined, the platform is customized and prepared for the application in a collaboration testbed presented in the final part of this dissertation.

7.1 Domain Ontologies for Online Collaboration

Domain ontologies define common and shared vocabularies to be used in a Team Collaboration Network system (Sect. 5.2.1). Through the configuration and adaptation of the ontology descriptions, the system is tailored to the specific requirements and setup of the observed collaboration process. The modeling of domain ontologies is driven by the groupware systems that are applied in the process, as well as prevailing goals, legal aspects, and technical restrictions in the monitoring of online team activities. This section introduces an initial collection of domain ontologies, each one defining common semantic concepts encountered in online team collaboration. The ontologies cover basic collaboration aspects of Wiki systems, email messaging, and shared file systems. Due to the proliferation of these groupware applications, the collection presents a practical starting point for a wide range of virtual collaboration studies. For this work, the ontologies provide the vocabulary that is used to describe online activities in the analysis of eleven engineering design projects (Chap. 8). The domain ontologies build on the core definitions of the d.store concept graph (Sect. 6.2) and are represented in the graphical notation introduced in Sect. 3.4.4.

7.1.1 web: An Ontology for Hyperlinked Collaboration Resources

The World Wide Web has evolved into a multifaceted collaboration platform for team-based interactions and information sharing activities. Part of this success is attributable to the simplicity of referencing and hyperlinking arbitrary media via URLs. Information on the Web can easily be connected and disseminated by pointing others to its resource location. This may happen in various ways, e.g., verbally or by embedding hyperlinks in other Web resources. It is also very common among team members to share information by sending text messages or emails containing URLs to the resources of interest. The following email snippet from one of the observed teams gives an example (Fig. 7.1).

From:
Sent: Thu Mar 06 2008 - 10:30:40 PST
To:
Subject: [] Notes online

Hey gang, that was a nice, quick meeting this morning. The notes are online:

http://wikibox.stanford.edu/07-08/index.php//Minutes3608

[...]

Here is a link to the infrared ranging sensors I mentioned: http://www.phidgets.com/products.php?product_id=1103

Remember: Skype this Sunday, 12:30PM PST

Figure 7.1. Emails are a popular medium to share and point others to relevant information on the Web.

As this example shows, Web resources are not only referenced from other Web resources, but are also semantically connected to information assets that are not necessarily located on the Web per se (e.g., emails or attached documents). This relationship is reflected in the web ontology shown in Fig. 7.2. The ontology specifies a node type ‘WebResource’ to classify information resources on the Web. Note that this type is a specialization of the most general class of information objects in d.store, the ‘Resource’. By defining two inverse relationship types ‘hyperlink’ and ‘linkedFrom’, bi-directional associations between the referenced and the referencing resources are established.

[Figure 7.2: ontology diagram, namespace http://hpi-web.de/ns/dstore/web/0.1/ (prefix dstore: http://hpi-web.de/ns/dstore/0.1). Node type: WebResource (dstore:DomainResource). Inverse relationship types (dstore:DomainRelation) between dstore:Resource and WebResource: hyperlink/linkedFrom.]

Figure 7.2. An ontology for hyperlink relationships between general collaboration resources (dstore:Resource) and Web resources.
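The effect of declaring ‘hyperlink’ and ‘linkedFrom’ as inverse relationship types can be sketched in a few lines of Python. The example is a simplified illustration on plain (subject, relation, object) tuples and does not reproduce the OWL reasoning of the platform; the function name with_inverses is hypothetical.

```python
# Inverse pairs taken from the web ontology
INVERSES = {"hyperlink": "linkedFrom", "linkedFrom": "hyperlink"}


def with_inverses(triples, inverses=INVERSES):
    """Return the triples plus the inverse of every relation that has one."""
    result = set(triples)
    for subject, relation, obj in triples:
        if relation in inverses:
            result.add((obj, inverses[relation], subject))
    return result


email = "http://mail.example.com/archive/0001.html"
wiki = "http://www.example.com/wiki/ToDoList"

# One explicit hyperlink yields the matching linkedFrom association
closed = with_inverses({(email, "hyperlink", wiki)})
```

A sensor client therefore only needs to report the hyperlink found in the email; the association in the opposite direction follows from the ontology.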

7.1.2 wiki: An Ontology for Wiki-based Collaboration

Wikis have been widely adopted in professional scenarios and project-based collaboration to support teams in documentation and organizational tasks. The content is contributed, accessed, and incrementally revised by a community of registered users. Depending on the configuration of the Wiki, users must log in to the system in order to modify or access its content. Wiki systems generally keep log files of page visits and content revisions made by authenticated users, which makes this medium suitable for the computational analysis of collaboration practices. It allows the examination of relationships between (the content of) individual Wiki pages and their contributors (cf., e.g., Mueller, 2008).

Figure 7.3 shows an ontology for Wiki-related concepts and associations that are used in the description of Team Collaboration Networks. A ‘WikiPage’ is a specialization of a Web resource and represents an individual, editable topic in a Wiki. Wiki pages are authored by users of the system (‘dstore:Person’), who are identified by a ‘username’. The ontology further defines two attribute types ‘create_user’ and ‘edit_user’, which hold the usernames of the creator and the editors of a Wiki page.

Modern Wiki systems also support file attachments that can be uploaded to a topic. These can be, e.g., images or documents that are related to the information on a page. Attachments are reflected in the ontology by the node type ‘WikiAttachment’ and the two inverse relationship types ‘attachment’ and ‘attachedTo’.

7.1.3 email: An Ontology for Email-based Messaging

Email messaging is the most common technology used today to digitally exchange information in an asynchronous and reliable manner. Studies point out that global virtual teams largely rely on day-to-day email communication (Lurey and Raisinghani, 2001). The characteristic features of email systems have led to a strong pervasiveness of this medium in virtual collaboration. Access to email systems is available almost anytime and anywhere, allowing asynchronous, ad-hoc transmission and retrieval of messages without needing to organize and participate in synchronous interactions.

[Figure 7.3: ontology diagram, namespace http://hpi-web.de/ns/dstore/wiki/0.1/ (prefixes dstore, web, xsd). Node types: WikiPage and WikiAttachment (both web:WebResource), dstore:Person. Attribute types (xsd:string): create_user and edit_user (dstore:AliasReferenceAttribute), username (dstore:AliasAttribute). Inverse relationship types (dstore:DomainRelation): attachment/attachedTo, author/contributedTo.]

Figure 7.3. wiki: An ontology for concepts and properties in Wiki-based collaboration.

Figure 7.4 shows the ontology of node, relationship, and attribute types that have been used in the formalization of email communication activities in Team Collaboration Networks. The node type ‘Email’ represents messages that have been sent and received by node instances of type ‘dstore:Person’. The associations between emails, senders, and recipients are described by the four relationship types ‘sender’/‘hasSent’ and ‘recipient’/‘hasReceived’. Emails are further characterized by a number of attributes. The ontology defines attribute types for the mailbox addresses of the sender (‘from’) and recipients (‘to’, ‘cc’), the unique message ID (‘message_id’), and the optional ID of a preceding email to which a message replies (‘reply_to_id’). The semantic association between a message and a response is explicitly articulated by the inverse relationships ‘reply’ and ‘repliesTo’. The email attributes ‘from’, ‘to’, and ‘cc’ are specified as references to the alias attribute ‘mailbox’, which in turn assigns an email address to a person. The ontology also considers file attachments (‘Attachment’) and provides the relationship types ‘attachment’ and ‘attachedTo’ to express the association with an email. The concept of an ‘EmailList’ represents a distribution list that can have an arbitrary number of email recipients signed up to it, as indicated by the ‘subscriber_mailbox’ attribute.

[Figure 7.4: ontology diagram, namespace http://hpi-web.de/ns/dstore/email/0.1/ (prefixes dstore, xsd). Node types: Email, Attachment, and EmailList (dstore:DomainResource), dstore:Person. Attribute types (xsd:string): from, to, and cc (dstore:AliasReferenceAttribute), message_id, reply_to_id, subscriber_mailbox, mailbox (dstore:AliasAttribute). Inverse relationship types (dstore:DomainRelation): repliesTo/reply, attachment/attachedTo, sender/hasSent, recipient/hasReceived.]

Figure 7.4. email: An ontology for concepts and properties in email-based communication.

7.1.4 file: An Ontology for Shared Document Storages

A shared document storage is a system that allows users to access, edit, and manage files collaboratively on remote servers. WebDAV (Goland et al., 1999) is an open standard and extension to HTTP, which implements a shared document storage on the Web. Files and folders can be accessed and modified by users directly in the Web browser or virtually mounted into the local file system. This turns WebDAV-enabled storage systems into a convenient tool for the collaborative sharing, editing, and archiving of project-relevant documents in distributed teams.

A minimal ontology for shared document storages defines basic associations between people and the shared files (Fig. 7.5). It considers four operations on a file that are carried out by users. The ‘username’ aliases responsible for creating, reading, editing, and deleting a file during its lifetime are assigned to a ‘File’ instance by means of the attribute types ‘create_user’, ‘read_user’, ‘edit_user’, and ‘delete_user’. The relationship types provided by the ontology are less distinctive. Persons access files for reading (‘hasAccessed’/‘accessedBy’) or publish information by creating or modifying a file (‘hasShared’/‘sharedBy’).

[Figure 7.5: ontology diagram, namespace http://hpi-web.de/ns/dstore/webdav/0.1/ (prefixes dstore, xsd). Node types: File (web:DomainResource), dstore:Person. Attribute types (xsd:string): create_user, read_user, edit_user, and delete_user (dstore:AliasReferenceAttribute), username (dstore:AliasAttribute). Inverse relationship types (dstore:DomainRelation): sharedBy/hasShared, accessedBy/hasAccessed.]

Figure 7.5. file: An ontology for basic collaboration activities in shared document storages.

7.2 Inference Rules

Based on the introduced domain ontologies, additional inference rules are specified in the platform to computationally derive relationships in Team Collaboration Networks. The rules are defined as pairs of preconditions and implied postconditions, presented in the form antecedent ⇒ consequent as in Sect. 5.2.1. Please refer to the specifications of SWRL (W3C, 2004g) and RDF/XML (W3C, 2004e) for a mapping of these rules to a set of RDF triples.

The logical inference of relationships lowers the number of relations that are explicitly stored in the graph. At the same time, the rules simplify the data entry and update process by reducing the semantic associations that need to be determined and uploaded by sensor clients. To give a first example, the following rule is specified to infer ‘email:sender’ relationships, using the ontological concepts defined in the email ontology:

    email:from(x,y) ∧ email:mailbox(z,y) ⇒ email:sender(x,z)

The rule states that an ‘email:sender’ relationship exists between any two nodes x and z, if x has an ‘email:from’ attribute of value y and z has an ‘email:mailbox’ attribute of the same value y. Note that, with the specification of an inverse relationship for ‘email:sender’ in the domain ontology, an OWL-DL reasoner would simultaneously assert a corresponding ‘email:hasSent’ relationship between z and x.

Analogous rules are provided to calculate relationships between email messages and their recipient nodes:

    email:to(x,y) ∧ email:mailbox(z,y) ⇒ email:recipient(x,z),
    email:cc(x,y) ∧ email:mailbox(z,y) ⇒ email:recipient(x,z)

Corresponding to the sender and recipient relationships of emails, relationships between Wiki resources and people who have created or edited a page are inferred based on the attribute values ‘wiki:username’, ‘wiki:create_user’, and ‘wiki:edit_user’:

    wiki:create_user(x,y) ∧ wiki:username(z,y) ⇒ wiki:author(x,z),
    wiki:edit_user(x,y) ∧ wiki:username(z,y) ⇒ wiki:author(x,z)

The same approach is applied to file associations between the users and documents of an online shared storage:

    file:read_user(x,y) ∧ file:username(z,y) ⇒ file:accessedBy(x,z),
    file:create_user(x,y) ∧ file:username(z,y) ⇒ file:sharedBy(x,z),
    file:edit_user(x,y) ∧ file:username(z,y) ⇒ file:sharedBy(x,z)
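All rules of this kind join a reference attribute with an alias attribute on a shared value. The Python sketch below evaluates such rules over a set of attribute facts. It is a hand-written illustration, not the SWRL/OWL machinery of the platform; the function name infer_relations, the node names, and the mailbox values are hypothetical.

```python
# Each rule: (reference attribute, alias attribute, inferred relation),
# read as ref(x, y) AND alias(z, y) => relation(x, z)
ALIAS_RULES = [
    ("email:from", "email:mailbox", "email:sender"),
    ("email:to", "email:mailbox", "email:recipient"),
    ("email:cc", "email:mailbox", "email:recipient"),
    ("wiki:create_user", "wiki:username", "wiki:author"),
    ("wiki:edit_user", "wiki:username", "wiki:author"),
]


def infer_relations(attributes, rules=ALIAS_RULES):
    """Derive relations from (node, attribute name, value) facts."""
    inferred = set()
    for ref, alias, relation in rules:
        # Index alias values: value y -> nodes z with alias(z, y)
        owners = {}
        for node, name, value in attributes:
            if name == alias:
                owners.setdefault(value, set()).add(node)
        # Join every reference attribute ref(x, y) with the index
        for node, name, value in attributes:
            if name == ref:
                for owner in owners.get(value, ()):
                    inferred.add((node, relation, owner))
    return inferred


facts = {
    ("email3", "email:from", "paul@example.org"),
    ("email3", "email:to", "john@example.org"),
    ("paul", "email:mailbox", "paul@example.org"),
    ("john", "email:mailbox", "john@example.org"),
}
relations = infer_relations(facts)
```

For the sample facts, the email node is linked to ‘paul’ as sender and to ‘john’ as recipient, although the client only supplied mailbox strings.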

The determination of bidirectional relationships between emails and their associated replies (‘email:reply’/‘email:repliesTo’) is governed by the following rule. Sensor clients only need to specify the unique message ID and the value of the ‘reply-to’ field from the email header as attributes in the posting of a new email instance. Based on these values, the rule infers the corresponding relationships between two email nodes with matching attribute values:

    email:reply_to_id(x,y) ∧ email:message_id(z,y) ⇒ email:repliesTo(x,z)

Distribution lists conceal the list of people who receive the messages sent to them. However, the email addresses of the subscribers are often known for project-internal distribution lists, and explicit relationships between the messages and the recipients are wanted. In this case, the following rule determines such relationships between emails sent to a list and persons whose mailbox address is subscribed:

    email:hasReceived(w,x) ∧ email:subscriber_mailbox(w,y) ∧ email:mailbox(z,y) ⇒ email:hasReceived(z,x)

With these inference rules defined in a TCN-S, a client’s effort of posting captured resources to a network is reduced. More concretely, clients are relieved from resolving corresponding actor nodes from the captured aliases like email addresses and usernames. Based on the above rules, the server can take care of identifying the target nodes for sender and recipient relationships based on the alias attributes assigned to person and resource nodes.
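The distribution list rule has three preconditions and can be read as a nested join. The following Python sketch evaluates it over three sets of pairs; it is an illustration with hypothetical names (expand_list_recipients, ‘team-list’, and the mailbox values) and not the rule engine of the platform.

```python
def expand_list_recipients(has_received, subscriber_mailbox, mailbox):
    """Apply hasReceived(w, x) AND subscriber_mailbox(w, y) AND mailbox(z, y)
    => hasReceived(z, x).

    Each argument is a set of pairs; has_received holds (receiver, email).
    """
    inferred = set()
    for w, x in has_received:
        for w2, y in subscriber_mailbox:
            if w2 != w:
                continue  # subscription belongs to a different list
            for z, y2 in mailbox:
                if y2 == y:
                    inferred.add((z, x))
    return inferred


has_received = {("team-list", "email3")}
subscriber_mailbox = {("team-list", "paul@example.org"),
                      ("team-list", "john@example.org")}
mailbox = {("paul", "paul@example.org"),
           ("john", "john@example.org"),
           ("mary", "mary@example.org")}

new_facts = expand_list_recipients(has_received, subscriber_mailbox, mailbox)
```

Only persons whose mailbox is subscribed to the list receive the derived relationship; ‘mary’ is not subscribed and is therefore not linked to the email.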

7.3 Preparing the Data Collection Process

With the domain ontologies and rules defined in the d.store server, the platform is configured for one or more targeted collaboration environments. The following preparatory activities complete the setup of the system and initialize it for the actual data collection process and the generation of Team Collaboration Networks.

7.3.1 Initializing the Networks

A new TCN instance is created for each team that is to be observed in its collaboration activities. For this purpose, the platform API accepts postings to the graph collection resource, in which a client specifies key attributes for the new network. This includes a shorthand network identifier as well as a list of ontology and rule identifiers that should be mapped into the TCN knowledge base (Listing 7.1).

Client Request:

    POST /graphs HTTP/1.1
    Host: dstore.hpi-web.de
    Accept: application/json
    Content-Type: application/json

    {"id": "alpha",
     "label": "Project Alpha",
     "description": "",
     "domains": [
       {"ns": "http://hpi-web.de/ns/dstore/web/0.1/", "prefix": "web"},
       {"ns": "http://hpi-web.de/ns/dstore/email/0.1/", "prefix": "email"}]
    }

Server Response:

    HTTP/1.1 201 CREATED
    Location: http://dstore.hpi-web.de/graphs/alpha
    Content-Type: application/json

    {"status": "created", "id": "alpha"}

Listing 7.1. Creating a TCN instance.

In this example, a Team Collaboration Network ‘alpha’ is initialized with two network-specific domain ontologies ‘web’ and ‘email’.
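When many team networks are set up in a row, the request body can be generated from a short list of ontology prefixes. The Python sketch below assumes that the namespace of an ontology follows the pattern visible in Listing 7.1, which does not necessarily hold for every ontology; the constant DSTORE_NS and the function graph_payload are hypothetical.

```python
import json

# Namespace pattern as it appears in Listing 7.1 (assumed, not guaranteed)
DSTORE_NS = "http://hpi-web.de/ns/dstore/{}/0.1/"


def graph_payload(graph_id, label, prefixes, description=""):
    """Build the JSON body for creating a TCN with the given ontologies."""
    return json.dumps({
        "id": graph_id,
        "label": label,
        "description": description,
        "domains": [{"ns": DSTORE_NS.format(p), "prefix": p}
                    for p in prefixes],
    })


body = graph_payload("alpha", "Project Alpha", ["web", "email"])
```

The resulting string can be posted to the graph collection resource in the same way as any other JSON entity.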

7.3.2 Setting up the Sensor Clients

The way sensor clients are implemented and integrated into the collaboration infrastructure determines how detailed and up-to-date the Team Collaboration Networks are. Consequently, the real-time monitoring and analysis of a collaboration process depends on sensor clients that continuously detect collaboration events as they occur and incrementally upload the corresponding meta-information to the server. In the case of an ex post analysis, the clients could be programmed to process recorded log files or archives and upload the time-stamped activities at once.

A number of sensor clients have been prototypically implemented for the pilot application conducted later in this work. A feeder application for email activities scans the messages sent to the distribution lists of teams, either by scanning the message log provided by the list server in the Internet Message Format (Resnick, 2001), or by parsing the Web archive that is created with the Hypermail¹ tool. A sensor client for Wiki activities has been developed, which parses the server-side file structure generated by a TWiki² system. A third sensor client processes the event logs of a WebDAV server to upload information about file-related online activities into the networks.

¹ Hypermail: http://www.hypermail-project.org/, accessed Oct. 19th, 2010
² TWiki: http://twiki.org/, accessed Oct. 19th, 2010

With the communication between clients and server established by HTTP and standard representation formats, the implementation of sensor clients has proven to be relatively inexpensive and unrestricted with regard to development and runtime environments. The resource-oriented system design promotes the straightforward integration of heterogeneous and distributed data sources for collaborative activities and makes the d.store platform a flexible and extensible monitoring instrument for diverse application scenarios.
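To illustrate how little code an email sensor needs, the following Python sketch parses a message in the Internet Message Format with the standard ‘email’ package and derives a node description that uses the attribute names of the email ontology. It is not the feeder application developed for this work. The sample message, the addresses, and the function name email_to_node are invented for illustration, and mapping the ‘In-Reply-To’ header to ‘reply_to_id’ is an assumption.

```python
from email import message_from_string
from email.utils import getaddresses

# Invented sample message in the Internet Message Format
RAW_MESSAGE = """\
From: Paul <paul@example.org>
To: John <john@example.org>, team@example.org
Message-ID: <0002@mail.example.com>
In-Reply-To: <0001@mail.example.com>
Subject: Re: Meeting

See http://www.example.com/wiki/ToDoList for the notes.
"""


def email_to_node(raw, archive_url):
    """Turn a raw message into a node description for posting.

    Only header fields are read; attribute names follow the email
    ontology (from, to, cc, message_id, reply_to_id).
    """
    msg = message_from_string(raw)
    attributes = []
    for header in ("from", "to", "cc"):
        # getaddresses splits address lists into (name, address) pairs
        for _name, address in getaddresses(msg.get_all(header, [])):
            attributes.append({"name": header, "value": address})
    if msg["Message-ID"]:
        attributes.append(
            {"name": "message_id", "value": msg["Message-ID"].strip()})
    if msg["In-Reply-To"]:  # assumed source of the reply_to_id attribute
        attributes.append(
            {"name": "reply_to_id", "value": msg["In-Reply-To"].strip()})
    return {"url": archive_url, "label": msg["Subject"] or "",
            "types": "Email", "attributes": attributes}


node = email_to_node(RAW_MESSAGE, "http://mail.example.com/archive/0002.html")
```

The resulting dictionary can be serialized to JSON and posted to a node collection; sender, recipient, and reply relationships are then left to the inference rules on the server.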

7.3.3 Specifying Participant Roles and Alias Names

The stakeholders in a collaboration process can be generally classified into different roles that they fulfill in a project: team members, consultants, managers, etc. In addition, the individual actors are represented by means of virtual identities or aliases and usernames. Participants use multiple email addresses and user accounts for different collaboration platforms, by which they are identified and which serve as surrogates in the accomplishment and logging of virtual activities.

For this reason, a client application d.person has been implemented to support the assignment of role types and alias attributes (Fig. 7.6). Human actors are represented as individual node instances in a TCN, which raises the need to consolidate the different virtual surrogates of a person and to map user accounts to the representing nodes. Different email addresses and usernames in the communication process can then be attributed to the responsible person. Roles are assigned as node types, which are either defined by imported domain ontologies or created in the network-internal concept graph. An example for a project-specific role ontology is given in the next chapter, where the testbed and team configuration for the pilot application is introduced.

7.4 Chapter Summary

With the introduction of concrete domain ontologies, the customization of the d.store platform has been exemplified. Each ontology defines node, relationship, and attribute types that are used in TCN instances to describe the collaboration activities related to a particular domain of groupware applications. Sensor applications make use of this terminology to notify the server about collaboration structures that have been captured from digital event traces such as logs, archives, etc. With that, Team Collaboration Networks are gradually constructed and can be leveraged in the real-time monitoring and analysis of complex online collaboration processes.

In the final part of this dissertation, the platform is evaluated in an industry-oriented project testbed. The Team Collaboration Networks of eleven distributed engineering design teams are analyzed and compared to demonstrate the application of this approach. Qualitative insights and statistical evidence underpin the value of computational monitoring by revealing meaningful and performance-relevant characteristics in the collaboration behavior of the observed teams.

Figure 7.6. d.person: A client application to manage the roles (types) and alias attributes of person nodes in Team Collaboration Networks.

Part IV

Evaluation & Discussion

8 A Pilot Application in Engineering Design

The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ (I found it!) but ‘That’s funny ...’. – Isaac Asimov

In the previous chapters, the work has introduced Team Collaboration Networks as a model to describe the evolving relationships between information resources and stakeholders in a virtual collaboration process. It has further introduced d.store, a configurable, service-based software system that enables the instantiation of such networks and the automated capture of collaboration activities in diverse team and project settings. These concepts are now put into practice by applying the d.store platform in the collaboration processes of eleven globally distributed engineering design teams. This chapter summarizes the first application of d.store in a realistic project scenario and presents the results of an explorative analysis that has been conducted on the collected data.

The eleven Team Collaboration Networks, each one generated over a project period of eight months, provide extensive opportunities for the structured exploration of online communication behavior during the early stages of conceptual design. Specific analysis clients have been implemented to process, visualize, and compare the TCN instances from different angles. This chapter documents the insights and findings that could be gained from these observations. At the same time, it demonstrates the potential range and informational value of observing real-time collaboration metrics in the context of project management and design research.

After introducing the pilot testbed in Sect. 8.1, the chapter highlights three major blocks of investigations, each one motivated by distinct goals and research questions:

• The work begins with a quantitative investigation of the captured design team activities to appraise the scope of the generated Team Collaboration Networks. What was the volume and the structure of the collected activity data and how does it vary between different teams? (Sect. 8.2)

• Explorations into the temporal network characteristics visualize the evolution and variation of collaborative activities over time. How does collaboration behavior evolve differently in the teams? (Sect. 8.3)

• A statistical analysis seeks to identify correlations between specific network patterns and independently surveyed team performance measures. Can we predict team performance from objectively measured collaboration structures? (Sect. 8.4)

The chapter finishes with a summary of the analysis results and discusses the limitations and applicability of the findings in Sect. 8.5.

8.1 ME310: A Global Academic Project Testbed

A project-based engineering design curriculum serves as a testbed for the d.store platform. Stanford University’s ‘Mechanical Engineering 310 – Project-Based Engineering Design, Innovation & Development’ (henceforth ‘ME310’) is a nine-month graduate level engineering course in which Stanford students collaborate with students at other universities around the world to develop innovative solutions to open-ended problems. Small distributed teams work on real-world engineering design challenges posed by industry partners. The design tasks given out to the teams are purposely phrased broadly to challenge the students to determine, isolate, and pursue a particular opportunity for innovation. Skogstad (2009) compares the nature of the projects and the structure of the teams in ME310 with the working mode of start-ups in industry: “like many Silicon Valley initiatives, design teams start with a vague idea of an area that allows for the creation of innovation” in a new or existing market.

8.1.1 Project & Team Setups

Eleven distributed teams were jointly formed of students from Stanford and six partner institutions. Stanford’s engineering graduates partnered with students in product design, industrial design, or computer science to foster inter-disciplinary teamwork and problem solving. The desired team size was six to eight students, with three or more students representing a co-located sub-team on both partner sides. All teams had an equally sized budget at their disposal, which could be spent on the acquisition of materials, services, and prototype development.

Figure 8.1 summarizes the general setup and the participating roles in ME310 for the academic term 2007/2008. Each of the global teams has been given a realistic engineering design challenge by a corporate liaison. The challenge is grounded in a relevant but open-ended design problem from one of diverse industries such as automotive, consumer products, telecommunications, or information technology. Over the course of the projects, the teams had to identify needs, generate concepts, and create fully functional prototypes to show a potential path to innovation to the corporate sponsor. A teaching team and a group of coaches support and guide the students at the participating institutions. Table 8.1 summarizes the design challenges that were given to the student design teams:

[Diagram: industry partners with corporate liaisons 1 to n; teams 1 to n, each with a local sub-team at the Center for Design Research, Stanford University, and a global sub-team at a global partner institution; teaching team and coaches on both sides.]

Figure 8.1. Roles and process participants in the project-based engineering design curriculum ME310.

Table 8.1. Overview of ME310 projects in 2007/2008 (Skogstad et al., 2009).

Team      Task
Alpha     Design an intelligent system to assist drivers
Beta      Design a tool that facilitates distant design collaboration
Delta     Design a system to store small items in a car
Epsilon   Design a tool to support maintenance personnel in the field
Gamma     Design a new automotive center stack
Iota      Design a device that extracts drinking water from ambient air
Kappa     Design a new digital camera
Lambda    Design a method to control wearable electronic devices
Omega     Design a new way to use RFID in a retail environment
Pi        Design an industrial controller based on gesture technology
Theta     Design a virtual convertible

The project timelines are structured into three major phases with a duration of two to three months each. The first phase (‘make it up’) includes team building (local and global sub-teams come together for a short kick-off workshop) and is characterized by frequent benchmarking, fact finding, brainstorming, and prototyping activities. The second phase (‘make it real’) aims towards the technological feasibility of the students’ vision, refining concepts, and the implementation of their ideas. The third phase (‘make it happen’) is about getting things done, making final decisions, and presenting a fully functional solution at the end of the projects. The ME310 course framework stipulates a constant evaluation of the teams’ progress through weekly meetings with the teaching teams and prototype reviews at the end of each phase.

Figure 8.2. Team members and teaching team during a weekly project meeting. The ME310 design space provides an open environment for collocated sub-teams at Stanford University. Remote team interactions are supported by a shared ICT infrastructure for virtual collaboration.

8.1.2 Process Participants & Team Interactions in ME310

The student team members used local face-to-face meetings, email, instant messaging, phone, video conferencing, and other communication channels to interact and share information with their partners and other stakeholders. All teams had access to a uniform groupware infrastructure to facilitate and organize their remote collaboration activities (more details below). This large number of channels allowed the teams to interact at any time and from almost anywhere, similar to distributed project teams in global industries. Besides the interaction in and between local or remote sub-teams, Skogstad (2009) describes the following modes of interaction in ME310 and highlights parallels to industrial project setups:

Interactions between team and corporate liaison: “The project-based nature of the course requires the student teams to interact with liaisons from the company sponsoring their projects. These liaisons typically meet with the students once per week in meetings or conference calls. The liaisons are equivalent to a design consultancy’s customer. They are external to the team and not included in the intra team communication.”

Interactions between team and teaching team: “The teaching team meets weekly with each design team to discuss project progress and future steps” (see Fig. 8.2). “These small group meetings [...] are characterized by a high level of interaction, open exchange of ideas, suggestions and critique from multiple viewpoints. Comments often include referrals to experts outside the curriculum’s community for help on particular problems. This mode of interaction resembles that of review by project management and company-internal executives in industry.”

Interactions between team and coach: “Each team also receives support and guidance from a design process coach who has professional industry experience but is not part of the review or grading process. The coach helps the students with expert subject knowledge and project and team management. Coaching is seldom found in industry at the project level, but regularly at the individual level. Companies often have ‘mentorship programs’ to support so-called ‘high-potentials’ in their development within the company.”

The general setup and the interactions in ME310 projects described above show that the observed design teams share similar characteristics with those found in industry. Continuous communication among team members and with external process stakeholders presents a core facet in the course work. Unlike industry, however, ME310 is uniquely suitable for research studies as the milestones, deadlines, and project breadth and depth are common between teams. These commonalities make ME310 “a controlled environment compared to industry settings, which makes the effect of team circumstances on design performance observable in ways not possible elsewhere” (ibid.).

8.1.3 A Shared ICT Infrastructure for Virtual Collaboration

A centrally managed ICT infrastructure for virtual collaboration support is provided to the student teams. The design spaces at Stanford and the global academic partner institutions are equipped with workstations that give access to the Internet and groupware systems. Three email distribution lists are set up for each team to simplify the sending of messages to either one of the two sub-groups or to the whole team. Every email sent to a list is instantly archived and published in a set of cross-referenced HTML documents, which are accessible to other course participants. A central Wiki installation serves as a portal to all course-relevant information and to individual collaboration rooms of the eleven projects. The teams leverage and organize the Wiki as they please, e.g., for documenting project-relevant knowledge or disseminating information to other process stakeholders. To establish a common file space and to ease the exchange of large files (e.g., audio and video recordings), all team members have access to a WebDAV-based file storage. Video conferencing systems for distributed group meetings are available at all team locations worldwide to establish synchronous conversations among the students (see Fig. 8.3).

The centrally managed groupware infrastructure satisfied the most elementary needs in establishing global collaboration between distributed sub-teams. This has led to a relatively large coverage of online team interactions on the ME310 servers and simplified the implementation of sensor clients for team communication capture.

Figure 8.3. Impressions of the ME310 design space at Stanford University. A shared ICT infrastructure supports global collaboration between distributed team members. (a) Workstations provide access to the Internet and central groupware systems. (b) Video conferencing systems establish remote interactions between local team members and their distant project partners.

8.1.4 Privacy and Confidentiality of the Observations

The issue of information privacy becomes increasingly important as more information about collaboration activities is recorded and made available. The computational processing and analysis of team interactions necessitates the strict observance of research ethics and the protection of human research subjects. The studies connected to this pilot application protect the privacy of the ME310 participants and take measures to prevent unauthorized access to private or confidential information. Only data that is voluntarily provided through the shared ICT landscape in ME310 has been used in the exemplary application of the d.store platform. All team members have been informed about and agreed to the scientific exploitation of this data. Any references to real persons, project partners, and external parties have been anonymized.

8.1.5 me310: An Ontology for Project Roles & Participants in ME310

The four domain ontologies presented in Sect. 7.1 provide a basic terminology for the creation of Team Collaboration Networks in the context of the groupware utilized in ME310. The ontologies are deliberately generic in the sense that they do not comprise any concepts that are specific to a particular project setup. However, a project-specific taxonomy of participant roles can improve the interpretability of captured activities and allows for a finer-grained specification of collaboration relationships. For this reason, an ME310-specific domain ontology is introduced and used in this pilot application to mark the roles of process participants.

The ontology shown in Fig. 8.4 defines roles and stakeholders in the ME310 community. On the top level, the ontology distinguishes between role members that are located at the Stanford design space (‘Local’) and their global counterparts at the academic partner institutions (‘GlobalPartner’). A second differentiation classifies the participants into a ‘Student’ group and a ‘Non-Student’ group. From a single project point of view, the set of students is formed of those that are members of the team (‘Team’) and those that are not (‘Non-Team’). The team itself consists of local students (‘LO Team’) and the global partner students (‘GP Team’). Non-student course stakeholders include the members of the teaching teams (‘TeachingTeam’), coaches (‘Coach’), the corporate liaisons (‘CorporateContact’), and administrative personnel (‘Administrative’).

This application-specific ontology of role concepts identified in ME310 completes the terminological definition of Team Collaboration Networks in the pilot scenario. Using the d.person tool (Sect. 7.3.3), the individual actors in the observed processes have been assigned the corresponding role types, allowing for a fine-grained, group-based filtering and analysis of the team activities.
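The group-based filtering enabled by the role taxonomy can be illustrated with a short sketch. It covers only a subset of the role concepts named above, and the exact subclass edges in `ROLE_PARENTS` are an assumption made for this illustration, not a transcription of the me310 ontology; in d.store such subsumption would be derived by the reasoner rather than by hand-written code.

```python
# Parent links for a subset of the role concepts named in the text.
# ASSUMPTION: the exact subclass edges are guessed for illustration only.
ROLE_PARENTS = {
    "LO_Team": {"Team", "Local"},
    "GP_Team": {"Team", "GlobalPartner"},
    "Team": {"Student"},
    "Non-Team": {"Student"},
    "TeachingTeam": {"Non-Student"},
    "Coach": {"Non-Student"},
    "CorporateContact": {"Non-Student"},
    "Administrative": {"Non-Student"},
}


def ancestors(role):
    """Return the role itself plus all roles it inherits from."""
    seen = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ROLE_PARENTS.get(current, ()))
    return seen


def filter_by_role(assignments, role):
    """Select the persons whose assigned role is subsumed by `role`."""
    return sorted(p for p, r in assignments.items() if role in ancestors(r))


# Invented role assignments for four anonymous person nodes.
assignments = {
    "p1": "LO_Team",
    "p2": "GP_Team",
    "p3": "Coach",
    "p4": "CorporateContact",
}
```

Under these assumptions, filtering for ‘Team’ selects both sub-teams, while filtering for ‘Local’ selects only the Stanford-based student `p1`.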

8.2 A Quantitative Appraisal of the Generated Networks

For each of the eleven teams in ME310, a Team Collaboration Network has been generated by processing the email archives, Wiki logs, and WebDAV activities of the global projects. This section explores the volume and comprehensiveness of the data that has been collected by the three corresponding sensor clients over the course of the nine-month projects. Quantitative network properties of the different teams are compared to indicate the resulting proportions of the eleven network instances Alpha to Theta.

[Ontology diagram. Namespace: http://hpi-web.de/ns/dstore/me310/; prefixes: dstore = http://hpi-web.de/ns/dstore/0.1, owl = http://www.w3.org/2002/07/owl#. Concepts below dstore:Person: Local and GlobalPartner (owl:disjointWith); Team, Student, LO_Team, LO_Student, GP_Team, GP_Student; Non-Team and Non-Student (owl:complementOf); TeachingTeam, LO_TeachingTeam, GP_TeachingTeam; Coach, LO_Coach, GP_Coach; CorporateContact; Administrative.]

Figure 8.4. me310: A classification of project participants and roles in the observed design curriculum ME310. The ontology defines role hierarchies for course stakeholders on local and global partner side.

Table 8.2 gives an overview of the network dimensions after the data collection process. It shows the total number of network nodes, as well as the number of nodes of type ‘dstore:Person’ and ‘me310:Team’. The total number of relationships results from the asserted relations explicitly stored in a network’s ABox plus those inferred by the OWL-DL reasoner and rule engine used in d.store (Chap. 6). Over the course of nine months of project-based teamwork and remote collaboration, the sensor clients captured approx. 10,000 email messages, 814 Wiki pages with more than 4,000 editing events, and approx. 9,000 write activities in the project WebDAV folders. The resulting TCN knowledge bases comprise approx. 240,000 semantic statements and associated validity intervals.

Table 8.2. Key dimensions of the generated Team Collaboration Networks.

Team      Nodes                     Relationships        Attributes
          (Total / Person / Team)   (Total / Inferred)
Alpha     2,622 / 109 / 6           14,323 / 13,605       9,412
Beta      3,496 / 116 / 6           16,797 / 15,166      10,113
Delta     2,517 / 109 / 7           13,059 / 12,192       8,347
Epsilon   1,653 / 75 / 7             7,401 / 6,724        4,583
Gamma     1,968 / 113 / 7           11,068 / 10,447       6,631
Iota      4,807 / 81 / 11           22,309 / 21,680      14,890
Kappa     2,058 / 137 / 6            9,844 / 8,689        5,796
Lambda    2,817 / 98 / 7            14,538 / 13,546       8,758
Omega     2,352 / 107 / 7           10,158 / 9,341        6,564
Pi        1,905 / 83 / 7            10,125 / 8,998        6,117
Theta       666 / 108 / 7            3,426 / 3,163        2,331

In the following sections, a more detailed appraisal of the Team Collaboration Networks focuses on aspects specific to one of the three different domain ontologies email, wiki, and webdav.
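To put the relationship counts of Table 8.2 into proportion, the share of inferred relationships per network can be derived with a few lines of arithmetic. The following sketch only reuses the figures from the table; the names `RELATIONSHIPS` and `inferred_share` are introduced here for illustration and are not part of d.store.

```python
# (total relationships, inferred relationships) per team, copied from Table 8.2.
RELATIONSHIPS = {
    "Alpha": (14323, 13605),
    "Beta": (16797, 15166),
    "Delta": (13059, 12192),
    "Epsilon": (7401, 6724),
    "Gamma": (11068, 10447),
    "Iota": (22309, 21680),
    "Kappa": (9844, 8689),
    "Lambda": (14538, 13546),
    "Omega": (10158, 9341),
    "Pi": (10125, 8998),
    "Theta": (3426, 3163),
}


def inferred_share(total, inferred):
    """Fraction of all relationships that were inferred rather than asserted."""
    return inferred / total


shares = {team: inferred_share(*counts) for team, counts in RELATIONSHIPS.items()}

# List the teams from the lowest to the highest share of inferred relationships.
for team, share in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{team:<8} {share:6.1%}")
```

For the values in the table, the share lies between roughly 88% (Kappa) and 97% (Iota), i.e., in every network the large majority of relationships is contributed by the reasoner and rule engine rather than asserted directly by the sensor clients.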

8.2.1 Activities Captured From Email Lists

The email distribution lists provided to the teams have been frequently used by the students and other course stakeholders to broadcast messages to the members of a project team. A survey carried out with the students after the projects estimates that an average of two thirds (66.64%) of all project-related email traffic has been sent or copied to one of the email lists. The rest of the email traffic was sent point-to-point between individual project participants and has therefore not been considered in this study. Nevertheless, this ratio supports the assumption that the emails captured from the email archives constitute a relevant and representative share of the overall exchanged messages, making the archives a valuable source for analysis.

The sensor client that was responsible for scanning the Web-based email archives identified basic attributes of an email such as sender and recipients, as well as the occurrence of attachments and hyperlinks in the email body. To demonstrate the fine-grained exploration capabilities established by the generated TCN structures, Fig. 8.5 quantifies the weekly email traffic in ME310 along three dimensions: 1) the total number of email messages sent to the lists, 2) the number of resources referenced in the message bodies, and 3) the number of file attachments transmitted. The graph shows that email lists as a communication medium were used at a relatively constant rate throughout the course period, disrupted only by two breaks in weeks 8 to 9 and week 21 due to school terms ending and public holidays.

[Chart: weekly amount of emails, hyperlinks, and attachments sent via email lists; x-axis: project week (1 to 35); series: Email, Hyperlinks, Attachments.]

Figure 8.5. Weekly amounts of emails, hyperlinks, and file attachments sent via the project lists during the observed project period.

Breaking the amount of weekly emails down to the individual project lists reveals that all teams frequently use the email lists for information distribution, with the total number of emails varying between 268 and 1,526 (Fig. 8.6). The individual values can be queried from the network resources via simple URL patterns. For example, the number of email messages sent to the distribution lists of project Alpha until a specified point in time () is determined by the size of the ‘Email’ node collection:

GET /graphs/alpha//resources/email.Email

Parallel to that, the number of email attachments is determined by the size of the according node collection:

GET /graphs/alpha//resources/email.EmailAttachment

The number of distinct URLs being shared via emails is determined by querying the networks for all resources that have a ‘web:linkedFrom’ relationship to an email node. Passed as the query parameter to the node collection of a network (cf. Sect. 6.4.2), the following SPARQL clause causes d.store to return the corresponding list of nodes:

?resource web:linkedFrom ?x . ?x rdf:type email:Email
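The meaning of this two-part graph pattern can be made concrete with a small, self-contained sketch that evaluates it over an in-memory list of triples. This is neither the d.store query engine nor a SPARQL implementation; the function `linked_resources` and all node identifiers are invented for illustration.

```python
def linked_resources(triples):
    """Evaluate: ?resource web:linkedFrom ?x . ?x rdf:type email:Email"""
    # Second pattern: collect every node ?x that is typed as an email.
    emails = {s for s, p, o in triples if p == "rdf:type" and o == "email:Email"}
    # First pattern: keep each ?resource that is linked from one of those nodes.
    # A set removes duplicates, so a URL shared in several emails counts once.
    return sorted({s for s, p, o in triples if p == "web:linkedFrom" and o in emails})


# Invented toy network with two emails and one Wiki page.
triples = [
    ("mail:1", "rdf:type", "email:Email"),
    ("mail:2", "rdf:type", "email:Email"),
    ("wikipage:7", "rdf:type", "wiki:WikiPage"),
    ("url:a", "web:linkedFrom", "mail:1"),
    ("url:a", "web:linkedFrom", "mail:2"),      # same URL shared in two emails
    ("url:b", "web:linkedFrom", "mail:2"),
    ("url:c", "web:linkedFrom", "wikipage:7"),  # linked from a Wiki page, not an email
]
```

In this toy network, the pattern matches `url:a` and `url:b` but not `url:c`, because the latter is referenced only from a Wiki page. The number of distinct URLs shared via email is then simply the length of the returned list.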

Aggregating the email traffic captured in 35 weeks of project collaboration results in a total number of 10,486 emails sent to the ME310 team lists, 2,389 disseminated hyperlinks, and 1,956 file attachments. The weekly distribution shown in Fig. 8.5 reveals a relatively constant usage of the email lists with a few noteworthy peaks and lows. Mapping project milestones and delivery deadlines onto the graph reveals an increased level of information transfer directly before the approach of a due date (week 6, 11, 20). Interestingly, the usage of email lists declines shortly after a milestone deadline or at the beginning of a new project phase (week 8, 13, 21). The marginal project activities in week 9, 10, and 21 are attributed to public holidays at the end of an academic quarter. The decrease of email traffic shortly before the end of the projects (week 33) is based on the fact that all members of the global teams were meeting face-to-face in Stanford to finalize their prototypes, largely eliminating the need for electronic communication.

[Chart: total amount of emails on distribution lists per team; x-axis: project week (1 to 35); one series per team (Alpha, Beta, Delta, Epsilon, Gamma, Iota, Kappa, Lambda, Omega, Pi, Theta).]

Figure 8.6. Total amount of emails sent to the project lists of each team.

Another insight into the usage of email as a medium for information sharing comes from the frequency of transmitted file attachments and references to arbitrary resources via URLs provided in the email body. Comparing the total amount of emails sent to the global team lists with the number of attachments and hyperlinks documents the important role that email lists play in the distribution of project-relevant information. The average ratio between the number of emails and the number of information items submitted in the form of attachments and URLs is approx. 1 to 0.6. Evidently, email lists provide a service to transfer information that is not only encoded in the message itself, but is often conveyed by means of multimedia attachments or pointers to resources found on the Web. The message body typically provides additional context for the attached or hyperlinked information, as demonstrated in the following email (Fig. 8.7).

From:
Sent: Thu Jan 26 2008 - 00:28:46 PST
To:
Subject: [] Sound Guidance Video!
Attachments: PIC00010.JPG

Hey Team,
We built a prototype for the sound guidance with the theremin! (proximity sensing of hand to car controls) Check out the videos on the fileserver:

http://wikibox.stanford.edu//Sound%20Guidance%20Prototype/

We’ll be doing some more testing on monday. Let us know what you think.

Figure 8.7. This email was sent by one team member to the rest of the global project team. The message body provides additional context for attached and hyperlinked resources.

A visual network representation of selected nodes and relationships in a TCN is suitable for picturing the complexity of the email-based communication structures. Figure 8.8 shows a high-level visualization of the relationships between email messages, receiving stakeholders, and file attachments, which has been generated from one of the Team Collaboration Networks approximately halfway into the project. Devoid of any details, the representation gives a bird’s eye view of the global email communication structure, showcasing the complexity of the captured data and the underlying team interactions. While the interpretation of such visual network patterns is still unclear, an increased degree of node connectedness can already be an early indicator of active involvement in the communication process or identify communication hubs, which might be relevant for the assessment of the collaboration activities or for the organization of the team as a whole. The ability to quickly generate such insights into the virtual collaboration process ad hoc and at project runtime gives an impression of the platform’s potential to support the evaluation of distributed collaboration processes.

Figure 8.8. Representation of the relationships between email messages (green), attachments (red), and email receivers (blue) captured in one of the projects.

[Chart: Wiki pages per team; x-axis: project week (1 to 35); one series per team (Alpha, Beta, Delta, Epsilon, Gamma, Iota, Kappa, Lambda, Omega, Pi, Theta).]

Figure 8.9. Total amount of Wiki pages created in the projects.
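The degree of node connectedness mentioned in the discussion of the email network (Fig. 8.8) can be approximated very simply by counting how many email nodes each receiver is connected to. The sketch below assumes a list of hypothetical (email, receiver) edges; the names `received_degree`, `hubs`, and all identifiers are invented, and the measure is only one of many possible indicators of a communication hub.

```python
from collections import Counter


def received_degree(edges):
    """Count, for every receiver, the number of email nodes linked to it."""
    return Counter(receiver for _email, receiver in edges)


def hubs(edges, top=1):
    """Return the `top` receivers with the highest degree."""
    return [person for person, _count in received_degree(edges).most_common(top)]


# Invented (email, receiver) relationships taken from a toy network.
edges = [
    ("m1", "anna"),
    ("m1", "ben"),
    ("m2", "anna"),
    ("m3", "anna"),
    ("m3", "carl"),
]
degree = received_degree(edges)
```

In this example, `anna` is connected to three email nodes and would be reported as the most connected receiver. Whether such a node is a meaningful hub in the design process still requires interpretation in the context of the project.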

8.2.2 Activities Captured From the Wiki System

A second sensor client observed the participants’ activities in the provided Wiki system. The server-side files maintained by the system have been processed to notify the d.store platform about the creation, update, and deletion of Wiki pages. The client also scanned the content of the pages for file attachments and hyperlinks to other Web resources in order to create corresponding relationships in the TCN. The user accounts logged by the Wiki system make it possible to link an activity to the responsible person node. This way, the TCN instances preserve information on who has created, edited, and deleted a page.

A first quantification of the Wiki pages that have been created in the different projects reveals an active and continuous usage of the Wiki system throughout all teams. The Wiki system has been employed from day one to collect and provide basic information about the projects. It also frequently served as a repository for team-internal notes, meeting minutes, and documentation. Figure 8.9 visualizes the monotonic growth of the individual project Wiki spaces. This data is retrieved from the Team Collaboration Networks by requesting the collection of ‘wiki:WikiPage’-typed nodes over the course of a project, represented by the following URL pattern:

GET /graphs///resources/wiki.WikiPage

A total number of 191 file attachments has been counted in the eleven Wiki spaces. However, the numbers deviate strongly between the different projects. While two teams have not used the attachment functionality at all (0 attachments), the TCN of a third team lists 138 attachments for its Wiki pages. A potential reason for this variance is that some teams were not familiar with this functionality or preferred different ways of organizing page-related documents.

8.2.3 Activities Captured From WebDAV Folders

Serving as a third data source, a WebDAV sensor client reported file handling activities in the shared project folders. The client processed the server log files for read, create, update, and delete events caused by process participants uploading and downloading files to and from the WebDAV storage. The WebDAV account names assigned to every course participant allowed the d.store system to relate the file handling activities to the responsible person node. Only certain file types were considered by the sensor client. A whitelist included common file extensions for multimedia formats such as images, videos, audio files, animations, CAD drawings, text, and spreadsheets. Other file types such as system files, logs, and program source code have been intentionally excluded from the data acquisition in order to reduce the total number of uploaded files to a comparable set of relatively self-contained sources of human-interpretable information.

At the end of the project phases, the eleven Team Collaboration Networks contained a total number of 9,039 nodes that represented shared files in the public team folders. Figure 8.10 shows the number of resources identified in the WebDAV repositories over the course of the eleven projects. As with the previous domain concepts, the amount of resources provided in WebDAV folders at a given point in time is queried from a TCN instance via the following URL pattern:

GET /graphs///resources/file.File
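The whitelist-based selection of file types described above can be sketched as a simple extension filter. The concrete extensions in `WHITELIST` are an assumption, since the text names only the categories (images, videos, audio, CAD drawings, text, spreadsheets) and not the actual list used by the sensor client; the event tuples are likewise invented.

```python
from pathlib import PurePosixPath

# ASSUMPTION: illustrative extensions only; the sensor's real whitelist is not given.
WHITELIST = {".jpg", ".png", ".mp4", ".mp3", ".dwg", ".txt", ".xls"}


def is_relevant(path):
    """Accept a file if its extension is whitelisted (case-insensitive)."""
    return PurePosixPath(path).suffix.lower() in WHITELIST


# Invented (activity, path) events as they might be derived from a server log.
events = [
    ("create", "/alpha/sketch.JPG"),
    ("update", "/alpha/build/server.log"),
    ("create", "/alpha/src/main.c"),
    ("read", "/alpha/video/test.mp4"),
]

# Only events on whitelisted file types would be reported to the platform.
kept = [(activity, path) for activity, path in events if is_relevant(path)]
```

Here the log file and the source file are dropped, while the image and the video are kept. A whitelist, as opposed to a blacklist, has the advantage that unknown file types are excluded by default.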

[Chart: files in WebDAV folders per team; x-axis: project week (1 to 35); one series per team (Alpha, Beta, Delta, Epsilon, Gamma, Iota, Kappa, Lambda, Omega, Pi, Theta).]

Figure 8.10. Total amount of files shared in the WebDAV team folders.

While all teams utilized the project file storage to a certain extent, the amount of uploaded resources differs strongly between teams. The disparity between the lowest and highest number of file nodes per TCN instance (77 vs. 3,611) indicates that WebDAV-related usage patterns were extremely inconsistent across the teams and may not be meaningful on a purely quantitative level. One potential reason for this heterogeneity is the general applicability and utilization of online document storages in many different project situations (short-term file storage, file transfer, backup and archiving, revision management, etc.). A semantic analysis of the shared documents could complement the structural TCN properties with additional information about the intent and the meaning of the files in the process, allowing for a more precise interpretation of the activities. However, this is often expensive and difficult to accomplish. For this reason, this work abstains from drawing conclusions from the captured WebDAV activities.

8.2.4 Summary

This first quantitative appraisal of selected network dimensions gives a general idea of the captured data and the volume of the generated Team Collaboration Networks. It presents a starting point for a more detailed analysis of the eleven TCN instances. Besides the set of selected network attributes presented in this section, a comprehensive collection of collaboration metrics has been quantified on a per-team level, as well as on individual team member level. The results of these detailed network measurements can be found in Appendix A.

The diversity of network properties queried from d.store demonstrates the platform’s ability to support investigations into complex team collaboration structures. It also reflects the researcher’s curiosity to uncover new data, exploring hidden dependencies, trends, similarities, or correlations in the collaboration behavior of engineering project teams. d.store facilitates question-asking and provides access to non-predefined, objectively measured collaboration metrics on demand. With its help, the characteristics of virtual collaboration structures can be compared and contrasted with other empirical data over time on a fine-grained level.

In the remaining sections, I will give examples for a purposive evaluation of the presented Team Collaboration Network instances. I will first show how the temporal dynamics of virtual collaboration practices in a TCN can be visualized and leveraged to uncover insightful structural conditions in a design process. Secondly, I will identify correlations between patterns in the captured collaboration activities and empirically measured design team performance variables.

8.3 Temporal Variations & Dynamics in Team Collaboration

Central to the design of Team Collaboration Networks and the d.store system is the ability to capture and monitor the dynamics of virtual collaboration processes. How do specific aspects in the virtual collaboration behavior of a team evolve over the course of a project? This section provides examples that show how the temporal dimension captured in a TCN can be leveraged by client applications to visualize and observe significant characteristics in a collaboration process over time.

8.3.1 Individual Participation on Project Email Lists

The email lists of the eleven ME310 teams have not only been used for team-internal message exchange, but have also been utilized by external stakeholders to introduce project-related information to a selected group of students. At the same time, team members commonly included their project list among the recipients when communicating with external process participants, which further increased the number of conversations captured from the archives. The resulting communication patterns give detailed insights into how groups and individuals participate in the email-based information sharing process within and outside team boundaries.

In this example, an analysis client has been developed to measure email-related TCN properties on a daily basis. For each person participating in the email communication of a project, it determines the total number of emails sent and received, the number of replies (emails sent in response to another), the number of shared hyperlinks, and other emailing-related variables. The generated time series is visualized in an interactive bubble chart, representing every project stakeholder as a sphere positioned along two adjustable dimensions (Fig. 8.11).

Figure 8.11. Snapshot of an interactive visualization of the individual participation in email-based project communication. A time slider allows the evolution of emailing parameters to be tracked along multiple dimensions. Each sphere represents a process participant; colors denote participant roles.

Based on the role types assigned to every person node (cf. ‘me310’ ontology, pp. 129), the spheres are color-coded to easily distinguish between students, members of the teaching team, and external contributors such as corporate liaisons. A time slider at the bottom of the chart makes it possible to trace back and track the evolution of each participant’s emailing statistics over the course of the project. The graph in Fig. 8.11 shows the situation for project Theta at the end of the course period. The number of received emails is compared to the number of email replies each individual has sent back to the project lists. As expected, local and remote team members hold the lead in the number of received emails (x-axis), but show different behavior in message replying (y-axis). In this case, the local sub-team outperforms the remote peer students, at least for the set of emails captured on the distribution lists.

Noteworthy is the outstanding responsive behavior of one of the corporate contacts highlighted in the upper right section of the chart (‘Br 512’). The relatively high number of emails received (29) and sent to the list (27 total / 23 replies) towards the end of the project indicates a close and steady involvement of this person in the design process. Findings presented later in this chapter suggest that such active involvement of external domain experts can have a beneficial impact on the overall performance of a design team. The ability to identify the existence or absence of corresponding collaboration structures is therefore desirable and supportive in the evaluation of such processes.

This example has demonstrated how hidden aspects in a collaboration process can be uncovered, processed, and visualized in an interactive and easy-to-consume manner. Using d.store services, the analysis client extracted and aggregated complex properties of a TCN to provide instant access to meaningful collaboration metrics. Enriched with historical data, the client makes it possible to look back in time in order to better assess the current situation and to identify desired or undesired trends in long-running processes.
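The per-person measurements described in this example can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the Email record, its field names, and the sample messages are hypothetical and do not reflect the d.store data model or the analysis client used in the study.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Email:
    """Hypothetical, simplified email record (not the d.store data model)."""
    msg_id: str
    sent_on: date
    sender: str
    recipients: list
    in_reply_to: Optional[str] = None  # msg_id of the email being answered
    urls: list = field(default_factory=list)


def participation_until(emails, cutoff):
    """Cumulative per-person counters for all emails sent up to `cutoff`."""
    stats = defaultdict(
        lambda: {"sent": 0, "received": 0, "replies": 0, "links": 0}
    )
    for mail in emails:
        if mail.sent_on > cutoff:
            continue  # the time slider position excludes later emails
        stats[mail.sender]["sent"] += 1
        stats[mail.sender]["links"] += len(mail.urls)
        if mail.in_reply_to is not None:
            stats[mail.sender]["replies"] += 1
        for person in mail.recipients:
            stats[person]["received"] += 1
    return dict(stats)


# Made-up sample messages, for illustration only.
emails = [
    Email("m1", date(2000, 1, 5), "student_a", ["student_b", "liaison"],
          urls=["http://example.org/spec"]),
    Email("m2", date(2000, 1, 6), "liaison", ["student_a", "student_b"],
          in_reply_to="m1"),
    Email("m3", date(2000, 1, 9), "student_b", ["student_a"],
          in_reply_to="m2"),
]

# One snapshot of the time series, as it would be shown for a slider date.
snapshot = participation_until(emails, date(2000, 1, 6))
```

Calling participation_until once per day yields the kind of time series that a bubble chart with a time slider can replay; any two counters (for example, received emails and replies) can serve as the chart axes.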

8.3.2 Evolution of Project Wiki Spaces

A second example of the temporal inspection of knowledge sharing behavior addresses the Wiki spaces of the eleven observed projects. With a sensor client processing the file structures of the Wiki system, the generated Team Collaboration Networks hold information about the points in time at which Wiki pages have been created, edited, and hyperlinked to other resources of a project. This data allows insights into the structuring and coherence of information stored in the Wiki system.

Graphical animations of the evolving project Wiki spaces have been created with this data gathered from the d.store services. The animations visualize the chronological appearance of Wiki pages and hyperlinked resources as nodes and edges in a network representation, giving a fast-forward impression of the creation and organization of the Wiki spaces. Three examples are shown in Fig. 8.12, illustrating the final network topologies for the Wiki spaces of projects Theta, Alpha, and Gamma.

This form of visualization reveals different patterns of how design teams utilize and organize their Wiki spaces over the course of a project. The networks of the three teams exhibit distinct structures in the way resources are created and related to each other. Theta and Alpha feature a relatively large number of Wiki pages; isolated, i.e., disconnected topic pages are scarce or do not exist. Nodes with a large number of outgoing hyperlinks evidently represent indices to a collection of other resources, forming clusters of related nodes and serving as hubs to other Wiki pages or external information resources. In contrast, the Wiki topology of team Gamma is relatively sparse. The total number of pages is comparatively small, and the connectedness of the resources is less pronounced. Central hubs that simplify access to related information are largely missing. Several Wiki pages are disconnected from the rest of the resources, making it difficult for the team to find or recover information. Comparing the animated evolution of the three Wiki spaces shows a relatively balanced and continuous creation and re-organization of the pages for teams Theta and Alpha, while changes to the Wiki topology of team Gamma occur in bursts.

Significant correlations with the performance of the teams could not be revealed in the analysis of the Wiki spaces. However, selected criteria of a team diagnostic survey conducted after the projects do indicate a relationship between team effectiveness and the observed Wiki structures. The applied survey instrument (based on Wageman et al. (2005), see also next section) determined the degree to which the team used the full complement of member knowledge and skill. The self-reflecting assessment of the following two statements contributed to the overall measure of the process quality: 1) “Members of our team actively share their special knowledge and expertise with one another”, and 2) “Our team is quite skilled at capturing the lessons that can be learned from our work experiences”. The average results for this part of the survey are consistent with the visual impression that the breadth of information shared in a Wiki system and increased connectedness of the resources may contribute to the quality of a knowledge sharing process (the results on a 5-point scale: Theta 4.857; Alpha 4.625; Gamma 3.917). Clearly, the numbers do not allow for generalization, nor are they statistically significant. However, they can serve as an early indication that a well-organized and steadily used system can facilitate information sharing and increase the overall satisfaction in a collaborative learning process. The visualization of clusters and connections further supports the assumption that Wiki structures can reveal meaningful insights into knowledge sharing behavior in a design team and suggests a continued observation of Wiki activities.

[Figure 8.12: Topologies of Project Wiki Spaces. Panels: a) Theta, b) Alpha, c) Gamma. Legend: Wiki Page, Web Resource, Wiki Attachment, WebDAV Resource.]

Figure 8.12. Wiki spaces of projects Theta, Alpha, Gamma. Clustering and general connectedness of wiki pages can be an indicator for the quality of knowledge- and skill-related process criteria.
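The structural properties discussed for the Wiki spaces (hub pages with many outgoing hyperlinks, and pages disconnected from all other resources) can be made concrete with a short sketch. It is an illustration only: the function name, the hub threshold of three outgoing links, and the sample pages are assumptions and are not taken from the study.

```python
from collections import defaultdict


def wiki_topology(pages, links, hub_threshold=3):
    """Summarise a Wiki link graph.

    pages: list of Wiki page names
    links: list of (source_page, target) pairs; a target may be another
           Wiki page or an external resource such as a URL
    hub_threshold: assumed minimum number of outgoing links for a "hub"
    """
    out_degree = defaultdict(int)
    connected = set()
    for source, target in links:
        out_degree[source] += 1
        connected.add(source)
        connected.add(target)
    hubs = sorted(p for p in pages if out_degree[p] >= hub_threshold)
    isolated = sorted(p for p in pages if p not in connected)
    return {"hubs": hubs, "isolated": isolated}


# Made-up Wiki space, for illustration only.
pages = ["Home", "Research", "Prototype", "Notes"]
links = [
    ("Home", "Research"),
    ("Home", "Prototype"),
    ("Home", "http://example.org/datasheet"),
    ("Research", "Prototype"),
]
summary = wiki_topology(pages, links)
```

In this made-up example, "Home" acts as an index page and "Notes" is an isolated page. Counting such pages per team would give a simple numeric counterpart to the visual comparison of the topologies.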

8.4 Performance Correlations

Central to the evaluation of virtual design collaboration is the question whether objectively measurable collaboration metrics support conclusions about the performance of a team or its process. Do the characteristics of the captured group activities provide indicators for the quality of the project outcome? Finding appropriate correlations in the structure of the eleven Team Collaboration Networks would further demonstrate the value of the presented monitoring approach and provide first evidence for computationally measurable team performance indicators. This section presents the results of statistical analyses, which have been performed to identify such correlations between the online communication behavior of the observed design teams and independently surveyed team performance variables.

8.4.1 How Team Performance was Measured

Objective measurements of team performance are particularly difficult to achieve in a design context, because the “definition and measurement of design performance is elusive, analogous to the adage, ‘beauty is in the eye of the beholder’; solutions that might seem trivial to one person could appear profound to another” (Skogstad, 2009). Performance can relate not only to the outcome of the design process, but also to the process itself. Team members can have drastically different estimations of the performance than their managers. Currently there is no agreement on a general construct or variable that defines and measures the success of design projects (Skogstad et al., 2009).

However, measuring design performance is a key and ongoing issue for design researchers, whose approach is based on empirical research comparing different projects. While traditional metrics such as costs (budget, time, resources, etc.), viability, or customer satisfaction can be applied to assess the performance of a single team or to compare projects of identical nature, the comparison between substantially different design tasks lacks a common denominating measure. To overcome this design researchers’ dilemma, this study considers multiple indicators of performance that were tracked from the perspectives of different stakeholders.

Skogstad (2009) has collected different performance measures in the observed ME310 teams, which are used in this work as dependent variables in the testing of correlations with patterns in the Team Collaboration Networks. The author describes the different performance indicators and what they measure as follows:

Self-reported Design Process Performance: A survey was conducted with the students after the completion of the projects to provide a measurement of the design process quality from the designers’ point of view. Building on an established team diagnostic instrument (Wageman et al., 2005), the questions “provide a measure of three performance-relevant aspects of teamwork, which are controlled by the team and its members. The aspects are a) the quality of team task processes – a measure of team effectiveness, b) the satisfaction with within-team relationships – a measure of the willingness to work together again in the future, and c) the individual affective reactions to the team and its work – a measure of the individuals learning and well-being” (Skogstad, 2009). The survey accounts for the fact that “design performance must include more than just the project result, because no organization will survive if the designers are consistently unsatisfied” (ibid.). An average of the students’ scores was calculated for each team to get a measure for the overall team satisfaction.

External Judges Assessing Output Performance: An evaluation session was conducted with domain experts who “reviewed two-page project summaries created by the designers and assessed the designs from the perspectives of a) an investor, b) a user, and c) a gadget lover. [...] The judges were selected based on familiarity with the structure and goals of the course to ensure that they could evaluate the designs based on the organizational circumstances under which they were created”. It was further ensured that “the judges did not have prior knowledge of the details of the projects or the design teams so that they were not biased by the evolution of the design” (ibid.).

Number of Prototyping Activities by the Team: All design teams are required by the ME310 curriculum to document their design process and findings in three cumulative reports. “The documents show the evolution of the design from the initial problem brief to the final functional prototype. They include descriptions of the ideas considered, prototypes built and tested, and the decisions made by the design team” (ibid.). These documents were analyzed and coded quantitatively by multiple reviewers, who independently reviewed the design development sections of the eleven final project reports to call out instances of prototyping activity by the team. It can be expected that the reports are comparable because “all documents are written based on the same instructions and template and the designers receive intermediate feedback from the same graders” (ibid.). The counting of prototyping activities accounts for the generally accepted hypothesis that teams who explore and test more design alternatives are more likely to yield better results.

8.4.2 Finding Dependencies in the Captured Group Activities

Using the described performance ratings for the eleven teams, the study now sets out to uncover correlations in the captured collaboration activities by means of linear regression (for linear regression analysis see, e.g., Backhaus et al. (2008), pp. 51). Statistical dependency is tested for the eleven cases between one of the performance measures (as the dependent variable) and the occurrences of specific patterns in the set of Team Collaboration Networks (as an independent variable). The analysis considers network patterns that can be interpreted as potential indicators for applied design thinking principles or ‘designerly ways of interacting’ in a virtual collaboration environment. These principles comprise a) the constant involvement of end-users and customers, b) interdisciplinary teamwork and knowledge sharing, and c) a culture of prototyping (cf. Sect. 2.1.3). Directed by these core values, the study diagnoses patterns in the networks that may reflect corresponding behavior in virtual design collaboration. These are in respective order: email conversations between the team and external process participants (Sect. 8.4.3), the number of URLs shared via team lists (Sect. 8.4.4), and email conversations between the team and a coach (Sect. 8.4.5).

8.4.3 Correlations with External Communication Activities

There is general consensus that a user-centric and ‘outside-in’-driven design approach can be conducive to the creation of new and innovative concepts. Teams are well advised to become adept in project-relevant fields of expertise by internalizing external knowledge, i.e., by talking to customers, end-users, and domain experts. Therefore, the degree to which a team involves and interacts with team-external stakeholders presents an interesting aspect in the observation of design processes. The Team Collaboration Networks created in this pilot application make it possible to monitor this type of interaction in email-based communication activities. Which patterns in the network structures suggest that a team tends towards a close involvement of external contacts as insightful sources of information?

A good starting point is the number of external emails that were captured in the email archives of the eleven teams. External emails are those messages that have been sent by one of the team members to at least one recipient who is not part of the local or global sub-team (i.e., team-external). The collection of external emails can be requested from a TCN instance by passing the following SPARQL clause as the query parameter to the d.store server (Listing 8.1, cf. Sect. 6.4.2). Note that the ?resource variable has been predefined by the d.store platform to match those resources that are of type ‘dstore:Resource’.

    ?resource email:sender ?x .
    ?x rdf:type me310:Team .
    ?resource email:recipient ?y .
    ?y rdf:type me310:Non-Team

Listing 8.1. Querying emails that have been sent from team members to at least one team-external person.

To put the number of external email messages that left team boundaries in relation to the overall email traffic produced by a team, it is compared to the total number of internal messages. Internal messages are those that are sent by one team member to other team members only, i.e., those that do not address any person outside the team boundaries. The following SPARQL clause defines a corresponding filter to retrieve all internal emails from the set of network nodes (Listing 8.2).

    ?resource email:sender ?x .
    ?x rdf:type me310:Team .
    OPTIONAL {
        ?resource email:recipient ?y .
        ?y rdf:type me310:Non-Team } .
    FILTER (!bound(?y))

Listing 8.2. Querying emails that have been sent from team members to other team members only.
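The distinction drawn by Listings 8.1 and 8.2 can also be expressed procedurally. The following Python sketch is an illustration under stated assumptions: emails are reduced to hypothetical (sender, recipients) pairs, and every person who is not a team member is treated as team-external, which may be coarser than the role types of the ‘me310’ ontology.

```python
def classify_emails(emails, team_members):
    """Split emails sent by team members into external and internal ones.

    emails: list of (sender, recipients) pairs (hypothetical representation)
    team_members: names of the local and global sub-team members
    """
    team = set(team_members)
    external, internal = [], []
    for sender, recipients in emails:
        if sender not in team:
            continue  # both listings only match emails sent by team members
        if any(person not in team for person in recipients):
            external.append((sender, recipients))  # cf. Listing 8.1
        else:
            internal.append((sender, recipients))  # cf. Listing 8.2
    return external, internal


# Made-up messages, for illustration only.
team = ["a", "b", "c"]
emails = [
    ("a", ["b", "c"]),
    ("a", ["b", "liaison"]),
    ("liaison", ["a"]),
    ("b", ["a"]),
    ("b", ["c"]),
    ("c", ["coach"]),
]
external, internal = classify_emails(emails, team)

# Relation of external to internal messages for this made-up team.
ratio = len(external) / len(internal)
```

An email with at least one team-external recipient counts as external even if team members are also addressed, which matches the "at least one" wording of Listing 8.1; all remaining emails from team members are internal.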

The ratio of a team’s external emails to its internal emails defines our first independent variable in testing the correlation with the average satisfaction of the team members. The results of a linear regression analysis show that a positive dependency exists between these two values. The ratio of external to internal emails correlates positively (Beta = 0.534) and significantly (R² = 0.285, p = 0.09) with the average team member satisfaction. This suggests that the accentuation of interactions with people outside the team boundaries (in contrast to team-internal discourse) is likely to produce greater satisfaction with the final project outcome. The correlation plot is visualized in Figure 8.13. Details of this linear regression analysis can be found in Appendix B.

The quality of the prediction can be further increased when additional independent variables in the communication behavior are incorporated into the model. In a second analysis, the number of emails a project team has received from the teaching team is factored into the regression. With this second dimension added to the model, the coefficient of determination grows noticeably (R² = 0.479, p = 0.07). The negative standardized coefficient for the new variable (Beta = −0.481) indicates that ME310 teams who receive more emails from the teaching team are statistically less satisfied with their project. A plausible interpretation of this phenomenon is that the teaching team intervenes remedially when defects or conflicts arise in the collaboration process. The accuracy of the performance values predicted by this model is illustrated in Fig. 8.14.
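For readers less familiar with the reported statistics, the following Python sketch shows how the standardized coefficient (Beta) and the coefficient of determination (R²) of a single-predictor linear regression relate to each other. In that case Beta equals the Pearson correlation coefficient and R² equals its square; the values reported for the first model are consistent with this identity (0.534² ≈ 0.285). The data points below are made up for illustration and are not the study’s measurements; the p-value is not computed here.

```python
from math import sqrt


def simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Standardized coefficient; with one predictor this is Pearson's r.
    beta = sxy / sqrt(sxx * syy)

    # Coefficient of determination computed from the residuals.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / syy

    return {"slope": slope, "intercept": intercept,
            "beta": beta, "r_squared": r_squared}


# Made-up values: ratio of external to internal emails vs. satisfaction score.
ratios = [0.1, 0.2, 0.4, 0.5, 0.8]
satisfaction = [3.0, 3.6, 3.4, 4.1, 4.4]
fit = simple_regression(ratios, satisfaction)
```

With more than one independent variable, as in the second model, the standardized coefficients no longer square to R²; each Beta then describes the contribution of one variable while the others are held constant.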

8.4.4 Correlations with the Number of Shared Resources

[Figure 8.13: scatter plot “Outbound Email Messaging (Proportional) vs. Team Satisfaction”. x-axis: Ratio External to Internal Email Messages; y-axis: Average Team Member Satisfaction; R² = 0.285.]

Figure 8.13. The proportional amount of outbound emails (compared to team-internal messages) sent by a team correlates positively (Beta = 0.534) and significantly (R² = 0.285, p = 0.09) with the average team member satisfaction.

The segmented and concurrent nature of engineering design tasks requires that knowledge generated in one part of the team gets documented and shared with the rest of the group. Distributed teams need to form consensus and a common understanding during the collaborative creation of new concepts. This applies even more to interdisciplinary team collaboration and a user-centered, ‘outside-in’ design thinking approach, which is only possible if an active dissemination of insights, opinions, and different ideas is taking place in the team.

The available Team Collaboration Networks reflect certain aspects of how a group interacts, shares information, or calls attention to relevant resources using email, Wikis, or shared folders. In the following analysis, the study explores how email distribution lists are used by the teams to identify and distribute relevant sources of information on the Web. Hyperlinks (i.e., URLs) included in the body of an email message serve as indicators for the author’s intent to brief the rest of the design team on a pertinent piece of information contained in the referenced document (cf. examples in Sect. 7.1.1 and 8.2.1). The number of references a team has sent to the list provides a rough measure for the breadth of investigation, fact finding, idea generation, or solutions being documented and shared with others. To test the effects of resource sharing in a team, the number of nodes in a TCN that have a ‘web:linkedFrom’ relationship (cf. Sect. 7.1.1) to at least one email sent by a team member is determined as an independent variable. The following query statement was executed to request these resources from the d.store networks (Listing 8.3).

[Figure 8.14: chart “Actual vs. Predicted Team Satisfaction” with the series Actual and Predicted. x-axis: Project Teams (Sorted by Predicted Value): Kappa, Beta, Lambda, Gamma, Delta, Alpha, Epsilon, Iota, Pi, Omega, Theta; y-axis: Average Team Member Satisfaction.]

Figure 8.14. The average team member satisfaction correlates positively and significantly with variables in the online interaction behavior of the teams. A positive tendency towards email-based interaction with team-external process participants in comparison to internal discourse can partially predict performance (R² = 0.479, p = 0.07).

    ?resource web:linkedFrom ?email .
    ?email email:sender ?x .
    ?x rdf:type me310:Team

Listing 8.3. Querying Web resources referenced in at least one email that has been sent by a team member.
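The independent variable requested with Listing 8.3 is the number of distinct Web resources referenced in emails from team members. A rough equivalent on raw message bodies can be sketched in Python as follows. The URL pattern is a deliberately simple assumption and does not reproduce how the d.store sensor clients extract hyperlinks; the sample emails are made up.

```python
import re

# Simplified URL pattern (an assumption, not the extraction rule of d.store).
URL_PATTERN = re.compile(r"https?://[^\s<>()]+")


def distinct_shared_urls(emails, team_members):
    """Collect the distinct URLs found in emails sent by team members.

    emails: list of (sender, body) pairs (hypothetical representation)
    """
    team = set(team_members)
    urls = set()
    for sender, body in emails:
        if sender not in team:
            continue  # only emails sent by team members are considered
        for match in URL_PATTERN.findall(body):
            urls.add(match.rstrip(".,;:!?"))  # drop trailing punctuation
    return urls


# Made-up messages, for illustration only.
emails = [
    ("a", "See http://example.org/motor-specs, and https://example.org/patent."),
    ("b", "Same link again: http://example.org/motor-specs"),
    ("liaison", "FYI http://example.org/external-only"),
]
shared = distinct_shared_urls(emails, ["a", "b"])
```

Using a set ensures that a resource referenced in several emails is counted once, which corresponds to counting resource nodes rather than individual hyperlink occurrences.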

It turns out that the output performance measured by the judges (Skogstad, 2009) can be linearly regressed on the number of referenced resources in a team. The number of distinct URLs shared on the distribution lists correlates positively (Beta = 0.514) and significantly (R² = 0.265, p = 0.1) with the output performance in the available data set (Fig. 8.15). This dependency supports the assumption that a larger breadth of information considered in a team has a beneficial impact on the design output quality.

8.4.5 Correlations with Coach Engagement

The coaches that have been assigned to each team are not involved in the grading and review process and thus provide a trusted and unbiased source of advice and domain knowledge to the designers. Their primary task is to support the teams and to help them find a solution to the given design problem. Their feedback can bring new impetus and offer new perspectives and alternatives to the design process.

[Figure 8.15: scatter plot “Unique URLs Shared Within Design Team vs. Output Performance”. x-axis: Total Number of Unique URLs shared within Design Team via Email; y-axis: Design Output Performance measured by External Judges; R² = 0.265.]

Figure 8.15. The total number of distinct URLs shared within design teams correlates positively (Beta = 0.514) and significantly (R² = 0.265, p = 0.1) with output performance, suggesting that breadth of shared information impacts performance (Skogstad, 2009).

Therefore, the involvement of coaches in the virtual interactions of the ME310 teams presents an interesting subject for investigation. In the following study, the number of emails a team has received from its coaches is considered as an indicator for the level of support the designers received along their path. Coach emails were retrieved from a Team Collaboration Network through the following query (Listing 8.4).

    ?resource email:sender ?x .
    ?x rdf:type me310:Coach

Listing 8.4. Query clause to determine the emails that a team has received from its coaches.

Given the number of coach emails as an independent variable in the network instances, it has been tested whether this number correlates with the number of prototyping activities identified in the final project reports. Prototyping is a sign of testing design alternatives and of a continued exploration of the solution space. Each prototype brings new insights and lowers the probability of costly failures identified too late in the engineering process. Prototyping thus indicates design progress and should therefore be maximized. Linear regression analysis shows that the level of coach engagement captured on the email lists is significant for the creation of prototypes (Fig. 8.16). The number of coach emails correlates positively (Beta = 0.816) and significantly (R² = 0.666, p = 0.01) with the total number of prototyping activities undertaken by design teams, suggesting that interactions with the coaches stimulate the development of new prototypes.

[Figure 8.16: scatter plot “Emails from Coaches vs. Prototyping Activities”. x-axis: Number of Emails from Team Coaches; y-axis: Number of Prototyping Activities; R² = 0.666.]

Figure 8.16. The number of coach emails correlates positively (Beta = 0.816) and significantly (R² = 0.666, p = 0.01) with the total number of prototyping activities undertaken by design teams, suggesting that coaches have a positive impact on prototyping (Skogstad, 2009).

8.5 Summary of Findings & Critical Discussion

This chapter presented a pilot application of the d.store platform, in which the virtual collaboration activities of eleven distributed teams in a project- based engineering design curriculum were captured. The Team Collabora- tion Networks generated in this study have been evaluated from different perspectives to demonstrate how the system can be exploited to unveil hidden traits in the monitored collaboration processes. The chapter started with an introduction of the ME310 testbed and the monitored groupware infrastructure. A quantitative appraisal of the col- lected data looked at the general usage patterns of the three observed com- munication channels (email lists, Wiki, WebDAV folders) from a high-level perspective. Exploring the temporal variations and dynamics in the online activities has given further examples for the visualization of trends and the evolution of the collaboration structures over the course of a project. Fi- nally, a statistical analysis was conducted to identify correlations between the Team Collaboration Networks and performance measures collected in 150 A Pilot Application in Engineering Design the eleven groups. The different insights and findings that could be gained from this application are summarized below. 1. Ease of Implementation. Though being subjective, the effort re- quired to setup the d.store platform in the ME310 testbed and to answer a diverse set of questions in the analysis of the generated networks can be considered relatively low. The development of the client applications was straightforward and accelerated by the resource-oriented nature of the platform and standard Web technologies such as HTTP and JSON. The d.store services decouple the clients from data management tasks and allow to select the programming languages and environments that suit best to the prevailing data collection or analysis task. 
The experi- ences collected in this pilot application support the desirability of the presented monitoring approach and suggest the d.store platform as a promising tool in the observation of virtual collaboration practices. 2. Significance of Email Lists. The amount of email messages captured in the TCNs documents the popularity and usefulness of distribution lists in separated groups. Despite of the large number of alternative communication channels and the designers’ awareness that archives are being monitored for scientific evaluation, a total number of 10,486 emails could be captured from the project lists. According to estimates from the team members, this corresponds to approx. 66.64% of the over- all amount of email messages that were sent in the projects. Obviously, email distribution lists provide an applicable medium to facilitate in- formation sharing and the computational monitoring of collaboration processes. 3. Shared Resource Context is Lost in Email Archives. The signi- ficant impact of email lists on a groups’ information sharing behav- ior becomes apparent in the interconnectedness of email nodes in the TCNs. Due to the broadcasting of attachments and hyperlinks, the amount of captured email messages that show direct relationships to other information resources was in a ratio of approx. 1 to 0.6. With more than every second email providing supportive context for a shared doc- ument, Web page, etc., it is reasonable to assume that the email lists were among the most critical tools for the ad-hoc dissemination of con- text and project-relevant resources. However, their utilization raises concerns about the designer’s ability to recall that context information later on. Information often gets lost in personal mailboxes or archives when hyperlinked resources are accessed in isolation of the message. 
Without back-references to referring resources or persons, the compre- hension and understanding of shared material and its history can be hindered at a later stage due to missing context information. This re- veals the potential and benefit that d.store services can bring to next- 8.5. Summary of Findings & Critical Discussion 151

generation groupware systems. These can be used to retrieve contextual information, in particular referencing resources, related persons, and temporal attributes, thus helping to preserve context and to achieve common ground. 4. Information Spaces Quickly Grow in Size and Complexity. The quantification of Wiki pages and WebDAV files (Figs. 8.9 and 8.10) bared a gradual increase of the number of digital information resources that are propagated in a project. Obviously, resources are only rarely deleted or removed from a public location once they have been shared. It can be concluded that online information spaces tend to continuously grow in size and become increasingly complex as a project progresses. Appropriate tools are needed to support in the organization and uti- lization of the available information load. The d.store platform can be supportive in this task by capturing, processing, and providing seman- tic annotation to the shared resources, thus helping to keep track of team activities and progress. 5. Wiki Structures may Indicate Effective Team Learning. A visu- alization of the emerging structures in the Wiki spaces highlighted dif- ferences in the way teams organize and utilize their Wiki over the course of the projects. Although no statistically relevant conclusions about the quality of the collaboration process could yet be derived from these structures, measures of a team diagnostic survey conducted afterwards support the visual impression that an interlinked and constantly up- dated Wiki space has beneficial impact on the team member’s satis- faction with knowledge and skill-related criteria of their work. This suggests the the Wiki structures formed by the teams may indicate whether members effectively share their special knowledge and exper- tise with one another and motivates continuing research in this area. 6. Indicators for Problems in Groupware Use. 
The total number of Wiki page attachments counted in the eleven TCN instances differed strongly between the teams. While some teams made use of this functionality extensively (138 attachments), others did not use it at all (0). It is unclear whether the partial omission is due to page attachments not being considered useful in some projects, or because students were simply not familiar with this feature. In either case, the strong variance indicates that a widespread utilization of certain groupware services is hindered or undesirable. Hence, in order to eliminate the risk of drawing incorrect conclusions, process participants should be appropriately trained in using the monitored collaboration infrastructure.

7. Low Interpretability of Multi-Purpose Groupware Activities. The WebDAV folders have proven to be a helpful medium to share, archive, and access various kinds of electronic documents. But the general applicability and utilization of online storage presents a drawback in the interpretation of the captured folder activities. Not every WebDAV operation is necessarily a sign of interaction with other process participants or part of a collaborative process. A differentiation between activities that contribute to information sharing and those that result from rather personal and isolated file handling tasks (e.g., temporary storage or file transfer) is necessary, but was not practicable in this study. Future investigations should take this potential lack of resolution into account when monitoring the collaborative use of such versatile applications in a design context.

8. Online Collaboration Metrics Indicate Team Performance. A statistical analysis of the generated Team Collaboration Networks revealed dependencies between the captured online activities and empirically surveyed team performance measures. Correlations in the network structures were found with respect to the average satisfaction of the designers, the output performance assessed by external judges, and the number of performed prototyping activities. The results of the regression tests are summarized in Table 8.3.

The pilot application has demonstrated the feasibility and desirability of utilizing the d.store services in a distributed project environment. The presented evaluations have exemplified how concurrent design processes can be compared to produce quantifiable results on the usage of the groupware infrastructure and the information sharing behavior. Of course, the findings do not allow for generalization, because external validity of the data is not given. Especially the lack of a common and universal measure for team performance hinders a generalization of the correlations identified above. Also, the application was performed in the context of an engineering curriculum, which further calls into question the validity of the results outside of education, i.e., in an industrial setting.
Nevertheless, the ME310 projects closely resemble industry processes and work on challenges given by real industry partners. In this setting, the d.store platform has provided objective and comparable measures for how teams interacted through virtual collaboration channels in a non-artificial engineering design setup. The results are encouraging in that they highlight interpretable characteristics in the collaboration behavior of the distributed teams. More applications of this kind are needed to test the validity of the findings and to better understand the meaning behind objective, virtual collaboration patterns.
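To make the kind of relationship reported in Table 8.3 more concrete, the following minimal sketch computes a standardized regression coefficient (Beta) and R² for a single predictor. It is an illustration only: the function name `simple_regression` and all data values are hypothetical and are not the ME310 measurements, and the sketch does not claim to reproduce the exact regression procedure used in the study. With one predictor, the standardized coefficient equals Pearson's correlation and R² is its square, which is roughly consistent with the single-predictor rows of the table (e.g., 0.82² ≈ 0.67).

```python
from math import sqrt


def simple_regression(x, y):
    """Return (standardized beta, R^2) for a one-predictor least-squares fit."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equally long samples with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    # With a single predictor, the standardized coefficient equals Pearson's r.
    beta = sxy / sqrt(sxx * syy)
    return beta, beta ** 2


# Hypothetical values for eleven teams (NOT the data collected in the study).
ext_int_email_ratio = [0.10, 0.25, 0.15, 0.40, 0.30, 0.05, 0.35, 0.20, 0.45, 0.12, 0.28]
avg_satisfaction = [3.1, 3.6, 3.0, 4.2, 3.4, 3.3, 3.9, 3.8, 4.0, 2.9, 3.5]

beta, r_squared = simple_regression(ext_int_email_ratio, avg_satisfaction)
print(f"Beta = {beta:.2f}, R2 = {r_squared:.2f}")
```

With only eleven observations per test, such coefficients are sensitive to individual teams, which is one reason the findings are treated as indicators rather than generalizable results.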

Table 8.3. Summary of findings from testing relationships quantitatively at the design team level.

Dependent variable: Avg. team-member satisfaction
Independent variable: Ratio of ext. to int. emails
Finding: The proportional amount of external emails sent by a team correlates positively and significantly with the average team member satisfaction (Beta = 0.53, Sig. = 91%, R² = 0.29).
Interpretation: Teams who frequently engage with external contacts (outside-in) are more satisfied with their projects.

Dependent variable: Avg. team-member satisfaction
Independent variables: 1) Ratio of ext. to int. emails; 2) emails from teaching team
Finding: The proportional amount of external emails combined with the number of teaching team emails correlates even more significantly with the average team member satisfaction (Sig. = 93%, R² = 0.48).
Interpretation: Teams who frequently engage with external contacts (outside-in) and minimize intervention from the teaching team are more satisfied with their projects.

Dependent variable: Output performance
Independent variable: Resources referenced on email lists
Finding: The total number of URLs shared by team members correlates positively with the output performance (Beta = 0.51, Sig. = 90%, R² = 0.27).
Interpretation: Teams whose members actively share information with each other generate better output.

Dependent variable: Prototyping activities
Independent variable: Emails from coaches
Finding: The number of emails a team has received from coaches correlates positively with the counted prototyping activities (Beta = 0.82, Sig. = 99%, R² = 0.67).
Interpretation: Frequent engagement with domain experts encourages teams to explore more design alternatives.

9 Conclusion

The monitoring of collaboration activities, resources, and participants in virtual team environments such as distributed engineering design presents both a challenge and an opportunity in research and industry. This work presents a common foundation for computer-supported design observations that facilitates the collection, processing, and analysis of virtual team collaboration. By enabling more efficient data collection and analysis tools, the work positions itself as a gateway to a deeper understanding of collaboration practice in ICT-enabled design teams.

In particular, the work has presented new methods: Team Collaboration Networks allow collaborative activities to be described in the form of relationships between members and resources over time. Secondly, the work has developed new monitoring applications: the d.store platform establishes an extensible, resource-based client-server system to capture and explore online collaboration activities at project-time and on a fine-grained level. Finally, the work has documented and shared experiences gained from monitoring industry-like design processes: an analysis of the early-stage collaboration activities captured in the Team Collaboration Networks of eleven engineering design teams highlighted differences and similarities in groupware use and identified certain collaboration patterns as potential performance indicators.

In this concluding chapter, the contribution of this work is summarized and discussed. The thesis closes with a consideration of legal aspects and recommendations for monitoring and conducting distributed engineering design processes in organizations.

9.1 Contribution

The contribution of this dissertation is twofold. First, the research presents an adaptive, service-based solution for monitoring multi-channeled online collaboration activities of project teams in near-realtime. The system is based on Team Collaboration Networks, a data model to describe semantic relationships between participants and resources over the course of a collaboration process. Second, the dissertation generates new insights into the work of virtual teams by analyzing Team Collaboration Networks that were generated during the collaborative efforts of eleven distributed engineering design projects. Hence, the results of this research work provide stimulating input to both design practice and design theory.
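The idea of describing relationships between participants and resources over time can be sketched in a few lines. The following is a simplified illustration under stated assumptions, not the d.store data model: the class names, the predicate vocabulary ("sent", "references", "edited"), the identifiers, and the dates are all hypothetical, and plain Python objects stand in for the ontology-based representation described in the earlier chapters.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Relation:
    """One timestamped edge: subject --predicate--> obj (names are hypothetical)."""
    subject: str
    predicate: str
    obj: str
    timestamp: datetime


class TeamCollaborationNetwork:
    """Toy container for timestamped relations between actors and resources."""

    def __init__(self):
        self._relations = []

    def add(self, subject, predicate, obj, timestamp):
        self._relations.append(Relation(subject, predicate, obj, timestamp))

    def between(self, start, end):
        """Relations captured in the half-open interval [start, end)."""
        return [r for r in self._relations if start <= r.timestamp < end]

    def resources_of(self, actor):
        """Resources that the given actor is directly related to."""
        return sorted({r.obj for r in self._relations if r.subject == actor})


tcn = TeamCollaborationNetwork()
tcn.add("person:alice", "sent", "email:42", datetime(2009, 1, 12, 9, 30))
tcn.add("email:42", "references", "wiki:Prototype", datetime(2009, 1, 12, 9, 30))
tcn.add("person:bob", "edited", "wiki:Prototype", datetime(2009, 2, 3, 14, 0))

january = tcn.between(datetime(2009, 1, 1), datetime(2009, 2, 1))
print(len(january), tcn.resources_of("person:bob"))
```

Keeping the timestamp on every relation is what allows both retrospective queries over a time window and a live view of the most recent activity from the same structure.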

9.1.1 Contribution to Design Practice: A System for Virtual Team Monitoring

The work presents a solution to a problem rooted in the discrepancy between the informality of early-stage engineering processes and the formality requirements in their computational analysis. The tools and techniques introduced in this work support design researchers and practitioners in studying the ever-widening variety of technology use in distributed project teams. With Team Collaboration Networks introduced in Chapter 4, the interactions, resources, and activities during project-based collaboration are represented in a formalized, unambiguous, and chronological manner. It has been further shown how a system of Team Collaboration Networks is formulated to concurrently organize the temporal and semantic properties of actors and information resources in multiple design projects at a time (Chapter 5). A prototypical tool implementation of this concept, the d.store platform, decouples the data collection process from the analysis and allows for automated and non-interfering team observations (Chapter 6).

Several properties distinguish the d.store architecture from previous work. The system is extensible with regard to the groupware and communication channels being monitored and the analysis procedures being performed on the collected records. The platform supports both the ex-post analysis of historical facts and trends in the collaboration and a live observation of ongoing collaboration activities as they are captured by the system. The resource-oriented service interface simplifies the distributed and automated recording of multi-channeled activities and expedites the unobtrusive integration of collaboration monitoring into existing and future project settings. Building on open Web standards, the system integrates well into existing work environments and provides low-barrier access to structured information about how teams interact and share information. As such, the system establishes a new, applicable, and customizable foundation for real-time team diagnostics and the systematic exploration, quantification, and comparison of virtual collaboration processes. The general idea of a resource-oriented monitoring system as well as details of its implementation have been published (Uflacker and Zeier, 2008a,c; Uflacker, 2007).

9.1.2 Contribution to Design Theory: Findings from Conceptual Design Team Observations

The work has demonstrated that the d.store system is customizable and adaptable to realistic environments in order to capture virtual team activities in distributed collaboration (Chapter 7). Domain ontologies have been provided that define a set of concepts, relationships, and logical constraints for common groupware settings, including email, shared folders, and Wiki-based information sharing. Future research can make use of and extend these ontologies in the continued analysis of collaboration activities.

The data that has been collected during the d.store pilot application gives new insights into the collaboration practices of engineering teams in the early stages of conceptual design. Based on this data, the work has identified quantifiable indicators that high-performance design teams exhibit significantly different interaction signatures than lower-performing teams (Chapter 8). The findings suggest that teams generally perform better when they put emphasis on the involvement of team-external participants (e.g., end-users, customers, coaches) and on the sharing of information and knowledge. This supports two fundamental assumptions in design research: first, that performance-relevant process characteristics reside in the way distributed teams communicate and share information, and second, that the application of fundamental design thinking principles in the early stages of engineering projects has a beneficial impact on the design outcome. With this first application of a computational monitoring system and the evaluation of objective collaboration metrics with regard to design team performance, the work initiates further investigations to validate the findings and provides a basis for future design research activities.
The pilot application and the results of the analysis have been presented and published in a journal article (Uflacker and Zeier, 2010b), international conferences (e.g., Uflacker and Zeier, 2009; Uflacker et al., 2009), and in a book chapter on design thinking research (Uflacker and Zeier, 2010a).

9.2 Discussion

Virtual collaboration monitoring promises to deliver new insights into performance-relevant aspects of complex design processes, eventually leading to increased awareness of team activities and improved project outcome. However, the observation of long-running and especially distributed collaboration processes in ICT-enabled team landscapes is a challenging task. The acquisition of realistic data for research purposes is often hindered not only by costs and the complexities of today’s collaboration environments, but also by strict regulations that are prevalent in many industrial settings.

With the developed monitoring instrument, this work has made a first step to lower the barriers for researching digitally-enabled design teams. It presents a generic approach and a technological foundation to concurrently capture and analyze multiple streams of communication and interactions in ICT-mediated groups. The approach is fully transparent in the sense that it does not impact the normal workflow of process participants. The resource-oriented architecture aligns well with today’s and tomorrow’s collaboration environments that shift more and more into the World Wide Web. With the social and collaborative components of Web 2.0, and the increasingly popular formal description of resource semantics in the so-called Semantic Web, a computer-processable description of collaboration activities in the form of Team Collaboration Networks and OWL presents a logical next step.

A pilot application in engineering design has served as a first example for the integration of this approach into a realistic project setting. It has demonstrated the use of the d.store platform as a real-time diagnosis instrument and how it can be leveraged to retrieve statistics, to extract contextual and temporal information, and to generate charts and visualizations.
The ability to quickly answer very specific questions about distributed, multi-channel collaboration activities displays the great value that the generated semantic layer in the form of Team Collaboration Networks can bring to existing and future design observations.

The project teams observed in this pilot application were situated in an academic testbed curriculum. However, the structure of the teams closely resembled real-world projects and can be considered very similar to those found in industry. Even though the teams were working on different projects, they are comparable in terms of size, distribution, budget, and timelines: a situation that is rarely found in professional settings. Still, in order to collect data and to validate findings externally, the implementation of the d.store platform in real industrial settings is suggested and necessary.

With the real-time monitoring capabilities of the d.store platform and the results from its pilot application, the instrument establishes new possibilities to support design teams and managers, completing the symbiosis of design practice and design research outlined in Fig. 1.1. With a more precise understanding of performance-relevant indicators, a better awareness of the current design situation can be achieved. This creates new starting points for the development of improved management tools and dashboards in conceptual design practice.

Recent studies already build on the results of this dissertation. Skogstad (2009) has developed new theory about how designers gain insights needed to create novel solutions and how reviewers can have both positive and negative effects on the design process. Parts of his hypothesis testing are grounded in design team interactions that have been captured and analyzed with the help of d.store. This shows that the developed instrument provides a useful basis for achieving insights into performance-relevant properties of virtual collaboration processes.
Future research may build on this technological foundation and contribute new findings through additional case studies and improved system implementations.

9.3 Legal & Moral Aspects

This work has demonstrated that monitoring a team’s online activities can provide valuable input for researching and controlling communication patterns, resources, and participants’ involvement over the course of a project. However, the automated capture and analysis of employees’ activities touches legal and moral aspects, as public discussions and prominent law cases illustrate. Especially the emergence of new communication channels, such as social networks, Wikis, and forums, has reactivated the debate on monitoring online communication inside an organization. To control this debate, several laws have been enacted within the USA and the EU.

9.3.1 Legislation on Monitoring Employee Communication

The monitoring of employee activities and stored materials is the subject of intense legal scrutiny. Specifically, Title III of the US Omnibus Crime Control and Safe Streets Act of 1968 and the Electronic Communications Privacy Act of 1986 (ECPA) outlaw the interception, use, or disclosure of protected wire, oral, and electronic communications. However, the laws generally indicate that monitoring of business communications systems may be accomplished if necessary for the continuing operations of the business, and if the employees and other exposed parties are made aware of the extent of the observations. The consent must be clear and more than mere knowledge of the employer’s ability to monitor. This underscores the need to not only create an effective telecommunications policy, but to also ensure that it is distributed and agreed to (by signature) by all employees (Whitman et al., 1999).

The “Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data” (Europ. Commission, 1995) is a European Union directive implemented in 1995 to regulate the processing of personal data. According to this directive, personal data should not be processed at all, except when certain conditions are met. These conditions ensure that data processing is transparent to the data subject and that it is limited to a specified and legitimate purpose. For example, the directive states that personal data must be “collected for specified, explicit and legitimate purposes and not further processed in a way incompatible with those purposes. Further processing of data for historical, statistical or scientific purposes shall not be considered as incompatible provided that Member States provide appropriate safeguards”. The directive further postulates that every person shall be granted the right “not to be subject to a decision [...] which is based solely on automated processing of data intended to evaluate certain personal aspects relating to him, such as his performance at work [...]”.

9.3.2 Employees’ Privacy and Autonomy

While employers consider communication monitoring an essential asset for retaining process excellence, employees typically try to retain a convenient work environment where their privacy and autonomy are protected. Employers aim for different objectives when monitoring online communication inside companies. Besides security-related aspects such as theft and fraud protection, expected benefits include the overall increase of performance by decreasing employees’ lost productivity, optimizing the sharing of resources and information, and the analysis of human resources. On the other side, employees fear restrictions on their freedom, privacy, and convenient working environments. Some go further in this debate by claiming that watching employees’ online communication is a violation of human rights. Governments have already responded by stating that watching online communication is not allowed except under certain exceptions, such as consent and business extensions (Dempsey and Petschie, 2006a). Therefore, in order to achieve support for online collaboration monitoring, employers should make it clear to all employees that a monitoring policy is applied and explain why and what kind of communication is being watched. The policy has to state exactly what is and what is not allowed in terms of the use of company equipment, and also to what extent communications can be monitored. Employees should be forewarned whenever a form of monitoring is taking place. Employers should also preserve some privacy options, such as providing communication tools that are not watched.

9.4 Recommendations for Monitoring Virtual Team Collaboration

The following recommendations should be considered by organizations that plan to implement a monitoring strategy for virtual collaboration activities.

9.4.1 Organizations Should Implement a Monitoring Scheme in Agreement with Employees and Legal Regulations

An acceptable monitoring scheme should be achieved in agreement with the affected personnel inside an organization. The controlling policy should be clear and known to all employees. Furthermore, employers should gain the support of their employees in performing such a scheme. Ensuring anonymity while analyzing communication activities can be useful from both a moral and legal perspective, e.g., by mapping employees or team members to IDs and performing all analyses on the ID level instead of the original identities. Nevertheless, monitoring schemes face the challenge of identifying and filtering work-related information from personal or private information that is exchanged between employees. Therefore, organizations should implement an appropriate use policy and make sure that all employees have signed a written declaration that they understand this policy and that they agree to it. This policy should (Dempsey and Petschie, 2006b):
• specifically address the monitoring of employee communications,
• require that employees are fully educated about the reasons for the policy,
• limit monitoring to work-related concerns as much as possible, and
• permit employees to make reasonable personal use of the communication systems at work.

When the observation of communication activities may affect external process participants, those parties should also be notified of being potentially subject to monitoring (Whitman et al., 1999). Additionally, organizations should gain employees’ support for the applied online communication monitoring scheme. Such support can be achieved, e.g., by giving the employees access to the tools and captured data or by educating employees about the monitoring scheme and its benefits to the organization.
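The suggestion to map team members to IDs and to analyze on the ID level can be illustrated with a keyed hash. This is a minimal sketch under stated assumptions, not a feature of the d.store platform: the function `pseudonymize`, the key, the "P-" prefix, and the example addresses are hypothetical. Note that a keyed hash yields pseudonyms rather than full anonymity, since anyone holding the key can recompute the mapping; whether this satisfies a given legal requirement is not asserted here.

```python
import hashlib
import hmac


def pseudonymize(identity: str, secret_key: bytes) -> str:
    """Map an identity (e.g., an email address) to a stable, opaque ID."""
    normalized = identity.strip().lower().encode("utf-8")
    digest = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
    return "P-" + digest[:12]


# Hypothetical key; in practice it would be stored apart from the analysis data.
SECRET_KEY = b"replace-with-a-randomly-generated-key"

# Hypothetical (sender, recipient) pairs captured from an email channel.
events = [
    ("alice@example.org", "bob@example.org"),
    ("bob@example.org", "alice@example.org"),
    ("alice@example.org", "coach@example.org"),
]
anonymized = [
    (pseudonymize(sender, SECRET_KEY), pseudonymize(recipient, SECRET_KEY))
    for sender, recipient in events
]

# Analysis on the ID level: count messages sent per pseudonymous ID.
sent_counts = {}
for sender_id, _ in anonymized:
    sent_counts[sender_id] = sent_counts.get(sender_id, 0) + 1
print(sent_counts)
```

Because the same identity always maps to the same ID, interaction counts and network structures remain analyzable while names and addresses stay out of the analysis data.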

9.4.2 Organizations Should Govern a Virtual Collaboration Infrastructure To Support Monitoring Objectives

The setup of a monitoring platform for virtual collaboration can be simplified if the applied groupware systems are run by the observing authority. Access to server-side system and log files can potentially provide more detailed information about groupware use and can speed up the implementation of sensor components. The experiences in ME310 show that a basic groupware infrastructure to address the most urgent communication needs of distributed teams is achievable and manageable with relatively low costs and efforts. In the process of setting up and organizing the monitoring of groupware systems, organizations should take account of the following considerations:
• All users should be appropriately trained and familiarized with the functionality of the provided groupware.
• The integration of monitoring capabilities into the groupware landscape should not change the normal user experience with the applications or enforce additional data entry and workflows.
• External (i.e., unmonitored) collaboration tools should not be banned, as far as their usage complies with the organizational rules and conditions. Team productivity should always outrank observation capabilities.
• The use of monitored groupware should be restricted to project-related activities.

9.5 Recommendations for Distributed Engineering Design Teams

The findings drawn from the analysis of the Team Collaboration Networks in ME310 also support the following recommendations for the conduct of distributed engineering projects.

9.5.1 Teams Should Assign a Project Communication & Information Manager

The achievement of common ground and task synchronization in global teams is challenging and requires guidance. The observations in the ME310 teams underpin the need for a coordinated knowledge sharing strategy (cf. Sects. 8.4.4, 8.3.2). This is even more important as designers have to deal with a continuously growing amount of online information, which becomes increasingly complex as the project progresses. Therefore, distributed design projects should appoint a team member who is responsible for overseeing the communication and documentation of project-related information in the team. The tasks of this manager are
• to establish information sharing as a top priority task and an explicit team activity.
• to ensure appropriate documentation and dissemination of findings, decisions, and results.
• to create awareness among team members about each other’s activities, problems, and progress.
• to identify problems and urgent needs in the collaboration infrastructure and communicate them to the management for correction.

Established team-building methods can help to identify the right personality for this critical team role (e.g., Belbin, 1996). The communication & information manager should have a good overview of the collaboration activities, the tasks, and the information that is generated in the team. The services of the d.store platform can be a valuable support for information management tasks by providing up-to-date views on distributed groupware activities, shared resources, relationships, role participation, and individual involvement.

9.5.2 Teams Should Engage With Domain Experts as Quickly as Possible

The positive impact of interactions with team-external process stakeholders on the performance of the ME310 teams (Sect. 8.4) is in line with the dogma that design processes benefit from applying an “outside-in” perspective. The results indicate a higher degree of satisfaction and an increased number of explored design alternatives for teams who emphasize interactions with coaches and other external contacts. This suggests that designers should start early to gather information and feedback from people who are experts in a particular field of knowledge relevant to the project. This includes, e.g.:

Key Users: Users know best what works for them and what does not. Designers should identify and involve targeted user groups early to obtain feedback and to iterate over prototyped ideas and concepts until a desirable solution is identified.

Customers: Customers specify general requirements and guidelines for the design outcome, which need to be communicated to, requested from, and scrutinized by the design team in order to be able to respond to the client’s request.

Practitioners: Designers should seek technical advice from experienced practitioners in order to maximize the number of explored design solutions, ensure feasibility of the approach, and speed up the prototyping process.

A close involvement of external stakeholders increases the chance that a team has access to critical knowledge and expertise right on time, which in turn lowers the likelihood of misguided decision making.

9.6 Ongoing Research & Future Work

The results and experiences drawn from the pilot application suggest a continuance and intensification of research in virtual collaboration monitoring. Additional applications with the d.store platform have started and are planned to capture and compare online interactions in other project settings. The latest platform extensions address the monitoring of additional groupware for software engineering teams, such as revision control systems or ticketing systems. Recent work has also started to increase the granularity of semantic annotations by applying natural language processing to the body of captured information resources (Ulferts, 2009; Blüher, 2009). Being able to extract the meaning or intent of a shared resource, email, etc. would enable a more detailed analysis of the communication between process stakeholders. The content of a resource can then be expressed in Team Collaboration Networks by means of appropriate ontologies, supporting the computational analysis of information clusters and semantic coherence in an online information space.

Other activities involve the distribution of the platform to make its monitoring capabilities available to a broader research community. The prototypical implementation of d.store is to be extended in order to serve as a publicly available on-demand platform for virtual collaboration monitoring and data sharing. Appropriate mechanisms for anonymization, access control, etc., must be found. Additional ontologies need to be defined in order to analyze activities in different project settings such as CAD-based engineering processes. Researchers at the Hasso Plattner Institute and Stanford University have already started in this direction. In the future, independent design team observations and the exchange of data and findings gained in objective measurements will deepen the scientific discourse and stimulate new hypothesis development and testing in the field of virtual collaboration in engineering design.

References

Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. (2007). Scalable semantic web data management using vertical partitioning. In VLDB ’07: Proceedings of the 33rd international conference on Very large data bases, pages 411–422. VLDB Endowment.
Ackerman, M. and Malone, T. (1990). Answer garden: A tool for growing organizational memory. In Proceedings of the ACM SIGOIS and IEEE CS TC-OA conference on Office information systems, pages 31–39. ACM New York, NY, USA.
Ahuja, S., Ensor, J., and Horn, D. (1988). The rapport multimedia conferencing system. In Proceedings of the ACM SIGOIS and IEEE CS TC-OA 1988 conference on Office information systems, pages 1–8. ACM New York, NY, USA.
Ahuja, S., Ensor, J., and Lucco, S. (1990). A comparison of application sharing mechanisms in real-time desktop conferencing systems. ACM SIGOIS Bulletin, 11(2-3):248.
Andronikos, T., Stefanidakis, M., and Papadakis, I. (2009). Adding temporal dimension to ontologies via owl reification. Informatics, Panhellenic Conference on, 0:19–22.
Antoniou, G. and Van Harmelen, F. (2004). Web Ontology Language: OWL. In Staab, S. and Studer, R., editors, Handbook on Ontologies, chapter 4, pages 67–92. Springer Verlag.
Appelt, W. (1999). WWW Based Collaboration with the BSCW System. In Proceedings of the 26th Conference on Current Trends in Theory and Practice of Informatics, page 78. Springer-Verlag.
Ashworth, M. J. (2007). Computational and Empirical Explorations of Work Group Performance. PhD thesis, Carnegie Mellon University, Pittsburgh, Pennsylvania.
Audretsch, D. (1995). Innovation, growth and survival. International Journal of Industrial Organization, 13(4):441–457.
Baader, F., Calvanese, D., McGuinness, D., Nardi, D., and Patel-Schneider, P. (2003). The description logic handbook: theory, implementation, and applications. Cambridge University Press.

Baader, F., Horrocks, I., and Sattler, U. (2004). Description Logics. In Staab, S. and Studer, R., editors, Handbook on Ontologies, chapter 1, pages 3–28. Springer Verlag.
Backhaus, K., Erichson, B., Plinke, W., and Weiber, R. (2008). Multivariate Analysemethoden: Eine anwendungsorientierte Einführung. Springer, Berlin, 12. edition.
Baecker, R., Grudin, J., Buxton, W., and Greenberg, S. (1995). Readings in Human-Computer Interaction: Toward the Year 2000. Morgan Kaufmann.
Bannon, L. and Bødker, S. (1997). Constructing common information spaces. In Proceedings of the fifth conference on European Conference on Computer-Supported Cooperative Work, pages 81–96. Kluwer Academic Publishers Norwell, MA, USA.
Baya, V. (1996). Information handling behavior of designers during conceptual design: Three experiments. PhD thesis, Stanford University, Stanford, CA.
Belbin, M. (1996). Team Roles at Work. Butterworth-Heinemann.
Berners-Lee, T., Fielding, R., and Masinter, L. (1998). Uniform Resource Identifiers (URI): Generic Syntax. The Internet Engineering Task Force, http://www.ietf.org/rfc/rfc2396.txt.
Bessant, J. (1979). Preparing for design studies: ways of watching. notes towards appropriate methodology for studying functional specialists. Design Studies, 1(2):77–83.
Beyer, H. and Holtzblatt, K. (1998). Contextual design: defining customer-centered systems. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Bird, C., Gourley, A., Devanbu, P., Gertz, M., and Swaminathan, A. (2006). Mining email social networks. In Proceedings of the 2006 international workshop on Mining software repositories, page 143. ACM.
Blüher, A. (2009). Computergestützte Verfahren zur Extraktion und Untersuchung von Nominalphrasen in der Email-Kommunikation verteilter Designteams. Master’s thesis, Hasso Plattner Institute für Softwaresystemtechnik, Universität Potsdam, Germany.
Brooks, F. (1995). The Mythical Man-Month: Essays on Software Engineering. Addison-Wesley, Reading, MA.
Brown, T. (2008). Design thinking. Harvard Business Review, pages 85–92. Cairncross, F. (2001). The death of distance. How the communications revolution is changing our lives. Harvard Business Press. Carroll, J. J., Bizer, C., Hayes, P., and Stickler, P. (2005). Named graphs, provenance and trust. In WWW ’05: Proceedings of the 14th interna- tional conference on World Wide Web, pages 613–622, New York, NY, USA. ACM. References 167

Carroll, J. J., Dickinson, I., Dollin, C., Reynolds, D., Seaborne, A., and Wilkinson, K. (2004). Jena: implementing the semantic web recommen- dations. In WWW Alt. ’04: Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters, pages 74–83, New York, NY, USA. ACM. Casotto, A., Newton, A. R., and Sangiovanni-Vincentelli, A. (1990). Design Management based on Design Traces. In DAC ’90: Proceedings of the 27th ACM/IEEE Design Automation Conference, pages 136–141, New York, NY, USA. ACM. Chang, B. (1998). In-place editing of web pages: Sparrow community- shared documents. Computer Networks and ISDN Systems, 30(1-7):489– 498. Chen, H., Cannon, D., Gabrio, J., Leifer, L., Toye, G., and Bailey, T. (2005). Using wikis and weblogs to support reflective learning in an in- troductory engineering design course. In Human Behaviour in Design’05: Preprints of the International Workshop on Human Behaviour, page 95, Melbourne, Australia. Cimpian, E., Meyer, H., Roman, D., Sirbu, A., Steinmetz, N., Staab, S., and Toma, I. (2008). Ontologies and Matchmaking. In Kuropka, D., Tr¨oger, P., Staab, S., and Weske, M., editors, Semantic Service Provisioning, chapter 3, pages 19–54. Springer. Clancey, W. (1997). Situated cognition: On human knowledge and computer representations. Cambridge University Press. Clark, H. (1996). Using language. Computational Linguistics, 23(4). Cockayne, W. R. (2004). A Study of the Formation of Innovation Ideas in Informal Networks. PhD thesis, Stanford University, Stanford, CA. Cohen, S. G. and Bailey, D. E. (1997). What makes teams work: Group effectiveness research from the shop floor to the executive suite. Journal of Management, 23(3):239–290. Conklin, J. and Begeman, M. (1987). gIBIS: A Hypertext Tool for Team Design Deliberation. In Proceedings of the ACM conference on Hypertext, pages 247–251. ACM New York, NY, USA. Conklin, J. and Begeman, M. (1989). gIBIS: A Tool for all Reasons. 
Journal of the American Society for Information Science, 40(3). Connolly, D. and Masinter, L. (2000). The ’text/html’ Media Type. The Internet Engineering Task Force, http://www.ietf.org/rfc/rfc2854.txt. Constantine, L. and Lockwood, L. (1999). Software for use: a practi- cal guide to the models and methods of usage-centered design. ACM Press/Addison-Wesley Publishing Co. New York, NY, USA. Coons, S. (1963). An outline of the requirements for a computer-aided design system. In Proceedings of the May 21-23, 1963, spring joint com- puter conference, pages 299–304. ACM New York, NY, USA. 168 References

Cooper, R. (1998). Benchmarking new product performance: results of the best practices study. European Management Journal, 16(1):1–17. Cooper, R. and Kleinschmidt, E. (2000). New product performance: What distinguishes the star products. Australian Journal of Management, 25(1). Cramton, C. D. (1997). Information Problems in Dispersed Teams. In Academy of Management Best Paper Proceedings, volume 1997, pages 298–302, Georgia, Southern University. Cramton, C. D. (2001). The Mutual Knowledge Problem and Its Conse- quences for Dispersed Collaboration. Organization Science, 12(3):346– 371. Cross, N. (2006). Designerly Ways of Knowing. Springer. Cross, N. and Clayburn, A. (1995). Observations of teamwork and social processes in design. Design Studies, 16(2):143–170. DeMarco, T. and Lister, T. (1999). Peopleware: Productive Projects and Teams. Dorset House, 2 edition. Dempsey, G. E. and Petschie, J. N. (2006a). Library Law: Monitoring Employee Electronic Communications, Part I. http://www.nsls.info/articles/detail.aspx?articleID=54, retrieved Sep. 17, 2010. Dempsey, G. E. and Petschie, J. N. (2006b). Library Law: Monitoring Employee Electronic Communications, Part II. http://www.nsls.info/articles/detail.aspx?articleID=55, retrieved Sep. 17, 2010. Dennis, A. and Valacich, J. (1999). Rethinking media richness: Towards a theory of media synchronicity. In Proceedings of the 32nd Hawaii International Conference on System Sciences, volume 1. DeSanctis, G. and Gallupe, R. (1987). A foundation for the study of group decision support systems. Management science, pages 589–609. DeSanctis, G. and Poole, M. S. (1994). Capturing the Complexity in Ad- vanced Technology Use: Adaptive Structuration Theory. Organization Science, 5(2):121–147. Di Janni, A. (1986). A monitor for complex CAD systems. In DAC ’86: Proceedings of the 23rd ACM/IEEE Design Automation Conference, pages 145–151, Piscataway, NJ, USA. IEEE Press. Dixon, J. R. (1987). 
On research methodology towards a scientific theory of engineering design. AI EDAM, 1(03):145–157. Driskell, J., Radtke, P., and Salas, E. (2003). Virtual teams: Effects of technological mediation on team performance. Group Dynamics: Theory, Research, and Practice, 7(4):297–323. Dunkel, J., Eberhart, A., Fischer, S., Kleiner, C., and Koschel, A. (2008). Systemarchitekturen f¨urverteilte Anwendungen. Client-Server, Multi- References 169

Tier, SOA, Event Driven Architecture, P2P, Grid, Web 2.0. Hanser Fachbuch. Dym, C., Agogino, A., Eris, O., Frey, D., and Leifer, L. (2006). Engi- neering Design Thinking, Teaching, and Learning. IEEE Engineering Management Review, 34(1):65–92. Ellis, C., Gibbs, S., and Rein, G. (1991). Groupware: some issues and experiences. Communications of the ACM, 34(1):39–58. Engelbart, D. (1962). Augmenting human intellect: A conceptual frame- work. Stanford Research Institute Technical Report AFOSR-3223, Con- tract AF, 49(638):1024. Eris, O. (2002). Perceiving, Comprehending, and Measuring Design Activ- ity Through the Questions Asked while Designing. PhD thesis, Stanford University, Stanford, CA. Erl, T. (2005). Service-oriented architecture: concepts, technology, and de- sign. Prentice Hall PTR, Upper Saddle River, NJ, USA. Europ. Commission (1995). Directive 95/46/EC of the European Parlia- ment and of the Council of 24 October 1995 on the protection of indi- viduals with regard to the processing of personal data and on the free movement of such data. http://ec.europa.eu/justice home/fsj/privacy/. Eveland, J. D. and Bikson, T. K. (1988). Work group structures and computer support: A field experiment. ACM Transactions on Office Information Systems, 6:354–379. Fallows, D. (2002). Email at work. Pew Internet & Americal Life Project. Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and Berners-Lee, T. (1999a). Hypertext Transfer Protocol – HTTP/1.1. The Internet Engineering Task Force, http://www.ietf.org/rfc/rfc2616.txt. Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and Berners-Lee, T. (1999b). RFC 2616 - Hypertext Trans- fer Protocol – HTTP/1.1. Internet Engineering Task Force, http://tools.ietf.org/html/rfc2616. Fielding, R. and Taylor, R. (2002). Principled design of the modern web ar- chitecture. ACM Transactions on Internet Technology (TOIT), 2(2):115– 150. Fielding, R. T. (2000). 
Architectural styles and the design of network-based software architectures. PhD thesis, University of California, Irvine. Finger, S. and Dixon, J. (1989). A review of research in mechanical en- gineering design. part i: Descriptive, prescriptive, and computer-based models of design processes. Research in Engineering Design, 1(1):51–67. Fischer, G., Grudin, J., Lemke, A., McCall, R., Ostwald, J., Reeves, B., and Shipman, F. (1992). Supporting indirect collaborative design with integrated knowledge-based design environments. Human-Computer In- teraction, 7(3):281–314. 170 References

Fischer, J. (2007). Identifying, Visualizing and Supporting Social Net- works for Collaborative Work in a CSCW-System. Master’s thesis, http://www.mrl.nott.ac.uk/˜jef/thesis jef.pdf. Frankenberger, E., Badke-Schaub, P., and Birkhofer, H. (1997). Factors influencing design work, empirical investigations of teamwork in engi- neering design practice. International Conference on Engineering Design (ICED’97), pages 387–392. Fussell, S., Kraut, R., Lerch, F., Scherlis, W., McNally, M., and Cadiz, J. (1998). Coordination, overload and team performance: effects of team communication strategies. In Proceedings of the 1998 ACM conference on Computer supported cooperative work, pages 275–284. ACM New York, NY, USA. Futrelle, J. (2006). Harvesting RDF Triples. Lecture Notes in Computer Science: Provenance and Annotation of Data, 4145:64–72. Galegher, J. and Kraut, R. (1994). Computer-mediated communication for intellectual teamwork: An experiment in group writing. Information Systems Research, 5(2):110. Gantz, J., Reinsel, D., Chute, C., Schlichting, W., McArthur, J., Minton, S., Xheneti, I., Toncheva, A., and Manfrediz, A. (2007). The expanding digital universe: A forecast of worldwide information growth through 2010. EMC Corporation. Gary, L. (2003). Dealing with a Project’s “Fuzzy Front End”. Harvard Management Update. Geisler, C. and Rogers, E. (2000). Technological mediation for design col- laboration. In Proceedings of IEEE professional communication society international professional communication conference and Proceedings of the 18th annual ACM international conference on Computer documen- tation: technology & teamwork, pages 395–405. IEEE Educational Activ- ities Department Piscataway, NJ, USA. Geisler, C., Rogers, E., and Tobin, J. (1999). Going public: Collabora- tive systems design for multidisciplinary conversations. Lecture notes in computer science, pages 89–100. Gero, J. (1990). Design prototypes: a knowledge representation schema for design. 
AI magazine, 11(4):26. Gero, J. (1998). Conceptual designing as a sequence of situated acts. Lec- ture Notes in Computer Science, 1454:165–177. Gero, J. and Kannengiesser, U. (2004). The situated function–behaviour– structure framework. Design Studies, 25(4):373–391. Gloor, P., Paasivaara, M., Schoder, D., and Willems, P. (2008). Finding collaborative innovation networks through correlating performance with social network structure. International Journal of Production Research, 46(5):1357. References 171

Gloor, P. and Zhao, Y. (2004). TeCFlow – A Temporal Communication Flow Analyzer for Social Network Analysis. In Workshop on Social Net- works for Design and Analysis: Using Network Information in CSCW. Gloor, P. and Zhao, Y. (2006). Analyzing Actors and Their Discussion Topics by Semantic Social Network Analysis. In Proceedings of the Tenth International Conference on Information Visualization, IV 2006, pages 130–135. Goland, Y., Whitehead, E., Faizi, A., Carter, S., and Jensen, D. (1999). RFC 2518 - HTTP Extensions for Distributed Authoring – WEBDAV. Internet Engineering Task Force, http://tools.ietf.org/html/rfc2518. Greif, I. (1988). Computer-supported cooperative work: A book of readings. Morgan Kaufmann. Gruber, T. R. (1993). A translation approach to portable ontology speci- fications. Knowl. Acquis., 5(2):199–220. Grudin, J. (1988). Why cscw applications fail: problems in the design and evaluationof organizational interfaces. In Proceedings of the 1988 ACM conference on Computer-supported cooperative work, pages 85–93. ACM New York, NY, USA. Grudin, J. (1994). CSCW: History and focus. IEEE Computer, 27(5):19– 26. Gutierrez, C., Hurtado, C., and Vaisman, A. (2007). Introducing time into rdf. IEEE Transactions on Knowledge and Data Engineering, 19(2):207. Halpin, H., Iannella, R., Suda, B., and Walsh, N. (2010). Rep- resenting vCard Objects in RDF. W3C Member Submission, http://www.w3.org/TR/vcard-rdf/. Hightower, R. T., Warkentin, M. E., Sayeed, L., and McHaney, R. (1998). Information Exchange in Virtual Work Groups. pages 199–216. Hitzler, P., Kr ”otzsch, M., Rudolph, S., and Sure, Y. (2008). Semantic Web: Grundla- gen. Springer. Horrocks, I. and Patel-Schneider, P. (2004). Reducing owl entailment to description logic satisfiability. Web Semantics: Science, Services and Agents on the World Wide Web, 1(4):345–357. Horrocks, I., Patel-Schneider, P., and Van Harmelen, F. (2003). From shiq and rdf to owl: The making of a web ontology language. 
Web semantics: science, services and agents on the World Wide Web, 1(1):7–26. IEEE (2000). IEEE Std 1471-2000, Recommended Practice for Architec- tural Description of Software-Intensive Systems. Technical report, IEEE Architecture Working Group. Ishii, H. (1990). TeamWorkStation: towards a seamless shared workspace. In Proceedings of the 1990 ACM conference on Computer-supported co- operative work, pages 13–26. ACM New York, NY, USA. 172 References

ISO/IEC (1996). ISO 10746-2: Information Technology – Open Distributed Processing – Reference Model: Foundations. International Organization for Standardization, http://www.iso.org/. ISO/IEC (1998). ISO 9241-11: Ergonomic requirements for office work with visual display terminals (VDT) – Part 11: Guidance on usability. International Organization for Standardization, http://www.iso.org/. ISO/IEC (1999). ISO 13407: Human-Centred Design Processes for In- teractive Systems. International Organization for Standardization, http://www.iso.org/. ISO/IEC (2005). ISO 19199: Geographic Information – Services. Interna- tional Organization for Standardization, http://www.iso.org/. Jabi, W. (2003). Reflections on computer-supported cooperative design systems. In Digital design: research and practice: proceedings of the 10th International Conference on Computer Aided Architectural Design Fu- tures, pages 169–180. Springer. Jacoby, R. and Rodriguez, D. (2008). Innovation, growth, and getting to where you want to go. Building Design Strategy: Using Design to Achieve Key Business Objectives, pages 43–51. Jarvenpaa, S. and Leidner, D. (1999). Communication and Trust in Global Virtual Teams. Organization Science, 10(6):791–815. Jarvenpaa, S. L. and Ives, B. (1994). The global network organization of the future: information management opportunities and challenges. J. Manage. Inf. Syst., 10(4):25–57. Johansen, R. (1988). Groupware: Computer support for business teams. The Free Press, New York, NY, USA. Johansson, C., Dittrich, Y., and Juustila, A. (1999). Software Engineering Across Boundaries Student Project in Distributed Collaboration. IEEE Transaction on Professional Communication, 42(4):286–296. Johnson, J. (1999). A field study of partially distributed group support. In Proceedings of the 32nd Hawaii International Conference on System Sciences, Hawaii. Jokela, T. (2002). Making user-centred design common sense: striving for an unambiguous and communicative UCD process model. 
In NordiCHI ’02: Proceedings of the second Nordic conference on Human-computer interaction, pages 19–26, New York, NY, USA. ACM Press. Ju, W., Neeley, L., and Leifer, L. (2007). Design, Design, and Design: An Overview of Stanford’s Center for Design Research. Workshop on Exploring Design as a Research Activity, CHI 2007. Jung, E. C., Sato, K., Chen, Y., He, X., MacTavish, T., and Cracchiolo, D. (2005). DIF Knowledge Management System: Bridging Viewpoints for Interactive System Design. In Proc. of 11th International Conference on Human-Computer Interaction (HCI’05), Las Vegas. References 173

Katzenbach, J. R. and Smith, D. K. (2001). The discipline of teams: a mindbook-workbook for delivering small group performance. Wiley, New York, NY. Kelley, D. and Hartfield, B. (1996). The Designer’s Stance. In Winograd, T., editor, Bringing Design To Software, pages 151–164. ACM New York, NY, USA. Kelley, T. and Littman, J. (2001). The Art of Innovation: Lessons in Cre- ativity from IDEO, America’s Leading Design Firm. Broadway Business. Kidane, Y. and Gloor, P. (2007). Correlating temporal communication pat- terns of the Eclipse open source community with performance and cre- ativity. Computational & Mathematical Organization Theory, 13(1):17– 27. Kim, J. and Wilemon, D. (1999). Managing the fuzzy front-end of the new product development process. In Management of Engineering and Tech- nology, 1999. Technology and Innovation Management. PICMET’99. Portland International Conference on, volume 1. Kim, J. and Wilemon, D. (2002). Strategic issues in managing innovation’s fuzzy front-end. European Journal of Innovation Management, 5(1):27– 39. Koen, P., Ajamian, G., Boyce, S., Clamen, A., Fisher, E., Fountoulakis, S., Johnson, A., Puri, P., and Seibert, R. (2002). Fuzzy front end: Effective methods, tools, and techniques. The PDMA toolbook for new product development. Koen, P., Ajamian, G., Burkart, R., Clamen, A., Davidson, J., D’Amore, R., Elkins, C., Herald, K., Incorvia, M., Johnson, A., et al. (2001). Pro- viding clarity and a common language to the” fuzzy front end”. Research- Technology Management, 44(2):46–55. Kondratieff, N. and Stolper, W. (1935). The Long Waves in Economic Life. The Review of Economics and Statistics, 17(6):105–115. Kraemer, K. and King, J. (1988). Computer-based systems for cooperative work and group decision making. ACM Computing Surveys (CSUR), 20(2):115–146. Kvan, T. (2000). Collaborative design: what is it? Automation in construc- tion, 9(4):409–415. Layzell, P., Brereton, O., and French, A. (2000). 
Supporting collaboration in distributed software engineering teams. In Seventh Asia-Pacific Soft- ware Engineering Conference, 2000. APSEC 2000. Proceedings., pages 38–45. Leifer, L., Culpepper, W., Cannon, D., Eris, O., Liang, T., Bell, D., Bier, E., and Pier, K. (2002). Measuring the performance of online distributed team innovation (learning) services. Proceedings of the e-Technologies in Engineering Education: Learning Outcomes Providing Future Possibili- ties. 174 References

Liang, T., Cannon, D., Feland, J., Mabogunje, A., Yen, S., Yang, M., and Leifer, L. (1999). New dimensions in internet-based design capture and reuse. In proceedings of the International Conference on Engineering Design, Munich, Germany, August, pages 24–26. Lim, Y. and Sato, K. (2001). Development of design information frame- work for interactive systems design. In Proceedings of the 5th Asian International Symposium on Design Research. Loftus, C., McMahon, C., and Hicks, B. (2008). Issues and challenges for improving email use in engineering design. In NordDesign 2008, Univer- sity of Technology, Tallinn, Estonia. Lurey, J. and Raisinghani, M. (2001). An Empirical Study of Best Practices in Virtual Teams. Information & Management, 38(8):523–544. Malone, T., Grant, K., Lai, K., Rao, R., and Rosenblitt, D. (1987). Semistructured messages are surprisingly useful for computer-supported coordination. ACM Transactions on Information Systems (TOIS), 5(2):115–131. Mankin, D. A., Bikson, T. K., and Cohen, S. G. (1996). Teams and technol- ogy : fulfilling the promise of the new organization / Don Mankin, Susan G. Cohen, Tora K. Bikson. Harvard Business School Press, Boston, Mass. Mao, J., Vredenburg, K., Smith, P., and Carey, T. (2005). The state of user-centered design practice. Communications of the ACM, 48(3):105– 109. Matthew, C., Laskey, K., McCabe, F., Brown, P. F., and Metz, R. (2006). Reference Model for Service Oriented Architecture 1.0. OASIS, http://www.oasis-open.org/committees/tc home.php?wg abbrev=soa-rm. Mayhew, D. J. (1999). The usability engineering lifecycle: a practitioner’s handbook for user interface design. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. Maznevski, M. L. and Chudoba, K. M. (2000). Bridging Space Over Time: Global Virtual Team Dynamics and Effectiveness. Organization Science, 11(5):473–492. McBride, B. (2004). The Resource Description Framework (RDF) and its Vocabulary Description Language RDFS. In Staab, S. 
and Studer, R., editors, Handbook on Ontologies, chapter 3, pages 51–66. Springer Verlag. McCall, R., Bennett, P., D’Oronzio, P., Ostwald, J., Shipman, F., and Wal- lace, N. (1990). PHIDIAS: Integrating CAD Graphics into Dynamic Hy- pertext. In Hypertext: Concepts, Systems and Applications: Proceedings of the First European Conference on Hypertext, pages 152–165. Cam- bridge University Press. McDonough, E. F., Kahnb, K. B., and Barczaka, G. (2001). An investiga- tion of the use of global, virtual, and colocated new product development teams. Journal of Product Innovation Management, 18(2):110–120. References 175

McDowell, L., Etzioni, O., and Halevy, A. (2004). Semantic Email: Theory and Applications. Web Semantics: Science, Services and Agents on the World Wide Web, 2(2):153–183. McGrath, J. and Hollingshead, A. (1994). Groups interacting with tech- nology: Ideas, evidence, issues, and an agenda. Sage Thousand Oaks, Calif. Milne, A. J. (2005). An Information-theoretic Approach to the Study of Ubiquitous Computing Workspaces Supporting Geographically Distributed Engineering Design Teams as Goup-users. PhD thesis, Stanford Univer- sity, Stanford, CA. Mowbray, T. J. and Ruh, W. (1998). Inside CORBA: Distributed Object Standards and Applications. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA. Mueller, C. (2008). Graphentheoretische Analyse der Evolution von Wiki- basierten Netzwerken f¨urselbstorganisiertes Wissensmanagement. Gito. Neumann, T. and Weikum, G. (2009). Scalable join processing on very large rdf graphs. In SIGMOD ’09: Proceedings of the 35th SIGMOD international conference on Management of data, pages 627–640, New York, NY, USA. ACM. Nielsen, J. (1994). Usability Engineering. Morgan Kaufmann. Norman, D. and Draper, S., editors (1986). User Centered System Design: New Perspectives on Human-Computer Interaction. Lawrence Erlbaum Associates, Hillsdale, NJ. Nunamaker, J. F., Dennis, A. R., Valacich, J. S., Vogel, D., and George, J. F. (1991). Electronic meeting systems to support group work. Com- mun. ACM, 34(7):40–61. Olson, G. and Olson, J. (2000). Distance matters. Human-computer inter- action, 15(2):139–178. Olson, J. and Teasley, S. (1996). Groupware in the wild: lessons learned from a year of virtual collocation. In Proceedings of the 1996 ACM con- ference on Computer supported cooperative work, pages 419–427. ACM. O’Reilly, T. (2005). What Is Web 2.0 – Design Patterns and Busi- ness Models for the Next Generation of Software. O’Reilly Media, http://oreilly.com/web2/archive/what-is-web-20.html, Accessed Jan. 12, 2010. Overdick, H. (2007). 
The resource-oriented architecture. IEEE Congress on Services, pages 340–347. Pahl, G., Beitz, W., Wallace, K., Blessing, L., and Bauert, F. (1996). En- gineering design: a systematic approach. Springer Verlag. Perry, M. (2008). A Framework to Support Spatial, Temporal and Thematic Analytics over Semantic Web Data. PhD thesis, Wright State University. 176 References

Perry, M., Fruchter, R., and Rosenberg, D. (1999). Co-ordinating dis- tributed knowledge: A study into the use of an organisational memory. Cognition, Technology & Work, 1(3):142–152. Powell, A., Piccoli, G., and Ives, B. (2004). Virtual teams: a review of current literature and directions for future research. The DATA BASE for Advances in Information Systems, 35(1):7. Preist, C. (2004). A Conceptual Architecture for Semantic Web Services. The Semantic Web–ISWC 2004, pages 395–409. Ramesh, B. and Tiwana, A. (1999). Supporting collaborative process knowledge management in new product development teams. Decision Support Systems, 27(1-2):213–235. Reinertsen, D. (1999). Taking the fuzziness out of the fuzzy front end. Research technology management, 42(6):25–31. Resnick, P. (2001). RFC 2822 - Internet Message Format. Internet Engi- neering Task Force, http://tools.ietf.org/html/rfc2822. Rosenbaum, S., Rohn, J., and Humburg, J. (2000). A toolkit for strategic usability: results from workshops, panels, and surveys. In Proceedings of the SIGCHI conference on Human factors in computing systems, page 344. ACM. Sarker, S. and Sahay, S. (2002). Information Systems Development by US-Norwegian Virtual Teams: Implications of Time and Space. In Pro- ceedings of the 35th Annual Hawaii International Conference on System Sciences (HICSS’02)-Volume 1-Volume 1, page 18. IEEE Computer So- ciety. Schmidt, J., Montoya-Weiss, M., and Massey, A. (2001). New Product De- velopment Decision-Making Effectiveness: Comparing Individuals, Face- To-Face Teams, and Virtual Teams*. Decision Sciences, 32(4):575–600. Schmidt, K. (1998). Cooperative design: Prospects for CSCW in design. Design Sciences and Technology, 6(2):5–18. Schmidt, K. and Bannon, L. (1992). Taking cscw seriously. Computer Supported Cooperative Work (CSCW), 1(1):7–40. Sch¨on,D. (1992). Designing as reflective conversation with the materials of a design situation. Research in Engineering Design, 3(3):131–147. Schumpeter, J. A. 
(1942). Capitalism, Socialism, and Democracy. Harper and Brothers, New York. Schwaber, C., Rymer, J. R., and Stone, J. (2006). The Changing Face Of Application Life-Cycle Management. Forrester Research. Shannon, C. and Weaver, W. (1949). The mathematical theory of infor- mation. Urbana: University of Illinois Press, 97. Sheppard, S. (2003). A Description of Engineering: An Essential Backdrop for Interpreting Engineering Education. In Proceedings (CD), Mudd De- sign Workshop IV, Harvey Mudd College, Claremont, Cal. References 177

Shiu, E. and Lenhart, A. (2004). How Americans use instant messaging. Pew Internet & Americal Life Project. Skogstad, P. (2009). A Unified Innovation Process Model For Engineering Designers and Managers. PhD thesis, Stanford University, Stanford, CA. Skogstad, P., Steinert, M., Gumerlock, K., and Leifer, L. (2009). We need a universal design project outcome performance measurement metric: a discussion based on empirical research. In Bergendahl, M. N., Grimhe- den, M., Leifer, L., Skogstad, P., and Seering, W., editors, Proceedings of ICED’09, Design Methods and Tools, volume 6, pages 473–484. The Design Society. Sommerville, I. (2006). Software Engineering. Addison Wesley, 8th edition. Spenser, J. (2009). The Airplane: How Ideas Gave Us Wings. Harper Paperbacks. Sproull, L. and Kiesler, S. (1986). Reducing social context cues: Elec- tronic mail in organizational communications. Management Science, 32(11):1492–1512. Sproull, L. and Kiesler, S. (1992). Connections: New Ways of Working in the Networked Organization. MIT Press. Stefik, M., Foster, G., Bobrow, D., Kahn, K., Lanning, S., and Suchman, L. (1987). Beyond the chalkboard: computer support for collaboration and problem solving in meetings. Communications of the ACM, 30(1):47. Steinfield, C., Jang, C., Huysman, M., David, K., Lloyd, J., Goodman, E., Hinds, T., and Andriessen, E. (2002). Communication and collaboration processes in global virtual teams. Unpublished INTEnD report, Michigan State University. East Lansing, Michigan. Sterpe, P., Cullen, A., Gilpin, M., Schwaber, C., and Ranade, K. (2007). App Dev Managers Should Measure Team Productivity. Forrester Re- search. Stonebraker, M., Rowe, L. A., and Hirohama, M. (1990). The implemen- tation of postgres. In IEEE Transactions on Knowledge and Data Engi- neering, pages 340–355. Tang, J. and Leifer, L. (1991). An observational methodology for studying group design activity. Research in engineering design, 2(4):209–219. Tebay, R., Atherton, J., and Wearne, S. 
(1984). Mechanical engineering design decisions: instances of practice compared with theory. Proceedings of the Institution of Mechanical Engineers. Part B. Management and engineering manufacture, 198(6):87–96. Thompson, L. and Coovert, M. (2003). Teamwork online: The effects of computer conferencing on perceived confusion, satisfaction, and post- discussion accuracy. Group Dynamics: Theory, Research, and Practice, 7(2):135–151. Toye, G., Cutkosky, M., Leifer, L. J., Tenenbaum, J. M., and Glicksman, J. (1994). SHARE: A Methodology and Environment for Collaborative 178 References

Product Development. International Journal of Intelligent and Cooper- ative Information Systems, 3(2):129–153. Tucci, L. (2008). Managing the application development lifecy- cle requires solid metrics. http://searchcio.techtarget.com/news/arti- cle/0,289142,sid182 gci1293866,00, retrieved Jul. 16, 2010. Uflacker, M. (2007). Resource-oriented knowledge sharing in user-centered design communities. In Proceedings of the 10th IFAC/IFIP/IFORS/IEA Symposium on Analysis, Design, and Evaluation of Human-Machine Sys- tems. Uflacker, M. (2009). Implementation of a service platform to evaluate virtual team communication. Proceedings of the 3rd Ph.D. Retreat of the HPI Research School on Service-oriented Systems Engineering 27, Hasso-Plattner-Institut f¨urSoftwaresystemtechnik. Uflacker, M., Skogstad, P., Zeier, A., and Leifer, L. (2009). Analysis of virtual design collaboration with team communication networks. In 17th International Conference on Engineering Design (ICED’09), Stanford, CA. Uflacker, M. and Zeier, A. (2008a). d.store: Capturing team information spaces with resource-based information networks. In IADIS International Conference WWW/Internet 2008, Freiburg, Germany. Uflacker, M. and Zeier, A. (2008b). Extending the situated function- behaviour-structure framework for user-centered software design. In Pro- ceedings of the Third Internation Conference on Design Computing and Cognition, DCC’08. Uflacker, M. and Zeier, A. (2008c). A graph-based approach to assess- ing multi-modal team communication in global organizations. In IEEE Symposium on Advanced Management of Information for Globalized En- terprises. Uflacker, M. and Zeier, A. (2009). A platform for the temporal evalua- tion of team communication in distributed design environments. In 13th International Conference on Computer-Supported Cooperative Work in Design (CSCWD’09), Santiago, Chile. Uflacker, M. and Zeier, A. (2010a). An Instrument for Real-Time Design Interaction Capture and Analysis. In Meinel, C. 
Appendix A Case Study Data

A.1 Individual Team Member Scores

The following pages provide raw data of the individual team member interactions in the testbed projects, as queried from the d.store platform. The values have been measured at the end of the three project phases (i.e., academic quarters, approx. three months each). The three tables on the following pages show the non-aggregated query results.

Table A.1. Individual team member attributes queried from the generated Team Collaboration Networks. The ID serves as a reference for the data table entries on the following pages.

ID     Description
I01-d  Number of distinct emails sent
I02-d  Number of distinct initiating emails sent
I03-d  Number of distinct email replies
I04-d  Number of attachments sent with distinct emails
I05-d  Number of URLs in distinct emails
I06-b  Number of wiki pages created
I07-b  Number of distinct wiki pages edited
I08-b  Number of total page edits and creations
I09-b  Number of new files uploaded to shared folder
I10-b  Number of distinct revised files uploaded to shared folder
I11-b  Number of total file uploads and creations
I12-b  Number of distinct files viewed in shared folder
I13-b  Number of total file views in shared folder
I14-d  Number of distinct emails sent to team-externals
I15-d  Number of distinct emails sent internally only

Individual Scores, 1st Period

Team    Mbr  I01-d I02-d I03-d I04-d I05-d I06-b I07-b I08-b I09-b I10-b I11-b I12-b I13-b I14-d I15-d
Alpha   1.1     34    14    20     1    12     6     7    24   163   136   300   206   252     2    32
Alpha   1.2     34    24    10     6     4     5     6    16    15     1    16   122   124     2    32
Alpha   1.3     21    12     9     1    10    22     9    61     8     2    10    33    37     5    16
Alpha   2.1     12     0    12     0     4     0     3     3     8     4    12    19    23     0    12
Alpha   2.2     16     9     7     1     3     1     5     8     4     4     8    77    79     2    14
Alpha   2.3     12     7     5     0     0     3     5    11     7     2     9    11    15     0    12
Beta    1.1     55    26    29     3    17     6     8    31    13     0    13    42    44     8    47
Beta    1.2     64    31    33    15    19    20    10    54     1     0     1    14    15    14    50
Beta    1.3     45    14    31     6    10     4     7    25    28     1    29    11    15     4    41
Beta    2.1     26    13    13    17    15     1     1     5    17     1    18    20    31     1    25
Beta    2.2      5     0     5     2     1     3     4     9     6     0     6    13    21     0     5
Beta    2.3     20     7    13    17     2     1     3     1     0     0     0     9    14     0    20
Delta   1.1     28    10    18     2     7    31    11    73    14     0    14     9    19     9    19
Delta   1.2      6     0     6     0     2     0     3     3     0     0     0     0     0     2     4
Delta   1.3     14     4    10     2     4     5     2     9     0     0     0     0     0     9     5
Delta   2.1     18     2    16     1     9     1     2     4     0     0     0     7     9     5    13
Delta   2.2     22     6    16     5    11     1     5     9     0     0     0     1     2     8    14
Delta   2.3     14     3    11     0     4     0     0     0     1     0     1     0     0     4    10
Delta   2.4     11     6     5     1     4     0     2     2     0     0     0     3     4     3     8
Epsilon 1.1     21    10    11     3     6     1     5     8     0     0     0    11    11     0    21
Epsilon 1.2     12     4     8     9     2     3     6    21     3     0     3     2     2     1    11
Epsilon 1.3     26    15    11     5     2     3     7    23    11     1    13    43    45     0    26
Epsilon 1.4      5     1     4     1     1     2     2     4     0     0     0     5     5     1     4
Epsilon 2.1     21    16     5     8     9     8    11    49    48    48   105    81   111     2    19
Epsilon 2.2     19    10     9     5    12     1     6    26    14     2    16    37    46     4    15
Epsilon 2.3     22    18     4    10    12     2     6    20     1     0     1    20    22     1    21
Gamma   1.1     17     7    10     3     3     0     5     6     0     0     0     0     0     5    12
Gamma   1.2     51    14    37     7     6     4     7    17     0     0     0     3     5    14    37
Gamma   1.3     38    16    22    13    15    17     8    53     0     0     0     3     3    15    23
Gamma   2.1      0     0     0     0     0     1     3    12     0     0     0     0     0     0     0
Gamma   2.2      6     3     3     0     0     1     6    17     2     0     2     1     3     1     5
Gamma   2.3      0     0     0     0     0     1     6    11     0     0     0     1     2     0     0
Gamma   2.4     14     7     7     0     0     5    10    50     2     0     2     1     1     3    11
Iota    1.1     17     8     9     3     4    14    24    79     0     0     0     8     8     4    13
Iota    1.2     57    33    24    40     4    12    15    44     2     1     3    97    97     5    52
Iota    1.3     11     4     7     3     2     7     9    30     0     0     0     1     1     2     9
Iota    2.1     28    15    13    15     5     9    19    55   110     2   113    23    26     0    28
Iota    2.2      1     1     0     3     0     0     1     6     0     0     0     4     6     0     1
Iota    2.3      5     5     0     3     0     0     2     5     7     2     9     9    10     0     5
Iota    2.4      8     7     1     3     1     2     4    13     0     0     0    19    22     0     8
Iota    2.5      4     2     2     0     2     0     4     5     0     0     0     9     9     0     4
Iota    2.6      7     4     3     1     0     3     3     8     0     0     0     4     5     0     7
Iota    2.7      5     0     5     0     0     0     2     5    39     1    44    18    23     0     5
Iota    2.8      3     0     3     1     0     1     1     2     0     0     0     7     9     0     3
Kappa   1.1     43    31    12    26     9     7     9    37     3     2    77     1     6    14    29
Kappa   1.2     30    15    15     5    15     1     5     9     0     0     0     1     1     6    24
Kappa   1.3     86    15    71    10    38     6    12    48     0     0     0     0     0    48    39
Kappa   2.1      6     0     6     0     6     2     4     6     0     0     0     3     4     1     5
Kappa   2.2      1     1     0     2     0     1     7    23     0     0     0     0     0     0     1
Kappa   2.3     10     0    10     3     5     1     1     3     0     0     0     1     1     2     8
Lambda  1.1     35    18    17     9    13     9    11    57     0     0     0     2     3     3    32
Lambda  1.2     30    20    10     8    16    21    12    46     1     1     2     0     0     5    25
Lambda  1.3      9     4     5     5     2     1     1     2     0     0     0     1     1     0     9
Lambda  2.1     27    14    13     6    12     2     6    20     0     0     0     0     0     0    27
Lambda  2.2     44    22    22     1    22     2     2     6     3     0     3     0     0    15    29
Lambda  2.3     13     0    13     0     8     1     5    10     0     0     0     3     3     1    12
Lambda  2.4     22     5    17     0    12     1     5    16     0     0     0     0     0     0    22
Omega   1.1      0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
Omega   1.2      5     0     5     0     1     0     0     0     0     0     0     0     0     0     5
Omega   1.3    102    47    55     9    23    14    12    42     3     3     6     7     9     9    93
Omega   2.1      2     0     2     0     3     2     3     6     0     0     0     0     0     0     2
Omega   2.2     20    10    10     3     7    10    13    43     1     0     1     2     2     2    18
Omega   2.3     79    28    51    19    24     4    12    45     0     0     0     4     4    10    69
Pi      1.1     43     9    34     6    24     4     4    10     1     0     1    16    19    10    33
Pi      1.2     39    16    23     7    30     6     2    10    16     0    16     0     0     7    32
Pi      1.3     36    16    20     2     9     3     5    16     0     0     0     2     2     1    35
Pi      2.1     39    19    20     0    13     0     4     4     0     0     0     0     0     7    32
Pi      2.2     70    26    44     9    33    12     8    32     0     0     0    16    16    24    46
Pi      2.3     14     2    12     0    12     0     1     1     0     0     0     0     0     0    14
Theta   1.1     10     7     3     0     1    10     8    29     1     4     5     0     0     2     8
Theta   1.2     11     5     6     1     1     3     6    22     6     0     6     2     2     3     8
Theta   1.3      1     0     1     0     0     3     4    19     0     0     0     0     0     0     1
Theta   2.1      2     2     0     2     0     2     4    18     0     1     1     2     2     1     1
Theta   2.2      3     3     0     0     1     2     4    10     0     0     0     3     4     0     3
Theta   2.3      5     5     0     6     0    16    15   113     2     1     4     3     6     0     5
Theta   2.4      0     0     0     0     0     0     0     0     0     0     0     0     0     0     0

Individual Scores, 2nd Period

Team    Mbr  I01-d I02-d I03-d I04-d I05-d I06-b I07-b I08-b I09-b I10-b I11-b I12-b I13-b I14-d I15-d
Alpha   1.1     99    50    49    11    48    13     8    36   156   101   380   316   441    26    73
Alpha   1.2     92    52    40     5    22    12     6    36    85     5    92    62    82    21    71
Alpha   1.3     72    34    38     3    27    14     6    35    85    12   109   182   290    11    61
Alpha   2.1     32     5    27     3    16     2     5    12    29    20    95   113   190     2    30
Alpha   2.2     23     8    15     7    15     2     4     7    16    16    32   135   165     0    23
Alpha   2.3     20     8    12     2     4     0     4     6     7     6    14    91   248     0    20
Beta    1.1    128    63    65     5    59     2     1    10    34     2    36   164   173    27   103
Beta    1.2     95    48    47    16    25     1     0     4    84     1    85    11    11    17    78
Beta    1.3     67    27    40    20    29     0     0     5     3     0     3   350   358     7    60
Beta    2.1     82    46    36    26    69     1     0     2     1     0     1   102   113    16    68
Beta    2.2     44    27    17    43    11     0     0     0     0     0     0    24    24     6    41
Beta    2.3     42     8    34    12     8     0     1     0     5     1     6    11    11     2    41
Delta   1.1     50    23    27     5    20    36    23   117   227     2   229   233   465    17    33
Delta   1.2     26     5    21    10     7     1     7    10    45     8    54   139   182    10    16
Delta   1.3     15     4    11     4     0     3     1     8    30    20    53   162   342     3    12
Delta   2.1     14     2    12     2     4     4     9    24    45     0    45   291   373     3    11
Delta   2.2     53    27    26    17     9     6    15    42     0     0     0   124   171    10    43
Delta   2.3     35    15    20     1     6     7    13    43    22    15    41   188   314    10    25
Delta   2.4     22     3    19     6     1     1     5     7   168    32   200    80    87     8    14
Epsilon 1.1      9     2     7     1     0     0     2     4     1     1     2    25    29     0     9
Epsilon 1.2     19     5    14    21     8     0     1     4    12     0    12    24    27     2    17
Epsilon 1.3     34    27     7     8     8     6     9    30    23     6    29    34    41     3    31
Epsilon 1.4      5     0     5     0     0     0     2     2     7     0     7    15    15     1     4
Epsilon 2.1     24    18     6     5     8     3     6    36   171    33   210   116   277     0    24
Epsilon 2.2     27    20     7    13     8     5     8    57    19     3    23    45    65     0    27
Epsilon 2.3     36    32     4    12    10    10    11    68    14     2    17    24    38     2    34
Gamma   1.1      1     0     1     0     0     0     0     0     0     0     0     0     0     0     1
Gamma   1.2     50    16    34     3    14     2     2     6     2     1     3    26    28    11    39
Gamma   1.3     55    29    26    30     7     0     3     7     0     0     0    35    36    12    43
Gamma   2.1     23    13    10    11     2     0     2     4     2     0     2    78    78     1    22
Gamma   2.2     15    11     4     5     4     1     1     7    33     0    33    17    20     0    15
Gamma   2.3     13     2    11     6     4     0     4    11     3     0     3   180   204     2    11
Gamma   2.4     35    17    18     7     8     4     6    25    43     0    43    70    99     3    32
Iota    1.1     45    27    18     6    12     8     5    37     3     1     4   205   245    13    32
Iota    1.2     80    48    32     9    15     6     5    22  1773  1199  3324  2843  4628    25    56
Iota    1.3      4     2     2     2     1     0     3     4     1     0     1     0     0     3     1
Iota    2.1     18    14     4     4     5     4     3    26    79     3    82     8     8     1    17
Iota    2.2      1     0     1     0     2     0     1     1     0     0     0    41    41     0     1
Iota    2.3      5     5     0     3     0     0     1     2   101     0   101   356   376     0     5
Iota    2.4     20    12     8     9     5     3     6    13     3     0     3    74    76     2    18
Iota    2.5      3     1     2     4     0     1     2     3    28     1    29    77    80     0     3
Iota    2.6      7     1     6     2     1     0     2     2     9     0     9    56    59     1     6
Iota    2.7      6     1     5     4     0     0     3     6     2     0     2    18    20     1     5
Iota    2.8      4     1     3     4     2     0     2     2     0     0     0    26    26     0     4
Kappa   1.1     26    13    13    10     7     0     1     3     6     4    10    54    79     6    20
Kappa   1.2     16    10     6     7     1     0     3     8    26     7    36   100   113     4    12
Kappa   1.3     99    49    50    25    37     0     0    46    14     2    17    10    11    50    49
Kappa   2.1      7     0     7     2     1     0     3     8     1     0     1     5     5     0     7
Kappa   2.2      8     5     3     3     1     0     1    17     0     0     0     5     6     0     8
Kappa   2.3     13     0    13     6    11     0     1     6     0     0     0     3     3     0    13
Lambda  1.1     57    35    22     2    22     2    13    41   129     7   144    86    91    11    46
Lambda  1.2     38    29     9     0    15    17    12    43    88     3    91   381   396     9    29
Lambda  1.3     23     8    15     2    17     5    13    28    22     1    23    53   107     9    14
Lambda  2.1     45    23    22     5    17     2    12    21   383    57   440   338   355     7    39
Lambda  2.2    125    80    45     8    41     1    19    22   200     8   209   516   528    40    86
Lambda  2.3     22     4    18     4     8    26    20    81     1     3     5    25    42     1    21
Lambda  2.4     22     4    18     2     9     4     7    19     0     0     0    40    95     0    22
Omega   1.1     14     3    11     0     1     0     0     0     0     0     0     0     0     0    14
Omega   1.2     76    40    36    20    10     1     4     8     2     2    14    33    85     4    72
Omega   1.3    197    99    98    22    36     0     1     1     5     4     9    29   185    25   172
Omega   2.1     14     2    12     3     0     0     1     1     0     0     0     0     0     0    14
Omega   2.2     53    32    21    12     9     2     3     7     0     0     0     0     0     0    53
Omega   2.3    132    72    60    34    47     6     3    26   171     0   171     0     0     7   125
Pi      1.1     43    11    32     8    21     1     2     3     0     0     0     1     1     4    39
Pi      1.2     40     8    32     1    24     0     0     0     2     0     2     2     2     3    37
Pi      1.3     99    26    73    14    24    15     8    34     4     0     4     2     2     1    98
Pi      2.1     34     3    31    11    11     0     1     2     0     0     0     3    44     5    29
Pi      2.2     79    33    46    29    38     5     5    13     8     1     9     6     6    13    66
Pi      2.3     44     4    40     6    14     3     7    14     1     0     1    13    15     6    38
Theta   1.1      9     6     3     2     1     1     6    20     3     1     4     1     3     6     3
Theta   1.2     18     3    15     0     4     8     3    24     0     0     0     0     0    15     3
Theta   1.3      3     1     2     0     2     2     3    41     1     0     1     3     5     1     2
Theta   2.1      0     0     0     0     0     0     0     3    36     0    36     2     3     0     0
Theta   2.2      0     0     0     0     0     0     1     2     1     1     2    14    29     0     0
Theta   2.3      4     3     1     1     1    16     5    95     5     0     5    11    22     1     3
Theta   2.4      0     0     0     0     0     0     0     0     0     0     0     0     0     0     0

Individual Scores, 3rd Period

Team    Mbr  I01-d I02-d I03-d I04-d I05-d I06-b I07-b I08-b I09-b I10-b I11-b I12-b I13-b I14-d I15-d
Alpha   1.1    136    50    86     3    52    12     9    34    11     9    21    33    48    75    61
Alpha   1.2     82    41    41    16    23    11     6    31     2     0     2     2     2    30    52
Alpha   1.3     15     6     9     0     3     3     0     5     0     0     0     0     0     4    11
Alpha   2.1     28     1    27     2     9     2     3    15    14    13    29    36    46     2    26
Alpha   2.2     23     8    15    15     7     4     3    11    44    33    77    26    33     1    22
Alpha   2.3     25    16     9     0     3     2     2     8     1     1     2    13    24     4    21
Beta    1.1    121    55    66     8    40     2     1    29    31     2    33    77   310    14   107
Beta    1.2    116    50    66    18    49     0     0     8    44     4    49    37   125    18    98
Beta    1.3    130    47    83    30    57     0     0    11    54     6    60   288   468    21   109
Beta    2.1     47    17    30    11    20     1     3     6     5     0     5    34    99    12    35
Beta    2.2     47    19    28     8     3     0     1    11     0     0     0    90   213     8    39
Beta    2.3     46     5    41     7    26     0     1     0     2     0     2    29    71     6    40
Delta   1.1     64    31    33     0    27     7     7    31    17     0    17    36    62    14    50
Delta   1.2     30    10    20    14     5     0     2     3     3     0     3    36    45    13    17
Delta   1.3     34    15    19     8     7     1     2     5    23     8    32     5     9    12    22
Delta   2.1     31    14    17     1    23     2     4    13     4     0     4    46    71     7    24
Delta   2.2     31    16    15     5     8     4     3    15     0     0     0    16    32     5    26
Delta   2.3     22     2    20     0     4     2     3     7     1     1     2    13    13     8    14
Delta   2.4     33     6    27     0     6     1     3    10   230     0   230    21    26     9    24
Epsilon 1.1     14     7     7     9     0     0     1    18     0     0     0     2     3     3    11
Epsilon 1.2     10     1     9     2     2     0     4     7     0     0     0     2     2     1     9
Epsilon 1.3     18    11     7     0     1     0     2    10     6     0     6    14    18     3    15
Epsilon 1.4     12     7     5     7     1     0     1     1     0     0     0     9     9     1    11
Epsilon 2.1     17    14     3     3     5     1     5    16    35     9    58   168   238     1    16
Epsilon 2.2     19    13     6    10     3     3     5    22     6     1     7    33    42     1    18
Epsilon 2.3     41    36     5    22    10     4     5    28    14     8    24    30    41     3    38
Gamma   1.1      0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
Gamma   1.2    107    28    79    12     9     1     0     3     4     0     4     1     1    48    60
Gamma   1.3     49    25    24    43     3     0     0     0     0     0     0     1     1    19    30
Gamma   2.1     38    15    23    13     3     0     0     0     0     0     0     0     0    13    25
Gamma   2.2     38    11    27    14     1     0     1     5     0     0     0     0     0    19    19
Gamma   2.3     18     4    14     5     1     0     1     1     0     0     0     1     1     5    13
Gamma   2.4     62    21    41    30     9     0     0     0     0     0     0     0     0    31    30
Iota    1.1     30    20    10    11     8     5     7    24    23     0    23     0     1    10    20
Iota    1.2     62    24    38    10     6     2    15    34     9     2    11     2    19    22    40
Iota    1.3      0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
Iota    2.1     12     7     5     2     1     7     7    28     0     0     0     4     5     0    12
Iota    2.2      0     0     0     0     0    18     6    31     0     0     0     4     6     0     0
Iota    2.3      1     1     0     3     0     0     1     1     4     0     4     5     0     0     1
Iota    2.4     13     5     8     4    10     1     5    16     0     0     0     0     1     2    11
Iota    2.5      2     1     1     1     0     0     4     9     0     0     0     0     0     0     2
Iota    2.6      4     0     4     0     0     0     0     0     0     0     0     3     4     0     4
Iota    2.7      3     0     3     0     2     2     5    11     0     0     0     1     1     0     3
Iota    2.8      0     0     0     0     0     0     1     1     0     0     0   125   126     0     0
Kappa   1.1     39    25    14    25    27     0     0     6     3     0     3    15    25    17    22
Kappa   1.2     78    47    31    21    36     0     0     7     4     2     6     4     6    35    43
Kappa   1.3    108    31    77    19    23     0     0    12     0     0     0    11    11    53    55
Kappa   2.1      1     0     1     1     0     0     1    11     0     0     0     0     0     0     1
Kappa   2.2     21    16     5    10    10     1     2    17     0     0     0     0     0    13     8
Kappa   2.3      7     4     3     8     1     0     2    12     7     0     7     5     5     2     5
Lambda  1.1     27    10    17     2     8     1     5    13     0     0     2     1     2     3    24
Lambda  1.2     18    12     6     1     4     1     3     4     3     0     3     2     2     7    11
Lambda  1.3      5     3     2     0     3     0     3     4     0     0     0     3     3     0     5
Lambda  2.1     22    12    10     0     4     2     6    10    33    16    57     7    15     8    14
Lambda  2.2     90    39    51     1    23     0     5    10     5     0     5     1     2    51    39
Lambda  2.3      3     1     2     0     2     5     4    18     0     0     0     2     2     1     2
Lambda  2.4      7     5     2     0    11     2     1     3     0     0     0     1     1     2     5
Omega   1.1      0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
Omega   1.2     88    36    52    14    15     1     4    16     0     0     0     0     0    14    74
Omega   1.3     85    36    49    13     6     8     4    21     0     0     0     0     0     6    79
Omega   2.1     23     8    15     3     1     1     1     3     0     0     0     0     0     3    20
Omega   2.2     90    51    39    39    27     0     5    12    20     1    33     0     0     9    81
Omega   2.3    135    72    63    19    60    13    10    55     0     0     0     0     0    15   120
Pi      1.1     31     9    22     3    17     0     1     1     1     0     1     1     1     3    28
Pi      1.2     41    14    27     0     9     1     0     2     2     0     2     2     2     9    32
Pi      1.3     96    27    69    19    28     1     0     2     8     0     8     3     3    16    80
Pi      2.1     10     5     5     0     0     1     0     1     0     0     0     0     0     1     9
Pi      2.2     38     8    29     7    12     2     0     4     3     2     5    51   338     9    31
Pi      2.3     64     4    60     9    15     0     0     1     7     0     7     5     7    18    46
Theta   1.1     17    12     5     4     0     0     1     1     0     0     0     4     4     4    13
Theta   1.2     12     5     7     0     0     4     2    11     0     0     0     4     4     9     3
Theta   1.3     17     2    15     2     2     0     2     6     1     0     1     2     2    12     5
Theta   2.1      3     3     0     2     1     0     0     0     0     0     0     2     2     0     3
Theta   2.2      2     2     0     2     0     0     0     0     0     0     0     0     0     0     2
Theta   2.3      6     6     0     0     0     2     1    10     7     0     7     4    19     0     6
Theta   2.4      1     0     1     0     0     2     2     5     0     0     0     3     3     0     1

A.2 Team Level Scores

Table A.2. Team-level attributes queried from the generated Team Collaboration Networks.

ID     Description
T01-d  Number of distinct emails sent
T02-d  Number of distinct initiating emails
T03-d  Number of distinct email replies
T04-d  Number of attachments sent with distinct emails
T05-b  Number of URLs in distinct emails

Team Scores, 1st Period

Team     T01-d T02-d T03-d T04-d T05-b
Alpha        1    12    21     1    11
Beta         1    12    21     1    27
Delta       13     0    23     9    40
Epsilon      1     0    24     4     9
Gamma       13     1    15    11    38
Iota         3     0    16     3    11
Kappa       13    22    54    15    71
Lambda       8     5    18     4    24
Omega        5     0    18     7    24
Pi          33     3    37    29    49
Theta        4     1    16     2     6

Team Scores, 2nd Period

Team     T01-d T02-d T03-d T04-d T05-b
Alpha       23    13    39    22    60
Beta         6    18    59    16    75
Delta       23     0    73    17    61
Epsilon      0     1    29     0     8
Gamma       11     1    38     7    29
Iota         1     0    44     9    46
Kappa        3    15    51    13    60
Lambda      54     1    81    24    77
Omega        6     0    24     8    36
Pi          33     1    38    20    32
Theta       17     3    17    17    23

Team Scores, 3rd Period

Team     T01-d T02-d T03-d T04-d T05-b
Alpha       10    17    36    14   116
Beta         0    18    60     1    79
Delta       29    27    50    20    68
Epsilon      0     1    14     0    13
Gamma        5     3    71     5   135
Iota         2     2    24     8    34
Kappa        8     3    67     7   120
Lambda      19    13    72    15    72
Omega       18     3    21    21    47
Pi          36     0    39    22    56
Theta       10     7    50     9    25

Appendix B Regression Analysis Results

Regression: Outbound Email Communication vs. Team Satisfaction

Variables Entered/Removed (b)

Model  Variables Entered       Variables Removed  Method
1      Ratio_Ext_Int_TEAM (a)  .                  Enter

a. All requested variables entered.
b. Dependent Variable: AVG_Hackmann_group_mean_neu

Model Summary

Model  R (SUBGROUP_IDENTIFIER = Stanford, 11 cases, Selected)  R Square  Adjusted R Square  Std. Error of the Estimate
1      ,534 (a)                                                ,285      ,206               ,38568

a. Predictors: (Constant), Ratio_Ext_Int_TEAM

ANOVA (b)

Model           Sum of Squares  df  Mean Square  F      Sig.
1  Regression   ,534            1   ,534         3,591  ,091 (a)
   Residual     1,339           9   ,149
   Total        1,873           10

a. Predictors: (Constant), Ratio_Ext_Int_TEAM
b. Dependent Variable: AVG_Hackmann_group_mean_neu

Coefficients (a)

                       Unstandardized Coefficients  Standardized Coefficients                 Collinearity Statistics
Model                  B       Std. Error           Beta                       t       Sig.   Tolerance  VIF
1  (Constant)          3,601   ,217                                            16,572  ,000
   Ratio_Ext_Int_TEAM  1,000   ,528                 ,534                       1,895   ,091   1,000      1,000

a. Dependent Variable: AVG_Hackmann_group_mean_neu

Collinearity Diagnostics (a)

                                            Variance Proportions
Model  Dimension  Eigenvalue  Condition Index  (Constant)  Ratio_Ext_Int_TEAM
1      1          1,845       1,000            ,08         ,08
       2          ,155        3,447            ,92         ,92

a. Dependent Variable: AVG_Hackmann_group_mean_neu
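The summary figures reported for this model follow from the ANOVA sums of squares by the standard least-squares identities. The sketch below recomputes them as a plausibility check; it is not part of the original SPSS output, the function name is chosen here for illustration, and the decimal commas of the tables are written as decimal points. Small deviations from the reported values are expected because the inputs are already rounded to three decimals.

```python
import math


def regression_summary(ss_regression, ss_residual, n_cases, n_predictors):
    """Recompute model summary statistics from ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    df_residual = n_cases - n_predictors - 1
    r_square = ss_regression / ss_total
    adjusted_r_square = 1 - (1 - r_square) * (n_cases - 1) / df_residual
    ms_residual = ss_residual / df_residual
    return {
        "r": math.sqrt(r_square),
        "r_square": r_square,
        "adjusted_r_square": adjusted_r_square,
        "std_error_of_estimate": math.sqrt(ms_residual),
        "f": (ss_regression / n_predictors) / ms_residual,
    }


# Values taken from the ANOVA table above: 11 cases, one predictor.
summary = regression_summary(0.534, 1.339, n_cases=11, n_predictors=1)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Within rounding error, the recomputed values agree with the reported R, R Square, Adjusted R Square, Std. Error of the Estimate, and F.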


Regression: Outbound Email Communication and Teaching Team Involvement vs. Team Satisfaction

Variables Entered/Removed (b)

Model  Variables Entered                                               Variables Removed  Method
1      No_of_distinct_emails_from_tteam_total, Ratio_Ext_Int_TEAM (a)  .                  Enter

a. All requested variables entered.
b. Dependent Variable: AVG_Hackmann_group_mean_neu

Model Summary

Model  R (SUBGROUP_IDENTIFIER = Stanford, 11 cases, Selected)  R Square  Adjusted R Square  Std. Error of the Estimate
1      ,692 (a)                                                ,479      ,349               ,34923

a. Predictors: (Constant), No_of_distinct_emails_from_tteam_total, Ratio_Ext_Int_TEAM

ANOVA (b)

Model           Sum of Squares  df  Mean Square  F      Sig.
1  Regression   ,897            2   ,449         3,678  ,074 (a)
   Residual     ,976            8   ,122
   Total        1,873           10

a. Predictors: (Constant), No_of_distinct_emails_from_tteam_total, Ratio_Ext_Int_TEAM
b. Dependent Variable: AVG_Hackmann_group_mean_neu

Coefficients (a)

                                           Unstandardized Coefficients  Standardized Coefficients                 Collinearity Statistics
Model                                      B       Std. Error           Beta                       t       Sig.   Tolerance  VIF
1  (Constant)                              4,072   ,336                                            12,105  ,000
   Ratio_Ext_Int_TEAM                      1,363   ,522                 ,728                       2,611   ,031   ,837       1,194
   No_of_distinct_emails_from_tteam_total  -,005   ,003                 -,481                      -1,725  ,123   ,837       1,194

a. Dependent Variable: AVG_Hackmann_group_mean_neu

Collinearity Diagnostics (a)

                                            Variance Proportions
Model  Dimension  Eigenvalue  Condition Index  (Constant)  Ratio_Ext_Int_TEAM  No_of_distinct_emails_from_tteam_total
1      1          2,776       1,000            ,01         ,03                 ,01
       2          ,174        3,991            ,13         ,92                 ,05
       3          ,049        7,503            ,86         ,05                 ,94

a. Dependent Variable: AVG_Hackmann_group_mean_neu
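The collinearity statistics of this two-predictor model can be cross-checked with two standard identities: the variance inflation factor is the reciprocal of the tolerance, and with exactly two predictors the tolerance equals one minus their squared correlation. The following sketch applies both to the reported tolerance of ,837 (written with a decimal point); the helper names are illustrative and not taken from the original analysis.

```python
import math


def vif_from_tolerance(tolerance):
    """Variance inflation factor as the reciprocal of the tolerance."""
    return 1.0 / tolerance


def implied_predictor_correlation(tolerance):
    """Absolute correlation between the two predictors of a two-predictor model."""
    return math.sqrt(1.0 - tolerance)


tolerance = 0.837  # reported for both predictors in the Coefficients table
vif = vif_from_tolerance(tolerance)
correlation = implied_predictor_correlation(tolerance)
t_ratio = 1.363 / 0.522  # B divided by Std. Error for Ratio_Ext_Int_TEAM

print(f"VIF: {vif:.3f}")
print(f"|r| between predictors: {correlation:.3f}")
print(f"t for Ratio_Ext_Int_TEAM: {t_ratio:.3f}")
```

The recomputed VIF and t ratio match the table up to rounding of the inputs, and the implied correlation of roughly 0.4 between the two predictors is consistent with the moderate condition indices reported above.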

Appendix C d.store - API Reference

This section of the appendix documents the HTTP/1.1-based service interface of the d.store platform.

A.1. Platform Index

URL Pattern: http:///[index]

The platform index represents the root resource of the d.store application. It serves as an entry point (i.e., a home page for HTML clients) by providing a brief overview of the current platform state and/or user-related information.

Representation Types: text/html, application/json

GET     Get the platform index. The HTML representation of this resource represents the d.store home page.
        Query Parameters: n/a
        Server Response:
          200 – Ok    A platform index with basic information is returned.
PUT     Not supported
POST    Not supported
DELETE  Not supported

A.2. Network Collection

URL Pattern: http:///graphs

Represents the collection of information networks managed by the d.store server instance.

Representation Types: application/json

GET     Not implemented
PUT     Not supported
POST    Create a new team communication network resource.
        Query Parameters: n/a
        Supported Request Entities:
          Content Type: application/x-www-form-urlencoded
            Field Name    Description
            id*           a unique network identifier
            label         a readable label for the network
            description   optional notes or description
          Content Type: application/json
            {
              id* : string,           a unique network identifier
              label : string,         a readable label for the network
              description : string    optional notes or description
            }
        Server Response:
          Response Code / d.store Status    Description
          201 – Created                     A resource for a new network instance has been created with the given ID.
          400 – Bad Request / bad_request   Incomplete or malformed request entity.
DELETE  Not supported
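As an illustration of the network collection interface, the following minimal sketch builds (without sending) a POST request that would create a new network using the application/json entity. The host name, the network identifier, and the helper name are assumptions made for this example, since the reference elides the host part of the URL pattern.

```python
import json
import urllib.request

# Hypothetical host; the URL patterns in this reference do not name one.
BASE_URL = "http://dstore.example.org"


def build_create_network_request(network_id, label=None, description=None):
    """Build (but do not send) a POST request for the /graphs collection."""
    entity = {"id": network_id}  # 'id' is the field marked with * in the reference
    if label is not None:
        entity["label"] = label
    if description is not None:
        entity["description"] = description
    return urllib.request.Request(
        url=f"{BASE_URL}/graphs",
        data=json.dumps(entity).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_network_request("team-alpha", label="Team Alpha")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would submit it; according to the response table, a successful call answers with 201 and an incomplete or malformed entity with 400.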

A.3. Network Instance

URL Pattern: http:///graphs/

A network instance represents a team communication network consisting of resources (nodes) and their directed relationships to each other (edges).

Representation Types: text/html, application/json

GET     Retrieve statistical information about a network instance.
        Query Parameters: n/a
        Server Response:
          200 – Ok    Basic information about the network state is returned.
PUT     Not supported
POST    Not supported
DELETE  Remove the network resource and all other contained resources from the system.
        Query Parameters: n/a
        Server Response:
          200 – Ok           The network resource has been deleted.
          403 – Forbidden    Authorization failed. Insufficient user rights.

A.4. Node Instance Collection

URL Pattern: http:///graphs//resources[/]

These resources represent a collection of network nodes. Each node instance represents meta-information about a Web resource identified by its URL. The list of nodes a resource represents is determined by an optional list of concatenated node types, separated by a ‘+’ character (encoded: %2B). This defines an AND filter on available network node types, meaning that only those node instances are returned that are of all types mentioned in the list.

Representation Types: text/html, application/json, application/graphml

GET     Get a (filtered) list of node instances for this network. Each instance represents contextual meta-information for an arbitrary resource on the Web.
        Query Parameters:
          Parameter    Description
          start        Index of the first node instance in the total result set that is returned to the client.
          limit        Maximum number of nodes to return.
          sort         Specifies the attribute by which the node list is sorted. Accepted values: ‘id’ (default), ‘label’, ‘postdate’.
          dir          If omitted or value is ‘asc’: sort result list in ascending order. In all other cases: sort the result list in descending order.
          relations    Defines the type of relations that the platform should return for each resource in the result set. If unspecified, no relations will be returned. Accepted values: ‘onto’, ‘graph’, ‘user’, ‘all’.
          attributes   Defines the type of attributes that the platform should return for each resource in the result set. If unspecified, no attributes will be returned. Accepted values: ‘onto’, ‘graph’, ‘user’, ‘all’.
        Server Response:
          Response Code / d.store Status          Description
          200 – Ok                                A (potentially empty) list of network nodes is returned.
          404 – Not Found / unknown_resourceclass The list of nodes is undefined because at least one of the types in the requested type filter does not exist.
PUT     Not supported
POST    Add a node to the network. This creates a new node resource to represent meta-information about another resource identified by the posted URL. The node is merged into the network of existing nodes and relationships based on the attributes and relations that are posted.
        Query Parameters: n/a
        Request Representations:
          Content Type: application/x-www-form-urlencoded
            Field Name          Description
            url*                the URL of the annotated resource
            label               a readable label for the resource
            no_redirect_check   prevent the d.store server from checking whether the given URL is redirecting to a different location
          Content Type: application/json
            {
              url* : string,              the URL of the annotated resource
              label : string,             a readable label for the resource
              postdate : string,          explicitly sets the node’s creation timestamp
              notes : string,             notes or description
              tags : string,              space-separated list of tag labels to be assigned
              attributes : array [        a list of attribute instances
                {
                  name* : string,         the attribute’s type name
                  value* : string,        the attribute’s value
                  postdate : string,      yyyy-mm-dd hh:mm:ss
                  deletedate : string     yyyy-mm-dd hh:mm:ss
                }, …
              ],
              relations : array [         a list of relation instances
                {
                  name* : string,         the relation’s type name
                  target* : string,       the relation’s target URL or node index
                  postdate : string,      yyyy-mm-dd hh:mm:ss
                  deletedate : string     yyyy-mm-dd hh:mm:ss
                }, …
              ]
            }
        Server Response:
          Response Code / d.store Status        Description
          201 – Created                         A node instance for the specified URL has been created and is accessible using the returned index value.
          400 – Bad Request / bad_request       Incomplete or malformed request entity.
          400 – Bad Request / RequestValidation Details are enclosed in the response entity.
          403 – Forbidden                       Authorization failed. Insufficient user rights.
DELETE  Not supported

A.5. Node Type Collection

URL Pattern: http:///graphs//tags

Represents the collection of available node types in a network.

Representation Types: application/json

GET     Get a list of available node types.
        Query Parameters:
          Parameter    Description
          type         Filter the result set by type class. Accepted parameter values are: ‘onto’ for ontology-defined types, ‘graph’ for graph-level types, ‘user’ for user-defined types. By default, all node types will be returned.
        Server Response:
          200 – Ok    A list of available node types is returned.
PUT     Not supported
POST    Not supported
DELETE  Not supported

A.6. Attribute Type Collection

URL Pattern: http:///graphs//attributes

Represents the collection of attribute types available in a network. An attribute is a typed, literal value that can be attached to any resource node in a team communication network.

Representation Types: application/json

GET     Get a list of available attribute types in the network.
        Query Parameters:
          Parameter    Description
          type         Filter the result set by type class. Accepted parameter values are ‘onto’ for ontology-defined attributes, ‘graph’ for graph attributes, and ‘user’ for user-defined attributes. By default, all attribute types will be returned.
        Server Response:
          200 – Ok    A list of available attribute types is returned.
PUT     Not supported
POST    Not supported
DELETE  Not supported

A.7. Relation Type Collection

URL Pattern: http:///graphs//relations

Represents the collection of relation types available in a network. A relation is a typed, directed association between two resource nodes of a team communication network.

Representation Types: application/json

GET     Get a list of available relation types in the network.
        Query Parameters:
          Parameter    Description
          type         Filter the result set by type class. Accepted parameter values are ‘onto’ for ontology relations, ‘graph’ for graph relations, and ‘user’ for user-defined relations. By default, all relation types will be returned.
        Server Response:
          200 – Ok    A list of available relation types is returned.
PUT     Not supported
POST    Not supported
DELETE  Not supported

A.8. Node Instance

URL Pattern: http:///graphs//resources/

Represents a resource node in the network.

Representation Types: text/html, application/json, application/graphml

GET     Get information about an annotated resource, such as types, attributes, and relationships to other resources.
        Query Parameters:
          Parameter    Description
          relations    Specify the type of node relations that the platform should return for this request. By default, no relations are returned. Accepted values are: ‘onto’, ‘graph’, ‘user’, ‘all’.
          attributes   Specify the type of node attributes that the platform should return for this request. By default, no attributes are returned. Accepted values are: ‘onto’, ‘graph’, ‘user’, ‘all’.
        Server Response:
          200 – Ok    Meta-information about the resource represented by this node is returned.
        Resource Representations:
          Content Type: application/json
            {
              index : number,             the unique node index
              url : string,               the annotated resource
              label : string,             a readable resource label
              notes : string,             notes/description
              postdate : string,          node creation date
              deletedate : string,        node deletion date
              ontotypes : array,          ontology type instances
              graphtypes : array,         graph type instances
              usertypes : array,          user type instances
              attributes : array [        array of attribute instances
                {
                  name : string,          the attribute’s type name
                  type : string,          ‘onto’, ‘graph’, or ‘user’
                  value : string
                }, …
              ],
              relations : array [         array of relation instances
                {
                  name : string,
                  type : string,
                  value : string
                }, …
              ]
            }
PUT     Update the properties of a node instance. Any field contained in the request entity will be updated with the corresponding values. The state of any field that is omitted in the request will remain unchanged.
        Query Parameters: n/a
        Request Representations:
          Content Type: application/x-www-form-urlencoded
            Field Name   Description
            label        a readable resource label
            notes        notes or description
          Content Type: application/json
            {
              label : string,             a readable resource label
              postdate : string,          explicitly sets the node’s creation timestamp
              notes : string,             notes or description
              types : string,             space-separated list of type labels to be assigned
              attributes : array [        a list of attribute instances
                {
                  name : string,          an attribute’s type name
                  value : string,         an attribute’s value
                  postdate : string,      yyyy-mm-dd hh:mm:ss
                  deletedate : string     yyyy-mm-dd hh:mm:ss
                }, …
              ],
              relations : array [         a list of relation instances
                {
                  name : string,          a relation’s type name
                  target : string,        a relation’s target URL or node index
                  postdate : string,      yyyy-mm-dd hh:mm:ss
                  deletedate : string     yyyy-mm-dd hh:mm:ss
                }, …
              ]
            }
        Server Response:
          Response Code / d.store Status        Description
          200 – Ok                              The node has been updated with the property values included in the request entity.
          400 – Bad Request / bad_request       Incomplete or malformed request entity.
          400 – Bad Request / RequestValidation Details are enclosed in the representation.
          403 – Forbidden                       Authorization failed. Insufficient user rights.
POST    Not supported
DELETE  Remove the node along with any associated attributes and incoming or outgoing relations from the network.
        Query Parameters: n/a
        Server Response:
          200 – Ok           The node has been deleted.
          403 – Forbidden    Authorization failed. Insufficient user rights.

A.9. Node Attribute Collection

URL Pattern: http:///graphs//resources//attributes

Represents the collection of attributes assigned to a network node instance.

Representation Types: application/json

GET     Get a list of attributes assigned to a node.
        Query Parameters:
          Parameter    Description
          type         Filter the result set by type class. Accepted parameter values are ‘onto’ for ontology attributes, ‘graph’ for graph attributes, and ‘user’ for user-defined attributes. By default, all attribute types will be returned.
        Server Response:
          200 – Ok    A list of attribute instances is returned.
PUT     Not supported
POST    Not implemented
DELETE  Not supported

A.10. Node Relation Collection

URL Pattern: http:///graphs//resources//relations

Represents the collection of outgoing relations defined for a resource (node).

Representation Types: application/json

GET     Get a list of relations for this resource node.
        Query Parameters:
          Parameter    Description
          type         Filter the result set by type class. Accepted parameter values are ‘onto’ for ontology relations, ‘graph’ for graph relations, and ‘user’ for user-defined relations. If unspecified, all relations will be returned.
        Server Response:
          200 – Ok    A list of relations to other resources is returned.
PUT     Not supported
POST    Not implemented
DELETE  Not supported

A.11. Node Type

URL Pattern: http:///graphs//tags/

Represents a node type. Depending on the network configuration, every type belongs to one of the classes ontology, graph, or user. Ontology types are defined in one of the pre-defined team communication ontologies. Graph types are imported from other ontologies at a network-specific level. User types are custom types that are created by clients at run-time.

Representation Types: application/json

GET     Get information about a node type.
        Query Parameters: n/a
        Server Response:
          200 – Ok    Information about the type is returned.
PUT     Not supported
POST    Not supported
DELETE  Not implemented
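To illustrate how a client might address the node instance collection described in this reference, the sketch below assembles a filtered listing URL (node types joined by ‘+’ and percent-encoded as %2B, plus the documented query parameters) and a JSON entity for adding a node. The host, the network identifier, the position of the identifier in the path, the type names, and the relation name are all assumptions made for this example; the reference elides the placeholders in its URL patterns.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical host; the URL patterns in this reference do not name one.
BASE_URL = "http://dstore.example.org"


def node_list_url(graph_id, node_types=(), **query):
    """Build the URL for a GET on the node instance collection of a network."""
    url = f"{BASE_URL}/graphs/{quote(graph_id, safe='')}/resources"
    if node_types:
        # AND filter: types are concatenated with '+', which is encoded as %2B.
        url += "/" + quote("+".join(node_types), safe="")
    return f"{url}?{urlencode(query)}" if query else url


def node_entity(url, label=None, attributes=(), relations=()):
    """Build the application/json entity for a POST on the node collection."""
    entity = {"url": url}  # 'url' is the field marked with * in the reference
    if label is not None:
        entity["label"] = label
    if attributes:
        entity["attributes"] = [{"name": n, "value": v} for n, v in attributes]
    if relations:
        entity["relations"] = [{"name": n, "target": t} for n, t in relations]
    return entity


listing = node_list_url(
    "team-alpha", ["Email", "Person"],
    start=0, limit=50, sort="postdate", dir="desc", attributes="all",
)
body = json.dumps(node_entity(
    "http://mail.example.org/msg/42",
    label="Re: prototype review",
    relations=[("repliesTo", "http://mail.example.org/msg/41")],
))
print(listing)
print(body)
```

Keeping URL construction separate from transport makes the type filter encoding easy to verify before any request is sent; the resulting URL and body can be passed to any HTTP client.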

Acronyms

ABox     Assertional Box
ALM      Application Lifecycle Management
API      Application Programming Interface
CAD      Computer-aided Design
CSCW     Computer-supported Cooperative Work
FEI      Front End of Innovation
FFE      Fuzzy Front End
HTML     Hypertext Markup Language
HTTP     Hypertext Transfer Protocol
ICT      Information and Communication Technology
ISO      International Organization for Standardization
JSON     JavaScript Object Notation
KB       Knowledge Base
NPPD     New Product & Process Development
OWL      OWL Web Ontology Language
RDBMS    Relational Database Management System
RDF      Resource Description Framework
RDFS     Resource Description Framework Schema
REST     Representational State Transfer
RPC      Remote Procedure Call
SOA      Service-oriented Architecture
SPARQL   SPARQL Protocol and RDF Query Language
SQL      Structured Query Language
SWRL     Semantic Web Rule Language
TBox     Terminological Box
TCN      Team Collaboration Network
TCN-S    Team Collaboration Network System
UCD      User-centered Design
URI      Uniform Resource Identifier
URL      Uniform Resource Locator
VoIP     Voice over Internet Protocol
W3C      World Wide Web Consortium
WebDAV   Web-based Distributed Authoring and Versioning
WWW      World Wide Web
XML      Extensible Markup Language

Glossary

Architectural Style A coordinated set of architectural constraints that restricts the roles/fea- tures of architectural elements and the allowed relationships among those elements within any architecture that conforms to that style.

Computer-supported Cooperative Work (CSCW)
Computer-assisted coordinated activity carried out by groups of collaborating individuals.

Engineering Design (also: conceptual design, design)
Comprises the team-based and interdisciplinary activities during the early stages of an engineering project, in which opportunities and innovative concepts for a new product, software, or service are ideated, (re-)designed, and conceptualized.

Front End of Innovation (FEI) (also: Fuzzy Front End, FFE)
Design activities that take place prior to the formal, well-structured new product and process development.

Groupware
Computer-based systems that support groups of people engaged in a common task or goal and that provide an interface to a shared environment.

Resource
Anything that is important enough to be referenced as a thing itself. Resources have a globally shared request message classification system, called the uniform interface, and are addressable via uniform resource identifiers.

Service
A distinct part of the functionality that is provided by an entity through interfaces, whereby an interface is a named set of operations that characterize the behavior of an entity.

Team Collaboration Network
Team Collaboration Networks (TCNs) are the formalization of (virtual) collaboration activities in engineering design teams. A TCN defines the semantics and the occurrences of relationships between individuals and/or collaboration resources.

Virtual Collaboration
Two or more people working together to accomplish a task without face-to-face interaction. Tasks and activities are carried out with the help of information and communication technology.

Virtual Team
A group of collaborating individuals whose members are mediated by time, distance, or technology.

Index

ABox, 76
ALM, 41
architecture, 52, 89
    client-server, 89
    resource-oriented, 53
    service-oriented, 52
    style, 52

class extension, 61
collaboration
    metric, 1
    process, 1
    tools, 5
    virtual, 2
component, 51
CSCW, 1, 3, 34

d.store, 89
design rationale, 35
Design Thinking, 3

formal system, 58

graph, 60
groupware, 3, 35

HTTP, 92

inference rule, 59
innovation, 2
instance, 61
instrumentation, 1, 4

Jena, 90
JSON, 92

namespace, 60
no-overwrite, 98
notation, 62

ontology, 59
organizational memory, 35
OWL, 58, 61–62, 72
    notation, 62

prefix, 60
provenance, 94

RDF, 58–61
    graph, 60
    notation, 62
    Schema, 58
    statement, 60
    triple, 60
RDFS, 58
reification, 94
representation, 54
resource, 54
    descriptive, 56
REST, 53–56, 89

Semantic Web, 58
semantics, 58
service, 52
    interface, 89
    orientation, 7
    platform, 53
SOA, 52
statement, 60
subsystem, 51
system, 51
    formal, 58

TBox, 73
Team Collaboration Network, 67, 79
    definition, 68
    system (TCN-S), 79
    validity, 68
time travel, 98
time-point query, 100
triple, 60
triple store, 94

URI, 58, 60
User-centered Design, 3

validity, 68, 91
virtual team, 31

XML, 58