
Integration of Features into Web Information Systems

Tomáš Obšívač

Ph.D. Thesis Proposal Brno, September 7, 2011

Advisor: doc. Ing. Michal Brandejs, CSc.

Advisor’s signature

Contents

1 Introduction

2 State of the Art
2.1 Social Web Principles
2.1.1 Critical Mass
2.2 Web-based Information Systems
2.2.1 Information System of Masaryk University
2.3 Social Software
2.3.1 Social Network Services
2.3.2 Interpersonal Ties
2.4 Key Social Software Features
2.4.1 Social Translucence
2.4.2 Identity and Reputation
2.4.3 Relationships, Groups, Circles
2.4.4 Authoring, Sharing and Conversations
2.4.5 Tagging and Folksonomy
2.4.6 Social-aware Full Text Search
2.4.7 Shared Workspaces
2.5 Network Science
2.5.1 Social Network Analysis
2.5.2 Link Mining (in Heterogeneous Networks)
2.6 Recommendation Systems

3 Aims of the Thesis
3.1 Objectives and Expected Results
3.2 Schedule

4 Achieved Results
4.1 IS MU Social Applications
4.2 Academic Work
4.3 Other Results

5 Author’s Publications

6 Summary / Souhrn
6.1 Summary
6.2 Souhrn

7 References

1 Introduction

The Web is more a social creation than a technical one. ... The ultimate goal of the Web is to support and improve our web-like existence in the world. — Sir Tim Berners-Lee, [BL99]

From a global point of view, there is no doubt that the Web is behind many advances in economic, social and business models. During the two decades of its existence, we have seen events that have few parallels and would have been unimaginable without the Web, for example the dot-com bubble¹ and the recent fall of authoritarian regimes in the Middle East. The former event is seen by many as a warning against too much techno-optimism. I would rather see it as an expected part of the hype cycle, which occurs before a new technology and its benefits are well understood and adopted in production. The latter event is a recent example of the power unleashed by the use of the Web, or specifically of Social Network Services: the power to overcome distances, to encourage free communication and to support civil society (including academia), communities and collaboration.

Active participation of Web users, not only passive reading of web pages by consumers, was the original intention of the Web design [Law05]. Surprisingly, it took more than 10 years until such behavior was observed. The Social Web has been promoted since then as a key part of the “next version of the Web”, slightly confusingly named Web 2.0.²

Web 2.0 represents not a change in technology but a fundamental change in the involvement of people. The Web is becoming a truly participatory social medium capable of interactive, continuous dialogue, which was impossible in the previous form of the Web, i.e. a one-to-many mass medium and a one-way information source. Humans are social beings, thus the emergence of the Social Web is simply our online world catching up with our offline world [Ada10].

The aim of my thesis is to apply as many general principles of the Social Web (chapter 2.1) as possible in smaller-scale intranet applications and enterprise-level Web-based Information Systems (chapter 2.2). With those principles in mind, and when they are properly implemented using state-of-the-art computer science techniques, it is possible to build efficient ways to connect people with potentially interesting information (chapter 2.6). It is also possible to gain previously unseen knowledge (chapter 2.5), which can help, for example, management to support engagement in work or learning.

I consider social software features to be strategic information system components for improving the quality of services for users. An information system built around people and enhanced by the social software

¹ Measured by the NASDAQ Composite index, the burst of the dot-com bubble was responsible for a fall from over 5000 points in March 2000 to under 2000 points one year later. Measured by market value, the stock market crash caused a loss of 5 trillion USD in two years.

² A widely known promotion of Web 2.0 was the O’Reilly Media conference in 2004, which presented it as a business revolution. The term Web 2.0 has been used since 1999, mainly in conjunction with web site design and development towards rich internet applications and to describe the Web as a platform.

features (chapter 2.4) should perform better at its main purpose: improving effectiveness and efficiency within the organization. It should also better support the sharing and preservation of knowledge (chapter 2.4.7), which is of crucial importance to organizations and businesses [For10] in the modern Network Society³ and the Service Economy.

Many broad business goals, like increased knowledge sharing, better access to expertise, and increased innovation, are uniquely served by social technologies. Early proof-of-concept efforts and pilots have proven successful in supporting these endeavors. — Forrester Consulting, 2010 [For10]

An experimental implementation of social features, nevertheless fully functional and adopted by a number of users, has been carried out in the Information System of Masaryk University (IS MU). The features (chapter 4.1) have been rolled out gradually over several years. In addition, a need to measure them has arisen. My future effort will go into evaluating their impact on user behavior and system usage in order to validate and improve the experimental implementation. There is a need for further investigation of the relations among users and other objects managed by the system (chapter 2.5.1). Subsequent mining of heterogeneous network data (chapter 2.5.2) from socially enhanced applications is the subject of my further research. My recent successful attempt presented in [BBG+11] confirms that we can expect new and valuable insights from the acquired data.

³ “The network society itself is, in fact, the social structure which is characteristic of what people had been calling for years the information society or post-industrial society. It is a society where the key social structures and activities are organized around electronically processed information networks.” — Conversation with M. Castells, UC Berkeley, 2008

2 State of the Art

2.1 Social Web Principles

A brief description of the main attributes of the Web as a participation platform:

User generated content. The roles of producers and consumers blur in the digital age, giving rise to prosumer capitalism [RJ10]. Especially for digital natives, i.e. those interacting with digital technology from an early age, online reputation and exposure form a culture where “getting noticed is everything”. They create a non-monetary form of economy at the end of the Long Tail [And06]. Users not only add value by contributing, they may be the application itself.⁴

The power of the crowd. There are at least two reasons why we do not need to chase an expert. First, the aggregation of information in groups (under some conditions⁵) results in decisions that are often better than could have been made by any single member of the group [Sur04]. Second, crowdsourcing is outsourcing through an open call to an undefined, large community. According to Jeff Howe, the author of the term, crowdsourcing employs those who are most fit to perform tasks, solve complex problems and contribute the most relevant and fresh ideas. The online crowdsourcing marketplace Amazon Mechanical Turk and question-and-answer websites, e.g. Yahoo! Answers and Quora, are among the web services that help mediate crowdsourcing.

Data on large scale. The success of the majority of Social Web services lies in their ability to extract knowledge from user generated data and from the way the service is used. It is essential for such an application to continuously collect information indirectly from users each time they use it.

The architecture of participation. This term was used by Tim O’Reilly [O’R04] to describe the nature of systems that are designed for user contribution, for reuse and mashup, or for rating system content, with low barriers to entry for newcomers. His examples were open source, the Comprehensive Perl Archive Network, the Open Directory Project or Wikipedia.

⁴ “Today [June 2004], upwards of 430,000 people in the U.S. alone—more than are employed worldwide by General Electric Co. and Procter & Gamble combined—earn a full- or part-time living on eBay selling everything from fashion to farm equipment, with the highest-sellers grossing up to $1 million a month.” — The Rise Of The Mompreneurs, Businessweek

⁵ “Wise crowds” conform to the following conditions: diversity of opinion among group members, independence of each member’s opinion, decentralization, which allows members to specialize and draw on local knowledge, and aggregation, i.e. a mechanism for turning private judgements into a collective decision.

Figure 1: Original 35mm slide used by Metcalfe to support the future value of the network based on the Ethernet, a wired communication network protocol. [Hir06]

Network effect. A product or service becomes more valuable the more people use it (this is discussed in the following chapter). Furthermore, the value of a Web 2.0 service is built as a side effect of the ordinary use of the application [O’R05]. It is created by the aggregation of many individual user contributions.

Openness. Openness on the Web concerns technological aspects, such as data and application portability, data feed formats and application programming interfaces, which make it possible to combine data and services in order to create a new application, a so-called mashup.

2.1.1 Critical Mass

The growth and critical mass of telecommunication networks were of interest to Bob Metcalfe when he was promoting investment in the interconnection of communication devices around 1980. He presented the quadratic growth of the value of a network (proportional to the square of the number of connected devices), which was later named Metcalfe’s law (Figure 1).

The same idea has carried over to explain the growth of other technologies (e.g. cell phones) and to the higher level of users and network services as well. Despite the ongoing debate about the asymptotic behavior of the law when it is used in a sociodynamic context, the point of crossover comes sooner the greater the number of users involved. The application of Metcalfe’s law to Web 2.0, the Semantic Web and their combination is discussed in [HG08].

The authors of [SK09] identified perceived critical mass, together with perceived playfulness, as the strongest determinant contributing to a person’s intention to use and actual usage of social network sites.
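The quadratic growth behind Metcalfe’s law can be illustrated with a short sketch. It assumes the common formulation in which value is proportional to the number of possible pairwise connections, n(n-1)/2, while cost grows linearly with the number of nodes. The function names and the cost and value constants are illustrative assumptions, not figures taken from Metcalfe’s slide.

```python
def possible_links(n: int) -> int:
    """Number of distinct pairs among n connected devices or users."""
    return n * (n - 1) // 2


def network_value(n: int, value_per_link: float = 1.0) -> float:
    """Metcalfe-style value: proportional to the number of possible links."""
    return value_per_link * possible_links(n)


def critical_mass(cost_per_node: float, value_per_link: float = 1.0) -> int:
    """Smallest n at which the quadratic value exceeds the linear cost."""
    n = 1
    while network_value(n, value_per_link) <= cost_per_node * n:
        n += 1
    return n


if __name__ == "__main__":
    for size in (10, 100, 1000):
        print(size, possible_links(size))
    print("critical mass at cost 10 per node:", critical_mass(10))
```

Under these assumptions the crossover point depends only on the ratio of cost per node to value per link, which is one way to read why small, closed systems can struggle to reach it.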

2.2 Web-based Information Systems

A Web-based Information System, or Web Information System (both abbreviated to WIS), is an information system that can be accessed through the Web, typically with a Web browser as a front end. The maintained data are published using hypertext-based principles. I will also refer to a WIS as an intranet (application). There are many WIS applications, e.g. custom systems in organizations, e-commerce or e-government services. With respect to my thesis proposal and experimental research, I am interested mainly in Social Network Services (see chapter 2.3.1) and Learning Support Systems [YF10].

When applying the Social Web principles, it is essential to understand the differences between the open Web and WIS.

Small to medium number of users. Up to three quarters of the citizens of developed countries have access to the Web, and Internet penetration has reached almost every person in some European countries.⁶ Big Social Web sites have many millions of active users (e.g. hundreds of millions use Facebook regularly, post messages on Twitter or shop on Amazon).

The limited number of users in systems that are not publicly accessible causes problems with the adoption of principles that work on the open Web with its many users. Some of the principles need a critical mass to take effect.

Trustworthy user identity. Limited access to the intranet system, conditioned by affiliation with an organization, is an advantage as far as user identity is concerned. Users can place more trust in each other and in the presented information.

Low irrational use. Thanks to the absence of anonymity, users experience much less unsolicited content. However, in our experience, no such simple measure can fully prevent malicious behavior, flames, plagiarism and so on.

2.2.1 Information System of Masaryk University

The Faculty of Informatics, Masaryk University, has been running and developing the Information System since 1999. It currently hosts numerous applications used for managing study-related records, e-learning tools and tools facilitating communication inside the University. It is used by more than 30,000 users a day (out of a total of 44,000 staff members and students and over 100,000 graduates). The development team has designed a unique technology that allows a system of this scale to run smoothly. The System was awarded the ISA Award 2009

⁶ According to the internetworldstats.com statistics from March and June 2011, Internet penetration is 78.3% in North America, 58.3% in Europe (with the Nordic countries at the top of the list) and 65.6% in the Czech Republic.

and the EUNIS Elite Award in 2005. Having been successfully implemented at Masaryk University, it is also being put into operation at other schools [IS 11]. The experimental part of my research (see the implemented applications in chapter 4.1) and the gathering of social data necessary for further analysis are possible because of the existence and maturity of IS MU.

2.3 Social Software

Social software, or socialware in the abbreviated form, handles the capturing, storing and presentation of communication, together with mediating connections and interactions between a pair or a group of users.

The concept of sociality, the tendency to associate with or form social groups, is explored in [BHJ+08]. The authors prefer to focus on sociality (which cannot be designed; it can only be designed for) rather than on functionality. They provide a conceptual model and a design framework for social software. I can affirm the importance of the shift in thinking from functionality to the mechanisms that trigger social behavior.

The main deliverables requested from business uses of social software [For10]:

• 52% capturing and sharing knowledge

• 51% corporate communication

• 48% modernizing the intranet portals

• 48% fostering collaboration

• 41% reducing use of e-mail or face-to-face meetings

2.3.1 Social Network Services

Bill Gates may have been the first to introduce the “Content is King” mantra, in [Gat96]. Some traditional content producers even depicted the Internet without content as “a valueless collection of silent machines”. However, the idea that communication or connectivity is more important than professionally produced content was explored soon after:

The Internet is widely regarded as primarily a content delivery system. Yet historically, connectivity has mattered much more than content. [People are more interested in communication than entertainment.] Even on the Internet, content is not as important as is often claimed, since it is email that is still the true “killer app.” — Andrew Odlyzko, [Odl01]

Recent years confirm the dominance of “connectivity” over entertainment content. Content providers have a hard time capitalizing on their production (and even protecting it from theft). Successful companies do not seem to produce their own content but “just” run services which

allow users or third parties to do so; they also run infrastructure or add value by processing the data (e.g. Google and eBay). The biggest textual content site is Wikipedia, the free encyclopedia that anyone can edit. The majority of visits and minutes spent online belong to sites with a strong community and ongoing conversations (e.g. YouTube and Blogger) and to social network services (e.g. Facebook).

The introductory article [BE07] offers a comprehensive definition, a description of features and the history of Social Network Services, together with an introduction to more narrowly focused articles in the same journal. The paper [KJL10] attempts to organize the status, uses and issues of Social Web sites into a comprehensive framework for discussing, understanding, using, building and forecasting the future of Social Web sites.

2.3.2 Interpersonal Ties

What kind of “connectivity”, as mentioned in the previous chapter, can be supported by a social network service? Different relationships among people were observed and examined by social scientists long before the mass adoption of computers. The widely cited primary research on interpersonal ties is [Gra73].

Some relations, the so-called strong ties, exist only among a very small number of closely related people. Strong ties are very important to a person’s sense of well-being; we communicate mostly with them and often meet them in person. However, strong ties are not usually present in the information systems I am interested in.

Many more relations, the so-called weak ties, exist between more distant people, who know each other but do not care so much about each other, and who communicate and meet less frequently. A person can know a few hundred weakly tied people.⁷ The strength of a weak tie lies in its ability to support the dissemination of information. It connects a person to many people he or she could not otherwise reach, a new job offer being the classic example.

To name all types of ties, there are also absent ties, defined as relationships without substantial significance. I prefer the term temporary ties, presented by Paul Adams in [Ada10]. According to his definition, temporary ties are people that you have no recognized relationship with, but that you temporarily interact with. Temporary ties are becoming more commonplace online due to user generated content, e.g. someone’s one-time answer in a discussion forum thread. We meet such people in person only accidentally, unless the relation grows into a weak tie. Sometimes these ties are people we would never have had a chance to meet in the offline world.

Social Network Services are well suited to maintaining weak and even temporary ties cheaply. They make them visible⁸ and easily accessible by new

⁷ It is suggested (and documented throughout history, in the first Neolithic villages, in the Roman army, and in present-day online games and Twitter conversations [GPV11]) that the human brain is capable of keeping approximately 150 weak ties up to date. This cognitive limit, known as Dunbar’s number, was predicted from primate neocortex size and is introduced in [Dun98]. In groups larger than 150, social cohesion begins to disintegrate.

communication channels (e.g. e-mail). Such remote interactions are disembedded from time and space, which is in accordance with time-space distanciation, the most defining property of modern society (according to Giddens).

My point is that strong ties are sustained by “water cooler talks” and phone calls. The support of weak and temporary ties should be the primary objective of a social network service designer.

2.4 Key Social Software Features

Focus on users and their relations is an obvious signature of social software. This chapter explains other attributes which should be taken into account when making an application social.

2.4.1 Social Translucence

Every digital communication system creates a barrier between its users. However, it is possible, if not essential, to design systems that avoid this “digital gap” by making participants and their activities visible to one another. The system has to reveal the presence of those using it, especially when social networking is its main purpose.

By allowing users to “see” one another, to make inferences about the activities of others, to imitate one another, we believe that digital systems can become environments in which new social forms can be invented, adopted, adapted, and propagated, eventually supporting the same sort of social innovation and diversity that can be observed in physically based cultures. —[EK00]

A “sign of life”, apart from the appearance of user generated content itself, is usually provided through activity streams and e-mail or RSS notifications which employ content aggregation. Without such notifications, it would not be feasible to keep in touch with what is going on, because every single source of information would have to be checked at random.

2.4.2 Identity and Reputation

The authors of [CK08] present, as the very first feature of Web 2.0 sites, users as first class entities in the system, with prominent profile pages including such features as age, sex, location, and testimonials or comments about the user by other users. Some features of user profiles are of course optional, but one or more features may be essential for a given kind of social network service. For

⁸ For example, Facebook tries to make ties even more visible by allowing users to see so-called Friendship Pages, which show friends’ mutual content and connections (e.g. events attended, photos they’re both tagged in).

example, testimonials are not as common as other features, but they are indispensable for LinkedIn, the business-related professional network. Testimonials, named “recommendations” at LinkedIn, help to establish the reputation and support the credibility of individuals.

It is recommended to fill in the user profile prior to interacting with others, e.g. Seeking Alpha, a stock market website, appeals: “Before you comment, why not tell people something about yourself?” Realistic profile information and visibility of connections support trust between service users without an established mutual friendship, which is usually a prerequisite for access to each other’s non-public profile information.

We can see the emergence of identity providers on the Web, allowing cross-system single sign-on. I want to focus only on the provision of the end service and assume that the authentication of users is not a problem on an intranet or WIS. Nor do I address here the rising need for providers of the social graph, a global mapping of everybody on the Web together with their interpersonal relations.

2.4.3 Relationships, Groups, Circles

I provided the theoretical background of interpersonal relations in chapter 2.3.2. There are two different kinds of connection between users to choose from when designing a social network service. The first is mutual friendship agreed by both users, and the second is the richer one-way following, where each user indicates the connection separately. The former is limited to two possible states: friend or unknown. The latter has four possible states: my follower, followed by me, following each other (i.e. friends) and unknown.

The choice of course influences the speed of network growth. The latter, asymmetric follow is faster and more viral, and it can attract a lot of lurkers⁹ to join conversations. The asymmetric model can be seen as a somewhat bigger threat to privacy. The symmetric model gives the user greater control over the reach of published information through strict friend selection. This was our primary reason for choosing symmetric friendship for IS MU.

Social services should respect the fact that people have independent groups of friends in real life.¹⁰ Group membership settings should be manageable in a user-friendly way, and the service needs an appropriate access rights system to control which content it shows to whom. Groups are formed on different bases, e.g. life stage (high school mates), hobby (beach volleyball) or shared experience (reading group). Users often want to communicate or share only with the relevant group.
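The two connection models and their possible states can be written down as a minimal sketch. The data layout (a set of directed follow pairs and a set of unordered friendship pairs) and all names are assumptions made for illustration; they do not describe how IS MU stores relationships.

```python
from enum import Enum


class Relation(Enum):
    UNKNOWN = "unknown"
    FOLLOWED_BY_ME = "followed by me"
    MY_FOLLOWER = "my follower"
    FRIENDS = "following each other"


def asymmetric_state(follows: set[tuple[str, str]], me: str, other: str) -> Relation:
    """One-way model: (a, b) in follows means that a follows b."""
    i_follow = (me, other) in follows
    follows_me = (other, me) in follows
    if i_follow and follows_me:
        return Relation.FRIENDS
    if i_follow:
        return Relation.FOLLOWED_BY_ME
    if follows_me:
        return Relation.MY_FOLLOWER
    return Relation.UNKNOWN


def symmetric_state(friendships: set[frozenset[str]], me: str, other: str) -> Relation:
    """Mutual model: a pair is either an agreed friendship or unknown."""
    if frozenset((me, other)) in friendships:
        return Relation.FRIENDS
    return Relation.UNKNOWN


if __name__ == "__main__":
    follows = {("ann", "bob"), ("bob", "ann"), ("ann", "eve"), ("dan", "ann")}
    for other in ("bob", "eve", "dan", "zed"):
        print(other, asymmetric_state(follows, "ann", other).value)
```

Storing friendships as unordered pairs makes the two-state nature of the symmetric model explicit, whereas the directed pairs of the asymmetric model yield the four states listed in the text.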

⁹ A lurker is a person who passively accesses computer-mediated communication, e.g. forums, but rarely or never participates actively.

¹⁰ The usual number of main groups is assumed to be between 3 and 5 [Ada07], each formed by 2–10 people. There is usually very little crossover between group members.

2.4.4 Authoring, Sharing and Conversations

The importance of user generated content was described in the previous chapters. Various types of content, e.g. discussion forums, (micro)blogs (or status updates), social bookmarks and wikis, are widely known and are the subject of many research papers.

Some systems provide a way to request new content, e.g. Wikipedia has the special page Requested_articles and the internal IBM blogging platform has a topic suggestion system described in [GD10].

What is perhaps most interesting about user contribution is the motivation for it. Social software developers quite often encourage users to contribute. The following list consists of only a few strong and distinct reasons, preferably self-motivations. I do not explain them and have no ambition to cover all possible motivations. For a fuller list of reasons I recommend [MS07].

1. Anticipated reciprocity

2. Reputation-building (also known as “ego boost” or egoboo), increasing the recognition of those who want to be considered experts¹¹

3. Self-development

4. Sense of community and impact on the group/environment

5. Altruism

Studies of the behavior of Web users reveal the 90-9-1 participation inequality rule. The numbers illustrate the inequality between active content creators (1%), occasional contributors (9%) and consumers (90%). An example of an even more radical difference is the English Wikipedia, with numbers around 99.8-0.2-0.003 [Nie06].

In the light of the participation inequality rule, it is not surprising to see 16,000 IS MU users learning with the help of Drill from 1,150 private coursebooks and 250 public coursebooks, giving the ratio 92-6.6-1.4 (the e-learning tool Drill is described in chapter 4.1).

Regarding conversations, a fully functional social network service provides private and public communication channels. Good communication software offers a way to limit private side conversations to those who are of actual interest to the user.
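The Drill ratio quoted above can be reproduced with a few lines of arithmetic. The sketch assumes that the three quoted figures (learning users, private coursebooks, public coursebooks) stand for the consumer, occasional contributor and creator classes and that the percentages are taken from their sum; this mapping is a reading of the numbers, not something the text states explicitly.

```python
def participation_ratio(consumers: int, occasional: int, creators: int) -> tuple[float, float, float]:
    """Share of each class in percent, rounded to one decimal place."""
    total = consumers + occasional + creators
    return tuple(round(100 * part / total, 1) for part in (consumers, occasional, creators))


if __name__ == "__main__":
    # 16,000 learners, 1,150 private and 250 public coursebooks (figures from the text).
    print(participation_ratio(16_000, 1_150, 250))
```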

2.4.5 Tagging and Folksonomy

Tagging, a social annotation of resources, enables users to add keywords, usually one-word descriptions, to resources without relying on a controlled vocabulary. The collection of tags created by many users within a single system may be referred to as a folksonomy (a portmanteau of folks and taxonomy). A folksonomy is a system of classification derived from

¹¹ Reputation-building is said to be a common motivation for open source contributors and other volunteers.

the practice and method of collaboratively creating and managing tags to annotate and categorize content [Pet09].

The authors of [MNBD06] present a model of tagging systems, specifically in the context of web-based systems, to help illustrate the potential of tagging systems to improve search, spam detection, reputation systems and personal organization while introducing new modalities of social communication and opportunities for data mining.
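As a rough illustration of how individual tag assignments add up to a folksonomy, the following sketch aggregates (user, resource, tag) triples into per-resource tag counts. The triple representation, the function names and the lower-casing of tags are assumptions chosen for the example and are not taken from [MNBD06] or [Pet09].

```python
from collections import Counter, defaultdict


def build_folksonomy(assignments):
    """Aggregate (user, resource, tag) triples into tag counts per resource."""
    # Normalize tags first so that one user's repeated assignment counts once.
    unique = {(user, resource, tag.strip().lower()) for user, resource, tag in assignments}
    tags_by_resource = defaultdict(Counter)
    for _user, resource, tag in unique:
        tags_by_resource[resource][tag] += 1
    return tags_by_resource


def top_tags(folksonomy, resource, n=3):
    """Most frequently assigned tags; ties are broken alphabetically."""
    counts = folksonomy.get(resource, Counter())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _count in ranked[:n]]


if __name__ == "__main__":
    data = [
        ("ann", "thesis.pdf", "Web"),
        ("bob", "thesis.pdf", "web"),
        ("bob", "thesis.pdf", "social"),
    ]
    print(top_tags(build_folksonomy(data), "thesis.pdf"))
```

No controlled vocabulary is involved: the classification emerges only from counting what users typed.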

2.4.6 Social-aware Full Text Search

Full text search is a common navigation technique on the Web, and it is important in enterprise systems as well. It is one of the ways to fight information overload. The main concern for developers is the relevance of the search engine results.

User generated data such as status updates, posts and comments are becoming a crucial part of how people behave online. Processing these social “signals” can provide a significant improvement in search engine performance. In [ABD07] the authors have shown that incorporating user behavior data can significantly improve the ordering of top results in a real web search setting. They explored the contribution of user feedback compared to other common web search features. Incorporating implicit feedback can improve the accuracy of search ranking algorithms by as much as 31%.

A model of folksonomy-based personalized search that enhances not only retrieval accuracy but also retrieval coverage is presented in [KRAS11]. The usefulness of social annotations and their use for search optimization is shown in [BXW+07].

Regarding the user interface of a search engine, a result page may be enhanced with additional information. One possibility is to show details about the author of the retrieved document, especially if he or she is somehow related to the querying user or is a recognized authority. Another is to incorporate friend and acquaintance recommendations into the search results. Both are more easily achievable in a WIS with reliable metadata than on the open Web, where metadata are insufficient and untrustworthy.

Search Engine Structures. Very fast processing and efficient data structures are needed for full text search. In [vBKO10] we presented a novel method which employs virtual tokens for encoding special words directly into the search index (inverted index). Such a word is inserted into the index as if it were a part of the indexed document itself, but it is handled differently. Besides storing access rights, which is the main purpose mentioned in the paper, we use virtual tokens to store additional metadata, e.g. the authorship of the document. The metadata are used to enhance the ordering of search results. In IS MU search, we are able to add a certain weight to documents containing a virtual token with the meaning “there is a person X related to

this document”. The search engine can match those documents with tokens extracted from the user’s social dependencies and added to the search query. Thus search results are personalized and ordered with preference given to results connected to acquaintances of the querying user.
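The idea of virtual tokens can be sketched with a toy inverted index. This is a simplified illustration under stated assumptions, not the IS MU implementation or the method of [vBKO10]: the class name, the "person:" prefix and the additive boost are hypothetical, and a real index would also have to keep virtual tokens from colliding with ordinary words.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index in which virtual tokens share the postings lists."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str, related_people: list[str]) -> None:
        for word in text.lower().split():
            self.postings[word].add(doc_id)
        # Virtual token: stored like a word of the document, but the
        # hypothetical "person:" prefix marks it as metadata.
        for person in related_people:
            self.postings[f"person:{person}"].add(doc_id)

    def search(self, query: str, acquaintances: list[str], boost: float = 1.0) -> list[int]:
        words = query.lower().split()
        if not words:
            return []
        # Documents must contain every query word.
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))

        def score(doc_id: int) -> float:
            # Each acquaintance related to the document adds extra weight.
            related = sum(
                doc_id in self.postings.get(f"person:{p}", set()) for p in acquaintances
            )
            return 1.0 + boost * related

        return sorted(hits, key=lambda d: (-score(d), d))


if __name__ == "__main__":
    index = TinyIndex()
    index.add(1, "exam schedule autumn", ["carol"])
    index.add(2, "exam schedule spring", ["bob"])
    print(index.search("exam schedule", acquaintances=["bob"]))
```

In this sketch the tokens derived from the user’s acquaintances only reorder the results; they never exclude a document that matches the query words.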

2.4.7 Shared Workspaces

Social software can facilitate broad collaboration through various tools which can form a shared workspace, or collaborative workspace, where colleagues (or students) are able to share documents and information in either synchronous or asynchronous mode.

Asynchronous tools are mainly wikis, where users may extend, undo and redo each other’s work on a given document, discussion forums and blogs. The basic form of document exchange is shared file storage, which can be enhanced at least by document status information and document versioning. Synchronous tools, which are significantly harder to implement, include real-time video or voice messaging, shared whiteboards and concurrent document (e.g. spreadsheet, presentation) editors.

The role of the Social Web and its associated tools for managing and sharing knowledge in companies is discussed in [KRS09], including the roots of knowledge management (KM), the implications of the Web 2.0 phenomenon for KM, an introduction to case studies (external blogs on the IBM website and the wiki of Syntaxon AG) and finally findings and suggestions for the successful integration of KM 2.0 in enterprises.

2.5 Network Science

Network science is an emerging, highly interdisciplinary research area that aims to develop theoretical and practical approaches and techniques to increase our understanding of natural and man-made networks. Network science terminology and formalisms are very well covered by [BSV07], including sampling, modelling, measurement, dynamics and visualization.

A network is used as an abstract, mathematical representation of complex relational data in diverse scientific disciplines, e.g. biology, engineering and communication. Similarities and universal laws have been discovered between systems modelled on disparate datasets.

2.5.1 Social Network Analysis

Social networks are not new, but the insights gathered by their analysis are. Data on the structure of relationships between social entities, e.g. on people’s online behavior, are gathered on a large scale. The information created in social networks is considered to be important and valuable.

I will use my own research as an example to document the value of so-called link-based data. We have researched data mining of study-related data in order to learn a classifier that predicts the study success of university students. As shown in [BBG+11], selected social network data on a student’s engagement

16 Figure 2: Success of study classification accuracy. Results are better for the classifiers learned (also) on data acquired through Social Network Anal- ysis (SNA).

in community and communication can significantly improve the results of such classification (Figure 2). Regarding the history of Social Network Analysis, psychologist Jacob Moreno introduced sociogram, a representation of the social structure of sociogram a group of people, in 1933. The sociogram has found many applications and has grown into a field of SNA. Early science papers (see bibliography of [HR05]) concerning SNA have been published mostly in the field of sta- tistical and mathematical sociology since eighties. Comprehensive reference book reviewing network analysis methods is [WF94]. Shorter introduction can be found in [Hay96]. Applications of SNA is topic of ASNA conference12, with [CH10] being the last published proceedings.
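To make the notion of link-based data more concrete, the following minimal sketch derives one simple SNA feature, normalized degree centrality, from a list of friendship ties. It is an illustration only: the feature set actually used in [BBG+11] is not specified here, and the student names and ties are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return the normalized degree centrality of every node in an undirected graph."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # self-loops carry no social information here
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    # A node connected to all other nodes gets the value 1.0.
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbours.items()}


# Hypothetical friendship ties between four students.
friendships = [("anna", "ben"), ("anna", "cyril"), ("anna", "dana"), ("ben", "cyril")]
centrality = degree_centrality(friendships)
```

A value computed this way could be appended to a student's study-related attributes before a classifier is trained, which is the general idea behind combining both kinds of data.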

2.5.2 Link Mining (in Heterogeneous Networks)

Data mining is a semi-automatic process of analyzing large-scale data sets in order to find novel patterns or to detect latent relationships [HKP11]. It involves many learning algorithms that allow computers to generalize the experience from a set of examples (training data) to unseen data.

Link mining refers to data mining techniques that mine a linked collection of interrelated objects. Common tasks include object ranking, group detection, collective classification, link prediction and subgraph discovery, all covered in the survey [GD05].

Many network mining methods assume that there is only one kind of relation between entities. In reality, there are numerous relations of different importance, which together form heterogeneous networks (which may be less sparse). The mining of heterogeneous networks is a current research topic.
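One of the tasks named above, link prediction, can be sketched for a small heterogeneous network as follows. The sketch assumes that every relation type has a fixed importance weight, collapses the typed ties into a single weighted graph and scores unconnected pairs by the strength of their ties to common neighbours. The relation types, the weights and the scoring rule are hypothetical choices made for illustration, not a method taken from [GD05].

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical importance of each relation type (assumed values).
RELATION_WEIGHTS = {"friend": 1.0, "same_class": 0.5, "forum_reply": 0.3}


def collapse(typed_edges, weights):
    """Merge typed undirected edges into one weighted adjacency map."""
    graph = defaultdict(lambda: defaultdict(float))
    for a, b, relation in typed_edges:
        graph[a][b] += weights[relation]
        graph[b][a] += weights[relation]
    return graph


def predict_links(graph):
    """Rank unconnected pairs by the summed tie strength via common neighbours."""
    scores = {}
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # the pair is already connected
        common = set(graph[a]) & set(graph[b])
        if common:
            # The weaker of the two ties limits the strength of each path.
            scores[(a, b)] = sum(min(graph[a][c], graph[b][c]) for c in common)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


typed_edges = [
    ("anna", "ben", "friend"),
    ("anna", "ben", "same_class"),
    ("ben", "cyril", "friend"),
    ("ben", "dana", "same_class"),
    ("cyril", "dana", "forum_reply"),
]
ranking = predict_links(collapse(typed_edges, RELATION_WEIGHTS))
```

Collapsing the relation types first keeps the sketch short, but it also discards information; methods designed for heterogeneous networks typically keep the types separate.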

12 http://www.asna.ch/

2.6 Recommendation Systems

With modern electronic communication tools we can advance beyond the commonplace word-of-mouth recommendations used by humankind for centuries. Collaborative filtering is the process of filtering or evaluating items through the opinions of other people. The core concepts of collaborative filtering are comprehensively covered by [SFHS07].

The applications of collaborative filtering encompass, for example, the recommendation of social software items [GZC+09] or community recommendation for social networking sites [CZC08].

User affinity, i.e. the similarity of users, may be used not only to recommend friends but also to recommend other potentially interesting resources. Nocera and Ursino present the concept of a social folksonomy and techniques to find and recommend similar users and interesting resources in [NU11]. They base their model on the evaluation of resources posted, labelled and accessed, on the number of friends and on other relevant indicators.

Adaptive navigation is one of the possible research directions and a challenging recommendation task, currently beyond the scope of my thesis.
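The core idea of collaborative filtering can be shown with a minimal user-based sketch: items the user has not yet rated are scored by the ratings of other users, weighted by how similar those users are. Cosine similarity and the weighted average are common textbook choices assumed here for illustration; the user names, item names and ratings are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse rating vectors stored as dicts."""
    shared = set(u) & set(v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if not shared or norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(u[i] * v[i] for i in shared) / (norm_u * norm_v)


def recommend(ratings, user, top_n=3):
    """Predict ratings of items unseen by `user` from the ratings of similar users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], other_ratings)
        if similarity <= 0:
            continue  # users with nothing in common do not contribute
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue
            scores[item] = scores.get(item, 0.0) + similarity * rating
            weights[item] = weights.get(item, 0.0) + similarity
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted.items(), key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical ratings of blog posts, forum threads and bookmarks.
ratings = {
    "anna": {"blog_a": 5, "thread_b": 4},
    "ben": {"blog_a": 5, "thread_b": 4, "bookmark_c": 5},
    "cyril": {"blog_a": 1, "bookmark_d": 4},
}
suggestions = recommend(ratings, "anna")
```

The same similarity measure could in principle be reused to recommend similar users rather than items, which corresponds to the user affinity mentioned above.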

3 Aims of the Thesis

I have identified the key social software features and helped with their integration into the enterprise-level Information System of Masaryk University. Fairly large user adoption has enabled me to conduct the first research experiments, identify bottlenecks and propose the following Ph.D. thesis aims.

3.1 Objectives and Expected Results

The main focus of the proposed thesis is to identify and utilize relations among users and other objects managed by the web information system. I will discuss both the objectives of relations pre-processing and of their final use in the specific contexts. In particular, the thesis will consist of the intersection of the following areas:

1. Data warehousing

Selected study-related data from IS MU are already periodically imported into the data warehouse Excalibur [BBGP11], a tool for data mining. My objective is to extend this raw data staging layer with the data required for analytical processing of social features. Records of user activity, crucial to system usage evaluation, are so far accessible only from very large and unclean system logs (more than a million operations took place daily over the last 3 years).

I will employ the data warehouse Excalibur to maintain raw data and data history. This is obviously an interim solution. Once the analysis is performed, the identified relations important for the contexts mentioned below have to be stored directly in IS MU and be efficiently accessible to the application software.

2. Social-aware full text search optimization

We are able to incorporate metadata about the relations of an indexed document (e.g. identification of the persons contributing to a given discussion forum thread) into the search index. On the other side of the search process, we have prepared a mapping of interpersonal relations (so far only in a form aggregated over a number of relation types in the real system setting). The mapping serves to add the relations of the querying person to the query.

The IS MU search is now able to handle both the metadata about the documents and the metadata about the user that are needed to reorder, and thus personalize, search results. What is still missing, and what is my aim in this context, is the analysis of the relations and their influence weights. Relations depend on the document agenda (e.g. discussions or persons), and so will the weights.

3. E-learning tools

I repeat here the citation also used in my paper [BvO09] (about the versatility of e-learning tools): “Existing social software tools such as weblogs, wikis and [...] can be used to support e-learning activities. However, these tools are not developed for educational purposes, which means that a directed effort is necessary to develop educational social software tools to support learning activities.” [Dal06]

Further integration of additional social features into learning support applications is promising: students’ motivation to engage in online learning may be stimulated by visible competition against each other.

4. Social software usage

Measuring social software usage is significantly harder than measuring basic web-based system usage through log file analysis. Statistics may include simple traffic values (e.g. minutes spent) gathered from user clicks, but this is not sufficient (as shown, for example, in the analysis of collaborative learning [AFK+07]). Measuring trends is more useful than measuring absolute numbers. I will compile indicators of the usage of the overall system and of selected agendas to track the trends. The question “How many forum topics have been answered within one hour of posting?” is an example of such an indicator. Additional data well suited to improving data mining results should be derived from the identified indicators.

5. Analysis of the impact of implemented features

The aim in this context is to validate the expected positive effect of socially-enabled applications on effectiveness and efficiency within the organization.

6. Miscellaneous objectives

In the proposed thesis text, I want to cover the following aspects of social software integration into WIS:

(a) Implementation details useful for information system developers.
(b) The method and results of measuring user satisfaction and engagement with social software tools.
(c) Various examples of the discovery of hidden relations (data not described by the classic relational data model of an information system) and use cases of link mining in the heterogeneous networks of IS MU.
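The usage indicator given as an example in objective 4 (the share of forum topics answered within one hour of posting) can be computed per month to expose a trend rather than an absolute number. The sketch below assumes that each topic is available as a pair of its posting time and the times of its replies; this data layout and the sample values are hypothetical and do not describe the actual IS MU logs.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def share_answered_within(topics, limit=timedelta(hours=1)):
    """Fraction of topics whose first reply came within `limit` of posting."""
    if not topics:
        return 0.0
    quick = sum(
        1 for posted_at, replies in topics
        if replies and min(replies) - posted_at <= limit
    )
    return quick / len(topics)


def monthly_trend(topics):
    """Indicator value per calendar month of posting, in chronological order."""
    by_month = defaultdict(list)
    for posted_at, replies in topics:
        by_month[posted_at.strftime("%Y-%m")].append((posted_at, replies))
    return {month: share_answered_within(by_month[month]) for month in sorted(by_month)}


# Hypothetical forum topics: (time of posting, times of replies).
topics = [
    (datetime(2011, 3, 1, 10, 0), [datetime(2011, 3, 1, 10, 20)]),
    (datetime(2011, 3, 5, 9, 0), [datetime(2011, 3, 5, 12, 0)]),
    (datetime(2011, 4, 2, 8, 0), [datetime(2011, 4, 2, 8, 45), datetime(2011, 4, 2, 8, 10)]),
]
trend = monthly_trend(topics)
```

Topics without any reply count as not answered in time, so the indicator also drops when a forum becomes inactive.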

3.2 Schedule

Fall 2011      Defense of this proposal, doctoral exam. Designing and launching a regular user satisfaction questionnaire / e-vote.

Spring 2012    Selecting relevant social relations and indicators of usage, preparing the data exports to maintain the history (in the Excalibur data warehouse).

Fall 2012      Network data analysis and trend prediction. Publishing the results.

Spring 2013    Evaluating and discussing the predicted trends. Publishing the results. Writing the Ph.D. thesis.

Fall 2013      Submitting the thesis.

4 Achieved Results

4.1 IS MU Social Applications

As an IS MU developer (focused on application design and usability), I directly influence which features are chosen for implementation. I have contributed to a number of new and redesigned applications and persuaded the development team to include social features when appropriate. A brief description of the socially enhanced system agendas and applications follows.

Meet People (Schoolmates) and Graduates (Alumni Network)
Sharing and communication tools based on Web 2.0 principles. The majority of the following tools are used within these general agendas.

My Friends
Mutually confirmed friendship between users. Friends are able to see additional information about each other, e.g. the timetable, the precise time of the last access to the system or marked Noticeboard messages. There are more than 36,000 people (of 56,000 active users in total) with at least one friend and 110,000 friendships in total; the average number of friends is only 3 so far, with at most 181 friendships per person. IS MU also provides a voluntary option to reveal one’s own field of study (even for applicants), courses and exam dates.

Classes
Social circles, either created in advance (on the basis of study records) or ad hoc, to facilitate private communication and cooperation among their members. There are over 5,400 classes in the system with theoretically 188,744 memberships (the number includes older graduates who may never join the system).

Activities
Streams or feeds of the latest activity of friends and class members, which provide social translucence. The following table shows the numbers of the most frequent activities.

Activity                   Count
Friendship               223,952
Profile change            26,616
Person joined circle      20,761
Circle discussion          6,743
Graduate joined class      4,990
Profile photo change       3,727
Bookmark                   2,642

Blogs, Discussion groups, Noticeboard, Bookmarks
Applications to foster content creation and communication. Some of them allow users to rate the content. The ratings are used, among other things, to filter unsolicited content. The heavily used Discussion groups consist of almost 1 million posts, one quarter of which have been rated so far. The rating activity is higher on the Noticeboard: more than 80% of messages have at least one rating. More than 12,700 bookmarks are shared between 21,390 users.

Tagging
Free-form content categorization is currently available for Publications, Bookmarks, Blog entries and Noticeboard messages. Users can take advantage of tagging in the form of a tag cloud of bookmark tags and people’s interests, and as a means of navigation and filtering in other applications, including full text search.

Personal Pages
Identity in the system is provided by user profiles. There are 1,200 testimonials written by only 155 different users.

Search
Search results are (primitively) personalized and differ for each user. Better ordering based on social signals is my current research concern.

E-learning
Social applications are linked to learning support applications. Further integration of social features is among my current aims.

Drill
Thousands of courses per semester with tens of thousands of students are numbers sufficient for easy acceptance of social e-learning tools. The application Drill can serve as an example. Based on a spaced repetition algorithm, it helps users memorize small portions of information and is usually used to learn foreign language vocabulary. Students started to use Drill quite heavily right from the beginning. They have answered 35 million cards (2,200 on average) and have created over 1,400 and shared 250 coursebooks since then.

During the design of the social features, I made decisions with the principles mentioned in chapters 2.3 and 2.4 in mind. I am going to publish the interesting results of the implementation phase of my experimental research as soon as an analysis of the impact on system usage and user behavior is completed.
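The ordering of search results by social signals, mentioned under Search above, can be illustrated by a small re-ranking sketch. It assumes that every result carries its original relevance score, its agenda and the persons related to it, and that the strength of the querying user's ties and a weight per agenda are known. All field names, tie strengths and weights are hypothetical; this is not the IS MU implementation, and finding suitable weights is exactly the open question of objective 2 in chapter 3.1.

```python
def rerank(results, user_ties, agenda_weights):
    """Reorder search results by relevance boosted with the user's social ties."""

    def social_score(doc):
        # The strongest tie between the querying user and any related person counts.
        tie = max((user_ties.get(person, 0.0) for person in doc["persons"]), default=0.0)
        boost = agenda_weights.get(doc["agenda"], 0.0) * tie
        return doc["score"] * (1.0 + boost)

    return sorted(results, key=social_score, reverse=True)


# Hypothetical results as returned by a full text search engine.
results = [
    {"id": "thread-1", "score": 1.0, "agenda": "discussion", "persons": ["u3"]},
    {"id": "thread-2", "score": 0.9, "agenda": "discussion", "persons": ["u7"]},
    {"id": "person-9", "score": 0.8, "agenda": "persons", "persons": ["u9"]},
]
user_ties = {"u7": 0.8}  # assumed tie strength of the querying user to person u7
agenda_weights = {"discussion": 0.5, "persons": 1.0}  # assumed influence per agenda

ordered_ids = [doc["id"] for doc in rerank(results, user_ties, agenda_weights)]
```

With these assumed values, the thread written by a person close to the querying user moves ahead of a slightly more relevant thread by a stranger, while results without any social tie keep their original relative order.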

4.2 Academic Work

I am an advisor of several bachelor (BT) and master (MT) theses:

Fiala, J.: Correction and training of the typographic rules, BT, 2011
Havelka, J.: Social software development platforms, MT, to appear
Ivašková, P.: E-shop graphical user interface optimization, BT, to appear
Kouba, T.: Search engines ranking factors evaluation, BT, 2010
Mazoch, B.: Web application black box testing, BT, to appear
Olšák, L.: Municipal website with maps, BT, 2011
Smrčka, M.: Faster interaction between user and web application, BT, 2011
Svoboda, M.: Wine competition administration web application, BT, 2010
Zmrzlý, A.: Web browsers extensions for IS MU services, BT, to appear

I also teach two courses focused on Web development: PV005 Computer Network Services and PV219 Webdesign seminar. As a teacher I have achieved above-average results in the Course opinion poll. I am preparing a new course syllabus on Social Web, Networks and Services for the Social Informatics bachelor field of study. The syllabus of the course is partially based on the topic of my doctoral study.

I am a co-organizer of the Wikinomics Forum¹³ (un)conference. The Forum’s aim is to disseminate wikinomics principles and modern mass collaboration paradigms¹⁴ to business, research and public services. The first year was hosted by the Faculty of Arts, Charles University in Prague, in cooperation with Wikimedia Czech Republic. I would like to bring the second or third year of the conference to Brno, preferably to Masaryk University.

4.3 Other Results

I regularly invite experts from industry to share their experience with students of the Webdesign seminar and during special events. As a Google User Group organizer, I held three hackathons and lectures with Google developers hosted at MU. Topics included HTML5, Google Chrome, mobile applications and semantics on the Web. I helped with the first Brno Barcamp, hosted by our faculty in 2011.

I am a reviewer of the ongoing authorized Czech translation of the Web Content Accessibility Guidelines (WCAG) 2.0.¹⁵

13 http://www.wikinomie.cz/
14 The four principles of Wikinomics: openness, peering, sharing and acting globally. Wikinomics Forum 2011 topics included crowdsourcing, community, open data in public services and a special track on the Wikimedia ČR and WikiSkripta projects. I was the moderator of the open lightning talks section.
15 http://www.w3.org/TR/WCAG20/

5 Author’s Publications

I am a co-author of the following publications.¹⁶

Improving the Classification of Study-related Data Through Social Network Analysis, MEMICS 2011 (to appear)
A short summary of the results of the paper [BBG+11] is given in chapter 2.5.1.

Access Rights in Enterprise Full-text Search, ICEIS 2010
The paper [vBKO10] is also available as the technical report FIMU-RS-2010-08. We present a novel method for fast filtering and ordering of search engine results. The social-aware search use case is described in chapter 2.4.6.

Towards text mining in technology-enhanced learning, ECEL 2010
My contribution to the paper [BvG+10] is the part related to the pre-processing and text classification of the diploma theses.

Search (software), 2010
https://is.muni.cz/publication/935935/en

Advantages of Versatile E-learning Tools, CELDA 2009
The short paper [BvO09] presents the benefits of selected multi-purpose e-learning tools together with summary data from the last 5 years.

Alumni Web (software), 2008
https://is.muni.cz/publication/834469/en

16 https://is.muni.cz/person/obsivac#publikace

6 Summary / Souhrn

6.1 Summary

The Web and Social Software principles can be transferred to enterprise-class web-based Information Systems in order to find new ways to connect people and potentially interesting resources. The thesis proposal discusses what to consider during the integration of social features and then how to take advantage of their use by employing the results of Social Network Analysis and Link Mining in Heterogeneous Networks techniques. These techniques help to reveal previously unseen knowledge useful to support work and learning engagement. Examples, such as the personalization of search engine results, are taken from the Information System of Masaryk University.

6.2 Souhrn

The principles of the Web and of social software can be transferred to enterprise web information systems in order to find new ways of connecting people and potentially interesting information. The thesis proposal discusses what to take into account during the integration of social (communication and connecting) features and how to then make use of them by deploying the techniques of social network analysis and link mining in networks with various kinds of ties and actors. These techniques help to reveal hitherto hidden knowledge that is useful for supporting work and learning. Examples, such as the personalization of search results, are taken from the Information System of Masaryk University.

7 References

[ABD07] Eugene Agichtein, Eric Brill, and Susan Dumais. Improving web search ranking by incorporating user behavior information. In Proceedings of the 29th annual international ACM SIGIR, New York, 2007. ACM.

[Ada07] Paul Adams. Communication mapping: understanding anyone’s social network in 60 minutes. In Proceedings of the 2007 conference on Designing for User eXperiences, DUX ’07, pages 7:1–7:8, New York, NY, USA, 2007. ACM.

[Ada10] Paul Adams. The real life social network v2. http://www.slideshare.net/padday/the-real-life-social-network-v2, June 2010. Presentation at Voices That Matter Web Design Conference, San Francisco.

[AFK+07] Nikolaos Avouris, Georgios Fiotakis, Georgios Kahrimanis, Meletis Margaritis, and Vassilis Komis. Beyond logging of fingertip actions: analysis of collaborative learning using multiple sources of data. In Journal of Interactive Learning Research JILR, volume 18(2), pages 231–250. AACE Association for the Advancement of Computing in Education, 2007.

[And06] Chris Anderson. The Long Tail: Why the Future of Business is Selling Less of More. Hyperion, 2006.

[BBG+11] Jaroslav Bayer, Hana Bydžovská, Jan Géryk, Tomáš Obšívač, and Lubomír Popelínský. Improving the classification of study-related data through social network analysis. In Proceedings of Seventh Doctoral Workshop on Mathematical and Engineering Methods in Computer Science, 2011. To appear.

[BBGP11] Jaroslav Bayer, Hana Bydžovská, Jan Géryk, and Lubomír Popelínský. Excalibur. To appear, 2011.

[BE07] Danah M. Boyd and Nicole B. Ellison. Social network sites: Definition, history, and scholarship. Journal of Computer-Mediated Communication, 13(1):210–230, 2007.

[BHJ+08] Wim Bouman, Tim Hoogenboom, René Jansen, Mark Schnoodorp, Bolke de Bruin, and Ard Huizing. The Realm of Sociality: Notes on the Design of Social Software. http://primavera.feb.uva.nl/PDFdocs/2008-01.pdf, January 2008.

[BL99] Tim Berners-Lee. Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor. Harper San Francisco, 1999.

[BSV07] Katy Börner, Soma Sanyal, and Alessandro Vespignani. Network science. Annual Review of Information Science and Technology, 41(1):537–607, 2007.

[BvG+10] Jaroslav Bayer, Matěj Čuhel, Jan Géryk, Tomáš Obšívač, and Lubomír Popelínský. Towards text mining in technology-enhanced learning. In Proceedings of the 9th European Conference on e-Learning ECEL 2010, pages 67–71. Academic Publishing Limited, Porto, Portugal, 2010.

[BvO09] Michal Brandejs, Matěj Čuhel, and Tomáš Obšívač. Advantages of versatile e-learning tools. In Proceedings of the IADIS International Conference CELDA ’09, pages 521–522. IADIS Press, 2009.

[BXW+07] Shenghua Bao, Guirong Xue, Xiaoyuan Wu, Yong Yu, Ben Fei, and Zhong Su. Optimizing web search using social annotations. In Proceedings of the 16th international conference on World Wide Web, WWW ’07, pages 501–510, New York, NY, USA, 2007. ACM.

[CH10] Christian Hirschi, Karin Ingold, and Uwe Serdült, editors. Applications of Social Network Analysis, volume 4 of Procedia - Social and Behavioral Sciences. Elsevier, 2010.

[CK08] Graham Cormode and Balachander Krishnamurthy. Key differences between web1.0 and web2.0. First Monday, 13(6), 2008.

[CZC08] Wen-Yen Chen, Dong Zhang, and Edward Y. Chang. Combinational collaborative filtering for personalized community recommendation. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’08, 2008.

[Dal06] Christian Dalsgaard. Social software: E-learning beyond learning management systems. European Journal of Open and Distance Learning (EURODL), July 2006.

[Dun98] Robin I. M. Dunbar. The social brain hypothesis. Evolutionary Anthropology, 6, issue 5:178–190, 1998.

[EK00] Thomas Erickson and Wendy A. Kellogg. Social translucence: an approach to designing systems that support social processes. ACM Trans. Comput.-Hum. Interact., 7:59–83, March 2000.

[For10] Social networking in the enterprise: Benefits and inhibitors, June 2010.

[Gat96] Bill Gates. Content is king. http://www.microsoft.com/billgates/columns/1996essay/essay960103.asp, 1996. Accessible on web.archive.org.

[GD05] Lise Getoor and Christopher P. Diehl. Link mining: a survey. SIGKDD Explor. Newsl., 7:3–12, December 2005.

[GD10] Werner Geyer and Casey Dugan. Inspired by the audience – a topic suggestion system for blog writers and readers. In Proceedings of the 2010 ACM conference on Computer supported cooperative work, CSCW ’10, pages 237–240, New York, 2010. ACM.

[GPV11] Bruno Goncalves, Nicola Perra, and Alessandro Vespignani. Validation of Dunbar’s number in Twitter conversations. ArXiv e-prints, May 2011.

[Gra73] Mark S. Granovetter. The strength of weak ties. In American Journal of Sociology, volume 78, No. 6, pages 1360–1380. The University of Chicago Press, 1973.

[GZC+09] Ido Guy, Naama Zwerdling, David Carmel, Inbal Ronen, Erel Uziel, Sivan Yogev, and Shila Ofek-Koifman. Personalized recommendation of social software items based on social relations. In Proceedings of the third ACM conference on Recommender systems, RecSys ’09, pages 53–60, New York, NY, USA, 2009. ACM.

[Hay96] Caroline Haythornthwaite. Social network analysis: An approach and technique for the study of information exchange. In Library & Information Science Research, volume 18, issue 4, pages 323–342. Elsevier, 1996.

[HG08] James Hendler and Jennifer Golbeck. Metcalfe’s law, web 2.0, and the semantic web. Web Semantics: Science, Services and Agents on the World Wide Web, 6(1):14–20, 2008.

[Hir06] Mike Hirshland. Guest blogger Bob Metcalfe: Metcalfe’s law recurses down the long tail of social networks, 2006. http://vcmike.wordpress.com/2006/08/18/metcalfe-social-networks/.

[HKP11] Jiawei Han, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques. The Morgan Kaufmann Series in Data Management Systems. Elsevier Science & Technology, 2011.

[HR05] Robert A. Hanneman and Mark Riddle. Introduction to social network methods. http://faculty.ucr.edu/~hanneman/nettext/, 2005.

[IS 11] Masaryk University Information System. https://is.muni.cz/nas_system/?lang=en, 1999–2011.

[KJL10] Won Kim, Ok-Ran Jeong, and Sang-Won Lee. On social web sites. Information Systems, 35(2):215–236, 2010. Special Section: Context-Oriented Information Integration.

[KRAS11] Heung-Nam Kim, Majdi Rawashdeh, Abdullah Alghamdi, and Abdulmotaleb El Saddik. Folksonomy-based personalized search and ranking in social media services. Information Systems, In Press, Corrected Proof, 2011.

[KRS09] Kathrin Kirchner, Liana Razmerita, and Frantisek Sudzina. New forms of interaction and knowledge sharing on web 2.0. In M. D. Lytras, E. Damiani, and P. Ordóñez de Pablos, editors, Web 2.0: The Business Model, pages 1–16. Springer US, 2009.

[Law05] Mark Lawson. Berners-Lee on the read/write web. http://news.bbc.co.uk/2/hi/technology/4132752.stm, 2005.

[MNBD06] Cameron Marlow, Mor Naaman, Danah Boyd, and Marc Davis. HT06, tagging paper, taxonomy, Flickr, academic article, to read. In Proceedings of the seventeenth conference on Hypertext and hypermedia, HYPERTEXT ’06, pages 31–40, New York, NY, USA, 2006. ACM.

[MS07] Trevor D. Moore and Mark A. Serva. Understanding member motivation for contributing to different types of virtual communities: a proposed framework. In Proceedings of the 2007 ACM SIGMIS CPR conference on Computer personnel research: The global information technology workforce, SIGMIS CPR ’07, pages 153–158, New York, NY, USA, 2007. ACM.

[Nie06] Jakob Nielsen. Participation Inequality: Encouraging More Users to Contribute. Alertbox, 2006. http://www.useit.com/alertbox/participation_inequality.html.

[NU11] Antonino Nocera and Domenico Ursino. An approach to providing a user of a “social folksonomy” with recommendations of similar users and potentially interesting resources. In Knowledge-Based Systems, volume 24, issue 8, pages 1277–1296. Elsevier BV, 2011.

[Odl01] Andrew Odlyzko. Content is not king. First Monday, 6(2), 2001.

[O’R04] Tim O’Reilly. The architecture of participation, 2004. http://oreilly.com/lpt/a/5994 (print version).

[O’R05] Tim O’Reilly. What is web 2.0: Design patterns and business models for the next generation of software, 2005. http://oreilly.com/lpt/a/7425 (print version).

[Pet09] Isabella Peters. Folksonomies: indexing and retrieval in Web 2.0. Knowledge & Information: Studies in Information Science. Walter de Gruyter, 2009.

[RJ10] George Ritzer and Nathan Jurgenson. Production, consumption, prosumption: The nature of capitalism in the age of the digital ‘prosumer’. Journal of Consumer Culture, 10(1):13–36, 2010.

[SFHS07] J. Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen. Collaborative filtering recommender systems. In Peter Brusilovsky, Alfred Kobsa, and Wolfgang Nejdl, editors, The adaptive web: methods and strategies of web personalization, volume 4321 of Lecture Notes in Computer Science, pages 291–324. Springer Berlin / Heidelberg, 2007.

[SK09] Deb Sledgianowski and Songpol Kulviwat. Using social network sites: the effects of playfulness, critical mass and trust in a hedonic context. Journal of Computer Information Systems, 49(4):74–83, 2009.

[Sur04] James Surowiecki. The Wisdom of Crowds. Doubleday, 2004.

[vBKO10] Matěj Čuhel, Michal Brandejs, Jan Kasprzak, and Tomáš Obšívač. Access rights in enterprise full-text search. In ICEIS 2010: Proceedings of the 12th International Conference on Enterprise Information Systems, Volume 1: Databases and Information Systems Integration, pages 32–39. INSTICC (Institute for Systems and Technologies of Information, Control and Communication), Funchal, Portugal, 2010.

[WF94] Stanley Wasserman and Katherine Faust. Social network analysis: methods and applications. Structural analysis in the social sciences. Cambridge University Press, 1994.

[YF10] Jing Tao Yao and Lisa Fan. Web-based learning support system. In Lakhmi Jain and Xindong Wu, editors, Web-based Support Systems, Advanced Information and Knowledge Processing, pages 81–95. Springer London, 2010.
