Analyzing Reading Behavior by Blog Mining

Total Page:16

File Type:pdf, Size:1020Kb

Analyzing Reading Behavior by Blog Mining Analyzing Reading Behavior by Blog Mining Tadanobu Furukawa Yutaka Matsuo Ikki Ohmukai Koki Uchiyama Mitsuru Ishizuka AIST National Institute of Informatics Hotto Link Inc. University of Tokyo 1-18-13 Sotokanda 2-1-2 Hitotsubashi 1-6-1 Otemachi 7-3-1 Hongo, Bunkyo-ku Chiyoda-ku, Tokyo Chiyoda-ku, Tokyo Chiyoda-ku, Tokyo Tokyo 135-8656, Japan 101-0021, Japan 101-8430, Japan 100-0004, Japan Abstract Adamic & Glance 2005). Among several studies that an- alyzed social networks on the blogosphere, some have used This paper presents a study of the various aspects of citation (mention of urls in the posting) to show relations blog reading behavior. The analyzed data are obtained among blogs; others have used blogrolls or trackbacks as from a Japanese weblog hosting service, Doblog. Four kinds of social networks are generated and analyzed: ci- evidence of relations. At least one study has surveyed com- tation, comment, trackback, and blogroll networks. In ment relations among bloggers (Lento et al. 2006). addition, the user log data are used to identify reader- Another important relation exists among bloggers: read- ship relations among bloggers. After analysis of more ership relations. Readership relations are not observable than 50,000 users for about two years, we reveal some through publicly available data, but they are an important interactions between social relations and readership re- source of information because bloggers read other blogs and lations. We first show that bloggers read other weblogs write their own blogs. The importance of readership rela- on a regular basis (50% of weblogs that are read at least tions is described in the literature (such as (Efimova, Hen- three times are read every five times a user logs in). We drick, & Anjewierden 2005; Nardi, Schiano, & Gumbrecht call this relation a regular reading relation (RR rela- 2004)). A recent study has analyzed the structural properties tion). Then, prediction of RR relations is done using features from the four kinds of social networks. Lastly, of weblog readership networks (Marlow 2006). information diffusion on RR relations is analyzed and Although those studies reveal various interesting findings characterized. Results of this study show that the blogs related to the social networks of weblogs, no comprehen- in RR relations have an important role in bloggers’ ac- sive study of them has examined all social and behavioral tivities. We find the features which have a correlation relations simultaneously. A comparison among different re- with RR relations. lations provides the general overview of each relation and its associated pattern of interaction. For this study, we an- alyze multiple social networks of weblogs: citation, com- Introduction ment, trackback, and blogroll networks. Subsequently, the Web logs (blogs, or weblogs) constitute a prominent social user log data (which include which blogs a user browses medium on the internet that enables users to publish indi- and when) are used to identify readership relations among vidual experiences and opinions easily. Analyzing these bloggers. Therefore, five types of social and behavioral net- data helps to understand the user’s behavioral pattern on the works are analyzed in this paper. We use the database of a blogosphere, and it supports the development of web ser- blog-hosting service in Japan called Doblog1. vices such as recommendation of blogs. Numerous studies Analyses of 1.5 million entries made by more than 50,000 have examined weblogs, especially addressing their social users for about two years reveal interesting interactions in- aspects: Bloggers read other blogs and leave comments and volving social relations and readerships. This paper de- send trackbacks as they update their own blogs. Users might scribes salient aspects of the following analysis. mention other blogs in their postings, and express their sug- • We first illustrate the four kinds of social networks and gested contacts in blogrolls (a sidebar within a particular characterize them. blog listing the other blogs the blogger frequents). These activities present an interwoven record of multiple relation- • Bloggers read some blogs on a regular basis. We can dis- ships among blogs (sometimes called a multiplex graph in cern these behaviors quantitatively: 50% of weblogs that sociology), which is an interesting source of information to are read at least three times are read every five times a characterize user behavior, community structure, and infor- user logs in. We call this relation a regular reading rela- mation diffusion. tion (RR relation). To date, various studies have specifically examined so- • Link prediction of RR relations is done using features cial network aspects of weblog authors (Marlow 2004; from the four kinds of social networks. Some attributes Copyright c 2007, Association for the Advancement of Artificial 1Doblog (http://www.doblog.com/), provided by NTT Intelligence (www.aaai.org). All rights reserved. Data Corp. and Hotto Link, Inc. 1353 (such as graph distance) and some networks (blogrolls Kraut 2002). Computer-mediated communication (in par- and citations) are demonstrably useful for predicting RR ticular, e-mail) is less valuable for building and sustaining relations. close social relationships than face-to-face contact and tele- • Information diffusion through RR relations is analyzed phone conversations. R. Kumar et al. investigates profiles and characterized. Some information is likely to be of more than one million livejournal.com bloggers in 2004, conveyed through RR relations. Generally, information and analyzes users’ demographic and geographic character- propagates in a shorter time and with higher probability istics (Kumar et al. 2004). More recently, Ali-Hasan and through RR relations than through non-RR relations. L. Adamic find interesting characteristics of bloggers’ on- line and real-life relationships (ALi-Hasan & Adamic 2007). Our findings provide an overview of social relations and They investigate three blog communities using an online reading behavior. These results support those of existing survey, which reveals that few blogging interactions reflect studies of social network analyses of the blogosphere. close offline relationships; furthermore, many online rela- This paper is organized as follows: first, we describe re- tionships were formed through blogging. lated studies. Next, we explains the definition of social Nardi et al. conducted audiotaped ethnographic inter- relations among weblogs, and also define the RR relation views with 23 bloggers, with analysis of their blog posts and characterize it with social relations. We then produce a (Nardi, Schiano, & Gumbrecht 2004). The motivations model to infer the existence of RR relations as a link predic- of blogging are enumerated. They provide good insights tion problem, and analyze information diffusion through RR that support the background of our research: bloggers write relations and show the effect of RR relations. Finally, after a blogs to (1) update others on activities and whereabouts, (2) discussion of analytical limitations, we conclude the paper. express opinions to influence others, (3) seek others’ opin- ions and feedback, (4) “think by writing”, and (5) release Related Works emotional tension. Nardi et al. remark that Many studies have specifically undertaken analysis of the Our research leads us to speculate that blogging is as blogosphere as a social medium: trend detection, network much about reading as writing, and as much about lis- analysis, user profiling, and splog (spam blog) detection. We tening as talking. We specifically examined the pro- introduce several works that are closely related to ours. duction of blogs, but future research will address blog Several studies have analyzed social networks that exist readers and to assess the relations between blog writers in the blogosphere: L. Adamic and N. Glance study the link and blog readers precisely. patterns (citations and blogrolls) and discussion topics of po- In the following section, we investigate relations between litical bloggers (Adamic & Glance 2005). This study de- blog writers and readers from a social network perspective. tected differences in the behaviors of politically liberal and conservative blogs, with conservative blogs linking to each Four Social Networks among Weblogs other more frequently. Lento et al. conduct data analyses re- garding the Wallop system and compares users who remain In this section, we define four kinds of relations among blogs active to those who do not (Lento et al. 2006). Similarly and depict social networks defined by these relations. to our work, the use of data from the hosting service enables We consider four types of relations between two blogs: them to detect social ties such as comment relations and invi- Citation We define that there is a citation relation from A tation relations, which are usually impossible to obtain. G. to B if an entry of blog A includes a hyperlink to blog B. Mishne and N. Glance analyze blog comments (Mishne & Blogroll We assert a blogroll relation from A to B if a Glance 2006). Those studies extract some relations among blogroll (a list of weblogs in the front page) of A includes blogs and produce a social network for analysis. blog B. Social networks are used for several applications such as Comment A comment relation from A to B pertains if the blog/entry ranking and community detection: E. Adar et al. blogger of blog A comments on blog B. proposes a ranking algorithm called iRank, which is based on implicit routes of information transmission as well as Trackback A trackback relation from A to B exists if an explicit links (Adar et al. 2004). For community detec- entry of blog B contains a back-reference by the trackback tion, Y. Lin et al. seek interesting aspects of social relations function to blog A. (Lin et al. 2006). They develop a computational model for These are mentioned in (Marlow 2004) and other literature. mutual awareness that incorporates specific action types in- We call these four relations social relations because the re- cluding commenting and changing blogrolls. The mutual lations are publicly observable and therefore involve some awareness feature is used for community extraction.
Recommended publications
  • Uncovering Social Network Sybils in the Wild
    Uncovering Social Network Sybils in the Wild ZHI YANG, Peking University 2 CHRISTO WILSON, University of California, Santa Barbara XIAO WANG, Peking University TINGTING GAO,RenrenInc. BEN Y. ZHAO, University of California, Santa Barbara YAFEI DAI, Peking University Sybil accounts are fake identities created to unfairly increase the power or resources of a single malicious user. Researchers have long known about the existence of Sybil accounts in online communities such as file-sharing systems, but they have not been able to perform large-scale measurements to detect them or measure their activities. In this article, we describe our efforts to detect, characterize, and understand Sybil account activity in the Renren Online Social Network (OSN). We use ground truth provided by Renren Inc. to build measurement-based Sybil detectors and deploy them on Renren to detect more than 100,000 Sybil accounts. Using our full dataset of 650,000 Sybils, we examine several aspects of Sybil behavior. First, we study their link creation behavior and find that contrary to prior conjecture, Sybils in OSNs do not form tight-knit communities. Next, we examine the fine-grained behaviors of Sybils on Renren using clickstream data. Third, we investigate behind-the-scenes collusion between large groups of Sybils. Our results reveal that Sybils with no explicit social ties still act in concert to launch attacks. Finally, we investigate enhanced techniques to identify stealthy Sybils. In summary, our study advances the understanding of Sybil behavior on OSNs and shows that Sybils can effectively avoid existing community-based Sybil detectors. We hope that our results will foster new research on Sybil detection that is based on novel types of Sybil features.
    [Show full text]
  • Svms for the Blogosphere: Blog Identification and Splog Detection
    SVMs for the Blogosphere: Blog Identification and Splog Detection Pranam Kolari∗, Tim Finin and Anupam Joshi University of Maryland, Baltimore County Baltimore MD {kolari1, finin, joshi}@umbc.edu Abstract Most blog search engines identify blogs and index con- tent based on update pings received from ping servers1 or Weblogs, or blogs have become an important new way directly from blogs, or through crawling blog directories to publish information, engage in discussions and form communities. The increasing popularity of blogs has and blog hosting services. To increase their coverage, blog given rise to search and analysis engines focusing on the search engines continue to crawl the Web to discover, iden- “blogosphere”. A key requirement of such systems is to tify and index blogs. This enables staying ahead of compe- identify blogs as they crawl the Web. While this ensures tition in a domain where “size does matter”. Even if a web that only blogs are indexed, blog search engines are also crawl is inessential for blog search engines, it is still pos- often overwhelmed by spam blogs (splogs). Splogs not sible that processed update pings are from non-blogs. This only incur computational overheads but also reduce user requires that the source of the pings need to be verified as a satisfaction. In this paper we first describe experimental blog prior to indexing content.2 results of blog identification using Support Vector Ma- In the first part of this paper we address blog identifica- chines (SVM). We compare results of using different feature sets and introduce new features for blog iden- tion by experimenting with different feature sets.
    [Show full text]
  • By Nilesh Bansal a Thesis Submitted in Conformity with the Requirements
    ONLINE ANALYSIS OF HIGH-VOLUME SOCIAL TEXT STEAMS by Nilesh Bansal A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of Computer Science University of Toronto ⃝c Copyright 2013 by Nilesh Bansal Abstract Online Analysis of High-Volume Social Text Steams Nilesh Bansal Doctor of Philosophy Graduate Department of Computer Science University of Toronto 2013 Social media is one of the most disruptive developments of the past decade. The impact of this information revolution has been fundamental on our society. Information dissemination has never been cheaper and users are increasingly connected with each other. The line between content producers and consumers is blurred, leaving us with abundance of data produced in real-time by users around the world on multitude of topics. In this thesis we study techniques to aid an analyst in uncovering insights from this new media form which is modeled as a high volume social text stream. The aim is to develop practical algorithms with focus on the ability to scale, amenability to reliable operation, usability, and ease of implementation. Our work lies at the intersection of building large scale real world systems and developing theoretical foundation to support the same. We identify three key predicates to enable online methods for analysis of social data, namely : • Persistent Chatter Discovery to explore topics discussed over a period of time, • Cross-referencing Media Sources to initiate analysis using a document as the query, and • Contributor Understanding to create aggregate expertise and topic summaries of authors contributing online. The thesis defines each of the predicates in detail and covers proposed techniques, their practical applicability, and detailed experimental results to establish accuracy and scalability for each of the three predicates.
    [Show full text]
  • Blogosphere: Research Issues, Tools, and Applications
    Blogosphere: Research Issues, Tools, and Applications Nitin Agarwal Huan Liu Computer Science and Engineering Department Arizona State University Tempe, AZ 85287 fNitin.Agarwal.2, [email protected] ABSTRACT ging. Acknowledging this fact, Times has named \You" as the person of the year 2006. This has created a consider- Weblogs, or Blogs, have facilitated people to express their able shift in the way information is assimilated by the indi- thoughts, voice their opinions, and share their experiences viduals. This paradigm shift can be attributed to the low and ideas. Individuals experience a sense of community, a barrier to publication and open standards of content genera- feeling of belonging, a bonding that members matter to one tion services like blogs, wikis, collaborative annotation, etc. another and their niche needs will be met through online These services have allowed the mass to contribute and edit interactions. Its open standards and low barrier to publi- articles publicly. Giving access to the mass to contribute cation have transformed information consumers to produc- or edit has also increased collaboration among the people ers. This has created a plethora of open-source intelligence, unlike previously where there was no collaboration as the or \collective wisdom" that acts as the storehouse of over- access to the content was limited to a chosen few. Increased whelming amounts of knowledge about the members, their collaboration has developed collective wisdom on the Inter- environment and the symbiosis between them. Nonetheless, net. \We the media" [21], is a phenomenon named by Dan vast amounts of this knowledge still remain to be discovered Gillmor: a world in which \the former audience", not a few and exploited in its suitable way.
    [Show full text]
  • Web Spam Taxonomy
    Web Spam Taxonomy Zolt´an Gy¨ongyi Hector Garcia-Molina Computer Science Department Computer Science Department Stanford University Stanford University [email protected] [email protected] Abstract techniques, but as far as we know, they still lack a fully effective set of tools for combating it. We believe Web spamming refers to actions intended to mislead that the first step in combating spam is understanding search engines into ranking some pages higher than it, that is, analyzing the techniques the spammers use they deserve. Recently, the amount of web spam has in- to mislead search engines. A proper understanding of creased dramatically, leading to a degradation of search spamming can then guide the development of appro- results. This paper presents a comprehensive taxon- priate countermeasures. omy of current spamming techniques, which we believe To that end, in this paper we organize web spam- can help in developing appropriate countermeasures. ming techniques into a taxonomy that can provide a framework for combating spam. We also provide an overview of published statistics about web spam to un- 1 Introduction derline the magnitude of the problem. There have been brief discussions of spam in the sci- As more and more people rely on the wealth of informa- entific literature [3, 6, 12]. One can also find details for tion available online, increased exposure on the World several specific techniques on the Web itself (e.g., [11]). Wide Web may yield significant financial gains for in- Nevertheless, we believe that this paper offers the first dividuals or organizations. Most frequently, search en- comprehensive taxonomy of all important spamming gines are the entryways to the Web; that is why some techniques known to date.
    [Show full text]
  • Robust Detection of Comment Spam Using Entropy Rate
    Robust Detection of Comment Spam Using Entropy Rate Alex Kantchelian Justin Ma Ling Huang UC Berkeley UC Berkeley Intel Labs [email protected] [email protected] [email protected] Sadia Afroz Anthony D. Joseph J. D. Tygar Drexel University, Philadelphia UC Berkeley UC Berkeley [email protected] [email protected] [email protected] ABSTRACT Keywords In this work, we design a method for blog comment spam detec- Spam filtering, Comment spam, Content complexity, Noisy label, tion using the assumption that spam is any kind of uninformative Logistic regression content. To measure the “informativeness” of a set of blog com- ments, we construct a language and tokenization independent met- 1. INTRODUCTION ric which we call content complexity, providing a normalized an- swer to the informal question “how much information does this Online social media have become indispensable, and a large part text contain?” We leverage this metric to create a small set of fea- of their success is that they are platforms for hosting user-generated tures well-adjusted to comment spam detection by computing the content. An important example of how users contribute value to a content complexity over groupings of messages sharing the same social media site is the inclusion of comment threads in online ar- author, the same sender IP, the same included links, etc. ticles of various kinds (news, personal blogs, etc). Through com- We evaluate our method against an exact set of tens of millions ments, users have an opportunity to share their insights with one of comments collected over a four months period and containing another.
    [Show full text]
  • Spam in Blogs and Social Media
    ȱȱȱȱ ȱ Pranam Kolari, Tim Finin Akshay Java, Anupam Joshi March 25, 2007 ȱ • Spam on the Internet –Variants – Social Media Spam • Reason behind Spam in Blogs • Detecting Spam Blogs • Trends and Issues • How can you help? • Conclusions Pranam Kolari is a UMBC PhD Tim Finin is a UMBC Professor student. His dissertation is on with over 30 years of experience spam blog detection, with tools in the applying AI to information developed in use both by academia and systems, intelligent interfaces and industry. He has active research interest robotics. Current interests include social in internal corporate blogs, the Semantic media, the Semantic Web and multi- Web and blog analytics. agent systems. Akshay Java is a UMBC PhD student. Anupam Joshi is a UMBC Pro- His dissertation is on identify- fessor with research interests in ing influence and opinions in the broad area of networked social media. His research interests computing and intelligent systems. He include blog analytics, information currently serves on the editorial board of retrieval, natural language processing the International Journal of the Semantic and the Semantic Web. Web and Information. Ƿ Ȭȱ • Early form seen around 1992 with MAKE MONEY FAST • 80-85% of all e-mail traffic is spam • In numbers 2005 - (June) 30 billion per day 2006 - (June) 55 billion per day 2006 - (December) 85 billion per day 2007 - (February) 90 billion per day Sources: IronPort, Wikipedia http://www.ironport.com/company/ironport_pr_2006-06-28.html ȱȱǵ • “Unsolicited usually commercial e-mail sent to a large
    [Show full text]
  • Advances in Online Learning-Based Spam Filtering
    ADVANCES IN ONLINE LEARNING-BASED SPAM FILTERING A dissertation submitted by D. Sculley, M.Ed., M.S. In partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science TUFTS UNIVERSITY August 2008 ADVISER: Carla E. Brodley Acknowledgments I would like to take this opportunity to thank my advisor Carla Brodley for her patient guidance, my parents David and Paula Sculley for their support and en- couragement, and my bride Jessica Evans for making everything worth doing. I gratefully acknowledge Rediff.com for funding the writing of this disserta- tion. D. Sculley TUFTS UNIVERSITY August 2008 ii Abstract The low cost of digital communication has given rise to the problem of email spam, which is unwanted, harmful, or abusive electronic content. In this thesis, we present several advances in the application of online machine learning methods for auto- matically filtering spam. We detail a sliding-window variant of Support Vector Machines that yields state of the art results for the standard online filtering task. We explore a variety of feature representations for spam data. We reduce human labeling cost through the use of efficient online active learning variants. We give practical solutions to the one-sided feedback scenario, in which users only give la- beling feedback on messages predicted to be non-spam. We investigate the impact of class label noise on machine learning-based spam filters, showing that previous benchmark evaluations rewarded filters prone to overfitting in real-world settings and proposing several modifications for combating these negative effects. Finally, we investigate the performance of these filtering methods on the more challenging task of abuse filtering in blog comments.
    [Show full text]
  • 2 Uncovering Social Network Sybils in the Wild
    Uncovering Social Network Sybils in the Wild ZHI YANG, Peking University 2 CHRISTO WILSON,UniversityofCalifornia,SantaBarbara XIAO WANG, Peking University TINGTING GAO,RenrenInc. BEN Y. ZHAO,UniversityofCalifornia,SantaBarbara YAFEI DAI , Peking University Sybil accounts are fake identities created to unfairly increase the power or resources of a single malicious user. Researchers have long known about the existence of Sybil accounts in online communities such as file-sharing systems, but they have not been able to perform large-scale measurements to detect them or measure their activities. In this article, we describe our efforts to detect, characterize, and understand Sybil account activity in the Renren Online Social Network (OSN). We use ground truth provided by Renren Inc. to build measurement-based Sybil detectors and deploy them on Renren to detect more than 100,000 Sybil accounts. Using our full dataset of 650,000 Sybils, we examine several aspects of Sybil behavior. First, we study their link creation behavior and find that contrary to prior conjecture, Sybils in OSNs do not form tight-knit communities. Next, we examine the fine-grained behaviors of Sybils on Renren using clickstream data. Third, we investigate behind-the-scenes collusion between large groups of Sybils. Our results reveal that Sybils with no explicit social ties still act in concert to launch attacks. Finally, we investigate enhanced techniques to identify stealthy Sybils. In summary, our study advances the understanding of Sybil behavior on OSNs and shows that Sybils can effectively avoid existing community-based Sybil detectors. We hope that our results will foster new research on Sybil detection that is based on novel types of Sybil features.
    [Show full text]
  • The SAGE Handbook of Web History by Niels Brügger, Ian
    38 Spam Finn Brunton THE COUNTERHISTORY OF THE WEB what the rules are, and who’s in charge. And the first conversation, over and over again: This chapter builds on a larger argument what exactly is ‘spam?’. Briefly looking at about the history of the Internet, and makes how this question got answered will bring us the case that this argument has something to the Web and what made it different. useful to say about the Web; and, likewise, Before the Web, before the formalization that the Web has something useful to say of the Internet, before Minitel and Prestel about the argument, expressing an aspect of and America Online, there were graduate stu- what is distinctive about the Web as a tech- dents in basements, typing on terminals that nology. The larger argument is this: spam connected to remote machines somewhere provides another history of the Internet, a out in the night (the night because comput- shadow history. In fact, following the history ers, of course, were for big, expensive, labor- of ‘spam’, in all its different meanings and intensive projects during the day – if you, a across different networks and platforms student, could get an account for access at all (ARPANET and Usenet, the Internet, email, it was probably for the 3 a.m. slot). Students the Web, user-generated content, comments, wrote programs, created games, traded mes- search engines, texting, and so on), lets us sages, and played pranks and tricks on each tell the history of the Internet itself entirely other. Being nerds of the sort that would stay through what its architects and inhabitants up overnight to get a few hours of computer sought to exclude.
    [Show full text]
  • Social Software: Fun and Games, Or Business Tools?
    Social software: fun and games, or business tools? Wendy A. Warr Wendy Warr & Associates Abstract. This is the era of social networking, collective intelligence, participation, collaborative creation, and bor- derless distribution. Every day we are bombarded with more publicity about collaborative environments, news feeds, blogs, wikis, podcasting, webcasting, folksonomies, social bookmarking, social citations, collab- orative filtering, recommender systems, media sharing, massive multiplayer online games, virtual worlds, and mash-ups. This sort of anarchic environment appeals to the digital natives, but which of these so-called ‘Web 2.0’ technologies are going to have a real business impact? This paper addresses the impact that issues such as quality control, security, privacy and bandwidth may have on the implementation of social net- working in hide-bound, large organizations. Keywords: blogs; digital natives; folksonomies; internet; podcasts; second life; social bookmarking; social networking; social software; virtual worlds; Web 2.0; wikis 1. Introduction1 Fifty years ago information was stored on punch cards. SDI services appeared about 10 years later and databases were available online from about 1978. In 1988 PCs were in common use and by 1998 the web was being used as a business tool. The web of the 1990s might be thought of as ‘Web 1.0’, for now in 2008 there is much hype about Web 2.0, but what does that mean? Web 2.0 is an umbrella term for a number of new internet services that are not necessarily closely related. Indeed, some people feel that Web 2.0 is not a valid overall title for these technologies. A reductionist view is that of a read–write web and lots of people using it.
    [Show full text]
  • D2.4 Weblog Spider Prototype and Associated Methodology
    SEVENTH FRAMEWORK PROGRAMME FP7-ICT-2009-6 BlogForever Grant agreement no.: 269963 BlogForever: D2.4 Weblog spider prototype and associated methodology Editor: M. Rynning Revision: First version Dissemination Level: Public Author(s): M. Rynning, V. Banos, K. Stepanyan, M. Joy, M. Gulliksen Due date of deliverable: 30th November 2011 Actual submission date: 30th November 2011 Start date of project: 01 March 2011 Duration: 30 months Lead Beneficiary name: CyberWatcher Abstract: This document presents potential solutions and technologies for monitoring, capturing and extracting data from weblogs. Additionally, the selected weblog spider prototype and associated methodologies are analysed. D2.2 Report: Weblog Spider Prototype and Associated Methodology 30 November 2011 Project co-funded by the European Commission within the Seventh Framework Programme (2007-2013) The BlogForever Consortium consists of: Aristotle University of Thessaloniki (AUTH) Greece European Organization for Nuclear Research (CERN) Switzerland University of Glasgow (UG) UK The University of Warwick (UW) UK University of London (UL) UK Technische Universitat Berlin (TUB) Germany Cyberwatcher Norway SRDC Yazilim Arastrirma ve Gelistrirme ve Danismanlik Ticaret Limited Sirketi (SRDC) Turkey Tero Ltd (Tero) Greece Mokono GMBH Germany Phaistos SA (Phaistos) Greece Altec Software Development S.A. (Altec) Greece BlogForever Consortium Page 2 of 71 D2.2 Report: Weblog Spider Prototype and Associated Methodology 30 November 2011 History Version Date Modification reason Modified
    [Show full text]