
MASARYKOVA UNIVERZITA, FAKULTA INFORMATIKY

Blogosphere Player

MASTER’S THESIS

Martin Damek

Brno, Spring 2012

Declaration

I hereby declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during the elaboration of this work are properly cited and listed with complete reference to the due source.

Advisor: doc. RNDr. Tomáš Pitner, Ph.D. & Michael Derntl, Ph.D.

Acknowledgement

It is a pleasure to thank those who made this thesis possible, primarily both my supervisors – Michael Derntl and Tomáš Pitner, whose guidance and support were very helpful. Furthermore, I would like to offer my regards to all of those who supported me in any respect during the completion of the thesis.

Abstract

Blogs are a very popular and widespread type of website that can be found on the World Wide Web. Along with other services of Web 2.0, they provide easy-to-use means of content publishing for any user of the Internet. The thesis begins by introducing blogs, what benefits they offer, and how they have contributed to the information revolution we are currently experiencing. Moreover, there are many attempts to utilize blogs in education, which is also discussed in one of the chapters. For teachers, it is important to understand how the blogosphere (blog entries, comments, connections between learners) evolves over time. A convenient application for blogosphere visualization might be very helpful. The thesis describes how to develop such an application that parses server logs (if available) and RSS/Atom feeds, extracts temporal information, and visualizes the network of visits, links between blogs, postings, and comments within a given set of bloggers.

Keywords

blogosphere player, blogs, blogs in education, information visualization, blogosphere visualization, blogosphere analysis

Contents

1 Introduction
2 The Path to the Information Revolution
  2.1 The Rise of the Internet
  2.2 The World Wide Web
  2.3 Web 2.0
3 The Blogging Phenomenon
  3.1 Origins
  3.2 Key Elements of Blogs
  3.3 Usage and Popularity of Blogs
4 Blogs in Education
  4.1 Change in Learning Process
  4.2 Introducing Blogs to Education
5 Requirements Analysis
  5.1 Requirements
  5.2 Use Case Diagram
  5.3 Web Feeds
    5.3.1 RSS
    5.3.2 Atom
  5.4 LMS Server Logs
  5.5 Selection of Technologies
    5.5.1 Java
    5.5.2 Prefuse
6 Design
  6.1 Application Architecture
  6.2 Blogosphere Model
  6.3 Players and Displays
  6.4 Input and Output
  6.5 Graphical User Interface
7 Implementation
  7.1 Setting Up a Visualization Display
  7.2 Graph View
  7.3 Timeline View
  7.4 Blog View and Cloud
  7.5 Filters
8 Application Usage Scenarios
  8.1 Visualization with Server Logs
  8.2 Visualization without Server Logs
9 Conclusion
Bibliography
A Class Diagrams

Chapter 1 Introduction

Blogging has been an ongoing phenomenon of the Internet for the past several years. Today, with state-of-the-art software, keeping a blog has never been easier. Virtually anyone can easily establish their own blog and publish all sorts of information – ideas, thoughts, or even photos and video – with potentially thousands of readers.

Moreover, the large potential of blogs for sharing information and knowledge, communication, collaboration, and reflection can be utilized in education. However, the results of reported attempts are rather ambiguous. Therefore, it is very important to evaluate how the blogosphere – the set of students’ blogs – evolves over time, because a proper understanding of bloggers’ behavior could provide valuable answers: what went well and what did not, what to do better, what mistakes to avoid next time, or perhaps whether to stop using blogs if the results are not satisfactory at all. The right tool for blogosphere visualization might be very useful in providing these answers.

The goal of the thesis is to develop an application that parses server logs and RSS/Atom feeds, extracts temporal information, and visualizes the network of visits, postings, and comments within a given set of bloggers. The visualization should represent one spot on the timeline; however, it should be possible to animate the visualization using a ‘time slider’, or by playing the animation at a desired speed, rewinding, forwarding, pausing, etc.

The following three chapters cover the theoretical background. The first one describes how information and communication technologies, the Internet, and the World Wide Web have caused the information revolution – not only in terms of accessing information, but also in terms of publishing information and giving a voice to any individual with access to the Internet. One easy way to publish articles, photos, and video is to use a blog. That is the topic of the next chapter, which describes what a blog is, how it works, what its key elements are, and how popular blogs are. The chapter after that takes a closer look at introducing blogs into education and reveals what the potential benefits are.

The next four chapters describe how the application – Blogosphere Player – was created. One chapter focuses on requirements and analysis, the next one presents the design of the application, another describes the implementation phase, and the last one presents basic scenarios of how to use the application. The very last chapter of the thesis summarizes the achievements and verifies whether all defined goals have been reached and all requirements have been satisfied.

Chapter 2 The Path to the Information Revolution

Modern technologies, especially information and communications technologies, along with the Internet as their most significant representative, influence and change our lives more than we sometimes realize. They have changed the way we live, work, find information, buy things, and spend our free time. We can communicate with people on the other side of the planet, we can share photos and videos with millions of people, and huge amounts of information are literally a few clicks away.

2.1 The Rise of the Internet

The impact of the Internet, and of the World Wide Web as one of its best-known services, is often compared to the invention of Gutenberg’s printing press in the mid-fifteenth century. The printing press enabled the mass production and spread of printed books, and therefore the spread of knowledge and information to the general public. Before its invention, books had to be laboriously copied by hand, and so they were a very expensive commodity available only to the wealthiest people or to certain institutions, e.g. medieval universities and monasteries. Written knowledge was practically inaccessible to common people. The increase of learning and literacy amongst the middle class in late medieval times led to an increased demand for books, but the process of hand-copying was simply too time-consuming. This high demand was the perfect stimulus for inventing a new way to create books. Eventually, it changed the whole of society in the times to come. According to Eisenstein (1999, p. 3):

“Acknowledgement of the printing press revolution is somehow low and inauspicious among the historians, and yet sociologists recognize cataclysmic effect on society of inventions of new media for the transmission of information among persons. The development of writing and later the development of printing, are examples.”

Today, due to the expansion of the Internet, the proposition about the large impact of new media on society is verified once more. Information and knowledge are even more available and quickly accessible to practically anybody living in modern society. The American writer and futurist Alvin Toffler discusses this digital revolution in his works (Toffler, 1990) and compares its significance to that of the industrial revolution in the nineteenth century.

Information technology has enabled us to create and save documents in electronic form. Electronic documents can be easily distributed over the Internet, where distances play virtually no role. Such documents can be copied many times, and the financial costs are much lower, sometimes essentially zero, in comparison with the printing, distribution, and storage of physical documents. If the printing press enabled the liberalization and democratization of access to printed information, then the Internet did the same thing with digital information. But on top of that, it also enabled the liberalization and democratization of publishing information.

Nevertheless, at the beginning, the impact of computers and computer networks on the general public was not so significant. Computers were very expensive and huge machines occupying entire rooms, and thus they were usually owned only by larger companies and by institutions such as governments and universities – an interesting analogy with the period before the invention of the printing press. It was only the further development of these and other related technologies that provided the foundations for the information revolution.

The expansion of relatively cheap personal computers and their connection to the global Internet was a necessary step towards the revolution. In 2011, the International Telecommunication Union published statistical data (ITU, 2011) that describe the current state of usage and availability of the Internet. According to the data, of 1.8 billion households worldwide, one third have Internet access, compared to only one fifth in 2006. In terms of individuals, the number of Internet users reached 35% of the entire population of the planet in 2011, while it was only 18% in 2006. The percentage of Internet users in the population is, of course, greater in more developed countries, but it is quite surprising that in absolute numbers there are more Internet users in developing countries. In 2006 it was quite the opposite.


Access to the Internet is no longer a privilege of highly developed countries, and thus, with a higher percentage of Internet users, we can expect further and more significant changes in many areas of human life all over the world.

2.2 The World Wide Web

The next step towards the information revolution started with the concept of the World Wide Web, proposed by Tim Berners-Lee in the early nineties. It changed the Internet once and for all. Until then, electronic documents on the Internet were isolated islands. They were stored in directories and were usually very hard to find. Of course, there were also lists pointing to those directories and to the documents themselves, and search programs like Gopher were able to search through large and organized collections of files, but it was all only text-based (Vaughan, 2011).

Connecting isolated documents into a large network was a very simple but powerful idea. The combination of text and links created hypertext, the cornerstone of the entire World Wide Web. Three fundamental technologies were created, as Tim Berners-Lee describes in his book Weaving the Web (1999):

• unique identifiers of documents on the Web (UDI – Universal Document Identifier, later transformed to URI – Uniform Resource Identifier and URL – Uniform Resource Locator);

• a markup language for creating hypertext documents (HTML – Hypertext Markup Language);

• a protocol for transferring documents over the Internet (HTTP – Hypertext Transfer Protocol).

With these technologies, cyberspace was no longer an environment of isolated fragments; it was transformed into a system where nodes of information are interconnected with one another, and they literally create a web of interlinked hypertext documents.

This solution is very important because it provides better accessibility of information on the Internet. Instead of relying on catalogs, search engines can crawl the Web and index every web page they come across. The only requirement is a hyperlink pointing to the web page (or information resource) from another page on the Web.

A web page with many incoming hyperlinks, also called inbound links or backlinks, will probably contain interesting or valuable content, because many people have decided that it is worth linking to. Therefore, the number of incoming links may serve as a valuable attribute for estimating page popularity or relevancy. In fact, Google’s algorithm, called PageRank (PageRank, 2012), takes the number of backlinks into consideration as a criterion for determining the position of web pages in search results. More backlinks usually mean a better position in search results, and thus more traffic to the web page.

The idea of backlinks, and of linked web pages in general, has interesting consequences. The quality of content is not decided by critics, reviewers, or specialized institutions; it is determined by the popularity of a web page, based on the number of its backlinks. And these backlinks are in most cases established by all the other creators of web content. In other words, Internet users decide what is important.
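The idea of estimating popularity from incoming links can be illustrated with a minimal sketch. The example below only counts backlinks in a small, hypothetical link graph and orders the pages by that count; it is not Google’s PageRank, which additionally weights each link by the rank of the linking page. The page names and the function names are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "b.example"],
}


def count_backlinks(link_graph):
    """Return a Counter mapping each page to its number of incoming links."""
    # Start every known page at zero so pages without backlinks are listed too.
    backlinks = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):   # count each linking page only once
            if target != source:      # ignore links from a page to itself
                backlinks[target] += 1
    return backlinks


def rank_by_backlinks(link_graph):
    """Return pages sorted by backlink count, most linked first (ties by name)."""
    counts = count_backlinks(link_graph)
    return sorted(counts, key=lambda page: (-counts[page], page))


ranking = rank_by_backlinks(links)
```

In this toy graph, the page linked to by the most other pages comes first in `ranking`, which mirrors the observation that Internet users collectively decide what is important.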

2.3 Web 2.0

The World Wide Web has provided new, fast, and cheap ways to electronically publish documents, articles, messages, or any other type of information. It has also provided similarly convenient ways to access this information. Suddenly, we have an incredibly large amount of information within our reach, and we have been literally flooded by gigabytes of data. However, for a long time most of us were more or less only a passive audience. The reason for this behavior is quite simple: in the early days of the Web, publishing something online was much more difficult than it is today.

Let’s take the example of creating web pages. In addition to knowledge of the HTML language itself for creating HTML documents, there are many technologies that a web designer should master. He might use Cascading Style Sheets (CSS) for describing presentation semantics, programming and scripting languages both server-side (e.g. PHP or ASP) and client-side (JavaScript), or SQL database systems for storing data. If he wants to give his web pages a nice look, then knowing how to use a graphics editor might come in handy. Of course, he can also use a WYSIWYG editor for creating web pages, which can make the work a little easier, but it is still quite difficult.

In the earlier times of the Web, there were many obstacles that could discourage people from publishing something online. Today, it is a completely different story. With so-called Web 2.0 applications and services, online content publishing has become much easier, and practically anyone with access to the Internet can do it. Usually, no special knowledge of programming, scripting, coding HTML, or low-level technologies in general is required.

One of the fundamental ideas of Web 2.0 is harnessing collective intelligence through network applications. Tim O’Reilly (2009) – the man with whom the term Web 2.0 is closely associated – calls it “crowdsourcing” and asserts that “a large group of people can create a collective work whose value far exceeds that provided by any of the individual participants.”

Those network applications usually provide a very simple and easy-to-use graphical interface, which facilitates the process of sending user-generated content to the Web. Web pages have changed from static documents to an interactive medium, where users are becoming active participants in online content creation. They can collaborate on shared knowledge sites such as Wikipedia, post messages to discussion boards, show photos to others thanks to photo sharing sites like Flickr or Picasa, or even post video on YouTube, Dailymotion, or other similar websites. Moreover, they can use social network sites like Facebook, Google+, or Twitter to share their thoughts and other personal information with their friends and other people.

When using the websites mentioned above, users can create, publish, and share information with others, but their contributions become only a part of a larger whole, and they possess control only over their own content. Moreover, the options to change a website’s appearance and personalize it to better reflect their individuality and needs are limited.

On the one hand, the centralization of content, producers, and consumers on a single website may bring many advantages, but on the other hand, many users may find it insufficient in comparison with owning their own website. Actually, it is quite similar to the difference between the perception of private and public property. Many large websites, where thousands of users can contribute and express themselves, can be perceived as public space, even if, in fact, they are owned by private owners. And if someone wants to have a space owned just by himself, then having a blog, another popular phenomenon of Web 2.0, might be a perfect solution.

Chapter 3 The Blogging Phenomenon

A blog (sometimes also weblog) is a specific type of website with the characteristics of a journal or personal diary, located on the World Wide Web. It contains discrete entries, also called posts, which are usually organized in reverse chronological order. This way, the most recent entry is placed at the top. Visitors may usually express their opinions or thoughts by posting comments on blog entries.

In most cases, a blog is maintained by a single individual, but it is also possible to have more contributors. Users who write blogs are called bloggers, and all blogs together form the blogosphere.

Some authors consider as blogs only those websites that have a certain type of software, called blog software, running in the background (Hill, 2006, pp. 10–11). Hill also notes that blog software has facilitated the process of adding new content to a website, because many blogging programs offer ready-to-use services that cut out the laborious and technical traditional process of building a website and adding new pages to it.

However, in more general terms we can also consider as blogs websites without this type of software. In fact, they can even be built from scratch, and bloggers can use the traditional way of writing pages in an HTML editor and publishing them via an FTP client. Although Technorati – an Internet search engine for searching and indexing blogs – states in its annual report that Google’s Blogger and WordPress are the most popular blog sites, it also takes other types of blogs into consideration, even blogs built from scratch (Technorati, 2011). Hence, recognizing a blog is more about its content and how it is organized, about the website design, and about the functionality provided to readers.


3.1 Origins

The word ‘blog’ has its origin in 1997, when Jorn Barger called his site a ‘weblog’ to describe a collection of links logged from the Internet. In 1999, Peter Merholz jokingly broke the word into ‘we blog’ and announced that he was going to pronounce it ‘wee-blog’. Afterwards, it was inevitably shortened to simply ‘blog’ (Blood, 2000; Wortham, 2007). The new term started to be used both as a noun – a website – and as a verb – to post or update an entry on someone’s blog. Soon, another term, ‘blogger’, appeared to indicate a person who writes a blog.

Rebecca Blood (2000), the owner of one of the oldest weblogs, describes the beginnings of blogging:

“The original weblogs were link-driven sites. Each was a mixture in unique proportions of links, commentary, and personal thoughts and essays. Weblogs could only be created by people who already knew how to make a website. A weblog editor had either taught herself to code HTML for fun, or, after working all day creating commercial websites, spent several off-work hours every day surfing the web and posting to her site. These were web enthusiasts.”

Today, with sophisticated blogging software and new online publishing tools, writing a blog is not reserved only for technically proficient people; virtually anyone can do it, even kids. This simplicity is one of the major reasons why blogging has become so popular.

3.2 Key Elements of Blogs

In most cases, blogs contain several key elements that are more or less the same on every blog. Let’s take a closer look:

Home page: Also called the index page. It is the most important page of every blog because it is usually the first thing that a user sees when visiting a blog. At the top, there is the title of the blog, followed by several of the most recent entries, either whole or shortened. They are usually displayed in reverse chronological order. It is not necessary to show all entries, because the older ones can be accessed through links pointing to them. The home page usually contains several other components that can be added and modified by the blog owner. There is often a menu with links to the rest of the entries.

Entries: Also called posts. They represent the main content of every blog and are created by its owner. They look like short articles or messages, and every post has a title and the exact date and time when it was written. Entries are usually text-based, in fact hypertext, but they can also contain photos or video. With modern blogging software, the process of posting new entries is very similar to sending an e-mail or posting a message on a discussion board. All you need to do is open a compose screen, write an entry, and click the post button. The blogging software will do the rest. It will create a new page with the entry, establish links, and update the home page. Indeed, even if the entire entry is published on the index page, a stand-alone page with a unique URL – often called a permalink – is created for the entry itself. This way it is possible to share links to individual entries.

Comments: Although a blog is in the first place a tool for the self-expression of its owner, it is also possible for visitors to contribute their thoughts and ideas by posting comments. Comments are usually displayed under every blog post. Moreover, the blog owner can often decide to enable or disable the posting of comments. If comments are disabled, the blog becomes read-only for visitors. One way or another, comments are an essential part of the blogosphere.

Tags or Categories: Bloggers may assign sets of keywords to blog entries that help to identify, categorize, and eventually locate individual entries. All categories are often displayed in a sidebar and provide links to the entries to which they have been assigned, or they can form a tag cloud with similar functionality.

Blogroll: Basically, it is a list of links to other blogs that are favored by the blog owner. It is a nice feature that transforms a set of separate blogs into an interconnected network. It is especially favorable to have a link on another very popular blog because it can generate a lot of traffic.

Trackbacks: They are links pointing to a blog entry from other places on the Internet. This feature helps authors to keep track of who is linking to their articles, because they are usually notified when such a link is created.


Web Feeds: A web feed is an XML-based data format used for notifying users of new or updated content, for example when a new entry has been posted. The content distributor syndicates a web feed, which is linked and accessible from the blog’s web pages, and users can subscribe to it. It is a very useful feature because users do not have to check websites for new content; instead, they are informed of new posts through a feed aggregator. The two main web feed formats are RSS and Atom.
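To illustrate how a web feed carries entries together with their publication times, the following minimal sketch parses a small, hypothetical RSS 2.0 document using only the Python standard library and lists its entries from newest to oldest, the same order in which a blog home page usually shows them. The feed content, the URLs, and the function name `parse_rss_items` are assumptions made for illustration only; a real feed aggregator would also have to handle Atom feeds and missing or malformed dates.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 feed with two entries (illustrative data only).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Student Blog</title>
    <link>http://blog.example/</link>
    <description>Hypothetical feed used for illustration</description>
    <item>
      <title>First post</title>
      <link>http://blog.example/2012/01/first-post</link>
      <pubDate>Mon, 09 Jan 2012 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example/2012/02/second-post</link>
      <pubDate>Wed, 01 Feb 2012 18:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(rss_text):
    """Extract title, link, and publication time of every <item> in the feed."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.findall("./channel/item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),  # the entry's permalink
            # RSS 2.0 dates follow the RFC 822 format used in e-mail headers.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    # Newest entry first, i.e. reverse chronological order.
    items.sort(key=lambda entry: entry["published"], reverse=True)
    return items


entries = parse_rss_items(SAMPLE_RSS)
titles = [entry["title"] for entry in entries]
```

Each parsed entry keeps its permalink and a timestamp, which is the kind of temporal information a feed aggregator relies on to detect new posts.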

3.3 Usage and Popularity of Blogs

By the end of 2011, NM Incite (formerly known as BlogPulse) tracked over 181 million blogs around the world, up from 36 million only five years earlier in 2006 (NM Incite, 2012). According to NM Incite, “Three out of the top 10 social networking sites in the U.S. – Blogger, WordPress and Tumblr – are for consumer-generated blogs. Blogger is the largest of these sites, and Tumblr was the fastest-growing blog site by the end of 2011.”

Based on the given data, we can assume that blogging maintains its popularity, or that it is becoming even more popular than before; however, there are also opposing views suggesting that blogging is in decline because of social networking sites like Facebook, Twitter, or Google+ (Lenhart, 2010; Mims, 2011). These sites are mostly used by teenagers and young adults, and therefore the shift from blogs to social networking sites should be most significant amongst young people. According to the study, “In 2006, 28% of teens ages 12-17 and young adults ages 18-29 were bloggers, but by 2009 the numbers had dropped to 14% of teens and 15% of young adults. During the same period, the percentage of online adults over thirty who were bloggers rose from 7% blogging in 2006 to 11% in 2009” (Lenhart, 2010).

This behavior may be explained by comparing similar features of blogs and social networking sites. Maintaining a blog and posting entries on a regular basis may be a more challenging activity. In many cases, writing a blog is similar to writing a journal. For many people, it has been a tool for self-expression and for connecting with other people, often friends. However, the same functionality is provided by social networking sites, where it is somewhat easier and more convenient because it is more similar to the behavior, communication, and interaction between people in the real world.

This does not mean the end of blogging, but it has slightly changed the way blogs are used. Today, users can decide not to start a blog because Facebook or Google+ is just the right tool for their needs; on the other hand, someone may find social networking sites insufficient, and blogging can be a better choice.

There are many reasons for blogging. Technorati (2011) presents several major ones:

• To share expertise and experience with others.

• To speak one’s mind on areas of interest.

• To become more involved with passion areas.

• To meet and connect with like-minded people.

• To gain professional recognition.

• To advance a career.

• To make money or supplement income.

• To get published or featured in traditional media.

• To attract new clients for business.

• To keep friends and family updated on the blogger’s life.

Blogs have been here for some time, and surely they will be here in the near future. We encounter them practically all the time when browsing the Web because they are truly ubiquitous.

Chapter 4 Blogs in Education

The Internet and other modern technologies have become an important part of our lives, as described in the previous two chapters. Some of us have learned to exploit the vast range of advantages they provide, use them for our own benefit, and incorporate them into our everyday lives, while others – especially older generations – remain somewhat reluctant and do not accept new technologies so easily.

While we can argue about the implications of new technologies for our lives, and whether they are good or not, one thing remains clear – the new generation takes new technologies for granted, and young people can hardly imagine their lives without computers, cell phones, the Internet, and the Web. They are often called ‘digital natives’ (Palfrey, 2008, p. 1) because they have been living most of their lives, or even their whole lives, surrounded by digital technologies. Palfrey describes them in the book Born Digital as follows: “They were all born after 1980, when social digital technologies, such as Usenet and bulletin board systems, came online. They all have access to networked digital technologies. And they all have the skills to use those technologies.”

The education system should reflect this situation, but the question is how to do it and what technologies should be used in the education process, when to use them and, just as important, when not to use them. It is not easy to give the right answer. In many cases, it will essentially be a matter of trial and error.

4.1 Change in Learning Process

The most important aspect that should be recognized by schools and teachers is an overall change in the learning process of digital natives. The Internet has changed the way information is searched for, accessed, and processed. And, of course, it has subsequently changed the way young people learn, not only in school, but every time they dive into cyberspace.

The learning of digital natives is much different from the learning of people in the previous ‘analog’ world. Younger people may find traditional printed media obsolete because they are used to the interactive and multimedia world of the Internet. Visiting a library and searching for information in rooms full of books may seem too complicated compared to going online from their own living room. This does not mean that digital natives do not read books or newspapers, but they do it much less often – at least most of them.

The most significant characteristics of current trends may be summarized as follows (Palfrey, 2008, pp. 237–253):

• digital media over traditional media;

• Google searching instead of visiting library;

• potentially large amount of information sources easy to access;

• important and reliable information may be omitted due to many sources;

• belief in the trustworthiness and reliability of information without further verification;

• less book reading;

• not reading books from cover to cover, but only selected chapters or paragraphs;

• short formats over long formats (text, but audio and video as well);

• reading headlines only;

• scanning web pages instead of reading word by word;

• possible feedback through comments.

Some people – often teachers and parents – may find these patterns of behavior unpleasant and try to enforce the traditional ways they have been used to. However, this does not have to be the right solution. Young people are not learning worse or less effectively than previous generations. They have just adapted to the new possibilities.


First of all, there is an attempt to make things easier. Opening a web browser and finding the requested information within a few minutes is simply easy. This should not surprise us, because making things easier is only natural.

Second, due to the large amount of information, it is necessary to browse many sources quickly. It is not possible to read all the articles and pages full of text, and so people scan those pages until they find something really relevant. This behavior is subsequently transferred to printed media as well. On the other hand, if a relevant source of information is found, then it is quite common to read it thoroughly. This way, a deep insight into a selected topic may be reached if required.

And finally, getting feedback from other readers, as well as publishing one’s own ideas and opinions, may be very important. Actually, it is not only one-way feedback. In fact, it can be a feedback loop (Palfrey, 2008, p. 243) because the interaction may continue over several rounds. For instance, an online magazine or a newspaper publishes an article, which is read by a user. He finds the article interesting and decides to let others know his opinion. He can post a comment to the discussion under the article, post a message on his Facebook wall, or even write a blog entry. Other users can do the same thing, but they can also react to the posted comments or blog entries that were written as a reflection on the original article.

4.2 Introducing Blogs to Education

There are many technologies that have a potential to become regular tools for educating young people. Will Richardson (2007) presents the best choices:

• blogs

• wikis

• podcasting

• photo and video sharing

• web feeds

• social networks


The potential benefits of all the mentioned technologies are considerable, but since this thesis is about blogs, only the use of blogs is described in the following paragraphs. There are several ways to use blogs for educational purposes. Weller (2004) describes three primary purposes:

1. Group blogs: A community blog is established to which a group of students may contribute in specified subject areas. They can post new entries, and discussion arises around these, while the educator directs the students and chooses blogging topics.

2. Academics keeping blogs: Unlike traditional academic publishing, writing a blog may have several appealing characteristics for both authors and readers. A posted entry is published instantly, and the author can receive feedback without delay. Moreover, readers will certainly appreciate the blogger’s expertise as well as the free and easy access to information through an online blog.

3. Students using blogs: Blogs can be used to facilitate collaborative learning where students use blogs as a journal or portfolio, demonstrating their thoughts, reflections and discussions on the subject area. If students work on given assignments, they can also use blogs to reflect on the work itself, its progress, and the results obtained.

The integration of student blogs into education has been reported many times (Kim, 2008), although with inconsistent results in terms of the effectiveness of blog use in educational contexts. Some attempts were more or less successful (e.g. Chang & Farmer, 2007; Chen, 2008; Derntl, 2010), others were not (Divitini, 2005). Nevertheless, educators agree that blogs may bring several positive effects if used correctly:

• Knowledge and information do not remain only in classrooms, but can be accessed from any place in the world through the Internet.

• Other users – not only from academic institutions – can contribute their ideas and thoughts. They can correct wrong information.

• Students can compare their results and ideas with the work of others. Blogs enhance collaborative learning and reflective practice (Birney, 2006).

• Blogs may serve as convenient tools for communication and collaboration among students working on an assigned project.


• Posts and comments are automatically archived and categorized. A date and time is assigned to every post, so sorting and searching are very simple.

• It may be easier for some students to overcome psychological barriers when using blogs than when presenting their ideas or work results in a real-life classroom.

• Writing a blog requires certain skills that can be developed during blogging, for instance research, organization, and information management skills – skills in digital literacy in general.

On the other hand, there are also possible disadvantages. First, it appears difficult to promote – or even enforce – regular blogging. Many Internet users prefer reading over writing, so only a small number of bloggers are responsible for most posts and blogging activity (Technorati, 2010). According to Technorati’s index, a minority of bloggers are posting daily, or even weekly, and bloggers in the Top 100 generate 36 times more content than the average blogger. Similar behavior can also be observed in educational environments (Farmer, 2007; Derntl, 2010).

Second, although the majority of students work with digital technologies virtually every day, their digital skills are not necessarily on the same level, and students with limited access to the Internet may not be so enthusiastic about using blogs.

Finally, there is also the threat of cyberbullying, especially at lower levels of the educational system. Curran (2011) states that “Blogs allow pupils to get in contact with each other for educational reasons yet this could also lead to cyber bulling or other people leaving threatening comments on children’s blogs.” He suggests creating a private network that only allows members of the school to connect with other bloggers.

Anyone who decides to try new technologies in teaching, in this case blogs, should follow Palfrey’s advice (2008, p. 246):

“The use of technology in teaching makes no sense if it’s just because we think that technology is cool. It’s easy to understand how we get to this place. The thinking goes like this: It’s fun and cool to blog; lots of people are doing it; we know that kids get some information from blogs; therefore, blogging must have a place in our schools. This orientation is a mistake. We should figure out, instead, how the use of technologies can support our pedagogical goals. Blogging might, or might not, be part of the approach we end up taking. The right way to look at it is to ask whether blogging can meet a need that we have in our teaching. We need to determine what our goals are, as teachers and parents, and then figure out how technology can help us, and our kids, to reach those goals.”

In summary, it is not entirely clear whether using blogs in education is the right choice. However, the advantages seem to outweigh the disadvantages, and the potential gain for students as well as for other Internet users – readers of blogs – is very tempting.

Chapter 5 Requirements Analysis

This chapter describes the requirements for the Blogosphere player application, analyzes what information can be extracted from web feeds and server logs, and presents the technologies that will be used.

The application should be capable of importing data from web feeds as well as from server logs that follow a predefined schema. Server logs might not always be available, so it should be possible to build the model of the blogosphere from web feeds alone. The evolution of the blogosphere should then be visualized and animated over time. Detailed information about bloggers, blogs, entries, and comments will be available, and it should also be possible to add personal tags and annotations to these items. A list of requirements based on this description is presented in the following section.

5.1 Requirements

1. Importing data:

• Import RSS/Atom feeds from locally stored files or download them from the Internet.

• Import data from locally stored LMS server logs (CSV or XML files).

2. Displaying blogosphere:

• Network view: show blogs (persons) as nodes and the links between blogs as edges.

• Timeline view: show blog posts and comments of one person or the whole blogosphere on a timeline (without links between blogs).

• Blog view (optional): show blog as a tree where nodes are posts and comments.

• Tag cloud view (optional): show tag cloud of selected blog.

3. Playing and animating blogosphere:

• Select specific date and time with a slider.

• Play animation.

• Pause animation.

• Stop animation.

• Change speed of animation.

4. Filters:

• Filter nodes by selecting users (network view).

• Display only edges for selected node by clicking on it – it becomes a context-node (network view).

• Filter edges by degree – minimum and maximum (network view).

• Filter posts and comments by selecting users and tags (timeline view).

5. Personal tags and annotations:

• Add, remove, and edit personal tags of blogs, posts, comments.

• Add, remove, and edit annotations of blogs, posts, comments.

6. Details about the blogosphere:

• Show overall details of the blogosphere.

• Show details of selected blogs.

• Show details of selected posts.

• Show details of selected comments.

7. Saving and loading:

• Save project (blogosphere data) to a file.

• Load project (blogosphere data) from a file.


5.2 Use Case Diagram

A use case diagram was created from the requirements. There is only one actor (user), since the Blogosphere player will be a desktop application without individual user accounts.

Figure 5.1: Use Case Diagram.


5.3 Web Feeds

Web feeds are a valuable source of information for blogosphere visualization. Unlike the HTML source code in which web pages are written, web feeds contain structured and semantic information. In theory, it would be possible to extract data directly from HTML documents, but this would require a preliminary analysis of every website before running an extraction program.

The two main web feed formats are RSS and Atom.¹ Both are XML-based and standardized, and thus well suited for automated data extraction. However, there are several disadvantages:

• Web feeds may contain only certain information. What is included depends on the decisions and settings of the web feed provider, because most XML elements – in both RSS and Atom – are optional.

• Web feeds usually contain data only for the several latest items (entries, articles, etc.). How many items are included depends on the web feed provider.

• Web feeds can be full text or partial only.

It is necessary to download and back up web feeds on a regular basis if complete data are required.

5.3.1 RSS

The latest version of the RSS standard is currently 2.0, from 2003. The root element of every RSS document is <rss> with a specified version. Subordinate to it is exactly one <channel> element, which contains all other elements with metadata about the channel and its content. Example of an RSS feed:

    <rss version="2.0">
      <channel>
        <title>My Blog</title>
        <link>http://exampleblog.org</link>
        <description>Interesting ideas and thoughts that came to my mind.</description>
        <lastBuildDate>Sat, 07 Apr 2012 13:14:03 GMT</lastBuildDate>
        <generator>Blogger</generator>
        <managingEditor>[email protected]</managingEditor>

        <item>
          <title>My first blog entry</title>
          <link>http://exampleblog.org/entry01.html</link>
          <description>Hello, world! Recently, I have started blogging and this is my first blog entry.</description>
          <pubDate>Sat, 07 Apr 2012 13:12:00 GMT</pubDate>
          <guid>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</guid>
          <category>just for fun</category>
        </item>
      </channel>
    </rss>

A channel must always contain three direct sub-elements:

title: The name of the channel. If the website contains the same information as the RSS file, the title should be the same as the title of the website.

link: The URL of the HTML website corresponding to the channel.

description: A phrase or sentence describing the channel.

1. All information about the RSS and Atom standards is derived from the specification documents (Winer, 2003; Nottingham, 2005).

Although a channel may contain many other optional elements, only a few are really useful for our blogosphere data model, so of the required elements only title and link will be used. If <managingEditor> is present, which contains the email address of the person responsible for editorial content, it will be used too.

More interesting are the sub-elements of <item>, which contain data about blog content – entries or comments. All sub-elements of an item are optional, but at least one of title or description must be present. Especially useful are these:

title: The title of the item.

link: The URL of the item.

description: The item synopsis – usually the whole or partial text of an entry or comment.

author: Email address of the author of the item.

category: Includes the item in one or more categories.

guid: A string that uniquely identifies the item.

pubDate: Indicates when the item was published.
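The item sub-elements listed above can be read with the DOM parser that ships with the Java standard library. The following is only a minimal sketch under stated assumptions: the class RssItemSketch and its methods are hypothetical names chosen for illustration, and the code does not reproduce the extractor classes of the Blogosphere player.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of the Blogosphere player sources.
class RssItemSketch {

    /** Text of the first descendant element with the given name, or null if absent. */
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** Returns one array {title, link, guid, pubDate} for every <item> in the feed. */
    static List<String[]> extractItems(String rssXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rssXml)));
            NodeList items = doc.getElementsByTagName("item");
            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                result.add(new String[] {
                    childText(item, "title"), childText(item, "link"),
                    childText(item, "guid"), childText(item, "pubDate")
                });
            }
            return result;
        } catch (Exception e) {
            // Parser configuration, I/O, and SAX errors all mean the input is unusable.
            throw new IllegalArgumentException("Not a well-formed RSS document", e);
        }
    }

    public static void main(String[] args) {
        String rss = "<rss version=\"2.0\"><channel><title>My Blog</title>"
                + "<item><title>My first blog entry</title>"
                + "<link>http://exampleblog.org/entry01.html</link>"
                + "<pubDate>Sat, 07 Apr 2012 13:12:00 GMT</pubDate></item>"
                + "</channel></rss>";
        for (String[] item : extractItems(rss)) {
            System.out.println(String.join(" | ", item[0], item[1], String.valueOf(item[3])));
        }
    }
}
```

Because every sub-element of an item is optional, the sketch returns null for missing fields instead of failing, so the caller has to decide which fields are mandatory for the data model.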

5.3.2 Atom

Atom is the second feed format for syndicating web content and metadata. The current version is 1.0, from 2005. Atom is the preferred format of one of the most popular blogging platforms – Google’s Blogger. As in RSS, an Atom feed has a root element, <feed>, containing sub-elements with information about the feed itself, and its entries are stored in <entry> elements. Example of an Atom feed:

    <feed xmlns="http://www.w3.org/2005/Atom">
      <title>My Blog</title>
      <link href="http://exampleblog.org"/>
      <subtitle>Interesting ideas and thoughts that came to my mind.</subtitle>
      <updated>2012-07-04T13:14:03Z</updated>
      <author>
        <name>John Doe</name>
        <email>[email protected]</email>
      </author>
      <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>

      <entry>
        <title>My first blog entry</title>
        <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
        <updated>2012-07-04T13:12:00Z</updated>
        <summary>Hello, world! Recently, I have started blogging and this is my first blog entry.</summary>
        <category term="just for fun"/>
      </entry>
    </feed>

Unlike in RSS feeds, the root element itself contains somewhat more valuable information. The title and the link to the website are present, but there is also an <id> element that conveys a permanent and universally unique identifier.


Furthermore, there can be more information about the author (a person, corporation, or similar entity) of the feed and its entries, because the <author> element has these sub-elements:

name: A human-readable name for the person.

email: E-mail address associated with the person.

uri: In fact an IRI² – an internationalized resource identifier of the person.

The sub-elements of every <entry> element are again more or less similar to the item elements in RSS. Title, link, and author are the same, and a few other elements are either new or merely renamed:

published: Indicates when the entry was published.

updated: Indicates when the entry was changed.

id: A string that uniquely identifies the entry (IRI).

content: Either the full text of the entry or a link to that content.

summary: A text that conveys a short summary, abstract, or excerpt of the entry.
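The two formats also write timestamps differently: the RSS example uses the RFC 822 style ("Sat, 07 Apr 2012 13:12:00 GMT"), while Atom uses RFC 3339 ("2012-07-04T13:12:00Z"). Temporal information must therefore be normalized before entries from both formats can be placed on one time axis. The sketch below shows one possible normalization. It assumes the java.time API (Java 8 or later), and FeedDateSketch is a hypothetical name that is not taken from the application.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class; assumes Java 8+ (java.time).
class FeedDateSketch {

    /** Parses an RSS 2.0 date such as "Sat, 07 Apr 2012 13:12:00 GMT". */
    static Instant parseRssDate(String text) {
        return ZonedDateTime.parse(text.trim(), DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
    }

    /** Parses an Atom date such as "2012-07-04T13:12:00Z" or "2012-07-04T15:12:00+02:00". */
    static Instant parseAtomDate(String text) {
        return OffsetDateTime.parse(text.trim(), DateTimeFormatter.ISO_OFFSET_DATE_TIME).toInstant();
    }

    public static void main(String[] args) {
        System.out.println(parseRssDate("Sat, 07 Apr 2012 13:12:00 GMT"));
        System.out.println(parseAtomDate("2012-07-04T13:12:00Z"));
    }
}
```

The RFC 1123 formatter expects four-digit years and numeric or GMT zones, so feeds that use two-digit years or zone abbreviations such as EST would need additional handling.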

5.4 LMS Server Logs

Although the Blogosphere player application should be able to visualize a blogosphere using only data extracted from web feeds, additional data from server logs might be very useful. The original requirements for the application take server logs into consideration.

A module on software architectures and web technologies was held at the University of Vienna, in which students worked on an assigned project. They used their own blogs to reflect on their problems, insights, and contributions during and after their task-related activities (Derntl, 2010). The blogs were externally hosted on Google’s Blogger site (formerly Blogspot), but the students accessed them via a centralized blog portal, which was implemented as an extension of the LMS (Learning Management System). This made it possible to log additional data, namely visits and posts.

2. A generalization of the Uniform Resource Identifier (URI) that allows the use of Unicode.


The Blogosphere player will reflect these data, so it is important to know what the logs look like. Two types of activities have been logged:

1. visits

• user

• date and time

• type

• target user

• target post ID

• target post URL

2. posts

• user

• date and time

• post ID

• post URL
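As an illustration, a logged visit could be represented and read from one CSV line as follows. This is only a sketch: the semicolon delimiter, the order of the columns, the ISO date format, and the name VisitLogSketch are all assumptions made for this example, because the exact log schema is not specified here.

```java
import java.time.LocalDateTime;

// Hypothetical sketch; delimiter, column order, and date format are assumptions.
class VisitLogSketch {

    /** One logged visit with the six fields listed above. */
    static final class Visit {
        final String user;
        final LocalDateTime time;
        final String type;
        final String targetUser;
        final String targetPostId;
        final String targetPostUrl;

        Visit(String user, LocalDateTime time, String type,
              String targetUser, String targetPostId, String targetPostUrl) {
            this.user = user;
            this.time = time;
            this.type = type;
            this.targetUser = targetUser;
            this.targetPostId = targetPostId;
            this.targetPostUrl = targetPostUrl;
        }
    }

    /** Parses "user;dateTime;type;targetUser;targetPostId;targetPostUrl". */
    static Visit parseLine(String line) {
        String[] f = line.split(";", -1); // -1 keeps empty trailing fields
        if (f.length != 6) {
            throw new IllegalArgumentException("Expected 6 fields but found " + f.length);
        }
        return new Visit(f[0].trim(), LocalDateTime.parse(f[1].trim()), f[2].trim(),
                f[3].trim(), f[4].trim(), f[5].trim());
    }

    public static void main(String[] args) {
        Visit v = parseLine("alice;2010-03-15T10:30:00;post;bob;42;http://bob.example.org/42.html");
        System.out.println(v.user + " visited " + v.targetUser + " at " + v.time);
    }
}
```

A post record would follow the same idea with four fields instead of six.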

5.5 Selection of Technologies

5.5.1 Java

The thesis assignment recommended using Java or Flash for programming the Blogosphere player application. Java was my immediate choice because I have more experience with it than with Flash, but it is essential to verify that Java is a suitable language for developing a highly interactive application for information visualization.

Java provides useful libraries for programming graphical user interfaces and for 2D graphics (Creating a GUI With JFC/Swing, 2012). Together they form the JFC (Java Foundation Classes), a set of classes for building graphical user interfaces and adding rich graphics functionality and interactivity to Java applications.


Swing GUI components are classes included in the JFC. They can be used to create a variety of elements ranging from buttons, combo boxes, check boxes, radio buttons, and sliders to tables, split panes, and frames. Swing components are extensible, customizable, configurable, and platform-independent because they are written entirely in Java rather than in platform-specific code.

The Java 2D API is another important group of JFC classes. It enables the use of 2D graphics, text, and images in applications and applets.

Java with its libraries is also recommended as a suitable programming language in the Handbook of Data Visualization (Chen, 2008). However, creating an interactive and dynamic visualization that meets the given requirements with these libraries alone could be very difficult, so an additional Java library for data visualization could make things much easier.

5.5.2 Prefuse

Prefuse is an interactive information visualization toolkit for Java. Most importantly, it provides built-in support for creating network graphs, trees, and forests, and also enables visualization of data along a single axis or in a two-dimensional scatter plot. Prefuse is described as follows:

“Prefuse supports a rich set of features for data modeling, visualization, and interaction. It provides optimized data structures for tables, graphs, and trees, a host of layout and visual encoding techniques, and support for animation, dynamic queries, integrated search, and database connectivity. Prefuse is written in Java, using the Java 2D graphics library, and is easily integrated into Java Swing applications or web applets.” (Prefuse documentation, 2007)

Given these features, the Prefuse library is a very good candidate for data visualization in Java. It is important to understand how the library is organized and how it works, because the architecture of the Blogosphere player will integrate it.

The design of the prefuse toolkit is based upon the information visualization reference model, which is “a software architecture pattern that breaks up the visualization process into a series of discrete steps, from data acquisition and modeling to the visual encoding of data to the presentation of interactive displays” (Prefuse documentation, 2007). The process is illustrated in figure 5.2.³

Figure 5.2: Diagram depicting the information visualization reference model.

1. Source data are any data that will be visualized. They can be stored in a file or an SQL table, or they can already be present as objects in computer memory.

2. Data are mapped into data tables in a process of data transformation. Data tables are internal representations of the original data as it is to be visualized. Data tables must be precisely specified by data schemas.

3. Data tables are mapped into a visual abstraction in a process of visual mapping. The visual abstraction includes all important visual features such as the position of visual items, colors, shapes, and sizes.

4. Data in the visual abstraction are rendered through a process of view transformation, in which the contents of the visual abstraction are drawn into any number of interactive views.

5. The user may interact with the visualization, causing changes or updates at any stage of the visualization pipeline. For instance, the user can highlight or drag items, hide or show selected items, or zoom into a view.

A view is represented by prefuse.Display, which is a direct extension of the javax.swing.JComponent class. Therefore, the view (display) can be easily integrated into other Swing components. Visual items can be updated directly within the display, and it will also be possible to trigger changes from outside, e.g. by moving a slider. These characteristics were taken into consideration when designing the application.

3. The diagram is a copy from the Prefuse webpage.

Chapter 6 Design

This chapter focuses on the design of the Blogosphere player application. It presents and describes class diagrams created according to UML (Unified Modeling Language) principles (Arlow, 2005). The application was created in an iterative and incremental development process, in which individual components were designed, implemented, and integrated into the application one by one. The original designs were altered a few times during the process, so only the final versions of the diagrams are presented.

In-text diagrams are simplified because of their complexity. Some diagrams will present only the most important attributes and methods, while others may hide them completely. Complete class diagrams are included in the appendix of the thesis.

6.1 Application Architecture

The application structure is shown in figure 6.1 and consists of the following packages:

blogosphere: Data model of the blogosphere. Data extracted from RSS feeds, Atom feeds, and LMS server logs, as well as personal tags and annotations, will be stored in this model.

players: Classes responsible for both visualization and animation of the blogosphere.

gui: The graphical user interface will be created by instantiating classes in this package.

models: Custom models for graphic components such as tables, lists, and combo boxes.

resources: All icons and text files for internationalization of the application will be stored here.

io: Classes responsible for data extraction from feeds and server logs, downloading feeds from the Internet, and classes for saving and loading the project.

utils: Auxiliary classes for various purposes, e.g. converting time from a string representation to a Calendar object.

Figure 6.1: Structure of the application.

6.2 Blogosphere Model

The blogosphere package consists of classes that represent the fundamental data model of the application. The model is shown in figure 6.2. Since the attributes are important, they are included in the diagram, while methods are simplified, e.g. standard get() and set() methods are applicable to all attributes. The classes are described as follows:

Blogosphere is the central class of the package. It follows the singleton design pattern and serves as an access point to other objects – blogs, entries, bloglinks, etc. The singleton pattern was chosen for several reasons. It ensures that there is only one instance of the blogosphere model at a time, so all extracted data are stored in one place. Also, the data are accessed from many other objects in the application, and the Blogosphere class provides a convenient way to get them by calling the static getInstance() method on the class. References to other objects such as blogs and entries are stored in several collections. Sometimes there are multiple collections for a single type of object to make searching faster, e.g. objects are mapped by ID in one collection and by URL in another.

Blog represents a single blog. It is identified by its owner (user), ID, and URL. It contains a collection of posts extracted from web feeds and server logs.

Entry represents a single post or comment. Since posts and comments have basically the same attributes and methods, they are represented by the same class. However, the timeline visualization displays posts and comments in a different color, size, or shape, so the type of the entry must be known. For this purpose, there is a type attribute with either the POST or the COMMENT value from the ItemType enumeration class. The self-reference of the Entry class indicates that posts (Entry objects of type POST) may contain comments (Entry objects of type COMMENT).

Author corresponds to the <author> element in Atom feeds.

Bloglink represents a connection between blogs.

Bookmark is used to store time bookmarks set up by the user.

ColorTag allows colors to be assigned to selected sets of tags (categories and personal tags).

LookAndFeel is the second class that follows the singleton design pattern. Various settings of the visualization appearance such as colors, shapes, and sizes are stored in the instance of this class.

Figure 6.2: Diagram of classes from blogosphere package.
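The singleton access point with two lookup collections can be sketched as follows. Only getInstance() and the idea of mapping objects by ID and by URL come from the description above. The class name BlogosphereSketch, the simplified Blog stand-in, and all other method names are hypothetical and serve only as an illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the singleton idea; not the actual Blogosphere class.
final class BlogosphereSketch {

    /** Minimal stand-in for the Blog class: ID, URL, and owner only. */
    static final class Blog {
        final String id;
        final String url;
        final String owner;

        Blog(String id, String url, String owner) {
            this.id = id;
            this.url = url;
            this.owner = owner;
        }
    }

    // The single instance, created once when the class is loaded.
    private static final BlogosphereSketch INSTANCE = new BlogosphereSketch();

    // Two indexes over the same objects, so both lookups avoid a linear search.
    private final Map<String, Blog> blogsById = new HashMap<>();
    private final Map<String, Blog> blogsByUrl = new HashMap<>();

    private BlogosphereSketch() {
        // Private constructor: no other class can create a second model.
    }

    static BlogosphereSketch getInstance() {
        return INSTANCE;
    }

    void addBlog(Blog blog) {
        blogsById.put(blog.id, blog);
        blogsByUrl.put(blog.url, blog);
    }

    Blog getBlogById(String id) {
        return blogsById.get(id);
    }

    Blog getBlogByUrl(String url) {
        return blogsByUrl.get(url);
    }

    public static void main(String[] args) {
        getInstance().addBlog(new Blog("b1", "http://exampleblog.org", "alice"));
        System.out.println(getInstance().getBlogByUrl("http://exampleblog.org").owner);
    }
}
```

Keeping two maps costs some memory and requires that both are updated together, which is why the sketch funnels every insertion through a single addBlog() method.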

6.3 Players and Displays

According to the Prefuse manual and documentation, a visualization can be created by extending the prefuse.Display class. Inside this class, schemas for internal data representation are designed, tables are created and filled with data, renderers are established, and various other settings are configured.¹ Once a display is correctly configured, it can be easily incorporated into existing graphic components – frames, panes, etc.

However, in this case the visualization should change dynamically over time; it should be animated. Therefore, I decided to design a class – let us call it a player class – that is responsible for animation and encapsulates the display. The application will have four different views, i.e. four different visualizations, so there will also be four distinct displays:

• GraphDisplay,

• TimelineDisplay,

• BlogDisplay,

• CloudDisplay.

These classes will be encapsulated in corresponding player classes, but since both BlogDisplay and CloudDisplay visualize individual blogs on virtually identical principles, both displays will be members of the same player class. In summary, there will be only three player classes:

• GraphPlayer,

• TimelinePlayer,

• BlogPlayer.

The general idea of the player–display association is shown in figure 6.3. The display handles the visualization, and the player determines how to update it when required. Updates of the visualization are initiated by the following events:

• user moves a slider,

• user drags a view itself (in case of timeline view),

• playback is turned on, thus updates are triggered automatically in a loop.

1. More detailed description will be presented in the next chapter.


Figure 6.3: General idea of player–display design.

Whenever a user switches between views, the relevant display is set as the currently active component of the main GUI frame, and all listed events affect only the selected view.

Each player also contains an inner class called AnimationTask that animates the visualization while running in the background of the application. When playback is turned on, it starts a continuous loop that repaints the visualization until stopped. While it is running, messages are sent to the GUI to adjust the date and time label as well as the slider position. Figure 6.4 shows the activity diagram for this case.

Figure 6.4: Automated playback.


Figure 6.5 shows the activity diagram when slider is moved by a user. Basically the same activity diagram can be also applied to updating visualization by dragging the timeline view. When user moves the slider, an event is triggered and listener assigned to this event calls a method of currently active player to update the visualization. If AnimationTask was already running, then it is stopped immediately. The player object calculates a new date based upon relative position of the slider and selects items that should be made visible or hidden, then it passes these items to a display object that updates the actual visualization.

Figure 6.5: Updates initiated by slider.
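The calculation of a new date from the relative slider position can be sketched as a linear interpolation between the first and the last moment of the blogosphere. The interpolation itself, the visibility rule (an item is shown once its publication time has been reached), and the name SliderTimeSketch are assumptions made for this illustration.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch; linear mapping and visibility rule are assumptions.
class SliderTimeSketch {

    /** Maps a slider value in [min, max] linearly onto the interval [start, end]. */
    static Instant positionToTime(int value, int min, int max, Instant start, Instant end) {
        if (max <= min) {
            throw new IllegalArgumentException("max must be greater than min");
        }
        if (value < min || value > max) {
            throw new IllegalArgumentException("value outside slider range");
        }
        double fraction = (double) (value - min) / (max - min);
        long spanMillis = Duration.between(start, end).toMillis();
        return start.plusMillis(Math.round(fraction * spanMillis));
    }

    /** An item is visible once the current time has reached its publication time. */
    static boolean isVisible(Instant published, Instant current) {
        return !published.isAfter(current);
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2012-04-01T00:00:00Z");
        Instant end = Instant.parse("2012-04-11T00:00:00Z");
        Instant current = positionToTime(50, 0, 100, start, end);
        System.out.println(current);
        System.out.println(isVisible(Instant.parse("2012-04-07T13:12:00Z"), current));
    }
}
```

In the application, the player would pass the items selected this way to its display, which then updates the actual visualization.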

Each player class is designed to work with a different type of visualization; however, several attributes and methods apply to all of them. Most importantly, the inner class AnimationTask serves the same purpose in all three players: it increments the date and time in a loop and calls the repaint() method of the corresponding player. Therefore, a superclass was designed from which all three players inherit. Since this class has no associated display, there is no point in instantiating it, so it was designed as an abstract class – AbstractPlayer.
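The relation between the abstract superclass and its inner animation task can be illustrated with the following simplified sketch. The names AbstractPlayerSketch and CountingPlayer, the fixed time step, and the synchronous play() method are assumptions. The real task runs in the background and pauses between frames, which is omitted here to keep the example short and deterministic.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the superclass idea; not the actual AbstractPlayer.
abstract class AbstractPlayerSketch {
    private final Instant end;
    private final Duration step;
    private Instant current;
    private volatile boolean playing;

    AbstractPlayerSketch(Instant start, Instant end, Duration step) {
        if (step.isZero() || step.isNegative()) {
            throw new IllegalArgumentException("step must be positive");
        }
        this.current = start;
        this.end = end;
        this.step = step;
    }

    /** Each concrete player redraws its own display for the given moment. */
    protected abstract void repaint(Instant moment);

    Instant getCurrent() {
        return current;
    }

    /** Requests the running loop to finish after the current frame. */
    void stop() {
        playing = false;
    }

    /** Runs the animation in the calling thread (simplification for this sketch). */
    void play() {
        new AnimationTask().run();
    }

    /** Inner task shared by all players: advance the time, then repaint. */
    class AnimationTask implements Runnable {
        @Override
        public void run() {
            playing = true;
            while (playing && current.isBefore(end)) {
                Instant next = current.plus(step);
                current = next.isAfter(end) ? end : next; // never run past the end
                repaint(current);
            }
            playing = false;
        }
    }
}

/** Trivial concrete player that only counts how often it was repainted. */
class CountingPlayer extends AbstractPlayerSketch {
    int frames;

    CountingPlayer(Instant start, Instant end, Duration step) {
        super(start, end, step);
    }

    @Override
    protected void repaint(Instant moment) {
        frames++;
    }
}
```

Because the loop lives in the superclass, a concrete player only has to supply repaint(); the stepping and stopping logic is written once.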

The overall design can be seen in the figure 6.6. In addition to the listed classes, there are also several others. In all cases, these classes are inner classes of displays, and they are used for purposes of changing appearance and position of visual items inside the visualization.


Figure 6.6: Diagram of classes from players package.


6.4 Input and Output

The io package contains classes responsible for input and output of data from and to external files. Figure 6.7 shows the package structure with classes designed as follows:

Extractor is an abstract class from which all other classes for extracting data are subclassed.

FeedExtractor is a class for extracting data from web feeds – both RSS and Atom.

FeedDownloader is an extension of FeedExtractor. It downloads feeds from provided URLs and parses the data afterwards.

HttpFetcher is a class used by FeedDownloader. It creates an HTTP GET request, sends it to a web server, and processes the response.

LogExtractor is an abstract class with common attributes and methods for both types of LMS server log extractors.

CsvExtractor handles parsing data from CSV files.

XmlExtractor handles parsing data from XML files.

ImportTask manages the process of parsing and importing data into the application using any of the listed extractors as needed. Since importing data can be very time consuming, especially if feeds are being downloaded from the Web, it is implemented as an org.jdesktop.application.Task. The task therefore runs in the background, and the GUI is updated via a task monitor, e.g. a progress bar may show how much data has been imported so far.

SaveTask saves the blogosphere model to an XML file.

LoadTask loads the blogosphere model from a previously saved XML file.

6.5 Graphical User Interface

Figure 6.7: Diagram of classes from io package.

The Blogosphere player is designed as an application based on the Swing Application Framework. The main frame is represented by the class GuiMain, which is an extension of org.jdesktop.application.FrameView. It contains all three players (graph, timeline, blog) and has access to all displays owned by these players. When a particular view is selected by the user, the corresponding player is set as currently active, and its display is set as a child component of the split pane, which is another graphical component of the main frame. The conceptual design of the main frame is shown in figure 6.8. The display with a visualization is marked with number 4.

Figure 6.8: Conceptual design of the main GUI frame.

Legend to figure 6.8:

1. menu bar,

2. toolbar (contains buttons for selecting visualization views),

3. tabbed pane with filters,

4. visualization display,

5. playback controls,

6. slider,

7. ’jump to time’ button,

8. speed combo box,

9. date and time labels.


The right component of the split pane is a visualization display, while the left one contains a tabbed pane with four individual panes – blogs, tags, personal tags, and colors. The first three are used as filters for selecting only the desired items, and the fourth is for assigning colors to certain visual items.² Controls and the slider are located in the bottom part of the frame. Control buttons provide functions for playing, pausing, stopping, and changing the speed of the animation. In fact, the play and pause controls are represented by a single two-state toggle button.

In addition to the GuiMain frame, there are also several other classes:

GuiBlogosphere displays overall information about the blogosphere.

GuiBlog displays information about single or multiple blogs.

GuiEntry displays information about single or multiple entries – posts or comments.

GuiLookAndFeel is used for adjusting the appearance of visualizations. Colors, shapes, and sizes of visual items can be altered.

GuiImport guides the user through the process of importing data into the blogosphere model.

GuiProgress shows the progress of the data import. It contains two progress bars – one for the current task and one for all tasks together.

GuiBookmarks is used for managing bookmarks with saved dates.

GuiColorTag is used for assigning colors to single or multiple tags.

GuiDateTime is a simple dialog for setting an exact date and time.

GuiTimelineSelection is used for selecting items in the timeline visualization when the user selects a visual item that is overlapped by other items. In such a case, this dialog pops up and the user can choose which item to display.

GuiHelp provides a brief manual on how to understand the visualizations and how to operate them.

GuiAboutBox shows information about the application itself.

2. More details will be presented in the next chapter.

Chapter 7 Implementation

This chapter describes several important aspects of the implementation phase, with the main focus on the solutions for the individual visualization views. First, however, it describes how a visualization based on the Prefuse toolkit is created within the application in general.

7.1 Setting Up a Visualization Display

Figure 7.1 shows how the classes in the Prefuse library are organized (the diagram is reproduced from the Prefuse webpage).

Figure 7.1: Relations between packages and classes of Prefuse toolkit.

Firstly, schemas for the internal data representation must be created. Data are stored in tables, no matter whether they represent a network graph, a tree, or a scatter plot. A schema is created by instantiating the prefuse.data.Schema class and adding the desired columns. Every column has a name and a data type. Afterwards, the actual tables are constructed from the schemas and filled with data from the blogosphere model. In contrast to figure 7.1, no library class from the Prefuse toolkit is used for filling the data tables. The blogosphere model is a more general data structure with various attributes that have no relevance to the visualization itself. Therefore, the tables are created by selecting the appropriate data from the model in a custom method.

Secondly, the resulting data tables are subject to visual mappings, which create a visual abstraction containing all the information needed to draw a visual representation of the data. This is done simply by associating the tables with an instance of the prefuse.Visualization class.

Thirdly, several renderers must be set. Prefuse provides a variety of renderers for different purposes, e.g. ShapeRenderer for drawing simple shapes, LabelRenderer for drawing labels that consist of a text string, an image, or both, EdgeRenderer for drawing edges between nodes, and AxisRenderer for drawing axis ticks and labels. All of the listed renderers are used within the Blogosphere Player.

Fourthly, processing actions are set. They are basically sets of rules that are applied to the visualization. For example, colors for different groups of visual items are assigned via actions, and a layout for the items can be set this way as well. Moreover, it is possible to write custom actions and thus benefit from a wider range of visualization adjustments. Actions can be triggered at a specified time or, more often, run continuously.

Fifthly, controllers are registered to the visualization. Controllers are simply listeners, and again, the Prefuse toolkit contains several useful ones, e.g. DragControl for dragging items, FocusControl for updating the focus status of items by clicking on them, HoverActionControl for executing an action when the mouse pointer passes over an item, NeighborHighlightControl for highlighting neighbours in a graph, PanControl for dragging the viewable region of the visualization, and ZoomControl and WheelZoomControl for zooming in and out.

Finally, the visualization is started by triggering the registered actions to run.
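The first step above, a schema of named, typed columns and a table filled from the model by a custom method, can be illustrated in plain Java. The following is only a minimal sketch and deliberately does not use the Prefuse API: the class TableSketch, its methods, and the reduction of the blogosphere model to a map from blogger ID to number of posts are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a schema-backed data table; this is NOT the Prefuse API.
class TableSketch {
    // Column name -> declared data type, kept in insertion order.
    private final Map<String, Class<?>> columns = new LinkedHashMap<>();
    private final List<Map<String, Object>> rows = new ArrayList<>();

    // Every column has a name and a data type.
    TableSketch addColumn(String name, Class<?> type) {
        columns.put(name, type);
        return this;
    }

    // Appends a row after checking each value against its column type; returns the row index.
    int addRow(Map<String, Object> values) {
        for (Map.Entry<String, Object> e : values.entrySet()) {
            Class<?> type = columns.get(e.getKey());
            if (type == null) {
                throw new IllegalArgumentException("Unknown column: " + e.getKey());
            }
            if (e.getValue() != null && !type.isInstance(e.getValue())) {
                throw new IllegalArgumentException("Wrong type for column: " + e.getKey());
            }
        }
        rows.add(new LinkedHashMap<>(values));
        return rows.size() - 1;
    }

    Object get(int row, String column) {
        return rows.get(row).get(column);
    }

    int rowCount() {
        return rows.size();
    }

    // Hypothetical selection step: copies only what the view needs from the model.
    // The model is reduced here to blogger ID -> number of posts (an assumption).
    static TableSketch nodesFromModel(Map<String, Integer> postsPerBlogger) {
        TableSketch nodes = new TableSketch()
                .addColumn("user", String.class)
                .addColumn("posts", Integer.class)
                .addColumn("visible", Boolean.class);
        for (Map.Entry<String, Integer> e : postsPerBlogger.entrySet()) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("user", e.getKey());
            row.put("posts", e.getValue());
            row.put("visible", Boolean.TRUE);
            nodes.addRow(row);
        }
        return nodes;
    }
}
```

Calling TableSketch.nodesFromModel with a map of two bloggers yields a table with two rows. A value of the wrong type is rejected when the row is added, which is the kind of guarantee a typed schema is meant to give.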


7.2 Graph View

Graph view is one of the four visualization views. It displays a graph where nodes represent blogs/bloggers and edges are connections between them. The backbone of the visualization is formed by a Graph data structure that consists of two tables – one for the nodes and one for the edges.

The table for nodes has these columns: ’user’ for identifying bloggers, ’label’ for displaying information when the user hovers the pointer over a node, ’blogger’ with the blogger’s name, ’posts’ for showing the number of posts, ’visible’ for determining the visibility of the nodes, ’color’ for changing the default color of the nodes, and ’flash’ for changing the color of nodes for a short time.

The table for edges has these columns: ’source’ with the row number of the source node, ’target’ with the row number of the target node, ’sourceUser’ and ’targetUser’ for identifying bloggers, ’degree’ for displaying the current degree of the edges, ’blogs’ with the number of visits to blogs, ’posts’ with the number of visits to posts, ’comments’ with the number of posted comments, ’visible’ for determining the visibility of the edges, and ’flash’ for changing the color of edges for a short time.

In a dynamic visualization of a network graph, the data change over time; more precisely, edges are added or removed. However, adding or removing rows in the table of edges while the visualization was already running proved to be quite complicated, because the registered actions had problems handling changes in the underlying data tables. Moreover, when rows are removed, they are not actually deleted but only invalidated.

To overcome these problems, all required rows are inserted into the data tables before the visualization is started, and the visibility flag is then set to true for all items that should be visible and to false for those that should not.
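The visibility-flag approach can be sketched in plain Java as follows. This is an illustration under stated assumptions rather than the application's code: the class VisibilitySketch, the EdgeRow type, and the use of epoch milliseconds as the timestamp are hypothetical, and an edge is simply assumed to be visible from the time of its logged interaction onwards.

```java
import java.util.List;

class VisibilitySketch {
    // One pre-inserted edge row (hypothetical type, not a Prefuse class).
    static class EdgeRow {
        final String sourceUser;
        final String targetUser;
        final long timestamp; // assumed: time of the logged interaction, in epoch milliseconds
        boolean visible;      // mirrors the 'visible' column; rows start hidden

        EdgeRow(String sourceUser, String targetUser, long timestamp) {
            this.sourceUser = sourceUser;
            this.targetUser = targetUser;
            this.timestamp = timestamp;
        }
    }

    // Rows are never added or removed while the animation runs; only the flags change.
    // Returns the number of rows that are visible at the given time.
    static int showUpTo(List<EdgeRow> edges, long currentTime) {
        int shown = 0;
        for (EdgeRow e : edges) {
            e.visible = e.timestamp <= currentTime;
            if (e.visible) {
                shown++;
            }
        }
        return shown;
    }
}
```

Because the flags are recomputed from the current time on every call, the same method works both when the animation plays forward and when the user moves the time slider backwards.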

Features of the visualization are listed and described as follows:

• Nodes:

– Nodes represent blogs/bloggers.

– By default, they appear as circles with number of posts in the middle.


– It is possible to change shapes and colors.

– Labels with bloggers’ names under the nodes can be displayed or hidden.

– The size of the nodes may vary based on the number of posts they have at a certain time.

– The user may choose to display only selected nodes.

– Descriptive information is displayed by hovering the mouse pointer over a node.

– In addition to the base color of items, there is a different color when an item is being hovered or selected.

– A new window with detailed information about a blog pops up when a node is double clicked.

– Nodes can be dragged to a desired position.

– Neighboring nodes are highlighted when a node is being hovered.

– Nodes are highlighted when rows in the list of blogs are selected.

Figure 7.2: Visualization of blogs in network graph.


• Edges:

– Edges represent connections (visits, comments) between bloggers.

– The degree of an edge is indicated by a small label with a number.

– The width of the edges may vary based on their degree.

– Edges are highlighted when a node is being hovered.

– Edges can be filtered by selecting a minimum or maximum degree.

– Edges can be filtered by selecting one or more nodes.

– Edges can be filtered by their type.

– Edges can be filtered to incoming or outgoing only.

• Other features:

– Zooming in and out by dragging with the right mouse button, by scrolling the mouse wheel, or by clicking the buttons in the toolbar.

– The viewable region of the visualization can be dragged.

Every visible edge has one or two numbers. The number closer to a node represents the number of incoming visits, and the number farther from the node represents outgoing visits, as can be seen in figure 7.3.

Figure 7.3: Blogger A visited blog B five times and blogger B visited blog A two times.
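The two numbers on an edge can be derived from the logged visits by counting them per direction. The sketch below shows one possible way to do this in plain Java. The class VisitCountSketch and its methods are hypothetical, and labelNear encodes one reading of the rule above: the number drawn closer to a node is the number of visits that node received from the blogger at the other end.

```java
import java.util.HashMap;
import java.util.Map;

class VisitCountSketch {
    // visitor ID -> (owner of the visited blog -> number of logged visits)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Records one logged visit of 'visitor' to the blog owned by 'blogOwner'.
    void logVisit(String visitor, String blogOwner) {
        Map<String, Integer> perOwner = counts.get(visitor);
        if (perOwner == null) {
            perOwner = new HashMap<>();
            counts.put(visitor, perOwner);
        }
        Integer old = perOwner.get(blogOwner);
        perOwner.put(blogOwner, old == null ? 1 : old + 1);
    }

    // Number of visits in one direction; 0 if none were logged.
    int visits(String visitor, String blogOwner) {
        Map<String, Integer> perOwner = counts.get(visitor);
        if (perOwner == null) {
            return 0;
        }
        Integer n = perOwner.get(blogOwner);
        return n == null ? 0 : n;
    }

    // Number drawn closer to 'node' on its edge with 'other':
    // the visits that 'node' received from 'other' (incoming visits).
    int labelNear(String node, String other) {
        return visits(other, node);
    }
}
```

For the situation in figure 7.3, five logged visits of A to B and two of B to A give labelNear("B", "A") equal to 5 and labelNear("A", "B") equal to 2.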

7.3 Timeline View

Timeline view is basically a scatter plot for displaying posts and comments, where date and time determine the horizontal position of the items and the size of the content (number of characters) determines the vertical position. This solution has been chosen because it improves the readability of the visualization. If the items were arranged along a single axis only, many of them would overlap, and the visualization would be hard to make sense of.

However, the size of the content is known only if data from web feeds are available. Sometimes web feeds are no longer available, so data about entries are taken from server logs only. In that case the variable is set to zero, and the items are arranged in a line along the horizontal axis. This is not necessarily a problem, but when a lot of items from many blogs are displayed, the visualization can become less readable. Therefore, there is also an option to switch to an ordinal layout, where the same number is assigned to all posts and comments from the same blog and the vertical position is determined by this number.
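The difference between the two vertical layouts can be sketched in plain Java. The class TimelineLayoutSketch and its Entry type are hypothetical, and the way ordinal numbers are handed out (in order of first appearance, starting at 1) is an assumption, since the text only requires that all entries of one blog share the same number.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TimelineLayoutSketch {
    // One post or comment (hypothetical type).
    static class Entry {
        final String user; // blogger ID
        final int length;  // number of characters; 0 when only server logs are available

        Entry(String user, int length) {
            this.user = user;
            this.length = length;
        }
    }

    // Default layout: the vertical value is the size of the content.
    static double[] yByLength(List<Entry> entries) {
        double[] y = new double[entries.size()];
        for (int i = 0; i < y.length; i++) {
            y[i] = entries.get(i).length;
        }
        return y;
    }

    // Ordinal layout: every entry of the same blog gets the same number.
    // Numbers are assigned in order of first appearance, starting at 1 (an assumption).
    static double[] yByOrdinal(List<Entry> entries) {
        Map<String, Integer> order = new HashMap<>();
        double[] y = new double[entries.size()];
        for (int i = 0; i < y.length; i++) {
            String user = entries.get(i).user;
            Integer n = order.get(user);
            if (n == null) {
                n = order.size() + 1;
                order.put(user, n);
            }
            y[i] = n.intValue();
        }
        return y;
    }
}
```

With log-only data, yByLength places all entries at zero, whereas yByOrdinal separates them into one horizontal row per blog.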

Unlike the graph view, the timeline has only one data table. It has these columns: ’id’ with the URI from a web feed, ’url’ of the items, ’user’ for identifying bloggers, ’type’ of the items (post or comment), ’label’ for displaying information when the user hovers the pointer over an item, ’xField’ for the horizontal position, ’yField’ for the vertical position, ’length’ with the number of characters, ’order’ with an assigned ordinal number, ’color’ for changing the default color of the items, and ’visible’ for determining the visibility of the items.

Figure 7.4: Visualization of posts and comments in timeline view.


Features of the visualization are listed and described as follows:

• By default, items appear as circles, with posts a little larger than comments.

• It is possible to change shapes and colors.

• The user may choose to display only certain items by selecting blogs, tags, and personal tags.

• Descriptive information is displayed by hovering the mouse pointer over an item.

• In addition to the base color of items, there is a different color when items are being hovered or selected.

• A new window with detailed information about a post or comment pops up when an item is double clicked.

• If items overlap and the user double clicks one, a window with a list of items to select from pops up.

• The corresponding post is highlighted when a comment is being hovered.

• All corresponding comments are highlighted when a post is being hovered.

• Items are highlighted when rows in the list of blogs or tags are selected.

• Zooming in and out by dragging with the right mouse button, by scrolling the mouse wheel, or by clicking the buttons in the toolbar.

• The timeline can be dragged directly instead of moving the slider.

7.4 Blog View and Tag Cloud

These two views are much simpler than the previous two. They are designed to visualize only a single selected blog at a time. Both the blog view and the tag cloud are represented by graphs, which are in fact trees.

In the blog view, there are three types of nodes. The root node represents a blog, its child nodes represent posts, and their children are comments. The view has many features in common with the graph view as well as with the timeline view. One major difference is that the layout is implemented as a force directed layout. It tries to position the nodes of a graph so that all the edges are of more or less equal length and there are as few crossing edges as possible. For trees, the algorithm included in the Prefuse library works very well. It can also be used to arrange nodes in graphs other than trees, but it proved to be unsuitable for very dense graphs. Therefore, the graph view does not provide this feature.

Figure 7.5: Visualization of a blog.

The tag cloud works on very similar principles, but instead of nodes rendered as geometric shapes, there are only text labels. They represent the tags found in web feeds as they have changed over time. More frequently used tags are bigger, while less frequently used tags are smaller. The tag cloud can also display personal tags assigned by a user of the Blogosphere Player.

Figure 7.6: Tag cloud.
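The rule that more frequently used tags are drawn bigger can be illustrated by a simple mapping from tag counts to font sizes. The following plain-Java sketch assumes a linear scale between a minimum and a maximum size; the class TagSizeSketch, the method fontSizes, and the linear scale itself are assumptions, as the scaling actually used by the application is not specified here.

```java
import java.util.HashMap;
import java.util.Map;

class TagSizeSketch {
    // Maps each tag to a font size by linear interpolation between minSize and maxSize.
    // The least used tag gets minSize, the most used tag gets maxSize (assumed scale).
    static Map<String, Double> fontSizes(Map<String, Integer> tagCounts,
                                         double minSize, double maxSize) {
        int lo = Integer.MAX_VALUE;
        int hi = Integer.MIN_VALUE;
        for (int count : tagCounts.values()) {
            lo = Math.min(lo, count);
            hi = Math.max(hi, count);
        }
        Map<String, Double> sizes = new HashMap<>();
        for (Map.Entry<String, Integer> e : tagCounts.entrySet()) {
            double size = (hi == lo)
                    ? minSize // all tags equally frequent: avoid division by zero
                    : minSize + (e.getValue() - lo) * (maxSize - minSize) / (hi - lo);
            sizes.put(e.getKey(), size);
        }
        return sizes;
    }
}
```

Since tag usage changes over time, such a mapping would have to be recomputed whenever the displayed date changes. A logarithmic scale is a common alternative when a few tags dominate.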

7.5 Filters

There are four tabs with tables located on the left side of the Blogosphere Player. Three of them serve as filters (blogs, tags, and personal tags), and the fourth tab is used for assigning colors to selected tags.

Filters allow the user to choose which items to display by selecting only certain blogs or tags. The union operation is performed on the rows of the same table, whereas the intersection operation is performed between whole tables. For example, if the user selects blogs A and B and the tags ’html’ and ’css’, only those posts and comments from blogs A and B that have the tag ’html’ or ’css’ are displayed. There is also a special value ’’ that represents items (blogs, posts or comments) with no assigned tags.

The tables are also used for highlighting visual items, e.g. when the user highlights certain blogs in the table, the corresponding visual items are highlighted as well.
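The combination of union within a table and intersection between tables can be expressed as a small predicate. The plain-Java sketch below covers two tables (blogs and tags); the class FilterSketch, the constant NO_TAGS, and the rule that an empty selection in a table imposes no restriction are assumptions made for illustration. NO_TAGS is only a placeholder and is not the literal special value used by the application.

```java
import java.util.Collections;
import java.util.Set;

class FilterSketch {
    // Hypothetical placeholder standing for "item has no assigned tags".
    static final String NO_TAGS = "(no tags)";

    // An item is displayed if it passes every table (intersection between tables).
    static boolean isDisplayed(String blog, Set<String> itemTags,
                               Set<String> selectedBlogs, Set<String> selectedTags) {
        Set<String> tags = itemTags.isEmpty() ? Collections.singleton(NO_TAGS) : itemTags;
        return matchesAny(Collections.singleton(blog), selectedBlogs)
                && matchesAny(tags, selectedTags);
    }

    // Within one table, matching any selected row is enough (union of rows).
    private static boolean matchesAny(Set<String> values, Set<String> selected) {
        if (selected.isEmpty()) {
            return true; // assumption: nothing selected means no restriction
        }
        for (String value : values) {
            if (selected.contains(value)) {
                return true;
            }
        }
        return false;
    }
}
```

With blogs A and B and the tags ’html’ and ’css’ selected, a post from blog A tagged ’css’ is displayed, while a post from blog C tagged ’html’ and a post from blog A tagged only ’java’ are hidden. A third table for personal tags would be added as one more matchesAny term in the conjunction.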

Chapter 8 Application Usage Scenarios

The Blogosphere Player application was primarily designed for the particular case where blogging data are logged on a regular basis. As such, the best results in blogosphere visualization are achieved only if logged data are imported. However, it is also possible to use the application with very limited data, in fact with only a list of URLs. The following two sections present two scenarios in which both options are discussed.

8.1 Visualization with Server Logs

There are two types of logs, visits and posts, each of which can be provided as a CSV or XML file. In addition, the application can import another file in which users are associated with the URLs of their blogs. In this context, ’user’ is an identifier that uniquely identifies the blogger. It can be a name, a number, or perhaps an e-mail address; in this case, it is the student’s ID. In the file with logged visits, sources and targets are identified by these IDs. The same ’user’ identifier must be used for the same blogger in all three files.

When a new project is started, a dialog opens where the user can choose to import any of these three files. At least one of them is required. The user can then select a folder where web feeds are saved. There is also an option to try downloading web feeds automatically from the Web (downloading all web feeds might be a little time consuming). The application subsequently processes all available data and creates a model of the blogosphere.

Figure 8.1: Import dialog with selected ’Blogs’ tab.

Figure 8.2: Import dialog with selected ’Feeds’ tab.

The quality of the model depends on many factors, mostly on the accessibility of web feeds. It might happen that some blogs no longer exist, and if their web feeds are not backed up, the model will be only partial. Furthermore, recognizing feeds for comments is optimized only for Google’s blogging site, Blogger, so it is not guaranteed that comments will be imported from other sites as well.

When the import of data is complete, several buttons on the main toolbar are enabled, and the user can switch between views. In addition to the graph, timeline, blog, and tag cloud views, there is also a view where system messages are displayed, mainly during the process of importing data. All features described in the previous chapter should be available.

8.2 Visualization without Server Logs

This option limits the visualization possibilities, because visits cannot be obtained without server logs and the network graph might not be available. However, the other views should work normally. The only requirement is a file with users and the URLs of their blogs. The easiest way is to make a CSV file with the following structure:

    "User","URL"
    "blogger1","http://johndoe.blogspot.com/"
    "blogger2","http://richardroe.blogspot.com/"
    "blogger3","http://janeroe.blogspot.com/"

When the user opens such a file, it is automatically checked whether the headers are correct. If not, this can easily be fixed by typing the correct form into the provided text fields. It is recommended to have URLs in a standardized format with a specified protocol, domain name, and path; however, the Blogosphere Player will try to fix incorrectly written URLs.

It is also good to have locally saved web feeds, but it is not required. The Blogosphere Player can download them from the Web, but in that case only the most recent entries will be available. Except for the graph view, all other views with all described features should be available in the same way as in the previous scenario.
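Reading such a file involves a header check and some URL clean-up. The plain-Java sketch below illustrates both under simplifying assumptions: fields contain no embedded commas or quotes, a wrong header is simply rejected instead of being corrected interactively, and a URL is "fixed" only by adding a missing http:// prefix and a missing trailing slash. The class UserListSketch and its methods are hypothetical, and the repairs the application itself attempts may differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UserListSketch {
    // Splits a simple CSV line and strips surrounding double quotes.
    // Assumes that fields contain no embedded commas or quotes.
    static String[] splitLine(String line) {
        String[] parts = line.split(",", -1);
        for (int i = 0; i < parts.length; i++) {
            String p = parts[i].trim();
            if (p.length() >= 2 && p.startsWith("\"") && p.endsWith("\"")) {
                p = p.substring(1, p.length() - 1);
            }
            parts[i] = p;
        }
        return parts;
    }

    // Checks for the header shown in the example file.
    static boolean hasExpectedHeader(String headerLine) {
        String[] h = splitLine(headerLine);
        return h.length == 2 && "User".equals(h[0]) && "URL".equals(h[1]);
    }

    // Adds a missing protocol and a missing path (assumed repair rules).
    static String normalizeUrl(String raw) {
        String url = raw.trim();
        if (!url.matches("(?i)[a-z][a-z0-9+.-]*://.*")) {
            url = "http://" + url;
        }
        int afterScheme = url.indexOf("://") + 3;
        if (url.indexOf('/', afterScheme) < 0) {
            url = url + "/";
        }
        return url;
    }

    // Returns user ID -> normalized blog URL, in file order.
    static Map<String, String> parse(List<String> lines) {
        if (lines.isEmpty() || !hasExpectedHeader(lines.get(0))) {
            throw new IllegalArgumentException("Expected header \"User\",\"URL\"");
        }
        Map<String, String> blogs = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = splitLine(line);
            if (fields.length != 2) {
                throw new IllegalArgumentException("Malformed line: " + line);
            }
            blogs.put(fields[0], normalizeUrl(fields[1]));
        }
        return blogs;
    }
}
```

For example, a row whose URL is written as richardroe.blogspot.com is stored as http://richardroe.blogspot.com/, while a URL that already has a protocol and a path is left unchanged.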


Figure 8.3: Screenshot of the main GUI frame.

Chapter 9 Conclusion

Both the Java programming language and the Prefuse library proved to be very useful for the purposes of information visualization. However, it is quite difficult to understand how the Prefuse toolkit works, at least in the beginning. The learning curve is steep, mainly because a comprehensive manual and tutorials are lacking. The documentation of the application programming interface, on the other hand, is very thorough, but it is not easy to understand which components to choose and how to use them effectively.

Although the given task to design and implement the Blogosphere Player seemed simple enough, it turned out to be somewhat more complicated. In the end, more than 60 classes were created, not counting inner classes. This may seem like a lot, but almost half of these classes represent graphical components such as frames and dialogs. Fortunately, the NetBeans IDE provides a very helpful Swing GUI builder, so a lot of the application code is generated automatically.

The goal of the thesis was to develop an application that parses server logs and RSS/Atom feeds, extracts temporal information, and visualizes the network of visits, postings, and comments within a given set of bloggers. The application consists of four different views for displaying network graph of connections between bloggers, timeline with posts and comments, representation of a blog and its entries by a tree graph, and tag cloud with tags that can be found in web feeds. Each view provides various useful features, e.g. zooming, dragging visual items, changing colors and shapes, highlighting, or displaying additional information.

The visualization can be animated using a time slider or played automatically at a desired speed. Several convenient filters can be used to display and hide visual items. The user can assign notes and personal tags to all blogs, posts, and comments. Each project can be saved to an external XML file and loaded later.

The Blogosphere Player can serve many purposes when it is used for evaluating a set of students’ blogs. It is easy to find out when, how many, and how long entries were posted by individual students, or how often students visited the blogs of their peers. It can help to identify influential and active bloggers, and to find out which topics were blogged about the most by examining the tags assigned to entries. If students are rewarded with better grades for active blogging, the application could help with deciding which ones to reward.

I believe that all requirements have been met. Although the application tries to provide the best possible means for visualizing blogs and the interactions amongst them, there is certainly room for improvement. I kept coming up with new ideas during all phases of application development, so it is quite possible that someone else could come up with something better.

The time spent on developing the Blogosphere Player application was certainly worthwhile. I have learned many new things, especially in the field of information visualization.

Bibliography

[1] Arlow, J., & Neustadt, I. (2005). UML 2 and the unified process: practical object-oriented analysis and design. 2nd ed. Upper Saddle River, NJ: Addison-Wesley.

[2] Berners-Lee, T., & Fischetti, M. (1999). Weaving the web, the original design and ultimate destiny of the world wide web by its inventor. New York: HarperOne.

[3] Birney, R., Barry, M., & O hEigeartaigh, M. (2006). Blogs: Enhancing the Learning Experience for Technology Students. In E. Pearson, & P. Bohman (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2006 (pp. 1042–1046). Chesapeake, VA: AACE.

[4] Blood, R. (2000). Weblogs: A History and Perspective. In Rebecca’s Pocket. Retrieved on April 25, 2012, from http://www.rebeccablood.net/essays/weblog_history.html

[5] Chang, Y.-J., & Chen, C.-H. (2007). Experiences of Adopting In-class Blogs in the Teaching of Hands-on Computer Laboratory Courses. In Proceedings of Seventh IEEE International Conference on Advanced Learning Technologies (ICALT 2007), Niigata, Japan (pp. 447–448).

[6] Chen, Y., & Wu, L. (2008). Idea and Practice for Paperless Education. Technologies for E-Learning and Digital Entertainment, 5093, 147–152. Heidelberg: Springer. doi:10.1007/978-3-540-69736-7_16

[7] Chen, C.-H., Härdle, W., & Unwin, A. (Eds.). (2008). Handbook of Data Visualization. Berlin: Springer.

[8] Creating a GUI With JFC/Swing. (2012). In The Java Tutorials. Retrieved May 5, 2012, from http://docs.oracle.com/javase/tutorial/uiswing/

[9] Curran, K., & Marshall, D. (2011). Blogs in education. Elixir Advanced Engineering Informatics, 36.

[10] Derntl, M. (2010). Revealing Student Blogging Activities Using RSS Feeds and LMS Logs. International Journal of Distance Education Technologies, 8(3), 16–30.

[11] Divitini, M., Haugalokken, O., & Morken, E. M. (2005). Blog to support learning in the field: lessons learned from a fiasco. In Proceedings of Fifth IEEE International Conference on Advanced Learning Technologies (ICALT 2005), Kaohsiung, Taiwan (pp. 219–221).

[12] Eisenstein, E. L. (1993). The Printing Revolution in Early Modern Europe. New York: Cambridge University Press.

[13] Farmer, B., Yue, A., & Brooks, C. (2007). Using blogging for higher order learning in large-cohort university teaching: A case study. ICT: Providing choices for learners and learning. Proceedings ascilite Singapore 2007.

[14] Hill, B. (2006). Blogging for Dummies. Hoboken, New Jersey: Wiley Publishing, Inc.

[15] ITU - International Telecommunication Union. (2011). The World in 2011: ICT Facts and Figures.

[16] Kim, H. N. (2008). The phenomenon of blogs and theoretical model of blog use in educational contexts. Computers & Education, 51, 1342–1352. doi:10.1016/j.compedu.2007.12.005

[17] Lenhart, A., Purcell, K., Smith, A. & Zickuhr K. (2010). Social Media and Young Adults. Pew Research Center. Retrieved on April 25, 2012, from http://www.pewinternet.org/Reports/2010/Social-Media-and-Young-Adults.aspx

[18] Mims, Ch. (2011). Google+ Marks the End of Blogging as a Means of Personal Expression. In Technology Review. Retrieved on April 25, 2012, from http://www.technologyreview.com/blog/mimssbits/26986/

[19] Morris, A. (2010). Blogging Trends: There’s Only Enough Room in the Blogosphere for the 144 Million of Us. Ignite Social Media. Retrieved on April 25, 2012, from http://www.ignitesocialmedia.com/social-media-trends/2010-blogging-trends-blog-growth-statistics/

[20] NM Incite, Nielsen/McKinsey company. (2012). Buzz in the Blogosphere: Millions more bloggers and blog readers. Retrieved on April 25, 2012, from http://www.nmincite.com/?p=6531

[21] Nottingham, M., & Sayre, R. (Eds.). (2005). The Atom Syndication Format. Retrieved May 1, 2012, from http://www.atomenabled.org/developers/syndication/atom-format-spec.php

[22] O’Reilly, T., & Battelle, J. (2009). Web Squared: Web 2.0 Five Years On. Web 2.0 Summit 2009. Retrieved on May 22, 2012, from http://www.web2summit.com/web2009/public/schedule/detail/10194

[23] PageRank. (n.d.). In Wikipedia. Retrieved on April 25, 2012, from http://en.wikipedia.org/wiki/PageRank

[24] Palfrey, J., & Gasser, U. (2008). Born Digital: Understanding the First Generation of Digital Natives. New York: Basic Books.

[25] Prefuse documentation. (2007). In prefuse.org. Retrieved May 5, 2012, from http://prefuse.org/doc/

[26] Richardson, W. (2010). Blogs, Wikis, Podcasts, and Other Powerful Web Tools for Classrooms (3rd ed). Thousand Oaks, CA: Corwin.

[27] Technorati. (2011). State of the Blogosphere 2011. Retrieved on April 25, 2012, from http://technorati.com/social-media/article/state-of-the-blogosphere-2011-introduction/

[28] Toffler, A. (1990). The Third Wave. New York: Bantam Books.

[29] Vaughan-Nichols, S. J. (2011). Before the Web: the Internet in 1991. In ZDNet. Retrieved on April 25, 2012, from http://www.zdnet.com/blog/networking/before-the-web-the-internet-in-1991/834

[30] Weller, M., Pegler, C., & Mason, R. (2005). Use of innovative technologies on an e-learning course. The Internet and Higher Education, 8(1), 61–71. doi:10.1016/j.iheduc.2004.10.001

[31] Winer, D. (2003). RSS 2.0 Specification. Retrieved May 1, 2012, from http://cyber.law.harvard.edu/rss/rss.html

[32] Wortham, J. (2007). After 10 Years of Blogs, the Future’s Brighter Than Ever. In Wired. Retrieved on April 25, 2012, from http://www.wired.com/entertainment/theweb/news/2007/12/blog_anniversary

Appendix A Class Diagrams

Figure A.1: Classes in players package – part 1.

Figure A.2: Classes in players package – part 2.

Figure A.3: Classes in blogosphere package.

Figure A.4: Classes in io package – part 1.

Figure A.5: Classes in io package – part 2.
