Efficient Construction of Metadata-Enhanced Web Corpora

Total Page:16

File Type:pdf, Size:1020Kb

Efficient Construction of Metadata-Enhanced Web Corpora Efficient construction of metadata-enhanced web corpora Adrien Barbaresi To cite this version: Adrien Barbaresi. Efficient construction of metadata-enhanced web corpora. 10th Web as Corpus Workshop, Association for Computational Linguistics (ACL SIGWAC), Aug 2016, Berlin, Germany. pp.7-16, 10.18653/v1/W16-2602. hal-01371704v2 HAL Id: hal-01371704 https://hal.archives-ouvertes.fr/hal-01371704v2 Submitted on 18 Oct 2016 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License Efficient construction of metadata-enhanced web corpora Adrien Barbaresi Austrian Academy of Sciences – Berlin-Brandenburg Academy of Sciences [email protected] Abstract evolutions. However, it is quite rare to find ready- made resources. Specific issues include first the Metadata extraction is known to be a discovery of relevant web documents, and second problem in general-purpose Web corpora, the extraction of text and metadata, e.g. because and so is extensive crawling with little of exotic markup and text genres (Schafer¨ et al., yield. The contributions of this paper are 2013). Nonetheless, proper extraction is neces- threefold: a method to find and download sary for the corpora to be established as scientific large numbers of WordPress pages; a tar- objects, as science needs an agreed scheme for geted extraction of content featuring much identifying and registering research data (Samp- needed metadata; and an analysis of the son, 2000). Web corpus yield is another recurrent documents in the corpus with insights of problem (Suchomel and Pomikalek,´ 2012; Schafer¨ actual blog uses. et al., 2014). The shift from web as corpus to web for corpus – mostly due to an expanding Web The study focuses on a publishing soft- universe and the need for better text quality (Vers- ware (WordPress), which allows for reli- ley and Panchenko, 2012) – as well as the limited able extraction of structural elements such resources of research institutions make extensive as metadata, posts, and comments. The downloads costly and prompt for handy solutions download of about 9 million documents in (Barbaresi, 2015). the course of two experiments leads after 1 processing to 2.7 billion tokens with us- The DWDS lexicography project at the Berlin- able metadata. This comparatively high Brandenburg Academy of Sciences already fea- yield is a step towards more efficiency tures a good coverage of specific written text gen- with respect to machine power and “Hi-Fi” res (Geyken, 2007). Further experiments includ- web corpora. ing internet-based text genres are currently con- ducted in joint work with the Austrian Academy The resulting corpus complies with formal of Sciences (Academy Corpora). The common ab- requirements on metadata-enhanced cor- sence of metadata known to the philological tradi- pora and on weblogs considered as a series tion such as authorship and publication date ac- of dated entries. However, existing typolo- counts for a certain defiance regarding Web re- gies on Web texts have to be revised in the sources, as linguistic evidence cannot be cited or light of this hybrid genre. identified properly in the sense of the tradition. Thus, missing or erroneous metadata in “one size 1 Introduction fits all” web corpora may undermine the relevance of web texts for linguistic purposes and in the hu- 1.1 Context manities in general. Additionally, nearly all ex- This article introduces work on focused web cor- isting text extraction and classification techniques pus construction with linguistic research in mind. have been developed in the field of information The purpose of focused web corpora is to comple- retrieval, that is not with linguistic objectives in ment existing collections, as they allow for better mind. coverage of specific written text types and genres, user-generated content, as well as latest language 1Digital Dictionary of German, http://zwei.dwds.de The contributions of this paper are threefold: This means that a comprehensive crawl could lead (1) a method to find and download large amounts to better yields. of WordPress pages; Regarding the classification of blogs, Blood (2) a targeted extraction of content featuring much (2002) distinguishes three basic types: filters, per- needed metadata; sonal journals, and notebooks, while Krishna- (3) an analysis of the documents in the corpus with murthy (2002) builds a typology based on func- insights of actual uses of the blog genre. tion and intention of the blogs: online diaries, sup- My study focuses on a publishing software with port group, enhanced column, collaborative con- two experiments, first on the official platform tent creation. More comprehensive typologies es- wordpress.com and second on the .at-domain. tablished on one hand several genres: online jour- WordPress is used by about a quarter of the web- nal, self-declared expert, news filter, writer/artist, sites worldwide2, the software has become so spam/advertisement; and on the other hand dis- broadly used that its current deployments can be tinctive “variations”: collaborative writing, com- expected to differ from the original ones. A ments from readers, means of publishing (Glance number of 158,719 blogs in German have pre- et al., 2004). viously been found on wordpress.com (Barbaresi and Wurzner,¨ 2014). The .at-domain (Austria) 2 Related work is in quantitative terms the 32th top-level domain 2.1 (Meta-)Data Extraction with about 3,7 million hosts reported.3 Data extraction has first been based on “wrappers” 1.2 Definitional and typological criteria (nowadays: “scrapers”) which were mostly rely- From the beginning of research on blogs/weblogs, ing on manual design and tended to be brittle and the main definitional criterion has always been hard to maintain (Crescenzi et al., 2001). These their form, a “reverse chronological sequences of extraction procedures have also been used early dated entries” (Kumar et al., 2003). Another for- on by blogs search engines (Glance et al., 2004). mal criterion is the use of dedicated software to ar- Since the genre of “web diaries” was established ticulate and publish the entries, a “weblog publish- before the blogs in Japan, there have been attempts ing software tool” (Glance et al., 2004), “public- to target not only blog software but also regular domain blog software” (Kumar et al., 2003), or pages (Nanno et al., 2004), in which the extraction Content Management System (CMS). These tools of metadata also allows for a distinction based on largely impact the way blogs are created and run. heuristics. 1996 seems to be the acknowledged beginning of Efforts were made to generate wrappers auto- the blog/weblog genre, with an exponential in- matically, with emphasis on three different ap- crease of their use starting in 1999 with the emer- proaches (Guo et al., 2010): wrapper induction gence of several user-friendly publishing tools (e.g. by building a grammar to parse a web page), (Kumar et al., 2003; Herring et al., 2004). sequence labeling (e.g. labeled examples or a Whether a blog is to be considered to be a web schema of data in the page), and statistical anal- page in its whole (Glance et al., 2004) or a website ysis and series of resulting heuristics. This analy- containing a series of dated entries, or posts, (Ke- sis combined to the inspection of DOM tree char- hoe and Gee, 2012) being each a web page, there acteristics (Wang et al., 2009; Guo et al., 2010) are invariant elements, such as “a persistent side- is a common ground to the information retrieval bar containing profile information” as well as links and web corpus linguistics communities, with the to other blogs (Kumar et al., 2003), or blogroll. categorization of HTML elements and linguistic For that matter, blogs are intricately intertwined in features (Ziegler and Skubacz, 2007) for the for- what has been called the blogosphere: “The cross- mer, and markup and boilerplate removal opera- linking that takes place between blogs, through tions known to the latter community (Schafer¨ and blogrolls, explicit linking, trackbacks, and refer- Bildhauer, 2013). rals has helped create a strong sense of community Regarding content-based wrappers for blogs in in the weblogging world.” (Glance et al., 2004). particular, targets include the title of the entry, the date, the author, the content, the number of com- 2http://w3techs.com/technologies/details/cm- wordpress/all/all ments, the archived link, and the trackback link 3http://ftp.isc.org/www/survey/reports/2016/01/bynum.txt (Glance et al., 2004); they can also aim at com- ments specifically (Mishne and Glance, 2006). with the “freshly pressed” page on WordPress as well as a series of trending blogs used as seed 2.2 Blog corpus construction for the crawls, leading to 222,245 blog posts and The first and foremost issue in blog corpus con- 2,253,855 comments from Blogger and WordPress struction still holds true today: “there is no com- combined, totaling about 95 million tokens (for prehensive directory of weblogs, although several the posts) and 86 million tokens (for the com- small directories exist” (Glance et al., 2004). Pre- ments). vious work established several modes of construc- The YACIS Corpus (Ptaszynski et al., 2012) is tion, from broad, opportunistic approaches, to the a Japanese corpus consisting of blogs collected focus on a particular method or platform due to the from a single blog platform, which features mostly convenience of retrieval processes.
Recommended publications
  • Blogs and Websites Basics
    P a g e | 1 Blogs and Websites Basics ABOUT THIS CLASS This class is designed to introduce you to the basic concepts and skills required for starting and maintaining a basic website in the form of a blog. Course Objectives By the end of this course, you will know: The difference between a blog and a website. The difference between hosted and non-hosted sites. What a blogging platform is and how to choose one for your site. How to choose a template (your blog’s appearance). How to post and add pages to your blog. How to avoid making common blogging/website mistakes. Best practices and essential things to know to make your blog/site as successful as possible. This booklet will serve as a guide as we progress through the class, but it can also be a valuable tool when you are working on your own. Any class instruction is only as effective as the time and effort you are willing to invest in it. I encourage you to practice soon after class. There will be additional computer classes in the near future, and I am usually available for questions during Tech Times on Tuesdays (10am-noon) and Thursdays (3pm-5pm). Feel free to call to verify that Tech Time is happening on the day that you’re planning to come. Meg Wempe, Adult Services Librarian P a g e | 2 Blogs vs. Websites What’s the difference between a blog and website? Blog: A blog (a combination of the term web log) is a discussion or informational site published on the World Wide Web and consisting of discrete entries ("posts") typically displayed in reverse chronological order (the most recent post appears first).
    [Show full text]
  • N8ek Kf N?8Kêj L
    :FM<IJKFIP Trackbacks in Drupal :fe]`^li`e^KiXZbYXZbj`e;ilgXc C<8M@E> 8o\cK\`Z_dXee#=fkfc`X KI8:BJ Trackbacks offer a simple means for bloggers to connect and share information. BY JAMES STANGER trackback is a way for a blogger With trackbacks, two seemingly unre- Several content management systems to automatically notify different lated conversations become more (CMSs) include trackback options. In N8EKKFBEFN 8blogs that he or she has either strongly associated. Each time an update Drupal [1], if you’ve enabled trackbacks, begun or extended a conversation with occurs in the conversation, the context a blogger on your system just has to another blogger. A trackback is one of becomes stronger and richer. Search en- enter the URL of a remote blogger who three main types of linkbacks (see the gines often rank pages higher if they are supports trackbacks, and the blogger N?8KÊJLGE<OK6 “Trackbacks and Linkbacks” box) that linked from other sites. Trackbacks thus will be notified. In this article, I describe bloggers use to keep track of each oth- promote higher ratings and perhaps how to set up trackbacks in Drupal with er’s postings and ensure that their read- more exposure for a project or product. examples based on the implementation ers can link to related content. Once a website has trackbacks enabled, one blogger can reach out to another on a separate site by sending a “ping” to that user. The ping simply says, “Here’s a topic that is related to what you’ve JL9J:I@9<KFC@ELO posted, check it out.” If a blogger on a separate site wants to D8>8Q@E<GI<M@<N# respond, the conversation between the two bloggers becomes stronger.
    [Show full text]
  • S.Tramp Et Al. / an Architecture of a Distributed Semantic Social Network
    Semantic Web 1 (2012) 1–16 1 IOS Press An Architecture of a Distributed Semantic Social Network Sebastian Tramp a;∗, Philipp Frischmuth a, Timofey Ermilov a, Saeedeh Shekarpour a and Sören Auer a a Universität Leipzig, Institut für Informatik, AKSW, Postfach 100920, D-04009 Leipzig, Germany E-mail: {lastname}@informatik.uni-leipzig.de Abstract. Online social networking has become one of the most popular services on the Web. However, current social networks are like walled gardens in which users do not have full control over their data, are bound to specific usage terms of the social network operator and suffer from a lock-in effect due to the lack of interoperability and standards compliance between social networks. In this paper we propose an architecture for an open, distributed social network, which is built solely on Semantic Web standards and emerging best practices. Our architecture combines vocabularies and protocols such as WebID, FOAF, Se- mantic Pingback and PubSubHubbub into a coherent distributed semantic social network, which is capable to provide all crucial functionalities known from centralized social networks. We present our reference implementation, which utilizes the OntoWiki application framework and take this framework as the basis for an extensive evaluation. Our results show that a distributed social network is feasible, while it also avoids the limitations of centralized solutions. Keywords: Semantic Social Web, Architecture, Evaluation, Distributed Social Networks, WebID, Semantic Pingback 1. Introduction platform or information will diverge. Since there are only a few large social networking players, the Web Online social networking has become one of the also loses its distributed nature.
    [Show full text]
  • Using RSS to Spread Your Blog
    10_588486 ch04.qxd 3/4/05 11:33 AM Page 67 Chapter 4 Using RSS to Spread Your Blog In This Chapter ᮣ Understanding just what a blog is ᮣ Creating the blog and the feed ᮣ Using your RSS reader ᮣ Creating a blog using HTML ᮣ Maintaining your blog ᮣ Publicizing your blog with RSS hat’s a blog, after all? Blog, short for Web log, is just a Web site with Wa series of dated entries, with the most recent entry on top. Those entries can contain any type of content you want. Blogs typically include links to other sites and online articles with the blogger’s reactions and com- ments, but many blogs simply contain the blogger’s own ramblings. Unquestionably, the popularity of blogging has fueled the expansion of RSS feeds. According to Technorati, a company that offers a search engine for blogs in addition to market research services, about one-third of blogs have RSS feeds. Some people who maintain blogs are publishing an RSS feed with- out even knowing about it, because some of the blog Web sites automatically create feeds (more about how this happens shortly). If you think that blogs are only personal affairs, remember that Microsoft has hundreds of them, and businesses are using them more and more to keep employees, colleagues, and customersCOPYRIGHTED up to date. MATERIAL In this chapter, I give a quick overview of blogging and how to use RSS with your blog to gain more readers. If you want to start a blog, this chapter explains where to go next.
    [Show full text]
  • Writingmatrix Connecting Students with Blogs, Tags, and Social
    March 20082008 Volume 11, Number 4 Top Writingmatrix: Connecting Students with Blogs, Tags, and Social Networking Vance Stevens Petroleum Institute, Abu Dhabi, United Arab Emirates <vancestev gmail.com> Nelba Quintana Escuela de Lenguas, National University of La Plata, La Plata, Argentina <nelbaq gmail.com> Rita Zeinstejer Asociación Rosarina de Cultura Inglesa, Rosario, Argentina <rita zeinstejer.com> Saša Sirk Tehniški šolski center Nova Gorica, Nova Gorica, Slovenia <sasa rthand.com> Doris Molero Universidad Dr. Rafel Belloso Chacín, Maracaibo, Venezuela <dorismolero yahoo.com> Carla Arena Casa Thomas Jefferson BiNational Center, Brasilia-DF, Brazil <carlaarena gmail.com> Abstract This paper describes an extensive online project, Writingmatrix [http://writingmatrix.wikispaces.com], involving several key elements essential to collaboration in Web 2.0, such as aggregation, tagging, and social networking. Participant teachers in several different countries--Argentina, Venezuela, and Slovenia--had their adult students at various levels of English competency interact using blogs and other Web tools, including RSS feed readers and Technorati [http://technorati.com]. The teachers describe their respective settings, how they got their students started in communicating through blogging and social networking, and how in future they intend to expand the Writingmatrix experience. Introduction TESL-EJ 11.4, March 2008 Stevens, et al. 1 A working understanding of aggregation, tagging, and RSS (Really Simple Syndication) is key to collaboration as well as to filtering and regulating the flow of information resources online. Tags allow people to organize the information available through their distributed networks in ways that are meaningful to them, and social networking enables nodes in these networks to interact with each other according to how these tags and other folksonomic (for example, socially intertwined and personally meaningful) data overlap.
    [Show full text]
  • Guida Alla Creazione Di Articoli Con Wordpress
    Livia G. Garzanti Dispense di WordPress Guida alla creazione di articoli con WordPress Un sito Web prevede normalmente una pagina di news o articoli. Con l’editor degli articoli di WordPress è abbastanza semplice e intuitivo creare e modificare le notizie, formattarne il testo, aggiungere link, inserirvi immagini, … Scrivere un nuovo articolo Una volta effettuato l’accesso al back-end del sito nella bacheca c’è la voce Articoli; posizionandovi il mouse sopra compare il sottomenu relativo, con le voci riportate nella figura seguente: Figura 1 – Il sotomenu della gestione degli articoli Per creare un nuovo articolo fare clic su Aggiungi articolo; si aprirà la finestra di composizione dell’articolo: Figura 2 – L’editor di creazione degli articoli di WordPress 1 Livia G. Garzanti Dispense di WordPress Nella casella di testo in alto va inserito il titolo (come suggerisce la scritta in essa visualizzata), nella zona più ampia sotto va composto il corpo dell’articolo, inserendo i blocchi1 che servono. Le Impostazioni documento Nella barra laterale a destra dell’editor dei blocchi, oltre alla scheda Blocco c’è la scheda Documento, cioè quell’area che raccoglie tutti i comandi per le impostazioni relative a caratteristiche dell’intero articolo di WordPress: • Stato e visibilità • Revisioni • Permalink • Categorie • Tag • Immagine in evidenza • Riassunto • Discussione NOTA: Alcune impostazioni sono uguali a quelle delle pagine, altre sono solo per gli articoli. Stato e visibilità Figura 3 – Le impostazioni di stato e visibilità In modo del tutto uguale all’impostazione delle pagine, in questa sezione si imposta la visibilità dell’articolo, che può essere di tre tipi: Pubblico, Privato, Protetto da password e vi troviamo le informazioni sulla data di pubblicazione dell’articolo e il nome (Autore) dell’utente di WordPress che l’ha creato; c’è anche un 1 Per le spiegazioni sull’editor a blocchi di WordPress si rimanda alla dispensa sulla creazione e gestione delle pagine.
    [Show full text]
  • Using RSS and Weblogs for E-Learning: an Overview
    May 19 2003 Strategies and Techniques for Designers, Developers, and Managers of eLearning THIS WEEK — DEVELOPMENT STRATEGIES New technologies are Using RSS and Weblogs for continually coming e-Learning: An Overview to the fore, and many of them have applica- BY BILL BRANDON tions to e-Learning. impler is better. There are a lot of “needs” in e-Learning, Weblogs and RSS and there’s often a limit to the time, talent, and money (an XML standard that can be thrown at them individually. For example, S for distribution of take facilitating communication across project teams during information) are design and development, or between instructors and learners making substantial when they are separated by time and space, or between learn- progress in 2003. ers while building a community. What if there was a way to use This overview will “public” infrastructure for this, reducing the cost of infrastruc- give you a taste of ture and software? Or what if there was a low-cost way to pub- what is being done lish learning objects for use by other developers, or to acquire with these by early learning resources for use in your own projects, or to make adopters. You may them available for direct use by learners? find some ideas that These challenges have at least two relat- two developments, and to some of the ed potential solutions that have been mak- more appropriate ways they are being will meet your needs ing rapid strides in the first half of 2003: applied to e-Learning. to do things more weblogs (or “blogs”) and RSS (an abbrevia- tion with more than one meaning, as you A fast overview simply and more will soon see).
    [Show full text]
  • Integrating with Blogs
    CHAPTER 5 Integrating with Blogs Blogs (also known as weblogs) have become lightweight, general-purpose platforms for publication, self-expression, and collaboration. Bloggers push the limits of new-media production, especially in the area of integration, because they want ultimately to discuss anything they can see or think or hear—without any effort, of course. Because you can directly tie blogs in with other systems—often without any programming on your own part—you’ll now study how to combine blogs with other applications and data sources. In this chapter, I cover end-user functionality that lets you publish content to a blog from a web site or a desktop application. In Chapter 7, you’ll study how you can program the relevant web APIs to read and publish blog content. I close this chapter by applying lessons from blog integration to wikis, which I believe are ripe for a similar type of remixing. In this chapter, you will do the following: * You’ll learn how to configure your WordPress or Blogger blog to receive pictures from Flickr through Flickr’s Blog This button. * You’ll study the mechanisms behind blog integration by studying how it’s done with Flickr. * You’ll learn how to use a desktop blogging client to take advantage of a richer writing environment for blogging. * You’ll see how the combination of syndication feeds and blogging can be recursive (that is, how content from blogs can be refashioned into new blog entries). * You’ll experience the forward-looking social browser integration of Flock, which combines a Web browser, Flickr photos, and blogging all in one user interface.
    [Show full text]
  • Link Rel Alternate Type Application Rss Xml Cafy
    Link Rel Alternate Type Application Rss Xml Tubby Ender rentes no quester objurgated conditionally after Eddy shoring discretionally, quite Ethiop. Telesthetic Maxwell crossband:legitimatises spherelike no vapourishness and colorfast ceding Tann sedately traipsing after quite Lyle wooingly shut-offs but creatively, mispunctuates quite loverless. her pipettes Lightish iridescently. Forrester still Specialized tool to rel alternate type rss element combined with a section which points to use an xml file, this post is a link element Kept private and that this link alternate type application rss feed button that the point. Readers to web and alternate type application rss xml file you could use rather than one of a blog. Charles morin contributed to the link alternate type application with the text. Element that follow the link alternate application rss feed use this channel provides a feed button on this link to view menu near the possibility to check the name. Most popular namespace element that the link rel alternate application rss readers automatically using the site you choose, resources are both xhtml have adopted the category of text. Class for reading and alternate type application with the view menu near the rss feed use rss feed autodiscovery by posting a year old and extracting the other media. Article is not rel alternate application rss feed in making it clear to an item to select from. Bar to be a link alternate type application rss xml file to represent the point of pingback server push are updated automatically using our blog has a feed. Believe that provides rel alternate type application rss xml file you are to your page.
    [Show full text]
  • Elie Bursztein, Baptiste Gourdin, John Mitchell Stanford University & LSV-ENS Cachan
    Talkback: Reclaiming the Blogsphere Elie Bursztein, Baptiste Gourdin, John Mitchell Stanford University & LSV-ENS Cachan 1 What is a blog ? • A Blog ("Web log") is a site, usually maintained by an individual with • Regular entries • Commentary • LinkBack • Entries displayed in reverse-chronological order. http://elie.im/blog Elie Bursztein, Baptiste Gourdin, John Mitchell TalkBack: reclaiming the blogosphere from spammer http://ly.tl/p21 Key Statistics • 184 Millions blogs • 73% of users read blogs • 50% post comments universalmccann Elie Bursztein, Baptiste Gourdin, John Mitchell TalkBack: reclaiming the blogosphere from spammer http://ly.tl/p21 Anatomy of a blog post Elie Bursztein, Baptiste Gourdin, John Mitchell TalkBack: reclaiming the blogosphere from spammer http://ly.tl/p21 Why blogs are special ? User Elie Bursztein, Baptiste Gourdin, John Mitchell TalkBack: reclaiming the blogosphere from spammer http://ly.tl/p21 Why blogs are special ? User Elie Bursztein, Baptiste Gourdin, John Mitchell TalkBack: reclaiming the blogosphere from spammer http://ly.tl/p21 What is a TrackBack ? Elie Bursztein, Baptiste Gourdin, John Mitchell TalkBack: reclaiming the blogosphere from spammer http://ly.tl/p21 Trackback Illustrated Little Timmy said to me... "What's Trackback, Daddy?" "Wow! Jimmy Lightning has written the best 1. post ever! It's so funny! And it's true! That's "Best Post Ever" why it's so good. I need to tell the world!" "Check it out world! I've "Jimmy written all about Jimmy 2. Lightning is Lightning's post on my Elie Bursztein, Baptiste Gourdin, John Mitchell swell"TalkBack: reclaiming the blogosphere from spammerweblog. My weblog's http://ly.tl/p21 called 'The Unbloggable Blogness of Blogging'.
    [Show full text]
  • Blogging Glossary
    Blogging Glossary Glossary: General Terms Atom: A popular feed format developed as an alternative to RSS. Autocasting: Automated form of podcasting that allows bloggers and blog readers to generate audio versions of text blogs from RSS feeds. Audioblog: A blog where the posts consist mainly of voice recordings sent by mobile phone, sometimes with some short text message added for metadata purposes. (cf. podcasting) Blawg: A law blog. Bleg: An entry in a blog requesting information or contributions. Blog Carnival: A blog article that contains links to other articles covering a specific topic. Most blog carnivals are hosted by a rotating list of frequent contributors to the carnival, and serve to both generate new posts by contributors and highlight new bloggers posting matter in that subject area. Blog client: (weblog client) is software to manage (post, edit) blogs from operating system with no need to launch a web browser. A typical blog client has an editor, a spell-checker and a few more options that simplify content creation and editing. Blog publishing service: A software which is used to create the blog. Some of the most popular are WordPress, Blogger, TypePad, Movable Type and Joomla. Blogger: Person who runs a blog. Also blogger.com, a popular blog hosting web site. Rarely: weblogger. Blogirl: A female blogger Bloggernacle: Blogs written by and for Mormons (a portmanteau of "blog" and "Tabernacle"). Generally refers to faithful Mormon bloggers and sometimes refers to a specific grouping of faithful Mormon bloggers. Bloggies: One of the most popular blog awards. Blogroll: A list of other blogs that a blogger might recommend by providing links to them (usually in a sidebar list).
    [Show full text]
  • In the Spirit of Georgeʼs Request for Feedback from The
    In the spirit of Georgeʼs request for feedback from the employees I tossed together most of the things that buzz around my head as things we could do to take advantage of the fact we online media not print media. These are all technology related ideas and suggestions that apply both to efficiency issues internally and to B2B and consumers. Syndication Syndication on the web has formalized around RSS feeds. Aggregators like Google News use this, and users use this via RSS client software or aggregators like Google Reader. RSS clients exist for almost every platform from the PC to the iPhone. Optimally every page on the site that lists content should be available as an RSS feed including search results. For instance, I should be able to subscribe to an RSS feed for the “Europe” regional portal page, or the Terrorism Weekly, or the search results of the term “Putin”. Aggregation The flip side of pervasive syndication via RSS on the Web is the ability for users to specify feeds from other sites they would like to look at in what place or for sites like ours to pull content or links to content via RSS that is in context to our content. For instance, titles to recent NYTIMES articles on Europe on our Europe regional portal page. Links and References Weʼve been very good internally at taking advantage of our existing content and linking to previous articles in context within a new article. What we donʼt do so much is link to outside sources of information within our articles or provide links to relevent information as a footnote to articles.
    [Show full text]