
05_597582_ch01.qxd 8/5/05 10:23 PM Page 3

Getting Ready to Hack

In this chapter:
• Taking a Crash Course in RSS and Atom Feeds
• Gathering Tools

What are RSS and Atom feeds? If you’re reading this, it’s pretty likely you’ve already seen links to feeds (things such as “Syndicate this Site” or the ubiquitous orange-and-white “RSS” buttons) starting to pop up on all of your favorite sites. In fact, you might already have secured a feed reader or aggregator and stopped visiting most of your favorite sites in person. The bookmarks in your browser have started gathering dust since you stopped clicking through them every day. And, if you’re like some feed addicts, you’re keeping track of what’s new from more Web sites and news sources than you ever have before, or even thought possible.

If you’re a voracious infovore like me and this story doesn’t sound familiar, you’re in for a treat. RSS and Atom feeds (collectively known as syndication feeds) are behind one of the biggest changes to sweep across the Web since the invention of the personal home page. These syndication feeds make it easy for machines to surf the Web, so you don’t have to.

So far, syndication feed readers won’t actually read or intelligently digest content on the Web for you, but they will let you know when there’s something new to peruse and can collect it in an inbox, like email. In fact, these feeds and their readers layer the Web with features not altogether different from email newsletters and Usenet newsgroups, but with much more control over what you receive and none of the spam.

With the time you used to spend browsing through bookmarked sites checking for updates, you can now just get straight to reading new stuff presented directly. It’s almost as though someone is publishing a newspaper tailored just for you.
From the publishing side of things, when you serve up your messages and content using syndication feeds, you make it so much easier for someone to keep track of your updates, and so much more likely that they will stay in touch: once someone has subscribed to your feed, it’s practically effortless to stay tuned in. As long as you keep pushing out things worthy of an audience’s attention, syndication feeds make it easier to slip into their busy schedules and stay there.

4 Part I — Consuming Feeds

Furthermore, the way syndication feeds slice up the Web into timely capsules of microcontent allows you to manipulate, filter, and remix streams of fluid online content in a way never seen before. With the right tools, you can work toward applications that help more cleverly digest content and sift through the firehose of information available. You can gather resources and collectively republish, acting as the editorial newsmaster of your own personal news wire. You can train learning machines to filter for items that match your interests. And the possibilities offered by syndication will only expand as new kinds of information and new types of media are carried and referenced by feed items. But that’s enough gushing about syndication feeds. Let’s get to work figuring out what these things are, under the hood, and how you can actually do some of the things promised earlier.

Taking a Crash Course in RSS and Atom Feeds

If you’re already familiar with all the basics of RSS and Atom feeds, you can skip ahead to the section “Gathering Tools” later in this chapter. But, just in case you need to be brought up to speed, this section takes a quick tour of feed consumers, feed producers, and the basics of feed anatomy.

Catching Up with Feed Readers and Aggregators

One of the easiest places to start with an introduction to syndication feeds is with feed aggregators and readers, because the most visible results of feeds start there. Though you will be building your own aggregator soon enough, having some notion of what sorts of things other working aggregators do can certainly give you some ideas. It also helps to have other aggregators around as a source of comparison once you start creating some feeds. For the most part, you’ll find feed readers fall into categories such as the following:

• Desktop newscasts, headline tickers, and screensavers
• Personalized portals
• Mixed reverse-chronological aggregators
• Three-pane aggregators

Though you’re sure to find many more shapes and forms of feed readers, these make a good starting point. Going through them, you can also see a bit of the evolution of feed aggregators from heavily commercial and centralized apps to more personal desktop tools.

Desktop Headline Tickers and Screensavers

One of the most common buzzwords heard in the mid-1990s dot-com boom was “push.” Microsoft introduced an early form of syndication feeds called Channel Definition Format (or CDF) and incorporated CDF into Internet Explorer in the form of Active Channels. These were managed from the Channel Bar, which contained selections from many commercial Web sites and online publications.

Chapter 1 — Getting Ready to Hack 5

A company named PointCast, Inc., offered a “desktop newscast” that featured headlines and news on the desktop, as well as an animated screensaver populated with news content pulled from commercial affiliates and news wires. Netscape and Marimba teamed up to offer Netcaster, which provided many features similar to PointCast and Microsoft’s offerings but used different technology to syndicate content.

These early feed readers emphasized mainly commercial content providers, although it was possible to subscribe to feeds published by independent and personal sites. Also, because these aggregators tended to present content with scrolling tickers, screensavers, and big, chunky user interfaces using lots of animation, they were only really practical for subscribing to a handful of feeds, maybe fewer than a dozen.

Feed readers of this form are still in use, albeit with less buzz and venture capital surrounding them. They’re useful for light consumption of a few feeds, in either an unobtrusive or highly branded form, often in a role more like a desktop accessory than a full-on, attention-centric application. Figure 1-1 offers an example of such an accessory from the K Desktop Environment project, named KNewsTicker.

FIGURE 1-1: KNewsTicker window

Personalized Portals

Although not quite as popular or common as they used to be, personalized portals were one of the top buzzworthy topics competing for interest with “push” technology back before the turn of the century. In the midst of the dot-com days, Excite, Lycos, Netscape, Microsoft, and Yahoo! were all players in the portal industry, and a Texas-based fish-processing company named Zapata even turned itself into an Internet startup, buying up a swath of Web sites to get into the game.

The idea was to pull together as many useful services and as much attractive content as possible into one place, which Web surfers would ideally use as their home page. This resulted in modular Web pages, with users able to pick and choose from a catalog of little components containing, among other things, headline links syndicated from other Web sites.

One of the more interesting contenders in this space was the My Netscape portal offered by, of course, Netscape. My Netscape was one of the first services to offer support for RSS feeds in their first incarnations. In fact, the original specification defining the RSS format in XML was drafted by team members at Netscape and hosted on their corporate Web servers.

Portals, with their aggregated content modules, are more information-dense than desktop tickers or screensavers. Headlines and resources are offered more directly, with less branding and presentation than with the previous “push” technology applications. So, with less window dressing to get in the way, users can manageably pull together even more information sources into one spot.

The big portals aren’t what they used to be, though, and even My Netscape has all but backed away from being a feed aggregator. However, feed aggregation and portal-like features can still be found on many popular community sites, assimilated as peripheral features. For example, the nerd news site Slashdot offers “slashbox” modules in a personalizable sidebar, many or most drawn from syndication feeds (see Figure 1-2).

FIGURE 1-2: Slashdot.org slashboxes

Other Open Source Web community packages, such as Drupal (http://www.drupal.org) and Plone (http://www.plone.org), offer feed headline modules similar to those of the classic portals. But although you could build and host a portal-esque site just for yourself and friends, this form of feed aggregation still largely appears on either niche and special-interest community sites or commercial sites aiming to capture surfers’ home page preferences for marketing dollars.

In contrast, however, the next steps in the progression of syndication feed aggregator technology led to some markedly more personal tools.

Mixed Reverse-Chronological Aggregators

Wow, that’s a mouthful, isn’t it? “Mixed reverse-chronological aggregators.” It’s hard to come up with a more concise description, though. Maybe referring to these as “blog-like” would be better. These aggregators are among the first to treat syndication feeds as fluid streams of content, subject to mixing and reordering. The result, by design, is something not altogether unlike a modern blog. Content items are presented in order from newest to oldest, one after the other, all flowed into the same page regardless of their original sources. And, just as important, these aggregators are personal aggregators.

Radio UserLand from UserLand Software was one of the first of this form of aggregator (see Figure 1-3). Radio was built as a fully capable Web application server, yet it’s intended to be installed on a user’s personal machine. Radio allows the user to manage his or her own preferences and list of feed subscriptions, to be served up to a Web browser of choice from its own private Web server (see Figure 1-4).

FIGURE 1-3: The Radio UserLand server status window running on Mac OS X

FIGURE 1-4: The Radio UserLand news aggregator in a Firefox browser

The Radio UserLand application stays running in the background, and about once an hour it fetches and processes each subscribed feed from its Web site. New feed items that Radio hasn’t seen before are stored away in its internal database. The next time the news aggregation page is viewed or refreshed, the newest found items appear in reverse-chronological order, with the freshest items first on the page.

So for the first time, with this breed of aggregator, the whole thing lives on your own computer. There’s no centralized delivery system or marketing-supported portal; aggregators like these put all the tools into your hands, becoming a real personal tool. In particular, Radio comes not only with publishing tools to create a blog and associated RSS feeds, but a full development environment with its own scripting language and data storage, allowing the user-turned-hacker to reach into the tool to customize and extend the aggregator and its workings.

After its first few public releases, Radio UserLand was quickly followed by a slew of inspired clones and variants, such as AmphetaDesk (http://www.disobey.com/amphetadesk/), but they all shared advances that brought the machinery of feed aggregation to the personal desktop.

And, finally, this form of feed aggregator was even more information-dense than the desktop newscasters or portals that came before. Rather than presenting things with entertaining but time-consuming animation, or constrained to a mosaic of on-page headline modules, the mixed reverse-chronological display of feed items could scale to build a Web page as long as you could handle and would keep you constantly up to date with the latest feed items. So, the number of subscribed feeds you could handle was limited only by how large a page your browser could load and your ability to skim, scan, and read it.
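The polling cycle described above for Radio UserLand can be sketched in a few lines of Python. This is only an illustration of the idea, not Radio’s actual implementation: the item dictionaries, their keys (id, title, published), and the merge_new_items function are hypothetical names, and a plain set stands in for the aggregator’s internal database. It assumes a current Python 3 interpreter.

```python
from datetime import datetime, timezone


def merge_new_items(seen_ids, feeds):
    """Return unseen items from all feeds, newest first, and remember their ids.

    `feeds` maps a feed name to a list of item dicts with the hypothetical
    keys 'id', 'title', and 'published' (a datetime). `seen_ids` is a set
    standing in for the aggregator's internal database of known items.
    """
    fresh = []
    for feed_name, items in feeds.items():
        for item in items:
            if item["id"] in seen_ids:
                continue  # already stored during an earlier poll
            seen_ids.add(item["id"])
            fresh.append({**item, "feed": feed_name})
    # Reverse-chronological order: freshest items first, whatever their source.
    fresh.sort(key=lambda item: item["published"], reverse=True)
    return fresh


if __name__ == "__main__":
    seen = set()
    subscriptions = {
        "blog-a": [{"id": "a1", "title": "First post",
                    "published": datetime(2005, 1, 1, tzinfo=timezone.utc)}],
        "blog-b": [{"id": "b1", "title": "Another post",
                    "published": datetime(2005, 1, 2, tzinfo=timezone.utc)}],
    }
    for item in merge_new_items(seen, subscriptions):
        print(item["published"].date(), item["feed"], item["title"])
```

Calling merge_new_items a second time with the same feeds returns an empty list, because every item id is already in the seen set. That property is what lets an hourly poll surface only what is new.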
Three-Pane Aggregators

This family of feed aggregators builds upon what I consider to be one of the chief advances of Radio UserLand and friends: feeds treated as fluid streams of items, subject to mixing, reordering, and many other manipulations. With the bonds of rigid headline collections broken, content items could now be treated like related but individual messages.

But, whereas Radio UserLand’s aggregator recast feed items in a form akin to a blog, other offerings began to look at feed items more like email messages or Usenet postings. So, the next popular form of aggregator takes all the feed fetching and scanning machinery and uses the familiar user interface conventions of mail and newsgroup applications. Figure 1-5, Figure 1-6, Figure 1-7, and Figure 1-8 show some examples.

In this style of aggregator, one window pane displays subscriptions, another lists items for a selected subscription (or group of subscriptions), and the third pane presents the content of a selected feed item. Just like the mail and news readers that inspired them, these aggregators present feed items in a user interface that treats feeds as analogous to newsgroups, mailboxes, or folders. Extending this metaphor further, many of these aggregators have cloned or translated many of the message-management features of email or Usenet clients, such as filtering, searching, archiving, and even republishing items to a blog as analogous to forwarding email messages or crossposting on Usenet.

FIGURE 1-5: NetNewsWire on Mac OS X

FIGURE 1-6: Straw desktop news aggregator for GNOME under Linux

FIGURE 1-7: FeedDemon for Windows

Aggregators from the Future

As the value of feed aggregation becomes apparent to more developers and tinkerers, you’ll see an even greater diversity of variations and experiments with how to gather and present feed items. You can already find Web-based aggregators styled after Web email services, other applications with a mix of aggregation styles, and still more experimenting with novel ways of organizing and presenting feed items (see Figure 1-9 and Figure 1-10).

In addition, the content and structure of feeds are changing, encompassing more forms of content such as MP3 audio and calendar events. For these new kinds of content, different handling and new presentation techniques and features are needed. For example, displaying MP3 files in reverse-chronological order doesn’t make sense, but queuing them up into a playlist for a portable music player does. Also, importing calendar events into planner software and a PDA makes more sense than displaying them as an email inbox (see Figure 1-11).

FIGURE 1-8: Mozilla Thunderbird displaying feed subscriptions

FIGURE 1-9: Bloglines offers three-pane aggregation in the browser.

FIGURE 1-10: Newsmap displays items in an alternative UI called a treemap.

FIGURE 1-11: iPodder downloads podcast audio from feeds.

The trend for feed aggregators is to continue to become even more personal, with more machine smarts and access from mobile devices. Also in the works are aggregators that take the form of intermediaries and routers, aggregating from one set of sources for the consumption of other aggregators: feeds go in, feeds come back out. Far removed from the top-heavy centralized models of managed desktop newscasts and portal marketing, feeds and aggregators are being used to build a layer of plumbing on top of the existing Web, through which content and information filter and flow into personal inboxes and news tools.

Checking Out Feed Publishing Tools

There aren’t as many feed publishing tools as there are tools that happen to publish feeds. For the most part, syndication feeds have been the product of an add-on, plug-in, or template used within an existing content management system (CMS). These systems (which range from multimillion-dollar enterprise CMS packages to personal blogging tools) can generate syndication feeds from current content and articles right alongside the human-readable Web pages listing the latest headlines.

However, as the popularity and usage of syndication feeds have increased, more feed-producing tools have come about. For example, not all Web sites publish syndication feeds. So, some tinkerers have come up with scripts and applications that “scrape” existing pages intended for people, extract titles and content from those pages, and republish that information in the form of machine-readable syndication feeds, thus allowing even sites lacking feeds to be pulled into your personal subscriptions.

Also, as some people live more of their time online through aggregators, they’ve found it useful to pull even more sources of information beyond the usual Web content into feeds. System administrators can keep tabs on server event logs by converting them into private syndication feeds.
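As a rough sketch of the server-log idea just mentioned, the following Python 3 code turns a list of timestamped log messages into an RSS 2.0 document using only the standard library. The function name log_to_rss, the channel description text, and the example.com URL are assumptions made for illustration; the element names (rss, channel, item, title, link, description, pubDate) are those of RSS 2.0, which is covered later in this chapter.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def log_to_rss(entries, title, link):
    """Build an RSS 2.0 document (as a string) from (datetime, message) pairs.

    The datetimes must be timezone-aware and in UTC, because the dates are
    written in the "... GMT" style that RSS 2.0 feeds conventionally use.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    # Placeholder description text, chosen only for this illustration.
    ET.SubElement(channel, "description").text = "Server events as a private feed"
    for when, message in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = message
        ET.SubElement(item, "pubDate").text = format_datetime(when, usegmt=True)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    events = [
        (datetime(2005, 1, 1, 9, 39, 21, tzinfo=timezone.utc), "disk usage above 90%"),
        (datetime(2005, 1, 2, 12, 23, 1, tzinfo=timezone.utc), "nightly backup finished"),
    ]
    print(log_to_rss(events, "Server log", "http://example.com/logs/"))
```

A real tool would read the events from an actual log file and serve the result somewhere private; both steps are left out here to keep the feed-building part visible.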
Most shipping companies now offer online package tracking, so why not turn those updates into feeds? If there are topics you’re interested in, and you often find yourself repeating the same keywords on search engines, you could convert those searches and their results into feeds and maintain a continually updating feed of search results. And, although it might not be the brightest idea if things aren’t completely secure, some tinkerers have filtered their online banking account statements into private feeds so that they stay up to date with current transactions.

Another form of feed publishing tool is more of a filter than a publisher. This sort of tool reads a feed, changes it, and spits out a new feed. This could involve changing formats from RSS to Atom or vice versa. The filter could insert advertisements into feed entries, not unlike inline ads on Web pages. Or, rather than ads, a filter could compare feed entries against other feeds and automatically include some recommendations or related links. Filters can also separate out categories or topics of content into more tightly focused feeds.

Unfortunately, feed publishing tools are really more like plumbing, so it’s hard to come up with many visual examples or screenshots that don’t look like the pipes under your sink. However, these tools are a very important part of the syndication feed story, as you’ll see in future chapters.

Glancing at RSS and Atom Feeds

So, what makes an RSS or Atom feed? First off, both are dialects of XML. You’ve probably heard of XML, but just in case you need a refresher, XML stands for Extensible Markup Language. XML isn’t so much a format itself; it’s a framework for making formats.

For many kinds of data, XML does the same sort of thing Internet protocols do for networking. On the Internet, the same basic hardware such as routers and hubs enable a wide range of applications such as the Web, email, and Voice-over-IP. In a similar way, XML enables a wide range of data to be managed and manipulated by a common set of tools. Rather than reinvent the wheel every time you must deal with some form of data, XML establishes some useful common structures and rules on top of which you can build.

If you have any experience building Web pages with HTML, XML should look familiar to you because they both share a common ancestry in the Standard Generalized Markup Language (SGML). If anything, XML is a cleaner, simpler version of what SGML offers. So, because both RSS and Atom are built on XML technology, you can use the same tools to deal with each. Furthermore, because RSS and Atom both describe very similar sets of data structures, you’ll be able to use very similar techniques and programming for both types of feeds.

It’s easier to show than tell, so take a quick look at a couple of feeds, both containing pretty much the same data. First, check out the sample RSS 2.0 feed in Listing 1-1.

Listing 1-1: Example RSS 2.0 Feed

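<rss version="2.0">
  <channel>
    <title>Testing Blog</title>
    <link>http://example.com/blog/</link>
    <description>This is a testing blog!</description>
    <webMaster>[email protected]</webMaster>
    <item>
      <title>Test #1</title>
      <link>http://example.com/blog/2005/01/01/foo.html</link>
      <pubDate>Sat, 01 Jan 2005 09:39:21 GMT</pubDate>
      <guid isPermaLink="false">tag:example.com,2005-01-01:example.001</guid>
      <description>This is an example blog posting. &lt;a href="http://www.example.com/foobarbaz.html"&gt;Foo Bar Baz&lt;/a&gt;.</description>
    </item>
    <item>
      <title>Test #2</title>
      <link>http://example.com/blog/2005/01/02/bar.html</link>
      <pubDate>Sun, 02 Jan 2005 12:23:01 GMT</pubDate>
      <guid isPermaLink="false">tag:example.com,2005-01-01:example.002</guid>
      <description>This is another example blog posting.</description>
    </item>
  </channel>
</rss>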

The anatomy of this feed is pretty basic:

• <rss> opens the document and identifies the XML data as an RSS feed.
• <channel> begins the meat of the feed. Although I’ll continue to refer to this generically as the feed, the RSS specification refers to its contents as a “channel.” This terminology goes back to the origins of RSS in the days of portal sites.
• <title> contains the title of this feed, “Testing Blog.”
• <link> contains the URL pointing back to the human-readable Web page with which this feed is associated.
• <description> contains some human-readable text describing the feed.
• <webMaster> provides the contact email of the person responsible for the channel.
• Next come the <item> tags. Again, here’s a terminology shift. I’ll refer to these as feed entries, while the official RSS terminology is “channel item” (same idea, different terms, but I’ll try to stay consistent). Each <item> tag contains a number of child elements:
    ■ <title> contains the title of this feed entry.
    ■ <link> contains the URL pointing to a human-readable Web page associated with this feed entry.
    ■ <pubDate> is the publication date for this entry.
    ■ <guid> provides a globally unique identifier (GUID). The isPermaLink attribute, set to "false" here, denotes that this GUID is not, in fact, a URL pointing to the “permanent” location of this feed entry’s human-readable alternate. Although this feed doesn’t do it, in some cases, the <guid> tag can do double duty, providing both a unique identifier and a link in lieu of the <link> tag.
    ■ <description> contains a bit of text describing the feed entry, often a synopsis of the Web page to which the <link> URL refers.
• Finally, after the last <item> tag, the <channel> and <rss> tags are closed, ending the feed document.

If it helps to understand these entries, consider some parallels to email messages, described in Table 1-1.

Table 1-1: Comparison of RSS Feed Elements to Email Messages

Email message    Feed
Date:            <rss> ➪ <channel> ➪ <item> ➪ <pubDate>
To:              None in the feed; a feed is analogous to a blind CC to all subscribers, like a mailing list.
From:            <rss> ➪ <channel> ➪ <webMaster>
Subject:         <rss> ➪ <channel> ➪ <item> ➪ <title>
Message body     <rss> ➪ <channel> ➪ <item> ➪ <description>

In email, you have headers that provide information such as the receiving address, the sender’s address, a subject line, and the date when the message was received. Now, in feeds, there’s not usually a “To” line, because feeds are, in effect, CC’ed to everyone in the world, but you can see the parallels to the other elements of email. The entry title is like an email subject, the publication date is like email’s received date, and all of the feed’s introductory data is like the “From” line and other headers in an email message.

Now, look at the same information in Listing 1-2, conveyed as an Atom 0.3 feed.

Listing 1-2: Example Atom 0.3 Feed

<?xml version="1.0" encoding="utf-8"?>
<feed version="0.3" xmlns="http://purl.org/atom/ns#">
  <title>Testing Blog</title>
  <link rel="alternate" type="text/html" href="http://example.com/blog/" />
  <tagline>This is a testing blog!</tagline>
  <modified>2005-01-13T12:21:01Z</modified>
  <author>
    <name>John Doe</name>
    <email>[email protected]</email>
  </author>
  <entry>
    <title>Test #1</title>
    <link rel="alternate" type="text/html" href="http://example.com/blog/2005/01/01/foo.html" />
    <issued>2005-01-01T09:39:21Z</issued>
    <modified>2005-01-01T09:39:21Z</modified>
    <id>tag:example.com,2005-01-01:example.001</id>
    <summary type="text/html" mode="escaped">
      This is an example blog posting. &lt;a href="http://www.example.com/foobarbaz.html"&gt;Foo Bar Baz&lt;/a&gt;.
    </summary>
  </entry>
  <entry>
    <title>Test #2</title>
    <link rel="alternate" type="text/html" href="http://example.com/blog/2005/01/02/bar.html" />
    <issued>2005-01-02T12:23:01Z</issued>
    <modified>2005-01-02T12:23:01Z</modified>
    <id>tag:example.com,2005-01-01:example.002</id>
    <summary type="text/html" mode="escaped">
      This is another example blog posting.
    </summary>
  </entry>
</feed>

As you can see, other than the naming of tags used in this Atom feed and some small changes in structure, just about all of the information is the same as in the RSS feed:

• <feed> opens the Atom feed, as compared to <rss> and <channel> in RSS.
• <title> contains the title of this feed, “Testing Blog.”
• <link> has an attribute named href that contains the URL pointing back to the human-readable Web page with which this feed is associated. Atom differs from RSS here in that it specifies a more verbose linking style, including the content type (type) and relational purpose (rel) of the link along with the URL.
• <tagline> contains some human-readable text describing the feed.
• <author> provides the contact information of the person responsible for the channel. Again, Atom calls for further elaboration of this information:
    ■ <name> contains the name of the feed’s author.
    ■ <email> contains the email address of the feed’s author.
• In Atom, the feed entries are contained in <entry> tags, analogous to RSS <item> tags. Their contents are also close to RSS:
    ■ <title> contains the title of this feed entry.
    ■ <link> points to a human-readable Web page associated with this feed entry. And, just like the feed-level <link> tag, the entry’s <link> is more verbose than that of RSS.
    ■ <issued> and <modified> specify the date (in ISO-8601 format) when this entry was first issued and when it was last modified, respectively. The <pubDate> tag in RSS is most analogous to Atom’s <issued>, but sometimes <pubDate> is used to indicate the entry’s latest publishing date, regardless of any previous revisions published.
    ■ <id> provides a GUID. Unlike <guid> in RSS, the <id> tag in Atom is never treated as a permalink to a Web page.
    ■ <summary> contains a description of the feed entry, often a synopsis of the Web page to which the <link> URL refers.
• Finally, after the last <entry> tag, the <feed> tag is closed, ending the feed document.
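Because the two formats carry such similar data, one small routine can read either. The following is a minimal sketch, assuming a current Python 3 interpreter and its standard xml.etree.ElementTree module rather than the tools installed later in this chapter. The function name feed_entries and the two abbreviated sample documents are made up for illustration; the element names and the Atom 0.3 namespace are the ones described above.

```python
import xml.etree.ElementTree as ET

# Atom 0.3 elements live in this namespace; RSS 2.0 elements have none.
ATOM03 = "{http://purl.org/atom/ns#}"

RSS_SAMPLE = """<rss version="2.0"><channel><title>Testing Blog</title>
<item><title>Test #1</title>
<link>http://example.com/blog/2005/01/01/foo.html</link>
<pubDate>Sat, 01 Jan 2005 09:39:21 GMT</pubDate></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed version="0.3" xmlns="http://purl.org/atom/ns#">
<title>Testing Blog</title>
<entry><title>Test #1</title>
<link rel="alternate" type="text/html"
      href="http://example.com/blog/2005/01/01/foo.html" />
<issued>2005-01-01T09:39:21Z</issued></entry>
</feed>"""


def feed_entries(xml_text):
    """Return (title, link, date) tuples from an RSS 2.0 or Atom 0.3 document."""
    root = ET.fromstring(xml_text)
    entries = []
    if root.tag == "rss":
        # RSS: entries are <item> elements inside <channel>.
        for item in root.iterfind("channel/item"):
            entries.append((item.findtext("title"),
                            item.findtext("link"),
                            item.findtext("pubDate")))
    elif root.tag == ATOM03 + "feed":
        # Atom: entries are <entry> elements; the URL is an href attribute.
        for entry in root.iterfind(ATOM03 + "entry"):
            link = entry.find(ATOM03 + "link")
            entries.append((entry.findtext(ATOM03 + "title"),
                            link.get("href") if link is not None else None,
                            entry.findtext(ATOM03 + "issued")))
    else:
        raise ValueError(f"not an RSS 2.0 or Atom 0.3 feed: {root.tag}")
    return entries


if __name__ == "__main__":
    for sample in (RSS_SAMPLE, ATOM_SAMPLE):
        print(feed_entries(sample))
```

The dates come back as the raw strings found in each feed, so the RSS and Atom results still differ in date format; normalizing them is a separate step that this sketch leaves out.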
In general, the differences between RSS and Atom can be summed up like so:

• RSS stands for “Really Simple Syndication,” according to the RSS 2.0 specification, and this describes its aims: the format and structure are meant to remain simple and easy to use.
• Atom places more of an emphasis on a more finely detailed model of feed data, with greater attention to well-defined specifications and compliance with the specs.

The more subtle and specific differences between RSS and Atom are subject to debate (even the trivial summary presented here might be heavily disputed, and some of the less-civilized discussions online have become legendary). For practical purposes, though, this book treats RSS and Atom feed formats as mostly equivalent and highlights any differences when they come up and as they affect your tinkering. The important thing is to get you working with feeds, not debating the finer points of specifications.

Gathering Tools

Before you start digging into what you can do with RSS and Atom feeds, it would help to assemble a toolkit of some useful technologies. It also wouldn’t hurt if you could get these tools for free on the Web. With this in mind, this section briefly introduces you to Open Source packages such as the following:

• UNIX-based command shell tools
• The Python programming language
• XML and XSLT technologies

Although this chapter won’t make you an expert in any of these technologies, it should point you in the right directions to set yourself up with a decent working environment for hacking RSS and Atom feeds in the next chapters.

Finding and Using UNIX-based Tools

First off, you should get yourself a set of UNIX-based tools. Though most of the hacks you explore here can be done in many environments (for example, using the Command Prompt on Windows XP), things go more smoothly under a UNIX-based environment.
So, the examples in the following chapters assume you have these tools at your disposal.

Using Linux

If you’re a Linux user, you’re probably already familiar with command shells, as well as how to install software packages. Rather than trying to cover all the available distributions and variations of Linux, this book focuses on the Debian Linux distribution. The Advanced Packaging Tool used by this distribution makes installing and updating software packages mostly painless, so you can get up and running quickly. If you have another favorite Linux distribution, you should be able to use whatever method is required by that distribution to get tools installed and configured.

In any case, you’ll want to be sure that your Linux installation has been installed with the full set of developer packages (for example, GCC, editors, and so on). Other than that, you should be ready to continue on.

Using Mac OS X

If you’re using Mac OS X, you may not yet be familiar with the UNIX-based foundation on which OS X is built. Thanks to that foundation, though, you already have most of the tools you’ll be using. You may need to find and check a few things, however.

You’re going to be using the Terminal application a lot under OS X, so one of the first things you should do is find it and get acquainted with it. You can find the Terminal application at Applications ➪ Utilities ➪ Terminal. You might want to drag it to your Dock to be able to find it quickly in the future.

A full tour and tutorial of the UNIX-based underpinnings available to you via the Terminal application would take up a book all on its own, but this at least gives you a way to begin hacking.

Using Windows

Working under Windows to build the projects in this book is not quite as nice an experience as that found under Linux and OS X, but it is still workable.
Because you’ll be doing just about everything from the Command Prompt, you’ll want to locate it first thing. On Windows XP, you’ll find it under Start Menu ➪ Accessories ➪ Command Prompt. You may want to make a shortcut to it on your Desktop or Quick Launch bar, if you haven’t already.

You may also want to install some UNIX-based tools, if the Command Prompt proves too cumbersome. Most of the programs you build in this book will work using the Command Prompt, but occasionally an example here may not quite work in this context. A lot of options are available to get working UNIX-based tools on Windows, but my favorite is called Cygwin.

With Cygwin (http://www.cygwin.com), you get a “Linux-like environment for Windows” where you can use the sorts of command shells found on Linux, and you can run many UNIX-based tools in the Windows Command Prompt. Cygwin is a sort of compromise between Windows and UNIX, giving you much of what you need. It’s not the same as an actual Linux environment, but it’s usually close enough. Check out the documentation on the Cygwin site if you’d like to install it and try it out.

Installing the Python Programming Language

Python is an extremely useful and flexible object-oriented programming language available for just about every operating system, and it comes with a lot of power you’ll need, right out of the box.

Installing Python on Linux

Under Debian Linux, you can install Python by logging in as root and using apt:

    # apt-get install python python-dev

This should grab all the packages needed to get started with Python.

Installing Python on Mac OS X

Python is another thing that Mac OS X already provides, so you won’t need to do anything to get started. Well, actually, there is one thing you should do. For some reason, Python on OS X doesn’t come with readline support enabled, and so line editing and command history won’t work unless you install it.
You can do this by opening a Terminal and running this command:</p>
<p># python `python -c "import pimp; print pimp.__file__"` -i readline</p>
<p>This installs readline support using a Python package manager that comes with OS X. (Thanks to Bill Bumgarner for this tip at http://www.pycs.net/bbum/2004/1/21/#200401211.)</p>
<p>Installing Python on Windows</p>
<p>For Windows, you can use an installer available at the Python home page:</p>
<p>1. Visit the Python download page at http://www.python.org/download/ and click to download the Python Windows installer, labeled “Windows binary -- does not include source.”</p>
<p>2. After the download completes, double-click the installer and follow the instructions. This should leave Python installed in a directory such as C:\Python24, depending on which version you install.</p>
<p>You may want to visit the Python Windows FAQ at http://www.python.org/doc/faq/windows.html to read up on how to run Python programs and other Windows-specific issues.</p>
<p>Installing XML and XSLT Tools</p>
<p>RSS and Atom feeds are XML formats, so you should get your hands on some tools to manipulate XML. One of the most useful and most easily installed packages for dealing with XML in Python is called 4Suite, available at:</p>
<p>http://4suite.org/</p>
<p>At that URL, you’ll be able to find downloads that include a Windows installer and an archive for installation on Linux and Mac OS X. You’ll see this package mentioned again a little later, but it’s worth installing right now before you get into the thick of things.</p>
<p>Installing 4Suite on Windows</p>
<p>As of this writing, this is the URL of the latest version of the Windows installer:</p>
<p>ftp://ftp.4suite.org/pub/4Suite/4Suite-1.0a3.win32-py2.3.exe</p>
<p>Once the installer has downloaded, simply double-clicking it will get you set up.
However, if you want to be guided through the process, check out this Windows installation HOWTO:</p>
<p>http://4suite.org/docs/howto/Windows.xml</p>
<p>Installing 4Suite on Linux and Mac OS X</p>
<p>For Linux and Mac OS X, you’ll want this archive:</p>
<p>ftp://ftp.4suite.org/pub/4Suite/4Suite-1.0b1.tar.gz</p>
<p>Once downloaded, check out this UNIX installation HOWTO:</p>
<p>http://4suite.org/docs/howto/UNIX.xml</p>
<p>You can install this package with a series of commands like the following:</p>
<p>$ tar xzvf 4Suite-1.0b1.tar.gz</p>
<p>$ cd 4Suite-1.0b1</p>
<p>$ python setup.py install</p>
<p>Depending on what account you’re logged in as, that last command may need root privileges. So, you may need to log in as root or try something like this (particularly under Mac OS X):</p>
<p>$ sudo python setup.py install</p>
<p>It’s worth noting that just about every Python package used later in the book follows this same basic installation process: download the package, unpack the archive, and run setup.py as root.</p>
<p>Summary</p>
<p>After this chapter, you should have the “50,000-foot view” of syndication feeds and feed aggregation technology in terms of the sorts of tools you can find and the number of feeds you can manage. In the coming chapters, you’ll have the opportunity to build working versions of many of the things mentioned here.</p>
<p>Also, you should have the start of the working environment used in this book, with Python and XML tools at your disposal. You might want to read up on these tools, because this book won’t be spending much time explaining basic Python or XML concepts. Instead, you’ll be jumping right into writing working code, so it might help to have at least gotten past the “Hello World” stage first.</p>
<p>So, with that, continue on to Chapter 2, where you’ll be building your first simple feed aggregator!
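</p>
<p>One quick way to confirm that the environment described in this chapter is in place is to ask Python which modules it can import. The following is only a rough sketch, not something taken from the book’s own code: the helper names check_modules and format_report are made up for illustration, and it assumes that 4Suite installs under the top-level package name Ft. The readline entry is optional and is expected to be reported as missing in a plain Windows Command Prompt.</p>

```python
import sys


def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    results = {}
    for name in names:
        try:
            __import__(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


def format_report(results):
    """Build a short plain-text report, one line per module."""
    lines = ["Python %s" % sys.version.split()[0]]
    for name in sorted(results):
        if results[name]:
            status = "ok"
        else:
            status = "MISSING"
        lines.append("%-20s %s" % (name, status))
    return "\n".join(lines)


if __name__ == "__main__":
    # "Ft" is assumed here to be 4Suite's top-level package name;
    # "readline" is optional and only affects interactive line editing.
    wanted = ["xml.dom.minidom", "readline", "Ft"]
    print(format_report(check_modules(wanted)))
```

<p>Saved under a name of your choosing (say, check_env.py) and run with python check_env.py, this prints one line per module. A MISSING entry next to Ft would suggest revisiting the 4Suite installation steps above.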
</p> </div> </div> </div> </div> </div> </div> </div> </body> </html>