PROGRAMMING : Screen Scrapers

Gtk-GUIs and Web requests play along in the Perl Object Environment Winning Team Player The Perl Object Environment (POE) provides a platform for scripts to perform without any help from the ’s scheduler.This month’s application lets a Gtk-driven graphical interface interact with time-consuming web requests with- out hiccups. BY MICHAEL SCHILLI

UI-based applications are usually There is an alternative that avoids another window over the ticker precisely event driven. The program will both these drawbacks – cooperative at this point. In this case the application Gtypically have a main loop that it multitasking with POE, the Perl Object has to redraw the concealed area (this is uses to wait for events such as mouse Environment [2] centered around its known as refreshing). clicks and keyboard input. It is important main developer, Rocco Caputo. The Unfortunately, because it is busy wait- for the program to process these events environment is implemented as a ing for HTTP results to trickle in, it without any delay and quickly return to state machine that runs exactly one doesn’t receive the redraw event, and the main event loop. This prevents the process with a single , but has a this leaves a nasty gray hole on the desk- user from noticing the temporary userspace “kernel” that allows multiple top – not a pretty sight. unavailability of the interface. tasks to be performed quasi simultane- The stock price ticker program we will ously. Asynchronous Approach be looking at in this month’s article, peri- It would be more elegant to transmit odically connects to the Yahoo financial Keeping Track a Web request, and get back to re- pages to request an update for selected Developers who want to query share freshing the graphical display imme- share prices (see Figure 1). Depending prices in Perl will typically opt for the diately, without waiting for the results. on the network connection, a request CPAN module Yahoo::FinanceQuote: When a response from the Yahoo server including DNS resolution of the server finally arrives, it should cause some name may take a few seconds to com- use Finance::YahooQuote; kind of alert. That would mean quickly plete. I would like the application’s my @quote = U updating the ticker window and jumping display to keep on working during this getonequote($symbol); back into the main GUI loop. period. This is exactly what the POE frame- Developers can use multiprocessing or Unfortunately, the module works syn- work does. It provides a kernel where multithreading to achieve this aim. How- chronously, and this makes it difficult to individual applications register sessions, ever, both techniques make the program achieve the smooth scrolling effect we and state machines move between states far more complex. Critical sections need were looking for. The getonequote func- and exchange messages. I/O activity to be protected against parallel access to tion sends a HTTP request to the Yahoo is asynchronous. Instead of opening ensure data integrity, thus avoiding server, waits for a response and then a file or a socket and waiting for its errors that might be difficult to debug. If returns the results. data, you simply say “Hey Kernel, I you have ever needed to analyze a core We want to keep the display running want to read some information. Can dump with 200 active threads, you will while the program is waiting – and Mur- you wake me up when it becomes avail- know what I mean. phy’s law states that someone will drag able?”

68 May 2004 www.linux-magazine.com Perl: Screen Scrapers PROGRAMMING

Data in the Fast Lane PoCoCli::HTTP is a so-called com- companies we are interested in, for Although read and write operations are ponent of the POE framework: a state example YHOO,MSFT,TWX. not actually asynchronous (under the machine that defines its own session The Yahoo server responds as follows: hood, POE simply uses the non-blocking (named useragent in Listing 2, line 73), syswrite or sysread functions), any avail- accepts Web requests in its request "YHOO",45.38,+0.35 able information is drawn in state, and then enters the "MSFT",27.56,+0.19 or pushed out at top speed. POE framework until it "TWX",18.21,+0.75 The cooperative aspect of receives a full HTTP response. POE is the fact that the ses- At this point, useragent asks Gtkticker accepts the response line by sions rely on competing the kernel to tell the calling line, uses the commas to split the strings, sessions not to dilly-dally. If a session, ticker, to enter a state and bundles the information off to the task does not totally pre- called yhoo_response that was GUI display. occupy the CPU, the session passed to useragent previ- has to hand control back to ously. Home Directory the kernel. A single uncoop- The kernel tells the ticker Configuration erative part in a program Figure 1:The GTK- session to do exactly that, and Lines 13 and 14 specify .gtkicker in the would impact the whole sys- based share price the session accepts the HTTP user’s home directory as the symbol file tem. ticker periodically response, which is waiting for to be displayed by the ticker. Lines 30 Multitasking with a single contacts the Yahoo the session, refreshes the through 37 parse the file line by line, dis- thread facilitates program financial page. share price widgets in the dis- carding any lines that start with # as development – you don’t play, before handing control comments (line 33). The implicit for need to worry about locks, there are no back to the kernel without any delay. In loop at the end of line 35 surprises with race conditions, and even line 69, the POE::Component::Client:: if an error occurs, it is typically easy to HTTP component launches with ... for /(\S+)/g; locate. POE will cooperate with the main spawn() and specifies that gtkticker/0.01 event loops of several graphical environ- should appear in the server-side User executes the expression to its left for all ments. POE recognizes both Perl/ and Agent string, and that requests should the words in a line, and leaves the stock gtkperl automatically and integrates time out after 60 seconds. exchange symbol in the $_ variable. This them seamlessly. allows multiple symbols separated by This allows the kernel to assign GUI Yahoo Manual space characters to exist in a single line. events time slots just like the explicitly Lines 10 to 12 in Listing 2 define the URL The push function stacks the stock sym- defined sessions. This is the answer to for Yahoos share price service. The CGI bols in the @SYMBOLS array. Listing 1 the refresh problem. interface of this service expects two shows a sample file. parameters: Despite using the POE framework, gtk- Alerting Kernel •A format parameter (f=) with the ticker employs regular synchronous I/O The ticker in Listing 2 uses the POE::Ses- names of requested fields: s (symbol), functions to read the configuration file, sion state machine as shown in Figure 2. l1 (share price) and c1 (percentage as the file is short and the POE kernel is The initialization phase, _start, generates change since the last stock exchange not running at this point. the GTK interface and sets the alias working day) and name to ticker to allow us to more easily •a symbol parameter that contains a Let the Dance Begin identify the session later. Control is comma-separated list of the stock Line 39 defines the ticker state machine. returned to the kernel following this ini- exchange abbreviations for the listed The inline_states parameter uses a hash tial state. The state machine enters the wake_up state every 60 seconds (via an Returns immediately alert), or when someone clicks the yhoo_response Update button in the GUI. This launches Stock Widgets another POE::Component::Client::HTTP refreshed Yahoo reply ‘Update’ or alert (after available wake_up? type state machine, and then immedi- 60 seconds) Kernel event sent to ately hands control back to the kernel. Yahoo Listing 1: Configuration file

01 # ~/.gtkticker GTK interface initialized Returns immediately _start 02 TWX 03 MSFT 04 YHOO AMZN RHAT Figure 2:The gtkticker state machine. After the initialization state, _start,control is handed back to the 05 DODGX kernel.The machine enters the wake_up state every 60 seconds, wakes up another state machine to 06 JNJ COKE IBM SUN perform an HTTP request and hands control back to the kernel.

www.linux-magazine.com May 2004 69 PROGRAMMING Perl: Screen Scrapers

reference to map functions to the states. in which the program remains until it is actually been replaced by Gtk2, but the The kernel will jump to these functions if shut down. That’s it! new version has some issues with POE. and when the machine enters the corre- The POE::Session object construction Never mind, the venerable GTK module sponding state. Line 53 then pushes the shown previously had a side effect. It ran does a fantastic job. wake_up state for the ticker session to the start() routine defined in line 58, An object of the Gtk::Window class the kernel which maps to the _start state. The latter represents the main window of the sets the alias name for the session to application. It has a typical menu $poe_kernel->post("ticker", U ticker and then jumps to my_gtk_init(). bar at the top, giving access to a "wake_up"); This function, which starts in line 98, File pull-down menus, which in turn constructs the GTK GUI. has a single entry for Quit. This via the variable $poe_kernel exported by entry uses a callback routine, POE. Line 55 launches the main kernel GUIs with GTK Gtk->exit(0), to terminate the applica- loop Gtk is a CPAN module by Marc tion. There is a Gtk::AccelGroup object Lehmann, who was so kind as to check that allows the user to press [Ctrl]+ $poe_kernel->run(); the draft for this article. The module has [Q] to quit the program. The object adds

Listing 2: Gtkticker 001 #!/usr/bin/perl 042 _stop => sub { 083 HTTP::Request->new( 002 ############################# 043 INFO "Shutdown" }, 084 GET => $YHOO_URL . 003 # gtkticker 044 yhoo_response => 085 join ",", @SYMBOLS); 004 # Mike Schilli, 2004 045 \&resp_handler, 086 005 # ([email protected]) 046 wake_up => 087 $STATUS->set( 006 ############################# 047 \&wake_up_handler, 088 "Fetching quotes"); 007 use warnings; 048 } 089 008 use strict; 049 ); 090 $poe_kernel->post( 009 050 091 'useragent', 010 my $YHOO_URL = 051 my $STATUS; 092 'request', 011 "http://quote.yahoo.com/d?". 052 093 'yhoo_response', 012 "f=sl1c1&s="; 053 $poe_kernel->post( 094 $request); 013 my $RCFILE = 054 "ticker", "wake_up"); 095 } 014 "$ENV{HOME}/.gtkticker"; 055 $poe_kernel->run(); 096 015 my @LABELS = (); 056 097 ############################# 016 my $UPD_INTERVAL = 60; 057 ############################# 098 sub my_gtk_init { 017 my @SYMBOLS; 058 sub start { 099 ############################# 018 059 ############################# 100 019 use Gtk; 060 101 my $w = Gtk::Window->new(); 020 use POE qw( 061 DEBUG "Starting up"; 102 $w->set_default_size( 021 Component::Client::HTTP); 062 103 150,200); 022 use HTTP::Request; 063 $poe_kernel->alias_set( 104 023 use Log::Log4perl qw(:easy); 064 'ticker'); 105 # Create Menu 024 use Data::Dumper; 065 my_gtk_init(); 106 my $accel = 025 066 107 Gtk::AccelGroup->new(); 026 Log::Log4perl->easy_init( 067 $STATUS->set("Startup"); 108 $accel->attach($w); 027 $DEBUG); 068 109 my $factory = 028 069 POE::Component::Client::HTTP 110 Gtk::ItemFactory->new( 029 # Read config file 070 ->spawn( 111 'Gtk::MenuBar', 030 open FILE, "<$RCFILE" or 071 Agent => 112 "

", $accel); 031 die "Cannot open $RCFILE"; 072 'gtkticker/0.01', 113 032 while() { 073 Alias => 'useragent', 114 $factory->create_items( 033 next if /^\s*#/; 074 Timeout => 60, 115 { path => '/_File', 034 push @SYMBOLS, $_ 075 ); 116 type => '', 035 for /(\S+)/g; 076 } 117 }, 036 } 077 118 { path => 037 close FILE; 078 ############################# 119 '/_File/_Quit', 038 079 sub upd_quotes { 120 accelerator => 039 POE::Session->create( 080 ############################# 121 'Q', 040 inline_states => { 081 122 callback => 041 _start => \&start, 082 my $request = 123 [sub { Gtk->exit(0) }],

70 May 2004 www.linux-magazine.com Perl: Screen Scrapers PROGRAMMING

so-called accelerators to the mouse acter, allowing the user to press a key- method places the elements from the top controls. board shortcut (such as [Alt]+[F]) to down, while pack_end() stacks widgets navigate the menu. The callback para- bottom up. We can see that the following Factory-Made Menus meter specifies the function that Gtk will statement: The menu is created by using the perform whenever the user selects the Gtk::ItemFactory class which is first used entry with the mouse, or presses the key- $vb->pack_start($menu_bar, U to create a Gtk::MenuBar menu bar. The board shortcut defined by the accelerator $expand, $fill, $padding); menu entries and their subordinate pull- parameter. downs are created using the create_ places the menu bar at the top of items() method. Layout Manager the VBox. In line 132 gtkticker uses The path parameter specifies the posi- Two different approaches are used to $factory->get_widget(‘

‘) to tion of the menu item – for example, arrange widgets geometrically: Gtk:: retrieve the bar’s object by name. The /_File/_Quit defines the Quit entry below VBox and Gtk::Table. The Gtk::VBox con- $expand parameter used in pack_start() File in the menu bar. The underscores _ tainer element aligns the widgets it specifies whether the area that the wid- tell Gtk to underline the following char- contains vertically. The pack_start() get occupies should grow if the user

Listing 2: Gtkticker 124 }); 165 $row, $row+1); 206 $resp->content()) { 125 166 } 207 126 my $vb = Gtk::VBox->new( 167 } 208 my($symbol, $price, 127 0,0); 168 209 $change) = 128 my $upd = Gtk::Button->new( 169 $w->add($vb); 210 split /,/, $_; 129 'Update'); 170 211 130 171 # Destroying window 212 chop $change; 131 $vb->pack_start( 172 $w->signal_connect( 213 $change = "" if 132 $factory->get_widget( 173 'destroy', sub { 214 $change =~ /^0/; 133 '

'), 0, 0, 0); 174 Gtk->exit(0)}); 215 134 175 216 $symbol =~ s/"//g; 135 # Button at bottom 176 # Pressing update button 217 $LABELS[$count][0]-> 136 $vb->pack_end($upd, 177 $upd->signal_connect( 218 set($symbol); 137 0, 0, 0); 178 'clicked', sub { 219 $LABELS[$count][1]-> 138 179 DEBUG "Sending wakeup"; 220 set($price); 139 # Status line on top 180 $poe_kernel->post( 221 $LABELS[$count][2]-> 140 # of buttons 181 'ticker', 'wake_up')} 222 set($change); 141 $STATUS= Gtk::Label->new(); 182 ); 223 $count++; 142 $STATUS->set_alignment( 183 $w->show_all(); 224 } 143 0.5, 0.5); 184 } 225 144 $vb->pack_end($STATUS, 185 226 $STATUS->set(""); 145 0, 0, 0); 186 ############################# 227 146 my $table = 187 sub resp_handler { 228 1; 147 Gtk::Table->new( 188 ############################# 229 } 148 scalar @SYMBOLS, 3); 189 my ($req, $resp) = 230 149 190 map { $_->[0] } 231 ############################# 150 $vb->pack_start($table, 191 @_[ARG0, ARG1]; 232 sub wake_up_handler { 151 1, 1, 0); 192 233 ############################# 152 193 if($resp->is_error()) { 234 DEBUG("waking up"); 153 for my $row (0.. 194 ERROR $resp->message(); 235 154 @SYMBOLS-1) { 195 $STATUS->set( 236 # Initiate update 155 for my $col (0..2) { 196 $resp->message()); 237 upd_quotes(); 156 my $label = 197 return 1; 238 157 Gtk::Label->new(); 198 } 239 # Re-enable timer 158 $label->set_alignment( 199 240 $poe_kernel->delay( 159 0.0, 0.5); 200 DEBUG "Response: ", 241 'wake_up', $UPD_INTERVAL); 160 push @{$LABELS[$row]}, 201 $resp->content(); 242 } 161 $label; 202 162 203 my $count = 0; 163 $table->attach_defaults( 204 164 $label, $col, $col+1, 205 for(split /\n/,

www.linux-magazine.com May 2004 71 PROGRAMMING Perl: Screen Scrapers

drags the mouse to expand the main the subroutine tidies up the Gtk session onward, the ticker will automatically window. If so, fill specifies whether and quits the program. After completing update its share prices every 60 seconds, the widget itself should also grow – the widget definitions, the show method without the user having to press the this will allow buttons to grow to (line 183) displays them in the main Update button. enormous proportions. Finally, $padding window (line 183) of the screen. specifies the minimum number of Installation pixels that should separate the widget The Kernel Strikes Back It is best to use a CPAN shell to install vertically from its neighbors. Gtkticker In the yhoo_response state, the POE ker- the required POE modules, POE and displays status messages in an un- nel jumps to the function listed below POE::Component::Client::HTTP. If the obtrusive Gtk::Label widget directly line 187, resp_handler. By definition, POE::Component::Client::DNS module is above the Update button. The set_align- POE::Component::Client::HTTP will store also installed, DNS requests will then be ment() method uses the following a request and a response packet in ARG0 performed asynchronously; gethostby- syntax and ARG1 when this happens. POE uses name() may otherwise cause a slight this slightly strange approach to passing delay. $STATUS->set_alignmentU parameters after introducing new func- Installing Gtk from CPAN means (0.5, 0.5); tions representing numerical constants, resolving a few dependencies and such as KERNEL, HEAP, ARG0, ARG1. caused a few issues on my own system. to align the text horizontally and verti- POE’s authors expect programmers to However, cally. If you want to experiment, a use them to index the array of function horizontal value of 0.0 stands for left- parameters, @_. For example, $_[KER- touch ./Gtk/build/U justified, and 1.0 for right-justified text. NEL] will always return the kernel perl-gtk-ref.pod In contrast to Gtk::VBox, the Gtk:: object, helping to keep the index that perl Makefile.PL U Table container element provides Perl KERNEL points to transparent. --without-guessing programmers with a tool to conveniently The request and response packets just arrange widgets in a table. The attach_ mentioned are references to arrays, in the distribution directory, followed defaults() method expects five para- whose first elements contain HTTP:: by make install worked around these meters: the widget, which is to be Request or HTTP::Response objects. The obstacles. aligned and two column and row coordi- map command in line 190 extracts them The script logs debugging messages to nates between which the widget will to $req and $resp. STDOUT, if this gets in your way, simply be located. For example, the following If a HTTP error occurs, line 195 gener- set the verbosity via Log::Log4perl (also statement: ates an appropriate message in the status from CPAN) and in line 27 from $DEBUG widget and the function returns. Other- to $ERROR. It is fascinating to see how $table->attach_defaultsU wise, the two dimensional global array of smooth the display is. Even if you are ($label, 0, 1, 1, 2); label widgets is refreshed. The widgets fooling around with the menu while the display the stock exchange symbol, the application is performing an automatic specifies that the Gtk::Label object that current price, and the change as a per- update across a slow network, the GUI $label points to will be added to the first centage (the special case zero percent is stays intact. Expensive-looking, but row (“between 0 and 1”) and the second simply ignored). cheaply implemented! ■ column (“between 1 and 2”) of a table called $table. Periodically Delayed Alert INFO A wake_up event in the POE kernel calls [1] Listings for this article: And Action! the wake_up_handler() routine defined http://www.linux-magazine.com/ You can assign actions to Gtk::Button in line 232 and the following lines. It Magazine/Downloads/42/Perl/ type widgets. Gtk performs the action calls the upd_quotes() function, which is [2] POE:http://poe.perl.org assigned when a user presses the button. implemented in line 79 and following. [3] Jeffrey Goff,“A Beginner’s Introduction to The method called in line 177, The function defines a HTTP::Request POE”,2001:http://www.perl.com/pub/a/ signal_connect(), specifies that Gtk object and uses an event to send it to the 2001/01/poe.html should send a wake_up event to the POE POE::Component::Client::HTTP compo- [4] Matt Sergeant,“Programming POE”,talk kernel when the user clicks the Update nent. The target state for the ticker is set at TPC 2002:http://axkit.org/docs/ button. to yhoo_response. presentations/tpc2002 The main window also has an action After completing these preparatory [5] Gtkperl:http://gtkperl.org assigned – users can click the X in the steps, wake_up_handler() uses the top right-hand corner of the window to kernel’s delay() method to set an alert. [6] Gtkperl tutorial: http://personal.riverusers.com/ terminate the application. As in the fol- This will cause a wake_up event to occur ~swilhelm/gtkperl-tutorial/ lowing code: in the ticker session when the num- ber of seconds defined as the [7] Eric Harlow,“Developing Linux Applications with GTK+ and GDK”: $w->signal_connect('destroy', $UPD_INTERVAL (60 seconds in this New Riders,1999,ISBN 0735700214 sub {Gtk->exit(0)}); case) has elapsed. From this point

72 May 2004 www.linux-magazine.com