Winning Team Player
PROGRAMMING Perl: Screen Scrapers

Gtk-GUIs and Web requests play along in the Perl Object Environment

The Perl Object Environment (POE) provides a platform for scripts to perform cooperative multitasking without any help from the operating system’s scheduler. This month’s application lets a Gtk-driven graphical interface interact with time-consuming web requests without hiccups.

BY MICHAEL SCHILLI

May 2004, www.linux-magazine.com

GUI-based applications are usually event driven. The program will typically have a main loop that it uses to wait for events such as mouse clicks and keyboard input. It is important for the program to process these events without any delay and quickly return to the main event loop. This prevents the user from noticing the temporary unavailability of the interface.

The stock price ticker program we will be looking at in this month’s article periodically connects to the Yahoo financial pages to request an update for selected share prices (see Figure 1). Depending on the network connection, a request including DNS resolution of the server name may take a few seconds to complete. I would like the application’s display to keep on working during this period.

Figure 1: The GTK-based share price ticker periodically contacts the Yahoo financial page.

Developers can use multiprocessing or multithreading to achieve this aim. However, both techniques make the program far more complex. Critical sections need to be protected against parallel access to ensure data integrity, thus avoiding errors that might be difficult to debug. If you have ever needed to analyze a core dump with 200 active threads, you will know what I mean.

There is an alternative that avoids both these drawbacks – cooperative multitasking with POE, the Perl Object Environment [2], a project centered around its main developer, Rocco Caputo. The environment is implemented as a state machine that runs exactly one process with a single thread, but has a userspace “kernel” that allows multiple tasks to be performed quasi-simultaneously.

Keeping Track

Developers who want to query share prices in Perl will typically opt for the CPAN module Finance::YahooQuote:

    use Finance::YahooQuote;
    my @quote = getonequote($symbol);

Unfortunately, the module works synchronously, and this makes it difficult to achieve the smooth scrolling effect we were looking for. The getonequote function sends an HTTP request to the Yahoo server, waits for a response and then returns the results.

We want to keep the display running while the program is waiting – and Murphy’s law states that someone will drag another window over the ticker precisely at this point. In this case the application has to redraw the concealed area (this is known as refreshing). Unfortunately, because it is busy waiting for HTTP results to trickle in, it doesn’t receive the redraw event, and this leaves a nasty gray hole on the desktop – not a pretty sight.

Asynchronous Approach

It would be more elegant to transmit a Web request and get back to refreshing the graphical display immediately, without waiting for the results. When a response from the Yahoo server finally arrives, it should cause some kind of alert. That would mean quickly updating the ticker window and jumping back into the main GUI loop.

This is exactly what the POE framework does. It provides a kernel where individual applications register sessions, and state machines move between states and exchange messages. I/O activity is asynchronous. Instead of opening a file or a socket and waiting for its data, you simply say “Hey Kernel, I want to read some information. Can you wake me up when it becomes available?”

Data in the Fast Lane

Although read and write operations are not actually asynchronous (under the hood, POE simply uses the non-blocking syswrite or sysread functions), any available information is drawn in or pushed out at top speed.

The cooperative aspect of POE is the fact that the sessions rely on competing sessions not to dilly-dally. Whenever a task does not fully occupy the CPU, the session has to hand control back to the kernel. A single uncooperative part in a program would impact the whole system.

Multitasking with a single thread facilitates program development – you don’t need to worry about locks, there are no surprises with race conditions, and even if an error occurs, it is typically easy to locate.

POE will cooperate with the main event loops of several graphical environments. POE recognizes both Perl/Tk and gtkperl automatically and integrates them seamlessly. This allows the kernel to assign GUI events time slots just like the explicitly defined sessions. This is the answer to the refresh problem.

Alerting Kernel

The ticker in Listing 2 uses the POE::Session state machine as shown in Figure 2. The initialization phase, _start, generates the GTK interface and sets the alias name to ticker so that the session is easier to identify later. Control is returned to the kernel following this initial state. The state machine enters the wake_up state every 60 seconds (via an alert), or when someone clicks the Update button in the GUI. This launches another POE::Component::Client::HTTP type state machine, and then immediately hands control back to the kernel.

Figure 2: The gtkticker state machine. After the initialization state, _start, control is handed back to the kernel. The machine enters the wake_up state every 60 seconds, wakes up another state machine to perform an HTTP request and hands control back to the kernel.

PoCoCli::HTTP (short for POE::Component::Client::HTTP) is a so-called component of the POE framework: a state machine that defines its own session (named useragent in Listing 2, line 73), accepts Web requests in its request state, and then enters the POE framework until it receives a full HTTP response. At this point, useragent asks the kernel to tell the calling session, ticker, to enter a state called yhoo_response that was passed to useragent previously.

The kernel tells the ticker session to do exactly that; the session accepts the waiting HTTP response and refreshes the share price widgets in the display before handing control back to the kernel without any delay. In line 69, the POE::Component::Client::HTTP component launches with spawn() and specifies that gtkticker/0.01 should appear in the server-side User Agent string, and that requests should time out after 60 seconds.

Yahoo Manual

Lines 10 to 12 in Listing 2 define the URL for Yahoo’s share price service. The CGI interface of this service expects two parameters:

• A format parameter (f=) with the names of requested fields: s (symbol), l1 (share price) and c1 (percentage change since the last stock exchange working day), and
• a symbol parameter that contains a comma-separated list of the stock exchange abbreviations for the listed companies we are interested in, for example YHOO,MSFT,TWX.

The Yahoo server responds as follows:

    "YHOO",45.38,+0.35
    "MSFT",27.56,+0.19
    "TWX",18.21,+0.75

Gtkticker accepts the response line by line, uses the commas to split the strings, and bundles the information off to the GUI display.

Home Directory Configuration

Lines 13 and 14 specify .gtkticker in the user’s home directory as the symbol file to be displayed by the ticker. Lines 30 through 37 parse the file line by line, discarding any lines that start with # as comments (line 33). The implicit for loop at the end of line 35

    ... for /(\S+)/g;

executes the expression to its left for all the words in a line, and leaves the stock exchange symbol in the $_ variable. This allows multiple symbols separated by space characters to exist in a single line. The push function stacks the stock symbols in the @SYMBOLS array. Listing 1 shows a sample file.

Listing 1: Configuration file

    # ~/.gtkticker
    TWX
    MSFT
    YHOO AMZN RHAT
    DODGX
    JNJ COKE IBM SUN

Despite using the POE framework, gtkticker employs regular synchronous I/O functions to read the configuration file, as the file is short and the POE kernel is not running at this point.

Let the Dance Begin

Line 39 defines the ticker state machine. The inline_states parameter uses a hash reference to map functions to the states. The kernel will jump to these functions if and when the machine enters the corresponding state. Line 53 then pushes the wake_up state for the ticker session to the kernel

    $poe_kernel->post("ticker", "wake_up");

... in which the program remains until it is shut down. That’s it!

The POE::Session object construction shown previously had a side effect. It ran the start() routine defined in line 58, which maps to the _start state. The latter sets the alias name for the session to ticker and then jumps to my_gtk_init(). This function, which starts in line 98, constructs the GTK GUI.

... actually been replaced by Gtk2, but the new version has some issues with POE. Never mind, the venerable GTK module does a fantastic job.

An object of the Gtk::Window class represents the main window of the application. It has a typical menu bar at the top, giving access to a File pull-down menu, which in turn