<<

Mojolicious Web Clients brian foy Web Clients by

Copyright 2019-2020 © brian d foy. All rights reserved.

Published by School.

ii | Table of contents

Preface vii What You Should Already Know ...... viii Some eBook Notes ...... ix Installing Mojolicious ...... x Getting Help ...... xii Acknowledgments ...... xiii Perl School ...... xiv Changes ...... xiv 1 Introduction 1 The Mojo Philosophy ...... 1 Be Nice to Servers ...... 4 How HTTP Works ...... 6 Add to the Request ...... 15 httpbin ...... 16 Summary ...... 18 2 Some Perl Features 19 Perl Program ...... 19 Declaring the Version ...... 20 Signatures ...... 22 Postfix Dereference ...... 25 Indented Here Docs ...... 25 Substitution Returns the Modified Copy ...... 26 Summary ...... 27 3 Basic Utilities 29 Working with ...... 30 Decoding JSON ...... 34 Collections ...... 45 Random Utilities ...... 50

TABLE OF CONTENTS | iii Events ...... 52 Summary ...... 55 4 The 57 Walking Through HTML or XML ...... 57 Modifying the DOM ...... 61 Summary ...... 61 5 CSS Selectors 63 Finding the thing ...... 63 Selecting the Root Element ...... 65 Selecting Any Element ...... 66 Selecting by Tag ...... 68 Selecting by Attribute ...... 69 Selecting by Attribute Value ...... 71 Selecting by Relative Position ...... 79 Summary ...... 96 6 Requests 97 The Request Process ...... 97 Adding Headers ...... 99 Authentication ...... 106 Adding Message Bodies ...... 112 Uploading a File ...... 117 Miscellaneous Settings ...... 124 Summary ...... 128 7 Handling Cookies 129 Cookie Basics ...... 130 The Cookie Jar ...... 132 Cookie Jars in the User-Agent ...... 141 Adding Cookies to Requests ...... 143 Summary ...... 146 8 Handling Responses 147 Did It Happen? ...... 147 Inspecting Headers ...... 150 Redirection ...... 152 Dealing with Content ...... 155 Caching and Conditional Requests ...... 158 Fluent Content ...... 165 iv | TABLE OF CONTENTS Progress Bars ...... 170 Reading the data yourself ...... 171 Handling Meta Tags ...... 173 Restricting Response Sizes ...... 175 Pagination ...... 176 Summary ...... 177 9 Concurrency 179 Some Misconceptions ...... 179 Non-blocking Requests ...... 182 Promises ...... 185 Summary ...... 201 10 Content Generators 203 Existing Content Generators ...... 203 Creating Your Own Generator ...... 207 Summary ...... 211 11 Command-Line Programs 213 The mojo Command ...... 213 Perl One-Liners ...... 219 Fetching a Resource ...... 221 Playing with the DOM ...... 224 Extending ojo ...... 225 Summary ...... 226 12 Conclusion 229 Uncovered Topics ...... 230 13 Resources 235

TABLE OF CONTENTS | v vi | TABLE OF CONTENTS Preface

Just Shut Up And Take My Money!

I’ve wanted to write this sort of book for 20 years. I’m quick to use web client automation to get things done with onerous web sites or to test web applications. I’ve spent a lot of time working in the guts of the HTTP protocol to watch things work, and my first presentation at my first Perl conference was about a web sniffer tool, which I hope that no one ever finds because its embarrassing now.

First there was Randal Schwartz’s Perl 4-era chat2.pl. At the time it was amazing. When I required that I could easily talk to other things. But that was a simpler time.

Then, Graham Barr’s LWP (also known as libwww-perl) was the tool everyone used. And it was amazing. I liked it and it solved many problems for me.

Then Mojolicious reinvented the , and it’s amazing. It rein- vented the web client with some advanced features that I had wished LWP had. Someday I may look back on it with the reverence I have for its pre- decessors, but something many times better would have to exist to displace it.

I had thoroughly enjoyed ’s Network Programming with Perl, but it was a comprehensive book about all sorts of network programming. I dreamed of something smaller and more focused on web clients and less

| vii about sockets.

But, there was always another book that had my attention. After I finished 6 (the language now named Raku) I wanted to work on some- thing lighter that I could deliver much faster. Instead of 450 pages I wanted to do something in 50 pages (I’m guessing I went over that). I can finally get this web client book off my to-do list.

These sort of books are fun. My Learning Perl, , and Learn- ing Perl 6 books are tutorial books where I’m limited to using only the fea- tures I’ve previously explained. Not so with this book. This isn’t a tutorial. You should try the examples but I won’t have exercises. I’ll reach for the Perl features that I tend to use when I actually program. If something’s a bit tricky I’ll probably explain it quickly and move on. Often I won’t quickly move on because I can’t help myself; don’t worry though—I employ no black magic or special cleverness.

I set the goal of releasing a preview version of this book at the 2018 Nordic Perl Workshop in September. That was about three months to pull it all to- gether. How much could I get done? Since I’m not committing it to paper distribution, I could easily fix problems later. As an eBook, I should beable to release this often. The end of this preface has version and change infor- mation.

This book is an experiment for this type of publishing. I’ll see how it turns out.

What You Should Already Know

This is a book about the web user-agent that comes with the Mojolicious web framework. You don’t need to know that much about the web and how it works; you should realize that the web exists, it has a particular protocol, and clients can fetch resources using that protocol.

If you are new to Perl this isn’t a good first book. You should know some Perl—the level of Intermediate Perl. You need to be comfortable using viii | Preface modules and working with objects, but not necessarily creating classes or modules on your own.

Web design, CSS, and other things don’t matter that much here—I’m the wrong person to give you any advice on that. This book is strictly about exchanging content and extracting data. It’s helpful if you know a little HTML or CSS, but you don’t need to know how to it do anything fancy.

Having said that, I should also point out that I completely ignore the exis- tence of JavaScript. Perl web agents don’t handle that for you. It’s a big cut-out in the topics you might expect, and I’m not going to include any workarounds for that.

Some eBook Notes

This is an eBook first and any other sort of book after that. Any designde- cision starts with the eBook presentation even if other forms suffer. Partic- ularly, a PDF version of this book is not part of that design even though it’s the form I prefer when I can’t have paper.

Will the eBook work? Can I produce more content more frequently for less money? Are people more interested when I can update it frequently (maybe twice a year)? Does that make up for the other trade-offs? eBooks and their readers weren’t designed around technical books with sig- nificant code examples. The resolution is quite low which is a problem for normal code organization when you’re accustomed to an 80 character line. I suggest that you use the largest page size that you can, but some readers make odd choices about how much of the screen they should give a single page. The reader I like magically switches to a two page layout when the page size gets large enough.

Some lines shouldn’t wrap, such as the one-liners in chapter 11, but they do and I preëmpt that. I use the ⏎ character to note that the line is continued on the next line in the book but really belongs together:

Some eBook Notes | ix % perl -Mojo -E "p('http://example.com' =>⏎ form => { robot => 'Bender'})"

In other places I’ve made code style concessions to fit in the format and the behavior of the eBook readers that I’ve tested. Your favorite reader might handle long lines just fine, but some other readers have problems. With this experiment I can figure out what works the best; let’s hope that’s at least good.

Let me know what works for you and send me screenshots of the places where your eBook reader doesn’t do what you think I meant.

Installing Mojolicious

Mojolicious is a mostly self-contained system. Start by installing the basic framework with your favorite CPAN client:

% Mojolicious % cpanm Mojolicious

Use the Perl (ppm) if you are using ActivePerl:

% ppm install Mojolicious

Check the version you installed. I’m basing this book off version 8 and the examples should work with that. If you have an earlier version, upgrade! If you have a later version some things might have changed but are probably a small adjustment:

% mojo version CORE Perl (v5.28.0, darwin) Mojolicious (8.12, Supervillain)

OPTIONAL x | Preface Cpanel::JSON::XS 4.04+ (4.04) EV 4.0+ (n/a) IO::Socket::Socks 0.64+ (n/a) IO::Socket::SSL 2.009+ (2.056) Net::DNS::Native 0.15+ (n/a) Role::Tiny 2.000001+ (2.000006)

This version is up to date, have fun!

Notice that it also lists optional modules that Mojolicious can use if you have them installed. Remember this command in case you have to report an issue; you should include this output in your report.

Check that your version of OpenSSL and install the latest IO::Socket::SSL. These ensure that you’ll be able to access servers using SSL and all of its fancy features:

% openssl version LibreSSL 2.2.7 % cpan IO::Socket::SSL

Your particular system may have specialized ways to package and manage modules, libraries, and tools. You’ll have to figure out that end on your own.

Once you’ve installed Mojolicious, try it out. There’s a ojo module designed to load from the command line; with -M it looks like “Mojo”:

% perl -Mojo -E 'say g(shift)->body'⏎ https://www.mojolicious.org

That g() does a GET request and returns the response. The body method extracts the content. To look at only the headers (a common task in working out automation tasks) use h() to do a HEAD request:

% perl -Mojo -E 'say h(shift)->headers⏎ ->to_string' https://www.mojolicious.org

If those worked out you’re ready for everything in this book.

Installing Mojolicious | xi Third-party tools

Web client programming isn’t just Perl. You’ll use interactive browsers (so, Safari, Chrome, Firefox, Opera) to figure out what’s happening so you can implement that in code. These are more important than they used to be; the world is moving toward ubiquitous HTTPS so the secure networking precludes external sniffer tools. I won’t explain these tools here but they shouldn’t be too hard to figure out:

• In Safari, turn on the developer tools in the Advanced tab of prefer- ences. You should see a Develop menu item after you do that.

• For Chrome and Firefox, look for extensions that named like “HTTP Headers”. There are many of varying quality. Try them until you find one that you like.

• The Postman tool on macOS has been helpful to me for exploring .

• httpbin.org has a simple that can show you what’s happening in HTTP transactions. There’s even a dockerized version of it. I’ll use that in many examples.

Getting Help

I don’t have special techniques for debugging Mojo code. I still use my fa- vorite debugger that’s available in almost any : print (or puts or println or …).

Running my programs with the environment variable MOJO_CLIENT_DEBUG set to some true value shows me what’s happening underneath all of my client code. That might elucidate my problem. You’ll see an example on that in chapter 1.

Pare down your situation to a small program that demonstrates the problem. Remove as many distractions as possible; often that gives me my answer. xii | Preface Not only do I solve the problem but I learn about how things work and how I probably misunderstood things. If I can’t figure it out I have all the makings of a good question.

There are many people waiting to help. Develop a good question and include the versions you are using, what you expected to happen, and how that is different than what you got.

• Mojolicious

• Mojolicious Google Group

• #mojo on IRC

• Stackoverflow

• The GitHub project

Acknowledgments

The Mojolicious core developers have been quite kind in their support of the book. Sebastian Riedel even designed the cover after he saw my kludgey at- tempt. Joel Berger reviewed early drafts and offered great advice. Marcus Ramberg was quite helpful and supportive as well. The Oslo invited and sponsored me for the 2018 Nordic Perl Workshop where I pre- sented some of the book and worked on it in the hackathon.

David Cross, the publisher of Perl School, gave me the impetus to try an eBook. He’s publishing low-cost, easy-to-update eBooks on a variety of niche subjects that traditional publishers won’t invest in. He walked me through the publishing process he developed after trying many different ways his idea could work.

Mark Anderson and Gabor Szabo provided helpful edits that led to updates.

Acknowledgments | xiii Perl School

The Perl School brand has its roots in a series of low-cost Perl training courses that Dave Cross ran in 2012. By running low-cost training at the weekend, he hoped to encourage more to keep their Perl knowledge up to date. These courses were run regularly for about a year before the idea was put on hold for a while.

Dave always knew that he would want to return to the Perl School brand at some point and late in 2017 he realised what the obvious next step was low- cost Perl books. He had already developed a pipeline for creating e-books from Markdown files so it was a short step to republishing some of his training materials as books.

The first Perl School book, Perl Taster was published at the end of 2017 (just in time for the London Perl Workshop). The second was Selenium and Perl and this is the third. Dave has plans to publish more over the coming months.

If you are interested in writing a book for the Perl School range, then please get in touch. We are @perl_school on Twitter.

Changes

April 2019

• Added an example for processing a response in chunks

• Added a few more one-liner examples

February 2020

• Minor fixes xiv | Preface January 2020 - Minor fixes

• Minor fixes

December 2019

• First release

Changes | xv xvi | Preface Chapter 1

Introduction

My story is a lot like yours, only more interesting ‘cause it involves robots.

This chapter looks a little bit at Mojolicious and its philosophy. I’ll show some simple programs to review basic HTTP principles while I introduce the framework itself.

The Mojo Philosophy

The Mojolicious framework is self-contained. It needs only core Perl and so that changes in CPAN modules have minimal impact on the basic operations. It handles just about anything that you need to do in a or client. That includes a couple of web servers to run your application, but that’s the subject of a different book.

The client, however, isn’t everything you’d need for a full-fledged interactive web browser (for instance, it’s not going to handle a graphic interface or run JavaScript) but it has most things you’d need to automate web tasks.

| 1 Mojo reinvented some wheels but those wheels came out even better. Since almost everything is under the control of the Mojolicious team, they can fix, extend, and control almost every aspect of your experience. This gives you a consistent experience across the entire framework.

However, the developers move as quickly as they want to go even if they might break things. And, they do sometimes break backward compatibility, but they tend to give fair warning and wait for a major version changes. There is a three-month deprecation period, so parts of this book might be outdated by the time you read it. When that happens I’ll try to update the book.

Mojo only guarantees that it works with supported perls. That’s the current and previous stable versions. This saves time testing and working around older versions. As I write this, that’s v5.28 and v5.30 (see official perl support policy). Mojo may work on early versions of perl but don’t rely on it. If you are worried about that, don’t update things. And, even if you aren’t worried about that, try things before updating. Come up with some system to rollback changes to a known good system. That’s just good advice no matter what you are using.

Futurama

Many of the examples and sample data in the Mojo documentation draw from the animated series Futurama. I’ll continue that here. The pithy state- ments and references probably come from that show. You don’t need to know anything about the show to understand the structure and purpose of the code, but it can’t hurt. I’ll leave it to you and your web browser to dis- cover who Bender, Fry, Leela, and the others are.

Fluent programming

The Mojo interface is an example of “fluent programming”—a type of inter- face from Eric Evans and Martin Fowler that relies on chained method calls

2 | Introduction where most methods return another object (which you call more methods on):

use Mojo::UserAgent;

# shift() outside a subroutine works on @ARGV say Mojo::UserAgent->new->get( shift )->result->body;

That’s a soup of text where it’s hard to pick out what’s going on. I like to rewrite these things to highlight major operations on their own line (and it works out better for eBooks):

say Mojo::UserAgent->new ->get( shift ) ->result ->body;

The new returns a Mojo::UserAgent object. The get returns the transaction object, which includes the request and the response. The result extracts the response object, and the body gets its content. At first you’ll struggle with the documentation to keep all this straight, but you’ll get used to it.

If something goes wrong, that code will croak. Most often you’ll make the transaction then check that it works. This is generally a good idea with any- thing that interacts with the outside world. Things fail for all sorts of reasons, many outside of your control:

use Mojo::UserAgent;

my $tx = Mojo::UserAgent->new->get( shift ); unless( $tx->result->success ) { die "There was a problem\n"; }

say $tx->result->body;

Often I’ll define a method in UNIVERSAL (a particular no-no I don’t recom- mend for production code). Every object will inherit these methods; I can

The Mojo Philosophy | 3 call the NAME method on any object to see the class name for an object:

package UNIVERSAL { sub NAME { ref $_[0] } }

use Mojo::UserAgent;

my $tx = Mojo::UserAgent->new->get( shift ); say $tx->NAME; # Mojo::Transaction::HTTP unless( $tx->result->is_success ) { die "There was a problem\n"; }

say $tx->result->NAME; # Mojo::Message::Response

In some code examples in Mojolicious, you’ll see a code reference used as a method. It’s valid Perl. The argument has the invocant as the first element so you can use that to do more localized debugging:

my $namer = sub { ref $_[0] };

...

say $tx->result->$namer(); # Mojo::Message::Response

Some future version of Perl might have reflection features that make this sort of trickery obsolete.

Be Nice to Servers

This is a book essentially about how to annoy servers with your own pro- grams. Sometimes these programs are called “bots” because they operate on their own without supervision. Sometimes they can go awry and wreak havoc with too many requests, repetitive requests, and so on. These are typ- ically different from the programs that connect to a couple of to

4 | Introduction grab particular resources then shut down.

Many of the things that you’ll likely want to create will be a force for good: automating repetitive tasks, collating information, and other things. How- ever, not everyone sees it that way. It’s their server so it’s their rules even if I don’t like them.

Some sites have Terms of Service. If you want to interact with these sites with your own program, read that agreement. Some sites might outright disallow it, but others have instructions on how to do it within their frameworks.

Don’t send several hundred requests to a server right away. Spread them out a bit by pausing slightly between requests. A couple requests a second is a rule that I like to use (and the rate that many rate-limiting APIs seem to like). You don’t want to draw too much attention to your program lest the server stops responding to you (or worse, complaining to your hosting provider).

Remember that your actions can cost the server side real money in band- width costs. You’d be surprised how many things that look like sophisticated big businesses are really some guy at his kitchen table.

/robots.txt

Servers might have a /robots.txt file that specifies some of their rules ac- cording to the Standard for Robot Exclusion. This was a more popular thing a couple decades ago and seems to have fallen out of favor. I’m guessing most bots didn’t care and servers developed much better methods to respond to unfriendly bots).

It specifies rules by the user-agent string. You should compare your user- agent identifier to the rules to see what the servers wants you toexclude. This one wants Mojo programs to ignore everything:

User-agent: Mojolicious (Perl) Disallow: /

Perhaps it tells you that parts of the website are inappropriate for any bots

Be Nice to Servers | 5 by using a * as the user-agent string:

User-agent: * Disallow: /cgi-bin/gnarly-db-query/

Maybe it tells you that everything is disallowed:

User-agent: * Disallow: /

Although many servers don’t provide these, some of the big web crawlers (such as Google) look for and respect them. Remember that when you start creating your own Mojolicious servers (but not in this book!).

I assume that you have permission to do what you are doing. If it’s something worthwhile you’re likely to want to keep doing it and not be shut out from the server. Play nice so you can keep playing.

How HTTP Works

HTTP (and HTTPS, the secure version) work on a request-response protocol. The client sends a request and the server sends a response. On top of that there are various tricks to make things more efficient but you don’t need to worry about that for this book. Or, when that’s not sufficient you’ll read more about that topic.

Most of this is defined in RFC 7230 and RFC 7231. If you want to do high- powered web client programming you should study these documents. Mojo handles the mechanics but you need to know what to tell the server.

HTTP messages

Each request and response is an HTTP “message”. There are three parts—the “start line”, the “header”, and the “message body”. Here’s an example:

6 | Introduction POST / HTTP/1.1 Host: www.example.com Content-Length: 10

name=value

The first line is the start line of the request. It gives the HTTP verb, thepath, and the HTTP version. The headers immediately follow and are line-oriented collection of names and values. The message body is set apart from the head- ers by a blank line (technically, double carriage return / newline pair).

The request triggers a response. It’s start line includes the protocol version, the HTTP status number that describes what happened, and a string version of that status. It has headers and possibly a message body:

HTTP/1.1 200 OK Content-Type: text/plain Content-Length: 6

Bender

The headers

The header contains various metadata, instructions, and other tracking things. The message body is the content. That might be HTML, JSON, or almost any- thing else. A request can send a body to the server and the server can send a body back. Or either can have an empty body.

The headers are line-oriented. There’s a header name, a colon, and the header value. Here’s a short program to print the request header then the response header. After you make the request you get a transaction object in $tx (the customary name). That transaction has both the request (req) and response (result). The headers extracts that part of the message. The to_string formats it nicely for you:

use Mojo::UserAgent;

How HTTP Works | 7 my $tx = Mojo::UserAgent->new->get( shift ); unless( $tx->result->is_success ) { die 'Could not make request'; }

say "Request\n", '-' x 50, "\n", $tx->req->to_string; say "Response\n", '-' x 50, "\n", $tx->result->headers->to_string;

The output shows each side:

% ./show_transaction https://www.mojolicious.org/ Request ------Content-Length: 0 Accept-Encoding: User-Agent: Mojolicious (Perl) Host: www.mojolicious.org

Response ------Content-Type: text/;charset=‑UTF8 Connection: keep-alive Server: cloudflare Date: Wed, 20 Jun 2018 01:13:45 GMT Content-Length: 13420 Expect-CT: max-age=604800, report-uri="..." CF-RAY: 42da5fe7cd5421d4-EWR Set-Cookie: __cfduid=...

The request headers output in the previous section excludes the the start line of the transaction, which specifies the request verb, the resource path, and the protocol version:

GET / HTTP/1.1

You can reconstruct this yourself from the request object since it has all the

8 | Introduction information stored in various internal objects:

say join ' ', $tx->req->method, $tx->req->url->path, 'HTTP/' . $tx->req->version ;

You’ll often want to use an interactive browser to watch this happen so you can figure out what you need to tell Mojo to do. The “Show Developer Tools” in Safari’s Advanced preferences gives you the Develop menu. The HTTP Headers extension for Chrome or the HTTP Headers Live extension for Fire- fox are useful. There are various other developer tools that show you the HTTP transactions.

Request verbs

Each request specifies a “verb” and each of these has a particular intent and meaning. In this book you’ll see them in all uppercase to distinguish them from normal text. A “resource” is a fancy word for data the server or client provides.

These are the HTTP methods:

• GET - Return something but don’t change the server state (idempo- tency)

• HEAD - Return the header for a resource

• POST - Here are some data; do something with it that possibly changes the server state

• PUT - Create (or replace) this resource on the server

• PATCH - Change this resource on the server

• DELETE - Remove this resource from the server

How HTTP Works | 9 • OPTIONS - Return information about what you can do to the resource

Some of these are idempotent, meaning that you can request that resource as often as you like without changing the state of the server. You make the same request over and over and the server stays the same. Think about view- ing your bank’s balance. Check as often as you like without that changing the number you see. The GET and HEAD requests are supposed to be like that.

Other requests may change the state of the server (but not necessarily). Your request might add or update a record in the . If you repeated your request you might create additional database records or change the server in some other way. You don’t want to repeat those or make duplicate records. All of the other request methods besides GET and HEAD are like these.

There’s nothing particularly special about these methods or their names. There are others out there. You can make up your own if you like; you just need a server that understands it.

Each of the HTTP methods have a convenience method. To make a HEAD request to get just the headers and metadata for a resource, use the head method:

use Mojo::UserAgent; my $ua = Mojo::UserAgent->new; my $tx = $ua->head( 'https://www.mojolicious.org/' );

Response codes

The first line of the response also has a start line. It shows the protocol ver- sion, the HTTP status code (a number), and the status description:

HTTP/1.1 200 OK

The stringification of the request headers didn’t show this line butyoucan reconstruct it:

10 | Introduction say join ' ', 'HTTP/' . $tx->result->version, $tx->result->code, $tx->result->message, ;

Each group of hundreds is a type of status and they are the code that Mojo uses to determine if the request succeeded or failed:

• 100 - Informational

• 200 - Successful request

• 300 - Redirection

• 400 - An error in the request

• 500 - An error in the server

Expand the earlier program to be more particular about the status of the response. These methods come from the response object (so look in Mojo ::Message::Response):

use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new; my $tx = $ua->get( shift );

if( $tx->result->is_success or $tx->result->is_redirect ) { say "Request\n", '-' x 50, "\n", join( ' ', $tx->req->method, $tx->req->url->path, 'HTTP/' . $tx->req->version ), "\n", $tx->req->headers->to_string;

say "\nResponse\n", '-' x 50, "\n",

How HTTP Works | 11 join( ' ', 'HTTP/' . $tx->result->version, $tx->result->code, $tx->result->message ), "\n", $tx->result->headers->to_string; } elsif( $tx->result->is_error ) { if( $tx->is_server_error ) { say "Oops! Server Error!"; } else { say "A 4xx error"; } }

Try this with a working address and you get mostly the same thing. But try it with HTTP instead of HTTPS:

% ./show_transaction http://www.mojolicious.org/ Request ------GET / HTTP/1.1 Host: www.mojolicious.org Accept-Encoding: gzip User-Agent: Mojolicious (Perl) Content-Length: 0

Response ------HTTP/1.1 301 Moved Permanently Server: cloudflare Location: https://www.mojolicious.org/ Content-Length: 0 Expires: Wed, 20 Jun 2018 03:38:53 GMT Connection: keep-alive Date: Wed, 20 Jun 2018 02:38:53 GMT Cache-Control: max-age=3600 CF-RAY: 42dadc9f367391a6-EWR

12 | Introduction Now there’s a different sort of response. The status is 301. This domain im- mediately redirects to the HTTPS site. The status message says “Moved Per- manently” which normally means your client should remember the new lo- cation and use that preferentially next time.

By default Mojo doesn’t follow that redirection for you. Set the max_redirects to some positive value and Mojo will follow these redirections up to that many times. Three is a good number (because you can get into infinite loops):

my $ua = Mojo::UserAgent->new; $ua->max_redirects(3);

This works as one method chain too because the methods to configure the user-agent object return that same object:

my $ua = Mojo::UserAgent->new->max_redirects(3);

The same request ends up with a different response now. Mojo saw the redi- rection, looked in its Location header, and made another request for that URL. You didn’t have to look at each step of the process:

% ./show_transaction http://www.mojolicious.org/ Request ------GET / HTTP/1.1 User-Agent: Mojolicious (Perl) Host: www.mojolicious.org Accept-Encoding: gzip Content-Length: 0

Response ------HTTP/1.1 200 OK Connection: keep-alive Set-Cookie: __cfduid=... Expect-CT: max-age=604800, ... Content-Type: text/html;charset=‑UTF8 CF-RAY: 42daf614cc229260-EWR Content-Length: 13420

How HTTP Works | 13 Server: cloudflare Date: Wed, 20 Jun 2018 02:56:16 GMT

It gets even better. The $tx knows about the previous HTTP requests and responses that led to the final one. If there were a redirect and a new trans- action, you can see the previous transaction. Wrap that output in a loop to show all of the transactions leading to the final response:

use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new->max_redirects(3); my( @txs ) = $ua->get( shift );

while( my $previous_tx = $txs[0]->previous ) { unshift @txs, $previous_tx; }

foreach my $tx ( @txs ) { ... }

For me, this interface for the transaction object and everything that hap- pened is a big benefit over LWP. Everything is wrapped into a tidy transaction package.

To see everything that happened, set the MOJO_CLIENT_DEBUG environment variable to a true value. Exporting that variable sets it for the rest of the session. This uses bash syntax to set the value for the entire session:

% export MOJO_CLIENT_DEBUG=1 % ./show_transaction http://www.mojolicious.org/

Setting it before the command does it for only that one invocation:

% MOJO_CLIENT_DEBUG=1 ./show_transaction http://www.mojolicious.org/

Either way you’ll see the initial request, the redirection response, and the new request with its response. You wouldn’t need to output the transactions yourself.

14 | Introduction I’ve written a tiny tool called 3xx to show the entire redirection chain. It has a Mojo example, but also a Ruby and Python example.

Add to the Request

You can affect your request in almost any way that you like. You’ll seemore of this later but here’s a simple one. Instead of naming your client “Mojoli- cious (Perl)” as you see in the User-Agent field in the previous section, change it to something that’s not quite as suspicious.

Modify the previous example to include a new line:

my $ua = Mojo::UserAgent->new->max_redirects(3); $ua->transactor->name( 'Planet Express' );

Now the request is slightly different:

% ./show_transaction https://www.mojolicious.org/ Request ------GET / HTTP/1.1 Content-Length: 0 Accept-Encoding: gzip User-Agent: Planet Express Host: www.mojolicious.org

Sometimes you’ll fake being another sort of user-agent because some servers are content to look only at the User-Agent header to decide what to do, such as returning user-agent specific content (e.g. “Update to a supported browser”). Grab one that you like from www.useragentstring.com.

Add to the Request | 15 httpbin httpbin is a webserver suitable for testing user-agent code. It has many dif- ferent endpoints to show you different things. Here’s your IP address:

% curl http://httpbin.org/ip { "origin": "107.152.104.229, 107.152.104.229" }

It shows the headers from your request:

% curl http://httpbin.org/headers { "headers":{ "Accept": "*/*", "Host": "httpbin.org", "User-Agent": "curl/7.54.0" } }

Or anything in the request:

% curl http://httpbin.org/anything?robot=Bender { "args":{ "robot": "Bender" }, "data": "", "files": {}, "form": {}, "headers":{ "Accept": "*/*", "Host": "httpbin.org", "User-Agent": "curl/7.54.0" }, "json": null, "method": "GET",

16 | Introduction "origin": "89.46.102.12, 89.46.102.12", "url": "https://httpbin.org/anything?robot=Bender" }

Or some XML (that I’ve truncated in this example):

% curl http://httpbin.org/xml

...

Run httpbin in docker httpbin is handy for isolated checks about what you are doing, but you don’t want to overwhelm a free service. If you really want to bang on it, you can run your httpbin server locally by using it through docker (Getting Started with Docker):

% docker run -p 80:80 kennethreitz/httpbin % curl localhost:80/headers { "headers":{ "Accept": "*/*", "Host": "localhost", "User-Agent": "curl/7.54.0" } }

Add an entry for httpbin.org to your /etc/hosts and you don’t have to change the httpbin addresses you see in this book:

# /etc/hosts 127.0.0.1 httpbin.org

httpbin | 17 Summary

You have the basic ideas behind Mojo, HTTP, and fluent programming. The rest of the book builds on those concepts, but also assumes you’ll read more about the HTTP specification on your own.

By now, you should have installed Mojo and played around with some simple examples to see that it works. Before you dive into more Mojo features, I’m going to use the next couple of chapters to catch up on some Perl features you’ll see throughout the book.

18 | Introduction Chapter 2

Some Perl Features

The whole world must learn of our peaceful ways…by force!

This chapter has a short explanation for most -v5.008 features I use, but assumes that you are comfortable with anything in Learning Perl and Inter- mediate Perl. This book assumes you are using Perl v5.28, the latest version as I write it. There are some features that you might not have seen before or haven’t used extensively but that I want to use in later examples.

If you’ve kept up with the latest Perl features, you can safely skip this short chapter.

Perl Program Basics

This isn’t Learning Perl so I won’t explain features in depth as I do there. For the most part, assume these pragma are at the top of any programs you see in this book:

use strict;

| 19 use warnings; use utf8; use open qw(:std :utf8);

The strict and warnings are programming safety pragmas. They are an- noying at times but they tend to find the stupid sorts of errors that human typing tends to add to the program.

The use utf8 tells perl to treat your program text as UTF‑8 (and you must then save it in that encoding). That’s all that does. People tend to think it turns on fancier features, but it is modest in its scope. For the gory details, read ’s StackOverflow answer to Why does modern Perl avoid UTF‑8 by default?.

You’re likely to output UTF‑8 encoded text. The use open with :std sets the encoding on the standard input, output, and error filehandles. Since :utf8 comes after the :std, the encoding is UTF‑8.

I’m not going to include these in every program although they were likely there when I was writing the program. Mojo modules automatically turn on warnings for you.

Declaring the Version

Perl v5.10 introduced several new, backward-incompatible features along with a way to include (or exclude) them in your program. Declaring the min- imum version automatically imports new features:

use v5.10;

One of the features is the say keyword. It’s like print but appends a newline (or other, default line ending) to the end of the string:

use v5.10; say 'Bite my shiny metal ass!';

20 | Some Perl Features Starting with v5.12, declaring the minimum version also turns on strict for you:

use v5.12; # strict for free use warnings;

For the rest of the book I assume that there’s a line that declares at least v5.28, mostly because I’ll often use very recent features.

Importing features

You usually want to declare a minimum version to ensure that the perl peo- ple use with your program have the necessary features. The require will also work but it doesn’t automatically import anything (just as it doesn’t with modules):

require v5.10;

This form doesn’t implicitly import any features but you can later do that on your own:

use feature qw(say)

This is lexically scoped so you can employ a feature in a scope in which you won’t disturb anything else going on in the program:

{ use feature qw(say);

say 'I only work in here!'; }

Most of these importable features involve keywords (such as say) or changes to existing syntax.

Declaring the Version | 21 Experimental features

Some features are experimental. You can use them but they might change or even disappear completely. This allows the perl developers to test fea- tures in the wild before fully committing to them. These are documented in perlexperiment and I write about many of them in The Effective Perler.

You have to enable experimental features explicitly. Even then you’ll get warnings. Turn off warnings in the experimental:: class with the same name as the feature. Remember that Mojo turns on warnings, so you need to turn warnings off after you load anything Mojo:

use Mojo::UserAgent; use feature qw(signatures); no warnings qw(experimental::signatures);

Signatures

Perl passes arguments to subroutines as a flat list in @_. Normally you have to process it yourself. A list assignment works:

sub bend { my( $width, $height ) = @_; ... }

Perl v5.20 added the experimental subroutine signature support to handle the number of parameters and their default values.

Each parameter is required and is filled in from the arguments you provide. If you supply too few or too many you get a fatal error:

use v5.20; use feature qw(signatures); no warnings qw(experimental::signatures);

22 | Some Perl Features sub bend ( $width, $height ) { say "W: $width H: $height"; }

# bend( 5 ); # Error! Too few # bend( 5, 10, 15 ); # Error! Too many bend( 5, 10 );

Assign to the parameter to set a default value. This is still a required param- eter but it assigns a value if you don’t provide one:

sub fly ( $direction = 'up' ){ say "Direction is $direction" }

fly(); # Direction is up fly( 'down' ); # Direction is down fly( 'up', 'down' ); # Error! Too many!

These signatures don’t handle types or other constraints (yet) so you’ll still have to check those yourself.

Mojo::Base

Mojo::Base is a shortcut for much of the infrastructure you’ll need for a Mojo project. It’s import routine can set up many things for you. By itself it doesn’t import anything:

use Mojo::Base;

Pass it the -strict flag to enable strictures in the current namespace. This looks like you are saying “remove strict” because of the minus sign, but that’s really a Perl trick to quote a bareword. I’ll use this throughout the book:

use Mojo::Base -strict;

Signatures | 23 This turns on many other things:

use strict; use warnings; use utf8; use feature qw(:5.10);

You can use Mojo::Base to turn on the signatures feature (if you have an appropriate version of Perl). It must be the second thing you import:

use Mojo::Base -strict, -signatures;

When you do that, you still get everything you imported in the previous ex- ample along with the signatures features:

use strict; use warnings; use utf8; use feature qw(:5.10); use feature qw(signatures); no warnings qw(experimental::signatures);

There are other uses for Mojo::Base. Each of these import everything that -strict does and then does a little more. To set up the current namespace to inherit from another, import -base. This adds Mojo::Base as the parent class:

use Mojo::Base -base; use Mojo::Base -base, -signatures;

To use a parent class that’s not Mojo::Base, use that namespace in place of -base:

use Mojo::Base 'MyBaseClass'; use Mojo::Base 'MyBaseClass', -signatures;

To load its roles support, as you’ll see later in chapter 9 with Promises, use -roles:

24 | Some Perl Features use Mojo::Base -role; use Mojo::Base -role, -signatures;

Postfix Dereference

The legacy dereferencing syntax is clunky (see Intermediate Perl). The gen- eral form puts the reference in braces. It can get ugly and a subscript makes it even worse:

my %hash = ( Bender => [ qw( Beer Beer Beer ) ], Fry => [ qw( Sandwich ) ], );

say join ' ', @{ $hash{'Bender'} }; say 'First thing is ', @{ $hash{'Bender'} }[0];

Postfix dereferencing moves that to the end with an arrow and sigil. A * converts the reference to its type (an array reference becomes an array). A subscript after the sigil accesses individual elements:

say join ' ', $hash{'Bender'}->@*; say 'First thing is ', $hash{'Bender'}->@[0]; say 'Count is ', $hash{'Bender'}->$#*;

Now the operations read from left to right instead of inside out. This works with hashes too. See the perlref documentation for more.

Indented Here Docs

Prior to v5.26 a here doc included all of the whitespace before each line, and the here doc terminator had to be the first and only thing on the line. Thisis

Postfix Dereference | 25 rather ugly because it is necessarily outdented:

my $string = <<'HERE' Pushed to the left All the way to the left Starts with some space HERE

With v5.26 you can indent the text and the here doc terminator similar to how you do it in Ruby. Put a ~ after the << The space before the terminator is removed from all of the lines in the string. You can indent the string to keep it under it’s variable declaration:

my $string = <<~'HERE' Still pushed to the left All the way to the left Starts with some space HERE

Substitution Returns the Modified Copy

Perl v5.14 added the / flag to s/// so the substitution operator could return the modified string while leaving the original as it is:

use v5.14; my $modified = $original =~ s/\s+\z//r;

This is handy with inline code such as map. Without the /r this map returns a list of numbers that represent the number of substitutions made (I make this bug about once a month) and changes the values in @original:

my @new_strings = map { s/\s+\z// } @originals; # WRONG!

With the /r it returns the modified string instead and leaves @originals alone. This does what I usually :

26 | Some Perl Features my @new_strings = map { s/\s+\z//r } @originals;

Summary

This chapter is a survey of some new Perl features you may see in this book. I’ll use these without further explanation. Most of these make the code a bit easier to read as long as you know what they are doing. Since Mojo officially supports the last two Perl versions I don’t feel so bad about using the latest tricks.

Summary | 27 28 | Some Perl Features Chapter 3

Basic Utilities

It’s water! Ordinary water—laced with a little bit of LSD!

Mojo comes with its own implementations of many common tools, and often these reinventions work better or are more pleasing than the tools they re- placed. This also allows Mojo to severely limit its dependencies; that makes development, maintenance, and installation much easier. Sometimes Mojo will preferentially use a better tool if you have it installed (but can still work without it).

This chapter covers some of these tools on their own and in isolation. You don’t need to go through this chapter right away but it sets the foundation for the code you’ll see in the rest of the book.

Dumping structures

Plain ol’ print is one of the world’s most popular debuggers and Data::Dumper is a quick way to turn a value into a string for quick inspection:

| 29 use Data::Dumper; my $hash_reference = { robot => 'Bender', pilot => 'Leela', }; say Dumper( $hash_reference );

The output shows you a pretty version of the data structure:

$VAR1 = { 'pilot' => 'Leela', 'robot' => 'Bender' };

Mojo::Util’s dumper which uses several Data::Dumper settings to make friendlier output:

use Mojo::Util qw(dumper); my $hash_reference = { robot => 'Bender', pilot => 'Leela', }; say dumper( $hash_reference );

The $VAR is gone and the indention is not as severe:

{ "pilot" => "Leela", "robot" => "Bender" }

Working with URLs

Mojo::URL knows how to put together and take apart addresses. You usually don’t need to handle this yourself because most of the interface will silently

30 | Basic Utilities convert a string into the right object. Sometimes you’ll want to modify the URL yourself, though.

Start with a URL as a string, create a new object, and look at the pieces:

use Mojo::Base -strict; use Mojo::URL;

my $url = Mojo::URL->new( shift );

{ no warnings 'uninitialized'; say join "\n", ' Scheme: ' . $url->scheme, ' Host: ' . $url->host, ' Port: ' . $url->port, ' Path: ' . $url->path, ' Query: ' . $url->query, 'Fragment: ' . $url->fragment ; }

Run this with a few examples. Notice that you only see what you explicitly included. These don’t include default values for the port, path, or other parts:

% ./url.pl http://www.example.com/ Scheme: http Host: www.example.com Port: Path: / Query: Fragment: % ./url.pl http://www.example.com:8080/?robot=Bender Scheme: http Host: www.example.com Port: 8080 Path: / Query: robot=Bender Fragment: % ./url.pl http://www.example.com:8080/?robot=Bender#foo

Working with URLs | 31 Scheme: http Host: www.example.com Port: 8080 Path: / Query: robot=Bender Fragment: foo

Change the object to set other values for each of the parts:

use Mojo::Base -strict; use Mojo::URL;

my $url = Mojo::URL->new( 'http://www.example.com' ); $url->path( '/employee/fry' ); say $url; # http://www.example.com/employee/fry

Often you want to base a new URL on an existing one. Clone the original and change it. For example, create an object for the base URL for a web API then change the parts:

use Mojo::Base -strict; use Mojo::URL;

my $base = Mojo::URL->new( 'https://api.github.com/' );

my $username = $ARGV[0] // 'briandfoy'; my $path = '/users/:username/repos'; $path =~ s/:username/$username/;

my $url = $base->clone->path( $path ); say $url; # https://api.github.com/users/briandfoy/repos

Start with a piece and turn it into the full URL with to_abs. This requires a Mojo::URL-like object as an argument:

use Mojo::Base -strict; use Mojo::URL;

my $path = Mojo::URL->new( '/employee/fry' );

32 | Basic Utilities my $base = Mojo::URL->new( 'http://www.example.com' ); my $url = $path->to_abs( $base ); say $url; # http://www.example.com/employee/fry

This also works with URI object because it has mostly the same interface. Notice URI set the default port for you:

use Mojo::Base -strict; use Mojo::URL; use URI;

my $path = Mojo::URL->new( '/employee/fry' ); my $base = URI->new( 'http://www.example.com' ); my $url = $path->to_abs( $base ); say $url; # http://www.example.com:80/employee/fry

You might prefer to convert the foreign objects as soon as you get them. The stringified version of the URI object can feed into Mojo::URL:

use Mojo::Base -strict; use Mojo::URL; use URI;

my $base = URI->new( 'http://www.example.com' ); say Mojo::URL->new( $base->as_string ); # also, "$base"

The Mojo::URL object also automatically stringifies in a double quoted string:

use Mojo::Base -strict; use Mojo::URL;

my $base = Mojo::URL->new( 'http://www.example.com' ); say "URL is $base"; # URL is http://www.example.com

Working with URLs | 33 Working with paths

It’s easy to work with just the path of a URL. This example starts with the empty path and adds to parts to it:

use Mojo::Base -strict; use Mojo::URL;

my $url = Mojo::URL->new( 'http://www.example.com' ); push $url->path->@*, qw(some path); say "URL is $url";

The output shows that two components were added to the path:

URL is http://www.example.com/some/path

You can go the other way too. Start with a URL with a path and extract its parts:

use Mojo::Base -strict; use Mojo::URL;

my $url = Mojo::URL->new( 'http://www.example.com/root/next/then/finally.txt' ); say join "\n", $url->path->@*;

Decoding JSON

JSON (JavaScript Object Notation) is a stringified form of a data structure. It’s easy to parse (in Learning Perl 6 I used a simple grammar to do it), it handles most of the things people commonly want, and it seems to be about everywhere. Web APIs like to consume and produce it.

Here’s an example from the GitHub API to list teams. This is an array of objects:

34 | Basic Utilities [ { "id": 1, "node_id": "MDQ6VGVhbTE=", "url": "https://api.github.com/teams/1", "name": "Justice League", "slug": "justice-league", "description": "A great team.", "privacy": "closed", "permission": "admin", "members_url": "https://api.github.com/teams/1/members{/member}", "repositories_url": "https://api.github.com/teams/1/repos", "parent": null } ]

Encoding data to JSON

Create a simple Perl data structure. Don’t include objects (blessed refer- ences); JSON doesn’t know how to handle those.

“Encoding” is the process of taking a data structure in your program and converting it to a UTF‑8 string that represents that in the JSON format. The Mojo::JSON provides the encode_json subroutine to do that for you:

use Mojo::Base -strict; use Mojo::JSON qw(encode_json);

my $perl_data = { 'employees' => [ { name => 'Bender', race => 'robot', position => 'steel worker', }, { name => 'Leela',

Decoding JSON | 35 race => 'mutant', position => 'pilot', } ], };

my $encoded = encode_json( $perl_data ); say $encoded; # !!!

This produces compact JSON output:

{"employees":[{"name":"Bender","position":"steel worker", "race":"robot"},{"name":"Leela","position":"pilot", "race":"mutant"}]}

All JSON is UTF‑8 encoded. That’s just the way it is according to RFC 8259. The Mojo::JSON module handles those details for you. Normal text opera- tions on this may produce tears since what you think is a single character is actually several octets each treated as separated “characters”. Use the :raw IO layer to save it to a file with no translation:

open my $fh, '>:raw', 'test.' or die "Could not open file: $!"; print {$fh} $encoded; close $fh;

You don’t need to do this yourself because Mojo::File will do it for you:

my $file = Mojo::File ->new( 'test.json' ) ->spurt( $encoded );

For much of Mojo you don’t have to worry about encoding your Perl data as JSON. A POST request takes a URL and a stream type. The json content generator takes a your Perl data structure and then handles all the encoding for you:

use Mojo::Base -strict;

36 | Basic Utilities use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

my $url = 'http://www.example.com'; my $perl_data = { 'rainbow' => '', # U+1F308, 0xF09F8C88 in ‑UTF8 }; my $tx = $ua->post( $url, json => $perl_data );

say $tx->req->to_string;

The output shows the request. The message body came from that json gen- erator:

% ./rainbow-json POST / HTTP/1.1 Accept-Encoding: gzip Content-Length: 18 Content-Type: application/json Host: www.example.com User-Agent: Mojolicious (Perl)

{"rainbow":""}

You’ll see how to create your own content generators in chapter 10.

True and false

JSON has two special values: true and false. These aren’t strings so you don’t quote them.

{ "robot": false, "name": "Leela" } { "robot": true, "name": "Bender" }

This is a problem for Perl because it has strings and numbers. It also has its own idea of truthiness. To get around this, Mojo::JSON uses references to 0

Decoding JSON | 37 or 1 to represent these:

use Mojo::JSON qw(encode_json); say encode_json( { "robot" => \0, "name" => "Leela" } ); say encode_json( { "robot" => \1, "name" => "Bender" } );

Now the true and false correctly show up in the JSON:

{"name":"Leela","robot":false} {"name":"Bender","robot":true}

If you don’t like that, you can use methods from Mojo::JSON:

use Mojo::JSON qw(encode_json); say encode_json( { "robot" => Mojo::JSON->false, "name" => "Leela" } ); say encode_json( { "robot" => Mojo::JSON->true, "name" => "Bender" } );

Reading these values back from JSON give back the references. This is a good case for the scalar postfix dereferencing to get back the boolean values:

use Encode qw(encode); use Mojo::JSON qw(decode_json); use Mojo::Util qw(dumper);

my $raw_octets = encode( 'UTF-8', '{"name":"Bender","robot":true}' );

my $hash = decode_json( $raw_octets );

say "$hash->{name} is a robot? ", $hash->{robot}->$* ? 'Yes' : 'No';

38 | Basic Utilities Decoding text to data

“Decoding” goes the other way. It takes the raw UTF‑8 octets and turns them into Perl characters and ultimately into the Perl data structure.

Start with this data in employees.json. Ensure that your editor encodes this as UTF‑8:

{"employees":[{"name":"Bender","position":"steel worker", "type":"robot"},{"name":"Leela","position":"pilot", "type":"mutant"}]}

Mojo::File can slurp in one gulp the data in file; it gives the contents back as raw octets. The decode_json turns those octets into a Perl data structure:

use Mojo::Base -strict; use Mojo::File; use Mojo::JSON qw(decode_json); use Mojo::Util qw(dumper);

my $path = 'employees.json'; my $file = Mojo::File->new( $path ); my $data = decode_json( $file->slurp );

say dumper( $data );

The dumper from Mojo::Util pretty prints the data structure:

{ "employees" => [ { "name" => "Bender", "position" => "steel worker", "type" => "robot" }, { "name" => "Leela", "position" => "pilot",

Decoding JSON | 39 "type" => "mutant" } ] }

This call to the GitHub API (v3) grabs the public data about a user’s reposi- tories and outputs the raw data in the body:

use Mojo::Base -strict; use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

my $url = 'https://api.github.com/users/briandfoy/repos'; my $tx = $ua->get( $url ); unless( $tx->result->is_success ) { ... }

say $tx->result->body;

This outputs the raw body (a huge wall of text):

% ./github-user [{"id":2208440,"node_id":"MDEwOlJlcG9zaX...

Change the program slightly to call json on the result. That decodes the mes- sage body as JSON and returns a data structure. This example happens to be a JSON array; use the postfix dereference to turn it into a list for foreach:

use Mojo::Base -strict; use Mojo::URL; use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

my $base = Mojo::URL->new( 'https://api.github.com/' );

my $username = $ARGV[0] // 'briandfoy'; my $path = '/users/:username/repos'; $path =~ s|/:username/|/$username/|; my $url = $base->clone->path( $path );

40 | Basic Utilities my $tx = $ua->get( $url ); unless( $tx->result->is_success ) { ... }

foreach my $repo ( $tx->result->json->@* ) { next if $repo->{fork}; # only original repos say $repo->{full_name}; }

The output is the extracted full name of the first page’s worth of repositories:

% ./github-user briandfoy/Acme-MyFirstModule-BDFOY briandfoy/app-grepurl briandfoy/app-module-lister briandfoy/app-ppi-dumper briandfoy/app-rhich ...

Converting objects to JSON

The Mojo framework will use an object’s TO_JSON method to encode the JSON. Here’s a short program that defines a Hermes class. It inherits from Mojo ::Base to set up all the object stuff including the new method. The object is empty but that doesn’t matter.

The meat of the class is the TO_JSON method. That doesn’t actually create the JSON characters. In it, create the data structure (without blessed references) that will then convert to the string form:

use Mojo::Base -strict; use Mojo::JSON qw(encode_json);

package Hermes { use Mojo::Base -base;

sub TO_JSON { return { correct => "technically" };

Decoding JSON | 41 } }

my $accountant = Hermes->new; say encode_json( $accountant );

The output shows the JSON string version of the Hermes instance:

{"correct":"technically"}

This is handy when your object tracks information that you would want to serialize or when a different structure of the data in your object is thebest way to represent it for other uses.

Pretty printing JSON

Previously you had the file employees.json with this JSON:

{"employees":[{"name":"Bender","position":"steel worker", "type":"robot"},{"name":"Leela","position":"pilot", "type":"mutant"}]}

This is relatively easy to pick apart because it’s short, but eventually you’ll run into much larger structures. Here’s a much longer structure (and you’ll see more one-liners in chapter 11):

% mojo get https://api.github.com/users/briandfoy/repos

This spews out about 150,000 characters (most of them “https://api.github.com/re- pos/briandfoy” over and over again). Where does one repo’s information end and another’s begin?

42 | Basic Utilities With jq

In chapter 11, you’ll see Mojo’s built-in features to extract parts of a response with JSON pointers (although those don’t pretty print). In these examples, I’m ignoring the mojo command that you will see later.

Some sites offer alternate views of data. For example, attaching .json to the end of some Reddit URLs gets the JSON version. This means that you don’t have to scrape HTML (as you’ll do in chapter 5). That’s not the point of this section though. Imagine that you have a JSON dump and you want to look at that.

The jq tool (for JSON Query) can help with that. It’s not a Perl or Mojoli- cious thing, but its powerful and popular. Querying . gets the entire data structure, which it then pretty prints (about 6,000 lines):

% curl -o - https://www.reddit.com/user/briandfoy.json⏎ | jq . { "kind": "Listing", "data":{ "modhash": "", "dist": 25, "children":[ { "kind": "t1", "data":{ "total_awards_received": 0,

The result of that particular API response is a JSON array of objects. You can look at the first object only to get a sense of its structure:

% curl -o - https://www.reddit.com/user/briandfoy.json⏎ | jq .data.children[0]

Drill down into the parts that you want. These things are particular to the peculiar data format Reddit uses:

Decoding JSON | 43 % curl -o - https://www.reddit.com/user/briandfoy.json⏎ | jq -r '.data.children[] | select(.kind=="t3") | .data.url' http://blogs.perl.org/users/brian_d_foy/... http://news.perlfoundation.org/2019/04/...

With perl

You can recreate all of jq in perl (but why?). Inside a program you don’t particularly care about pretty data structures. Mojo::Util has a dumper con- venience function for that:

use Mojo::Util qw(dumper) use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new; my $tx = $ua->get('https://api.github.com/users/briandfoy/repos') say dumper $tx->result->json;

But that’s now a Perl dump and not JSON:

[ { "archived" => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ), "assignees_url" => "...", "blobs_url" =>

Or, if you wanted part of it:

say dumper $tx->result->json->[0]->{owner};

You might not mind that for short debugging output, though. Which one you use depends on what you are trying to do.

44 | Basic Utilities Faster JSON

The Mojo::JSON module is quite fast compared to the pure-Perl JSON::PP, but Cpanel::JSON::XS is much faster still (and by a couple of orders of magni- tude). Simply install that module and Mojo::JSON will use it automatically to encode or decode data.

Collections

A collection is a mostly fixed list. Once you create it you don’t add orremove elements, but you can change the ones you already have:

use Mojo::Collection; my $collection = Mojo::Collection->new( qw( Bender Fry Leela Zoidberg ));

say "There are ", $collection->size, ' employees'; say "First is ", $collection->first; say "Last is ", $collection->last;

The output isn’t surprising:

There are 4 employees First is Bender Last is Zoidberg

The () function is a shortcut:

use Mojo::Collection qw(c); my $collection = c( qw(Bender Fry Leela Zoidberg) );

If you need to change the number of elements, create a new container of the right size. It’s essentially a fancy list (an immutable thing unlike an array):

Collections | 45 use Mojo::Collection; my $collection = Mojo::Collection->new( qw( Bender Fry Leela Zoidberg ));

my $smaller = Mojo::Collection->new( $collection->first, $collection->last ); say join ', ', $smaller->to_array->@*;

That last line can be a bit shorter since the object understands the postfix deference:

say join ', ', $smaller->@*;

Most of the Mojo::Collection methods are straightforward. The uniq re- turns a new collection with all of the duplicates removed. Use join to turn them into a single string:

use Mojo::Collection qw(c); my $c = c( split//, qw/001100010010011110100001101101110011/ ); my $u = $c->uniq; say $u->join( ' ' ); # '0 1'

A code reference argument gets the item to check as its argument. Whatever value that code reference returns becomes the value that uniq uses to deter- mine “uniqueness”. It returns the original value (like the filter behavior of Perl’s grep) if it hasn’t seen that computed value yet.

This example uses a mod2 subroutine that returns either 0 or 1. The uniq will find the first item that returns 0 and the first item that returns 1.Anyitems past that can’t provide unique values:

use Mojo::Collection qw(c); my $c = c( 1 .. 10 ); my $u = $c->uniq( \&mod2 ) ; say $u->join( ' ' );

46 | Basic Utilities sub mod2 { say "Checking [$_[0]]"; ( $_[0] ** 2 ) % 2 }

The output shows that uniq still scans the entire collection but only returns the first two items because those are the first ones to evaluate to theunique computed values:

Checking [1] Checking [2] Checking [3] Checking [4] Checking [5] Checking [6] Checking [7] Checking [8] Checking [9] Checking [10] 1 2

Here’s another example that chooses items based on their unique first letter:

use Mojo::Collection qw(c); my $c = c( qw( Bender Fry Leela Farnsworth ) ); my $u = $c->uniq( \&first_letter ) ; say $u->join( ' ' );

sub first_letter { substr $_[0], 0, 1 }

The output shows only the first instance of names that start with that letter. “Farnsworth” is left out because “Fry” already evaluated to the unique item.

Bender Fry Leela

A string argument to uniq is a method name for the item it’s testing. Here’s a collection of Time::Moment objects constructed with offsets from the current time. The day_of_month argument to uniq is the method to call on each of

Collections | 47 those objects to filter them:

use Time::Moment; my $tm = Time::Moment->now;

use Mojo::Collection qw(c); say c( qw( -187000 -87000 -50000 -3600 0 +3600 ) ) ->map( sub { $tm->plus_seconds( $_ ) } ) ->uniq( 'day_of_month' ) ->join( "\n" );

That selected the first time for each day of the month (and it doesn’t care which month in this example):

2018-08-20T12:41:54.842659-04:00 2018-08-21T16:28:34.842659-04:00 2018-08-22T02:45:14.842659-04:00

Processing as a list

Use each to turn a collection into a list suitable for list-like things in other Perl syntax:

use Mojo::Collection; my $employees = Mojo::Collection->new( qw( Bender Fry Leela Zoidberg ));

foreach my $person ( $employees->each ){ say $person }

You’ll use these methods in the next section as you process HTML.

48 | Basic Utilities Creating new collections

Mojo::Collection has grep and map methods that act similarly to the Perl built-ins with the same name. The grep can take a object (qr//) or a coderef. The current element is in $_:

use Mojo::Collection; my $employees = Mojo::Collection->new( qw( Bender Fry Leela Zoidberg ));

my $selected = $employees->grep( qr/[aeiou]\z/i ); say $selected->join( ', ' ); # Fry, Leela

my $long = $employees->grep( sub { length > 4 } ); say $long->join( ', ' ); # Bender, Leela, Zoidberg

The map takes a coderef too:

my $upper = $employees->map( sub { uc } ); say $upper->join( ', ' ); # BENDER, FRY, LEELA, ZOIDBERG

A string argument to map is the method to call on $_. The result is the element in the new:

my $urls = Mojo::Collection ->new( @urls ) ->map( sub { Mojo::URL->new($_) } );

my $hosts = $urls->map( 'host' ); # $_->host

say join "\n", $hosts->@*;

Any extra arguments in map become the arguments to the method:

my $base = Mojo::URL->new( 'http://www.example.com' ); my $abs = $urls->map( 'to_abs', $base ); # $_->to_abs( $base )

Collections | 49 Random Utilities

The Mojo::Util module comprises several of the tools you probably use in most programs or wished that Perl had as builtins. Rather than repeatedly coding these small tasks, use the subroutines that do them for you. Mojo ::Util exports some of these subroutines on request.

Trimming text

Insignificant whitespace around a value is the scourge the world suffersbe- cause it allows people to create data. Perl has no builtin subroutine to deal with this but you can do it with a couple of substitutions. The /r flag on the substitution operator returns the modified string without changing the original:

my $string = "\t Bender \r\n"; my $trimmed = $string =~ s/\A\s+//r =~ s/\s+\z//r; say $trimmed; # Bender

Instead of the long way or the inefficient short way there’s trim. It doesn’t change the original and returns the modified value:

use Mojo::Util qw(trim);

my $string = "\t Bender \r\n"; my $trimmed = trim( $string ); # 'Bender'

Base64

Sometimes you’ll run into Base64 text. This is an ASCII encoding for binary data where each character in the encoding represents six bits of the input (hence 64, which is 2⁶). This makes it 8-bit data safe for channels that might interpret it as text. It shows up in a few other uses as well:

50 | Basic Utilities use Mojo::Util qw(b64_encode);

# the pack represents each group or 8 1's and 0's # as one octet my $bits = pack 'B*', # some random binary data '0011000100100111101000011011011100110000';

say b64_encode( $bits ); # MSehtzA=

In the web context, you see this in the Authorization header where the user- name and password are joined with a colon then the whole thing is base64 encoded:

my $username = 'Bender'; my $password = 'Kill @ll Humans'; my $value = b64_encode( join ':', $username, $password ); say $value; # QmVuZGVyOktpbGwgQGxsIEh1bWFucw==

This isn’t encryption; it isn’t a way to hide the authorization details. Anyone who can see this value can easily reverse it. The b64_decode gets it back:

my( $u, $p ) = split /:/, b64_decode( $value ); say "Username: $u\nPassword: $p";

Here’s the recovered username and password:

Username: Bender Password: Kill @ll Humans

You won’t have to create the Authorization header yourself though. For Basic authentication you can put the username and password in the author- ity portion of the URL:

my $tx = $ua->get( "https://username:[email protected]/" );

Random Utilities | 51 Events

Mojo is built around an event module. When something happens in one part of the code, other parts of the code can find out about it and react to that. Most parts of the Mojo inherit from Mojo::EventEmitter and individ- ual classes provide various events. It’s so important it might even deserve its own book instead of a couple of pages.

Start with a simple Beer class with a drink method. When you call that method it emits a event named glug, which has a count. The glug incre- ments a counter. That’s it.

After creating the Beer object, “subscribe” to the glug event by calling on. You tell it which event you want; the argument is a code reference to run each time the event occurs:

package Beer { use Mojo::Base 'Mojo::EventEmitter', -signatures;

sub new ( $class ) { bless {}, $class }

sub drink ( $self ) { state $count = 0; $self->emit( glug => $count++ ); } }

use Mojo::Util qw(dumper);

my $beer = Beer->new; $beer->on( glug => sub ( $beer, $count ) { say "glug glug beer $count"; } );

$beer->drink for 0 .. 3;

That code’s arguments are the object that emitted the event and the rest of the arguments from emit. In this case that’s the Beer object and the count:

52 | Basic Utilities glug glug beer 0 glug glug beer 1 glug glug beer 2 glug glug beer 3

That’s a single callback. You could add another and both will respond to the event:

$beer->on( glug => sub ( $beer, $count ) { say "glug glug beer $count"; } ); $beer->on( glug => sub ( $beer, $count ) { say "Get that robot a beer!"; } );

Both of these callbacks do their work whenever there’s a glug event:

glug glug beer 0 Get that robot a beer! glug glug beer 1 Get that robot a beer! glug glug beer 2 Get that robot a beer! glug glug beer 3 Get that robot a beer!

You can unsubscribe to an event. The return value of on is the callback itself. Use that callback to tell unsubscribe which thing to remove:

my $beer = Beer->new; my $cb1 = $beer->on( glug => sub ( $beer, $count ) { say "glug glug beer $count"; } ); my $cb2 = $beer->on( glug => sub ( $beer, $count ) { say "Get that robot a beer!"; } );

for( 0 .. 3 ) { $beer->unsubscribe( glug => $cb2 ) if $_ == 2;

Events | 53 $beer->drink; };

Now that second callback won’t respond after the second iteration:

glug glug beer 0 Get that robot a beer! glug glug beer 1 Get that robot a beer! glug glug beer 2 glug glug beer 3

There’s a one-shot callback too. Subscribing with once runs the callback only on the first emit then removes it immediately:

my $beer = Beer->new; my $cb0 = $beer->once( glug => sub ( $beer, $count ) { say "Let's get drunk!"; } ); my $cb1 = $beer->on( glug => sub ( $beer, $count ) { say "glug glug beer $count"; } ); my $cb2 = $beer->on( glug => sub ( $beer, $count ) { say "Get that robot a beer!"; } );

for( 0 .. 3) { $beer->unsubscribe( glug => $cb2 ) if $_ == 2; $beer->drink; };

You see that callback’s result once and only once:

Let's get drunk! glug glug beer 0 Get that robot a beer! glug glug beer 1 Get that robot a beer! glug glug beer 2

54 | Basic Utilities glug glug beer 3

Most of the Mojo infrastructure is aware of these features and either emit or respond to events. This allows you to easily add and remove behavior to the process without worrying about all of the steps yourself.

Summary

This chapter is an overview of the features that you’ll use in the rest of the book without further explanation. You saw them isolated in this chapter so you can focus on their part of the task rather than all the web stuff happening around them. The next step is to integrate them into useful tasks.

Summary | 55 56 | Basic Utilities Chapter 4

The Document Object Model

I’ve never seen anything so mind-blowing. Ooh, a reception table with muffins!

The Document Object Model, or simply DOM, represents structured data such as HTML or XML so that you can easily inspect and modify it. The Mojo::DOM module handles both of those formats. This chapter only intro- duces some of the common things that you can do with it, but you’ll see more in chapter 5.

Walking Through HTML or XML

What are you going to do with that HTML once the server sends it back to you? Like every other who has touched a keyboard you’ve probably pulled out a regex to get out the parts that want. There’s a better way to do that.

Mojo represents the HTML in a Document Object Model (DOM). This makes way for some powerful tools. Parsing some text with Mojo::DOM defaults to

| 57 parsing HTML:

use Mojo::DOM;

my $html = <<~'HTML';

HTML

my $dom = Mojo::DOM->new( $html );

my $links = $dom->find( 'a' ); # a collection

say $links->join( "\n" );

The find takes a CSS selector (coming up in the next chapter, chapter 5) and walks the DOM looking for matching pieces. It returns a collection of what it finds. The example extracts all of the anchor tags (complete with opening tag, contents, and closing tag):

Bender Fry Hermes

It gets better. The elements in that collection are still Mojo::DOM::HTML ob- jects. Use the attr method to get the attribute values. Use that as the first argument to map and give it href as its argument:

my $links = $dom->find( 'a' )->map( attr => 'href' ); say $links->join( "\n" );

Now you get the URL in every anchor:

http://www.example.com/bender http://www.example.com/fry http://www.example.com/hermes

58 | The Document Object Model Use text to get the stuff inside the tags:

my $links = $dom->find( 'a' )->map( 'text' ); say $links->join( "\n" );

This time the output is stuff between the opening and closing tags:

Bender Fry Hermes

This only includes text not inside of nested HTML tags. Try this by parsing this paragraph with some styled text in it. The new can work with snippets of HTML:

say Mojo::DOM->new( '

Bender Fry Leela

' ) ->find( 'p' ) ->map( 'text' ) ->join( "\n" );

The output finds the text outside the tag. There are two spaces between the words because there are two spaces around the interior tag:

Bender Leela

Use all_text instead to strip out interior HTML:

say Mojo::DOM->new( '

Bender Fry Leela

' ) ->find( 'p' ) ->map( 'all_text' ) ->join( "\n" );

Now the output has the text from the interior tags too:

Bender Fry Leela

In this example you know that you have exactly one P tag so you could have

Walking Through HTML or XML | 59 also written it without the map:

say Mojo::DOM->new( '

Bender Fry Leela

' ) ->at( 'p' ) ->all_text;

You can extract parts of the DOM using much more sophisticated situations with fancier selectors. Those get their own chapter in chapter 5.

Parsing XML

You might be tempted to use XML::Simple to turn an XML string into a Perl data structure; many people are. Do not be seduced to the dark side! That module is only suitable for the simplest problems or until you can arrange to use something better—perhaps XML::Hash::XS or XML::Fast. Or, for a really nasty problem, XML::Twig.

However, if you only need to grab a few values out of the XML data you can extract them from the DOM directly. Mojo::DOM switches to its XML mode if it sees the XML declaration at the beginning of the string. Almost nothing else in your program needs to change:

use Mojo::DOM;

say Mojo::DOM->new( '

Bender Fry Leela

' ) ->find( 'p' ) ->map( 'all_text' ) ->join( "\n" );

How does Mojo::DOM know that it’s XML? There is the DOCTYPE declaration there! But, XML and XHTML (and clean HTML) play by the same rules so the situation is not that different.

60 | The Document Object Model Modifying the DOM

There are many methods to modify the DOM but I’m going to ignore those other than this tiny example that builds a new HTML document from an empty string:

use Mojo::Base -strict; use Mojo::DOM; my $dom = Mojo::DOM->new( '' ); $dom->append_content( '' ); $dom ->at('html') ->append_content( '

Planet Express

' ); say $dom->to_string;

Each modification returns a new DOM object but you can assign thatback to the same variable:

Planet Express

Summary

This is just the start of your DOM processing. The mechanism is easy, and in the next chapter you’ll see the myriad ways to find exactly the parts of HTML you want to extract.

Modifying the DOM | 61 62 | The Document Object Model Chapter 5

CSS Selectors

The alien mothership! If we hit that bullseye, the rest of the dominoes will fall like a house of cards. Checkmate.

Cascading Style Sheets (CSS) provide a way to apply style and behavior to HTML tags based on their name, attributes, or other features. Mojo::DOM supports these up to Selectors Level 3 with some experimental features from Selectors Level 4. This chapter highlights the common ones you use.

The first part of the chapter shows the basic Mojo mechanism. After that, I try to give you an example of each of the selectors. You don’t need to read this chapter straight through. Read the first part to see the basic Mojo mechanism then refer back to the selector as you need it. Scan the chapter just to see what’s possible.

Finding the next thing

Mojo::DOM’s at returns the first thing that matches the selector. The result is another Mojo::DOM object. This finds the section of the DOM for the first a

| 63 tag:

my $first_a_found = $dom->at( 'a' );

In that example, the selector is simply the tag name with no other filtering. Any a tag will do.

The find returns a Mojo::Collection object that contains everything that matches the selector. This one returns all the portions of the DOM that are a tags:

my $all_the_a = $dom->find( 'a' ); # collection

The response object (which I haven’t shown yet) provides a DOM and this is how you will normally receive such a thing. Use at and find on that:

my $tx = $ua->get( 'http://www.example.com' ); unless( $tx->result->is_success ) { ... }

my $first_a_found = $tx->result->dom->at( 'a' ); my $all_the_a = $tx->result->dom->find( 'a' );

Sometimes it’s a bit of a trick to determine the appropriate selector. Play with the DOM on its own until you get the selectors right. Grab the HTML source and save it rather than make a fresh request each time. You don’t want to annoy websites with your broken code. Save a copy of the page and work with that until you get it. This program caches the result of a web request then uses the saved result:

use Mojo::UserAgent;

my $saved_html = 'saved.html';

unless( -e $saved_html ) { say "Fetching fresh HTML"; my $ua = Mojo::UserAgent->new; my $tx = $ua->get( 'https://www.mojolicious.org' ); unless( $tx->result->is_success ) { ... }

64 | CSS Selectors $tx->result->save_to( $saved_html ); }

my $data = Mojo::File->new( $saved_html )->slurp;

say "Tags are: ", Mojo::DOM->new( $data ) ->find( '*' ) # the "universal" selector ->map( 'tag' ) ->uniq ->join( ' ' );

Running this program twice shows you that you fetched the resource the first time only. After that you have a saved copy of the data and use that while you figure out how you want to transform it:

% perl save-file.pl Fetching fresh HTML Tags are: html head link title script style body a img div picture form input h1 p h2 ul li b pre code % perl save-file.pl Tags are: html head link title script style body a img div picture form input h1 p h2 ul li b pre code

Selecting the Root Element

The DOM is a tree (in the computer science sense). A tag in the DOM knows it’s parent (what contains it), what’s next to it, and what it contains.

The root element of an HTML document is usually the tag so this may be interesting with XML. The special :root selector starts at whatever it thinks the start is:

use Mojo::DOM;

Selecting the Root Element | 65 my $data = <<~'XML'; Fry Leela Bender XML

my $dom = Mojo::DOM->new( $data );

say "Root tag is: ", $dom->at( ':root' )->tag;

The output shows the top level tag without having advance knowledge of the structure (and you can ignore DTDs for this book):

Root tag is: employees

Selecting Any Element

The * “universal selector” matches any tag. With at that gets you the first tag. The at with the universal selector finds the first tag while the find with * matches every tag:

use Mojo::DOM;

66 | CSS Selectors my $data = <<~'XML'; Fry Leela Bender XML

my $dom = Mojo::DOM->new( $data );

say "Root tag is: ", $dom->at( '*' )->tag;

say "All the tags are: ", $dom->find( '*' ) ->map( 'tag' ) ->uniq ->join( " " ) ;

The output shows the root element and the names of all of the tags it in the XML:

Root tag is: employees All the tags are: employees employee name

It’s the same thing with HTML although the data look a bit different:

use Mojo::DOM;

my $data = <<~'XML';

  • Fry
  • Leela
  • Bender
XML

Selecting Any Element | 67 my $dom = Mojo::DOM->new( $data );

say "Root tag is: ", $dom->at( '*' )->tag; say "All the tags are: ", $dom->find( '*' ) ->map( 'tag' ) ->uniq ->join( " " ) ;

Now the output shows the HTML tag names:

Root tag is: html All the tags are: html ul li

Selecting by Tag

One step up from matching any tag is specifying exactly which tag you want. This is technically called the “type selector” in CSS lingo.

Here’s the same as the program from before with some changes to find and the parts that you extract with map. Now it matches all the a tags and gets their hrefs:

use Mojo::UserAgent;

my $saved_html = 'saved.html';

unless( -e $saved_html ) { say "Fetching fresh HTML"; my $ua = Mojo::UserAgent->new; my $tx = $ua->get( 'https://www.mojolicious.org' ); unless( $tx->result->is_success ) { die "Could not fetch! Error is ", $tx->result->code }

68 | CSS Selectors $tx->result->save_to( $saved_html ); }

my $data = Mojo::File->new( $saved_html )->slurp;

say "Links are:\n\t", Mojo::DOM->new( $data ) ->find( 'a' ) ->map( attr => 'href' ) ->uniq ->join( "\n\t" );

Here are some of the links it extracted:

Links are: http://mojoconf.org https://metacpan.org/release/Mojolicious https://mojolicious.org https://mojolicious.org/perldoc https://github.com/mojolicious/mojo/wiki https://github.com/mojolicious/mojo

Selecting by Attribute

Tags can have attributes and those attributes can have values. In this exam- ple the a tag has an href attribute; the value is the stuff inside the quotes:

Mojo

You’ve already extracted these in the example in the previous section. You used map instead of selectors to filter the tags:

my @links = Mojo::DOM ->new( $data ) ->find( 'a' )

Selecting by Attribute | 69 ->map( attr => 'href' ) ->each;

You can use an “attribute selector” to select tags with particular attributes. Here’s an amendment to the previous program. It now extracts all the img tags and filters for the alt text:

say "Alt text is:\n\t", Mojo::DOM->new( $data ) ->find( 'img' ) ->map( attr => 'alt' ) ->grep( sub { defined }) ->uniq ->join( "\n\t" );

You need to add a grep there because some tags may not have the alt at- tribute; you get uninitialized value warnings without those. You selected too many things and had to discard some of them. Rather than doing that you can put the attribute you want in braces after the tag. The img[alt] selector only extracts img tags that have have the alt attribute, so you no longer need the grep:

say "Tags are:\n\t", Mojo::DOM->new( $data ) ->find( 'img[alt]' ) ->map( attr => 'alt' ) ->uniq ->join( "\n\t" );

Here are a few more attribute selector examples. Select all the a with href, all the script with src, or all the meta with charset:

a[href] script[src] meta[charset]

70 | CSS Selectors Selecting by Attribute Value

In the previous section you selected by the presence of an attribute. One better than that is selecting by a particular value in that attribute. There are a few ways you can do this.

By exact match

Specify an exact value for an attribute by placing an = after the attribute name. The img[alt=“Bender”] selector matches the exact value Bender (in- cluding case):

use Mojo::DOM;

my $html = <<~'HTML'; Bender Bender working Bender drinking Bender laughing HTML

my $dom = Mojo::DOM->new( $html );

say $dom->find( 'img[alt="Bender"]' )->join( "\n" );

The output shows that the selector found only one of the img tags. Note that the stringified form of the DOM is not necessarily the same as the original:

Bender

There’s an experimental feature where you can match with case-insensitivity. Put an i flag at the end of the selector. This selector still matches “Bender” even though the value in this selector is all lowercase:

say $dom->find( 'img[alt="bender" i]' )->join( "\n" );

Selecting by Attribute Value | 71 By starting text

Put a ^ in front the = to match partial text at the beginning of a value (sort of like the Perl beginning-of-line anchor). This does not have to match the entire value. Here’s a modified line for the last line of the previous program using the same data:

say $dom->find( 'img[alt^="Bender"]' )->join( "\n" );

Now the selector find all the of elements that begin with Bender (case sensi- tive):

Bender Bender working Bender drinking Bender laughing

By ending text

Match the ending text is similar to matching the beginning, but with a $ before the = (similar to the end-of-line anchor):

say $dom->find( 'img[alt$="ing"]' )->join( "\n" );

Now the selector matches three of the img tags:

Bender working Bender drinking Bender laughing

72 | CSS Selectors By substring

Put a * before the = to match an exact substring in the middle of the attribute value:

say $dom->find( 'img[alt*="kin"]' )->join( "\n" );

Now you get two of the img tags. Each of these have kin somewhere in the alt attribute:

Bender working Bender drinking

By exact match on multiple values

An attribute value may be treated as multiple distinct values. Use a ~ before the = to treat the value as if it is multiple values separated by whitespace.

This HTML uses the alt tag to list each person in the image. You can’t match the substring Bender because that shows up in the middle of the distinct name BenderBot2000. The text must match one of those virtually distinct values exactly:

use Mojo::DOM;

my $html = <<~'HTML'; Bender Fry Leela Leela Fry Bender Bender Leela BenderBot2000 HTML

my $dom = Mojo::DOM->new( $html );

say $dom->find( 'img[alt~="Bender"]' )->join( "\n" );

Selecting by Attribute Value | 73 This pulls out the img tags that have exactly Bender:

Bender Fry Leela Bender Fry Bender Leela

Using a | instead of the ~ assumes that the values are separated by hyphens. This is useful when one of the elements has it’s own whitespace (such as Bender Bot2000):

use Mojo::DOM;

my $html = <<~'HTML'; Bender-Fry-Leela Leela Bender-Fry Bender-Leela Bender Bot2000 HTML

my $dom = Mojo::DOM->new( $html );

say $dom->find( 'img[alt|="Bender"]' )->join( "\n" );

This selects the same img tags although the values are formatted slightly differently:

Bender-Fry-Leela Bender-Fry Bender-Leela

By class

There are two shortcuts for selecting by attribute values; the class and id attributes have special syntax. To select by a class value, separate the tag name and the value with a dot, such as div.mobile. This is a “class selector”:

74 | CSS Selectors use Mojo::DOM;

my $data =<<~'HTML';

Fry

Leela

HTML

my $dom = Mojo::DOM->new( $data )->at( 'div.mobile' ); say $dom;

This one selects only the div with the mobile tag:

Leela

The class tag is special because it allows several whitespace-separated val- ues. Each of these divs has two distinct classes rather than one value with a space in it:

use Mojo::DOM;

my $data =<<~'HTML';

Employee of the Month

Bender

Fry

Leela

Employee of the Day

Bender

Fry

Leela

HTML

say "Tags are:\n\t", Mojo::DOM->new( $data ) ->find( 'div.bender' ) ->join( "\n\t" );

The div.bender selector finds each div tag (and everything it contains) that

Selecting by Attribute Value | 75 has the bender class:

Tags are:

Bender

Bender

Append another dot and class value to select a combination of classes:

say "Tags are:\n\t", Mojo::DOM->new( $data ) ->find( 'div.bender.day' ) ->join( "\n\t" );

Now you get only one of the divs:

Tags are:

Bender

Note that class is treated specially here. Since the actual value is multiple values separated with spaces, you can’t use a part of that bigger value to find just one of the classes:

say "Tags are:\n\t", Mojo::DOM->new( $data ) ->find( 'div[class=bender]' ) # won't find it ->join( "\n\t" );

The *= looks for a substring in the attribute, but it’s still has the substring problem. It might find bend in the middle of bender:

say "Tags are:\n\t", Mojo::DOM->new( $data ) ->find( 'div[class*=bend]' ) ->join( "\n\t" );

76 | CSS Selectors Any element with that class

The class selector doesn’t need to select the same tag each time. The universal selector (implicit or explicit) matches any tag that has that combination of classes.

use Mojo::DOM;

my $data =<<~'HTML';

Robot of the Day

Human of the Day

Mutant of the Day

Bender

Fry

Leela

HTML

say "Tags are:\n\t", Mojo::DOM->new( $data ) ->find( '.leela' ) ->join( "\n\t" );

The result is the HTML with the leela class:

Tags are:

Mutant of the Day

Leela

I don’t recommend creating HTML that has all the data then using CSS to display only some of it. But, you are going to see many interesting situations when you start scraping the crazy ideas other people implement!

By ID

Tags can have a special id attribute to uniquely identify that tag directly. No other tag should have the same id value. This can make it easy to find

Selecting by Attribute Value | 77 parts of the content when the structure changes since you can go directly to the tag that you want.

Selecting by id is similar to that for a class but you use a # instead of the dot. There should only be one tag with a particular id value, so at is appropriate:

use Mojo::DOM;

my $data =<<~'HTML';

Bender

Fry

Leela

HTML

my $dom = Mojo::DOM->new( $data )->at( 'div#second' ); say $dom;

The at finds the first thing that matches the selector. The tag musthavean id attribute and that value must be exactly second. You get back the div with Fry:

Fry

The id value should be unique so you don’t need the tag name. Leave that off and specify only the id value. This also works:

my $dom = Mojo::DOM->new( $data )->at( '#second' );

A * (the universal selector) can specify the id value:

my $dom = Mojo::DOM->new( $data )->at( '*#second' );

Multiple attributes

You aren’t limited to matching one attribute of a tag. Specify multiple values in additional square braces:

78 | CSS Selectors use Mojo::DOM;

my $data = <<"HTML"; robot robot robot robot HTML

say Mojo::DOM ->new( $data ) ->find( 'img[alt=robot][width=500]' ) ->join( "\n" );

Several of the img tags have the right alt value and some have the right width value. There’s only one that has both of them at the same time:

robot

Selecting by Relative Position

A selector can match parts that are next to another tag. That tag might come before or after a different tag, or be contained by or contain that tag. You might know that a particular div contains what you want even though you’re not sure what exactly will be in that div.

Preceded by

A + between two selectors matches the element in the righthand part of the selector only if it immediately follows the element in the lefthand part. Noth- ing else can be between the two tags in the selector. Here’s some HTML with some h2 elements followed by some p elements. You only want the topic

Selecting by Relative Position | 79 paragraphs immediately after the headings. The h2 + p does that:

use Mojo::DOM;

my $data = <<~'HTML';

There are different types of characters.

Robots

Robots were supposed to help humans but now there are more robots than humans.

Humans

Humans were the dominant species on Earth.

There are other sorts of characters.

HTML

say Mojo::DOM ->new( $data ) ->find( 'h2 + p' ) ->join( "\n" );

You only match two paragraphs. The first p in the HTML snippet doesn’t follow a header; although the last paragraph follows a header it isn’t imme- diately after it. Neither of those show up in the results:

Robots were supposed to help humans but now there are more robots than humans.

Humans were the dominant species on Earth.

If you want any element that follows, use a ~ (tilde) instead of the +:

say Mojo::DOM ->new( $data ) ->find( 'h2 ~ p' ) ->join( "\n" );

This matches everything but the first paragraph since it’s before any h2. It matches the two p tags that come after the last h2 tag:

80 | CSS Selectors

Robots were supposed to help humans but now there are more robots than humans.

Humans were the dominant species on Earth.

There are other sorts of characters.

Succeeded by

This is a trick section because you don’t have a way to do this yet, at least not through Mojo. Often you want to extract elements only if they contain some other element. For example, you want to get all of the table rows that contain a td element (and not th elements).

You may have encountered the jQuery :has:

$( "tr" ).has( "td" ) $( "tr:has(td)" )

There is a :has in CSS Selectors Level 4, but as I write this almost no browser supports it.

There’s no surprise reveal. You don’t get to use this in Mojo, but I knew you were going to want it just like I want to use it.

However, you can do a little bit of extra work. You can select all the the TR tags, for example, then reject any that don’t have a TD tag. Consider this table example:

NameSpecies
BenderRobot

Selecting all the TR tags includes the header row with the TH tags. If you tried a selector such as tr td, you don’t get the results by rows:

use Mojo::DOM;

Selecting by Relative Position | 81 my $html = <<~'HTML';

NameSpecies
BenderRobot
FryHuman
HTML

my $dom = Mojo::DOM->new( $html );

say $dom ->find( 'tr td' ) ->join("\n")

The output is a series of TD tags where you don’t necessarily know the group- ing without advance knowledge of the table (and even then you now have to reconstruct it):

Bender Robot Fry Human

Do it in two steps instead. Select all the tr tags and use grep to get the ones that you want:

use Mojo::DOM;

my $html = <<~'HTML';

NameSpecies
BenderRobot
FryHuman
HTML

my $dom = Mojo::DOM->new( $html );

82 | CSS Selectors say $dom ->find( 'tr' ) ->grep( sub { $_->at('td')}) ->join("\n")

Children

The official CSS Selectors definition says a tag is a “child” of anothertagif that other tag is a parent of the first one. It then declines to define whata parent is. It’s probably a tag that has children. That’s not so helpful.

A tag is a child of another tag if it’s immediately contained in that tag. In this snippet, the div immediately contains a p tag. That p is a child of the div:

This is a child paragraph

A parent can have multiple children (but each child has exactly one parent). This div has three child p tags:

This is a child paragraph

This is another child paragraph

This is the last child paragraph

This top level div also has multiple children: two this time. However, it con- tains another div that also has two children. Those “grandchild” tags are still descendents of the top-level div, but they aren’t children of the top-level div because they are not immediately contained in it:

This is a child paragraph

This is another child paragraph

Selecting by Relative Position | 83

This is the last child paragraph

Here’s an example that has three img tags. Two of them are children to div elements. To select just the imgs inside a div, put the parent and child tags next to each other with a > between them:

use Mojo::DOM;

my $html = <<~'HTML';

HTML

say Mojo::DOM->new( $html ) ->find( 'div > img' ) ->map( attr => 'src' ) ->join( "\n" ) ;

The output only shows the image names from the tags inside the div:

Bender.png Leela.png

A parent or child can be more specific. This selector adds a class name tothe div:

say Mojo::DOM->new( $html ) ->find( 'div.mutants > img' ) ->map( attr => 'src' ) ->join( "\n" ) ;

The output shows only one image name now because there’s only one div that has the mutant class:

84 | CSS Selectors Leela.png

Change the HTML a little to wrap an img in a span. Now that img tag is a child of span but not a child of div:

use Mojo::DOM;

my $html = <<~'HTML';

HTML

say Mojo::DOM->new( $html ) ->find( 'div > img' ) ->map( attr => 'src' ) ->join( "\n" ) ;

This doesn’t select the image of Bender because its tag is not a child of a div (it’s a child of the span).

Leela.png

Line up several tags to get grandchildren (however many deep that you like:

say Mojo::DOM->new( $html ) ->find( 'div > span > img' ) ->map( attr => 'src' ) ->join( "\n" ) ;

Now it’s just one image name:

Bender.png

Selecting by Relative Position | 85 The universal selector works where you don’t know the intervening tag. This has the same output as the previous example:

say Mojo::DOM->new( $html ) ->find( 'div > * > img' ) ->map( attr => 'src' ) ->join( "\n" ) ;

Child by count

A parent can have many children but that doesn’t mean that you want to select all of them. Selecting ul > li matches all the list items:

use v5.14; # for s///r use Mojo::DOM;

my $html = <<~'HTML';

  • Kif Kroker
  • Zapp Brannigan
  • Nibbler
  • Scruffy
HTML

say Mojo::DOM->new( $html ) ->find( 'ul.characters > li' ) ->map( sub { $_->text =~ s/\s+\z//r } ) ->join( "\n" ) ;

Note the complicated code in map. Since the HTML leaves off the closing tag for the li, the DOM closes it for you when it runs into the next li. That leaves the newline and indent space in the text. You could fix that by closing the li tags but it seems nobody does that. Sometimes you need to massage the text a little. Every item in the list (without its trailing whitespace) is in

86 | CSS Selectors the output:

Kif Kroker Zapp Brannigan Nibbler Scruffy

Use :nth-child(N) to specify a particular child in a series (counting from 1). This selects the second child:

say Mojo::DOM->new( $html ) ->find( 'ul.characters > :nth-child(2)' ) ->map( sub { $_->text =~ s/\s+\z//r } ) ->join( "\n" ) ;

The output has one line:

Zapp Brannigan

Modify the ul to put a p in it after the first list item (now closed). Don’t worry about the invalid HTML; the web is full of it:

use v5.14; # for s///r use Mojo::DOM;

my $html = <<~'HTML';

  • Kif Kroker
  • Why is this here?

  • Zapp Brannigan
  • Nibbler
  • Scruffy
HTML

say Mojo::DOM->new( $html ) ->find( 'ul.characters > :nth-child(2)' ) ->map( sub { $_->text =~ s/\s+\z//r } )

Selecting by Relative Position | 87 ->join( "\n" ) ;

The output shows the text from the p:

Why is this here?

You probably didn’t mean to select that p. Constrain the child further by making it an li tag:

say Mojo::DOM->new( $html ) ->find( 'ul.characters > li:nth-child(2)' ) ->map( sub { $_->text =~ s/\s+\z//r } ) ->join( "\n" ) ;

This outputs nothing. It’s not looking for the second li. It’s looking at the second child then checking to see if that is an li. It’s not. You’ll fix that in a moment. Without the errant p it works out.

Changing the definite number to odd or even finds the elements whose posi- tions match that (still counting from 1):

say Mojo::DOM->new( $html ) ->find( 'ul.characters > :nth-child(even)' ) ->map( sub { $_->text =~ s/\s+\z//r } ) ->join( "\n" ) ;

The output has multiple elements this time:

Zapp Brannigan Scruffy

88 | CSS Selectors Counting backward

:nth-last-child(N) is like :last-child(N) but counts from the end instead. The last element is 1, the next to last is 2, and so on.

say Mojo::DOM->new( $html ) ->find( 'ul.characters > :nth-last-child(2)' ) ->map( sub { $_->text =~ s/\s+\z//r } ) ->join( "\n" ) ;

Using the same HTML snippet from the previous section, you get the second to last element:

Nibbler

You can use odd and even here and those are evaluated from the positions starting at the end too. The last element is position 1 and is odd regardless if there are an even number elements in either direction.

Shortcuts for special positions

:first-child is a shortcut for :nth-child(1) and :last-child is shortcut for :nth-last-child(1). Here’s a table that has a column with scores. Extract those scores and sum them:

use Mojo::DOM;

my $html = <<~'HTML';

IDNameScore
1 Nibbler 1023
27Scruffy 39
5 Zoidberg5834

Selecting by Relative Position | 89 HTML

my @scores = Mojo::DOM->new( $html ) ->find( 'table.scores > tr > td:last-child' ) ->map( 'text' ) ->each ;

use List::Util qw(sum); my $grand = sum( @scores ); say "Grand total: $grand";

The selector finds each table row that has td tags (and not the table heading tags th). In each row it extracts the last column to get all of the scores. From that it’s easy compute the sum:

Grand total: 6896

You don’t need List::Util there, though. Mojo::Collection has reduce method that can handle that:

my $grand = Mojo::DOM->new( $html ) ->find( 'table.scores > tr > td:last-child' ) ->map( 'text' ) ->reduce( sub { $a + $b } ) ;

Counting by type

The :nth-child(N) counted by position of all siblings then looked at the tag. The nth-of-type(N) does that the other way around by matching the type then looking at its position in the siblings of just that type. You must specify the element type for this one. This doesn’t require that a tag be a child, either. As before, this starts counting at 1.

In a prior example you saw a stray p element inside a list. That counts as a

90 | CSS Selectors sibling to those li tags and :nth-child(N) counts it. Instead, use li:nth-of- type(N) to skip over any siblings that aren’t li tags:

use v5.14; # for s///r use Mojo::DOM;

my $html = <<~'HTML';

  • Kif Kroker
  • Why is this here?

  • Zapp Brannigan
  • Nibbler
  • Scruffy
HTML

say Mojo::DOM->new( $html ) ->find( 'ul.characters > li:nth-of-type(2)' ) ->map( sub { $_->text =~ s/\s+\z//r } ) ->join( "\n" ) ;

Now you get the second list item:

Zapp Brannigan

Changing that from a definite number to odd selects elements in the odd po- sitions:

say Mojo::DOM->new( $html ) ->find( 'ul.characters > li:nth-of-type(odd)' ) ->map( sub { $_->text =~ s/\s+\z//r } ) ->join( "\n" ) ;

Now there are two results:

Kif Kroker Nibbler

Selecting by Relative Position | 91 There’s also :first-of-type and :last-of-type, but you don’t need another example for those.

Descendents

A descendent of a tag is any tag that’s a child of that tag or a child of any descendent of that tag. That sounds a bit circular (again). A child is a de- scendent tag of its parent. Any child of that child (grandchildren) is also a descendent of the original parent.

To select a descendent, put the tags next to each other with only whitespace between them. The div img selector finds any img tags found anywhere in a div:

use Mojo::DOM;

my $html = <<~'HTML';

HTML

say Mojo::DOM->new( $html ) ->find( 'div img' ) ->map( attr => 'src' ) ->join( "\n" ) ;

Now the output contains the image name that was under the span:

Bender.png Leela.png

92 | CSS Selectors Combinations

Combining multiple selectors with a comma matches any elements that match any one of them. Here are a bunch of img and p tags. Some of the img tags have alt attributes and one of the p tags has a class. You want to select just those elements:

use Mojo::DOM;

my $html = <<~'HTML'; Bender Leela

Planet Express is closed.

HTML

say Mojo::DOM->new( $html ) ->find( 'img[alt], p.copyright' ) ->join( "\n" ) ;

The output shows the elements that matched any of the selectors:

Bender Leela

The :matches() is an experimental feature from Selectors Level 4 that does the same thing (unless it changes):

say Mojo::DOM->new( $html ) ->find( ':matches( img[alt], p.copyright )' ) ->join( "\n" ) ;

Selecting by Relative Position | 93 Matching elements that aren’t

The :not() matches elements that don’t match what you’ve specified. This is an experimental feature from Selectors Level 3 and may change, but here’s what it does so far. It selects elements by what they aren’t. You want to match everything other than those particular img tags:

say Mojo::DOM->new( $html ) ->find( ':not( img[alt] )' ) ->join( "\n" ) ;

Anything that’s not an img tag with an alt tag shows up in the output:

Planet Express is closed.

Selectors Level 4 allows you to specify a series of selectors inside :not() to ex- clude any element that matches any of these. That is, the excluded element doesn’t have to match every selector you specified. This selector excludes anything that’s an img or any element that class a copyright class:

say Mojo::DOM->new( $html ) ->find( ':not( img, .copyright )' ) ->join( "\n" ) ;

The non-classed paragraph is the only match:

Planet Express is closed.

Some sites can trap web scrapers by including links that they don’t actually display. In this snippet, the a tag for the first link has a style attribute that specifies display:none. You won’t see that link in your interactive browser because the styling hides it. However, you’ll probably extract that link in a

94 | CSS Selectors web spider. If you know that trick, you can filter those links before you use them:

use Mojo::DOM;

my $html = <<~'HTML';

This is a paragraph

Bender Leela Fry HTML

my $dom = Mojo::DOM->new( $html );

say $dom ->find( 'a:not( [style^=display] )' ) ->map( attr => 'href' ) ->join( "\n" );

This selects all the links that don’t have that

/leela /fry

Here’s a better example that fetches the Unicode 9 emojis. In this page they are listed in li tags that each have an a tag. However, there are other ele- ments that would match li a. Those extra li tags are under ul tags with a class, though, so you can exclude them by attaching the :not to the ul:

use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

my $url = 'https://blog.emojipedia.org/new-unicode-9-emojis/'; my $tx = $ua->get( $url );

die "That didn't work!\n" unless $tx->result->is_success;

say $tx->result ->dom

Selecting by Relative Position | 95 ->find( 'ul:not( [class] ) li a' ) ->map( 'text' ) ->join( "\n" );

This particular code example is expanded in Find the new emojis in Perl’s Unicode support.

Summary

Selectors allow you to extract parts of a structured document. Mojo supports CSS Selectors Level 3 which has some powerful ways to slice and dice the data. Most of these are simple examples but you can combine them in various complex ways to get most of what you want.

96 | CSS Selectors Chapter 6

Requests

When you do things right, people won’t be sure you’ve done anything at all.

A request is the start of the HTTP conversation and there’s quite a bit more that Mojo does to make this happen. This chapter focuses on the various ways to build a request and send the request. chapter 9 expands on this to manage non-blocking concurrent requests.

The Request Process

When you call get it makes the request immediately and your program waits until it is done. It returns a Mojo::Transaction::* object:

my $ua = Mojo::UserAgent->new; my $tx = $ua->get( 'https://www.httpbin.org' );

Several steps happen before the transactor actually makes the request. Let’s

| 97 go through them even though you usually don’t need to implement this level of detail.

First, the build_tx fills in the request portion of a new transaction before contacting a server. There’s a transactor object that prepares everything:

my $tx = $ua->build_tx( GET => 'https://www.httpbin.org' );

The build_tx is the same as diving deeper to use the transactor object itself, although normally you wouldn’t want to do this:

my $t = Mojo::UserAgent::Transactor->new; my $tx = $t->tx( GET => 'https://www.httpbin.org' );

Since you haven’t contacted the server yet, you can change or augment the request through that transaction:

$tx->req->headers->header( 'X-Bender' => 'Bite my shiny metal ass!' );

The start then takes a transaction and actually performs the request:

$ua->start( $tx );

This makes the request, receives the response, and parses the response to create the result object that it adds to the transaction. After that you look in that transaction object to see what happened. You’ll have to wait until the next chapter to read how you’ll deal with those.

98 | Requests Adding Headers

The request is a Mojo::Message::Request object, although most of the stuff you’ll want to do comes from the more general Mojo::Message base class. You can configure a request without contacting any server. At the higher levels, you’ll use the same techniques to create and make the request in one step.

Adding to a single request

The easiest way to add headers is with a hash reference right after the URL. You can add any headers you like here, although special, application-defined header names often begin with X- so they don’t conflict with pre-defined header names:

my $tx = $ua->get( 'https://www.httpbin.org/headers' => { 'X-Bender' => 'Bite my shiny metal ass', } ); say $tx->req->to_string;

The X-Bender header line now shows up:

GET /headers HTTP/1.1 Accept-Encoding: gzip Content-Length: 0 Host: www.httpbin.org X-Bender: Bite my shiny metal ass User-Agent: Mojolicious (Perl)

Use an array reference for the header value to create multiple header lines with the same name:

Adding Headers | 99 my $tx = $ua->build_tx( GET => 'http://www.httpbin.org', => { 'X-Bender' => [ 'Bite my shiny metal ass', 'Beer!', ], } ); say $tx->req->to_string;

The from_hash allows you to do the same thing after you’ve built the trans- action:

my %extra_headers = ( 'X-Bender' => [ 'Bite my shiny metal ass', 'Beer!', ], );

$tx->req->headers->from_hash( \%extra_headers );

Either way produces a request with two separate X-Bender header lines:

GET / HTTP/1.1 Content-Length: 0 X-Bender: Bite my shiny metal ass X-Bender: Beer! Accept-Encoding: gzip User-Agent: Mojolicious (Perl) Host: www.httpbin.org

Call headers (plural) on the request to get the object that organizes all the headers, then call header (singular) on that to set a particular header:

my $tx = $ua->build_tx( ... ); $tx->req->headers->header( 'X-Bender' => 'Bite my shiny metal ass'

100 | Requests );

The arguments after the first one are separate values for the same header name. This creates two X-Bender header lines:

my $tx = $ua->build_tx( ... ); $tx->req->headers->header( 'X-Bender' => 'Bite my shiny metal ass', 'Beer!' );

This is the same as calling add multiple times:

$tx->req->headers->add( 'X-Bender' => 'Bite my shiny metal ass', ); $tx->req->headers->add( 'X-Bender' => 'Beer', );

But this is fluent programming and add returns the headers object; you can call add again immediately:

$tx->req->headers ->add( 'X-Bender' => 'Bite my shiny metal ass' ); ->add( 'X-Bender' => 'Beer' );

There’s a trick here, though. Some operations “finalize” the headers and con- tent. For example, peeking at the request finalizes the request:

say $tx->req->to_string;

Once the request stringifies the headers to make the actual request, it caches the result. You can make changes to the object, but the request that comes out doesn’t change.

You can mess with the innards of the object to delete the cached strings, but you’re not supposed to do that. Instead, ensure that you set up your request all at once then inspect it.

Adding Headers | 101 Removing a header

If you’d like to delete a header that’s easy too:

$tx->req->headers->remove( 'User-Agent' );

Removing a header doesn’t mean it won’t be in the request. There might be other things after your call to remove that add back the same headers.

Multiple values in one header

Some headers can take multiple values in a single line. The Accept header tells the server what sorts of content that your client can process. The append adds a value to header line by appending a comma, a space, and the next value. This works so long as the server expects that sort of arrangement.

The Accept header values are MIME types such as text/plain. Interactive browsers often set this to */* because they handle most things they are likely to encounter but your application may be different. Perhaps you only want JSON or plain text:

use Mojo::UserAgent; my $ua = Mojo::UserAgent->new; my $tx = $ua->build_tx( GET => 'http://www.httpbin.org', );

$tx->req ->headers ->accept( 'application/json' ) ->append( Accept => 'text/plain' ) ;

say $tx->req->to_string;

The output shows that the Accept header has two MIME types in the same

102 | Requests line, and that the values are separated by a comma:

GET / HTTP/1.1 Host: www.httpbin.org Accept: application/json, text/plain User-Agent: Mojolicious (Perl) Accept-Encoding: gzip Content-Length: 0

The GitHub API is particularly interesting in its use of Accepts to turn on experimental features for enhanced responses. In the repositories part of their API, include a repositories “topics” by including a special MIME type in Accept:

use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

my $url = 'https://api.github.com/users/briandfoy/repos'; my $topics_type = 'application/vnd.github.mercy-preview+json';

my $tx = $ua->build_tx( GET => $url ); $tx->req ->headers ->append( Accept => $topics_type ) ;

And, now that you know about topics, go to GitHub and add “perl” as a topic to all of your Perl projects or “mojolicious” to all of those projects. Let your (perl) freak flag fly.

Pre-defined header name methods

Some headers have pre-defined method names in Mojo::HTTP::Headers. These allow you a shortcut to setting their value.

There’s a User-Agent heading that identifies your client. If you don’t change anything, Mojo sets this to Mojolicious (Perl) as you’ve already seen in the

Adding Headers | 103 request output.

Some servers use this to allow or deny access or to provide different features. You may not want the other side to know what you are so you can change that value. The user_agent method does that. Note that the hyphen in the header field becomes an underscore in the method name:

$tx->req->headers->user_agent( $agent_name );

However, the transactor has a name for this. Instead of changing the header for a specific request, change the name at a higher level to apply to allre- quests:

$tx->name( $agent_name );

The Referer header (yes, it’s a historical misspelling) tells the server how you discovered the URL. Some applications may enforce a clear path through its interface, so you may need to specify what the prior page should have been for the next part to work correctly. Beyond that, some servers use this to find who’s linking to resources for analytics or to find out the sourceofbad or outdated links. The method name for this header has “referrer” spelled correctly:

$tx->req->headers->referrer( $source_url );

There’s an ETag response header field that identifies a resource version bya string that you can provide (and it does not specify how you come up with that string). A caching client can tell the server which versions it has with the If-None-Match header field and that string:

my %etags = ( 'https://www.perl.com' => "5b5fcb69-8248", ); if( exists $etags{$url} ) { $tx->req->if_none_match( $etags{$url} ); }

In chapter 8 you’ll see more about conditional responses and caching so you

104 | Requests don’t have to transfer the resource again.

Adding headers to all requests

Sometimes you want to add a header to all requests. This often happens in programs that deal with a single server. The Mojo::UserAgent object emits the prepare event before it tries to connect to the server, and passes the user- agent and transaction as the arguments:

use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new; $ua->on( prepare => sub ( $ua, $tx ) { ... } );

You can do whatever you like in that code reference. You could add a header to every request. Modify the transaction object before it connects:

$ua->on( prepare => sub ( $ua, $tx ) { $tx->req->headers->header( 'X-Robot: Bender' ); } );

You don’t have to modify the request in the handler; do whatever you like. Perhaps you merely count how many times you make the request:

my $count = $ua->on( prepare => sub ( $ua, $tx ) { state $count = 0; say "======Request ", $count++; } );

Adding Headers | 105 Authentication

Many services require of some sort of verification. There’s something that you need to include in the request to prove that you are who you say you are. Sometimes that’s as simple as including a shared secret in the header, but sometimes it’s not so simple.

There are actually several concepts that I’ll conflate into “authentication”. There’s the verification of your identity (authentication), the permission fora known user to do something (authorization), and the enforcement of various security rules (access).

Sometimes these distinctions matter. For example, the GitHub API allows you to use a Personal Access Token. That establishes who you are (authen- tication). However, each token can have various “scopes”, such as read-only access to a repo or admin rights on that repo (authorization). Even though GitHub knows who you are through that token, it won’t let you do anything that token’s scopes disallow. In that case, authentication and authorization are tied to the same shared secret.

You can create different credentials to compartmentalize permissions. If your user-agent needs to only read information, it shouldn’t be able to change any of it, for example. For those situations where you want to change some- thing, you can have a separate token with different scopes.

Basic authentication

Basic authentication is as simple as it sounds. You have a username and password that you supply to the server. Mojo allows you to add this to the URL so you don’t have to worry about the headers:

use Mojo::URL; use Mojo::UserAgent;

my $username = ...; my $password = ...;

106 | Requests my $url = Mojo::URL ->new( "http://$username:$password\@www.example.com" );

say $url;

my $ua = Mojo::UserAgent->new; my $tx = $ua->build_tx( GET => $url ); say $tx->req->to_string;

Although you added the userinfo, it does not show up when you stringify the URL. When you build the request, it does show up in the Authorization header:

http://www.example.com GET / HTTP/1.1 Host: www.example.com Authorization: Basic Zm9vOmJhcg== Content-Length: 0 User-Agent: Mojolicious (Perl) Accept-Encoding: gzip

This has the advantage of properly dealing with a literal @ in the password, whereas the URL parser would think it ended the password (leaving the rest of the password in the host):

use Mojo::UserAgent;

my $username = 'foo'; my $password = 'b@r';

my $url = Mojo::URL ->new( "http://$username:$password\@www.example.com" );

say $url;

say "host: ", $url->host; say "userinfo: ", $url->userinfo;

Authentication | 107 The output shows that the parser gets confused:

http://r%40www.example.com host: [email protected] userinfo: foo:b

Instead of letting the parser guess, add the username and password with a method instead:

my $url = Mojo::URL ->new( "http://www.example.com" ) ->userinfo( join ':', $username, $password );

This still has the problem that a colon in the username might cause a prob- lem, so don’t use colons in usernames.

I tend to prefer adding the Authorization header myself since I’m often us- ing some of the authorization methods you’ll see in a moment. I do need to Base64 encode the joined username and password on my own though:

use Mojo::Util qw(b64_encode);

my $tx = $ua->new( $url => { Authorization => 'Basic ' . b64_encode( join ':', $username, $password ) } );

Notice that the header value starts with Basic to denote the type of autho- rization.

API keys

Server APIs may use other authorization methods and their documentation should tell you how to do that. Typically, you register with the API to get a token string. Add that to the Authorization header according to the API documentation. For example, the GitHub API calls it token instead of Basic

108 | Requests (although it handles both):

my $tx = $ua->new( $url => { Authorization => "token $token" } );

OAuth

Complex authentication schemes are one of the tougher things to handle in a command-line client. There’s often multiple web conversations happening to just get to the point that you are allowed to continue. You can implement that yourself but its tedious.

I don’t want to write a book about OAuth (and I don’t think anyone really does). You want to get access and refresh tokens without caring about the machinations to produce them. Fortunately, much of the work has already been done for you. Unfortunately, there are some manual steps at the begin- ning. Once started, though, your interaction is almost nothing.

Let’s use the Google Drive REST as an example. You need a token but you also need to interact with Google to start the process. Register your applica- tion (Google Developers Console) to get an ID and a secret. Use those with OAuth::Cmdline::GoogleDrive to generate both the token and the refresh to- ken that you need (when your token expires, the refresh token lets you get a new one).

There’s still the manual step; OAuth::Cmdline::Mojo starts a server that links to the right Google page that asks you for permission for programmatic ac- cess to services. When you confirm that, it redirects to your Mojo server with the tokens you need:

my $client_id = ...; my $client_secret = ...;

use OAuth::Cmdline::GoogleDrive; use OAuth::Cmdline::Mojo;

Authentication | 109 my $oauth = OAuth::Cmdline::GoogleDrive->new( client_id => $client_id, client_secret => $client_secret, login_uri => "https://accounts.google.com/o/oauth2/auth", token_uri => "https://accounts.google.com/o/oauth2/token", scope => "https://www.googleapis.com/auth/drive ", );

my $app = OAuth::Cmdline::Mojo->new( oauth => $oauth, );

$app->start( 'daemon', '-l', $oauth->local_uri );

This creates a .google-drive.yml file with your tokens. OAuth::Cmdline::GoogleDrive has the values it needs. It also refreshes the token for you. When you use that module in another program, it supplies an access token for you. Use it at the start of every request to add the token:

use Mojo::UserAgent;

use OAuth::Cmdline::GoogleDrive;

my $url = Mojo::URL->new( 'https://www.googleapis.com/drive/v3/files' );

my $google_drive = Mojo::UserAgent->new; $google_drive->on( start => sub { state $oauth = OAuth::Cmdline::GoogleDrive->new; my( $ua, $tx ) = @_; my $token = $oauth->access_token; $tx->req->headers->authorization( "Bearer $token" ); } );

my $tx = $google_drive->get( $url ); foreach my $hash ( $tx->result->json->{files}->@* ) { say join " -> ", $hash->@{ qw(name mimeType) } }

110 | Requests These credentials eventually expire, but OAuth::Cmdline::GoogleDrive should refresh those for you.

Multiple user-agents

You’re not limited to one user-agent though. Construct different user-agents for the servers you might connect to. This compartmentalizes their config- uration and operation. Rate limiting on one user-agent doesn’t have to af- fect the other, for instance. In this example, the user-agent connecting to Travis CI adds an Authorization request header and also sets a special value for Accept. The user-agent connecting to AppVeyor doesn’t need the special Accept:

my $travis_ua = Mojo::UserAgent->new(); $travis_ua->transactor->name( 'My Travis UA/1.0.0' ); $travis_ua->on( start => sub { my( $ua, $tx ) = @_; $tx->req->headers->authorization( "token $ENV{TRAVIS_API_KEY}" ); $tx->req->headers->accept( 'application/vnd.travis-ci.2.1+json' ); } );

my $appveyor_ua = Mojo::UserAgent->new(); $appveyor_ua->transactor->name( 'My AppVeyor UA/1.0.0' ); $appveyor_ua->on( start => sub { my( $ua, $tx ) = @_; $tx->req->headers->authorization( "Bearer $ENV{APPVEYOR_API_KEY}" ); } );

Notice that both use API keys, but they denote different types of authoriza- tion.

Aside from this, you can generalize this a bit by looking for the right request header value in a hash or other data structure. If that exists for the server you are contacting, you add the header. If it doesn’t exist, you don’t add the

Authentication | 111 header. Lately I’ve preferred separate user-agents, though.

Adding Message Bodies

A request can send data. A message body contains those data (and the head- ers describe it). You can do this yourself or use one of the content generators to do it for you. You’ll see more of this in chapter 10 when you make your own formatter.

JSON

Sending JSON data is so easy that you don’t need to know how to construct it yourself or even understand what it is. Specify json when you construct the request and give it a Perl data structure. Mojo will convert that data structure to the correct data structure. You’re likely to need to use a POST or other non-idempotent method if you are sending JSON:

use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

my %hash = ( robot => 'Bender', human => 'Fry', mutant => 'Leela', others => [ qw(Farnsworth Nibbler) ], ); # httpbin will show the JSON you passed my $url = 'http://www.httpbin.org/post'; my $tx = $ua->build_tx( POST => $url, json => \%hash );

say $tx->req->to_string;

The request shows the converted data, the correct Content-Type request header, and the correct Content-Length:

112 | Requests POST / HTTP/1.1 Host: www.example.com Content-Type: application/json Content-Length: 83 Accept-Encoding: gzip User-Agent: Mojolicious (Perl)

{"human":"Fry","mutant":"Leela","others": ["Farnsworth","Nibbler"],"robot":"Bender"}

Run this to look at the response that httpbin.org sends back. Add this line to the end of your program:

say $tx->result->body;

The JSON is embedded in another JSON response:

HTTP/1.1 200 OK Content-Length: 587 Date: Sun, 14 Apr 2019 23:06:54 GMT Access-Control-Allow-Origin: * Server: nginx Connection: keep-alive Content-Type: application/json Access-Control-Allow-Credentials: true

{ "args": {}, "data": "{\"human\":\"Fry\",\"mutant\":\"Leela\", \"others\":[\"Farnsworth\",\"Nibbler\"],\"robot\": \"Bender\"}", "files": {}, "form": {}, "headers":{ "Accept-Encoding": "gzip", "Content-Length": "83", "Content-Type": "application/json", "Host": "httpbin.org", "User-Agent": "Mojolicious (Perl)"

Adding Message Bodies | 113 }, "json":{ "human": "Fry", "mutant": "Leela", "others":[ "Farnsworth", "Nibbler" ], "robot": "Bender" }, "origin": "107.152.104.229, 107.152.104.229", "url": "https://httpbin.org/post" }

Form data

You have two ways to add form data. First, it can show up in the URL. This is easy when something else constructs the URL for you, such as something you cut and paste from an interactive browser:

my $url = 'http://www.httpbin.org/get?human=Fry⏎ &mutant=Leela&others=Farnsworth⏎ &others=Nibbler&robot=Bender'; my $tx = $ua->build_tx( GET => $url );

That shows up in the first line of the request because it’s part of theURL:

GET /?human=Fry&mutant=Leela&others=Farnsworth⏎ &others=Nibbler&robot=Bender HTTP/1.1 User-Agent: Mojolicious (Perl) Host: www.httpbin.org Accept-Encoding: gzip Content-Length: 0

The second way lets Mojo figure out how to put the URL together. Specify form with a hash reference when you build the request, just as you did with

114 | Requests json:

my %hash = ( robot => 'Bender', human => 'Fry', mutant => 'Leela', others => [ qw(Farnsworth Nibbler) ], ); my $url = 'http://www.httpbin.org/get'; my $tx = $ua->build_tx( GET => $url, form => \%hash );

say $tx->req->to_string;

The form data are in the first line with the path. Mojo handled all the con- version and put it there for you:

GET /?human=Fry&mutant=Leela&others=Farnsworth⏎ &others=Nibbler&robot=Bender HTTP/1.1 User-Agent: Mojolicious (Perl) Host: www.example.com Accept-Encoding: gzip Content-Length: 0

Try this in your browser and httpbin will show you the form parameters in its output:

% curl "http://httpbin.org/get?human=Fry⏎ &mutant=Leela&others=Farnsworth&others=Nibbler⏎ &robot=Bender" { "args":{ "human": "Fry", "mutant": "Leela", "others":[ "Farnsworth", "Nibbler" ], "robot": "Bender" },

Adding Message Bodies | 115 "headers":{ "Accept": "*/*", "Host": "httpbin.org", "User-Agent": "curl/7.54.0" }, "origin": "107.152.104.229, 107.152.104.229", "url": "https://httpbin.org/get?human=Fry& mutant=Leela&others=Farnsworth&others=Nibbler &robot=Bender" }

Change that to a POST request:

my $url = 'http://www.httpbin.org/post'; my $tx = $ua->build_tx( POST => $url, form => \%hash ); say $tx->req->to_string;

Now the form data show up in the message body, but there’s also a Content- Type and Content-Length header:

POST /post HTTP/1.1 Accept-Encoding: gzip Content-Length: 68 Host: www.example.com Content-Type: application/x-www-form-urlencoded User-Agent: Mojolicious (Perl)

human=Fry&mutant=Leela&others=Farnsworth⏎ &others=Nibbler&robot=Bender

In rare cases, I’ve run into the servers that expect the data in a particular order. In that case you can’t rely on the order that Mojo might pick. Make the message body yourself in that case:

my $tx = $ua->build_tx( POST => 'http://www.example.com' );

$tx->req->body( 'mutant=Leela&robot=Bender' ); $tx->req->headers->content_type(

116 | Requests 'application/x-www-form-urlencoded' );

Mojo will automatically update the Content-Length for you, but you still need to set the Content-Type header.

Uploading a File

A file upload is a particular HTML form input field. You’ve seen “Choose file” buttons that bring up a file picker dialog or allow you to draganddrop a file onto it. Uploading a file through Mojo doesn’t take muchwork.

If the form value is a hash reference and it has a key named file, Mojo finds that file, creates a Mojo::Asset::File object to represent it, and adds that as a multipart form to the message body. It will read the file and send it to the server in chunks when it makes the actual request so it doesn’t waste memory. If it can’t find the file it does the same thing with no data (so,check the file existence first!). The form name doesn’t matter; here it’s file_upload _field_name but it could be anything:

my $ua = Mojo::UserAgent->new; my $tx = $ua->build_tx( POST => 'http://httpbin.org/post' => form => { file_upload_field_name => { file => '/home/bender/upload.txt' } } );

say $tx->req->to_string;

The file contents show up in the message body:

POST /post HTTP/1.1 Accept-Encoding: gzip Content-Length: 136

Uploading a File | 117 Host: httpbin.org Content-Type: multipart/form-data; boundary=tMXDX User-Agent: Mojolicious (Perl)

--tMXDX Content-Disposition: form-data;⏎ name="file_upload_field_name"; filename="upload.txt"

This is the file contents --tMXDX--

Extra keys in the hash show up as headers for that part of the upload. You can say more about the file type or anything else you want todo:

my $ua = Mojo::UserAgent->new; my $tx = $ua->build_tx( POST => 'http://httpbin.org/post' => form => { file_upload_field_name => { file => '/home/bender/upload.txt', 'X-Robot' => 'Bender', 'Content-Type' => 'text/plain', } } );

say $tx->req->to_string;

POST /post HTTP/1.1 Host: httpbin.org Content-Type: multipart/form-data; boundary=umEJX User-Agent: Mojolicious (Perl) Accept-Encoding: gzip Content-Length: 161

--umEJX X-Robot: Bender Content-Disposition: form-data;⏎ name="file_upload_field_name"; filename="upload.txt" Content-Type: text/plain

118 | Requests Death to humans! --umEJX--

There’s still room for the other form input types. You can have a file upload with other query values:

my $tx = $ua->build_tx( POST => 'http://httpbin.org/post' => form => { file_upload_field_name => { file => '/home/bender/upload.txt' }, city => 'New New York', planet => 'Earth', } );

say $tx->req->to_string;

Each form value shows up in its own part. You don’t really care about this because you almost never need to look at it:

POST /post HTTP/1.1 Content-Length: 271 Content-Type: multipart/form-data; boundary=j67eX Accept-Encoding: gzip User-Agent: Mojolicious (Perl) Host: example.com

--j67eX Content-Disposition: form-data; name="city"

New New York --j67eX Content-Disposition: form-data;⏎ name="file_upload_field_name"; filename="upload.txt"

This is the file contents

Uploading a File | 119 --j67eX Content-Disposition: form-data; name="planet"

Earth --j67eX--

You don’t need an actual file though. If the hash key is literally content then its value is the “file” contents and shows up in the request directly. Youcan specify the name with the filename key (this works with a file key too):

my $tx = $ua->build_tx( POST => 'http://httpbin.org/post' => form => { file_upload_field_name => { content => 'Bite my shiny metal ass!', filename => 'taxes.txt', } } );

say $tx->req->to_string;

The string shows up in the multipart data and has the filename that you specified:

POST /post HTTP/1.1 Accept-Encoding: gzip Content-Type: multipart/form-data; boundary=2NGJX Content-Length: 127 Host: example.com User-Agent: Mojolicious (Perl)

--2NGJX Content-Disposition: form-data; name="file_upload_field_name"; filename="taxes.txt"⏎

Bite my shiny metal ass! --2NGJX--

120 | Requests You aren’t limited to one file upload. Use more keys to specify extra files:

my $tx = $ua->build_tx( POST => 'http://httpbin.org/post' => form => { file_upload_field_name => { content => 'Bite my shiny metal ass!', filename => 'taxes.txt', }, other_upload_field_name => { file => '/home/bender/upload.txt', filename => 'taxes2.txt', } } );

say $tx->req->to_string;

Now there is more than one part in the message body:

POST /post HTTP/1.1 Host: httpbin.org Content-Length: 253 Content-Type: multipart/form-data; boundary=J3hmX Accept-Encoding: gzip User-Agent: Mojolicious (Perl)

--J3hmX Content-Disposition: form-data;⏎ name="file_upload_field_name"; filename="taxes.txt"

Bite my shiny metal ass! --J3hmX Content-Disposition: form-data;⏎ name="other_upload_field_name"; filename="taxes2.txt"

This is the file contents

--J3hmX--

Uploading a File | 121 Using an array as the form value allows you to specify multiple files for the same form name:

my $tx = $ua->build_tx( POST => 'httpbin.org/post' => form => { file_upload_field => [ { file => '/home/bender/upload.txt', }, { file => ''/home/bender/upload2.txt', }, ], } );

say $tx->req->to_string;

The form field name is the repeated in the message body but both filesshow up:

POST /post HTTP/1.1 User-Agent: Mojolicious (Perl) Content-Type: multipart/form-data; boundary=hUVkX Content-Length: 247 Accept-Encoding: gzip Host: httpbin.org

--hUVkX Content-Disposition: form-data;⏎ name="file_upload_field"; filename="upload.txt"

This is the file contents

--hUVkX Content-Disposition: form-data;⏎ name="file_upload_field"; filename="upload2.txt"

All humans must die!

122 | Requests --hUVkX--

Streaming uploads

What if the data you want to send to the server are very large (whatever that means to you)? You likely don’t want to read in the contents of a very large file, build a request around it, then send the whole package to theserver.

You can build the initial transaction without the message body. After that, you can specify that the content for the request will come from something handled by Mojo::Asset::*. Mojo then knows where the content will come from and will stream it to the server when that time comes:

my $tx = $ua->build_tx( PUT => $url ); $tx->req->content->asset( Mojo::Asset::File->new( path => $filename ) );

Attach a progress meter to this by subscribing to the progress and finish events. There’s a slight complication because these events are handled by a request; you need to wait for a request to subscribe to them. For that, use the start handler on the user-agent object to get the request when its built, then subscribe to those events:

STDOUT->autoflush;

use Mojo::Base -strict, -signatures; use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new; my $tx = $ua->build_tx( POST => 'http://httpbin.org/anything' );

my $filename = $ARGV[0] // die "Specify a file!\n";

Uploading a File | 123 $tx->req->content->asset( Mojo::Asset::File->new( path => $filename ) );

$ua->on( start => \&start_handler );

$ua->start($tx);

# when the user-agent builds a transaction, add some # handlers to the request object sub start_handler ( $ua, $tx ) { $tx->req->on( progress => \&progress ); $tx->req->on( finish => sub { say "\nUploaded!" } ); }

sub progress ( $message, $part, $offset ) { # don't do anything is there is no message body return unless my $length = eval { $message->headers->content_length; }; # ignore the progress for 'head' return unless $part eq 'body'; print "Progress: ", int( 100 * $offset / $length ), "%\r"; }

You’ll see another progress meter in chapter 8. That one does it a bit differ- ently but it’s the same idea.

Miscellaneous Settings

There are a few other things that you probably rarely need or want to change but are there for you.

124 | Requests Timeouts

Some requests won’t ever reach a server, which might not exist or might not accept your connection (intentionally, overloaded, or otherwise). You don’t want this to stop your program.

$ua->connect_timeout;

If that’s too short (or too long) you can change it. The valuer is the number of seconds to wait:

$ua->connect_timeout(30);

Reüsing connections

If you connect to a site you’re likely to connect to it again very soon. Mojo automatically keeps a certain number of connections open (this is the HTTP idea of “keep alive”). There’s a default number that Mojo will “keep alive”:

my $ua = Mojo::UserAgent->new; say "Max connections is ", $ua->max_connections; # 5

You can change that default. You can have a large number (and maybe larger than the number of the simultaneous connections the computer will allow you):

$ua->max_connections(50_000);

If you set the number to zero (or lower), Mojo doesn’t keep any connection open. Every request creates a new connection. You might like that but real- ize every request will do some extra work:

$ua->max_connections(0); # turn off

Miscellaneous Settings | 125 These settings apply to the entire user-agent. If you want to check that a transaction used this feature, check kept_alive:

say 'Kept alive? ', $tx->kept_alive;

If you don’t want this for all requests, remember that you can have as many user-agents as you like.

Dealing with proxies

Tom Leek’s answer to “Does an HTTPS proxy encrypt traffic between proxy client and server for HTTP requests?” is a gentle introduction to the me- chanics of proxies. But, you don’t need to know any of that because Mojo handles it for you.

Use the proxy method to set both the http and https proxies with whatever information that your network administrators gave to you (or the URLs for the Mojo servers that you built to be your proxies). Mojo::UserAgent::Proxy takes care of this for you; get to that with the proxy method:

$ua->proxy ->http('http://localhost:8080') ->https('https://localhost:8080');

Or maybe you’d like to use SOCKS instead. The short answer is that a SOCKS proxy operates at a lower level than HTTP (so can proxy other protocols too):

$ua->proxy ->http('socks://127.0.0.1:9050')

There are some free proxies you can try, many of which are listed on Free Proxy List. Or, use Mojolicious to write your own!

Instead of specifying the proxies in code, set the MOJO_PROXY to a true value and the user-agent will try to automatically detect the proxies by checking the values in the HTTP_PROXY and HTTPS_PROXY environment variables (all up-

126 | Requests percase or all lowercase).

That automatic detection happens at the start of a request and not when you construct the user-agent. You can do it right away yourself with detect:

my $ua = Mojo::UserAgent->new; my $proxy = $ua->proxy->detect;

The NO_PROXY (or no_proxy) environment variable is a comma-separated list of host substrings to exclude from the proxy. The host name of each request URL is matched against those substrings. If any match then that connection won’t use the configured proxy:

use Mojo::UserAgent;

$ENV{NO_PROXY} = '.mojolicious.org,.perl.org';

my $ua = Mojo::UserAgent->new; my $proxy = $ua->proxy->detect;

my @host = qw( www.httpbin.org www.perl.org www.perl.com );

foreach ( @host ) { say "$_ needs proxy? ", $proxy->is_needed( $_ ) ? 'Yes' : 'No'; }

Some URLs need proxies and some don’t:

www.httpbin.org needs proxy? No www.perl.org needs proxy? No www.perl.com needs proxy? Yes

You don’t need the environment variable. Give the not method an array reference of substrings for hosts or domains that can bypass the proxy:

Miscellaneous Settings | 127 my $ua = Mojo::UserAgent->new; my $proxy = $ua->proxy->detect; $proxy->not( [ qw( .mojolicious.org .perl.org ) ] );

That completely replaces all previous settings; it’s not designed to be some- thing you continually change. To add to the list you need to get the current list and include those in the new value:

my $not = $proxy->not; push @$not, qw( .mojolicious.org .perl.org ); $proxy->not( $not );

Summary

This chapter was about constructing requests to get what you need. If you understand what you need to do for the HTTP bits to work, you should have an easy time preparing the necessary request object. Although there are many situations you did not see in this chapter, the mechanisms to add headers and content stay the same. Figure out what you need for your re- quest and put it in the right place. In the next chapter you’ll make requests and then deal with the responses.

128 | Requests Chapter 7

Handling Cookies

Just make a simple cake. And this time, if someone’s going to jump out of it, make sure to put them in after you cook it.

The web was originally designed to be stateless; each call would be indepen- dent for that. But, as with most things, the actual use outgrew the design and people started to create workarounds.

Cookies are small (or sometimes not so small) bits of information that a server asks a user-agent to remember. When the user-agent makes another request it includes that information in the request; this allows the server to tie that request to previous ones. The web virtually breaks without them now.

Mojo handles all of the details of accepting, storing, and selecting cookies. You don’t have to think about any of this. However, there are some tasks where you might want to handle that yourself. If you don’t have one of those situations, you might skip over this chapter for now. Come back to it when you need to know more.

| 129 Cookie Basics

The server sends a cookie with the Set-Cookie response header. The user- agent processes those, stores them, and includes the appropriate cookies in future requests in the Cookie request header.

The cookie includes several attributes, as defined in RFC 6265. The name is the only required attribute (a missing value is the same as the empty string):

• a name (required)

• a value

• a domain

• a path

• an expiry time

• a secure flag

• an HttpOnly flag

Mojo::Cookie creates these for you and stringifies to the value that would show up in the Set-Cookie response header:

use Mojo::Cookie::Response;

my $cookie = Mojo::Cookie::Response->new( # required items name => 'robot', # optional items value => 'Bender', # empty string by default domain => 'localhost', path => '/', httponly => 1, secure => 1, expires => time + 3600, );

130 | Handling Cookies say $cookie;

The string version is the value for the Set-Cookie response header:

robot=Bender; expires=Thu, 16 Aug 2018 16:23:21 GMT; domain=localhost; path=/; secure; HttpOnly

But, this is fluent programming so you may prefer a method chain:

my $cookie = Mojo::Cookie::Response->new ->name( 'robot' ) ->value( 'Bender' ) ->domain( 'localhost' ) ->path( '/' ) ->expires( time + 3600 ) ->secure(1) ->httponly(1) ;

The name and value are the interesting parts of the cookie; you’ll add these to your request. The value may be the remembered information itself or a key that the server can use to look up whatever it wanted to remember. The rest of the information allows you to select appropriate cookies.

A Domain specifies that you can include that cookie to any hosts under that domain. A response from bender.example.com might be able to set cookies for .example.com (but not .com!). Without a Domain, you should only return that cookie’s name-value pair to exactly the same hostname that sent the cookie to you.

Without a stated expiration, a cookie typically lasts as long as your browser session. A server may specify when a cookie expires, creating a persistent cookie that you may use between sessions (different runs of your program).

The expiry time comes in two flavors. The server can specify an absolute time or a maximum age (i.e. offset from the current time). You shouldn’t send that cookie in your request after that time has passed. If the response

Cookie Basics | 131 header doesn’t not specify an expiration then the cookie, it should expire at the end of the “session” (when your user-agent finishes whatever its doing).

The two flags Secure and HttpOnly limit the use of the cookie. If Secure was specified in Set-Cookie, you can add it only to requests that you’ll send over a secure protocol (such as HTTPS). If the HttpOnly was specified, you can send it only in requests using HTTP.

Again, Mojo handles all of this in most cases.

The Cookie Jar

For the most part, Mojo handles cookies for you without any configuration or attention on your part. When you make a request and the response includes a Set-Cookie header, Mojo’s cookie jar will store that and handle the details for expiry, paths, and so on.

Still, go through the process to see how it works. Create a cookie jar, create some cookies to go into it, and use add to put those cookies in the jar:

use Mojo::Cookie::Response; use Mojo::UserAgent::CookieJar;

my $jar = Mojo::UserAgent::CookieJar->new;

my @cookies = ( Mojo::Cookie::Response->new ->name( 'ruler' ) ->value( 'Nibbler' ) ->domain( 'six.vergon.org' ) ->path( '/' ), Mojo::Cookie::Response->new ->name( 'doctor' ) ->value( 'Zoidberg' ) ->domain( 'ten.decapod.org' ) ->path( '/' ), );

132 | Handling Cookies $jar->add( @cookies );

The all method iterates through each cookie in the jar:

say $_->name for $jar->all->@*;

The output shows the names of all of the cookies:

doctor robot

To extract the cookies appropriate for a request, give find the URL that you want to fetch. From that Mojo can figure out the protocol, secure setting, domain, and path:

say $_->name for $jar->find( Mojo::URL->new( 'http://six.vergon.org' ) )->@*;

Now you only see the cookie name for that domain:

robot

Modify that example to add an expiry value to the cookie for six.vergon.org. Specify that it should expire in ten seconds. After you add the cookies, pause for 11 seconds (after which the cookie should expire):

use Mojo::Cookie::Response; use Mojo::UserAgent::CookieJar; use Mojo::URL;

my $jar = Mojo::UserAgent::CookieJar->new;

my @cookies = ( Mojo::Cookie::Response->new ->name( 'ruler' ) ->value( 'Nibbler' )

The Cookie Jar | 133 ->domain( 'www.vergon6.org' ) ->path( '/' ) ->expires( time + 10 ), Mojo::Cookie::Response->new ->name( 'doctor' ) ->value( 'Zoidberg' ) ->domain( 'www.decapod10.org' ) ->path( '/' ), );

$jar->add( @cookies );

say "==All cookies, right away"; say $_->name for $jar->all->@*;

sleep 11;

say "==All cookies, after 11 seconds"; say $_->name for $jar->all->@*;

say "==Cookies for vergon6"; say $_->name for $jar->find( Mojo::URL->new( 'http://six.vergon.org' ) )->@*;

Notice that the expired cookie is still in the cookie jar, but find does not re- turn it:

==All cookies, right away robot doctor ==All cookies, after 11 seconds robot doctor ==Cookies for vergon

134 | Handling Cookies Replacing a cookie

If you add a cookie with the same name, path, and domain as a previously stored cookie, the new cookie completely replaces it:

use Mojo::Cookie::Response; use Mojo::UserAgent::CookieJar; use Mojo::Util qw(dumper); use Mojo::URL;

my $jar = Mojo::UserAgent::CookieJar->new;

my $old = Mojo::Cookie::Response->new ->name( 'ruler' ) ->value( 'Nibbler' ) ->domain( 'six.vergon.org' ) ->path( '/' );

$jar->add( $old ); say join ": ", $_->name, $_->value for $jar->all->@*;

my $new = Mojo::Cookie::Response->new ->name( 'ruler' ) ->value( '' ) ->domain( 'six.vergon.org' ) ->path( '/' );

$jar->add( $new ); say join ": ", $_->name, $_->value for $jar->all->@*;

The output shows that the ruler cookie changed its value:

ruler: Nibbler ruler:

The Cookie Jar | 135 This is important for expiry times too; you can update their expiration times. That new time might even be in the past, in which case the cookie immedi- ately expires.

Ignoring a cookie

Mojo::UserAgent::CookieJar has a mechanism to inspect cookies before it stores them. Give ignore a code reference that returns true if you want to ignore the cookie in a transaction:

my $ua = Mojo::UserAgent->new; my $jar = $ua->cookie_jar;

$jar->ignore( sub { ... } );

That code reference runs when the user-agent calls Mojo::UserAgent::CookieJar’s collect on a transaction. The collect uses the code reference from ignore to test each cookie. Here’s a program that fetches the URLs from standard input:

use IO::Interactive qw(interactive); use Mojo::UserAgent; use Mojo::Util qw(dumper);

my $ua = Mojo::UserAgent->new; my $jar = $ua->cookie_jar;

$jar->ignore( \&inspect_cookies );

LOOP: { print { interactive } "URL> "; $_ = ; last unless $_; chomp;

$ua->get( $_ ); say "\tThere are ", scalar $jar->all->@*, ' cookies now';

136 | Handling Cookies redo; }

sub inspect_cookies ( $cookie ) { say "\tCookie: $cookie"; return; }

Most of the output comes from the code reference in ignore. It shows the cookies. After each URL there’s a count of the number of cookies in the jar. Since nothing returns true, no cookie is ignored:

URL> https://www.google.com Cookie: 1P_JAR=2018-08-21-21; expires=... Cookie: NID=137=AC7LvF7HV4bDRpYo-irEZf... There are 2 cookies now URL> https://www.vulture.com Cookie: first-nymcid=578af01... There are 3 cookies now URL> https://www.facebook.com Cookie: fr=07fAdRp4gbCpQlrtc..BbfIi8.R... Cookie: sb=vIh8W7nYrC9ffZ9ZH-bhybdw; e... There are 5 cookies now

Change the subroutine to ignore cookies from facebook.com (and skip the output that convinces you it’s working):

sub inspect_cookies ( $cookie ) { return 1 if $cookie->domain =~ /\.facebook\.com\z/; return; }

Now the same sequence doesn’t add the cookies from that domain:

URL> https://www.google.com There are 2 cookies now URL> https://www.vulture.com There are 3 cookies now URL> https://www.facebook.com

The Cookie Jar | 137 There are 3 cookies now

This only works through collect and a transaction (something the user- agent does for you). If you put the cookie in the jar with add directly, it’s going in there. The add isn’t going to filter the cookies at all. You can do that yourself though:

use Mojo::Cookie::Response; use Mojo::UserAgent::CookieJar;

my $jar = Mojo::UserAgent::CookieJar->new;

my $old = Mojo::Cookie::Response->new ->name( 'ruler' ) ->value( 'Nibbler' ) ->domain( 'six.vergon.org' ) ->path( '/' );

$jar->add( $old ) unless exclude_cookie( $old );

sub exclude_cookie ( $cookie ) { return 1 if $cookie->domain =~ /vergon\.org/; return; }

Ignoring a cookie based on its length

Some servers may try to put too much information in their cookie value. Some of the ones you’ll get from the examples I’ve shown already flow over several lines. The cookie jar automatically filters out very long values through its max_cookie_size setting of 4096. The jar doesn’t store the cookie if the length (defined by Perl’s length function) is larger than that value.

You can set a different value:

$jar->max_cookie_size( 137 );

138 | Handling Cookies Any cookie with a value exceeding that length is silently ignored. If you set it to a non-numeric value the length is likely 0 (with no warning):

$jar->max_cookie_size( 'Bender' );

Expiring a cookie immediately

Your jar may have a cookie that expires far into the future (whether far means ten seconds or ten years). You already know that you can replace a cookie with the same name and path. To cause that cookie to expire sooner, add another cookie with a different time:

use Mojo::Cookie::Response; use Mojo::UserAgent::CookieJar; use Mojo::URL;

my $jar = Mojo::UserAgent::CookieJar->new;

my $old = Mojo::Cookie::Response->new ->name( 'ruler' ) ->value( 'Nibbler' ) ->domain( 'six.vergon.org' ) ->expires( time + 86400 ) ->path( '/' );

say "Cookies before expiration:"; $jar->add( $old ); say "\t", $_->name for $jar->find( Mojo::URL->new( 'http://six.vergon.org' ) )->@*;

my $new = Mojo::Cookie::Response->new ->name( 'ruler' ) ->value( '' ) ->domain( 'six.vergon.org' )

The Cookie Jar | 139 ->expires( '-1' ) ->path( '/' );

say "Cookies after expiration:"; $jar->add( $new ); say "\t", $_->name for $jar->find( Mojo::URL->new( 'http://six.vergon.org' ) )->@*;

The output shows that find returns the unexpired cookie at first. After you change the expiration time to a negative value, find returns no cookies:

Cookies before expiration: ruler Cookies after expiration:

Perhaps you want to go through all the cookies to set a particular expiration time:

foreach $cookie ( $jar->all->@* ) { $_->expires( time + 60 ); }

Or expire them all by setting a time in the past:

foreach $cookie ( $jar->all->@* ) { $_->expires( -1 ); }

Or even extend all of them (although this might conflict with what the server expects or saves):

foreach $cookie ( $jar->all->@* ) { $_->expires( $_->expires + 86400 ); }

That expiration time is merely a number that Mojo compares to the current epoch time. The -1 is less than any epoch time so it always expires. Many

140 | Handling Cookies strings, including date strings, are not what you expect:

my $new = Mojo::Cookie::Response->new ->name( 'ruler' ) ->value( '' ) ->domain( 'six.vergon.org' ) ->expires( '13 Apr 3000 00:00::00 GMT' ) ->path( '/' );

That number is actually 13 when converted to a number. The method isn’t cagey enough to convert that to an epoch time. You can do that with Mojo ::Date though:

my $epoch = Mojo::Date ->new( '13 Apr 3000 00:00::00 GMT' ) ->epoch;

Cookie Jars in the User-Agent

Mojo::UserAgent provides a default cookie jar that it constructs for each Mojo ::UserAgent object (and disappears when that object does). Most of the mod- ules you’ve seen in this chapter are already loaded by Mojo::UserAgent:

use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new; my $jar = $ua->cookie_jar;

If you don’t like the default cookie jar you can make your own:

use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new; my $jar = Mojo::CookieJar->new; $ua->cookie_jar($jar);

Cookie Jars in the User-Agent | 141 Share that cookie jar with multiple user-agents. Those user-agents might be different in other aspects but they’ll have a common cookie jar:

use Mojo::UserAgent;

my $ua1 = Mojo::UserAgent->new; my $ua2 = Mojo::UserAgent->new;

my $jar = Mojo::CookieJar->new;

$ua1->cookie_jar($jar); $ua2->cookie_jar($jar);

Mojo doesn’t offer a way to store the cookie jar between runs of your pro- gram. You can do this yourself though. Get the cookie object, serialize it in some fashion, and store it:

use Mojo::UserAgent; use Mojo::Util qw(dumper);

use Sereal qw(encode_sereal decode_sereal); my $binary_thingy;

# Make the first one, and let it go away { say "Making the first user-agent"; my $ua = Mojo::UserAgent->new; my $jar = $ua->cookie_jar; my $cookie = Mojo::Cookie::Response->new ->name( 'robot' ) ->value( 'Bender' ) ->domain( 'nine.chapek.net' ) ->path( '/' ) ; $jar->add( $cookie ); $binary_thingy = encode_sereal( $jar ); say dumper( $cookie ); }

# make another one

142 | Handling Cookies { say "Making a new user-agent"; my $ua = Mojo::UserAgent->new; my $jar = decode_sereal( $binary_thingy ); $ua->cookie_jar( $jar ); say dumper( $jar ); }

The Sereal module avoids several of the security problems in Data::Dumper, Storable, and other persistence mechanisms.

You might also look at Mojo::UserAgent::CookieJar::Role::Persistent which uses roles to save and load cookies from a Netscape-compatible cookies file.

Adding Cookies to Requests

Mojo takes care of most of the cookie details if you are using a cookie jar. You don’t handle on your own anything you’ve seen so far. It processes cookies from the responses, adds them to the jar, and selects the appropriate ones for each request.

Beyond the automagical cookie processing, you might want to add your own. Add those cookies to the cookie jar but then those cookies are available to all requests.

For a single request, add a cookie to a single request by setting the Cookie re- quest header yourself. Separate multiple name-value pairs with a semicolon (so, don’t use append):

my $tx = $ua->build_tx( GET => 'http://www.example.com' ); $tx->req->headers->cookie( "Robot=Bender; Object=beam" ); say $tx->req->to_string;

If you have a list of cookies, perhaps from processing them with Mojo::Cookie ::Request, you can add the multiple cookies with from_hash. This merely builds the request so you can see it:

Adding Cookies to Requests | 143 use Mojo::Cookie::Request; my $cookies = Mojo::Cookie::Request ->parse( 'Robot=Bender; Object=beam' ); my $tx = $ua->build_tx( GET => 'http://www.example.com' ); $tx->req->headers->from_hash( { Cookie => $cookies } ); say $tx->req->to_string;

The hash after the URL in an immediate request can add to the request head- ers, including the Cookie header:

my $tx = $ua->get( 'http://www.example.com', { Cookie => "Robot=Bender; Object=beam" } ); say $tx->req->to_string;

Here’s a short program that uses a special Twitter “guest mode” to show a limited list of the tweets for the users that you specify in the cookie (I first read this in Twitter’s Secret “Guest Mode” from Terence Eden). Most of the ugliness of this program is the extraction of the information from the HTML:

use Mojo::Base -strict, -signatures; use Mojo::UserAgent; use Mojo::Util qw(dumper trim);

my @users = qw( briandfoy_perl kraih joelaberger preaction );

my $ua = Mojo::UserAgent->new; my $tx = $ua->get( 'https://mobile.twitter.com/i/guest', { Cookie => 'guest_timeline_users=' . join '|', @users }, );

if( $tx->result->is_success ) {

144 | Handling Cookies say $tx->req->to_string; say $tx->result->dom ->find( 'div.timeline table.tweet' ) ->map( \&extract_tweet ) ->map( sub { join "| ", $_->@{ qw(username tweet) } }) ->join( "\n" ); } else { say "Error!"; }

sub extract_tweet ( $dom ) { my %hash; $hash{'display_name'} = trim( $dom ->at( 'td.user-info strong.fullname' ) ->text ); $hash{'username'} = trim( $dom ->at( 'td.user-info div.username' ) ->text ); $hash{'tweet'} = trim( $dom ->at( 'div.tweet-text div.dir-ltr' ) ->text );

\%hash; }

The output is a bit odd because the links and images have disappeared be- cause I didn’t want to do the extra work to deal with them—that would be a much longer and more complicated program that obscures the use of the added cookie:

briandfoy_perl| Process XML with Mojo::DOM briandfoy_perl| I think that will need to be a joke in a new Learning Perl. briandfoy_perl| I wrote a little Mojo::UserAgent program to download all the Major League Baseball home

Adding Cookies to Requests | 145 game schedules to check for games as I travel. briandfoy_perl| Some thoughts on "How to get your Pull Request accepted" kraih| FYI: There was no follow-up to this. So IRC notifications on are most likely dead. joelaberger| At least no one asked Rep Omar to leave a restaurant! joelaberger| So while the rest of the field keeps talking about whatever, keeps showing why she should running the country! joelaberger| Wait, seriously?! preaction| Using Service Workers to build an undetectable botnet:

And, for what it’s worth, try the cookie inspection program you saw earlier in the chapter. This twitter guest service tries to set eight cookies!

Summary

Cookies are bits of information that servers send for user-agents to store and send back on future requests. Mojo::UserAgent handles most of the details for you, but you can filter the incoming cookies and add other cookies on your own if you need something fancier.

146 | Handling Cookies Chapter 8

Handling Responses

It’s like a party in my mouth and everybody’s throwing up!

In the previous chapter you constructed requests but didn’t do anything with them. Now it’s time to talk to the server and understand what it tells you. Sometimes you have the answer that you want; other times you have to take additional actions.

By writing your own client you are telling the server that you’re taking re- sponsibility for your side of the conversation. Although I won’t go into those mechanics in depth, you should understand what is actually happening un- der the hood.

Did It Happen?

When you make a request, most of the time a server sends a response and that response has an HTTP status code (you saw these in chapter 1). The get and other verb methods return a transaction object from which you can

| 147 extract that status:

use Mojo::UserAgent;

my $tx = Mojo::UserAgent->new->get( shift ); say $tx->result->code;

Here are a few runs:

% perl status.pl http://www.perl.org 200 % perl status.pl http://www.perl.com 302 % perl status.pl http://www.example.com 200 % perl status.pl http://www.example.com/not-there.html 404 % perl status.pl http://localhost/ Connection refused ...

Notice that last request never received a response because there was no server at localhost when I ran this.

Sometimes the type of the code is enough. There are convenience methods that translate the number of the code into the general message for that type:

use Mojo::UserAgent;

my $tx = Mojo::UserAgent->new->get( shift );

my $res = $tx->result; say do { if( $res->is_success ) { # 2xx series 'It worked!' } elsif( $res->is_redirect ) { # 3xx series 'Need to look over there: ' . $res->headers->location }

148 | Handling Responses elsif( $res->is_client_error ) { # 4xx series 'Problem with the request' } elsif( $res->is_server_error ) { # 5xx series 'Problem with the response (server fault)' } };

What it the server didn’t send a response? What if there was no server? After you check if it’s a client or server error (a complete HTTP transaction) you can check if it’s any other sort of error:

say do { ... else { ## couldn't even 'Some other error!'; } };

A few runs show the different types of responses for different sites. You’ll see this program later too:

% perl response-types.pl http://www.perl.org It worked! % perl response-types.pl http://www.perl.com Need to look over there: https://www.perl.com/ % perl response-types.pl http://www.example.com It worked! % perl response-types.pl http://www.example.com/not-there.html Problem with the request % perl response-types.pl http://localhost/ Some other error!

Perhaps you couldn’t connect to the server because it’s down or its name doesn’t resolve. You might not be connected to a network. There’s still a “response” in the transaction but it doesn’t have all the usual things filled in (such as the HTTP status code).

Did It Happen? | 149 The result method die s if it couldn’t connect to the server. Use a simple eval to trap that:

my $result = eval { $ua->get( shift )->result }; unless( $result ) { my $error = $@; ...; # do something about it }

If the code did not die, the eval returns the last evaluated expression (the response object). Otherwise it returns undef and puts the error message in $@. Use your favorite exception handling technique here. There are modules that provide try-catch semantics but I don’t take sides on those. Try::Tiny seems to be popular.

Inspecting Headers

A response has headers just like a request does. Use what you’ve already seen in the previous chapter to inspect those (although you probably don’t want to change anything).

There are the same shortcuts for pre-defined headers:

use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

my $tx = $ua->get( shift );

if( $tx->result->is_success ) { say "Content-type: ", $tx->result->headers->content_type; } else { say "Error: ", $tx->result->code; }

150 | Handling Responses Running this shows the content type for that resource. Notice that it’s not merely a MIME type; there’s some encoding information in there too:

% perl response-headers.pl https://www.perl.org Content-type: text/html; charset=utf-8

Inspect any particular header by specifying it as an argument to header in- stead of the shortcut method:

use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

my( $url, $header ) = @ARGV; $header //= 'Content-Type';

my $tx = $ua->get( shift );

if( $tx->result->is_success ) { my $value = $tx->result->headers->header( $header );

say "$header: ", $tx->result->headers->header( $header ) // '(not present)' ; } else { say "Error: ", $tx->result->code; }

Some servers set a cookie and some don’t:

% perl response-headers.pl https://www.perl.org set-cookie set-cookie: (not present) % perl response-headers.pl⏎ https://www.perl.org content-type content-type: text/html; charset=utf-8

Headers are usually meta-data about the resource, but sometimes the head-

Inspecting Headers | 151 ers have control information for your user-agent. The GitHub API, for ex- ample, tells you your rate limit information, how to fetch the next page of results, and many other things:

% perl response-headers.pl⏎ https://api.github.com/users/briandfoy/repos⏎ X-RateLimit-Limit X-RateLimit-Limit: 60 % perl response-headers.pl⏎ https://api.github.com/users/briandfoy/repos Link Link: ;⏎ rel="next",⏎ ;⏎ rel="last"

Redirection

Sometimes the resource you want isn’t where you think it is and the server might be able to tell you to look somewhere else. This could be a temporary or permanent redirection and could be for the same site or to a different one. For most situations, Mojo is going to handle this for you once you let it.

Here’s a likely case. The request uses HTTP but the server wants HTTPS. The response asks you to request a different resource that uses HTTPS:

% perl response-types.pl http://www.perl.com Need to look over there: https://www.perl.com/

Interactive clients handle this so you never know this was an issue. They see the instructions and make another request with the new location. You can do the same with Mojo by setting max_redirects to some non-zero number:

$ua->max_redirects( 3 );

Choosing a sensible number depends on your task, but 3 seems an appro-

152 | Handling Responses priate number. Set it much higher and you’re getting into situations where the transaction is probably broken, such as an infinite redirection between two resources pointing at each other (or weird clickbait sites that send you through several levels of affiliate links).

But you might not want this to happen automatically. For a link checker to go through HTML documents, you may want to know all the addresses that redirect so you can fix them with their correct addresses. If Mojo redirected automatically you wouldn’t know the original address was not the one you wanted:

$ua->max_redirects( 0 );

use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new->max_redirects(0); while( <> ) { chomp; my $tx = $ua->get( $_ ); my $code = $tx->result->code; printf "%3d %s\n", $code, $_; }

Some of these URLs redirect to another (I’ve left out the input):

% perl check_links.pl < links.txt 200 http://www.perl.org 302 http://www.perl.com 301 http://search.cpan.org/ 301 http://metacpan.org/ 404 https://mojolicious.org/not_there.html

The new URL is in the Location response header. This program outputs the value of that header (the new address) when it’s a redirection response:

use Mojo::Base -strict; use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new->max_redirects(0);

Redirection | 153 while( <> ) { chomp; my $tx = $ua->get( $_ ); my $code = $tx->result->code; printf "%3d %s\n", $code, $_; say "\t" . $tx->result->headers->location if( 300 <= $code and $code <= 399 ); }

Now the output shows the new URL for redirection:

% perl check_links.pl < links.txt 200 http://www.perl.org 302 http://www.perl.com https://www.perl.com/ 301 http://search.cpan.org/ https://metacpan.org/ 301 http://metacpan.org/ https://metacpan.org/ 404 https://mojolicious.org/not_there.html

This is a simple link checker that’s probably inadequate except for the small- est uses since each request has to complete before the program can move on. You’ll want to check links concurrently with the techniques you see in chap- ter 9. For a better example (and one that wouldn’t fit in this book), see the program I wrote to check the links in the articles for Perl.com.

I wrote a small tool, 3xx to show a chain of redirections. There’s a small Mojo server to provide the situations, and examples in various languages.

Permanent redirection

There are two types of redirections. The first one, status 301, tells the client that the resource has moved and that the next time it wants to fetch the resource it should use the new address. This status allows you to change a POST request into a GET. That’s not something I cover in this book, but in

154 | Handling Responses general that was not a good idea and exists as a historical oddity.

The 308 status is also a permanent redirection but doesn’t allow you to change the POST into GET. You shouldn’t worry about any of this because Mojo will handle it for you.

Both of these are concerned with the number of times that a request should be replayed. A GET should be a request that you can perform as often as you like without it changing things. A POST is something designed to change the state of the server, which you likely don’t want to perform repeatedly. For example, you should be able to check your bank balance as often as you like without changing it, but you don’t want to continually reload a request that transfers money.

Temporary redirection

The second type of redirection, statuses 302 or 307, are temporary. This means that you don’t need to update any stored addresses to the new one for all future requests. The redirection might be related only to the particu- lar request you made, but not all requests. A 302 status allows you to change a POST to a GET but a 307 doesn’t.

Dealing with Content

The resource probably has some content—that’s kinda the point of the web. You can get the raw or processed content. Mojo directly handles some con- tent types, such as HTML, XML, and JSON. Others you may want to store in a string or save to a file.

Mojo represents content through various abstractions. You can get that from the result. The “content” might be a single thing (Mojo::Content::Single) or multiple things (Mojo::Content::Multipart). The overall content might contain a mix of different sorts of things:

Dealing with Content | 155 my $content = $ua->get( $url )->result->content;

Inside the content object are the “assets” which represent the storage for those thingys. You might store something in a file or in memory (or some- thing else that you define). You store the data through one of these assets:

my $asset = $ua->get( $url )->result->content->asset;

The body method reads through those object layers and gives you that data:

my $body = $ua->get( $url )->result->body;

The dom method interprets the data as something that’s structured and that you can extract parts from just as you did in chapter 5:

my $dom = $ua->get( $url )->result->dom;

Saving to a file

Sometimes you want to save the data to a file. When you download aPDF file, you’re likely to want to save it.The save_to method can do that:

use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;

my $url = Mojo::URL->new( 'https://hop.perl.plover.com/book/pdf/HigherOrderPerl.pdf' ); my $filename = $url->path->parts->[-1]; my $tx = $ua->get( $url ) ->result ->save_to( $filename );

In older code, you may see move_to, which is the same thing with a couple of

156 | Handling Responses extra steps:

my $tx = $ua->get( $url ) ->result ->content ->asset ->move_to( $filename );

Suggested filenames

In the previous section you took the filename from the URL. You have to chose a name that you liked. Sometimes the server can give you a hint other than the URL path (which might be a path to a program) by using the Content- Disposition header. A filename portion can suggest the name:

Content-Disposition: Attachment;⏎ filename=rainbow_dinosaur.pdf

It’s a little bit more complicated. That filename is limited to Latin-1. A filename* (with the star) can declare a different encoding:

Content-Disposition: attachment;⏎ filename*= UTF-8''%F0%9F%8C%88_%F0%9F%A6%96

Or you can do both at the same time:

Content-Disposition: attachment;⏎ filename="rainbow_dinosaur";⏎ filename*=UTF-8''%F0%9F%8C%88_%F0%9F%A6%96

With more than one suggestion and possible encodings, you have a little work to do. Mojo::Util has utilities to handle most of it though:

use open qw(:std :utf8); use Mojo::Util qw(url_unescape decode);

Dealing with Content | 157 # pretend there is a response my $disposition = q/Content-Disposition: attachment;⏎ filename="rainbow_dinosaur";⏎ filename*=UTF-8''%F0%9F%8C%88_%F0%9F%A6%96/;

my @filenames = map { ! /\AUTF-8''(.+)/ ? $_ : decode( 'UTF-8', url_unescape( $1 ) ) } $disposition =~ m/;\s*filename\*?="?(.+?)"?(?=;|\z)/g;

say "Raw:\n\t", join "\n\t", @filenames;

Now it’s up to you to pick which filename that you want, probably choosing the one least offensive to the local filesystem:

Raw: rainbow_dinosaur _

Caching and Conditional Requests

If you already have the resource, you don’t want to force a lot of work to get what hasn’t changed. This is especially true when the resource is large. There are various ways that you can handle this by caching the resource and asking the server if it’s changed.

Not every server provides the information to handle these sorts of things but you should be willing to handle it if they do. This is especially important if you intend to make many (hundreds, thousands, or more!) requests. Be nice to your bandwidth and be nice to those servers!

In this book, you’re interested in the user-agent details and not production code with all the bells and whistles. You’ll see some naïve and simple caching

158 | Handling Responses mechanisms, but you should have something meatier in real code. Any par- ticular mechanism makes assumptions about your situation so I’m not going to choose at all. Don’t power a high-traffic production website with what I do here.

By date

When you fetch a page you can save its contents and re-use them if they haven’t changed since the last time you downloaded it. One way to do that is to look at the modification time of what you’ve already saved. Send that same time in a If-Modified-Since header, making this a conditional request.

In this example, you want to download Mark Jason Dominus’s book Higher Order Perl, which I consider one of the top three books ever written about Perl. However, if you’ve already downloaded it, you don’t want to increase his bandwidth costs by completely downloading it again:

use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;

my $url = Mojo::URL->new( 'https://hop.perl.plover.com/book/pdf/HigherOrderPerl.pdf' );

my $tx = $ua->get( $url => { 'If-Modified-Since' => get_mtime($url) } );

$tx->result ->content ->asset ->move_to( get_filename( $url ) );

sub get_filename ( $url ) { $url->path->parts->[-1]; }

Caching and Conditional Requests | 159 sub get_mtime ( $url ) { my $filename = get_filename( $url ); my $time = -e $filename ? ( stat( $filename ) )[9] : time; Mojo::Date->new( $time ); }

The output shows that the request sent the If-Modified-Since header and that the response responded with status 304 denoting that the resource has not been modified:

GET /book/pdf/HigherOrderPerl.pdf HTTP/1.1 User-Agent: Mojolicious (Perl) Accept-Encoding: gzip Content-Length: 0 If-Modified-Since: Sat, 01 Sep 2018 22:16:26 GMT Host: hop.perl.plover.com

------HTTP/1.1 304 Not Modified ETag: "1f0ed0-48b4a2d5182c0" Date: Sat, 01 Sep 2018 22:17:31 GMT Server: Apache/2.4.18 (Ubuntu)

The response does not contain the actual resource and you’ve saved yourself a 13 MB transfer! But now, having mentioned that, if you already have the book, don’t hammer Mark Jason’s server to test this program. Download his book once (or try this program on something else).

There’s another interesting thing in that response that helps with the caching, the “ETag”.

160 | Handling Responses The ETag

A ETag (or entity tag) is a short string that identifies a resource and its ver- sion. It doesn’t have to be a digest (such as an CRC, MD5, or SHA1) but often is one of those things. Here’s a program that makes a HEAD request and looks at that response header:

use Mojo::URL; use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;

my $base = Mojo::URL->new( 'https://api.bitbucket.org/' ); my $user = $ARGV[0] // 'briandfoy'; my $url = $base->clone->path( "/2.0/repositories/$user" );

my $tx = $ua->head( $url ); say "ETag is ", extract_etag( $tx );

sub extract_etag ( $tx ) { my $etag = $tx->result->headers->header( 'etag' ); $etag =~ s/\A"|"\z//gr; }

Run this and you get a string that the server uses to mark the version of the resource. You don’t need to use a sequential identifier; it can be anything that uniquely identifies that version. Bitbucket uses what looks like a digest:

% perl bitbucket-repo-etag.pl ETag is gz[3e65ef659c16089a082196949c8c8e2f]

Expand that program to add a cache that uses the ETag value as the cache filename (again, naïve and not intended for production). If that filename doesn’t exist, fetch the resource and save it. The next time you make the same request you can pull it from the cache.

This example makes a HEAD request instead of a conditional request. That response should include the ETag header so you can decide what to do next.

Caching and Conditional Requests | 161 That’s not a requirement for this sort of action, but I show it because it’s another trick you can use:

use Mojo::File; use Mojo::UserAgent; use Mojo::Util qw(dumper); use Mojo::JSON qw(decode_json);

my $ua = Mojo::UserAgent->new;

my $base = Mojo::URL->new( 'https://api.bitbucket.org/' );

my $user = $ARGV[0] // 'briandfoy'; my $url = $base->clone->path( "/2.0/repositories/$user" );

my $tx = $ua->head( $url );

my $etag = extract_etag( $tx );

my $path = make_filename( $etag ); unless( -e $path ) { my $tx = $ua->get( $url ); my $etag = extract_etag( $tx ); $tx->result->save_to( $path ); }

my $json = decode_json( $path->slurp ); say dumper( $json );

sub make_filename ( $etag ) { state $dir = Mojo::File->new('etag-cache')->make_path; $dir->child( $etag ); }

sub extract_etag ( $tx ) { my $etag = $tx->result->headers->header( 'etag' ); say "ETag is $etag"; $etag =~ s/\A"|"\z//gr; }

162 | Handling Responses It gets better—you can tell the server which versions you have. This requires that your caching connects the request URL to the ETag. A simple solution is a directory named for a digest of the URL (so no pesky & or / characters) un- der which you store files named for their ETag value. Perhaps you’ve stored many of them. Instead of a HEAD request, this example sends several values in the If-None-Match request header (a conditional response):

use Mojo::File; use Mojo::UserAgent; use Mojo::Util qw(dumper md5_sum); use Mojo::JSON qw(decode_json);

my $ua = Mojo::UserAgent->new;

my $base = Mojo::URL->new( 'https://api.bitbucket.org/' );

my $user = $ARGV[0] // 'briandfoy'; my $url = $base->clone->path( "/2.0/repositories/$user" );

my $etags = get_cached_etags( $url );

my $tx = $ua->get( $url => { 'If-None-Match' => $etags->to_array } );

# sometimes the server doesn't send the ETag back my $etag = extract_etag( $tx ) // $etags->[-1]; say "Etag is $etag"; my $path = make_filename( $url, $etag ); say "Path is $path";

if( $tx->result->is_success and $tx->result->code != 304 ) { $tx->result ->content->asset ->move_to( $path ); }

my $json = decode_json( $path->slurp ); say dumper( $json );

Caching and Conditional Requests | 163 sub get_cached_etags ( $url ) { my $files = cache_url_dir($url)->list_tree; say "Files are\n\t", $files->join( "\n\t" ); $files->map( sub { $_->basename } ); }

sub cache_base_dir () { state $dir = Mojo::File->new('etag-cache')->make_path; $dir; }

sub cache_url_dir ( $url ) { my $digest = md5_sum( "$url" ); my( $second, $first ) = $digest =~ /\A((.).)/; cache_base_dir() ->child( $first, $second, $digest ) ->make_path; }

sub make_filename ( $url, $etag ) { say "Given etag: $etag"; cache_url_dir( $url )->child( $etag ); }

sub extract_etag ( $tx ) { $tx->result->headers->header( 'etag' ) }

Weak ETags

Some resources might not look exactly the same but represent the same thing anyway. The stringification of a Perl hash is like that: the keys maybein different orders but that doesn’t matter to you because its the samehash.

The same significant content might be presented differently. Maybe thewrap- pers around the content show ever changing information (the current time,

164 | Handling Responses your browser info, etc) but you shouldn’t have to refetch the resource be- cause those parts aren’t stable. For instance, a that helpfully in- cludes the local weather next to the main content needn’t change its ETag because unrelated minor information changed.

The ETag value might represent the interesting, main part of the resource. In that case, the ETag value might start with W/ for “weak”:

ETag: W/"somevalue"

You can still cache the resource but if you did an octet-by-octet comparison (such as making a digest) you might not see the same overall content you saw previously.

Controlling your cache

There are various sorts of caches (another subject that deserves its own book). Your user-agent might store resources to use privately or to avoid transfer- ring data that it already has. If you create an intermediate server or proxy, you might cache resources that many user-agents may access. That’s more in web application territory than user-agent territory; I’ll leave that to you to contemplate. It’s a simple matter of programming,

A server can send one or more headers back that control how you cache items. That might tell you to not cache the value at all. RFC 7234 covers what you should do.

Fluent Content

Mojo::DOM can handle HTML, JSON, and XML so you can play with the con- tent directly from the response object. You’ve already seen the selectors in chapter 5. Now it’s time to put that together with responses.

Fluent Content | 165 Extract emojis

Here’s a program I created to list the new emojis in Unicode 10 by extracting them a page. Once I have the transaction object, I chain methods until I eventually connect the results with join before I output them:

use charnames qw();

use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

my $url = 'https://blog.emojipedia.org/whats-new-in-unicode-10/'; my $tx = $ua->get( $url );

die "That didn't work!\n" unless $tx->result->is_success;

say $tx->result ->dom ->find( 'ul:not( [class] ) li a' ) ->map( 'text' ) ->map( sub { my $c = substr $_, 0, 1; ord($c) > 0x1F000 ? [ $c, ord($c), charnames::viacode( ord($c) ) ] : () }) ->sort( sub { $a->[1] <=> $b->[1] } ) ->map( sub { sprintf '%s (U+%05X) %s', $_->@* }) ->join( "\n" );

Most of the work was figuring out the selector for find, which I figured out by simply inspecting the HTML. The start of the output looks like this:

(U+1F6F7) SLED

166 | Handling Responses (U+1F6F8) FLYING SAUCER (U+1F91F) I LOVE YOU HAND SIGN (U+1F928) FACE WITH ONE EYEBROW RAISED (U+1F929) GRINNING FACE WITH STAR EYES (U+1F92A) GRINNING FACE WITH ONE LARGE AND ONE SMALL EYE (U+1F92B) FACE WITH FINGER COVERING CLOSED LIPS

Extracting images

With a DOM and selectors, it’s easy to extract all the image URLs from HTML. Here’s a short program to pull out all of the images from the URL you specify on the command line:

use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

say $ua ->get( shift ) ->result ->dom ->find( 'img' ) ->map( attr => 'src' ) ->join( "\n" );

The results show several image paths. Some are URLs and some are relative paths:

% perl images.pl http://www.robotsgallery.com/ http://www.robotsgallery.com/imagenes/comercio/2/ROBOTSJPGE.png /imagenes/logolinkedin.png /imagenes/logoyoutube.gif http://www.robotsgallery.com/imagenes/articulos/IRB 6600_01.jpg

To resolve relative URLs, a browser typically uses the URL that it requested as the base. However, there might be a base tag in the HTML that specifies a different URL. Furthermore, there might have been some sort of redirec-

Fluent Content | 167 tion that changed the URL—that new URL will be in the Location response header. Here’s a program that looks for that base tag, then for Location header, and as a last resort, for the original URL. Mojo::URL takes care of the resolution:

use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

my $url = Mojo::URL->new( shift );

my $tx = $ua->get( $url ); my $result = $tx->result;

my $base_collection = $result->dom->find( 'base' ); my $location = $result->headers->location;

my( $base_url ) = Mojo::URL->new( grep { defined } eval { $base_collection->map( attr => 'size' )->[-1] }, $result->headers->location, $url, );

say $result ->dom ->find( 'img' ) ->map( sub { my $u = Mojo::URL->new( $_->attr( 'src' ) ); $u->base( $base_url )->to_abs; }) ->join( "\n" );

Now those relative paths are resolved:

% perl images.pl http://www.robotsgallery.com/ http://www.robotsgallery.com/imagenes/comercio/2/ROBOTSJPGE.png http://www.robotsgallery.com/imagenes/logolinkedin.png http://www.robotsgallery.com/imagenes/logoyoutube.gif http://www.robotsgallery.com/imagenes/articulos/IRB%206600_01.jpg

168 | Handling Responses From here you can construct a program to download and store these.

Extracting links

Extracting links is almost the same task as extracting images. It’s a different tag but the other parts are the same. In this example, the HTML tag and attribute are different and there’s a uniq after the map:

use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

my $url = Mojo::URL->new( shift );

my $tx = $ua->get( $url ); my $result = $tx->result;

my $base_collection = $result->dom->find( 'base' ); my $location = $result->headers->location;

my( $base_url ) = Mojo::URL->new( grep { defined } eval { $base_collection->map( attr => 'size' )->[-1] }, $result->headers->location, $url, );

say $result ->dom ->find( 'a' ) ->map( sub { my $u = Mojo::URL->new( $_->attr( 'href' ) ); $u->base( $base_url )->to_abs; }) ->uniq ->join( "\n" );

Here are the first few results:

Fluent Content | 169 % perl links.pl https://www.perl.com https://perl.org https://twitter.com/intent/follow?screen_name=perlfoundation https://www.perl.com/index.xml https://github.com/tpf/perldotcom https://www.perl.com/ https://www.perl.com/about

Progress Bars

As Mojo reads data from the response stream, it emits progress events. The content object knows how much it has read and you can compare that to the value in Content-Length to see how far along you are. You want to install a handler for the progress event to monitor the response but you have to wait for the request to finish.

Use a user-agent event handler that runs whenever any request starts. This user-agent handler installs a request handler with once. That sees and han- dles the request finish event. That’s when the response is set up and you can install a handler for progress. It’s a tricky double-layered installation that may take you some time to get comfortable with:

use Mojo::Base -strict, -signatures; use Mojo::UserAgent; use Term::ProgressBar;

# Autoflush to see the continuing progress bar instead # the end result STDOUT->autoflush(1);

$ua->on(prepare => sub ($ua, $tx) { my ($bar, $len); $tx->res->on(progress => sub ($res) { return unless $len = $res->headers->content_length; $bar ||= Term::ProgressBar->new({count => $len}); $bar->update($res->content->progress);

170 | Handling Responses }); $tx->res->on(progress => sub ($res) { $bar->update($len) if $len; }); }); $ua->get('https://mojolicious.org');

The output is more exciting in the terminal as the length of the bar actually progresses:

% perl progress.pl https://example.com/some.pdf 100% [======]

You can do anything you like in those handlers, though. If I were more mo- tivated, I’d attempt to estimate the time left in the download.

Reading the data yourself

Sometimes you may want to handle the content yourself. Maybe it’s not something that comes all at once or it’s larger than you care to put in memory at one time. Instead of using the default read event, you can supply your own.

You need to build the transaction then unsubscribe from the default read event from Mojo::Content. After that, supply your own.

When the response gets a chunk of the message body data, it emits the read event with the data. After that it’s up to you do do something with it. In this example, you extract the complete lines and buffer the possible incomplete line. For each chunk, you go process each complete line with Text::CSV_XS.

This is a simplistic example since it assumes that a “line” means and ignores special cases such as embedded newlines. If you decide to handle the chunks yourself, you take on the responsibility for everything. Each chunk gives you raw octets, so you’ll have to handle characters whose octets span chunks and lines that span chunks:

Reading the data yourself | 171 use v5.10; use strict; use warnings; use feature qw(signatures); no warnings qw(experimental::signatures);

use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;

my $url = ...; my $tx = $ua->build_tx( GET => $url );

$tx->res->content ->unsubscribe('read') ->on( read => process_bytes_factory() );

$tx = $ua->start($tx);

sub process_bytes_factory { return sub ( $content, $bytes ) { state $csv = do { require Text::CSV_XS; Text::CSV_XS->new( { decode_utf8 => 1 } ); }; state $buffer = ''; state $line_no = 0;

$buffer .= $bytes; # fragile if the entire content does not end in a # newline (or whatever the line ending is) my $last_line_incomplete = $buffer !~ /\n\z/;

# will not work if the format allows embedded newlines my @lines = split /\n/, $buffer; $buffer = pop @lines if $last_line_incomplete;

foreach my $line ( @lines ) { my $status = $csv->parse($line); my @row = $csv->fields;

172 | Handling Responses # do something interesting say join ':', $line_no++, @row[2,4]; } }; }

Handling Meta Tags

Not all of the metadata is always in the response headers. People often want to specify something other than what the server allows but they can’t con- figure their (often shared) server. Instead, they add things to the resource’s HTML. Interactive browsers can pick up on this and adjust their behavior. It’s not too hard to do in Mojo either. meta tags can show up in the HTML head element. The http-equiv tag can note a replacement response header. This one specifies a charset that may be different from the one that the server uses as the default:

HTML 5 changes that to have it’s own META attribute:

When the response finishes, process the content so you can findany meta tags and add their information to the response headers (or handle them in some other way). This way you don’t need special cases later:

$tx->result->once( finish => \&process_meta_options );

sub process_meta_options ( $res ) { $res

Handling Meta Tags | 173 ->dom ->find( 'head meta[charset]' ) # HTML 5 ->map( sub { my $content_type = $res->headers->header( 'Content-type' ); return unless my $meta_charset = $_->{charset}; $content_type =~ s/;.*//; $res->headers->content_type( "$content_type; charset=$_->{charset}" ); } ); }

I’m on the fence with this technique though. I like eliminating special cases by normalizing data, but I also like to see the original data. I don’t mind it so much here because the response object isn’t the raw data either. Ask me on a different day and I’ll give you another answer. Do what makes sensefor you.

There’s a particular meta tag relevant to web spidering. Some pages tell you to not index them and don’t follow their links:

Most likely, these are short-lived pages that only exist during a session, or that need particular input to replicate. A different session or different inputs makes something different.

For instance, logging in to a site to see a user profile page might have aURL with no user information in it (say, my.page.com/profile). You wouldn’t want to tie that URL to the information it showed because it’s special to the logged in user. Another user goes to the same URL but sees different information. And, each of the links in that page may be similarly special.

In the early days of web spidering, unthoughtful bots would get trapped in CGI-driven sites that added various things to query strings. With an infinite number of combinations, the spider couldn’t break out of the site because it never exhausted the URLs left to visit even though it never visited the same one twice. As a side note, don’t count the query string as part of what you

174 | Handling Responses have seen unless you know it matters).

Restricting Response Sizes

A server can send you anything it likes but that doesn’t mean that you have to accept it. Instead of allowing your unmonitored, long-running user-agent to download multi-gigabyte files that you didn’t foresee, set some boundaries for your client.

There are three things that you can limit:

• the number of header lines

• the length of any header line

• the message size

These apply to objects of various depths in the transaction object; you can use environment variables to change them:

• MOJO_MAX_LINES

• MOJO_MAX_LINE_SIZE

• MOJO_MAX_MESSAGE_SIZE

Control the maximum message size from the user-agent object:

$ua->max_response_size( $max );

If you don’t want to limit the message size, set this to zero (the default is 2GB):

$ua->max_response_size( 0 ); # unlimited

Restricting Response Sizes | 175 Pagination

Some responses do not return all of the data, usually because the data sizes are very large. Instead, they paginate it. For 1,000 results, you get 1 through 50. To get 51 to 100, you need to make a new request.

The particular way that you handle many requests depends on the task. You could do them concurrently (see chapter 9) or serially. Consider how much you ask for at one time, both in the size of the response and the number of requests that you make.

The GitHub API has a nice way of doing this. Instead of telling you the num- ber of results and the offset that you have, it gives you the first and nextlinks in the response header. The https://api.github.com/repositories endpoint is public (so, unauthenticated rate limit of 60 requests per second) and returns the Link header:

Link: ;⏎ rel="next",⏎ ;⏎ rel="last"

Here’s a program that looks at that to decide the next request and add it to @queue:

use Mojo::UserAgent; use Mojo::Util qw(dumper);

my $ua = Mojo::UserAgent->new;

my @queue = qw(https://api.github.com/repositories);

while( @queue ) { my $tx = $ua->get( shift @queue ); my $link = $tx->result->headers->header('Link'); my %links = reverse $link =~ m/<(.+?)>; rel="(.+?)"/g; push @queue, $links{'next'} if defined $links{'next'};

176 | Handling Responses ... handle the current response... }

Other APIs have their own ways of noting the equivalent. You may need to calculate some offsets and adjust your query string, but that’s a small bitof work that I leave to you.

Summary

You’ve seen some simple response handling. The complexity comes from your decisions about the content rather than the mechanism to retrieve it. Now it’s a matter of you putting the pieces together for your particular situ- ation.

Summary | 177 178 | Handling Responses Chapter 9

Concurrency

With 1/6 the gravity, two can work and be lazy at the same time. It’s like being a voice actor!

Concurrency is about scalability. Do more at the same time by breaking your programming tasks into components that can overlap in time. While one component is waiting, another can get some work done. This chapter covers promises, one of the ways Mojo uses to start things that can proceed inde- pendently of each other.

I’m purposefully ignoring earlier ways that Mojo handled concurrency since promises are the future.

Some Misconceptions

People often confuse parallelism with concurrency. I don’t think that’s un- justified; other sorts of people would like you to think you have parallel pro- cessing when you really don’t. Some of that is simply buzzword compati- bility and part of it is outright puffery (the marketing word for “lies”). The

| 179 distinction is important but often bent to meet non-technical requirements.

• Parallel processing means two things are happening at the exact same time. This is something hardware does. • Concurrency (or asynchronous) means that two different components can work in overlapping time periods. This is something that does.

Parallel processing helps with tasks that want complete utilization of a pro- cessor or where sequencing is important. To do more you need more proces- sors.

If you have one processor (whatever that means nowadays), only one thing can happen at a time. That can’t be parallel. However, it can do part of one task then switch to another task for a bit. It can keep switching back and forth until the tasks complete.

Concurrency helps with tasks that can be idle. Waiting for a server to re- spond doesn’t need processor time, so that component can let something else work in the meantime. This works when you can divide the task into inde- pendent components.

Threads are another confusing topic, and fortunately I don’t need to write a book on those. They help with concurrency because they organize and manage (usually behind the scenes) the independent tasks. Still, they don’t allow a single processor to do any more work than it can do.

Hardware threads are physical features that allow a processor to switch be- tween things. Software threads fake this at a level above the hardware, but you can’t have more real threads than the hardware allows. It’s still all about the organization of the concurrent tasks.

Blocking

Once you get past that distinction, there’s one more concept to understand: blocking and non-blocking code. You can always undermine the capabilities

180 | Concurrency of your computer or subvert the framework that’s trying to make it easy for you.

• Blocking code must finish before it lets the program continue even if it’s not doing anything.

• Non-blocking code allows the program to do other things but the pro- gram will check the non-blocking components when it has time.

Non-blocking code isn’t just about getting more done quicker. It’s also about reducing memory use. Use less memory and you have more space to do other things. The longer your component sticks around, the longer it’s taking up resources. Aaron Turon’s Ph.D. thesis “Understanding and Expressing Scal- able Concurrency” is an interesting read (and beautifully typeset). The Mo- jolicious repository has Note about performance vs scalability. There are various (non-Perl) resources out there that can explain this is gory details.

In your user-agent code, you might block as you send the request, wait for the response, parse the response, and do whatever you wanted with that fluent code. Mojo solves most of this problem for you as long as you don’t work against it.

Mojo is a non-blocking framework, but it doesn’t magically turn everything you write into non-blocking code. If you call sleep 10 your code is going to block for 10 seconds. If you try to connect a socket to an unreachable host, your code will block until the socket times out. Most database drivers will block (although Mojo has some drivers that don’t). It’s easy to defeat Mojo’s nice non-blocking features, so take care when you craft your components.

What’s the real issue?

Your program may block for many reasons and each situation may need a different approach. There may be many resources that are scarce inyour context. To handle any of them you’ll end up doing more work to manage the resources and sometimes that extra work saves you time.

Some Misconceptions | 181 Your task may be “IO bound” because it can’t read the input or write the output fast enough. You might be trying to read from a slow network con- nection (“bandwidth bound”). You may not be able to affect that too much.

Your program is “latency bound” when it’s the lag in other resources that prevents your program from moving on. This is the common case in web programming and the situation that Mojo is best able to solve. It can poll various filehandles to see which have some data to read then handle those.If there’s no data to read, your program can do something else for some interval before it checks for input again.

Or, your program may be “CPU bound”. The task is complicated enough that it needs to use as many CPU cycles as you can provide. This excludes other tasks from working until you compute these results.

Non-blocking Requests

Non-blocking single requests are easy: pass a subroutine reference argument when you make the request. This works for any of the request methods (head, get, post, and so on):

use Mojo::Util qw(steady_time); use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

say "Before: " . steady_time();

my $url = shift; $ua->get( $url => sub ( $ua, $tx ) { say "Finished: " . steady_time(); });

say "After: " . steady_time();

END { say "Exiting: " . steady_time(); }

182 | Concurrency The call to get returns right away even though it hasn’t done its work yet. It’s request becomes part of the event loop which will handle it along with everything else that’s going on.

The steady_time from Mojo::Util is handy to see when things happen:

% perl non-blocking.pl https://www.perl.com Before: 1410.471703 After: 1410.472177 Finished: 1410.474147 Exiting: 1410.474247

Notice those times are not Unix epoch times, but are good enough to be rel- ative times. Also, there’s a long lag from the time the request is finished and the time the program stops. That loop is still doing its thing and eventually realizes it has nothing left to do and gives up.

Now try it with multiple URLs. You queue a bunch of requests that run in- dependently. The event loop will check on them and do some work if there’s something to do. Otherwise the loop moves on to something else. Once the request finishes the callback does it’s work:

use Mojo::Util qw(steady_time); use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

say "Before: " . steady_time();

foreach my $url ( @ARGV ) { state $count = 0; my $label = $count++; $ua->get( $url => sub ( $ua, $tx ) { say "Finished $label: " . steady_time(); }); }

say "After:" . steady_time();

END { say "Exiting: " . steady_time(); }

Non-blocking Requests | 183 The callback runs completely before the program can do something else. If you do something interesting, such as waiting for a database to respond (or something uninteresting, such as sleeping), you’ll hold up the rest of your program. If you aren’t careful, your callback can turn a non-blocking frame- work back into the blocking situation it wanted to avoid.

The requests don’t finish in the the order you created them. Try that again or with different URLs and you many get different timings or orders. Differ- ent servers respond at different speeds, so the order you make the requests doesn’t matter as much as the latency. These all finish within a second of each other:

% perl non-blocking.pl https://www.perl.com⏎ https://www.mojolicious.org⏎ https://www.learning-perl.com Before: 1596.904947 After: 1596.905657 Starting loop: 1601.910896 Finished 1: 1601.911408 Finished 0: 1601.91188 Finished 2: 1601.912165 Exiting: 1601.912311

Change that callback to do something blocking, simulated here with a sleep:

$ua->get( $url => sub ( $ua, $tx ) { sleep 3; say "Finished $label: " . steady_time(); });

Now the situation is different because every handler holds up the rest ofthe program. Each handler takes up three seconds where nothing else can hap- pen. Instead of finishing close to each other the handlers are spread three seconds apart:

% perl non-blocking.pl https://www.perl.com⏎ https://www.mojolicious.org⏎ https://www.learning-perl.com Before: 1785.831102

184 | Concurrency After: 1785.831828 Starting loop: 1790.837087 Finished 2: 1793.839834 Finished 1: 1796.84559 Finished 0: 1799.84964 Exiting: 1799.849877

Promises

Mojo replaced its older concurrency implementation with promises, stolen from the Promises/A+ specification for JavaScript. These tools break the “callback hell” that you may have already experienced in that language (or even in Mojo). Instead of descending into the different levels of hell for each level of callback, you proceed in a mostly linear fashion that’s easier to fol- low.

You’ll probably think promises are weird at first. But, once you them, you should see they are much easier than the alternative. And, since you are likely already using promises in JavaScript, you might as well use them in Mojo too.

A promise is an object that defers its result, and can be pending (not done yet), fulfilled (done successfully), or rejected (couldn’t finish). A promise runs a different callback when it’s fulfilled or rejected.

In the specification, a promise is a “thenable”—an object that can respond to a then method. That then takes two code reference arguments:

my $promise = Mojo::Promise->new; $promise->then( \&fulfilled, \&rejected );

That merely sets up the promise, which doesn’t do anything yet. There’s an interesting bit of program flow to pay attention to here. You create the

Promises | 185 promise and store it. Later, in a separate expression, you call then. That returns a new promise with the added callbacks:

my $promise1 = Mojo::Promise->new; my $promise2 = $promise1->then( \&fulfilled, \&rejected );

If you called then immediately, as you would with fluent programming, you add the callbacks to the promise that new created, discard that promise, and return the new one from then. This won’t work because it doesn’t store the interesting promise and you get back a new, unaffected promise:

my $promise2 = Mojo::Promise->new->then( # Wrong! \&fulfilled, \&rejected );

That seems odd now, but promises build on each other and creates chains of promises.

Final statuses

The promise won’t run either of its handlers until it changes state from pend- ing to either fulfilled or rejected. You can do that yourself, but once you choose the final state you can’t change it again:

$promise->resolve; # or ... $promise->reject;

These change the status of the promise but they don’t execute the callbacks. It’s the event loop that notices the promise is no longer pending. Call wait so the event loop can do its work:

use Mojo::Promise;

186 | Concurrency my $promise = Mojo::Promise->new; $promise->then( sub { say "Fulfilled" }, sub { say "Rejected" }, );

$promise->resolve; $promise->wait;

Once the event loop does another cycle it sees the promise has a final status so it runs the callback that goes with that. In this example, you fulfilled the promise:

% perl promises.pl Fulfilled

You can pass arguments to resolve or reject. Those become the argument to the handler:

use Mojo::Promise;

my $promise = Mojo::Promise->new; $promise->then( sub { say "Fulfilled with: @_" }, sub { say "Rejected with: @_" }, );

$promise->resolve( qw(Bender Fry) ); $promise->wait;

The output now shows the arguments the callback received:

% perl promises.pl Fulfilled with: Bender Fry

Modify the program so the first callback has a signature with two parame- ters but then pass it three arguments. Normally that’s a fatal error, but this

Promises | 187 program doesn’t output anything and doesn’t complain:

use Mojo::Promise;

my $promise = Mojo::Promise->new; $promise->then( sub ( $robot, $human ) { say <<~"HERE" Fulfilled robot: $robot human: $human HERE }, sub { say "Rejected with: @_" }, );

$promise->resolve( qw(Bender Fry Leela) ); $promise->wait;

The absence of a warning or error doesn’t mean that nothing blew up. Add a catch to handle that. The error string is an argument to that callback:

use Mojo::Promise;

my $promise = Mojo::Promise->new; $promise ->then( sub ( $robot, $human ) { say <<~"HERE" Fulfilled robot: $robot human: $human HERE }, sub { say "Rejected with: @_" }, ) ->catch( sub { say "Error: @_" } );

188 | Concurrency $promise->resolve( qw(Bender Fry Leela) ); $promise->wait;

Now you see what the problem was:

% perl promises.pl Error: Too many arguments for subroutine 'main::__ANON__'

Chains of promises

A thenable returns another promise. Calling then returns a new promise after attaching the callbacks to the invocant promise. The new promise fulfills if the previous promise fulfilled. Otherwise it’s rejected:

use Mojo::Promise;

my $promise = Mojo::Promise->new; $promise ->then( sub ( $robot, $human ) { say <<~"HERE" Fulfilled robot: $robot human: $human HERE }, sub { say "Rejected with: @_" }, ) ->then( sub { say "First level fulfilled" }, sub { say "Second level reject: @_" } );

$promise->resolve( qw(Bender Fry Leela) ); $promise->wait;

Promises | 189 This is really the same thing as the previous catch example and it fails in the same way:

% perl promises.pl Second level reject: Too many arguments for subroutine...

The catch is really just another promise that has only a callback for rejected state. The fulfilled argument can be undef:

$promise->then( undef, sub { say "Second level reject: @_" } );

There’s also finally, which is like then where both the fulfill and reject call- backs are the same:

$promise->finally( $sub );

The finally method is a shortcut for a then with the same code reference in both arguments:

my $sub = sub { say "Finished! }; $promise->then( $sub, $sub );

Although it looks like it can only happen at the end, finally is a thenable too and you can call further thens on it.

Promise requests

Each of the request methods has a _p variant that returns a promise. Attach callbacks to handle either final status, then call to wait to start the tasks:

use Mojo::Base qw(-strict -signatures); use Mojo::UserAgent;

190 | Concurrency my $ua = Mojo::UserAgent->new->max_redirects(0);

$ua ->get_p( shift ) ->then( sub ( $tx ) { # fulfilled printf "%3d %s\n", $tx->result->code, $tx->req->url, }, sub ( @args ) { # rejected say "Error: @args" }) ->wait ;

Run this to see the response (the second URL is misspelled):

% perl promise-request.pl https://www.perl.com 200 https://www.perl.com % perl promise-request.pl http://www.perl.con Error: Can't connect: nodename nor servname provided, or not known

All promises

An all promise composes one or more promises and is fulfilled only when all of those promises are fulfilled. It is rejected when the first of any ofthe promises is rejected.

Create a branch of promises and store them in an array. Use all to combine them into a higher-order promise then call wait to start them:

use Mojo::Promise; use Mojo::Util qw(dumper);

my @promises = map { my $promise = Mojo::Promise->new;

Promises | 191 my $n = $_; $promise->then( sub { say $n ** 2 }, sub { warn "Error in $n" } ); $promise; } 0 .. 10;

my $uber_promise = Mojo::Promise ->all( @promises ) ->then( sub { say "Everything worked" }, sub { say "One thing failed" } );

$_->resolve() foreach @promises;

$uber_promise->wait;

The output shows that each promise is fulfilled and prints the square ofa number. Finally, the big promise is fulfilled because all the promises it con- tains were fulfilled:

% perl promise-all.pl 0 1 4 9 16 25 36 49 64 81 100 Everything worked

Change the last part of the code to reject a single promise (and leave the rest of them pending):

192 | Concurrency my $uber_promise = Mojo::Promise ->all( @promises ) ->then( sub { say "Everything worked" }, sub { say "One thing failed" } );

$promises[0]->reject;

$uber_promise->wait;

Now the overall promise is rejected immediately because it knows that at least one didn’t work out:

% perl promise-all.pl One thing failed Error in 0 at ...

In that case you don’t need to do the other work, so it might not happen. This also means that you can’t rely on side effects from callbacks that might never run.

Arguments to promises

The all promise gets a list of array references as its arguments. Each array reference is the set of arguments to a call to resolve or reject (and not the last evaluated expressions of the callbacks):

use Mojo::Promise; use Mojo::Util qw(dumper); use Mojo::Collection qw(c);

my @promises = map { my $promise = Mojo::Promise->new; my $n = $_; $promise->then( sub { say $n ** 2 },

Promises | 193 sub { warn "Error in $n" } ); $promise; } 0 .. 4;

my $uber_promise = Mojo::Promise ->all( @promises ) ->then( sub { say "Everything worked\n", dumper(@_); }, sub { say "One thing failed\n", dumper(@_) } );

foreach ( @promises ) { state @characters = c( qw(Fry Bender Leela Farnsworth Nibbler ) )->shuffle->to_array->@*;

$_->resolve( pop @characters ); }

$uber_promise->wait;

The numbers come from the inner fulfilled promises. When all of those are fulfilled, the “Everything Works” message shows up from the all promise. That’s followed by a dump of the arguments to that callback:

% perl promise-all.pl 0 1 4 9 16 Everything worked [ "Nibbler" ] [

194 | Concurrency "Bender" ] [ "Fry" ] [ "Leela" ] [ "Farnsworth" ]

You can flatten those results with ->@* after the array reference:

my $uber_promise = Mojo::Promise ->all( @promises ) ->then( sub { my @results = map { $_->@* } @_; say "Everything worked: @results"; }, sub { say "One thing failed\n", dumper(@_) } );

Now you have the character names as a list:

% perl promise-all.pl 0 1 4 9 16 Everything worked: Bender Farnsworth Nibbler Leela Fry

Promises | 195 First one wins

The race creates a promise that takes the status of the first promise that changes its state. That can be a resolved or rejected promise:

use Mojo::Promise; my @promises = map { Mojo::Promise->new } 0 .. 2;

my $race = Mojo::Promise->race( @promises ); $race->then( sub { say "Fulfilled!" }, sub { say "Rejected!" } );

my $method = time % 2 ? 'resolve' : 'reject';

$promises[ int rand @promises ]->$method(); $race->wait;

When you run this, the result is the first completed promise:

% perl promise-all.pl Rejected! % perl promise-all.pl Rejected! % perl promise-all.pl Fulfilled!

Let’s get the first request that receives a response. You might do thisonan internal network to hit several sources with the same request (maybe slightly different requests to slightly different servers) and take the first answer:

use Mojo::Promise;

use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

my @urls = qw(

196 | Concurrency http://www.perl.org/ http://www.perl.com/ http://www.mojolicious.org/ http://www.promisesaplus.com/ );

my @promises = map { $ua->get_p( $_ ); } @urls;

my $race = Mojo::Promise->race( @promises ); $race->then( sub ( $tx ) { printf "%s wins: %s\n", $tx->req->url, eval { $tx->result->dom->at( 'title' )->text } // 'No title'; }, sub ( $tx ) { printf "%s errored!\n", $tx->req->url } ) ->catch( sub { say "Error: @_" } );

$race->wait;

When I first wrote this the Mojo site was winning, but when I checked again www.perl.org was winning:

% perl race.pl http://www.perl.org/ wins: The Perl Programming ⏎ Language - www.perl.org

The race resolves when the first promise is no longer pending, even if that promise was rejected. You’ll see a way around that in a moment.

Promises | 197 Higher-order promises

I created Mojo::Promise::Role::HigherOrder to offer roles for the other com- binations of promises: none, any, and some promises.

The any promise resolves when the first promise resolves:

use Mojo::Promise; use Mojo::Promise::Role::HigherOrder;

my @promises = map { get_p( $url ) } @urls;

my $hop = Mojo::Promise->with_roles('+Any')->new; $hop->any( @promises ); $hop->then( \&fulfilled, \&rejected ); $hop->wait;

The none promise resolves only if all of its promises are rejected:

my $hop = Mojo::Promise->with_roles('+None')->new; $hop->none( @promises); $hop->then( \&fulfilled, \&rejected );

The some promise resolves when a certain number of promises resolves:

my $hop = Mojo::Promise->with_roles('+Some')->new; $hop->some( @promises); $hop->then( \&fulfilled, \&rejected );

Find at least one working mirror

Here’s a program that I wrote for the 2018 Mojolicious Advent Calendar.I want to check that there is at least one working mirror from a list of CPAN mirrors (in this case, taken from my local CPAN config). The higher-order promise succeeds if at least one mirror responds, and doesn’t care if no others

198 | Concurrency respond:

use Mojo::Base qw(-strict -signatures)

use File::Spec::Functions; use Mojo::Promise; use Mojo::Promise::Role::HigherOrder; use Mojo::UserAgent; use Mojo::URL;

# The CPAN config is actually a use lib catfile( $ENV{HOME}, '.cpan' ); my @mirrors = eval { no warnings qw(once); my $file = Mojo::URL->new( 'index.html' ); require CPAN::MyConfig; map { say "1: $_"; $file->clone->base(Mojo::URL->new($_))->to_abs } $CPAN::Config->{urllist}->@* };

die "Did not find CPAN::MyConfig\n" unless @mirrors; my $ua = Mojo::UserAgent->new;

my @all_sites = map { $ua->get_p( $_ )->then( sub ($tx) { die unless $tx->result->body =~ /Jarkko/ }); } @mirrors; my $any_promise = Mojo::Promise ->with_roles( '+Any' ) ->any( @all_sites ) ->then( sub { say "At least one of them worked!" }, sub { say "None of them worked!" }, );

$any_promise->wait;

Promises | 199 Find more than one working mirror

One working mirror might not be enough for me, but requiring that every mirror work is probably too much. The point of a mirror pool is that I expect some of them to be down at any time. Suppose that I want at least two of them. I change the higher-order promise in the previous program to require two working mirrors:

my $some_promise = Mojo::Promise ->with_roles( '+Some' ) ->some( \@all_sites, 2 ) ->then( sub { say "At least $count of them worked!" }, sub { say "None of them worked!" }, );

Check that no URLs are not found

Flip the situation around. I want to check that none of the URLs get a 404 (not found) response. That’s different than checking that all URLs get some- thing other than a 404, which you could also do. But, none rounds out the higher-order constructs and lets me approach the solution from another an- gle:

use Mojo::Base qw(-strict -signatures); use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

use Mojo::Promise; use Mojo::Promise::Role::HigherOrder;

my @urls = qw( https://www.learning-perl.com/ https://www.perl.org/ https://perldoc.perl.org/not_there.pod );

200 | Concurrency my @all_sites = map { my $p = $ua->get_p( $_ ); $p->then( sub ( $tx ) { $tx->res->code == 404 ? $tx->req->url : die $tx->req->url } ); } @urls;

my $all_promise = Mojo::Promise ->with_roles( '+None' ) ->none( @all_sites ) ->then( sub { say "None of them were 404!" }, sub { say "At least one was 404: @_!" }, );

$all_promise->wait;

Summary

Concurrency is a new way of thinking for many programmers. You need to be able to separate a task into independent parts that can interleave their work. How you do that depends on the resources they need to share. Mojo’s non-blocking features are good for latency bound applications where your tasks are often waiting for results from external resources. Mojo builds sev- eral concurrency interfaces on top of its Promises/A+ implementation. You saw how that works in a bit more detail than you probably needed, but that should help you effectively use them in your programs.

Always remember that Mojo doesn’t magically make all of your program non-blocking.

Summary | 201 202 | Concurrency Chapter 10

Content Generators

I don’t remember ever fighting Godzilla… But that is so whatI would have done!

It’s a bit tedious to exactly create the message bodies each time you want to make a request. Not only that, you might (probably will) get it wrong if you build up that string yourself. Besides the message body you’ll need to fix up the request headers.

A content generator lets you give a name to a subroutine that handles all of the details. Mojo supplies some of these shortcuts but you can also make your own. This way, you can give your requests a data structure that you automatically turn into the right content.

Existing Content Generators

You’ve already seen the three built-in content generators: form, json, and multipart. See these by asking the Mojo::UserAgent::Transactor to give you the hash of names and code references:

| 203 use Mojo::UserAgent; my $ua = Mojo::UserAgent->new(); say join "\n", keys $ua->transactor->generators->%*;

As I write this there are three built-in content generators (and these cover most of the situations you’ll encounter with servers you don’t control):

form json multipart

The first one, form, takes a Perl data structure and represents it as form data. To see this without actually requesting something, use build_tx to create the request without sending it to a server. The arguments are the same with the addition of the HTTP verb at the beginning:

use Mojo::UserAgent; my $ua = Mojo::UserAgent->new();

my %hash = ( robot => 'Bender', human => 'Fry', );

say $ua->build_tx( GET => 'www.example.com/', form => \%hash )->req->to_string;

The form data for a GET request shows up as part of the query string; that’s in the request URL and not the message body. There’s no need for a Content- Type because there’s no content (and the Content-Length is zero:

GET /?human=Fry&robot=Bender HTTP/1.1 User-Agent: Mojolicious (Perl) Content-Length: 0 Accept-Encoding: gzip

204 | Content Generators Host: www.example.com

Make a small change to that program to use the POST instead of GET and something different happens:

say $ua->build_tx( POST => 'www.example.com/', form => \%hash )->req->to_string;

A POST puts its form data in the message body. Now there’s a Content-Type and a non-zero Content-Length. The content generator takes care of all of that:

POST / HTTP/1.1 Content-Type: application/x-www-form-urlencoded Content-Length: 22 Accept-Encoding: gzip Host: www.example.com User-Agent: Mojolicious (Perl)

human=Fry&robot=Bender

Perhaps you have to send JSON data instead. Change the generator and the request with the same data look different:

say $ua->build_tx( POST => 'www.example.com/', json => \%hash )->req->to_string;

The Content-Type and Content-Length are different because the message body is different. It’s the same information in a different representation:

POST / HTTP/1.1 Content-Length: 32 Content-Type: application/json User-Agent: Mojolicious (Perl)

Existing Content Generators | 205 Accept-Encoding: gzip Host: www.example.com

{"human":"Fry","robot":"Bender"}

The last default generator is multipart, which is designed to send several large chunks of data. The message body is divided into sections for each chunk:

use Mojo::UserAgent; my $ua = Mojo::UserAgent->new();

my @parts = qw(Bender Fry);

say $ua->build_tx( 'POST' => 'www.example.com/', multipart => \@parts, )->req->to_string;

Again, the generator takes care of everything that needs to happen. The Content-Type has a MIME type and information about the boundary. This looks a bit silly but imagine sending several meaty files in one request:

POST / HTTP/1.1 User-Agent: Mojolicious (Perl) Host: www.example.com Accept-Encoding: gzip Content-Length: 46 Content-Type: multipart/mixed; boundary=GAKGX

--GAKGX

Bender --GAKGX

Fry --GAKGX--

206 | Content Generators Creating Your Own Generator

The pre-defined content generators might not do what you need. For exter- nal servers, you probably don’t need something fancier (although if you do please let me know). However, you’re probably also writing your own Mojo web applications where you can control and expand what clients can send. If you’re working on both sides of the transaction there might be easier ways to represent data.

A simple example

First, write a little server that accepts a message body which is a series of network-encoded four-octet integers. After decoding the message body it returns the sum. A Mojolicious::Lite server does that easily:

use Mojolicious::Lite; use List::Util qw(sum);

# Access request information post '/add' => sub { my $c = shift; my @ints = unpack 'N*', $c->req->body; say "ints are @ints"; my $sum = sum( @ints ); $c->result->headers->content_type( 'text/plain' ); $c->render( text => $sum ); };

app->start;

Before you create a content generator do the work by hand. You can pack the numbers and add them to the message body. You can handle the Content- Type yourself and the Content-Length is handled by body when you add the data:

use Mojo::UserAgent;

Creating Your Own Generator | 207 my $ua = Mojo::UserAgent->new;

my @numbers = map { int( rand( 0xFFFFFFFF + 1 )) } 0 .. 5;

my $tx = $ua->build_tx( POST => 'www.example.com/add', ); $tx->req->headers->content_type( 'application/packed-ints' ); $tx->req->body( pack 'N*', @numbers ); say $tx->req->to_string;

The request has the packed numbers in the message body and your desired Content-Type (although there are some funny characters when you print this):

POST /add HTTP/1.1 Content-Type: application/packed-ints Accept-Encoding: gzip Content-Length: 24 User-Agent: Mojolicious (Perl) Host: www.example.comҺɜ

¿¿4l¿¿¿¿[U]¿E¿uE=

That wasn’t complicated but it’s easier to see content generators with an easy example. Now build that into a subroutine and add it with add_generator, giving it a name and a code reference. Use that name in build_tx to turn your data into the right message body. This creates the same request:

use Mojo::Base -strict, -signatures; use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

# args are the transactor, the transaction, and the rest sub packed_ints ( $t, $tx, @args ) { $tx->req->headers->content_type( 'application/packed-ints' ); $tx->req->body( pack 'N*', $args[0]->@* ); } $ua->transactor->add_generator(

208 | Content Generators packed_ints => \&packed_ints );

my @numbers = map { int(rand( 0xFFFFFFFF + 1 )) } 0 .. 5;

my $tx = $ua->build_tx( POST => 'localhost:3000/add', packed_ints => \@numbers, );

say $tx->req->to_string;

Although this looked simple, no one who should use packed_ints needs to worry about the details. It’s the same code reüse benefit as subroutines (be- cause it is a subroutine). If you find a problem, you only have to fix it inone place.

Once you like your request, send it to the server:

$ua->start( $tx ); say $tx->result->body;

Or do it all together:

my $tx = $ua->post( 'localhost:3000/add', packed_ints => \@numbers ); say $tx->result->body;

A Sereal generator

An object serializer might be a more useful example (and most of your work is already done). Perhaps you want to send a Perl object from your program to a web application. I don’t particularly endorse that but it’s a possibility and an interesting case when other representations don’t work. JSON does its job

Creating Your Own Generator | 209 but can spend too much time converting numbers and strings, for instance. You might need something faster and smaller for high throughput, high rate applications.

The Sereal module can encode and decode Perl data structures into a com- pact binary format (read more about it in [Booking.com’s announcement](https://medium.com/booking- com-development/sereal-a-binary-data--format-f5ebd6ede507)). It encodes and decodes quickly and has a small memory footprint. The par- ticular features don’t matter as much as the way that you can make it a content generator:

use Mojo::Base -strict, -signatures; use Mojo::UserAgent; my $ua = Mojo::UserAgent->new;

sub to_sereal ( $t, $tx, @args ) { state $rc = require Sereal::Encoder; $tx->req->headers->content_type( 'application/sereal' ); my $body = Sereal::Encoder::encode_sereal( \@args ); $tx->req->body( $body ); } $ua->transactor->add_generator( sereal => \&to_sereal );

my @numbers = map { int(rand( 0xFFFFFFFF + 1 )) } 0 .. 5;

my $tx = $ua->build_tx( POST => 'www.example.com/add', sereal => \@numbers, );

say $tx->req->to_string;

210 | Content Generators Summary

Content generators allow you to specify the sort of representation you’d like in the message body. Create some code, give it a name, and use that name when you build a transaction. Since the code for the generator is in one place you can easily adjust all of the uses throughout your code base.

Summary | 211 212 | Content Generators Chapter 11

Command-Line Programs

Oh wait, you’re serious. Let me laugh even harder.

Several of the methods from Mojo::UserAgent and other utilities have short- cuts to make them easy for you to use directly from the command line. These one-liners are handy for short programs you probably won’t need again or even those that you’d want to make into shell aliases.

The mojo Command

The mojo command is mostly for creating, running, and testing Mojo web applications—stuff that I don’t cover in this book. It does have one interest- ing command: get, which makes a web request. It’s there to make requests against your own application but it’s useful for any request to any server.

Without any arguments, you get a help message that lists several things that you can do. Most of these aren’t interesting for this chapter:

% mojo

| 213 Usage: APPLICATION COMMAND [OPTIONS]

mojo version mojo generate lite_app ./myapp.pl daemon -m production -l http://*:8080 ./myapp.pl get /foo ./myapp.pl routes -v ...

Try the get command without further arguments; you get another help mes- sage that’s specific to that command.

% mojo get Usage: APPLICATION get [OPTIONS] URL⏎ [SELECTOR|JSON-POINTER] [COMMANDS]

./myapp.pl get / ...

Fetch a resource by supplying the URL as the argument to get. IPify returns your IP address as text:

% mojo get api.ipify.org 89.187.178.38

Dump it to the screen or redirect it to a file:

% mojo get https://mojolicious.org % mojo get https://mojolicious.org > mojo.html

Use -v to see the request and response headers. If you request just mojolicious.org, it looks like nothing happens. The headers tell a different story. You get a response but it’s an HTTP redirect. The mojo command didn’t automatically handle that for you:

% mojo get -v www.mojolicious.org GET / HTTP/1.1 User-Agent: Mojolicious (Perl)

214 | Command-Line Programs Accept-Encoding: gzip Host: www.mojolicious.org Content-Length: 0

HTTP/1.1 301 Moved Permanently Location: https://www.mojolicious.org/ Date: Mon, 23 Jul 2018 18:08:54 GMT Transfer-Encoding: chunked CF-RAY: 43f01855c650923c-EWR Connection: keep-alive Server: cloudflare Cache-Control: max-age=3600 Expires: Mon, 23 Jul 2018 19:08:54 GMT

Follow redirects (up to 10 of them) with the -r switch. Now you see some content:

% mojo get -r www.mojolicious.org Mojolicious - Perl real-time web framework ...

Add a header with -H. Separate the header name and value with a colon:

% mojo get -v -H 'X-Bender: Bite me!' www.mojolicious.org GET / HTTP/1.1 Accept-Encoding: gzip Host: www.mojolicious.org Content-Length: 0 User-Agent: Mojolicious (Perl) X-Bender: Bite me!

...

The mojo Command | 215 Add some form data with -f. Specify each name and value pair with a sep- arate -f:

% mojo get -v -f 'a=b' -f 'a=c' www.httpbin.org/get GET /get?a=b&a=c HTTP/1.1 Host: www.httpbin.org User-Agent: Mojolicious (Perl) Accept-Encoding: gzip Content-Length: 0

HTTP/1.1 200 OK Content-Type: application/json Content-Encoding: gzip Server: nginx Date: Sat, 13 Apr 2019 01:26:51 GMT Connection: keep-alive Content-Length: 202 Access-Control-Allow-Origin: * Access-Control-Allow-Credentials: true

{ "args":{ "a":[ "b", "c" ] }, "headers":{ "Accept-Encoding": "gzip", "Host": "www.httpbin.org", "User-Agent": "Mojolicious (Perl)" }, "origin": "89.46.103.172, 89.46.103.172", "url": "https://www.httpbin.org/get?a=b&a=c" }

Try it with something that needs URL encoding, such as a space in the value for a:

% mojo get -v -f 'a=b' -f 'a=c d' www.httpbin.org/get

216 | Command-Line Programs GET /get?a=b&a=c+d HTTP/1.1 Host: www.httpbin.org Accept-Encoding: gzip User-Agent: Mojolicious (Perl) Content-Length: 0

HTTP/1.1 200 OK Access-Control-Allow-Origin: * Content-Type: application/json Content-Length: 205 Connection: keep-alive Date: Sat, 13 Apr 2019 01:28:34 GMT Access-Control-Allow-Credentials: true Server: nginx Content-Encoding: gzip

{ "args":{ "a":[ "b", "c d" ] }, "headers":{ "Accept-Encoding": "gzip", "Host": "www.httpbin.org", "User-Agent": "Mojolicious (Perl)" }, "origin": "89.46.103.172, 89.46.103.172", "url": "https://www.httpbin.org/get?a=b&a=c+d" }

Although the command name is get you can change the HTTP verb with - M. Now the form data are in the message body so you don’t see those in the request headers:

% $ mojo get -v -M POST -f 'a=b' -f 'a=c d'⏎ www.httpbin.org/post POST /post HTTP/1.1

The mojo Command | 217 User-Agent: Mojolicious (Perl) Content-Length: 9 Host: www.httpbin.org Content-Type: application/x-www-form-urlencoded Accept-Encoding: gzip

...

Processing the response

Immediately process the result of a request by providing a selector. For JSON responses, the selector starts with / and follows the rules from Mojo::JSON ::Pointer (RFC 6901. XPath users should be comfortable with this:

% mojo get -f 'a=b' -f 'a=c d' www.httpbin.org/get /url https://www.httpbin.org/get?a=b&a=c+d % mojo get -f 'a=b' -f 'a=c d' www.httpbin.org/get /args {"a":["b","c d"]}

Earlier, in chapter 3, you saw jq doing the same sort of thing. Use it to extract part of the response:

% mojo get -f 'a=b' -f 'a=c d' www.httpbin.org/get /args⏎ | jq '.a[1]' "c d"

If the response is HTML, use CSS selectors instead. This extracts the title part of the DOM and includes the HTML:

% mojo get -r www.mojolicious.org title Mojolicious - Perl real-time web framework

After the selector, supply a command and its arguments. Select only the text in the title tag:

218 | Command-Line Programs % mojo get -r www.mojolicious.org title text Mojolicious - Perl real-time web framework

In this example, attr selects values from attributes of the selected items; the href is the argument to attr:

% mojo5.28.0 get -r www.mojolicious.org 'a[href]' attr href https://metacpan.org/release/Mojolicious https://mojolicious.org https://mojolicious.org/perldoc ...

Here’s another example in which you extract the text inside the H1, H2, and H3 tags:

% mojo get -r www.mojolicious.org 'h1, h2, h3' text A next generation web framework for the Perl programming language.

Features Installation Getting Started Duct tape for the HTML5 web Want to know more?

Perl One-Liners

A “one-liner” is a program that you write and execute on the command line. It’s typically for small bits of code that you may not intend to reuse. Specify that program as the argument to the -e switch:

% perl -e 'print "Hello World\n"'

On Windows, you need to use double quotes for the argument, which is a problem inside your program. Use the qq generalized quoting instead of dou- ble quotes in your program:

Perl One-Liners | 219 C:\perl -e "print qq(Hello World\n)"

In those two examples you need the double quotes inside the program to interpolate the \n into a newline. You can get rid of the double quote problem by using the -l switch; that automatically adds a newline to a print. You don’t need the interior double quotes (or equivalent) now. The Unix shell and Windows command lines can look the same:

% perl -l -e "print 'Hello World'" C:\perl -l -e "print 'Hello World'"

Since the -l takes no argument, you can “bundle” the switches for a little less typing:

% perl -le "print 'Hello World'" C:\perl -le "print 'Hello World'"

The -E is like -e but also enables the new features added since v5.8. One of those features is say—it adds a newline to the end of its output. This is effectively the same as the -l and -e together:

% perl -E "say 'Hello World'" C:\perl -E "say 'Hello World'"

Use -M to load a module from the command line. The ojo module is cleverly named to spell out “Mojo” when you load it with -M:

% perl -Mojo -E "say Mojolicious->VERSION"

The -I adds directories to the module search path. This example will find the ./lib/yModule.pm file:

% perl -Ilib -MyModule "..."

Remember that v5.26 removed the dot from the default @INC. Reädd that with -I. if you need it:

220 | Command-Line Programs % perl -I. -MyModule "..."

The PERL5OPT environment variable can make these one-liners a bit shorter and are often useful in a session (but maybe not across sessions). These command-line options are automatically added to your call to perl. Along with that, the PERL5LIB environment variable adds directories to @INC. These reduce the amount of repeated typing a bit:

% export PERL5OPT='-Mojo' % export PERL5LIB='lib' % perl -E "say Mojolicious->VERSION"

In your shell setting file you should be able to define aliases for one-liners. Here’s the bash alias for a Perl one-liner that converts a decimal number to its hexadecimal representation:

# .bash_profile alias d2h="perl -e 'printf qq|%X\n|, int( shift )'"

You can do a similar thing with DOS or PowerShell but it’s a bit more compli- cated than I want to show here; I’ll punt to a [Aliases in Windows command prompt](https://stackoverflow.com/q/20530996/2766176) on StackOverflow.

Fetching a Resource

All of the HTTP verbs have a shortcut function. Instead of returning a trans- action object they return the response object; that cuts out one step in getting to the result. Instead of get there is g and you can call body right away:

% perl -Mojo -E "say g('mojolicious.org')->body"

Notice the lack of a scheme in that example. It’s just the host name and the shortcut figures it out. The g takes the same arguments as get. This one includes some form data in the request:

Fetching a Resource | 221 % perl -Mojo -E "say g('http://httpbin.org/get'⏎ => form => { robot => 'Bender'})->body"

Save the content so you can play with it later:

% perl -Mojo -E "g(shift)->save_to('test.html')"⏎ mojolicious.org

The p does a POST request:

% perl -Mojo -E "say p('http://httpbin.org/post' =>⏎ form => { robot => 'Bender'})->body"

The h shortcut stands in for the head method. Sometimes you don’t care about the content (which can be quite long). This one-liner shows all of the headers:

% perl -Mojo -E "say h('mojolicious.org')⏎ ->headers->to_string" Server: cloudflare Content-Encoding: gzip Connection: keep-alive Set-Cookie: ... CF-RAY: 43eeea976c1021c2-EWR Content-Type: text/html;charset=UTF-8 Expect-CT: max-age=604800, report-uri="..." Date: Mon, 23 Jul 2018 14:42:55 GMT

Or look at a particular header:

% perl -Mojo -E 'say h("mojolicious.org")->headers⏎ ->set_cookie' __cfduid=d1727b77aa1044975e86fb61badd3f1ad1532357521;⏎ expires=Tue, 23-Jul-19 14:52:01 GMT; path=/; domain=⏎ .mojolicious.org; HttpOnly; Secure

222 | Command-Line Programs Some one-liners

This section includes some one-liners. Some of them can use both the mojo command and the shortcuts.

Show the HTTP status code, which I like to check if the right thing is hap- pening:

% perl -Mojo -e "say g(shift)->code" www.perl.com 200

Extract the unique links in a resource:

% mojo get https://www.mojolicious.org a attr href % perl -Mojo -E 'g("mojolicio.us")->dom("a")⏎ ->map( attr => "href" )->uniq->join("\n")->say'

Extract all the images:

% mojo get https://www.mojolicious.org img attr src % perl -Mojo -E 'g("mojolicio.us")->dom("img")⏎ ->map( attr => "src" )->uniq->join("\n")->say'

Get the titles from the latest articles in /r/perl on Reddit:

% mojo https://www.reddit.com/r/perl/⏎ 'p.title > a.title' text

List all the HTML headings (not HTTP headers) in a document:

% mojo get https://www.mojolicious.org 'h1,h2,h3' text

Grab the value of a particular header. Most of the time you only need a HEAD request:

% perl -Mojo -E "say h(shift)->headers⏎ ->header(shift) // ''"⏎

Fetching a Resource | 223 https://www.mojolicious.org Server cloudflare % perl -Mojo -E "say h(shift)->headers⏎ ->header(shift) // ''"⏎ https://www.perl.com/ ETag "5b5a1636-8248"

Download some JSON and extract something from it. The j turns a JSON string into a Perl data structure. This one gets the author record from MetaC- PAN by the ID and extracts the full name:

% perl -Mojo -e "say⏎ j(g('https://fastapi.metacpan.org/v1/author/'⏎ .shift)->body)->{name}" BDFOY brian d foy

And here’s one with the IPify response:

% perl -Mojo -E "say j(g(shift)->body)->{ip}"⏎ 'api.ipify.org?format=json' 89.187.178.38

Playing with the DOM

The f shortcut creates a Mojo::File object. Call slurp on that and you’ve reïnvented the cat (or type) command:

% perl -Mojo -e "print f('test.md')->slurp"

Instead of specifying the filename inside the program, use shift to get it from the command-line arguments. Outside of a subroutine definition the shift works on @ARGV. This give the same result:

% perl -Mojo -e "print f(shift)->slurp" test.md

224 | Command-Line Programs The x shortcut creates a Mojo::DOM object from its argument.

% perl -Mojo -E "say x('

Bender

')⏎ ->at('p')->text" Bender

Combine that with f to load a local file into a DOM

% perl -Mojo -E "say x(f(shift)->slurp)⏎ ->at('p')->text"

The n runs a block of code the number of times that you specify. Here’s one that slurps a file then finds all the DIV tags 10 times. It outputs areportof the benchmarks:

% perl -Mojo -E 'my $dom=x(f(shift)->slurp);⏎ n { $dom->find("div") } 10' spec.html 5.09754 wallclock secs ( 5.06 usr + 0.01 sys = 5.07 CPU)

Or grab it from the network, which is a little bit slower:

% perl -Mojo -E 'my $dom=g(shift)->dom;⏎ n { $dom->find("div") } 10' https://html.spec.whatwg.org 5.31821 wallclock secs ( 5.28 usr + 0.01 sys = 5.29 CPU)

Extending ojo

Create your own ojo-like module to provide other shortcuts you’d like. Here’s an extension that adds an S to slurp a file and cookie to extract the Set- Cookie header. It loads ojo first and reëxports those functions:

use Mojo::Base -strict, -signatures; use ojo;

use Exporter qw(import);

Extending ojo | 225 our @EXPORT = ( # new shortcuts qw( s cookie ), # from ojo qw(abcdfghjnoprtux) );

sub S ( $file ) { f( $file )->slurp } sub cookie ( $url ) { g( $url )->headers->set_cookie }

Now loading a file is a bit shorter (something important for one-liners):

% perl -Ilib -MyOjo -e "print S(shift)" test.html

Create a DOM from the slurped file right away:

% perl -Ilib -MyOjo -e "x(S(shift))->at('p')->text" test.html

Check the cookie value for a resource:

% perl -Ilib -MyOjo -E "say cookie(shift)"⏎ mojolicious.org __cfduid=dc889efdd23c2fca4379087e969ae66901532359564;⏎ expires=Tue, 23-Jul-19 15:26:04 GMT; path=/; domain=⏎ .mojolicious.org; HttpOnly; Secure

You might also consider not inheriting from ojo. Take a look at the source; it’s very simple and you can easily create your standalone module doing the same sort of thing with your specialized subroutines.

Summary

There are several shortcuts that make Mojo one-liners easy. Make your re- quest and process it right from the command line. If you want to reüse these one-liners you can create shell aliases. If you need more you can create your

226 | Command-Line Programs own shortcut module to handle it for you.

Summary | 227 228 | Command-Line Programs Chapter 12

Conclusion

The ship’s ready except for this cup holder, and I should have that done in 12 hours.

I set out to see how much of a Mojo book I could write in three months and had a target date of the 2018 Nordic Perl Workshop in Oslo. As with any book I’ve written I expanded the scope and exceeded my estimated page count. There were some topics that I would have liked to include but that I cut for time and space.

Since I’m publishing this in a new eBook model, I’ll have the opportunity to update this book frequently so these topics might be in future editions. However, I won’t make any promises. Maybe these things will show up in their own tiny eBooks.

| 229 Uncovered Topics

Roles

Mojolicious 8 introduced roles that you can apply to many of the main classes:

my $ua = Mojo::UserAgent->new->with_roles( '+Bender' );

This assumes the namespace of the object along with ::Role:: and the name you specified. You’d end up with Mojo::UserAgent::Role::Bender.

my $ua = Mojo::UserAgent->new->with_roles( 'My::Role::Bender' );

Joel Berger described roles as part of the Mojo 2017 Advent Calendar and I wrote about them for the 2018 version.

Namespace selectors

XML documents can define namespaces to compartmentalize tags. You can build selectors based on those namespaces but I don’t want to write a book about XML. If there’s enough demand for these, I can expand chapter 5.

Servers inside the user-agent

You can set up a Mojo app inside your user-agent.

use Mojolicious; use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;

230 | Conclusion my $server = $ua->server->app( Mojolicious->new ); my $routes = $ua->server->app->routes;

$routes->get( '/time', sub { my $c = shift; $c->render( json => {now => time} ); });

$routes->get( '/time-delay/:seconds', sub { my $c = shift; $c->render_later; sleep 5; $c->render( json => {now => time} ); });

say $ua->get( '/time' )->result->to_string;

That makes a request to the local server:

HTTP/1.1 200 OK Date: Wed, 05 Sep 2018 15:24:38 GMT Content-Type: application/json;charset=UTF-8 Server: Mojolicious (Perl) Content-Length: 18

{"now":1536161078}

This goes the other way too. Your Mojolicious program has access to a user- agent inside the app so you can make requests inside a controller. That re- quest can be non-blocking; you call render_later and handle the response with a callback.

Note that httpbin also has a delayed response endpoint, although that’s not as fun as writing your own.

Uncovered Topics | 231 Range requests

You can request particular octets of a resource. This can be handy for re- suming interrupted requests, for example. It’s not that complicated; send the right header and process the response. The Range request header tells the server which octets to return:

Range: bytes=0-1023

The server sends with a Content-Range response header that tells you the range and the total number:

Content-Range: bytes 0-1023/146515

It’s up to you to reassemble the ranges into the whole document.

Websockets

Mojo can handle . Your client connects to a server and doesn’t disconnect. It sends some data and gets some data back but without all the HTTP stuff. This is handy when you have some sort of live updating real-time data where you don’t want another full HTTP connection.

I think this sort of programming deserves its own minibook instead of shoe- horning it into the structure of this one. Part of that includes writing some webservers to handle the server side of the conversation, which I didn’t want to do in this book.

Mojo::UserAgent inside a web app

Mojo::UserAgent works from inside a Mojo web application. It’s all the same stuff and you can use the response from an internal request to makeyour response to the external client.

232 | Conclusion Working with alternate TLS certificates

You can replace the secure certificate stuff that Mojo uses by default ifyou have special server requirements. It’s not that hard but to cover it adequately in the book I’d want to write about how that stuff actually works. I didn’t think is was worth the trade off. If you are doing it you probably already understand what you need to do. And, if you aren’t doing it, you don’t need it. Maybe I’m wrong but I’m the one who gets to decide.

$ua->ca('/etc/tls/ca.crt'); # or $ENV{MOJO_CA_FILE} $ua->cert('/etc/tls/client.crt'); # or $ENV{MOJO_CERT_FILE}

You can also play around with IO::Socket::SSL options.

Testing

I like to test all of the code that I write and testing web client code is a special challenge. I could mock up servers (or other code) to provide all sorts of interesting situations but that’s almost a book itself. Most of the Mojo testing stuff deals with the web applications that you create yourself rather than user-agent stuff to hit servers you don’t control. Sadly, this topic wasatthe end of the list (just like real world software testing, eh?) and I excluded it to get the rest of the book to you.

Uncovered Topics | 233 234 | Conclusion Chapter 13

Resources

Hey! Oh! You want me to do two things? Man, I’d call my lawyer if dialin’ the phone wasn’t such a hassle.

This ending chapter collects various things you may want to read. I men- tioned some of them in the previous chapters and some of them were in my notes but didn’t have a chance to show themselves in the content. Others are things I think are relevant to the problems you’ll run into as you create your own clients.

Introduction

• A Standard for Robot Exclusion

• Perl.org

• MetaCPAN

• The Official Mojolicious Site

| 235 Perl Books

• Web Client Programming with Perl (old, but still interesting), by Lin- coln Stein

• Learning Perl, by Randal L. Schwartz, Tom , and brian d foy

• Intermediate Perl, by Randal L. Schwartz and brian d foy

• Mastering Perl, by brian d foy

, by , Tom Christiansen, and brian d foy

Perl Features

I explain many of Perl’s new features (those added since v5.8) at The Effec- tive Perler. These are the features that I used often in the examples:

• Subroutine signatures

• Postfix dereferencing

• Indented here docs

• The /r flag for s///

Basic Utilities

• JSON

• RFC 8259: The JavaScript Object Notation (JSON) Data Interchange Format

• GitHub API (v3)

• Uniform Resource Identifier (URI): Generic Syntax

236 | Resources Selectors

• Cascading Style Sheets

• Selectors Level 3

• Selectors Level 4

Requests and Responses

• Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies

• Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types

• RFC 7230: Transfer Protocol (HTTP/1.1): Message Syntax and Routing

• RFC 7231: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content

• RFC 6266: Use of the Content-Disposition Header Field in the Hyper- text Transfer Protocol (HTTP)

• RFC 7234: Hypertext Transfer Protocol (HTTP/1.1): Caching

Cookies

• RFC 6265: HTTP State Management Mechanism

Concurrency

• Concurrency is not parallelism

| 237 • Writing Non-Blocking Applications with Mojolicious: Part 1

• Writing Non-Blocking Applications with Mojolicious: Part 2

• Writing Non-Blocking Applications with Mojolicious: Part 3

• Promise/A+ specification

• Higher Order Promises

Content Generators

• Day 11: UserAgent Content Generators

One-Liners

• perlrun

• Perl One-Liners

• One-liners for Fun and Profit

238 | Resources