Example 1: A Huge Number of Requests

People often need to make thousands of HTTP requests. The naïve approach is to post() all those requests to a POE::Component::Client::HTTP instance at once. This always ends in tears because all those requests are enqueued together. Because messages are handled first-in/first-out, the first HTTP response can't be processed until the last HTTP request has been sent.

Symptoms include POE::Component::Client::HTTP slowing down progressively as the event queue grows. At some point it becomes unacceptably slow.

The solution is to fire off a small number of initial requests, perhaps ten or twenty, and to fire new requests as each response arrives. This is much more responsive, and it keeps ten to twenty requests running in parallel.

Of course you can use fewer or more initial requests to tune the degree of parallelism.

#!/usr/bin/perl

use warnings;
use strict;

use HTTP::Request::Common qw(GET);
use POE qw(Component::Client::HTTP);

# Start an HTTP user agent.

POE::Component::Client::HTTP->spawn(Alias => 'ua');

# Open a file containing one URL per line.

open my $url_list_fh, "<", "url-list.txt"
  or die "Can't open url-list.txt: $!";

# Start a session that will drive the user agent.
# Callbacks are named functions in the "main" package.

POE::Session->create(package_states => [main => ["_start", "got_response"]]);

POE::Kernel->run();
exit;

# Generator function to produce the next URL to fetch.
# For example, this one reads URLs from a file.

sub get_another_url {
  while (defined(my $next_url = <$url_list_fh>)) {
    $next_url =~ s/^\s+//;
    $next_url =~ s/\s+$//;
    next unless length $next_url;    # Skip blank lines.
    return $next_url;
  }
  return;                            # No more URLs.
}

# The _start callback is used to begin the session's work.
# It starts a finite subset of the requests in the file.

sub _start {
  my $kernel = $_[KERNEL];

  for (1 .. 10) {
    my $next_url = get_another_url();
    last unless defined $next_url;

    $kernel->post("ua" => "request", "got_response", GET $next_url);
  }
}

# We will get an HTTP::Response object as each request finishes.
# Process the response, and trigger another request, if appropriate.

sub got_response {
  my ($heap, $request_packet, $response_packet) = @_[HEAP, ARG0, ARG1];

  my $http_request  = $request_packet->[0];
  my $http_response = $response_packet->[0];

  my $response_string = $http_response->as_string();
  $response_string =~ s/^/| /mg;
  print ",", '-' x 78, "\n";
  print $response_string;
  print "`", '-' x 78, "\n";

  my $next_url = get_another_url();
  if (defined $next_url) {
    $_[KERNEL]->post("ua" => "request", "got_response", GET $next_url);
  }
}
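
The windowing idea can also be tried without a network. The following is a minimal sketch, not part of the original example: it replaces the user agent with a plain queue, and the URLs, the window size of 10, and the send_next() helper are all assumptions made for illustration. It primes the window the way _start does, then sends one replacement request per simulated response the way got_response does.

```perl
#!/usr/bin/perl

use warnings;
use strict;

# Hypothetical stand-in for the URL list.  Nothing is actually fetched.
my @pending = map { "http://example.com/page$_" } 1 .. 25;

my $window        = 10;    # Assumed degree of parallelism.
my @in_flight;             # Requests "sent" but not yet "answered".
my $max_in_flight = 0;
my $completed     = 0;

# "Send" one request, if any remain, and track the high-water mark.
sub send_next {
  my $url = shift @pending;
  return unless defined $url;
  push @in_flight, $url;
  $max_in_flight = @in_flight if @in_flight > $max_in_flight;
}

# Prime the window, as _start does.
send_next() for 1 .. $window;

# Each simulated response triggers one new request, as got_response does.
while (@in_flight) {
  my $finished = shift @in_flight;
  $completed++;
  send_next();
}

print "completed=$completed max_in_flight=$max_in_flight\n";
```

With 25 URLs and a window of 10 this prints "completed=25 max_in_flight=10": every request completes, but no more than ten are ever outstanding at once.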

Example 2: One Session Per Request

Here is how a dynamic group of sessions may share a single POE::Component::Client::HTTP user-agent. Note that the component is spawned as a service that multiple sessions share.

Dividing requests into separate sessions is useful for large tasks, such as web spidering, where each request is part of a larger sequence of operations. It's not always needed, as you can see from the other examples in this section.

#!/usr/bin/perl

use warnings;
use strict;

# POE::Component::Client::HTTP uses HTTP::Request and HTTP::Response
# objects.

use HTTP::Request::Common qw(GET POST);

# A list of pages to fetch.  They will be fetched in parallel.  Add
# more sites to see it in action.

my @url_list = qw(
  http://poe.perl.org/misc/test.html
);

# Include POE and the HTTP client component.

use POE qw(Component::Client::HTTP);

# Create a user agent.  It will be referred to as "ua".  It limits
# fetch sizes to 4KB (for testing).  If a connection has not occurred
# after 180 seconds, it gives up.

POE::Component::Client::HTTP->spawn(
  Alias   => 'ua',
  MaxSize => 4096,    # Remove for unlimited page sizes.
  Timeout => 180,
);

# Create a session for each request.

foreach my $url (@url_list) {

  POE::Session->create(
    inline_states => {
      _start => sub {
        my ($kernel, $heap) = @_[KERNEL, HEAP];

        # Post a request to the HTTP user agent component.  When the
        # component has an answer (positive or negative), it will
        # send back a "got_response" event with an HTTP::Response
        # object.

        $kernel->post(ua => request => got_response => GET $url );
      },

      # A response has arrived.  Display it.

      got_response => sub {
        my ($heap, $request_packet, $response_packet) = @_[HEAP, ARG0, ARG1];

        # The original HTTP::Request object.  If several requests
        # were made, this can help match the response back to its
        # request.

        my $http_request = $request_packet->[0];

        # The HTTP::Response object.

        my $http_response = $response_packet->[0];

        # Make the response presentable, and display it.

        my $response_string = $http_response->as_string();
        $response_string =~ s/^/| /mg;
        print ",", '-' x 78, "\n";
        print $response_string;
        print "`", '-' x 78, "\n";
      },
    },
  );
}

# Run everything, and exit when it's all done.

$poe_kernel->run();
exit 0;

Example 3: Multiple Requests in a Single Session

[ Added by ekkis ]

Here is a simpler example that uses a single session to request multiple URLs. It is a "flyweight" version of the previous example. This is a common pattern, and the large-scale example also uses it.

Client::HTTP expects its requests to come from another session. Likewise, it sends its responses back to that session. This is why a Client::HTTP program needs at least one other session, even if it's as trivial as the one in the example below.

This is also why programs cannot send a request from one session and receive responses in another.

#!/usr/bin/perl

use warnings;
use strict;
use HTTP::Request::Common qw(GET POST);
use POE qw(Component::Client::HTTP);

my @url_list = (
  "http://poe.perl.org/misc/test.html",
  "http://poe.perl.org/?POE_Cookbook/Web_Client"
);

POE::Component::Client::HTTP->spawn(Alias => 'ua');

sub got_response {
  my ($heap, $request_packet, $response_packet) = @_[HEAP, ARG0, ARG1];

  my $http_request  = $request_packet->[0];
  my $http_response = $response_packet->[0];

  my $response_string = $http_response->as_string();
  $response_string =~ s/^/| /mg;
  print ",", '-' x 78, "\n";
  print $response_string;
  print "`", '-' x 78, "\n";
}

sub _start {
  my $kernel = $_[KERNEL];

  foreach my $url (@url_list) {
    $kernel->post("ua" => "request", "got_response", GET $url);
  }
}

POE::Session->create(package_states => [main => ["_start", "got_response"]]);

$poe_kernel->run();

Example 4: Multiple Requests from One Session

Multiple requests in a single session with GET and POST, including a tag to identify the request.

[ Added by xantus ]

In this example, I've set up an array of request objects, and each one is posted to the ua session. It demonstrates both GET and POST.

#!/usr/bin/perl

use warnings;
use strict;
use HTTP::Request::Common qw(GET POST);
use POE qw(Component::Client::HTTP);

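# The first request is a simple GET.
# The second is a POST to a (nonexistent) script that accepts file uploads.
#    HTTP::Request::Common opens the file when the request is built, so
#    pic_of_me.png must exist in your home directory.
# The third is another kind of POST, to a (nonexistent) script that logs in
#    with a username and password.  It is automatically posted with a
#    content type of 'application/x-www-form-urlencoded'.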
my @url_list = (
  (GET "http://poe.perl.org/?POE_Cookbook",),
  (
    POST "http://teknikill.net/upload_photos.pl",
    Content_Type => 'form-data',
    Content      => [
      name         => 'David Davis',
      email        => 'user@host.com',
      picture_file => ["$ENV{HOME}/pic_of_me.png"],
    ],
  ),
  (
    POST "http://teknikill.net/login.pl",
    Content => [
      username => 'joe_bob',
      password => 'my secret password',
    ],
  ),
);

POE::Component::Client::HTTP->spawn(Alias => 'ua');

sub got_response {
  my ($heap, $request_packet, $response_packet) = @_[HEAP, ARG0, ARG1];

  my $http_request  = $request_packet->[0];
  my $tag           = $request_packet->[1];
  my $http_response = $response_packet->[0];

  my $response_string = $http_response->as_string();
  $response_string =~ s/^/| /mg;
  print ",", '-' x 78, "\n";
  print "| Tag for this request: $tag\n";
  print "|", '-' x 78, "\n";
  print $response_string;
  print "`", '-' x 78, "\n";
}

sub _start {
  my $kernel = $_[KERNEL];

  foreach my $i (0 .. $#url_list) {

    # To pass a 'tag' to identify the request we sent, we'll send the array index
    $kernel->post("ua" => "request", "got_response", $url_list[$i], $i);
  }
}

POE::Session->create(package_states => [main => ["_start", "got_response"]]);

$poe_kernel->run();
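
To see what GET and POST build before anything is posted to the ua session, the request objects can be inspected directly. This is a small sketch that assumes HTTP::Request::Common is installed. The example.com URLs and field values are placeholders, and nothing is sent over the network.

```perl
#!/usr/bin/perl

use warnings;
use strict;
use HTTP::Request::Common qw(GET POST);

# Placeholder URLs.  These calls only build HTTP::Request objects.
my $get = GET "http://example.com/";

my $login = POST "http://example.com/login.pl",
  Content => [ username => 'joe_bob', password => 'secret' ];

my $upload = POST "http://example.com/upload.pl",
  Content_Type => 'form-data',
  Content      => [ name => 'David Davis' ];

# Show the method and content type chosen for each request.
print $get->method, "\n";
print $login->method, " ", $login->content_type, "\n";
print $upload->method, " ", $upload->content_type, "\n";

# The form fields end up in the request body.
print $login->content, "\n";
```

The login request gets the 'application/x-www-form-urlencoded' content type without being asked, while the upload request uses 'multipart/form-data' because Content_Type was set to 'form-data'.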

Example 5: Save Large Files to Disk

[ Added by yoda ]

This takes advantage of the Streaming option and writes chunks of large responses to disk as they arrive. It's a useful alternative to loading large responses into memory.

#!/usr/bin/perl
use warnings;
use strict;
use Fcntl;
use HTTP::Request::Common qw(GET POST);
use POE qw(Component::Client::HTTP);

my @url_list = ("http://poe.perl.org/misc/stream-test.cgi",);

POE::Component::Client::HTTP->spawn(
  Alias     => 'ua',
  Streaming => 4096,
);

POE::Session->create(
  inline_states => {
    _start       => \&client_init,
    got_response => \&client_handle_response,
  }
);

sub client_init {
  $_[HEAP]->{files} = {};

  for my $url (@url_list) {
    $_[KERNEL]->post(ua => request => got_response => GET $url);
  }
}

sub client_handle_response {
  my $req = $_[ARG0][0];
  my ($res, $data) = @{$_[ARG1]};

  my $fh = $_[HEAP]->{files}{$req};
  unless ($fh) {
    my ($file) = $res->request->uri =~ m{/([^/.]+\.[^/]+)$};
    unless ($file) {
      require Digest::MD5;
      $file = Digest::MD5::md5_hex($res->request->uri);
    }
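    # Use a lexical handle so each request writes to its own file.
    sysopen(my $out, "/tmp/$file", O_WRONLY | O_CREAT | O_TRUNC)
      or die "Can't create /tmp/$file: $!";
    binmode($out);
    $fh = $_[HEAP]->{files}{$req} = $out;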
  }
  if (defined $data) {
    print $fh $data;
  }
  else {
    close $fh;
    delete $_[HEAP]->{files}{$req};
  }
}

POE::Kernel->run;
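
The file name logic in client_handle_response can be tried on its own. The sketch below pulls it into a helper named file_name_for(), which is a hypothetical name used only here. It takes the last path segment when that looks like "name.ext", and otherwise falls back to an MD5 digest of the whole URL.

```perl
#!/usr/bin/perl

use warnings;
use strict;
use Digest::MD5 qw(md5_hex);

# file_name_for() is a hypothetical helper, not part of any module.
# It uses the same pattern as client_handle_response.
sub file_name_for {
  my $url = shift;

  # Last path segment, but only if it contains a dot.
  my ($file) = $url =~ m{/([^/.]+\.[^/]+)$};

  # Otherwise use a digest of the URL as a stable, safe name.
  $file = md5_hex($url) unless defined $file;
  return $file;
}

print file_name_for("http://poe.perl.org/misc/stream-test.cgi"), "\n";
print file_name_for("http://poe.perl.org/misc/"), "\n";
```

The first call prints "stream-test.cgi". The second URL has no file-like last segment, so it prints a 32-character hexadecimal digest instead.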