List:       apache-modperl
Subject:    Re: "DigExt" in user-agent hammering my site
From:       merlyn@stonehenge.com (Randal L. Schwartz)
Date:       1999-10-30 20:02:13

>>>>> "Jay" == Jay J <3pound@iname.com> writes:

Jay> I just tried it using IE5 for NT4 ..

Jay> What you're seeing is when someone has used "Make available
Jay> offline" followed by:

Jay> "If this favorite links to other pages, would you like to make
Jay> those pages available offline too? [y/n] ... Download pages [x]
Jay> links deep from this page"

Jay> The useragent is this: Mozilla/4.0 (compatible; MSIE 5.0; Windows
Jay> NT; DigExt)

Jay> And proceeds to crawl the site with 0-wait time between requests....

Jay> I haven't inspected the client-header to see if there might be
Jay> something to indicate it's in "crawl" mode .. I think it's
Jay> doubtful there is. So.....


Nope, I could find nothing to distinguish "evil spider" mode from
normal browsing mode, other than the rapidity of the download
requests.

So, I wrote my own throttling routines, unsatisfied with the others
that I found...

    package Stonehenge::Throttle;
    use strict;

    ## usage: PerlAccessHandler Stonehenge::Throttle;

    my $HISTORYDIR = "/home/merlyn/lib/Apache/Throttle";

    my $WINDOW = 90;		# seconds of interest
    my $SLOWBYTES = $WINDOW * 2000;	# bytes before we sleep
    my $SLEEP = 1;			# sleep time
    my $DECLINEBYTES = $WINDOW * 3000; # bytes before we 503 error

    use vars qw($VERSION);
    $VERSION = (qw$Revision: 1.4 $ )[-1];

    use Apache::Constants qw(OK DECLINED);
    use Apache::File;
    use Apache::Log;

    use Stonehenge::Reload;

    sub handler {
      goto &handler if Stonehenge::Reload->reload_me;

      my $r = shift;
      return DECLINED unless $r->is_initial_req;
      my $log = $r->server->log;

      my $host = $r->get_remote_host;
      return DECLINED if $host =~ /\.(holdit|stonehenge)\.com$/;

      my $historyfile = "$HISTORYDIR/$host"; # closure var

      $r->register_cleanup
	(sub {
	   my $fh = Apache::File->new;
	   open $fh, ">>$historyfile" or return DECLINED;

	   my $time = time;
	   my $bytes = $r->bytes_sent;
	   syswrite $fh, pack "LL", $time, $bytes;
	   close $fh;

	   return OK;
	 });

      {
	my $startwindow = time - $WINDOW;
	my $totalbytes = 0;
	my $fh = Apache::File->new;
	open $fh, $historyfile or return DECLINED;
	while ((read $fh, my $buf, 8) > 0) {
	  my ($time, $bytes) = unpack "LL", $buf;
	  next if $time < $startwindow;
	  $totalbytes += $bytes;
	}
	if ($totalbytes > $DECLINEBYTES) {
	  $log->notice("$host got $totalbytes in $WINDOW secs, sending 503");
	  $r->header_out("Retry-After", $WINDOW);
	  return 503;		# Service Unavailable
	} elsif ($totalbytes > $SLOWBYTES) {
	  $log->notice("$host got $totalbytes in $WINDOW secs, sleeping for $SLEEP");
	  sleep $SLEEP;
	  return DECLINED;
	} else {
	  ## $log->notice("$host got $totalbytes in $WINDOW secs"); # DEBUG
	  return DECLINED;
	}
      }
      return DECLINED;
    }
    1;

The per-host history files pile up, so this has to be aided by a cron
script, run every 20 minutes or so, that removes the stale ones.  It
looks like this:

    #!/usr/bin/perl -w
    use strict;

    # $Id: throttle-cleaner,v 1.1 1999/10/28 19:44:09 merlyn Exp $

    my $DIR = "/home/merlyn/lib/Apache/Throttle";
    my $SECS = 360;			# more than Stonehenge::Throttle $WINDOW

    chdir $DIR or die "Cannot chdir $DIR: $!";
    opendir DOT, "." or die "Cannot opendir .: $!";
    my $when = time - $SECS;
    while (my $name = readdir DOT) {
      next unless -f $name;
      next if (stat($name))[8] > $when;	# [8] is atime: keep recently read files
      ## warn "unlinking $name\n";
      unlink $name;
    }

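So now I have a bytes-served-in-window throttler on my website that
prevents any single remote host from sucking down more than 3k/sec
sustained over 90 seconds.  Past 2k/sec each request gets a one-second
sleep first; past 3k/sec it gets a 503 with a Retry-After header.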
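
To make the thresholds concrete, here is a minimal standalone sketch of
the window decision, separated from Apache.  The classify() helper, its
return strings, and the sample numbers are assumptions for illustration
only; the constants are the ones from the module above.

```perl
use strict;
use warnings;

# Thresholds copied from Stonehenge::Throttle above.
my $WINDOW       = 90;               # seconds of interest
my $SLOWBYTES    = $WINDOW * 2000;   # 180_000 bytes in the window
my $DECLINEBYTES = $WINDOW * 3000;   # 270_000 bytes in the window

# classify() is a hypothetical helper, not part of the module.
# $records is an array ref of [epoch_time, bytes_sent] pairs.
sub classify {
    my ($records, $now) = @_;
    my $startwindow = $now - $WINDOW;
    my $totalbytes  = 0;
    for my $rec (@$records) {
        my ($time, $bytes) = @$rec;
        next if $time < $startwindow;    # too old to count
        $totalbytes += $bytes;
    }
    return "deny" if $totalbytes > $DECLINEBYTES;   # handler would return 503
    return "slow" if $totalbytes > $SLOWBYTES;      # handler would sleep $SLEEP
    return "ok";
}

my $now = 1_000_000;    # made-up clock value
print classify([[ $now - 10, 100_000 ], [ $now - 200, 500_000 ]], $now), "\n";
print classify([[ $now - 10, 200_000 ]], $now), "\n";
print classify([[ $now - 10, 200_000 ], [ $now - 5, 100_000 ]], $now), "\n";
```

The first call ignores the 500,000-byte record because it is older than
the window, so only recent traffic counts against a host.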

It triggered five times overnight.  But my ISP neighbors are now
happy.

I should clean up Stonehenge::Throttle and submit it.  Notice, no file
locking!  That was an interesting fallout of the design.
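
One plausible reading of why this works without locks (my
interpretation, not something spelled out above): every request appends
exactly one fixed-size 8-byte record with a single syswrite to a file
opened for append, and readers only ever consume whole 8-byte records.
A minimal sketch of that record format outside Apache, where
append_record(), window_total(), and the temp file are hypothetical
names for illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Append one (time, bytes) record as two 32-bit unsigned values.
sub append_record {
    my ($file, $time, $bytes) = @_;
    open my $fh, ">>", $file or return;
    binmode $fh;
    # A single write of one 8-byte record, as in the cleanup handler.
    my $written = syswrite $fh, pack("LL", $time, $bytes);
    close $fh;
    return defined $written && $written == 8;
}

# Sum the bytes of all records at or after $startwindow.
sub window_total {
    my ($file, $startwindow) = @_;
    open my $fh, "<", $file or return 0;
    binmode $fh;
    my $total = 0;
    # Stop at the first short read, so a partial record is never counted.
    while ((read($fh, my $buf, 8) // 0) == 8) {
        my ($time, $bytes) = unpack "LL", $buf;
        $total += $bytes if $time >= $startwindow;
    }
    close $fh;
    return $total;
}

my (undef, $file) = tempfile(UNLINK => 1);
append_record($file, 1000, 500);
append_record($file, 1050, 700);
append_record($file, 1100, 900);
print window_total($file, 1040), "\n";   # sums the two records at or after 1040
```

Whether concurrent appends stay intact depends on the operating system
and filesystem, so treat this as a sketch of the idea rather than a
guarantee.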

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
