List:       python-catalog-sig
Subject:    [Catalog-sig] PyPI and Wiki crawling
From:       "Martin v. Löwis" <martin@v.loewis.de>
Date:       2007-08-07 21:06:47
Message-ID: 46B8DEE7.2080703@v.loewis.de

I hope I have now solved the overload problem that massive
crawling caused on the wiki, which in turn caused the
PyPI outage.

Following Laura's advice, I added a Crawl-delay directive to
robots.txt. Several robots have picked that up, not just msnbot
and slurp but also e.g. MJ12bot.
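
For reference, such a robots.txt entry looks roughly like this
(the wildcard user-agent and the 6-second value are only an
illustration, not necessarily what is deployed on the wiki):

    User-agent: *
    Crawl-delay: 6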

For the others, I had to fine-tune my throttling code after
observing that the expensive URLs are those with a query string.
Each of those now counts as 3 regular requests (I might have to
bump this to 5), so a client can only issue one of them every 6s.
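
To illustrate the idea, here is a minimal sketch, not the actual
PyPI/Moin throttling code; the names and the 2-second base
interval are assumptions derived from the 3-requests/6-seconds
figures above:

    import time

    BASE_INTERVAL = 2.0   # seconds per regular request (assumed: 3 x 2s = 6s)
    QUERY_WEIGHT = 3      # a URL with a query string counts as 3 regular requests

    # earliest point in time at which each client may send its next request
    _next_allowed = {}

    def allow_request(client_ip, url, now=None):
        """Return True if the request may proceed, False if it should be throttled."""
        if now is None:
            now = time.time()
        weight = QUERY_WEIGHT if '?' in url else 1
        if now < _next_allowed.get(client_ip, 0.0):
            return False          # still inside the penalty window
        # charge the request: push the next allowed time forward by its cost
        _next_allowed[client_ip] = now + BASE_INTERVAL * weight
        return True

With these numbers a plain page fetch is allowed every 2 seconds
and a query-string URL only every 6 seconds; bumping QUERY_WEIGHT
to 5 would stretch the latter to 10 seconds.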

For statistics of the load, see

http://ximinez.python.org/munin/localdomain/localhost.localdomain-pypitime.html

I added accounting of moin.fcgi run times, which shows that
Moin produced 15% CPU load on average (PyPI 3%, Postgres 2%).

Regards,
Martin
_______________________________________________
Catalog-SIG mailing list
Catalog-SIG@python.org
http://mail.python.org/mailman/listinfo/catalog-sig