List: wget
Subject: Re: How to stop infinite recursion?
From: Hrvoje Niksic <hniksic () xemacs ! org>
Date: 2006-05-28 0:02:36
Message-ID: 87zmh32ezn.fsf () xemacs ! org
Robert Nicholson <robert@elastica.com> writes:
> When wget is traversing a url what stops it visiting that url again?
It keeps a table of visited URLs.
> and assuming it checks the url is it only checking for the exact
> string?
It is.
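As a rough illustration of what an exact-string check means, here is a
minimal Python sketch. It is not Wget's actual code; the names
`visited` and `should_fetch` and the example.com URLs are made up for
the example.

```python
# Illustrative sketch only; not Wget's actual implementation.
visited = set()  # table of URLs already seen, keyed by the exact string


def should_fetch(url):
    """Return True the first time a URL string is seen, False afterwards."""
    if url in visited:
        return False
    visited.add(url)
    return True


first = should_fetch("http://example.com/page")          # new string
again = should_fetch("http://example.com/page")          # identical string
variant = should_fetch("http://example.com/page?sid=1")  # differs by a query parameter
```

The second call is rejected because the string is identical. The third
is accepted, because the table compares strings and knows nothing
about what the server will send back.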
> ie. different url but same response because the url it's following
> the second time includes additional query parameters.
In such a case Wget can fetch the same resource more than once. In
the worst case, where new URLs are continually created based on old
requests, Wget can fall into a redirection "black hole" -- but so can
any crawler in the presence of dynamically generated URLs. Wget's
checks could be smarter, but I don't think there's a general solution
to that problem.
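To make "smarter" concrete, one possible heuristic is to canonicalize
URLs before looking them up in the table. The Python sketch below is
only an assumption about what such a check might look like, not
something Wget does; `IGNORED_PARAMS`, `canonicalize`, and the
parameter names `sid` and `ref` are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical, site-specific list of parameters assumed not to affect the response.
IGNORED_PARAMS = {"sid", "ref"}


def canonicalize(url):
    """Drop ignored query parameters, sort the rest, and strip the fragment."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Session-style variants collapse to a single table entry.
a = canonicalize("http://example.com/page?sid=1")
b = canonicalize("http://example.com/page?sid=2")

# URLs that keep growing out of earlier requests still look distinct.
t1 = canonicalize("http://example.com/page?from=/page")
t2 = canonicalize("http://example.com/page?from=/page?from=/page")
```

Here `a` and `b` come out equal, so the second fetch would be skipped,
while `t1` and `t2` remain different. The heuristic only helps when
you already know which parameters are safe to ignore on a given site,
which is why it does not amount to a general solution.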