List: wget
Subject: Re: Wget used for web indexer and update checker ?
From: Hrvoje Niksic <hniksic () srce ! hr>
Date: 1998-10-24 15:54:29
Aivo Kalu <Aivo@vm.ee> writes:
> disk. Actually, the last point still needs some thought: could I tell
> wget that it doesn't need to save the pages, just retrieve them? Or is
> that impossible because wget needs to know the links to follow, and
> the only source for these links is the locally stored web page?
Your conclusion is correct. Wget's misdesign is that it always parses
the HTML from disk, instead of in core (in memory), or even as a stream
while the page is being downloaded. I hope to fix this in a future release.
> (possible source code change?)
The fix for this is non-trivial.
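To illustrate what parsing "in core, or even streamlined" means, here is a minimal sketch in Python. It is not Wget's code (Wget is written in C). It assumes the standard library `html.parser` module, and `LinkCollector` and `extract_links` are hypothetical names. The HTML is fed to the parser chunk by chunk, as it might arrive from the network, so links can be collected without the page ever being written to disk:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect link targets from HTML fed in chunks."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only a few common link-bearing attributes are considered here.
        for name, value in attrs:
            if value and ((tag == "a" and name == "href") or
                          (tag in ("img", "frame") and name == "src")):
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(chunks, base_url):
    """Parse HTML piece by piece; HTMLParser buffers incomplete tags
    between feed() calls, so a tag may be split across chunks."""
    parser = LinkCollector(base_url)
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return parser.links
```

For example, `extract_links(['<a hr', 'ef="b.html">B</a><img src="/i.gif">'], "http://example.com/dir/a.html")` yields `['http://example.com/dir/b.html', 'http://example.com/i.gif']`, even though the first tag is split between two chunks.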
> The second approach raises a similar problem to the first. How
> could wget know which links to follow if the file on the local disk
> is empty?
It is conceivable for Wget to build a database of files and
timestamps. Once I get around to implementing all the nice stuff I
plan for 1.6/2.0, this will be a breeze. :-)
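A database of files and timestamps could also record each page's outgoing links, which would answer the "empty local file" question above: an unchanged page need not be kept on disk or re-parsed to know what it links to. The following is only a hypothetical sketch, not a description of any Wget release. It assumes Python's standard `sqlite3` module, and `CrawlDB`, `is_stale`, `record`, and `links_of` are made-up names. Timestamps are plain integers (for example, a server's Last-Modified time converted to seconds):

```python
import sqlite3


class CrawlDB:
    """Hypothetical store of URL -> timestamp plus each page's links."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, mtime INTEGER)")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS links (src TEXT, dst TEXT)")

    def is_stale(self, url, remote_mtime):
        # Stale if never seen before, or the server reports a newer time.
        row = self.conn.execute(
            "SELECT mtime FROM pages WHERE url = ?", (url,)).fetchone()
        return row is None or remote_mtime > row[0]

    def record(self, url, mtime, links):
        # Replace the stored timestamp and link list in one transaction.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO pages (url, mtime) VALUES (?, ?)",
                (url, mtime))
            self.conn.execute("DELETE FROM links WHERE src = ?", (url,))
            self.conn.executemany(
                "INSERT INTO links (src, dst) VALUES (?, ?)",
                [(url, dst) for dst in links])

    def links_of(self, url):
        # Links remembered from the last time this page was parsed.
        rows = self.conn.execute(
            "SELECT dst FROM links WHERE src = ? ORDER BY rowid", (url,))
        return [r[0] for r in rows]
```

A recursive update check would then fetch and parse only pages for which `is_stale()` is true, and follow `links_of()` for the rest. As noted below, this breaks down if the link structure changes between runs.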
> It could raise a few problems with highly dynamic websites (suppose
> the hierarchy of documents and links changes in the middle of a
> recursive retrieval), but I believe it could be useful.
Yes, Wget's usefulness is limited to static or semi-static web sites.
Sites that make heavy use of CGI, server-side HTML, JavaScript, and
such will not be handled correctly by Wget.
> May I ask your opinion regarding this feature, as I am probably a
> beginner in Internet programming and have never seen wget's source
> code?
Feel free to hack...
--
Hrvoje Niksic <hniksic@srce.hr> | Student at FER Zagreb, Croatia
--------------------------------+--------------------------------
The Lord protects children and fools... But don't push it.