List:       wget
Subject:    Re: Wget used for web indexer and update checker ?
From:       Hrvoje Niksic <hniksic () srce ! hr>
Date:       1998-10-24 15:54:29

Aivo Kalu <Aivo@vm.ee> writes:

> disk. Actually the last point still needs thinking: could I tell
> wget that it doesn't need to save the pages, just retrieve them, or is
> it impossible because wget needs to know the links to follow and
> the only source for these links is the locally stored web page?

Your conclusion is correct.  Wget's misdesign is that it always parses
the HTML from disk, instead of doing it in core, or even as a stream
while the document is being downloaded.  I hope to fix this in a
future release.
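
To illustrate what parsing "in core" could look like, here is a
minimal sketch in Python (not Wget's C code, and not how Wget is
implemented).  The names LinkCollector and extract_links are
hypothetical; the idea is only that links can be pulled out of a
buffer held in memory, so nothing needs to be saved to disk first.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from href/src attributes as HTML is fed in.

    Hypothetical helper for illustration only.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    """Return the links found in an in-memory HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)  # feed() may also be called chunk by chunk
    parser.close()
    return parser.links
```

Because feed() accepts partial input, the same collector could be
handed chunks as they arrive from the network, which is roughly the
streaming variant mentioned above.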

> (possible source code change ?)

The fix for this is non-trivial.

> The second approach raises a similar problem to the first. How
> could wget know which links to follow if the file on the local disk
> is empty?

It is conceivable for Wget to build a database of files and
timestamps.  Once I get around to implementing all the nice stuff I
plan for 1.6/2.0, this will be a breeze.  :-)
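
As a rough sketch of the idea (in Python, and an assumption about how
such a database might look, not an existing or planned Wget feature),
a small table mapping each URL to its last seen modification time is
enough to decide whether a page has changed.  The function names, the
table layout, and the use of SQLite are all hypothetical.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the URL/timestamp database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, mtime INTEGER NOT NULL)"
    )
    return conn


def needs_fetch(conn, url, remote_mtime):
    """True if the URL is unknown or the remote copy is newer."""
    row = conn.execute(
        "SELECT mtime FROM pages WHERE url = ?", (url,)
    ).fetchone()
    return row is None or remote_mtime > row[0]


def record(conn, url, remote_mtime):
    """Remember the modification time seen for this URL."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, mtime) VALUES (?, ?)",
        (url, remote_mtime),
    )
    conn.commit()
```

An update checker would call needs_fetch() with the modification time
reported by the server and call record() after each retrieval, so no
page body has to be kept on disk for the comparison.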

> It could raise a few problems with highly dynamic websites (suppose
> the hierarchy of documents and links changes in the middle of a
> recursive retrieval) but I believe it could be useful.

Yes, Wget's usefulness is limited to static or semi-static web sites.
Sites that make heavy use of CGI, server-side HTML, JavaScript, and
such will not be handled correctly by Wget.

> May I ask your opinion regarding this feature, as I am probably a
> beginner in Internet programming and have never seen wget's source
> code?

Feel free to hack...

-- 
Hrvoje Niksic <hniksic@srce.hr> | Student at FER Zagreb, Croatia
--------------------------------+--------------------------------
The Lord protects children and fools...  But don't push it.