List:       wget
Subject:    Re: [wget] large memory allocation with -r and large files
From:       Micah Cowan <micah () cowan ! name>
Date:       2008-01-26 0:37:15
Message-ID: 479A80BB.4040802 () cowan ! name

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

rffmna@gmail.com wrote:
> When downloading large files in a web directory (directory listing enabled), with
> wget -r http://example.com/folder/
> On Solaris with 512mb memory, wget fails with:
> wget: realloc: Failed to allocate 1073741824 bytes; memory exhausted.
> On Windows with 1gb memory, cygwin-wget uses too much memory (more than 2gb).
> 
> Files are about 1gb each, and there are about 7-9 files. The same
> problem persists if other options are added, e.g.
> wget -r -c -np -nd -nH http://example.com/folder/

Unfortunately, recursive mode currently works by slurping the entire
HTML file into memory, and then parsing it there. It is planned to move
to a streaming parser at some point in the future, but it's not there
yet. :\
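
To illustrate the difference, here is a minimal Python sketch of the
streaming idea. It is not Wget's code (Wget is written in C), and the
names LinkCollector and extract_links_streaming are hypothetical, chosen
only for this example. It assumes the input is a text stream and uses the
standard library's incremental html.parser, so only one chunk plus any
unfinished tag is held in memory at a time, instead of the whole file.

```python
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href/src attribute values as markup is fed in piece by piece."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links_streaming(stream, chunk_size=64 * 1024):
    """Read a text stream in fixed-size chunks and parse it incrementally."""
    parser = LinkCollector()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # feed() keeps any incomplete tag buffered until more data arrives.
        parser.feed(chunk)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = io.StringIO('<a href="a.iso">a</a><a href="b.iso">b</a>')
    # A tiny chunk size forces tags to be split across reads.
    print(extract_links_streaming(page, chunk_size=7))
```

The slurping approach corresponds to calling stream.read() once with no
size and parsing the result, which needs memory proportional to the
file size.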

It does look like it might be leaking a bit, though, if it's using 2gb
for 1gb files. Which version of Wget is this? It's possible that the
development sources may have fixed any leaks (though the core issue of
actually slurping the file still persists).

> If wget is run on just individual files, it runs fine, e.g.
> wget -O in http://example.com/folder/
> wget -i in -F --base=http://example.com/folder/ http://example.com/folder/

Yes: here, recursive mode isn't enabled, and it doesn't need to slurp
all the files into memory to look for further links. It will still slurp
up the -i file, but not any of the ones it downloads from the -i file's
links.

Are these 1gb files really HTML files?

- --
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHmoC67M8hyUobTrERAm5GAJ0Ryg0x4Lyx1Y9d/HyPCdSJN5N+SACcCbcQ
dSxFPrGj7TVDuZ16M/9sWsk=
=R9Jt
-----END PGP SIGNATURE-----