List:       mercurial
Subject:    SV: Slow push of large file over HTTP
From:       Michael_Tjørnemark <mtj () pfa ! dk>
Date:       2012-04-26 11:51:17
Message-ID: E2CD1466E9927D4EA97501ACD43BE51B204CD9F7DA () PFANPEX01 ! pfa ! dk

> Michael Tjørnemark <mtj@pfa.dk> writes:
> 
> Hi there :)
> 
> > I have a repository with a single changeset which adds a single 60 MB 
> > file (zip-file). Pushing this repo over HTTP is much, much slower than 
> > other commands on the repository, including a similar pull - is this 
> > to be expected? I have recreated the problem on other machines and 
> > files as well, so it seems to be a general problem with pushing a
> > large(ish) file.
> > 
> > Times (all on my local machine):
> > Commit file - 4 secs
> > Push to empty repo using filesystem - 7 secs
> > Clone from repo over HTTP - 23 secs
> > Pull to empty repo over HTTP - 23 secs
> > Push to empty repo over HTTP - 4 mins <-- SLOW
> > 
> > Command to serve empty repo:
> > hg serve --config web.allow_push=* --config web.push_ssl=false
> > 
> > Command to push to empty repo:
> > hg push http://localhost:8000/ --debug --time
> > 
> > In the debug output (see below), bundling and sending takes around 10 
> > secs. Then there is an almost 4 min pause between "sending:
> > 60342/120684 kb (50.00%)" and "remote: adding changesets". (Also it 
> > seems wrong that sending only goes to 50%, but that is another 
> > problem).
> 
> You've stumbled upon a weird corner case in Python's HTTP library. If the
> server asks for authentication, then we'll only see this *after* pushing
> the entire changegroup to the server! We'll then have to start over. The
> progress code anticipates this and claims that you need to send 120 MB for
> the 60 MB push so that the progress bar will go smoothly from 0% to 100%.
> Here the push is really finished when it reaches 50%.

Yeah, I'm not too worried about that, but thanks for the explanation.
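
For reference, the arithmetic behind that 50% can be sketched in a few lines
of Python. This only illustrates the doubling described above; it is not
Mercurial's actual progress code, and the function and parameter names are
made up for the example.

```python
def progress_percent(sent_kb, payload_kb, may_retry=True):
    """Percent shown when the advertised total allows for one full resend.

    Assumption for illustration: the total is simply twice the payload
    whenever an authentication retry is considered possible.
    """
    total_kb = payload_kb * 2 if may_retry else payload_kb
    return 100.0 * sent_kb / total_kb


# Figures from the debug output: 60342 kb sent, 120684 kb advertised total.
print("%.2f%%" % progress_percent(60342, 60342))
```

With the whole payload sent once and no retry needed, the bar stops at half
of the advertised total, which matches "60342/120684 kb (50.00%)".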

> I'm unsure why it then stops for 4 minutes -- I would not expect that.
> 
> > I understand that the largefiles extension might help, but this 
> > requires that everybody that uses the repo enables the extension, so I 
> > would rather avoid that. And also everything else is fast (commit, 
> > clone, pull), so it seems as if something is wrong with push over 
> > HTTP.
> 
> Versioning zip files is... unusual :) Every revision of the zip file will
> take up a lot of new space since it can't be delta compressed much against
> the previous version. So after 10 edits to the file, you could end up with
> a repo with maybe 400 MB of history for that single file.
> The largefiles extension sounds like just what you need -- Unity is
> actually using it for versioning a lot of zip files.

Yes, this is not really what I am trying to do; it was just a simple way to
recreate the core problem with pushing large files over HTTP. The real use
case that started my investigation was initializing a new repository with
around 6000 files, 150 MB in total and of varying size (the largest 33 MB,
and about 15 of them > 1 MB), which took more than 15 minutes to push (on a
slow machine).

I have now recreated the problem with 5 XML files of 16 MB each (a more
reasonable example than a single zip): pulling over HTTP takes 8 secs while
pushing takes around 1 minute, so as in the original example the ratio of
pull to push time is almost 1:10. A test of a repository with 4500 small
files (7.5 MB total) takes around 9 secs for both pull and push, so it seems
to be a problem with large files.

We can live with the current performance since it only has to be done once
and we are used to slow ClearCase performance, so I don't need a definitive
answer. I just think that something is wrong when push is that much slower
than pull. I would expect the times to be comparable, so 10 times slower
just seems weird. It should be easy to recreate the problem anywhere (I have
only tried it on Windows, though) by serving an empty repository, pushing a
single changeset to it with one or more large files added, and comparing
that time with a pull from the same repository.
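
If someone wants to script the comparison, something along these lines
should do. It is only a rough sketch: the helper names and the directory
names in the comments are made up, it assumes hg is on the PATH, and it
assumes the empty repository is already being served with the hg serve
command quoted earlier in this message.

```python
import subprocess
import time


def timed(cmd, cwd=None):
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True)
    return time.perf_counter() - start


def push_cmd(url):
    """Command line for pushing the current repository to url."""
    return ["hg", "push", url]


def pull_cmd(url):
    """Command line for pulling from url into the current repository."""
    return ["hg", "pull", url]


def slowdown(pull_secs, push_secs):
    """How many times slower the push was than the pull."""
    return push_secs / pull_secs


# Intended use (directory names are placeholders):
#   push_secs = timed(push_cmd("http://localhost:8000/"), cwd="repo-with-big-file")
#   pull_secs = timed(pull_cmd("http://localhost:8000/"), cwd="fresh-empty-repo")
#   print("push is %.1f times slower than pull" % slowdown(pull_secs, push_secs))
```

Plugging in the times reported above, slowdown(23, 240) is about 10.4 for
the zip file and slowdown(8, 60) is 7.5 for the XML files.
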
_______________________________________________
Mercurial mailing list
Mercurial@selenic.com
http://selenic.com/mailman/listinfo/mercurial

