List:       mercurial
Subject:    Re: Largefiles - tips for improving performance?
From:       "Matt Harbison" <mharbison72 () gmail ! com>
Date:       2018-03-27 1:18:07
Message-ID: op.zgigkhxy9lwrgf () envy

On Mon, 26 Mar 2018 20:35:41 -0400, Miles Alan <m@userbound.com> wrote:

> Thanks for the ideas Matt.
>
> Something that's odd is that strace shows a lot of calls where hg is  
> looking for '.hg' folders nested deeply within the .hglf folder (which  
> it consequently doesn't find).

That seems wrong.  Is there a pattern to this? (e.g. stat .hg/foo, then  
stat .hglf/foo/.hg?)  Are these calls all stat, or ...?  Is this an update  
with --clean, or no?  (And do you see a time difference with --clean vs  
not?)

I did a search for the constant representing '.hglf', and none of those  
hits seem obviously wrong.  I wonder if a path under '.hglf' is somehow  
getting into the matcher.
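
To answer the pattern question, a rough tally of the strace output by syscall name might help.  The following is only a sketch: it assumes the default strace line format (syscall name, then the path as the first quoted argument, optionally preceded by a PID from -f), and the function name and sample lines are made up for illustration.  Lines with timestamp prefixes (-t, -r) would not match without adjusting the regular expression.

```python
import re
from collections import Counter

# Assumed strace line shape (hypothetical examples):
#   stat(".hglf/audio/a/.hg", 0x7ffd1) = -1 ENOENT (No such file or directory)
#   [pid 1234] lstat(".hglf/audio/.hg", 0x7ffd2) = -1 ENOENT (...)
# Group 1 is the syscall name, group 2 is the first quoted argument.
SYSCALL_RE = re.compile(r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(.*?"([^"]*)"')


def count_nested_hg_probes(lines):
    """Count syscalls, by name, whose path is a '.hg' entry somewhere below '.hglf'."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if not match:
            continue
        syscall, path = match.groups()
        parts = path.split("/")
        # The last component must be '.hg' and an earlier one must be '.hglf'.
        if parts[-1] == ".hg" and ".hglf" in parts[:-1]:
            counts[syscall] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python count_probes.py strace.log
    with open(sys.argv[1], errors="replace") as log:
        for name, total in count_nested_hg_probes(log).most_common():
            print(name, total)
```

If nearly all of the hits turn out to be stat or lstat, that would point at some kind of repository or subrepo probing rather than file reads, but that is a guess until the numbers are in.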

> I'm not sure what that means. Also, I managed to catch my repository  
> slowing down just a few minutes ago and put a time in front just  
> to see:
>
> time hg up .
> 0 files updated, 0 files merged, 0 files removed, 0 files unresolved
> 71.45user 14.75system 2:26.64elapsed 58%CPU (0avgtext+0avgdata  
> 61168maxresident)k
> 48597000inputs+127248outputs (1major+17639minor)pagefaults 0swaps
>
> So 2m:26s for updating no files. Next time I'll try with --profile.
>
> My problem currently is just that I'm having difficulty replicating the  
> issue on a clean repository. It's not hard to find, and pops up without me  
> trying on my actual production repository. However, when I write a script  
> with test cases against largefiles for a brand new repo (adding random  
> files, committing, etc.), I can't quite replicate it. So it's not yet  
> entirely clear to me what triggers the behavior.  I'll reach back out when  
> I have some way to reliably demonstrate the issue or have a fix.

Maybe try adding --debug too.  There are a bunch of things that get  
cached to speed things up.  I haven't paid any attention to how the caching  
works, but I wonder if largefiles is breaking some caching mechanism, and  
causing a cache to be rebuilt (or ignored).  I saw a 'rebuilding cache'  
message with --debug the other day, but I'm not sure if every cache rebuild  
prints one.  Any hits inside '.hg/cache'?  Are you using any other  
extensions?
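
Separately from strace, one cheap check is whether anything under '.hg/cache' was rewritten by the slow command.  This is a minimal sketch that just compares modification times; the function name and the ten minute window are arbitrary choices for illustration, and it says nothing about which cache is responsible or why it was rebuilt.

```python
import os
import time


def recent_cache_files(repo_root, max_age_seconds=600, now=None):
    """Return (path relative to .hg/cache, age in seconds) for recently
    modified cache files, newest first."""
    cache_dir = os.path.join(repo_root, ".hg", "cache")
    now = time.time() if now is None else now
    hits = []
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                age = now - os.stat(full).st_mtime
            except OSError:
                continue  # file vanished between listing and stat
            if age <= max_age_seconds:
                hits.append((os.path.relpath(full, cache_dir), age))
    # Smallest age first, i.e. most recently modified first.
    return sorted(hits, key=lambda item: item[1])


if __name__ == "__main__":
    # Run from the repository root right after the slow command finishes.
    for path, age in recent_cache_files("."):
        print("%8.1fs ago  %s" % (age, path))
```

Running it right after a slow `hg up .` and again after a fast one would show whether the slow runs coincide with cache files being rewritten.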

> As for LFS, I'm intentionally using the largefiles extension because I  
> like that I can push/pull through generic SSH for my machines without  
> special server setup. I'll be sticking happily with HG-largefiles as  
> long as I can work out this issue.
>
> Miles
>
> On Sat, Mar 24, 2018, at 6:10 PM, Matt Harbison wrote:
>> On Sat, 24 Mar 2018 14:01:23 -0400, Augie Fackler <raf@durin42.com>  
>> wrote:
>>
>> > (+Matt Harbison for largefiles insight - I have nothing to offer, but
>> > maybe he does)
>> >
>> >> On Mar 22, 2018, at 11:38 PM, Miles Alan <m@userbound.com> wrote:
>> >>
>> >> Does anyone have tips for improving largefile repository performance?
>> >>
>> >> I have an hg repository with about 20GB of audio files (around 3-5MB
>> >> each) tracked through the largefiles extension.  I treat my audio  
>> files
>> >> as pretty much write-only so I have 20GB in the working directory at
>> >> all times as well. Ordinary operations are quite fast;  however, I
>> >> notice occasionally if I update branches or switch between revisions  
>> I
>> >> will notice a large delay (2-5 minutes) for doing things like hg  
>> status
>> >> or hg commit.
>>
>> Have you tried the slow command with --profile?  I know the status code  
>> is
>> quite complicated.  Even if the files don't change much, it may still be
>> hashing the files to see if they've changed?
>>
>> > How many files are we talking about? And what platform? Based on the
>> > numbers, we're probably talking O(7k) files, which is not a lot, but
>> > might be a lot depending on the OS/filesystem in play...
>> >
>> >> I can see with strace, hg behind the scenes is actually performing  
>> stat
>> >> and read calls on my audio files. The odd thing is the revisions I'm
>> >> switching between don't touch the audio files at all (I also track
>> >> ordinary small text files / code in this same repository). Does  
>> anyone
>> >> have tips for improving performance when working with largefiles - or
>> >> suggestions on where to look in the extension source to put in a  
>> patch?
>>
>> I'm assuming you mean the audio files in your working directory, and  
>> not
>> under .hg/largefiles where copies are stashed?
>>
>> I don't think there's much that you can do to improve performance, aside
>>  from changing the code, other than make sure the files can be read  
>> quickly
>> (like using SSDs).  Patches would be welcomed.
>>
>>      https://www.mercurial-scm.org/wiki/DeveloperInfo
>>
>> When you clone the repo, the largefiles code lives in hgext/largefiles.
>> overrides.py contains wrappers around commands and internal functions.
>> reposetup.py is where the largefile subclass of the repository class  
>> lives
>> (along with the complicated status method).  You'll notice that many of
>> the command wrappers simply set `repo.lfstatus = True`, and then call  
>> into
>> the core code.  This is the trigger to work on the largefile itself,
>> instead of the tracked standin file under .hglf.
>>
>>
>> If you can tolerate a one time hash change, you might want to look at
>> converting to LFS.  It's not feature complete at the moment (hopefully  
>> `hg
>> serve` capability will get accepted this cycle).  But it would be an
>> interesting compare and contrast if you know what operations are causing
>> you problems, and you can readily reproduce it.
>>
>> >> Miles
>> >> _______________________________________________
>> >> Mercurial mailing list
>> >> Mercurial@mercurial-scm.org
>> >> https://www.mercurial-scm.org/mailman/listinfo/mercurial