[prev in list] [next in list] [prev in thread] [next in thread] 

List:       git
Subject:    Re: [GSoC][PATCH v4 2/7] clone: better handle symlinked files at .git/objects/
From:       Thomas Gummerer <t.gummerer () gmail ! com>
Date:       2019-03-30 19:27:38
Message-ID: 20190330192738.GQ32487 () hank ! intra ! tgummerer ! com
[Download RAW message or body]

On 03/30, Matheus Tavares Bernardino wrote:
> On Fri, Mar 29, 2019 at 5:05 PM Thomas Gummerer <t.gummerer@gmail.com> wrote:
> >
> > On 03/29, Matheus Tavares Bernardino wrote:
> > > Ok, what if instead of using linkat() we use 'realpath(const char
> > > *path, char *resolved_path)', which will resolve any symlinks at
> > > 'path' and store the canonical path at 'resolved_path'? Then, we can
> > > still keep using link() but now, with the certainty that all platforms
> > > will have a consistent behaviour? (also, realpath() is POSIX.1-2001)
> > > Would that be a better idea?
> >
> > Yeah, I think that is a good idea.  Note that 'realpath()' itself is
> > not used anywhere in our codebase either, but there is
> > 'strbuf_realpath()', that from reading the function documentation does
> > exactly what 'realpath()' would do.  So using 'strbuf_realpath()'
> > would probably be the right thing to do here.
> 
> Thanks. While I was looking for realpath() at git codebase (before I
> saw your email), I got a little confused: Besides strbuf_realpath() I
> also found real_path(), real_path_if_valid() and real_pathdup(). All
> these last three use strbuf_realpath() but they also initialize the
> struct strbuf internally and just return a 'char *', which is much
> convenient in some cases.

Right, feel free to use whichever is most convenient for you, and
whichever works in the context.

>                            What seems weird to me is that, whilst
> real_pathdup() releases the internally initialized struct strubuf
> (leaving just the returned string to be free'd by the user), the other
> two don't. So, if struct strbuf change in the future to have more
> dynamic allocated resources, these functions will also have to be
> modified. Also, since real_pathdup() can already do what the other two
> do, do you know if there is a reason to keep all of them?

Right, '*dup()' functions usually leave the return value to be free'd
by the caller.  And while 'real_pathdup()' could do what the others do
already it also takes more effort to use it.  Users don't need to free
the return value from 'real_path()' to avoid a memory leak.  This
alone justifies its existence I think.

> One last question: I found some places which don't free the string
> returned by, for example, real_path() (e.g., find_worktree() at
> worktree.c). Would it be a valid/good patch (or patches) to add free()
> calls in this places? (I'm currently trying to get more people here at
> USP to contribute to git, and maybe this could be a nice first
> contribution for them...)

Trying to plug memory leaks in the codebase is definitely something
that I think is worthy of doing.  Sometimes it's not worth actually
free'ing the memory, for example just before the program exits, in
which case we can use the UNLEAK annotation.  It was introduced in
0e5bba53af ("add UNLEAK annotation for reducing leak false positives",
2017-09-08) if you want more background.

That said, the memory from 'real_path()' should actually not be
free'd.  The strbuf there has a static lifetime, so it is valid until
git exits.  If we were to free the return value of the function we'd
actually free an internal buffer of the strbuf, that is still valid.
So if someone were to use 'real_path()' after that, the memory that
strbuf still thinks it owns would actually have been free'd, which
would result in undefined behaviour, and probably would make git
segfault.
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic