'Re: Git clone and case sensitivity'

List:       git
Subject:    Re: Git clone and case sensitivity
From:       Jeff Hostetler <git () jeffhostetler ! com>
Date:       2018-07-31 19:39:41
Message-ID: c174d972-fc5b-88e7-b437-89b9dd29e579 () jeffhostetler ! com



On 7/29/2018 5:28 AM, Jeff King wrote:
> On Sun, Jul 29, 2018 at 07:26:41AM +0200, Duy Nguyen wrote:
> 
>>> strcasecmp() will only catch a subset of the cases. We really need to
>>> follow the same folding rules that the filesystem would.
>>
>> True. But that's how we handle case insensitivity internally. If a
>> filesystem has more sophisticated folding rules then git will not work
>> well on that one anyway.
> 
> Hrm. Yeah, I guess that's the best we can do for the actual in-memory
> checks. Everything else depends on doing an actual filesystem operation,
> and our icase stuff kicks in way before then. I was mostly thinking of
> HFS+ utf8 normalization weirdness, but I guess people are accustomed to
> that by now.
> 
>>> For the case of clone, I actually wonder if we could detect during the
>>> checkout step that a file already exists. Since we know that the
>>> directory we started with was empty, then if it does, either:
>>>
>>>    - there's some funny case-folding going on that means two paths in the
>>>      repository map to the same name in the filesystem; or
>>>
>>>    - somebody else is writing to the directory at the same time as us
>>
>> This is exactly what my first patch does (minus the sparse checkout
>> part).
> 
> Right, sorry, I should have read that one more carefully.
> 
>> But without knowing the exact folding rules, I don't think we can
>> locate this "somebody else" who wrote the first path. So if N paths
>> are treated the same by this filesystem, we could only report N-1 of
>> them.
>>
>> If we want to report just one path when this happens though, then this
>> works quite well.
> 
> Hmm. Since most such systems are case-preserving, would it be possible
> to report the name of the existing file? Doing it via opendir/readdir is
> hacky, and anyway puts the burden on us to find the matching name. Doing
> it via fstat() on the opened file doesn't work because at that point the
> filesystem has resolved the name to an inode.
> 
> So yeah, perhaps strcasecmp() is the best we can do (I do agree that
> being able to mention all of the conflicting names is a benefit).
> 
> I guess we should be using fspathcmp(), though, in case it later learns
> to be smarter.
> 
> -Peff
> 

As has already been mentioned, this gets into weird territory really
fast, between case folding, final space/dot on windows, utf8 NFC/NFD
weirdness on the mac, utf8 invisible chars on the mac, long/short names
on windows, and so on.

And that's just for filenames.  Things really get weird if directory
names have these ambiguities.

Perhaps just print the problematic paths (where the collision is
detected) and let the user decide how to correct them.
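
To illustrate the "detect it at checkout time" idea discussed above, here
is a minimal sketch in Python (not git's actual C code). It assumes the
target directory started out empty, so any "file already exists" error
means either a folding collision or a concurrent writer. The function
name checkout_into_empty_dir is hypothetical, and file contents and
file-vs-directory collisions are left out for brevity.

```python
import os

def checkout_into_empty_dir(root, paths):
    """Create each path exclusively; return the paths that already existed.

    'paths' are slash-separated, index-style relative paths. Only the
    path where the collision is detected is reported, not the earlier
    path that the filesystem folded it onto.
    """
    collisions = []
    for path in paths:
        full = os.path.join(root, *path.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        try:
            # O_EXCL makes the create fail if the name already resolves
            # to an existing entry, whatever folding rules the
            # filesystem applies.
            fd = os.open(full, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        except FileExistsError:
            collisions.append(path)
            continue
        os.close(fd)
    return collisions
```

Because the check is done by the filesystem itself, no knowledge of the
folding rules is needed, at the cost of reporting only N-1 of N
colliding paths.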

Perhaps we could have a separate tool that could scan the index or
commit for potential conflicts and warn the user in advance (granted, it
might not be perfect and may report a few false positives).
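
A rough sketch of what such a scan could look like, again in Python
rather than git's C. The folding used here (Unicode NFC normalization,
casefold(), and stripping a trailing space or dot from each path
component) is an assumption that only approximates what real
filesystems do, so it can report false positives and miss real
collisions. The names fold_key and find_potential_collisions are
hypothetical.

```python
import unicodedata
from collections import defaultdict

def fold_key(path):
    """Approximate the name a folding filesystem might use for 'path'."""
    parts = []
    for part in path.split("/"):
        # NFC vs NFD differences (mac) and case differences are folded;
        # this is an approximation, not any filesystem's exact rules.
        part = unicodedata.normalize("NFC", part).casefold()
        # Windows ignores a trailing space or dot in a component.
        parts.append(part.rstrip(" ."))
    return "/".join(parts)

def find_potential_collisions(paths):
    """Group paths that share a fold key; return groups of two or more."""
    groups = defaultdict(list)
    for path in paths:
        groups[fold_key(path)].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

The key is computed per component, so directory names that differ only
by case are grouped as well. The path list could come from the output of
`git ls-files -z` split on NUL bytes.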

Forcing them into a sparse-checkout situation might be over their
skill level.

Jeff