[prev in list] [next in list] [prev in thread] [next in thread] 

List:       git
Subject:    Re: Re*: [PATCH v3] fetch: replace string-list used as a look-up table with a hashmap
From:       Johannes Schindelin <Johannes.Schindelin () gmx ! de>
Date:       2018-10-31 14:50:00
Message-ID: nycvar.QRO.7.76.6.1810311542560.4546 () tvgsbejvaqbjf ! bet
[Download RAW message or body]

Hi Junio,

On Sat, 27 Oct 2018, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > Just one thing^W^Wa couple of things:
> >
> > It would probably make more sense to `hashmap_get_from_hash()` and
> > `strhash()` here (and `strhash()` should probably be used everywhere
> > instead of `memhash(str, strlen(str))`).
> 
> hashmap_get_from_hash() certainly is much better suited for simpler
> usage pattern like these callsites, and the ones in sequencer.c.  It
> is a shame that a more complex variant takes the shorter-and-sweeter
> name hashmap_get().

I agree, at least in part.

From what I understand, hashmap_get_from_hash() needs a little assistance
from the comparison function with which the hashmap is configured, see
e.g. this function in the sequencer:

	static int labels_cmp(const void *fndata, const struct labels_entry *a,
			      const struct labels_entry *b, const void *key)
	{
		return key ? strcmp(a->label, key) : strcmp(a->label, b->label);
	}

See how that first tests whether `key` is non-`NULL`, and then takes a
shortcut, not even looking at `b`? This is important, because `b` does not
refer to a complete `labels_entry` when we call `hashmap_get_from_hash()`.
It only refers to a `hashmap_entry`. Looking at `b->label` would access
some random memory, and do most certainly the wrong thing.

> I wish we named the latter hashmap_get_fullblown_feature_rich() and
> called the _from_hash() thing a simple hashmap_get() from day one,
> but it is way too late.
> 
> I looked briefly the users of the _get() variant, and some of their
> uses are legitimately not-simple and cannot be reduced to use the
> simpler _get_from_hash variant, it seems.  But others like those in
> builtin/difftool.c should be straight-forward to convert to use the
> simpler get_from_hash variant.  It could be a low-hanging fruit left
> for later clean-up, perhaps.

Right. #leftoverbits

> >> @@ -271,10 +319,10 @@ static void find_non_local_tags(const struct ref *refs,
> >>  			    !has_object_file_with_flags(&ref->old_oid,
> >>  							OBJECT_INFO_QUICK) &&
> >>  			    !will_fetch(head, ref->old_oid.hash) &&
> >> -			    !has_sha1_file_with_flags(item->util,
> >> +			    !has_sha1_file_with_flags(item->oid.hash,
> >
> > I am not sure that we need to test for null OIDs here, given that...
> > ...
> > Of course, `has_sha1_file_with_flags()` is supposed to return `false` for
> > null OIDs, I guess.
> 
> Yup.  An alternative is to make item->oid a pointer to oid, not an
> oid object itself, so that we can express "no OID for this ref" in a
> more explicit way, but is_null_oid() is already used as "no OID" in
> many other codepaths, so...

Right, and it would complicate the code. So I am fine with your version of
it.

> >> +	for_each_string_list_item(remote_ref_item, &remote_refs_list) {
> >> +		const char *refname = remote_ref_item->string;
> >> +		struct hashmap_entry key;
> >> +
> >> +		hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> >> +		item = hashmap_get(&remote_refs, &key, refname);
> >> +		if (!item)
> >> +			continue; /* can this happen??? */
> >
> > This would indicate a BUG, no?
> 
> Possibly.  Alternatively, we can just use item without checking and
> let the runtime segfault.

Hahaha! Yep. We could also cause a crash. I do prefer the BUG() call.

> Here is an incremental on top that can be squashed in to turn v3
> into v4.

Nice.

Thanks!
Dscho

> 
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 0f8e333022..aee1d9bf21 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -259,7 +259,7 @@ static struct refname_hash_entry *refname_hash_add(struct hashmap *map,
>  	size_t len = strlen(refname);
>  
>  	FLEX_ALLOC_MEM(ent, refname, refname, len);
> -	hashmap_entry_init(ent, memhash(refname, len));
> +	hashmap_entry_init(ent, strhash(refname));
>  	oidcpy(&ent->oid, oid);
>  	hashmap_add(map, ent);
>  	return ent;
> @@ -282,11 +282,7 @@ static void refname_hash_init(struct hashmap *map)
>  
>  static int refname_hash_exists(struct hashmap *map, const char *refname)
>  {
> -	struct hashmap_entry key;
> -	size_t len = strlen(refname);
> -	hashmap_entry_init(&key, memhash(refname, len));
> -
> -	return !!hashmap_get(map, &key, refname);
> +	return !!hashmap_get_from_hash(map, strhash(refname), refname);
>  }
>  
>  static void find_non_local_tags(const struct ref *refs,
> @@ -365,12 +361,10 @@ static void find_non_local_tags(const struct ref *refs,
>  	 */
>  	for_each_string_list_item(remote_ref_item, &remote_refs_list) {
>  		const char *refname = remote_ref_item->string;
> -		struct hashmap_entry key;
>  
> -		hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> -		item = hashmap_get(&remote_refs, &key, refname);
> +		item = hashmap_get_from_hash(&remote_refs, strhash(refname), refname);
>  		if (!item)
> -			continue; /* can this happen??? */
> +			BUG("unseen remote ref?");
>  
>  		/* Unless we have already decided to ignore this item... */
>  		if (!is_null_oid(&item->oid)) {
> @@ -497,12 +491,12 @@ static struct ref *get_ref_map(struct remote *remote,
>  
>  	for (rm = ref_map; rm; rm = rm->next) {
>  		if (rm->peer_ref) {
> -			struct hashmap_entry key;
>  			const char *refname = rm->peer_ref->name;
>  			struct refname_hash_entry *peer_item;
>  
> -			hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> -			peer_item = hashmap_get(&existing_refs, &key, refname);
> +			peer_item = hashmap_get_from_hash(&existing_refs,
> +							  strhash(refname),
> +							  refname);
>  			if (peer_item) {
>  				struct object_id *old_oid = &peer_item->oid;
>  				oidcpy(&rm->peer_ref->old_oid, old_oid);
> 
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic