[prev in list] [next in list] [prev in thread] [next in thread]
List: git
Subject: Re: Re*: [PATCH v3] fetch: replace string-list used as a look-up table with a hashmap
From: Johannes Schindelin <Johannes.Schindelin () gmx ! de>
Date: 2018-10-31 14:50:00
Message-ID: nycvar.QRO.7.76.6.1810311542560.4546 () tvgsbejvaqbjf ! bet
[Download RAW message or body]
Hi Junio,
On Sat, 27 Oct 2018, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Just one thing^W^Wa couple of things:
> >
> > It would probably make more sense to `hashmap_get_from_hash()` and
> > `strhash()` here (and `strhash()` should probably be used everywhere
> > instead of `memhash(str, strlen(str))`).
>
> hashmap_get_from_hash() certainly is much better suited for simpler
> usage pattern like these callsites, and the ones in sequencer.c. It
> is a shame that a more complex variant takes the shorter-and-sweeter
> name hashmap_get().
I agree, at least in part.
From what I understand, hashmap_get_from_hash() needs a little assistance
from the comparison function with which the hashmap is configured, see
e.g. this function in the sequencer:
static int labels_cmp(const void *fndata, const struct labels_entry *a,
const struct labels_entry *b, const void *key)
{
return key ? strcmp(a->label, key) : strcmp(a->label, b->label);
}
See how that first tests whether `key` is non-`NULL`, and then takes a
shortcut, not even looking at `b`? This is important, because `b` does not
refer to a complete `labels_entry` when we call `hashmap_get_from_hash()`.
It only refers to a `hashmap_entry`. Looking at `b->label` would access
some random memory, and do most certainly the wrong thing.
> I wish we named the latter hashmap_get_fullblown_feature_rich() and
> called the _from_hash() thing a simple hashmap_get() from day one,
> but it is way too late.
>
> I looked briefly the users of the _get() variant, and some of their
> uses are legitimately not-simple and cannot be reduced to use the
> simpler _get_from_hash variant, it seems. But others like those in
> builtin/difftool.c should be straight-forward to convert to use the
> simpler get_from_hash variant. It could be a low-hanging fruit left
> for later clean-up, perhaps.
Right. #leftoverbits
> >> @@ -271,10 +319,10 @@ static void find_non_local_tags(const struct ref *refs,
> >> !has_object_file_with_flags(&ref->old_oid,
> >> OBJECT_INFO_QUICK) &&
> >> !will_fetch(head, ref->old_oid.hash) &&
> >> - !has_sha1_file_with_flags(item->util,
> >> + !has_sha1_file_with_flags(item->oid.hash,
> >
> > I am not sure that we need to test for null OIDs here, given that...
> > ...
> > Of course, `has_sha1_file_with_flags()` is supposed to return `false` for
> > null OIDs, I guess.
>
> Yup. An alternative is to make item->oid a pointer to oid, not an
> oid object itself, so that we can express "no OID for this ref" in a
> more explicit way, but is_null_oid() is already used as "no OID" in
> many other codepaths, so...
Right, and it would complicate the code. So I am fine with your version of
it.
> >> + for_each_string_list_item(remote_ref_item, &remote_refs_list) {
> >> + const char *refname = remote_ref_item->string;
> >> + struct hashmap_entry key;
> >> +
> >> + hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> >> + item = hashmap_get(&remote_refs, &key, refname);
> >> + if (!item)
> >> + continue; /* can this happen??? */
> >
> > This would indicate a BUG, no?
>
> Possibly. Alternatively, we can just use item without checking and
> let the runtime segfault.
Hahaha! Yep. We could also cause a crash. I do prefer the BUG() call.
> Here is an incremental on top that can be squashed in to turn v3
> into v4.
Nice.
Thanks!
Dscho
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 0f8e333022..aee1d9bf21 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -259,7 +259,7 @@ static struct refname_hash_entry *refname_hash_add(struct hashmap *map,
> size_t len = strlen(refname);
>
> FLEX_ALLOC_MEM(ent, refname, refname, len);
> - hashmap_entry_init(ent, memhash(refname, len));
> + hashmap_entry_init(ent, strhash(refname));
> oidcpy(&ent->oid, oid);
> hashmap_add(map, ent);
> return ent;
> @@ -282,11 +282,7 @@ static void refname_hash_init(struct hashmap *map)
>
> static int refname_hash_exists(struct hashmap *map, const char *refname)
> {
> - struct hashmap_entry key;
> - size_t len = strlen(refname);
> - hashmap_entry_init(&key, memhash(refname, len));
> -
> - return !!hashmap_get(map, &key, refname);
> + return !!hashmap_get_from_hash(map, strhash(refname), refname);
> }
>
> static void find_non_local_tags(const struct ref *refs,
> @@ -365,12 +361,10 @@ static void find_non_local_tags(const struct ref *refs,
> */
> for_each_string_list_item(remote_ref_item, &remote_refs_list) {
> const char *refname = remote_ref_item->string;
> - struct hashmap_entry key;
>
> - hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> - item = hashmap_get(&remote_refs, &key, refname);
> + item = hashmap_get_from_hash(&remote_refs, strhash(refname), refname);
> if (!item)
> - continue; /* can this happen??? */
> + BUG("unseen remote ref?");
>
> /* Unless we have already decided to ignore this item... */
> if (!is_null_oid(&item->oid)) {
> @@ -497,12 +491,12 @@ static struct ref *get_ref_map(struct remote *remote,
>
> for (rm = ref_map; rm; rm = rm->next) {
> if (rm->peer_ref) {
> - struct hashmap_entry key;
> const char *refname = rm->peer_ref->name;
> struct refname_hash_entry *peer_item;
>
> - hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> - peer_item = hashmap_get(&existing_refs, &key, refname);
> + peer_item = hashmap_get_from_hash(&existing_refs,
> + strhash(refname),
> + refname);
> if (peer_item) {
> struct object_id *old_oid = &peer_item->oid;
> oidcpy(&rm->peer_ref->old_oid, old_oid);
>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic