'[PATCH] Dwarf 5: Handle debug_str_offsets and indexed attributes that have base offsets.'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       gdb-patches
Subject:    [PATCH] Dwarf 5: Handle debug_str_offsets and indexed attributes that have base offsets.
From:       simon.marchi () polymtl ! ca (Simon Marchi)
Date:       2019-12-27 23:20:00
Message-ID: 9e86baf4-ffbc-916a-6f29-c2027f39898e () polymtl ! ca
[Download RAW message or body]

On 2019-12-26 7:41 p.m., Ali Tamur wrote:
>> Ok, so that reprocessing happens right after having read the attributes from the DIE.
>>
>> Another approach that I would have expected would be to do it more lazily.  Only keep
>> the index in the attribute, and go read the actual string when we call dwarf2_string_attr
>> on it (and then maybe cache the value).  Did you consider doing something like that, and
>> perhaps it was a too involving change?
> 
> I want the patch to be as little invasive as possible, so I prefer not changing laziness,
> if that's ok with you.

Yeah, that's fine, it is simple like that.

> @@ -532,6 +531,12 @@ public:
>       all such types here and process them after expansion.  */
>    std::vector<struct type *> rust_unions;
>  
> +  /* The DW_AT_str_offsets_base attribute if present.  For DWARF 4 version DWO
> +     files, the value is implicitly zero. For DWARF 5 version DWO files, the

Two spaces after the period.

> @@ -1272,6 +1277,18 @@ struct attribute
>         here for better struct attribute alignment.  */
>      unsigned int string_is_canonical : 1;
>  
> +    /* For strings in non-DWO files, was a string recorded with
> +       DW_AT_GNU_str_index before we knew the value of DW_AT_str_offsets_base?
> +       If non-zero, then the "value" of the string is in u.unsnd, and
> +       read_dwo_str_index must be called to obtain the actual string.
> +       This is necessary for DW_AT_comp_dir and DW_AT_GNU_dwo_name (or
> +       DW_AT_dwo_name) attributes: To save a relocation DW_AT_GNU_str_index is
> +       used, but because all .o file .debug_str_offsets sections are
> +       concatenated together in the executable each .o has its own offset into
> +       this section. So if DW_AT_str_offsets_base comes later in the DIE, we

Two spaces after the period.

> +       need to reprocess these attributes later.  */
> +    bool string_is_str_index;

Thanks for this comment.  If it avoids increasing the structure size, we could eat
another bit from the `form` field.

> @@ -7182,7 +7219,47 @@ lookup_signatured_type (struct dwarf2_cu *cu, ULONGEST sig)
>        return entry;
>      }
>  }
> -

> +
> +/* Return the address base of the compile unit, which, if exists, is stored
> +   either at the attribute DW_AT_GNU_addr_base, or DW_AT_addr_base.  Also takes
> +   as parameter whether it should follow DW_AT_specification.  See the comments
> +   on dwarf2_attr_no_follow and dwarf2_attr for the difference.  */
> +static gdb::optional<ULONGEST>
> +lookup_addr_base (struct dwarf2_cu *cu, struct die_info* comp_unit_die,

Space before the asterisk:

  struct die_info *comp_unit_die

> +		  bool will_follow)
> +{
> +  struct attribute *attr;
> +  if (will_follow)
> +    {
> +      attr = dwarf2_attr (comp_unit_die, DW_AT_addr_base, cu);
> +      if (attr == nullptr)
> +        attr = dwarf2_attr (comp_unit_die, DW_AT_GNU_addr_base, cu);
> +    }
> +  else
> +    {
> +      attr = dwarf2_attr_no_follow (comp_unit_die, DW_AT_addr_base);
> +      if (attr == nullptr)
> +        attr = dwarf2_attr_no_follow (comp_unit_die, DW_AT_GNU_addr_base);
> +    }
> +  if (attr == nullptr)
> +    return gdb::optional<ULONGEST> ();
> +  return DW_UNSND (attr);
> +}
> +
> +/* Return range lists base of the compile unit, which, if exists, is stored
> +   either at the attribute DW_AT_rnglists_base or DW_AT_GNU_ranges_base.  */
> +ULONGEST
> +lookup_ranges_base (struct dwarf2_cu *cu, struct die_info* comp_unit_die)

Space before the asterisk.

> @@ -7274,20 +7350,14 @@ read_cutu_die_from_dwo (struct dwarf2_per_cu_data *this_cu,
>        ranges = dwarf2_attr (stub_comp_unit_die, DW_AT_ranges, cu);
>        comp_dir = dwarf2_attr (stub_comp_unit_die, DW_AT_comp_dir, cu);
>  
> -      /* There should be a DW_AT_addr_base attribute here (if needed).
> -	 We need the value before we can process DW_FORM_GNU_addr_index
> -         or DW_FORM_addrx.  */
> -      cu->addr_base = 0;
> -      attr = dwarf2_attr (stub_comp_unit_die, DW_AT_GNU_addr_base, cu);
> -      if (attr != nullptr)
> -	cu->addr_base = DW_UNSND (attr);
> +      auto maybe_addr_base = lookup_addr_base (cu, stub_comp_unit_die, true);

Can you explain why we need to use the "follow" version here (passing true as the third
argument)?

I'm generating executables with split DWARF using both clang 9.0.0 and gcc 9.2.0,
using both -gdwarf-4 and -gdwarf-5 and in all cases it's not needed: the DW_AT_addr_base
or DW_AT_GNU_addr_base attribute is directly in the stub comp unit die, which we
are passing here.  So if it's really not needed here, then I wonder when this third
parameter is useful.

> @@ -7396,8 +7467,7 @@ read_cutu_die_from_dwo (struct dwarf2_per_cu_data *this_cu,
>       TUs by skipping the stub and going directly to the entry in the DWO file.
>       However, skipping the stub means we won't get DW_AT_comp_dir, so we have
>       to get it via circuitous means.  Blech.  */
> -  if (comp_dir != NULL)
> -    result_reader->comp_dir = DW_STRING (comp_dir);
> +  result_reader->comp_dir = get_comp_dir_attr (comp_unit_die, cu);

If you do get rid of str_is_string_index, I think this change won't be necessary anymore,
since it will essentially just end up using DW_STRING.

>  
>    /* Skip dummy compilation units.  */
>    if (info_ptr >= begin_info_ptr + dwo_unit->length
> @@ -7695,6 +7765,13 @@ init_cutu_and_read_dies (struct dwarf2_per_cu_data *this_cu,
>    init_cu_die_reader (&reader, cu, section, NULL, abbrev_table);
>    info_ptr = read_full_die (&reader, &comp_unit_die, info_ptr, &has_children);
>  
> +  /* DWO_AT_GNU_dwo_name and DW_AT_comp_dir in the stub may be using
> +     DW_FORM_GNU_str_index which needs DW_AT_str_offsets_base.  */
> +  struct attribute *attr = dwarf2_attr (comp_unit_die,
> +					DW_AT_str_offsets_base, cu);
> +  if (attr != nullptr)
> +    cu->str_offsets_base = DW_UNSND (attr);

>From what I understand, cu->str_offsets_base will have already been set by
read_full_die so it would be unnecessary to do so here.

> @@ -18481,10 +18570,26 @@ read_full_die_1 (const struct die_reader_specs *reader,
>       attributes.  */
>    die->num_attrs = abbrev->num_attrs;
>  
> +  std::vector<int> indexes_that_need_reprocess;
>    for (i = 0; i < abbrev->num_attrs; ++i)
> -    info_ptr = read_attribute (reader, &die->attrs[i], &abbrev->attrs[i],
> -			       info_ptr);
> +    {
> +      bool need_reprocess;
> +      info_ptr =
> +        read_attribute (reader, &die->attrs[i], &abbrev->attrs[i],
> +			info_ptr, &need_reprocess);
> +      if (need_reprocess)
> +        indexes_that_need_reprocess.push_back (i);
> +    }
> +
> +  struct attribute *attr = dwarf2_attr_no_follow (die, DW_AT_str_offsets_base);
> +  if (attr)

attr != nullptr

> +    cu->str_offsets_base = DW_UNSND (attr);
>  
> +  auto maybe_addr_base = lookup_addr_base(cu, die, false);
> +  if (maybe_addr_base.has_value())

Space before parentheses on the two lines above.

It would also be good to add a comment above this block of code, that explains
what we are doing (reading the attribute values that require the string and address
offset tables, after having read the base for both tables for this CU).

> +    cu->addr_base = *maybe_addr_base;
> +  for (int index : indexes_that_need_reprocess)
> +    read_attribute_reprocess (reader, &die->attrs[index]);
>    *diep = die;
>    *has_children = abbrev->has_children;
>    return info_ptr;
> @@ -18996,12 +19101,23 @@ partial_die_info::read (const struct die_reader_specs *reader,
>    int has_high_pc_attr = 0;
>    int high_pc_relative = 0;

Could the changes in this function be simpler, do we really need to do
two passes on the attributes?  I think that at this point, the string
offset/addr bases will be known and set in the CU, they will have been
set by read_full_die that was called on the top-level CU die.  So we
could just reprocess the attributes as we read them in the existing loop,
like this:

      bool needs_reprocess;

      info_ptr = read_attribute (reader, &attr, &abbrev.attrs[i], info_ptr,
                 &needs_reprocess);
      if (needs_reprocess)
        read_attribute_reprocess (reader, &attr);

> +  std::vector<struct attribute> attr_vec;
> +  std::vector<int> indexes_that_need_reprocess;
> +  attr_vec.resize(abbrev.num_attrs);

You could use the std::vector constructor that creates the vector with
the right size from the start:

  std::vector<struct attribute> attr_vec (abbrev.num_attrs);

>    for (i = 0; i < abbrev.num_attrs; ++i)
>      {
> -      struct attribute attr;
> -
> -      info_ptr = read_attribute (reader, &attr, &abbrev.attrs[i], info_ptr);
> +      bool need_reprocess;
> +      info_ptr = read_attribute (reader, &attr_vec[i], &abbrev.attrs[i],
> +				 info_ptr, &need_reprocess);
> +      if (need_reprocess)
> +        indexes_that_need_reprocess.push_back (i);
> +    }
> +  for (int index : indexes_that_need_reprocess)
> +    read_attribute_reprocess (reader, &attr_vec[index]);
>  
> +  for (i = 0; i < abbrev.num_attrs; ++i)

Since we now iterate on the vector, this loop can be

  for (const attribute &attr : attr_vec)

> +    {
> +      struct attribute &attr = attr_vec[i];
>        /* Store the data if it is of an attribute we want to keep in a
>           partial symbol table.  */
>        switch (attr.name)
> @@ -19443,12 +19559,49 @@ partial_die_info::fixup (struct dwarf2_cu *cu)
>    fixup_called = 1;
>  }
>  
> +/* Process the attributes that had to be skipped in the first round. These
> +   attributes are the ones that need str_offsets_base or addr_base attributes.
> +   They could not have been processed in the first round, because at the time
> +   the values of str_offsets_base or addr_base may not have been known.  */
> +void read_attribute_reprocess (const struct die_reader_specs *reader,
> +			       struct attribute *attr)
> +{
> +  struct dwarf2_cu *cu = reader->cu;
> +  switch (attr->form)
> +    {
> +      case DW_FORM_addrx:
> +      case DW_FORM_GNU_addr_index:
> +        DW_ADDR (attr) = read_addr_index (cu, DW_UNSND (attr));
> +        break;
> +      case DW_FORM_strx:
> +      case DW_FORM_strx1:
> +      case DW_FORM_strx2:
> +      case DW_FORM_strx3:
> +      case DW_FORM_strx4:
> +      case DW_FORM_GNU_str_index:
> +        unsigned int str_index = DW_UNSND (attr);
> +	if (reader->dwo_file != NULL)
> +	  {
> +	    DW_STRING (attr) = read_dwo_str_index (reader, str_index);
> +	    DW_STRING_IS_CANONICAL (attr) = 0;
> +	    DW_STRING_IS_STR_INDEX (attr) = false;
> +	  }
> +	else if (cu->str_offsets_base.has_value ())
> +	  {
> +	    DW_STRING (attr) = read_stub_str_index (cu, str_index);
> +	    DW_STRING_IS_CANONICAL (attr) = 0;
> +	    DW_STRING_IS_STR_INDEX (attr) = false;
> +	  }

Since this function is only supposed to be called on forms that need reprocessing,
I'd suggest adding a default case with:

  gdb_assert_not_reached (_("Unexpected DWARF form.");

> @@ -20331,6 +20477,71 @@ read_str_index (const struct die_reader_specs *reader, ULONGEST str_index)
>    return (const char *) (str_section->buffer + str_offset);
>  }
>  
> +/* Given a DW_FORM_GNU_str_index from a DWO file, fetch the string.  */
> +
> +static const char *
> +read_dwo_str_index (const struct die_reader_specs *reader, ULONGEST str_index)
> +{
> +  ULONGEST str_offsets_base = reader->cu->header.version >= 5
> +			      ? reader->cu->header.addr_size : 0;
> +  return read_str_index (reader->cu,
> +			 &reader->dwo_file->sections.str,
> +			 &reader->dwo_file->sections.str_offsets,
> +			 str_offsets_base, str_index);
> +}
> +
> +/* Given a DW_FORM_GNU_str_index from a Fission stub, fetch the string.  */
> +
> +static const char *
> +read_stub_str_index (struct dwarf2_cu *cu, ULONGEST str_index)
> +{
> +  struct objfile *objfile = cu->per_cu->dwarf2_per_objfile->objfile;
> +  const char *objf_name = objfile_name (objfile);
> +  static const char form_name[] = "DW_FORM_GNU_str_index";
> +  static const char str_offsets_attr_name[] = "DW_AT_str_offsets";
> +
> +  if (!cu->str_offsets_base.has_value())

Space before parentheses.

> +    error (_("%s used in Fission stub without %s"
> +	     " in CU at offset 0x%lx [in module %s]"),
> +	   form_name, str_offsets_attr_name,
> +	   (long) cu->header.offset_size, objf_name);
> +
> +  return read_str_index (cu,
> +			 &cu->per_cu->dwarf2_per_objfile->str,
> +			 &cu->per_cu->dwarf2_per_objfile->str_offsets,
> +			 *cu->str_offsets_base, str_index);
> +}
> +
> +/* Wrapper around read_stub_str_index for attributes that *may* be using
> +   DW_AT_GNU_str_index from a non-DWO file.  */
> +
> +static const char *
> +get_stub_string_attr (struct dwarf2_cu *cu, const struct attribute *attr)
> +{
> +  if (DW_STRING_IS_STR_INDEX (attr))
> +    return read_stub_str_index (cu, DW_UNSND (attr));
> +  return DW_STRING (attr);
> +}

I was wondering about this: in which situation would the string not be read
already?  In other words, when would the "if" above be true?  It appears that
with this patch, str_is_string_index is never set to true.  Instead, everything
happens under read_full_die_1.  When read_full_die_1, all the strings that
needed reprocessing have been read in.  So it seems like you could actually
remove str_is_string_index and all that is related.

I think that's a good way to do it, actually, because all the complexity related
to this corner case is encapsulated in read_full_die_1 and a few related functions.
For all other functions, strings using the string offset table just work like
other strings.

Simon


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic