List:       gcc
Subject:    Re: Debuglocus
From:       Alexandre Oliva <aoliva@redhat.com>
Date:       2009-03-31 0:41:41
Message-ID: or4oxaed6i.fsf@oliva.athome.lsd.ic.unicamp.br

On Mar 21, 2009, Andrew MacLeod <amacleod@redhat.com> wrote:

> I was planning to send this note next week, but I guess I should know
> better than to put something on the wiki and think it won't be
> noticed, even if it is in a crusty dark corner :-)

:-)

> I see on #gcc that my debuglocus document has been discovered. It
> describes the work Aldy and I are experimenting with. It can be found
> on
> http://gcc.gnu.org/wiki/AndrewMacLeod

> I've been trying to find alternative implementations or approaches to
> Alex's VTA work to see if its possible to avoid introducing a whole
> new class of instructions into the IL.

First of all, I'd like to thank you for putting so much thought into
this project.  Regardless of whether this approach proves to be workable
(you know I have several concerns about it), a number of very
interesting ideas came out of it, that can certainly find their way into
whatever improved debug info infrastructure GCC adopts in the end.  More
on my concerns below.

I also thank you for having paid close attention to a number of issues I
pointed out ever since you first got involved in this project.  It is
easy to see how some of them have found their way into the published
Debuglocus proposal, and even into the earlier drafts you shared with me
long ago.  I can see that our interactions have helped improve and
clarify some passages I had misunderstood in my first readings.


As you're well aware, my primary concern with this staged approach is
that the most difficult problems, for which we haven't thought through
solutions even to their known issues, are relegated to later stages.
AFAICT there's a significant risk that we might end up with a partial
implementation that doesn't quite solve the problem: after putting some
effort into the first stages, we won't want to give them up, even if we
find unworkable problems in the later stages.

If, for this reason, we settle for an incomplete solution, we might end
up with something equivalent to the approach taken in the var-mappings
branch, which you called stage 1: both are known not to be enough to
solve the problem.  Since there are open and hard problems beginning in
stage 2, I'd rather see a workable roadmap early than fear a dead end
down the road.

That's why, in my design, I faced the full complexity upfront, and I had
designed solutions for every part of the problem before I started
implementing it: when I did start, I knew I had something that would
work, that would fully solve the problem it was meant to solve.

Now, I have a pretty good idea of the complexity we're facing here.  I
can tag along with some of the hand-waving, as you put it, like I've
done since early on.  However, since I haven't noticed any progress
even in planning to solve the hard issues I brought up, issues that
will have to be faced down the road if the problem is to be solved at
all, I'll point them out below.  Hopefully someone will be able to come
up with a workable design for them (both you and I have failed to fill
in these gaps so far), and give us all confidence that this is more than
a "wouldn't it be so nice if it actually stood a chance of working?".

On to the concerns...


=== Insns with multiple outputs

There's no provision whatsoever in the design for parallels with
multiple sets or other implicit modifications.

Parallels with more than one set are not very common in regular insns,
the exception probably being divmod insns.  However, parallels are the
way asm statements with multiple outputs are represented, and these are
quite prevalent in kernel code (systemtap, anyone?).

On top of that, there's another class of modifications that are not
represented with sets: pre/post-auto-increment/decrement/modification.
These are part of addressing modes, they make a *lot* of difference on
machines that support such addressing modes, and they are often applied
to user-visible pointers.  Nevertheless, AFAICT they are completely
left out of the design you have proposed.

And then, there's vectorization, which could group together separate
user-visible variables that undergo similar operations, so that they
can be performed in a single instruction, with a single set rather
than an explicit parallel RTL construct.

Maybe you do have plans to address these issues, but given that your
notation has no means of choosing one out of the many outputs of an
insn (and, inductively, of other insns, even after introducing temps
in stage 5), I don't see how this could be pulled off.


=== Unrelated concerns in stage2

I can't tell whether stage2 is an effort to reduce the amount of debug
information emitted (and maintained throughout compilation?), or to
avoid the jumping back-and-forth while single-stepping optimized code.

Regardless, it appears to be conflating debug-info concepts that hold no
relationship whatsoever to each other, while at the same time inventing
another mechanism to implement something that's already standardized in
DWARF.

Source line number and variable location information are two unrelated
mappings in debug info.  The former maps ranges of PCs to source file
and line numbers; the latter maps variable and PC range to run-time
loci; and both are, in a sense, unidirectional mappings.  Say, from a
PC you can locate
the corresponding line number (there may be more than one, if
zero-length ranges are used), and you can tell exactly where each
variable is (there may be multiple locations), but there may be multiple
non-contiguous ranges that map to the same line number and locations.
There may even be overlapping regions in location lists.  There's no
correspondence between line numbers and variable loci, and I don't quite
see the point of trying to get them to interfere with each other.

I don't see how a project to choose "special" line number changes fits
into a project to generate accurate variable location information.  You
don't even need any line numbers whatsoever for variable locations to
be useful.

Furthermore, line number changes in DWARF have a boolean flag named
is_stmt, and single-stepping and breakpoints are supposed to only pay
attention to line number changes that carry that flag set.

Now, having certain instructions marked as beginnings of statements
doesn't mean you don't want line number information for the
instructions that aren't.  Say, if you're single-stepping instructions,
rather than source lines, it helps to be able to relate instructions to
their source lines, and to inspect variables, very much like you want
precise saved-register information at call sites in order to inspect
state in the callers, even if the instruction after the call is not the
beginning of a statement.

Besides, if you're using hardware watchpoints to monitor some variable
or memory location, be it in the debugger or in some automated monitor
that is part of a larger system, you'd still want accurate values for
variables at the point the debugger stopped, rather than having to wait
for the next "relevant" instruction for the view to catch up.

Conditional watchpoints that might be referencing outdated variables,
along with the inability to tell whether the view is up-to-date at the
instruction at which the program stopped, make me think this effort is
misdirected.

IMHO, binding points should model, as closely as possible, the
variable-changing "side effects" specified in the language's virtual
machine, i.e., the points where users would expect variables to be bound
to values in an unoptimized version of the program.


=== Binding point ordering

I'm not convinced that numbering is a sufficient, or even workable,
approach to reconstructing variable value changes as they'd have been
observed in an unoptimized program.

It's not just because, in the presence of temps and multiple annotations
in the same insn, you can't quite tell whether an annotation is to take
effect before or after that insn.  There are far more interesting issues
related with loops.  Consider:

  for (a = 0; ; x = n) {
    b = before(x, a);
    n = next(x);
    if (stop(n, x, b, a))
      break;
    a = after(x, b);
  }

before, next, stop and after are suitably-defined inlinable functions.

If my reading is correct, after some machinations in the compiler we'd
end up with something like this:

<bb 2>:
  a_1 = 0; # debug a ORDER 1
<bb 3>:
  x_2 = PHI <x_0(D) (2), x_8 (4)>;
  a_3 = PHI <a_1 (2), a_7 (4)>;

  b_4 = before(x_2, a_3); # debug b ORDER 2
  n_5 = next(x_2); # debug n ORDER 3
  T_6 = stop(n_5, x_2, b_4, a_3);
  if (T_6 != 0) goto <bb 5>; else goto <bb 4>;
<bb 4>:
  a_7 = after(x_2, b_4); # debug a ORDER 4
  x_8 = n_5; # debug x ORDER 5
  goto <bb 3>;
<bb 5>:
  
So far so good.  Now let's rock the boat a bit and optimize it:

<bb 2>:

<bb 3>:
  x_2 = PHI <x_0(D) (2), n_5 (4)>;
  a_3 = PHI <0 (2), a_7 (4)>;

  b_4 = before(x_2, a_3); # debug b ORDER 2, a = 0 ORDER 1
  n_5 = next(x_2); # debug n ORDER 3, x ORDER 5
  T_6 = stop(n_5, x_2, b_4, a_3);
  if (T_6 != 0) goto <bb 5>; else goto <bb 4>;
<bb 4>:
  a_7 = after(x_2, b_4); # debug a ORDER 4

  goto <bb 3>;
<bb 5>:

See, we substituted the equivalent values for a_1 and x_8.  But now it
looks like the assignment to a (ORDER 1) is within the loop.  Where else
could we place the annotation, and how could we tell whether or not it's
in the loop?

Given this ambiguity, one might as well conclude that the assignment to
x might have been *after* the loop, rather than inside it.  How could
one tell?

Well, at least there still appears to be some sequential ordering that
makes sense.  But what if we were to apply a common loop optimization:
duplicating the "header", to turn the branch in bb4 into a fall-through
edge, so that the entry test is not part of the loop, and the exit test
is at the end of the body?  In this case, this would involve unrolling
part of the body, just because the exit test is not in the condition of
the for statement.  It's not unusual: if we had an inlined function call
in the loop condition, we'd observe a very similar issue.  This one just
makes the problem more visible without adding too much complexity.
Here's what we'd get:

  b_4 = before(x_0(D), 0); # debug b ORDER 2, a = 0 ORDER 1
  n_5 = next(x_0(D)); # debug n ORDER 3, x ORDER 5
  T_6 = stop(n_5, x_0(D), b_4, 0);
  if (T_6 != 0) goto <bb 5>; else goto <bb 4>;

<bb 4>:
  x_2 = PHI <x_0(D) (2), n_12 (4)>;
  b_9 = PHI <b_4 (2), b_10 (4)>;
  n_11 = PHI <n_5 (2), n_12 (4)>;
  
  a_7 = after(x_2, b_9); # debug a ORDER 4

---
  b_10 = before(n_11, a_7); # debug b ORDER 2, a = 0 ORDER 1
  n_12 = next(n_11); # debug n ORDER 3, x ORDER 5
  T_13 = stop(n_12, n_11, b_10, a_7);
  if (T_13 != 0) goto <bb 5>; else goto <bb 4>;

<bb 5>:

Now, some questions for which we'd need answers in order to be able to
output proper debug information for this loop:

- There are now two statements each with ORDER 1, 2, 3 and 5, but only
one with ORDER 4, and that one is inside the loop.  How do you tell the
duplicates apart and arrange them so as to be able to emit proper debug
info?

- Even if you could somehow tell apart the duplicates and focus only on
those that are within the loop, how would you figure out that 4 and 5
are to come *before* 2 and 3 there, in spite of their numbers, i.e.,
that the assignment to x ORDER 5 is to take the value of n_12 only in
the next iteration, rather than in the same iteration, before we even
call next(n_11)?

- If the loop body were to be actually unrolled, then the 'x ORDER 5'
would actually refer to x in another copy of the loop body.  How would
you deal with that?

- If before, after, next and stop were inlined, the SSA names for their
incoming arguments within the loop would have all been eliminated in
favor of the variables in the caller.  You'd also get inter-iteration
notes for variables initialized to x, just like you get them for x.  How
do you ensure such variables get the correct values?


=== Reordering debug temps

Consider this piece of code:

  dbg = (a + b) + (c + d);
  foo (a + b, c + d);
#if DEBUG
  dump (dbg);
#endif

compiled with -UDEBUG and optimized, this becomes:

  T.1 = a + b; # debug tmp1 ORDER 2
  T.2 = c + d; # debug tmp2 ORDER 3, dbg = tmp1 + tmp2 ORDER 1
  (void) foo (T.1, T.2);

But what if the assignments to T.1 and T.2 get reordered by the
scheduler?

(set (reg T2) (plus (reg C) (mem (...) [D]))) # debug tmp2 ORDER 3,
                                         dbg = tmp1 + tmp2 ORDER 1
(set (reg T1) (plus (reg A) (reg B))) # debug tmp1 ORDER 2

What does the note that sets dbg mean?  How do you tell it's the tmp1
that's only going to be bound/computed in the next insn, rather than
some previously-computed tmp1 (say, one in a previous loop iteration)?
Would you have to look at the notes when moving insns about to ensure
the temp notes don't become nonsensical or change meaning?


=== Maintenance burden

Using a hook on unlinking to decide whether to move elsewhere the notes
attached to an insn, a tuple or an expression is not as simple as it
might sound.  Or, rather, it is that simple; the problem is that it is
*too* simple: it's just not enough.

We might want to start with tree folding, which may simplify
expressions carrying debug loci into constants, or into other
expressions that have their own loci.  Where would we then tack on the
notes that are about to be thrown away, without so much as an explicit
unlinking operation?

Furthermore, unlinking appears in various different kinds of operations:

- moving insns without re-emitting: unlink, re-insert

- moving insns with re-emitting: emit pattern, unlink original

- removing insns: unlink

- duplicating insns into multiple blocks: emit, emit, unlink

All of these cases are present in GCC, but if you just hook onto unlink,
you might end up creating temps and attaching notes elsewhere, assuming
insns are going away, when not necessary.  And, in the case of moving
insns about and using temps, you actually need to know where you're
moving the pattern to, to be able to compute temps properly.

At this point I begin to doubt the wisdom of going through so much
trouble to avoid some small changes in the optimizers.  It seems to me
that this is not avoiding them at all; it actually requires further
care in the optimizers, rather than the mostly-brainless changes that
VTA has required so far.


=== More?

Depending on the proposed solutions for the problems above, I foresee
further complications, but I'll detail them only once the need arises.
This is long enough as is.


Regards,

-- 
Alexandre Oliva           http://www.lsd.ic.unicamp.br/~oliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
