[prev in list] [next in list] [prev in thread] [next in thread] 

List:       gcc
Subject:    Re: Help with rich_location and GIMPLE stmts
From:       Martin_Liška <mliska () suse ! cz>
Date:       2017-05-19 14:30:14
Message-ID: 4241660d-d4f0-f6dd-e931-409712591c53 () suse ! cz
[Download RAW message or body]

On 05/19/2017 02:14 PM, Marek Polacek wrote:
> On Thu, May 18, 2017 at 01:22:02PM +0200, Martin Liška wrote:
> > On 05/16/2017 09:14 PM, David Malcolm wrote:
> > > On Mon, 2017-05-15 at 15:36 +0200, Martin Liška wrote:
> > > > Hi.
> > > > 
> > > > I sent this email to David some time ago, but it should be probably
> > > > answered
> > > > on gcc mailing list.
> > > 
> > > > I have idea one to improve gcov tool and I'm interested in more
> > > > precise locations for gimple
> > > > statements. For gcov purpose, we dump location in ipa-profile pass,
> > > > which is an early IPA
> > > > pass and this data is used by gcov tool to map statements (blocks) to
> > > > lines of code.
> > > > 
> > > > I did a small experiment on the place we emit the location data:
> > > > 		  inform (gimple_location (stmt), "output_location");
> > > > 
> > > > and it shows for:
> > > > $ cat m2.c
> > > > unsigned int
> > > > UuT (void)
> > > > { unsigned int true_var = 1; unsigned int false_var = 0; unsigned int
> > > > ret = 0; if (true_var) /* count(1) */ { if (false_var) /* count(1) */
> > > > ret = 111; /* count(#####) */ } else ret = 999; /* count(#####) */
> > > > return ret; }
> > > > 
> > > > int
> > > > main (int argc, char **argv)
> > > > {
> > > > UuT ();
> > > > return 0;
> > > > }
> > > > 
> > > > $ gcc --coverage m2.c
> > > > m2.c: In function ‘main':
> > > > m2.c:8:3: note: output_location
> > > > UuT ();
> > > > ^~~~~~
> > > > # .MEM_2 = VDEF <.MEM_1(D)>
> > > > UuT ();
> > > > m2.c:9:10: note: output_location
> > > > return 0;
> > > > ^
> > > > _3 = 0;
> > > > m2.c: In function ‘UuT':
> > > > m2.c:3:16: note: output_location
> > > > { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> > > > int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> > > > count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> > > > count(#####) */ return ret; }
> > > > ^~~~~~~~
> > > > true_var_3 = 1;
> > > > m2.c:3:43: note: output_location
> > > > { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> > > > int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> > > > count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> > > > count(#####) */ return ret; }
> > > > ^~~~~~~~~
> > > > false_var_4 = 0;
> > > > m2.c:3:71: note: output_location
> > > > { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> > > > int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> > > > count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> > > > count(#####) */ return ret; }
> > > > 
> > > > ^~~
> > > > ret_5 = 0;
> > > > m2.c:3:83: note: output_location
> > > > { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> > > > int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> > > > count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> > > > count(#####) */ return ret; }
> > > > 
> > > > ^
> > > > if (true_var_3 != 0)
> > > > m2.c:3:114: note: output_location
> > > > { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> > > > int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> > > > count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> > > > count(#####) */ return ret; }
> > > > 
> > > > ^
> > > > if (false_var_4 != 0)
> > > > m2.c:3:145: note: output_location
> > > > { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> > > > int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> > > > count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> > > > count(#####) */ return ret; }
> > > > 
> > > > 
> > > > ~~~~^~~~~
> > > > ret_7 = 111;
> > > > m2.c:3:182: note: output_location
> > > > { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> > > > int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> > > > count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> > > > count(#####) */ return ret; }
> > > > 
> > > > 
> > > > ~~~~^~~~~
> > > > ret_6 = 999;
> > > > m2.c:3:215: note: output_location
> > > > { unsigned int true_var = 1; unsigned int false_var = 0; unsigned
> > > > int ret = 0; if (true_var) /* count(1) */ { if (false_var) /*
> > > > count(1) */ ret = 111; /* count(#####) */ } else ret = 999; /*
> > > > count(#####) */ return ret; }
> > > > 
> > > > 
> > > > 
> > > > ^~~
> > > > _8 = ret_2;
> > > > m2.c:3:215: note: output_location
> > > > # VUSE <.MEM_9(D)>
> > > > return _8;
> > > > 
> > > > Which is not optimal, for some assignments I see just LHS 
> > > > (false_var_4 = 0),
> 
> Note that
> 
> unsigned int false_var = 0;
> 
> is not an assignment-expression, it's an initialization.  Only the
> '0' here is parsed as an assignment-expression, but in this case
> set_c_expr_source_range isn't called.

Hello.

Yes, I noticed that it's not called.

> 
> > > 
> > > My first though was: are there assignments for which this isn't the
> > > case?  The only one I see is the:
> > > ret = 999;
> > > ~~~~^~~~~
> > > 
> > > Are the locations for these assignments coming through from the
> > > frontend?
> > 
> > Hi.
> > 
> > Actually not all, the default assignments are created in gimplifier and 
> > location is assigned from DECL_EXPR:
> > 
> > (gdb) p debug_tree(*expr_p)
> > <decl_expr 0x7ffff6988c80
> > type <void_type 0x7ffff6878f18 void VOID
> > align 8 symtab 0 alias set -1 canonical type 0x7ffff6878f18
> > pointer_to_this <pointer_type 0x7ffff68800a8>>
> > side-effects
> > arg 0 <var_decl 0x7ffff7f9ae10 true_var
> > type <integer_type 0x7ffff6878690 unsigned int public unsigned SI
> > size <integer_cst 0x7ffff6860f18 constant 32>
> > unit size <integer_cst 0x7ffff6860f30 constant 4>
> > align 32 symtab 0 alias set -1 canonical type 0x7ffff6878690 precision 32 min \
> > <integer_cst 0x7ffff6860f48 0> max <integer_cst 0x7ffff6860f00 4294967295> \
> > pointer_to_this <pointer_type 0x7ffff6885dc8>> used unsigned SI file /tmp/m2.c \
> > line 4 col 16 size <integer_cst 0x7ffff6860f18 32> unit size <integer_cst \
> > 0x7ffff6860f30 4> align 32 context <function_decl 0x7ffff697ce00 UuT> initial \
> > <integer_cst 0x7ffff698b258 1> chain <var_decl 0x7ffff7f9aea0 false_var type \
> > <integer_type 0x7ffff6878690 unsigned int> used unsigned SI file /tmp/m2.c line 4 \
> > col 43 size <integer_cst 0x7ffff6860f18 32> unit size <integer_cst 0x7ffff6860f30 \
> > 4> align 32 context <function_decl 0x7ffff697ce00 UuT> initial <integer_cst \
> >                 0x7ffff6860f48 0> chain <var_decl 0x7ffff7f9af30 ret>>>
> > /tmp/m2.c:4:16 start: /tmp/m2.c:4:16 finish: /tmp/m2.c:4:23>
> > 
> > That explains why only LHS of these assignments is selected.
> > 
> > > 
> > > I believe that in the C frontend these are assignment-expression, and
> > > hence handled by c_parser_expr_no_commas; in particular the use of
> > > op_location and the call:
> > > 
> > > set_c_expr_source_range (&ret, lhs.get_start (), rhs.get_finish ());
> > > 
> > > ought to be setting up the caret of the assignment to be on the
> > > operator token, and for the start/finish to range from the start of the
> > > lhs to the end of the rhs i.e. what we see for:
> > > 
> > > ret = 999;
> > > ~~~~^~~~~
> > 
> > Yep, MODIFY_EXPRs created in FE go this way and it's fine.
> > 
> > > 
> > > 
> > > > for return statements only a returned value is displayed. 
> > > 
> > > Is this running on SSA form?  If so, I wonder if you're running into
> > > something like this:
> > > 
> > > retval_N = PHI <lots of values>;
> > > return retval_N;
> > > 
> > > where it's using the location of that "return retval_N;" for all of the
> > > return statements in the function, rather than the individual source
> > > locations.
> > 
> > Yep, but we properly assign each assignment to a SSA name that's going to
> > be merged in exit BB by PHI node:
> > 
> > _8 = ret_2;
> > /tmp/m2.c:7:8: note: output_location
> > return ret; }
> > ^~~
> > 
> > Here the location comes from c_finish_return function where location
> > comes from a value that's returned.
> > 
> > > 
> > > > For conditions, only condition beginning is showed.
> > > > Is this known behavior or do I miss
> > > > something?
> > > 
> > > c_parser_if_statement has:
> > > 
> > > loc = c_parser_peek_token (parser)->location;
> > > 
> > > which is that of the open-paren.  Maybe we should build a location
> > > covering the range of the "if ( expression )" part of the if-statement?
> > 
> > Adding Marek as C FE maintainer to reply the question.
> 
> I suppose we could do better and I'd probably highlight just the expression
> part of "if ( expression )".  But not sure how many use cases this range
> location would have.

Works for me. I guess it can take some time to improve locations of GIMPLE \
expressions. Anyhow, I can start enhancing gcov tool even with current locations. \
Having that, we can provide users following kind of information:

    1|int b, c, d, e;
      
    2|
      
    3|int main()
      
    4|{
      ^1
    5|   int a = b < 1 ? (c < 3 ? d : c) : e;
                         ^1       ^1  ^0   ^0
    6|   return a;
      
    7|}

Where '^0' means the block (statements) are not executed.

Martin

> 
> 	Marek
> 


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic