'Re: [cfe-dev] [analyzer] Why use `checkPostCall()` to model the function semantics in `StdLibraryFun'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       cfe-dev
Subject:    Re: [cfe-dev] [analyzer] Why use `checkPostCall()` to model the function semantics in `StdLibraryFun
From:       Artem Dergachev via cfe-dev <cfe-dev () lists ! llvm ! org>
Date:       2018-04-30 19:55:19
Message-ID: 1e767813-8bcb-8fd0-6992-cb559691ad21 () gmail ! com
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]

On 4/25/18 8:50 PM, Henry Wong wrote:
> Thank you for taking your time to give me this clarification and 
> detailed explanation, Artem! You always give more information than I 
> want 😊.
>
> The discussions about introducing a system of dependencies between 
> checkers should be in 
> http://lists.llvm.org/pipermail/cfe-dev/2017-December/056424.html. I 
> put this links here so others can see it.

+cfe-dev

>
> One more question. If it is possible move some logic from 
> 'checkPostCall()' to 'evalCall()' in 'StdLibraryFunctionsChecker'. For 
> 'isspace()', like below:
> -------------------------------------------------------------------------------
>               Current Implementation
>  evalCall() : 'Conjure a symbol' and 'bind it to CallExpr'
>  checkPostCall() : 'assume' and 'apply ranges'
>
>
>                Possible Implementation
> evalCall() : 'Conjure a symbol' and 'bind it to CallExpr'
> 'assume' and 'apply ranges'
> -------------------------------------------------------------------------------
>
> Is it reasonable?

You won't be able to move that logic because not all functions should be 
modeled via evalCall().

You may duplicate the logic in evalCall() so that state split happened 
earlier for functions that are modeled with evalCall(), but i'm still 
interested in seeing a motivation for such change. There's only one 
correct way to make a state split, and such state split is pretty much 
idempotent.

>
> Thanks again!
> Henry Wong
> Qihoo 360 Codesafe Team
> ------------------------------------------------------------------------
> *From:* Artem Dergachev <noqnoqneo@gmail.com>
> *Sent:* Thursday, April 26, 2018 3:09
> *To:* Henry Wong; cfe-dev@lists.llvm.org
> *Subject:* Re: [cfe-dev] [analyzer] Why use `checkPostCall()` to model 
> the function semantics in `StdLibraryFunctionsChecker.cpp` ?
> StdLibraryFunctionsChecker uses evalCall for most calls it models. It 
> uses checkPostCall only for few functions for which it models so 
> little about the function that it's unlikely to ever be a problem for 
> other checkers that may eventually want to model the function exactly. 
> The checker's original intent was to cut away infeasible paths in the 
> program, eg. preventing analysis of paths on which getline() is 
> assumed to return -2. It is indeed a problem that other checkers are 
> not able to reliably access this information immediately in their own 
> checkPostCall, but currently there are no checkers that are actively 
> relying on that. There are also discussions about introducing a system 
> of dependencies between checkers so that dependent checkers 
> automatically turned on checkers on which they depend and have their 
> callbacks fire in a specific order, which could probably be already 
> hacked up by writing weird registerChecker() functions that register 
> dependencies first.
>
> There are currently at least 4 different ways the analyzer can model a 
> function:
>
> 1. Conservative evaluation (normal analyzer behavior when body of the 
> function is unavailable).
> 2. Inlining (model the function precisely by jumping into it and 
> proceeding with normal analysis inside it).
> (1. and 2. will be collectively referred to as "default evaluation".)
> 3. Body farm (provide a simplified synthetic AST for the function body 
> and then inline it).
> 4. evalCall() in checker (let a checker manipulate the program state 
> manually to model arbitrary effects of the function).
>
> Additionally, any checker may influence the analysis at almost any 
> point, which should be used carefully. For instance, splitting the 
> path or cutting away a path that seems infeasible is fine (as long as 
> it is the desired behavior), replacing a value of an expression with a 
> different value is bad.
>
> When body farms were introduced, they seemed to be a great way of 
> modeling library functions, and they are fairly effective for the few 
> functions they were used for. But later a lot of functions turned out 
> to be problematic to model that way - either because their simplified 
> AST is too complicated to synthesize correctly (eg. std::call_once 
> turned out to be extremely painful because we had to write down AST 
> for template instantiations manually node-by-node without being able 
> to rely on the compiler to help us with that) or because a good 
> synthetic AST will not be understood by the analyzer anyway. 
> StdLibraryFunctionsChecker is modeling some functions that fall to the 
> latter category. You should be able to find further explanation of why 
> they are hard to body-farm in the checker's comments.
>
> The difference between evalCall and checkPostCall is that evalCall 
> overrides the default evaluation. If a checker does evalCall(), the 
> function will never be inlined or invalidate potentially accessible 
> memory. The checker will also need to come up with a good 
> representation of the return value and will have a chance to specify 
> it. If two different checkers try to evalCall() the same call, the 
> analyzer will defensively crash.
>
> StdLibraryFunctionsChecker uses evalCall for modeling calls that it 
> can model *exactly*.
>
> It also uses checkPostCall for stuff that it can't model exactly, but 
> for the lack of better modeling it can still model a few things that 
> are safe to model in post-call, in addition to the effects of default 
> evaluation. For example, it doesn't model the string (or even the 
> length of the string) produced by getline() but it does know that this 
> function never returns -2, so it cuts away the respective paths. If 
> getline() is inlined or a different checker models it in evalCall or 
> even in checkPostCall, StdLibraryFunctionsChecker will still work 
> correctly, because, well, whatever the other modeling does, it 
> shouldn't make getline() return -2. It might happen that another 
> checker substitutes the return value in PostCall leading to a race, 
> but that's the exact reason why substituting expression values after 
> the expression is evaluated is a bad idea anywhere in the analyzer.
>
> StdLibraryFunctionsChecker uses a custom system of function summaries 
> which is relatively extensible but not super flexible. It should 
> probably not used for modeling everything. In fact, i doubt it'd be 
> easy to extend it to reliably model anything but range constraints. 
> Side effects like "this function writes its 1st argument to memory 
> pointed to by its 2nd argument" are already pretty unpleasant to 
> summarize declaratively; add a couple of levels of pointer indirection 
> and it'd be a nightmare.
>
> So, adding more functions and side effect variants similar to what's 
> already there is welcome. I'm moderately curious about how far this 
> summary system can be pushed, but reliability comes first. Trying to 
> model every function this way is not a great idea.
>
> On 4/24/18 8:46 PM, Henry Wong via cfe-dev wrote:
>> Hi all,
>>
>> `StdLibraryFunctionsChecker.cpp` is a very useful and great tool to 
>> improve the modeling of library function. But I can't figure out why 
>> use `checkPostCall()` to model the function samantic.
>>
>> What puzzles me is the order of API calls. For example, if we want to 
>> make some checks on `getline()` in `CheckerA`, and use 
>> `checkPostCall()` to collect information or set `ProgramState`, the 
>> `checkPostCall()` of `CheckerA` is likely to be behind the 
>> `checkPostCall()` of `StdLibraryFunctionsChecker.cpp`. At this point, 
>> `CheckerA` does not use the model information of `getline()` in 
>> `StdLibraryFunctionsChecker.cpp`. So what is the original intention 
>> of using `checkPostCall()` to play the key role in modeling?
>>
>> And I want to know what plans community have for 
>> `StdLibraryFunctionsChecker.cpp` in the future, for example, extend 
>> it to handle more complex library functions?
>>
>> Thanks in advance!
>>
>> Henry Wong
>> Qihoo 360 Codesafe Team
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev@lists.llvm.org  <mailto:cfe-dev@lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>

[Attachment #5 (text/html)]

<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br>
    <br>
    <div class="moz-cite-prefix">On 4/25/18 8:50 PM, Henry Wong wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:SG2PR0401MB199984B4270226687D2B652FA48E0@SG2PR0401MB1999.apcprd04.prod.outlook.com">
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} \
</style>  <div style="font-family: Calibri, Helvetica, sans-serif,
        EmojiFont, &quot;Apple Color Emoji&quot;, &quot;Segoe UI
        Emoji&quot;, NotoColorEmoji, &quot;Segoe UI Symbol&quot;,
        &quot;Android Emoji&quot;, EmojiSymbols; font-size: 12pt; color:
        rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
        Thank you for taking your time to give me this clarification and
        detailed explanation, Artem! You always give more information
        than I want
        <span>😊</span>.</div>
      <div style="font-family: Calibri, Helvetica, sans-serif,
        EmojiFont, &quot;Apple Color Emoji&quot;, &quot;Segoe UI
        Emoji&quot;, NotoColorEmoji, &quot;Segoe UI Symbol&quot;,
        &quot;Android Emoji&quot;, EmojiSymbols; font-size: 12pt; color:
        rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
        <br>
      </div>
      <div style="font-family: Calibri, Helvetica, sans-serif,
        EmojiFont, &quot;Apple Color Emoji&quot;, &quot;Segoe UI
        Emoji&quot;, NotoColorEmoji, &quot;Segoe UI Symbol&quot;,
        &quot;Android Emoji&quot;, EmojiSymbols; font-size: 12pt; color:
        rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
        <span style="color: rgb(33, 33, 33); font-family: &quot;Segoe
          UI&quot;, &quot;Segoe UI Web (West European)&quot;,
          &quot;Segoe UI&quot;, -apple-system, BlinkMacSystemFont,
          Roboto, &quot;Helvetica Neue&quot;, sans-serif; font-size:
          15px; font-style: normal; font-variant-ligatures: normal;
          font-variant-caps: normal; font-weight: 400;">The
          discussions about introducing a system of dependencies between
          checkers should be in <a
            href="http://lists.llvm.org/pipermail/cfe-dev/2017-December/056424.html"
            id="LPlnk781175" \
moz-do-not-send="true">http://lists.llvm.org/pipermail/cfe-dev/2017-December/056424.html</a>.
  I put this links here so others can see it.</span><br>
      </div>
    </blockquote>
    <br>
    +cfe-dev<br>
    <br>
    <blockquote type="cite"
cite="mid:SG2PR0401MB199984B4270226687D2B652FA48E0@SG2PR0401MB1999.apcprd04.prod.outlook.com">
  <div style="font-family: Calibri, Helvetica, sans-serif,
        EmojiFont, &quot;Apple Color Emoji&quot;, &quot;Segoe UI
        Emoji&quot;, NotoColorEmoji, &quot;Segoe UI Symbol&quot;,
        &quot;Android Emoji&quot;, EmojiSymbols; font-size: 12pt; color:
        rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
      </div>
      <div style="font-family: Calibri, Helvetica, sans-serif,
        EmojiFont, &quot;Apple Color Emoji&quot;, &quot;Segoe UI
        Emoji&quot;, NotoColorEmoji, &quot;Segoe UI Symbol&quot;,
        &quot;Android Emoji&quot;, EmojiSymbols; font-size: 12pt; color:
        rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
        <span style="color: rgb(33, 33, 33); font-family: &quot;Segoe
          UI&quot;, &quot;Segoe UI Web (West European)&quot;,
          &quot;Segoe UI&quot;, -apple-system, BlinkMacSystemFont,
          Roboto, &quot;Helvetica Neue&quot;, sans-serif; font-size:
          15px; font-style: normal; font-variant-ligatures: normal;
          font-variant-caps: normal; font-weight: 400;"><br>
        </span></div>
      <div style="font-family: Calibri, Helvetica, sans-serif,
        EmojiFont, &quot;Apple Color Emoji&quot;, &quot;Segoe UI
        Emoji&quot;, NotoColorEmoji, &quot;Segoe UI Symbol&quot;,
        &quot;Android Emoji&quot;, EmojiSymbols; font-size: 12pt; color:
        rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
        <span style="color: rgb(33, 33, 33); font-family: &quot;Segoe
          UI&quot;, &quot;Segoe UI Web (West European)&quot;,
          &quot;Segoe UI&quot;, -apple-system, BlinkMacSystemFont,
          Roboto, &quot;Helvetica Neue&quot;, sans-serif; font-size:
          15px; font-style: normal; font-variant-ligatures: normal;
          font-variant-caps: normal; font-weight: 400;">One more
          question. If it is possible move some logic from
          'checkPostCall()' to 'evalCall()' in '<span style="color:
            rgb(33, 33, 33); font-family: &quot;Segoe UI&quot;,
            &quot;Segoe UI Web (West European)&quot;, &quot;Segoe
            UI&quot;, -apple-system, BlinkMacSystemFont, Roboto,
            &quot;Helvetica Neue&quot;, sans-serif; font-size: 15px;
            font-style: normal; font-variant-ligatures: normal;
            font-variant-caps: normal; font-weight: \
400;">StdLibraryFunctionsChecker</span>'.  For 'isspace()', like below:</span></div>
      <div style="font-family: Calibri, Helvetica, sans-serif,
        EmojiFont, &quot;Apple Color Emoji&quot;, &quot;Segoe UI
        Emoji&quot;, NotoColorEmoji, &quot;Segoe UI Symbol&quot;,
        &quot;Android Emoji&quot;, EmojiSymbols; font-size: 12pt; color:
        rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
        <span style="color: rgb(33, 33, 33); font-family: &quot;Segoe
          UI&quot;, &quot;Segoe UI Web (West European)&quot;,
          &quot;Segoe UI&quot;, -apple-system, BlinkMacSystemFont,
          Roboto, &quot;Helvetica Neue&quot;, sans-serif; font-size:
          15px; font-style: normal; font-variant-ligatures: normal;
          font-variant-caps: normal; font-weight: \
400;">-------------------------------------------------------------------------------<br>
  </span></div>
      <div style="font-family: Calibri, Helvetica, sans-serif,
        EmojiFont, &quot;Apple Color Emoji&quot;, &quot;Segoe UI
        Emoji&quot;, NotoColorEmoji, &quot;Segoe UI Symbol&quot;,
        &quot;Android Emoji&quot;, EmojiSymbols; font-size: 12pt; color:
        rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
        <span style="color: rgb(33, 33, 33); font-family: &quot;Segoe
          UI&quot;, &quot;Segoe UI Web (West European)&quot;,
          &quot;Segoe UI&quot;, -apple-system, BlinkMacSystemFont,
          Roboto, &quot;Helvetica Neue&quot;, sans-serif; font-size:
          15px; font-style: normal; font-variant-ligatures: normal;
          font-variant-caps: normal; font-weight: 400;">               
                        Current Implementation</span></div>
      <div style="background-color: rgb(255, 255, 255);"><font
          color="#212121"><span style="font-size: 15px;">               
             evalCall() : 'Conjure a symbol' and 'bind it to \
CallExpr'</span></font></div>  <div style="background-color: rgb(255, 255, \
255);"><font  color="#212121"><span style="font-size: 15px;">     
             checkPostCall() : 'assume' and 'apply ranges'</span></font></div>
      <div style="background-color: rgb(255, 255, 255);"><font
          color="#212121"><span style="font-size: 15px;"><br>
          </span></font></div>
      <div style="background-color: rgb(255, 255, 255);"><font
          color="#212121"><span style="font-size: 15px;"><br>
          </span></font></div>
      <div style="background-color: rgb(255, 255, 255);"><font
          color="#212121"><span style="font-size: 15px;">               
                           Possible Implementation</span></font></div>
      <div style="background-color: rgb(255, 255, 255);"><font
          color="#212121"><span style="font-size: 15px;"><span
              style="color: rgb(33, 33, 33); font-family: &quot;Segoe
              UI&quot;, &quot;Segoe UI Web (West European)&quot;,
              &quot;Segoe UI&quot;, -apple-system, BlinkMacSystemFont,
              Roboto, &quot;Helvetica Neue&quot;, sans-serif; font-size:
              15px; font-style: normal; font-variant-ligatures: normal;
              font-variant-caps: normal; font-weight: 400;">           
                   </span><span style="color: rgb(33, 33, 33);
              font-family: &quot;Segoe UI&quot;, &quot;Segoe UI Web
              (West European)&quot;, &quot;Segoe UI&quot;,
              -apple-system, BlinkMacSystemFont, Roboto, &quot;Helvetica
              Neue&quot;, sans-serif; font-size: 15px; font-style:
              normal; font-variant-ligatures: normal; font-variant-caps:
              normal; font-weight: 400;">evalCall() : 'Conjure a symbol'</span><span
              style="color: rgb(33, 33, 33); font-family: &quot;Segoe
              UI&quot;, &quot;Segoe UI Web (West European)&quot;,
              &quot;Segoe UI&quot;, -apple-system, BlinkMacSystemFont,
              Roboto, &quot;Helvetica Neue&quot;, sans-serif; font-size:
              15px; font-style: normal; font-variant-ligatures: normal;
              font-variant-caps: normal; font-weight: 400;"> and 'bind
              it to CallExpr</span><span style="color: rgb(33, 33, 33);
              font-family: &quot;Segoe UI&quot;, &quot;Segoe UI Web
              (West European)&quot;, &quot;Segoe UI&quot;,
              -apple-system, BlinkMacSystemFont, Roboto, &quot;Helvetica
              Neue&quot;, sans-serif; font-size: 15px; font-style:
              normal; font-variant-ligatures: normal; font-variant-caps:
              normal; font-weight: 400;">'</span><br>
          </span></font></div>
      <div style="background-color: rgb(255, 255, 255);"><font
          color="#212121"><span style="font-size: 15px;"><span
              style="color: rgb(33, 33, 33); font-family: &quot;Segoe
              UI&quot;, &quot;Segoe UI Web (West European)&quot;,
              &quot;Segoe UI&quot;, -apple-system, BlinkMacSystemFont,
              Roboto, &quot;Helvetica Neue&quot;, sans-serif; font-size:
              15px; font-style: normal; font-variant-ligatures: normal;
              font-variant-caps: normal; font-weight: 400;">           
                                    <span style="color: rgb(33, 33, 33);
                font-family: &quot;Segoe UI&quot;, &quot;Segoe UI Web
                (West European)&quot;, &quot;Segoe UI&quot;,
                -apple-system, BlinkMacSystemFont, Roboto,
                &quot;Helvetica Neue&quot;, sans-serif; font-size: 15px;
                font-style: normal; font-variant-ligatures: normal;
                font-variant-caps: normal; font-weight: 400;">'assume'</span><span
                style="color: rgb(33, 33, 33); font-family: &quot;Segoe
                UI&quot;, &quot;Segoe UI Web (West European)&quot;,
                &quot;Segoe UI&quot;, -apple-system, BlinkMacSystemFont,
                Roboto, &quot;Helvetica Neue&quot;, sans-serif;
                font-size: 15px; font-style: normal;
                font-variant-ligatures: normal; font-variant-caps:
                normal; font-weight: 400;"> and '</span><span
                style="color: rgb(33, 33, 33); font-family: &quot;Segoe
                UI&quot;, &quot;Segoe UI Web (West European)&quot;,
                &quot;Segoe UI&quot;, -apple-system, BlinkMacSystemFont,
                Roboto, &quot;Helvetica Neue&quot;, sans-serif;
                font-size: 15px; font-style: normal;
                font-variant-ligatures: normal; font-variant-caps:
                normal; font-weight: 400;">apply ranges</span><span
                style="color: rgb(33, 33, 33); font-family: &quot;Segoe
                UI&quot;, &quot;Segoe UI Web (West European)&quot;,
                &quot;Segoe UI&quot;, -apple-system, BlinkMacSystemFont,
                Roboto, &quot;Helvetica Neue&quot;, sans-serif;
                font-size: 15px; font-style: normal;
                font-variant-ligatures: normal; font-variant-caps:
                normal; font-weight: 400;">'</span></span></span></font></div>
      <div style="font-family: Calibri, Helvetica, sans-serif,
        EmojiFont, &quot;Apple Color Emoji&quot;, &quot;Segoe UI
        Emoji&quot;, NotoColorEmoji, &quot;Segoe UI Symbol&quot;,
        &quot;Android Emoji&quot;, EmojiSymbols; font-size: 12pt; color:
        rgb(0, 0, 0); background-color: rgb(255, 255, 255);">
        <span style="color: rgb(33, 33, 33); font-family: &quot;Segoe
          UI&quot;, &quot;Segoe UI Web (West European)&quot;,
          &quot;Segoe UI&quot;, -apple-system, BlinkMacSystemFont,
          Roboto, &quot;Helvetica Neue&quot;, sans-serif; font-size:
          15px; font-style: normal; font-variant-ligatures: normal;
          font-variant-caps: normal; font-weight: \
400;">-------------------------------------------------------------------------------<br>
  </span></div>
      <div style="font-family: Calibri, Helvetica, sans-serif;
        font-size: 12pt; color: rgb(0, 0, 0);">
        <br>
      </div>
      <div style="font-family: Calibri, Helvetica, sans-serif;
        font-size: 12pt; color: rgb(0, 0, 0);">
        Is it reasonable?</div>
    </blockquote>
    <br>
    You won't be able to move that logic because not all functions
    should be modeled via evalCall().<br>
    <br>
    You may duplicate the logic in evalCall() so that state split
    happened earlier for functions that are modeled with evalCall(), but
    i'm still interested in seeing a motivation for such change. There's
    only one correct way to make a state split, and such state split is
    pretty much idempotent.<br>
    <br>
    <blockquote type="cite"
cite="mid:SG2PR0401MB199984B4270226687D2B652FA48E0@SG2PR0401MB1999.apcprd04.prod.outlook.com">
  <div style="font-family: Calibri, Helvetica, sans-serif;
        font-size: 12pt; color: rgb(0, 0, 0);">
        <br>
      </div>
      <div style="font-family: Calibri, Helvetica, sans-serif;
        font-size: 12pt; color: rgb(0, 0, 0);">
        Thanks again!</div>
      <div id="signature">
        <div style="font-family:Calibri,Helvetica,sans-serif;
          font-size:12pt; color:rgb(0,0,0)">
          Henry Wong</div>
        <div style="font-family:Calibri,Helvetica,sans-serif;
          font-size:12pt; color:rgb(0,0,0)">
          Qihoo 360 Codesafe Team</div>
      </div>
      <hr style="display:inline-block;width:98%" tabindex="-1">
      <div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt"
          face="Calibri, sans-serif" color="#000000"><b>From:</b> Artem
          Dergachev <a class="moz-txt-link-rfc2396E" \
href="mailto:noqnoqneo@gmail.com">&lt;noqnoqneo@gmail.com&gt;</a><br>  <b>Sent:</b> \
                Thursday, April 26, 2018 3:09<br>
          <b>To:</b> Henry Wong; <a class="moz-txt-link-abbreviated" \
href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a><br>  <b>Subject:</b> \
Re: [cfe-dev] [analyzer] Why use  `checkPostCall()` to model the function semantics \
in  `StdLibraryFunctionsChecker.cpp` ?</font>
        <div> </div>
      </div>
      <div style="background-color:#FFFFFF">StdLibraryFunctionsChecker
        uses evalCall for most calls it models. It uses checkPostCall
        only for few functions for which it models so little about the
        function that it's unlikely to ever be a problem for other
        checkers that may eventually want to model the function exactly.
        The checker's original intent was to cut away infeasible paths
        in the program, eg. preventing analysis of paths on which
        getline() is assumed to return -2. It is indeed a problem that
        other checkers are not able to reliably access this information
        immediately in their own checkPostCall, but currently there are
        no checkers that are actively relying on that. There are also
        discussions about introducing a system of dependencies between
        checkers so that dependent checkers automatically turned on
        checkers on which they depend and have their callbacks fire in a
        specific order, which could probably be already hacked up by
        writing weird registerChecker() functions that register
        dependencies first.<br>
        <br>
        There are currently at least 4 different ways the analyzer can
        model a function:<br>
        <br>
        1. Conservative evaluation (normal analyzer behavior when body
        of the function is unavailable).<br>
        2. Inlining (model the function precisely by jumping into it and
        proceeding with normal analysis inside it).<br>
        (1. and 2. will be collectively referred to as "default
        evaluation".)<br>
        3. Body farm (provide a simplified synthetic AST for the
        function body and then inline it).<br>
        4. evalCall() in checker (let a checker manipulate the program
        state manually to model arbitrary effects of the function).<br>
        <br>
        Additionally, any checker may influence the analysis at almost
        any point, which should be used carefully. For instance,
        splitting the path or cutting away a path that seems infeasible
        is fine (as long as it is the desired behavior), replacing a
        value of an expression with a different value is bad.<br>
        <br>
        When body farms were introduced, they seemed to be a great way
        of modeling library functions, and they are fairly effective for
        the few functions they were used for. But later a lot of
        functions turned out to be problematic to model that way -
        either because their simplified AST is too complicated to
        synthesize correctly (eg. std::call_once turned out to be
        extremely painful because we had to write down AST for template
        instantiations manually node-by-node without being able to rely
        on the compiler to help us with that) or because a good
        synthetic AST will not be understood by the analyzer anyway.
        StdLibraryFunctionsChecker is modeling some functions that fall
        to the latter category. You should be able to find further
        explanation of why they are hard to body-farm in the checker's
        comments.<br>
        <br>
        The difference between evalCall and checkPostCall is that
        evalCall overrides the default evaluation. If a checker does
        evalCall(), the function will never be inlined or invalidate
        potentially accessible memory. The checker will also need to
        come up with a good representation of the return value and will
        have a chance to specify it. If two different checkers try to
        evalCall() the same call, the analyzer will defensively crash.<br>
        <br>
        StdLibraryFunctionsChecker uses evalCall for modeling calls that
        it can model *exactly*.<br>
        <br>
        It also uses checkPostCall for stuff that it can't model
        exactly, but for the lack of better modeling it can still model
        a few things that are safe to model in post-call, in addition to
        the effects of default evaluation. For example, it doesn't model
        the string (or even the length of the string) produced by
        getline() but it does know that this function never returns -2,
        so it cuts away the respective paths. If getline() is inlined or
        a different checker models it in evalCall or even in
        checkPostCall, StdLibraryFunctionsChecker will still work
        correctly, because, well, whatever the other modeling does, it
        shouldn't make getline() return -2. It might happen that another
        checker substitutes the return value in PostCall leading to a
        race, but that's the exact reason why substituting expression
        values after the expression is evaluated is a bad idea anywhere
        in the analyzer.<br>
        <br>
        StdLibraryFunctionsChecker uses a custom system of function
        summaries which is relatively extensible but not super flexible.
        It should probably not used for modeling everything. In fact, i
        doubt it'd be easy to extend it to reliably model anything but
        range constraints. Side effects like "this function writes its
        1st argument to memory pointed to by its 2nd argument" are
        already pretty unpleasant to summarize declaratively; add a
        couple of levels of pointer indirection and it'd be a nightmare.<br>
        <br>
        So, adding more functions and side effect variants similar to
        what's already there is welcome. I'm moderately curious about
        how far this summary system can be pushed, but reliability comes
        first. Trying to model every function this way is not a great
        idea.<br>
        <br>
        <div class="x_moz-cite-prefix">On 4/24/18 8:46 PM, Henry Wong
          via cfe-dev wrote:<br>
        </div>
        <blockquote type="cite">
          <style type="text/css" style="display:none">
<!--
p
	{margin-top:0;
	margin-bottom:0}
-->
</style>
          <div style="font-family:Calibri,Helvetica,sans-serif;
            font-size:12pt; color:rgb(0,0,0);
            background-color:rgb(255,255,255)">
            Hi all,</div>
          <div style="font-family:Calibri,Helvetica,sans-serif;
            font-size:12pt; color:rgb(0,0,0);
            background-color:rgb(255,255,255)">
            <br>
          </div>
          <div style="font-family:Calibri,Helvetica,sans-serif;
            font-size:12pt; color:rgb(0,0,0);
            background-color:rgb(255,255,255)">
            `StdLibraryFunctionsChecker.cpp` is a very useful and
            great tool to improve the modeling of library function. <span
              style="color:rgb(0,0,0);
              font-family:Calibri,Helvetica,sans-serif; font-size:12pt">But
              I can't figure out why use `checkPostCall()` to model the
              function samantic.</span></div>
          <div style="font-family:Calibri,Helvetica,sans-serif;
            font-size:12pt; color:rgb(0,0,0);
            background-color:rgb(255,255,255)">
            <span style="color:rgb(0,0,0);
              font-family:Calibri,Helvetica,sans-serif; font-size:12pt"><br>
            </span></div>
          <div style="font-family:Calibri,Helvetica,sans-serif;
            font-size:12pt; color:rgb(0,0,0);
            background-color:rgb(255,255,255)">
            <span style="color:rgb(0,0,0);
              font-family:Calibri,Helvetica,sans-serif; font-size:12pt">What
              puzzles me is the order of API calls. For example, if we
              want to make some checks on `getline()</span><span
              style="color:rgb(0,0,0);
              font-family:Calibri,Helvetica,sans-serif; font-size:12pt">`
              in `CheckerA`, and use `checkPostCall()</span><span
              style="color:rgb(0,0,0);
              font-family:Calibri,Helvetica,sans-serif; font-size:12pt">`
              to collect information or set `ProgramState</span><span
              style="color:rgb(0,0,0);
              font-family:Calibri,Helvetica,sans-serif; font-size:12pt">`,
              the `checkPostCall()` of `CheckerA` is likely to be behind
              the `checkPostCall()` of `StdLibraryFunctionsChecker.cpp`.
              At this point, `CheckerA` does not use the model
              information of `getline()` in
              `StdLibraryFunctionsChecker.cpp`. So what is the original
              intention of using `checkPostCall()` to play the key role
              in modeling?</span></div>
          <div style="font-family:Calibri,Helvetica,sans-serif;
            font-size:12pt; color:rgb(0,0,0);
            background-color:rgb(255,255,255)">
            <span style="color:rgb(0,0,0);
              font-family:Calibri,Helvetica,sans-serif; font-size:12pt"><br>
            </span></div>
          <div style="font-family:Calibri,Helvetica,sans-serif;
            font-size:12pt; color:rgb(0,0,0);
            background-color:rgb(255,255,255)">
            And I want to know what plans community have for
            `StdLibraryFunctionsChecker.cpp` in the future, for example,
            extend it to handle more complex library functions?</div>
          <div style="font-family:Calibri,Helvetica,sans-serif;
            font-size:12pt; color:rgb(0,0,0);
            background-color:rgb(255,255,255)">
            <br>
          </div>
          <div style="font-family:Calibri,Helvetica,sans-serif;
            font-size:12pt; color:rgb(0,0,0);
            background-color:rgb(255,255,255)">
            Thanks in advance!</div>
          <div style="font-family:Calibri,Helvetica,sans-serif;
            font-size:12pt; color:rgb(0,0,0);
            background-color:rgb(255,255,255)">
            <br>
          </div>
          <div id="x_signature">
            <div style="font-family:Calibri,Helvetica,sans-serif;
              font-size:12pt; color:rgb(0,0,0)">
              Henry Wong</div>
            <div style="font-family:Calibri,Helvetica,sans-serif;
              font-size:12pt; color:rgb(0,0,0)">
              Qihoo 360 Codesafe Team</div>
          </div>
          <br>
          <fieldset class="x_mimeAttachmentHeader"></fieldset>
          <pre class="x_moz-quote-pre">_______________________________________________
 cfe-dev mailing list
<a class="x_moz-txt-link-abbreviated" href="mailto:cfe-dev@lists.llvm.org" \
moz-do-not-send="true">cfe-dev@lists.llvm.org</a> <a class="x_moz-txt-link-freetext" \
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" \
moz-do-not-send="true">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a> \
</pre>  </blockquote>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>

[Attachment #6 (text/plain)]

_______________________________________________
cfe-dev mailing list
cfe-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

[prev in list] [next in list] [prev in thread] [next in thread]