
List:       cfe-dev
Subject:    Re: [cfe-dev] Need help in implementing custom static analysis
From:       Artem Dergachev via cfe-dev <cfe-dev () lists ! llvm ! org>
Date:       2019-11-25 21:14:33
Message-ID: 0cf5ea98-6ee0-19b3-e2f3-bc37d56dddcf () gmail ! com


Hi,

Such an analysis is trivial to perform with a custom Clang Static Analyzer 
checker. Just subscribe to checkPreCall and explore the symbolic values 
(SVals) of the function arguments on the explored execution paths. SVals 
capture a lot of information about where a value comes from, so you 
don't need to track all re-assignments manually: the analyzer does this 
for you, sometimes even across function calls. You can look up which 
classes of SVals it tracks and what kind of information they capture in 
our Doxygen:

     https://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html
     https://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html
     https://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html

In your example, for 'f1(x1)' the symbolic value will be a 
loc::MemRegionVal of a SymbolicRegion of a SymbolConjured of type 
void *. You can extract that type from the SVal by doing 
V.getAsSymbol(true)->getType(), where V is your SVal.
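
As a rough, untested sketch of what such a checker could look like: the 
class name ArgOriginChecker and the hard-coded callee names f1/f2 are 
placeholders taken from your example, and the registration boilerplate 
is left out because it differs between Clang versions.

```cpp
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
#include "llvm/Support/raw_ostream.h"

using namespace clang;
using namespace ento;

namespace {
// Hypothetical checker: prints the type of the symbol behind each argument
// of the tracked functions, on every path the analyzer explores.
class ArgOriginChecker : public Checker<check::PreCall> {
public:
  void checkPreCall(const CallEvent &Call, CheckerContext &C) const {
    const IdentifierInfo *II = Call.getCalleeIdentifier();
    // Placeholder filter: only look at calls to f1 and f2.
    if (!II || !(II->isStr("f1") || II->isStr("f2")))
      return;

    for (unsigned I = 0, E = Call.getNumArgs(); I != E; ++I) {
      SVal V = Call.getArgSVal(I);
      // 'true' also looks through the region the value is based on,
      // e.g. the SymbolicRegion around the symbol conjured for malloc().
      if (SymbolRef Sym = V.getAsSymbol(/*IncludeBaseRegions=*/true))
        llvm::errs() << II->getName() << ", "
                     << Sym->getType().getAsString() << "\n";
      else
        llvm::errs() << II->getName() << ", <no symbol>\n";
    }
  }
};
} // namespace
```

This has to be built against the Clang libraries, either in-tree or as 
an analyzer plugin. A real checker would emit a bug report through the 
CheckerContext rather than print to stderr.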

In the case of 'f2(x2)' you will only know that the value is equal to 
'x', but the type of the original literal will be erased. You can still 
ultimately recover it via trackExpressionValue(), but that's not 
entirely convenient. That said, I'm not sure you really want it as long 
as you have the value anyway.
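
To make the f2 case concrete at the language level (this is plain C++, 
not analyzer output, and the function name valueSeenByF2 is made up for 
illustration): after 'x2 = x3' the variable still has type unsigned int 
and merely holds the character's value, so the unsigned char origin 
cannot be read back from the value alone.

```cpp
#include <cassert>
#include <type_traits>

// Mirrors the assignments from the quoted snippet and returns the value
// that f2 would receive as its argument.
unsigned int valueSeenByF2() {
  unsigned int x2 = 0;
  unsigned char x3 = 0;

  x2 = 42;
  x3 = 'x';
  x2 = x3; // implicit conversion: unsigned char -> unsigned int

  // The declared type of x2 is unchanged by the assignment.
  static_assert(std::is_same<decltype(x2), unsigned int>::value,
                "x2 keeps its declared type");
  return x2;
}
```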

See also:

     http://clang-analyzer.llvm.org/checker_dev_manual.html

---

The only downside of the Static Analyzer is that it doesn't explore 
*all* possible execution paths, but only the ones it has time to 
carefully investigate (being path-sensitive, it suffers from "path 
explosion" by design). If your purpose is to make a tool that will find 
bugs in existing code, this is perfect. If you really want to explore 
all execution paths no matter what, then you'll have to write your own 
analysis, and one of your options will be to use the Clang CFG:

     https://clang.llvm.org/doxygen/classclang_1_1CFG.html

Clang CFG is different from LLVM IR; it consists of Clang AST node 
pointers, so it still captures the information present in the original 
source code pretty much perfectly. There is a variety of existing 
analyses over Clang CFG available in Clang's lib/Analysis that you can 
use as an example or possibly even re-use.

That's much more work than a Static Analyzer checker though, and you'll 
have to deal with a lot more false positives due to lack of path 
sensitivity. It'll also be a much bigger challenge to find bugs across 
function calls.
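
To illustrate the path-sensitivity point with a toy model (self-contained 
C++ that has nothing to do with how the analyzer or lib/Analysis are 
actually implemented; all names are invented): two branches test the 
same flag, and only an analysis that keeps paths apart can see that the 
dereference is safe.

```cpp
#include <cassert>
#include <initializer_list>

// Toy program under analysis:
//   p = null;
//   if (flag) p = &obj;
//   if (flag) deref(p);

enum class Ptr { Null, NonNull, MaybeNull };

// Merge two abstract values at a control-flow join.
Ptr join(Ptr A, Ptr B) { return A == B ? A : Ptr::MaybeNull; }

// Path-insensitive: one merged state per program point, and the branch
// condition is forgotten after the join.
bool pathInsensitiveWarns() {
  Ptr P = Ptr::Null;
  Ptr AfterFirstIf = join(/*then branch*/ Ptr::NonNull, /*else branch*/ P);
  // Inside the second 'if' the merged state is all that is known, so the
  // dereference is flagged whenever P is not definitely non-null.
  return AfterFirstIf != Ptr::NonNull;
}

// Path-sensitive: follow each outcome of 'flag' separately and keep the
// exact state along that path.
bool pathSensitiveWarns() {
  for (bool Flag : {false, true}) {
    Ptr P = Ptr::Null;
    if (Flag)
      P = Ptr::NonNull;
    if (Flag && P != Ptr::NonNull)
      return true; // dereference of a possibly-null pointer on this path
  }
  return false;
}
```

In this model pathInsensitiveWarns() reports a false positive while 
pathSensitiveWarns() stays silent. The price is one exploration per 
combination of branch outcomes, which is where path explosion comes from.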


On 23.11.2019 05:51, Pierre Graux via cfe-dev wrote:
> Hello,
>
> I am new to clang development and I would like to have your
> opinion on how I can do a specific task.
>
> I want to add a static analysis to the compilation of C++ part of
> Android applications (clang is the default compiler).
>
> During this analysis I want to locate the call of specific functions
> and then determine the type of the right value of the last
> assignation of their arguments.
>
> For example, if I track functions f1 and f2 in the following snippet:
> "
> unsigned long x1 = 0;
> unsigned int x2 = 0;
> unsigned char x3 = 0;
>
> x1 = malloc(...);
> x2 = 42;
> x3 = 'x';
> x2 = x3;
>
> f1(x1);
> f2(x2);
> "
> The analysis should return me "f1, void*" and "f2, unsigned char".
>
> Ideally, this analysis should generate a warning during the
> compilation process (depending on other conditions not mentioned
> here). However, if it is an external tool it is fully acceptable.
>
> I don't know if this kind of analysis is already present in clang but
> I think that it will be easier to implement it over CFG of llvm IR
> than over clang AST.
>
> I have looked at clang and llvm documentation but the different
> methods that I have seen do not seem to fulfill my requirements:
> - libclang or clang plugin: it seems that I can only access to the AST.
> - llvm pass: I won't be able to generate a warning.
>
> Do you have any advice about which interface I should use? Do you know
> any project/tool that could be good example and inspire me?
>
> Thank you very much,
>
> Pierre GRAUX
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev



