List:       cfe-dev
Subject:    Re: [cfe-dev] [Analyzer] Obtain MemRegion corresponding to a pointer expression that has been cast
From:       scott constable via cfe-dev <cfe-dev () lists ! llvm ! org>
Date:       2015-08-24 20:49:04
Message-ID: CADYF24ffuUnft6sHfO5Sd1jwEqOHt_jCf19yEjvzhx4xJMfoVg () mail ! gmail ! com

Thanks Ted,

The solution was to write the "dereference" function like this:

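const MemRegion *
Util::getPointedToRegion(SVal addrVal, bool ignoreElemCast) {
  Optional<Loc> l = addrVal.getAs<Loc>();
  if (!l) // not a location (e.g. an unknown or undefined value)
    return nullptr;
  const MemRegion *MR = l->getAsRegion();
  if (!MR) // a Loc with no region, e.g. a concrete (null) address
    return nullptr;
  // A cast such as (uint8_t *)&x is modeled as an ElementRegion layered on
  // the original region, so its super region is the region being pointed to.
  const ElementRegion *ER = dyn_cast<ElementRegion>(MR);
  if (ER && ignoreElemCast)
    MR = ER->getSuperRegion();

  return MR;
}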

It's essentially just stripping off the ElementRegion, just like you
suggested.
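
For anyone reading this in the archive, here is a minimal standalone sketch of
the same idea outside of Clang. The Region struct and stripAllElementLayers()
are hypothetical stand-ins I made up for illustration; they are not the
analyzer's classes. The second helper only shows what peeling more than one
layer would look like if element layers ever nest, which I have not verified
happens in practice.

```cpp
#include <string>

// Toy stand-in for the analyzer's region hierarchy (NOT the Clang types).
struct Region {
  enum Kind { Var, Element } kind;
  std::string name;    // variable name or element type, for illustration only
  const Region *super; // enclosing region; nullptr for a top-level variable
};

// Mirrors the helper above: peel a single Element layer when asked to.
const Region *getPointedToRegion(const Region *R, bool ignoreElemCast) {
  if (!R)
    return nullptr;
  if (R->kind == Region::Element && ignoreElemCast)
    return R->super;
  return R;
}

// Hypothetical variant: keep peeling until a non-Element region is reached.
const Region *stripAllElementLayers(const Region *R) {
  while (R && R->kind == Region::Element)
    R = R->super;
  return R;
}
```

One caveat with both versions: they treat every element layer as a cast, so a
pointer to a genuine array element such as &arr[3] would also be reduced to its
array. As far as I know, Clang's own MemRegion::StripCasts() only strips
element regions whose index is zero, so it may be worth a look for that case.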

~Scott Constable

On Wed, Aug 19, 2015 at 11:57 AM, Ted Kremenek via cfe-dev <
cfe-dev@lists.llvm.org> wrote:

> Hi Scott,
> 
> I don't actually see a reason why you need to even look at the structure
> of the AST here.  The analyzer does a full symbolic execution, so there is
> a powerful separation between syntax and semantics right at your
> fingertips.
> 
> I would approach this from a different angle.  Once you have the location,
> in this case 'l', it should be an ElementRegion.  That will represent the
> cast from the original MemRegion (a VarRegion) to uint8_t*.  Then just
> strip off the ElementRegion.  The MemRegion design captures how the casts
> were used to change the interpretation of a piece of memory.  It's all
> right there in the MemRegion hierarchy.
> 
> AST-based approaches like this are fundamentally very brittle.  For
> example, you would need to do something different if the code was instead
> written like this:
> 
> void foo() {
>   struct S x;
>   uint8_t *y = (uint8_t *)&x;
>   bar(y);
> }
> 
> If you just use the MemRegions directly, these syntactic differences are
> irrelevant.  The MemRegions capture the actual semantics of the value you
> are working with.  In this case, the analyzer knows that the original
> memory address is for the VarRegion for 'x'.
> 
> Typically, if you find yourself going to the AST itself to do these kinds
> of operations, the approach is inherently wrong.  Syntactic approaches work
> reasonably well for the compiler, where cheap local analysis is all you
> have.  For the static analyzer, there is so much semantics captured in the
> ProgramState that you can go far beyond the reasoning power of syntactic
> checks like this.
> 
> Cheers,
> Ted
> 
> > On Aug 19, 2015, at 8:44 AM, scott constable via cfe-dev <cfe-dev@lists.llvm.org> wrote:
> > 
> > Hi All,
> > 
> > I'm analyzing something like the following code:
> > 
> > struct S {
> >   int a;
> >   char b;
> >   int c;
> > };
> >
> > void foo() {
> >   struct S x;
> >   bar((uint8_t *)&x);
> > }
> > 
> > When I reach the CallEvent corresponding to the call to bar(), I would
> > like to extract the MemRegion corresponding to x, i.e. by ignoring the
> > (uint8_t *) cast. My code looks something like this:
> > 
> > const Expr *arg = Call.getArgExpr(0);
> > SVal addrVal = State->getSVal(arg, LCtx);
> > Optional<Loc> l = addrVal.getAs<Loc>();
> > if (!l) // not a location (e.g. an unknown or undefined value)
> >   return nullptr;
> >
> > QualType T = getPointedToType(arg);
> > return State->getSVal(*l, T).getAsRegion();
> > 
> > where getPointedToType() is defined as
> > 
> > QualType getPointedToType(const Expr *E) {
> >   assert(E);
> >   if (!isPointer(E))
> >     return QualType();
> >   // Look through casts to find the type of the underlying expression.
> >   if (const CastExpr *cast = dyn_cast<CastExpr>(E))
> >     return getPointedToType(cast->getSubExpr());
> >
> >   const PointerType *Ty =
> >       dyn_cast<PointerType>(E->getType().getCanonicalType().getTypePtr());
> >   if (Ty)
> >     return Ty->getPointeeType();
> >   return QualType();
> > }
> > 
> > Everything seems to work just fine until the call to
> > State->getSVal(*l, T), which returns a NonLoc. If I instead call
> > State->getSVal(*l) without the pointed-to type, then I do get a
> > MemRegion, but it's an element region of type uint8_t, NOT what I want.
> > 
> > Am I doing something wrong? Is there a much easier way to do this?
> > 
> > ~Scott Constable
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev@lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev

