List:       olpc-devel
Subject:    Re: Infrequent heap corruption, XO-4, Fedora 20
From:       Jon Nettleton <jon.nettleton () gmail ! com>
Date:       2015-02-05 7:11:10
Message-ID: CALHpu36OfQ6gdHSaDhnPVnm_N_7B+ErsoGixX_QCz7==3JweOw () mail ! gmail ! com

On Thu, Feb 5, 2015 at 8:00 AM, James Cameron <quozl@laptop.org> wrote:

> Thanks.
>
> Can I make it happen more often?
>
> Is there a later version of the driver?
>
> We have a different version that I may look into, on arm-3.5-android
> branch.
>
>
Run memtester against the majority of your machine's memory, and then run
gtkperf in an X session.  That is usually enough to trigger it.
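
As a rough sketch of that recipe (not a tested reproducer), the Python below
sizes a memtester run from /proc/meminfo and prints the two commands.  The
75% fraction, the helper names, the single memtester loop, and the use of
gtkperf's -a (automatic) mode are assumptions for illustration; the recipe
itself only says "the majority" of memory.

```python
import os


def parse_meminfo_kb(meminfo_text, field="MemTotal"):
    """Return the value (in kB) of one field from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name == field:
            return int(rest.split()[0])
    raise KeyError(field)


def stress_commands(meminfo_text, fraction=0.75, loops=1):
    """Build argv lists for memtester and gtkperf.

    fraction is a hypothetical stand-in for "the majority" of memory.
    """
    total_kb = parse_meminfo_kb(meminfo_text)
    size_mb = int(total_kb * fraction) // 1024
    return [
        ["memtester", "%dM" % size_mb, str(loops)],  # memtester <size> [loops]
        ["gtkperf", "-a"],  # assumed: automatic run, must be inside X
    ]


if __name__ == "__main__":
    # Only print the commands; running them is left to the operator.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            for argv in stress_commands(f.read()):
                print(" ".join(argv))
```

Whether the two programs should overlap in time or run back to back is not
stated above, so the sketch only prints them and leaves that choice open.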

Considering that the bug exists in all the 4.xx Vivante galcore drivers I
have seen, I doubt it is fixed in the other version.  Android is much
simpler on memory because it runs everything through a single GL context
against a framebuffer.

I have some tentative patches in my trees that fix parts of it, but I doubt
many of them would apply to 3.5 without backporting a lot of upstream work.



> On Wed, Feb 04, 2015 at 12:14:02PM +0100, Jon Nettleton wrote:
> > It is a problem with the v4 version of the galcore driver.  We have
> > replicated it on a couple of platforms.
> >
> > On Wed, Feb 4, 2015 at 11:26 AM, Peter Robinson <[1]pbrobinson@gmail.com>
> > wrote:
> >
> >     On Wed, Feb 4, 2015 at 8:10 AM, James Cameron <[2]quozl@laptop.org> wrote:
> >     > Following up a thread from last September.
> >     >
> >     > This problem has just become more interesting, because it hit
> >     > during an activity startup.
> >     >
> >     > I'm quite used to seeing it with yum.  But seeing it without yum
> >     > now points us at kernel, glibc or python.
> >
> >     We've not seen this in the wider F-20 Fedora ARM distro so my bet
> >     would be on the kernel.
> >
> >     Peter
> >
> >     > [3]http://dev.laptop.org/ticket/12837#comment:4 has the details
> >     > of the most recent event.
> >     >
> >     > On Wed, Sep 10, 2014 at 01:56:27PM +1000, James Cameron wrote:
> >     >> G'day Peter,
> >     >>
> >     >> Thanks for any ideas you may have.
> >     >>
> >     >> The problem also reproduces on OLPC Fedora 20 image for XO-4:
> >     >>
> >     >> [4]http://build.laptop.org/14.1.0/os1/xo-4/41001o4.zd (552 MB)
> >     >>
> >     >> *** Error in `/usr/bin/python': free(): invalid pointer: 0x047c79ae ***
> >     >> ======= Backtrace: =========
> >     >> /lib/libc.so.6(+0x6c8b4)[0xb6c828b4]
> >     >> /lib/libc.so.6(+0x754e8)[0xb6c8b4e8]
> >     >> ======= Memory map: ========
> >     >> [...]
> >     >>
> >     >> The error varies in detail, but always suggests corruption of
> >     >> heap or pointers to heap.
> >     >>
> >     >> The triggering conditions are interactive use of yum, yum update,
> >     >> or yum used by olpc-os-builder.  The latter is a simple reproducer
> >     >> for me.
> >     >>
> >     >> I'm reproducing it on an XO-4, with 2GB of RAM, no swap, 8 GB
> >     >> eMMC, 8 GB USB flash drive.
> >     >>
> >     >> While memory demand by yum is large by comparison to other
> >     >> programs, the available memory at the time of failure is ample.
> >     >> There are no kernel out of memory (OOM) events.  It seems more
> >     >> likely to occur when the filesystem cache is under heavy demand.
> >     >>
> >     >> The method to recreate the problem was:
> >     >>
> >     >> 1.  install the system image 41001o4.zd using fs-update and then
> >     >> boot,
> >     >>
> >     >> 2.  configure wireless network,
> >     >>
> >     >> 3.  "yum install -y git olpc-os-builder"
> >     >>
> >     >> 4.  clone the master branch of
> >     >> git://[5]dev.laptop.org/projects/olpc-os-builder
> >     >> (last verified with b87e6ee)
> >     >>
> >     >> 5.  run "./osbuilder.py examples/olpc-os-14.1.0-xo4.ini"
> >     >> repeatedly until the error occurs (usually within about five
> >     >> attempts),
> >     >>
> >     >>
> >     >> I've also tried running under valgrind, but that causes illegal
> >     >> instruction.  It is quite likely I'm not using valgrind correctly.
> >     >> [6]http://dev.laptop.org/~quozl/z/1XRYtO.txt
> >     >>
> >     >> The workaround at the moment is to build our Fedora 20 images on
> >     >> Fedora 18.  Fedora 18 shows no sign of the problem.  I'm worried
> >     >> that a low probability heap corruptor may cause instability of
> >     >> applications in the field.
> >     >>
> >     >> The exact same kernel is being used for Fedora 18 and Fedora 20.
> >     >>
> >     >> On Tue, Sep 09, 2014 at 03:55:24PM +0100, Peter Robinson wrote:
> >     >> > What version of OOB are you using, and what config files? I can
> >     >> > try and recreate the problem here on other devices.
> >     >>
> >     >> --
> >     >> James Cameron
> >     >> [7]http://quozl.linux.org.au/
> >     >
> >     > --
> >     > James Cameron
> >     > [8]http://quozl.linux.org.au/
> >     _______________________________________________
> >     Devel mailing list
> >     [9]Devel@lists.laptop.org
> >     [10]http://lists.laptop.org/listinfo/devel
> >
> > References:
> >
> > [1] mailto:pbrobinson@gmail.com
> > [2] mailto:quozl@laptop.org
> > [3] http://dev.laptop.org/ticket/12837#comment:4
> > [4] http://build.laptop.org/14.1.0/os1/xo-4/41001o4.zd
> > [5] http://dev.laptop.org/projects/olpc-os-builder
> > [6] http://dev.laptop.org/~quozl/z/1XRYtO.txt
> > [7] http://quozl.linux.org.au/
> > [8] http://quozl.linux.org.au/
> > [9] mailto:Devel@lists.laptop.org
> > [10] http://lists.laptop.org/listinfo/devel
>
> --
> James Cameron
> http://quozl.linux.org.au/
>
