List: olpc-devel
Subject: Re: Infrequent heap corruption, XO-4, Fedora 20
From: Jon Nettleton <jon.nettleton () gmail ! com>
Date: 2015-02-05 7:11:10
Message-ID: CALHpu36OfQ6gdHSaDhnPVnm_N_7B+ErsoGixX_QCz7==3JweOw () mail ! gmail ! com
On Thu, Feb 5, 2015 at 8:00 AM, James Cameron <quozl@laptop.org> wrote:
> Thanks.
>
> Can I make it happen more often?
>
> Is there a later version of the driver?
>
> We have a different version that I may look into, on arm-3.5-android
> branch.
>
>
Run memtester against the majority of your machine's memory and then run
gtkperf in an X session. That is usually enough to trigger it.

Considering that the bug exists in all the 4.xx Vivante galcore drivers I
have seen, I doubt it is fixed in the other version. Android is much simpler
on memory because it runs everything through a single GL context against a
framebuffer.

I have some tentative patches to fix parts of it in my trees, but I doubt
many of them would apply to 3.5 without backporting a lot of upstream work.
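
For anyone who wants to script the memtester-then-gtkperf sequence, a minimal Python sketch follows. It is an illustration under stated assumptions, not something taken from a tested setup: the 75% memory fraction, the helper names, and the use of gtkperf's -a switch (automatic run, then quit) are all assumptions, and memtester generally needs root so it can lock the memory it tests. The two programs are run one after the other.

```python
import subprocess


def read_mem_total_kib(path="/proc/meminfo"):
    """Return the MemTotal value from /proc/meminfo, in KiB."""
    with open(path) as meminfo:
        for line in meminfo:
            if line.startswith("MemTotal:"):
                return int(line.split()[1])
    raise RuntimeError("MemTotal not found in " + path)


def memtester_command(mem_total_kib, fraction=0.75, loops=1):
    """Build a memtester command line covering `fraction` of RAM.

    The default fraction is an assumed stand-in for "the majority of
    memory"; adjust it to whatever the machine tolerates.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    size_mib = int(mem_total_kib * fraction) // 1024
    # memtester usage: memtester <size>[B|K|M|G] [loops]
    return ["memtester", "%dM" % size_mib, str(loops)]


def run_reproducer():
    """Run memtester, then gtkperf. Needs root and a running X session."""
    subprocess.call(memtester_command(read_mem_total_kib()))
    # Assumption: -a makes gtkperf run all of its tests and exit by itself.
    subprocess.call(["gtkperf", "-a"])
```

Calling run_reproducer() from a terminal inside the X session, in a loop if needed, is the intended use; check the system log and any glibc abort messages after each pass.
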
> On Wed, Feb 04, 2015 at 12:14:02PM +0100, Jon Nettleton wrote:
> > It is a problem with the v4 version of the galcore driver. We have
> > replicated it on a couple of platforms.
> >
> > On Wed, Feb 4, 2015 at 11:26 AM, Peter Robinson <[1]pbrobinson@gmail.com>
> > wrote:
> >
> > On Wed, Feb 4, 2015 at 8:10 AM, James Cameron <[2]quozl@laptop.org> wrote:
> > > Following up a thread from last September.
> > >
> > > This problem has just become more interesting, because it hit during
> > > an activity startup.
> > >
> > > I'm quite used to seeing it with yum. But seeing it without yum now
> > > points us at kernel, glibc or python.
> >
> > We've not seen this in the wider F-20 Fedora ARM distro so my bet
> > would be on the kernel.
> >
> > Peter
> >
> > > [3]http://dev.laptop.org/ticket/12837#comment:4 has the details of the
> > > most recent event.
> > >
> > > On Wed, Sep 10, 2014 at 01:56:27PM +1000, James Cameron wrote:
> > >> G'day Peter,
> > >>
> > >> Thanks for any ideas you may have.
> > >>
> > >> The problem also reproduces on OLPC Fedora 20 image for XO-4:
> > >>
> > >> [4]http://build.laptop.org/14.1.0/os1/xo-4/41001o4.zd (552 MB)
> > >>
> > >> *** Error in `/usr/bin/python': free(): invalid pointer: 0x047c79ae ***
> > >> ======= Backtrace: =========
> > >> /lib/libc.so.6(+0x6c8b4)[0xb6c828b4]
> > >> /lib/libc.so.6(+0x754e8)[0xb6c8b4e8]
> > >> ======= Memory map: ========
> > >> [...]
> > >>
> > >> The error varies in detail, but always suggests corruption of heap or
> > >> pointers to heap.
> > >>
> > >> The triggering conditions are interactive use of yum, yum update, or
> > >> yum used by olpc-os-builder. The latter is a simple reproducer for me.
> > >>
> > >> I'm reproducing it on an XO-4, with 2GB of RAM, no swap, 8 GB eMMC, 8
> > >> GB USB flash drive.
> > >>
> > >> While memory demand by yum is large by comparison to other programs,
> > >> the available memory at the time of failure is ample. There are no
> > >> kernel out of memory (OOM) events. It seems more likely to occur when
> > >> the filesystem cache is under heavy demand.
> > >>
> > >> The method to recreate the problem was:
> > >>
> > >> 1. install the system image 41001o4.zd using fs-update and then boot,
> > >>
> > >> 2. configure wireless network,
> > >>
> > >> 3. "yum install -y git olpc-os-builder"
> > >>
> > >> 4. clone the master branch of
> > >> git://[5]dev.laptop.org/projects/olpc-os-builder
> > >> (last verified with b87e6ee)
> > >>
> > >> 5. run "./osbuilder.py examples/olpc-os-14.1.0-xo4.ini" repeatedly
> > >> until the error occurs (usually within about five attempts),
> > >>
> > >>
> > >> I've also tried running under valgrind, but that causes illegal
> > >> instruction. It is quite likely I'm not using valgrind correctly.
> > >> [6]http://dev.laptop.org/~quozl/z/1XRYtO.txt
> > >>
> > >> The workaround at the moment is to build our Fedora 20 images on
> > >> Fedora 18. Fedora 18 shows no sign of the problem. I'm worried that
> > >> a low probability heap corruptor may cause instability of applications
> > >> in the field.
> > >>
> > >> The exact same kernel is being used for Fedora 18 and Fedora 20.
> > >>
> > >> On Tue, Sep 09, 2014 at 03:55:24PM +0100, Peter Robinson wrote:
> > >> > What version of OOB are you using, and what config files? I can try
> > >> > and recreate the problem here on other devices.
> > >>
> > >> --
> > >> James Cameron
> > >> [7]http://quozl.linux.org.au/
> > >
> > > --
> > > James Cameron
> > > [8]http://quozl.linux.org.au/
> > _______________________________________________
> > Devel mailing list
> > [9]Devel@lists.laptop.org
> > [10]http://lists.laptop.org/listinfo/devel
> >
> > References:
> >
> > [1] mailto:pbrobinson@gmail.com
> > [2] mailto:quozl@laptop.org
> > [3] http://dev.laptop.org/ticket/12837#comment:4
> > [4] http://build.laptop.org/14.1.0/os1/xo-4/41001o4.zd
> > [5] http://dev.laptop.org/projects/olpc-os-builder
> > [6] http://dev.laptop.org/~quozl/z/1XRYtO.txt
> > [7] http://quozl.linux.org.au/
> > [8] http://quozl.linux.org.au/
> > [9] mailto:Devel@lists.laptop.org
> > [10] http://lists.laptop.org/listinfo/devel
>
> --
> James Cameron
> http://quozl.linux.org.au/
>
_______________________________________________
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel