List: gcc
Subject: RE: Help w/ PR61538?
From: Matthew Fortune <Matthew.Fortune () imgtec ! com>
Date: 2014-07-28 21:38:50
Message-ID: 6D39441BF12EF246A7ABCE6654B0235320EB4EEA () LEMAIL01 ! le ! imgtec ! org
I'll switch to replying on PR61538. I had not read the whole ticket
previously, and although I may have found a problem, it seems it may not
be the cause of this failure.
The generated code differences after the patches seem significant, but
I may not get a chance to look at them in detail for a little
while.
Matthew
> -----Original Message-----
> From: Joshua Kinard [mailto:kumba@gentoo.org]
> Sent: 28 July 2014 10:40
> To: Matthew Fortune; gcc@gcc.gnu.org
> Subject: Re: Help w/ PR61538?
>
> On 07/28/2014 04:41, Matthew Fortune wrote:
> > Hi Joshua,
> >
> > I know very little about this area but I'll try and offer some advice
> > anyway...
> >
>
> You know more than I do :)
>
>
> >> On 07/05/2014 23:43, Joshua Kinard wrote:
> >>> Hi,
> >>>
> >>> I filed PR61538 about two weeks ago, regarding gcc-4.8.x and up not
> >>> compiling a g++/pthreads-linked app correctly on SGI R1x000-based
> >>> systems (Octane, Onyx2) running Linux. Running the subsequently-compiled
> >>> application simply hangs in a futex syscall until terminated via Ctrl+C.
> >>> I suspect it's a double-locking bug of some sort, as evidenced by strace
> >>> showing two consecutive syscall()'s w/ 0x108e passed as the syscall #
> >>> (4238, or futex on o32 MIPS), but I am stumped as to what else I can do
> >>> to debug it and help fix it.
> >>>
> >> [snip]
> >>> Full details:
> >>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61538
> >>
> >> So I've spent the last few weeks bisecting the gcc tree, and I've narrowed
> >> down the set of commits that appear to have introduced this problem:
> >>
> >> 1. 39a8c5eaded1e5771a941c56a49ca0a5e9c5eca0 * config/mips/mips.c
> >> (mips_emit_pre_atomic_barrier_p,)
> >
> > This is the prime candidate for introducing the issue.
>
> This is my guess, too. However, it appears to tie in w/ the fourth commit
> because the new mips_emit_{pre,post}_atomic_barrier_p functions added in
> commit 39a8c5ea are removed by commit 30c3c442 a mere ~7 minutes later
> (which I find really odd). Commit 974f0a74 is really the only one that
> seems innocent, but I suspect the other three are linked. If mkuvyrkov is
> still around, perhaps he could explain better?
>
>
> >> 2. 974f0a74e2116143b88d8cea8e1dd5a9c18ef96c * config/mips/constraints.md
> >> (ZR): New constraint.
> >
> > Unlikely
> >
> >> 3. 0f8e46b16a53c02d7255dcd6b6e9b5bc7f8ec953 * config/mips/mips.c
> >> (mips_process_sync_loop): Emit cmp result only if
> >
> > Possible but unlikely still
> >
> >> 4. 30c3c4427521f96fb58b6e1debb86da4f113f06f * emit-rtl.c
> >> (need_atomic_barrier_p): New function.
> >
> > Seems unlikely
> >
> >>
> >> There's a build failure somewhere in that range that is blocking me
> >> from figuring out which specific one is the cause, but they all appear to
> >> be related anyway. All four were added on 2012-06-20.
> >>
> >> When I took a git checkout from 2012-06-26 and reverted those four commits,
> >> I was able to compile glibc-2.19 and get a working "sln" binary. I am
> >> unable to easily test the C++ side because I built the checkouts in my
> >> $HOME, and it's too risky to try and shoehorn one of them in as the system
> >> compiler. However, I think the C++ issue is also fixed by reverting the
> >> four, as that also involved hanging in Linux futex syscalls.
> >
> > Here is a wild guess at the problem... I think the R10000 workaround that
> > uses branch-likely instead of ordinary delay-slot branches is ending up
> > annulling an instruction that is required for certain atomic operations.
> > This is an entirely untested theory (and patch), but can you see if this
> > fixes the issue you are seeing:
>
> Well, the branch-likely thing really only affects a specific revision of the
> R10000 processors. Later R10000 revisions (3.1+?) and R12000-R16000
> shouldn't be affected. I've been playing with disabling that specific
> workaround on my Octane's kernel and haven't seen any ill effects yet.
> Though, I haven't tried rebuilding the userland w/ -mno-fix-r10000 just yet.
>
> If you want, you can take a look at some of the additional info in the
> corresponding Gentoo bug that tracks PR61538:
>
> https://bugs.gentoo.org/show_bug.cgi?id=516548
>
> I have a gdb run (comment #5) stepping through several instructions in
> __lll_lock_wait_private, including register values as each instruction
> executes. The hang happens after taking the futex syscall: t0-t3 get set to
> 0x0, and the following "ll v0,0(s0)" is what hangs. In gcc-4.7 and earlier,
> that 'll' is actually "li v0,2", though control never passes into
> __lll_lock_wait_private in the first place.
>
> There's also a PNG attached to that bug of the disassembled asm in WinMerge
> that shows which insns actually changed. Someone who understands MIPS asm
> ordering might be able to make something of that.
>
>
> > @@ -13014,7 +13023,8 @@ mips_process_sync_loop (rtx insn, rtx *operands)
> > mips_multi_copy_insn (tmp3_insn);
> > mips_multi_set_operand (mips_multi_last_index (), 0, newval);
> > }
> > - else if (!(required_oldval && cmp))
> > + else if (!(required_oldval && cmp)
> > + || mips_branch_likely)
> > mips_multi_add_insn ("nop", NULL);
> >
> > /* CMP = 1 -- either standalone or in a delay slot. */
> >
> > I suspect I can weave that in more naturally but can you tell me if that
> > fixes the problem first.
>
> Testing a fix takes about 7.5hrs to rebuild gcc, plus another 3.5 to rebuild
> glibc. So I am a bit hesitant to task the machine with that w/o a better
> idea of whether it solves the problem. Technically, shouldn't passing
> -mno-fix-r10000 have a similar effect by causing branch-likely insns to not
> get emitted at all?
>
> Thanks!,
>
> --
> Joshua Kinard
> Gentoo/MIPS
> kumba@gentoo.org
> 4096R/D25D95E3 2011-03-28
>
> "The past tempts us, the present confuses us, the future frightens us. And
> our lives slip away, moment by moment, lost in that vast, terrible in-
> between."
>
> --Emperor Turhan, Centauri Republic