List:       freebsd-hackers
Subject:    Re: Help needed to identify golang fork / memory corruption issue on FreeBSD
From:       Steven Hartland <killing () multiplay ! co ! uk>
Date:       2017-03-28 22:50:37
Message-ID: 2f5952b5-104d-4e82-5ebe-8cb9caabedae () multiplay ! co ! uk

On 28/03/2017 12:38, Konstantin Belousov wrote:
> On Tue, Mar 28, 2017 at 09:48:23AM +0100, Steven Hartland wrote:
>> On 28/03/2017 09:38, Konstantin Belousov wrote:
>>> On Tue, Mar 28, 2017 at 09:23:24AM +0100, Steven Hartland wrote:
>>>> As I stopped the panic before that I couldn't tell so I've re-run with
>>>> some debug added just before the panic to capture the addresses of the
>>>> workbuf structure that the issue was detected in, here goes (parent:
>>>> 62620, child: 98756):
>>>>
>>>> workbuf: 0x800b51800
>>>> fatal error: workbuf is not empty
>>>> workbuf: 0x800a72000
>>>> fatal error: workbuf is empty
>>>> workbuf: 0x800a72000
>>>> fatal error: workbuf is not empty
>>> I do not understand.  Why do you show several addresses ?  Wouldn't the
>>> runtime panic after detecting the discrepancy, so there could be only one
>>> address ?
>> There are several goroutines (threads) running each detected an error,
>> as I'm blocking the panic by entering a sleep in the faulting goroutine
>> to enable the capture of procstat, other routines continue and detect an
>> error too.
> Ok.
>
> So I tried to simulate the load with an isolated test. Code below is
> naive, but it should illustrate the idea. Parent allocates some
> number of private-mapped areas, then runs threads which write bytes into
> the areas. Simultaneously parent forks children which write distinct
> byte into the same anonymous memory.
>
> Parent checks that it cannot see a byte written by children.
>
> So far it did not tripped on my test machine.  Feel free to play with it,
> if you have more insights what go runtime does, modify the code to simulate
> the failing test more accurately.
I've updated it to be more like what Go does: a single forking thread
(non-main), with ancillary threads mainly idle until triggered by the
forking thread to perform a check. Still no failure.

What's curious is why I don't get the issue if either:
1. The machine has just a single core.
2. The work (GC) is moved after the child wait.
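
In Go terms, the two orderings in point 2 look roughly like the sketch
below. It is an illustration only, not the actual failing test: forkLoop
and its parameters are made-up names, and the "true" command is just a
stand-in child process.

```go
package main

import (
	"fmt"
	"os/exec"
	"runtime"
)

// forkLoop starts the named command n times (hypothetical helper).
// With gcBeforeWait set, the GC runs while the child may still exist;
// otherwise the GC runs only after Wait has returned.
func forkLoop(name string, n int, gcBeforeWait bool) error {
	for i := 0; i < n; i++ {
		cmd := exec.Command(name)
		if err := cmd.Start(); err != nil {
			return err
		}
		if gcBeforeWait {
			runtime.GC() // GC work overlaps the child's lifetime
		}
		if err := cmd.Wait(); err != nil {
			return err
		}
		if !gcBeforeWait {
			runtime.GC() // GC work moved after the child wait
		}
	}
	return nil
}

func main() {
	// "true" is assumed to be in PATH; any short-lived command would do.
	fmt.Println("gc before wait:", forkLoop("true", 100, true))
	fmt.Println("gc after wait: ", forkLoop("true", 100, false))
}
```

The first call corresponds to the ordering that shows the problem for me,
the second to the ordering that does not.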

Given the above I added some debug:
func (b *workbuf) checknonempty() {
    if b.nobj == 0 {
        print("workbuf is empty: b: ", b, ", nobj: ", b.nobj,
            ", nobj2: ", b.nobj2, ", pushcnt: ", b.node.pushcnt, "\n")
        throw("workbuf is empty")
    }
}

func (b *workbuf) checkempty() {
    if b.nobj != 0 {
        print("workbuf is not empty: b: ", b, ", nobj: ", b.nobj,
            ", nobj2: ", b.nobj2, ", pushcnt: ", b.node.pushcnt, "\n")
        throw("workbuf is not empty")
    }
}

Here's the output:
workbuf is not empty: b: 0x800c51000, nobj: 4, nobj2: -2, pushcnt: 104881
fatal error: workbuf is not empty

Nothing strange, but now let's have a look using gdb after the parent has
exited:
(gdb) frame 8
#8  0x000000000041f1e8 in runtime.(*workbuf).checkempty (b=0x800c51000) at /usr/local/go/src/runtime/mgcwork.go:328
328                     throw("workbuf is not empty")
(gdb) print b
$3 = (struct runtime.workbuf *) 0x800c51000
(gdb) print *b
$4 = {runtime.workbufhdr = {node = {next = 0, pushcnt = 104881}, nobj = 0, nobj2 = -8},....

So after the error was printed, the value of nobj was somehow
corrected. However, nobj2 being -8 indicates that the last call which
altered nobj was func (w *gcWork) get() uintptr, whereas the -2 printed
earlier indicates it was a putfull, which is very muddled up.
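
For anyone following along, the nobj2 debug field works roughly as in the
sketch below: every code path that changes nobj also stores its own tag,
so the tag names the last writer. This is a minimal standalone sketch, not
the actual runtime patch. The type buf, setNobj, lastWriter and the
constant names are made up for illustration; only the values -2 and -8
come from the output above.

```go
package main

import "fmt"

// Hypothetical tag names; only the values -2 and -8 appear in the
// debug output, the names and mapping here are illustrative.
const (
	tagPutFull = -2
	tagGet     = -8
)

// buf stands in for the runtime's workbuf header; it is not the real type.
type buf struct {
	nobj  int // number of queued objects
	nobj2 int // debug only: tag of the last code path that changed nobj
}

// setNobj is the single place where nobj changes, so nobj2 always
// identifies the most recent writer.
func (b *buf) setNobj(n, tag int) {
	b.nobj = n
	b.nobj2 = tag
}

// lastWriter decodes the tag back into a code path name.
func lastWriter(b *buf) string {
	switch b.nobj2 {
	case tagPutFull:
		return "putfull"
	case tagGet:
		return "get"
	default:
		return "unknown"
	}
}

func main() {
	b := &buf{}
	b.setNobj(4, tagPutFull)
	fmt.Println(b.nobj, lastWriter(b)) // prints "4 putfull"
	b.setNobj(0, tagGet)
	fmt.Println(b.nobj, lastWriter(b)) // prints "0 get"
}
```

With that scheme, a single buffer showing nobj: 4 with tag -2 at print
time and nobj: 0 with tag -8 later means a second writer touched it in
between.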

I was curious what the child had at 0x800c51000, but I couldn't persuade
gdb to cast the address and print it as a struct runtime.workbuf.


     Regards
     Steve


_______________________________________________
freebsd-hackers@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@freebsd.org"