
List:       freebsd-stable
Subject:    Re: Ryzen issues on FreeBSD ?
From:       Mike Tancsa <mike () sentex ! net>
Date:       2018-01-30 19:51:36
Message-ID: 5e48bbc2-e872-46bd-eece-25acbb180f77 () sentex ! net

On 1/28/2018 7:41 PM, Don Lewis wrote:
> 
> My suspicion is a FreeBSD bug, probably a locking / race issue.  I know
> that we've had to make some tweaks to our code for AMD CPUs, like this:


OK, I got the CPUs back from AMD (fast turnaround!)

And sadly, I am still able to hang the compile in about the same place.
However, if I set

hw.lower_amd64_sharedpage=0

it seems to hang in a different way. CTRL+t shows

load: 0.43  cmd: python2.7 15736 [umtxn] 165.00r 14.46u 6.65s 0% 233600k
make[1]: Working in: /usr/ports/net/samba47
make: Working in: /usr/ports/net/samba47


# procstat -t 15736
  PID    TID COMM                TDNAME              CPU  PRI STATE   WCHAN
15736 100855 python2.7           -                    -1  152 sleep   usem
15736 100956 python2.7           -                    -1  124 sleep   umtxn
15736 100957 python2.7           -                    -1  126 sleep   umtxn
15736 100958 python2.7           -                    -1  124 sleep   umtxn
15736 100959 python2.7           -                    -1  127 sleep   umtxn
15736 100960 python2.7           -                    -1  126 sleep   umtxn
15736 100961 python2.7           -                    -1  126 sleep   umtxn
15736 100962 python2.7           -                    -1  126 sleep   umtxn
15736 100963 python2.7           -                    -1  126 sleep   umtxn
15736 100964 python2.7           -                    -1  127 sleep   umtxn
15736 100965 python2.7           -                    -1  126 sleep   umtxn
15736 100966 python2.7           -                    -1  126 sleep   umtxn
15736 100967 python2.7           -                    -1  126 sleep   umtxn

 # procstat -kk 15736
  PID    TID COMM                TDNAME              KSTACK

15736 100855 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b
amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100956 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48
amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100957 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48
amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100958 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48
amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100959 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48
amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100960 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48
amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100961 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48
amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100962 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48
amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100963 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48
amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100964 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48
amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100965 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48
amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100966 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48
amd64_syscall+0xa48 fast_syscall_common+0xfc
15736 100967 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48
amd64_syscall+0xa48 fast_syscall_common+0xfc

If I kill the make, reboot, and just type make, the build completes.
If I instead do an rm -R work after the reboot, it hangs again.

With the default of
hw.lower_amd64_sharedpage: 1
post reboot, CTRL+T shows
load: 2.73  cmd: python2.7 15703 [usem] 40.92r 12.34u 3.45s 0% 233640k
make[1]: Working in: /usr/ports/net/samba47
make: Working in: /usr/ports/net/samba47



root@amdtestr12:/home/mdtancsa # procstat -kk 15703
  PID    TID COMM                TDNAME              KSTACK

15703 100824 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b
amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100956 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b
amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100957 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b
amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100958 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b
amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100959 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b
amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100960 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b
amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100961 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b
amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100962 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b
amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100963 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b
amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100964 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b
amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100965 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_lock_umutex+0x885 __umtx_op_wait_umutex+0x48
amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100966 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b
amd64_syscall+0xa48 fast_syscall_common+0xfc
15703 100967 python2.7           -                   mi_switch+0xf5
sleepq_catch_signals+0x405 sleepq_wait_sig+0xf _sleep+0x231
umtxq_sleep+0x143 do_sem2_wait+0x68a __umtx_op_sem2_wait+0x4b
amd64_syscall+0xa48 fast_syscall_common+0xfc
root@amdtestr12:/home/mdtancsa # procstat -t 15703
  PID    TID COMM                TDNAME              CPU  PRI STATE   WCHAN
15703 100824 python2.7           -                    -1  152 sleep   usem
15703 100956 python2.7           -                    -1  125 sleep   usem
15703 100957 python2.7           -                    -1  127 sleep   usem
15703 100958 python2.7           -                    -1  125 sleep   usem
15703 100959 python2.7           -                    -1  125 sleep   usem
15703 100960 python2.7           -                    -1  126 sleep   usem
15703 100961 python2.7           -                    -1  126 sleep   usem
15703 100962 python2.7           -                    -1  126 sleep   usem
15703 100963 python2.7           -                    -1  126 sleep   usem
15703 100964 python2.7           -                    -1  126 sleep   usem
15703 100965 python2.7           -                    -1  126 sleep   umtxn
15703 100966 python2.7           -                    -1  126 sleep   usem
15703 100967 python2.7           -                    -1  125 sleep   usem
root@amdtestr12:/home/mdtancsa #
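
To compare the two hangs at a glance, the WCHAN column of the procstat -t
output can be tallied with a short script. This is only a rough sketch and
not part of procstat: the function name count_wchans and the sample text
are made up for illustration, and it assumes the column layout shown above
(PID TID COMM TDNAME CPU PRI STATE WCHAN), with WCHAN possibly wrapped onto
its own line by the mail client.

```python
from collections import Counter


def count_wchans(text):
    """Tally wait channels from `procstat -t` output.

    Assumes rows of the form PID TID COMM TDNAME CPU PRI STATE [WCHAN];
    a WCHAN wrapped onto the following line is also accepted.
    """
    counts = Counter()
    pending = False  # True after a thread row that lacked its WCHAN column
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].isdigit() and len(fields) >= 7:
            if len(fields) >= 8:
                counts[fields[7]] += 1   # WCHAN is on the same line
                pending = False
            else:
                pending = True           # WCHAN expected on the next line
        elif pending and len(fields) == 1:
            counts[fields[0]] += 1       # wrapped WCHAN value
            pending = False
    return counts


# Hypothetical sample mixing wrapped and unwrapped rows.
sample = """\
  PID    TID COMM                TDNAME              CPU  PRI STATE
WCHAN
15736 100855 python2.7           -                    -1  152 sleep
usem
15736 100956 python2.7           -                    -1  124 sleep
umtxn
15736 100957 python2.7           -                    -1  126 sleep   umtxn
"""

if __name__ == "__main__":
    for wchan, n in count_wchans(sample).most_common():
        print(wchan, n)
```

Run against the two dumps above, this gives 12 threads in umtxn and 1 in
usem with hw.lower_amd64_sharedpage=0, and 12 in usem and 1 in umtxn with
the default of 1.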


	---Mike


> 
> ------------------------------------------------------------------------
> r321608 | kib | 2017-07-27 01:37:07 -0700 (Thu, 27 Jul 2017) | 9 lines
> 
> Use MFENCE to serialize RDTSC on non-Intel CPUs.
> 
> Kernel already used the stronger barrier instruction for AMDs, correct
> the userspace fast gettimeofday() implementation as well.
> 
> 
> 
> I did go back and look at the build runaways that I've occasionally seen
> on my AMD FX-8320E package builder.  I haven't seen the python issue
> there, but have seen gmake get stuck in a sleeping state with a bunch of
> zombie offspring.
> 
> 


-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike@sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada
_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"