[prev in list] [next in list] [prev in thread] [next in thread] 

List:       freebsd-arm
Subject:    Re: arm SMP fix
From:       Wojciech Macek <wma () semihalf ! com>
Date:       2013-11-29 14:11:32
Message-ID: CANsEV8ffGzq0P8BfyahK=w-b6G68Y-d8U01SseaPEGadH+dO8A () mail ! gmail ! com
[Download RAW message or body]

Hi Olivier,

Regarding your SMP fix, I've run some more tests. I've also attached some
debugs which could detect misbehavior of pcpu, and none of them ever fired.
I believe that at least with this issue, your patch is working and helped a
lot. However, we still have problems with stability on cortex-a15.

We've spent some time debugging and isolated problem a little bit, but any
comments are appreciated.

One of the configurations we have, was causing very interesting,
100%-reproducible crash at the end of kernel boot sequence. Inside
pmap_enter_locked, function cpu_icache_sync_range caused vm_fault exception
and lock congestion in pmap_fault_fixup. We've added some debugs and get
following logs (please see attached file "a15_rwlock_recursing.txt").
According to ktrs, pmap_enter_locked on cpu2 is mapping va=0x8000 to
pa=0x215a. After new pte is stored, icache_sync_range is executed. But
somehow, MMU has got wrong mappings, so generates fault, lock congestion
and panic. This can be easily debugged after entering kdb. First, cp15
pa-decoder sees 0x8000 as invalid (0x0000000f value). But just after
invalidating TLBs (other tests showed that invalidating only SE 0x8000 is
enough), MMU starts to decode the right address. So, everything suggests
that TLB cache holded a wrong entry.

We discovered, that adding tlb-invalidates inside pmap reduces reproduction
rate. For now, we are testing very nasty workaround which does tlb_flushID
on every PTE_SYNC. Although it seems to help (all tests are passing), there
is a lot work to be done to find a fix. For now, all I can tell is, that
pmap (probably?) lacks tlb_flushSE somewhere inside.

Regards,
Wojtek


2013/11/26 Wojciech Macek <wma@semihalf.com>

> Not realy... I'm only able to boot the 4-core Corex-A15, run world from
> SATA and perform some basic operations, including multithreaded
> compilation, if only takes less than ~5 minutes. Any more intensive stuff
> (creating/destroying a lot of processes at the same time, to be precise)
> results in kernel panic in pmap. We are trying to debug this issue, but
> have not found anything interesting so far.
>
> Regards,
> Wojtek
>
>
> 2013/11/26 Olivier Houchard <cognet@ci0.org>
>
>> On Tue, Nov 26, 2013 at 10:36:02AM +0100, Wojciech Macek wrote:
>> > Hi,
>> >
>> > Applying the patch is causing a failure on systems with more than 2
>> cores.
>> > get_pcpu() macro is masking cpuid with incorrect value, since the only
>> last
>> > two bits of cp15 register contain the right value. Simple fix attached
>> > below.
>> >
>> > Regards,
>> > Wojtek
>>
>> Hi Wojtek,
>>
>> Oops that's true, there I meant 0xf and not 0x15. Does that mean that
>> with your
>> fix, you get it to work ?
>>
>> Regards,
>>
>> Olivier
>>
>
>

["a15_rwlock_recursing.txt" (text/plain)]

ahci0: Forcing PI to 4 ports (mask = f)
ahci0: AHCI v1.30 with 4 6Gbps ports, Port Multiplier supported with FBS
ahci0: quirks=0x2000
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
Timecounters tick every 10.000 msec
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <ST380815AS 3.AAC> ATA-7 SATA 1.x device
ada0: Serial Number 6QZ1FV7G
ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 76319MB (156301488 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
random: unblocking device.
Release APs
0wap zone Ternytirnige st or emdouucnetd  rforootm  f1r8o8m1 1u9f st:o/ d4e8v9/2m.d
  []...
warning: no time-of-day clock registered, system time will not be set accurately
panic: __rw_wlock_hard: recursing but non-recursive rw pmap pv global @ \
/home/wma/git/undiscloseddevice/sys/arm/arm/pmap-v6.c:1440

cpuid = 2
KDB: enter: panic
[ thread pid 1 tid 100001 ]
Stopped at      kdb_enter+0x4c: ldrb    r15, [r15, r15, ror r15]!
db> show cp15
Cpu ID: 0x412fc0f3
Current Cache Lvl ID: 0x8444c004
Ctrl: 0x00c5387f
Aux Ctrl: 0x80000000
Processor Feat 0: 0x00001131
Processor Feat 1: 0x00011011
Debug Feat 0: 0x02010555
Auxiliary Feat 0: 0x00000000
Memory Model Feat 0: 0x10201105
Memory Model Feat 1: 0x20000000
Memory Model Feat 2: 0x01240000
Memory Model Feat 3: 0x02102211
TTB0: 0x012e806a
db> bt
Tracing pid 1 tid 100001 td 0xcab15000
db_trace_self() at db_trace_self
         pc = 0xc058361c  lr = 0xc02438b8 (db_hex2dec+0x4e8)
         sp = 0xdeeff388  fp = 0xdeeff3a0
        r10 = 0xc06812e0
db_hex2dec() at db_hex2dec+0x4e8
         pc = 0xc02438b8  lr = 0xc0243218 (db_command_loop+0x2f0)
         sp = 0xdeeff3a8  fp = 0xdeeff448
         r4 = 0x00000000  r5 = 0x00000000
         r6 = 0xc05ec92a
db_command_loop() at db_command_loop+0x2f0
         pc = 0xc0243218  lr = 0xc0242f88 (db_command_loop+0x60)
         sp = 0xdeeff450  fp = 0xdeeff460
         r4 = 0xc05c98c7  r5 = 0xc05e5a82
         r6 = 0xc12d02f4  r7 = 0xc12c6050
         r8 = 0xdeeff648  r9 = 0xc0681550
        r10 = 0xc12c6054
db_command_loop() at db_command_loop+0x60
         pc = 0xc0242f88  lr = 0xc0245994 (X_db_symbol_values+0x254)
         sp = 0xdeeff468  fp = 0xdeeff588
         r4 = 0x00000000  r5 = 0xdeeff470
         r6 = 0xc12c6080
X_db_symbol_values() at X_db_symbol_values+0x254
         pc = 0xc0245994  lr = 0xc03f826c (kdb_trap+0x170)
         sp = 0xdeeff590  fp = 0xdeeff5b8
         r4 = 0x00000000  r5 = 0x00000001
         r6 = 0xc12c6080  r7 = 0xc12c6050
kdb_trap() at kdb_trap+0x170
         pc = 0xc03f826c  lr = 0xc0598bf8 (undefinedinstruction+0x2b4)
         sp = 0xdeeff5c0  fp = 0xdeeff640
         r4 = 0x00000000  r5 = 0xc05988a8
         r6 = 0x00000000  r7 = 0xe7ffffff
         r8 = 0xcab15000  r9 = 0xdeeff648
        r10 = 0xc03f79c4
undefinedinstruction() at undefinedinstruction+0x2b4
         pc = 0xc0598bf8  lr = 0xc05852ac (exception_exit)
         sp = 0xdeeff648  fp = 0xdeeff6a0
         r4 = 0xc05e5adc  r5 = 0x00000001
         r6 = 0xc12b2720  r7 = 0xc12b28c8
         r8 = 0xcab15000  r9 = 0xc12dff80
        r10 = 0xdeeff70c
exception_exit() at exception_exit
         pc = 0xc05852ac  lr = 0xc03f79b8 (kdb_enter+0x40)
         sp = 0xdeeff69c  fp = 0xdeeff6a0
         r0 = 0xc12c6064  r1 = 0x00000000
         r2 = 0xc05e9925  r3 = 0x000000ab
         r4 = 0xc05e5adc  r5 = 0x00000001
         r6 = 0xc12b2720  r7 = 0xc12b28c8
         r8 = 0xcab15000  r9 = 0xc12dff80
        r10 = 0xdeeff70c r12 = 0x00000000
kdb_enter() at kdb_enter+0x50
         pc = 0xc03f79c8  lr = 0xc03bf120 (kassert_panic+0x2a8)
         sp = 0xdeeff6a8  fp = 0xdeeff6c8
         r4 = 0x00000100
kassert_panic() at kassert_panic+0x2a8
         pc = 0xc03bf120  lr = 0xc03befd8 (kassert_panic+0x160)
         sp = 0xdeeff6d0  fp = 0xdeeff700
         r4 = 0xc12b27c8  r5 = 0xc05e522d
         r6 = 0xdeeff70c  r7 = 0xc12b2720
         r8 = 0xc12cf498  r9 = 0xc12cf488
        r10 = 0xcab15000
kassert_panic() at kassert_panic+0x160
         pc = 0xc03befd8  lr = 0xc03bc9ac (__rw_wlock_hard+0x30c)
         sp = 0xdeeff718  fp = 0xdeeff748
         r4 = 0x000005a0  r5 = 0xcab15000
         r6 = 0xc12cf498  r7 = 0xc12cf498
__rw_wlock_hard() at __rw_wlock_hard+0x30c
         pc = 0xc03bc9ac  lr = 0xc03bc648 (_rw_wlock_cookie+0xd0)
         sp = 0xdeeff750  fp = 0xdeeff778
         r4 = 0x000005a0  r5 = 0xc060cbb5
         r6 = 0xc12cf498  r7 = 0xc12cf498
         r8 = 0x00000000  r9 = 0x00000001
        r10 = 0xcab15000
_rw_wlock_cookie() at _rw_wlock_cookie+0xd0
         pc = 0xc03bc648  lr = 0xc058a710 (pmap_fault_fixup+0x34)
         sp = 0xdeeff780  fp = 0xdeeff7b0
         r4 = 0xc12cf420  r5 = 0x00000001
         r6 = 0x00008000  r7 = 0xc13fcabc
pmap_fault_fixup() at pmap_fault_fixup+0x34
         pc = 0xc058a710  lr = 0xc0597518 (data_abort_handler+0x304)
         sp = 0xdeeff7b8  fp = 0xdeeff880
         r4 = 0x00000007  r5 = 0x00000001
         r6 = 0x00002807  r7 = 0x00000013
         r8 = 0xdeeff888  r9 = 0xc13fca00
        r10 = 0xcab15000
data_abort_handler() at data_abort_handler+0x304
         pc = 0xc0597518  lr = 0xc05852ac (exception_exit)
         sp = 0xdeeff888  fp = 0xdeeff948
         r4 = 0xcb6b3004  r5 = 0x00008000
         r6 = 0xc0581f2c  r7 = 0xc12e8000
         r8 = 0xc13fcabc  r9 = 0xc13fcabc
        r10 = 0x0215a67e
exception_exit() at exception_exit
         pc = 0xc05852ac  lr = 0xc058fa20 (pmap_enter+0x9fc)
         sp = 0xdeeff8dc  fp = 0xdeeff948
         r0 = 0x00008000  r1 = 0x00001000
         r2 = 0xc059d638  r3 = 0x0000002f
         r4 = 0xcb6b3004  r5 = 0x00008000
         r6 = 0xc0581f2c  r7 = 0xc12e8000
         r8 = 0xc13fcabc  r9 = 0xc13fcabc
        r10 = 0x0215a67e r12 = 0x00000040
armv7_icache_sync_range() at armv7_icache_sync_range+0x8
         pc = 0xc059d640  lr = 0xc058fa20 (pmap_enter+0x9fc)
         sp = 0xdeeff8dc  fp = 0xdeeff948
Unwind failure (no registers changed)
db> show ktr
6264 (0xcab15000:cpu2): data abort: fault address=0x8000 (from pc=0xc059d640 \
lr=0xc058fa20)

6263 (0xcab15000:cpu2): data pmap_enter_locked beforei cache: pmap = 0xc13fcabc, va = \
0x8000, npte = 215a67e, opte = 0, ptep = 0xcb6b8020 6262 (0xcab15000:cpu2): WM_ADDED \
[pmap_get_pv_entry] m=0xc1544480 pc=0xc4600000 6261 (0xcab15000:cpu2): WM_ADDED \
[pmap_free_pv_chunk] m=0xc1543a00 pc=0xc4600000 6260 (0xcab14b40:cpu0): mi_switch: \
new thread 100002 (td_sched 0xcab14eb8, pid 10, idle: cpu0) 6259 (0xcab14b40:cpu0): \
KTRGRAPH group:"thread", id:"idle: cpu0 tid 100002", state:"running", attributes: \
prio:255 6258 (0xcab14b40:cpu0): WM_ADDED \
[/home/wma/git/undiscloseddevice/sys/kern/sched_4bsd.c:1091]: oncpu = 0, cp15id = 0 \
6257 (0xcab12000:cpu0): WM_ADDED \
[/home/wma/git/undiscloseddevice/sys/kern/sched_4bsd.c:992]: oncpu = 255, cp15id = 0 \
6256 (0xcab12000:cpu0): KTRGRAPH group:"load", id:"global load", counter:1, \
attributes: none 6255 (0xcab14000:cpu3): mi_switch: new thread 100005 (td_sched \
0xcab14378, pid 10, idle: cpu3) 6254 (0xcab14000:cpu3): KTRGRAPH group:"thread", \
id:"idle: cpu3 tid 100005", state:"running", attributes: prio:255 6253 \
(0xcab14000:cpu3): WM_ADDED \
[/home/wma/git/undiscloseddevice/sys/kern/sched_4bsd.c:1091]: oncpu = 3, cp15id = 3 \
6252 (0xcab14000:cpu3): WM_ADDED \
[/home/wma/git/undiscloseddevice/sys/kern/sched_4bsd.c:992]: oncpu = 255, cp15id = 3 \
6251 (0xcab14000:cpu3): KTRGRAPH group:"thread", id:"idle: cpu3 tid 100005", \
state:"idle", attributes: prio:255 6250 (0xcab12000:cpu0): KTRGRAPH group:"thread", \
id:"g_up tid 100013", state:"sleep", attributes: prio:92, wmesg:"-", \
lockname:"(null)" 6249 (0xcab14000:cpu3): mi_switch: old thread 100005 (td_sched \
0xcab14378, pid 10, idle: cpu3) 6248 (0xcab12000:cpu0): mi_switch: old thread 100013 \
(td_sched 0xcab12378, pid 12, g_up) 6247 (0xcab12000:cpu0): KTRGRAPH group:"thread", \
id:"g_up tid 100013", point:"priority change", attributes: prio:92, new prio:92, \
linkedto:"g_up tid 100013" 6246 (0xcab14780:cpu1): mi_switch: new thread 100003 \
(td_sched 0xcab14af8, pid 10, idle: cpu1) 6245 (0xcab14780:cpu1): KTRGRAPH \
group:"thread", id:"idle: cpu1 tid 100003", state:"running", attributes: prio:255 \
6244 (0xcab14780:cpu1): WM_ADDED \
[/home/wma/git/undiscloseddevice/sys/kern/sched_4bsd.c:1091]: oncpu = 1, cp15id = 1 \
6243 (0xcab14780:cpu1): WM_ADDED \
[/home/wma/git/undiscloseddevice/sys/kern/sched_4bsd.c:992]: oncpu = 255, cp15id = 1 \
6242 (0xcab14780:cpu1): KTRGRAPH group:"thread", id:"idle: cpu1 tid 100003", \
state:"idle", attributes: prio:255 6241 (0xcab14780:cpu1): mi_switch: old thread \
100003 (td_sched 0xcab14af8, pid 10, idle: cpu1) 6240 (0xcab15000:cpu2): sleepq \
resume: thread 0xcab15000 (pid 1, kernel) 6239 (0xcab15000:cpu2): mi_switch: new \
thread 100001 (td_sched 0xcab15378, pid 1, kernel) 6238 (0xcab15000:cpu2): KTRGRAPH \
group:"thread", id:"kernel tid 100001", state:"running", attributes: prio:84 6237 \
(0xcab15000:cpu2): WM_ADDED \
[/home/wma/git/undiscloseddevice/sys/kern/sched_4bsd.c:1091]: oncpu = 2, cp15id = 2 \
6236 (0xcab143c0:cpu2): WM_ADDED \
[/home/wma/git/undiscloseddevice/sys/kern/sched_4bsd.c:992]: oncpu = 255, cp15id = 2 \
6235 (0xcab143c0:cpu2): KTRGRAPH group:"thread", id:"idle: cpu2 tid 100004", \
state:"idle", attributes: prio:255 6234 (0xcab12000:cpu0): sleep: thread 100013 (pid \
12, g_up) on - (0xc12d0fa0) 6233 (0xcab143c0:cpu2): mi_switch: old thread 100004 \
(td_sched 0xcab14738, pid 10, idle: cpu2) 6232 (0xcab12000:cpu0): KTRGRAPH \
group:"load", id:"global load", counter:2, attributes: none 6231 (0xcab14000:cpu3): \
mi_switch: new thread 100005 (td_sched 0xcab14378, pid 10, idle: cpu3) 6230 \
(0xcab14000:cpu3): KTRGRAPH group:"thread", id:"idle: cpu3 tid 100005", \
state:"running", attributes: prio:255 6229 (0xcab14000:cpu3): WM_ADDED \
[/home/wma/git/undiscloseddevice/sys/kern/sched_4bsd.c:1091]: oncpu = 3, cp15id = 3 \
6228 (0xcab12000:cpu0): KTRGRAPH group:"thread", id:"g_up tid 100013", \
point:"wokeup", attributes: linkedto:"kernel tid 100001" 6227 (0xcab12000:cpu0): \
KTRGRAPH group:"thread", id:"kernel tid 100001", state:"runq add", attributes: \
prio:84, linkedto:"g_up tid 100013" 6226 (0xcab12000:cpu0): sleepq_wakeup: thread \
0xcab15000 (pid 1, kernel) 6225 (0xcab12000:cpu0): sleepq_broadcast(0xdef18678, 0)
6224 (0xcb61f780:cpu3): WM_ADDED \
[/home/wma/git/undiscloseddevice/sys/kern/sched_4bsd.c:992]: oncpu = 255, cp15id = 3 \
6223 (0xcb61f780:cpu3): KTRGRAPH group:"load", id:"global load", counter:1, \
attributes: none 6222 (0xcb61f780:cpu3): KTRGRAPH group:"thread", id:"md0 tid \
100027", state:"sleep", attributes: prio:92, wmesg:"mdwait", lockname:"(null)" 6221 \
(0xcb61f780:cpu3): mi_switch: old thread 100027 (td_sched 0xcb61faf8, pid 3, md0) \
6220 (0xcb61f780:cpu3): KTRGRAPH group:"thread", id:"md0 tid 100027", point:"priority \
change", attributes: prio:92, new prio:92, linkedto:"md0 tid 100027" 6219 \
(0xcab14780:cpu1): mi_switch: new thread 100003 (td_sched 0xcab14af8, pid 10, idle: \
cpu1) 6218 (0xcab14780:cpu1): KTRGRAPH group:"thread", id:"idle: cpu1 tid 100003", \
state:"running", attributes: prio:255 6217 (0xcab14780:cpu1): WM_ADDED \
[/home/wma/git/undiscloseddevice/sys/kern/sched_4bsd.c:1091]: oncpu = 1, cp15id = 1 \
6216 (0xcab14780:cpu1): WM_ADDED \
[/home/wma/git/undiscloseddevice/sys/kern/sched_4bsd.c:992]: oncpu = 255, cp15id = 1 \
6215 (0xcab14780:cpu1): KTRGRAPH group:"thread", id:"idle: cpu1 tid 100003", \
state:"idle", attributes: prio:255 6214 (0xcab14780:cpu1): mi_switch: old thread \
100003 (td_sched 0xcab14af8, pid 10, idle: cpu1) 6213 (0xcab12000:cpu0): sleepq \
resume: thread 0xcab12000 (pid 12, g_up) 6212 (0xcab12000:cpu0): mi_switch: new \
thread 100013 (td_sched 0xcab12378, pid 12, g_up) 6211 (0xcab12000:cpu0): KTRGRAPH \
group:"thread", id:"g_up tid 100013", state:"running", attributes: prio:92 6210 \
(0xcab12000:cpu0): WM_ADDED \
[/home/wma/git/undiscloseddevice/sys/kern/sched_4bsd.c:1091]: oncpu = 0, cp15id = 0 \
6209 (0xcab14b40:cpu0): WM_ADDED \
[/home/wma/git/undiscloseddevice/sys/kern/sched_4bsd.c:992]: oncpu = 255, cp15id = 0 \
6208 (0xcab14b40:cpu0): KTRGRAPH group:"thread", id:"idle: cpu0 tid 100002", \
state:"idle", attributes: prio:255 6207 (0xcab14b40:cpu0): mi_switch: old thread \
100002 (td_sched 0xcab14eb8, pid 10, idle: cpu0) 6206 (0xcb61f780:cpu3): sleep: \
thread 100027 (pid 3, md0) on mdwait (0xcb618000) db> show vtop 0x8000
Physical address reg: 0x0000000f
db> show cp15
Cpu ID: 0x412fc0f3
Current Cache Lvl ID: 0x8444c004
Ctrl: 0x00c5387f
Aux Ctrl: 0x80000000
Processor Feat 0: 0x00001131
Processor Feat 1: 0x00011011
Debug Feat 0: 0x02010555
Auxiliary Feat 0: 0x00000000
Memory Model Feat 0: 0x10201105
Memory Model Feat 1: 0x20000000
Memory Model Feat 2: 0x01240000
Memory Model Feat 3: 0x02102211
TTB0: 0x012e806a
db> show proc 1
Process 1 (kernel) at 0xcab11000:
 state: NORMAL
 uid: 0  gids: 0
 parent: pid 0 at 0xc12d0fc8
 ABI: FreeBSD ELF32
 pmap = 0xc13fcabc
 threads: 1
100001                   Run     CPU 2               [kernel] TTB=0x12e8000 DACR=0x5
db> show vtop 0x8000
Physical address reg: 0x0000000f
db> armv7_tlb_flushID
No such command
db> call armv7_tlb_flushID
= 0
db> show vtop 0x8000      
Physical address reg: 0x0215a2d4
db> 



_______________________________________________
freebsd-arm@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-arm
To unsubscribe, send any mail to "freebsd-arm-unsubscribe@freebsd.org"

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic