List:       lustre-discuss
Subject:    [Lustre-discuss] crashes during dd
From:       efocht () hpce ! nec ! com (Erich Focht)
Date:       2007-08-22 10:36:08
Message-ID: 200708221836.40521.efocht () hpce ! nec ! com

Hi,

Can anybody read these messages and give me a hint about where to look
for the problem? I hit this LBUG rather easily, triggered by either
(o2iblnd_cb.c:1068:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0) failed
or
(o2iblnd_cb.c:171:kiblnd_get_idle_tx()) ASSERTION(tx->tx_sending == 0) failed

I am using Lustre 1.6.1 as downloaded, on top of RHEL4U5, with o2ib, and I
get this a few times per day while writing huge files with "dd".

Any hint on where to look into this further would be very welcome! More of
the log surrounding the error message is below.

Best regards,
Erich




Lustre: necd3-OST0000-osc-0000010080cd5800: Connection restored to service \
                necd3-OST0000 using nid 192.168.0.27@o2ib.
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status -5, \
                desc 00000100156ba000
LustreError: 6820:0:(events.c:134:client_bulk_callback()) event type 0, status -5, \
                desc 000001001f1f4000
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status -5, \
                desc 00000100a10b0000
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status -5, \
                desc 000001002e3fa000
LustreError: 6820:0:(events.c:134:client_bulk_callback()) event type 0, status -5, \
                desc 0000010066604000
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status -5, \
                desc 0000010070022000
LustreError: 6820:0:(events.c:55:request_out_callback()) @@@ type 4, status -5  \
req@0000010135463800 x1000806/t0 o400->MGS@MGC192.168.0.23@o2ib_0:26 lens 128/128 ref \
                2 fl Rpc:N/0/0 rc 0/-22
LustreError: 6820:0:(events.c:55:request_out_callback()) Skipped 6 previous similar \
                messages
LustreError: 6820:0:(o2iblnd_cb.c:1068:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0) failed
LustreError: 6819:0:(o2iblnd_cb.c:1068:kiblnd_tx_complete()) ASSERTION(tx->tx_sending > 0) failed
LustreError: 6819:0:(tracefile.c:433:libcfs_assertion_failed()) LBUG
Lustre: 6819:0:(linux-debug.c:168:libcfs_debug_dumpstack()) showing stack for process \
6819 kiblnd_sd_00  R  running task       0  6819      1          6824  6820 (L-TLB)
0000000000000000 0000000000000000 ffffffffa028d43d 0000000000000005
       ffffff000006c5a0 0000000000000000 0000000000000005 ffffffffa0288894
       0000000000000000 0000000000000000
Call Trace:<ffffffffa0288894>{:libcfs:libcfs_assertion_failed+84}
       <ffffffffa0404d53>{:ko2iblnd:kiblnd_tx_complete+67}
       <0>LustreError: 6820:0:(tracefile.c:433:libcfs_assertion_failed()) LBUG
kiblnd_sd_01  R  running task       0  6820      1          6819  6821 (L-TLB)
0000000000000000 0000000000000000 ffffffffa028d43d 0000000000000005
       ffffff000006c6d0 0000000000000000 0000000000000005 ffffffffa0288894
       <ffffffff80133741>{__wake_up+54} \
<ffffffffa0409e60>{:ko2iblnd:kiblnd_scheduler+736}  0000000000000000 0000000000000000
Call Trace:<ffffffffa0288894>{:libcfs:libcfs_assertion_failed+84}
       <ffffffffa0404d53>{:ko2iblnd:kiblnd_tx_complete+67}
       <ffffffff8013369a>{default_wake_function+0} <ffffffff80110de3>{child_rip+8}
       <ffffffffa0409b80>{:ko2iblnd:kiblnd_scheduler+0} \
<ffffffff80133741>{__wake_up+54}<ffffffff80110ddb>{child_rip+0}

 <3>LustreError: 6824:0:(client.c:962:ptlrpc_expire_one_request()) @@@ network error \
(sent at 1187792190, 0s ago)  req@0000010135463800 x1000806/t0 \
o400->MGS@MGC192.168.0.23@o2ib_0:26 lens 128/128 ref 1 fl Rpc:N/0/0 rc 0/-22 \
                <ffffffffa0409e60>{:ko2iblnd:kiblnd_scheduler+736}
       <3>LustreError: 6824:0:(client.c:962:ptlrpc_expire_one_request()) Skipped 8 \
                previous similar messages
LustreError: 166-1: MGC192.168.0.23@o2ib: Connection to service MGS via nid \
192.168.0.23@o2ib was lost; in progress operations using this service will fail. \
<ffffffff8013369a>{default_wake_function+0} <1>LustreError: dumping log to \
/tmp/lustre-log.1187792190.6819 <ffffffff80110de3>{child_rip+8}
       <ffffffffa0409b80>{:ko2iblnd:kiblnd_scheduler+0} \
<ffffffff80110ddb>{child_rip+0}

LustreError: dumping log to /tmp/lustre-log.1187792190.6820
LustreError: 2697:0:(events.c:55:request_out_callback()) @@@ type 4, status -113  \
req@0000010080d9a800 x1000808/t0 o400->necd3-OST0000_UUID@192.168.0.27@o2ib:28 lens \
                128/128 ref 2 fl Rpc:N/0/0 rc 0/-22
LustreError: 2697:0:(events.c:55:request_out_callback()) Skipped 1 previous similar \
                message
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with \
                192.168.0.23@o2ib
Lustre: necd3-OST0000-osc-0000010080cd5800: Connection to service necd3-OST0000 via \
nid 192.168.0.27@o2ib was lost; in progress operations using this service will wait \
                for recovery to complete.
Lustre: Skipped 3 previous similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID  \
req@0000010037e16200 x1000867/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref \
                1 fl Rpc:/0/0 rc 0/0
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with \
                192.168.0.23@o2ib
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with \
                192.168.0.27@o2ib
LustreError: 6823:0:(events.c:55:request_out_callback()) @@@ type 4, status -103  \
req@00000100c7e68e00 x1000850/t0 o400->necd3-OST0003_UUID@192.168.0.27@o2ib:28 lens \
                128/128 ref 2 fl Rpc:N/0/0 rc 0/-22
LustreError: 6823:0:(events.c:55:request_out_callback()) Skipped 1 previous similar \
                message
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID  \
req@000001007e3be600 x1000871/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref \
                1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous \
                similar messages
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) @@@ timeout (sent at \
1187792290, 100s ago)  req@00000100c7e68a00 x1000856/t0 \
                o250->MGS@MGC192.168.0.23@o2ib_0:26 lens 304/328 ref 2 fl Rpc:/0/0 rc \
                0/-22
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) Skipped 26 previous \
                similar messages
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Timed out RDMA with \
                192.168.0.27@o2ib
LustreError: 6823:0:(o2iblnd_cb.c:2843:kiblnd_check_conns()) Skipped 1 previous \
                similar message
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID  \
req@0000010008040c00 x1000886/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref \
                1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous \
                similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID  \
req@0000010132b50200 x1000890/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref \
                1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous \
                similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID  \
req@000001007d98ea00 x1000905/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref \
                1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous \
                similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID  \
req@0000010135548e00 x1000913/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref \
                1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous \
                similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID  \
req@0000010037ee6200 x1000924/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref \
                1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous \
                similar messages
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) @@@ timeout (sent at \
1187792641, 100s ago)  req@00000100c7eb1a00 x1000917/t0 \
                o38->necd3-MDT0000_UUID@192.168.0.23@o2ib:12 lens 304/328 ref 2 fl \
                Rpc:/0/0 rc 0/-22
LustreError: 6825:0:(client.c:962:ptlrpc_expire_one_request()) Skipped 63 previous \
                similar messages
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) @@@ IMP_INVALID  \
req@0000010075926200 x1000939/t0 o101->MGS@MGC192.168.0.23@o2ib_0:26 lens 232/240 ref \
                1 fl Rpc:/0/0 rc 0/0
LustreError: 22354:0:(client.c:520:ptlrpc_import_delay_req()) Skipped 3 previous \
similar messages
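
For reading the "status -N" fields in the log above, a small sketch like the
following may help. It is not part of Lustre: the function name
summarize_status and the LINUX_ERRNO table are made up for illustration, and
the table hardcodes the standard Linux errno values on the assumption that
these status fields are negated errno codes. The sample lines are abridged
from the log above.

```python
import re
from collections import Counter

# Standard Linux errno values, hardcoded so the result does not depend on
# the OS this script runs on. Assumption: the "status -N" fields in the
# log are negated errno codes.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    22: ("EINVAL", "Invalid argument"),
    103: ("ECONNABORTED", "Software caused connection abort"),
    113: ("EHOSTUNREACH", "No route to host"),
}

STATUS_RE = re.compile(r"status (-\d+)")


def summarize_status(log_text):
    """Count each negative 'status -N' code and attach its errno name."""
    counts = Counter(int(m.group(1)) for m in STATUS_RE.finditer(log_text))
    summary = {}
    for code, seen in counts.items():
        name, text = LINUX_ERRNO.get(-code, ("UNKNOWN", "not in table"))
        summary[code] = (name, text, seen)
    return summary


# Abridged lines from the log above (wrapped continuations joined).
SAMPLE = """\
LustreError: 6819:0:(events.c:134:client_bulk_callback()) event type 0, status -5, desc 00000100156ba000
LustreError: 2697:0:(events.c:55:request_out_callback()) @@@ type 4, status -113  req@0000010080d9a800
LustreError: 6823:0:(events.c:55:request_out_callback()) @@@ type 4, status -103  req@00000100c7e68e00
"""

if __name__ == "__main__":
    for code, (name, text, seen) in sorted(summarize_status(SAMPLE).items(), reverse=True):
        print(f"{code:>5} {name:<13} {text} (seen {seen}x)")
```

Read this way, the bulk callbacks fail with EIO, and the later request
callbacks fail with EHOSTUNREACH and ECONNABORTED.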

