List: drbd-user
Subject: [DRBD-user] kernel hang problem with 0.7.13
From: Harry Edmon <harry () atmos ! washington ! edu>
Date: 2005-09-20 15:33:06
Message-ID: 43302BB2.6030800 () atmos ! washington ! edu
Sorry if this appears twice - I sent it the first time from the wrong
e-mail address.
I have a kernel hang problem with DRBD 0.7.13. Unfortunately it is a
hard hang, so I have no kernel traces. It occurs when one system is
already up as the primary under 0.7.13 and I then bring up a second
system, also running 0.7.13, as the secondary. As soon as the resync
starts, the primary hangs. Under 0.7.11 the problem does not happen.
Here are the kernel messages from the primary right before the hang:
Sep 20 07:55:42 dew2 kernel: drbd0: drbd0_receiver [3762]: cstate WFConnection --> WFReportParams
Sep 20 07:55:42 dew2 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Sep 20 07:55:42 dew2 kernel: drbd0: Connection established.
Sep 20 07:55:42 dew2 kernel: drbd0: I am(P): 1:00000003:00000001:00000006:00000002:10
Sep 20 07:55:42 dew2 kernel: drbd1: drbd1_receiver [3770]: cstate WFConnection --> WFReportParams
Sep 20 07:55:42 dew2 kernel: drbd1: Handshake successful: DRBD Network Protocol version 74
Sep 20 07:55:42 dew2 kernel: drbd1: Connection established.
Sep 20 07:55:42 dew2 kernel: drbd1: I am(P): 1:00000003:00000001:00000006:00000002:10
Sep 20 07:55:42 dew2 kernel: drbd1: Peer(S): 1:00000003:00000001:00000006:00000001:11
Sep 20 07:55:42 dew2 kernel: drbd1: drbd1_receiver [3770]: cstate WFReportParams --> WFBitMapS
Sep 20 07:55:42 dew2 kernel: drbd0: Peer(S): 1:00000003:00000001:00000006:00000001:11
Sep 20 07:55:42 dew2 kernel: drbd0: drbd0_receiver [3762]: cstate WFReportParams --> WFBitMapS
Sep 20 07:55:42 dew2 kernel: drbd0: Primary/Unknown --> Primary/Secondary
Sep 20 07:55:42 dew2 kernel: drbd0: drbd0_receiver [3762]: cstate WFBitMapS --> SyncSource
Sep 20 07:55:42 dew2 kernel: drbd0: Resync started as SyncSource (need to sync 1183752 KB [295938 bits set]).
Sep 20 07:55:42 dew2 kernel: drbd1: Primary/Unknown --> Primary/Secondary
Sep 20 07:55:42 dew2 kernel: drbd1: drbd1_receiver [3770]: cstate WFBitMapS --> SyncSource
Sep 20 07:55:42 dew2 kernel: drbd1: Resync started as SyncSource (need to sync 1192960 KB [298240 bits set]).
No other messages show up on the primary - it hangs here.
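The resync figures in the primary's log can be cross-checked with a little arithmetic. DRBD tracks out-of-sync data in a bitmap in which each bit covers one 4 KiB block, so the KB count should be exactly four times the number of bits set. The Python sketch below performs that check for both devices. `kb_from_bits` is a hypothetical helper for illustration, not part of DRBD.

```python
# Assumption: each bit in DRBD's resync bitmap tracks one 4 KiB block.
KB_PER_BIT = 4


def kb_from_bits(bits_set):
    """Return the amount of data, in KB, represented by `bits_set` dirty bits."""
    return bits_set * KB_PER_BIT


# (device, bits set, KB reported) copied from the "Resync started" lines.
resync_lines = [
    ("drbd0", 295938, 1183752),
    ("drbd1", 298240, 1192960),
]

for dev, bits, reported_kb in resync_lines:
    computed = kb_from_bits(bits)
    print(f"{dev}: {bits} bits x {KB_PER_BIT} KB = {computed} KB, "
          f"matches log: {computed == reported_kb}")
```

Both devices match, and the secondary's SyncTarget lines report the same figures, so both sides agreed on how much data had to be resynced.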
Here is what the secondary shows:
Sep 20 07:55:30 dew1 kernel: drbd: initialised. Version: 0.7.13 (api:77/proto:74)
Sep 20 07:55:30 dew1 kernel: drbd: SVN Revision: 1961 build by root@dew1, 2005-09-18 19:52:12
Sep 20 07:55:38 dew1 kernel: drbd0: resync bitmap: bits=15030177 words=469694
Sep 20 07:55:38 dew1 kernel: drbd0: size = 57 GB (60120708 KB)
Sep 20 07:55:38 dew1 kernel: drbd0: 0 KB marked out-of-sync by on disk bit-map.
Sep 20 07:55:38 dew1 kernel: drbd0: Found 6 transactions (324 active extents) in activity log.
Sep 20 07:55:38 dew1 kernel: drbd0: Marked additional 128 MB as out-of-sync based on AL.
Sep 20 07:55:40 dew1 kernel: drbd0: drbdsetup [3615]: cstate Unconfigured --> StandAlone
Sep 20 07:55:40 dew1 kernel: drbd1: resync bitmap: bits=43023440 words=1344484
Sep 20 07:55:40 dew1 kernel: drbd1: size = 164 GB (172093760 KB)
Sep 20 07:55:41 dew1 kernel: drbd1: 0 KB marked out-of-sync by on disk bit-map.
Sep 20 07:55:41 dew1 kernel: drbd1: Found 6 transactions (324 active extents) in activity log.
Sep 20 07:55:41 dew1 kernel: drbd1: Marked additional 128 MB as out-of-sync based on AL.
Sep 20 07:55:42 dew1 kernel: drbd1: drbdsetup [3630]: cstate Unconfigured --> StandAlone
Sep 20 07:55:42 dew1 kernel: drbd0: drbdsetup [3648]: cstate StandAlone --> Unconnected
Sep 20 07:55:42 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate Unconnected --> WFConnection
Sep 20 07:55:42 dew1 kernel: drbd1: drbdsetup [3656]: cstate StandAlone --> Unconnected
Sep 20 07:55:42 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate Unconnected --> WFConnection
Sep 20 07:55:42 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate WFConnection --> WFReportParams
Sep 20 07:55:42 dew1 kernel: drbd0: Handshake successful: DRBD Network Protocol version 74
Sep 20 07:55:42 dew1 kernel: drbd0: Connection established.
Sep 20 07:55:42 dew1 kernel: drbd0: I am(S): 1:00000003:00000001:00000006:00000001:11
Sep 20 07:55:42 dew1 kernel: drbd0: Peer(P): 1:00000003:00000001:00000006:00000002:10
Sep 20 07:55:42 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate WFReportParams --> WFBitMapT
Sep 20 07:55:42 dew1 kernel: drbd0: Secondary/Unknown --> Secondary/Primary
Sep 20 07:55:42 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate WFConnection --> WFReportParams
Sep 20 07:55:42 dew1 kernel: drbd1: Handshake successful: DRBD Network Protocol version 74
Sep 20 07:55:42 dew1 kernel: drbd1: Connection established.
Sep 20 07:55:42 dew1 kernel: drbd1: I am(S): 1:00000003:00000001:00000006:00000001:11
Sep 20 07:55:42 dew1 kernel: drbd1: Peer(P): 1:00000003:00000001:00000006:00000002:10
Sep 20 07:55:42 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate WFReportParams --> WFBitMapT
Sep 20 07:55:42 dew1 kernel: drbd1: Secondary/Unknown --> Secondary/Primary
Sep 20 07:55:42 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate WFBitMapT --> SyncTarget
Sep 20 07:55:42 dew1 kernel: drbd0: Resync started as SyncTarget (need to sync 1183752 KB [295938 bits set]).
Sep 20 07:55:42 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate WFBitMapT --> SyncTarget
Sep 20 07:55:42 dew1 kernel: drbd1: Resync started as SyncTarget (need to sync 1192960 KB [298240 bits set]).
Sep 20 07:56:13 dew1 kernel: drbd0: drbd0_asender [3674]: cstate SyncTarget --> NetworkFailure
Sep 20 07:56:13 dew1 kernel: drbd0: asender terminated
Sep 20 07:56:13 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate NetworkFailure --> BrokenPipe
Sep 20 07:56:13 dew1 kernel: drbd0: worker terminated
Sep 20 07:56:13 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate BrokenPipe --> Unconnected
Sep 20 07:56:13 dew1 kernel: drbd0: Connection lost.
Sep 20 07:56:13 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate Unconnected --> WFConnection
Sep 20 07:56:13 dew1 kernel: drbd1: drbd1_asender [3675]: cstate SyncTarget --> NetworkFailure
Sep 20 07:56:14 dew1 kernel: drbd1: asender terminated
Sep 20 07:56:14 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate NetworkFailure --> BrokenPipe
Sep 20 07:56:14 dew1 kernel: drbd1: short read receiving data block: read 3904 expected 4096
Sep 20 07:56:14 dew1 kernel: drbd1: worker terminated
Sep 20 07:56:14 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate BrokenPipe --> Unconnected
Sep 20 07:56:14 dew1 kernel: drbd1: Connection lost.
Sep 20 07:56:14 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate Unconnected --> WFConnection
Sep 20 07:57:55 dew1 kernel: drbd0: drbdsetup [3805]: cstate WFConnection --> Unconnected
Sep 20 07:57:55 dew1 kernel: drbd0: worker terminated
Sep 20 07:57:55 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate Unconnected --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd0: Connection lost.
Sep 20 07:57:55 dew1 kernel: drbd0: Discarding network configuration.
Sep 20 07:57:55 dew1 kernel: drbd0: drbd0_receiver [3649]: cstate StandAlone --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd0: receiver terminated
Sep 20 07:57:55 dew1 kernel: drbd0: drbdsetup [3805]: cstate StandAlone --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd0: drbdsetup [3805]: cstate StandAlone --> Unconfigured
Sep 20 07:57:55 dew1 kernel: drbd0: worker terminated
Sep 20 07:57:55 dew1 kernel: drbd1: drbdsetup [3807]: cstate WFConnection --> Unconnected
Sep 20 07:57:55 dew1 kernel: drbd1: worker terminated
Sep 20 07:57:55 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate Unconnected --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd1: Connection lost.
Sep 20 07:57:55 dew1 kernel: drbd1: Discarding network configuration.
Sep 20 07:57:55 dew1 kernel: drbd1: drbd1_receiver [3657]: cstate StandAlone --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd1: receiver terminated
Sep 20 07:57:55 dew1 kernel: drbd1: drbdsetup [3807]: cstate StandAlone --> StandAlone
Sep 20 07:57:55 dew1 kernel: drbd1: drbdsetup [3807]: cstate StandAlone --> Unconfigured
Sep 20 07:57:55 dew1 kernel: drbd1: worker terminated
Sep 20 07:57:55 dew1 kernel: drbd: module cleanup done.
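The secondary's "resync bitmap" and "size" lines can be derived from the device size in KB. The Python sketch below reproduces them under three assumptions: one bit per 4 KiB block, 32-bit bitmap words, and a bit count padded up to a multiple of 64. The padding and word size are inferred only from the two logged word counts (469694 and 1344484), not taken from the DRBD source. `bitmap_geometry` is a hypothetical helper.

```python
# Assumptions (inferred from the logged numbers, not from the DRBD source):
KB_PER_BIT = 4         # one bitmap bit per 4 KiB block
ALIGN_BITS = 64        # bit count rounded up to a multiple of 64
WORD_BITS = 32         # bitmap stored as 32-bit words
KB_PER_GB = 1024 * 1024


def bitmap_geometry(size_kb):
    """Return (whole GB, bitmap bits, bitmap words) for a device of size_kb KB."""
    bits = size_kb // KB_PER_BIT
    # Integer ceiling of bits / ALIGN_BITS, scaled back up to a multiple of 64.
    padded_bits = -(-bits // ALIGN_BITS) * ALIGN_BITS
    words = padded_bits // WORD_BITS
    return size_kb // KB_PER_GB, bits, words


# Device sizes copied from the secondary's "size =" lines.
for dev, size_kb in [("drbd0", 60120708), ("drbd1", 172093760)]:
    gb, bits, words = bitmap_geometry(size_kb)
    print(f"{dev}: size = {gb} GB ({size_kb} KB), bits={bits} words={words}")
```

Both devices reproduce the logged values. The "57 GB" and "164 GB" figures come out as whole GiB, with the remainder truncated.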
This is not the only pair of systems on which I have seen this hang. In
every case the systems are dual-Xeon boxes with Hyper-Threading turned
on. I have seen the problem with kernels 2.6.11.7 through 2.6.13.2. Any
ideas?
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user