List: ocfs2-users
Subject: [Ocfs2-users] two nodes hang
From: Thomas Lau <thomaslau () esun ! com>
Date: 2011-04-19 9:59:16
Message-ID: 4DAD5CF4.3010502 () esun ! com
We have a total of 6 nodes running ocfs2. All of a sudden, server1
and server2 hung:
server1 log:
Apr 19 17:28:06 server1 kernel: o2net: connection to node server2 (num
8) at 10.10.10.11:7777 has been idle for 60.0 seconds, shutting it down.
Apr 19 17:28:06 server1 kernel: (swapper,0,2):o2net_idle_timer:1503 here
are some times that might help debug the situation: (tmr
1303205226.698111 now 1303205286.697866 dr 1303205226.698364 adv
1303205226.698371:1303205226.698372 func (a53de746:506)
1303205226.698112:1303205226.698117)
Apr 19 17:28:06 server1 kernel: o2net: no longer connected to node
server2 (num 8) at 10.10.10.11:7777
Apr 19 17:28:06 server1 kernel: (nfsd,5938,2):dlm_do_master_request:1334
ERROR: link to 8 went down!
Apr 19 17:28:06 server1 kernel: (nfsd,5938,2):dlm_get_lock_resource:917
ERROR: status = -112
Apr 19 17:28:06 server1 kernel:
(httpd,983,2):dlm_send_remote_convert_request:395 ERROR: status = -112
Apr 19 17:28:06 server1 kernel:
(httpd,983,2):dlm_wait_for_node_death:370
8A93E08BB47B4ABFBC4FD0AD1744EFC2: waiting 5000ms for notification of
death of node 8
Apr 19 17:28:06 server1 kernel:
(httpd,1061,2):dlm_do_master_request:1334 ERROR: link to 8 went down!
Apr 19 17:28:06 server1 kernel: (httpd,1061,2):dlm_get_lock_resource:917
ERROR: status = -112
Apr 19 17:28:06 server1 kernel:
(httpd,1137,2):dlm_do_master_request:1334 ERROR: link to 8 went down!
Apr 19 17:28:06 server1 kernel: (httpd,1137,2):dlm_get_lock_resource:917
ERROR: status = -112
Apr 19 17:28:11 server1 kernel:
(httpd,983,2):dlm_send_remote_convert_request:395 ERROR: status = -107
Apr 19 17:28:11 server1 kernel:
(httpd,983,2):dlm_wait_for_node_death:370
8A93E08BB47B4ABFBC4FD0AD1744EFC2: waiting 5000ms for notification of
death of node 8
Apr 19 17:28:16 server1 kernel:
(httpd,983,2):dlm_send_remote_convert_request:395 ERROR: status = -107
Apr 19 17:28:16 server1 kernel:
(httpd,983,2):dlm_wait_for_node_death:370
8A93E08BB47B4ABFBC4FD0AD1744EFC2: waiting 5000ms for notification of
death of node 8
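
As a cross-check on the o2net_idle_timer line above, the gap between the
"tmr" and "now" values can be computed directly. This is only a sketch: it
assumes "tmr" is the time the idle timer was last armed and "now" is the time
it fired, which is an interpretation of the message and not something the log
itself states.

```python
from decimal import Decimal

# Values copied from the server1 o2net_idle_timer message.
# Assumption: "tmr" = when the idle timer was last armed, "now" = when it fired.
tmr = Decimal("1303205226.698111")
now = Decimal("1303205286.697866")

# Decimal avoids float rounding on microsecond-resolution epoch timestamps.
idle = now - tmr
print(f"idle for {idle} s (about {idle.quantize(Decimal('0.1'))} s)")
```

Under that assumption the gap matches the "idle for 60.0 seconds" reported in
the first log line.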
server2:
Apr 19 17:28:05 server2 kernel: o2net: no longer connected to node
server1 (num 7) at 10.10.10.10:7777
Apr 19 17:28:05 server2 kernel:
(dlm_thread,13293,3):dlm_drop_lockres_ref:2211 ERROR: status = -112
Apr 19 17:28:05 server2 kernel:
(dlm_thread,13293,3):dlm_purge_lockres:206 ERROR: status = -112
Apr 19 17:28:05 server2 kernel:
(httpd,11084,0):dlm_do_master_request:1334 ERROR: link to 7 went down!
Apr 19 17:28:05 server2 kernel:
(httpd,11084,0):dlm_get_lock_resource:917 ERROR: status = -112
Apr 19 17:28:05 server2 kernel:
(httpd,8376,2):dlm_do_master_request:1334 ERROR: link to 7 went down!
Apr 19 17:28:05 server2 kernel: (httpd,8376,2):dlm_get_lock_resource:917
ERROR: status = -112
Apr 19 17:28:05 server2 kernel:
(crond,7757,2):dlm_send_remote_unlock_request:359 ERROR: status = -112
Apr 19 17:28:05 server2 kernel:
(dlm_thread,13293,3):dlm_drop_lockres_ref:2211 ERROR: status = -107
Apr 19 17:28:05 server2 kernel:
(dlm_thread,13293,3):dlm_purge_lockres:206 ERROR: status = -107
Apr 19 17:28:05 server2 kernel:
(dlm_thread,13293,3):dlm_drop_lockres_ref:2211 ERROR: status = -107
Apr 19 17:28:05 server2 kernel:
(dlm_thread,13293,3):dlm_purge_lockres:206 ERROR: status = -107
Apr 19 17:28:05 server2 kernel:
(crond,7757,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Apr 19 17:28:05 server2 kernel:
(dlm_thread,13293,3):dlm_drop_lockres_ref:2211 ERROR: status = -107
Apr 19 17:28:06 server2 kernel:
m_send_remolm_send_remote_unlock_request:359 ERROR: status = -107
Apr 19 17:28:06 server2 kernel:
(httpd,11282,0):dlm_send_remote_unlock_request:359 ERROR: status = -107
Apr 19 17:28:18 server2 last message repeated 7 times
Apr 19 17:28:18 server2 kernel:
(sshd,7691,0):dlm_send_remote_unlock_request:359 ERROR: status = -107
Apr 19 17:28:19 server2 last message repeated 45 times
Apr 19 17:28:19 server2 kernel:
(sshd,7691,0):dlm_send_remote_unlock_request:359 ERROR: status = -10OR:
status = -107
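
For reference, the "status" values in both logs look like negated Linux errno
numbers. A minimal sketch to decode them follows; the table is hard-coded from
the generic Linux errno numbering, and the helper name decode_status is
hypothetical. It says nothing about why the link dropped.

```python
# Generic Linux errno numbers for the two status values seen in the logs.
# Hard-coded so the result does not depend on the platform running the script.
LINUX_ERRNO = {
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    112: ("EHOSTDOWN", "Host is down"),
}

def decode_status(status):
    """Map a negative kernel status such as -112 to its errno name and text."""
    name, text = LINUX_ERRNO.get(-status, ("UNKNOWN", "not in this table"))
    return f"status = {status}: {name} ({text})"

for s in (-112, -107):
    print(decode_status(s))
```

Read this way, -112 first appears when the o2net connection goes down, and
-107 on later attempts to use the closed connection.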
Does anyone have an idea why?
--
Thomas Lau
Infrastructure Delivery Manager
eSun Holdings Limited
Mobile: 852-93239670
Office phone: 29058104
"always I strive to push the boundaries of what we know, and what seems possible to
us at this moment in time. The walls between art and engineering exist only in our
minds, and few have the imagination to see beyond them."
– Theo Jansen
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users