List:       ocfs2-users
Subject:    [Ocfs2-users] two nodes hang
From:       Thomas Lau <thomaslau () esun ! com>
Date:       2011-04-19 9:59:16
Message-ID: 4DAD5CF4.3010502 () esun ! com

We have a total of 6 nodes running ocfs2, and all of a sudden server1 
and server2 hung:

server1 log:
Apr 19 17:28:06 server1 kernel: o2net: connection to node server2 (num 
8) at 10.10.10.11:7777 has been idle for 60.0 seconds, shutting it down.
Apr 19 17:28:06 server1 kernel: (swapper,0,2):o2net_idle_timer:1503 here 
are some times that might help debug the situation: (tmr 
1303205226.698111 now 1303205286.697866 dr 1303205226.698364 adv 
1303205226.698371:1303205226.698372 func (a53de746:506) 
1303205226.698112:1303205226.698117)
Apr 19 17:28:06 server1 kernel: o2net: no longer connected to node 
server2 (num 8) at 10.10.10.11:7777
Apr 19 17:28:06 server1 kernel: (nfsd,5938,2):dlm_do_master_request:1334 
ERROR: link to 8 went down!
Apr 19 17:28:06 server1 kernel: (nfsd,5938,2):dlm_get_lock_resource:917 
ERROR: status = -112
Apr 19 17:28:06 server1 kernel: 
(httpd,983,2):dlm_send_remote_convert_request:395 ERROR: status = -112
Apr 19 17:28:06 server1 kernel: 
(httpd,983,2):dlm_wait_for_node_death:370 
8A93E08BB47B4ABFBC4FD0AD1744EFC2: waiting 5000ms for notification of 
death of node 8
Apr 19 17:28:06 server1 kernel: 
(httpd,1061,2):dlm_do_master_request:1334 ERROR: link to 8 went down!
Apr 19 17:28:06 server1 kernel: (httpd,1061,2):dlm_get_lock_resource:917 
ERROR: status = -112
Apr 19 17:28:06 server1 kernel: 
(httpd,1137,2):dlm_do_master_request:1334 ERROR: link to 8 went down!
Apr 19 17:28:06 server1 kernel: (httpd,1137,2):dlm_get_lock_resource:917 
ERROR: status = -112
Apr 19 17:28:11 server1 kernel: 
(httpd,983,2):dlm_send_remote_convert_request:395 ERROR: status = -107
Apr 19 17:28:11 server1 kernel: 
(httpd,983,2):dlm_wait_for_node_death:370 
8A93E08BB47B4ABFBC4FD0AD1744EFC2: waiting 5000ms for notification of 
death of node 8
Apr 19 17:28:16 server1 kernel: 
(httpd,983,2):dlm_send_remote_convert_request:395 ERROR: status = -107
Apr 19 17:28:16 server1 kernel: 
(httpd,983,2):dlm_wait_for_node_death:370 
8A93E08BB47B4ABFBC4FD0AD1744EFC2: waiting 5000ms for notification of 
death of node 8
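
For reference, the idle interval can be checked against the o2net_idle_timer 
line above. This is a minimal sketch, assuming that "tmr" is the time the idle 
timer was last armed and "now" is the time it fired; the function name 
idle_seconds is hypothetical and only for illustration.

```python
import re
from decimal import Decimal

# Fields copied from the o2net_idle_timer message in the server1 log.
TIMER_LINE = (
    "(tmr 1303205226.698111 now 1303205286.697866 "
    "dr 1303205226.698364)"
)


def idle_seconds(line):
    """Return now - tmr from an o2net_idle_timer message.

    Assumption: tmr is when the idle timer was last armed and
    now is when it expired. Decimal avoids float rounding on
    microsecond timestamps.
    """
    match = re.search(r"tmr\s+(\d+\.\d+)\s+now\s+(\d+\.\d+)", line)
    if match is None:
        raise ValueError("no tmr/now pair found")
    armed, fired = (Decimal(value) for value in match.groups())
    return fired - armed


print(idle_seconds(TIMER_LINE))
```

The difference is just under 60 seconds, which agrees with the "idle for 60.0 
seconds" message in the same log.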



server2 log:
Apr 19 17:28:05 server2 kernel: o2net: no longer connected to node 
server1 (num 7) at 10.10.10.10:7777
Apr 19 17:28:05 server2 kernel: 
(dlm_thread,13293,3):dlm_drop_lockres_ref:2211 ERROR: status = -112
Apr 19 17:28:05 server2 kernel: 
(dlm_thread,13293,3):dlm_purge_lockres:206 ERROR: status = -112
Apr 19 17:28:05 server2 kernel: 
(httpd,11084,0):dlm_do_master_request:1334 ERROR: link to 7 went down!
Apr 19 17:28:05 server2 kernel: 
(httpd,11084,0):dlm_get_lock_resource:917 ERROR: status = -112
Apr 19 17:28:05 server2 kernel: 
(httpd,8376,2):dlm_do_master_request:1334 ERROR: link to 7 went down!
Apr 19 17:28:05 server2 kernel: (httpd,8376,2):dlm_get_lock_resource:917 
ERROR: status = -112
Apr 19 17:28:05 server2 kernel: 
(crond,7757,2):dlm_send_remote_unlock_request:359 ERROR: status = -112
Apr 19 17:28:05 server2 kernel: 
(dlm_thread,13293,3):dlm_drop_lockres_ref:2211 ERROR: status = -107
Apr 19 17:28:05 server2 kernel: 
(dlm_thread,13293,3):dlm_purge_lockres:206 ERROR: status = -107
Apr 19 17:28:05 server2 kernel: 
(dlm_thread,13293,3):dlm_drop_lockres_ref:2211 ERROR: status = -107
Apr 19 17:28:05 server2 kernel: 
(dlm_thread,13293,3):dlm_purge_lockres:206 ERROR: status = -107
Apr 19 17:28:05 server2 kernel: 
(crond,7757,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Apr 19 17:28:05 server2 kernel: 
(dlm_thread,13293,3):dlm_drop_lockres_ref:2211 ERROR: status = -107
Apr 19 17:28:06 server2 kernel: 
m_send_remolm_send_remote_unlock_request:359 ERROR: status = -107
Apr 19 17:28:06 server2 kernel: 
(httpd,11282,0):dlm_send_remote_unlock_request:359 ERROR: status = -107
Apr 19 17:28:18 server2 last message repeated 7 times
Apr 19 17:28:18 server2 kernel: 
(sshd,7691,0):dlm_send_remote_unlock_request:359 ERROR: status = -107
Apr 19 17:28:19 server2 last message repeated 45 times
Apr 19 17:28:19 server2 kernel: 
(sshd,7691,0):dlm_send_remote_unlock_request:359 ERROR: status = -10OR: 
status = -107
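
The status values in both logs look like negated Linux errno numbers. The 
sketch below decodes the two that appear here, assuming the generic Linux 
errno numbering (107 and 112); the table and the name decode_status are 
hypothetical helpers, not part of ocfs2.

```python
import re

# Assumption: generic Linux errno numbering, hard-coded so the
# result does not depend on the platform running this script.
LINUX_ERRNO = {
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    112: ("EHOSTDOWN", "Host is down"),
}


def decode_status(status):
    """Map a negative kernel status code to its errno name."""
    name, text = LINUX_ERRNO.get(abs(status), ("UNKNOWN", "not in table"))
    return f"{name} ({text})"


def statuses_in(log_text):
    """Collect the distinct 'status = N' values found in a log excerpt."""
    return sorted({int(value) for value in re.findall(r"status = (-?\d+)", log_text)})


sample = (
    "dlm_get_lock_resource:917 ERROR: status = -112\n"
    "dlm_send_remote_unlock_request:359 ERROR: status = -107\n"
)
for code in statuses_in(sample):
    print(code, decode_status(code))
```

Under that assumption, -112 means the peer was reported as down and -107 means 
the socket to it was no longer connected.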



Does anyone have an idea why?

-- 
Thomas Lau
Infrastructure Delivery Manager
eSun Holdings Limited
Mobile: 852-93239670
Office phone: 29058104


"always I strive to push the boundaries of what we know, and what seems possible to \
us at this moment in time. The walls between art and engineering exist only in our \
minds, and few have the imagination to see beyond them."

– Theo Jansen



