
List:       ocfs2-users
Subject:    Re: [Ocfs2-users] ocfs cluster node keeps rebooting
From:       Sunil Mushran <sunil.mushran () gmail ! com>
Date:       2013-01-14 20:12:28
Message-ID: CAEeiSHWdv0FD--o+zjM4OdWkAFc6VKbx+5D-zfzK6X5cjc_SHw () mail ! gmail ! com



1.2.5 is a release that is more than six years old. You may want to use something more current.


On Mon, Jan 14, 2013 at 12:06 PM, Bill Zha <lfl2000us@yahoo.com> wrote:

> Hi Sunil and All,
>
> We have a 10-node Red Hat 4.2 OCFS2 cluster running version 1.2.5-6.  One
> of the nodes has been rebooting almost every day since last week.  The
> entire cluster had been stable for about a year before that.  I captured
> the following console output.  Can you, or anyone who has had a similar
> issue, tell me what the possible cause of these reboots might be?
>
> (25271,4):o2net_idle_timer:1426 here are some times that might help debug
> the situation: (tmr 1358156758.101016 now 1358156788.97593 dr
> 1358156758.101008 adv 1358156758.101022:1358156758.101024 func
> (5d21e188:507) 1357953447.247097:1357953447.247100)
> (25267,4):o2net_idle_timer:1426 here are some times that might help debug
> the situation: (tmr 1358156758.666788 now 1358156788.663604 dr
> 1358156760.666794 adv 1358156758.666793:1358156758.666795 func
> (5d21e188:505) 1357953453.107343:1357953453.107349)
> (25267,4):o2net_idle_timer:1426 here are some times that might help debug
> the situation: (tmr 1358156758.848933 now 1358156788.953367 dr
> 1358156760.847939 adv 1358156758.848939:1358156758.848941 func
> (0e6eb1eb:505) 1357965605.352156:1357965605.352162)
> (25267,4):o2net_idle_timer:1426 here are some times that might help debug
> the situation: (tmr 1358156759.108373 now 1358156789.243003 dr
> 1358156761.108392 adv 1358156759.108376:1358156759.108378 func
> (af22ae1f:502) 1357914301.741127:1357914301.741130)
> (25275,4):o2net_idle_timer:1426 here are some times that might help debug
> the situation: (tmr 1358156759.626366 now 1358156789.623629 dr
> 1358156789.622319 adv 1358156759.626369:1358156759.626371 func
> (abd851aa:505) 1357965605.363679:1357965605.363685)
> (25275,4):o2net_idle_timer:1426 here are some times that might help debug
> the situation: (tmr 1358156759.656350 now 1358156789.913330 dr
> 1358156761.656039 adv 1358156759.656354:1358156759.656355 func
> (0e6eb1eb:502) 1357907401.318584:1357907401.318587)
> (25275,4):o2net_idle_timer:1426 here are some times that might help debug
> the situation: (tmr 1358156759.663467 now 1358156790.203323 dr
> 1358156761.662745 adv 1358156759.663470:1358156759.663472 func
> (7dcded64:502) 1357875986.764566:1357875986.764568)
> (25275,4):o2net_idle_timer:1426 here are some times that might help debug
> the situation: (tmr 1358156759.987324 now 1358156790.493342 dr
> 1358156761.987117 adv 1358156759.987327:1358156759.987329 func
> (6bcd2bc6:502) 1357875995.222247:1357875995.222255)
> (25,7):o2hb_write_timeout:269 ERROR: Heartbeat write timeout to device
> dm-14 after 18 milliseconds
> Heartbeat thread (25) printing last 24 blocking operations (cur = 11):
> Heartbeat thread stuck at msleep, stuffing current time into that blocker
> (index 11)
> Index 12: took 0 ms to do allocating bios for read
> Index 13: took 0 ms to do bio alloc read
> Index 14: took 0 ms to do bio add page read
> Index 15: took 0 ms to do bio add page read
> Index 16: took 0 ms to do submit_bio for read
> Index 17: took 0 ms to do waiting for read completion
> Index 18: took 0 ms to do bio alloc write
> Index 19: took 0 ms to do bio add page write
> Index 20: took 0 ms to do submit_bio for write
> Index 21: took 0 ms to do checking slots
> Index 22: took 0 ms to do waiting for write completion
> Index 23: took 100897 ms to do msleep
> Index 0: took 0 ms to do allocating bios for read
> Index 1: took 0 ms to do bio alloc read
> Index 2: took 0 ms to do bio add page read
> Index 3: took 0 ms to do bio add page read
> Index 4: took 0 ms to do submit_bio for read
> Index 5: took 0 ms to do waiting for read completion
> Index 6: took 0 ms to do bio alloc write
> Index 7: took 0 ms to do bio add page write
> Index 8: took 0 ms to do submit_bio for write
> Index 9: took 0 ms to do checking slots
> Index 10: took 0 ms to do waiting for write completion
> Index 11: took 313 ms to do msleep
> *** ocfs2 is very sorry to be fencing this system by restarting ***
>
>
> Thank you so much for your help!
>
>
> Bill
>
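
The o2net_idle_timer lines quoted above carry several timestamps. As a reading aid, here is a hypothetical Python sketch that extracts the "tmr" and "now" fields from one log line and reports the gap between them in microseconds. It assumes each value is seconds and microseconds printed without zero-padding of the microsecond part, so "now 1358156788.97593" is read as 1358156788 s plus 97593 us. That assumption is not confirmed in this thread, and the helper names are illustrative only, not part of OCFS2.

```python
import re

# Matches the "tmr" and "now" fields of an o2net_idle_timer line.
TMR_NOW = re.compile(r"tmr (\d+\.\d+) now (\d+\.\d+)")


def parse_timeval_us(text):
    """Convert "<sec>.<usec>" to integer microseconds.

    Assumption: the microsecond part is printed without zero-padding,
    so "1358156788.97593" means 1358156788 s + 97593 us.
    """
    sec, usec = text.split(".")
    if int(usec) >= 1_000_000:
        raise ValueError("microsecond field out of range: " + text)
    return int(sec) * 1_000_000 + int(usec)


def idle_gap_us(line):
    """Return now - tmr, in microseconds, for one o2net_idle_timer line."""
    match = TMR_NOW.search(line)
    if match is None:
        raise ValueError("no tmr/now fields found")
    tmr, now = (parse_timeval_us(field) for field in match.groups())
    return now - tmr


sample = "(tmr 1358156758.101016 now 1358156788.97593 dr 1358156758.101008"
print(idle_gap_us(sample))
```

Under that assumption, the first quoted line gives a gap of 29996577 us, just under 30 seconds. Integer microseconds are used instead of floats so that the subtraction is exact.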

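The heartbeat dump quoted above prints 24 "Index N" entries starting at 12, running through 23, wrapping to 0 and ending at 11, which matches "cur = 11". One plausible reading (an assumption inferred from the log, not confirmed in this thread) is that the blockers are kept in a fixed-size ring buffer that is printed oldest-first. The hypothetical sketch below models that ordering and picks out the longest-running entry from lines in the quoted format; `dump_order` and `slowest` are illustrative names, not OCFS2 functions.

```python
import re

# Matches lines such as "Index 23: took 100897 ms to do msleep".
BLOCKER = re.compile(r"Index (\d+): took (\d+) ms to do (.+)")


def dump_order(size, cur):
    """Slot order of a ring buffer dumped oldest-first.

    Assumption: `cur` is the most recently used slot, so the oldest entry
    sits at cur + 1 and the dump wraps around to end at cur.
    """
    if not 0 <= cur < size:
        raise ValueError("cur must be a valid slot index")
    return [(cur + 1 + i) % size for i in range(size)]


def slowest(lines):
    """Return (index, ms, operation) for the longest-running blocker.

    Raises ValueError (from max) if no line matches the expected format.
    """
    entries = []
    for line in lines:
        m = BLOCKER.search(line)
        if m:
            entries.append((int(m.group(1)), int(m.group(2)), m.group(3).strip()))
    return max(entries, key=lambda entry: entry[1])


sample = [
    "Index 22: took 0 ms to do waiting for write completion",
    "Index 23: took 100897 ms to do msleep",
    "Index 0: took 0 ms to do allocating bios for read",
    "Index 11: took 313 ms to do msleep",
]
print(dump_order(24, 11))
print(slowest(sample))
```

Applied to the quoted lines, this picks out Index 23, the msleep that took 100897 ms.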



_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users

