
List:       redhat-linux-cluster
Subject:    Re: [Linux-cluster] "openais[XXXX]" [TOTEM] Retransmit List: XXXXX"
From:       Bernard Chew <bernardchew () gmail ! com>
Date:       2010-04-20 4:14:08
Message-ID: y2i95994e3c1004192114g3420dae1hd473b2391077696d () mail ! gmail ! com

> On Fri, Apr 9, 2010 at 4:51 PM, Bernard Chew <bernardchew@gmail.com> wrote:
>> On Thu, Apr 8, 2010 at 12:58 AM, Steven Dake <sdake@redhat.com> wrote:
>> On Wed, 2010-04-07 at 18:52 +0800, Bernard Chew wrote:
>>> Hi all,
>>>
>>> I noticed "openais[XXXX] [TOTEM] Retransmit List: XXXXX" repeated
>>> every few hours in /var/log/messages. What does the message mean and
>>> is it normal? Will this cause fencing to take place eventually?
>>>
>> This means your network environment dropped packets and totem is
>> recovering them.  This is normal operation, and in future versions such
>> as corosync no notification is printed when recovery takes place.
>>
>> There is a bug, however, fixed in revision 2122 where if the last packet
>> in the order is lost, and no new packets are unlost after it, the
>> processor will enter a failed to receive state and trigger fencing.
>>
>> Regards
>> -steve
>>> Thank you in advance.
>>>
>>> Regards,
>>> Bernard Chew
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster@redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>
>
> Thank you for the reply, Steve!
>
> The cluster was running fine until last week, when 3 nodes restarted
> suddenly. I suspect fencing took place, since all 3 servers restarted
> at the same time, but I couldn't find any fence-related entries in the
> log. I am guessing we hit the bug you mentioned? If so, will the log
> indicate that fencing has taken place?
>
> Also, I occasionally see the message "kernel: clustat[28328]: segfault at
> 0000000000000024 rip 0000003b31c75bc0 rsp 00007fff955cb098 error 4".
> Is this related to the TOTEM message, or does it indicate another
> problem?
>
> Regards,
> Bernard Chew
>

Hi Steve.

Just wondering if you could point me to the bug you mentioned (the one
fixed in revision 2122)?
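
To check whether the retransmits cluster around the time of the restarts,
a minimal sketch like the one below could tally the "Retransmit List"
messages per hour. It assumes the usual syslog layout ("Mon DD HH:MM:SS
host openais[PID]: [TOTEM] Retransmit List: ..."); the function name and
the regular expression are illustrative only and may need adjusting for
the actual log format.

```python
import re
from collections import Counter

# Assumed syslog layout (not taken from the thread), for example:
#   "Apr 20 04:14:08 node1 openais[1234]: [TOTEM] Retransmit List: 1a 1b"
# The colon after "openais[PID]" is treated as optional.
RETRANSMIT = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+\S+\s+"
    r"openais\[\d+\]:?\s+\[TOTEM\]\s+Retransmit List:"
)


def count_retransmits_per_hour(lines):
    """Count Retransmit List messages, keyed by (month, day, hour).

    Keys appear in the order they are first seen, so a chronological
    log yields chronological output.
    """
    counts = Counter()
    for line in lines:
        match = RETRANSMIT.match(line)
        if match:
            key = (match.group("month"), int(match.group("day")), int(match.group("hour")))
            counts[key] += 1
    return counts


# Example use (reading the log usually requires root):
#   with open("/var/log/messages", errors="replace") as log:
#       for (month, day, hour), n in count_retransmits_per_hour(log).items():
#           print(f"{month} {day:2d} {hour:02d}:00  {n}")
```

A sudden jump in the hourly count just before the restarts would at least
be consistent with packet loss on the cluster network, although it would
not by itself confirm the bug.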

Thank you.

Regards,
Bernard

