List: linux-ha-dev
Subject: [Linux-ha-dev] Re: [Pacemaker] Re: Problems when DC node is
From: Satomi TANIGUCHI <taniguchis () intellilink ! co ! jp>
Date: 2008-10-16 6:43:36
Message-ID: 48F6E298.5080206 () intellilink ! co ! jp
Hi Dejan,
Dejan Muhamedagic wrote:
> Hi Satomi-san,
>
> On Tue, Oct 14, 2008 at 07:07:00PM +0900, Satomi TANIGUCHI wrote:
>> Hi,
>>
>> I found that there are 2 problems when DC node is STONITH'ed.
>> (1) STONITH operation is executed two times.
>
> This has been discussed at length in bugzilla, see
>
> http://developerbugs.linux-foundation.org/show_bug.cgi?id=1904
>
> which was resolved with WONTFIX. In short, it was deemed too risky
> to implement a remedy for this problem. Of course, if you think
> you can add more to the discussion, please go ahead.
Sorry, I missed it.
Thank you for pointing it out!
I understand how it came about.
Ideally, when the DC node is about to be STONITH'ed,
a new DC node would be elected first and it would STONITH the ex-DC;
then these problems would not occur.
But that is probably not a good approach in an emergency,
because the ex-DC should be STONITH'ed as soon as possible.
Anyway, I understand this is an expected behavior, thanks!
But then, it seems that tengine still has to keep a timeout while waiting
for stonithd's result, and a long cluster-delay is still required,
because the second STONITH is requested when that transition timeout expires.
I'm afraid I may have misunderstood what Andrew really meant.
>
>> (2) The timeout for which stonithd on the DC node waits for
>> the result of a STONITH op from another node is
>> always set to "stonith-timeout" in <cluster_property_set>.
>> [...]
>> Case (2):
>> When this timeout expires in stonithd on the DC
>> while a non-DC node's stonithd is trying to reset the DC,
>> DC-stonithd sends a request to another node,
>> and two or more STONITH plugins are executed in parallel.
>> This is a troublesome problem.
>> I think the most suitable value for this timeout would be
>> the sum of the "stonith-timeout" values of the STONITH plugins on the node
>> that is going to receive the STONITH request from the DC node.
>
> This would probably be very difficult for the CRM to get.
Right, I agree with you.
By the following sentence, "But DC node can't know that...",
I meant that it is difficult because stonithd on the DC can't know the values of
stonith-timeout on the other nodes.
>
>> But DC node can't know that...
>> I would like to hear your opinions.
>
> Sorry, but I couldn't exactly follow. Could you please describe
> it in terms of actions.
Sorry, let me restate what I meant.
Strictly speaking, the timeout for which stonithd on the DC waits for a reply
from another node's stonithd needs to be longer than the sum of the
"stonith-timeout" values of the STONITH plugins on that node.
But it is very difficult for DC-stonithd to obtain those values.
So I would like to hear your opinion about what a suitable and practical
value would be for this timeout, which is set in insert_into_executing_queue().
I hope this conveys what I want to say.
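
To make the requirement above concrete, here is a minimal sketch in Python
(it is not stonithd code). The function name, its parameters, the default
margin, and the example values are all hypothetical. stonithd on the DC
currently has no way to obtain the per-plugin values, which is exactly the
difficulty described above.

```python
def min_dc_wait_timeout(plugin_timeouts_s, margin_s=1):
    """Return a DC-side wait (seconds) longer than the sum of the
    per-plugin stonith-timeout values on the receiving node.

    plugin_timeouts_s: hypothetical list of each plugin's timeout, in seconds
    margin_s: hypothetical extra slack so the wait is strictly longer
    """
    if not plugin_timeouts_s:
        raise ValueError("at least one plugin timeout is required")
    if any(t <= 0 for t in plugin_timeouts_s):
        raise ValueError("plugin timeouts must be positive")
    if margin_s <= 0:
        raise ValueError("margin must be positive")
    # Worst case: every plugin is tried in turn and each one uses its
    # full timeout before the next one starts.
    return sum(plugin_timeouts_s) + margin_s
```

With two plugins of 60 s and 30 s on the receiving node, the DC would have
to wait longer than 90 s before treating the request as failed.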
For reference, I have attached logs from a run in which the aforementioned timeout occurred.
The cluster has 3 nodes.
When the DC was about to be STONITH'ed, the DC sent a request to all of the non-DC nodes,
and all of them tried to shut down the DC.
Then the timeout on DC-stonithd occurred and DC-stonithd sent the same request again,
so two or more STONITH plugins ran in parallel on every non-DC node.
(Please see sysstats.txt.)
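
The sequence above can be illustrated with a deliberately simplified model
in Python. It assumes (this is an assumption for illustration, not something
confirmed from the stonithd source) that DC-stonithd re-sends the request
each time its wait expires, and that the receiving node starts a new plugin
run for every request without cancelling the earlier one. All names and
numbers are hypothetical.

```python
def requests_before_first_result(dc_wait_s, remote_run_s):
    """Count the requests the DC sends before the first remote run ends.

    dc_wait_s: how long DC-stonithd waits before re-sending (seconds)
    remote_run_s: how long one plugin run takes on the receiving node
    Under the assumptions stated above, the returned count is also the
    number of plugin runs overlapping on that node.
    """
    if dc_wait_s <= 0 or remote_run_s <= 0:
        raise ValueError("durations must be positive")
    sent = 0
    t = 0  # time at which the next request would be sent
    while t < remote_run_s:
        sent += 1
        t += dc_wait_s
    return sent
```

In this model a DC-side wait of 60 s against a 90 s plugin run gives two
overlapping runs, while a wait of 90 s or more gives only one.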
I want to make clear whether the current behavior is expected or a bug.
But I think the root of every problem is that the node which sends the STONITH
request and waits for completion of the op is the one being killed.
Regards,
Satomi TANIGUCHI
>
> Thanks,
>
> Dejan
>
>> Best Regards,
>> Satomi TANIGUCHI
>
>
>> _______________________________________________________
>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> Home Page: http://linux-ha.org/
>
["hb_report.tar.gz" (application/x-gzip)]