List:       linux-ha-dev
Subject:    Re: [Linux-ha-dev] ordered messages - some behaviors
From:       "Zhu, Yi" <yi.zhu () intel ! com>
Date:       2003-09-13 9:42:37

On Fri, 12 Sep 2003, Alan Robertson wrote:

> 
> Hi,
> 
> There are a few interesting behaviors of the ordered message delivery code
> that are easier to see now that I've incorporated the code in context in CVS...

All the problems described below have the same root cause: the ordered
message layer depends on "reliable" message delivery from heartbeat, but if
the receiving client starts after some packets have already arrived, those
packets are not "reliable" from that client's point of view. How to deal
with these packets is what we are discussing below.

> 1) If node "A" sends an ordered message to node "B", and the "B" client
> application connects after "A" sends the message, and "A" is waiting for a
> reply from "B", that reply will never come - for a couple of reasons.  And,
> if A sends it again, it still won't be received - because "B" will be
> waiting for the first message...

Yes, this is a problem. But the client should not handle message
retransmission itself in this example, because that is handled by the
low-level heartbeat protocol. The only reason node "A" retransmitted the
first packet was to find out whether client "B" had started during the
"timeout" interval. However, this confuses the low-level protocol, because
from the point of view of the ordered message layer, every packet from a
client is distinct (unique) and must be delivered to the destination client
(if it exists) reliably and in order. I agree that some method must be
provided to let "A" know whether "B" has started.
 
> It seems that one shouldn't send ordered messages to an individual node
> until the application on the destination node "signs in" - or acknowledges a
> sign-in.  And, (I think) signing in and the sign-in reply should occur using
> an unordered message.

It _can_ send, but the message is not guaranteed to be received. If the
sender client requires that every message be received by the receiver,
the sender should first make sure the receiver is "signed in".

Do you think this is something like UDP versus TCP? While we were UDP-like,
there was no requirement that a receiver client be listening on the port
before the sender sent. Now that we have added some TCP features (like
ordering), we do require a receiver to be listening on that port before the
sender sends. But because we never defined the connect (sign-in) semantics,
we have some problems. So now we are defining them. ;-)
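
To make the sign-in idea concrete, here is a rough sketch of a sender-side
gate, written in Python only for brevity. Everything in it (OrderedSender,
on_signin, must_arrive, the wire list) is a made-up name for illustration
and not a heartbeat API. It assumes the sender learns about sign-ins through
some unordered message.

```python
class OrderedSender:
    """Hypothetical sender-side view of the sign-in rule (not a heartbeat API)."""

    def __init__(self):
        self.signed_in = set()  # nodes whose client has signed in
        self.next_seq = {}      # per-destination sequence counter
        self.wire = []          # stands in for the real transport

    def on_signin(self, node):
        # Assumption: sign-in and its reply arrive as unordered messages.
        self.signed_in.add(node)

    def send_ordered(self, node, payload, must_arrive=False):
        # The sender *can* always send, but delivery is only guaranteed
        # once the receiver has signed in. A caller that needs the
        # guarantee asks for it and gets an error instead of a silent loss.
        if must_arrive and node not in self.signed_in:
            raise RuntimeError(f"{node} has not signed in yet")
        seq = self.next_seq.get(node, 0) + 1
        self.next_seq[node] = seq
        self.wire.append((node, seq, payload))
        return seq
```

A client that cannot tolerate a lost message would call send_ordered() with
must_arrive=True and wait for the sign-in if it raises.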

> In the case of cluster-scope messages, one shouldn't expect participation or
> a reply from a given node until it has "signed on".

Yes.

> And, I think there is a related bug here...
> 
> If node A is up, and B signs in, but A hasn't written very many messages
> yet, then B won't receive any messages until at least SEQGAP packets have
> been missed...
> 
> The case of ordered cluster-scoped messages seems to not be handled
> correctly here...

Yes, given that we do not have connect semantics, I have to distinguish
whether the first packet from "A" is delayed or really lost.
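
A small sketch of why the receiver stalls, under simplifying assumptions:
the receiver starts out expecting seqno 1, and it gives up on missing
packets only once the gap between the highest seqno it holds and the one it
expects reaches a threshold. The function name and the exact give-up rule
are mine for illustration; the real SEQGAP handling in the CVS code may
differ in detail.

```python
def receive_without_signin(arrivals, seqgap):
    """Hypothetical receiver that cannot tell 'delayed' from 'lost'."""
    expected = 1        # receiver has no idea where the sender really is
    held = set()        # packets waiting for the gap to be filled
    delivered = []

    def release():
        nonlocal expected
        while expected in held:
            held.remove(expected)
            delivered.append(expected)
            expected += 1

    for seq in arrivals:
        if seq < expected:
            continue    # stale or duplicate
        held.add(seq)
        release()
        # Only after the gap reaches seqgap are the missing packets
        # declared lost and the held packets handed to the application.
        if held and max(held) - expected >= seqgap:
            expected = min(held)
            release()
    return delivered
```

With seqgap=64 and a sender that is currently at seqno 3, the arrivals
3, 4, 5 deliver nothing; the held packets are released only when packet 65
shows up.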

> I would claim that any packets whose sequence number is lower than that of
> the first packet received on a cluster-scoped message channel should just be
> counted as lost - and that's that...

This is not correct. For example, both A and B are up. A sends packets 1,
2, 3 to B. B receives in the order 3, 1, 2. Do you think B should receive
1, 2, 3 or just 3?
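
The difference can be sketched as follows (Python for brevity; both
function names are hypothetical). The first function is plain in-order
delivery from a known starting seqno; the second applies the "everything
below the first packet received is lost" rule on top of it.

```python
def deliver_in_order(arrivals, first_expected):
    """Hold out-of-order packets and release them in seqno order."""
    pending = set()
    delivered = []
    nxt = first_expected
    for seq in arrivals:
        if seq < nxt:
            continue            # below the floor: treated as lost or stale
        pending.add(seq)
        while nxt in pending:
            pending.remove(nxt)
            delivered.append(nxt)
            nxt += 1
    return delivered


def deliver_drop_below_first(arrivals):
    """Proposed rule: the first packet seen sets the floor."""
    if not arrivals:
        return []
    return deliver_in_order(arrivals, first_expected=arrivals[0])
```

For the arrival order 3, 1, 2, deliver_in_order(..., first_expected=1)
hands the application 1, 2, 3, while deliver_drop_below_first() hands it
only 3, even though nothing was actually lost.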

> In fact, there's another kind of bug here...  And this one may not be so
> easy to fix...
> 
> Here's the sequence:
>         packet 4 is received by heartbeat
>         application starts
>         packet 3 is received by heartbeat
>         packet 5 is received - and held onto by heartbeat
>         application will never receive packet 4
>         Eventually (about 64 packets later) it will get excited and deliver
>                 some packets to the application - with gaps in the seqnos...
> 
> It seems to me that this is going to confuse applications...
> 
> What you really want to say is that the application can do two things:
>         inquire about the current sequence number of a given channel/node
>         tell the system that the next packet is the lowest sequence number
>                 they're interested in for communication from a given node
> 
> So for cluster-wide communication channels, the paradigm would be:
> 
>         B sends a signon message to the cluster (unordered)
>                 [Perhaps this could even be the current automatically
>                         generated "b has joined" message]
		 B initially sets the minimum sequence number to 0 for each node
			and begins putting received ordered messages into the
			order queue.
>         Each node retrieves their current sequence number for the cluster
>                 ordered message channel
>         Each node sends an unsequenced "I see you" message to B with its
>                 current ordered cluster-wide sequence number in it
>         B receives each message and sets the minimum sequence number
>                 for communication from each node.

This is a brilliant idea! It solves all of the above problems.

One addition (please see my inserted lines above) to handle the case where
the "I see you" message is delayed. For example:

	B sends a signon message to A
	A retrieves its current sequence number, which is 6
	A sends an "I see you" reply to B together with "6"
	A sends ordered messages 6, 7 to the cluster
	(B receives 6, "I see you", 7)
	B puts 6 into the order queue (because the current minimum seqno is 0)
	B sees "I see you" from A, sets the current minimum seqno of A's
		order queue to 6 and clears the earlier slots (1~5)
	B puts 7 into the order queue
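
The example above can be sketched as a per-sender queue on B. This is only
an illustration in Python; OrderQueue and its methods are hypothetical
names, and it assumes the number carried in "I see you" is the seqno of the
next ordered message A will send, as in the example.

```python
class OrderQueue:
    """Hypothetical per-sender reorder queue kept by B."""

    def __init__(self):
        self.min_seq = 0      # 0 until "I see you" tells us better
        self.synced = False   # True once "I see you" has arrived
        self.next_seq = None  # next seqno to hand to the application
        self.slots = {}       # seqno -> payload, held until deliverable
        self.delivered = []

    def on_ordered(self, seq, payload):
        if seq < self.min_seq:
            return            # sent before our sign-on: counted as lost
        self.slots[seq] = payload
        self._flush()

    def on_i_see_you(self, current_seq):
        self.min_seq = current_seq
        self.next_seq = current_seq
        self.synced = True
        # Clear the slots below the new minimum.
        for seq in [s for s in self.slots if s < current_seq]:
            del self.slots[seq]
        self._flush()

    def _flush(self):
        if not self.synced:
            return            # keep queueing until the minimum is known
        while self.next_seq in self.slots:
            payload = self.slots.pop(self.next_seq)
            self.delivered.append((self.next_seq, payload))
            self.next_seq += 1
```

Replaying the arrival order 6, "I see you"(6), 7 delivers 6 and then 7.
A packet such as 4 that was queued before "I see you" arrived is dropped
when the slots are cleared.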

> The whole point of all this is that it will now receive packets after a
> point in time when it's guaranteed that none have been missed - and not hear
> them in the middle of some kind of ongoing transaction.

Fully agree.

> 
> --
>      Alan Robertson <alanr@unix.sh>
> 
> "Openness is the foundation and preservative of friendship...  Let me claim
> from you at all times your undisguised opinions." - William Wilberforce
> 
> _______________________________________________________
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
> 
> 

-- 
-----------------------------------------------------------------
Opinions expressed are those of the author and do not represent
Intel Corp.

Zhu Yi (Chuyee)
Intel China Software Lab (ICSL)
22nd Floor, ShanghaiMart Tower No. 2299 Yan'an Road(West)
Shanghai 200336, PRC
Tel: 8621-52574545-1261
Fax: 8621-62366119

GnuPG v1.0.6 (GNU/Linux)
http://cn.geocities.com/chewie_chuyee/gpg.txt or
$ gpg --keyserver wwwkeys.pgp.net --recv-keys 71C34820
1024D/71C34820 C939 2B0B FBCE 1D51 109A  55E5 8650 DB90 71C3 4820
