
List:       eros-arch
Subject:    new use for resume key
From:       Jonathan Shapiro <jsshapiro@earthlink.net>
Date:       1998-01-27 20:52:23

I clearly did not understand the original proposal.  It sounded to me as
though, under this proposal, one could do the following:

    A sends deferred data to B, who incorporates it (still deferred)
    into a deferred data message to C.

This effectively builds up a chain of indirection blocks.
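
In caricature, with names invented purely for illustration, the chain
looks something like this:

    #include <stdlib.h>
    #include <stddef.h>

    /* Hypothetical deferred-data descriptor: rather than copying the
     * payload, the message carries a reference back to the sender's
     * buffer.  Forwarding it again just adds another link. */
    struct deferred_ref {
        struct deferred_ref *prev;  /* earlier link in the chain, or NULL */
        const void          *base;  /* region in the originating process  */
        size_t               len;
    };

    /* B forwards A's still-deferred data on to C without copying it: */
    struct deferred_ref *
    forward_deferred(struct deferred_ref *from_a,
                     const void *extra, size_t extra_len)
    {
        struct deferred_ref *r = malloc(sizeof *r);
        if (r == NULL)
            return NULL;
        r->prev = from_a;           /* the chain grows by one block */
        r->base = extra;
        r->len  = extra_len;
        return r;
    }

Resolving the data at C then means chasing the whole chain back through
B to A, which is the chain of indirection blocks described above.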

I think that if you want to do this, you are better off implementing some
capability that grants access to a given range of another process's address
space.  This would be difficult in KeyKOS unless such a capability were
grown, but it presents no difficulty in principle.
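
As a sketch only (not a key type that exists in KeyKOS or EROS today,
and the field names are invented):

    #include <stddef.h>

    typedef unsigned long space_key;   /* stand-in for a real key slot */

    /* A range capability: authority over a window [start, start+len)
     * of another process's address space. */
    struct range_cap {
        space_key space;      /* key naming the target address space */
        size_t    start;      /* offset of the window                */
        size_t    len;        /* length of the window                */
        int       writable;   /* or read-only                        */
    };

The recipient holding such a key could pull the deferred data straight
out of the sender's space if and when it actually wants it.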

>Here's my logic in proposing this. I'm considering what the maximum
>string size should be. Restricting it to a few pages isn't a problem
>when the message overhead is small compared to copying that much data;
>you can make repeated calls to pass more data. But I want to consider a
>distributed system. For a call over a network, the message overhead is
>much greater. (The time to copy is also greater, but I'm assuming not as
>much so.) So it becomes necessary to support larger strings for
>efficiency....

Actually, I think you misapprehend where the costs are.  The data movement
itself is relatively fast, and in practice is almost never the dominant
factor.  The setup time to send the message at the source and to transfer
it into the receiving process once it arrives is where all the time goes.
Also, we need to be clear about what overhead is being measured -- is the
messaging synchronous or asynchronous?  My experience with distributed
applications is that you always want to go asynchronous, if only because it
lets you get other useful work done while those latencies are happening.
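
The asynchronous shape of the thing, using POSIX sockets purely as a
stand-in (nothing EROS-specific here, and do_other_useful_work is of
course made up):

    #include <poll.h>
    #include <unistd.h>
    #include <stddef.h>

    extern void do_other_useful_work(void);   /* whatever the app has queued */

    /* Fire off the request, then overlap the round-trip latency with
     * other work instead of blocking on the reply. */
    void request_async(int fd, const void *req, size_t req_len)
    {
        if (write(fd, req, req_len) < 0)       /* send; do not wait */
            return;

        struct pollfd p = { .fd = fd, .events = POLLIN };
        while (poll(&p, 1, 0) == 0)            /* reply not here yet?      */
            do_other_useful_work();            /* ...so get something done */

        /* the reply is now readable on fd */
    }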

Most of the time, the messages will be short -- the statistics on UNIX say
that block net messages are 4K to 16K plus a few words of metadata.  The
rest is either network-related protocol traffic (e.g. route notifications)
or serial traffic.  To first and second order, there is nothing in between.
This reflects the message sizes of NFS (i.e. it is a fact about the
application, not the OS or the net protocol).  Whether these numbers are
appropriate to your applications is a question only you can answer.

Be that as it may, messages beyond a certain size are clearly outliers.
Also, bear in mind that the IP message length limit is 64k, and even the
FDDI-style ethernets (which include the gigabit ethernets, I gather) have a
maximum link-layer frame size of only about 4k.  Beyond this point, your
protocol stack *must* do fragmentation and reassembly, at which point copy
costs in the protocol stack begin to dominate.  Over regular ethernet the
limit is 1500 bytes and change, and over PPP the max useful limit is 128-256
bytes.  Newer TCP/IP stacks do a process called "path MTU discovery" to
learn the most limiting frame size along the path and package things at
that size at the source in order to avoid fragmentation-related
retransmissions.
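
As an illustration only (these are Linux-specific socket options, and
nothing in the argument depends on them):

    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Ask the stack to do path MTU discovery on an already connected
     * socket and report the MTU it has learned for that peer; messages
     * packaged at or below this size avoid fragmentation in transit. */
    int learned_path_mtu(int sock)
    {
        int mode = IP_PMTUDISC_DO;
        setsockopt(sock, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof mode);

        int mtu = -1;
        socklen_t len = sizeof mtu;
        getsockopt(sock, IPPROTO_IP, IP_MTU, &mtu, &len);
        return mtu;
    }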

At the *network* level, what you'd like to do is blast the message out
there and let the recipient demand a resend if they cannot handle it.  On
the recipient side, you might decide to drop the message if your available
inbound buffer pool falls below some watermark, so as to guarantee other service
requirements.  Because the dominant cost is turnaround time between the app
and its local network interface, accepting the message if you can is by far
the best strategy.
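
In other words, something like this on the receive side (the watermark
value is invented):

    #include <stdbool.h>
    #include <stddef.h>

    #define POOL_LOW_WATERMARK  (16 * 1024)   /* bytes held in reserve */

    /* Accept the message if we can; shed load only when taking it
     * would eat into the reserve needed for other service guarantees. */
    bool accept_inbound(size_t msg_len, size_t pool_free_bytes)
    {
        if (msg_len > pool_free_bytes ||
            pool_free_bytes - msg_len < POOL_LOW_WATERMARK)
            return false;          /* drop; the sender will resend */
        return true;               /* accepting is the cheap case  */
    }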

I would expect the same to be true at the process level.

Perhaps a better model is one in which the process can say 'fault me if the
message you intend to send is longer than I've allowed for.'  This requires
an ability to transition from AVAIL->RUN and back in order to invoke the
keeper, or at least to transition from WAIT->AVAIL based on the fault key,
which is pretty easy.

Actually, that last is probably a good idea.
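
In sketch form (the names are invented; this is not the kernel
interface):

    #include <stddef.h>

    struct process {
        size_t max_accepted_len;   /* how much the receiver allowed for */
    };

    extern void invoke_fault_key(struct process *rcvr, size_t needed);
    extern void copy_into_receive_area(struct process *rcvr, size_t len);

    enum deliver_result { DELIVERED, FAULTED_TO_KEEPER };

    /* If the incoming string exceeds what the receiver allowed for,
     * raise a fault to its keeper instead of delivering; the keeper can
     * arrange more space and restart the transfer, or reject it. */
    enum deliver_result deliver_string(struct process *rcvr, size_t msg_len)
    {
        if (msg_len > rcvr->max_accepted_len) {
            invoke_fault_key(rcvr, msg_len);   /* WAIT -> AVAIL via fault key */
            return FAULTED_TO_KEEPER;
        }
        copy_into_receive_area(rcvr, msg_len);
        return DELIVERED;
    }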

>With larger strings, there is a greater need for the ends to
>manage buffers before committing to receive the data. My proposal allows
>a recipient to find out how much data the sender wants to send before he
>has to find the space to store it. The motivation is not so much to get
>the data into the right location the first time, but to get it anywhere
>at all.


Yah, but bear in mind that the link layer is unreliable, so the sender
*must* be prepared to retransmit until an ACK is received.  Maybe you want
to consider doing cross-layer integration so you only have to buffer the
data once?
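
The sender-side obligation in caricature (helper names invented); the
point of buffering the data only once is that this single copy is what
the retransmit loop works from:

    #include <stdbool.h>
    #include <stddef.h>

    extern void send_frame(const void *buf, size_t len);
    extern bool ack_arrived_within(int millis);

    /* Keep the one buffered copy and resend it until the peer ACKs.
     * With cross-layer integration no second copy ever gets made. */
    void reliable_send(const void *buf, size_t len)
    {
        do {
            send_frame(buf, len);              /* transmit the single copy  */
        } while (!ack_arrived_within(200));    /* no ACK yet? send it again */
    }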


shap
