List:       asterisk-dev
Subject:    Re: [asterisk-dev] Proposal to seperate qualify & keep alive
From:       John Todd <jtodd () loligo ! com>
Date:       2006-06-28 18:56:22
Message-ID: p06230937c0c75c423650 () [204 ! 91 ! 156 ! 3]

[top-posting madness continued]

Although several client-side devices (Linksys/Sipura) support
keep-alives, in my experience this is not typical, and it is
probably not desirable.  While I understand the desire to have
client-side issues handled by the client, it seems to me a poor
idea to let end-users decide how many DoS-look-alike packets
they want to send to the SIP proxy network.  From the perspective
of the service provider, this seems to be a function best handled
by the server.  I don't quite buy the argument "Well, we lock the
devices so the clients never have the option to change anything,"
since this is broken on another level that I won't go into here.

In any case, I don't know how those keep-alives would work other
than as an OPTIONS request or something else that generates a
"reply", since the outside NAT port number is unknown to the client
and can only be known by the server.  A non-reply message seems to
me to work only from server->NAT->client and not the other way
around.  OPTIONS seems fairly "heavy" just to keep NAT translations
alive, so I still prefer a non-SIP message to keep mappings open.
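
To make the "non-SIP message" idea concrete, here is a minimal Python
sketch of a one-way keep-alive datagram.  It is an illustration only,
not anything Asterisk or SER is known to do: the function name, the
double-CRLF payload, and the loopback address are all assumptions.
The one point the sketch relies on is that the datagram leaves from
the same local socket the SIP traffic uses, so that it matches the
existing NAT mapping.

```python
import socket

# Hypothetical payload: a bare double CRLF, chosen only because it is small.
KEEPALIVE_PAYLOAD = b"\r\n\r\n"


def send_nat_keepalive(sock, peer_addr, payload=KEEPALIVE_PAYLOAD):
    """Send one fire-and-forget datagram to the peer's NAT-mapped address.

    No reply is expected and no per-peer state is stored.
    Returns the number of bytes handed to the socket.
    """
    return sock.sendto(payload, peer_addr)


if __name__ == "__main__":
    # In a real server this would be the socket already bound to the SIP port.
    sip_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # 127.0.0.1:5060 stands in for the public IP and port the server
    # saw when the peer last registered.
    send_nat_keepalive(sip_sock, ("127.0.0.1", 5060))
    sip_sock.close()
```

Because nothing waits for an answer, the cost per peer is a single
small send, which is the property that makes this lighter than an
OPTIONS transaction.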

Olle has helpfully put some comments in the code marking a place
where someone could contribute code that would smooth the OPTIONS
requests to SIP entities instead of bursting them, but I think
there's still a gap between the methodology currently used and what
an ideal/preferred set of configuration choices might be (see my
message below.)
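
One way such smoothing could look, sketched in Python rather than
taken from the Asterisk source: give each peer a fixed offset inside
the qualify interval so the requests are spread out instead of sent
in one burst.  The function name and the even-spacing policy are
assumptions for illustration only.

```python
def staggered_offsets(peer_names, interval_s=60.0):
    """Map each peer to the offset (in seconds) of its first OPTIONS send.

    Peers are spaced evenly across one interval, so with N peers one
    request goes out every interval_s / N seconds instead of N at once.
    """
    peers = sorted(peer_names)
    if not peers:
        return {}
    step = interval_s / len(peers)
    return {name: index * step for index, name in enumerate(peers)}


if __name__ == "__main__":
    for name, offset in staggered_offsets(["alice", "bob", "carol", "dave"]).items():
        print(f"{name}: first OPTIONS at +{offset:.1f}s, then every 60s")
```

With four peers and a 60 second interval the offsets come out as 0,
15, 30 and 45 seconds.  A random offset per peer would also work and
needs no knowledge of the full peer list.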

Lastly, if you can reproduce the bug you describe below in the
SVN-TRUNK tree, then you should by all means open a ticket in the
bugtracker as this is a serious problem (peers becoming unreachable
would be 'serious' in my book.  ;-)

PS: Sorry if some of these points have already been brought up; this
mail is being written off-line and will be delivered in a day or so
after final authoring.

JT


At 4:45 PM -0500 6/27/06, John Lange wrote:
>
>On a side note; There seems to be something wrong with qualify=5000. We
>have about 50 clients on a machine most of which have qualify=yes, the
>rest are qualify=5000. The ones set to 5000 simply "disappear" with no
>message logged and nothing mentioned in the console. "sip show peers"
>shows "Unspecified" for the Host and "Unknown" for the status (as if the
>device had never registered) instead of the normal "Unreachable" or
>"lagged".
>
>I'm starting to suspect a bug in qualify. Anyone else have issues like
>this?
>
>Back on topic, just to play devil's advocate with my own suggestion; It
>should be noted that at least some devices have a keep-alive option of
>their own.
>
>So, in our case we seem to be making progress on this problem of
>satellite latency by setting qualify=no, nat=yes and a 5 second
>keep-alive on the client device.
>
>The question then becomes, if most client devices support keep-alive is
>there still a purpose to having it on the server side as well? How many
>client devices support keep-alive? I know Linksys products do but I
>haven't looked into others yet.
>
>I would advocate yes to server-side keep-alive simply because it offers
>the most flexibility. You never know when it might make more sense to do
>it server-side instead of client-side.
>
>John
>
>On Tue, 2006-06-27 at 07:30 +0200, Loic DIDELOT wrote:
>>  Hi,
>>  I actually like the idea of separating keep-alives from the qualify.
>>  Defining the frequency of the packets is very important to adapt the
>>  asterisk behaviour according to the customers one has. This would solve
>>  many of our problems. Are there more people who need this? Is there a
>>  way to get this developed and include it in asterisk 1.4? voipGATE would
>>  be interested in cosponsoring this feature but only if it will be
>>  included in the 1.4 stable release and if available for IAX as well.
>>
>>  Best regards,
>>  Loic Didelot.
>>
>>
>>  On Mon, 2006-06-26 at 13:06 -0700, John Todd wrote:
>>  > At 8:32 PM +0200 6/26/06, Johansson Olle E wrote:
>>  > >On 26 Jun 2006, at 19:23, John Lange wrote:
>>  > >
>>  > >>In the current implementation, qualify sends out a SIP request at the
>>  > >>specified interval and if it doesn't receive a reply within that same
>>  > >>interval asterisk flags the peer as unreachable.
>>  > >>
>>  > >>This also acts as a sort of keep-alive for devices behind NAT when
>>  > >>combined with the nat=yes parameter. The regular flow of SIP packets
>>  > >>keeps the NAT connection alive for the device behind the firewall.
>>  > >>
>>  > >>The problem is, these are two very different concepts and at times it
>>  > >>would be nice if we could separate the two.
>>  > >>
>>  > >>Specifically; we have some clients with devices behind nat and
>>  > >>satellite. Their nat and satellite requires a more-or-less constant flow
>>  > >>of packets to keep the connection alive.  However due to the quirky
>>  > >>nature of satellite combined with long round-trip times the qualify
>>  > >>option needs to be set high (5000ms) or Asterisk won't send calls to the
>>  > >>client.
>>  > >>
>>  > >>In fact we would like to set qualify=no because often the client appears
>>  > >>to be very lagged when the satellite perceives the connection to be idle
>>  > >>(apparently it queues packets until it has a bunch and sends them in
>>  > >>groups) but if you initiate a call the lag drops immediately to an
>>  > >>acceptable level (800ms).
>>  > >>
>>  > >>But if we set qualify=no then the firewall closes the connection and
>>  > >>they can't receive any calls.
>>  > >>
>>  > >>So, the question is; is it reasonable to undertake the implementation of
>>  > >>a keep alive for sip clients?
>>  > >>
>>  > >>Any thoughts on how this should be done? SIP NOTIFY or would something
>>  > >>else make more sense?
>>  > >
>>  > >I don't see a reason for changing method. We should probably find a way
>>  > >to override and be able to dial out regardless of the monitoring status.
>>  > >That seems like a simple fix.
>>  > >
>>  > >/O
>>  >
>>  > I would actually agree that the two functions should be separated.  I
>>  > find myself often in the same position, where the use of "qualify="
>>  > is used as a NAT mapping tool only, and I don't particularly care
>>  > about the actual milliseconds of response time to the request.  I
>>  > also think we would be well-served to make these timers a bit more
>>  > flexible, since right now everyone is in the "same bucket" as far as
>>  > timing goes for how frequently OPTIONS requests are sent.  I'd like
>>  > to be more aggressive for foolish people who have poorly-configured
>>  > firewalls that close NAT UDP sessions after 30 (or fewer) seconds,
>>  > and currently the only way to do this is to change the code to send
>>  > ALL of my OPTIONS requests much more frequently, which eventually
>>  > leads to a huge amount of nonsense noise on my network to solve for a
>>  > few poorly behaved clients.
>>  >
>>  > SER sends "bogus" packets fairly frequently as part of its NAT
>>  > module, and this seems to work well.
>>  >
>>  > The current method in Asterisk has a few downsides:
>>  >
>>  >    1) OPTIONS packets are larger than just simple UDP keepalives (but
>>  > not by much)
>>  >
>>  >    2) OPTIONS requests require stateful storage of status, so if I
>>  > have 6000 SIP "peers" each using "qualify=", then Asterisk needs to
>>  > set aside a fairly large amount of memory to track each one of
>>  > those transmitted OPTIONS requests, and if at any time there are
>>  > 10% of those peers which are slow to respond (say, two cycles) then I
>>  > have a huge backlog of stateful requests in queue.  If a UDP packet
>>  > that did not require return receipt was sent just for NAT keepalives,
>>  > this would be much lighter weight, and we could move the "heavier"
>>  > OPTIONS request interval to a larger time value.
>>  >
>>  >    3) The current OPTIONS request is bursty, and all of the OPTIONS
>>  > are sent in 60 second intervals using the same interval timer.  This
>>  > is really ugly, with big spikes of data every 60 seconds.  This
>>  > should probably be distributed so that each entry has its own timer.
>>  >
>>  >
>>  > I propose a different way to do this, with an example out of sip.conf
>>  > listed below.  I know that this will require the creation of memory
>>  > space for each of these timers (and a whole slew of timer-related
>>  > issues internally to Asterisk) but it does seem like it would be more
>>  > flexible to do it this way and may reduce the amount of processing
>>  > for the OPTIONS requests if just lightweight UDP can be sent for NAT
>>  > translations.  With this method, I could possibly crank up the
>>  > OPTIONS qualifiers to something like 5 minutes, but leave the NAT
>>  > translation keepalives down at 20 seconds and hopefully see less load
>>  > on my Asterisk servers and network with large numbers of REGISTER'ed
>>  > hosts.  This is all kind of pointless for 20 users, but Asterisk is
>>  > no longer being used only for sites with double or triple-digit
>>  > numbers of users, and it makes a difference at scale.
>>  >
>>  >
>>  > ; Hypothetical sip.conf settings for "new" qualify/NAT timers
>>  > ;
>>  > ; Send OPTIONS requests to measure latency (450ms in this ex.)
>>  > ;  every 120 seconds.  The qualifytime timer starts based
>>  > ;  on the time the last REGISTER was successfully parsed, or
>>  > ;  if a static IP host, then based on the time the entry was
>>  > ;  parsed in this file plus a random number of seconds not
>>  > ;  greater than the value in "qualify=".  If "qualify="
>>  > ;  is non-zero but there is no "qualifytime=", then default
>>  > ;  of qualifytime is 60 seconds.  If "qualifytime=" is
>>  > ;  non-zero but there is no "qualify=", then qualify defaults
>>  > ;  to 500 milliseconds.
>>  > qualify=450
>>  > qualifytime=120
>>  > ;
>>  > ; Send very minimal, one-way packets to hosts in order
>>  > ;   to keep NAT translations open.  Send once every 20 seconds.
>>  > ;   No default value.
>>  > nat-keepalive=20
>>  > ;
>>  >
>>  >
>>  > JT
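
The defaulting rules described in the hypothetical sip.conf comments
quoted above can be sketched in a few lines of Python.  This is only
a reading of the proposal, not existing Asterisk behaviour: the
function name is made up, and the 500 millisecond default is assumed
to apply to "qualify=" since that is the setting measured in
milliseconds.

```python
def resolve_qualify(qualify_ms=None, qualifytime_s=None):
    """Apply the proposed defaults and return (qualify_ms, qualifytime_s).

    qualify_ms    -- latency threshold in milliseconds ("qualify=")
    qualifytime_s -- seconds between OPTIONS requests ("qualifytime=")
    A missing or zero value means the option was not set.
    """
    if qualify_ms and not qualifytime_s:
        qualifytime_s = 60   # proposed default interval
    elif qualifytime_s and not qualify_ms:
        qualify_ms = 500     # assumed default threshold
    return (qualify_ms or 0, qualifytime_s or 0)


if __name__ == "__main__":
    print(resolve_qualify(450, 120))               # both set explicitly
    print(resolve_qualify(qualify_ms=450))         # interval falls back to 60
    print(resolve_qualify(qualifytime_s=120))      # threshold falls back to 500
```

If neither option is set the function returns (0, 0), meaning no
OPTIONS monitoring at all; "nat-keepalive=" is independent of both
and has no default in the proposal.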
