[prev in list] [next in list] [prev in thread] [next in thread] 

List:       ipfilter
Subject:    Re: IPSTATE_MAX
From:       Charles <charles () cubbyhole ! com>
Date:       1999-06-17 19:10:17
[Download RAW message or body]


In simplified terms...A hash table is a trade off between space and 
time.  When you want to store random data, with the number of elements 
in constant change, that you need to frequently 
access, you can store it in a linear fashion, such as a linked list, 
which are slow to search as the search is linear from front to back.
Or, you can declare an indexed array, which may even need to be 
multidimensional.  You would need to delcare this array in advance with the 
maximum number of elements you expect to have.  This techniqiue is very 
fast, as you can directly index it, but it is also space consuming, as 
your index keys are often spaced widely apart (ie, a class B address 
would require 255x255 elements, most of which would likely never be used, 
at least not all at the same time, and would require a large amount of 
storage).  A 255x255 multidimensional array could consume a large amount 
of space, depending upon the size of each structure that you store.

Enter hash tables.   With a hash table, you might declare an indexed  
array of say, 29 elements.  Then you would use a hash 
formula to store the other 255x255 elements.   The catch here, is that 
you are obviously going to have table collisions, where two elements 
being used at the same time, hash out to the same spot in the indexed array.
At time of collision, you then have other choices to make.  Either start 
a linear search of the indexed array at that point to search for an empty 
element, or start a linked list at that point, which again involves 
linear searching.   With this method, you get some of the advantages of 
an indexed array (your hash gets you most of the way there) without the 
large storage necessary, but you then have the possibility of a 
performance hit.   Therefore, the larger you increase your table, the 
more space you may waste, but the better performance you might get, and 
vice versa.  Which is why, I'm assuming, that he said it should be 
bigger.  I haven't looked at the code, but I'll guess the structures he's 
storing are rather large, so he's trying to save kernel space, and taking 
the performance hit instead.   This may not be correct on high 
bandwidth links, with lots of NAT states to keep.

So I'm going to guess that IPSTATE_SIZE is the size of some indexed 
array, and UPSTATE_MAX is the maximum number of states you want to keep.  
Therefore, increasing "SIZE" will increase the size of the array, 
reducing table collisions, and hence, increasing performance, at the cost 
of cosuming more RAM.  Increasing MAX w/o increasing SIZE, would 
increasing the probablity of table collisions, decreasing performance.

Hopefully, I'm not too far off.   Second year Comp Sci was a long time 
ago :)


>   I'm not quite sure I understand the relationship - in the example,
> IPSTATE_SIZE is approximately 1/7th IPSTATE_MAX.  Is the implication of
> "IPSTATE_SIZE should be ~ 10x that, or more" that, for best performance, the
> relationship should be 1/10th, given a stipulation that IPSTATE_SIZE remains a
> prime number; for instance, if it was determined that 9216 sessions were needed
> (IPSTATE_MAX), then IPSTATE_SIZE should be 929 or 937 or 941?  
> 
>   Thanks.
> 
>   Erick.
> 
> P.S.  Yes, this is still probably a gross oversimplification of the problem,
> I'm just not understanding the syntax as well as I'd hoped :-(
> _________________________________________________________
> Do You Yahoo!?
> Get your free @yahoo.com address at http://mail.yahoo.com
> 
> 

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic