'Re: A top 10 statistics module?'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       netfilter-devel
Subject:    Re: A top 10 statistics module?
From:       Wang Jian <lark () linux ! net ! cn>
Date:       2005-04-20 16:14:29
Message-ID: 20050420235712.03BD.LARK () linux ! net ! cn
[Download RAW message or body]

Hi Bill Rugolsky Jr.,

Thanks for your input.

On Wed, 20 Apr 2005 10:52:35 -0400, "Bill Rugolsky Jr." <brugolsky@telemetry-investments.com> wrote:

> On Wed, Apr 20, 2005 at 08:40:20PM +0800, Wang Jian wrote:
> > Top10 is used to monitor a while and then disabled. It could be expensive,
> > but is useful to investigate.
> > 
> > I will implement it anyway to complete the task, but before I code, I am
> > willing to listen to any one who has comment and suggestion.
> 
> To be generally useful, the table must be bounded.  That will result in
> inaccurate order statistics, but often that doesn't matter much, if the
> table is an order of magnitude or two larger than the required order
> statistic, (i.e., 100-1000 entries for estimating the top 10).
> 
> Given that the table must be bounded, there needs to be a replacement
> policy once it fills up.  The best choice of replacement strategy of
> course depends on the distribution of new entries; the problem is similar
> to that of the code table management in certain data compression algorithms.
> Heuristics and data structures tend to be somewhat intimate, as fast
> updates are required.
> 
> A common heuristic is to prioritize the entries via a scaled frequency
> that decays with a time (or packet count, etc.) scaling constant, and
> replace the lowest frequency members in the table with the new entries.
> This heuristic lends itself to being implemented with a heap-based
> priority queue whose top is the next entry to be expired.
> Complexity: O(log N)

This is the first thing appears in my mind :)

> 
> Other heuristics derive from various queuing models, the simplest
> being "move-to-front" -- every time an entry is referenced, it is moved
> to the front of the active list; entries are expired from the
> tail of the list. Complexity: O(1)

This is very easy to implement. I will try this first :)

> 
> In the distant past I've used the scaled frequency heuristic to good
> effect on long time-series data.
> 
> Regards,
> 
> 	Bill Rugolsky



-- 
  lark


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic