List:       spamassassin-users
Subject:    Re: MySQL and Size Of bayes_expiry_max_db_size
From:       Kris Deugau <kdeugau () vianet ! ca>
Date:       2008-05-28 22:05:59
Message-ID: 483DD747.8090500 () vianet ! ca

Larry Nedry wrote:
> Of course.  But how would I figure out what works best?  How can I tell if
> it is working poorly or very well?

Results.  <g>  Customer/user complaints are always useful (if perhaps 
not really desirable);  customer/user *feedback* is critical on 
anything bigger than a trivial personal or very-small-business system. 
You have to feed in a variety of legitimate email;  finding spam to feed 
in shouldn't be a problem.

> I'm looking for a way to calculate or experimentally find the sweet spot
> for bayes_expiry_max_db_size.  Is there an ideal range?  Or a maximum size?
> What happens if the size is too high?

I've found a setting of 600,000 tokens works pretty well on a smallish 
filter server (about the same hardware class as your system, AKA 
"overkill" <g>).  For the larger cluster, which serves between high 
single-digit and low double-digit thousands of accounts and also filters 
outbound mail, I've been playing with various settings on and off for 
several months now.  I still haven't found a happy balance.
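
For reference, here is a minimal local.cf sketch of the setting under 
discussion.  The 600,000 figure is just the one mentioned above for the 
smaller server, not a recommendation, and the DSN, database name, 
username and password are placeholders made up for illustration:

```
# Keep Bayes data in MySQL (connection details below are placeholders)
bayes_store_module        Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn             DBI:mysql:sa_bayes:localhost
bayes_sql_username        sa_user
bayes_sql_password        changeme

# Target maximum number of tokens kept after an expiry run
# (the stock default is 150000)
bayes_expiry_max_db_size  600000
```

Comparing the "ntokens" line from "sa-learn --dump magic" before and 
after a change should show whether the database is actually growing 
toward the new limit.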

(Side note - This question in various forms has been asked 3 or 4 times 
in the past month or so - could someone who really knows the Bayes 
innards please speak up?  As noted near the beginning of this thread, 
the default number of tokens is too small for anything much bigger than 
purely personal/per-user Bayes.)

Benny Pedersen's reply a few messages back includes a few points that 
made my own experiments become a lot more coherent;  I'll be doing 
further tuning based on that.  At the moment, for my usage, I'm looking 
at ~2M tokens as a floor.

-kgd