'Re: Test email hitting BAYES_00'


List:       spamassassin-users
Subject:    Re: Test email hitting BAYES_00
From:       Karsten Bräckelmann <guenther () rudersport ! de>
Date:       2013-07-24 22:59:07
Message-ID: 1374706748.4797.56.camel () monkey

On Wed, 2013-07-24 at 15:15 +0200, Simon Loewenthal wrote:
> I rewrote this (not GTUBE anymore) and had the same bayes score
> http://pastebin.com/ATqch32Y

Simon, it seems you misunderstand Bayes and how it works. Quoting parts
of the mail body from that paste:

> You should send this from outside of your car.

So you "rewrote" the body, or rather modified it slightly, replacing a
few words. Note that it *did* change the result: it now hits BAYES_20
rather than BAYES_00 as before.


First things first: Your (original) test message is *based* on GTUBE. It
isn't GTUBE at all, though -- which led to quite some confusion in this
thread.

Starting off with GTUBE (the mail), you stripped what GTUBE actually is:
The weird 68-byte string. Then you took that message, which started off
as the test message to verify mail gets passed through SA, and continued
using it as a test message -- modifying it to match your own rules.

Nothing wrong with that, though I'd suggest removing from the mail body
the claim that it constitutes GTUBE (descriptive text that was actually
meant as instructions).


Send it from outside of your car. Send it from outside of your network.

This summarizes the "rewriting" you just did, in hopes for SA to
magically stop hitting low Bayes rules.

You're barking up the wrong tree -- you should outright ignore the Bayes
score when you are actually testing your own rules. Granted,
short-circuiting on BAYES_00 got in the way, resulting in your own rules
not being tested at all and thus not matching. [1]

Solution: Disable short-circuiting, or prevent your test mail from
sporting a really low Bayes score. Getting to that next.
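
As a rough sketch of the first option: assuming the short-circuit comes
from the stock Shortcircuit plugin and was switched on for BAYES_00
somewhere in the site config (both are assumptions, the actual config
was not posted), either removing that line or overriding it in local.cf
along these lines should take it out of the picture:

```
ifplugin Mail::SpamAssassin::Plugin::Shortcircuit
  # Assumption: an earlier "shortcircuit BAYES_00 ham" (or similar) is in effect.
  shortcircuit BAYES_00 off
endif
```

With that in place, the remaining rules, including your own, get
evaluated even when BAYES_00 hits.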


Bayes, more precisely the Bayesian probability, is the likelihood of the
mail being spam, as opposed to solicited, wanted, hammy, you name it.
The lower the number in the BAYES_nn rules, the more hammy. Also, the SA
Bayes implementation considers tokens. Words, to keep it simple. It does
not recognize or consider sentences, nor multi-word tokens.
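
To illustrate the per-word view, here is a minimal sketch in Python. The
tokenize() helper is hypothetical and far cruder than SA's real
tokenizer; it only shows that, word by word, the two versions of the
sentence differ in a single token.

```python
import re

def tokenize(text):
    """Crude stand-in for a Bayes tokenizer: lowercase single words only."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

original = tokenize("Send it from outside of your network.")
modified = tokenize("Send it from outside of your car.")

# Tokens unique to each version; everything else is shared.
print("only in original:", sorted(original - modified))
print("only in modified:", sorted(modified - original))
print("shared:", sorted(original & modified))
```

Since no sentence or word order is kept, the only thing Bayes can react
to here is the one swapped word.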

Example. See that quote above, and how you modified it. Let's assume
that'd be the complete mail text and all Bayes gets to see. Keeping in
mind the BAYES_nn change from 00 (less than 1% chance of spam) to 20
(between 5% and 20% chance of spam), we see the impact of that change.

Namely, the word "network" in a mail makes it much more likely to be ham
than the word "car" does. You're either not into cars, or just don't
like Lisp...
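
A toy calculation shows how a single token can move a mail from one
BAYES_nn rule to another. Everything numeric in TOKEN_SPAM_PROB is made
up for illustration, and combine() uses plain naive Bayes combining as a
simple stand-in; SA's real combining is different and more involved. The
ranges in BUCKETS are meant to mirror the stock BAYES_00 to BAYES_99
rule definitions.

```python
from math import prod

# Upper bounds (probability of spam) for each BAYES_nn rule.
BUCKETS = [(0.01, "BAYES_00"), (0.05, "BAYES_05"), (0.20, "BAYES_20"),
           (0.40, "BAYES_40"), (0.60, "BAYES_50"), (0.80, "BAYES_60"),
           (0.95, "BAYES_80"), (0.99, "BAYES_95")]

# Hypothetical per-token spam probabilities, NOT taken from any real database.
TOKEN_SPAM_PROB = {"send": 0.3, "outside": 0.3, "network": 0.05, "car": 0.5}

def combine(probs):
    """Naive Bayes combination of per-token spam probabilities."""
    spam = prod(probs)
    ham = prod(1.0 - p for p in probs)
    return spam / (spam + ham)

def bucket(p):
    """Map a spam probability to the matching BAYES_nn rule name."""
    for upper, name in BUCKETS:
        if p < upper:
            return name
    return "BAYES_99"  # everything from 0.99 upwards

def score(tokens):
    """Combine the token probabilities and look up the matching rule."""
    p = combine([TOKEN_SPAM_PROB[t] for t in tokens])
    return p, bucket(p)

p_net, rule_net = score(["send", "outside", "network"])
p_car, rule_car = score(["send", "outside", "car"])
print(f"network: {p_net:.4f} {rule_net}")
print(f"car:     {p_car:.4f} {rule_car}")
```

With these invented numbers the "network" variant stays below 1% and
the "car" variant lands between 5% and 20%, the same kind of jump as in
your two test runs. Whatever the probability, bucket() always returns
exactly one rule name.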

How does that help prevent BAYES_00 hits? Remove hammy tokens from your
test message. Even better, remove all the text. That is, the GTUBE
instructions you kept. You don't want it evaluated; your focus is on
testing your own rules -- so why keep it around?


In closing, there is absolutely no problem with your test mail hitting
BAYES_00, unless these words and tokens are what all your spam looks
like...


[1] Taking a guess, that's probably why you had the impression Bayes
    might not have been hit at all before. Which is really unlikely.
    There's always a BAYES_nn rule indicating the Bayesian probability
    on a scale ranging from ham to spam.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
