List: spamassassin-users
Subject: Re: Help tagging URL spam
From: Alex <mysqlstudent () gmail ! com>
Date: 2012-01-03 0:39:15
Message-ID: CAB1R3siSFvwX3hfL0X0sPxWXF4YACgeij5m+svpGfMNY_E70qA () mail ! gmail ! com
Hi,
>>>> http://pastebin.com/raw.php?i=1Y5QCkfh
>>>> http://pastebin.com/raw.php?i=KdmZXM0d
...
>> What I haven't been able to figure out is a more generalized pattern
>> from these, such as something in the header that is inconsistent with
>> non-spam or contains some type of invalid header data, for example the
>> mismatch between originating at yahoo and being sent as
>> sbcglobal.
>>
>> Shouldn't bayes have picked this up after learning a dozen or more of
>> these?
>>
>
> IMHO, yes. Are you sure you are training bayes correctly? Are you using the
> same user to train bayes as the user that is running SA? Work through some
> of the advice already given regarding bayes.
Yes, I'm pretty sure bayes is solid. I'm autolearning, but with thresholds
of -1 and 13 instead of the defaults, and the database holds about 9.7M
tokens. Since it has been running for a while now, I could probably return
the thresholds to the defaults.
Bayes is in mysql and I have bayes_sql_username set, so it always uses
that database, and there aren't any other databases. I am wondering why
the sync isn't reflected even after running it (the last journal sync
atime below is still 0), but perhaps that's due to mysql?
$ sa-learn --sync
$ sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 383489 0 non-token data: nspam
0.000 0 484418 0 non-token data: nham
0.000 0 9768178 0 non-token data: ntokens
0.000 0 1316487858 0 non-token data: oldest atime
0.000 0 1325550621 0 non-token data: newest atime
0.000 0 0 0 non-token data: last journal sync atime
0.000 0 1325550621 0 non-token data: last expiry atime
0.000 0 5529600 0 non-token data: last expire atime delta
0.000 0 982370 0 non-token data: last expire reduction count
Does the mysql Bayes store even have a journal, or is that managed by mysql?
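For reference, a setup like the one described above (Bayes in MySQL, one shared database, autolearn thresholds of -1 and 13) might look roughly like this in local.cf. This is a sketch, not my actual config: the DSN, database login, password and shared Bayes user name are hypothetical placeholders, and it assumes the stock MySQL BayesStore module. As I understand the docs, bayes_sql_username is only the database login, while bayes_sql_override_username is what pins every scan and every sa-learn run to the same Bayes user.

```
# local.cf sketch; all credentials and names below are hypothetical placeholders
bayes_store_module           Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn                DBI:mysql:sa_bayes:localhost
bayes_sql_username           sa_user
bayes_sql_password           changeme

# Use a single Bayes "user" for every scan and every sa-learn run
bayes_sql_override_username  shared_bayes

# Non-default autolearn thresholds (the defaults are 0.1 and 12.0)
bayes_auto_learn                    1
bayes_auto_learn_threshold_nonspam  -1.0
bayes_auto_learn_threshold_spam     13.0
```

Returning to the defaults would just mean deleting the two threshold lines.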
> sanjit.in is now listed in a couple of URIBLs (URIBL_PH_SURBL &
> URIBL_HOSTKARMA_BL); I don't know if it was listed at the time you received
> them.
Yes, for me too.
> They hit some local meta rules I have combining FREEMAIL_FROM with
> __HAS_ANY_URI, __MANY_RECIPS, and various missing/blank subject rules. For
> me these are relatively good indicators of FREEMAIL spam.
Yes, the missing/blank subject rules are good triggers for metas.
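A meta along those lines could be sketched as follows. FREEMAIL_FROM and __HAS_ANY_URI are the rule names mentioned above; the __LOCAL_* subrules, the LOCAL_FREEMAIL_URI_NOSUBJ name and its score are hypothetical and would need tuning against local mail before use.

```
# Hypothetical subrules: Subject header missing, or present but blank
header   __LOCAL_HAS_SUBJ     exists:Subject
header   __LOCAL_SUBJ_BLANK   Subject =~ /^\s*$/
meta     __LOCAL_NO_SUBJ      !__LOCAL_HAS_SUBJ || __LOCAL_SUBJ_BLANK

# Freemail sender + at least one URI + no usable Subject (score is a guess)
meta     LOCAL_FREEMAIL_URI_NOSUBJ  FREEMAIL_FROM && __HAS_ANY_URI && __LOCAL_NO_SUBJ
describe LOCAL_FREEMAIL_URI_NOSUBJ  Freemail sender, contains a URI, missing or blank Subject
score    LOCAL_FREEMAIL_URI_NOSUBJ  1.5
```

Keeping the subject test in its own double-underscore subrule means it scores nothing by itself and can be reused in other metas, such as one with __MANY_RECIPS.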
RW <rwmaillists@googlemail.com> wrote:
> RP_MATCHES_RCVD=-1.613,
> IIWY I'd take a look at how RP_MATCHES_RCVD is working for you. A lot
> of us find it does more harm than good. In particular it's adding a
> negative score to AOL, Yahoo, etc.
That's a good idea. I noticed that one of the two messages hit this rule
and the other didn't. I'm not sure I can remove it altogether, because so
many ham messages hit it on my system, but maybe a meta could be built from
it, for example combining FREEMAIL_FROM with the absence of RP_MATCHES_RCVD
(the return path doesn't match the Received headers) and adding points?
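The meta idea above might be sketched like this. FREEMAIL_FROM and RP_MATCHES_RCVD are the stock rules discussed in this thread; the LOCAL_* rule names and the scores are hypothetical. The first rule adds points when a freemail From: does not get RP_MATCHES_RCVD; the second partly cancels RP_MATCHES_RCVD's negative score for freemail senders only, which would address the AOL/Yahoo concern without removing the rule for the ham it helps elsewhere.

```
# Freemail From: but the envelope sender does not match the handover relay
meta     LOCAL_FREEMAIL_NO_RP_RCVD  FREEMAIL_FROM && !RP_MATCHES_RCVD
describe LOCAL_FREEMAIL_NO_RP_RCVD  Freemail From without RP_MATCHES_RCVD
score    LOCAL_FREEMAIL_NO_RP_RCVD  1.0

# Partly offset the negative RP_MATCHES_RCVD score (-1.613 here) for freemail senders
meta     LOCAL_FREEMAIL_RP_OFFSET   FREEMAIL_FROM && RP_MATCHES_RCVD
describe LOCAL_FREEMAIL_RP_OFFSET   Offsets RP_MATCHES_RCVD for freemail senders
score    LOCAL_FREEMAIL_RP_OFFSET   1.0
```

Whether either rule helps depends on how often local ham from freemail senders lacks RP_MATCHES_RCVD, so both scores are placeholders to check against a corpus.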
Thanks again,
Alex