List:       spamassassin-users
Subject:    RE: False Positive on SUBJECT_FUZZY_TION rule
From:       "Michael Hutchinson" <mhutchinson () manux ! co ! nz>
Date:       2008-09-30 23:33:18
Message-ID: 20DCA88E5CF1CE418824A867074281D20124346D () bigguy ! lhd ! manux ! co ! nz

> -----Original Message-----
> From: Ned Slider [mailto:ned@unixmail.co.uk]
> Sent: 1 October 2008 12:15 p.m.
> To: users@spamassassin.apache.org
> Subject: Re: False Positive on SUBJECT_FUZZY_TION rule
> 
> Ned Slider wrote:
> > Hi List,
> >
> > I'm getting some FP hits against the SUBJECT_FUZZY_TION rule in
> > 25_replace.cf (SA 3.2.5, latest update):
> >
> >
> > header SUBJECT_FUZZY_TION       Subject =~ /<post P3>(?!tion)<T><I><O><N>/i
> > describe SUBJECT_FUZZY_TION     Attempt to obfuscate words in Subject:
> > replace_rules SUBJECT_FUZZY_TION
> >
> >
> > is hitting on ham from a mailing list with the following subject line:
> >
> > Subject: Re: [CentOS] mount UFS partition on CentOS 5.
> >
> > My regex isn't good enough to understand exactly what this rule is
> > trying to achieve, but it looks to me like it targets some kind of
> > obfuscation of "tion" within a word. To my untrained eye it appears to
> > be hitting on "partition" in this case. A test email containing just
> > the text "partition" in the subject line also hits this rule, which
> > would appear to confirm my assumption.
> >
> > Could anyone help me understand what this rule is designed to hit,
> > and why it's hitting in this case?
> >
> > Thanks.
> >
> 
> 
> Replying to my own thread...
> 
> I'm assuming this rule is interpreting "tition" as an obfuscation of
> "tion" hence why it hits against "partition" as if it were an
> obfuscation of "partion".
> 
> Looking at some very crude stats for this rule against a recent corpus
> of ~1700 ham and ~1800 spam on my server, I see 13 FP hits against ham
> and only 1 hit against spam (an obfuscation of "erection"). Admittedly,
> my ham corpus was a technical mailing list likely to contain the term
> "partition" given its common usage within IT, and triggering the rule
> in no way got close to tagging any ham as spam.
> 
> Anyway, to me this rule doesn't appear to represent good value so I'll
> probably just adjust the score to 0.001 and monitor it unless someone
> can suggest a method to prevent it hitting against legitimate words
> such as partition.

Hello Ned.

Lowering the score to a value too small to affect the total is a good
way to test any rule. Your corpus test shows the rule hitting far more
ham than spam (13 hits against 1), so it clearly isn't working for your
site. If it were my site, I'd disable the rule on the strength of that
test.
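
For reference, either option is a one-line override in your local
configuration. This is only a sketch: I'm assuming the overrides go in
local.cf under your site config directory (often
/etc/mail/spamassassin/, but that varies by install). Pick one of the
two score lines, not both.

```
# Option 1: keep the rule running for monitoring, with a negligible score
score SUBJECT_FUZZY_TION 0.001

# Option 2: disable the rule entirely (a score of 0 stops it being run)
score SUBJECT_FUZZY_TION 0
```

Running "spamassassin --lint" afterwards should confirm the config
still parses, and spamd will need a restart if you use it.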

Cheers,
Mike

