List:       spambayes-bugs
Subject:    [spambayes-bugs] [ spambayes-Patches-1532862 ] Count runs of short
From:       noreply () sourceforge ! net (SourceForge ! net)
Date:       2006-08-02 2:14:40
Message-ID: E1G86Fs-00077X-9T () sc8-sf-web3 ! sourceforge ! net

Patches item #1532862, was opened at 2006-08-01 21:14
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=1532862&group_id=61702

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Nobody/Anonymous (nobody)
Summary: Count runs of short 'words'

Initial Comment:
I don't believe I submitted this before.  A common spam
technique of relatively recent vintage is to spell spam
words with embedded spaces.  In SpamBayes at least, the
resulting fragments are too short to be tokenized and are
thus skipped.  This patch generates tokens based on the
longest run of such short 'words' seen in a message.
At the moment it does not seem to help much:

token,nspam,nham,spam prob
short:5,0,1,0.155172413793
short:4,1,2,0.158641753503
short:3,6,2,0.5
short:2,16,6,0.393162750975
short:1,138,31,0.5
short:0,52,9,0.5

but I seem to recall that when I first tried it, it helped.
Including here for completeness in case someone
wants to test it out.
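
For readers who want to experiment, the idea described above can be
sketched roughly as follows. This is not the code from the patch: the
function name short_run_token, the max_short_len cutoff of 2 characters,
and the use of the raw run length in the token are all assumptions made
for illustration. The actual patch may define "short" differently or
bucket the counts.

```python
def short_run_token(text, max_short_len=2):
    """Return a 'short:N' token for the longest run of short words.

    Hypothetical helper: a word counts as "short" when it has at most
    max_short_len characters (an assumed cutoff, not taken from the patch).
    """
    longest = 0   # longest run of consecutive short words seen so far
    current = 0   # length of the run currently being scanned
    for word in text.split():
        if len(word) <= max_short_len:
            current += 1
            longest = max(longest, current)
        else:
            current = 0   # a normal-length word ends the run
    return "short:%d" % longest


if __name__ == "__main__":
    # Illustrative input: a word spelled out with embedded spaces.
    print(short_run_token("buy v i a g r a now"))
    print(short_run_token("hello world"))
```

Ordinary prose also contains runs of short words ("I am at a"), which
may be one reason the token counts above do not separate ham from spam
very well.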


----------------------------------------------------------------------

