List: dspam-users
Subject: Re: [dspam-users] chained
From: Jeffrey Taylor <jeff.taylor () ieee ! org>
Date: 2008-06-02 13:03:57
Message-ID: 20080602130357.GA14089 () odysseus ! bearhouse ! lan
Quoting Qmail List <listqmail@gmail.com>:
> Hi,
>
> Had googled and found some threads about No such feature 'chained'. Changed
> my config file to Tokenizer chained/ Tokenizer chain. But the error does not
> go away.
>
> @40000000484390a4150f0f54 3358: [06/02/2008 14:18:02] No such feature
> 'chained'
From /etc/dspam.conf and "man dspam":
# Tokenizer: Specify the tokenizer to use. The tokenizer is the piece
# responsible for parsing the message into individual tokens. Depending on
# how many resources you are willing to trade off vs. accuracy, you may
# choose to use a less or more detailed tokenizer:
#   word    uniGram (single word) tokenizer
#           Tokenizes message into single individual words/tokens
#           example: "free" and "viagra"
#   chain   biGram (chained tokens) tokenizer (default)
#           Single words + chains adjacent tokens together
#           example: "free" and "viagra" and "free viagra"
#   sbph    Sparse Binary Polynomial Hashing tokenizer
#           Creates sparse token patterns across sliding window of 5-tokens
#           example: "the quick * fox jumped" and "the * * fox jumped"
#   osb     Orthogonal Sparse biGram
#           Similar to SBPH, but only uses the biGrams
#           example: "the * * fox" and "the * * * jumped"
#
Tokenizer chain
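
To make the tokenizer descriptions above concrete, here is a rough Python sketch of what "word", "chain" and "osb" produce for the examples in the comments. It is only an illustration and not DSPAM's own code: the function names are made up, splitting on whitespace is an assumption (DSPAM has its own delimiter rules), and the osb pairing order is inferred from the two examples in the comment block.

```python
# Illustrative only: hypothetical helpers, not DSPAM's implementation.
# Assumption: tokens are separated by whitespace.

def word_tokens(text):
    """uniGram: every word is a token on its own."""
    return text.split()


def chain_tokens(text):
    """biGram: the single words plus each pair of adjacent words."""
    words = text.split()
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def osb_tokens(text, window=5):
    """Sparse biGrams: pair each word with each of the next window-1
    words, marking every skipped position with '*'."""
    words = text.split()
    out = []
    for i, first in enumerate(words):
        for gap in range(window - 1):
            j = i + gap + 1
            if j >= len(words):
                break
            out.append(" ".join([first] + ["*"] * gap + [words[j]]))
    return out


print(word_tokens("free viagra"))
print(chain_tokens("free viagra"))
print(osb_tokens("the quick brown fox jumped"))
```

With "free viagra" as input, chain_tokens gives the two single words plus "free viagra", which matches the example in the comment for "chain".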
--feature=[chained,noise,tb=N,whitelist]
    Specifies the features that should be activated for this filter
    instance. The following features may be used individually or combined
    using a comma as a delimiter:

    chained : Chained Tokens (also known as biGrams). Chained Tokens
        combines adjacent tokens, presently with a window size of 2, to
        form token "chains". Chained tokens uses additional storage
        resources, but greatly improves accuracy. Recommended as a default
        feature.

    noise : Bayesian Noise Reduction (BNR). Bayesian Noise Reduction kicks
        in at 2500 innocent messages and provides an advanced progressive
        noise logic to reduce Bayesian Noise (wordlist attacks) in spams.
        See http://bnr.nuclearelephant.com for more information.

    tb=N : Sets the training loop buffering level. Training loop buffering
        is the amount of statistical sedation performed to water down
        statistics and avoid false positives during the user's training
        loop. The training buffer sets the buffer sensitivity, and should
        be a number between 0 (no buffering whatsoever) to 10 (heavy
        buffering). The default is 5, half of what previous versions of
        DSPAM used. To avoid dulling down statistics at all during the
        training loop, set this to 0.

    whitelist : Automatic whitelisting. DSPAM will keep track of the entire
        "From:" line for each message received per user, and automatically
        whitelist messages from senders with more than 20 innocent messages
        and zero spams. Once the user reports a spam from the sender,
        automatic whitelisting will automatically be deactivated for that
        sender. Since DSPAM uses the entire "From:" line, and not just the
        sender's email address, automatic whitelisting is a very safe
        approach to improving accuracy especially during initial training.

    sbph : Sparse Binary Polynomial Hashing. Bill Yerazunis' tokenizer
        method from CRM114. Tokenizer method only - works with existing
        combination algorithms.
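
The whitelist rule in the man page text (more than 20 innocent messages and zero spams for the same full "From:" line) can be sketched like this. This is only a guess at the bookkeeping, written to show the rule as described: the class and method names are hypothetical, the counts here are kept in memory for a single user, and DSPAM's real storage and matching will differ.

```python
from collections import defaultdict


class AutoWhitelist:
    """Hypothetical per-user tracker keyed on the entire From: line."""

    def __init__(self, threshold=20):
        # Threshold taken from the man page: "more than 20 innocent messages".
        self.threshold = threshold
        self.counts = defaultdict(lambda: {"innocent": 0, "spam": 0})

    def record(self, from_line, is_spam):
        """Count one message for this exact From: line."""
        key = "spam" if is_spam else "innocent"
        self.counts[from_line][key] += 1

    def is_whitelisted(self, from_line):
        """True only with more than `threshold` innocents and zero spams."""
        c = self.counts.get(from_line)
        return (
            c is not None
            and c["innocent"] > self.threshold
            and c["spam"] == 0
        )


wl = AutoWhitelist()
sender = '"Alice Example" <alice@example.org>'  # made-up sender
for _ in range(21):
    wl.record(sender, is_spam=False)
print(wl.is_whitelisted(sender))   # 21 innocents, no spam
wl.record(sender, is_spam=True)
print(wl.is_whitelisted(sender))   # one reported spam turns it off
```

Because the key is the whole "From:" line, a message with the same address but a different display name starts from zero in this sketch, which is the behaviour the man page gives as the reason the feature is safe.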