
List:       dspam-users
Subject:    Re: [dspam-users] chained
From:       Jeffrey Taylor <jeff.taylor () ieee ! org>
Date:       2008-06-02 13:03:57
Message-ID: 20080602130357.GA14089 () odysseus ! bearhouse ! lan

Quoting Qmail List <listqmail@gmail.com>:
> Hi,
> 
> I googled and found some threads about the "No such feature 'chained'"
> error. I changed my config file to try both "Tokenizer chained" and
> "Tokenizer chain", but the error does not go away.
> 
> @40000000484390a4150f0f54 3358: [06/02/2008 14:18:02] No such feature
> 'chained'

From /etc/dspam.conf and "man dspam":

# Tokenizer: Specify the tokenizer to use. The tokenizer is the piece
# responsible for parsing the message into individual tokens. Depending on
# how many resources you are willing to trade off vs. accuracy, you may
# choose to use a less or more detailed tokenizer:
#   word    uniGram (single word) tokenizer
#           Tokenizes message into single individual words/tokens
#           example: "free" and "viagra"
#   chain   biGram (chained tokens) tokenizer (default)
#           Single words + chains adjacent tokens together
#           example: "free" and "viagra" and "free viagra"
#   sbph    Sparse Binary Polynomial Hashing tokenizer
#           Creates sparse token patterns across sliding window of 5-tokens
#           example: "the quick * fox jumped" and "the * * fox jumped"
#   osb     Orthogonal Sparse biGram
#           Similar to SBPH, but only uses the biGrams
#           example: "the * * fox" and "the * * * jumped"
#
Tokenizer chain
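
To make the difference between the tokenizer modes concrete, here is an
illustrative sketch (not DSPAM's actual C code) of what the "word" and
"chain" tokenizers described above produce for the same input:

```python
# Illustrative only: uniGram ("word") vs. biGram ("chain") tokenization,
# following the descriptions in the dspam.conf comments above.

def unigram_tokens(text):
    """word: tokenize the message into single individual words/tokens."""
    return text.split()

def chain_tokens(text):
    """chain: single words, plus adjacent tokens chained pairwise
    (window size 2), e.g. "free", "viagra", and "free viagra"."""
    words = text.split()
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]

print(unigram_tokens("free viagra"))  # ['free', 'viagra']
print(chain_tokens("free viagra"))    # ['free', 'viagra', 'free viagra']
```

The chained form stores more tokens per message (hence the extra storage
cost noted in the man page) but lets the classifier weigh word pairs, not
just isolated words.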



       --feature=[chained,noise,tb=N,whitelist]
              Specifies the features that should be activated for this
              filter instance.  The following features may be used
              individually or combined using a comma as a delimiter:

              chained : Chained Tokens (also known as biGrams).  Chained
              Tokens combines adjacent tokens, presently with a window
              size of 2, to form token "chains".  Chained tokens uses
              additional storage resources, but greatly improves
              accuracy.  Recommended as a default feature.

              noise : Bayesian Noise Reduction (BNR).  Bayesian Noise
              Reduction kicks in at 2500 innocent messages and provides
              advanced progressive noise logic to reduce Bayesian noise
              (wordlist attacks) in spams.  See
              http://bnr.nuclearelephant.com for more information.

              tb=N : Sets the training loop buffering level.  Training
              loop buffering is the amount of statistical sedation
              performed to water down statistics and avoid false
              positives during the user's training loop.  The training
              buffer sets the buffer sensitivity, and should be a number
              between 0 (no buffering whatsoever) and 10 (heavy
              buffering).  The default is 5, half of what previous
              versions of DSPAM used.  To avoid dulling down statistics
              at all during the training loop, set this to 0.

              whitelist : Automatic whitelisting.  DSPAM will keep track
              of the entire "From:" line for each message received per
              user, and automatically whitelist messages from senders
              with more than 20 innocent messages and zero spams.  Once
              the user reports a spam from the sender, automatic
              whitelisting will automatically be deactivated for that
              sender.  Since DSPAM uses the entire "From:" line, and not
              just the sender's email address, automatic whitelisting is
              a very safe approach to improving accuracy, especially
              during initial training.

              sbph : Sparse Binary Polynomial Hashing.  Bill Yerazunis'
              tokenizer method from CRM114.  Tokenizer method only -
              works with existing combination algorithms.
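
Putting the two excerpts together: the tokenizer was moved out of the
--feature list and into its own "Tokenizer" directive, which is why
"chained" now produces "No such feature".  A sketch of the relevant
dspam.conf lines (the Feature values shown are illustrative, not
recommendations - check your version's dspam.conf and "man dspam"):

```conf
# The tokenizer is now its own directive; "chained" is no longer a
# valid --feature value.  Use "Tokenizer chain" instead.
Tokenizer chain

# The remaining features are still enabled via Feature directives:
Feature noise
Feature whitelist
Feature tb=5
```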


