
List:       linux-announce
Subject:    Linux-Announce Digest #991
From:       Digestifier <Linux-Announce-Request () senator-bedfellow ! mit ! edu>
Date:       2004-12-25 15:13:02
Message-ID: 20041225201303.15854.qmail () senator-bedfellow ! mit ! edu

Linux-Announce Digest #991, Volume #4          Sat, 25 Dec 2004 15:13:02 EST

Contents:
  bogofilter-0.93.3 - New Current Release (David Relson)

----------------------------------------------------------------------------

Date: Fri, 24 Dec 2004 22:15:47 CST
From: David Relson <relson@osagesoftware.com>
Subject: bogofilter-0.93.3 - New Current Release


Bogofilter is a mail filter that classifies messages as spam or ham
(non-spam) by using a statistical analysis of the message's header and
content (body).  The program is able to learn from the user's
classifications and corrections.

The statistical technique is known as the Bayesian technique and its use
for spam was described by Paul Graham in his article "A Plan For Spam". 
Gary Robinson, in his weblog Rants, suggests some refinements for
improved discrimination between spam and ham.  Bogofilter's primary
algorithm uses the f(w) parameter and the Fisher inverse chi-square
technique that Robinson describes.
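
As a rough illustration of the technique named above, the following
Python sketch computes Robinson's f(w) for a token and combines the
per-token values with the Fisher inverse chi-square method.  It is a
minimal sketch, not bogofilter's C implementation: the function names
and the default values s=1.0 and x=0.5 are assumptions chosen for
illustration only.

```python
import math


def chi2q(x2, dof):
    """Chi-square survival function for an even number of degrees of freedom."""
    assert dof > 0 and dof % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def f_w(spam_count, ham_count, n_spam, n_ham, s=1.0, x=0.5):
    """Robinson's f(w): pull the raw estimate p(w) toward the prior x.

    s (strength of the prior) and x (assumed value for unseen tokens)
    are illustrative defaults, not bogofilter's settings.
    """
    n = spam_count + ham_count
    spam_freq = spam_count / n_spam if n_spam else 0.0
    ham_freq = ham_count / n_ham if n_ham else 0.0
    if n == 0 or spam_freq + ham_freq == 0.0:
        return x
    p = spam_freq / (spam_freq + ham_freq)
    return (s * x + n * p) / (s + n)


def spamicity(f_values):
    """Combine per-token f(w) values into one score between 0 and 1."""
    if not f_values:
        return 0.5
    # Clamp to keep the logarithms finite.
    fs = [min(max(f, 1e-9), 1.0 - 1e-9) for f in f_values]
    dof = 2 * len(fs)
    spam_side = chi2q(-2.0 * sum(math.log(f) for f in fs), dof)
    ham_side = chi2q(-2.0 * sum(math.log(1.0 - f) for f in fs), dof)
    # spam_side approaches 1 for spammy tokens, ham_side for hammy ones.
    return (1.0 + spam_side - ham_side) / 2.0
```

Scores near 1 indicate spam, scores near 0 indicate ham, and a message
whose tokens give no evidence either way lands at 0.5.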

Bogofilter is run by an MDA script to classify an incoming message as
spam or ham (using wordlists stored by Berkeley DB).  Bogofilter
processes plain text and HTML, supports multi-part MIME messages with
decoding of base64, quoted-printable, and uuencoded text, and ignores
attachments such as images.

Bogofilter is written in C.  Supported platforms: Linux, FreeBSD,
Solaris, OS X, HP-UX, AIX, RISC-OS, OS/2, ...

******* ******* ******* ******* *******

The 0.93.3 release of bogofilter brings with it two significant
changes.  First, bogoutil now supports multiple options for working
with the Berkeley DB database environment.  "bogoutil --help" lists
the following options:

        --db_verify=file        - verify data file.
        --db_prune=dir          - remove inactive log files in dir.
        --db_recover=dir        - run recovery on database in dir.
        --db_recover-harder=dir - run catastrophic recovery on database.
        --db_remove-environment - remove environment.
        --db_lk_max_locks       - set max lock count.
        --db_lk_max_objects     - set max object count.

These are described in the man page; the file doc/README.db has
additional information.

Secondly, 0.93.3 supports SQLite v3, which is a zero-maintenance
transactional database.  It's also slower than Berkeley DB and has
larger files.

Files are available at http://sourceforge.net/projects/bogofilter for
download.

Here are the md5sums for the release:

d6ae49e8cc98036810de5b145ab9f9ca  bogofilter-0.93.3-1.i586.rpm
9c543324c3f258f65eb667567158ce03  bogofilter-0.93.3-1.src.rpm
1e11794a6f989ba42ec726272eb3bdcd  bogofilter-0.93.3.tar.bz2
1961e5f203bdd463e962c6e61cf3f7a0  bogofilter-0.93.3.tar.gz
a82f5fd3cc5d04fa772ad22cb915bcf6  bogofilter-static-0.93.3-1.i586.rpm
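
A downloaded file can be checked against the list above with a few
lines of Python.  This is only a sketch using the standard hashlib
module; the helper names md5_of_file, parse_md5sums and verify are
made up for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sums(text):
    """Map file name -> expected digest from 'digest  name' lines."""
    expected = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and len(parts[0]) == 32:
            expected[parts[1]] = parts[0].lower()
    return expected


def verify(path, name, md5sums_text):
    """True if the file at path matches the digest listed for name."""
    expected = parse_md5sums(md5sums_text)
    return name in expected and md5_of_file(path) == expected[name]
```

For example, verify("bogofilter-0.93.3.tar.gz",
"bogofilter-0.93.3.tar.gz", listing) returns True only when the local
file's digest equals the one published for that name in listing.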

                               =================
                                BOGOFILTER NEWS
                               =================

NOTE: More information on important changes for bogofilter updaters
is in the RELEASE.NOTES files.  Read them!!

RELEASE.NOTES has two important sections entitled:

        INCOMPATIBLE CHANGES IN BOGOFILTER 0.93
and     MAJOR CHANGES IN BOGOFILTER 0.93

Briefly:

        ** Bogofilter is now using Berkeley DB's Transaction
           capability to ensure database integrity.

        ** Bogofilter is now generating tri-state results labeled
           Spam, Ham, and Unsure, compared to the old two-state Yes/No
           results.

        !!!!!!!! READ THE RELEASE.NOTES !!!!!!!!


0.93.3  2004-12-24

        * Bogoutil's options for maintaining the database environment
          are all long options with a "db-" prefix.
        * Bogoutil's help message and man page include the new long
          options.

        2004-12-21

        * Early Christmas Gift: Bogofilter now supports SQLite v3.
          Requires SQLite v3.0.8. See the RELEASE.NOTES.

        2004-12-20

        * Internal cleanup: Move transaction handling back into database space,
          and let the database backend driver map this into the environment if
          necessary.

        * Portability fix for BerkeleyDB versions 3.1 and 3.2:
          log_archive expects a fourth argument.

        2004-12-17

        * lexer_v3 HTML parser fix for urlencoded characters, by Krzysztof
          Foltman. Speeds up a particular case of malformatted mail.

        2004-12-14

        * bogoutil -C file  now checks if the database file "file" is intact.
          (Only implemented for Berkeley DB stores with and without
          transactions.)

        2004-12-13

        * bf_compact now uses db_archive without -d option and loops on the
          results instead, calling rm in turn for each file. -d is not
          supported by older Berkeley DB versions such as 4.0.

        * bogoutil -P directory  now checkpoints the database and removes
          inactive log files. Note you must save the database and remaining log
          files, in that order, if you want to be able to recover from
          corrupted files.
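
The "database first, then log files" ordering from the note above can
be sketched in Python as follows.  This is a hedged illustration, not
one of the bf_* scripts: backup_environment is a hypothetical helper,
and the file patterns "*.db" and "log.*" are assumptions about how the
environment directory is laid out.

```python
import shutil
from pathlib import Path


def backup_environment(env_dir, dest_dir):
    """Copy database files first, then the remaining log files.

    Returns the file names in the order they were copied.
    """
    env, dest = Path(env_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    databases = sorted(env.glob("*.db"))   # assumed database naming
    logs = sorted(env.glob("log.*"))       # assumed log file naming
    copied = []
    for src in databases + logs:
        shutil.copy2(src, dest / src.name)
        copied.append(src.name)
    return copied
```

Such a copy would normally be made after "bogoutil -P directory" has
removed the inactive log files, so that only the logs still needed are
saved.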

        2004-12-10

        * Limit mime overflow error messages to 1 per email.

        2004-12-09

        * configure now checks if Berkeley DB supports shared environments and
          suggests workarounds if it doesn't, to aid Fedora Core users.

        2004-12-05

        * New directory doc/programmer/OS2 contains configure.os2
          script contributed by Yuri Dario.

0.93.2  2004-12-03

        * New script bf_resize DIR that checks the sizes of all databases in an
          environment and writes a lock size to DB_CONFIG.

        2004-12-02

        * Accuracy fix: message counts of ignore lists (that can be present)
          will be ignored and no longer skew the spamicity.

        2004-12-01

        * Allow environment to be group writable, reported by Fletcher Mattox.

        * Accuracy fix: no longer pretend that we had seen an empty message
          registered when there was no registration. Use ROBX for spamicity.
          This changes the output format of bogofilter -vvv mode when no spam
          or no ham messages have been registered previously.

        2004-11-29

        * Support for Berkeley DB 3.0 was explicitly removed again, so that no
          stable bogofilter version since 0.17.5 will have had support for this
          version. This eliminates the need for on-disk database format
          upgrades and keeps things simple.
          As the unadvertised breaking of BDB 3.0 didn't raise a single
          complaint and 3.1 has been around since July 2000, this should be
          safe.

        * Support long options in bogoutil.

        * Add --remove-environment DIR long option to bogoutil, to remove the
          environment. Only one such option can be used and there is no
          corresponding short option.

        * Remove useless numeric Berkeley DB error codes from error messages.

        2004-11-26

        * bogofilter processes will refuse to open multiple wordlists in
          different database environments (directories) when the transactional
          Berkeley DB datastore is compiled (default). The non-transactional
          (--disable-transactions), QDBM and TDB datastores are unaffected.

        2004-11-21

        * bogotune now uses getopt() to process the argument list,
          hence requires a '-n' flag before each non-spam file and a
          '-s' flag before each spam file.
        * bogotune now accepts '-x flags' to set debug flags.
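
Scripts that call bogotune need to pass one flag per file after this
change.  A small Python sketch of building such an argument list
(bogotune_args is a hypothetical helper and the file names in the
usage note are placeholders):

```python
def bogotune_args(ham_files, spam_files, program="bogotune"):
    """Build an argument list with -n before each ham file and -s before
    each spam file, as the getopt()-based bogotune requires."""
    args = [program]
    for path in ham_files:
        args += ["-n", path]
    for path in spam_files:
        args += ["-s", path]
    return args
```

bogotune_args(["ham1.mbox", "ham2.mbox"], ["spam1.mbox"]) yields a list
that can be handed to subprocess.run() unchanged.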

        2004-11-20

        * Make scoring one huge transaction, rather than one individual
          transaction per token. This fixes consistency and should improve
          score speed.

          WARNING: this seems to have broken bogotune, which, BTW, doesn't
          return errors to the test suite (t.bulkmode, with message-count
          files); it reports a bogus "PASS" in spite of database PANICs.
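
The difference between one transaction per token and one transaction
for a whole scoring pass can be illustrated with Python's sqlite3
module.  Note that this swaps in SQLite for Berkeley DB purely for
illustration, and that the "tokens" table and the function names are
invented for this sketch; they do not reflect bogofilter's storage
layout.

```python
import sqlite3


def open_store():
    """In-memory store; isolation_level=None means we issue BEGIN/COMMIT."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE tokens (word TEXT PRIMARY KEY, spam INTEGER, ham INTEGER)"
    )
    return conn


def _lookup(conn, word):
    row = conn.execute(
        "SELECT spam, ham FROM tokens WHERE word = ?", (word,)
    ).fetchone()
    return row if row else (0, 0)


def lookup_per_token(conn, words):
    """Old style: each token is read in its own transaction."""
    counts = {}
    for word in words:
        conn.execute("BEGIN")
        counts[word] = _lookup(conn, word)
        conn.execute("COMMIT")
    return counts


def lookup_single_txn(conn, words):
    """New style: all tokens of a message are read in one transaction,
    so every count comes from the same consistent view of the data."""
    counts = {}
    conn.execute("BEGIN")
    try:
        for word in words:
            counts[word] = _lookup(conn, word)
    finally:
        conn.execute("COMMIT")
    return counts
```

Both functions return the same counts when nothing else writes to the
store; the single transaction saves the per-token begin/commit overhead
and prevents another writer from changing counts halfway through a
message.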

        2004-11-19

        * Restored the old traditional Berkeley DB datastore that cannot be
          recovered. Its use is discouraged; to use it, type
          ./configure --disable-transactions

        * Restored the error message when recovery is attempted on QDBM
          databases, which was lost in the DEPOT (hash) -> VILLA (B+tree)
          switch.

        2004-11-15

        * Added utility script bf_tar.

        2004-11-14

        * Added utility scripts bf_copy and bf_compact.
        * Added BerkeleyDB warning for binary rpm users.

        2004-11-12

        * New entries in bogofilter-faq.html on error messages
              "Lock table is out of available locks" and
              "Lock table is out of available object entries"

        * Add %u formatting option to print login or user ID information,
          SourceForge Feature Request #1056729.

0.93.1  2004-11-11

        * The README.db file now has information on the DB_CONFIG file that
          can be created and used to configure the Berkeley DB module.

        * Bogofilter's config file now supports setting max lock and
          object counts for Berkeley DB using options
              db_lk_max_locks=N
              db_lk_max_objects=N

        * Bogofilter and bogoutil now allow these options on the
          command line, as:
              --db_lk_max_locks=N
              --db_lk_max_objects=N

        * When running database recovery automatically, don't let go of the
          lockfile, so we can do our actual work subsequently.

        2004-11-10

        * Support for BerkeleyDB 4.3 was added. We'll avoid DB_NOSYNC on
          DB->close() when DB_LOG_INMEMORY is configured for now.

        * Update manual pages/example outputs and filter recipe examples from
          "X-Bogosity: yes" to "X-Bogosity: Spam". Fixes Debian bug #280557.
        
        * Bugfix for BerkeleyDB 4.2 support: check the data base flags, not the
          environment flags, for DB_TXN_NOT_DURABLE, when determining whether
          DB_NOSYNC is safe on DB->close(). May fix some kinds of database
          corruption encountered with DB_TXN_NOT_DURABLE.

        * Return DB_VERSION_STRING contents in -V (version) output when
          compiled against Berkeley DB. Minor change to the output format.
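
Filter recipes that matched "X-Bogosity: yes" have to look for the new
tag instead.  As a hedged sketch of reading the tag in Python with the
standard email module (bogosity is a hypothetical helper; anything
that follows the first comma in the header value is simply ignored,
since its exact layout is not described here):

```python
from email import message_from_string


def bogosity(raw_message):
    """Return the classification word from the X-Bogosity header,
    or None if the message carries no such header."""
    msg = message_from_string(raw_message)
    value = msg.get("X-Bogosity")
    if value is None:
        return None
    # Keep only the leading word, e.g. "Spam" from "Spam, ...".
    return value.split(",")[0].strip()
```

A delivery script could then compare the result with "Spam" (or any
other tag it cares about) rather than with the old "yes".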

        2004-11-09

        * Unify and clean up the horrible RELEASE.NOTES-*, CHANGES* and NEWS-*
          mess with lots of duplicated info.
          There shall only be one RELEASE.NOTES file and one NEWS file.
          RELEASE.NOTES shall contain important information for updates.
          NEWS shall contain noteworthy code changes in technical detail.

          This also removes the confusion that RELEASE.NOTES didn't contain
          information relevant for 0.93.X.

        2004-11-08

        * Berkeley DB mode: do not create data base in read mode (properly map
          open_mode to DB_RDONLY flag, store open_mode).

        * Berkeley DB mode: exit with error code if lock file cannot be
          created. Attempt recovery even if creation of lock file succeeded.

        2004-11-07

        * Fixed negative buffer index in mime.c

0.93.0  2004-11-06 "Broken compatibility" release

        * Fix bogotune's '-D' option.

        2004-11-02

        * Use only reentrant functions in the signal handler that runs
          periodically to check for crashed processes.
          Reported by Pavel Kankovsky.

        2004-11-01

        * Add a debugged and enhanced version of Stefan Bellon's QDBM
          Hash->B+tree converter.

        * Broke QDBM compatibility with 2004-10-30 change, check unsigned
          characters to match Berkeley DB behavior of bogoutil -d.

        2004-10-31

        * Rearranged flag setting for Berkeley DB data store, so as only to set
          DB_CHKSUM[_SHA1] when creating the data base.
          Fixes "checksum error: catastrophic recovery required" and
          consequential "wordlist.db: page 1: reference count overflow" errors
          Reported by Torsten Veller.

        * Revised RELEASE.NOTES-0.93 to move QDBM change into "Incompatible
          Changes" section and to mention BerkeleyDB dump/load for 4.1 and 4.2
          to add checksums.

        * Inserted new section 2.2 into doc/README.db to mention that it is
          recommended to dump/load the data base when using BerkeleyDB 4.1 and
          4.2.

        2004-10-30

        * Converted QDBM from hash files (DEPOT API) to B+ trees
          (Villa API) for better speed (Stefan Bellon).

        2004-10-29

        * Attempting recovery with TDB or QDBM data bases results in an error,
          so the user does not think it succeeded.

        * Document that recovery only works for Berkeley DB, but not TDB or
          QDBM.

        2004-10-28

        * Merged Transactional branch (for BerkeleyDB) back into the trunk.
          Further changes below.

        2004-10-25

        * Added GETTING.STARTED document.

        * Changed default mode from two-state to three-state
          - with ham_cutoff=0.45 and spam_cutoff=0.99
            The ham_cutoff value is new and spam_cutoff is unchanged.
          - changed the "Yes/No" tags used in the "X-Bogosity:" line
            to "Spam/Ham/Unsure"
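
The three-state decision can be sketched in Python as below, using the
default cutoffs quoted above.  How scores exactly equal to a cutoff
are treated is an assumption of this sketch, and classify is a
hypothetical name rather than a bogofilter function.

```python
def classify(spamicity, ham_cutoff=0.45, spam_cutoff=0.99):
    """Map a spamicity score to Spam, Ham or Unsure.

    Assumption: a score at or above spam_cutoff is Spam and a score at
    or below ham_cutoff is Ham; everything in between is Unsure.
    """
    if spamicity >= spam_cutoff:
        return "Spam"
    if spamicity <= ham_cutoff:
        return "Ham"
    return "Unsure"
```

With these defaults a score of 0.70 is reported as Unsure, where the
old two-state mode would have reported it as "No".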

        NOTE: the next entries appear to be out of order; the pertinent
        changes were developed on a side branch of bogofilter and merged
        for bogofilter 0.93.0.

        2004-09-21

        * bogofilter can now be used with Berkeley DB 3.0 or 3.1 although this
          is not recommended. You should prefer 4.2 or 4.1 instead.
          UPDATE: support for 3.0 was later removed on 2004-11-29

        * Documentation on the write cache issue (recoverability of data bases)
          has been revised.

        2004-09-13

        * Updates doc/README.db with a section on the log file size and
          pointers to db_checkpoint and db_archive.

        2004-09-03 (txn 2.1)

        * The on-line crash detector would consider its own process a zombie,
          so all processes that lasted 30 s or longer would abort themselves
          after that period.

          This was particularly prominent with BerkeleyDB 4.1 with
          x86/gcc-assembly mutexes as this combination appears rather slow when
          facing lock contention, causing t.lock3 failure. BDB 4.1 compiled to
          use POSIX mutexes (where working) appears to be a lot faster in this
          situation.

        2004-09-01 (txn 2.0)

        * Hook up crash detection code. Bogofilter is now able to detect
          when recovery is necessary and should detect stalled data bases
          within 30 seconds.
          NOTE: this means if one process crashes all other processes
          accessing the same data base will abort with an error code.

          Stalled data bases happen when one process or the system crashes and
          doesn't have a chance to clear its locks.

          This code uses ideas from Matthias Andree and Pavel Kankovsky.

        2004-08-23 (txn 1.1)

        * Add -f and -F options to bogoutil (mnemonic: fix) to run data base
          recovery.

        * Reimplement our own locking so that recovery and data base access
          don't collide and no two processes try running recovery at the same
          time.

0.92.8  2004-10-15 - Stable Release

##########################################################################
# Send submissions for comp.os.linux.announce to: cola@stump.algebra.com #
# PLEASE remember a short description of the software and the LOCATION.  #
# This group is archived at http://stump.algebra.com/~cola/              #
##########################################################################


------------------------------


** FOR YOUR REFERENCE **

The service address, to which questions about the list itself and requests
to be added to or deleted from it should be directed, is:

    Internet: Linux-Announce-Request@NEWS-DIGESTS.MIT.EDU

You can submit announcements to be moderated via:

    Internet: linux-announce@NEWS.ORNL.GOV

Linux may be obtained via one of these FTP sites:
    ftp.funet.fi				pub/Linux
    tsx-11.mit.edu				pub/linux
    sunsite.unc.edu				pub/Linux

End of Linux-Announce Digest
******************************