List: spambayes
Subject: Re: [Spambayes] overtraining and retraining
From: Jesus Cea <jcea () jcea ! es>
Date: 2011-10-17 14:01:14
Message-ID: 4E9C352A.6080201 () jcea ! es
On 16/10/11 18:45, skip@pobox.com wrote:
>
>>> 2. When I train over a message, I keep training in a loop until
>>> the message probability goes under 20% (ham) or over 90%
>>> (spam). As the database ages, training spam needs more
>>> "looping", that is, the probability goes up slowly. The ham
>>> training, nevertheless, is fast and the loop counting is low.
>
> Jesus> Uhm, the wiki says: "never train the same message twice".
> Jesus> Reason? I am breaking this badly.
>
> Jesus,
>
> I use train to exhaustion as referenced in your other email
> (contrib/tte.py in the SpamBayes distribution). I currently have
> 21 hams and 17 spams in my current training database. I suggest
> you just toss out everything but the most recent 10-15 hams and
> spams then start with that.
>
> I cheat as well, since both my pobox.com mail forwarding service
> and Gmail (where it forwards to) apply their own spam filters
> before SpamBayes gets a crack at my mail. The downside of that is
> that I need to scan their held spams periodically.
Thanks for your reply, Skip, but you didn't address any of my concerns
:-): 1. "Do not train on the same message twice", 2. "Keep spam/ham
balanced", 3. Is it normal that "training" can slowly degrade the
quality? And if so, what do people do about it (besides deleting the DB
and retraining with recent samples)?
I think that 1 & 2 are related to the Bayes assumption of independent
samples. But the code is abusing Bayes so badly that breaking this
condition is actually irrelevant in our context :-).
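For reference, the looped training quoted above can be sketched roughly
as below. This is a toy token-count classifier written only for
illustration, not SpamBayes code: ToyClassifier, train_until and the
max_rounds cap are hypothetical names, and only the 20%/90% cutoffs come
from this thread. Each pass of the loop counts the same tokens again as
if they were new evidence, which is where the independence assumption
gets broken.

```python
import math
from collections import Counter


class ToyClassifier:
    """Toy naive-Bayes-style classifier (illustration only, not SpamBayes)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.nmsgs = {"spam": 0, "ham": 0}

    def train(self, tokens, label):
        # Count each distinct token once per trained message.
        self.nmsgs[label] += 1
        self.counts[label].update(set(tokens))

    def token_prob(self, token):
        # Smoothed fraction of spam and ham messages containing the token.
        ps = (self.counts["spam"][token] + 1) / (self.nmsgs["spam"] + 2)
        ph = (self.counts["ham"][token] + 1) / (self.nmsgs["ham"] + 2)
        return ps / (ps + ph)

    def score(self, tokens):
        # Naive combining: sums log-odds as if tokens were independent.
        log_odds = 0.0
        for token in set(tokens):
            p = self.token_prob(token)
            log_odds += math.log(p / (1.0 - p))
        return 1.0 / (1.0 + math.exp(-log_odds))


def train_until(clf, tokens, label, ham_cutoff=0.20, spam_cutoff=0.90,
                max_rounds=50):
    """Retrain on one message until its score crosses the cutoff.

    Returns the number of training passes that were needed.
    """
    rounds = 0
    while rounds < max_rounds:
        s = clf.score(tokens)
        if label == "spam" and s > spam_cutoff:
            break
        if label == "ham" and s < ham_cutoff:
            break
        clf.train(tokens, label)
        rounds += 1
    return rounds


clf = ToyClassifier()
for _ in range(5):
    clf.train(["meeting", "lunch"], "ham")

spam_tokens = ["viagra", "cheap", "meeting"]
rounds = train_until(clf, spam_tokens, "spam")
print(rounds, clf.score(spam_tokens))
```

In this toy, the more ham already in the database that shares tokens
with the spam message, the more passes the loop needs, which matches the
"spam needs more looping as the database ages" observation, though the
real scoring in SpamBayes is different.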
BTW, what are the changes between 1.1a4 (my version) and 1.1a6? I
can't find an updated CHANGELOG...
- --
Jesus Cea Avion _/_/ _/_/_/ _/_/_/
jcea@jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/
jabber / xmpp:jcea@jabber.org _/_/ _/_/ _/_/_/_/_/
. _/_/ _/_/ _/_/ _/_/ _/_/
"Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/
"My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz