'Re: iso-8859-1 in subject field'


List:       procmail
Subject:    Re: iso-8859-1 in subject field
From:       PSE-L () mail ! professional ! org (Professional Software Engineering)
Date:       2003-10-20 23:32:24

At 16:01 2003-10-20 -0700, Carl B. Constantine wrote:
>True enough, but you then either have to run it through further spam
>checks to find out if it's truly spam (most likely) or just relegate it
>to /dev/null whenever you see a message like this since it's most likely
>spam. Right?

Well, if the subject is encoded, the body probably is as well - and if not, 
you've got to wonder WHY NOT?

I realize you posed your "which do you use" question to another lister 
(who was only just now told _how_ to decode the string, so they probably 
have neither a methodology in place for dealing with these messages nor a 
firm idea of how effective it is compared with the alternatives).  Still, 
I'll provide some insight from my own experience:

I myself flag messages based on a number of characteristics - a base64-only 
body is one of them.  So is a Subject: (or From:) header specifying a 
foreign-to-me encoding, but that's just because I don't interact on 
foreign-to-me language lists, and therefore have no reason to expect 
legitimate messages with such encodings.

I _do_not_ flag a message as spam based on any one characteristic - I used 
to, but too much snuck through because certain characteristics couldn't be 
relied upon as a 100% match.  By using a "spammishness" score, built up 
from the various contributory bits, I get a much more reliable indication 
that the message is spam.  This has proven *EXTREMELY* successful for me, 
and results in very few false positives (and those usually still look like 
spam - lots of "look I'm an idiot and use excessive punctuation!" and the 
like).
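
A minimal sketch of the scoring idea, written in Python rather than as 
procmail recipes purely so that it is self-contained.  The weights, the 
threshold, the list of expected charsets and the punctuation test are all 
assumptions made up for illustration; they are not the values or checks 
from any real procmailrc.

```python
import re
from email import message_from_string

# Hypothetical weights and threshold, chosen only for illustration.
WEIGHTS = {
    "base64_only_body": 40,
    "foreign_charset_header": 40,
    "excessive_punctuation": 30,
}
THRESHOLD = 60

# Assumption: the charsets this particular recipient expects to see.
EXPECTED_CHARSETS = {"us-ascii", "iso-8859-1", "utf-8"}

# RFC 2047 encoded-word: =?charset?encoding?text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[bBqQ]\?[^?]*\?=")


def characteristics(msg):
    """Return the set of spam-like characteristics found in msg."""
    found = set()
    # A non-multipart message whose body is base64-encoded.
    if not msg.is_multipart():
        cte = str(msg.get("Content-Transfer-Encoding", "")).strip().lower()
        if cte == "base64":
            found.add("base64_only_body")
    # Subject: or From: encoded in a charset we have no reason to expect.
    for name in ("Subject", "From"):
        for charset in ENCODED_WORD.findall(str(msg.get(name, ""))):
            if charset.lower() not in EXPECTED_CHARSETS:
                found.add("foreign_charset_header")
    # Runs of three or more '!', '?' or '$' in the subject.
    if re.search(r"[!?$]{3,}", str(msg.get("Subject", ""))):
        found.add("excessive_punctuation")
    return found


def spam_score(msg):
    """Sum the weights of every characteristic that matched."""
    return sum(WEIGHTS[c] for c in characteristics(msg))


def is_spam(msg):
    """No single characteristic is enough; only the total decides."""
    return spam_score(msg) >= THRESHOLD
```

With these made-up numbers no single characteristic reaches the threshold, 
so a message is only flagged when at least two of them agree.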

I don't presently use any of the recipes I provided in response to the 
current request, but as I said, I've thought about decoding the Subject: 
and replacing it with the decoded string (shifting the original encoded 
subject to Old-Subject, as per the invocation of formail that I 
provided).  The message would then SUBSEQUENTLY be handled by my spam 
filters exactly as it is already.  The one exception: I currently extract 
the SUBJECT, FROM, TO, etc. headers to variables at the top of my 
procmailrc, and at present SUBJECT holds the header in whatever format it 
originally arrived in -- I should have both ORIGINAL-SUBJECT and SUBJECT 
variables, to allow for recipes that might care about the difference.
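
The decode-and-replace step can be sketched as follows.  This is Python, 
not the formail invocation referred to above (which is not reproduced in 
this message); it only mimics the effect of keeping the original under 
Old-Subject, much as formail's -i option renames an existing field with an 
"Old-" prefix.  The function names are hypothetical.

```python
from email import message_from_string
from email.header import decode_header


def decode_subject(raw):
    """Decode RFC 2047 encoded-words in a header value to plain text."""
    parts = []
    for text, charset in decode_header(raw):
        if isinstance(text, bytes):
            try:
                text = text.decode(charset or "ascii", errors="replace")
            except LookupError:
                # Unknown charset name: fall back rather than fail.
                text = text.decode("ascii", errors="replace")
        parts.append(text)
    return "".join(parts)


def rewrite_subject(msg):
    """Replace Subject: with its decoded form, keeping the original
    encoded value in Old-Subject:.  Unencoded subjects are left alone."""
    original = msg.get("Subject")
    if original is None:
        return msg
    decoded = decode_subject(str(original))
    if decoded != original:
        msg["Old-Subject"] = original
        del msg["Subject"]
        msg["Subject"] = decoded
    return msg
```

Later checks can then match on the decoded Subject while anything that 
cares about the original encoding reads Old-Subject instead.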

---
  Sean B. Straw / Professional Software Engineering

  Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
  Please DO NOT carbon me on list replies.  I'll get my copy from the list.

