List:       kdepim-users
Subject:    Re: [kdepim-users] Filter on language
From:       Henk van Velden <henk.vanvelden () xs4all ! nl>
Date:       2012-11-17 19:16:03
Message-ID: 201211172016.03596.henk.vanvelden () xs4all ! nl

On Saturday 17 November 2012 19:48:32 Martin Steigerwald wrote:
> Am Donnerstag, 15. November 2012 schrieb ianseeks:
> > Hi
>
> Hi Ian,
>
> > I'm getting loads of spam from Japanese sites and I'm now bored of
> > updating my junk filters every day.
> >
> > Is there a way I can filter out emails that use Asian-language
> > fonts?
>
> I am not aware of anything like this. But the encoding might be in the
> mail headers that you can view with the V key. You can filter for anything
> in there. Maybe there is also something else. Hmmm, I scanned some of the
> spams using foreign charsets that CRM114 has sorted into my spam folder
> and they do not seem to have any helpful headers.
>
> Thus I can only imagine running it through an external program that
> detects the encoding, or a small script that calls such a program and then
> decides whether the mail is spam or not.
>
> Anyway, I recommend something more generic – at least if you are running
> your own mail server: policyd-weight. It removes most spam at SMTP level
> by running some tests and querying a set of blacklists.
>
> On the client I suggest CRM114. I wrote an article on how to integrate it,
> but have not tested this with KDEPIM 2 yet. Tell me if you are interested
> and I will see whether this article has been translated to English and
> possibly provide a link.
>
> What's the advantage of CRM114 or another self-learning spam filter? You do
> not have to create your own spam filter rules every day.
>
> From the tons of spam to my mail address each day, I only see 0-10 in the
> unsure folder. There are more in the local spam folder, but I only scan
> subject lines quickly to make sure CRM114 had no false positives, which it
> didn't recently.
>
> In fact, policyd-weight and CRM114 make it possible to actually read my
> mail. Otherwise I would have to search it in a sea of spam first.
>
> CRM114 could be used client side, even standalone. I use it client side,
> but still with POP3. Heck, this works so well, and I only ever read my mail
> on this laptop, that I might continue using POP3.
>
> Both need some time to get the concepts and set them up, but IMHO it's
> really worth it. I have not a single hand-crafted spam filter rule at all.
> So I do not have to do anything except give CRM114 a little training when
> the next spam wave comes from somewhere else than Japan. Actually I hardly
> ever notice any spam waves. CRM114 learns quickly, efficiently and also
> forgets as needed. All with just two mmap()ed files of about 12 MiB each.
>
> And this setup has worked for years already. Without any major changes.
>
> If everybody and every provider did this, there would probably not be
> a market for spammers anymore. That's the hope of some CRM114 developers.
>
> That said, there may be other spam filters that are equally efficient, like
> dspam or newer SpamAssassin versions, which I think the Zimbra at work uses.
> CRM114 isn't even only a spam filter; it can classify any text.
>
> Ciao,

Since modern systems use (or are moving to) UTF-8 encoded Unicode as their 
only character encoding, filtering on the encoding (such as the various 
ISO 8859 variants) is not going to work. 

@ianseeks: though I understand what you want, talking about Asian fonts will 
not help in understanding the problem. A font only supplies the small 
pictures that make a character code visible to a human being. What matters 
is the Unicode code points you get in the mail. As these come in blocks 
(e.g. Japanese Hiragana is U+3040 to U+309F), a filter could rate a mail as 
spam when many of its characters fall within specific ranges. But (like all 
spam filtering) it is not an exact science.
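
To make the range idea concrete, here is a minimal Python sketch, not a 
tested filter. Only the Hiragana range comes from this mail; the Katakana and 
CJK Unified Ideographs ranges are additions taken from the Unicode block 
charts, and the function names and the 0.3 threshold are hypothetical choices 
for illustration. It assumes the mail body has already been decoded to text.

```python
# Code point ranges to count. Hiragana is the range named in this mail;
# the other two are assumptions added for illustration.
CJK_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]


def cjk_ratio(text):
    """Return the fraction of non-whitespace characters inside CJK_RANGES."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    hits = sum(
        1 for ch in chars
        if any(lo <= ord(ch) <= hi for lo, hi in CJK_RANGES)
    )
    return hits / len(chars)


def looks_like_cjk_mail(text, threshold=0.3):
    """Flag text whose share of characters in the ranges reaches threshold.

    The threshold is an arbitrary starting point, not a tuned value.
    """
    return cjk_ratio(text) >= threshold
```

A script like this only produces a score; how to call it from a mail filter, 
and which threshold avoids false positives on legitimate mail, would have to 
be worked out per setup.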
-- 
Met vriendelijke groet,
Henk van Velden
_______________________________________________
KDE PIM users mailing list
Subscription management: https://mail.kde.org/mailman/listinfo/kdepim-users
