[prev in list] [next in list] [prev in thread] [next in thread] 

List:       kdepim-users
Subject:    Re: [kdepim-users] Filter on language
From:       ianseeks <ianseeks () dsl ! pipex ! com>
Date:       2012-11-29 17:22:30
Message-ID: 9034719.FB6doMtEnj () linux-k5wg
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]


snip
 > Ciao,
> 
> As the only character encoding that is (going to be) used by modern systems
> is UTF-8 encoded Unicode, filtering on encoding (like the several ISO 8859
> versions) is not going to work.
> 
> @ianseeks: though I understand what you want, talking about Asian fonts will
> not help in understanding the problem. A font is only a small picture
> generated to make visable to a human being what a character code is. It is
> about the Unicode codes you get in the mail. As these are in groups (e.g.
> Japanese Hiragana is 3040 - 309F) a filter could decide that many
> characters within specific ranges in a mail could rate that mail as spam.
> But (as all spam filtering) it is no exact science.


Just found a way of filtering on language.  

I did a "View Source" of a Japanese spam email and checked the Subject: line.  
It seems to contain a prefix of either "ISO-2022-JP" or "Shift_JIS". Creating 
a filter on "Complete Message" containing either of those terms seems to work 
fine.

regards

Ian
[Attachment #5 (unknown)]

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" \
"http://www.w3.org/TR/REC-html40/strict.dtd"> <html><head><meta name="qrichtext" \
content="1" /><style type="text/css"> p, li { white-space: pre-wrap; }
</style></head><body style=" font-family:'Liberation Sans [unknown]'; font-size:9pt; \
font-weight:400; font-style:normal;"> <p style=" margin-top:0px; margin-bottom:0px; \
margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; \
-qt-user-state:0;">snip</p> <p style=" margin-top:0px; margin-bottom:0px; \
margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; \
-qt-user-state:0;"> &gt; Ciao,</p> <p style=" margin-top:0px; margin-bottom:0px; \
margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; \
-qt-user-state:0;">&gt; </p> <p style=" margin-top:0px; margin-bottom:0px; \
margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; \
-qt-user-state:0;">&gt; As the only character encoding that is (going to be) used by \
modern systems</p> <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; \
margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">&gt; is \
UTF-8 encoded Unicode, filtering on encoding (like the several ISO 8859</p> <p \
style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">&gt; versions) is not going \
to work.</p> <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; \
margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">&gt; </p> \
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">&gt; @ianseeks: though I \
understand what you want, talking about Asian fonts will</p> <p style=" \
margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">&gt; not help in \
understanding the problem. A font is only a small picture</p> <p style=" \
margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">&gt; generated to make \
visable to a human being what a character code is. It is</p> <p style=" \
margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">&gt; about the Unicode codes \
you get in the mail. As these are in groups (e.g.</p> <p style=" margin-top:0px; \
margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; \
text-indent:0px; -qt-user-state:0;">&gt; Japanese Hiragana is 3040 - 309F) a filter \
could decide that many</p> <p style=" margin-top:0px; margin-bottom:0px; \
margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; \
-qt-user-state:0;">&gt; characters within specific ranges in a mail could rate that \
mail as spam.</p> <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; \
margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">&gt; But \
(as all spam filtering) it is no exact science.</p> <p \
style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; \
margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nbsp;</p> <p \
style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; \
margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nbsp;</p> <p style=" \
margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Just found a way of filtering \
on language.  </p> <p style="-qt-paragraph-type:empty; margin-top:0px; \
margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; \
text-indent:0px; ">&nbsp;</p> <p style=" margin-top:0px; margin-bottom:0px; \
margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; \
-qt-user-state:0;">I did a &quot;View Source&quot; of a Japanese spam email and \
checked the Subject: line.  It seems to contain a prefix of either \
&quot;ISO-2022-JP&quot; or &quot;Shift_JIS&quot;. Creating a filter on &quot;Complete \
Message&quot; containing either of those terms seems to work fine.</p> <p \
style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; \
margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nbsp;</p> <p style=" \
margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">regards</p> <p \
style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; \
margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nbsp;</p> <p style=" \
margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Ian</p></body></html>



_______________________________________________
KDE PIM users mailing list
Subscription management: https://mail.kde.org/mailman/listinfo/kdepim-users


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic