[prev in list] [next in list] [prev in thread] [next in thread]
List: kdepim-users
Subject: Re: [kdepim-users] Filter on language
From: ianseeks <ianseeks () dsl ! pipex ! com>
Date: 2012-11-29 17:22:30
Message-ID: 9034719.FB6doMtEnj () linux-k5wg
[Download RAW message or body]
[Attachment #2 (multipart/alternative)]
snip
> Ciao,
>
> As the only character encoding that is (going to be) used by modern systems
> is UTF-8 encoded Unicode, filtering on encoding (like the several ISO 8859
> versions) is not going to work.
>
> @ianseeks: though I understand what you want, talking about Asian fonts will
> not help in understanding the problem. A font is only a small picture
> generated to make visable to a human being what a character code is. It is
> about the Unicode codes you get in the mail. As these are in groups (e.g.
> Japanese Hiragana is 3040 - 309F) a filter could decide that many
> characters within specific ranges in a mail could rate that mail as spam.
> But (as all spam filtering) it is no exact science.
Just found a way of filtering on language.
I did a "View Source" of a Japanese spam email and checked the Subject: line.
It seems to contain a prefix of either "ISO-2022-JP" or "Shift_JIS". Creating
a filter on "Complete Message" containing either of those terms seems to work
fine.
regards
Ian
[Attachment #5 (unknown)]
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" \
"http://www.w3.org/TR/REC-html40/strict.dtd"> <html><head><meta name="qrichtext" \
content="1" /><style type="text/css"> p, li { white-space: pre-wrap; }
</style></head><body style=" font-family:'Liberation Sans [unknown]'; font-size:9pt; \
font-weight:400; font-style:normal;"> <p style=" margin-top:0px; margin-bottom:0px; \
margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; \
-qt-user-state:0;">snip</p> <p style=" margin-top:0px; margin-bottom:0px; \
margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; \
-qt-user-state:0;"> > Ciao,</p> <p style=" margin-top:0px; margin-bottom:0px; \
margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; \
-qt-user-state:0;">> </p> <p style=" margin-top:0px; margin-bottom:0px; \
margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; \
-qt-user-state:0;">> As the only character encoding that is (going to be) used by \
modern systems</p> <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; \
margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> is \
UTF-8 encoded Unicode, filtering on encoding (like the several ISO 8859</p> <p \
style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> versions) is not going \
to work.</p> <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; \
margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> </p> \
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> @ianseeks: though I \
understand what you want, talking about Asian fonts will</p> <p style=" \
margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> not help in \
understanding the problem. A font is only a small picture</p> <p style=" \
margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> generated to make \
visable to a human being what a character code is. It is</p> <p style=" \
margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> about the Unicode codes \
you get in the mail. As these are in groups (e.g.</p> <p style=" margin-top:0px; \
margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; \
text-indent:0px; -qt-user-state:0;">> Japanese Hiragana is 3040 - 309F) a filter \
could decide that many</p> <p style=" margin-top:0px; margin-bottom:0px; \
margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; \
-qt-user-state:0;">> characters within specific ranges in a mail could rate that \
mail as spam.</p> <p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; \
margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> But \
(as all spam filtering) it is no exact science.</p> <p \
style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; \
margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p> <p \
style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; \
margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p> <p style=" \
margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Just found a way of filtering \
on language. </p> <p style="-qt-paragraph-type:empty; margin-top:0px; \
margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; \
text-indent:0px; "> </p> <p style=" margin-top:0px; margin-bottom:0px; \
margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; \
-qt-user-state:0;">I did a "View Source" of a Japanese spam email and \
checked the Subject: line. It seems to contain a prefix of either \
"ISO-2022-JP" or "Shift_JIS". Creating a filter on "Complete \
Message" containing either of those terms seems to work fine.</p> <p \
style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; \
margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p> <p style=" \
margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">regards</p> <p \
style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; \
margin-right:0px; -qt-block-indent:0; text-indent:0px; "> </p> <p style=" \
margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; \
-qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Ian</p></body></html>
_______________________________________________
KDE PIM users mailing list
Subscription management: https://mail.kde.org/mailman/listinfo/kdepim-users
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic