List: spamassassin-users
Subject: Re: Evasion with Unicode format characters
From: John Hardin <jhardin () impsec ! org>
Date: 2018-10-30 20:34:17
Message-ID: alpine.LNX.2.21.1810301329480.4629 () athena ! impsec ! org
On Tue, 30 Oct 2018, Cedric Knight wrote:
> I thought of submitting a patch via Bugzilla, but then decided to first
> ask and check that I understood the general principles of body checks,
> and SpamAssassin's current approach to Unicode. Apologies for the length
> of this message. I hope the main points make sense.
>
> A fair number of webcam bitcoin 'sextortion' scams have evaded detection
> and worried recipients because they include relevant credentials.
> (Incidentally, I assume the credentials and addresses are mostly from
> the 2012 LinkedIn breach, but someone on the RIPE abuse list reports
> Mailman passwords were also used). BITCOIN_SPAM_05 is catching some of
> this spam, but on writing body regexes to catch the wave around 16
> October, I noticed that my rules weren't matching because the source was
> liberally injected with invisible characters:
> Content preview: I a<U+200C>m a<U+200C>wa<U+200C>re blabla is one of
> your pa<U+200C>ss. L<U+200C>ets g<U+200C>et strai<U+200C>ght
> to<U+200C> po<U+200C>i<U+200C>nt. No<U+200C>t o<U+200C>n<U+200C>e
Would you send me a zipped copy? I would like to update the ZW text
obfuscation rule for that, and possibly others.
> As minor points, 'Format' excludes a couple of separator characters in
> the same range that instead match [:space:]
> https://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:subhead=Format%20character:]
> Then there is the C1 [:cntrl:] set, which some MUAs may render
> silently, I think including the 0x9D matched by the recent
> __UNICODE_OBFU_ZW (what's the significance of UNICODE in the rule name?):
> https://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:General_Category=Control:]
> Finally, there may be a case for including as 'almost' invisible narrow
> blanks like U+200A, U+202F and maybe U+205F. The Perl Unicode
> database may not be completely up-to-date here, and Perl 5.18 doesn't
> recognise U+061C, U+2066 and U+1BCA1 ranges as \p{Format}, although 5.24
> does.
"UNICODE" because the invisible crap ain't ANSI. :)
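For reference, here is a small Python sketch (not SpamAssassin code; the sample list and the helper name general_category are illustrative assumptions) that prints the Unicode general category of some of the code points mentioned above. The answers come from the Unicode database bundled with the interpreter, so they can vary by version, much like the Perl 5.18 vs 5.24 difference noted in the quoted text.

```python
import unicodedata

# Code points named in the quoted message (illustrative sample, not exhaustive).
SAMPLES = {
    "ZERO WIDTH NON-JOINER": "\u200c",
    "LINE SEPARATOR": "\u2028",
    "PARAGRAPH SEPARATOR": "\u2029",
    "C1 control 0x9D": "\u009d",
    "HAIR SPACE": "\u200a",
    "NARROW NO-BREAK SPACE": "\u202f",
    "MEDIUM MATHEMATICAL SPACE": "\u205f",
}


def general_category(ch):
    """Return the two-letter Unicode general category, e.g. 'Cf' for Format."""
    return unicodedata.category(ch)


for label, ch in SAMPLES.items():
    print(f"U+{ord(ch):04X} {label}: {general_category(ch)}")
```

Only the first entry is in the Format (Cf) category. The two separators are Zl and Zp, 0x9D is a control (Cc), and the narrow blanks are space separators (Zs), so a pattern built on Format alone would not cover them.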
> So my patch was going to be something to eliminate Format characters
> from get_rendered_body_text_array() like:
> --- lib/Mail/SpamAssassin/Message.pm (revision 1844922)
> +++ lib/Mail/SpamAssassin/Message.pm (working copy)
> @@ -1167,6 +1167,8 @@
> $text =~ s/\n+\s*\n+/\x00/gs; # double newlines => null
> # $text =~ tr/ \t\n\r\x0b\xa0/ /s; # whitespace (incl. VT, NBSP) => space
> # $text =~ tr/ \t\n\r\x0b/ /s; # whitespace (incl. VT) => single space
> + # do not render zero-width Unicode characters used as obfuscation:
> + $text =~ s/[\p{Format}\N{U+200C}\N{U+2028}\N{U+2029}\N{U+061C}\N{U+180E}\N{U+2065}-\N{U+2069}]//gs;
> $text =~ s/\s+/ /gs; # Unicode whitespace => single space
> $text =~ tr/\x00/\n/; # null => newline
The problem with this approach is that the *presence* of such characters is
a pretty strong spam sign, and stripping them from the rendered body would
hide that sign from body rules.
Potentially those tests could be moved to RAWBODY rules, though - I'll
investigate that for the ZW rule.
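As a rough illustration of that trade-off, here is a Python sketch (not the SpamAssassin implementation; the function name split_format_chars is hypothetical) that removes Format characters so ordinary text patterns can match, while keeping a count of what was removed so the obfuscation itself can still be treated as a signal.

```python
import unicodedata


def split_format_chars(text):
    """Return (visible_text, removed_count), dropping Unicode Format (Cf) characters."""
    kept = []
    removed = 0
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            removed += 1  # remember the evidence instead of discarding it
        else:
            kept.append(ch)
    return "".join(kept), removed


# Sample modelled on the content preview quoted earlier in this message.
sample = "I a\u200cm a\u200cwa\u200cre blabla is one of your pa\u200css."
visible, hidden = split_format_chars(sample)
print(visible)
print(hidden)
```

With this split, a phrase rule can run against the visible text and a separate check can look at the count. What threshold, if any, makes the count meaningful is left open here.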
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
...the Fates notice those who buy chainsaws...
-- www.darwinawards.com
-----------------------------------------------------------------------
Tomorrow: Halloween