
List:       spamassassin-devel
Subject:    Re: svn commit: r1885178 - /spamassassin/trunk/rulesrc/sandbox/jhardin/40_local_419replyto.cf
From:       John Hardin <jhardin () impsec ! org>
Date:       2021-01-06 18:48:00
Message-ID: alpine.LNX.2.21.2101060844390.8694 () athena ! impsec ! org


On Wed, 6 Jan 2021, Bill Cole wrote:

> John,
> 
> 1st: thank you very much for working on generated rules and for all the 
> rest of your work on rules. 
> I am curious about whether these very long regexes have been proven to 
> actually work in full, or if it is possible that they are getting 
> mishandled quietly. I don't see any hits on either rule on any of the 
> mail systems I work with going back a month, so I am wondering if it is 
> worthwhile to construct test messages that should hit due to elements in 
> the latter parts of the patterns or if you've already done such tests.

I did some quick searching to see if there's any documented max RE size 
and couldn't find any such. I'd wager that's based on available resources 
rather than being a hard size limit.

If there is a hard length limit on REs, and SA silently breaks when that 
limit is exceeded, it doesn't appear to have hit it yet:

Jan  6 09:28:07.206 [30594] dbg: rules: ran header rule __REPTO_419_FRAUD_GM_LOOSE ======> got hit: "zminhong65@gmail.com"
Jan  6 09:28:07.207 [30594] dbg: rules: ran header rule REPTO_419_FRAUD_GM ======> got hit: "zminhong65@gmail.com"

Jan  6 09:28:40.398 [30728] dbg: rules: ran header rule REPTO_419_FRAUD ======> got hit: "zimcargoservicehelpdesks@tlen.pl"
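
For anyone who wants to probe the latter parts of a long pattern more 
systematically, the idea can be sketched outside SA. This is a minimal 
Python illustration, not the actual rule: the addresses are made-up 
placeholders, the helper name is hypothetical, and Python's regex engine is 
not the Perl engine SA uses, so it only shows the shape of the test (build 
a long alternation, then check that an entry at the very end still hits).

```python
import re

# Placeholder addresses (assumption): the real rule lists are not reproduced here.
addresses = [f"user{i}@example.com" for i in range(5000)]

# One long alternation, escaped so that "." and other metacharacters are literal.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(a) for a in addresses) + r")\b",
    flags=re.IGNORECASE,
)

def hit(header_value):
    """Return the matched address, or None if the pattern does not hit."""
    m = pattern.search(header_value)
    return m.group(0) if m else None

# Probe the first and the last alternative, plus an address that must not hit.
first_hit = hit("Reply-To: <user0@example.com>")
last_hit = hit("Reply-To: <user4999@example.com>")
miss = hit("Reply-To: <nobody@example.org>")
```

If the last alternative hits and the unrelated address does not, the 
pattern is at least not being silently truncated in this engine. The same 
check against SA itself would mean feeding it a test message and looking 
for "got hit" lines like the ones above.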


But in retesting this I did find and fix a minor RE error that caused it 
to miss addresses in yahoo.com.XX domains.

Thanks for asking!


There's no guarantee you'll see hits. The only feed I have for this is 419 
spams sent to me and my wife, and a few 419 spamples that others have 
provided, so the sample set is probably rather small even though it feels 
like I get a metric buttload of such garbage.

If anybody has a well-vetted 419 scam corpus that they'd be willing to 
extract reply-to addresses from to contribute to this, feel free to 
contact me privately.


The reason I decided to do this is that I'm still getting 419 pitches 
having gmail contact addresses that I started reporting *more than six 
months ago* (and continue to report every time I get another one). I would 
assume that if Google had actually suspended the accounts after (multiple) 
reports then the 419 scammers would have stopped using those contact 
addresses in their pitches because they couldn't receive replies, thus it 
looks to me like Google just doesn't give a shit about my 419 collector 
mailbox reports. However, I recognize that assumption may be flawed, and 
anyone with actual contacts inside the gmail administrative team is 
invited to email me privately to discuss this.

I figure if I'm still seeing them then others are probably seeing them too 
and would benefit from them being scored in addition to the body-based 419 
scam tests.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org                         pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Je ne suis pas Charlie. Je suis armé.
-----------------------------------------------------------------------
  Tomorrow: the 6th anniversary of the Charlie Hebdo massacre


