List:       spamassassin-users
Subject:    Re: ExtractText tuning
From:       Matus UHLAR - fantomas <uhlar () fantomas ! sk>
Date:       2023-03-17 8:00:25
Message-ID: ZBQeGUfh+QP28zsA () fantomas ! sk

>> I have successfully set up ExtractText plugin with proposed settings 
>> (those in pod/manual page) and here's a tip:
>>
>> - put extracttext.pm into /etc/spamassassin or similar directory
>>    (extracttext settings aren't loaded from user_prefs)
>>
>> - tesseract takes too much time to process (at least on my server),
>>    so I recommend setting:
>>
>> extracttext_timeout     20      60

On 06.03.23 12:23, Alex wrote:
>Have you noticed an increase in false positives due to legitimate "invoice"
>PDFs or other attachments being processed by body filters and getting
>tagged incorrectly?

Update:

So far I have only been happy with it: it helps catch spam via BAYES, for example:

X-Spam-ExtractText-Chars: 118
X-Spam-ExtractText-Words: 19
X-Spam-ExtractText-Tools: pdftotext
X-Spam-ExtractText-Types: application/pdf
X-Spam-ExtractText-Extensions: pdf
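
To check which extractor handled a given message (for example when reviewing a suspected false positive), headers like the ones above can be pulled out with a few lines of Python. This is only a minimal sketch: the function name extracttext_headers and the sample message are made up for illustration, and it assumes the headers carry the X-Spam-ExtractText- prefix exactly as shown here.

```python
from email import message_from_string

# Prefix as it appears in the headers quoted above (compared case-insensitively).
PREFIX = "x-spam-extracttext-"


def extracttext_headers(raw_message):
    """Return the X-Spam-ExtractText-* headers of a raw message as a dict.

    Hypothetical helper for illustration; not part of SpamAssassin.
    """
    msg = message_from_string(raw_message)
    info = {}
    for name, value in msg.items():
        if name.lower().startswith(PREFIX):
            # Key by the part after the prefix, e.g. "Tools" or "Words".
            info[name[len(PREFIX):]] = value.strip()
    return info


# Made-up sample message reusing the header values from this mail.
sample = (
    "Subject: test\n"
    "X-Spam-ExtractText-Chars: 118\n"
    "X-Spam-ExtractText-Words: 19\n"
    "X-Spam-ExtractText-Tools: pdftotext\n"
    "X-Spam-ExtractText-Types: application/pdf\n"
    "X-Spam-ExtractText-Extensions: pdf\n"
    "\n"
    "body\n"
)

info = extracttext_headers(sample)
print(info["Tools"], info["Words"])
```

Running this over a folder of misclassified mail would show whether the false positives come from one tool (say, tesseract) or one attachment type.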

I believe training Bayes on legitimate invoices would quickly fix any false positives.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Nothing is fool-proof to a talented fool.