List: spamassassin-users
Subject: Re: ExtractText tuning
From: Matus UHLAR - fantomas <uhlar () fantomas ! sk>
Date: 2023-03-17 8:00:25
Message-ID: ZBQeGUfh+QP28zsA () fantomas ! sk
>> I have successfully set up the ExtractText plugin with the proposed settings
>> (those in the POD/manual page), and here are some tips:
>>
>> - put extracttext.pm into /etc/spamassassin or similar directory
>> (extracttext settings aren't loaded from user_prefs)
>>
>> - tesseract takes too much time to process (at least on my server),
>> so I recommend setting:
>>
>> extracttext_timeout 20 60
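For reference, a site-wide file carrying this setting might look like the
sketch below. It assumes the file is named extracttext.cf (SpamAssassin only
reads *.cf and *.pre files from /etc/spamassassin) and that the tool
definitions are copied from the plugin's POD. My reading of the two timeout
values (per tool, then overall) is an assumption worth checking against the
POD for your version.

  # /etc/spamassassin/extracttext.cf  (file name assumed; must end in .cf)
  # Load the plugin here unless a .pre file already does so.
  loadplugin Mail::SpamAssassin::Plugin::ExtractText

  ifplugin Mail::SpamAssassin::Plugin::ExtractText
    # ... tool definitions copied from the plugin's POD ...

    # Assumed meaning: 20 s per external tool, 60 s in total (verify in the POD).
    extracttext_timeout 20 60
  endif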
On 06.03.23 12:23, Alex wrote:
>Have you noticed an increase in false positives due to legitimate "invoice"
>PDFs or other attachments being processed by body filters and getting
>tagged incorrectly?
Update:
so far I am simply happy with it: it is catching spam via BAYES, e.g.:
X-Spam-ExtractText-Chars: 118
X-Spam-ExtractText-Words: 19
X-Spam-ExtractText-Tools: pdftotext
X-Spam-ExtractText-Types: application/pdf
X-Spam-ExtractText-Extensions: pdf
I believe training Bayes on legitimate invoices would quickly fix any such problem.
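Headers like the X-Spam-ExtractText-* ones above can be added with
add_header. A sketch, with the template tag names as I recall them from the
ExtractText POD (please verify them for your version):

  ifplugin Mail::SpamAssassin::Plugin::ExtractText
    # SpamAssassin prefixes each header name with "X-Spam-".
    add_header all ExtractText-Chars      _EXTRACTTEXTCHARS_
    add_header all ExtractText-Words      _EXTRACTTEXTWORDS_
    add_header all ExtractText-Tools      _EXTRACTTEXTTOOLS_
    add_header all ExtractText-Types      _EXTRACTTEXTTYPES_
    add_header all ExtractText-Extensions _EXTRACTTEXTEXTENSIONS_
  endif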
--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Nothing is fool-proof to a talented fool.