'Re: Architectural problems shown by the anti spam wizard'

List:       kmail-devel
Subject:    Re: Architectural problems shown by the anti spam wizard
From:       Don Sanders <sanders () kde ! org>
Date:       2004-01-31 8:53:08
Message-ID: 200401311853.08711.sanders () kde ! org

On Saturday 31 January 2004 06:39, Ingo Klöcker wrote:
> On Friday 30 January 2004 02:33, Don Sanders wrote:
> > On Thursday 29 January 2004 17:33, Andreas Gungl wrote:
> > > Don Sanders wrote:
> > > > Hi Andreas,
> > > >
> > > > On Thursday 29 January 2004 05:45, Andreas Gungl wrote:
> > > >>Hi,
> > > >>
> > > >>for those who are interested I want to describe some problems
> > > >> I came across while I worked on some details for the anti spam
> > > >> wizard.
> > > >
> > > > I'll try to answer your questions as best I can. But I'm
> > > > having difficulty because I don't understand the intended
> > > > workflow for setting up the anti-spam stuff you have been
> > > > working on.
> > > >
> > > > Don't get me wrong, it's a critical feature in my opinion, but
> > > > I'm lacking clarity on how exactly it is meant to work from
> > > > the end user's point of view. Is it intended to work like the
> > > > Mozilla anti-spam stuff? (I haven't used that either, but I
> > > > see lots of people saying good things about it).
> > > >
> > > > I don't understand what value the new spam/ham statuses have
> > > > for instance.
> > >
> > > I'll try to explain it in short here:
> > > The main problem is that KMail doesn't have built-in spam
> > > filtering - in contrast to Mozilla. Installing Mozilla
> > > gives you all you need, so you can rely on your own. KMail
> > > wants to cooperate with existing anti spam tools. The "average
> > > user" (whoever it is) has two problems IMO. She would have to
> > > check what tools are installed (installation could be prepared
> > > by the distributors) and then (perhaps more difficult) she
> > > would have to define filter rules to let KMail properly
> > > cooperate with the tools.
> > >
> > > The wizard tries to find installed tools. Then it offers to
> > > create filters that use the tools to detect spam mails (basically
> > > by piping through the tools) and to handle spam messages, e.g.
> > > by moving them to trash (by identifying related headers like
> > > X-Spam-Flag). Bayesian spam tools need to be trained, so
> > > the wizard can create rules to let the tools learn ham
> > > and spam messages. The creation of the rules is done based on
> > > information in a config file (kmail.antispamrc, see CVS), so
> > > new tools can get added without changing the code.
> >
> > Ok. I guess it makes sense for the user to have to explicitly
> > activate spam filtering before it is used so a wizard that
> > appears when the user first tries to classify spam or something
> > makes sense.
> >
> > > Detection and handling rules are applied on incoming messages
> > > and on manual filtering. Classification (learning) is done
> > > using ad-hoc filters.
> >
> > To me ad-hoc filters refers to filters listed in the filter
> > configuration dialog with "Add this filter to the Apply Filter
> > Actions menu" checkbox ticked.
> >
> > I'm not sure whether it is a good idea to involve those with spam
> > filters.
>
> Why do you think it's not a good idea? 

I wonder more than think; I don't have a clear enough mental image of 
how the spam filtering patch works to form a hard opinion on it yet.

Ideally I would hope that using spam filtering would just be a case of 
clicking a checkbox to turn on spam filtering, and accepting the 
suggested spam tool.

Introducing new filters in the filter dialog is adding a little bit of 
UI complexity that I would prefer to avoid.

But I hadn't considered the possibility of efficiency concerns; I 
hadn't thought about the act of spam filtering itself being time 
consuming. That complicates things. As you imply, it isn't necessary 
to run the spam filter on KDE mailing lists, and not doing so could 
provide a significant benefit to the user.

(I wonder if Thunderbird has efficiency concerns and if so how it 
addresses them.)

The use of async filtering could mitigate this efficiency concern 
somewhat, but it isn't activated yet and will require a bit of work to 
activate for POP/local mail. I guess it makes some sense to implement 
a solution that is usable without async filtering first.

I wonder what the filter(s?) created by the spam wizard will look 
like. I mean will there be one filter that pipes through the spam 
tool, and then a subsequent filter that checks to see if the 
X-Spam-Flag is set to something and if so moves the message to the 
trash?
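
A minimal sketch of the two-filter arrangement asked about above, in plain C++ rather than KMail's own classes. Message, Filter, makeDetectionFilter, makeHandlingFilter and applyFilters are hypothetical names for illustration only, and the external spam tool is replaced by a callback, so this says nothing about what the wizard actually generates.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a mail message: headers plus the folder it sits in.
struct Message {
    std::map<std::string, std::string> headers;
    std::string folder = "inbox";
};

// Hypothetical filter: a match predicate plus an action to run on a match.
struct Filter {
    std::string name;
    std::function<bool(const Message&)> matches;
    std::function<void(Message&)> action;
};

// Filter 1: always matches and "pipes" the message through the spam tool.
// The tool is modelled as a callback that decides how to tag the message.
Filter makeDetectionFilter(std::function<bool(const Message&)> toolSaysSpam) {
    return Filter{
        "Spam detection",
        [](const Message&) { return true; },
        [toolSaysSpam](Message& m) {
            m.headers["X-Spam-Flag"] = toolSaysSpam(m) ? "YES" : "NO";
        }};
}

// Filter 2: looks only at the header written by filter 1 and moves spam to trash.
Filter makeHandlingFilter() {
    return Filter{
        "Spam handling",
        [](const Message& m) {
            auto it = m.headers.find("X-Spam-Flag");
            return it != m.headers.end() && it->second == "YES";
        },
        [](Message& m) { m.folder = "trash"; }};
}

// Apply the filters in order; a later filter sees headers added by an earlier one.
void applyFilters(const std::vector<Filter>& filters, Message& m) {
    for (const auto& f : filters) {
        assert(f.matches && f.action);  // both callbacks must be set
        if (f.matches(m)) f.action(m);
    }
}
```

The point of the split is that the second filter never talks to the tool itself, so swapping the tool only means replacing the first filter.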

> Isn't this a prime example 
> for the usefulness of ad-hoc filters? AFAIU the purpose of ad-hoc
> filters is to provide a way for the user to apply filter actions to
> messages. So what's wrong with creating ad-hoc filters for
> classifying messages as spam or ham?

I'm a bit unhappy with the filter criteria groupbox area appearing for 
ad hoc filters. I guess I would like to be able to get rid of it for 
ad hoc filters, for instance by clicking on the 'fewer' button a few 
times to make that widget collapse into a state where it is obvious 
that the filter always matches.

I guess the use of ad hoc filters by spam filtering is bringing out or 
reminding me of weaknesses that I find objectionable in ad hoc 
filters.

I guess ad hoc filters suit spam filtering better than I had 
anticipated, nice.

> > An alternative would be to create two KActions 'Classify as
> > spam', 'Classify as ham' in kmmainwidget, create two
> > corresponding slots* KMMainWidget::slotClassifyAsSpam,
> > KMMainWidget::slotClassifyAsHam, and in the kernel create two
> > KMFilters, one to mark as spam, one to mark as ham. These
> > KMFilters would be deleted and replaced by new KMFilters if spam
> > tool options were changed (via the wizard or a configuration
> > dialog or whatever).
> >
> > To make sure the KMFilters are applied to incoming messages the
> > KMFilterMgr (which should eventually be obsolete) and the
> > ActionScheduler (which is the replacement) could be updated.
>
> I don't think that this is a good idea. The user should be able to
> control when the spam filter is tested. For example, I filter first
> for all KDE mailing lists and only then I check the remaining
> messages for spam. Since checking a message for being spam takes a
> long time (several seconds) I don't want to check all messages for
> spam.

I see, understood.
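
To make the ordering argument concrete, here is a rough sketch in plain C++ (not KMail code). Mail, Rule and fileMessage are hypothetical names, and "first matching rule wins" is an assumed simplification of the real filter semantics. It only shows that a cheap mailing-list rule placed first keeps the slow spam check from running for list mail.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a message.
struct Mail {
    std::string listId;              // empty if the mail is not from a list
    std::string folder = "inbox";
};

// Hypothetical rule: a predicate and the folder to file matching mail into.
struct Rule {
    std::function<bool(const Mail&)> matches;
    std::string targetFolder;
};

// Evaluate the rules in order and stop at the first match, so predicates
// of later rules (e.g. an expensive spam check) are never called.
// Returns the index of the rule that matched, or -1 if none did.
int fileMessage(const std::vector<Rule>& rules, Mail& mail) {
    for (std::size_t i = 0; i < rules.size(); ++i) {
        assert(rules[i].matches);    // predicate must be set
        if (rules[i].matches(mail)) {
            mail.folder = rules[i].targetFolder;
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With a list rule at index 0 and a spam rule at index 1, list mail returns 0 without the spam predicate being invoked at all. That is the saving described above when a single check takes several seconds.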

> Also I don't understand what the advantage of hardcoding the
> actions would be. I mean why did you invent ad-hoc filters if you
> now propose to hardcode the spam classification action although
> ad-hoc filters are perfectly suited for this task.

Will the spam wizard allow the existing (ad hoc) spam filters to be 
edited? This would be easy if hardcoded filters were used. I think it 
could also be done if ad hoc filters were used and KMFilters had 
unique ids.

> > > You may end up with some (many?) unused or invalid action
> > > entries for the toolbar in the XMLGUI resource file. As soon
> > > as you have any new action named like such an entry, a toolbar
> > > button will show up even if you never intended to create one.
> >
> > If each KMMainWidget has two fixed KActions for classifying as
> > spam/ham then would this problem still exist?
>
> Of course not. But the problem is of general nature, i.e. it
> affects all ad-hoc filters (Create an ad-hoc filter, add it to your
> toolbar, delete the ad-hoc filter, create a new ad-hoc filter with
> the same name. Result: The new ad-hoc filter will show up in the
> toolbar although the user has never added it to the toolbar). So it
> has to be solved anyway regardless of whether we introduce
> hardcoded actions for spam/ham classification.

Ok, I find this argument persuasive, agreed.
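
The scenario is easy to reproduce in a toy model. This is a sketch in plain C++, not the XMLGUI machinery: AdHocFilter, ToolbarConfig, the "Filter " prefix and both naming functions are assumptions made for illustration. The persisted toolbar is modelled as nothing more than a set of action names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical ad-hoc filter with a stable id and a user-visible name.
struct AdHocFilter {
    std::string id;    // assumed to be unique and never reused
    std::string name;  // the user can rename it or reuse it for another filter
};

// Stand-in for the persisted toolbar layout: the action names plugged into it.
using ToolbarConfig = std::set<std::string>;

// Action name derived from the filter's name (neither unique nor constant).
std::string actionNameFromFilterName(const AdHocFilter& f) {
    return "Filter " + f.name;
}

// Action name derived from the filter's id (unique and constant by assumption).
std::string actionNameFromFilterId(const AdHocFilter& f) {
    assert(!f.id.empty());
    return "Filter " + f.id;
}

// Names of the filters that end up with a toolbar button: a button appears
// whenever the saved toolbar contains an entry equal to the action name.
template <typename NameFn>
std::vector<std::string> visibleButtons(const ToolbarConfig& toolbar,
                                        const std::vector<AdHocFilter>& filters,
                                        NameFn actionName) {
    std::vector<std::string> shown;
    for (const auto& f : filters)
        if (toolbar.count(actionName(f)) > 0) shown.push_back(f.name);
    return shown;
}
```

If the toolbar still holds the entry of a deleted filter, a new filter with the same name picks up that stale entry under name-based action names, but not under id-based ones.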

> > I'm not sure I understand, this problem is due to the QObject
> > name of the spam classification KActions being variable, correct?
>
> Yes and no. The problem is due to the QObject name of all ad-hoc
> filters (not only of the spam classification KActions) because it's
> neither unique (the user can create multiple ad-hoc filters with
> the same name) nor constant (the user can rename ad-hoc filters) in
> time.

Understood, agreed.

> > > >>   The wizard plugs the action into the toolbar and to make
> > > >> the change persistent it modifies the XMLGUI file. This is
> > > >> done by manual manipulation of the XMLGUI file, currently
> > > >> there is no API which would support automatic write back of
> > > >> the current config by the toolbar itself. Of course the user
> > > >> will face the sync problems described above sooner or later.
> > > >
> > > > Not completely sure I understand the sync problems.
> > >
> > > The toolbar does not change its XMLGUI config file when you
> > > plug an ad-hoc filter based action into it. A button would show
> > > up after you have plugged the action, but after a restart the
> > > button is lost if you don't replug it again.
> > > The wizard tries to eliminate the restart problem by
> > > additionally adding the actions to the resource file.
> >
> > So again this is due to the QObject name of the spam
> > classification KActions being variable?
>
> Not really. This is due to KDE lacking an API for manipulating
> and/or saving the toolbar.
>
> > > >>Another problem might be based on
> > > >>translation (i18n) issues when e.g. the user switches the
> > > >> language, but I have no concrete scenario for this, it's
> > > >> just a guess.
> > > >
> > > > If you look at the KAction constructor in
> > > > initializeFilterActions() the QObject::name() is not i18n()'d
> > > > so this should be invariant in the case of the user switching
> > > > the language.
> > >
> > > The action names are based on the filter's name.
> >
> > So that would seem to be the problem, the action names should be
> > fixed rather than dependent on the filter's name.
>
> Exactly.
>
> > > >>Let's say that we have a default action "mark as spam". The
> > > >>appropriate filter should register to this action making the
> > > >> action active. After that the user may configure the action
> > > >> regarding the toolbar position. My conclusion was that this is
> > > >> (up to now) a very special case. It's more complex to
> > > >> implement than the solution above. And I still have no clear
> > > >> picture about what this would imply for other parts of
> > > >> KMail.
> > > >
> > > > I think default actions make sense.
> > >
> > > The mark spam action would be in KMail by default.
> >
> > Good.
>
> Not good. See my other reply.
>
> > > Then, there
> > > might be some code (the provider) which registers to it by
> > > saying, hey I want to be called when the mark spam action is
> > > triggered. It can be made inside the KMail code base e.g. by
> > > letting the user associate a filter with an action (oh, I don't
> > > really want this) or by even having a dcop interface for
> > > external plugins.
> >
> > So I think it makes sense to have a couple of KMFilters in the
> > KMKernel, or create a new SpamClassification class that contains
> > a couple of KMFilters and have a reference to that in KMKernel.
> >
> > *The slotClassifyAsSpam, and slotClassifyAsHam could also be
> > methods of this SpamClassification class, or whatever.
>
> I think that this is completely unnecessary. Why should we special
> case spam handling? It works perfectly with the current filters and
> ad-hoc filters. The problems that now rear their heads are of
> general nature and not specific to the spam handling case. It's
> just that Andreas' work on the spam wizard brought those problems
> to the surface. But fixing the spam handling by introducing a
> special class won't magically fix the general problems.

Indeed, ok you're right.

> To be honest, I'm not totally opposed to special case spam handling
> by introducing two new actions (but only if those actions are
> capable of supporting multiple spam tools at the same time).

Conversely, if the ad hoc filters for spam filtering are given nice 
names like 'Mark as Spam', and 'Mark as Ham' then I think using ad 
hoc filters would be fine.

> Independently of this, the general problems have to be fixed as
> well. A solution for the uniqueness and constantness problem of the
> action names (which I already mentioned in my other message)
> would be to give all filters a unique identifier.

Agreed. So how about a QString KMFilter::id() method?

I suggest considering making KMMsgDict::getNextMsgSerNum() public and 
using something like:
  int random = whatEver;
  int serial = kmkernel->msgDict()->getNextMsgSerNum();
  // QString::arg() fills the %1 and %2 place markers; 36 is the number base
  filterId = QString( "%1:%2" ).arg( serial, 0, 36 ).arg( random, 0, 36 );

rather than using a random number alone; there is less chance of a 
collision due to a broken clock.

All KMFilters could be given a name when they are constructed (0 means 
make one up). KMMainWidget::initializeFilterActions could be updated 
to use these ids.
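
For anyone who wants to try the id format outside of KMail, here is a sketch of the same "serial:random" scheme in plain C++ without Qt. toBase and makeFilterId are hypothetical helper names, and the digit alphabet (0-9, then a-z) is an assumption about the textual form, not something fixed by the proposal.

```cpp
#include <cassert>
#include <string>

// Render a non-negative number in the given base (2..36),
// using the digits 0-9 followed by a-z.
std::string toBase(unsigned long value, unsigned base) {
    assert(base >= 2 && base <= 36);
    const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    std::string out;
    do {
        out.insert(out.begin(), digits[value % base]);  // prepend next digit
        value /= base;
    } while (value != 0);
    return out;
}

// Build a filter id of the form "<serial>:<random>", both parts in base 36.
// The serial part keeps ids distinct even if the random part repeats.
std::string makeFilterId(unsigned long serial, unsigned long random) {
    return toBase(serial, 36) + ":" + toBase(random, 36);
}
```

For example, makeFilterId(35, 36) gives "z:10". Two filters created with different serial numbers always get different ids, whatever the random part happens to be.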

Well that's the approach I think makes sense.

> And the problem 
> with manipulating the XMLGUI file would no longer be a problem if
> we simply add the classification actions to the default toolbar.

Yeah, if we could get some attractive spam/ham toolbuttons I think 
they could make quite an attractive pair in the toolbar.

Don.
_______________________________________________
KMail developers mailing list
KMail-devel@kde.org
https://mail.kde.org/mailman/listinfo/kmail-devel