List:       qmail
Subject:    Re: Why does qmail-remote hang past timeout value; outline of problem and possible solution.
From:       "Mark Delany" <markd-qmail () BushWire ! Net>
Date:       2002-07-31 23:23:17

On Wed, Jul 31, 2002 at 11:34:57PM +0200, Bruce Campbell allegedly wrote:
> On 31 Jul 2002, Mark Delany wrote:
> 
> > > As I said, it is a problem with the design philosophy of qmail.
> >
> > I guess given that it's a long-standing known kernel problem, then
> > it's a reasonable thing for qmail to implement a work-around and let
> > the kernel folk off the hook until the bug mysteriously hits enough
> > programs to make it a serious problem. Certainly plenty of code exists
> > to get around kernel and library bugs. But to me it's a bit like
> > giving up.
> 
> Yes, plenty of code does exist in other packages to get around known
> problems in different kernels.
> 
> The essential difference in the design philosophy of these packages (we'll
> limit it to other mailers) vs qmail is that one group insists on having
> the package work as well as possible even if the OS has a frontal
> lobotomy.  The other insists on having the OS put down, and most of the
> time, blames the operator.

There's a third option. By covering up the symptom you help hide the
cause. By exposing and trying to fully understand the cause we may
well squish this bug and help many others.

No one, it seems, has quite nailed the exact circumstances under which
this problem can be reliably reproduced, yet we have a number of hints
and reports here that there is a problem. Furthermore, we may be
tantalizingly close to identifying exactly what is causing the problem
as it seems that qmail-remote tweaks this bug quite regularly.

If we come to be able to reliably reproduce the problem and identify
the exact circumstances under which it occurs, we may help some
kernel/driver folks fix a nasty bug that will benefit all users for
all time.

While the band-aid approach of patching qmail-remote may help a few
qmail folk, that's about it. It does nothing for anyone else and
leaves a bug lurking in the kernel to trip up some other poor
unsuspecting bunny in the future.

Myopia like that is a disservice to people who have given so much time
to create these kernels in the first place.

So don't mistake this discussion for a "program right, kernel wrong"
sort of thing; it's not. We have the means to identify an insidious bug
and each time we band-aid it we throw away that opportunity.

The reason I go on about this is that this particular problem has been
around and variously reported since 1998 or thereabouts, both on this
list and to me privately. I first saw it on a Solaris 2.4 x86 platform
of all things, with, I think, some sort of Intel NIC - though the
details now escape me. Since that time I've privately volunteered to
help debug other sites that apparently have the same problem, all in
an attempt to nail this sucker. Alas it has eluded me thus far.

In a way I'm jealous that you have a live one of these lurking in your
machine room :>

> > > Please provide me with a link to this in the archives, as my searches were
> > > unable to find it.
> >
> > Search for "qmail-remote" and "select".
> >
> > This is a start to one thread:
> > http://www.ornl.gov/its/archives/mailing-lists/qmail/2001/06/msg00486.html
> 
> I think you meant the start of the thread actually, being,
> 
> http://www.ornl.gov/its/archives/mailing-lists/qmail/2001/06/msg00461.html

Good to see you've come up to speed on the archives so quickly.


> If I find time, I'll see if I can reproduce the problem on the other OSes
> in our test environment (Solaris and FreeBSD on assorted hardware).

That would be most useful. Though it seems like you have an
environment that tweaks this pretty regularly. Can you attract the
interest of a kernel/driver person?


Regards.