List:       postfix-users
Subject:    Re: smtpd processes congregating at the pub
From:       Wietse Venema <wietse () porcupine ! org>
Date:       2010-01-31 16:38:28
Message-ID: 20100131163828.E6B691F3EA2 () spike ! porcupine ! org

Stan Hoeppner:
> This is making good progress.  Seeing the smtpd's memory footprint
> drop so dramatically is fantastic.  However, I'm still curious as
> to why proxymap doesn't appear to be honoring $max_idle or $max_use.
> Maybe my understanding of $max_use is not correct?  It's currently
> set to 100, the default.  Watching top while sending a test message
> through, I see proxymap launch but then exit within 5 seconds,
> while smtpd honors max_idle.  Is there some other setting I need
> to change to keep proxymap around longer?

Short answer (workaround for low-traffic sites): set ipc_idle=$max_idle
to approximate the expected behavior. This keeps the smtpd-to-proxymap
connection open for as long as smtpd runs. Then, proxymap won't
terminate before its clients terminate.

Better: apply the long-term solution, in the form of the patch below.
This undoes the max_idle override (a workaround that I introduced
with Postfix 2.3).  I already introduced the better solution with
Postfix 2.4 while solving a different problem.

Long answer:  in ancient times, all Postfix daemons except qmgr
implemented the well-known max_idle=100s and max_use=100, as well
as the lesser-known ipc_idle=100s (see "short answer" for the effect
of that parameter).

While this worked fine for single-client servers such as smtpd, it
was not so great for multi-client servers such as proxymap or
trivial-rewrite.  This problem was known, and the idea was that it
would be solved over time.

Theoretically, smtpd could run for up to $max_idle * $max_use = 3
hours, while proxymap and trivial-rewrite could run for up to
$max_idle * $max_use * $max_use = 12 days on low-traffic systems
(one SMTP client every 100s, or a little under 900 SMTP clients a
day), and they would run forever on systems with a steady mail flow.

This was a problem. The point of max_use is to limit the impact of
bugs such as memory or file handle leaks, by retiring a process
after doing a limited amount of work. I can test Postfix itself
with tools such as Purify and Valgrind, but I can't do those tests
with every version of everyone's system libraries.

If a proxymap or trivial-rewrite server can run for more than 11
days even on systems with a minuscule load, then max_use isn't
working as intended.

The main cause is that the proxymap etc. clients reuse a connection
to improve efficiency. Therefore, the proxymap etc. server politely
waits until all its clients have disconnected before checking the
max_use counter.  While this politeness thing can't be changed
easily, it is relatively easy to play with the proxymap etc. server's
max_idle value, and with the smtpd etc. ipc_idle value.

Postfix 2.3 reduced the proxymap etc. max_idle to a fixed 1s value
to make those processes go away sooner when idle.  I think that
this was a mistake, because it makes processes terminate too soon,
and thereby worsens the low-traffic behavior.  Instead, we should
speed up the proxymap etc.  server's max_use counter.

Postfix 2.4 reduced ipc_idle to 5s. This was done for a different
purpose: to allow proxymap etc. clients to switch to the least-loaded
proxymap etc. server. But, I think that this was also the right way
to deal with long-lived proxymap etc. processes, because it speeds
up the proxymap etc.  max_use counter.

The patch below keeps the reduced ipc_idle from Postfix 2.4, and
removes the max_idle overrides from Postfix 2.3.

	Wietse

*** ./src/proxymap/proxymap.c-	Thu Jan 10 09:03:55 2008
--- ./src/proxymap/proxymap.c	Sun Jan 31 10:52:50 2010
***************
*** 594,605 ****
      myfree(saved_filter);
  
      /*
-      * This process is called by clients that already enforce the max_idle
-      * time, so we don't have to do it another time.
-      */
-     var_idle_limit = 1;
- 
-     /*
       * Never, ever, get killed by a master signal, as that could corrupt a
       * persistent database when we're in the middle of an update.
       */
--- 594,599 ----
*** ./src/trivial-rewrite/trivial-rewrite.c-	Wed Dec  9 18:39:51 2009
--- ./src/trivial-rewrite/trivial-rewrite.c	Sun Jan 31 10:53:01 2010
***************
*** 565,576 ****
      if (resolve_verify.transport_info)
  	transport_post_init(resolve_verify.transport_info);
      check_table_stats(0, (char *) 0);
- 
-     /*
-      * This process is called by clients that already enforce the max_idle
-      * time, so we don't have to do it another time.
-      */
-     var_idle_limit = 1;
  }
  
  MAIL_VERSION_STAMP_DECLARE;
--- 565,570 ----