List: postfix-users
Subject: Re: smtpd processes congregating at the pub
From: Wietse Venema <wietse () porcupine ! org>
Date: 2010-01-31 16:38:28
Message-ID: 20100131163828.E6B691F3EA2 () spike ! porcupine ! org
Stan Hoeppner:
> This is making good progress. Seeing the smtpd's memory footprint
> drop so dramatically is fantastic. However, I'm still curious as
> to why proxymap doesn't appear to be honoring $max_idle or $max_use.
> Maybe my understanding of $max_use is not correct? It's currently
> set to 100, the default. Watching top while sending a test message
> through, I see proxymap launch but then exit within 5 seconds,
> while smtpd honors max_idle. Is there some other setting I need
> to change to keep proxymap around longer?
Short answer (workaround for low-traffic sites): set ipc_idle=$max_idle
to approximate the expected behavior. This keeps the smtpd-to-proxymap
connection open for as long as smtpd runs. Then, proxymap won't
terminate before its clients terminate.
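
As an illustration of that workaround, the setting would look roughly like this. The /etc/postfix location is an assumption about a typical installation; the parameter names are the ones used in this message.

```
# /etc/postfix/main.cf  (path assumed; adjust to your installation)
# Keep idle client connections (for example smtpd to proxymap) open
# for as long as an idle daemon process is allowed to live.
ipc_idle = $max_idle
```

Run "postfix reload" afterwards so that the change takes effect.
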
Better: apply the long-term solution, in the form of the patch below.
This undoes the max_idle override (a workaround that I introduced
with Postfix 2.3). I already introduced the better solution with
Postfix 2.4 while solving a different problem.
Long answer: in ancient times, all Postfix daemons except qmgr
implemented the well-known max_idle=100s and max_use=100, as well
as the lesser-known ipc_idle=100s (see "short answer" for the effect
of that parameter).
While this worked fine for single-client servers such as smtpd, it
was not so great for multi-client servers such as proxymap or
trivial-rewrite. This problem was known, and the idea was that it
would be solved over time.
Theoretically, smtpd could run for up to $max_idle * $max_use = 10000s
(about 3 hours), while proxymap and trivial-rewrite could run for up to
$max_idle * $max_use * $max_use = 1000000s (about 11.6 days) on
low-traffic systems (one SMTP client every 100s, or a little under 900
SMTP clients a day), and they would run forever on systems with a steady
mail flow.
This was a problem. The point of max_use is to limit the impact of
bugs such as memory or file handle leaks, by retiring a process
after doing a limited amount of work. I can test Postfix itself
with tools such as Purify and Valgrind, but I can't do those tests
with every version of everyone's system libraries.
If a proxymap or trivial-rewrite server can run for 11 days even
on systems with a minuscule load, then max_use isn't working as
intended.
The main cause is that the proxymap etc. clients reuse a connection
to improve efficiency. Therefore, the proxymap etc. server politely
waits until all its clients have disconnected before checking the
max_use counter. While this politeness thing can't be changed
easily, it is relatively easy to play with the proxymap etc. server's
max_idle value, and with the smtpd etc. ipc_idle value.
Postfix 2.3 reduced the proxymap etc. max_idle to a fixed 1s value
to make those processes go away sooner when idle. I think that
this was a mistake, because it makes processes terminate too soon,
and thereby worsens the low-traffic behavior. Instead, we should
speed up the proxymap etc. server's max_use counter.
Postfix 2.4 reduced ipc_idle from 100s to 5s. This was done for a different
purpose: to allow proxymap etc. clients to switch to the least-loaded
proxymap etc. server. But, I think that this was also the right way
to deal with long-lived proxymap etc. processes, because it speeds
up the proxymap etc. max_use counter.
The patch below keeps the reduced ipc_idle from Postfix 2.4, and
removes the max_idle overrides from Postfix 2.3.
Wietse
*** ./src/proxymap/proxymap.c- Thu Jan 10 09:03:55 2008
--- ./src/proxymap/proxymap.c Sun Jan 31 10:52:50 2010
***************
*** 594,605 ****
      myfree(saved_filter);
  
      /*
-      * This process is called by clients that already enforce the max_idle
-      * time, so we don't have to do it another time.
-      */
-     var_idle_limit = 1;
- 
-     /*
       * Never, ever, get killed by a master signal, as that could corrupt a
       * persistent database when we're in the middle of an update.
       */
--- 594,599 ----
*** ./src/trivial-rewrite/trivial-rewrite.c- Wed Dec 9 18:39:51 2009
--- ./src/trivial-rewrite/trivial-rewrite.c Sun Jan 31 10:53:01 2010
***************
*** 565,576 ****
      if (resolve_verify.transport_info)
          transport_post_init(resolve_verify.transport_info);
      check_table_stats(0, (char *) 0);
- 
-     /*
-      * This process is called by clients that already enforce the max_idle
-      * time, so we don't have to do it another time.
-      */
-     var_idle_limit = 1;
  }
  
  MAIL_VERSION_STAMP_DECLARE;
--- 565,570 ----