[prev in list] [next in list] [prev in thread] [next in thread]
List: postgresql-general
Subject: Re: [GENERAL] debugging server connection issue
From: Stephen Constable <sjconsta () scisat ! ca>
Date: 2016-03-31 12:37:15
Message-ID: CACfdzQmdT6=dUgUFehn12+sC9trCLFf=h6d76MTwgTKFYQU7QA () mail ! gmail ! com
[Download RAW message or body]
In case anybody looks back on this thread in the future, I fixed the
problem (at least temporarily) by running the following in CentOS 7:
# echo "1025 65535" >/proc/sys/net/ipv4/ip_local_port_range
# sysctl -w net.ipv4.tcp_timestamps=1
# sysctl -w net.ipv4.tcp_tw_recycle=1
# sysctl -w tcp_tw_reuse=1
On both the database server and the client nodes. It seems the clients
were eating up all the available ephemeral ports opening and closing
database connections. I'm going to have a long talk with the developer
about why this is done in the first place :)
Cheers, and thanks for the tips.
--Steve
On Tue, Mar 29, 2016 at 6:51 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:
> On 03/29/2016 03:25 PM, Stephen Constable wrote:
> > Sorry, my client environment is Linux.
>
> Hmm, so I was reading win32.c wrong. It is mapping a Windows error
> message to that string.
>
> >
> > My current theory is that my clients are running out of available
> > ephemeral ports, like in this thread:
> >
> http://dba.stackexchange.com/questions/59650/pgbouncer-works-great-but-occasionally-becomes-unavailable
> (but
> > I"m not currently using pg bouncer). I tried pg bouncer before and had
> > the same errors, which in retrospect makes the client-side issue seem
> > more likely. Are there any configuration variables I can set to reduce
> > the number of ephemeral ports required in the postgresql client
> > libraries? Otherwise, I will attempt to reconfigure the OS of the
> > client machines tomorrow morning.
>
> Not sure how that would work. To make a network connection would seem to
> me to require a port.
>
> Are you seeing the same sort of port churn on your 8.4 machine?
>
> Is the fact that is processing results slower maybe giving the ports a
> chance to timeout their wait time, versus not on the newer faster machine?
>
> The issue, to me at least, seems to be less the number of jobs, but the
> number of connections each job is producing. T
> >
> > Thanks,
> > Steve
> >
> > On Tue, Mar 29, 2016 at 4:44 PM Adrian Klaver <adrian.klaver@aklaver.com
> > <mailto:adrian.klaver@aklaver.com>> wrote:
> >
> > On 03/29/2016 01:28 PM, Stephen Constable wrote:
> > > My apologies, I'm not sure what part of the networking stack the
> > > messages are coming from. It also states:
> > > """
> > > could not connect to server: Cannot assign requested address
> > > Is the server running on host "<hostname>" and accepting
> > > TCP/IP connections on port <port>?
> > > """
> >
> > Alright I lied, the above is a Postgres error message. I am just not
> > used to seeing 'Cannot assign requested address'. Turns out it is in
> > interfaces/libpq/win32.c.
> >
> > So your client is running on Windows?
> >
> >
> > > This error is only printed under a 32-job load, never a single
> > job load.
> > >
> > > The processes are indeed connecting over a local network.
> > >
> > > I have only enabled the logging of connections and disconnections
> > since
> > > I figured that would be the most telling :) perhaps that was not
> the
> > > best idea. but, FYI, I see over 5000 such notices in a single
> > minute.
> > > I will reconfigure the logging to be more verbose.
> > >
> > > Thanks,
> > > Steve
> > >
> > > On Tue, Mar 29, 2016 at 4:21 PM Adrian Klaver
> > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> > > <mailto:adrian.klaver@aklaver.com
> > <mailto:adrian.klaver@aklaver.com>>> wrote:
> > >
> > > On 03/29/2016 01:10 PM, Stephen Constable wrote:
> > > > Hi All,
> > > >
> > > > I'm a new-ish sysadmin working on porting legacy
> > scientific code
> > > from a
> > > > local server/client to new supercomputer environment. My
> > work is
> > > mostly
> > > > done, except that my postgres database doesn't seem to be
> > able to
> > > keep
> > > > up with the new environment. The application is written
> > in-house
> > > in a
> > > > mixture of FORTAN 77 and C, and uses postgres BLOBS as its
> > main data
> > > > store. This application in particular only reads from the
> > > database, it
> > > > never writes, which *should* make it easy to scale.
> > > >
> > > > My main problem is that this client application is unable
> to
> > > connect to
> > > > the database under a modest load (32 simultaneous jobs).
> > The client
> > > > error logs print out messages like "could not connect to
> > server:
> > > Cannot
> > > > assign requested address" and "Cannot connect to database
> > > [runlog]!!!"
> > > > (an important database of ours). The "cannot assign
> requested
> > > address"
> > >
> > > Well those do not look like Postgres error messages to me, so
> > the first
> > > thing would be to determine what part of the stack is
> > generating them.
> > >
> > > Is the client software connecting to the database over a
> network?
> > >
> > > Are you using connection pooling?
> > >
> > > > message makes me think it's a configuration issue. The
> > logs are
> > > flooded
> > > > with hundreds of connection and disconnection notices per
> > > second. This
> > >
> > > Might want to turn off logging connections/disconnections:
> > >
> > >
> >
> http://www.postgresql.org/docs/9.4/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT
> > >
> > > log_connections (boolean)
> > >
> > > log_disconnections (boolean)
> > >
> > > > same code and configuration runs fine on our mid-2000's
> > Solaris
> > > 10 box
> > > > with postgres 8.4 (albeit very slowly) but totally fails
> > with these
> > > > connection errors on a modern Dell system running CentOS 7
> or
> > > FreeBSD 10
> > > > (I tested both) with postgres 9.4.
> > > >
> > > > While the database is under load (and jobs are actively
> > failing),
> > > select
> > > > count(*) from pg_stat_activity returns 30-34 ish
> > connections, show
> > > > max_connections returns 100, and show
> > superuser_reserved_connections
> > > > shows 3. My only other hint is that right after a fresh
> > install of
> > > > CentOS 7 my job success rate was around 50%, and now it has
> > > approached
> > > > approximately 5%, so something is changing over time.
> > > >
> > > > Does anyone have any advice or experience with similar
> issues?
> > >
> > > What else does the Postgres log show besides the
> > > connections/disconnections, that might be of interest?
> > >
> > > What does the system log show?
> > >
> > > >
> > > > Thanks,
> > > > Steve
> > > >
> > >
> > >
> > > --
> > > Adrian Klaver
> > > adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> > <mailto:adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com
> >>
> > >
> >
> >
> > --
> > Adrian Klaver
> > adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> >
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>
[Attachment #3 (text/html)]
<div dir="ltr">In case anybody looks back on this thread in the future, I fixed the \
problem (at least temporarily) by running the following in CentOS 7:<div># echo \
"1025 65535" >/proc/sys/net/ipv4/ip_local_port_range<br></div><div># \
sysctl -w net.ipv4.tcp_timestamps=1</div><div># sysctl -w \
net.ipv4.tcp_tw_recycle=1</div><div># sysctl -w \
tcp_tw_reuse=1</div><div><br></div><div>On both the database server and the client \
nodes. It seems the clients were eating up all the available ephemeral ports \
opening and closing database connections. I'm going to have a long talk with \
the developer about why this is done in the first place \
:)</div><div><br></div><div>Cheers, and thanks for the \
tips.</div><div>--Steve</div></div><br><div class="gmail_quote"><div dir="ltr">On \
Tue, Mar 29, 2016 at 6:51 PM Adrian Klaver <<a \
href="mailto:adrian.klaver@aklaver.com">adrian.klaver@aklaver.com</a>> \
wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 \
.8ex;border-left:1px #ccc solid;padding-left:1ex">On 03/29/2016 03:25 PM, Stephen \
Constable wrote:<br> > Sorry, my client environment is Linux.<br>
<br>
Hmm, so I was reading win32.c wrong. It is mapping a Windows error<br>
message to that string.<br>
<br>
><br>
> My current theory is that my clients are running out of available<br>
> ephemeral ports, like in this thread:<br>
> <a href="http://dba.stackexchange.com/questions/59650/pgbouncer-works-great-but-occasionally-becomes-unavailable" \
rel="noreferrer" target="_blank">http://dba.stackexchange.com/questions/59650/pgbouncer-works-great-but-occasionally-becomes-unavailable</a> \
(but<br> > I"m not currently using pg bouncer). I tried pg bouncer before \
and had<br> > the same errors, which in retrospect makes the client-side issue \
seem<br> > more likely. Are there any configuration variables I can set to \
reduce<br> > the number of ephemeral ports required in the postgresql client<br>
> libraries? Otherwise, I will attempt to reconfigure the OS of the<br>
> client machines tomorrow morning.<br>
<br>
Not sure how that would work. To make a network connection would seem to<br>
me to require a port.<br>
<br>
Are you seeing the same sort of port churn on your 8.4 machine?<br>
<br>
Is the fact that is processing results slower maybe giving the ports a<br>
chance to timeout their wait time, versus not on the newer faster machine?<br>
<br>
The issue, to me at least, seems to be less the number of jobs, but the<br>
number of connections each job is producing. T<br>
><br>
> Thanks,<br>
> Steve<br>
><br>
> On Tue, Mar 29, 2016 at 4:44 PM Adrian Klaver <<a \
href="mailto:adrian.klaver@aklaver.com" \
target="_blank">adrian.klaver@aklaver.com</a><br> > <mailto:<a \
href="mailto:adrian.klaver@aklaver.com" \
target="_blank">adrian.klaver@aklaver.com</a>>> wrote:<br> ><br>
> On 03/29/2016 01:28 PM, Stephen Constable wrote:<br>
> > My apologies, I'm not sure what part of the networking stack \
the<br> > > messages are coming from. It also states:<br>
> > """<br>
> > could not connect to server: Cannot assign requested address<br>
> > Is the server running on host "<hostname>" and \
accepting<br> > > TCP/IP connections on port <port>?<br>
> > """<br>
><br>
> Alright I lied, the above is a Postgres error message. I am just not<br>
> used to seeing 'Cannot assign requested address'. Turns out it is \
in<br> > interfaces/libpq/win32.c.<br>
><br>
> So your client is running on Windows?<br>
><br>
><br>
> > This error is only printed under a 32-job load, never a single<br>
> job load.<br>
> ><br>
> > The processes are indeed connecting over a local network.<br>
> ><br>
> > I have only enabled the logging of connections and \
disconnections<br> > since<br>
> > I figured that would be the most telling :) perhaps that was not \
the<br> > > best idea. but, FYI, I see over 5000 such notices in a \
single<br> > minute.<br>
> > I will reconfigure the logging to be more verbose.<br>
> ><br>
> > Thanks,<br>
> > Steve<br>
> ><br>
> > On Tue, Mar 29, 2016 at 4:21 PM Adrian Klaver<br>
> <<a href="mailto:adrian.klaver@aklaver.com" \
target="_blank">adrian.klaver@aklaver.com</a> <mailto:<a \
href="mailto:adrian.klaver@aklaver.com" \
target="_blank">adrian.klaver@aklaver.com</a>><br> > > <mailto:<a \
href="mailto:adrian.klaver@aklaver.com" \
target="_blank">adrian.klaver@aklaver.com</a><br> > <mailto:<a \
href="mailto:adrian.klaver@aklaver.com" \
target="_blank">adrian.klaver@aklaver.com</a>>>> wrote:<br> > \
><br> > > On 03/29/2016 01:10 PM, Stephen Constable \
wrote:<br> > > > Hi All,<br>
> > ><br>
> > > I'm a new-ish sysadmin working on porting \
legacy<br> > scientific code<br>
> > from a<br>
> > > local server/client to new supercomputer environment. \
My<br> > work is<br>
> > mostly<br>
> > > done, except that my postgres database doesn't \
seem to be<br> > able to<br>
> > keep<br>
> > > up with the new environment. The application is \
written<br> > in-house<br>
> > in a<br>
> > > mixture of FORTAN 77 and C, and uses postgres BLOBS as \
its<br> > main data<br>
> > > store. This application in particular only reads \
from the<br> > > database, it<br>
> > > never writes, which *should* make it easy to \
scale.<br> > > ><br>
> > > My main problem is that this client application is \
unable to<br> > > connect to<br>
> > > the database under a modest load (32 simultaneous \
jobs).<br> > The client<br>
> > > error logs print out messages like "could not \
connect to<br> > server:<br>
> > Cannot<br>
> > > assign requested address" and "Cannot \
connect to database<br> > > [runlog]!!!"<br>
> > > (an important database of ours). The "cannot \
assign requested<br> > > address"<br>
> ><br>
> > Well those do not look like Postgres error messages to me, \
so<br> > the first<br>
> > thing would be to determine what part of the stack is<br>
> generating them.<br>
> ><br>
> > Is the client software connecting to the database over a \
network?<br> > ><br>
> > Are you using connection pooling?<br>
> ><br>
> > > message makes me think it's a configuration issue. \
The<br> > logs are<br>
> > flooded<br>
> > > with hundreds of connection and disconnection notices \
per<br> > > second. This<br>
> ><br>
> > Might want to turn off logging \
connections/disconnections:<br> > ><br>
> ><br>
> <a href="http://www.postgresql.org/docs/9.4/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT" \
rel="noreferrer" target="_blank">http://www.postgresql.org/docs/9.4/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT</a><br>
> ><br>
> > log_connections (boolean)<br>
> ><br>
> > log_disconnections (boolean)<br>
> ><br>
> > > same code and configuration runs fine on our \
mid-2000's<br> > Solaris<br>
> > 10 box<br>
> > > with postgres 8.4 (albeit very slowly) but totally \
fails<br> > with these<br>
> > > connection errors on a modern Dell system running \
CentOS 7 or<br> > > FreeBSD 10<br>
> > > (I tested both) with postgres 9.4.<br>
> > ><br>
> > > While the database is under load (and jobs are \
actively<br> > failing),<br>
> > select<br>
> > > count(*) from pg_stat_activity returns 30-34 ish<br>
> connections, show<br>
> > > max_connections returns 100, and show<br>
> superuser_reserved_connections<br>
> > > shows 3. My only other hint is that right after a \
fresh<br> > install of<br>
> > > CentOS 7 my job success rate was around 50%, and now \
it has<br> > > approached<br>
> > > approximately 5%, so something is changing over \
time.<br> > > ><br>
> > > Does anyone have any advice or experience with similar \
issues?<br> > ><br>
> > What else does the Postgres log show besides the<br>
> > connections/disconnections, that might be of interest?<br>
> ><br>
> > What does the system log show?<br>
> ><br>
> > ><br>
> > > Thanks,<br>
> > > Steve<br>
> > ><br>
> ><br>
> ><br>
> > --<br>
> > Adrian Klaver<br>
> > <a href="mailto:adrian.klaver@aklaver.com" \
target="_blank">adrian.klaver@aklaver.com</a> <mailto:<a \
href="mailto:adrian.klaver@aklaver.com" \
target="_blank">adrian.klaver@aklaver.com</a>><br> > <mailto:<a \
href="mailto:adrian.klaver@aklaver.com" target="_blank">adrian.klaver@aklaver.com</a> \
<mailto:<a href="mailto:adrian.klaver@aklaver.com" \
target="_blank">adrian.klaver@aklaver.com</a>>><br> > ><br>
><br>
><br>
> --<br>
> Adrian Klaver<br>
> <a href="mailto:adrian.klaver@aklaver.com" \
target="_blank">adrian.klaver@aklaver.com</a> <mailto:<a \
href="mailto:adrian.klaver@aklaver.com" \
target="_blank">adrian.klaver@aklaver.com</a>><br> ><br>
<br>
<br>
--<br>
Adrian Klaver<br>
<a href="mailto:adrian.klaver@aklaver.com" \
target="_blank">adrian.klaver@aklaver.com</a><br> </blockquote></div>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic