[prev in list] [next in list] [prev in thread] [next in thread] 

List:       pgsql-performance
Subject:    Re: [PERFORM] sudden spurt in swap utilization (was:cpu bound
From:       Rajesh Kumar Mallah <mallah.rajesh () gmail ! com>
Date:       2010-06-26 13:34:42
Message-ID: AANLkTilCLpvv86RX2gbKp39zhFcMzka4gQIdfw657Ryz () mail ! gmail ! com
[Download RAW message or body]

Dear List ,

A simple (perl) script was made to 'watch' the state transitions of
back ends. On startup It captures a set of pids for watching
and displays  a visual representation of the states for next 30 intervals
of 1 seconds each. The X axis is interval cnt, Y axis is pid and the
origin is on top-left.

The state value can be Active Query (*) , or <IDLE> indicated by '.' or
'<IDLE> in transaction' indicated by '?' . for my server below is a random
output (during lean hours and on a lean day).

----------------------------------------------------------------------------------------------------
   PID 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30
----------------------------------------------------------------------------------------------------
  4334  ?  ?  ?  ?  *  ?  ?  ?  ?  ?  *  ?  ?  ?  ?  ?  ?  ?  ?
  6904  ?  ?  .  .  .  *  ?  .  .  .  .  .  .  ?  ?  .
  6951  ?  ?  ?  .  .  .  .  ?  ?  ?  ?  ?  .  .  .  ?  ?  ?  .  .  .  ?  .
.  .  .  .  ?  ?  .
  7009  ?  *  ?  ?  .  .  .  .  .  .  .  .  .  *  *  .  *  ?  ?  .  .  .  *
?  ?  ?  .  .  .  ?
  7077  ?  .  ?  .  .  .  *  .  ?  .  .  .  .  ?  .  .  .  .  .  .  .  ?  .
.  ?  .  .  .  ?  ?
  7088  ?  .  .  ?  .  ?  ?  ?  .  .  .  .  .  .  ?  .  .  ?  ?  *  .  .  .
.  .  ?  .  ?  .  *
  7091  ?  .  .  *  ?  ?  ?  ?  ?  ?  ?  *  ?  .  .  ?  *  .  *  .  .  .  .
.  .  .  .  .  .  .
  7093  ?  ?  .  ?  .  .  .  .  ?  .  ?  *  .  .  .  .  .  .  .  .  .  ?  ?
?  .  ?  ?  ?  .  .
  7112  *  *  .  .  .  ?  ?  ?  .  .  .  .  .  .  .  .  ?  ?  .  ?  .  ?  .
.  ?  .  .  .  .  .
  7135  ?  .  .  *  .  ?  ?  ?  .  ?  ?  .  .  .  ?  .  .  .  .  .  .  .  ?
.  .  .  ?  ?  .  .
  7142  ?  .  ?  .  .  .  .  .  .  *  .  .  .  ?  .  .  .  .  .  .  .  .  .
.  .  .  .  .
  7166  ?  .  ?  ?  ?  *  *  .  ?  *  .  ?  .  .  .  ?  .  ?  ?  .  .  .  *
.  .  .  ?  .  .  .
  8202  ?  ?  .  .  .  *  .  ?  .  .  .  .  .  .  .  *  ?  .  .  .  ?  ?  .
.  .  .  ?  ?  ?  .
  8223  ?  .  .  .  .  .  .  ?
  8237  ?  ?  ?  .  ?  ?  ?  ?  .  .  .  .  .  .  .  ?  .  .  .  .  .  ?  .
.  *  ?  .  .  .  .
  8251  ?  .  ?  .  .  .  .  .  ?  ?  .  .  .  *  ?  .  .  .  ?  .  .  .  .
.  .  .  .  .  .  .
  8278  ?  ?  .  .  .  .  ?  .  .  .  .  .  .  .  ?  .  .  .  .  .  .  ?  ?
.  .  *  .  .  .  .
  8290  ?  .  .
  8294  ?  ?  .  .  .  .  .  .  .  .  .  .  .  .  ?  .  .  .  ?  ?  .  .  .
.  .  .  .  .  *  *
  8303  ?  *  ?  .  ?  ?  ?  .  ?  ?  ?  .  .  .  .  *  .  .  .  .  .  .  .
.  .  .  .  .  .  .
  8306  ?  ?  .  .  .  ?  .  .  .  ?  .  .  .  .  .  .  *  .  .  .
  8309  *  ?  ?  ?  ?  .  .  .  ?  .  .  .
  8329  ?  .  *  *  .  .  .  .  .  .  .  *  .  ?  .  *  .  ?  .  *  .  *  ?
.  .  .
----------------------------------------------------------------------------------------------------
       (*) Active Query , (.) Idle , (?) Idle in transaction,<blank> backend
over.
----------------------------------------------------------------------------------------------------

Looks like most of the graph space is filled with (.) or (?) and very
less active queries (long running queries > 1s). on a busy day and busi hour
i shall check the and post again. The script is presented which depends only
on perl , DBI and DBD::Pg.

script pasted here:
http://pastebin.com/mrjSZfLB

Regds
mallah.


On Sat, Jun 26, 2010 at 3:23 PM, Rajesh Kumar Mallah <
mallah.rajesh@gmail.com> wrote:

> Dear List,
>
> Today  has been good since morning. Although it is a lean day
> for us but the indications are nice. I thank everyone who shared
> the concern. I think the most significant change has been to reduce
> shared_buffers from 10G to 4G , this has lead to reduced memory
> usage and some breathing space to the OS.
>
> Although i am yet to incorporate the suggestions from pgtune but
> i think the issue of max_connection needs to be addressed first.
>
> I am investigating application issues and about the mechanism that
> puts many backend to '<IDLE> in transaction ' mode for significant
> times. I thank Tom for the script he sent. Once that resolves i shall
> check pooling as suggested by Kevin, then eventually max_connections
> can be reduced. I shall also check pgpool and pgbouncer if they are
> helpful in this regard.
>
> I observed that the number of simultaneous connection today (lean day)
> hovers between 1 to 10 , occasionally shooting to 15 but never more than
> 20 i would say.
>
>
> I am happy that i/o waits are negligible and cpu is idling also for a
> while.
>
>
> procs -----------memory---------- ---swap-- -----io---- --system--
> -----cpu------
>  r  b   swpd   free   buff  cache     si   so    bi    bo   in   cs   us sy
> id wa st
> 22  0  18468 954120  13460 28491772    0    0   568  1558 13645 18355 62 10
> 27  2  0
> 16  0  18468 208100  13476 28469084    0    0   580   671 14039 17055 67 13
> 19  1  0
> 10  2  18812 329032  13400 28356972    0   46   301  1768 13848 17884 68 10
> 20  1  0
> 16  2  18812 366596  13416 28361620    0    0   325   535 13957 16649 72 11
> 16  1  0
> 50  1  18812 657048  13432 28366548    0    0   416   937 13823 16667 62  9
> 28  1  0
>  6  1  18812 361040  13452 28371908    0    0   323   522 14352 16789 74 12
> 14  0  0
> 33  0  18812 162760  12604 28210152    0    0   664  1544 14701 16315 66 11
> 22  2  0
>  5  0  18812 212028  10764 27921800    0    0   552   648 14567 17737 67 10
> 21  1  0
>  6  0  18796 279920  10548 27890388    3    0   359   562 12635 15976 60  9
> 30  1  0
>  8  0  18796 438820  10564 27894440    0    0   289  2144 12234 15770 57  8
> 34  1  0
>  5  0  18796 531800  10580 27901700    0    0   514   394 12169 16005 59  8
> 32  1  0
> 17  0  18796 645868  10596 27890704    0    0   423   948 13369 16554 67 10
> 23  1  0
>  9  1  18796 1076540  10612 27898604   0    0   598   403 12703 17363 71 10
> 18  1  0
>  8  0  18796 1666508  10628 27904748   0    0   430  1123 13314 17421 57  9
> 32  1  0
>  9  1  18776 1541444  10644 27913092   1    0   653   954 13194 16822 75 11
> 12  1  0
>  8  0  18776 1526728  10660 27921380   0    0   692   788 13073 16987 74  9
> 15  1  0
>  8  0  18776 1482304  10676 27933176   0    0   966  2029 13017 16651 76 12
> 11  1  0
> 21  0  18776 1683260  10700 27937492   0    0   298   663 13110 15796 67 10
> 23  1  0
> 18  0  18776 2087664  10716 27943512   0    0   406   622 12399 17072 62  9
> 28  1  0
>
>
> With 300 connections, I think that either of these could lead you to
>> experience intermittent bursts of extreme swapping.  I'd drop it to
>> somewhere in the 16MB to 32MB range until I had a connection pool
>> configured such that it was actually keeping the number of active
>> connections much lower.
>>
>> > (*) wal_buffers = 8MB # pgtune wizard 2010-06-25
>> > (64kb , via default)
>>
>> Sure, I'd boost this.
>>
>> > checkpoint_segments = 16 # pgtune wizard 2010-06-25
>> > (30 , specified)
>>
>> If you have the disk space for the 30 segments, I wouldn't reduce
>> it.
>>
>> > shared_buffers = 7680MB # pgtune wizard 2010-06-25
>> > (4096 MB , specified)
>>
>> This one is perhaps the most sensitive to workload.  Anywhere
>> between 1GB and 8GB might be best for you.  Greg Smith has some
>> great advice on how to tune this for your workload.
>>
>> > (*) max_connections = 80 # pgtune wizard 2010-06-25
>> > (300 , ;-) specified)
>> >
>> > when i reduce max_connections i start getting errors, i will see
>> > again concurrent connections during business hours.
>>
>> That's probably a good number to get to, but you have to reduce the
>> number of actual connections before you set the limit that low.
>>
>> > lot of our connections are in <IDLE> in transaction state
>>
>> If any of these stay in that state for more than a minute or two,
>> you need to address that if you want to get your connection count
>> under control.  If any of them persist for hours or days, you need
>> to fix it to avoid bloat which can kill performance.
>>
>> -Kevin
>>
>
>

[Attachment #3 (text/html)]

<span style="font-family: courier new,monospace;">Dear List ,<br><br>A simple (perl) \
script was made to &#39;watch&#39; the state transitions of <br>back ends. On startup \
It captures a set of pids for watching <br>and displays  a visual representation of \
the states for next 30 intervals<br> of 1 seconds each. The X axis is interval cnt, Y \
axis is pid and the <br>origin is on top-left.<br><br>The state value can be Active \
Query (*) , or &lt;IDLE&gt; indicated by &#39;.&#39; or <br>&#39;&lt;IDLE&gt; in \
transaction&#39; indicated by &#39;?&#39; . for my server below is a random <br> \
output (during lean hours and on a lean \
day).<br><br>----------------------------------------------------------------------------------------------------<br> \
PID 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 \
                28 29 30<br>
----------------------------------------------------------------------------------------------------<br> \
4334  ?  ?  ?  ?  *  ?  ?  ?  ?  ?  *  ?  ?  ?  ?  ?  ?  ?  ?<br>  6904  ?  ?  .  .  \
.  *  ?  .  .  .  .  .  .  ?  ?  .<br>  6951  ?  ?  ?  .  .  .  .  ?  ?  ?  ?  ?  .  \
.  .  ?  ?  ?  .  .  .  ?  .  .  .  .  .  ?  ?  .<br>  7009  ?  *  ?  ?  .  .  .  .  \
.  .  .  .  .  *  *  .  *  ?  ?  .  .  .  *  ?  ?  ?  .  .  .  ?<br>  7077  ?  .  ?  \
.  .  .  *  .  ?  .  .  .  .  ?  .  .  .  .  .  .  .  ?  .  .  ?  .  .  .  ?  ?<br>  \
7088  ?  .  .  ?  .  ?  ?  ?  .  .  .  .  .  .  ?  .  .  ?  ?  *  .  .  .  .  .  ?  . \
?  .  *<br>  7091  ?  .  .  *  ?  ?  ?  ?  ?  ?  ?  *  ?  .  .  ?  *  .  *  .  .  .  \
.  .  .  .  .  .  .  .<br>  7093  ?  ?  .  ?  .  .  .  .  ?  .  ?  *  .  .  .  .  .  \
.  .  .  .  ?  ?  ?  .  ?  ?  ?  .  .<br>  7112  *  *  .  .  .  ?  ?  ?  .  .  .  .  \
.  .  .  .  ?  ?  .  ?  .  ?  .  .  ?  .  .  .  .  .<br>  7135  ?  .  .  *  .  ?  ?  \
?  .  ?  ?  .  .  .  ?  .  .  .  .  .  .  .  ?  .  .  .  ?  ?  .  .<br>  7142  ?  .  \
?  .  .  .  .  .  .  *  .  .  .  ?  .  .  .  .  .  .  .  .  .  .  .  .  .  .<br>  \
7166  ?  .  ?  ?  ?  *  *  .  ?  *  .  ?  .  .  .  ?  .  ?  ?  .  .  .  *  .  .  .  ? \
.  .  .<br>  8202  ?  ?  .  .  .  *  .  ?  .  .  .  .  .  .  .  *  ?  .  .  .  ?  ?  \
.  .  .  .  ?  ?  ?  .<br>  8223  ?  .  .  .  .  .  .  ?<br>  8237  ?  ?  ?  .  ?  ?  \
?  ?  .  .  .  .  .  .  .  ?  .  .  .  .  .  ?  .  .  *  ?  .  .  .  .<br>  8251  ?  \
.  ?  .  .  .  .  .  ?  ?  .  .  .  *  ?  .  .  .  ?  .  .  .  .  .  .  .  .  .  .  \
.<br>  8278  ?  ?  .  .  .  .  ?  .  .  .  .  .  .  .  ?  .  .  .  .  .  .  ?  ?  .  \
.  *  .  .  .  .<br>  8290  ?  .  .<br>  8294  ?  ?  .  .  .  .  .  .  .  .  .  .  .  \
.  ?  .  .  .  ?  ?  .  .  .  .  .  .  .  .  *  *<br>  8303  ?  *  ?  .  ?  ?  ?  .  \
?  ?  ?  .  .  .  .  *  .  .  .  .  .  .  .  .  .  .  .  .  .  .<br>  8306  ?  ?  .  \
.  .  ?  .  .  .  ?  .  .  .  .  .  .  *  .  .  .<br>  8309  *  ?  ?  ?  ?  .  .  .  \
?  .  .  .<br>  8329  ?  .  *  *  .  .  .  .  .  .  .  *  .  ?  .  *  .  ?  .  *  .  \
*  ?  .  .  .<br>----------------------------------------------------------------------------------------------------<br>
  (*) Active Query , (.) Idle , (?) Idle in transaction,&lt;blank&gt; backend \
over.<br>----------------------------------------------------------------------------------------------------<br><br>Looks \
like most of the graph space is filled with (.) or (?) and very<br> less active \
queries (long running queries &gt; 1s). on a busy day and busi hour i shall check the \
and post again. The script is presented which depends only on perl , DBI and \
DBD::Pg.<br><br>script pasted here:<br><a \
href="http://pastebin.com/mrjSZfLB">http://pastebin.com/mrjSZfLB</a><br> \
<br>Regds<br>mallah.<br><br></span><br><div class="gmail_quote">On Sat, Jun 26, 2010 \
at 3:23 PM, Rajesh Kumar Mallah <span dir="ltr">&lt;<a \
href="mailto:mallah.rajesh@gmail.com">mallah.rajesh@gmail.com</a>&gt;</span> \
wrote:<br> <blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, \
204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Dear List,<br><br>Today  \
has been good since morning. Although it is a lean day<br>for us but the indications \
are nice. I thank everyone who shared<br> the concern. I think the most significant \
change has been to reduce<br> shared_buffers from 10G to 4G , this has lead to \
reduced memory <br>usage and some breathing space to the OS.<br><br>Although i am yet \
to incorporate the suggestions from pgtune but <br>i think the issue of \
max_connection needs to be addressed first.<br>

<br>I am investigating application issues and about the mechanism that<br>puts many \
backend to &#39;&lt;IDLE&gt; in transaction &#39; mode for significant<br>times. I \
thank Tom for the script he sent. Once that resolves i shall<br>

check pooling as suggested by Kevin, then eventually max_connections <br>can be \
reduced. I shall also check pgpool and pgbouncer if they are<br>helpful in this \
regard.<br><br>I observed that the number of simultaneous connection today (lean \
day)<br>

hovers between 1 to 10 , occasionally shooting to 15 but never more than <br>20 i \
would say.<br><br><br>I am happy that i/o waits are negligible and cpu is idling also \
for a while.<div class="im"><br><br><span style="font-family: courier \
new,monospace;">procs -----------memory---------- ---swap-- -----io---- --system-- \
-----cpu------</span><br style="font-family: courier new,monospace;">

<span style="font-family: courier new,monospace;"> r  b   swpd   free   buff  cache   \
si   so    bi    bo   in   cs   us sy id wa st</span><br style="font-family: courier \
new,monospace;"></div><span style="font-family: courier new,monospace;">22  0  18468 \
954120  13460 28491772    0    0   568  1558 13645 18355 62 10 27  2  0</span><br \
style="font-family: courier new,monospace;">

<span style="font-family: courier new,monospace;">16  0  18468 208100  13476 28469084 \
0    0   580   671 14039 17055 67 13 19  1  0</span><br style="font-family: courier \
new,monospace;"><span style="font-family: courier new,monospace;">10  2  18812 329032 \
13400 28356972    0   46   301  1768 13848 17884 68 10 20  1  0</span><br \
style="font-family: courier new,monospace;">

<span style="font-family: courier new,monospace;">16  2  18812 366596  13416 28361620 \
0    0   325   535 13957 16649 72 11 16  1  0</span><br style="font-family: courier \
new,monospace;"><span style="font-family: courier new,monospace;">50  1  18812 657048 \
13432 28366548    0    0   416   937 13823 16667 62  9 28  1  0</span><br \
style="font-family: courier new,monospace;">

<span style="font-family: courier new,monospace;"> 6  1  18812 361040  13452 28371908 \
0    0   323   522 14352 16789 74 12 14  0  0</span><br style="font-family: courier \
new,monospace;"><span style="font-family: courier new,monospace;">33  0  18812 162760 \
12604 28210152    0    0   664  1544 14701 16315 66 11 22  2  0</span><br \
style="font-family: courier new,monospace;">

<span style="font-family: courier new,monospace;"> 5  0  18812 212028  10764 27921800 \
0    0   552   648 14567 17737 67 10 21  1  0</span><br style="font-family: courier \
new,monospace;"><span style="font-family: courier new,monospace;"> 6  0  18796 279920 \
10548 27890388    3    0   359   562 12635 15976 60  9 30  1  0</span><br \
style="font-family: courier new,monospace;">

<span style="font-family: courier new,monospace;"> 8  0  18796 438820  10564 27894440 \
0    0   289  2144 12234 15770 57  8 34  1  0</span><br style="font-family: courier \
new,monospace;"><span style="font-family: courier new,monospace;"> 5  0  18796 531800 \
10580 27901700    0    0   514   394 12169 16005 59  8 32  1  0</span><br \
style="font-family: courier new,monospace;">

<span style="font-family: courier new,monospace;">17  0  18796 645868  10596 27890704 \
0    0   423   948 13369 16554 67 10 23  1  0</span><br style="font-family: courier \
new,monospace;"><span style="font-family: courier new,monospace;"> 9  1  18796 \
1076540  10612 27898604   0    0   598   403 12703 17363 71 10 18  1  0</span><br \
style="font-family: courier new,monospace;">

<span style="font-family: courier new,monospace;"> 8  0  18796 1666508  10628 \
27904748   0    0   430  1123 13314 17421 57  9 32  1  0</span><br \
style="font-family: courier new,monospace;"><span style="font-family: courier \
new,monospace;"> 9  1  18776 1541444  10644 27913092   1    0   653   954 13194 16822 \
75 11 12  1  0</span><br style="font-family: courier new,monospace;">

<span style="font-family: courier new,monospace;"> 8  0  18776 1526728  10660 \
27921380   0    0   692   788 13073 16987 74  9 15  1  0</span><br \
style="font-family: courier new,monospace;"><span style="font-family: courier \
new,monospace;"> 8  0  18776 1482304  10676 27933176   0    0   966  2029 13017 16651 \
76 12 11  1  0</span><br style="font-family: courier new,monospace;">

<span style="font-family: courier new,monospace;">21  0  18776 1683260  10700 \
27937492   0    0   298   663 13110 15796 67 10 23  1  0</span><br \
style="font-family: courier new,monospace;"><span style="font-family: courier \
new,monospace;">18  0  18776 2087664  10716 27943512   0    0   406   622 12399 17072 \
62  9 28  1  0</span><div> <div></div><div class="h5"><br style="font-family: courier \
new,monospace;"> <br><div class="gmail_quote"><blockquote class="gmail_quote" \
style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; \
padding-left: 1ex;"><div> </div>With 300 connections, I think that either of these \
could lead you to<br> experience intermittent bursts of extreme swapping.  I&#39;d \
drop it to<br> somewhere in the 16MB to 32MB range until I had a connection pool<br>
configured such that it was actually keeping the number of active<br>
connections much lower.<br>
<div><br>
&gt; (*) wal_buffers = 8MB # pgtune wizard 2010-06-25<br>
&gt; (64kb , via default)<br>
<br>
</div>Sure, I&#39;d boost this.<br>
<div><br>
&gt; checkpoint_segments = 16 # pgtune wizard 2010-06-25<br>
&gt; (30 , specified)<br>
<br>
</div>If you have the disk space for the 30 segments, I wouldn&#39;t reduce<br>
it.<br>
<div><br>
&gt; shared_buffers = 7680MB # pgtune wizard 2010-06-25<br>
&gt; (4096 MB , specified)<br>
<br>
</div>This one is perhaps the most sensitive to workload.  Anywhere<br>
between 1GB and 8GB might be best for you.  Greg Smith has some<br>
great advice on how to tune this for your workload.<br>
<div><br>
&gt; (*) max_connections = 80 # pgtune wizard 2010-06-25<br>
&gt; (300 , ;-) specified)<br>
&gt;<br>
&gt; when i reduce max_connections i start getting errors, i will see<br>
&gt; again concurrent connections during business hours.<br>
<br>
</div>That&#39;s probably a good number to get to, but you have to reduce the<br>
number of actual connections before you set the limit that low.<br>
<div><br>
&gt; lot of our connections are in &lt;IDLE&gt; in transaction state<br>
<br>
</div>If any of these stay in that state for more than a minute or two,<br>
you need to address that if you want to get your connection count<br>
under control.  If any of them persist for hours or days, you need<br>
to fix it to avoid bloat which can kill performance.<br>
<font color="#888888"><br>
-Kevin<br>
</font></blockquote></div><br>
</div></div></blockquote></div><br>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic