
List:       linux-kernel
Subject:    Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
From:       Marcelo Tosatti <marcelo.tosatti () cyclades ! com>
Date:       2004-08-23 14:12:06
Message-ID: 20040823141206.GE2157 () logos ! cnet

On Sun, Aug 22, 2004 at 09:18:51PM +0200, Karl Vogel wrote:
> When using elevator=as I'm unable to trigger the swap storm of death, so it seems
> that the CFQ scheduler is to blame here.
> 
> With the AS scheduler, the system recovers in about 10 seconds; vmstat output during
> that time:
> 
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> 1  0      0 295632  40372  49400   87  278   324   303 1424   784  7  2 78 13
> 0  0      0 295632  40372  49400    0    0     0     0 1210   648  3  1 96  0
> 0  0      0 295632  40372  49400    0    0     0     0 1209   652  4  0 96  0
> 2  0      0 112784  40372  49400    0    0     0     0 1204   630 23 34 43  0
> 1  9 156236    788    264   8128   28 156220  3012 156228 3748  3655 11 31  0 59
> 0 15 176656   2196    280   8664    0 20420   556 20436 1108   374  2  5  0 93
> 0 17 205320    724    232   7960   28 28664   396 28664 1118   503  7 12  0 81
> 2 12 217892   1812    252   8556  248 12584   864 12584 1495   318  2  7  0 91
> 4 14 253268   2500    268   8728  188 35392   432 35392 1844   399  3  7  0 90
> 0 13 255692   1188    288   9152  960 2424  1408  2424 1173  2215 10  5  0 85
> 0  7 266140   2288    312   9276  604 10468   752 10468 1248   644  5  5  0 90
> 0  7 190516 340636    348   9860 1400    0  2016     0 1294   817  4  8  0 88
> 1  8 190516 339460    384  10844  552    0  1556     4 1241   642  3  1  0 96
> 1  3 190516 337084    404  11968 1432    0  2576     4 1292   788  3  1  0 96
> 0  6 190516 333892    420  13612 1844    0  3500     0 1343   850  5  2  0 93
> 0  1 190516 333700    424  13848  480    0   720     0 1250   654  3  2  0 95
> 0  1 190516 334468    424  13848  188    0   188     0 1224   589  3  2  0 95
> 
> With CFQ, processes got stuck in 'D' state and never left it. See the URLs in my
> initial post for diagnostics.

I can confirm this on a 512MB box with 512MB swap (2.6.8-rc4). Using CFQ the machine
swaps out 400 megs, with AS it swaps out 30M.

That leads to allocation failures, etc.

CFQ allocates a huge number of bio/biovecs:

 cat /proc/slabinfo | grep bio
biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64            265    265    768    5    1 : tunables   54   27    0 : slabdata     53     53      0
biovec-16            260    260    192   20    1 : tunables  120   60    0 : slabdata     13     13      0
biovec-4             272    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1          121088 122040     16  226    1 : tunables  120   60    0 : slabdata    540    540      0
bio               121131 121573     64   61    1 : tunables  120   60    0 : slabdata   1992   1993      0


biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64            265    265    768    5    1 : tunables   54   27    0 : slabdata     53     53      0
biovec-16            258    260    192   20    1 : tunables  120   60    0 : slabdata     13     13      0
biovec-4             257    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1           66390  68026     16  226    1 : tunables  120   60    0 : slabdata    301    301      0
bio                66389  67222     64   61    1 : tunables  120   60    0 : slabdata   1102   1102      0

(which are freed later on, but they are the cause of the thrashing during the swap I/O).
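Rough arithmetic, as a sketch only: the slab footprint of these caches can be summed
straight from the 2.6 slabinfo layout shown above (num_objs * objsize per cache), e.g.

  awk '/^bio/ { printf "%-12s %8.1f KB\n", $1, $3 * $4 / 1024 }' /proc/slabinfo

For the first CFQ snapshot that works out to roughly 7.4MB of bio (121573 * 64 bytes)
plus 1.9MB of biovec-1 (122040 * 16 bytes), which is modest in itself. The bigger issue
is what those bios stand for: if each of the ~120000 allocated bios is a single-page
swap write in flight, that is on the order of 470MB of queued writeback, which lines up
with the 400 megs swapped out.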

While AS does:

[marcelo@yage marcelo]$ cat /proc/slabinfo | grep bio
biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64            260    260    768    5    1 : tunables   54   27    0 : slabdata     52     52      0
biovec-16            280    280    192   20    1 : tunables  120   60    0 : slabdata     14     14      0
biovec-4             264    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1            4478   5424     16  226    1 : tunables  120   60    0 : slabdata     24     24      0
bio                 4525   5002     64   61    1 : tunables  120   60    0 : slabdata     81     82      0
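The same arithmetic for the AS snapshot gives roughly 313KB of bio (5002 * 64 bytes)
and 85KB of biovec-1 (5424 * 16 bytes), i.e. two orders of magnitude less than under CFQ.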


The odd thing is that the 400M swapped out is not reclaimed after exp (the 512MB
callocator) exits. With AS, almost all swapped-out memory is reclaimed on exit.

 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0 492828  13308    320   3716    0    0     0     0 1002     5  0  0 100  0
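To reproduce, something along these lines should do (a sketch only; "memhog" is just a
stand-in for whatever large calloc() tester you use, exp above being such a program):

  # boot with elevator=cfq (or elevator=as for comparison), then watch:
  vmstat 1 &
  while true; do grep '^bio' /proc/slabinfo; sleep 1; done &

  # in another shell, allocate and touch ~512MB to push the box into swap:
  ./memhog 512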


Jens, is this huge number of bio/biovec allocations expected with CFQ? It's really,
really bad.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

