From linux-kernel Mon Aug 23 14:12:06 2004
From: Marcelo Tosatti
Date: Mon, 23 Aug 2004 14:12:06 +0000
To: linux-kernel
Subject: Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
Message-Id: <20040823141206.GE2157@logos.cnet>
X-MARC-Message: https://marc.info/?l=linux-kernel&m=109327597003708

On Sun, Aug 22, 2004 at 09:18:51PM +0200, Karl Vogel wrote:
> When using elevator=as I'm unable to trigger the swap storm of death, so it
> seems that the CFQ scheduler is to blame here.
>
> With the AS scheduler the system recovers in +-10 seconds; vmstat output
> during that time:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache     si     so    bi     bo   in    cs us sy id wa
>  1  0      0 295632  40372  49400     87    278   324    303 1424   784  7  2 78 13
>  0  0      0 295632  40372  49400      0      0     0      0 1210   648  3  1 96  0
>  0  0      0 295632  40372  49400      0      0     0      0 1209   652  4  0 96  0
>  2  0      0 112784  40372  49400      0      0     0      0 1204   630 23 34 43  0
>  1  9 156236    788    264   8128     28 156220  3012 156228 3748  3655 11 31  0 59
>  0 15 176656   2196    280   8664      0  20420   556  20436 1108   374  2  5  0 93
>  0 17 205320    724    232   7960     28  28664   396  28664 1118   503  7 12  0 81
>  2 12 217892   1812    252   8556    248  12584   864  12584 1495   318  2  7  0 91
>  4 14 253268   2500    268   8728    188  35392   432  35392 1844   399  3  7  0 90
>  0 13 255692   1188    288   9152    960   2424  1408   2424 1173  2215 10  5  0 85
>  0  7 266140   2288    312   9276    604  10468   752  10468 1248   644  5  5  0 90
>  0  7 190516 340636    348   9860   1400      0  2016      0 1294   817  4  8  0 88
>  1  8 190516 339460    384  10844    552      0  1556      4 1241   642  3  1  0 96
>  1  3 190516 337084    404  11968   1432      0  2576      4 1292   788  3  1  0 96
>  0  6 190516 333892    420  13612   1844      0  3500      0 1343   850  5  2  0 93
>  0  1 190516 333700    424  13848    480      0   720      0 1250   654  3  2  0 95
>  0  1 190516 334468    424  13848    188      0   188      0 1224   589  3  2  0 95
>
> With CFQ, processes got stuck in 'D' state and never left it. See the URLs
> in my initial post for diagnostics.

I can confirm this on a 512MB box with 512MB of swap (2.6.8-rc4).

Using CFQ the machine swaps out about 400MB; with AS it swaps out about 30MB.
That leads to allocation failures, etc.

CFQ allocates a huge number of bio/biovecs:

cat /proc/slabinfo | grep bio
biovec-(256)     256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128       256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64        265    265    768    5    1 : tunables   54   27    0 : slabdata     53     53      0
biovec-16        260    260    192   20    1 : tunables  120   60    0 : slabdata     13     13      0
biovec-4         272    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1      121088 122040     16  226    1 : tunables  120   60    0 : slabdata    540    540      0
bio           121131 121573     64   61    1 : tunables  120   60    0 : slabdata   1992   1993      0

A second sample:

biovec-(256)     256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128       256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64        265    265    768    5    1 : tunables   54   27    0 : slabdata     53     53      0
biovec-16        258    260    192   20    1 : tunables  120   60    0 : slabdata     13     13      0
biovec-4         257    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1       66390  68026     16  226    1 : tunables  120   60    0 : slabdata    301    301      0
bio            66389  67222     64   61    1 : tunables  120   60    0 : slabdata   1102   1102      0

(These are freed later on, but they are the cause of the thrashing during the
swap I/O. A quick way to put numbers on these dumps is sketched below.)
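For anyone who wants to watch this, here is a rough sketch of a helper that
sums up the bio caches. It is only a sketch: it assumes the slabinfo v2 column
layout shown above (name, active_objs, num_objs, objsize, ...) and, for the
in-flight estimate, the rough rule that each bio in the swapout path carries
about one page.

/*
 * bioslab.c - quick diagnostic sketch: sum up how much memory the
 * bio/biovec slab caches hold and how many bios are active, by parsing
 * /proc/slabinfo (slabinfo v2 layout as in the dumps above).
 *
 * Build: gcc -Wall -o bioslab bioslab.c
 */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
	FILE *f = fopen("/proc/slabinfo", "r");
	char line[512], name[64];
	unsigned long active, num, objsize;
	unsigned long total_bytes = 0, bios = 0;

	if (!f) {
		perror("/proc/slabinfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		/* skip the "# name ..." header, keep only bio/biovec caches */
		if (line[0] == '#' || !strstr(line, "bio"))
			continue;
		if (sscanf(line, "%63s %lu %lu %lu",
			   name, &active, &num, &objsize) != 4)
			continue;
		/* num_objs * objsize ~= memory pinned by this cache */
		total_bytes += num * objsize;
		if (!strcmp(name, "bio"))
			bios = active;
		printf("%-12s %8lu objs x %4lu bytes = %8lu KB\n",
		       name, num, objsize, num * objsize / 1024);
	}
	fclose(f);
	printf("bio/biovec caches: ~%lu KB total, %lu bios active\n",
	       total_bytes / 1024, bios);
	/*
	 * Assumption: during swapout each bio carries roughly one page,
	 * so 'bios' is also a rough count of pages under writeback.
	 */
	printf("=> roughly %lu MB of swap I/O in flight\n",
	       bios * 4096 / (1024 * 1024));
	return 0;
}

Back of the envelope, the CFQ numbers above work out to ~7.4MB of bio
structures alone and, under the one-page-per-bio assumption, on the order of
470MB of swap writeback queued at once; the AS numbers below come to roughly
280KB and 18MB.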
AS, by comparison, shows:

[marcelo@yage marcelo]$ cat /proc/slabinfo | grep bio
biovec-(256)     256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128       256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64        260    260    768    5    1 : tunables   54   27    0 : slabdata     52     52      0
biovec-16        280    280    192   20    1 : tunables  120   60    0 : slabdata     14     14      0
biovec-4         264    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1        4478   5424     16  226    1 : tunables  120   60    0 : slabdata     24     24      0
bio             4525   5002     64   61    1 : tunables  120   60    0 : slabdata     81     82      0

The odd thing is that the ~400MB swapped out is not reclaimed after exp (the
512MB calloc() allocator; a minimal sketch of it is at the end of this mail)
exits. With AS, almost all swapped-out memory is reclaimed on exit.

 r  b   swpd   free   buff  cache   si   so   bi   bo   in   cs us sy  id wa
 0  0 492828  13308    320   3716    0    0    0    0 1002    5  0  0 100  0

Jens, is this huge amount of bio/biovec allocations expected with CFQ? It's
really, really bad.
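In case it helps to reproduce this independently, here is a minimal sketch of
the kind of allocator test involved. I am assuming that this is essentially
what exp amounts to (calloc 512MB and dirty every page before exiting); the
file name, the size argument and the sleep are just for illustration.

/*
 * swaphog.c - minimal sketch of a 512MB calloc-and-touch test.
 * On a 512MB box, dirtying the whole allocation forces most other memory
 * out to swap, which is the load that triggers the behaviour above.
 *
 * Build: gcc -Wall -o swaphog swaphog.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MB (1024UL * 1024UL)

int main(int argc, char **argv)
{
	size_t size = (argc > 1 ? strtoul(argv[1], NULL, 0) : 512) * MB;
	char *buf = calloc(1, size);
	size_t i;

	if (!buf) {
		fprintf(stderr, "calloc of %zu MB failed\n", size / MB);
		return 1;
	}
	/* Dirty every page (assuming 4K pages) so the allocation really
	 * consumes anonymous memory instead of staying lazily zero-filled. */
	for (i = 0; i < size; i += 4096)
		buf[i] = 1;
	printf("touched %zu MB, sleeping 10s before exit...\n", size / MB);
	sleep(10);	/* time to grab vmstat/slabinfo samples */
	free(buf);
	return 0;
}

Running it while watching vmstat 1 and the bio caches in /proc/slabinfo, once
booted with elevator=cfq and once with elevator=as, should show the difference
described above.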