[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linux-netdev
Subject:    Re: [PATCH RFC] mm+net: allow to set kmem_cache create flag for SLAB_NEVER_MERGE
From:       Roman Gushchin <roman.gushchin () linux ! dev>
Date:       2023-05-31 20:08:17
Message-ID: ZHepMQybpBcKLgVg () P9FQF9L96D ! corp ! robot ! car
[Download RAW message or body]

On Wed, May 31, 2023 at 02:03:05PM +0200, Vlastimil Babka wrote:
> On 1/17/23 14:40, Jesper Dangaard Brouer wrote:
> > Allow API users of kmem_cache_create to specify that they don't want
> > any slab merge or aliasing (with similar sized objects). Use this in
> > network stack and kfence_test.
> > 
> > The SKB (sk_buff) kmem_cache slab is critical for network performance.
> > Network stack uses kmem_cache_{alloc,free}_bulk APIs to gain
> > performance by amortising the alloc/free cost.
> > 
> > For the bulk API to perform efficiently the slub fragmentation need to
> > be low. Especially for the SLUB allocator, the efficiency of bulk free
> > API depend on objects belonging to the same slab (page).
> > 
> > When running different network performance microbenchmarks, I started
> > to notice that performance was reduced (slightly) when machines had
> > longer uptimes. I believe the cause was 'skbuff_head_cache' got
> > aliased/merged into the general slub for 256 bytes sized objects (with
> > my kernel config, without CONFIG_HARDENED_USERCOPY).
> > 
> > For SKB kmem_cache network stack have reasons for not merging, but it
> > varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY).
> > We want to explicitly set SLAB_NEVER_MERGE for this kmem_cache.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> 
> Since this idea was revived by David [1], and neither patch worked as is,
> but yours was more complete and first, I have fixed it up as below. The
> skbuff part itself will be best submitted separately afterwards so we don't
> get conflicts between trees etc. Comments?
> 
> ----8<----
> From 485d3f58f3e797306b803102573e7f1367af2ad2 Mon Sep 17 00:00:00 2001
> From: Jesper Dangaard Brouer <brouer@redhat.com>
> Date: Tue, 17 Jan 2023 14:40:00 +0100
> Subject: [PATCH] mm/slab: introduce kmem_cache flag SLAB_NO_MERGE
> 
> Allow API users of kmem_cache_create to specify that they don't want
> any slab merge or aliasing (with similar sized objects). Use this in
> kfence_test.
> 
> The SKB (sk_buff) kmem_cache slab is critical for network performance.
> Network stack uses kmem_cache_{alloc,free}_bulk APIs to gain
> performance by amortising the alloc/free cost.
> 
> For the bulk API to perform efficiently the slub fragmentation need to
> be low. Especially for the SLUB allocator, the efficiency of bulk free
> API depend on objects belonging to the same slab (page).
> 
> When running different network performance microbenchmarks, I started
> to notice that performance was reduced (slightly) when machines had
> longer uptimes. I believe the cause was 'skbuff_head_cache' got
> aliased/merged into the general slub for 256 bytes sized objects (with
> my kernel config, without CONFIG_HARDENED_USERCOPY).
> 
> For SKB kmem_cache network stack have reasons for not merging, but it
> varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY).
> We want to explicitly set SLAB_NO_MERGE for this kmem_cache.

I believe it's also good for the visibility: having a separate entity in
/proc/slabinfo for skbuff_head_cache can be really useful.

> 
> Another use case for the flag has been described by David Sterba [1]:
> 
> > This can be used for more fine grained control over the caches or for
> > debugging builds where separate slabs can verify that no objects leak.
> 
> > The slab_nomerge boot option is too coarse and would need to be
> > enabled on all testing hosts. There are some other ways how to disable
> > merging, e.g. a slab constructor but this disables poisoning besides
> > that it adds additional overhead. Other flags are internal and may
> > have other semantics.
> 
> > A concrete example what motivates the flag. During 'btrfs balance'
> > slab top reported huge increase in caches like
> 
> >  1330095 1330095 100%    0.10K  34105       39    136420K Acpi-ParseExt
> >  1734684 1734684 100%    0.14K  61953       28    247812K pid_namespace
> >  8244036 6873075  83%    0.11K 229001       36    916004K khugepaged_mm_slot
> 
> > which was confusing and that it's because of slab merging was not the
> > first idea.  After rebooting with slab_nomerge all the caches were
> > from btrfs_ namespace as expected.
> 
> [1] https://lore.kernel.org/all/20230524101748.30714-1-dsterba@suse.com/
> 
> [ vbabka@suse.cz: rename to SLAB_NO_MERGE, change the flag value to the
>   one proposed by David so it does not collide with internal SLAB/SLUB
>   flags, write a comment for the flag, expand changelog, drop the skbuff
>   part to be handled spearately ]
> 
> Reported-by: David Sterba <dsterba@suse.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
both for this patch and the corresponding change on the networking side.

Thanks!

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic