'Re: [PATCH V4 3/3] scsi: core: avoid to pre-allocate big chunk for sg list'


List:       linux-scsi
Subject:    Re: [PATCH V4 3/3] scsi: core: avoid to pre-allocate big chunk for sg list
From:       Bart Van Assche <bvanassche () acm ! org>
Date:       2019-04-29 18:16:38
Message-ID: 1556561798.161891.166.camel () acm ! org

On Sun, 2019-04-28 at 15:39 +0800, Ming Lei wrote:
> Now scsi_mq_setup_tags() pre-allocates a big buffer for IO sg list,
> and the buffer size is scsi_mq_sgl_size() which depends on smaller
> value between shost->sg_tablesize and SG_CHUNK_SIZE.
>
> Modern HBA's DMA is often capable of dealing with a very big segment
> number, so scsi_mq_sgl_size() is often big. Suppose the max sg number
> of SG_CHUNK_SIZE is taken, scsi_mq_sgl_size() will be 4KB.
>
> Then if one HBA has lots of queues, and each hw queue's depth is
> high, pre-allocation for sg list can consume huge memory.
> For example of lpfc, nr_hw_queues can be 70, each queue's depth
> can be 3781, so the pre-allocation for data sg list is 70*3781*2k
> =517MB for single HBA.
>
> There is a Red Hat internal report that scsi_debug based tests can't
> be run any more since the legacy io path was killed, because the
> pre-allocation is too big.
>
> So switch to runtime allocation for sg list, meantime pre-allocate 2
> inline sg entries. This way has been applied to NVMe PCI for a while,
> so it should be fine for SCSI too. Also runtime sg entries allocation
> has been verified and was always used in the original legacy io path.
>
> No performance effect seen in my big BS test on scsi_debug.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>
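
For readers checking the lpfc figure quoted above, the arithmetic can be sketched in a few lines of C. This is only an illustration: sgl_prealloc_bytes() is a hypothetical helper, not a kernel function, and "2k" is taken to mean 2048 bytes of sg list per command, which is an assumption about the quoted number.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the kernel): total bytes pre-allocated
 * for data sg lists when every command on every hw queue gets its own
 * fixed-size sg list buffer.
 */
static uint64_t sgl_prealloc_bytes(uint64_t nr_hw_queues,
                                   uint64_t queue_depth,
                                   uint64_t sgl_bytes_per_cmd)
{
    return nr_hw_queues * queue_depth * sgl_bytes_per_cmd;
}

/* Convert bytes to whole MiB, rounding down. */
static uint64_t bytes_to_mib(uint64_t bytes)
{
    return bytes / (1024ULL * 1024ULL);
}
```

With the quoted values, sgl_prealloc_bytes(70, 3781, 2048) gives 542044160 bytes, just under 517 MiB, which matches the figure in the commit message after rounding.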

