
List:       linaro-kernel
Subject:    swap on eMMC and other flash
From:       lporzio () micron ! com (Luca Porzio (lporzio))
Date:       2012-04-27 7:34:09
Message-ID: 26E7A31274623843B0E8CF86148BFE326FB66E94 () NTXAVZMBX04 ! azit ! micron ! com

Stephan,

Good ideas. Some comments of mine below.

> -----Original Message-----
> From: linux-mmc-owner at vger.kernel.org
> [mailto:linux-mmc-owner at vger.kernel.org] On Behalf Of Stephan Uphoff
> Sent: Tuesday, April 17, 2012 3:22 AM
> To: Arnd Bergmann
> Cc: Minchan Kim; linaro-kernel at lists.linaro.org; android-
> kernel at googlegroups.com; linux-mm at kvack.org; Luca Porzio (lporzio); Alex
> Lemberg; linux-kernel at vger.kernel.org; Saugata Das; Venkatraman S; Yejin Moon;
> Hyojin Jeong; linux-mmc at vger.kernel.org
> Subject: Re: swap on eMMC and other flash
> 
> I really like where this is going and would like to use the
> opportunity to plant a few ideas.
> 
> In contrast to rotational disks read/write operation overhead and
> costs are not symmetric.
> While random reads are much faster on flash - the number of write
> operations is limited by wearout and garbage collection overhead.
> To further improve swapping on eMMC or similar flash media I believe
> that the following issues need to be addressed:
> 
> 1) Limit average write bandwidth to eMMC to a configurable level to
> guarantee a minimum device lifetime
> 2) Aim for a low write amplification factor to maximize useable write
> bandwidth
> 3) Strongly favor read over write operations
> 
> Lowering write amplification (2) has been discussed in this email
> thread - and the only observation I would like to add is that
> over-provisioning the internal swap space compared to the exported
> swap space significantly can guarantee a lower write amplification
> factor with the indirection and GC techniques discussed.
> 
> I believe the swap functionality is currently optimized for storage
> media where read and write costs are nearly identical.
> As this is not the case on flash I propose splitting the anonymous
> inactive queue (at least conceptually) - keeping clean anonymous pages
> with swap slots on a separate queue as the cost of swapping them
> out/in is only an inexpensive read operation. A variable similar to
> swappiness (or a more dynamic algorithm) could determine the
> preference for swapping out clean pages or dirty pages. (A similar
> argument could be made for splitting up the file inactive queue.)
> 

I totally agree. Reads are inexpensive on flash-based devices, and a good swap
algorithm (as well as a flash-oriented FS) should take this into account.
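
As a rough illustration of the split-queue idea, here is a small userspace sketch
(not kernel code). The function name, the two-queue model and the clean_preference
knob (0..100, loosely modelled on swappiness) are all made up for this example:

```python
from collections import deque

def pick_victims(clean, dirty, nr_to_reclaim, clean_preference=80):
    """Choose pages to reclaim from two FIFO queues (oldest page on the left).

    clean: anonymous pages that still have a valid copy in their swap slot,
           so dropping them costs no flash write.
    dirty: anonymous pages that must be written to swap before being freed.
    clean_preference: hypothetical 0..100 knob; the share of the batch that
           is taken from the clean queue when enough clean pages exist.
    Returns (pages_to_drop, pages_to_write_out).
    """
    if not 0 <= clean_preference <= 100:
        raise ValueError("clean_preference must be in 0..100")

    # Share of the batch aimed at the clean queue, rounded up.
    want_clean = min(len(clean), (nr_to_reclaim * clean_preference + 99) // 100)
    to_drop = [clean.popleft() for _ in range(want_clean)]

    # The rest comes from the dirty queue and costs one write per page.
    want_dirty = min(len(dirty), nr_to_reclaim - want_clean)
    to_write = [dirty.popleft() for _ in range(want_dirty)]

    # If the dirty queue ran short, fall back to any remaining clean pages.
    shortfall = nr_to_reclaim - want_clean - want_dirty
    to_drop += [clean.popleft() for _ in range(min(len(clean), shortfall))]
    return to_drop, to_write
```

With clean_preference=100 the batch is served from clean pages whenever possible,
and flash writes happen only once the clean queue is exhausted.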

> The problem of limiting the average write bandwidth reminds me of
> enforcing cpu utilization limits on interactive workloads.
> Just as with cpu workloads - using the resources to the limit produces
> poor interactivity.

I don't quite follow your definition of an interactive workload, and I am not sure
which technique for limiting resource utilization you have in mind. CGroups, for
example, have not proven very reliable over time. Also, in my experience it has
always been very difficult to correlate resource utilization statistics with user
interactivity. The only technique that has proven reliable over time is to do the
work while the system is idle, which, to my understanding, is what is already
done.
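
To put a number on what a configurable write bandwidth limit could look like, here
is a back-of-the-envelope calculation. Every figure in it (8 GiB device, 3000
program/erase cycles, WAF of 4, 5-year lifetime) is an assumption chosen for
illustration, not data from a real part, and the model ignores everything except
average wear:

```python
def sustainable_write_rate(capacity_bytes, pe_cycles, waf, lifetime_years):
    """Average host write rate (bytes/s) that uses up the rated endurance
    exactly at the end of the target lifetime.

    capacity_bytes * pe_cycles is the total amount the flash can absorb;
    dividing by the write amplification factor (waf) gives the amount the
    host may write.
    """
    if waf < 1:
        raise ValueError("write amplification factor cannot be below 1")
    if lifetime_years <= 0:
        raise ValueError("lifetime must be positive")
    total_host_bytes = capacity_bytes * pe_cycles / waf
    lifetime_seconds = lifetime_years * 365 * 24 * 3600
    return total_host_bytes / lifetime_seconds

# Assumed example values, see the text above.
rate = sustainable_write_rate(8 * 2**30, 3000, 4.0, 5)
print(f"{rate / 1024:.1f} KiB/s")
```

Under these assumptions the budget comes out at roughly 40 KiB/s on average, and
halving the WAF doubles it, which is why the limit and the write amplification
factor need to be looked at together.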

> When interactivity suffers too much I believe the only sane response
> for an interactive device is to limit usage of the swap device and
> transition into a low memory situation - and if needed - either
> allowing userspace to reduce memory usage or invoking the OOM killer.
> As a result low memory situations could not only be encountered on new
> memory allocations but also on workload changes that increase the
> number of dirty pages.
> 

I agree with your comments about the OOM killer (what is the point of swapping out a
page if its process is going to be killed soon? That only increases the write
amplification factor (WAF) on MMCs). In fact, one proposal here could be to combine
the OOM index with page age. I would suggest first optimizing swap traffic for an
MMC device and only then thinking about this.
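
One possible reading of "combine the OOM index with page age", purely as a sketch:
rank swap-out candidates by age, but discount pages whose owner is likely to be
killed. The scoring formula, the oom_weight knob and the assumed 0..1000 range for
the OOM score are illustrative choices, not an existing kernel interface:

```python
def swap_out_order(pages, oom_weight=0.5):
    """Order swap-out candidates, best candidate first.

    pages: list of (page_id, age, oom_score) tuples, where age grows with
           time since last use and oom_score is assumed to lie in 0..1000.
    oom_weight: hypothetical 0..1 knob; 0 ignores the OOM score entirely.
    """
    def score(entry):
        _, age, oom_score = entry
        kill_risk = min(max(oom_score, 0), 1000) / 1000.0
        # Old pages score high; pages of likely OOM victims are discounted,
        # since writing them to flash may be wasted work.
        return age * (1.0 - oom_weight * kill_risk)

    return [entry[0] for entry in sorted(pages, key=score, reverse=True)]
```

For example, with the default weight a page of age 100 owned by a process with an
OOM score of 900 ranks below a page of age 80 owned by a process with a score of 0.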

> A wild idea to avoid some writes altogether is to see if
> de-duplication techniques can be used to (partially?) match pages
> previously written to swap.

If you have such a situation, I think this is where KSM may help. It is my personal
belief that, with a bit of work, the KSM algorithm can be extended to swapped-out
pages too (at the expense of a small increase in read traffic, which is acceptable
for flash-based storage devices).
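
A toy model of de-duplicating swapped-out pages, to show where the extra reads come
from. Note that it looks pages up by content hash, which is a simplification chosen
for this sketch; KSM itself finds duplicates by comparing page contents in trees.
The class and its fields are hypothetical, and slot freeing and reference counting
are left out:

```python
import hashlib

class SwapDedupIndex:
    """Maps page content to a swap slot that already holds the same bytes."""

    def __init__(self):
        self._slot_by_digest = {}   # content digest -> slot number
        self._data_by_slot = {}     # models the swap device contents
        self.writes = 0             # flash writes actually issued
        self.reads = 0              # extra reads spent on verification

    def store(self, data):
        """Return the slot holding `data`, writing only if no copy exists."""
        digest = hashlib.sha256(data).digest()
        slot = self._slot_by_digest.get(digest)
        if slot is not None:
            # Read the candidate slot back and compare byte for byte, so a
            # hash collision can never alias two different pages.
            self.reads += 1
            if self._data_by_slot[slot] == data:
                return slot
        slot = len(self._data_by_slot)
        self._data_by_slot[slot] = bytes(data)
        self._slot_by_digest[digest] = slot
        self.writes += 1
        return slot
```

Storing the same page content twice returns the same slot and costs one write plus
one verification read, which is the trade that favors flash.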

> In case of unencrypted swap  (or encrypted swap with a static key)
> swap pages on eMMC could even be re-used across multiple reboots.
> A simple version would just compare dirty pages with data in their
> swap slots as I suspect (but really don't know) that some user space
> algorithms (garbage collection?) dirty a page just temporarily -
> eventually reverting it to the previous content.
> 

This conflicts with discarding or trimming a page, so the advantages of this
technique need to be proven against the performance gain of using the discard
command.
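
The conflict can be seen in a minimal model, where a dict stands in for the swap
device. Both function names are invented for this sketch, and the model says
nothing about the performance side of discard, which is the part that would need
measuring:

```python
def write_back(slots, slot, page):
    """Write `page` to `slot` unless the slot already holds identical bytes.

    slots: dict mapping slot number -> bytes, standing in for the device.
    Returns True if a flash write was issued, False if it was skipped.
    """
    if slots.get(slot) == page:   # costs one read of the old slot content
        return False
    slots[slot] = bytes(page)
    return True

def discard(slots, slot):
    """Model a discard/trim: the old content is gone, so a later
    write_back() to this slot can no longer be skipped."""
    slots.pop(slot, None)
```

A page that is dirtied and then reverted to its previous content skips the write
only as long as its slot has not been discarded in between.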

> Stephan

Cheers,
    Luca

