
List:       linux-bcache
Subject:    Re: [PATCH 15/19] bcache: fix issue of writeback rate at minimum 1 key per second
From:       Coly Li <i () coly ! li>
Date:       2017-10-28 8:58:49
Message-ID: 973911b2-db49-ab6e-a7f6-183c72a7226f () coly ! li

On 2017/10/28 3:09 AM, Eric Wheeler wrote:
> 
> [+cc Michael Lyle]
> 
> On Fri, 27 Oct 2017, Eric Wheeler wrote:
> 
>> On Sun, 16 Jul 2017, Coly Li wrote:
>>
>>> On 2017/7/1 4:43 AM, bcache@lists.ewheeler.net wrote:
>>>> From: Tang Junhui <tang.junhui@zte.com.cn>
>>>>
>>>> When there is not enough dirty data in the writeback cache,
>>>> the writeback rate stays at the minimum of 1 key per second
>>>> until all dirty data is cleaned, which is inefficient
>>>> and also wastes energy;
>>>
>>> Hi Junhui and Eric,
>>>
>>> What: /sys/block/<disk>/bcache/writeback_percent
>>> Description:
>>>       For backing devices: If nonzero, writeback from cache to
>>>       backing device only takes place when more than this percentage
>>>       of the cache is used, allowing more write coalescing to take
>>>       place and reducing total number of writes sent to the backing
>>>       device. Integer between 0 and 40.
>>>
>>> I see the above text in Documentation/ABI/testing/sysfs-block-bcache (I
>>> know this document is quite old). It seems that if "not enough" means
>>> the dirty data percentage is less than writeback_percent, bcache should
>>> not perform writeback I/O. But in __update_writeback_rate(),
>>> writeback_rate.rate is clamped to [1, NSEC_PER_MSEC]. It seems that in
>>> the PD controller code of __update_writeback_rate(), writeback_percent
>>> is only used to calculate the dirty target number; its other function
>>> as a writeback threshold is not handled here.
>>>
>>>>
>>>> In this patch, when there is not enough dirty data,
>>>> the writeback rate is set to 0, and writeback is re-scheduled
>>>> in bch_writeback_thread() periodically with schedule_timeout().
>>>> The behavior is as follows:
>>>>
>>>> 1) If no dirty data has been read into dc->writeback_keys,
>>>> go to step 2); otherwise keep writing this dirty data to the
>>>> back-end device at 1 key per second until all of it has been
>>>> written, then go to step 2).
>>>>
>>>> 2) Loop in bch_writeback_thread() to check whether there is enough
>>>> dirty data for writeback. If there is not enough dirty data for
>>>> writing, sleep 10 seconds; otherwise, write dirty data to the
>>>> back-end device.
>>>
>>> Bcache uses a proportional-derivative (PD) controller to control the
>>> writeback rate. When dirty data is far from the target, the writeback
>>> rate is higher; when dirty data is close to the target, the writeback
>>> rate is lower. The advantage of the PD controller here is that, when
>>> regular I/O and writeback I/O happen at the same time,
>>> - When there is a lot of dirty data, writeback I/O gets more chances
>>> to write it back to the cached device, which in turn has a positive
>>> impact on regular I/O.
>>> - When dirty data decreases and gets close to the target dirty number,
>>> less writeback I/O helps regular I/O achieve better throughput and
>>> latency.
>>>
>>> The root cause of 1 key per second is that the PD controller is designed
>>> for better I/O performance, not lower energy consumption. When the
>>> existing dirty data gets close to the target dirty number, the PD
>>> controller chooses a longer writeback time to give regular I/O better
>>> performance. If it were designed for lower energy consumption, it would
>>> keep the writeback rate at a high level and finish writing back all
>>> dirty data as soon as possible.
>>>
>>> This patch may introduce unexpected behavior in dirty data writeback
>>> throughput when regular write I/O and writeback I/O happen at the same
>>> time. In this case, the amount of dirty data may fluctuate around the
>>> target dirty number, so it is possible that change (the variable in
>>> __update_writeback_rate()) is negative and the result of
>>> dc->writeback_rate.rate + change happens to be 0. This patch changes the
>>> clamp range of writeback_rate.rate to [0, NSEC_PER_MSEC], so
>>> writeback_rate.rate can become 0. And in bch_next_delay(), if
>>> d->rate is zero, the writeback I/O is delayed to now +
>>> NSEC_PER_SEC. When there is no regular I/O this works well, but when
>>> there is regular I/O, this longer delay may cause more dirty data to
>>> pile up in the cache device, and the PD controller cannot generate a
>>> stable writeback rate. This is not expected behavior for the writeback
>>> rate PD controller.
>>>
>>> Another way to fix this might be:
>>> 1) add a sysfs entry that sets writeback_rate to a max or dynamic option.
>>> 2) use dynamic writeback_rate as the default.
>>> 3) when max is set, assign NSEC_PER_MSEC to writeback_rate.rate in
>>> __update_writeback_rate().
>>> 4) in bch_writeback_thread(), if no writeback I/O is in flight and dirty
>>> data does not reach dc->writeback_percent, schedule out.
>>> 5) if writeback is necessary, do it at the max rate and finish it as
>>> soon as possible, to save laptop energy.
>>>
>>> The above method might help save energy as well (by performing dirty
>>> data writeback in batches), and it does not change the default PD
>>> controller behavior.
>>>
>>> Just for your reference. Or if you are too busy to look at it, I can try
>>> to compose a patch for review.
>>
>> Hi Coly,
>>
>> Did this go anywhere?  I think the 1-key/sec fix is a good idea and your 
>> suggestion will help out mobile users.
>>

Hi Eric,

Michael is currently working on writeback improvements. He has proposed some
patches that improve writeback efficiency from a slightly different
angle, and after some quite deep discussion I feel some of his ideas are
promising, e.g. writing back more keys if the backing device is idle.

Currently it seems that better writeback performance results in more lock
contention with front-end I/O. This is why Junhui posted a patch for
real-time counting of buckets in use. It is a start toward reducing lock
contention among writeback, gc, and key insert in the bcache tree.

I just feel this is a continuous series of efforts to improve writeback
efficiency. The 1-key/sec fix might be one of them, so let's
improve-and-test :-)
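
As a rough illustration of the clamp change discussed in the quoted text,
here is a minimal user-space sketch in C. It is not the kernel code:
clamp_rate() and delay_ns() are hypothetical helpers, and the rate and
"done" units are left abstract. The only details taken from the thread are
the clamp ranges [1, NSEC_PER_MSEC] and [0, NSEC_PER_MSEC], and the
now + NSEC_PER_SEC delay when the rate is zero.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000LL
#define NSEC_PER_SEC  1000000000LL

/*
 * Hypothetical helper: apply the controller's change to the current rate
 * and clamp the result into [min_rate, NSEC_PER_MSEC].
 * min_rate == 1 models the current behavior, min_rate == 0 the patch.
 */
static int64_t clamp_rate(int64_t rate, int64_t change, int64_t min_rate)
{
	int64_t next = rate + change;

	if (next < min_rate)
		return min_rate;
	if (next > NSEC_PER_MSEC)
		return NSEC_PER_MSEC;
	return next;
}

/*
 * Hypothetical helper: how long (in ns) to wait after writing `done`
 * units at `rate` units per second. A zero rate falls back to a fixed
 * one-second delay, as described for the patched bch_next_delay().
 */
static int64_t delay_ns(int64_t rate, int64_t done)
{
	assert(rate >= 0 && done >= 0);

	if (rate == 0)
		return NSEC_PER_SEC;
	return done * NSEC_PER_SEC / rate;
}
```

With rate = 5 and change = -5, clamp_rate(5, -5, 1) stays at the floor of 1,
while clamp_rate(5, -5, 0) drops to 0 and delay_ns() then takes the fixed
one-second path instead of dividing by the rate.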

Thanks.

Coly Li


-- 
Coly Li