'Re: SYNCHRONIZE_CACHE from sd_preppare_flush does not have retries.!'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       linux-scsi
Subject:    Re: SYNCHRONIZE_CACHE from sd_preppare_flush does not have retries.!
From:       Ric Wheeler <rwheeler () redhat ! com>
Date:       2010-04-29 20:13:03
Message-ID: 4BD9E84F.8000707 () redhat ! com
[Download RAW message or body]

On 04/19/2010 02:45 PM, James Bottomley wrote:
> On Mon, 2010-04-19 at 20:14 +0200, Bernd Schubert wrote:
>    
>> On Monday 19 April 2010, Mike Christie wrote:
>>      
>>> On 04/19/2010 06:32 AM, Desai, Kashyap wrote:
>>>        
>>>> I am facing one issue with scsi stack.
>>>> Here is a background of my test.
>>>>
>>>> Mount ext3 file system with journaling support with barrier=1, commit=5
>>>> Now, with this setup file system will do submit_bh with  WRITE_BARRIER
>>>> flag set for interval of 5 seconds. (This is a part of journaling.)
>>>> Eventually it will call queue_flush() which will generate SCSI command of
>>>> CDB: SYNCHRONIZE_CAHCE and insert it into the request queue. I observed
>>>> that creation of SYNCHRONIZE_CACHE is a part of sd_prepare_flush(). Here
>>>> we have timeout set to SD_TIMEOUT but retries are not set. Because of
>>>> retries of the request is not set, there is no retries allowed for
>>>> SYNCHRONIZE_CACHE at mid layer.
>>>>
>>>> Because of zero retries for SYNCHRONIZE_CACHE command at mid-layer, it is
>>>> creating trouble for file system. In current situation, Even though LLD
>>>> send back commands with DID_RESET, SYNCHRONIZE_CACHE will fail
>>>> immediately without going for any retries, when HBA is in recovery state.
>>>> Eventually this information goes to File system and it sees
>>>> SYNCHRONIZE_CAHCE is failed and file system goes to Read only mode.
>>>>
>>>> My question is "Can we add in sd_prepare_flush(), rq->retries = X" some
>>>> reasonable retries value ?
>>>>          
>>> I am not sure where we want it, but I think we want to be able to set
>>> both the retries and timeout. I have seen where a sync cache can take
>>> longer than the default 30 secs.
>>>
>>> Do you think we want to the block layer to manage retries/timeouts for
>>> all block device flushes or is this more device specific? I was thinking
>>> that we may want to create a sysfs interface under the block dirs and
>>> have blk-sysfs.c and blk-barrier.c handle this. queue_flush could set
>>> the timeout and retries that is set by some new files under
>>> /sys/block/sdX/queue/ ?
>>>        
>>
>> Good that now also other people run into it. 30s is far too small for any
>> hardware raid unit with SATA disks.
>>      
> It's far too short for just about any HW RAID since they all tend to
> have multi-megabytes to gigabytes of cache (some of the high end have
> terrabytes).  It has to be said that most arrays with battery backed
> caches lie when asked to flush the cache, but we probably need to get
> users into the habit of not using flush barriers with external Arrays.
>
>    

As you say, a lot of arrays don't do anything with these requests. We 
should be using barriers (and sending down the synch commands) when the 
device advertises a write back cache.

For a simple hardware RAID card, it is not sufficient simply to flush 
the write cache on the HBA. It must also either disable the write cache 
on the downstream devices (S-ATA/SAS disks, etc) or send cache flush 
commands to each device.

In those cases, this could be a quite lengthy process...

>> http://markmail.org/message/ewicheafcvgwm4p7
>>
>> I wrote this patch while having trouble with Infortrend Raids, but it also
>> comes up with DDN storage if the write back cache is enabled.
>> Shall I update the patch, add retries and then resend the entire series?
>>      
> rq->timeout is the timeout of the request triggering the flush ... it's
> likely the wrong value since it's for a fast completing r/w operation,
> whereas this is a slow drain operation.
>
> James
>
>    

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic