List:       opensolaris-storage-discuss
Subject:    Re: [storage-discuss] Zfs IO not consistent/pausing every 60 seconds
From:       Giovanni Tirloni <tirloni () gmail ! com>
Date:       2009-12-20 9:34:17
Message-ID: e33bd8cc0912200134o61493383k86e2600b2cc93447 () mail ! gmail ! com

On Fri, Dec 18, 2009 at 7:50 PM, Steffen Plotner <swplotner@amherst.edu> wrote:
> Hello,
>
> When driving reads or writes to ZFS continuously against a commercial
> backend storage array (42 disks behind a hardware RAID controller, exposed
> via a Fibre Channel interface), I see time frames during which no IO takes
> place when it should. Reading the underlying raw disk device does not
> present this problem.
>
> The pauses begin after 3-4 minutes of IO have taken place, and then they
> appear after every 60 seconds. At that time the pauses last for about 3
> seconds and IO resumes. The server itself does nothing else during these
> times. The server is a Dell 1950 with dual quad core CPUs with 32GB of RAM.
> I have performed the same tests on a second piece of identical hardware with
> the same results.
>
> I have included a link to a graph that depicts what I see at the fiber
> channel interface of the server:
> http://www3.amherst.edu/~swplotner/comstar/debug/zfs_pauses.png
>
> Here is one of the pauses shown with zpool iostat:
> vg_satabeast8_vol0  1004G  3.08T  3.09K      0   120M      0
> vg_satabeast8_vol0  1004G  3.08T  3.09K      0   109M      0
> vg_satabeast8_vol0  1004G  3.08T  3.00K      0   105M      0
> vg_satabeast8_vol0  1004G  3.08T  3.02K      0   111M      0
> vg_satabeast8_vol0  1004G  3.08T  3.25K      0   118M      0
> vg_satabeast8_vol0  1004G  3.08T  3.17K      0   111M      0
> vg_satabeast8_vol0  1004G  3.08T  2.34K      0  86.5M      0
> vg_satabeast8_vol0  1004G  3.08T  3.19K      0   115M      0
> vg_satabeast8_vol0  1004G  3.08T  2.24K      0  67.2M      0
> vg_satabeast8_vol0  1004G  3.08T      0      0      0      0    <- these 3 lines are 1 pause in the graph (3 seconds worth)
> vg_satabeast8_vol0  1004G  3.08T      0      0      0      0    <- the graph shows 3 of those pauses.
> vg_satabeast8_vol0  1004G  3.08T    716      0  33.5M      0    <-
> vg_satabeast8_vol0  1004G  3.08T  3.36K      0   112M      0
> vg_satabeast8_vol0  1004G  3.08T  4.14K      0   113M      0
> vg_satabeast8_vol0  1004G  3.08T  3.82K      0   111M      0
> vg_satabeast8_vol0  1004G  3.08T  2.09K      0  72.8M      0
> vg_satabeast8_vol0  1004G  3.08T  3.27K      0   102M      0
> vg_satabeast8_vol0  1004G  3.08T  2.88K      0   102M      0
> vg_satabeast8_vol0  1004G  3.08T  3.14K      0   114M      0
> vg_satabeast8_vol0  1004G  3.08T  2.65K      0  97.6M      0
> vg_satabeast8_vol0  1004G  3.08T  2.93K      0   105M      0
>
> The pauses are more than a concern; they are a problem, since no IO is
> being processed for several seconds each minute. If they could be removed,
> ZFS could be really fast.

I'm not sure why you are seeing those pauses, but we have a similar
problem here, although not as consistent as yours.

Whenever a disk retries reads repeatedly to recover data from a bad
block, ZFS hangs for as long as 3-5 minutes until the disk gives up.
I find that behavior understandable, but I wish it didn't halt all
I/O to the machine.

Perhaps you should look at what's happening on the Fibre Channel
interface that could be making ZFS wait on some operation.
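
To line the stalls up against whatever the FC side logs, it may help to
pull them out of the zpool iostat output mechanically. Below is a minimal
sketch, not a tested tool: it assumes one-second samples with the same
column layout as the output quoted above (pool, used, avail, read ops,
write ops, read bandwidth, write bandwidth). The names parse_value and
find_stalls are made up for illustration, and the 1024-based unit
multipliers are an assumption (only the zero/non-zero test matters here).

```python
import re

# Multipliers for the unit suffixes seen in the quoted output.
# Assumption: binary (1024-based) units; only zero vs non-zero matters below.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_value(text):
    """Convert a field such as '3.09K' or '716' to a float."""
    m = re.fullmatch(r"([0-9.]+)([KMGT]?)", text)
    if m is None:
        raise ValueError(f"unrecognised field: {text!r}")
    return float(m.group(1)) * _UNITS[m.group(2)]


def find_stalls(lines, pool):
    """Return (start_index, length) for each run of samples with zero ops.

    Indexes count samples for `pool` only, so with a 1-second interval
    they are seconds since the first sample.
    """
    stalls = []
    run_start = None
    sample = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 7 or fields[0] != pool:
            continue  # header, separator, or a different pool
        read_ops = parse_value(fields[3])
        write_ops = parse_value(fields[4])
        idle = read_ops == 0 and write_ops == 0
        if idle and run_start is None:
            run_start = sample          # a stall begins
        elif not idle and run_start is not None:
            stalls.append((run_start, sample - run_start))
            run_start = None            # the stall ended
        sample += 1
    if run_start is not None:           # input ended during a stall
        stalls.append((run_start, sample - run_start))
    return stalls
```

Calling find_stalls(sys.stdin, "vg_satabeast8_vol0") on captured output
would give the offset and duration of each pause, which could then be
compared with timestamps from the HBA or array logs. Note that it only
returns once the input ends, so it suits a saved capture rather than a
live pipe.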

Just an idea, I'm not a ZFS expert by any means.

-- 
Giovanni P. Tirloni
