List: zfs-discuss
Subject: Re: [zfs-discuss] [storage-discuss] high read iops - more memory
From: PW <beneri3 () yahoo ! com>
Date: 2009-12-28 6:38:50
Message-ID: 98261.34135.qm () web112008 ! mail ! gq1 ! yahoo ! com
Prefetching at both the file and device level has been disabled, yielding good results
so far. We've also lowered the number of concurrent I/Os per device from 35 to 1, which
brought the service times down even further (1 -> 8ms) but inflated actv (.4 -> 2ms).
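
For the archives, here is roughly what we set. On this OpenSolaris build these are
kernel tunables (zfs_prefetch_disable for file-level prefetch, zfs_vdev_cache_size for
the device-level read-ahead cache, and zfs_vdev_max_pending for the per-vdev queue
depth, whose default of 35 is where our "35" came from); treat this as a sketch of our
settings, not a general recommendation:

  * in /etc/system (takes effect on reboot)
  set zfs:zfs_prefetch_disable = 1
  set zfs:zfs_vdev_cache_size = 0
  set zfs:zfs_vdev_max_pending = 1

The prefetch setting can also be flipped on a live system with mdb:

  # echo 'zfs_prefetch_disable/W0t1' | mdb -kw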
I've followed your recommendation and set primarycache to metadata. I'll have to check
with our tester in the morning to see whether it made a difference.
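
For reference, the dataset change itself is just a one-liner ('pool/oracle' below
stands in for our actual dataset name):

  # zfs set primarycache=metadata pool/oracle
  # zfs get primarycache,secondarycache pool/oracle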
I'm trying to understand why we're seeing so many read requests going to disk when the
ARC is set to 8GB and we have a 32GB SSD L2ARC. With that many reads hitting the disks,
they may be contending with the writes.
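
To narrow that down, the raw ARC counters show where the reads fall through. A minimal
check using the standard arcstats kstat, sampled twice so the deltas can be compared:

  # kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses \
        zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses
  # sleep 60; kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses \
        zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses

If l2_misses grows nearly as fast as misses, the L2ARC isn't absorbing the ARC
overflow and the reads end up on the spindles.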
--- On Sat, 12/26/09, Robert Heinzmann (reg) <reg@elconas.de> wrote:
> From: Robert Heinzmann (reg) <reg@elconas.de>
> Subject: Re: [storage-discuss] high read iops - more memory for arc?
> To: "Brad" <beneri3@yahoo.com>
> Cc: storage-discuss@opensolaris.org
> Date: Saturday, December 26, 2009, 6:07 AM
> Hi Brad,
>
> just an idea:
>
> If you run Oracle or another RDBMS, a read cache should not be of much
> use, because a well-sized RDBMS (except a data warehouse) should not do
> many reads. If a well-sized RDBMS does read (ok, we are not talking
> MyISAM here), it requests NEW data, and a good RDBMS queries for this
> data only once - so a read cache is not of much use (except if it is a
> magic one, pre-reading guessed future-request addresses).
>
> So my idea would be to disable the device read-ahead for the Oracle
> zvols / folders by setting "primarycache" to "metadata", thus using the
> precious main memory for metadata only (ZFS does a lot of metadata
> operations, so having this in memory helps). If this does not help,
> setting "secondarycache" to "metadata" may be a good idea also.
>
> Maybe this helps,
> Robert
>
> Brad schrieb:
> > I'm running into an issue where there seems to be a high number of
> > read IOPS hitting disks, and physical free memory is fluctuating
> > between 200MB -> 450MB out of 16GB total. We have the L2ARC
> > configured on a 32GB Intel X25-E SSD and the slog on another 32GB
> > X25-E SSD.
> >
> > According to our tester, Oracle writes are extremely slow (high latency).
> >
> > Below is a snippet of iostat:
> >
> >     r/s    w/s   Mr/s   Mw/s wait  actv wsvc_t asvc_t  %w   %b device
> >     0.0    0.0    0.0    0.0  0.0   0.0    0.0    0.0   0    0 c0
> >     0.0    0.0    0.0    0.0  0.0   0.0    0.0    0.0   0    0 c0t0d0
> >  4898.3   34.2   23.2    1.4  0.1 385.3    0.0   78.1   0 1246 c1
> >     0.0    0.8    0.0    0.0  0.0   0.0    0.0   16.0   0    1 c1t0d0
> >   401.7    0.0    1.9    0.0  0.0  31.5    0.0   78.5   1  100 c1t1d0
> >   421.2    0.0    2.0    0.0  0.0  30.4    0.0   72.3   1   98 c1t2d0
> >   403.9    0.0    1.9    0.0  0.0  32.0    0.0   79.2   1  100 c1t3d0
> >   406.7    0.0    2.0    0.0  0.0  33.0    0.0   81.3   1  100 c1t4d0
> >   414.2    0.0    1.9    0.0  0.0  28.6    0.0   69.1   1   98 c1t5d0
> >   406.3    0.0    1.8    0.0  0.0  32.1    0.0   79.0   1  100 c1t6d0
> >   404.3    0.0    1.9    0.0  0.0  31.9    0.0   78.8   1  100 c1t7d0
> >   404.1    0.0    1.9    0.0  0.0  34.0    0.0   84.1   1  100 c1t8d0
> >   407.1    0.0    1.9    0.0  0.0  31.2    0.0   76.6   1  100 c1t9d0
> >   407.5    0.0    2.0    0.0  0.0  33.2    0.0   81.4   1  100 c1t10d0
> >   402.8    0.0    2.0    0.0  0.0  33.5    0.0   83.2   1  100 c1t11d0
> >   408.9    0.0    2.0    0.0  0.0  32.8    0.0   80.3   1  100 c1t12d0
> >     9.6   10.8    0.1    0.9  0.0   0.4    0.0   20.1   0   17 c1t13d0
> >     0.0   22.7    0.0    0.5  0.0   0.5    0.0   22.8   0   33 c1t14d0
> >
> > Is this an indicator that we need more physical memory? From
> > http://blogs.sun.com/brendan/entry/test, the order in which a read
> > request is satisfied is:
> >
> > 1) ARC
> > 2) vdev cache of L2ARC devices
> > 3) L2ARC devices
> > 4) vdev cache of disks
> > 5) disks
> >
> > Using arc_summary.pl, we determined that prefetch was not helping
> > much, so we disabled it.
> >
> > CACHE HITS BY DATA TYPE:
> >   Demand Data:        22%  158853174
> >   Prefetch Data:      17%  123009991  <--- not helping???
> >   Demand Metadata:    60%  437439104
> >   Prefetch Metadata:   0%    2446824
> >
> > The write IOPS started to kick in more, and latency dropped on the
> > spinning disks:
> >     r/s    w/s   Mr/s   Mw/s wait  actv wsvc_t asvc_t  %w   %b device
> >     0.0    0.0    0.0    0.0  0.0   0.0    0.0    0.0   0    0 c0
> >     0.0    0.0    0.0    0.0  0.0   0.0    0.0    0.0   0    0 c0t0d0
> >  1629.0  968.0   17.4    7.3  0.0  35.9    0.0   13.8   0 1088 c1
> >     0.0    1.9    0.0    0.0  0.0   0.0    0.0    1.7   0    0 c1t0d0
> >   126.7   67.3    1.4    0.2  0.0   2.9    0.0   14.8   0   90 c1t1d0
> >   129.7   76.1    1.4    0.2  0.0   2.8    0.0   13.7   0   90 c1t2d0
> >   128.0   73.9    1.4    0.2  0.0   3.2    0.0   16.0   0   91 c1t3d0
> >   128.3   79.1    1.3    0.2  0.0   3.6    0.0   17.2   0   92 c1t4d0
> >   125.8   69.7    1.3    0.2  0.0   2.9    0.0   14.9   0   89 c1t5d0
> >   128.3   81.9    1.4    0.2  0.0   2.8    0.0   13.1   0   89 c1t6d0
> >   128.1   69.2    1.4    0.2  0.0   3.1    0.0   15.7   0   93 c1t7d0
> >   128.3   80.3    1.4    0.2  0.0   3.1    0.0   14.7   0   91 c1t8d0
> >   129.2   69.3    1.4    0.2  0.0   3.0    0.0   15.2   0   90 c1t9d0
> >   130.1   80.0    1.4    0.2  0.0   2.9    0.0   13.6   0   89 c1t10d0
> >   126.2   72.6    1.3    0.2  0.0   2.8    0.0   14.2   0   89 c1t11d0
> >   129.7   81.0    1.4    0.2  0.0   2.7    0.0   12.9   0   88 c1t12d0
> >    90.4   41.3    1.0    4.0  0.0   0.2    0.0    1.2   0    6 c1t13d0
> >     0.0   24.3    0.0    1.2  0.0   0.0    0.0    0.2   0    0 c1t14d0
> >
> >
> > Is it true that if your MFU stats go over 50%, more memory is needed?
> > CACHE HITS BY CACHE LIST:
> >   Anon:                        10%   74845266              [ New Customer, First Cache Hit ]
> >   Most Recently Used:          19%  140478087  (mru)       [ Return Customer ]
> >   Most Frequently Used:        65%  475719362  (mfu)       [ Frequent Customer ]
> >   Most Recently Used Ghost:     2%   20785604  (mru_ghost) [ Return Customer Evicted, Now Back ]
> >   Most Frequently Used Ghost:   1%    9920089  (mfu_ghost) [ Frequent Customer Evicted, Now Back ]
> > CACHE HITS BY DATA TYPE:
> >   Demand Data:        22%  158852935
> >   Prefetch Data:      17%  123009991
> >   Demand Metadata:    60%  437438658
> >   Prefetch Metadata:   0%    2446824
> >
> > My theory is that since there's not enough memory for the ARC to
> > cache data, reads fall through to the L2ARC, miss there as well, and
> > have to be satisfied from disk. That causes contention between reads
> > and writes, which inflates the service times.
> >
> > Thoughts?
> >
>
>
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss