List: opensolaris-lvm-discuss
Subject: [lvm-discuss] Re: Comment: SVM default interlace and resync buffer
From: Truong.Q.Nguyen () Sun ! COM (Tony Nguyen)
Date: 2005-10-09 23:09:44
Message-ID: 434A046A.8020108 () sun ! com
D. Rock wrote:
> Tony Nguyen schrieb:
>
>> Daniel
>>
>> Very interesting. We've not seen this so I'd like to get more
>> information about this behavior. How large of an interlace value did
>> you use (512k)? It would be great if we could understand the procedure
>> you used to measure/count the probability of full-line writes onto the
>> metadevice.
>>
>> Note that having the fs on top of the RAID5 metadevice will change
>> the behavior of I/O to the metadevice. For example, an I/O of 1Mb to
>> a UFS filesystem will not necessarily give you one 1Mb write to the
>> underlying metadevice, since UFS has its own tunable I/O size.
>> -tony
>
>
> Well,
>
> it was partly my own fault.
>
> On x86 hardware the default value of maxphys is ridiculously small:
> 56kBytes (maybe historical compatibility reasons?). After setting
> maxphys to a reasonable value (1MByte) normal I/O performance
> increased with the interlace factor.
>
> For UFS filesystems you mostly don't get enough data for full-line
> writes: you need ((n-1) * interlace) blocks of properly aligned data
> to get the benefit of full-line writes. But I have done
> another test: Writing directly to the metadevice with no filesystem in
> between.
>
> Let's begin with:
> # metainit d100 -r c0d0s6 c0d1s6 c1d0s6 c1d1s6 -i 8k
> # dd if=/dev/zero of=/dev/md/rdsk/d100 bs=1024k &
> # iostat -xnz
> 537.8 1074.3 330.8 8863.3 0.0 0.3 0.0 0.2 0 21 c1d0
> 538.4 1074.1 331.1 8861.6 0.0 0.2 0.0 0.1 0 14 c1d1
> 547.8 1091.1 399.3 9001.2 0.0 0.3 0.0 0.2 0 33 c0d0
> 548.0 1090.3 396.4 8994.5 0.0 0.4 0.0 0.2 0 22 c0d1
> 0.0 12.5 0.0 12787.3 0.0 1.0 0.0 78.9 0 99 d100
>
> You can clearly see the -i 8k on the physical devices (8863.3/1074.3)
> and the bs=1024k on the logical device (12787.3/12.5)
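[Editor's note: the per-device write size Daniel reads off here is simply kw/s divided by w/s. A quick Python check, with the numbers copied from the iostat output above:]

```python
# Estimate the average I/O size per write from iostat columns (kw/s / w/s).
def avg_write_kb(kw_per_s, w_per_s):
    return kw_per_s / w_per_s

# Physical device c1d0 with -i 8k: writes land in interlace-sized chunks.
print(round(avg_write_kb(8863.3, 1074.3), 2))  # ~8 KB, matching -i 8k
# Logical device d100: writes arrive in dd's bs=1024k chunks.
print(round(avg_write_kb(12787.3, 12.5), 2))   # ~1023 KB, matching bs=1024k
```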
>
> Now the same with -i 64k
> 95.6 189.9 793.6 12202.8 0.0 0.8 0.0 2.7 0 50 c1d0
> 96.2 189.9 793.9 12202.8 0.0 0.7 0.0 2.3 0 54 c1d1
> 109.5 213.8 1559.1 13737.7 0.0 0.9 0.0 2.8 1 66 c0d0
> 109.7 213.0 1533.9 13686.5 0.0 1.1 0.0 3.3 0 65 c0d1
> 0.0 17.7 0.0 18143.7 0.0 1.0 0.0 55.3 0 98 d100
>
> Ok, much better, as expected.
>
> But if you increase your interlace too much, so a full line is larger
> than bs=1024k (in this case -i 512k)
> 14.6 28.0 7158.5 14323.4 0.0 0.6 0.0 14.4 0 52 c1d0
> 15.0 27.6 7056.5 14118.8 0.0 0.6 0.0 13.9 0 50 c1d1
> 16.2 28.0 7057.1 14323.4 0.0 0.6 0.1 13.2 0 52 c0d0
> 17.0 27.6 7159.7 14118.8 0.0 0.6 0.0 13.7 0 51 c0d1
> 0.0 13.8 0.0 14111.9 0.0 1.0 0.0 71.4 0 98 d100
>
> Better than -i 8k, but worse than -i 64k.
>
> So if you want to use a RAID-5 metadevice as a raw device to a
> database (bad idea BTW) you shouldn't set the interlace too large.
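[Editor's note: Daniel's three runs can be summarized by the full-line arithmetic. A small sketch, assuming the 4-column layout from the metainit above (one column per line holds parity):]

```python
# For an n-column RAID-5, a full-line write needs (n - 1) * interlace
# bytes of properly aligned data, since one column's worth holds parity.
def full_line_kb(n_columns, interlace_kb):
    return (n_columns - 1) * interlace_kb

BS_KB = 1024  # the dd block size used in the tests above

for interlace in (8, 64, 512):
    line = full_line_kb(4, interlace)
    fits = BS_KB >= line  # a 1024k write can only fill a line that fits inside it
    print(f"-i {interlace}k: full line = {line}k, fits in one 1024k write: {fits}")
```

With -i 512k the full line is 1536k, larger than the 1024k write, so no write ever covers a whole line and every line needs a read-modify-write for parity.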
>
>
>
>
> But now to filesystem benchmarks. Before each test I recreated the
> RAID-5 and newfs'd a filesystem on top of it with default parameters.
> Then I extracted usr/src/cmd (tons of small files) from the
> OpenSolaris source distribution. I measured the time of extraction +
> sync + umount (the tarball was put in /tmp):
>
> -i 8k 3:40.18
> -i 32k 2:51.54
> -i 64k 2:46.46
> -i 128k 2:47.73
> -i 256k 2:43.30
>
> I was a little surprised by the results. I thought the small file
> sizes would also favour small interlace factors, but I was wrong.
>
> Maybe I can do some more detailed tests in a few weeks. At the moment
> I don't have spare drives available. The above test setup was not
> optimal:
> - ATA drives shared as master/slave on the same controller
> - write cache enabled
>
>
>
> Regards,
>
> Daniel
Daniel,
Sorry for the late response. I agree with your observations. However, I
would add that better I/O performance is achieved if a RAID-5
metadevice with a 512k interlace size has three components (so the data
per line is 1024k). See my testing below.
d30 -r c1t0d0s3 c1t0d1s3 c1t0d2s3 c1t0d3s3 -k -i 128b (64k interlace)
# dd if=/dev/zero of=/dev/md/rdsk/d30 bs=1024k
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.4 0.0 0.2 0.0 0.0 0.0 0.0 0.2 0 0 c2d0
0.0 16.6 0.0 17034.5 0.0 1.0 0.0 57.3 0 95 d30
100.0 200.2 1450.0 12864.3 0.0 0.9 0.0 3.1 0 78 c1t0d1
100.0 200.3 1450.0 12870.8 0.0 1.1 0.0 3.7 0 74 c1t0d0
89.1 178.1 744.5 11441.4 0.0 1.0 0.0 3.6 0 67 c1t0d2
89.1 178.1 744.5 11441.4 0.0 0.8 0.0 3.0 0 71 c1t0d3
d40 -r c1t0d0s4 c1t0d1s4 c1t0d2s4 -k -i 1024b (512k interlace)
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.4 0.0 0.2 0.0 0.0 0.0 0.0 3.7 0 0 c2d0
0.0 26.1 0.0 26687.8 0.0 0.9 0.0 35.7 0 93 d40
26.1 52.1 13.0 26700.8 0.0 1.4 0.0 18.3 0 99 c1t0d1
26.1 52.1 13.0 26700.8 0.0 1.4 0.0 17.6 0 99 c1t0d0
26.2 52.1 13.1 26700.8 0.0 1.4 0.0 17.6 0 99 c1t0d2
d20 -r c1t0d0s1 c1t0d1s1 c1t0d2s1 c1t0d3s1 c1t0d4s1 -k -i 512b (256k)
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.4 0.0 0.2 0.0 0.0 0.0 0.0 0.1 0 0 c2d0
0.0 27.3 0.0 27999.1 0.0 0.9 0.0 33.9 0 93 d20
27.2 54.6 13.6 13987.6 0.0 0.9 0.0 11.5 0 85 c1t0d1
27.2 54.6 13.6 13987.6 0.0 1.0 0.0 11.9 0 85 c1t0d0
27.2 54.6 13.6 13987.6 0.0 1.0 0.0 11.8 0 85 c1t0d2
27.2 54.6 13.6 13987.6 0.0 0.9 0.0 10.6 0 85 c1t0d3
27.2 54.6 13.6 13987.6 0.0 0.9 0.0 10.6 0 85 c1t0d4
I think we can generalize that we get the best raw I/O performance
when the size of the full line equals the I/O size. Moreover, when
optimizing for raw I/O applications with a fixed I/O size, you should
probably divide the I/O size by the number of data columns to get the
best interlace size. What do you think?
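[Editor's note: that rule of thumb is easy to check against the d40 and d20 runs above. A minimal sketch; `best_interlace_kb` is a hypothetical helper name, not an SVM interface:]

```python
# Rule of thumb: pick the interlace so one application write spans
# exactly one full RAID-5 line. Data columns = total columns - 1,
# since one column's worth of each line holds parity.
def best_interlace_kb(io_size_kb, n_columns):
    data_cols = n_columns - 1
    if io_size_kb % data_cols:
        raise ValueError("I/O size not evenly divisible by data columns")
    return io_size_kb // data_cols

print(best_interlace_kb(1024, 3))  # d40 case: 2 data columns -> 512k interlace
print(best_interlace_kb(1024, 5))  # d20 case: 4 data columns -> 256k interlace
```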
[Filesystem test]
Your finding is also consistent with the testing I've done. Besides
file creation with multiple processes, I've seen comparable or much
improved performance with a larger interlace size for the following
tests: creation and deletion with a single process, populating a large
number of files in a single directory, lock-file creation, directory
walks, filling a filesystem and deleting alternate files, filling a
fragmented filesystem, and read I/O. I'm not sure why you would expect
better I/O for smaller files with a smaller interlace size. Could you
expand on that?
Which Solaris release are you running? Yes, the 56k maxphys value is
there for backward compatibility with old hardware. It's interesting,
since I don't see any I/O performance change when raising maxphys to
1MByte. As you can see from my numbers above, the I/O size to the disks
in all cases is greater than the default maxphys value. Does this mean
SVM uses some other means to determine its md_maxphys value? I don't
actually remember and will need to look more into this :^)
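[Editor's note: for reference, the x86 maxphys default Daniel mentions can be raised persistently via /etc/system. A sketch only; the right value is site-dependent and the change takes effect after a reboot:]

```
* /etc/system fragment: raise the maximum physical I/O size to 1 MByte
set maxphys=0x100000
```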
Regards,
-tony