
List:       adsm-l
Subject:    Re: OnTap read block size?
From:       "Rhodes, Richard L." <rrhodes@firstenergycorp.com>
Date:       2016-07-01 14:46:22
Message-ID: DM2PR05MB7654A8EEC70515F3CFD2EF2B6250@DM2PR05MB765.namprd05.prod.outlook.com

I forgot to say this is all Fibre Channel based LUNs on this NetApp head.  The partner head handles CIFS.

The aggregate is less than a third full:  9tb available, 2.6tb used.
It consists of 22 x 600gb HDDs built as 2x(9d+2p).

  Aggregate 'aggrfcp'

    Total space    WAFL reserve    Snap reserve    Usable space       BSR NVLOG           A-SIS          Smtape
  10321542144KB    1032154212KB             0KB    9289387932KB             0KB             0KB             0KB

  <snip - vol info removed>

  Aggregate                       Allocated            Used           Avail
  Total space                  2655868604KB    2616289680KB    6519196976KB
  Snap reserve                          0KB             0KB             0KB
  WAFL reserve                 1032154212KB     114641872KB     917512340KB
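
(As a rough sanity check on those numbers: 2x(9d+2p) leaves 18 data spindles, and the 10321542144KB total space works out to roughly 547GiB per data disk, about what a right-sized "600gb" drive provides. Subtracting the 10% WAFL reserve of 1032154212KB gives the 9289387932KB usable, i.e. the ~9tb available noted above.)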


All volumes (63 of them) hold only LUNs and are THIN.
All volumes have 32 snaps.
All volumes are snapmirrored to a 2nd datacenter.
All volumes are snapvaulted to another local NetApp/nSeries system.

Both LPARs use VIO-based virtual Fibre Channel adapters.
I'm going to test sequential I/O to another vendor's storage system to rule out (or point to) AIX/VIO as the problem.
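
Something like this is what I have in mind for the test, reading the raw hdisk device to bypass jfs2/CIO entirely (the device and file names below are just placeholders):

    # raw sequential read straight off the disk device, no filesystem involved
    dd if=/dev/rhdisk10 of=/dev/null bs=1m count=4096
    # same read back through the filesystem for comparison
    dd if=/somefs/bigfile of=/dev/null bs=1m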






From: Steiner, Jeffrey [mailto:Jeffrey.Steiner@netapp.com]
Sent: Thursday, June 30, 2016 7:05 AM
To: Sebastian Goetze <spgoetze@gmail.com>; Rhodes, Richard L. <rrhodes@firstenergycorp.com>; toasters@teaparty.net
Subject: RE: OnTap read block size?

In theory, if read_realloc was off and the aggregate was close to 100% full you could get this kind of IO pattern. I doubt that's happening, but I can't rule it out.

I did a test with an all-Flash system where I pretty much puréed an aggregate. In a healthy environment, everything should be nicely allocated and a sequential read operation should result in huge read chains, like 64x4K blocks read as a unit. I took an aggregate, filled it up to 100%, and then ran about 72 hours of random overwrites. The end result was an array where nothing was contiguous. All the 8K blocks were distributed randomly across all the disks. The read chains during sequential IO's were just 2. That would destroy performance on a system with spinning disk, but surprisingly it had no impact on my all-Flash system. Not a whit. That's part of why there is no read_realloc on AFF systems at this time. It doesn't do anything useful.

I had to deliberately misconfigure the system to make that happen, though. I wouldn't expect a real-world environment to get into that situation.
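
If you want to check whether layout really is the problem, 7-Mode can measure it directly; roughly (volume name is a placeholder, syntax from memory):

    reallocate measure -o /vol/dbvol
    reallocate status -v

That reports a layout optimization rating for the volume, with lower numbers meaning better-optimized (more contiguous) layout.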

From: Sebastian Goetze [mailto:spgoetze@gmail.com]
Sent: Thursday, June 30, 2016 12:34 PM
To: Steiner, Jeffrey <Jeffrey.Steiner@netapp.com>; Rhodes, Richard L. <rrhodes@firstenergycorp.com>; toasters@teaparty.net
Subject: Re: OnTap read block size?


Hi Rick,



in addition to what Jeff said:

What's going on with the GREADs? Is there a RAID-rebuild in progress?

That column should be 0 under normal circumstances, and having this load in parallel with your DB load completely messes up the performance picture IMHO...
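
A quick way to check (7-Mode, commands from memory):

    sysconfig -r
    aggr status -r aggrfcp

Both show the per-disk RAID status and will flag any reconstruction in progress.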



Oh, and the 'read_realloc' option on a volume with a "random write/sequential read" load often leads to nice performance improvements over time, dynamically optimizing the DB layout on disk and keeping the volume/file 'defragmented'.
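
If you want to try it, it's a plain volume option in 7-Mode, something along these lines (volume name is a placeholder; check the current settings first):

    vol options dbvol
    vol options dbvol read_realloc on

(read_realloc space_optimized is the variant that avoids inflating snapshot space, which matters with 32 snaps plus SnapMirror/SnapVault on every volume.)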





Sebastian

On 6/30/2016 7:33 AM, Steiner, Jeffrey wrote:
NFS behavior depends on the OS. For example, on Linux, if the application tries to do a 1MB read and you have rsize set to 65536, the OS issues 8 parallel 64KB requests, and the ONTAP system will pick up on what's happening and service those read requests.

You are indeed showing 16KB IO requests here. The read chain is about 4, which means 4 times 4K blocks.
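
(Working from the statit quoted further down: the ureads chains of roughly 4.6 to 5.0 on the busy disks put those disk reads at about 18-20KB each, while the greads at a chain of 64.00 are full 256KB operations.)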

Are you certain that you don't just have a database with a 16KB block size and you're doing 16KB random reads? If this was sequential IO, the read chain should be a lot larger. I can't think of a realistic scenario where AIX would break a sequential IO operation into a series of 16KB reads by itself.

Here's a theory: is someone misreading Oracle IO stats? If you see activity that is primarily db_file_sequential_read, then everything is doing exactly what it's supposed to do, because db_file_sequential_read is random IO. Depending on who you ask, it's either a random read of an index sequence or a sequence of random IO operations. Either way, it's random IO, so if you see a database doing db_file_sequential_read and it has a 16KB block size, that would explain this.
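
If the DBAs want to check before the AWR is ready, the cumulative wait events tell the same story; something like this from SQL*Plus (standard v$ view, counts since instance startup):

    select event, total_waits, time_waited
      from v$system_event
     where event like 'db file%' or event like 'direct path read%';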

Sequential IO is performed as either direct_path_read or db_file_scattered_read. Yes, that means random is sequential and sequential is scattered. Everyone confused yet? Specifically, db_file_scattered_read is a large-block sequential IO operation that is loaded into scattered memory buffers.

I can't tell you how many times this has caused confusion for DBAs who are certain their IO pattern is random when it's actually sequential, or who think it's sequential when it's actually random.

Once you have the AWR we'll have a better idea what's happening. It's not just the IO sizes I'd be looking for, it's the associated latencies and some of the configuration files. If there's no explanation there, we'll have to look at the AIX configuration.

From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Rhodes, Richard L.
Sent: Wednesday, June 29, 2016 9:07 PM
To: toasters@teaparty.net
Subject: RE: OnTap read block size?

I've asked a dba to look at your questions/comments.

I'm looking at a blog post: http://recoverymonkey.org/2014/09/18/when-competitors-try-too-hard-and-miss-the-point-part-two/


It discusses how to read a STATIT for sequential I/O size.  I have a statit listing . . .
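
(A statit like this is typically captured in 7-Mode with roughly the following, run while the load is active:

    priv set advanced
    statit -b
    ... wait a few minutes under load ...
    statit -e
)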


disk              ut%  xfers  ureads--chain-usecs  writes--chain-usecs  cpreads-chain-usecs  greads--chain-usecs  gwrites-chain-usecs
/aggrfcp/plex0/rg0:
0b.01.0            54 107.88    0.00   2.11  2211    1.00  34.35   214    1.91  13.48   276  104.97  64.00   188    0.00   ....     .
0b.01.1            55 107.96    0.00   2.11  1684    1.13  30.56   216    1.86  12.90   347  104.97  64.00   192    0.00   ....     .
0b.01.10           56 111.70    4.14   4.76  1852    0.98  29.22   258    1.61   6.35   750  104.97  64.00   195    0.00   ....     .
0b.01.2            56 110.67    4.07   4.72  1814    0.65  43.40   192    0.98   9.70   565  104.97  64.00   200    0.00   ....     .
0b.01.3            56 110.75    4.16   4.72  1856    0.66  43.15   199    0.97  10.01   517  104.97  64.00   201    0.00   ....     .
0b.01.4            57 110.85    4.23   4.71  1751    0.65  42.99   194    1.00   9.96   517  104.97  64.00   206    0.00   ....     .
0b.01.5            57 110.62    4.06   4.97  1770    0.65  43.42   194    0.94  10.15   522  104.97  64.00   210    0.00   ....     .
0b.01.6            57 110.63    4.05   4.82  1764    0.65  43.55   197    0.96   9.83   562  104.97  64.00   210    0.00   ....     .
0b.01.7            57 110.73    4.12   4.61  1853    0.66  43.27   196    0.98   9.13   603  104.97  64.00   217    0.00   ....     .
0b.01.8            57 110.74    4.16   4.72  1844    0.65  43.54   197    0.95   9.18   583  104.97  64.00   218    0.00   ....     .
0b.01.9            57 110.75    4.16   4.76  1819    0.65  43.06   207    0.97   9.13   560  104.97  64.00   223    0.00   ....     .

This looks like it's doing sequential reads in 4k I/O's.
I have multiple of these listings and they are all the same.


rick







From: Steiner, Jeffrey [mailto:Jeffrey.Steiner@netapp.com]
Sent: Wednesday, June 29, 2016 11:33 AM
To: Rhodes, Richard L. <rrhodes@firstenergycorp.com>; toasters@teaparty.net
Subject: RE: OnTap read block size?

Is this NFS or FC?

By default, Oracle does sequential reads in 1MB chunks. If they have a 16k block size on the database, it should be reading in units of 64, not 128. Also, just because Oracle tries to read 1MB chunks doesn't mean the database can do that.

They really shouldn't be using cio as a mount option either. Any remotely current version of Oracle will open the datafiles with concurrent IO so long as they have filesystemio_options=setall, which is also what they should have.
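
While they're at it, the DBAs can confirm the relevant settings from SQL*Plus in a few seconds (standard parameters):

    show parameter filesystemio_options
    show parameter db_file_multiblock_read_count
    show parameter db_block_size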

If you can send me a sample report from 'awrrpt.sql' of no more than one hour elapsed time from a period where they are unhappy with performance, I will take a look at what's going on. I can say with 100% certainty that if they really are doing multiblock reads with 16K units, the problem isn't ONTAP. I suppose it could be a 16K block size on a badly fragmented jfs2 filesystem, but I really doubt it. I think something is being misinterpreted.
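
For reference, the report comes from the standard script in SQL*Plus; pick a begin/end snapshot pair covering the bad period, no more than an hour apart:

    sqlplus / as sysdba
    @?/rdbms/admin/awrrpt.sql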

From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Rhodes, Richard L.
Sent: Wednesday, June 29, 2016 4:36 PM
To: toasters@teaparty.net
Subject: OnTap read block size?

OnTap 8.1.2p1

Our DBAs are complaining that our nSeries (N3220/FAS2240) is reading really slowly because it is only returning small 16k blocks.  The DBAs say the Oracle multi-block read-ahead should be reading 128 x 16k blocks = a 2MB read, but it only seems to be reading/returning 16k at a time.

On an AIX filesystem mounted CIO, if I run
    "dd if=/dev/zero of=z bs=1m count=9999"
I see writes of 500k.

In the same filesystem mounted CIO, if I read an existing db file
  "dd if=<dbfile> of=/dev/null bs=1m"
I see reads of up to 30k.


Q) Is there a limit in OnTap on read size?
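
(The host side can also cap transfer sizes. On AIX the attributes I'd check are along these lines, with the hdisk/fcs numbers as placeholders:

    lsattr -El hdisk10 -a max_transfer
    lsattr -El fcs0 -a max_xfer_size
)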


Thanks

Rick






_______________________________________________

Toasters mailing list

Toasters@teaparty.net<mailto:Toasters@teaparty.net>

http://www.teaparty.net/mailman/listinfo/toasters



-----------------------------------------
The information contained in this message is intended only for the personal and confidential use of the recipient(s) named above. If the reader of this message is not the intended recipient or an agent responsible for delivering it to the intended recipient, you are hereby notified that you have received this document in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify us immediately, and delete the original message.

