
List:       drbd-user
Subject:    Re: [DRBD-user] Extremely high latency problem
From:       Bret Mette <bret.mette () dbihosting ! com>
Date:       2014-06-05 16:30:37
Message-ID: CAKpThowC1yjLbWNO6gzg-oxKCL7=XZ9=A5qmp=QUB4z4T1cw5g () mail ! gmail ! com

dd if=/dev/zero of=./testbin  bs=512 count=1000 oflag=direct
512000 bytes (512 kB) copied, 0.153541 s, 3.3 MB/s

This was run against /root/testbin, which is on /dev/md1 with no LVM or DRBD.



dd if=/dev/zero of=./testbin  bs=512 count=1000 oflag=direct
512000 bytes (512 kB) copied, 32.3254 s, 15.8 kB/s

This was run against /mnt/tmp, which is DRBD device /dev/drbd2 backed by an LVM
logical volume that in turn sits on /dev/md127, while /dev/drbd2 was in the
connected state. That works out to roughly 32 ms per 512-byte direct write
through DRBD, versus about 0.15 ms per write on the raw array above.
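
For completeness, a minimal way to check the replication link in isolation,
per the iperf suggestion quoted below (a sketch only; it assumes iperf is
installed on both nodes, and <node2-repl-ip> stands in for node2's
replication address):

iperf -s                              # run on node2: listen for the test
iperf -c <node2-repl-ip> -t 10        # run on node1: 10-second throughput test
ping -c 100 -i 0.2 <node2-repl-ip>    # round-trip latency with small packets

A healthy gigabit link should report somewhere around 900+ Mbit/s and
sub-millisecond round trips; numbers far off that would point at the network
rather than DRBD or the disks.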



On Thu, Jun 5, 2014 at 9:24 AM, Digimer <lists@alteeve.ca> wrote:

> I use iperf for network testing.
>
> Those dd's are run on the machine directly with the HDDs attached, not
> over the network connection? It's also direct to the backing device, not
> through /dev/drbdX? If so, your storage is the problem.
>
>
> On 05/06/14 01:51 AM, Bret Mette wrote:
>
>> Do you have any suggestions on how I can test the network in isolation
>> that would yield results helpful in this scenario?
>>
>> DRBD was not syncing; I got those results even with the secondary
>> disconnected. Testing the storage directly yields the following results:
>>
>> node1
>> dd if=/dev/zero of=./testbin  bs=512 count=1000 oflag=direct
>> 512000 bytes (512 kB) copied, 0.153541 s, 3.3 MB/s
>>
>> node2
>> dd if=/dev/zero of=~/testbin  bs=512 count=1000 oflag=direct
>> 512000 bytes (512 kB) copied, 0.864994 s, 592 kB/s
>> 512000 bytes (512 kB) copied, 0.328994 s, 1.6 MB/s
>>
>>
>> On Wed, Jun 4, 2014 at 10:35 PM, Digimer <lists@alteeve.ca
>> <mailto:lists@alteeve.ca>> wrote:
>>
>>     On 04/06/14 11:31 AM, Bret Mette wrote:
>>
>>         Hello,
>>
>>         I started looking at DRBD as an HA iSCSI target. I am
>>         experiencing very poor performance and decided to run some tests.
>>         My current setup is as follows:
>>
>>         Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz
>>         CentOS 6.5 - 2.6.32-431.17.1.el6.x86_64
>>         drbd version: 8.3.16 (api:88/proto:86-97)
>>         md RAID10 using 7200rpm drives
>>
>>         The two DRBD nodes are synced using an Intel 82579LM Gigabit card
>>
>>         I have created a logical volume using LVM and configured a
>>         couple of DRBD resources on top of that. drbd0 holds my iSCSI
>>         configuration file, which is shared between the two nodes, and
>>         drbd1 is a 1.75TB iSCSI target.
>>
>>         I run Heartbeat on the two nodes and expose a virtual IP to the
>>         iSCSI initiators.
>>
>>         Originally I was running iSCSI with write-cache off (for data
>>         integrity reasons) but have recently switched to write-cache on
>>         during testing (with little to no gain).
>>
>>         My major concern is the extremely high latency test results I
>>         got when running dd against drbd0 mounted on the primary node.
>>
>>         dd if=/dev/zero of=./testbin  bs=512 count=1000 oflag=direct
>>         512000 bytes (512 kB) copied, 32.3254 s, 15.8 kB/s
>>
>>         I have pinged the second node as a very basic network latency
>>         test and get a 0.209 ms response time. I have also run the same
>>         test on both nodes with DRBD disconnected (or on partitions not
>>         associated with DRBD) and get typical results:
>>
>>         node1
>>         dd if=/dev/zero of=./testbin  bs=512 count=1000 oflag=direct
>>         512000 bytes (512 kB) copied, 0.153541 s, 3.3 MB/s
>>
>>         node2
>>         dd if=/dev/zero of=~/testbin  bs=512 count=1000 oflag=direct
>>         512000 bytes (512 kB) copied, 0.864994 s, 592 kB/s
>>         512000 bytes (512 kB) copied, 0.328994 s, 1.6 MB/s
>>
>>         node2's latency (without DRBD connected) is inconsistent but
>>         always falls between those two results.
>>
>>         These tests were run with no iSCSI targets exposed and no
>>         initiators connected, essentially on an idle system.
>>
>>         My question is: why are my DRBD-connected latency tests showing
>>         results 35 to 100 times slower than my results when DRBD is not
>>         connected (or against partitions not backed by DRBD)?
>>
>>         This seems to be the source of my horrible performance on the
>>         iSCSI targets (300-900 kB/s dd writes on the initiators) and very
>>         high iowait (35-75%) on mildly busy initiators.
>>
>>
>>         Any advice, pointers, etc. would be highly appreciated. I have
>>         already tried numerous performance tuning settings (suggested by
>>         the DRBD manual), but I am open to any suggestion and will try
>>         anything again if it might solve my problem.
>>
>>         Here are the important bits of my current drbd.conf
>>
>>                   net {
>>                   cram-hmac-alg sha1;
>>                   shared-secret "password";
>>                   after-sb-0pri disconnect;
>>                   after-sb-1pri disconnect;
>>                   after-sb-2pri disconnect;
>>                   rr-conflict disconnect;
>>                   max-buffers 8000;
>>                   max-epoch-size 8000;
>>                   sndbuf-size 0;
>>                   }
>>
>>                   syncer {
>>                   rate 100M;
>>                   verify-alg sha1;
>>                   al-extents 3389;
>>                   }
>>
>>         I've played with the watermark setting and a few others, and
>>         latency only seems to get worse or stay where it is.
>>
>>
>>         Thank you,
>>         Bret
>>
>>
>>     Have you tried testing the network in isolation? Is the DRBD
>>     resource syncing? With a syncer rate of 100M on a 1 Gbps NIC, that's
>>     just about all your bandwidth consumed by background sync. Can you
>>     test the speed of the storage directly, not over iSCSI/network?
>>
>>     --
>>     Digimer
>>     Papers and Projects: https://alteeve.ca/w/
>>     What if the cure for cancer is trapped in the mind of a person
>>     without access to education?
>>
>>
>>
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
>
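
On the tuning side: beyond the net and syncer options shown in the drbd.conf
quoted above, the DRBD 8.3 documentation also covers the per-write
flush/barrier behaviour of the backing device. A sketch of what that can look
like (the resource name is illustrative, and disabling flushes/barriers is
only considered safe with battery- or flash-backed write cache, which a plain
md RAID10 of 7200rpm drives does not have):

resource r0 {          # illustrative resource name
  disk {
    no-disk-barrier;   # do not issue write barriers to the backing device
    no-disk-flushes;   # do not flush the backing device's write cache
    no-md-flushes;     # likewise for DRBD metadata writes
  }
  net {
    no-tcp-cork;       # send replication packets immediately instead of coalescing
  }
}

These options trade crash safety for per-write latency, so they are only
worth testing once the raw-device and network numbers above check out.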

_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

