List:       drbd-user
Subject:    Re: [DRBD-user] Extremely high latency problem
From:       Bret Mette <bret.mette@dbihosting.com>
Date:       2014-06-05 5:51:40
Message-ID: CAKpThoyMqo=okJrZ9NBG_Nf7EErsQfTRA2ExfJc8V0fGca-KPA@mail.gmail.com

Do you have any suggestions on how I can test the network in isolation in a
way that would yield helpful results in this scenario?
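
For reference, a minimal sketch of such an isolated link test, assuming
iperf is installed on both nodes (<node2-ip> is a placeholder):

# on node2, start an iperf listener
iperf -s

# on node1, measure sustained TCP throughput to node2 for 30 seconds
iperf -c <node2-ip> -t 30

# on node1, sample small-packet round-trip latency with 100 probes
ping -c 100 <node2-ip>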

DRBD was not syncing; I got those results even with the secondary
disconnected. Testing the storage directly yields the following results:

node1
dd if=/dev/zero of=./testbin  bs=512 count=1000 oflag=direct
512000 bytes (512 kB) copied, 0.153541 s, 3.3 MB/s

node2
dd if=/dev/zero of=~/testbin  bs=512 count=1000 oflag=direct
512000 bytes (512 kB) copied, 0.864994 s, 592 kB/s
512000 bytes (512 kB) copied, 0.328994 s, 1.6 MB/s
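
For a per-request latency view of the same backing storage, a tool such as
ioping reports the time each individual request takes. A sketch, assuming
ioping is installed (it is not part of a base CentOS 6 install):

# issue 10 I/O latency probes against the current directory's filesystem
ioping -c 10 .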


On Wed, Jun 4, 2014 at 10:35 PM, Digimer <lists@alteeve.ca> wrote:

> On 04/06/14 11:31 AM, Bret Mette wrote:
>
>> Hello,
>>
>> I started looking at DRBD as an HA iSCSI target. I am experiencing very
>> poor performance and decided to run some tests. My current setup is as
>> follows:
>>
>> Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz
>> CentOS 6.5 - 2.6.32-431.17.1.el6.x86_64
>> drbd version: 8.3.16 (api:88/proto:86-97)
>> md RAID10 using 7200rpm drives
>>
>> The two DRBD nodes are synced using an Intel 82579LM Gigabit card.
>>
>> I have created a logical volume using LVM and configured a couple of DRBD
>> resources on top of it. drbd0 holds my iSCSI configuration, which is
>> shared between the two nodes, and drbd1 is a 1.75 TB iSCSI target.
>>
>> I run heartbeat on the two nodes and expose a virtual IP to the iSCSI
>> initiators.
>>
>> Originally I was running iSCSI with write-cache off (for data integrity
>> reasons) but have recently switched to write-cache on during testing
>> (with little to no gain).
>>
>> My major concern is the extremely high latency I see when running dd
>> against drbd0 mounted on the primary node.
>>
>> dd if=/dev/zero of=./testbin  bs=512 count=1000 oflag=direct
>> 512000 bytes (512 kB) copied, 32.3254 s, 15.8 kB/s
>>
>> I have pinged the second node as a very basic network latency test and
>> get a 0.209 ms response time. I have also run the same test on both nodes
>> with DRBD disconnected (or on partitions not associated with DRBD) and
>> get typical results:
>>
>> node1
>> dd if=/dev/zero of=./testbin  bs=512 count=1000 oflag=direct
>> 512000 bytes (512 kB) copied, 0.153541 s, 3.3 MB/s
>>
>> node2
>> dd if=/dev/zero of=~/testbin  bs=512 count=1000 oflag=direct
>> 512000 bytes (512 kB) copied, 0.864994 s, 592 kB/s
>> 512000 bytes (512 kB) copied, 0.328994 s, 1.6 MB/s
>>
>> node2's latency (without DRBD connected) is inconsistent but always
>> falls between those two values.
>>
>> These tests were run with no iSCSI targets exposed and no initiators
>> connected, essentially on an idle system.
>>
>> My question is: why are my DRBD-connected latency tests showing results
>> 35 to 100 times slower than my results when DRBD is not connected (or
>> against partitions not backed by DRBD)?
>>
>> This seems to be the source of my horrible performance on the iSCSI
>> targets (300-900 kB/s dd writes on the initiators) and very high iowait
>> (35-75%) on mildly busy initiators.
>>
>>
>> Any advice, pointers, etc. would be highly appreciated. I have already
>> tried numerous performance tuning settings (suggested by the DRBD
>> manual), but I am open to any suggestion and will try anything again if
>> it might solve my problem.
>>
>> Here are the important bits of my current drbd.conf:
>>
>>          net {
>>                  cram-hmac-alg sha1;
>>                  shared-secret "password";
>>                  after-sb-0pri disconnect;
>>                  after-sb-1pri disconnect;
>>                  after-sb-2pri disconnect;
>>                  rr-conflict disconnect;
>>                  max-buffers 8000;
>>                  max-epoch-size 8000;
>>                  sndbuf-size 0;
>>          }
>>
>>          syncer {
>>                  rate 100M;
>>                  verify-alg sha1;
>>                  al-extents 3389;
>>          }
>>
>> I've played with the watermark setting and a few others, and latency only
>> seems to get worse or stay where it is.
>>
>>
>> Thank you,
>> Bret
>>
>
> Have you tried testing the network in isolation? Is the DRBD resource
> syncing? With a syncer rate of 100M on a 1 Gbps NIC, that's just about all
> your bandwidth consumed by background sync. Can you test the speed of the
> storage directly, not over iSCSI/network?
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
>
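
As a sanity check on the numbers in this thread: 1000 direct 512-byte
writes in 32.3 s is roughly 32 ms per write through DRBD, versus roughly
0.15-0.9 ms per write on the raw storage and a 0.209 ms network round trip,
so the slowdown is far larger than the round trip alone can explain.
Whether a background resync is consuming the link can be confirmed from
/proc/drbd, and DRBD 8.3 allows the resync rate to be capped temporarily
from the command line. A minimal sketch; the resource name r0 is an
assumption, not taken from the thread:

# show connection state (cs:), roles (ro:), and any resync progress
cat /proc/drbd

# temporarily cap the resync rate for /dev/drbd0 to 10 MB/s
drbdsetup /dev/drbd0 syncer -r 10M

# revert to the rate configured in drbd.conf
drbdadm adjust r0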

_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

