List:       drbd-user
Subject:    Re: [DRBD-user] Adjusting al-extents on-the-fly
From:       mahadevsb mahadevsb <mahadevsb () gmail ! com>
Date:       2014-05-31 12:54:40
Message-ID: 5389ce42.4c10430a.2480.ffff9324 () mx ! google ! com

-----Original Message-----
From: Lars Ellenberg
Sent: 5/30/2014 6:43 AM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Adjusting al-extents on-the-fly

On Wed, May 28, 2014 at 01:23:55PM +1000, Stuart Longland wrote:
> Hi Lars,
> On 27/05/14 20:31, Lars Ellenberg wrote:
> >> The system logs PLC-generated process data every 5 seconds, and at two
> >> times of the day, at midnight and midday, it misses a sample with the
> >> logging taking 6 seconds.  There's no obvious CPU spike at this time, so
> >> my hunch is I/O, and so I'm looking at ways to try and improve this.
> > 
> > Funny how if "something" happens,
> > and there is DRBD anywhere near it,
> > it is "obviously" DRBD's fault, naturally.
> 
> No, it's not "obviously" DRBD's fault.  It is a factor, as is the CPU.
> Rather, it's the network and/or disk, on both of which DRBD relies, and (to
> a lesser extent) CPU time.
> 
> I'm faced with a number of symptoms, so it is right that I consider *all*
> factors, including DRBD and the I/O subsystems that underpin it.

Okok ...

> >> iotop didn't show any huge spikes that I'd imagine the disks would have
> >> trouble with.  Then again, since it's effectively polling, I could have
> >> "blinked" and missed it.
> > 
> > If your data gathering and logging thingy misses a sample
> > because of the logging to disk (assuming for now that this is in fact
> > what happens), you are still doing it wrong.
> > 
> > Make the data sampling asynchronous wrt. flushing data to disk.
> 
> Sadly how it does the logging is outside my control.  The SCADA package
> is one called MacroView, and is made available for a number of platforms
> under a proprietary license.  I do not have the source code, however it
> has been used successfully on quite a large number of systems.
> 
> The product has been around since the late 80's on numerous Unix
> variants.  Its methods may not be "optimal", but they seem to work well
> enough in a large number of cases.
> 
> The MacroView Historian basically reads its data from shared memory
> segments exported by PLC drivers, computes whatever summary data is
> needed then writes this out to disk.  So the process is both I/O and
> possibly CPU intensive.
> 
> I can't do much about the CPU other than fiddling with `nice` without a
> hardware upgrade (which may yet happen; time will tell).
> 
> I don't see the load-average sky rocketing which is why I suspected I/O:
> either disk writes that are being bottle-necked by the gigabit network
> link, or perhaps the disk controller.
> 
> The DRBD installation there was basically configured and gotten to a
> working state; there was a little monkey-see-monkey-do learning in the
> beginning, so it's possible that performance can be enhanced with a
> little tweaking.
> 
> The literature suggests a number of parameters are dependent on the
> hardware used, and this is what I'm looking into.
> 
> This is one possibility I am investigating: being mindful that this is a
> live production cluster that I'm working on.  Thus I have to be careful
> what I adjust, and how I adjust it.

Sure.

Well, IO subsystems may have occasional latency spikes.
DRBD may trigger, be responsible for, or even cause
additional latency spikes.

IF your SCADA would "catch one sample, then synchronously log it",
particularly high latency spikes might cause it to miss the next sample.

I find that highly unlikely:
both that sampling and logging would be so tightly coupled,
and that the latency spike would last that long (if nothing else is
going on, and the system is not completely overloaded;
on really loaded systems, with arbitrary queue lengths and buffer bloat,
I can easily make latency spike for minutes).

As this is "pro" stuff, I think it is safe to assume
that gathering data and logging that data are not so tightly coupled.
Which leads me to believe that its missing a sample
has nothing to do with persisting the previous sample(s) to disk.

Especially if it happens so regularly, twice a day, at noon and midnight.
What is so "special" about those times?
flushing logs?  log rotation?

You wrote "with the logging taking 6 seconds".
What exactly does that mean?
"the logging"?
"taking 6 seconds"?
What exactly takes six seconds?
How do you know?

Are some clocks slightly off
and get adjusted twice a day?
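
A quick way to test the scheduled-job and clock theories (the file
locations below are the usual Debian/RHEL ones, adjust as needed):

  # Any cron job firing at hour 0 or 12?  (second field of a crontab line)
  grep -rE '^[^#][^[:space:]]*[[:space:]]+(0|12)[[:space:]]' \
      /etc/crontab /etc/cron.d/ 2>/dev/null
  crontab -l          # repeat for the user the historian runs as
  # Rough check for clock steps around those times (ntpd logs them)
  grep -iE 'ntp.*(step|reset)' /var/log/syslog /var/log/messages 2>/dev/null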

> >> DRBD is configured with a disk partition on a RAID array as its backing
> > 
> > Wrong end of the system to tune in this case, imo.
> 
> Well, hardware configuration and BIOS settings are out of my reach as
> I'm in Brisbane and the servers in question are somewhere in Central
> Queensland some 1000km away.
> 
> > This (adjusting of the "al-extents" only) is a rather boring command
> > actually.  It may stall IO on a very busy backend a bit,
> > changes some internal "caching hash table size" (sort of),
> > and continues.
> 
> Does the change of the internal 'caching hash table size' do anything
> destructive to the DRBD volume?

No.  Really.
Why would we do something destructive to your data
because you change some synchronisation parameter?
And I even just wrote it was "boring, at most briefly stalls then
continues IO".  I did not write
it-will-reformat-and-panic-the-box-be-careful-dont-use.

But unless your typical working set size is much larger than what
the current setting covered, this is unlikely to help.
(257 al-extents correspond to ~ 1GByte working set)
If it is not about the size, but the change rate, of your working set,
you will need to upgrade to drbd 8.4.
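
For scale: each activity-log extent covers 4 MiB of the backing device,
so 257 extents is about 1 GiB of "hot" data.  A minimal sketch of an
on-the-fly change on 8.3 (the resource name "r0" is only a placeholder):

  # /etc/drbd.conf -- in 8.3, al-extents belongs in the syncer section:
  #   syncer { al-extents 1801; }   # 1801 * 4 MiB ~= 7 GiB working set
  # Apply the new value; IO may stall briefly, data is untouched:
  drbdadm adjust r0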

> http://www.drbd.org/users-guide-8.3/re-drbdsetup.html mentions that
> --create-device "In case the specified DRBD device (minor number) does
> not exist yet, create it implicitly."
> 
> Unfortunately to me "device" is ambiguous: is this the block device file
> in /dev, or the actual logical DRBD device (i.e. the partition)?

So what. "In case .* does not exist yet".
Well, it does exist.
So that's a no-op, right?

Anyways.  That flag is passed from drbdadm to drbdsetup *always*
(in your drbd version).
And it does no harm. Not even to your data.
It's an internal convenience flag.
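
If you want to see exactly what drbdadm hands to drbdsetup (including
that flag) without changing anything, a dry run will print it; "r0" is
again just a placeholder resource name:

  # -d / --dry-run: show the drbdsetup commands instead of executing them
  drbdadm -d adjust r0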

> I don't want to create a new device, I just want to re-use the existing
> one that's there and keep its data.
> 
> > As your server seems to be rather not-so-busy, IO wise,
> > I don't think this will even be noticeable.
> 
> Are there other parameters that I should be looking at?

If this is about DRBD tuning,
well, yes, there are many things to consider.
If there were just one optimal set of values,
those would be hardcoded, and not tunables.

> Sync-rates perhaps?

Did you have resync going on during your "interesting" times?
If not, why bother, at this time, for this issue.
If yes, why would you always resync at noon and midnight?

> Once again, the literature suggests this should be higher if the writes
> are small and "scattered" in nature, which given we're logging data from
> numerous sources, I'd expect to be the case.

Sync rate is not relevant at all here.
Those parameters control the background resynchronization
after connection loss and re-establishment.
As I understand, your DRBD is healthy, connected,
and happily replicating. No resync.
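
You can confirm that from /proc/drbd; outside of a resync the connection
state stays "Connected" and both disks "UpToDate":

  # A resync would show cs:SyncSource or cs:SyncTarget plus a progress bar
  cat /proc/drbd
  # or keep an eye on it over the noon/midnight window:
  watch -n 5 cat /proc/drbd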

> Thus following the documentation's recommendations (and not being an
> expert myself) I figured I'd try carefully adjusting that figure to
> something more appropriate.

Sure, careful is good.
Test system is even better ;-)

If you really want to improve on random write latency with DRBD,
you need to upgrade to 8.4. (8.4.5 will be released within days).

I guess that upgrade is too scary for such a system?

Also, you could use auditctl to find out in detail what is happening
on your system.  You likely want to play with that on a test system first
as well, until you get the event filters right,
or you could end up spamming your production system's logs.
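
As a starting point, something like the following; the watched path is
only an example, point it at wherever the historian actually writes:

  # Watch writes/attribute changes under the historian's data directory
  auditctl -w /var/lib/macroview/history -p wa -k historian
  # After the next noon or midnight event, pull the matching records
  ausearch -k historian --start recent
  # List and clear the rules again when you are done
  auditctl -l
  auditctl -D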

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
