
List:       rhq-devel
Subject:    Aggregate alerting feature - Bug 1019472
From:       Elias Ross <genman () noderunner ! net>
Date:       2013-12-05 18:49:50
Message-ID: CAKsEmEMa7kiFK10J15YXaBb5mtC9TG1qewCX5U_PLfaZsEPHng () mail ! gmail ! com

Hi, I know you've seen a lot of patches from me. I've been doing some major
work on getting our fairly large RHQ 4.9 installation working.

Speaking of fairly large, one feature that RHQ lacks is alerting on metrics
averaged across a resource group. I added a server plugin that can alert on
this data.

One obvious use case: you have a group of 10 hosts serving web traffic. If
the group's traffic over a 10-minute period drops by, say, 30% compared with
the same period on the previous day, you would like to be alerted. Traffic
normally fluctuates by more than 30% over the course of a day, so comparing
against an absolute baseline isn't particularly useful. It also isn't
useful to alert per host, as some hosts may be receiving no traffic or
increased traffic, depending on load balancer configuration, server
upgrades, or whatever. And managing 10 alerts (or more) isn't easy.

The plugin works as follows (the documentation is in the bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1019472 ): every 5 minutes,
calculations are made comparing the most recent data with yesterday's
data. (There are other rules, such as comparing against last week's data,
absolute value comparisons, or availability checks like the percentage of
up servers.) When a rule triggers, an alert definition is created at
the resource group level (since RHQ has no UI support for this), and an
alert is sent.
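
To make the day-over-day rule concrete, here is a rough sketch in Python. It
is not the plugin's code (the plugin is an RHQ server plugin, and its real
logic is in the bug); the function names, the per-host dictionaries, and the
max_drop parameter are all assumptions made up for this illustration.

```python
from statistics import mean


def relative_change(current, baseline):
    """Fractional change from baseline to current; None if baseline is zero."""
    if baseline == 0:
        return None
    return (current - baseline) / baseline


def should_alert(current_by_host, yesterday_by_host, max_drop=0.30):
    """Hypothetical rule: alert when the group average fell by max_drop or more.

    Both arguments map a host name to that host's metric value for the
    window being compared (for example, requests in the last 10 minutes).
    """
    if not current_by_host or not yesterday_by_host:
        return False  # no data to compare, so stay quiet
    current = mean(current_by_host.values())
    baseline = mean(yesterday_by_host.values())
    change = relative_change(current, baseline)
    return change is not None and change <= -max_drop


def percent_up(up_by_host):
    """Hypothetical availability rule input: percentage of hosts that are up."""
    if not up_by_host:
        return 0.0
    return 100.0 * sum(1 for up in up_by_host.values() if up) / len(up_by_host)


hosts = [f"web{i}" for i in range(10)]
yesterday = {h: 1000 for h in hosts}

# One host drained by the load balancer, the rest picking up its traffic:
# a per-host alert would fire, but the group average barely moves.
rebalanced = {h: 1100 for h in hosts}
rebalanced["web0"] = 0

# A real drop across the whole group.
dropped = {h: 650 for h in hosts}

print(should_alert(rebalanced, yesterday))  # False
print(should_alert(dropped, yesterday))     # True
```

The rebalanced case is the reason for alerting on the group rather than on
each host: web0 went to zero, yet the group average changed by only 1%.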

What I have is fully functional, but it required some patches to RHQ core to
support alert notifications for resource groups.

I don't know if this feature is in demand, but it would be nice to see the
following introduced, in order:
0) The feature included as part of the server plugin suite, but perhaps
disabled by default.
1) Patches included to fix alerting for resource groups. (Patches are part
of the bug.)
2) UI support for listing and clearing alerts for resource groups. (Partial
patches are in the bug; they should be easy to fix up.)
3) Alert definitions created through the UI, not through resource tags.
This requires expanding the types of alert definitions RHQ formally
supports.
4) An API for adding alert definitions through the command line interface.
(Because adding a tag to a resource group is supported through the CLI.)

Thoughts?
