
List:       flume-user
Subject:    Re: /metrics
From:       Ashish <paliwalashish () gmail ! com>
Date:       2015-07-23 23:39:14
Message-ID: CA++geaDXLZtXKw-Kh=FJiNZKkEwVdPC5ZFG=gETJSWocz7eTGQ () mail ! gmail ! com

It depends on what you want to see as part of the health check. AFAIK,
metrics are the only thing available at the Agent level that does not
depend on the type of source used. I would see this more as an
application-level health check; perhaps look at Sematext SPM for more
ideas (http://sematext.com/spm/).
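
For illustration, here is a rough Python sketch of what such a check might
look like. It separates "the metrics server refused the connection" from
"the agent answered but looks unhealthy". It assumes the /metrics endpoint
returns JSON keyed by component name (for example CHANNEL.<name>) with
string-valued counters such as ChannelFillPercentage. The 90% threshold,
the function names, and the status labels are made up for the example and
are not part of Flume.

```python
import json
import socket
import urllib.error
import urllib.request


def classify_metrics(payload, max_fill_pct=90.0):
    """Return (healthy, reasons) for a parsed /metrics JSON document.

    Only channel entries are inspected; the threshold is an arbitrary
    illustrative default, not a Flume recommendation.
    """
    reasons = []
    for component, values in payload.items():
        if not component.startswith("CHANNEL."):
            continue
        # Counters are assumed to arrive as strings, so convert explicitly.
        fill = float(values.get("ChannelFillPercentage", 0.0))
        if fill >= max_fill_pct:
            reasons.append(f"{component} is {fill:.1f}% full")
    return (not reasons, reasons)


def poll(url, timeout=5.0):
    """Poll the metrics URL once and return (status, details).

    status is one of: "ok", "unhealthy", "refused", "timeout",
    "unreachable", "bad-response" (labels invented for this sketch).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except urllib.error.URLError as exc:
        # Connection-level failures are wrapped in URLError.reason.
        if isinstance(exc.reason, ConnectionRefusedError):
            return "refused", [str(exc.reason)]
        if isinstance(exc.reason, socket.timeout):
            return "timeout", [str(exc.reason)]
        return "unreachable", [str(exc.reason)]
    except socket.timeout as exc:
        # Raised when the connection succeeds but the read stalls.
        return "timeout", [str(exc)]
    except OSError as exc:
        return "unreachable", [str(exc)]
    except ValueError as exc:
        # The server answered, but not with parseable JSON.
        return "bad-response", [str(exc)]

    healthy, reasons = classify_metrics(payload)
    return ("ok" if healthy else "unhealthy"), reasons
```

Calling poll("http://localhost:5653/metrics") on the agent host (the port is
the one mentioned in this thread) and logging the status each minute would
show whether the failures are refusals, which point at the agent
re-initializing or being down, or timeouts, which are more consistent with
a long GC pause.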

On Thu, Jul 23, 2015 at 12:15 PM, George Blazer <gblazer@gmail.com> wrote:
> Is it even the right strategy to poll /metrics as a healthcheck? Are there
> better alternative sources?
>
> On Thursday, July 23, 2015, iain wright <iainwrig@gmail.com> wrote:
>>
>> GC is a good idea. Was also thinking maybe there is a config management
>> tool in your environment changing the modified time of the flume.properties
>> file, causing flume to re-initialize, which takes the metrics down for a few
>> seconds depending on startup time. That seems like a stretch though. I would
>> definitely throw JMX monitoring on it to monitor JVM (or use the GC logs),
>> and watch flume logs during the time the problem exists.
>>
>> Also ssh in and try polling localhost:port/metrics at the time your
>> monitoring system is unable to poll it.
>>
>> Anytime I've seen this in our environment it's been OOM or re-initializing.
>>
>>
>> On Jul 23, 2015 9:09 AM, "Ashish" <paliwalashish@gmail.com> wrote:
>>>
>>> I think the Flume Agent is up, since the issue is intermittent.
>>> Whenever the issue happens, check that the Flume Agent you are
>>> polling is up, running, and processing messages. If you
>>> already have GC logs enabled, check if GC could be causing the freeze.
>>> Nothing else strikes me as of now, assuming the network is
>>> good.
>>>
>>> On Thu, Jul 23, 2015 at 12:09 AM, George Blazer <gblazer@gmail.com>
>>> wrote:
>>> > We poll metrics once a minute. It's pretty intermittent
>>> >
>>> > On Wednesday, July 22, 2015, iain wright <iainwrig@gmail.com> wrote:
>>> >>
>>> >> How often do you poll the metrics?
>>> >> Have you checked flume logs?
>>> >> Is flume starting up fine, then at some point not responding on
>>> >> metrics, until you do something to bring it back up?
>>> >> Or is it intermittently unresponsive but fixes itself?
>>> >>
>>> >> On Jul 22, 2015 5:49 PM, "George Blazer" <gblazer@gmail.com> wrote:
>>> >>>
>>> >>> I use :5653/metrics endpoint as my Flume healthcheck, but very often
>>> >>> the
>>> >>> healthcheck refuses connection, i.e. the server doesn't run.
>>> >>>
>>> >>> Is there anything I could look at?
>>> >>>
>>> >>> I'm using Flume 1.5.
>>> >>>
>>> >>> Thanks.
>>>
>>>
>>>
>>> --
>>> thanks
>>> ashish
>>>
>>> Blog: http://www.ashishpaliwal.com/blog
>>> My Photo Galleries: http://www.pbase.com/ashishpaliwal



-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal