
List:       nagios-users
Subject:    Re: [Nagios-users] weirdness in the scheduling of host checks
From:       "Frater, Greg J" <GJFRATER () bechtel ! com>
Date:       2009-06-29 18:30:47
Message-ID: 872CB0AEB377C240A112DD7C10B2592904B216C3 () wtps0171 ! amers ! ibechtel ! com



I figured out my problem: I had two instances of Nagios running.  That
would explain a lot of the scheduling weirdness, maybe all of it.  :-)
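
For anyone who wants to rule out the same cause, here is a minimal Python
sketch that looks for more than one independent Nagios daemon on a Linux box.
It is an illustration under assumptions, not something from the original
setup: it assumes a Linux-style /proc filesystem and that the daemon's
process name is "nagios" (yours may differ), and the function names
read_processes and top_level_instances are made up for this example.  Since
Nagios normally forks child processes to run checks, several "nagios"
processes are not a problem by themselves; the sketch only counts those
whose parent is not itself a "nagios" process.

```python
import os


def read_processes(proc_root="/proc"):
    """Yield (pid, ppid, name) for each process under proc_root (Linux only)."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as fh:
                stat = fh.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Layout is "pid (comm) state ppid ..."; comm may contain spaces or
        # parentheses, so split around the first "(" and the last ")".
        name = stat[stat.index("(") + 1:stat.rindex(")")]
        fields = stat[stat.rindex(")") + 2:].split()
        yield int(entry), int(fields[1]), name


def top_level_instances(processes, name="nagios"):
    """Return sorted PIDs named `name` whose parent is not also named `name`."""
    procs = list(processes)
    matching = {pid for pid, _ppid, pname in procs if pname == name}
    return sorted(
        pid for pid, ppid, pname in procs
        if pname == name and ppid not in matching
    )


if __name__ == "__main__":
    instances = top_level_instances(read_processes())
    print("independent nagios instances:", instances)
    if len(instances) > 1:
        print("more than one daemon appears to be running")
```

If the script reports more than one PID, stopping the extra daemon is
probably the first thing to try before digging into the scheduler.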


> Is anyone else seeing weird things in the scheduling of checks?  I
> don't have a good sense of what is wrong, but it's definitely not the
> way it was under Nagios 1.0 (or the way it should be).  I've been
> watching the scheduling queue on our Nagios 3 box for a week or so;
> here's a list of what I've seen:

> Under Nagios 3.0.6:
>  - Host checks staying at the top of the queue for a long time (over
>    an hour sometimes) even when they have a timeout set at 30 seconds

> Under Nagios 3.1.2:
>  - A host check showing up unexpectedly in the scheduling queue.  This
>    morning when I looked at the queue, the top event was about 15
>    minutes behind the current time, but things were moving along okay.
>    When I last checked, there was a host check at the top of the queue
>    with a next check time from 4 days ago.

>  - We had a host go down yesterday (Sunday) but we did not get
>    alerted.  When I looked at it in Nagios, I noticed the host check was
>    in an OKAY state and the 'last check' value for it was from 12 days
>    ago (6/17/2009)!

>  - Host checks don't seem to be getting stuck in the queue like they
>    were under 3.0.6, at least not for as long

> I'm going to submit a ticket to tracker.nagios.org but would like to
> have more empirical evidence of the problem first.  All I have so far
> are symptoms, no good data points (logs, errors, etc.).  Is anyone else
> seeing this type of behavior?

> Nagios 3.1.2 (also had trouble with 3.0.6)
> RHEL 5 64-bit

 




_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null
