'Re: [opennms-discuss] Antw: Fw: 30 seconad outage issue'


List:       opennms-discuss
Subject:    Re: [opennms-discuss] Antw:  Fw: 30 seconad outage issue
From:       "Michael Seibold" <Michael.Seibold () barmer-gek ! de>
Date:       2010-05-27 8:29:33
Message-ID: 4BFE498D.8752.00AD.0 () barmer-gek ! de

Sorry, but I already wrote what I would do. The configuration you are
running now doesn't make any sense in my opinion. I'm still not
convinced that those are FALSE outages. Maybe your network has some
problems you don't know about, but finding them is why we all use
network management tools.

- Michael


> > > Anup Bhavsar <anupbhavasar@yahoo.com> 27.05.2010 08:27 >>>
Hello Michael,
 
Thank you for your response and suggestions.
Actually, I increased the interval and retry values after reading some
articles posted on the forum, and was observing the system's behavior.
The intention was to suppress those false outages, which are generated
for 30 sec. My understanding was that if I increased the interval, the
system would be unable to report a 30 sec outage, as the SNMP poll
occurs every 30 seconds.

Regarding notification in /opt/opennms/etc/destinationPaths.xml, I
changed it to 1 min and don't want to set it to more than 1 min, as the
NOMS system is monitoring other services which are in the production
environment and the team should get alert calls for those services.
Through the GUI option, I have suppressed the notification alert calls
for these FALSE outages.
 
My concern now is that, although I no longer get alert calls for these
outages, I still want to stop the 30 second outages which the system is
generating.

Is there any way to keep the NOMS system from generating 30 second false
outage messages?
I would appreciate it if you could help me fix this problem.
Thanks & Regards,
Anup Bhavsar
+91 99869 13412

--- On Wed, 5/26/10, Michael Seibold <Michael.Seibold@barmer-gek.de>
wrote:


From: Michael Seibold <Michael.Seibold@barmer-gek.de>
Subject: [opennms-discuss] Antw: Fw: 30 seconad outage issue
To: opennms-discuss@lists.sourceforge.net 
Date: Wednesday, May 26, 2010, 10:54 PM


Hi Anup,

While I didn't check all of your supplied config, here is one thing I found:

         <service name="<Service Name>" interval="420000"
user-defined="false" status="on"> 
         <parameter key="retry" value="40"/> 
         <parameter key="timeout" value="60000"/> 

Here you tell the poller to wait 60 seconds for an answer from the
service. If the service does not respond within those 60 seconds, you
want to try it 40 more times, so in total you will have to wait over
40 minutes before a service is considered to be down. I'm not sure if
this is what you want...

And I don't know how OpenNMS handles this. Other network management
applications I have seen start new threads at the defined service
interval (7 minutes in your config). So if a service is really down,
after a while there will be a lot of parallel threads polling this
service. This will result in high load on your server and unpredictable
results. OpenNMS can probably cope with this, but there is no sense in
configuring it this way.
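
A rough sketch of the arithmetic behind this concern, in Python. It models the behaviour described above for other tools (a new poll started at every interval even if the previous one is still running), not what OpenNMS itself does, and it assumes "retry" means extra attempts after the first one. The function names are made up for illustration.

```python
import math

def worst_case_poll_ms(timeout_ms: int, retry: int) -> int:
    """Time one poll can take if every attempt runs into its timeout.

    Assumption: 'retry' counts extra attempts after the first one.
    """
    return (retry + 1) * timeout_ms

def overlapping_polls(interval_ms: int, timeout_ms: int, retry: int) -> int:
    """Polls in flight at once if a new poll starts at every interval,
    whether or not the previous poll has finished (hypothetical tool)."""
    return math.ceil(worst_case_poll_ms(timeout_ms, retry) / interval_ms)

# Values from the quoted config: interval=420000, retry=40, timeout=60000
print(worst_case_poll_ms(60000, 40) / 60000, "minutes per poll")
print(overlapping_polls(420000, 60000, 40), "polls running in parallel")
```

Under these assumptions a single poll of a dead service can take 41 minutes, so with a 7 minute interval up to 6 polls of the same service could be running at once.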

I use the following config for ICMP:

         <service name="ICMP" interval="60000" user-defined="false"
status="on"> 
         <parameter key="retry" value="2"/> 
         <parameter key="timeout" value="3000"/> 

So 
- poll every 60 seconds
- wait 3 seconds for timeouts
- retry 2 times if polling fails

This will create outages if the ICMP ping fails for at least 9 seconds
(at least I hope so... I'm not sure whether 2 retries means 2 RETRIES
or 2 TRIES).
With this config, in many cases we even see when a remote line switches
over to a backup line.
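
The open question about RETRIES versus TRIES changes the result as follows. This is only a sketch of the two readings, assuming every attempt waits for the full timeout before the next one starts; it does not settle which reading OpenNMS uses, and the function name is made up for illustration.

```python
def time_to_outage_ms(timeout_ms: int, retry: int, retry_includes_first_try: bool) -> int:
    """Time until a failing service is declared down.

    retry_includes_first_try=False: 'retry' extra attempts after the first.
    retry_includes_first_try=True:  'retry' attempts in total.
    """
    attempts = retry if retry_includes_first_try else retry + 1
    return attempts * timeout_ms

# ICMP config from above: timeout=3000, retry=2
for label, flag in (("2 retries after the first try", False),
                    ("2 tries in total", True)):
    print(label, "->", time_to_outage_ms(3000, 2, flag) / 1000, "seconds")
```

The first reading gives the 9 seconds mentioned above, the second gives 6 seconds.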


Next I configured notification in
/opt/opennms/etc/destinationPaths.xml

  
<path name="OnCall" initial-delay="3m"> 

to avoid getting a notification if the failure vanishes within this time
(e.g. a backup line was coming up).
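
The effect of the initial delay can be sketched like this. It is a simplified model, assuming a notification is dropped when the outage clears before the delay has elapsed, as described above. The parsing of suffixes other than "m" is an assumption (only values such as "1m", "3m" and "15m" appear in this thread), and both function names are made up for illustration.

```python
import re

# Assumed unit suffixes; only "m" is actually seen in the configs quoted here.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}

def delay_seconds(text: str) -> int:
    """Convert a delay string such as '3m' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported delay value: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]

def notification_sent(outage_seconds: float, initial_delay: str) -> bool:
    """True if the outage lasts at least as long as the initial delay."""
    return outage_seconds >= delay_seconds(initial_delay)

print(notification_sent(30, "3m"))   # short blip, cleared before the delay
print(notification_sent(300, "3m"))  # outage still open after the delay
```

In this model a 30 second outage never reaches the on-call target with initial-delay="3m", while a 5 minute outage does.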

If, with this config, we see a lot of failures, there ARE problems that
must be solved somewhere. It might be a device (OpenNMS server
operating system, firewall, router, ...) that drops ICMP pings due to
overload, intermittent routing problems, problems with device buffers
(mostly routers or firewalls) being overrun, for example because of
servers using TCP window scaling and sending 2 GB of data in one TCP
window, broadcast problems, ...

- Michael





> > > Anup Bhavsar <anupbhavasar@yahoo.com> 26.05.2010 12:24 >>>

Hello,

Can someone please advise on the "30 second outages", which are false
outages generated by the NOMS system? I tried a couple of options,
but they didn't help me stop the false outages.

Thanks in advance.

Thanks & Regards,
Anup 


--- On Mon, 5/17/10, Anup Bhavsar <anupbhavasar@yahoo.com> wrote:


From: Anup Bhavsar <anupbhavasar@yahoo.com>
Subject: 30 seconad outage issue
To: opennms-discuss@lists.sourceforge.net 
Cc: "Anup" <anupbhavasar@yahoo.com>
Date: Monday, May 17, 2010, 5:59 PM







Hello,
  
I need your assistance to overcome the *30 seconds outage* problem.
I have pasted below what I modified and where. After changing the
parameters in the configuration files, I made sure to restart the
opennms services.
  
Below is the installed version of OpenNMS:
jicmp.i386 1.0.7-1 
jdk.i586 2000:1.5.0_18-fcs 
opennms-core.noarch 1.6.5-1 
opennms-webapp-jetty.noarch 1.6.5-1 
opennms.noarch 1.6.5-1 
  
Below is the Linux OS version:
  
# uname -a
# Linux 2.6.9-67.ELsmp #1 SMP i686 i686 i386 GNU/Linux 
  
# cat /etc/redhat-release 
# Red Hat Enterprise Linux ES release 4 (Nahant Update 6) 
  
  
Referring to the URL below, I made changes to parameters like interval,
retry and timeout in the poller-configuration.xml file and restarted
the services. But the system is still unable to stop the 30 sec outages.
  
http://www.opennms.org/wiki/30_second_outage 
  
  
Below are the changes I made, but I am still facing the problem.
  
vi /opt/opennms/etc/destinationPaths.xml 
  
<path name="OnCall" initial-delay="1m"> 
        <target interval="1m"> 
            <name xmlns="">OnCall</name> 
            <autoNotify xmlns="">on</autoNotify> 
            <command xmlns="">phoneCall</command> 
        </target> 
        <escalate delay="15m"> 
            <target interval="15m"> 
  
  
vi /opt/opennms/etc/poller-configuration.xml 
  
[Modified timeout to 60000 ms, i.e. 60 s]
  
<service name="SERVICE NAME" interval="420000" user-defined="false"
status="on"> 
            <parameter key="retry" value="10"/> 
            <parameter key="timeout" value="60000"/> 
            <parameter key="service-name" value="name"/> 
        </service> 
        <service name="SERVICE NAME" interval="420000"
user-defined="false" status="on"> 
            <parameter key="retry" value="10"/> 
            <parameter key="timeout" value="60000"/> 
            <parameter key="service-name" value="name"/> 
        </service> 
        <service name="SERVICE NAME" interval="420000"
user-defined="false" status="on"> 
            <parameter key="retry" value="10"/> 
            <parameter key="timeout" value="60000"/> 
            <parameter key="service-name" value="name"/> 
        </service> 
  
  
Referring to the web link [http://www.opennms.org/wiki/Path_Outage_How-To],
I increased the retry count from 10 to 40.
  
</service> 
        <service name="<Service Name>" interval="420000"
user-defined="false" status="on"> 
            <parameter key="retry" value="40"/> 
            <parameter key="timeout" value="60000"/> 
            <parameter key="service-name" value="name"/> 
        </service> 
</service> 
        <service name="<Service Name>" interval="420000" 
            user-defined="false" status="on"> 
            <parameter key="retry" value="40"/> 
            <parameter key="timeout" value="60000"/> 
            <parameter key="service-name" value="name"/> 
        </service> 
  
Also, with reference to the web link
http://www.opennms.org/wiki/Polling_Configuration_How-To#The_Poller_Configuration_File_Header
I modified the "serviceUnresponsiveEnabled" parameter
in the opennms/etc/poller-configuration.xml file, set it to "true", and
restarted the opennms services.
  
  
I checked the interface settings on the server; they look OK.
  
        Settings for eth1: 
        Supported ports: [ TP ] 
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Supports auto-negotiation: Yes 
        Advertised link modes:  10baseT/Half 10baseT/Full 
                               100baseT/Half 100baseT/Full 
                                1000baseT/Full 
        Advertised auto-negotiation: Yes 
        Speed: 100Mb/s 
        Duplex: Full 
        Port: Twisted Pair 
        PHYAD: 1 
        Transceiver: internal 
        Auto-negotiation: on 
        Supports Wake-on: g 
        Wake-on: d 
        Link detected: yes 
  
Can someone please advise how to stop the escalation of 30 second
outages?

Thanks in advance.

Regards,
Anup Bhavsar




      

*) at 2.9 cents per minute from the Deutsche Telekom landline network;
mobile charges may differ.
**) at your provider's low rate

Your strong health partner: the new BARMER GEK.
Together. Even better! www.barmer-gek.de 

This message from BARMER GEK may contain confidential internal company
information. If you are not the intended recipient, please inform the
sender and delete the message and its attachments. Unauthorized
publication, use, distribution, forwarding, and copying of this mail and
its linked attachments are not permitted.

------------------------------------------------------------------------------

_______________________________________________
Please read the OpenNMS Mailing List FAQ:
http://www.opennms.org/index.php/Mailing_List_FAQ 

opennms-discuss mailing list

To *unsubscribe* or change your subscription options, see the bottom of
this page:
https://lists.sourceforge.net/lists/listinfo/opennms-discuss 


      
