'Re: [opennms-install] Antw: Performance issues and false outages?'


List:       opennms-install
Subject:    Re: [opennms-install] Antw:  Performance issues and false outages?
From:       "Roskens, Ronald" <roskens () BIWORLDWIDE ! com>
Date:       2009-08-17 14:40:57
Message-ID: BBB15F75F55BF743943A181FFEF8FBF203DFAB01 () EXCHANGE2 ! biperf ! com

Something else to check on could be the JVM settings.

I was having a similar problem with data collection failures on our
OpenNMS server, and this is what I did (we are not using
storeByGroup):

1 - opennms.conf:

JAVA_HEAP_SIZE=3072
VERBOSE_GC=1
ADDITIONAL_MANAGER_OPTIONS="-d64 -XX:+UnlockDiagnosticVMOptions
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution"
ADDITIONAL_MANAGER_OPTIONS="${ADDITIONAL_MANAGER_OPTIONS}
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ParallelGCThreads 
-XX:SurvivorRatio=8 -XX:TargetSurvivorRatio
-XX:MaxTenuringThreshold=31"

The output.log file gets a little big because of the GC logging
options. I've been pretty happy with the new settings (CMS GC), since
the JVM seems to spend a lot less time doing full GCs than it did
without them. At one point before this change it got so bad that it was
doing a full GC every 30 seconds. CMS seems to use CPU more often for GC
sweeps, but it doesn't seem to have the "pauses" that the other GC did.
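
To check how often full GCs actually happen, the timestamps written by
-XX:+PrintGCTimeStamps can be compared. Below is a minimal Python sketch,
not something from my setup: it assumes full collections are logged as
"<uptime seconds>: [Full GC ..." and the function name full_gc_intervals
is made up for illustration. Adjust the pattern if your JVM prints a
different layout.

```python
import re

# Assumed line shape (an assumption, check your own output.log):
#   "<uptime seconds>: [Full GC ..."
FULL_GC = re.compile(r"(\d+\.\d+): \[Full GC")


def full_gc_intervals(lines):
    """Return the gaps, in seconds, between consecutive full GCs."""
    stamps = []
    for line in lines:
        match = FULL_GC.search(line)
        if match:
            stamps.append(float(match.group(1)))
    # Pair each timestamp with the next one and take the difference.
    return [round(later - earlier, 3)
            for earlier, later in zip(stamps, stamps[1:])]
```

Feeding it the lines of output.log, for example
full_gc_intervals(open("output.log")), gives a list of gaps; many gaps
of around 30 seconds would match the behaviour I described above.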

We had also upgraded the hardware for this system from a Sun 480 (2
CPUs, 1200 MHz) to a Sun 490 (4 CPUs, 1500 MHz), and that was probably
the bigger factor in reducing the "datacollection failed" warnings. I'm
not sure how the system would react if we went back to the default GC
for the 64-bit JVM.

I am still seeing a large number of INFO "Postponing poll for X ...
org.opennms.netmgt.poller.pollables.LockUnavailable" messages in the
poller.log files, but I am unsure whether they point to a problem.
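
One way to judge whether these messages matter is to see if they
cluster around the outage windows. The Python sketch below is only an
illustration under assumptions: it assumes every log line starts with a
"YYYY-MM-DD HH:MM:SS" timestamp and that the message and the exception
name appear on the same line. The function name is hypothetical.

```python
from collections import Counter


def postponed_polls_per_hour(lines):
    """Count LockUnavailable postponements per hour.

    Assumes each line starts with "YYYY-MM-DD HH:MM:SS" (an assumption
    about the log layout; change the slice if yours differs).
    """
    counts = Counter()
    for line in lines:
        if "Postponing poll for" in line and "LockUnavailable" in line:
            counts[line[:13]] += 1  # keep "YYYY-MM-DD HH" as the bucket
    return dict(counts)
```

If the hourly counts spike at the same times as the outages, the two
are probably related; a flat count all day would suggest they are not.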



-----Original Message-----
From: Michael Seibold [mailto:Michael.Seibold@Gek.de]
Sent: Monday, August 17, 2009 3:23 AM
To: opennms-install@lists.sourceforge.net
Subject: [opennms-install] Antw: Performance issues and false outages?

Hi Kjetil,

I sometimes have opennms running on a test server with MUCH too little
memory and disk, so it runs at 100% CPU all the time and permanently
has a lot of I/O waits. But even with this high load I don't get ICMP
outages or "datacollection failed". So I think you should check other
possibilities first.

Were the pings and snmp walks you tried run from the opennms server
during the "outages"?

How long are the response times from your devices using cacti during the
"outages"? Maybe there are timeouts in opennms and cacti has higher
threshold values?
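
To compare response times against a timeout, the output of a ping run
from the opennms server can be scanned for slow replies. This is a
rough Python sketch, assuming the usual "time=12.3 ms" field in the
ping output; the function name and the timeout value you pass in are
illustrative, so use the timeout from your own poller configuration.

```python
import re

# Matches the round-trip field in typical ping output, e.g. "time=0.345 ms".
RTT = re.compile(r"time[=<]([\d.]+) ?ms")


def slow_replies(ping_output, timeout_ms):
    """Return the round-trip times (ms) that reach or exceed timeout_ms."""
    rtts = [float(value) for value in RTT.findall(ping_output)]
    return [rtt for rtt in rtts if rtt >= timeout_ms]
```

Replies that never arrive do not show up in this list at all, so the
packet loss summary printed by ping still needs to be checked as well.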

Can you monitor the network load / network traffic of your opennms
server during the outages? Maybe there is an overload situation on your
local network interface.

Are there ifInDiscards, ifInErrors or something like this on your local
interface of the opennms server?
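
On a Linux server the local counters that roughly correspond to these
can be read from /proc/net/dev. The following Python sketch is only an
illustration (the function name is made up); it relies on the standard
column order of that file, where receive errs and drop are the third
and fourth counters and transmit errs and drop the eleventh and
twelfth.

```python
def interface_errors(proc_net_dev_text):
    """Map interface name -> receive/transmit error and drop counters."""
    result = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in proc_net_dev_text.splitlines()[2:]:
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if len(fields) < 12:
            continue
        result[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return result
```

Calling interface_errors(open("/proc/net/dev").read()) before and after
an outage and comparing the numbers shows whether the counters grew
during that window.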

Are there CRC-Errors or other errors on the interface of the (probably)
switch your opennms server is attached to?

Try tcpdump or wireshark to see if the ICMP packets are leaving your
server (and are returning) during the "outages".

Is there someone else working on this server who might interfere with
your work (starting a firewall, generating network load, ...)?

Change the logging for collectd and poller to DEBUG in log4j.properties
and check the logs for large numbers of warnings about unresponsive
services. Maybe you are checking a lot of things that do not exist and
losing a lot of time waiting for timeouts.

If you can't find any problem there, try raising the number of polling
threads in poller-configuration.xml and/or collectd-configuration.xml,
but I don't think that this is a problem with only 42 devices to
monitor. I have 70 threads for the poller and 50 threads in collectd for
2,000 services checked every 60 seconds.
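
As a back-of-the-envelope check on thread counts, the figures above can
be turned into a time budget per poll. This Python sketch uses a
deliberately simplified model (all threads always busy, no scheduling
overhead), and the function name is just for illustration.

```python
def poll_time_budget(services, interval_s, threads):
    """Average seconds one poll may take before the threads fall behind.

    Simplified model: every service is polled once per interval and the
    work is spread evenly over all threads.
    """
    return interval_s * threads / services
```

With 2,000 services, a 60 second interval and 70 threads this gives 2.1
seconds per poll on average. Polls that regularly wait longer than that
on timeouts would make the poller fall behind, which is why the
unresponsive services are worth checking first.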

...enough work for you now for a while ;-))

- Michael


> > > Kjetil Roso <kjetil@roso.no> 17.08.2009 01:21 >>>
Hi,

I'm running OpenNMS v1.6.5-1 (stable) on the following server
configuration:
Java Version:    1.5.0_18 Sun Microsystems Inc.
Java Virtual Machine:    1.5.0_18-b02 Sun Microsystems Inc.
Operating System:    Linux 2.6.23.1-42.fc8 (amd64)
Hardware: HP ProLiant G5 3.0GHz XEON-QuadCore with 6 GB RAM

OpenNMS parameters:
JAVA_HEAP_SIZE=4096
All PostgreSQL-tuning is according to the wiki:
"Systems_with_lots_of_RAM_and_PostgreSQL_8.2".
All other relevant configuration files are "out of the box".

I'm currently monitoring 42 nodes (5xCisco7606, 30xCisco ME3400G + some
other devices). For the time being, I am only monitoring the ICMP
service on the nodes. Every aspect of OpenNMS is working great except
for the two issues below, which I need help resolving.

Occasionally, I get storms of
"uei.opennms.org/nodes/dataCollectionFailed" events. I cannot see any
reason for this. It has no practical consequences other than generating
a lot of events in the event log and leaving "holes" in my graphs.
These periods of data collection failure typically last from 2 to 30
minutes. I'm running Cacti on another server polling the same devices
without any errors. Cacti has been running for 2 years without any
data collection errors worth mentioning.

My main problem is ICMP outages. 3-4 times a day, I get a storm of
ICMP outages on all nodes. This results in a notification storm
reporting "Node down". The outages last for approx. 20 minutes. The
"funny" thing is that during the outages, all nodes are working fine.
Both ping and snmpwalk work fine. There are no indications in the
nodes' syslog messages either.
top shows me that java takes up 25% of both memory and CPU. PostgreSQL
takes up 10% of both CPU and memory (at most).

I have ruled out network problems, and all nodes have less than 9%
utilization. I have also grep'ed all daemon log files for "error" and
"fatal" with no results. All signs point to these problems being
related to OpenNMS.

Most likely, I have missed a few tweaks and settings on the way, but I
need some tips on where to look.

Any tips and hints are deeply appreciated,

Kjetil
Network Manager



------------------------------------------------------------------------
------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008
30-Day trial. Simplify your report design, integration and deployment -
and focus on what you do best, core application coding. Discover what's
new with Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Please read the OpenNMS Mailing List FAQ:
http://www.opennms.org/index.php/Mailing_List_FAQ

opennms-install mailing list

To *unsubscribe* or change your subscription options, see the bottom of
this page:
https://lists.sourceforge.net/lists/listinfo/opennms-install




