List:       npaci-rocks-discussion
Subject:    Re: [Rocks-Discuss] Ganglia Problem w/ new switch
From:       Jim Kusznir <jkusznir () gmail ! com>
Date:       2011-07-28 20:34:49
Message-ID: CAA3eeYCUQtq9i2keS1JFefEsg-+id0nbaD_seOX3Y8SGzGEdOw () mail ! gmail ! com

Interestingly enough, IGMP snooping was on, and when I turned it off,
I didn't notice any change.  I did find that when I removed an uplink to
another switch of mine (which I recall has IGMP snooping turned on),
ganglia lost ALL hosts immediately, and they never came back...

So, I went the other way and turned all the IGMP snooping (and a few
other generic IGMP) options on, and after a few moments, everything
came back.  I can't say I understand what the different options did,
nor which one(s) specifically fixed the problem, but I am now
operational.
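
For anyone chasing the same symptom, one rough way to check whether the
switch is actually delivering gmond's multicast traffic is to join the
group by hand and watch which nodes go quiet.  The sketch below is only
that, a sketch: it assumes the stock gmond channel (239.2.11.71, port
8649), so check the udp_recv_channel section of your own gmond.conf
first.  The helper names (build_mreq, silent_hosts, listen) are made up
for illustration and are not part of ganglia.

```python
import socket
import time

# Assumed values: gmond's stock multicast channel.  Check the
# udp_recv_channel block in your gmond.conf; your site may differ.
MCAST_GROUP = "239.2.11.71"
MCAST_PORT = 8649


def build_mreq(group, iface="0.0.0.0"):
    """Pack the 8-byte ip_mreq structure used by IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def silent_hosts(last_seen, now, max_age):
    """Return hosts whose most recent packet is older than max_age seconds."""
    return sorted(h for h, t in last_seen.items() if now - t > max_age)


def listen(duration=60.0, max_age=30.0):
    """Join the group for `duration` seconds and report who went quiet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", MCAST_PORT))
    # Joining the group is what makes the host send an IGMP report,
    # which is the traffic a snooping switch acts on.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(MCAST_GROUP))
    sock.settimeout(1.0)
    last_seen = {}
    deadline = time.time() + duration
    try:
        while time.time() < deadline:
            try:
                _, (addr, _port) = sock.recvfrom(65535)
            except socket.timeout:
                continue
            last_seen[addr] = time.time()  # sender IP -> last arrival time
    finally:
        sock.close()
    return last_seen, silent_hosts(last_seen, time.time(), max_age)
```

Calling listen() on the head node returns the senders it heard from and
the ones that stopped sending within the window.  If nodes that answer
ping keep landing in the second list, the multicast path through the
switch is a likely suspect.  The bind may fail if another process holds
the port without SO_REUSEADDR, in which case try it from a host that is
not running gmond.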

Thanks!

--Jim

On Thu, Jul 28, 2011 at 10:23 AM, Philip Papadopoulos
<philip.papadopoulos@gmail.com> wrote:
> Ganglia uses multicast to send updates.  You may have to turn IGMP snooping
> off on your switches, especially if you see up/down/up/down .....
> 
> -P
> 
> 
> On Thu, Jul 28, 2011 at 9:15 AM, Jim Kusznir <jkusznir@gmail.com> wrote:
> 
> > Hi all:
> > 
> > I just performed a major overhaul of my network infrastructure on our
> > cluster in preparation for some upgrades/expansions.  After the dust
> > settled from this, I ended up with one very strange bug, and I'm not
> > exactly sure what to look to as potential causes.  If you take a look
> > at our ganglia page, you should see what's up:
> > 
> > https://aeolus.wsu.edu/ganglia
> > 
> > On here, you'll notice that some percentage of our nodes are always
> > coming and going.  Pinging to those nodes appears nice and stable, but
> > ganglia doesn't see it that way.  As a side note, after I brought the
> > nodes online, I had some switch trauma that ended up forcing me to
> > reboot the switch.  When it came back up, ganglia was showing all
> > nodes as down.  When I reset gmetad on the head node, nodes started
> > re-appearing, but in their present state of up/down randomness.  I'm
> > still 95% sure this is a switch problem, but I don't know what to look
> > for on the switch.  Suggestions?
> > 
> > Thanks!
> > -Jim
> > 
> > 
> 
> 
> --
> Philip Papadopoulos, PhD
> University of California, San Diego
> 858-822-3628 (Ofc)
> 619-331-2990 (Fax)

