List: ganglia-general
Subject: Re: [Ganglia-general] Ganglia 3.6.1 and CentOS 6.5
From: Jared David Baker <Jared.Baker () uwyo ! edu>
Date: 2015-02-21 21:01:14
Message-ID: CY1PR0501MB125933842F0F457B8283808C9D2B0 () CY1PR0501MB1259 ! namprd05 ! prod ! outlook ! com
Sergio,
Thanks for the suggestion. I wrangled one of our network administrators and did some
further debugging. Before that, though, I physically took the network switch (Juniper
EX series) out of the path and connected two machines directly via Cat6 to see whether
it was an OS problem. It turns out that with the switch removed, multicast stayed alive
when tested with `omping`. After that, I talked with the network admin some more and he
looked into the switch configuration. Apparently the switch was set up to route
multicast between different VLANs, and for some reason it would start dropping the
packets after a while. Turning multicast off on the switch, so that it treats the
traffic as plain broadcast, completely fixed the issue. Perhaps someone will find this
useful down the road.
Regards,
Jared
From: Sergio Ballestrero [mailto:sergio.ballestrero@gmail.com]
Sent: Thursday, February 19, 2015 12:58 AM
To: Jared David Baker
Cc: ganglia-general@lists.sourceforge.net
Subject: Re: [Ganglia-general] Ganglia 3.6.1 and CentOS 6.5
Hello Jared,
Yes, most likely this is because of multicasting.
Unless you really want to use multiple gmond instances as collectors, it's simpler and
more robust to use unicast to the gmond on the host that runs gmetad. Otherwise, the
first step in debugging multicast would be to run tcpdump on the host running gmetad,
to check that you are actually receiving multicast there.
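
As a complement to tcpdump, the same check can be sketched in a few lines of Python: join the multicast group on the gmetad host and count the datagrams that arrive. This is only an illustrative sketch, not part of Ganglia. The group 239.2.11.71 is the one shown in the `netstat -gn` output in this thread; port 8649 is assumed to be gmond's default and should be checked against your gmond.conf. The function names are made up for this example.

```python
import socket
import struct

GROUP = "239.2.11.71"  # group shown in `netstat -gn` in this thread
PORT = 8649            # assumption: gmond's default port; check gmond.conf


def membership_request(group, iface_ip="0.0.0.0"):
    """Pack an ip_mreq structure: group address + local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def listen(group=GROUP, port=PORT, iface_ip="0.0.0.0", timeout=60.0, max_packets=10):
    """Join the group and count datagrams.

    Stops after max_packets datagrams, or after `timeout` seconds
    without receiving anything. Returns the number received.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow binding even if a local gmond already listens on this port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_ip))
    sock.settimeout(timeout)
    received = 0
    try:
        while received < max_packets:
            data, (src, _src_port) = sock.recvfrom(65535)
            received += 1
            print(f"{len(data)} bytes from {src}")
    except socket.timeout:
        pass  # silence for `timeout` seconds: nothing (more) is arriving
    finally:
        sock.close()
    return received
```

Calling `listen()` right after restarting gmond on the nodes and again a few minutes later would show whether packets from other hosts stop arriving after a while. If only the local host's address ever shows up, multicast from the other nodes is not reaching this machine.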
Cheers,
Sergio
On 19 Feb 2015, at 04:16, Jared David Baker <Jared.Baker@uwyo.edu> wrote:
I posted a while back, left for a bit, and am now coming back. I'm attempting to get
Ganglia working on a cluster and have followed the instructions fairly closely (no
radical changes). The issue I'm having is that nothing is aggregating to gmetad.
There are times when we see compute nodes for ~2 minutes, then they disappear and
don't come back until I restart gmond (then another 2-minute cycle, etc.). I think
it may be related to multicasting, but I'm not quite sure. Here is some basic
output:
[root@l1 ~]# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.3.0     0.0.0.0         255.255.255.0   U         0 0          0 ib0
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U         0 0          0 ib0
224.0.0.0       0.0.0.0         240.0.0.0       U         0 0          0 eth0
0.0.0.0         192.168.1.250   0.0.0.0         UG        0 0          0 eth0
[root@l1 ~]# netstat -gn
IPv6/IPv4 Group Memberships
Interface RefCnt Group
--------------- ------ ---------------------
lo 1 224.0.0.1
eth0 1 239.2.11.71
eth0 1 224.0.0.1
ib0 1 224.0.0.1
lo 1 ff02::1
eth0 1 ff02::202
eth0 1 ff02::1:ff11:fb15
eth0 1 ff02::1
eth1 1 ff02::1
ib0 1 ff02::1:ff3d:1
ib0 1 ff02::1
[root@l1 ~]# ip link show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 40:16:7e:11:fb:15 brd ff:ff:ff:ff:ff:ff
So I can see that the multicast route is there and multicast is enabled on the
interface. If I telnet to the client host from the gmetad server, I get the XML data.
The switches in the cluster are configured to support multicast (or so I'm told by our
networking team). However, the web server still claims that all nodes are down, even
though I can see gmond clearly running, both by checking the process and by querying
the node via telnet. I haven't seen anything relevant in the log file or when running
gmetad in debug mode in the foreground. Any help would be greatly appreciated.
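
The telnet check described above can also be scripted. The sketch below is illustrative only: it reads the XML dump from a gmond TCP port and lists the hosts that gmond currently knows about. Port 8649 is assumed to be the default, and the helper names are invented for this example. It relies on gmond's XML containing `HOST` elements with `NAME` and `TN` attributes, where `TN` is the number of seconds since the host was last heard from.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Read the full XML dump that gmond writes to a TCP client, as bytes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # gmond closes the connection after the dump
                break
            chunks.append(chunk)
    return b"".join(chunks)


def hosts_seen(xml_bytes):
    """Return (host name, seconds since last heard) for every HOST element."""
    root = ET.fromstring(xml_bytes)
    return [(h.get("NAME", "?"), int(h.get("TN", "0"))) for h in root.iter("HOST")]
```

For example, `hosts_seen(fetch_gmond_xml("l1"))` run against the gmond that gmetad polls shows which nodes that collector is actually hearing. A compute node that answers telnet with its own XML but never appears in the collector's list would be consistent with its multicast packets not arriving at the collector.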
Thanks!
--
Jared
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general