List: ganglia-general
Subject: Re: [Ganglia-general] nested grids questions/issues
From: David Lee <david.yi.lee () gmail ! com>
Date: 2012-03-12 5:37:12
Message-ID: CALNZKTJyB+D5simS9X1sduj39LT-HZUMaA0XEvFNfWGYEj_+xQ () mail ! gmail ! com
Same thing for us. Reverting the central gmetad to 3.1.7 (leaving all
nested gmetad at 3.2 - 3.3.1) goes back to the intended behavior (scalable
ON).
DL
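
For anyone hitting the same symptom: the telnet check Alexander suggests
further down the thread can be scripted. This is only a rough sketch. It
assumes the downstream gmetad dumps its full XML and then closes the
connection when you connect to its xml port, which is what the telnet test
relies on. The function names are made up for illustration and are not part
of Ganglia.

```python
import socket
import xml.etree.ElementTree as ET
from collections import Counter


def fetch_xml(host: str, port: int, timeout: float = 10.0) -> bytes:
    """Read everything the daemon sends on connect, until it closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


def count_elements(xml_bytes: bytes) -> Counter:
    """Count GRID, CLUSTER and HOST elements in a Ganglia XML dump."""
    root = ET.fromstring(xml_bytes)
    counts = Counter()
    for elem in root.iter():
        if elem.tag in ("GRID", "CLUSTER", "HOST"):
            counts[elem.tag] += 1
    return counts
```

Calling count_elements(fetch_xml(host, port)) with the host and xml port of a
nested gmetad (both placeholders here) should report at least one GRID if that
gmetad is exporting data. If it does and the central gmetad still writes no
summaries, the problem is on the central side, as described in this thread.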
On Sun, Feb 26, 2012 at 6:30 PM, Vladimir Vuksan <vlists@veus.hr> wrote:
> I am suspecting support for tagging has something to do with this. I will
> need to look into it.
>
> Vladimir
>
> On Sun, 26 Feb 2012, Ozzie Sabina wrote:
>
> > I'm glad I remembered reading something about this here recently. I
> > added myself to Bug 324, but I figured I'd go ahead and echo the issue here
> > as well.
> >
> > I'm on CentOS 5, same exact issue with 3.3.1 for only the aggregation
> > gmetads: No summaries are produced unless "scalable off" is set, in which
> > case the grids are combined into one, which is not what we want.
> >
> > Reverting them to 3.1.7 gets me back to working as designed.
> >
> > My debug output pattern is identical to that reported by Alexander (with
> > 3.3.1 vs. 3.1.7).
> >
> > So it seems like 3.3.1 (3.2.0+) is unusable for grid summaries on at
> > least several Unices.
> >
> > I suppose I could cross-post to the developer list, but is there anyone
> > familiar with this code here that might be able to hand out some clue? I'm
> > happy to help in any way I can.
> >
> > Thanks!
> >
> > Oz
> >
> > On Feb 22, 2012, at 9:07 AM, Matthew Nicholson wrote:
> >> Yeah, telnetting to the "remote" gmetad's is just fine. It's as soon as
> >> I upgrade the gmetad talking to those, I stop getting info.
> >>
> >> Bug report time...
> >>
> >> On Wed, Feb 22, 2012 at 5:39 AM, Alexander Karner <AKA@de.ibm.com> wrote:
> >>> Hi!
> >>>
> >>> Please check if your remote gmetad's export data by running
> >>> telnet <host> <xml port>
> >>>
> >>> --> I had similar behaviour when I upgraded my central gmetad to 3.2.0.
> >>> No data was collected from the other grids, but running the telnet
> >>> command returned a long list of XML data.
> >>> Switching back to 3.1.7 solved the problem.
> >>>
> >>> If your remote systems are able to export data on the XML port, your
> >>> central gmetad seems to have the same problem that I had.
> >>>
> >>>
> >>>
> >>>
> >>> From: Matthew Nicholson <matthew.a.nicholson@gmail.com>
> >>> To: ganglia-general@lists.sourceforge.net,
> >>> Date: 19.02.2012 19:07
> >>> Subject: [Ganglia-general] nested grids questions/issues
> >>> ________________________________
> >>>
> >>>
> >>>
> >>> So, I recently inherited a large ganglia installation we use to
> >>> monitor our HPC cluster and associated services. Due to our module, we
> >>> need to break our cluster and storage up into smaller
> >>> "clusters" that serve specific purposes, aggregate those into grids,
> >>> and then those grids into another "master" level grid, though in one
> >>> case there is a 3rd level of grid aggregation.
> >>>
> >>> This is all unicast based, and we (I'm sure I'll be told to do
> >>> otherwise, but that's not an option currently) run ~55 gmond's and ~
> >>> gmetads on our "ganglia" box. Everything communicates to this on a
> >>> range of unicast ports.
> >>>
> >>> More info on our nesting:
> >>> Master(gmetad) -> 3 other gmetad's -> lots and lots of gmond's
> >>>                -> 1 gmetad for storage -> 2 gmetads (lustre + nfs) -> lots of gmonds
> >>>
> >>> That's basically it.
> >>>
> >>> Okay, so this works. It is currently working, but the gmetad's fall
> >>> over from time to time, and it is running ganglia 3.1.4. We would like to
> >>> get everything up to 3.3.0/1, and update our web frontend as well.
> >>>
> >>> I've been updating the gmond's service side without issues, and the
> >>> immediate parent gmetads (that is, gmetad's that only collect from
> >>> gmond's) also without issue.
> >>>
> >>> However, as soon as I restart a gmetad that polls other gmetads (the
> >>> gmetad_storage, for example), I get no summary information at all. The
> >>> only change is I'm starting a different binary in the init script. It
> >>> runs/starts without error, and with debugging, I get:
> >>>
> >>> Going to run as user nobody
> >>> Sources are ...
> >>> Source: [NFS, step 15] has 1 sources
> >>> 127.0.0.1
> >>> Source: [Lustre, step 15] has 1 sources
> >>> 127.0.0.1
> >>> xml listening on port 8657
> >>> interactive xml listening on port 8658
> >>> cleanup thread has been started
> >>> Data thread 1168345408 is monitoring [NFS] data source
> >>> 127.0.0.1
> >>> Data thread 1178835264 is monitoring [Lustre] data source
> >>> 127.0.0.1
> >>>
> >>> Whereas, with the older 3.1.4 binary:
> >>> Going to run as user nobody
> >>> Sources are ...
> >>> Source: [NFS, step 15] has 1 sources
> >>> 127.0.0.1
> >>> Source: [Lustre, step 15] has 1 sources
> >>> 127.0.0.1
> >>> xml listening on port 8657
> >>> interactive xml listening on port 8658
> >>> Data thread 1170368832 is monitoring [NFS] data source
> >>> 127.0.0.1
> >>> Data thread 1180858688 is monitoring [Lustre] data source
> >>> 127.0.0.1
> >>> cleanup thread has been started
> >>> [NFS] is a 2.5 or later data stream
> >>> hash_create size = 50
> >>> hash->size is 53
> >>> Found a <GRID>, depth is now 1
> >>> Found a </GRID>, depth is now 0
> >>> Writing Summary data for source NFS, metric storage_local__nfs_cleanenergy1_size
> >>> Writing Summary data for source NFS, metric disk_free
> >>> Writing Summary data for source NFS, metric storage_local__nfs_nobackup2_percent_used
> >>> Writing Summary data for source NFS, metric storage_local__itc1_percent_used
> >>> Writing Summary data for source NFS, metric storage_local__mnt_emcback7_size
> >>> Writing Summary data for source NFS, metric storage_local__nfs_atlascode_size
> >>> Writing Summary data for source NFS, metric bytes_out
> >>> etc etc etc
> >>>
> >>>
> >>> I've been unable to find much on issues like this, no noted changes to
> >>> the way gmetad can read downstream gmetads, and no obvious config
> >>> options in 3.3.0.
> >>>
> >>> Am I missing something?
> >>> I'll happily provide gmetad configs if needed.
> >>>
> >>> --
> >>> Matthew Nicholson
> >>>
> >>>
> ------------------------------------------------------------------------------
> >>> Virtualization & Cloud Management Using Capacity Planning
> >>> Cloud computing makes use of virtualization - but cloud computing
> >>> also focuses on allowing computing to be delivered as a service.
> >>> http://www.accelacomm.com/jaw/sfnl/114/51521223/
> >>> _______________________________________________
> >>> Ganglia-general mailing list
> >>> Ganglia-general@lists.sourceforge.net
> >>> https://lists.sourceforge.net/lists/listinfo/ganglia-general
> >>>
> >>
> >>
> >>
> >> --
> >> Matthew Nicholson
> >>
> >
> >
> >
> >
>
>
> ------------------------------------------------------------------------------
> Try before you buy = See our experts in action!
> The most comprehensive online learning library for Microsoft developers
> is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
> Metro Style Apps, more. Free future releases when you subscribe now!
> http://p.sf.net/sfu/learndevnow-dev2
> _______________________________________________
> Ganglia-general mailing list
> Ganglia-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ganglia-general
>
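
The two debug traces quoted above differ in one visible way: the working
3.1.4 binary prints "Writing Summary data for source ..." lines, while the
newer one stops after the "Data thread ... is monitoring" lines. A small
sketch for checking a captured debug log for that pattern follows. It assumes
the log lines look exactly like the ones quoted in this thread, and the
function name is made up for illustration.

```python
import re
from collections import Counter

# Patterns taken from the debug output quoted in this thread.
MONITORING = re.compile(
    r"Data thread \d+ is monitoring \[(?P<source>[^\]]+)\] data source"
)
SUMMARY = re.compile(r"Writing Summary data for source (?P<source>[^,]+), metric")


def summaries_per_source(log_text: str) -> dict:
    """Map each monitored data source to the number of summary writes seen."""
    monitored = []
    summaries = Counter()
    for line in log_text.splitlines():
        match = MONITORING.search(line)
        if match:
            if match.group("source") not in monitored:
                monitored.append(match.group("source"))
            continue
        match = SUMMARY.search(line)
        if match:
            summaries[match.group("source").strip()] += 1
    # Sources that were monitored but never summarized show up with 0.
    return {source: summaries[source] for source in monitored}
```

A source that maps to 0 after the gmetad has run for a few polling intervals
matches the broken behaviour reported here. A non-zero count matches the
3.1.4 / 3.1.7 behaviour.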