List:       ceph-devel
Subject:    Re: Where do MDSHealthMetrics show up?
From:       John Spray <john.spray () redhat ! com>
Date:       2014-11-24 10:05:31
Message-ID: CAGd4Wr0StfpqEJMPKm_Ym3+=JZ2mp=thBU=A=_gRzpR-rL=78w () mail ! gmail ! com

On Fri, Nov 21, 2014 at 1:54 AM, Michael Sevilla <mikesevilla3@gmail.com> wrote:
> Hi. Where do the MDSHealthMetrics in MMDSBeacon (e.g.,
> MDS_HEALTH_TRIM) show up in the monitors? When we run ceph -s? I
> suspect I don't see them because I'd have to run ceph -s at the exact
> moment when the MDS is trimming. Is there an easier way to see these
> warnings, or is there some debug flag I need to turn on?

In the specific case of MDS_HEALTH_TRIM, this is aimed at detecting
systems that are trimming at a pathologically bad rate (or perhaps
stuck entirely due to a bug), so in such an unhealthy system we
would expect the state to stick around for a while -- it shouldn't
just be a "blink and you miss it" status.  However, you would have to
look at the status sometime in the unhealthy period: there's currently
nothing in the cluster log for that health check.

For the new MDS health warnings, we have some overlapping coverage
between health indications (i.e. things that show up in "ceph -s") and
cluster log messages (i.e. things that show up in "ceph -w").  There
is a general problem here for the health stuff (not just for the MDS
things) that it is only generated on-demand when someone looks at it
-- e.g. things like clock skew also only show up if you happen to run
ceph -s at the right moment.  Internally this corresponds to the
various get_health() functions in the mon subsystems.

It would be good to have a generic way for health indicators (MDS and
beyond) to emit clog messages when they appear and disappear, so that
you don't have to look at the status at the right moment.  That would
be a little hard to implement at the moment because the health
messages are just freeform strings, but I put some notes on cleaning
up health reporting here a while back:
http://tracker.ceph.com/issues/7192
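
As a rough illustration of the appear/disappear idea, here is a minimal
Python sketch. It is not Ceph code: the class name, the log line format
and the sample detail strings are all made up for the example. It
assumes each health check carries a stable code (MDS_HEALTH_TRIM is the
one from this thread) in addition to its freeform text, and it compares
the set of active codes between two evaluations, returning one log line
per check that was raised or cleared.

```python
from typing import Dict, List


class HealthTransitionLogger:
    """Hypothetical helper: remembers which health checks were active at
    the last evaluation and reports the ones that appeared or cleared."""

    def __init__(self) -> None:
        # Maps a stable check code to its latest freeform detail string.
        self._active: Dict[str, str] = {}

    def update(self, current: Dict[str, str]) -> List[str]:
        """Compare `current` (code -> detail) with the previous call and
        return one log line per check that was raised or cleared."""
        lines = []
        # Codes present now but not at the previous evaluation.
        for code in sorted(current.keys() - self._active.keys()):
            lines.append(f"WRN health check {code} raised: {current[code]}")
        # Codes present at the previous evaluation but gone now.
        for code in sorted(self._active.keys() - current.keys()):
            lines.append(f"INF health check {code} cleared")
        self._active = dict(current)
        return lines


# The detail strings below are invented sample data.
tracker = HealthTransitionLogger()
first = tracker.update({"MDS_HEALTH_TRIM": "behind on trimming (100/30)"})
second = tracker.update({"MDS_HEALTH_TRIM": "behind on trimming (120/30)"})
third = tracker.update({})

for line in first + second + third:
    print(line)
```

Keying on the code rather than the message text is the point of the
sketch: the second update changes only the numbers in the detail string
and produces no log line, whereas a comparison of the freeform strings
themselves would have treated it as one check clearing and another
being raised.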

Cheers,
John