
List:       ceph-users
Subject:    [ceph-users] Re: Monitoring ceph cluster
From:       Anthony D'Atri <anthony.datri () gmail ! com>
Date:       2022-01-27 1:05:23
Message-ID: 4F61EC12-B290-4E13-9F97-4E12A57BF1E9 () gmail ! com


What David said!

A couple of additional thoughts:

o Nagios (and derivatives like Icinga and check_mk) have been popular for years.
Note that these are monitoring solutions rather than metrics solutions; it's good
to have both. One issue I've seen multiple times with Nagios-family monitoring is
that as the checks and the fleet grow over time, the server tends to bog down, and
a full pass through the active checks starts taking longer than the check
interval. Prometheus Alertmanager is more scalable, and with some thought most
active checks can be recast in terms of metrics.

o Prometheus (with a forked node_exporter) was INVALUABLE to me when characterizing
and engaging on two separate SSD firmware design flaws. It includes a query
interface for ad-hoc queries and expression development.

o Grafana pairs well with Prometheus for dashboard-style visualization and trending
across many clusters and nodes.
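To put rough numbers on the Nagios scaling point above, here is a back-of-the-envelope sketch. It assumes checks run in fixed-size waves of equal duration, which real schedulers don't strictly do; the function names and the fleet figures (5 checks per host, 2 s per check, 50 checks in flight, 60 s interval) are hypothetical.

```python
import math


def active_check_cycle_seconds(num_checks: int, avg_check_seconds: float,
                               concurrent_checks: int) -> float:
    """Wall-clock time for one pass over every active check.

    Simplifying assumption: checks run in waves of `concurrent_checks`,
    and each check takes about `avg_check_seconds`.
    """
    waves = math.ceil(num_checks / concurrent_checks)
    return waves * avg_check_seconds


def falls_behind(num_checks: int, avg_check_seconds: float,
                 concurrent_checks: int, interval_seconds: float) -> bool:
    """True when one pass takes longer than the check interval."""
    return active_check_cycle_seconds(
        num_checks, avg_check_seconds, concurrent_checks) > interval_seconds


if __name__ == "__main__":
    # Hypothetical fleet: 5 checks per host, 2 s per check,
    # 50 checks in flight at once, 60 s check interval.
    for hosts in (100, 300, 600):
        checks = hosts * 5
        cycle = active_check_cycle_seconds(checks, 2.0, 50)
        print(hosts, checks, cycle, falls_behind(checks, 2.0, 50, 60.0))
    # 100 500 20.0 False
    # 300 1500 60.0 False
    # 600 3000 120.0 True
```

With those made-up figures, 300 hosts exactly fill the 60 s interval and 600 hosts need 120 s per pass, so checks fall further behind every cycle. A metrics-based setup sidesteps much of this because alert rules are evaluated against series that are already being scraped, rather than each check running its own probe.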
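As a sketch of the ad-hoc querying mentioned above: Prometheus also exposes its query engine over an HTTP API (`GET /api/v1/query` for instant queries). The snippet builds such a URL and pulls per-device values out of an instant-vector response. The host name, the 0.9 threshold, the helper names, and the sample response are hypothetical; `node_disk_io_time_seconds_total` is the stock node_exporter disk busy-time counter, not the forked exporter mentioned above.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a GET URL for the Prometheus instant-query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urlencode({"query": promql})


def vector_values(response: dict, label: str) -> dict:
    """Map one label's value to the sample value in an instant-vector result."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    # Each entry is {"metric": {labels...}, "value": [unix_time, "value as string"]}.
    return {r["metric"].get(label, ""): float(r["value"][1])
            for r in data["result"]}


if __name__ == "__main__":
    # Fraction of time each block device was busy over the last 5 minutes.
    print(instant_query_url("http://prometheus.example:9090",
                            "rate(node_disk_io_time_seconds_total[5m])"))

    # Made-up response in the documented shape, for illustration only.
    sample = {"status": "success", "data": {"resultType": "vector", "result": [
        {"metric": {"device": "sda"}, "value": [1643245523.0, "0.12"]},
        {"metric": {"device": "sdb"}, "value": [1643245523.0, "0.97"]},
    ]}}
    busy = vector_values(sample, "device")
    print({dev: frac for dev, frac in busy.items() if frac > 0.9})  # {'sdb': 0.97}
```

Typing the same PromQL expression into the Prometheus web UI's expression browser is the interactive equivalent; the API form is handy once an ad-hoc expression is worth scripting.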


> On Jan 26, 2022, at 1:22 PM, David Orman <ormandj@corenode.com> wrote:
> 
> What version of Ceph are you using? Newer versions deploy a dashboard and
> prometheus module, which has some of this built in. It's a great start to
> seeing what can be done using Prometheus and the built-in exporter. Once
> you learn this, if you decide you want something more robust, you can do an
> external deployment of Prometheus (clusters), Alertmanager, Grafana, and
> all the other tooling that might interest you for a more scalable solution
> when dealing with more clusters. It's the perfect way to get your feet wet
> and it showcases a lot of the interesting things you can do with this
> solution!
> 
> https://docs.ceph.com/en/latest/mgr/dashboard/
> https://docs.ceph.com/en/latest/mgr/prometheus/
> 
> David
> 
> On Wed, Jan 26, 2022 at 1:42 AM Michel Niyoyita <micou12@gmail.com> wrote:
> 
> > Thank you for your email, Szabo; these could be helpful. Can you provide
> > links so I can start working on it?
> > 
> > Michel.
> > 
> > On Tue, 25 Jan 2022, 18:51 Szabo, Istvan (Agoda), <Istvan.Szabo@agoda.com>
> > wrote:
> > 
> > > Which monitoring tool? Something like Prometheus, or a Nagios-style tool?
> > > We use Sensu for keepalives and Ceph health reporting, plus Prometheus
> > > with Grafana for metrics collection.
> > > 
> > > Istvan Szabo
> > > Senior Infrastructure Engineer
> > > ---------------------------------------------------
> > > Agoda Services Co., Ltd.
> > > e: istvan.szabo@agoda.com
> > > ---------------------------------------------------
> > > 
> > > On 2022. Jan 25., at 22:38, Michel Niyoyita <micou12@gmail.com> wrote:
> > > 
> > > Hello team,
> > > 
> > > I would like to monitor my Ceph cluster using one of the monitoring
> > > tools. Can someone help with that?
> > > 
> > > Michel
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-leave@ceph.io
> > > 
> > > 
> > 
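Following David's suggestion, here is a minimal sketch of recasting the usual `ceph health` active check as a metric lookup. It parses Prometheus text-format output like the mgr prometheus module serves (by default on port 9283 at `/metrics`) and reads the `ceph_health_status` gauge. The 0/1/2 to HEALTH_OK/HEALTH_WARN/HEALTH_ERR mapping is an assumption to verify against the mgr/prometheus docs for your release, the helper names are hypothetical, and the parser is deliberately simplified; in practice you would let Prometheus scrape the endpoint and alert on the series.

```python
# Assumed mapping for the mgr module's ceph_health_status gauge; verify for your release.
HEALTH_NAMES = {0: "HEALTH_OK", 1: "HEALTH_WARN", 2: "HEALTH_ERR"}


def parse_exposition(text: str) -> dict:
    """Parse simple Prometheus text-format lines into {series: value}.

    Simplified sketch: skips comments, ignores optional timestamps, and does
    not validate metric names or label syntax.
    """
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or HELP/TYPE metadata
        if "{" in line:
            end = line.rindex("}") + 1  # keep the label set as part of the key
            series, rest = line[:end], line[end:].split()
        else:
            series, *rest = line.split()
        samples[series] = float(rest[0])  # rest[1], if present, is a timestamp
    return samples


def health_from_metrics(text: str) -> str:
    """Turn the ceph_health_status gauge back into a health string."""
    value = parse_exposition(text).get("ceph_health_status")
    if value is None:
        return "UNKNOWN"
    return HEALTH_NAMES.get(int(value), "UNKNOWN")


if __name__ == "__main__":
    # Hypothetical excerpt of what http://<mgr-host>:9283/metrics might return.
    sample = """
# HELP ceph_health_status Cluster health status
# TYPE ceph_health_status untyped
ceph_health_status 1.0
ceph_osd_up{ceph_daemon="osd.0"} 1.0
"""
    print(health_from_metrics(sample))  # HEALTH_WARN
```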



