
List:       gpfsug-discuss
Subject:    [gpfsug-discuss] Looking for a way to see which node is having an impact on server?
From:       viccornell () gmail ! com (Vic Cornell)
Date:       2013-12-10 10:13:20
Message-ID: 6726F05D-3332-4FF4-AB9D-F78B542E2249 () gmail ! com

Have you looked at mmpmon? It's a bit much for 600 nodes, but if you run it with a
reasonable interval specified then the output shouldn't be too hard to parse.

Quick recipe:

Create a file called mmpmon.conf that looks like this:


################# cut here #########################
nlist add node1 node2 node3 node4 node5
io_s
reset
################# cut here #########################

Where node1, node2, etc. are your node names. It might be as well to do this in
batches of 50 or so (see the batching sketch below).
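
If you go the batch route, here is a minimal sketch of one way to generate the
per-batch input files. It assumes a hypothetical nodes.txt with one client node
name per line; the file names and the batch size of 50 are only illustrative:

#!/bin/bash
# Split the node list into chunks of 50 and build one mmpmon input file per chunk.
split -l 50 nodes.txt batch_
for f in batch_*; do
    {
        echo "nlist add $(tr '\n' ' ' < "$f")"   # one nlist line naming this batch
        echo "io_s"                              # request aggregate I/O stats
        echo "reset"                             # zero the counters after each sample
    } > "mmpmon.$f.conf"
done

Each resulting file can then be fed to mmpmon with -i, one batch at a time (note
the caveat below about running more than one instance at once).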

Then run something like:

/usr/lpp/mmfs/bin/mmpmon -i mmpmon.conf -d 10000 -r 0 -p

That will give you a set of stats for all of your named nodes, aggregated over a
10-second period.
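
For reference, my reading of those flags (worth double-checking against the
mmpmon documentation for your GPFS release):

# -i mmpmon.conf : input file holding the nlist/io_s/reset requests above
# -d 10000       : delay between samples, in milliseconds (i.e. 10 seconds)
# -r 0           : repeat indefinitely (0 = no run-count limit)
# -p             : machine-parseable output, one underscore-keyed line per node
/usr/lpp/mmfs/bin/mmpmon -i mmpmon.conf -d 10000 -r 0 -p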

Don't run more than one of these at a time, as each one will reset the stats for the other :-)


Parse out the stats with something like:

awk -F_ '{if ($2=="io"){print $8,$16/1024/1024,$18/1024/1024}}'

which will give you read and write throughput.
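
To show what the awk is picking out, here is an invented io_s line in the general
shape that -p output takes (the address, node name and byte counters are made up),
piped through the same awk:

# A fabricated example of one io_s response line in -p mode:
echo '_io_s_ _n_ 10.0.0.21 _nn_ node1 _rc_ 0 _t_ 1386668000 _tu_ 123456 _br_ 524288000 _bw_ 104857600' |
awk -F_ '{if ($2=="io"){print $8,$16/1024/1024,$18/1024/1024}}'
# Splitting on "_", $8 is the node name, $16 the bytes read (_br_) and $18 the
# bytes written (_bw_) since the last reset, so this prints the node name
# followed by 500 and 100 (MB read / MB written). With the reset request and the
# 10-second interval above, that is roughly MB per 10 seconds for that node.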

The docs (GPFS Advanced Administration Guide) are reasonable.

Cheers,

Vic Cornell
viccornell at gmail.com


On 9 Dec 2013, at 19:52, Alex Chekholko <chekh at stanford.edu> wrote:

> Hi Richard,
> 
> I would just use something like 'iftop' to look at the traffic between the nodes.
> Or 'collectl'. Or 'dstat'.
> e.g. dstat -N eth0 --gpfs --gpfs-ops --top-cpu-adv --top-io 2 10
> http://dag.wiee.rs/home-made/dstat/
> 
> For the NSD balance question, since GPFS stripes the blocks evenly across all the
> NSDs, they will end up balanced over time. Or you can rebalance manually with
> 'mmrestripefs -b' or similar.
> It is unlikely that particular files ended up on a single NSD, unless the other
> NSDs are totally full.
> Regards,
> Alex
> 
> On 12/06/2013 04:31 PM, Richard Lefebvre wrote:
> > Hi,
> > 
> > I'm looking for a way to see which node (or nodes) is having an impact
> > on the GPFS server nodes and slowing down the whole file system. What
> > usually happens is that a user is doing I/O that doesn't fit the
> > configuration of the GPFS file system or the way it was explained that
> > it should be used efficiently: usually a lot of unbuffered, byte-sized,
> > very random I/O on a file system that was made for large files and a
> > large block size.
> > 
> > My problem is finding out who is doing that. With over 600 client
> > nodes, I haven't found a way to pinpoint the node or nodes that could
> > be the source of the problem.
> > 
> > I tried to use "mmlsnodes -N waiters -L", but there are so many
> > waiters that I cannot pinpoint anything.
> > 
> > I must be missing something simple. Anyone got any help?
> > 
> > Note: there is another thing I'm trying to pinpoint. A temporary
> > imbalance was created by adding a new NSD. It seems that a group of
> > files was created on that same NSD, and a user keeps hitting that NSD,
> > causing a high load. I'm trying to pinpoint the origin of that too, at
> > least until everything is balanced again. But will rebalancing spread
> > those files out, since they are already on the emptiest NSD?
> > 
> > Richard
> 
> -- 
> Alex Chekholko chekh at stanford.edu

