
List:       opensolaris-nfs-discuss
Subject:    Re: [nfs-discuss] nfsd, huge thread count and CPU utilization
From:       Mahesh Siddheshwar <siddheshwar.mahesh () oracle ! com>
Date:       2010-06-03 3:10:40
Message-ID: 4C071D30.2030204 () oracle ! com


I'd start by understanding what kind of workload the clients
are running during that period, to get the big picture,
and then collect data with the usual stats tools (mpstat,
iostat, lockstat, etc.).
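
As a rough sketch of that first pass, something like the following could
capture mpstat and iostat over the same time window while the problem is
happening. The function name collect_stats and the variables OUTDIR,
INTERVAL, COUNT and DRY_RUN are made up for this example, and the
/var/tmp default is only an assumption about where there is free space.

```shell
#!/bin/sh
# Hypothetical helper: snapshot the stats tools named above (mpstat, iostat).
# collect_stats, OUTDIR, INTERVAL, COUNT and DRY_RUN are names invented
# for this sketch; they are not part of any Solaris tool.

collect_stats() {
    interval=${INTERVAL:-1}
    count=${COUNT:-10}
    outdir=${OUTDIR:-/var/tmp/nfsd-stats.$(date +%Y%m%d-%H%M%S)}

    if [ -n "${DRY_RUN:-}" ]; then
        # Print what would run, without touching the system.
        echo "mpstat $interval $count > $outdir/mpstat.out"
        echo "iostat -xn $interval $count > $outdir/iostat.out"
        return 0
    fi

    mkdir -p "$outdir" || return 1
    # Run both samplers concurrently so they cover the same time window.
    mpstat "$interval" "$count" > "$outdir/mpstat.out" 2>&1 &
    iostat -xn "$interval" "$count" > "$outdir/iostat.out" 2>&1 &
    wait
    echo "stats written to $outdir"
}
```

Source the file and call collect_stats; setting DRY_RUN=1 first only
prints the commands, which is handy for checking the paths beforehand.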

For a quick deep dive, collecting profile data is
particularly helpful to developers who know the code.
It shows what is keeping the CPUs busy.

For example,

dtrace -n 'profile-1234 { @[stack()] = count(); } tick-10sec { exit(0); } END { trunc(@, 25); }'

collects DTrace profile data for 10 seconds and keeps only the
top 25 stacks.
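
Since the box may have to be rebooted before anyone can read the
terminal, it may be worth sending the profile to a file with dtrace's
-o option. A minimal sketch, where run_profile, SECS, TOP, OUT and
DRY_RUN are names invented for this example and the output path is an
assumption:

```shell
#!/bin/sh
# Sketch only: run_profile, SECS, TOP, OUT and DRY_RUN are names invented
# here. The D program is the same one-liner as above, with the duration
# and the number of stacks kept made adjustable.

run_profile() {
    secs=${SECS:-10}
    top=${TOP:-25}
    out=${OUT:-/var/tmp/profile.out}
    prog="profile-1234 { @[stack()] = count(); } tick-${secs}sec { exit(0); } END { trunc(@, ${top}); }"

    if [ -n "${DRY_RUN:-}" ]; then
        # Show the command instead of running it.
        echo "dtrace -o $out -n '$prog'"
        return 0
    fi

    # -o writes dtrace's output, including the aggregation, to the file.
    dtrace -o "$out" -n "$prog"
}
```

Run it as root, as with the plain one-liner; DRY_RUN=1 just prints the
command that would be executed.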

If you suspect lock contention, then lockstat output such as

lockstat -s8 -n 1000000 sleep 10

will be useful.

And if you do end up having to reboot the system, use
'reboot -d' so that a system crash dump is collected for postmortem analysis.

Regards,
Mahesh

Giovanni Tirloni wrote:
> Hello,
> 
>  Today one of our NFS servers running OpenSolaris 2009.6 became very
> unstable. It's a dual Quad-Core Xeon with 32GB of RAM and a dual
> Gigabit Ethernet adapter.
> 
>  Network access was very intermittent (>10000ms RTT), SSH was flaky
> and the NFS clients were hanging randomly. After further inspecting
> the server locally it was evident that:
> 
> - nfsd had spawned lots of LWPs (>2400)
> - nfsd was consuming 100% of one CPU and the other seven were totally idle
> - system load avg was 1.2-1.3
> 
>  We didn't have much time to troubleshoot this further and, after nfsd
> refused to stop for several minutes, the server was forced to reboot.
> 
>  This is the third occurrence in a few months and we are guessing it's
> a combination of bad NFS clients (RHEL5) and not enough limits on our
> side.
> 
>  Since we are expecting this to happen again, what kind of information
> would be helpful to troubleshoot it and submit a bug if needed ?
> 
>  Has anyone else seen this problem ?
> 
> Thank you,
> 

_______________________________________________
nfs-discuss mailing list
nfs-discuss@opensolaris.org