List:       zfs-discuss
Subject:    Re: [zfs-discuss] Solaris Based Systems "Lock Up" - Possibly
From:       Lachlan Mulcahy <lmulcahy () marinsoftware ! com>
Date:       2011-10-31 21:10:52
Message-ID: CANv=DFvSmha13xF+GqQYLRM5Shj6CrFM6AJ17XH=asxzuNjRzg () mail ! gmail ! com

Hi Marion,

Thanks for your swift reply!

> Have you got the latest firmware on your LSI 1068E HBA's?  These have been
> known to have lockups/timeouts when used with SAS expanders (disk enclosures)
> with incompatible firmware revisions, and/or with older mpt drivers.
>

I'll need to check that out -- I'm 90% sure these are fresh out-of-the-box
HBAs.

Will try a firmware upgrade and see if we get any joy there...
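
For the record, this is roughly how I plan to check the current firmware
before flashing anything. fwflash is the stock Solaris firmware utility and
lsiutil is LSI's own tool; exact availability and output on this box is
still to be confirmed:

# List devices whose firmware fwflash knows how to report/update; the mpt
# HBAs should show up with their current firmware revision.
fwflash -l

# Cross-check with LSI's lsiutil if it's installed (menu-driven; it lists
# the adapters and can show firmware/BIOS versions per port).
lsiutil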


> The MD1220 is a 6Gbit/sec device.  You may be better off with a matching
> HBA  -- Dell has certainly told us the MD1200-series is not intended for
> use with the 3Gbit/sec HBA's.  We're doing fine with the LSI SAS 9200-8e,
> for example, when connecting to Dell MD1200's with the 2TB "nearline SAS"
> disk drives.
>

I was aware the MD1220 is a 6G device, but I figured that since our IO
throughput doesn't come close to saturating 3Gbit/sec, it would just operate
at the lower speed and be OK. I guess it is something to look at if I run
out of other options...


> Last, are you sure it's memory-related?  You might keep an eye on
> "arcstat.pl" output and see what the ARC sizes look like just prior to
> lockup.  Also, maybe you can look up instructions on how to force a crash
> dump when the system hangs -- one of the experts around here could tell a
> lot from a crash dump file.
>

I'm starting to doubt that it is a memory issue -- especially since I now
have some results from my latest "test"...

The output of arcstat.pl looked like this just prior to the lock up:

19:57:36    24G   24G   94   161    61   194       1       1
19:57:41    24G   24G   96   174    62   213       0       0
    time  arcsz     c  mh%  mhit  hit%  hits  l2hit%  l2hits
19:57:46    23G   24G   94   161    62   192       1       1
19:57:51    24G   24G   96   169    63   205       0       0
19:57:56    24G   24G   95   169    61   206       0       0

^-- This is the very last line printed...

I actually discovered and rebooted the machine via DRAC at around 20:44, so
it had been in its bad state for around 1 hour.

Some snippets from the output some 20 minutes earlier show the point at
which the arcsz grew to reach the maximum:

    time  arcsz     c  mh%  mhit  hit%  hits  l2hit%  l2hits
19:36:45    21G   24G   95   152    58   177       0       0
19:37:00    22G   24G   95   156    57   182       0       0
19:37:15    22G   24G   95   159    59   185       0       0
19:37:30    23G   24G   94   153    58   178       0       0
19:37:45    23G   24G   95   169    59   195       0       0
19:38:00    24G   24G   95   160    59   187       0       0
19:38:25    24G   24G   96   151    58   177       0       0

So it seems that arcsz reaching the 24G maximum wasn't necessarily to
blame, since the system operated for a good 20 minutes in this state.

I was also logging "vmstat 5" prior to the crash (though I forgot to
include some timestamps in my output) and these are the final lines
recorded in that log:

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 25885248 18012208 71 2090 0 0 0 0 0  0  0  0 22 17008 210267 30229 1 5 94
 0 0 0 25884764 18001848 71 2044 0 0 0 0 0  0  0  0 25 14846 151228 25911 1 5 94
 0 0 0 25884208 17991876 71 2053 0 0 0 0 0  0  0  0  8 16343 185416 28946 1 5 93

So it seems there was some 17-18G free in the system when the lock up
occurred. Curious...
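
Next time I'll timestamp the vmstat log so it lines up with the arcstat
output -- just a simple shell loop, something like:

# Prefix every vmstat line with a UTC timestamp and keep it on disk.
vmstat 5 | while read line; do
    echo "$(date -u '+%Y-%m-%d %H:%M:%S') $line"
done > /var/tmp/vmstat-$(hostname).log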

I was also capturing some ARC info from mdb -k, and the output prior to the
lock up was...

Monday, October 31, 2011 07:57:51 PM UTC
arc_no_grow               =         0
arc_tempreserve           =         0 MB
arc_meta_used             =      4621 MB
arc_meta_limit            =     20480 MB
arc_meta_max              =      4732 MB
Monday, October 31, 2011 07:57:56 PM UTC
arc_no_grow               =         0
arc_tempreserve           =         0 MB
arc_meta_used             =      4622 MB
arc_meta_limit            =     20480 MB
arc_meta_max              =      4732 MB

Looks like metadata was not primarily responsible for consuming all of that
24G of ARC shown in the arcstat.pl output...
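
For reference, the arc_* values above were being collected with a loop
around mdb's ::arc dcmd, roughly like this (the grep pattern is an
approximation of what I'm actually running):

# Dump selected ARC values from the kernel every 5 seconds.
while true; do
    date -u
    echo "::arc" | mdb -k | egrep "no_grow|tempreserve|meta_used|meta_limit|meta_max"
    sleep 5
done >> /var/tmp/arc-mdb.log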

Also, it seems there is nothing interesting in /var/adm/messages leading up
to my rebooting:

Oct 31 18:42:57 mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency error 512 PPM exceeds tolerance 500 PPM
Oct 31 18:44:01 mslvstdp02r last message repeated 1 time
Oct 31 18:45:05 mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency error 512 PPM exceeds tolerance 500 PPM
Oct 31 18:46:09 mslvstdp02r last message repeated 1 time
Oct 31 18:47:23 mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency error 505 PPM exceeds tolerance 500 PPM
Oct 31 19:06:13 mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency error 505 PPM exceeds tolerance 500 PPM
Oct 31 19:09:27 mslvstdp02r last message repeated 4 times
Oct 31 19:25:04 mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency error 505 PPM exceeds tolerance 500 PPM
Oct 31 19:28:17 mslvstdp02r last message repeated 3 times
Oct 31 19:46:17 mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency error 505 PPM exceeds tolerance 500 PPM
Oct 31 19:49:32 mslvstdp02r last message repeated 4 times
Oct 31 20:44:33 mslvstdp02r genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version snv_151a 64-bit
Oct 31 20:44:33 mslvstdp02r genunix: [ID 877030 kern.notice] Copyright (c) 1983, 2010, Oracle and/or its affiliates. All rights reserved.

Just some ntpd stuff really...

I'm going to check out a firmware upgrade next. I can't believe that this is
really an out-of-memory situation when there is 17-18G free reported by
vmstat...
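
I'll also look up how to force a crash dump before it happens again, per
the suggestion above. My rough (and so far untested) plan -- the /etc/system
tunable is from my notes and <drac-ip> is a placeholder, so both still need
verifying on this box:

# 1. Make sure a dump device and savecore are configured.
dumpadm

# 2. Have the kernel panic (and therefore dump) on an NMI; add to
#    /etc/system and reboot ahead of time:
#      set pcplusmp:apic_panic_on_nmi = 1

# 3. When the box wedges, send an NMI from the DRAC via IPMI:
ipmitool -I lanplus -H <drac-ip> -U root chassis power diag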

Let's see...

Regards,
-- 
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080

[Attachment #5 (text/html)]

Hi Marion,<br><br>Thanks for your swifty reply!<br><br><div \
class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 \
.8ex;border-left:1px #ccc solid;padding-left:1ex;"> Have you got the latest firmware \
on your LSI 1068E HBA&#39;s?  These have been<br> known to have lockups/timeouts when \
used with SAS expanders (disk enclosures)<br> with incompatible firmware revisions, \
and/or with older mpt drivers.<br></blockquote><div><br>I&#39;ll need to check that \
out -- I&#39;m 90% sure that these are fresh out of box HBAs.<br><br>Will try an \
upgrade there and see if we get any joy there...<br> </div><div> </div><blockquote \
class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, \
204, 204); padding-left: 1ex;"> The MD1220 is a 6Gbit/sec device.  You may be better \
off with a matching<br> HBA  -- Dell has certainly told us the MD1200-series is not \
intended for<br> use with the 3Gbit/sec HBA&#39;s.  We&#39;re doing fine with the LSI \
SAS 9200-8e,<br> for example, when connecting to Dell MD1200&#39;s with the 2TB \
&quot;nearline SAS&quot;<br> disk drives.<br></blockquote><div><br>I was aware the \
MD1220 is a 6G device, but I figured that since our IO throughput doesn&#39;t \
actually come close to saturating 3Gbit/sec that it would just operate at the lower \
speed and be OK. I guess it is something to look at if I run out of other \
options...<br> <br><br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt \
0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"> Last, are \
you sure it&#39;s memory-related?  You might keep an eye on &quot;<a \
href="http://arcstat.pl" target="_blank">arcstat.pl</a>&quot;<br> output and see what \
the ARC sizes look like just prior to lockup.  Also,<br> maybe you can look up \
instructions on how to force a crash dump when the<br> system hangs -- one of the \
experts around here could tell a lot from a<br> crash dump \
file.<br></blockquote><div><br>I&#39;m starting to doubt that it is a memory issue \
now -- especially since I now have some results from my latest \
&quot;test&quot;...<br><br>output of <a href="http://arcstat.pl">arcstat.pl</a> \
looked like this just prior to the lock up:<br> <br>19:57:36    24G   24G   94   161  \
61   194       1       1  <br>19:57:41    24G   24G   96   174    62   213       0    \
0  <br>    time  arcsz     c  mh%  mhit  hit%  hits  l2hit%  l2hits  <br>19:57:46    \
23G   24G   94   161    62   192       1       1  <br> 19:57:51    24G   24G   96   \
169    63   205       0       0  <br>19:57:56    24G   24G   95   169    61   206     \
0       0  <br><br>^-- This is the very last line printed... <br><br>I actually \
discovered and rebooted the machine via DRAC at around 20:44, so it had been in \
it&#39;s bad state for around 1 hour. <br> <br>Some snippets from the output some 20 \
minutes earlier shows the point at while the arcsz grew to reach the maximum:<br><br> \
time  arcsz     c  mh%  mhit  hit%  hits  l2hit%  l2hits  <br>19:36:45    21G   24G   \
95   152    58   177       0       0  <br> 19:37:00    22G   24G   95   156    57   \
182       0       0  <br>19:37:15    22G   24G   95   159    59   185       0       0 \
<br>19:37:30    23G   24G   94   153    58   178       0       0  <br>19:37:45    23G \
24G   95   169    59   195       0       0  <br> 19:38:00    24G   24G   95   160    \
59   187       0       0  <br>19:38:25    24G   24G   96   151    58   177       0    \
0  <br><br>So it seems that arcsz reaching the 24G maximum wasn&#39;t necessarily to \
blame, since the system operated for a good 20mins in this state.<br> <br>I was also \
logging &quot;vmstat 5&quot; prior to the crash (though I forgot to include some \
timestamps in my output) and these are the final lines recorded in that log:<br><br> \
kthr      memory            page            disk          faults      cpu<br>  r b w  \
swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id<br> 0 0 0 \
25885248 18012208 71 2090 0 0 0 0 0  0  0  0 22 17008 210267 30229 1 5 94<br> 0 0 0 \
25884764 18001848 71 2044 0 0 0 0 0  0  0  0 25 14846 151228 25911 1 5 94<br>  0 0 0 \
25884208 17991876 71 2053 0 0 0 0 0  0  0  0  8 16343 185416 28946 1 5 93<br><br>So \
it seems there was some 17-18G free in the system when the lock up occurred. \
Curious...<br><br>I was also capturing some arc info from mdb -k  and the output \
prior to the lock up was...<br> <br>Monday, October 31, 2011 07:57:51 PM \
UTC<br>arc_no_grow               =         0<br>arc_tempreserve           =         0 \
MB<br>arc_meta_used             =      4621 MB<br>arc_meta_limit            =     \
20480 MB<br> arc_meta_max              =      4732 MB<br>Monday, October 31, 2011 \
07:57:56 PM UTC<br>arc_no_grow               =         0<br>arc_tempreserve           \
=         0 MB<br>arc_meta_used             =      4622 MB<br>arc_meta_limit          \
=     20480 MB<br> arc_meta_max              =      4732 MB<br><br>Looks like \
metadata was not primarily responsible for consuming all of that 24G of ARC in <a \
href="http://arcstat.pl">arcstat.pl</a> output...<br><br>Also seems nothing \
interesting in /var/adm/messages leading up to my rebooting :<br> <br>Oct 31 18:42:57 \
mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency error 512 PPM exceeds \
tolerance 500 PPM<br>Oct 31 18:44:01 mslvstdp02r last message repeated 1 time<br>Oct \
31 18:45:05 mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency error 512 PPM \
                exceeds tolerance 500 PPM<br>
Oct 31 18:46:09 mslvstdp02r last message repeated 1 time<br>Oct 31 18:47:23 \
mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency error 505 PPM exceeds \
tolerance 500 PPM<br>Oct 31 19:06:13 mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] \
                frequency error 505 PPM exceeds tolerance 500 PPM<br>
Oct 31 19:09:27 mslvstdp02r last message repeated 4 times<br>Oct 31 19:25:04 \
mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency error 505 PPM exceeds \
                tolerance 500 PPM<br>Oct 31 19:28:17 mslvstdp02r last message \
                repeated 3 times<br>
Oct 31 19:46:17 mslvstdp02r ntpd[368]: [ID 702911 daemon.notice] frequency error 505 \
PPM exceeds tolerance 500 PPM<br>Oct 31 19:49:32 mslvstdp02r last message repeated 4 \
times<br>Oct 31 20:44:33 mslvstdp02r genunix: [ID 540533 kern.notice] ^MSunOS Release \
                5.11 Version snv_151a 64-bit<br>
Oct 31 20:44:33 mslvstdp02r genunix: [ID 877030 kern.notice] Copyright (c) 1983, \
2010, Oracle and/or its affiliates. All rights reserved.<br><br>Just some ntpd stuff \
really...<br><br>I&#39;m going to check out a firmware upgrade next.. I can&#39;t \
believe that this is really an out of memory situation now when there is 17-18G free \
reported by vmstat....<br clear="all"> <br></div></div>Lets \
see...<br><br>Regards,<br>-- <br><div><div style="font-family:Helvetica">Lachlan \
Mulcahy</div><div style="font-family:Helvetica">Senior DBA, </div><div \
style="font-family:Helvetica">Marin Software Inc.</div> <div \
style="font-family:Helvetica">San Francisco, USA</div><div \
style="font-family:Helvetica"><br>AU Mobile: +61 458 448 721<br></div><div \
style="font-family:Helvetica">US Mobile: +1 (415) 867 2839</div><div \
style="font-family:Helvetica"> Office : +1 (415) 671 6080</div></div><br>



_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

