
List:       freebsd-bugs
Subject:    misc/117688: mpt disk timeout and hang
From:       Matt Lehner <matt () aim2game ! com>
Date:       2007-10-30 20:25:37
Message-ID: 200710302025.l9UKPbgB030431 () www ! freebsd ! org


> Number:         117688
> Category:       misc
> Synopsis:       mpt disk timeout and hang
> Confidential:   no
> Severity:       serious
> Priority:       medium
> Responsible:    freebsd-bugs
> State:          open
> Quarter:        
> Keywords:       
> Date-Required:
> Class:          sw-bug
> Submitter-Id:   current-users
> Arrival-Date:   Tue Oct 30 20:30:00 UTC 2007
> Closed-Date:
> Last-Modified:
> Originator:     Matt Lehner
> Release:        7.0-BETA1
> Organization:
> Environment:
FreeBSD vault.buffalo.rr.com 7.0-BETA1 FreeBSD 7.0-BETA1 #0: Mon Oct 22 07:41:02 UTC \
2007     root@vault.buffalo.rr.com:/usr/obj/usr/src/sys/VAULT  amd64
> Description:
I installed FreeBSD 7 to take advantage of its ZFS support. While testing ZFS, I came \
across an issue with the mpt(4) driver. After an extended period of moderate to heavy \
load on the disks, I get the following errors in dmesg. Moderate to heavy disk load \
here means ~50-70MB/s with bursts to 86MB/s and 600 ops/s per disk, according to \
gstat.

Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6f150:13878 timed out for \
                ccb 0xffffff0001a15000 (req->ccb 0xffffff0001a15000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e75450:13879 timed out for \
                ccb 0xffffff0001a10000 (req->ccb 0xffffff0001a10000)
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 \
                function 0
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6ea00:13880 timed out for \
                ccb 0xffffff0001998400 (req->ccb 0xffffff0001998400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e70740:13881 timed out for \
                ccb 0xffffff0001395400 (req->ccb 0xffffff0001395400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e69ab0:13886 timed out for \
                ccb 0xffffff000157dc00 (req->ccb 0xffffff000157dc00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e762f0:13887 timed out for \
                ccb 0xffffff0001982400 (req->ccb 0xffffff0001982400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6b520:13888 timed out for \
                ccb 0xffffff000198ec00 (req->ccb 0xffffff000198ec00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e7a820:13889 timed out for \
                ccb 0xffffff00019bf000 (req->ccb 0xffffff00019bf000)
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 \
                function 0
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6dda0:13890 timed out for \
                ccb 0xffffff0001983400 (req->ccb 0xffffff0001983400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6df50:13891 timed out for \
                ccb 0xffffff00019be000 (req->ccb 0xffffff00019be000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6b9a0:13892 timed out for \
                ccb 0xffffff00018c4400 (req->ccb 0xffffff00018c4400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e72a20:13893 timed out for \
                ccb 0xffffff0001a10800 (req->ccb 0xffffff0001a10800)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e696c0:13894 timed out for \
                ccb 0xffffff000197ec00 (req->ccb 0xffffff000197ec00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e74d90:13895 timed out for \
                ccb 0xffffff00018c4000 (req->ccb 0xffffff00018c4000)
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 \
                function 0
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e78e40:13904 timed out for \
                ccb 0xffffff0001a0f000 (req->ccb 0xffffff0001a0f000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6f8a0:13905 timed out for \
                ccb 0xffffff0001a0ac00 (req->ccb 0xffffff0001a0ac00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e75f00:13906 timed out for \
                ccb 0xffffff000194e000 (req->ccb 0xffffff000194e000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e772b0:13907 timed out for \
                ccb 0xffffff0001984000 (req->ccb 0xffffff0001984000)
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 \
                function 0
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 \
                function 0
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 \
                function 0
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed

The last two lines would continue to repeat (indefinitely, I assume) until I \
power-cycled the machine. When the server came back online, ZFS functioned fine and \
reported no checksum errors or other problems. A scrub also found no problems. But if \
I put enough load on the disks for an extended period, it would fail again with the \
same errors. No particular length of time or exact combination of factors seems to \
trigger the errors; sometimes they occur much sooner than other times. While the \
errors were scrolling up the screen, one disk or the other, or both, would have its \
activity light on steadily.
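
To make the repeating abort cycle easier to see in a long log, the messages can be tallied per request. The following is only a minimal sketch in Python, assuming the three message shapes shown above and the trailing-backslash line wrapping of this archived copy; the function names (join_wrapped, summarize, stuck_requests) are hypothetical helpers, not part of mpt(4) or any FreeBSD tool.

```python
import re
from collections import Counter

# Patterns for the three mpt(4) message shapes that appear in the log above.
TIMEOUT_RE = re.compile(r"mpt\d+: request (0x[0-9a-f]+):(\d+) timed out")
ATTEMPT_RE = re.compile(r"mpt\d+: attempting to abort req (0x[0-9a-f]+):(\d+)")
DONE_RE = re.compile(r"mpt\d+: abort of req (0x[0-9a-f]+):(\d+) completed")


def join_wrapped(text):
    """Rejoin lines that were wrapped with a trailing backslash."""
    return re.sub(r"[ \t]*\\[ \t]*\n[ \t]*", " ", text)


def summarize(text):
    """Count timeouts, abort attempts and abort completions per request serial."""
    timeouts, attempts, done = Counter(), Counter(), Counter()
    patterns = ((TIMEOUT_RE, timeouts), (ATTEMPT_RE, attempts), (DONE_RE, done))
    for line in join_wrapped(text).splitlines():
        for regex, counter in patterns:
            match = regex.search(line)
            if match:
                counter[int(match.group(2))] += 1
    return timeouts, attempts, done


def stuck_requests(text, threshold=2):
    """Return serials whose abort was attempted at least `threshold` times."""
    _, attempts, _ = summarize(text)
    return sorted(serial for serial, n in attempts.items() if n >= threshold)
```

Run against the excerpt above, stuck_requests() should single out serial 13878, the only request whose abort is attempted more than once.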

Currently the machine boots over the network (using pxeboot) from another machine. \
The disks in the ZFS array are the only physical disks it has, so while this is \
happening the system itself does not lock up.

Motherboard: Tyan Tiger i7501 S2723
CPU: Dual Opteron 244
Controller: LSI SAS3041X-R
Hard drives: 2x 1TB Hitachi Deskstar

vault# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
tank           824G  89.6G    18K  /tank
tank/storage   824G  89.6G   824G  /storage
vault#

vault# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0

errors: No known data errors
vault#

mpt0: <LSILogic SAS/SATA Adapter> port 0x8800-0x88ff mem \
                0xfc2fc000-0xfc2fffff,0xfc2e0000-0xfc2effff irq 28 at device 3.0 on \
                pci1
mpt0: [ITHREAD]
mpt0: MPI Version=1.5.10.0
> How-To-Repeat:
Run a lot of I/O over an mpt(4) device for an extended period.
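
The report does not say which workload produced the load, so the following Python sketch is only one assumed way to generate sustained sequential writes. The sustained_write() helper, the file name loadtest.bin and the 8 GiB size are all hypothetical; only the /storage mountpoint comes from the zfs list output above.

```python
import os
import time


def sustained_write(path, total_bytes, block_size=1 << 20):
    """Write total_bytes to path in block_size chunks; return the rate in MB/s."""
    block = os.urandom(block_size)  # random data, reused for every chunk
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reaches the disks
    elapsed = max(time.monotonic() - start, 1e-9)
    return written / elapsed / 1e6


if __name__ == "__main__":
    # Hypothetical target file and size; adjust both to the pool under test.
    rate = sustained_write("/storage/loadtest.bin", 8 * 1024**3)
    print("wrote 8 GiB at %.1f MB/s" % rate)
```

Repeating the call in a loop while watching gstat would show whether the per-disk rate reaches the 50-70MB/s range described above.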
> Fix:


> Release-Note:
> Audit-Trail:
> Unformatted:
_______________________________________________
freebsd-bugs@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscribe@freebsd.org"

