List: freebsd-bugs
Subject: misc/117688: mpt disk timeout and hang
From: Matt Lehner <matt () aim2game ! com>
Date: 2007-10-30 20:25:37
Message-ID: 200710302025.l9UKPbgB030431 () www ! freebsd ! org
> Number: 117688
> Category: misc
> Synopsis: mpt disk timeout and hang
> Confidential: no
> Severity: serious
> Priority: medium
> Responsible: freebsd-bugs
> State: open
> Quarter:
> Keywords:
> Date-Required:
> Class: sw-bug
> Submitter-Id: current-users
> Arrival-Date: Tue Oct 30 20:30:00 UTC 2007
> Closed-Date:
> Last-Modified:
> Originator: Matt Lehner
> Release: 7.0-BETA1
> Organization:
> Environment:
FreeBSD vault.buffalo.rr.com 7.0-BETA1 FreeBSD 7.0-BETA1 #0: Mon Oct 22 07:41:02 UTC \
2007 root@vault.buffalo.rr.com:/usr/obj/usr/src/sys/VAULT amd64
> Description:
I installed FreeBSD 7 to take advantage of its ZFS support. While testing ZFS, I came \
across an issue with the mpt(4) driver. After an extended period of moderate to heavy \
load on the disks, I would get the following errors in dmesg. Moderate to heavy disk \
load here means roughly 50-70MB/s with bursts to 86MB/s and 600 ops/s per disk, \
according to gstat.
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6f150:13878 timed out for \
ccb 0xffffff0001a15000 (req->ccb 0xffffff0001a15000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e75450:13879 timed out for \
ccb 0xffffff0001a10000 (req->ccb 0xffffff0001a10000)
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 \
function 0
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6ea00:13880 timed out for \
ccb 0xffffff0001998400 (req->ccb 0xffffff0001998400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e70740:13881 timed out for \
ccb 0xffffff0001395400 (req->ccb 0xffffff0001395400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e69ab0:13886 timed out for \
ccb 0xffffff000157dc00 (req->ccb 0xffffff000157dc00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e762f0:13887 timed out for \
ccb 0xffffff0001982400 (req->ccb 0xffffff0001982400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6b520:13888 timed out for \
ccb 0xffffff000198ec00 (req->ccb 0xffffff000198ec00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e7a820:13889 timed out for \
ccb 0xffffff00019bf000 (req->ccb 0xffffff00019bf000)
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 \
function 0
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6dda0:13890 timed out for \
ccb 0xffffff0001983400 (req->ccb 0xffffff0001983400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6df50:13891 timed out for \
ccb 0xffffff00019be000 (req->ccb 0xffffff00019be000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6b9a0:13892 timed out for \
ccb 0xffffff00018c4400 (req->ccb 0xffffff00018c4400)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e72a20:13893 timed out for \
ccb 0xffffff0001a10800 (req->ccb 0xffffff0001a10800)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e696c0:13894 timed out for \
ccb 0xffffff000197ec00 (req->ccb 0xffffff000197ec00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e74d90:13895 timed out for \
ccb 0xffffff00018c4000 (req->ccb 0xffffff00018c4000)
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 \
function 0
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e78e40:13904 timed out for \
ccb 0xffffff0001a0f000 (req->ccb 0xffffff0001a0f000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e6f8a0:13905 timed out for \
ccb 0xffffff0001a0ac00 (req->ccb 0xffffff0001a0ac00)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e75f00:13906 timed out for \
ccb 0xffffff000194e000 (req->ccb 0xffffff000194e000)
Oct 29 13:53:40 vault kernel: mpt0: request 0xffffffff80e772b0:13907 timed out for \
ccb 0xffffff0001984000 (req->ccb 0xffffff0001984000)
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 \
function 0
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 \
function 0
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
Oct 29 13:53:40 vault kernel: mpt0: attempting to abort req 0xffffffff80e6f150:13878 \
function 0
Oct 29 13:53:40 vault kernel: mpt0: abort of req 0xffffffff80e6f150:13878 completed
The last two lines would continue to repeat (indefinitely, I assume) until I had to \
power-cycle the machine. When the server came back online, ZFS functioned normally \
and reported no checksum errors. I ran a scrub and again found no problems. But if I \
put enough load on the disks for an extended period, it would again hang with the \
same errors. There doesn't appear to be a certain length of time or exact combination \
of factors that causes the errors; sometimes they occur much more quickly than other \
times. While the errors were scrolling up the screen, the activity light on one disk, \
the other, or both would stay on steady.
Currently the machine boots over the network (using pxeboot) from another machine. \
The disks in the ZFS mirror are the only physical disks it has, so the system itself \
does not lock up while this is happening.
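The log excerpt above can be summarized mechanically, which makes the repeating abort easier to see. The following is a minimal Python sketch, not part of the original report. The regular expressions are modeled only on the messages quoted here, the helper names (`unwrap`, `summarize`, `stuck_requests`) are hypothetical, and it assumes the number after the colon is the request's serial number.

```python
import re
from collections import Counter

# Patterns modeled on the mpt(4) messages quoted in this report (assumption:
# the number after the colon is the request's serial number).
TIMEOUT_RE = re.compile(r"mpt\d+: request (0x[0-9a-f]+):(\d+) timed out")
ABORT_RE = re.compile(r"mpt\d+: attempting to abort req (0x[0-9a-f]+):(\d+)")


def unwrap(text):
    """Rejoin lines that were wrapped with a trailing backslash."""
    return re.sub(r"\s*\\\s*\n\s*", " ", text)


def summarize(text):
    """Return (list of timed-out serials, Counter of abort attempts per serial)."""
    text = unwrap(text)
    timed_out = [int(m.group(2)) for m in TIMEOUT_RE.finditer(text)]
    aborts = Counter(int(m.group(2)) for m in ABORT_RE.finditer(text))
    return timed_out, aborts


def stuck_requests(aborts, threshold=3):
    """Serials whose abort was attempted at least `threshold` times."""
    return sorted(s for s, n in aborts.items() if n >= threshold)
```

Run against the excerpt quoted above, this should report 18 timed-out requests and, by my count, flag only request 13878, whose abort is attempted six times and reported complete each time.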
Motherboard: Tyan Tiger i7501 S2723
CPU: Dual Opteron 244
Controller: LSI SAS3041X-R
Hard drives: 2x 1TB Hitachi Deskstar
vault# zfs list
NAME           USED  AVAIL  REFER  MOUNTPOINT
tank           824G  89.6G    18K  /tank
tank/storage   824G  89.6G   824G  /storage
vault#
vault# zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0

errors: No known data errors
vault#
mpt0: <LSILogic SAS/SATA Adapter> port 0x8800-0x88ff mem \
0xfc2fc000-0xfc2fffff,0xfc2e0000-0xfc2effff irq 28 at device 3.0 on \
pci1
mpt0: [ITHREAD]
mpt0: MPI Version=1.5.10.0
> How-To-Repeat:
Sustain heavy I/O on an mpt(4) device for an extended period.
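One way to generate such load is sketched below in Python. This is only an illustration under stated assumptions, not the workload that triggered the original failure (the report does not say what that was). The function name `sustained_write`, the scratch file name, and the chunk and file sizes are all hypothetical.

```python
import os
import time


def sustained_write(directory, seconds, chunk_mib=1, file_mib=64):
    """Repeatedly write and fsync a scratch file in `directory` for roughly
    `seconds` seconds. Returns the number of bytes written."""
    chunk = os.urandom(chunk_mib * 1024 * 1024)
    path = os.path.join(directory, "mpt_load.tmp")  # hypothetical scratch file
    deadline = time.monotonic() + seconds
    total = 0
    try:
        while time.monotonic() < deadline:
            with open(path, "wb") as f:
                for _ in range(file_mib // chunk_mib):
                    f.write(chunk)
                    total += len(chunk)
                    if time.monotonic() >= deadline:
                        break
                f.flush()
                os.fsync(f.fileno())  # push the data to the disks
    finally:
        # Clean up the scratch file even if the run is interrupted.
        if os.path.exists(path):
            os.remove(path)
    return total
```

On the machine described here, a call such as `sustained_write("/storage", 3600)` would target the ZFS filesystem mounted at /storage. Throughput can be watched with gstat while it runs. Whether this write-only pattern is enough to reproduce the timeouts is not known.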
> Fix:
> Release-Note:
> Audit-Trail:
> Unformatted:
_______________________________________________
freebsd-bugs@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscribe@freebsd.org"