List:       linux-scsi
Subject:    RE: Problem using software RAID1 - injected by SCSI Format
From:       "Cress, Andrew R" <andrew.r.cress () intel ! com>
Date:       2001-11-19 16:10:19

Jeremy,

While one disk (ID 1) is being formatted, the other disk (ID 0) has its IOs
suspended as well.  When the format completes, both disks are freed up.  My
configuration uses software RAID1, which is why both disks receive the same
IOs.  However, the md driver does not do retries, so either the SCSI
mid-layer or the aic7xxx driver (6.2.2) must be suspending/retrying the IOs
to both disks.

I can't tell why the IOs are suspended to both disks.
The first error messages I see are:

Nov 13 13:12:09 telco3 kernel: scsi0:0:1:0: Attempting to queue an ABORT message
Nov 13 13:12:10 telco3 kernel: DevQ(0:0:0): 0 waiting
Nov 13 13:12:10 telco3 kernel: DevQ(0:1:0): 0 waiting
Nov 13 13:12:10 telco3 kernel: (scsi0:A:1:0): Queuing a recovery SCB
Nov 13 13:12:10 telco3 kernel: scsi0:0:1:0: Device is disconnected, re-queuing SCB
[...]
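
As a rough aid for sorting through the full log, here is a minimal Python
sketch that counts which SCSI addresses the driver messages name.  It
assumes the address fields are host:channel:target:lun, with the channel
printed either as a digit or as a letter, which is only inferred from the
lines above.  The function name and the regex are illustrative and not part
of any driver or tool.

```python
import re
from collections import Counter

# Assumed address layout: scsi<host>:<channel>:<target>:<lun>, optionally
# wrapped in parentheses, with the channel shown as a digit or a letter.
ADDR = re.compile(r"\(?scsi(\d+):([0-9A-Z]):(\d+):(\d+)\)?:")

def targets_mentioned(lines):
    """Count log lines per (host, target, lun), ignoring channel notation."""
    counts = Counter()
    for line in lines:
        m = ADDR.search(line)
        if m:
            host, _chan, target, lun = m.groups()
            counts[(int(host), int(target), int(lun))] += 1
    return counts

# The five lines quoted above, with the wrapped lines rejoined.
sample = [
    "Nov 13 13:12:09 telco3 kernel: scsi0:0:1:0: Attempting to queue an ABORT message",
    "Nov 13 13:12:10 telco3 kernel: DevQ(0:0:0): 0 waiting",
    "Nov 13 13:12:10 telco3 kernel: DevQ(0:1:0): 0 waiting",
    "Nov 13 13:12:10 telco3 kernel: (scsi0:A:1:0): Queuing a recovery SCB",
    "Nov 13 13:12:10 telco3 kernel: scsi0:0:1:0: Device is disconnected, re-queuing SCB",
]

for addr, n in sorted(targets_mentioned(sample).items()):
    print(addr, n)
```

On the lines quoted here, only ID 1 is named in the abort and recovery
messages; the DevQ lines use a different format and are not counted.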

This check takes place in aic7xxx_linux.c/ahc_linux_queue_recovery_cmd(),
but I can't see anything in that code that would affect the other disk.  
Justin?

Andy

-----Original Message-----
From: Jeremy Higdon [mailto:jeremy@classic.engr.sgi.com]
Sent: Friday, November 16, 2001 12:56 AM
To: Cress, Andrew R; 'linux-scsi@vger.kernel.org'
Subject: Re: Problem using software RAID1 - injected by SCSI Format


On Nov 15,  9:21am, Cress, Andrew R wrote:
> 
> Jeremy,
> 
> Thanks for your response, but this doesn't really address issue (1) below,
> which is the most important to me.
> I'm looking for how to make the other unaffected disk work normally when the
> first disk is in this condition (elapsed time is around 30 minutes).  The
> IOs should be separate, but for some reason all IOs on the bus are affected.
> 
> Andy
> 
> -----Original Message-----
> From: Jeremy Higdon [mailto:jeremy@classic.engr.sgi.com]
> Sent: Thursday, November 15, 2001 3:34 AM
> To: Cress, Andrew R; 'linux-scsi@vger.kernel.org'
> Subject: Re: Problem using software RAID1 - injected by SCSI Format
> 
> 
> On Nov 14,  6:11am, Cress, Andrew R wrote:
> [...]
> > When the format is in progress, and a command is issued to read non-cached
> > data from the disk (e.g. 'cat /var/log/messages'), all SCSI IOs to the
> > mirror are held pending until the format completes.  The format can be in
> > progress up to 30 minutes.  After that, the IOs do complete as if nothing
> > else was wrong, and the mdstat still shows all partitions active.
> > 
> > So this presents two issues:
> > 1) The unaffected RAID1 SCSI disk can't process IOs, so the whole system is
> >    unavailable during this condition.
> > 2) It may be desirable to put some limit on how long it will retry before
> >    marking the disk as offline.



So are you saying that I/O is switched to the second disk, and no requests
are issued to the first?  If so (how does it know that the first disk is
busy?), then it sounds like an HBA driver problem.

jeremy