List: evms-devel
Subject: Re: [Evms-devel] Re: Bug#146564: kernel-patch-evms: kernel-panics if
From: "Mark Peloquin" <peloquin () us ! ibm ! com>
Date: 2002-05-15 16:47:24
On Fri, May 10, 2002, Steve Langasek wrote:
> On Fri, May 10, 2002 at 06:14:19PM -0400, Matt Zimmerman wrote:
> > On Fri, May 10, 2002 at 05:02:45PM -0500, Steve Langasek wrote:
> > > In deciding how to best deploy evms for use on a fiber-channel HA failover
> > > system, we've found that evms becomes very unhappy if any of the block
> > > device drivers it has references to are ever unloaded: specifically, if we
> > > unload the driver for our fiber channel card, and then run 'echo probe |
> > > evms' to request that evms re-probe, we get a kernel panic and a segfault,
> > > and we're no longer able to use evms on the system until after a reboot.
> > > While we can implement a workaround here in wetware, it would certainly be
> > > nice if evms could do something more graceful when other device drivers
> > > are unloaded.
Yes, EVMS should handle this more gracefully. I put code in just
to handle such cases.
I went ahead and built my aic7xxx driver as a module and
played around with loading the driver, running evms_rediscover,
unloading the driver, and attempting to mount a volume. And
guess what, I segfaulted. So I dug into this a bit to determine
the cause, and this is what I found. The SCSI disk driver (sd.c)
in 2.4.18 has a long-standing bug: it checks the address of the
disk's table entry, which is never NULL, instead of the entry's
device pointer, so it returns a non-NULL value for a device that
no longer exists. I don't recall reading what kernel version you
said you were running. From looking at 2.4.19-pre8, I see that
this bug has been fixed. After applying that fix to my sd.c file,
I no longer segfaulted, and EVMS handles the device going away
gracefully.
I'll create a patch for sd.c that you can try. I'm still
digging further into this to see if there are any other
downstream effects from having an adapter driver for
an in-memory volume removed.
> > It should not be possible to unload modules that EVMS has references to; the
> > module reference counts should be incremented when the devices are in use.
> > I have run into a strange situation a few times where the module reference
> > count becomes negative, but I have not experienced that situation recently.
> > I have used EVMS 1.0.0 in an environment where I routinely load and unload
> > host adapter modules, and have never had EVMS crash during discovery.
> > At the point where you remove the module, are there any volumes in use? Can
> > you reproduce the problem with a sequence of steps? If so, please provide
> > such a sequence. What do you do before unloading the driver?
> Sequence of events:
> modprobe sd_mod (SCSI disk)
> modprobe isp_mod (QLogic FC 2200 controller)
> echo 'probe' | evms
> modprobe -r isp_mod
> modprobe -r sd_mod
> echo 'probe' | evms
> <segfault>
> All of the volumes we're currently testing with are compatibility mode
> volumes; I don't know yet what effect (if any) creating EVMS volumes
> would have on this behavior, or on the reference counts shown in lsmod
> output. However, lsmod output currently shows that although EVMS
> depends on the presence of the host adapter modules for proper
> functioning, it does *not* increment the module reference count.
> > Please send the output of lsmod at various points during the process, the
> > kernel log messages, the EVMS engine log (if available) from
> > /var/log/evmsEngine*, and anything else that you feel would be helpful.
> > I'm forwarding your report to the EVMS developers, who will probably request
> > other information as well.
> Below is the output of lsmod, both before and after asking evms to probe
> for available volumes:
> # lsmod
> Module                  Size  Used by    Tainted: P
> isp_mod               426448   0  (unused)
> sd_mod                  9852   0  (unused)
> scsi_mod               81752   2  [isp_mod sd_mod]
> EVMS is of course built into the kernel at this point. We also tested
> with EVMS compiled as modules; the behavior was identical, though one
> thing we noticed is that the reference count on the module 'dos_part'
> was equal to the total number of detected compatibility volumes, and the
> reference count on the module 'ldev_mgr' was equal to the total number
> of referenced targets. Neither of these reference counts decremented
> when unloading the module for the host adapter.
> I am attaching the most recent evmsEngine.log from one of our crashes.
> The kernel oops is included inline below.
> Finally, we've found that if we only unload isp_mod and THEN rerun the
> evms probe command, that is, with sd_mod still loaded, the kernel oops
> does not occur. I imagine knowing that will narrow the search a little
> bit.
Mark
diff -Naur org/drivers/scsi/sd.c new/drivers/scsi/sd.c
--- org/drivers/scsi/sd.c Wed May 15 11:39:38 2002
+++ new/drivers/scsi/sd.c Wed May 15 11:38:57 2002
@@ -279,7 +279,7 @@
target = DEVICE_NR(dev);
dpnt = &rscsi_disks[target];
- if (!dpnt)
+ if (!dpnt->device)
return NULL; /* No such device */
return &dpnt->device->request_queue;
}
@@ -302,7 +302,7 @@
dpnt = &rscsi_disks[dev];
if (devm >= (sd_template.dev_max << 4) ||
- !dpnt ||
+ !dpnt->device ||
!dpnt->device->online ||
block + SCpnt->request.nr_sectors > sd[devm].nr_sects) {
SCSI_LOG_HLQUEUE(2, printk("Finishing %ld sectors\n", SCpnt->request.nr_sectors));
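
For anyone who wants to see why the original test never fires, here is a minimal user-space sketch of the two checks changed in the patch. It is not kernel code: `fake_device`, `fake_disk`, `disks`, `attach`, `detach`, and the `slot_in_use_*` helpers are hypothetical stand-ins for `rscsi_disks[]` and its `device` member. The point is only that the address of an array element can never be NULL, so the entry's device pointer is what has to be tested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct fake_device { int online; int request_queue; };
struct fake_disk   { struct fake_device *device; };

#define MAX_DISKS 4
struct fake_disk disks[MAX_DISKS];   /* plays the role of rscsi_disks[] */

void attach(int target, struct fake_device *dev) { disks[target].device = dev; }
void detach(int target)                          { disks[target].device = NULL; }

/* Pre-patch style check: tests the address of the table slot.
 * The slot lives inside a static array, so this is always true. */
int slot_in_use_old(int target)
{
    struct fake_disk *dpnt = &disks[target];
    return dpnt != NULL;
}

/* Post-patch style check: tests whether a device is attached to the slot. */
int slot_in_use_new(int target)
{
    struct fake_disk *dpnt = &disks[target];
    return dpnt->device != NULL;
}

/* Walks through attach and detach and asserts what each check reports. */
void self_check(void)
{
    static struct fake_device dev = { 1, 0 };

    attach(0, &dev);
    assert(slot_in_use_old(0) && slot_in_use_new(0));

    detach(0);                    /* simulates the driver being unloaded */
    assert(slot_in_use_old(0));   /* old check still claims a device exists */
    assert(!slot_in_use_new(0));  /* new check sees that it is gone */
}
```

Calling `self_check()` from a small `main` exercises both paths. In the real driver the old check let the code go on to dereference `dpnt->device` after the device had gone away, which matches the fault described above.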