List:       evms-devel
Subject:    Re: [Evms-devel] Re: Bug#146564: kernel-patch-evms: kernel-panics if
From:       "Mark Peloquin" <peloquin () us ! ibm ! com>
Date:       2002-05-15 16:47:24


On Fri, May 10, 2002, Steve Langasek wrote:
> On Fri, May 10, 2002 at 06:14:19PM -0400, Matt Zimmerman wrote:
> > On Fri, May 10, 2002 at 05:02:45PM -0500, Steve Langasek wrote:

> > > In deciding how to best deploy evms for use on a fiber-channel HA failover
> > > system, we've found that evms becomes very unhappy if any of the block
> > > device drivers it has references to are ever unloaded: specifically, if we
> > > unload the driver for our fiber channel card, and then run 'echo probe |
> > > evms' to request that evms re-probe, we get a kernel panic and a segfault,
> > > and we're no longer able to use evms on the system until after a reboot.
> > > While we can implement a workaround here in wetware, it would certainly be
> > > nice if evms could do something more graceful when other device drivers
> > > are unloaded.

Yes, EVMS should handle this more gracefully. I put code in just
to handle such cases.

I went ahead and built my aic7xxx driver as a module and
played around with loading the driver, running evms_rediscover,
unloading the driver, and attempting to mount a volume. And
guess what, I segfaulted. So I dug into this a bit to determine
the cause, and this is what I found. The SCSI disk driver (sd.c)
in 2.4.18 has a long-standing bug where it incorrectly returns a
non-NULL value for a device that no longer exists. I don't recall
reading which kernel version you said you were running. From looking
at 2.4.19-pre8, I see that this bug has been fixed. After applying
that fix to my sd.c file, I no longer segfaulted, and EVMS handles
the device going away gracefully.
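
To illustrate why the old test never fires, here is a minimal user-space
sketch. It is only a model: the struct definitions are simplified stand-ins
rather than the real kernel ones, and slot(), old_check_rejects(),
new_check_rejects() and find_queue_fixed() are names made up for this
example. The point is that dpnt is the address of an array slot, so it can
never be NULL; only the slot's device pointer says whether a disk is there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, not the real definitions. */
struct request_queue { int placeholder; };
struct scsi_device   { int online; struct request_queue request_queue; };
struct scsi_disk     { struct scsi_device *device; };  /* NULL when no device is attached */

#define NR_DISKS 4
static struct scsi_disk rscsi_disks[NR_DISKS];  /* zero-initialized: every slot starts empty */

/* Return the table slot for a target, as "dpnt = &rscsi_disks[target]" does. */
static struct scsi_disk *slot(int target)
{
    assert(target >= 0 && target < NR_DISKS);
    return &rscsi_disks[target];
}

/* Old test (!dpnt): the address of an array element is never NULL,
 * so this never rejects anything. */
static int old_check_rejects(int target)
{
    return slot(target) == NULL;
}

/* Fixed test (!dpnt->device): rejects a slot with no device behind it. */
static int new_check_rejects(int target)
{
    return slot(target)->device == NULL;
}

/* Queue lookup with the fixed test: NULL for a missing device. */
static struct request_queue *find_queue_fixed(int target)
{
    struct scsi_disk *dpnt = slot(target);

    if (!dpnt->device)
        return NULL;            /* No such device */
    return &dpnt->device->request_queue;
}
```

With a device attached to slot 0 and slot 1 left empty, old_check_rejects()
returns 0 for both slots, while new_check_rejects() returns 1 for slot 1 and
find_queue_fixed(1) returns NULL.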

I'll create a patch for sd.c that you can try. I'm still
digging further into this to see if there are any other
downstream effects from having an adapter driver for
an in-memory volume removed.

> > It should not be possible to unload modules that EVMS has references to; the
> > module reference counts should be incremented when the devices are in use.
> > I have run into a strange situation a few times where the module reference
> > count becomes negative, but I have not experienced that situation recently.
> > I have used EVMS 1.0.0 in an environment where I routinely load and unload
> > host adapter modules, and have never had EVMS crash during discovery.

> > At the point where you remove the module, are there any volumes in use? Can
> > you reproduce the problem with a sequence of steps?  If so, please provide
> > such a sequence.  What do you do before unloading the driver?

> Sequence of events:

> modprobe sd_mod (SCSI disk)
> modprobe isp_mod (QLogic FC 2200 controller)
> echo 'probe' | evms
> modprobe -r isp_mod
> modprobe -r sd_mod
> echo 'probe' | evms
> <segfault>

> All of the volumes we're currently testing with are compatibility mode
> volumes; I don't know yet what effect (if any) creating EVMS volumes
> would have on this behavior, or on the reference counts shown in lsmod
> output.  However, lsmod output currently shows that although EVMS
> depends on the presence of the host adapter modules for proper
> functioning, it does *not* increment the module reference count.
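
The reference counting being discussed here can be pictured with a small
user-space model. This is a sketch under stated assumptions, not EVMS or
kernel code: struct driver_module, module_get(), module_put() and
module_unload() are hypothetical names for this example. It shows the
behavior Matt describes as expected: while a consumer holds a reference,
the unload request is refused.

```c
#include <assert.h>

/* Hypothetical model of a loadable driver; not a kernel structure. */
struct driver_module {
    const char *name;
    int use_count;   /* models the usage count that lsmod reports */
    int loaded;      /* 1 while the module is loaded */
};

/* A consumer takes one reference for each device it keeps in use. */
static void module_get(struct driver_module *m)
{
    m->use_count++;
}

/* Drop a reference; the assert guards against the count going negative. */
static void module_put(struct driver_module *m)
{
    assert(m->use_count > 0);
    m->use_count--;
}

/* Unload only when nobody holds a reference.
 * Returns 0 on success, -1 if the module is still in use. */
static int module_unload(struct driver_module *m)
{
    if (m->use_count > 0)
        return -1;
    m->loaded = 0;
    return 0;
}
```

In this model, a module with a use count of 0, like isp_mod and sd_mod in
the lsmod output, can always be unloaded, whether or not something still
depends on it.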

> > Please send the output of lsmod at various points during the process, the
> > kernel log messages, the EVMS engine log (if available) from
> > /var/log/evmsEngine*, and anything else that you feel would be helpful.

> > I'm forwarding your report to the EVMS developers, who will probably request
> > other information as well.

> Below is the output of lsmod, both before and after asking evms to probe
> for available volumes:

> # lsmod
> Module                  Size  Used by    Tainted: P
> isp_mod               426448   0  (unused)
> sd_mod                  9852   0  (unused)
> scsi_mod               81752   2  [isp_mod sd_mod]

> EVMS is of course built into the kernel at this point.  We also tested
> with EVMS compiled as modules; the behavior was identical, though one
> thing we noticed is that the reference count on the module 'dos_part'
> was equal to the total number of detected compatibility volumes, and the
> reference count on the module 'ldev_mgr' was equal to the total number
> of referenced targets.  Neither of these reference counts decremented
> when unloading the module for the host adapter.

> I am attaching the most recent evmsEngine.log from one of our crashes.
> The kernel oops is included inline below.

> Finally, we've found that if we only unload isp_mod and THEN rerun the
> evms probe command, that is, with sd_mod still loaded, the kernel oops
> does not occur.  I imagine knowing that will narrow the search a little
> bit.

Mark

diff -Naur org/drivers/scsi/sd.c new/drivers/scsi/sd.c
--- org/drivers/scsi/sd.c     Wed May 15 11:39:38 2002
+++ new/drivers/scsi/sd.c     Wed May 15 11:38:57 2002
@@ -279,7 +279,7 @@
      target = DEVICE_NR(dev);

      dpnt = &rscsi_disks[target];
-     if (!dpnt)
+     if (!dpnt->device)
            return NULL;      /* No such device */
      return &dpnt->device->request_queue;
 }
@@ -302,7 +302,7 @@

      dpnt = &rscsi_disks[dev];
      if (devm >= (sd_template.dev_max << 4) ||
-         !dpnt ||
+         !dpnt->device ||
          !dpnt->device->online ||
          block + SCpnt->request.nr_sectors > sd[devm].nr_sects) {
            SCSI_LOG_HLQUEUE(2, printk("Finishing %ld sectors\n", SCpnt->request.nr_sectors));



