List:       lustre-discuss
Subject:    [Lustre-discuss] [wc-discuss] System Deadlock
From:       adilger () whamcloud ! com (Andreas Dilger)
Date:       2011-08-18 17:09:30
Message-ID: F37E338E-215D-4437-96F3-E54ACA0A3B06 () whamcloud ! com

On 2011-08-18, at 7:50 AM, "Roger Spellman" <Roger.Spellman at terascala.com> wrote:

> Andreas, 
> Thanks for your reply.  It was very helpful.  See my responses below.
> 
> > > I am in the process of porting Lustre client 1.8.4 to a recent kernel,
> > > 2.6.38.8.
> > 
> > That is a somewhat unfortunate starting point, since 1.8.6 clients at
> > least work with 2.6.32 kernels.
> 
> I understand.  I started this project before 1.8.6 came out, and I
> wanted to stick with 1.8.4, in case any problems came up with 1.8.6.  As
> soon as I am done with 1.8.4, I will port my patch to 1.8.6.  
> 
> > It's difficult to make any kind of assessment without knowing what
> > changes you have made to the client.  It would be useful if you would
> > submit a series of patches so that we can take a look at your patches.
> 
> My plan was to get it working, then post the patch to anyone who wanted
> it.  That should be pretty soon.  I'm assuming that other people are
> wanting to run Lustre with recent kernels.
> 
> > No, the Linux stack traces are terrible, they just print anything that
> > looks like the address of a kernel or module function.  That includes
> > function addresses that are passed as function parameters, such as
> > callback functions.  It must have hit an interrupt at one point, but I
> > think it is just random garbage on the stack.
> 
> Too bad.  I compiled the Kernel with Frame Pointers, so I hoped that the
> kernel could unwind the stack properly.  

That helps, but AFAIK it isn't 100% correct even then. 

> Now that I know to ignore the Stack Trace, I can instrument the code to
> track down this problem. 

I don't think you need to ignore the stack, just treat it with caution and look for a
valid call path through the listed functions.
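
A minimal sketch of that kind of triage, assuming an x86-style trace in which the kernel prefixes entries it could not verify with "? ". The sample trace, its addresses, and the names func_a, callback_b, func_c and example_mod are made up for illustration, and split_trace is a hypothetical helper, not part of any tool. It only separates the two kinds of entry; checking that the remaining functions really call one another still has to be done against the source.

```python
import re

# Matches trace entries such as:
#   " [<ffffffff81000200>] ? callback_b+0x0/0x20 [example_mod]"
# Group 2 is set when the entry carries the "? " marker.
ENTRY = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def split_trace(text):
    """Return (unmarked, marked) function names from a call trace.

    'marked' entries had a leading '?' and may be stale stack contents,
    for example a callback address passed as a parameter.
    """
    unmarked, marked = [], []
    for line in text.splitlines():
        m = ENTRY.search(line)
        if not m:
            continue  # header lines, register dumps, etc.
        (marked if m.group(2) else unmarked).append(m.group(3))
    return unmarked, marked

# Made-up input, for illustration only.
SAMPLE = """Call Trace:
 [<ffffffff81000010>] func_c+0x10/0x40
 [<ffffffff81000200>] ? callback_b+0x0/0x20
 [<ffffffff81000300>] func_a+0x25/0x80 [example_mod]
"""

if __name__ == "__main__":
    unmarked, marked = split_trace(SAMPLE)
    print("candidate call path:", " <- ".join(unmarked))
    print("treat with caution: ", ", ".join(marked))
```

Entries in the second list are not necessarily wrong, so they are reported rather than dropped.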

Cheers, Andreas
