[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-serviceability-dev
Subject:    Re: RFR: 8032901: WaitForMultipleObjects() return value not handled appropriately
From:       Staffan Larsen <staffan.larsen () oracle ! com>
Date:       2014-05-15 9:11:05
Message-ID: 90D75432-E4D0-4B5F-89BB-0BA1F4653482 () oracle ! com
[Download RAW message or body]


On 15 maj 2014, at 03:48, David Holmes <david.holmes@oracle.com> wrote:

> On 14/05/2014 11:18 PM, Aleksej Efimov wrote:
> > David, Vitaly,
> > 
> > I totally agree with Vitaly's explanation (Vitaly - thank you for that)
> > and code in shmemBase.c (the usage of enterMutex() function for
> > sending/receiving bytes through shared memory connection) illustrates on
> > how the connection shutdown event is used as a "cancellation token".
> 
> Thanks for clarifying that.
> 
> So if we were to encounter an abandoned mutex the code would presently have \
> acquired the mutex but return an error, thus preventing a subsequent release, and \
> preventing other threads from acquiring (but allowing current thread to \
> recurisevely acquire. So this could both hang and cause data corruption. 
> The new code will still return an error but release the mutex. So no more hangs \
> (other than by conditions caused by data corruption) but more opportunity for data \
> corruption. 
> Obviously it depends on exactly what happens in the critical sections guarded by \
> this mutex, but in general I don't agree with this rationale for making the change: \
>  204     /* If the mutex is abandoned we want to return an error
> 205      * and also release the mutex so that we don't cause
> 206      * other threads to be blocked. If a mutex was abandoned
> 207      * we are in scary state. Data may be corrupted or inconsistent
> 208      * but it is still better to let other threads run (and possibly
> 209      * crash) than having them blocked (on the premise that a crash
> 210      * is always easier to debug than a hang).
> 
> Considering something has to have gone drastically wrong for the mutex to become \
> abandoned, I'm more inclined to consider this a fatal error and abort. 
> But I'll let the serviceability folk chime in here.

I was involved in fixing this and writing the comment, so obviously I thought it a \
good solution :-)

I do agree that it would probably be a good idea to consider this a fatal error and \
abort. At that point in the code we don’t have any really nice ways of doing that, \
though. We could just print an error and call abort(). What we are doing now is \
returning an error from sysIPMutexEnter() and delegating the error handling to the \
caller. We have tried to check all call paths to verify that they do “the right \
thing” in the face of an error. It is obviously hard to verify, but it looks like \
they all terminate the connection with some kind of error message.

/Staffan


> 
> Thanks,
> David
> 
> 
> > Thank you,
> > -Aleksej
> > 
> > 
> > On 05/14/2014 01:05 PM, David Holmes wrote:
> > > On 14/05/2014 11:06 AM, Vitaly Davidovich wrote:
> > > > In windows, you acquire a mutex by waiting on it using one of the wait
> > > > functions, one of them employed in the code in question.  If
> > > > WaitForMultipleObjects succeeds and returns the index of the mutex,
> > > > current thread has ownership now.
> > > 
> > > Yes I understand the basic mechanics :)
> > > 
> > > > It's also common to use multi wait functions where the event is a
> > > > "cancelation token", e.g. manual reset event; this allows someone to
> > > > cancel waiting on mutex acquisition and return from the wait function.
> > > > Presumably that's the case here, but I'll let Aleksej confirm; just
> > > > wanted to throw this out there in the meantime :).
> > > 
> > > Ah I see - yes cancellable lock acquisition would make sense.
> > > 
> > > Thanks,
> > > David
> > > 
> > > > Sent from my phone
> > > > 
> > > > On May 13, 2014 6:46 PM, "David Holmes" <david.holmes@oracle.com
> > > > <mailto:david.holmes@oracle.com>> wrote:
> > > > 
> > > > Hi Aleksej,
> > > > 
> > > > Thanks for the doc references regarding abandonment.
> > > > 
> > > > Let me rephrase my question. What is this logic trying to achieve by
> > > > waiting on both a mutex and an event? Do we already own the mutex
> > > > when this function is called?
> > > > 
> > > > David
> > > > 
> > > > On 13/05/2014 11:19 PM, Aleksej Efimov wrote:
> > > > 
> > > > David,
> > > > 
> > > > The Windows has a different terminology for mutex objects (much
> > > > differs
> > > > from the POSIX one). This one link gave me some understanding of
> > > > it [1].
> > > > 
> > > > Here is the MSDN [1] description of what "abandoned mutex" is:
> > > > " If a thread terminates without releasing its ownership of a
> > > > mutex
> > > > object, the mutex object is considered to be abandoned. A
> > > > waiting thread
> > > > can acquire ownership of an abandoned mutex object, but the wait
> > > > function will return*WAIT_ABANDONED*to indicate that the mutex
> > > > object is
> > > > abandoned. An abandoned mutex object indicates that an error has
> > > > occurred and that any shared resource being protected by the
> > > > mutex
> > > > object is in an undefined state. If the thread proceeds as
> > > > though the
> > > > mutex object had not been abandoned, it is no longer considered
> > > > abandoned after the thread releases its ownership. This restores
> > > > normal
> > > > behavior if a handle to the mutex object is subsequently
> > > > specified in a
> > > > wait function."
> > > > 
> > > > 
> > > > What does it mean to wait on mutex and ownership of the mutex
> > > > object:
> > > > "Any thread with a handle to a mutex object can use one of
> > > > thewait
> > > > functions
> > > > <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms687069%28v=vs.85%29.aspx
> > > >  <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687069%28v=vs.85%29.aspx>>to
> > > >  request ownership of the mutex object. If the mutex object is
> > > > owned by
> > > > another thread, the wait function blocks the requesting thread
> > > > until the
> > > > owning thread releases the mutex object using the*ReleaseMutex*
> > > > <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms685066%28v=vs.85%29.aspx
> > > >  <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms685066%28v=vs.85%29.aspx>>__function."
> > > >  
> > > > How we can release mutex and wait on already owned mutex:
> > > > " After a thread obtains ownership of a mutex, it can specify
> > > > the same
> > > > mutex in repeated calls to the wait-functions
> > > > <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms687069%28v=vs.85%29.aspx
> > > >  <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687069%28v=vs.85%29.aspx>>__without
> > > >  blocking its execution. This prevents a thread from deadlocking
> > > > itself
> > > > while waiting for a mutex that it already owns. To release its
> > > > ownership
> > > > under such circumstances, the thread must call*ReleaseMutex*
> > > > <http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms685066%28v=vs.85%29.aspx
> > > >  <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms685066%28v=vs.85%29.aspx>>__once
> > > >  for each time that the mutex satisfied the conditions of a wait
> > > > function."
> > > > 
> > > > [1]
> > > > http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms684266(v=vs.85).aspx
> > > >  <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms684266(v=vs.85).aspx>
> > > >  
> > > > -Aleksej
> > > > 
> > > > On 05/13/2014 04:00 PM, David Holmes wrote:
> > > > 
> > > > I don't understand this one at all. What is an "abandoned
> > > > mutex"? For
> > > > that matter why does the code wait on a mutex and an event?
> > > > Do we
> > > > already own the mutex? If so what does it mean to wait on
> > > > it? If not
> > > > then how can we release it?
> > > > 
> > > > ???
> > > > 
> > > > Thanks,
> > > > David
> > > > 
> > > > 
> > > > On 13/05/2014 8:57 PM, Alan Bateman wrote:
> > > > 
> > > > 
> > > > This is debugger's shared memory transport so cc'ing
> > > > serviceability-dev
> > > > as that is there this code is maintained.
> > > > 
> > > > Is there a test case or any outline of the conditions
> > > > that cause this? I
> > > > think that would be useful to understand the issue
> > > > further.
> > > > 
> > > > -Alan
> > > > 
> > > > On 13/05/2014 11:46, Aleksej Efimov wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > Can I have a review for 8032901 bug [1] fix [2].
> > > > There is a possible
> > > > case when 'WaitForMultipleObjects' function can
> > > > return the
> > > > WAIT_ABANDONED_0 [3] error value.
> > > > In such case it's better to release the mutex and
> > > > return error value.
> > > > This will prevent other threads to be blocked on
> > > > abandoned mutex.
> > > > 
> > > > Thank you,
> > > > Aleksej
> > > > 
> > > > [1]
> > > > https://bugs.openjdk.java.net/__browse/JDK-8032901
> > > > <https://bugs.openjdk.java.net/browse/JDK-8032901>
> > > > [2]
> > > > http://cr.openjdk.java.net/~__aefimov/8032901/9/webrev.00/
> > > > <http://cr.openjdk.java.net/~aefimov/8032901/9/webrev.00/>
> > > > [3]
> > > > http://msdn.microsoft.com/en-__gb/library/windows/desktop/__ms687025(v=vs.85).aspx
> > > >  <http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687025(v=vs.85).aspx>
> > > >  
> > > > 
> > > > 
> > > > 
> > 


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic