[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-2d-dev
Subject:    Re: [OpenJDK 2D-Dev] <AWT Dev> Request for review: 8026385: [macosx] (awt) setjmp/longjmp changes th
From:       Staffan Larsen <staffan.larsen () oracle ! com>
Date:       2014-05-20 13:55:00
Message-ID: DBB2C7E5-57A8-436E-A8E3-3BF805FB7F1A () oracle ! com
[Download RAW message or body]


On 20 maj 2014, at 10:34, Volker Simonis <volker.simonis@gmail.com> wrote:

> Hi everybody,
> 
> I've analyzed this issue on AIX (and HPUX) and came to the conclusion
> that it is not a problem on these platforms.
> 
> Just to make sure I got everything right, I'll summarize my
> understanding of the problem here:
> 
> On MacOS, setjmp() saves the signal mask of the current thread BUT
> restores the signal mask of the whole process after the corresponding
> longjmp(). This obviously changes the signal mask of all threads which
> had a different signal mask than the initial thread who called
> setjmp(). I could easily reproduce this by loading a class which
> provokes a verification exception in the un-fixed libverify.so.
> Afterwards, the VM-thread won't listen to SIGQUIT anymore because it
> now masks this signal in the same way like the Java thread which had
> the verification exception.
> 
> On AIX/HPUX setjmp()/longjmp() is defined in the same way like on
> MacOS in that it saves and restores the signal mask (and both
> platforms provide _setjmp()/_longjmp() versions as well which do not
> manipulate the signal mask). But while it is not clear from the
> documentation, all my tests showed that longjmp() only restores the
> signal mask of the current thread, which should be OK (i.e. I couldn't
> see a blocked VM thread which doesn't react on SIQUIT anymore).
> 
> So to cut a long story short - no action seems to be required to fix
> this issue on AIX:
> 
> As a side note I wonder why we didn't use sigsetjmp()/siglongjmp()
> with a zero 'savemask' argument to fix this problem? It is
> standardized by POSIX and would avoid the usage of "#ifdef __APPLE”.

It’s not clear to me if sigsetjmp()/siglongjmp() sets the signal mask on the process \
or on the pthread. The problem on OS X was that the VM sets up the sigprocmask on the \
pthreads (using pthread_sigmask()), but longjmp() restores the mask for the process \
(using sigprocmask()).


/Staffan


> 
> Thank you and best regards,
> Volker
> 
> 
> On Fri, May 16, 2014 at 6:04 PM, David DeHaven <david.dehaven@oracle.com> wrote:
> > 
> > I'd thought about that, since Apple borrowed most of it's underpinnings from BSD, \
> > but had no evidence to suggest it was necessary. 
> > Anyways, at least you've identified and can rectify the issue :)
> > 
> > -DrD-
> > 
> > > Wow, sometimes it really makes sense to read apparently unrelated
> > > email-threads on Friday afternoon:)
> > > 
> > > It seems that AIX/HPUX have exactly the same problem like MacOS X.
> > > From the AIX setjmp man-page:
> > > 
> > > "The setjmp and longjmp subroutines save and restore the signal mask
> > > sigmask (2), while _setjmp and _longjmp manipulate only the stack
> > > context."
> > > 
> > > From the HPUXM setjmp man-page:
> > > 
> > > "setjmp(), longjmp()         These always save and restore the signal mask.
> > > _setjmp(), _longjmp()        These never manipulate the signal mask."
> > > 
> > > I think it doesn't make sense for you to wait pushing this until I
> > > provide the corresponding AIX patches because anyway I'll have to
> > > provide a fix not only for this issue but also for the other bugs you
> > > mentioned (i.e. 8023786 and 8023720). So I'll better create a new bug
> > > for AIX.
> > > 
> > > Thank you and best regards,
> > > Volker
> > > 
> > > On Fri, May 16, 2014 at 5:38 PM, David DeHaven <david.dehaven@oracle.com> \
> > > wrote:
> > > > 
> > > > Thanks!
> > > > 
> > > > -DrD-
> > > > 
> > > > > The splashscreen changes look fine to me. Approved.
> > > > > 
> > > > > --
> > > > > best regards,
> > > > > Anthony
> > > > > 
> > > > > On 5/16/2014 7:18 PM, David DeHaven wrote:
> > > > > > 
> > > > > > Could someone on AWT team approve the splashscreen changes?
> > > > > > 
> > > > > > -DrD-
> > > > > > 
> > > > > > > Approved.
> > > > > > > 
> > > > > > > -phil.
> > > > > > > 
> > > > > > > On 5/15/2014 9:31 AM, David DeHaven wrote:
> > > > > > > > Ping!
> > > > > > > > 
> > > > > > > > Does this look OK?
> > > > > > > > 
> > > > > > > > I've also filed an issue against JavaFX:
> > > > > > > > https://javafx-jira.kenai.com/browse/RT-37125
> > > > > > > > 
> > > > > > > > -DrD-
> > > > > > > > 
> > > > > > > > > > > > I tried not modifying libpng but still ended up with \
> > > > > > > > > > > > lingering references to longjmp in pngread.o, despite libpng \
> > > > > > > > > > > > having png_ptr->longjmp_fn (bug in libpng?). pngread.c calls \
> > > > > > > > > > > > setjmp to set a default location to jump to in case the \
> > > > > > > > > > > > caller doesn't call setjmp, so if we continue down this path \
> > > > > > > > > > > > something in libpng must be modified. The only other option \
> > > > > > > > > > > > is to create our own setjmp.h and order it before \
> > > > > > > > > > > > /usr/include/setjmp.h, which seems dubious at best. 
> > > > > > > > > > > > I'm curious if the libpng changes are even needed since it's \
> > > > > > > > > > > > only used for splashscreen, which happens very early in the \
> > > > > > > > > > > > launch process. Also note that we didn't originally even call \
> > > > > > > > > > > > png_set_longjmp_fn, so any error should have resulted in an \
> > > > > > > > > > > > abort() instead of a call to longjmp... it appears we could \
> > > > > > > > > > > > retain the functionality we have today and #undef \
> > > > > > > > > > > > PNG_SETJMP_SUPPORTED (pngconf.h?). That would put the onus on \
> > > > > > > > > > > > developers to make sure their pngs don't have errors in them, \
> > > > > > > > > > > > or libsplashscreen will abort()... 
> > > > > > > > > > > > 
> > > > > > > > > > > That's an interesting question and the answer might extend to \
> > > > > > > > > > > the splashscreen changes too. Its platform specific code and on \
> > > > > > > > > > > MAC, the thread is created using pthreads directly and that \
> > > > > > > > > > > thread goes away once splashscreen is done. But its running at \
> > > > > > > > > > > the same time as the VM is booting up and creating threads and \
> > > > > > > > > > > setting their signal masks. So I don't think you can guarantee \
> > > > > > > > > > > that it won't mess up the masks on the JRE threads if the PNG \
> > > > > > > > > > > is bad. And I'm also not sure you want to remove error handling \
> > > > > > > > > > > from the library either. So a HIGHLY VISIBLE DO NOT REMOVE \
> > > > > > > > > > > comment might be the best you can do here.
> > > > > > > > > > I have a better idea:
> > > > > > > > > > 
> > > > > > > > > > png_default_error is the only place where png_longjmp is called. \
> > > > > > > > > > We could call png_set_error_fn to set up our own error handler \
> > > > > > > > > > (for Mac only), compile with PNG_SETJMP_SUPPORTED unset so it \
> > > > > > > > > > doesn't pull in setjmp/longjmp and our own implementation of the \
> > > > > > > > > > error handler would call _longjmp, which would jump back to where \
> > > > > > > > > > we call setjmp currently.
> > > > > > > > > Ok, I figured out what's going on. It's not quite intuitive...
> > > > > > > > > 
> > > > > > > > > png_jmpbuf is a macro defined in png.h, this calls \
> > > > > > > > > png_set_longjmp_fn with longjmp, which is why I was seeing \
> > > > > > > > > references to longjmp in the object file. That's what was throwing \
> > > > > > > > > me off as it seems like it should only be getting the jmp_buf ptr \
> > > > > > > > > stored in the png_ptr. I guess the intention was that \
> > > > > > > > > setjmp/longjmp was optional, if you don't call setjmp then it just \
> > > > > > > > > abort()s. 
> > > > > > > > > 
> > > > > > > > > I changed splashscreen_png.c to:
> > > > > > > > > #ifdef __APPLE__
> > > > > > > > > if (_setjmp(png_set_longjmp_fn(png_ptr, _longjmp, \
> > > > > > > > > sizeof(jmp_buf)))) { #else
> > > > > > > > > if (setjmp(png_jmpbuf(png_ptr))) {
> > > > > > > > > #endif
> > > > > > > > > 
> > > > > > > > > and it calls _longjmp instead. I verified this works by changing \
> > > > > > > > > the macro to set png_longjmp to exit() and without the above change \
> > > > > > > > > it does indeed exit prematurely with a bad png, with the change it \
> > > > > > > > > reports the error but continues to load the application as would be \
> > > > > > > > > expected. 
> > > > > > > > > pngread.o still has a symbol table entry for _longjmp instead of \
> > > > > > > > > __longjmp, but it's benign since we're ultimately forcing it to use \
> > > > > > > > > the correct function. So I've left libpng completely unchanged. 
> > > > > > > > > 
> > > > > > > > > With the change and using a bad png for splashscreen, I was able to \
> > > > > > > > > get a stack trace once the application was running. Without the \
> > > > > > > > > change to splashscreen_png.c, jstack was unable to connect to the \
> > > > > > > > > process. So splashscreen absolutely can interfere with the signal \
> > > > > > > > > handling. 
> > > > > > > > > 
> > > > > > > > > Updated webrev:
> > > > > > > > > http://cr.openjdk.java.net/~ddehaven/8026385/jdk.1/
> > > > > > > > > 
> > > > > > > > > I can look into writing a regression test for this. It might not be \
> > > > > > > > > trivial though since we're dealing with signal handlers, and if \
> > > > > > > > > timing is a factor the test may not be reliable. 
> > > > > > > > > -DrD-
> > > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > 
> > 


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic