List:       openjdk-hotspot-runtime-dev
Subject:    Re: RFR(XXS): 8188109 JVM should print a warning message that -Xshare:on may cause VM to abort start
From:       Thomas_Stüfe <thomas.stuefe () gmail ! com>
Date:       2018-05-31 6:33:05
Message-ID: CAA-vtUxO0LMMw_8LDt=MfuvGd5RZpKLafuWmK3h+NKB9qE8-LA () mail ! gmail ! com

On Thu, May 31, 2018 at 8:09 AM, Ioi Lam <ioi.lam@oracle.com> wrote:
>
>
> On 5/30/18 10:57 PM, Thomas Stüfe wrote:
>>
>> On Thu, May 31, 2018 at 7:50 AM, Ioi Lam <ioi.lam@oracle.com> wrote:
>>>
>>>
>>> On 5/30/18 9:53 PM, David Holmes wrote:
>>>>
>>>> On 31/05/2018 2:24 PM, Ioi Lam wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 5/30/18 9:13 PM, David Holmes wrote:
>>>>>>
>>>>>> Hi Ioi,
>>>>>>
>>>>>> On 31/05/2018 2:01 PM, Ioi Lam wrote:
>>>>>>>
>>>>>>> On 5/30/18 6:47 PM, David Holmes wrote:
>>>>>>>>
>>>>>>>> Hi Ioi,
>>>>>>>>
>>>>>>>> Sorry but this troubles me ...
>>>>>>>>
>>>>>>>> On 31/05/2018 9:39 AM, Ioi Lam wrote:
>>>>>>>>>
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8188109
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~iklam/jdk11/8188109-xshare-on-print-warning.v01/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Please review this one-liner patch.
>>>>>>>>>
>>>>>>>>> -Xshare:on may cause infrequent/intermittent start-up failure due
>>>>>>>>> to
>>>>>>>>> the presence of Address Space Layout Randomization (ASLR). This
>>>>>>>>> option is
>>>>>>>>> intended for testing (the internals of CDS) only and should not be
>>>>>>>>> used in
>>>>>>>>> production environments.
>>>>>>>>>
>>>>>>>>> With this patch, the following warning message is printed when
>>>>>>>>> -Xshare:on is specified:
>>>>>>>>>
>>>>>>>>> $ java -Xshare:on -version
>>>>>>>>> Java HotSpot(TM) 64-Bit Server VM warning: -Xshare:on is for
>>>>>>>>> testing
>>>>>>>>> purpose only and may cause JVM start-up failure. Use -Xshare:auto
>>>>>>>>> instead.
>>>>>>>>> java version "11-internal" 2018-09-25
>>>>>>>>> Java(TM) SE Runtime Environment 18.9 (fastdebug build
>>>>>>>>> 11-internal+0-adhoc.iklam.open)
>>>>>>>>> Java HotSpot(TM) 64-Bit Server VM 18.9 (fastdebug build
>>>>>>>>> 11-internal+0-adhoc.iklam.open, mixed mode, sharing)
>>>>>>>>
>>>>>>>>
>>>>>>>> So should this warning only be enabled in product builds?
>>>>>>>>
>>>>>>>> Even then it may be annoying for anyone who runs with -Xshare:on as
>>>>>>>> they've set up CDS as documented [1][2] and they know their
>>>>>>>> environment
>>>>>>>> works ok - now they get a warning.
>>>>>>>>
>>>>>>>> Also I'm unclear how "on" fails due to ASLR but "auto" keeps going?
>>>>>>>>
>>>>>>> The documentation [1] says:
>>>>>>>
>>>>>>> -Xshare:on
>>>>>>> To enable class data sharing. If class data sharing can't be enabled,
>>>>>>> print an error message and exit.
>>>>>>>
>>>>>>> -Xshare:auto
>>>>>>> To enable class data sharing by default. Enable class data sharing
>>>>>>> whenever possible.
>>>>>>>
>>>>>>> So if mapping fails due to ASLR, "on" will exit and "auto" will
>>>>>>> disable
>>>>>>> CDS and continue.
>>>>>>
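The difference between the two modes described above can be modeled in a few lines. This is only a sketch of the documented behavior, not HotSpot code: `ShareMode`, `StartupError` and `start_vm` are hypothetical names, and `archive_mapped` stands in for whatever the real VM checks when it tries to map the archive.

```python
from enum import Enum


class ShareMode(Enum):
    AUTO = "auto"  # models -Xshare:auto
    ON = "on"      # models -Xshare:on


class StartupError(Exception):
    """Raised when the model VM refuses to start."""


def start_vm(mode: ShareMode, archive_mapped: bool) -> bool:
    """Return True if the model VM starts with CDS enabled and False if it
    starts without CDS; raise StartupError if it refuses to start."""
    if archive_mapped:
        return True
    if mode is ShareMode.ON:
        # "on": print an error message and exit.
        raise StartupError("class data sharing could not be enabled")
    # "auto": disable CDS and continue.
    return False
```

Under this model, a mapping failure is fatal only for `ShareMode.ON`; with `ShareMode.AUTO` the caller simply gets `False` back and the program runs without sharing.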
>>>>>>
>>>>>> Ah! In the bug you state "-Xshare:auto continue to execute (with CDS
>>>>>> enabled)" - so that should be disabled.
>>>>>>
>>>>>>> The documentation in [2] is wrong.  It says "Ensure that you have
>>>>>>> specified the option -Xshare:on or -Xshare:auto.", but -Xshare:on
>>>>>>> should not
>>>>>>> be used in production environments. I have filed a doc bug for this
>>>>>>> (https://bugs.openjdk.java.net/browse/JDK-8204141).
>>>>>>>
>>>>>>> The main reason for doing this RFR is -- if people have been
>>>>>>> following
>>>>>>> [2] and using -Xshare:on, their setup is NOT OK. ASLR may happen just
>>>>>>> very
>>>>>>> rarely, but you don't want your program suddenly failing (e.g., if
>>>>>>> some
>>>>>>> admin has turned on more aggressive ASLR settings).
>>>>>>
>>>>>>
>>>>>> But conversely you don't want your application to suddenly and
>>>>>> silently
>>>>>> stop using CDS because of ASLR and you've masked that by using "auto"!
>>>>>>
>>>>>>> As more people are moving their Java workload to micro-services type
>>>>>>> of
>>>>>>> environments, JVM launches will happen more, and there will be more
>>>>>>> chances
>>>>>>> of running into the ASLR issue. Therefore, we should fix the docs,
>>>>>>> and warn
>>>>>>> people that they should switch to -Xshare:auto.
>>>>>>
>>>>>>
>>>>>> That's only a partial solution. If ASLR is a problem then that needs
>>>>>> to
>>>>>> be known and addressed.
>>>>>>
>>>>>>>> Maybe only if the archive mapping fails and "on" was used then give
>>>>>>>> a
>>>>>>>> warning? Or just improve the message given when the VM aborts?
>>>>>>>>
>>>>>>> That's already too late, especially for people running critical
>>>>>>> services.
>>>>>>>
>>>>>>> We want people to see this warning and actively fix their scripts to
>>>>>>> get rid of -Xshare:on.
>>>>>>
>>>>>>
>>>>>> I think what we want/need people to realize is that ASLR can seriously
>>>>>> impact the ability to use CDS and if you are trying to use CDS for
>>>>>> startup
>>>>>> or footprint reasons then you're going to have a major problem if CDS
>>>>>> is
>>>>>> silently disabled!
>>>>>>
>>>>>> I think "on" and "auto" are both just as valid. "on" is for people who
>>>>>> need CDS reliably and want to fail fast if it's not working. "auto" is
>>>>>> for
>>>>>> people who would like CDS but can live without it.
>>>>>>
>>>>> I think -Xshare:on has the potential to do much more harm than good.
>>>>>
>>>>> People have lived with the fact that optimizations in the JVM are not
>>>>> always deterministic. They want their programs to run regardless. CDS
>>>>> is the
>>>>> only optimization that has an option to say "let the program fail if
>>>>> the
>>>>> optimization is not available".
>>>>
>>>>
>>>> I don't agree with that characterization - I think it oversimplifies the
>>>> situation. If you want to push this analogy then the right analogy would
>>>> be
>>>> allowing "-server" to silently run the interpreter instead because there
>>>> was
>>>> some error configuring the JIT! I wonder how many users would be happy
>>>> with
>>>> that! But the bulk of optimizations are not things that can fail as such
>>>> so
>>>> I don't think the comparison holds.
>>>>
>>>> I think this is purely a documentation and education issue. Particularly
>>>> because this is not something new with JDK 11 - this is an issue that
>>>> exists
>>>> with CDS in all releases. So you want to get the message out to all
>>>> users
>>>> that "auto" may be preferable to "on".
>>>>
>>>> Until something actually fails I doubt anyone would notice your warning
>>>> anyway.
>>>>
>>>> Maybe we need -Xshare:on_and_I_really_mean_it ;-)
>>>>
>>> -Xshare:on was an ill-conceived and dangerous option. It was designed
>>> when
>>> ASLR wasn't common. With ASLR more in common use, and short-lived JVMs
>>> becoming more common, the danger is getting bigger and bigger.
>>>
>> Just out of curiosity (I never paid close attention to CDS), how do
>> you communicate the mapping address the first process establishes to
>> the subsequent processes attaching? Or do you just have a fixed well
>> known address value baked into the jvm?
>>
>> I just try to understand how probable a failed mapping could be in a
>> 64bit address space.
>
>
> The CDS archive is created at a fixed address. The default value is in the
> SharedBaseAddress option (0x800000000 on 64-bit). This is usually a good
> range on Linux, as ASLR (usually) does not place shared library segments in
> this range.
>
> However, we have analyzed our distributed test runs and found a small
> percentage (<5%?? can't remember the exact number) of cases where the
> mapping would fail. So overall you get the benefit of CDS, but not every
> single time.
>
> We have been thinking about making the CDS archive relocatable, but we're
> not there yet :-(
>
> In the short term, we can relocate by patching all the pointers. We just
> added the ability to iterate over all metaspace pointers in JDK 10 (see
> metaspaceClosure.hpp) so we can map to an alternative address and relocate.
> We can probably do part of the relocation incrementally as the classes are
> being loaded.
>

Okay... so, am I understanding this right, you fix up pointers in the
java heap or in other non-shared process local memory sections,
pointing to the metaspace? E.g. the object Klass* pointers?

What do you do about metaspace internal pointers, e.g. pointers from
one Metachunk to another? You cannot fix them, right, since you share
the memory with other processes which may have mapped it at different
bases?
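
To make the question concrete, here is a minimal model of "relocate by patching all the pointers", with addresses treated as plain integers. It is a sketch under assumptions, not the HotSpot mechanism: `relocate` and its parameters are hypothetical names, and only the 0x800000000 default comes from this thread.

```python
REQUESTED_BASE = 0x800000000  # default SharedBaseAddress on 64-bit, per this thread


def relocate(pointers, requested_base, actual_base, size):
    """Return a patched copy of `pointers` for an archive that was meant to
    live at requested_base but was actually mapped at actual_base.

    Only values inside [requested_base, requested_base + size) are treated as
    pointers into the archive; anything else (such as 0 for null) is kept.
    """
    delta = actual_base - requested_base
    end = requested_base + size
    return [p + delta if requested_base <= p < end else p for p in pointers]


# Archive mapped 4 GiB higher than requested; the null pointer is untouched.
patched = relocate([0x800000010, 0, 0x800000100],
                   REQUESTED_BASE, 0x900000000, 0x1000)
```

The patched values are only valid for the one process that mapped the archive at `actual_base`. If the patched slots live inside the mapped archive itself, two processes with different bases would need different contents for the same pages, which is exactly the sharing concern raised above.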

> In the long term, we probably should make the metadata position independent,
> so it can be mapped at any address. Not quite sure how to do that yet ...
>

This sounds (to my very uninformed mind) more promising - by using
indexes instead of pointers into the metaspace. E.g. to get a Klass*,
add base + index. Like we do already with compressed class pointers.

For many things one could even use 32bit indices, at least for Klass*
pointers, since the compressed class space cannot exceed 32bit anyway.
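
A rough sketch of the base + index idea, again with addresses as plain integers. The names `encode` and `decode` and the optional `shift` parameter are assumptions made for illustration; this is not the actual compressed class pointer code.

```python
def encode(addr, base, shift=0):
    """Turn an absolute address into a 32-bit index relative to `base`."""
    offset = addr - base
    if offset < 0 or offset % (1 << shift) != 0:
        raise ValueError("address is below base or not aligned to the shift")
    index = offset >> shift
    if index >= 1 << 32:
        raise ValueError("index does not fit in 32 bits")
    return index


def decode(index, base, shift=0):
    """Turn a stored index back into an absolute address for this process."""
    return base + (index << shift)


# The stored index is the same in every process; only `base` differs.
index = encode(0x800001000, base=0x800000000)
addr_in_other_process = decode(index, base=0x900000000)
```

Because only indices are stored, the data itself never contains a process-specific address, so it could be mapped at a different base in each process without any patching.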

I recently thought about a similar thing myself, at a much, much smaller
scale: I wondered whether it would be worth replacing all linking
pointers in Metachunk with 32-bit (or even 16-bit) indices to
shrink the Metachunk header.

One may have to get rid of the notion of linking Metaspace nodes
together (VirtualSpaceList) in favour of one continuous memory block -
but that may have other advantages too (make metaspace coding simpler,
allow for page-wise de-commit to shrink process memory, reduce number
of memory mappings per process...)

Thanks, Thomas

>
> Thanks
> - Ioi
>
>>> Just because a bad option has existed for a long time does not mean we
>>> should not fix it.
>>>
>>> The only reason for this option to exist is for diagnostic purposes. It
>>> can
>>> check for
>>>
>>>   * Did my archive fail to map due to ASLR?
>>>   * Did I specify a bad path to the archive?
>>>   * Did I specify a bad archive file (e.g., one created by a different
>>>     JDK version)?
>>>
>>> So Thomas's suggestion of providing this functionality as a diagnostic
>>> flag
>>> makes sense.
>>>
>>> In any case, I think this particular RFR is not a good way of handling
>>> the
>>> problem, so I am withdrawing it.
>>>
>>> I'll file a separate CSR to actually change how -Xshare:on works, after
>>> more
>>> discussion on how best to change it.
>>
>> I agree with all your points.
>>
>> ..Thomas
>>
>>> Thanks
>>> - Ioi
>>>
>>>
>>>> Cheers,
>>>> David
>>>>
>>>>> There are diagnostic options to find out if CDS is enabled. If you run
>>>>> with -showversion, it will tell you if sharing is enabled. People don't
>>>>> need
>>>>> their program to die just to find this out.
>>>>>
>>>>> Thanks
>>>>> - Ioi
>>>>>
>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Thanks
>>>>>>> - Ioi
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>> [1]
>>>>>>>>
>>>>>>>> https://docs.oracle.com/javase/10/vm/class-data-sharing.htm#JSJVM-GUID-0260F857-A70E-4399-A1DF-A5766BE33285
>>>>>>>> [2]
>>>>>>>>
>>>>>>>> https://docs.oracle.com/javase/10/tools/java.htm#JSWOR-GUID-31503FCE-93D0-4175-9B4F-F6A738B2F4C4
>>>>>>>>
>>>>>>>>>      --- vs ---
>>>>>>>>>
>>>>>>>>> $ java -Xshare:auto -version
>>>>>>>>> java version "11-internal" 2018-09-25
>>>>>>>>> Java(TM) SE Runtime Environment 18.9 (fastdebug build
>>>>>>>>> 11-internal+0-adhoc.iklam.open)
>>>>>>>>> Java HotSpot(TM) 64-Bit Server VM 18.9 (fastdebug build
>>>>>>>>> 11-internal+0-adhoc.iklam.open, mixed mode, sharing)
>>>>>>>>>
>>>>>>>>> I am testing with HotSpot tiers 1-3 to make sure the tests don't
>>>>>>>>> get
>>>>>>>>> tripped by this new warning message.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> - Ioi
>>>>>>>
>>>>>>>
>