'Re: RFR: 8240197: Cannot start JVM when $JAVA_HOME includes CJK characters'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-hotspot-runtime-dev
Subject:    Re: RFR: 8240197: Cannot start JVM when $JAVA_HOME includes CJK characters
From:       Yasumasa Suenaga <suenaga () oss ! nttdata ! com>
Date:       2020-02-28 7:39:08
Message-ID: d0f8952a-778b-167e-b6f3-252fad461326 () oss ! nttdata ! com
[Download RAW message or body]

Hi Thomas,

Thanks for your review!
I refactored entire of this function as result :)

Please review again this webrev:

   http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.01/

On 2020/02/28 14:43, Thomas Stüfe wrote:
> Hi Yasumasa,
> 
> thanks for fixing this. CC'ing Ralf since this was his change.
> 
> -----
> 
> Not sure MB_ERR_INVALID_CHARS with a subsequent return NULL is a good option for \
> conversion errors? Would it not better to use the default behavior of replacing \
> invalid code points with U+FFFD?

I think we should use MB_ERR_INVALID_CHARS at this point because invalid chars should \
not be used in the path. If invalid char(s) are converted to special char (e.g. \
U+FFFD), it is not mapped to original path.

> -----
> 
> Can the result buffer overflow? UTF16 is a variable sized char set, code points can \
> overflow into the upper plane and use 32bit. Result buffer size is calculated in \
> terms of sizeof(wchar_t). Not sure what size wchar_t is; if it is 16bit the result \
> buffer could be too small. If it is 32bit I'd be really confused too :) since we \
> would have had 32bit-wide characters before and now we deal with UTF16? 
> Should it be too small, there are two solutions:
> 
> 1) either over-allocate for the worst case: for each input char allocate space for \
> two 16bit code points. 2) or, call MultiByteToWideChar twice, first time with 0 as \
> output buffer size. Then, function just returns output buffer size, which you can \
> allocate and repeat the call. 
> I would at least like to see an explicit assert as toward (sizeof(uint_16) * 2 * \
> num output chars) < result buffer size.

We can get required buffer size when we pass zero to cchWideChar.

   MultiByteToWideChar (Microsoft Docs)
     https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-multibytetowidechar


New webrev passed zero to cchWideChar at first, and it would allocate buffer with its \
value. So buffer overflow would not occur.

> -----
> 
> Come to think of it, the intermixing of wchar_t* with LPWSTR is a bit confusing. It \
> feeds a wchar_t* as result buffer to MultiByteToWideChar. Can you make path_start a \
> LPWSTR and cast it explicitely?

I changed wchar_t to WCHAR, and wchar_t* to LPWSTR.

> ----
> 
> I assume you call ZeroMemory to make the result string zero terminated? A cheaper \
> and simpler way would be to either pass -1 as input buffer length or to fix up the \
> terminating zero after the call. I'd probably do the former.

If we pass -1 to cbMultiByte, MultiByteToWideChar() processes the entire input \
string, including the terminating null character (See Microsoft Doc). So I use it in \
new webrev.

> ----
> 
> Not a must do, but regression tests would be nice. Easiest thing I could think of \
> would be testing wide_abs_unc_path() in a gtest with some CJK literals. Now that I \
> think of it, regression tests may be good for this function in general. 
> For that wide_abs_unc_path would have to be exported (the static removed) and \
> declared as a prototype in the associated gtest, if we do not want to pollute a \
> header with it.

It is nice idea, but MultiByteToWideChar() would perform with available locale. If we \
check wide_abs_unc_path() with Japanese characters, Japanese language need to be \
installed (See Microsoft Doc).

So it is difficult to test on all platform.


Thanks,

Yasumasa


> -----
> 
> Cheers, Thomas
> 
> 
> On Fri, Feb 28, 2020 at 2:55 AM Yasumasa Suenaga <suenaga@oss.nttdata.com \
> <mailto:suenaga@oss.nttdata.com>> wrote: 
> Hi all,
> 
> Please review this change:
> 
> JBS: https://bugs.openjdk.java.net/browse/JDK-8240197
> webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.00/
> 
> JVM cannot start as below when CJK characters are included in $JAVA_HOME.
> 
> ```
> PS > .\build\windows-x86_64-server-fastdebug\images\jdk日本語\bin\java.exe --version
> Error occurred during initialization of VM
> Failed setting boot class path.
> ```
> 
> I think this bug has been occurred since JDK-8191521.
> It uses mbstowcs_s() to convert char* to wchar_t*. But Windows API for wchar_t* \
> (they are named *W()) requires Unicode chars for their arguments. mbstowcs_s() \
> would not convert to Unicode. We need to use MultiByteToWideChar(). 
> This issue has been also confirmed in AdoptOpenJDK 11.0.6 [1]. So we also need to \
> change jdk11u. 
> I tested this change on submit repo, then it was failed \
> (mach5-one-ysuenaga-JDK-8240197-20200228-0138-9049982). However the error was not \
> caused by this change because it was occurred in all platforms in spite of I \
> changed for Windows only. (Of course, it works well on my Windows 10 laptop.)
> 
> I'm happy if you help to investigate cause of error.
> 
> 
> Thanks,
> 
> Yasumasa
> 
> 
> [1] https://github.com/AdoptOpenJDK/openjdk-build/issues/1496
> 


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic