[prev in list] [next in list] [prev in thread] [next in thread]
List: openjdk-hotspot-runtime-dev
Subject: Re: RFR: 8240197: Cannot start JVM when $JAVA_HOME includes CJK characters
From: Yasumasa Suenaga <suenaga () oss ! nttdata ! com>
Date: 2020-02-28 7:39:08
Message-ID: d0f8952a-778b-167e-b6f3-252fad461326 () oss ! nttdata ! com
[Download RAW message or body]
Hi Thomas,
Thanks for your review!
I refactored entire of this function as result :)
Please review again this webrev:
http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.01/
On 2020/02/28 14:43, Thomas Stüfe wrote:
> Hi Yasumasa,
>
> thanks for fixing this. CC'ing Ralf since this was his change.
>
> -----
>
> Not sure MB_ERR_INVALID_CHARS with a subsequent return NULL is a good option for \
> conversion errors? Would it not better to use the default behavior of replacing \
> invalid code points with U+FFFD?
I think we should use MB_ERR_INVALID_CHARS at this point because invalid chars should \
not be used in the path. If invalid char(s) are converted to special char (e.g. \
U+FFFD), it is not mapped to original path.
> -----
>
> Can the result buffer overflow? UTF16 is a variable sized char set, code points can \
> overflow into the upper plane and use 32bit. Result buffer size is calculated in \
> terms of sizeof(wchar_t). Not sure what size wchar_t is; if it is 16bit the result \
> buffer could be too small. If it is 32bit I'd be really confused too :) since we \
> would have had 32bit-wide characters before and now we deal with UTF16?
> Should it be too small, there are two solutions:
>
> 1) either over-allocate for the worst case: for each input char allocate space for \
> two 16bit code points. 2) or, call MultiByteToWideChar twice, first time with 0 as \
> output buffer size. Then, function just returns output buffer size, which you can \
> allocate and repeat the call.
> I would at least like to see an explicit assert as toward (sizeof(uint_16) * 2 * \
> num output chars) < result buffer size.
We can get required buffer size when we pass zero to cchWideChar.
MultiByteToWideChar (Microsoft Docs)
https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-multibytetowidechar
New webrev passed zero to cchWideChar at first, and it would allocate buffer with its \
value. So buffer overflow would not occur.
> -----
>
> Come to think of it, the intermixing of wchar_t* with LPWSTR is a bit confusing. It \
> feeds a wchar_t* as result buffer to MultiByteToWideChar. Can you make path_start a \
> LPWSTR and cast it explicitely?
I changed wchar_t to WCHAR, and wchar_t* to LPWSTR.
> ----
>
> I assume you call ZeroMemory to make the result string zero terminated? A cheaper \
> and simpler way would be to either pass -1 as input buffer length or to fix up the \
> terminating zero after the call. I'd probably do the former.
If we pass -1 to cbMultiByte, MultiByteToWideChar() processes the entire input \
string, including the terminating null character (See Microsoft Doc). So I use it in \
new webrev.
> ----
>
> Not a must do, but regression tests would be nice. Easiest thing I could think of \
> would be testing wide_abs_unc_path() in a gtest with some CJK literals. Now that I \
> think of it, regression tests may be good for this function in general.
> For that wide_abs_unc_path would have to be exported (the static removed) and \
> declared as a prototype in the associated gtest, if we do not want to pollute a \
> header with it.
It is nice idea, but MultiByteToWideChar() would perform with available locale. If we \
check wide_abs_unc_path() with Japanese characters, Japanese language need to be \
installed (See Microsoft Doc).
So it is difficult to test on all platform.
Thanks,
Yasumasa
> -----
>
> Cheers, Thomas
>
>
> On Fri, Feb 28, 2020 at 2:55 AM Yasumasa Suenaga <suenaga@oss.nttdata.com \
> <mailto:suenaga@oss.nttdata.com>> wrote:
> Hi all,
>
> Please review this change:
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8240197
> webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8240197/webrev.00/
>
> JVM cannot start as below when CJK characters are included in $JAVA_HOME.
>
> ```
> PS > .\build\windows-x86_64-server-fastdebug\images\jdk日本語\bin\java.exe --version
> Error occurred during initialization of VM
> Failed setting boot class path.
> ```
>
> I think this bug has been occurred since JDK-8191521.
> It uses mbstowcs_s() to convert char* to wchar_t*. But Windows API for wchar_t* \
> (they are named *W()) requires Unicode chars for their arguments. mbstowcs_s() \
> would not convert to Unicode. We need to use MultiByteToWideChar().
> This issue has been also confirmed in AdoptOpenJDK 11.0.6 [1]. So we also need to \
> change jdk11u.
> I tested this change on submit repo, then it was failed \
> (mach5-one-ysuenaga-JDK-8240197-20200228-0138-9049982). However the error was not \
> caused by this change because it was occurred in all platforms in spite of I \
> changed for Windows only. (Of course, it works well on my Windows 10 laptop.)
>
> I'm happy if you help to investigate cause of error.
>
>
> Thanks,
>
> Yasumasa
>
>
> [1] https://github.com/AdoptOpenJDK/openjdk-build/issues/1496
>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic