[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-hotspot-runtime-dev
Subject:    Re: RFR(S): 8188122: Path length limits on Windows leads to obscure class loading failures
From:       Thomas_Stüfe <thomas.stuefe () gmail ! com>
Date:       2017-10-27 7:55:10
Message-ID: CAA-vtUw0UVGABJBEMTUARdYa1WJENdus3DTKjT3aedNsc1z5Ug () mail ! gmail ! com
[Download RAW message or body]

Hi Calvin,

On Fri, Oct 27, 2017 at 2:03 AM, Calvin Cheung <calvin.cheung@oracle.com>
wrote:

> Hi Thomas,
>
> On 10/25/17, 11:54 PM, Thomas Stüfe wrote:
>
>> Hi Calvin,
>>
>> this is a worthwhile addition, thank you for your work!
>>
> Thanks for your review.


Thanks for your work :)

First off, there is another issue in file_attribute_data_to_stat(). We
actually had this issue before, but your work uncovered it:

According to POSIX (
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/sys/stat.h.html)
and every unix manpage I looked at (solaris, linux, aix..), st_ctime is not
the file creation time but the last time file status was changed. Only
Microsoft gets this wrong in their posix layer, its stat() function returns
 (https://msdn.microsoft.com/en-us/library/14h5k7ff.aspx) creation time in
st_ctime.

But as os::stat() is a platform independent layer, I'd say we should behave
the same on all platforms, and that would be the Posix way.

I did not find any code calling os::stat() and using st_ctime, so this is
still save to change for windows. (Note that I found that
perfMemory_xxx.cpp uses plain OS ::stat and misuses ctime as "creation
time" - I opened a bug for that (
https://bugs.openjdk.java.net/browse/JDK-8190260 - but that does not affect
us, as they do not call os::stat()).

There is one more complication: in os::stat() we use plain ::stat() in the
single byte case.:

*+ if (strlen(path) < MAX_PATH) {*

*+     ret = ::stat(pathbuf, sbuf);**+   } else {*


But ::stat() on Windows is broken, as I wrote above. We should not use it,
especially not if we change the meaing of st_ctime in the double byte path.

My suggestion would be to just always call GetFileAttributes(), also for
the single byte path. A very simple solution would be to just always go the
double byte path with UNC translation, regardless of the path length. Or,
if you are worried about performance, for paths shorter than MAX_PATH we
use GetFileAttributesA(), for longer paths GetFileAttributesW with UNC
translation. In both cases you get a WIN32_FILE_ATTRIBUTE_DATA which you
can feed tp your  file_attribute_data_to_stat() routine and have the struct
stat you return be always consistent.


But onward:


>
>> =========================
>>
>> src/hotspot/os/windows/os_windows.cpp
>>
>> Could you please use wchar_t instead of WCHAR?
>>
> Fixed.
>
>>
>> ---------------
>> in os::stat():
>>
>> This cumbersome malloc/strcpy sequence:
>>
>> !   size_t path_len = strlen(path) + 1;
>> +   char* pathbuf = (char*)os::malloc(path_len * sizeof(char),
>> mtInternal);
>> +   if (pathbuf == NULL) {
>> +     errno = ENOMEM;
>>       return -1;
>>     }
>>     os::native_path(strcpy(pathbuf, path));
>>
>>  can be reduced to a simple strdup:
>>
>> char* pathbuf = os::strdup(path, mtInternal);
>> if (pathbuf == NULL) {
>>   errno = ENOMEM;
>>   return -1;
>> }
>> os::native_path(pathbuf);
>>
>> This also would separate strcpy() from os::native_path() which is a bit
>> unreadable.
>>
> I've made the change.
>
>>
>> --------------------
>>
>> The single-byte path (strdup, ::stat()), together with its free(), can be
>> moved inside the (strlen(path) < MAX_PATH) condition. os::native_path will
>> not change the path length (it better not, as it operates on the input
>> buffer). That avoids  having two allocations on the doublebyte path.
>>
>
> os::native_path() converts a path like C:\\\\somedir to C:\\somedir
> So I'll need the converted path in the wchar_t case too. The freeing of
> the pathbuf is being done at the end of the function or in the middle of
> the wchar_t case if there's an error.


Oh, you are right,of course. I missed that it this is needed for both paths.


>
>
>> -----------------------
>>
>> But seeing that translation to UNC paths is something we might want more
>> often, I would combine allocation and UNC prefix adding to one function
>> like this, to avoid the memove and increase reusability:
>>
>> // Creates an unc path from a single byte path. Return buffer is
>> allocated in C heap and needs to be freed by caller. Returns NULL on error.
>> static whchar_t* create_unc_path(const char* s) {
>>   - does s start with \\?\ ?
>> - yes:
>>             - os::malloc(strlen(s) + 1) and mbstowcs_s
>>         - no:
>>             - os::malloc(strlen(s) + 1 + 4), mbstowcs_s to fourth
>> position in string, prefix with L"\\?\"
>> }
>>
>
> I also include the case for adding  L"\\\\?\\UNC\0" at the beginning to be
> consistent with libjava/canonicalize_md.c.


>> We also need error checking to mbstowcs_s.
>>
> I've added assert like the following after the call:
>
> assert(converted_chars == path_len, "sanity");



create_unc_path() :

- could you convert the /**/ to // comments, please?

- not sure about the assert after mbstowcs_s: if we happen to encounter an
invalid multibyte character, function will fail and return an error:

https://msdn.microsoft.com/en-us/library/eyktyxsx.aspx
"If mbstowcs_s encounters an invalid multibyte character, it puts 0 in
*``pReturnValue, sets the destination buffer to an empty string, sets errno
to EILSEQ, and returns EILSEQ."

 As this is dependent on user data, we should not assert, but handle the
return code of mbstowcs_s gracefully. Same goes for
libjava/canonicalize_md.c.

- Here: ::mbstowcs_s(&converted_chars, &wpath[7], path_len + 7, path,
path_len);
third parameter is wrong, as we hand in an offset into the buffer, we must
decrement the buffer size by this offset, so correct would be path_len +7 -
7 or just path_len.

- Same error below: + ::mbstowcs_s(&converted_chars, &wpath[4], path_len +
4, path, path_len);



>
>
>> Just for cleanliness, I would then wrap deallocation into an own function.
>>
>> static viud destroy_unc_path(whchar_t* s) { os::free(s); }
>>
> I've added the destroy function.
>
>>
>> These two functions could be candidates of putting into os::win32
>> namespace, as this form of ANSI->UNC path translation is quite common -
>> whoever wants to use the xxxxW() functions must do this.
>>
> I'm leaving them in os_windows.cpp since they're being used only within
> that file.


Fine by me.


>
>
>> -----------------------
>>
>> FindFirstFileW:
>>
>> I am pretty sure that you can achieve the same result with
>> GetFileAttributesExW(). It also returns WIN32_FIND_DATAW.
>>
> It actually returns WIN32_FILE_ATTRIBUTE_DATA and is very similar to
> WIN32_FIND_DATAW.
>
>>
>> It is way more straightforward to use than FindFirstFileW, as it does not
>> require you to write a callback hook.
>>
> I've switched to using GetFileAttributesExW().


Thank you, this is way more readable.

Another issue: do we still need the fix for 6539723 if we switch from
stat() to GetFileAttributesExW()? This fix looks important, but the comment
indicates that it could break things if the original bug is not present.

Btw, this is another strong argument for scrapping ::stat() altogether on
all code paths, not only for long input paths, because stat() and
GetFileAttributesExW() may behave
differently. So it would be good to use the same API on all code paths, in
order to get the best test coverage.


>
>> -------
>>
>> eval_find_data(): This is more of a generic helper function, could you
>> rename this to something clearer, e.g. make_double_word() ?
>>
> Ok. I've renamed it.
>
>>
>> Also, setup_stat(), could this be renamed more clearly to something like
>> WIN32_FIND_DATA_to_stat? or lowercase if this bothers you :)
>>
> I'm naming the function as file_attribute_data_to_stat() to match with the
> data structure name.


Thanks for taking my suggestions.


>
>
>> ==================================
>> src/hotspot/share/classfile/classLoader.cpp
>>
>> In ClassPathDirEntry::open_stream(), I would feel better if we were
>> asserting _dir and name to be != NULL before feeding it to strlen.
>>
> I've added an assert statement.
>
>>
>> ===================
>>
> Here's an updated webrev:
>     http://cr.openjdk.java.net/~ccheung/8188122/webrev.02/
>
>
Thanks!

Thomas


> thanks,
> Calvin
>
>
>> Thanks!
>>
>> Thomas
>>
>>
>>
>>
>>
>>
>> On Wed, Oct 25, 2017 at 9:48 PM, Calvin Cheung <calvin.cheung@oracle.com
>> <mailto:calvin.cheung@oracle.com>> wrote:
>>
>>     I've reworked this fix by using the Unicode version of open
>>     (wopen) to handle path name longer than max path with the path
>>     prefixed to indicate an extended length path as described in [1].
>>
>>     The Unicode version of stat (wstat) doesn't work well with long
>>     path [2]. So FindFirstFileW will be used.The data in
>>     WIN32_FIND_DATA returned from FindFirstFileW needs to be
>>     transferred to the stat structure since the caller expects a
>>     return stat structure and other platforms return a stat structure.
>>
>>     In classLoader.cpp, calculate the size of buffer required instead
>>     of limiting it to JVM_MAXPATHLEN.
>>     In os_windows.cpp, dynamically allocate buffers in os::open and
>>     os::stat.
>>
>>     updated webrev:
>>     http://cr.openjdk.java.net/~ccheung/8188122/webrev.01/
>>     <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.01/>
>>
>>     It passed hs-tier2 testing using mach5.
>>
>>     thanks,
>>     Calvin
>>
>>     [1]
>>     https://msdn.microsoft.com/en-us/library/windows/desktop/aa3
>> 65247(v=vs.85).aspx#MAX_PATH
>>     <https://msdn.microsoft.com/en-us/library/windows/desktop/aa
>> 365247%28v=vs.85%29.aspx#MAX_PATH>
>>
>>     [2]
>>     https://social.msdn.microsoft.com/Forums/vstudio/en-US/3c093
>> ea9-f0aa-446d-b648-2dabe8480430/stat-and-long-names?forum=vcgeneral
>>     <https://social.msdn.microsoft.com/Forums/vstudio/en-US/3c09
>> 3ea9-f0aa-446d-b648-2dabe8480430/stat-and-long-names?forum=vcgeneral>
>>
>>
>>
>>     On 10/16/17, 3:15 PM, Calvin Cheung wrote:
>>
>>         JBS: https://bugs.openjdk.java.net/browse/JDK-8188122
>>         <https://bugs.openjdk.java.net/browse/JDK-8188122>
>>
>>         Adding a warning message if the full path or the directory
>>         length exceeds MAX_PATH on windows.
>>
>>         Example warning messages.
>>
>>         1) The full path exceeds MAX_PATH:
>>
>>         Java HotSpot(TM) 64-Bit Server VM warning: construct full path
>>         name failed: total length 270 exceeds max length of 260
>>             dir
>>         T:\\testoutput\\jtreg\\JTwork\\classes\\2\\runtime\\LoadClas
>> s\\LongPath.d\\xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>> xxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>         length 259
>>             name Hello.class length 11
>>
>>         2) The directory path exceeds MAX_PATH:
>>
>>         Java HotSpot(TM) 64-Bit Server VM warning: os::stat failed:
>>         path length 265 exceeds max length 260
>>             path
>>         T:\\testoutput\\jtreg\\JTwork\\classes\\2\\runtime\\LoadClas
>> s\\LongPath.d\\xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>> xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>> xxxxxxxxxxxxxxxxxxxxxxxxxxxx\\xxxxx
>>
>>         webrev:
>>         http://cr.openjdk.java.net/~ccheung/8188122/webrev.00/
>>         <http://cr.openjdk.java.net/%7Eccheung/8188122/webrev.00/>
>>
>>         Testing:
>>             JPRT (including the new test)
>>
>>         thanks,
>>         Calvin
>>
>>
>>
[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic