List:       openjdk-jdk8-dev
Subject:    Improvements to Java Native Interface API's in JDK 8
From:       john_platts () hotmail ! com (John Platts)
Date:       2011-07-09 14:33:49
Message-ID: SNT116-W61A96805F5B44F81C865299D430 () phx ! gbl


I do agree that the problem with using Modified UTF-8 instead of Standard
UTF-8, as described in bug #5030776, really needs to be addressed.

However, here is another problem with the Java Native Interface APIs that is
not described in bug #5030776. The JNI Invocation API currently expects
arguments to the JNI_CreateJavaVM method to be encoded in the platform default
encoding. There is a real need to add support for passing in arguments to the
JNI_CreateJavaVM method using UTF-8, Modified UTF-8, or UTF-16 in Java SE 8,
for the following reasons:
- Conversion from UTF-8, Modified UTF-8, or UTF-16 to the platform default
  encoding can replace characters that are not in the platform default
  encoding with the default replacement character. In some cases, this can be
  a security risk on platforms where the platform default encoding is a
  non-Unicode-based encoding (such as on Microsoft Windows).
- The NetBeans platform, Eclipse Equinox, and Eclipse RCP can launch a Java VM
  using the JNI Invocation API.
- Filenames can appear in the arguments passed into the JNI_CreateJavaVM
  method. The JNI_CreateJavaVM API cannot accommodate filenames containing
  characters that are not in the platform default encoding, but the java.io
  and java.nio APIs can.
- The main method of Java applications and the System.getProperty,
  System.getProperties, and System.getenv methods all support UTF-16 strings
  with characters that are not in the platform default encoding.
- Windows NT-based operating systems, including Windows XP, Windows Vista,
  Windows 7, and Windows Server, can start processes with arguments, file
  names, or environment variables that contain Unicode characters that are not
  in the platform default encoding with the CreateProcessW and ShellExecuteW
  functions. On OpenJDK 7 and other Java SE implementations, whenever a
  process is started with ProcessBuilder.start or Runtime.exec methods on
  Windows-based platforms, processes are actually started using the
  CreateProcessW function.
- On Windows platforms, wide character literals are encoded in UTF-16.
  However, wide character literals are not necessarily UTF-16 encoded on
  non-Windows platforms.
- The C1x and C++0x standards provide support for UTF-8, UTF-16, and UTF-32,
  including the ability to define UTF-8, UTF-16, and UTF-32 string literals.
  The C1x standard also defines Unicode conversion functions in uchar.h. The
  jchar type maps to the char16_t type on C compilers supporting the C1x
  standard and C++ compilers supporting the C++0x standard, and maps to
  wchar_t and WCHAR on Windows platforms.
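
As a rough illustration of the first point, here is a minimal C sketch of how
a lossy narrowing conversion can make two different option strings
indistinguishable. narrow_to_latin1 and collide_after_narrowing are
hypothetical helpers written for this message, and Latin-1 merely stands in
for whatever non-Unicode platform default encoding is in use; a real Windows
launcher would go through WideCharToMultiByte instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>   /* char16_t (C1x) */

/*
 * Hypothetical helper: narrow a NUL-terminated UTF-16 string to a
 * single-byte, Latin-1 style encoding. Every UTF-16 code unit above
 * U+00FF is replaced with '?'. Output is truncated to fit cap bytes,
 * including the terminating NUL. Returns the number of code units
 * that were replaced.
 */
static size_t narrow_to_latin1(const char16_t *src, char *dst, size_t cap)
{
    size_t replaced = 0;
    size_t i;

    assert(src != NULL && dst != NULL && cap > 0);
    for (i = 0; src[i] != 0 && i + 1 < cap; i++) {
        if (src[i] <= 0xFF) {
            dst[i] = (char)src[i];
        } else {
            dst[i] = '?';          /* information is lost here */
            replaced++;
        }
    }
    dst[i] = '\0';
    return replaced;
}

/*
 * Returns 1 if two UTF-16 strings become byte-for-byte identical
 * after narrowing, 0 otherwise.
 */
static int collide_after_narrowing(const char16_t *x, const char16_t *y)
{
    char a[256], b[256];

    narrow_to_latin1(x, a, sizeof a);
    narrow_to_latin1(y, b, sizeof b);
    return strcmp(a, b) == 0;
}
```

With this sketch, u"-Dfile=\u65e5.jar" and u"-Dfile=\u672c.jar" both narrow to
"-Dfile=?.jar", so code that checks the narrowed string can no longer tell
which file was actually named.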

There is still a need to allow arguments and options to be passed into the
JNI_CreateJavaVM method using the platform default encoding, both for
backwards compatibility and to support non-Windows platforms.

On Windows platforms, the NetBeans launcher and the executable files in the
bin directory of the Java Runtime Environment and Java Development Kit need to
be updated to use wmain or wWinMain, and to pass in arguments and options to a
Java SE 8 or later VM using Unicode. Arguments and environment variables
actually get converted from Unicode to the platform default encoding if main
and WinMain are used instead of wmain and wWinMain, or if they are obtained
using the getenv, GetCommandLineA, GetEnvironmentStringsA, or
GetEnvironmentVariableA functions instead of the _wgetenv, GetCommandLineW,
GetEnvironmentStringsW, or GetEnvironmentVariableW functions.

Allowing options to be passed into the JNI_CreateJavaVM method using UTF-16
eliminates the need to convert options from UTF-16 to the platform default
encoding on Windows platforms. The VM converts these options into Java
strings, which are UTF-16, so options passed in using the platform default
encoding have to be converted back into UTF-16 anyway. If the options are
passed into the JNI_CreateJavaVM method using UTF-16 in the first place, both
conversions can be avoided.
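
To make this concrete, here is a minimal sketch of how a launcher might fill
in the UTF-16 variant of the proposed structures from my earlier message
(quoted below). The jint, jchar, and jboolean typedefs are stand-ins so the
fragment compiles without jni.h, make_utf16_args is a hypothetical helper, and
the structure definitions themselves are only a proposal that is still subject
to change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <uchar.h>   /* char16_t (C1x) */

/* Stand-ins for the jni.h types, so the sketch builds without a JDK. */
typedef int32_t       jint;
typedef uint16_t      jchar;
typedef unsigned char jboolean;
#define JNI_FALSE 0

/* Proposed definitions, copied from the quoted message. */
#define JNI_VERSION_1_8    0x00010008
#define JNI_ENCODING_UTF16 3

typedef struct JavaVMOption8_UTF16 {
    jchar *optionString;
    void  *extraInfo;
} JavaVMOption8_UTF16;

typedef struct JavaVMInitArgs8_UTF16 {
    jint version;             /* must be set to JNI_VERSION_1_8 */
    jint optionCharEncoding;  /* must be set to JNI_ENCODING_UTF16 */
    jint nOptions;
    JavaVMOption8_UTF16 *options;
    jboolean ignoreUnrecognized;
} JavaVMInitArgs8_UTF16;

/* The cast from char16_t* to jchar* below relies on both being 16 bits. */
_Static_assert(sizeof(jchar) == sizeof(char16_t),
               "jchar and char16_t must have the same size");

/*
 * Hypothetical helper: wrap option strings that are already UTF-16.
 * No re-encoding takes place; the pointers are stored as given.
 */
static JavaVMInitArgs8_UTF16 make_utf16_args(JavaVMOption8_UTF16 *opts, jint n)
{
    JavaVMInitArgs8_UTF16 args;

    assert(n >= 0 && (n == 0 || opts != NULL));
    args.version = JNI_VERSION_1_8;
    args.optionCharEncoding = JNI_ENCODING_UTF16;
    args.nOptions = n;
    args.options = opts;
    args.ignoreUnrecognized = JNI_FALSE;
    return args;
}
```

A launcher built around wmain would point each optionString at its wide
argument (or at a u"..." literal copied into a writable array) and hand the
resulting structure to JNI_CreateJavaVM on a VM that implements the proposal,
with no call to WideCharToMultiByte in between.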

----------------------------------------
> Date: Mon, 27 Jun 2011 17:00:44 -0600
> From: daniel.daugherty at oracle.com
> To: john_platts at hotmail.com; jdk8-dev at openjdk.java.net
> Subject: Re: Improvements to Java Native Interface API's in JDK 8
> 
> This (old) bug seems related to this proposal:
> 
> 5030776 4/5 UTF-8 strings support doc change
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5030776
> 
> Dan
> 
> 
> On 6/23/2011 9:03 AM, John Platts wrote:
> > One of the issues with the Java Native Interface Invocation API is that
> > the arguments passed into the JNI_CreateJavaVM method are in the default
> > platform encoding. Here are the problems with this approach:
> > - On Windows, the default platform encoding is set to a non-Unicode
> >   charset.
> > - There are Unicode-only locales in Windows 2000 and later, and these
> >   locales use characters that are not in ASCII.
> > - File names on Windows NT-based operating systems can contain characters
> >   that are not in the default platform encoding.
> > - JVM arguments can contain characters that are not in the default
> >   platform encoding. The conversion of strings containing characters that
> >   are not in the default platform encoding might pose a security risk in
> >   certain circumstances.
> > - The JVM converts arguments passed into the JVM from the platform default
> >   encoding to UTF-16.
> > 
> > There needs to be a mechanism that allows Unicode-encoded arguments to be
> > passed into the JNI_CreateJavaVM method on Java SE 8 or later. This
> > mechanism requires new versions of JavaVMInitArgs, and a UTF-16 version of
> > JavaVMOption (which is used when the UTF-16 encoding is specified).
> > 
> > Here are updated definitions of the Java Native Interface Invocation API
> > in Java SE 8 to support passing in VM options in Unicode, although the
> > definitions are still subject to change at this point:
> > 
> > #define JNI_VERSION_1_8 0x00010008
> > 
> > #define JNI_ENCODING_DEFAULT 0
> > #define JNI_ENCODING_MODIFIED_UTF8 1
> > #define JNI_ENCODING_STANDARD_UTF8 2
> > #define JNI_ENCODING_UTF16 3
> > 
> > typedef struct JavaVMOption8 {
> > char *optionString;
> > void *extraInfo;
> > } JavaVMOption8;
> > 
> > typedef struct JavaVMOption8_UTF16 {
> > jchar *optionString;
> > void *extraInfo;
> > } JavaVMOption8_UTF16;
> > 
> > typedef struct JavaVMInitArgs8 {
> > jint version; /* must be set to JNI_VERSION_1_8 */
> > 
> > /* optionCharEncoding must be set to one of the following values: */
> > /* JNI_ENCODING_DEFAULT - Platform default encoding */
> > /* JNI_ENCODING_MODIFIED_UTF8 - Modified UTF-8 encoding */
> > /* JNI_ENCODING_STANDARD_UTF8 - Standard UTF-8 encoding */
> > jint optionCharEncoding;
> > 
> > jint nOptions;
> > /* The optionString value of each of the options is in the */
> > /* encoding specified in optionCharEncoding. */
> > JavaVMOption8 *options;
> > 
> > jboolean ignoreUnrecognized;
> > } JavaVMInitArgs8;
> > 
> > typedef struct JavaVMInitArgs8_UTF16 {
> > jint version; /* must be set to JNI_VERSION_1_8 */
> > jint optionCharEncoding; /* must be set to JNI_ENCODING_UTF16 */
> > jint nOptions;
> > JavaVMOption8_UTF16 *options;
> > jboolean ignoreUnrecognized;
> > } JavaVMInitArgs8_UTF16;
> > 
> > Here are the advantages of the new definitions:
> > - The JVM can verify that Modified UTF-8, Standard UTF-8, and UTF-16 input
> >   is not malformed.
> > - The programmer must specify the encoding used for the options passed
> >   into the VM. This improves correctness, improves portability, minimizes
> >   security risks, and makes review of code using the JNI Invocation API
> >   easier.
> > - JVM options containing characters that are not in the platform default
> >   encoding can be passed into the JNI Invocation API, as long as the
> >   options contain valid Unicode characters.
> > - There is no longer a need to convert from UTF-16 strings to the
> >   platform-specific encoding on Windows platforms. This makes writing code
> >   using the JNI Invocation API easier on Windows platforms, since there is
> >   no longer a need to use WideCharToMultiByte to convert UTF-16-encoded
> >   options to the default platform encoding.
> > - The NetBeans and Eclipse launchers can start the Java VM using the JNI
> >   Invocation API. The updates above can solve problems with the NetBeans
> >   and Eclipse launchers on Windows platforms, as the updates allow VM
> >   options to be passed in using Unicode instead of the default platform
> >   encoding.
> > 
> > The executable files in the bin directory of JDK 8 and later need to be
> > Unicode-enabled on Windows platforms. In addition, the NetBeans launcher
> > needs to be Unicode-enabled on Windows platforms, and pass in options
> > using Unicode whenever a Java SE 8 or later VM is launched through the
> > NetBeans launcher.
> > 
> > The Java Native Interface APIs use Modified UTF-8 encoding instead of
> > Standard UTF-8. There are several issues with having strings encoded as
> > Modified UTF-8:
> > - These strings are often incorrectly treated as Standard UTF-8 strings or
> >   strings encoded in the default platform encoding.
> > - Many native APIs (with the exception of the Java Native Interface APIs)
> >   expect strings to be in the default platform encoding, Standard UTF-8,
> >   or UTF-16.
> > - Many JNI native libraries have bugs because they incorrectly treat
> >   Modified UTF-8 strings as Standard UTF-8 strings or strings in the
> >   default platform encoding. Some of these libraries also incorrectly pass
> >   Standard UTF-8-encoded strings or strings encoded in the default
> >   platform encoding into JNI methods without converting these strings into
> >   Modified UTF-8.
> > 
> > New versions of the following JNI methods need to be added into JNI in
> > Java SE 8, with an additional argument to specify the character encoding
> > used:
> > - DefineClass
> > - FindClass
> > - ThrowNew
> > - FatalError
> > - GetFieldID
> > - GetMethodID
> > - GetStaticFieldID
> > - GetStaticMethodID
> > - NewStringUTF
> > - GetStringUTFLength
> > - GetStringUTFChars
> > - ReleaseStringUTFChars
> > - GetStringUTFRegion
> > - RegisterNatives
> > - AttachCurrentThread
> > - AttachCurrentThreadAsDaemon
> > 
> > New versions of these APIs are needed to address correctness issues with
> > JNI code. The semantics of the existing versions of these methods need to
> > remain unchanged to avoid breaking backwards compatibility.
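
For reference, the two places where Modified UTF-8 and Standard UTF-8 disagree
are U+0000 and the supplementary characters above U+FFFF. The following sketch
encodes a single code point both ways, which shows why treating one encoding
as the other goes wrong. encode_upto3, encode_standard_utf8, and
encode_modified_utf8 are hypothetical helper names, not JNI functions, and the
code assumes its input is a valid Unicode code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a value below 0x10000 in the 1 to 3 byte forms both encodings share. */
static size_t encode_upto3(uint32_t v, unsigned char *out)
{
    if (v < 0x80) {
        out[0] = (unsigned char)v;
        return 1;
    }
    if (v < 0x800) {
        out[0] = (unsigned char)(0xC0 | (v >> 6));
        out[1] = (unsigned char)(0x80 | (v & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (v >> 12));
    out[1] = (unsigned char)(0x80 | ((v >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (v & 0x3F));
    return 3;
}

/* Standard UTF-8: supplementary characters take one 4-byte sequence. */
static size_t encode_standard_utf8(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000)
        return encode_upto3(cp, out);
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/*
 * Modified UTF-8: U+0000 becomes the two bytes C0 80, and a
 * supplementary character is written as its UTF-16 surrogate pair,
 * each surrogate in the 3-byte form (6 bytes in total).
 */
static size_t encode_modified_utf8(uint32_t cp, unsigned char out[6])
{
    uint32_t v;
    size_t n;

    assert(cp <= 0x10FFFF);
    if (cp == 0) {
        out[0] = 0xC0;
        out[1] = 0x80;
        return 2;
    }
    if (cp < 0x10000)
        return encode_upto3(cp, out);
    v = cp - 0x10000;
    n = encode_upto3(0xD800 | (v >> 10), out);              /* high surrogate */
    return n + encode_upto3(0xDC00 | (v & 0x3FF), out + n); /* low surrogate */
}
```

For everything between U+0001 and U+FFFF the two functions produce the same
bytes, which is probably why the mix-up so often goes unnoticed in testing.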
 		 	   		  

