'[capi-sig]Learning from JNI [was: Opaque handle API]'

List:       python-capi-sig
Subject:    [capi-sig]Learning from JNI [was: Opaque handle API]
From:       Neil Schemenauer <nas-python () arctrix ! com>
Date:       2019-03-01 19:23:25
Message-ID: 20190301192325.irrvwgdhhzwv2cfg () python ! ca

On 2019-02-28, Carl Shapiro wrote:
> Because of all of the accumulated experience with handles in other systems,
> I think CPython is positioned to do much better than its predecessors.

I spent some time last night reading about JNI and I see that it
solves many of the problems we are trying to solve.  Certainly we
should learn from it.

You can download a PDF copy of the JNI book here:

    http://java.sun.com/docs/books/jni/

This looks like a useful article as well, outlining common mistakes
when using the JNI:

    https://www.ibm.com/developerworks/library/j-jni/index.html

The JNI book is pretty old so I'm not sure if JNI has evolved a lot
since then.  However, after only skimming the book last night, I
find lots of interesting ideas.

First, JNI passes a JNIEnv pointer as the first argument of all
native methods.  I think it is similar to our threadstate structure.
Explicitly passing it avoids some problems.  Since Java doesn't have
a GIL, smoothly handling threading is a big deal.  I don't know if
we should emulate that and pass threadstate (or something similar)
as well.

At a recent core sprint, I recall discussing an idea like that with
Dino and Carl.  E.g. a new flag for extension modules that would
make CPython pass the threadstate to extension functions.  I'm
pretty ignorant when it comes to multi-threading but I think those
guys thought looking it up in thread local storage might be quick
enough, rather than explicitly passing it everywhere.
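
To make the two options concrete, here is a minimal sketch in C.  The
type ts_state and every function name in it are hypothetical stand-ins,
not the real PyThreadState API, and the thread-local variable assumes a
C11 compiler:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-thread interpreter state. */
typedef struct {
    int call_depth;
} ts_state;

/* Option A: the state lives in thread-local storage (C11 _Thread_local)
 * and every API function looks it up on entry. */
static _Thread_local ts_state *ts_current = NULL;

static ts_state *
ts_get(void)
{
    assert(ts_current != NULL);  /* the runtime must have set it */
    return ts_current;
}

static int
ts_enter_tls(void)
{
    ts_state *ts = ts_get();     /* one TLS read per call */
    return ++ts->call_depth;
}

/* Option B: JNI style, the caller hands the state in explicitly. */
static int
ts_enter_explicit(ts_state *ts)
{
    return ++ts->call_depth;
}
```

Option B costs an extra argument on every call but never needs a
lookup.  Option A keeps signatures short and relies on the TLS read
being cheap.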

Rather than the JNI API being functions you can call, like
PyObject_Something(x), they are implemented as a vtable on the
JNIEnv structure.  So, you do something like:

    void Java_do_something(JNIEnv *env, jobject obj)
    {
        (*env)->DoSomething(env, obj);
    }

JNI provides strict binary compatibility so you really can't have
macros or inlined functions as part of the API.  This vtable idea
has some nice advantages.  You can start the JVM with different
command line parameters and a different vtable can be used.  CPython
does something like this for tracemalloc.  The JNI way seems
cleaner and maybe more powerful.
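
A minimal sketch of the swappable table idea, in plain C.  All names
(vt_env, vt_table and so on) are made up for illustration, and the
layout is simplified: the real JNIEnv is a pointer to a pointer to the
function table, which is why JNI code writes (*env)->Something.

```c
#include <assert.h>
#include <stddef.h>

typedef struct vt_env vt_env;

/* Hypothetical function table, playing the role of JNI's table. */
typedef struct {
    int (*add)(vt_env *env, int a, int b);
} vt_table;

struct vt_env {
    const vt_table *functions;  /* chosen once at startup */
    long calls_traced;          /* only updated by the tracing table */
};

static int
add_plain(vt_env *env, int a, int b)
{
    (void)env;
    return a + b;
}

static int
add_traced(vt_env *env, int a, int b)
{
    env->calls_traced++;        /* extra bookkeeping, same result */
    return a + b;
}

static const vt_table plain_table = { add_plain };
static const vt_table traced_table = { add_traced };

/* Runtime side: pick a table, e.g. from a command line option. */
static void
vt_env_init(vt_env *env, int trace)
{
    assert(env != NULL);
    env->functions = trace ? &traced_table : &plain_table;
    env->calls_traced = 0;
}

/* Extension side: compiled once, works with either table. */
static int
extension_sum3(vt_env *env, int a, int b, int c)
{
    int ab = env->functions->add(env, a, b);
    return env->functions->add(env, ab, c);
}
```

The extension never names add_plain or add_traced, so the runtime can
change implementations without the extension being recompiled.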

Using a macro would seem cleaner to me, e.g.

    #define DoSomething(env, obj) ((*env)->DoSomething(env, obj))

    void Java_do_something(JNIEnv *env, jobject obj)
    {
        DoSomething(env, obj);
    }

Maybe we could have it both ways (binary compatibility or lower
overhead).  Use an inline function like the following:

    static inline void
    DoSomething(JNIEnv *env, jobject obj)
    {
#ifdef STABLE_BINARY_INTERFACE
        (*env)->DoSomething(env, obj);
#else
        ... inline implementation of env->DoSomething
#endif
    }

There are three kinds of opaque object references (handles): local
references, global references and weak global references.  As I
understand it, a local reference is a handle that gets closed when your
native method returns.  That sounds useful and makes life easier
for extension authors (it is harder to leak memory by forgetting to
close handles).  You are limited in the number of local references you can
use (default 16?) but the limit can be increased.  You can also
explicitly close local handles so you don't run out or so you free
large chunks of memory.  E.g.

    lref = ...  /* a large Java object */
    ...
    (*env)->DeleteLocalRef(env, lref);

Local references sound very much like what Carl Shapiro and Larry
Hastings were suggesting as a way to deal with borrowed references
in the CPython API.  I.e. make them a local reference and then close
them when the native function returns.

Global references are what I was thinking of for the PyHandle API.
They would live beyond your native function call and you have to
remember to close them.  Weak global references are pretty
obvious.  We would want to provide them too.
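
Here is a rough sketch of how a local reference frame could work.  It
is not how any JVM implements it; lref_frame, call_native and the fixed
capacity of 16 are all assumptions made for the example:

```c
#include <assert.h>
#include <stddef.h>

#define LREF_CAPACITY 16  /* assumed frame size for this sketch */

typedef int lref;  /* index into the current frame; -1 means no handle */

typedef struct {
    void *slots[LREF_CAPACITY];  /* NULL marks a free slot */
} lref_frame;

/* Register obj in the frame and return its handle, or -1 if full. */
static lref
lref_new(lref_frame *frame, void *obj)
{
    assert(obj != NULL);
    for (int i = 0; i < LREF_CAPACITY; i++) {
        if (frame->slots[i] == NULL) {
            frame->slots[i] = obj;
            return i;
        }
    }
    return -1;
}

/* Explicit early release, like DeleteLocalRef. */
static void
lref_delete(lref_frame *frame, lref h)
{
    if (h >= 0 && h < LREF_CAPACITY) {
        frame->slots[h] = NULL;
    }
}

typedef void (*native_fn)(lref_frame *frame);

/* Runtime side: give the native function a fresh frame and close
 * whatever it left open.  Returns how many handles had to be closed. */
static int
call_native(native_fn fn)
{
    lref_frame frame = { { NULL } };
    int closed = 0;
    fn(&frame);
    for (int i = 0; i < LREF_CAPACITY; i++) {
        if (frame.slots[i] != NULL) {
            frame.slots[i] = NULL;
            closed++;
        }
    }
    return closed;
}

/* Example native function: takes three handles, releases one itself. */
static void
example_native(lref_frame *frame)
{
    static int a, b, c;
    lref ra = lref_new(frame, &a);
    lref rb = lref_new(frame, &b);
    lref rc = lref_new(frame, &c);
    lref_delete(frame, rb);
    (void)ra;
    (void)rc;  /* ra and rc are left for the runtime to close */
}
```

A global reference would be the same kind of handle stored in a table
that outlives call_native, so nothing closes it automatically.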

JNI uses a similar scheme to CPython to deal with errors.  I.e. JNI
methods typically return NULL on error and set something inside the
JNIEnv structure to record the details of the error.  They have a
method that is like PyErr_Occurred(), e.g.

    if ((*env)->ExceptionCheck(env)) {
        return NULL;  /* error case */
    }

They spell out explicitly which JNI methods are safe to call when an
error has occurred.  In the JNI book, they say:

    It is extremely important to check, handle, and clear a pending
    exception before calling any subsequent JNI functions.

I gather this is a source of many bugs.  I wonder if it would be
better to return an object that enforces correct error handling.
One example I found was the LLVM Error class:

    https://llvm.org/doxygen/classllvm_1_1Error.html#details

I don't know how you would implement something like that in C.
Maybe returning NULL is okay as it is working for JNI and matches
what CPython does internally.
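
One possible approximation in C, offered only as a sketch: have every
API function refuse to run while an earlier error is still pending, so
an ignored error shows up at the very next call.  The names chk_env,
chk_divide and the CHK_* codes are invented for this example and are
not part of JNI or CPython:

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    CHK_OK = 0,
    CHK_ERROR = -1,   /* this call failed and recorded an error */
    CHK_MISUSE = -2   /* an earlier error was never cleared */
} chk_status;

typedef struct {
    const char *pending;  /* message of the uncleared error, or NULL */
} chk_env;

/* Like ExceptionCheck: is an error waiting to be handled? */
static int
chk_exception_check(const chk_env *env)
{
    return env->pending != NULL;
}

/* Like ExceptionClear: the caller has dealt with the error. */
static void
chk_exception_clear(chk_env *env)
{
    env->pending = NULL;
}

/* An ordinary API function.  It checks for a pending error first. */
static chk_status
chk_divide(chk_env *env, int a, int b, int *out)
{
    assert(out != NULL);
    if (env->pending != NULL) {
        return CHK_MISUSE;
    }
    if (b == 0) {
        env->pending = "division by zero";
        return CHK_ERROR;
    }
    *out = a / b;
    return CHK_OK;
}
```

This only detects the mistake at run time, unlike the LLVM class, but
it needs nothing beyond plain C.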

The JNI has to solve a similar problem to Python and provide a rich
set of accessor functions for object handles.  The JNI approach
works no matter how the Java virtual machine represents objects
internally.  This abstraction has a cost and so they provide a faster
way for repeated access to primitive data types, such as arrays and
strings.  E.g. a function that gets a "pinned" version of the array
elements.

JNI provides native access to fields and methods of Java objects.
The JNI identifies methods and fields by their symbolic names and
type descriptors. A two-step process factors out the cost of
locating the field or method from its name and descriptor. For
example, to read an integer instance field i in class cls, native
code first obtains a field ID, as follows:

    jfieldID fid = (*env)->GetFieldID(env, cls, "i", "I");

The native code can then use the field ID repeatedly, without the
cost of field lookup, as follows:

    jint value = (*env)->GetIntField(env, obj, fid);

There are rules about how the field ID can be cached.  The advantage
of this design is that JNI does not impose any restrictions on how
field and method IDs are implemented internally.
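
The two-step pattern can be mocked up without a JVM.  In this sketch
the ID happens to be a slot index, but callers treat it as opaque, which
is the point of the design.  Every type and function here (fld_class,
fld_get_field_id and so on) is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    const char *sig;   /* JNI-style type descriptor, "I" for int */
} fld_desc;

typedef struct {
    const fld_desc *fields;
    size_t nfields;
} fld_class;

typedef struct {
    const fld_class *cls;
    int *values;       /* one int slot per field, in class order */
} fld_object;

typedef ptrdiff_t fld_id;  /* opaque to callers; -1 means not found */

/* Step 1 (slow): resolve name and descriptor to an ID. */
static fld_id
fld_get_field_id(const fld_class *cls, const char *name, const char *sig)
{
    for (size_t i = 0; i < cls->nfields; i++) {
        if (strcmp(cls->fields[i].name, name) == 0 &&
            strcmp(cls->fields[i].sig, sig) == 0) {
            return (fld_id)i;
        }
    }
    return -1;
}

/* Step 2 (fast): read the field through the ID, no string compares. */
static int
fld_get_int_field(const fld_object *obj, fld_id fid)
{
    assert(fid >= 0 && (size_t)fid < obj->cls->nfields);
    return obj->values[fid];
}

/* Look the ID up once, then reuse it for every object of that class. */
static long
fld_sum_int_field(const fld_class *cls, const fld_object *objs, size_t n,
                  const char *name)
{
    fld_id fid = fld_get_field_id(cls, name, "I");
    long total = 0;
    assert(fid >= 0);
    for (size_t i = 0; i < n; i++) {
        total += fld_get_int_field(&objs[i], fid);
    }
    return total;
}
```

If the runtime later changed fld_id to a hash or a pointer, callers
written this way would not need to change.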

Regards,

  Neil
_______________________________________________
capi-sig mailing list -- capi-sig@python.org
To unsubscribe send an email to capi-sig-leave@python.org