[prev in list] [next in list] [prev in thread] [next in thread] 

List:       openjdk-core-libs-dev
Subject:    Re: RFR: 8282648: Weaken the InflaterInputStream specification in order to allow faster Zip implemen
From:       Joe Darcy <darcy () openjdk ! org>
Date:       2022-08-31 21:09:13
Message-ID: wkJxgzgxGxONYFioHAYNg-to3e2XtwDfvk5TSO-Z_Ac=.aab06671-d4e8-42d0-9b99-6189733fe812 () github ! com
[Download RAW message or body]

On Mon, 29 Aug 2022 11:58:15 GMT, Volker Simonis <simonis@openjdk.org> wrote:

> > Add an API note to `InflaterInputStream::read(byte[] b, int off, int len)` to \
> > highlight that it might  write more bytes than the returned number of inflated \
> > bytes into the buffer `b`. 
> > The superclass `java.io.InputStream` specifies that `read(byte[] b, int off, int \
> > len)` will leave the content beyond the last read byte in the read buffer `b` \
> > unaffected. However, the overridden `read` method in `InflaterInputStream` passes \
> > the read buffer `b` to `Inflater::inflate(byte[] b, int off, int len)` which \
> > doesn't provide this guarantee. Depending on implementation details, \
> > `Inflater::inflate` might write more than the returned number of inflated bytes \
> > into the buffer `b`. 
> > ### TL;DR
> > 
> > `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater \
> > functionality. `Inflater::inflate(byte[] output, int off, int len)` currently \
> > calls zlib's native `inflate(..)` function and passes the address of \
> > `output[off]` and `len` to it via JNI. 
> > The specification of zlib's `inflate(..)` function (i.e. the [API documentation \
> > in the original zlib \
> > implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400)) \
> > doesn't give any guarantees with regard to usage of the output buffer. It only \
> > states that upon completion the function will return the number of bytes that \
> > have been written (i.e. "inflated") into the output buffer. 
> > The original zlib implementation only wrote as many bytes into the output buffer \
> > as it inflated. However, this is not a hard requirement and newer, more \
> > performant implementations of the zlib library like \
> > [zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/) \
> > or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more bytes of \
> > the output buffer than they actually inflate as a scratch buffer. See \
> > https://github.com/simonis/zlib-chromium for a more detailed description of their \
> > approach and its performance benefit. 
> > These new zlib versions can still be used transparently from Java (e.g. by \
> > putting them into the `LD_LIBRARY_PATH` or by using `LD_PRELOAD`), because they \
> > still fully comply to specification of `Inflater::inflate(..)`. However, we might \
> > run into problems when using the `Inflater` functionality from the \
> > `InflaterInputStream` class. `InflaterInputStream` is derived from from \
> > `InputStream` and as such, its `read(byte[] b, int off, int len)` method is quite \
> > constrained. It specifically specifies that if *k* bytes have been read, then \
> > "these bytes will be stored in elements `b[off]` through `b[off+`*k*`-1]`, \
> > leaving elements `b[off+`*k*`]` through `b[off+len-1]` **unaffected**". But \
> > `InflaterInputStream::read(byte[] b, int off, int len)` (which is constrained by \
> > `InputStream::read(..)`'s specification) calls `Inflater::inflate(byte[] b, int \
> > off, int len)` and directly passes its output buffer down to the native zlib \
> > `inflate(..)` method which is free to change the bytes beyond `b[off+`
 *k*`]` (where *k* is the number of inflated bytes).
> > 
> > From a practical point of view, I don't see this as a big problem, because \
> > callers of `InflaterInputStream::read(byte[] b, int off, int len)` can never know \
> > how many bytes will be written into the output buffer `b` (and in fact its \
> > content can always be completely overwritten). It therefore makes no sense to \
> > depend on any data there being untouched after the call. Also, having used \
> > zlib-cloudflare productively for about two years, we haven't seen real-world \
> > issues because of this behavior yet. However, from a specification point of view \
> > it is easy to artificially construct a program which violates \
> > `InflaterInputStream::read(..)`'s postcondition if using one of the alterantive \
> > zlib implementations. A recently integrated JTreg test \
> > (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally" fails with \
> > zlib-chromium but can fixed easily to run with alternative implementations as \
> > well (see [JDK-8283756](https://bugs.openjdk.java.net/browse/JDK-8283756)).
> 
> Volker Simonis has updated the pull request incrementally with one additional \
> commit since the last revision: 
> Updated JavaDoc according to @mbreinhold's suggestions

src/java.base/share/classes/java/util/jar/JarInputStream.java line 170:

> 168: 
> 169:     /**
> 170:      * Reads from the current ZIP entry into an array of bytes, returning the \
> number of

Please change "Reads from the current ZIP entry..." to "Reads from the current JAR \
entry..."

-------------

PR: https://git.openjdk.org/jdk/pull/7986


[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic