
List:       openjdk-2d-dev
Subject:    [OpenJDK 2D-Dev] 6541476 (Eval): PNG imageio plugin incorrectly handles iTXt chunk
From:       Martin.vGagern () gmx ! net (Martin von Gagern)
Date:       2008-11-10 16:20:04
Message-ID: 49185F34.1010307 () gmx ! net

Bug 6541476; State: 8-Fix Available, bug; Priority: 4-Low
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6541476

This is the second half of the issues mentioned in bug report 6541476.
While my previous post for this bug number concentrated on the metadata
class and the use of typesafe generics, this one here deals with the
reader and writer classes and how they handle UTF-8 data.

The current state of the repository does this:
1. translatedKeyword written in latin1 (writeBytes) but tried to read
   as modified UTF-8 prefixed with byte count (readUTF). The spec
   requires unmodified UTF-8 without byte count.
2. uncompressed text read and written as modified UTF-8 with byte count,
   in violation of the spec which requires unmodified UTF-8 and no byte
   count.
3. compressed text is stored and loaded as latin1. High bits are
   silently dropped when storing. The spec again requires unmodified
   UTF-8.
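To make the latin1 problem in points 1 and 3 concrete, here is a small self-contained demonstration (the class and method names are mine, purely for illustration): DataOutput.writeBytes keeps only the low byte of each char, so anything above U+00FF is silently corrupted, whereas the spec-mandated encoding is plain UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;

public class Latin1VsUtf8 {
    // What writeBytes actually stores: the low byte of each char only.
    static byte[] writeBytesEncoding(String s) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        new DataOutputStream(baos).writeBytes(s);
        return baos.toByteArray();
    }

    // What the PNG spec requires: unmodified UTF-8.
    static byte[] utf8Encoding(String s) {
        return s.getBytes(Charset.forName("UTF-8"));
    }

    public static void main(String[] args) throws IOException {
        String keyword = "\u0410"; // Cyrillic capital A
        System.out.println(writeBytesEncoding(keyword).length); // 1: high byte silently dropped (0x10 remains)
        System.out.println(utf8Encoding(keyword).length);       // 2: 0xD0 0x90
    }
}
```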

My patch attempts to fix these issues, and to verify the fixes using a
suitable test case. As is often the case, there are multiple ways to do
things; I'll highlight a few of the choices I made, in case you want to
discuss them.

I use the Charset class as a central instrument for UTF-8 conversion.
I use nio buffers as input and output of its encode and decode methods.
This means that I have a bit more syntactic overhead than I would have
using simply String.getBytes(utf8) and friends. It also means that I
have the nio objects at hand if I ever choose to work on improving
performance.
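A minimal sketch of that approach (not the patch's actual code): Charset's encode and decode methods take and return nio buffers, so the conversion never goes through String.getBytes.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class Utf8Buffers {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // String -> ByteBuffer via the Charset API.
    static ByteBuffer encode(String text) {
        return UTF8.encode(CharBuffer.wrap(text));
    }

    // ByteBuffer -> String; consumes the buffer's remaining bytes.
    static String decode(ByteBuffer bytes) {
        return UTF8.decode(bytes).toString();
    }

    public static void main(String[] args) {
        ByteBuffer buf = encode("h\u00e4\u00dflich"); // "häßlich"
        System.out.println(buf.remaining()); // 9: ä and ß take two bytes each
        System.out.println(decode(buf));
    }
}
```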

Other than these buffer objects, I stick with the Stream API, not the
nio channels. I wrote a wrapper class to turn an ImageOutputStream into
an OutputStream, in order to layer a DeflaterOutputStream around it.
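Such a wrapper can be as small as the sketch below (the class name is mine, not necessarily the one in the patch): it simply forwards the two write methods, which is all a DeflaterOutputStream layered on top needs.

```java
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.stream.ImageOutputStream;

// Adapts an ImageOutputStream to the java.io.OutputStream API.
class ImageOutputStreamAdapter extends OutputStream {
    private final ImageOutputStream target;

    ImageOutputStreamAdapter(ImageOutputStream target) {
        this.target = target;
    }

    @Override
    public void write(int b) throws IOException {
        target.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        target.write(b, off, len);
    }
}
```

With this in place, compressed text can be written through, e.g., `new DeflaterOutputStream(new ImageOutputStreamAdapter(stream))`.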

For writing from a ByteBuffer to an [Image]OutputStream, I constructed a
little helper to distinguish the cases of buffers with and without
backing array. This should benefit performance. On the other hand, in
the reader I simply copied code from the inflate method, which reads
one character at a time and is probably terribly inefficient. One might
get better performance here at the cost of increased syntactic overhead.
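The writer-side helper could look roughly like this (again a sketch with a hypothetical name, not the patch itself): when the buffer is array-backed, the bytes go straight from the backing array to the stream; otherwise they pass through a scratch array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

final class BufferWriter {
    // Writes all remaining bytes of buf to out.
    static void writeFully(ByteBuffer buf, OutputStream out) throws IOException {
        if (buf.hasArray()) {
            // Array-backed buffer: write directly, no intermediate copy.
            out.write(buf.array(), buf.arrayOffset() + buf.position(), buf.remaining());
            buf.position(buf.limit());
        } else {
            // Direct or read-only buffer: copy through a small scratch array.
            byte[] chunk = new byte[Math.min(Math.max(buf.remaining(), 1), 8192)];
            while (buf.hasRemaining()) {
                int n = Math.min(buf.remaining(), chunk.length);
                buf.get(chunk, 0, n);
                out.write(chunk, 0, n);
            }
        }
    }
}
```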

In the test I actually check the resulting byte sequence of the
uncompressed text chunk. This is done in order to rule out any symmetric
errors, which would allow correct read-back while still leaving
incorrect data in the files; consistent use of modified UTF-8 would be
such a case. The compressed chunk I don't check byte for byte, just its
header, as the compression algorithm may yield different but equivalent
results across implementations.
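To illustrate why the byte-level check matters: writeUTF and readUTF round-trip each other perfectly, yet the bytes they produce are not the unmodified, unprefixed UTF-8 the PNG spec demands. The sketch below (illustrative names, and NUL chosen purely as the simplest divergent code point) shows the on-disk difference that only a byte-level comparison would catch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;

public class ModifiedUtf8Demo {
    // What DataOutput.writeUTF emits: 2-byte length prefix + modified UTF-8.
    static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        new DataOutputStream(baos).writeUTF(s);
        return baos.toByteArray();
    }

    // What the spec requires: plain UTF-8, no prefix.
    static byte[] standardUtf8(String s) {
        return s.getBytes(Charset.forName("UTF-8"));
    }

    public static void main(String[] args) throws IOException {
        String text = "A\u0000B";
        System.out.println(modifiedUtf8(text).length); // 6: 2 (prefix) + 1 + 2 (NUL as 0xC0 0x80) + 1
        System.out.println(standardUtf8(text).length); // 3: NUL is a single 0x00 byte
    }
}
```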

The attached patch is from my mercurial patch queue. Once you consider
it ready for inclusion, I will commit it locally and export a mercurial
patch instead.

Greetings,
 Martin von Gagern
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bug6541476-UTF.patch
Url: http://mail.openjdk.java.net/pipermail/2d-dev/attachments/20081110/eba11d83/attachment.ksh 

