List:       velocity-dev
Subject:    RE: Velocity v1.1-rc1 released
From:       "Geir Magnusson Jr." <gmj () xyris ! com>
Date:       2001-05-24 13:36:35

From: Ilkka Priha [mailto:ipriha@surfeu.fi]
>
> I've also tried the same changes suggested by Michael with our test
> templates (Russian/ISO-8859-5, Japanese/Shift_JIS, Chinese/GBK,
> Hebrew/ISO-8859-8, plus the same ones with UTF-8), and the modified
> ASCII stream worked well. ParserTokenManager required the
> "UNICODE_INPUT=true" option, as noted by Michael; otherwise it didn't
> accept any non-zero high-order bytes.
>
> As Velocity lets InputStreamReader perform the decoding, the characters
> it returns are always correct 16-bit Unicode (provided the given
> encoding corresponds to the one used when writing the templates). It
> should be safe to use them as they are, without modifications performed
> by the stream. The ASCII stream without stripping is a suitable kind of
> raw stream for that.
>
> I didn't manage to make the USER_CHAR_STREAM option work, as it didn't
> produce a working parser (JavaCC 1.1), but a Parser.java class with
> missing methods. Also, reuse of a customized stream wasn't supported
> automatically. If you have solved those problems, I think this solution
> is more reliable than the previous one. Actually, I haven't yet
> understood why the original ASCII stream works at all with 16-bit
> encodings even though the high-order bytes are lost.
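The decoding path described above can be sketched in a few lines (a minimal illustration, not Velocity code; the Cyrillic sample and the ISO-8859-5 choice are arbitrary - any of the encodings listed behaves the same way):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class DecodeDemo {
    public static void main(String[] args) throws Exception {
        // Two Cyrillic characters as they would sit in a template
        // file written in ISO-8859-5.
        String original = "\u0414\u0430";                 // "Da" in Cyrillic
        byte[] onDisk = original.getBytes("ISO-8859-5");

        // Velocity hands the raw byte stream to an InputStreamReader
        // constructed with the template's encoding; the chars coming
        // out are already correct 16-bit Unicode.
        Reader r = new InputStreamReader(
                new ByteArrayInputStream(onDisk), "ISO-8859-5");
        StringBuilder decoded = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            decoded.append((char) c);
        }

        System.out.println(decoded.toString().equals(original)); // true
    }
}
```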

I think they are only lost for the char-by-char parsing of the stream (in
which case it doesn't matter, since our grammar 'delimiters' are expressed
in 7-bit characters anyway...)
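A quick sketch of why that is mostly harmless (the template snippet and the byte-at-a-time loop are hypothetical stand-ins, not the generated stream code): treating each byte as a char mangles any non-ASCII content, but every delimiter byte is below 0x80 and comes through untouched, so the parser still finds the structure it expects.

```java
public class StripDemo {
    public static void main(String[] args) throws Exception {
        // A directive containing a Cyrillic string literal.
        String template = "#set( $x = \"\u0414\u0430\" )";
        byte[] utf8 = template.getBytes("UTF-8");

        // Byte-at-a-time "char stream": each byte becomes one char,
        // so anything outside 7-bit ASCII is mangled...
        StringBuilder mangled = new StringBuilder();
        for (byte b : utf8) {
            mangled.append((char) (b & 0xFF));
        }

        // ...but the grammar's delimiters are all 7-bit and survive.
        System.out.println(mangled.indexOf("#set( $x = ") == 0); // true
        System.out.println(mangled.toString().equals(template)); // false
    }
}
```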

I came to the same conclusions you did - we are safe, because by the time we
get to parsing we have a 'correct' character stream rather than a
questionable byte stream.  That, I think, is why the stripping happens: they
don't want to trust that the 'wrap the input stream in a reader' hack they
do is safe, so they knock off the high byte and name the thing
ASCII_CharStream, so they can say 'I told you so' when things go bad with
full 16-bit characters :)

Therefore, since our 'parser API' requires a proper char stream via a
Reader rather than a byte stream via an InputStream, all is well.  I looked
at all the code generated by JavaCC, and I am pretty sure we won't have any
problems.
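In other words, anything downstream of that API only ever sees decoded chars. A toy illustration of the contract (firstToken is a made-up helper, not Velocity or JavaCC API - the point is only the Reader-shaped signature):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class ReaderContract {
    // Hypothetical entry point mirroring the parser API's shape:
    // it accepts a Reader (decoded chars), never a raw InputStream.
    static String firstToken(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && !Character.isWhitespace((char) c)) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hebrew text encoded as UTF-8 bytes; the caller decodes it
        // before the "parser" ever sees it.
        byte[] bytes = "\u05E9\u05DC\u05D5\u05DD world".getBytes("UTF-8");
        Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), "UTF-8");
        System.out.println(
                firstToken(r).equals("\u05E9\u05DC\u05D5\u05DD")); // true
    }
}
```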

I have a new Parser that implements USER_CHAR_STREAM, and am using a
slightly modified ASCII_CharStream (Michael's suggestion plus an
'implements') as VelocityCharStream.  I do this because I want it to be
*lucidly* clear what we are doing, and I don't want any accidents in the
future from someone accidentally replacing it with autogenerated code.

I have it all working now at home - I am going to beat on it hard and make
a painful test case for it as well.  When things settle down on daedalus, I
will check it in to HEAD so we can all try it.  I did some more experiments
on the train this morning, and believe we are safe to put this into 1.1 -
after a few people (such as yourself) beat on it a bit, I think we will be
more comfortable moving forward with that decision.

geir
