List:       xml-cocoon-users
Subject:    Re: character encoding of a HttpServletRequest
From:       Dominic Mitchell <dom () happygiraffe ! net>
Date:       2010-01-11 11:45:07
Message-ID: 45c308811001110345j6d4430f4l899a779379a946d2 () mail ! gmail ! com

On Mon, Jan 11, 2010 at 10:34 AM, Jos Snellings <Jos.Snellings@pandora.be> wrote:

> That is right!
> It is just a confusing situation :-(
> The filter works fine. The init() method of a generator does not give a
> chance to call setCharacterEncoding, as the parsing already happened.
> The good thing is that the code is already in Spring, so no new
> external dependencies. Maybe later on I will add a
> "tryToGuessEncodingFilter".
>
>
Trying to guess encodings isn't a good idea, in general.  About the only one
that can be reliably detected is UTF-8.  In past projects, I've done
something like this:

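  // new String(bytes, "UTF-8") never fails on bad input (it substitutes
  // U+FFFD), so use a CharsetDecoder, which reports malformed bytes.
  // Needs java.nio.ByteBuffer and java.nio.charset.*.
  String result;
  try {
    result = Charset.forName("UTF-8").newDecoder()
        .decode(ByteBuffer.wrap(someBytes)).toString();
  } catch (CharacterCodingException e) {
    result = new String(someBytes, Charset.forName("windows-1252"));
  }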

In my experience, Windows-1252 was a better guess than ISO-8859-1, as users
tend to paste in text from Word documents with curly quotes.
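
To make the curly-quote point concrete, here is a minimal sketch, assuming
the JVM ships the windows-1252 charset (it is not one of the charsets every
Java platform is required to support). CurlyQuoteDemo and decode are made-up
names for illustration only. The bytes 0x93 and 0x94 are the curly double
quotes in Windows-1252, but ISO-8859-1 maps them to C1 control characters:

```java
import java.nio.charset.Charset;

final class CurlyQuoteDemo {
    // Decode bytes with the named charset (hypothetical helper for this demo).
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // "hi" wrapped in the bytes Word typically produces for curly quotes.
        byte[] pasted = {(byte) 0x93, 'h', 'i', (byte) 0x94};

        String asCp1252 = decode(pasted, "windows-1252");
        String asLatin1 = decode(pasted, "ISO-8859-1");

        // First code point under each interpretation.
        System.out.printf("windows-1252: U+%04X%n", (int) asCp1252.charAt(0)); // U+201C
        System.out.printf("ISO-8859-1:   U+%04X%n", (int) asLatin1.charAt(0)); // U+0093
    }
}
```

Both decodes succeed without any error, which is why the choice of fallback
has to be a guess about the users rather than something detected from the
bytes.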

-Dom
